gce --quit-soon does not have expected behavior #6

Open
tahorst opened this issue Oct 29, 2021 · 1 comment

tahorst commented Oct 29, 2021

In some cases, running gce --quit-soon <name> does not delete the VM after a firework has completed. I would expect a worker to check the quit metadata after each firework completes, but it looks like rapidfire can launch many fireworks before returning for the metadata check here:

    rocket_launcher.rapidfire(
        self.launchpad, self.fireworker, strm_lvl=self.strm_lvl,
        max_loops=1, sleep_time=self.sleep_secs)

    # Idle to the max.
    idled = self.sleep_secs  # rapidfire() just slept once
    while not self.launchpad.run_exists(self.fireworker):  # none ready to run
        future_work = self.launchpad.future_run_exists(self.fireworker)  # any ready or waiting?
        if idled >= (self.idle_for_waiters if future_work else self.idle_for_rockets):
            return 'idle'

        req = gcp.instance_attribute('quit')
        if req == 'soon' or req == 'when-idle':
            return '"quit={}" request'.format(req)

        FW_CONSOLE_LOGGER.debug(
            'Sleeping for %s secs waiting for launchable rockets',
            self.sleep_secs)
        time.sleep(self.sleep_secs)
        idled += self.sleep_secs

    req = gcp.instance_attribute('quit')
    if req == 'soon':
        return '"quit={}" request'.format(req)

I think the arg nlaunches=1 should be passed to rapidfire so that it exits after launching only one firework and we can check for the quit metadata. As written, I think rapidfire will launch as many waiting rockets as it can, since it looks like it skips the loop check whenever more fireworks are ready:
https://github.com/materialsproject/fireworks/blob/6cb2a66d35239611ec2a1ccb807be38976198a0b/fireworks/core/rocket_launcher.py#L107-L126
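
To illustrate the suspected behavior, here is a toy model, not the FireWorks implementation. It assumes the quit request arrives while the first firework is running and that the worker only checks it between rapidfire passes. The names `rapidfire_one_pass` and `fireworks_run_before_quit` are hypothetical.

```python
from collections import deque


def rapidfire_one_pass(ready, nlaunches=0):
    """Toy model of a single rapidfire pass (max_loops=1).

    `ready` is a deque of firework IDs that are ready to run. With
    nlaunches=0 the pass keeps launching until nothing is ready; with
    nlaunches=N > 0 it stops after N launches. Returns the IDs launched.
    """
    launched = []
    while ready:
        launched.append(ready.popleft())  # "launch" one firework
        if nlaunches > 0 and len(launched) == nlaunches:
            break
    return launched


def fireworks_run_before_quit(n_ready, nlaunches):
    """Count fireworks run before the worker notices the quit request.

    Assumption: quit=soon is set during the first firework, and the
    worker only reads it after each rapidfire pass returns.
    """
    ready = deque(range(n_ready))
    total = 0
    quit_requested = False
    while ready and not quit_requested:
        total += len(rapidfire_one_pass(ready, nlaunches))
        quit_requested = True  # the request is finally seen here
    return total


print(fireworks_run_before_quit(5, nlaunches=0))  # all ready fireworks run first
print(fireworks_run_before_quit(5, nlaunches=1))  # only one runs before the check
```

In this model, five ready fireworks all run before the quit check when nlaunches=0, while nlaunches=1 returns control after the first one.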

Is the expectation to check for the metadata after each firework or to let rapidfire launch as many as it wants before checking?

@1fish2 1fish2 self-assigned this Oct 30, 2021

1fish2 commented Oct 30, 2021

Good idea. It looks like it should set nlaunches=1 in addition to max_loops=1.
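
A minimal sketch of what the loop might look like with that change, with idle handling omitted. The `rapidfire`, `run_exists`, and `get_quit` parameters are hypothetical stand-ins injected so the sketch runs without FireWorks or GCP; in the real worker the call would presumably be rocket_launcher.rapidfire(...) with nlaunches=1 added to the existing arguments.

```python
def launch_rockets(rapidfire, run_exists, get_quit):
    """Sketch of the worker loop with the proposed fix.

    rapidfire:  callable accepting nlaunches= and max_loops= keyword args
    run_exists: callable returning True while a firework is ready to run
    get_quit:   callable returning the instance's 'quit' metadata value
    All three are illustrative stand-ins, not the project's actual API.
    """
    while run_exists():
        # At most one firework per pass, so the quit check runs after each.
        rapidfire(nlaunches=1, max_loops=1)
        req = get_quit()
        if req == 'soon':
            return '"quit={}" request'.format(req)
    return 'idle'


# Usage with fakes: three fireworks are ready, and quit=soon is set
# as soon as the first one has run.
ready = [1, 2, 3]
launched = []


def fake_rapidfire(nlaunches, max_loops):
    launched.append(ready.pop(0))


result = launch_rockets(
    fake_rapidfire,
    run_exists=lambda: bool(ready),
    get_quit=lambda: 'soon' if launched else None)
print(result, launched)
```

With the fakes above, the loop stops after the first firework instead of draining the queue.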
