Early fails and synchronized container creation
Currently, DaskGatewayLauncher implements a naive throw-into-tmux-session-and-check-whether-it-is-still-alive approach. This loses all exception information that could, in principle, be retrieved, such as a failure to download an image or a failure to start it. It also relies on timeouts to wait until commands complete. Some ways to address this:
- Invoke ssh commands carefully and check their exit status. For example:
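    import subprocess

    def check_output(ssh, command):
        # Assumes ssh is a paramiko SSHClient (exec_command returns three streams)
        stdin, stdout, stderr = ssh.exec_command(command)
        # Read the output first: waiting for the exit status with unread output can hang
        out = stdout.read()
        err = stderr.read()
        code = stdout.channel.recv_exit_status()  # blocks until the command exits
        if code:
            # CalledProcessError takes the captured stdout as `output`
            raise subprocess.CalledProcessError(code, command, output=out, stderr=err)
        return out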
- Block on tmux calls until they complete. For example:
    download_container_template = 'tmux new-window -t {session_name} "wget {url} -O {filename}; tmux wait-for -S {uuid}"; tmux wait-for {uuid}'
- Check the return code with tmux. For example:
    # The inner command is single-quoted so that $? is expanded inside the tmux window, not by the launching shell
    download_container_template = "tmux new-window -t {session_name} 'wget {url} -O {filename}; echo $? > /tmp/{uuid}; tmux wait-for -S {uuid}'; tmux wait-for {uuid}; exit $(cat /tmp/{uuid})"
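
The tmux templates above could be generalized into a small helper that wraps any command, not only wget. The following is a minimal sketch, not the launcher's actual code: the function name blocking_tmux_command, its parameters, and the session name and URL in the usage line are all hypothetical. It uses shlex.quote to single-quote the inner command, so `$?` is evaluated by the shell inside the tmux window rather than by the shell that launches tmux.

```python
import shlex
import uuid


def blocking_tmux_command(session_name, command, channel=None):
    """Return a shell line that runs `command` in a new tmux window,
    blocks until it finishes, and exits with the command's return code.

    `channel` (hypothetical parameter) names both the tmux wait-for
    channel and the status file; a random one is generated if omitted.
    """
    channel = channel or uuid.uuid4().hex
    status_file = f"/tmp/{channel}"
    # Executed by the shell inside the new window: run the command,
    # store its exit code, then signal the waiting side.
    inner = f"{command}; echo $? > {status_file}; tmux wait-for -S {channel}"
    # shlex.quote single-quotes `inner`, so `$?` is not expanded early.
    return (
        f"tmux new-window -t {shlex.quote(session_name)} {shlex.quote(inner)}; "
        f"tmux wait-for {channel}; "
        f"exit $(cat {status_file})"
    )


# Placeholder session name and URL, for illustration only.
print(blocking_tmux_command("gateway", "wget http://example.com/image -O image"))
```

Passing the resulting string to an exit-status-checking helper such as check_output would then raise on a failed download instead of relying on a timeout.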
Edited by Artem Pulkin