meadowrun-manage-ec2 edit-management-lambda-config --terminate-instances-if-idle-for-secs <some large number>
(you can also give this option on install) then the machines stay around for longer. Do note that this applies to all machines indiscriminately, so if at some point a number of machines get created (e.g. because multiple people run a job at the same time), then those will all stay around for at least the idle timeout.
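In case it helps to pick a concrete value for the timeout: the flag name suggests it takes seconds, so here is a minimal sketch that converts hours to seconds and builds the command shown above. The helper name idle_timeout_command is made up for illustration and is not part of meadowrun, and passing a whole number of seconds is an assumption based on the flag name.

```python
import shlex
from typing import List


def idle_timeout_command(hours: float) -> List[str]:
    """Build the argv that sets the idle timeout to `hours` hours.

    Hypothetical helper, not part of meadowrun. It only assembles the
    command quoted in this thread; it does not run anything.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    seconds = int(hours * 3600)  # assumption: the flag expects whole seconds
    return [
        "meadowrun-manage-ec2",
        "edit-management-lambda-config",
        "--terminate-instances-if-idle-for-secs",
        str(seconds),
    ]


command = idle_timeout_command(hours=2)
print(shlex.join(command))
# To actually apply it, something like subprocess.run(command, check=True)
# should work, assuming meadowrun-manage-ec2 is on your PATH.
```

For 2 hours this prints the command with 7200 as the value. Keep in mind the caveat above: the timeout applies to every machine, so a larger value means all idle machines are kept (and billed) for that long.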
Kurt, you are a hero. Thank you so much for responding insanely fast! I don't think I have ever gotten my questions answered so fast. This project is really really cool and I'm excited to see it grow. I wish I had known about this project last year. I was doing a reinforcement learning project and each trial would take over 10 hours on my MacBook. So I was launching instances, SSHing into the machine, manually cloning my repo, running a trial, then later SSHing in again, and finally transferring the files to a bucket. I was running 16 different instances. It was a nightmare.
Sounds good, I will try this out! No worries with the delay. This tool has already helped me exponentially. Take care of yourself!
Also, I reran my program and it seems to be working now... I didn't run any of the commands you suggested so I'm not too sure what happened here.
EDIT: JK it just took longer to hit the error. I'm going through your troubleshooting steps now. I'll let you know what happens.
Results:
Running meadowrun-manage-ec2 clean gave me another KeyError: 'running_jobs' at "/aws_integration/management_lambdas/adjust_ec2_instances.py"
Running meadowrun-manage-ec2 uninstall resulted in a clean uninstall with no errors
Running meadowrun-manage-ec2 install also finished with no errors
Running my program after the reinstallation seems to be working fine. Thanks for the help Kurt. When in doubt, turn it off and then back on again.
apt packages for installation. Not sure what you're using as the deployment option of run_map or run_function, but I'll assume the default, which is Deployment.mirror_local(). If you're not specifying a deployment then that is what you're using. Deployment.mirror_local has a keyword argument interpreter, and interpreters have an additional_software option which specifies extra apt packages to install. As an example, to mirror a local conda environment and also install build-essential:
deployment=Deployment.mirror_local(interpreter=LocalCondaInterpreter(environment_name_or_path="my-conda-env", additional_software=["build-essential"]))
Deployment.git_repo has similar options. I'll make a note to write a howto about this; it seems we are lacking a bit of documentation about this feature.
edit-management-lambda-config command that allows configuring the TERMINATE_INSTANCES_IF_IDLE_FOR_SECS parameter.