Vyas Ramasubramani
@vyasr
agreed, that seems to be the case. i'll try and track down what's happening there
Stephen Thomas
@amburan
Cool thanks
Vyas Ramasubramani
@vyasr
i actually think that since directives are per operation this is the expected behavior
since you are specifying executables for your individual operations, as i mentioned above they will get forked and launched with the correct python executable. the conda python will only get used for the process of determining what these ops are and launching them, which i think should be fine for your use case
your conda environment should still function on the compute nodes, right?
Stephen Thomas
@amburan
this is a funky cluster. The compute nodes and master node do not have the same environment. The compute node needs to be built to have whatever we need. That's why I thought it's simpler to just use containers
Vyas Ramasubramani
@vyasr
does your project.py require anything other than signac and signac-flow to submit? not to actually run the operations
you could still use containers to actually run your operations (your current decorators should do that), it's just that the python used for signac-flow to process your workflow would be the conda one
alternatively, if the container is available on the login node this should be fixed by just submitting using that python instead
docker run IMAGE_NAME python project.py submit... should make the submit script's run command use the python in the docker container
Stephen Thomas
@amburan
project.py only requires signac and signac-flow to submit. But the thing with the UGE scheduler is: once I specify #$ -soft -l docker,..., any line in the submission script without # .. is executed in the container. I think your second solution to use the python in the container to submit should work. I will try that
Andrei Berceanu
@berceanu
Hi guys :)
Andrei Berceanu
@berceanu
I have created a SLURM template for our cluster, together with a new environment.
The environment is OdinEnvironment, from src/project.py and the template is templates/odin.sh.
Now, my question is, why do I have to run python src/project.py submit repeatedly (up to 4 times, iirc)? If signac-flow figures out the order of the operations, why can't it submit all the jobs to the cluster at once?
Andrei Berceanu
@berceanu
I double-checked, I need to submit 4 times, and the operations for each submission are:
1. run_fbpic
2. create_dir_diags_rhos & plot_snapshots_000000
3. post_process_results
4. plot_2d_hist & plot_1d_diags & generate_movie
These are all defined in src/project.py
So basically what I want is to run python src/project.py submit only once and have all these jobs submitted with interdependencies, i.e. job 2 will only run after job 1 is done, etc.
Vyas Ramasubramani
@vyasr
@berceanu I think what you are looking for is the FlowGroup feature: https://docs.signac.io/en/latest/flow-group.html
This feature allows you to group operations together that can be submitted at once, even if only a subset are eligible. Dependencies will be resolved on-the-fly as eligible operations that are within the group complete, since those operations could result in the preconditions for other operations being satisfied.
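here is a minimal sketch of what that could look like (a rough adaptation, assuming a recent signac-flow with the make_group API; the group name, marker file, and operation bodies are placeholders for your actual src/project.py):

from flow import FlowProject

class Project(FlowProject):
    pass

# Bundle the whole pipeline into one group so a single submit covers it.
pipeline = Project.make_group(name="pipeline")

@pipeline
@Project.operation
@Project.post.isfile("diags/done")  # hypothetical completion marker
def run_fbpic(job):
    ...

@pipeline
@Project.operation
@Project.pre.after(run_fbpic)  # becomes eligible only after run_fbpic finishes
def post_process_results(job):
    ...

if __name__ == "__main__":
    Project().main()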
Andrei Berceanu
@berceanu
Tnx @vyasr , that seems like a useful feature but I'm not sure how it applies to my case.
Vyas Ramasubramani
@vyasr
right now the problem you're running into is that an operation is only submitted if it is eligible to run at submit time. there is no dynamic dependency resolution (i'm assuming that you have pre- and post-conditions set that define the sequence of operations you posted above)
@berceanu with groups, you put a bunch of operations into a group and then submit the entire group. a group is composed of many operations, and a group can be submitted if any of its operations are eligible at submit time. inter-group dependencies will then be resolved when your submission script runs: flow will run any initially eligible operations, then once they are done flow will check if there are any new operations that are now eligible, and if so, run those
Andrei Berceanu
@berceanu
So basically I should put all the operations in my project in the same group in order to submit just once, right?
Vyas Ramasubramani
@vyasr
yes exactly
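and then, assuming you named the group something like pipeline as in the sketch above, a single python src/project.py submit -o pipeline should submit the whole chain, with the generated script picking up newly eligible operations as earlier ones finish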
Andrei Berceanu
@berceanu
Now I've stumbled across a separate issue.
And the problem is that it exports the visible devices in order, starting with 0.
Sometimes a node has many GPUs and some are in use by other people, so this will try to run on GPU 0 even if it is busy while, for example, GPU 7 is free.
Carl Simon Adorf
@csadorf
@berceanu I'm not quite sure how one would solve that problem. Have you talked to your cluster administrator about how to properly identify the GPU devices that are reserved for you?
Andrei Berceanu
@berceanu
Well there are no reserved ones, I just want to make use of the ones that are available at some given time.
Carl Simon Adorf
@csadorf
Maybe I don't understand the problem, but if you submit a cluster job, some resources are reserved for you, are they not?
Andrei Berceanu
@berceanu
Well what I mean is, some resources are free.
ie. some GPUs in this case. Not necessarily reserved for me, but free for anyone to use.
Andrei Berceanu
@berceanu
Is signac_project_document.json automatically produced by signac?
Andrei Berceanu
@berceanu
ie. is there a setting that controls it?
Brandon Butler
@b-butler
Yes, and it stores information regarding the project that can be accessed using project.doc[key].
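For example (the key name here is just made up):

import signac

project = signac.get_project()
# the first write creates signac_project_document.json in the project root
project.doc["n_replicas"] = 4
print(project.doc["n_replicas"])  # -> 4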
Andrei Berceanu
@berceanu
What's the difference between --parallel and --test?
Sorry between --pretend and --test
Carl Simon Adorf
@csadorf
The --pretend option uses the actual environment and scheduler and will also query the scheduler to figure out what to submit, whereas --test does not do that.
Andrei Berceanu
@berceanu
I see, thanks @csadorf :)
Andrei Berceanu
@berceanu
Say I have a signac project with 10 job folders inside my workspace. Each job has an out.data file. I want to copy all of these to a separate folder, appending the job hash, i.e. out_4fc56.dat etc.
Can I do this in a script using only signac?
Carl Simon Adorf
@csadorf
@berceanu I think this would be a minimal script to do this:
from pathlib import Path
import signac

project = signac.get_project()
outdir = Path('outdir')
outdir.mkdir(exist_ok=True)  # make sure the target directory exists

for job in project:
    src = Path(job.fn('out.data'))
    # use the first 5 characters of the job id as the suffix, e.g. out_4fc56.dat
    dst = outdir / f'out_{job.id:.5}.dat'
    dst.write_bytes(src.read_bytes())
Andrei Berceanu
@berceanu
Awesome, thank you @csadorf
Javier Barbero
@javierbg
Hello! I'm integrating signac with my experimentation workflow and I have a small question: I have stored my dataset globally for the entire project, as per the documentation (project.data), but I'm not sure what the proper way is to access this data from a job. The job._project attribute is protected, so I assume I cannot rely on that API, and loading the entire project from the CWD or an absolute path is cumbersome and doesn't play well with the with job: context manager. Any suggestions? Could job._project be made stable as job.project?
Carl Simon Adorf
@csadorf
@javierbg Hi! In principle you can always have a reference to "project" in your module-wide namespace and just access that. The reason that we have not yet publicly exposed job._project is because we worry that there might be some ambiguity as to whether that inverse relationship is well-defined. I think it would be super helpful if you could create an issue here: https://github.com/glotzerlab/signac/issues/new/choose with that feature request.
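A rough sketch of that workaround, assuming the data was stored via project.data (the operation and key names are made up):

import signac

# module-level handle that operations can refer to directly
project = signac.get_project()

def my_operation(job):
    with job:                # switch into the job's workspace
        with project.data:   # open the project-level HDF5 store
            # copy the array into memory while the store is open
            dataset = project.data["my_dataset"][()]
        # ... use `dataset` here ...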
Vyas Ramasubramani
@vyasr
yes, please make an issue for this! i think exposing job.project is probably a reasonable plan for 2.0 based on the direction of some of our other conversations about how we would like to modify signac's data model, but i think we will need to wait until a 2.0 release since those changes will be breaking to some extent
Javier Barbero
@javierbg
@csadorf Ah, of course, that would work, it didn't occur to me. I'll create the issue!