FYI, I played with some resource limits on. Raincloud
Braincloud
This may result in jobs being killed -- unclear. But better than cluster being killed! :-)
Greg Kiar
@gkiar
:+1:
Eric Bridgeford
@ebridge2
Word; psyched for the ami!
Greg Kiar
@gkiar
@ebridge2, send me your pipe file please? the one you just ran
Eric Bridgeford
@ebridge2
Is on my desktop; the one I ran is exactly identical to what I sent you last night though
With the virtual free change
Greg Kiar
@gkiar
ok cool
Eric Bridgeford
@ebridge2
Is stats not in 317?
I'm the only one here
joshua vogelstein
@jovo
or 301, we haven’t decided yet where it will be
Greg Kiar
@gkiar
@alexbaden the cluster crashed again - it's completely dumping to one node, and doing so ridiculously (i.e. it attempts to give the load of 740 cores to 48). We can't run m2g on bc1 until we figure this out - which I'm more than willing to help support in any way I can. @/all In the meantime I'm focusing on aws.
Worse than yesterday AM or the same?
Greg Kiar
@gkiar
same
the queue on compute1 was in some sort of error state
i need to go over and reboot compute0, will get that done this afternoon
I think we made some progress w/ braincloud. @gkiar is running some test jobs now, and we're evenly split across both compute nodes. so that's better. if this goes well we can slowly open back up to everyone!
however, it appears loni doesn't pass any helpful info to the scheduler about memory usage, etc. so i think i have sge set up to kill jobs that are going to eat all the available memory -- but i'm not sure that works. so you should have a high index of suspicion for (1) your jobs being killed and (2) the cluster crashing as you run stuff going forward
and i can add sge_admin to my list of "Useless Technologies" on my resume :-)
William Gray
@willgray
hmmm bc1 error for matlab (licensing):

[will@compute0 bin]$/usr/local/matlab/bin/glnxa64/need_softwareopengl: error while loading shared libraries: libGL.so.1: cannot open shared object file: No such file or directory
## MATLAB is selecting SOFTWARE OPENGL rendering.
Error: Activation cannot proceed. You may either:
1. Set an X11 display, and restart the activation process
2. Use the silent activation feature
3. Activate using the license center
@alexbaden ?
Alex Baden
@alexbaden
i guess use octave isn't an okay answer
i added it to my todo list. will try and look tonight, if not tomorrow AM
William Gray
@willgray
thanks, bud
Alex Baden
@alexbaden
my own matlab expired, too. im wondering if all ours expired
whats funny is i could actually connect bc1 to our aws license server.... since its just inbound ports
William Gray
@willgray
HA
i wonder if it expired 9/1 and we didn’t know bc of the other thing
i didn’t run yesterday
Greg Kiar
@gkiar
@/all @alexbaden is a hero. The cluster seems to be working again. I ran through the workflow that @ebridge2 last made (i.e that subset of the nki-enh dataset) last night and it seems to be running to completion which is awesome.
@ebridge2 this means that you should now setup a workflow with the remainder of the subjects from that dataset, using the workflow I'm about to post here as the base.
@ebridge2 I did all of the first 66 subjects you originally put into the workflow (the workflow I pasted here has fewer subjects because it is just a subset I used for testing)
joshua vogelstein
@jovo
yay @alexbaden !!!!!
Eric Bridgeford
@ebridge2
Awesome sounds good
Alex Baden
@alexbaden
@/all matlab has been reactivated on braincloud1.
Greg Kiar
@gkiar
Greg Kiar
@gkiar
uh oh...
Greg Kiar
@gkiar
@/all Design decision: We will not run more than 40 subjects in a single workflow of m2g. After this point it seems to break down very quickly, for reasons I'm unsure.
Moving forward, @ebridge2, partition datasets into chunks of ≤ 40 subjects per workflow in m2g and save them all in a 'workflows' folder within the dataset directory on bc1
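A minimal sketch of the partitioning rule above, in Python. It only splits a list of subject IDs into groups of at most 40 and writes one plain-text list per group into a 'workflows' folder. The function names, the `workflow_NNN_subjects.txt` file naming, and the plain-text format are all assumptions made for illustration; the actual m2g workflow files would still have to be built from each list by hand or with other tooling.

```python
from pathlib import Path

# Limit taken from the design decision above.
MAX_SUBJECTS_PER_WORKFLOW = 40


def partition_subjects(subjects, chunk_size=MAX_SUBJECTS_PER_WORKFLOW):
    """Split subject IDs into consecutive chunks of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [subjects[i:i + chunk_size]
            for i in range(0, len(subjects), chunk_size)]


def write_subject_lists(subjects, dataset_dir,
                        chunk_size=MAX_SUBJECTS_PER_WORKFLOW):
    """Write one subject list per chunk into <dataset_dir>/workflows.

    The file names are hypothetical; they are not an m2g convention.
    """
    out_dir = Path(dataset_dir) / "workflows"
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, chunk in enumerate(partition_subjects(subjects, chunk_size), start=1):
        path = out_dir / f"workflow_{n:03d}_subjects.txt"
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

For example, calling `write_subject_lists(subject_ids, dataset_dir)` with 100 subject IDs would produce three lists holding 40, 40, and 20 subjects.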
William Gray
@willgray
Can you guys figure out how many submissions to scheduler happen for each subject? Is it the same for grouped workflows or those submitted through cli? I have some ideas on how to scale up
For em processing things we parcellate at about 10k jobs
Greg Kiar
@gkiar
It's way less than that. Per subject there are 13 jobs, and therefore 13 submissions, so we shouldn't even be coming close to the limit you use for i2g. I am pretty sure that the scheduler and LONI aren't communicating properly, and when we get to tensor gen things go wonky because that's the first module which really uses a lot of RAM
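The arithmetic behind that comparison can be sketched in a few lines of Python. The 13 jobs per subject, the 40-subject cap, the 66-subject run, and the "about 10k jobs" figure all come from this conversation; treating the 10k figure as a hard ceiling, and the helper names, are assumptions made for illustration.

```python
JOBS_PER_SUBJECT = 13            # one scheduler submission per job, per the message above
PARCELLATION_THRESHOLD = 10_000  # "about 10k jobs", quoted for EM processing


def submissions_for(n_subjects, jobs_per_subject=JOBS_PER_SUBJECT):
    """Total scheduler submissions for a workflow of n_subjects."""
    return n_subjects * jobs_per_subject


def subjects_within(threshold, jobs_per_subject=JOBS_PER_SUBJECT):
    """Largest whole number of subjects whose submissions stay within threshold."""
    return threshold // jobs_per_subject


for n in (40, 66):
    print(f"{n} subjects -> {submissions_for(n)} submissions")
print(f"{PARCELLATION_THRESHOLD} submissions cover up to "
      f"{subjects_within(PARCELLATION_THRESHOLD)} subjects")
```

A 40-subject workflow comes to 520 submissions and the 66-subject run to 858, both far below 10,000, which is consistent with the view that raw submission count is not what breaks the larger workflows.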
William Gray
@willgray
happily, @gkiar, the work you and I are doing before the end of the month will also result in a way to submit directly to sge. :)
if sge itself is messed up, we need an alex
Greg Kiar
@gkiar
:) wonderful, @willgray
arana91
@arana91
Hello Matlab programmers, Needed some guidance.
Hit me back when online!
Greg Kiar
@gkiar
Hi there. Gitter is deprecated for our communication, please email support@neurodata.io with a specific question if there is something we can help you with. Also, if you can please let us know where you found this link that would be great so that we can make that more clear for others in the future. Thanks @arana91