Damien Irving
@DamienIrving
@captainceramic One last thing that was on my mind. At the moment I've got VisTrails configured so that all my data gets written to my /local/ directory on the virtual desktop, because I don't care if that data disappears. Do other users regularly write data that they do care about to /g/data/? And can the directories that they do write to be picked up by the CMIP5 dataset builder in VisTrails to avoid other users regenerating those same files?
Tim Bedin
@captainceramic
@DamienIrving - That is a pretty complex ongoing issue, and I think that NCI might need to be in on that discussion. As far as I understand, the /g/data file systems are really for good quality, shareable datasets, not for 'scratch' space, experimentation, sharing half-done data with colleagues etc. Can we put this on the agenda for next meeting - which I should organise!
Damien Irving
@DamienIrving
Here's a draft agenda for tomorrow's meeting (feel free to add/modify):
  1. Update on recent work (new modules, default file pattern, etc)
  2. Handling of multiple input files
  3. Process for merging devel into master branch and formal releases
  4. Where should users write their data to?
  5. Workshop in Hobart (June 5, see draft schedule)
Tim Bedin
@captainceramic
Sounds good.
Damien Irving
@DamienIrving
I'm on the line.
Damien Irving
@DamienIrving
I'm keeping track of the action items from our meetings on the development roadmap page of the wiki: https://github.com/CWSL/cwsl-mas/wiki/Development-roadmap
Damien Irving
@DamienIrving
Here's an idea: Each year I usually run a Software Carpentry workshop alongside the AMOS conference. Since there's such a short gap between the 2015 (Brisbane, July) and 2016 (Melbourne, February) conferences, I've been wondering whether I should bother running one in Melbourne. Instead, maybe we should run a CWSLab workflow tool workshop alongside the Melbourne conference? My Hobart workshop can basically be a trial run and we can tweak the format depending on how that goes. I'd be happy to help organise an AMOS workshop, and since NCI usually sponsors the AMOS conference I'm sure the conference organising committee will be happy to accommodate us.
If that sounds like a good idea I can run the idea past the conference organising committee and see what they think?
Tim Bedin
@captainceramic
I think that sounds like a really good idea, especially if you feel happy with how it goes in Hobart.
Damien Irving
@DamienIrving
Cool. I think I'll assess after Hobart and if it goes well I'll contact the conference organisers to see what they think.
Tim Bedin
@captainceramic
Good plan
Damien Irving
@DamienIrving
Is there a limit on how many VMs NCI makes available at any one time (i.e. what's the limit on the number of people who can attend my workshop in Hobart)?
Damien Irving
@DamienIrving
@captainceramic One of the people coming to the Hobart workshop next week is @mdsumner . He has a lot of useful code written in R that he might be interested in contributing to the CWSLab workflow tool, so I was wondering if you had any advice for him? ( @mdsumner - at the moment to add a module to the workflow tool it has to be executable from the command line, which is great for people who work with Python but I'm not sure that it's a workflow that users of R or MATLAB or Ferret usually go with - correct me if I'm wrong)
Tim Bedin
@captainceramic
@DamienIrving - We have got in touch with NCI and let them know that ~15 people will need access during your workshop on the 5th. If you think there are going to be more than that we can talk to them about it.
Damien Irving
@DamienIrving
@captainceramic Great, thanks. I'm doing a 20-30min general overview/demo for the regular "Data Science Hobart" Friday morning meetup, but then only 10 or so people are going to stick around to actually learn the nitty gritty details of submitting new modules. (In other words, 15 should be plenty).
Tim Bedin
@captainceramic
Sounds good. As for the restriction to executable tools, I'd certainly be interested in feedback about that. I would emphasise the primary use case for the tool, which is to apply a known and well-tested script/algorithm across the whole suite of models, ensembles, variables etc. I don't think the tool is particularly useful when developing - it would be better to use interactive tools like IPython, RStudio etc on the CWSLab desktops until you feel happy with the result, then go to the VisTrails plugin to apply it across the whole archive.
I would suggest that if you have a workflow that can easily be completed in an interactive tool, it probably won't be a great fit for the plugin. I feel that we would be picking up the kind of workflows that you would kick off with RScript (the executable R batch mode), because we don't really have the ability for things to be interactive through VisTrails.
Damien Irving
@DamienIrving
Do R users regularly write RScripts? Or would this be a bit of a change to their usual workflow/thinking?
Tim Bedin
@captainceramic
That said, I have thought about a module that switches workflows from 'file land' into 'data land' - i.e. a module that iterates over the DataSet.files property, opens every file, and then returns a list of open cdms2 file datasets or netCDF4 file objects.
I think that R users generally use it interactively rather than through RScript.
The key is that the workflow tool is aimed more at replacing hard-coded BASH scripts than the kind of exploratory work that involves making a plot, tweaking a few parameters and then re-making it.
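A rough sketch of the 'file land' to 'data land' idea above, in Python. It assumes only that a dataset exposes a list of file paths, as the DataSet.files property mentioned above suggests. The `open_dataset_files` helper and its `opener` argument are hypothetical names for illustration, not part of the plugin.

```python
def open_dataset_files(files, opener):
    """Open every path in `files` with `opener` and return the open handles.

    `opener` is any callable that takes a path and returns an object
    with a close() method (hypothetical interface for this sketch).
    If one file fails to open, the handles opened so far are closed
    before the error is re-raised.
    """
    handles = []
    try:
        for path in files:
            handles.append(opener(path))
    except Exception:
        for handle in handles:
            handle.close()
        raise
    return handles
```

With the netCDF4 package installed, something like `open_dataset_files(dataset.files, netCDF4.Dataset)` would return a list of open netCDF4 objects, assuming `dataset.files` holds plain path strings.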
Damien Irving
@DamienIrving
Totally agree. I'll just have to make it very clear (and perhaps I'll add some content to the wiki to say this) what the tool is and isn't for.
Damien Irving
@DamienIrving
I've just been chatting with an R guru over on https://gitter.im/resbaz/resbaz. It seems that it is possible to parse the command line in R scripts using external packages based on libraries such as Python's argparse (see here), but it's not something that most R users ever do (they usually call their scripts from within R). So interacting with the CWSLab workflow tool would be possible; it would just require a bit of a shift in their typical workflow.
(To be fair, most Python users probably use things like the IPython notebook these days rather than command line-native scripts, so it would be a similar change in workflow/thinking)
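For illustration, the command-line pattern being discussed looks roughly like this in Python with argparse. This is a minimal sketch, not an actual workflow tool module: the argument names and the `tos` default are placeholders chosen for the example.

```python
import argparse


def build_parser():
    """Describe the command line: two positional files and one option."""
    parser = argparse.ArgumentParser(
        description="Example command-line module (hypothetical)."
    )
    parser.add_argument("infile", help="path to the input file")
    parser.add_argument("outfile", help="path to the output file")
    parser.add_argument("--variable", default="tos",
                        help="name of the variable to process")
    return parser


def main(argv=None):
    """Parse the arguments; argv=None means use the real command line."""
    args = build_parser().parse_args(argv)
    # A real script would read args.infile, do its calculation and
    # write the result to args.outfile at this point.
    print(f"Processing {args.variable}: {args.infile} -> {args.outfile}")
    return args
```

Calling `main()` under the usual `if __name__ == "__main__":` guard turns this into a script that can be run as `python script.py in.nc out.nc --variable tas`.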
Tim Bedin
@captainceramic
A good start for R users could be to encourage them to share their code through the ctools repo. I discussed this a fair bit with Jaci and Claire a year or so ago - we talked about the workflow of having the two sides of the lab (the interactive GUI desktop and the VisTrails plugin) complement each other, with people using VisTrails for batch-type workflows that were impractical or impossible to do through interactive interpreters.
Tim Bedin
@captainceramic
@DamienIrving - are you available to have a quick chat on the phone on Monday? I would like to discuss merging in dev into master and doing a tag (perhaps 0.9.0 or something like that). There are also a few issues that I think are ready to close, but I'd like to run them past you first.
Damien Irving
@DamienIrving
@captainceramic Yep. My calendar is completely free on Monday, so you can name your time.
Tim Bedin
@captainceramic
Awesome. 10:30? I'll send you a request (talk about overkill)!
Tim Bedin
@captainceramic
Sorry @DamienIrving - stepped away from my desk. I'm back now.
Tim Bedin
@captainceramic
@DamienIrving - I've been working on the ensemble bug and pushed some code to speed it up a bit. I think the problem is mostly that creating full ensembles for multiple experiments involves so many files that sometimes it just takes too long and stops responding. I also seem to have hit an xml_to_nc bug, something about converting floats to doubles. Will keep looking into it.
Damien Irving
@DamienIrving
@captainceramic Cool. As well as removing unused file name patterns I'm thinking of updating some of the older modules (Climatology, Seasonal timeseries, Timeslice change, Plot Timeseries, Nino 3.4) so they're consistent with how the newer ones act/feel. In particular:
  • All can use the default file pattern except Timeslice change
  • Timeslice change just calls echo. Should we remove it from init.py until functional?
  • Climatology and Seasonal timeseries take a start and end year. I was thinking of changing that to a start and end date (YYYY-MM-DD) for consistency with other scripts. These dates should also be optional arguments since people could use Crop beforehand to extract the time period of interest. (This might mean the user needs to specify what the setdate should be rather than the script doing arithmetic to figure it out)
Hopefully this will avoid confusion and promote consistency amongst new contributors.
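The optional start and end dates proposed above could be handled along these lines. This is only a sketch under assumptions: the `--start-date` and `--end-date` option names and the `parse_period` helper are made up for the example and are not taken from the existing modules.

```python
import argparse
from datetime import datetime


def iso_date(text):
    """Convert a YYYY-MM-DD string to a date, or raise an argparse error."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected YYYY-MM-DD, got {text!r}")


def parse_period(argv):
    """Return (start, end); either is None when the option is omitted."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--start-date", type=iso_date, default=None)
    parser.add_argument("--end-date", type=iso_date, default=None)
    args = parser.parse_args(argv)
    # Only compare the two dates when both were actually supplied.
    if args.start_date and args.end_date and args.start_date > args.end_date:
        parser.error("--start-date must not be after --end-date")
    return args.start_date, args.end_date
```

Leaving both options out returns `(None, None)`, which a script could treat as "use the whole file", for example when Crop has already been applied upstream.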
Tim Bedin
@captainceramic
That sounds really good. Certainly get rid of the dummy Timeslice change. YYYY-MM-DD is good too.
Damien Irving
@DamienIrving
The bash script underlying Seasonal timeseries is terrifying
Tim Bedin
@captainceramic
I know! I would consider removing it altogether, it is way too long and complicated to be written in shell. I actually wrote a replacement in Python when I was at CSIRO: could be worth shooting Tim E a line and seeing if he can grab it from the CSIRO svn.
Damien Irving
@DamienIrving
Good idea. I think I will remove it for now and if I get time I'll have a search through the CSIRO svn (Tim sent me a copy of that repo)
Tim Bedin
@captainceramic
Great. It should be there - seas_vars.py I think it's called.
Damien Irving
@DamienIrving
@captainceramic I'm going to clean up the workflows repo this morning and include a few good examples to show in Hobart
Damien Irving
@DamienIrving
@captainceramic I've created a new workflow called new_nino34.vt which calculates the Nino 3.4 in the way it really should be calculated (i.e. by combining existing modules as opposed to doing it all in one long, confusing and inflexible script). When I run the individual components of that workflow at the command line they are all very quick, however when I try and execute the workflow as a whole from vistrails it stalls on the remapping step. Do you have any idea why this would be?
Background: I'm using the remapping because my Crop module doesn't recognise the funky ocean grids which have two-dimensional lat and lon variables. If I remap to a 1by1 grid first then everything is fine (this is what Jaci's group do - the first step of any of their workflows is that remapping).
Of course I could use cdo sellonlatbox which is able to handle funky grids, but I'd prefer not to write a second cropping module if I can avoid it.
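The two cdo options being weighed up here can be sketched as argument lists built in Python. The helper names are hypothetical. `r360x180` is cdo's name for a global 1x1 degree grid, and the choice of `remapcon2` is an assumption borrowed from the command quoted later in this conversation; the existing remapping module may use a different operator.

```python
def remap_command(infile, outfile, grid="r360x180"):
    """Argument list for `cdo remapcon2,<grid> infile outfile`."""
    return ["cdo", f"remapcon2,{grid}", infile, outfile]


def sellonlatbox_command(infile, outfile, west, east, south, north):
    """Argument list for `cdo sellonlatbox,lon1,lon2,lat1,lat2 infile outfile`."""
    return ["cdo", f"sellonlatbox,{west},{east},{south},{north}", infile, outfile]
```

For the Nino 3.4 region (5S-5N, 170W-120W) the second helper would be called as `sellonlatbox_command("in.nc", "nino34.nc", 190, 240, -5, 5)`, and either list could then be passed to `subprocess.run(cmd, check=True)` on a machine with cdo installed.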
Damien Irving
@DamienIrving

@captainceramic In future I should also keep workflows in my fork of cwsl-workflows until I'm sure they work. Sorry about that. In summary, here's where I'm at with workflows:

I know you've got a million other things to be doing, but would you mind having a quick look at the two failing workflows? Once those workflows are working I'll edit the README of the cwsl-workflows repo to briefly explain what each of the examples do.

Tim Bedin
@captainceramic
Sure - I'll have a look.
Tim Bedin
@captainceramic
@DamienIrving - I've made some progress with the new_nino34.vt workflow. As I suspected, the problem is actually with regridding the inmcm4 model - it uses a curvilinear grid, and when I try and run a cdo remapcon2,r360x180 it really struggles - seems to stop at about 50%. I switched the model over to ACCESS1-3 and it runs fine. Then I added a second model - MIROC5 - and found that the missing values haven't been set properly!
Annoying. I have tried to get model metadata fixed up before and had little success.
Damien Irving
@DamienIrving
Ah, when I tried the workflow step by step at the command line I tested ACCESS1-3 and not inmcm4, hence I didn't catch that. For the purposes of the workshop on Friday I'll just use models like ACCESS1-3 that work properly, but perhaps one of the biggest things to flag on the development workflow is some sort of process for highlighting data problems?
It will be a miracle if we can inspire community code development, community help desk via gitter/github and community logging of data problems (and solutions), but we can try!
Would you be happy with new_nino34.vt as a replacement for the existing Nino workflow? I just think we want to encourage a modular approach to using vistrails (and programming in general) as opposed to the monolithic all-in-one calc_nino34.sh script that we use at the moment.
Tim Bedin
@captainceramic
I am a bit unsure. I certainly agree that a more modular approach helps with sustainability, but in certain cases a monolithic approach is required. For example, that existing nino34 script is just a black box. However, it also uses a rolling 30-year climatology to calculate the anomalies - useful to de-trend in this case, but not necessarily relevant for every use.