Ben Cipollini
@bcipolli
For the time being, I'd like to manually add a dataset or two into CodeNeuro, to try out the platform. But I don't know where to start. Any idea how difficult it would be to write a doc that gave some basic pointers into the code?
I could help flesh it out, if I had good starting points. If it's not really feasible at the moment, that's fine too.
Jeremy Freeman
@freeman-lab
awesome, so the main thing is that all the data are stored in a bucket on S3 with public read; if we added public write, it could potentially get swamped
i think the only sane approach is to have some kind of credential / whitelist / validation system
Ben Cipollini
@bcipolli
ok. I know of some datasets that are already on Amazon S3. So, maybe just work with those, and don't worry about adding credentials yet.
I'm looking for an MVP to try out the system and see what it can do, and show people how they can work on their own data as easily and quickly as possible.
Jeremy Freeman
@freeman-lab
ah gotcha, so both the datasets and the notebooks began as a way to share some curated data and associated notebooks for more generic use cases, e.g. tutorials, challenges
it sounds like you're describing using the same components for individuals to analyze their own data
Ben Cipollini
@bcipolli
well, the datasets I have in mind are public, curated datasets. Say, from openfMRI.org, or the HCP, or other public datasets.
maybe the ABIDE dataset, or any of the preprocessed data Cameron Craddock has... most of that is on Amazon AWS
er S3, sorry
My idea was to show how to manipulate those types of public datasets with examples/tutorials, using your system.
Jeremy Freeman
@freeman-lab
ah awesome, now i understand =) for data already on S3, we could definitely make it easy to add references / links / notebooks to the existing sites
Ben Cipollini
@bcipolli
And see how to use some nice open source packages, such as the nilearn python library, through your system. That was my hope, at least.
Jeremy Freeman
@freeman-lab
that's a great idea
yeah, totally with you -- at first i thought you meant letting anyone upload data to our S3 bucket, which will be a little more complicated =)
Ben Cipollini
@bcipolli
haha no thanks on that one :)
If someone wants to use their data, they could get their own data on Amazon S3 themselves :)
Jeremy Freeman
@freeman-lab
yup, exactly
Ben Cipollini
@bcipolli
But yeah... IF we can get data on S3, how would we access it through your system, and how would we create new notebooks there? And next step: how could we publicize that dataset as available in your list of datasets? That was my idea.
b/c right now, the list of datasets is pretty sparse.
Jeremy Freeman
@freeman-lab
totally
Ben Cipollini
@bcipolli
So, if we can work on creating a public doc on how to do those things, I would certainly use it to try things out, and encourage others to do similarly.
Without docs, I'd be a bit stuck to get started.
Jeremy Freeman
@freeman-lab
oh for sure, i think we want to modify the system just a little bit first
right now the dataset listing is generated straight from S3
but we can switch to use a simple mongo db
Ben Cipollini
@bcipolli
nice
Jeremy Freeman
@freeman-lab
and then define an API that lets people add datasets to the db
Ben Cipollini
@bcipolli
:+1:
would it be possible to access other S3 buckets from notebooks as they are right now?
Jeremy Freeman
@freeman-lab
and they'd include in the post the S3 location, associated notebooks, and metadata
yup, all the notebooks run on EC2 so access to S3 is easy and free (at least within the US)
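Since public-read S3 objects are reachable over plain HTTPS, a notebook on EC2 can fetch one with no AWS credentials at all. A minimal stdlib-only sketch of that (the bucket and key names are hypothetical, just for illustration):

```python
from urllib.parse import quote
from urllib.request import urlopen

def public_s3_url(bucket, key):
    """Build the anonymous HTTPS URL for a public-read S3 object."""
    return "https://%s.s3.amazonaws.com/%s" % (bucket, quote(key))

def read_public_object(bucket, key):
    """Fetch a public S3 object over HTTPS, no credentials needed."""
    with urlopen(public_s3_url(bucket, key)) as resp:
        return resp.read()

# e.g. read_public_object("some-public-bucket", "abide/subject01.nii.gz")
```

For private buckets or heavier use, a real client like boto would be the natural choice instead, but for public curated datasets the anonymous URL is enough.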
Ben Cipollini
@bcipolli
ok. then in the meantime, I'll try out the notebook system and see if I can get something running there.
I'll probably have time Wednesday or so to try it out.
Jeremy Freeman
@freeman-lab
ok cool, i’ll try to work on the db / API stuff
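As a rough sketch of what a record in that dataset db, and the validation an add-dataset API might do, could look like (field names here are purely illustrative, not an actual CodeNeuro API):

```python
# Hypothetical dataset record for the proposed registry: an S3
# location, associated notebooks, and free-form metadata.
REQUIRED_FIELDS = {"name", "s3_location", "notebooks", "metadata"}

def validate_dataset(record):
    """Return the record if it has the required fields, else raise."""
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        raise ValueError("missing fields: %s" % ", ".join(sorted(missing)))
    if not record["s3_location"].startswith("s3://"):
        raise ValueError("s3_location must be an s3:// URI")
    return record

example = validate_dataset({
    "name": "ABIDE preprocessed",
    "s3_location": "s3://some-public-bucket/abide/",
    "notebooks": ["intro.ipynb"],
    "metadata": {"modality": "fMRI"},
})
```

A validated record like this could then be inserted into the Mongo collection as-is, and the dataset listing page would query that collection instead of listing the S3 bucket directly.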
Ben Cipollini
@bcipolli
Shall I add a github issue for what we discussed, re: using MongoDB to allow users to add curated datasets to the list?
Jeremy Freeman
@freeman-lab
depending on our progress, could also be a great little project for CodeNeuro SF (november 20th)
Ben Cipollini
@bcipolli
That would help me keep track of progress, and contribute if needed.
Jeremy Freeman
@freeman-lab
that’d be great!
totally
would love your help
Ben Cipollini
@bcipolli
sounds great.
Jeremy Freeman
@freeman-lab
can start laying things out in the issue
Ben Cipollini
@bcipolli
nov 20... will see if I can make it!
I'm at UC San Diego... not too far. Is it open reg for that event?
*open registration
Jeremy Freeman
@freeman-lab
great! not open yet, but will be posting soon
Ben Cipollini
@bcipolli
cool. Will work on that issue now; catch up with y'all soon!
Wiktor Dolecki
@stmi
Hello, I came here from the edX course on machine learning and wanted to study the zebrafish dataset further on my own. Is it possible to obtain it from some publicly available hosting? I tried accessing http://datasets.codeneuro.org/, but it seems to be down; I can't reach it.