These are chat archives for ipython/ipython

13th May 2015
Matthew Rocklin
@mrocklin
May 13 2015 14:51
Thoughts on deploying an IPython parallel cluster via yarn or mesos?
Brian E. Granger
@ellisonbg
May 13 2015 15:57
@mrocklin Hi! I would talk to @rgbkrk about that - he is closest to that type of thing and might be very interested.
Does this mean you are getting dask to work with ipyparallel? ;-)
Had a look at dask yesterday - looking very nice - congrats
Kyle Kelley
@rgbkrk
May 13 2015 15:58
At the very least, someone has come to me about using Mesos underneath JupyterHub or tmpnb for an internal deployment
but I haven't had a chance to even play with Mesos
We've done light amounts of work with Docker swarm, but no other cluster/scheduling setups
Matthew Rocklin
@mrocklin
May 13 2015 17:37

@ellisonbg dask.distributed is a work in progress. Decent prototype code though http://dask.readthedocs.org/en/latest/distributed.html .

Deployment is an open question. People seem to want to interact with existing HDFS systems, which sort of implies a need to interact with Yarn. I'd like to dodge this bullet. @cowlicks is looking into bootstrapping dask.distributed deployment given an ipyparallel client/view. Was hoping to get you all to do the hard work :)
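For context, that bootstrapping idea could look roughly like the sketch below: use an already-connected ipyparallel client/view to start a worker process on every engine. The import name, the dask-worker command, and the scheduler address are all placeholders/assumptions here, since dask.distributed's deployment story was still a prototype at the time.

```python
# Rough sketch only: bootstrap distributed workers from an ipyparallel view.
from ipyparallel import Client   # assumes the split-out ipyparallel package name

rc = Client()    # connect to a running ipyparallel cluster
view = rc[:]     # direct view over all engines

def start_worker(scheduler_addr):
    import subprocess
    # "dask-worker" is a hypothetical placeholder for the real worker entry point
    proc = subprocess.Popen(["dask-worker", scheduler_addr])
    return proc.pid

pids = view.apply_sync(start_worker, "tcp://scheduler-host:8786")
```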

@rgbkrk is JupyterHub the current motivating force behind IPython (Jupyter?) Parallel development?
Scott Sanderson
@ssanderson
May 13 2015 17:42
@mrocklin JupyterHub is a multi-user server that manages individual instances of the core Jupyter Notebook application (aka, the application formerly known as IPython Notebook)
basically it's a proxy that sits in front of some (possibly dynamic) number of single-user notebook servers, and provides extension points for configuring things like where/how new notebooks are actually spawned or how users authenticate
but it's its own project, separate from the single-user server
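For a sense of those extension points, a minimal jupyterhub_config.py might look like the sketch below; the specific spawner and authenticator classes are purely illustrative, not what any deployment discussed here uses.

```python
# jupyterhub_config.py (illustrative sketch only)
# The spawner controls where/how single-user notebook servers are started;
# the authenticator controls how users log in.
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.JupyterHub.authenticator_class = 'oauthenticator.GitHubOAuthenticator'
```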
Matthew Rocklin
@mrocklin
May 13 2015 17:45
Hi @ssanderson, thanks for the explanation. I guess I'm trying to understand what would motivate @rgbkrk to spend some cycles on mesos/yarn. It sounds like JupyterHub is a stronger motivator for you now than ipyparallel, but that it might share "grok yarn/mesos" as a subtask
Is that what I should take away?
Kyle Kelley
@rgbkrk
May 13 2015 17:47
The grokking is definitely a subtask
;)
Matthew Rocklin
@mrocklin
May 13 2015 17:47
Does "internal deployment" imply "paying customer"?
Kyle Kelley
@rgbkrk
May 13 2015 17:50
Nope. External: a research group that approached me.
Not connected to Rackspace whatsoever
Matthew Rocklin
@mrocklin
May 13 2015 17:53
Hrm, I'm on a research project (DARPA, mixed academic/industry, fully open) that would happily collaborate on deploying ipyparallel on a Yarn-managed cluster. Not sure if that's enticing enough, though.
Min RK
@minrk
May 13 2015 17:54
Launchers for ipyparallel can come from anywhere. They don't need to be part of IPython.
I haven't a clue how Yarn works, but presumably a custom YarnEngineSetLauncher wouldn't be much work.
I don't have access to any systems with Yarn, though.
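To make that concrete, a custom engine-set launcher might look roughly like the sketch below, assuming the launcher interface of the time (start/stop plus notify_start/notify_stop); the yarn_submit and yarn_kill helpers are hypothetical stand-ins for whatever would actually talk to YARN.

```python
# Sketch only: an engine-set launcher that starts n ipengines in YARN containers.
from ipyparallel.apps.launcher import BaseLauncher

class YarnEngineSetLauncher(BaseLauncher):
    def start(self, n):
        # yarn_submit is hypothetical: submit an application that runs
        # `ipengine` in n containers and return its application id.
        self.app_id = yarn_submit(
            command="ipengine --profile-dir=%s" % self.profile_dir,
            num_containers=n,
        )
        self.notify_start(self.app_id)
        return self.app_id

    def stop(self):
        # yarn_kill is hypothetical: tear the YARN application down.
        yarn_kill(self.app_id)
        self.notify_stop(self.app_id)
```

Wiring it in would then be a one-line ipcluster config change, something like c.IPClusterEngines.engine_launcher_class = 'yarn_launcher.YarnEngineSetLauncher' (module name hypothetical).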
Matthew Rocklin
@mrocklin
May 13 2015 17:55
Yeah, I don't need it to be part of IPython either. I'm mostly trying to entice you all to do the work so that I don't have to. If I were to do this, it would probably end up being dask-centric. There might be some benefit to having this live under the IPython abstractions instead.
Kyle Kelley
@rgbkrk
May 13 2015 17:55
I'm in the same boat, though it's basically at push-button level for me at Rackspace right now.
Matthew Rocklin
@mrocklin
May 13 2015 17:56
If I got you access to a yarn cluster would that entice you to spend cycles? (No is a perfectly reasonable and somewhat expected answer)
Min RK
@minrk
May 13 2015 17:56
Sure. Shouldn't be much work, assuming starting a process on Yarn isn't insane.
Matthew Rocklin
@mrocklin
May 13 2015 17:58
Willing to jump through a couple of government hoops? Fill out a non-disclosure form, log on to VPN, etc.?
Scott Sanderson
@ssanderson
May 13 2015 17:59
@mrocklin this is entirely unrelated to the discussion here, but where's the right place to go to ask questions about/discuss dask? Is most of that happening under the blaze umbrella?
Matthew Rocklin
@mrocklin
May 13 2015 18:00
Officially dask is a blaze subproject, so blaze-dev@continuum.io works well, as do issues at https://github.com/ContinuumIO/dask
Scott Sanderson
@ssanderson
May 13 2015 18:00
I ask because I'm currently redesigning most of Zipline's data APIs to be expressed as directed acyclic computation graphs whose atomic terms are numpy arrays
Matthew Rocklin
@mrocklin
May 13 2015 18:00
Last time I was here I was also inspired to open up a gitter channel
So we could try that out
Sounds like we would enjoy a chat, should probably continue the conversation in another venue though so as not to hijack this room
Scott Sanderson
@ssanderson
May 13 2015 18:01
yep
Min RK
@minrk
May 13 2015 18:32
@mrocklin after looking at the docs for a bit, it's not remotely clear how to launch a simple process with yarn.
Matthew Rocklin
@mrocklin
May 13 2015 18:32
ha!
Min RK
@minrk
May 13 2015 18:32
All I need to be able to do is command --cli-args
Is the Hadoop API what I want to use, instead of YARN?
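For reference, the closest thing at the time to running a plain command --cli-args on YARN was the DistributedShell example application bundled with Hadoop. A hedged sketch, with the jar path as a placeholder and the flags recalled from memory:

```python
# Sketch only: launch a shell command in YARN containers via the
# DistributedShell example app shipped with Hadoop.
import subprocess

dshell_jar = "/usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar"  # placeholder path

subprocess.check_call([
    "yarn", "jar", dshell_jar,
    "org.apache.hadoop.yarn.applications.distributedshell.Client",
    "-jar", dshell_jar,
    "-shell_command", "ipengine",   # the command to run in each container
    "-num_containers", "4",
])
```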
Matthew Rocklin
@mrocklin
May 13 2015 18:35
I don't know
One sec, let me see if someone more knowledgeable is around
Alas
Matthew Rocklin
@mrocklin
May 13 2015 18:41
There are a couple people at Continuum who know more about this than I do. The people who run the cluster that I work on are also pretty easy to work with.
I'll send an e-mail out to you and the cluster people asking them to give you access. That might be a good place to follow up with questions.
I'll also ask the Continuum people to drop in when they have a moment
Min RK
@minrk
May 13 2015 18:42
ok