These are chat archives for ipython/ipython

24th Mar 2016
Metehan Çelenk
@themlistone
Mar 24 2016 10:38
Has anyone worked with PyAudio?
Bryan Van de Ven
@bryevdv
Mar 24 2016 15:26
PyAudio is quite finicky in my experience
There is a bokeh example that uses it here: https://github.com/bokeh/bokeh/tree/master/examples/embed/spectrogram
(does not run on Windows, AFAIK)
Jason Grout
@jasongrout
Mar 24 2016 15:29
Dave, can you open the issue in ipykernel, and I'll try to dump what I was saying there in a comment?
@dwillmer ^
Dave Willmer
@dwillmer
Mar 24 2016 15:33
@jasongrout typing up now
Jason Grout
@jasongrout
Mar 24 2016 15:34
ah, I didn't realize you followed this channel, so I pinged you on the jupyter/jupyter one too. Thanks!
Dave Willmer
@dwillmer
Mar 24 2016 15:37
@jasongrout @minrk - what's the recommended way to run the tests on ipykernel?
Dave Willmer
@dwillmer
Mar 24 2016 15:44
nosetests --with-coverage ipykernel ?
Min RK
@minrk
Mar 24 2016 15:47
Yup
Dave Willmer
@dwillmer
Mar 24 2016 15:53
thanks!
Thomas
@tcwalther
Mar 24 2016 18:41
I've set up IPython on a new server, and I'm seeing strange behaviour. Every kernel starts 8 threads; on a different server, it doesn't start any extra threads at all (other than the main thread). When I execute something, the work runs across all 8 threads, which causes a lot of context switches but isn't any faster.
I've tried regular Python, and Anaconda's Python. Using IPython 4.1.2.
Min RK
@minrk
Mar 24 2016 18:44
The same code runs 8 times?
So when you type print("hello") you get 8 copies out?
Thomas
@tcwalther
Mar 24 2016 18:44
I only get one copy
The code works fine, it just uses a lot of CPU to get the same work done
Min RK
@minrk
Mar 24 2016 18:45
Then what tells you that it's executing the code 8 times?
Thomas
@tcwalther
Mar 24 2016 18:46
I don't think it's executing it 8 times. It looks like some really weird form of parallelization. If I run 48 engines to use all 48 cores, the code runs 20x slower than on the other machine. That's the first indicator. Most of the CPU usage is system time, because I have LOADS of context switches.
It's just very weird to begin with that I open an empty notebook, which starts a new kernel, and I get 8 threads:
Screen Shot 2016-03-24 at 18.47.24.png
Green lines are threads. You can see the kernel in the middle, with its 8 threads.
Min RK
@minrk
Mar 24 2016 18:49
You are using IPython parallel?
Thomas
@tcwalther
Mar 24 2016 18:50
Yes. That's where I get the 20x slowdown. However, it's already multithreaded even when not using ipyparallel:
Min RK
@minrk
Mar 24 2016 18:51
When using the IPython kernel, it is a multi-threaded app. It's not parallelism, it's just doing some different background tasks in threads.
Thomas
@tcwalther
Mar 24 2016 18:51
Screen Shot 2016-03-24 at 18.50.50.png
Well, maybe, but that CPU graph is crazy. On my other server, you get 100% user load on one core.
Min RK
@minrk
Mar 24 2016 18:52
Can you compare library versions?
Thomas
@tcwalther
Mar 24 2016 18:53
ipython 4.1.2 and jupyter 1.0.0 on both servers
Min RK
@minrk
Mar 24 2016 18:54
And ipykernel and ipyparallel?
And pyzmq?
Thomas
@tcwalther
Mar 24 2016 18:57
same on both - ipykernel: 4.3.1, ipyparallel: 5.0.1, pyzmq: 15.2.0
Min RK
@minrk
Mar 24 2016 18:58
And what code are you running? Do you see the same behavior with trivial echo/sleep tasks?
Oh, wait. Did you say one was using anaconda and the other wasn't?
Thomas
@tcwalther
Mar 24 2016 18:59
Yes. I also tried a non-Anaconda distribution on the new server, though. I probably should try reproducing the error on the command line; I haven't done that, and I feel a bit ashamed for not having done so. Let me see if it happens with regular Python too, not just IPython.
Min RK
@minrk
Mar 24 2016 19:00
Is your code using numpy?
If so, I bet it's MKL.
Thomas
@tcwalther
Mar 24 2016 19:01
Well, I guess I have to apologize here, because it also happens in plain Python. I definitely should have checked that first. I'll try to narrow it down more.
Min RK
@minrk
Mar 24 2016 19:01
Anaconda started shipping numpy linked against MKL pretty recently. The key being that MKL is a multi-threaded linear algebra library.
Thomas
@tcwalther
Mar 24 2016 19:02
yeah, I tried the nomkl versions already. They're actually faster.
(which are linked against openblas)
Min RK
@minrk
Mar 24 2016 19:02
Which makes sense, because you are actually using a million threads.
If you use mkl and N engines = N cores, you want to set MKL_NUM_THREADS=1
or import mkl; mkl.set_num_threads(1) from Python.
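A minimal sketch of that workaround (assuming numpy is installed; the mkl module only exists when Anaconda's mkl-service package is present, so the environment-variable route is shown here):

```python
import os

# Cap BLAS threading *before* numpy is imported -- the libraries read
# these variables once, at load time.
os.environ["MKL_NUM_THREADS"] = "1"       # Intel MKL
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # OpenBLAS
os.environ["OMP_NUM_THREADS"] = "1"       # generic OpenMP fallback

import numpy as np

a = np.random.rand(500, 500)
b = np.dot(a, a)  # now runs on a single thread per process
print(b.shape)
```

With mkl-service installed, import mkl; mkl.set_num_threads(1) does the same thing at runtime instead.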
Thomas
@tcwalther
Mar 24 2016 19:06
@minrk, thanks a lot for your help. I know now that it's not IPython (and I probably should've figured that out before coming here), so thanks a ton for helping me anyway!
Min RK
@minrk
Mar 24 2016 19:07
No problem. This came up long ago (it took me a lot longer to figure it out the first time), but with Anaconda shipping MKL by default, it's about to get a lot more common.
Thomas
@tcwalther
Mar 24 2016 19:08
I'm not using MKL right now, though.
I actually think it's something entirely different. The new server has a GPU. Some part of the benchmark uses Tensorflow, but only to apply a model, not to train it. My current benchmark has that "apply model" code in it. Looks like Tensorflow parallelizes the hell out of it, which I didn't expect. Running np.linalg.bench('full'), for example, nicely only occupies a single core.
I just didn't think of Tensorflow at first, because I still find it weird that on one machine each IPython kernel has 8 threads right after starting, without running anything in the kernel, while on the other machine it doesn't have any extra threads.
but maybe my htop behaves differently on the two machines (although I made sure the settings were equal - outdated package maybe?)
Min RK
@minrk
Mar 24 2016 19:11
possible, or maybe tensorflow is set up differently? I'm not that familiar with it.
Thomas
@tcwalther
Mar 24 2016 19:11
Yes, it is set up to use the GPU. Again, I didn't expect it to make a difference outside of training, but it seems like it does. I'm not certain about this yet, but thank you again for your help. I know now that it's not IPython, which is a huge help.
Min RK
@minrk
Mar 24 2016 19:12
Always good to eliminate candidates!
Thomas
@tcwalther
Mar 24 2016 20:56
Here is some more information: I am using OpenBLAS, but it turns out that numpy.dot is the culprit. So it is similar to the MKL problem, just with OpenBLAS. The solution was to disable automatic parallelisation. I put

export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1

into my .bashrc now.

That solved it!
Min RK
@minrk
Mar 24 2016 21:10
@tcwalther great! Depending on how much time you spend in dot, you might try different values, and see if you get better results with n_cpus/num_threads engines for values other than num_threads=1.
Thomas
@tcwalther
Mar 24 2016 21:24
@minrk: I doubt it. I use 48 processes to use the 48 CPUs; more threads just mean more context switches.
(and I love ipyparallel's balanced view for that. It's just amazing)
Min RK
@minrk
Mar 24 2016 21:28
I meant trying num_threads=4 with 12 engines.
I don't know what fraction of the time you are spending in BLAS, though, so maybe it doesn't make sense.
Thomas
@tcwalther
Mar 24 2016 21:34
@minrk good point, that makes sense. I don't spend much time in BLAS I think, but this is something I will definitely evaluate.
Min RK
@minrk
Mar 24 2016 21:36
And with 48 being so nicely divisible, you can test 2/24, 3/16, 4/12 and get a nice view of what effect it might have.
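The splits above can be enumerated mechanically. A quick sketch (the helper name is made up) listing every way to divide 48 cores between BLAS threads and engines:

```python
def thread_engine_splits(cores):
    """All (threads_per_engine, engines) pairs whose product is `cores`."""
    return [(t, cores // t) for t in range(1, cores + 1) if cores % t == 0]

for threads, engines in thread_engine_splits(48):
    print(f"MKL_NUM_THREADS={threads} with {engines} engines")
```

Benchmarking each pair on the real workload then shows how much of the time is actually spent inside BLAS.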
Thomas
@tcwalther
Mar 24 2016 21:36
My main interest right now is seeing whether I can throw computations on the GPU to speed them up.
Min RK
@minrk
Mar 24 2016 21:36
That probably makes a bigger difference.
Thomas
@tcwalther
Mar 24 2016 21:37
Depends. If I spend most of the time copying from CPU memory to GPU memory and back, it may even be slower
Min RK
@minrk
Mar 24 2016 21:44
Bigger difference, in either direction :)