These are chat archives for dropbox/pyston

19th
Nov 2015
Rudi Chen
@rudi-c
Nov 19 2015 09:48
In callCLFunc@objmodel.cpp, after calling the compiled function, we have the assertion ASSERT(!PyErr_Occurred(), "%p", chosen_cf->code);. I think that should be ASSERT(!PyErr_Occurred() || S == CAPI, "%p", chosen_cf->code);?
There are a couple of tests in NumPy that are supposed to raise exceptions, but the exception gets 'blocked' at that point. I'm not too familiar with the intricacies of which functions are responsible for handling exceptions, though, so I just wanted to make sure.
and more importantly, could you order the key in the same order as the bars in the chart please for color blind people?
Rudi Chen
@rudi-c
Nov 19 2015 10:16
My guess is that we don't run the PGO build because it takes forever. You need to compile Pyston, run it through a lot of code, then recompile with the profiling data.
lesshaste
@lesshaste
Nov 19 2015 10:19
and how about the colors? :)
Rudi Chen
@rudi-c
Nov 19 2015 10:20
Well, that shouldn't be too hard a change :P
lesshaste
@lesshaste
Nov 19 2015 10:23
thanks!
at least if they are in the right order I can guess :)
Kevin Modzelewski
@kmod
Nov 19 2015 11:04
I've never actually delved into the source code for the speed center
not sure how they decide what colors to use
@rudi-c hmm that's a bit odd that a CAPI exception occurred, since it's on the r != NULL path
I don't think that's a case that we are supposed to support -- cpython doesn't handle it very well
(they propagate the exception at some random point later when they notice it)
it might be that we had a CAPI exception going into that function
like, was PyErr_Occurred() true before the call to callChosenCF?
but anyway, that assert shouldn't technically be there, since CPython will allow those kinds of CAPI bugs to pass through
but for us it's mostly gotten tripped since we had a bug in our own code
which is why we left it in
Marius Wachtler
@undingen
Nov 19 2015 11:08
I suspect the same thing, because this is also something I encountered while getting lxml running (the assert triggered because the error was already set before we called the function)
Marius Wachtler
@undingen
Nov 19 2015 11:18
@kmod thanks for pointing out that I should take a look at the generated code

looks like llvm is not very smart with our current pass pipeline:

mov    %rax,-0x248(%rbp)
cmpq   $0x0,-0x248(%rbp)
je     0x7f114504a17d <generate_tokens_e3_1+12669>

or

mov    %rax,-0x208(%rbp)
mov    -0x208(%rbp),%rax
Marius Wachtler
@undingen
Nov 19 2015 11:30
I'm not sure what's going on, but the code is not good: and at least in this case it's not a memory aliasing issue.
mov    $0x1,%eax
mov    %rax,-0x1f8(%rbp)
mov    $0x1,%eax
mov    %rax,-0x238(%rbp)
Marius Wachtler
@undingen
Nov 19 2015 15:42

I'm still tracking down the minimal number of passes we have to add to get rid of all this unnecessary code.
But just for the lols, one more: this is how we currently clear a region of memory

xorps  %xmm0,%xmm0
movaps %xmm0,-0xe0(%rbp)
xorps  %xmm0,%xmm0
movaps %xmm0,-0xf0(%rbp)
... same pattern repeats another >10 times

Looks like llvm doesn't trust xmm0 very much and makes sure it really stays at 0 :-D

Marius Wachtler
@undingen
Nov 19 2015 19:19

So it's still unclear to me why llvm sometimes generates this stupid pattern and sometimes not:

good version:
xorps  %xmm0,%xmm0
movaps %xmm0,-0xd0(%rbp)
movaps %xmm0,-0xe0(%rbp)
movaps %xmm0,-0xf0(%rbp)
...

switching to the greedy reg alloc removes most of the stupid code and gives a significant perf improvement. (But makes my code crash for some benchmarks...)

Marius Wachtler
@undingen
Nov 19 2015 20:19

ok another run and I'm seeing again a pattern of

mov    $0x0,%ecx
mov    %rcx,-0x268(%rbp)
mov    $0x0,%ecx
mov    %rcx,-0x260(%rbp)
...repeated about 10 times...

something is really odd... but enough for today...

(and yes I made sure it's really generated by llvm and not our rewriter)
Kevin Modzelewski
@kmod
Nov 19 2015 22:03
oh man, that's pretty funny :)
Kevin Modzelewski
@kmod
Nov 19 2015 22:12
also, the recording of our talk is up!
Thomas Mangin
@thomas-mangin
Nov 19 2015 22:27
:+1:
Marius Wachtler
@undingen
Nov 19 2015 22:49
cool :+1:
Kevin Modzelewski
@kmod
Nov 19 2015 23:23
some interesting initial refcounting perf numbers:
I wrote a simple C API benchmark that just allocates 1B ints
and it was about 75% slower with refcounting :(
and all the time was being spent in jemalloc
(using normal malloc was even worse)
and then I ported CPython's integer freelist, and it is 3x faster than the GC version :)
I might try porting the allocator from our current GC into the refcounting version -- I'm surprised that it can do so much better than jemalloc
Thomas Mangin
@thomas-mangin
Nov 19 2015 23:27
I had a look at Julia some time ago and noticed that the quality of the LLVM bytecode was very much affected by the type declarations. Ultimately, do you expect to be able to infer the types of the data well enough to reach the best bytecode possible, or will it likely be good but not optimal?
Marius Wachtler
@undingen
Nov 19 2015 23:37
@kmod oh good, sounds promising. At first I was a little bit disappointed to hear about the large slowdown, but the freelist twist was nice :-)
yeah, I noticed once too that the untracked allocations are faster than going through malloc, which is surprising (but maybe it just helps that we don't have to support "real" multithreading so much)