These are chat archives for dropbox/pyston

15th
Jul 2015
Chris Toshok
@toshok
Jul 15 2015 00:46
just tried running cpython in gdb/lldb on my laptop
totally useless
Kevin Modzelewski
@kmod
Jul 15 2015 00:54
there's python-dbg
as in apt-get install python-dbg; python-dbg t.py
which has full debug symbols
or make pydbg_fib
Chris Toshok
@toshok
Jul 15 2015 00:54
oh building from source works on linux
Kevin Modzelewski
@kmod
Jul 15 2015 00:55
oh python2.7-dbg
Chris Toshok
@toshok
Jul 15 2015 00:55
i had a cpython checkout on osx too so figured i’d do some stuff there. bad news :)
$ ./mercurial-3.4.1-install/bin/hg clone https://selenic.com/hg mercurial-repo
requesting all changes
adding changesets
adding manifests
adding file changes
added 25787 changesets with 48458 changes to 2312 files
updating to bookmark @
1092 files updated, 0 files merged, 0 files removed, 0 files unresolved
$
that’s with PYTHONPATH set to use their pure python implementation of some code, but still
Chris Toshok
@toshok
Jul 15 2015 01:39
8x slower than cpython on that clone. most time spent in gc (GCVisitor::visit = 30%)
Kevin Modzelewski
@kmod
Jul 15 2015 07:20
hmm so I'm thinking the reason that the ubuntu cpython is so much faster than a fresh-built one, is that they have both pgo and lto set up
I tried turning on just lto and it seemed to make my cpython build about 5% faster
Chris Toshok
@toshok
Jul 15 2015 07:21
yeah weren’t they saying they ran their test suite for pgo?
Kevin Modzelewski
@kmod
Jul 15 2015 07:21
yeah, they do
if anyone feels like incorporating lto + pgo into our build system it might end up being really helpful :)
I keep running into cases where I'm pretty sure we're doing the exact same thing as cpython
but they just do it faster
and their profiles end up looking a bit different
(despite it being the same source code)
Chris Toshok
@toshok
Jul 15 2015 07:27
lto is just a flag, right?
pgo would be nice, though. presumably that would let us get rid of the hand tuned section_ordering file?
Kevin Modzelewski
@kmod
Jul 15 2015 08:48
oh man, pgo+lto makes cpython 10% faster (30% on some microbenchmarks)
Marius Wachtler
@undingen
Jul 15 2015 08:51
The cpython build we make our comparisons with has pgo+lto enabled?
eg the weekly stats and speed.pyston.org?
Sun
@Daetalus
Jul 15 2015 09:36
Hello? Need help.
In pow(lhs, rhs, mod), we must check the rhs and mod. For example
```
Sun
@Daetalus
Jul 15 2015 09:54

For example:

has mod
    mod is float    ->  TypeError: pow() 3rd argument not allowed unless all arguments are integers

    mod is int        ->  rhs is float    -> TypeError: pow() 3rd argument not allowed unless all arguments are integers

                                 ->  rhs is int        -> ValueError: pow() 3rd argument cannot be 0
                                                                 -> rhs less than zero -> TypeError: pow() 2nd argument cannot be negative when 3rd argument specified

                                 ->  rhs is long     -> ValueError: pow() 3rd argument cannot be 0
                                                                 -> rhs less than zero -> TypeError: pow() 2nd argument cannot be negative when 3rd argument specified

    mod is long    ->  rhs is float    -> TypeError: pow() 3rd argument not allowed unless all arguments are integers

                                ->  rhs is int       -> ValueError: pow() 3rd argument cannot be 0
                                                              -> rhs less than zero -> TypeError: pow() 2nd argument cannot be negative when 3rd argument specified

                                ->  rhs is long   -> ValueError: pow() 3rd argument cannot be 0
                                                             - > rhs less than zero -> TypeError: pow() 2nd argument cannot be negative when 3rd argument specified

I want to write a check function for intPow and longPow, due to int.cpp already include long.h, I want to put it into long.cpp, is that ok?
Is there a more elegant solution than write a complex if clause?

Oh, sorry, the code format was broken...
Kevin Modzelewski
@kmod
Jul 15 2015 10:06
hmm it looks like cpython just has each of those checks in int_pow/long_pow/float_pow
and doesn't share any error-checking code
oh, I think one thing that keeps it simple
is int only deals with when the arguments are ints
long deals with int+long
and the float deals with all of them
but I think that brings up the issue that that kind of behavior can only be implemented through the capi
since there's no __rpow__ equivalent for the third argument.
Sun
@Daetalus
Jul 15 2015 10:08
For now, if my understanding is correct , Pyston choose different pow according the first argument.
Kevin Modzelewski
@kmod
Jul 15 2015 10:09
and @undingen yeah the cpython numbers have all been with the ubuntu-supplied cpython
I don't get exactly the same numbers with that as my custom-built one, but they are pretty similar
Sun
@Daetalus
Jul 15 2015 10:10
In pow(int, box, mod), call intPow to check the second box and mod. Similar with pow(long, box, mod)
Kevin Modzelewski
@kmod
Jul 15 2015 10:10
well, if pow doesn't exist, or if it returns NotImplemented, then we will go on to try rpow
and intpow returns NotImplemented if any of the arguments aren't int's
(at least, in cpython)
Sun
@Daetalus
Jul 15 2015 10:23
And I think there is no need to handle \_\_rpow\_\_ for the third argument. Just check the mod is whether float or zero and rhs is less than zero when has mod. Is that fine?
Daniel Agar
@dagar
Jul 15 2015 20:41
do you have a tool generating the before/after perf summary?
eg
   django_template.py             3.9s (2)             3.6s (2)  -7.8%
        pyxl_bench.py             3.6s (2)             3.4s (2)  -5.1%
   sqlalchemy_imperative2.py             4.8s (2)             4.4s (2)  -9.1%
              geomean                 4.1s                 3.8s  -7.3%
Chris Toshok
@toshok
Jul 15 2015 20:41
yeah, that’s the pyston-perf repo
python analysis/investigate.py before-git-hash after-git-hash
Daniel Agar
@dagar
Jul 15 2015 20:42
nice
Kevin Modzelewski
@kmod
Jul 15 2015 20:46
it also resolves symbolic refs, so I did python investigate.py pr/635~ pr/635
Chris Toshok
@toshok
Jul 15 2015 20:46
oh! i didn’t realize that
that’s much nicer
Marius Wachtler
@undingen
Jul 15 2015 20:51
nice, one week holiday? :-P
Chris Toshok
@toshok
Jul 15 2015 20:51
hopefully jasone can get us another few percent by fixing the perf regressions in 4.0 :)
Marius Wachtler
@undingen
Jul 15 2015 20:52
:-D
Kevin Modzelewski
@kmod
Jul 15 2015 21:07
@Daetalus yeah that sounds fine, but it is different from how CPython handles it
since the c-api gives you the ability to run the pow function of the modulus
luckily I think this should be fairly easy for us to test comprehensively
something like
all_args = [None, 1, 1L, -1, -1L, 2.0, 0.5]
for lhs in all_args:
  for rhs in all_args:
    for mod in all_args:
      pow(lhs, rhs, mod) # but with some error handling
Chris Toshok
@toshok
Jul 15 2015 21:20
what’s the timeout for tests on travis-ci?
Daniel Agar
@dagar
Jul 15 2015 21:23
10 minutes if there's no console output
I think 60 minutes overall
or did you mean per test?
Chris Toshok
@toshok
Jul 15 2015 21:25
Yeah per test
Daniel Agar
@dagar
Jul 15 2015 21:26
it's the same as running it locally
Marius Wachtler
@undingen
Jul 15 2015 21:26
afaik 25secs and 30secs for the cpython test suite
at least this are the variables I try too adjust to get the debug build passing on travis-ci :-D
Chris Toshok
@toshok
Jul 15 2015 21:29
huh. because multiprocessing_test.py timed out for me (on the debug build on travis-ci), but it doesn’t here.. 9.3/10.0 seconds for the two versions of the test that run
i rebased off master, so maybe jemalloc will get things in under the wire now :)
Daniel Agar
@dagar
Jul 15 2015 21:31
it's also running 4 test threads
Marius Wachtler
@undingen
Jul 15 2015 21:41
@tjhance I changed the bumpUse do deallocate the scratch mem. But now I'm having an issue with setAttr....
It looks like:
  • we allocate some scratch memory array
  • we allocate another scratch memory array
  • we copy the address of the first array with setAttr and some offset into the other
  • bumpUse will think the first array isn't used any more (but it is because the second one points to it)
Travis Hance
@tjhance
Jul 15 2015 21:43
so we store a pointer to the first array in the second array?
hmm
that is tricky
Marius Wachtler
@undingen
Jul 15 2015 21:43
I could think of setting a flag inside the first array: never reuse my scratch mem... or something like this. But is there a better way to fix this?
Travis Hance
@tjhance
Jul 15 2015 21:45
that sounds like the easiest thing to do
Marius Wachtler
@undingen
Jul 15 2015 21:46
I don't want to spend to much time on this because the whole thing is not super important
Travis Hance
@tjhance
Jul 15 2015 21:46
I would go with that plan then
incidentally, I bet we do some extraneous copies
Marius Wachtler
@undingen
Jul 15 2015 21:48
The two allocas come from the rearrangeArguments the first one (because the dest funtion has more than 3 args) and then we make a second allocate inside callCLFunc for the astInterpretHelper call because it has more than 6 args...
Travis Hance
@tjhance
Jul 15 2015 21:49
yeah something to look into
maybe a 0.5% improvement here, or something
Marius Wachtler
@undingen
Jul 15 2015 21:49
Yes we do... at least the bjit does... but I think it's to much of a problem currently because it looks like most function only have small number of args
Travis Hance
@tjhance
Jul 15 2015 21:50
we also copy when we add an argument (e.g. self) to the front
Marius Wachtler
@undingen
Jul 15 2015 21:50
but I could be wrong. at least recent history showed that my guesses can be pretty far off :-D
Marius Wachtler
@undingen
Jul 15 2015 21:57
another thing: Currently the bjit emit code for a if statement which looks roughly like this:
  • binop(lhs, rhs, op)
  • boxBool(nonzero(binop))
  • asm cmp result with True or False
would be nice to get rid of the boxBool and instead compare directly with 0.
Chris Toshok
@toshok
Jul 15 2015 21:59
:+1:
Marius Wachtler
@undingen
Jul 15 2015 22:01
It's kind of hard wit the current design to not generate the boxes in the first place because there could be stores afterwards which need to be boxed...
So I would like to create boxing actions which I would remove if I see that a I'm going to emit a compare which has a boxBool as operand...
I don't think thats possible with the current design. Do you have an idea what's the easiest way for this?
as this is probably rather small code size + reduction perf improvement. I don't want to add another layer in between the bjit + rewriter just so I can remove the uneeded boxedBools...
BTW: it would be really nice if our AST nodes would have a user/uses list :-D
Marius Wachtler
@undingen
Jul 15 2015 22:21
and the perf + clang profile guided opt I was talking about: http://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
Chris Toshok
@toshok
Jul 15 2015 22:22
unfortunately it uses perf record -b :(
Marius Wachtler
@undingen
Jul 15 2015 22:22
mmh what's that / what does it change?
Chris Toshok
@toshok
Jul 15 2015 22:22
was going to do a investigatory PR to see what perf list printed out
yeah, I read that
which isn’t available on jitdev at least, and I’m guessing isn’t available on travis-ci either
it’s branch event recording
"Last Branch Record"
Marius Wachtler
@undingen
Jul 15 2015 22:23
oh....
Chris Toshok
@toshok
Jul 15 2015 22:23
it does seem insanely useful.. maybe we could just check it in and have someone nice with a linux pc on their desk generate it :)
Marius Wachtler
@undingen
Jul 15 2015 22:24
than back to a special instrumented binary build :-(
yeah would also be an idea
Chris Toshok
@toshok
Jul 15 2015 22:24
yeah. i think that’ll be okay really (the instrumented build)
trying to get lto working now and it’s failure modes are completely inscrutable :/
Marius Wachtler
@undingen
Jul 15 2015 22:26
is this with this llvm gold plugin? or does this now work out of the box? (haven't tried it in a long time)
Chris Toshok
@toshok
Jul 15 2015 22:27
yeah, with LLVMgold.so
basically i get this during the final link:
inlined-at should be a location
<0x7f6306c08620>
<0x7f632ab51aa0>
inlined-at should be a location
<0x7f6306c089b0>
<0x7f632ab51ae0>
LLVM ERROR: Broken function found, compilation aborted!
Marius Wachtler
@undingen
Jul 15 2015 22:27
oh wow...
Chris Toshok
@toshok
Jul 15 2015 22:28
which is coming from the IR verifier
Travis Hance
@tjhance
Jul 15 2015 22:28
fix the broken function!!
Marius Wachtler
@undingen
Jul 15 2015 22:29
could it be that it's also trying to build in our custom modified stdlib bitcode file?
e.g. you pass *.bc to the linker or soemthing like this?
Daniel Agar
@dagar
Jul 15 2015 22:31
Did you just need to add -flto?
Chris Toshok
@toshok
Jul 15 2015 22:31
yeah, to the -c compile lines. the link line I’m munging myself so it pulls in the right plugin (-flto brings in the llvm 3.5 plugin, which actually crashes)
the link line is unchanged, but .o’s are actually LLVM bitcode
Marius Wachtler
@undingen
Jul 15 2015 22:34
looks like the error has something todo with debuglocations and afaik the changed this from 3.5 to 3.6 so maybe an issue with it accessing the wrong plugin version...
Daniel Agar
@dagar
Jul 15 2015 22:34
the linker is called a few times when building stdlib
Chris Toshok
@toshok
Jul 15 2015 22:36
for stdlib.bc.o don’t we encode the .bc as a byte array?
all the other .o’s in stdlib.a are IR bitcode (like boxing.cpp.o, dict.cpp.o, etc.) stdlib.bc.o is a normal .o
oh neat, never knew about this flag before:
-ftrap-function=[name]
Instruct code generator to emit a function call to the specified function name for __builtin_trap().
Marius Wachtler
@undingen
Jul 15 2015 22:50
the problem may be that we have the modified stdlib.bc laying around and it may link this one in. and this one is from a different llvm version + may contain symbols with strange visibility
Chris Toshok
@toshok
Jul 15 2015 22:52
appears that tuple.cpp.o is one of the broken files
and str.cpp.o
guessing it’s something to do with the inline/ and/or link_forcer?
Marius Wachtler
@undingen
Jul 15 2015 22:59
the inline/ stuff is also the stuff which we put into our stdlib.bc file...
ok now I know how to make text italic :-D
Chris Toshok
@toshok
Jul 15 2015 22:59
right but the things in stdlib.bc are forced to be included in link_forcer.cpp, right?
stdlib.bc shouldn’t affect things at link time I would imagine :/
Marius Wachtler
@undingen
Jul 15 2015 23:01
ỳeah sorry maybe I'm putting you on the wrong track. I just thought that the issue is caused by it picking up the stdlib bitcode file beause it sees it inside the directory
Chris Toshok
@toshok
Jul 15 2015 23:02
actually it doesn’t appear to just be the inline/ stuff :/ int.cpp.o worked
i wonder if this is due to using clang 3.5 to do the actual compilation
Marius Wachtler
@undingen
Jul 15 2015 23:07
you could try to get gcc with flto working as a first step :-D
Travis Hance
@tjhance
Jul 15 2015 23:07
delete functions until it works?
everybody’s favorite technique
Marius Wachtler
@undingen
Jul 15 2015 23:08
maybe you encounter the same error but it gives better diagnostics?
Chris Toshok
@toshok
Jul 15 2015 23:08
@tjhance: i’ve been doing “add a file at a time until it breaks”, so that would be the perfect counterpoint
Marius Wachtler
@undingen
Jul 15 2015 23:16
:-( instead of switching to the upstream/master branch I switched to my seriously outdated origin/master and now I have to rebuild llvm..
Chris Toshok
@toshok
Jul 15 2015 23:16
yeah I really wish github had some feature where it would automatically keep a branch updated from upstream for you
i’ve done that before :/
Daniel Agar
@dagar
Jul 15 2015 23:18
the inline stuff is built with the pyston version of llvm/clang
possibly relevant if only src/runtime/inline/*.cpp are broken
Chris Toshok
@toshok
Jul 15 2015 23:27
definitely relevant with gcc+lto
since gcc doesn’t know how to deal with LLVM IR bitcode files :)
Chris Toshok
@toshok
Jul 15 2015 23:45
https://www.youtube.com/watch?v=vzzABBxo44g click’s talk that goes with those slides from a few days ago