These are chat archives for dropbox/pyston

17th
Jul 2015
Chris Toshok
@toshok
Jul 17 2015 02:15
huh, looks like pgo causes compilation errors during the -fprofile-use phase
../../from_cpython/Modules/_io/textio.c:2162:52: error: ‘*((void *)&cookie+16)’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
         if (PyUnicode_GetSize(self->decoded_chars) < cookie.chars_to_skip) {
which is pretty neat I suppose
Travis Hance
@tjhance
Jul 17 2015 02:17
It determined that from one of the profiles executions?
Chris Toshok
@toshok
Jul 17 2015 02:18
i’m guessing the -fprofile-generate phase generated info saying that the BB that initializes cookie is never hit
or something?
Chris Toshok
@toshok
Jul 17 2015 02:25
it’s totally, totally hacky and I had to turn off -Werror, but with pgo we’re close to 3s:
$ time ./pyston_release_gcc minibenchmarks/django_template.py
took  9.6ms for last iteration

real    0m3.008s
ah, next run is real 0m2.970s
Sun
@Daetalus
Jul 17 2015 03:32
Hi, I already accompished the new pow support, I will create a PR for review soon.
But I want add more tests, for example, pow(3, -3, 0) should report 2nd parameter should not be negative when 3rd argument specified first, then check 3rd argument whether is zero.
Because both error type is ValueError, so I want to check the error message. It seems that unittest not support the feature.
Could I use the method in selected answer? SO link
Kevin Modzelewski
@kmod
Jul 17 2015 05:00
we have a bit of a different testing method
so you can do something like
try:
  pow(3, -3, 0)
except ValueError as e:
  print e
which will end up testing that our error message is the same
where we compare the output vs cpython's output and assert that they are the same
Sun
@Daetalus
Jul 17 2015 05:13
Thanks!
Kevin Modzelewski
@kmod
Jul 17 2015 06:36
I ran some quick "steady state" numbers
ie run our 500 iterations of django_template as a warmup, and then time how long it takes to run 100 more
it's 2.2ms per iteration for pypy (2.6)
2.7ms for cpython
and 4.1 for us
and pypy is -20%
compared to +135% / +45% if you include warmup time
ie we are +50% compared to cpython
Sun
@Daetalus
Jul 17 2015 06:44
Greate!
Sun
@Daetalus
Jul 17 2015 06:51

Does the method dispatch in unittest different from normal pyston? For example, a simple statement:

(11).__pow__(32, 50)

Run it on script or command line with pyston_dbg and pyston_release both were fine. But run it in test, same statement, will cause error. Seems it try to find wrong method. I am try to find out why. But could anyone give some hints? Thanks!

Sun
@Daetalus
Jul 17 2015 07:12
Sorry for the format..., try again.
ln -sf /home/sun/pyston/build/Debug/pyston pyston_dbg
python ./tools/tester.py -R pyston_dbg -j1 -a=-S -k ./test/tests intmethods -K
        intmethods.py Correct output ( 37.2ms)
python ./tools/tester.py -R pyston_dbg -j1 -a=-S -k --exit-code-only --skip-failing -t30 ./test/cpython intmethods -K 
No tests matched the given patterns. OK by me!
python ./tools/tester.py -R pyston_dbg -j1 -k -a=-S --exit-code-only --skip-failing -t600 ./test/integration intmethods -K 
No tests matched the given patterns. OK by me! 
python ./tools/tester.py -a=-x -R pyston_dbg -j1 -a=-n -a=-S -k ./test/tests intmethods -K 
/home/sun/pyston/./test/tests/intmethods.py: 
Traceback (most recent call last):
    File "./tools/tester.py", line 433, in worker_thread 
        results[job[0]] = run_test(*job) 
    File "./tools/tester.py", line 164, in run_test 
        return determine_test_result(fn, opts, code, out, stderr, elapsed) 
    File "./tools/tester.py", line 307, in determine_test_result
        raise Exception("%s\n%s\n%s" % (msg, err, stderr))
Exception: Exited with code -6 (expected code 0)
pyston_dbg: ../../src/runtime/int.cpp:682: pyston::Box *intPowLong(pyston::BoxedInt *, pyston::BoxedLong *): Assertion `isSubclass(rhs->cls, long_cls)' failed.
pyston_dbg: ../../src/runtime/int.cpp:682: pyston::Box *intPowLong(pyston::BoxedInt *, pyston::BoxedLong *): Assertion `isSubclass(rhs->cls, long_cls)' failed.
Travis Hance
@tjhance
Jul 17 2015 07:19
so it runs the test with a bunch of different command line arguments
to try to exercise more code paths
for example
./pyston_dbg -n will run with the interpreter disabled, so the JIT gets exercised more
Sun
@Daetalus
Jul 17 2015 07:21
Thanks @tjhance , I will dig more.
Travis Hance
@tjhance
Jul 17 2015 07:21
In this case it looks like the test is failing with the -n argument
Sun
@Daetalus
Jul 17 2015 07:21
That's correct! -n will cause failed
Marius Wachtler
@undingen
Jul 17 2015 07:37
I'm only guessing but it may be that you registered the __pow__ function with the wrong argument types? (calls to addRTFunction specifies the arg types)
Sun
@Daetalus
Jul 17 2015 07:55
Thanks, @undingen I found that. And also thanks to @tjhance . Problem solved.
Marius Wachtler
@undingen
Jul 17 2015 07:56
:-)
Chris Toshok
@toshok
Jul 17 2015 19:11
nice, gcc pgo build is pretty impressive. -9.4% on django_template, -6.1 on sqlalchemy_imperative, -9.5% on pyxl_bench
Marius Wachtler
@undingen
Jul 17 2015 19:11
nice!
Chris Toshok
@toshok
Jul 17 2015 19:12
and those improvements are against pyston_release, so clang vs gcc there too
gcc builds have historically been slower than clang for us, right?
Daniel Agar
@dagar
Jul 17 2015 19:14
cool
are you going to work pgo into the regular build?
Chris Toshok
@toshok
Jul 17 2015 19:15
right now I have a separate Makefile target for it (pyston_release_pgo). clang pgo isn’t quite there yet in the clangs we use, I think?
Daniel Agar
@dagar
Jul 17 2015 19:16
I think you can get newer clang even back to 12.04
Chris Toshok
@toshok
Jul 17 2015 19:17
ah I thought clang pgo was newer than 3.5
toshok @toshok adds another makefile target
Chris Toshok
@toshok
Jul 17 2015 19:18
two pgo builds enter, one pgo build leaves
Marius Wachtler
@undingen
Jul 17 2015 19:19
good week: >-10% just by reusing the work of others :-P
Daniel Agar
@dagar
Jul 17 2015 19:22
clang 3.6 is available for 12.04
Chris Toshok
@toshok
Jul 17 2015 19:22
yeah I thought pgo was something >= 3.7 for some reason
Daniel Agar
@dagar
Jul 17 2015 19:25
with the way the build system ended up (Makefile shim + cmake) we could go back building clang and then using it
Daniel Agar
@dagar
Jul 17 2015 19:37
might even be able to get away with using the same patched llvm for release, debug, gcc, release_pgo
that would save some time...
Chris Toshok
@toshok
Jul 17 2015 19:38
wow, cmake doesn’t seem to like when you do -DFLAG=value_with_a=_inside_it
Marius Wachtler
@undingen
Jul 17 2015 20:25
mmh I benchmarked my 'interp/bjit don't lookup vars with a hashtable instead use fixed assigned offsets into an array' patch: looks like this is a slowdown for now??? (I even removed the getLocalHelper/setLocalHelper calls and directly emit the code...)
pretty sure that I have somewhere a bug because this does totally not make sense. (I even slowed the force interpreter mode down)
Marius Wachtler
@undingen
Jul 17 2015 20:30
this patch changes a python local var store from a call to a helper function which then does a DenseMap lookup and sets the value to a direct mov reg, [reg+offset]
Chris Toshok
@toshok
Jul 17 2015 20:38
this just started happening:
CMake Error at lz4/cmake_unofficial/CMakeLists.txt:12 (IF):
  if given arguments:

    "STREQUAL" "8"

  Unknown arguments specified
that corresponds to:
check_type_size("void *" SIZEOF_VOID_P)
IF( ${SIZEOF_VOID_P} STREQUAL  "8" )
Daniel Agar
@dagar
Jul 17 2015 20:43
what else changed?
Marius Wachtler
@undingen
Jul 17 2015 20:45
ok found the bug, numbers look now much more sane (slight perf improvement)
Chris Toshok
@toshok
Jul 17 2015 21:05
@dagar cmake question: if I want to make a change to CMAKE_CXX_FLAGS everywhere except in src/runtime/inline, is the cmake way to do that something like:
set (SAVED_CXX_FLAGS “${CMAKE_CXX_FLAGS}”)
set (CMAKE_CXX_FLAGS “…”)
add_subdirectory(…)

set (CMAKE_CXX_FLAGS “${SAVED_CXX_FLAGS}”)
add_subdirectory(src/runtime/inline)
would that even work?
Daniel Agar
@dagar
Jul 17 2015 21:10
I don't know if there's a better way, but that kind of thing works
Chris Toshok
@toshok
Jul 17 2015 21:12
okay, cool
will keep plugging. I think the lz4 cmake failure was due to some extra -D flags I was passing along. turns out I didn’t need them, removed them, and the problem went away
Daniel Agar
@dagar
Jul 17 2015 21:13
what a helpful error
Chris Toshok
@toshok
Jul 17 2015 21:13
yeah really. and when I fixed the lz4 problem locally (by putting “” around ${}), it just broke elsewhere
Daniel Agar
@dagar
Jul 17 2015 21:14
I don't love cmake
but at least it's not autotools
Travis Hance
@tjhance
Jul 17 2015 21:15
what’s wrong with autotools
(i don’t know anything about it)
Chris Toshok
@toshok
Jul 17 2015 21:18
i miss autotools sometimes :) i mean, m4 is amazingly painful
Daniel Agar
@dagar
Jul 17 2015 21:18
m4 and a lot of bloat that's not really relevant
Chris Toshok
@toshok
Jul 17 2015 21:18
automake/libtool are pretty lousy and unnecessary today, but as far as autoconf goes, m4 is the only real pain point I can recall
Travis Hance
@tjhance
Jul 17 2015 21:25
do we have a way to dump out an object for debugging
like any ol’ Box
Rudi Chen
@rudi-c
Jul 17 2015 21:26
There's a dump function defined somewhere
you can just try using dump that in GDB
Travis Hance
@tjhance
Jul 17 2015 21:27
ah
Rudi Chen
@rudi-c
Jul 17 2015 21:28
Yeah it's dump(void* p) in objmodel.cpp
Travis Hance
@tjhance
Jul 17 2015 21:29
kthnksx
Marius Wachtler
@undingen
Jul 17 2015 21:33
I learned something new: don't forget that you disabled GC collections before running django_template or you pc will freeze :-D
Marius Wachtler
@undingen
Jul 17 2015 21:48
fixed array offsets + don't GC allocate ASTInterpreter:
       django_template.py             3.5s (4)             3.3s (4)  -3.6%
            pyxl_bench.py             3.4s (4)             3.3s (4)  -2.5%
sqlalchemy_imperative2.py             4.1s (4)             4.0s (4)  -3.6%
                  geomean                 3.6s                 3.5s  -3.2%
Marius Wachtler
@undingen
Jul 17 2015 21:55
with this changes the baseline jit and LLVM JIT tier generated code has much closer perf
Travis Hance
@tjhance
Jul 17 2015 21:56
does that mean we can be more aggressive in the LLVM tier? :)
Marius Wachtler
@undingen
Jul 17 2015 22:00
I hope, -I mode is still slower but not that much anymore
Chris Toshok
@toshok
Jul 17 2015 22:02
Oh wow nice numvers
Interesting, too fast for autocorrect there (on mobile :)
Chris Toshok
@toshok
Jul 17 2015 22:27
didn’t want to kick this build https://travis-ci.org/dropbox/pyston/jobs/71484724
it’s threading_local.py, but not a .pyc problem
Chris Toshok
@toshok
Jul 17 2015 23:54
oh, ugh. CMAKE_CXX_FLAGS has some weird behavior where it’s settable up until a point and then no longer?