These are chat archives for dropbox/pyston

11th May 2015
andrewchambers
@andrewchambers
May 11 2015 01:19
Yeah, that is what I was asking.
Rudi Chen
@rudi-c
May 11 2015 17:36
Which of typeobject and classobj was the new style class and old style class?
Travis Hance
@tjhance
May 11 2015 18:27
classobj is old-style class
Rudi Chen
@rudi-c
May 11 2015 18:40
Cool thanks
Chris Toshok
@toshok
May 11 2015 19:40
just spent 45 minutes tracking down a bug due to clang++ picking the wrong ctor (due to implicit coercions) :(
Marius Wachtler
@undingen
May 11 2015 19:41
:-(
Chris Toshok
@toshok
May 11 2015 19:41
my own fault, but I wish there was some way to get clang to help me figure it out :)
Travis Hance
@tjhance
May 11 2015 19:42
hm i once spent 4 hours tracking down a bug because i wrote ClassName(…) rather than ClassName varName(…); (and hence the destructor was called immediately rather than at the end of scope)
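(Editor's note: a minimal C++ illustration of the pitfall Travis describes. `ScopeTimer` is an invented stand-in for a RAII timer; a temporary's destructor runs at the end of the full expression, while a named object's runs at end of scope.)

```cpp
#include <cassert>

// Invented stand-in for a RAII scope timer; live_timers tracks how
// many timers are currently "running".
static int live_timers = 0;

struct ScopeTimer {
    ScopeTimer() { ++live_timers; }
    ~ScopeTimer() { --live_timers; }
};

int buggy() {
    ScopeTimer();        // temporary: destroyed at the end of this statement
    return live_timers;  // 0 -- nothing is actually being timed here
}

int correct() {
    ScopeTimer t;        // named object: lives until the end of the scope
    return live_timers;  // 1 -- the timed region is still open
}
```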
Chris Toshok
@toshok
May 11 2015 19:42
haha i had that exact bug with the stat timers
StatTimer(t0, “foo”);
man, our code is fast
pretty cool. now we can have multiple (well, three) levels of timers
e.g.:
us_med_timer_PyEq: 205339
us_timer_PyEq: 72143
Marius Wachtler
@undingen
May 11 2015 19:44
I also spent a lot of hours last week finding this small issue (it broke protobufs):
     } else
-        x = strtoul(s, &end, base);
+        x = strtol(s, &end, base);
     if (end == s || !isalnum(Py_CHARMASK(end[-1])))
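(Editor's note: the subtlety behind that one-character diff is that strtoul does not reject a leading minus sign; the C standard says it negates the converted value in unsigned arithmetic, so "-1" parses "successfully" as ULONG_MAX rather than failing or yielding -1 the way strtol does. The helper names below are just for illustration.)

```cpp
#include <cstdlib>
#include <climits>

// Illustrative wrappers around the two parsing functions from the diff.
inline unsigned long parse_unsigned(const char* s) {
    char* end;
    return strtoul(s, &end, 10);  // "-1" wraps to ULONG_MAX, no error
}

inline long parse_signed(const char* s) {
    char* end;
    return strtol(s, &end, 10);   // "-1" is -1, as expected
}
```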
Chris Toshok
@toshok
May 11 2015 19:44
us_timer is “self” time, us_med_timer is everything under PyEq, including self time
oh wow
Marius Wachtler
@undingen
May 11 2015 19:45
the only error I got was something like: decoding failed....
cool
Chris Toshok
@toshok
May 11 2015 19:47
not sure how useful this will be, since it’s so ad-hoc
Marius Wachtler
@undingen
May 11 2015 19:48
On the weekend I tried to change the small alloc pool to use a separate bitmap for tracking reachable memory instead of marking it directly inside the allocation
Chris Toshok
@toshok
May 11 2015 19:49
how’d that go?
Marius Wachtler
@undingen
May 11 2015 19:49
but (unsurprisingly) the overhead of finding the bit to set was much higher than the time we waste because of all the cache misses
Chris Toshok
@toshok
May 11 2015 19:50
yeah, but something like that will help with performance after fork()
Marius Wachtler
@undingen
May 11 2015 19:50
I just thought that I want to try it because we have a really high percentage of cache misses
Chris Toshok
@toshok
May 11 2015 19:50
nod
I wonder if we could pay additional compile/codesize costs and specialize the buckets
i.e. pass the bucket size as a template argument
Marius Wachtler
@undingen
May 11 2015 19:55
to make the minObjIndex, numObjects etc compile time consts?
Chris Toshok
@toshok
May 11 2015 19:55
yeah
and possibly switch from a bitmap to a bytemap
I think I tried that some time ago and couldn’t get it to work, but there’s no reason it couldn’t (and if the bitmap isn’t stored in the block, we’d have less of an issue with the increased size)
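(Editor's note: a minimal sketch of the specialization Chris and Marius are discussing, with the bucket/object size as a template parameter so per-block values like numObjects and minObjIndex fold to compile-time constants. All names and sizes here are invented for illustration.)

```cpp
#include <cstddef>

constexpr size_t BLOCK_SIZE  = 4096;  // hypothetical block size
constexpr size_t HEADER_SIZE = 64;    // hypothetical per-block header

template <size_t ObjSize>
struct Block {
    // Compile-time constants instead of per-block fields:
    static constexpr size_t numObjects  = (BLOCK_SIZE - HEADER_SIZE) / ObjSize;
    static constexpr size_t minObjIndex = HEADER_SIZE / ObjSize;

    static size_t indexOf(size_t offset) {
        // Division by a compile-time constant lowers to multiply/shift,
        // not a runtime div.
        return offset / ObjSize;
    }
};
```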
i wish mike pall would hurry up and write more about his proposed gc work for luajit 3.0 :)
Travis Hance
@tjhance
May 11 2015 19:59
ugh write barriers
we’re going to need those at some point
probably
Chris Toshok
@toshok
May 11 2015 20:00
doing some sort of analysis of our small arena to get to a setup like that might be beneficial. particularly his SIMD bitmap tricks
I’m optimistic about write barriers :)
Marius Wachtler
@undingen
May 11 2015 20:03
Maybe we will have the first gc which requires and takes advantage of AVX512? :-P
:)
Marius Wachtler
@undingen
May 11 2015 20:05
:smile:
Chris Toshok
@toshok
May 11 2015 20:07
ah yeah, bacon’s got a lot of good stuff in the real time gc arena
but yeah, fpgas for gc :)
we should just have a 12th compiler tier that compiles to fpga
Marius Wachtler
@undingen
May 11 2015 20:08
:-)
Chris Toshok
@toshok
May 11 2015 20:08
nice, so after inlining the stattimer ctor/dtor and getCPUTicks(), the overhead of running with timers on some microbenchmarks is down to 0.01s
Marius Wachtler
@undingen
May 11 2015 20:10
crazy how little overhead this has. makes me really wonder how we can waste so much time on other stuff...
Marius Wachtler
@undingen
May 11 2015 20:15
We will need to patch mysql-python in order to add two PyType_Ready calls :-(

they get directly assigned to the module dict:

PyDict_SetItemString(dict, "connection", (PyObject *)&_mysql_ConnectionObject_Type)

So I don't think we can easily auto register the type

Travis Hance
@tjhance
May 11 2015 20:27
what does PyType_Ready do?
Marius Wachtler
@undingen
May 11 2015 20:32
sets up the mro, registers the capi methods, etc.
Chris Toshok
@toshok
May 11 2015 21:21
where all do we call allowGLReadPreemption?
compiled python function entry?
Kevin Modzelewski
@kmod
May 11 2015 21:27
and backedges
(and the corresponding places in the ast interpreter)
Chris Toshok
@toshok
May 11 2015 21:27
cool, ok that’s what I figured
so in a completely uncontended case, that function shows overhead of function call + 1 load + 1 stat timer
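(Editor's note: a hypothetical sketch of an uncontended fast path matching Chris's accounting of "function call + 1 load": check a flag with a single load and return immediately when no preemption is requested. This is an invented illustration, not Pyston's actual implementation.)

```cpp
#include <atomic>

// Invented names for the sketch.
static std::atomic<bool> preemption_requested{false};
static int slow_path_taken = 0;

void allowGLReadPreemption_sketch() {
    // Fast path: one relaxed load, then return. No locking.
    if (!preemption_requested.load(std::memory_order_relaxed))
        return;
    // Slow path (contended): this is where the GIL-like lock would be
    // released and reacquired.
    ++slow_path_taken;
    preemption_requested.store(false, std::memory_order_relaxed);
}
```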
Chris Toshok
@toshok
May 11 2015 21:37
so the bulk of my hash() microbenchmark is attributed to:
us_timer_astinterpreter_jump_osrexit: 10439310
I split that up into the compilePartialFuncInternal time and the time taken to call the partial
and it comes back with:
us_timer_astinterpreter_jump_osrexit_compilePartialFuncInternal: 0
us_timer_astinterpreter_jump_osrexit_partial_func_call: 10511742
not sure how that’s possible, unless we’re not doing osr?
or rather, unless we’re not actually compiling the thing we’re calling
Marius Wachtler
@undingen
May 11 2015 21:43
mmh
Chris Toshok
@toshok
May 11 2015 21:45
wow, the partial_func->call line. once I step into it I can’t hit ctrl-c in gdb
I just see ^C, and then ~1.5 seconds later: ^C[Inferior 1 (process 20487) exited normally]
Marius Wachtler
@undingen
May 11 2015 21:48
time for :~$ gdb gdb :-D?
Chris Toshok
@toshok
May 11 2015 21:49
i set a breakpoint in the jitted assembly and am stepping through that now :)
Marius Wachtler
@undingen
May 11 2015 21:50
looks like travis-ci is very busy... I have 3 pull requests up for checking and it hasn't started building any of them yet..
oh one did run... 70.py Expected failure (got code -122, should be 0)
Chris Toshok
@toshok
May 11 2015 21:53
hm, so “while i < 10000000:” appears to box 10000000 every time through the loop
switching that to “j = 10000000; while i < j:” drops .5 seconds from runtime
Marius Wachtler
@undingen
May 11 2015 22:07
maybe we should have a sort of code.co_consts. I suspect this should help with unicode strings because currently we always call decodeUTF8StringPtr on them which could be slow.
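(Editor's note: a sketch of the co_consts-style caching Marius suggests: decode each source literal once, keyed by its pointer, and reuse the result on later hits. `decodeUTF8_sketch` is an invented stand-in for decodeUTF8StringPtr.)

```cpp
#include <string>
#include <unordered_map>

static std::unordered_map<const char*, std::string> decoded_cache;
static int decode_calls = 0;

// Stand-in for the real (potentially slow) decoder.
std::string decodeUTF8_sketch(const char* s) {
    ++decode_calls;
    return s;
}

const std::string& decodeCached(const char* s) {
    auto it = decoded_cache.find(s);
    if (it != decoded_cache.end())
        return it->second;  // cache hit: the decoder never runs
    return decoded_cache.emplace(s, decodeUTF8_sketch(s)).first->second;
}
```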
Chris Toshok
@toshok
May 11 2015 22:07
yeah, decodeUTF8StringPtr shows up pretty high in django times
oh, I thought it was higher. only 85ms for the django-template test (which takes ~36 seconds)
Marius Wachtler
@undingen
May 11 2015 22:09
k. Do you know if our django-template test is testing similar functionality as the pypy django test?
Chris Toshok
@toshok
May 11 2015 22:10
I think ours does more (actually loads the admin/index.html template). theirs I think just creates a template and expands it repeatedly
Marius Wachtler
@undingen
May 11 2015 22:10
k
Chris Toshok
@toshok
May 11 2015 22:19
i added a card to Benchmarks -> Optimization ideas
the speedup is much larger when stat timers are active, alas :)
Chris Toshok
@toshok
May 11 2015 22:44
interesting. the hash() microbenchmark we do pretty well in, generally slightly faster than cpython
the dict setitem() micro benchmark we are 3x slower
i’m guessing due to unordered map operations, rehashing the entire contents of the table on resize?
if unordered_map doesn’t cache the hash value in the node struct it uses..
checking load factor before/after insert (if before > after, the map rehashed), I see this:
so we’re potentially doing a lot of extra hashes for bigger tables
those numbers are self->d.size(), which I’m guessing returns the number of buckets, not the number of hashed keys
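(Editor's note: for std::unordered_map, size() returns the number of elements and bucket_count() the number of buckets. Below is a sketch of the rehash check Chris describes: the load factor only drops across an insert when the table rehashed, since size() otherwise grows while the bucket count stays fixed.)

```cpp
#include <unordered_map>
#include <cstddef>

// Count how many of n inserts triggered a rehash, using the
// before/after load-factor comparison from the chat.
inline size_t count_rehashes(int n) {
    std::unordered_map<int, int> m;
    size_t rehashes = 0;
    for (int i = 0; i < n; ++i) {
        float before = m.load_factor();
        m[i] = i;
        if (m.load_factor() < before)
            ++rehashes;  // every existing key was rehashed on this insert
    }
    return rehashes;
}
```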