These are chat archives for dropbox/pyston

13th
May 2015
Chris Toshok
@toshok
May 13 2015 03:07
weakref4.py is failing with clean master on jitdev
Chris Toshok
@toshok
May 13 2015 03:15
interesting, did something happen codegen wise where we use less stack space? upping the fact call from fact(10) to fact(20) fixes it
andrewchambers
@andrewchambers
May 13 2015 04:35
just browsing the code, is the assembler in there for your own faster code gen?
Chris Toshok
@toshok
May 13 2015 04:36
iirc it’s because we can’t use llvm for generating patchpoint assembly
Travis Hance
@tjhance
May 13 2015 04:36
Why is that?
But yeah we only use it for our inline patch points
When we jit, say, an entire method, we use llvm
We've looked into bringing in an existing open source assembler but so far nobody had taken the trouble
At any given point it's easier to add the instructions you need to our custom assembler than it is to pull in an existing project:P
andrewchambers
@andrewchambers
May 13 2015 04:41
lol
I also heard you are embedding llvm bitcode into your binary?
is that right?
Chris Toshok
@toshok
May 13 2015 04:41
yes
so llvm has opportunities to inline our runtime functions
@tjhance I just figured it was because llvm wants to emit assembly a function at a time, and is a bit heavyweight for that usage
andrewchambers
@andrewchambers
May 13 2015 04:42
Right, so the jit is kind of stitching your runtime code together from the original implementation?
So it saves you needing to implement everything twice right, once for the jit, once for the interpreter
or is it a bit different
Chris Toshok
@toshok
May 13 2015 04:45
no it’s pretty much that. most jits have opencoded sequences for operations they want to inline, and they have to worry about the non-inlined version mirroring the inlined one
andrewchambers
@andrewchambers
May 13 2015 04:46
awesome, and I noticed you have a class ast interpter, there is no bytecode ?
Travis Hance
@tjhance
May 13 2015 04:47
Yeah check out objmodel.cpp, you'll see that we write assembly to do operations (for the inline patchpoints) alongside the same operations written in c++
Chris Toshok
@toshok
May 13 2015 04:48
yeah, we interpret straight from the ast
Travis Hance
@tjhance
May 13 2015 04:48
It should really be called cfg interpreter
Chris Toshok
@toshok
May 13 2015 04:48
yeah, was going to say the ast after we’ve massaged it a bit
Travis Hance
@tjhance
May 13 2015 04:48
But our cfg is basically an ast
Although it does have some nodes that correspond more to cpython byte codes than real ast nodes
andrewchambers
@andrewchambers
May 13 2015 04:50
oh ok, so you wont be supporting the cpython bytecode format?
as in, there are some cpython modules for runtime manipulation of the bytecode
or are you gonna have one to one
of the cpython bytecode
Chris Toshok
@toshok
May 13 2015 04:51
i think we’d rather not support the cpython bytecode (or bytecode at all) until/unless we need to
andrewchambers
@andrewchambers
May 13 2015 04:51
Hehe, I think that means pypy can't be built with pyston, but most stuff would be fine.
Chris Toshok
@toshok
May 13 2015 04:52
oh pypy’s interpreter deals with cpython bytecode?
i should really check out the pypy source at some point
andrewchambers
@andrewchambers
May 13 2015 04:53
I think it does support it. I think Rpython relies on cpython bytecode format
Rpython is basically a cpython bytecode -> C translator
which does whole program analysis on the bytecode from a set entry point.
at least from what I understand.
It just does whole program type inference on the cpython bytecode
I also know Pypy uses many tricks for speed
like they specialize all the containers based on what types they have
so there are many implementations of set and array etc
Chris Toshok
@toshok
May 13 2015 04:59
weird, " Can't automatically merge. Don't worry, you can still create the pull request."
Travis Hance
@tjhance
May 13 2015 04:59
Why is that weird?
Chris Toshok
@toshok
May 13 2015 04:59
why wouldn’t github be able to automatically merge when it’s just a series of commits from the tip of upstream master?
Travis Hance
@tjhance
May 13 2015 05:00
Aliens.
Chris Toshok
@toshok
May 13 2015 05:00
heh
oh, there are additional changes on master. that would do it
Travis Hance
@tjhance
May 13 2015 05:37
what a twist ending
Marius Wachtler
@undingen
May 13 2015 08:31
@kmod Just to make sure we are not working on the same things: I would start to work on the pypa issues I have seen in your repo and then work on the pycrypto etc stuff.
Chris Toshok
@toshok
May 13 2015 14:55
@tjhance an M. Night level twist
Chris Toshok
@toshok
May 13 2015 15:57
interesting consequence of our __methods__ vs. class slots: subclasses of object end up without a tp_hash method, but the hashIC code i’ve written doesn’t call the inherited __hash__ (because .cls_only is true?)
i suspect that’ll have implications for other subclasses wrt our runtime ic's
Chris Toshok
@toshok
May 13 2015 16:17
@undingen I dropped MEGAMORPHIC_THRESHOLD from 100 to 5 locally. didn’t have a huge impact on perf (which I suppose I’d expect - we’re going through the slowpath always instead of rewriting often, after a point), but there is only an increase of 16 in megamorphic_ics (from 87 to 103), with 9000 fewer rewrite_commits and rewrite_starts.
Chris Toshok
@toshok
May 13 2015 16:34
hm, also createBoxedIterWrapperIfNeeded appears to be megamorphic due to over-specific guards. it does a typeLookup for __hasnext__ in the ic and guards on it being a specific value. couldn’t it just guard on a type flag on the class?
Marius Wachtler
@undingen
May 13 2015 16:36
mmh makes sense or at least just check if it's set or not and compare the specific value
Chris Toshok
@toshok
May 13 2015 16:40
the other place where we hit the megamorphic threshold is cls.__dispatch usage inside __promise__ in functional.py
which is odd, since cls is always the same, the property name is the same
Marius Wachtler
@undingen
May 13 2015 16:43
As for the megamorphic stuff: looks like it doesn't change performance much currently but I think it would be cool if we could handle stuff like this django Node class without constantly rewriting in the future. e.g. code where there is a method in a base class which checks what type self is and having a lot of subclasses of this class and being able to create specialized implementations of the base class method for the different subclasses.
Chris Toshok
@toshok
May 13 2015 16:44
right - I was just looking at the megamorphic since that gives some indication to the places where we’re successfully rewriting, we’re just doing it too much
does the Node class not cross the megamorphic threshold?
Marius Wachtler
@undingen
May 13 2015 16:45
mmh haven't looked into that one put the TokenParser subclass stuff was because the class got created on every request because it's defined inside a function / not at global scope.
It does you should see it
Chris Toshok
@toshok
May 13 2015 16:46
I suppose I was also operating under the assumption that successful rewriting but too often should mean that our guards might be over-specific. but that doesn’t sound like the Node problem
yeah, probably just haven’t gotten to that part of the run yet
Marius Wachtler
@undingen
May 13 2015 16:47
it's the get_nodes_by_type method I'm talking about
Chris Toshok
@toshok
May 13 2015 16:47
ah there it is. #3 0x00007ffff51a6349 in get_nodes_by_type_e2_48 () at /home/toshok/pyston-build-dbg/lib_pyston/django/template/base.py:828
Marius Wachtler
@undingen
May 13 2015 16:47
exactly :-)
Chris Toshok
@toshok
May 13 2015 16:48
for the self.child_nodelists getattr, are we failing to rewrite because we’re guarding on self?
err, we aren’t failing to rewrite - are we continually rewriting
Marius Wachtler
@undingen
May 13 2015 16:52

later on comes the TokenParser where we create megamorhic rewrites for every attribute access:

 390     def __init__(self, subject):
 391         self.subject = subject
 392         self.pointer = 0
 393         self.backout = []
 394         self.tagname = self.tag()

e.g. self.subject this is because self is always a different subclass (TranslateParser) of TokenParser because the TranslateParser implementation is inside a function / is a closure.

Chris Toshok
@toshok
May 13 2015 16:53
but it should hopefully always have the same hidden class
Marius Wachtler
@undingen
May 13 2015 16:53
yes I suppose we are guarding on the self class type even for the getattr call
Chris Toshok
@toshok
May 13 2015 17:09
wow, 20% of our ICInfo::shouldAttempt returning false is due to the ic being megamorphic
and there are a lot. ~467k
Chris Toshok
@toshok
May 13 2015 17:22
have a gdb script running that is dumping out every attempt to rewrite a megamorphic ic, so we can see if there are more heavy hitters
oh wow, here’s an interesting one:
#3  0x00007ffff4f37c0b in do_translate_e2_136 () at /home/toshok/pyston-build-dbg/lib_pyston/django/templatetags/i18n.py:351
351        class TranslateParser(TokenParser):
so even the creation of the class is megamorphic?
Chris Toshok
@toshok
May 13 2015 17:30
going to drop the https so gitter doesn’t try to inline it, but this gist shows the list of megamorphic ics + attempts to rewrite after they’ve crossed megamorphic threshold: //gist.github.com/toshok/c04e842ff240f3d13d1b
Marius Wachtler
@undingen
May 13 2015 17:54
what's the meaning of the first row?
column sorry
Marius Wachtler
@undingen
May 13 2015 18:01
oh should have read the first line... it's the number of rewrites as I suspected
Chris Toshok
@toshok
May 13 2015 18:23
Yeah, add 100 for the actual count
Marius Wachtler
@undingen
May 13 2015 18:33
maybe we just have to increase the number of slots inside the patchpoints to a few hundred and we will stop rewriting a bunch of those :-P
Chris Toshok
@toshok
May 13 2015 18:33
Once I get to office I'll generate a list of the type of ic being rewritten / count
Oh I meant to ask about that - the megamorphic threshold is 100 - does that mean we've got 100 x (guard + actions)?
That seems like an awfully big patchpoint ;)
Maybe if we can decrease the number of megamorphic ics we can decrease the threshold and lower the patchpoint size
Marius Wachtler
@undingen
May 13 2015 18:42
The getattr patchpoints have 2 slots and the callsites 4. I suppose most of the megamorphic patchpoints are of this types but I'm not sure.
Chris Toshok
@toshok
May 13 2015 18:44
Yeah that's my guess too.
Kevin Modzelewski
@kmod
May 13 2015 19:30
once it hits the threshold it just doesn't try to rewrite any more
but it keeps the same structure + the existing inline caches
it was just a simple way to reduce the previous cost of constantly (successfully) rewriting those ics
Chris Toshok
@toshok
May 13 2015 19:31
@undingen added the count of the IC types to that gist
Chris Toshok
@toshok
May 13 2015 19:37
@kmod right, but the threshold only counts against successful rewrites, so the patchpoint has to be large enough to contain that number of entries. my theory/hope is guards are too specific and can be relaxed slightly. so there will be fewer entries.
and if we can drop the number of entries across the board, we can drop the patchpoint size(s)/threshold
Kevin Modzelewski
@kmod
May 13 2015 19:38
well, we don't expand ICs right now
so I think it means that we overwrote the slots enough times
but anyway, yeah could still be an overly-specific guards issue
some of it is hard to fix due to Python's rules :/
I think we have to guard on the class when we getattr to make sure that there wasn't a data descriptor
which would override instance lookups
Chris Toshok
@toshok
May 13 2015 19:39
ahh
do the patchpoints get completely rewritten, or do we encode multiple sets of guards/actions in a given patchpoint?
Marius Wachtler
@undingen
May 13 2015 20:23
@kmod In case you are wondering what fixed the 'Unknown ciphertext feedback mode 0; this shouldn't happen' bug: it was just a small issue in your METH_OLDARGS implementation you passed the wrong self (self (CFunction) instead of self->passthrough) and I copied your bug :-D.
Kevin Modzelewski
@kmod
May 13 2015 20:31
oh ok nice :)
Marius Wachtler
@undingen
May 13 2015 20:40
because I just saw it: llvm statepoints are now also patchable like pachpoints. Don't know if this may become handy for us at some points (and don't really understand why the added it). http://reviews.llvm.org/D9546
Chris Toshok
@toshok
May 13 2015 20:41
statepoints are gc related?
Marius Wachtler
@undingen
May 13 2015 20:41
yes
Chris Toshok
@toshok
May 13 2015 20:41
interesting
Marius Wachtler
@undingen
May 13 2015 21:04
I need to implement setting the func defaults (and changing the number of them) for the decorator module: The num of defaults currently get saved inside the BoxedFunction and on the CLFunction. The default values are in BoxedFunction. Does it make sense to remove the CLFunction::num_defaults field? Or would it make more sense to move the default values into the CLFunction?
Chris Toshok
@toshok
May 13 2015 21:04
i’m stopped a rewriter.cpp:975 (where it logs a successful commit). where do I find the address of the place it’s just emitted assembly to?
Kevin Modzelewski
@kmod
May 13 2015 21:05
oh hmm I think I had looked into it
(the func_defaults thing)
I think the defaults need to live on the BoxedFunction since they can be different per function boxing
I think we were able to get decently far by only allowing changing the defaults if you set it to the same number of arguments
err, that you can change the values but not the number
@toshok I think it's in ICInfo::commit() / Rewriter::finishAssembly()
Marius Wachtler
@undingen
May 13 2015 21:07
yes I saw that, but for one of the decorator examples the number of defaults changes from 0 to 2.
assembler->start_addr?
Kevin Modzelewski
@kmod
May 13 2015 21:08
hmm well I'm still hopeful that we don't have to support that
but if we do, I think we'd have to remove CLFunction::num_defaults and then make sure to guard appropriately in callFunc
Marius Wachtler
@undingen
May 13 2015 21:09
Chris Toshok
@toshok
May 13 2015 21:09
@undingen awesome, that does the trick
Kevin Modzelewski
@kmod
May 13 2015 21:09
assembler->start_addr is the address of the temporary buffer that it writes into
which will have the valid assembly but isn't the the slot that it will end up in
Chris Toshok
@toshok
May 13 2015 21:09
nod
Marius Wachtler
@undingen
May 13 2015 21:10
oh :-(
Chris Toshok
@toshok
May 13 2015 21:13
so one thing v8 does for JS is to include the hidden class of the prototype in the hidden class of a given instance
Marius Wachtler
@undingen
May 13 2015 21:13
@trace
def f(x, y=1, z=2, *args, **kw):
     pass
I think internally the decorator module creates a function with no defaults and then retrieves with the introspect module the number of defaults and sets them on the newly created function. Thereby increasing the number of default args. We may be able to work around this by changing the decorator module.
and sorry for making following the chat hard by always interrupting the flow with my posts about different topic.
Chris Toshok
@toshok
May 13 2015 21:14
so IIRC if you do something like: function foo() { } foo.prototype = Object.create(null); f = new foo(); f.prototype.blah = 5; the f.prototype.blah assignment actually changes the hidden class of f
maybe we can do something similar with the tp_mro
and maybe also factor in descriptor behavior into the hidden class
so we can move back to a place where shapes are the guards as opposed to BoxedHeapClass*’s
Kevin Modzelewski
@kmod
May 13 2015 21:21
@undingen it looks like the only places we use the decorator library
is on functions of the signature def foo(func, *args, **kwargs)
and it's just decorator.decorator(foo)
I think it's just this one
Marius Wachtler
@undingen
May 13 2015 21:28
oh then I will focus on the other stuff which makes lib decorator not work at the moment
pylons is crazy, I tried to run the tests and it was installing what felt like 10 external libs.
install_requires=[
        "Routes>=1.12.3", "WebHelpers>=0.6.4", "Beaker>=1.5.4",
        "Paste>=1.7.5.1", "PasteDeploy>=1.5.0", "PasteScript>=1.7.4.2",
        "FormEncode>=1.2.4", "simplejson>=2.2.1", "decorator>=3.3.2",
        "nose>=1.1.2", "Mako>=0.5.0", "WebError>=0.10.3",
        "WebTest>=1.3.1", "Tempita>=0.5.1", "MarkupSafe>=0.15",
        "WebOb>=1.1.1",
    ],
didn't even exaggerate
Chris Toshok
@toshok
May 13 2015 23:48
pretty small test that shows megamorphic ics: gist.github.com/toshok/acdb83ff6bb384eed23b
3 of them, presumably for the calls to __iter__, __hasnext__, as well as the createBoxedIterWrapperIfNeeded