These are chat archives for dropbox/pyston

19th
Jun 2015
Rudi Chen
@rudi-c
Jun 19 2015 00:03
Was there a function like getAllocationFromInteriorPointer but for non-interior pointers?
Using isValidGCObject(b->cls) in _doFree is slowing down a lot of other tests.
Kevin Modzelewski
@kmod
Jun 19 2015 00:19
what about the idea of making sure classes get marked if they have any instances
even if the instances are dead?
Rudi Chen
@rudi-c
Jun 19 2015 00:20
Yeah, I was only marking the ones with simple finalizers. Also, I kept it in a vector which is dumb. llvm::SmallPtrSet sounds like it'd be the right data structure to use?
Chris Toshok
@toshok
Jun 19 2015 00:34
@undingen: oh that upcast trick is pretty slick
@rudi-c GCAllocation::fromUserData is the way to go from non-interior pointer to allocation
@rudi-c do we have a good idea where that extra time goes? is it all the phase where we scan the heap for finalizable objects?
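For readers following along: going from a non-interior pointer back to its allocation, as GCAllocation::fromUserData is described above, usually amounts to stepping back over a fixed-size header that sits directly in front of the user data. A minimal sketch of that idea, assuming a hypothetical AllocHeader layout and helper names (Pyston's real GCAllocation may look different):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical header placed directly in front of the user-visible memory.
// The fields are illustrative only.
struct AllocHeader {
    std::uint32_t kind;
    std::uint32_t size;
};

// Header -> user data: the payload starts right after the header.
inline void* userData(AllocHeader* h) { return h + 1; }

// Non-interior pointer -> header: step back by exactly one header.
// Only valid if p is the start of a payload, not a pointer into its middle.
inline AllocHeader* fromUserData(void* p) { return static_cast<AllocHeader*>(p) - 1; }

// Allocates header + payload and returns the pointer handed to user code.
inline void* allocWithHeader(std::uint32_t nbytes) {
    void* raw = std::malloc(sizeof(AllocHeader) + nbytes);
    if (!raw) return nullptr;
    AllocHeader* h = static_cast<AllocHeader*>(raw);
    h->kind = 0;
    h->size = nbytes;
    return userData(h);
}

// Frees the whole block, starting from the header.
inline void freeWithHeader(void* p) { std::free(fromUserData(p)); }
```

This is constant-time pointer arithmetic, which is why it is much cheaper than a lookup that has to search the heap for the allocation containing an arbitrary address.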
Rudi Chen
@rudi-c
Jun 19 2015 00:37
Oh, it's that isValidGCObject gets called for every object being swept, but that isn't really necessary, and I have better data structure options.
Chris Toshok
@toshok
Jun 19 2015 00:41
ahh, yeah unfortunately there’s no real way to speed that up since we have to be able to search for the large/huge arena object
is this just for an assert?
Rudi Chen
@rudi-c
Jun 19 2015 00:42
Oh it's ok, there's a solution that removes the need for it entirely.
Chris Toshok
@toshok
Jun 19 2015 00:42
nod
Chris Toshok
@toshok
Jun 19 2015 00:59
@rudi-c was the trace stack virtual method a slowdown?
Rudi Chen
@rudi-c
Jun 19 2015 00:59
Haven't tried measuring that specifically yet.
Chris Toshok
@toshok
Jun 19 2015 03:03
@kmod heh I was just sending you mail about that PR. I already have improvements (both design and performance :)
the perf improvement will likely help the ASTInterpreter Box subclass’s impact as well
Chris Toshok
@toshok
Jun 19 2015 04:16
what does __attribute__((visibility("default"))); do?
Kevin Modzelewski
@kmod
Jun 19 2015 06:13
not 100% sure any more, but I think it's the magic incantation to make it so we can inline bitcode that comes from inline functions
which normally ends up with the wrong "visibility" so the llvm linker can't find it
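As general background on the attribute itself (not on how Pyston uses it): in GCC and Clang, the visibility attribute controls whether a symbol is exported from a shared object. "default" marks it as exported even when the code is built with -fvisibility=hidden. A small sketch with hypothetical function names:

```cpp
#include <cassert>

// Hypothetical helper, not taken from Pyston. When building with
// -fvisibility=hidden, symbols are hidden unless marked like this one.
extern "C" __attribute__((visibility("default"))) int exported_add(int a, int b) {
    return a + b;
}

// Explicitly hidden: callable inside this shared object, not exported from it.
__attribute__((visibility("hidden"))) int internal_double(int x) {
    return exported_add(x, x);
}
```

Both functions behave identically when called from the same binary. The attribute only changes what the linker and loader can see from outside.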
Chris Toshok
@toshok
Jun 19 2015 06:17
looks like we get ~10% by LD_PRELOAD’ing jemalloc
on django_template.py that is
Kevin Modzelewski
@kmod
Jun 19 2015 06:19
oh nice
Chris Toshok
@toshok
Jun 19 2015 06:20
from 0major+51705minor to 0major+23605minor probably helps too
Chris Toshok
@toshok
Jun 19 2015 07:06
              pyston django_template.py                 :    5.4s baseline: 5.9 (-7.8%)
              pyston pyxl_bench.py                      :    4.3s baseline: 4.6 (-7.2%)
              pyston sqlalchemy_imperative.py           :    2.0s baseline: 2.1 (-7.2%)
              pyston django_migrate.py                  :    1.7s baseline: 1.8 (-8.0%)
              pyston virtualenv_bench.py                :    4.9s baseline: 5.3 (-6.7%)
little less than 10% but not bad
Marius Wachtler
@undingen
Jun 19 2015 07:16
wow would not have thought that changing the malloc impl would help. Are our malloc calls mostly std::string allocs?
Chris Toshok
@toshok
Jun 19 2015 14:43
At this point I don't think so. It'd be interesting to find out. jemalloc uses TLS for allocation instead of heap locks iirc, which helps
Marius Wachtler
@undingen
Jun 19 2015 18:04
migrating the jit tier I'm working on to using the rewriter is more work than I thought. But I'm now at least in a state where it's compiling :-D
Chris Toshok
@toshok
Jun 19 2015 18:20
Awesome
Rudi Chen
@rudi-c
Jun 19 2015 19:07
Putting every class object that still has an instance in the heap into a set is really expensive. About as expensive as checking if it's an interior pointer whenever an object gets freed.
Rudi Chen
@rudi-c
Jun 19 2015 19:25
Oh better idea: handle class objects like weak references (store in a list to free individually later).
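One possible reading of this idea, sketched as a toy sweep: dead class objects are not freed on the spot but put on a list, and that list is only released after every dead instance has been freed, so the free path can still read the instance's class. The Obj struct, the sweep function and the log it returns are all hypothetical and only there to make the ordering visible. This is not Pyston's collector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy heap object. `cls` points at the object's class (nullptr here for classes).
struct Obj {
    Obj* cls;
    bool is_class;
    bool marked;  // set by a (not shown) mark phase if the object is reachable
    std::string name;
};

// Sweeps `heap` in place and returns a log of what was freed, in order.
std::vector<std::string> sweep(std::vector<Obj*>& heap) {
    std::vector<std::string> freed;
    std::vector<Obj*> deferred_classes;  // dead classes, released last
    std::vector<Obj*> survivors;

    for (Obj* o : heap) {
        if (o->marked) {
            o->marked = false;  // reset for the next collection
            survivors.push_back(o);
        } else if (o->is_class) {
            deferred_classes.push_back(o);  // do not free yet
        } else {
            // Safe to read o->cls here: no class has been freed so far.
            freed.push_back(o->name + " of " + o->cls->name);
            delete o;
        }
    }

    // All dead instances are gone, so their classes can go now.
    for (Obj* c : deferred_classes) {
        freed.push_back(c->name);
        delete c;
    }

    heap.swap(survivors);
    return freed;
}
```

The deferred list costs one push per dead class rather than one set insertion per live instance, which is the overhead complained about in the previous message.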
Travis Hance
@tjhance
Jun 19 2015 23:28
how significant is the rewriter in perf reports? I’m wondering whether I need to code it carefully to be fast or not
I’m guessing it isn’t that significant
Chris Toshok
@toshok
Jun 19 2015 23:29
It shows up as the source of a lot of mallocs/frees
Travis Hance
@tjhance
Jun 19 2015 23:29
okay that isn’t surprising
I tried to cut that down, but it didn’t seem to help much though
Chris Toshok
@toshok
Jun 19 2015 23:29
yeah. and that’s fixable without really changing the code
we could use a mempool per rewriter where everything gets allocated bump pointer fashion, then we free everything at the end
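A rough sketch of what such a per-rewriter pool could look like, assuming a hypothetical BumpPool class (none of these names come from Pyston). Allocation just advances a pointer inside the current chunk, and everything is released at once when the pool is destroyed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical bump-pointer pool. Individual objects are never freed;
// all chunks are released together in the destructor.
class BumpPool {
public:
    explicit BumpPool(std::size_t chunk_size = 4096) : chunk_size_(chunk_size) {}
    BumpPool(const BumpPool&) = delete;
    BumpPool& operator=(const BumpPool&) = delete;
    ~BumpPool() {
        for (void* c : chunks_) std::free(c);
    }

    // Returns `size` bytes aligned to `align` (which must be a power of two).
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        std::uintptr_t start = alignUp(cur_, align);
        if (chunks_.empty() || start + size > end_) {
            // Current chunk is exhausted (or there is none yet): grab a new one.
            std::size_t n = size + align > chunk_size_ ? size + align : chunk_size_;
            void* chunk = std::malloc(n);
            if (!chunk) throw std::bad_alloc();
            chunks_.push_back(chunk);
            cur_ = reinterpret_cast<std::uintptr_t>(chunk);
            end_ = cur_ + n;
            start = alignUp(cur_, align);
        }
        cur_ = start + size;  // the "bump"
        return reinterpret_cast<void*>(start);
    }

    // Constructs a T inside the pool. Destructors are never run, so this
    // sketch only accepts trivially destructible types.
    template <typename T, typename... Args> T* create(Args&&... args) {
        static_assert(std::is_trivially_destructible<T>::value, "pool never runs destructors");
        return new (allocate(sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
    }

    std::size_t chunkCount() const { return chunks_.size(); }

private:
    static std::uintptr_t alignUp(std::uintptr_t p, std::size_t align) {
        return (p + align - 1) & ~static_cast<std::uintptr_t>(align - 1);
    }

    std::size_t chunk_size_;
    std::uintptr_t cur_ = 0;
    std::uintptr_t end_ = 0;
    std::vector<void*> chunks_;
};
```

One caveat: a type that holds a std::function, like the RewriterAction in the perf output further down, is not trivially destructible, so it would need its destructor run explicitly before the pool goes away. The static_assert here simply rejects such types.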
Travis Hance
@tjhance
Jun 19 2015 23:30
right I basically tried allocating all the RewriterVars that way
Chris Toshok
@toshok
Jun 19 2015 23:31
-   2.06%  pyston_release  libjemalloc.so.1     [.] malloc
   - malloc
      + 6.92% void __gnu_cxx::new_allocator<pyston::RewriterAction>::construct<pyston::RewriterAction, std::function<void ()> const&>(pyston::RewriterAction*, std::function<void ()> const&)
so not super terrible
Kevin Modzelewski
@kmod
Jun 19 2015 23:31
I think overall we're pretty good at not rewriting when it's going to fail
Travis Hance
@tjhance
Jun 19 2015 23:31
what do the percents mean, there?
Kevin Modzelewski
@kmod
Jun 19 2015 23:31
so I feel like it's ok to err on the side of doing things that are somewhat expensive?
Chris Toshok
@toshok
Jun 19 2015 23:32
2.06% of total cpu time in malloc, 6.92% of that in ::construct
yeah
Travis Hance
@tjhance
Jun 19 2015 23:32
that isn’t very much
Chris Toshok
@toshok
Jun 19 2015 23:32
yeah
Travis Hance
@tjhance
Jun 19 2015 23:33
k, I won’t really worry about it at the moment
Chris Toshok
@toshok
Jun 19 2015 23:33
that’s with jemalloc, though, %’s might be higher with glibc, but not by much
yeah, with glibc _int_malloc (the thing c++ operator new uses) is at 2.95%. so not too much of an issue