These are chat archives for dropbox/pyston

11th
Jun 2015
Chris Toshok
@toshok
Jun 11 2015 17:13
From pypy.org: "Another difference is that if you add a __del__ to an existing class it will not be called"
Rudi Chen
@rudi-c
Jun 11 2015 17:14
Is that just pypy or python specs?
Chris Toshok
@toshok
Jun 11 2015 17:14
Pypy- it's a difference between pypy and cpython
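A minimal sketch of the behavior being discussed, written in Python 3 syntax. The class name `Resource` and the helper `_finalize` are hypothetical and used only for illustration. On CPython, a `__del__` attached to a class after it was created is still honored; per the pypy.org quote above, PyPy would not call it.

```python
import gc

calls = []  # records finalizer invocations so the effect is observable


class Resource(object):
    """Hypothetical class, defined without a finalizer."""


def _finalize(self):
    calls.append("finalized")


# Attach __del__ after the class already exists.
Resource.__del__ = _finalize

obj = Resource()
del obj       # drop the last reference
gc.collect()  # not needed with CPython's refcounting; lets a tracing collector run
```

Inspecting `calls` afterwards shows whether the runtime honored the late-added finalizer.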
Rudi Chen
@rudi-c
Jun 11 2015 17:15
Certainly makes that part easier.
Rudi Chen
@rudi-c
Jun 11 2015 17:16
So with your heap changes, time spent in GC goes from 6.5s to 2.4s
Chris Toshok
@toshok
Jun 11 2015 17:16
Yeah, there are other useful differences too (related to old style instances)
Rudi Chen
@rudi-c
Jun 11 2015 17:16
That's pretty nice
(Although total time only went down 1s, not sure why)
Chris Toshok
@toshok
Jun 11 2015 17:17
Niiice. Unfortunately I think it's a bit of a hit for the non-finalization case, since it increases time spent in alloc/free slightly.
Rudi Chen
@rudi-c
Jun 11 2015 17:18
Maybe we can use some sort of tree structure?
Chris Toshok
@toshok
Jun 11 2015 17:19
That would remove the memmove + realloc, but we'd still have to balance
Or maybe not. Just having hierarchy would be enough
Rudi Chen
@rudi-c
Jun 11 2015 17:30
              pyston (calibration)                      :    0.6s baseline: 0.6 (-4.2%)
              pyston django_migrate.py                  :    3.6s baseline: 3.7 (-2.0%)
              pyston virtualenv_bench.py                : failed (code 1)
              pyston django-template.py                 :   14.0s baseline: 18.1 (-22.5%)
              pyston interp2.py                         :    3.0s baseline: 3.0 (+1.1%)
              pyston raytrace.py                        :    4.6s baseline: 4.7 (-1.9%)
              pyston nbody.py                           :    7.3s baseline: 7.3 (-0.1%)
              pyston fannkuch.py                        :    6.1s baseline: 5.9 (+1.9%)
              pyston chaos.py                           :   12.7s baseline: 13.1 (-2.7%)
              pyston fasta.py                           :    6.4s baseline: 6.0 (+7.0%)
              pyston pidigits.py                        :    6.5s baseline: 5.6 (+16.0%)
              pyston richards.py                        :    1.3s baseline: 1.3 (-3.0%)
              pyston deltablue.py                       :    1.0s baseline: 1.1 (-4.4%)
Where baseline = with finalizers without your heap
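For reading the table above: the percentage appears to be the relative change against the baseline. A small sketch, assuming that formula (the function name `percent_change` is hypothetical). The table prints rounded times, so recomputing from them can differ slightly from the printed percentage.

```python
def percent_change(new_seconds, baseline_seconds):
    """Relative change of new_seconds against baseline_seconds, in percent."""
    return (new_seconds - baseline_seconds) / baseline_seconds * 100.0


# Values copied from the rounded columns of the table above.
django_template = percent_change(14.0, 18.1)  # negative means faster than baseline
pidigits = percent_change(6.5, 5.6)           # positive means slower than baseline
```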
Chris Toshok
@toshok
Jun 11 2015 17:57
Yeah the pidigits change is the overhead I was thinking of. Pidigits allocates a lot of large arena objects (BoxedLongs)
Kevin Modzelewski
@kmod
Jun 11 2015 19:01
Is this including the change away from using allocationFrom?
Rudi Chen
@rudi-c
Jun 11 2015 19:01
No, it still uses allocationFrom. Only about 1/3 of the calls to allocationFrom have ptr == head of block.
Kevin Modzelewski
@kmod
Jun 11 2015 19:02
hmm but that function is for conservative scanning... I don't think the finalization code should need to use it
Chris Toshok
@toshok
Jun 11 2015 19:03
We need to conservatively scan - finalizable objects can be separated by arbitrarily many objects, some conservative_python :/
Kevin Modzelewski
@kmod
Jun 11 2015 19:05
I don't see how "separated by conservative objects" means "have to conservatively scan"
Rudi Chen
@rudi-c
Jun 11 2015 19:06
Objects with finalizers can have references to conservative objects and we need to mark the conservative object until the object gets finalized.
Chris Toshok
@toshok
Jun 11 2015 19:06
Having to scan through conservative objects = conservatively scanning, doesn't it?
I mean we can change things such that a potential pointer isn't scanned through if it's not a head pointer, but that is exactly the operation that allocationFrom performs
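A toy model of the kind of lookup described here: given an arbitrary potential pointer, find the allocation that contains it, and check whether it is a head pointer. This is not Pyston's implementation (which is C++); `ToyHeap`, `allocation_from`, and `is_head_pointer` are hypothetical names, and the addresses are made up.

```python
import bisect


class ToyHeap(object):
    """Hypothetical heap model answering 'which allocation contains this address?'."""

    def __init__(self):
        self._starts = []  # sorted start addresses of live allocations
        self._sizes = {}   # start address -> size in bytes

    def add_allocation(self, start, size):
        bisect.insort(self._starts, start)
        self._sizes[start] = size

    def allocation_from(self, addr):
        """Return the head address of the allocation containing addr, or None."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None  # addr lies below every allocation
        start = self._starts[i]
        if addr < start + self._sizes[start]:
            return start
        return None      # addr falls in a gap between allocations

    def is_head_pointer(self, addr):
        """True only if addr is exactly the start of an allocation."""
        return self.allocation_from(addr) == addr


heap = ToyHeap()
heap.add_allocation(0x1000, 64)
heap.add_allocation(0x2000, 128)
```

In this model, telling a head pointer from an interior pointer needs the same lookup as finding the containing allocation, which is the point made above.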
Rudi Chen
@rudi-c
Jun 11 2015 19:08
That being said I thought making unicode_cls a has_safe_tp_dealloc should have removed a lot of the allocationFrom calls. I need to investigate that again...
Kevin Modzelewski
@kmod
Jun 11 2015 19:08
No my question isn't why does the conservative gc have to do conservative scanning, but why the finalizer code does
Chris Toshok
@toshok
Jun 11 2015 19:08
@rudi-c Did it lessen the number at all?
Rudi Chen
@rudi-c
Jun 11 2015 19:08
The finalization ordering code needs to do a mark-like phase starting from objects with finalizers.
@toshok No, but I might be confusing which runs had which code, so that might not be right.
I'm trying to get my slicing code to pass protobuf that Marius added atm
Kevin Modzelewski
@kmod
Jun 11 2015 19:10
oh hmm why do we need to do an extra mark phase?
could the current mark phase handle that?
and it's also weird that the finalization-mark-phase does so many more allocationFrom calls than the global mark phase
Rudi Chen
@rudi-c
Jun 11 2015 19:11
That's because in django-template, there are a lot of short-lived LargeObj.
The mark phase scans reachable objects.
Chris Toshok
@toshok
Jun 11 2015 19:11
The extra mark phase is the pypy algorithm
Rudi Chen
@rudi-c
Jun 11 2015 19:11
The finalization ordering phase scans unreachable objects reachable from an object with a finalizer (that is itself also unreachable).
So it just happens that in django-template, a lot of objects that were only touched during the sweep phase are now touched during the mark phase.
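A simplified sketch of the two traversals described above, on a hypothetical object graph. It only illustrates which objects each pass touches; it is not the full PyPy ordering algorithm and not Pyston's code. All names (`mark`, `graph`, `has_finalizer`) are assumptions made for the example.

```python
def mark(graph, starts, marked):
    """Mark everything reachable from starts; return the newly visited objects."""
    visited = []
    stack = list(starts)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        visited.append(obj)
        stack.extend(graph.get(obj, ()))
    return visited


# Hypothetical object graph: name -> names it references.
graph = {
    "root": ["live"],
    "live": [],
    "fin": ["garbage_a"],        # unreachable, has a finalizer
    "garbage_a": ["garbage_b"],  # unreachable, kept alive until "fin" is finalized
    "garbage_b": [],
    "plain_dead": [],            # unreachable, no finalizer involved
}
has_finalizer = {"fin"}

marked = set()
# Regular mark phase: visits only live objects.
first_pass = mark(graph, ["root"], marked)

# Finalization ordering pass: starts from unreachable objects with finalizers.
pending = [o for o in graph if o in has_finalizer and o not in marked]
second_pass = mark(graph, pending, marked)
```

Here the second pass touches `fin`, `garbage_a`, and `garbage_b`, which the regular mark phase never visits, while `plain_dead` is left for the sweep.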
Chris Toshok
@toshok
Jun 11 2015 19:12
Right, initial mark phase only visits live objects. Turns out we have a lot of garbage :)
Rudi Chen
@rudi-c
Jun 11 2015 19:12
*mark phase of finalization ordering
Chris Toshok
@toshok
Jun 11 2015 19:13
We might be able to prune the ordering phase when we reach simple tpdealloc objects?
Rudi Chen
@rudi-c
Jun 11 2015 19:13
Yeah I think we should be able to do something there.
Btw are either of you coming for lunch?
Chris Toshok
@toshok
Jun 11 2015 19:15
Just outside now- had prenatal appt :)
Kevin Modzelewski
@kmod
Jun 11 2015 19:15
I don't think those reasons explain the 10x increase in calls to allocationFrom
no on lunch, have an errand :/
Rudi Chen
@rudi-c
Jun 11 2015 19:18
I need to double check this, but I believe the majority of allocated objects in django-template are dead by the time a GC pass occurs, but adding finalization ordering scans them. Scanning causes calls to allocationFrom.
It's only on that one test that there's this big increase.
Chris Toshok
@toshok
Jun 11 2015 19:19
there’s also the issue that allocationFrom is what’s used to determine if we’ve already visited an object, so mark bit/ordering state doesn’t keep us from having to do it, sometimes repeatedly on the same object
It gets hit in a small test case like this:
def test(obj):
    if not obj:
        return "yay"
    else:
        return "nope"

print test(str(x) for x in xrange(2))
I could add || generator_cls but I'm wondering if the whole assert is needed to begin with.
Marius Wachtler
@undingen
Jun 11 2015 20:15
Although I like the std::string -> BoxedString change rebasing was a pain...
Chris Toshok
@toshok
Jun 11 2015 20:16
yeah
Marius Wachtler
@undingen
Jun 11 2015 20:28
Today I mostly investigated the performance of the new benchmarks you added. Couldn't really see large easy gains. Will be an interesting journey to reach good perf :-)
Chris Toshok
@toshok
Jun 11 2015 20:29
i need to figure out a way to get a pycparser benchmark added
Kevin Modzelewski
@kmod
Jun 11 2015 22:12
@rudi-c that check is just to make sure that we don't forget to add a __nonzero__ method to something that should have one
the thinking being that it would be super hard to debug
so yeah it's kind of dumb that we are listing all of the builtin classes there
but it's been somewhat helpful. maybe we've already covered everything though and it could be removed.
andrewchambers
@andrewchambers
Jun 11 2015 22:34
pycparser would be a good benchmark, I think pypy was pretty slow with it previously
and used a crap load of ram.
I was playing with it before https://github.com/andrewchambers/pycc
and pypy really didn't help much vs cpython
that was 2 or 3 years ago though.
andrewchambers
@andrewchambers
Jun 11 2015 22:40
@toshok http://people.csail.mit.edu/smcc/projects/single-file-programs/ I tried to benchmark by taking one of these files
running through gcc -E
and manually fixing up any errors
you can get a really long and realistic test case
andrewchambers
@andrewchambers
Jun 11 2015 22:46
(with pycparser, not my cruddy thing)
Marius Wachtler
@undingen
Jun 11 2015 22:47
gcc... :-P
andrewchambers
@andrewchambers
Jun 11 2015 22:47
just for the preprocessor
clang -E ? lol
tcc ?
one of them
If you are mocking gcc, I agree.
Marius Wachtler
@undingen
Jun 11 2015 22:48
no I mean the gcc source from your link as benchmark :-D
andrewchambers
@andrewchambers
Jun 11 2015 22:48
lol Yeah, I have no idea how he did it
combined gcc into a single file, it works though
That single file can compile itself last time I checked
Use a bunch of space just to check it into your git :P
Marius Wachtler
@undingen
Jun 11 2015 22:49
Could also get used as benchmark for text editors
andrewchambers
@andrewchambers
Jun 11 2015 22:51
Yeah, good benchmark for anything, regex, grep, python, c compilers lexers
great resource
garbage collectors
just doing any processing on such a massive text file is interesting.
Marius Wachtler
@undingen
Jun 11 2015 23:05
  • testing how long a printer cartridge lasts :-D
andrewchambers
@andrewchambers
Jun 11 2015 23:06
Testing my sanity