These are chat archives for dropbox/pyston

30th Jun 2015
Chris Toshok
@toshok
Jun 30 2015 04:19
seen in gc_trace log: Pushing 0x2270011024
that’s not a valid gc pointer :/
Marius Wachtler
@undingen
Jun 30 2015 06:48
looks like a pointer from the large heap arena
Chris Toshok
@toshok
Jun 30 2015 06:48
yeah
Marius Wachtler
@undingen
Jun 30 2015 06:48
constexpr uintptr_t ARENA_SIZE = 0x1000000000L;
constexpr uintptr_t SMALL_ARENA_START = 0x1270000000L;
constexpr uintptr_t LARGE_ARENA_START = 0x2270000000L;
constexpr uintptr_t HUGE_ARENA_START = 0x3270000000L;
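For reference, a minimal sketch of classifying a pointer by arena using the constants quoted above; the enum and helper name are made up, not pyston's actual code:

#include <cstdint>

enum class ArenaKind { None, Small, Large, Huge };

ArenaKind arenaFor(uintptr_t p) {
    if (p >= SMALL_ARENA_START && p < SMALL_ARENA_START + ARENA_SIZE)
        return ArenaKind::Small;
    if (p >= LARGE_ARENA_START && p < LARGE_ARENA_START + ARENA_SIZE)
        return ArenaKind::Large;
    if (p >= HUGE_ARENA_START && p < HUGE_ARENA_START + ARENA_SIZE)
        return ArenaKind::Huge;
    return ArenaKind::None;
}
// arenaFor(0x2270011024) returns ArenaKind::Large, matching the log line above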
Chris Toshok
@toshok
Jun 30 2015 06:49
due to some changes I’d made to the GCAllocation header. going to add more asserts there :)
static_asserts that is
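The kind of compile-time check meant here, as a sketch; the GCAllocation field layout below is hypothetical, for illustration only:

#include <cstdint>

// hypothetical header layout; the real pyston struct differs
struct GCAllocation {
    uint8_t gc_flags;
    uint8_t kind_id;
    uint16_t _reserved;
    uint32_t kind_data;
};
// static_asserts catch layout regressions at compile time instead of
// surfacing later as bogus pointers in a gc trace
static_assert(sizeof(GCAllocation) == 8, "GCAllocation header changed size");
static_assert(alignof(GCAllocation) <= 8, "GCAllocation must not over-align user data");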
Marius Wachtler
@undingen
Jun 30 2015 06:52
what's your idea on why your DenseMap/Set change now has really bad perf? I wouldn't have expected that...
Chris Toshok
@toshok
Jun 30 2015 06:53
yeah, I haven’t dug into it at all since I got it rebased
i was kinda stumped, since earlier it showed better perf in the (different set of) benchmarks
one change I had to make for it was increasing the large bucket sizes available in the small arena
since DenseMap/DenseSet/etc. require more inline storage, it was pushing more types to the large arena
might be that again
Marius Wachtler
@undingen
Jun 30 2015 07:01
oh k
Travis Hance
@tjhance
Jun 30 2015 07:46
hm, I think that pyxl_bench is running out of rewrite slots. I saw pickEntryForRewrite return NULL
Kevin Modzelewski
@kmod
Jun 30 2015 11:00
Erf, looks like there was a travis-ci maintenance window
I was having a lot of issues with it around that time
so, just an fyi if anyone else was having issues :/
Chris Toshok
@toshok
Jun 30 2015 14:21
https://travis-ci.org/dropbox/pyston/jobs/68898867 threading_local.py strikes again
Chris Toshok
@toshok
Jun 30 2015 15:47
played around with gc frequency. turns out that once LargeArena::allocationFrom is fixed, django_template does pretty well with a 192 meg heap
requires only 3 collections for the run (if we collect at 85% of heap size)
Collection #1: heap size = 192.00M, pre-gc utilization ~= 163.20M, post-gc utilization = 9.12M, took 34ms
Collection #2: heap size = 192.00M, pre-gc utilization ~= 163.20M, post-gc utilization = 9.21M, took 29ms
Collection #3: heap size = 192.00M, pre-gc utilization ~= 163.20M, post-gc utilization = 9.18M, took 29ms
crazy amounts of garbage - 9M survives each collection
Rudi Chen
@rudi-c
Jun 30 2015 17:41
What are the perf changes?
Chris Toshok
@toshok
Jun 30 2015 17:43
for 192 it’s ~4% faster, but other benchmarks are negatively affected
i made things more dynamic (based on heap occupancy after a collection, as well as the time between collections), and got it to this:
                           88eedd39ce0745f6e0:  35525e019e1221d9a7:
       django_template.py             4.7s (2)             4.4s (4)  -6.2%
            pyxl_bench.py             3.9s (2)             4.0s (4)  +1.8%
sqlalchemy_imperative2.py             5.1s (2)             5.0s (4)  -1.1%
        django_migrate.py             1.8s (2)             1.8s (4)  -2.3%
      virtualenv_bench.py             7.8s (2)             7.6s (4)  -3.1%
                  geomean                 4.2s                 4.1s  -2.2%
Rudi Chen
@rudi-c
Jun 30 2015 17:44
That seems pretty good
Chris Toshok
@toshok
Jun 30 2015 17:44
the +1.8% on pyxl_bench.py seems to be because while it allocates some large objects, it doesn’t allocate enough of them to outweigh the cost of building the optimized lookup vector
at the start of each collection it walks all the large and huge objects and puts them in sorted vectors; ::allocationFrom uses those vectors
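A minimal sketch of that lookup scheme, sorted vector plus binary search; the struct and variable names are illustrative, not the actual pyston code:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct LargeObj { uintptr_t start; size_t size; };
// rebuilt and sorted by start address at the beginning of each collection
std::vector<LargeObj> sorted_objs;

LargeObj* allocationFrom(uintptr_t ptr) {
    // find the first object whose start is greater than ptr...
    auto it = std::upper_bound(sorted_objs.begin(), sorted_objs.end(), ptr,
        [](uintptr_t p, const LargeObj& o) { return p < o.start; });
    if (it == sorted_objs.begin())
        return nullptr; // ptr is below every object
    --it; // ...so the candidate is the object just before that
    return ptr < it->start + it->size ? &*it : nullptr;
}

That makes each interior-pointer lookup O(log n) instead of a linear walk, at the cost of building and sorting the vector every collection, which is presumably what pyxl_bench isn’t amortizing.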
Travis Hance
@tjhance
Jun 30 2015 17:45
why is it so good for django_template
Chris Toshok
@toshok
Jun 30 2015 17:45
large heap size + fast LargeArena::allocationFrom
pyxl_bench allocates on the order of a couple hundred large objects
django_template is like 12000
per collection
large heap size means drastically fewer collections. django_template normally does 88 gcs. with this patch it does 8
there are a few knobs to twiddle that are #defines at the moment. almost want to make those the defaults but allow them to be overridden either on the command line or via env vars
the knobs for that gist are: heap_occupancy_post_gc: 0.9 , quick_gc_frequency_ms: 500, number_of_quick_gcs: 2, initial_heap_size_megs: 32
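As a sketch, those knobs as #defines with an env-var escape hatch; everything except the four knob names and values above is invented:

#include <cstdlib>

#define HEAP_OCCUPANCY_POST_GC 0.9
#define QUICK_GC_FREQUENCY_MS 500
#define NUMBER_OF_QUICK_GCS 2
#define INITIAL_HEAP_SIZE_MEGS 32

// read an override from the environment, falling back to the #define
static double knob(const char* env_name, double dflt) {
    const char* s = std::getenv(env_name);
    return s ? std::atof(s) : dflt;
}
// e.g.: double occupancy = knob("PYSTON_GC_OCCUPANCY", HEAP_OCCUPANCY_POST_GC);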
Rudi Chen
@rudi-c
Jun 30 2015 17:50
These seem to be useful knobs to have.
I always hear about "tuning the GC" from Java developers.
Chris Toshok
@toshok
Jun 30 2015 17:51
actually heap_occupancy_post_gc is both the trigger for heap resizing and the % of the heap that needs to be filled before we gc
should probably be split into two knobs
Travis Hance
@tjhance
Jun 30 2015 17:51
won’t we just end up “overfitting” to these particular benchmarks by twiddling those knobs?
Chris Toshok
@toshok
Jun 30 2015 17:51
and it’s also the amount we grow the heap when we do.
there’s definitely the desire to do that - I mean I was doing that here with the #defines :)
Marius Wachtler
@undingen
Jun 30 2015 17:53
what's number_of_quick_gcs?
Chris Toshok
@toshok
Jun 30 2015 17:53
“tuning the gc” wrt java is one of the things I don’t really like about the language, but it’s hard to optimize for every workflow
the number of gcs in a row that must each occur within quick_gc_frequency_ms of the previous one
Rudi Chen
@rudi-c
Jun 30 2015 17:54
What is the "pyston (calibration)" benchmark? If it's more than +/-1%, does it mean I should adjust the value of the other benchmarks mentally in one direction or another?
Chris Toshok
@toshok
Jun 30 2015 17:54
right now it’s 2 and 500, so if 2 gcs in a row each happen less than 500ms after the previous one, we resize the heap
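Sketched in code, that heuristic could look like this; the function and counter names are hypothetical, and the two #defines repeat the knob values above:

#include <cstdint>

#define QUICK_GC_FREQUENCY_MS 500
#define NUMBER_OF_QUICK_GCS 2

extern void growHeap(); // hypothetical: raises the heap size limit

static int quick_gcs_in_a_row = 0;
static uint64_t last_gc_ms = 0;

void onCollectionStart(uint64_t now_ms) {
    // a "quick" gc is one that starts soon after the previous one
    if (last_gc_ms != 0 && now_ms - last_gc_ms < QUICK_GC_FREQUENCY_MS)
        quick_gcs_in_a_row++;
    else
        quick_gcs_in_a_row = 0;
    last_gc_ms = now_ms;

    // back-to-back quick gcs suggest the heap is too small: grow it
    if (quick_gcs_in_a_row >= NUMBER_OF_QUICK_GCS) {
        growHeap();
        quick_gcs_in_a_row = 0;
    }
}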
Marius Wachtler
@undingen
Jun 30 2015 17:54
k
Rudi Chen
@rudi-c
Jun 30 2015 17:54
The best way to avoid overfitting is to have more data :D (aka more benchmarks)
Travis Hance
@tjhance
Jun 30 2015 17:54
:D
Chris Toshok
@toshok
Jun 30 2015 17:55
we’ll eventually overfit to all python applications
it’s a slippery slope
i’d love to have more single process benchmarks
right now we only have 3
Travis Hance
@tjhance
Jun 30 2015 17:59
the multiprocess ones are annoying to debug
Rudi Chen
@rudi-c
Jun 30 2015 17:59
I know right.
But they're the only ones that uncover obscure edge cases...
Travis Hance
@tjhance
Jun 30 2015 17:59
maybe we should add an option to all our multiprocess benchmarks to run all the subprocesses in gdb
i think that would solve most problems
Chris Toshok
@toshok
Jun 30 2015 18:00
if there’s a way to specify a command in a more general way, i’d be all over that
i got exactly nowhere trying to figure out why my run allocator caused a 10% slowdown in virtualenv_bench :(
Rudi Chen
@rudi-c
Jun 30 2015 18:01
It's tricky because some subprocess calls redirect stdout/stderr so I need to change that argument manually when I want to run GDB.
Travis Hance
@tjhance
Jun 30 2015 18:02
monkeypatch Popen
to check an environment variable or something
Marius Wachtler
@undingen
Jun 30 2015 22:43
@kmod One thing I forgot to document is the strange continue_jmp_offset member, which gets set in emitJump
I always generate a jump at the end of a block to the next block / interpreter exit (ok, not always: if it’s a return statement, then not)
Marius Wachtler
@undingen
Jun 30 2015 22:49
if we JIT the following block we can just overwrite that jump and fall through. But we don’t know in advance whether JITing the next block will succeed. that’s why I emit the jump anyway, but save the number of bytes emitted in the continue_jmp_offset member. The next block then starts before the jump (startJITing’s second arg), so it overwrites the jump if it succeeds.
hope the text makes more sense after the edit :-D
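A sketch of that patch-the-trailing-jump scheme; everything except continue_jmp_offset, emitJump, and startJITing is invented:

// every JITed block ends with a jump to the next block / interpreter exit,
// and remembers how many bytes that trailing jump took (set in emitJump)
struct BlockCode {
    char* code_end;          // one past the last emitted byte
    int continue_jmp_offset; // size of the trailing jump in bytes
};

// if JITing the successor succeeds, it starts emitting at
// code_end - continue_jmp_offset, i.e. on top of the old jump, so control
// falls through; if it fails, the original jump is still intact and
// execution goes back to the interpreter
char* successorEmitPoint(const BlockCode& prev) {
    return prev.code_end - prev.continue_jmp_offset;
}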
Chris Toshok
@toshok
Jun 30 2015 23:04
haircut time, back online from home afterwards