These are chat archives for dropbox/pyston

18th Nov 2015
lesshaste
@lesshaste
Nov 18 2015 10:36
@undingen I found out why pypy is slow for this and it seems unrelated to the pyston issues
your speed up seems great.. do you think it is generally useful?
This message was deleted
Marius Wachtler
@undingen
Nov 18 2015 10:38
ok cool, I guess for pypy it's also only some small thing which introduces the bad perf, but I have no idea about pypy..
lesshaste
@lesshaste
Nov 18 2015 10:39
in pypy it is mostly that the tuple comparisons are just not jitted at all
increasing the range from 0..99 to -100..99 seems.. somewhat arbitrary :)
does this range need to be hardcoded?
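For context, the hot path in question looks roughly like this (a hypothetical reconstruction; the actual benchmark script isn't shown in this chat, and `count_smaller` is a made-up name). Lexicographic tuple comparison inside a tight loop is exactly the operation pypy reportedly wasn't jitting:

```python
# Hypothetical stand-in for the benchmark discussed above: the hot path
# is a lexicographic tuple comparison in a tight loop.
def count_smaller(pairs, limit):
    n = 0
    for p in pairs:
        if p < limit:  # tuple comparison on the hot path
            n += 1
    return n

pairs = [(i % 50, i % 7) for i in range(100000)]
print(count_smaller(pairs, (25, 3)))
```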
Marius Wachtler
@undingen
Nov 18 2015 10:40
The "cache negative numbers" part seems useful and we will probably add it. But the GC frequency change is not good. It's better to wait and see if we get a refcounting GC
yes I think it should stay hardcoded, because it's nothing the user should worry about, and increasing it would probably not speed things up that much (it only makes sense for a small range of numbers)
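The "cache negative numbers" idea is the same trick CPython uses for its small-int cache, just extended below zero. A minimal Python sketch of the scheme (`Box`, `box_int`, and the exact cache bounds are illustrative names/values, standing in for the interpreter's boxed integer objects):

```python
# Sketch of a boxed-integer cache that also covers small negative values.
class Box:
    """Stand-in for the interpreter's heap-allocated integer object."""
    def __init__(self, value):
        self.value = value

CACHE_LOW, CACHE_HIGH = -100, 99  # the hardcoded range discussed above
_cache = [Box(i) for i in range(CACHE_LOW, CACHE_HIGH + 1)]

def box_int(v):
    # Reuse a preallocated box for small values; allocate otherwise.
    if CACHE_LOW <= v <= CACHE_HIGH:
        return _cache[v - CACHE_LOW]
    return Box(v)
```

Because cached values always return the same object, boxing them costs no allocation, which is why the win is limited to a small, fixed range.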
lesshaste
@lesshaste
Nov 18 2015 10:43
ok.. I wasn't thinking of the user setting it but more if it could be dynamically changed.. but I am speaking beyond my knowledge so please feel free to ignore :)
I am pleased my benchmark may have inspired at least one pyston improvement
Marius Wachtler
@undingen
Nov 18 2015 10:43
:-)
lesshaste
@lesshaste
Nov 18 2015 10:44
do you mind if I ask a more general pyston question? I am a little confused by the aim of the project to be honest
is it a) to be faster than pypy for the type of code dropbox cares about (but maybe not for lots of other code that dropbox doesn't care about) and/or b) to be faster than cpython on a range of python code that pypy can't easily support?
because being faster than pypy in general seems very hard
or c).. something else :)
Marius Wachtler
@undingen
Nov 18 2015 10:48
So for us the dropbox codebase is the most important target. We noticed that pypy doesn't do very well on it
And we really require very good C API support, because the codebase also has a lot of C code
lesshaste
@lesshaste
Nov 18 2015 10:50
Thanks. Does that mean that supporting the many many 3rd party python modules implemented in C will be less painful?
Marius Wachtler
@undingen
Nov 18 2015 10:50
and we have the goal to be very compatible with cpython (both for python and C extensions) much more than pypy currently is
lesshaste
@lesshaste
Nov 18 2015 10:50
take as random examples, scipy or scikit-learn
pypy's main and possibly fatal flaw is that it is a huge amount of work to support third party python modules not written in pure python
it will be great if this doesn't turn out to be a fatal flaw of pyston
Marius Wachtler
@undingen
Nov 18 2015 10:51
yes that's the goal (that 3rd party libs work without changes - we mostly test only with the ones dropbox uses, but we are looking forward to contributions for other libs)
lesshaste
@lesshaste
Nov 18 2015 10:52
does dropbox use any of the scientific libraries including numpy etc?
Marius Wachtler
@undingen
Nov 18 2015 10:52
so I think pypy will always do better for some short small hot loop test case where a tracing jit does very very well
lesshaste
@lesshaste
Nov 18 2015 10:53
pypy does give 5-7x speedup in my experience for quite a lot of my simple algorithmic code which is very impressive
Marius Wachtler
@undingen
Nov 18 2015 10:53
but we saw the performance is much worse for large general libraries - just take a look at our django benchmark, for example: pypy is not that much faster there than cpython
lesshaste
@lesshaste
Nov 18 2015 10:53
but then I can't use any of the modules I want to use with it
I wasn't sure how to interpret the django benchmark. Are these short running jobs?
Marius Wachtler
@undingen
Nov 18 2015 10:54
and we saw similar (and even worse) behavior on large dynamic codebases
lesshaste
@lesshaste
Nov 18 2015 10:54
pypy can take a few seconds to warm up so any benchmark that is short will do badly for it
Marius Wachtler
@undingen
Nov 18 2015 10:55
10x version takes about 10secs
yes I know that, with pyston we also try to have a short warm-up time so that developers can use it while developing the software (long delays would be really bad)
So we want to support the use case of a long-running server process, plus frequent code changes and restarts on a dev machine
Marius Wachtler
@undingen
Nov 18 2015 11:02
and I'm surprised that pypy is unable to speed up your script, because I think that's something where pypy should do well. But like I said, they probably just don't handle a particular path in their tracer which is very important for this benchmark, and if they added it (no idea how complicated that is) I would expect it to get much faster than it is now.
lesshaste
@lesshaste
Nov 18 2015 11:09
I have an uncanny ability to write code that breaks things :) More seriously.. I basically write scientific or mathematical simulation code which I suppose is not the main use case of python
although arguably it is a use case for people who want python to run faster
Marius Wachtler
@undingen
Nov 18 2015 11:13
yes I think we need and should add pypy4 to the auto benchmark runs
and I would really like it if we supported more of numpy in the future, but I currently don't have time to work on it
Rudi Chen
@rudi-c
Nov 18 2015 11:16
I'm trying to get the test suite to run atm (read: run, not pass).
Marius Wachtler
@undingen
Nov 18 2015 11:16
I suspect the library should work really well with pyston in the future, and I really like that we should not need to fork it like pypy does...
oh nice!
Rudi Chen
@rudi-c
Nov 18 2015 11:17
But a pretty significant portion of the tests do pass already.
Marius Wachtler
@undingen
Nov 18 2015 11:17
:+1:
Rudi Chen
@rudi-c
Nov 18 2015 11:17
Though of course, those that don't are going to be the challenge.
lesshaste
@lesshaste
Nov 18 2015 11:19
@undingen it would be amazing if pyston supported numpy!
Marius Wachtler
@undingen
Nov 18 2015 11:22
and it would probably help to get more people interested in our project ;-)
lesshaste
@lesshaste
Nov 18 2015 11:26
yes!
Marius Wachtler
@undingen
Nov 18 2015 22:19
@kmod I too thought about having a fixed offset for the vregs from the base pointer, but then I was not sure how I could implement it. It should be possible with a custom pass which adjusts the frame layout llvm emits, but that doesn't sound like a very good approach. Do you have an idea for an easier way?
and tomorrow I will investigate the llvm-generated asm code to better understand what llvm does and what is slowing things down or speeding things up in the frame handling
Kevin Modzelewski
@kmod
Nov 18 2015 22:22
I don't know of an elegant way to do it
one option is to just do an alloca and then pass that as a stackmap arg
and then assert that the stackmap arg got passed as an rbp offset
we could even assert that it got passed as a specific offset
we do something similar to find the frame_info offset