These are chat archives for dropbox/pyston

7th
May 2015
Chris Toshok
@toshok
May 07 2015 00:28
cool, down to ~1.9s total that’s in the grab bag of us_timer_chosen_cf_body_builtins and us_timer_chosen_cf_body_jitted
out of a 27s runtime
Chris Toshok
@toshok
May 07 2015 00:35
woohoo @rudi-c
Rudi Chen
@rudi-c
May 07 2015 00:35
:D
Kevin Modzelewski
@kmod
May 07 2015 07:27
first commit and pull request #500 at the same time :)
Travis Hance
@tjhance
May 07 2015 08:01
I’m getting /home/tjhance/pyston/src/core/util.cpp:34:12: error: use of undeclared identifier '__rdtscp' return __rdtscp(&_unused);
I don’t know what this is
Kevin Modzelewski
@kmod
May 07 2015 08:02
glad I'm not the only one :P
I have a fix locally but I'll separate it out and push it
Travis Hance
@tjhance
May 07 2015 08:02
okay
Kevin Modzelewski
@kmod
May 07 2015 08:03
err, workaround, it sounds like it probably should be available / should be easy to make available
#502
if you want to try it now
Travis Hance
@tjhance
May 07 2015 08:05
ah ok
I should probably call it a night
but slots is so close..
Marius Wachtler
@undingen
May 07 2015 14:29
cpython return True for:
class C(object):
    pass
print C.__str__ is object.__str__
note the is identity comparison instead of == - cheetah uses this code. Now that I know what the issue is I also found a pypy mailinglist entry (pypy return False like we do) https://mail.python.org/pipermail/pypy-dev/2013-February/010882.html
what should we do about this?
Marius Wachtler
@undingen
May 07 2015 14:36
PyPy also has a wiki entry for this change: https://bitbucket.org/pypy/compatibility/wiki/cheetah
Marius Wachtler
@undingen
May 07 2015 15:04
@toshok thanks for creating the cool new timers
Travis Hance
@tjhance
May 07 2015 15:11
Well, why do wet not return true?
Chris Toshok
@toshok
May 07 2015 15:11
@kmod / @tjhance yeah, sorry about that rdtscp definedness problem.
the clang commit that added them is:
Date:   Thu Apr 24 18:26:35 2014 +0000

    [X86] Add Clang support for intrinsics __rdtsc and __rdtscp.
3.4 was released Jan 2014
Travis Hance
@tjhance
May 07 2015 15:17
actually come to think of it, why does cpython return true?
okay so we do the function/instancemethod thing
whereas cpython does the wrapper_descriptor thing
do we have support for wrapper_descriptors right now?
we do
Travis Hance
@tjhance
May 07 2015 15:25
oh, except we might use it only for the capi right now, and not support using it for pyston functions
Travis Hance
@tjhance
May 07 2015 15:32
i admit I have no clue what a wrapperdescriptor actually is
the source code is pretty unilluminating
Chris Toshok
@toshok
May 07 2015 15:53
BoxedWrapperDescriptor is a descriptor that wraps one of the tp_* methods, right?
it’s a callable object whose __call__ invokes the wrapped function
Travis Hance
@tjhance
May 07 2015 16:34
Oh I see
there are just so many things that wrap other things that wrapper_descriptor isn’t very informative:P
anyway I don’t see why we couldn’t do the same thing as cpython here
if that’s all it is
oh
unless the tp_* methods are capi functions
then we’d be needlessly going through the capi stuff?
is that true?
Chris Toshok
@toshok
May 07 2015 18:32
they are capi functions, if by that you mean C functions that participate in the return value / PyErr_ exception mechanism
Travis Hance
@tjhance
May 07 2015 18:32
yeah that’s what I meant
Chris Toshok
@toshok
May 07 2015 18:44
hm, i was going to use __has_builtin to check for the rdtscp intrinsic, but I suppose if we’re going to have to fall back to asm at all, might as well just always fall back
@kmod was the use of rdtsc (as opposed to rdtscp) intentional?
I suppose we’re going to run into the issue that cpus people want to run things on might not have rdtscp
Marius Wachtler
@undingen
May 07 2015 19:14
mmy sorry have no idea if we can do the same as cpython. But reusing the BoxedInstanceMethods would probably save a a few allocs.
for i in range(5):
    print id(object.__str__)

cpython:
139879813566704
139879813566704
139879813566704
139879813566704
139879813566704
pyston:
-2401053030601088133
-2401053030601088037
-2401053030601087941
-2401053030601087845
-2401053030601087749
Travis Hance
@tjhance
May 07 2015 19:27
I think we should make a version of wrapper_descriptor for pyston
so we can make everything the same type as it in cpython
so we’ll get all these quirks right
and save allocations, maybe
Chris Toshok
@toshok
May 07 2015 19:33
reusing BoxedInstanceMethods would save a huge number of allocations :)
it’s also entirely possible that django is checking for wrapper_descriptor equivalence someplace, and we’re doing extra work because of it
Chris Toshok
@toshok
May 07 2015 19:51
weird, do we throw a lot of exceptions during compilation?
i’m writing a little timer_diff.py script, and I ran it comparing cmake to non-cmake build (non-cmake includes the libgcc patch), and see:
GOOD us_timer_compileFunction, before 8195981932, after 2055958525, delta -6140023407 (approx -2.19 sec)
Kevin Modzelewski
@kmod
May 07 2015 19:57
you might be seeing object cache differences?
Chris Toshok
@toshok
May 07 2015 19:58
yeah, that’s just what I thought of - re-ran pyston_release twice in each config and that difference went away completely
awesome that the object cache gives us that much savings
Kevin Modzelewski
@kmod
May 07 2015 20:00
hmm so it's hard to make a CPython-style wrapper descriptor (which we do have) as efficient as our current builtins
since we don't get any information about the underlying function's calling convention
hmm maybe we could just not box those into instancemethods
Marius Wachtler
@undingen
May 07 2015 20:07
mmh I'm not sure what you mean by that. What would we instead return?
Would we just return the BoxedFunction?
Travis Hance
@tjhance
May 07 2015 20:30
Do we create a lot of boxed instance methods?
I would have guessed that's is not very common since we do the attribute call optimization of not creating one
Chris Toshok
@toshok
May 07 2015 20:37
django-template.py does
Kevin Modzelewski
@kmod
May 07 2015 20:38
sorry, I meant if it's something that would be a wrapper_descriptor in cpython, don't create unbound instance methods
since wrapper_descriptors have that behavior
Chris Toshok
@toshok
May 07 2015 20:40
alloc.instancemethod: 430229
allocsize.instancemethod: 13767328
i suppose that’s not that much
Kevin Modzelewski
@kmod
May 07 2015 20:41
a lot of these "number of times something happens" numbers are really hard for me to put in context :/
Chris Toshok
@toshok
May 07 2015 20:41
yeah, that’s the number of allocations over the entire program run :/
Marius Wachtler
@undingen
May 07 2015 20:48
how long is the total program run? Is this the ~30secs test?
Chris Toshok
@toshok
May 07 2015 20:49
yeah
Chris Toshok
@toshok
May 07 2015 20:56
do we have a policy of trying to match all cpython command line flags to ours?
I added a -U just for a hack to output us_timers_ in ticks, not us
Kevin Modzelewski
@kmod
May 07 2015 20:57
we're going to run out of letters soon :P
Chris Toshok
@toshok
May 07 2015 20:57
yeah, i can do this when an environment variable too
Kevin Modzelewski
@kmod
May 07 2015 20:57
I think at some point we'll need to bring them more in sync
and move the pyston-specific ones under some other flag
but for now I think it's ok if we just don't reuse the common python flags?
(the ones I know of we already support under the same flags as cpython)
Marius Wachtler
@undingen
May 07 2015 21:12
I have question concerning interpreting the new stats. When I see something like this:
us_timer_chosen_cf_body_builtins: 70057124
us_timer_chosen_cf_body_jitted: 2413863
This means we spend a lot of time inside non python functions. But when for example a c++ function calls internally a python function would the ticks spend inside the python code show up in the builtins count?
And does the us_timer_chosen_cf_body_jitted count also include time spend interpreting?
Chris Toshok
@toshok
May 07 2015 21:13
no, the interpreter should be listed separately (us_timer_astInterpret*, there are a couple)
those two timers you quote are the ones I’m spending time trying to split up
the _builtins one is invoking C++ through wrappers, basically
Michael Arntzenius
@rntz
May 07 2015 21:14
huh, I think I've found a corruption bug in libunwind. in a certain circumstance (when calling unw_get_proc_name on a function registered via _U_dyn_register with a format that's not UNW_INFO_FORMAT_DYNAMIC), it'll read a (dwarf_cie_info*) as if it were a (unw_dyn_info_t*), and consequently get garbage results.
Chris Toshok
@toshok
May 07 2015 21:14
and the _jitted one is directly invoking jitted code (hopefully this is always true)
from my gdb adventures last night it appears to be true
@undigen the top 4 targets of the call for chosen_cf_body_builtins are:
 120307 pyston::BoxedMethodDescriptor::__call__(pyston::BoxedMethodDescriptor*, pyston::Box*, pyston::BoxedTuple*, pyston::Box**)>
 127414 pyston::tupleContains(pyston::BoxedTuple*, pyston::Box*)>
 199648 pyston::BoxedCApiFunction::__call__(pyston::BoxedCApiFunction*, pyston::BoxedTuple*, pyston::BoxedDict*)>
1714714 pyston::BoxedWrapperObject::__call__(pyston::BoxedWrapperObject*, pyston::Box*, pyston::Box*)>
so it’s things like that
if the C++ then calls into a python function through some mechanism other than callCLFunc, it will add the time to that builtin, yes
but does that actually happen?
Marius Wachtler
@undingen
May 07 2015 21:17
ok thanks
mmh maybe with the RuntimeICs?
Chris Toshok
@toshok
May 07 2015 21:18
hm, good point. we should have timers around the runtime ic calls
Marius Wachtler
@undingen
May 07 2015 21:19
k
Chris Toshok
@toshok
May 07 2015 21:38
added ic timers. doesn’t look like we’re spending too much time beneath them:
us_timer_box_hasnextOrNullIC: 16496
us_timer_box_nextIC: 2934
us_timer_box_nonzeroIC: 5910
us_timer_box_reprIC: 45
are there other runtime ics? those were the most easily locateable ones :)
Marius Wachtler
@undingen
May 07 2015 22:02
builtins.cpp: extern "C" Box sum(Box container, Box* initial)
is missing
it's a BinopIC
Marius Wachtler
@undingen
May 07 2015 22:13

cool, cheetah on the first testsuite run (it looks like it runs the tests twice):

Ran 2138 tests in 84.925s
FAILED (failures=2)

second run:

Ran 2138 tests in 55.973s
FAILED (failures=4, errors=228)

and a few of them are unicode errors which I hope are already fixed with the '#coding' patch (haven't tried yet)

My guess is that it runs it twice to test some template caching mechanism
Kevin Modzelewski
@kmod
May 07 2015 22:29
nice that's pretty good :)
Chris Toshok
@toshok
May 07 2015 23:05
@undingen awesome, will add
Michael Arntzenius
@rntz
May 07 2015 23:19
in our inline caches, what exactly gets compared to determine whether the type matches?
because looking at this disassembly dump, it's looking like it's tp_mro
which doesn't make sense to me
Kevin Modzelewski
@kmod
May 07 2015 23:23
well, I believe we have to guard not just on the type but also on the inheritance tree of the type
so guarding on tp_mro should accomplish both that and the type itself?
(since the type is in the mro)
Michael Arntzenius
@rntz
May 07 2015 23:31
oh, and a type's mro could conceivably change, so we can't just guard on the type?
anyway, I'm mostly just checking that in fact we are guarding on the mro, and I am reading this disassembly right
is there a good way to build with valgrind under CMake? just using -DENABLE_VALGRIND=ON is producing an error
oh, I don't even have valgrind installed
Kevin Modzelewski
@kmod
May 07 2015 23:34
yeah in theory the mro can change
I don't think we support that yet
but I think it turns out to be as cheap to guard on the whole mro as on the type