These are chat archives for dropbox/pyston

13th
Jul 2015
Sun
@Daetalus
Jul 13 2015 06:13 UTC
Hi there,
Travis Hance
@tjhance
Jul 13 2015 06:13 UTC
hello!
Sun
@Daetalus
Jul 13 2015 06:16 UTC
I used the latest master branch to build and test, but analysis_unittest failed. Any help? The output seems a bit long, but I can paste it here if you'd like.
Travis Hance
@tjhance
Jul 13 2015 06:16 UTC
yeah can you paste it here?
Sun
@Daetalus
Jul 13 2015 06:17 UTC
No problem.
4: analysis_unittest: ../../src/codegen/osrentry.h:34: pyston::OSREntryDescriptor::OSREntryDescriptor(pyston::CLFunction*, pyston::AST_Jump*): Assertion `clfunc' failed.
4: 0 analysis_unittest 0x00000000016113d8 llvm::sys::PrintStackTrace(_IO_FILE*) + 40
4: 1 analysis_unittest 0x000000000161296b
4: 2 libpthread.so.0 0x00002abb5207e340
4: 3 libc.so.6 0x00002abb52ae2cc9 gsignal + 57
4: 4 libc.so.6 0x00002abb52ae60d8 abort + 328
4: 5 libc.so.6 0x00002abb52adbb86
4: 6 libc.so.6 0x00002abb52adbc32
4: 7 analysis_unittest 0x000000000062226e doOsrTest(bool, bool) + 8862
4: 8 analysis_unittest 0x0000000000b072a1 testing::Test::Run() + 785
4: 9 analysis_unittest 0x0000000000b08b00 testing::TestInfo::Run() + 816
4: 10 analysis_unittest 0x0000000000b0923f testing::TestCase::Run() + 447
4: 11 analysis_unittest 0x0000000000b12246 testing::internal::UnitTestImpl::RunAllTests() + 1350
4: 12 analysis_unittest 0x0000000000b11cea testing::UnitTest::Run() + 106
4: 13 analysis_unittest 0x0000000000b24726 main + 70
4: 14 libc.so.6 0x00002abb52acdec5 __libc_start_main + 245
4: 15 analysis_unittest 0x000000000061efc9
4/8 Test #4: analysis_unittest ................***Exception: Other 0.04 sec
test 5
Mint 64 bit.
gcc 4.8.4
Travis Hance
@tjhance
Jul 13 2015 06:23 UTC
hmm I get the same thing, actually
Sun
@Daetalus
Jul 13 2015 06:25 UTC
I saw there were some clfunc-related commits recently. Maybe one of those commits broke it?
Travis Hance
@tjhance
Jul 13 2015 06:29 UTC
that’s weird, the unittest pretty explicitly passes NULL to the constructor of OSREntryDescriptor, which asserts that its argument is non-NULL
i wonder if this test is even enabled for our main ci build
oh actually it looks like the non-NULL assertion was added just 3 days ago
maybe somebody was too lazy to run the tests before merging this in :P
Sun
@Daetalus
Jul 13 2015 06:37 UTC
Would you mind pointing out which commit introduced it, please? I could not find it...
Travis Hance
@tjhance
Jul 13 2015 06:37 UTC
dropbox/pyston@41c0273
Sun
@Daetalus
Jul 13 2015 06:38 UTC
Thanks!
Kevin Modzelewski
@kmod
Jul 13 2015 08:14 UTC
oh drat
I really need to resurrect that change that has travis-ci also test in Debug mode
currently it only tests in Release mode and we get issues like this
Travis Hance
@tjhance
Jul 13 2015 08:15 UTC
Ooohhhhh
Sun
@Daetalus
Jul 13 2015 08:16 UTC
Hi Kevin, do you mind reviewing #683 if you have time? I believe it is quite short...
Kevin Modzelewski
@kmod
Jul 13 2015 08:23 UTC
ok I'll take a look
thanks for the report, I just opened #695 :)
Sun
@Daetalus
Jul 13 2015 08:24 UTC
Any time.
Kevin Modzelewski
@kmod
Jul 13 2015 08:33 UTC
looks pretty good :) just had one small comment on it
Sun
@Daetalus
Jul 13 2015 08:34 UTC
Got it! Thanks!
Kevin Modzelewski
@kmod
Jul 13 2015 08:34 UTC
btw you can ignore the travis-ci failure it reported -- that was because one of our external dependencies got updated and broke that test
we've since revision-locked that dependency, so the failure should go away the next time you update the PR
Sun
@Daetalus
Jul 13 2015 08:36 UTC
Ok, could I ask some other questions here? Maybe these questions seem stupid...
Marius Wachtler
@undingen
Jul 13 2015 08:44 UTC
if you tell me the question I may be able to help :-)
Kevin Modzelewski
@kmod
Jul 13 2015 08:47 UTC
you should definitely feel free to ask questions here :)
Sun
@Daetalus
Jul 13 2015 08:51 UTC
I am working on the pow issue, such as pow(3l, 3l, -8), and have already made some progress; most of the test_pow tests pass now. But I don't know how to let longPow accept a float parameter; pow(long, float, mod) will say TypeError: unsupported operand type(s) for ** or pow(): 'long' and 'float'.
Kevin Modzelewski
@kmod
Jul 13 2015 09:18 UTC
it looks like in CPython, long**float is handled by casting the long to a float
in their float_pow
I guess that makes sense, since the result will be a float
I'm not quite sure about the ternary form; looks like pow(long, float, mod) throws an exception in cpython
TypeError: pow() 3rd argument not allowed unless all arguments are integers
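That coercion is easy to check against stock CPython; a minimal sketch, shown under Python 3, where int plays the role of Python 2's long:

```python
# Binary pow with a float operand coerces the int to float, as described above.
print(pow(3, 2.0))  # 9.0

# The ternary form rejects non-integer arguments.
try:
    pow(3, 2.0, 5)
except TypeError as e:
    print(e)  # pow() 3rd argument not allowed unless all arguments are integers
```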
do you have a branch up?
Sun
@Daetalus
Jul 13 2015 09:19 UTC
Thanks, I will investigate it more. And #683 has been updated.
I already added those exceptions, including ZeroDivisionError: 0.0 cannot be raised to a negative power in intPow.
I created separate branches for the min_max, pow, and long/float issues.
Marius Wachtler
@undingen
Jul 13 2015 15:08 UTC
Ok I think I've finally found why the protobuf test sometimes fails when we start to rewrite more stuff...
Marius Wachtler
@undingen
Jul 13 2015 16:10 UTC
protobuf at some point calls collections.defaultdict(arg). We rewrite this but forget to add the call to defaultdict.tp_init. This means it will succeed on the first call, because during the rewrite we call it, but the generated code does not contain the call...
I think the issue is in typeCallInner() types.cpp:922
// If we weren't passed the args array, it's not safe to index into it
if (passed <= 2)
        initrtn = runtimeCallInternal(init_attr, NULL, init_argspec, arg2, arg3, NULL, NULL, keyword_names);
else
        initrtn = runtimeCallInternal(init_attr, NULL, init_argspec, arg2, arg3, args[0], &args[1], keyword_names);
I think we either have to abort the rewrite or pass a rewriter to runtimeCallInternal
Can somebody who is more familiar with the code have a look and tell me what the right fix is?
Marius Wachtler
@undingen
Jul 13 2015 16:15 UTC
and damn, it took me more than half the day to find this bug.
Travis Hance
@tjhance
Jul 13 2015 16:56 UTC
well, aborting the rewrite will definitely work
hm it looks like processDescriptor doesn’t deal with rewriting right now
it looks like that’s the main reason this case isn’t being rewritten
I’m not sure if we want to add rewriting capabilities to processDescriptor, or if we just want to emit a call to processDescriptor in the assembly
Chris Toshok
@toshok
Jul 13 2015 17:01 UTC
processDescriptor calls into python, doesn’t it?
Travis Hance
@tjhance
Jul 13 2015 17:02 UTC
yeah, but that doesn’t prohibit rewriting a call to it
another processDescriptor usage along with in-rewriter, but calling runtimeCallInternal with NULL rewrite args?
oh, that one doesn’t commit the rewriter
Travis Hance
@tjhance
Jul 13 2015 17:04 UTC
why does it have two different code paths depending on whether or not there is a rewriter
if neither one does rewriting?
Chris Toshok
@toshok
Jul 13 2015 17:05 UTC
line 2011-ish? looks to be using rewriter existence as some different state
Travis Hance
@tjhance
Jul 13 2015 17:05 UTC
whaaat
wait that doesn’t even make sense
Marius Wachtler
@undingen
Jul 13 2015 17:06 UTC
wow that's confusing
Chris Toshok
@toshok
Jul 13 2015 17:09 UTC
yeah, but tp_setattr/tp_setattro rules + trying to rewrite everything
Rudi Chen
@rudi-c
Jul 13 2015 19:03 UTC
Are we having a standup meeting?
Marius Wachtler
@undingen
Jul 13 2015 19:05 UTC
I'm ready
Travis Hance
@tjhance
Jul 13 2015 19:06 UTC
hi
Chris Toshok
@toshok
Jul 13 2015 19:07 UTC
:wave:
Marius Wachtler
@undingen
Jul 13 2015 19:09 UTC
@tjhance am I right that if I call rewriter::alloca(n bytes) to get scratch space, this scratch space will be marked as used forever?
Travis Hance
@tjhance
Jul 13 2015 19:10 UTC
yeah we don’t collect it at the moment
I was too lazy
Marius Wachtler
@undingen
Jul 13 2015 19:10 UTC
or will it get released when the RewriterVar* has no uses?
Travis Hance
@tjhance
Jul 13 2015 19:10 UTC
in principle we could make it get released when the RewriterVar* does
Marius Wachtler
@undingen
Jul 13 2015 19:10 UTC
ok, I will probably change that, because the bjit uses it more often
Travis Hance
@tjhance
Jul 13 2015 19:10 UTC
we should do that, if you’re running out of scratch space
Marius Wachtler
@undingen
Jul 13 2015 19:10 UTC
I do :-D
Travis Hance
@tjhance
Jul 13 2015 19:16 UTC
So I haven't really been paying attention with the baseline jit, what kinds of things does it rewrite? I thought it was just rewriting the same things we were already rewriting
Chris Toshok
@toshok
Jul 13 2015 19:16 UTC
oh yuck, capsule
/* Wrap void * pointers to be passed between C modules */
also, there really needs to be a :facepalm:
Travis Hance
@tjhance
Jul 13 2015 19:18 UTC
?
Marius Wachtler
@undingen
Jul 13 2015 19:18 UTC
yes, it doesn't rewrite more stuff than the LLVM tier. The only difference is that the LLVM tier may skip some patchpoints because it can figure out the function if the types are static. I don't know why this defaultdict bug only shows up sometimes...
Chris Toshok
@toshok
Jul 13 2015 19:19 UTC
@tjhance PyCapsule
@rudi-c: fannkuch creates a ton of capsule objects, right?
@tjhance essentially a wrapper around arbitrary data, but with instance- (not class-) level finalizers
Rudi Chen
@rudi-c
Jul 13 2015 19:21 UTC
yeas
*yes
Chris Toshok
@toshok
Jul 13 2015 19:21 UTC
if that’s possible anyway :/
Travis Hance
@tjhance
Jul 13 2015 19:22 UTC
Instance level finalizers?
Chris Toshok
@toshok
Jul 13 2015 19:22 UTC
one type for the instances that have NULL destructors
it might be worth splitting capsule into two types
since that seems to be the bulk of the uses in from_cpython/Modules
Marius Wachtler
@undingen
Jul 13 2015 19:24 UTC
what line in fannkuch creates these capsules?
Rudi Chen
@rudi-c
Jul 13 2015 19:24 UTC
It's not explicit in the code
Chris Toshok
@toshok
Jul 13 2015 19:24 UTC
yeah I was just looking - I didn’t see any place other than possibly time.time()
Rudi Chen
@rudi-c
Jul 13 2015 19:24 UTC
but when I was printing the types that pass through the GC, I saw a lot of PyCapsules
Kevin Modzelewski
@kmod
Jul 13 2015 19:24 UTC
I think PyArg_ParseTuple creates them
also, are people still around for the standup?
sorry I slept through it :(
Chris Toshok
@toshok
Jul 13 2015 19:25 UTC
think everyone’s still around that was around :)
Travis Hance
@tjhance
Jul 13 2015 19:25 UTC
I'm here
Rudi Chen
@rudi-c
Jul 13 2015 19:25 UTC
yup
Kevin Modzelewski
@kmod
Jul 13 2015 19:26 UTC
ok cool
so apparently the idea is to have each person answer these three questions
  • What did you do yesterday?
  • What will you do today?
  • Are there any impediments in your way?
(I guess modified for our weekly schedule)
the thing I don't understand is that people are very explicit that this is not supposed to be a "status meeting"
they say that it's because of the peer-to-peer, self-organized nature of it
personally I think it just sounds like marketing :P
Marius Wachtler
@undingen
Jul 13 2015 19:27 UTC
:-D
Travis Hance
@tjhance
Jul 13 2015 19:27 UTC
Main impediment is that the code review cycle seems pretty long
Chris Toshok
@toshok
Jul 13 2015 19:28 UTC
:+1: more reviewers
Kevin Modzelewski
@kmod
Jul 13 2015 19:28 UTC
man I wish gitter had "so-and-so is typing" notifications
not sure how we're supposed to self-organize this effectively over im
Rudi Chen
@rudi-c
Jul 13 2015 19:29 UTC
Let's assign an ordering
who wants to go first?
Kevin Modzelewski
@kmod
Jul 13 2015 19:29 UTC
ok travis first, then alphabetical: travis-kmod-rudic-toshok-undingen
since he already started
Travis Hance
@tjhance
Jul 13 2015 19:30 UTC
Well I have rewrite stuff out, and a little bit of in progress stuff like rewriting function calls that pass defaults
Umm I probably won't be able to take on a Big Project like the templating exceptions thing, sorry kmod :/
Kevin Modzelewski
@kmod
Jul 13 2015 19:31 UTC
ok np :)
Travis Hance
@tjhance
Jul 13 2015 19:32 UTC
Also I put a fix up for the live outs thing
But I think we can do better by just removing the live outs concept from the rewriter and spilling everything beforehand
But I haven't attempted to do that yet
Although maybe that isn't true, if, say, it's preserve-all and the IC assembly is short
Kevin Modzelewski
@kmod
Jul 13 2015 19:35 UTC
ok, interesting
and you said the main impediment is code review?
Travis Hance
@tjhance
Jul 13 2015 19:36 UTC
Yeah (it doesn't help that I work at odd hours, and so sporadically)
What's interesting? I don't think I said anything we didn't already discuss
Kevin Modzelewski
@kmod
Jul 13 2015 19:37 UTC
ok cool
Travis Hance
@tjhance
Jul 13 2015 19:38 UTC
Anyway that's my "standup" I guess
Kevin Modzelewski
@kmod
Jul 13 2015 19:38 UTC
I think we're supposed to keep the bulk of the discussion of individual topics to after the end of the standup, so shall we move on?
ok cool :)
Travis Hance
@tjhance
Jul 13 2015 19:38 UTC
sits down
Kevin Modzelewski
@kmod
Jul 13 2015 19:38 UTC
hey no you need to keep standing for everyone else's :)
anyway, mine is
Travis Hance
@tjhance
Jul 13 2015 19:39 UTC
stands up
Kevin Modzelewski
@kmod
Jul 13 2015 19:39 UTC
  • I worked on fixing + reenabling the type speculation + specialization, and was able to get pretty good wins on some microbenchmarks. they didn't seem to translate to wins on the macrobenchmark
(we are faster on for loops again!)
  • I'm going to look into reducing the overhead of the llvm optimization passes (I think we have a lot in there we don't need) and fix some issues that made this non-pushable for now
  • not really an "impediment" but the more work we can do on the baseline-jit stuff (and the more code we can keep there), the more aggressive we can be with the llvm tier
(done with update)
Travis Hance
@tjhance
Jul 13 2015 19:42 UTC
Nice bullet points
Really organized
Kevin Modzelewski
@kmod
Jul 13 2015 19:44 UTC
hmm maybe we should have explicit handoffs
@rudi-c you're up :)
Rudi Chen
@rudi-c
Jul 13 2015 19:44 UTC
So last week I finished up the finalizer and slicing stuff. For the slicing code, unless there are any more comments, it should be ready to be merged (#536)
For finalizers, @toshok is reviewing, mostly small changes.
Just so I know, did anyone else want to take a look at those changes?
So I know whether to wait on anybody else once I'm done with the tidying up.
Kevin Modzelewski
@kmod
Jul 13 2015 19:45 UTC
I'd like to take a look :) hopefully won't block it though
Rudi Chen
@rudi-c
Jul 13 2015 19:46 UTC
Ok. The slicing stuff especially I'd like to get it off my mind :)
Other than that I'm looking into compacting/moving/mostly-moving GCs.
Travis Hance
@tjhance
Jul 13 2015 19:46 UTC
Ooooooh
Rudi Chen
@rudi-c
Jul 13 2015 19:47 UTC
Still just started reading though.
Reading is hard -_-. I have a tendency to fall asleep all the time everywhere (but especially when reading).
Hmm that's all I have for now.
@toshok?
Chris Toshok
@toshok
Jul 13 2015 19:48 UTC
yo
Travis Hance
@tjhance
Jul 13 2015 19:48 UTC
lo
Chris Toshok
@toshok
Jul 13 2015 19:49 UTC
as @rudi-c mentioned, I’m reviewing GC changes
i’ve also got a few PR’s that I want to update to be directly mergeable.
other than that I was planning to steal @dagar’s jemalloc PR and hopefully finish it up, and work with jason evans to get him able to run pyston against jemalloc so he can fix the perf regressions I found in 4.0
I’d like to echo the code review impediment - not that it’s really an impediment, just that I’d love comments from everyone :)
that’s all I have
toshok @toshok tags @undingen
Travis Hance
@tjhance
Jul 13 2015 19:52 UTC
Yeah getting comments makes one feel fuzzy and warm
Marius Wachtler
@undingen
Jul 13 2015 19:52 UTC
  • I worked on speeding up the baseline jit, both by generating code faster and by making the generated code faster. Most importantly, I switched from using runtime ICs to directly emitting patchpoints (>6 arg support).
    Today I mostly spent tracking down a rewriter bug which I had encountered sporadically for about a week but could always reproduce with the new patchpoint baseline jit patch.
  • I'm planning to spend at least one more day on improving the baseline jit, because I expect to still find low-hanging fruit; it's still quite significantly slower than the LLVM tier (but getting much closer :-)), and I was able to improve the overall benchmark perf quite a bit with this.
    More precisely, I plan to improve ASTInterpreter::initArguments, baseline jit boxBool(), and branch handling next (remove boxBools calls + fewer helper calls etc).
Travis Hance
@tjhance
Jul 13 2015 19:53 UTC
Oh damn Marius knows how it’s done
Marius Wachtler
@undingen
Jul 13 2015 19:53 UTC
  • impediment: currently none. Before, definitely tracking down the rewriter bug inside the integration test... but disabling patchpoints one by one and adding dump()-call rewriter actions definitely helped.
Chris Toshok
@toshok
Jul 13 2015 19:53 UTC
getting schooled in our first standup
Marius Wachtler
@undingen
Jul 13 2015 19:53 UTC
you destroyed the formatting!!! :-P
I'm done...
@kmod Would it make sense to add code to track types already inside the baseline jit
?
Kevin Modzelewski
@kmod
Jul 13 2015 19:55 UTC
ok cool that's a wrap :)
I guess now we can "sit down" and start discussing things in more detail... though in gitter it looks the same :P
Marius Wachtler
@undingen
Jul 13 2015 19:55 UTC
so that the first LLVM tier can already do type speculation once you added deopt
Travis Hance
@tjhance
Jul 13 2015 19:56 UTC
sits down
Kevin Modzelewski
@kmod
Jul 13 2015 19:56 UTC
yeah part of my change is to have the baseline jit do type speculations :)
err, type recording
Marius Wachtler
@undingen
Jul 13 2015 19:56 UTC
k
Kevin Modzelewski
@kmod
Jul 13 2015 19:56 UTC
I think we will eventually get to the point that we have just a single llvm tier
the issue with that right now is that the optimizations we run are too expensive
and I think we want to run those to get the most out of the type speculation
so if we have just one llvm tier that has optimizations turned on, perf is worse since we spend too much time doing the optimizations
if we have just one llvm tier that has optimizations turned off, that actually helps perf for now :P
but for now I just left it as the same llvm-noopt -> llvm-opt strategy
I'm going to work on getting the optimizations cost down though
have you tried the investigate_stat_timer "profiler"?
I think that could potentially help pin down where the time "in_baseline_jit" is getting spent
Marius Wachtler
@undingen
Jul 13 2015 20:01 UTC
yes I began to use it and it helps
Travis Hance
@tjhance
Jul 13 2015 20:01 UTC
kmod could you give us a crash course in type recording/type speculation/type analysis and what all these things mean in pyston and how they interact
like for example, at what point, if any, do we get to not add in type guards?
Rudi Chen
@rudi-c
Jul 13 2015 20:02 UTC
+1
Marius Wachtler
@undingen
Jul 13 2015 20:02 UTC
btw: I, too, meant type recording inside the baseline jit, not speculation...
Kevin Modzelewski
@kmod
Jul 13 2015 20:03 UTC
ok so our llvm tier has a pretty powerful type-analysis system in it
but we haven't been making much use of it
because we haven't been feeding it any type information lately
so it just says "well the argument types are unknown, and we don't know anything else, so I guess all the types in this function are unknown"
there are two ways we have of inputting type info into this process
the first is argument-based specialization; ie we will compile a version of a function for a specific set of types (think: min() specialized for taking ints)
so at that point we know the argument types, and we can infer other things
the other way is through type speculation
which is where we say "hey the result of this expression has typically had the same type when we've hit it, let's speculate that it will continue to have that type"
for this we need an earlier "type recording" phase where we record the types of the results
that used to be in our lower-cost llvm tiers, and I think it makes sense now to have it happen in the baseline jit
so anyway, now we have some more decent type info to play with at jit-time
we used to have the concept of "guaranteed classes", where the jit could tell the rewriter "hey this argument is known to be of this class, you don't need to guard on it"
I think that got removed at some point but if you grep through there are still some places it's referenced
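The record-then-speculate loop described above could be sketched roughly like this; TypeRecorder, the streak counter, and the threshold are illustrative inventions for this sketch, not Pyston's actual classes:

```python
class TypeRecorder:
    """Toy sketch: remember the last type observed at an expression site
    and how many times in a row it has been seen."""

    def __init__(self, threshold=10):
        self.last_type = None
        self.count = 0
        self.threshold = threshold

    def record(self, value):
        t = type(value)
        if t is self.last_type:
            self.count += 1
        else:
            self.last_type, self.count = t, 1
        return value

    def speculation(self):
        # Only worth emitting a type guard once the type looks stable.
        return self.last_type if self.count >= self.threshold else None

r = TypeRecorder(threshold=3)
for v in [1, 2, 3, 4]:
    r.record(v)
print(r.speculation())  # <class 'int'>
```

A jit tier would consult `speculation()` when compiling: a non-None result means "guard on this type and specialize", None means "stay generic".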
Travis Hance
@tjhance
Jul 13 2015 20:08 UTC
when was this? I don’t remember it at all
(and I find it hard to believe that you had some advanced feature like this on “manually emit assembly in objmodel” rewriter1 :P)
Kevin Modzelewski
@kmod
Jul 13 2015 20:10 UTC
I think it might have been in the pre-open-source days
so maybe I am making it up :)
but anyway, the idea is still possible regardless of whether we did it before
I'm pretty sure we don't do it at all right now
Chris Toshok
@toshok
Jul 13 2015 20:11 UTC
will the type recorders be able to record multiple types, or just the last one?
Marius Wachtler
@undingen
Jul 13 2015 20:11 UTC
do you have any idea how to treat int+longs during static type speculation? Because I suspect that this is quite common, and we currently have to specify unknown. Or do you think it's enough to have them statically resolve to unknown, because the type recording will probably figure out at runtime that it will mostly encounter ints?
Chris Toshok
@toshok
Jul 13 2015 20:11 UTC
i wonder if we could use number of recorded types to dictate how large a patchpoint we need
Kevin Modzelewski
@kmod
Jul 13 2015 20:12 UTC
oh interesting, yeah right now the type recorders just record the last type and how many times it was seen in a row
but it could definitely keep some very simple stats about that and try to guess on patchpoint size
or number of slots
I think for int promotion, we might benefit from having some more custom support
like, instead of going into the runtime and then speculating on the return type, we could just emit the fast "add with overflow detection" instruction and then speculate that overflow=false
I don't think django_template does that much integer arithmetic
I'm trying to figure out what cases we could use this stuff to help django_template, and I saw int comparisons coming up, and I was able to cut down on those but it didn't help much
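The "add with overflow detection, then speculate overflow=false" idea can be simulated in plain Python; the 64-bit bounds are real, but the deopt path here is a hypothetical stand-in for what a jit would emit, not Pyston code:

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -2**63

def add_speculating_no_overflow(a, b):
    """Sketch: do the raw 64-bit add and check for overflow; on
    overflow, 'deopt' to the generic boxed-long path instead of
    going through the runtime on every add."""
    r = a + b
    if r > INT64_MAX or r < INT64_MIN:
        # deopt path: a jit would jump out of the fast trace here and
        # re-box both operands as arbitrary-precision longs
        return ("deopt", a + b)
    return ("fast", r)

print(add_speculating_no_overflow(1, 2))             # ('fast', 3)
print(add_speculating_no_overflow(INT64_MAX, 1)[0])  # deopt
```

In machine code this is just an add followed by a jump-on-overflow, which is why it beats calling into the runtime and speculating on the boxed return type.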
Marius Wachtler
@undingen
Jul 13 2015 20:16 UTC
I think we have already most of this info: the baseline jit uses patchpoints. if we switch to the LLVM tier we could read out the number of slots inside the patchpoint in order to know how many slots are at least needed + we may even be able to read out the types by looking at the emitted assembler.. (if we add some helpers)
oh ok
but yeah, we'd miss the info about which type is most often encountered if we just look at the emitted assembler of the patchpoint :-(
Travis Hance
@tjhance
Jul 13 2015 20:20 UTC
so what about unboxed stuff? I haven’t seen us discuss that in a while
Marius Wachtler
@undingen
Jul 13 2015 20:20 UTC
and retrieving all the info from the patchpoint may actually be much harder to do. It was just an idea I had in mind
Kevin Modzelewski
@kmod
Jul 13 2015 20:21 UTC
I think I heard that other vms do harvest type info from ICs
chris do you know?
Travis Hance
@tjhance
Jul 13 2015 20:22 UTC
right now we do some unboxed ints right? but we could do unboxed bools, floats, tuples, etc.
Kevin Modzelewski
@kmod
Jul 13 2015 20:22 UTC
I think we have all of those :)
but use them very little
though we do use unboxed tuples for some simple cases like
a, b = b, a
or maybe more realistic a, b = l[0], l[1]
Marius Wachtler
@undingen
Jul 13 2015 20:25 UTC
I think type speculation will help in reducing the number of boxes... Because the last time I looked into reducing the number of boxes, I always failed to do so, because there was some generic operation to be made which required boxing
Travis Hance
@tjhance
Jul 13 2015 20:25 UTC
oh, guess I remembered wrong
Chris Toshok
@toshok
Jul 13 2015 20:28 UTC
sorry, was off grabbing food. I think most VMs that do ICs either always record, or have different modes ICs can be in. Or do you mean actually groveling in the assembly for type info?
Marius Wachtler
@undingen
Jul 13 2015 20:28 UTC
and what about: for i in range(10) (the range) or True, False etc... do you plan to add type recording to getGlobal()? Because afaik we only supported getattr?
Daniel Agar
@dagar
Jul 13 2015 20:29 UTC
@toshok I think the current PR works
needs testing
Chris Toshok
@toshok
Jul 13 2015 20:29 UTC
oh? awesome - will take a look
Marius Wachtler
@undingen
Jul 13 2015 20:30 UTC
cool, then you are going to be the guy who contributed one of the largest speedups in the last few weeks :-D
Daniel Agar
@dagar
Jul 13 2015 20:30 UTC
rebased to force the build
Chris Toshok
@toshok
Jul 13 2015 20:31 UTC
it’s also a huge aid in perf usage
since we don’t have to deal with _int_malloc and _int_free anymore
Kevin Modzelewski
@kmod
Jul 13 2015 20:32 UTC
and range as well
oh, yeah True/False would be low-hanging fruit
I think I need to take a smaller part of django_template and look into that
I'm worried that all of these things would be improvements but insignificant for django_template as a whole :/
Marius Wachtler
@undingen
Jul 13 2015 20:49 UTC
retyping the message because gitter lost the previous one:
yes, this is probably more a microbenchmark thing which won't add up to improving much. It just drives me crazy to think about the large overhead a simple for loop has compared to the C one :crying_cat_face:
Chris Toshok
@toshok
Jul 13 2015 21:18 UTC
if PyErr_Occurred() returns true, is there some easy way to dump the error to the console?
Chris Toshok
@toshok
Jul 13 2015 21:34 UTC
so it looks like virtualenv’s performance characteristics might change due to adding support for the _curses, bz2, and grp modules (I’m guessing the bz2 module is the reason)
there are definitely codepaths being executed that weren’t before
Travis Hance
@tjhance
Jul 13 2015 21:42 UTC
:(
Chris Toshok
@toshok
Jul 13 2015 21:43 UTC
django_template.py             3.9s (2)             4.1s (2)  +6.6%
not expected
additional startup costs? :(
Kevin Modzelewski
@kmod
Jul 13 2015 21:46 UTC
I wonder if something like this would be interesting for dealing with our variability:
Chris Toshok
@toshok
Jul 13 2015 21:53 UTC
ugh
0m3.742s runtime. after doing rm build/Release/lib_pyston/grp.pyston.so, runtime drops to 0m3.489s
Kevin Modzelewski
@kmod
Jul 13 2015 23:21 UTC
another fun effect of self-hosting our sharedmodules (such as grp): if you build a broken version of them, the broken build will prevent you from building a fixed version
Chris Toshok
@toshok
Jul 13 2015 23:22 UTC
yeah, I just found that if you switch branches it doesn’t remove the .so’s, so they were being copied into the builds in pyston-perf
bz2.pyston.so depends on symbols only present in that branch, and virtualenv_bench ran really quick :)
Kevin Modzelewski
@kmod
Jul 13 2015 23:26 UTC
the extra time looks like it might be coming from GC
oh I think it's because it makes isNonheapRoot more expensive
since the sharedmodule gets loaded above our heaps
Chris Toshok
@toshok
Jul 13 2015 23:28 UTC
oh, because of the registration of .so's
right
Kevin Modzelewski
@kmod
Jul 13 2015 23:36 UTC
hey travis, random thought from those slides chris sent out -- maybe we could make sure that all of our instruction memory gets allocated below 4gb
and then use the more compact call instruction
Travis Hance
@tjhance
Jul 13 2015 23:36 UTC
did the slides say something like “having your instructions below 4gb is really useful”?
Kevin Modzelewski
@kmod
Jul 13 2015 23:37 UTC
yeah
Travis Hance
@tjhance
Jul 13 2015 23:37 UTC
also, don’t they need to be below 2gb in order to always use the compact instruction?
err
Kevin Modzelewski
@kmod
Jul 13 2015 23:37 UTC
err maybe
Travis Hance
@tjhance
Jul 13 2015 23:37 UTC
what kind of arithmetic does the call instruction do?
does it extend before or after the add?
Kevin Modzelewski
@kmod
Jul 13 2015 23:37 UTC
I think it's a signed 4-byte offset
so yeah it seems like you can only call +-2gb, but I thought the slide said 4
anyway, you get the point :)
Travis Hance
@tjhance
Jul 13 2015 23:38 UTC
yeah i guess 2gb is the only thing that makes sense
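For reference, the x86-64 near call (opcode E8) takes a signed 32-bit displacement measured from the address of the instruction after the call, which is where the roughly ±2 GiB reach comes from; a small sketch of the reachability check a JIT might do (the function name is made up):

```python
# Signed 32-bit range of the E8 rel32 near-call displacement.
REL32_MIN, REL32_MAX = -(2**31), 2**31 - 1

def rel32_reachable(next_ip, target):
    """True if `target` can be reached with an E8 rel32 near call whose
    displacement is computed from the address after the call (next_ip)."""
    return REL32_MIN <= target - next_ip <= REL32_MAX

print(rel32_reachable(0x400000, 0x7fffffff))  # True
print(rel32_reachable(0x400000, 2**32))       # False
```

So keeping all jitted code below 2 GiB guarantees every call site can use the 5-byte form instead of loading a 64-bit address into a register first.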
Kevin Modzelewski
@kmod
Jul 13 2015 23:39 UTC
maybe I am making up the part about this being from the slides
Chris Toshok
@toshok
Jul 13 2015 23:39 UTC
oh man, switching the order we check things in (non-heap vs heap) makes a big difference
Kevin Modzelewski
@kmod
Jul 13 2015 23:40 UTC
I can't link to the slide, but it's slide 29 here http://www.slideshare.net/curryon/cliff-clickbitsofadviceforthevmwriter
Travis Hance
@tjhance
Jul 13 2015 23:40 UTC
awesome i don’t even have to leave gitter to view this slide show
Travis Hance
@tjhance
Jul 13 2015 23:47 UTC
on that note (something I forgot to mention at the standup today) I’ve been playing around with statically allocating the builtin type objects
one advantage is that this will make the pointers to those object small constants instead of large constants
I’m running into some trouble though because BoxedClass doesn’t seem to have all the functionality of BoxedHeapClass
especially with tp_* stuff, which we handle differently
now I guess there isn’t really a fundamental reason why BoxedHeapClasses have to be literally on the heap
so I could just keep the builtin types as BoxedHeapClasses, I guess
but like with the BoxedHeapClasses, we have the as_sequence field and then point tp_as_sequence at it, and if I turn, say, tuple_cls into a BoxedClass, then tp_as_sequence won’t be pointed to anything and won’t be initialized right (I’m not really entirely sure how all that initialization works but it seems like it’s just wrapping the __iter__ method or something, is that right?)
Travis Hance
@tjhance
Jul 13 2015 23:54 UTC
yeah anyway my point is that we treat our builtin classes much more like “heap classes” than cpython does (in the sense that they are on the heap, that they are instances of BoxedHeapClass, and that they actually seem to rely on its functionality of using attributes for this stuff just like Python classes do)
this seems mildly weird