These are chat archives for dropbox/pyston

27th
Mar 2015
Kevin Modzelewski
@kmod
Mar 27 2015 02:33
@toshok, try commenting out the call to registerGCManagedBytes in generator.cpp
Chris Toshok
@toshok
Mar 27 2015 05:20
i added some counters and see this
before the first request (but after setup):
scanned 0 bytes from extension objects
scanned 42248 bytes from threads
scanned 32736 bytes from generator stacks
and after 50 requests:
scanned 0 bytes from extension objects
scanned 178520 bytes from threads
scanned 705704 bytes from generator stacks
thread stacks stability at 125-175k, generator stacks appear to grow without bound
i also think some of thread bytes are actually generator stack bytes, since they’re collected along with the normal thread stacks in threading.cpp
Chris Toshok
@toshok
Mar 27 2015 05:26
@kmod oh wow, we’re registering MAX_STACKS_SIZE for any generator? i was wondering why I was getting GC’s with only ~2000 new objects allocated
commenting out that call, 0 GC’s after 50 requests
Chris Toshok
@toshok
Mar 27 2015 05:40
and that’s with 50megs between gc’s. dropping to 10meg = 1.41% for markPhase
top of perf is now dominated by unwinding
Travis Hance
@tjhance
Mar 27 2015 05:48
gahh trying to understand performance is so frustrating -_-
Travis Hance
@tjhance
Mar 27 2015 06:48
so in order to support exec code in namespace (where namespace is a dict), we’re technically supposed to pass in namespace as both the globals and the locals. Passing it in as locals is easy, we have that functionality. But passing in the globals is harder, since right now we’re “cheating” by passing in a module instead of a dict
@kmod do we have the capability to set the dictionary of an object yet?
I guess we can make a new module and put the dictionary on that
and then pass in that module
I’m not actually sure… does python create a new module for the scope in the exec, or what ?:
no I odn’t think so
Travis Hance
@tjhance
Mar 27 2015 06:55
I mean part of the problem here is that we’re sort of cheating with our global accesses by accessing them directly as module attributes rather than in the module dict
Travis Hance
@tjhance
Mar 27 2015 07:07
fascinating
exec """class C(object): pass print C.__module__ """ in {}, {}
This message was deleted
exec """class C(object): pass print C.__module__ """ in {}, {}
hm can gitter not print multiline stuff?
sdlfajsdf
hmm
exec """class C(object):
    pass
print C.__module__
""" in {}, {}
aha
there we go
anyway that prints __builtin__
which is interesting
without the in {}, {}, it just prints the name of the module its in
Travis Hance
@tjhance
Mar 27 2015 07:12
is that just some bizarre artifact, or is the __builtin__ module attached to the scope in some non-trivial way?
Travis Hance
@tjhance
Mar 27 2015 07:33
hm I think it’s just an artifact
because it just loads the name __name__
and when __name__ isn’t found in either the globals or locals it checks the __builtins__ module
and of course from there it gets the name __builtin__
well I should probably go to sleep, night
Marius Wachtler
@undingen
Mar 27 2015 08:57
@toshok wow changing gc::registerGCManagedBytes(MAX_STACK_SIZE) to INITIAL_STACK_SIZE reduces the gc time for pip install setuptools pip from gc_collections: 188 gc_collections_us: 1653381 to gc_collections: 22 gc_collections_us: 235633
Chris Toshok
@toshok
Mar 27 2015 17:13
Yeah it's pretty crazy. I don't think we should be calling it there at all.
Kevin Modzelewski
@kmod
Mar 27 2015 18:44
we also only free the stack space in the generator simple_destructor
if we did that also when the generator exits that should help a bunch too
@tjhance so I looked into the "create a new module for the eval/exec" a tiny bit dropbox/pyston@9098ccf
I don't think CPython actually attaches the module object to anything in the module
it just passes in the relevant fields (globals, fn)
we could probably do the same?
and we could change getGlobal to take an arbitrary dict-like object as the globals, rather than the module object
I guess we'd take a perf hit at first but we could probably get back most of it by rewriting into attrwrappers (or we could just special-case that)
oh, and we can set __dict__ now, and that could probably work as well as a way of shimming it into the current model
though we'd have to create a new BoxedModule object
Chris Toshok
@toshok
Mar 27 2015 18:55
wow, from cppreference.com: "Moreover, code such as f(std::shared_ptr<int>(new int(42)), g()) can cause a memory leak if g throws an exception because g() may be called after new int(42) and before the constructor of shared_ptr<int>"
c++’s eval order is bizarre
@kmod re: generator simple_destructors, I’m not sure generators are being collected properly now, so the simple_destructor might not be called.
Kevin Modzelewski
@kmod
Mar 27 2015 19:03
oh hmm
I think they do get run sometimes but yeah it could be easy for them not to
I think this is why we weren't seeing much of a difference from turning down the collection frequency, since it meant we allocated more generator stacks
Travis Hance
@tjhance
Mar 27 2015 19:23
@kmod kk thanks
yeah my concern is mostly that i don’t want to take the perf-hit by using the dict when we could use the module attrs instead
Travis Hance
@tjhance
Mar 27 2015 19:32
use make_shared?
Chris Toshok
@toshok
Mar 27 2015 19:57
Yeah that's the fix. Just bizarre that the eval order permits that interleaving.
Marius Wachtler
@undingen
Mar 27 2015 21:22
Do we want to check the __class__ attr when doing isInstance() and PyObject_TypeCheck return 0?
Kevin Modzelewski
@kmod
Mar 27 2015 21:23
what does CPython do?
Marius Wachtler
@undingen
Mar 27 2015 21:25
currently isInstance calls into _PyObject_RealIsInstance which does this:
      } else if (PyType_Check(cls)) {
        retval = PyObject_TypeCheck(inst, (PyTypeObject*)cls);
        if (retval == 0) {
            PyObject* __class__ = boxStrConstant("__class__");
            PyObject* c = PyObject_GetAttr(inst, __class__);
            if (c == NULL) {
                PyErr_Clear();
            } else {
                if (c != (PyObject*)(inst->cls) && PyType_Check(c))
                    retval = PyType_IsSubtype((PyTypeObject*)c, (PyTypeObject*)cls);
                Py_DECREF(c);
            }
        }
    } else {
I'm wondering if I should change this one to just call PyObject_TypeCheck(inst, (PyTypeObject*)cls); or leave it as is
Marius Wachtler
@undingen
Mar 27 2015 21:41
is name_forcing_syntax_error.py working for you guys?
my python version has probably a slightly different error string
Kevin Modzelewski
@kmod
Mar 27 2015 21:50
hmm seems to work for me
I'm also not 100% sure what you're thinking of changing about isInstance, but it looks like there are only three places it's called
I think the intention was that it was just a shorthand for isSubclass(obj->cls, base_cls)
but with only three callers we could probably just move those to calls to isSubclass or what not?
Marius Wachtler
@undingen
Mar 27 2015 21:53
ok
I found the error message bug: https://bugs.python.org/issue22008
Kevin Modzelewski
@kmod
Mar 27 2015 21:57
heh :)
Travis Hance
@tjhance
Mar 27 2015 22:20
oh I even wondered about that while copying the error message
Marius Wachtler
@undingen
Mar 27 2015 22:26
Could be written by me :smile:
Chris Toshok
@toshok
Mar 27 2015 23:14
really annoying that llvm goes to the trouble of adding a type that represents concatenations of stack-allocated strings (Twine), but doesn’t give you a way to get the length of it
Chris Toshok
@toshok
Mar 27 2015 23:21
I added a boxStringTwine() to make it easier to box up string concatenations. I’d like to have the twine’s length so I can allocate the properly sized Box, then implement something (probably an llvm::raw_ostream subclass) that will let me write the twine directly into the box’s memory.
but I can’t do that, since I can’t get the length. so I have to allocate a temporary (resizable) buffer, have the twine output to that (reallocating space as necessary), then copy from there into the box
time for pyston::Twine :)
Kevin Modzelewski
@kmod
Mar 27 2015 23:48
or you can call Twine.str() and call it a day :P
Chris Toshok
@toshok
Mar 27 2015 23:48
heh yeah. i ended up duplicating 2 lines of twine.str() (to keep from also creating the std::string)