These are chat archives for dropbox/pyston

13th
Sep 2015
An Long
@aisk
Sep 13 2015 12:30
Hi any one can help me with this clang RELEASE test? clang DEBUG test and GCC test is passed https://travis-ci.org/dropbox/pyston/jobs/80075993
Marius Wachtler
@undingen
Sep 13 2015 13:27
I restarted it (the test log showed: it didn't receive output in 10min...)
Marius Wachtler
@undingen
Sep 13 2015 13:42
I upgraded LLVM to the most recent version (some version from yesterday), there were escpecially some changes around the alias analysis, but now after getting it to compile I figured out that all AA passes have to be hardcoded inside llvm :-( to make them work with the old pass manager
So in order that passes really call into our AA we would need another small llvm patch which whould register our AA with the AAResultsWrapperPass
In addition clang 3.4 can't compile the current clang (crash during the compilation in release mode for some c++ file) so I had to upgrade to clang 3.5.. :-(
Marius Wachtler
@undingen
Sep 13 2015 13:51
@aisk You should add some simple tests to #916 :-)
An Long
@aisk
Sep 13 2015 14:27
@undingen Done :smile:
Marius Wachtler
@undingen
Sep 13 2015 14:35
:-), maybe you could also add print statement which prints the doc string of the modules to make sure it changed and that it is infect a unicode string. Sorry for not telling what I wanted from the beginning...
An Long
@aisk
Sep 13 2015 14:42
I want to know why we not do some assert in the tests? Atfer we write the test code and ensure the output is right, know one knows what it should output I think...
Marius Wachtler
@undingen
Sep 13 2015 15:02
we run the tests with cpython and pyston and compare them
if the output is not exactly the same the test fails
so it's kind of like an assert but we don't need to add lots of manual checks
Sun
@Daetalus
Sep 13 2015 15:05
Hi @undingen , seems Pyston use denseMap and denseSet to implement dict and set. But every time denseMap or Set call insert or erase. It will recalculate the element hash value.
I did some investigation. But seems BoxAndHash just return its hash attribute in getHashValue
Marius Wachtler
@undingen
Sep 13 2015 15:07
mmh the BoxAndHash type should only calc the python hash value once..
but I will test it
Sun
@Daetalus
Sep 13 2015 15:12
FYI, in from_cpython/Lib/cpython/test_set, test_do_not_rehash_dict_keys
An Long
@aisk
Sep 13 2015 15:12
@undingen got it, thx :smile:
Marius Wachtler
@undingen
Sep 13 2015 15:23
ok it looks like we don't recompute the hash if we update a dict with a dict but if we update a set from a dict we will recompute the hashes. So it looks like we have to also special case dicts inside sets and vice versa.
Marius Wachtler
@undingen
Sep 13 2015 15:28
mmh ok looks like nearly all set methods miss the special casing for sets (and dicts)
Sun
@Daetalus
Sep 13 2015 15:29
Would you mind to explain what is the special case, please?
Maybe some dict operations not recompute the hash just because they did not call insert or erase? If my understanding is correct. dict.setdefault will recompute the hash, because it called insertinternally.
Marius Wachtler
@undingen
Sep 13 2015 15:36
for example take setDifference it will iterate over a set using the pyElements. This way we will just retrieve every element as a Box* even though if it's coming from a set so we already have a hash.
    for (auto container : args->pyElements()) {
        if (PyAnySet_Check(container)) {
            for (auto&& elt : ((BoxedSet*)container)->s) {
                rtn->s.erase(elt);
            }
        } else {
            for (auto elt : container->pyElements()) {
                rtn->s.erase(elt);
            }
        }
    }
I think setDifference should look something like this (I added the PyAnySet_Check branch)
cpython has also this special cases so it looks like the right solution
but for some functions have in an additon a special case for dicts. so it's good to look into the cpython code to get the same.
Marius Wachtler
@undingen
Sep 13 2015 15:43
This message was deleted
... oh did not want to delete this...
Sun
@Daetalus
Sep 13 2015 15:45
So auto&& elt : ((BoxedSet*)container)->s will retrive the element exactly, will not recompute the hash in erase?
Marius Wachtler
@undingen
Sep 13 2015 15:45
you are right for dictSetdefault it should be something like this
    BoxAndHash k_hash(k);
    auto it = self->d.find(k_hash);
    if (it != self->d.end())
        return it->second;

    self->d.insert(std::make_pair(k_hash, v));
    return v;
yes it won't recompute the hash but instead it should return a BoxAndHash instance
Sun
@Daetalus
Sep 13 2015 15:46
Got it. Noted.
Marius Wachtler
@undingen
Sep 13 2015 15:47
and for dictSetdefault we could even do better by removing the .find call and instead always inserting and checking the return code, this would save on hash lookup
shouldn't be hard to get rid of the hash recomputes but it's kind of annoying that one has to compare every method inside the dict and set class with the cpython implementation :-(
Sun
@Daetalus
Sep 13 2015 15:50
I will see what I can do.
Marius Wachtler
@undingen
Sep 13 2015 15:50
:-)
Sun
@Daetalus
Sep 13 2015 15:51
You mean remove the find, and insert the element directly, check the return value?
Marius Wachtler
@undingen
Sep 13 2015 15:52
DenseMap::insert returns a std::pair<iterator, bool> where the bool is false if the key is already in the map
but that's just a microoptimization feel free to ignore it
@aisk thanks for adding the test, but we will have to to wait until kmod is back
Sun
@Daetalus
Sep 13 2015 15:58
I will add with this optimization.

Another question, if you have time.
In Pyston:

p = weakref.proxy(A_INSTNACE)
A_INSTANCE = None

the proxy weak reference will not finalize correctly. I checked the code in /src/gc/collector.cpp:656 and from_cpython/Objects/weakrefobject.c, didn't not found any code obviously wrong.

Marius Wachtler
@undingen
Sep 13 2015 16:09
mmh I think it's just the difference of using refcounting vs using a "real" GC
you can trigger a collection explicitly with gc.collect() but this will probably not be enough because there is often still a pointer around somewhere which references it
so you may need to overwrite some memory. have a look at the test/tests/weakref5.py test to see what I mean
Sun
@Daetalus
Sep 13 2015 16:15
I will investigate it.
Can be understood as when a_instance = None, in reference counting. It will finallize proxy immediately. But in GC, it will reclaim the proxy until GC run?
Marius Wachtler
@undingen
Sep 13 2015 16:22
yes, at least that is how I think cpython does it (immediately after refcount hits 0)
in addition because we have conservative GC we have sometimes still references on the stack or in a register and the GC will think the object is still alive. This often happens inside small test functions. But shouldn't be an issue if a lot of code is getting executed between GC collections e.g. real programs
Sun
@Daetalus
Sep 13 2015 16:28
I see
Thanks! @undingen
Marius Wachtler
@undingen
Sep 13 2015 16:29
np, thanks for the patches :-)
python 3.5 got released today
Sun
@Daetalus
Sep 13 2015 16:36
Seems Python 2.7 support will continues untill 2020.
An Long
@aisk
Sep 13 2015 17:01
cool
I like async / await
Sun
@Daetalus
Sep 13 2015 17:09
IMHO, maybe less commits for this situation will be better. One for actually change, one for test.