These are chat archives for dropbox/pyston

4th
Oct 2015
Marius Wachtler
@undingen
Oct 04 2015 15:17
I tried to look into it a little bit, and it looks like django has many more cache misses, so it may indeed be cache size / memory performance that matters for django.
                                    pyston-django10x  cpython-django10x  pyston-pyxl10x  pyston-sqlalch10x
insns per cycle                     0.92              1.25               1.53            1.00
branch-misses (% of all branches)   3.19%             3.53%              1.31%           2.00%
cache-misses (M/sec)                51.902            5.086              16.566          18.163
L3-cache misses (M/sec)             6.081             0.840              1.363           2.249
The notebook and the server have the same L1 and L2 cache sizes, but L3 is 4MB (notebook) vs 25MB (server, shared between all cores). I sadly can't verify it because the server doesn't let me read the perf events, but I would guess that the number of L3 cache misses is much lower there.
I think the cache-misses are probably not so bad but the L3-cache misses will hurt a lot AFAIK
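To put the gap in the table above into numbers, here is a minimal sketch that computes the pyston/cpython ratio for the two cache metrics on django10x. The rates are copied from the table; the `rates` dict and the `ratio` helper are just illustrative names, not part of any tool.

```python
# Rates copied from the table above (django10x, million events per second).
rates = {
    "cache-misses": {"pyston": 51.902, "cpython": 5.086},
    "L3-cache misses": {"pyston": 6.081, "cpython": 0.840},
}


def ratio(metric):
    """Return how many times higher the pyston rate is than the cpython rate."""
    r = rates[metric]
    return r["pyston"] / r["cpython"]


for metric in rates:
    print(f"{metric}: pyston is {ratio(metric):.1f}x cpython")
```

So pyston shows roughly 10x the cache-miss rate and roughly 7x the L3-miss rate of cpython on this benchmark. These are rates per second, not totals, so they don't account for the different run times.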
Marius Wachtler
@undingen
Oct 04 2015 15:25
these are the top 10 functions that produce L3 cache misses for django10x (not really surprising)
    15.75%  _ZN6pyston2gc17TraversalWorklist7addWorkEPv                                                               
     4.94%  _ZN6pyston3Box7getattrILNS_10RewritableE1EEEPS0_PNS_11BoxedStringEPNS_18GetattrRewriteArgsE               
     4.53%  _ZN6pyston2gc10SmallArena10_freeChainEPPNS1_5BlockERSt6vectorIPNS_3BoxESaIS7_EERS5_IPNS_10BoxedClassESaISC
     4.27%  _ZN6pyston2gc9markPhaseEv                                                                                 
     3.14%  _ZN6pyston13BoxedFunction9gcHandlerEPNS_2gc9GCVisitorEPNS_3BoxE                                           
     2.22%  _ZN6pyston2gc9GCVisitor6_visitEPPv                                                                        
     2.08%  _ZN6pystonL11pickVersionEPNS_10CLFunctionENS_14ExceptionStyleEiPNS_3BoxES4_S4_PS4_                        
     1.64%  _ZN6pyston11HiddenClass8gc_visitEPNS_2gc9GCVisitorE                                                       
     1.58%  _ZN6pyston3Box9gcHandlerEPNS_2gc9GCVisitorEPS0_                                                           
     1.32%  _ZN6pyston6ICInfo13shouldAttemptEv
Marius Wachtler
@undingen
Oct 04 2015 15:41
and for fun, one more that compares pyston, cpython and pypy 2.2.1 (the old version that comes with ubuntu 14.04)
                                    pyston-django10x  cpython-django10x  pypy221-django10x
insns per cycle                     0.92              1.25               0.69
branches (M/sec)                    549.567           847.021            468.899
branch-misses (% of all branches)   3.19%             3.53%              4.04%
cache-misses (M/sec)                51.902            5.086              41.742
L3-cache misses (M/sec)             6.081             0.840              6.613
elapsed time (secs)                 12.67             8.98               12.88
These stats make me eager to know how pyston would perform with refcounting :-D
But it may be that I read too much into these cache-miss numbers; maybe it's something else.
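As a rough sanity check on the pyston/cpython/pypy comparison, the per-second L3 miss rates can be turned into approximate totals by multiplying with the elapsed time. This is only a sketch: it assumes the M/sec rate is roughly constant over the whole run and that elapsed time is close to the CPU time perf normalizes by, neither of which was verified here. The `runs` dict and `total_l3_misses_m` are illustrative names.

```python
# (L3-cache misses in M/sec, elapsed time in secs) copied from the
# pyston / cpython / pypy comparison table for django10x.
runs = {
    "pyston": (6.081, 12.67),
    "cpython": (0.840, 8.98),
    "pypy221": (6.613, 12.88),
}


def total_l3_misses_m(name):
    """Estimate total L3 misses (in millions) as rate * elapsed time.

    Assumption: the rate is roughly constant and elapsed time
    approximates the CPU time the rate was measured against.
    """
    rate, secs = runs[name]
    return rate * secs


for name in runs:
    print(f"{name}: ~{total_l3_misses_m(name):.1f}M L3 misses")
```

Under those assumptions pyston and pypy each land at around 80M L3 misses for the run, against under 10M for cpython, so the difference in totals is even larger than the difference in rates.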