These are chat archives for hibernate/hibernate-orm

21st
Jan 2014
Sanne Grinovero
@Sanne
Jan 21 2014 16:28
Ah, hy @hferentschik :)
Sanne Grinovero
@Sanne
Jan 21 2014 17:51
@sebersole I'll move the perf discussion over here.
so looking at the latest stats, there are 3 mains areas I'd like to work on:
sure, ping me when u're back
priority order:
  • 1 Reduce the number of times we allocate, or the size of each, for org.hibernate.engine.spi.EntityEntry
  • 2 Same for org.hibernate.engine.spi.EntityKey <- previous fixed where really good, but we should look at further improvements
  • 3 org.hibernate.engine.internal.StatefulPersistenceContext.<init> still allocates various IdentityHashMap, and we also still use them -> Is that expected after the instrumentation patches??
  • 4 org.hibernate.event.spi.LoadEvent and..
  • 5 org.hibernate.LockOptions both should go on a diet too: large fatty objects taking a lot of memory.
    these are essentially the ones which need your help, in order of priority :)
    there's another cause for frequent allocations which is simple to fix, and I started it for fun on the plane.. no need to discuss that now: it's probably easier when you'll see the patch next week when I can get back to it.
    also, my fix is relatively non-urgent compared to those above.

About 1: EntityEntry

on 1# no I don't have specifics. I was hoping you could have a look at that class and suggest some fields which could be removed, for example.
I understand that all the fields have a purpose of course, but maybe we can identify a set which only gets used in specific circumstances? That's where I really can't think of much without an in-depth knowledge of all parts.
Steve Ebersole
@sebersole
Jan 21 2014 18:22
@Sanne with regard to removing fields, I can certainly take a look.
To each point:
Sanne Grinovero
@Sanne
Jan 21 2014 18:23
cool

About 2: EntityKey

Previous patches where really good: we're down from 300GB to "just" 46GB.
Still, 46GB is a lot, it's about 4.4% of all used memory.
So, looking at the typical usage of it, it all revolves around how the PersistenceContext is organized; we clearly need both the primary ID for the entity, and the EntityPersister.
So we have a conceptual key {persister,id} which whe then use as key for several maps.. now what if:

  • I assume all EntityPersister instances are built upfront, and are a "well known set" after the SessionFactory has been built
  • We organize them all in an array-like structure, so that we can assign an ordinal identifier to each
  • the Maps in PersistenceContext get reorganized as an Array of Maps<primary Key>

We would have more arrays, but each access to the PersistenceContext could do a quick random-access toe the right Map - the one for the persister it's looking for- and from there use the primary ID as Key for the next map directly.

Sanne Grinovero
@Sanne
Jan 21 2014 18:29
So I think such a refactoring has some drawbacks: for example we'd probablye have PersistenceContext larger and fatter, as it will not have just one map but an array of maps
on the other hand, each of such maps would be smaller, could be lazily initialized, and lookup/insertion would be probably more efficient as we have way less collisions in the map (less data)
Steve Ebersole
@sebersole
Jan 21 2014 18:31

EntityEntry

I took a quick look at the fields. There are some we could talk about, but they fall into the same vein of things we discussed for EntityKey. Namely transient fields that we set lazily. They are meant to hold things that hurts to look up over and over (persister e.g.)
Sanne Grinovero
@Sanne
Jan 21 2014 18:32
also, I'm not aware how much code would get "less maintainable", there obviously is a tradeoff on nice code vs. quick one

EntityEntry

I suspected that. What about identifying a group of them and making an object which doesn't depend on the Session but only on the persister & similar reusable services? We'd reduce it to a single lookup for each of those fields, we'd reuse such an object, and we'd be using a single reference from the EntityEntry.
Steve Ebersole
@sebersole
Jan 21 2014 18:36
What do you mean "doesn't depend on the Session"?
Sanne Grinovero
@Sanne
Jan 21 2014 18:36

LockOptions

I had several ideas on this one, but they where all dead on arrival because it's not an interface.. you think we could change that?
Steve Ebersole
@sebersole
Jan 21 2014 18:37
Take lockMode for example. That "does not depend on Session". But it is specific to the Session/entity relation being described.
Which is the purpose of EntityEntry class as a whole
Sanne Grinovero
@Sanne
Jan 21 2014 18:40
I'm with you that reducing the EntityEntry won't be an easy one, just trying to start some reasoning about it :)
Of course the "diet" strategy is only a portion of what we could do.. back at the EntityKey optimisation you had found an awesome opportunity to reduce the number of them by removing an allocation case.
Steve Ebersole
@sebersole
Jan 21 2014 18:41
I'm just trying hard in other areas to rip out such "parallel arrays" (sorry i used that term way before it became en vogue with jdk 8 ;)
Sanne Grinovero
@Sanne
Jan 21 2014 18:41
I guess that's also harder for the EntityEntry.. still if you could look at opportunities to remove some of them that would help
Steve Ebersole
@sebersole
Jan 21 2014 18:42
if you have places where EntityEntry is created i can look
Sanne Grinovero
@Sanne
Jan 21 2014 18:42
I like the arrays, as long as they aren't "parallel" that you need multiple of them being in synch
Steve Ebersole
@sebersole
Jan 21 2014 18:42
let me search for usage
Sanne Grinovero
@Sanne
Jan 21 2014 18:45
we have accurate stats on who's guilty of allocating the most:
org.hibernate.engine.internal.StatefulPersistenceContext.addEntry -> 85 GB of EntityEntries
(actually it's also the only one ever allocating, so let me break that down):
org.hibernate.engine.internal.StatefulPersistenceContext.addEntity -> 44GB
org.hibernate.event.internal.DefaultLoadEventListener.convertCacheEntryToEntity -> 42GB
org.hibernate.engine.internal.TwoPhaseLoad.postHydrate -> 2,2GB
followed by smaller cases, probably less interesting to look at
Steve Ebersole
@sebersole
Jan 21 2014 18:51
So postHydrate illustrates an interesting case
Basically, Hibernate adds a "empty entry" up front when it first sees a key
it does this as the means to avoid recursively processing the same entity
Sanne Grinovero
@Sanne
Jan 21 2014 18:53
ahhh that would be a nice one to include some kind of "empty placeholder"!
Steve Ebersole
@sebersole
Jan 21 2014 18:53
right
Sanne Grinovero
@Sanne
Jan 21 2014 18:53
maybe it could even be a constant, i.e. not allocating at all
Steve Ebersole
@sebersole
Jan 21 2014 18:54
just a little nervous about all the places we'd need to handle that on read
Sanne Grinovero
@Sanne
Jan 21 2014 18:54
you could already "shave off" some 2GB and more :)
Steve Ebersole
@sebersole
Jan 21 2014 18:54
still, its something to look at
no
postHydrate is the number after we have read the state from db
Sanne Grinovero
@Sanne
Jan 21 2014 18:55
well are you not handling that already? if it's not the final object I assume something would already be broken if you ended up not checking for it consistently
Steve Ebersole
@sebersole
Jan 21 2014 18:55
thats the number we have actually read
i'd have to look tbh
org.hibernate.engine.internal.TwoPhaseLoad#addUninitializedEntity
that's where the placeholder is added
(for posterity sake)
@Sanne my point was that this would actually affect the bigger number
the 2G is from points where the placeholder becomes a real entry
Sanne Grinovero
@Sanne
Jan 21 2014 19:00
ah, which one would you be addressing then?
You'd need to drill down into your metrics more
find where org.hibernate.engine.internal.TwoPhaseLoad#addUninitializedEntity is referenced
Sanne Grinovero
@Sanne
Jan 21 2014 19:02
I'm happy to share the whole snapshot with you?
Steve Ebersole
@sebersole
Jan 21 2014 19:03
it calls PersistenceContext.addEntity
snapshot from what specifically?
will i be able to open it?
Sanne Grinovero
@Sanne
Jan 21 2014 19:04
sure
it's a bitch to download.. 500GB of compressed file, but then it's very useful.
Steve Ebersole
@sebersole
Jan 21 2014 19:04
tell me where
Sanne Grinovero
@Sanne
Jan 21 2014 19:07
StatefulPersistenceContext.addEntity:
org.hibernate.engine.internal.TwoPhaseLoad.addUninitializedEntity -> 2,1 GB
org.hibernate.engine.internal.TwoPhaseLoad.addUninitializedCachedEntity -> 42,3GB
org.hibernate.action.internal.AbstractEntityInsertAction.makeEntityManaged -> 90MB
Steve Ebersole
@sebersole
Jan 21 2014 19:08
org.hibernate.engine.internal.TwoPhaseLoad.addUninitializedEntity -> 2,1 GB
^^ thats what we'd be saving here
nothing to sneeze at
Sanne Grinovero
@Sanne
Jan 21 2014 19:08
indeed, would be nice.
Lol, I had it interprested wrong above ^ but I had guessed the size right: "you could already "shave off" some 2GB and more"
Steve Ebersole
@sebersole
Jan 21 2014 19:09
:)
end result is all that matters ! hehe
I'll look at making the singleton a subclass too to "lock" the internal state
Sanne Grinovero
@Sanne
Jan 21 2014 19:10

to explode the more interesting branch:

StatefulPersistenceContext.addEntity:
org.hibernate.engine.internal.TwoPhaseLoad.addUninitializedEntity -> 2,1 GB
org.hibernate.engine.internal.TwoPhaseLoad.addUninitializedCachedEntity -> 42,3GB
-- org.hibernate.event.internal.DefaultLoadEventListener.convertCacheEntryToEntity -> 42,3GB
---- org.hibernate.event.internal.DefaultLoadEventListener.loadFromSecondLevelCache -> 42,3GB
org.hibernate.action.internal.AbstractEntityInsertAction.makeEntityManaged -> 90MB

Steve Ebersole
@sebersole
Jan 21 2014 19:10
just to be as safe as possible
lol
thanks for that
was about to ask if that was additive
Sanne Grinovero
@Sanne
Jan 21 2014 19:11
I had it nicely indented before pasting..
Steve Ebersole
@sebersole
Jan 21 2014 19:11
there is not much we'll be able to do there i dont think
o wait
Sanne Grinovero
@Sanne
Jan 21 2014 19:12
so, not to do the same mistake of johara as before.. but it would be interesting to explore how an immutable Entry loaded from 2nd level cache could be assigned a "surrogate" EntityEntry; Maybe it could even be an appropriate place to throw some exceptions like " you're really not allowed to make changes here" ? In that case it would be a reusable instance..
Steve Ebersole
@sebersole
Jan 21 2014 19:13
the full 42G is tucked under addUninitializedCachedEntity ?
Sanne Grinovero
@Sanne
Jan 21 2014 19:13
I'm sure there might be some complexities in my fantasy, but we'd be winning a pot of 42GB ..
yes
Steve Ebersole
@sebersole
Jan 21 2014 19:13
that should be covered under the same change
Sanne Grinovero
@Sanne
Jan 21 2014 19:13
ah, wow..
Steve Ebersole
@sebersole
Jan 21 2014 19:13
we should really win 42G+2G there
definitely nothing to sneeze at!
Sanne Grinovero
@Sanne
Jan 21 2014 19:14
sounds like something you should be sending to Stale ASAP :)
would be exciting to see the results after that
Steve Ebersole
@sebersole
Jan 21 2014 19:14
he's not online
but we'll definitely point him to https://hibernate.atlassian.net/browse/HHH-8887
Sanne Grinovero
@Sanne
Jan 21 2014 19:15
they're currently working hard to "polish" a driver issue in Postgres JDBC to "shave off" a 0,5%, but here you're talking about a 6% off overall heap!
what about the LockOptions -> interface, I guess you don't want that to happen in a micro release?
ah actually maybe that what I was planning to do could work even without that, I'll need to play with it
FYI the LockOptions are taking 20GB, and I'm assuming that most of them will be carrying the same options.
Steve Ebersole
@sebersole
Jan 21 2014 19:19
i need to rethink LockOptions overall
It was done in haste for JPA
I am open to suggestions if you have time
Sanne Grinovero
@Sanne
Jan 21 2014 19:21
I'll code it to see if I get into a dead end or not, but basically I was thinking of a copy-on-write pattern.
we'd have a set of immutable instances with the most common options, and we'd encourage constructors to take them from this set as an initial starting point.. if they then happen to start messing with it, like changing the timeout, then we need a defensive copy.
Steve Ebersole
@sebersole
Jan 21 2014 19:26
thats one option. more i even mean scraping LockOptions overall and starting afresh
Sanne Grinovero
@Sanne
Jan 21 2014 19:29
ah, I wasn't assuming we had that "option"
but good to know there are no restrictions
are we really talking about org.hibernate.LockOptions here? looks like public API, not something you'd like to change now?
Sanne Grinovero
@Sanne
Jan 21 2014 19:36
@sebersole ^ ?
Steve Ebersole
@sebersole
Jan 21 2014 20:01
you can add new apis and deprecate the old, right?
The use of LockOptions you are talking about are not coming from the LockOptions api anyway
since these all come from strict JPA usage
its the internal mapping of JPA features onto LockOptions that are in play here
@Sanne ^^
Sanne Grinovero
@Sanne
Jan 21 2014 20:16
ah, thanks that might simplify a lot as it reduces the number of places to look at. That would be another 20GB ..
I'll look at this one after the release trains of tomorrow
Steve Ebersole
@sebersole
Jan 21 2014 20:22
OK, I already started looking into HHH-8887 btw
Shoot
first obstacle is a change gail made once upon a time
to add EntityKey to the EntityEntry
Steve Ebersole
@sebersole
Jan 21 2014 20:49
@Sanne not so sure this is going to be doable
talking about the EntityEntry change
trouble is specifically around:
  • org.hibernate.engine.spi.EntityEntry#getEntityKey
  • org.hibernate.engine.spi.EntityEntry#getPersister
  • org.hibernate.engine.spi.EntityEntry#getId
we need at least some form of {persister,id} available from the entry
that, or risk either:
  • expensive look ups of the EntityKey
  • extra creations of EntityKey
Steve Ebersole
@sebersole
Jan 21 2014 21:30
We could work around this by introducing (yet) another Map to StatefulPersistenceContext, specifically mapping "entity instance" to EntityKey
basically there are 3 values at play here
  • The entity instance
  • The EntityKey
  • The EntityEntry
Together they all define the entity and its relation to a Session
we already keep a few cross references between these:
  • EntityKey -> EntityEntry
  • entity instance -> EntityEntry
Steve Ebersole
@sebersole
Jan 21 2014 21:37
So one option would be to keep a reference from EntityEntry to EntityKey (or make the existing EntityKey -> EntityEntry map a "bimap") or keep a reference from entity instance to EntityKey
Steve Ebersole
@sebersole
Jan 21 2014 22:39
Another possibility is to do some normalization here
EntityEntry is pretty normalized after our last round of perf improvements
EntityEntry would need to drop 'cachedEntityKey', 'id', 'persister' (plus some others not needed like 'tenantId')
But that means the ability to go from any of the 3 forms to the others efficiently is even more important