These are chat archives for bio4j/bio4j
The maximum number of cells allowed per row in a particular storage backend is therefore also the maximum degree of a vertex that Titan can support against this backend.
Hi! After reading the Titan docs, my understanding is that vertex-centric indexes have nothing to do with this limitation: Titan uses the BigTable data model anyway, and the number of cells in a row is limited by the particular backend (ok, then let’s review the backends).
But I think the problem is not the limit on how many cells/edges can be stored, but rather how many edges can be loaded into memory at once (that’s just my rough understanding…).
From the issues you referred to:
This limitation also applies to dense index entries, i.e. if one is loading millions of properties with the same indexed value, then that creates a dense list of entries under that index entry. In these cases, failure in the storage backend may occur.
If it is necessary for you to really pull in all 1M+ edges (i.e. indices
and limit() won't do anything for you) then you are likely entering OLAP
So the point is that indexes help, but don’t solve the problem globally.
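To make that concrete, here is a toy sketch in Python (not Titan code; the `Vertex` class and its method names are made up for illustration). It keeps a vertex's edges sorted by a key, roughly the way a vertex-centric index does, so a selective range query with a limit only touches a small slice, while asking for everything still materializes the full adjacency list.

```python
import bisect


class Vertex:
    """Toy vertex (hypothetical) whose edges are kept sorted by a sort key."""

    def __init__(self):
        self._keys = []   # sorted sort-key values
        self._edges = []  # (sort_key, target) pairs, aligned with _keys

    def add_edge(self, sort_key, target):
        # Insert at the position that keeps both lists sorted by sort_key.
        i = bisect.bisect_right(self._keys, sort_key)
        self._keys.insert(i, sort_key)
        self._edges.insert(i, (sort_key, target))

    def edges_in_range(self, lo, hi, limit=None):
        """Edges with lo <= sort_key < hi; only the matching slice is copied."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        if limit is not None:
            end = min(end, start + limit)
        return self._edges[start:end]

    def all_edges(self):
        """No index can help here: every edge ends up in memory."""
        return list(self._edges)


v = Vertex()
for t in range(1000):
    v.add_edge(t, f"v{t}")

recent = v.edges_in_range(990, 1000, limit=5)  # small slice
everything = v.all_edges()                     # the whole adjacency list
```

The selective query stays cheap no matter how large the vertex gets; the full retrieval grows with the degree, which is the case the index cannot fix.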
From Cassandra 15.6:
Titan over Cassandra supports global vertex and edge iteration. However, note that all these vertices and/or edges will be loaded into memory which can cause OutOfMemoryException.
That’s about global iteration, but I guess it’s the same with a local index with billions of elements.
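A rough Python sketch of the difference between loading everything at once and walking the data page by page. This is only an illustration of the memory argument, not a claim about what Titan or Cassandra do internally; `iter_pages` and the page size are made-up names and values.

```python
def iter_pages(row, page_size):
    """Yield successive slices of a row so only one page is held at a time."""
    for start in range(0, len(row), page_size):
        yield row[start:start + page_size]


# Stand-in for a very wide row; range is lazy, so nothing is materialized here.
row = range(1_000_000)

peak = 0   # largest number of elements held in one page
count = 0  # total elements visited
for page in iter_pages(row, 10_000):
    peak = max(peak, len(page))
    count += len(page)
```

Every element is visited, but the working set never exceeds one page. Once a query really needs all elements at the same time, paging no longer helps.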
Cutting a vertex means storing a subset of that vertex’s adjacency list on each partition in the graph. In other words, the vertex and its adjacency list is partitioned thereby effectively distributing the load on that single vertex across all of the instances in the cluster and removing the hot spot.
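A minimal Python sketch of the vertex-cut idea from that quote. The round-robin assignment is just a placeholder assumption, not Titan's actual placement rule, and `cut_vertex` is a hypothetical helper.

```python
from collections import defaultdict


def cut_vertex(neighbors, num_partitions):
    """Spread one vertex's adjacency list over all partitions.

    Assumption: edges are assigned round-robin. A real system would use
    its own placement strategy.
    """
    partitions = defaultdict(list)
    for i, neighbor in enumerate(neighbors):
        partitions[i % num_partitions].append(neighbor)
    return dict(partitions)


# A "hub" vertex with 10 neighbors, cut across 4 partitions.
parts = cut_vertex(list(range(10)), 4)
largest = max(len(p) for p in parts.values())
```

Each partition holds only a subset of the hub's edges, so no single instance has to store or serve the whole adjacency list.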