These are chat archives for bio4j/bio4j

14th
Nov 2014
Eduardo Pareja Tobes
@eparejatobes
Nov 14 2014 18:02
about #62 and vertex-centric indexes
if you read Titan docs about their data model
it looks like there's actually a limit on the number of edges incident to a vertex
regardless of any vertex-centric stuff
that would be sad

quoting:

The maximum number of cells allowed per row in a particular storage backend is therefore also the maximum degree of a vertex that Titan can support against this backend.
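
A toy sketch of what that sentence implies, assuming a store that keeps one row per vertex and one cell per incident edge. `ToyRowStore`, `RowFullError`, and the limit of 5 are all made up for illustration; this is not Titan's API or a real backend limit, and only edges are modelled.

```python
# Toy model of a BigTable-style row: one row per vertex, one cell per edge.
# MAX_CELLS_PER_ROW is a made-up number, not a real backend limit.
MAX_CELLS_PER_ROW = 5


class RowFullError(Exception):
    """Raised when a vertex row cannot take another cell."""


class ToyRowStore:
    def __init__(self, max_cells=MAX_CELLS_PER_ROW):
        self.max_cells = max_cells
        self.rows = {}  # vertex id -> list of (label, other vertex) cells

    def add_edge(self, vertex, label, other):
        row = self.rows.setdefault(vertex, [])
        if len(row) >= self.max_cells:
            # The row is full, so the vertex degree is capped here.
            raise RowFullError(f"vertex {vertex!r} already has {len(row)} cells")
        row.append((label, other))

    def degree(self, vertex):
        return len(self.rows.get(vertex, []))


store = ToyRowStore()
for i in range(MAX_CELLS_PER_ROW):
    store.add_edge("hub", "annotates", f"protein-{i}")

try:
    store.add_edge("hub", "annotates", "one-too-many")
    overflowed = False
except RowFullError:
    overflowed = True
```

In this model the maximum degree of `hub` is exactly the cell limit of its row, whatever indexes sit on top.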

Eduardo Pareja Tobes
@eparejatobes
Nov 14 2014 18:09
see also thinkaurelius/titan#11 and thinkaurelius/titan#93
which somehow made me think that adding a vertex-centric index based on a property that would classify the edges could fix this
if someone knows something about all this, I'm all ears
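
A minimal sketch of the vertex-centric index idea, assuming the edges of a vertex are kept sorted by (label, sort key) so that a range query can binary-search instead of scanning everything. `ToyVertex` and its methods are hypothetical names, not Titan's API.

```python
import bisect


class ToyVertex:
    def __init__(self):
        # Kept sorted as (label, sort_key, other_vertex) tuples.
        self._edges = []

    def add_edge(self, label, sort_key, other):
        bisect.insort(self._edges, (label, sort_key, other))

    def edges(self, label, lo, hi):
        """Edges with this label and lo <= sort_key < hi, found by binary search."""
        start = bisect.bisect_left(self._edges, (label, lo))
        end = bisect.bisect_left(self._edges, (label, hi))
        return self._edges[start:end]

    def degree(self):
        return len(self._edges)


v = ToyVertex()
for i in range(1000):
    v.add_edge("hit", i, f"p{i}")
v.add_edge("cites", 1, "paper-1")

subset = v.edges("hit", 10, 13)  # reads 3 edges, not 1001
```

The lookup touches only the requested slice, but `v.degree()` is still 1001: in this sketch the index narrows what is read and does nothing about how many edges the vertex holds.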
Alexey Alekhin
@laughedelic
Nov 14 2014 23:08

Hi! After reading the Titan docs, my understanding is that vertex-centric indexes have nothing to do with this limitation: Titan uses the BigTable data model anyway, and the number of cells in a row is limited by the particular backend (ok, then let’s review backends).
But I think the problem is not the limit on how many cells/edges can be stored, but rather the number of edges that can be loaded into memory at once (that’s just my primitive understanding…).
From the issues you referred to:

This limitation also applies to dense index entries, i.e. if one is loading millions of properties with the same indexed value, then that creates a dense list of entries under that index entry. In these cases, failure in the storage backend may occur.

and

If it is necessary for you to really pull in all 1M+ edges (i.e. indices and limit() won't do anything for you) then you are likely entering OLAP land […]

So the point is that indexes help, but don’t solve the problem globally.

ok, after reading what I've written, I think I didn’t say anything new here.. :/ never mind

From Cassandra 15.6:

Titan over Cassandra supports global vertex and edge iteration. However, note that all these vertices and/or edges will be loaded into memory which can cause OutOfMemoryException.

That’s about global iteration, but I guess with a local index with billions of elements it’s the same..
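
To illustrate the memory point, here is a small sketch contrasting materializing every edge with consuming only a bounded prefix. `edge_stream` is a hypothetical stand-in for an edge iterator; it does not reproduce how Titan actually iterates over a backend.

```python
from itertools import islice


def edge_stream(n, produced):
    """Yield n hypothetical edges one at a time, counting how many were produced."""
    for i in range(n):
        produced[0] += 1
        yield ("hub", "hit", f"p{i}")


# Eager: every edge is produced and held in memory at once.
eager_produced = [0]
all_edges = list(edge_stream(100_000, eager_produced))

# Bounded: only the first ten edges are ever produced.
lazy_produced = [0]
first_ten = list(islice(edge_stream(100_000, lazy_produced), 10))
```

A limit only helps when the query really needs a prefix or a slice. If all edges are needed, the eager case is the one that applies.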

Alexey Alekhin
@laughedelic
Nov 14 2014 23:16
Same for HBase and BerkeleyDB.

Partitioning!!!:

Cutting a vertex means storing a subset of that vertex’s adjacency list on each partition in the graph. In other words, the vertex and its adjacency list is partitioned thereby effectively distributing the load on that single vertex across all of the instances in the cluster and removing the hot spot.
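
A rough sketch of the vertex-cut idea from that quote, assuming each neighbour is assigned to a partition by a deterministic hash of its id. The partition count, the CRC32 hashing scheme, and the function names are assumptions for illustration; how Titan actually places the pieces of a cut vertex is not shown here.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical cluster size


def partition_of(neighbour_id, num_partitions=NUM_PARTITIONS):
    """Deterministically map a neighbour id to a partition number."""
    return zlib.crc32(neighbour_id.encode("utf-8")) % num_partitions


def cut_vertex(adjacency, num_partitions=NUM_PARTITIONS):
    """Split one vertex's adjacency list into per-partition sublists."""
    parts = defaultdict(list)
    for neighbour in adjacency:
        parts[partition_of(neighbour, num_partitions)].append(neighbour)
    return dict(parts)


# One "hub" vertex with many neighbours, spread over the partitions.
adjacency = [f"protein-{i}" for i in range(10_000)]
parts = cut_vertex(adjacency)
sizes = {p: len(ns) for p, ns in parts.items()}
```

Each partition then holds only its own subset of the adjacency list, so no single instance has to store or serve the whole list.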