    Vinayak Shiddappa Bali
    @VINAYAK179

    Hi All,

    Trying to implement OLAP to improve the performance of count queries.
    https://docs.janusgraph.org/advanced-topics/hadoop/
    I followed the document above, but it is still not working. I get these errors:
    Error: the Spark master is not responding, even though it is running.
    Error while invoking RpcHandler #receive() for one-way message, when the Spark job is hosted on JBoss and tries to connect to the master.

    Thanks

    Florian Cäsar
    @flotothemoon

    Hi, I'm experiencing a strange no-response bug somewhere between gremlin-python, JanusGraph and Gremlin Server:
    Some long traversals (with thousands of instructions) don't get a response. I don't mean that they time out or get an error back; they get nothing back whatsoever. No log entries at the server, nothing in the client.

    The threshold for this silent treatment isn't clear - it doesn't depend cleanly on bytes or on the number of instructions. For instance, this code will hang forever for me:

    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

    g = traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin', 't'))
    g = g.inject("")
    for i in range(0, 8000):
        g = g.constant("test")  # append one more instruction to the bytecode
    print(f"submitting traversal with length={len(g.bytecode.step_instructions)}")
    result = g.next()
    print(f"done, got: {result}") # this is never reached

    I know it doesn't depend on just the number of bytes because the number of instructions beyond which this happens doesn't change even with very large strings instead of just "test".

    Just in case, I've already increased server-side maxHeaderSize, maxChunkSize, maxContentLength etc. to ridiculously high numbers. No change.

    Any ideas what I'm doing to deserve the silent treatment? This is driving me insane.

    3 replies
    Florian Cäsar
    @flotothemoon

    I'm trying to use JanusGraph's full-text predicates in the gremlin-python client library. Using the GraphSON serializer, I can just use a predicate with e.g. "textContains" as the operator, and it works, since JanusGraphIoRegistryV1d0 registers its custom deserializer for P objects:

    addDeserializer(P.class, new JanusGraphPDeserializerV2d0());

    However, as of v0.5.3, JanusGraph does not register any deserializers for GraphBinary (though that feature is already on the master branch). This means that when I submit the exact same traversal with P("textContains", "string") in GraphBinary format, I get:

    org.apache.tinkerpop.gremlin.server.handler.OpSelectorHandler  - Invalid OpProcessor requested [null]

    I presume this is because the "textContains" predicate isn't registered. Weirdly enough, in my Groovy console the same traversal works fine, even though it also uses GraphBinary (according to the configuration).

    There are a couple options here and I don't have enough information on any of them, so I would appreciate input:

    1. Figure out what the Groovy console is doing differently and use that in the Python library
    2. Use a Docker image from master branch and adapt the Python library to use the new custom JanusgraphP type in graphbinary
    3. Use two separate clients with different serializations depending on which traversal I need to run (yuck)

    Note: I've already tested https://github.com/JanusGraph/janusgraph-python; it does the same thing I do manually and thus only works with GraphSON.

    julianhatwellvv
    @julianhatwellvv

    OK, found them - configuration options:

    schema.constraints: 'true'
    schema.default: none

    How can these options be set on an already created graph? I can do it in ConfiguredGraphFactory, but what about a running graph?

    3 replies
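    For the question above, a minimal console sketch (unverified): schema.constraints and schema.default are GLOBAL_OFFLINE options, so they can be changed through the management API of an existing graph, but the change only takes effect once all other open instances are closed and the graph is reopened:

    mgmt = graph.openManagement()
    mgmt.set('schema.constraints', true)
    mgmt.set('schema.default', 'none')
    mgmt.commit()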
    julianhatwellvv
    @julianhatwellvv

    How do I set schema constraints on PropertyKeys, i.e. property annotations? The reference on Advanced Schema - Multi Properties demonstrates how to add property annotations, that is, properties on properties. I would like to constrain these annotations in the schema, e.g. to create an address block with annotations for house number, street name, etc.

    If I use mgmt.addProperties(myPropertyKey, myOtherPropertyKey) I get an error. This capability is either not documented or does not exist. What is the correct procedure?
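
    For reference, a sketch of the documented multi-property annotation pattern the question builds on (key names illustrative); the open question of schema-constraining the annotations themselves is not covered by it:

    mgmt = graph.openManagement()
    mgmt.makePropertyKey('address').dataType(String.class).cardinality(Cardinality.LIST).make()
    mgmt.commit()

    v = graph.addVertex()
    p = v.property('address', '10 Downing Street')  // a multi-property value
    p.property('houseNumber', '10')                 // annotations: properties on the property
    p.property('streetName', 'Downing Street')
    graph.tx().commit()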

    Venkat Dasari
    @dasari2828_gitlab
    I am trying to follow the bulk-loading procedure described in the JanusGraph documentation: loading the vertices, saving the vertex id for each key, and then using those ids to load the edges between the vertices. The problem is that the vertex id does not match what is in the graph. I tried generating the vertex ids myself after the configuration changes, but it won't let me insert any value as a vertex id. Any suggestions?
    10 replies
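    A sketch of the custom-vertex-id path discussed above, assuming graph.set-vertex-id=true was set before the graph was first created (the id conversion is the part that is easy to miss):

    // user-supplied long ids must be converted into the JanusGraph id space first
    janusId = graph.getIDManager().toVertexId(12345L)
    v = graph.addVertex(T.id, janusId)
    graph.tx().commit()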
    Harshit Sharma
    @dreamerHarshit
    I was using this query g.V(vertexId).repeat(both(labels).simplePath()).hasLabel(person).emit().id().fold().next()
    to find the distinct persons in a connected component.
    My dataset has around 8 million vertices and 8 million edges, and
    this query is very expensive in terms of time. Is there a way to optimize it?
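    One commonly suggested rewrite, as a sketch: simplePath() enumerates every simple path, which explodes combinatorially on dense graphs, while dedup() inside repeat() visits each vertex only once:

    g.V(vertexId).
      repeat(both(labels).dedup()).emit().
      hasLabel(person).dedup().id().fold().next()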
    Florian Cäsar
    @flotothemoon

    What is the recommended way of backing up JanusGraph data together with mixed indices?
    How do you ensure the backups are consistent?

    I've read all I could find on this, and the gist seems to be "use whatever backup mechanism your backend storage supports", which is fine (Scylla has backups), but that doesn't answer the question of coordinating the backups.

    Alternatively, I could of course rebuild indices from scratch after backup, but that seems unnecessarily expensive.

    Venkat Dasari
    @dasari2828
    I am bulk loading vertices and trying to save the vertex ids. When I add 50 to 100 of them and then call vertex.toList(), it returns only the last one. What is the recommended way of getting all the vertices?
    1 reply
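    On the question above: a traversal is an iterator that can only be consumed once, so re-calling toList() on the same object yields nothing new. A sketch of collecting the ids as the vertices are added (names illustrative):

    ids = []
    (0..<100).each { i ->
        v = g.addV('item').property('seq', i).next()
        ids << v.id()
    }
    graph.tx().commit()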
    Alessandro Sivieri
    @sivieri
    Hello! Does anyone have experience running OLAP on an AWS EMR cluster? I am currently trying to do so, but strange things are happening. In particular, I have the impression that the application is running only on the master and not on the other nodes: it appears correctly in the Hadoop and Spark UIs, but it uses only two executors (I have configured memory and cores in my properties), which seem to occupy the entire driver's resources and nothing more.
    Moreover, I seem to have problems getting correct output. I started from the example properties file that uses CQL, but I do not receive any meaningful answer to queries I run from the Gremlin console (the data is there, because I am able to query it without Spark). I saw that the configuration uses NullOutputFormat as the graphWriter, so I tried setting the Gryo one instead, but nothing changed.
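    For reference, a sketch of the relevant keys in a CQL-based Hadoop graph properties file (values illustrative); on YARN, the executor settings are what control whether work spreads beyond the driver:

    gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
    gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
    gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
    gremlin.hadoop.outputLocation=output
    spark.master=yarn
    spark.executor.instances=4
    spark.executor.memory=4g
    spark.executor.cores=2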
    G-Ark
    @G-Ark

    Hello, has anyone tried custom data types in JanusGraph?
    I tried following the doc - https://docs.janusgraph.org/advanced-topics/serializer/ - but always ended up with an error about root.attributes.custom.attribute-class.

    That suggests the list is not being read correctly from the configuration? No other config option reads from a list, which makes it hard to confirm whether this is indeed a bug.
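
    For reference, the shape the configuration expects, as a sketch with illustrative class names (each custom attribute gets its own numbered entry under attributes.custom):

    attributes.custom.attribute1.attribute-class=com.example.MyType
    attributes.custom.attribute1.serializer-class=com.example.MyTypeSerializer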

    Vinayak Shiddappa Bali
    @VINAYAK179
    Hi All, 
    
    I need to select multiple nodes and edges and display the content in v1 - e - v2 format. The query generated is as follows:
    
    g.V().union(has('title', 'V1').as('v1').outE().hasLabel('E1').as('e').inV().has('title', 'V2').as('v2'),has('title', 'V2').as('v1').union(outE().hasLabel('E2').as('e').inV().has('title', 'V2'),outE().hasLabel('E3').as('e').inV().has('title', 'V3')).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))
    
    It throws the warning:
    05:20:21 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
    
    How can we optimize the query, perhaps without the union step?
    
    Thanks & Regards,
    Vinayak
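
    The warning above means no index covers has('title', ...); a sketch of a composite index that would (index name illustrative):

    mgmt = graph.openManagement()
    title = mgmt.getPropertyKey('title')
    mgmt.buildIndex('byTitleComposite', Vertex.class).addKey(title).buildCompositeIndex()
    mgmt.commit()
    // wait until the index becomes available before querying
    ManagementSystem.awaitGraphIndexStatus(graph, 'byTitleComposite').call()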
    G-Ark
    @G-Ark

    Hello, can anyone help me with indexing?

    The query that fails to use the index is

    g.V().hasLabel("Email")
            .or(
                __.has("email_from", P.within("A"))
                    .or(
                        __.has("email_to", P.within("B")),
                        __.has("email_copy_to", P.within("B"))
                    ),
                __.has("email_from", P.within("A"))
                    .or(
                        __.has("email_to", P.within("B")),
                        __.has("email_copy_to", P.within("B"))
                    )
            )

    The index that has already been added is

    janusGraphManagement.buildIndex("email_from_to_copy_list", Vertex.class)
              .addKey(emailFrom, Mapping.STRING.asParameter())
              .addKey(emailTo, Mapping.STRING.asParameter())
              .addKey(emailCopyTo, Mapping.STRING.asParameter())
              .indexOnly(emailLabel)
          .buildMixedIndex("search");

    As you might have guessed, "email_to", "email_from" and "email_copy_to" all have LIST cardinality.

    Even after adding the above index, I still get a WARN while querying that the query would be faster with an index, and the query is very time-consuming.

    1 reply
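    One thing worth ruling out for the index question above, as a sketch: a newly built index only serves queries once its keys reach ENABLED status, which can be inspected from the console:

    mgmt = graph.openManagement()
    idx = mgmt.getGraphIndex('email_from_to_copy_list')
    idx.getFieldKeys().each { k -> println "${k.name()}: ${idx.getIndexStatus(k)}" }
    mgmt.rollback()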
    bvkart
    @bvkart
    Hi All, what is the roadmap for JanusGraph with DynamoDB as a backend? I see that the https://github.com/awslabs/dynamodb-janusgraph-storage-backend repo targets an older version of JanusGraph.
    2 replies
    Florian Cäsar
    @flotothemoon
    Can you add a new property key to an existing mixed index?
    The docs aren't very clear on this. There is a method called "addIndexKey" in the management interface, but it's not documented. Looking at the code, it might be possible, but I want to know for sure before I commit to this indexing approach.
    2 replies
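    A sketch of what that would look like, based only on reading the method signature (index and key names illustrative); note that pre-existing data would still need a reindex job to appear in the index:

    mgmt = graph.openManagement()
    idx = mgmt.getGraphIndex('myMixedIndex')
    newKey = mgmt.makePropertyKey('newKey').dataType(String.class).make()
    mgmt.addIndexKey(idx, newKey)
    mgmt.commit()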
    Joseph Kesting
    @kestingj

    Hello!

    I am currently working on a project that computes a 2-hop query for several million vertices. To speed up these queries I would like to use caching, but I am having trouble finding exact documentation on what is stored in the DB cache vs. the transaction cache. The query I am executing traverses all vertices within a two-hop network and then extracts a property from each vertex in that network. Currently these queries run in different threads that share the DB cache but execute separate transactions, and I am not seeing the cache performance that I had hoped for.

    Is the property I am trying to fetch cached in the DB cache, or is the DB cache only used to maintain adjacency lists? Additionally, if I refactored these threads to share a common transaction, would that property be cached in the transaction cache?

    Thanks for your assistance!

    Joe
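
    On the last question above, a sketch of sharing one transaction across threads via TinkerPop's thread-independent transactions:

    // createThreadedTx() returns a transaction that is not bound to the calling thread,
    // so several workers can traverse through it and share its transaction-level cache
    tx = graph.tx().createThreadedTx()
    g2 = tx.traversal()
    // ... run the 2-hop traversals with g2 from multiple threads ...
    tx.commit()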

    G-Ark
    @G-Ark
    Traversal Metrics
    Step                                                               Count  Traversers       Time (ms)    % Dur
    =============================================================================================================
    JanusGraphStep([],[type.eq(Person), person_emai...                     2           2        1157.132    93.81
        \_condition=(type = Person AND person_email = person.name@domain.com)
        \_orders=[]
        \_isFitted=true
        \_isOrdered=true
        \_query=multiKSQ[1]@100000
        \_index=person_type_email_id_composite
      optimization                                                                                 4.943
      optimization                                                                              1094.948
      backend-query                                                        2                      83.097
        \_query=person_type_email_id_composite:multiKSQ[1]@100000
        \_limit=100000
    PropertyMapStep(property)                                              2           2          76.304     6.19
                                                >TOTAL                     -           -        1233.437        -

    I am trying to do the below simple search query.

    g.V()
            .has(ENTITY_TYPE, PERSON_LABEL)
            .has(PERSON_EMAIL, graphReferenceUserId)
        .propertyMap()
        .profile()
        .toList()

    Index has been added as

    final PropertyKey entityTypeLabel = janusGraphManagement
              .makePropertyKey(ENTITY_TYPE)
              .dataType(String.class)
              .cardinality(Cardinality.SINGLE)
              .make();
    final PropertyKey personEmail = janusGraphManagement
              .makePropertyKey(PERSON_EMAIL)
              .dataType(String.class)
              .cardinality(Cardinality.SET)
              .make();
    janusGraphManagement.buildIndex("person_type_email_id_composite", Vertex.class)
              .addKey(entityTypeLabel)
              .addKey(personEmail)
              .buildCompositeIndex();

    There are around 500 nodes and querying a single indexed node is taking 350ms.
    Please help me understand what I am doing wrong.

    Sai Supraj Ratakonda
    @ratakonda_sai_twitter

    Has anyone faced issues with ConfiguredGraphFactory in v0.5.3? I started Gremlin Server with these changes in gremlin-server.yaml:

    graphs: {
      graph: conf/janusgraph-scylla-configurationgraph.properties,
      ConfigurationManagementGraph: conf/janusgraph-scylla-configurationgraph.properties
    }

    My properties file contains the following properties:

    gremlin.graph = org.janusgraph.core.ConfiguredGraphFactory
    graph.graphname=ConfigurationManagementGraph
    storage.backend=cql
    storage.hostname=**hostnames of scylla db**
    storage.cql.keyspace=ConfigurationManagementGraph
    cache.db-cache = true
    cache.db-cache-clean-wait = 20
    cache.db-cache-time = 180000
    cache.db-cache-size = 0.5
    storage.cql.write-consistency-level = QUORUM
    storage.cql.read-consistency-level = QUORUM
    storage.cql.protocol-version=4
    storage.read-time=100000
    storage.write-time=100000
    graph.set-vertex-id=false

    I am getting the following error:
    gremlin> :remote connect tinkerpop.server conf/remote.yaml session
    ==>Configured localhost/127.0.0.1:8182-[002f6bb9-e886-4598-a61c-2765f2c69f13]
    gremlin> :remote console
    ==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182]-[002f6bb9-e886-4598-a61c-2765f2c69f13] - type ':remote console' to return to local mode
    gremlin> ConfiguredGraphFactory.getGraphNames() --->> is not returning anything
    gremlin> map = new HashMap();
    gremlin> map.put("graph.graphname", "graph");
    ==>null
    gremlin> ConfiguredGraphFactory.createConfiguration(new MapConfiguration(map));
    Must provide vertex id
    Type ':help' or ':h' for help.
    Display stack trace? [yN]y
    java.lang.IllegalArgumentException: Must provide vertex id
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
    at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.addVertex(StandardJanusGraphTx.java:506)
    at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsTransaction.addVertex(JanusGraphBlueprintsTransaction.java:121)
    at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph.addVertex(JanusGraphBlueprintsGraph.java:141)
    at org.janusgraph.graphdb.management.ConfigurationManagementGraph.createConfiguration(ConfigurationManagementGraph.java:128)
    at org.janusgraph.core.ConfiguredGraphFactory.createConfiguration(ConfiguredGraphFactory.java:182)
    at org.janusgraph.core.ConfiguredGraphFactory$createConfiguration$0.call(Unknown Source)
    at Script5.run(Script5.groovy:1)
    at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:674)
    at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:376)
    at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:233)
    at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:267)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

    Venkat Dasari
    @dasari2828_gitlab
    Just a quick general question: has anyone loaded JanusGraph with billions of vertices and edges and been able to traverse it with simple filters? I am struggling with loading simple datasets in the millions, and querying the data takes forever. I am using very large machines for my use case. Has anyone written a document or blog on leveraging Spark to load the data and then traversing it quickly? Is JanusGraph really as scalable and distributed as it claims?
    Venkat Dasari
    @dasari2828_gitlab

    No one? No idea?

    Venkat Dasari
    @dasari2828_gitlab
    When I add some 1000 vertices at a time and call vertex.toList(), it always returns only the last one. What is the point of toList()?
    Michael Wilson
    @mike:matrix.phragma.org
    [m]

    Hey all. I'm unable to use JanusGraph started from the latest Docker container. I've done nothing other than run: docker run --rm -p 8182:8182 docker.io/janusgraph/janusgraph:latest

    janusgraph-default | 979 [main] WARN org.apache.tinkerpop.gremlin.server.GremlinServer - Graph [graph] configured at [/etc/opt/janusgraph/janusgraph.properties] could not be instantiated and will not be available in Gremlin Server. GraphFactory message: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
    janusgraph-default | java.lang.RuntimeException: GraphFactory could not instantiate this Graph implementation [class org.janusgraph.core.JanusGraphFactory]
    janusgraph-default | at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:81)
    janusgraph-default | at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:69)
    janusgraph-default | at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:103)

    Michael Wilson
    @mike:matrix.phragma.org
    [m]
    Okay, I figured this out. I'm running on Mac, and it looked like I didn't have enough free space allocated for Docker.
    Venkat Dasari
    @dasari2828_gitlab

    Folks, I was able to use remote graph traversals to add a million vertices in 70 seconds. I am trying to parallelize this, and I am getting the error below.

    Caused by: java.util.concurrent.CompletionException: org.apache.tinkerpop.gremlin.driver.exception.ResponseException: Could not commit transaction due to exception during persistence
    at java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:375)
    at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
    at org.apache.tinkerpop.gremlin.driver.ResultSet.one(ResultSet.java:119)
    at org.apache.tinkerpop.gremlin.driver.ResultSet$1.hasNext(ResultSet.java:171)
    at org.apache.tinkerpop.gremlin.driver.ResultSet$1.next(ResultSet.java:178)
    at org.apache.tinkerpop.gremlin.driver.ResultSet$1.next(ResultSet.java:165)
    at org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteTraversal$TraverserIterator.next(DriverRemoteTraversal.java:146)
    at org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteTraversal$TraverserIterator.next(DriverRemoteTraversal.java:131)
    at org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteTraversal.nextTraverser(DriverRemoteTraversal.java:112)
    at org.apache.tinkerpop.gremlin.process.remote.traversal.step.map.RemoteStep.processNextStart(RemoteStep.java:80)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:129)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:39)
    at org.apache.tinkerpop.gremlin.process.traversal.Traversal.fill(Traversal.java:184)
    at org.apache.tinkerpop.gremlin.process.traversal.Traversal.toList(Traversal.java:122)
    at com.iqvia.janus.JanusRemoteTraversalVertexMapPartitionJava$1$1.hasNext(JanusRemoteTraversalVertexMapPartitionJava.java:172)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:43)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$2.hasNext(WholeStageCodegenExec.scala:636)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:244)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:242)

    Any help? Has anyone tried using Spark and remote traversals to insert millions of rows?
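
    The trace alone doesn't pin down the cause, but for parallel bulk loads the docs point at id-allocation contention during commit; a sketch of the usual configuration knobs (values illustrative):

    storage.batch-loading=true
    ids.block-size=1000000
    ids.authority.wait-time=1000 ms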

    jpfloresibm
    @jpfloresibm

    We are trying to add a vertex in JanusGraph 0.5.2 and are seeing the following error:

    java.lang.IllegalArgumentException: Multiple entries with same key: abilitec_link.identification_number=org.janusgraph.diskstorage.indexing.StandardKeyInformation@7f5cab1d and abilitec_link.identification_number=org.janusgraph.diskstorage.indexing.StandardKeyInformation@365bc1bc
        at com.google.common.collect.RegularImmutableMap.checkNoConflictInBucket(RegularImmutableMap.java:104)
        at com.google.common.collect.RegularImmutableMap.<init>(RegularImmutableMap.java:70)
        at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:254)
        at org.janusgraph.graphdb.database.IndexSerializer$IndexInfoRetriever$1.get(IndexSerializer.java:165)
        at org.janusgraph.diskstorage.indexing.IndexTransaction.getIndexMutation(IndexTransaction.java:82)
        at org.janusgraph.diskstorage.indexing.IndexTransaction.add(IndexTransaction.java:67)
        at org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit(StandardJanusGraph.java:647)
        at org.janusgraph.graphdb.database.StandardJanusGraph.commit(StandardJanusGraph.java:731)
        at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1425)
        ... 17 more
        at com.google.common.collect.RegularImmutableMap.checkNoConflictInBucket(RegularImmutableMap.java:104)
        at com.google.common.collect.RegularImmutableMap.<init>(RegularImmutableMap.java:70)
        at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:254)
        at org.janusgraph.graphdb.database.IndexSerializer$IndexInfoRetriever$1.get(IndexSerializer.java:165)
        at org.janusgraph.diskstorage.indexing.IndexTransaction.getIndexMutation(IndexTransaction.java:82)
        at org.janusgraph.diskstorage.indexing.IndexTransaction.add(IndexTransaction.java:67)
        at org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit(StandardJanusGraph.java:647)
        at org.janusgraph.graphdb.database.StandardJanusGraph.commit(StandardJanusGraph.java:731)

    We are only adding a single vertex. Is there any guidance on what could be causing this error or how to troubleshoot it?

    Matthias Leinweber
    @lnnwvr_twitter

    I have a few points of confusion about OLAP that I cannot resolve from the documentation:

    First, if you are running clustered JanusGraph with multiple JanusGraph instances (connected to Cassandra + Elasticsearch), can I run graph compute() jobs that execute across multiple nodes?

    When do I have to add HDFS (gremlin-hadoop)? Why can't Cassandra be used as a "storage backend" for the task barrier?

    Depending on the first answer, what is the purpose of compute() if I really have to use Spark as the "runner" for my GraphComputer? What is FulgoraGraphComputer? What is Spark local? And can I somehow combine multiple JanusGraph instances with Spark local?

    What is needed for MapReduceIndexManagement?

    How do I configure OLAP with ConfiguredGraphFactory? Do I have to add a second graph that connects to the same backend but with a HadoopGraph configuration?

    39 replies
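    For reference on the first question, a sketch of how the docs run OLAP traversals with SparkGraphComputer over a HadoopGraph that reads from the same storage backend:

    graph = GraphFactory.open('conf/hadoop-graph/read-cql.properties')  // HadoopGraph over the CQL backend
    g = graph.traversal().withComputer(SparkGraphComputer)
    g.V().count()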
    Vinayak Shiddappa Bali
    @VINAYAK179

    Hi All,
    The initial query is as follows:

    g.inject(1).union(V().has('property1', 'vertex1').as('v1').union(outE().has('property1', 'edge1').as('e').inV().has('property1', 'vertex1'),outE().has('property1', 'edge2').as('e').inV().has('property1', 'vertex2')).as('v2'),V().has('property1', 'vertex3').as('v1').union(outE().has('property1', 'edge3').as('e').inV().has('property1', 'vertex2'),outE().has('property1', 'Component_Of').as('e').inV().has('property1', 'vertex1')).as('v2')).limit(100).select('v1','e','v2').by(valueMap().by(unfold()))

    I wanted to limit the number of edges returned by each branch, so I modified the query as follows:

    g.inject(1).union(V().has('property1', 'vertex1').as('v1').union(outE().has('property1', 'edge1').limit(100).as('e').inV().has('property1', 'vertex1'),outE().has('property1', 'edge2').limit(100).as('e').inV().has('property1', 'vertex2')).as('v2'),V().has('property1', 'vertex3').as('v1').union(outE().has('property1', 'edge3').limit(100).as('e').inV().has('property1', 'vertex2'),outE().has('property1', 'Component_Of').limit(100).as('e').inV().has('property1', 'vertex1')).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))

    It works, but performance is adversely affected: execution takes 2 minutes. The profile step indicates that 99% of the time is spent in the union step.
    Please check and help me improve the performance.
    Thank you

    Nilabhra Patra
    @nilbro
    Hi all. I am having problems connecting to the Gremlin server using Python.
    I am getting "ConnectionRefusedError: [Errno 111] Connection refused" or "OSError: [Errno 99] Cannot assign requested address".
    I have already started the Gremlin server and can access it from the console.
    Can anyone help me with this? I am using Bigtable as the backend.
    Chidambaram Ramanathan
    @ChidambaramR
    Can JanusGraph be run in cluster mode without using HBase or Cassandra? I am designing a very small application for which HBase/Cassandra might be overkill (cost-wise, administratively, etc.). At the same time, I want my graph data to persist on a single host. My app is read-heavy, with high expected TPS, although the size of the graph is not expected to grow linearly. What's the best way?
    3 replies
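    If a single host is acceptable, the embedded BerkeleyDB backend avoids running a separate storage cluster entirely; a minimal properties sketch:

    gremlin.graph=org.janusgraph.core.JanusGraphFactory
    storage.backend=berkeleyje
    storage.directory=../db/berkeley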
    kusumakar
    @kusumakarb
    We are using JanusGraph 0.5.2 with Cassandra as the storage backend. I observed that delete operations are slow due to the locking that happens on commit of the transaction in which the vertex/edge is deleted. There is no explicit setConsistency defined on the elements being deleted, yet locking still kicks in and the commit happens very slowly. How can we make deletes faster?
    Ben Wuest
    @wuestinc_twitter
    @SeanTasker Did you ever resolve that issue around ConfiguredGraphFactory and setting storage.cql.replication-strategy-options? I am fighting with it ...
    Haven Wang
    @havenwang

    Does anyone know whether there is a documented list of all the available Metrics and their descriptions?
    I have looked through the Monitoring documentation (https://docs.janusgraph.org/advanced-topics/monitoring/) and have tried searching with various queries online, but with no luck.

    I can make a guess at the meaning of most, but would feel better off directly verifying somewhere. Here are some examples:

    • metrics_org_janusgraph_query_graph_getNew_time_Mean
    • metrics_org_janusgraph_query_vertex_hasDeletions_time_Mean
    • metrics_org_janusgraph_storeManager_mutate_time_Mean
    • metrics_org_janusgraph_stores_getSlice_entries_histogram_Mean
    • metrics_org_janusgraph_stores_getSlice_time_Mean
    Casey Kneale
    @caseykneale
    Is it possible to write "stored procedures", i.e. Groovy scripts with function definitions that can be accessed via HTTP etc.? It seems like you can attach Groovy scripts to the Gremlin REPL, but I don't see any documentation like that for Gremlin Server.
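    A sketch of the closest equivalent: Gremlin Server can load Groovy script files at startup via ScriptFileGremlinPlugin, and functions defined there become callable from submitted scripts (file and function names illustrative):

    scriptEngines: {
      gremlin-groovy: {
        plugins: {
          org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {
            files: [scripts/my-procedures.groovy]}}}}

    // scripts/my-procedures.groovy
    def personsByName(name) { g.V().has('name', name).valueMap().toList() }

    An HTTP request whose body is {"gremlin": "personsByName('alice')"} would then invoke the function.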
    Antonio Jerez
    @pobreiluso_twitter

    Hi there! We recently put JanusGraph into production for our ecommerce platform. Right now we are working with OLAP queries on Spark, and we have reached a point where we don't know if what we are seeing is right or wrong...

    The question is:

    If we have a graph with 39M vertices and 650M edges evenly distributed among the vertices,
    and I run an OLAP query against a Spark standalone cluster with 80 CPU cores and 400 GB of RAM,

    is it OK that it takes 10 minutes to filter and count a subset of vertices?

    We don't know whether 80 cores is enough and 10 minutes acceptable, or whether, on the contrary, we have plenty of CPU power and our OLAP queries are wrong...

    2 replies
    Right now our OLAP queries simply run some counts to get an overview of the status of the graph: for example, the number of edges of certain types, or the number of expired edges and vertices...
    Casey Kneale
    @caseykneale
    Is there another place to ask for help? I've tried asking questions here for a week or two and all of them went unanswered, so I deleted them.
    2 replies
    Brian Miller
    @bkmdev
    Hi, how does one check the current ConfiguredGraphFactory settings to make sure they are set correctly? The docs only talk about creating/updating/opening/templates, but not how to show/print the current config: https://docs.janusgraph.org/basics/configured-graph-factory/
    (I also tried things like :show variables graph, ConfiguredGraphFactory.configurations, and getConfiguration('...'))
    Brian Miller
    @bkmdev
    Anyone?
    Brian Miller
    @bkmdev
    OK, figured it out. It looks like one can do foo = ConfiguredGraphFactory.getConfiguration('my_graph_name'); and it dumps the config out.
    It even seems to work without the foo=; not sure why it didn't work earlier.
    Razi Kheir
    @raxelz

    @all
    Does anyone know how to deal with making significant changes to an existing schema in JanusGraph?

    Adding new edges to existing vertices, adding new vertices and indices, or even changing existing schema elements.

    I cannot for the life of me find specifics in the documentation. There are plenty of overviews, but no specifics: how exactly to do it, whether it is async or sync, whether the API can still accept other mutation queries while this is happening, whether there is a way to do it with zero downtime, etc.

    If someone has a living example of this sort of thing, or a link to specific technical documentation, I would love to read it!
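
    For the purely additive cases at least, a sketch of what can be done online against a running graph (names illustrative); changing or removing existing schema elements is a different story and generally involves reindexing or migration:

    mgmt = graph.openManagement()
    mgmt.makeVertexLabel('newLabel').make()
    mgmt.makeEdgeLabel('newEdge').make()
    mgmt.makePropertyKey('newProp').dataType(String.class).make()
    mgmt.commit()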

    rajsenthil
    @rajsenthil_twitter
    Hello everyone, my experience with JanusGraph is only a week old. In that time I have installed JanusGraph with Cassandra and Elasticsearch and inserted a few vertices and the edges connecting them. So far it looks good. Our environment is React, Spring Boot, JDK 11, TinkerPop 3.5.0, JanusGraph 0.5.3, Cassandra, and Elasticsearch, all running on a Kubernetes cluster. I am looking for a way to administer the vertices and edges from the frontend (React). Can someone suggest a React library for rendering with editing capability? Does react-d3-graph support editing as well? I want Spring Boot to be the middleware that carries out the edits made in the frontend. I tried spring-data-gremlin; initially it worked OK, but I had issues with serializing/deserializing vertices and edges, and it is already in the process of being deprecated. Are there any other Spring adapters? I would like to hear your thoughts. Regards