    Matthias Leinweber
    what else could be a reason that my index doesn't get registered/installed/disabled although I cleaned all connections and transactions?
    Matthias Leinweber
    and I am pretty confused about MapReduce jobs and Spark/Hadoop connectivity .. I am looking for more complete examples, or someone who has some time to answer stupid questions :)
    Ben Doan
    Hey everyone - has anyone here successfully implemented a SparkGraphComputer for OLAP on AWS EMR? I'm trying to implement it using JanusGraph 0.5.3 on AWS EMR Release version 5.23.0 and I'm running into some dependency issues
    Ben Doan

    I am currently trying to set up SparkGraphComputer using JanusGraph with a CQL storage backend and an Elasticsearch index backend, and am receiving an error when trying to run a simple vertex-count traversal in the Gremlin Console:

    gremlin> hadoop_graph = GraphFactory.open('conf/hadoop-graph/olap/olap-cassandra-HadoopGraph-YARN.properties')
    gremlin> hg = hadoop_graph.traversal().withComputer(SparkGraphComputer)
    gremlin> hg.V().count()
    ERROR org.apache.spark.SparkContext - Error initializing SparkContext. java.lang.ClassCastException: org.apache.hadoop.yarn.proto.YarnServiceProtos$GetNewApplicationRequestProto cannot be cast to org.apache.hadoop.hbase.shaded.com.google.protobuf.Message

    Relevant cluster details

    • JanusGraph Version 0.5.3
    • Spark-Gremlin Version 3.4.6
    • AWS EMR Release 5.23.0
    • Spark Version 2.4.0
    • Hadoop 2.8.5
    • Cassandra/CQL version 3.11.10

    Implementation Details

    Properties File

    metrics.enabled= false
    ### Storage - CQL 
    ###HOSTS and PORTS
    ####InputFormat configuration
    ### SparkGraphComputer Configuration 
    ###Spark Job Configurations
    ###Gremlin and Serializer Configuration
    ### Special Yarn Configuration (WIP)
    ###Spark Driver and Executors
    Sanjeev Ghimire
    question: my JanusGraph is hosted in an OpenShift cluster. I have a Python app that can connect to it and load data successfully, but I can't connect to the server using the Gremlin Console; I get permission denied. Also, the data is not replicated on all the servers.
    Any idea why? Any help is appreciated
    Piotr Joński
    hi, quick question:
    we are testing things and don't want to spin up a full cluster with Elasticsearch, so we use Lucene indexing as the default.
    will that work with a multi-node JanusGraph cluster? (we have 2 instances)
    2 replies
    Hi, has anyone already tried to add a custom lib to JanusGraph? I created an empty class in a TestClass.java file in the org.test package, compiled it into a jar, and added it to my JANUS_HOME/lib folder. However, when I try to import my class with :i org.test.TestClass it is not found. Should I do something special?
    I found a funny bug:
    These two requests look exactly the same, but they are not: they do not produce the same result, and if I replay them with the up arrow and enter they still differ. I checked that there are no hidden special characters.
    And this occurs only with numbers combined with whitespace; at least I could not reproduce it with text containing letters and whitespace.
    On the second request I typed the value digit by digit on my keyboard, while on the first request I copied/pasted the value from the Gremlin Console after doing a .values()
    My bad, Notepad detected that the whitespace character is not the same:
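The usual culprit in cases like this is a non-breaking space (U+00A0) that most fonts render exactly like a regular space (U+0020). A quick stdlib Python check illustrates it (the digits are made up, not from the original messages):

```python
# Two strings that print identically but differ in their space character.
plain = "123 456"        # regular space (U+0020)
sneaky = "123\u00a0456"  # non-breaking space (U+00A0)

print(plain == sneaky)               # False: different code points
print(hex(ord(plain[3])))            # 0x20
print(hex(ord(sneaky[3])))           # 0xa0
```

Dumping `hex(ord(c))` for each character is a reliable way to spot such look-alike characters when an editor's detection (as Notepad did here) isn't available.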
    Sanjeev Ghimire
    why is JanusGraph not replicating data on all pods?
    we are using all default settings
    when queried from an app, we don't get all the results
    I would like to ask a question about how Janus retrieves the properties of vertices. Take the following query for example: g.V(12345).out().has("Name", "abc"). V(12345) has 100 or so neighbors, but it seems that Janus fetches the properties of these vertices sequentially. Am I missing something, or is this intentional? I also found a related option, "query.batch-property-prefetch". After it is turned on, Janus can prefetch properties in parallel, but with a drawback: it seems to fetch all the properties of a vertex even though the filters in later steps only need a single property. Am I missing anything?
    16 replies
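For anyone searching later, the option named in the question above is set in the JanusGraph properties file. A minimal sketch (`query.batch` is a related option that parallelizes backend queries; check the configuration reference for your version before relying on either):

```properties
# Prefetch properties of adjacent vertices in one batched backend call
query.batch-property-prefetch=true
# Allow backend queries to be issued in parallel where possible
query.batch=true
```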
    Sanjeev Ghimire

    why is janusgraph not replicating data on all pods?

    any help on this?

    7 replies
    Piotr Joński

    hi guys,
    question about

                    .serializer(new GraphSONMessageSerializerV2d0(GraphSONMapper.build().addRegistry(JanusGraphIoRegistry.instance())))

    if we use a k8s service (the service acts as a load balancer), shall we configure a single ContactPoint, or put the 3 JanusGraph pod IPs there (we have 3 pods)? (whatever the ContactPoint is, I assume it is the same as the url, is that correct?)
    and next question: if we specify 3 ContactPoints (all pods directly), what will be the difference? is it only for client-side load balancing, or is some additional behaviour expected?

    Piotr Joński

    did anybody in the world, or at least in this channel :), try to scale up JanusGraph? I have serious issues with that, struggling for a few days already.
    it always returns a connection timeout if we have more than 1 replica :/
    I try to deploy it to k8s; sometimes a single pod works, sometimes it throws exceptions about connections.
    how do I set it up properly? I have read the articles from the JanusGraph website (multi-node) and nothing helps. do you have any examples of working configurations for a multi-node JanusGraph deployment?


    Florian Hockmann
    Hey Piotr, regarding your first question: You can use a load balancer in front of JanusGraph and then pass it as a contact point to the driver. You just need to make sure that the load balancer supports Websockets and ensures that subsequent requests on the same connection will be forwarded to the same endpoint. Nginx supports this for example. Not sure whether the k8s load balancer supports this or whether it's more "low level".
    If you however provide the driver directly with the IPs of the JanusGraph pods, then it should do the load balancing itself. That might be an option if your environment doesn't change much (if you scale JanusGraph up or down, you would need to change this in the driver)
    5 replies
    What exception do you get if you try it with just one pod for JanusGraph? I would definitely try to get that working before scaling up as you then get the added complexity from the load balancer
    5 replies
    Hope someone can chime in on my question above. The key question is why Janus fetches properties sequentially, which hurts latency to a great degree.
    Sanjeev Ghimire
    Anyone integrated Cassandra with JanusGraph?
    11 replies
    Piotr Joński

    hi guys,
    could you elaborate more on how to connect from a Java application to JanusGraph, please?
    I found an SO question and some answers: https://stackoverflow.com/questions/45673861/how-can-i-remotely-connect-to-a-janusgraph-server
    but I cannot fully get it, and nowadays it looks really awkward to send "code as a string" to the server.

    I tried to find a description of that problem and its solutions in the official docs -- https://docs.janusgraph.org/ -- but did not manage to :sad:
    the docs are great, but sometimes I feel like I'm reading an abstract view of a specific problem, without descriptions of concrete solutions. after reading I have the feeling that JanusGraph is supposed to be used only manually from the Gremlin Console, instead of automatically from a Java application.

    could someone shed more light on that topic, please?
    perhaps also update the docs with examples, or point to examples on SO or other pages?
    thank you :thumbsup:
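A minimal sketch of connecting from Java without sending code as strings, using remote (bytecode-based) traversals via the TinkerPop driver. The host, port, and the "g" alias are assumptions (the alias must match what is bound in the server's gremlin-server.yaml), and the gremlin-driver dependency is required:

```java
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
import org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

public class RemoteExample {
    public static void main(String[] args) throws Exception {
        // Assumed endpoint; adjust to your JanusGraph Server address
        Cluster cluster = Cluster.build("localhost").port(8182).create();

        // "g" must match the traversal source alias configured on the server
        GraphTraversalSource g = AnonymousTraversalSource.traversal()
                .withRemote(DriverRemoteConnection.using(cluster, "g"));

        // The traversal is serialized as Gremlin bytecode, not as a string
        Long count = g.V().count().next();
        System.out.println("vertices: " + count);

        g.close();
        cluster.close();
    }
}
```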

    1 reply
    Florian Cäsar

    Hi, Gremlin/Janus question (though mostly Gremlin-related). I'm injecting vertices & edges in a Gremlin language variant for bulk insertion. To add the edges, I reference the connected vertices by their id using the V() step (i.e. V(<id>)). Since I'm adding the edges from an injected array of maps like [from:id, to:id], I need to dynamically look up vertices by their id from within the traversal with e.g. select('from'). In essence, I want:

     g.inject([[from:vertex_id_1,to:vertex_id_2], ...])

    However, this doesn't work since the outer map containing from/to isn't available inside the vertex lookup step. Something like this must be possible, but I haven't been able to figure out how. Any ideas?

    3 replies
    Marie Diana Tran

    Hi there,

    I have a setup with janusgraph server with BigTable as a storage backend and a remote Elasticsearch for search indexing.
    Following the documentation, I managed to have a remote traversal using Gremlin to the previous janusgraph server.

    Now, I want to punctually run scripts to manage the graph schema and setup indexes.
    I understood that I have to use janusgraph-core and JanusGraphManagement .
    Nevertheless, I could not manage to set up the connection to the JanusGraph server

        JanusGraphFactory.Builder config = JanusGraphFactory.build();
        config.set("storage.backend", "hbase");
        config.set("storage.hbase.ext.hbase.client.connection.impl", "com.google.cloud.bigtable.hbase2_x.BigtableConnection");
        config.set("storage.hbase.ext.google.bigtable.project.id", "xxx");
        config.set("storage.hbase.ext.google.bigtable.instance.id", "xxx");
        config.set("index.search.backend", "elasticsearch");
        config.set("index.search.hostname", "xxx:9200");
        JanusGraph graph = config.open();
        GraphTraversalSource g = graph.traversal();
        JanusGraphManagement mgmt = graph.openManagement();

    I have the following debug log

    16:12:36.914 [main] DEBUG o.j.d.c.BasicConfiguration - Ignored configuration entry for storage.hbase.ext.hbase.client.connection.impl since it does not map to an option
    java.lang.IllegalArgumentException: Unknown configuration element in namespace [root.storage.hbase.ext]: client
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:164) ~[jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
        at org.janusgraph.diskstorage.configuration.ConfigElement.parse(ConfigElement.java:177) ~[jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
        at org.janusgraph.diskstorage.configuration.BasicConfiguration.getAll(BasicConfiguration.java:93) ~[jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
        at org.janusgraph.graphdb.configuration.builder.GraphDatabaseConfigurationBuilder.build(GraphDatabaseConfigurationBuilder.java:59) [jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:161) [jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:132) [jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:122) [jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
        at org.janusgraph.core.JanusGraphFactory$Builder.open(JanusGraphFactory.java:261) [jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
        at live.yubo.JanusGraphApp.main(JanusGraphApp.java:44) [jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
    16:12:36.915 [main] DEBUG o.j.d.c.BasicConfiguration - Ignored configuration entry for storage.hbase.ext.google.bigtable.project.id since it does not map to an option
    java.lang.IllegalArgumentException: Unknown configuration element in namespace [root.storage.hbase.ext]: bigtable
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:164) ~[jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
        at org.janusgraph.diskstorage.configuration.ConfigElement.parse(ConfigElement.java:177) ~[jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
        at org.janusgraph.diskstorage.configuration.BasicConfiguration.getAll(BasicConfiguration.java:93) ~[jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
        at org.janusgraph.graphdb.configuration.builder.GraphDatabaseConfigurationBuilder.build(GraphDatabaseConfigurationBuilder.java:59) [jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:161) [jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:132) [jg-tester-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
        at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:122) [jg-tester-1.0-SNAPSHOT-jar-wit

    Can anyone help?

    Venkat Dasari
    Hi, JanusGraph architecture question. We are trying to start our evaluation of JanusGraph, and from an architecture standpoint: can this graph be built by loading data directly from HDFS, and does it support Parquet? Even if it doesn't support Parquet that's fine, but will it be able to read data from HDFS directly?
    Philipp Kraus
    Hello, I'm using JanusGraph with Python and Java for some graph data storage, but now I would like to use JanusGraph in Python as a knowledge graph with logical or probabilistic reasoning. Is there any support for logical reasoning without writing the unification myself? And a general question: can I define inheritance on vertex label types, e.g. label "car" inherits from "vehicle"?
    1 reply
    Vinayak Shiddappa Bali
    Hi All, 
    The Data Model of the graph is as follows:
    Label: Node1, count: 130K
    Label: Node2, count: 183K
    Label: Node3, count: 437K
    Label: Node4, count: 156
    Node1 to Node2 Label: Edge1, count: 9K
    Node2 to Node3 Label: Edge2, count: 200K
    Node2 to Node4 Label: Edge3, count: 71K
    Node4 to Node3 Label: Edge4, count: 15K
    Node4 to Node1 Label: Edge5 , count: 1K
    The Count query used to get vertex and edge count :
    g2.V().has('title', 'Node2').aggregate('v').outE().has('title','Edge2').aggregate('e').inV().has('title', 'Node3').aggregate('v').select('v').dedup().as('vertexCount').select('e').dedup().as('edgeCount').select('vertexCount','edgeCount').by(unfold().count())
    This query takes around 3.5 mins to execute and the output returned is as follows:
    The problem is that traversing the edges takes more time.
    g.V().has('title','Node3').dedup().count() takes 3 sec to return 437K nodes.
    g.E().has('title','Edge2').dedup().count() takes 1 min to return 200K edges.
    In some cases, subsequent calls are faster due to cache usage.
    I also considered the in-memory backend, but the data is large and I don't think that will work. Is there any way to cache the result on the first execution of the query? Or any approach to load the graph from the CQL backend into memory to improve performance?
    Please help me to improve the performance; the count query should not take so much time.
    Janusgraph : 0.5.2
    Storage: Cassandra cql
    The server specification is high and that is not the issue.
    Thanks & Regards,
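One thing worth checking for repeated read-heavy queries like the above is JanusGraph's database-level cache, which serves repeated reads from heap memory. A hedged properties sketch (option names from the JanusGraph configuration reference; the values are illustrative, not recommendations):

```properties
cache.db-cache = true
# cache entry expiry in ms; long expiry times are only safe when a single
# JanusGraph instance writes, since the cache can otherwise serve stale data
cache.db-cache-time = 180000
# values below 1 are interpreted as a fraction of the JVM heap
cache.db-cache-size = 0.25
```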
    Mohammad Alian
    Hi, we currently call /?gremlin=g.V().none() to check if JG is up and has a healthy connection to its storage. Calling /?gremlin=graph.open doesn't check whether the storage connection is healthy or not. Is there any other way to do this?
    3 replies
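A sketch of a probe that forces an actual read from the storage backend, assuming the HTTP endpoint from the message above (traversal choice is a suggestion, not an official health-check API):

```shell
# g.V().limit(1).count() must fetch at least one vertex from storage,
# so it fails if the storage connection is down; -f makes curl exit
# non-zero on an HTTP error status
curl -fsS 'http://localhost:8182/?gremlin=g.V().limit(1).count()'
```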
    The JG docs pages on schema management recommend against spaces and special characters in PropertyKey names, but not in Edge or Vertex Label names. Any particular reason for this? We were hoping to standardise on lower case with underscores; now it looks like we need e.g. camelCase on Property Keys. Does anyone know if this is more than a recommendation, e.g. some impact on performance or indexing?
    2 replies
    Michael Wilson

    Hey all. Hopefully this question isn't too noobish. I'm currently just getting started with JanusGraph, and I'm trying to get a general feel for failure modes. In particular, we're building out a service where various teams will have their own individual spaces that we want to protect from one another while still maintaining queryability. For example, if team 1 accidentally does something that takes down their space in JanusGraph, we want to ensure that teams 2, 3, 4, and 5 are not affected and their jobs/processes still continue.

    That being said, I'm not sure if this question is valid or realistic, so a gut check would be very much appreciated here.

    1 reply
    Vinayak Shiddappa Bali

    Hi All,

    Trying to implement OLAP to improve the performance of count queries.
    I referred to the document above, but it's still not working.
    Error: Spark master not responding, even though it is running.
    "Error while invoking RpcHandler #receive() for one-way message" while the Spark job is hosted on JBoss and trying to connect to the master.


    Florian Cäsar

    Hi, I'm experiencing a strange no-response bug somewhere between gremlin-python, JanusGraph and Gremlin Server:
    Some long traversals (with thousands of instructions) don't get a response. And I don't mean that they time out or that they get an error back; I mean they get nothing back whatsoever. No log entries at the server, nothing in the client.

    The threshold for this silent treatment isn't clear - it doesn't depend cleanly on bytes or number of instructions. For instance, this code will hang forever for me:

    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

    g = traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin', 't'))
    g = g.inject("")
    for i in range(0, 8000):
        g = g.constant("test")
    print(f"submitting traversal with length={len(g.bytecode.step_instructions)}")
    result = g.next()
    print(f"done, got: {result}") # this is never reached

    I know it doesn't depend on just the number of bytes because the number of instructions beyond which this happens doesn't change even with very large strings instead of just "test".

    Just in case, I've already increased server-side maxHeaderSize, maxChunkSize, maxContentLength etc. to ridiculously high numbers. No change.

    Any ideas what I'm doing to deserve the silent treatment? This is driving me insane.

    3 replies
    Florian Cäsar

    I'm trying to use JanusGraph's full-text predicates in the gremlin-python client library. Using GraphSON serializer, I can just use a predicate with e.g. "textContains" as the operator and it works since JanusGraphIoRegistryV1d0 registers its custom deserializer for P objects:

    addDeserializer(P.class, new JanusGraphPDeserializerV2d0());

    However, as of v0.5.3, JanusGraph does not register any deserializers for GraphBinary (though that feature is already on the master branch). This means that when I submit the same exact traversal with P("textContains", "string") in graphbinary format I get:

    org.apache.tinkerpop.gremlin.server.handler.OpSelectorHandler  - Invalid OpProcessor requested [null]

    I presume this is because the "textContains" predicate isn't registered. Weirdly enough, in my Groovy console, the same traversal works fine even though it also uses graphbinary (according to the configuration).

    There are a couple options here and I don't have enough information on any of them, so I would appreciate input:

    1. Figure out what the Groovy console is doing differently and use that in the Python library
    2. Use a Docker image from master branch and adapt the Python library to use the new custom JanusgraphP type in graphbinary
    3. Use two separate clients with different serializations depending on which traversal I need to run (yuck)

    Note: I've already tested https://github.com/JanusGraph/janusgraph-python, it does the same thing I do manually and thus only works with GraphSON.


    ok, found it - these configuration options:

    schema.constraints: 'true'
    schema.default: none

    How can these options be set on an already created graph? I can do it in ConfiguredGraphFactory, but what about a running graph?

    3 replies

    How do I set schema constraints on PropertyKeys i.e. property annotations? In the reference on Advanced Schema - Multi Properties, it demonstrates how to add property annotations. That is, properties on properties. I would like to schema constrain these annotations e.g. to create an address block with annotations for house number, street name etc.

    If I use mgmt.addProperties(myPropertyKey, myOtherPropertyKey) I get an error. This capability is not documented or does not seem to exist. What is the correct procedure?

    Venkat Dasari
    I am trying to follow the bulk-loading procedure described on the JanusGraph documentation page: loading the vertices, saving the vertex id for each key, and then using that to load the edges between the vertices. The problem is that the vertex id does not match what's in the graph. I tried to generate the vertex id after the configuration changes, but it won't allow me to insert any value as the vertex id. Any suggestions?
    10 replies
    Harshit Sharma
    I was using this query g.V(vertexId).repeat(both(labels).simplePath()).hasLabel(person).emit().id().fold().next()
    to find the distinct persons in a connected component.
    In my dataset I have around 8 million vertices and 8 million edges, and
    this query seems to be very expensive in terms of time. So is there a way to optimize this query?
    Florian Cäsar

    What is the recommended way of backing up Janusgraph data together with mixed indices?
    How do you ensure the backups are consistent?

    I've read all I could find on this and the gist seems to be "use whatever backup mechanism your backend storage supports", which is fine (Scylla has backups), but doesn't answer the question of coordinating the backups.

    Alternatively, I could of course rebuild indices from scratch after backup, but that seems unnecessarily expensive.

    Venkat Dasari
    I am doing a bulk load of vertices and trying to save the vertex ids. When I add 50 to 100 of them and then do vertex.toList(), it returns only the last one. What is the recommended way of getting all the vertices?
    1 reply
    Alessandro Sivieri
    Hello! Does anyone have experience running OLAP on an AWS EMR cluster? I am currently trying to do so, but strange things are happening. In particular, I have the impression that the application is running only on the master and not on the other nodes: it appears correctly in the Hadoop and Spark UIs, but it uses only two executors (I have configured memory and cores in my properties), which seem to occupy the entire driver's resources and that's it.
    Moreover, I seem to have problems getting correct output: I started from the example properties that use CQL, but I do not receive any meaningful answer to queries in the Gremlin Console (the data is there, because I am able to query it without Spark). I saw that the conf sets a NullOutputFormat as the GraphWriter, so I tried to set the Gryo one there, but nothing changed.

    Hello, has anyone tried custom data types in JanusGraph?
    I tried following the doc - https://docs.janusgraph.org/advanced-topics/serializer/ but always ended up with an error about root.attributes.custom.attribute-class.

    That suggests the list has not been handled correctly from the configuration? No other config reads from a list, so I can't confirm whether it is indeed a bug.
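For reference, the serializer doc linked above registers custom types through numbered configuration groups under attributes.custom. A hedged sketch of the expected shape (the class names are placeholders, not real classes):

```properties
# Each custom type gets its own numbered group; the attribute-class names
# the data type and serializer-class the AttributeSerializer implementation
attributes.custom.attribute1.attribute-class = org.example.MyType
attributes.custom.attribute1.serializer-class = org.example.MyTypeSerializer
```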

    Vinayak Shiddappa Bali
    Hi All, 
    I need to select multiple nodes and edges and display the content in v1 - e - v2 format. The query generated is as follows:
    g.V().union(has('title', 'V1').as('v1').outE().hasLabel('E1').as('e').inV().has('title', 'V2').as('v2'),has('title', 'V2').as('v1').union(outE().hasLabel('E2').as('e').inV().has('title', 'V2'),outE().hasLabel('E3').as('e').inV().has('title', 'V3')).as('v2')).select('v1','e','v2').by(valueMap().by(unfold()))
    It throws the warning:
    05:20:21 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
    How can we optimize the query, perhaps without the union step?
    Thanks & Regards,
    This message was deleted
    1 reply

    Hello, can anyone help me with how to index?

    The query which fails for indexing is

                __.has("email_from", P.within("A"))
                        __.has("email_to", P.within("B")),
                        __.has("email_copy_to", P.within("B"))

    The index that has already been added is

    janusGraphManagement.buildIndex("email_from_to_copy_list", Vertex.class)
              .addKey(emailFrom, Mapping.STRING.asParameter())
              .addKey(emailTo, Mapping.STRING.asParameter())
              .addKey(emailCopyTo, Mapping.STRING.asParameter())

    As you might have already guessed, "email_to", "email_from" and "email_copy_to" all have a cardinality of type LIST.

    Even after adding the above index, I still get a WARN while querying that the query would be faster with an index. The query is also very time-consuming.

    1 reply
    Hi All, what is the roadmap for JanusGraph with DynamoDB as a backend? I see that the https://github.com/awslabs/dynamodb-janusgraph-storage-backend repo is running on an older version of JG.
    2 replies