Jan Prill
@janprill
ah, ok...
Haifeng Li
@haifengl
It is a good idea though. I will add it
Btw, I have already added bulk loading of edges from a file
Not pushed yet
Haifeng Li
@haifengl
WriteBatch on RocksDB doesn't improve performance much either
Jan Prill
@janprill
ok. Thanks for being so responsive. Great project...
Haifeng Li
@haifengl
@janprill which framework/library do you use to parse RDF, the format DBpedia uses?
Jena?
Debmalya Jash
@debmalya
Which one is better, Scala or Kotlin? Does it depend on the use case / problem we are trying to solve? I do not know either Scala or Kotlin. I am asking to find out which one would be better to start with.
Haifeng Li
@haifengl
This is the first time I have heard of Kotlin. Without knowing its design, I cannot compare it to Scala technically. But I would like to point out that Scala has built a fairly big community and a rich ecosystem. With the success of Akka, Spark, and other systems built on top of Scala, I feel that Scala probably has an edge in commercial support and talent pool.
Douglas Arantes
@douglas_jvm_twitter
@debmalya Kotlin Comparison to Scala
Haifeng Li
@haifengl
@janprill I added support for importing RDF files
check out the example in shell/src/universal/examples/dbpedia.sh
Jan Prill
@janprill
@haifengl: I was just testing performance, so correct parses were not too important. I went with a Tokenizer, as you did in the old version of dbpedia.sh. I saw that you made changes there. Great stuff! Going to test once again with a new download of dbpedia-data.
Haifeng Li
@haifengl
My example is single-node, single-threaded
If you have a cluster, we should see a linear speedup
We can speed it up further by caching vertex ids
Debmalya Jash
@debmalya
thanks @douglas_jvm_twitter
Haifeng Li
@haifengl
added a vertex string key cache
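A minimal sketch of the idea behind such a cache, assuming vertex ids are Long values resolved from string keys by some slow lookup. `VertexIdCache` and its `lookup` parameter are hypothetical names for illustration, not Unicorn's actual classes.

```scala
import scala.collection.mutable

// Hypothetical cache in front of a slow key-to-id lookup; not Unicorn's actual class.
// Not thread-safe: intended for a single-threaded loader like the example above.
final class VertexIdCache(lookup: String => Long) {
  private val ids = mutable.HashMap.empty[String, Long]
  private var misses = 0

  // Return the cached id, calling the backing lookup only on the first request for a key.
  def idOf(key: String): Long =
    ids.getOrElseUpdate(key, { misses += 1; lookup(key) })

  // Number of times the backing lookup was actually invoked.
  def missCount: Int = misses
}
```

When loading edges, the same vertex key tends to appear many times, so repeated calls to `idOf` with that key reach the store only once.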
Haifeng Li
@haifengl
we now support SQL!
in the shell, create a table "worker" and insert several documents like
val joe = JsObject(
  "name" -> "John",
  "age" -> 40,
  "gender" -> "Male",
  "salary" -> 80000.0,
  "address" -> JsObject(
    "street" -> "1 ADP Blvd",
    "city" -> "Roseland",
    "state" -> "NJ",
    "zip" -> "07068"
  ),
  "project" -> JsArray("HCM", "NoSQL", "Analytics")
)
then query like this
val db = Narwhal(HBase())
db.sql("select address.state, count(address.state), max(age), avg(salary) as avg_salary from worker group by address.state order by avg_salary")
note that all the computation, like sum, avg, group by, and order by, is done on the client side
so this should only be applied to small data
your table can still be large
but make sure to use a where clause to filter the data, which is done on the server side
BTW, the return type is unicorn.json.DataFrame
it is not a full-blown data frame as in R
but it supports all we need for sql queries.
if you need other operations on data frame, let me know.
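To make the client-side cost concrete, here is a rough sketch of what the group by / order by part of the query above amounts to once rows have arrived at the client. It uses plain Scala collections rather than unicorn.json.DataFrame; `Worker`, `StateSummary`, and `ClientSideAggregation` are hypothetical names, and `rows` stands for whatever is left after the server-side where filter.

```scala
// Hypothetical flattened row; Unicorn itself stores JSON documents.
final case class Worker(name: String, age: Int, salary: Double, state: String)

// One output row per state, mirroring the columns of the SQL query above.
final case class StateSummary(state: String, count: Int, maxAge: Int, avgSalary: Double)

object ClientSideAggregation {
  // Group, aggregate, and sort entirely in memory, as a client has to.
  // Every row in `rows` must already have been transferred from the store.
  def summarize(rows: Seq[Worker]): Seq[StateSummary] =
    rows
      .groupBy(_.state)
      .map { case (state, group) =>
        StateSummary(
          state,
          group.size,
          group.map(_.age).max,
          group.map(_.salary).sum / group.size
        )
      }
      .toSeq
      .sortBy(_.avgSalary)

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Worker("John", 40, 80000.0, "NJ"),
      Worker("Mary", 35, 90000.0, "NJ"),
      Worker("Bob", 50, 70000.0, "NY")
    )
    summarize(rows).foreach(println)
  }
}
```

Memory and time grow with the number of rows that reach `summarize`, which is why filtering on the server first matters.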
Haifeng Li
@haifengl
@all Unicorn 2.1 is released
RocksDB is supported as a super low latency local storage engine.
We also now support SQL for operational queries.
Besides, there are many small improvements to the graph database.
Danila Matveev
@optician
Hi!
I've just found this project. At a glance, there is no information about scalability, performance, or cluster behaviour in the README. Could you point me to where I can find this?
Haifeng Li
@haifengl
scalability, performance, and sharding really depend on the underlying storage engine.
with HBase and Accumulo, you get consistency, linear scalability, and good read performance.
with Cassandra, you get extreme availability and high write throughput, but no consistency.
with RocksDB, you get very low latency (in microseconds), which is important for graph traversal. however, it is a local embedded database.
Danila Matveev
@optician
Does unicorn have local caches?
Danila Matveev
@optician

If so, how does cache invalidation work?

And also

  • Does it have transactions?
  • What threading policy is used? For example, Titan forces one synchronous transaction per thread. It makes me a little nervous about the number of threads.

Sorry for the welter of questions)

Haifeng Li
@haifengl
for graph traversal, it has a cache inside "SimpleTraveler". SimpleTraveler is a lightweight object and should be thrown away after your traversal.
each thread can have its own SimpleTraveler, and it is safe to use unicorn from multiple threads.
unicorn doesn't have transactions. feel free to use as many threads as your system can handle.
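A sketch of that usage pattern, assuming nothing about SimpleTraveler's real API: `CachingTraveler`, `TravelerPerThread`, and the toy in-memory graph are hypothetical stand-ins. Each thread builds its own traveler, so the cache is never shared, and dropping the traveler after the traversal drops the cache with it.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a traveler with a private cache; not Unicorn's SimpleTraveler.
final class CachingTraveler(fetchNeighbors: Int => Seq[Int]) {
  // Private to this instance, so no synchronization is needed as long as
  // the instance is never shared between threads.
  private val cache = mutable.HashMap.empty[Int, Seq[Int]]

  private def neighbors(v: Int): Seq[Int] =
    cache.getOrElseUpdate(v, fetchNeighbors(v))

  // Breadth-first search returning every vertex reachable from `start`.
  def reachable(start: Int): Set[Int] = {
    val seen = mutable.HashSet(start)
    val queue = mutable.Queue(start)
    while (queue.nonEmpty) {
      for (n <- neighbors(queue.dequeue()) if seen.add(n)) queue += n
    }
    seen.toSet
  }
}

object TravelerPerThread {
  // Toy in-memory graph standing in for the storage engine.
  val graph: Map[Int, Seq[Int]] =
    Map(1 -> Seq(2, 3), 2 -> Seq(4), 3 -> Seq(4), 4 -> Seq.empty)

  // Each call builds a fresh traveler, uses it once, and lets it be garbage collected.
  def traverse(start: Int): Set[Int] =
    new CachingTraveler(v => graph.getOrElse(v, Seq.empty)).reachable(start)

  def main(args: Array[String]): Unit = {
    // One traveler per thread; the threads share only the read-only graph.
    val threads = Seq(1, 2, 3).map { s =>
      new Thread(() => println(s"from $s: ${traverse(s).toSeq.sorted}"))
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
  }
}
```

Because the cache lives and dies with one traversal, there is nothing to invalidate across threads in this sketch; a long-lived shared cache would need a different design.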
Danila Matveev
@optician
Thanks a lot!