Simer Plaha
Thank you for your awesome feedback :- )
Simer Plaha

@/all Hey guys just released version 0.3. This release contains support for expire API (TTL), update & improved batching for atomic writes. Here is a sample of some of the new API.

//put & expire a key-value after a day
db.put(key = 1, value = "one", expireAfter = 1.day)
//or expire a range of key-values after an hour
db.expire(from = 1, to = 1000, after = 1.hour)
//update values without altering the already set expiration
db.update(from = 1, to = 1000, value = "value updated")
//or update a single value
db.update(key = 1, value = "value updated")
//fetch the expiration deadline for a key
db.expiration(key = 1)
//fetch time left until the key's expiration
db.timeLeft(key = 1)
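
To make the semantics above concrete, here is a minimal in-memory sketch, assuming a hypothetical `TtlMap` class. It is a toy model and not SwayDB's implementation; it only mirrors the behaviour described (a per-key deadline, and `update` leaving an already set deadline untouched).

```scala
import scala.collection.mutable
import scala.concurrent.duration._

// Toy in-memory model of the TTL semantics above; NOT SwayDB's implementation.
final class TtlMap[K, V] {
  // Each key holds its value plus an optional expiration deadline.
  private val entries = mutable.Map.empty[K, (V, Option[Deadline])]

  // Drop the entry if its deadline has passed, then return what is left.
  private def live(key: K): Option[(V, Option[Deadline])] = {
    if (entries.get(key).exists(_._2.exists(_.isOverdue()))) entries.remove(key)
    entries.get(key)
  }

  def put(key: K, value: V): Unit =
    entries.update(key, (value, None))

  def put(key: K, value: V, expireAfter: FiniteDuration): Unit =
    entries.update(key, (value, Some(expireAfter.fromNow)))

  // Set (or replace) the deadline of an existing key.
  def expire(key: K, after: FiniteDuration): Unit =
    live(key).foreach { case (value, _) => entries.update(key, (value, Some(after.fromNow))) }

  // Replace the value but keep whatever deadline was already set.
  def update(key: K, value: V): Unit =
    live(key).foreach { case (_, deadline) => entries.update(key, (value, deadline)) }

  def get(key: K): Option[V] = live(key).map(_._1)

  // The deadline at which the key expires, if one is set.
  def expiration(key: K): Option[Deadline] = live(key).flatMap(_._2)

  // Time remaining until the key expires, if a deadline is set.
  def timeLeft(key: K): Option[FiniteDuration] = expiration(key).map(_.timeLeft)
}
```

The sketch is single-threaded and removes expired entries lazily on access; a real storage engine would handle both differently.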

Here is a whole list of write API and read API.

I'm going to start working on getting compression ready for v0.4. If you think of any features you think we should add please do bring it up.

Other than that, all the core APIs are done. Any new features can be built as extension libraries. Let me know what you think.
Frank Rosner
I don't really have any strong opinions but great work releasing 0.3!
Also I couldn't find the time to look into the travis build :(
Simer Plaha

Thanks Frank.
No worries, whenever you have time is OK. We will eventually get to it :)

Loved your blog posts BTW :+1:

Simer Plaha

@/all Just pushed a big commit to support compression. Both LZ4 and Snappy libraries are fully supported. It's part of the grouping strategy, which you can read up on here.

Segment file format documentation
Grouping format documentation

Will do a release after simerplaha/SwayDB#15

Frank Rosner
Yeah :thumbsup:
Simer Plaha
@/all Just released v0.4. I've added a topic to scala-lang with a brief outline of the release.
Peter C.
Hello There, Do you guys have any stats on the storage footprint of SwayDB in comparison to LevelDB or LMDB?
in terms of mmap file
Simer Plaha

Hi @touhonoob, just ran some quick space usage tests on 10 million key-values with compression disabled for both LevelDB and SwayDB.

SwayDB provides two storage formats, Map and Set, so I've added space usage for both.

The following shows the total size of the sstables created by LevelDB and the segment files created by SwayDB.

Space usage when keys and values are unique integers

Key -> Value = 1 -> 1, 2 -> 2, 3 -> 3 .... 10000000 -> 10000000

  • LevelDB - 165.9 MB
  • SwayDB Map - 142.1 MB
  • SwayDB Set - 132.1 MB (Set databases reduce the space usage even further)

Unique keys only and same/duplicate values

Key -> Value = 1 -> 1, 2 -> 1, 3 -> 1 .... 10000000 -> 1

  • LevelDB - 165.9 MB
  • SwayDB Map - 92.1 MB (Because SwayDB eliminates duplicate values and writes them only once, the space usage is reduced even further)
  • SwayDB Set - 132.1 MB (Set combines key-values into a single key so there are no duplicates)
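
For a quick sense of scale, the figures above can be turned into relative savings against LevelDB. This is only arithmetic on the numbers quoted in this message; the object and value names are made up for the example.

```scala
object SpaceSavings {
  // LevelDB size in MB, copied from the figures above (10 million key-values, compression disabled).
  val levelDb = 165.9

  // Percentage by which `size` is smaller than the LevelDB baseline, rounded to one decimal.
  def savedPercent(size: Double): Double =
    math.round((levelDb - size) / levelDb * 1000) / 10.0

  val uniqueValuesMap    = savedPercent(142.1) // 14.3
  val setFormat          = savedPercent(132.1) // 20.4
  val duplicateValuesMap = savedPercent(92.1)  // 44.5
}
```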

I'm not too familiar with LevelDB's configuration but I used https://github.com/fusesource/leveldbjni with the following options.

val options = new Options()

For SwayDB I used the defaults with no groupingStrategy, which disables compression.

Peter C.
@simerplaha Wow that's impressive. Thanks for your time.
Simer Plaha
No worries. Let me know if there is anything else :)
Valentyn Kolesnikov
Java wrapper is available in maven central repository. :+1:
Simer Plaha
That's awesome Valentyn. Thank you so much.
Simer Plaha
Hey @/all just created a slack account for some of us who prefer slack - here is the invite link.
@simerplaha impressive work! I’m playing around with Scala, FP, monix and swaydb for self-enjoyment. I have a few questions, I would love if you find some time to answer them.
  • you say that sway is non-blocking internally, but the async API seems to only “wrap” the sync one in an asynchronous manner. Can you give me pointers to understand if what I say is true or false?
  • what is the status of the integration with monix to replace the Future with monix Tasks ?
Simer Plaha

Hi @algobardo. Async APIs should not wrap sync APIs. I'm wondering if you are looking at an old version which had Async APIs work-in-progress?

In the newer versions (0.8-BETA+) all APIs are abstracted with the type Tag[T[_]] which allows us to choose whichever container we want to use for Sync and Async APIs.

For example: if you want to use Monix Task.

Same goes with Stream
As far as replacing Future with monix goes, the above approach allows us to use any container, like Monix’s Task, Scala Future or Try, Scalaz Task, ZIO etc.
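
As a rough illustration of abstracting an API over a container type, here is a minimal sketch using only the Scala standard library. The `Tag` trait below is a hypothetical stand-in with two operations, not SwayDB's actual `Tag`, and `ToyStore` is invented for the example.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.Try

// Hypothetical stand-in for the container abstraction; the real Tag has more operations.
trait Tag[T[_]] {
  def apply[A](a: => A): T[A]
  def map[A, B](fa: T[A])(f: A => B): T[B]
}

object Tag {
  // Synchronous container: results come back as Try.
  implicit val tryTag: Tag[Try] = new Tag[Try] {
    def apply[A](a: => A): Try[A] = Try(a)
    def map[A, B](fa: Try[A])(f: A => B): Try[B] = fa.map(f)
  }

  // Asynchronous container: results come back as Future.
  implicit def futureTag(implicit ec: ExecutionContext): Tag[Future] = new Tag[Future] {
    def apply[A](a: => A): Future[A] = Future(a)
    def map[A, B](fa: Future[A])(f: A => B): Future[B] = fa.map(f)
  }
}

// A toy store written once against Tag; the caller picks the container type.
final class ToyStore[T[_]](implicit tag: Tag[T]) {
  private val data = new java.util.concurrent.ConcurrentHashMap[Int, String]()

  def put(key: Int, value: String): T[Unit] = tag { data.put(key, value); () }
  def get(key: Int): T[Option[String]] = tag(Option(data.get(key)))
}
```

With this shape, `new ToyStore[Try]` gives a synchronous API and `new ToyStore[Future]` (with an `ExecutionContext` in scope) gives an asynchronous one, without changing the store itself.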
Simer Plaha
in 0.8-BETA+ the tagAsync is named asyncAPI which is now simplified and will be in the next version.
Simer Plaha
My bad I forgot. Writes do use sync API within async to maintain the order of inserts and updates. This might change depending on if there is a performance or reduced resource benefit. Will have to run some benchmarks.
Simer Plaha

@algobardo hey mate, just pushed another release with Monix support. Added 2 examples here demoing how to use Task.

I've never worked with monix before so any suggestions on improving the examples or testing would help a lot.

Let me know how you go.

Ori Dagan
Hi @simerplaha, first let me say that your project looks great. A fast, non-blocking, scala-native embedded DB is a welcome addition to the ecosystem. We are considering giving it a try so I wanted to ask about its production readiness and if you are planning a production release any time soon. Thanks!
Simer Plaha

Hi @oridag, thank you. Supporting scala-native is something I'm looking forward to as well (#1).

I wish I could say that we are production ready now but we need to write more integration test-cases (#178) and finish a few tasks relating to measuring and monitoring performance (especially #276).

Best case scenario is that we reach production readiness before the new year; otherwise we will definitely be production ready by early-mid next year.

Ori Dagan
Got it. Thanks!
Ori Dagan
Hi @simerplaha , just checking to see if there is any new ETA for production release?
Simer Plaha
Hi @oridag, I haven’t been able to find much OS time recently. Can’t say much on the ETA but we are not that far from it.
Glen Marchesani
Hello, I am wondering about cats effect 3 support?
Simer Plaha
You can copy this Bag implementation into your code. Proper release will happen when I win the battle against JVM's garbage collector.
import cats.effect.IO
import cats.effect.unsafe.IORuntime
import swaydb.Bag.Async
import swaydb.serializers.Default._
import swaydb.{IO => SwayIO, _}

import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.Failure

object Cats3Example extends App {

  /**
   * Cats-effect 3 async bag implementation.
   *
   * NOTE: the method bodies are assumptions built from standard cats-effect 3
   * IO operations; fromIO additionally assumes swaydb.IO provides toTry.
   */
  implicit def apply(implicit runtime: IORuntime): swaydb.Bag.Async[IO] =
    new Async[IO] { self =>

      override def executionContext: ExecutionContext =
        runtime.compute

      override val unit: IO[Unit] =
        IO.unit

      override def none[A]: IO[Option[A]] =
        IO.pure(Option.empty)

      override def apply[A](a: => A): IO[A] =
        IO(a)

      override def map[A, B](a: IO[A])(f: A => B): IO[B] =
        a.map(f)

      override def transform[A, B](a: IO[A])(f: A => B): IO[B] =
        a.map(f)

      override def flatMap[A, B](fa: IO[A])(f: A => IO[B]): IO[B] =
        fa.flatMap(f)

      override def success[A](value: A): IO[A] =
        IO.pure(value)

      override def failure[A](exception: Throwable): IO[A] =
        IO.fromTry(Failure(exception))

      override def foreach[A](a: IO[A])(f: A => Unit): Unit =
        f(a.unsafeRunSync())

      def fromPromise[A](a: Promise[A]): IO[A] =
        IO.fromFuture(IO(a.future))

      override def complete[A](promise: Promise[A], a: IO[A]): Unit =
        promise tryCompleteWith a.unsafeToFuture()

      override def fromIO[E: SwayIO.ExceptionHandler, A](a: SwayIO[E, A]): IO[A] =
        IO.fromTry(a.toTry)

      override def fromFuture[A](a: Future[A]): IO[A] =
        IO.fromFuture(IO(a))

      override def suspend[B](f: => IO[B]): IO[B] =
        IO.defer(f)

      override def flatten[A](fa: IO[IO[A]]): IO[A] =
        fa.flatMap(io => io)
    }

  //explicit type so the implicit is visible before its definition
  implicit val runtime: IORuntime = IORuntime.global

  val test =
    for {
      map <- memory.Map[Int, String, Nothing, IO]()
      _ <- map.put(key = 1, value = "one")
      value <- map.get(key = 1) //returns "one"
    } yield {
      println(s"value: $value")
    }

  //run the program
  test.unsafeRunSync()
}

Glen Marchesani
thanks @simerplaha I see you did a cats effect 3 release so I am guessing you won the battle with the garbage collector :-)
cooking with gas on swaydb... I would love to understand more about why you made swaydb. It looks like the perfect kit for my project (have previously used pulsar and then rocksdb)..
the ability to expire multi-maps covers my use cases nicely. My use case is http based messaging middleware with a large number of topics / mailboxes.
where swaydb would be the underlying storage engine
I have it prototyped and all looks well. Wondering if there are any performance knobs I should turn.
Right now it is a direct port from the RocksDB code so topic and database partitioning is not managed by the storage engine BUT with multi-maps and expirations I can easily move it all into that layer...
the RocksDB code we create a RocksDB per day and at an atomic moment each day we drop the oldest rocks db from the end of partition list and add a new one to front of the list...
our structure is PartitionId : TopicId : TopicOffset
Glen Marchesani
which I think we just move into multi-maps and slightly change the order TopicId : PartitionId : TopicOffset
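
The key layout and daily rotation described here can be sketched in plain Scala. This is a hypothetical model, not SwayDB or RocksDB code: `MessageKey`, `rotate` and the `keep` parameter are invented names, and it assumes one partition per day identified by an integer.

```scala
object TopicKeys {
  // Hypothetical composite key in the reordered layout: TopicId : PartitionId : TopicOffset.
  final case class MessageKey(topicId: Long, partitionId: Int, offset: Long)

  // Sort by topic first, then partition, then offset within the partition.
  implicit val ordering: Ordering[MessageKey] =
    Ordering.by((k: MessageKey) => (k.topicId, k.partitionId, k.offset))

  // Daily rotation as described: the new partition goes to the front of the list
  // and anything beyond `keep` partitions is dropped from the end.
  def rotate(partitions: List[Int], newPartition: Int, keep: Int): List[Int] =
    (newPartition :: partitions).take(keep)
}
```

With topic first in the ordering, all messages of a topic sort next to each other, which is what makes a per-topic range read or expiry straightforward.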
thanks again for the great kit...
really just amazed that this scala based kv store slid under my radar and I am just finding it now
Simer Plaha

I see you did a cats effect 3 release so I am guessing you won the battle with the garbage collector :-)

Oh the battle with the GC has been on for a while. SwayDB outperforms RocksDB when write workload is small-medium but on heavy compaction workloads longer GC pauses occur frequently so the battle is still on.

I would love to understand more about why you made swaydb.

There were many reasons to start but in general I felt that existing solutions always fell short one way or another. I thought a storage engine that (following general Scala philosophy) allowed simple data-structures to be composed easily to build richer data-structures was needed. A company I used to work at got hefty monthly cloud bills for running ML training on large data so reducing these bills was also a major motivation.

My use case is http based messaging middleware with a large number of topics / mailboxes. where swaydb would be the underlying storage engine

I'm glad to hear that and would love to learn more. Is it a distributed system?

Wondering if there are any performance knobs I should turn.

There is heaps you can do here. The basic idea behind all configurations is that if something can determine Disk, CPU & RAM usage then it should be configurable. But I think this needs more documentation showing experiments with different settings and results.

the RocksDB code we create a RocksDB per day and at an atomic moment each day we drop the oldest rocks db from the end of partition list and add a new one to front of the list...

That's clever. Yep MultiMap should make this very easy.

I'm sure you know this already but please note that SwayDB is not production ready yet. Quick status overview is that there are a total of 2,897 test-cases (unit, stress, integration & performance tests) to ensure that nothing leads to incorrect data or data corruption but a solution for reducing GC pauses on heavy compaction workload is still pending.

Glen Marchesani
roger on non-production ready it's all good, I have bloody fingers on many fronts
I have some customers that give me some leeway :-)
So the server I have (we call it hermes) has been used in production for 8 years. It has some commercial IP in its code base that has stopped us from open sourcing it. My current effort has been transitioning the code base to a 100% open source base.
Glen Marchesani

So the general idea is the following

* mailboxes are lightweight and are created with a public and a private key
* you can write messages to a mailbox using the public key
* you can read messages from a mailbox using the private key
* best practice transport is to use the http methods they use server sent events and / or web sockets with a few other more esoteric mechanisms
* the idea is you can do all your RPC over this.  So browsers can talk directly to browsers, servers can talk to servers, a browser doesn't have to talk to only "its" server but can be easily moved to talk to another server
* all mailboxes are just a sequence of messages ALWAYS stored to disk
* reading a mailbox is just saying what index to start at
* tailing a mailbox is an in memory transaction (lowers latency)
* work load is LOTS of mailboxes without any specific mailbox really getting a ton of IO though we have use cases where we push pulsar / kafka data through a mailbox and hence get high IO, those work fine you 
* designed to be distributed but single server for now
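
The mailbox model in the list above can be sketched as a small in-memory class. This is a hypothetical illustration only: `Mailbox`, its method names and the error strings are invented, keys are compared as plain strings, and nothing is persisted to disk as the real server does.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical in-memory model of a mailbox; the server described above always stores messages to disk.
final class Mailbox(publicKey: String, privateKey: String) {
  // A mailbox is just an append-only sequence of messages.
  private val messages = ArrayBuffer.empty[String]

  // Writers present the public key; returns the index assigned to the message.
  def write(key: String, message: String): Either[String, Int] =
    if (key != publicKey) Left("invalid public key")
    else {
      messages += message
      Right(messages.size - 1)
    }

  // Readers present the private key and the index to start reading from.
  def read(key: String, fromIndex: Int): Either[String, List[String]] =
    if (key != privateKey) Left("invalid private key")
    else Right(messages.drop(fromIndex).toList)
}
```

The sketch is not thread-safe and leaves out tailing; it only shows the write-with-public-key, read-from-index-with-private-key shape.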

we have tried a LOT of things to make this work over the years (various messaging servers most recently pulsar) and they all sort of fall over at some point... for example pulsar with a large number of topics / mailboxes all tailing thrashes really hard, like we can take down large pulsar clusters kind of thrashing.

In effect you can get around the infrastructure game (load balancing, etc, etc) by using this, which is what we do. We run some large customers using it and really avoid a TON of infrastructure headaches (think legacy systems, multiple clouds, things moving around all the time and multiple teams)... We run these systems with very small teams because of it. We deploy a service, make sure it can reach the hermes server and everything else just works.

For debugging it is really cool. All I need are the two mailboxes of a conversation and I can re-construct the entire conversation... If you built your browser app properly you can even do some Elm-like things for replay...