simerplaha on develop
- bump Scala & sbt versions and… (compare)
@/all Hey guys just released version 0.3. This release contains support for expire API (TTL), update & improved batching for atomic writes. Here is a sample of some of the new API.
//put & expire a key-value after a day
db.put(key = 1, value = "one", expireAfter = 1.day)
//or expire a range of key-values after an hour
db.expire(from = 1, to = 1000, after = 1.hour)
//update values without altering the already set expiration
db.update(from = 1, to = 1000, value = "value updated")
//or update a single value
db.update(key = 1, value = "value updated")
//fetch the expiration deadline for a key
db.expiration(key = 1)
//fetch time left until the key's expiration
db.timeLeft(key = 1)
Here is a whole list of write AP and read API.
I'm going to start working on getting compression ready for v0.4. If you think of any features you think we should add please do bring it up.
@/all Just pushed a big commit to support compression. Both LZ4 and Snappy libraries are full supported. It's part of the grouping strategy you can read up here .
Segment file format documentation
Grouping format documentation
Will do a release after simerplaha/SwayDB#15
Hi @touhonoob, just ran some quick space usages tests on 10 million key-values with compression disabled for both LevelDB and SwayDB.
SwayDB provides two storage formats Map
and Set
. So I've added space usage for both.
The following shows the total size of sstables created by LevelDB and segments files by SwayDB.
Key -> Value = 1 -> 1, 2 -> 2, 3 -> 3 .... 10000000 -> 10000000
Key -> Value = 1 -> 1, 2 -> 1, 3 -> 1 .... 10000000 -> 1
I'm not too familiar with LevelDB's configuration but I used https://github.com/fusesource/leveldbjni with the following options.
val options = new Options()
options.createIfMissing(true)
options.compressionType(CompressionType.NONE)
options.writeBufferSize(10000000)
For SwayDB I used the default with none groupingStrategy
which disables compression.
Hi @algobardo. Async APIs should not wrap sync APIs. I'm wondering if you are looking at an old version which had Async APIs work-in-progress?
In the newer versions (0.8-BETA+) all APIs are abstracted with the type Tag[T[_]]
which allows us to choose whichever container we want to use for Sync and Async APIs.
For example: if you want to use Monix Task.
Stream
Future
with monix goes, the above approach allows us to use all container like Monix’s Task
, Scala Future
or Try
, Scalaz Task
, ZIO
etc.
@algobardo hey mate, just pushed another release with Monix support. Added 2 examples here demoing how to use Task.
I've never worked with monix before so any suggestions on improving the examples or testing would help a lot.
Let me know how you go.
Hi @oridag, thank you. Supporting scala-native is something I'm looking forward to as well (#1).
I wish I could say that we are production ready now but we need to write more integration test-cases (#178) and finish few tasks relating to measuring and monitoring performance (specially #276).
Best case scenario is that we reach production readiness before new year otherwise we will definitely be production ready by early-mid next year.
Bag
implementation into your code. Proper release will happen when I win the battle against JVM's garbage collector. import cats.effect.IO
import cats.effect.unsafe.IORuntime
import swaydb.Bag.Async
import swaydb.serializers.Default._
import swaydb.{IO => SwayIO, _}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.Failure
object Cats3Example extends App {
/**
* Cats-effect 3 async bag implementation
*/
implicit def apply(implicit runtime: IORuntime): swaydb.Bag.Async[IO] =
new Async[IO] { self =>
override def executionContext: ExecutionContext =
runtime.compute
override val unit: IO[Unit] =
IO.unit
override def none[A]: IO[Option[A]] =
IO.pure(Option.empty)
override def apply[A](a: => A): IO[A] =
IO(a)
override def map[A, B](a: IO[A])(f: A => B): IO[B] =
a.map(f)
override def transform[A, B](a: IO[A])(f: A => B): IO[B] =
a.map(f)
override def flatMap[A, B](fa: IO[A])(f: A => IO[B]): IO[B] =
fa.flatMap(f)
override def success[A](value: A): IO[A] =
IO.pure(value)
override def failure[A](exception: Throwable): IO[A] =
IO.fromTry(Failure(exception))
override def foreach[A](a: IO[A])(f: A => Unit): Unit =
f(a.unsafeRunSync())
def fromPromise[A](a: Promise[A]): IO[A] =
IO.fromFuture(IO(a.future))
override def complete[A](promise: Promise[A], a: IO[A]): Unit =
promise tryCompleteWith a.unsafeToFuture()
override def fromIO[E: SwayIO.ExceptionHandler, A](a: SwayIO[E, A]): IO[A] =
IO.fromTry(a.toTry)
override def fromFuture[A](a: Future[A]): IO[A] =
IO.fromFuture(IO(a))
override def suspend[B](f: => IO[B]): IO[B] =
IO.defer(f)
override def flatten[A](fa: IO[IO[A]]): IO[A] =
fa.flatMap(io => io)
}
implicit val runtime = IORuntime.global
val test =
for {
map <- memory.Map[Int, String, Nothing, IO]()
_ <- map.put(key = 1, value = "one")
value <- map.get(key = 1) //returns "one"
} yield {
println(s"value: $value")
}
test.unsafeRunSync()
}
PartitionId : TopicId : TopicOffset
I see you did a cats effect 3 release so I am guessing you won the battle with the garbage collector :-)
Oh the battle with the GC has been on for a while. SwayDB outperforms RocksDB when write workload is small-medium but on heavy compaction workloads longer GC pauses occur frequently so the battle is still on.
I would love to understand more about why you made swaydb.
There were many reasons to start but in general I felt that existing solutions always fell short one way or another. Thought a storage engine that (following general Scala philosophy) allowed simple data-structures to be composed easily to build more rich data-structures was needed. A company I used to work at got hefty monthly cloud bills for running ML training on large data so reducing these bills was also a major motivation.
My use case is http based messaging middleware with a large number of topics / mailboxes. where swaydb would be the underlying storage engine
I'm glad to hear that and would love to learn more. Is it a distributed system?
Wondering if there are any performance nobs I should turn.
There is heaps you can do here. The basic idea behind all configurations is that if something can determine Disk, CPU & RAM usage then it should be configurable. But I think this needs more documentation showing experiments with different settings and results.
the RocksDB code we create a RocksDB per day and at an atomic moment each day we drop the oldest rocks db from the end of partition list and add a new one to front of the list...
That's clever. Yep MultiMap
should make this very easy.
I'm sure you know this already but please note that SwayDB is not production ready yet. Quick status overview is that there a total of 2,897 test-cases (unit, stress, integration & performance tests) to ensure that nothing leads to incorrect data or data corruption but a solution for reducing GC pauses on heavy compaction workload is still pending.