Hi Frank, I haven't looked at setting up travis yet.
Random inserts are slower than sequential because of the cost of inserting randomly into the skipList. That's just how skipLists are. In Level0, after key-values get written to a .log file they get added to an in-memory skipList. Sequential insertion just appends to the end of the skipList, adding new upper Levels (in the skipList) as required, but random insertions result in modifying and re-linking (similar to a linkedList) multiple keys in the skipList and sometimes multiple Levels (skipList hierarchies).
You can run a simple test with java.util.concurrent.ConcurrentSkipListMap to see the performance difference.
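Something like this (just a rough sketch of that comparison, not a proper benchmark - no warm-up, timings are only indicative):

import java.util.concurrent.ConcurrentSkipListMap
import scala.util.Random

//rough comparison of sequential vs random inserts into a JDK skipList
object SkipListInsertTest extends App {
  val count = 1000000

  def time(label: String)(block: => Unit): Unit = {
    val start = System.nanoTime()
    block
    println(s"$label: ${(System.nanoTime() - start) / 1000000} ms")
  }

  //sequential inserts append at the tail of the skipList
  val sequential = new ConcurrentSkipListMap[Integer, Integer]()
  time("sequential inserts")((1 to count).foreach(i => sequential.put(i, i)))

  //random inserts land anywhere and re-link existing entries/levels
  val random = new ConcurrentSkipListMap[Integer, Integer]()
  val shuffledKeys = Random.shuffle((1 to count).toVector)
  time("random inserts")(shuffledKeys.foreach(i => random.put(i, i)))
}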
By data structure I assume you mean the 'file formats'. I don't think the current file formats are documented anywhere. An overview of what Map files (.log) and Segment files (.seg) are can be found on the website. The actual file formats still need to be documented. Have a look at the documentation at http://www.swaydb.io/terminology/ which gives an overview of each file and the role it plays in Levels; it should give you a good understanding of how the Levels are structured.
Thank you for the chart :+1:
Also why is the difference getting less when moving from 2-level in-memory towards 8-level regular file?
As you know RAM is a lot faster than MMAP files and MMAP files are faster than regular files (java.io.FileChannel).
The benchmarks should also show performance results when the database's cache is partly populated. As you read key-values they get cached in-memory and performance increases drastically. Reads get a lot faster as the database is running because the most-read keys are in-memory, which other reads piggyback off to save disk seeks.
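A rough way to see this (db is assumed to be an already-open SwayDB map, like in the other snippets in this chat; not a proper benchmark):

//read the same keys twice - the second pass should be noticeably faster
//because those key-values are now cached in-memory
def time[A](label: String)(block: => A): A = {
  val start = System.nanoTime()
  val result = block
  println(s"$label: ${(System.nanoTime() - start) / 1000000} ms")
  result
}

time("cold reads")((1 to 100000).foreach(key => db.get(key)))
time("warm reads")((1 to 100000).foreach(key => db.get(key)))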
As you know RAM is a lot faster than MMAP files and MMAP files are faster than regular files (java.io.FileChannel).
This is clear to me, of course. I was just wondering why the relative performance penalty becomes less for regular files than for MMAP files / in-memory. Do you know what I mean?
I now also understand that you are using skip lists as data structures for the in-memory levels. What are you using for storing on disk? B trees? SSTables? Some other structure?
Hi Frank, I haven't looked at setting up travis yet.
Ok I will see if I can find the time. What should travis run? sbt test?
Thank you for the chart :thumbsup:
I am currently writing a blog post series about data structures to refresh my knowledge. Is it ok if I use your numbers to visualize the effects of compaction and different persistence levels on the throughput? That's where the graph was going to go. Here is the latest post of the series: Read Efficient Data Structures. I am currently working on Update Efficient Data Structures.
What are you using for storing on disk? B trees? SSTables? Some other structure?
I think it's similar to SSTables. Here is an overview of the format - the Segment files store bytes in 3 groups:
1 - Values - the top of the file stores all value bytes.
2 - Index - keys stored in sorted order with other metadata related to the key such as id, compression info, value offset, TTL etc.
3 - Footer - stores the bloomFilter & other file format information like key-value count, CRC, index offsets.
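Roughly, as a conceptual sketch only (these case classes are not SwayDB's actual types or byte layout, just a way to picture the three groups):

//illustration only - NOT SwayDB's actual types or byte layout
case class ValuesBlock(bytes: Array[Byte])             //1 - all value bytes at the top of the file

case class IndexEntry(key: Array[Byte],                //2 - keys in sorted order...
                      id: Int,                         //    ...with metadata: id,
                      compressionInfo: Option[String], //    compression info,
                      valueOffset: Int,                //    value offset,
                      deadline: Option[Long])          //    TTL etc

case class Footer(bloomFilter: Array[Byte],            //3 - bloomFilter & file format info:
                  keyValueCount: Int,                  //    key-value count,
                  crc: Long,                           //    CRC,
                  indexOffset: Int)                    //    index offsets

case class SegmentFileLayout(values: ValuesBlock, index: Seq[IndexEntry], footer: Footer)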
Ok I will see if I can find the time. What should travis run? sbt test?
Yep sbt test is correct.
Is it ok if I use your numbers to visualize the effects of compaction and different persistence levels on the throughput?
Yes of course that's OK. Looking forward to your blog post :+1:
... Here is the latest post of the series: Read Efficient Data Structures. I am currently working on Update Efficient Data Structures.
I had a quick read and it looks very interesting. I will have a proper read soon.
The reason I used a custom format was to get better compression.
A test inserting 10000000 key-values with compression and MMAP disabled resulted in the following disk usage.
The above is just a quick test. I might be missing something in my test to tune LevelDB & RocksDB. It will be interesting to see the disk usage after releasing version 0.4, which will have a better compressed file format and also support for LZ4.
@/all Hey guys just released version 0.3. This release contains support for expire API (TTL), update & improved batching for atomic writes. Here is a sample of some of the new API.
//put & expire a key-value after a day
db.put(key = 1, value = "one", expireAfter = 1.day)
//or expire a range of key-values after an hour
db.expire(from = 1, to = 1000, after = 1.hour)
//update values without altering the already set expiration
db.update(from = 1, to = 1000, value = "value updated")
//or update a single value
db.update(key = 1, value = "value updated")
//fetch the expiration deadline for a key
db.expiration(key = 1)
//fetch time left until the key's expiration
db.timeLeft(key = 1)
Here is a whole list of the write API and read API.
I'm going to start working on getting compression ready for v0.4. If there are any features you think we should add, please do bring them up.
@/all Just pushed a big commit to support compression. Both LZ4 and Snappy libraries are fully supported. It's part of the grouping strategy which you can read up on here.
Segment file format documentation
Grouping format documentation
Will do a release after simerplaha/SwayDB#15
Hi @touhonoob, just ran some quick space usage tests on 10 million key-values with compression disabled for both LevelDB and SwayDB.
SwayDB provides two storage formats, Map and Set, so I've added space usage for both.
The following shows the total size of sstables created by LevelDB and segment files created by SwayDB.
Key -> Value = 1 -> 1, 2 -> 2, 3 -> 3 .... 10000000 -> 10000000
Key -> Value = 1 -> 1, 2 -> 1, 3 -> 1 .... 10000000 -> 1
I'm not too familiar with LevelDB's configuration but I used https://github.com/fusesource/leveldbjni with the following options.
val options = new Options()
options.createIfMissing(true)
options.compressionType(CompressionType.NONE) //compression disabled
options.writeBufferSize(10000000)             //~10MB write buffer
For SwayDB I used the defaults with no groupingStrategy, which disables compression.
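For reference, this is roughly how the LevelDB side of that test can be run with leveldbjni (the exact key/value encoding used for the numbers above isn't shown here, so plain string encoding is used for illustration):

import java.io.File
import org.fusesource.leveldbjni.JniDBFactory.{bytes, factory}
import org.iq80.leveldb.{CompressionType, Options}

//sketch only: write 10 million key-values with compression disabled
//and report the size of the resulting LevelDB directory
object LevelDBSpaceTest extends App {
  val options = new Options()
  options.createIfMissing(true)
  options.compressionType(CompressionType.NONE)
  options.writeBufferSize(10000000)

  val dir = new File("leveldb-space-test")
  val db = factory.open(dir, options)
  try
    (1 to 10000000).foreach(i => db.put(bytes(i.toString), bytes(i.toString)))
  finally
    db.close()

  val totalBytes = Option(dir.listFiles()).toList.flatten.map(_.length()).sum
  println(s"disk usage: ${totalBytes / 1024 / 1024} MB")
}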
Hi @algobardo. Async APIs should not wrap sync APIs. I'm wondering if you are looking at an old version which had work-in-progress Async APIs?
In the newer versions (0.8-BETA+) all APIs are abstracted with the type Tag[T[_]] which allows us to choose whichever container we want to use for Sync and Async APIs.
For example, as far as using Monix's Task with Stream and Future goes, the above approach allows us to use any container like Monix's Task, Scala's Future or Try, Scalaz Task, ZIO etc.
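In other words, the APIs are written once against an abstract T[_] and the concrete effect is supplied by a typeclass instance. Very roughly, the shape of the pattern looks like this (illustration only, not SwayDB's actual Tag trait):

import scala.concurrent.{ExecutionContext, Future}
import scala.util.Try

object TagPatternExample extends App {

  //write the API once against an abstract container and let the caller
  //pick the concrete effect. NOT SwayDB's actual trait - just the idea.
  trait Container[T[_]] {
    def apply[A](a: => A): T[A]
    def map[A, B](ta: T[A])(f: A => B): T[B]
  }

  implicit val tryContainer: Container[Try] =
    new Container[Try] {
      def apply[A](a: => A): Try[A] = Try(a)
      def map[A, B](ta: Try[A])(f: A => B): Try[B] = ta.map(f)
    }

  implicit def futureContainer(implicit ec: ExecutionContext): Container[Future] =
    new Container[Future] {
      def apply[A](a: => A): Future[A] = Future(a)
      def map[A, B](ta: Future[A])(f: A => B): Future[B] = ta.map(f)
    }

  //a key-value API written once against T[_]...
  class Db[T[_]](implicit T: Container[T]) {
    def get(key: Int): T[Option[String]] = T(Some(s"value-$key"))
  }

  //...used synchronously (Try) or asynchronously (Future) just by picking T[_]
  val syncDb = new Db[Try]()

  import ExecutionContext.Implicits.global
  val asyncDb = new Db[Future]()

  println(syncDb.get(1)) //Success(Some(value-1))
}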
@algobardo hey mate, just pushed another release with Monix support. Added 2 examples here demoing how to use Task.
I've never worked with monix before so any suggestions on improving the examples or testing would help a lot.
Let me know how you go.
Hi @oridag, thank you. Supporting scala-native is something I'm looking forward to as well (#1).
I wish I could say that we are production ready now but we need to write more integration test-cases (#178) and finish a few tasks relating to measuring and monitoring performance (especially #276).
Best case scenario is that we reach production readiness before the new year, otherwise we will definitely be production ready by early-to-mid next year.
For now you can copy the following Bag implementation into your code. Proper release will happen when I win the battle against JVM's garbage collector.
import cats.effect.IO
import cats.effect.unsafe.IORuntime
import swaydb.Bag.Async
import swaydb.serializers.Default._
import swaydb.{IO => SwayIO, _}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.Failure
object Cats3Example extends App {

  /**
   * Cats-effect 3 async bag implementation
   */
  implicit def apply(implicit runtime: IORuntime): swaydb.Bag.Async[IO] =
    new Async[IO] { self =>
      override def executionContext: ExecutionContext =
        runtime.compute
      override val unit: IO[Unit] =
        IO.unit
      override def none[A]: IO[Option[A]] =
        IO.pure(Option.empty)
      override def apply[A](a: => A): IO[A] =
        IO(a)
      override def map[A, B](a: IO[A])(f: A => B): IO[B] =
        a.map(f)
      override def transform[A, B](a: IO[A])(f: A => B): IO[B] =
        a.map(f)
      override def flatMap[A, B](fa: IO[A])(f: A => IO[B]): IO[B] =
        fa.flatMap(f)
      override def success[A](value: A): IO[A] =
        IO.pure(value)
      override def failure[A](exception: Throwable): IO[A] =
        IO.fromTry(Failure(exception))
      override def foreach[A](a: IO[A])(f: A => Unit): Unit =
        f(a.unsafeRunSync())
      def fromPromise[A](a: Promise[A]): IO[A] =
        IO.fromFuture(IO(a.future))
      override def complete[A](promise: Promise[A], a: IO[A]): Unit =
        promise tryCompleteWith a.unsafeToFuture()
      override def fromIO[E: SwayIO.ExceptionHandler, A](a: SwayIO[E, A]): IO[A] =
        IO.fromTry(a.toTry)
      override def fromFuture[A](a: Future[A]): IO[A] =
        IO.fromFuture(IO(a))
      override def suspend[B](f: => IO[B]): IO[B] =
        IO.defer(f)
      override def flatten[A](fa: IO[IO[A]]): IO[A] =
        fa.flatMap(io => io)
    }

  //cats-effect runtime used by the Bag above
  implicit val runtime: IORuntime = IORuntime.global

  //create an in-memory Map and run a put & get using cats-effect IO
  val test =
    for {
      map <- memory.Map[Int, String, Nothing, IO]()
      _ <- map.put(key = 1, value = "one")
      value <- map.get(key = 1) //returns "one"
    } yield {
      println(s"value: $value")
    }

  test.unsafeRunSync()
}