    Ross Wolf
    @rw-access
    what performance differences have you seen between the batched and non-batched multicursor?
    Ville Tuulos
    @tuulos
    re: "something similar to a multicursor but threaded" - you mean multiple consumers in different threads pulling events from a single cursor?
    or a single consumer but multiple threads doing decoding in parallel?
    Ross Wolf
    @rw-access
    I believe the second one. The filters I'm using are generally sparse and cover multiple trails and tdbs. I want to use multiple threads for iterating the cursors (especially since there's a chance that some won't have any matches) and then one thread to consume the results and process them in order, like a multicursor
    Ville Tuulos
    @tuulos
    makes sense. In your case I would just have K parallel threads using normal (not multi) cursors. Each thread needs to push events to some output queue/buffer. The consumer can take care of ordering e.g. using the pqueue priority queue that tdb_multicursor uses internally
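
A minimal sketch of the approach Ville describes above, written against the TrailDB C API: K worker threads each scan a disjoint slice of trails with their own regular tdb_cursor and push matching (timestamp, trail) pairs into a shared buffer, and a single consumer puts them in order afterwards. The thread count, the trivial "filter", and the fixed-size buffer are illustrative assumptions, not part of TrailDB; a production version would stream through a bounded queue and merge with a priority queue as suggested, rather than sorting at the end.

    #include <traildb.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <stdio.h>

    #define NUM_WORKERS 4            /* illustrative thread count */

    struct match { uint64_t timestamp; uint64_t trail_id; };

    static struct match *matches;    /* shared output buffer */
    static uint64_t num_matches;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    struct worker_arg { const tdb *db; uint64_t first, last; };

    /* each worker scans its own slice of trails with its own cursor */
    static void *scan_slice(void *p)
    {
        struct worker_arg *arg = p;
        tdb_cursor *cursor = tdb_cursor_new(arg->db);
        for (uint64_t trail = arg->first; trail < arg->last; trail++) {
            if (tdb_get_trail(cursor, trail))
                continue;
            const tdb_event *ev;
            while ((ev = tdb_cursor_next(cursor))) {
                /* apply your sparse filter here; this sketch keeps everything */
                pthread_mutex_lock(&lock);
                matches[num_matches++] = (struct match){ ev->timestamp, trail };
                pthread_mutex_unlock(&lock);
            }
        }
        tdb_cursor_free(cursor);
        return NULL;
    }

    static int by_timestamp(const void *a, const void *b)
    {
        const struct match *x = a, *y = b;
        return (x->timestamp > y->timestamp) - (x->timestamp < y->timestamp);
    }

    int main(int argc, char **argv)
    {
        if (argc < 2)
            return 1;
        tdb *db = tdb_init();
        if (tdb_open(db, argv[1]))
            return 1;

        uint64_t n = tdb_num_trails(db);
        /* crude fixed-size buffer (assumes ~64 events per trail); a real
           consumer would stream from a bounded queue instead */
        matches = malloc(n * 64 * sizeof(struct match));

        pthread_t threads[NUM_WORKERS];
        struct worker_arg args[NUM_WORKERS];
        for (int i = 0; i < NUM_WORKERS; i++) {
            args[i].db = db;
            args[i].first = n * i / NUM_WORKERS;
            args[i].last = n * (i + 1) / NUM_WORKERS;
            pthread_create(&threads[i], NULL, scan_slice, &args[i]);
        }
        for (int i = 0; i < NUM_WORKERS; i++)
            pthread_join(threads[i], NULL);

        /* single consumer: put the matches in timestamp order */
        qsort(matches, num_matches, sizeof(struct match), by_timestamp);
        printf("%llu matching events\n", (unsigned long long)num_matches);

        free(matches);
        tdb_close(db);
        return 0;
    }
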
    Ross Wolf
    @rw-access
    awesome. yeah that makes sense. I'll see how that looks. and there's still a good chance that I'm wrong and the single threaded consumer is the real bottleneck. thanks for the help!
    Ville Tuulos
    @tuulos
    cool! let us know how it goes
    Ross Wolf
    @rw-access
    hello again!
    quick question this time - what's the lifetime of const tdb_event * as returned by tdb_cursor_peek/tdb_cursor_next?
    i'm guessing that it's valid until _tdb_cursor_next_batch is called again
    Oleg Avdeev
    @oavdeev
    yes, basically the idea is that it lives until the next tdb_cursor_next() call (which may call _tdb_cursor_next_batch internally)
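
In other words, if an event needs to outlive the next cursor advance, copy it out first. A minimal sketch, assuming the tdb_event layout from traildb.h (timestamp, num_items, then a flexible array of tdb_item):

    #include <traildb.h>
    #include <stdlib.h>
    #include <string.h>

    /* copy an event so it stays valid after the next tdb_cursor_next() call */
    static tdb_event *copy_event(const tdb_event *ev)
    {
        /* header (timestamp + num_items) plus the items themselves */
        size_t size = sizeof(tdb_event) + ev->num_items * sizeof(tdb_item);
        tdb_event *copy = malloc(size);
        if (copy)
            memcpy(copy, ev, size);
        return copy; /* caller free()s the copy when done */
    }
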
    luca santini
    @santoxyz
    hello everybody. I'm evaluating traildb (python client) on an embedded system with limited RAM (1GB) and storage (4GB) to save "big" data (1 year of samples - hundreds of variables - 1 second interval).
    it seems promising, but I'm not sure I understand how it works.
    The tutorial says: create a db, add points, finalize.
    What I see while adding points is that the file on disk is not growing... does it persist data only on finalize()? How could I make sure data is persisted "frequently" (i.e. every minute) to minimize loss in case of a crash/reboot/problem?
    Ville Tuulos
    @tuulos
    you can choose how often to call tdb_finalize based on your needs. You can call it every minute. You can have a separate compaction process that then merges the minute-files to a larger chunk e.g. every hour / day.
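
A minimal sketch of that minute-chunk idea, shown with the C API. flush_chunk() is a hypothetical helper, not part of TrailDB: it writes a batch of buffered events into a new, timestamped tdb and finalizes it immediately, so at most one interval of data is lost on a crash; a separate process can later merge the chunks into larger files.

    #include <traildb.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* hypothetical helper: persist one minute's worth of buffered events
       as an independent, timestamped tdb chunk */
    static int flush_chunk(const char *basedir,
                           const uint8_t uuid[16],
                           const uint64_t *timestamps,
                           const char **values,           /* one value per event */
                           const uint64_t *value_lengths,
                           uint64_t num_events)
    {
        const char *fields[] = {"value"};   /* illustrative single field */
        char path[256];
        snprintf(path, sizeof(path), "%s/chunk-%ld", basedir, (long)time(NULL));

        tdb_cons *cons = tdb_cons_init();
        if (tdb_cons_open(cons, path, fields, 1))
            return -1;

        for (uint64_t i = 0; i < num_events; i++)
            tdb_cons_add(cons, uuid, timestamps[i], &values[i], &value_lengths[i]);

        /* data is persisted only here; until finalize it lives in memory
           and temporary files */
        int err = tdb_cons_finalize(cons);
        tdb_cons_close(cons);
        return err;
    }
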
    luca santini
    @santoxyz
    sounds good! yesterday I produced a dataset containing 1 year of fake data in a couple of hours, resulting in a 97MB data.tdb (very good), but I noticed 33GB of temporary files (very bad!).
    Hopefully by finalizing and merging every minute I'll keep the temp data small. Needs some testing.
    luca santini
    @santoxyz
    now trying
    tdb merge -o merged data-1year.tdb data-chunk-3minutes.tdb
    the process is still in progress... it has generated 33GB of temp data and has been running for minutes, on a fast SSD.
    This is not acceptable in my embedded scenario :(
    I'm starting to think that what I want to do is not doable at all.
    Ville Tuulos
    @tuulos
    during creation, tdb uses local disk for tmp files quite extensively. Reading tdbs should be very efficient even in resource-constrained environments but writing hasn't been optimized for such cases
    Marius
    @marius-plv
    Hi Luca, from personal experience with tdb, the time to update a single tdb (even with small amounts of data) seems to increase with the tdb file size. I understand that this was not a design requirement, as the goal was fast read operations. (Personally, I would also enjoy faster tdb write times.) What helped in my case (which is rather a work-around) was creating smaller tdbs on a RAM-based filesystem (on Linux this could be ramfs, tmpfs, ..). The update time still grows with the tdb size, but the operation itself will be several times faster. So my understanding is that, to optimize write time, the ideal case would be to have independent tdb files (second/minute/hour/day/you choose; the smaller the time period, the faster the writes should run - measuring would confirm what is best) and not merge them at all; see the sketch below.
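
Following Marius's suggestion, a minimal sketch of reading the independent per-period chunks directly instead of merging them. The chunk paths are illustrative assumptions, e.g. small tdbs written under a tmpfs mount such as /dev/shm and then moved to persistent storage:

    #include <traildb.h>
    #include <stdint.h>
    #include <stdio.h>

    /* scan one chunk tdb on its own; no merge step required */
    static void scan_chunk(const char *path)
    {
        tdb *db = tdb_init();
        if (tdb_open(db, path))
            return;
        tdb_cursor *cursor = tdb_cursor_new(db);
        for (uint64_t trail = 0; trail < tdb_num_trails(db); trail++) {
            if (tdb_get_trail(cursor, trail))
                continue;
            const tdb_event *ev;
            while ((ev = tdb_cursor_next(cursor)))
                printf("%llu\n", (unsigned long long)ev->timestamp);
        }
        tdb_cursor_free(cursor);
        tdb_close(db);
    }

    int main(void)
    {
        /* hypothetical per-hour chunk paths */
        const char *chunks[] = {"/data/2019-01-01-00", "/data/2019-01-01-01"};
        for (int i = 0; i < 2; i++)
            scan_chunk(chunks[i]);
        return 0;
    }
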
    ruchirj
    @ruchirj
    Hello, I am interested in trying out TrailDB as a telemetry datastore for alerting on events. We may have millions of events coming in every second.
    The events themselves are JSON blobs, and event group identity can be inferred from a subset of the JSON properties. The use case is to build histograms of counts by event group. We will be write-intensive and the reads are going to be visualization-driven. Does it make sense to use TrailDB for this use case?
    Ville Tuulos
    @tuulos
    TrailDB is optimized for read-heavy use cases, complex analytics etc. If your workload is write-heavy with simple read queries, TDB might not be the best fit
    Chen Xinlu
    @boisde
    hello, I am new to traildb, is there any recent benchmark on it?
    for reference.
    Ville Tuulos
    @tuulos
    no, unfortunately we don't have a recent benchmark. I can assure you it is plenty fast especially on the read side :)
    Chen Xinlu
    @boisde
    does tdb dump still support s3 on macOS? like tdb dump -i s3://xxxx/yy.tdb?
    Tried on macOS Mojave, which reports TDB_ERR_IO_OPEN
    Chen Xinlu
    @boisde
    Hello, @tuulos is there a way to recover an unfinalized tdb?
    Chen Xinlu
    @boisde
    hi, is it possible for tdb to handle sorting within a single .tdb file?