    Ville Tuulos
    @tuulos
    you can implement a layer on top of core TrailDB that does something like that though
    Matt Perpick
    @clutchski
    ok thanks.
    Thomas P
    @ScullWM
    Hey! I was wondering how to use trailDB with a php micro-service env.
    So I've started a small Golang micro-service app to send events to it in JSON format.
    Does it sound weird to you ?
    Ville Tuulos
    @tuulos
    hey, sorry for the delayed reply
    @ScullWM it doesn't sound weird :)
    Thomas P
    @ScullWM
    thanks @tuulos, lots of great things in traildb :+1:
    Milan Opath
    @milancio42
    Hi Ville, I was playing with TrailDB on Linux. I'd like to run the tests but I cannot figure out how. You mentioned ./coverage.py in the tests directory in one of your previous messages, but I cannot find it. Thanks a lot.
    Ville Tuulos
    @tuulos
    Milan Opath
    @milancio42

    oh I should have mentioned it before - I tried to build traildb with waf, but it fails with a StopIteration exception.

    Traceback (most recent call last):
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Node.py", line 312, in ant_iter
        raise StopIteration
    StopIteration
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Scripting.py", line 114, in waf_entry_point
        run_commands()
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Scripting.py", line 171, in run_commands
        parse_options()
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Scripting.py", line 144, in parse_options
        Context.create_context('options').execute()
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Options.py", line 146, in execute
        super(OptionsContext,self).execute()
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Context.py", line 93, in execute
        self.recurse([os.path.dirname(g_module.root_path)])
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Context.py", line 134, in recurse
        user_function(self)
      File "/home/milan/Dev/traildb/wscript", line 57, in options
        opt.load("compiler_c")
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Context.py", line 90, in load
        fun(self)
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Tools/compiler_c.py", line 36, in options
        opt.load_special_tools('c_*.py',ban=['c_dumbpreproc.py'])
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Context.py", line 321, in load_special_tools
        lst=self.root.find_node(waf_dir).find_node('waflib/extras').ant_glob(var)
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Node.py", line 361, in ant_glob
        ret=[x for x in self.ant_iter(accept=accept,pats=[to_pat(incl),to_pat(excl)],maxdepth=kw.get('maxdepth',25),dir=dir,src=src,remove=kw.get('remove',True))]
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Node.py", line 361, in <listcomp>
        ret=[x for x in self.ant_iter(accept=accept,pats=[to_pat(incl),to_pat(excl)],maxdepth=kw.get('maxdepth',25),dir=dir,src=src,remove=kw.get('remove',True))]
    RuntimeError: generator raised StopIteration

    So I've built it with autotools and was looking for a way to run tests with it.
    But if waf is the only way to run tests, I'll try to debug it.
    Thank you.

    Milan Opath
    @milancio42
    Ok, waf 1.8.20 does not work with python 3.7. Used waf 2.0.10 instead and it worked like a charm.
    Ville Tuulos
    @tuulos
    oh, interesting
    I haven't tried it with Py3.7 yet
    Jakob Sievers
    @cannedprimates
    does tdb handle small field values (i.e. values that would fit into an item directly without going through a lexicon) specially? had a quick look at jsm_insert_large() and didn't see anything...
    semi-related: are there best practices around numeric field values? should I hand the byte representation to tdb?
    Ville Tuulos
    @tuulos
    Hi @cannedprimates - there's no special handling of small values. Would you need it for performance reasons?
    all values are byte blobs currently. No special handling for numeric field values. If you have floating point values and you don't need the full 64/32-bit accuracy, you can save space / increase performance by truncating values to the desired accuracy before inserting them
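    The truncation idea can be sketched in pure Python (this is only an illustration of the space-saving trick, not part of the TrailDB API; the function name and encoding are assumptions):

    ```python
    import struct

    def truncate_value(x: float, decimals: int = 2) -> bytes:
        """Round a float to the desired accuracy before encoding it as bytes.

        Fewer distinct values means a smaller lexicon, which saves space
        and can improve performance.
        """
        return struct.pack("<d", round(x, decimals))

    # Values that differ only beyond the chosen accuracy collapse to one
    # stored byte blob, so the lexicon holds a single entry for both.
    assert truncate_value(3.14159265) == truncate_value(3.14159999)
    ```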
    Jakob Sievers
    @cannedprimates
    @tuulos thanks for the reply! no concrete need for it (yet :)), just curious
    Ville Tuulos
    @tuulos
    cool. Let me know if you have any other questions / feedback!
    donaherc
    @donaherc

    Hello! I've run into some intermittent issues reading from a handful of ~18MB files I've combined repeatedly with tdb_cons_add(). Has anyone seen behavior that resembles this:

    ==15444== Invalid read of size 8
    ==15444==    at 0x4E3FD52: read_bits (tdb_bits.h:14)
    ==15444==    by 0x4E3FD52: read_bits64 (tdb_bits.h:38)
    ==15444==    by 0x4E3FD52: huff_decode_value (tdb_huffman.h:72)
    ==15444==    by 0x4E3FD52: _tdb_cursor_next_batch (tdb_decode.c:282)
    ==15444==    by 0x935C57: tdb_cursor_next (traildb.h:304)
    ==15444==    by 0x935C57: _cgo_4805fbb2d53a_Cfunc_tdb_cursor_next (cgo-gcc-prolog:222)
    ==15444==    by 0x46565F: runtime.asmcgocall (/usr/local/bin/go/src/runtime/asm_amd64.s:688)
    ==15444==    by 0xC4200928FF: ???
    ==15444==    by 0xB07CE87: ???
    ==15444==    by 0x460D81: runtime.(*mcache).nextFree.func1 (/usr/local/bin/go/src/runtime/malloc.go:556)
    ==15444==    by 0xC4201AABFF: ???
    ==15444==    by 0x43BB8F: ??? (/usr/local/bin/go/src/runtime/proc.go:1092)
    ==15444==  Address 0xe323ff9 is in a r-- mapped file /home/vagrant/app_files2/0157e8982def92b71fcc767d568e57883b86dba4298b66c2468127de0ef9c8cc segment
    ==15444== 
    fatal error: unexpected signal during runtime execution
    [signal SIGSEGV: segmentation violation code=0x1 addr=0xe324000 pc=0x4e3fd52]
    
    runtime stack:
    runtime.throw(0xb18c4c, 0x2a)
            /usr/local/bin/go/src/runtime/panic.go:616 +0x81
    runtime.sigpanic()
            /usr/local/bin/go/src/runtime/signal_unix.go:372 +0x28e
    
    goroutine 12 [syscall]:
    runtime.cgocall(0x935c00, 0xc42006ca10, ==15444== Use of uninitialised value of size 8
    ==15444==    at 0x438673: runtime.printhex (/usr/local/bin/go/src/runtime/print.go:219)
    ==15444==    by 0x45AA68: runtime.gentraceback (/usr/local/bin/go/src/runtime/traceback.go:406)
    ==15444==    by 0x45C4F8: runtime.traceback1 (/usr/local/bin/go/src/runtime/traceback.go:684)
    ==15444==    by 0x45C371: runtime.traceback (/usr/local/bin/go/src/runtime/traceback.go:645)
    ==15444==    by 0x45CF56: runtime.tracebackothers (/usr/local/bin/go/src/runtime/traceback.go:816)
    ==15444==    by 0x437B54: runtime.dopanic_m (/usr/local/bin/go/src/runtime/panic.go:736)
    ==15444==    by 0x46271B: runtime.dopanic.func1 (/usr/local/bin/go/src/runtime/panic.go:598)
    ==15444==    by 0x437479: runtime.dopanic (/usr/local/bin/go/src/runtime/panic.go:597)
    ==15444==    by 0x437550: runtime.throw (/usr/local/bin/go/src/runtime/panic.go:616)
    ==15444==    by 0x44CD7D: runtime.sigpanic (/usr/local/bin/go/src/runtime/signal_unix.go:372)
    ==15444==    by 0x4E3FD51: read_bits (tdb_bits.h:13)
    ==15444==    by 0x4E3FD51: read_bits64 (tdb_bits.h:38)
    ==15444==    by 0x4E3FD51: huff_decode_value (tdb_huffman.h:72)
    ==15444==    by 0x4E3FD51: _tdb_cursor_next_batch (tdb_decode.c:282)
    ==15444==    by 0x935C57: tdb_cursor_next (traildb.h:304)
    ==15444==    by 0x935C57: _cgo_4805fbb2d53a_Cfunc_tdb_cursor_next (cgo-gcc-prolog:222)
    ==15444== 
    ==15444== Conditional jump or move depends on uninitialised value(s)
    ==15444==    at 0x438685: runtime.printhex (/usr/local/bin/go/src/runtime/print.go:220)
    ==15444==    by 0x45AA68: runtime.gentraceback (/usr/local/bin/go/src/runtime/traceback.go:406)
    ==15444==    by 0x45C4F8: runtime.traceback1 (/usr/local/bin/go/src/runtime/traceback.go:684)
    ==15444==    by 0x45C371: runtime.traceback (/usr/local/bin/go/src/runtime/traceback.go:645)
    ==15444==    by 0x45CF56: runtime.tracebackothers (/usr/local/bin/go/src/runtime/traceback.go:816)
    ==15444==    by 0x437B54: runtime.dopanic_m (/usr/local/bin/go/src/runtime/panic.go:736)
    ==15444==    by 0x46271B: runtime.dopanic.func1 (/usr/local/bin/go/src/runtime/panic.go:598)
    ==15444==    by 0x437479: runtime.dopanic (/usr/local/bin/go/src/runtime/panic.go:597)
    ==15444==    by 0x437550: runtime.throw (/usr/local/bin/go/src/runtime/panic.go:616)
    ==15444==    by 0x44CD7D: runtime.sigpanic (/usr/local/bin/go/src/runtime/signal_unix.go:372)
    ==15444==    by 0x4E3FD51: read_bits (tdb_bits.h:13)
    ==15444==    by 0x4E3FD51: read_bits64 (tdb_bits.h:38)
    ==15444==    by 0x4E3FD51: huff_decode_value (tdb_huffman.h:72)

    I'm using the traildb-go bindings.

    Willing to provide more info if it'd help!
    donaherc
    @donaherc
    Having dug in more, I now suspect that the issue was that the vm.max_map_count setting on the hosts where we run tdb_cons_add was too low (it was at the default 65530). We have seen no issues since raising the setting
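    For reference, checking and raising the limit on Linux looks roughly like this (262144 is an arbitrary example value, not a TrailDB recommendation):

    ```shell
    # Inspect the current memory-map limit (65530 is the common default)
    sysctl vm.max_map_count

    # Raise it for the running kernel (example value; size it for your workload)
    sudo sysctl -w vm.max_map_count=262144

    # Persist the setting across reboots
    echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
    ```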
    donaherc
    @donaherc
    I believe we're still running into intermittent issues iterating through traildb files and also merging them using tdb_cons_append causing segfaults inside CGO, which forces a panic. Has anyone here used the traildb-go library and seen such behavior? Is it possible that undefined behavior with traildb file access would cause a panic inside CGO, but behave normally when handled with the C library directly?
    Ville Tuulos
    @tuulos
    could you try tdb merge on the command line with the same files to see if it still segfaults?
    it might be an issue with the Go bindings or (more unlikely), the C library itself
    donaherc
    @donaherc
    hello! yeah, I have been unable to reproduce it with the tdbcli tools, although for a handful of the files we have seen intermittent segfaulting using 'tdb index'. Some of the files that appear to be impacted have values north of 10k characters, which is pretty anomalous for the data we're storing. When pushing the traildb reads down into pure C we have seen no issues.
    Ross Wolf
    @rw-access

    hello! i saw --threads in the CLI help and am wondering what is made parallel?

    I know that tdb handles aren't thread safe but am thinking of ways to build something parallel and ordered on top of multiple tdb files and cursors within a single process. possibly a batching multi-multicursor? could that work, or is there a good chance that i'd run into other issues that i'm not thinking of? thanks!

    Ross Wolf
    @rw-access
    the more I think about it, the less sense that seems to make. for my use case, I expect that many of the underlying cursors will not return results. so I think carefully creating with something similar to tdb_multi_cursor_new but calling a version of tdb_multi_cursor_reset that threads the initial calls to tdb_cursor_peek might actually do the trick (since many cursors will be exhausted right away). i'll have to see how much time is spent in tdb_multi_cursor_new vs tdb_multi_cursor_next
    Oleg Avdeev
    @oavdeev
    Looks like --threads is only used for indexing in tdbcli
    I'm not sure I 100% understand what you mean by "parallel .. within a single process"?
    Since a tdb is read-only after you create it, a typical pattern is to open the db in every thread, along with a cursor, and split work between the threads based on uuid
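    The uuid-based split can be sketched like this (pure-Python illustration; the traildb calls are commented placeholders, and worker_for is just one hypothetical way to shard):

    ```python
    import hashlib

    def worker_for(uuid: bytes, num_threads: int) -> int:
        """Deterministically assign a trail's uuid to one of num_threads workers."""
        digest = hashlib.blake2b(uuid, digest_size=8).digest()
        return int.from_bytes(digest, "big") % num_threads

    def worker(thread_id: int, num_threads: int, uuids: list) -> list:
        # In a real worker, each thread would open its own tdb handle and
        # cursor (e.g. tdb = traildb.TrailDB(path)) and iterate only its
        # share of the trails. Here we only show the partitioning step.
        return [u for u in uuids if worker_for(u, num_threads) == thread_id]

    uuids = [bytes([i]) * 16 for i in range(100)]
    parts = [worker(t, 4, uuids) for t in range(4)]
    assert sum(len(p) for p in parts) == len(uuids)   # every uuid handled exactly once
    ```

    Because the shard assignment is a pure function of the uuid, no coordination between threads is needed.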
    Ville Tuulos
    @tuulos
    right, like @oavdeev said - everything on the read side can be parallelized by using independent handles and cursors on each thread
    Ross Wolf
    @rw-access
    yeah, I've just been brainstorming ways to do something similar to a multicursor but threaded. ideally I would want to hit tdb_multi_cursor_next or the batched version, but have the peeking for the underlying cursors be more parallelized.
    but that seems really tricky and I'd obviously have to divvy up the tdb handles between threads.
    I think a quick win without too much reworking is to parallelize the initial peek calls in tdb_multi_cursor_reset when it's created
    what performance differences have you seen between the multicursor batched and non-batched?
    Ville Tuulos
    @tuulos
    re: "something similar to a multicursor but threaded" - you mean multiple consumers in different threads pulling events from a single cursor?
    or a single consumer but multiple threads doing decoding in parallel?
    Ross Wolf
    @rw-access
    I believe the second one. The filters I'm using are generally sparse and cover multiple trails and tdbs. I want to use multiple threads for iterating the cursors (especially since there's a chance that some won't have any matches) and then one thread to consume the results and process them in order, like a multicursor
    Ville Tuulos
    @tuulos
    makes sense. In your case I would just have K parallel threads using normal (not multi) cursors. Each thread needs to push events to some output queue/buffer. The consumer can take care of ordering e.g. using the pqueue priority queue that tdb_multicursor uses internally
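    The consumer-side ordering can be sketched with Python's heapq standing in for the pqueue library (the event streams here are made up for illustration):

    ```python
    import heapq

    # Each worker thread yields its events already in timestamp order,
    # just like a single tdb cursor would.
    thread_outputs = [
        [(1, "a1"), (4, "a2"), (9, "a3")],
        [(2, "b1"), (3, "b2")],
        [(5, "c1"), (8, "c2")],
    ]

    def merge_ordered(streams):
        """Merge per-thread, individually sorted event streams into one
        globally timestamp-ordered stream using a priority queue."""
        heap = []
        iters = [iter(s) for s in streams]
        for i, it in enumerate(iters):
            first = next(it, None)
            if first is not None:
                heapq.heappush(heap, (first[0], i, first[1]))
        while heap:
            ts, i, ev = heapq.heappop(heap)
            yield ts, ev
            nxt = next(iters[i], None)
            if nxt is not None:
                heapq.heappush(heap, (nxt[0], i, nxt[1]))

    merged = list(merge_ordered(thread_outputs))
    assert [ts for ts, _ in merged] == [1, 2, 3, 4, 5, 8, 9]
    ```

    This is the same k-way merge that a multicursor performs internally, just moved to the consumer side so the per-stream decoding can run in parallel.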
    Ross Wolf
    @rw-access
    awesome. yeah that makes sense. I'll see how that looks. and there's still a good chance that I'm wrong and the single threaded consumer is the real bottleneck. thanks for the help!
    Ville Tuulos
    @tuulos
    cool! let us know how it goes
    Ross Wolf
    @rw-access
    hello again!
    quick question this time - what's the lifetime of const tdb_event * as returned by tdb_cursor_peek/tdb_cursor_next?
    i'm guessing that it's valid until _tdb_cursor_next_batch is called again
    Oleg Avdeev
    @oavdeev
    yes, basically the idea is that it lives until the next tdb_cursor_next() call (which may call _tdb_cursor_next_batch internally)
    luca santini
    @santoxyz
    hello everybody. I'm evaluating TrailDB (python client) on an embedded system with limited RAM (1GB) and storage (4GB) to save "big" data (1 year of samples - hundreds of variables - 1 second interval).
    it seems promising, but I'm not sure I understood how it works.
    The tutorial says: create a db, add points, finalize.
    What I see while adding points is that the file on disk is not growing... does it persist data only on finalize()? How could I make sure data is persisted "frequently" (i.e. every minute) to minimize loss in case of a crash/reboot/problems?
    Ville Tuulos
    @tuulos
    you can choose how often to call tdb_finalize based on your needs. You can call it every minute. You can have a separate compaction process that then merges the minute-files to a larger chunk e.g. every hour / day.
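    One way to organize such a scheme is shown below (naming and grouping only; the actual construct/finalize/merge calls are left as comments since they depend on your bindings, and the file-name pattern is an assumption):

    ```python
    from collections import defaultdict
    from datetime import datetime, timedelta

    def minute_chunk_name(ts: datetime) -> str:
        """Name a minute-granularity tdb chunk after its start time."""
        return ts.strftime("events-%Y%m%d-%H%M")

    def group_for_compaction(chunks: list) -> dict:
        """Group minute chunks by hour; a compaction process would merge
        each group into one hourly tdb (e.g. via `tdb merge`)."""
        groups = defaultdict(list)
        for name in chunks:
            groups[name[:-2]].append(name)   # drop the minute suffix
        return dict(groups)

    start = datetime(2018, 1, 1, 0, 0)
    chunks = [minute_chunk_name(start + timedelta(minutes=m)) for m in range(120)]
    groups = group_for_compaction(chunks)
    assert len(groups) == 2                  # two hours' worth of minute files
    assert all(len(v) == 60 for v in groups.values())
    ```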
    luca santini
    @santoxyz
    sounds good! yesterday I produced a dataset containing 1 year of fake data in a couple of hours, resulting in a 97MB data.tdb (very good), but I noticed 33GB of temporary files (very bad!).
    I hope that by finalizing and merging every minute I'll keep the temp data small. Needs some testing.
    luca santini
    @santoxyz
    now trying
    tdb merge -o merged data-1year.tdb data-chunk-3minutes.tdb
    the process is currently in progress... it has generated 33GB of temp data and has been running for minutes, on a fast SSD disk.
    This is not acceptable in my embedded scenario :(
    I'm starting to think that what I want to do is not doable at all.