    plainas
    @plainas
    @Marius-Plv what exactly is the tmp DB? A small traildb file that you write events to as they come in, and close after a while?
    Maybe an extra abstraction transparently wrapping both traildb and something else for the data at the tip, maybe redis or something. It's a pity... traildb looks really cool, but the design choice of not including writing functionality is a big deal.
    Marius
    @marius-plv
    @plainas Yes, with "tmp DB" I meant a small, temporary DB where the most recent events are stored.
    kzarzycki
    @kzarzycki
    I know at least one system whose authors made the same design decision for the storage format: Druid (druid.io). But the system as a whole still accepts queries over even the most recent data. The real-time ingestion tasks accept and collect rows on the (Java) heap in a raw, unoptimized format. Only after enough data has been collected does the task build a DB file (called a segment in Druid parlance) and store it on disk. Before the persist, the data is queryable in its raw format; a simple Java Map plays the role of this "temporary DB". The number of rows kept in memory doesn't have to be large: it can be in the 100K-row range without much memory pressure. The small segments saved to disk are already queryable. After some time has passed, the ingestion task merges the small segments into a large one and hands the finalized segment over to the query layer. I believe a system based on TrailDB could follow a similar approach. Can't wait to see if someone implements such a TrailDB-based database :)
    Ville Tuulos
    @tuulos
    yeah, buffering recent data in memory and periodically flushing to TrailDB has been a pattern we have had in mind
    the use cases I have been working with this far have been ok with hourly / daily data, so I haven't had need to implement in-memory caching
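The buffer-and-flush pattern described above could be sketched like this. This is a pure-Python sketch of the pattern only; `EventBuffer` and the `flush` callback are hypothetical names, not part of the TrailDB API (in practice the callback would build a TrailDB file via the constructor API).

```python
from collections import deque


class EventBuffer:
    """Keep recent events in memory; hand a batch to a flush callback
    (e.g. a function that builds a TrailDB file) once the buffer is full."""

    def __init__(self, capacity, flush):
        self.capacity = capacity
        self.flush = flush          # called with a list of (uuid, ts, values)
        self.events = deque()

    def add(self, uuid, timestamp, values):
        self.events.append((uuid, timestamp, values))
        if len(self.events) >= self.capacity:
            self.flush(list(self.events))   # persist the batch
            self.events.clear()

    def pending(self):
        """Recent, not-yet-flushed events stay queryable in memory."""
        return list(self.events)


batches = []
buf = EventBuffer(capacity=2, flush=batches.append)
buf.add("uuid-1", 100, {"type": "click"})
buf.add("uuid-2", 101, {"type": "view"})   # triggers a flush of both events
buf.add("uuid-3", 102, {"type": "click"})  # stays in memory for now
```

The hard part, as discussed below, is making the memory-to-storage transition robust without stalling the input flow.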
    plainas
    @plainas
    I see... but that requires a bit of engineering in itself. The transition from memory to storage needs to be bulletproof while not affecting the input flow of data in production environments. Maybe someone will write a solution that wraps this concept in another abstraction level.
    Marius
    @marius-plv
    @plainas I have some (C++) pet code which provides a basis for this functionality (always write events to a tmp DB and merge it back with the "main" DB at the caller's request; from there, storing in memory and dumping it when required is only a jump away). However, I would probably need a few days to pull it out of my framework and strip some minor specifics from the code, plus a few more days to bring code coverage + documentation to a sufficient level. But that would still be far from the expected "bullet proof" / 1.0 at that stage; it would rather be a "shy" 0.0.1 :smile: Anyhow, I'm tempted to contribute this back to the community in the next 2-5 months, but I cannot commit to it "right now".
    plainas
    @plainas
    cool :thumbsup:
    Marius
    @marius-plv
    So then this could be a starter for integrating this library with higher-level (micro-)services
    Franz Chen
    @Dendrimer
    Hey! I was trying to use the TrailDB Python bindings, and it looks like the comments for TrailDBEventFilter have an error: conjunction and disjunction are swapped in the comments.
    e.g. [[("job_title", "manager"), ("user", "george_jetson")]] -- described as matching records for the user "george_jetson" AND with job title "manager" -- should be OR, and [[("job_title", "manager")], [("user", "george_jetson")]] -- described as matching the user "george_jetson" OR the job title "manager" -- should be AND
    Ville Tuulos
    @tuulos
    yeah, based on a quick look, that seems to be the case. Filters are expressed as conjunctive normal form queries, i.e. ANDs of ORs
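The CNF semantics (outer list = AND of clauses, inner list = OR of terms) can be illustrated with a small pure-Python evaluator. This only mirrors the query *shape* used by the Python bindings; it is not the bindings' implementation:

```python
def matches(event, query):
    """CNF match: every clause (AND) must have at least one
    (field, value) term (OR) satisfied by the event dict."""
    return all(
        any(event.get(field) == value for field, value in clause)
        for clause in query
    )


# One clause with two terms: job_title == "manager" OR user == "george_jetson"
or_query = [[("job_title", "manager"), ("user", "george_jetson")]]

# Two single-term clauses: job_title == "manager" AND user == "george_jetson"
and_query = [[("job_title", "manager")], [("user", "george_jetson")]]
```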
    Matt Perpick
    @clutchski
    Hey all, does traildb support numeric fields? e.g. query user=george AND age >= 1.0
    Ville Tuulos
    @tuulos
    no, all fields are bytes
    you can implement a layer on top of core TrailDB that does something like that though
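One way such a layer could work (a sketch of the idea, not anything TrailDB provides): encode non-negative integers as fixed-width big-endian bytes, so that byte-wise lexicographic order matches numeric order and range predicates like `age >= 1` can be evaluated directly on the stored byte values:

```python
WIDTH = 8  # bytes; enough for any uint64


def encode_uint(n: int) -> bytes:
    """Fixed-width big-endian encoding: lexicographic byte order
    equals numeric order for non-negative integers."""
    return n.to_bytes(WIDTH, "big")


def decode_uint(b: bytes) -> int:
    return int.from_bytes(b, "big")


# Range check "age >= 1" done directly on the stored bytes:
stored = encode_uint(42)
lower = encode_uint(1)
in_range = stored >= lower
```

Floats and signed integers need extra care (sign/exponent bit tricks) before this ordering property holds, so this sketch sticks to unsigned integers.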
    Matt Perpick
    @clutchski
    ok thanks.
    Thomas P
    @ScullWM
    Hey! I was wondering how to use trailDB in a PHP micro-service environment.
    So I've started a small Golang micro-service app to send events into it in JSON format.
    Does that sound weird to you?
    Ville Tuulos
    @tuulos
    hey, sorry for the delayed reply
    @ScullWM it doesn't sound weird :)
    Thomas P
    @ScullWM
    thanks @tuulos, lots of great things in traildb :+1:
    Milan Opath
    @milancio42
    Hi Ville, I was playing with Traildb on Linux. I'd like to run the tests but I cannot figure out how. You mentioned ./coverage.py in tests directory in one of your previous messages, but I cannot find it. Thanks a lot.
    Milan Opath
    @milancio42

    oh I should have mentioned it before - I tried to build traildb with waf, but it fails with a StopIteration exception.

    Traceback (most recent call last):
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Node.py", line 312, in ant_iter
        raise StopIteration
    StopIteration
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Scripting.py", line 114, in waf_entry_point
        run_commands()
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Scripting.py", line 171, in run_commands
        parse_options()
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Scripting.py", line 144, in parse_options
        Context.create_context('options').execute()
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Options.py", line 146, in execute
        super(OptionsContext,self).execute()
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Context.py", line 93, in execute
        self.recurse([os.path.dirname(g_module.root_path)])
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Context.py", line 134, in recurse
        user_function(self)
      File "/home/milan/Dev/traildb/wscript", line 57, in options
        opt.load("compiler_c")
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Context.py", line 90, in load
        fun(self)
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Tools/compiler_c.py", line 36, in options
        opt.load_special_tools('c_*.py',ban=['c_dumbpreproc.py'])
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Context.py", line 321, in load_special_tools
        lst=self.root.find_node(waf_dir).find_node('waflib/extras').ant_glob(var)
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Node.py", line 361, in ant_glob
        ret=[x for x in self.ant_iter(accept=accept,pats=[to_pat(incl),to_pat(excl)],maxdepth=kw.get('maxdepth',25),dir=dir,src=src,remove=kw.get('remove',True))]
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Node.py", line 361, in <listcomp>
        ret=[x for x in self.ant_iter(accept=accept,pats=[to_pat(incl),to_pat(excl)],maxdepth=kw.get('maxdepth',25),dir=dir,src=src,remove=kw.get('remove',True))]
    RuntimeError: generator raised StopIteration

    So I've built it with autotools and was looking for a way to run tests with it.
    But if waf is the only way to run tests, I'll try to debug it.
    Thank you.

    Milan Opath
    @milancio42
    Ok, waf 1.8.20 does not work with Python 3.7. I used waf 2.0.10 instead and it worked like a charm.
    Ville Tuulos
    @tuulos
    oh, interesting
    I haven't tried it with Py3.7 yet
    Jakob Sievers
    @cannedprimates
    does tdb handle small field values (i.e. values that would fit into an item directly without going through a lexicon) specially? I had a quick look at jsm_insert_large() and didn't see anything...
    semi-related: are there best practices around numeric field values? Should I hand the byte representation to tdb?
    Ville Tuulos
    @tuulos
    Hi @cannedprimates - there's no special handling of small values. Would you need it for performance reasons?
    all values are byte blobs currently. No special handling for numeric field values. If you have floating point values and you don't need the full 64/32-bit accuracy, you can save space / increase performance by truncating values to the desired accuracy before inserting them
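The truncation idea might look like this (a sketch; rounding to two decimals is an arbitrary choice). Fewer distinct values means a smaller lexicon and better compression:

```python
def truncate(value: float, decimals: int = 2) -> bytes:
    """Round to the desired accuracy and encode as bytes before
    inserting; the distinct-value count (lexicon size) shrinks."""
    return f"{value:.{decimals}f}".encode()


raw = [0.12345, 0.12349, 0.12351, 0.98765]
truncated = {truncate(v) for v in raw}  # 4 raw values collapse to 2
```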
    Jakob Sievers
    @cannedprimates
    @tuulos thanks for the reply! no concrete need for it (yet :)), just curious
    Ville Tuulos
    @tuulos
    cool. Let me know if you have any other questions / feedback!
    donaherc
    @donaherc

    Hello! I've run into some intermittent issues reading from a handful of ~18MB files I've combined repeatedly with tdb_cons_add(). Has anyone seen behavior that resembles this:

    ==15444== Invalid read of size 8
    ==15444==    at 0x4E3FD52: read_bits (tdb_bits.h:14)
    ==15444==    by 0x4E3FD52: read_bits64 (tdb_bits.h:38)
    ==15444==    by 0x4E3FD52: huff_decode_value (tdb_huffman.h:72)
    ==15444==    by 0x4E3FD52: _tdb_cursor_next_batch (tdb_decode.c:282)
    ==15444==    by 0x935C57: tdb_cursor_next (traildb.h:304)
    ==15444==    by 0x935C57: _cgo_4805fbb2d53a_Cfunc_tdb_cursor_next (cgo-gcc-prolog:222)
    ==15444==    by 0x46565F: runtime.asmcgocall (/usr/local/bin/go/src/runtime/asm_amd64.s:688)
    ==15444==    by 0xC4200928FF: ???
    ==15444==    by 0xB07CE87: ???
    ==15444==    by 0x460D81: runtime.(*mcache).nextFree.func1 (/usr/local/bin/go/src/runtime/malloc.go:556)
    ==15444==    by 0xC4201AABFF: ???
    ==15444==    by 0x43BB8F: ??? (/usr/local/bin/go/src/runtime/proc.go:1092)
    ==15444==  Address 0xe323ff9 is in a r-- mapped file /home/vagrant/app_files2/0157e8982def92b71fcc767d568e57883b86dba4298b66c2468127de0ef9c8cc segment
    ==15444== 
    fatal error: unexpected signal during runtime execution
    [signal SIGSEGV: segmentation violation code=0x1 addr=0xe324000 pc=0x4e3fd52]
    
    runtime stack:
    runtime.throw(0xb18c4c, 0x2a)
            /usr/local/bin/go/src/runtime/panic.go:616 +0x81
    runtime.sigpanic()
            /usr/local/bin/go/src/runtime/signal_unix.go:372 +0x28e
    
    goroutine 12 [syscall]:
    runtime.cgocall(0x935c00, 0xc42006ca10, ==15444== Use of uninitialised value of size 8
    ==15444==    at 0x438673: runtime.printhex (/usr/local/bin/go/src/runtime/print.go:219)
    ==15444==    by 0x45AA68: runtime.gentraceback (/usr/local/bin/go/src/runtime/traceback.go:406)
    ==15444==    by 0x45C4F8: runtime.traceback1 (/usr/local/bin/go/src/runtime/traceback.go:684)
    ==15444==    by 0x45C371: runtime.traceback (/usr/local/bin/go/src/runtime/traceback.go:645)
    ==15444==    by 0x45CF56: runtime.tracebackothers (/usr/local/bin/go/src/runtime/traceback.go:816)
    ==15444==    by 0x437B54: runtime.dopanic_m (/usr/local/bin/go/src/runtime/panic.go:736)
    ==15444==    by 0x46271B: runtime.dopanic.func1 (/usr/local/bin/go/src/runtime/panic.go:598)
    ==15444==    by 0x437479: runtime.dopanic (/usr/local/bin/go/src/runtime/panic.go:597)
    ==15444==    by 0x437550: runtime.throw (/usr/local/bin/go/src/runtime/panic.go:616)
    ==15444==    by 0x44CD7D: runtime.sigpanic (/usr/local/bin/go/src/runtime/signal_unix.go:372)
    ==15444==    by 0x4E3FD51: read_bits (tdb_bits.h:13)
    ==15444==    by 0x4E3FD51: read_bits64 (tdb_bits.h:38)
    ==15444==    by 0x4E3FD51: huff_decode_value (tdb_huffman.h:72)
    ==15444==    by 0x4E3FD51: _tdb_cursor_next_batch (tdb_decode.c:282)
    ==15444==    by 0x935C57: tdb_cursor_next (traildb.h:304)
    ==15444==    by 0x935C57: _cgo_4805fbb2d53a_Cfunc_tdb_cursor_next (cgo-gcc-prolog:222)
    ==15444== 
    ==15444== Conditional jump or move depends on uninitialised value(s)
    ==15444==    at 0x438685: runtime.printhex (/usr/local/bin/go/src/runtime/print.go:220)
    ==15444==    by 0x45AA68: runtime.gentraceback (/usr/local/bin/go/src/runtime/traceback.go:406)
    ==15444==    by 0x45C4F8: runtime.traceback1 (/usr/local/bin/go/src/runtime/traceback.go:684)
    ==15444==    by 0x45C371: runtime.traceback (/usr/local/bin/go/src/runtime/traceback.go:645)
    ==15444==    by 0x45CF56: runtime.tracebackothers (/usr/local/bin/go/src/runtime/traceback.go:816)
    ==15444==    by 0x437B54: runtime.dopanic_m (/usr/local/bin/go/src/runtime/panic.go:736)
    ==15444==    by 0x46271B: runtime.dopanic.func1 (/usr/local/bin/go/src/runtime/panic.go:598)
    ==15444==    by 0x437479: runtime.dopanic (/usr/local/bin/go/src/runtime/panic.go:597)
    ==15444==    by 0x437550: runtime.throw (/usr/local/bin/go/src/runtime/panic.go:616)
    ==15444==    by 0x44CD7D: runtime.sigpanic (/usr/local/bin/go/src/runtime/signal_unix.go:372)
    ==15444==    by 0x4E3FD51: read_bits (tdb_bits.h:13)
    ==15444==    by 0x4E3FD51: read_bits64 (tdb_bits.h:38)
    ==15444==    by 0x4E3FD51: huff_decode_value (tdb_huffman.h:72)

    I'm using the traildb-go bindings.

    Willing to provide more info if it'd help!
    donaherc
    @donaherc
    Having dug in more, I now suspect the issue was that vm.max_map_count on the hosts where we run tdb_cons_add was too low (it was at the default 65530). We've seen no issues since raising the setting
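For anyone hitting the same wall, the setting can be inspected and raised with sysctl. The value 262144 below is just an example, not a TrailDB recommendation; pick one sized to how many files you map:

```shell
# Check the current per-process limit on memory-mapped regions
sysctl vm.max_map_count

# Raise it for the running system
sudo sysctl -w vm.max_map_count=262144

# Persist the change across reboots
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
```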
    donaherc
    @donaherc
    I believe we're still running into intermittent issues iterating through traildb files and also merging them using tdb_cons_append causing segfaults inside CGO, which forces a panic. Has anyone here used the traildb-go library and seen such behavior? Is it possible that undefined behavior with traildb file access would cause a panic inside CGO, but behave normally when handled with the C library directly?
    Ville Tuulos
    @tuulos
    could you try tdb merge on the command line with the same files to see if it still segfaults?
    it might be an issue with the Go bindings or (more unlikely), the C library itself
    donaherc
    @donaherc
    hello! yeah, we have been unable to reproduce it with the tdbcli tools, although for a handful of the files we have seen intermittent segfaulting using 'tdb index'. Some of the files that appear to be impacted have values north of 10k characters, which is pretty anomalous for the data we're storing. When pushing the traildb reads down into pure C, we have seen no issues.
    Ross Wolf
    @rw-access

    hello! i saw --threads on the CLI help and am wondering what is made parallel?

    I know that tdb handles aren't thread safe, but I'm thinking of ways to build something parallel and ordered on top of multiple tdb files and cursors within a single process. Possibly a batching multi-multicursor? Could that work, or is there a good chance I'd run into other issues I'm not thinking of? Thanks!

    Ross Wolf
    @rw-access
    the more I think about it, the less sense that seems to make. For my use case, I expect that many of the underlying cursors will not return results. So I think carefully creating something similar to tdb_multi_cursor_new, but calling a version of tdb_multi_cursor_reset that threads the initial calls to tdb_cursor_peek, might actually do the trick (since many cursors will be exhausted right away). I'll have to see how much time is spent in tdb_multi_cursor_new vs tdb_multi_cursor_next
    Oleg Avdeev
    @oavdeev
    Looks like --threads is only used for indexing in tdbcli
    I'm not sure I 100% understand what you mean by "parallel .. within a single process"?
    Since tdb is read only after you create it, a typical pattern is that you just have a db open in every thread, and a cursor, and split work between the threads based on uuid
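The split-work-by-uuid pattern could be sketched as below. This is a pure-Python illustration of the partitioning only; in a real setup each worker thread would open its own tdb handle and cursor (none of which appear here) and process just the uuids assigned to it:

```python
import hashlib


def worker_for(uuid: str, num_workers: int) -> int:
    """Deterministically assign each trail (uuid) to one worker.
    Each worker owns its own (thread-unsafe) handle and cursor."""
    digest = hashlib.sha1(uuid.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


uuids = [f"user-{i}" for i in range(1000)]
num_workers = 4
partitions = [[] for _ in range(num_workers)]
for u in uuids:
    partitions[worker_for(u, num_workers)].append(u)
```

Because the assignment is a pure function of the uuid, no coordination between threads is needed: every trail lands in exactly one worker's partition.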
    Ville Tuulos
    @tuulos
    right, like @oavdeev said - everything on the read side can be parallelized by using independent handles and cursors on each thread
    Ross Wolf
    @rw-access
    yeah, I've just been brainstorming ways to do something similar to a multicursor but threaded. ideally I would want to hit tdb_multi_cursor_next or the batched version, but have the peeking for the underlying cursors be more parallelized.
    but that seems really tricky, and I'd obviously have to divvy up the tdb handles between threads.