    Ville Tuulos
    @tuulos
    traildbs are immutable after finalization, so you can't add more events to an existing tdb. A typical pattern is to create many traildbs instead, e.g. one tdb file every hour / day. You can then optionally merge them together or query all of them using multi-cursors.
    yqylovy
    @yqylovy
    Hi, I'm trying to use traildb to track users' daily behavior.
    I want to create one traildb file per day, but if I want to focus on the visitor level, it is hard to use. I looked at multi-cursors, but they seem to operate on the event level, especially when the same visitor has actions in different traildb files. Can anyone provide some suggestions?
    yqylovy
    @yqylovy
    Is there any open-source project that uses traildb, so I can see a production example?
    Ville Tuulos
    @tuulos
    hi @yqylovy - multi-cursor is implemented exactly for the use case of iterating over a single primary key (e.g. visitor id) in multiple files
    another option is to merge the daily tdbs to a single big tdb before analysis
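The merge that a multi-cursor performs can be sketched in plain Python. The data and the `multi_cursor` helper below are purely illustrative, not the traildb API: each per-day file yields one visitor's events already sorted by timestamp (as events within a trail are in a finalized tdb), and a k-way merge interleaves the streams without buffering everything first.

```python
import heapq

# Hypothetical per-day event streams for ONE visitor, each already
# sorted by timestamp, as trail events are in a finalized tdb.
day1 = [(1000, "view"), (1400, "click")]
day2 = [(1100, "view"), (1600, "purchase")]

def multi_cursor(*sorted_streams):
    # k-way merge of time-ordered streams, the way a tdb multi-cursor
    # interleaves per-file cursors positioned at the same visitor's trail
    return heapq.merge(*sorted_streams)

merged = list(multi_cursor(day1, day2))
# merged is globally ordered by timestamp; no per-visitor map is needed
```

So instead of collecting events into a visitormap, you can position one cursor per file at the visitor's trail and consume the merged, time-ordered stream directly.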
    yqylovy
    @yqylovy

    Hi @tuulos, thanks for the help!
    I have looked over the API in https://github.com/traildb/traildb/blob/master/doc/docs/api.md. There are tdb_multi_cursor_(new|free|reset), tdb_multi_cursor_next and tdb_multi_cursor_next_batch. The *next* methods return events, so if I want to iterate over a single primary key, I have to group events by visitor_id first, like:

        visitormap = {}
        for (event in cursor) {
            visitorid = event.key
            visitormap[visitorid] = append(visitormap[visitorid], event)
        }

        // then do the per-visitor work
        for ((visitorid, events) in visitormap) {
            // do some work
        }

    So I need to iterate over the data once and hold all of it, which takes time and space.

    Is there a better way to do this?

    Marius
    @marius-plv
    Hi Ville, and thanks for the info. Based on your help, I am now evaluating the best approach for my use-case, in which I have frequent (every 1-5 seconds) write and read operations spanning large periods of time. For the case where merging is done frequently (say a tiny traildb is merged every 1-5 seconds into a larger traildb that is MBs to GBs large), are there read/write performance penalties (IO and/or CPU) that would make this approach not worthwhile compared with the other approach you suggested, where several traildbs are maintained instead?
    plainas
    @plainas
    Hey guys. Neat library! So I am looking for solutions to log user-generated events, but IIUIC, traildb doesn't address the problem of storing the data as it is generated; it focuses instead on building a queryable archive of already existing data. Any suggestions on where to look instead? How do most people do this? Use relational databases and generate archives periodically?
    Marius
    @marius-plv
    Hi @plainas, I just started using traildb recently and wished for the same storage capabilities you describe above. There are probably several ways; it depends on your use-case and requirements. For my use-case, the solution was to open a small tmp DB, write the most recent events to it, then merge this tmp DB into the "main DB". Note: from my understanding, the tmp DB, like any traildb DB, can be queried only after it is "finalized"; and once finalized, one can only merge it into another DB. Alternatively, if immediate persistence is not required, one could place the tmp DB on a RAM disk to gain some speed (or cache the "most recent" events in RAM, and query them outside tdb, before persisting them to a DB on disk). I wish tdb offered a way to store and query events which are not yet in a finalized DB. Still, I am curious to learn how others have approached the challenge.
    plainas
    @plainas
    @Marius-Plv what exactly is a tmp DB? A small traildb file that you write events to as they come in and close after a while?
    Maybe an extra abstraction transparently wrapping both traildb and something else (maybe redis) for the data at the tip. It's a pity... traildb looks really cool, but the design choice of not including writing functionality is a big deal.
    Marius
    @marius-plv
    @plainas Yes, with "tmp DB" I meant a small, temporary DB where the most recent events are stored.
    kzarzycki
    @kzarzycki
    I know at least one system whose authors made the same design decision for the storage format: Druid (druid.io). Still, the system as a whole accepts queries even on the most recent data. The real-time ingestion tasks accept and collect rows on the (Java) heap in a raw, unoptimized format. Only after enough data has been collected does it build a DB file (called a segment in Druid parlance) and store it on disk. Before the persist, the data is queryable in the raw format; a simple Java Map plays the role of this "temporary DB". The number of rows kept in memory doesn't have to be large; it can be in the 100K-row range without putting much pressure on memory. The small segments saved to disk are already queryable. After some time has passed, the ingestion task merges the small segments into a large one and hands the finalized segment over to the query layer. I believe a system based on TrailDB could follow a similar approach. Can't wait to see if someone implements such a TrailDB-based database :)
    Ville Tuulos
    @tuulos
    yeah, buffering recent data in memory and periodically flushing to TrailDB has been a pattern we have had in mind
    the use cases I have been working with this far have been ok with hourly / daily data, so I haven't had need to implement in-memory caching
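A minimal sketch of that buffering pattern, under the assumption that a `flush` callback stands in for building and finalizing a small tdb (none of the names below are traildb APIs): recent events stay queryable in memory, and each full batch is handed off for persistence.

```python
class EventBuffer:
    def __init__(self, flush, max_events=3):
        self.flush = flush          # callback: persist one batch of events
        self.max_events = max_events
        self.pending = []           # recent events, queryable in memory

    def add(self, uuid, timestamp, values):
        self.pending.append((uuid, timestamp, values))
        if len(self.pending) >= self.max_events:
            self.flush(self.pending)
            self.pending = []       # flushed events now live on disk

batches = []                        # stand-in for finalized tdb files
buf = EventBuffer(flush=batches.append, max_events=2)
buf.add("u1", 100, ["click"])
buf.add("u2", 101, ["view"])   # triggers a flush of the first two events
buf.add("u1", 102, ["buy"])    # stays in memory until the next flush
```

Queries would then consult both the flushed batches and `buf.pending`, which is the "temporary DB" role described above.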
    plainas
    @plainas
    I see... but that requires a bit of engineering in itself. The transition from memory to storage needs to be bullet-proof without affecting the input flow of data in production environments. Maybe someone will write a solution that wraps this concept in another abstraction level.
    Marius
    @marius-plv
    @plainas I have some (C++) pet code which provides a basis for this functionality (always write events to a tmp DB and merge it back into the "main" DB at the caller's request; from there, storing in memory and dumping it when required is only a jump away). However, I would probably need a few days to pull it out of my framework and get some minor specifics out of the code, plus a few more days to get code coverage and documentation to a sufficient level. But that would still be far from the expected "bullet proof" / 1.0 at that stage; it would rather be a "shy" 0.0.1 :smile: Anyhow, I'm tempted to contribute this back to the community in the next 2-5 months, however I cannot commit to it "right now".
    plainas
    @plainas
    cool :thumbsup:
    Marius
    @marius-plv
    So then this could be a starter for integrating this library with higher-level (micro-)services
    Franz Chen
    @Dendrimer
    Hey! I was trying to use the TrailDB Python bindings, and it looks like the comments for TrailDBEventFilter have an error: conjunction and disjunction are swapped.
    e.g. [[("job_title", "manager"), ("user", "george_jetson")]] -- "Match records for the user "george_jetson" AND with job title "manager"" -- should say OR, and [[("job_title", "manager")], [("user", "george_jetson")]] -- "Match records for the user "george_jetson" OR with job title "manager"" -- should say AND
    Ville Tuulos
    @tuulos
    yeah, based on a quick look, that seems to be the case. Filters are expressed as conjunctive normal form queries, i.e. ANDs of ORs
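For reference, the CNF semantics can be sketched in plain Python. The `matches` function below only illustrates the query shape (clauses ANDed, terms within a clause ORed); it is not the bindings' implementation, and the events are represented as plain dicts for simplicity.

```python
def matches(event, query):
    # query: a list of clauses; clauses are ANDed together,
    # and the (field, value) terms inside each clause are ORed
    return all(
        any(event.get(field) == value for field, value in clause)
        for clause in query
    )

event = {"job_title": "manager", "user": "george_jetson"}

# one clause, two terms: job_title == "manager" OR user == "george_jetson"
or_query = [[("job_title", "manager"), ("user", "george_jetson")]]

# two single-term clauses: job_title == "manager" AND user == "george_jetson"
and_query = [[("job_title", "manager")], [("user", "george_jetson")]]
```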
    Matt Perpick
    @clutchski
    Hey all, does traildb support numeric fields? e.g. query user=george AND age >= 1.0
    Ville Tuulos
    @tuulos
    no, all fields are bytes
    you can implement a layer on top of core TrailDB that does something like that though
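One way such a layer could encode numbers, sketched under the assumption that values are stored as byte strings (this is an encoding trick, not a traildb feature): fixed-width, zero-padded decimal strings make lexicographic order agree with numeric order, so a range predicate like age >= 1.0 becomes a plain string comparison on the stored values.

```python
def encode_num(x, width=10, scale=100):
    # keep two decimal places as a zero-padded integer string;
    # non-negative values only in this simple sketch
    return str(int(round(x * scale))).zfill(width)

def at_least(encoded_value, lo):
    # lexicographic comparison == numeric comparison for this encoding
    return encoded_value >= encode_num(lo)

age = encode_num(2.5)   # the encoded form is what you would store in tdb
```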
    Matt Perpick
    @clutchski
    ok thanks.
    Thomas P
    @ScullWM
    Hey! I was wondering how to use TrailDB in a PHP micro-service environment.
    So I've started a small Golang micro-service app to send events to it in JSON format.
    Does that sound weird to you?
    Ville Tuulos
    @tuulos
    hey, sorry for the delayed reply
    @ScullWM it doesn't sound weird :)
    Thomas P
    @ScullWM
    thanks @tuulos, lots of great things in traildb :+1:
    Milan Opath
    @milancio42
    Hi Ville, I was playing with TrailDB on Linux. I'd like to run the tests but I cannot figure out how. You mentioned ./coverage.py in the tests directory in one of your previous messages, but I cannot find it. Thanks a lot.
    Ville Tuulos
    @tuulos
    Milan Opath
    @milancio42

    oh, I should have mentioned it before - I tried to build traildb with waf, but it fails with a StopIteration exception.

    Traceback (most recent call last):
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Node.py", line 312, in ant_iter
        raise StopIteration
    StopIteration
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Scripting.py", line 114, in waf_entry_point
        run_commands()
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Scripting.py", line 171, in run_commands
        parse_options()
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Scripting.py", line 144, in parse_options
        Context.create_context('options').execute()
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Options.py", line 146, in execute
        super(OptionsContext,self).execute()
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Context.py", line 93, in execute
        self.recurse([os.path.dirname(g_module.root_path)])
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Context.py", line 134, in recurse
        user_function(self)
      File "/home/milan/Dev/traildb/wscript", line 57, in options
        opt.load("compiler_c")
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Context.py", line 90, in load
        fun(self)
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Tools/compiler_c.py", line 36, in options
        opt.load_special_tools('c_*.py',ban=['c_dumbpreproc.py'])
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Context.py", line 321, in load_special_tools
        lst=self.root.find_node(waf_dir).find_node('waflib/extras').ant_glob(var)
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Node.py", line 361, in ant_glob
        ret=[x for x in self.ant_iter(accept=accept,pats=[to_pat(incl),to_pat(excl)],maxdepth=kw.get('maxdepth',25),dir=dir,src=src,remove=kw.get('remove',True))]
      File "/home/milan/Dev/traildb/.waf3-1.8.20-c859ca7dc3693011756f4edf45c36626/waflib/Node.py", line 361, in <listcomp>
        ret=[x for x in self.ant_iter(accept=accept,pats=[to_pat(incl),to_pat(excl)],maxdepth=kw.get('maxdepth',25),dir=dir,src=src,remove=kw.get('remove',True))]
    RuntimeError: generator raised StopIteration

    So I built it with autotools and was looking for a way to run the tests with that.
    But if waf is the only way to run the tests, I'll try to debug it.
    Thank you.

    Milan Opath
    @milancio42
    Ok, waf 1.8.20 does not work with Python 3.7. I used waf 2.0.10 instead and it worked like a charm.
    Ville Tuulos
    @tuulos
    oh, interesting
    I haven't tried it with Py3.7 yet
    Jakob Sievers
    @cannedprimates
    does tdb handle small field values (i.e. values that would fit into an item directly without going through a lexicon) specially? I had a quick look at jsm_insert_large() and didn't see anything...
    semi-related: are there best practices around numeric field values? Should I hand the byte representation to tdb?
    Ville Tuulos
    @tuulos
    Hi @cannedprimates - there's no special handling of small values. Would you need it for performance reasons?
    all values are byte blobs currently, with no special handling for numeric field values. If you have floating-point values and you don't need the full 64/32-bit accuracy, you can save space / increase performance by truncating values to the desired accuracy before inserting them
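The truncation idea in concrete terms, as a sketch only (the sample values are made up): round floats to the accuracy you actually need before inserting, so near-identical readings collapse into a single distinct stored value instead of bloating the lexicon.

```python
def to_value(x, decimals=2):
    # the rounded, encoded form is what you would hand to tdb as the value
    return f"{x:.{decimals}f}".encode()

raw = [0.12345, 0.12349, 0.98765]        # e.g. noisy sensor readings
truncated = {to_value(x) for x in raw}
# the first two readings collapse into the same stored value
```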
    Jakob Sievers
    @cannedprimates
    @tuulos thanks for the reply! no concrete need for it (yet :)), just curious
    Ville Tuulos
    @tuulos
    cool. Let me know if you have any other questions / feedback!
    donaherc
    @donaherc

    Hello! I've run into some intermittent issues reading from a handful of ~18MB files I've combined repeatedly with tdb_cons_add(). Has anyone seen behavior that resembles this:

    ==15444== Invalid read of size 8
    ==15444==    at 0x4E3FD52: read_bits (tdb_bits.h:14)
    ==15444==    by 0x4E3FD52: read_bits64 (tdb_bits.h:38)
    ==15444==    by 0x4E3FD52: huff_decode_value (tdb_huffman.h:72)
    ==15444==    by 0x4E3FD52: _tdb_cursor_next_batch (tdb_decode.c:282)
    ==15444==    by 0x935C57: tdb_cursor_next (traildb.h:304)
    ==15444==    by 0x935C57: _cgo_4805fbb2d53a_Cfunc_tdb_cursor_next (cgo-gcc-prolog:222)
    ==15444==    by 0x46565F: runtime.asmcgocall (/usr/local/bin/go/src/runtime/asm_amd64.s:688)
    ==15444==    by 0xC4200928FF: ???
    ==15444==    by 0xB07CE87: ???
    ==15444==    by 0x460D81: runtime.(*mcache).nextFree.func1 (/usr/local/bin/go/src/runtime/malloc.go:556)
    ==15444==    by 0xC4201AABFF: ???
    ==15444==    by 0x43BB8F: ??? (/usr/local/bin/go/src/runtime/proc.go:1092)
    ==15444==  Address 0xe323ff9 is in a r-- mapped file /home/vagrant/app_files2/0157e8982def92b71fcc767d568e57883b86dba4298b66c2468127de0ef9c8cc segment
    ==15444== 
    fatal error: unexpected signal during runtime execution
    [signal SIGSEGV: segmentation violation code=0x1 addr=0xe324000 pc=0x4e3fd52]
    
    runtime stack:
    runtime.throw(0xb18c4c, 0x2a)
            /usr/local/bin/go/src/runtime/panic.go:616 +0x81
    runtime.sigpanic()
            /usr/local/bin/go/src/runtime/signal_unix.go:372 +0x28e
    
    goroutine 12 [syscall]:
    runtime.cgocall(0x935c00, 0xc42006ca10, ==15444== Use of uninitialised value of size 8
    ==15444==    at 0x438673: runtime.printhex (/usr/local/bin/go/src/runtime/print.go:219)
    ==15444==    by 0x45AA68: runtime.gentraceback (/usr/local/bin/go/src/runtime/traceback.go:406)
    ==15444==    by 0x45C4F8: runtime.traceback1 (/usr/local/bin/go/src/runtime/traceback.go:684)
    ==15444==    by 0x45C371: runtime.traceback (/usr/local/bin/go/src/runtime/traceback.go:645)
    ==15444==    by 0x45CF56: runtime.tracebackothers (/usr/local/bin/go/src/runtime/traceback.go:816)
    ==15444==    by 0x437B54: runtime.dopanic_m (/usr/local/bin/go/src/runtime/panic.go:736)
    ==15444==    by 0x46271B: runtime.dopanic.func1 (/usr/local/bin/go/src/runtime/panic.go:598)
    ==15444==    by 0x437479: runtime.dopanic (/usr/local/bin/go/src/runtime/panic.go:597)
    ==15444==    by 0x437550: runtime.throw (/usr/local/bin/go/src/runtime/panic.go:616)
    ==15444==    by 0x44CD7D: runtime.sigpanic (/usr/local/bin/go/src/runtime/signal_unix.go:372)
    ==15444==    by 0x4E3FD51: read_bits (tdb_bits.h:13)
    ==15444==    by 0x4E3FD51: read_bits64 (tdb_bits.h:38)
    ==15444==    by 0x4E3FD51: huff_decode_value (tdb_huffman.h:72)
    ==15444==    by 0x4E3FD51: _tdb_cursor_next_batch (tdb_decode.c:282)
    ==15444==    by 0x935C57: tdb_cursor_next (traildb.h:304)
    ==15444==    by 0x935C57: _cgo_4805fbb2d53a_Cfunc_tdb_cursor_next (cgo-gcc-prolog:222)
    ==15444== 
    ==15444== Conditional jump or move depends on uninitialised value(s)
    ==15444==    at 0x438685: runtime.printhex (/usr/local/bin/go/src/runtime/print.go:220)
    ==15444==    by 0x45AA68: runtime.gentraceback (/usr/local/bin/go/src/runtime/traceback.go:406)
    ==15444==    by 0x45C4F8: runtime.traceback1 (/usr/local/bin/go/src/runtime/traceback.go:684)
    ==15444==    by 0x45C371: runtime.traceback (/usr/local/bin/go/src/runtime/traceback.go:645)
    ==15444==    by 0x45CF56: runtime.tracebackothers (/usr/local/bin/go/src/runtime/traceback.go:816)
    ==15444==    by 0x437B54: runtime.dopanic_m (/usr/local/bin/go/src/runtime/panic.go:736)
    ==15444==    by 0x46271B: runtime.dopanic.func1 (/usr/local/bin/go/src/runtime/panic.go:598)
    ==15444==    by 0x437479: runtime.dopanic (/usr/local/bin/go/src/runtime/panic.go:597)
    ==15444==    by 0x437550: runtime.throw (/usr/local/bin/go/src/runtime/panic.go:616)
    ==15444==    by 0x44CD7D: runtime.sigpanic (/usr/local/bin/go/src/runtime/signal_unix.go:372)
    ==15444==    by 0x4E3FD51: read_bits (tdb_bits.h:13)
    ==15444==    by 0x4E3FD51: read_bits64 (tdb_bits.h:38)
    ==15444==    by 0x4E3FD51: huff_decode_value (tdb_huffman.h:72)

    I'm using the traildb-go bindings.

    Willing to provide more info if it'd help!
    donaherc
    @donaherc
    Having dug in more, I now suspect that the issue was that the vm.max_map_count setting on the hosts we use for tdb_cons_add was too low (it was at the default 65530). We have seen no issues since raising the setting
    donaherc
    @donaherc
    I believe we're still running into intermittent issues iterating through traildb files and also merging them with tdb_cons_append, which causes segfaults inside CGO and forces a panic. Has anyone here used the traildb-go library and seen such behavior? Is it possible that undefined behavior with traildb file access would cause a panic inside CGO, but behave normally when handled with the C library directly?
    Ville Tuulos
    @tuulos
    could you try tdb merge on the command line with the same files to see if it still segfaults?