    snakescott
    @snakescott
    interesting, thanks
    I may try to find some way to remix arrow, parquet, and traildb and see what happens!
    Ville Tuulos
    @tuulos
    awesome. Please share your experiences here :)
    Yassine Marzougui
    @ymarzougui
    Hi! Thanks for the great tool!
    Would it be possible to add support for multi-cursors in the python bindings?
    Knut Nesheim
    @knutin
    I just pushed 0.4.0 of traildb-rs which now has support for event filters! :D
    Ville Tuulos
    @tuulos
    @ymarzougui yes, we should. Could you open a ticket about it at https://github.com/traildb/traildb-python/issues thanks!
    Yassine Marzougui
    @ymarzougui
    Great, thanks! I opened the ticket.
    Ville Tuulos
    @tuulos
    thanks!
    Vladimir Makhaev
    @vmakhaev
    Hi, guys. I want to play with TrailDB and Go, but I ran into a problem building traildb-go. Here is the ticket: traildb/traildb-go#9. If you can give any clue, that would be awesome.
    vladkluev
    @vladkluev
    Just a shot in the dark but I had some trouble with the Go extensions at one point and it turned out that I didn't have the most recent traildb installed
    Vladimir Makhaev
    @vmakhaev
    thanks, but my traildb is already the most recent version
    Ville Tuulos
    @tuulos
    let me check
    Ville Tuulos
    @tuulos
    @vmakhaev go build seems to work ok with Go 1.7 on Linux. I am trying to reproduce with your setup using 1.8 on OS X
    Ville Tuulos
    @tuulos
    yeah, I can reproduce the issue with 1.8
    we will fix it for 1.8 but meanwhile, if possible, using 1.7 should be a workaround
    Vladimir Makhaev
    @vmakhaev
    @tuulos works with Go 1.7. thanks
    Vladimir Makhaev
    @vmakhaev
    it almost works, except for the DB creation part: traildb/traildb-go#10
    Ville Tuulos
    @tuulos
    hmm, strange. I'll try to reproduce the issue
    we are using the Go bindings in production to produce tdbs so maybe there's something special about this case
    Vladimir Makhaev
    @vmakhaev
    created another couple of issues: traildb/traildb-go#11 and traildb/traildb-go#12. I think #11 could be fixed by upgrading to golang 1.8.1.
    Ville Tuulos
    @tuulos
    thanks @vmakhaev ! I wrote new code using the Go binding just yesterday without trouble on Linux. It seems like most/all of those issues might be related to OS X, which has got less testing
    thanks for reporting
    it should be easy to fix them
    rhymes
    @rhymes
    I have just started playing with TrailDB and Python (so far with a tdb containing 62 million events). Just out of curiosity, is there any real performance difference in constructing a TDB file between the Python and Go bindings?
    rhymes
    @rhymes
    BTW I actually wrote a "create trail db" script in both Python and Go. They both go through a huge CSV file (Python's has 69 261 656 rows, Go's has 69 755 429 rows). Python took 3h5m20s to create the traildb file. Go took 1h31m25s.
    Then I wrote a "query trail db" script with an easy filter: field_1 = value AND (field_2 = value OR field_2 = value), which is basically one of the conditions of the SQL queries we use. Wrote the script in Python and it took 14.36s, rewrote it in Go and it took 5.75s
    All sequential, no optimizations, also my first time writing code in Go so I'm sure it can be better :D
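The filter described above is in conjunctive normal form: an AND of clauses, where each clause is an OR of field/value terms. A minimal sketch of those semantics follows. It uses plain dicts as stand-ins for events and does not call the TrailDB bindings; the names `matches`, `QUERY`, and `EVENTS` are made up for illustration.

```python
# Illustration only: plain dicts stand in for events; this is not the TrailDB API.

def matches(event, cnf):
    """Return True if the event satisfies every clause.

    A clause is a list of (field, value) terms and is satisfied
    when at least one of its terms matches the event.
    """
    return all(
        any(event.get(field) == value for field, value in clause)
        for clause in cnf
    )

# field_1 = "a" AND (field_2 = "x" OR field_2 = "y")
QUERY = [
    [("field_1", "a")],
    [("field_2", "x"), ("field_2", "y")],
]

# Hypothetical sample events.
EVENTS = [
    {"field_1": "a", "field_2": "x"},
    {"field_1": "a", "field_2": "z"},
    {"field_1": "b", "field_2": "y"},
    {"field_1": "a", "field_2": "y"},
]

matching = [e for e in EVENTS if matches(e, QUERY)]
```

Only the first and last sample events pass: the second fails the OR clause and the third fails the `field_1` clause.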
    Ville Tuulos
    @tuulos
    I have seen the Go binding being up to 6x faster than Python
    and Go programs tend to be really straightforward to parallelize over multiple cores for added benefit
    rhymes
    @rhymes
    Yeah, it's not hard for me to believe that.
    Ville Tuulos
    @tuulos
    in the same benchmark, coincidentally C was also about 6x faster than Go
    but a simple multicore version of the Go program beat a single-core C
    rhymes
    @rhymes
    Yeah, I'd definitely use Go if we decide to use traildb.
    I was trying to use trck today but I failed traildb/trck#11 - It probably has something to do with me using OSX for trck.
    Ville Tuulos
    @tuulos
    @oavdeev might be able to help with that
    Marius
    @marius-plv

    Dear all, first a BIG thank you for opening up this project to the world, pushing it forward and providing good project documentation and groups like this one for new users - would like to see more projects like traildb out there :)

    My question is related to optimizing retrieval of consecutive events. I would like to efficiently retrieve all the events stored in one trail within the time interval [t1, t2].
    Let's assume first the simple use-case, where no traildb filter is configured on the cursor.
    The "data" (that gets stored with each "event") always has the same length.
    The closest API to achieve this I've seen is tdb_multi_cursor_next_batch().
    Is this the fastest traildb API to retrieve events stored in a given time range?

    On a tangent to the above: I just saw some notes here about a great feature, tdb indexing (created using 'tdb index -i my-tdb').
    This could speed up such queries quite a bit, if I understood it correctly. Does this indexing operation apply to events as well, and is it safe to run in parallel with "read" operations (cursors performing read operations from different processes)?

    Kind regards,
    Marius

    Ville Tuulos
    @tuulos
    hi @Marius-Plv
    re: retrieve all the events in one trail within a time interval: The fastest way should be to set a time filter and then use a cursor as usual. In C, there shouldn't be a huge difference between retrieving events one by one using tdb_cursor_next() vs. many events in a batch with tdb_multi_cursor_next_batch(). The batch mode tends to be much faster if using a language binding
    The index is designed to speed up queries. It is an independent feature from the standard API and the default TrailDB API doesn't use the index yet. The tdb command line tool does and you can use it in your own apps too
    since it is an independent feature, you can index in parallel with other reads
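The "set a time filter, then use a cursor as usual" pattern can be sketched in pure Python. This is only a model of the idea, not the TrailDB API: a trail is assumed to be a time-sorted list of (timestamp, payload) pairs, and `iterate` and `TRAIL` are hypothetical names. The inclusive [t1, t2] bounds follow the question above and may differ from how the library treats the end of a range.

```python
# Illustration only: a trail is modelled as a time-sorted list of
# (timestamp, payload) pairs; this is not the TrailDB API.

def iterate(trail, time_range=None):
    """Yield (timestamp, payload) pairs, optionally limited to start <= ts <= end."""
    for ts, payload in trail:
        if time_range is not None:
            start, end = time_range
            if ts < start:
                continue  # before the window: skip this event
            if ts > end:
                break     # trail is time-sorted, so nothing later can match
        yield ts, payload

# Hypothetical sample trail.
TRAIL = [(100, "a"), (105, "b"), (110, "c"), (110, "d"), (120, "e")]

# The consuming loop is the same with or without the filter.
selected = list(iterate(TRAIL, time_range=(105, 110)))
```

The point of the design is that the filter lives with the cursor, so the consuming code does not need its own timestamp checks.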
    Marius
    @marius-plv
    Thank you, Ville! I think I'll stick to the tdb_cursor_next() approach over a time-filtered cursor. The index sounds very appealing, although I would hold off on using it for now, since it is not available through the standard API (and calling tdb in another process is a pain). But I would consider the tdb command line approach for the 2nd iteration, or a hybrid one in which events are streamed from tdb through a "message bus" to my application; that would also be feasible. Best regards, Marius
    Ville Tuulos
    @tuulos
    ok. Don't hesitate to ask if you have any other questions!
    Raunak Ramakrishnan
    @rrampage
    Are there any more examples in D? I built the traildb-d repo and ran a small example with dub.
    Also, is there a way of using ldc for compiling the code? When I try it, I get this error: /usr/bin/ld: .dub/obj/TrailDB.o: relocation R_X86_64_32 against symbol `_D9Exception7__ClassZ' can not be used when making a shared object; recompile with -fPIC
    Ville Tuulos
    @tuulos
    @rrampage let me try to summon some people who know about the D bindings
    Lawrence Christopher Evans
    @lcevans
    Hey @rrampage, I'm not the person who wrote the TrailDB D bindings though I have used them in our other private company repos. I can't share those repos... but if you have questions on how to use the interface I can help. I've only used dmd and don't know much about ldc. It looks like a linking issue -- this tutorial explains linking and the need for the -fPIC flag: http://www.cprogramming.com/tutorial/shared-libraries-linux-gcc.html. dmd calls gcc and perhaps supplies the -fPIC flag automatically, while perhaps ldc does not?
    vladkluev
    @vladkluev
    Hey yall, not sure if you know but the travis build is failing rn, looks like a waf issue
    https://travis-ci.org/traildb/traildb
    Ville Tuulos
    @tuulos
    thanks, @vladkluev - I will take a look. Seems like a simple config issue. I wonder why it broke
    Raunak Ramakrishnan
    @rrampage
    @lcevans an example of merging tdbs would be really helpful. Also, in your company do you query the tdbs using trck or the C API?
    Lawrence Christopher Evans
    @lcevans
    @rrampage My company (AdRoll) uses tdbs in a variety of places, both trck and the C API. But the project I am most familiar with uses the C API directly via the D bindings. In this project we iterate over 30 days of tdbs (each holding one day of data) keeping track of per-cookie information in memory with D associative arrays... so in particular we don't merge tdbs (side note: This task was set up in the early days of tdb so if you want to do something like this you should try trck first). I don't have familiarity with merging tdbs, and I don't believe there is a D binding for the merging functions. But it should be possible to add a D binding for the relevant C functions -- if you end up doing so you're welcome to add them via PR to the traildb-d repo :)
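The approach described above (walk the daily tdbs in order and keep per-cookie state in an in-memory associative array, without merging) can be sketched as follows. This is a rough illustration under stated assumptions, not the code used in that project: each day is modelled as a dict mapping a cookie to its list of events, a Python dict plays the role of the D associative array, and `aggregate`, `DAYS`, and the tracked counters are hypothetical.

```python
from collections import defaultdict

# Illustration only: each "day" is a dict of cookie -> list of events,
# standing in for one daily tdb; this is not the TrailDB API.

def aggregate(daily_dbs):
    """Fold daily datasets, in order, into one in-memory table keyed by cookie."""
    per_cookie = defaultdict(lambda: {"events": 0, "days_seen": 0})
    for day in daily_dbs:
        for cookie, events in day.items():
            state = per_cookie[cookie]
            state["events"] += len(events)
            state["days_seen"] += 1
    return dict(per_cookie)

# Hypothetical sample: three days of data.
DAYS = [
    {"cookie-a": ["view", "click"], "cookie-b": ["view"]},
    {"cookie-a": ["view"]},
    {"cookie-b": ["view", "view"], "cookie-c": ["click"]},
]

summary = aggregate(DAYS)
```

Only one day needs to be open at a time; the state that carries across days is the per-cookie table, so memory grows with the number of distinct cookies rather than the number of events.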