    Zachary Schneirov
    @scrod
    alright, thanks anyway, just wanted to check
    Ville Tuulos
    @tuulos
    cool
    Ritchie J. Latimore
    @rijalati
    so I'm not having any luck building Judy from source without getting the message that it's broken, even when I grab the latest Debian source and apply all the patches. does anyone have a working Judy source tree they can point me to? I want to package TrailDB for the Arch Linux AUR.
    Ville Tuulos
    @tuulos
    at least all recent sources from Debian/Ubuntu should work - they have the right patches applied
    Arch Linux doesn't have a fixed package for Judy arrays already?
    Ritchie J. Latimore
    @rijalati
    no, the current version in the AUR also returns the error. also, I was applying the Debian patches to the CVS source, which may have gone wrong somewhere along the way. I'm going to try extracting a Debian source .deb, building that, and seeing if I get anywhere.
    Ville Tuulos
    @tuulos
    yeah, the patch fixing the issue should be quite small
    there are some GitHub mirrors of the old Judy repo from SourceForge, but based on a quick look I couldn't tell whether any of them include the patch(es)
    maybe we should host a fixed repo after all..
    Yan Lu
    @luyanrock
    Hi all, I'm researching storing and querying our event data with TrailDB (right now we use MongoDB), and I have a question about setups with multiple servers:
    our web application runs on multiple servers behind a load balancer, but it seems TrailDB doesn't have a remote write option, which means I'd need to write events on each server separately and merge them later. Is my understanding correct?
    Mikko Juola
    @Noeda
    yes, that is correct
    you could have each server write a TrailDB every hour or so, and then regularly merge those into one big TrailDB
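The following is a minimal sketch of what that periodic merge produces. It assumes each server's hourly batch is modeled as a plain Python dict that maps a user id to a list of (timestamp, fields) events. The names merge_batches, Batch and Event are hypothetical, and this is not the traildb-python API. In practice the merge would go through TrailDB's own tooling. The sketch only models the result: one trail per user, in time order, regardless of which server recorded each event.

```python
# Hypothetical model of merging per-server batches; NOT the TrailDB API.
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, dict]          # (timestamp, fields)
Batch = Dict[str, List[Event]]    # user id -> that user's events from one server

def merge_batches(batches: List[Batch]) -> Dict[str, List[Event]]:
    """Combine per-server batches into one trail per user, ordered by timestamp."""
    merged: Dict[str, List[Event]] = defaultdict(list)
    for batch in batches:
        for user_id, events in batch.items():
            merged[user_id].extend(events)
    # A trail is a user's events in time order, so sort each list by timestamp only.
    return {uid: sorted(evts, key=lambda e: e[0]) for uid, evts in merged.items()}

# Example: user u1 hit both servers during the same hour.
server_a = {"u1": [(10, {"action": "view"})], "u2": [(5, {"action": "click"})]}
server_b = {"u1": [(3, {"action": "login"})]}
merged = merge_batches([server_a, server_b])
```

After the merge, u1's trail starts with the login from server_b, followed by the view from server_a.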
    Yan Lu
    @luyanrock
    Thanks @Noeda. I'm also thinking of routing event writes to one single server with an async task tool (like Celery). Will TrailDB be able to handle frequent event writes? You mentioned writing every hour; does that mean it's better to have a WAL?
    Yan Lu
    @luyanrock
    nvm my previous question, I hadn't understood that TrailDBs are immutable. thanks for your help.
    Knut Nesheim
    @knutin
    I just published a new version of the Rust bindings (https://crates.io/crates/traildb). There are some bug fixes and additions, most notably multi-cursor support.
    Ville Tuulos
    @tuulos
    awesome, @knutin !
    Ville Tuulos
    @tuulos
    TrailDB 0.6 is finally released! Read all about it here: http://tech.adroll.com/blog/data/2017/05/15/traildb-0.6.html
    snakescott
    @snakescott
    I know from reading docs TrailDB was designed for a specific use case and that's why the data model and API exist as they do. That said, I've always found it a bit surprising that TrailDB stores data row-major rather than column-major, and was wondering whether TrailDB devs had any thoughts on the feasibility or utility of a TrailDB-alike built on top of Apache Arrow / parquet-cpp / etc?
    also congrats on shipping 0.6!
    Ville Tuulos
    @tuulos
    the layout depends on the type of scan you want to optimize for. TrailDB is optimized for scanning all events related to a user
    thanks!
    snakescott
    @snakescott
    @tuulos does grouping by user just reduce to sorting?
    Ville Tuulos
    @tuulos
    what do you mean?
    snakescott
    @snakescott
    if you write events to Parquet sorted by an appropriate key -- perhaps (user id, timestamp) -- doesn't that give you an optimized way to scan for events related to a user?
    or, to put it another way, it seems like row- vs. column-major is orthogonal to scan optimization?
    Ville Tuulos
    @tuulos
    in row-major you have fields of a row adjacent to each other vs. values of a field adjacent to each other in column-major
    if you want to scan over all fields related to a set of adjacent rows, row-major is more efficient
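The toy sketch below makes that distinction concrete by flattening one small table into a single buffer in both orders. It is illustrative only: TrailDB's real on-disk format is compressed and is not a flat array like this, and all function names here are hypothetical. A user's trail is a run of adjacent rows, so reading all its fields is one contiguous range in row-major order. Reading a single field across every row is the access pattern that column-major order makes contiguous.

```python
# Illustrative only: one table flattened two ways; not TrailDB's storage format.
from typing import List

def row_major(rows: List[List[int]]) -> List[int]:
    """Fields of each row are adjacent: r0f0, r0f1, ..., r1f0, r1f1, ..."""
    return [value for row in rows for value in row]

def column_major(rows: List[List[int]]) -> List[int]:
    """Values of each field are adjacent: r0f0, r1f0, ..., r0f1, r1f1, ..."""
    num_fields = len(rows[0])
    return [row[f] for f in range(num_fields) for row in rows]

def rows_slice_row_major(buf: List[int], num_fields: int, start: int, end: int) -> List[int]:
    """All fields of rows [start, end) form ONE contiguous range in row-major order."""
    return buf[start * num_fields : end * num_fields]

def field_slice_column_major(buf: List[int], num_rows: int, field: int) -> List[int]:
    """All values of one field form ONE contiguous range in column-major order."""
    return buf[field * num_rows : (field + 1) * num_rows]

rows = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
        [10, 11, 12]]
rm = row_major(rows)
cm = column_major(rows)
```

Here rows_slice_row_major(rm, 3, 1, 3) returns rows 1 and 2 with a single slice. Getting the same rows out of cm would take one slice per field. The reverse holds for field_slice_column_major.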
    snakescott
    @snakescott
    ah, I guess that's the crux!
    I assumed that TrailDB queries would look similar to analytic queries on systems like Redshift, etc.
    obviously they aren't SQL
    but in terms of how many fields are used per query
    especially since events -- to be concrete, say web analytic events -- can have lots of fields?
    Ville Tuulos
    @tuulos
    yeah, TrailDB optimizes for select * from users where user_id=X, as opposed to select aggregate(field) from users, which would be more efficient with a columnar layout
    snakescott
    @snakescott
    is this for simplicity, or did you expect to be working with funnel queries which cover lots of fields?
    I feel like I got a lot of insight into TrailDB design from the code, docs, and presentations (thanks!), but not on this point -- I might have missed a resource though.
    Ville Tuulos
    @tuulos
    lots of fields (say, hundreds or thousands of fields) work fine, especially if not all fields are populated. TrailDB handles sparse data like that well
    if you have lots of non-empty fields but you care only about a tiny subset of them, it is less efficient
    one approach to optimize that use case is to partition data by field and use multi-cursors to join subsets of fields on the fly
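Below is a minimal sketch of that partition-and-join idea for a single user. It assumes the user's events have been split into hypothetical per-field partitions (clicks and purchases), each already sorted by timestamp as trails are. As with a multi-cursor, the partitions are merged into one stream in timestamp order, and each event keeps only its own partition's fields. This is plain Python using heapq.merge, not the TrailDB multi-cursor API.

```python
# Hypothetical model of a multi-cursor style join; NOT the TrailDB API.
import heapq
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, Dict[str, str]]   # (timestamp, fields from one partition)

def join_partitions(*trails: Iterable[Event]) -> List[Event]:
    """Merge one user's trails from several field partitions by timestamp."""
    # heapq.merge expects each input to be sorted already; it keys on the
    # timestamp only, so the field dicts are never compared on ties.
    return list(heapq.merge(*trails, key=lambda event: event[0]))

# Only the partitions a query needs are opened and joined on the fly.
clicks = [(1, {"url": "/a"}), (5, {"url": "/b"})]
purchases = [(3, {"price": "9.99"})]
joined = join_partitions(clicks, purchases)
```

The design point is that a query touching only clicks and purchases never reads the bytes of any other field partition.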
    great questions!
    snakescott
    @snakescott
    interesting, thanks
    I may try to find some way to remix Arrow, Parquet, and TrailDB and see what happens!
    Ville Tuulos
    @tuulos
    awesome. Please share your experiences here :)
    Yassine Marzougui
    @ymarzougui
    Hi! Thanks for the great tool!
    Would it be possible to add support for multi-cursors in the python bindings?
    Knut Nesheim
    @knutin
    I just pushed 0.4.0 of traildb-rs which now has support for event filters! :D
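TrailDB's event filters are documented as boolean queries in conjunctive normal form over field/value terms. The sketch below models that evaluation in plain Python so the idea is concrete. It is not the traildb-rs or C API: the tuple representation, the example fields (action, country) and the function names are made up for illustration. Each term tests one field for equality with a value and can be negated. A clause is an OR of terms, and an event passes if every clause (the AND) has at least one matching term.

```python
# Hypothetical model of a CNF event filter; NOT the TrailDB / traildb-rs API.
from typing import Dict, List, Tuple

Term = Tuple[str, str, bool]   # (field, value, negated)
Clause = List[Term]            # OR of terms
Filter = List[Clause]          # AND of clauses (conjunctive normal form)

def term_matches(event: Dict[str, str], term: Term) -> bool:
    field, value, negated = term
    hit = event.get(field) == value
    return not hit if negated else hit

def event_matches(event: Dict[str, str], flt: Filter) -> bool:
    """True if every clause has at least one matching term."""
    return all(any(term_matches(event, t) for t in clause) for clause in flt)

# (action == "view" OR action == "click") AND NOT (country == "FI")
flt: Filter = [[("action", "view", False), ("action", "click", False)],
               [("country", "FI", True)]]

events = [{"action": "view", "country": "US"},
          {"action": "click", "country": "FI"},
          {"action": "buy", "country": "US"}]
kept = [e for e in events if event_matches(e, flt)]
```

Only the first event survives. The second is excluded by the negated country term, and the third fails the action clause.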
    Ville Tuulos
    @tuulos
    @ymarzougui yes, we should. Could you open a ticket about it at https://github.com/traildb/traildb-python/issues thanks!
    Yassine Marzougui
    @ymarzougui
    Great, thanks! I opened the ticket.
    Ville Tuulos
    @tuulos
    thanks!
    Vladimir Makhaev
    @vmakhaev
    hi guys. I want to play with TrailDB and Go, but ran into a problem building traildb-go. Here is the ticket: traildb/traildb-go#9. If you can give any clue, that would be awesome
    vladkluev
    @vladkluev
    Just a shot in the dark, but I had some trouble with the Go extensions at one point and it turned out that I didn't have the most recent TrailDB installed