    Oleg Avdeev
    @oavdeev
    it is just you can't really call it from windows apps
    i ran some simple tests manually but not waf tests (forgot about these)
    Linus Atorf (MMI)
    @atorfmmi
    thanks for all the hints and facts!
    Jason Brown
    @leanvertising_twitter

    So I am new and collecting some info, and I had a question. AdRoll stores its TrailDBs as 24-hour files on S3, so one user could appear in hundreds of TrailDBs over a time range (let's say the last 90 days).

    How are you pulling this? Just a loop over 90 known TrailDB locations, searching for that user ID?

    Also, is there any timeline for PHP or Perl bindings?

    Ville Tuulos
    @tuulos
    we shard TrailDBs by customer account and, in some cases, by UUID
    in the latter case, if you were interested in a specific UUID, you could download all shards corresponding to the UUID locally, create a cursor for each tdb, use tdb_get_trail_id to reset each cursor to the UUID, and use the new multi-cursor to conveniently iterate over the full trail corresponding to the user https://github.com/tuulos/traildb/blob/multi-cursor/src/traildb.h#L225
    this is roughly what we do now
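    A minimal sketch of that per-UUID workflow, written in Python rather than against the C API: each shard is modeled as a hypothetical in-memory dict from UUID to a timestamp-sorted list of (timestamp, event) tuples. Looking the UUID up in each shard stands in for tdb_get_trail_id, and heapq.merge stands in for the multi-cursor's time-ordered iteration. The function name and shard layout are illustrative assumptions, not part of TrailDB.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

# Hypothetical stand-in for one TrailDB shard (e.g. one day of data):
# uuid -> list of (timestamp, event) tuples, sorted by timestamp.
Shard = Dict[str, List[Tuple[int, str]]]


def trail_across_shards(shards: List[Shard], uuid: str) -> Iterator[Tuple[int, str]]:
    """Return an iterator over all events for `uuid` from every shard, in timestamp order."""
    # Per-shard lookup: shards that never saw this uuid contribute nothing.
    per_shard = [shard[uuid] for shard in shards if uuid in shard]
    # k-way merge of the already-sorted trails, ordered by timestamp only.
    return heapq.merge(*per_shard, key=lambda event: event[0])


day1 = {"u1": [(100, "view"), (300, "click")], "u2": [(150, "view")]}
day2 = {"u1": [(86_500, "purchase")]}
print(list(trail_across_shards([day1, day2], "u1")))
# [(100, 'view'), (300, 'click'), (86500, 'purchase')]
```

    The merge only ever holds one pending event per shard, which is why this pattern scales to many daily shards without loading whole trails side by side.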
    @gregory-nisbet mentioned he could work on the Perl bindings - I haven't checked the status of the PHP bindings that someone started to implement
    gregn-adroll
    @gregn-adroll
    Perl is happening right after I finish OCaml and learn a little more about XS. As for the PHP thing I have no idea who's working on that already or how much work is involved in making a C-extension for PHP.
    Ville Tuulos
    @tuulos
    FYI - multi-cursors were just merged to master. They provide a convenient way to stitch together multiple TrailDBs. For more info, see http://traildb.io/docs/api/#join-trails-with-multi-cursors and http://traildb.io/docs/technical_overview/#join-trails-over-multiple-traildbs
    Suminda Sirinath Salpitikorala Dharmasena
    @sirinath
    Any chance of JVM and .Net bindings?
    Ville Tuulos
    @tuulos
    JVM/.Net bindings would be great - I am not aware of anyone working on them
    snakescott
    @snakescott
    Are there best practices for handling/joining complex relational logic (e.g., deducing a user's company from their email, then checking whether the company uses any AWS services) against events stored in TrailDB?
    Also, is there a recommended workflow for merging users? For example, with cross-device targeting you may think you have two separate people, but realize later (after the events have been processed into trails) that they are the laptop and tablet of the same person.
    Mikko Juola
    @Noeda
    for cross-device you might want to use the multi-cursor feature
    if you know user A and user B are the same person, you can create a multi-cursor from two cursors, where the first cursor is on user A and the second is on user B
    and the multi-cursor will iterate over the events from TrailDB in the correct order
    this doesn't apply at TrailDB build time, though... but it's one possible workflow
    @snakescott ^^
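    To illustrate what the multi-cursor does for two devices of one person, here is a Python sketch over a hypothetical dict-based stand-in for a single TrailDB (UUID to timestamp-sorted (timestamp, event) tuples). It assumes you already know which UUIDs belong together; `merged_person_trail` is an illustrative name, and the merge happens at read time, leaving the stored trails untouched.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for a TrailDB: uuid -> [(timestamp, event), ...] sorted by time.
TrailMap = Dict[str, List[Tuple[int, str]]]


def merged_person_trail(tdb: TrailMap, device_uuids: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Interleave the trails of UUIDs believed to be the same person.

    Each output item is (timestamp, uuid, event) so the source device stays visible.
    """
    tagged = [
        [(ts, uuid, event) for ts, event in tdb.get(uuid, [])]
        for uuid in device_uuids
    ]
    # Ordered by timestamp only; each per-device trail is already sorted.
    return list(heapq.merge(*tagged, key=lambda item: item[0]))


tdb = {
    "laptop": [(10, "search"), (40, "add_to_cart")],
    "tablet": [(25, "view_product"), (55, "purchase")],
}
print(merged_person_trail(tdb, ["laptop", "tablet"]))
# [(10, 'laptop', 'search'), (25, 'tablet', 'view_product'), (40, 'laptop', 'add_to_cart'), (55, 'tablet', 'purchase')]
```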
    snakescott
    @snakescott
    thanks!
    Zachary Schneirov
    @scrod
    hi all, does TrailDB officially support i386 builds?
    Ville Tuulos
    @tuulos
    hi! not currently
    there are quite a number of things that assume 64-bit word-width
    Zachary Schneirov
    @scrod
    128-bit integers, which are only available on 64-bit, are used in many places
    Ville Tuulos
    @tuulos
    right, that's one issue
    Zachary Schneirov
    @scrod
    alright, thanks anyway, just wanted to check
    Ville Tuulos
    @tuulos
    cool
    Ritchie J. Latimore
    @rijalati
    so I'm not having any luck building Judy from source without getting the message that it's broken, even when I grab the latest Debian source and apply all the patches. Does anyone have a working Judy source tree they can point me to? I wanted to package TrailDB for the Arch Linux AUR.
    Ville Tuulos
    @tuulos
    at least all recent sources from Debian/Ubuntu should work - they have the right patches applied
    Arch Linux doesn't have a fixed package for Judy arrays already?
    Ritchie J. Latimore
    @rijalati
    no, the current version in the AUR also returns the error. Also, I was applying the Debian patches to the CVS source, which may have gone wrong somewhere along the way. I'm going to try extracting the Debian source package and building that, to see if I get anywhere.
    Ville Tuulos
    @tuulos
    yeah, the patch fixing the issue should be quite small
    there are some GitHub mirrors of the old Judy repo from SourceForge, but from a quick look I couldn't tell whether any of them include the patch(es)
    maybe we should host a fixed repo after all.
    Yan Lu
    @luyanrock
    Hi all, I am researching storing and querying our event data with TrailDB (right now we are using MongoDB). I have a question about setups with multiple servers:
    our web application runs on multiple servers behind a load balancer, but it seems TrailDB doesn't have a remote write option, which means I need to write events from each server separately and merge them later on. Is my understanding correct?
    Mikko Juola
    @Noeda
    yes, that is correct
    you could have each server write its events every hour or so, and then regularly merge them into one big TrailDB
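    A rough sketch of the periodic merge step, assuming each server buffers its own events as hypothetical (uuid, timestamp, payload) tuples. It only shows the logical result a merged TrailDB holds (events grouped by UUID and ordered by time); building the actual TrailDB from it would go through the constructor API of whichever bindings you use, which is not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout buffered on each web server: (uuid, timestamp, payload).
Event = Tuple[str, int, str]


def merge_server_batches(batches: Iterable[Iterable[Event]]) -> Dict[str, List[Tuple[int, str]]]:
    """Combine per-server event batches into per-uuid trails sorted by time."""
    trails: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for batch in batches:
        for uuid, ts, payload in batch:
            trails[uuid].append((ts, payload))
    for events in trails.values():
        # list.sort is stable: equal timestamps keep the order the batches were read in.
        events.sort(key=lambda e: e[0])
    return dict(trails)


server_a = [("u1", 30, "click"), ("u2", 5, "view")]
server_b = [("u1", 10, "view")]
print(merge_server_batches([server_a, server_b]))
# {'u1': [(10, 'view'), (30, 'click')], 'u2': [(5, 'view')]}
```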
    Yan Lu
    @luyanrock
    Thanks @Noeda. I am also thinking of routing event writes to a single server through an async task tool (like Celery). Will TrailDB be able to handle frequent event writes? You mentioned writing every hour; does that mean it's better to have a WAL?
    Yan Lu
    @luyanrock
    nvm my previous question, I wasn't clear about the immutable nature of TrailDB. Thanks for your help.
    Knut Nesheim
    @knutin
    I just published a new version of the Rust bindings to https://crates.io/crates/traildb with some bug fixes and additions, most notably multi-cursor support.
    Ville Tuulos
    @tuulos
    awesome, @knutin !
    Ville Tuulos
    @tuulos
    TrailDB 0.6 released finally! Read all about it here http://tech.adroll.com/blog/data/2017/05/15/traildb-0.6.html
    snakescott
    @snakescott
    I know from reading docs TrailDB was designed for a specific use case and that's why the data model and API exist as they do. That said, I've always found it a bit surprising that TrailDB stores data row-major rather than column-major, and was wondering whether TrailDB devs had any thoughts on the feasibility or utility of a TrailDB-alike built on top of Apache Arrow / parquet-cpp / etc?
    also congrats on shipping 0.6!
    Ville Tuulos
    @tuulos
    the layout depends on the type of scan you want to optimize for. TrailDB is optimized for scanning all events related to a user
    thanks!
    snakescott
    @snakescott
    @tuulos does grouping by user just reduce to sorting?
    Ville Tuulos
    @tuulos
    what do you mean?
    snakescott
    @snakescott
    if you write events to Parquet sorted by an appropriate key, perhaps (user id, timestamp), doesn't that give you an optimized way to scan for events related to a user?
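    A small Python sketch of the idea in this question: sorting a flat, hypothetical (user_id, timestamp, event) log by (user_id, timestamp) puts each user's events in one contiguous, time-ordered run, so a single sequential pass recovers every trail. It does not model Parquet's columnar encoding or compression, so it says nothing about how the two layouts compare in practice.

```python
from itertools import groupby
from operator import itemgetter

rows = [  # hypothetical flat event log: (user_id, timestamp, event)
    ("bob", 20, "view"),
    ("alice", 30, "click"),
    ("bob", 5, "search"),
    ("alice", 10, "view"),
]

# Sorting by (user_id, timestamp) makes each user's events one contiguous,
# time-ordered run, i.e. the per-user scan pattern discussed above.
rows.sort(key=itemgetter(0, 1))


def trails(sorted_rows):
    """Yield (user_id, [(timestamp, event), ...]) in one sequential pass."""
    for user, group in groupby(sorted_rows, key=itemgetter(0)):
        yield user, [(ts, ev) for _, ts, ev in group]


for user, trail in trails(rows):
    print(user, trail)
# alice [(10, 'view'), (30, 'click')]
# bob [(5, 'search'), (20, 'view')]
```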