traildb.io - do more with your event data with less overhead
Suminda Sirinath Salpitikorala Dharmasena
@sirinath
Any chance of JVM and .Net bindings?
Ville Tuulos
@tuulos
JVM/.Net bindings would be great - I am not aware of anyone working on them
snakescott
@snakescott
Are there best practices for handling/joining complex relational logic (e.g., deducing a user's company from their email, then checking if the company uses any AWS services) against events stored in TrailDB?
Also, is there a recommended workflow for merging users? For example, in cross-device targeting you may think you have two separate people but realize later (after events have been processed into trails) that they are the laptop and tablet of the same person.
Mikko Juola
@Noeda
for cross-device you might want to use the multicursor feature
if you know user A and user B are the same person you can create a multicursor from two cursors, where the first cursor is on user A and the second is on user B
and the multicursor will iterate the events out of TrailDB in correct order
this doesn't apply at TrailDB build-time though...but it's one possible workflow
@snakescott ^^
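A minimal sketch of the idea behind the multicursor suggestion above: merging two time-sorted trails into a single stream ordered by timestamp. This does not use the TrailDB bindings; the trail contents and the `merged_trail` helper are hypothetical and only illustrate the ordering behaviour described.

```python
import heapq
from operator import itemgetter

# Hypothetical per-device trails: lists of (timestamp, event) tuples,
# each already sorted by timestamp.
laptop_trail = [(100, "login"), (250, "view_item"), (400, "checkout")]
tablet_trail = [(120, "open_app"), (260, "add_to_cart")]

def merged_trail(*trails):
    """Lazily yield events from several time-sorted trails in timestamp order."""
    return heapq.merge(*trails, key=itemgetter(0))

# Events of the two "users" interleaved as if they were one person.
combined = list(merged_trail(laptop_trail, tablet_trail))
```

Because the merge is lazy, only one pending event per trail is held at a time, which is roughly what iterating several cursors in lockstep amounts to.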
snakescott
@snakescott
thanks!
Zachary Schneirov
@scrod
hi all, is traildb officially supporting i386 builds?
Ville Tuulos
@tuulos
hi! not currently
there are quite a number of things that assume 64-bit word-width
Zachary Schneirov
@scrod
128-bit integers, which are only available on 64-bit, are used in many places
Ville Tuulos
@tuulos
right, that's one issue
Zachary Schneirov
@scrod
alright, thanks anyway, just wanted to check
Ville Tuulos
@tuulos
cool
Ritchie J. Latimore
@rijalati
so I'm not having any luck building judy from source without getting the message that it's broken, even when I grab the latest debian source and apply all the patches. does anyone have a working judy source tree they can point me to? I was wanting to package up traildb for the Arch Linux AUR.
Ville Tuulos
@tuulos
at least all recent sources from Debian/Ubuntu should work - they have the right patches applied
Arch Linux doesn't have a fixed package for Judy arrays already?
Ritchie J. Latimore
@rijalati
no the current version in the AUR also returns the error. also, I was applying the debian patches to the CVS source, which may have gone wrong somewhere along the way, I'm going to try extracting a debian source .deb and build that, see if I get anywhere.
Ville Tuulos
@tuulos
yeah, the patch fixing the issue should be quite small
there are some mirrors on GitHub of the old Judy repo from SourceForge, but based on a quick look I couldn't tell whether any of them include the patch(es)
maybe we should host a fixed repo after all..
Yan Lu
@luyanrock
Hi all, I am researching how to store and query our event data using TrailDB (right now we are using MongoDB). I have a question about when there are multiple servers involved:
our web application runs on multiple servers behind a load balancer, but it seems TrailDB doesn't have a remote write option, which means I need to write events from each server separately and merge them later on. Is my understanding correct?
Mikko Juola
@Noeda
yes, that is correct
you could have servers write every 1 hour or so, and then merge the events regularly to one big traildb
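A rough sketch of the write-separately-then-merge workflow described above, in plain Python rather than the TrailDB API. The batch format `(user_id, timestamp, fields)` and the `merge_batches` helper are assumptions made for illustration; in practice each batch would be a small TrailDB written by one server.

```python
from collections import defaultdict

def merge_batches(batches):
    """Combine per-server event batches into one mapping of
    user_id -> list of (timestamp, fields), sorted by timestamp.

    Each batch is an iterable of (user_id, timestamp, fields) tuples.
    """
    trails = defaultdict(list)
    for batch in batches:
        for user_id, timestamp, fields in batch:
            trails[user_id].append((timestamp, fields))
    # Restore time order within each user's trail after combining servers.
    for events in trails.values():
        events.sort(key=lambda event: event[0])
    return dict(trails)

# Hypothetical hourly batches from two servers behind a load balancer.
server_a = [("u1", 100, {"page": "home"}), ("u2", 130, {"page": "cart"})]
server_b = [("u1", 90, {"page": "landing"}), ("u1", 140, {"page": "cart"})]

merged = merge_batches([server_a, server_b])
```

The merged result is rebuilt from scratch on every run, which matches the point made later in the conversation that a TrailDB is immutable once built.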
Yan Lu
@luyanrock
Thanks @Noeda, I am also thinking of routing event writes through an async task tool (like Celery) to a single server. Will TrailDB be able to handle frequent event writes? I saw you mentioned writing every 1 hour; does that mean it's better to have a WAL?
Yan Lu
@luyanrock
nvm my previous question, I wasn't clear about the immutable nature of TrailDB. thanks for your help.
Knut Nesheim
@knutin
I just published a new version of the Rust bindings to https://crates.io/crates/traildb. There are some bug fixes and additions, most notably multicursors.
I know from reading docs TrailDB was designed for a specific use case and that's why the data model and API exist as they do. That said, I've always found it a bit surprising that TrailDB stores data row-major rather than column-major, and was wondering whether TrailDB devs had any thoughts on the feasibility or utility of a TrailDB-alike built on top of Apache Arrow / parquet-cpp / etc?
also congrats on shipping 0.6!
Ville Tuulos
@tuulos
the layout depends on the type of scan you want to optimize for. TrailDB is optimized for scanning all events related to a user
thanks!
snakescott
@snakescott
@tuulos does grouping by user just resolve to sorting?
Ville Tuulos
@tuulos
what do you mean?
snakescott
@snakescott
if you write events to parquet sorted by an appropriate key -- perhaps (user id, timestamp) -- doesn't that give you an optimized way to scan for events related to a user?
or maybe another way to put this is it seems like row vs column major is orthogonal to scan optimization?
Ville Tuulos
@tuulos
in row-major you have fields of a row adjacent to each other vs. values of a field adjacent to each other in column-major
if you want to scan over all fields related to a set of adjacent rows, row-major is more efficient
snakescott
@snakescott
ah, I guess that's the crux!
I assumed that TrailDB queries would look similar to analytic queries on systems like RedShift/etc
obviously they aren't SQL
but in terms of how many fields are used per query
especially since events -- to be concrete, say web analytic events -- can have lots of fields?
Ville Tuulos
@tuulos
yeah, TrailDB optimizes for select * from users where user_id=X vs. select aggregate(field) from users that would be more efficient with a columnar layout
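To make the row-major vs. column-major distinction concrete, here is a small sketch in plain Python. The sample events, field names, and helper functions are hypothetical and say nothing about TrailDB's actual on-disk encoding; they only show which layout each of the two query shapes reads naturally.

```python
# Hypothetical events as (user_id, timestamp, page, price).
# Row-major: all fields of one event sit next to each other.
rows = [
    ("u1", 100, "home", 0),
    ("u1", 130, "cart", 25),
    ("u2", 110, "home", 0),
    ("u2", 150, "cart", 40),
]

# Column-major: all values of one field sit next to each other.
columns = {
    "user_id": [r[0] for r in rows],
    "timestamp": [r[1] for r in rows],
    "page": [r[2] for r in rows],
    "price": [r[3] for r in rows],
}

def events_for_user(rows, user_id):
    """Like 'select * ... where user_id = X': returns whole rows,
    so every field of each matching event is needed."""
    return [r for r in rows if r[0] == user_id]

def total_price(columns):
    """Like 'select sum(price) ...': touches a single column only."""
    return sum(columns["price"])

u1_events = events_for_user(rows, "u1")
revenue = total_price(columns)
```

When rows for one user are stored adjacently, the first query reads one contiguous region; the second query would have to skip over unrelated fields in a row-major layout, which is where a columnar layout helps.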
snakescott
@snakescott
is this for simplicity, or did you expect to be working with funnel queries which cover lots of fields?