These are chat archives for influxdata/influxdb

27th
Apr 2016
Rüdiger Klaehn
@rklaehn
Apr 27 2016 15:50
Is there some more detailed documentation about the data model? I read https://docs.influxdata.com/influxdb/v0.12/concepts/schema_and_data_layout/#page-title , but I would like to have some more detail, especially about the performance and memory impact of tags.
I read this blog post: http://puyuan.github.io/influxdb-tag-cardinality-memory-performance , so now I wonder if it is safe at all to use tags with any dynamic data (hostname, user-agent etc.)
Connor Peet
@connor4312
Apr 27 2016 16:01
Did these benchmarks on 0.10 on Windows:
- 1m records with a 100k cardinality = 26 MB RAM
- 10m records with 1m cardinality = 230 MB RAM
- 100m records with 10m cardinality = ~6GB RAM
(cardinality == number of unique tag values)
@rklaehn
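As a rough reading of the figures quoted above, dividing RAM by cardinality gives an approximate per-series cost. This is only a sketch: it assumes 1 MB = 10**6 bytes, takes "~6GB" as exactly 6 * 10**9 bytes, and ignores any baseline memory the server uses regardless of the data.

```python
# (cardinality, RAM in bytes) pairs taken from the benchmark message above.
# The unit conversions are assumptions, not something the chat states.
benchmarks = [
    (100_000, 26 * 10**6),
    (1_000_000, 230 * 10**6),
    (10_000_000, 6 * 10**9),
]


def bytes_per_series(cardinality, ram_bytes):
    """Naive average: total RAM divided by the number of unique tag values."""
    return ram_bytes / cardinality


for cardinality, ram in benchmarks:
    cost = bytes_per_series(cardinality, ram)
    print(f"{cardinality:>10,} series: ~{cost:.0f} bytes each")
```

Under those assumptions the first two runs come out at roughly 230 to 260 bytes per series and the third at roughly 600, so the reported cost does not look strictly linear at the high end.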
Rüdiger Klaehn
@rklaehn
Apr 27 2016 16:03
How is cardinality calculated? Is it the product of the number of distinct values per tag, over all time?
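The question is not answered directly in this conversation. As far as the InfluxDB documentation describes it, a series is a unique combination of measurement and tag set, so the product of distinct values per tag is an upper bound rather than the actual count. A minimal sketch of the difference, using made-up points and hypothetical function names:

```python
from math import prod


def distinct_tag_sets(points):
    """Count unique (measurement, tag set) combinations that were actually written."""
    return len({(measurement, frozenset(tags.items())) for measurement, tags in points})


def product_upper_bound(points):
    """Product of distinct values per tag key: the worst case, not the real count."""
    values_by_key = {}
    for _, tags in points:
        for key, value in tags.items():
            values_by_key.setdefault(key, set()).add(value)
    return prod(len(values) for values in values_by_key.values())


# Illustrative data only; none of these names come from the chat.
points = [
    ("requests", {"host": "a", "agent": "curl"}),
    ("requests", {"host": "a", "agent": "curl"}),
    ("requests", {"host": "b", "agent": "firefox"}),
    ("requests", {"host": "c", "agent": "curl"}),
]

print(distinct_tag_sets(points))    # 3 combinations were actually seen
print(product_upper_bound(points))  # 3 hosts * 2 agents = 6 possible
```

If tags are correlated (a given host always sends the same user agent, say), the real count can sit far below the product.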
Connor Peet
@connor4312
Apr 27 2016 16:03
In all of these I added 10 records for each unique tag
This was a while ago so I don't have the script I used, but it should be pretty easy to whip up your own test
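The original script is not available, so the following is only a guess at what such a test could look like: it generates InfluxDB line protocol with a fixed number of records per unique tag value. The measurement name `bench`, the tag key `id`, the field `value`, and the start timestamp are all placeholders.

```python
def generate_lines(unique_tags, records_per_tag=10, start_ns=1461772800000000000):
    """Yield line-protocol strings: `records_per_tag` points for each unique tag value."""
    for tag_index in range(unique_tags):
        for record in range(records_per_tag):
            # Space the points of one series one second apart (nanosecond timestamps).
            timestamp = start_ns + record * 1_000_000_000
            yield f"bench,id=tag{tag_index} value={record}i {timestamp}"


# 100k unique tag values * 10 records = 1m records, matching the smallest run above.
# A small count is used here so the sketch runs instantly.
lines = list(generate_lines(unique_tags=3))
print(len(lines))
print(lines[0])
```

The generated lines could then be written to a file and sent to the server in batches while watching its memory use.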
Rüdiger Klaehn
@rklaehn
Apr 27 2016 16:04
E.g. if you have something like hostname or user agent, it could grow to a very large number of distinct values over time. I guess something like session id does not belong in a tag in the first place.
Connor Peet
@connor4312
Apr 27 2016 16:05
Well, it depends on your workload and what kind of machine you're running. If you read frequently, you'll get way better read performance if session ID is a tag (assuming you read per session ID).
Rüdiger Klaehn
@rklaehn
Apr 27 2016 16:06
But isn't the cardinality of a session ID basically unbounded?
Connor Peet
@connor4312
Apr 27 2016 16:07
Yea, though the number you'll actually have to store is dependent on how many users you have and what your retention policy is
Rüdiger Klaehn
@rklaehn
Apr 27 2016 16:08
That is why I am interested in how cardinality is calculated. E.g. the number of sessions per day might be totally acceptable, but not for a year.
But thanks a lot for your info. The bottom line is that at the current data rate I don't have to worry too much.
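The day-versus-year concern can be put into numbers with a back-of-the-envelope estimate. This sketch assumes every session produces one new tag value and that tag values stop counting once the retention policy has expired their data, as suggested earlier in the conversation. The session rate is invented for illustration.

```python
def live_series_estimate(new_sessions_per_day, retention_days):
    """Upper estimate of session-ID tag values still held under a retention policy."""
    return new_sessions_per_day * retention_days


sessions_per_day = 5_000  # hypothetical workload, not a figure from the chat

print(live_series_estimate(sessions_per_day, retention_days=1))    # 5000
print(live_series_estimate(sessions_per_day, retention_days=365))  # 1825000
```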
Connor Peet
@connor4312
Apr 27 2016 16:10
You could keep them in a series with a one-day RP and tag by their ID, and downsample into a series where it isn't tagged after that. You could also manually tag them by the first n characters of the session ID; that would give you a bounded cardinality without hurting read performance too much.
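The prefix idea from the last message can be sketched as follows. The sketch assumes session IDs are hexadecimal strings, so a prefix of n characters can take at most 16**n values. The function names, the tag key `sid_prefix`, and the measurement `sessions` are hypothetical.

```python
def session_prefix_tag(session_id, n=2):
    """Use the first n characters of the session ID as a bounded tag value."""
    return session_id[:n].lower()


def max_prefix_cardinality(n, alphabet_size=16):
    """Worst-case number of distinct prefixes for an alphabet of the given size."""
    return alphabet_size ** n


def to_line(session_id, value, n=2):
    """Build a line-protocol point: prefix as a tag, full session ID as a string field."""
    prefix = session_prefix_tag(session_id, n)
    return f'sessions,sid_prefix={prefix} session_id="{session_id}",value={value}i'


example_id = "3f9a1c0de4b74c2f"  # made-up session ID
print(session_prefix_tag(example_id))
print(max_prefix_cardinality(2))
print(to_line(example_id, 1))
```

A query would then filter on the prefix tag first and match the full session ID within that much smaller set, so the choice of n trades index size against how much data each lookup still has to scan.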