@BertHartm sorry it’s been a while, but are you still seeing huge disparities in network as per m3db/m3#949 ? We’re in a place now where we can start handling the issues here and get some fixes in if it’s still causing problems
network is less problematic than CPU, just more surprising. It'd be worse if I weren't in the cloud though (and m3db/m3#1184 becomes the bigger problem then)
@cw9 was recommending maybe talking about moving to something with an aggregation tier and such instead of going through the coordinator for everything, so I should probably figure out how to start doing that in the new year also
yeah, that’s probably the right approach
Coordinator writes are mostly there to act as a Prometheus remote store; we may work on building an endpoint that can take in a batch of points in protobuf (or a similar format) to serve push-based models, but it's unlikely to be as performant or flexible as the agg tier approach
How does the aggregation tier work?
I'm probably not the best person to ask, but the general gist is: collectors run on each of your hosts and stream metrics to the agg tier, which buckets them by resolution and retention according to a set of rules (I think the default is 10s resolution with 2-day retention, but a single metric can go to multiple namespaces), then sends the aggregated values on to your storage
so clusters for each level of retention and multiple namespaces configured in the coordinator?
I believe so, but I'm not sure what the configs should look like on all sides to get it working; one of the other guys would probably be a lot better at going over it
@jhofeditz a single agg tier cluster handles all retention/resolution combinations, and it can do roll-ups too
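For illustration, the coordinator side of that ends up looking roughly like this (a sketch only: the namespace names and values here are made up, so check the M3 docs for the exact schema):

```yaml
# m3coordinator config sketch: an unaggregated namespace for raw writes
# plus an aggregated namespace fed by the agg tier (names/values illustrative)
clusters:
  - namespaces:
      - namespace: default
        type: unaggregated
        retention: 48h
      - namespace: metrics_10s_48h   # 10s resolution, 2 day retention
        type: aggregated
        retention: 48h
        resolution: 10s
```

A metric can map to more than one aggregated namespace, which is how a single series lands at multiple resolution/retention combinations.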
@felipe_fso_twitter brilliant, looking at that now - thanks for the results
Yeah that looks exactly the same
is that the same length of query?
basically, if you can collect the two JSONs for a query where the results don't match
that can help us narrow it down
What is the preferred way of pushing metrics into M3DB? I'm looking to replace my current metrics system (graphite) with m3db and get all our metrics into one place. A huge portion of our graphite metrics are pushed in via collectd/diamond agents, so I'm trying to figure out the best way to re-create something like that for m3db
You can use the m3coordinator to push metrics into M3DB
there's a JSON endpoint but it's not very performant
you may want to use the m3msg ingestion (protobuf over TCP) route
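If you want to try the JSON route first, a minimal push looks something like this. Treat it as a sketch: the endpoint path, port, and payload shape are from the M3 docs as I remember them, so verify against your coordinator version.

```go
// Sketch: push one datapoint to m3coordinator's JSON write endpoint.
// The path /api/v1/json/write, port 7201, and the payload fields are
// assumptions based on the M3 docs; verify for your version.
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

func main() {
	payload := fmt.Sprintf(`{
		"tags": {"__name__": "disk_used_bytes", "host": "web01"},
		"timestamp": %q,
		"value": 42.0
	}`, time.Now().Format(time.RFC3339))

	resp, err := http.Post(
		"http://localhost:7201/api/v1/json/write", // 7201: coordinator default HTTP port
		"application/json",
		bytes.NewBufferString(payload),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

Fine for one-off testing, but as noted it won't keep up under real load; m3msg is the route for volume.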
in the near future (perhaps 1-2 months) we should have some level of first-class graphite support, however
i.e. a Carbon TCP port to write to
and possibly statsd TCP too
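To give a sense of the intended shape, a carbon listener in the coordinator config might end up looking something like this once it ships (purely hypothetical; the real schema could differ):

```yaml
# hypothetical sketch of a carbon ingester section in m3coordinator
# config; the feature hasn't shipped yet, so all field names are guesses
carbon:
  ingester:
    listenAddress: "0.0.0.0:7204"   # plaintext carbon line protocol
    rules:
      - pattern: .*                 # which metric names this rule matches
        policies:
          - resolution: 10s
            retention: 48h
```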
Thanks @robskillington! Being able to support a Carbon-like TCP port to accept writes would be huge. If this allows for a seamless cutover from our current graphite service, it'd be a nice win. Definitely something I'd be happy to test out as soon as it becomes available.
@genericgithubuser might want to join the google group, we'll announce there once we have some support
I have a question about how Prometheus handles remote write service disruptions. I don't see the answer in the Prometheus docs. If I'm writing to M3 and M3 is unavailable, will Prometheus "catch up" the missing data when it comes back online? I'm guessing so, since when I first set up M3 with read_recent turned on it showed all of my past data.
Prometheus has a queue
so I think it can handle temporary disruptions
but eventually the queue will fill and it will start dropping things
you may have seen past data because it was coming from Prometheus storage
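Concretely, the queue knobs live under queue_config in prometheus.yml, and read_recent (which you had on) is what serves older points back out of the remote store; values here are illustrative:

```yaml
# prometheus.yml sketch: remote write to m3coordinator with explicit
# queue settings, plus remote read with read_recent (values illustrative)
remote_write:
  - url: "http://localhost:7201/api/v1/prom/remote/write"
    queue_config:
      capacity: 10000            # samples buffered per shard before drops
      max_shards: 10
      max_samples_per_send: 500
remote_read:
  - url: "http://localhost:7201/api/v1/prom/remote/read"
    read_recent: true            # also query remote for ranges local storage covers
```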
Ok, so is there a way to import data from prometheus local storage into M3?
Not currently. M3DB doesn't support writes further than a configurable buffer into the past, and even if it did, you'd have to write some kind of utility to read from Prometheus and write to M3DB. That said, we're actively developing the out-of-order writes feature and will have it soon.
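The buffer in question is the namespace's bufferPast; from memory it sits in the m3dbnode namespace config, roughly like this (verify the exact field names for your version):

```yaml
# sketch: m3dbnode namespace retention settings; bufferPast bounds how
# far in the past a write may land (field names from memory, verify)
namespaces:
  - id: default
    retention:
      retentionPeriod: 48h
      blockSize: 2h
      bufferFuture: 2m
      bufferPast: 10m    # writes older than now minus bufferPast are rejected
```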
How much data are you trying to backfill? I wonder if you can just let Prom remote write to M3 for a while and then do a cutover
I can, just trying to evaluate the plan. I'm in testing.
and trying to decide how to handle an M3 outage.
Yeah, that's an interesting question
You can keep some storage on Prometheus
so even in the worst case you can look at the most recent metrics
until you get M3 back online
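e.g. keep a day or so of local retention as the fallback window (real Prometheus 2.x flag; the value is illustrative):

```
prometheus --storage.tsdb.retention=24h   # keep 24h in the local TSDB
```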
can I ask what was missing from other TSDBs that made an M3 build necessary?