These are chat archives for Yelp/elastalert

May 2016
May 05 2016 05:13
thanks for the info
May 05 2016 10:37
How can I search for "error: Internal server error" using ElastAlert?
May 05 2016 12:35
Hi, I'm using a Frequency rule and would like to be able to say how many events actually occurred when an alert was generated. That is, rather than saying "At least 5 events occurred between X and Y", if the actual count was 8 I'd like the message to read something like "At least 5 (8) events occurred between X and Y". Alternatively, if it's easier, the number of events (8 in this case) could be exposed in the elastalert document in the elastalert_status index, or somewhere else an Alerter could access it. A few questions: (1) Is this possible? (2) Does the approach even make sense? (3) I presume I'd need to create a new rule to do this?
Miguel Ángel García
May 05 2016 14:50

I'm trying out ElastAlert for the first time and I'm seeing some unexpected behavior. My configuration is very similar to one of the examples and is as simple as possible, but I get this error when trying to detect when an HTTP request was successful (just a test):
{'index': 'packetbeat-*', 'terms_window_size': {'days': 90}, 'fields': ['http.code'], 'alert': ['email'], 'filter': [{'term': {'http.code': 200}}], 'rule_file': '/opt/rules/http.yaml', 'type': 'new_term', 'email': [''], 'name': 'Example rule on HTTP'} is not valid under any of the given schemas
Failed validating 'oneOf' in schema:

What is wrong with that configuration?

Quentin Long
May 05 2016 17:09
@marlymarl The reason it says "At least X" is that the alert fires as soon as the count hits num_events; it doesn't wait to see how many events there are in total. One workaround is to use top_count_keys: it adds 10 minutes (hardcoded, sorry) to the alert time, runs an aggregate query over the period from alert_time minus timeframe to alert_time plus 10 minutes, and gives you the number of documents.
@marlymarl It might be a good feature to do something similar but JUST to get the total document count for the surrounding time period
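As a rough illustration of the window described above, here is a minimal Python sketch. It assumes the aggregate covers alert_time minus timeframe through alert_time plus the hardcoded 10 minutes, with both ends inclusive (an assumption); top_count_window and count_events are hypothetical names, not ElastAlert functions.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Padding added after the alert time, per the chat (said to be hardcoded to 10 minutes).
TOP_COUNT_PADDING = timedelta(minutes=10)


def top_count_window(alert_time: datetime, timeframe: timedelta) -> Tuple[datetime, datetime]:
    """Hypothetical helper: the (start, end) range the aggregate query covers."""
    return alert_time - timeframe, alert_time + TOP_COUNT_PADDING


def count_events(timestamps: List[datetime], start: datetime, end: datetime) -> int:
    """Total number of events whose timestamp lies in the closed range [start, end]."""
    return sum(1 for ts in timestamps if start <= ts <= end)


if __name__ == "__main__":
    alert_time = datetime(2016, 5, 5, 12, 0)
    start, end = top_count_window(alert_time, timedelta(minutes=5))
    # Five events up to the alert, plus three that arrive shortly after it.
    events = [alert_time - timedelta(minutes=m) for m in range(5)]
    events += [alert_time + timedelta(minutes=m) for m in (1, 2, 3)]
    print(start, end, count_events(events, start, end))
```

Run as a script, it prints a count of 8: the alert would fire at the 5th event, but the padded window also sees the 3 that follow, which is the kind of total the feature idea above would report.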

@marlymarl So for example, if you add

top_count_keys:
 - status

you might get

200: 16
404: 3
500: 2

even if the alert triggered on only 5 events, because those other events occurred immediately afterwards.
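The per-value counts in that example are roughly what a terms aggregation returns. Here is a minimal Python sketch of the same tally using collections.Counter, assuming the documents in the window are already fetched; top_counts is a hypothetical helper, and the default of 5 values is an assumption mirroring ElastAlert's top_count_number default.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def top_counts(docs: Iterable[Dict[str, Any]], key: str, n: int = 5) -> List[Tuple[Any, int]]:
    """Hypothetical helper: the n most common values of `key`, with their counts."""
    # Documents without the field are skipped, similar to a terms aggregation.
    return Counter(doc[key] for doc in docs if key in doc).most_common(n)


if __name__ == "__main__":
    # Made-up documents that reproduce the counts quoted in the chat.
    docs = [{"status": 200}] * 16 + [{"status": 404}] * 3 + [{"status": 500}] * 2
    for value, count in top_counts(docs, "status"):
        print(f"{value}: {count}")
```

This prints 200: 16, 404: 3 and 500: 2, which sum to 21 documents even though the alert itself fired on 5.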

@magmax Hi Miguel. The new_term rule schema was broken for a little while for single fields; see Yelp/elastalert#514. I just merged the fix, so if you pull master it should work again.
You can also just use fields: "http.code" (a plain string instead of a list) and it will pass the schema. That's why I didn't catch it: our rules were laid out like that.
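To illustrate accepting both spellings of fields discussed above, here is a minimal Python sketch that normalizes either a single string or a list of strings into one list. It is not ElastAlert's actual validation (which the error message suggests is a JSON schema using oneOf); normalize_fields is a hypothetical helper.

```python
from typing import List, Union


def normalize_fields(fields: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: accept one field name or a list of field names."""
    if isinstance(fields, str) and fields:
        # Single field written as fields: "http.code"
        return [fields]
    if isinstance(fields, list) and fields and all(isinstance(f, str) for f in fields):
        # One or more fields written as a YAML list, e.g. ['http.code']
        return list(fields)
    raise ValueError("fields must be a non-empty string or list of strings")


if __name__ == "__main__":
    # Both spellings from the chat resolve to the same field list.
    print(normalize_fields("http.code"))
    print(normalize_fields(["http.code"]))
```

Normalizing early like this means the rest of the rule only ever deals with a list, whichever form the rule file uses.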