These are chat archives for Yelp/elastalert

6th
Apr 2016
sunilmchaudhari
@sunilmchaudhari
Apr 06 2016 06:08

Hi, I am facing a delay problem with the alerts I receive.

Below is the alert I got.

  • Alert Sent 19:40

  • eventLogTime: 2016-04-05T16:25:00.575Z (this is the field in Kibana/ES; it holds the time when the event was created in the log file, not in ES)

  • At least 1 events occurred between 2016-04-05 19:10 EEST and 2016-04-05 19:25 EEST (timeframe)

Assuming there is a time difference of 3 hours (eventLogTime is in UTC, the alert times are in EEST), the alert is delayed by 15 min.
Also, it looks like all alerts are sent at the end of the time range, so the delay might actually be 30 min.

All infrastructure is in Finnish timezone.

Problems: 1) The alert is sent 15 min late. It was expected at 19:25. Why?
2) The alert is sent at the end of the timeframe. Is this expected behaviour?

I have the below configuration in config.yaml

run_every:
  minutes: 15

buffer_time:
  minutes: 15


--myrule.yaml

type: frequency

timeframe:
        minutes: 15

num_events: 1

Please help me overcome the above 2 problems.

snirad
@snirad
Apr 06 2016 09:11
Hi, is there an integration with PagerDuty?
Error initiating alert ['email', 'PagerDuty']: Could not import module PagerDuty: need more than 1 value to unpack
snirad
@snirad
Apr 06 2016 09:29
Traceback (most recent call last):
File "elastalert/elastalert.py", line 1342, in <module>
sys.exit(main(sys.argv[1:]))
File "elastalert/elastalert.py", line 1337, in main
client = ElastAlerter(args)
File "elastalert/elastalert.py", line 90, in __init__
self.conf = load_rules(self.args)
File "/tmp/yealo/elastalert/elastalert/config.py", line 373, in load_rules
raise EAException('Error loading file %s: %s' % (rule_file, e))
util.EAException: Error loading file production_rules/example_frequency.yaml: Error initiating alert ['email', 'PagerDuty']: Could not import module PagerDuty: need more than 1 value to unpack
snirad
@snirad
Apr 06 2016 13:28
I need help setting up a query, if anyone can lend a hand.
mzamora717
@mzamora717
Apr 06 2016 16:11
Hello, this should be an easy one. I'm having an issue where only the first email address in the email list is receiving emails. Is my syntax wrong? This is what I have: email: "tower@springcm.com, operations@springcm.com"
mzamora717
@mzamora717
Apr 06 2016 16:16
Thanks @snirad !
snirad
@snirad
Apr 06 2016 16:17
email:
- email@email.com
- email2@email.com
np
mzamora717
@mzamora717
Apr 06 2016 16:54
What's the easiest way to get the email content into tabular format instead of list form? @Qmando mentioned there could be an easy solution for this.
Quentin Long
@Qmando
Apr 06 2016 18:07
@snirad just put "pagerduty" without capitalization
@mzamora717: No tables right now. You can format the email however you like with alert_text, and you can include fields from every document, but that becomes a big JSON blob.
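As a rough illustration of the alert_text approach mentioned here, a rule might carry something like the following. The field names @source and @message are borrowed from later in this chat and are only placeholders, and alert_text_args and alert_text_type are assumed to be available in the installed ElastAlert version.

```yaml
# Sketch only: custom email body built from two document fields.
alert_text: "Server {0} reported: {1}"
alert_text_args:
- "@source"     # placeholder field name
- "@message"    # placeholder field name
alert_text_type: alert_text_only   # assumption: sends only alert_text, without the default field dump
```

This gives one formatted line per alert rather than a table.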
@sunilmchaudhari "Alert is sent 15 mins late" you have run_every: minutes: 15 so of course it's gonna take 15 minutes.
timeframe is completely meaningless when num_events is 1.
That's why type: any exists. It is basically equivalent to a frequency rule with num_events == 1, but more efficient. Anyway, just change run_every to 1 minute or something and that will solve your problems.
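A minimal sketch of the change suggested above, assuming the rest of both files stays as posted. Only the keys under discussion are shown, and the two YAML documents stand for the two separate files.

```yaml
# config.yaml (sketch): query Elasticsearch every minute instead of every 15
run_every:
  minutes: 1
buffer_time:
  minutes: 15
---
# myrule.yaml (sketch): "any" replaces frequency with num_events: 1,
# so timeframe and num_events are no longer needed
type: any
```

With run_every at one minute, a matching document should be picked up roughly a minute after it is indexed, rather than waiting for the next 15-minute run.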
snirad
@snirad
Apr 06 2016 18:15
@Qmando thanks, I was able to figure it out after looking at some other examples over the internet
Quentin Long
@Qmando
Apr 06 2016 18:16
It should just lowercase everything. Maybe I'll add that feature at some point.
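For reference, a rule using both alerters might look roughly like this. The address and key values are placeholders, and pagerduty_service_key and pagerduty_client_name are the PagerDuty options as documented for ElastAlert; verify them against the docs for your version.

```yaml
# Sketch only: alerter names are written in lowercase.
alert:
- "email"
- "pagerduty"
email:
- "oncall@example.com"                      # placeholder address
pagerduty_service_key: "YOUR_SERVICE_KEY"   # placeholder, taken from your PagerDuty service
pagerduty_client_name: "elastalert"         # placeholder client name
```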
snirad
@snirad
Apr 06 2016 18:16
There is a query I would like to run, and I wonder what the best option is. I agreed with the programmers that if an error shows up, they append " ended with exit code 1" to the message.
Quentin Long
@Qmando
Apr 06 2016 18:17
You are trying to query for " ended with exit code 1" ?
snirad
@snirad
Apr 06 2016 18:18
I have a lot of different servers so I basically want to query whatever ended with exit code 1 and throw back an alert with the server name and the message.
I came up with this
filter:
- query:
    query_string:
      query: "@message:\"ended with exit code 1\""
query_key: "@source"
use_terms_query: true
doc_type: "Task [./populateS3.js]"
attach_related: true
type: frequency
The thing is, I have a lot of different doc_types.
Quentin Long
@Qmando
Apr 06 2016 18:20
I think you can do wildcard in doc_type
Or, just don't use use_terms_query if you can get away with it. I don't think attach_related works with that anyway
use_terms_query means it will only grab the count for each value of "@source", but not download the actual documents (which is what attach_related would grab).
If you have too many documents to not use use_terms_query, you can add top_count_keys: [fieldX, fieldY] instead
That will at least tell you some more information
I think doc_type also takes comma separated values, but honestly I've never tried a doc_type with spaces and symbols in it.
snirad
@snirad
Apr 06 2016 18:22
I tried an array and also a regex; neither worked :/
Quentin Long
@Qmando
Apr 06 2016 18:23
How many hits do you get without use_terms_query?
If it's too many, you can lower buffer_time from the default of 45 minutes to 10 or something.
snirad
@snirad
Apr 06 2016 18:24
Well, my aim is to send an alert for each of the different servers that had exit code 1
more than 2-3 times in the last hour, and each needs to be a unique doc_type.
Quentin Long
@Qmando
Apr 06 2016 18:24
Have you tried without use_terms_query? Then you don't need doc_type.
snirad
@snirad
Apr 06 2016 18:25
will check
Quentin Long
@Qmando
Apr 06 2016 18:25
The only reason you need use_terms_query is if there are too many docs and it becomes slow or a memory hog.
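Putting the suggestions in this exchange together, a per-server rule without use_terms_query might be sketched as follows. The rule name, index pattern, and email address are placeholders that do not appear in the chat, and num_events: 3 is just one reading of "more than 2-3 times in the last hour".

```yaml
# Sketch only: one alert per @source value that logs the message
# at least 3 times within an hour.
name: exit-code-1-per-server     # hypothetical rule name
index: logstash-*                # assumption: the index pattern is not stated in the chat
type: frequency
num_events: 3
timeframe:
  hours: 1
query_key: "@source"             # count separately for each server
attach_related: true             # include the related matching documents in the alert
filter:
- query:
    query_string:
      query: "@message:\"ended with exit code 1\""
alert:
- "email"
email:
- "ops@example.com"              # placeholder address
```

Because use_terms_query is absent, doc_type is not needed here, as noted above.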
snirad
@snirad
Apr 06 2016 18:26
If I use attach_related, can I remove the aggregation: setting?
snirad
@snirad
Apr 06 2016 18:32
OK, cool, I think it works. It just doesn't get all the messages in one mail :D
the right way to get if a server report to elasticsarch is by using spike rule :?
snirad
@snirad
Apr 06 2016 18:44
type: flatline
threshold: 1
query_key: ["username", "ip_address", "status"]
timeframe:
  days: 1
thanks
Quentin Long
@Qmando
Apr 06 2016 23:23
attach_related only works for the frequency rule type. Aggregation will take ANY rule type and just concatenate all the alerts together into one.
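A sketch of the aggregation option described above, assuming a one-hour window (an arbitrary choice). Matches that occur within the window are collected and sent together as one alert.

```yaml
# Sketch only: added to an existing rule of any type.
aggregation:
  hours: 1   # assumption: collect matches for one hour, then send a single combined alert
```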
"the right way to get if a server report to elasticsarch"? Do you mean you want to alert if there are NO documents matching a specific pattern?