These are chat archives for Yelp/elastalert

3rd
Oct 2017
manya12
@manya12
Oct 03 2017 07:30
Hi @Qmando. I have a rule that should generate an alert whenever the number of hits for any one of my componentName values is more than 10000.
My rule.yaml looks like:
es_host: ...
es_port: 9200
name: DAILY_ALERT_LOG
type: frequency
index: applog-*
num_events: 10000
timeframe:
  days: 1
alert:
  - "email"
alert_subject: "LOGS ALERTING for {} on Shore-DEV KIBANA"
alert_subject_args:
  - "componentName"
alert_text: "Number of logs generated by {} are more than 10000"
alert_text_args: ["componentName", "@timestamp"]
include: ["hostName", "componentName", "componentVersion"]
email:
  - "manya.goyal@decurtis.com"
from_addr: "alert@decurtis.com"
smtp_host: "smtp.office365.com"
smtp_port: 587
But it is sending me false results by email. The componentName it alerts on does not have 10000 hits in the last 24 hrs when I query the same in Kibana.
Also, a few alerts do not even meet the 10000-hit criterion.
One of my alert emails looks like:

Number of logs generated by boarding service are more than 10000

At least 10000 events occurred between 2017-10-02 07:33 UTC and 2017-10-03 07:33 UTC

@timestamp: 2017-10-03T07:33:23.746Z
_id: AV7hJ_bEHlVbrKeGpvJB
_index: applog-2017.10.03
_type: log
componentName: boarding service
componentVersion: 0.6.0.17
hostName: dxpboardingservice-v014-m0nnl
num_hits: 4207
num_matches: 1

sathishdsgithub
@sathishdsgithub
Oct 03 2017 07:35
@manya12 What is the query filter?
manya12
@manya12
Oct 03 2017 07:35
I have around 50 components. How do I specify a query filter?
@sathishdsgithub
sathishdsgithub
@sathishdsgithub
Oct 03 2017 07:36
I don't see any query filter in your rule. Which logs are you trying to match?
manya12
@manya12
Oct 03 2017 07:37
My use case is to get an alert whenever any of my components has more than 10000 logs in the last 24 hrs.
sathishdsgithub
@sathishdsgithub
Oct 03 2017 07:37
Are you trying to match all 50 components from the raw logs?
manya12
@manya12
Oct 03 2017 07:37
Yes.
I want an alert whenever any one of them has more than 10000 hits, but I think it is counting hits in total and not per component.
sathishdsgithub
@sathishdsgithub
Oct 03 2017 07:39
From the above results, what I understand is that you got an alert for componentName: boarding service, but it does not have 10000 matching events.
manya12
@manya12
Oct 03 2017 07:41
Yes, I am getting alerts for components with more or fewer than 10000 matches, though they don't have that many hits when I query the same in Kibana.
sathishdsgithub
@sathishdsgithub
Oct 03 2017 07:41
What is your run_every time?
manya12
@manya12
Oct 03 2017 07:42
rules_folder: elastalert_rules
run_every:
  minutes: 1
buffer_time:
  minutes: 15
sathishdsgithub
@sathishdsgithub
Oct 03 2017 07:51
Can you try to match one specific component name for the last hour and see if you get the correct alert?
You can use a query filter to match one specific component name.
manya12
@manya12
Oct 03 2017 07:57
I tried this with rule.yaml:
es_host: ...
es_port: 9200
name: DAILY_ALERT_LOG
type: frequency
index: applog-*
num_events: 1000
timeframe:
  minutes: 60
filter:
  - query:
      query_string:
        query: "componentName: \"guest service\""
sathishdsgithub
@sathishdsgithub
Oct 03 2017 08:00
What was the result?
manya12
@manya12
Oct 03 2017 08:00
Now it is sending me alerts like:
sathishdsgithub
@sathishdsgithub
Oct 03 2017 08:00
?
manya12
@manya12
Oct 03 2017 08:00

At least 1000 events occurred between 2017-10-03 06:41 UTC and 2017-10-03 07:41 UTC

@timestamp: 2017-10-03T07:41:27.425Z
_id: AV7hL2VuHlVbrKeGpyXr
_index: applog-2017.10.03
_type: log
componentName: guest service
componentVersion: 0.6.0.11
hostName: dxpguestservice-v005-x2kg3
num_hits: 21412
num_matches: 7

Another email is:

At least 1000 events occurred between 2017-10-03 06:48 UTC and 2017-10-03 07:48 UTC

@timestamp: 2017-10-03T07:48:41.517Z
_id: AV7hNgb1HlVbrKeGp02P
_index: applog-2017.10.03
_type: log
componentName: guest service
componentVersion: 0.6.0.11
hostName: dxpguestservice-v005-x2kg3
num_hits: 3516
num_matches: 3

sathishdsgithub
@sathishdsgithub
Oct 03 2017 08:02
I guess the alert is correct. Since you set the threshold to 1000, you got an alert because the count is > 1000.
manya12
@manya12
Oct 03 2017 08:04
But the first one shows num_hits as 21412 and num_matches as 7.
Also, when I query Kibana for hits in the last hour, it shows guest service has 10,185 hits.
Also, I don't understand how it chooses the time range it reports: the first alert says "At least 1000 events occurred between 2017-10-03 06:41 UTC and 2017-10-03 07:41 UTC", while the second says "At least 1000 events occurred between 2017-10-03 06:48 UTC and 2017-10-03 07:48 UTC".
sathishdsgithub
@sathishdsgithub
Oct 03 2017 08:07
21412 hits means the total number of logs the rule searched for matches across its run_every cycles; the hit count varies each time the rule runs.
manya12
@manya12
Oct 03 2017 08:08
But Quentin told me that num_matches means the number of hits that match the search.
Also, how do I get an alert for a component when it gets more than 10000 hits in the last 24 hrs?
manya12
@manya12
Oct 03 2017 08:15
Can you please help me understand the run_every property?
And the buffer_time property?
manya12
@manya12
Oct 03 2017 08:21
Should the buffer_time property in config.yaml and the timeframe in frequency_rule.yaml be the same?
Quentin Long
@Qmando
Oct 03 2017 18:10
@manya12 You are alerting on ALL componentName values together and just adding in the value of the one document that pushed the count above 10,000.
You need to add
query_key: componentName
That will make your alert "Alert if a single componentName has more than 10,000 events"
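Putting that suggestion together with the original daily rule, a minimal sketch of the per-component rule might look like the following. It reuses the settings from the first rule.yaml in this thread unchanged (es_host stays elided as in the original); `query_key: componentName` is the only addition. This is untested and should be checked against the ElastAlert frequency rule documentation.

```yaml
# Sketch (untested): original DAILY_ALERT_LOG rule plus query_key
es_host: ...
es_port: 9200
name: DAILY_ALERT_LOG
type: frequency
index: applog-*
# Count events separately for each distinct componentName value,
# so num_events applies per component instead of to all logs combined
query_key: componentName
num_events: 10000
timeframe:
  days: 1
alert:
  - "email"
alert_subject: "LOGS ALERTING for {} on Shore-DEV KIBANA"
alert_subject_args:
  - "componentName"
alert_text: "Number of logs generated by {} are more than 10000"
alert_text_args: ["componentName", "@timestamp"]
include: ["hostName", "componentName", "componentVersion"]
email:
  - "manya.goyal@decurtis.com"
from_addr: "alert@decurtis.com"
smtp_host: "smtp.office365.com"
smtp_port: 587
```

With query_key set, the componentName in the email should be the component that itself crossed 10,000 events in the day, rather than whichever document happened to push the combined total of all 50 components over the threshold.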
You can probably leave buffer_time as the default.
run_every is just how often ElastAlert makes a query. A lower value means more load on the Elasticsearch cluster but less delay before an alert; a higher value means more delay but fewer queries.
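To illustrate that trade-off, here is a hypothetical config.yaml in which run_every is raised from 1 minute to 5 minutes while buffer_time stays at the 15 minutes already in use. The 5-minute value is an arbitrary example, not a recommendation from this thread. At 1 minute, ElastAlert runs each rule about 1440 times a day (24 × 60); at 5 minutes, about 288 times.

```yaml
# Hypothetical config.yaml values illustrating the run_every trade-off
rules_folder: elastalert_rules
# Run rules every 5 minutes instead of every minute:
# fewer queries against Elasticsearch, up to ~5 minutes more alert delay
run_every:
  minutes: 5
# Unchanged from the config posted earlier in the thread
buffer_time:
  minutes: 15
```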