These are chat archives for Yelp/elastalert

8th
Apr 2016
Jose Armesto
@fiunchinho
Apr 08 2016 09:16
Has nobody formatted the email alert with HTML?
Marius Ducea
@mdxp
Apr 08 2016 16:18
I'm having an issue with ES 2.2 not liking the field name generated by top_count_keys; this ends up containing a dot and ES will barf with something like:
MapperParsingException[Field name [top_events_apache_error_url.raw] cannot contain '.']
is there a way to work around this and generate a field name that is ES 2.x compatible (aka without a dot)?
Marius Ducea
@mdxp
Apr 08 2016 16:40
I think i've figured it out; need to set raw_count_keys: false
Marius Ducea
@mdxp
Apr 08 2016 17:01
I'm not quite sure i understand the disadvantage of doing that, but at least it will keep ES 2.x happy
Quentin Long
@Qmando
Apr 08 2016 20:08
mdxp: It depends on how your mapping is set up. At least with Logstash, the default dynamic fields have the normal, analyzed version, and then the .raw field, which is not analyzed
If you run an aggregation across an analyzed field, you will get counts for individual words instead of phrases
So this is just an optimization for the default logstash settings. If the field is unanalyzed or single words, it should be fine
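To illustrate the difference described above, here is a minimal Python sketch (not ElastAlert or Elasticsearch code) that mimics a terms aggregation over an analyzed field versus a not-analyzed one. The `simple_analyze` helper is a hypothetical stand-in for an analyzer: it lowercases and splits on non-alphanumeric characters, which only approximates what a real analyzer does. The URLs are made-up sample data.

```python
import re
from collections import Counter


def simple_analyze(value):
    """Hypothetical analyzer stand-in: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]


def terms_counts(values, analyzed):
    """Count terms roughly the way a terms aggregation would see them."""
    counts = Counter()
    for value in values:
        if analyzed:
            # Each document contributes each distinct token once.
            counts.update(set(simple_analyze(value)))
        else:
            # A not-analyzed field keeps the whole value as a single term.
            counts[value] += 1
    return counts


urls = ["/app/login", "/app/login", "/app/logout"]  # made-up sample values
raw_counts = terms_counts(urls, analyzed=False)
analyzed_counts = terms_counts(urls, analyzed=True)

print(raw_counts.most_common())
print(analyzed_counts.most_common())
```

With the not-analyzed version the top values are whole URLs; with the analyzed version the counts are for fragments such as "app" and "login", which is why the ".raw" variant is preferred for top counts.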
Quentin Long
@Qmando
Apr 08 2016 20:19
Not sure what the equivalent to ".raw" is in ES 2.0
Marius Ducea
@mdxp
Apr 08 2016 20:59
@Qmando: thanks. I've used raw_count_keys: false and I don't see the error in ES anymore, but now the information is not included in the alert:
apache_error_url:
No events found.
Quentin Long
@Qmando
Apr 08 2016 21:01
Yeah :(
Marius Ducea
@mdxp
Apr 08 2016 21:01
makes it basically useless
Quentin Long
@Qmando
Apr 08 2016 21:01
I think it's because it has a query_key value that it's trying to filter by
And it's not matching for some reason, probably because it's an analyzed field. So for example if query_key: hostname and then hostname is "mdxp-something" it will try to filter with {"term": {"hostname": "mdxp-something"}} but since hostname is analyzed, it doesn't actually match
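A rough sketch of why the term filter in that example misses, assuming the hostname field is analyzed into separate tokens. This is plain Python, not Elasticsearch; `simple_analyze` and `term_matches` are hypothetical helpers that only approximate analysis and exact-term lookup.

```python
import re


def simple_analyze(value):
    """Hypothetical analyzer stand-in: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]


def term_matches(indexed_terms, term):
    """A term filter is an exact lookup; the query value itself is not analyzed."""
    return term in indexed_terms


hostname = "mdxp-something"
analyzed_terms = simple_analyze(hostname)  # ['mdxp', 'something']
raw_terms = [hostname]                     # not analyzed: one term, unchanged

print(term_matches(analyzed_terms, "mdxp-something"))  # False
print(term_matches(raw_terms, "mdxp-something"))       # True
```

The full value never exists as a single term in the analyzed field, so the exact lookup finds nothing even though the document is there.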
Marius Ducea
@mdxp
Apr 08 2016 21:04
Let me show you the rule i have
it is basically an apache error log and the intent of using top_count_keys was to try to get the top urls; it is something like this:
filter:
- term:
    source: "apache2"
- query:
    query_string:
      query: "message: ERROR"

include:
  - message
  - apache_error_url
  - host

top_count_keys: ["apache_error_url", "host"]
Quentin Long
@Qmando
Apr 08 2016 21:07
Hm. So you don't have query_key set, but you are getting "No events found"
That is actually not what I expected
Marius Ducea
@mdxp
Apr 08 2016 21:08
Well i was getting them, but with the above issues and errors reported with the dot in ES2.x
Now after i added raw_count_keys: false i get them with no events found
Quentin Long
@Qmando
Apr 08 2016 21:09
Yeah, just reading the documentation now
".raw" fields DO exist, hence why you get results
but you can't create your own field names that contain "."
So when elastalert tries to save its results, you get the error
Not sure why you don't get results with raw_count_keys set to false though. It should be making the exact same query, just without .raw, which should still return stuff
Marius Ducea
@mdxp
Apr 08 2016 21:11
is there a better workaround that i should be using?
Quentin Long
@Qmando
Apr 08 2016 21:12
Can you run it with --es_debug_trace ~/trace.log and find when it makes the query for apache_error_url?
Marius Ducea
@mdxp
Apr 08 2016 21:13
sure
with my existing config? with raw_count_keys false?
Quentin Long
@Qmando
Apr 08 2016 21:13
Yeah, with false
Marius Ducea
@mdxp
Apr 08 2016 21:13
ok
Quentin Long
@Qmando
Apr 08 2016 21:13
I wanna see if it's actually returning no counts, or if it's just returning it in some different format
It should contain something like {'aggs': {'counts': {'terms': {'field': "apache_error_url", ... }
Marius Ducea
@mdxp
Apr 08 2016 21:18
i assume we need for this to be matched right?
Quentin Long
@Qmando
Apr 08 2016 21:25
haha yes. You need to trigger a match
you can set what it runs over with --start 2016-04-04 --end 2016-04-05
and you can use --debug and it will still make that query but not send an alert
Marius Ducea
@mdxp
Apr 08 2016 21:26
doing that right now
got something like this:
  "aggs": {
    "filtered": {
      "aggs": {
        "counts": {
          "terms": {
            "field": "apache_error_url",
            "size": 5
          }
        }
      },
Quentin Long
@Qmando
Apr 08 2016 21:31
Can you run the curl command and tell me what it returns?
You should just be able to copy-paste the whole thing
(feel free to scrub actual data obviously)
Marius Ducea
@mdxp
Apr 08 2016 21:32
for some reason it has localhost instead of the real host that is in the config
is that normal?
Quentin Long
@Qmando
Apr 08 2016 21:32
Ah, yeah, that's a bug in the elasticsearch-py library
It always says localhost no matter where it actually goes
Marius Ducea
@mdxp
Apr 08 2016 21:32
got it; thought so
Quentin Long
@Qmando
Apr 08 2016 21:33
Elastalert expects the return data to contain ['aggregations'] then ['filtered'] then ['counts'] then ['buckets']. So I'm wondering if somehow it's slightly different
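As a sketch of the lookup described above (not the actual ElastAlert code), the nested path can be walked defensively so that a response with a different shape yields `None` instead of raising `KeyError`. The function name `extract_buckets` and the sample response are hypothetical.

```python
def extract_buckets(response):
    """Return the terms buckets, or None if the response has a different shape."""
    node = response
    for key in ("aggregations", "filtered", "counts", "buckets"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Made-up response in the expected shape.
sample = {
    "aggregations": {
        "filtered": {
            "counts": {"buckets": [{"key": "/index.html", "doc_count": 7}]}
        }
    }
}

print(extract_buckets(sample))
print(extract_buckets({"aggregations": {}}))  # None
```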
Marius Ducea
@mdxp
Apr 08 2016 21:33
so i will replace that and the rest will be identical to the output from the trace and i will run it like that
Quentin Long
@Qmando
Apr 08 2016 21:33
Yeah, you should get a json blob back that contains the counts for all values
Thanks for going through the trouble of helping diagnose this. I need to get a 2.X cluster to test everything eventually
Marius Ducea
@mdxp
Apr 08 2016 21:36
weird; this is the relevant part that shows the issue:
{
  "took" : 3655,
  "timed_out" : false,
  "_shards" : {
    "total" : 90,
    "successful" : 87,
    "failed" : 3,
    "failures" : [ {
      "shard" : 1,
      "index" : "logstash-2016.04.07",
      "node" : "_fZCExeNSfO3yJ7tZ5vV_Q",
      "reason" : {
        "type" : "exception",
        "reason" : "java.lang.IllegalStateException: Field data loading is forbidden on apache_error_url",
        "caused_by" : {
          "type" : "unchecked_execution_exception",
          "reason" : "java.lang.IllegalStateException: Field data loading is forbidden on apache_error_url",
          "caused_by" : {
            "type" : "illegal_state_exception",
            "reason" : "Field data loading is forbidden on apache_error_url"
          }
        }
      }
    } ]
  }
Quentin Long
@Qmando
Apr 08 2016 21:36
Hmm. Well. There's ur problem!
Marius Ducea
@mdxp
Apr 08 2016 21:37
lol
this is not showing any warning or message in normal run of elastalert
thanks for helping me debug and find it
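Since the failure only shows up in the `_shards` section of the response, one way to surface it is to inspect that section after each search. The following is a minimal sketch, not part of ElastAlert; `shard_failure_reasons` is a hypothetical helper, and the sample is a trimmed copy of the response pasted above.

```python
def shard_failure_reasons(response):
    """Collect human-readable reasons for any failed shards in a search response."""
    shards = response.get("_shards", {})
    if not shards.get("failed"):
        return []
    reasons = []
    for failure in shards.get("failures", []):
        reason = failure.get("reason")
        if isinstance(reason, dict):
            # In the pasted response, "reason" is an object with a nested "reason" string.
            reason = reason.get("reason")
        reasons.append("%s[%s]: %s" % (failure.get("index"), failure.get("shard"), reason))
    return reasons


# Trimmed version of the response from the chat.
response = {
    "timed_out": False,
    "_shards": {
        "total": 90,
        "successful": 87,
        "failed": 3,
        "failures": [{
            "shard": 1,
            "index": "logstash-2016.04.07",
            "reason": {
                "type": "exception",
                "reason": "java.lang.IllegalStateException: "
                          "Field data loading is forbidden on apache_error_url",
            },
        }],
    },
}

for line in shard_failure_reasons(response):
    print(line)
```

Logging these lines as warnings would have made the "No events found" result much easier to explain.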
Quentin Long
@Qmando
Apr 08 2016 21:37
Yeah, unfortunately it covers it up :(
elastic/elasticsearch#15267
The solution, "just use .raw fields"
shit..
let me get you a workaround
Ok, here's a hacky solution
Quentin Long
@Qmando
Apr 08 2016 21:43
Remove the raw_count_keys: false
and at line 980 in elastalert.py (right above res = self.writeback('elastalert', alert_body)) put
Marius Ducea
@mdxp
Apr 08 2016 21:43
ok
Quentin Long
@Qmando
Apr 08 2016 21:43
alert_body['match_body'] = dict([(key.replace('.raw', '_raw'), value) for (key, value) in alert_body['match_body'].items()])
So that will rename everything .raw to _raw before it writes it to elasticsearch
I'm hoping that fixes it
If not, sorry man :/ I'll try to get around to fixing this bug eventually
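The renaming workaround above can be tried in isolation. This sketch uses a dict comprehension equivalent to the one-liner; `rename_raw_keys` is a hypothetical name and the `match_body` contents are made-up sample data, with the key taken from the error message earlier in the conversation.

```python
def rename_raw_keys(match_body):
    """Return a copy with '.raw' in top-level keys replaced by '_raw'."""
    return {key.replace(".raw", "_raw"): value for key, value in match_body.items()}


# Made-up sample; the real match body is built by ElastAlert.
match_body = {
    "top_events_apache_error_url.raw": {"/index.html": 7},
    "host": "web1",
}

renamed = rename_raw_keys(match_body)
print(sorted(renamed))  # ['host', 'top_events_apache_error_url_raw']
```

Note that this only touches top-level keys containing ".raw"; any other key with a dot in it would still be rejected by ES 2.x.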
Marius Ducea
@mdxp
Apr 08 2016 21:44
much appreciated
thanks again for your help on this; i'll try it out
Quentin Long
@Qmando
Apr 08 2016 21:46
Cheers. Good luck