Thanks again for all the pointers. They will help me make sense of what the code is doing.
I think I will start by storing the IP addresses in 16-byte IPv6 form and doing an inefficient
full table scan for CIDR matches.
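For anyone following along, here is a rough sketch of that first approach using only the Go standard library (`net/netip`). It is not bleve code: `key16` and `scanCIDR` are hypothetical helper names, and a plain slice stands in for the stored field values. IPv4 addresses are assumed to be stored in their IPv4-mapped IPv6 form.

```go
package main

import (
	"fmt"
	"net/netip"
)

// key16 (hypothetical helper) parses an IPv4 or IPv6 string and returns
// its 16-byte form. IPv4 addresses become IPv4-mapped IPv6 (::ffff:a.b.c.d).
func key16(s string) ([16]byte, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return [16]byte{}, err
	}
	return addr.As16(), nil
}

// scanCIDR (hypothetical helper) is the inefficient version: it checks
// every stored key against the prefix.
func scanCIDR(keys [][16]byte, cidr string) ([]netip.Addr, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	var hits []netip.Addr
	for _, k := range keys {
		// Unmap turns ::ffff:a.b.c.d back into a.b.c.d so that it can
		// match an IPv4 prefix; plain IPv6 addresses are left unchanged.
		addr := netip.AddrFrom16(k).Unmap()
		if prefix.Contains(addr) {
			hits = append(hits, addr)
		}
	}
	return hits, nil
}

func main() {
	var keys [][16]byte
	for _, s := range []string{"10.1.2.3", "192.168.0.1", "2001:db8::1", "10.200.0.9"} {
		k, err := key16(s)
		if err != nil {
			panic(err)
		}
		keys = append(keys, k)
	}
	hits, err := scanCIDR(keys, "10.0.0.0/8")
	if err != nil {
		panic(err)
	}
	fmt.Println(hits)
}
```

The cost is linear in the number of stored addresses, which is fine as a starting point but is the part worth replacing later.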
On the other hand, all addresses matching a CIDR share the same prefix, so if I can seek to the lowest member and scan forward from there, I should be able to get efficient CIDR matches.
What happens if we want to be able to store multiple IPs in a field?
Or for that matter, multiple keywords in a text field?
Is this something that bleve handles naturally? Or should I generate multiple documents - if I have 5 IPs should I generate 5 copies of my document, one for each IP?
@mschoch As I understand from the documentation, this is doable using token filters: https://blevesearch.com/docs/Token-Filters/. Is there a ready-made token filter type that will let me do it?
Also, what is the best way to do it? Using AddCustomTokenFilter? Or is there a way to package it like a plugin, so that anyone else wanting the same thing can reuse it in other projects?
There is a plugin for elasticsearch that does the same thing I want: https://github.com/skroutz/elasticsearch-analysis-greeklish https://github.com/skroutz/elasticsearch-analysis-greeklish/blob/7.7.0/src/main/java/org/elasticsearch/index/analysis/GreeklishGenerator.java
I have closed the IP Range search PR as it had a long and circuitous history comprising dozens of commits, which looked like an intimidating prospect for any reviewer. I have squashed the history into a new PR, blevesearch/bleve#1546, which comprises fewer than 600 lines of code, half of which are tests.
In the end the implementation of IP range searching is quite trivial and maps very nicely onto bleve's indexReader.FieldDictRange feature.
I have added some end-to-end tests which demonstrate indexing and searching IP addresses. I put the tests in test/ip_field_test.go as I was not sure what the best location for this kind of test was. @mschoch let me know what you think.
@hsjgrobler_gitlab bleve today doesn't support multiple processes very well; opening an index from a single process to serve all requests is the model it was designed around.
Hi - I just started using bleve a few days ago and am now looking for some general advice on optimizing index search speed. In my case, I only have to index all the data once, on program initialization, and there will be many concurrent search requests after that. I just went through this chat and came across the explanation above, which makes me wonder whether bleve is the right choice for this scenario, as I am not sure what is meant by "single process".