    Marty Schoch
    @mschoch
    ah i see, sorry i didn't notice it sooner, the fields must be exported (as bleve uses reflection to examine your documents, it cannot see unexported fields)
    you can name the fields in your index lower-case if you want
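A minimal, standard-library-only sketch of the point above: a reflection-based walker can only read exported struct fields. The `visibleFields` helper and the `draft` field are hypothetical, written for illustration. Using `json` struct tags to get lower-case field names is an assumption about how bleve names fields by default, so check it against the bleve version you use.

```go
package main

import (
	"fmt"
	"reflect"
)

// Note mirrors the document type from the chat, plus one unexported field.
type Note struct {
	ID    string `json:"id"`
	Title string `json:"title"`
	draft string // unexported: invisible to reflection-based indexing
}

// visibleFields lists the field names a reflection-based walker could read.
// Exported fields are named by their json tag when one is present.
// This simplified version does not handle tag options such as ",omitempty".
func visibleFields(v interface{}) []string {
	t := reflect.TypeOf(v)
	var names []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.PkgPath != "" { // PkgPath is non-empty only for unexported fields
			continue
		}
		name := f.Name
		if tag := f.Tag.Get("json"); tag != "" {
			name = tag
		}
		names = append(names, name)
	}
	return names
}

func main() {
	fmt.Println(visibleFields(Note{ID: "note1.org", Title: "note1.org"}))
}
```

Running it prints `[id title]`; the unexported `draft` field never shows up, which is the symptom that was reported with lower-case struct fields.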
    Marty Schoch
    @mschoch
    @jmoss20 here it is, hacked up a bit (and using scorch, not the old default upsidedown)
    ```
    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/blevesearch/bleve"
    )

    // Note is the document type; its fields must be exported so that
    // bleve's reflection can see them.
    type Note struct {
        Id    string
        Title string
    }

    // Type tells bleve which document mapping applies to a Note.
    func (n Note) Type() string {
        return "note"
    }

    func main() {
        // Start from a clean index directory on every run.
        os.RemoveAll("/tmp/idx")

        englishMapping := bleve.NewTextFieldMapping()
        englishMapping.Analyzer = "en"

        blockMapping := bleve.NewDocumentMapping()
        blockMapping.AddFieldMappingsAt("contents", englishMapping)

        noteMapping := bleve.NewDocumentMapping()
        noteMapping.AddFieldMappingsAt("title", englishMapping)
        noteMapping.AddSubDocumentMapping("blocks", blockMapping)

        indexMapping := bleve.NewIndexMapping()
        indexMapping.AddDocumentMapping(Note{}.Type(), noteMapping)

        // Use scorch rather than the old default, upsidedown.
        idx, err := bleve.NewUsing("/tmp/idx", indexMapping, "scorch", "scorch", nil)
        if err != nil {
            log.Fatal(err)
        }

        err = idx.Index("note1.org", Note{Id: "note1.org", Title: "note1.org"})
        if err != nil {
            log.Fatal(err)
        }

        query := bleve.NewMatchQuery("note1.org")
        search := bleve.NewSearchRequest(query)
        searchResults, err := idx.Search(search)
        if err != nil {
            log.Fatal(err)
        }

        fmt.Println(searchResults)
    }
    ```

    gitter so terrible
    John Moss
    @jmoss20
    ahhh, thanks! Haven't touched go before, didn't realize uppercase/lowercase was significant
    appreciate the help
    Marty Schoch
    @mschoch
    sure np
    Amnon
    @amnonbc
    Hi all, we added an IPRange mapper to bleve. Is this something which you would be prepared to consider upstreaming? Should I submit a PR?
    Amnon
    @amnonbc
    Looking through the history, I can see that one of my predecessors did submit a PR for this a while back: blevesearch/bleve#644.
    The upstream code has moved on a bit since then. But I am more than happy to fix or redo the PR if you would like any changes.
    Amnon
    @amnonbc
    I see that it has also been discussed on the mailing list https://groups.google.com/g/bleve/c/FBiUWJKWMZg/m/aUrXqGfxCgAJ
    Marty Schoch
    @mschoch
    yeah, i think that commentary captures my thoughts
    i would expect a new field type to encode all necessary information into a single field
    and then some new query or set of queries that operate on that data
    Amnon
    @amnonbc
    Thanks @mschoch, that definitely makes sense. I'll pick up the PR and make these changes. I would also like to add IPrange queries into the query string language if I can find a syntax you are happy with.
    Amnon
    @amnonbc
    I had a look at adding an IpField type. But I am missing some context about what Fields are for, and what their methods do. The comments in field.go are a bit terse, and the unit tests in the document directory don't really demonstrate what functionality is required. I am having trouble getting what the Analyze method is doing. Do you have any higher level explanation what the field types do?
    Marty Schoch
    @mschoch
    field types are responsible for creating some representation of the data inside the index
    the index is an inverted index of terms
    for textual fields the Analyze() method splits the "full text" into the pieces, terms
    numeric, date, geo, etc typically do some artificial manipulation of the data
    i realize this isn't very helpful, but honestly not sure where to begin
    Amnon
    @amnonbc

    Thanks @mschoch, that actually is helpful - in that it gives me a starting point, and a bird's-eye view of what the Field interface does.
    I'll have a read through the existing Field implementations and see if I can figure it out from here.

    It would be great if there were more extensive comments in document/field.go giving an overview and explaining the role of each of the methods.

    An unrelated question - what is the status of the Scorch backend now? Is it ready for production use?
    And how do we turn it on? Do we just set Config.DefaultIndexType = scorch.Name and specify
    bleve.NewUsing(path, mapping, "scorch", "scorch", nil)? Or are there other things we need to do?
    How does performance compare with RocksDB?

    Marty Schoch
    @mschoch
    the reason there won't be more docs added is that the interface no longer makes sense
    it was a bad design, we just didn't know it then
    bleve really has one kind of field right now, all of which should just have 5 or 6 different constructors for the different start points (data types)
    scorch is production ready, and with the 2.0 release coming soon, it will be the default (upsidedown and all k/v store index will be deprecated)
    going back to fields, i recommend you use numeric field as your starting point for an IP field, it likely won't be the same, but will likely do similar things
    starting with the constructors at the end of the file, there are several functions NewNumericField...
    Marty Schoch
    @mschoch
    at their core, they take a float64 and return a *NumericField
    you will probably need ones that take IPv4 and IPv6 addrs
    at this stage all these functions do one important conversion
    they convert that incoming data, to a binary representation, which you see stored in the "value" field of the struct
    this should be some sort of lossless encoding of the data, and it is what will be "stored" if the field is stored, allowing recovery of the original value after searching
    now, the Analyze method takes that value, and possibly uses it to create a set of multiple values to be stored in the index
    this is highly data-type dependent
    for example, the numeric range data type does a bit shift, to shift off less significant bits, and indexes those values as well (this allows for a particular technique to more efficiently perform numeric range searches)
    at this stage, you have to be able to answer the following
    what types of queries do i need to be able to perform on IP addresses?
    exact match? arbitrary range? CIDR mask? something else?
    and then, what are the right values to put into the index to facilitate answering those queries?
    anyway, thats enough for now, let me know if you have more questions
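The bit-shift idea Marty describes can be sketched for a 32-bit value such as an IPv4 address. This is an illustration of the general technique only: the `shiftedTerms` helper and its `shift:value` text format are hypothetical and are not bleve's actual term encoding.

```go
package main

import "fmt"

// shiftedTerms returns one term per precision level for a 32-bit value.
// Each term is the value with `shift` low-order bits dropped, tagged with
// the shift so that terms from different levels cannot collide.
func shiftedTerms(v uint32, step uint) []string {
	if step == 0 {
		step = 1 // guard against an endless loop
	}
	var terms []string
	for shift := uint(0); shift < 32; shift += step {
		terms = append(terms, fmt.Sprintf("%02d:%x", shift, v>>shift))
	}
	return terms
}

func main() {
	ip := uint32(10)<<24 | 1<<16 | 2<<8 | 7 // 10.1.2.7
	for _, t := range shiftedTerms(ip, 8) {
		fmt.Println(t)
	}
}
```

With a step of 8, the address 10.1.2.7 yields the terms `00:a010207`, `08:a0102`, `16:a01` and `24:a`. Every address in 10.1.2.0/24 shares the term `08:a0102`, so a mask that falls on a step boundary becomes a single exact-term lookup; masks between boundaries would need several terms at the finer level.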
    Amnon
    @amnonbc
    The only two queries we need are exact match and CIDR mask.
    Amnon
    @amnonbc

    Thanks again for all the pointers. They will help me make sense of what the code is doing.
    I think I will start with storing the IP addresses as the 16 byte ipv6 address and do an inefficient
    full table scan for cidr matches.
    On the other hand, cidr matches all have the same prefix, so if I can search for the lowest member and scan forward from there, I should be able to get efficient CIDR matches.

    What happens if we want to be able to store multiple IPs in a field?
    Or for that matter, multiple keywords in a text field?
    Is this something that bleve handles naturally? Or should I generate multiple documents - if I have 5 IPs should I generate 5 copies of my document, one for each IP?

    Marty Schoch
    @mschoch
    I think starting with IPv4 and getting CIDR matches to work is a good starting point. And you'll better understand all the issues involved to try and extend it to ipv6
    so bleve already supports multi-valued fields, for example if a document has an array of IP addresses in one field
    but if you have multiple IP addrs with specific meanings like src and dest, then those should be separate fields with meaningful names
    Panagiotis Koursaris
    @panakour
    Hi. I would like to create a filter that works by creating multiple forms of a single Greek token. For example, from the single token καλημερα you could create Greeklish forms like kalimera, kalhmera, kalimeres (while, of course, also keeping the original), e.g. by replacing each Greek character with the corresponding Latin character.
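The character-substitution part of this request can be sketched as a plain function, independent of bleve. The `latinForms` table is a deliberately partial, hypothetical mapping that covers only the letters of the example word, and `greeklishForms` is an illustrative helper; wiring it into a bleve token filter is not shown. It does not produce inflected variants such as kalimeres, which would need more than per-character replacement.

```go
package main

import "fmt"

// latinForms maps a Greek letter to its possible Latin spellings.
// Partial table for illustration only.
var latinForms = map[rune][]string{
	'α': {"a"}, 'ε': {"e"}, 'η': {"i", "h"}, 'κ': {"k"},
	'λ': {"l"}, 'μ': {"m"}, 'ρ': {"r"},
}

// greeklishForms returns the original token followed by every Latin
// spelling obtainable by per-character substitution. The number of
// forms multiplies with each ambiguous letter, so a real filter would
// probably want a cap.
func greeklishForms(token string) []string {
	forms := []string{""}
	for _, r := range token {
		alts, ok := latinForms[r]
		if !ok {
			alts = []string{string(r)} // keep unmapped characters as-is
		}
		next := make([]string, 0, len(forms)*len(alts))
		for _, prefix := range forms {
			for _, alt := range alts {
				next = append(next, prefix+alt)
			}
		}
		forms = next
	}
	out := []string{token}
	for _, f := range forms {
		if f != token { // avoid repeating the original
			out = append(out, f)
		}
	}
	return out
}

func main() {
	fmt.Println(greeklishForms("καλημερα"))
}
```

For καλημερα this prints `[καλημερα kalimera kalhmera]`: the original token first, then one form per spelling of η.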
    Amnon
    @amnonbc
    If I have a conjunction query of two terms, and the first term returns 10 elements, and the second returns 1,000,000,
    will I be better off running only the first search, and filtering the results bleve returns outside of bleve for the second term?
    Or does bleve do this optimisation itself?
    Does the order of the terms in a conjunction query matter?
    And if I have a conjunction query, some of whose terms are themselves conjunctions, would I get better performance if I flattened them?