    Marty Schoch
    @mschoch
    Generally, it should be as simple as ensuring that you set "store" to true on the field mapping: https://github.com/blevesearch/bleve/blob/ae28975038cb25655da968e3f043210749ba382b/mapping/field.go#L50
    And then when you build the search request, set Fields to []string{"*"}: https://github.com/blevesearch/bleve/blob/master/search.go#L276
    NOTE that "*" is just a magic value interpreted to mean you want us to load all stored fields; there is no pattern matching.
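    A minimal end-to-end sketch of those two steps against the v1 import path (the "description" field, the document, and the query are placeholders):

    package main

    import (
        "fmt"
        "log"

        "github.com/blevesearch/bleve"
    )

    func main() {
        // 1. Field mapping with Store enabled, so the original value is kept in the index.
        descMapping := bleve.NewTextFieldMapping()
        descMapping.Store = true

        docMapping := bleve.NewDocumentMapping()
        docMapping.AddFieldMappingsAt("description", descMapping)

        indexMapping := bleve.NewIndexMapping()
        indexMapping.DefaultMapping = docMapping

        idx, err := bleve.NewMemOnly(indexMapping)
        if err != nil {
            log.Fatal(err)
        }
        if err := idx.Index("doc1", map[string]interface{}{"description": "a hoppy beer"}); err != nil {
            log.Fatal(err)
        }

        // 2. Ask for all stored fields back; "*" is the magic value, not a pattern.
        req := bleve.NewSearchRequest(bleve.NewMatchQuery("beer"))
        req.Fields = []string{"*"}

        res, err := idx.Search(req)
        if err != nil {
            log.Fatal(err)
        }
        for _, hit := range res.Hits {
            fmt.Println(hit.ID, hit.Fields["description"])
        }
    }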
    Jxic
    @Jxic
    hi @mschoch, recently I've been using a memory-only bleve index to store around 100,000 documents, and I ran a few benchmarks on search performance. The document struct is quite simple: only 3 string fields and 1 int field. The major performance bottleneck seems to be the Go garbage collector, which runs frequently because unmarshalling values from []byte data creates a lot of short-lived objects. Do you think it's possible to skip the marshalling and unmarshalling when using a memory-only index?
    Marty Schoch
    @mschoch
    @Jxic yeah, the in-memory index is pretty bad today. We'd like to replace it with one backed by scorch as well (on the road-map for this year). Can you be more specific about which marshal/unmarshal you think would be helpful to remove?
    Jxic
    @Jxic
    so according to the pprof diagram, the function NewBackIndexRowKV causes a lot of new memory allocation, and it boils down to (*BackIndexRowValue).Unmarshal
    Jxic
    @Jxic
    it's only when garbage collection starts that a few search requests get affected, but if the GC runs too frequently, we cannot achieve high availability for our services
    Type: cpu
    Time: Feb 26, 2021 at 5:29pm (CST)
    Duration: 30s, Total samples = 1mins (200.47%)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof) top 10
    Showing nodes accounting for 29.71s, 49.40% of 60.14s total
    Dropped 347 nodes (cum <= 0.30s)
    Showing top 10 nodes out of 140
          flat  flat%   sum%        cum   cum%
         7.59s 12.62% 12.62%      7.59s 12.62%  (our own function)
         4.85s  8.06% 20.69%     10.92s 18.16%  (our own function)
         4.32s  7.18% 27.87%      8.07s 13.42%  runtime.scanobject
         3.06s  5.09% 32.96%     11.18s 18.59%  runtime.mallocgc
    Sipun S
    @SipunS1_twitter
    Hi, how can I use "ngrams" for partial matching of words? I tried to import "github.com/couchbaselabs/bleve/analysis" and set up:
    nameNgramMapping := bleve.NewTextFieldMapping()
    nameNgramMapping.Analyzer = ngramAnalyzer
    nameNgramMapping.Name = "ngram"
    but it fails with the error: "module declares its path as: github.com/blevesearch/bleve but was required as: github.com/couchbaselabs/bleve"
    Marty Schoch
    @mschoch
    @Jxic unfortunately I just don't see any simple way around that. The upsidedown index scheme uses the back index for core operations; I can't think of any simple way to just "not do that part".
    @SipunS1_twitter for errors like that it is helpful to share the entire code somewhere. That particular error sounds unrelated to bleve usage itself, and more likely related to the way the modules/packages are configured and referenced. There is an example of ngram mappings here: https://github.com/blevesearch/beer-search/blob/master/mapping_example2.go
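    For reference, a hedged sketch of that ngram setup using the blevesearch import path; the registration names follow the bleve v1 conventions, and the 3..4 gram sizes and the "name" field are placeholders:

    package main

    import (
        "log"

        "github.com/blevesearch/bleve"
    )

    func main() {
        indexMapping := bleve.NewIndexMapping()

        // Register an ngram token filter (3- to 4-character grams here; adjust to taste).
        if err := indexMapping.AddCustomTokenFilter("ngram_3_4", map[string]interface{}{
            "type": "ngram",
            "min":  3.0,
            "max":  4.0,
        }); err != nil {
            log.Fatal(err)
        }

        // Build a custom analyzer: unicode tokenizer, lowercase, then ngram.
        if err := indexMapping.AddCustomAnalyzer("ngramAnalyzer", map[string]interface{}{
            "type":          "custom",
            "tokenizer":     "unicode",
            "token_filters": []string{"to_lower", "ngram_3_4"},
        }); err != nil {
            log.Fatal(err)
        }

        // Point the field that needs partial matching at the custom analyzer.
        nameNgramMapping := bleve.NewTextFieldMapping()
        nameNgramMapping.Analyzer = "ngramAnalyzer"

        docMapping := bleve.NewDocumentMapping()
        docMapping.AddFieldMappingsAt("name", nameNgramMapping)
        indexMapping.DefaultMapping = docMapping
    }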
    Jxic
    @Jxic
    @mschoch alright, I got it, really appreciate your reply. So for the scorch replacement, how much improvement could we expect?
    Marty Schoch
    @mschoch
    @Jxic so, it's hard to speculate, and I don't want to give you false hope that it's a lot better. What I would say is that more of the data is already in a form that is useful for searching, as opposed to upsidedown where we first have to deserialize to even process the records. But, ultimately as matches are found, there will still be allocations.
    @Jxic I'm not sure what the state of your application is, but a very similar in-memory index is already supported by the bluge project I started last fall. The API for indexing is a bit lower-level, but should feel familiar if you've worked with bleve. So if you're interested in trying that out, you can find more info here: https://github.com/blugelabs/bluge
    Some of the ideas explored in bluge are being back-ported into bleve, so even if your project can't use bluge, it might still be useful feedback about the in-memory scorch index ideas.
    Jxic
    @Jxic

    Cool! thank you so much, I'll check that out

    Amnon
    @amnonbc
    What is the best way to print a query.Query (for debugging)?
    Marty Schoch
    @mschoch
    @amnonbc I don't use it myself, but you can try the DumpQuery function in the search/query package: https://github.com/blevesearch/bleve/blob/ae28975038cb25655da968e3f043210749ba382b/search/query/query.go#L351-L361
    It claims to print the query hierarchy, including expanding query strings into their underlying queries.
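    A small sketch of calling it (the query string itself is just an example):

    package main

    import (
        "fmt"
        "log"

        "github.com/blevesearch/bleve"
        "github.com/blevesearch/bleve/search/query"
    )

    func main() {
        q := bleve.NewQueryStringQuery(`name:ale +abv:>5`)

        // DumpQuery expands the query string into its underlying queries
        // and returns the hierarchy as indented JSON.
        dump, err := query.DumpQuery(bleve.NewIndexMapping(), q)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(dump)
    }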
    Amnon
    @amnonbc
    Thanks!
    Amnon
    @amnonbc
    Another question. What sort of performance (speed) should we expect to see for fuzzy searches?
    Marty Schoch
    @mschoch
    @amnonbc so it's hard to characterize the performance in absolute terms, because it depends on so many factors. What I can say is that we store the term dictionary in a data-structure called an FST (finite state transducer), which allows us to pretty efficiently find the set of terms within the edit distance of the search term.
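    As a rough illustration, a fuzzy query against an already-open index (the "name" field and the edit distance of 2 are placeholders):

    package example

    import (
        "fmt"
        "log"

        "github.com/blevesearch/bleve"
    )

    // fuzzyByName matches terms within edit distance 2 of the given term;
    // with the scorch index type the candidate terms come out of the FST
    // term dictionary. idx is assumed to be an already-open bleve.Index.
    func fuzzyByName(idx bleve.Index, term string) {
        fq := bleve.NewFuzzyQuery(term)
        fq.SetField("name")
        fq.SetFuzziness(2)

        res, err := idx.Search(bleve.NewSearchRequest(fq))
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(res.Total, "matches in", res.Took)
    }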
    Amnon
    @amnonbc
    Thanks Marty. Does this apply to the old upside down index, or only to the new scorch index?
    Marty Schoch
    @mschoch
    Only to the scorch index. The upsidedown index has to compute all the various possibilities and then brute force search for them.
    Darrin McCarthy
    @darrinmc_twitter
    I want to run a query for the first 50 documents based on a specified sort order (mod time). When I run the query using a query string query passing an empty string, the request takes 5-6 seconds. When I pass at least 1 search term (e.g. +beer) the query returns very quickly. Is there a way to make the empty string query faster?
    Amnon
    @amnonbc
    I would imagine that the first case will sort all the documents in your index and return the first 50. The second case will only sort the subset of documents that contain the term "beer". If this represents only a small proportion of the total documents in the index, one would expect this to complete faster.
    Hendrik Grobler
    @hsjgrobler_gitlab
    Hey guys, after swapping out an old index for a new one, what's the best way to remove the old one permanently?
    Marty Schoch
    @mschoch
    @darrinmc_twitter yeah, confirm which query is actually being executed; I think by default an empty query string results in a match-none query, so your results are unexpected: https://github.com/blevesearch/bleve/blob/master/search/query/query_string_parser.go#L38
    as @amnonbc points out, it sounds like a match-all is executing, which means there are more documents to process and sort by your sort order (mod time)
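    A sketch of issuing that explicitly instead of going through the query string parser (the "mod_time" sort field is a placeholder):

    package example

    import "github.com/blevesearch/bleve"

    // first50ByModTime runs an explicit match-all query, sorted descending
    // on a hypothetical "mod_time" field, returning only the first 50 hits.
    func first50ByModTime(idx bleve.Index) (*bleve.SearchResult, error) {
        q := bleve.NewMatchAllQuery()

        // size=50, from=0, explain=false
        req := bleve.NewSearchRequestOptions(q, 50, 0, false)
        req.SortBy([]string{"-mod_time"})

        return idx.Search(req)
    }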
    @hsjgrobler_gitlab can you clarify what you mean by "swapped out" here? If you're using an index alias and switch from an old index to a new one, you can close and remove the old one if you no longer need it
    Hendrik Grobler
    @hsjgrobler_gitlab
    @mschoch Yeah, using the index alias. How would I remove the index? Delete the file at the index path?
    Marty Schoch
    @mschoch
    Yes, once you no longer need it, you should call Close(), and once that returns you can remove the index path.
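    A sketch of that sequence, assuming an index alias is in use (the oldPath argument is a placeholder for wherever the old index lives on disk):

    package example

    import (
        "log"
        "os"

        "github.com/blevesearch/bleve"
    )

    // retireOldIndex swaps the new index in behind the alias, then closes
    // the old one and deletes its directory once it is no longer needed.
    func retireOldIndex(alias bleve.IndexAlias, oldIdx, newIdx bleve.Index, oldPath string) {
        // Atomically add the new index to the alias and remove the old one.
        alias.Swap([]bleve.Index{newIdx}, []bleve.Index{oldIdx})

        // Close the old index first; once that returns, remove the index path.
        if err := oldIdx.Close(); err != nil {
            log.Println("close failed:", err)
            return
        }
        if err := os.RemoveAll(oldPath); err != nil {
            log.Println("remove failed:", err)
        }
    }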
    Hendrik Grobler
    @hsjgrobler_gitlab
    Great, thanks!
    SvineruS
    @SvineruS
    hello, I'm new to Go and bleve. Can I get back, in the search results, the data that I put into the bleve.Index function, or only the ID?
    and another question: can I change the data in the index for a given ID and write it to disk?
    Amnon
    @amnonbc

    https://github.com/golang/go/issues/40724#issuecomment-821758073
    Bleve is being used as two of the benchmarks to measure the speedup that the change to register-based parameter passing will give in Go 1.17. Results look impressive: a 14% speedup in one case.
    Marty Schoch
    @mschoch
    @amnonbc thanks for sharing that
    JT Archie
    @jtarchie
    Bleve is optimized for generic documents; has any work been done to see if there are speed advantages for specific document shapes? I'm not sure I have the right nomenclature here.
    JT Archie
    @jtarchie
    It looks like a custom made Index that adheres to the IndexMapping interface.
    Marty Schoch
    @mschoch
    @jtarchie can you expand on what you mean by document shape, and how you would optimize for a particular shape?
    JT Archie
    @jtarchie
    It looks like Bleve does a lot of inspecting of structs with reflection, empty interface{}, and inspection of query results. If we remove the reflection, can we get a performance speedup?
    Marty Schoch
    @mschoch
    @jtarchie again, can you be more specific? The entrypoint to indexing/updating documents does take interface{} and performs a mapping process. But this is not required; applications are encouraged to directly build their own Document instances and use the IndexAdvance() method instead (but only if your application needs this). So, on the indexing path, you can avoid interface{} and reflection if you choose. On the querying side I don't think we use the empty interface much. Is there some particular location you have in mind?
    Jonathan Clem
    @jclem
    Does Bleve support concurrent access by different OS processes? I would like to have one (OS) process doing indexing and another (a CLI) performing queries, but is that possible? Right now my solution is that the indexing process starts an HTTP server that my CLI connects to instead. It looks like it doesn't, just because I can't get a second connection to the index to open, but I'm not sure if there's a configuration option or something. Google has failed me.
    Marty Schoch
    @mschoch
    @jclem Bleve does not support this today. I have another project, Bluge, which is an experimental fork of Bleve; it does support using OS locking capabilities to let multiple processes use the index at the same time (single writer, multiple readers)
    See https://github.com/blugelabs/bluge for more info; it is currently just a developer preview though
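    Roughly, the writing process and the querying process each open the same directory (a sketch against the bluge API; the path is a placeholder, and only one writer may hold the lock at a time):

    package example

    import (
        "log"

        "github.com/blugelabs/bluge"
    )

    // Process A: the single writer (e.g. the indexing service).
    func openWriter(path string) *bluge.Writer {
        w, err := bluge.OpenWriter(bluge.DefaultConfig(path))
        if err != nil {
            log.Fatal(err)
        }
        return w
    }

    // Process B: a reader on the same directory (e.g. the CLI doing queries).
    func openReader(path string) *bluge.Reader {
        r, err := bluge.OpenReader(bluge.DefaultConfig(path))
        if err != nil {
            log.Fatal(err)
        }
        return r
    }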
    Jonathan Clem
    @jclem
    @mschoch Thanks! I'll check out bluge. This is for a purely personal project (for now), so that may be a good option for me.
    chemivro
    @chemivro:matrix.org [m]
    Hello, I was planning on using bleve and I was wondering about the read mutex: if I have a single instance of a bleve index, will it only be able to perform one read at a time, at which point it would be better to open multiple indexes? I'm going to be using it in a web app which might receive hundreds of requests per second
    chemivro
    @chemivro:matrix.org [m]
    for example, opening 10 indexes in read-only mode (I don't need to index), then when a request arrives I try to lock one of the indexes; if that works, I use it, if not, I continue trying the others. Or maybe I'm misunderstanding and this isn't needed to allow multiple concurrent queries?
    chemivro
    @chemivro:matrix.org [m]
    looks like it works in parallel just fine so no need to do any of this
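    For reference, a sketch of concurrent searches against one shared index (the query term and goroutine count are placeholders); a single bleve.Index handles parallel reads, so no pool of read-only copies is needed:

    package example

    import (
        "log"
        "sync"

        "github.com/blevesearch/bleve"
    )

    // searchConcurrently fires many queries at one shared index from
    // separate goroutines; a single bleve.Index is safe for concurrent reads.
    func searchConcurrently(idx bleve.Index) {
        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                req := bleve.NewSearchRequest(bleve.NewMatchQuery("beer"))
                if _, err := idx.Search(req); err != nil {
                    log.Println("search failed:", err)
                }
            }()
        }
        wg.Wait()
    }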