    Marty Schoch
    @sergio__vera_twitter no, the query string query doesn't support that because it works across multiple fields
    James Mills
    Hey all. Having some trouble getting bleve highlighting to work. I can see <mark>...</mark> if I log the output of searchResults, but once I shove this into an html/template and render it, it's gone
    I thought there was some funky escaping/sanitizing going on but I'm a bit stumped
    Yes, html/template will sanitize out the <mark> tags.
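    A minimal sketch of both the problem and the usual workaround: html/template escapes plain strings, but a value of type template.HTML is passed through verbatim, so the <mark> tags survive. (The fragment below is a made-up example of what bleve's highlighter produces; only do this when the underlying document text is trusted or already escaped.)

    ```go
    package main

    import (
    	"bytes"
    	"fmt"
    	"html/template"
    )

    // render shows both behaviors: a plain string is auto-escaped by
    // html/template, while template.HTML is rendered verbatim.
    func render(fragment string) string {
    	tmpl := template.Must(template.New("hit").Parse(
    		"escaped: {{.Escaped}}\ntrusted: {{.Trusted}}\n"))
    	var buf bytes.Buffer
    	tmpl.Execute(&buf, struct {
    		Escaped string
    		Trusted template.HTML // opts out of auto-escaping; use only for trusted content
    	}{fragment, template.HTML(fragment)})
    	return buf.String()
    }

    func main() {
    	// hypothetical highlighted fragment
    	fmt.Print(render("the quick <mark>brown</mark> fox"))
    	// the "escaped" line shows &lt;mark&gt;…, the "trusted" line keeps <mark>…</mark>
    }
    ```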
    Hi - I'm new to bleve and am trying to find some detailed guidance on concurrency and thread safety. Can someone please point me in the right direction? I'd like to index a large dataset from multiple threads, but want to understand whether there is any risk to the store.
    Marty Schoch
    welcome @orenben12 i'll try to answer your questions
    first, the index structure itself is threadsafe, so you can have multiple goroutines sharing reference to it and using the index/batch methods to index concurrently
    the batch objects are NOT thread-safe, but are reusable (more on this later), so if each goroutine is indexing in batches (recommended), they should use separate batch structures
    batch structures can be reused, unless you're using the "unsafe_batch" option on a scorch index, in which case they are much more difficult to reuse safely, to start i wouldn't bother reusing batches
    in general though, while some concurrency may help, you'll eventually hit some limits with a single index, most likely not saturating your disk I/O
    if your goal is to index all the data as quickly as possible, you may find it beneficial to create multiple indexes, instead of just one
    then at query time, you can query across all of them using an index alias
    finally, in general, i recommend you start with bleve v2, if you're not already, as it produces the most optimal index we support out of the box (v1 defaults to some older technology)
    i'll stop there, but if you have follow-up questions, just let me know
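    A rough sketch of the pattern described above (bleve v2 API; names are mine, error handling mostly elided): one batch per goroutine, separate indexes, then an index alias for querying across them.

    ```go
    package main

    import (
    	"fmt"
    	"sync"

    	"github.com/blevesearch/bleve/v2"
    )

    // run indexes the same two docs into two separate indexes, one goroutine
    // (and one batch) per index, then queries both through a single alias.
    func run() uint64 {
    	docs := map[string]string{"a": "hello world", "b": "goodbye world"}

    	mapping := bleve.NewIndexMapping()
    	idxA, _ := bleve.NewMemOnly(mapping) // errors elided for brevity
    	idxB, _ := bleve.NewMemOnly(mapping)

    	var wg sync.WaitGroup
    	for i, idx := range []bleve.Index{idxA, idxB} {
    		wg.Add(1)
    		go func(i int, idx bleve.Index) {
    			defer wg.Done()
    			batch := idx.NewBatch() // batches are NOT thread-safe: one per goroutine
    			for id, text := range docs {
    				batch.Index(fmt.Sprintf("%d-%s", i, id), map[string]string{"body": text})
    			}
    			idx.Batch(batch)
    		}(i, idx)
    	}
    	wg.Wait()

    	// the alias fans the query out to every member index
    	alias := bleve.NewIndexAlias(idxA, idxB)
    	res, _ := alias.Search(bleve.NewSearchRequest(bleve.NewMatchQuery("hello")))
    	return res.Total
    }

    func main() {
    	fmt.Println(run()) // one matching "hello" doc per index
    }
    ```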

    @hsjgrobler_gitlab bleve today doesn't support multiple processes very well, opening an index from a single process to serve all requests is the model it was designed around

    Hi - I just started using bleve a few days ago and now looking for some general advice on optimizing the index search speed. In my case, I only have to index all the data once on program initialization, and there will be many concurrent search requests after initialization. I just went through this chat and came across this explanation which makes me wonder if bleve is the right choice for this scenario, as I am not sure what it means by "single process"

    will the requests be queued in this case?
    by the way, I'm using the memory-only index
    Marty Schoch
    ok, so for an in-memory index this issue isn't relevant, because only one process can have access to the memory anyway
    for indexes persisted to disk in files, it can sometimes be convenient to allow multiple processes to work with those files at one time
    bleve does not allow that; locks held by the writer process block other processes from reading as well
    bluge does allow that: by using operating system locking primitives, a single writer and multiple reader processes can share access to the files
    John Forstmeier
    Hi! I can't figure out how to get the fields back from a sub-document - basically, I'd like to run queries against one of the sub-document fields ("body") and then return all three fields ("id", "timestamp", and "body") in the results so that I can rebuild the original BleveSpecimen (a wrapper to Specimen so that the bleve.Classifier can be implemented) type. I've looked at this Gist which uses the SetInternal/GetInternal methods but is there a way to do it directly from the SearchResult type?
    package main

    import (
        "log"
        "os"
        "time"

        "github.com/blevesearch/bleve/v2"
    )

    // Specimen is the root specimen.
    type Specimen struct {
        ID        string    `json:"id"`
        Timestamp time.Time `json:"timestamp"`
        Body      string    `json:"body"`
    }

    // BleveSpecimen wraps the root specimen.
    type BleveSpecimen struct {
        Specimen Specimen `json:"specimen"`
    }

    // Type implements the bleve.Classifier interface.
    func (bs *BleveSpecimen) Type() string {
        return "bleve_specimen"
    }

    var now = time.Now()

    var specimens = []Specimen{
        {ID: "one", Timestamp: now, Body: `{"text":"the quick brown fox jumped over the lazy dog"}`},
        {ID: "two", Timestamp: now, Body: `{"text":"jump over the brick wall"}`},
        {ID: "three", Timestamp: now, Body: `{"text":"carnivours are delicious"}`},
    }

    func main() {
        specimenMapping := bleve.NewDocumentMapping()
        idFieldMapping := bleve.NewTextFieldMapping()
        timestampFieldMapping := bleve.NewDateTimeFieldMapping()
        bodyFieldMapping := bleve.NewTextFieldMapping()
        specimenMapping.AddFieldMappingsAt("id", idFieldMapping)
        specimenMapping.AddFieldMappingsAt("timestamp", timestampFieldMapping)
        specimenMapping.AddFieldMappingsAt("body", bodyFieldMapping)

        bleveSpecimenMapping := bleve.NewDocumentMapping()
        bleveSpecimenMapping.AddSubDocumentMapping("specimen", specimenMapping)

        indexMapping := bleve.NewIndexMapping()
        indexMapping.AddDocumentMapping("bleve_specimen", bleveSpecimenMapping)

        name := "testing.bleve"
        index, err := bleve.New(name, indexMapping)
        if err != nil {
            log.Fatalf("error creating index: %s", err.Error())
        }
        defer os.RemoveAll(name)

        batch := index.NewBatch()
        for _, specimen := range specimens {
            // pointer, so the *BleveSpecimen Type() classifier is used
            batch.Index(specimen.ID, &BleveSpecimen{
                Specimen: specimen,
            })
        }
        if err := index.Batch(batch); err != nil {
            log.Fatalf("error calling batch: %s", err.Error())
        }

        query := bleve.NewMatchQuery("carnivours")
        search := bleve.NewSearchRequest(query)
        searchResults, err := index.Search(search)
        if err != nil {
            log.Fatalf("error running query: %s", err.Error())
        }
        log.Printf("results: %+v", searchResults)
    }
    Marty Schoch
    @forstmeier we no longer recommend using internal storage for storing significant amounts of data. It works OK with the older upsidedown index format, but the scorch index is not designed to store large amounts of data with those internal values.
    Generally, it should be as simple as ensuring that you set "store" to true on the field mapping: https://github.com/blevesearch/bleve/blob/ae28975038cb25655da968e3f043210749ba382b/mapping/field.go#L50
    And then when you build the search request, set Fields to []string{"*"} https://github.com/blevesearch/bleve/blob/master/search.go#L276
    NOTE that "*" is just a magic value interpreted to mean you want us to load all stored fields, there is no pattern matching.
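    Putting those two settings together, a sketch might look like this (bleve v2; field names and data are mine, taken loosely from the question above):

    ```go
    package main

    import (
    	"fmt"
    	"log"

    	"github.com/blevesearch/bleve/v2"
    )

    func run() interface{} {
    	bodyField := bleve.NewTextFieldMapping()
    	bodyField.Store = true // keep the original value so it can be returned with hits

    	specimen := bleve.NewDocumentMapping()
    	specimen.AddFieldMappingsAt("body", bodyField)

    	root := bleve.NewDocumentMapping()
    	root.AddSubDocumentMapping("specimen", specimen)

    	im := bleve.NewIndexMapping()
    	im.DefaultMapping = root

    	index, err := bleve.NewMemOnly(im)
    	if err != nil {
    		log.Fatal(err)
    	}
    	index.Index("one", map[string]interface{}{
    		"specimen": map[string]interface{}{"body": "carnivores are delicious"},
    	})

    	req := bleve.NewSearchRequest(bleve.NewMatchQuery("carnivores"))
    	req.Fields = []string{"*"} // "*" = every stored field; no pattern matching

    	res, err := index.Search(req)
    	if err != nil {
    		log.Fatal(err)
    	}
    	// stored fields come back keyed by their flattened path
    	return res.Hits[0].Fields["specimen.body"]
    }

    func main() {
    	fmt.Println(run())
    }
    ```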
    hi @mschoch, recently I've been using a memory-only bleve index to store around 100,000 documents, and I ran a few benchmarks on search performance. The document struct is quite simple: just 3 string fields and 1 int field. The major performance bottleneck seems to be frequent Go garbage collection, since unmarshalling values from []byte data creates a lot of short-lived objects. So do you think it's possible to skip the marshalling and unmarshalling when using a memory-only index?
    Marty Schoch
    @Jxic yeah the in-memory index is pretty bad today. We'd like to replace it with one backed by scorch as well (on the road-map for this year). Can you be more specific about which marshal/unmarshal you think would be helpful to remove?
    so according to the pprof diagram, the function NewBackIndexRowKV causes a lot of memory allocation, and it boils down to (*BackIndexRowValue).Unmarshal
    it's only when garbage collection starts that a few search requests get affected, but if the gc is triggered too frequently, we cannot achieve high availability for our services
    Type: cpu
    Time: Feb 26, 2021 at 5:29pm (CST)
    Duration: 30s, Total samples = 1mins (200.47%)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof) top 10
    Showing nodes accounting for 29.71s, 49.40% of 60.14s total
    Dropped 347 nodes (cum <= 0.30s)
    Showing top 10 nodes out of 140
          flat  flat%   sum%        cum   cum%
         7.59s 12.62% 12.62%      7.59s 12.62%  (our own function)
         4.85s  8.06% 20.69%     10.92s 18.16%  (our own function)
         4.32s  7.18% 27.87%      8.07s 13.42%  runtime.scanobject
         3.06s  5.09% 32.96%     11.18s 18.59%  runtime.mallocgc
    Sipun S
    Hi, how can I use "ngrams" for partial matching of words? I tried to import "github.com/couchbaselabs/bleve/analysis" and
    nameNgramMapping := bleve.NewTextFieldMapping()
    nameNgramMapping.Analyzer = ngramAnalyzer
    nameNgramMapping.Name = "ngram"
    but it fails with the error: "module declares its path as: github.com/blevesearch/bleve
    but was required as: github.com/couchbaselabs/bleve"
    Marty Schoch
    @Jxic unfortunately I just don't see any simple way around that. The upsidedown index scheme uses the back index for core operation, I can't think of any simple way to just "not do that part".
    @SipunS1_twitter for errors like that it is helpful to share the entire code somewhere. That particular error sounds unrelated to bleve usage, but possibly to the way the modules/packages are configured and referenced. There is an example of ngram mappings here: https://github.com/blevesearch/beer-search/blob/master/mapping_example2.go
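    For reference, registering an ngram analyzer directly on the index mapping might look like this (bleve v2 package paths; the analyzer name and the min/max values are illustrative, not from the original discussion):

    ```go
    package main

    import (
    	"fmt"
    	"log"

    	"github.com/blevesearch/bleve/v2"
    	"github.com/blevesearch/bleve/v2/analysis/analyzer/custom"
    	"github.com/blevesearch/bleve/v2/analysis/token/lowercase"
    	"github.com/blevesearch/bleve/v2/analysis/token/ngram"
    	"github.com/blevesearch/bleve/v2/analysis/tokenizer/unicode"
    )

    func run() uint64 {
    	im := bleve.NewIndexMapping()

    	// token filter splitting each token into 2- and 3-character ngrams
    	if err := im.AddCustomTokenFilter("ngram_2_3", map[string]interface{}{
    		"type": ngram.Name,
    		"min":  2.0,
    		"max":  3.0,
    	}); err != nil {
    		log.Fatal(err)
    	}
    	if err := im.AddCustomAnalyzer("ngramAnalyzer", map[string]interface{}{
    		"type":          custom.Name,
    		"tokenizer":     unicode.Name,
    		"token_filters": []string{lowercase.Name, "ngram_2_3"},
    	}); err != nil {
    		log.Fatal(err)
    	}

    	nameField := bleve.NewTextFieldMapping()
    	nameField.Analyzer = "ngramAnalyzer"

    	doc := bleve.NewDocumentMapping()
    	doc.AddFieldMappingsAt("name", nameField)
    	im.DefaultMapping = doc

    	index, _ := bleve.NewMemOnly(im) // error elided
    	index.Index("1", map[string]string{"name": "hello"})

    	// a term query for a partial fragment now matches via the ngrams
    	tq := bleve.NewTermQuery("ell")
    	tq.SetField("name")
    	res, _ := index.Search(bleve.NewSearchRequest(tq))
    	return res.Total
    }

    func main() {
    	fmt.Println(run())
    }
    ```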
    @mschoch alright, I got it, really appreciate your reply. So for the scorch replacement, how much improvement could we expect?
    Marty Schoch
    @Jxic so, it's hard to speculate, and I don't want to give you false hope that it's a lot better. What I would say is that more of the data is already in a form that is useful for searching, as opposed to upsidedown where we first have to deserialize to even process the records. But, ultimately as matches are found, there will still be allocations.
    @Jxic I'm not sure what the state of your application is, but a very similar in-memory index is already supported by the bluge project I started last fall. The API for indexing is a bit lower-level, but should feel familiar if you've worked with bleve. So if you're interested in trying that out, you can find more info here: https://github.com/blugelabs/bluge
    Some of the ideas explored in bluge are being back-ported into bleve, so even if your project can't use bluge, trying it might still give useful feedback on the in-memory scorch index ideas.

    Cool! thank you so much, I'll check that out

    What is the best way to print a query.Query (for debugging)?
    Marty Schoch
    @amnonbc I don't use it myself, but you can try the DumpQuery function in the search/query package: https://github.com/blevesearch/bleve/blob/ae28975038cb25655da968e3f043210749ba382b/search/query/query.go#L351-L361
    It claims to print the query hierarchy, including expanding query strings into their underlying queries
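    Usage is roughly the following (a sketch, assuming bleve v2 import paths; the query string is made up):

    ```go
    package main

    import (
    	"fmt"
    	"log"

    	"github.com/blevesearch/bleve/v2"
    	"github.com/blevesearch/bleve/v2/search/query"
    )

    func run() string {
    	q := bleve.NewQueryStringQuery(`+beer name:ale`)
    	// DumpQuery needs a mapping so it can expand the query string
    	s, err := query.DumpQuery(bleve.NewIndexMapping(), q)
    	if err != nil {
    		log.Fatal(err)
    	}
    	return s
    }

    func main() {
    	fmt.Println(run())
    }
    ```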
    Another question. What sort of performance (speed) should we expect to see for fuzzy searches?
    Marty Schoch
    @amnonbc so it's hard to characterize the performance in absolute terms, because it depends on so many factors. What I can say is that we store the term dictionary in a data-structure called an FST (finite state transducer), which allows us to pretty efficiently find the set of terms within the edit distance of the search term.
    Thanks Marty. Does this apply to the old upside down index, or only to the new scorch index?
    Marty Schoch
    Only to the scorch index. The upsidedown index has to compute all the various possibilities and then brute force search for them.
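    As a quick illustration of the FST-backed fuzzy lookup on a scorch index (the v2 on-disk default), here is a sketch with made-up data and a one-edit misspelling:

    ```go
    package main

    import (
    	"fmt"
    	"log"
    	"os"

    	"github.com/blevesearch/bleve/v2"
    )

    func run() uint64 {
    	dir, err := os.MkdirTemp("", "fuzzy")
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer os.RemoveAll(dir)

    	index, err := bleve.New(dir+"/idx.bleve", bleve.NewIndexMapping()) // scorch by default in v2
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer index.Close()

    	index.Index("1", map[string]string{"body": "a pint of beer"})

    	fq := bleve.NewFuzzyQuery("bear") // one edit away from "beer"
    	fq.SetFuzziness(1)                // the FST walk finds terms within this edit distance
    	res, _ := index.Search(bleve.NewSearchRequest(fq))
    	return res.Total
    }

    func main() {
    	fmt.Println(run())
    }
    ```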
    Darrin McCarthy
    I want to run a query for the first 50 documents based on a specified sort order (mod time). When I run the query using a query string query passing an empty string, the request takes 5-6 seconds. When I pass at least 1 search term (e.g. +beer) the query returns very quickly. Is there a way to make the empty string query faster?
    I would imagine that the first case will sort all the documents in your index and return the first 50. The second case will only sort the subset of documents that contain the term "beer". If this represents only a small proportion of the total documents in the index, one would expect this to complete faster.
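    For reference, the sorted "first 50" request described above might be expressed like this (a sketch; the mod_time field name is hypothetical, and a match-all query behaves like the empty query-string query in that every document must be sorted):

    ```go
    package main

    import (
    	"fmt"

    	"github.com/blevesearch/bleve/v2"
    )

    func run(index bleve.Index) ([]string, error) {
    	req := bleve.NewSearchRequest(bleve.NewMatchAllQuery())
    	req.Size = 50
    	req.SortBy([]string{"-mod_time"}) // "-" prefix = descending on the mod_time field

    	res, err := index.Search(req)
    	if err != nil {
    		return nil, err
    	}
    	ids := make([]string, 0, len(res.Hits))
    	for _, hit := range res.Hits {
    		ids = append(ids, hit.ID)
    	}
    	return ids, nil
    }

    func main() {
    	index, _ := bleve.NewMemOnly(bleve.NewIndexMapping()) // error elided
    	index.Index("old", map[string]interface{}{"mod_time": 1.0})
    	index.Index("new", map[string]interface{}{"mod_time": 2.0})
    	ids, _ := run(index)
    	fmt.Println(ids)
    }
    ```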
    Hendrik Grobler
    Hey guys, after swapping out an old index for a new one, what's the best way to remove the old one permanently?