    Hendrik Grobler
    @hsjgrobler_gitlab
    Hey guys, after swapping out an old index for a new one, what's the best way to remove the old one permanently?
    Marty Schoch
    @mschoch
    @darrinmc_twitter yeah confirm what query is actually being executed, I think by default empty query string results in match none query, so your results are unexpected: https://github.com/blevesearch/bleve/blob/master/search/query/query_string_parser.go#L38
    as @amnonbc points out, it sounds like a match all is executing, and it means we have more documents to process and sort by your sort order (mod time)
    @hsjgrobler_gitlab can you clarify what you mean by "swapped out" here? if you're using an index alias, and switch from an old index to a new one, you can close and remove the old if you no longer need it
    Hendrik Grobler
    @hsjgrobler_gitlab
    @mschoch Yeah, using the index alias. How would I remove the index? Delete the file at the index path?
    Marty Schoch
    @mschoch
    Yes, once you no longer need it, you should call Close(), and once that returns you can remove the index path.
    Hendrik Grobler
    @hsjgrobler_gitlab
    Great, thanks!
    SvineruS
    @SvineruS
    Hello, I'm new to Golang and bleve. Can I get back in the search results the data that I passed to the bleve.Index function, or only an ID?
    and another question. can I change the data in the index for a given ID and write it to disk?
    @mschoch
    Amnon
    @amnonbc

    https://github.com/golang/go/issues/40724#issuecomment-821758073

    Bleve is being used as two of the benchmarks to measure the speedup that the change to register-based parameter passing will give in Go 1.17. Results look impressive: 14% speedup in one case.

    Marty Schoch
    @mschoch
    @amnonbc thanks for sharing that
    JT Archie
    @jtarchie
    Bleve is optimized for generic documents, has any work been done to see if there are speed advantages for specific document shapes? I have no idea what I am talking about in nomenclature.
    JT Archie
    @jtarchie
    It looks like a custom made Index that adheres to the IndexMapping interface.
    Marty Schoch
    @mschoch
    @jtarchie can you expand on what you mean by document shape, and how you would optimize for a particular shape?
    JT Archie
    @jtarchie
    It looks like Bleve does a lot of inspecting of structs with reflection, empty interface{}, and inspection of query results. If we remove the reflection, can we get a performance speedup?
    Marty Schoch
    @mschoch
    @jtarchie again can you be more specific? The entrypoint to indexing/updating documents does take interface{} and perform a mapping process. But this is not required; applications are encouraged to directly build their own Document instances and use the IndexAdvanced() method instead (but only if your application needs this). So, on the indexing path, you can avoid interface and reflection if you choose. On the querying side I don't think we use the empty interface much. Is there some particular location you have in mind?
    Jonathan Clem
    @jclem
    Does Bleve support concurrent access by different OS processes? I would like to have one (OS) process doing indexing and another (a CLI) performing queries, but is that possible? Right now my solution is that the indexing process starts an HTTP server that my CLI instead connects to. It looks like it doesn't just because I can't get a second connection on the index to open, but not sure if there's a configuration thing or something. Google has failed me.
    Marty Schoch
    @mschoch
    @jclem Bleve does not support this today. I have another project Bluge, which is an experimental fork of Bleve, it does support using OS locking capabilities to let multiple processes use the index at the same time (single writer, multi-reader)
    See https://github.com/blugelabs/bluge for more info, it is currently just developer preview though
    Jonathan Clem
    @jclem
    @mschoch Thanks! I'll check out bluge. This is for a purely personal project (for now), so that may be a good option for me.
    chemivro
    @chemivro:matrix.org
    Hello, I'm planning to use bleve and was wondering about the read mutex. If I have a single instance of a bleve index, will it only be able to perform one read at a time? If so, would it be better to open multiple indexes? I'm going to use it in a web app that might receive hundreds of requests per second.
    chemivro
    @chemivro:matrix.org
    for example opening 10 indexes in read only (I don't need to index) then when a request arrives I try to lock one of the indexes, if that works use it, if not continue to try others. Or maybe I'm misunderstanding and this is not needed to allow multiple concurrent queries?
    chemivro
    @chemivro:matrix.org
    looks like it works in parallel just fine so no need to do any of this
    Marty Schoch
    @mschoch
    @chemivro:matrix.org yes in general concurrent querying will not require multiple indexes
    however, if you are trying to saturate I/O bandwidth during indexing, then you will find that partitioning can help
    Enisha Eshwar
    @enisha_eshwar_twitter

    @mschoch: Hello! I'm new to Bleve and excited to see an easy-to-use text indexing and search library in Golang. I am having trouble getting a partial phrase match to work. Let's say I have indexed this text string: "Harry Potter and the Cursed Child". When I look up "Harry P", I want to be able to match the above document.
    I am using a custom analyzer with a whitespace tokenizer and to_lower token filter.
    I tried the following:
    q1 := bleve.NewMatchQuery("Harry")
    q2 := bleve.NewWildcardQuery("P*")

    query := bleve.NewBooleanQuery()
    query.AddMust(q1)
    query.AddMust(q2)

    While this matches "Harry Potter and the Cursed Child", it also matches "Prince Harry" (Order of the word is not maintained).

    I tried a regexQuery and MatchPhraseQuery too. But haven't been able to get Regex working and MatchPhraseQuery does a full phrase match. Would appreciate some help on this.
    Marty Schoch
    @mschoch
    @enisha_eshwar_twitter welcome, unfortunately the query you want isn't one that we support. See if this PR addresses your needs: blevesearch/bleve#858
    Otherwise, you would have to match the pieces (as you were doing above) and manually process the location information on the client-side to validate matches vs false positives...
    Enisha Eshwar
    @enisha_eshwar_twitter
    Thanks for the response @mschoch. I checked out the code shared for the MultiPhrasePrefix query here: blevesearch/bleve#377 but that works similarly to the code I've shared above. Again, the order is not maintained. I'm not sure if I'm missing something.
    On a different note, I tried this regex: .+firstword+" "+secondWord+.
    This doesn't match either.
    Marty Schoch
    @mschoch
    @enisha_eshwar_twitter the regular expression query only matches against indexed terms, it cannot match across word boundaries. As I said earlier, if you use one of these other techniques to combine multiple clauses with a conjunction, you will also have to post-process the results to filter out false positives, for when the order isn't what you want. Today, the phrase query and match phrase queries are the only ones that look at the position of the term.
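A rough sketch of the client-side post-processing suggested above: given the positions at which the matched terms occur in a hit, keep the hit only if a term with the wanted prefix directly follows the first word. The `termPositions` type is a simplified stand-in invented for this example, not bleve's own location type, and `followsInOrder` is a hypothetical helper. It assumes the search was run with term locations included and that terms were lowercased by the analyzer.

```go
package main

import (
	"fmt"
	"strings"
)

// termPositions maps each matched term to the token positions at which
// it occurs in one field of one hit. Simplified stand-in for the location
// data returned with a hit; not bleve's actual type.
type termPositions map[string][]uint64

// followsInOrder reports whether some term starting with prefix occurs
// at the position immediately after an occurrence of first.
func followsInOrder(tp termPositions, first, prefix string) bool {
	// Collect every position that directly follows the first word.
	next := make(map[uint64]bool)
	for _, p := range tp[first] {
		next[p+1] = true
	}
	for term, positions := range tp {
		if !strings.HasPrefix(term, prefix) {
			continue
		}
		for _, p := range positions {
			if next[p] {
				return true
			}
		}
	}
	return false
}

func main() {
	// "Harry Potter and the Cursed Child": only the matched terms are listed.
	potter := termPositions{"harry": {1}, "potter": {2}}
	// "Prince Harry": both clauses match, but in the wrong order.
	prince := termPositions{"prince": {1}, "harry": {2}}

	fmt.Println(followsInOrder(potter, "harry", "p")) // true
	fmt.Println(followsInOrder(prince, "harry", "p")) // false
}
```

The conjunction query still does the heavy lifting of finding candidates; this check only discards the false positives where the order is wrong.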
    Enisha Eshwar
    @enisha_eshwar_twitter
    @mschoch : Great. Understood it now. I'll be using a keyword analyser and will index the whole string and will then match it via regex. Thanks for your response. Appreciate the help.
    Marty Schoch
    @mschoch
    @enisha_eshwar_twitter that's fine, just understand, you're not really getting any of the benefits of a full-text search index when you use it in that way. It is only slightly better than a brute-force check of each string against the regular expression.
    Enisha Eshwar
    @enisha_eshwar_twitter
    @mschoch: Ok. I was initially thinking I'll build 2 indexes (one with a custom analyser where every word is tokenised, another with a keyword tokenizer). Use the 1st index with prefixQuery for looking up single words and use the 2nd index for partial phrase matches with regexQuery. Now that you mention regex would be close to brute force, I'll take your suggestion on using multiple clauses with conjunction and then post process it to filter out non-ordered words.
    Amnon
    @amnonbc

    Hi @mschoch!
    I am having some performance problems with a large index, which consists of 4 million items.
    We are using the RocksDB backend, and the index weighs about 6 GB on disk.
    A simple MatchAll search takes about a minute, most of which is spent iterating through RocksDB records in cgo.
    By that time our GUI has timed out.

    We can limit the time the search takes by calling idx.SearchInContext with a context that has an appropriate timeout, but in this case the search returns a nil result.
    What would be great for us is if there was a way for Bleve to give us "best effort" results within a deadline.
    Or alternatively for Bleve to return results as it collects them, rather than computing everything first, and then giving us the results.
    We basically want to show our users something in the GUI quickly, even if the result may need revising once more data is collected.
    Is there any way I can do this in bleve?

    Marty Schoch
    @mschoch
    Sorry to hear you are having performance issues. Are you doing MatchAll as a proxy for any query that takes a long time or are you relying on using MatchAll for your application? In general the upsidedown index format (which allows you to use RocksDB) is not well designed, and many queries have to scan large ranges of the rocksdb database.
    There is a hidden capability to stream results back as they are found, instead of returning the final result set. Unfortunately, it was added by Couchbase, and has no real obvious public API. Instead it is accessed using a magic key which can be set inside the Context passed into the search request. The key is used here: https://github.com/blevesearch/bleve/blob/e7235bec9cf6d984a1683372673d4f0571fa7d94/search/collector/topn.go#L188-L197
    By default we use that built-in function MakeTopNDocumentMatchHandler but you can (and need to) write your own.
    At the moment I cannot find any documentation about how it is used
    Amnon
    @amnonbc
    Thanks @mschoch,
    We initially do a DateRangeQuery to return the last week's data, and the user then adds more specific terms to get the items they are interested in.
    The problem is that the DateRange query takes a long time - and counter-intuitively it does not depend on how many documents match that time range.
    I'll have a look at the snippet you sent and try to make sense of it. But it looks like we will have to partition the data by date.
    Marty Schoch
    @mschoch
    @amnonbc are the date ranges completely arbitrary? or do they fall into simple buckets like month/year (presumably what you would be partitioning on) Because if the queries align with those buckets, you could prepare a special field with those values, and use basic term query (fast). That might perform well enough you could skip the partitioning. But again, nothing wrong with partitioning solution either.
    Amnon
    @amnonbc
    Many of the searches are for the last week or the last month. Approximating these in the form of buckets (or disjunctions of buckets) is a lot easier to do than partitioning. I'll give this a try.
    Amnon
    @amnonbc

    I tried the idea of buckets, and it gives a 100x speedup, even when I need to combine tens of buckets to express my query.

    This leads to another question. When a user at a GUI does a search, they (eventually) get a page of results.
    When they scroll to the next page, bleve appears to perform the entire search from scratch.
    Is there any way to get bleve to cache the results?

    Another question: when I create an index, and populate it, is it possible to add a new FieldMapping at a later stage?
    Or must the index be re-created?
    Marty Schoch
    @mschoch
    > bleve appears to perform the entire search from scratch
    yes it does, the size/skip literally does just that, runs the entire search, and skips over results
    there is no easy way to cache this in some useful way to save work getting the second page
    alternatively we have a different method, it is most useful to allow for "deep pagination" but it may suit your use-case as well
    you can read more about that feature here: blevesearch/bleve#1182