    Marty Schoch
    @mschoch
    hey everyone, i see i've been pinged a few times, still travelling, but will try to scroll back through everything once i return in two days
    thanks for understanding
    John Cheng
    @jlcheng
    Thanks for taking a look. At work at the moment, will take a look at the dataset I tested against to look for large terms when I get home.
    John Cheng
    @jlcheng
    @steveyen yup - looks like it was choking on this 21MB term https://github.com/ikawaha/kagome/blob/master/internal/dic/data/bindata12.go
    on my machine, just indexing this file used up 3GB of memory
    John Cheng
    @jlcheng
    How should I go about using a token length filter in my code?
    Dustin Spicuzza
    @virtuald
    it seems to me that if I retrieve a document from the index, change the fields on the document, and call 'update', that all of the fields should still exist. That doesn't seem to be the case however. :-/
    John Cheng
    @jlcheng
    @virtuald what does happen?
    Dustin Spicuzza
    @virtuald
    only the updated field stays
    it's interesting, because in the debugger I can see that all of the fields were retrieved
    I'm using a custom analyzer though
    and the analyzer for all the fields was null
    so maybe that's what it was
    what I ended up doing was repopulating all of the fields (since the data was there anyways) and update worked as expected
    a bit racy... but eh
    John Cheng
    @jlcheng
    I am curious, is there a way to update a single field of a document without re-analyzing the rest of the document? The use case I have in mind is to store the "last accessed" time of a document.
    Dustin Spicuzza
    @virtuald
    not as far as I could tell
    John Cheng
    @jlcheng
    ah, that's what I thought as well.
    Dustin Spicuzza
    @virtuald
    my index is storing short sentences that I want to search... so I'd like to segment on punctuation. For example, with a string like some-1thing_more12aha.bob I'd like to be able to search for '1thing', 'more12aha', and 'bob' and get a result
    not sure if there's an existing analyzer combination that can do that?
    Dustin Spicuzza
    @virtuald
    for anyone else who runs into that, I ended up using the camelCase tokenizer, which does exactly what I want
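For reference, the segmentation asked for above can be sketched in plain Go. This is only an illustration of the desired token boundaries using the standard library; it does not use bleve or its camelCase tokenizer, and `splitOnPunctuation` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitOnPunctuation (hypothetical helper) breaks s at every rune that is
// neither a letter nor a digit, so '-', '_' and '.' all act as separators.
func splitOnPunctuation(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

func main() {
	// prints [some 1thing more12aha bob]
	fmt.Println(splitOnPunctuation("some-1thing_more12aha.bob"))
}
```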
    speunz
    @speunz
    Where can I track scorch's readiness for production use? Are there known issues with the current implementation in master?
    Herval Freire
    @herval
    hi folks! I'm trying to use bleve for a personal project, and I'm a bit stuck defining analyzers/tokenizers/etc to match my case. Anyone around with a minute to spare? :)
    https://github.com/herval/bleve-samples <-- these are my failed tests, basically
    David Pennington
    @Xeoncross
    I have been using sphinx to support custom ranking based on different weights of the documents I provide. Can bleve do something like this? OPTION ranker=expr('1000*bm25f(2.0,0.75,{link=3,title=5,text=1})')
    Where the attributes link, title, and text have different weights when the terms are found inside them.
    VictorNine
    @VictorNine
    Hi! I'm trying to use bleve to return results that partially match. So if I have the text "Somestrangeword" I would like the search "some" to return this result. And if no results are found, I would also like "strange" to return this result. Is this possible? I've tried prefixQuery but can't make it work
    VictorNine
    @VictorNine
    Any ideas? I'm not getting expected results from NewWildcardQuery either
    VictorNine
    @VictorNine
    Maybe it's better to get out all the IDs and do a tree search? And then search once I have the full word
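One general way to make "some" and "strange" both find "Somestrangeword" is to index n-grams of each term. The sketch below shows the idea with the standard library only. It is not bleve code, `ngrams` is a hypothetical helper, and the 4 to 7 length bounds are an assumption picked to cover the two example queries.

```go
package main

import (
	"fmt"
	"strings"
)

// ngrams (hypothetical helper) returns every substring of the lower-cased
// term whose length in runes is between minLen and maxLen inclusive.
func ngrams(term string, minLen, maxLen int) []string {
	runes := []rune(strings.ToLower(term))
	var out []string
	for n := minLen; n <= maxLen; n++ {
		for i := 0; i+n <= len(runes); i++ {
			out = append(out, string(runes[i:i+n]))
		}
	}
	return out
}

func main() {
	// Build a toy lookup set from the n-grams of a single term.
	index := make(map[string]bool)
	for _, g := range ngrams("Somestrangeword", 4, 7) {
		index[g] = true
	}
	// prints true true false
	fmt.Println(index["some"], index["strange"], index["zebra"])
}
```

The trade-off is index size: every term expands into many n-grams, so the length bounds are worth keeping tight.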
    Denis Titusov
    @titusjaka

    Hi @mschoch,
    I'm trying to use Bleve on a new project to index IP-related info. Each record represents an IP range with two fields: StartIP and EndIP. The main goal is to find the range that contains a given IP address.
    For example, StartIP: 192.168.0.0, EndIP: 192.168.255.255, and the desired IP-address is 192.168.100.100.

    Because I have both IPv4 and IPv6, I must use the big.Int type to represent IP addresses. So I convert my data to big.Int and save it to bleve as a numeric field.

    I build search-query this way:

    func buildSearchQuery(ipAddr string) query.Query {
        ipNumeric := ipToInt(ipAddr)
        q1 := bleve.NewQueryStringQuery(fmt.Sprintf("%s:<=%d", InfoStartIP, ipNumeric))
        q2 := bleve.NewQueryStringQuery(fmt.Sprintf("%s:>=%d", InfoEndIP, ipNumeric))
    
        return bleve.NewConjunctionQuery(q1, q2)
    }

    In the end, bleve cannot find any data for my requests. I think it's because bleve uses float64 for numeric fields.

    Maybe you have some advice for me on how to store and search for IP ranges using bleve?
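On the precision concern above: a float64 cannot represent every 128-bit integer exactly, so IPv6 values stored as numbers lose information. One generic workaround, which is an assumption here and not something proposed in this chat, is to store each address as a fixed-width hex string so that plain string comparison matches numeric order. The sketch below uses only the standard library; `sortableKey` and `inRange` are hypothetical helpers, and whether such keys suit bleve's term-based range queries for a given mapping is untested.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net/netip"
)

// sortableKey (hypothetical helper) maps an IPv4 or IPv6 address to a
// 32-character hex string. IPv4 addresses are widened to their IPv4-mapped
// IPv6 form, so every key has the same width and sorts in numeric order.
func sortableKey(ip string) (string, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return "", err
	}
	b := addr.As16()
	return hex.EncodeToString(b[:]), nil
}

// inRange (hypothetical helper) reports whether ip lies in [start, end],
// using only string comparison on the sortable keys.
func inRange(start, end, ip string) (bool, error) {
	s, err := sortableKey(start)
	if err != nil {
		return false, err
	}
	e, err := sortableKey(end)
	if err != nil {
		return false, err
	}
	k, err := sortableKey(ip)
	if err != nil {
		return false, err
	}
	return s <= k && k <= e, nil
}

func main() {
	ok, err := inRange("192.168.0.0", "192.168.255.255", "192.168.100.100")
	fmt.Println(ok, err)
}
```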

    Ron Lapushner
    @Ronll
    Hello! Is it possible to delete keys with a given prefix? If not, what is the best way to delete a group of documents that are related somehow?
    @mschoch :)
    Mickael
    @mickael-kerjean
    Hey there, I'm evaluating different full-text search solutions for this project. It looks like bleve competes with the FTS extension of SQLite, and I'm not sure how the two compare. Has anyone tried both approaches? I'm mostly interested in making things work on devices as powerful as a Raspberry Pi, indexing data from a few gigs to a few hundred gigs without seeing RAM or response time going overboard. Is that even possible?
    Kevin Klein
    @0x002A
    Hey there. We are implementing a search engine using the default analyzer for the English language. If we search for the term "house" like in "white house" we are getting no matches because "house" is getting filtered out of the search query although it is not part of the stop word list. Any ideas why? @mschoch
    Marty Schoch
    @mschoch
    @titusjaka unfortunately it won't work today, as our numeric support is limited to 64bits
    there is a PR which may work, but it's not an appropriate impl, so we won't be merging that as is
    @Ronll only able to delete by IDs today, you'd have to fall back to searching for the keys satisfying your criteria and then doing batch deletes
    @0x002A no reason i can think of for house to be filtered out
    can you put together a test case showing this
    Seif Lotfy
    @seiflotfy
    @mschoch is there an easy way to just convert my current implementation to "scorch", is there a value i can set or something or do i have to reimplement things
    Marty Schoch
    @mschoch
    @seiflotfy API is the same, just pass scorch.Name as the index name and kvstore name to NewUsing()
    you'll have to build new index, no conversion
    melbourne2991
    @melbourne2991
    Hi all, can anyone tell me which language initials correspond to thai? I expected it would be "th" but that doesn't seem to exist yet the docs state that there is a prebuilt analyzer for thai language?
    melbourne2991
    @melbourne2991
    @mschoch
    Marty Schoch
    @mschoch
    @melbourne2991 'th' is correct, we use ISO-639 two-letter codes
    the package was moved out of core bleve into blevex, here
    this was done because tokenizing thai requires dictionary support which is only supported by 'icu'