    Paul Masurel
    @fulmicoton
    one is to use the regex term query
    you can search for a regex
    the second is to use the ngramfilter @drusellers implemented.
    matrixbot
    @matrixbot
    bbigras Paul Masurel (Gitter): do I need to do anything special for a regex search or just use .*oba.*?
    Paul Masurel
    @fulmicoton
    You need to do something special
    It does not work with the query parser nor with Tantivy-cli
    You need to build the query object yourself
    matrixbot
    @matrixbot
    bbigras Paul Masurel (Gitter): Thank you very much. I'll take a look tomorrow.
    Stephen Becker IV
    @sbeckeriv
    Hello again, I am following the delete/update example. I am deleting my doc by term and recreating it, and I now have duplicates. Do I need to use a shared index object for the delete and the create, like the example does? -> my code in the thread.
    Laurențiu Nicola
    @lnicola

    It's now official... I'm quitting google to found a startup with a friend around tantivy

    Congratulations and good luck!

    Paul Masurel
    @fulmicoton
    long time no see @lnicola!
    Laurențiu Nicola
    @lnicola
    Yeah, sorry about that :(. Still lurking around, though. I've worked a little on rust-analyzer, so if you're using it, it might still help tantivy indirectly :)
    Sean Stangl
    @sean_stangl_twitter
    Does Tantivy support sorting on a field instead of by score? For example, if I have a numeric column, and I would like to just filter for the Top 10 documents that match the query, as sorted by the numeric field in ascending order.
    Sean Stangl
    @sean_stangl_twitter
    As a fallback, I could make some custom Scorer that uses that field's value as the document's ConstantScore, or something to that effect?
    Sean Stangl
    @sean_stangl_twitter
    I suppose this is asking about filter queries.
    Paul Masurel
    @fulmicoton
    Yes. You can do that. Check the doc of the TopDocs collector.
    Your field needs to be a fast field.
    And in 0.13 it needs to be u64. More types will be supported in 0.14
    Does this answer your question, @sean_stangl_twitter?
    @lnicola oh yeah. I recently switched to vscode so I am a rust analyzer user now! Thanks for that!
    Sean Stangl
    @sean_stangl_twitter
    @fulmicoton Yes, thanks! Somehow managed to miss that repeatedly in the docs.
    Sean Stangl
    @sean_stangl_twitter
    I think I would still need to control sort order, but in the meantime I can work around that by storing both N and u64_max - N
    Sean Stangl
    @sean_stangl_twitter
    looking at the code it seems really simple to just make a ScorerByFastFieldReader that does that transformation internally :)
    Paul Masurel
    @fulmicoton
    Yeah can you open a ticket to add decreasing order?
    The easiest way for you is to put your transform in a TopDocs::custom_score(...)
    If you check the example, you can pass it a closure... It is really < 6 lines of code
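
To make the "store N and u64_max - N" workaround above concrete, here is a minimal standard-library sketch. It does not use tantivy's API: `top_k_largest` is a hypothetical stand-in for a collector that keeps the documents with the largest u64 value, and `flip` and `top_k_smallest` are illustrative names as well. The assumption is only that the collector ranks by "largest first", so flipping the key turns it into "smallest first".

```rust
/// Hypothetical stand-in for a collector that keeps the `k` documents
/// with the largest u64 key (this is NOT tantivy's API).
fn top_k_largest(mut docs: Vec<(u64, &str)>, k: usize) -> Vec<(u64, &str)> {
    docs.sort_by(|a, b| b.0.cmp(&a.0)); // descending by key
    docs.truncate(k);
    docs
}

/// Order-reversing transform: larger inputs map to smaller outputs.
/// Applying it twice returns the original value.
fn flip(n: u64) -> u64 {
    u64::MAX - n
}

/// Returns the `k` documents with the smallest key, in ascending order,
/// using only the "largest first" collector plus the flip transform.
fn top_k_smallest<'a>(docs: &[(u64, &'a str)], k: usize) -> Vec<(u64, &'a str)> {
    let flipped: Vec<(u64, &str)> = docs.iter().map(|&(n, id)| (flip(n), id)).collect();
    top_k_largest(flipped, k)
        .into_iter()
        .map(|(n, id)| (flip(n), id)) // undo the transform for display
        .collect()
}

fn main() {
    let docs = [(30, "a"), (10, "b"), (20, "c")];
    println!("{:?}", top_k_smallest(&docs, 2));
}
```

In the chat's terms, `flip` is the kind of transform that would either be stored as a second field or applied inside a scoring closure; which of the two fits better depends on the tantivy version in use.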
    Sean Stangl
    @sean_stangl_twitter
    Sure! Do you have a preferred API for it: maybe an ASC/DESC enum passed into order_by_u64_field()? Happy to submit a PR.
    Paul Masurel
    @fulmicoton
    Can you open a ticket first? We can discuss the implementation there.
    Sean Stangl
    @sean_stangl_twitter
    Paul Masurel
    @fulmicoton
    thank you
    Stephen Becker IV
    @sbeckeriv
    Is there any way to get stats on my index? Number of documents, terms, size of things? People like looking at stats :)
    Paul Masurel
    @fulmicoton
    There is a space usage function that gives you details about the space usage of each field.
    meta.json contains the number of docs.
    You can get all of these programmatically, but not in a centralized manner.
    Sean Stangl
    @sean_stangl_twitter
    Would you have an opinion on how one might best implement a serializable, read-only index? For example, suppose there are tens of thousands of small, unrelated indexes. After creation, they are read-only.
    Ideally they would just be represented as files we could mmap, and we'd disable watch functionality and remove locking. (With hand-waving if some metadata needs to change at runtime: maybe that is stored in a real, separate file.)
    I could construct a RAMDirectory and then serialize it to some flatbuffer-like format, which is effectively the same as the HashMap<PathBuf, ReadOnlySource>, but works with the data in serialized format.
    Are there any obvious pitfalls I'm missing if I would try that?
    matrixbot
    @matrixbot
    bbigras If I want to index some orders and their items. Could there be a way to search an item while wanting to specify a filter on the parent order, like "order.client: something"? Would I need something like nested documents support?
    4 replies
    Sean Stangl
    @sean_stangl_twitter
    I think I need downcast_ref() support for ManagedDirectory.
    1 reply
    Njagi Mwaniki
    @urbanslug
    @fulmicoton WOAH! Congrats!!!
    Paul Masurel
    @fulmicoton
    @urbanslug thank you!
    jonahcwest
    @jonahcwest

    Hi all! I've come across Tantivy and it looks very promising. I had a small question, however: is it possible to combine multiple queries together? In our application, tokenization and weighting are done on the client, so we'd like to be able to pass a list of tokens (and possibly an edit distance) and search for those without any processing done by the search library. Forgive me if I have overlooked this, but there doesn't seem to be an obvious way to use multiple queries (such as searching for multiple terms in a FuzzyTermQuery) without using a QueryParser.

    Thank you!

    Pasha Podolsky
    @ppodolsky
    Not sure I understood you correctly, but you can programmatically construct an impl Query object, something like https://github.com/tantivy-search/tantivy/blob/main/src/query/boolean_query/mod.rs#L284
    Paul Masurel
    @fulmicoton
    @jonahcwest yes you can. Check out the BooleanQuery...
    You would have to build a bunch of TermQuery and FuzzyTermQuery ... and then combine them in a union or an intersection depending on your need
    The doc is not really great, impl From<Vec<(Occur, Box<dyn Query>)>> is the one you want to use.
    If you want a union for instance, you build a BooleanQuery::from(vec![(Occur::Should, term_a), (Occur::Should, term_b), ...])
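
As a rough illustration of the union/intersection idea, here is a toy standard-library model of how a list of (Occur, clause) pairs could decide whether a document matches. It is not tantivy code: the `Occur` enum here is a local stand-in with only two variants, `doc_matches` is a hypothetical helper, and clauses are plain tokens rather than query objects. The sketch assumes that when only Should clauses are present at least one must match (a union), that every Must clause has to match (an intersection), and that Should clauses become optional once a Must clause is present.

```rust
use std::collections::HashSet;

/// Local stand-in for the clause kinds mentioned in the chat
/// (NOT tantivy's `Occur` type).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Occur {
    Should,
    Must,
}

/// Toy matcher: a clause "hits" if the document contains its token.
/// - Any failing `Must` clause rejects the document (intersection).
/// - With no `Must` clauses, at least one `Should` must hit (union).
/// - An empty clause list matches nothing.
fn doc_matches(clauses: &[(Occur, &str)], doc: &HashSet<&str>) -> bool {
    let mut has_must = false;
    let mut any_should = false;
    for &(occur, token) in clauses {
        let hit = doc.contains(token);
        match occur {
            Occur::Must => {
                has_must = true;
                if !hit {
                    return false;
                }
            }
            Occur::Should => any_should |= hit,
        }
    }
    has_must || any_should
}

fn main() {
    let doc: HashSet<&str> = ["rust", "search"].iter().copied().collect();
    let union = [(Occur::Should, "rust"), (Occur::Should, "java")];
    let intersection = [(Occur::Must, "rust"), (Occur::Must, "java")];
    println!("union matches: {}", doc_matches(&union, &doc));
    println!("intersection matches: {}", doc_matches(&intersection, &doc));
}
```

With the real library, each token would instead be wrapped in a TermQuery or FuzzyTermQuery and boxed before being passed to BooleanQuery::from, as described above.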