    Sean Stangl
    @sean_stangl_twitter
    looking at the code it seems really simple to just make a ScorerByFastFieldReader that does that transformation internally :)
    Paul Masurel
    @fulmicoton
    Yeah can you open a ticket to add decreasing order?
    The easiest way for you is to put your transform in a TopDocs::custom_score(...)
    If you check the example you can pass it a closure... It is really < 6 lines of code
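A minimal sketch of the kind of transform being discussed, in plain Rust with no tantivy dependency. It assumes the transformation is the usual `u64::MAX - value` inversion, so that a collector which keeps the highest scores ends up returning the lowest field values first. The names `inverted_score` and `top_k_lowest_first` are hypothetical and only illustrate the idea; in tantivy the inversion would live inside the closure passed to the custom-score collector.

```rust
/// Invert a u64 so that smaller inputs produce larger scores.
/// Hypothetical helper: this is the transform, not a tantivy API.
fn inverted_score(value: u64) -> u64 {
    u64::MAX - value
}

/// Simulate a top-K collector that keeps the documents with the
/// highest scores. Input is a slice of (doc_id, field_value) pairs.
fn top_k_lowest_first(docs: &[(u32, u64)], k: usize) -> Vec<u32> {
    let mut scored: Vec<(u64, u32)> = docs
        .iter()
        .map(|&(doc, value)| (inverted_score(value), doc))
        .collect();
    // Highest score first, as a top-K collector would rank them.
    scored.sort_by(|a, b| b.0.cmp(&a.0));
    scored.into_iter().take(k).map(|(_, doc)| doc).collect()
}
```

Because the inversion is strictly decreasing over the whole `u64` range, ranking by the inverted score is the same as ranking by the original value in the opposite direction, with no overflow to worry about.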
    Sean Stangl
    @sean_stangl_twitter
    Sure! Do you have a preferred API for it: maybe an ASC/DESC enum passed into order_by_u64_field()? Happy to submit a PR.
    Paul Masurel
    @fulmicoton
    Can you open a ticket first? We can discuss the implementation there.
    Sean Stangl
    @sean_stangl_twitter
    Paul Masurel
    @fulmicoton
    thank you
    Stephen Becker IV
    @sbeckeriv
    Is there any way to get stats on my index? Number of documents, terms, size of things? People like looking at stats :)
    Paul Masurel
    @fulmicoton
    There is a space usage function that gives you details about the space usage of each field.
    meta.json contains the number of docs
    You can get all of these programmatically but not in a centralized manner.
    Sean Stangl
    @sean_stangl_twitter
    Would you have an opinion on how one might best implement a serializable, read-only index? For example, suppose there are tens of thousands of small, unrelated indexes. After creation, they are read-only.
    Ideally they would just be represented as files we could mmap, and we'd disable watch functionality and remove locking. (With hand-waving if some metadata needs to change at runtime: maybe that is stored in a real, separate file.)
    I could construct a RAMDirectory and then serialize it to some flatbuffer-like format, which is effectively the same as the HashMap<PathBuf, ReadOnlySource>, but works with the data in serialized format.
    Are there any obvious pitfalls I'm missing if I would try that?
    matrixbot
    @matrixbot
    bbigras If I want to index some orders and their items. Could there be a way to search an item while wanting to specify a filter on the parent order, like "order.client: something"? Would I need something like nested documents support?
    Sean Stangl
    @sean_stangl_twitter
    I think I need downcast_ref() support for ManagedDirectory.
    Njagi Mwaniki
    @urbanslug
    @fulmicoton WOAH! Congrats!!!
    Paul Masurel
    @fulmicoton
    @urbanslug thank you!
    jonahcwest
    @jonahcwest

    Hi all! I've come across Tantivy and it looks very promising. I had a small question, however: Is it possible to combine multiple queries together? In our application, tokenization and weighting are done on the client, so we'd like to be able to pass a list of tokens (and possibly an edit distance) and search for those without any processing done by the search library. Forgive me if I have overlooked this, but there doesn't seem to be an obvious way to use multiple queries (such as searching for multiple terms in a FuzzyTermQuery) without using a QueryParser.

    Thank you!

    Pasha Podolsky
    @ppodolsky
    Not sure I understood you correctly, but you can programmatically construct an impl Query object, something like https://github.com/tantivy-search/tantivy/blob/main/src/query/boolean_query/mod.rs#L284
    Paul Masurel
    @fulmicoton
    @jonahcwest yes you can. Check out the BooleanQuery...
    You would have to build a bunch of TermQuery and FuzzyTermQuery ... and then combine them in a union or an intersection depending on your need
    The doc is not really great, impl From<Vec<(Occur, Box<dyn Query>)>> is the one you want to use.
    If you want a union for instance, you build a BooleanQuery::from(vec![(Occur::Should, term_a), (Occur::Should, term_b), ...])
    If you want an intersection, same thing but with Occur::Must
    I really need to add helpers...
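To make the union versus intersection distinction concrete without tying it to a particular tantivy version, here is a toy model in plain Rust. `Occur`, `DocSet`, and `combine` are stand-ins defined in this sketch, not tantivy's types: each clause is reduced to the set of document ids it matches. The sketch also assumes that when any `Must` clause is present, `Should` clauses no longer affect which documents match.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for the occurrence flag attached to each clause.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Occur {
    Should,
    Must,
}

/// The set of document ids matched by one sub-query.
type DocSet = BTreeSet<u32>;

/// Combine clauses the way a boolean query would select matches:
/// all-Should gives a union, Must clauses give an intersection.
fn combine(clauses: &[(Occur, DocSet)]) -> DocSet {
    let musts: Vec<&DocSet> = clauses
        .iter()
        .filter(|(occur, _)| *occur == Occur::Must)
        .map(|(_, docs)| docs)
        .collect();

    if let Some((first, rest)) = musts.split_first() {
        // Intersection: keep docs of the first Must clause that
        // every other Must clause also contains.
        first
            .iter()
            .copied()
            .filter(|doc| rest.iter().all(|docs| docs.contains(doc)))
            .collect()
    } else {
        // Union: any doc matched by at least one Should clause.
        clauses
            .iter()
            .flat_map(|(_, docs)| docs.iter().copied())
            .collect()
    }
}
```

With clause sets {1, 2, 3} and {2, 3, 4}, two `Should` clauses select {1, 2, 3, 4} and two `Must` clauses select {2, 3}. Scoring is left out of the model entirely.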
    jonahcwest
    @jonahcwest
    @fulmicoton Thank you! That's exactly what I was looking for
    Paul Masurel
    @fulmicoton
    @jonahcwest I'm glad it helped
    jonahcwest
    @jonahcwest
    Do you provide a way to search bytes that may not be valid UTF-8? i.e., a Vec<u8> instead of a str
    Paul Masurel
    @fulmicoton
    It is available in master. You will need your own query parser however
    jonahcwest
    @jonahcwest
    I see. How would you create a Term since Term::from_bytes is private?
    Paul Masurel
    @fulmicoton
    Nice catch. So the one you want to use is Term::from_term_bytes(...) and it was indeed pub(crate). I just pushed a commit to make it public
    jonahcwest
    @jonahcwest
    That's great, thanks! If you don't mind me bugging you once again, it doesn't seem possible to index a bytes field. How would you get around that?
    Paul Masurel
    @fulmicoton
    again this is only available in master. are you working in master?
    This feature is not released yet
    jonahcwest
    @jonahcwest
    Yes, but I’m referring to creating a schema with add_bytes_field
    There is no option to index a bytes field
    jonahcwest
    @jonahcwest
    My bad. I only saw the commit you made yesterday that fixed Term and not the earlier ones for indexing byte fields. Thank you anyways!
    Paul Masurel
    @fulmicoton
    no problem
    matrixbot
    @matrixbot
    bbigras Paul Masurel (Gitter): thanks for the reply last week.
    bbigras I have another question. For one of my "problems", a solution was to use RegexQuery. It seems that I have to build the query myself, but can I still allow my users to search for something like "client:Bob.* and color:green"? I mean that if I build the query myself, I have no idea if I can handle the logic stuff like "and".
    Paul Masurel
    @fulmicoton
    Yes. Check my reply to @jonahcwest above
    You need to build a BooleanQuery.
    matrixbot
    @matrixbot
    bbigras I think I understand that you say to use booleanquery if I want to do a query myself but can I get a booleanquery from a string that my users will produce in my UI?
    Paul Masurel
    @fulmicoton
    You need to implement your own query parser. Tantivy's query parser does not have a syntax for regex queries
    matrixbot
    @matrixbot
    bbigras gotcha. thanks
    Stephen Becker IV
    @sbeckeriv
    Hello again. Is there a way to debug why a document matched a query?
    Paul Masurel
    @fulmicoton
    Have you looked at the explain output?