    Paul Masurel
    @fulmicoton
    If you want a union, for instance, you build a BooleanQuery::from(vec![(Occur::Should, term_a), (Occur::Should, term_b), ...])
    If you want an intersection, same thing but with Occur::Must
    I really need to add helpers...
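
To make the union/intersection rule above concrete, here is a minimal, self-contained sketch of the semantics. It is a toy model over sets of doc ids, not tantivy's implementation: the `Occur` enum below only mirrors the two variants named above, and `docs` and `evaluate` are hypothetical helpers. It assumes that once any `Must` clause is present, `Should` clauses no longer restrict the result set, and it ignores scoring entirely.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for the two `Occur` variants mentioned above (not tantivy's type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Occur {
    Should,
    Must,
}

/// Hypothetical helper: builds a set of doc ids from a slice.
fn docs(ids: &[u32]) -> BTreeSet<u32> {
    ids.iter().copied().collect()
}

/// Evaluates clauses over sets of doc ids.
/// With at least one `Must` clause, the result is the intersection of the `Must` sets.
/// With only `Should` clauses, the result is the union of all sets.
fn evaluate(clauses: &[(Occur, BTreeSet<u32>)]) -> BTreeSet<u32> {
    let musts: Vec<&BTreeSet<u32>> = clauses
        .iter()
        .filter(|(occur, _)| *occur == Occur::Must)
        .map(|(_, set)| set)
        .collect();
    if let Some((first, rest)) = musts.split_first() {
        rest.iter().fold((*first).clone(), |acc, set| {
            acc.intersection(set).copied().collect()
        })
    } else {
        clauses
            .iter()
            .flat_map(|(_, set)| set.iter().copied())
            .collect()
    }
}

fn main() {
    let term_a = docs(&[1, 2, 3]);
    let term_b = docs(&[2, 3, 4]);
    let union = evaluate(&[(Occur::Should, term_a.clone()), (Occur::Should, term_b.clone())]);
    let intersection = evaluate(&[(Occur::Must, term_a), (Occur::Must, term_b)]);
    println!("union = {:?}, intersection = {:?}", union, intersection);
}
```

In real code the set of doc ids for each clause would come from a term query, and the clauses would be passed to BooleanQuery::from as described above.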
    jonahcwest
    @jonahcwest
    @fulmicoton Thank you! That's exactly what I was looking for
    Paul Masurel
    @fulmicoton
    @jonahcwest I'm glad it helped
    jonahcwest
    @jonahcwest
    Do you provide a way to search bytes that may not be valid UTF-8? i.e. a ‘Vec<u8>’ instead of a ‘str’
    Paul Masurel
    @fulmicoton
    It is available in master. You will need your own query parser however
    jonahcwest
    @jonahcwest
    I see. How would you create a Term since Term::from_bytes is private?
    Paul Masurel
    @fulmicoton
    Nice catch. So the one you want to use is Term::from_term_bytes(...) and it was indeed pub(crate). I just pushed a commit to make it public
    jonahcwest
    @jonahcwest
    That's great, thanks! If you don't mind me bugging you once again, it doesn't seem possible to index a bytes field. How would you get around that?
    Paul Masurel
    @fulmicoton
    Again, this is only available in master. Are you working on master?
    This feature is not released yet
    jonahcwest
    @jonahcwest
    Yes, but I’m referring to creating a schema with add_bytes_field
    There is no option to index a bytes field
    jonahcwest
    @jonahcwest
    My bad. I only saw the commit you made yesterday that fixed Term and not the earlier ones for indexing byte fields. Thank you anyways!
    Paul Masurel
    @fulmicoton
    no problem
    matrixbot
    @matrixbot
    bbigras Paul Masurel (Gitter): thanks for the reply last week.
    bbigras I have another question. For one of my "problems", a solution was to use RegexQuery. It seems that I have to build the query myself, but can I still allow my users to search for something like "client:Bob.* and color:green"? I mean that if I build the query myself, I have no idea whether I can handle logic like "and".
    Paul Masurel
    @fulmicoton
    Yes. Check my reply to @jonahcwest above
    You need to build a BooleanQuery.
    matrixbot
    @matrixbot
    bbigras I think I understand: you are saying to use BooleanQuery if I want to build a query myself. But can I get a BooleanQuery from a string that my users will produce in my UI?
    Paul Masurel
    @fulmicoton
    You need to implement your own query parser. Tantivy's query parser does not have a syntax for regex queries
    matrixbot
    @matrixbot
    bbigras gotcha. thanks
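
As a rough illustration of the "write your own query parser" advice above, here is a minimal sketch that turns a string such as "client:Bob.* and color:green" into a list of clauses that must all match. Everything in it is hypothetical and not part of tantivy: the `Clause` type, the `looks_like_regex` heuristic, and the assumption that clauses are separated by a lowercase " and ". It has no dependency on tantivy.

```rust
/// One parsed clause of the user's query (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Clause {
    /// Exact term match, e.g. `color:green`.
    Term { field: String, value: String },
    /// Regex match, e.g. `client:Bob.*`.
    Regex { field: String, pattern: String },
}

/// Crude heuristic: treat the value as a regex if it contains a metacharacter.
fn looks_like_regex(value: &str) -> bool {
    value.chars().any(|c| ".*+?[](){}|^$\\".contains(c))
}

/// Parses `field:value and field:value ...` into clauses that must all match.
/// Returns `None` if any clause is not of the form `field:value`.
fn parse(input: &str) -> Option<Vec<Clause>> {
    input
        .split(" and ")
        .map(|part| {
            let (field, value) = part.trim().split_once(':')?;
            if field.is_empty() || value.is_empty() {
                return None;
            }
            let (field, value) = (field.to_string(), value.to_string());
            Some(if looks_like_regex(&value) {
                Clause::Regex { field, pattern: value }
            } else {
                Clause::Term { field, value }
            })
        })
        .collect()
}

fn main() {
    println!("{:?}", parse("client:Bob.* and color:green"));
}
```

From there, each `Clause::Regex` would presumably map to a RegexQuery and each `Clause::Term` to a term query, all combined with Occur::Must in a BooleanQuery as described earlier in the conversation.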
    Stephen Becker IV
    @sbeckeriv
    Hello again. Is there a way to debug why a document matched a query?
    Paul Masurel
    @fulmicoton
    Have you looked at the explain output?
    henghanan
    @henghanan
    hello, can somebody tell me why tantivy is faster than lucene?
    lyj
    @lengyijun
    Written in Rust
    Paul Masurel
    @fulmicoton
    It is not due to one single reason. It is more the sum of several performance gains that are not very well identified.
    One of them might be the machine code generated from pure Rust.
    The usage of explicit SIMD instructions is another obvious one.
    The care given to what should be a static dispatch and what can be a dynamic dispatch is another.
    On the indexer side, the data structure is noticeably different.
    For count on unions, the algorithm is better on tantivy's side.
    Finally, there are a couple of differences in how phrase queries are handled. I don't know if that makes a difference, to be honest.
    Paul Masurel
    @fulmicoton
    I don't like the simple "it is because of Rust" answer, because direct ports of Lucene are typically slower than Lucene.
    Rucene is slower, CLucene was slower...
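
For readers unfamiliar with the static versus dynamic dispatch point made above, here is a small generic Rust illustration. It is not taken from tantivy: the `Scorer` trait, `ConstScorer`, and both functions are hypothetical names chosen for this sketch.

```rust
/// Hypothetical scoring trait, used only for this illustration.
trait Scorer {
    fn score(&self, doc: u32) -> f32;
}

/// Returns the same score for every document.
struct ConstScorer(f32);

impl Scorer for ConstScorer {
    fn score(&self, _doc: u32) -> f32 {
        self.0
    }
}

/// Static dispatch: compiled once per concrete `S`, so `score` can be inlined.
fn total_static<S: Scorer>(scorer: &S, docs: &[u32]) -> f32 {
    docs.iter().map(|&doc| scorer.score(doc)).sum()
}

/// Dynamic dispatch: a single compiled copy, each `score` call goes through a vtable.
fn total_dynamic(scorer: &dyn Scorer, docs: &[u32]) -> f32 {
    docs.iter().map(|&doc| scorer.score(doc)).sum()
}

fn main() {
    let scorer = ConstScorer(1.0);
    let docs = [1, 2, 3];
    println!("{} {}", total_static(&scorer, &docs), total_dynamic(&scorer, &docs));
}
```

Both functions return the same value. The trade-off is that the generic version lets the compiler specialize and inline hot loops at the cost of more generated code, while the `dyn` version keeps the binary smaller and is usually fine for calls made once per query rather than once per document.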
    Stephen Becker IV
    @sbeckeriv
    Hello, I recently read the regex matching discussion above. I see the work on the PR https://github.com/tantivy-search/tantivy/pull/918/files#diff-44834880126ba22476c7e7ef833aab1ec4767a200661577e4dc84d1a579a4fb8R236 and I see the regex query object https://github.com/tantivy-search/tantivy/blob/5f574348d184559caa024912306bc54fac3b1086/src/query/regex_query.rs#L144 ... I think I understand how the two should work together. With this work, will some of the query parser functions become public? If I understand the query parser right, I think I want to change this line in convert literal to query https://github.com/tantivy-search/tantivy/blob/main/src/query/query_parser/query_parser.rs#L533 to check whether it looks like it could be a regex and, if so, use RegexQuery::from_pattern? Or am I very wrong about how this should work?
    Paul Masurel
    @fulmicoton

    With this work, will some of the query parser functions become public? If I understand the query parser right, I think I want to change this line in convert literal to query https://github.com/tantivy-search/tantivy/blob/main/src/query/query_parser/query_parser.rs#L533 to check whether it looks like it could be a regex and, if so, use RegexQuery::from_pattern? Or am I very wrong about how this should work?

    There is no plan to put regex into the query parser, I am afraid. You need to implement your own query parser.

    Lucene had a wildcard operator enabled by default in its query parser for quite a few versions. It was really terrible, because any website using Lucene would have this hidden feature with a horrible computational cost.
    It would make sense to make it an option when building the query parser though, and disable the regex by default.
    That's quite a bit of work however.
    Stephen Becker IV
    @sbeckeriv
    I understand. Making it configurable would be nice, and it would avoid duplicating a lot of code that you have already tested so well. I attempted to add a bool to QueryParser and now see that convert_to_query does not use the QueryParser at all.
    Stephen Becker IV
    @sbeckeriv
    Hello again. Can I explain a regex query? It does not appear to work like a boolean query. I can confirm it works, but the results are not so clear sometimes. How does scoring work for regex?
    Paul Masurel
    @fulmicoton
    If I recall correctly, the score is constant.
    lyj
    @lengyijun

    I hit an error:

    thread '<unnamed>' panicked at 'Field norm not found for field "id". Was it market as indexed during indexing.', /home/mpc/.cargo/git/checkouts/tantivy-9e77a871f83bfdf7/3aff18c/src/core/segment_reader.rs:138:13

    I spent a few hours and found that the id field is not STORED.
    I feel this error message should be improved. It should tell me that the cause may be that the field was not STORED.

    Paul Masurel
    @fulmicoton
    Can you give more context on which call it happened?
    And the version of Tantivy you use?
    lyj
    @lengyijun
    Sorry, I made a mistake.
    Stephen Eckels
    @stevemk14ebr
    @fulmicoton are you online? This is in relation to the boolean query stuff.
    There are 2049 segments. I fetch the whole document, but I really only need a single field, 'sha256', from the document. My documents are pretty simple: a STRING sha256, which represents the hash of the whole document, then for each feature of the document (~4000 of them) a STRING feature_value and a u64 FAST | INDEXED farmhash of the feature_value