    Jo Kristian Bergum
    @jobergum
    Thanks for the clarification @agusgun. Yes, WordItem tokenItem = new WordItem("coronavirus", "default", true) will resolve this problem.
    If isFromQuery is false, the stemming searcher will not stem the word; it is then assumed that everything has already been taken care of.
    Jo Kristian Bergum
    @jobergum
    @agusgun Also consider using WeakAndItem instead of OrItem if your collection is large
    Ferreira Remi
    @remi.ferreira_gitlab
    Hello everyone, I have a question about empty fields. I came across this issue: vespa-engine/vespa#9835. Do you have any news on that? I would like to build a search query that excludes documents where a specified field is empty.
    Jo Kristian Bergum
    @jobergum
    Hello @remi.ferreira_gitlab, no updates on this. The workaround is to use a chosen default value to represent empty.
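A minimal Python sketch of this default-value workaround, for a single-value string field. The sentinel `emptyvalue`, the field name `category` in the usage lines, and both helper functions are hypothetical, chosen only for illustration. The YQL shape assumes the query has a positive term to combine the negation with, and that the sentinel is a value that can never occur as real data.

```python
# Hypothetical sentinel that stands in for "this field is empty".
EMPTY_SENTINEL = "emptyvalue"


def with_sentinel(fields, field_name):
    """Return a copy of the document fields where a missing or empty
    string value for field_name is replaced by the sentinel before feeding."""
    doc = dict(fields)
    value = doc.get(field_name)
    if value is None or value == "":
        doc[field_name] = EMPTY_SENTINEL
    return doc


def yql_excluding_empty(user_term, field_name):
    """Build a YQL string that matches user_term in the default fieldset
    and excludes documents whose field holds the sentinel."""
    escaped = user_term.replace("\\", "\\\\").replace('"', '\\"')
    return (
        'select * from sources * where default contains "%s" '
        'and !(%s contains "%s");' % (escaped, field_name, EMPTY_SENTINEL)
    )


print(with_sentinel({"title": "some title"}, "category"))
print(yql_excluding_empty("coronavirus", "category"))
```

The point of the design is that the exclusion becomes an ordinary query term, so it is evaluated during matching rather than in ranking.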
    Ferreira Remi
    @remi.ferreira_gitlab
    Thanks for the reply @jobergum. Can I filter those documents in ranking during match_phase? If yes, how would it look?
    Something like: attribute(name).count > 0?
    Jo Kristian Bergum
    @jobergum
    @remi.ferreira_gitlab yes you can, but it's not very efficient:
    rank-profile drop inherits default {
        first-phase {
            rank-score-drop-limit: 0
            expression: if(attribute(foo).count == 0, -1, nativeRank)
        }
    }
    In this case the if expression assigns the score -1 if count is 0, otherwise nativeRank. But if this drops a lot of documents, you would be much better off expressing this in the query.
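To make the effect of the drop limit concrete, here is a small Python simulation of that first-phase expression. It is not Vespa code: the function names and sample documents are made up, and it assumes that hits whose first-phase score is less than or equal to the limit are dropped.

```python
def first_phase_score(element_count, native_rank):
    """Mirror of: if(attribute(foo).count == 0, -1, nativeRank)."""
    return -1.0 if element_count == 0 else native_rank


def surviving_hits(docs, drop_limit=0.0):
    """Keep only documents whose score is strictly above the drop limit."""
    kept = []
    for doc in docs:
        score = first_phase_score(len(doc["foo"]), doc["native_rank"])
        if score > drop_limit:
            kept.append((doc["id"], score))
    return kept


docs = [
    {"id": "a", "foo": ["x"], "native_rank": 0.4},
    {"id": "b", "foo": [], "native_rank": 0.9},  # empty field, scored -1
    {"id": "c", "foo": ["y", "z"], "native_rank": 0.1},
]
print(surviving_hits(docs))  # [('a', 0.4), ('c', 0.1)]
```

Under the same assumption, a document with a non-empty field but a nativeRank of exactly 0 would also be dropped, which is worth checking before relying on a limit of 0.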
    Ferreira Remi
    @remi.ferreira_gitlab
    Ok, I get the idea... Thanks a lot!
    jblankfeld
    @jblankfeld
    Hi guys, I am trying to fetch features with a random ranking function in order to improve recall of relevant docs.
    My index is quite large and my select function rather broad, meaning that I match many documents on average. This turns out to be very resource hungry and thus scales poorly in a job. I would like to use match degradation with soft timeout, since it's just a random ranking function anyway, but this is difficult because listing features adds a timeout of 360s that I cannot override. Any idea how to improve this?
    Jo Kristian Bergum
    @jobergum
    If you are using a subset of the features which are dumped by &rankfeatures=true, you can add them to summary-features in your ranking profile, e.g. rank-profile dump; then you won't have this 360-second timeout.
    Or add a searcher which resets the timeout.
    jblankfeld
    @jblankfeld
    I tried your second suggestion but it is overridden anyway; I guess there is a component coming after my searcher doing this. Could I specify that my searcher runs after the override?
    Will try summary-features too! Thanks a lot for the tip.
    Agus Gunawan
    @agusgun

    Hi everyone, I want to ask about my use case of adding custom topic tagging to text search.
    Suppose we have a text document with an additional array of strings that represents the top 3 topics:

    text: coronavirus is a dangerous virus
    topics: ["virus", "health", "species"]

    What I want is a custom query processor that adds the topics to the query and matches them against the document topics. However, the match should only act as an optional filter that boosts the score when the topics match. Any idea how to do this?

    Example of a query:

    query: coronavirus
    topics: ["virus", "species"]
    Jo Kristian Bergum
    @jobergum
    @agusgun You can use the rank query operator for this: where rank(default contains "coronavirus", topics contains "virus", topics contains "species");
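As a sketch of how a client could assemble such a query from the user term and the topic list, here is a small Python helper. The function names and the default field names passed as parameters are illustrative assumptions; only the resulting rank(...) clause comes from the message above.

```python
def quote(term):
    """Escape backslashes and double quotes for a YQL string literal."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def rank_yql(query_term, topics, text_field="default", topic_field="topics"):
    """Build a query using rank(): the first argument decides which
    documents match, the remaining arguments only contribute to ranking."""
    args = ["%s contains %s" % (text_field, quote(query_term))]
    args += ["%s contains %s" % (topic_field, quote(t)) for t in topics]
    return "select * from sources * where rank(%s);" % ", ".join(args)


print(rank_yql("coronavirus", ["virus", "species"]))
# select * from sources * where rank(default contains "coronavirus", topics contains "virus", topics contains "species");
```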
    Jon Bratseth
    @bratseth
    See also vespa-engine/vespa#13558 which answers the same question.
    Agus Gunawan
    @agusgun
    Thank you for the guidance @jobergum @bratseth
    Jo Kristian Bergum
    @jobergum
    To add to the above, the rank() query operator is great for free text, e.g. free user input, rather than a fixed vocabulary. If you have a predefined vocabulary/taxonomy/labels, using tensors as in the mentioned issue is IMHO a cleaner solution.
    Agus Gunawan
    @agusgun
    Thank you Jo. I will try the tensor solution for now.
    Jo Kristian Bergum
    @jobergum
    You also then need to design a ranking profile @agusgun which uses a combination of the text matching rank features from the original query and the topics tensor scoring.
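A rough Python sketch of what such a combined score could compute. It models the topic tensors as sparse label-to-weight maps and takes their dot product, roughly what a rank expression along the lines of sum(query(topics) * attribute(topics)) would give if both tensors share one mapped dimension. The linear combination, the weight of 0.5, and all names here are assumptions for illustration, not a recommended profile.

```python
def sparse_dot(query_topics, doc_topics):
    """Sum of products over the labels present in both sparse tensors."""
    return sum(
        weight * doc_topics[label]
        for label, weight in query_topics.items()
        if label in doc_topics
    )


def combined_score(text_score, query_topics, doc_topics, topic_weight=0.5):
    """Text-matching score plus a weighted topic-overlap score."""
    return text_score + topic_weight * sparse_dot(query_topics, doc_topics)


query_topics = {"virus": 1.0, "species": 1.0}
doc_topics = {"virus": 1.0, "health": 1.0, "species": 1.0}

print(sparse_dot(query_topics, doc_topics))            # 2.0
print(combined_score(0.25, query_topics, doc_topics))  # 1.25
```

A document with no topic overlap keeps its text score unchanged, so the topics act as a boost rather than a filter.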
    Agus Gunawan
    @agusgun
    Yes, I will try to combine the text matching and the topics tensor scoring. Btw, is match:prefix supported for the keys of the tensor?
    Jo Kristian Bergum
    @jobergum
    No, tensors cannot be used for matching (except for nearest neighbor search)
    You can think of tensors like torch.tensors, they are useful for math computations, not for string matching
    Agus Gunawan
    @agusgun
    I got it. So the first solution, using matching, is better if I want to use match:prefix, right?
    Jo Kristian Bergum
    @jobergum
    Yes, then it smells more like free-form input and not predefined labels/taxonomy.
    François Weber
    @francoisWeber
    Hi Vespa Team
    I am trying to get an exhaustive list of signals that Vespa can compute on a given doctype for learning-to-rank purposes. To retrieve these features, I send a request to Vespa with "listFeatures": true in the request's JSON. However, it appears that many computable signals are missing (e.g. bm25(ANY_FIELD)). Is there a way to know in advance the exhaustive list of computable signals? If I knew this list, I would use it through summary-features to make them accessible. Thanks for your help :)
    Jon Bratseth
    @bratseth
    bm25 is not included because you must enable-bm25 on the field. The full list is here: https://docs.vespa.ai/documentation/reference/rank-features.html
    François Weber
    @francoisWeber
    Thanks for your quick answer. Most of my fields are already bm25-enabled actually
    Jon Bratseth
    @bratseth
    Yes, perhaps this list should depend on that, but currently it does not.
    IMHO, it is better to manually select a subset of features.
    François Weber
    @francoisWeber
    OK, thanks for your advice about manual selection!
    Gregory Bondar
    @jamesbond7
    Good day to everyone! Please advise how to implement misspelling and typo correction that works similarly to Azure's fuzzy search: https://docs.microsoft.com/en-us/azure/search/search-query-fuzzy
    Agus Gunawan
    @agusgun
    Hi @jamesbond7, maybe this SO answer solves the question: https://stackoverflow.com/questions/54760470/is-there-a-spell-checker-in-vespa
    Jo Kristian Bergum
    @jobergum
    Yes, there is no fuzzy/edit distance query support or query syntax in Vespa. Spell checking is slightly different IMHO, as you would rewrite the query before hitting the content nodes. There is n-gram matching with configurable n, and also regular expression support for fields which are in memory (attribute fields).
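To illustrate why n-gram matching can tolerate some typos, here is a small Python sketch that splits strings into character trigrams and measures how many of the query's grams a candidate shares. This only demonstrates the idea; it is not how Vespa implements gram matching, and the function names and the overlap measure are made up for the example.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased string."""
    text = text.lower()
    if len(text) <= n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def gram_overlap(query, candidate, n=3):
    """Fraction of the query's n-grams that also occur in the candidate."""
    query_grams = ngrams(query, n)
    return len(query_grams & ngrams(candidate, n)) / len(query_grams)


print(gram_overlap("coronavirus", "coronavirus"))  # 1.0
print(gram_overlap("coronavrius", "coronavirus"))  # 5 of 9 trigrams shared
print(gram_overlap("coronavrius", "lawsuit"))      # 0.0
```

The transposed letters only disturb the grams around the typo, so the misspelled query still shares most of its grams with the intended word.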
    Gregory Bondar
    @jamesbond7
    @jobergum @agusgun Thanks for the advice!
    Jo Kristian Bergum
    @jobergum
    @jamesbond7 I've seen this question a few times so I've created vespa-engine/vespa#13814
    Agus Gunawan
    @agusgun
    Hi everyone, I have a question about the security model in Vespa Cloud. Can we disable the API certificate in the data plane, so we don't need to provide a certificate for it? If not, is there any workaround for this? Maybe change the data plane to use an API key.
    Jo Kristian Bergum
    @jobergum
    @agusgun It's not possible to disable mTLS for cloud.vespa.ai data plane (read and write).
    Bogdan Snisar
    @bsnisar

    Hello!

    I have a question about grouping on multiple values of a multivalue field.
    According to https://docs.vespa.ai/documentation/reference/grouping-syntax.html#multivalue-attributes

    Multivalue fields such as maps can be used in grouping, but there is no way to apply an aggregation to the values of each grouped key.

    all( group(mymap.key) each(output(max(mymap.value))) )

    Is there any workaround that gives the desired result (group by key, aggregate over the values of each key)?

    Bogdan Snisar
    @bsnisar
    One more detail: I am only interested in computing max(value) within the group for each key.
    Thanks
    Jo Kristian Bergum
    @jobergum
    Hi @bsnisar, sorry, but there is no workaround for this at scale.
    Marcel Neuhausler
    @marcelneu_gitlab
    It feels like I am looking in the wrong direction, or have blinders on. There is a use case I would like to solve with Vespa but I don't know how: I would like to rank based on how many words from a predefined set of keywords occur in a text. Example: given the set of words 'lawsuit, convicted, ..', rank on how many of those words show up in a text field. This is not the number of times one word shows up, but the number of distinct words found in the text. Highest ranked: all of those words show up at least once in the text field.
    Marcel Neuhausler
    @marcelneu_gitlab
    .. and we would like to do this at query/serving time so we can change those word-lists dynamically/on-demand.
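To pin down the scoring semantics described above, here is a small Python sketch of the metric itself: the number of distinct keywords that occur at least once in the text. It says nothing about how Vespa would evaluate this at serving time; the function name, the simple word tokenization, and the sample text are assumptions for illustration.

```python
import re


def distinct_keyword_hits(text, keywords):
    """Count how many distinct keywords occur at least once in the text.

    Repeated occurrences of the same keyword are counted once.
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    return len({keyword.lower() for keyword in keywords} & tokens)


text = "The convicted man filed a lawsuit. The lawsuit was dismissed."
print(distinct_keyword_hits(text, ["lawsuit", "convicted", "fraud"]))  # 2
```

Because the keyword list is just an argument, it can change per request, which matches the wish to swap word lists at query time.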
    Agus Gunawan
    @agusgun
    Hello,
    For the parent-child relationship, can we create an array of references?
    Kristian Aune
    @kkraune
    Not the expert, but I am quite sure you cannot. I am not sure how an array of references would be used to import field values. Can you share your use case?
    Agus Gunawan
    @agusgun

    My case: suppose one content item belongs to some groups.
    A group has an id and a text field, which is the name of the group.

    Later, I want to fill the content with the names of all the related groups. However, a group name can change quickly. I want to use references to solve this, so I don't need to include the group field in the content and update each related content item whenever a group name changes.

    Kristian Aune
    @kkraune
    Now I get it, thanks! It is a good use case. I am quite sure it is not supported, but this is not well documented, so we should anyway fix that, too. Can you please create an issue at https://github.com/vespa-engine/vespa/issues - we will use this as a feature request, and can also discuss workarounds/other options in that ticket, too (including updating the documentation/FAQ)
    Chris Nell
    @raincoastchris

    Is there a way to access summary features from inside a grouping aggregator expression?

    More specifically, we'd like to be able to access attributeMatch(<attribute>) from inside the order(...) clause of a grouping expression when using a rank profile that declares attributeMatch(<attribute>) as a summary-feature. I see the value I want to use when I output summary() for the group hits, but I'd like to sort groups on e.g. the average of those.

    I saw vespa-engine/vespa#12792 is open which seems related, but a bit more general.