    Claudiu-Vlad Ursache
    @ursachec
    Hello everyone, I have yet another question, this time regarding parent/child relationships: is there a workaround to querying indexed fields from parent documents (something like parent_ref.indexed_field)? And just to throw a comment in: importing indexed fields sounds like a very useful feature to have, happy that it's on the radar.
    Jo Kristian Bergum
    @jobergum
    Hello @ursachec, you cannot import fields with index, only attribute fields. See vespa-engine/vespa#12333
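    For context, importing an attribute field from a parent looks roughly like this (schema and field names are hypothetical, not from this thread):

    ```
    # child.sd - a sketch, assuming a parent schema with an attribute field "popularity"
    schema child {
        document child {
            # reference to the parent document; must be an attribute
            field parent_ref type reference<parent> {
                indexing: attribute
            }
        }
        # only attribute fields in the parent can be imported
        import field parent_ref.popularity as popularity {}
    }
    ```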
    Claudiu-Vlad Ursache
    @ursachec
    Maybe I didn't ask the question in the right way @jobergum: I am aware that only attribute fields can be imported, I was curious if there is any other way to query indexed fields from parent documents without importing them in child documents
    Jo Kristian Bergum
    @jobergum
    Not in a brilliantly scalable way, as with parent/child and attribute fields, but multiple approaches exist. 1) Work around the limitation by tokenizing and storing the tokens in an array or weightedset field which is an attribute. The downsides are increased memory utilization and that you need to write custom code in a custom document processor. One can use https://docs.vespa.ai/documentation/linguistics.html for tokenization and stemming.
    2) Join by multiple queries: one query which searches the 'global' parent and groups by a key field which is an attribute, and a second query which limits recall with the weightedSet query operator over the foreignKey in the child document type.
    The second approach might be network intensive if too many unique keys are found, and relevancy information is lost in flight.
    It all depends on the scale you operate at, throughput and latency requirements, and document volumes
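    A rough sketch of approach 2 as two YQL queries (all document type and field names here are hypothetical):

    ```
    # Query 1: search the 'global' parent type and group on the key attribute
    select * from parent where title contains "news" |
        all(group(myKey) each(output(count())))

    # Query 2: limit child recall to the keys found above, using the
    # weightedSet operator over the foreign-key field
    select * from child where weightedSet(parent_ref, {"key1": 1, "key2": 1})
    ```

    The weights in the weightedSet are irrelevant for recall here; the operator is just used as an efficient multi-value key filter.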
    Claudiu-Vlad Ursache
    @ursachec
    That is very useful @jobergum, thank you for the input
    ddorian
    @ddorian
    Assuming you have 3 servers/hosts:
    Can you create an app with, say, 10 content clusters, all pointing to the same hosts?
    This would help in the future to split the content clusters onto their own hosts and scale them independently.
    Kristian Aune
    @kkraune
    yes you can, by mapping these in services.xml. what teams often do is run them in separate docker containers, though
    some thoughts at https://docs.vespa.ai/documentation/search-definitions.html#multiple-search-definitions - sometimes, using fewer clusters with more search definitions makes sense, too
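    A minimal services.xml sketch of two content clusters sharing the same host (cluster ids, document types and host aliases are made up for illustration):

    ```
    <content id="articles" version="1.0">
        <redundancy>1</redundancy>
        <documents>
            <document type="article" mode="index"/>
        </documents>
        <nodes>
            <node hostalias="node1" distribution-key="0"/>
        </nodes>
    </content>
    <content id="comments" version="1.0">
        <redundancy>1</redundancy>
        <documents>
            <document type="comment" mode="index"/>
        </documents>
        <nodes>
            <node hostalias="node1" distribution-key="0"/>
        </nodes>
    </content>
    ```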
    ddorian
    @ddorian
    The idea is to avoid high static query cost, which is best fixed by having 1 doc-type per content cluster (and using group etc).
    Can you live-migrate a doc-type to another content cluster?
    Jon Bratseth
    @bratseth
    Yes, but not automagically. You need to write the documents to the new content cluster, shift queries to it, then remove that document type from the old one.
    ddorian
    @ddorian
    ^ I was hoping for automagic.
    I assumed you could have multiple content clusters in the same content-node process, something like "doc-type=table, content-cluster=database", but it's not like that.
    Jon Bratseth
    @bratseth
    A cluster is a collection of content-node processes ...
    ddorian
    @ddorian
    ^correct, thank you!
    Иван Сердюк
    @oceanfish81_twitter
    Any suggestions for using Podman instead of Docker?
    Jon Bratseth
    @bratseth
    No. We plan to make that change ourselves in Q3
    Jeffrey Kayne
    @jeffkayne
    Hi everyone! Relatively new to Vespa, and struggling with writing a query to match documents containing the special character "&". I am trying to use "\x38" but without success. (https://docs.vespa.ai/documentation/reference/document-select-language.html ). Could anyone help me out? Thanks!
    Jon Bratseth
    @bratseth
    Just using a quoted string containing & works fine in the document selection language @jeffkayne, e.g. mydocument.myfield = "&"
    Since you mention "queries" though - are you sure you are looking in the right place? Document selection is for content management (feeding, dumping data, garbage collection), not for issuing queries over the data.
    Jeffrey Kayne
    @jeffkayne
    Hi Jon, thanks for the response! Indeed you are right, I was looking in the wrong place. To recap, I am trying to query the data (with YQL) to find all documents that match the string "&Free".
    The exact query I'm executing is:
    "SELECT text_raw FROM sources post WHERE text_raw CONTAINS '&free';"
    This query only filters documents that contain the string "free", not "&free".
    Cheers
    Jon Bratseth
    @bratseth
    This is because & is classified as punctuation, not a word character, so it is stripped during tokenization (in text indexes). If this is a problem for you I suggest creating an issue for it on https://github.com/vespa-engine/vespa
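    If matching the whole field value exactly is acceptable, one possible workaround (my assumption, not something suggested in this thread) is to disable tokenization with exact matching in the schema:

    ```
    # a sketch; field name taken from the query above
    field text_raw type string {
        indexing: index | summary
        # the field value is matched as a single exact token,
        # so characters like "&" are preserved instead of being
        # stripped during tokenization
        match {
            exact
        }
    }
    ```

    Note this changes matching semantics: only exact full-value matches will be found, so it only fits fields queried as whole strings.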
    Vlad
    @vfil
    Hi! I am trying to run some integration tests for my Vespa app. Unfortunately, the platform has restrictions on the number of open files.
    I am trying to set the VESPA_UNPRIVILEGED var, but for some reason the start script tries to run the ulimit -n command anyway. Do you have any hints on how to use the VESPA_UNPRIVILEGED or file_descriptor_limit env vars?
    Vlad
    @vfil
    I am running vespa from vespaengine/vespa:7.104.46 docker image
    Arnstein Ressem
    @aressem
    @vfil I would expect this to work. What are you setting VESPA_UNPRIVILEGED to? How do you expose this env variable to the startup scripts?
    Arnstein Ressem
    @aressem
    I would also recommend submitting an issue at https://github.com/vespa-engine/vespa/issues with enough details for us to reproduce the problem. Vespa 7.104.46 is also very old (although from what I have seen I would expect VESPA_UNPRIVILEGED=yes to work).
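    For reference, one way to make the variable visible to the startup scripts is to set it in the container environment at run time (image tag taken from the thread; a sketch, not the full fix discussed later):

    ```
    docker run -d -e VESPA_UNPRIVILEGED=yes vespaengine/vespa:7.104.46
    ```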
    Vlad
    @vfil

    Thanks, @aressem! I tried to reproduce on my local:

    ~:10:11:16> docker run -it --entrypoint="" vespaengine/vespa:7.104.46 bash
    [root@a0fb8519b22e /]# ulimit -n 65000
    [root@a0fb8519b22e /]# export VESPA_UNPRIVILEGED=yes
    [root@a0fb8519b22e /]# /usr/local/bin/start-container.sh &
    [1] 16
    [root@a0fb8519b22e /]# hostname: you must be root to change the host name
    Running /opt/vespa/libexec/vespa/start-configserver
    Creating data directory /opt/vespa/conf/zookeeper
    Creating data directory /opt/vespa/var/zookeeper
    Creating data directory /opt/vespa/var/zookeeper/version-2
    /opt/vespa/libexec/vespa/common-env.sh: line 178: ulimit: open files: cannot modify limit: Operation not permitted

    Good suggestion, will try with the latest version and then will submit an issue.

    Vlad
    @vfil
    have the same issue with vespaengine/vespa:7.187.1
    Jo Kristian Bergum
    @jobergum
    Could you open a GitHub issue for it?
    Vlad
    @vfil
    sure @jobergum, doing it now
    Jo Kristian Bergum
    @jobergum
    Generally, since Vespa can use a lot of files (depending on the number of files etc.), running with low default ulimit settings for the number of open files might be considered experimental
    I meant, number of fields
    Vlad
    @vfil
    my case is running some integration tests on a continuous integration platform (CircleCI) with some generated fixtures. The CI job runs in a docker container with a low open-files ulimit configuration (~65k) and I don't have the option to control this, so I decided to go with VESPA_UNPRIVILEGED=yes to skip the ulimit -n step on startup.
    what I noticed is that the vespa processes run as the vespa user even if I start the process as root or another user.
    Arnstein Ressem
    @aressem
    @vfil I have described a solution in that ticket.
    Vlad
    @vfil
    thanks, @aressem. it works for me!
    quangdanh
    @quangdanh
    Hello everyone!
    I'm facing an issue when feeding data
    I posted the question here: vespa-engine/vespa#12661
    Jon Bratseth
    @bratseth
    Answered there
    quangdanh
    @quangdanh
    Thanks @bratseth. I replied to the message.
    rameshpoti
    @rameshpoti
    Installed Vespa version 7.197.21 on Red Hat Enterprise Linux Server release 7.7 (Maipo) and am getting a JVM core dump:
    Java VM: OpenJDK 64-Bit Server VM (11.0.6+10-LTS)
    Stack: [0x00007fa0308d0000,0x00007fa0309d1000], sp=0x00007fa0309cd760, free space=1013k
    Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
    C [jna14224380065926013414.tmp+0x13095] ffi_prep_closure_loc+0x15
    C [jna14224380065926013414.tmp+0xa3cc] Java_com_sun_jna_Native_registerMethod+0x51c
    j com.sun.jna.Native.registerMethod(Ljava/lang/Class;Ljava/lang/String;Ljava/lang/String;[I[J[JIJJLjava/lang/reflect/Method;JIZ[Lcom/sun/jna/ToNativeConverter;Lcom/sun/jna/FromNativeConverter;Ljava/lang/String;)J+0
    j com.sun.jna.Native.register(Ljava/lang/Class;Lcom/sun/jna/NativeLibrary;)V+1117
    j com.sun.jna.Native.register(Ljava/lang/Class;Ljava/lang/String;)V+17
    j com.sun.jna.Native.register(Ljava/lang/String;)V+7
    j com.yahoo.io.NativeIO.<clinit>()V+28
    Bogdan Snisar
    @bsnisar

    Hello guys, could you please give me some advice.

    TL;DR

    • we have a feature that should return some evaluated ranks in the response
    • for that reason the idea is to use summary-features

    This idea raises questions that the docs don't fully answer, so I am here.
    Example:

    
    rank-profile arithmetic_mean inherits abstract_weighted_mean {
            summary-features {
                ...
                queryMatchScoreFn
                popularScoreFn
                undiscoveredScoreFn
                ...
            }
            first-phase {
                expression: .... 
            }
            second-phase {
                expression: ....
            }
    }

    My questions:

    1. Will the evaluations for summary-features be executed in the first phase, making it heavier?
    2. Are there any other ways to return evaluated ranks?

    Thanks!

    Jon Bratseth
    @bratseth
    1. They are only evaluated for hits that will actually be returned, so there's no waste here.
    2. Using summary-features is the right way of doing it.
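    For reference, the declared summary-features come back per hit under `summaryfeatures` in the result JSON, roughly like this (values are hypothetical; function names taken from the example above):

    ```
    "fields": {
        "summaryfeatures": {
            "queryMatchScoreFn": 0.83,
            "popularScoreFn": 12.0,
            "undiscoveredScoreFn": 0.05
        }
    }
    ```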
    Bogdan Snisar
    @bsnisar
    Thanks @bratseth