    Paul Masurel
    @fulmicoton
    The original from and to indices you extracted your token from
    So the offset in bytes of the relevant bit in the HTML
    If this is difficult, just set it to 0
    It is only used for highlighting
    Paul Masurel
    @fulmicoton
    You need to populate the text and the position though
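    For reference, a minimal sketch of what populating such a token might look like, assuming the fields in question are the ones on tantivy's tokenizer::Token (the byte offsets feed highlighting only; text and position must be filled in):

```rust
use tantivy::tokenizer::Token;

// Sketch of a token for a word that is the 4th token (position 3) of the source text.
fn example_token() -> Token {
    Token {
        offset_from: 0,           // byte offset where the token starts in the original text;
        offset_to: 0,             // only used for highlighting, so 0 is acceptable if hard to compute
        position: 3,              // token position: must be populated
        text: "rust".to_string(), // token text: must be populated
        ..Token::default()        // any remaining fields keep their defaults
    }
}
```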
    petr-tik
    @petr-tik
    Hey, I've been snowed in, hence the relative silence. Should have time this weekend to dig into this
    Paul Masurel
    @fulmicoton
    no problem
    the feature is more or less implemented on #772 now
    it is missing docs & unit tests
    matrixbot
    @matrixbot
    Matthew: tantivy-search/tantivy#777 just bit us fairly nastily in matrixland
    Paul Masurel
    @fulmicoton
    The issue description is awesome
    I'll have a look at it soonish
    xlzheng021
    @xlzheng021
    Hi, would it be possible to merge two or more different sets of search indexes?
    Paul Masurel
    @fulmicoton
    if the schema is the same, yes
    but you will need to do some operations manually
    if the resulting index is meant to be read-only, copy the files and edit meta.json
    if it is writable, you need to edit managed.json as well
    xlzheng021
    @xlzheng021
    Thanks Paul! Yes, the schemas are the same. I was looking for some tool or way to consolidate the indexes, the way Lucene does: https://lucene.apache.org/solr/guide/6_6/merging-indexes.html
    So after the merge, it's up to tantivy to decide when segments get merged, right?
    Paul Masurel
    @fulmicoton
    once you have created an index that is the union of your segments, and edited managed.json (it should contain the list of all the files) and meta.json
    if you open that index and use an IndexWriter, tantivy will do the merging when it thinks it is a good idea
    it is important to edit the managed.json file because tantivy only garbage collects files that are listed in this file
    unfortunately it might need some kind of event to consider doing this merging.
    typically a commit.
    There is no easy way to trigger it
    (without committing)
    and I don't know if it happens if you do an empty commit... I suspect it does
    you can also manually select the segments you want to merge and ask tantivy to merge them
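    For the last point, a rough sketch of selecting segments and asking tantivy to merge them, assuming searchable_segment_ids and IndexWriter::merge (the exact return type of merge differs between tantivy versions):

```rust
use tantivy::Index;

// Sketch: merge every searchable segment of `index` into one,
// without waiting for tantivy's merge policy to decide on its own.
fn merge_all_segments(index: &Index) -> tantivy::Result<()> {
    let segment_ids = index.searchable_segment_ids()?;
    let mut index_writer = index.writer(50_000_000)?; // 50 MB writer heap
    // Schedules the merge on the writer's merge threads; the returned
    // future/handle (version dependent) is not needed here.
    let _ = index_writer.merge(&segment_ids);
    // Block until pending merges are done before dropping the writer.
    index_writer.wait_merging_threads()?;
    Ok(())
}
```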
    XBagon
    @XBagon_gitlab

    Stumbling over a huge problem, and not sure if it has anything to do with tantivy, but I'm running out of ideas..

    Error: Error while loading searcher after commit was detected. IOError(IOError { path: Some("index/c3ccce920be14099ba687c183c47889e.store"), err: Os { code: 12, kind: Other, message: "Cannot allocate memory" } })
    thread '<unnamed>' panicked at 'failed to allocate an alternative stack', src/libstd/sys/unix/stack_overflow.rs:132:13
    thread '<unnamed>' panicked at 'failed to allocate an alternative stack', src/libstd/sys/unix/stack_overflow.rs:132:13
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    Warning: Merge of [Seg("28232c89"), Seg("eb779fb5"), Seg("6302ddb0"), Seg("3abfb5bf"), Seg("4f2498d5"), Seg("9d0810a7"), Seg("ef9890cd"), Seg("3e54bad1")] was cancelled: IOError(IOError { path: Some("index/28232c8956c8477aa11c20c334433c62.fast"), err: Os { code: 12, kind: Other, message: "Cannot allocate memory" } })
    fatal runtime error: failed to initiate panic, error 5
    fatal runtime error: failed to initiate panic, error 5
    Aborted (core dumped)

    This is the error message for the error that occurs seemingly randomly after inserting over 3 million documents. RUST_BACKTRACE is already set to 1; no idea why it says that. I compiled everything with debug information, but sadly this is all the information I get. I had the same problem before without the fatal errors and the abort; no idea why they appear now.

    fdb-hiroshima
    @fdb-hiroshima
    code 12 is ENOMEM, and given the "failed to allocate an alternative stack", you probably ran out of RAM at some point, so the mmap failed
    XBagon
    @XBagon_gitlab
    It used like 10 GB out of 30
    I watched with htop most of the time
    and I doubt it went from 10 to 30 in very little time, as it went up very slowly and linearly
    Paul Masurel
    @fulmicoton
    Wild guess.
    You are not batching commits?
    Can you check your number of running threads and the number of files in your index?
    XBagon
    @XBagon_gitlab
    do you mean the amount of documents or literal files in my index folder?
    freshly started, 34 threads and the files in the folder are increasing, currently at 92
    the one time it went up to 4 million documents it had 141 files
    more at some points of the process though
    Paul Masurel
    @fulmicoton
    you do not commit after each added document do you?
    matrixbot
    @matrixbot
    poljar: perhaps it's the issue we're having, keeping a reader alive while committing?
    poljar: without searching
    XBagon
    @XBagon_gitlab
    I commit every 10 seconds, is that too often?
    Paul Masurel
    @fulmicoton
    no it sounds ok
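    For context, committing on a timer rather than per document looks roughly like this (a sketch with a made-up one-field schema; exact signatures vary a little across tantivy versions):

```rust
use std::time::{Duration, Instant};
use tantivy::schema::{Schema, TEXT};
use tantivy::{doc, Index};

fn main() -> tantivy::Result<()> {
    // Hypothetical one-field schema, just to make the sketch self-contained.
    let mut schema_builder = Schema::builder();
    let body = schema_builder.add_text_field("body", TEXT);
    let index = Index::create_in_ram(schema_builder.build());
    let mut index_writer = index.writer(500_000_000)?; // ~500 MB indexing buffer

    let mut last_commit = Instant::now();
    for i in 0..3_000_000u64 {
        // add_document only buffers the document in memory.
        index_writer.add_document(doc!(body => format!("document number {}", i)));
        // Commit roughly every 10 seconds instead of after every document.
        if last_commit.elapsed() >= Duration::from_secs(10) {
            index_writer.commit()?;
            last_commit = Instant::now();
        }
    }
    index_writer.commit()?;
    Ok(())
}
```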
    Paul Masurel
    @fulmicoton
    You can check if #779 solves your problem too
    XBagon
    @XBagon_gitlab
    okay thanks! I will try that
    Paul Masurel
    @fulmicoton
    it is in master now
    Paul Masurel
    @fulmicoton
    there is a freelancer job about tantivy on upwork apparently https://www.upwork.com/job/Prototyping_~01f7550e034bd7fb96/