Fergus McDowall
@fergiemcdowall
Yes. SQLite is an ACID-compliant data store - nothing should ever disappear or go corrupt in SQLite under normal circumstances. search-index is a type of database that is specifically optimised for fast linguistic search (with aggregation and token matching).
So it's pretty common to store data in a "proper" database (Postgres, MySQL, Oracle, SQLite, etc.) and then feed that data into a search index (search-index, Elasticsearch, Solr, norch, etc.).
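A rough sketch of that kind of setup (assuming better-sqlite3 on the SQLite side; the articles table and its columns are made up for illustration):

const Database = require('better-sqlite3')
const si = require('search-index')

;(async () => {
  // SQLite stays the source of truth...
  const sqlite = new Database('app.db', { readonly: true })
  const rows = sqlite.prepare('SELECT id, title, body FROM articles').all()

  // ...and the search index is (re)built from it
  const index = await si({ name: 'articles-index' })
  await index.PUT(
    rows.map(row => ({
      _id: String(row.id), // reuse the SQLite primary key as the document _id
      title: row.title,
      body: row.body
    }))
  )
})()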
goodev2021
@goodev2021
How do I create a DB structure to store a multi-level hierarchy where the hierarchy depth is not static? Does anyone have any suggestions?
Fergus McDowall
@fergiemcdowall
@goodev2021 was that a search-index question?
goodev2021
@goodev2021
What is the difference between using Apache Lucene in a mobile app vs using search-index? I saw a few apps using Apache Lucene inside a mobile app for offline search, but when I tried the search it was pretty slow... so which one is faster, given that both libraries do indexing and search?
The decision has to be taken before implementation: plain SQLite, search-index, or Apache Lucene inside the app.
Fergus McDowall
@fergiemcdowall
Hi @goodev2021 - that is a decision that you need to take on your own depending on your own use cases :) I have never done a direct comparison between the three, but would be really interested in hearing about the results if anybody else has.
goodev2021
@goodev2021
I'm looking for an Android mobile app - open source or commercial - to search huge PDFs. I found one for iOS - https://pdfsearch.app/ - but couldn't find one for Android. Can anyone suggest one? Is there any software, template, or mobile app that does this for Android?
Jeff See
@jeffsee55

:wave: I'm investigating search-index for a library and am wondering how it handles searching in array values; here's the data I'm indexing:

{
  "name": "Homer Simpson",
  "favoritePosts": [
    "content/posts/welcome3.md"
  ],
  "_id": "content/authors/homer.md"
}

And here's my query (sorry for the length):

{
  "AND": [
    {
      "FIELD": [
        "favoriteposts"
      ],
      "VALUE": {
        "GTE": "contentpostswelcome3md",
        "LTE": "contentpostswelcome3md"
      }
    },
    {
      "FIELD": [
        "name"
      ],
      "VALUE": {
        "GTE": "homer",
        "LTE": "homer"
      }
    }
  ]
}

Without the favoriteposts portion, things work as expected. I've seen a few things in the GitHub issues about this, but none of them quite seem to match my use case.

I've tried various things with the casing and special characters of the field and values, but nothing seems to work (i.e. favoriteposts / favoritePosts and contentpostswelcome3md / content/posts/welcome3.md).
Also - it seems like other LevelDB libraries like PouchDB have fallen short of providing anything around this type of array searching. I would be OK with altering the data I'm indexing to be more like this, if it'd make sense:
{
  "name": "Homer Simposon",
  "favoritePosts.0": "content/posts/welcome3.md",
  "favoritePosts.1": "content/posts/welcome4.md",
  "_id": "content/authors/homer.md"
}
Jeff See
@jeffsee55

EDIT: I do get valid results when I don't strip casing and special characters in the VALUE section, so this works:

{
  "AND": [
    {
      "FIELD": [
        "favoriteposts"
      ],
      "VALUE": {
        "GTE": "content/posts/welcome3.md",
        "LTE": "content/posts/welcome3.md"
      }
    },
    {
      "FIELD": [
        "name"
      ],
      "VALUE": {
        "GTE": "homer",
        "LTE": "homer"
      }
    }
  ]
}

This is confusing to me because when I don't have an array for favoritePosts I do need to strip special characters

Fergus McDowall
@fergiemcdowall
Hi @jeffsee55, and thanks for reaching out - you have found a bug:
basically favoriteposts finds hits, but favoritePosts (capital 'P') doesn't
if you initialize search-index with caseSensitive: true, then the field name favoritePosts works as expected
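A minimal sketch of that workaround (the index name here is made up):

;(async () => {
  const si = require('search-index')

  // caseSensitive: true keeps field names (and values) exactly as supplied,
  // so 'favoritePosts' is not lowercased to 'favoriteposts'
  const db = await si({ name: 'case-sensitive-example', caseSensitive: true })

  await db.PUT([
    {
      _id: 'content/authors/homer.md',
      favoritePosts: ['content/posts/welcome3.md']
    }
  ])

  const result = await db.QUERY({
    FIELD: ['favoritePosts'], // exact casing now matters
    VALUE: 'content/posts/welcome3.md'
  })
  console.log(JSON.stringify(result, null, 2))
})()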
Jeff See
@jeffsee55
That's one thing I noticed, which was easier to fix by just casting them to lowercase; the more surprising thing for me was the difference between an array value and a non-array value.
Though I noticed this comment: https://github.com/fergiemcdowall/search-index/issues/540#issuecomment-822635413. So it seems like this is expected - do array values not get indexed the same way?
if you initialize search-index with caseSensitive: true then the field name favoritePosts works as expected
Thanks, I'll try that
Fergus McDowall
@fergiemcdowall
Ah - yes, sorry - I was a bit quick to reply there...
here is an example:
;(async () => {
  const si = require('search-index')
  const print = txt => console.log(JSON.stringify(txt, null, 2))

  // create (or open) an index called 'arrays'
  const db = await si({
    name: 'arrays'
  })

  const data = [
    {
      name: 'Homer Simpson',
      favoritePosts: ['content/posts/welcome3.md'],
      _id: 'content/authors/homer.md'
    }
  ]

  await db.PUT(data)

  // field names are lowercased by default, so query 'favoriteposts';
  // the array value is stored untokenized, so query the full string
  await db
    .QUERY({
      FIELD: ['favoriteposts'],
      VALUE: 'content/posts/welcome3.md'
    })
    .then(print)
})()
Just to explain:
when you use arrays, special characters are not stripped
Jeff See
@jeffsee55
OK, that makes sense why it works for me then - what's the reason for the difference?
Fergus McDowall
@fergiemcdowall
so ['content/posts/welcome3.md'] is stored in the index as 'content/posts/welcome3.md'
whereas if you simply do 'content/posts/welcome3.md' (no array), then special chars will be stripped and the string will be tokenized
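A small contrast sketch of the two cases (the stripped form 'contentpostswelcome3md' is taken from the query that worked earlier):

;(async () => {
  const si = require('search-index')
  const db = await si({ name: 'array-vs-string' })

  await db.PUT([
    { _id: 'a', asArray: ['content/posts/welcome3.md'] }, // kept verbatim
    { _id: 'b', asString: 'content/posts/welcome3.md' }   // stripped and tokenized
  ])

  // the array field matches the untouched value...
  console.log(await db.QUERY({ FIELD: ['asarray'], VALUE: 'content/posts/welcome3.md' }))

  // ...while the string field matches the stripped form instead
  console.log(await db.QUERY({ FIELD: ['asstring'], VALUE: 'contentpostswelcome3md' }))
})()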
Jeff See
@jeffsee55

Would

  const data = [
    {
      name: 'Homer Simpson',
      favoritePosts: ['content/posts/welcome3.md', 'content/posts/welcome4.md'],
      _id: 'content/authors/homer.md'
    }
  ]

Be indexed as 'content/posts/welcome3.md' and 'content/posts/welcome4.md' as separate fields somehow? (Sorry, might be a confusing question.)

Fergus McDowall
@fergiemcdowall
yes
If you want to investigate this you can use level-out to inspect the index
level-out arrays
{"key":"favoriteposts:content/posts/welcome3.md#1.00","value":["content/authors/homer.md"]}
{"key":"name:homer#1.00","value":["content/authors/homer.md"]}
{"key":"name:simpson#1.00","value":["content/authors/homer.md"]}
{"key":"○DOCUMENT_COUNT○","value":1}
{"key":"○DOC_RAW○content/authors/homer.md○","value":{"name":"Homer Simpson","favoritePosts":["content/posts/welcome3.md"],"_id":"content/authors/homer.md"}}
{"key":"○FIELD○favoriteposts○","value":"favoriteposts"}
{"key":"○FIELD○name○","value":"name"}
{"key":"○○CREATED","value":1622729161588}
Jeff See
@jeffsee55
Ah, that would be awesome - I was looking for a way to do that
Fergus McDowall
@fergiemcdowall
level-out is really handy for seeing how things are being indexed
BTW - indexing and query pipelines are going to be much better in search-index@3 (coming soon)
Jeff See
@jeffsee55
I saw that! Seems like it's close. Is the reason array values don't get stripped of special characters something specific?
Fergus McDowall
@fergiemcdowall
Yes - it's so that you can easily do your own tokenization and use non-ASCII chars
So for instance - I work a lot with Scandinavian languages, and sentences like "bøker er gøy" are not always tokenized correctly
(due to the non-ASCII ø)
Jeff See
@jeffsee55
Ok, so if I had an item like:
  const data = [
    {
      name: 'Homer Simpson',
      comments: ['doh', 'this is another comment'],
      _id: 'content/authors/homer.md'
    }
  ]
Fergus McDowall
@fergiemcdowall
therefore it's easier to do ['bøker', 'er', 'gøy']
You could also introduce ngrams: ['bøker er', 'er gøy']
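A rough sketch of rolling your own tokens plus bigrams before indexing (the ngrams helper here is hypothetical, not part of search-index):

;(async () => {
  const si = require('search-index')
  const db = await si({ name: 'norwegian' })

  // hypothetical helper: sliding window of n adjacent tokens
  const ngrams = (tokens, n) =>
    tokens.slice(0, tokens.length - n + 1).map((_, i) => tokens.slice(i, i + n).join(' '))

  const tokens = 'bøker er gøy'.split(' ') // ['bøker', 'er', 'gøy']

  await db.PUT([
    {
      _id: 'doc1',
      text: [...tokens, ...ngrams(tokens, 2)] // ['bøker', 'er', 'gøy', 'bøker er', 'er gøy']
    }
  ])

  console.log(await db.QUERY({ FIELD: ['text'], VALUE: 'bøker er' }))
})()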
Jeff See
@jeffsee55

Would I get a hit for:

  await db
    .QUERY({
      FIELD: ['comments'],
      VALUE: 'another'
    })
    .then(print)

Or does that need to be indexed manually by me?

Fergus McDowall
@fergiemcdowall
ëtc.
let's see...
no
Jeff See
@jeffsee55
OK, that's what I'm seeing too - I just wasn't sure if I was doing something wrong
Fergus McDowall
@fergiemcdowall
  const data = [
    {
      name: 'Homer Simpson',
      comments: 'doh this is another comment',
      _id: 'content/authors/homer.md'
    }
  ]
This would tokenize comments to 'doh', 'this', 'is', 'another', 'comment'
and you would then get a hit for VALUE: 'another'
comments: ['doh', 'this is another comment'] would allow you to search for VALUE: 'doh' and VALUE: 'this is another comment'
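Put together, a sketch of the string version (index name made up):

;(async () => {
  const si = require('search-index')
  const db = await si({ name: 'comments-example' })

  await db.PUT([
    {
      _id: 'content/authors/homer.md',
      name: 'Homer Simpson',
      comments: 'doh this is another comment' // tokenized to doh / this / is / another / comment
    }
  ])

  // a single-word query now hits the document
  const hits = await db.QUERY({ FIELD: ['comments'], VALUE: 'another' })
  console.log(JSON.stringify(hits, null, 2))
})()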
Jeff See
@jeffsee55
OK, thanks for the explanation. I'm trying to see if I can use this for full-text search of potentially complex objects. Some might have an array of objects, which I'd like to sort of "flatten" out into something that gets indexed in the normal way.
But maybe what I'm starting to understand is that there's no reason to keep those values as arrays; I can just merge them, as you've done in the last example, into a single string.
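One possible way to do that flattening before PUT (flattenComments is just an illustrative helper; the nested shape is made up):

;(async () => {
  const si = require('search-index')
  const db = await si({ name: 'flattened' })

  // hypothetical helper: join nested comment objects into one searchable string
  const flattenComments = comments => comments.map(c => c.text).join(' ')

  const raw = {
    _id: 'content/authors/homer.md',
    name: 'Homer Simpson',
    comments: [{ text: 'doh' }, { text: 'this is another comment' }]
  }

  await db.PUT([{ ...raw, comments: flattenComments(raw.comments) }])

  console.log(await db.QUERY({ FIELD: ['comments'], VALUE: 'another' }))
})()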