    Sutou Kouhei
    @kou
    You don't need to use vector_find for a jsonb index. You can use string == "tag1" for ["tag1", "tag2", ...].
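A sketch of what that looks like in SQL, assuming a hypothetical table `items` with a jsonb column `tags` indexed with pgroonga_jsonb_ops_v2; the &` operator takes Groonga script syntax:

```sql
-- Matches rows whose jsonb value contains the string "tag1",
-- e.g. ["tag1", "tag2"]; no vector_find call is needed.
SELECT * FROM items WHERE tags &` 'string == "tag1"';
```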
    Zhanzhao (Deo) Liang
    @DeoLeung
    argh right, it's expanded :)
    Zhanzhao (Deo) Liang
    @DeoLeung
    hi, we are planning to upgrade from 2.2.9 to 2.3.4 for the crash-safe feature. A question: will it block the server from becoming ready, or incur a significant performance penalty, if pgroonga has to rebuild the index? "If it's failed, a primary process removes all existing Groonga's database, creates a new Groonga's database and executes REINDEX"
    Sutou Kouhei
    @kou
    It doesn't block PostgreSQL from becoming ready. It's done in the background. But you can't use the broken PGroonga index (only that index) while it's rebuilding.
    Note that there is still a significant write performance penalty for the crash-safe feature. First, you should try it in a staging environment.
    Providing a performance benchmark is welcome. We can use it to improve crash-safe performance.
    Zhanzhao (Deo) Liang
    @DeoLeung
    thanks for the clarification, we will give it a try :)
    zyp-rgb
    @zyp-rgb
    Hi, I'm using partitioned tables in PostgreSQL, and I create a pgroonga_jsonb_ops_v2 index for every partition. Then I constantly run into deadlocks in pgroonga, which are eventually resolved by pgroonga_command('clearlock'). I'm not sure the deadlock is connected to the partitioned tables. Does anyone have a similar experience?
    Sutou Kouhei
    @kou
    Can you provide how to reproduce the problem?
    BTW, REINDEX is better than pgroonga_command('clearlock'). A deadlocked index may still be broken after pgroonga_command('clearlock').
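For example (the index name here is a placeholder):

```sql
-- Fully rebuild the possibly-broken PGroonga index
-- instead of only clearing its locks.
REINDEX INDEX my_pgroonga_index;
```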
    zyp-rgb
    @zyp-rgb
    Thanks for the tip. The trigger of the problem remains unknown; I will try to construct a reproducible example.
    zyp-rgb
    @zyp-rgb
    Hi, I am creating a table and a jsonb_v2 pgroonga index within a transaction, and the operation keeps getting locked. Here is the pgroonga log:
    ```
    2022-01-21 08:03:35.917898|e|27649: [object][register] failed to register a name: <BuildingSources4527964>: <[table][add][dat] failed to add: #<key "BuildingSources4527964" table:#<dat (nil) key:(nil)>>>
    ```
    Sutou Kouhei
    @kou
    Can we always reproduce this?
    Zhanzhao (Deo) Liang
    @DeoLeung

    hello, we are migrating one schema of our database using pg_dump/pg_restore; on one table we hit

    ERROR:  pgroonga: failed to set column value: <BuildingSources289352.sms_content>: failed to cast to <Lexicon289352_0>: <"\n\n">

    the index was created with USING pgroonga (sms_content pgroonga_text_full_text_search_ops_v2). We tried creating the index first or last, but the problem remains. Is there any problem with \n? Using pgroonga==2.2.9

    Sutou Kouhei
    @kou
    Can you identify the column value that causes this case?
    Zhanzhao (Deo) Liang
    @DeoLeung
    @kou hi, I reproduced it as follows:
    select encode(sms_content::bytea, 'hex') from "sometable";
    -- output value is 0a0a
    
    create table public.pgroonga_test
    (
     sms varchar
    );
    create index sms_idx on public.pgroonga_test using pgroonga (sms pgroonga_varchar_term_search_ops_v2);
    insert into public.pgroonga_test values ((select sms_content from "sometable" where id = 1234));
    -- ERROR:  pgroonga: failed to set column value: <Sources98409828.sms>: failed to cast to <Lexicon98409828_0>: <"\n\n">
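The failing value can be confirmed to be two newline bytes with a quick shell check (an editor's sketch, independent of PostgreSQL):

```shell
# 0x0a is the ASCII newline, so the hex dump 0a0a is just "\n\n".
printf '\n\n' | od -An -tx1 | tr -d ' \n'   # → 0a0a
```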
    Sutou Kouhei
    @kou
    Great!
    Is this reproducible with the following insert?
    insert into public.pgroonga_test values (E'\x0a\x0a');
    Could you try Groonga 12.0.0?
    Zhanzhao (Deo) Liang
    @DeoLeung
    @kou yes, it's reproducible, I will try to compile it with 12.0.0 later today
    Zhanzhao (Deo) Liang
    @DeoLeung
    [
        [
            0,
            1644472656.01468,
            0.00008535385131835938
        ],
        {
            "n_jobs": 0,
            "uptime": 150,
            "version": "12.0.0",
            "features": {
                "lz4": true,
                "nfkc": true,
                "poll": false,
                "zlib": true,
                "epoll": true,
                "mecab": true,
                "mruby": true,
                "kqueue": false,
                "onigmo": true,
                "xxhash": false,
                "rapidjson": true,
                "zstandard": true,
                "apache_arrow": true,
                "message_pack": true
            },
            "n_queries": 0,
            "starttime": 1644472506,
            "start_time": 1644472506,
            "alloc_count": 12305,
            "apache_arrow": {
                "version": "7.0.0",
                "version_major": 7,
                "version_minor": 0,
                "version_patch": 0
            },
            "cache_hit_rate": 0.0,
            "command_version": 1,
            "max_command_version": 3,
            "default_command_version": 1
        }
    ]
    drop table public.pgroonga_test;
    create table public.pgroonga_test
    (
     sms varchar
    );
    create index sms_idx on public.pgroonga_test using pgroonga (sms pgroonga_varchar_term_search_ops_v2);
    
    insert into public.pgroonga_test values (E'\x0a\x0a');
    it's still failing with pgroonga=2.3.4, groonga=12.0.0
    Sutou Kouhei
    @kou
    Thanks!
    I can reproduce it with the groonga/pgroonga:latest-debian-14 Docker image but can't in my local environment...
    Anyway, I will be able to investigate this. Thanks!!!
    Sutou Kouhei
    @kou
    @DeoLeung I've fixed this in master. E'\x0a\x0a' reports a warning instead of an error.
    zyp-rgb
    @zyp-rgb
    hello, I learned from the groonga doc that data may be corrupted when we execute many additions, deletions, and updates on a vector column. Do we have any example or test of what amount of operations could cause the problem? Also, does the bug relate to the pgroonga "lock failed 1000 times" problem (because pgroonga uses vector data in the jsonb index)?
    Horimoto Yasuhiro
    @komainu8

    @zyp-rgb Currently, we don't know what amount of operations could cause the problem.

    Does PGroonga output not only "lock failed 1000 times" but also messages such as "[DB Locked] time out(): io() collisions(/)"?
    If PGroonga does not output "[DB Locked] time out(): io() collisions(/)", PGroonga can acquire the lock in the end.
    If PGroonga also outputs "[DB Locked] time out(): io() collisions(/)", PGroonga cannot acquire the lock.

    zyp-rgb
    @zyp-rgb
    Also, is there a better solution if one of the pgroonga indexes is corrupted? We cannot run REINDEX successfully because of the "lock failed 1000 times" errors. We searched and found the ultimate solution in pgroonga/pgroonga#156, which is deleting the files and reindexing everything. However, we use pgroonga at a large scale (1000 tables), and often the reindex causes another crash... Can we somehow narrow it down to a few tables and just delete their files and reindex them?
    Zhanzhao (Deo) Liang
    @DeoLeung
    hi, we want to make an apt mirror of pgroonga using Nexus to speed up image builds, but it complains about a GPG error. Where can I download the GPG key?
    Horimoto Yasuhiro
    @komainu8

    @zyp-rgb Sorry, I don't get it. Let me get this straight.
    Which of the following is the problem you are currently experiencing?

    1. The "REINDEX" command fails on PostgreSQL due to a lot of "lock failed 1000 times" output.
    2. PGroonga crashes when you execute "REINDEX".
    3. PGroonga crashes after 1 occurs.

    If PGroonga has crashed due to "REINDEX", could you provide us with PGroonga's log from when it crashed, if possible?

    Horimoto Yasuhiro
    @komainu8
    @DeoLeung You can download the GPG key with the following command.
    $ gpg --keyserver keyserver.ubuntu.com --recv-key 2701F317CFCCCB975CADE9C2624CF77434839225
    zyp-rgb
    @zyp-rgb
    @komainu8
    step by step:
    1. We have 1000 tables using pgroonga. Somehow the DB became locked, and no read or write operation responded. We checked the pgroonga log; it just repeated grn_init: <12.0.0> pgroonga: initialize: <2.3.4>
    2. We decided to clean the .pgrn files and reboot. It worked, except we had to rebuild every pgroonga index.
    3. So I started reindexing one by one. After around 100 tables, the process became extremely slow, so I stopped and checked the DB. Already-reindexed tables work fine, but creating a new index or reindexing takes longer than 30 minutes. Currently I have no idea how long it would take to reindex all the tables.
    Zhanzhao (Deo) Liang
    @DeoLeung
    @komainu8 thanks, is there a debian recv key?
    Zhanzhao (Deo) Liang
    @DeoLeung
    argh~I found the key file in apt registry :)
    Zhanzhao (Deo) Liang
    @DeoLeung
    hi, we want to do a full backup using rsync, but we found that the destination files are much bigger. Is this related to sparse files or hard links?
    Sutou Kouhei
    @kou
    .pgrn files are sparse files. You can confirm it by comparing du -h /.../*.pgrn and du -h --apparent-size /.../*.pgrn.
    Note that an online backup with rsync isn't safe for PGroonga. You need to stop PostgreSQL (or stop updates) while rsync runs.
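A quick way to see the difference on any sparse file (an illustrative sketch; the file name is made up):

```shell
# Allocate a 100 MiB sparse file: the apparent size is 100M,
# but almost no disk blocks are actually used.
truncate -s 100M sparse_demo.bin
du -h sparse_demo.bin                  # physical usage: near zero
du -h --apparent-size sparse_demo.bin  # apparent size: 100M
rm sparse_demo.bin
# rsync --sparse (-S) preserves the holes; plain rsync fills them in.
```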
    Zhanzhao (Deo) Liang
    @DeoLeung
    thanks, just tried rsync -avS and it seems to be correct now :)
    Zhanzhao (Deo) Liang
    @DeoLeung
    @kou hi, we tried to create a brand new standby server from our master using repmgr. It first did a pg_basebackup, which took up twice the space (similar to rsync without -S). Are there any settings we can use to avoid the disk blowing up?
    Zhanzhao (Deo) Liang
    @DeoLeung
    pgroonga/pgroonga#144: we discussed this before (the DB was 1x GB at that time and the disk problem didn't seem significant), but now we have a 5xx GB DB, and the space explosion would be horrible :)
    Horimoto Yasuhiro
    @komainu8

    @zyp-rgb
    I'm sorry for the late response.
    You can narrow it down to a single table and delete just its files as below.

    1. Execute the following command.

      Specify the name of the index that you want to remove as "index_name" (the second argument of pgroonga_index_column_name).

      SELECT pgroonga_index_column_name('pgroonga_content_index', 'index_name');
      
      pgroonga_index_column_name 
      ----------------------------
      Lexicon17077_0.index
      (1 row)
    2. Execute the following command.

      Specify the result value of step 1 as index_column_name.

      SELECT pgroonga_command('object_inspect index_column_name')::jsonb->1->'id';

      For example:

      SELECT pgroonga_command('object_inspect Lexicon17077_0.index')::jsonb->1->'id';
      
      ?column?
      ----------
      266
      (1 row)
    3. Convert the result value of step 2 into 7-digit hexadecimal with the following command in a shell.

      For example:

      % printf '%07X' 266
      000010A
    4. Remove the "pgrn" files named after the result value of step 3.

      For example:

      $ rm pgrn.000010A
      $ rm pgrn.000010A.*

    In addition, if you possibly can, could you provide the pgroonga.log and the PostgreSQL log from when no read or write operation responded?
    We might be able to detect the cause of this problem from those logs.
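Steps 2-4 above can be sketched as a single shell snippet (the id 266 and the file names come from the example above; actual paths depend on your data directory):

```shell
# Convert the object id reported by object_inspect into the
# 7-digit uppercase hex suffix used by Groonga's database files.
id=266
suffix=$(printf '%07X' "$id")
echo "pgrn.${suffix}"   # → pgrn.000010A
# Then remove the matching files (commented out for safety):
# rm "pgrn.${suffix}" "pgrn.${suffix}".*
```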

    zyp-rgb
    @zyp-rgb
    @komainu8 wow, this is so helpful! We can handle specific table errors now. I will try to organize the logs.
    zyp-rgb
    @zyp-rgb
    [three screenshot attachments]
    @komainu8 above was all I saved, sorry about that. At the time the priority was to recover the DB, so we restarted a lot. These logs were recorded after we had already deleted all the files, restarted, and were in the process of reindexing all the tables.
    zyp-rgb
    @zyp-rgb
    @komainu8 stuck at step 2:
    [[-22, 1646616961.568913, 0.0001027584075927734, "[object][inspect] nonexistent target: <Lexicon4659982_0.index>", [["command_object_inspect", "proc_object_inspect.c", 635]]], null]
    Horimoto Yasuhiro
    @komainu8

    @zyp-rgb Oh...
    That index is already broken.
    Probably, the corresponding index has already been removed in PGroonga.

    You need to execute "REINDEX" in this case.
    If "REINDEX" is slow in your environment, could you disable "crash-safe" as below?

    pgroonga.enable_crash_safe = off

    I think "REINDEX" will be faster with "crash-safe" disabled.
    As far as I can see from your logs, the cause of PostgreSQL not responding to any read or write operation was the "crash-safe" feature.
    Therefore, I think your problem can be resolved temporarily by disabling the "crash-safe" feature.

    Currently, we are revising the "crash-safe" feature because it has some problems.
    If you want to use the "crash-safe" feature, could you wait for the next release of PGroonga?

    zyp-rgb
    @zyp-rgb
    @komainu8 thanks for the advice. Looking forward to next release!
    Zhanzhao (Deo) Liang
    @DeoLeung
    hi, is pgroonga a GIN or a GiST index? I'm trying Citus columnar to compress our data (it's getting big), but it supports only btree/hash indexes at the moment, so I'm wondering how to phrase the feature request to them :)
    ERROR:  operator class "pgroonga_jsonb_ops_v2" does not exist for access method "btree"
    Sutou Kouhei
    @kou
    No. PGroonga uses neither GIN nor GiST.
    I don't know Citus' implementation details, but if Citus uses PostgreSQL's standard table access method, PGroonga will work with Citus, because PGroonga uses the table access method API to get data from PostgreSQL.
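This can be checked from the system catalogs: each PGroonga index is listed under its own "pgroonga" access method rather than gin or gist (a sketch; requires a running PostgreSQL with at least one PGroonga index):

```sql
-- List every index built with the PGroonga access method.
SELECT c.relname, am.amname
FROM pg_class c
JOIN pg_am am ON am.oid = c.relam
WHERE c.relkind = 'i'
  AND am.amname = 'pgroonga';
```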
    Zhanzhao (Deo) Liang
    @DeoLeung
    I see, I will have a talk with the citus guys, thank you!