    Sutou Kouhei
    @kou
    Could you open an issue including how to reproduce it from scratch? It should include how to install Postgres-XL, PGroonga and so on.
    https://github.com/pgroonga/pgroonga/issues/new
    eric jonas
    @ericmachine88_twitter

    Hi all,

    I need some help.

    I am currently using MariaDB 10.1, and its full-text search can't do infix matching (%..%).

    I am hoping Groonga (or Mroonga) can help me overcome this limitation. I managed to install Mroonga inside MariaDB 10.1.

    All my existing tables are InnoDB.

    I did this

    create table salesorder_mroonga engine=Mroonga select * from salesorder;

    I managed to create the table successfully.

    So I added the fulltext index

    alter table salesorder_mroonga add fulltext index pName (productName);

    or using this way

    create table salesorder_mroonga(fulltext index (productName)) engine=Mroonga select * from salesorder;

    The index is added, but it shows NULL for cardinality.

    A fulltext index in InnoDB table salesorder will have 30,000 for cardinality.

    So if I run this query, it returns 0.

    select * from salesorder_mroonga where match(productName) against ('*AI*' in boolean mode);

    Any idea why the fulltext index I added doesn't work?

    Sutou Kouhei
    @kou
    The * around AI are unnecessary:
    select * from salesorder_mroonga where match(productName) against ('AI' in boolean mode);
    eric jonas
    @ericmachine88_twitter
    Can mroonga support prefix wildcard? example in mariadb LIKE '%a%'
    Sutou Kouhei
    @kou
    match against uses partial match, not exact match.
    It means that match() against ('AI' in boolean mode) behaves like LIKE '%AI%'.
    eric jonas
    @ericmachine88_twitter
    wow, thanks. I believe if i want exact match, then i will do this match() against ('+AI' in boolean mode) ?
    Sutou Kouhei
    @kou
    You can use the standard = like COLUMN = 'AI'.
    eric jonas
    @ericmachine88_twitter
    I read your docs and to achieve '%AI%', you have something called pragma DOR .. is match() against ('DAI' in boolean mode) the same with match() against ('AI' in boolean mode) ? basically i want to have results like 'AIMA', 'BIAIM', 'CAI'.
    weird, asterisk not showing in this chat - i mean this https://mroonga.org/docs/reference/full_text_search/boolean_mode.html (5.7.1.4.1.1. DOR)
    Sutou Kouhei
    @kou

    You need to specify the TokenBigramSplitSymbolAlphaDigit tokenizer (parser) when you create a full text index:

    create table salesorder_mroonga(fulltext index (productName) parser "TokenBigramSplitSymbolAlphaDigit") engine=Mroonga

    https://mroonga.org/docs/tutorial/wrapper.html#how-to-specify-the-parser-for-full-text-search

    Then against('AI' in boolean mode) returns AIMA, BIAIM and CAI.
    You need to use Markdown markup here.
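To make the tokenizer setup above concrete, here is a minimal sketch. The table name `products_demo` and its rows are hypothetical. It assumes the `COMMENT 'tokenizer "..."'` form of the index definition, which is the form that appears in the SHOW CREATE TABLE output later in this log, rather than the `parser` keyword.

```sql
-- Hypothetical demo table; the name and rows are illustrative only.
CREATE TABLE products_demo (
  id INT PRIMARY KEY AUTO_INCREMENT,
  productName VARCHAR(255),
  -- Assumption: the tokenizer is given through the index COMMENT.
  FULLTEXT INDEX (productName) COMMENT 'tokenizer "TokenBigramSplitSymbolAlphaDigit"'
) ENGINE=Mroonga DEFAULT CHARSET=utf8;

INSERT INTO products_demo (productName) VALUES ('AIMA'), ('BIAIM'), ('CAI');

-- No wildcard characters are needed around the search term.
SELECT productName
  FROM products_demo
 WHERE MATCH(productName) AGAINST('AI' IN BOOLEAN MODE);
```

Going by the explanation above, the query should match all three rows, since each value contains the two-character sequence "AI".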
    eric jonas
    @ericmachine88_twitter
    i see, so i can just ignore the DOR pragma. If I use that parser, any side effects? I thought that parser is meant for unicode characters like japanese, koreans, etc.
    Sutou Kouhei
    @kou
    Search noise may be increased.
    eric jonas
    @ericmachine88_twitter
    search noise may increase when use that parser? hmm.. any example?
    Sutou Kouhei
    @kou
    But it depends on context.
    Normally, users want to search "artificial intelligence" related documents by AI.
    With the tokenizer, AI returns documents that are related to taint because taint includes ai.
    The tokenizer isn't only for CJK characters.
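The noise described above can be seen by comparing two otherwise identical tables. This is only a sketch: the table names and rows are hypothetical, and it assumes that the default tokenizer keeps a run of ASCII letters together as a single token while TokenBigramSplitSymbolAlphaDigit splits it into two-character tokens.

```sql
-- Hypothetical tables; names and rows are illustrative only.
CREATE TABLE docs_default (
  id INT PRIMARY KEY,
  body TEXT,
  FULLTEXT INDEX (body)  -- default tokenizer
) ENGINE=Mroonga DEFAULT CHARSET=utf8;

CREATE TABLE docs_split (
  id INT PRIMARY KEY,
  body TEXT,
  FULLTEXT INDEX (body) COMMENT 'tokenizer "TokenBigramSplitSymbolAlphaDigit"'
) ENGINE=Mroonga DEFAULT CHARSET=utf8;

INSERT INTO docs_default VALUES (1, 'AI research'), (2, 'taint analysis');
INSERT INTO docs_split   VALUES (1, 'AI research'), (2, 'taint analysis');

-- Same search against both tables; compare which ids come back.
SELECT id, body FROM docs_default WHERE MATCH(body) AGAINST('AI' IN BOOLEAN MODE);
SELECT id, body FROM docs_split   WHERE MATCH(body) AGAINST('AI' IN BOOLEAN MODE);
```

Under that assumption, the second query should also return the "taint analysis" row, which is the extra noise, while the first should return only the "AI research" row.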
    eric jonas
    @ericmachine88_twitter
    oic, i get what you mean by noise
    fair though
    eric jonas
    @ericmachine88_twitter
    ok i will try the above tips, thanks :)
    Sutou Kouhei
    @kou
    No problem.
    eric jonas
    @ericmachine88_twitter

    I have tested this query for tb1 (mroonga storage engine).

    select count(*) from tb1 where match(description) against ('+Java +Shell' in boolean mode) order by lastUpdatedAt;

    1) Before adding the tokenizer:

    Mroonga (1.08 s) is faster than MariaDB 10.1 (3.0 s)

    2) After adding FULLTEXT INDEX (description) COMMENT 'tokenizer "TokenBigramSplitSymbolAlphaDigit"':

    Mroonga (4.94 s) is slower than MariaDB 10.1 (3.0 s)

    Why did adding that make it so much slower? I tried the parser syntax above, but it didn't work, so I had to use comment 'tokenizer...'

    If Mroonga is slower than the existing MariaDB full-text search, I don't see the value of using it.
    Sutou Kouhei
    @kou
    Could you show the output of explain select ...?
    Why is there an order by with count(*)?
    It's unnecessary.
    Markdown syntax for writing SQL: https://github.github.com/gfm/#fenced-code-blocks
    Could you also show the output of show create table tb1?
    eric jonas
    @ericmachine88_twitter
    i just dropped that table, i will add it back again. will share the stats soon.
    eric jonas
    @ericmachine88_twitter

    a) For the explain

    id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra
    1 | SIMPLE | tb1 | fulltext | description | description | 0 |    | 1 | Use where

    b) For the show create

    tb1    CREATE TABLE `tb1` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `activityType` int(11) DEFAULT NULL,
      `lastUpdatedAt` datetime DEFAULT NULL,
      `lastUpdatedBy` varchar(255) DEFAULT NULL,
      `description` text,
      `refNo` varchar(255) DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY `index_activitytype` (`activityType`),
      KEY `index_refno` (`refNo`),
      FULLTEXT KEY `description` (`description`) COMMENT 'tokenizer "TokenBigramSplitSymbolAlphaDigit"'
    ) ENGINE=Mroonga AUTO_INCREMENT=11121052 DEFAULT CHARSET=utf8
    any help?
    Sutou Kouhei
    @kou

    It looks good.

    Could you show the output of show status like '%mroonga%' before and after running select count(*) from tb1 where match(description) against ('+Java +Shell' in boolean mode)?

    Mroonga_count_skip should be incremented after select count(*) from tb1 where match(description) against ('+Java +Shell' in boolean mode).
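The check requested above can be run as one short session. This is a sketch using the `tb1` table and query from this conversation; the only thing to compare is the `Mroonga_count_skip` value before and after.

```sql
-- Counter value before the query.
SHOW STATUS LIKE 'Mroonga_count_skip';

-- The count query, without the unnecessary ORDER BY.
SELECT COUNT(*)
  FROM tb1
 WHERE MATCH(description) AGAINST('+Java +Shell' IN BOOLEAN MODE);

-- Counter value after the query; it is expected to be one higher
-- if the row count optimization was applied.
SHOW STATUS LIKE 'Mroonga_count_skip';
```

If the two values are equal, the optimization was not used for that query.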
    BTW, how did you install Mroonga?
    eric jonas
    @ericmachine88_twitter
    Installed in MariaDB 10.1
    1. mysql -uroot -p
    2. INSTALL SONAME 'ha_mroonga';
    3. CREATE FUNCTION last_insert_grn_id RETURNS INTEGER SONAME 'ha_mroonga.so';
    4. \q

    a) Before run select query

    show status like '%mroonga%'
    
    Variable_name | Value
    Mroonga_count_skip | 1
    Mroonga_fast_order_limit | 0

    b) After run select query

    show status like '%mroonga%'
    
    Variable_name | Value
    Mroonga_count_skip | 1
    Mroonga_fast_order_limit | 0

    It doesn't increment? hmm...

    eric jonas
    @ericmachine88_twitter

    I have tested storage engine vs wrapper

    storage engine is faster than wrapper

    and if I add "TokenBigramSplitSymbolAlphaDigit" to either of them, it is slower than MariaDB InnoDB fulltext search.

    Horimoto Yasuhiro
    @komainu8

    From the show status like '%mroonga%' result, I can see that the row count optimization ( https://mroonga.org/docs/reference/optimizations.html#row-count ) isn't working.
    That may be why Mroonga's full-text search is slower than MariaDB's InnoDB full-text search.

    However, this optimization does work in my environment.

    Could you open an issue at https://github.com/mroonga/mroonga/issues and include the full SQL (CREATE TABLE, CREATE INDEX, INSERT, SELECT, ...) needed to reproduce the case?

    Could you also tell me your Mroonga version, using the following SQL?

    SHOW VARIABLES LIKE 'mroonga_version';
    Kartik Soneji
    @KartikSoneji
    Hi everyone!
    I wanted to know how to enable compression for Mroonga tables.
    The documentation lists the variables mroonga_libgroonga_support_{lz4,zlib,zstd}, but does not have any info about how to actually use compression.
    ふりだしにもどる
    @kenhys_twitter
    There are some test cases for compression. They may help you.
    Kartik Soneji
    @KartikSoneji
    @kenhys_twitter Thanks, that is what I was looking for.
    Don't you think this should be easier to find in the documentation? Maybe as a separate page for compression?
    簡煒航 (Jian Weihang)
    @tonytonyjan
    Hi everyone. I would really appreciate it if anyone could tell me whether I should put all index columns in a single table or use one index column per table.
    Horimoto Yasuhiro
    @komainu8
    What kind of use case do you have in mind? Could you give me more details?
    Is the detail of this question explained in groonga/groonga#1132 ?
    Kartik Soneji
    @KartikSoneji
    Hi,
    I noticed an issue with the Mroonga compression library tests.
    Groonga does not compress text if the length is less than 256 bytes.
    In lib/store.c
    #define COMPRESS_THRESHOLD_BYTE 256
      if (value_len < COMPRESS_THRESHOLD_BYTE) {
        return grn_ja_put_packed(ctx, ja, id, value, value_len, flags, cas);
      }
    But the test string is only 62 bytes long, so it doesn't trigger compression.
    LZ4, Zlib, ZStd
    INSERT INTO entries (id, content) VALUES (1, "I found Mroonga that is a MySQL storage engine to use Groonga!");
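For illustration, a test value that clears the 256-byte threshold could look like the sketch below. The table name `entries_compressed` is hypothetical. The column `COMMENT 'flags "..."'` syntax for requesting compression is an assumption and should be checked against the Mroonga compression tests; the `mroonga_libgroonga_support_zlib` variable is the one named in the documentation mentioned earlier.

```sql
-- Check whether the linked libgroonga was built with zlib support.
SHOW VARIABLES LIKE 'mroonga_libgroonga_support_zlib';

-- Hypothetical table; the flags comment syntax is an assumption.
CREATE TABLE entries_compressed (
  id INT UNSIGNED PRIMARY KEY,
  content TEXT COMMENT 'flags "COLUMN_SCALAR|COMPRESS_ZLIB"'
) ENGINE=Mroonga DEFAULT CHARSET=utf8;

-- The original sentence is 62 bytes; repeating it 5 times gives
-- 310 bytes, which is above COMPRESS_THRESHOLD_BYTE (256).
INSERT INTO entries_compressed (id, content)
VALUES (1, REPEAT('I found Mroonga that is a MySQL storage engine to use Groonga!', 5));

-- Confirm the stored value is longer than the threshold.
SELECT id, LENGTH(content) AS content_bytes FROM entries_compressed;
```

With a value of this size, the insert should take the compression path instead of the `grn_ja_put_packed` shortcut shown above.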
    Horimoto Yasuhiro
    @komainu8
    @KartikSoneji Thank you for your report! I'll add tests that use text longer than 256 bytes.