Where communities thrive


  • Join over 1.5M+ people
  • Join over 100K+ communities
  • Free without limits
  • Create your own community
People
Activity
    bgruening
    @bgruening:matrix.org
    [m]
    Hi all. We have a PG issue with new PG versions
    If you rebuild the DB with PG 12 or later and import all data you will get this error (on large servers)
    psql:backup/test.full.2021-11-02.sql:1732648574: ERROR: index row size 2712 exceeds btree version 4 maximum 2704 for index "ix_history_annotation_association_annotation"

    Stefan has looked at it and commented this:

    Ok, I think I'm beginning to understand what's going on...
    Btree-indexing has been re-done in v12 already; from v12 on, btree indexes are "version 4" with additional capabilities. So far so good.

    Now, according to what I gather from earlier occurrences of the error message in question found on the web, the max size of a value that could be hashed using btrees used to be 2713 (bytes, I presume). But according to the error message I got when loading the test backup (see above) with v4 that limit has been decreased to 2704.
    But our current database layout still assumes that a size of 2712 is ok - aye, there's the rub!
    The interesting question is of course: how does the ORG manage to run PG v12 just fine with the same DB schema?

    I suspect that they did not upgrade to v12 by backing up, installing the new version and finally re-loading the backup but rather did an in-place upgrade using pg_upgrade(1). This process just keeps the old indexes around, at v3. And unless they perform a REINDEX operation they can still use that old index happily everafter...
    Unfortunately, I see no way to create v3 btree indexes in PG v12 and up, no suitable parameter for CREATE INDEX as far as I can see.

    Helena Rasche
    @hexylena:matrix.org
    [m]
    but I'm wondering if we need to index by annotation, indexing by user_id/history_id both make sense.
    I don't know if I've ever seen a way to query history annotations in the UI by text content? Does the history search do that currently?
    ok, no, can confirm that history search does search the annotations.
    Helena Rasche
    @hexylena:matrix.org
    [m]
    should probably be a GIN index
    not a btree
    all the PG text search code I'm looking at uses GINs
    Helena Rasche
    @hexylena:matrix.org
    [m]
    bgruening: dropping and re-creating that index is going to be very expensive if it locks the table, maybe something to try during your downtime? Or just drop it + re-create on the new machine?
    1 reply
    but it's probably over to the sqlalchemy folks to figure out how to do a different index type in postgres only. and how not to make that a migration since locking the history table will kill big servers.
    Helena Rasche
    @hexylena:matrix.org
    [m]
    yeah something like: drop index, (server gets slow for annotation searchers), export, reimport (new machine), re-create the index with type=gin now
    bgruening
    @bgruening:matrix.org
    [m]
    yes, this we can do.
    I assume others will see similar problems when they update to >PG12 and have big instances so, we need to discuss this also on the code level.
    Helena Rasche
    @hexylena:matrix.org
    [m]
    yeah definitely. I'm thinkin btree is the wrong index type here (and for any other columns with type=text) so, definitely going to be something we need to think about how to change
    sj213
    @sj213:matrix.org
    [m]
    CREATE INDEX and REINDEX both support the keyword CONCURRENTLY, which allows for (re-)creating an index w/o locking the pertinent table. This operation will of course run even longer than a idxbuild with a writelock and definitely have an impact on server performance but blocking writes for hours can be avoided this way. (Disclaimer: this is according to the manual, I never tried it myself)
    Helena Rasche
    @hexylena:matrix.org
    [m]
    still, if that's an option that's fantastic
    sj213
    @sj213:matrix.org
    [m]
    Just for fixing ix_history_annotation_association_annotation in our new server instance concurrent idxbuild may not be needed at all as we could do it before putting the box into production. OTOH, I expect the whole process of dump & reload to take around 9h already w/o manual fiddling with any indexes, so depending on how the operation goes schedule-wise I'll still consider it an option :)
    sj213
    @sj213:matrix.org
    [m]
    What's more problematic is that I apparently can't just substitute GIN for BTREE in CREATE INDEX
    When I try "CREATE INDEX ix_history_annotation_association_annotation ON history_annotation_assiciation USING GIN (annotation);" what it get is 'ERROR: data type text has no default operator class for access method "gin"'
    Followed by "HINT: You must specify an operator class for the index or define a default operator class for the data type"
    sj213
    @sj213:matrix.org
    [m]
    The GIN-idx examples in the manual all use functions like to_tsquery() or to_tsvector(). This is frankly all greek to me XD All the text matching I ever did so far in SQL was using LIKE...
    Helena Rasche
    @hexylena:matrix.org
    [m]
    yeah
    https://www.postgresql.org/docs/current/indexes-types.html nothing else there sticks out to me as a better option for text, but, ok, maybe there's some bigger changes involved in switching to that.
    but I guess the short term solution is drop the index and hope no one is searching by history
    sj213
    @sj213:matrix.org
    [m]
    Advanced text matching is, apparently, an advanced topic... ;)
    Helena Rasche
    @hexylena:matrix.org
    [m]
    BRINs aren't right, SP-GiST are spatial, GiST lack equality, Hash only have equals.
    sj213
    @sj213:matrix.org
    [m]
    The index hasn't been created in the first place, so no need to drop it in the new instance. But yes, if we don't find a solution certain querys will probably be slooow... :\
    Or else, unless anything in the current version of Galaxy requires v12-features, we could maybe just try our luck with PG v11 and stick with that until a general solution has emerged
    The new server instance isn't in production yet, so all options are on the table
    Bj√∂rn Gr√ľning
    @bgruening
    There is a way to check how often this index was used in the past, isn't it?
    can we get this number
    sj213
    @sj213:matrix.org
    [m]
    For "SELECT idx_scan,idx_tup_read,idx_tup_fetch FROM pg_stat_user_indexes WHERE indexrelname='ix_history_annotation_association_annotation';" I get 0,0,0 :)
    for 'ix_history_annotation_association_user_id' the numbers are 8,8,8
    Bj√∂rn Gr√ľning
    @bgruening
    can you double check this with other indices? @hexylena:matrix.org @natefoo could you see if you get the same results on your instances maybe?
    if its 0, we should just drop it ;)
    also upstream
    sj213
    @sj213:matrix.org
    [m]
    For ..._history_id it is 51227775, 203278, 203001
    So the zeroes are not just zero because no data was collected it would seem
    Nate Coraor
    @natefoo:matrix.org
    [m]
    galaxy_main=> SELECT idx_scan,idx_tup_read,idx_tup_fetch FROM pg_stat_user_indexes WHERE indexrelname='ix_history_annotation_association_annotation';
     idx_scan | idx_tup_read | idx_tup_fetch
    ----------+--------------+---------------
            0 |            0 |             0
    (1 row)
    sj213
    @sj213:matrix.org
    [m]
    Ok, this particular index looks indeed dispensable
    Helena Rasche
    @hexylena:matrix.org
    [m]
    Whoo!
    Bj√∂rn Gr√ľning
    @bgruening
    here we go, lets get rid of it :)
    is there a way to do this for all indices in a loop @sj213:matrix.org ?
    1 reply
    maybe we can remove more?
    Curtis Ross
    @cross12tamu

    Alright...I can't find where environment variables are declared for Galaxy.

    We're trying to migrate our Apollo services and I cannot make the environment variable change for the life of me. I thought it was in /etc/supervisor/conf.d/<files> but...evidently it isn't picking up. Even after restart.

    Is there another place that environment variables can get declared? Something perhaps that I am missing? At least obvious ones.

    <we don't use ansible atm, we are currently old school/bare metal>

    Nolan Woods
    @innovate-invent
    how is galaxy executed on your server?
    Curtis Ross
    @cross12tamu
    supervisor, but, I think I answered my own question and the env vars in that file are getting picked up, they're just not getting picked up on restart. So, looking into that issue now.