    Ghost
    @ghost~5772e7e2c2f0db084a206e1b
    I like to use py-spy for this, e.g. sudo -E py-spy dump --pid <pid>
    that should reveal where in the code the handler is stuck … if it is copying or moving files I think you should go ahead and try extended metadata strategy
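    (A minimal sketch of that py-spy check, assuming the stuck handler's PID has already been identified, e.g. via ps:)
    sudo -E py-spy dump --pid <handler-pid>    # print the current Python stack of every thread in the handler
    sudo -E py-spy top --pid <handler-pid>     # or watch live where the handler is spending its time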
    Thomas N Lawson
    @Tomnl
    OK great, thanks. Next time everything gets stuck, I will run that command
    Ralf Weber
    @RJMW
    Thanks for the input @mvdbeek - We'll keep you posted.
    Ghost
    @ghost~5772e7e2c2f0db084a206e1b
    great, I think it should be very helpful to have that information
    Ralf Weber
    @RJMW
    Sorry to ask again - where is the extended metadata strategy documented?
    Ghost
    @ghost~5772e7e2c2f0db084a206e1b
    it isn’t in its entirety
    I will try to write it up though
    might take some more time, but happy to help you with this if I haven’t finished it in time for you
    We aren’t running this in production at usegalaxy.org, so we don’t have a lot of experience with this. I’d really like to narrow down the issue before sending you down that path
    Ralf Weber
    @RJMW
    @mvdbeek fair enough - let us dig a little more first.
    MiguelJulia
    @MiguelJulia
    Hi! I am trying to debug why my jobs stay gray forever following these slides: https://training.galaxyproject.org/training-material/topics/admin/tutorials/troubleshooting/slides.html#44
    but in my log there is no mention of uWSGIWorker at all
    does anyone have a clue where I should start?
    Nicola Soranzo
    @nsoranzo
    Are you using uWSGI mules?
    Helena
    @hexylena
    gxadmin job-inputs is where Nate and I start,
    to see if any of that job's inputs are still in the 'new' state
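    (A small sketch of that first check, assuming gxadmin is pointed at the Galaxy database and <job-id> is the ID of a grey job; the subcommand follows the gxadmin "query" convention:)
    gxadmin query job-inputs <job-id>    # list the job's input datasets and their states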
    Morten Johansen
    @morj-uio
    Hi, I am a sysadmin with UiO/ELIXIR Norway in Oslo.
    We have a CVMFS replica of data.galaxyproject.org and are trying to set one up for the singularity repo, but we always get this error during initial snapshot attempts:
    "failed to download http://cvmfs0-psu0.galaxyproject.org/cvmfs/singularity.galaxyproject.org/data/6b/81f0681da21040ce0b4a31c3602255a19c5032P (9 - host returned HTTP error)
    unexpected HTTP error code 200 - please check the stratum 0 health"
    When downloading this particular chunk manually with curl it turns out to be 0 bytes.
    The problem seems to be with the s0. Can anyone help?
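    (For reference, the manual check mentioned above; the URL is the one from the error message, and -sI only fetches the response headers:)
    curl -sI http://cvmfs0-psu0.galaxyproject.org/cvmfs/singularity.galaxyproject.org/data/6b/81f0681da21040ce0b4a31c3602255a19c5032P
    # an HTTP 200 with Content-Length: 0 would match the "0 bytes" observation above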
    Helena
    @hexylena
    @natefoo ^
    MiguelJulia
    @MiguelJulia
    @nsoranzo yes, I am using mules. I have installed a new server using ansible as explained in Barcelona last month, with two mules:
    # Our additions
    mule:
      - lib/galaxy/main.py
      - lib/galaxy/main.py
    farm: job-handlers:1,2
    but the only reference I find in the log about mules is:
    mar 27 12:27:41 galaxyVM uwsgi[17439]: galaxy.web_stack DEBUG 2020-03-27 12:27:41,961 [p:26092,w:0,m:1] [MainThread] JobConfiguration: No job handler assignment methods were configured but a uWSGI farm named 'job-handlers' exists, automatically enabling the 'uwsgi-mule-message' assignment method
    mar 27 12:27:41 galaxyVM uwsgi[17439]: galaxy.web_stack DEBUG 2020-03-27 12:27:41,961 [p:26092,w:0,m:1] [MainThread] JobConfiguration: Removed 'db-self' from handler assignment methods due to use of mules
    mar 27 12:27:41 galaxyVM uwsgi[17439]: galaxy.web_stack DEBUG 2020-03-27 12:27:41,961 [p:26092,w:0,m:1] [MainThread] JobConfiguration: handler assignment methods updated to: uwsgi-mule-message
    mar 27 12:27:41 galaxyVM uwsgi[17439]: galaxy.web_stack.handlers INFO 2020-03-27 12:27:41,964 [p:26092,w:0,m:1] [MainThread] JobConfiguration: No job handler assignment method is set, defaulting to 'uwsgi-mule-message', set the assign_with attribute on <handlers> to override the default
    mar 27 12:27:41 galaxyVM uwsgi[17439]: galaxy.jobs INFO 2020-03-27 12:27:41,964 [p:26092,w:0,m:1] [MainThread] Job handler assignment methods set to: uwsgi-mule-message
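    (A quick way to confirm the mules themselves started, assuming Galaxy runs under a systemd unit named "galaxy", as the journal-style log lines above suggest:)
    journalctl -u galaxy --since today | grep -iE 'mule|job-handlers'
    # besides the JobConfiguration lines, uwsgi normally logs one "spawned uWSGI mule ..." line per mule at startup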
    MiguelJulia
    @MiguelJulia
    @hexylena Thanks! I am going to check it
    Helena
    @hexylena
    usually the job isn't ready to run due to some normal reason like missing inputs; more often than not it's "not ready to run" rather than some other error (since those logs look fine)
    Nate Coraor
    @natefoo
    @morj-uio The stratum 0 for singularity is unhealthy - after many crashes and hardware failures, we are rebuilding it in Freiburg
    MiguelJulia
    @MiguelJulia
    @hexylena Sorry, I can't find anything with gxadmin, as apparently I currently have no job IDs in my DBs? Galaxy is not assigning any IDs or launching them:
    Destination Parameters:
    Runner: None
    Runner Job ID: None
    Handler: None
    Helena
    @hexylena
    gxadmin query queue-detail --all ?
    MiguelJulia
    @MiguelJulia

    Could it be a problem of the database not working? I can create users and histories without problems. I configured it like this in group_vars:

    # PostgreSQL
    postgresql_objects_users:
      - name: galaxy
    postgresql_objects_databases:
      - name: galaxy
        owner: galaxy
    gxadmin query queue-detail --all
     state | id | extid | tool_id | coalesce | time_since_creation | handler | job_runner_name | destination_id
    -------+----+-------+---------+----------+---------------------+---------+-----------------+----------------
     new   | 12 |       | upload1 | mjuliam  | 02:55:58.65216      |         |                 |
     new   | 13 |       | upload1 | mjuliam  | 02:55:58.02824      |         |                 |
     new   | 14 |       | upload1 | mjuliam  | 02:55:57.544039     |         |                 |
     new   | 15 |       | upload1 | mjuliam  | 02:55:57.095877     |         |                 |
    (4 rows)
    Helena
    @hexylena
    well, you have some jobs :) upload jobs can be a bit special?
    but not clear why those wouldn't be running
    Morten Johansen
    @morj-uio
    @natefoo Thanks, then we wait. Btw how big is this repo? More than 5TB?
    Nate Coraor
    @natefoo
    10 TB currently
    selten
    @selten
    Has anyone experienced that Galaxy just won't start workflows at some point? I use the job-handlers to schedule the workflows and I've noticed that at times it randomly stops scheduling workflows.
    Would a mule dedicated to workflow scheduling help?
    Nate Coraor
    @natefoo
    It'd probably make it easier to debug if nothing else
    selten
    @selten
    Thanks
    M Bernt
    @bernt-matthias
    Wondering if anyone has seen such a stack trace and has ideas how to deal with this situation (the I/O errors look scary to me, but I hope that this is not the actual problem)
    141.65.254.85 - - [28/Mar/2020:13:14:00 +0200] "GET /admin HTTP/1.1" 500 - "https://galaxy.intranet.ufz.de/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"
    Traceback (most recent call last):
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
        cursor, statement, parameters, context
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
        cursor.execute(statement, parameters)
    sqlite3.OperationalError: disk I/O error
    
    The above exception was the direct cause of the following exception:
    Traceback (most recent call last):
      File "lib/galaxy/web/framework/middleware/error.py", line 154, in __call__
        app_iter = self.application(environ, sr_checker)
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/paste/recursive.py", line 85, in __call__
        return self.application(environ, start_response)
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/paste/httpexceptions.py", line 640, in __call__
        return self.application(environ, start_response)
      File "lib/galaxy/web/framework/base.py", line 143, in __call__
        return self.handle_request(environ, start_response)
      File "lib/galaxy/web/framework/base.py", line 222, in handle_request
        body = method(trans, **kwargs)
      File "lib/galaxy/web/framework/decorators.py", line 101, in decorator
        return func(self, trans, *args, **kwargs)
      File "lib/galaxy/webapps/galaxy/controllers/admin.py", line 541, in index
        return self.client(trans, **kwd)
      File "lib/galaxy/web/framework/decorators.py", line 101, in decorator
        return func(self, trans, *args, **kwargs)
      File "lib/galaxy/webapps/galaxy/controllers/admin.py", line 552, in client
        'is_repo_installed': trans.install_model.context.query(trans.install_model.ToolShedRepository).first() is not None,
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3232, in first
        ret = list(self[0:1])
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3018, in __getitem__
        return list(res)
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3334, in __iter__
        return self._execute_and_instances(context)
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3359, in _execute_and_instances
        result = conn.execute(querycontext.statement, self._params)
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 988, in execute
        return meth(self, multiparams, params)
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
        return connection._execute_clauseelement(self, multiparams, params)
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement
        distilled_params,
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
        e, statement, parameters, cursor, context
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1466, in _handle_dbapi_exception
        util.raise_from_cause(sqlalchemy_exception, exc_info)
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
        reraise(type(exception), exception, tb=exc_tb, cause=cause)
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
        raise value.with_traceback(tb)
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
        cursor, statement, parameters, context
      File "/gpfs1/data/galaxy_server/galaxy/.venv/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
        cursor.execute(statement, parameters)
    sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) disk I/O error
    [SQL: SELECT tool_shed_repository.id AS tool_shed_repository_id, tool_shed_repository.create_time AS tool_shed_repository_create_time, tool_shed_repository.update_time AS tool_shed_repository_update_time, tool_shed_repository.tool_shed AS tool_shed_repository_tool_shed, tool_shed_repository.name AS tool_shed_repository_name, tool_shed_repository.description AS tool_shed_repository_description, tool_shed_repository.owner AS tool_shed_repository_owner, tool_shed_repository.installed_changeset_revision AS tool_shed_repository_installed_changeset_revision, tool_shed_repository.changeset_revision AS tool_shed_repository_changeset_revision, tool_shed_repository.ctx_rev AS tool_shed_repository_ctx_rev, tool_shed_repository.metadata AS tool_shed_repository_metadata, tool_shed_repository.includes_datatypes AS tool_shed_repository_includes_datatypes, tool_shed_repository.tool_shed_status AS tool_shed_repository_tool_shed_status, tool_shed_repository.deleted AS tool_shed_repository_deleted, tool_shed_repository.uninstalled AS tool_shed_repository_uninstalled, tool_shed_repository.dist_to_shed AS tool_shed_repository_dist_to_shed, tool_shed_repository.status AS tool_shed_repository_status, tool_shed_repository.error_message AS tool_shed_repository_error_message 
    FROM tool_shed_repository
     LIMIT ? OFFSET ?]
    [parameters: (1, 0)]
    (Background on this error at: http://sqlalche.me/e/e3q8)
    
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "lib/galaxy/web/framework/middleware/batch.py", line 80, in __call__
        return self.application(environ, start_response)
      File "lib/galaxy/web/framework/middleware/request_id.py", line 15, in __call__
        return self.app(environ, start_response)
      File "lib/galaxy/web/framework/middleware/xforwardedhost.py", line 23, in __call__
        return self.app(environ, start_response)
      File "lib/galaxy/web/framework/middleware/translogger.py", line 71, in __call__
        return self.application(environ, replacement_start_response)
      File "lib/galaxy/web/framework/middleware/error.py", line 164, in __call__
        exc_info)
      File "lib/galaxy/web/framework/middleware/translogger.py", line 70, in replacement_start_response
        return start_response(status, headers, exc_info)
    SystemError: <built-in function uwsgi_spit> returned a result with an error set
    Ghost
    @ghost~5772e7e2c2f0db084a206e1b
    That’s probably not too serious, check free space and permissions …
    and if you try again, does it work?
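    (A couple of quick checks matching that advice, assuming Galaxy lives under /gpfs1/data/galaxy_server/galaxy as in the traceback and the SQLite file is database/universe.sqlite:)
    df -h /gpfs1/data/galaxy_server/galaxy                               # free space on the filesystem holding Galaxy
    ls -l /gpfs1/data/galaxy_server/galaxy/database/universe.sqlite      # ownership and rw permissions of the SQLite file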
    M Bernt
    @bernt-matthias
    free space is good (1.2 PB); permissions of universe.sqlite? Those are rw.
    Error is reproducible and happens when I open /admin in the browser (page is empty)
    Dannon
    @dannon
    That last query there in the log, can you run it manually using sqlite3 database/universe.sqlite <query>?
    (and see if it blows up still, outside the context of Galaxy)
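    (A minimal sketch of that check, run from the Galaxy root directory; the table and column names are taken from the SQL shown in the traceback above:)
    sqlite3 database/universe.sqlite "SELECT id, name, owner, status FROM tool_shed_repository LIMIT 1;"
    # if this also fails with "disk I/O error", the problem is in the file or filesystem rather than in Galaxy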
    M Bernt
    @bernt-matthias
    Hmm: Error: disk I/O error
    I can't even copy that file.
    Guess I have to talk to our admins.
    Dannon
    @dannon
    Hrmm. Yeah, that's not great.