Michael R. Crusoe
@mr-c
oooh, our slurmd decided to die ..
Michael R. Crusoe
@mr-c
and it is repeatable after a reboot!
Peter Amstutz
@tetron
You need at least 2 nodes to run the tests, or maybe 3
Because 1 node has the workflow runner and 1+ nodes to do the actual work
By "node" I really mean "slurm job slot"
Peter Amstutz
@tetron
@/all Arvados community call happening in a couple minutes! https://meet.google.com/eig-fvsw-xvd
Joshua C. Randall
@jrandall
@tetron @cure I have written up a more complete proposal for the manifests-in-keep idea I mentioned at the end of the community call: https://dev.arvados.org/issues/17112
pvanheus
@pvanheus
heya Arvados folks - I'm trying to install Arvados (again) using the single node salt option... I keep getting 'KeyError: 'boto3.assign_funcs'' errors in my logs... this is on Ubuntu 18.04
more worryingly, 16 tasks fail.. I'm going through them now, but one of the problems is that libnginx-mod-http-passenger did not install.. which in turn seems to be because the phusion passenger repo did not get added to the system..
pvanheus
@pvanheus
here's the log from (re-running) provision.sh: https://gist.github.com/pvanheus/272a20a2d2c6c1be8152c255ed818665
Michael R. Crusoe
@mr-c
@SasSwart Any ideas about ^^ ?
I see that Sas's repo has https://github.com/SasSwart/arvados-cluster/blob/master/pillar/nginx.sls ; maybe that needs a PR to the main repo
pvanheus
@pvanheus
the phusion passenger parts of that look similar to what is here: https://github.com/arvados/arvados/blob/master/tools/salt-install/single_host/nginx_passenger.sls
pvanheus
@pvanheus
ok I've found the source of the problem (I think) - /srv/pillars/top.sls was created empty
or rather it is created and then emptied somehow.
pvanheus
@pvanheus
I found the source and fixed it: arvados/arvados#140
pvanheus
@pvanheus
so to prove that it is fixed I trashed the VM and am re-installing from scratch....
btw does Arvados require specifically 5 character cluster IDs? as in, not 4, not 6?
pvanheus
@pvanheus
so my fix does not fix the KeyError: 'boto3.assign_funcs' messages.... I hope they are a warning, not an error.
Peter Amstutz
@tetron
@pvanheus yes the cluster ids have to be exactly 5 characters
I don't know anything about the boto3 messages, do you want to post them here: https://forum.arvados.org/
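The five-character rule above can be checked before provisioning. This is a minimal sketch, not Arvados code: the function name is hypothetical, and the restriction to lowercase letters and digits is an assumption (the conversation only confirms the length).

```python
import re

# Assumption: cluster IDs use lowercase letters and digits.
# Only the exact length of 5 is confirmed in the conversation above.
CLUSTER_ID_RE = re.compile(r"[a-z0-9]{5}")


def is_valid_cluster_id(cluster_id: str) -> bool:
    """Return True if cluster_id is exactly 5 allowed characters."""
    return CLUSTER_ID_RE.fullmatch(cluster_id) is not None
```

Calling `is_valid_cluster_id` on a 4- or 6-character string returns False, which matches the "not 4, not 6" answer.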
pvanheus
@pvanheus
Succeeded: 115 (changed=92)
Failed:      0
Peter Amstutz
@tetron
@pvanheus that sounds encouraging!
pvanheus
@pvanheus
Yep. Workbench is up and running... now I just need to figure out how to use it
Javier Bértoli
@javierbertoli
pvanheus: The boto3 error is salt-call trying to find some of its modules (as it's running in debug mode). You can safely ignore those messages.
Michael R. Crusoe
@mr-c
@javierbertoli Can that be added to the README?
Javier Bértoli
@javierbertoli
yes. And also, perhaps, we could run it in regular (info) mode and not debug mode.
pvanheus
@pvanheus
yes, I think non-debug mode should be fine for normal provisioning, with a switch to enable debug mode if a debug log is necessary
pvanheus
@pvanheus
ok so I created a workflow from the command line and submitted a job... status is queued and has been so for some minutes... where can I look to see what is going on?
ah... syslog...
the problem is crunch-dispatch-local[29319]: {"level":"fatal","msg":"\"error getting my token UUID: Get https:///arvados/v1/api_client_authorizations/current: http: no Host in request URL\"","time":"2020-11-18T16:40:31.233634203Z"}
Peter Amstutz
@tetron
@pvanheus that's probably a salt formula or service file bug. it should be setting ARVADOS_API_HOST
the systemd service file mentions /etc/arvados/crunch-dispatch-local-credentials
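The `https:///arvados/...` URL in the log is what an empty API host would produce. The sketch below only illustrates that effect; it is not how crunch-dispatch-local builds its requests, and both function names are hypothetical.

```python
from urllib.parse import urlsplit


def current_token_url(api_host: str) -> str:
    """Build the URL seen in the log from a host value (illustrative only)."""
    return f"https://{api_host}/arvados/v1/api_client_authorizations/current"


def has_host(url: str) -> bool:
    """Return True if the URL carries a non-empty host part."""
    return bool(urlsplit(url).netloc)


# An unset or empty ARVADOS_API_HOST leaves nothing between "https://" and the path.
broken = current_token_url("")
```

Here `broken` equals the URL from the fatal log line, and `has_host(broken)` is False, which is consistent with the "no Host in request URL" message.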
Javier Bértoli
@javierbertoli
tetron: iirc, the systemd service file for crunch-dispatch-local got merged recently and I need to upgrade the formula to reflect that (It's using a service file I added to it)
pvanheus: you are running everything locally, on a single host, right? In the formula, crunch-dispatch-local expects to find the credentials in /etc/arvados/environment, which the formula does not add by default (I'm taking notes to fix these things)
pvanheus
@pvanheus
ah... there is no /etc/arvados/environment - so this is a set of credentials that crunch-dispatch-local is using?
yes, everything local
but what credentials are meant to be in that environment file? if I put my admin user credentials there then... the workflow runs, and gets cancelled immediately...
(and there is nothing in the log tab)
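A quick way to see which variables an environment file actually sets is sketched below. It assumes plain `KEY=VALUE` lines with `#` comments and does not handle quoting. Only `ARVADOS_API_HOST` is taken from this conversation; the function names and the default `required` list are hypothetical.

```python
def parse_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_vars(text: str, required=("ARVADOS_API_HOST",)) -> list:
    """Return the required names that are absent or set to an empty value."""
    values = parse_env_file(text)
    return [name for name in required if not values.get(name)]
```

For example, `missing_vars(open("/etc/arvados/environment").read())` would list the names still to be filled in, once the file exists.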
Javier Bértoli
@javierbertoli
give me a sec, I'll try to give you some hints.
Peter Amstutz
@tetron
@/all Arvados v2.1.1 is released! https://arvados.org/release-notes/2.1.1/
pvanheus
@pvanheus
woohoo!
Javier Bértoli
@javierbertoli
pvanheus: you'll need to make a couple of changes (I'll try to add all that to the formula):
  1. you need to edit the service file /etc/systemd/system/crunch-dispatch-local.service and replace the line
ExecStart=/usr/bin/crunch-dispatch-local -poll-interval=1 -crunch-run-command=/usr/local/bin/crunch-run.sh
with
ExecStart=/usr/bin/crunch-dispatch-local -poll-interval=1 -crunch-run-command=/usr/bin/crunch-run
  2. add the missing env file, /etc/arvados/environment (for the current version of the formula) and add
ARVADOS_API_HOST=localhost