Gudule JR
@GuduleJR_twitter
Hi all... Coming back to Arvados... I would like to know if the "Multi host Arvados" setup is possible on a private network, that is, installing all of the services for storage management using some local CA certificates? That's because I have to prove to my staff that Arvados is viable, in order to get a public IP....
Peter Amstutz
@tetron
yes, you can use a local CA for your certificates
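A minimal sketch of the relevant config.yml keys for that, assuming the standard Arvados cluster config layout; the certificate paths are placeholders, not taken from the chat:

$ grep -A2 'TLS:' /etc/arvados/config.yml
    TLS:
      Certificate: /etc/ssl/certs/arvados.internal.crt   # server cert signed by the local CA
      Key: /etc/ssl/private/arvados.internal.key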
crusoe
@mr-c:matrix.org
[m]
I've seen multiple references to https://gitlab.com/iidsgt/arv-helm but that returns a 404; where did that get moved to?
Peter Amstutz
@tetron
maybe @osmanwa knows?
Gudule JR
@GuduleJR_twitter
@tetron ok, thanks. I suppose I have to use Salt to automatically transfer the certificates to each server....
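A minimal sketch of the trust half of that distribution step, assuming Debian/Ubuntu hosts: besides copying each service's certificate and key into place, every machine also needs to trust the local CA itself (the CA file name is a placeholder):

$ sudo cp our-local-ca.crt /usr/local/share/ca-certificates/our-local-ca.crt
$ sudo update-ca-certificates   # adds the local CA to the system trust store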
Tom Schoonjans
@tschoonj

Hi all,

We are currently setting up a test Arvados cluster and I ran into some unusual behaviour regarding the default replication number: even though I have set this number to 1 in config.yml (Clusters.ClusterID.Collections.DefaultReplication), and this is confirmed in the output of http://ClusterID.our.domain.com/arvados/v1/config, it does not appear to be reflected in the Python SDK:

$ /usr/share/python3/dist/python3-arvados-python-client/bin/python
Python 3.8.10 (default, Jun  2 2021, 10:49:15) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import arvados
>>> arvados.api('v1')._rootDesc['defaultCollectionReplication']
2

Any thoughts on why this happens? Thanks in advance!!

Peter Amstutz
@tetron:matrix.org
[m]
good catch, let me see
Tom Schoonjans
@tschoonj
thanks Peter
Peter Amstutz
@tetron:matrix.org
[m]
oh, perhaps you have a cached discovery document?
Tom Schoonjans
@tschoonj
not sure what that is
I actually ran into this problem through arv-put
Peter Amstutz
@tetron:matrix.org
[m]
rm -r ~/.cache/arvados
Tom Schoonjans
@tschoonj
ok
this works!
thanks Peter!!
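For reference, a minimal sketch of the fix as applied in this thread: clear the cached discovery document and re-check the value from the Python SDK:

$ rm -r ~/.cache/arvados
$ python3 -c "import arvados; print(arvados.api('v1')._rootDesc['defaultCollectionReplication'])"
1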
Peter Amstutz
@tetron:matrix.org
[m]
by the way, we have an Arvados user group meeting today in half an hour
Tom Schoonjans
@tschoonj
I know, but won't be able to make it due to childcare :-(
Peter Amstutz
@tetron
@/all The user group video chat is happening soon https://forum.arvados.org/t/arvados-user-group-video-chat/47/8
Tom Schoonjans
@tschoonj

Hello again,

We are still setting up our test Arvados infrastructure, and now have a single VM with the API server, PostgreSQL, keepstore and keepproxy. Our issue now is with the keepproxy: the docs stipulate that the output of arv keep_service accessible should contain a reference to the keepproxy server. This works fine when running the command on the office network, but fails when trying it from home over VPN, as the output then contains the keepstore domain name instead.

I assume that this is related to the geo settings in the nginx config?

Thanks in advance!

Peter Amstutz
@tetron
yes
it is controlled by the geo setting
is the home VPN considered to be on the same network?
Tom Schoonjans
@tschoonj
apparently not :-)
I will ask our IT department what IP range we need to add to support our VPN connections
Peter Amstutz
@tetron
if you are outside the private network, you should get keepproxy from "keep_services accessible", if you are inside the private network, you should get the keepstore servers instead. it doesn't matter which one you get as long as it is reachable
so it sounds like either the keepstore needs to be reachable from the home VPN or your geo section needs to send the home VPN to keepproxy (which needs to be reachable?)
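A minimal sketch of the nginx geo block that makes this internal-vs-external decision, following the layout in the standard Arvados install docs; the file path and address ranges are placeholders:

$ grep -A3 'geo $external_client' /etc/nginx/conf.d/arvados-controller.conf
geo $external_client {
  default        1;   # external clients: keep_services accessible returns keepproxy
  10.20.0.0/16   0;   # internal ranges (e.g. the office network): returns the keepstore servers
}

Adding the home VPN's range as a 0 entry would mark those clients as internal, which only helps if they can actually reach the keepstore servers; otherwise leaving them external (and making keepproxy reachable) is the simpler route.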
Tom Schoonjans
@tschoonj
aha
so what we are seeing here is actually ok?
Peter Amstutz
@tetron
does it work?
does arv-get work?
Tom Schoonjans
@tschoonj
my colleague on VPN just tested arv-put and that fails
Peter Amstutz
@tetron
well so either all the keepstore servers, or the keepproxy server, need to be reachable by home VPN
so you need to figure that out first
the one particular advantage of using keepproxy in this case: if you have keepstore-level replication enabled, it'll handle replicating the upload at the keepproxy level instead of the client having to send the data twice
Tom Schoonjans
@tschoonj
ok thanks will investigate
Peter Amstutz
@tetron
however, if you aren't using keepstore-level replication (DefaultReplication: 1) and are instead using replication at a lower level (object store or RAID), then it doesn't matter
Tom Schoonjans
@tschoonj
yes, we are using DefaultReplication: 1 in this setup
Peter Amstutz
@tetron
ok then you just need to figure out where the VPN fits in your network topology
Tom Schoonjans
@tschoonj
Ok, we got it fixed now. The proxy is now used everywhere except when using arv on the Arvados VM itself
Andrey Kartashov
@portah
@tetron Does Arvados have a preinstalled version available in the cloud?
Peter Amstutz
@tetron
@portah to try it out or to do real workloads?
Andrey Kartashov
@portah
@tetron to check the API and try it with CWL
Peter Amstutz
@tetron
Andrey Kartashov
@portah
Thank you
Cibin S B
@cibinsb
Hi there, I have been trying to deploy Arvados on GKE and came across the following load balancer error from one of the Arvados services. How do I fix this problem?
cibin@cibins-beast-13-9380:~/EBI/arvados-k8s/charts/arvados$ kubectl get svc
NAME                         TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                       AGE
arvados-api-server           LoadBalancer   10.88.12.90    34.89.54.152   444:31588/TCP                 31m
arvados-keep-proxy           LoadBalancer   10.88.11.130   34.89.54.152   25107:31630/TCP               31m
arvados-keep-store           ClusterIP      None           <none>         25107/TCP                     31m
arvados-keep-web             LoadBalancer   10.88.5.66     34.89.54.152   9002:32663/TCP                31m
arvados-postgres             ClusterIP      10.88.12.232   <none>         5432/TCP                      31m
arvados-slurm-compute        ClusterIP      None           <none>         6818/TCP                      31m
arvados-slurm-controller-0   ClusterIP      10.88.14.128   <none>         6817/TCP                      31m
arvados-workbench            LoadBalancer   10.88.8.200    <pending>      443:30734/TCP,445:32051/TCP   31m
arvados-ws                   LoadBalancer   10.88.5.207    34.89.54.152   9003:30153/TCP                31m
kubernetes                   ClusterIP      10.88.0.1      <none>         443/TCP                       22h
cibin@cibins-beast-13-9380:~/EBI/arvados-k8s/charts/arvados$ kubectl describe service/arvados-workbench
Name:                     arvados-workbench
Namespace:                default
Labels:                   app=arvados
                          app.kubernetes.io/managed-by=Helm
                          chart=arvados-0.1.0
                          heritage=Helm
                          release=arvados
Annotations:              cloud.google.com/neg: {"ingress":true}
                          meta.helm.sh/release-name: arvados
                          meta.helm.sh/release-namespace: default
Selector:                 app=arvados-workbench
Type:                     LoadBalancer
IP Families:              <none>
IP:                       10.88.8.200
IPs:                      10.88.8.200
IP:                       34.89.54.152
Port:                     wb2  443/TCP
TargetPort:               443/TCP
NodePort:                 wb2  30734/TCP
Endpoints:                10.84.2.18:443
Port:                     wb  445/TCP
TargetPort:               445/TCP
NodePort:                 wb  32051/TCP
Endpoints:                10.84.2.18:445
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type     Reason                  Age                   From                Message
  ----     ------                  ----                  ----                -------
  Normal   EnsuringLoadBalancer    2m38s (x11 over 28m)  service-controller  Ensuring load balancer
  Warning  SyncLoadBalancerFailed  2m34s (x11 over 28m)  service-controller  Error syncing load balancer: failed to ensure load balancer: failed to create forwarding rule for load balancer (ae0291ffb3043451580fc197edd8a34e(default/arvados-workbench)): googleapi: Error 400: Invalid value for field 'resource.IPAddress': '34.89.54.152'. Specified IP address is in-use and would result in a conflict., invalid
Peter Amstutz
@tetron
@cure might know
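The failure at the bottom of the kubectl describe output is that the forwarding rule for arvados-workbench cannot be created because 34.89.54.152 is already held by the other LoadBalancer services. A minimal sketch of one way out, assuming each Service gets its own address (the address name and region are placeholders):

$ gcloud compute addresses create arvados-workbench-ip --region europe-west2
$ gcloud compute addresses describe arvados-workbench-ip --region europe-west2 --format='value(address)'
34.89.xx.yy

The workbench Service's spec.loadBalancerIP (or whichever chart value populates it) would then point at this new address instead of reusing the one the other services already hold.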
Tom Schoonjans
@tschoonj

Hi all,

We are testing the Arvados Slurm dispatcher and are running into trouble:

$ sudo journalctl -o cat -fu crunch-dispatch-slurm.service
{"level":"info","msg":"crunch-dispatch-slurm 2.2.2 started","time":"2021-10-07T15:36:08.672769209Z"}
Started Arvados Crunch Dispatcher for SLURM.
{"level":"fatal","msg":"error getting my token UUID: Get \"https://88d80-crunch-dispatcher-slurm-controller-dispatcher.dev.core.genomicsplc.com/arvados/v1/api_client_authorizations/current\": dial tcp 10.93.111.119:443: connect: connection refused","time":"2021-10-07T15:37:00.794084728Z"}
crunch-dispatch-slurm.service: Main process exited, code=exited, status=1/FAILURE
crunch-dispatch-slurm.service: Failed with result 'exit-code'.
crunch-dispatch-slurm.service: Scheduled restart job, restart counter is at 121.
Stopped Arvados Crunch Dispatcher for SLURM.
Starting Arvados Crunch Dispatcher for SLURM...
{"level":"info","msg":"crunch-dispatch-slurm 2.2.2 started","time":"2021-10-07T15:37:01.919705722Z"}
Started Arvados Crunch Dispatcher for SLURM.
{"level":"fatal","msg":"error getting my token UUID: Get \"https://88d80-crunch-dispatcher-slurm-controller-dispatcher.dev.core.genomicsplc.com/arvados/v1/api_client_authorizations/current\": dial tcp 10.93.111.119:443: connect: connection refused","time":"2021-10-07T15:37:54.030722405Z"}
crunch-dispatch-slurm.service: Main process exited, code=exited, status=1/FAILURE
crunch-dispatch-slurm.service: Failed with result 'exit-code'.
crunch-dispatch-slurm.service: Scheduled restart job, restart counter is at 122.
Stopped Arvados Crunch Dispatcher for SLURM.
Starting Arvados Crunch Dispatcher for SLURM...
{"level":"info","msg":"crunch-dispatch-slurm 2.2.2 started","time":"2021-10-07T15:37:55.167350562Z"}
Started Arvados Crunch Dispatcher for SLURM.

This is bizarre, as we are able to use arv api_client_authorization current without problems from the VM running the dispatcher, when using the root API token. Any thoughts? Thanks!
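A minimal diagnostic sketch for this kind of failure: "connection refused" means nothing is listening at the controller address the dispatcher was given, so checking that URL (copied from the log above) from the dispatch VM, and comparing it against the controller URL in config.yml, usually narrows it down:

$ curl -sk https://88d80-crunch-dispatcher-slurm-controller-dispatcher.dev.core.genomicsplc.com/arvados/v1/config >/dev/null && echo reachable || echo unreachable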

Tom Schoonjans
@tschoonj
Please ignore, our config was wrong
Ward Vandewege
@cure
ok!
Callum-Joyce
@Callum-Joyce

Hello, I am looking at using SLURM dispatch with @tschoonj.

We have tried running a job with the example command provided here: https://doc.arvados.org/v2.2/install/crunch2-slurm/install-test.html but get hit with this error:

Error: //railsapi.internal/arvados/v1/container_requests: 422 Unprocessable Entity: #<ArvadosModel::UnresolvableContainerError: docker image "arvados/jobs:latest" not found> (req-ecdzw2wz1qq5r24xfuus)

The documentation here: https://doc.arvados.org/v2.2/api/methods/container_requests.html suggests that the "container_image" property should be set to the PDH of a collection containing the image, but in the example script mentioned above it is set to "arvados/jobs:latest", which is obviously not a PDH.

Could you advise on exactly what the value should be here? If putting the image into a collection is necessary, will we need to do this for every image we need to use in the future? Thanks in advance.
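For context, a minimal sketch of how a symbolic name like arvados/jobs:latest becomes resolvable: uploading the image once with arv-keepdocker stores it in a collection and registers the repository and tag, after which either the name or the collection's PDH can be used as container_image (my reading of the docs, so treat the exact invocation as an assumption):

$ docker pull arvados/jobs:latest        # make sure the image exists locally first
$ arv-keepdocker arvados/jobs latest     # upload it to Keep and register the name and tag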