Q&A, support, and general discussion about the Arvados project; for development, see https://gitter.im/arvados/development
Hi all,
We are currently setting up a test Arvados cluster and I ran into some unusual behaviour regarding the default replication number: even though I have set this number to 1 in config.yml (Clusters.ClusterID.Collections.DefaultReplication), and this is confirmed in the output of http://ClusterID.our.domain.com/arvados/v1/config, it is not reflected in the Python SDK:
$ /usr/share/python3/dist/python3-arvados-python-client/bin/python
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import arvados
>>> arvados.api('v1')._rootDesc['defaultCollectionReplication']
2
Any thoughts on why this happens? Thanks in advance!!
arv-put
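Not sure yet why the discovery document still advertises 2, but as a stopgap you can ask for the replication level explicitly instead of relying on that default: arv-put accepts a --replication flag, and the Python SDK takes replication_desired on a Collection. A minimal sketch (assuming the usual ARVADOS_API_HOST / ARVADOS_API_TOKEN environment variables are set; file name and collection name are just examples):
import arvados
import arvados.collection

# Request replication 1 explicitly rather than relying on the
# defaultCollectionReplication value from the discovery document.
api = arvados.api('v1')
print(api._rootDesc['defaultCollectionReplication'])  # still reports 2 here

coll = arvados.collection.Collection(api_client=api, replication_desired=1)
with coll.open('hello.txt', 'w') as f:
    f.write('test\n')
coll.save_new(name='replication test')  # record is created with replication_desired = 1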
Hello again,
We are still setting up our test Arvados infrastructure and now have a single VM running the API server, PostgreSQL, keepstore and keepproxy. Our issue now is with the keepproxy: the docs stipulate that the output of arv keep_service accessible should contain a reference to the keepproxy server. This works fine when we run the command on the office network, but when we try it from home over VPN the output contains the keepstore domain name instead.
I assume that this is related to the geo settings in the nginx config?
Thanks in advance!
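Your assumption sounds plausible: in the standard setup the nginx geo block decides which client addresses count as "external" (by setting an X-External-Client header on the proxied request, if I remember correctly), and keep_services/accessible returns the keepproxy to external clients but the keepstores to internal ones, so a VPN range classified differently from the office network would produce exactly this. To compare the two cases, the same query can be run through the Python SDK (a small sketch, assuming ARVADOS_API_HOST / ARVADOS_API_TOKEN point at your test cluster); run it once from the office and once over the VPN:
import arvados

# List the Keep services the API tells this client to use; the answer can
# differ depending on which network the request appears to come from.
api = arvados.api('v1')
for svc in api.keep_services().accessible().execute()['items']:
    print(svc['service_type'], svc['service_host'], svc['service_port'], svc['service_ssl_flag'])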
cibin@cibins-beast-13-9380:~/EBI/arvados-k8s/charts/arvados$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
arvados-api-server LoadBalancer 10.88.12.90 34.89.54.152 444:31588/TCP 31m
arvados-keep-proxy LoadBalancer 10.88.11.130 34.89.54.152 25107:31630/TCP 31m
arvados-keep-store ClusterIP None <none> 25107/TCP 31m
arvados-keep-web LoadBalancer 10.88.5.66 34.89.54.152 9002:32663/TCP 31m
arvados-postgres ClusterIP 10.88.12.232 <none> 5432/TCP 31m
arvados-slurm-compute ClusterIP None <none> 6818/TCP 31m
arvados-slurm-controller-0 ClusterIP 10.88.14.128 <none> 6817/TCP 31m
arvados-workbench LoadBalancer 10.88.8.200 <pending> 443:30734/TCP,445:32051/TCP 31m
arvados-ws LoadBalancer 10.88.5.207 34.89.54.152 9003:30153/TCP 31m
kubernetes ClusterIP 10.88.0.1 <none> 443/TCP 22h
cibin@cibins-beast-13-9380:~/EBI/arvados-k8s/charts/arvados$ kubectl describe service/arvados-workbench
Name: arvados-workbench
Namespace: default
Labels: app=arvados
app.kubernetes.io/managed-by=Helm
chart=arvados-0.1.0
heritage=Helm
release=arvados
Annotations: cloud.google.com/neg: {"ingress":true}
meta.helm.sh/release-name: arvados
meta.helm.sh/release-namespace: default
Selector: app=arvados-workbench
Type: LoadBalancer
IP Families: <none>
IP: 10.88.8.200
IPs: 10.88.8.200
IP: 34.89.54.152
Port: wb2 443/TCP
TargetPort: 443/TCP
NodePort: wb2 30734/TCP
Endpoints: 10.84.2.18:443
Port: wb 445/TCP
TargetPort: 445/TCP
NodePort: wb 32051/TCP
Endpoints: 10.84.2.18:445
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 2m38s (x11 over 28m) service-controller Ensuring load balancer
Warning SyncLoadBalancerFailed 2m34s (x11 over 28m) service-controller Error syncing load balancer: failed to ensure load balancer: failed to create forwarding rule for load balancer (ae0291ffb3043451580fc197edd8a34e(default/arvados-workbench)): googleapi: Error 400: Invalid value for field 'resource.IPAddress': '34.89.54.152'. Specified IP address is in-use and would result in a conflict., invalid
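The SyncLoadBalancerFailed event above is GCP refusing to create another forwarding rule for 34.89.54.152, because several LoadBalancer services are asking for the same address; whether sharing one IP can work at all depends on it being a reserved static address with non-overlapping ports, so I'm not certain which fix applies here. To see at a glance which services request the address and which actually got it, here is a rough sketch using the kubernetes Python client (assuming it is installed and your kubeconfig points at this cluster):
from kubernetes import client, config

# List LoadBalancer services in the default namespace together with the
# external IP each one requests (spec) and the IP actually assigned (status).
config.load_kube_config()
v1 = client.CoreV1Api()
for svc in v1.list_namespaced_service('default').items:
    if svc.spec.type != 'LoadBalancer':
        continue
    ingress = svc.status.load_balancer.ingress or []
    print(svc.metadata.name,
          'requested:', svc.spec.load_balancer_ip,
          'assigned:', [i.ip for i in ingress])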
Hi all,
We are testing the Arvados Slurm dispatcher (crunch-dispatch-slurm) and are running into trouble:
$ sudo journalctl -o cat -fu crunch-dispatch-slurm.service
{"level":"info","msg":"crunch-dispatch-slurm 2.2.2 started","time":"2021-10-07T15:36:08.672769209Z"}
Started Arvados Crunch Dispatcher for SLURM.
{"level":"fatal","msg":"error getting my token UUID: Get \"https://88d80-crunch-dispatcher-slurm-controller-dispatcher.dev.core.genomicsplc.com/arvados/v1/api_client_authorizations/current\": dial tcp 10.93.111.119:443: connect: connection refused","time":"2021-10-07T15:37:00.794084728Z"}
crunch-dispatch-slurm.service: Main process exited, code=exited, status=1/FAILURE
crunch-dispatch-slurm.service: Failed with result 'exit-code'.
crunch-dispatch-slurm.service: Scheduled restart job, restart counter is at 121.
Stopped Arvados Crunch Dispatcher for SLURM.
Starting Arvados Crunch Dispatcher for SLURM...
{"level":"info","msg":"crunch-dispatch-slurm 2.2.2 started","time":"2021-10-07T15:37:01.919705722Z"}
Started Arvados Crunch Dispatcher for SLURM.
{"level":"fatal","msg":"error getting my token UUID: Get \"https://88d80-crunch-dispatcher-slurm-controller-dispatcher.dev.core.genomicsplc.com/arvados/v1/api_client_authorizations/current\": dial tcp 10.93.111.119:443: connect: connection refused","time":"2021-10-07T15:37:54.030722405Z"}
crunch-dispatch-slurm.service: Main process exited, code=exited, status=1/FAILURE
crunch-dispatch-slurm.service: Failed with result 'exit-code'.
crunch-dispatch-slurm.service: Scheduled restart job, restart counter is at 122.
Stopped Arvados Crunch Dispatcher for SLURM.
Starting Arvados Crunch Dispatcher for SLURM...
{"level":"info","msg":"crunch-dispatch-slurm 2.2.2 started","time":"2021-10-07T15:37:55.167350562Z"}
Started Arvados Crunch Dispatcher for SLURM.
This is bizarre, as we are able to run arv api_client_authorization current without problems from the VM running the dispatcher when using the root API token. Any thoughts? Thanks!
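Since the CLI works with the root token, it may be worth repeating the exact request the dispatcher makes at startup, using the API host and token from the dispatcher's own configuration, to separate a token problem from a plain networking one (connection refused to 10.93.111.119:443 looks like the latter, e.g. nothing listening on that port from that VM, or a firewall in the way). A rough sketch: the host below is the one from your log, the token is a placeholder you would swap for the dispatcher's token.
import arvados

# Reproduce the api_client_authorizations/current call that
# crunch-dispatch-slurm fails on at startup.
api = arvados.api(
    'v1',
    host='88d80-crunch-dispatcher-slurm-controller-dispatcher.dev.core.genomicsplc.com',
    token='paste-the-dispatcher-token-here',  # placeholder, not a real token
)
print(api.api_client_authorizations().current().execute()['uuid'])
If this also gets connection refused, the token is probably fine and the issue is reachability of the controller on port 443 from that VM.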