Peter Amstutz
@tetron
the arvados user group meeting is happening now: https://forum.arvados.org/t/arvados-user-group-video-chat/47/9
Peter Amstutz
@tetron:matrix.org
[m]
@room Arvados 2.3 has been released! https://forum.arvados.org/t/arvados-2-3-0-released/95
Brad Chapman
@chapmanb
When using the Python API, what is the right function call to update a collection with new files replacing the old ones? I'm happily creating initial containers using save_new, but when trying to create a new version of a collection with new files using a similar approach and calling save I get the dreaded [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate which I suspect is just telling me I'm not doing it the right way. Are there any cookbook examples of this? I'm having lots of fun learning my way around the API and enjoying using the fabulous new workbench, cool to see all the awesome work you've been doing.
Peter Amstutz
@tetron
are you using playground or arvbox?
@chapmanb
Brad Chapman
@chapmanb
Sorry for the lack of context, this is on playground.
Peter Amstutz
@tetron
so "certificate verify failed" doesn't have anything to do with using the API correctly, it's an SSL issue
but it should just work
where are you running it from?
the playground shell? your laptop? somewhere else?
Brad Chapman
@chapmanb
This is from my laptop. The weird thing is that save_new works fine, but then using save gives that error in the same environment. So I figured I was using the API wrong. If there is a code snippet for how to create a new version of a collection with updated files, I can dig from there.
Ward Vandewege
@cure
@chapmanb hi! hmm, that error is weird. Clearly we need to add such an example to the cookbook, @tetron are you able to help or should I give it a go?
Peter Amstutz
@tetron
sorry, stepped away, back for a bit
@chapmanb it doesn't make sense for save_new() to work and save() to not work if they are communicating with the same API server
so that's really odd
save() is the right API call to commit changes to a collection
Peter Amstutz
@tetron
save_new() creates a new collection and save() saves changes to an existing collection
Brad Chapman
@chapmanb
Thanks y'all, sorry I had to go pick up kids. No rush on this. I added you to the repo and my code is here if I'm doing anything obviously wrong: https://github.com/chapmanb/veggiedb-standardize/blob/d211606c33291bde9ada7942075d380c10cb2179/veggiedb/arvadosio.py#L75 It's the same logic for the actual file addition, just how I initialize and save the collection if it exists versus being new. That's why I figured the error was caused by bad coding, not an actual configuration issue.
Ward Vandewege
@cure
@chapmanb no worries, I've been poking at this a bit. The SSL error is very odd; our best guess is currently that something is going on with the apiconfig object: either it is not properly propagated somehow in the save call (that would be an sdk bug, I'm trying to replicate it), or, could it be apiconfig is not what you expect in some circumstances? Maybe you could add a debug print that prints it out when you get the SSL error.
Ward Vandewege
@cure
@chapmanb the other thing that could be going on here is a pycurl/curl version that is out of date. The "unable to get local issuer certificate" suggests that.
but I don't know why you'd only see it on 'save' and not 'save_new'
Brad Chapman
@chapmanb
Thanks Ward for digging, I'm glad it's not just me that's confused. I've got this all in a pipenv so pretty isolated and repeatable with the python requirement. pycurl is pycurl==7.44.1. apiconfig is consistent between both since the code is the same. I'll try to poke more but it's helpful to hear I'm not doing anything obviously wrong in the code.
Ward Vandewege
@cure
@chapmanb huh I really need to track this down then, I was hoping that apiconfig was screwed up somehow
Brad Chapman
@chapmanb
Thanks Ward, I'm looking at the connection code as well and realizing how different save and save_new are under the covers so will do some debugging. I purposely tried to keep the code on my side as consistent as possible but will also double check that. I appreciate the help.
Ward Vandewege
@cure
I haven't been able to replicate the problem, yet
Peter Amstutz
@tetron:matrix.org
[m]
@chapmanb: yea, can you be more specific about how you run it and under what circumstances you get the error
actually, even better, can you shave it down to a test case
if you can just make a tiny script that just makes a collection and uses save/save_new and see if it succeeds or fails
and then build that back up until we can isolate the issue
stack traces would also be helpful
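A minimal sketch of the kind of reduced test case suggested above. It assumes the `arvados` Python SDK is installed and that credentials come from the usual `ARVADOS_API_HOST` / `ARVADOS_API_TOKEN` environment variables; the file name, the collection name, and the `attempt` helper are hypothetical and only for illustration. Each step records its own stack trace rather than stopping at the first failure, so the output shows exactly which call breaks.

```python
import traceback


def attempt(label, action):
    """Run action() and return (label, succeeded, stack_trace_text)."""
    try:
        action()
        return (label, True, "")
    except Exception:
        # Keep the full stack trace so it can be pasted into a bug report.
        return (label, False, traceback.format_exc())


def save_new_then_save(collection, filename="example.txt"):
    """Write a file and save_new(), then overwrite the file and save().

    `collection` is any object offering the open/save_new/save methods of
    arvados.collection.Collection (a fake works too, for dry runs).
    """
    def write(text):
        with collection.open(filename, "w") as f:
            f.write(text)

    return [
        attempt("write v1", lambda: write("first contents\n")),
        attempt("save_new", lambda: collection.save_new(name="save-test")),
        attempt("write v2", lambda: write("second contents\n")),
        attempt("save", lambda: collection.save()),
    ]


def run_against_arvados():
    """Not called automatically: needs the arvados SDK and valid credentials."""
    import arvados.collection  # imported lazily so the helpers load without it

    results = save_new_then_save(arvados.collection.Collection())
    for label, ok, trace in results:
        print(label, "ok" if ok else "FAILED")
        if trace:
            print(trace)
```

Calling `run_against_arvados()` from the same environment that shows the SSL error should indicate whether the failure appears at `save_new`, at the overwrite, or only at `save`.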
Ward Vandewege
@cure
@chapmanb is your venv very old?
or, older than the end of September 2021?
(I have this sneaking suspicion that the root cert expiry for Let's Encrypt has something to do with this; that happened Sep 30th - cf. https://letsencrypt.org/docs/dst-root-ca-x3-expiration-september-2021/)
venv is kind of nasty, it copies a bunch of system files into your virtualenv at creation.
I see certifi's cacert.pem and httplib2's cacerts.txt also
I think one of those in your venv could be out of date
Peter Amstutz
@tetron:matrix.org
[m]
would that affect any of our python packages that are virtualenvs?
Ward Vandewege
@cure
we haven't seen this though.
Peter Amstutz
@tetron:matrix.org
[m]
right
just speculating
Ward Vandewege
@cure
but maybe we should compare the versions of the httplib2 and certifi python packages in your venv, @chapmanb because those 2 come with a cacert file.
I've also seen cert breakage caused by this because of out of date libssl and other system packages
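One way to make that comparison concrete is sketched below, using only the Python standard library. It reports the installed versions of `certifi`, `httplib2`, and `pycurl`, locates the CA bundle files mentioned above inside the active environment, and checks whether a bundle still mentions the expired root. The function names are hypothetical, and the label check is only a rough hint: it assumes the bundle carries human-readable labels such as "DST Root CA X3" next to each certificate.

```python
import os
from importlib import metadata, util


def package_versions(names=("certifi", "httplib2", "pycurl")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def bundled_ca_files():
    """Return paths of CA bundles shipped inside certifi and httplib2, if present."""
    candidates = {"certifi": "cacert.pem", "httplib2": "cacerts.txt"}
    found = []
    for package, filename in candidates.items():
        spec = util.find_spec(package)
        if spec is None or not spec.submodule_search_locations:
            continue  # package not installed in this environment
        for location in spec.submodule_search_locations:
            path = os.path.join(location, filename)
            if os.path.isfile(path):
                found.append(path)
    return found


def bundle_mentions(path, label="DST Root CA X3"):
    """True if the bundle's text contains the given label (a rough hint only)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return label in f.read()


def report():
    """Print versions and bundle locations for the current environment."""
    for name, version in package_versions().items():
        print(name, version or "not installed")
    for path in bundled_ca_files():
        print(path, "mentions DST Root CA X3:", bundle_mentions(path))
```

Running `report()` inside the pipenv and inside a freshly created environment gives two outputs that can be compared side by side.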
Brad Chapman
@chapmanb
I'll try to reduce it to a repeatable test case so we can debug, and will look at the pipenv bits to see if I can figure out when it does and doesn't work. Thanks for all the tips and ideas.
Brad Chapman
@chapmanb
Here's a self contained example that reproduces the error on my setup, and works around it by calling an extra save: https://gist.github.com/chapmanb/42d2a81321a04350974daed85d76c4d1 The trigger for the error seems to be calling save when you overwrite an existing file. I had included a remove previously but it looks like you also need to call save after that and before starting to write to the file. So this works around the problem, but doesn't really solve my underlying issue. When I update this way the collection updates the file, but doesn't get a new version so is not tracking the change. The versioning and ability to see the previous sequence is really what I was hoping for, so could still use the pointers on the right way to do that through the API. Thanks so much for all the help working through this and debugging.
Tom Schoonjans
@tschoonj
Hi all,

We are seeing an issue with the arvados-python-client's arv-mount command in release 2.3:

mkdir -p /tmp/arvados-mount
source ~/.arv_key
arv-mount --replace --read-only --all /tmp/arvados-mount --exec
mkdir -p /home/manager/ontology_files
cp /tmp/arvados-mount/by_id/glon1-4zz18-6uskmpf7bz9hhz5/* /home/manager/ontology_files
cp /tmp/arvados-mount/by_id/glon1-4zz18-3986qs7m7ydzb0n/some-other-file.bin /home/manager/ontology_files
cp: failed to close '/tmp/arvados-mount/by_id/glon1-4zz18-3986qs7m7ydzb0n/some-other-file.bin': Input/output error

We do not observe this failure with 2.2. Any thoughts on why this is happening?

Lucas Di Pentima
@ldipenti
Hi @chapmanb, I’ve tested your example script with an older and the latest arvados-python-client and couldn’t reproduce the problem you’re having. I had to make some changes to it, the main one being that get_connection_config() isn’t available, so I just let the Collection class do its thing without passing an api object to it. Maybe the problem is related to that? You can look at the modified script here: https://pastebin.com/6AQk6mxy
Lucas Di Pentima
@ldipenti
@tschoonj It seems that the mount is being done in read-only mode?
Brad Chapman
@chapmanb
Thanks Lucas, the get_connection_config was only meant to be a stub to pull the authentication tokens from somewhere. We grab these from a file to avoid needing to inject environmental variables. Most importantly though, did running it create a new version of the collection or is it still on version 1 after your successful run? That's the main thing I wanted to accomplish with the API and am not so worried about debugging the error since we now have a workaround. I really appreciate the help.