    Peter Amstutz
    @tetron
    can you find the log where keepstore restarted
    my suspicion is that it isn't binding to the right IP address
    Ibrahim Cagri Kurt
    @ibrahimkurt
    ● keepstore.service - Arvados Keep Storage Daemon
       Loaded: loaded (/lib/systemd/system/keepstore.service; enabled; vendor preset: enabled)
       Active: active (running) since Fri 2020-05-29 02:14:45 UTC; 11s ago
         Docs: https://doc.arvados.org/
     Main PID: 30396 (keepstore)
        Tasks: 6 (limit: 4369)
       CGroup: /system.slice/keepstore.service
               └─30396 /usr/bin/keepstore
    
    May 29 02:14:45 arv-ubuntu-18-04-1 systemd[1]: Starting Arvados Keep Storage Daemon...
    May 29 02:14:45 arv-ubuntu-18-04-1 keepstore[30396]: {"PID":30396,"level":"info","msg":"keepstore 2.0.2 starting, pid 30396","time":"2020-05-29T02:14:45.3773752
    May 29 02:14:45 arv-ubuntu-18-04-1 keepstore[30396]: {"PID":30396,"level":"info","msg":"started volume arvad-nyw5e-000000000000000 ([UnixVolume /mnt/local-disk]
    May 29 02:14:45 arv-ubuntu-18-04-1 keepstore[30396]: {"PID":30396,"level":"info","msg":"started volume arvad-nyw5e-000000000000002 ([UnixVolume /mnt/network-att
    May 29 02:14:45 arv-ubuntu-18-04-1 keepstore[30396]: {"Listen":"127.0.0.10:25107","PID":30396,"Service":"keepstore","URL":"http://keep0.arvad.elmgenomics.com:25
    May 29 02:14:45 arv-ubuntu-18-04-1 systemd[1]: Started Arvados Keep Storage Daemon.
    is this what you asked for?
    Peter Amstutz
    @tetron
    yes
    oh
    wait a minute
    you only have one instance of keepstore
    you run one keepstore
    it is running as keep0
    there is nothing running as keep1
    either you need to start a second copy of keepstore, or you need to reconsider your volume configuration
    Ibrahim Cagri Kurt
    @ibrahimkurt
    we were wondering why keep1 is not shown as up and running there in the log!
    how could we start a second copy of keepstore?
    Peter Amstutz
    @tetron
    maybe you could explain what you want to happen at the storage layer? right now it looks like you will store blocks both on local disk and on a mounted file system
    but when you read back you would randomly read some blocks from local disk and some blocks from NFS
    which is maybe not what you intended
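    For context, in the 2.x cluster config each Keep volume is attached to specific keepstore servers via AccessViaHosts in /etc/arvados/config.yml; a rough sketch of what a keep0/keep1 split could look like (the keep1 URL and the NFS root are placeholders, not taken from this cluster's config):
    Clusters:
      arvad:
        Volumes:
          arvad-nyw5e-000000000000000:
            AccessViaHosts:
              "http://keep0.arvad.elmgenomics.com:25107": {}
            Driver: Directory
            DriverParameters:
              Root: /mnt/local-disk
          arvad-nyw5e-000000000000002:
            AccessViaHosts:
              "http://keep1.arvad.elmgenomics.com:25107": {}   # placeholder keep1 URL
            Driver: Directory
            DriverParameters:
              Root: /mnt/...                                   # NFS mount; path truncated in the log above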
    Peter Amstutz
    @tetron
    to start a second copy of keepstore you would need to copy the systemd unit file and just call it something like keepstore_keep1
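    In practice that would be roughly the following (a sketch only; the unit name keepstore_keep1 follows Peter's suggestion, and the copied unit still has to be pointed at the keep1 listen address/config):
    sudo cp /lib/systemd/system/keepstore.service /etc/systemd/system/keepstore_keep1.service
    # edit the new unit so this instance serves the keep1 InternalURL instead of keep0
    sudo systemctl daemon-reload
    sudo systemctl enable --now keepstore_keep1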
    Ibrahim Cagri Kurt
    @ibrahimkurt
    thanks a lot Peter for diagnosing and also for the explanations. I need to discuss our storage needs in more detail with @AhmetBahcivan once it is daytime for him. One more thing: remember I said arv-put worked fine with a sample file? When I try to download that same file in the browser at https://workbench.arvad.elmgenomics.com/ I get nginx's 502 Bad Gateway error page. Do you think this is also because keep1 doesn't exist?
    Peter Amstutz
    @tetron
    ah, that is supposed to direct you to keep-web
    so it sounds like the keep-web configuration has an issue
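    A quick way to narrow down a 502 like that (assuming the stock packaged service name keep-web and the 2.x config keys) is to check that keep-web is actually running and that the nginx upstream for the download host matches its listen address:
    sudo systemctl status keep-web
    sudo journalctl -u keep-web -n 50
    # compare the WebDAV / WebDAVDownload entries in /etc/arvados/config.yml with the nginx proxy_pass target
    grep -A 3 -E 'WebDAV|WebDAVDownload' /etc/arvados/config.yml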
    Dante Tsang
    @dantetwc
    Hi guys, any thoughts on getting error 422 when uploading large files to S3-backed keep?
    CONTAINER ID        NAME                                                     CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
    7608b64f2479        ecs-agent                                                0.32%               20.27MiB / 30.96GiB   0.06%               0B / 0B             0B / 0B             17
    850d35474c20        ecs-arvados_docker-10-keepstore-a0e2bc9c93bab3d74d00     86.06%              88.28MiB / 30.96GiB   0.28%               8.67GB / 192MB      0B / 0B             14
    4b3aa6ecb95b        ecs-arvados_docker-10-nginx-8894db95f5cf908a0400         0.16%               6.75MiB / 512MiB      1.32%               430MB / 427MB       0B / 0B             6
    adc40a06dde7        ecs-arvados_docker-10-ws-f0afc5c4a78587a2e801            0.00%               19.51MiB / 512MiB     3.81%               131kB / 76.9kB      0B / 0B             15
    674131f29855        ecs-arvados_docker-10-keepstore1-b2d9ccfcd5a8b499fc01    0.01%               84.09MiB / 30.96GiB   0.27%               1.53GB / 29.8MB     0B / 0B             14
    59952e7b6b64        ecs-arvados_docker-10-sso-9c95bedaadc2c5d7c801           0.00%               91.41MiB / 512MiB     17.85%              35.5kB / 16.7kB     0B / 0B             39
    77674aade031        ecs-arvados_docker-10-keepweb-eaddbcf5eafde9c7c101       0.23%               10.47MiB / 512MiB     2.05%               146kB / 38.9kB      0B / 0B             13
    c59b63b4505f        ecs-arvados_docker-10-keepproxy-9483ffdbb3f29fbeff01     0.00%               63.78MiB / 30.96GiB   0.20%               162MB / 322MB       0B / 0B             12
    0a122cfd0841        ecs-arvados_docker-10-api-f0c8b4eb9c81b1897700           93.27%              449.4MiB / 30.96GiB   1.42%               1.65GB / 275MB      0B / 0B             72
    f2a07d4297a7        ecs-arvados_docker-10-keepbalance-faf99ea38dc6d4f53600   92.43%              5.863MiB / 512MiB     1.15%               396MB / 15.3MB      0B / 0B             4
    0814d17a04bb        ecs-arvados_docker-10-workbench-a6ab8384c0e0ecdd6f00     0.00%               146.2MiB / 512MiB     28.56%              339kB / 1.89MB      0B / 0B             39
    pretty normal resource usage for the docker containers
    Lucas Di Pentima
    @ldipenti
    Hi Dante, were you able to track down the error at the log files? What are you using to upload the files?
    Dante Tsang
    @dantetwc
    @ldipenti Trying arv-put and WebDAV; both of them report errors randomly when uploading large files
    Dante Tsang
    @dantetwc
    and there are no errors in the keep-web/keepstore logs
    Lucas Di Pentima
    @ldipenti
    Hi @dantetwc , do you get a code like req-xxxxxxxxxx along with the 422? Have you checked the API server’s log?
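    That req- code is a request ID that the Arvados services include in their logs, so it can be searched across them; the paths below are just typical packaged-install locations, not necessarily this setup's:
    # replace <request-id> with the actual req-... value reported with the error
    grep -r '<request-id>' /var/log/nginx/ /var/www/arvados-api/current/log/ 2>/dev/null
    # or, for containerized services:
    docker logs <api-container> 2>&1 | grep '<request-id>'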
    Peter Amstutz
    @tetron
    @GuduleJR_twitter is this where you work? http://www.uusmb.unam.mx/
    Evan Clark
    @djevo1_gitlab
    Is there a simple way to upload data to arvados via cloud that doesn’t require docker? My assumption is that arv keep put requires docker in order to copy data but I may be incorrect in that assumption
    Peter Amstutz
    @tetron
    arv-put does not require Docker. you can "pip install" into a virtualenv
    or install arv-put from packages
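    For example, roughly (assuming Python 3 and the arvados-python-client package, which is where arv-put lives):
    python3 -m venv ~/arv-client
    ~/arv-client/bin/pip install arvados-python-client
    ~/arv-client/bin/arv-put --help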
    Dante Tsang
    @dantetwc
    {"PID":1,"RequestID":"req-5wl74rw1buqprvgscy7h","level":"info","msg":"response","remoteAddr":"172.17.0.9:56544","reqBytes":67108864,"reqForwardedFor":"","reqHost":"keep0.arv01.cogenesis.ai:25017","reqMethod":"PUT","reqPath":"b84c2853bac4605be24d87a102e61834","reqQuery":"","respBody":"unexpected EOF\n","respBytes":15,"respStatus":"Internal Server Error","respStatusCode":500,"time":"2020-06-02T07:20:21.802992873Z","timeToStatus":14.999702,"timeTotal":14.999716,"timeWriteBody":0.000014}
    {"PID":1,"RequestID":"req-5wl74rw1buqprvgscy7h","level":"info","msg":"request","remoteAddr":"172.17.0.9:57152","reqBytes":67108864,"reqForwardedFor":"","reqHost":"keep1.arv01.cogenesis.ai:25018","reqMethod":"PUT","reqPath":"b84c2853bac4605be24d87a102e61834","reqQuery":"","time":"2020-06-02T07:20:06.803940883Z"}
    Lucas Di Pentima
    @ldipenti
    Hi @dantetwc, that error sounds familiar; maybe you bumped into https://dev.arvados.org/issues/16393, if you’re trying to upload from a slow client. Can you confirm? This fix will be included in the upcoming 2.0.3 release.
    Gudule JR
    @GuduleJR_twitter
    Hi all. As far as I can test on my fresh installation, the only way to trash a project is via the arv CLI command; it is not possible from inside the Workbench website. Right?
    Peter Amstutz
    @tetron
    projects have a "trash" button on workbench
    so you should be able to do it with workbench
    Gudule JR
    @GuduleJR_twitter
    (attached screenshot: Capture d’écran du 2020-06-02 15-53-12.png)
    the only trash button, in the upper right corner, is for showing the trash contents...
    Peter Amstutz
    @tetron
    you have to delete it from the parent
    go to home and then "subprojects" tab
    Gudule JR
    @GuduleJR_twitter
    You're right, sorry...
    Dante Tsang
    @dantetwc
    2020-06-03 09:59:43 arvados.arvados_fuse[24893] DEBUG: del_entry on inode 1719 with refcount 2
    2020-06-03 09:59:43 arvados.arvados_fuse[24893] DEBUG: collection notify mod <arvados.collection.Subcollection object at 0x7f640427a790> .dbNSFP.txt.gz.v9C6e5 (<arvados.arvfile.ArvadosFile object at 0x7f63ec3dbe50>, <arvados.arvfile.ArvadosFile object at 0x7f63ec3dbe50>)
    2020-06-03 09:59:43 arvados.arvados_fuse[24893] DEBUG: collection notify mod <arvados.collection.Subcollection object at 0x7f640427a790> .dbNSFP.txt.gz.v9C6e5 (<arvados.arvfile.ArvadosFile object at 0x7f63ec3dbe50>, <arvados.arvfile.ArvadosFile object at 0x7f63ec3dbe50>)
    2020-06-03 09:59:43 arvados.arvados_fuse[24893] ERROR: Keep write error: Error writing some blocks: block afb01e18f28fd083ae2ce47a1dba6a60+67108864 raised KeepWriteError (failed to write afb01e18f28fd083ae2ce47a1dba6a60 after 2 attempts (wanted 2 copies but wrote 0): service https://keep.arv01.cogenesis.ai:443/ responded with 413 HTTP/1.1 100 Continue
      HTTP/1.1 413 Request Entity Too Large); block 9fe190511a102b6b956308cea96b5975+67108864 raised KeepWriteError (failed to write 9fe190511a102b6b956308cea96b5975 after 2 attempts (wanted 2 copies but wrote 0): service https://keep.arv01.cogenesis.ai:443/ responded with 413 HTTP/1.1 100 Continue
      HTTP/1.1 413 Request Entity Too Large)
       unique: 668534, error: -5 (Input/output error), outsize: 16
    unique: 668536, opcode: FORGET (2), nodeid: 1719, insize: 48, pid: 0
    2020-06-03 09:59:43 arvados.arvados_fuse[24893] DEBUG: arv-mount forget: inode 1719 nlookup 2 ref_count 2
    @ldipenti it seems keepstore randomly fails when doing reads/writes of large files
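    The 413 in that trace is coming back from https://keep.arv01.cogenesis.ai:443/, i.e. from the proxy layer rather than keepstore itself; if nginx terminates TLS there, its default 1 MB request-body limit would reject 64 MiB Keep blocks, so something along these lines is usually needed in that server block (an assumption about this setup, not taken from its config):
    # nginx server block that proxies keep.arv01.cogenesis.ai to keepproxy
    client_max_body_size 64m;   # Keep blocks are up to 64 MiB; nginx defaults to 1m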
    Lucas Di Pentima
    @ldipenti
    @dantetwc Can you confirm that this only happens when using keep from “the outside” (aka using keep through keepproxy)? Does it happen, for example, if you upload data from a host on the same network as arvados? That’s an important detail because internal clients are routed directly to keep, while outside/external clients go through keepproxy. The issue I mentioned yesterday is related to that: slow clients accessing keep through keepproxy sometimes don’t transfer a whole 64 MB block within the maximum time allowed, so keepproxy’s attempt to write the block fails and the client retries until it eventually works; that’s why the problem seems random.
    If this problem happens to you while using Keep on the same network as arvados, that may indicate another problem.
    Gudule JR
    @GuduleJR_twitter
    Hi all. Playing around with my infrastructure, I noticed that the Admin documentation is wrong on the Monitoring -> Metrics page: I have to use URLs like "http://ip.of.services:25107/metrics" and "http://ip.of.services:25107/_health/ping", that is, without SSL encryption...
    Gudule JR
    @GuduleJR_twitter
    Ok, that's only the case for the keepstore server. For my own API server, I need to use "https://api.name.server/_health/ping"
    Ward Vandewege
    @cure
    @GuduleJR_twitter it depends on the hostname of the service, indeed
    Gudule JR
    @GuduleJR_twitter
    @cure Yes, you're right. The curl command line is specific to the keepstore server; for other services the URL changes.
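    For completeness, both the /metrics and /_health/ping endpoints also expect the cluster's ManagementToken regardless of the scheme; roughly (hostnames as in the examples above, with the token value assumed to be in $TOKEN):
    curl -H "Authorization: Bearer $TOKEN" http://ip.of.services:25107/_health/ping
    curl -H "Authorization: Bearer $TOKEN" http://ip.of.services:25107/metrics
    curl -H "Authorization: Bearer $TOKEN" https://api.name.server/_health/ping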