    Yosuke Hara
    @yosukehara
    LeoFS v1.3.6 has been released:
    Jasper Siepkes
    @siepkes
    Hi all! I was doing some experimenting on a lab setup and ended up with the following state (by doing some stupid things):
     [State of Node(s)]
    -------+--------------------------------------+--------------+----------------+----------------+----------------------------
     type  |                 node                 |    state     |  current ring  |   prev ring    |          updated at         
    -------+--------------------------------------+--------------+----------------+----------------+----------------------------
      S    | leofs-storage-1@10.100.2.199         | stop         | -1             | -1             | 2017-09-28 13:01:42 +0200
      S    | leofs-storage-2@10.100.2.201         | stop         | -1             | -1             | 2017-09-28 13:01:36 +0200
      G    | leofs-gateway-s3-1@10.100.2.220      | running      | 8f824bb0       | 0dc4658a       | 2017-09-27 15:47:21 +0200
    -------+--------------------------------------+--------------+----------------+----------------+----------------------------
    I could just blow the cluster away and start over again, but I'm interested in fixing it as an exercise.
    I can't seem to delete the 2 stopped storage nodes.
    It gives the following error:
    # leofs-adm detach leofs-storage-1@10.100.2.199
    [ERROR] Could not get node-status
    Does anyone have any suggestions?
    (I would expect to always be able to remove a storage node.)
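    For context, the normal removal flow is roughly the sketch below; it assumes the managers can still reach at least one running storage node, which is exactly the condition that's broken here.
    ## check what the managers currently know about each node and the ring
    # leofs-adm status
    ## detach the node to be removed from the cluster
    # leofs-adm detach leofs-storage-1@10.100.2.199
    ## recompute and distribute the ring across the remaining nodes
    # leofs-adm rebalance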
    Yosuke Hara
    @yosukehara
    @siepkes Thank you for your report. I’ve confirmed that leofs-adm detach fails when all storage nodes are stopped, as below:
    • leo-project/leofs#855
    I’ll fix this issue in v1.4.0.
    Jasper Siepkes
    @siepkes
    @yosukehara Thanks for the feedback!
    Jasper Siepkes
    @siepkes
    On a related note, I'm currently struggling a bit with the (buzzword alert) "cloud-nativeness" of LeoFS. What I mean is how tolerant LeoFS is to VMs being blown away and recreated. For example, leo-project/leofs#514 kinda rains on my cloud-native self-healing parade ;-). I realize that kind of manual intervention to reactivate a master node after it went down is normal in the "classic" scenario where you have a server / VM which you always keep around (update LeoFS in the VM, etc.). However, if you use container orchestration tooling such as Terraform or Kubernetes, this is different: doing an upgrade usually means deploying a new (immutable) container image in a VM and destroying the old one. My question is: is this a use case LeoFS wants to support at some point?
    Yosuke Hara
    @yosukehara
    @siepkes We do not have a plan to "dockerize" LeoFS yet, but we're planning to implement persistent volumes for K8s in v1.5 or v1.6:
    https://kubernetes.io/docs/concepts/storage/persistent-volumes/
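    As a rough illustration of the K8s side (nothing LeoFS-specific; the claim name and size are hypothetical), a persistent volume is requested through a PersistentVolumeClaim like this and then mounted into a pod:
    # cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: leofs-data            # hypothetical claim name
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi          # size is an arbitrary example
    EOF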
    Jasper Siepkes
    @siepkes
    @yosukehara cool! Thanks for all your hard work!
    Dan Haworth
    @asphytotalxtc
    Hey guys, wonder if anyone could give me some pointers with recover-cluster. We've simulated a DR cluster going unavailable (by just shutting it all down), and when bringing it up again and running recover-cluster on the master (with the cluster name of the now-available DR cluster) things seem to work: files created on the primary whilst the second cluster was down appear as they should.
    I can browse them, they're listed, and a leofs-adm whereis shows the right size and checksum, but any attempt to download those files just results in an empty response from LeoFS. All I can see in the gateway error log is "[E] gateway_0@leo-dc2.local 2018-09-07 08:56:24.84394 +0100 1536306984 null:null 0 Bad value on output port 'tcp_inet'"
    Objects replicated before the cluster went down are fine, and objects newly replicated after recover-cluster are fine too; it's just the objects created on the primary while the cluster was down, and then replicated by recover-cluster, that exhibit this issue.
    I've not really got a clue how to further debug this issue, any pointers?
    (Could be worth mentioning that these are just two single VMs, each running a full cluster on a single node: two managers, one gateway, and one storage node each. Don't know if that makes any difference?)
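    For anyone following along, one way to narrow this down on the DR side is to check the replica metadata for one of the broken objects (the path below is just a placeholder) and, if a replica looks inconsistent, queue a per-object recovery:
    ## show which storage nodes hold the object, plus size and checksum per replica
    # leofs-adm whereis <bucket>/<object-key>
    ## ask the cluster to re-replicate that one object if a replica looks wrong
    # leofs-adm recover-file <bucket>/<object-key>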
    Dan Haworth
    @asphytotalxtc
    (also, we're using the most recent release, 1.4.2)
    Dan Haworth
    @asphytotalxtc
    Upon further inspection, for objects restored using recover-cluster, getting the object doesn't appear to return any headers... Here's a Wireshark comparison of two requests for the same replicated object from both the primary and the DR cluster.
    (screenshot: Wireshark comparison of the two responses)
    Requests for objects that were replicated whilst both clusters were online DO have response headers though... so this does appear to be something to do with recover-cluster.
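    In case it helps reproduce this without Wireshark, the missing headers should also be visible with a plain HTTP client against each gateway (hostnames, port, and path below are placeholders for the two setups):
    ## fetch the object from the primary cluster's gateway and print only the response headers
    # curl -sv -o /dev/null http://leofs-gateway-primary:8080/<bucket>/<object-key>
    ## same object via the DR cluster's gateway; compare the header sets
    # curl -sv -o /dev/null http://leofs-gateway-dr:8080/<bucket>/<object-key>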
    Dan Haworth
    @asphytotalxtc
    FYI, I've opened issue #1120 on GitHub for this.
    Yosuke Hara
    @yosukehara
    @asphytotalxtc Thank you for sharing. We’re going to check #1120 tomorrow.
    Dan Haworth
    @asphytotalxtc
    Thanks guys, saw the update on the issue.. And cheers for the help too, LeoFS seems like an absolutely awesome project! Nice work :)