    Illya Chekrygin
    @ichekrygin
    totally excited, got tired of the mounting EBS nonsense
    Bassam Tabbara
    @bassam
    @ichekrygin would love to understand your scenarios of running rook in AWS. we've been assuming that most people would be using rook on-premise and not necessarily in AWS/Azure etc.
    Illya Chekrygin
    @ichekrygin
    hi, to circle back, after running Prometheus w/ Rook on K8S for last 12 hrs
    [screenshot: Prometheus metrics graph, running w/ Rook]
    the last gap is from me forcing the Prometheus pod to start on another EC2 instance
    In comparison to:
    [screenshot: Prometheus metrics graph, running w/ EBS]
    Running Prometheus w/ EBS on K8S: the first gap is the EC2 instance with the Prometheus pod being killed by AWS, the second is the Prometheus pod restart
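    A minimal sketch of forcing a pod onto another instance for this kind of test; the node and pod names here are hypothetical:

    # mark the current node unschedulable so the replacement pod lands elsewhere
    kubectl cordon ip-10-0-1-23.ec2.internal
    # delete the pod; its controller recreates it on another schedulable node
    kubectl delete pod prometheus-0
    # re-enable scheduling on the node afterwards
    kubectl uncordon ip-10-0-1-23.ec2.internal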
    Bassam Tabbara
    @bassam
    what's the Y axis?
    Bassam Tabbara
    @bassam
    @ichekrygin not sure I follow the significance of the gaps
    Illya Chekrygin
    @ichekrygin
    time it took to restart a pod
    y-axis is just a random metric (in this case CPU usage)
    the significance is the x-axis (time), specifically the gap in metrics due to the pod not being able to start
    Bassam Tabbara
    @bassam
    @ichekrygin what do you conclude from this?
    Illya Chekrygin
    @ichekrygin
    w/ Rook, i see no more than a 60 sec delay; with EBS i see at times a delay of over a full hour
    DanKerns
    @DanKerns
    How can it be so slow on EBS?
    Illya Chekrygin
    @ichekrygin
    it took a while for the EBS volume to be marked as detached before it could be re-mounted on a new instance
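    One way to watch that transition from the AWS side is to poll the volume's attachment state; the volume id here is hypothetical:

    # shows attaching/attached/detaching; the Attachments entry disappears
    # once the volume is fully detached
    aws ec2 describe-volumes --volume-ids vol-0abc123def456789 \
      --query 'Volumes[0].Attachments[0].State' --output text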
    Bassam Tabbara
    @bassam
    oh wow
    that's bad
    Illya Chekrygin
    @ichekrygin
    an hour was an extreme case, but on average we see a 5-10 min delay just on pod restart
    with EBS, the pod must be started on a node running in the same AWS AZ as the volume
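    To see which AZ each node sits in, one can surface the zone label Kubernetes applied to nodes at the time (the label name varies by Kubernetes version):

    # list nodes with their AWS availability zone as an extra column
    kubectl get nodes -L failure-domain.beta.kubernetes.io/zone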
    DanKerns
    @DanKerns
    There is an opportunity for Rook!
    Illya Chekrygin
    @ichekrygin
    with Rook, it could be started on any node
    i will let it bake for a few more days to accumulate more data (a larger volume) and will repeat my test
    Bassam Tabbara
    @bassam
    @ichekrygin thanks for the insights here.
    next question: how do we get it to 0 delay on pod restart
    Illya Chekrygin
    @ichekrygin
    i think it's a tall order with Prometheus specifically, since it needs to read (reload) all the data back from disk before becoming operational; hence, there is an inherent delay. EBS was adding an additional delay on top of that.
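    A rough way to measure that restart-to-operational delay, assuming a hypothetical pod name:

    # when the pod started on the node
    kubectl get pod prometheus-0 -o jsonpath='{.status.startTime}'
    # when it last became Ready; the difference is the startup delay
    kubectl get pod prometheus-0 \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}'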
    Elson Rodriguez
    @elsonrodriguez_twitter

    Heya, I tried rook, but I’m getting a sad cluster:

    OSDs:
    TOTAL     UP        IN        FULL      NEAR FULL
    4         1         1         false     false

    The OSDs are all complaining along these lines:

    2017-03-06 10:08:02.340442 I | osd1: 2017-03-06 10:07:50.174195 f439bf80700 -1  Processor -- bind was unable to bind. Trying again in 5 seconds 
    2017-03-06 10:08:02.340456 I | osd1: 2017-03-06 10:07:55.192786 f439bf80700 -1  Processor -- bind unable to bind to 172.16.2.250:7300/1023 on any port in range 6800-7300: (99) Cannot assign requested address

    Any tips?

    172.16.2.250 is the Host IP, not the Pod IP
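    A quick check for whether an OSD pod somehow ended up on the host network (the pod name here is hypothetical):

    # empty or false means the pod has its own network namespace
    kubectl -n rook get pod osd-6t04n -o jsonpath='{.spec.hostNetwork}'
    # compare the pod IP against the IP of the host it is scheduled on
    kubectl -n rook get pod osd-6t04n -o jsonpath='{.status.podIP} {.status.hostIP}'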
    bzub
    @bzub
    Are you using v0.3.0 of rook-operator?
    Elson Rodriguez
    @elsonrodriguez_twitter
    Yes
    bzub
    @bzub
    When I run it, it doesn't use hostNetworking, hm
    should be pod IPs
    Elson Rodriguez
    @elsonrodriguez_twitter
    Correct
    Odd, the pods have a proper pod ip
    Travis Nielsen
    @travisn
    Rook doesn't specify which ip to use, so Ceph tries to find one on the same subnet
    the OSD pods run as privileged, which means they could find the host ip
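    To confirm whether an OSD container is actually running privileged (pod name hypothetical):

    # prints true if the first container has a privileged security context
    kubectl -n rook get pod osd-6t04n \
      -o jsonpath='{.spec.containers[0].securityContext.privileged}'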
    bzub
    @bzub
    interesting. I wonder what's different on your cluster then
    without hostNetworking it should still be in the pod network namespace, I would think
    Elson Rodriguez
    @elsonrodriguez_twitter
    What's the mechanism for the discovery of the osd ip?
    bzub
    @bzub
    In normal situations you just need to know the Mon IPs
    and they keep track of OSDs
    kubectl -n rook exec osd-6t04n -- ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    3: eth0@if47: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
        link/ether 46:79:f3:eb:c4:1b brd ff:ff:ff:ff:ff:ff
        inet 10.2.136.151/32 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::4479:f3ff:feeb:c41b/64 scope link
           valid_lft forever preferred_lft forever
    maybe see what that says for one of your osd pods
    kubectl -n rook get pods -o wide
    will tell you their IPs also
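    To see which addresses the OSDs actually registered with the mons, one could ask Ceph directly; this assumes some pod with the ceph CLI available to exec into (rook-tools here is hypothetical):

    # each osd line in the dump includes the address it registered with
    kubectl -n rook exec rook-tools -- ceph osd dump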
    Travis Nielsen
    @travisn
    @bzub, 10.2.136.151 is the pod ip, correct? in my environment i actually don't see the host ip showing up either in the osd pods
    bzub
    @bzub
    correct
    host IP never came up in my environment either so I'm intrigued