    Pascal Maria
    @p.maria_gitlab
    Hello,
    I'll ask my question in this thread, since I got no reply on ovh/metrics and this one seems more active.
    I'm looking to monitor the space usage of my persistent volumes.
    Does anyone have a solution?
    Thanks.
    Regards,
    Roland Édouard Jean Laurès
    @rlaures.pro_gitlab
    Personally, when I had to do it, I had no choice but to run a probe on a Pod that has access to the volume.
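    For example, a minimal sketch of what that probe can look like (the name, image and mount path are placeholders to adapt to your claim); apply it with kubectl apply -f:
    apiVersion: v1
    kind: Pod
    metadata:
      name: pv-usage-probe            # hypothetical name
    spec:
      containers:
      - name: df-probe
        image: busybox:1.35
        # log the volume usage every 5 minutes; grep or scrape these logs
        command: ["/bin/sh", "-c", "while true; do df -h /data; sleep 300; done"]
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: my-pvc           # hypothetical PVC to monitor
    Since a Cinder volume is RWO, in practice the df-probe container has to live as a sidecar inside the pod that already mounts the claim rather than in a separate pod.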
    Pascal Maria
    @p.maria_gitlab
    @rlaures.pro_gitlab
    Thanks for the feedback.
    It surprises me less and less, given the various things I've read on the subject.
    But I'm having trouble understanding this accepted answer:
    https://stackoverflow.com/a/47117776
    Roland Édouard Jean Laurès
    @rlaures.pro_gitlab
    Yes, but this answer says a lot: https://stackoverflow.com/a/44792060/4558590
    Roland Édouard Jean Laurès
    @rlaures.pro_gitlab
    Hello ovh!
    I'm going to open a ticket, but yesterday I created a new pool of 6 nodes (c2-15), and since then the pods created on it (which run web services) have been extremely slow. The Rails application receives the request quickly and processes it in under 100 ms, then each file transfer blocks for 10-30 s before being quickly released.
    For the pods provisioned on the new nodes:
    • the internal probes (livenessProbe and readinessProbe) suffer from the same issue,
    • access to the site from outside is also very slow,
    • one of the threads enters state D before disappearing and being replaced by new ones.
      When the pods are provisioned on the old nodes:
    • the probes can still see large latencies like that,
    • but access to the site from outside (in a browser) works without latency.
      So I don't think it is a code issue on our side, nor a problem with the load balancer (in particular, the old pods have no issue).
      Does anyone have an idea?
    2 replies
    Jérémy Levilain
    @IamBlueSlime
    Hi guys! Little question about clusters in the vRack: does this work with DHCP disabled?
    I currently have a "classic" cluster which communicates with an instance hosting my database over the internet. I'm considering resetting my cluster to use my vRack, but will this work with static IPs? Or could we have a subnet with DHCP dedicated to my cluster?
    Alban Mouton
    @albanm
    Hello, our production cluster is suddenly a mess: nodes switch between "Ready" and "NotReady" status very quickly, and inside the cluster new services, or services targeting new pods, all hang forever
    Alban Mouton
    @albanm
    an important pod was evicted then recreated for some reason, after which the service now hangs forever (as described in my previous message) and the application is down... some help would be greatly appreciated!
    1 reply
    Roland Édouard Jean Laurès
    @rlaures.pro_gitlab
    Hello, @OVH anything about my problem and ticket #2448084?
    Andy Tan
    @tanandy
    Hi guys, is anyone using the OVH SSL Gateway in front of a managed K8s cluster? I get a 404 page not found, so I guess the SSL gateway cannot reach the backends
    David Jeansen
    @EsKuel
    Hello, I have an issue with Ingress on one of my clusters (it has already happened before): the Kubernetes watch API for Ingress does not seem to work. Can someone please help?
    6 replies
    fkalinowski
    @fkalinowski
    Hi,
    I'm currently testing the Managed K8S inside a vRack.
    Here is my setup:
    • Subnet 172.16.1.0/24 (VLAN 0) is used by my baremetal hosts in the vRack
    • Subnet 172.16.5.0/24 (VLAN 0) is configured for any OpenStack VM popped in my Public Cloud Private Network (via Horizon UI)
    • Gateway (pfSense) with 2 network interfaces (172.16.5.1 + 172.16.1.252), running as an OpenStack instance in my Public Cloud, to route traffic between both subnets 172.16.1.0/24 and 172.16.5.0/24
    • The Public Cloud subnet is also configured (via Horizon UI) to enable DHCP + provide the gateway 172.16.5.1 + provide the DNS server 172.16.1.1 + push the route 172.16.1.0/24 via 172.16.5.1 (i.e. the pfSense gateway)
      To fully test this setup, I've deployed a D2-2 (Ubuntu 20.04) instance in my Public Cloud, here is the result:
    • the instance gets an IP in the expected DHCP range of subnet 172.16.5.0/24 ==> OK
    • I can ping the instance from the same subnet (via pfSense) ==> OK
    • I can ping the instance from the other subnet (via a baremetal host) ==> OK
    • the instance has the appropriate routes to reach subnet 172.16.1.0/24 ==> OK
    • the configured DNS servers are available in /etc/resolv.conf and DNS resolution works ==> OK
      After validating my setup, I've configured a Managed K8S inside the appropriate Private Network/vRack, here is the result:
    • the worker nodes get an IP in the expected DHCP range of 172.16.5.0/24 ==> OK
    • I can ping the worker node from the same subnet (via pfSense) ==> OK
    • I CANNOT ping the worker node from the other subnet ==> NOK
    • the worker node DOES NOT have the appropriate routes to reach subnet 172.16.1.0/24 ==> NOK (if I manually add the appropriate route via a Pod with hostNetwork and the NET_ADMIN capability, the routing works; see the sketch below)
    • the configured DNS servers are NOT available on the worker NODES ==> NOK
      In conclusion, it seems that the OpenStack subnet configuration (gateway, routes, DNS servers) is NOT honored by the provisioned Managed K8S worker nodes.
      Can you confirm?
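      For reference, the manual workaround mentioned above looks roughly like this (one pod per worker node, or the same thing as a DaemonSet; the pod name and image are placeholders, the subnet and gateway are the ones from my setup):
    apiVersion: v1
    kind: Pod
    metadata:
      name: add-vrack-route            # hypothetical helper pod
    spec:
      hostNetwork: true                # act on the node's network namespace
      containers:
      - name: route
        image: busybox:1.35            # assumes busybox's ip applet is available
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]         # needed to edit the routing table
        command: ["/bin/sh", "-c", "ip route replace 172.16.1.0/24 via 172.16.5.1 && sleep 365d"]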
    3 replies
    golivhub
    @golivhub
    @OVH
    Dear all, we are facing a problem on one of our managed Kubernetes services concerning cert-manager: for the past 3 or 4 days we have been unable to obtain an SSL certificate using cert-manager. Challenges are no longer being requested.
    This cluster was created from scratch around May 20th, and yet cert-manager is not working.
    Could we have a chat about this? I've opened a ticket about this problem.
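    In the meantime, here is roughly how we are checking where the issuance is stuck (standard cert-manager/ACME resources; the cert-manager namespace below assumes a default install):
    # follow the chain Certificate -> CertificateRequest -> Order -> Challenge
    kubectl get certificates,certificaterequests --all-namespaces
    kubectl get orders,challenges --all-namespaces
    # the describe output usually says why a challenge is not being presented
    kubectl describe order <order-name> -n <namespace>
    kubectl logs -n cert-manager deploy/cert-manager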
    4 replies
    EnergieZ
    @EnergieZ

    Hi All,

    I followed this page to install the nginx ingress controller: https://docs.ovh.com/gb/en/kubernetes/installing-nginx-ingress/

    One ingress pod is in error (CrashLoopBackOff). In the logs I can see this:

    I0615 09:35:44.275165 1 flags.go:208] "Watching for Ingress" class="nginx"
    W0615 09:35:44.275281 1 flags.go:213] Ingresses with an empty class will also be processed by this Ingress controller
    W0615 09:35:44.275952 1 client_config.go:614] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
    I0615 09:35:44.276362 1 main.go:241] "Creating API client" host="https://10.3.0.1:443"
    I0615 09:35:44.359492 1 main.go:285] "Running in Kubernetes cluster" major="1" minor="20" git="v1.20.2" state="clean" commit="faecb196815e248d3ecfb03c680a4507229c2a56" platform="linux/amd64"
    I0615 09:35:44.381985 1 main.go:87] "Valid default backend" service="default/nginx-ingress-nginx-ingress-controller-default-backend"
    I0615 09:35:44.930715 1 main.go:105] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
    I0615 09:35:44.939068 1 main.go:115] "Enabling new Ingress features available since Kubernetes v1.18"
    E0615 09:35:44.945906 1 main.go:124] "Searching IngressClass" err="ingressclasses.networking.k8s.io \"nginx\" is forbidden: User \"system:serviceaccount:default:nginx-ingress-nginx-ingress-controller\" cannot get resource \"ingressclasses\" in API group \"networking.k8s.io\" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io \"nginx-ingress-nginx-ingress-controller\" not found" class="nginx"
    W0615 09:35:44.945957 1 main.go:127] No IngressClass resource with name nginx found. Only annotation will be used.
    W0615 09:35:45.013608 1 store.go:620] Unexpected error reading configuration configmap: configmaps "nginx-ingress-nginx-ingress-controller" not found
    I0615 09:35:45.041365 1 nginx.go:254] "Starting NGINX Ingress controller"
    E0615 09:35:45.046968 1 reflector.go:138] k8s.io/client-go@v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:default:nginx-ingress-nginx-ingress-controller" cannot list resource "secrets" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "nginx-ingress-nginx-ingress-controller" not found
    E0615 09:35:45.047065 1 reflector.go:138] k8s.io/client-go@v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:default:nginx-ingress-nginx-ingress-controller" cannot list resource "configmaps" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "nginx-ingress-nginx-ingress-controller" not found
    ...

    Any idea how to resolve these problems?
    Thank you for your help
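    From the errors it looks like the ClusterRole the controller's ServiceAccount relies on does not exist; here is what I plan to check (a sketch, the exact value names depend on the chart used):
    # confirm whether the RBAC objects named in the logs actually exist
    kubectl get clusterrole,clusterrolebinding | grep nginx-ingress
    kubectl get serviceaccount -n default | grep nginx-ingress
    # if the ClusterRole is missing, reinstalling/upgrading the chart with RBAC
    # creation enabled usually recreates it (check `helm show values <chart>`)
    helm upgrade --install nginx-ingress <chart> --set rbac.create=true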

    6 replies
    Sébastien Gaïde
    @sgaide
    Hello, would it be possible for an OVH guy to restart the API server of one of my clusters please?
    4 replies
    EnergieZ
    @EnergieZ
    @OVHTeam, could someone restart my API please.
    K8S id : a9bfacee-e495-4a54-bc0b-61ad7d624df3
    Thank you
    2 replies
    Andy Tan
    @tanandy
    I see a lot of errors with the API server cache. What changes are coming on that? Any fix on the roadmap?
    10 replies
    Andy Tan
    @tanandy
    FYI, I'm experiencing an issue in Horizon: I cannot change the security groups of my existing VMs (no security groups found)
    11 replies
    Chaya56
    @Chaya56
    Hello, do we have some problem with the API?
        chaya@DESKTOP-BHGN56S  /mnt/c/Data/Projects/Origination-ops/k8s/kustomize   master ±  kubectl -n feature exec -it crm-api-sezaam-pe-bd475f7b9-txswh -- curl -I -H "Authorization: Basic XXX" http://crm-api-arx:8080/actuator/health
        error: error sending request: Post "https://ctkjcx.c1.gra7.k8s.ovh.net/api/v1/namespaces/feature/pods/crm-api-sezaam-pe-bd475f7b9-txswh/exec?command=curl&command=-I&command=-H&command=Authorization%3A+Basic+aGVhbHRoY2hlY2s6ODUhd0VCazZe&command=http%3A%2F%2Fcrm-api-arx%3A8080%2Factuator%2Fhealth&container=java&stdin=true&stdout=true&tty=true": EOF
        ✘ chaya@DESKTOP-BHGN56S  /mnt/c/Data/Projects/Origination-ops/k8s/kustomize   master ±  kubectl -n feature exec -it crm-api-sezaam-pe-bd475f7b9-txswh -- curl -I -H "Authorization: Basic XXX" http://crm-api-arx:8080/actuator/health
        HTTP/1.1 200
        Set-Cookie: JSESSIONID=EA49BB8EA3879AA2FDDCC109F9CC1BBC; Path=/; HttpOnly
        X-Content-Type-Options: nosniff
        X-XSS-Protection: 1; mode=block
        Cache-Control: no-cache, no-store, max-age=0, must-revalidate
        Pragma: no-cache
        Expires: 0
        X-Frame-Options: DENY
        Content-Type: application/vnd.spring-boot.actuator.v3+json
        Transfer-Encoding: chunked
        Date: Tue, 15 Jun 2021 12:26:43 GMT
    6 replies
    hpannetier
    @hpannetier
    Hello.
    Since yesterday (14/06/2021) volume attachment is failing.
    The k8s events report:
    • AttachVolume.Attach failed for volume
    • MountVolume.WaitForAttach failed for volume
      Is this connected to the incident described here, http://travaux.ovh.net/?do=details&id=50121&, which mainly impacts the GRA7 region where my node is located?
      Is there any possible workaround?
      Thank you
    Bernhard J. M. Grün
    @bernhardgruen_twitter

    Hello everyone @OVH @OVHteam,
    I am looking for a solution to automatically monitor the free capacity of persistent volumes. Normally (as with most other providers) these values are available via a metrics endpoint, and the metrics are called kubelet_volume_stats_available_bytes
    and kubelet_volume_stats_capacity_bytes. Unfortunately I can't find them on OVH. Do you have an idea or suggestion for how we could get that information automatically, in order to alert if the free space of a volume is critically low?
    Currently our admins have to look into those volumes by hand - this is not feasible for a larger number of clusters.
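    For reference, once those kubelet metrics are exposed and scraped, the alert rule we have in mind would look something like this (the threshold, names and labels are just an example):
    groups:
    - name: persistent-volumes
      rules:
      - alert: PersistentVolumeAlmostFull        # hypothetical alert name
        expr: |
          kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes < 0.10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: 'PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} has less than 10% free space'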

    Thank you in advance
    Bernhard J. M. Grün

    lchdev
    @lchdev
    Hi @OVHTeam, my cluster seems to have issues with RBAC: pods are unable to find roles, while I can properly list them using kubectl. Does it have something to do with stale caches in the API server? Can it be solved by restarting the API server?
    5 replies
    NullACK
    @arkalira_gitlab
    Hi @OVHTeam, we have created a managed Kubernetes cluster and now we are trying to create volume snapshots using csi-cinder-snapclass. We haven't had any success creating the snapshots, even with the snapshot-controller installed: https://kubernetes-csi.github.io/docs/snapshot-controller.html#deployment Does csi-cinder-snapclass work to back up volumes (high-speed or classic)?
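    For context, this is roughly the VolumeSnapshot we are applying (the snapshot and PVC names are ours; depending on the installed CRDs the apiVersion may need to be snapshot.storage.k8s.io/v1beta1):
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: data-snapshot                        # hypothetical snapshot name
      namespace: default
    spec:
      volumeSnapshotClassName: csi-cinder-snapclass
      source:
        persistentVolumeClaimName: my-data-pvc   # hypothetical PVC to snapshot
    We then run kubectl get volumesnapshot,volumesnapshotcontent -A and describe the VolumeSnapshot to see why it never becomes readyToUse.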
    1 reply
    Anthony Domingue
    @hessman
    Hi OVH team, I'm experiencing strange behaviour with one of my nodes. When I describe the node, it's in "NotReady" state due to the error "Kubelet stopped posting node status", but when I check the status in the OVH panel it seems up. Can you explain why the node is stuck? Also, can I use the OVH panel to perform a soft reboot in order to fix the issue?
    golivhub
    @golivhub
    hello @OVH, we have a node in NotReady status on one of our clusters
    this cluster is brand new, freshly installed less than a month ago.
    nothing is working properly on this node, which hosts our internal GitLab...
    NAME                                         STATUS     ROLES    AGE   VERSION
    nodepool-7a8294de-6f3a-4cb8-ac-node-33e20b   Ready      <none>   30d   v1.18.15
    nodepool-7a8294de-6f3a-4cb8-ac-node-8307a7   Ready      <none>   30d   v1.18.15
    nodepool-7a8294de-6f3a-4cb8-ac-node-aed25a   Ready      <none>   30d   v1.18.15
    nodepool-7a8294de-6f3a-4cb8-ac-node-b1f02e   Ready      <none>   30d   v1.18.15
    nodepool-7a8294de-6f3a-4cb8-ac-node-ecae14   NotReady   <none>   30d   v1.18.15
    Xavier Duthil
    @xduthil_gitlab
    Hello @/all,
    There is an outage on a rack in Gravelines. GRA5 & GRA7 regions are impacted.
    Some of your nodes may be NotReady at the moment.
    http://travaux.ovh.net/?project=18&status=all&perpage=50
    1 reply
    golivhub
    @golivhub
    such as our applications on K8S
    cemonneau
    @cemonneau
    when the issue is resolved, I will require a postmortem, and to know the precautions taken to ensure it won't happen again. OVHcloud, and its K8s, is NOT reliable at this stage, and well, I think we deserve to get one; 80% of the time we don't. We've been talking with our clients, and due to the SLA we are working on switching to other K8s providers. Sorry guys, we can't take it anymore and leave critical prod on it
    7 replies
    Jérémie MONSINJON
    @jMonsinjon

    Really sorry to read your messages, and really sorry that you are impacted by the Public Cloud outage on GRA5 and GRA7.

    As you know, k8s is a technology that helps you minimize downtime during an incident (among many other features).
    As a good practice, your Kubernetes cluster should be configured to be able to handle the loss of one node.
    As stated in our documentation (https://docs.ovh.com/gb/en/kubernetes/known-limits/):
    To ensure high availability for your services, it is recommended to possess the computation power capable of handling your workload even when one of your nodes becomes unavailable.

    An anti-affinity feature is also available on node pools, to spread your nodes across different Public Cloud hosts. With this feature, you will be able to limit the impact in the event of such an incident.
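    On the workload side, a minimal sketch of what "handling the loss of one node" can look like (names, counts and image are placeholders): spread the replicas across nodes and keep a minimum available during disruptions:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                                # placeholder workload
    spec:
      replicas: 3
      selector:
        matchLabels: {app: web}
      template:
        metadata:
          labels: {app: web}
        spec:
          topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname   # one replica per node where possible
            whenUnsatisfiable: ScheduleAnyway
            labelSelector:
              matchLabels: {app: web}
          containers:
          - name: web
            image: nginx:1.21
    ---
    apiVersion: policy/v1beta1                 # policy/v1 on Kubernetes >= 1.21
    kind: PodDisruptionBudget
    metadata:
      name: web
    spec:
      minAvailable: 2
      selector:
        matchLabels: {app: web}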

    golivhub
    @golivhub
    OK, but the problem here is not on my side:
    NAME                 STATUS      MESSAGE                                                                                     ERROR
    scheduler            Unhealthy   Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused   
    controller-manager   Unhealthy   Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused   
    etcd-0               Unhealthy   Get https://coastguard:23790/health: stream error: stream ID 1; INTERNAL_ERROR
    2 replies
    Jérémie MONSINJON
    @jMonsinjon
    @golivhub
    You can find the explanation here: https://docs.ovh.com/sg/en/kubernetes/known-limits/#cluster-health_1
    golivhub
    @golivhub
    yes, sorry, emergencies sometimes make you do bad things; I forgot it was managed and that I could not see these
    golivhub
    @golivhub
    in any case, a volume is attached to one of the faulty nodes and cannot be detached
    Warning  FailedAttachVolume  13s        attachdetach-controller  Multi-Attach error for volume "ovh-managed-kubernetes-bee5xr-pvc-5a5e3932-636e-4f5e-b1e7-edc6c50ca1f1" Volume is already exclusively attached to one node and can't be attached to another
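    For what it's worth, the workaround usually seen for this Multi-Attach situation, once the faulty node is definitely not coming back (a sketch; the grep pattern comes from the event above):
    # find the VolumeAttachment that still pins the volume to the faulty node
    kubectl get volumeattachments -o wide | grep pvc-5a5e3932
    # if the node is gone for good, delete the stale attachment so the
    # attach/detach controller can attach the volume to another node
    kubectl delete volumeattachment <attachment-name>
    # pods stuck in Terminating on the NotReady node may also need a force delete
    kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force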
    Florian Lacrampe
    @quadeare
    Same as @golivhub. I have anti-affinity on my nodes, but I still have an outage on my production. I can't mount volumes that are present on other nodes, and I can't drain the faulty nodes correctly.
    It's not a problem to lose nodes, it happens. But recovery is almost impossible with the volume and drain issues.
    golivhub
    @golivhub
    The idea would be for the support team to be able to detach the OpenStack volumes from the faulty nodes, don't you think?
    Jérémie MONSINJON
    @jMonsinjon
    Agreed
    We have already asked the dedicated team. I don't know yet if they will be able to do it quickly
    Florian Lacrampe
    @quadeare
    Next time, it should be done automatically. Kubernetes should be able to resume activity automatically... from my point of view, it's not the case now.
    1 reply
    Bernhard J. M. Grün
    @bernhardgruen_twitter

    Hello everyone,
    I am looking for a solution to automatically monitor the free capacity of persistent volumes. Normally (as with most other providers) these values are available via a metrics endpoint, and the metrics are called kubelet_volume_stats_available_bytes
    and kubelet_volume_stats_capacity_bytes. Unfortunately I can't find them on OVH (using Kubernetes 1.20.2). Do you have an idea or suggestion for how we could get that information automatically, in order to alert if the free space of a volume is critically low?
    Currently our admins have to look into those volumes by hand - this is not feasible for a larger number of clusters.

    Thank you in advance
    Bernhard J. M. Grün

    fkalinowski
    @fkalinowski
    Hi @OVHTeam,
    Our freshly created cluster is having trouble with its worker nodes: they are continuously going to NotReady status and then back to Ready - this changes every few seconds on different worker nodes.
    3 replies
    EnergieZ
    @EnergieZ
    Hi
    I'm currently putting together a new offer to host some important websites on K8S (using OVH).
    Regarding media, such as images, is it a good idea to host them on block storage? Or is it better to set up Ceph on the K8S cluster to host them?
    Thanks for your help.
    7 replies
    Coolero
    @Coolero
    Hello, is there a way to add self-managed nodes to OVH Kubernetes?
    4 replies
    Patrick Palacin
    @ppalacin_gitlab

    Hi, I encounter this etcd issue after the quota was exceeded:

    Error from server: rpc error: code = Unknown desc = quota computation: etcdserver: not capable

    Can you check etcd?

    I am encountering the same, but I didn't even exceed the quota. Any solution for it? I am not even able to upgrade existing Helm charts because of it
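    For reference, what I understand usually fills the etcd quota on a managed cluster is accumulated objects, old Helm release Secrets in particular; a few checks I am running (assuming Helm 3, which stores each release revision as a Secret labelled owner=helm):
    # rough object counts, to see what is growing
    kubectl get secrets --all-namespaces --no-headers | wc -l
    kubectl get secrets --all-namespaces -l owner=helm --no-headers | wc -l
    kubectl get events --all-namespaces --no-headers | wc -l
    # cap the Helm history so old revisions stop piling up in etcd
    helm upgrade <release> <chart> --history-max 3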

    1 reply
    golivhub
    @golivhub
    Dear @OVH, we have received an alert from our Alertmanager on one of our nodes
    
    Annotations
    description = Filesystem on shm at nodepool-7a8294de-XXXX-XXX-XX-node-XXXX has only 0.34% available space left.
    5 replies
    what does it imply and how do we fix this? The alert is flagged as critical
    Pierrick Gicquelais
    @Kafei59
    Hello Gitter, I am going to perform a maintenance patch on Monday, June 21st, to update a release of a major component which sits in front of your API servers. It should NOT impact your services, but in any case do not hesitate to ping me here; I will be extra vigilant. You can find the travaux task here: http://travaux.ovh.net/?do=details&id=51474