    Narendra Patel
    Hi, has anyone tried hot restart with Envoy, similar to the reload feature of haproxy / nginx? We are implementing a large-scale deployment of Envoy via Consul and need this functionality to avoid dropping existing connections. There could be instances where we might need this, e.g. reloading in case of issues. There seems to be little documentation on this on the Consul website. As per the Envoy docs we need to use hot-restarter.py, but the start_envoy.sh file seems to be different from the consul connect way of starting Envoy. What would be the correct way to accomplish this? We are currently using systemctl to manage Envoy. Can we configure something there for hot restart?
    Can anyone tell me how to update the http_max_conns_per_client value and reload consul?
    George Negoita
    Hello! Is it possible to update the metadata of a node via API (add or delete a key)? I know I can update the config and reload consul, but I was wondering if there is a better solution. Thank you!
    nahsi (Anatoly Laskaris)
    @ngmlabs_twitter I think yes since there is a terraform resource for that https://registry.terraform.io/providers/hashicorp/consul/latest/docs/resources/node
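For anyone who wants to try this over HTTP rather than Terraform, here is a minimal sketch, assuming the catalog registration endpoint (`PUT /v1/catalog/register`) and its `NodeMeta` field. The node name, address, and agent URL are hypothetical placeholders, and the helper names are made up for illustration. One caveat: an agent running on that node may overwrite catalog-side changes on its next anti-entropy sync, so this is more likely to stick for nodes without a local agent.

```python
import json
import urllib.request


def merge_node_meta(current, add=None, remove=()):
    """Return a new metadata dict with keys added/overwritten and removed."""
    merged = dict(current)
    merged.update(add or {})
    for key in remove:
        merged.pop(key, None)
    return merged


def build_register_request(base_url, node, address, meta):
    """Build (but do not send) a PUT to /v1/catalog/register."""
    payload = {"Node": node, "Address": address, "NodeMeta": meta}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/catalog/register",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    meta = merge_node_meta({"rack": "r1", "env": "dev"}, add={"env": "prod"}, remove=["rack"])
    req = build_register_request("http://127.0.0.1:8500", "node-1", "10.0.0.5", meta)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

The merge step is there because the registration body carries the whole metadata map, so adding or deleting a single key means sending the full, edited map.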
    What could be the possible reason for this error?
    agent.server.memberlist.lan: memberlist: Was able to connect to X but other probes failed, network may be misconfigured
    Vadym Vikulin
    @odysseus654, Hi. I saw your project on GitHub, go-udt: https://github.com/odysseus654/go-udt. First, I appreciate your efforts. It looks like a great job. Could you enable issues in your repo? Then I could add a few bits.
    Iury Fukuda
    Hey, can someone help with a question, please?
    When I try to start the mesh gateway
    I run into a problem
    in a VM environment:
    May 30 17:20:06 r1 consul-mesh-start[42577]: [2022-05-30 17:20:06.836][42577][debug][pool] [source/common/conn_pool/conn_pool_base.cc:443] [C23] client disconnected, failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
    May 30 17:20:06 r1 consul-mesh-start[42577]: [2022-05-30 17:20:06.836][42577][debug][router] [source/common/router/router.cc:1154] [C0][S12085588059115559816] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
    May 30 17:20:06 r1 consul-mesh-start[42577]: [2022-05-30 17:20:06.836][42577][debug][http] [source/common/http/async_client_impl.cc:100] async http request response headers (end_stream=true):
    May 30 17:20:06 r1 consul-mesh-start[42577]: ':status', '200'
    May 30 17:20:06 r1 consul-mesh-start[42577]: 'content-type', 'application/grpc'
    May 30 17:20:06 r1 consul-mesh-start[42577]: 'grpc-status', '14'
    May 30 17:20:06 r1 consul-mesh-start[42577]: 'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED'
    May 30 17:20:06 r1 consul-mesh-start[42577]: [2022-05-30 17:20:06.836][42577][warning][config] [./source/common/config/grpc_stream.h:195] DeltaAggregatedResources gRPC config stream closed since 278s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
    gRPC is configured on the server
    and TLS seems to be good (I can use it in the browser)
    Iury Fukuda
    thanks, it apparently passed
    now it's stuck here =s Error registering service "gateway-primary": Put "": dial tcp connect: connection refused
    Hi everyone,
    I am working with Consul v1.12.0 and Kubernetes. I installed some deployments and configured the service mesh; so far so good.
    The problem is that when I want to communicate with an RDS instance (external service), TCP health checks don't work. I tried two approaches with no results:
    • Registering it together with the node, the service and checks via the catalog (output: timeout).
    • Registering a proxy and linking it with the service (output: connection refused).
      Intentions are all set to allow and the security groups are OK. Here is the repo: https://github.com/Nicolasrs23/Consul_proyect.git
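For reference, the first approach (registering node, service and check together via the catalog) could look roughly like the sketch below. This is a sketch under assumptions, not a diagnosis: as far as I know, checks registered directly in the catalog are not executed by any Consul agent, so something like Consul ESM has to probe the external node, and the `external-node` / `external-probe` metadata keys are what ESM looks for. The node name, hostname, and service name are hypothetical.

```python
import json


def external_service_payload(node, address, service, port, interval="10s", timeout="5s"):
    """Build a /v1/catalog/register body for an external TCP service."""
    return {
        "Node": node,
        "Address": address,
        # Metadata keys that Consul ESM looks for (assumption: ESM is deployed).
        "NodeMeta": {"external-node": "true", "external-probe": "true"},
        "Service": {"ID": service, "Service": service, "Port": port},
        "Checks": [
            {
                "Name": f"{service}-tcp",
                "ServiceID": service,
                "Status": "critical",  # start pessimistic until a probe succeeds
                "Definition": {
                    "TCP": f"{address}:{port}",
                    "Interval": interval,
                    "Timeout": timeout,
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical RDS endpoint and names, for illustration only.
    body = external_service_payload("rds-node", "mydb.example.internal", "rds-mysql", 3306)
    print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent with `PUT /v1/catalog/register`. Whatever runs the TCP probe also needs network access to the RDS endpoint, which is worth checking separately from the intentions.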
    Marina Shustova
    Hello Everyone,
    I’m looking into how Consul can process the “Host” HTTP header instead of the destination IP for outgoing HTTP requests.
    In my scenario some requests make it to Consul through a proxy, so only the "Host" header has information about the actual destination.
    Is it possible to configure Consul this way?
    Thanks in advance!
    Hi everyone, I'm getting a lot of error messages like "[ERR] memberlist: Push/Pull with <host> failed: Node <host> protocol version (2) is incompatible: [1, 0]". Incidentally, if I try to add a new node (client) to the cluster, it fails repeatedly, and all I can see in the failure logs is a version of these messages:
    Failed to join IP of server : Node 'different host name' protocol version (2) is incompatible: [1, 0]
    Riain Condon

    Hi all,

    I am running Consul servers on ECS EC2 which all connect up fine via retry-join on an NLB.

    For the clients, I am using ECS Fargate and retry-join with the aws tags.

    The clients seem to find the server instances and their IPv4 addresses and attempt to join them, but there's no error logged about that failing. What happens is that the client starts logging messages like: 2022-06-20T09:00:54.961Z [WARN] agent.router.manager: No servers available and 2022-06-20T09:00:54.961Z [ERROR] agent: failed to sync changes: error="No known Consul servers".

    I've seen a couple issues about the logs above in GitHub but no solutions and I can't determine if this is even related.

    Has anyone seen this before/know off the top of their head what this could be?

    Patrick Flick
    I'd like to put a templated app config into consul KV. Is it possible that consul updates its kv value based on consul template? Is there an easy way to achieve this that doesn't require manually triggered scripts?
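One way to approximate this, sketched under assumptions: render the config outside of Consul and write the result to the KV store over HTTP (`PUT /v1/kv/<key>`, with the raw value as the request body). The sketch uses Python's `string.Template` purely as a stand-in for a real template engine such as consul-template; the key name, values, and agent URL are hypothetical.

```python
import string
import urllib.parse
import urllib.request


def render_config(template_text, values):
    """Render a $-style template; raises KeyError if a value is missing."""
    return string.Template(template_text).substitute(values)


def build_kv_put(base_url, key, value):
    """Build (but do not send) a PUT to /v1/kv/<key> with the raw value as body."""
    path = "/v1/kv/" + urllib.parse.quote(key)  # '/' in the key is kept as-is
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=value.encode("utf-8"),
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical template and key, for illustration only.
    rendered = render_config("listen=$host:$port\n", {"host": "0.0.0.0", "port": 8080})
    req = build_kv_put("http://127.0.0.1:8500", "apps/web/config", rendered)
    print(req.get_method(), req.full_url)
    print(rendered, end="")
    # To actually send it: urllib.request.urlopen(req)
```

To avoid manually triggered scripts, something still has to re-run the render-and-put step when inputs change, for example a watcher process or a template tool that runs a command after each render.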

    Hmm, strange one related to connect.

    All clients/servers have connect enabled. However, ALL clients are reporting this error roughly every 10 minutes:

    Jun 29 00:24:27 ip-11-0-3-20 consul[1572]: {"@level":"error","@message":"RPC failed to server","@module":"agent.client","@timestamp":"2022-06-29T00:24:27.108110Z","error":"rpc error making call: i/o deadline reached","method":"ConnectCA.Roots","server":{"IP":"","Port":8300,"Zone":""}}
    Jun 29 00:24:27 ip-11-0-3-20 consul[1572]: {"@level":"warn","@message":"handling error in Cache.Notify","@module":"agent.cache","@timestamp":"2022-06-29T00:24:27.108796Z","cache-type":"connect-ca-root","error":"rpc error making call: i/o deadline reached","index":12}

    Connect sidecar proxies fail to deploy (with Nomad), and Traefik fails with a similar error when set up to use Consul Connect.
    KV sync and health check sync is working. The network is open between the cluster and clients (confirmed with telnet {server-ip} 8300 from client). curl https://{server-ip}:8501/v1/connect/ca/roots returns a valid 200 response with a CA cert.

    I've successfully deployed this before, which makes it doubly strange. THE ONLY difference between past Consul deployments and this one is TLS auto_encrypt for the clients. In the past I've distributed client certs. TLS settings are set to their strictest, including tls { internal_rpc { verify_server_hostname = true } }

    ACLs are also enabled.

    The servers themselves don't have any logs of interest (at least at INFO level).

    Any ideas, how can I debug further?
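A small aid for sifting through JSON-formatted agent logs like the two lines above: the sketch below pulls the JSON object out of each journald-style line and counts error/warn entries by module and RPC method (or cache type). It only summarizes what is already in the logs and does not explain the timeout. The function names are made up for illustration.

```python
import json
from collections import Counter


def parse_consul_json_line(line):
    """Return the JSON object embedded in a journald-style line, or None."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def count_failures(lines):
    """Count error/warn entries by (module, method or cache-type)."""
    counts = Counter()
    for line in lines:
        entry = parse_consul_json_line(line)
        if not entry or entry.get("@level") not in ("error", "warn"):
            continue
        what = entry.get("method") or entry.get("cache-type") or "unknown"
        counts[(entry.get("@module", "unknown"), what)] += 1
    return counts


if __name__ == "__main__":
    # Shortened versions of the log lines quoted above.
    sample = [
        'Jun 29 00:24:27 ip-11-0-3-20 consul[1572]: {"@level":"error","@module":"agent.client","method":"ConnectCA.Roots"}',
        'Jun 29 00:24:27 ip-11-0-3-20 consul[1572]: {"@level":"warn","@module":"agent.cache","cache-type":"connect-ca-root"}',
    ]
    for key, n in count_failures(sample).items():
        print(key, n)
```

Running this over a longer window (for example, piping in the journal for the consul unit) would show whether only `ConnectCA.Roots` is timing out or whether other RPC methods are affected too.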

    Hi there, I'm using Traefik which builds its configuration using Consul Catalog. Upon Traefik startup, it takes >5 minutes for Traefik to retrieve its configuration from Consul Catalog. Looking in Traefik logs, it looks like it's having issues fetching the Connect certificate from Consul

    level=info msg="Waiting for Connect certificate before building first configuration" providerName=consulcatalog

    while Consul appears to be canceling the request

    consul[458]: agent.http: Request cancelled: method=GET url=/v1/agent/connect/ca/roots?index=9 from= error="context canceled"
    consul[458]: agent.http: Request cancelled: method=GET url=/v1/agent/connect/ca/leaf/traefik?index=111619 from= error="context canceled"

    I am on Consul v1.12.0. How can I debug what's causing Consul to be canceling the request like this?

    Upgraded to Consul v1.12.2, seems to have fixed the issue
    @axsuul:matrix.org See my question above, it was exactly the same problem. Upgrading to v1.12.2 fixed it. I lost 2 days on this... sigh
    @seanamos: Thanks! Yep same, lost days but glad there's a fix 😊
    Narendra Patel
    Hi, is connect not mandatory? We missed setting it to true for 2 of our lower-env DCs, and the service mesh was still working, with Envoy receiving certificates and having them rotated after the default 72h interval.
    Marina Shustova
    Could you please tell me if Hashicorp has any community meetings? If yes, where can I find the schedule?
    I seem to have some type of phantom Vault service in Consul, is there any way for me to force remove this?
    Shantanu Gadgil
    @axsuul:matrix.org Consul catalog deregister?
    Do you mean consul services deregister? But how would I specify just that one IP, since I don't want to deregister the entire service?
    nahsi (Anatoly Laskaris)
    consul services deregister -id vault: on the host with that ip
    Shantanu Gadgil
    @axsuul:matrix.org actually there are two deregister commands...one from the node and another from the catalog
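The two paths mentioned here map, as far as I know, to two HTTP endpoints: an agent-level one (`PUT /v1/agent/service/deregister/<service_id>`, sent to the agent that owns the instance) and a catalog-level one (`PUT /v1/catalog/deregister`, with the node name and service ID in the body). A minimal sketch that builds both requests without sending them; the URLs, node name, and service ID are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request


def agent_deregister(agent_url, service_id):
    """PUT /v1/agent/service/deregister/<id>: must target the agent that owns the service."""
    path = "/v1/agent/service/deregister/" + urllib.parse.quote(service_id, safe="")
    return urllib.request.Request(agent_url.rstrip("/") + path, method="PUT")


def catalog_deregister(server_url, node, service_id):
    """PUT /v1/catalog/deregister: removes one service instance from one node."""
    body = json.dumps({"Node": node, "ServiceID": service_id}).encode("utf-8")
    return urllib.request.Request(
        server_url.rstrip("/") + "/v1/catalog/deregister", data=body, method="PUT"
    )


if __name__ == "__main__":
    # Hypothetical instance ID and node name, for illustration only.
    a = agent_deregister("http://127.0.0.1:8500", "vault:10.0.0.7:8200")
    c = catalog_deregister("http://127.0.0.1:8500", "node-7", "vault:10.0.0.7:8200")
    print(a.get_method(), a.full_url)
    print(c.get_method(), c.full_url, c.data.decode("utf-8"))
    # To actually send either one: urllib.request.urlopen(a) / urlopen(c)
```

Both calls target a single instance ID, so the rest of the service stays registered. If the instance is still registered on a live agent, a catalog-only deregistration tends to be undone by that agent's next sync, so the agent-level call is usually the first thing to try; the catalog call is more useful for leftovers whose agent is gone.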
    Jason Witkowski

    Hey All, I am having TONS of errors about RPC connections failing between my consul server and mesh gateway pods inside my kubernetes cluster.

    2022-07-21T17:03:24.666Z [ERROR] agent.server.rpc: failed to ingest RPC: sni=consul-server-1.server.lhr-poc1-dataplane.dev.consul protocol=consul/wan-gossip/packet conn=from= error="read tcp> i/o timeout"

    I have googled to infinity, I have modified gossip_wan settings, I have opened firewall/security group settings to be wide open, but nothing seems to work

    Has anyone seen these issues before or could maybe provide me any insight into why this is failing?
    Jason Witkowski
    Putting my mesh gateway into trace level logging I see the following:
    [2022-07-21 18:02:52.789][50][debug][connection] [source/common/network/connection_impl.cc:890] [C613] connecting to
    [2022-07-21 18:02:52.789][50][debug][connection] [source/common/network/connection_impl.cc:909] [C613] connection in progress
    [2022-07-21 18:02:52.789][50][trace][pool] [source/common/conn_pool/conn_pool_base.cc:130] not creating a new connection, shouldCreateNewConnection returned false.
    [2022-07-21 18:02:52.789][50][debug][conn_handler] [source/server/active_tcp_listener.cc:140] [C612] new connection from
    [2022-07-21 18:02:52.789][50][trace][connection] [source/common/network/connection_impl.cc:554] [C612] socket event: 2
    [2022-07-21 18:02:52.789][50][trace][connection] [source/common/network/connection_impl.cc:663] [C612] write ready
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:554] [C613] socket event: 2
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:663] [C613] write ready
    [2022-07-21 18:02:52.790][50][debug][connection] [source/common/network/connection_impl.cc:672] [C613] connected
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:417] [C613] raising connection event 2
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:356] [C613] readDisable: disable=true disable_count=0 state=0 buffer_length=0
    [2022-07-21 18:02:52.790][50][debug][pool] [source/common/conn_pool/conn_pool_base.cc:294] [C613] attaching to next stream
    [2022-07-21 18:02:52.790][50][debug][pool] [source/common/conn_pool/conn_pool_base.cc:177] [C613] creating stream
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:356] [C613] readDisable: disable=false disable_count=1 state=0 buffer_length=0
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:356] [C612] readDisable: disable=false disable_count=1 state=0 buffer_length=0
    [2022-07-21 18:02:52.790][50][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:609] [C612] TCP:onUpstreamEvent(), requestedServerName: cpeconsul-consul-server-4.server.l
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:554] [C613] socket event: 2
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:663] [C613] write ready
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:554] [C612] socket event: 3
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:663] [C612] write ready
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:592] [C612] read ready. dispatch_buffered_data=false
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/raw_buffer_socket.cc:24] [C612] read returns: 341
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/raw_buffer_socket.cc:38] [C612] read error: Resource temporarily unavailable
    Yann Huissoud

    Two similar clusters, two similar Consul configs. Trying to spawn a second Consul cluster: one joins, the other doesn't:

    [WARN]  agent.server: Raft has a leader but other tracking of the node would indicate that the node is unhealthy or does not exist. The network may be misconfigured.: leader=
    [WARN]  agent: Syncing node info failed.: error="Raft leader not found in server lookup mapping"
    [ERROR] agent.anti_entropy: failed to sync remote state: error="Raft leader not found in server lookup mapping"
    [ERROR] agent.server.memberlist.lan: memberlist: Conflicting address for pxe-boot. Mine: Theirs: Old state: 0
    [ERROR] agent.server.serf.lan: serf: Node name conflicts with another node at Names must be unique! (Resolution enabled: false)

    Any idea what might cause the error?

    Alvin Lin
    @Amier3 any luck finding someone new to take a look at hashicorp/memberlist#262?
    Marc Richter
    WOW - last message I see is from Jul 26 - is this thing still alive?
    I have no idea, but it is extremely quiet.
    Marc Richter
    Hmm. What is "the main" Community Channel/Platform then if it isn't Gitter?

    In the meantime, I will put my question in here anyways. Maybe someone who might help reads it ...
    As I described in discuss already (which seems to have similar activity as Gitter), the official Deployment Guide is inconsistent when it comes to TLS configuration.

    In “Create the certificates” section, it says: “First, for your Consul servers, use the following command to create a certificate for each server.”. So: not for the clients, since “servers” is explicitly written.
    Next it says: “The Consul client agents will only need the CA certificate, consul-agent-ca.pem, to enable mTLS.”. So again: It confirms that the clients only need the CA certificate, not the DC certificates.

    But then, with the very next section “Distribute the certificates to agents”, it says: “You must distribute the CA certificate, consul-agent-ca.pem, to each of the Consul agents as well as the agent specific certificate and private key.”. So, from here, it says that one must copy all node specific certs in addition to the CA certificate, which is the opposite of what was explained before.

    This is once more confirmed in the “TLS configuration” section. Even though the “Auto encryption” guide is selected, the consul.hcl snippet lists not only ca_file, but cert_file and key_file parameters as well “for Consul clients”. The only difference between “Auto” and “Manual” seems to be the auto_encrypt nested section. Which again seems to be the opposite of the “CA cert only” statement and the entire Auto encryption idea.

    Marc Richter
    Regarding that auto_encrypt nested section, the Consul Security guide brings another unclear element to the table: in the “Configure the clients” section, it says to configure the clients by indeed setting only the ca_file option, but to set auto_encrypt { tls = true } instead of auto_encrypt { allow_tls = true }.
    Which one is correct?
    Marc Richter
    As far as I understand from the general Consul Configuration Reference, on servers auto_encrypt { allow_tls = true } must be set and on clients auto_encrypt { tls = true }; but that's my interpretation and I'm unsure if it's correct.
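That reading matches the configuration reference as far as I can tell: `allow_tls` goes on servers and `tls` on clients. A minimal sketch that emits the two fragments as JSON (Consul also accepts JSON config files). Assumptions: the top-level `ca_file` key is the spelling used in the guides quoted above, the CA path is a placeholder, and everything else a real config needs (the servers' own cert_file/key_file, verify settings, and so on) is deliberately left out.

```python
import json


def auto_encrypt_fragment(role, ca_file="consul-agent-ca.pem"):
    """Return the auto_encrypt-related config keys for a 'server' or 'client' agent."""
    if role == "server":
        # Servers hand out certificates to clients that ask for them.
        return {"auto_encrypt": {"allow_tls": True}}
    if role == "client":
        # Clients request their certificate from the servers; they only ship the CA.
        return {"ca_file": ca_file, "auto_encrypt": {"tls": True}}
    raise ValueError(f"unknown role: {role!r}")


if __name__ == "__main__":
    for role in ("server", "client"):
        print(f"# {role}")
        print(json.dumps(auto_encrypt_fragment(role), indent=2))
```

With this split, only the servers hold pre-generated certificates and keys; clients receive theirs at runtime, which is consistent with the "CA certificate only" statement for clients.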
    Hi, Does anyone know how to handle this message:
    [WARN] agent.server.serf.lan: serf: Intent queue depth (11437) exceeds limit (10690), dropping messages!
    Consul’s version is 1.13.1
    Not sure, but either you have way too many hosts/services and Consul is already struggling with all of them, or some node is slow and is getting more health checks than it can manage...
    In the first case, segment the Consul cluster; in the second, give the node more resources or solve the load issue.
    That is also only a warning, so if it was just a random event, it was probably just load, and in the worst case a health check for some hosts/services was not done in time.
    @oratlv: ↑
    Hello, I'm trying to use the Consul KV inside Kubernetes. Consul is deployed, but inside a pod the code says:
    Unhandled exception. System.Net.Http.HttpRequestException: Connection refused (