    Shantanu Gadgil
    @shantanugadgil
    @axsuul:matrix.org Consul catalog deregister?
    axsuul
    @axsuul:matrix.org
    [m]
    Do you mean consul services deregister? How would I specify just that one IP, though, since I don't want to deregister the entire service?
    nahsi (Anatoly Laskaris)
    @nahsi:nahsi.dev
    [m]
    consul services deregister -id vault:100.99.252.246:8200 on the host with that ip
    Shantanu Gadgil
    @shantanugadgil
    @axsuul:matrix.org actually there are two deregister commands...one from the node and another from the catalog
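A minimal sketch of the two deregistration routes mentioned above, expressed against Consul's HTTP API (`/v1/agent/service/deregister/<id>` on the agent and `/v1/catalog/deregister` on the catalog). The addresses and the node name `node-1` are hypothetical placeholders; the functions only build the requests and do not send them. Note that a catalog-level deregistration can be re-added by the owning agent's anti-entropy sync if that agent still holds the registration, which is likely why the per-agent command was suggested.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def agent_deregister_request(agent_addr: str, service_id: str) -> Request:
    """Build (but do not send) a request that removes one service instance
    from the agent it is addressed to."""
    path = "/v1/agent/service/deregister/" + quote(service_id, safe="")
    return Request(agent_addr + path, method="PUT")


def catalog_deregister_request(server_addr: str, node: str, service_id: str) -> Request:
    """Build (but do not send) a request that removes one service instance
    of one node directly from the catalog."""
    body = json.dumps({"Node": node, "ServiceID": service_id}).encode()
    return Request(
        server_addr + "/v1/catalog/deregister",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical agent address; the service ID is the one from the chat.
    req = agent_deregister_request("http://127.0.0.1:8500", "vault:100.99.252.246:8200")
    print(req.get_method(), req.full_url)
```

To actually send either request, pass it to `urllib.request.urlopen` (adding an `X-Consul-Token` header if ACLs are enabled).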
    Jason Witkowski
    @jwitko

    Hey All, I am having TONS of errors about RPC connections failing between my consul server and mesh gateway pods inside my kubernetes cluster.

    2022-07-21T17:03:24.666Z [ERROR] agent.server.rpc: failed to ingest RPC: sni=consul-server-1.server.lhr-poc1-dataplane.dev.consul protocol=consul/wan-gossip/packet conn=from=10.245.3.51:57878 error="read tcp 10.245.2.195:8300->10.245.3.51:57878: i/o timeout"

    I have googled to infinity, I have modified gossip_wan settings, I have opened firewall/security group settings to be wide open, but nothing seems to work

    Has anyone seen these issues before or could maybe provide me any insight into why this is failing?
    Jason Witkowski
    @jwitko
    Putting my mesh gateway into trace level logging I see the following:
    [2022-07-21 18:02:52.789][50][debug][connection] [source/common/network/connection_impl.cc:890] [C613] connecting to 10.245.1.44:8300
    [2022-07-21 18:02:52.789][50][debug][connection] [source/common/network/connection_impl.cc:909] [C613] connection in progress
    [2022-07-21 18:02:52.789][50][trace][pool] [source/common/conn_pool/conn_pool_base.cc:130] not creating a new connection, shouldCreateNewConnection returned false.
    [2022-07-21 18:02:52.789][50][debug][conn_handler] [source/server/active_tcp_listener.cc:140] [C612] new connection from 10.245.2.0:31924
    [2022-07-21 18:02:52.789][50][trace][connection] [source/common/network/connection_impl.cc:554] [C612] socket event: 2
    [2022-07-21 18:02:52.789][50][trace][connection] [source/common/network/connection_impl.cc:663] [C612] write ready
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:554] [C613] socket event: 2
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:663] [C613] write ready
    [2022-07-21 18:02:52.790][50][debug][connection] [source/common/network/connection_impl.cc:672] [C613] connected
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:417] [C613] raising connection event 2
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:356] [C613] readDisable: disable=true disable_count=0 state=0 buffer_length=0
    [2022-07-21 18:02:52.790][50][debug][pool] [source/common/conn_pool/conn_pool_base.cc:294] [C613] attaching to next stream
    [2022-07-21 18:02:52.790][50][debug][pool] [source/common/conn_pool/conn_pool_base.cc:177] [C613] creating stream
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:356] [C613] readDisable: disable=false disable_count=1 state=0 buffer_length=0
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:356] [C612] readDisable: disable=false disable_count=1 state=0 buffer_length=0
    [2022-07-21 18:02:52.790][50][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:609] [C612] TCP:onUpstreamEvent(), requestedServerName: cpeconsul-consul-server-4.server.lhr-poc1-dataplane.dev.consul
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:554] [C613] socket event: 2
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:663] [C613] write ready
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:554] [C612] socket event: 3
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:663] [C612] write ready
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/connection_impl.cc:592] [C612] read ready. dispatch_buffered_data=false
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/raw_buffer_socket.cc:24] [C612] read returns: 341
    [2022-07-21 18:02:52.790][50][trace][connection] [source/common/network/raw_buffer_socket.cc:38] [C612] read error: Resource temporarily unavailable
    Yann Huissoud
    @aiqency

    Two similar clusters, two similar consul configs. Trying to spawn a second consul cluster; one joins, the other does not:

    [WARN]  agent.server: Raft has a leader but other tracking of the node would indicate that the node is unhealthy or does not exist. The network may be misconfigured.: leader=172.98.120.15:8300
    [WARN]  agent: Syncing node info failed.: error="Raft leader not found in server lookup mapping"
    [ERROR] agent.anti_entropy: failed to sync remote state: error="Raft leader not found in server lookup mapping"
    [ERROR] agent.server.memberlist.lan: memberlist: Conflicting address for pxe-boot. Mine: 172.98.120.101:8301 Theirs: 172.98.120.15:8301 Old state: 0
    [ERROR] agent.server.serf.lan: serf: Node name conflicts with another node at 172.98.120.15:8301. Names must be unique! (Resolution enabled: false)

    Any idea what might cause the error?

    Alvin Lin
    @alvinlin123
    @Amier3 any luck finding someone new to take a look at hashicorp/memberlist#262?
    Marc Richter
    @The-Judge
    WOW - last message I see is from Jul 26 - is this thing still alive?
    techdrgn
    @techdrgn:matrix.org
    [m]
    I have no idea, but it is extremely quiet.
    Marc Richter
    @The-Judge
    Hmm. What is "the main" Community Channel/Platform then if it isn't Gitter?

    In the meantime, I will put my question in here anyways. Maybe someone who might help reads it ...
    As I described in discuss already (which seems to have similar activity as Gitter), the official Deployment Guide is inconsistent when it comes to TLS configuration.

    In “Create the certificates” section, it says: “First, for your Consul servers, use the following command to create a certificate for each server.”. So: not for the clients, since “servers” is explicitly written.
    Next it says: “The Consul client agents will only need the CA certificate, consul-agent-ca.pem, to enable mTLS.”. So again: it confirms that the clients only need the CA certificate, not the DC certificates.

    But then, with the very next section “Distribute the certificates to agents”, it says: “You must distribute the CA certificate, consul-agent-ca.pem, to each of the Consul agents as well as the agent specific certificate and private key.”. So, from here, it says that one must copy all node specific certs in addition to the CA certificate, which is the opposite of what was explained before.

    This is once more confirmed in the TLS configuration section. Even though the “Auto encryption” guide is selected, the consul.hcl snippet lists not only ca_file, but cert_file and key_file parameters as well “for Consul clients”. The only difference between “Auto” and “Manual” seems to be the auto_encrypt nested section. Which again seems to be the opposite of the “CA cert only” statement and the entire Auto encryption idea.

    Marc Richter
    @The-Judge
    Regarding that auto_encrypt nested section, the Consul Security guide brings another unclear element to the table: in the “Configure the clients” section, it says to configure the clients by indeed setting the ca_file option only, but to set auto_encrypt { tls = true } instead of auto_encrypt { allow_tls = true }.
    What's correct now?
    Marc Richter
    @The-Judge
    As far as I understand from the general Consul Configuration Reference, on servers auto_encrypt { allow_tls = true } must be set and on clients auto_encrypt { tls = true }; but that's what my interpretation is and I'm unsure if that's correct.
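The interpretation above matches the agent configuration reference: `allow_tls` is the server-side switch and `tls` is the client-side one. As a small sketch, the snippet below renders the two `auto_encrypt` fragments as JSON, which Consul agents accept alongside HCL. The helper name `auto_encrypt_config` is hypothetical, and every other TLS setting (such as `ca_file`) is deliberately left out.

```python
import json


def auto_encrypt_config(role: str) -> dict:
    """Return only the auto_encrypt fragment for a Consul agent of the given role."""
    if role == "server":
        # Servers hand out certificates to clients that ask for them.
        return {"auto_encrypt": {"allow_tls": True}}
    if role == "client":
        # Clients request their certificate from the servers.
        return {"auto_encrypt": {"tls": True}}
    raise ValueError(f"unknown role: {role!r}")


if __name__ == "__main__":
    for role in ("server", "client"):
        print(f"# {role}")
        print(json.dumps(auto_encrypt_config(role), indent=2))
```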
    oratlv
    @oratlv
    Hi, Does anyone know how to handle this message:
    [WARN] agent.server.serf.lan: serf: Intent queue depth (11437) exceeds limit (10690), dropping messages!
    Consul’s version is 1.13.1
    higuita
    @higuita:matrix.org
    [m]
    not sure, but either you have way too many hosts/services and consul is already having problems with all of them, or some node is slow and is getting more healthchecks to do than it can manage...
    segment the consul cluster in the first case; scale up the node or solve the load issue in the second
    that is also a warning, so if it's just a random event, it was probably just load, and worst case you failed to do a healthcheck for some hosts/services in time
    @oratlv: ↑
    OliverSmart
    @AdamCzepiel78
    Hello, I'm trying to use the consul kv inside kubernetes. Consul is deployed, but inside a pod the code calling http://127.0.0.1:8500 says
    Unhandled exception. System.Net.Http.HttpRequestException: Connection refused (127.0.0.1:8500)
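One possible cause, not confirmed in the thread: inside a pod, 127.0.0.1 is the pod's own loopback, so a Consul agent running elsewhere on the node is not reachable there. The sketch below picks the API address from the environment instead of hardcoding it. `CONSUL_HTTP_ADDR` is the variable the Consul CLI itself reads; `HOST_IP` is an assumed variable that the pod spec would have to populate (for example from the node IP via the Kubernetes downward API), and port 8500 is the default HTTP port.

```python
import os
from typing import Mapping, Optional


def consul_http_addr(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a Consul HTTP API address, preferring explicit configuration."""
    env = os.environ if env is None else env
    explicit = env.get("CONSUL_HTTP_ADDR")
    if explicit:
        return explicit
    host_ip = env.get("HOST_IP")  # assumption: injected by the pod spec
    if host_ip:
        return f"http://{host_ip}:8500"
    # Last resort: only works if an agent listens inside the same pod.
    return "http://127.0.0.1:8500"


if __name__ == "__main__":
    print(consul_http_addr())
```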
    oratlv
    @oratlv
    I get it actually on all servers but we don't have extra load - so it's weird. I'm trying to track the cause of it somehow.
    Sean
    @seanamos

    These docs demonstrate how to register a service proxy: https://www.consul.io/docs/connect/registration/service-registration
    They give plenty of sample configurations, but I can't figure out where to use those sample configurations!

    consul services register proxy.hcl
    Error: failed to parse proxy.hcl: 4 errors occurred:
        * invalid config key kind
        * invalid config key name
        * invalid config key port
        * invalid config key proxy
    consul config write proxy.hcl
    Failed to decode config entry input: invalid config entry kind: connect-proxy

    What am I missing?

    Sean
    @seanamos
    Right, figured it out:
    service { # <-- must be in a service block, examples don't show this
      name = <name of the service>
      kind = "connect-proxy"
      proxy = {
        destination_service_name = "<name of the service that the proxy represents>"
        <additional proxy parameters> = "<additional parameter values>"
      }
      port = <port where services can discover and connect to proxied services>
    }
    Ayaan Zaidi
    @obviyus
    For some reason DNS resolution across all my consul nodes seems to be failing. The only thing I see in logs is:
    Aug 20 11:32:28 ip-172-31-33-223 consul[569145]: agent.rpcclient.health: subscribe call failed: err="rpc error: code = InvalidArgument desc = Key is required" failure_count=14 key=<service_name> topic=ServiceHealth
    Brett Larson
    @brettplarson
    What's the best practice for dev workstations? Should I install the consul agent on my WSL2 to get service resolution?
    bsharma-tavisca
    @bsharma-tavisca

    Hello everyone
    I am occasionally getting this error
    "Raft leader not found in server lookup mapping"

    "bootstrap_expect": 3,
    "retry_join": ["provider=aws tag_key=DataCenterName tag_value=ek-consul-nv-aws region=us-east-1 addr_type=private_v4"],
    "performance": {
    "raft_multiplier": 1
    }
    total consul servers running 5
    all consul server are running on m5.4xlarge

    bsharma-tavisca
    @bsharma-tavisca

    Two similar clusters, two similar consul configs. Trying to spawn a second consul cluster; one joins, the other does not:

    [WARN]  agent.server: Raft has a leader but other tracking of the node would indicate that the node is unhealthy or does not exist. The network may be misconfigured.: leader=172.98.120.15:8300
    [WARN]  agent: Syncing node info failed.: error="Raft leader not found in server lookup mapping"
    [ERROR] agent.anti_entropy: failed to sync remote state: error="Raft leader not found in server lookup mapping"
    [ERROR] agent.server.memberlist.lan: memberlist: Conflicting address for pxe-boot. Mine: 172.98.120.101:8301 Theirs: 172.98.120.15:8301 Old state: 0
    [ERROR] agent.server.serf.lan: serf: Node name conflicts with another node at 172.98.120.15:8301. Names must be unique! (Resolution enabled: false)

    Any idea what might cause the error?

    hey @aiqency
    did you have any luck getting answers to the query you posted?

    lcividin
    @lcividin:matrix.org
    [m]
    is it possible to create a consul key value out of a registered consul service?
    For example, I want to create a variable in the key value store for a nomad job, and I need the node ip and a particular port of a cluster of containers so I can join another cluster of containers on that port
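One way this could be approached, as a sketch rather than a confirmed answer from the thread: read the instances from the catalog endpoint `/v1/catalog/service/<name>` and write the resulting `ip:port` list to the KV store with a PUT to `/v1/kv/<key>`. The sample entries, the key `jobs/cluster/join`, and the agent address are hypothetical; the field names (`Address`, `ServiceAddress`, `ServicePort`) are the ones the catalog API returns. The request is built but not sent.

```python
from urllib.request import Request


def instance_endpoints(catalog_entries: list) -> list:
    """Turn catalog service entries into 'ip:port' strings.

    ServiceAddress is empty when the service did not register its own
    address, in which case the node address applies.
    """
    endpoints = []
    for entry in catalog_entries:
        host = entry.get("ServiceAddress") or entry["Address"]
        endpoints.append(f"{host}:{entry['ServicePort']}")
    return endpoints


def kv_put_request(consul_addr: str, key: str, value: str) -> Request:
    """Build (but do not send) a request that stores value under key."""
    return Request(f"{consul_addr}/v1/kv/{key}", data=value.encode(), method="PUT")


if __name__ == "__main__":
    # Hypothetical catalog response, trimmed to the fields used here.
    sample = [
        {"Address": "10.0.0.5", "ServiceAddress": "", "ServicePort": 7000},
        {"Address": "10.0.0.6", "ServiceAddress": "10.1.0.6", "ServicePort": 7001},
    ]
    value = ",".join(instance_endpoints(sample))
    req = kv_put_request("http://127.0.0.1:8500", "jobs/cluster/join", value)
    print(req.get_method(), req.full_url, value)
```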
    Michael Aldridge
    @the-maldridge
    @blake when doing consul wan federation, does each DC maintain its own unique ACLs?
    Kholis Respati Agum Gumelar
    @kholisrag

    Hi, I have a problem when trying to run a nomad job with consul connect enabled, like in https://developer.hashicorp.com/nomad/docs/integrations/consul-connect

    the connect-proxy-count-dashboard

    [2022-09-28 07:49:45.908][1][warning][config] [./source/common/config/grpc_stream.h:196] DeltaAggregatedResources gRPC config stream closed since 312s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED

    already tried to follow https://developer.hashicorp.com/nomad/tutorials/integrate-consul/consul-service-mesh?in=nomad%2Fintegrate-consul#tls-enabled-consul-environment but still no luck.

    Can anyone help me fix this?

    Kalyan Chakravarthy S
    @kalyanchakravarthys
    Hi.. the consul acl init job is failing to start.. The logs have the following errors
    -consul-api-timeout must be set to a value greater than 0
    Error with Exit Code 1
    BackoffLimitExceeded
    can someone help me find a solution for this issue?
    Marcin Pastecki
    @mpastecki
    Hi, I'm starting to play around with Consul Namespaces, and have a question.
    Say I have an ACL Token created in the default namespace, what happens if the policies I add to it are in different namespaces than the default one?
    would the service using the token be able to perform actions in those namespaces as configured by the policies?
    Marcin Pastecki
    @mpastecki
    Looks like I can't assign policies from different namespace to a token
    And the same is valid for roles: roles can only be linked to policies that are defined in the same namespace
    Michael Aldridge
    @the-maldridge
    I have my consul peering connected between 2 datacenters, I have intentions that allow to and I have a peering filter that allows * to each peer. When I do a consul peering read -name <other>, it works for one direction and is mostly empty on the other end.
    thoughts?
    Benjamín Visón
    @bvisonl

    Hi,

    I'm doing some testing with Consul and I am running a client inside a container. When the container starts up, it is initially not able to find any consul servers (I assume the network is still starting or something), and the watch that I've set up throws an error:

    2022-10-21T14:58:32.569Z [ERROR] agent.watch.watch: Watch errored: type=nodes error="Unexpected response code: 500 (No known Consul servers)" retry=5s

    After this the client joins the cluster successfully, but the watch never retries and stays dead.

    Consul v1.13.3

    Any thoughts?

    Benjamín Visón
    @bvisonl
    I think I misunderstood the use of "nodes" as a watch type, thought it would react based on nodes going up/down but I guess that's what services are for.
    Alexey Shcherbak
    @centur

    Hi, I'm trying to troubleshoot the reachability of the consul from one of the jobs I'm running in nomad.
    So I'm trying to start grafana/agent (read prometheus, they both work in the same way) container as a nomad job and use it to collect consul cluster telemetry.
    Our consul cluster has an ingress gateway with public dns and if I point grafana/agent to that address, say https://consul.example.com:8500, everything works. Traffic to this public address goes via AWS ALB and all other AWS plumbing, so we want grafana agent to talk to the consul cluster locally, via the private network they both reside in. And I can't figure out how to point the Grafana-agent task to the consul HTTP API correctly. Grafana agent has a consul service sidecar and I can see that it successfully registered in the Consul mesh via the Nomad connect {sidecar_service...} stanza.
    What I've tried so far:

    1. Point agent to http://127.0.0.1:8500, which from my understanding corresponds to the local consul agent that we are running in client mode on each node for service mesh. I also tried to define an upstream in this sidecar to point to service "consul" registered in the catalog via

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name   = "consul"
              local_bind_address = "127.0.0.1"
              local_bind_port    = 10123
            }
          }
        }
      }

      and point grafana agent to 127.0.0.1:10123

    2. I tried to use one of the env variables injected by nomad to get a specific consul service IP (it actually gives me a local node private network IP) and use it to configure consul cluster scraping at http://{IP}:8500.

    3. Given this is our research cluster - I also tried to update consul cluster to allow all comms between all services and hardcode one of the consul server nodes' private IP address as a destination e.g. grafana agent tries to reach http://{consul-node-ip-from-AWS-console}:8500

    Everything to no avail with various errors in grafana agent logs.

    Can anyone please advise on the correct way to configure grafana agent to collect the Consul cluster's own telemetry via the Prometheus endpoint (https://developer.hashicorp.com/consul/docs/agent/telemetry) and what I might be doing wrong here? I've spent almost 3 days trying to figure this out.

    Alexey Shcherbak
    @centur
    Well, nvm :point_up: . Aside from multiple small changes, I noticed that the consul address I was supplying in the consul_exporter stanza was added without a protocol prefix. Setting an explicit protocol prefix ended up being the last (or maybe the single key) bit that was missing from my configuration. :facepalm:
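The fix described above amounts to making sure the address carries a scheme. A tiny sketch of that normalization follows; the helper name `with_scheme` is hypothetical, and defaulting to `http` is an assumption that only holds when the Consul HTTP API is not served over TLS.

```python
def with_scheme(addr: str, default_scheme: str = "http") -> str:
    """Return addr unchanged if it already has a scheme, else prefix one."""
    if "://" in addr:
        return addr
    return f"{default_scheme}://{addr}"


if __name__ == "__main__":
    for raw in ("127.0.0.1:8500", "https://consul.example.com:8500"):
        print(raw, "->", with_scheme(raw))
```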
    dagtveit
    @dagtveit
    hey guys, can someone please help me? I am in deep shit, I lost my vault. consul keeps rebooting due to "Attempting re-join to previously known node"
    I had some issues after a restart. It was a single-node cluster, and I tried adding more nodes as it couldn't find a leader
    I removed the extra nodes again but it still tries to connect to them. I tried force-removing them and everything, but the log still shows it trying to connect to them
    I can't find anywhere that the old nodes exist; consul members and raft peers don't list them
    Ryan Matte
    @rmatte
    We're currently testing out consul to be used as a load-balancer-type setup via dns. I've been stress testing our test nodes with dns queries. These are bare metal nodes with 32 cpu cores, 128gb of ram, 10gig networking. The best I can seem to get out of consul is 32,000 queries per second, which is basically 1000 queries per second per cpu core. While I'm hitting it with the stress test, its load average, cpu usage, memory usage, and network usage all remain relatively low. I have caching enabled within consul, set at 10 seconds. Does anyone have any ideas for other config options I could try to squeeze more performance out of this, or am I just hitting some kind of programmatic limit within consul?