    Marcin Pastecki
    @mpastecki
    And the same is valid for roles: roles can only be linked to policies that are defined in the same namespace.
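    For illustration, a rough Terraform sketch of that pairing, assuming Consul Enterprise namespaces; the names, rules, and namespace here are made up:

      resource "consul_acl_policy" "app" {
        name      = "app-policy"     # hypothetical policy
        namespace = "team-a"         # Enterprise namespace
        rules     = <<-RULE
          service "app" { policy = "read" }
        RULE
      }

      resource "consul_acl_role" "app" {
        name      = "app-role"
        policies  = [consul_acl_policy.app.id]   # policy defined in the same namespace
        namespace = "team-a"                     # must match the policy's namespace
      }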
    Michael Aldridge
    @the-maldridge
    I have my Consul peering connected between 2 datacenters, I have intentions that allow the traffic, and I have a peering filter that allows * to each peer. When I do a consul peering read -name <other>, it works in one direction but comes back mostly empty on the other end.
    thoughts?
    Benjamín Visón
    @bvisonl

    Hi,

    I'm doing some testing with Consul and I am running a client inside a container. When the container starts up it initially is not able to find any Consul servers (I assume the network is still starting or something) and the watches that I've set up throw an error:

    2022-10-21T14:58:32.569Z [ERROR] agent.watch.watch: Watch errored: type=nodes error="Unexpected response code: 500 (No known Consul servers)" retry=5s

    After this the client joins the cluster successfully, but the watches never retry and stay dead.

    Consul v1.13.3

    Any thoughts?
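    For reference, a watch of this kind is typically defined in the agent configuration; a rough sketch, with a hypothetical handler script path:

      watches = [
        {
          type         = "nodes"
          handler_type = "script"
          args         = ["/usr/local/bin/on-nodes-change.sh"]   # hypothetical handler
        }
      ]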

    Benjamín Visón
    @bvisonl
    I think I misunderstood the use of "nodes" as a watch type; I thought it would react based on nodes going up/down, but I guess that's what services are for.
    Alexey Shcherbak
    @centur

    Hi, I'm trying to troubleshoot the reachability of Consul from one of the jobs I'm running in Nomad.
    So I'm trying to start a grafana/agent (read: Prometheus, they both work the same way) container as a Nomad job and use it to collect Consul cluster telemetry.
    Our Consul cluster has an ingress gateway with public DNS, and if I point grafana/agent at that address, say https://consul.example.com:8500, everything works. Traffic to this public address goes via an AWS ALB and all the other AWS plumbing, so we want the grafana agent to talk to the Consul cluster locally, via the private network they both reside in. But I can't figure out how to point the grafana-agent task at the Consul HTTP API correctly. The grafana agent has a Consul service sidecar and I can see it successfully registered in the Consul mesh via the Nomad connect { sidecar_service ... } stanza.
    What I've tried so far:

    1. Point the agent to http://127.0.0.1:8500, which from my understanding corresponds to the local Consul agent that we run in client mode on each node for the service mesh. I also tried to define an upstream in this sidecar to point to the service "consul" registered in the catalog via

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name   = "consul"
              local_bind_address = "127.0.0.1"
              local_bind_port    = 10123
            }
          }
        }
      }

      and point grafana agent to 127.0.0.1:10123

    2. I tried to use one of the env variables injected by Nomad to get a specific Consul service IP (it actually gives me the local node's private network IP) and use it to configure Consul cluster scraping at http://{IP}:8500.

    3. Given this is our research cluster, I also tried updating the Consul cluster to allow all comms between all services and hardcoding one of the Consul server nodes' private IP addresses as the destination, e.g. the grafana agent tries to reach http://{consul-node-ip-from-AWS-console}:8500.

    All to no avail, with various errors in the grafana agent logs.

    Can anyone please advise on the correct way to configure the grafana agent to collect Consul's own telemetry via the Prometheus endpoint (https://developer.hashicorp.com/consul/docs/agent/telemetry), and on what I might be doing wrong here? I've spent almost 3 days trying to figure this out.
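    For context, Consul agents expose Prometheus-format metrics on the HTTP API at /v1/agent/metrics?format=prometheus once a retention time is configured; a minimal sketch of the telemetry stanza on the agents being scraped:

      telemetry {
        prometheus_retention_time = "60s"   # enables the Prometheus-format metrics endpoint
        disable_hostname          = true    # optional: drop the hostname prefix from metric names
      }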

    Alexey Shcherbak
    @centur
    Well, nvm :point_up:. Aside from multiple small changes, I noticed that the Consul address I was supplying in the consul_exporter stanza had been added without a protocol prefix. Setting an explicit protocol prefix turned out to be the last (or maybe the single key) bit missing from my configuration. :facepalm:
    dagtveit
    @dagtveit
    hey guys, can someone please help me, I am in deep shit, I lost my Vault. Consul keeps rebooting due to "Attempting re-join to previously known node".
    I had some issues after a restart. It was a single-node cluster and I tried adding more nodes as it couldn't find a leader.
    I removed the extra nodes again but it still tries to connect to them. I tried force-removing them and everything, but it still comes up in the log that it is trying to connect to them.
    I can't find anywhere that the old nodes still exist; consul members and the raft peers don't list them.
    Ryan Matte
    @rmatte
    We're currently testing out Consul to be used in a load-balancer-type setup via DNS. I've been stress testing our test nodes with DNS queries. These are bare-metal nodes with 32 CPU cores, 128 GB of RAM, and 10-gig networking. The best I can seem to get out of Consul is 32,000 queries per second, which is basically 1,000 queries per second per CPU core. While I'm hitting it with the stress test, its load average, CPU usage, memory usage, and network usage all remain relatively low. I have caching enabled within Consul, set at 10 seconds. Does anyone have any idea of other config options I could try to squeeze more performance out of this, or am I just hitting some kind of programmatic limit within Consul?
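    Presumably the caching referred to here is the agent-side DNS cache; a sketch of the relevant agent options, assuming that's what is meant:

      dns_config {
        use_cache     = true
        cache_max_age = "10s"   # serve cached answers up to 10 seconds old
      }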
    Michael Aldridge
    @the-maldridge
    something something dns is not a load balancer
    I would, though, if I were inclined to do that, put Unbound in front of Consul and use caching
    Ryan Matte
    @rmatte
    For our particular use case it'll suffice; it doesn't need to be a perfect load balancer per se.
    but cool, I'll look into unbound, thanks
    Michael Aldridge
    @the-maldridge
    or coredns or even dnsmasq, just something with a performant cache to take the load off your consul machines
    Ryan Matte
    @rmatte
    I thought Consul's built-in cache feature would improve things a bit more significantly than it did. By enabling it I only gained about another 1,000 queries per second compared to having it turned off, which doesn't seem right.
    Michael Aldridge
    @the-maldridge
    consul's cache reduces the reliance on the upstream consul servers, but it is not a protocol optimized cache in the same way a dedicated DNS server is
    Ryan Matte
    @rmatte
    gotcha
    Matt Darcy
    @ikonia
    Can anyone point me at the documentation that lists the parameter to change the Consul hosted domain name from .consul to something else? I know it exists as I've read it and used it before, but I cannot find it now for the life of me. Also, is there any negative impact from changing this? I can't think of one, but it would be nice to know if there's anything I've not considered.
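    For reference, the option in question appears to be the agent-level domain setting, applied to every agent (there is also alt_domain for answering an additional domain alongside it). A sketch with a made-up domain:

      domain = "mycorp.internal"   # hypothetical replacement for the default .consul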
    Matt Darcy
    @ikonia
    thanks
    Sean
    @seanamos
    No problem!
    0xalex88
    @0xalex88
    Hi everyone, I've started a 4-node cluster. This is my config: https://gist.github.com/0xalex88/a609af0698d0dd46b48571ea6baed42b and I've followed https://developer.hashicorp.com/consul/tutorials/production-deploy/deployment-guide
    However, the nodes are all continuously logging that there is no leader. What should I do? Is this expected?
    0xalex88
    @0xalex88
    after deleting everything it correctly bootstrapped
    Sean
    @seanamos

    @0xalex88 Maybe you are just experimenting, but a 4-node cluster is not a good idea.
    Consul does leader election based on a majority. With 4 nodes, quorum still requires 3, so you gain no extra failure tolerance over 3 nodes and are more likely to run into election problems.
    I believe they are adding a warning for when people incorrectly set an even number in bootstrap_expect.

    You want bootstrap_expect set to an odd number:
    1 - No HA.
    3 - Tolerance for 1 node failing
    5 - Tolerance for 2 nodes failing
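    A minimal sketch of the corresponding server stanza, set identically on each of three server agents:

      server           = true
      bootstrap_expect = 3   # odd number: quorum of 2, tolerates 1 server failure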

    1 reply
    Yohan Daddou
    @deepbluemussel
    Hi everyone,
    I know that Consul resolves names like <tag>.<service_name>.service.<dc>.consul, but is there a way to register a service that can be resolved with a wildcard subdomain?
    E.g.: *.app_name.service.dc1.consul ==> same IP
    2 replies
    ilpianista
    @ilpianista:kde.org [m]
    hi there, I'm trying to set up Consul injection. I've deployed my 2 pods and added the consul.hashicorp.com/connect-service-upstreams: foo:1234 annotation to the client, but curl localhost:1234 fails with Connection refused, and indeed there's nothing listening on that port. curl foo is working. The Envoy sidecar is injected. What could be missing?
    ilpianista
    @ilpianista:kde.org [m]
    one thing I don't understand in the example is why you have to define the port on the client service: https://developer.hashicorp.com/consul/docs/k8s/connect#connecting-to-connect-enabled-services
    Rodrigo Pereira
    @voiprodrigo
    Hi. I'm facing a weird situation between Vault and Consul. Maybe someone here can help me. I have a 5-node Consul cluster and a 5-node Vault cluster, both on the latest versions. This uses only 5 machines; each machine holds a member of each cluster. Vault talks directly to the local Consul server agent. These 5 machines span 3 "geographic/network zones", and one zone contains only one node. There was an issue with one of the zones, so two nodes were isolated from the other 3, but that was temporary. The problem I'm seeing now is that although there is only one active/leader Vault node, Consul DNS and the service check metrics insist on reporting that two Vault nodes are active, which is not true. For example, DNS queries for active.vault.service.mydc.consul alternate between two Vault nodes, and the service check metrics collected from Consul also report those same two nodes. I have no idea what's going on here. Any idea? TIA.
    0xalex88
    @0xalex88

    Hi everyone, after following https://developer.hashicorp.com/consul/tutorials/get-started-vms/virtual-machine-gs-deploy#create-server-tokens I'm still getting:

    agent: Node info update blocked by ACLs: node=7f08f176-a3f3-effe-7443-bd60865e09d1 accessorID=e340e34c-4ef6-5adb-ad48-5a3d923355f9
    agent: Coordinate update blocked by ACLs: accessorID=e340e34c-4ef6-5adb-ad48-5a3d923355f9

    what could be the reason?

    2 replies
    the accessor has an ID so I guess a token is set?
    the accessor ID matches the "server agent token" that has the "acl-policy-server-node" policy
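    For comparison, the agent token is normally handed to the agent in its configuration roughly like this (the value is a placeholder); for the node-info and coordinate updates to be accepted, the attached policy needs node "write" for the agent's own node name:

      acl {
        tokens {
          agent = "<agent-token-secret>"   # placeholder; token must carry node "write" for this node
        }
      }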
    moris1amar
    @moris1amar
    Hi everyone, after deploying the Consul server and one "client" in my environment, I can see that the client doesn't manage to join the server...
    This is what I get from the client:
    Dec 05 14:46:10 kubetmplp consul[3325]: 2022-12-05T14:46:10.876Z [INFO]  agent: Starting server: address=127.0.0.1:8500 network=tcp protocol=http
    Dec 05 14:46:10 kubetmplp consul[3325]: 2022-12-05T14:46:10.876Z [INFO]  agent: started state syncer
    Dec 05 14:46:10 kubetmplp consul[3325]: 2022-12-05T14:46:10.876Z [INFO]  agent: Consul agent running!
    Dec 05 14:46:10 kubetmplp consul[3325]: 2022-12-05T14:46:10.876Z [WARN]  agent.router.manager: No servers available
    Dec 05 14:46:10 kubetmplp consul[3325]: 2022-12-05T14:46:10.876Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
    consul.hcl
    {
      "datacenter": "iplan",
      "data_dir": "/var/lib/consul",
      "encrypt": "3ZYt2575ONn/EYcnQTGKBg==",
      "retry_interval": "10s",
      "enable_script_checks": false,
      "disable_update_check": true,
      "dns_config": {
        "enable_truncate": true,
        "only_passing": true
      },
      "enable_syslog": true,
      "leave_on_terminate": true,
      "log_level": "trace",
      "rejoin_after_leave": true,
      "tls": {
        "defaults": {
          "verify_incoming": false,
          "verify_outgoing": false
        }
      }
    }
    1 reply
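    One thing that stands out in the pasted config: it contains no join configuration (no "retry_join" or "start_join" addresses), which on its own would explain the "No known Consul servers" errors; adding e.g. "retry_join": ["<server-address>"] (placeholder address) alongside the other keys would tell the client where the server lives.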
    Susan Tang
    @susan.tang_gitlab
    Anyone have experience with external services (with health checks) automatically deregistering when registered through Terraform? We've set "deregister_critical_service_after" to an extremely high value, but this external service is still deregistered after some time. The health check is green after initial registration and we use ESM to monitor.
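    For context, an external service registration through the Terraform Consul provider tends to look roughly like the sketch below; the node name, address, port, and check values here are made up:

      resource "consul_node" "external" {
        name    = "ext-db-node"        # hypothetical external node
        address = "db.example.com"
      }

      resource "consul_service" "external" {
        name = "ext-db"
        node = consul_node.external.name
        port = 5432

        check {
          check_id                          = "service:ext-db"
          name                              = "ext-db TCP check"
          tcp                               = "db.example.com:5432"
          interval                          = "30s"
          timeout                           = "5s"
          deregister_critical_service_after = "720h"   # the "extremely high value" mentioned above
        }
      }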
    Narendra Patel
    @narendrapatel
    We have a Consul deployment in k8s using external Consul servers. We get failed to switch to Consul server "xx.xx.xx.xx:8502": target sub-connection is not ready (state=TRANSIENT_FAILURE) when it tries to connect to the server during the upgrade to consul chart 1.0.2 with Consul 1.14.2. I think this issue is due to TLS encryption.
    coconut30
    @coconut30
    Hello, I use Consul service mesh on k8s, and I want to customize the Envoy TCP idle_timeout. Does anyone know if this is possible, and how I can configure it?
    Jason Sievert
    @putz612
    Hello everyone, I'm running into a weird issue with consul-connect-injector on OpenShift. I am using the Helm chart to install it and have the global setting for OpenShift enabled. The weird thing is that the consul-connect-injector pod never goes healthy. I keep getting "readiness probe failed"; however, if I go onto the pod itself and curl the health check, it comes back with ok. What am I missing?
    Lior Azroel
    @lior_azroel_gitlab
    hello, is there a way to connect Consul with an LDAP server?
    Alex Oskotsky
    @aoskotsky-amplify
    Hi, is there any way to set the idle_timeout on TCP services? I see the latest version of Consul added local_idle_timeout_ms, but the docs say it is only for HTTP. I see there is this open issue: hashicorp/consul#8521. Are there any workarounds or plans to implement it?
    Michael Aldridge
    @the-maldridge
    @lior_azroel_gitlab not directly, but if you have oidc available from your ldap server you could connect via that
    Tommy Alatalo
    @altosys

    I'm having DNS access issues on one of my consul nodes. I've set this acl policy on each node in my cluster, only changing the name accordingly:

    agent "blockypi" {
      policy = "write"
    }
    
    node "blockypi" {
      policy = "write"
    }
    
    service_prefix "" {
      policy = "read"
    }
    
    # only needed if using prepared queries
    query_prefix "" {
      policy = "read"
    }

    The above policy works on all other nodes except blockypi. I have the above policy set on a token which I set as both the default and agent tokens on blockypi, but doing a lookup like dig consul.service.consul @127.0.0.1 -p8600 fails to return any addresses. The same lookup works perfectly fine on my other nodes, with equivalent policies.

    The strangest thing about this is that if I temporarily set the default token to a management token, then the DNS lookups work. But why DNS doesn't work with the node token is beyond me, since all my nodes use the same policy rules as mentioned. I tried removing the policy and token and then recreating and re-applying them on the agent, but the problem remains.
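    One hedged guess: service lookups over DNS use the default token, and returning the addresses of the nodes behind a service also needs node read access, while the policy above only grants write on the node's own name. If that is the cause, the extra rule would look like:

      node_prefix "" {
        policy = "read"
      }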

    Tommy Alatalo
    @altosys
    Actually I now also found that another node, nas, is having trouble looking up all addresses; dig consul.service.consul @127.0.0.1 -p 8600 +short should return 3 addresses, but nas only gets one (its own). Has there been some kind of change in recent Consul versions regarding this? Because this has been working for quite some time until now.
    I create my agent tokens with Terraform like this:
    resource "consul_acl_token" "agent_token" {
      for_each    = toset(local.nodes)
      description = "Agent token '${each.value}'"
      policies    = [consul_acl_policy.node_policy[each.value].name]
      local       = true
    }
    0xalex88
    @0xalex88
    Hi everyone, I've connected two Consul clusters via peering. From A to B everything works; from B to A the mesh gateway on the A side says Cluster not found prometheus.default.default.B.external.xxxxx-redacted-xxxxxx.consul. What could be the problem?
    I'm also not sure why it's looking for "default.default.B" when the cluster with that service is A.