For complex issues please use https://discuss.hashicorp.com/c/consul/, https://github.com/hashicorp/consul/issues or https://groups.google.com/forum/#!forum/consul-tool.
Roles can only be linked to policies that are defined in the same namespace
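A hedged Terraform sketch of what that looks like in practice (the resource names and the "team-1" namespace are placeholders, not from the message above): both the policy and the role have to be created in the same namespace, otherwise the link is rejected.
# Hypothetical example; "team-1" and the resource names are placeholders.
resource "consul_acl_policy" "team_policy" {
  name      = "team-policy"
  namespace = "team-1"
  rules     = <<-RULE
    service_prefix "" {
      policy = "read"
    }
  RULE
}

resource "consul_acl_role" "team_role" {
  name      = "team-role"
  namespace = "team-1" # must match the policy's namespace
  policies  = [consul_acl_policy.team_policy.id]
}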
Hi,
I'm doing some testing with Consul and I am running a client inside a container. When the container starts up it initially isn't able to find any Consul servers (I assume networking is still coming up or something), and the watches that I've set up are throwing an error:
2022-10-21T14:58:32.569Z [ERROR] agent.watch.watch: Watch errored: type=nodes error="Unexpected response code: 500 (No known Consul servers)" retry=5s
After this the client joins the cluster successfully, but the watches never retry and stay dead.
Consul v1.13.3
Any thoughts?
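For reference, a minimal sketch of the kind of watch definition in the agent config that hits this code path (the handler script path and datacenter are placeholders, not taken from the report above):
# Agent config sketch: a "nodes" watch with a script handler.
# The script path and datacenter are placeholders.
watches = [
  {
    type         = "nodes"
    datacenter   = "dc1"
    handler_type = "script"
    args         = ["/usr/local/bin/handle-nodes-change.sh"]
  }
]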
Hi, I'm trying to troubleshoot reachability of Consul from one of the jobs I'm running in Nomad.
I'm trying to start a grafana/agent container (read: Prometheus, they both work the same way) as a Nomad job and use it to collect Consul cluster telemetry.
Our Consul cluster has an ingress gateway with a public DNS name, and if I point grafana/agent to that address, say https://consul.example.com:8500, everything works. Traffic to that public address goes via an AWS ALB and all the other AWS plumbing, so we want the Grafana agent to talk to the Consul cluster locally, via the private network they both reside in. I can't figure out how to point the Grafana agent task to the Consul HTTP API correctly. The Grafana agent has a Consul service sidecar, and I can see it registered successfully in the Consul mesh via the Nomad connect { sidecar_service ... } stanza.
What I've tried so far:
Pointing the agent to http://127.0.0.1:8500, which from my understanding corresponds to the local Consul agent we run in client mode on each node for the service mesh. I also tried to define an upstream in this sidecar pointing to the "consul" service registered in the catalog via
connect {
  sidecar_service {
    proxy {
      upstreams {
        destination_name   = "consul"
        local_bind_address = "127.0.0.1"
        local_bind_port    = 10123
      }
    }
  }
}
and pointing the Grafana agent to 127.0.0.1:10123.
I tried to use one of the env variables injected by nomad to get a specific consul service IP (it actually gives me a local node private network IP) and use it to configure consul cluster scraping at http://{IP}:8500.
Given this is our research cluster, I also tried updating the Consul cluster to allow all communication between all services and hardcoding one of the Consul server nodes' private IP addresses as the destination, e.g. the Grafana agent tries to reach http://{consul-node-ip-from-AWS-console}:8500.
All to no avail, with various errors in the Grafana agent logs.
Can anyone please advise on the correct way to configure the Grafana agent to collect Consul's own cluster telemetry via the Prometheus endpoint (https://developer.hashicorp.com/consul/docs/agent/telemetry), and on what I might be doing wrong here? I've spent almost 3 days trying to figure this out.
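One thing worth double-checking (a hedged sketch, not a confirmed fix for the setup above): the agent only keeps metrics available for scraping at /v1/agent/metrics?format=prometheus when retention is enabled in its telemetry config, for example:
# Agent config fragment: enables the Prometheus-format metrics endpoint
# on the local client agent. The retention window is illustrative.
telemetry {
  prometheus_retention_time = "60s"
  disable_hostname          = true
}
With that in place, the scrape target would be the local client agent's HTTP address rather than the public ALB address.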
@0xalex88 Maybe you are just experimenting, but a 4 node cluster is not a good idea.
Consul does leader election based on a majority quorum. In a 4-node cluster the quorum is 3, so you gain no fault tolerance over a 3-node cluster, and an even 2/2 split cannot elect a leader.
I believe they are adding a warning for when people incorrectly set an even number in bootstrap_expect.
You want bootstrap_expect set to an odd number (see the sketch after this list):
1 - no HA
3 - tolerates 1 node failing
5 - tolerates 2 nodes failing
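A minimal sketch of the relevant server config (the datacenter name and join addresses are placeholders):
# Server agent config sketch: three servers expected; addresses are placeholders.
server           = true
bootstrap_expect = 3
datacenter       = "dc1"
retry_join       = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]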
I added the consul.hashicorp.com/connect-service-upstreams: foo:1234 annotation to the client, but curl localhost:1234 fails with Connection refused, and indeed there's nothing listening on that port. curl foo is working. The Envoy sidecar is injected. What could be missing?
Hi everyone, after following https://developer.hashicorp.com/consul/tutorials/get-started-vms/virtual-machine-gs-deploy#create-server-tokens I'm still getting:
agent: Node info update blocked by ACLs: node=7f08f176-a3f3-effe-7443-bd60865e09d1 accessorID=e340e34c-4ef6-5adb-ad48-5a3d923355f9
agent: Coordinate update blocked by ACLs: accessorID=e340e34c-4ef6-5adb-ad48-5a3d923355f9
What could be the reason?
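Those two messages usually mean the token the agent is running with lacks node:write for its own node name. A hedged sketch of the policy shape the agent token needs (the node name is a placeholder):
# ACL policy sketch for the agent token; the node name is a placeholder.
node "consul-server-1" {
  policy = "write"
}
# or, to cover every node with a single policy:
# node_prefix "" {
#   policy = "write"
# }
The token carrying that policy then has to be set as the agent token (acl.tokens.agent) on the node in question.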
Dec 05 14:46:10 kubetmplp consul[3325]: 2022-12-05T14:46:10.876Z [INFO] agent: Starting server: address=127.0.0.1:8500 network=tcp protocol=http
Dec 05 14:46:10 kubetmplp consul[3325]: 2022-12-05T14:46:10.876Z [INFO] agent: started state syncer
Dec 05 14:46:10 kubetmplp consul[3325]: 2022-12-05T14:46:10.876Z [INFO] agent: Consul agent running!
Dec 05 14:46:10 kubetmplp consul[3325]: 2022-12-05T14:46:10.876Z [WARN] agent.router.manager: No servers available
Dec 05 14:46:10 kubetmplp consul[3325]: 2022-12-05T14:46:10.876Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
{
  "datacenter": "iplan",
  "data_dir": "/var/lib/consul",
  "encrypt": "3ZYt2575ONn/EYcnQTGKBg==",
  "retry_interval": "10s",
  "enable_script_checks": false,
  "disable_update_check": true,
  "dns_config": {
    "enable_truncate": true,
    "only_passing": true
  },
  "enable_syslog": true,
  "leave_on_terminate": true,
  "log_level": "trace",
  "rejoin_after_leave": true,
  "tls": {
    "defaults": {
      "verify_incoming": false,
      "verify_outgoing": false
    }
  }
}
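One observation, hedged since the full setup isn't shown: the config above sets retry_interval but has no retry_join (or server) addresses, which would explain the "No known Consul servers" errors in the log. A minimal sketch of the missing piece, shown in HCL form with placeholder addresses:
# Client agent sketch: addresses are placeholders for the server nodes.
retry_join     = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
retry_interval = "10s"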
failed to switch to Consul server "xx.xx.xx.xx:8502": target sub-connection is not ready (state=TRANSIENT_FAILURE)
when it tries to connect to the server during the upgrade to Consul chart 1.0.2 with Consul 1.14.2. I think this issue is due to TLS encryption.
local_idle_timeout_ms, but the docs say it is only for HTTP. I see there is this open issue, hashicorp/consul#8521. Are there any workarounds or plans to implement it?
I'm having DNS access issues on one of my Consul nodes. I've set this ACL policy on each node in my cluster, only changing the name accordingly:
agent "blockypi" {
policy = "write"
}
node "blockypi" {
policy = "write"
}
service_prefix "" {
policy = "read"
}
# only needed if using prepared queries
query_prefix "" {
policy = "read"
}
The above policy works on all other nodes except blockypi. I have the above policy set on a token which I set as both the default and agent token on blockypi, but doing a lookup like dig consul.service.consul @127.0.0.1 -p 8600 fails to return any addresses. The same lookup works perfectly fine on my other nodes, with equivalent policies.
The strangest thing is that if I temporarily set the default token to a management token, the DNS lookups work. But why DNS doesn't work with the node token is what breaks my head, since all my nodes use the same policy rules as mentioned. I tried removing the policy and token, then recreating them and resetting them on the agent, but the problem remains.
nas is having trouble looking up all addresses; dig consul.service.consul @127.0.0.1 -p 8600 +short should return 3 addresses, but nas only gets one (its own). Has there been some kind of change in recent Consul versions regarding this? Because this has been working for quite some time until now.
resource "consul_acl_token" "agent_token" {
for_each = toset(local.nodes)
description = "Agent token '${each.value}'"
policies = [consul_acl_policy.node_policy[each.value].name]
local = true
}
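For context, a hedged sketch of the consul_acl_policy.node_policy resource this token references (the rules content is illustrative, not from the original snippet):
# Hypothetical companion resource: one node policy per node name.
resource "consul_acl_policy" "node_policy" {
  for_each = toset(local.nodes)
  name     = "node-${each.value}"
  rules    = <<-RULE
    node "${each.value}" {
      policy = "write"
    }
    service_prefix "" {
      policy = "read"
    }
  RULE
}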