For complex issues please use https://discuss.hashicorp.com/c/consul/, https://github.com/hashicorp/consul/issues or https://groups.google.com/forum/#!forum/consul-tool.
In the meantime, I will put my question in here anyway. Maybe someone who can help will read it ...
As I already described on Discuss (which seems to be about as active as Gitter), the official Deployment Guide is inconsistent when it comes to TLS configuration.
In the “Create the certificates” section, it says: “First, for your Consul servers, use the following command to create a certificate for each server.” So: not for the clients, since “servers” is written explicitly.
Next it says: “The Consul client agents will only need the CA certificate, consul-agent-ca.pem, to enable mTLS.” So again, it confirms that the clients only need the CA certificate, not the DC certificates.
But then, in the very next section, “Distribute the certificates to agents”, it says: “You must distribute the CA certificate, consul-agent-ca.pem, to each of the Consul agents as well as the agent specific certificate and private key.” So here it says that one must copy all node-specific certs in addition to the CA certificate, which is the opposite of what was explained before.
This is confirmed once more in the “TLS configuration” section. Even though the “Auto encryption” guide is selected, the consul.hcl snippet lists not only ca_file, but also the cert_file and key_file parameters “for Consul clients”. The only difference between “Auto” and “Manual” seems to be the auto_encrypt nested section, which again seems to contradict the “CA cert only” statement and the entire auto-encryption idea.
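To make the expectation concrete, here is a rough sketch of how I would expect the two sides to be split if the “CA cert only” statement holds. It uses the older top-level TLS keys, as the guide's snippet does. The paths are hypothetical, and the server certificate file names assume the defaults that `consul tls cert create -server` produces for datacenter dc1. As far as I can tell from the agent configuration reference, allow_tls is the server-side setting and tls is the client-side one, but treat that as my reading, not as something the guide states.

```hcl
# Server agent (sketch): holds its own certificate and key and lets
# clients request their certificates through auto-encrypt.
verify_incoming        = true
verify_outgoing        = true
verify_server_hostname = true
ca_file                = "/etc/consul.d/consul-agent-ca.pem"         # hypothetical path
cert_file              = "/etc/consul.d/dc1-server-consul-0.pem"     # assumed default file name
key_file               = "/etc/consul.d/dc1-server-consul-0-key.pem" # assumed default file name

auto_encrypt {
  allow_tls = true
}
```

```hcl
# Client agent (sketch): only the CA certificate is distributed by hand;
# the client certificate is requested from the servers at startup.
verify_incoming        = false
verify_outgoing        = true
verify_server_hostname = true
ca_file                = "/etc/consul.d/consul-agent-ca.pem" # hypothetical path

auto_encrypt {
  tls = true
}
```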
Speaking of the auto_encrypt nested section, the Consul Security guide brings another unclear element to the table: in its “Configure the clients” section, it says to configure the clients by indeed setting the ca_file option only, but to set auto_encrypt { tls = true } instead of auto_encrypt { allow_tls = true }.
These docs demonstrate how to register a service proxy: https://www.consul.io/docs/connect/registration/service-registration
They give plenty of sample configurations, but I can't figure out where to use those sample configurations!
consul services register proxy.hcl
Error: failed to parse proxy.hcl: 4 errors occurred:
* invalid config key kind
* invalid config key name
* invalid config key port
* invalid config key proxy
consul config write proxy.hcl
Failed to decode config entry input: invalid config entry kind: connect-proxy
What am I missing?
service { # <-- must be in a service block, examples don't show this
  name = <name of the service>
  kind = "connect-proxy"
  proxy = {
    destination_service_name = "<name of the service that the proxy represents>"
    <additional proxy parameters> = "<additional parameter values>"
  }
  port = <port where services can discover and connect to proxied services>
}
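With the placeholders filled in, a file that `consul services register` should accept could look roughly like this. It is only a sketch: the service names, addresses, and ports are made-up example values, not taken from the docs page.

```hcl
# proxy.hcl: hypothetical sidecar proxy registration for a service named "web".
service {
  name = "web-sidecar-proxy" # example name for the proxy service itself
  kind = "connect-proxy"
  port = 20000               # example port the proxy listens on

  proxy = {
    destination_service_name = "web"       # the service this proxy represents
    local_service_address    = "127.0.0.1" # where the real service listens
    local_service_port       = 8080        # example port of the real service
  }
}
```

It is then registered with the same command as before: consul services register proxy.hcl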
Aug 20 11:32:28 ip-172-31-33-223 consul[569145]: agent.rpcclient.health: subscribe call failed: err="rpc error: code = InvalidArgument desc = Key is required" failure_count=14 key=<service_name> topic=ServiceHealth
@bsharma-tavisca
Hello everyone
I am occasionally getting this error
"Raft leader not found in server lookup mapping"
"bootstrap_expect": 3,
"retry_join": ["provider=aws tag_key=DataCenterName tag_value=ek-consul-nv-aws region=us-east-1 addr_type=private_v4"],
"performance": {
  "raft_multiplier": 1
}
Total Consul servers running: 5
All Consul servers are running on m5.4xlarge
Two similar clusters, two similar Consul configs. I'm trying to spawn a second Consul cluster; one joins, the other does not:
[WARN] agent.server: Raft has a leader but other tracking of the node would indicate that the node is unhealthy or does not exist. The network may be misconfigured.: leader=172.98.120.15:8300
[WARN] agent: Syncing node info failed.: error="Raft leader not found in server lookup mapping"
[ERROR] agent.anti_entropy: failed to sync remote state: error="Raft leader not found in server lookup mapping"
[ERROR] agent.server.memberlist.lan: memberlist: Conflicting address for pxe-boot. Mine: 172.98.120.101:8301 Theirs: 172.98.120.15:8301 Old state: 0
[ERROR] agent.server.serf.lan: serf: Node name conflicts with another node at 172.98.120.15:8301. Names must be unique! (Resolution enabled: false)
Any idea what might cause the error?
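The last two log lines report a node name conflict: both 172.98.120.101 and 172.98.120.15 announce themselves as pxe-boot. If that is the cause, giving each agent its own name might resolve it. A minimal sketch, assuming the name is currently inherited from the hostname; the value below is hypothetical:

```hcl
# Agent configuration on one of the two conflicting machines (sketch).
# node_name defaults to the machine's hostname, so two hosts sharing a
# hostname collide unless one of them overrides it.
node_name = "pxe-boot-2" # hypothetical unique name
```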
Hey @aiqency,
did you have any luck getting answers to the query you posted?
Hi, I've got a problem when trying to run a Nomad job with Consul Connect enabled, as in https://developer.hashicorp.com/nomad/docs/integrations/consul-connect
This is from connect-proxy-count-dashboard:
[2022-09-28 07:49:45.908][1][warning][config] [./source/common/config/grpc_stream.h:196] DeltaAggregatedResources gRPC config stream closed since 312s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
I already tried to follow https://developer.hashicorp.com/nomad/tutorials/integrate-consul/consul-service-mesh?in=nomad%2Fintegrate-consul#tls-enabled-consul-environment but still no luck.
Can anyone help me fix this?
Roles can only be linked to policies that are defined in the same namespace
Hi,
I'm doing some testing with Consul and I am running a client inside a container. When the container starts up, it is initially not able to find any Consul servers (I assume the network is still starting or something), and the watches that I've set up throw an error:
2022-10-21T14:58:32.569Z [ERROR] agent.watch.watch: Watch errored: type=nodes error="Unexpected response code: 500 (No known Consul servers)" retry=5s
After this, the client joins the cluster successfully, but the watches never retry and stay dead.
Consul v1.13.3
Any thoughts?
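For anyone trying to reproduce this, the setup amounts to a nodes watch in the client agent's configuration, roughly like the sketch below. The join address and the handler path are hypothetical; only the watch type comes from the log line above.

```hcl
# Client agent configuration (sketch) with a single nodes watch.
retry_join = ["consul-server.example.internal"] # hypothetical server address

watches = [
  {
    type         = "nodes"
    handler_type = "script"
    args         = ["/usr/local/bin/on-nodes-change.sh"] # hypothetical handler script
  }
]
```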
Hi, I'm trying to troubleshoot the reachability of Consul from one of the jobs I'm running in Nomad.
So I'm trying to start a grafana/agent (read: Prometheus, they both work the same way) container as a Nomad job and use it to collect Consul cluster telemetry.
Our Consul cluster has an ingress gateway with public DNS, and if I point grafana/agent to that address, say https://consul.example.com:8500, everything works. Traffic to this public address goes via AWS ALB and all the other AWS plumbing, so we want Grafana agent to talk to the Consul cluster locally, via the private network they both reside in. And I can't figure out how to point the Grafana agent task to the Consul HTTP API correctly. Grafana agent has a Consul service sidecar, and I can see it successfully registered in the Consul mesh via the Nomad connect {sidecar_service...} stanza.
What I've tried so far:
Point agent to http://127.0.0.1:8500, which from my understanding corresponds to the local Consul agent that we are running in client mode on each node for service mesh. I also tried to define an upstream in this sidecar to point to the service "consul" registered in the catalog via
connect {
  sidecar_service {
    proxy {
      upstreams {
        destination_name   = "consul"
        local_bind_address = "127.0.0.1"
        local_bind_port    = 10123
      }
    }
  }
}
and point grafana agent to 127.0.0.1:10123
I tried to use one of the env variables injected by nomad to get a specific consul service IP (it actually gives me a local node private network IP) and use it to configure consul cluster scraping at http://{IP}:8500.
Given this is our research cluster - I also tried to update consul cluster to allow all comms between all services and hardcode one of the consul server nodes' private IP address as a destination e.g. grafana agent tries to reach http://{consul-node-ip-from-AWS-console}:8500
Everything to no avail with various errors in grafana agent logs.
Can anyone please advise on the correct way to configure Grafana agent to collect the Consul cluster's own telemetry via the Prometheus endpoint (https://developer.hashicorp.com/consul/docs/agent/telemetry), and on what I might be doing wrong here? I have spent almost 3 days trying to figure this out.
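One thing worth checking on the Consul side, independent of the networking question: as far as I know, the Prometheus-format output is only served when a retention time is set in the agent's telemetry block. A minimal sketch, with the 60s value chosen purely as an example:

```hcl
# Consul agent configuration (sketch): enable Prometheus-format metrics.
telemetry {
  prometheus_retention_time = "60s" # any value greater than 0 enables the Prometheus format; 60s is an example
  disable_hostname          = true  # keeps the hostname out of metric names
}
```

With that in place, the metrics should be available at /v1/agent/metrics?format=prometheus on the agent's HTTP API. The endpoint reports the metrics of the agent being queried, so server-level metrics have to be scraped from the server agents themselves, and with ACLs enabled the request needs a token with agent read permission.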