These are chat archives for spring-cloud/spring-cloud

2nd
Nov 2018
Nicolas Schwarzentrub
@niggusch
Nov 02 2018 11:32

Hi all.
We currently have some strange issues that we do not understand when using spring-cloud-bus in our spring-boot apps connected to a RabbitMQ cluster with 3 nodes in OpenShift.
The problem occurs when the rabbitmq node on which the spring-cloud-bus queue was created leaves the cluster. The spring-cloud-bus queue is not recreated, for an unknown reason, and we see errors in the spring-boot application as well as on one other rabbitmq node (one that is still alive when the other dies). But we don't understand them:
It seems that the spring boot app switches from node-0 to node-1 (when node-0 dies) and tries to declare the queue with the same name (springCloudBus.anonymous.pbE8wrx5RFSVGbj4m4YoFw-2) but, for an unknown reason, fails forever.
Does anyone have an explanation for this behaviour? Do we have some misconfiguration?

The strange thing is that we have a local setup using docker where the same code runs fine: the queues are recreated on a reachable node. There we see completely different logs, with a fast switch and recreation of the queue on the next free node. The config on the spring boot side is the same, only the hostnames are different. On the rabbitmq side the only big difference is the cluster_formation part of the configuration. Locally we use rabbit_peer_discovery_classic_config, whereas in OpenShift we use rabbit_peer_discovery_k8s. The nodes themselves rejoin the cluster fine in both environments.

spring log

[2018-11-02 09:06:36,049] [springCloudBus.anonymous.pbE8wrx5RFSVGbj4m4YoFw-2] WARN  o.s.a.r.l.BlockingQueueConsumer - corid= Failed to declare queue: springCloudBus.anonymous.pbE8wrx5RFSVGbj4m4YoFw
...
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no queue 'springCloudBus.anonymous.pbE8wrx5RFSVGbj4m4YoFw' in vhost '/', class-id=50, method-id=10)

Log excerpt from rabbitmq node1

...
2018-11-02 09:06:26.039 [error] <0.31645.5> Channel error on connection <0.30102.5> (10.1.11.68:50832 -> 10.1.23.120:5672, vhost: '/', user: 'guest'), channel 2:  
operation queue.declare caused a channel exception not_found: no queue 'springCloudBus.anonymous.pbE8wrx5RFSVGbj4m4YoFw' in vhost '/'
...
Nicolas Schwarzentrub
@niggusch
Nov 02 2018 13:46

It seems that rabbitmq had a wrong configuration.
rabbitmqctl list_queues name owner_pid pid showed that for some of our spring cloud queues the owner_pid and the pid were on different nodes.
In every case where owner_pid and pid were on the same node, there was no problem. When owner_pid and pid were on different nodes and the node of the pid was shut down, we had the problem.
But NOT when they were on the same node, or when they were on different nodes but the owner_pid node was shut down.

We changed in our rabbitmq.config from

{ queue_master_locator, "min-masters" }

to

{ queue_master_locator, "client-local" }

With that, the owner_pid and pid of the queue are always on the same node, which solves the problem.
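
For anyone who wants to check their own cluster for the same situation, the comparison of owner_pid and pid described above can be automated. The following is only a minimal sketch, not something taken from the setup discussed here: it assumes the output of `rabbitmqctl list_queues name owner_pid pid` is tab-separated and that pids are printed in the form `<rabbit@node-0.3.1234.0>`. The helper names and the sample queue names are hypothetical.

```python
import re

# Assumption: rabbitmqctl prints pids as <node.number.number.number>,
# e.g. <rabbit@node-0.3.1234.0>. The node name itself may contain dots.
PID_RE = re.compile(r"^<(?P<node>.+)\.\d+\.\d+\.\d+>$")


def node_of(pid):
    """Return the node name embedded in a printed pid, or None if it does not parse."""
    match = PID_RE.match(pid.strip())
    return match.group("node") if match else None


def find_split_queues(output):
    """Return (queue, owner_node, queue_node) for queues whose owner and queue process live on different nodes."""
    split = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # banner lines such as "Listing queues ..."
        name, owner_pid, pid = fields
        owner_node, queue_node = node_of(owner_pid), node_of(pid)
        if owner_node is None or queue_node is None:
            continue  # header row, or non-exclusive queue with empty owner_pid
        if owner_node != queue_node:
            split.append((name, owner_node, queue_node))
    return split


# Hypothetical sample output; queue names and pids are made up for illustration.
SAMPLE = "\n".join([
    "Listing queues for vhost / ...",
    "springCloudBus.anonymous.aaa\t<rabbit@node-0.3.100.0>\t<rabbit@node-0.3.200.0>",
    "springCloudBus.anonymous.bbb\t<rabbit@node-1.3.101.0>\t<rabbit@node-0.3.201.0>",
])

if __name__ == "__main__":
    for name, owner_node, queue_node in find_split_queues(SAMPLE):
        print(f"{name}: owner on {owner_node}, queue on {queue_node}")
```

In a real check, SAMPLE would be replaced by the captured output of the rabbitmqctl command. Any queue that is reported is one where shutting down the queue's node could trigger the NOT_FOUND error shown earlier.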