These are chat archives for spring-cloud/spring-cloud

2nd
Jun 2016
Niklas Herder
@herder
Jun 02 2016 07:39
Hi, I'm having great problems getting Ribbon to handle a redeploy of a dependency service without freaking out and slowing down to a crawl. I'm running an Apache bench against a sidecar app, like this: ab -n 500000 -c4 localhost:5678/entity-api/12455, which works very nicely when the entity-api service is up and running. But when I redeploy the service (the new instances are up and healthy, and then we take down the old instances), I get timeouts in the logs and the throughput goes down from about 1000/s to 5-10/s.
I'm using a PingUrl and have tried overriding the default IRule, but it doesn't seem to make any difference whichever IRule implementation I choose.
Does anyone have any ideas on what I could be doing wrong? @spencergibb, maybe?
These are my ribbon and zuul properties (current - I have tweaked these values a lot):
ribbon:
  ReadTimeout: 1000
  ConnectTimeout: 200
  MaxAutoRetries: 2
  OkToRetryOnAllOperations: true
  MaxAutoRetriesNextServer: 2
  ServerListRefreshInterval: 1000

zuul:
  host:
    connect-timeout-millis: 200
    socket-timeout-millis: 200
...and the entity-api service responds to the PingUrl call ok.
Dave Syer
@dsyer
Jun 02 2016 08:26
That's a really useful test.
Soaking the server with requests to a single endpoint might not be totally realistic. But it should tell us something, I guess.
Niklas Herder
@herder
Jun 02 2016 08:55
Yes, I think it's a good test just for these things, to see if it is resilient to these kinds of things (which is kind of the point of this whole Netflix thingie :) ). As far as I can see, it's really hard to actually make it behave as it should under load, which I find weird? Or maybe Netflix have such a large number of instances that they don't notice this?
Dave Syer
@dsyer
Jun 02 2016 10:03
I'm going to look into it.
One problem will definitely be that hitting a single endpoint really hard is not a very good simulation of real load
Ribbon uses thread pools, for instance, and the size of the pool is configurable, but it's probably per client, not global.
Dave Syer
@dsyer
Jun 02 2016 10:11
The behaviour in these kinds of situations is probably highly dependent on the precise nature of the outage as well. But a smooth transition to a new version of a backend service should be easy to get right.
szisti
@szisti
Jun 02 2016 10:35
the one thing i've seen is that just because the new instance has started, it's not added to the pool
it can take 20-30 seconds for that to happen
Dave Syer
@dsyer
Jun 02 2016 10:35
It can take longer than that
szisti
@szisti
Jun 02 2016 10:35
very nicely visible on the spring-boot-admin, when the new service pops up and becomes "UP"
Dave Syer
@dsyer
Jun 02 2016 10:36
But @herder was saying he had both instances and it only broke when he closed one down
szisti
@szisti
Jun 02 2016 10:36
so shutting down the "old" when the "new" is up, might be the problem
szisti
@szisti
Jun 02 2016 10:44
is there an extension, or planned extension to the config server client library, which would allow me to retrieve "plain text" files from the config server using the same rest template and authentication details ?
Dave Syer
@dsyer
Jun 02 2016 10:46
Same as what?
There's nothing preventing you from accessing the endpoints any way you want
szisti
@szisti
Jun 02 2016 11:01
yes, but the config client already has all the authentication settings and everything available and setup
in ConfigServicePropertySourceLocator, only locate and setRestTemplate are public at the moment
locate uses the secure rest template, which i cannot use from outside to retrieve a different file
like /name/profile/"master"/"filename"
Dave Syer
@dsyer
Jun 02 2016 11:05
You'll have to configure your own client then
Niklas Herder
@herder
Jun 02 2016 11:34
re my question above, I seem to have a working setup now. I got 0 to 1 errors out of 500000 on each run, and traffic didn't drop catastrophically.
What I did was ditch the PingUrl and use the default NIWSDiscoveryPing, which is set up by Spring in a Ribbon/Eureka env (I missed that one), and more or less use the default setup.
What seemed to do the trick was setting short timeouts and setting a high ribbon.MaxAutoRetriesNextServer so that it doesn't crash before finding a live one.
ribbon:
  ReadTimeout: 1000
  ConnectTimeout: 200
  MaxAutoRetries: 2
  OkToRetryOnAllOperations: true
  MaxAutoRetriesNextServer: 8
  ServerListRefreshInterval: 1000

zuul:
  host:
    connect-timeout-millis: 200
    socket-timeout-millis: 200
Maybe the Ribbon tutorial on Spring Cloud should mention that you don't need to define a PingUrl when using Eureka for host discovery?
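As a rough check on why the higher MaxAutoRetriesNextServer helps, the retry budget of the working settings can be estimated. This is only a sketch: it assumes Ribbon makes up to MaxAutoRetries + 1 attempts per server and moves on to at most MaxAutoRetriesNextServer further servers, and that every attempt runs into its full timeout. The figures in the comments are an upper-bound estimate under those assumptions, not a measurement.

```yaml
ribbon:
  ReadTimeout: 1000               # ms allowed per attempt to read the response
  ConnectTimeout: 200             # ms allowed per attempt to open the connection
  MaxAutoRetries: 2               # retries on the same server -> up to 3 attempts per server
  MaxAutoRetriesNextServer: 8     # further servers to try -> up to 9 servers in total
  OkToRetryOnAllOperations: true  # retry all operations, not only GET
  ServerListRefreshInterval: 1000 # ms between server list refreshes
# Estimated worst case for one request (assumption: every attempt times out):
#   attempts = (2 + 1) * (8 + 1)    = 27
#   time     = 27 * (200 + 1000) ms = 32400 ms
```

If an outer timeout wraps the call (for example a Hystrix command timeout), it would presumably cut the retries short, so it is worth comparing that timeout against this estimate.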
Niklas Herder
@herder
Jun 02 2016 12:04
Also, the few errors I get now are most likely caused by the new instances reporting UP instead of STARTING, since EurekaInstanceConfigBean#initialStatus is UP.
Is that by design? Shouldn't it wait to report UP until it can respond to traffic?
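One possible way to experiment with this is to override the initial status and let the application's health drive the status reported to Eureka. The fragment below is an untested sketch, not a confirmed fix: it assumes that the initialStatus field can be set through eureka.instance.initialStatus, and that enabling the Eureka client health check moves the instance from STARTING to UP once the app is healthy. Without some mechanism to update the status, the instance might simply stay in STARTING.

```yaml
eureka:
  instance:
    initialStatus: STARTING   # assumption: register as STARTING instead of the default UP
  client:
    healthcheck:
      enabled: true           # assumption: application health then determines the reported status
```

How quickly the status would flip to UP depends on how often the client replicates its instance info to Eureka, so a short window of errors during a redeploy may remain either way.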
Dave Syer
@dsyer
Jun 02 2016 12:06
Possibly
A PR for the tutorial/docs would be awesome
Niklas Herder
@herder
Jun 02 2016 12:39
Yes, I should do that... I'll set a reminder for when I'm done with my project :)
Rob Harrop
@robharrop
Jun 02 2016 13:06
Is there a way to only load cloud config for certain profiles in my app?
And also the same question for cloud netflix stuff?
Dave Syer
@dsyer
Jun 02 2016 13:07
---
spring:
  profiles: local
  cloud:
    config:
      enabled: false
Should do it
"netflix" is a big surface area
individual features normally have enabled flags
Rob Harrop
@robharrop
Jun 02 2016 13:08
I want to disable the eureka stuff
But I get the general idea, I'll work on that
Thanks!
Dave Syer
@dsyer
Jun 02 2016 13:09
Probably eureka.client.enabled=false
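Combined with the profile-specific YAML document style used for the config client, the Eureka flag might look like the sketch below. It assumes the same spring.profiles key and a profile named local, as in the earlier snippet; the property itself is the one suggested above (with a "probably"), so treat this as something to verify rather than a confirmed recipe.

```yaml
---
spring:
  profiles: local
eureka:
  client:
    enabled: false   # switch off the Eureka client only when the "local" profile is active
```

Other Netflix features would need their own enabled flags in the same document, since there is no single switch for all of them.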
Rob Harrop
@robharrop
Jun 02 2016 13:10
Thank you sir
Matt Benson
@mbenson
Jun 02 2016 21:29
I've got some demo code that was working until recently. In the interim I've upgraded to Brixton.RELEASE... I've got a Hystrix/Ribbon/Eureka Feign client but the server list always ends up with an empty list due to DiscoveryManager.getInstance().getDiscoveryClient() returning null. Any ideas?
Dave Syer
@dsyer
Jun 02 2016 21:32
If that's the Netflix code it's a deprecated API I think
You're supposed to use dependency injection not singletons
Matt Benson
@mbenson
Jun 02 2016 21:33
it is indeed; I still wonder why it would just stop working. I have an idea of something to look at, though
Matt Benson
@mbenson
Jun 02 2016 21:39
nope, still don't get it. DomainExtractingServerList wraps an instance of DiscoveryEnabledNIWSServerList and calls its #getUpdatedListOfServers() which ultimately calls the deprecated API
Matt Benson
@mbenson
Jun 02 2016 21:57
looks like the ribbon-eureka library could stand updating