Jonathan Halterman
@jhalterman
:thumbsup:
Manish Jain
@sahihaimanish_twitter

Hi, I was testing this library through a JUnit test case where there were two tasks in CompletableFuture.runAsync() and both fail after 2 retries with Failsafe. I noticed that when the task starts it uses two workers 'ForkJoinPool.commonPool-worker-1' and 'ForkJoinPool.commonPool-worker-2' (since we are using the default ThreadPool) but when the retries happen, it is only carried out by one of the workers. Any idea why this is happening? Is it even async if only one worker is switching and doing tasks?

I tried with 4 tasks too, same issue.

Jonathan Halterman
@jhalterman
@sahihaimanish_twitter Failsafe will only perform one retry at a time for some execution. I'm not sure exactly how the two tasks you mention relate to failsafe. Can you post a bit of example code?
Manish Jain
@sahihaimanish_twitter
@jhalterman Here, eventObject consists of multiple items.
List<CompletableFuture<Void>> completableFutures = eventObject.batch.stream().map(metric -> {
    return (CompletableFuture<Void>) Failsafe.with(retryPolicy).runAsync(() -> {
        String metricString = metric.getMetric();
        try {
            sendDataToMetricStore();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    });
}).collect(Collectors.toList());
Manish Jain
@sahihaimanish_twitter
I get that Failsafe is applied per item in this case, but why is it picking the same worker thread?
Like, runAsync() will pick worker-1, worker-2, and worker-3 (say, for 3 tasks) but Failsafe retry will always pick worker-2 for all its retries. Doesn't that make the retry slower in this case?
Nguyen Anh Tuan
@tuannguyen7
Hi, do I need to create a new RetryPolicy every time, or can I create a single instance and safely reuse it?
Jonathan Halterman
@jhalterman
@tuannguyen7 yep, you can reuse policies and it's ideal to do so. BUT, worth noting that RetryPolicy is not threadsafe. So you wouldn't want to modify it from different threads.
Nguyen Anh Tuan
@tuannguyen7
Thank you @jhalterman
Nguyen Anh Tuan
@tuannguyen7
Is there any clever way to throw a checked exception if max retries are exceeded? I know that Fallback can be used, but I want to declare the exception in the method signature. My code looks like this, but it seems a bit ugly.
public Product getProduct(int id) throws ClientException {
    try {
        return Failsafe.with(retryPolicy).get(() -> tryGettingProduct(id));
    } catch (FailsafeException e) {
        // Rethrow the original checked exception if that is what was wrapped
        if (e.getCause() instanceof ClientException)
            throw (ClientException) e.getCause();
        throw e;
    }
}

private Product tryGettingProduct(int id) throws ClientException {
    // code ...
}
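One way the unwrapping above could be factored out is a small generic helper. The following is only a sketch under stated assumptions: it uses the JDK alone, `Unwrap.call` and `ClientException` are hypothetical names, and a plain `RuntimeException` stands in for `FailsafeException` so that the example compiles without the library. It is not part of the Failsafe API.

```java
import java.util.function.Supplier;

public class Unwrap {
    /** Hypothetical checked exception, standing in for the one in the question. */
    static class ClientException extends Exception {
        ClientException(String message) { super(message); }
    }

    /**
     * Runs the call and, if it fails with a wrapper whose cause is of the
     * given checked type, rethrows that cause instead of the wrapper.
     */
    static <T, E extends Exception> T call(Supplier<T> call, Class<E> checkedType) throws E {
        try {
            return call.get();
        } catch (RuntimeException wrapper) {
            if (checkedType.isInstance(wrapper.getCause())) {
                throw checkedType.cast(wrapper.getCause());
            }
            throw wrapper; // not the declared type: keep the original failure
        }
    }

    public static void main(String[] args) {
        try {
            // The supplier plays the role of Failsafe.with(retryPolicy).get(...)
            Unwrap.<String, ClientException>call(() -> {
                throw new RuntimeException(new ClientException("product not found"));
            }, ClientException.class);
        } catch (ClientException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

With a helper like this, the calling method only declares `throws ClientException` and delegates, so the `instanceof` check and cast are written once.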
Sarah
@BaiyuY

Hi all, I am using an old version of Failsafe (1.0.0) and configured a circuit breaker with no delay. The code inside the circuit breaker hasn't been enabled yet, so it basically looks like this:

failsafe.run(() -> {
  if (!Strings.isNullOrEmpty("someString")) {
      // do nothing
  }
});

CircuitBreaker circuitBreaker = new CircuitBreaker()
        .failOn(TimeoutException.class)
        .withFailureThreshold(1, 10)
        .withSuccessThreshold(5)
        .withTimeout(10, TimeUnit.MILLISECONDS);

circuitBreaker.onOpen(() -> System.out.println("Circuit Breaker opened"));
circuitBreaker.onHalfOpen(() -> System.out.println("Circuit Breaker half-opened"));
circuitBreaker.onClose(() -> System.out.println("Circuit Breaker closed"));

I configured the circuit breaker with a 10 ms timeout, and a simple string null check shouldn't take that long. But I am seeing logs that constantly rotate through "Circuit Breaker opened", then half-opened, then closed, and repeat.

This is in a multi-threading env, I am wondering if anyone can explain this behaviour? Thanks!

Also, I am curious: with 2.4.0, is the counting of failures thread-safe? (I don't plan to modify the circuit breaker policy from different threads.)
Sarah
@BaiyuY
@jhalterman much appreciated if you could help :)
swordsp
@swordsp
Hi everyone, I have a question about the FailsafeExecutor. From its comment it looks like the command wrapped by the executor would be executed in an extra background thread, either from commonPool or a given ExecutorService. Is it possible that we still execute it in current thread to reduce the overhead?
Joy Hou
@strawbee
Hi all. Is there a way to test the circuit breaker in "shadow mode" (just emit metrics that the circuit breaker is open, but not actually prevent calls to the dependency) so that I can test the thresholds configuration for a while, without actually affecting my service?
Jonathan Halterman
@jhalterman
@swordsp Executions are sync (in the current thread) or async (on the common pool or some other pool you configure), depending on which terminal method you use. Ex: get vs getAsync. So for synchronous, just use the run or get methods.
Jonathan Halterman
@jhalterman

@strawbee Nothing like that exists atm. You could pretty easily override CircuitBreaker and CircuitBreakerExecutor with your own impls to accomplish this though. First, extend CircuitBreakerExecutor and return null during preExecute:

https://github.com/jhalterman/failsafe/blob/master/src/main/java/net/jodah/failsafe/CircuitBreakerExecutor.java#L34

Then extend CircuitBreaker to return your custom CircuitBreakerExecutor:

https://github.com/jhalterman/failsafe/blob/master/src/main/java/net/jodah/failsafe/CircuitBreaker.java#L666

4 replies
Joy Hou
@strawbee
Hi @jhalterman, I had another question if you could help :) What order should the policies be composed in if I want to specify a slow call timeout that counts as a failure in my circuit breaker, but I also want it to retry with the same timeout? If I do:
Failsafe.with(fallback, retryPolicy, circuitBreaker, timeout) like in the docs, will the retry time out as well before it returns the fallback? Thanks a ton.
9 replies
szlisiecki
@szlisiecki

Hi @jhalterman, I would like to use your library in a multithreaded environment and I'm wondering how it should be done properly. I also ran into a problem with passing parameters. Here is a bit of code:

Fallback<Void> fallback = Fallback.of(() -> {
    // I need myCheckedRunnable here
    logger("error with param {}", myCheckedRunnable.getIndex());
    // try to save task in db
});
FailsafeExecutor<Void> failsafeExecutor = Failsafe.with(fallback, retryPolicy);

//invoke many times with different parameters
IntStream.range(0, 10).forEach(
    index -> {
        MyCheckedRunnable myCheckedRunnable = new MyCheckedRunnable(index);
        failsafeExecutor.runAsync(myCheckedRunnable);
    }
);

So the first question is: is there a way to access my task (myCheckedRunnable) in the fallback?
And the second question: is this approach appropriate for such things?

Best regards

Willi Schönborn
@whiskeysierra
Move your retryPolicy and fallback into your loop. Then you have a local reference and therefore access to your task while you construct your fallback.
szlisiecki
@szlisiecki
Hello Willi, thank you for your response. I was thinking about that, but if I move my retryPolicy and fallback into my loop, I also need to move the failsafeExecutor creation into the loop. So I will have n policies, n fallbacks and n failsafeExecutor instances. From my perspective performance is crucial, and I'm not sure if this approach is good enough. My understanding is that a failsafeExecutor is able to run many tasks using runAsync, so is it good practice to create n executors for n tasks rather than just one?
Willi Schönborn
@whiskeysierra
I'd be really surprised if that had any noticeable effect on your performance.
szlisiecki
@szlisiecki

I compared the memory consumption of these two approaches:
First (create instances in the loop)

ExecutorService executor = Executors.newFixedThreadPool(40);
IntStream.range(0, 100000).forEach(
    index -> {
        RetryPolicy<Void> retryPolicy = new RetryPolicy<Void>();
        Fallback<Void> fallback = Fallback.of(e -> logger.error("error"));
        FailsafeExecutor<Void> failsafeExecutor = Failsafe.with(fallback, retryPolicy).with(executor);

        MyCheckedRunnable myCheckedRunnable = new MyCheckedRunnable(index);
        failsafeExecutor.runAsync(myCheckedRunnable);
    }
);

Second (create instances outside the loop)

ExecutorService executor = Executors.newFixedThreadPool(40);
RetryPolicy<Void> retryPolicy = new RetryPolicy<Void>();
Fallback<Void> fallback = Fallback.of(e -> logger.error("error"));
FailsafeExecutor<Void> failsafeExecutor = Failsafe.with(fallback, retryPolicy).with(executor);
IntStream.range(0, 100000).forEach(
    index -> {
        MyCheckedRunnable myCheckedRunnable = new MyCheckedRunnable(index);
        failsafeExecutor.runAsync(myCheckedRunnable);
    }
);

The conclusions are as follows:

  1. The first approach needs more memory (12%-40% more)
  2. The more loop iterations, the smaller the difference
  3. The more retries, the smaller the difference

So, as Willi mentioned, in my case there is only a very small effect on performance.

Joy Hou
@strawbee
Hi @jhalterman or @whiskeysierra or anyone else that might know - the docs were a little unclear to me, and I wanted to verify how Failsafe handles timeouts. If I get a TimeoutExceededException, and I didn't initialize my timeout with withCancel, then the request will still complete, right? I'm asking because my service caches responses. Even if I get a TimeoutExceededException and fail the current request, I still want it to get the response and cache it for future requests.
Thank you! :)
Thomas Santana
@nebulorum
Hi Failsafe. I have been trying to figure out why my tests fail after migrating from 2.0.0 to 2.{1,4}.0. This is a legacy codebase that I just wanted to keep up to date.
After a lot of attempts we seem to have narrowed it down to an interaction between RetryPolicy, CircuitBreaker and getAsyncExecution.
With no circuit breaker everything is fine. With Retry(CB(f)) we get an NPE from inside the PolicyExecutor.
Only very few executions happen.
Thomas Santana
@nebulorum
I have an example of this not working. Should I create an issue on the project issue tracker?
Jonathan Halterman
@jhalterman
Yes please, if you have a small reproducer you can include that would be great
Jens Bannmann
@bannmann

I use Failsafe to add timeout (2 seconds) and retry (2 retries) behavior for requests made with the Java 11 HttpClient. The application makes several thousand requests (mostly sequentially) to several servers. Mostly, things work fine, but sometimes after a few minutes, the future returned by Failsafe exceeds the additional timeout of 20 seconds I passed to Future.get(long, TimeUnit). I tried changing the order of the retry/timeout policies and even ran with only the timeout policy, but the problem still appears.

I suspected an interaction of HTTPClient and Failsafe as both default to using the common fork/join pools, but giving both their own executors did not help.

I'm not sure whether I can create a minimal reproducer.

@jhalterman, do you have any other suggestions how to locate the cause?

Jonathan Halterman
@jhalterman
@bannmann do you have reason to believe the execution is actually being completed but Failsafe is not completing the future for some reason? Or do you think the execution is actually timing out or stalling?
Thomas Santana
@nebulorum
@jhalterman I created the following issue for my case: jhalterman/failsafe#268
We can discuss it there if you think that's better. I added several variations of the failures.
William Johnson S. Okano
@williamokano

Hey team, this might not be the right place to talk about it, but I'm trying to upgrade from failsafe 1.1.0 to 2.4.0 and I noticed that some API slightly changed. On 1.1.0 I used to use RetryPolicy with retryOn with given conditions. Internally it would add the condition to the retryConditions array and check for isRetryable on the executor and execute once more.

On 2.4.0, I tried using handleIf, as the changelog says, but the execution returns as complete and doesn't retry on my condition. I wrote a test I can paste here in case anyone is willing to help me with that, but basically I'm trying to make an HTTP call with Retrofit that returns the call, on which I can test the HTTP status code and ask for a retry (after some delay) when it's 5xx.

Can someone help me with that?

Also, I don't understand what RetryPolicy#withMaxDuration does. I thought it was some kind of timeout, but when I simulate a first call (that I want to be retried) "sleeping" for longer than this threshold, it doesn't time out, but rather finishes after the supplier delay.
Jonathan Halterman
@jhalterman
Hey @williamokano - if you have a failing test for the retryIf situation you mentioned, feel free to open an issue and include it there. Probably better than pasting here.
withMaxDuration is a passive timeout, so to speak, and will stop retrying after the maxDuration across all attempts has been exceeded. It will not, though, interrupt any execution attempt that's in progress. For that you should use the Timeout policy with withCancel(true).
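To make the "passive timeout" idea concrete, here is a minimal plain-JDK model of it. This is a sketch under assumptions, not Failsafe's implementation: `PassiveMaxDuration`, `getWithRetries` and `sleep` are hypothetical names, a `null` result stands in for a failed attempt, and the elapsed time is only checked between attempts.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class PassiveMaxDuration {
    /**
     * Hypothetical retry loop: retries while the result is null, up to
     * maxAttempts, and checks the elapsed time only between attempts.
     */
    static <T> T getWithRetries(Supplier<T> attempt, int maxAttempts, Duration maxDuration) {
        long start = System.nanoTime();
        T result = null;
        for (int i = 0; i < maxAttempts; i++) {
            result = attempt.get(); // never interrupted, however long it takes
            if (result != null) return result;
            if (System.nanoTime() - start >= maxDuration.toNanos()) break; // budget spent
        }
        return result;
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        // The first attempt "fails" after 200 ms, longer than the 50 ms budget
        String result = getWithRetries(() -> {
            attempts.incrementAndGet();
            sleep(200);
            return null;
        }, 2, Duration.ofMillis(50));
        System.out.println("attempts=" + attempts.get() + ", result=" + result);
    }
}
```

In this model the slow first attempt runs to completion and the second attempt never starts, because the budget is already spent when the loop checks it.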
William Johnson S. Okano
@williamokano

Hey @jhalterman thanks for the explanation. There's no error on the lib then, just my dumbness not knowing how it works. I was setting maxDuration with 2 seconds and making the first call 4 seconds long. So even though I put maxAttempts(2), it would never call the second and return the response from the first request.

Thanks for the clarification and sorry for asking such a dumb question 😂

the-mod
@the-mod
Hi, I am currently facing problems with the event listeners of a circuit breaker. It seems the passed CheckedRunnable isn't completely executed. I am using a FailsafeExecutor with my own ScheduledExecutorService. I know the documentation states that exceptions within these Runnables are ignored. But is there any chance to debug it?
Jonathan Halterman
@jhalterman
@the-mod At the moment there is not. If we did allow some sort of exception handling for event listeners, how might it work? If you have any good suggestions, feel free to open an issue.
Alex Popescu
@al3xandru
Where should I look to understand why the following code ends up in 12 retries (rather than at most 3):
RetryPolicy<Object> policy = new RetryPolicy<>()
        .withMaxRetries(3)
        .withBackoff(1, 30, ChronoUnit.SECONDS, 2)
        .withMaxDuration(Duration.ofSeconds(30));
I take that back... I had another "native" retry inside the Failsafe retry
Alex Popescu
@al3xandru
Are handle and handleIf additive or "or-ed"? (as in handle(Exception).handleIf(Predicate) => if (exception && Predicate) vs if (exception || Predicate) --- yes, my pseudo is really bad)
Alex Popescu
@al3xandru
A quick test shows that they are ORed (if the exception is of the type or the condition is met, then handle the failure)
Alex Popescu
@al3xandru
And the javadoc confirms that: "If multiple handle or handleResult conditions are specified, any matching condition can allow a retry"
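The OR semantics described above can be modeled in a few lines of plain Java. This is only an illustration of the rule, not Failsafe's code: `HandleConditions` and its methods are hypothetical names that mirror `handle` and `handleIf`, and every condition is stored as a predicate so that any single match is enough.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class HandleConditions {
    private final List<Predicate<Throwable>> conditions = new ArrayList<>();

    /** Hypothetical counterpart of handle(Class): matches by exception type. */
    HandleConditions handle(Class<? extends Throwable> type) {
        conditions.add(type::isInstance);
        return this;
    }

    /** Hypothetical counterpart of handleIf(Predicate): matches by arbitrary test. */
    HandleConditions handleIf(Predicate<Throwable> predicate) {
        conditions.add(predicate);
        return this;
    }

    /** A failure is handled if ANY registered condition matches (logical OR). */
    boolean isHandled(Throwable failure) {
        return conditions.stream().anyMatch(condition -> condition.test(failure));
    }

    public static void main(String[] args) {
        HandleConditions policy = new HandleConditions()
                .handle(IOException.class)
                .handleIf(e -> "503".equals(e.getMessage()));

        System.out.println(policy.isHandled(new IOException("boom")));          // type matches
        System.out.println(policy.isHandled(new IllegalStateException("503"))); // predicate matches
        System.out.println(policy.isHandled(new IllegalStateException("400"))); // neither matches
    }
}
```

An AND combination would instead have to be written as a single predicate that checks both the type and the extra condition.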
Jens Bannmann
@bannmann

:point_up: February 27, 2021 20:40

@jhalterman, thank you for your pointers!

I finally got around to doing some more detailed testing and I think I can answer your question now: the original execution seems to be stalling. Does the methodology outlined below look valid to you? If yes, what do you suggest trying next?

Setup:

  • changed the supplier passed into getStageAsync() so that it wraps the original CompletableFuture by adding a whenComplete() handler which logs the completion
  • added logging by registering a global onComplete executor listener and all listeners offered by the timeout and retry policies

Results:
Logging happens exactly as intended, but only for a few minutes. Then the problem occurs and none of my listeners seem to be called: the only messages I see are the one I log immediately before calling Failsafe.with() followed by the one from my class calling future.get(long, TimeUnit) on the future returned by Failsafe. The timestamp delta of these log messages equals the timeout passed to that future.get call.

Alex Popescu
@al3xandru
Can someone help me understand the generic type of RetryPolicy? When is it something other than Object?
Jens Bannmann
@bannmann
@al3xandru if the code that you run via Failsafe (e.g. the this::connect in this example) returns e.g. an HttpResponse<String> instead of Object, all policies will have that type parameter as well. This way, they can act on the result (e.g. inspecting the HTTP response and triggering a retry if the status code is 500).
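As a rough illustration of why the type parameter matters, here is a plain-JDK sketch of a policy typed by its result. Everything in it is an assumption made for the example: `TypedRetry` and `Response` are hypothetical stand-ins and not Failsafe classes. The point is only that a policy declared for `Response` can inspect `status` without casting, which a policy typed as `Object` could not do.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class TypedRetry<R> {
    /** Hypothetical result type, standing in for something like an HTTP response. */
    static class Response {
        final int status;
        Response(int status) { this.status = status; }
    }

    private final Predicate<R> retryOnResult;
    private final int maxRetries;

    TypedRetry(Predicate<R> retryOnResult, int maxRetries) {
        this.retryOnResult = retryOnResult;
        this.maxRetries = maxRetries;
    }

    /** Calls the supplier, then retries while the typed predicate asks for it. */
    R get(Supplier<R> supplier) {
        R result = supplier.get();
        for (int retries = 0; retries < maxRetries && retryOnResult.test(result); retries++) {
            result = supplier.get();
        }
        return result;
    }

    public static void main(String[] args) {
        // Simulated server: two errors, then a success
        Iterator<Response> responses =
                Arrays.asList(new Response(500), new Response(500), new Response(200)).iterator();

        // Because R is Response, the lambda can read r.status directly
        TypedRetry<Response> policy = new TypedRetry<>(r -> r.status >= 500, 3);
        Response result = policy.get(responses::next);
        System.out.println(result.status);
    }
}
```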
Jonathan Halterman
@jhalterman
@al3xandru Here's an example https://jodah.net/failsafe/strong-typing/
Alex Popescu
@al3xandru
Thanks @bannmann & @jhalterman. I was doing something wrong on my side (forgot the type on the constructor side and wondered what it was complaining about).