Troubleshooting Reactive Spring WebFlux: Handling Synchronous API Calls
In the world of reactive programming with Spring WebFlux, it's crucial to understand the nuances of asynchronous and non-blocking request handling. However, even the most well-designed applications can run into issues, especially when mixing synchronous and asynchronous components. In this article, we'll explore a real-world scenario where a synchronous SOAP API call led to pod restarts in a production OpenShift environment. We'll dive into the root cause, analyze the problem, and provide a solution that aligns with reactive programming principles.
The Incident: Mismatched Timings and Synchronous API Calls
The incident occurred in a production OpenShift pod that was restarting intermittently, every few hours. Investigation showed that the pod was failing its health probes because of timeouts during a specific SOAP API call. The probes ran every 10 seconds with a 3-second timeout and allowed at most 3 consecutive failures, so a pod that stopped responding for roughly 30 seconds would be restarted. That matched the symptoms: the SOAP API call occasionally took more than 30 seconds to respond.
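The link between the probe settings and the 30-second figure can be checked with a little arithmetic. The sketch below assumes the restarts come from a Kubernetes liveness probe configured as `periodSeconds=10`, `timeoutSeconds=3` and `failureThreshold=3` (the article gives the values, not the manifest) and ignores scheduling jitter and initial delay. If the application stops answering at time 0, the next probe is sent within one period, and the container is restarted when the third consecutive probe times out.

```java
/**
 * Rough estimate of how long a hung application survives before the kubelet
 * restarts it. Assumes a liveness probe with periodSeconds, timeoutSeconds and
 * failureThreshold as in Kubernetes; ignores jitter and initial delay.
 */
final class ProbeWindow {

    /** Earliest restart: the hang begins exactly when a probe is sent. */
    static int minSecondsToRestart(int periodSeconds, int timeoutSeconds, int failureThreshold) {
        // The last of failureThreshold failing probes is sent (failureThreshold - 1)
        // periods after the first one and is declared failed after timeoutSeconds.
        return (failureThreshold - 1) * periodSeconds + timeoutSeconds;
    }

    /** Latest restart: the hang begins just after a probe has succeeded. */
    static int maxSecondsToRestart(int periodSeconds, int timeoutSeconds, int failureThreshold) {
        // Up to one extra period passes before the first failing probe is sent.
        return failureThreshold * periodSeconds + timeoutSeconds;
    }

    public static void main(String[] args) {
        int min = minSecondsToRestart(10, 3, 3);
        int max = maxSecondsToRestart(10, 3, 3);
        System.out.println("Restart after roughly " + min + " to " + max + " seconds");
    }
}
```

With these values the window is roughly 23 to 33 seconds, which is consistent with restarts caused by a call that occasionally stalls for more than 30 seconds.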
The Root Cause: Mixing Sync and Async in Reactive World
Digging deeper, the root cause became evident. The application, built with Spring WebFlux, was designed for non-blocking, reactive request handling. However, a critical part of the code made a synchronous SOAP API call using getWebServiceTemplate().marshalSendAndReceive(request). Running inside a reactive context, this blocking call tied up one of the small number of event-loop threads that WebFlux uses to serve every request, violating the central rule of reactive programming: never block those threads. When the health endpoint is served by the same threads, a call that hangs for 30 seconds can also leave the health probes unanswered.
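A minimal way to see the problem: a blocking call placed directly in a reactive chain runs on whichever thread subscribes. The sketch assumes `reactor-core` on the classpath; `slowSoapCall()` is a hypothetical stand-in for `marshalSendAndReceive(request)` that only sleeps and reports the thread it ran on. Run from `main`, it prints `main`; inside a WebFlux handler the subscribing thread would typically be a Netty event-loop thread instead.

```java
import java.time.Duration;
import reactor.core.publisher.Mono;

/** Shows that a blocking call inside a reactive chain runs on the subscribing thread. */
final class BlockingInChain {

    // Hypothetical stand-in for getWebServiceTemplate().marshalSendAndReceive(request):
    // it blocks for a while and returns the name of the thread it ran on.
    static String slowSoapCall(String request) {
        try {
            Thread.sleep(200); // simulated network latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return Thread.currentThread().getName();
    }

    // Anti-pattern: the blocking call sits directly in map(), with no scheduler switch.
    static Mono<String> handle(String request) {
        return Mono.just(request).map(BlockingInChain::slowSoapCall);
    }

    public static void main(String[] args) {
        // block() subscribes on the current thread, so the sleep happens on "main".
        System.out.println("SOAP call ran on: " + handle("req").block(Duration.ofSeconds(5)));
    }
}
```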
The Solution: Embracing Reactive Principles
To address the issue and bring the application back in line with reactive principles, a solution was devised. The goal was to defer the synchronous SOAP API call and execute it on a separate thread. This would prevent the call from blocking the reactive thread and ensure that the application remains responsive. Here's how the solution was implemented:
// Defer the blocking SOAP call and run it on a boundedElastic worker thread
return Mono.defer(() -> Mono.fromCallable(() -> syncSoapAPICall()))
        .subscribeOn(Schedulers.boundedElastic());
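To check that this chain actually moves the work off the caller, the sketch below uses the same defer/fromCallable/subscribeOn chain around a hypothetical `syncSoapAPICall()` that only sleeps and returns its thread name. It assumes `reactor-core` 3.3 or later on the classpath, where `Schedulers.boundedElastic()` names its worker threads `boundedElastic-N`.

```java
import java.time.Duration;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

/** The fix, made runnable with a simulated blocking call. */
final class OffloadedSoapCall {

    // Hypothetical blocking call standing in for the real SOAP request.
    static String syncSoapAPICall() throws InterruptedException {
        Thread.sleep(200); // simulated network latency
        return Thread.currentThread().getName();
    }

    static Mono<String> callSoap() {
        // defer + fromCallable keep the call lazy; subscribeOn moves the
        // subscription, and therefore the blocking call, to a boundedElastic worker.
        return Mono.defer(() -> Mono.fromCallable(OffloadedSoapCall::syncSoapAPICall))
                .subscribeOn(Schedulers.boundedElastic());
    }

    public static void main(String[] args) {
        // Prints a worker name such as "boundedElastic-1" rather than "main".
        System.out.println("SOAP call ran on: " + callSoap().block(Duration.ofSeconds(5)));
    }
}
```

Calling `block()` here is only for the demonstration; in a WebFlux handler the returned `Mono` would be passed back to the framework instead.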
A Closer Look at the Solution
- Mono.defer(): Mono.defer() wraps the SOAP API call in a Mono so that the call is not executed when the pipeline is assembled, only when something subscribes to it. Because Mono.fromCallable() is itself lazy, the outer defer() is not strictly required here, but it is harmless.
- Mono.fromCallable(): Mono.fromCallable() encapsulates the synchronous SOAP API call, turning it into a Mono that emits the call's result, or its exception as an error signal.
- subscribeOn(Schedulers.boundedElastic()): To keep the blocking call off the event-loop threads that serve requests, the subscription, and therefore the call itself, is moved to a worker thread managed by Schedulers.boundedElastic(). This scheduler is designed for blocking tasks: it caps the number of worker threads and queues excess tasks, so blocking work does not starve the reactive thread pool.
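The laziness of Mono.fromCallable() is easy to observe. The sketch below assumes `reactor-core` on the classpath; the counter and `expensiveCall()` are illustrative only. Building the `Mono` does not invoke the callable; each subscription (here via `block()`) invokes it once.

```java
import java.util.concurrent.atomic.AtomicInteger;
import reactor.core.publisher.Mono;

/** Demonstrates that Mono.fromCallable() runs its callable only on subscription. */
final class LazyCallable {

    // Counts how many times the illustrative call has actually executed.
    static final AtomicInteger CALLS = new AtomicInteger();

    static String expensiveCall() {
        return "response-" + CALLS.incrementAndGet();
    }

    static Mono<String> lazyMono() {
        // Assembly only: the callable is captured, not invoked.
        return Mono.fromCallable(LazyCallable::expensiveCall);
    }

    public static void main(String[] args) {
        Mono<String> mono = lazyMono();
        System.out.println("calls after assembly: " + CALLS.get()); // 0
        System.out.println(mono.block());                           // response-1
        System.out.println(mono.block());                           // response-2
        System.out.println("calls after two subscriptions: " + CALLS.get()); // 2
    }
}
```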
Customizing Thread Pool Capacity
It's important to note that Schedulers.boundedElastic() has a default capacity: the maximum number of threads it creates is capped at ten times the number of available CPU cores. Where more concurrent blocking calls are required, you can raise this cap by setting the following JVM system property:
-Dreactor.schedulers.defaultBoundedElasticSize=${customize-capacity}
For instance, if your application needs to handle 200 transactions per second (TPS) and each SOAP call blocks for about one second, roughly 200 calls are in flight at any moment, so the capacity should be at least 200.
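One way to turn a throughput target into a thread count is Little's law: the average number of calls in flight equals the arrival rate multiplied by the average time each call blocks. The sketch below applies it with a hypothetical `headroom` multiplier; the latency values are illustrative assumptions, not measurements from the incident. The result is the kind of value you would pass to `-Dreactor.schedulers.defaultBoundedElasticSize`.

```java
/** Sizes a pool for blocking calls with Little's law: L = lambda * W. */
final class BoundedElasticSizing {

    /**
     * @param tps           expected requests per second that make the blocking call
     * @param avgLatencySec average time one call blocks its thread, in seconds
     * @param headroom      hypothetical safety multiplier (>= 1.0) for bursts
     * @return threads needed to cover the average number of concurrent calls, plus headroom
     */
    static int requiredThreads(double tps, double avgLatencySec, double headroom) {
        if (tps < 0 || avgLatencySec < 0 || headroom < 1.0) {
            throw new IllegalArgumentException("tps and latency must be >= 0, headroom >= 1");
        }
        // Little's law: average concurrent calls = arrival rate * time each call takes.
        double inFlight = tps * avgLatencySec;
        return (int) Math.ceil(inFlight * headroom);
    }

    public static void main(String[] args) {
        System.out.println(requiredThreads(200, 1.0, 1.0));  // 200 TPS, 1 s calls
        System.out.println(requiredThreads(200, 1.0, 1.5));  // same load, 50% headroom
        System.out.println(requiredThreads(200, 0.25, 1.0)); // 200 TPS, 250 ms calls
        System.out.println(requiredThreads(200, 30, 1.0));   // 200 TPS, 30 s stalls
    }
}
```

The last line gives 6,000 threads, which illustrates that thread capacity alone cannot absorb stalls as long as the 30-second SOAP responses seen in the incident.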
Learning from the Experience
This incident highlights the challenges of mixing synchronous and asynchronous components within a reactive application. When working with reactive programming, it's essential to embrace its principles fully. Deferring synchronous calls and executing them on separate threads with Schedulers.boundedElastic() keeps the application responsive and avoids blocking the reactive threads.
Conclusion
In reactive programming, ensuring that your application adheres to non-blocking, asynchronous execution is paramount. The incident discussed here is a reminder of why consistency matters in reactive design. By deferring synchronous calls and running them on Schedulers.boundedElastic(), you can integrate unavoidable blocking components into a reactive Spring WebFlux application without stalling its event-loop threads. This approach prevents bottlenecks and keeps the application resilient and responsive in dynamic production environments.