Retrying Failed Requests in Batch Jobs: A Generic Approach
Retrying failed HTTP requests is a common challenge in distributed systems, especially in batch processing scenarios. While libraries like Spring's WebClient and CircuitBreaker offer retry mechanisms, dealing with heterogeneous request payloads and retrying them efficiently in a batch job context requires a different approach. In this blog post, we'll explore a generic solution using a FailedRequest object and a specialized interceptor. We'll discuss how to capture and serialize failed requests, store them, and efficiently retry them at a later time.
Introducing the FailedRequest Object
The cornerstone of our solution is the FailedRequest object, a versatile structure designed to encapsulate the essential information about a failed HTTP request:
@Data
public class FailedRequest {
    private HttpMethod method;
    @ToString.Exclude            // keep the raw bytes out of Lombok's generated toString
    private byte[] payload;      // raw body bytes
    private HttpHeaders headers;
    private String url;
    private String service;
    private String operation;
}
This object stores the HTTP method, headers, URL, and raw body bytes of a request, while also providing placeholders for service and operation identifiers. This allows for a flexible and generic approach to capturing failed requests, regardless of their specific payload structures.
Intercepting Failed Requests with DryRunClientHttpRequestInterceptor
To capture failed requests, we utilize a custom ClientHttpRequestInterceptor called DryRunClientHttpRequestInterceptor. This interceptor captures the request details and constructs a FailedRequest object for serialization, all without actually sending out the original request:
public class DryRunClientHttpRequestInterceptor implements ClientHttpRequestInterceptor {

    @Getter // Lombok: exposes getFailedRequest() for the RetryHelper below
    private final FailedRequest failedRequest = new FailedRequest();

    @Override
    public ClientHttpResponse intercept(HttpRequest request, byte[] body, ClientHttpRequestExecution execution) {
        // Record every detail of the outgoing request instead of executing it
        failedRequest.setUrl(request.getURI().toString());
        failedRequest.setHeaders(request.getHeaders());
        failedRequest.setMethod(request.getMethod());
        failedRequest.setPayload(body);
        // execution.execute(...) is deliberately never called, so nothing goes over the wire
        return new ClientHttpResponse() {
            // ... status code and other stubbed methods (a sketch follows below) ...
        };
    }
}
Because the interceptor never calls execution.execute(...), the dry-run exchange never leaves the JVM: Spring still assembles the final URL, headers, and serialized body, and the interceptor collects exactly that data for the future retry.
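The anonymous ClientHttpResponse elided above only needs to satisfy RestTemplate. A minimal sketch, assuming Spring 5's API (in Spring 6, getStatusCode() returns HttpStatusCode and getRawStatusCode() is deprecated):
// Returned from intercept(...): reports 200 OK with an empty body so the
// surrounding RestTemplate.exchange(...) call completes normally.
return new ClientHttpResponse() {
    @Override public HttpStatus getStatusCode() { return HttpStatus.OK; }
    @Override public int getRawStatusCode() { return HttpStatus.OK.value(); }
    @Override public String getStatusText() { return HttpStatus.OK.getReasonPhrase(); }
    @Override public HttpHeaders getHeaders() { return new HttpHeaders(); }
    @Override public InputStream getBody() { return new ByteArrayInputStream(new byte[0]); }
    @Override public void close() { /* nothing to release */ }
};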
The RetryHelper Class: Building and Capturing Failed Requests
To streamline the process of building and capturing failed requests, we've created the RetryHelper class. This class leverages the DryRunClientHttpRequestInterceptor to construct a failed request object and logs it for later processing:
@Slf4j
public class RetryHelper {

    private RetryHelper() {}

    public static FailedRequest buildFailedRequest(String url, HttpMethod method, HttpEntity<?> requestEntity,
                                                   String service, String operation) {
        val restTemplate = new RestTemplate();
        val interceptor = new DryRunClientHttpRequestInterceptor();
        restTemplate.setInterceptors(Arrays.asList(interceptor));
        // Dry-run exchange: the interceptor captures the request and short-circuits the network call
        restTemplate.exchange(url, method, requestEntity, String.class);
        val failedRequest = interceptor.getFailedRequest();
        failedRequest.setService(service);
        failedRequest.setOperation(operation);
        log.info("failed request\n{}", failedRequest);
        return failedRequest;
    }
}
This RetryHelper class encapsulates the process of constructing a failed request object and populating it with the necessary information.
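In practice, the helper is invoked from the catch block around the original call. A usage sketch, where the service and operation labels and the failedRequestStore are hypothetical:
try {
    restTemplate.exchange(url, HttpMethod.POST, requestEntity, String.class);
} catch (RestClientException e) {
    // "order-service" / "createOrder" are illustrative identifiers for the caller
    FailedRequest failed = RetryHelper.buildFailedRequest(
            url, HttpMethod.POST, requestEntity, "order-service", "createOrder");
    failedRequestStore.save(failed); // hypothetical store; see the storage options below
}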
Serializing and Storing Failed Requests
Once the FailedRequest object is constructed, it can be serialized and stored for later retry attempts. The serialized object can be saved to various storage mechanisms, such as databases, files, or message queues like Kafka or RabbitMQ. The choice of storage depends on your architecture and requirements.
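A minimal serialization sketch with Jackson, assuming the FailedRequest shown earlier. Jackson encodes the byte[] payload as Base64 out of the box; depending on your Spring and Jackson versions, HttpHeaders and HttpMethod may need small custom (de)serializers:
ObjectMapper mapper = new ObjectMapper();

// Serialize for storage: byte[] becomes Base64, HttpHeaders a multi-valued JSON object
String json = mapper.writeValueAsString(failedRequest);

// Later, restore the object for a retry attempt
FailedRequest restored = mapper.readValue(json, FailedRequest.class);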
Using a Database
Storing failed requests in a database provides durability and query capabilities. By mapping the FailedRequest object to a database entity, you can persist the failed requests and later query and process them efficiently.
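One possible JPA mapping, as a sketch; the entity, table, and column names are illustrative, and the request itself is stored in the JSON form described above:
@Entity
@Table(name = "failed_requests")
@Data
public class FailedRequestRecord {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String service;       // denormalized for querying by origin
    private String operation;
    private int retryCount;

    @Lob
    private String failedRequest; // the serialized FailedRequest JSON
}
The getFailedRequest() accessor on this record is what the deserialization snippet later in the post reads from.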
Leveraging Message Queues
Alternatively, you can use message queues like Kafka or RabbitMQ to publish the serialized FailedRequest objects. Subscribers of the queue can then consume and retry the failed requests independently, ensuring that retry attempts are decoupled from the main application.
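With Kafka, for instance, publishing can be as simple as sending the JSON to a dedicated topic. A sketch, where the topic name and the KafkaTemplate wiring are assumptions:
@Service
@RequiredArgsConstructor
public class FailedRequestPublisher {

    private final KafkaTemplate<String, String> kafkaTemplate;
    private final ObjectMapper mapper;

    public void publish(FailedRequest failedRequest) throws JsonProcessingException {
        // Keying by service keeps retries for one service on the same partition
        kafkaTemplate.send("failed-requests", failedRequest.getService(),
                mapper.writeValueAsString(failedRequest));
    }
}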
Retrying Failed Requests
Retrying the failed requests can be approached in two ways: locally or asynchronously through a separate process.
Local Retries
For local retries, you can have a dedicated batch job that reads the serialized FailedRequest objects, reconstructs the HTTP requests, and attempts to resend them. This approach is suitable for scenarios where immediate retries are required.
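A sketch of such a job, assuming a Spring @Scheduled trigger and a hypothetical FailedRequestRepository; the per-record retry logic is the deserialization snippet shown in the next section:
@Component
@RequiredArgsConstructor
public class FailedRequestRetryJob {

    private final FailedRequestRepository repository; // hypothetical JPA repository
    private final FailedRequestRetrier retrier;       // hypothetical; wraps the snippet below

    @Scheduled(fixedDelayString = "${retry.fixed-delay:60000}")
    public void run() {
        repository.findAll().forEach(record -> {
            if (retrier.retry(record)) {   // true once the request finally succeeds
                repository.delete(record); // drop it so it is not retried again
            }
        });
    }
}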
Asynchronous Retries
Asynchronous retries involve separate workers or microservices that consume the serialized FailedRequest objects from the message queue and attempt to resend them. This approach provides scalability and resilience, as retries can be distributed across multiple instances.
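A consumer sketch for the Kafka route; the topic and group id are assumptions matching the publisher above, and a thrown RestClientException is left to the container's error handler for re-delivery or dead-lettering:
@Component
@RequiredArgsConstructor
public class FailedRequestRetryConsumer {

    private final ObjectMapper mapper;
    private final RestTemplate restTemplate;

    @KafkaListener(topics = "failed-requests", groupId = "failed-request-retrier")
    public void onMessage(String message) throws JsonProcessingException {
        FailedRequest failedRequest = mapper.readValue(message, FailedRequest.class);
        HttpEntity<byte[]> entity = new HttpEntity<>(failedRequest.getPayload(), failedRequest.getHeaders());
        // Replay the captured request; an exception here triggers the listener's retry policy
        restTemplate.exchange(failedRequest.getUrl(), failedRequest.getMethod(), entity, String.class);
    }
}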
Deserializing and Retrying Failed Requests
In the retry batch process, we need to deserialize the stored FailedRequest objects and attempt to retry them. Here's an example of how you can achieve this:
// Deserialize the stored JSON back into a FailedRequest
FailedRequest failedRequest = mapper.readValue(record.getFailedRequest(), FailedRequest.class);
HttpHeaders headers = failedRequest.getHeaders();

// The captured token may have expired by now, so refresh the Authorization header for secured endpoints
if (failedRequest.getUrl().startsWith("https")) {
    tokenProvider.getAccessToken().ifPresent(token -> headers.set(HttpHeaders.AUTHORIZATION, token));
}

// Rebuild the original request from the captured bytes and resend it
HttpEntity<byte[]> payload = new HttpEntity<>(failedRequest.getPayload(), headers);
ResponseEntity<String> response = restTemplate.exchange(failedRequest.getUrl(), failedRequest.getMethod(), payload, String.class);
Conclusion
Retrying failed requests in batch jobs requires a robust and generic approach, especially when dealing with heterogeneous payloads. By capturing request details with the FailedRequest object and leveraging the DryRunClientHttpRequestInterceptor, you can efficiently serialize and store failed requests. The choice of storage, whether it's a database or a message queue, depends on your architectural needs. Whether you opt for local or asynchronous retries, the approach outlined here provides a flexible and efficient way to manage and retry failed requests in your batch processing pipelines.