Retrying Failed Requests in Batch Jobs: A Generic Approach
Retrying failed HTTP requests is a common challenge in distributed systems, especially in batch processing scenarios. While libraries like Spring's WebClient and CircuitBreaker offer retry mechanisms, dealing with heterogeneous request payloads and retrying them efficiently in a batch job context requires a different approach. In this blog post, we'll explore a generic solution using a FailedRequest object and a specialized interceptor. We'll discuss how to capture and serialize failed requests, store them, and efficiently retry them at a later time.
Introducing the FailedRequest Object
The cornerstone of our solution is the FailedRequest object, a versatile structure designed to encapsulate the essential information about a failed HTTP request:
@Data
public class FailedRequest {
    private HttpMethod method;
    @ToString.Exclude            // keep the raw bytes out of Lombok's generated toString
    private byte[] payload;      // raw body bytes
    private HttpHeaders headers;
    private String url;
    private String service;
    private String operation;
}
This object stores the HTTP method, headers, URL, and raw body bytes of a request, while also providing placeholders for service and operation identifiers. This allows for a flexible and generic approach to capturing failed requests, regardless of their specific payload structures.
Intercepting Failed Requests with DryRunClientHttpRequestInterceptor
To capture failed requests, we utilize a custom ClientHttpRequestInterceptor called DryRunClientHttpRequestInterceptor. This interceptor captures the request details and constructs a FailedRequest object for serialization, all without actually sending out the original request:
public class DryRunClientHttpRequestInterceptor implements ClientHttpRequestInterceptor {

    @Getter // Lombok: exposes getFailedRequest() for the RetryHelper below
    private final FailedRequest failedRequest = new FailedRequest();

    @Override
    public ClientHttpResponse intercept(HttpRequest request, byte[] body, ClientHttpRequestExecution execution) {
        // Record every detail of the outgoing request instead of executing it
        failedRequest.setUrl(request.getURI().toString());
        failedRequest.setHeaders(request.getHeaders());
        failedRequest.setMethod(request.getMethod());
        failedRequest.setPayload(body);
        // execution.execute(...) is deliberately never called, so nothing goes over the wire
        return new ClientHttpResponse() {
            // ... status code and other stubbed methods (a sketch follows below) ...
        };
    }
}
Because the interceptor never calls execution.execute(...), the dry-run exchange never leaves the JVM: Spring still assembles the final URL, headers, and serialized body, and the interceptor collects exactly that data for the future retry.
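The anonymous ClientHttpResponse elided above only needs to satisfy RestTemplate. A minimal sketch, assuming Spring 5's API (in Spring 6, getStatusCode() returns HttpStatusCode and getRawStatusCode() is deprecated):
// Returned from intercept(...): reports 200 OK with an empty body so the
// surrounding RestTemplate.exchange(...) call completes normally.
return new ClientHttpResponse() {
    @Override public HttpStatus getStatusCode() { return HttpStatus.OK; }
    @Override public int getRawStatusCode() { return HttpStatus.OK.value(); }
    @Override public String getStatusText() { return HttpStatus.OK.getReasonPhrase(); }
    @Override public HttpHeaders getHeaders() { return new HttpHeaders(); }
    @Override public InputStream getBody() { return new ByteArrayInputStream(new byte[0]); }
    @Override public void close() { /* nothing to release */ }
};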
The RetryHelper Class: Building and Capturing Failed Requests
To streamline the process of building and capturing failed requests, we've created the RetryHelper class. This class leverages the DryRunClientHttpRequestInterceptor to construct a failed request object and logs it for later processing:
@Slf4j
public class RetryHelper {

    private RetryHelper() {}

    public static FailedRequest buildFailedRequest(String url, HttpMethod method, HttpEntity<?> requestEntity,
                                                   String service, String operation) {
        val restTemplate = new RestTemplate();
        val interceptor = new DryRunClientHttpRequestInterceptor();
        restTemplate.setInterceptors(Arrays.asList(interceptor));
        // Dry-run exchange: the interceptor captures the request and short-circuits the network call
        restTemplate.exchange(url, method, requestEntity, String.class);
        val failedRequest = interceptor.getFailedRequest();
        failedRequest.setService(service);
        failedRequest.setOperation(operation);
        log.info("failed request\n{}", failedRequest);
        return failedRequest;
    }
}
This RetryHelper class encapsulates the process of constructing a failed request object and populating it with the necessary information.
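In practice, the helper is invoked from the catch block around the original call. A usage sketch, where the service and operation labels and the failedRequestStore are hypothetical:
try {
    restTemplate.exchange(url, HttpMethod.POST, requestEntity, String.class);
} catch (RestClientException e) {
    // "order-service" / "createOrder" are illustrative identifiers for the caller
    FailedRequest failed = RetryHelper.buildFailedRequest(
            url, HttpMethod.POST, requestEntity, "order-service", "createOrder");
    failedRequestStore.save(failed); // hypothetical store; see the storage options below
}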
Serializing and Storing Failed Requests
Once the FailedRequest object is constructed, it can be serialized and stored for later retry attempts. The serialized object can be saved to various storage mechanisms, such as databases, files, or message queues like Kafka or RabbitMQ. The choice of storage depends on your architecture and requirements.
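A minimal serialization sketch with Jackson, assuming the FailedRequest shown earlier. Jackson encodes the byte[] payload as Base64 out of the box; depending on your Spring and Jackson versions, HttpHeaders and HttpMethod may need small custom (de)serializers:
ObjectMapper mapper = new ObjectMapper();

// Serialize for storage: byte[] becomes Base64, HttpHeaders a multi-valued JSON object
String json = mapper.writeValueAsString(failedRequest);

// Later, restore the object for a retry attempt
FailedRequest restored = mapper.readValue(json, FailedRequest.class);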
Using a Database
Storing failed requests in a database provides durability and query capabilities. By mapping the FailedRequest object to a database entity, you can persist the failed requests and later query and process them efficiently.
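One possible JPA mapping, as a sketch; the entity, table, and column names are illustrative, and the request itself is stored in the JSON form described above:
@Entity
@Table(name = "failed_requests")
@Data
public class FailedRequestRecord {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String service;       // denormalized for querying by origin
    private String operation;
    private int retryCount;

    @Lob
    private String failedRequest; // the serialized FailedRequest JSON
}
The getFailedRequest() accessor on this record is what the deserialization snippet later in the post reads from.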
Leveraging Message Queues
Alternatively, you can use message queues like Kafka or RabbitMQ to publish the serialized FailedRequest objects. Subscribers of the queue can then consume and retry the failed requests independently, ensuring that retry attempts are decoupled from the main application.
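With Kafka, for instance, publishing can be as simple as sending the JSON to a dedicated topic. A sketch, where the topic name and the KafkaTemplate wiring are assumptions:
@Service
@RequiredArgsConstructor
public class FailedRequestPublisher {

    private final KafkaTemplate<String, String> kafkaTemplate;
    private final ObjectMapper mapper;

    public void publish(FailedRequest failedRequest) throws JsonProcessingException {
        // Keying by service keeps retries for one service on the same partition
        kafkaTemplate.send("failed-requests", failedRequest.getService(),
                mapper.writeValueAsString(failedRequest));
    }
}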
Retrying Failed Requests
Retrying the failed requests can be approached in two ways: locally or asynchronously through a separate process.
Local Retries
For local retries, you can have a dedicated batch job that reads the serialized FailedRequest objects, reconstructs the HTTP requests, and attempts to resend them. This approach is suitable for scenarios where immediate retries are required.
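A sketch of such a job, assuming a Spring @Scheduled trigger and a hypothetical FailedRequestRepository; the per-record retry logic is the deserialization snippet shown in the next section:
@Component
@RequiredArgsConstructor
public class FailedRequestRetryJob {

    private final FailedRequestRepository repository; // hypothetical JPA repository
    private final FailedRequestRetrier retrier;       // hypothetical; wraps the snippet below

    @Scheduled(fixedDelayString = "${retry.fixed-delay:60000}")
    public void run() {
        repository.findAll().forEach(record -> {
            if (retrier.retry(record)) {   // true once the request finally succeeds
                repository.delete(record); // drop it so it is not retried again
            }
        });
    }
}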
Asynchronous Retries
Asynchronous retries involve separate workers or microservices that consume the serialized FailedRequest objects from the message queue and attempt to resend them. This approach provides scalability and resilience, as retries can be distributed across multiple instances.
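A consumer sketch for the Kafka route; the topic and group id are assumptions matching the publisher above, and a thrown RestClientException is left to the container's error handler for re-delivery or dead-lettering:
@Component
@RequiredArgsConstructor
public class FailedRequestRetryConsumer {

    private final ObjectMapper mapper;
    private final RestTemplate restTemplate;

    @KafkaListener(topics = "failed-requests", groupId = "failed-request-retrier")
    public void onMessage(String message) throws JsonProcessingException {
        FailedRequest failedRequest = mapper.readValue(message, FailedRequest.class);
        HttpEntity<byte[]> entity = new HttpEntity<>(failedRequest.getPayload(), failedRequest.getHeaders());
        // Replay the captured request; an exception here triggers the listener's retry policy
        restTemplate.exchange(failedRequest.getUrl(), failedRequest.getMethod(), entity, String.class);
    }
}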
Deserializing and Retrying Failed Requests
In the retry batch process, we need to deserialize the stored FailedRequest objects and attempt to retry them. Here's an example of how you can achieve this:
// Deserialize the stored JSON back into a FailedRequest
FailedRequest failedRequest = mapper.readValue(record.getFailedRequest(), FailedRequest.class);
HttpHeaders headers = failedRequest.getHeaders();

// The captured token may have expired by now, so refresh the Authorization header for secured endpoints
if (failedRequest.getUrl().startsWith("https")) {
    tokenProvider.getAccessToken().ifPresent(token -> headers.set(HttpHeaders.AUTHORIZATION, token));
}

// Rebuild the original request from the captured bytes and resend it
HttpEntity<byte[]> payload = new HttpEntity<>(failedRequest.getPayload(), headers);
ResponseEntity<String> response = restTemplate.exchange(failedRequest.getUrl(), failedRequest.getMethod(), payload, String.class);
Conclusion
Retrying failed requests in batch jobs requires a robust and generic approach, especially when dealing with heterogeneous payloads. By capturing request details with the FailedRequest object and leveraging the DryRunClientHttpRequestInterceptor, you can efficiently serialize and store failed requests. The choice of storage, whether it's a database or a message queue, depends on your architectural needs. Whether you opt for local or asynchronous retries, the approach outlined here provides a flexible and efficient way to manage and retry failed requests in your batch processing pipelines.