Automatic Failover and Failback with Jedis
Jedis supports failover and failback for your Redis deployments. This is useful when:

1. You have more than one Redis deployment. This might include two independent Redis servers, or two or more Redis databases replicated across multiple active-active Redis Enterprise clusters.
2. You want your application to connect to and use one deployment at a time.
3. You want your application to fail over to the next available deployment if the current deployment becomes unavailable.
4. You want your application to fail back to the original deployment when it becomes available again.
Jedis will fail over to a subsequent Redis deployment after reaching a configurable failure threshold. This failure threshold is implemented using a circuit breaker pattern.
You can also configure Jedis to retry failed calls to Redis. Once the maximum number of retries has been exhausted, the circuit breaker will record a failure. When the circuit breaker reaches its failure threshold, a failover will be triggered on the subsequent operation. In the background, Jedis executes the configured health checks to determine when a Redis deployment is available again. When this occurs, Jedis will fail back to the original deployment after a configurable grace period.
The remainder of this guide describes:
- A basic failover and health check configuration
- Supported retry and circuit breaker settings
- Failback and the cluster selection API
We recommend that you read this guide carefully and understand the configuration settings before enabling Jedis failover in production.
Basic usage
To configure Jedis for failover, you specify a weighted list of Redis databases. Jedis will connect to the Redis database in the list with the highest weight. If the highest-weighted database becomes unavailable, Jedis will attempt to connect to the database with the next highest weight in the list, and so on.
Suppose you run two Redis deployments. We'll call them `redis-east` and `redis-west`. You want your application to first connect to `redis-east`. If `redis-east` becomes unavailable, you want your application to connect to `redis-west`. Let's look at one way of configuring Jedis for this scenario.
First, create an array of `ClusterConfig` objects, one for each Redis database.
```java
JedisClientConfig config = DefaultJedisClientConfig.builder().user("cache").password("secret")
    .socketTimeoutMillis(5000).connectionTimeoutMillis(5000).build();

ConnectionPoolConfig poolConfig = new ConnectionPoolConfig();
poolConfig.setMaxTotal(8);
poolConfig.setMaxIdle(8);
poolConfig.setMinIdle(0);
poolConfig.setBlockWhenExhausted(true);
poolConfig.setMaxWait(Duration.ofSeconds(1));
poolConfig.setTestWhileIdle(true);
poolConfig.setTimeBetweenEvictionRuns(Duration.ofSeconds(1));

MultiClusterClientConfig.ClusterConfig[] clusterConfig = new MultiClusterClientConfig.ClusterConfig[2];

HostAndPort east = new HostAndPort("redis-east.example.com", 14000);
clusterConfig[0] = ClusterConfig.builder(east, config).connectionPoolConfig(poolConfig).weight(1.0f).build();

HostAndPort west = new HostAndPort("redis-west.example.com", 14000);
clusterConfig[1] = ClusterConfig.builder(west, config).connectionPoolConfig(poolConfig).weight(0.5f).build();
```
The configuration above represents your two Redis deployments: `redis-east` and `redis-west`.

You'll use this array of configuration objects to create a connection provider that supports failover. Use the `MultiClusterClientConfig` builder to set your preferred retry and failover configuration, passing in the cluster configs you just created. Then build a `MultiClusterPooledConnectionProvider`.
```java
MultiClusterClientConfig.Builder builder = new MultiClusterClientConfig.Builder(clusterConfig);

builder.circuitBreakerSlidingWindowSize(10); // Sliding window size in number of calls
builder.circuitBreakerSlidingWindowMinCalls(1);
builder.circuitBreakerFailureRateThreshold(50.0f); // Percentage of failures to trigger circuit breaker

builder.failbackSupported(true); // Enable failback
builder.failbackCheckInterval(1000); // Check the unhealthy cluster every second to see if it has recovered
builder.gracePeriod(10000); // Keep a cluster disabled for 10 seconds after it becomes unhealthy

// Optional: configure retry settings
builder.retryMaxAttempts(3); // Maximum number of retry attempts (including the initial call)
builder.retryWaitDuration(500); // Number of milliseconds to wait between retry attempts
builder.retryWaitDurationExponentialBackoffMultiplier(2); // Exponential backoff factor applied to the wait duration between retries

// Optional: configure fast failover
builder.fastFailover(true); // Force-close connections to the unhealthy cluster on failover
builder.retryOnFailover(false); // Do not retry failed commands during failover

MultiClusterPooledConnectionProvider provider = new MultiClusterPooledConnectionProvider(builder.build());
```
Internally, the connection provider uses a highly configurable circuit breaker and retry implementation to determine when to fail over. In the configuration here, we've set a sliding window size of 10 and a failure rate threshold of 50%. This means that a failover will be triggered if 5 out of any 10 calls to Redis fail.
Once you've configured and created a `MultiClusterPooledConnectionProvider`, instantiate a `UnifiedJedis` instance for your application, passing in the provider you just created:
```java
UnifiedJedis jedis = new UnifiedJedis(provider);
```
You can now use this `UnifiedJedis` instance, and the connection management and failover will be handled transparently.
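For example, here's a minimal sketch of everyday usage (the key and value are illustrative): you issue commands as usual, and Jedis routes them to whichever cluster is currently active, retrying and failing over behind the scenes.

```java
// Ordinary commands are routed to the currently active cluster.
jedis.set("session:42", "active");      // initially served by redis-east
String value = jedis.get("session:42"); // after a failover, served by the newly active cluster
System.out.println(value);
```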
Configuration options
Under the hood, Jedis' failover support relies on resilience4j, a fault-tolerance library that implements retry and circuit breakers.
Once you configure Jedis for failover using the `MultiClusterPooledConnectionProvider`, each call to Redis is decorated with a resilience4j retry and circuit breaker.

By default, any call that throws a `JedisConnectionException` will be retried up to 3 times. If the call continues to fail after the maximum number of retry attempts, the circuit breaker will record a failure.

The circuit breaker maintains a record of failures in a sliding window data structure. If the failure rate reaches a configured threshold (e.g., when 50% of the last 10 calls have failed), the circuit breaker's state transitions from `CLOSED` to `OPEN`. When this occurs, Jedis will attempt to connect to the Redis database with the next highest weight in its client configuration list.
The supported retry and circuit breaker settings, and their default values, are described below. You can configure any of these settings using the `MultiClusterClientConfig.Builder` builder. Refer to the basic usage section above for an example of this.
Retry configuration
Jedis uses the following retry settings:
Setting | Default value | Description |
---|---|---|
Max retry attempts | 3 | Maximum number of retry attempts (including the initial call) |
Retry wait duration | 500 ms | Number of milliseconds to wait between retry attempts |
Wait duration backoff multiplier | 2 | Exponential backoff factor multiplied against wait duration between retries. For example, with a wait duration of 1 second and a multiplier of 2, the retries would occur after 1s, 2s, 4s, 8s, 16s, and so on. |
Retry included exception list | [JedisConnectionException] | A list of Throwable classes that count as failures and should be retried. |
Retry ignored exception list | null | A list of Throwable classes to explicitly ignore for the purposes of retry. |
To disable retry, set `maxRetryAttempts` to 1.
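For example, here's a sketch of tuning these retry settings with the builder methods shown in the basic usage example (the values are illustrative):

```java
MultiClusterClientConfig.Builder builder = new MultiClusterClientConfig.Builder(clusterConfig);

builder.retryMaxAttempts(1);    // Only the initial call is made, i.e. retry is disabled
builder.retryWaitDuration(500); // Wait 500 ms between attempts (when retries are enabled)
builder.retryWaitDurationExponentialBackoffMultiplier(2); // Double the wait after each failed attempt
```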
Circuit breaker configuration
Jedis uses the following circuit breaker settings:
Setting | Default value | Description |
---|---|---|
Sliding window type | COUNT_BASED | The type of sliding window used to record the outcome of calls. Options are COUNT_BASED and TIME_BASED. |
Sliding window size | 100 | The size of the sliding window. Units depend on sliding window type. When COUNT_BASED, the size represents number of calls. When TIME_BASED, the size represents seconds. |
Sliding window min calls | 100 | Minimum number of calls required (per sliding window period) before the circuit breaker will start calculating the error rate or slow call rate. |
Failure rate threshold | 50.0f | Percentage of calls within the sliding window that must fail before the circuit breaker transitions to the OPEN state. |
Slow call duration threshold | 60000 ms | Duration threshold above which calls are classified as slow and added to the sliding window. |
Slow call rate threshold | 100.0f | Percentage of calls within the sliding window that exceed the slow call duration threshold before the circuit breaker transitions to the OPEN state. |
Circuit breaker included exception list | [JedisConnectionException] | A list of Throwable classes that count as failures and add to the failure rate. |
Circuit breaker ignored exception list | null | A list of Throwable classes to explicitly ignore for failure rate calculations. |
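As a sketch of how these settings are applied, the first three builder calls below use the `circuitBreaker*` methods from the basic usage example; the time-based window setter and the resilience4j `SlidingWindowType` enum are assumptions to verify against your Jedis version:

```java
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig.SlidingWindowType;

MultiClusterClientConfig.Builder builder = new MultiClusterClientConfig.Builder(clusterConfig);

builder.circuitBreakerSlidingWindowSize(10);       // Evaluate the most recent 10 calls
builder.circuitBreakerSlidingWindowMinCalls(5);    // Require at least 5 recorded calls before computing a rate
builder.circuitBreakerFailureRateThreshold(50.0f); // Open the circuit when 50% of those calls fail

// Assumed setter: switch to a time-based window, where the size is measured in seconds
builder.circuitBreakerSlidingWindowType(SlidingWindowType.TIME_BASED);
```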
Health Check Configuration and Customization
The `MultiClusterPooledConnectionProvider` includes a comprehensive health check system that continuously monitors the availability of Redis clusters to enable automatic failover and failback.
The health check system serves several critical purposes in the failover architecture:
- Proactive Monitoring: Continuously monitors passive clusters that aren't currently receiving traffic
- Failback Detection: Determines when a previously failed cluster has recovered and is ready to accept traffic
- Circuit Breaker Integration: Works with the circuit breaker pattern to manage cluster state transitions
- Customizable Strategies: Supports pluggable health check implementations for different deployment scenarios
The health check system operates independently of your application traffic, running background checks at configurable intervals to assess cluster health without impacting performance.
Available Health Check Types
1. EchoStrategy (Default)
The `EchoStrategy` is the default health check implementation. It uses Redis's `ECHO` command to verify both connectivity and basic command execution.
Use Cases:

- General-purpose health checking for most Redis deployments
- Verifying that the server is reachable and responding to commands
- Simple connectivity validation
How it works:

- Sends an `ECHO "HealthCheck"` command to the Redis server
- Expects the exact response `"HealthCheck"` to consider the server healthy
- Any exception or unexpected response marks the server as unhealthy
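Conceptually, each probe boils down to something like the following sketch (the real `EchoStrategy` manages its own connection and scheduling; `hostAndPort` and `clientConfig` stand in for your cluster's endpoint and client configuration):

```java
// Illustration only: what a single echo-based health probe amounts to.
try (UnifiedJedis probe = new UnifiedJedis(hostAndPort, clientConfig)) {
    String reply = probe.echo("HealthCheck");
    // Healthy only if the server echoes the payload back exactly
    System.out.println("HealthCheck".equals(reply) ? "HEALTHY" : "UNHEALTHY");
} catch (Exception e) {
    System.out.println("UNHEALTHY"); // Any exception counts as unhealthy
}
```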
2. LagAwareStrategy [PREVIEW] (Redis Enterprise)
The `LagAwareStrategy` is designed specifically for Redis Enterprise Active-Active deployments. It uses the Redis Enterprise REST API to check database availability and replication lag.
Use Cases:

- Redis Enterprise Active-Active (CRDB) deployments
- Scenarios where replication lag tolerance is critical
- Enterprise environments with REST API access
How it works:

- Queries the Redis Enterprise REST API for database availability
- Optionally validates replication lag against configurable thresholds
- Automatically discovers database IDs based on endpoint hostnames
Example Configuration:
```java
BiFunction<HostAndPort, Supplier<RedisCredentials>, MultiClusterClientConfig.StrategySupplier> healthCheckStrategySupplier =
    (HostAndPort clusterHostPort, Supplier<RedisCredentials> credentialsSupplier) -> {
      LagAwareStrategy.Config lagConfig = LagAwareStrategy.Config.builder(clusterHostPort, credentialsSupplier)
          .interval(5000)             // Check every 5 seconds
          .timeout(3000)              // 3 second timeout
          .extendedCheckEnabled(true) // Enable the extended (replication lag) check
          .build();
      return (hostAndPort, jedisClientConfig) -> new LagAwareStrategy(lagConfig);
    };

// Configure REST API endpoint and credentials
HostAndPort restEndpoint = new HostAndPort("redis-enterprise-cluster-fqdn", 9443);
Supplier<RedisCredentials> credentialsSupplier = () ->
    new DefaultRedisCredentials("rest-api-user", "pwd");

MultiClusterClientConfig.StrategySupplier lagawareStrategySupplier = healthCheckStrategySupplier.apply(
    restEndpoint, credentialsSupplier);

MultiClusterClientConfig.ClusterConfig clusterConfig =
    MultiClusterClientConfig.ClusterConfig.builder(hostAndPort, clientConfig)
        .healthCheckStrategySupplier(lagawareStrategySupplier)
        .build();
```
3. Custom Health Check Strategies
You can implement custom health check strategies by implementing the `HealthCheckStrategy` interface.
Use Cases:

- Application-specific health validation logic
- Integration with external monitoring systems
- Custom performance or latency-based health checks
Use the `healthCheckStrategySupplier()` method to provide a custom health check implementation:
```java
// Custom strategy supplier
MultiClusterClientConfig.StrategySupplier customStrategy =
    (hostAndPort, jedisClientConfig) -> {
      // Return your custom HealthCheckStrategy implementation
      return new MyCustomHealthCheckStrategy(hostAndPort, jedisClientConfig);
    };

MultiClusterClientConfig.ClusterConfig clusterConfig =
    MultiClusterClientConfig.ClusterConfig.builder(hostAndPort, clientConfig)
        .healthCheckStrategySupplier(customStrategy)
        .weight(1.0f)
        .build();
```
For example, here's a custom strategy that uses the `PING` command:
```java
MultiClusterClientConfig.StrategySupplier pingStrategy = (hostAndPort, jedisClientConfig) -> {
  return new HealthCheckStrategy() {

    @Override
    public int getInterval() {
      return 1000; // Check every second
    }

    @Override
    public int getTimeout() {
      return 500; // 500ms timeout
    }

    @Override
    public int minConsecutiveSuccessCount() {
      return 1; // Single success required
    }

    @Override
    public HealthStatus doHealthCheck(Endpoint endpoint) {
      try (UnifiedJedis jedis = new UnifiedJedis(hostAndPort, jedisClientConfig)) {
        String result = jedis.ping();
        return "PONG".equals(result) ? HealthStatus.HEALTHY : HealthStatus.UNHEALTHY;
      } catch (Exception e) {
        return HealthStatus.UNHEALTHY;
      }
    }

    @Override
    public void close() {
      // Cleanup resources if needed
    }
  };
};

MultiClusterClientConfig.ClusterConfig clusterConfig =
    MultiClusterClientConfig.ClusterConfig.builder(hostAndPort, clientConfig)
        .healthCheckStrategySupplier(pingStrategy)
        .build();
```
Disabling Health Checks
Use the `healthCheckEnabled(false)` method to completely disable health checks:
```java
clusterConfig[0] = ClusterConfig.builder(east, config)
    .healthCheckEnabled(false) // Disable health checks entirely
    .build();
```
Fallback configuration
Jedis uses the following fallback settings:
Setting | Default value | Description |
---|---|---|
Fallback exception list | [CallNotPermittedException, JedisConnectionException] | A list of Throwable classes that trigger fallback. |
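As a sketch of overriding this list, the snippet below assumes the builder exposes a `fallbackExceptionList` setter (check `MultiClusterClientConfig.Builder` in your Jedis version); the exception classes themselves are the documented defaults:

```java
import java.util.Arrays;
import io.github.resilience4j.circuitbreaker.CallNotPermittedException;
import redis.clients.jedis.exceptions.JedisConnectionException;

MultiClusterClientConfig.Builder builder = new MultiClusterClientConfig.Builder(clusterConfig);

// Assumed setter: exceptions that should trigger fallback to the next weighted cluster
builder.fallbackExceptionList(Arrays.asList(
    CallNotPermittedException.class,  // Thrown while the circuit breaker is OPEN
    JedisConnectionException.class)); // Thrown on connection failures
```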
Failover callbacks
In the event that Jedis fails over, you may wish to take some action. This might include logging a warning, recording a metric, or externally persisting the cluster connection state, to name just a few examples. For this reason, `MultiClusterPooledConnectionProvider` lets you register a custom callback that will be called whenever Jedis fails over to a new cluster.
To use this feature, you'll need to design a class that implements `java.util.function.Consumer`. This class must implement the `accept` method, as you can see below.
```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.function.Consumer;

public class FailoverReporter implements Consumer<ClusterSwitchEventArgs> {

    private static final Logger logger = LoggerFactory.getLogger(FailoverReporter.class);

    @Override
    public void accept(ClusterSwitchEventArgs e) {
        logger.warn("Jedis failover to cluster: {} due to {}", e.getClusterName(), e.getReason());
    }
}
```
You can then pass an instance of this class to your `MultiClusterPooledConnectionProvider`.
```java
FailoverReporter reporter = new FailoverReporter();
provider.setClusterSwitchListener(reporter);
```
The provider will call your `accept` method whenever a failover occurs.
Failing back
Jedis supports automatic failback based on health checks or manual failback using the cluster selection API.
Failback scenario
When a failover is triggered, Jedis will attempt to connect to the next Redis server based on the weights of the server configurations you provided at setup.
For example, recall the `redis-east` and `redis-west` deployments from the basic usage example above. Jedis will attempt to connect to `redis-east` first. If `redis-east` becomes unavailable (and the circuit breaker transitions), then Jedis will attempt to use `redis-west`. Now suppose that `redis-east` eventually comes back online. You will likely want to fail your application back to `redis-east`.
Automatic failback based on health checks
When health checks are enabled, Jedis automatically monitors the health of all configured clusters, including those that are currently inactive due to previous failures. The automatic failback process works as follows:
- Continuous Monitoring: Health checks run continuously for all clusters, regardless of their current active status
- Recovery Detection: When a previously failed cluster passes the required number of consecutive health checks, it's marked as healthy
- Weight-Based Failback: If automatic failback is enabled and a recovered cluster has a higher weight than the currently active cluster, Jedis will automatically switch to the recovered cluster
- Grace Period Respect: Failback only occurs after the configured grace period has elapsed since the cluster was marked as unhealthy
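These behaviors are controlled by the failback settings shown in the basic usage example; as a brief recap (values illustrative):

```java
MultiClusterClientConfig.Builder builder = new MultiClusterClientConfig.Builder(clusterConfig);

builder.failbackSupported(true);     // Allow automatic failback to a recovered, higher-weight cluster
builder.failbackCheckInterval(1000); // Re-evaluate unhealthy clusters every second
builder.gracePeriod(10000);          // Keep a cluster disabled for 10 seconds after it becomes unhealthy
```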
Manual Failback using the cluster selection API
Once you've determined that it's safe to fail back to a previously-unavailable cluster, you need to decide how to trigger the failback. If you prefer to control this yourself rather than rely on automatic failback, `MultiClusterPooledConnectionProvider` exposes a method that you can use to manually select which cluster Jedis should use.
To select a different cluster to use, pass the cluster's `HostAndPort` to `setActiveCluster()`:
```java
provider.setActiveCluster(west);
```
This method is thread-safe.
If you decide to implement manual failback, you will need a way for external systems to trigger this method in your application. For example, if your application exposes a REST API, you might consider creating a REST endpoint to call `setActiveCluster` and fail back the application.
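As a minimal sketch of that idea using the JDK's built-in HTTP server; the path, port, and handler are purely illustrative and not part of Jedis:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

// Hypothetical admin endpoint that fails the application back to redis-east on demand.
HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
server.createContext("/admin/failback-east", exchange -> {
    provider.setActiveCluster(east); // 'east' is the HostAndPort from the basic usage example
    exchange.sendResponseHeaders(204, -1); // 204 No Content, empty body
    exchange.close();
});
server.start();
```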
Troubleshooting Failover and Failback Issues
Health Checks Always Report Unhealthy
Common causes:

- Timeout too aggressive for network conditions
- Authentication issues with the Redis server
- Network connectivity problems
Solutions:
```java
// Increase timeout values
HealthCheckStrategy.Config config = HealthCheckStrategy.Config.builder()
    .timeout(3000) // Increase from the default 1000ms
    .build();
```
Intermittent Health Check Failures
Solutions:
```java
// Space out checks and allow more time per check for stability
HealthCheckStrategy.Config config = HealthCheckStrategy.Config.builder()
    .interval(5000) // Less frequent checks
    .timeout(2000)  // More generous timeout
    .build();
```
Slow Failback After Recovery
Solutions:
```java
// Faster recovery configuration
HealthCheckStrategy.Config config = HealthCheckStrategy.Config.builder()
    .interval(1000) // More frequent checks
    .build();

// Adjust failback timing
MultiClusterClientConfig multiConfig = new MultiClusterClientConfig.Builder(clusterConfigs)
    .gracePeriod(5000) // Shorter grace period
    .build();
```
Need help or have questions?
For assistance with this automatic failover and failback feature, start a discussion.