Big Keys

When replicating large datasets, you may encounter keys too large for RIOT to handle efficiently in memory. This page explains the problem and shows how to migrate such keys with the provided utility scripts.

The Problem

RIOT reads key values into memory before writing them to the target Redis instance. For very large keys (e.g., strings or collections exceeding hundreds of megabytes), this can cause:

  • Out-of-memory errors

  • Excessive GC pauses

  • Slow replication throughput

To prevent these issues, RIOT provides the --mem-limit option, which skips keys exceeding the specified size. These skipped keys still need to be migrated separately.

Solution: Two-Phase Migration

The recommended approach is a two-phase migration:

  1. Phase 1: Use riotx replicate with --mem-limit to efficiently replicate all keys below the threshold

  2. Phase 2: Use migrate-big-keys.py to migrate the remaining large keys

The script automatically chooses the best migration strategy:

  • Keys < 2GB: Uses Redis DUMP/RESTORE for fast atomic transfer

  • Keys ≥ 2GB: Uses chunked migration with type-specific commands

This approach gives you the best of both worlds: fast parallel replication for most keys, and reliable chunked transfer for oversized keys.
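The size-based dispatch described above can be sketched as follows. The function and constant names are illustrative, not the script's actual API:

```python
# Sketch of the strategy selection described above. Names are
# illustrative; they are not migrate-big-keys.py's real internals.

DUMP_RESTORE_LIMIT = 2 * 1024**3  # 2GB: practical ceiling for DUMP/RESTORE payloads

def choose_strategy(size_bytes: int) -> str:
    """Pick a migration method based on the key's reported memory usage."""
    if size_bytes < DUMP_RESTORE_LIMIT:
        return "dump-restore"   # single atomic DUMP on source, RESTORE on target
    return "chunked"            # type-specific commands (ZSCAN, HSCAN, GETRANGE, ...)
```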

Usage

Phase 1: Replicate Small Keys

riotx replicate redis://source:6379 redis://target:6379 --mem-limit 512MB

Keys exceeding 512MB will be reported as MISSING in the final verification.

Phase 2: Migrate Big Keys

The script requires Python 3.6+ and the redis package:

python3 -m venv .venv
source .venv/bin/activate
pip install redis

Then run:

python3 migrate-big-keys.py source:6379 target:6379 --mem-limit 512MB --replace

The script will:

  1. Scan for keys exceeding the memory limit

  2. For each key, choose the appropriate migration method:

    • Keys < 2GB: Use DUMP/RESTORE

    • Keys ≥ 2GB: Use chunked migration (ZSCAN, HSCAN, GETRANGE, etc.)

  3. Preserve TTLs during migration
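The DUMP/RESTORE path with TTL preservation can be sketched like this, assuming redis-py style clients (src and dst are hypothetical client objects; function names are illustrative, not the script's code):

```python
# Sketch of DUMP/RESTORE with TTL preservation, assuming redis-py style
# clients (src.dump / src.pttl / dst.restore). Illustrative only.

def restore_ttl_ms(pttl: int) -> int:
    """Map a PTTL result to RESTORE's ttl argument.

    PTTL returns -1 for keys with no expiry (and -2 for missing keys);
    RESTORE expects 0 to mean 'no expiry'.
    """
    return pttl if pttl > 0 else 0

def migrate_small_key(src, dst, key: str, replace: bool = True) -> None:
    payload = src.dump(key)      # serialized value from the source (DUMP)
    if payload is None:
        return                   # key disappeared between scan and migration
    ttl = restore_ttl_ms(src.pttl(key))
    dst.restore(key, ttl, payload, replace=replace)  # atomic RESTORE on target
```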

Migrate a Specific Key

To migrate a single key without scanning (useful for very large keys):

python3 migrate-big-keys.py source:6379 target:6379 --key "my:large:key" --replace --verbose

Dry Run

To see which keys would be migrated without actually transferring them:

python3 migrate-big-keys.py source:6379 target:6379 --mem-limit 512MB --dry-run

Example output:

[INFO] Memory limit: 512MB (536870912 bytes)
[STEP] Testing Redis connections...
[INFO] Both Redis connections successful
[STEP] Scanning for keys exceeding 512MB...
[INFO] Scanned 1000 keys, found 3 keys exceeding 512MB
[INFO] [DRY RUN] Would migrate 3 keys:
  large:blob:1 (750.25MB)
  large:blob:2 (612.50MB)
  large:collection (1.20GB)

Script Options

--mem-limit <size>
  Memory limit threshold (e.g., 512MB, 1GB). Required unless --key is used.

--key <keyname>
  Migrate a specific key without scanning. Useful for very large keys.

--replace
  Replace existing keys on target. Without this flag, keys that already exist on target will fail to restore.

--dry-run
  Show what would be migrated without actually doing it.

--key-pattern <pattern>
  Only scan keys matching this pattern (default: *).

--verbose
  Enable verbose output showing progress for each key.

--scan-count <n>
  Number of keys to fetch per SCAN call (default: 100).

--chunk-size <n>
  Batch size for chunked migration (default: 10000). Applies to collection types.

The script accepts URIs with or without the redis:// prefix (e.g., source:6379 or redis://source:6379).
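Sizes such as 512MB or 1GB are resolved to byte counts. A minimal parser for the forms shown on this page might look like the following (the script's actual parsing may differ):

```python
import re

# Illustrative parser for --mem-limit style sizes (e.g., "512MB", "1GB",
# "100mb"). Uses binary units: 1KB = 1024 bytes.

_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def parse_size(text: str) -> int:
    """Convert a human-readable size string to bytes."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([KMGT]?B)?", text.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = m.groups()
    return int(float(value) * _UNITS[(unit or "B").upper()])
```

For example, parse_size("512MB") yields 536870912, matching the dry-run output shown below.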

Chunked Migration

For keys larger than 2GB, the script uses chunked migration instead of DUMP/RESTORE to avoid memory and protocol limits.

Supported Types

Type     Method
zset     ZSCAN → ZADD in batches
hash     HSCAN → HSET in batches
set      SSCAN → SADD in batches
list     LRANGE pagination → RPUSH in batches
stream   XRANGE pagination → XADD in batches
string   GETRANGE → SET + APPEND (64MB chunks)
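For sorted sets, the ZSCAN → ZADD path can be sketched as follows, assuming redis-py clients (function names are illustrative, not the script's code):

```python
# Sketch of chunked zset migration, assuming redis-py clients:
# zscan_iter yields (member, score) pairs; ZADD takes a {member: score} map.

def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch

def migrate_zset_chunked(src, dst, key: str, chunk_size: int = 10000) -> None:
    for batch in batched(src.zscan_iter(key, count=chunk_size), chunk_size):
        dst.zadd(key, dict(batch))  # each ZADD stays well under protocol limits
```

The other collection types follow the same pattern with their own scan and write commands.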

Features

  • Automatic retry: Retries up to 3 times on connection errors with exponential backoff

  • Progress reporting: Shows progress every 5 seconds during long migrations

  • TTL preservation: Restores TTL after migration completes

  • Long timeout: 1-hour socket timeout with keepalive for very large keys
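The retry behavior above can be sketched like this; the retry count and base delay mirror the description, but the helper itself is illustrative:

```python
import time

# Sketch of retry with exponential backoff on connection errors.
# Three attempts with delays of 1s, then 2s, matching the behavior
# described above; not the script's actual code.

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(); on ConnectionError retry with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise               # out of retries: surface the error
            sleep(base_delay * (2 ** attempt))
```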

Limitations

Keys Using DUMP/RESTORE (< 2GB)

Keys smaller than 2GB use DUMP/RESTORE, which may require increasing proto-max-bulk-len (default 512MB) on both instances for keys larger than 512MB:

redis-cli -h source CONFIG SET proto-max-bulk-len 2147483648
redis-cli -h target CONFIG SET proto-max-bulk-len 2147483648

Unsupported Types for Chunked Migration

Module types (e.g., ReJSON, TimeSeries) fall back to DUMP/RESTORE and may fail if larger than 2GB.

Performance

The script migrates keys sequentially, which is slower than RIOT’s parallel pipeline approach. For datasets with many large keys, expect longer migration times.

Typical performance:

  • ~500MB key (DUMP/RESTORE): 3-5 seconds

  • ~1GB key (DUMP/RESTORE): 10-15 seconds

  • ~10GB+ key (chunked): Several minutes depending on network speed

Complete Example

Migrate a database with mixed key sizes:

# Phase 1: Fast parallel replication for keys under 256MB
riotx replicate redis://source:6379 redis://target:6379 \
  --mem-limit 256MB \
  --read-threads 8 \
  --threads 4

# Phase 2: Sequential migration for large keys
python3 migrate-big-keys.py source:6379 target:6379 \
  --mem-limit 256MB \
  --replace \
  --verbose

# Verify
riotx compare redis://source:6379 redis://target:6379

Migrating a Single Very Large Key

For keys that are tens of gigabytes:

python3 migrate-big-keys.py source:6379 target:6379 \
  --key "my:huge:sorted:set" \
  --replace \
  --verbose

Analyzing Big Keys with keystats.py

The keystats.py script identifies big keys and analyzes memory distribution across cluster slots before migration. Unlike redis-cli --bigkeys, which samples keys, this script scans ALL keys.

Installation

pip install redis

Finding Big Keys (memkeys)

To find all keys above a memory threshold:

python3 keystats.py memkeys -u redis://source:6379 --mem-limit 100mb > bigkeys.csv

Output (CSV):

key,size_bytes
"my:large:hash",39461511168
"my:large:zset",20951752704

Analyzing Slot Distribution (slots)

To see how memory is distributed across slot ranges (useful for planning cluster migrations):

python3 keystats.py slots -u redis://source:6379 --shards 140 > slots.csv

Output (CSV):

shard,slot_start,slot_end,size_bytes
0,0,116,1388716
1,117,233,5366755
105,12285,12401,43770000000

This helps identify hotspots where multiple large keys land in the same shard.
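The slot for any key can be computed offline. The CRC16/keyslot algorithm below is the one documented in the Redis Cluster specification; shard_of is an assumed mapping consistent with the sample output above (fixed-width ranges, remainder on the last shard), and keystats.py's own implementation may differ:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def keyslot(key: str) -> int:
    """Redis Cluster slot: CRC16 of the key (or its {hash tag}) mod 16384."""
    k = key.encode()
    start = k.find(b"{")
    if start != -1:
        end = k.find(b"}", start + 1)
        if end > start + 1:          # only a non-empty {tag} is hashed
            k = k[start + 1:end]
    return crc16_xmodem(k) % 16384

def shard_of(slot: int, shards: int) -> int:
    """Assumed slot→shard mapping matching the sample CSV above."""
    width = 16384 // shards          # e.g., 117 slots per shard for 140 shards
    return min(slot // width, shards - 1)
```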

keystats.py Options

memkeys
  Subcommand to find keys above a memory threshold

slots
  Subcommand to analyze memory distribution across slot ranges

-u, --uri <uri>
  Redis URI (e.g., redis://host:6379 or rediss:// for TLS)

-h, --host <host>
  Redis host (default: localhost)

-p, --port <port>
  Redis port (default: 6379)

-a, --password <pass>
  Redis password

--user <username>
  ACL username (Redis 6+)

--tls
  Enable TLS connection

--mem-limit <size>
  Memory threshold for memkeys (default: 100mb)

--shards <n>
  Number of shards for slots (default: 100)

--scan-count <n>
  SCAN batch size (default: 1000)

Example: Pre-Migration Analysis

Before migrating to a 140-shard cluster with 15GB RAM per shard:

# Find all keys that would exceed per-shard RAM limit
python3 keystats.py memkeys -u redis://source:6379 --mem-limit 15gb

# Analyze slot distribution to identify potential hotspots
python3 keystats.py slots -u redis://source:6379 --shards 140 > slots.csv
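The memkeys CSV can then be checked against the per-shard budget. A minimal sketch, assuming the CSV format shown earlier (oversized_keys is a hypothetical helper, not part of keystats.py):

```python
import csv
import io

# Sketch: flag rows in a keystats.py memkeys CSV whose size exceeds a
# per-shard RAM budget. Assumes the "key,size_bytes" format shown above.

def oversized_keys(csv_text: str, limit_bytes: int):
    """Return [(key, size_bytes)] rows exceeding the limit."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["key"], int(row["size_bytes"]))
            for row in reader
            if int(row["size_bytes"]) > limit_bytes]
```

Any key flagged here cannot fit on a single shard as-is and needs to be restructured (e.g., split with hash tags) before the cluster migration.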