# PostgreSQL JSON Benchmark
This page documents a representative `db-import` benchmark for loading denormalized PostgreSQL rows into Redis 8 as JSON documents while a RediSearch JSON index is active. The benchmark estimates cold-load behavior for appointment-search-style workloads in which each source row produces one Redis JSON document.
## Environment

| Component | Value |
|---|---|
| Source | PostgreSQL 16 in Docker |
| Target | Redis 8.6.2 in Docker, with the RediSearch JSON index below active during the load |
| RIOT-X command | `db-import` writing each row with `json.set` (full command below) |
| Documents | 1,000,000 synthetic appointment-like JSON documents |
| Average JSON size | 4,811 bytes |
| RIOT-X settings | `--fetch 10000 --batch 500 --threads 8 --pool 8` |

> This benchmark used local Docker containers on one development machine. Use it as sizing guidance, then validate on your actual PostgreSQL replica, network path, and Redis deployment.
## Source Table

The benchmark source table stores the Redis key field and the JSON document text that `db-import` writes with `json.set`.
```sql
DROP TABLE IF EXISTS public.appointment_json_benchmark;

CREATE UNLOGGED TABLE public.appointment_json_benchmark (
    appointment_id BIGINT PRIMARY KEY,
    org_id         INTEGER NOT NULL,
    doc            TEXT NOT NULL
);
```
The test data was inserted in 100,000-row batches. Each batch used `generate_series(:row_start, :row_end)`:

```sql
INSERT INTO public.appointment_json_benchmark (appointment_id, org_id, doc)
SELECT
    gs AS appointment_id,
    CASE WHEN gs <= 90000 THEN 42 ELSE ((gs % 1690) + 1)::integer END AS org_id,
    jsonb_build_object(
        'appointmentId', gs,
        'orgId', CASE WHEN gs <= 90000 THEN 42 ELSE ((gs % 1690) + 1)::integer END,
        'status', (ARRAY['scheduled','checked_in','loaded','cancelled','completed'])[(gs % 5) + 1],
        'startTime', to_char(timestamp '2026-01-01' + ((gs % 63072000) * interval '1 second'), 'YYYY-MM-DD"T"HH24:MI:SS"Z"'),
        'endTime', to_char(timestamp '2026-01-01' + ((gs % 63072000) * interval '1 second') + interval '2 hours', 'YYYY-MM-DD"T"HH24:MI:SS"Z"'),
        'referenceNumber', 'REF-' || lpad(gs::text, 12, '0'),
        'createdBy', jsonb_build_object('id', gs % 50000, 'name', 'Created User ' || (gs % 50000), 'email', 'created' || (gs % 50000) || '@example.com'),
        'updatedBy', jsonb_build_object('id', gs % 50000, 'name', 'Updated User ' || (gs % 50000), 'email', 'updated' || (gs % 50000) || '@example.com'),
        'dock', jsonb_build_object('id', gs % 25000, 'name', 'Dock ' || (gs % 25000), 'doorNumber', (gs % 400)::text),
        'warehouse', jsonb_build_object('id', gs % 5000, 'name', 'Warehouse ' || (gs % 5000), 'timezone', 'America/Chicago'),
        'loadType', jsonb_build_object('id', gs % 500, 'name', 'Load Type ' || (gs % 500)),
        'carrier', jsonb_build_object('id', gs % 80000, 'name', 'Carrier User ' || (gs % 80000), 'email', 'carrier' || (gs % 80000) || '@example.com'),
        'company', jsonb_build_object('id', gs % 15000, 'name', 'Company ' || (gs % 15000), 'scac', 'SC' || lpad((gs % 9999)::text, 4, '0')),
        'customForm', jsonb_build_object(
            'id', gs % 200000,
            'templateId', gs % 250,
            'answers', jsonb_build_array(
                jsonb_build_object('field', 'po', 'value', 'PO-' || gs),
                jsonb_build_object('field', 'temperature', 'value', (gs % 80)::text)
            )
        ),
        'customFields', jsonb_build_object(
            'priority', (gs % 10),
            'searchText', left(repeat(md5(gs::text || ':custom'), (3900 / 32) + 1), 3900)
        )
    )::text AS doc
FROM generate_series(:row_start::bigint, :row_end::bigint) AS gs;
```
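The batches can be driven from a shell loop. A minimal sketch, assuming the `INSERT` statement above is saved as `insert_batch.sql` (a hypothetical file name) and the database is reachable with the credentials used by the RIOT-X command below:

```bash
# Load 1,000,000 rows in ten 100,000-row batches by binding the psql
# variables :row_start and :row_end referenced in insert_batch.sql.
for start in $(seq 1 100000 1000000); do
  end=$((start + 99999))
  psql 'postgresql://riotx:riotx@localhost:15432/riotx_bench' \
    -v row_start="$start" -v row_end="$end" \
    -f insert_batch.sql
done
```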
The generated corpus measured as follows:

```sql
SELECT count(*) AS rows,
       round(avg(length(doc))::numeric, 2) AS avg_doc_bytes,
       min(length(doc)) AS min_doc_bytes,
       max(length(doc)) AS max_doc_bytes,
       pg_size_pretty(pg_total_relation_size('public.appointment_json_benchmark'::regclass)) AS table_size
FROM public.appointment_json_benchmark;
```
| Metric | Value |
|---|---|
| Rows | 1,000,000 |
| Average document size | 4,811 bytes |
| Minimum document size | 4,732 bytes |
| Maximum document size | 4,820 bytes |
| PostgreSQL table size | 673 MB |
## Redis 8 Index

The Redis 8 target had this JSON index in place before the import started:

```
FT.CREATE idx:appointments ON JSON PREFIX 1 appointment: SCHEMA \
  '$.orgId' AS orgId NUMERIC SORTABLE \
  '$.status' AS status TAG \
  '$.startTime' AS startTime TAG \
  '$.referenceNumber' AS referenceNumber TAG \
  '$.dock.name' AS dockName TAG \
  '$.warehouse.name' AS warehouseName TAG \
  '$.carrier.name' AS carrierName TAG \
  '$.company.name' AS companyName TAG \
  '$.customFields.searchText' AS customSearchText TEXT
```
`referenceNumber` and the `name` fields are modeled as `TAG` fields here. When exact-match or faceted lookup semantics are sufficient, this reduces index records and memory compared with indexing those fields as `TEXT`.
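For example, exact-match and faceted queries against these `TAG` fields look like the following sketch (assuming `redis-cli` against the benchmark instance on port 16380; the reference number shown is the one the generator above produces for `gs = 42`):

```bash
# Exact-match lookup on a TAG field; the hyphen must be escaped in TAG syntax.
redis-cli -p 16380 FT.SEARCH idx:appointments \
  '@referenceNumber:{REF\-000000000042}' RETURN 1 '$.status'

# Faceted count of appointment statuses for one org via the NUMERIC orgId field.
redis-cli -p 16380 FT.AGGREGATE idx:appointments '@orgId:[42 42]' \
  GROUPBY 1 @status REDUCE COUNT 0 AS status_count
```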
## RIOT-X Command

The import query selected only the key field and the JSON text:

```sql
SELECT appointment_id, doc
FROM public.appointment_json_benchmark
```
The RIOT-X command wrote each row as a JSON document:

```bash
riotx db-import \
  'SELECT appointment_id, doc FROM public.appointment_json_benchmark' \
  -u redis://localhost:16380 \
  --pool 8 \
  --jdbc-url 'jdbc:postgresql://localhost:15432/riotx_bench?ApplicationName=riotx-appointment-json-tag-benchmark' \
  --jdbc-user riotx \
  --jdbc-pass riotx \
  --fetch 10000 \
  --batch 500 \
  --threads 8 \
  json.set 'appointment:#{appointment_id}' --value doc
```
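A quick way to verify the load afterward is to spot-check the key count and one document; a minimal sketch, assuming `redis-cli` on the same port:

```bash
# Total key count should equal the 1,000,000 source rows.
redis-cli -p 16380 DBSIZE

# Inspect one document written by json.set 'appointment:#{appointment_id}'.
redis-cli -p 16380 JSON.GET appointment:1 '$.referenceNumber'
```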
## Results

| Metric | Value |
|---|---|
| Import time | 199 seconds |
| Throughput | 5,025 documents/second |
| Redis keys | 1,000,000 |
| Indexed documents | 1,000,000 |
| Redis memory used | 10.63 GB |
Selected `FT.INFO` values after the load:

| Metric | Value |
|---|---|
| `num_docs` | 1,000,000 |
| `num_records` | 9,066,141 |
| `records_per_doc_avg` | 9.07 |
| | 284 MB |
| | 80 MB |
| | 23 MB |
| | 31 MB |
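These statistics come from `FT.INFO`. When `redis-cli` output is piped, each reply element prints on its own line, so individual fields can be pulled with `grep`; a sketch:

```bash
# Print selected index statistics; in piped (raw) redis-cli output each
# field name is followed by its value on the next line.
redis-cli -p 16380 FT.INFO idx:appointments | \
  grep -A1 -E '^(num_docs|num_records|records_per_doc_avg)$'
```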
## Full-Corpus Estimate

At 5,025 documents/second, a 36,900,717-document cold load projects to:

```
36,900,717 docs / 5,025 docs/sec = 7,343 seconds ≈ 2.04 hours
```
For production planning, include headroom for network latency, Redis topology, index complexity, source SQL cost, and any rate limit needed to protect the PostgreSQL replica. A practical estimate for this workload is:
| Scenario | Estimated duration |
|---|---|
| Local benchmark extrapolation | ~2 hours |
| Typical production planning range | 2.5-4 hours |
| Conservative planning range | 4-6 hours |
To complete 36,900,717 documents within 8 hours, RIOT-X must sustain about 1,281 documents/second:

```
36,900,717 docs / (8 * 60 * 60) sec ≈ 1,281 docs/sec
```
The measured local throughput is about 3.9x that threshold.
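The same arithmetic generalizes to other corpus sizes and measured rates; a small sketch using the values from this page:

```bash
# Project cold-load duration in hours from corpus size and measured throughput.
awk -v docs=36900717 -v rate=5025 \
  'BEGIN { printf "%.2f hours\n", docs / rate / 3600 }'   # prints 2.04 hours
```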
Operational Notes
-
Run cold-load benchmarks against a PostgreSQL replica, not the primary database.
-
Use a positive JDBC fetch size. For large result sets,
--fetch 10000avoids loading the full result set upfront. -
Treat
--batch 500 --threads 8 --pool 8as a starting point, not a universal optimum. In local tests, larger thread counts reduced throughput. -
If the PostgreSQL replica needs protection, use
--rateto cap write throughput. For example, 2,000 documents/second still completes 36.9M documents in about 5.1 hours. -
Validate memory with production-like documents and the production index schema. Synthetic large text fields can materially change RediSearch memory usage.
-
Create the RediSearch index before the benchmark when you need to measure indexing cost as part of cold-load time.
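As referenced in the `--rate` note above, a capped variant of the benchmark command might look like this sketch (it reuses the flags documented earlier and assumes `--rate` accepts a documents-per-second value):

```bash
# Rate-limited import capped at 2,000 docs/sec to protect the replica;
# 36.9M documents at this cap finish in roughly 5.1 hours.
riotx db-import \
  'SELECT appointment_id, doc FROM public.appointment_json_benchmark' \
  -u redis://localhost:16380 \
  --pool 8 \
  --jdbc-url 'jdbc:postgresql://localhost:15432/riotx_bench' \
  --jdbc-user riotx \
  --jdbc-pass riotx \
  --fetch 10000 \
  --batch 500 \
  --threads 8 \
  --rate 2000 \
  json.set 'appointment:#{appointment_id}' --value doc
```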