Elasticsearch

RIOT-X imports data from Elasticsearch and OpenSearch clusters with the elastic-import command.

Overview

The Elasticsearch integration reads an index page by page with the search_after API and exposes options for connection, authentication, SSL, metadata, and batch size.

Key Features

Efficient Pagination: Uses the search_after API for optimal performance with large datasets
Metadata Control: Configure inclusion/exclusion of Elasticsearch metadata fields
Multi-Host Support: Connect to clustered environments with failover
Authentication: HTTP Basic Authentication support
SSL Configuration: Production-ready SSL/TLS support

Command Syntax

riotx elastic-import <index> [OPTIONS] <redis-command> [REDIS-COMMAND-OPTIONS]

Basic Examples

Import from Elasticsearch

riotx elastic-import my-index --elastic-host http://localhost:9200 json.set doc:#{_id}

Import from OpenSearch

riotx elastic-import my-index --opensearch --elastic-host http://localhost:9200 json.set doc:#{_id}

Configuration Options

Connection Settings

Single Host

riotx elastic-import my-index --elastic-host http://localhost:9200 json.set doc:#{_id}

Multi-Host Clusters

riotx elastic-import my-index --elastic-host https://node1.example.com:9200 https://node2.example.com:9200 https://node3.example.com:9200 json.set doc:#{_id}

Authentication

HTTP Basic Authentication

riotx elastic-import my-index --elastic-host https://elasticsearch.example.com:9200 --elastic-user admin --elastic-pass mypassword json.set doc:#{_id}

SSL Configuration

Production SSL

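Use an https:// host and leave out --elastic-insecure so that certificate verification is not skipped.

riotx elastic-import my-index --elastic-host https://elasticsearch.example.com:9200 --elastic-user admin --elastic-pass mypassword json.set doc:#{_id}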

Skip SSL Verification (Development Only)

riotx elastic-import my-index --elastic-host https://elasticsearch.example.com:9200 --elastic-user admin --elastic-pass mypassword --elastic-insecure json.set doc:#{_id}

Data Control

Metadata Fields

Use --elastic-meta to include Elasticsearch metadata fields (_id, _index, _score) in the imported data.

riotx elastic-import my-index --elastic-host http://localhost:9200 --elastic-meta json.set doc:#{_id}

Performance Optimization

Pagination Strategy

The Elasticsearch ItemReader uses the search_after API for efficient pagination through large datasets:

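Efficient for large datasets: Recommended for indices with more than 10,000 documents, which is the default ceiling for from/size paging (index.max_result_window)
Consistent ordering: Automatically sorts by _id to ensure reliable pagination
Memory efficient: Avoids the deep-paging cost of from/size, where each request must collect and discard every hit that precedes the requested page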
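
The pagination loop described above can be sketched in Python against an in-memory list standing in for an index. This is an illustrative model only, not RIOT-X's implementation: fake_search is a hypothetical stand-in for a search request, INDEX is made-up data, and only the size, sort, and search_after semantics are modeled.

```python
from typing import Any, Dict, Iterator, List, Optional

# Hypothetical in-memory "index": hits with an _id and a _source.
INDEX: List[Dict[str, Any]] = [
    {"_id": f"doc-{i:04d}", "_source": {"n": i}} for i in range(25)
]


def fake_search(size: int, search_after: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    """Stand-in for a search request sorted by _id ascending.

    Returns up to `size` hits whose sort key is strictly greater than
    the `search_after` value, mimicking search_after semantics.
    """
    hits = sorted(INDEX, key=lambda h: h["_id"])
    if search_after is not None:
        hits = [h for h in hits if h["_id"] > search_after[0]]
    # Each hit carries its sort values, as search responses do when a sort is given.
    return [{**h, "sort": [h["_id"]]} for h in hits[:size]]


def scan(batch_size: int) -> Iterator[Dict[str, Any]]:
    """Yield every document by following the last hit's sort values."""
    cursor: Optional[List[str]] = None
    while True:
        page = fake_search(batch_size, cursor)
        if not page:
            return  # an empty page means the index is exhausted
        yield from page
        cursor = page[-1]["sort"]


ids = [hit["_id"] for hit in scan(batch_size=10)]
print(len(ids), ids[0], ids[-1])
```

Because each request only needs the sort values of the previous page's last hit, the cost of fetching page N does not grow with N, unlike from/size.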

Batch Size Tuning

Control the number of documents fetched per request:

# Small batches for frequent progress updates
riotx elastic-import my-index --elastic-host http://localhost:9200 --elastic-batch 100 json.set doc:#{_id}

# Large batches for maximum throughput
riotx elastic-import my-index --elastic-host http://localhost:9200 --elastic-batch 2000 json.set doc:#{_id}

Batch Size Recommendations

Scenario	Recommended Batch Size
Small documents (<1KB)	1000-2000
Large documents (>10KB)	100-500
Network-limited environment	Increase batch size
Memory-limited environment	Decrease batch size
Fast progress updates needed	100-500
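
To see how these recommendations trade off, the following sketch estimates the number of fetch requests and the rough payload per request for a given batch size. The function names and the one-million-document figure are assumptions chosen for illustration; the estimate ignores response overhead and any final empty page used to detect the end of the index.

```python
def request_count(total_docs: int, batch_size: int) -> int:
    """Number of fetch requests needed to read total_docs documents."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer ceiling division: a partial last batch still costs one request.
    return -(-total_docs // batch_size)


def approx_batch_bytes(avg_doc_bytes: int, batch_size: int) -> int:
    """Rough size of the documents held in memory for one batch."""
    return avg_doc_bytes * batch_size


total = 1_000_000  # hypothetical index size
for batch in (100, 500, 2000):
    print(batch, request_count(total, batch), approx_batch_bytes(1024, batch))
```

Under these assumptions, moving from a batch size of 100 to 2000 cuts the number of round trips by a factor of 20 while multiplying the per-batch memory footprint by the same factor.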


Field Order Preservation

The Elasticsearch ItemReader preserves the original field order from JSON documents using LinkedHashMap internally. This ensures that when documents are processed and stored in Redis, the field ordering matches the original document structure from Elasticsearch/OpenSearch.
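
LinkedHashMap is a Java class that iterates in insertion order. As an analogy only (this is not RIOT-X code), the same property can be observed with Python's dict, which also keeps insertion order when a JSON document is parsed. The sample document below is made up for illustration.

```python
import json

# Hypothetical source document; keys are deliberately not in alphabetical order.
raw = '{"title": "Example", "author": "someone", "year": 2024, "tags": ["a", "b"]}'

doc = json.loads(raw)          # dict preserves the order in which keys were parsed
keys = list(doc)
round_tripped = json.dumps(doc)

# For contrast: explicitly sorting keys loses the original order.
sorted_keys = list(json.loads(json.dumps(doc, sort_keys=True)))

print(keys)
print(sorted_keys)
```

An order-preserving map means the document written to the target has its fields in the same sequence as the source, which matters when downstream consumers compare serialized documents.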

Error Handling

The reader gracefully handles common scenarios:

Empty indices: Returns no results without errors
Connection failures: Automatic failover with multi-host configurations
Authentication errors: Clear error messages for troubleshooting
SSL certificate issues: Configurable verification for different environments
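
As a rough illustration of failover across several hosts, the sketch below tries each host in turn and returns the first successful response. It is a simplified model under stated assumptions, not the behavior of RIOT-X or of any Elasticsearch client: first_reachable, fake_request, and the DOWN set are hypothetical, and real clients may use round-robin selection, retries, or node health tracking instead.

```python
from typing import Callable, List, Sequence

HOSTS: List[str] = [
    "https://node1.example.com:9200",
    "https://node2.example.com:9200",
    "https://node3.example.com:9200",
]

# Hypothetical set of hosts that are currently unreachable.
DOWN = {"https://node1.example.com:9200"}


def fake_request(host: str) -> str:
    """Stand-in for an HTTP call; raises if the host is marked as down."""
    if host in DOWN:
        raise ConnectionError("connection refused")
    return f"ok from {host}"


def first_reachable(hosts: Sequence[str], request: Callable[[str], str]) -> str:
    """Try each host in order and return the first successful response."""
    errors: List[str] = []
    for host in hosts:
        try:
            return request(host)
        except ConnectionError as exc:
            errors.append(f"{host}: {exc}")  # remember why this host failed
    raise ConnectionError("all hosts failed: " + "; ".join(errors))


result = first_reachable(HOSTS, fake_request)
print(result)
```

With a single host there is nothing to fall back to, which is why the multi-host form is the one suggested for production.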

Best Practices

Production Deployments

Use multi-host configuration for high availability
Enable proper SSL certificate verification
Use appropriate batch sizes based on document size and network capacity
Monitor memory usage during large imports
Test authentication in staging environments

Development Environments

Use single host for simplicity
Skip SSL verification (--elastic-insecure) if using self-signed certificates
Use smaller batch sizes for faster iteration
Include metadata fields (--elastic-meta) for debugging

Data Migration

Test with small subsets before full migration
Use smaller batch sizes (100-500) when frequent progress updates are needed
Verify field ordering if document structure is important
Plan for downtime during large migrations