Elasticsearch

RIOT-X provides comprehensive support for importing data from both Elasticsearch and OpenSearch clusters using the elastic-import command.

Overview

The Elasticsearch integration paginates with the search_after API and exposes options for connection, authentication, SSL, metadata, and batch size that suit production environments.

Key Features

  • Efficient Pagination: Uses the search_after API for optimal performance with large datasets

  • Metadata Control: Configure inclusion/exclusion of Elasticsearch metadata fields

  • Multi-Host Support: Connect to clustered environments with failover

  • Authentication: HTTP Basic Authentication support

  • SSL Configuration: Production-ready SSL/TLS support

Command Syntax

riotx elastic-import <index> [OPTIONS] <redis-command> [REDIS-COMMAND-OPTIONS]

Basic Examples

Import from Elasticsearch

riotx elastic-import my-index --elastic-host http://localhost:9200 json.set doc:#{_id}

Import from OpenSearch

riotx elastic-import my-index --opensearch --elastic-host http://localhost:9200 json.set doc:#{_id}

Configuration Options

Connection Settings

Single Host

riotx elastic-import my-index --elastic-host http://localhost:9200 json.set doc:#{_id}

Multi-Host Clusters

riotx elastic-import my-index --elastic-host https://node1.example.com:9200 https://node2.example.com:9200 https://node3.example.com:9200 json.set doc:#{_id}

Authentication

HTTP Basic Authentication

riotx elastic-import my-index --elastic-host https://elasticsearch.example.com:9200 --elastic-user admin --elastic-pass mypassword json.set doc:#{_id}
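To show what a username and password amount to on the wire, the following Python sketch builds the Authorization header defined by HTTP Basic Authentication (RFC 7617). The function name basic_auth_header is hypothetical, and this is not RIOT-X code; it only illustrates the standard encoding under the assumption that the credentials from the example above are used.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header (RFC 7617).

    Hypothetical helper for illustration; not part of RIOT-X.
    """
    # Credentials are joined with a colon and Base64-encoded, not encrypted.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


header = basic_auth_header("admin", "mypassword")
print(header)
```

Because Base64 is reversible, credentials sent to an http:// host are readable in transit, which is why the example above pairs authentication with an https:// host.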

SSL Configuration

Production SSL

riotx elastic-import my-index --elastic-host https://elasticsearch.example.com:9200 --elastic-user admin --elastic-pass mypassword json.set doc:#{_id}

Skip SSL Verification (Development Only)

riotx elastic-import my-index --elastic-host https://elasticsearch.example.com:9200 --elastic-user admin --elastic-pass mypassword --elastic-insecure json.set doc:#{_id}

Data Control

Metadata Fields

Use --elastic-meta to include Elasticsearch metadata fields (_id, _index, _score) in the imported data.

riotx elastic-import my-index --elastic-host http://localhost:9200 --elastic-meta json.set doc:#{_id}

Performance Optimization

Pagination Strategy

The Elasticsearch ItemReader uses the search_after API for efficient pagination through large datasets:

  • Efficient for large datasets: Recommended for more than 10,000 documents, the point at which from/size pagination reaches Elasticsearch's default result window limit

  • Consistent ordering: Automatically sorts by _id to ensure reliable pagination

  • Memory efficient: Avoids the deep-paging performance cost of from/size pagination
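The pagination loop described above can be sketched in Python. This is a minimal illustration and not the RIOT-X reader: build_search_body, fake_search, and scan are hypothetical names, and fake_search is an in-memory stand-in for the cluster's search endpoint so the example runs without a server. The request body uses the standard size, sort, and search_after keys.

```python
from typing import Iterator, Optional


def build_search_body(size: int, last_sort_value: Optional[str] = None) -> dict:
    """Build one page request for search_after pagination, sorted by _id."""
    body = {"size": size, "sort": [{"_id": "asc"}]}
    if last_sort_value is not None:
        # Resume strictly after the last hit of the previous page.
        body["search_after"] = [last_sort_value]
    return body


def fake_search(index: list, body: dict) -> list:
    """In-memory stand-in for the search endpoint (assumption, not a real client)."""
    hits = sorted(index, key=lambda doc: doc["_id"])
    if "search_after" in body:
        cursor = body["search_after"][0]
        hits = [doc for doc in hits if doc["_id"] > cursor]
    return hits[: body["size"]]


def scan(index: list, size: int) -> Iterator[dict]:
    """Yield every document, page by page, until a page comes back empty."""
    last = None
    while True:
        page = fake_search(index, build_search_body(size, last))
        if not page:
            return
        yield from page
        # The sort value of the last hit becomes the cursor for the next page.
        last = page[-1]["_id"]


docs = [{"_id": f"doc-{i:03d}"} for i in range(7)]
ids = [doc["_id"] for doc in scan(docs, size=3)]
print(ids)
```

Each request carries only the last sort value rather than an offset, so the cost of a page does not grow with how far into the index the reader has advanced.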

Batch Size Tuning

Control the number of documents fetched per request:

# Small batches for frequent progress updates
riotx elastic-import my-index --elastic-host http://localhost:9200 --elastic-batch 100 json.set doc:#{_id}

# Large batches for maximum throughput
riotx elastic-import my-index --elastic-host http://localhost:9200 --elastic-batch 2000 json.set doc:#{_id}
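To make the trade-off concrete, this short Python sketch computes a lower bound on the number of search requests for the two batch sizes shown above. The index size of 1,000,000 documents is an illustrative assumption, and request_count is a hypothetical helper, not a RIOT-X function.

```python
import math


def request_count(total_docs: int, batch_size: int) -> int:
    """Minimum number of search requests needed to read total_docs documents."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(total_docs / batch_size)


total = 1_000_000  # illustrative index size, not a measured value
for batch in (100, 2000):
    print(batch, request_count(total, batch))
```

Under this assumption, a batch size of 100 needs at least 10,000 round trips and a batch size of 2000 needs at least 500, so larger batches reduce per-request overhead at the cost of larger responses held in memory.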

Batch Size Recommendations

Scenario                        Recommended Batch Size
Small documents (<1KB)          1000-2000
Large documents (>10KB)         100-500
Network limited environment     Increase batch size
Memory limited environment      Decrease batch size
Fast progress updates needed    100-500

Field Order Preservation

The Elasticsearch ItemReader preserves the original field order from JSON documents using LinkedHashMap internally. This ensures that when documents are processed and stored in Redis, the field ordering matches the original document structure from Elasticsearch/OpenSearch.
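As an analogy only, the Python sketch below shows what order preservation means for a parsed JSON document. Python's dict keeps insertion order much as a Java LinkedHashMap does; the sample document and its field names are made up for illustration, and nothing here is RIOT-X code.

```python
import json

# Field names are deliberately not alphabetical so that any reordering is visible.
source = '{"zeta": 1, "alpha": 2, "mid": 3}'

# json.loads builds a dict, which keeps keys in the order they were parsed.
document = json.loads(source)
print(list(document))

# Serializing again without sort_keys keeps the original field order.
print(json.dumps(document) == source)

# A sorted (or hash-ordered) map would lose that order.
print(list(json.loads(json.dumps(document, sort_keys=True))))
```

A check like this is only needed when a downstream consumer depends on field position; most JSON consumers address fields by name.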

Error Handling

The reader gracefully handles common scenarios:

  • Empty indices: Returns no results without errors

  • Connection failures: Automatic failover with multi-host configurations

  • Authentication errors: Clear error messages for troubleshooting

  • SSL certificate issues: Configurable verification for different environments
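The failover behavior listed above can be pictured with a small Python sketch. It is an assumption about the general pattern (try each configured host in turn), not a description of how RIOT-X or its client library implements it; fetch_with_failover and fake_fetch are hypothetical names, and fake_fetch simulates one unreachable node instead of making real HTTP calls.

```python
from typing import Callable, Sequence


def fetch_with_failover(hosts: Sequence[str], fetch: Callable[[str], dict]) -> dict:
    """Try each host in order and return the first successful response."""
    if not hosts:
        raise ValueError("at least one host is required")
    errors = []
    for host in hosts:
        try:
            return fetch(host)
        except ConnectionError as exc:
            # Remember the failure and move on to the next node.
            errors.append(f"{host}: {exc}")
    raise ConnectionError("all hosts failed: " + "; ".join(errors))


def fake_fetch(host: str) -> dict:
    """Stand-in for a real HTTP call; node1 is simulated as down."""
    if host == "https://node1.example.com:9200":
        raise ConnectionError("connection refused")
    return {"host": host, "hits": []}


hosts = ["https://node1.example.com:9200", "https://node2.example.com:9200"]
result = fetch_with_failover(hosts, fake_fetch)
print(result["host"])
```

With a single configured host there is nothing to fail over to, so the same connection error surfaces directly; this is the reason multi-host configuration is recommended for production.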

Best Practices

Production Deployments

  1. Use multi-host configuration for high availability

  2. Enable proper SSL certificate verification

  3. Use appropriate batch sizes based on document size and network capacity

  4. Monitor memory usage during large imports

  5. Test authentication in staging environments

Development Environments

  1. Use single host for simplicity

  2. Skip SSL verification if using self-signed certificates

  3. Use smaller batch sizes for faster iteration

  4. Include metadata fields for debugging

Data Migration

  1. Test with small subsets before full migration

  2. Use smaller batch sizes when frequent progress updates are needed

  3. Verify field ordering if document structure is important

  4. Plan for downtime during large migrations