Elasticsearch
RIOT-X provides comprehensive support for importing data from both Elasticsearch and OpenSearch clusters using the elastic-import command.
Overview
The Elasticsearch integration paginates with the search_after API and provides configuration options for metadata fields, multi-host connections, authentication, and SSL/TLS in production environments.
Key Features
- Efficient Pagination: Uses the search_after API for optimal performance with large datasets
- Metadata Control: Configure inclusion/exclusion of Elasticsearch metadata fields
- Multi-Host Support: Connect to clustered environments with failover
- Authentication: HTTP Basic Authentication support
- SSL Configuration: Production-ready SSL/TLS support
Configuration Options
Performance Optimization
Pagination Strategy
The Elasticsearch ItemReader uses the search_after API for efficient pagination through large datasets:
- Efficient for large datasets: Recommended for more than 10,000 documents
- Consistent ordering: Automatically sorts by _id to ensure reliable pagination
- Memory efficient: Avoids the deep-paging performance issues of from/size pagination
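The pagination strategy described above can be sketched in Python. This is a minimal illustration of the general search_after technique, not RIOT-X's actual implementation. The function names scan_with_search_after and make_fake_search are hypothetical, and the fake search function stands in for a real cluster so the loop can run without one. The request body fields (size, sort, search_after) and the per-hit sort array follow the Elasticsearch search API.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical type: takes a search request body and returns a response
# shaped like Elasticsearch's {"hits": {"hits": [...]}}.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def scan_with_search_after(search: SearchFn, batch_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every hit by repeatedly requesting the page after the last sort key."""
    last_sort: Optional[List[Any]] = None
    while True:
        # Sorting by _id gives a stable order, as the reader is described to do.
        body: Dict[str, Any] = {"size": batch_size, "sort": [{"_id": "asc"}]}
        if last_sort is not None:
            body["search_after"] = last_sort
        hits = search(body)["hits"]["hits"]
        if not hits:
            return  # empty index, or no pages left
        yield from hits
        # Each hit carries its sort values; the last one is the next cursor.
        last_sort = hits[-1]["sort"]


def make_fake_search(ids: List[str]) -> SearchFn:
    """In-memory stand-in for a cluster, used only to exercise the loop."""
    ordered = sorted(ids)

    def search(body: Dict[str, Any]) -> Dict[str, Any]:
        after = body.get("search_after")
        remaining = [i for i in ordered if after is None or i > after[0]]
        page = remaining[: body["size"]]
        return {"hits": {"hits": [{"_id": i, "sort": [i]} for i in page]}}

    return search


if __name__ == "__main__":
    fake = make_fake_search([f"doc-{n:03d}" for n in range(25)])
    print(len(list(scan_with_search_after(fake, batch_size=10))))  # prints 25
```

Because each request carries only the last sort key, the cost of fetching a page does not grow with how far into the result set the reader already is, which is the property that from/size pagination lacks.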
Batch Size Tuning
Control the number of documents fetched per request:
# Small batches for frequent progress updates
riotx elastic-import my-index --elastic-host http://localhost:9200 --elastic-batch 100 json.set doc:#{_id}
# Large batches for maximum throughput
riotx elastic-import my-index --elastic-host http://localhost:9200 --elastic-batch 2000 json.set doc:#{_id}
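To get a feel for the trade-off between the two batch sizes above, the following sketch estimates how many search requests an import needs. It is back-of-the-envelope arithmetic only: the function name requests_needed and the figure of 1,000,000 documents are illustrative assumptions, and a real import may issue one extra request to detect the end of the results.

```python
def requests_needed(total_docs: int, batch_size: int) -> int:
    """Return the number of page requests needed to fetch total_docs documents."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer ceiling division: a partial last page still costs one request.
    return -(-total_docs // batch_size)


if __name__ == "__main__":
    for batch in (100, 2000):
        print(batch, requests_needed(1_000_000, batch))
    # prints:
    # 100 10000
    # 2000 500
```

Fewer, larger requests reduce round-trip overhead, while smaller batches hold fewer documents in memory at once and report progress more often.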
Error Handling
The reader gracefully handles common scenarios:
- Empty indices: Returns no results without errors
- Connection failures: Automatic failover with multi-host configurations
- Authentication errors: Clear error messages for troubleshooting
- SSL certificate issues: Configurable verification for different environments
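The failover behavior listed above can be pictured as trying each configured host in turn. The sketch below shows that general idea only and makes no claim about how RIOT-X or its underlying client selects hosts. The function first_reachable, the request callable, and the host URLs in the usage example are all hypothetical.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_reachable(hosts: Sequence[str], request: Callable[[str], T]) -> T:
    """Try each host in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for host in hosts:
        try:
            return request(host)
        except ConnectionError as exc:
            # Remember the failure and fall through to the next host.
            last_error = exc
    raise ConnectionError(f"all {len(hosts)} hosts failed") from last_error


if __name__ == "__main__":
    def fake_request(host: str) -> str:
        # Hypothetical stand-in: the first node is down, the second answers.
        if host == "http://es1:9200":
            raise ConnectionError("connection refused")
        return f"ok from {host}"

    print(first_reachable(["http://es1:9200", "http://es2:9200"], fake_request))
```

Only connection failures trigger a retry on the next host in this sketch; other errors, such as rejected credentials, propagate immediately so they can be reported clearly.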
Best Practices
Production Deployments
- Use multi-host configuration for high availability
- Enable proper SSL certificate verification
- Use appropriate batch sizes based on document size and network capacity
- Monitor memory usage during large imports
- Test authentication in staging environments