Hash vs JSON Storage

RedisVL supports two storage types for your data: Redis Hash and Redis JSON. Hash favors speed and memory efficiency, while JSON favors nested structure and flexibility. Understanding the differences helps you choose the right option for your use case.

Storage Types Overview

| Aspect       | Hash                                    | JSON                              |
|--------------|-----------------------------------------|-----------------------------------|
| Structure    | Flat key-value pairs                    | Nested documents with hierarchy   |
| Performance  | Faster, less overhead                   | Slightly slower, more features    |
| Memory       | More efficient                          | Uses more memory                  |
| Vectors      | Stored as byte arrays                   | Stored as float arrays            |
| Nested Data  | Not supported                           | Fully supported                   |
| Query Syntax | Field names directly                    | JSONPath with $. prefix           |
| Best For     | Simple, flat data; performance-critical | Complex, nested data; flexibility |

Hash Storage

Hashes in Redis are simple collections of field-value pairs. Think of a Hash as a mutable single-level dictionary.

Hashes are best suited for use cases with the following characteristics:

  • Performance (speed) and storage space (memory consumption) are top concerns

  • Data can be easily normalized and modeled as a single-level dict

Hashes are typically the default recommendation.
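
To make "modeled as a single-level dict" concrete, here is a minimal sketch that flattens a nested map by joining keys with an underscore. The class name FlattenForHash, the flatten method, and the underscore separator are illustrative assumptions, not part of RedisVL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FlattenForHash {
    /** Flattens nested maps into one level, joining keys with '_'. */
    static Map<String, Object> flatten(Map<String, ?> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", doc, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<String, ?> doc, Map<String, Object> out) {
        for (Map.Entry<String, ?> e : doc.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "_" + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                // Recurse into nested objects so only leaf values remain
                @SuppressWarnings("unchecked")
                Map<String, ?> nested = (Map<String, ?>) value;
                flattenInto(key, nested, out);
            } else {
                out.put(key, value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> user = Map.of(
            "name", "Alice",
            "location", Map.of("city", "San Francisco", "state", "CA"));
        // Keys become name, location_city, location_state (iteration order may vary)
        System.out.println(flatten(user));
    }
}
```

If flattening like this produces awkward or colliding field names, that is usually a sign the data fits JSON storage better.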

Schema Definition

index:
  name: user-hash-index
  prefix: user
  storage_type: hash  # ← Hash storage

fields:
  - name: name
    type: tag
  - name: age
    type: numeric
  - name: email
    type: text
  - name: embedding
    type: vector
    attrs:
      dims: 384
      distance_metric: cosine
      algorithm: flat
      datatype: float32

Data Format

// Data for Hash storage
Map<String, Object> user = Map.of(
    "name", "john",
    "age", 25,
    "email", "john@example.com",
    "embedding", new float[]{0.1f, 0.2f, 0.3f}  // Float array
);

// Load into Hash index
List<String> keys = index.load(List.of(user));

Note: RedisVL converts vectors to byte arrays internally when storing them in Hash storage.
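
To illustrate what this conversion involves, the sketch below packs a float[] into a FLOAT32 byte array and back using only the JDK. It assumes little-endian byte order and is not RedisVL's internal implementation; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class VectorBytes {
    /** Packs each float into 4 bytes (assumption: little-endian order). */
    static byte[] toBytes(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : vector) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    /** Reverses toBytes; the input length must be a multiple of 4. */
    static float[] fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getFloat();
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] blob = toBytes(new float[]{0.1f, 0.2f, 0.3f});
        System.out.println(blob.length); // 3 floats * 4 bytes each = 12
    }
}
```

The blob length is always dims * 4 for float32, so a 384-dimensional vector occupies 1536 bytes in the Hash field.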

Advantages

  1. Performance - Faster read/write operations

  2. Memory Efficiency - Lower memory footprint

  3. Simplicity - Direct field access without paths

  4. Atomicity - Field-level atomic operations

Limitations

  1. Flat Structure - No nested objects

  2. No Arrays - Cannot store arrays (except vectors)

  3. Limited Types - String, numeric, and byte arrays only

  4. No Partial Updates - Must update entire fields
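
One common workaround for the no-arrays limitation is to store a multi-valued attribute as a delimited string in a tag field, since Redis tag fields on Hashes split values on a comma by default. A minimal sketch, with hypothetical helper names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class TagEncoding {
    private static final String SEPARATOR = ","; // Redis default tag separator for Hashes

    /** Joins tags into one string; rejects tags that contain the separator. */
    static String encode(List<String> tags) {
        for (String tag : tags) {
            if (tag.contains(SEPARATOR)) {
                throw new IllegalArgumentException("tag contains separator: " + tag);
            }
        }
        return String.join(SEPARATOR, tags);
    }

    /** Splits a stored string back into individual tags. */
    static List<String> decode(String stored) {
        if (stored.isEmpty()) {
            return List.of();
        }
        return Arrays.stream(stored.split(SEPARATOR))
                     .map(String::trim)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String stored = encode(List.of("portable", "powerful", "gaming"));
        System.out.println(stored); // portable,powerful,gaming
        System.out.println(decode(stored).size());
    }
}
```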

Use Cases

  • User profiles with flat attributes

  • Product catalogs without nested data

  • High-performance caching

  • Simple key-value lookups with vectors

JSON Storage

JSON storage uses Redis JSON (RedisJSON module) for hierarchical documents.

JSON is best suited for use cases with the following characteristics:

  • Ease of use and data model flexibility are top concerns

  • Application data is already native JSON

  • Replacing another document storage/db solution

Schema Definition

index:
  name: product-json-index
  prefix: product
  storage_type: json  # ← JSON storage

fields:
  - name: $.name
    type: text
  - name: $.category
    type: tag
  - name: $.price
    type: numeric
  - name: $.specs.weight
    type: numeric
    path: $.specs.weight  # Nested field
  - name: $.embedding
    type: vector
    attrs:
      dims: 384
      distance_metric: cosine
      algorithm: flat
      datatype: float32

Note: JSON fields use JSONPath notation with the $. prefix.
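
To show how a path such as $.specs.weight maps onto a nested document, here is a minimal sketch of a resolver for simple dotted paths over nested maps. It is only an illustration: it covers a tiny subset of JSONPath (object keys only, no arrays or wildcards), and the class name is hypothetical.

```java
import java.util.Map;

final class SimpleJsonPath {
    /** Resolves paths like "$.specs.weight"; returns null if any segment is missing. */
    static Object resolve(Map<String, ?> doc, String path) {
        if (!path.startsWith("$.")) {
            throw new IllegalArgumentException("path must start with $.");
        }
        Object current = doc;
        for (String key : path.substring(2).split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // tried to descend into a non-object value
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> product = Map.of(
            "name", "Laptop",
            "specs", Map.of("weight", 1.5));
        System.out.println(resolve(product, "$.specs.weight")); // 1.5
        System.out.println(resolve(product, "$.name"));         // Laptop
    }
}
```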

Data Format

// Data for JSON storage - supports nesting
Map<String, Object> product = Map.of(
    "name", "Laptop",
    "category", "electronics",
    "price", 899.99,
    "specs", Map.of(
        "weight", 1.5,
        "dimensions", Map.of(
            "width", 30,
            "height", 2,
            "depth", 20
        )
    ),
    "tags", List.of("portable", "powerful", "gaming"),
    "embedding", new float[]{0.1f, 0.2f, 0.3f}
);

// Load into JSON index
List<String> keys = index.load(List.of(product));

Advantages

  1. Nested Data - Full support for hierarchical documents

  2. Arrays - Store and query arrays

  3. Partial Updates - Update specific paths without reading entire document

  4. Flexibility - Complex data models

  5. Rich Queries - JSONPath queries

Limitations

  1. Memory - Uses more memory than Hash

  2. Performance - Slightly slower than Hash

  3. Complexity - Requires understanding JSONPath

Use Cases

  • E-commerce products with specifications

  • User profiles with nested preferences

  • Documents with metadata

  • Complex data models

Side-by-Side Comparison

Example: User Profile

Hash Storage

// Hash schema - flat structure
Map<String, Object> schema = Map.of(
    "index", Map.of(
        "name", "users-hash",
        "storage_type", "hash"
    ),
    "fields", List.of(
        Map.of("name", "name", "type", "text"),
        Map.of("name", "age", "type", "numeric"),
        Map.of("name", "city", "type", "tag"),
        Map.of("name", "embedding", "type", "vector",
               "attrs", Map.of("dims", 128, "distance_metric", "cosine"))
    )
);

// Flat data
Map<String, Object> user = Map.of(
    "name", "Alice",
    "age", 28,
    "city", "San Francisco",
    "embedding", new float[128]
);

JSON Storage

// JSON schema - nested structure
Map<String, Object> schema = Map.of(
    "index", Map.of(
        "name", "users-json",
        "storage_type", "json"
    ),
    "fields", List.of(
        Map.of("name", "$.name", "type", "text"),
        Map.of("name", "$.age", "type", "numeric"),
        Map.of("name", "$.location.city", "type", "tag"),
        Map.of("name", "$.preferences.topics", "type", "tag"),
        Map.of("name", "$.embedding", "type", "vector",
               "attrs", Map.of("dims", 128, "distance_metric", "cosine"))
    )
);

// Nested data
Map<String, Object> user = Map.of(
    "name", "Alice",
    "age", 28,
    "location", Map.of(
        "city", "San Francisco",
        "state", "CA",
        "country", "USA"
    ),
    "preferences", Map.of(
        "topics", List.of("AI", "databases", "cloud"),
        "notifications", true
    ),
    "embedding", new float[128]
);

Querying

Hash Queries

import com.redis.vl.query.Filter;
import com.redis.vl.query.VectorQuery;

// Direct field names
Filter filter = Filter.and(
    Filter.tag("city", "San Francisco"),
    Filter.numeric("age").between(25, 35)
);

VectorQuery query = VectorQuery.builder()
    .vector(queryVector)
    .field("embedding")
    .withPreFilter(filter.build())
    .returnFields("name", "age", "city")  // Direct field names
    .build();

JSON Queries

// JSONPath syntax
Filter filter = Filter.and(
    Filter.tag("$.location.city", "San Francisco"),
    Filter.numeric("$.age").between(25, 35),
    Filter.tag("$.preferences.topics", "AI")
);

VectorQuery query = VectorQuery.builder()
    .vector(queryVector)
    .field("embedding")
    .withPreFilter(filter.build())
    .returnFields("$.name", "$.age", "$.location", "$.preferences")
    .build();

Performance Comparison

Based on Redis benchmarks:

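| Operation             | Hash        | JSON        |
|-----------------------|-------------|-------------|
| Write (single)        | ~100k ops/s | ~80k ops/s  |
| Read (single)         | ~120k ops/s | ~100k ops/s |
| Update (single field) | ~100k ops/s | ~90k ops/s  |
| Vector search         | Similar     | Similar     |
| Memory (1M docs)      | ~500 MB     | ~650 MB     |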

Performance varies with document size and complexity. In the figures above, JSON throughput is roughly 10-20% lower than Hash.
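
The relative gaps can be worked out directly from the approximate figures above. The short sketch below does the arithmetic; the class and method names are illustrative, and the inputs are simply the rounded numbers quoted in this section.

```java
import java.util.Locale;

final class StorageGap {
    /** Returns how much lower 'other' is than 'baseline', as a percentage of baseline. */
    static double percentLower(double baseline, double other) {
        return (baseline - other) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Throughput in ops/s: Hash is the baseline, JSON is compared against it
        print("write", percentLower(100_000, 80_000));
        print("read", percentLower(120_000, 100_000));
        print("update", percentLower(100_000, 90_000));
        // Memory in MB for 1M docs: JSON is the baseline, Hash uses less
        print("memory saved by Hash", percentLower(650, 500));
    }

    private static void print(String label, double percent) {
        System.out.println(String.format(Locale.ROOT, "%s: %.1f%%", label, percent));
    }
}
```

With these inputs JSON comes out about 20% lower for writes, about 17% lower for reads, and about 10% lower for single-field updates, while Hash uses about 23% less memory.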

Migration Between Storage Types

You can migrate data between storage types:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StorageMigration {
    public void migrateHashToJson(
        SearchIndex hashIndex,
        SearchIndex jsonIndex
    ) {
        // Fetch all documents from Hash
        // (implement pagination for large datasets)
        List<Map<String, Object>> docs = hashIndex.fetchAll();

        // Transform if needed (e.g., add nesting)
        List<Map<String, Object>> transformedDocs = docs.stream()
            .map(this::transformToNested)
            .collect(Collectors.toList());

        // Load into JSON index
        jsonIndex.load(transformedDocs);
    }

    private Map<String, Object> transformToNested(
        Map<String, Object> flatDoc
    ) {
        // Transform flat structure to nested.
        // Note: Map.of rejects null values, so every field read here must be present.
        return Map.of(
            "name", flatDoc.get("name"),
            "age", flatDoc.get("age"),
            "location", Map.of(
                "city", flatDoc.get("city")
            ),
            "embedding", flatDoc.get("embedding")
        );
    }
}
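
The migration code above leaves pagination as an exercise and assumes every field is present. The sketch below shows one way to handle both using only the JDK: a helper that splits documents into fixed-size batches, and a null-tolerant variant of the transformation. MigrationHelpers and its methods are hypothetical names, and no RedisVL calls are made here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MigrationHelpers {
    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    /** Builds the nested document, skipping any field that is absent. */
    static Map<String, Object> toNested(Map<String, Object> flat) {
        Map<String, Object> nested = new LinkedHashMap<>();
        putIfPresent(nested, "name", flat.get("name"));
        putIfPresent(nested, "age", flat.get("age"));
        Map<String, Object> location = new LinkedHashMap<>();
        putIfPresent(location, "city", flat.get("city"));
        if (!location.isEmpty()) {
            nested.put("location", location);
        }
        putIfPresent(nested, "embedding", flat.get("embedding"));
        return nested;
    }

    private static void putIfPresent(Map<String, Object> target, String key, Object value) {
        if (value != null) {
            target.put(key, value);
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
            Map.of("name", "Alice", "age", 28, "city", "San Francisco"),
            Map.of("name", "Bob", "age", 32), // no city: would break Map.of-based code
            Map.of("name", "Carol", "age", 41, "city", "NYC"));
        for (List<Map<String, Object>> batch : partition(docs, 2)) {
            List<Map<String, Object>> nested = new ArrayList<>();
            for (Map<String, Object> doc : batch) {
                nested.add(toNested(doc));
            }
            System.out.println(nested.size() + " documents ready to load");
        }
    }
}
```

Each transformed batch could then be passed to jsonIndex.load(...) in turn, which keeps memory use bounded for large datasets.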

Best Practices

  1. Start Simple

    Begin with Hash storage unless you need nesting:

    # Start with Hash
    storage_type: hash

    # Migrate to JSON if needed later
    storage_type: json

  2. Consider Memory

    For large datasets, Hash storage saves significant memory:

    // 1 million documents
    // Hash: ~500 MB
    // JSON: ~650 MB
    // Savings: ~150 MB (23%)

  3. Evaluate Query Complexity

    If you need nested queries, use JSON:

    // Complex nested query - JSON only
    Filter.and(
        Filter.tag("$.specs.features.ai", "true"),
        Filter.numeric("$.specs.performance.score").gt(80)
    )

  4. Benchmark Your Use Case

    // Measure load time for your own data; nanoTime is monotonic,
    // which makes it better suited to elapsed-time measurement
    long start = System.nanoTime();
    index.load(testData);
    long durationMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println("Load time: " + durationMs + "ms");

  5. Plan for Growth

    • Hash: Better for scaling to millions of simple documents

    • JSON: Better for evolving, complex schemas
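
A single timed run, as in the benchmarking practice above, can be skewed by JIT warm-up and one-off pauses. A slightly more robust sketch runs the task a few times untimed, then reports the median of several timed runs. LoadTimer is a hypothetical helper, and the Runnable stands in for a call such as index.load(testData).

```java
import java.util.Arrays;

final class LoadTimer {
    /**
     * Runs the task warmupRuns times untimed, then measuredRuns times timed,
     * and returns the median elapsed time in nanoseconds
     * (the upper median when measuredRuns is even).
     */
    static long medianNanos(Runnable task, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[measuredRuns / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the load call for your own index
        Runnable work = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
        };
        long nanos = medianNanos(work, 3, 5);
        System.out.println("Median time: " + (nanos / 1_000_000.0) + " ms");
    }
}
```

Run the same harness against a Hash index and a JSON index loaded with the same documents to get a like-for-like comparison.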

Decision Tree

Do you need nested data?
├─ No → Use Hash
│        ├─ Flat user profiles
│        ├─ Simple product catalogs
│        └─ Key-value with vectors
└─ Yes → Use JSON
         ├─ E-commerce products
         ├─ User preferences
         └─ Complex documents
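
The decision tree can also be expressed as a small check over a sample document: if any non-vector field holds a nested object or a list, JSON is recommended, otherwise Hash. This is only a sketch of that rule; StorageChooser and its method are hypothetical, and real decisions should also weigh the performance and memory tradeoffs discussed earlier.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;

final class StorageChooser {
    /** Returns "json" if the sample document needs nesting or arrays, otherwise "hash". */
    static String recommend(Map<String, ?> sampleDoc, String vectorField) {
        for (Map.Entry<String, ?> entry : sampleDoc.entrySet()) {
            if (entry.getKey().equals(vectorField)) {
                continue; // vectors are supported by both storage types
            }
            Object value = entry.getValue();
            if (value instanceof Map || value instanceof Collection) {
                return "json";
            }
        }
        return "hash";
    }

    public static void main(String[] args) {
        Map<String, Object> flatUser = Map.of(
            "name", "Alice", "age", 28, "embedding", new float[128]);
        Map<String, Object> nestedUser = Map.of(
            "name", "Alice",
            "preferences", Map.of("topics", List.of("AI", "databases")),
            "embedding", new float[128]);
        System.out.println(recommend(flatUser, "embedding"));   // hash
        System.out.println(recommend(nestedUser, "embedding")); // json
    }
}
```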

Complete Examples

Hash Example

// Schema
IndexSchema hashSchema = IndexSchema.fromYaml("hash-schema.yaml");

// Create index
SearchIndex hashIndex = new SearchIndex(hashSchema, jedis);
hashIndex.create(true);

// Load flat data
List<Map<String, Object>> users = List.of(
    Map.of("name", "Alice", "age", 28, "city", "SF",
           "embedding", embeddings.get(0)),
    Map.of("name", "Bob", "age", 32, "city", "NYC",
           "embedding", embeddings.get(1))
);
hashIndex.load(users);

// Query
VectorQuery query = VectorQuery.builder()
    .vector(queryVector)
    .field("embedding")
    .withPreFilter(Filter.tag("city", "SF").build())
    .build();

JSON Example

// Schema
IndexSchema jsonSchema = IndexSchema.fromYaml("json-schema.yaml");

// Create index
SearchIndex jsonIndex = new SearchIndex(jsonSchema, jedis);
jsonIndex.create(true);

// Load nested data
List<Map<String, Object>> products = List.of(
    Map.of(
        "name", "Laptop",
        "specs", Map.of(
            "cpu", "Intel i7",
            "ram", 16,
            "storage", Map.of("type", "SSD", "size", 512)
        ),
        "embedding", embeddings.get(0)
    )
);
jsonIndex.load(products);

// Query nested fields
VectorQuery query = VectorQuery.builder()
    .vector(queryVector)
    .field("embedding")
    .withPreFilter(
        Filter.numeric("$.specs.ram").gte(16).build()
    )
    .build();
