MongoDB Bulk Operations and Performance Optimization: High-Throughput Data Processing with Advanced Batching and SQL-Style Syntax

Modern applications frequently need to process large volumes of data efficiently, requiring sophisticated approaches to batch operations that can maintain high throughput while ensuring data consistency and optimal resource utilization. Traditional single-document operations become a significant bottleneck when dealing with thousands or millions of records, leading to performance degradation and resource exhaustion.

MongoDB's bulk operations provide powerful batch processing capabilities that can dramatically improve throughput for high-volume data operations. Unlike traditional databases that require complex orchestration for batch processing, MongoDB's bulk operations offer native support for optimized batch writes, reads, and updates with built-in error handling and performance optimization.
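
As a quick illustration before diving deeper, the Node.js driver's bulkWrite() method accepts a mixed list of insert, update, and delete operations and executes them together instead of paying one network round trip per document. The sketch below uses illustrative collection and field names and assumes a local MongoDB instance:

// Minimal sketch of a mixed bulkWrite() call (illustrative names, local MongoDB assumed)
const { MongoClient } = require('mongodb');

async function quickBulkWriteExample() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  try {
    const orders = client.db('demo').collection('orders');

    const result = await orders.bulkWrite([
      { insertOne: { document: { orderId: 1, status: 'new', total: 99.5 } } },
      { updateOne: { filter: { orderId: 2 }, update: { $set: { status: 'shipped' } }, upsert: true } },
      { deleteOne: { filter: { orderId: 3 } } }
    ], { ordered: false }); // unordered: remaining operations continue past individual failures

    console.log({
      inserted: result.insertedCount,
      modified: result.modifiedCount,
      upserted: result.upsertedCount,
      deleted: result.deletedCount
    });
  } finally {
    await client.close();
  }
}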

The Traditional High-Volume Data Processing Challenge

Conventional approaches to processing large datasets suffer from significant performance and scalability limitations:

-- Traditional PostgreSQL bulk processing - inefficient and resource-intensive

-- Standard approach using individual INSERT statements
DO $$
DECLARE
    record_data RECORD;
    batch_size INTEGER := 1000;
    processed_count INTEGER := 0;
    total_records INTEGER;
BEGIN
    -- Count total records to process
    SELECT COUNT(*) INTO total_records FROM staging_data;

    -- Process records one by one (inefficient)
    FOR record_data IN 
        SELECT id, user_id, transaction_amount, transaction_date, metadata 
        FROM staging_data 
        ORDER BY id
    LOOP
        -- Individual INSERT - causes overhead per operation
        INSERT INTO user_transactions (
            user_id,
            amount,
            transaction_date,
            metadata,
            processed_at
        ) VALUES (
            record_data.user_id,
            record_data.transaction_amount,
            record_data.transaction_date,
            record_data.metadata,
            CURRENT_TIMESTAMP
        );

        processed_count := processed_count + 1;

        -- Commit every batch_size records to avoid long transactions
        IF processed_count % batch_size = 0 THEN
            COMMIT;
            RAISE NOTICE 'Processed % of % records', processed_count, total_records;
        END IF;
    END LOOP;

    -- Final commit
    COMMIT;
    RAISE NOTICE 'Completed processing % total records', total_records;

-- NOTE: PL/pgSQL forbids COMMIT inside a block that declares an EXCEPTION
-- handler, so the intermediate commits above force us to give up structured
-- error handling entirely, which is yet another limitation of this approach.
END $$;

-- Problems with traditional single-record approach:
-- 1. Extremely high overhead - one network round-trip per operation
-- 2. Poor throughput - typically 1,000-5,000 operations/second maximum
-- 3. Resource exhaustion - excessive connection and memory usage
-- 4. Limited error handling - single failure can abort entire batch
-- 5. Lock contention - frequent commits cause index lock overhead
-- 6. No optimization for similar operations
-- 7. Difficult progress tracking and resume capability
-- 8. Poor CPU and I/O efficiency due to constant context switching

-- Attempt at bulk INSERT (better but still limited)
INSERT INTO user_transactions (user_id, amount, transaction_date, metadata, processed_at)
SELECT 
    user_id,
    transaction_amount,
    transaction_date,
    metadata,
    CURRENT_TIMESTAMP
FROM staging_data;

-- Issues with basic bulk INSERT:
-- - All-or-nothing behavior - single bad record fails entire batch
-- - No granular error reporting
-- - Limited to INSERT operations only
-- - Difficult to handle conflicts or duplicates
-- - No support for conditional operations
-- - Memory constraints with very large datasets
-- - Poor performance with complex transformations

-- MySQL limitations (even more restrictive)
INSERT INTO user_transactions (user_id, amount, transaction_date, metadata)
VALUES 
    (1001, 150.00, '2024-01-15', '{"category": "purchase"}'),
    (1002, 75.50, '2024-01-15', '{"category": "refund"}'),
    (1003, 200.00, '2024-01-15', '{"category": "purchase"}');
    -- Repeat for potentially millions of records...

-- MySQL bulk processing problems:
-- - Maximum query size limitations
-- - Poor error handling for mixed success/failure scenarios  
-- - Limited transaction size support
-- - No built-in upsert capabilities
-- - Difficult conflict resolution
-- - Poor performance with large batch sizes
-- - Memory exhaustion with complex operations

MongoDB provides comprehensive bulk operation capabilities:

// MongoDB Bulk Operations - high-performance batch processing
const { MongoClient, ObjectId } = require('mongodb'); // ObjectId is used when generating sample documents below

const client = new MongoClient('mongodb://localhost:27017'); // call client.connect() before running any operations
const db = client.db('high_performance_db');

// Advanced bulk operations system for high-throughput data processing
class HighThroughputBulkProcessor {
  constructor(db, config = {}) {
    this.db = db;
    this.config = {
      batchSize: config.batchSize || 10000,
      maxParallelBatches: config.maxParallelBatches || 5,
      retryAttempts: config.retryAttempts || 3,
      retryDelay: config.retryDelay || 1000,
      enableMetrics: config.enableMetrics !== false,
      ...config
    };

    this.metrics = {
      totalOperations: 0,
      successfulOperations: 0,
      failedOperations: 0,
      batchesProcessed: 0,
      processingTime: 0,
      throughputHistory: []
    };

    this.collections = new Map();
    this.activeOperations = new Set();
  }

  async setupOptimizedCollections() {
    console.log('Setting up collections for high-performance bulk operations...');

    const collectionConfigs = [
      {
        name: 'user_transactions',
        indexes: [
          { key: { userId: 1, transactionDate: -1 } },
          { key: { transactionType: 1, status: 1 } },
          { key: { amount: 1 } },
          { key: { createdAt: -1 } }
        ],
        options: { 
          writeConcern: { w: 'majority', j: true },
          readConcern: { level: 'majority' }
        }
      },
      {
        name: 'user_profiles',
        indexes: [
          { key: { email: 1 }, unique: true },
          { key: { userId: 1 }, unique: true },
          { key: { lastLoginDate: -1 } },
          { key: { 'preferences.category': 1 } }
        ]
      },
      {
        name: 'product_analytics',
        indexes: [
          { key: { productId: 1, eventDate: -1 } },
          { key: { eventType: 1, processed: 1 } },
          { key: { userId: 1, eventDate: -1 } }
        ]
      },
      {
        name: 'audit_logs',
        indexes: [
          { key: { entityId: 1, timestamp: -1 } },
          { key: { action: 1, timestamp: -1 } },
          { key: { userId: 1, timestamp: -1 } }
        ]
      }
    ];

    for (const config of collectionConfigs) {
      const collection = this.db.collection(config.name);
      this.collections.set(config.name, collection);

      // Create indexes for optimal bulk operation performance
      for (const index of config.indexes) {
        try {
          await collection.createIndex(index.key, index.unique ? { unique: true } : {});
          console.log(`Created index on ${config.name}:`, index.key);
        } catch (error) {
          console.warn(`Index creation warning for ${config.name}:`, error.message);
        }
      }
    }

    console.log(`Initialized ${collectionConfigs.length} collections for bulk operations`);
    return this.collections;
  }

  async bulkInsertOptimized(collectionName, documents, options = {}) {
    console.log(`Starting bulk insert for ${documents.length} documents in ${collectionName}...`);
    const startTime = Date.now();

    const collection = this.collections.get(collectionName);
    if (!collection) {
      throw new Error(`Collection ${collectionName} not found. Initialize collections first.`);
    }

    const {
      batchSize = this.config.batchSize,
      ordered = false,
      writeConcern = { w: 'majority', j: true },
      enableValidation = true,
      enableProgressReporting = true
    } = options;

    let totalInserted = 0;
    let totalErrors = 0;
    const results = [];
    const errors = [];

    // Process documents in optimized batches
    for (let i = 0; i < documents.length; i += batchSize) {
      const batch = documents.slice(i, i + batchSize);
      const batchNumber = Math.floor(i / batchSize) + 1;

      try {
        // Validate documents if enabled
        if (enableValidation) {
          await this.validateDocumentBatch(batch, collectionName);
        }

        // Execute bulk insert with optimized settings
        const result = await collection.insertMany(batch, {
          ordered: ordered,
          writeConcern: writeConcern,
          bypassDocumentValidation: !enableValidation
        });

        totalInserted += result.insertedCount;
        results.push({
          batchNumber,
          insertedCount: result.insertedCount,
          insertedIds: result.insertedIds
        });

        // Progress reporting
        if (enableProgressReporting) {
          const progress = ((i + batch.length) / documents.length * 100).toFixed(1);
          const throughput = Math.round(totalInserted / ((Date.now() - startTime) / 1000));
          console.log(`Batch ${batchNumber}: ${result.insertedCount} inserted, ${progress}% complete, ${throughput} docs/sec`);
        }

      } catch (error) {
        totalErrors++;
        console.error(`Batch ${batchNumber} failed:`, error.message);

        // Handle partial failures in unordered mode
        if (!ordered && error.result) {
          // Newer drivers expose insertedCount; older ones used nInserted
          const partialInserted = error.result.insertedCount ?? error.result.nInserted ?? 0;
          totalInserted += partialInserted;

          errors.push({
            batchNumber,
            error: error.message,
            insertedCount: partialInserted,
            failedOperations: error.writeErrors || []
          });
        } else {
          errors.push({
            batchNumber,
            error: error.message,
            insertedCount: 0
          });
        }
      }
    }

    const totalTime = Date.now() - startTime;
    const throughput = Math.round(totalInserted / (totalTime / 1000));

    // Update metrics
    this.updateMetrics('insert', totalInserted, totalErrors, totalTime);

    const summary = {
      success: totalErrors === 0,
      totalDocuments: documents.length,
      totalInserted: totalInserted,
      totalBatches: Math.ceil(documents.length / batchSize),
      failedBatches: totalErrors,
      executionTime: totalTime,
      throughput: throughput,
      results: results,
      errors: errors.length > 0 ? errors : undefined
    };

    console.log(`Bulk insert completed: ${totalInserted}/${documents.length} documents in ${totalTime}ms (${throughput} docs/sec)`);
    return summary;
  }

  async bulkUpdateOptimized(collectionName, updateOperations, options = {}) {
    console.log(`Starting bulk update for ${updateOperations.length} operations in ${collectionName}...`);
    const startTime = Date.now();

    const collection = this.collections.get(collectionName);
    if (!collection) {
      throw new Error(`Collection ${collectionName} not found`);
    }

    const {
      batchSize = this.config.batchSize,
      ordered = false,
      writeConcern = { w: 'majority', j: true }
    } = options;

    let totalModified = 0;
    let totalMatched = 0;
    let totalUpserted = 0;
    const results = [];
    const errors = [];

    // Process operations in batches
    for (let i = 0; i < updateOperations.length; i += batchSize) {
      const batch = updateOperations.slice(i, i + batchSize);
      const batchNumber = Math.floor(i / batchSize) + 1;

      try {
        // Build bulk update operation
        const bulkOp = collection.initializeUnorderedBulkOp();

        for (const operation of batch) {
          const { filter, update, options: opOptions = {} } = operation;

          if (opOptions.upsert) {
            bulkOp.find(filter).upsert().updateOne(update);
          } else if (operation.type === 'updateMany') {
            bulkOp.find(filter).update(update);
          } else {
            bulkOp.find(filter).updateOne(update);
          }
        }

        // Execute bulk operation
        const result = await bulkOp.execute({ writeConcern });

        totalModified += result.modifiedCount || 0;
        totalMatched += result.matchedCount || 0;  
        totalUpserted += result.upsertedCount || 0;

        results.push({
          batchNumber,
          matchedCount: result.matchedCount,
          modifiedCount: result.modifiedCount,
          upsertedCount: result.upsertedCount,
          upsertedIds: result.upsertedIds
        });

        const progress = ((i + batch.length) / updateOperations.length * 100).toFixed(1);
        console.log(`Update batch ${batchNumber}: ${result.modifiedCount} modified, ${progress}% complete`);

      } catch (error) {
        console.error(`Update batch ${batchNumber} failed:`, error.message);

        // Handle partial results from bulk write errors
        if (error.result) {
          // Newer drivers expose *Count properties; older ones used nModified/nMatched/nUpserted
          totalModified += error.result.modifiedCount ?? error.result.nModified ?? 0;
          totalMatched += error.result.matchedCount ?? error.result.nMatched ?? 0;
          totalUpserted += error.result.upsertedCount ?? error.result.nUpserted ?? 0;
        }

        errors.push({
          batchNumber,
          error: error.message,
          writeErrors: error.writeErrors || []
        });
      }
    }

    const totalTime = Date.now() - startTime;
    const throughput = Math.round(totalModified / (totalTime / 1000));

    this.updateMetrics('update', totalModified, errors.length, totalTime);

    return {
      success: errors.length === 0,
      totalOperations: updateOperations.length,
      totalMatched: totalMatched,
      totalModified: totalModified, 
      totalUpserted: totalUpserted,
      executionTime: totalTime,
      throughput: throughput,
      results: results,
      errors: errors.length > 0 ? errors : undefined
    };
  }

  async bulkUpsertOptimized(collectionName, upsertOperations, options = {}) {
    console.log(`Starting bulk upsert for ${upsertOperations.length} operations in ${collectionName}...`);

    const {
      batchSize = this.config.batchSize,
      enableDeduplication = true
    } = options;

    // Deduplicate operations based on filter if enabled
    let processedOperations = upsertOperations;
    if (enableDeduplication) {
      processedOperations = this.deduplicateUpsertOperations(upsertOperations);
      console.log(`Deduplicated ${upsertOperations.length} operations to ${processedOperations.length}`);
    }

    // Convert upsert operations to bulk update operations
    const updateOperations = processedOperations.map(op => ({
      filter: op.filter,
      update: op.update,
      options: { upsert: true }
    }));

    return await this.bulkUpdateOptimized(collectionName, updateOperations, {
      ...options,
      batchSize
    });
  }

  async bulkDeleteOptimized(collectionName, deleteFilters, options = {}) {
    console.log(`Starting bulk delete for ${deleteFilters.length} operations in ${collectionName}...`);
    const startTime = Date.now();

    const collection = this.collections.get(collectionName);
    if (!collection) {
      throw new Error(`Collection ${collectionName} not found`);
    }

    const {
      batchSize = this.config.batchSize,
      deleteMany = false,
      writeConcern = { w: 'majority', j: true }
    } = options;

    let totalDeleted = 0;
    const results = [];
    const errors = [];

    for (let i = 0; i < deleteFilters.length; i += batchSize) {
      const batch = deleteFilters.slice(i, i + batchSize);
      const batchNumber = Math.floor(i / batchSize) + 1;

      try {
        const bulkOp = collection.initializeUnorderedBulkOp();

        for (const filter of batch) {
          if (deleteMany) {
            bulkOp.find(filter).delete();
          } else {
            bulkOp.find(filter).deleteOne();
          }
        }

        const result = await bulkOp.execute({ writeConcern });
        const deletedCount = result.deletedCount || 0;

        totalDeleted += deletedCount;
        results.push({
          batchNumber,
          deletedCount
        });

        const progress = ((i + batch.length) / deleteFilters.length * 100).toFixed(1);
        console.log(`Delete batch ${batchNumber}: ${deletedCount} deleted, ${progress}% complete`);

      } catch (error) {
        console.error(`Delete batch ${batchNumber} failed:`, error.message);
        errors.push({
          batchNumber,
          error: error.message
        });
      }
    }

    const totalTime = Date.now() - startTime;
    const throughput = Math.round(totalDeleted / (totalTime / 1000));

    this.updateMetrics('delete', totalDeleted, errors.length, totalTime);

    return {
      success: errors.length === 0,
      totalOperations: deleteFilters.length,
      totalDeleted: totalDeleted,
      executionTime: totalTime,
      throughput: throughput,
      results: results,
      errors: errors.length > 0 ? errors : undefined
    };
  }

  async mixedBulkOperations(collectionName, operations, options = {}) {
    console.log(`Starting mixed bulk operations: ${operations.length} operations in ${collectionName}...`);
    const startTime = Date.now();

    const collection = this.collections.get(collectionName);
    const {
      batchSize = this.config.batchSize,
      writeConcern = { w: 'majority', j: true }
    } = options;

    let totalInserted = 0;
    let totalModified = 0;
    let totalDeleted = 0;
    let totalUpserted = 0;
    const results = [];
    const errors = [];

    for (let i = 0; i < operations.length; i += batchSize) {
      const batch = operations.slice(i, i + batchSize);
      const batchNumber = Math.floor(i / batchSize) + 1;

      try {
        const bulkOp = collection.initializeUnorderedBulkOp();

        for (const operation of batch) {
          switch (operation.type) {
            case 'insert':
              bulkOp.insert(operation.document);
              break;

            case 'update':
              bulkOp.find(operation.filter).updateOne(operation.update);
              break;

            case 'updateMany':
              bulkOp.find(operation.filter).update(operation.update);
              break;

            case 'upsert':
              bulkOp.find(operation.filter).upsert().updateOne(operation.update);
              break;

            case 'delete':
              bulkOp.find(operation.filter).deleteOne();
              break;

            case 'deleteMany':
              bulkOp.find(operation.filter).delete();
              break;

            case 'replace':
              bulkOp.find(operation.filter).replaceOne(operation.replacement);
              break;

            default:
              throw new Error(`Unknown operation type: ${operation.type}`);
          }
        }

        const result = await bulkOp.execute({ writeConcern });

        totalInserted += result.insertedCount || 0;
        totalModified += result.modifiedCount || 0;
        totalDeleted += result.deletedCount || 0;
        totalUpserted += result.upsertedCount || 0;

        results.push({
          batchNumber,
          insertedCount: result.insertedCount,
          modifiedCount: result.modifiedCount,
          deletedCount: result.deletedCount,
          upsertedCount: result.upsertedCount,
          matchedCount: result.matchedCount
        });

        const progress = ((i + batch.length) / operations.length * 100).toFixed(1);
        console.log(`Mixed batch ${batchNumber}: ${batch.length} operations completed, ${progress}% complete`);

      } catch (error) {
        console.error(`Mixed batch ${batchNumber} failed:`, error.message);
        errors.push({
          batchNumber,
          error: error.message,
          writeErrors: error.writeErrors || []
        });
      }
    }

    const totalTime = Date.now() - startTime;
    const totalOperations = totalInserted + totalModified + totalDeleted + totalUpserted;
    const throughput = Math.round(totalOperations / (totalTime / 1000));

    this.updateMetrics('mixed', totalOperations, errors.length, totalTime);

    return {
      success: errors.length === 0,
      totalOperations: operations.length,
      executionSummary: {
        inserted: totalInserted,
        modified: totalModified,
        deleted: totalDeleted,
        upserted: totalUpserted
      },
      executionTime: totalTime,
      throughput: throughput,
      results: results,
      errors: errors.length > 0 ? errors : undefined
    };
  }

  async parallelBulkProcessing(collectionName, operationBatches, operationType = 'insert') {
    console.log(`Starting parallel bulk processing: ${operationBatches.length} batches in ${collectionName}...`);
    const startTime = Date.now();

    const maxParallel = Math.min(this.config.maxParallelBatches, operationBatches.length);
    const results = [];
    const errors = [];

    // Process batches in parallel with controlled concurrency
    const processBatch = async (batch, batchIndex) => {
      try {
        let result;
        switch (operationType) {
          case 'insert':
            result = await this.bulkInsertOptimized(collectionName, batch, {
              enableProgressReporting: false
            });
            break;
          case 'update':
            result = await this.bulkUpdateOptimized(collectionName, batch, {});
            break;
          case 'delete':
            result = await this.bulkDeleteOptimized(collectionName, batch, {});
            break;
          default:
            throw new Error(`Unsupported parallel operation type: ${operationType}`);
        }

        return { batchIndex, result, success: true };
      } catch (error) {
        console.error(`Parallel batch ${batchIndex} failed:`, error.message);
        return { batchIndex, error: error.message, success: false };
      }
    };

    // Execute batches with controlled parallelism
    for (let i = 0; i < operationBatches.length; i += maxParallel) {
      const parallelBatch = operationBatches.slice(i, i + maxParallel);
      const promises = parallelBatch.map((batch, index) => processBatch(batch, i + index));

      const batchResults = await Promise.all(promises);

      for (const batchResult of batchResults) {
        if (batchResult.success) {
          results.push(batchResult.result);
        } else {
          errors.push(batchResult);
        }
      }

      const progress = ((i + parallelBatch.length) / operationBatches.length * 100).toFixed(1);
      console.log(`Parallel processing: ${progress}% complete`);
    }

    const totalTime = Date.now() - startTime;

    // Aggregate results
    const aggregatedResult = {
      success: errors.length === 0,
      totalBatches: operationBatches.length,
      successfulBatches: results.length,
      failedBatches: errors.length,
      executionTime: totalTime,
      results: results,
      errors: errors.length > 0 ? errors : undefined
    };

    // Calculate total operations processed
    let totalOperations = 0;
    for (const result of results) {
      totalOperations += result.totalInserted || result.totalModified || result.totalDeleted || 0;
    }

    aggregatedResult.totalOperations = totalOperations;
    aggregatedResult.throughput = Math.round(totalOperations / (totalTime / 1000));

    console.log(`Parallel bulk processing completed: ${totalOperations} operations in ${totalTime}ms (${aggregatedResult.throughput} ops/sec)`);
    return aggregatedResult;
  }

  async validateDocumentBatch(documents, collectionName) {
    // Basic document validation
    const requiredFields = this.getRequiredFields(collectionName);

    for (const doc of documents) {
      for (const field of requiredFields) {
        if (doc[field] === undefined || doc[field] === null) {
          throw new Error(`Required field '${field}' missing in document`);
        }
      }
    }

    return true;
  }

  getRequiredFields(collectionName) {
    const fieldMap = {
      'user_transactions': ['userId', 'amount', 'transactionType'],
      'user_profiles': ['userId', 'email'],
      'product_analytics': ['productId', 'eventType', 'eventDate'],
      'audit_logs': ['entityId', 'action', 'timestamp']
    };

    return fieldMap[collectionName] || [];
  }

  deduplicateUpsertOperations(operations) {
    const seen = new Map();
    const deduplicated = [];

    for (const operation of operations) {
      const filterKey = JSON.stringify(operation.filter);
      if (!seen.has(filterKey)) {
        seen.set(filterKey, true);
        deduplicated.push(operation);
      }
    }

    return deduplicated;
  }

  updateMetrics(operationType, successCount, errorCount, executionTime) {
    if (!this.config.enableMetrics) return;

    this.metrics.totalOperations += successCount + errorCount;
    this.metrics.successfulOperations += successCount;
    this.metrics.failedOperations += errorCount;
    this.metrics.batchesProcessed++;
    this.metrics.processingTime += executionTime;

    const throughput = Math.round(successCount / (executionTime / 1000));
    this.metrics.throughputHistory.push({
      timestamp: new Date(),
      operationType,
      throughput,
      successCount,
      errorCount
    });

    // Keep only last 100 throughput measurements
    if (this.metrics.throughputHistory.length > 100) {
      this.metrics.throughputHistory.shift();
    }
  }

  getPerformanceMetrics() {
    const recentThroughput = this.metrics.throughputHistory.slice(-10);
    const avgThroughput = recentThroughput.length > 0 
      ? Math.round(recentThroughput.reduce((sum, t) => sum + t.throughput, 0) / recentThroughput.length)
      : 0;

    return {
      totalOperations: this.metrics.totalOperations,
      successfulOperations: this.metrics.successfulOperations,
      failedOperations: this.metrics.failedOperations,
      successRate: this.metrics.totalOperations > 0 
        ? ((this.metrics.successfulOperations / this.metrics.totalOperations) * 100).toFixed(2) + '%'
        : '0%',
      batchesProcessed: this.metrics.batchesProcessed,
      totalProcessingTime: this.metrics.processingTime,
      averageThroughput: avgThroughput,
      recentThroughputHistory: recentThroughput
    };
  }

  async shutdown() {
    console.log('Shutting down bulk processor...');

    // Wait for active operations to complete
    if (this.activeOperations.size > 0) {
      console.log(`Waiting for ${this.activeOperations.size} active operations to complete...`);
      await Promise.allSettled(Array.from(this.activeOperations));
    }

    console.log('Bulk processor shutdown complete');
    console.log('Final Performance Metrics:', this.getPerformanceMetrics());
  }
}

// Example usage and demonstration
const demonstrateBulkOperations = async () => {
  try {
    const processor = new HighThroughputBulkProcessor(db, {
      batchSize: 5000,
      maxParallelBatches: 3,
      enableMetrics: true
    });

    // Initialize collections
    await processor.setupOptimizedCollections();

    // Generate sample data for demonstration
    const sampleTransactions = Array.from({ length: 50000 }, (_, index) => ({
      _id: new ObjectId(),
      userId: `user_${Math.floor(Math.random() * 10000)}`,
      amount: Math.round((Math.random() * 1000 + 10) * 100) / 100,
      transactionType: ['purchase', 'refund', 'transfer'][Math.floor(Math.random() * 3)],
      transactionDate: new Date(Date.now() - Math.random() * 30 * 24 * 60 * 60 * 1000),
      metadata: {
        category: ['electronics', 'clothing', 'books', 'food'][Math.floor(Math.random() * 4)],
        channel: ['web', 'mobile', 'api'][Math.floor(Math.random() * 3)]
      },
      createdAt: new Date()
    }));

    // Bulk insert demonstration
    console.log('\n=== Bulk Insert Demonstration ===');
    const insertResult = await processor.bulkInsertOptimized('user_transactions', sampleTransactions);
    console.log('Insert Result:', insertResult);

    // Bulk update demonstration
    console.log('\n=== Bulk Update Demonstration ===');
    const updateOperations = Array.from({ length: 10000 }, (_, index) => ({
      filter: { userId: `user_${index % 1000}` },
      update: { 
        $inc: { totalSpent: Math.round(Math.random() * 100) },
        $set: { lastUpdated: new Date() }
      },
      type: 'updateMany'
    }));

    const updateResult = await processor.bulkUpdateOptimized('user_transactions', updateOperations);
    console.log('Update Result:', updateResult);

    // Mixed operations demonstration
    console.log('\n=== Mixed Operations Demonstration ===');
    const mixedOperations = [
      // Insert operations
      ...Array.from({ length: 1000 }, (_, index) => ({
        type: 'insert',
        document: {
          userId: `new_user_${index}`,
          email: `newuser${index}@example.com`,
          createdAt: new Date()
        }
      })),

      // Update operations
      ...Array.from({ length: 500 }, (_, index) => ({
        type: 'update',
        filter: { userId: `user_${index}` },
        update: { $set: { status: 'active', lastLogin: new Date() } }
      })),

      // Upsert operations
      ...Array.from({ length: 300 }, (_, index) => ({
        type: 'upsert',
        filter: { email: `upsert${index}@example.com` },
        update: { 
          $set: { 
            email: `upsert${index}@example.com`,
            status: 'new'
          },
          $setOnInsert: { createdAt: new Date() }
        }
      }))
    ];

    const mixedResult = await processor.mixedBulkOperations('user_profiles', mixedOperations);
    console.log('Mixed Operations Result:', mixedResult);

    // Performance metrics
    console.log('\n=== Performance Metrics ===');
    console.log(processor.getPerformanceMetrics());

    return processor;

  } catch (error) {
    console.error('Bulk operations demonstration failed:', error);
    throw error;
  }
};

// Benefits of MongoDB Bulk Operations:
// - Dramatically improved throughput (10x-100x faster than individual operations)
// - Reduced network overhead with batch processing
// - Built-in error handling for partial failures
// - Flexible operation mixing (insert, update, delete in same batch)
// - Automatic optimization and connection pooling
// - Support for ordered and unordered operations
// - Native upsert capabilities with conflict resolution
// - Comprehensive result reporting and metrics
// - Memory-efficient processing of large datasets
// - Integration with MongoDB's write concerns and read preferences

module.exports = {
  HighThroughputBulkProcessor,
  demonstrateBulkOperations
};
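
To run the demonstration end to end, the client has to be connected before any of the bulk methods are called. A minimal runner, assuming the local connection string defined at the top of this example, might look like this:

// Minimal runner sketch: connect, run the demonstration, then clean up
const runDemo = async () => {
  await client.connect();
  try {
    const processor = await demonstrateBulkOperations();
    await processor.shutdown();
  } finally {
    await client.close();
  }
};

runDemo().catch(error => {
  console.error('Demo run failed:', error);
  process.exit(1);
});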

Understanding MongoDB Bulk Operation Performance Patterns

Advanced Bulk Processing Strategies

Implement sophisticated bulk processing patterns for different high-volume scenarios:

// Advanced bulk processing patterns and optimization strategies
class ProductionBulkProcessor extends HighThroughputBulkProcessor {
  constructor(db, config = {}) {
    super(db, config);
    this.processingQueue = [];
    this.workerPool = [];
    this.compressionEnabled = config.compressionEnabled || false;
    this.retryQueue = [];
    this.deadLetterQueue = [];
  }

  async setupStreamingBulkProcessor() {
    console.log('Setting up streaming bulk processor for continuous data ingestion...');

    const streamProcessor = {
      bufferSize: this.config.streamBufferSize || 1000,
      flushInterval: this.config.streamFlushInterval || 5000, // 5 seconds
      buffer: [],
      lastFlush: Date.now(),
      totalProcessed: 0
    };

    // Streaming insert processor
    const streamingInsert = async (documents, collectionName) => {
      streamProcessor.buffer.push(...documents);

      const shouldFlush = streamProcessor.buffer.length >= streamProcessor.bufferSize ||
                         (Date.now() - streamProcessor.lastFlush) >= streamProcessor.flushInterval;

      if (shouldFlush && streamProcessor.buffer.length > 0) {
        const toProcess = [...streamProcessor.buffer];
        streamProcessor.buffer = [];
        streamProcessor.lastFlush = Date.now();

        try {
          const result = await this.bulkInsertOptimized(collectionName, toProcess, {
            enableProgressReporting: false
          });

          streamProcessor.totalProcessed += result.totalInserted;
          console.log(`Streamed ${result.totalInserted} documents, total: ${streamProcessor.totalProcessed}`);

          return result;
        } catch (error) {
          console.error('Streaming insert error:', error);
          // Add failed documents to retry queue
          this.retryQueue.push(...toProcess);
        }
      }
    };

    // Automatic flushing interval
    const flushInterval = setInterval(async () => {
      if (streamProcessor.buffer.length > 0) {
        await streamingInsert([], 'user_transactions'); // Flush buffer
      }
    }, streamProcessor.flushInterval);

    return {
      streamingInsert,
      getStats: () => ({
        bufferSize: streamProcessor.buffer.length,
        totalProcessed: streamProcessor.totalProcessed,
        lastFlush: streamProcessor.lastFlush
      }),
      shutdown: () => {
        clearInterval(flushInterval);
        return streamingInsert([], 'user_transactions'); // Final flush
      }
    };
  }

  async setupPriorityQueueProcessor() {
    console.log('Setting up priority queue bulk processor...');

    const priorityLevels = {
      CRITICAL: 1,
      HIGH: 2, 
      NORMAL: 3,
      LOW: 4
    };

    const priorityQueues = new Map();
    Object.values(priorityLevels).forEach(level => {
      priorityQueues.set(level, []);
    });

    const processQueue = async () => {
      // Process queues by priority
      for (const [priority, queue] of priorityQueues.entries()) {
        if (queue.length === 0) continue;

        const batchSize = this.calculateDynamicBatchSize(priority, queue.length);
        const batch = queue.splice(0, batchSize);

        if (batch.length > 0) {
          try {
            await this.processPriorityBatch(batch, priority);
          } catch (error) {
            console.error(`Priority ${priority} batch failed:`, error);
            // Re-queue with lower priority or move to retry queue
            this.handlePriorityFailure(batch, priority);
          }
        }
      }
    };

    // Start priority processor
    const processorInterval = setInterval(processQueue, 1000);

    return {
      addToPriorityQueue: (operations, priority = priorityLevels.NORMAL) => {
        if (!priorityQueues.has(priority)) {
          throw new Error(`Invalid priority level: ${priority}`);
        }
        priorityQueues.get(priority).push(...operations);
      },
      getQueueStats: () => {
        const stats = {};
        for (const [priority, queue] of priorityQueues.entries()) {
          stats[`priority_${priority}`] = queue.length;
        }
        return stats;
      },
      shutdown: () => clearInterval(processorInterval)
    };
  }

  calculateDynamicBatchSize(priority, queueLength) {
    // Adjust batch size based on priority and queue length
    const baseBatchSize = this.config.batchSize;

    switch (priority) {
      case 1: // CRITICAL - smaller batches for faster processing
        return Math.min(baseBatchSize / 2, queueLength);
      case 2: // HIGH - normal batch size
        return Math.min(baseBatchSize, queueLength);
      case 3: // NORMAL - larger batches for efficiency
        return Math.min(baseBatchSize * 1.5, queueLength);
      case 4: // LOW - maximum batch size for throughput
        return Math.min(baseBatchSize * 2, queueLength);
      default:
        return Math.min(baseBatchSize, queueLength);
    }
  }
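
  // The priority queue processor above calls processPriorityBatch() and
  // handlePriorityFailure(), which are not defined elsewhere. The versions
  // below are minimal sketches under the assumption that priority items are
  // mixed operations destined for the user_transactions collection.
  async processPriorityBatch(batch, priority) {
    // Reuse the generic mixed-operation path with a priority-adjusted batch size
    return await this.mixedBulkOperations('user_transactions', batch, {
      batchSize: Math.max(1, Math.floor(this.calculateDynamicBatchSize(priority, batch.length)))
    });
  }

  handlePriorityFailure(batch, priority) {
    // Minimal sketch: park low-priority failures in the dead-letter queue,
    // everything else in the retry queue for later reprocessing
    if (priority >= 4) {
      this.deadLetterQueue.push(...batch);
    } else {
      this.retryQueue.push(...batch);
    }
  }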

  async setupAdaptiveBatchSizing() {
    console.log('Setting up adaptive batch sizing system...');

    const adaptiveConfig = {
      minBatchSize: 100,
      maxBatchSize: 20000,
      targetLatency: 2000, // 2 seconds
      adjustmentFactor: 0.1,
      performanceHistory: []
    };

    const adjustBatchSize = (currentSize, latency, throughput) => {
      let newSize = currentSize;

      if (latency > adaptiveConfig.targetLatency) {
        // Latency too high - reduce batch size
        newSize = Math.max(
          adaptiveConfig.minBatchSize,
          Math.floor(currentSize * (1 - adaptiveConfig.adjustmentFactor))
        );
      } else if (latency < adaptiveConfig.targetLatency * 0.5) {
        // Latency good - try increasing batch size for better throughput
        newSize = Math.min(
          adaptiveConfig.maxBatchSize,
          Math.floor(currentSize * (1 + adaptiveConfig.adjustmentFactor))
        );
      }

      adaptiveConfig.performanceHistory.push({
        timestamp: new Date(),
        batchSize: currentSize,
        latency,
        throughput,
        newBatchSize: newSize
      });

      // Keep only last 50 measurements
      if (adaptiveConfig.performanceHistory.length > 50) {
        adaptiveConfig.performanceHistory.shift();
      }

      return newSize;
    };

    return {
      getOptimalBatchSize: (operationType, currentLatency, currentThroughput) => {
        return adjustBatchSize(this.config.batchSize, currentLatency, currentThroughput);
      },
      getPerformanceHistory: () => adaptiveConfig.performanceHistory,
      getConfig: () => adaptiveConfig
    };
  }

  async setupCompressedBulkOperations() {
    console.log('Setting up compressed bulk operations...');

    if (!this.compressionEnabled) {
      console.log('Compression not enabled, skipping setup');
      return null;
    }

    const zlib = require('zlib');

    const compressDocuments = async (documents) => {
      const serialized = JSON.stringify(documents);
      return new Promise((resolve, reject) => {
        zlib.gzip(serialized, (error, compressed) => {
          if (error) reject(error);
          else resolve(compressed);
        });
      });
    };

    const decompressDocuments = async (compressed) => {
      return new Promise((resolve, reject) => {
        zlib.gunzip(compressed, (error, decompressed) => {
          if (error) reject(error);
          else resolve(JSON.parse(decompressed.toString()));
        });
      });
    };

    return {
      compressedBulkInsert: async (collectionName, documents) => {
        const startCompress = Date.now();
        const compressed = await compressDocuments(documents);
        const compressTime = Date.now() - startCompress;

        const originalSize = JSON.stringify(documents).length;
        const compressedSize = compressed.length;
        const compressionRatio = (compressedSize / originalSize * 100).toFixed(2);

        console.log(`Compression: ${originalSize} -> ${compressedSize} bytes (${compressionRatio}%) in ${compressTime}ms`);

        // For demo - in practice you'd send compressed data to a queue or storage
        const decompressed = await decompressDocuments(compressed);
        return await this.bulkInsertOptimized(collectionName, decompressed);
      },

      compressionStats: (documents) => {
        const originalSize = JSON.stringify(documents).length;
        return {
          originalSizeBytes: originalSize,
          estimatedCompressionRatio: '60-80%', // typical size reduction for JSON payloads
          potentialSavings: `${Math.round(originalSize * 0.7)} bytes` // assuming ~70% reduction
        };
      }
    };
  }

  async setupRetryMechanism() {
    console.log('Setting up bulk operation retry mechanism...');

    const retryConfig = {
      maxRetries: 3,
      backoffMultiplier: 2,
      baseDelay: 1000,
      maxDelay: 30000,
      retryableErrors: [
        'NetworkTimeout',
        'ConnectionPoolClosed',
        'WriteConcernError'
      ]
    };

    const isRetryableError = (error) => {
      return retryConfig.retryableErrors.some(retryableError => 
        error.message.includes(retryableError)
      );
    };

    const calculateDelay = (attempt) => {
      const delay = retryConfig.baseDelay * Math.pow(retryConfig.backoffMultiplier, attempt - 1);
      return Math.min(delay, retryConfig.maxDelay);
    };

    const retryOperation = async (operation, attempt = 1) => {
      try {
        return await operation();
      } catch (error) {
        if (attempt >= retryConfig.maxRetries || !isRetryableError(error)) {
          console.error(`Operation failed after ${attempt} attempts:`, error.message);
          throw error;
        }

        const delay = calculateDelay(attempt);
        console.log(`Retry attempt ${attempt}/${retryConfig.maxRetries} after ${delay}ms delay`);

        await new Promise(resolve => setTimeout(resolve, delay));
        return await retryOperation(operation, attempt + 1);
      }
    };

    return {
      retryBulkOperation: retryOperation,
      isRetryable: isRetryableError,
      getRetryConfig: () => retryConfig
    };
  }

  async setupBulkOperationMonitoring() {
    console.log('Setting up comprehensive bulk operation monitoring...');

    const monitoring = {
      activeOperations: new Map(),
      operationHistory: [],
      alerts: [],
      thresholds: {
        maxLatency: 10000, // 10 seconds
        minThroughput: 1000, // 1000 ops/sec
        maxErrorRate: 0.05 // 5%
      }
    };

    const trackOperation = (operationId, operationType, startTime) => {
      monitoring.activeOperations.set(operationId, {
        type: operationType,
        startTime,
        status: 'running'
      });
    };

    const completeOperation = (operationId, result) => {
      const operation = monitoring.activeOperations.get(operationId);
      if (!operation) return;

      const endTime = Date.now();
      const duration = endTime - operation.startTime;

      const historyEntry = {
        operationId,
        type: operation.type,
        duration,
        success: result.success,
        throughput: result.throughput,
        errorCount: result.errors?.length || 0,
        timestamp: new Date(endTime)
      };

      monitoring.operationHistory.push(historyEntry);
      monitoring.activeOperations.delete(operationId);

      // Keep only last 1000 history entries
      if (monitoring.operationHistory.length > 1000) {
        monitoring.operationHistory.shift();
      }

      // Check for alerts
      this.checkPerformanceAlerts(historyEntry, monitoring);
    };

    return {
      trackOperation,
      completeOperation,
      getActiveOperations: () => Array.from(monitoring.activeOperations.entries()),
      getOperationHistory: () => monitoring.operationHistory,
      getAlerts: () => monitoring.alerts,
      getPerformanceSummary: () => this.generatePerformanceSummary(monitoring)
    };
  }

  checkPerformanceAlerts(operation, monitoring) {
    const alerts = [];

    // Latency alert
    if (operation.duration > monitoring.thresholds.maxLatency) {
      alerts.push({
        type: 'HIGH_LATENCY',
        message: `Operation ${operation.operationId} took ${operation.duration}ms (threshold: ${monitoring.thresholds.maxLatency}ms)`,
        severity: 'warning',
        timestamp: new Date()
      });
    }

    // Throughput alert
    if (operation.throughput < monitoring.thresholds.minThroughput) {
      alerts.push({
        type: 'LOW_THROUGHPUT',
        message: `Operation ${operation.operationId} achieved ${operation.throughput} ops/sec (threshold: ${monitoring.thresholds.minThroughput} ops/sec)`,
        severity: 'warning',
        timestamp: new Date()
      });
    }

    // Error rate alert
    const recentOperations = monitoring.operationHistory.slice(-10);
    const errorRate = recentOperations.reduce((sum, op) => sum + (op.success ? 0 : 1), 0) / recentOperations.length;

    if (errorRate > monitoring.thresholds.maxErrorRate) {
      alerts.push({
        type: 'HIGH_ERROR_RATE',
        message: `Error rate ${(errorRate * 100).toFixed(1)}% exceeds threshold ${monitoring.thresholds.maxErrorRate * 100}%`,
        severity: 'critical',
        timestamp: new Date()
      });
    }

    monitoring.alerts.push(...alerts);

    // Keep only last 100 alerts
    if (monitoring.alerts.length > 100) {
      monitoring.alerts.splice(0, monitoring.alerts.length - 100);
    }
  }

  generatePerformanceSummary(monitoring) {
    const recentOperations = monitoring.operationHistory.slice(-50);

    if (recentOperations.length === 0) {
      return { message: 'No recent operations' };
    }

    const avgDuration = recentOperations.reduce((sum, op) => sum + op.duration, 0) / recentOperations.length;
    const avgThroughput = recentOperations.reduce((sum, op) => sum + op.throughput, 0) / recentOperations.length;
    const successRate = recentOperations.reduce((sum, op) => sum + (op.success ? 1 : 0), 0) / recentOperations.length;

    return {
      recentOperations: recentOperations.length,
      averageDuration: Math.round(avgDuration),
      averageThroughput: Math.round(avgThroughput),
      successRate: (successRate * 100).toFixed(1) + '%',
      activeOperations: monitoring.activeOperations.size,
      recentAlerts: monitoring.alerts.filter(alert => 
        Date.now() - alert.timestamp.getTime() < 300000 // Last 5 minutes
      ).length
    };
  }

  async demonstrateAdvancedPatterns() {
    console.log('\n=== Advanced Bulk Processing Patterns Demo ===');

    try {
      // Setup advanced processors
      const streamProcessor = await this.setupStreamingBulkProcessor();
      const priorityProcessor = await this.setupPriorityQueueProcessor();
      const adaptiveSizing = await this.setupAdaptiveBatchSizing();
      const retryMechanism = await this.setupRetryMechanism();
      const monitoring = await this.setupBulkOperationMonitoring();

      // Demo streaming processing
      console.log('\n--- Streaming Processing Demo ---');
      const streamingData = Array.from({ length: 2500 }, (_, i) => ({
        _id: new ObjectId(),
        streamId: `stream_${i}`,
        data: `streaming_data_${i}`,
        timestamp: new Date()
      }));

      // Add data to stream in chunks
      for (let i = 0; i < streamingData.length; i += 300) {
        const chunk = streamingData.slice(i, i + 300);
        await streamProcessor.streamingInsert(chunk, 'user_transactions');
        await new Promise(resolve => setTimeout(resolve, 100)); // Simulate streaming delay
      }

      console.log('Streaming stats:', streamProcessor.getStats());

      // Demo priority processing
      console.log('\n--- Priority Processing Demo ---');
      const criticalOps = Array.from({ length: 50 }, (_, i) => ({
        type: 'insert',
        document: { priority: 'critical', id: i, timestamp: new Date() }
      }));

      const normalOps = Array.from({ length: 200 }, (_, i) => ({
        type: 'insert', 
        document: { priority: 'normal', id: i + 1000, timestamp: new Date() }
      }));

      priorityProcessor.addToPriorityQueue(criticalOps, 1); // CRITICAL
      priorityProcessor.addToPriorityQueue(normalOps, 3);   // NORMAL

      await new Promise(resolve => setTimeout(resolve, 3000)); // Let priority processor work
      console.log('Priority queue stats:', priorityProcessor.getQueueStats());

      // Cleanup
      await streamProcessor.shutdown();
      priorityProcessor.shutdown();

      return {
        streamingDemo: streamProcessor.getStats(),
        priorityDemo: priorityProcessor.getQueueStats(),
        adaptiveConfig: adaptiveSizing.getConfig()
      };

    } catch (error) {
      console.error('Advanced patterns demo failed:', error);
      throw error;
    }
  }
}
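
A brief usage sketch for the production processor, assuming the client from the earlier example has already been connected:

// Usage sketch for ProductionBulkProcessor (assumes client.connect() was called)
const runAdvancedBulkDemo = async () => {
  const processor = new ProductionBulkProcessor(db, {
    batchSize: 5000,
    maxParallelBatches: 3,
    compressionEnabled: true
  });

  await processor.setupOptimizedCollections();
  const summary = await processor.demonstrateAdvancedPatterns();
  console.log('Advanced patterns summary:', summary);

  await processor.shutdown();
};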

SQL-Style Bulk Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB bulk operations:

-- QueryLeaf bulk operations with SQL-familiar syntax

-- Bulk INSERT operations
INSERT INTO user_transactions 
SELECT 
    UUID() as _id,
    user_id,
    transaction_amount as amount,
    transaction_type,
    transaction_date,
    JSON_BUILD_OBJECT(
        'category', category,
        'channel', channel,
        'location', location
    ) as metadata,
    CURRENT_TIMESTAMP as created_at
FROM staging_transactions
WHERE processing_status = 'pending'
LIMIT 100000;

-- Batch configuration for optimal performance
SET BULK_INSERT_BATCH_SIZE = 10000;
SET BULK_INSERT_ORDERED = false;
SET BULK_INSERT_WRITE_CONCERN = JSON_BUILD_OBJECT('w', 'majority', 'j', true);

-- Advanced bulk INSERT with conflict resolution
INSERT INTO user_profiles (user_id, email, profile_data, created_at)
SELECT 
    user_id,
    email,
    JSON_BUILD_OBJECT(
        'first_name', first_name,
        'last_name', last_name,
        'preferences', preferences,
        'registration_source', source
    ) as profile_data,
    registration_date as created_at
FROM user_registrations
WHERE processed = false
ON DUPLICATE KEY UPDATE 
    profile_data = JSON_MERGE_PATCH(profile_data, VALUES(profile_data)),
    last_updated = CURRENT_TIMESTAMP,
    update_count = COALESCE(update_count, 0) + 1;

-- Bulk UPDATE operations with complex conditions
UPDATE user_transactions 
SET 
    status = CASE 
        WHEN amount > 1000 AND transaction_type = 'purchase' THEN 'requires_approval'
        WHEN transaction_date < CURRENT_DATE - INTERVAL '30 days' THEN 'archived'
        WHEN metadata->>'$.category' = 'refund' THEN 'processed'
        ELSE 'completed'
    END,
    processing_fee = CASE
        WHEN amount > 500 THEN amount * 0.025
        WHEN amount > 100 THEN amount * 0.035
        ELSE amount * 0.05
    END,
    risk_score = CASE
        WHEN user_id IN (SELECT user_id FROM high_risk_users) THEN 100
        WHEN amount > 2000 THEN 75
        WHEN metadata->>'$.channel' = 'api' THEN 50
        ELSE 25
    END,
    last_updated = CURRENT_TIMESTAMP
WHERE status = 'pending'
    AND created_at >= CURRENT_DATE - INTERVAL '7 days'
BULK_OPTIONS (
    batch_size = 5000,
    ordered = false,
    write_concern = JSON_BUILD_OBJECT('w', 'majority')
);

-- Bulk UPSERT operations for data synchronization
UPSERT INTO user_analytics (
    user_id,
    daily_stats,
    calculation_date,
    last_updated
)
WITH daily_calculations AS (
    SELECT 
        user_id,
        DATE_TRUNC('day', transaction_date) as calculation_date,

        -- Aggregate daily statistics
        JSON_BUILD_OBJECT(
            'total_transactions', COUNT(*),
            'total_amount', SUM(amount),
            'avg_transaction', ROUND(AVG(amount)::NUMERIC, 2),
            'transaction_types', JSON_AGG(DISTINCT transaction_type),
            'categories', JSON_AGG(DISTINCT metadata->>'$.category'),
            'channels', JSON_AGG(DISTINCT metadata->>'$.channel'),

            -- Advanced metrics
            'largest_transaction', MAX(amount),
            'smallest_transaction', MIN(amount),
            'morning_transactions', COUNT(*) FILTER (WHERE EXTRACT(HOUR FROM transaction_date) BETWEEN 6 AND 11),
            'afternoon_transactions', COUNT(*) FILTER (WHERE EXTRACT(HOUR FROM transaction_date) BETWEEN 12 AND 17),
            'evening_transactions', COUNT(*) FILTER (WHERE EXTRACT(HOUR FROM transaction_date) BETWEEN 18 AND 23),
            'night_transactions', COUNT(*) FILTER (WHERE EXTRACT(HOUR FROM transaction_date) BETWEEN 0 AND 5),

            -- Spending patterns
            'high_value_transactions', COUNT(*) FILTER (WHERE amount > 500),
            'refund_count', COUNT(*) FILTER (WHERE transaction_type = 'refund'),
            'refund_amount', SUM(amount) FILTER (WHERE transaction_type = 'refund')
        ) as daily_stats

    FROM user_transactions
    WHERE transaction_date >= CURRENT_DATE - INTERVAL '30 days'
        AND status = 'completed'
    GROUP BY user_id, DATE_TRUNC('day', transaction_date)
)
SELECT 
    user_id,
    daily_stats,
    calculation_date,
    CURRENT_TIMESTAMP as last_updated
FROM daily_calculations
ON CONFLICT (user_id, calculation_date) DO UPDATE
SET 
    daily_stats = EXCLUDED.daily_stats,
    last_updated = EXCLUDED.last_updated,
    version = COALESCE(version, 0) + 1
BULK_OPTIONS (
    batch_size = 2000,
    ordered = false,
    enable_upsert = true
);

-- High-performance bulk DELETE with cascading
DELETE FROM user_sessions us
WHERE us.session_id IN (
    SELECT session_id 
    FROM expired_sessions 
    WHERE expiry_date < CURRENT_TIMESTAMP - INTERVAL '7 days'
)
BULK_OPTIONS (
    batch_size = 10000,
    cascade_delete = JSON_ARRAY(
        JSON_BUILD_OBJECT('collection', 'user_activity_logs', 'field', 'session_id'),
        JSON_BUILD_OBJECT('collection', 'session_analytics', 'field', 'session_id')
    )
);

-- Mixed bulk operations in single transaction
START BULK_TRANSACTION;

-- Insert new user activities
INSERT INTO user_activities (user_id, activity_type, activity_data, timestamp)
SELECT 
    user_id,
    'login' as activity_type,
    JSON_BUILD_OBJECT(
        'ip_address', ip_address,
        'user_agent', user_agent,
        'location', location
    ) as activity_data,
    login_timestamp as timestamp
FROM pending_logins
WHERE processed = false;

-- Update user login statistics
UPDATE user_profiles 
SET 
    last_login_date = (
        SELECT MAX(timestamp) 
        FROM user_activities ua 
        WHERE ua.user_id = user_profiles.user_id 
            AND ua.activity_type = 'login'
    ),
    login_count = COALESCE(login_count, 0) + (
        SELECT COUNT(*) 
        FROM pending_logins pl 
        WHERE pl.user_id = user_profiles.user_id 
            AND pl.processed = false
    ),
    profile_updated_at = CURRENT_TIMESTAMP
WHERE user_id IN (SELECT DISTINCT user_id FROM pending_logins WHERE processed = false);

-- Mark source data as processed
UPDATE pending_logins 
SET 
    processed = true,
    processed_at = CURRENT_TIMESTAMP
WHERE processed = false;

COMMIT BULK_TRANSACTION
WITH ROLLBACK_ON_ERROR = true;

-- Advanced bulk operation monitoring and analytics
WITH bulk_operation_metrics AS (
    SELECT 
        operation_type,
        collection_name,
        DATE_TRUNC('hour', operation_timestamp) as hour_bucket,

        -- Volume metrics
        COUNT(*) as operation_count,
        SUM(documents_processed) as total_documents,
        SUM(batch_count) as total_batches,
        AVG(documents_processed) as avg_documents_per_operation,

        -- Performance metrics
        AVG(execution_time_ms) as avg_execution_time,
        PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY execution_time_ms) as median_execution_time,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as p95_execution_time,
        MAX(execution_time_ms) as max_execution_time,

        -- Throughput calculations
        AVG(throughput_ops_per_sec) as avg_throughput,
        MAX(throughput_ops_per_sec) as peak_throughput,
        SUM(documents_processed) / GREATEST(SUM(execution_time_ms) / 1000.0, 1) as overall_throughput,

        -- Error tracking
        COUNT(*) FILTER (WHERE success = false) as failed_operations,
        SUM(error_count) as total_errors,
        AVG(error_count) FILTER (WHERE error_count > 0) as avg_errors_per_failed_op,

        -- Resource utilization
        AVG(memory_usage_mb) as avg_memory_usage,
        MAX(memory_usage_mb) as peak_memory_usage,
        AVG(cpu_usage_percent) as avg_cpu_usage

    FROM bulk_operation_logs 
    WHERE operation_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    GROUP BY operation_type, collection_name, DATE_TRUNC('hour', operation_timestamp)
),
performance_analysis AS (
    SELECT *,
        -- Performance classification
        CASE 
            WHEN avg_throughput >= 10000 THEN 'excellent'
            WHEN avg_throughput >= 5000 THEN 'good'
            WHEN avg_throughput >= 1000 THEN 'acceptable'
            ELSE 'needs_optimization'
        END as throughput_rating,

        -- Reliability assessment
        CASE 
            WHEN failed_operations = 0 THEN 'perfect'
            WHEN failed_operations::numeric / operation_count < 0.01 THEN 'excellent'
            WHEN failed_operations::numeric / operation_count < 0.05 THEN 'good'
            ELSE 'needs_attention'
        END as reliability_rating,

        -- Resource efficiency
        CASE 
            WHEN avg_memory_usage <= 100 AND avg_cpu_usage <= 50 THEN 'efficient'
            WHEN avg_memory_usage <= 500 AND avg_cpu_usage <= 75 THEN 'moderate'
            ELSE 'resource_intensive'
        END as resource_efficiency,

        -- Performance trends
        LAG(avg_throughput, 1) OVER (
            PARTITION BY operation_type, collection_name 
            ORDER BY hour_bucket
        ) as prev_hour_throughput,

        LAG(p95_execution_time, 1) OVER (
            PARTITION BY operation_type, collection_name 
            ORDER BY hour_bucket
        ) as prev_hour_p95_latency

    FROM bulk_operation_metrics
)
SELECT 
    operation_type,
    collection_name,
    hour_bucket,

    -- Volume summary
    operation_count,
    total_documents,
    ROUND(avg_documents_per_operation::NUMERIC, 0) as avg_docs_per_op,

    -- Performance summary
    ROUND(avg_execution_time::NUMERIC, 0) as avg_time_ms,
    ROUND(p95_execution_time::NUMERIC, 0) as p95_time_ms,
    ROUND(avg_throughput::NUMERIC, 0) as avg_throughput,
    ROUND(overall_throughput::NUMERIC, 0) as measured_throughput,

    -- Quality indicators
    throughput_rating,
    reliability_rating,
    resource_efficiency,

    -- Error statistics
    failed_operations,
    total_errors,
    ROUND((failed_operations::numeric / operation_count * 100)::NUMERIC, 2) as error_rate_pct,

    -- Resource usage
    ROUND(avg_memory_usage::NUMERIC, 1) as avg_memory_mb,
    ROUND(avg_cpu_usage::NUMERIC, 1) as avg_cpu_pct,

    -- Performance trends
    CASE 
        WHEN prev_hour_throughput IS NOT NULL AND avg_throughput > prev_hour_throughput * 1.1 THEN 'improving'
        WHEN prev_hour_throughput IS NOT NULL AND avg_throughput < prev_hour_throughput * 0.9 THEN 'degrading'
        ELSE 'stable'
    END as throughput_trend,

    CASE 
        WHEN prev_hour_p95_latency IS NOT NULL AND p95_execution_time > prev_hour_p95_latency * 1.2 THEN 'latency_increasing'
        WHEN prev_hour_p95_latency IS NOT NULL AND p95_execution_time < prev_hour_p95_latency * 0.8 THEN 'latency_improving'
        ELSE 'latency_stable'
    END as latency_trend,

    -- Optimization recommendations
    CASE 
        WHEN throughput_rating = 'needs_optimization' AND resource_efficiency = 'efficient' THEN 'Increase batch size or parallelism'
        WHEN throughput_rating = 'needs_optimization' AND resource_efficiency = 'resource_intensive' THEN 'Optimize query patterns or reduce batch size'
        WHEN reliability_rating = 'needs_attention' THEN 'Review error handling and retry logic'
        WHEN resource_efficiency = 'resource_intensive' THEN 'Consider memory optimization or connection pooling'
        ELSE 'Performance within acceptable parameters'
    END as optimization_recommendation

FROM performance_analysis
WHERE hour_bucket >= CURRENT_TIMESTAMP - INTERVAL '12 hours'
ORDER BY hour_bucket DESC, total_documents DESC;

-- Adaptive batch size optimization query
WITH batch_performance_analysis AS (
    SELECT 
        operation_type,
        collection_name,
        batch_size,

        -- Performance metrics per batch size
        COUNT(*) as operation_count,
        AVG(execution_time_ms) as avg_execution_time,
        AVG(throughput_ops_per_sec) as avg_throughput,
        STDDEV(throughput_ops_per_sec) as throughput_stddev,

        -- Error rates
        COUNT(*) FILTER (WHERE success = false) as failed_ops,
        AVG(error_count) as avg_errors,

        -- Resource utilization
        AVG(memory_usage_mb) as avg_memory,
        AVG(cpu_usage_percent) as avg_cpu,

        -- Efficiency calculation
        AVG(throughput_ops_per_sec) / GREATEST(AVG(memory_usage_mb), 1) as memory_efficiency,
        AVG(throughput_ops_per_sec) / GREATEST(AVG(cpu_usage_percent), 1) as cpu_efficiency

    FROM bulk_operation_logs
    WHERE operation_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days'
    GROUP BY operation_type, collection_name, batch_size
),
optimal_batch_analysis AS (
    SELECT *,
        -- Rank batch sizes by different criteria
        ROW_NUMBER() OVER (
            PARTITION BY operation_type, collection_name 
            ORDER BY avg_throughput DESC
        ) as throughput_rank,

        ROW_NUMBER() OVER (
            PARTITION BY operation_type, collection_name 
            ORDER BY avg_execution_time ASC
        ) as latency_rank,

        ROW_NUMBER() OVER (
            PARTITION BY operation_type, collection_name 
            ORDER BY memory_efficiency DESC
        ) as memory_efficiency_rank,

        ROW_NUMBER() OVER (
            PARTITION BY operation_type, collection_name 
            ORDER BY failed_ops ASC, avg_errors ASC
        ) as reliability_rank,

        -- Calculate composite score
        (
            -- Throughput weight: 40%
            (ROW_NUMBER() OVER (PARTITION BY operation_type, collection_name ORDER BY avg_throughput DESC) * 0.4) +
            -- Latency weight: 30%  
            (ROW_NUMBER() OVER (PARTITION BY operation_type, collection_name ORDER BY avg_execution_time ASC) * 0.3) +
            -- Reliability weight: 20%
            (ROW_NUMBER() OVER (PARTITION BY operation_type, collection_name ORDER BY failed_ops ASC) * 0.2) +
            -- Efficiency weight: 10%
            (ROW_NUMBER() OVER (PARTITION BY operation_type, collection_name ORDER BY memory_efficiency DESC) * 0.1)
        ) as composite_score

    FROM batch_performance_analysis
    WHERE operation_count >= 5 -- Minimum sample size for reliability
)
SELECT 
    operation_type,
    collection_name,
    batch_size,
    operation_count,

    -- Performance metrics
    ROUND(avg_execution_time::NUMERIC, 0) as avg_time_ms,
    ROUND(avg_throughput::NUMERIC, 0) as avg_throughput,
    ROUND(throughput_stddev::NUMERIC, 0) as throughput_std_dev,

    -- Rankings
    throughput_rank,
    latency_rank,
    reliability_rank,
    ROUND(composite_score::NUMERIC, 2) as composite_score,

    -- Recommendations
    CASE 
        WHEN ROW_NUMBER() OVER (PARTITION BY operation_type, collection_name ORDER BY composite_score ASC) = 1 
        THEN 'OPTIMAL - Recommended batch size'
        WHEN throughput_rank <= 2 THEN 'HIGH_THROUGHPUT - Consider for bulk operations'
        WHEN latency_rank <= 2 THEN 'LOW_LATENCY - Consider for real-time operations'  
        WHEN reliability_rank <= 2 THEN 'HIGH_RELIABILITY - Consider for critical operations'
        ELSE 'SUBOPTIMAL - Not recommended'
    END as recommendation,

    -- Resource usage
    ROUND(avg_memory::NUMERIC, 1) as avg_memory_mb,
    ROUND(avg_cpu::NUMERIC, 1) as avg_cpu_pct,

    -- Quality indicators
    failed_ops,
    ROUND((failed_ops::numeric / operation_count * 100)::NUMERIC, 2) as error_rate_pct,

    -- Next steps
    CASE 
        WHEN batch_size < 1000 AND throughput_rank > 3 THEN 'Try larger batch size (2000-5000)'
        WHEN batch_size > 10000 AND reliability_rank > 3 THEN 'Try smaller batch size (5000-8000)'
        WHEN throughput_stddev > avg_throughput * 0.5 THEN 'Inconsistent performance - review system load'
        ELSE 'Batch size appears well-tuned'
    END as tuning_suggestion

FROM optimal_batch_analysis
ORDER BY operation_type, collection_name, composite_score ASC;

-- QueryLeaf provides comprehensive bulk operation capabilities:
-- 1. High-performance bulk INSERT, UPDATE, DELETE, and UPSERT operations
-- 2. Advanced batch processing with configurable batch sizes and write concerns  
-- 3. Mixed operation support within single transactions
-- 4. Comprehensive error handling and partial failure recovery
-- 5. Real-time monitoring and performance analytics
-- 6. Adaptive batch size optimization based on performance metrics
-- 7. Resource usage tracking and efficiency analysis
-- 8. SQL-familiar syntax for complex bulk operations
-- 9. Integration with MongoDB's native bulk operation optimizations
-- 10. Production-ready patterns for high-volume data processing

Best Practices for MongoDB Bulk Operations

Performance Optimization Strategies

Essential techniques for maximizing bulk operation throughput (a minimal Node.js sketch follows the list):

  1. Batch Size Optimization: Start with 1,000-10,000 documents per batch and adjust based on document size and system resources
  2. Unordered Operations: Use unordered bulk operations when possible to allow parallel processing and partial failure handling
  3. Write Concern Tuning: Balance durability and performance by configuring appropriate write concerns for your use case
  4. Index Strategy: Ensure optimal indexes exist before bulk operations, but consider temporarily dropping non-essential indexes for large imports
  5. Connection Pooling: Configure adequate connection pools to handle concurrent bulk operations efficiently
  6. Memory Management: Monitor memory usage and adjust batch sizes to avoid memory pressure and garbage collection overhead

Operational Excellence

Implement robust operational practices for production bulk processing (a retry-handling sketch follows the list):

  1. Error Handling: Design comprehensive error handling with retry logic for transient failures and dead letter queues for persistent errors
  2. Progress Monitoring: Implement detailed progress tracking and monitoring for long-running bulk operations
  3. Resource Monitoring: Monitor CPU, memory, and I/O usage during bulk operations to identify bottlenecks
  4. Graceful Degradation: Design fallback mechanisms and circuit breakers for bulk operation failures
  5. Testing at Scale: Test bulk operations with production-size datasets to validate performance and reliability
  6. Documentation: Maintain comprehensive documentation of bulk operation patterns, configurations, and troubleshooting procedures
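
The following sketch shows one way to retry only the operations that failed with transient errors in an unordered bulk write. The list of retryable error codes and the backoff values are illustrative assumptions; the error fields follow the Node.js driver's MongoBulkWriteError shape.

// Minimal sketch: retry transient per-operation failures from an unordered bulk write.
// TRANSIENT_CODES is an illustrative set of retryable MongoDB error codes, not an exhaustive list.
const TRANSIENT_CODES = new Set([6, 7, 89, 91, 189, 262, 9001, 10107, 11600, 11602, 13435, 13436]);

async function bulkWriteWithRetry(collection, operations, maxRetries = 3) {
  let pending = operations;

  for (let attempt = 1; attempt <= maxRetries && pending.length > 0; attempt++) {
    try {
      await collection.bulkWrite(pending, { ordered: false });
      return { attemptsUsed: attempt, unresolved: [] };
    } catch (error) {
      // Unordered bulk writes report per-operation failures in writeErrors
      const writeErrors = [].concat(error.writeErrors || []);
      const retryable = writeErrors.filter(e => TRANSIENT_CODES.has(e.code));

      if (retryable.length === 0) {
        throw error; // persistent failure: route to dead letter handling instead of retrying
      }

      // Re-submit only the operations that failed with a transient error
      pending = retryable.map(e => pending[e.index]);
      await new Promise(resolve => setTimeout(resolve, 250 * attempt)); // simple linear backoff
    }
  }

  return { attemptsUsed: maxRetries, unresolved: pending };
}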

Conclusion

MongoDB bulk operations provide exceptional capabilities for high-throughput data processing that far exceed traditional single-operation approaches. The combination of flexible batch processing, intelligent error handling, and comprehensive monitoring makes MongoDB an ideal platform for applications requiring efficient bulk data management.

Key bulk operation benefits include:

  • Dramatic Performance Improvements: 10x-100x faster processing compared to individual operations
  • Intelligent Batch Processing: Configurable batch sizes with automatic optimization and adaptive sizing
  • Robust Error Handling: Partial failure recovery and comprehensive error reporting
  • Flexible Operation Mixing: Support for mixed INSERT, UPDATE, DELETE, and UPSERT operations in single batches
  • Production-Ready Features: Built-in monitoring, retry mechanisms, and resource management
  • Scalable Architecture: Seamless scaling across replica sets and sharded clusters

Whether you're processing data migrations, real-time analytics ingestion, or high-volume transaction processing, MongoDB bulk operations with QueryLeaf's familiar SQL interface provide the foundation for efficient, scalable data processing solutions.

QueryLeaf Integration: QueryLeaf seamlessly manages MongoDB bulk operations while providing SQL-familiar batch processing syntax, performance optimization patterns, and comprehensive monitoring capabilities. Advanced bulk processing strategies including adaptive batch sizing, priority queues, and streaming operations are elegantly handled through familiar SQL constructs, making sophisticated high-volume data processing both powerful and accessible to SQL-oriented development teams.

The combination of MongoDB's native bulk operation capabilities with SQL-style batch processing makes it an ideal platform for applications requiring both high-throughput data processing and familiar database interaction patterns, ensuring your bulk processing solutions remain both efficient and maintainable as they scale.

MongoDB GridFS for Large File Storage and Management: SQL-Style File Operations and Binary Data Handling

Modern applications frequently need to handle large files, multimedia content, binary data, and document attachments that exceed traditional database storage limitations. Whether you're building content management systems, media libraries, document repositories, or data archival platforms, efficient large file storage and retrieval becomes critical for application performance and user experience.

Traditional relational databases struggle with large binary data storage, often requiring complex external storage solutions, fragmented file management approaches, and intricate metadata synchronization. MongoDB GridFS provides a comprehensive solution for storing and managing files larger than the 16MB BSON document size limit while maintaining the benefits of database-integrated storage, atomic operations, and familiar query patterns.

The Large File Storage Challenge

Traditional database approaches to file storage face significant limitations:

-- PostgreSQL large file storage - complex and limited binary data handling

-- Basic file storage table with bytea limitations (limited to available memory)
CREATE TABLE file_storage (
    file_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    filename VARCHAR(255) NOT NULL,
    content_type VARCHAR(100) NOT NULL,
    file_size BIGINT NOT NULL,
    file_data BYTEA, -- Limited by available memory, inefficient for large files
    upload_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_by UUID NOT NULL,

    -- Basic metadata
    description TEXT,
    tags TEXT[],
    category VARCHAR(50),
    is_public BOOLEAN DEFAULT false,

    -- File characteristics
    file_hash VARCHAR(64), -- For duplicate detection
    original_filename VARCHAR(255),
    compression_type VARCHAR(20),

    CONSTRAINT check_file_size CHECK (file_size > 0 AND file_size <= 1073741824) -- 1GB limit
);

-- Large object storage approach (pg_largeobject) - complex management
CREATE TABLE file_metadata (
    file_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    filename VARCHAR(255) NOT NULL,
    content_type VARCHAR(100) NOT NULL,
    file_size BIGINT NOT NULL,
    large_object_oid OID NOT NULL, -- Reference to pg_largeobject
    upload_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_by UUID NOT NULL,
    description TEXT,
    tags TEXT[],
    category VARCHAR(50),
    is_public BOOLEAN DEFAULT false,
    file_hash VARCHAR(64),

    -- Complex management required
    CONSTRAINT check_file_size CHECK (file_size > 0)
);

-- File chunks table for manual chunking (complex to manage)
CREATE TABLE file_chunks (
    chunk_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    file_id UUID NOT NULL REFERENCES file_metadata(file_id) ON DELETE CASCADE,
    chunk_index INTEGER NOT NULL,
    chunk_data BYTEA NOT NULL,
    chunk_size INTEGER NOT NULL,
    checksum VARCHAR(32),

    UNIQUE(file_id, chunk_index)
);

-- Complex file upload process with manual chunking
CREATE OR REPLACE FUNCTION upload_file_chunked(
    p_filename VARCHAR(255),
    p_content_type VARCHAR(100),
    p_file_data BYTEA,
    p_created_by UUID,
    p_chunk_size INTEGER DEFAULT 262144 -- 256KB chunks
) RETURNS UUID AS $$
DECLARE
    v_file_id UUID;
    v_file_size BIGINT;
    v_chunk_count INTEGER;
    v_chunk_data BYTEA;
    v_offset INTEGER := 1;
    v_chunk_index INTEGER := 0;
    v_current_chunk_size INTEGER;
BEGIN
    -- Get file size
    v_file_size := LENGTH(p_file_data);
    v_chunk_count := CEIL(v_file_size::DECIMAL / p_chunk_size);

    -- Insert file metadata
    INSERT INTO file_metadata (filename, content_type, file_size, large_object_oid, created_by)
    VALUES (p_filename, p_content_type, v_file_size, 0, p_created_by) -- Placeholder OID
    RETURNING file_id INTO v_file_id;

    -- Insert chunks
    WHILE v_offset <= v_file_size LOOP
        v_current_chunk_size := LEAST(p_chunk_size, v_file_size - v_offset + 1);
        v_chunk_data := SUBSTRING(p_file_data FROM v_offset FOR v_current_chunk_size);

        INSERT INTO file_chunks (file_id, chunk_index, chunk_data, chunk_size, checksum)
        VALUES (
            v_file_id, 
            v_chunk_index, 
            v_chunk_data, 
            v_current_chunk_size,
            MD5(v_chunk_data)
        );

        v_offset := v_offset + p_chunk_size;
        v_chunk_index := v_chunk_index + 1;
    END LOOP;

    RETURN v_file_id;
END;
$$ LANGUAGE plpgsql;

-- Complex file retrieval with manual chunk reassembly
CREATE OR REPLACE FUNCTION download_file_chunked(p_file_id UUID)
RETURNS BYTEA AS $$
DECLARE
    v_file_data BYTEA := '';
    v_chunk RECORD;
BEGIN
    -- Reassemble file from chunks
    FOR v_chunk IN 
        SELECT chunk_data 
        FROM file_chunks 
        WHERE file_id = p_file_id 
        ORDER BY chunk_index
    LOOP
        v_file_data := v_file_data || v_chunk.chunk_data;
    END LOOP;

    RETURN v_file_data;
END;
$$ LANGUAGE plpgsql;

-- File search with basic metadata queries (limited functionality)
WITH file_search AS (
    SELECT 
        fm.file_id,
        fm.filename,
        fm.content_type,
        fm.file_size,
        fm.upload_timestamp,
        fm.created_by,
        fm.description,
        fm.tags,
        fm.category,

        -- Basic relevance scoring (very limited)
        CASE 
            WHEN LOWER(fm.filename) LIKE '%search_term%' THEN 3
            WHEN LOWER(fm.description) LIKE '%search_term%' THEN 2  
            WHEN 'search_term' = ANY(fm.tags) THEN 2
            ELSE 1
        END as relevance_score,

        -- File size categorization
        CASE 
            WHEN fm.file_size < 1048576 THEN 'Small (< 1MB)'
            WHEN fm.file_size < 10485760 THEN 'Medium (1-10MB)'
            WHEN fm.file_size < 104857600 THEN 'Large (10-100MB)'
            ELSE 'Very Large (> 100MB)'
        END as size_category,

        -- Check if file exists (chunks available)
        EXISTS(
            SELECT 1 FROM file_chunks fc WHERE fc.file_id = fm.file_id
        ) as file_available,

        -- Get chunk count for integrity verification
        (
            SELECT COUNT(*) FROM file_chunks fc WHERE fc.file_id = fm.file_id
        ) as chunk_count

    FROM file_metadata fm
    WHERE 
        (
            LOWER(fm.filename) LIKE '%search_term%' OR
            LOWER(fm.description) LIKE '%search_term%' OR  
            'search_term' = ANY(fm.tags)
        )
        AND fm.is_public = true
),
file_stats AS (
    SELECT 
        COUNT(*) as total_files,
        SUM(fs.file_size) as total_storage_used,
        AVG(fs.file_size) as avg_file_size,
        COUNT(*) FILTER (WHERE fs.file_available = false) as corrupted_files
    FROM file_search fs
)
SELECT 
    fs.file_id,
    fs.filename,
    fs.content_type,
    pg_size_pretty(fs.file_size) as formatted_size,
    fs.size_category,
    fs.upload_timestamp,
    u.name as uploaded_by,
    fs.description,
    fs.tags,
    fs.relevance_score,
    fs.file_available,
    fs.chunk_count,

    -- Download URL (requires application logic for reassembly)
    '/api/files/' || fs.file_id || '/download' as download_url,

    -- File status
    CASE 
        WHEN NOT fs.file_available THEN 'Missing/Corrupted'
        WHEN fs.chunk_count = 0 THEN 'Empty File'
        ELSE 'Available'
    END as file_status,

    -- Storage efficiency warning
    CASE 
        WHEN fs.chunk_count > 1000 THEN 'High fragmentation - consider optimization'
        ELSE 'Normal'
    END as storage_health

FROM file_search fs
JOIN users u ON fs.created_by = u.user_id
CROSS JOIN file_stats fst
WHERE fs.file_available = true
ORDER BY fs.relevance_score DESC, fs.upload_timestamp DESC
LIMIT 50;

-- Problems with traditional file storage approaches:
-- 1. Memory limitations with BYTEA for large files
-- 2. Complex manual chunking and reassembly processes
-- 3. No built-in file streaming or partial reads
-- 4. Limited metadata integration and search capabilities
-- 5. No automatic integrity checking or corruption detection
-- 6. Poor performance with large binary data operations
-- 7. Complex backup and replication scenarios
-- 8. No built-in compression or storage optimization
-- 9. Difficult scaling with growing file storage requirements
-- 10. Manual transaction management for file operations
-- 11. No streaming uploads or downloads
-- 12. Limited duplicate detection and deduplication
-- 13. Complex permission and access control implementation
-- 14. Poor integration with application object models
-- 15. No automatic metadata extraction capabilities

MongoDB GridFS provides comprehensive file storage capabilities:

// MongoDB GridFS - comprehensive large file storage with built-in optimization
const { MongoClient, GridFSBucket, ObjectId } = require('mongodb');
const fs = require('fs');
const crypto = require('crypto');
const path = require('path');

const client = new MongoClient('mongodb+srv://username:password@cluster.mongodb.net');
const db = client.db('file_storage_platform');

// Advanced GridFS file management system
class AdvancedGridFSManager {
  constructor(db, options = {}) {
    this.db = db;
    this.buckets = new Map();

    // Default bucket for general files
    this.defaultBucket = new GridFSBucket(db, {
      bucketName: 'files',
      chunkSizeBytes: options.chunkSize || 255 * 1024, // 255KB chunks
      writeConcern: { w: 'majority', j: true },
      readConcern: { level: 'majority' }
    });

    this.buckets.set('files', this.defaultBucket);

    // Specialized buckets for different content types
    this.setupSpecializedBuckets(options);

    // Configuration
    this.config = {
      maxFileSize: options.maxFileSize || 5 * 1024 * 1024 * 1024, // 5GB
      enableCompression: options.enableCompression !== false,
      enableDeduplication: options.enableDeduplication !== false,
      enableThumbnails: options.enableThumbnails !== false,
      enableMetadataExtraction: options.enableMetadataExtraction !== false,
      supportedMimeTypes: options.supportedMimeTypes || [
        'image/*', 'video/*', 'audio/*', 'application/pdf',
        'application/msword', 'application/vnd.openxmlformats-officedocument.*',
        'text/*', 'application/json', 'application/zip'
      ],

      // Advanced features
      enableVersioning: options.enableVersioning || false,
      enableEncryption: options.enableEncryption || false,
      enableAuditLogging: options.enableAuditLogging !== false
    };

    this.setupIndexes();
    this.initializeFileProcessing();
  }

  setupSpecializedBuckets(options) {
    // Images bucket with smaller chunks for better streaming
    const imagesBucket = new GridFSBucket(this.db, {
      bucketName: 'images',
      chunkSizeBytes: 64 * 1024 // 64KB for images
    });
    this.buckets.set('images', imagesBucket);

    // Videos bucket with larger chunks for efficiency
    const videosBucket = new GridFSBucket(this.db, {
      bucketName: 'videos', 
      chunkSizeBytes: 1024 * 1024 // 1MB for videos
    });
    this.buckets.set('videos', videosBucket);

    // Documents bucket for office files and PDFs
    const documentsBucket = new GridFSBucket(this.db, {
      bucketName: 'documents',
      chunkSizeBytes: 256 * 1024 // 256KB for documents
    });
    this.buckets.set('documents', documentsBucket);

    // Archives bucket for compressed files
    const archivesBucket = new GridFSBucket(this.db, {
      bucketName: 'archives',
      chunkSizeBytes: 512 * 1024 // 512KB for archives
    });
    this.buckets.set('archives', archivesBucket);
  }

  async setupIndexes() {
    console.log('Setting up GridFS indexes...');

    try {
      // Create indexes for each bucket
      for (const [bucketName, bucket] of this.buckets) {
        const filesCollection = this.db.collection(`${bucketName}.files`);
        const chunksCollection = this.db.collection(`${bucketName}.chunks`);

        // Files collection indexes
        await filesCollection.createIndexes([
          { key: { filename: 1, uploadDate: -1 } },
          { key: { 'metadata.contentType': 1, uploadDate: -1 } },
          { key: { 'metadata.tags': 1 } },
          { key: { 'metadata.category': 1, uploadDate: -1 } },
          { key: { 'metadata.uploadedBy': 1, uploadDate: -1 } },
          { key: { 'metadata.fileHash': 1 }, unique: true, sparse: true },
          { key: { 'metadata.isPublic': 1, uploadDate: -1 } },

          // Text search index for filename and metadata
          { key: { 
            filename: 'text', 
            'metadata.description': 'text',
            'metadata.tags': 'text'
          } },

          // Compound indexes for common queries
          { key: { 
            'metadata.contentType': 1, 
            'metadata.isPublic': 1, 
            uploadDate: -1 
          } }
        ]);

        // Chunks collection index (usually created automatically)
        await chunksCollection.createIndex({ files_id: 1, n: 1 }, { unique: true });
      }

      console.log('GridFS indexes created successfully');
    } catch (error) {
      console.error('Error setting up GridFS indexes:', error);
      throw error;
    }
  }

  async uploadFile(fileBuffer, filename, options = {}) {
    console.log(`Uploading file: ${filename} (${fileBuffer.length} bytes)`);

    try {
      // Validate file
      await this.validateFile(fileBuffer, filename, options);

      // Determine bucket based on content type
      const contentType = options.contentType || this.detectContentType(filename);
      const bucket = this.selectBucket(contentType);

      // Check for duplicates if enabled
      let existingFile = null;
      if (this.config.enableDeduplication) {
        existingFile = await this.checkForDuplicate(fileBuffer, options);
        if (existingFile) {
          console.log(`Duplicate file found: ${existingFile._id}`);
          return await this.handleDuplicate(existingFile, filename, options);
        }
      }

      // Prepare metadata
      const metadata = await this.prepareFileMetadata(fileBuffer, filename, contentType, options);

      // Create upload stream
      const uploadStream = bucket.openUploadStream(filename, {
        metadata: metadata,
        chunkSizeBytes: this.getOptimalChunkSize(fileBuffer.length, contentType)
      });

      return new Promise((resolve, reject) => {
        uploadStream.on('error', reject);
        uploadStream.on('finish', async () => {
          console.log(`File uploaded successfully: ${uploadStream.id}`);

          try {
            // Post-upload processing
            const processingResult = await this.postUploadProcessing(
              uploadStream.id, 
              fileBuffer, 
              filename, 
              contentType, 
              metadata
            );

            // Return comprehensive file information
            resolve({
              fileId: uploadStream.id,
              filename: filename,
              contentType: contentType,
              size: fileBuffer.length,
              metadata: metadata,
              bucket: bucket.bucketName,
              uploadDate: new Date(),
              processingResult: processingResult,
              downloadUrl: `/api/files/${uploadStream.id}`,
              thumbnailUrl: processingResult.thumbnail ? 
                `/api/files/${uploadStream.id}/thumbnail` : null
            });

          } catch (processingError) {
            console.error('Post-upload processing failed:', processingError);
            // Still resolve with basic file info even if processing fails
            resolve({
              fileId: uploadStream.id,
              filename: filename,
              contentType: contentType,
              size: fileBuffer.length,
              metadata: metadata,
              bucket: bucket.bucketName,
              uploadDate: new Date(),
              warning: 'Post-upload processing failed'
            });
          }
        });

        // Write file buffer to stream
        uploadStream.end(fileBuffer);
      });

    } catch (error) {
      console.error(`Error uploading file ${filename}:`, error);
      throw error;
    }
  }

  async validateFile(fileBuffer, filename, options) {
    // File size validation
    if (fileBuffer.length > this.config.maxFileSize) {
      throw new Error(
        `File too large: ${fileBuffer.length} bytes (max: ${this.config.maxFileSize})`
      );
    }

    if (fileBuffer.length === 0) {
      throw new Error('Cannot upload empty file');
    }

    // Content type validation
    const contentType = options.contentType || this.detectContentType(filename);
    if (!this.isContentTypeSupported(contentType)) {
      throw new Error(`Unsupported file type: ${contentType}`);
    }

    // Filename validation
    if (!filename || filename.trim().length === 0) {
      throw new Error('Filename is required');
    }

    // Check for malicious content (basic checks)
    await this.scanForMaliciousContent(fileBuffer, filename, contentType);
  }

  async prepareFileMetadata(fileBuffer, filename, contentType, options) {
    const fileHash = crypto.createHash('sha256').update(fileBuffer).digest('hex');

    const metadata = {
      originalFilename: filename,
      contentType: contentType,
      fileSize: fileBuffer.length,
      fileHash: fileHash,
      uploadedBy: options.uploadedBy || null,
      uploadedAt: new Date(),

      // File characteristics
      mimeType: contentType,
      fileExtension: path.extname(filename).toLowerCase(),

      // User-provided metadata
      description: options.description || null,
      tags: options.tags || [],
      category: options.category || this.categorizeByContentType(contentType),
      isPublic: options.isPublic !== false,
      accessLevel: options.accessLevel || 'public',

      // System metadata
      version: options.version || 1,
      parentFileId: options.parentFileId || null,
      processingStatus: 'pending',

      // Storage information
      compressionType: null,
      encryptionType: options.enableEncryption ? 'AES256' : null,

      // Additional metadata
      customFields: options.customFields || {}
    };

    // Extract additional metadata based on file type
    if (this.config.enableMetadataExtraction) {
      const extractedMetadata = await this.extractFileMetadata(fileBuffer, contentType);
      metadata.extracted = extractedMetadata;
    }

    return metadata;
  }

  async extractFileMetadata(fileBuffer, contentType) {
    const metadata = {};

    try {
      if (contentType.startsWith('image/')) {
        // Extract image metadata (simplified - would use actual image processing library)
        metadata.imageInfo = {
          format: contentType.split('/')[1],
          // In production, use libraries like sharp or jimp for actual metadata extraction
          estimated_width: null,
          estimated_height: null,
          color_space: null,
          has_transparency: null
        };
      } else if (contentType.startsWith('video/')) {
        // Extract video metadata
        metadata.videoInfo = {
          format: contentType.split('/')[1],
          estimated_duration: null,
          estimated_bitrate: null,
          estimated_resolution: null
        };
      } else if (contentType === 'application/pdf') {
        // Extract PDF metadata
        metadata.documentInfo = {
          estimated_page_count: null,
          estimated_word_count: null,
          has_text_layer: null,
          has_forms: null
        };
      }
    } catch (extractionError) {
      console.error('Metadata extraction failed:', extractionError);
      metadata.extraction_error = extractionError.message;
    }

    return metadata;
  }

  async postUploadProcessing(fileId, fileBuffer, filename, contentType, metadata) {
    console.log(`Starting post-upload processing for file: ${fileId}`);

    const processingResult = {
      thumbnail: null,
      textContent: null,
      additionalFormats: [],
      processingErrors: []
    };

    try {
      // Generate thumbnail for images and videos
      if (this.config.enableThumbnails) {
        if (contentType.startsWith('image/') || contentType.startsWith('video/')) {
          processingResult.thumbnail = await this.generateThumbnail(
            fileId, 
            fileBuffer, 
            contentType
          );
        }
      }

      // Extract text content for searchable documents
      if (this.isTextExtractable(contentType)) {
        processingResult.textContent = await this.extractTextContent(
          fileBuffer, 
          contentType
        );
      }

      // Update processing status
      await this.updateFileMetadata(fileId, {
        'metadata.processingStatus': 'completed',
        'metadata.processingResult': processingResult,
        'metadata.processedAt': new Date()
      });

    } catch (processingError) {
      console.error('Post-upload processing failed:', processingError);
      processingResult.processingErrors.push(processingError.message);

      await this.updateFileMetadata(fileId, {
        'metadata.processingStatus': 'failed',
        'metadata.processingError': processingError.message,
        'metadata.processedAt': new Date()
      });
    }

    return processingResult;
  }

  async downloadFile(fileId, options = {}) {
    console.log(`Downloading file: ${fileId}`);

    try {
      // Get file metadata first
      const fileInfo = await this.getFileInfo(fileId);
      if (!fileInfo) {
        throw new Error(`File not found: ${fileId}`);
      }

      // Check permissions
      if (!this.hasAccessPermission(fileInfo, options.user)) {
        throw new Error('Access denied');
      }

      // Select appropriate bucket
      const bucket = this.selectBucketByFileInfo(fileInfo);

      // Create download stream
      const downloadStream = bucket.openDownloadStream(new ObjectId(fileId));

      // Handle range requests for partial downloads
      if (options.range) {
        return this.handleRangeRequest(downloadStream, fileInfo, options.range);
      }

      // Return full file stream
      return {
        stream: downloadStream,
        fileInfo: fileInfo,
        contentType: fileInfo.metadata.contentType,
        filename: fileInfo.filename,
        size: fileInfo.length
      };

    } catch (error) {
      console.error(`Error downloading file ${fileId}:`, error);
      throw error;
    }
  }

  async searchFiles(query, options = {}) {
    console.log(`Searching files with query: ${query}`);

    try {
      const searchCriteria = this.buildSearchCriteria(query, options);
      const pipeline = this.buildFileSearchPipeline(searchCriteria, options);

      // Execute search across all relevant buckets
      const results = [];
      const buckets = options.bucket ? [options.bucket] : Array.from(this.buckets.keys());

      for (const bucketName of buckets) {
        const filesCollection = this.db.collection(`${bucketName}.files`);
        const bucketResults = await filesCollection.aggregate(pipeline).toArray();

        // Add bucket information to results
        bucketResults.forEach(result => {
          result.bucket = bucketName;
          result.downloadUrl = `/api/files/${result._id}`;
          result.thumbnailUrl = result.metadata.processingResult?.thumbnail ? 
            `/api/files/${result._id}/thumbnail` : null;
        });

        results.push(...bucketResults);
      }

      // Sort combined results
      const sortedResults = this.sortSearchResults(results, options);

      // Apply pagination
      const limit = options.limit || 20;
      const offset = options.offset || 0;
      const paginatedResults = sortedResults.slice(offset, offset + limit);

      return {
        files: paginatedResults,
        totalCount: results.length,
        query: query,
        searchOptions: options,
        executionTime: Date.now() - (options.startTime || Date.now())
      };

    } catch (error) {
      console.error('Error searching files:', error);
      throw error;
    }
  }

  buildSearchCriteria(query, options) {
    const criteria = { $and: [] };

    // Text search
    if (query && query.trim().length > 0) {
      criteria.$and.push({
        $or: [
          { filename: { $regex: query, $options: 'i' } },
          { 'metadata.description': { $regex: query, $options: 'i' } },
          { 'metadata.tags': { $in: [new RegExp(query, 'i')] } },
          { $text: { $search: query } }
        ]
      });
    }

    // Content type filter
    if (options.contentType) {
      criteria.$and.push({
        'metadata.contentType': options.contentType
      });
    }

    // Category filter
    if (options.category) {
      criteria.$and.push({
        'metadata.category': options.category
      });
    }

    // Access level filter
    if (options.accessLevel) {
      criteria.$and.push({
        'metadata.accessLevel': options.accessLevel
      });
    }

    // Date range filter
    if (options.dateFrom || options.dateTo) {
      const dateFilter = {};
      if (options.dateFrom) dateFilter.$gte = new Date(options.dateFrom);
      if (options.dateTo) dateFilter.$lte = new Date(options.dateTo);
      criteria.$and.push({ uploadDate: dateFilter });
    }

    // Size filter
    if (options.minSize || options.maxSize) {
      const sizeFilter = {};
      if (options.minSize) sizeFilter.$gte = options.minSize;
      if (options.maxSize) sizeFilter.$lte = options.maxSize;
      criteria.$and.push({ length: sizeFilter });
    }

    // User filter
    if (options.uploadedBy) {
      criteria.$and.push({
        'metadata.uploadedBy': options.uploadedBy
      });
    }

    // Public access filter
    if (options.publicOnly) {
      criteria.$and.push({
        'metadata.isPublic': true
      });
    }

    return criteria.$and.length > 0 ? criteria : {};
  }

  buildFileSearchPipeline(criteria, options) {
    const pipeline = [];

    // Match stage
    if (Object.keys(criteria).length > 0) {
      pipeline.push({ $match: criteria });
    }

    // Add computed fields for relevance scoring
    pipeline.push({
      $addFields: {
        relevanceScore: {
          $add: [
            // Filename match bonus
            {
              $cond: {
                if: { $regexMatch: { input: '$filename', regex: options.query || '', options: 'i' } },
                then: 3,
                else: 0
              }
            },
            // Size factor (prefer reasonable file sizes)
            {
              $cond: {
                if: { $and: [{ $gte: ['$length', 1000] }, { $lte: ['$length', 10000000] }] },
                then: 1,
                else: 0
              }
            },
            // Recency bonus
            {
              $cond: {
                if: { $gte: ['$uploadDate', new Date(Date.now() - 30 * 24 * 60 * 60 * 1000)] },
                then: 2,
                else: 0
              }
            },
            // Processing status bonus
            {
              $cond: {
                if: { $eq: ['$metadata.processingStatus', 'completed'] },
                then: 1,
                else: 0
              }
            }
          ]
        },

        // Formatted file size
        formattedSize: {
          $switch: {
            branches: [
              { case: { $lt: ['$length', 1024] }, then: { $concat: [{ $toString: '$length' }, ' bytes'] } },
              { case: { $lt: ['$length', 1048576] }, then: { $concat: [{ $toString: { $round: [{ $divide: ['$length', 1024] }, 1] } }, ' KB'] } },
              { case: { $lt: ['$length', 1073741824] }, then: { $concat: [{ $toString: { $round: [{ $divide: ['$length', 1048576] }, 1] } }, ' MB'] } }
            ],
            default: { $concat: [{ $toString: { $round: [{ $divide: ['$length', 1073741824] }, 1] } }, ' GB'] }
          }
        }
      }
    });

    // Project relevant fields
    pipeline.push({
      $project: {
        _id: 1,
        filename: 1,
        length: 1,
        formattedSize: 1,
        uploadDate: 1,
        relevanceScore: 1,
        'metadata.contentType': 1,
        'metadata.category': 1,
        'metadata.description': 1,
        'metadata.tags': 1,
        'metadata.isPublic': 1,
        'metadata.uploadedBy': 1,
        'metadata.processingStatus': 1,
        'metadata.processingResult': 1,
        'metadata.fileHash': 1
      }
    });

    return pipeline;
  }

  // Utility methods

  selectBucket(contentType) {
    if (contentType.startsWith('image/')) return this.buckets.get('images');
    if (contentType.startsWith('video/')) return this.buckets.get('videos');
    if (contentType.includes('pdf') || contentType.includes('document')) return this.buckets.get('documents');
    if (contentType.includes('zip') || contentType.includes('archive')) return this.buckets.get('archives');
    return this.defaultBucket;
  }

  detectContentType(filename) {
    const ext = path.extname(filename).toLowerCase();
    const mimeTypes = {
      '.jpg': 'image/jpeg', '.jpeg': 'image/jpeg', '.png': 'image/png', '.gif': 'image/gif',
      '.mp4': 'video/mp4', '.mov': 'video/quicktime', '.avi': 'video/x-msvideo',
      '.mp3': 'audio/mpeg', '.wav': 'audio/wav', '.flac': 'audio/flac',
      '.pdf': 'application/pdf', '.doc': 'application/msword',
      '.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
      '.zip': 'application/zip', '.tar': 'application/x-tar', '.gz': 'application/gzip',
      '.txt': 'text/plain', '.csv': 'text/csv', '.json': 'application/json'
    };
    return mimeTypes[ext] || 'application/octet-stream';
  }

  isContentTypeSupported(contentType) {
    return this.config.supportedMimeTypes.some(pattern => 
      pattern.endsWith('*') ? 
        contentType.startsWith(pattern.slice(0, -1)) : 
        contentType === pattern
    );
  }

  categorizeByContentType(contentType) {
    if (contentType.startsWith('image/')) return 'images';
    if (contentType.startsWith('video/')) return 'videos';  
    if (contentType.startsWith('audio/')) return 'audio';
    if (contentType.includes('pdf')) return 'documents';
    if (contentType.includes('document') || contentType.includes('text')) return 'documents';
    if (contentType.includes('zip') || contentType.includes('archive')) return 'archives';
    return 'misc';
  }

  getOptimalChunkSize(fileSize, contentType) {
    // Optimize chunk size based on file size and type
    if (contentType.startsWith('image/') && fileSize < 1024 * 1024) return 64 * 1024; // 64KB for small images
    if (contentType.startsWith('video/')) return 1024 * 1024; // 1MB for videos
    if (fileSize > 100 * 1024 * 1024) return 512 * 1024; // 512KB for large files
    return 255 * 1024; // Default 255KB
  }
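
  // The helpers below are referenced throughout this class but were not shown above.
  // They are minimal assumed sketches rather than definitive implementations: both work
  // directly against the "<bucket>.files" metadata collection. Other referenced helpers
  // (checkForDuplicate, generateThumbnail, extractTextContent, hasAccessPermission, ...)
  // are elided in the same way and would be implemented per application requirements.

  async getFileInfo(fileId, bucketName = 'files') {
    // Look up the GridFS file document (filename, length, uploadDate, metadata, ...)
    return this.db
      .collection(`${bucketName}.files`)
      .findOne({ _id: new ObjectId(fileId) });
  }

  async updateFileMetadata(fileId, update, bucketName = 'files') {
    // Callers in this class mix plain dotted fields with operators such as $push or $inc,
    // so fold non-operator keys into $set and pass operator keys through unchanged.
    const updateDoc = {};
    for (const [key, value] of Object.entries(update)) {
      if (key.startsWith('$')) {
        updateDoc[key] = value;
      } else {
        updateDoc.$set = updateDoc.$set || {};
        updateDoc.$set[key] = value;
      }
    }

    return this.db
      .collection(`${bucketName}.files`)
      .updateOne({ _id: new ObjectId(fileId) }, updateDoc);
  }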
}

// Benefits of MongoDB GridFS:
// - Automatic file chunking and reassembly
// - Built-in streaming for large files
// - Integrated metadata storage and indexing
// - High-performance binary data operations
// - Automatic replication and sharding support
// - ACID transactions for file operations
// - Advanced query capabilities on file metadata
// - Built-in compression and optimization
// - Seamless integration with MongoDB operations
// - Production-ready scalability and performance

module.exports = { AdvancedGridFSManager };
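
For completeness, here is a standalone streaming sketch that complements the manager above: it uploads from and downloads to local files without buffering whole files in memory. The connection string, database name, and paths are placeholders.

// Minimal streaming usage sketch for GridFSBucket; URI, database, and paths are placeholders.
const { MongoClient, GridFSBucket, ObjectId } = require('mongodb');
const fs = require('fs');
const { pipeline } = require('stream/promises');

async function streamFileToDisk(fileId, destinationPath) {
  const client = new MongoClient('mongodb://localhost:27017');
  try {
    await client.connect();
    const bucket = new GridFSBucket(client.db('file_storage_platform'), { bucketName: 'files' });

    // openDownloadStream reads chunks lazily, so large files never sit fully in memory
    await pipeline(
      bucket.openDownloadStream(new ObjectId(fileId)),
      fs.createWriteStream(destinationPath)
    );
  } finally {
    await client.close();
  }
}

async function streamFileFromDisk(sourcePath, filename) {
  const client = new MongoClient('mongodb://localhost:27017');
  try {
    await client.connect();
    const bucket = new GridFSBucket(client.db('file_storage_platform'), { bucketName: 'files' });

    const uploadStream = bucket.openUploadStream(filename, {
      metadata: { contentType: 'application/octet-stream', uploadedAt: new Date() }
    });
    await pipeline(fs.createReadStream(sourcePath), uploadStream);
    return uploadStream.id; // ObjectId of the stored file document
  } finally {
    await client.close();
  }
}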

File Management and Advanced Operations

Comprehensive File Operations and Metadata Management

Implement sophisticated file management capabilities:

// Advanced file management operations with GridFS
class ProductionGridFSOperations extends AdvancedGridFSManager {
  constructor(db, options) {
    super(db, options);
    this.setupAdvancedCapabilities();
  }

  async implementAdvancedFileOperations() {
    console.log('Setting up advanced GridFS operations...');

    // File versioning system
    await this.setupFileVersioning();

    // Duplicate detection and deduplication
    await this.setupDeduplicationSystem();

    // File sharing and collaboration
    await this.setupFileSharingSystem();

    // Automated file lifecycle management
    await this.setupLifecycleManagement();

    // File analytics and reporting
    await this.setupFileAnalytics();
  }

  async createFileVersion(originalFileId, fileBuffer, versionMetadata = {}) {
    console.log(`Creating new version for file: ${originalFileId}`);

    try {
      // Get original file information
      const originalFile = await this.getFileInfo(originalFileId);
      if (!originalFile) {
        throw new Error(`Original file not found: ${originalFileId}`);
      }

      // Increment version number
      const newVersion = (originalFile.metadata.version || 1) + 1;

      // Upload new version with linked metadata
      const uploadOptions = {
        ...versionMetadata,
        parentFileId: originalFileId,
        version: newVersion,
        originalFilename: originalFile.filename,
        versionType: versionMetadata.versionType || 'update',
        versionComment: versionMetadata.comment || 'Updated version',
        uploadedBy: versionMetadata.uploadedBy,
        contentType: originalFile.metadata.contentType
      };

      const newVersionFile = await this.uploadFile(fileBuffer, originalFile.filename, uploadOptions);

      // Update version history in original file
      await this.updateFileMetadata(originalFileId, {
        'metadata.hasVersions': true,
        'metadata.latestVersion': newVersion,
        'metadata.latestVersionId': newVersionFile.fileId,
        $push: {
          'metadata.versionHistory': {
            versionId: newVersionFile.fileId,
            version: newVersion,
            createdAt: new Date(),
            createdBy: versionMetadata.uploadedBy,
            comment: versionMetadata.comment || '',
            fileSize: fileBuffer.length
          }
        }
      });

      return {
        originalFileId: originalFileId,
        newVersionId: newVersionFile.fileId,
        version: newVersion,
        versionInfo: newVersionFile
      };

    } catch (error) {
      console.error('Error creating file version:', error);
      throw error;
    }
  }

  async getFileVersionHistory(fileId) {
    console.log(`Getting version history for file: ${fileId}`);

    try {
      const file = await this.getFileInfo(fileId);
      if (!file || !file.metadata.hasVersions) {
        return { fileId: fileId, versions: [] };
      }

      // Get all versions
      const pipeline = [
        {
          $match: {
            $or: [
              { _id: new ObjectId(fileId) },
              { 'metadata.parentFileId': fileId }
            ]
          }
        },
        {
          $sort: { 'metadata.version': 1 }
        },
        {
          $project: {
            _id: 1,
            filename: 1,
            length: 1,
            uploadDate: 1,
            'metadata.version': 1,
            'metadata.versionType': 1,
            'metadata.versionComment': 1,
            'metadata.uploadedBy': 1,
            'metadata.contentType': 1
          }
        }
      ];

      const filesCollection = this.db.collection('files.files');
      const versions = await filesCollection.aggregate(pipeline).toArray();

      return {
        fileId: fileId,
        originalFile: file,
        versions: versions,
        totalVersions: versions.length
      };

    } catch (error) {
      console.error('Error getting version history:', error);
      throw error;
    }
  }

  async shareFile(fileId, shareOptions = {}) {
    console.log(`Creating file share for: ${fileId}`);

    try {
      const file = await this.getFileInfo(fileId);
      if (!file) {
        throw new Error(`File not found: ${fileId}`);
      }

      // Generate share token
      const shareToken = crypto.randomBytes(32).toString('hex');

      const shareRecord = {
        _id: new ObjectId(),
        fileId: fileId,
        shareToken: shareToken,
        sharedBy: shareOptions.sharedBy,
        createdAt: new Date(),
        expiresAt: shareOptions.expiresAt || new Date(Date.now() + 7 * 24 * 60 * 60 * 1000), // 7 days

        // Share settings
        allowDownload: shareOptions.allowDownload !== false,
        allowView: shareOptions.allowView !== false,
        allowComment: shareOptions.allowComment || false,
        requireAuth: shareOptions.requireAuth || false,

        // Access tracking
        accessCount: 0,
        lastAccessedAt: null,
        accessLog: [],

        // Share metadata
        shareNote: shareOptions.note || '',
        shareName: shareOptions.name || `Share of ${file.filename}`,
        shareType: shareOptions.shareType || 'public_link',

        // Restrictions
        maxDownloads: shareOptions.maxDownloads || null,
        allowedDomains: shareOptions.allowedDomains || [],
        allowedUsers: shareOptions.allowedUsers || []
      };

      // Store share record
      const sharesCollection = this.db.collection('file_shares');
      await sharesCollection.insertOne(shareRecord);

      // Update file metadata
      await this.updateFileMetadata(fileId, {
        'metadata.isShared': true,
        $inc: { 'metadata.shareCount': 1 },
        $push: {
          'metadata.shareHistory': {
            shareId: shareRecord._id,
            shareToken: shareToken,
            createdAt: new Date(),
            sharedBy: shareOptions.sharedBy
          }
        }
      });

      return {
        shareId: shareRecord._id,
        shareToken: shareToken,
        shareUrl: `/api/shared/${shareToken}`,
        expiresAt: shareRecord.expiresAt,
        shareSettings: {
          allowDownload: shareRecord.allowDownload,
          allowView: shareRecord.allowView,
          allowComment: shareRecord.allowComment
        }
      };

    } catch (error) {
      console.error('Error creating file share:', error);
      throw error;
    }
  }

  async analyzeStorageUsage(options = {}) {
    console.log('Analyzing GridFS storage usage...');

    try {
      const analysisResults = {};

      // Analyze each bucket
      for (const [bucketName, bucket] of this.buckets) {
        const filesCollection = this.db.collection(`${bucketName}.files`);

        const bucketAnalysis = await filesCollection.aggregate([
          {
            $group: {
              _id: null,
              totalFiles: { $sum: 1 },
              totalSize: { $sum: '$length' },
              avgFileSize: { $avg: '$length' },
              maxFileSize: { $max: '$length' },
              minFileSize: { $min: '$length' },

              // Content type distribution
              contentTypes: {
                $push: '$metadata.contentType'
              },

              // Upload date analysis
              oldestFile: { $min: '$uploadDate' },
              newestFile: { $max: '$uploadDate' },

              // User analysis
              uploaders: {
                $addToSet: '$metadata.uploadedBy'
              }
            }
          },
          {
            $addFields: {
              // Content type statistics
              contentTypeStats: {
                $reduce: {
                  input: '$contentTypes',
                  initialValue: {},
                  in: {
                    $mergeObjects: [
                      '$$value',
                      {
                        $arrayToObject: [
                          [{ k: '$$this', v: { $add: [{ $ifNull: [{ $getField: { field: '$$this', input: '$$value' } }, 0] }, 1] } }]
                        ]
                      }
                    ]
                  }
                }
              },

              // Storage efficiency metrics
              avgChunkCount: {
                $divide: ['$totalSize', 255 * 1024] // Assuming 255KB chunks
              },

              storageEfficiency: {
                $multiply: [
                  { $divide: ['$totalSize', { $add: ['$totalSize', { $multiply: ['$totalFiles', 1024] }] }] }, // Account for metadata overhead
                  100
                ]
              }
            }
          }
        ]).toArray();

        // Additional bucket-specific analysis
        const categoryAnalysis = await filesCollection.aggregate([
          {
            $group: {
              _id: '$metadata.category',
              fileCount: { $sum: 1 },
              totalSize: { $sum: '$length' },
              avgSize: { $avg: '$length' }
            }
          },
          { $sort: { fileCount: -1 } }
        ]).toArray();

        const recentActivity = await filesCollection.aggregate([
          {
            $match: {
              uploadDate: { $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) }
            }
          },
          {
            $group: {
              _id: { 
                $dateToString: { format: '%Y-%m-%d', date: '$uploadDate' }
              },
              filesUploaded: { $sum: 1 },
              sizeUploaded: { $sum: '$length' }
            }
          },
          { $sort: { _id: 1 } }
        ]).toArray();

        analysisResults[bucketName] = {
          overview: bucketAnalysis[0] || {},
          categoryBreakdown: categoryAnalysis,
          recentActivity: recentActivity,
          recommendations: this.generateStorageRecommendations(bucketAnalysis[0], categoryAnalysis)
        };
      }

      // Overall system analysis
      const systemStats = {
        totalBuckets: this.buckets.size,
        analysisDate: new Date(),
        recommendations: this.generateSystemRecommendations(analysisResults)
      };

      return {
        systemStats: systemStats,
        bucketAnalysis: analysisResults,
        summary: this.generateStorageSummary(analysisResults)
      };

    } catch (error) {
      console.error('Error analyzing storage usage:', error);
      throw error;
    }
  }

  generateStorageRecommendations(bucketStats, categoryStats) {
    const recommendations = [];

    if (bucketStats) {
      // Size-based recommendations
      if (bucketStats.avgFileSize > 50 * 1024 * 1024) { // 50MB
        recommendations.push({
          type: 'optimization',
          priority: 'medium',
          message: 'Large average file size detected. Consider implementing file compression.',
          action: 'Enable compression for new uploads'
        });
      }

      if (bucketStats.totalFiles > 10000) {
        recommendations.push({
          type: 'performance',
          priority: 'high', 
          message: 'High file count may impact query performance.',
          action: 'Consider implementing file archiving or additional indexing'
        });
      }

      if (bucketStats.storageEfficiency < 85) {
        recommendations.push({
          type: 'efficiency',
          priority: 'low',
          message: 'Storage efficiency could be improved.',
          action: 'Review chunk size settings and consider deduplication'
        });
      }
    }

    // Category-based recommendations
    if (categoryStats) {
      const topCategory = categoryStats[0];
      if (topCategory && topCategory.avgSize > 100 * 1024 * 1024) { // 100MB
        recommendations.push({
          type: 'category_optimization',
          priority: 'medium',
          message: `Category "${topCategory._id}" has large average file sizes.`,
          action: 'Consider specialized handling for this content type'
        });
      }
    }

    return recommendations;
  }

  async cleanupExpiredShares() {
    console.log('Cleaning up expired file shares...');

    try {
      const sharesCollection = this.db.collection('file_shares');

      // Find expired shares
      const expiredShares = await sharesCollection.find({
        expiresAt: { $lt: new Date() }
      }).toArray();

      if (expiredShares.length === 0) {
        console.log('No expired shares found');
        return { deletedCount: 0, updatedFiles: 0 };
      }

      // Remove expired shares
      const deleteResult = await sharesCollection.deleteMany({
        expiresAt: { $lt: new Date() }
      });

      // Update affected files
      const fileIds = expiredShares.map(share => new ObjectId(share.fileId));
      const updateResult = await this.db.collection('files.files').updateMany(
        { _id: { $in: fileIds } },
        {
          $set: { 'metadata.isShared': false },
          $unset: { 'metadata.activeShares': '' }
        }
      );

      console.log(`Cleaned up ${deleteResult.deletedCount} expired shares`);

      return {
        deletedCount: deleteResult.deletedCount,
        updatedFiles: updateResult.modifiedCount,
        expiredShareIds: expiredShares.map(s => s._id)
      };

    } catch (error) {
      console.error('Error cleaning up expired shares:', error);
      throw error;
    }
  }
}
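
A brief usage sketch of the operations above. It assumes the elided setup* helpers referenced by the constructor are implemented; the file ID, connection string, and paths shown are placeholders rather than real values.

// Hypothetical usage of ProductionGridFSOperations; IDs, URI, and paths are placeholders.
const { MongoClient } = require('mongodb');
const fs = require('fs');

async function demonstrateFileOperations(existingFileId) {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  try {
    const fileOps = new ProductionGridFSOperations(client.db('file_storage_platform'), {
      enableVersioning: true,
      enableDeduplication: true
    });

    // Upload a revised copy as the next version of an existing file
    const revisedBuffer = fs.readFileSync('./product_manual_v2.pdf');
    const version = await fileOps.createFileVersion(existingFileId, revisedBuffer, {
      uploadedBy: 'user_123',
      comment: 'Updated pricing section'
    });

    // Create a time-limited share link for the new version
    const share = await fileOps.shareFile(version.newVersionId, {
      sharedBy: 'user_123',
      allowDownload: true
    });
    console.log(`Share URL: ${share.shareUrl}, expires ${share.expiresAt}`);

    // Periodically remove expired share records
    const cleanup = await fileOps.cleanupExpiredShares();
    console.log(`Removed ${cleanup.deletedCount} expired shares`);
  } finally {
    await client.close();
  }
}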

QueryLeaf GridFS Integration

QueryLeaf provides familiar SQL syntax for GridFS file operations and management:

-- QueryLeaf GridFS operations with SQL-familiar syntax

-- File upload and metadata management using SQL-style syntax
INSERT INTO gridfs_files (
  filename, 
  content, 
  content_type, 
  metadata
) VALUES (
  'product_manual.pdf',
  FILE_CONTENT('/path/to/local/file.pdf'),
  'application/pdf',
  JSON_OBJECT(
    'category', 'documentation',
    'tags', ARRAY['manual', 'product', 'guide'],
    'description', 'Product user manual version 2.1',
    'uploadedBy', CURRENT_USER_ID(),
    'accessLevel', 'public',
    'department', 'customer_support'
  )
);

-- Advanced file search with metadata filtering and full-text search
SELECT 
  f.file_id,
  f.filename,
  f.content_type,
  f.file_size,
  FORMAT_BYTES(f.file_size) as formatted_size,
  f.upload_date,
  f.metadata->>'category' as category,
  f.metadata->>'description' as description,
  JSON_EXTRACT(f.metadata, '$.tags') as tags,
  f.metadata->>'uploadedBy' as uploaded_by,

  -- File characteristics and analysis
  CASE f.content_type
    WHEN 'image/jpeg' THEN 'Image'
    WHEN 'image/png' THEN 'Image'
    WHEN 'application/pdf' THEN 'Document'
    WHEN 'video/mp4' THEN 'Video'
    WHEN 'audio/mpeg' THEN 'Audio'
    ELSE 'Other'
  END as file_type,

  -- File age and recency
  EXTRACT(DAYS FROM CURRENT_DATE - f.upload_date) as days_old,
  CASE 
    WHEN f.upload_date > CURRENT_DATE - INTERVAL '7 days' THEN 'Recent'
    WHEN f.upload_date > CURRENT_DATE - INTERVAL '30 days' THEN 'Current'
    WHEN f.upload_date > CURRENT_DATE - INTERVAL '90 days' THEN 'Older'
    ELSE 'Archive'
  END as age_category,

  -- Access and sharing information
  f.metadata->>'isPublic' as is_public,
  f.metadata->>'accessLevel' as access_level,
  COALESCE(f.metadata->>'shareCount', '0')::INTEGER as share_count,

  -- Processing status
  f.metadata->>'processingStatus' as processing_status,
  CASE f.metadata->>'processingStatus'
    WHEN 'completed' THEN '✓ Processed'
    WHEN 'pending' THEN '⏳ Processing'
    WHEN 'failed' THEN '❌ Failed'
    ELSE '❓ Unknown'
  END as processing_display,

  -- File URLs and access
  CONCAT('/api/files/', f.file_id, '/download') as download_url,
  CONCAT('/api/files/', f.file_id, '/view') as view_url,
  CASE 
    WHEN f.metadata->'processingResult'->>'thumbnail' IS NOT NULL 
    THEN CONCAT('/api/files/', f.file_id, '/thumbnail')
    ELSE NULL
  END as thumbnail_url,

  -- Relevance scoring for search
  (
    CASE 
      WHEN f.filename ILIKE '%search_term%' THEN 5
      WHEN f.metadata->>'description' ILIKE '%search_term%' THEN 3
      WHEN JSON_EXTRACT(f.metadata, '$.tags') @> '["search_term"]' THEN 4
      ELSE 1
    END +
    CASE f.metadata->>'processingStatus'
      WHEN 'completed' THEN 2
      ELSE 0
    END +
    CASE 
      WHEN f.upload_date > CURRENT_DATE - INTERVAL '30 days' THEN 1
      ELSE 0
    END
  ) as relevance_score

FROM gridfs_files f
WHERE 
  -- Text search across filename, description, and tags
  (
    f.filename ILIKE '%document%' OR
    f.metadata->>'description' ILIKE '%document%' OR
    JSON_EXTRACT(f.metadata, '$.tags') @> '["document"]'
  )

  -- Content type filtering
  AND f.content_type IN ('application/pdf', 'application/msword', 'text/plain')

  -- Access level filtering
  AND f.metadata->>'accessLevel' IN ('public', 'internal')

  -- Size filtering (documents between 1KB and 50MB)
  AND f.file_size BETWEEN 1024 AND 52428800

  -- Date range filtering (last 6 months)
  AND f.upload_date >= CURRENT_DATE - INTERVAL '6 months'

ORDER BY relevance_score DESC, f.upload_date DESC
LIMIT 25;

-- File analytics and storage insights
WITH file_statistics AS (
  SELECT 
    COUNT(*) as total_files,
    SUM(f.file_size) as total_storage_bytes,
    AVG(f.file_size) as avg_file_size,
    MIN(f.file_size) as smallest_file,
    MAX(f.file_size) as largest_file,
    COUNT(*) FILTER (WHERE f.upload_date > CURRENT_DATE - INTERVAL '30 days') as recent_uploads,
    COUNT(*) FILTER (WHERE f.metadata->>'processingStatus' = 'completed') as processed_files,
    COUNT(DISTINCT f.metadata->>'uploadedBy') as unique_uploaders,

    -- Content type distribution
    JSON_OBJECT_AGG(
      CASE f.content_type
        WHEN 'image/jpeg' THEN 'JPEG Images'
        WHEN 'image/png' THEN 'PNG Images'
        WHEN 'application/pdf' THEN 'PDF Documents'
        WHEN 'video/mp4' THEN 'MP4 Videos'
        WHEN 'audio/mpeg' THEN 'MP3 Audio'
        ELSE 'Other Files'
      END,
      COUNT(*)
    ) as content_type_distribution,

    -- Category breakdown
    JSON_OBJECT_AGG(
      COALESCE(f.metadata->>'category', 'Uncategorized'),
      COUNT(*)
    ) as category_distribution,

    -- Size category analysis
    JSON_OBJECT(
      'Small (<1MB)', COUNT(*) FILTER (WHERE f.file_size < 1048576),
      'Medium (1-10MB)', COUNT(*) FILTER (WHERE f.file_size BETWEEN 1048576 AND 10485760),
      'Large (10-100MB)', COUNT(*) FILTER (WHERE f.file_size BETWEEN 10485760 AND 104857600),
      'Very Large (>100MB)', COUNT(*) FILTER (WHERE f.file_size > 104857600)
    ) as size_distribution

  FROM gridfs_files f
),
storage_efficiency AS (
  SELECT 
    -- Storage efficiency metrics
    ROUND((fs.total_storage_bytes / (1024.0 * 1024 * 1024))::numeric, 2) as storage_gb,
    ROUND((fs.avg_file_size / 1048576.0)::numeric, 2) as avg_size_mb,

    -- Upload trends
    ROUND((fs.recent_uploads::numeric / fs.total_files * 100)::numeric, 1) as recent_upload_percentage,

    -- Processing efficiency
    ROUND((fs.processed_files::numeric / fs.total_files * 100)::numeric, 1) as processing_success_rate,

    -- Storage growth estimation
    CASE 
      WHEN fs.recent_uploads > 0 THEN
        ROUND((fs.recent_uploads * 12.0 / fs.total_files * fs.total_storage_bytes / (1024.0 * 1024 * 1024))::numeric, 2)
      ELSE 0
    END as estimated_yearly_growth_gb

  FROM file_statistics fs
),
top_uploaders AS (
  SELECT 
    f.metadata->>'uploadedBy' as user_id,
    u.name as user_name,
    COUNT(*) as files_uploaded,
    SUM(f.file_size) as total_bytes_uploaded,
    FORMAT_BYTES(SUM(f.file_size)) as formatted_total_size,
    AVG(f.file_size) as avg_file_size,
    MIN(f.upload_date) as first_upload,
    MAX(f.upload_date) as last_upload,

    -- User activity patterns
    COUNT(*) FILTER (WHERE f.upload_date > CURRENT_DATE - INTERVAL '7 days') as uploads_last_week,
    COUNT(*) FILTER (WHERE f.upload_date > CURRENT_DATE - INTERVAL '30 days') as uploads_last_month,

    -- Content preferences
    MODE() WITHIN GROUP (ORDER BY f.content_type) as most_common_content_type,
    COUNT(DISTINCT f.content_type) as content_type_diversity

  FROM gridfs_files f
  LEFT JOIN users u ON f.metadata->>'uploadedBy' = u.user_id
  GROUP BY f.metadata->>'uploadedBy', u.name
  HAVING COUNT(*) >= 5  -- Only users with at least 5 uploads
  ORDER BY files_uploaded DESC
  LIMIT 10
)

-- Final comprehensive analytics report
SELECT 
  -- Overall statistics
  fs.total_files,
  se.storage_gb as total_storage_gb,
  se.avg_size_mb as average_file_size_mb,
  fs.unique_uploaders,
  se.recent_upload_percentage as recent_activity_percentage,
  se.processing_success_rate as processing_success_percentage,
  se.estimated_yearly_growth_gb,

  -- Distribution insights
  fs.content_type_distribution,
  fs.category_distribution, 
  fs.size_distribution,

  -- Top users summary (as JSON array)
  (
    SELECT JSON_AGG(
      JSON_OBJECT(
        'user_name', tu.user_name,
        'files_uploaded', tu.files_uploaded,
        'total_size', tu.formatted_total_size,
        'uploads_last_month', tu.uploads_last_month
      )
      ORDER BY tu.files_uploaded DESC
    )
    FROM top_uploaders tu
  ) as top_uploaders_summary,

  -- Storage optimization recommendations
  CASE 
    WHEN se.storage_gb > 100 THEN 'Consider implementing file archiving and compression policies'
    WHEN se.recent_upload_percentage > 25 THEN 'High upload activity - monitor storage growth'
    WHEN se.processing_success_rate < 90 THEN 'Review file processing pipeline for efficiency'
    ELSE 'Storage usage is within normal parameters'
  END as optimization_recommendation,

  -- Health indicators
  JSON_OBJECT(
    'storage_health', CASE 
      WHEN se.storage_gb > 500 THEN 'High Usage'
      WHEN se.storage_gb > 100 THEN 'Moderate Usage'
      ELSE 'Low Usage'
    END,
    'activity_level', CASE 
      WHEN se.recent_upload_percentage > 20 THEN 'High Activity'
      WHEN se.recent_upload_percentage > 5 THEN 'Normal Activity'
      ELSE 'Low Activity'
    END,
    'processing_health', CASE 
      WHEN se.processing_success_rate > 95 THEN 'Excellent'
      WHEN se.processing_success_rate > 80 THEN 'Good'
      ELSE 'Needs Attention'
    END
  ) as system_health

FROM file_statistics fs
CROSS JOIN storage_efficiency se;

-- File version management and history tracking
WITH version_analysis AS (
  SELECT 
    f.file_id,
    f.filename,
    f.metadata->>'parentFileId' as parent_file_id,
    (f.metadata->>'version')::INTEGER as version_number,
    f.metadata->>'versionType' as version_type,
    f.metadata->>'versionComment' as version_comment,
    f.file_size,
    f.upload_date as version_date,
    f.metadata->>'uploadedBy' as version_author,

    -- Version relationships
    LAG(f.file_size) OVER (PARTITION BY COALESCE(f.metadata->>'parentFileId', f.file_id) ORDER BY (f.metadata->>'version')::INTEGER) as previous_version_size,
    LAG(f.upload_date) OVER (PARTITION BY COALESCE(f.metadata->>'parentFileId', f.file_id) ORDER BY (f.metadata->>'version')::INTEGER) as previous_version_date,

    -- Version statistics
    COUNT(*) OVER (PARTITION BY COALESCE(f.metadata->>'parentFileId', f.file_id)) as total_versions,
    ROW_NUMBER() OVER (PARTITION BY COALESCE(f.metadata->>'parentFileId', f.file_id) ORDER BY (f.metadata->>'version')::INTEGER DESC) as version_rank

  FROM gridfs_files f
  WHERE f.metadata->>'version' IS NOT NULL
),
version_insights AS (
  SELECT 
    va.*,

    -- Size change analysis
    CASE 
      WHEN va.previous_version_size IS NOT NULL THEN
        va.file_size - va.previous_version_size
      ELSE 0
    END as size_change_bytes,

    CASE 
      WHEN va.previous_version_size IS NOT NULL AND va.previous_version_size > 0 THEN
        ROUND(((va.file_size - va.previous_version_size)::numeric / va.previous_version_size * 100)::numeric, 1)
      ELSE 0
    END as size_change_percentage,

    -- Time between versions
    CASE 
      WHEN va.previous_version_date IS NOT NULL THEN
        EXTRACT(DAYS FROM va.version_date - va.previous_version_date)
      ELSE 0
    END as days_since_previous_version,

    -- Version classification
    CASE va.version_type
      WHEN 'major' THEN '🔴 Major Update'
      WHEN 'minor' THEN '🟡 Minor Update'  
      WHEN 'patch' THEN '🟢 Patch/Fix'
      WHEN 'update' THEN '🔵 Content Update'
      ELSE '⚪ Standard Update'
    END as version_type_display

  FROM version_analysis va
)

SELECT 
  vi.file_id,
  vi.filename,
  vi.version_number,
  vi.version_type_display,
  vi.version_comment,
  FORMAT_BYTES(vi.file_size) as current_size,
  vi.version_date,
  vi.version_author,
  vi.total_versions,

  -- Change analysis
  CASE 
    WHEN vi.size_change_bytes > 0 THEN 
      CONCAT('+', FORMAT_BYTES(vi.size_change_bytes))
    WHEN vi.size_change_bytes < 0 THEN 
      CONCAT('-', FORMAT_BYTES(ABS(vi.size_change_bytes)))
    ELSE 'No change'
  END as size_change_display,

  CONCAT(
    CASE 
      WHEN vi.size_change_percentage > 0 THEN '+'
      ELSE ''
    END,
    vi.size_change_percentage::text, '%'
  ) as size_change_percentage_display,

  -- Version timing
  CASE 
    WHEN vi.days_since_previous_version = 0 THEN 'Same day'
    WHEN vi.days_since_previous_version = 1 THEN '1 day'
    WHEN vi.days_since_previous_version < 7 THEN CONCAT(vi.days_since_previous_version, ' days')
    WHEN vi.days_since_previous_version < 30 THEN CONCAT(ROUND(vi.days_since_previous_version / 7.0, 1), ' weeks')
    ELSE CONCAT(ROUND(vi.days_since_previous_version / 30.0, 1), ' months')
  END as time_since_previous,

  -- Version context
  CASE vi.version_rank
    WHEN 1 THEN 'Latest Version'
    WHEN 2 THEN 'Previous Version'
    ELSE CONCAT('Version -', vi.version_rank - 1)
  END as version_status,

  -- Access URLs
  CONCAT('/api/files/', vi.file_id, '/download') as download_url,
  CONCAT('/api/files/', vi.file_id, '/version-info') as version_info_url,
  CASE 
    WHEN vi.parent_file_id IS NOT NULL THEN 
      CONCAT('/api/files/', vi.parent_file_id, '/versions')
    ELSE 
      CONCAT('/api/files/', vi.file_id, '/versions')
  END as version_history_url

FROM version_insights vi
WHERE vi.total_versions > 1  -- Only show files with multiple versions
ORDER BY vi.filename, vi.version_number DESC;

-- QueryLeaf provides comprehensive GridFS capabilities:
-- 1. Native file upload and download operations with SQL syntax
-- 2. Advanced metadata management and search capabilities
-- 3. Automatic chunking and streaming for large files
-- 4. Built-in file versioning and history tracking
-- 5. Comprehensive file analytics and storage insights
-- 6. Integrated permission and sharing management
-- 7. Automatic file processing and thumbnail generation
-- 8. SQL-familiar syntax for complex file operations
-- 9. Production-ready scalability with MongoDB's GridFS
-- 10. Seamless integration with application data models

Best Practices for GridFS Implementation

Design Guidelines for Production File Storage

Essential practices for MongoDB GridFS deployments, with a configuration sketch after the list:

  1. Content Type Organization: Use specialized buckets for different content types to optimize performance
  2. Metadata Design: Structure metadata for efficient querying and filtering operations
  3. Chunk Size Optimization: Configure appropriate chunk sizes based on file types and access patterns
  4. Index Strategy: Create comprehensive indexes on metadata fields for fast file discovery
  5. Version Management: Implement systematic file versioning for collaborative environments
  6. Access Control: Design permission systems integrated with application security models
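
The first four guidelines can be combined into a small setup routine. The sketch below is illustrative rather than prescriptive: the bucket names, chunk sizes, and index fields are assumptions to adjust for your own content mix, and it expects a connected db handle from the MongoDB Node.js driver.

// Hypothetical bucket-per-content-type setup with tuned chunk sizes and metadata indexes
const { GridFSBucket } = require('mongodb');

async function setupContentTypeBuckets(db) {
  // Specialized buckets keep chunk sizing appropriate for each content type
  const buckets = {
    images: new GridFSBucket(db, { bucketName: 'images', chunkSizeBytes: 255 * 1024 }),      // default-sized chunks for small assets
    videos: new GridFSBucket(db, { bucketName: 'videos', chunkSizeBytes: 4 * 1024 * 1024 }), // larger chunks for sequential streaming
    documents: new GridFSBucket(db, { bucketName: 'documents' })                             // driver default (255 KB)
  };

  // Metadata indexes on each bucket's files collection support fast discovery and filtering
  for (const name of Object.keys(buckets)) {
    await db.collection(`${name}.files`).createIndexes([
      { key: { 'metadata.category': 1, uploadDate: -1 }, name: 'category_recent' },
      { key: { 'metadata.uploadedBy': 1 }, name: 'uploader_lookup' },
      { key: { 'metadata.tags': 1 }, name: 'tag_search' }
    ]);
  }

  return buckets;
}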

Performance and Scalability Optimization

Optimize GridFS for large-scale file storage requirements; a streaming upload/download sketch follows the list:

  1. Storage Efficiency: Implement deduplication and compression strategies
  2. Query Optimization: Design metadata structures for efficient search operations
  3. Streaming Operations: Use GridFS streaming for large file uploads and downloads
  4. Caching Strategy: Implement intelligent caching for frequently accessed files
  5. Monitoring: Track storage usage, access patterns, and performance metrics
  6. Cleanup Automation: Automate expired file deletion and storage optimization
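
Streaming (item 3 above) is the key to handling large files without exhausting memory. The following minimal sketch streams a local file into GridFS and a stored file back out to any writable target; the bucket name and file paths are assumptions, and error handling is kept to the bare minimum.

// Minimal streaming upload/download sketch using the MongoDB Node.js driver
const fs = require('fs');
const { GridFSBucket, ObjectId } = require('mongodb');

// Stream a large file into GridFS; resolves with the new file's ObjectId
function uploadFileStream(db, localPath, filename, metadata = {}) {
  const bucket = new GridFSBucket(db, { bucketName: 'documents' });
  return new Promise((resolve, reject) => {
    const uploadStream = bucket.openUploadStream(filename, { metadata });
    fs.createReadStream(localPath)
      .on('error', reject)
      .pipe(uploadStream)
      .on('error', reject)
      .on('finish', () => resolve(uploadStream.id));
  });
}

// Stream a stored file into any writable target (an HTTP response, a local file, ...)
function downloadFileStream(db, fileId, writableTarget) {
  const bucket = new GridFSBucket(db, { bucketName: 'documents' });
  return new Promise((resolve, reject) => {
    const downloadStream = bucket.openDownloadStream(new ObjectId(fileId));
    downloadStream.on('error', reject);
    writableTarget.on('finish', resolve);
    downloadStream.pipe(writableTarget);
  });
}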

Conclusion

MongoDB GridFS provides comprehensive large file storage capabilities that seamlessly integrate with database operations while delivering high performance, automatic chunking, and sophisticated metadata management. Unlike traditional file storage approaches, GridFS maintains ACID properties, supports complex queries, and scales horizontally with your application.

Key GridFS benefits include:

  • Seamless Integration: Native MongoDB integration with consistent APIs and operations
  • Automatic Management: Built-in chunking, streaming, and integrity checking without manual implementation
  • Scalable Architecture: Horizontal scaling with MongoDB's sharding and replication capabilities
  • Rich Metadata: Sophisticated metadata storage and indexing for complex file management scenarios
  • Performance Optimization: Optimized chunk sizes and streaming operations for various content types
  • Production Ready: Enterprise-grade reliability with comprehensive monitoring and analytics

Whether you're building content management systems, media libraries, document repositories, or data archival platforms, MongoDB GridFS with QueryLeaf's familiar SQL interface provides the foundation for scalable file storage solutions.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB GridFS operations through SQL-familiar syntax, handling file uploads, metadata management, and complex file queries seamlessly. Advanced file operations, version management, and storage analytics are accessible through standard SQL constructs, making sophisticated file storage capabilities available to SQL-oriented development teams.

The combination of MongoDB GridFS capabilities with SQL-style file operations makes it an ideal platform for applications requiring both advanced file management and familiar database interaction patterns, ensuring your file storage solutions remain both powerful and maintainable as they scale and evolve.

MongoDB Event Sourcing and CQRS: Distributed Systems Architecture with Immutable Audit Trails and Command Query Separation

Modern distributed systems require architectural patterns that ensure data consistency, provide complete audit trails, and enable scalable read/write operations across microservices boundaries. Traditional CRUD architectures struggle with distributed consistency challenges, complex business rules validation, and the need for comprehensive historical data tracking, particularly when dealing with financial transactions, regulatory compliance, or complex business processes that require precise event ordering and replay capabilities.

MongoDB event sourcing and Command Query Responsibility Segregation (CQRS) provide sophisticated architectural patterns that store system changes as immutable event sequences rather than current state snapshots. This approach enables complete system state reconstruction, provides natural audit trails, supports complex business logic validation, and allows independent scaling of read and write operations while maintaining eventual consistency across distributed system boundaries.
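
The core mechanic behind event sourcing fits in a few lines: the current state of an aggregate is never stored directly, it is derived by folding an ordered, immutable event sequence. The simplified sketch below uses bare-bones event shapes; the implementation later in this article layers versioning, metadata, snapshots, and read models on top of the same idea.

// Minimal sketch: current state is a fold over an immutable event sequence
const events = [
  { eventType: 'AccountCreated', eventData: { initialBalance: 1000 } },
  { eventType: 'FundsDebited',   eventData: { amount: 250 } },
  { eventType: 'FundsCredited',  eventData: { amount: 400 } }
];

const currentState = events.reduce((state, event) => {
  switch (event.eventType) {
    case 'AccountCreated': return { balance: event.eventData.initialBalance };
    case 'FundsDebited':   return { ...state, balance: state.balance - event.eventData.amount };
    case 'FundsCredited':  return { ...state, balance: state.balance + event.eventData.amount };
    default:               return state;
  }
}, null);

console.log(currentState); // { balance: 1150 }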

The Traditional State-Based Architecture Challenge

Conventional CRUD-based systems face significant limitations in distributed environments and audit-sensitive applications:

-- Traditional PostgreSQL state-based architecture - limited auditability and scalability
CREATE TABLE bank_accounts (
    account_id BIGINT PRIMARY KEY,
    account_number VARCHAR(20) UNIQUE NOT NULL,
    account_holder VARCHAR(255) NOT NULL,
    current_balance DECIMAL(15,2) NOT NULL DEFAULT 0.00,
    account_type VARCHAR(50) NOT NULL,
    account_status VARCHAR(20) NOT NULL DEFAULT 'active',

    -- Basic audit fields (insufficient for compliance)
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_by BIGINT NOT NULL,
    updated_by BIGINT NOT NULL
);

CREATE TABLE transactions (
    transaction_id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    from_account_id BIGINT REFERENCES bank_accounts(account_id),
    to_account_id BIGINT REFERENCES bank_accounts(account_id),
    transaction_type VARCHAR(50) NOT NULL,
    amount DECIMAL(15,2) NOT NULL,
    transaction_status VARCHAR(20) NOT NULL DEFAULT 'pending',

    -- Limited audit information
    processed_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    reference_number VARCHAR(100),
    description TEXT
);

-- Traditional transaction processing (problematic in distributed systems)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Step 1: Validate source account
SELECT account_id, current_balance, account_status 
FROM bank_accounts 
WHERE account_id = $1 AND account_status = 'active'
FOR UPDATE;

-- Step 2: Validate destination account
SELECT account_id, account_status 
FROM bank_accounts 
WHERE account_id = $2 AND account_status = 'active'
FOR UPDATE;

-- Step 3: Check sufficient funds
IF (SELECT current_balance FROM bank_accounts WHERE account_id = $1) >= $3 THEN

    -- Step 4: Update source account balance
    UPDATE bank_accounts 
    SET current_balance = current_balance - $3,
        updated_at = CURRENT_TIMESTAMP,
        updated_by = $4
    WHERE account_id = $1;

    -- Step 5: Update destination account balance
    UPDATE bank_accounts 
    SET current_balance = current_balance + $3,
        updated_at = CURRENT_TIMESTAMP,
        updated_by = $4
    WHERE account_id = $2;

    -- Step 6: Record transaction
    INSERT INTO transactions (
        from_account_id, to_account_id, transaction_type, 
        amount, transaction_status, processed_at, reference_number, description
    )
    VALUES ($1, $2, 'transfer', $3, 'completed', CURRENT_TIMESTAMP, $5, $6);

ELSE
    RAISE EXCEPTION 'Insufficient funds';
END IF;

COMMIT;

-- Problems with traditional state-based architecture:
-- 1. Lost historical information - only current state is preserved
-- 2. Audit trail limitations - changes tracked inadequately
-- 3. Distributed consistency challenges - difficult to maintain ACID across services
-- 4. Complex business logic validation - scattered across multiple update statements
-- 5. Limited replay capability - cannot reconstruct past states reliably
-- 6. Scalability bottlenecks - read and write operations compete for same resources
-- 7. Integration complexity - difficult to publish domain events to other systems
-- 8. Compliance gaps - insufficient audit trails for regulatory requirements
-- 9. Error recovery challenges - complex rollback procedures
-- 10. Testing difficulties - hard to reproduce exact historical conditions

-- Audit queries are complex and incomplete
WITH account_history AS (
    SELECT 
        account_id,
        'balance_update' as event_type,
        updated_at as event_timestamp,
        updated_by as user_id,
        current_balance as new_value,
        LAG(current_balance) OVER (
            PARTITION BY account_id 
            ORDER BY updated_at
        ) as previous_value
    FROM bank_accounts_audit -- Requires separate audit table setup
    WHERE account_id = $1

    UNION ALL

    SELECT 
        COALESCE(from_account_id, to_account_id) as account_id,
        transaction_type as event_type,
        processed_at as event_timestamp,
        NULL as user_id,  -- transactions table does not track the initiating user
        amount as new_value,
        NULL as previous_value
    FROM transactions
    WHERE from_account_id = $1 OR to_account_id = $1
)

SELECT 
    event_timestamp,
    event_type,
    new_value,
    previous_value,
    COALESCE(new_value - previous_value, new_value) as change_amount,

    -- Limited reconstruction capability
    SUM(COALESCE(new_value - previous_value, new_value)) OVER (
        ORDER BY event_timestamp 
        ROWS UNBOUNDED PRECEDING
    ) as running_balance

FROM account_history
ORDER BY event_timestamp DESC
LIMIT 100;

-- Challenges with traditional audit approaches:
-- 1. Incomplete event capture - many state changes not properly tracked
-- 2. Limited business context - technical changes without domain meaning
-- 3. Performance overhead - separate audit tables and triggers
-- 4. Query complexity - reconstructing historical states requires complex joins
-- 5. Storage inefficiency - duplicate data in audit tables
-- 6. Consistency issues - audit and primary data can become out of sync
-- 7. Limited replay capability - cannot fully recreate business scenarios
-- 8. Integration challenges - difficult to publish meaningful events to external systems

MongoDB event sourcing provides comprehensive solutions for these distributed system challenges:

// MongoDB Event Sourcing and CQRS Implementation
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017/?replicaSet=rs0');
const db = client.db('event_sourcing_system');

// Advanced Event Sourcing and CQRS Manager
class MongoDBEventSourcedSystem {
  constructor(db, config = {}) {
    this.db = db;
    this.collections = {
      eventStore: db.collection('event_store'),
      aggregateSnapshots: db.collection('aggregate_snapshots'),
      readModels: {
        accountSummary: db.collection('account_summary_view'),
        transactionHistory: db.collection('transaction_history_view'),
        complianceAudit: db.collection('compliance_audit_view'),
        balanceProjections: db.collection('balance_projections_view')
      },
      commandHandlers: db.collection('command_handlers'),
      eventSubscriptions: db.collection('event_subscriptions'),
      sagaState: db.collection('saga_state')
    };

    this.config = {
      // Event store configuration
      snapshotFrequency: config.snapshotFrequency || 100,
      eventRetentionDays: config.eventRetentionDays || 2555, // 7 years for compliance
      enableCompression: config.enableCompression !== false,
      enableEncryption: config.enableEncryption || false,

      // CQRS configuration
      readModelUpdateBatchSize: config.readModelUpdateBatchSize || 1000,
      eventualConsistencyTimeout: config.eventualConsistencyTimeout || 30000,
      enableOptimisticConcurrency: config.enableOptimisticConcurrency !== false,

      // Business configuration
      maxTransactionAmount: config.maxTransactionAmount || 1000000,
      enableFraudDetection: config.enableFraudDetection !== false,
      enableRegulatoryCompliance: config.enableRegulatoryCompliance !== false,

      // Performance configuration
      enableEventIndexing: config.enableEventIndexing !== false,
      enableReadModelCaching: config.enableReadModelCaching !== false,
      eventProcessingConcurrency: config.eventProcessingConcurrency || 10
    };

    this.setupEventStore();
    this.initializeReadModels();
    this.startEventProcessors();
  }

  async setupEventStore() {
    console.log('Setting up MongoDB event store with advanced indexing...');

    try {
      // Create event store with optimal indexes for event sourcing patterns
      await this.collections.eventStore.createIndexes([
        // Primary event lookup by aggregate
        { 
          key: { aggregateId: 1, version: 1 }, 
          unique: true,
          name: 'aggregate_version_unique'
        },

        // Event ordering and streaming
        { 
          key: { aggregateId: 1, eventTimestamp: 1 }, 
          name: 'aggregate_chronological'
        },

        // Global event ordering for projections
        { 
          key: { eventTimestamp: 1, _id: 1 }, 
          name: 'global_event_order'
        },

        // Event type filtering
        { 
          key: { eventType: 1, eventTimestamp: 1 }, 
          name: 'event_type_chronological'
        },

        // Compliance and audit queries
        { 
          key: { 'eventData.userId': 1, eventTimestamp: 1 }, 
          name: 'user_audit_trail'
        },

        // Correlation and saga support
        { 
          key: { correlationId: 1 }, 
          sparse: true,
          name: 'correlation_lookup'
        },

        // Event replay and reconstruction
        { 
          key: { aggregateType: 1, eventTimestamp: 1 }, 
          name: 'aggregate_type_replay'
        }
      ]);

      // Create snapshot collection indexes
      await this.collections.aggregateSnapshots.createIndexes([
        { 
          key: { aggregateId: 1, version: -1 }, 
          name: 'latest_snapshot_lookup'
        },

        { 
          key: { aggregateType: 1, snapshotTimestamp: -1 }, 
          name: 'aggregate_type_snapshots'
        }
      ]);

      console.log('Event store indexes created successfully');

    } catch (error) {
      console.error('Error setting up event store:', error);
      throw error;
    }
  }

  async handleCommand(command) {
    console.log(`Processing command: ${command.commandType}`);

    try {
      // Start distributed transaction for command processing
      const session = client.startSession();

      let result;

      await session.withTransaction(async () => {
        // Load current aggregate state
        const aggregate = await this.loadAggregate(command.aggregateId, session);

        // Validate command against current state and business rules
        const validation = await this.validateCommand(command, aggregate);
        if (!validation.valid) {
          throw new Error(`Command validation failed: ${validation.reason}`);
        }

        // Execute command and generate events
        const events = await this.executeCommand(command, aggregate);

        // Store events in event store
        result = await this.storeEvents(command.aggregateId, events, aggregate.version, session);

        // Update aggregate snapshot if needed
        if ((aggregate.version + events.length) % this.config.snapshotFrequency === 0) {
          await this.createSnapshot(command.aggregateId, events, session);
        }

      }, {
        readConcern: { level: 'majority' },
        writeConcern: { w: 'majority' },
        readPreference: 'primary'
      });

      await session.endSession();

      // Publish events for read model updates (eventual consistency)
      await this.publishEventsForProcessing(result.events);

      return {
        success: true,
        aggregateId: command.aggregateId,
        version: result.newVersion,
        eventsGenerated: result.events.length
      };

    } catch (error) {
      console.error('Error handling command:', error);
      throw error;
    }
  }

  async executeCommand(command, aggregate) {
    const events = [];
    const timestamp = new Date();
    const commandId = new ObjectId();

    switch (command.commandType) {
      case 'CreateAccount':
        events.push({
          eventId: new ObjectId(),
          aggregateId: command.aggregateId,
          aggregateType: 'BankAccount',
          eventType: 'AccountCreated',
          eventVersion: '1.0',
          eventTimestamp: timestamp,
          commandId: commandId,

          eventData: {
            accountNumber: command.data.accountNumber,
            accountHolder: command.data.accountHolder,
            accountType: command.data.accountType,
            initialBalance: command.data.initialBalance || 0,
            currency: command.data.currency || 'USD',
            createdBy: command.userId,
            createdAt: timestamp
          },

          eventMetadata: {
            sourceIp: command.metadata?.sourceIp,
            userAgent: command.metadata?.userAgent,
            correlationId: command.correlationId,
            causationId: command.causationId
          }
        });
        break;

      case 'TransferFunds':
        // Generate comprehensive events for fund transfer
        const transferId = new ObjectId();

        // Debit event
        events.push({
          eventId: new ObjectId(),
          aggregateId: command.data.fromAccountId,
          aggregateType: 'BankAccount', 
          eventType: 'FundsDebited',
          eventVersion: '1.0',
          eventTimestamp: timestamp,
          commandId: commandId,

          eventData: {
            transferId: transferId,
            amount: command.data.amount,
            currency: command.data.currency || 'USD',
            toAccountId: command.data.toAccountId,
            reference: command.data.reference,
            description: command.data.description,
            debitedBy: command.userId,
            debitedAt: timestamp,

            // Business context
            transferType: 'outbound',
            feeAmount: command.data.feeAmount || 0,
            exchangeRate: command.data.exchangeRate || 1.0
          },

          eventMetadata: {
            sourceIp: command.metadata?.sourceIp,
            userAgent: command.metadata?.userAgent,
            correlationId: command.correlationId,
            causationId: command.causationId,
            regulatoryFlags: this.calculateRegulatoryFlags(command)
          }
        });

        // Credit event (separate aggregate)
        events.push({
          eventId: new ObjectId(),
          aggregateId: command.data.toAccountId,
          aggregateType: 'BankAccount',
          eventType: 'FundsCredited', 
          eventVersion: '1.0',
          eventTimestamp: timestamp,
          commandId: commandId,

          eventData: {
            transferId: transferId,
            amount: command.data.amount,
            currency: command.data.currency || 'USD',
            fromAccountId: command.data.fromAccountId,
            reference: command.data.reference,
            description: command.data.description,
            creditedBy: command.userId,
            creditedAt: timestamp,

            // Business context
            transferType: 'inbound',
            feeAmount: 0,
            exchangeRate: command.data.exchangeRate || 1.0
          },

          eventMetadata: {
            sourceIp: command.metadata?.sourceIp,
            userAgent: command.metadata?.userAgent,
            correlationId: command.correlationId,
            causationId: command.causationId,
            regulatoryFlags: this.calculateRegulatoryFlags(command)
          }
        });
        break;

      case 'FreezeAccount':
        events.push({
          eventId: new ObjectId(),
          aggregateId: command.aggregateId,
          aggregateType: 'BankAccount',
          eventType: 'AccountFrozen',
          eventVersion: '1.0',
          eventTimestamp: timestamp,
          commandId: commandId,

          eventData: {
            reason: command.data.reason,
            frozenBy: command.userId,
            frozenAt: timestamp,
            expectedDuration: command.data.expectedDuration,
            complianceReference: command.data.complianceReference
          },

          eventMetadata: {
            sourceIp: command.metadata?.sourceIp,
            userAgent: command.metadata?.userAgent,
            correlationId: command.correlationId,
            causationId: command.causationId,
            securityLevel: 'high'
          }
        });
        break;

      default:
        throw new Error(`Unknown command type: ${command.commandType}`);
    }

    return events;
  }

  async validateCommand(command, aggregate) {
    console.log(`Validating command: ${command.commandType} for aggregate: ${command.aggregateId}`);

    try {
      switch (command.commandType) {
        case 'CreateAccount':
          // Check if account already exists
          if (aggregate.currentState && aggregate.currentState.accountNumber) {
            return { 
              valid: false, 
              reason: 'Account already exists'
            };
          }

          // Validate account number uniqueness
          const existingAccount = await this.collections.readModels.accountSummary.findOne({
            accountNumber: command.data.accountNumber
          });

          if (existingAccount) {
            return { 
              valid: false, 
              reason: 'Account number already in use'
            };
          }

          return { valid: true };

        case 'TransferFunds':
          // Validate source account exists and is active
          if (!aggregate.currentState || aggregate.currentState.status !== 'active') {
            return { 
              valid: false, 
              reason: 'Source account not found or not active'
            };
          }

          // Check sufficient funds
          if (aggregate.currentState.balance < command.data.amount + (command.data.feeAmount || 0)) {
            return { 
              valid: false, 
              reason: 'Insufficient funds'
            };
          }

          // Validate destination account
          const destinationAccount = await this.collections.readModels.accountSummary.findOne({
            _id: command.data.toAccountId,
            status: 'active'
          });

          if (!destinationAccount) {
            return { 
              valid: false, 
              reason: 'Destination account not found or not active'
            };
          }

          // Business rule validations
          if (command.data.amount > this.config.maxTransactionAmount) {
            return { 
              valid: false, 
              reason: `Transaction amount exceeds maximum allowed (${this.config.maxTransactionAmount})`
            };
          }

          // Fraud detection
          if (this.config.enableFraudDetection) {
            const fraudCheck = await this.performFraudDetection(command, aggregate);
            if (!fraudCheck.valid) {
              return fraudCheck;
            }
          }

          return { valid: true };

        case 'FreezeAccount':
          if (!aggregate.currentState || aggregate.currentState.status === 'frozen') {
            return { 
              valid: false, 
              reason: 'Account not found or already frozen'
            };
          }

          return { valid: true };

        default:
          return { 
            valid: false, 
            reason: `Unknown command type: ${command.commandType}`
          };
      }
    } catch (error) {
      console.error('Error validating command:', error);
      return { 
        valid: false, 
        reason: `Validation error: ${error.message}`
      };
    }
  }

  async storeEvents(aggregateId, events, expectedVersion, session) {
    console.log(`Storing ${events.length} events for aggregate: ${aggregateId}`);

    try {
      const eventsToStore = events.map((event, index) => ({
        ...event,
        aggregateId: aggregateId,
        version: expectedVersion + index + 1,
        storedAt: new Date()
      }));

      // Store events with optimistic concurrency control
      const result = await this.collections.eventStore.insertMany(
        eventsToStore,
        { session }
      );

      return {
        events: eventsToStore,
        newVersion: expectedVersion + events.length,
        storedEventIds: result.insertedIds
      };

    } catch (error) {
      if (error.code === 11000) { // Duplicate key error
        throw new Error('Concurrency conflict - aggregate was modified by another process');
      }
      throw error;
    }
  }

  async loadAggregate(aggregateId, session = null) {
    console.log(`Loading aggregate: ${aggregateId}`);

    try {
      // Try to load latest snapshot
      const snapshot = await this.collections.aggregateSnapshots
        .findOne(
          { aggregateId: aggregateId },
          { 
            sort: { version: -1 },
            session: session
          }
        );

      let fromVersion = 0;
      let currentState = null;

      if (snapshot) {
        fromVersion = snapshot.version;
        currentState = snapshot.snapshotData;
      }

      // Load events since snapshot
      const events = await this.collections.eventStore
        .find(
          {
            aggregateId: aggregateId,
            version: { $gt: fromVersion }
          },
          {
            sort: { version: 1 },
            session: session
          }
        )
        .toArray();

      // Replay events to build current state
      let version = fromVersion;
      for (const event of events) {
        currentState = this.applyEvent(currentState, event);
        version = event.version;
      }

      return {
        aggregateId: aggregateId,
        version: version,
        currentState: currentState,
        lastModified: events.length > 0 ? events[events.length - 1].eventTimestamp : 
                      snapshot?.snapshotTimestamp || null
      };

    } catch (error) {
      console.error('Error loading aggregate:', error);
      throw error;
    }
  }

  applyEvent(currentState, event) {
    // Event replay logic for state reconstruction
    let newState = currentState ? { ...currentState } : {};

    switch (event.eventType) {
      case 'AccountCreated':
        newState = {
          accountId: event.aggregateId,
          accountNumber: event.eventData.accountNumber,
          accountHolder: event.eventData.accountHolder,
          accountType: event.eventData.accountType,
          balance: event.eventData.initialBalance,
          currency: event.eventData.currency,
          status: 'active',
          createdAt: event.eventData.createdAt,
          createdBy: event.eventData.createdBy,
          version: event.version
        };
        break;

      case 'FundsDebited':
        newState.balance = (newState.balance || 0) - event.eventData.amount - (event.eventData.feeAmount || 0);
        newState.lastTransactionAt = event.eventTimestamp;
        newState.version = event.version;
        break;

      case 'FundsCredited':
        newState.balance = (newState.balance || 0) + event.eventData.amount;
        newState.lastTransactionAt = event.eventTimestamp;
        newState.version = event.version;
        break;

      case 'AccountFrozen':
        newState.status = 'frozen';
        newState.frozenAt = event.eventData.frozenAt;
        newState.frozenBy = event.eventData.frozenBy;
        newState.frozenReason = event.eventData.reason;
        newState.version = event.version;
        break;

      default:
        console.warn(`Unknown event type for replay: ${event.eventType}`);
    }

    return newState;
  }

  async createSnapshot(aggregateId, recentEvents, session) {
    console.log(`Creating snapshot for aggregate: ${aggregateId}`);

    try {
      // Rebuild current state
      const aggregate = await this.loadAggregate(aggregateId, session);

      const snapshot = {
        aggregateId: aggregateId,
        aggregateType: 'BankAccount', // This should be dynamic based on aggregate type
        version: aggregate.version,
        snapshotData: aggregate.currentState,
        snapshotTimestamp: new Date(),
        eventCount: aggregate.version,

        // Metadata
        compressionEnabled: this.config.enableCompression,
        encryptionEnabled: this.config.enableEncryption
      };

      await this.collections.aggregateSnapshots.replaceOne(
        { aggregateId: aggregateId },
        snapshot,
        { upsert: true, session }
      );

      console.log(`Snapshot created for aggregate ${aggregateId} at version ${aggregate.version}`);

    } catch (error) {
      console.error('Error creating snapshot:', error);
      // Don't fail the main transaction for snapshot errors
    }
  }

  async publishEventsForProcessing(events) {
    console.log(`Publishing ${events.length} events for read model processing`);

    try {
      // Publish events to read model processors (eventual consistency)
      for (const event of events) {
        await this.updateReadModels(event);
      }

      // Trigger any saga workflows
      await this.processSagaEvents(events);

    } catch (error) {
      console.error('Error publishing events for processing:', error);
      // Log error but don't fail - eventual consistency will retry
    }
  }

  async updateReadModels(event) {
    console.log(`Updating read models for event: ${event.eventType}`);

    try {
      switch (event.eventType) {
        case 'AccountCreated':
          await this.collections.readModels.accountSummary.replaceOne(
            { _id: event.aggregateId },
            {
              _id: event.aggregateId,
              accountNumber: event.eventData.accountNumber,
              accountHolder: event.eventData.accountHolder,
              accountType: event.eventData.accountType,
              currentBalance: event.eventData.initialBalance,
              currency: event.eventData.currency,
              status: 'active',
              createdAt: event.eventData.createdAt,
              createdBy: event.eventData.createdBy,
              lastUpdated: event.eventTimestamp,
              version: event.version
            },
            { upsert: true }
          );
          break;

        case 'FundsDebited':
        case 'FundsCredited':
          // Update account summary
          const balanceChange = event.eventType === 'FundsCredited' ? 
            event.eventData.amount : 
            -(event.eventData.amount + (event.eventData.feeAmount || 0));

          await this.collections.readModels.accountSummary.updateOne(
            { _id: event.aggregateId },
            {
              $inc: { currentBalance: balanceChange },
              $set: { 
                lastTransactionAt: event.eventTimestamp,
                lastUpdated: event.eventTimestamp,
                version: event.version
              }
            }
          );

          // Create transaction history entry
          await this.collections.readModels.transactionHistory.insertOne({
            _id: new ObjectId(),
            transactionId: event.eventData.transferId,
            accountId: event.aggregateId,
            eventId: event.eventId,
            transactionType: event.eventType,
            amount: event.eventData.amount,
            currency: event.eventData.currency,
            counterpartyAccountId: event.eventType === 'FundsCredited' ? 
              event.eventData.fromAccountId : event.eventData.toAccountId,
            reference: event.eventData.reference,
            description: event.eventData.description,
            timestamp: event.eventTimestamp,
            feeAmount: event.eventData.feeAmount || 0,
            exchangeRate: event.eventData.exchangeRate,
            balanceAfter: null, // Will be calculated in a separate process

            // Audit and compliance
            processedBy: event.eventData.debitedBy || event.eventData.creditedBy,
            sourceIp: event.eventMetadata?.sourceIp,
            regulatoryFlags: event.eventMetadata?.regulatoryFlags || []
          });

          // Update compliance audit view
          await this.updateComplianceAudit(event);

          break;

        case 'AccountFrozen':
          await this.collections.readModels.accountSummary.updateOne(
            { _id: event.aggregateId },
            {
              $set: {
                status: 'frozen',
                frozenAt: event.eventData.frozenAt,
                frozenBy: event.eventData.frozenBy,
                frozenReason: event.eventData.reason,
                lastUpdated: event.eventTimestamp,
                version: event.version
              }
            }
          );
          break;
      }

    } catch (error) {
      console.error('Error updating read models:', error);
      // In a production system, you'd want to implement retry logic here
    }
  }

  async updateComplianceAudit(event) {
    // Create comprehensive compliance audit entries
    const auditEntry = {
      _id: new ObjectId(),
      eventId: event.eventId,
      aggregateId: event.aggregateId,
      eventType: event.eventType,
      timestamp: event.eventTimestamp,

      // Financial details
      amount: event.eventData.amount,
      currency: event.eventData.currency,

      // Regulatory information
      regulatoryFlags: event.eventMetadata?.regulatoryFlags || [],
      complianceReferences: event.eventData.complianceReference ? [event.eventData.complianceReference] : [],

      // User and system context
      performedBy: event.eventData.debitedBy || event.eventData.creditedBy,
      sourceIp: event.eventMetadata?.sourceIp,
      userAgent: event.eventMetadata?.userAgent,

      // Traceability
      correlationId: event.eventMetadata?.correlationId,
      causationId: event.eventMetadata?.causationId,

      // Classification
      riskLevel: this.calculateRiskLevel(event),
      complianceCategories: this.classifyCompliance(event),

      // Retention
      retentionDate: new Date(Date.now() + this.config.eventRetentionDays * 24 * 60 * 60 * 1000)
    };

    await this.collections.readModels.complianceAudit.insertOne(auditEntry);
  }

  async getAccountHistory(accountId, options = {}) {
    console.log(`Retrieving account history for: ${accountId}`);

    try {
      const limit = options.limit || 100;
      const fromDate = options.fromDate || new Date(0);
      const toDate = options.toDate || new Date();

      // Query events with temporal filtering
      const events = await this.collections.eventStore.aggregate([
        {
          $match: {
            aggregateId: accountId,
            eventTimestamp: { $gte: fromDate, $lte: toDate }
          }
        },
        {
          $sort: { version: options.reverse ? -1 : 1 }
        },
        {
          $limit: limit
        },
        {
          $project: {
            eventId: 1,
            eventType: 1,
            eventTimestamp: 1,
            version: 1,
            eventData: 1,
            eventMetadata: 1,

            // Add business-friendly formatting
            humanReadableType: {
              $switch: {
                branches: [
                  { case: { $eq: ['$eventType', 'AccountCreated'] }, then: 'Account Opened' },
                  { case: { $eq: ['$eventType', 'FundsDebited'] }, then: 'Funds Withdrawn' },
                  { case: { $eq: ['$eventType', 'FundsCredited'] }, then: 'Funds Deposited' },
                  { case: { $eq: ['$eventType', 'AccountFrozen'] }, then: 'Account Frozen' }
                ],
                default: '$eventType'
              }
            }
          }
        }
      ]).toArray();

      // Optionally rebuild state at each point for balance tracking
      if (options.includeBalanceHistory) {
        let runningBalance = 0;

        for (const event of events) {
          switch (event.eventType) {
            case 'AccountCreated':
              runningBalance = event.eventData.initialBalance || 0;
              break;
            case 'FundsCredited':
              runningBalance += event.eventData.amount;
              break;
            case 'FundsDebited':
              runningBalance -= (event.eventData.amount + (event.eventData.feeAmount || 0));
              break;
          }

          event.balanceAfterEvent = runningBalance;
        }
      }

      return {
        accountId: accountId,
        events: events,
        totalEvents: events.length,
        dateRange: { from: fromDate, to: toDate }
      };

    } catch (error) {
      console.error('Error retrieving account history:', error);
      throw error;
    }
  }

  async getComplianceAuditTrail(filters = {}) {
    console.log('Generating compliance audit trail...');

    try {
      const pipeline = [];

      // Match stage based on filters
      const matchStage = {};

      if (filters.accountId) {
        matchStage.aggregateId = filters.accountId;
      }

      if (filters.userId) {
        matchStage.performedBy = filters.userId;
      }

      if (filters.dateRange) {
        matchStage.timestamp = {
          $gte: filters.dateRange.start,
          $lte: filters.dateRange.end
        };
      }

      if (filters.riskLevel) {
        matchStage.riskLevel = filters.riskLevel;
      }

      if (filters.complianceCategories) {
        matchStage.complianceCategories = { $in: filters.complianceCategories };
      }

      pipeline.push({ $match: matchStage });

      // Sort by timestamp
      pipeline.push({
        $sort: { timestamp: -1 }
      });

      // Add limit if specified
      if (filters.limit) {
        pipeline.push({ $limit: filters.limit });
      }

      // Enhance with additional context
      pipeline.push({
        $lookup: {
          from: 'account_summary_view',
          localField: 'aggregateId',
          foreignField: '_id',
          as: 'accountInfo'
        }
      });

      pipeline.push({
        $addFields: {
          accountInfo: { $arrayElemAt: ['$accountInfo', 0] }
        }
      });

      const auditTrail = await this.collections.readModels.complianceAudit
        .aggregate(pipeline)
        .toArray();

      return {
        auditEntries: auditTrail,
        totalEntries: auditTrail.length,
        generatedAt: new Date(),
        filters: filters
      };

    } catch (error) {
      console.error('Error generating compliance audit trail:', error);
      throw error;
    }
  }

  // Utility methods for business logic
  calculateRegulatoryFlags(command) {
    const flags = [];

    if (command.commandType === 'TransferFunds') {
      if (command.data.amount >= 10000) {
        flags.push('large_transaction');
      }

      if (command.data.currency !== 'USD') {
        flags.push('foreign_currency');
      }

      // Add more regulatory checks as needed
    }

    return flags;
  }

  calculateRiskLevel(event) {
    if (event.eventMetadata?.regulatoryFlags?.includes('large_transaction')) {
      return 'high';
    }

    if (event.eventData.amount > 1000) {
      return 'medium';
    }

    return 'low';
  }

  classifyCompliance(event) {
    const categories = ['financial_transaction'];

    if (event.eventMetadata?.regulatoryFlags?.includes('large_transaction')) {
      categories.push('aml_monitoring');
    }

    if (event.eventMetadata?.regulatoryFlags?.includes('foreign_currency')) {
      categories.push('currency_reporting');
    }

    return categories;
  }

  async performFraudDetection(command, aggregate) {
    // Simplified fraud detection logic
    const recentTransactions = await this.collections.readModels.transactionHistory
      .countDocuments({
        accountId: command.data.fromAccountId,
        timestamp: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }
      });

    if (recentTransactions > 20) {
      return {
        valid: false,
        reason: 'Suspicious activity detected - too many transactions in 24 hours'
      };
    }

    return { valid: true };
  }
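
  // Placeholder hooks referenced by the constructor and publishEventsForProcessing; in this sketch
  // they are no-ops and are assumed to be replaced with real implementations in production.
  initializeReadModels() {
    // Read model collections are created lazily on first write; additional indexes could be added here.
  }

  startEventProcessors() {
    // Could start change stream listeners or queue consumers that keep read models up to date.
  }

  async processSagaEvents(events) {
    // Saga orchestration hook; see the EventSourcedSagaManager example in the next section.
  }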
}

module.exports = { MongoDBEventSourcedSystem };
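
As a quick illustration of the command flow, the sketch below opens two accounts and then issues a TransferFunds command. It is assumed to run alongside the implementation above (reusing the module-level client and db handles defined at the top), and the account numbers, user ids, and amounts are purely illustrative.

// Hypothetical usage of MongoDBEventSourcedSystem; identifiers and amounts are illustrative
async function demoTransfer() {
  await client.connect();
  const system = new MongoDBEventSourcedSystem(db);

  const fromAccountId = new ObjectId();
  const toAccountId = new ObjectId();

  // Open two accounts so the transfer validation finds an active destination
  await system.handleCommand({
    commandType: 'CreateAccount',
    aggregateId: fromAccountId,
    userId: 'user-123',
    data: { accountNumber: 'ACC-0001', accountHolder: 'Jane Doe', accountType: 'checking', initialBalance: 5000 }
  });

  await system.handleCommand({
    commandType: 'CreateAccount',
    aggregateId: toAccountId,
    userId: 'user-456',
    data: { accountNumber: 'ACC-0002', accountHolder: 'John Roe', accountType: 'checking', initialBalance: 0 }
  });

  // Transfer funds; this generates FundsDebited and FundsCredited events and updates the read models
  const result = await system.handleCommand({
    commandType: 'TransferFunds',
    aggregateId: fromAccountId,
    userId: 'user-123',
    correlationId: new ObjectId(),
    data: { fromAccountId, toAccountId, amount: 250, reference: 'INV-42', description: 'Invoice payment' }
  });

  console.log(`Transfer recorded: version ${result.version}, ${result.eventsGenerated} events generated`);
}

demoTransfer().catch(console.error);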

Advanced Event Sourcing Patterns and CQRS Implementation

Sophisticated Saga Orchestration and Process Management

Implement complex business processes using event-driven saga patterns:

// Advanced Saga and Process Manager for Complex Business Workflows
class EventSourcedSagaManager {
  constructor(db, eventSystem) {
    this.db = db;
    this.eventSystem = eventSystem;
    this.sagas = {
      transferSaga: db.collection('transfer_saga_state'),
      complianceSaga: db.collection('compliance_saga_state'),
      fraudSaga: db.collection('fraud_detection_saga_state')
    };

    this.setupSagaProcessors();
  }

  async handleSagaEvent(event) {
    console.log(`Processing saga event: ${event.eventType}`);

    switch (event.eventType) {
      case 'FundsDebited':
        await this.processFundTransferSaga(event);
        await this.processComplianceSaga(event);
        break;

      case 'SuspiciousActivityDetected':
        await this.processFraudInvestigationSaga(event);
        break;

      case 'ComplianceReviewRequired':
        await this.processComplianceReviewSaga(event);
        break;
    }
  }

  async processFundTransferSaga(event) {
    const sagaId = event.eventData.transferId;

    // Load or create saga state
    let sagaState = await this.sagas.transferSaga.findOne({ sagaId: sagaId });

    if (!sagaState) {
      sagaState = {
        sagaId: sagaId,
        sagaType: 'FundTransfer',
        state: 'DebitCompleted',
        fromAccountId: event.aggregateId,
        toAccountId: event.eventData.toAccountId,
        amount: event.eventData.amount,
        currency: event.eventData.currency,
        startedAt: event.eventTimestamp,

        // Saga workflow state
        steps: {
          debitCompleted: true,
          creditPending: true,
          notificationSent: false,
          auditRecorded: false
        },

        // Compensation tracking
        compensationEvents: [],

        // Timeout handling
        timeoutAt: new Date(Date.now() + 300000), // 5 minute timeout

        // Error handling
        retryCount: 0,
        maxRetries: 3,
        lastError: null
      };
    }

    // Process next saga step: dispatch the credit command once the debit has completed
    // and the credit has not been sent yet (creditPending starts as true)
    if (sagaState.state === 'DebitCompleted' && sagaState.steps.creditPending) {
      await this.sendCreditCommand(sagaState);
      sagaState.steps.creditPending = false;
      sagaState.state = 'CreditPending';
    }

    // Update saga state
    await this.sagas.transferSaga.replaceOne(
      { sagaId: sagaId },
      sagaState,
      { upsert: true }
    );
  }

  async sendCreditCommand(sagaState) {
    const creditCommand = {
      commandType: 'CreditFunds',
      aggregateId: sagaState.toAccountId,
      commandId: new ObjectId(),
      correlationId: sagaState.sagaId,

      data: {
        transferId: sagaState.sagaId,
        amount: sagaState.amount,
        currency: sagaState.currency,
        fromAccountId: sagaState.fromAccountId,
        reference: `Transfer from ${sagaState.fromAccountId}`
      },

      metadata: {
        sagaId: sagaState.sagaId,
        sagaType: sagaState.sagaType
      }
    };

    await this.eventSystem.handleCommand(creditCommand);
  }

  async processComplianceSaga(event) {
    if (event.eventData.amount >= 10000) {
      const sagaId = new ObjectId();

      const complianceSaga = {
        sagaId: sagaId,
        sagaType: 'ComplianceReview',
        state: 'ReviewRequired',
        transactionEventId: event.eventId,
        accountId: event.aggregateId,
        amount: event.eventData.amount,

        // Review workflow
        reviewSteps: {
          amlCheck: 'pending',
          riskAssessment: 'pending',
          manualReview: 'pending',
          regulatoryReporting: 'pending'
        },

        reviewAssignedTo: null,
        reviewDeadline: new Date(Date.now() + 72 * 60 * 60 * 1000), // 72 hours

        startedAt: event.eventTimestamp,
        timeoutAt: new Date(Date.now() + 7 * 24 * 60 * 60 * 1000) // 7 days
      };

      await this.sagas.complianceSaga.insertOne(complianceSaga);

      // Trigger automated compliance checks
      await this.triggerAutomatedComplianceChecks(complianceSaga);
    }
  }

  async triggerAutomatedComplianceChecks(sagaState) {
    // Trigger AML check
    const amlCommand = {
      commandType: 'PerformAMLCheck',
      aggregateId: new ObjectId(),
      commandId: new ObjectId(),

      data: {
        transactionEventId: sagaState.transactionEventId,
        accountId: sagaState.accountId,
        amount: sagaState.amount,
        sagaId: sagaState.sagaId
      }
    };

    await this.eventSystem.handleCommand(amlCommand);
  }
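
  // Placeholder hooks referenced above; assumed to be implemented for a production deployment.
  setupSagaProcessors() {
    // Could subscribe to the event store (e.g. via change streams) and route events to handleSagaEvent.
  }

  async processFraudInvestigationSaga(event) {
    // Fraud investigation workflow omitted in this sketch.
  }

  async processComplianceReviewSaga(event) {
    // Manual compliance review workflow omitted in this sketch.
  }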
}

SQL-Style Event Sourcing and CQRS with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB event sourcing and CQRS operations:

-- QueryLeaf Event Sourcing and CQRS operations with SQL-familiar syntax

-- Create event store collections with proper configuration
CREATE COLLECTION event_store 
WITH (
  storage_engine = 'wiredTiger',
  compression = 'zlib'
);

CREATE COLLECTION aggregate_snapshots
WITH (
  storage_engine = 'wiredTiger', 
  compression = 'snappy'
);

-- Query event store for complete audit trail with sophisticated filtering
WITH event_timeline AS (
  SELECT 
    event_id,
    aggregate_id,
    aggregate_type,
    event_type,
    event_timestamp,
    version,

    -- Extract key business data
    JSON_EXTRACT(event_data, '$.amount') as transaction_amount,
    JSON_EXTRACT(event_data, '$.accountNumber') as account_number,
    JSON_EXTRACT(event_data, '$.transferId') as transfer_id,
    JSON_EXTRACT(event_data, '$.fromAccountId') as from_account,
    JSON_EXTRACT(event_data, '$.toAccountId') as to_account,

    -- Extract audit context
    JSON_EXTRACT(event_metadata, '$.sourceIp') as source_ip,
    JSON_EXTRACT(event_metadata, '$.userAgent') as user_agent,
    JSON_EXTRACT(event_metadata, '$.correlationId') as correlation_id,
    JSON_EXTRACT(event_data, '$.debitedBy') as performed_by,
    JSON_EXTRACT(event_data, '$.creditedBy') as credited_by,

    -- Extract regulatory information
    JSON_EXTRACT(event_metadata, '$.regulatoryFlags') as regulatory_flags,
    JSON_EXTRACT(event_data, '$.complianceReference') as compliance_ref,

    -- Event classification
    CASE event_type
      WHEN 'AccountCreated' THEN 'account_lifecycle'
      WHEN 'FundsDebited' THEN 'financial_transaction'
      WHEN 'FundsCredited' THEN 'financial_transaction'
      WHEN 'AccountFrozen' THEN 'security_action'
      ELSE 'other'
    END as event_category,

    -- Transaction direction analysis
    CASE event_type
      WHEN 'FundsDebited' THEN 'outbound'
      WHEN 'FundsCredited' THEN 'inbound'
      ELSE 'non_transaction'
    END as transaction_direction,

    -- Risk level calculation
    CASE 
      WHEN JSON_EXTRACT(event_data, '$.amount')::DECIMAL > 50000 THEN 'high'
      WHEN JSON_EXTRACT(event_data, '$.amount')::DECIMAL > 10000 THEN 'medium'
      WHEN JSON_EXTRACT(event_data, '$.amount')::DECIMAL > 1000 THEN 'low'
      ELSE 'minimal'
    END as risk_level

  FROM event_store
  WHERE event_timestamp >= CURRENT_TIMESTAMP - INTERVAL '90 days'
),

aggregate_state_reconstruction AS (
  -- Reconstruct current state for each aggregate using window functions
  SELECT 
    aggregate_id,
    event_timestamp,
    event_type,
    version,

    -- Running balance calculation for accounts
    CASE 
      WHEN event_type = 'AccountCreated' THEN 
        JSON_EXTRACT(event_data, '$.initialBalance')::DECIMAL
      WHEN event_type = 'FundsCredited' THEN 
        JSON_EXTRACT(event_data, '$.amount')::DECIMAL
      WHEN event_type = 'FundsDebited' THEN 
        -(JSON_EXTRACT(event_data, '$.amount')::DECIMAL + COALESCE(JSON_EXTRACT(event_data, '$.feeAmount')::DECIMAL, 0))
      ELSE 0
    END as balance_change,

    SUM(CASE 
      WHEN event_type = 'AccountCreated' THEN 
        JSON_EXTRACT(event_data, '$.initialBalance')::DECIMAL
      WHEN event_type = 'FundsCredited' THEN 
        JSON_EXTRACT(event_data, '$.amount')::DECIMAL
      WHEN event_type = 'FundsDebited' THEN 
        -(JSON_EXTRACT(event_data, '$.amount')::DECIMAL + COALESCE(JSON_EXTRACT(event_data, '$.feeAmount')::DECIMAL, 0))
      ELSE 0
    END) OVER (
      PARTITION BY aggregate_id 
      ORDER BY version 
      ROWS UNBOUNDED PRECEDING
    ) as current_balance,

    -- Account status reconstruction
    LAST_VALUE(CASE 
      WHEN event_type = 'AccountCreated' THEN 'active'
      WHEN event_type = 'AccountFrozen' THEN 'frozen'
      ELSE NULL
    END IGNORE NULLS) OVER (
      PARTITION BY aggregate_id 
      ORDER BY version 
      ROWS UNBOUNDED PRECEDING
    ) as current_status,

    -- Transaction count
    COUNT(*) FILTER (WHERE event_category = 'financial_transaction') OVER (
      PARTITION BY aggregate_id 
      ORDER BY version 
      ROWS UNBOUNDED PRECEDING
    ) as transaction_count

  FROM event_timeline
),

transaction_flow_analysis AS (
  -- Analyze transaction flows between accounts
  SELECT 
    et.transfer_id,
    et.correlation_id,

    -- Debit side
    MAX(CASE WHEN et.transaction_direction = 'outbound' THEN et.aggregate_id END) as debit_account,
    MAX(CASE WHEN et.transaction_direction = 'outbound' THEN et.transaction_amount END) as debit_amount,
    MAX(CASE WHEN et.transaction_direction = 'outbound' THEN et.event_timestamp END) as debit_timestamp,

    -- Credit side  
    MAX(CASE WHEN et.transaction_direction = 'inbound' THEN et.aggregate_id END) as credit_account,
    MAX(CASE WHEN et.transaction_direction = 'inbound' THEN et.transaction_amount END) as credit_amount,
    MAX(CASE WHEN et.transaction_direction = 'inbound' THEN et.event_timestamp END) as credit_timestamp,

    -- Flow analysis
    COUNT(*) as event_count,
    COUNT(CASE WHEN et.transaction_direction = 'outbound' THEN 1 END) as debit_events,
    COUNT(CASE WHEN et.transaction_direction = 'inbound' THEN 1 END) as credit_events,

    -- Transfer completeness
    CASE 
      WHEN COUNT(CASE WHEN et.transaction_direction = 'outbound' THEN 1 END) > 0 
       AND COUNT(CASE WHEN et.transaction_direction = 'inbound' THEN 1 END) > 0 THEN 'completed'
      WHEN COUNT(CASE WHEN et.transaction_direction = 'outbound' THEN 1 END) > 0 THEN 'partial_debit'
      WHEN COUNT(CASE WHEN et.transaction_direction = 'inbound' THEN 1 END) > 0 THEN 'partial_credit'
      ELSE 'unknown'
    END as transfer_status,

    -- Risk indicators
    MAX(et.risk_level) as max_risk_level,
    ARRAY_AGG(DISTINCT JSON_EXTRACT_ARRAY(et.regulatory_flags)) as all_regulatory_flags,

    -- Timing analysis
    EXTRACT(EPOCH FROM (
      MAX(CASE WHEN et.transaction_direction = 'inbound' THEN et.event_timestamp END) - 
      MAX(CASE WHEN et.transaction_direction = 'outbound' THEN et.event_timestamp END)
    )) as transfer_duration_seconds

  FROM event_timeline et
  WHERE et.transfer_id IS NOT NULL
    AND et.event_category = 'financial_transaction'
  GROUP BY et.transfer_id, et.correlation_id
),

compliance_risk_assessment AS (
  -- Comprehensive compliance and risk analysis
  SELECT 
    et.aggregate_id as account_id,
    et.performed_by as user_id,
    et.source_ip,

    -- Volume analysis
    COUNT(*) as total_events,
    COUNT(*) FILTER (WHERE et.event_category = 'financial_transaction') as transaction_count,
    SUM(et.transaction_amount::DECIMAL) FILTER (WHERE et.transaction_direction = 'outbound') as total_debits,
    SUM(et.transaction_amount::DECIMAL) FILTER (WHERE et.transaction_direction = 'inbound') as total_credits,

    -- Risk indicators
    COUNT(*) FILTER (WHERE et.risk_level = 'high') as high_risk_transactions,
    COUNT(*) FILTER (WHERE JSON_ARRAY_LENGTH(et.regulatory_flags) > 0) as flagged_transactions,
    COUNT(DISTINCT et.source_ip) as unique_ip_addresses,

    -- Behavioral patterns
    AVG(et.transaction_amount::DECIMAL) FILTER (WHERE et.event_category = 'financial_transaction') as avg_transaction_amount,
    MAX(et.transaction_amount::DECIMAL) FILTER (WHERE et.event_category = 'financial_transaction') as max_transaction_amount,
    MIN(et.event_timestamp) as first_activity,
    MAX(et.event_timestamp) as last_activity,

    -- Compliance flags
    COUNT(*) FILTER (WHERE et.compliance_ref IS NOT NULL) as compliance_referenced_events,
    ARRAY_AGG(DISTINCT et.compliance_ref) FILTER (WHERE et.compliance_ref IS NOT NULL) as compliance_references,

    -- Geographic indicators
    COUNT(DISTINCT et.source_ip) as ip_diversity,

    -- Velocity analysis
    COUNT(*) FILTER (WHERE et.event_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours') as events_last_24h,
    COUNT(*) FILTER (WHERE et.event_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour') as events_last_hour,

    -- Overall risk score calculation
    LEAST(100, 
      COALESCE(COUNT(*) FILTER (WHERE et.risk_level = 'high') * 20, 0) +
      COALESCE(COUNT(*) FILTER (WHERE JSON_ARRAY_LENGTH(et.regulatory_flags) > 0) * 15, 0) +
      COALESCE(COUNT(DISTINCT et.source_ip) * 5, 0) +
      CASE WHEN COUNT(*) FILTER (WHERE et.event_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour') > 10 THEN 25 ELSE 0 END
    ) as calculated_risk_score

  FROM event_timeline et
  WHERE et.event_timestamp >= CURRENT_TIMESTAMP - INTERVAL '30 days'
  GROUP BY et.aggregate_id, et.performed_by, et.source_ip
)

-- Comprehensive event sourcing analysis dashboard
SELECT 
  'Event Sourcing System Analysis' as report_type,
  CURRENT_TIMESTAMP as generated_at,

  -- System overview
  JSON_OBJECT(
    'total_events', (SELECT COUNT(*) FROM event_timeline),
    'total_aggregates', (SELECT COUNT(DISTINCT aggregate_id) FROM event_timeline),
    'event_types', (SELECT JSON_OBJECT_AGG(event_type, type_count) FROM (
      SELECT event_type, COUNT(*) as type_count 
      FROM event_timeline 
      GROUP BY event_type
    ) type_stats),
    'date_range', JSON_OBJECT(
      'earliest_event', (SELECT MIN(event_timestamp) FROM event_timeline),
      'latest_event', (SELECT MAX(event_timestamp) FROM event_timeline)
    )
  ) as system_overview,

  -- Account state summary
  JSON_OBJECT(
    'total_accounts', (SELECT COUNT(DISTINCT aggregate_id) FROM aggregate_state_reconstruction),
    'active_accounts', (SELECT COUNT(DISTINCT aggregate_id) FROM aggregate_state_reconstruction WHERE current_status = 'active'),
    'frozen_accounts', (SELECT COUNT(DISTINCT aggregate_id) FROM aggregate_state_reconstruction WHERE current_status = 'frozen'),
    'total_balance', (SELECT SUM(final_balance) FROM (
      SELECT DISTINCT aggregate_id, LAST_VALUE(current_balance) OVER (PARTITION BY aggregate_id ORDER BY version ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as final_balance
      FROM aggregate_state_reconstruction
    ) final_balances),
    'avg_transactions_per_account', (SELECT AVG(final_transaction_count) FROM (
      SELECT DISTINCT aggregate_id, LAST_VALUE(transaction_count) OVER (PARTITION BY aggregate_id ORDER BY version ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as final_transaction_count
      FROM aggregate_state_reconstruction
    ) txn_counts)
  ) as account_summary,

  -- Transaction flow analysis
  JSON_OBJECT(
    'total_transfers', (SELECT COUNT(*) FROM transaction_flow_analysis),
    'completed_transfers', (SELECT COUNT(*) FROM transaction_flow_analysis WHERE transfer_status = 'completed'),
    'partial_transfers', (SELECT COUNT(*) FROM transaction_flow_analysis WHERE transfer_status LIKE 'partial_%'),
    'avg_transfer_duration_seconds', (
      SELECT AVG(transfer_duration_seconds) 
      FROM transaction_flow_analysis 
      WHERE transfer_status = 'completed' AND transfer_duration_seconds > 0
    ),
    'high_risk_transfers', (SELECT COUNT(*) FROM transaction_flow_analysis WHERE max_risk_level = 'high'),
    'flagged_transfers', (SELECT COUNT(*) FROM transaction_flow_analysis WHERE ARRAY_LENGTH(all_regulatory_flags, 1) > 0)
  ) as transfer_analysis,

  -- Compliance and risk insights
  JSON_OBJECT(
    'high_risk_accounts', (SELECT COUNT(*) FROM compliance_risk_assessment WHERE calculated_risk_score > 50),
    'accounts_with_compliance_flags', (SELECT COUNT(*) FROM compliance_risk_assessment WHERE compliance_referenced_events > 0),
    'high_velocity_accounts', (SELECT COUNT(*) FROM compliance_risk_assessment WHERE events_last_hour > 5),
    'multi_ip_accounts', (SELECT COUNT(*) FROM compliance_risk_assessment WHERE ip_diversity > 3),
    'avg_risk_score', (SELECT AVG(calculated_risk_score) FROM compliance_risk_assessment)
  ) as compliance_insights,

  -- Event sourcing health metrics
  JSON_OBJECT(
    'events_per_day_avg', (
      SELECT AVG(daily_count) FROM (
        SELECT DATE_TRUNC('day', event_timestamp) as event_date, COUNT(*) as daily_count
        FROM event_timeline
        GROUP BY DATE_TRUNC('day', event_timestamp)
      ) daily_stats
    ),
    'largest_aggregate_event_count', (
      SELECT MAX(aggregate_event_count) FROM (
        SELECT aggregate_id, COUNT(*) as aggregate_event_count
        FROM event_timeline
        GROUP BY aggregate_id
      ) aggregate_stats
    ),
    'event_store_efficiency', 
      CASE 
        WHEN (SELECT COUNT(*) FROM event_timeline) > 1000000 THEN 'high_volume'
        WHEN (SELECT COUNT(*) FROM event_timeline) > 100000 THEN 'medium_volume'
        ELSE 'low_volume'
      END
  ) as system_health,

  -- Top risk accounts (limit to top 10)
  (SELECT JSON_AGG(
    JSON_OBJECT(
      'account_id', account_id,
      'risk_score', calculated_risk_score,
      'transaction_count', transaction_count,
      'total_volume', ROUND((total_debits + total_credits)::NUMERIC, 2),
      'ip_diversity', ip_diversity,
      'last_activity', last_activity
    )
  ) FROM (
    SELECT * FROM compliance_risk_assessment 
    WHERE calculated_risk_score > 30
    ORDER BY calculated_risk_score DESC 
    LIMIT 10
  ) top_risks) as high_risk_accounts,

  -- System recommendations
  ARRAY[
    CASE WHEN (SELECT COUNT(*) FROM transaction_flow_analysis WHERE transfer_status LIKE 'partial_%') > 0
         THEN 'Investigate partial transfers - possible saga failures or timeout issues' END,
    CASE WHEN (SELECT AVG(calculated_risk_score) FROM compliance_risk_assessment) > 25
         THEN 'Overall risk score elevated - review compliance monitoring thresholds' END,
    CASE WHEN (SELECT COUNT(*) FROM compliance_risk_assessment WHERE events_last_hour > 10) > 0
         THEN 'High-velocity accounts detected - review rate limiting and fraud detection' END,
    CASE WHEN (SELECT MAX(transfer_duration_seconds) FROM transaction_flow_analysis WHERE transfer_status = 'completed') > 300
         THEN 'Some transfers taking over 5 minutes - investigate saga timeout configurations' END
  ]::TEXT[] as recommendations;

-- Event store maintenance and optimization queries
WITH event_store_stats AS (
  SELECT 
    aggregate_type,
    event_type,
    COUNT(*) as event_count,
    MIN(event_timestamp) as earliest_event,
    MAX(event_timestamp) as latest_event,
    AVG(LENGTH(JSON_UNPARSE(event_data))) as avg_event_size,
    COUNT(DISTINCT aggregate_id) as unique_aggregates,
    COUNT(*)::DECIMAL / COUNT(DISTINCT aggregate_id) as avg_events_per_aggregate
  FROM event_store
  WHERE event_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days'
  GROUP BY aggregate_type, event_type
),

snapshot_opportunities AS (
  SELECT 
    aggregate_id,
    COUNT(*) as event_count,
    MAX(version) as latest_version,
    MIN(event_timestamp) as first_event,
    MAX(event_timestamp) as last_event,

    -- Calculate if snapshot would be beneficial
    CASE 
      WHEN COUNT(*) > 100 THEN 'high_priority'
      WHEN COUNT(*) > 50 THEN 'medium_priority'
      WHEN COUNT(*) > 20 THEN 'low_priority'
      ELSE 'not_needed'
    END as snapshot_priority,

    -- Estimate performance improvement
    ROUND((1 - 1.0 / COUNT(*)) * 100, 0) as estimated_replay_time_reduction_percent

  FROM event_store
  WHERE aggregate_id NOT IN (
    SELECT DISTINCT aggregate_id FROM aggregate_snapshots
  )
  GROUP BY aggregate_id
  HAVING COUNT(*) > 20
)

SELECT 
  'Event Store Optimization Report' as report_type,

  -- Performance statistics
  JSON_OBJECT(
    'total_event_types', (SELECT COUNT(DISTINCT event_type) FROM event_store_stats),
    'avg_event_size_bytes', (SELECT ROUND(AVG(avg_event_size)::NUMERIC, 0) FROM event_store_stats),
    'largest_aggregate_event_count', (SELECT MAX(event_count) FROM (
      SELECT aggregate_id, COUNT(*) as event_count FROM event_store GROUP BY aggregate_id
    ) agg_counts),
    'events_per_aggregate_avg', (SELECT AVG(avg_events_per_aggregate) FROM event_store_stats)
  ) as performance_stats,

  -- Snapshot recommendations
  JSON_OBJECT(
    'aggregates_needing_snapshots', (SELECT COUNT(*) FROM snapshot_opportunities),
    'high_priority_snapshots', (SELECT COUNT(*) FROM snapshot_opportunities WHERE snapshot_priority = 'high_priority'),
    'total_events_in_non_snapshotted_aggregates', (SELECT SUM(event_count) FROM snapshot_opportunities),
    'estimated_total_performance_improvement', (
      SELECT CONCAT(AVG(estimated_replay_time_reduction_percent), '%') 
      FROM snapshot_opportunities 
      WHERE snapshot_priority IN ('high_priority', 'medium_priority')
    )
  ) as snapshot_recommendations,

  -- Storage optimization insights
  CASE 
    WHEN (SELECT AVG(avg_event_size) FROM event_store_stats) > 10240 THEN 'Consider event data compression'
    WHEN (SELECT COUNT(*) FROM event_store WHERE event_timestamp < CURRENT_TIMESTAMP - INTERVAL '1 year') > 100000 THEN 'Consider archiving old events'
    WHEN (SELECT COUNT(*) FROM snapshot_opportunities WHERE snapshot_priority = 'high_priority') > 10 THEN 'Immediate snapshot creation recommended'
    ELSE 'Event store operating within optimal parameters'
  END as primary_recommendation;

-- Real-time event sourcing monitoring
CREATE VIEW event_sourcing_health_monitor AS
WITH real_time_metrics AS (
  SELECT 
    CURRENT_TIMESTAMP as monitor_timestamp,

    -- Recent event activity (last 5 minutes)
    (SELECT COUNT(*) FROM event_store 
     WHERE stored_at >= CURRENT_TIMESTAMP - INTERVAL '5 minutes') as recent_events,

    -- Command processing rate
    (SELECT COUNT(*) FROM event_store 
     WHERE stored_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute') as events_last_minute,

    -- Aggregate activity
    (SELECT COUNT(DISTINCT aggregate_id) FROM event_store 
     WHERE stored_at >= CURRENT_TIMESTAMP - INTERVAL '5 minutes') as active_aggregates,

    -- System load indicators
    (SELECT COUNT(*) FROM event_store 
     WHERE event_type = 'FundsDebited' 
     AND stored_at >= CURRENT_TIMESTAMP - INTERVAL '5 minutes') as financial_events_recent,

    -- Error indicators (assuming error events are stored)
    (SELECT COUNT(*) FROM event_store 
     WHERE event_type LIKE '%Error%' 
     AND stored_at >= CURRENT_TIMESTAMP - INTERVAL '5 minutes') as recent_errors,

    -- Saga health (pending transfers)
    (SELECT COUNT(*) FROM (
      SELECT JSON_EXTRACT(event_data, '$.transferId') AS transfer_id
      FROM event_store 
      WHERE event_type = 'FundsDebited' 
      AND stored_at >= CURRENT_TIMESTAMP - INTERVAL '10 minutes'
      AND JSON_EXTRACT(event_data, '$.transferId') NOT IN (
        SELECT JSON_EXTRACT(event_data, '$.transferId')
        FROM event_store 
        WHERE event_type = 'FundsCredited'
        AND stored_at >= CURRENT_TIMESTAMP - INTERVAL '10 minutes'
      )
    ) pending) as pending_transfers
)

SELECT 
  monitor_timestamp,
  recent_events,
  events_last_minute,
  active_aggregates,
  financial_events_recent,
  recent_errors,
  pending_transfers,

  -- System health indicators
  CASE 
    WHEN events_last_minute > 100 THEN 'high_throughput'
    WHEN events_last_minute > 20 THEN 'normal_throughput'
    WHEN events_last_minute > 0 THEN 'low_throughput'
    ELSE 'idle'
  END as throughput_status,

  CASE 
    WHEN recent_errors > 0 THEN 'error_detected'
    WHEN pending_transfers > 10 THEN 'saga_backlog'
    WHEN events_last_minute = 0 AND EXTRACT(HOUR FROM CURRENT_TIMESTAMP) BETWEEN 9 AND 17 THEN 'potentially_idle'
    ELSE 'healthy'
  END as system_health,

  -- Performance indicators
  ROUND(events_last_minute / 60.0, 2) as events_per_second,
  ROUND(financial_events_recent / GREATEST(recent_events, 1) * 100, 1) as financial_event_percentage

FROM real_time_metrics;

-- QueryLeaf provides comprehensive MongoDB event sourcing capabilities:
-- 1. Complete event store management with SQL-familiar syntax  
-- 2. Advanced aggregate reconstruction using window functions and temporal queries
-- 3. Sophisticated audit trail analysis with compliance reporting
-- 4. Complex business process tracking through correlation and saga patterns
-- 5. Real-time monitoring and health assessment capabilities
-- 6. Performance optimization insights including snapshot recommendations
-- 7. Risk assessment and fraud detection through event pattern analysis
-- 8. Regulatory compliance support with comprehensive audit capabilities
-- 9. Integration with MongoDB's native performance and indexing capabilities
-- 10. Familiar SQL patterns for complex event sourcing operations and CQRS implementations

Best Practices for Production Event Sourcing and CQRS

Event Store Design and Performance Optimization

Essential principles for scalable MongoDB event sourcing implementations:

  1. Event Design: Create immutable, self-contained events with complete business context and metadata
  2. Indexing Strategy: Implement comprehensive indexing for aggregate lookup, temporal queries, and compliance auditing (see the index sketch after this list)
  3. Snapshot Management: Design efficient snapshot strategies to optimize aggregate reconstruction performance
  4. Schema Evolution: Plan for event schema versioning and backward compatibility as business rules evolve
  5. Security Integration: Implement encryption, access controls, and audit logging for sensitive event data
  6. Performance Monitoring: Deploy comprehensive metrics for event throughput, aggregate health, and saga performance
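
A minimal indexing sketch for the principles above, using the Node.js driver. The database name, index names, and the helper function itself are illustrative assumptions; field paths follow the JavaScript event examples earlier in this guide.

// Index setup sketch for the event store and snapshot collections
const { MongoClient } = require('mongodb');

async function createEventSourcingIndexes(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db('banking');

  // Aggregate lookup plus optimistic-concurrency ordering by version
  await db.collection('event_store').createIndex(
    { aggregateId: 1, version: 1 },
    { unique: true, name: 'aggregate_version_idx' }
  );

  // Temporal queries for audit trails and compliance reporting
  await db.collection('event_store').createIndex(
    { eventTimestamp: 1, eventType: 1 },
    { name: 'temporal_event_idx' }
  );

  // Correlation lookups for saga and transfer tracking
  await db.collection('event_store').createIndex(
    { 'eventMetadata.correlationId': 1 },
    { name: 'correlation_idx', sparse: true }
  );

  // Latest snapshot per aggregate
  await db.collection('aggregate_snapshots').createIndex(
    { aggregateId: 1, version: -1 },
    { name: 'snapshot_lookup_idx' }
  );

  await client.close();
}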

CQRS Read Model Optimization

Design efficient read models for complex query requirements:

  1. Read Model Strategy: Create specialized read models optimized for specific query patterns and user interfaces
  2. Eventual Consistency: Implement robust event processing for read model updates with proper error handling (see the change stream sketch after this list)
  3. Caching Integration: Add intelligent caching layers for frequently accessed read model data
  4. Analytics Support: Design read models that support business intelligence and regulatory reporting requirements
  5. Scalability Planning: Plan read model distribution and replication for high-availability query processing
  6. Business Intelligence: Integrate with analytics tools for comprehensive business insights from event data
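
A minimal sketch of point 2 above: a projector tails the event store with a MongoDB change stream and folds new events into a denormalized read model. Change streams require a replica set or sharded cluster, and the checkpoint collection, read-model collection, and field names here are illustrative assumptions.

// Read model projector sketch driven by a change stream
async function runAccountSummaryProjector(db) {
  const checkpoint = await db.collection('projector_checkpoints')
    .findOne({ _id: 'account_summary' });

  const watchOptions = checkpoint ? { resumeAfter: checkpoint.resumeToken } : {};
  const changeStream = db.collection('event_store').watch([], watchOptions);

  for await (const change of changeStream) {
    if (change.operationType !== 'insert') continue;
    const event = change.fullDocument;

    if (event.eventType === 'FundsCredited' || event.eventType === 'FundsDebited') {
      const delta = event.eventType === 'FundsCredited'
        ? event.eventData.amount
        : -event.eventData.amount;

      // Fold the event into the denormalized account summary
      await db.collection('account_summaries').updateOne(
        { accountId: event.aggregateId },
        {
          $inc: { balance: delta, transactionCount: 1 },
          $set: { lastEventAt: event.eventTimestamp }
        },
        { upsert: true }
      );
    }

    // Persist the resume token so the projector can restart without missing events
    await db.collection('projector_checkpoints').updateOne(
      { _id: 'account_summary' },
      { $set: { resumeToken: change._id, updatedAt: new Date() } },
      { upsert: true }
    );
  }
}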

Conclusion

MongoDB Event Sourcing and CQRS provide enterprise-grade architectural patterns that enable building resilient, auditable, and scalable distributed systems with complete business context preservation and sophisticated query capabilities. The combination of immutable event storage, aggregate reconstruction, and command-query separation creates robust systems that naturally support complex business processes, regulatory compliance, and distributed system consistency requirements.

Key MongoDB Event Sourcing advantages include:

  • Complete Audit Trails: Immutable event storage provides comprehensive business history and regulatory compliance support
  • Distributed Consistency: Event-driven architecture enables eventual consistency across microservices boundaries
  • Business Logic Preservation: Events capture complete business context and decision-making information
  • Performance Optimization: Specialized read models and aggregate snapshots provide efficient query processing
  • Scalability Support: Independent scaling of command and query processing with MongoDB's distributed capabilities
  • SQL Accessibility: Familiar SQL-style operations through QueryLeaf for event sourcing and CQRS management

Whether you're building financial systems, e-commerce platforms, compliance-sensitive applications, or complex distributed architectures requiring complete auditability, MongoDB Event Sourcing with QueryLeaf's SQL interface provides the foundation for reliable, scalable, and maintainable systems that preserve business context while enabling sophisticated query and analysis capabilities.

QueryLeaf Integration: QueryLeaf automatically translates SQL-style event sourcing and CQRS operations into MongoDB's native aggregation pipelines and indexing strategies, making advanced event-driven architectures accessible to SQL-oriented development teams. Complex event replay scenarios, aggregate reconstruction, compliance reporting, and read model management are seamlessly handled through familiar SQL constructs, enabling sophisticated distributed system patterns without requiring deep MongoDB event sourcing expertise.

The combination of MongoDB's powerful event storage and aggregation capabilities with SQL-familiar event sourcing operations creates an ideal platform for applications requiring both sophisticated audit capabilities and familiar database interaction patterns, ensuring your event-driven systems can evolve and scale efficiently while maintaining complete business context and regulatory compliance.

MongoDB GridFS Advanced File Streaming and Compression: High-Performance Large File Management and Optimization

Modern applications require efficient handling of large files, from media assets and document repositories to backup systems and content delivery networks. Traditional file storage approaches struggle with distributed architectures, automatic failover, and efficient streaming, especially when dealing with multi-gigabyte files or high-throughput workloads.

MongoDB GridFS provides advanced file storage capabilities that integrate seamlessly with your database infrastructure, offering automatic sharding, compression, streaming, and distributed replication. Unlike traditional file systems that require separate infrastructure and complex synchronization mechanisms, GridFS stores files as documents with built-in metadata, versioning, and query capabilities.

The Large File Storage Challenge

Traditional file storage approaches have significant limitations for modern distributed applications:

// Traditional file system approach - limited scalability and integration
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

class TraditionalFileStorage {
  constructor(baseDirectory) {
    this.baseDirectory = baseDirectory;
    this.metadata = new Map(); // In-memory metadata - lost on restart
  }

  async uploadFile(filename, fileStream, metadata = {}) {
    try {
      const filePath = path.join(this.baseDirectory, filename);
      const writeStream = fs.createWriteStream(filePath);

      // No built-in compression
      fileStream.pipe(writeStream);

      await new Promise((resolve, reject) => {
        writeStream.on('finish', resolve);
        writeStream.on('error', reject);
      });

      // Manual metadata management
      const stats = fs.statSync(filePath);
      this.metadata.set(filename, {
        size: stats.size,
        uploadDate: new Date(),
        contentType: metadata.contentType || 'application/octet-stream',
        ...metadata
      });

      return { success: true, filename, size: stats.size };

    } catch (error) {
      console.error('Upload failed:', error);
      return { success: false, error: error.message };
    }
  }

  async downloadFile(filename) {
    try {
      const filePath = path.join(this.baseDirectory, filename);

      // No streaming optimization
      if (!fs.existsSync(filePath)) {
        throw new Error('File not found');
      }

      const readStream = fs.createReadStream(filePath);
      const metadata = this.metadata.get(filename) || {};

      return {
        success: true,
        stream: readStream,
        metadata: metadata
      };

    } catch (error) {
      console.error('Download failed:', error);
      return { success: false, error: error.message };
    }
  }

  async getFileMetadata(filename) {
    // Limited metadata capabilities
    return this.metadata.get(filename) || null;
  }

  async deleteFile(filename) {
    try {
      const filePath = path.join(this.baseDirectory, filename);
      fs.unlinkSync(filePath);
      this.metadata.delete(filename);

      return { success: true };
    } catch (error) {
      return { success: false, error: error.message };
    }
  }

  // Problems with traditional file storage:
  // 1. No automatic replication or high availability
  // 2. No built-in compression or optimization
  // 3. Limited metadata and search capabilities  
  // 4. No streaming optimization for large files
  // 5. Manual synchronization across distributed systems
  // 6. No versioning or audit trail capabilities
  // 7. Limited concurrent access and locking mechanisms
  // 8. No integration with database transactions
  // 9. Complex backup and recovery procedures
  // 10. No automatic sharding for very large files
}

MongoDB GridFS eliminates these limitations with comprehensive file storage features:

// MongoDB GridFS - comprehensive file storage with advanced features
const { MongoClient, GridFSBucket, ObjectId } = require('mongodb');
const { Transform, PassThrough } = require('stream');
const zlib = require('zlib');
const sharp = require('sharp'); // For image processing
const ffmpeg = require('fluent-ffmpeg'); // For video processing

class AdvancedGridFSManager {
  constructor(db) {
    this.db = db;
    this.bucket = new GridFSBucket(db, {
      bucketName: 'advanced_files',
      chunkSizeBytes: 1048576 * 4 // 4MB chunks for optimal performance
    });

    // Specialized buckets for different file types
    this.imageBucket = new GridFSBucket(db, { 
      bucketName: 'images',
      chunkSizeBytes: 1048576 * 2 // 2MB for images
    });

    this.videoBucket = new GridFSBucket(db, { 
      bucketName: 'videos',
      chunkSizeBytes: 1048576 * 8 // 8MB for videos
    });

    this.documentBucket = new GridFSBucket(db, { 
      bucketName: 'documents',
      chunkSizeBytes: 1048576 * 1 // 1MB for documents
    });

    // Performance monitoring
    this.metrics = {
      uploads: { count: 0, totalBytes: 0, totalTime: 0 },
      downloads: { count: 0, totalBytes: 0, totalTime: 0 },
      compressionRatios: []
    };
  }

  async uploadFileWithAdvancedFeatures(fileStream, metadata = {}) {
    const startTime = Date.now();

    try {
      // Determine optimal bucket and processing pipeline
      const fileType = this.detectFileType(metadata.contentType);
      const bucket = this.selectOptimalBucket(fileType);

      // Generate unique filename with collision prevention
      const filename = this.generateUniqueFilename(metadata.originalName || 'file');

      // Create processing pipeline based on file type
      const { processedStream, finalMetadata } = await this.createProcessingPipeline(
        fileStream, fileType, metadata
      );

      // Advanced upload with compression and optimization
      const uploadResult = await this.performAdvancedUpload(
        processedStream, filename, finalMetadata, bucket
      );

      // Record performance metrics
      const duration = Date.now() - startTime;
      this.updateMetrics('upload', uploadResult.size, duration);

      // Create file registry entry for advanced querying
      await this.createFileRegistryEntry(uploadResult, finalMetadata);

      return {
        success: true,
        fileId: uploadResult._id,
        filename: filename,
        size: uploadResult.size,
        processingTime: duration,
        compressionRatio: finalMetadata.compressionRatio || 1.0,
        optimizations: finalMetadata.optimizations || []
      };

    } catch (error) {
      console.error('Advanced upload failed:', error);
      return {
        success: false,
        error: error.message,
        processingTime: Date.now() - startTime
      };
    }
  }

  detectFileType(contentType) {
    if (!contentType) return 'unknown';

    if (contentType.startsWith('image/')) return 'image';
    if (contentType.startsWith('video/')) return 'video';
    if (contentType.startsWith('audio/')) return 'audio';
    if (contentType.includes('pdf')) return 'document';
    if (contentType.includes('text/')) return 'text';
    if (contentType.includes('application/json')) return 'data';
    if (contentType.includes('zip') || contentType.includes('gzip')) return 'archive';

    return 'binary';
  }

  selectOptimalBucket(fileType) {
    switch (fileType) {
      case 'image': return this.imageBucket;
      case 'video': 
      case 'audio': return this.videoBucket;
      case 'document':
      case 'text': return this.documentBucket;
      default: return this.bucket;
    }
  }

  generateUniqueFilename(originalName) {
    const timestamp = Date.now();
    const random = Math.random().toString(36).substring(2, 15);
    const extension = originalName.includes('.') ? 
      originalName.split('.').pop() : '';

    return `${timestamp}_${random}${extension ? '.' + extension : ''}`;
  }

  async createProcessingPipeline(inputStream, fileType, metadata) {
    const transforms = [];
    const finalMetadata = { ...metadata };
    let compressionRatio = 1.0;

    // Add compression based on file type
    if (this.shouldCompress(fileType, metadata)) {
      const compressionLevel = this.getOptimalCompressionLevel(fileType);

      const compressionTransform = this.createCompressionTransform(
        fileType, compressionLevel
      );

      transforms.push(compressionTransform);

      finalMetadata.compressed = true;
      finalMetadata.compressionLevel = compressionLevel;
      finalMetadata.originalContentType = metadata.contentType;
    }

    // Add file type specific optimizations
    if (fileType === 'image' && metadata.enableImageOptimization !== false) {
      const imageTransform = await this.createImageOptimizationTransform(metadata);
      transforms.push(imageTransform);
      finalMetadata.optimizations = ['image_optimization'];
    }

    // Add encryption if required
    if (metadata.encrypt === true) {
      const encryptionTransform = this.createEncryptionTransform(metadata.encryptionKey);
      transforms.push(encryptionTransform);
      finalMetadata.encrypted = true;
    }

    // Add integrity checking
    const integrityTransform = this.createIntegrityTransform();
    transforms.push(integrityTransform);

    // Create processing pipeline
    let processedStream = inputStream;

    for (const transform of transforms) {
      processedStream = processedStream.pipe(transform);
    }

    // Add size tracking
    const sizeTracker = this.createSizeTrackingTransform();
    processedStream = processedStream.pipe(sizeTracker);

    // Calculate compression ratio after processing
    finalMetadata.compressionRatio = compressionRatio;

    return { 
      processedStream, 
      finalMetadata: {
        ...finalMetadata,
        processingTimestamp: new Date(),
        pipeline: transforms.map(t => t.constructor.name)
      }
    };
  }

  shouldCompress(fileType, metadata) {
    // Don't compress already compressed formats
    const skipCompression = ['image/jpeg', 'image/png', 'video/', 'audio/', 'zip', 'gzip'];
    if (skipCompression.some(type => metadata.contentType?.includes(type))) {
      return false;
    }

    // Always compress text and data files
    return ['text', 'document', 'data', 'binary'].includes(fileType);
  }

  getOptimalCompressionLevel(fileType) {
    const compressionLevels = {
      'text': 9,      // Maximum compression for text
      'document': 7,  // High compression for documents
      'data': 8,      // High compression for data files
      'binary': 6     // Moderate compression for binary
    };

    return compressionLevels[fileType] || 6;
  }

  createCompressionTransform(fileType, level) {
    // Use gzip compression with optimal settings
    return zlib.createGzip({
      level: level,
      windowBits: 15,
      memLevel: 8,
      strategy: fileType === 'text' ? zlib.constants.Z_FILTERED : zlib.constants.Z_DEFAULT_STRATEGY
    });
  }

  async createImageOptimizationTransform(metadata) {
    const quality = metadata.imageQuality || 85;

    // sharp uses the last format call as the output format, so select the encoder
    // that matches the source content type instead of chaining all of them
    const pipeline = sharp();

    if (metadata.contentType === 'image/png') {
      return pipeline.png({ compressionLevel: 9, adaptiveFiltering: true });
    }

    if (metadata.contentType === 'image/webp') {
      return pipeline.webp({ quality: quality, effort: 6 });
    }

    // Default to progressive JPEG optimization
    return pipeline.jpeg({ quality: quality, progressive: true });
  }

  createEncryptionTransform(encryptionKey) {
    const crypto = require('crypto');
    const algorithm = 'aes-256-ctr';
    // Derive a 32-byte key; the salt here is illustrative and should be configurable
    const key = crypto.scryptSync(encryptionKey, 'gridfs-file-salt', 32);
    const iv = crypto.randomBytes(16);
    const cipher = crypto.createCipheriv(algorithm, key, iv);
    let ivWritten = false;

    return new Transform({
      transform(chunk, encoding, callback) {
        try {
          const encrypted = cipher.update(chunk);
          if (!ivWritten) {
            ivWritten = true;
            // Prepend the IV so the download pipeline can reconstruct the decipher
            callback(null, Buffer.concat([iv, encrypted]));
          } else {
            callback(null, encrypted);
          }
        } catch (error) {
          callback(error);
        }
      },
      flush(callback) {
        try {
          callback(null, cipher.final());
        } catch (error) {
          callback(error);
        }
      }
    });
  }

  createIntegrityTransform() {
    const crypto = require('crypto');
    const hash = crypto.createHash('sha256');

    return new Transform({
      transform(chunk, encoding, callback) {
        hash.update(chunk);
        callback(null, chunk); // Pass through while calculating hash
      },
      flush(callback) {
        this.fileHash = hash.digest('hex');
        callback();
      }
    });
  }

  createSizeTrackingTransform() {
    let totalSize = 0;

    return new Transform({
      transform(chunk, encoding, callback) {
        totalSize += chunk.length;
        callback(null, chunk);
      },
      flush(callback) {
        this.totalSize = totalSize;
        callback();
      }
    });
  }

  async performAdvancedUpload(processedStream, filename, metadata, bucket) {
    return new Promise((resolve, reject) => {
      const uploadStream = bucket.openUploadStream(filename, {
        metadata: {
          ...metadata,
          uploadedAt: new Date(),
          processingVersion: '2.0',

          // Add searchable tags
          tags: this.generateSearchTags(metadata),

          // Add file categorization
          category: this.categorizeFile(metadata),

          // Add retention policy
          retentionPolicy: metadata.retentionDays || 365,
          expirationDate: metadata.retentionDays ? 
            new Date(Date.now() + metadata.retentionDays * 24 * 60 * 60 * 1000) : null
        }
      });

      uploadStream.on('error', reject);
      uploadStream.on('finish', () => {
        // Recent driver versions no longer pass the file document to 'finish';
        // read it from the write stream's gridFSFile property instead
        const file = uploadStream.gridFSFile || {};
        resolve({
          _id: file._id || uploadStream.id,
          filename: file.filename || filename,
          size: file.length != null ? file.length : uploadStream.length,
          uploadDate: file.uploadDate || new Date(),
          metadata: file.metadata
        });
      });

      // Start the upload
      processedStream.pipe(uploadStream);
    });
  }

  generateSearchTags(metadata) {
    const tags = [];

    if (metadata.contentType) {
      tags.push(metadata.contentType.split('/')[0]); // e.g., 'image' from 'image/jpeg'
      tags.push(metadata.contentType); // Full content type
    }

    if (metadata.originalName) {
      const extension = metadata.originalName.split('.').pop()?.toLowerCase();
      if (extension) tags.push(extension);
    }

    if (metadata.category) tags.push(metadata.category);
    if (metadata.compressed) tags.push('compressed');
    if (metadata.encrypted) tags.push('encrypted');
    if (metadata.optimized) tags.push('optimized');

    return tags;
  }

  categorizeFile(metadata) {
    const contentType = metadata.contentType || '';

    if (contentType.startsWith('image/')) {
      return metadata.category || 'media';
    } else if (contentType.startsWith('video/') || contentType.startsWith('audio/')) {
      return metadata.category || 'multimedia';
    } else if (contentType.includes('pdf') || contentType.includes('document')) {
      return metadata.category || 'document';
    } else if (contentType.includes('text/')) {
      return metadata.category || 'text';
    } else {
      return metadata.category || 'data';
    }
  }
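
  // getFileMetadata is referenced by downloadFileWithStreaming below but was not
  // defined in this example, so this helper is an assumption. GridFS keeps file
  // documents in "<bucketName>.files" collections, so check each bucket in turn.
  async getFileMetadata(fileId) {
    const filesCollections = [
      'advanced_files.files', 'images.files', 'videos.files', 'documents.files'
    ];

    for (const collectionName of filesCollections) {
      const fileDoc = await this.db.collection(collectionName)
        .findOne({ _id: new ObjectId(fileId) });
      if (fileDoc) return fileDoc;
    }

    return null;
  }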

  async downloadFileWithStreaming(fileId, options = {}) {
    const startTime = Date.now();

    try {
      // Get file metadata for processing decisions
      const fileMetadata = await this.getFileMetadata(fileId);

      if (!fileMetadata) {
        throw new Error(`File not found: ${fileId}`);
      }

      // Select optimal bucket
      const bucket = this.selectBucketByMetadata(fileMetadata);

      // Create download stream with range support
      const downloadOptions = this.createDownloadOptions(options, fileMetadata);
      const downloadStream = bucket.openDownloadStream(
        new ObjectId(fileId),
        downloadOptions
      );

      // Create decompression/decoding pipeline
      const { processedStream, streamMetadata } = this.createDownloadPipeline(
        downloadStream, fileMetadata, options
      );

      // Record performance metrics
      const setupTime = Date.now() - startTime;

      return {
        success: true,
        stream: processedStream,
        metadata: {
          ...fileMetadata,
          streamingOptions: streamMetadata,
          setupTime: setupTime
        }
      };

    } catch (error) {
      console.error('Streaming download failed:', error);
      return {
        success: false,
        error: error.message,
        setupTime: Date.now() - startTime
      };
    }
  }

  selectBucketByMetadata(fileMetadata) {
    const category = fileMetadata.metadata?.category;

    switch (category) {
      case 'media': return this.imageBucket;
      case 'multimedia': return this.videoBucket;
      case 'document':
      case 'text': return this.documentBucket;
      default: return this.bucket;
    }
  }

  createDownloadOptions(options, fileMetadata) {
    const downloadOptions = {};

    // Range/partial content support
    if (options.start !== undefined || options.end !== undefined) {
      downloadOptions.start = options.start || 0;
      downloadOptions.end = options.end || fileMetadata.length - 1;
    }

    return downloadOptions;
  }

  createDownloadPipeline(downloadStream, fileMetadata, options) {
    const transforms = [];
    const streamMetadata = {
      originalSize: fileMetadata.length,
      compressed: fileMetadata.metadata?.compressed || false,
      encrypted: fileMetadata.metadata?.encrypted || false
    };

    // Add decryption if file is encrypted
    if (fileMetadata.metadata?.encrypted && options.decryptionKey) {
      const decryptionTransform = this.createDecryptionTransform(options.decryptionKey);
      transforms.push(decryptionTransform);
      streamMetadata.decrypted = true;
    }

    // Add decompression if file is compressed
    if (fileMetadata.metadata?.compressed && options.decompress !== false) {
      const decompressionTransform = this.createDecompressionTransform();
      transforms.push(decompressionTransform);
      streamMetadata.decompressed = true;
    }

    // Add format conversion if requested
    if (options.convertTo && this.supportsConversion(fileMetadata, options.convertTo)) {
      const conversionTransform = this.createConversionTransform(
        fileMetadata, options.convertTo, options.conversionOptions || {}
      );
      transforms.push(conversionTransform);
      streamMetadata.converted = options.convertTo;
    }

    // Add bandwidth throttling if specified
    if (options.throttle) {
      const throttleTransform = this.createThrottleTransform(options.throttle);
      transforms.push(throttleTransform);
      streamMetadata.throttled = options.throttle;
    }

    // Build pipeline
    let processedStream = downloadStream;

    for (const transform of transforms) {
      processedStream = processedStream.pipe(transform);
    }

    return { processedStream, streamMetadata };
  }

  createDecryptionTransform(decryptionKey) {
    const crypto = require('crypto');
    const algorithm = 'aes-256-ctr';
    // Must match the key derivation used by createEncryptionTransform
    const key = crypto.scryptSync(decryptionKey, 'gridfs-file-salt', 32);
    let decipher = null;
    let buffered = Buffer.alloc(0);

    return new Transform({
      transform(chunk, encoding, callback) {
        try {
          if (!decipher) {
            // The first 16 bytes of the encrypted stream carry the IV
            buffered = Buffer.concat([buffered, chunk]);
            if (buffered.length < 16) return callback();
            const iv = buffered.subarray(0, 16);
            decipher = crypto.createDecipheriv(algorithm, key, iv);
            chunk = buffered.subarray(16);
          }
          callback(null, decipher.update(chunk));
        } catch (error) {
          callback(error);
        }
      },
      flush(callback) {
        try {
          callback(null, decipher ? decipher.final() : Buffer.alloc(0));
        } catch (error) {
          callback(error);
        }
      }
    });
  }

  createDecompressionTransform() {
    return zlib.createGunzip();
  }

  supportsConversion(fileMetadata, targetFormat) {
    const sourceType = fileMetadata.metadata?.contentType;

    if (!sourceType) return false;

    // Image conversions
    if (sourceType.startsWith('image/') && ['jpeg', 'png', 'webp'].includes(targetFormat)) {
      return true;
    }

    // Video conversions (basic)
    if (sourceType.startsWith('video/') && ['mp4', 'webm'].includes(targetFormat)) {
      return true;
    }

    return false;
  }

  createConversionTransform(fileMetadata, targetFormat, options) {
    const sourceType = fileMetadata.metadata?.contentType;

    if (sourceType?.startsWith('image/')) {
      return this.createImageConversionTransform(targetFormat, options);
    } else if (sourceType?.startsWith('video/')) {
      return this.createVideoConversionTransform(targetFormat, options);
    }

    throw new Error(`Unsupported conversion: ${sourceType} to ${targetFormat}`);
  }

  createImageConversionTransform(targetFormat, options) {
    const sharpInstance = sharp();

    switch (targetFormat) {
      case 'jpeg':
        return sharpInstance.jpeg({
          quality: options.quality || 85,
          progressive: options.progressive !== false
        });
      case 'png':
        return sharpInstance.png({
          compressionLevel: options.compressionLevel || 6
        });
      case 'webp':
        return sharpInstance.webp({
          quality: options.quality || 80,
          effort: options.effort || 4
        });
      default:
        throw new Error(`Unsupported image format: ${targetFormat}`);
    }
  }

  createVideoConversionTransform(targetFormat, options) {
    // Note: This is a simplified example. Real video conversion
    // would require more sophisticated stream handling
    const passThrough = new PassThrough();

    const command = ffmpeg()
      .input(passThrough)
      .format(targetFormat)
      .videoCodec(options.videoCodec || 'libx264')
      .audioCodec(options.audioCodec || 'aac');

    if (options.bitrate) {
      command.videoBitrate(options.bitrate);
    }

    const outputStream = new PassThrough();
    command.pipe(outputStream);

    return outputStream;
  }

  createThrottleTransform(bytesPerSecond) {
    let lastTime = Date.now();
    let bytesWritten = 0;

    return new Transform({
      transform(chunk, encoding, callback) {
        const now = Date.now();
        const elapsed = (now - lastTime) / 1000;
        bytesWritten += chunk.length;

        const expectedTime = bytesWritten / bytesPerSecond;
        const delay = Math.max(0, expectedTime - elapsed);

        setTimeout(() => {
          callback(null, chunk);
        }, delay * 1000);

        lastTime = now;
      }
    });
  }

  async createFileRegistryEntry(uploadResult, metadata) {
    // Create searchable registry for advanced file management
    const registryEntry = {
      _id: new ObjectId(),
      fileId: uploadResult._id,
      filename: uploadResult.filename,

      // File attributes
      size: uploadResult.size,
      contentType: metadata.originalContentType || metadata.contentType,
      category: metadata.category,
      tags: metadata.tags || [],

      // Upload information
      uploadDate: uploadResult.uploadDate,
      uploadedBy: metadata.uploadedBy,
      uploadSource: metadata.uploadSource || 'api',

      // Processing information
      compressed: metadata.compressed || false,
      encrypted: metadata.encrypted || false,
      optimized: metadata.optimizations?.length > 0,
      processingPipeline: metadata.pipeline || [],
      compressionRatio: metadata.compressionRatio || 1.0,

      // Lifecycle management
      retentionPolicy: metadata.retentionDays || 365,
      expirationDate: metadata.expirationDate,
      accessCount: 0,
      lastAccessed: null,

      // Search optimization
      searchableText: this.generateSearchableText(metadata),

      // Audit trail
      auditLog: [{
        action: 'uploaded',
        timestamp: new Date(),
        user: metadata.uploadedBy,
        details: {
          originalSize: metadata.originalSize,
          finalSize: uploadResult.size,
          compressionRatio: metadata.compressionRatio
        }
      }]
    };

    await this.db.collection('file_registry').insertOne(registryEntry);

    // Create indexes for efficient searching
    await this.ensureRegistryIndexes();

    return registryEntry;
  }

  generateSearchableText(metadata) {
    const searchTerms = [];

    if (metadata.originalName) {
      searchTerms.push(metadata.originalName);
    }

    if (metadata.description) {
      searchTerms.push(metadata.description);
    }

    if (metadata.tags) {
      searchTerms.push(...metadata.tags);
    }

    if (metadata.category) {
      searchTerms.push(metadata.category);
    }

    return searchTerms.join(' ').toLowerCase();
  }

  async ensureRegistryIndexes() {
    const registryCollection = this.db.collection('file_registry');

    // Create text index for searching
    await registryCollection.createIndex({
      'searchableText': 'text',
      'filename': 'text',
      'contentType': 'text'
    }, { name: 'file_search_index' });

    // Create compound indexes for common queries
    await registryCollection.createIndex({
      'category': 1,
      'uploadDate': -1
    }, { name: 'category_date_index' });

    await registryCollection.createIndex({
      'tags': 1,
      'size': -1
    }, { name: 'tags_size_index' });

    await registryCollection.createIndex({
      'expirationDate': 1
    }, { 
      name: 'expiration_index',
      expireAfterSeconds: 0 // TTL cleanup of registry entries once expirationDate passes
    });
  }

  updateMetrics(operation, bytes, duration) {
    if (operation === 'upload') {
      this.metrics.uploads.count++;
      this.metrics.uploads.totalBytes += bytes;
      this.metrics.uploads.totalTime += duration;
    } else if (operation === 'download') {
      this.metrics.downloads.count++;
      this.metrics.downloads.totalBytes += bytes;
      this.metrics.downloads.totalTime += duration;
    }
  }

  async getPerformanceMetrics() {
    const uploadStats = this.metrics.uploads;
    const downloadStats = this.metrics.downloads;

    return {
      uploads: {
        count: uploadStats.count,
        totalMB: Math.round(uploadStats.totalBytes / (1024 * 1024)),
        avgDurationMs: uploadStats.count > 0 ? uploadStats.totalTime / uploadStats.count : 0,
        throughputMBps: uploadStats.totalTime > 0 ? 
          (uploadStats.totalBytes / (1024 * 1024)) / (uploadStats.totalTime / 1000) : 0
      },
      downloads: {
        count: downloadStats.count,
        totalMB: Math.round(downloadStats.totalBytes / (1024 * 1024)),
        avgDurationMs: downloadStats.count > 0 ? downloadStats.totalTime / downloadStats.count : 0,
        throughputMBps: downloadStats.totalTime > 0 ? 
          (downloadStats.totalBytes / (1024 * 1024)) / (downloadStats.totalTime / 1000) : 0
      },
      compressionRatios: this.metrics.compressionRatios,
      averageCompressionRatio: this.metrics.compressionRatios.length > 0 ?
        this.metrics.compressionRatios.reduce((a, b) => a + b) / this.metrics.compressionRatios.length : 1.0
    };
  }
}
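
A brief usage sketch for the manager above, reusing the imports from the listing; the connection string, database name, and file path are placeholders.

// Usage sketch for AdvancedGridFSManager
const fs = require('fs');

async function uploadExample() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const manager = new AdvancedGridFSManager(client.db('media_platform'));

  const result = await manager.uploadFileWithAdvancedFeatures(
    fs.createReadStream('./reports/annual-report.pdf'),
    {
      originalName: 'annual-report.pdf',
      contentType: 'application/pdf',
      category: 'document',
      uploadedBy: 'user_123',
      retentionDays: 365
    }
  );

  console.log('Uploaded file:', result.fileId, 'compression ratio:', result.compressionRatio);
  await client.close();
}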

Advanced GridFS Streaming Patterns

Chunked Upload with Progress Tracking

Implement efficient chunked uploads for large files with real-time progress monitoring:

// Advanced chunked upload with progress tracking and resume capability
class ChunkedUploadManager {
  constructor(gridFSManager) {
    this.gridFS = gridFSManager;
    this.activeUploads = new Map();
    this.chunkSize = 1048576 * 5; // 5MB chunks
  }

  async initiateChunkedUpload(metadata) {
    const uploadId = new ObjectId();
    const uploadSession = {
      uploadId: uploadId,
      metadata: metadata,
      chunks: [],
      totalSize: metadata.totalSize || 0,
      uploadedSize: 0,
      status: 'initiated',
      createdAt: new Date(),
      lastActivity: new Date()
    };

    this.activeUploads.set(uploadId.toString(), uploadSession);

    // Store upload session in database for persistence
    await this.gridFS.db.collection('upload_sessions').insertOne({
      _id: uploadId,
      ...uploadSession,
      expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000) // 24 hour expiration
    });

    return {
      success: true,
      uploadId: uploadId,
      chunkSize: this.chunkSize,
      session: uploadSession
    };
  }

  async uploadChunk(uploadId, chunkIndex, chunkData) {
    const uploadSession = this.activeUploads.get(uploadId.toString()) ||
      await this.loadUploadSession(uploadId);

    if (!uploadSession) {
      throw new Error('Upload session not found');
    }

    try {
      // Validate chunk
      const validationResult = this.validateChunk(uploadSession, chunkIndex, chunkData);
      if (!validationResult.valid) {
        throw new Error(`Invalid chunk: ${validationResult.reason}`);
      }

      // Store chunk with metadata
      const chunkDocument = {
        _id: new ObjectId(),
        uploadId: new ObjectId(uploadId),
        chunkIndex: chunkIndex,
        size: chunkData.length,
        hash: this.calculateChunkHash(chunkData),
        data: chunkData,
        uploadedAt: new Date()
      };

      await this.gridFS.db.collection('upload_chunks').insertOne(chunkDocument);

      // Update upload session
      uploadSession.chunks[chunkIndex] = {
        chunkId: chunkDocument._id,
        size: chunkData.length,
        hash: chunkDocument.hash,
        uploadedAt: new Date()
      };

      uploadSession.uploadedSize += chunkData.length;
      uploadSession.lastActivity = new Date();

      // Calculate progress
      const progress = uploadSession.totalSize > 0 ? 
        (uploadSession.uploadedSize / uploadSession.totalSize) * 100 : 0;

      // Update session in database and memory
      await this.updateUploadSession(uploadId, uploadSession);
      this.activeUploads.set(uploadId.toString(), uploadSession);

      return {
        success: true,
        chunkIndex: chunkIndex,
        uploadedSize: uploadSession.uploadedSize,
        totalSize: uploadSession.totalSize,
        progress: Math.round(progress * 100) / 100,
        remainingChunks: this.calculateRemainingChunks(uploadSession)
      };

    } catch (error) {
      console.error(`Chunk upload failed for upload ${uploadId}, chunk ${chunkIndex}:`, error);
      throw error;
    }
  }

  validateChunk(uploadSession, chunkIndex, chunkData) {
    // Check chunk size
    if (chunkData.length > this.chunkSize) {
      return { valid: false, reason: 'Chunk too large' };
    }

    // Check for duplicate chunks
    if (uploadSession.chunks[chunkIndex]) {
      return { valid: false, reason: 'Chunk already exists' };
    }

    // Validate chunk index sequence (chunks must arrive in order)
    const expectedIndex = uploadSession.chunks.filter(c => c !== undefined).length;
    if (chunkIndex < 0 || chunkIndex > expectedIndex) {
      return { valid: false, reason: 'Invalid chunk index sequence' };
    }

    return { valid: true };
  }

  calculateChunkHash(chunkData) {
    const crypto = require('crypto');
    return crypto.createHash('sha256').update(chunkData).digest('hex');
  }

  calculateRemainingChunks(uploadSession) {
    if (!uploadSession.totalSize) return null;

    const totalChunks = Math.ceil(uploadSession.totalSize / this.chunkSize);
    const uploadedChunks = uploadSession.chunks.filter(c => c !== undefined).length;

    return totalChunks - uploadedChunks;
  }

  async finalizeChunkedUpload(uploadId) {
    const uploadSession = this.activeUploads.get(uploadId.toString()) ||
      await this.loadUploadSession(uploadId);

    if (!uploadSession) {
      throw new Error('Upload session not found');
    }

    try {
      // Validate all chunks are present
      const missingChunks = this.findMissingChunks(uploadSession);
      if (missingChunks.length > 0) {
        throw new Error(`Missing chunks: ${missingChunks.join(', ')}`);
      }

      // Create combined file stream from chunks
      const combinedStream = await this.createCombinedStream(uploadId, uploadSession);

      // Upload to GridFS using the advanced manager
      const uploadResult = await this.gridFS.uploadFileWithAdvancedFeatures(
        combinedStream, 
        {
          ...uploadSession.metadata,
          originalUploadId: uploadId,
          uploadMethod: 'chunked',
          chunkCount: uploadSession.chunks.length,
          finalizedAt: new Date()
        }
      );

      // Cleanup temporary chunks
      await this.cleanupChunkedUpload(uploadId);

      return {
        success: true,
        fileId: uploadResult.fileId,
        filename: uploadResult.filename,
        size: uploadResult.size,
        compressionRatio: uploadResult.compressionRatio,
        uploadMethod: 'chunked',
        totalChunks: uploadSession.chunks.length
      };

    } catch (error) {
      console.error(`Finalization failed for upload ${uploadId}:`, error);
      throw error;
    }
  }

  findMissingChunks(uploadSession) {
    const missingChunks = [];
    const totalChunks = Math.ceil(uploadSession.totalSize / this.chunkSize);

    for (let i = 0; i < totalChunks; i++) {
      if (!uploadSession.chunks[i]) {
        missingChunks.push(i);
      }
    }

    return missingChunks;
  }

  async createCombinedStream(uploadId, uploadSession) {
    const { Readable } = require('stream');

    // Stream the stored chunks back in index order. Assumes each document in the
    // upload_chunks collection carries its payload in a `data` field and its
    // position in a `chunkIndex` field (adjust to your chunk schema).
    const cursor = this.gridFS.db.collection('upload_chunks')
      .find({ uploadId: ObjectId(uploadId) })
      .sort({ chunkIndex: 1 });

    async function* chunkGenerator() {
      for await (const chunk of cursor) {
        // BSON Binary values expose their raw bytes via `.buffer`
        yield chunk.data && chunk.data.buffer ? chunk.data.buffer : chunk.data;
      }
    }

    return Readable.from(chunkGenerator());
  }

  async cleanupChunkedUpload(uploadId) {
    // Remove chunks from database
    await this.gridFS.db.collection('upload_chunks').deleteMany({
      uploadId: ObjectId(uploadId)
    });

    // Remove upload session
    await this.gridFS.db.collection('upload_sessions').deleteOne({
      _id: ObjectId(uploadId)
    });

    // Remove from memory
    this.activeUploads.delete(uploadId.toString());
  }

  async loadUploadSession(uploadId) {
    const session = await this.gridFS.db.collection('upload_sessions').findOne({
      _id: ObjectId(uploadId)
    });

    if (session) {
      this.activeUploads.set(uploadId.toString(), session);
      return session;
    }

    return null;
  }

  async updateUploadSession(uploadId, session) {
    await this.gridFS.db.collection('upload_sessions').updateOne(
      { _id: ObjectId(uploadId) },
      { 
        $set: {
          chunks: session.chunks,
          uploadedSize: session.uploadedSize,
          lastActivity: session.lastActivity
        }
      }
    );
  }

  async getUploadProgress(uploadId) {
    const uploadSession = this.activeUploads.get(uploadId.toString()) ||
      await this.loadUploadSession(uploadId);

    if (!uploadSession) {
      return { found: false };
    }

    const progress = uploadSession.totalSize > 0 ? 
      (uploadSession.uploadedSize / uploadSession.totalSize) * 100 : 0;

    return {
      found: true,
      uploadId: uploadId,
      progress: Math.round(progress * 100) / 100,
      uploadedSize: uploadSession.uploadedSize,
      totalSize: uploadSession.totalSize,
      uploadedChunks: uploadSession.chunks.filter(c => c !== undefined).length,
      totalChunks: uploadSession.totalSize > 0 ? 
        Math.ceil(uploadSession.totalSize / this.chunkSize) : 0,
      status: uploadSession.status,
      createdAt: uploadSession.createdAt,
      lastActivity: uploadSession.lastActivity
    };
  }
}
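For orientation, here is a hypothetical usage sketch of the chunked upload flow above. The helper class name (ChunkedUploadManager), its constructor argument, and the uploadChunk(uploadId, chunkIndex, chunkData) signature are assumptions, not definitions from this article; finalizeChunkedUpload(), getUploadProgress(), and the chunkSize property are taken from the class shown.

// Hypothetical usage sketch - names marked as assumptions in the text above
const fs = require('fs');

async function uploadFileInChunks(uploader, uploadId, filePath, chunkSize = uploader.chunkSize) {
  const data = fs.readFileSync(filePath);

  // Send chunks sequentially, matching the in-order validation in validateChunk()
  for (let offset = 0, index = 0; offset < data.length; offset += chunkSize, index++) {
    const chunk = data.slice(offset, offset + chunkSize);
    const result = await uploader.uploadChunk(uploadId, index, chunk); // assumed signature
    console.log(`Chunk ${result.chunkIndex}: ${result.progress}% complete`);
  }

  // Verify all chunks arrived, then assemble the GridFS file
  const progress = await uploader.getUploadProgress(uploadId);
  if (progress.found && progress.uploadedChunks === progress.totalChunks) {
    return uploader.finalizeChunkedUpload(uploadId);
  }
  throw new Error('Upload incomplete');
}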

SQL-Style GridFS Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB GridFS operations:

-- QueryLeaf GridFS operations with SQL-familiar syntax

-- Upload file with advanced features
INSERT INTO gridfs_files (
  filename,
  content_type,
  metadata,
  data_stream,
  compression_enabled,
  optimization_level
)
VALUES (
  'large_video.mp4',
  'video/mp4',
  JSON_BUILD_OBJECT(
    'category', 'multimedia',
    'uploadedBy', 'user_123',
    'description', 'Training video content',
    'tags', ARRAY['training', 'video', 'multimedia'],
    'retentionDays', 730,
    'enableCompression', true,
    'qualityOptimization', true
  ),
  -- Stream data (handled by QueryLeaf GridFS interface)
  @video_stream,
  true,  -- Enable compression
  'high' -- Optimization level
);

-- Query files with advanced filtering
SELECT 
  file_id,
  filename,
  content_type,
  file_size_mb,
  upload_date,
  metadata.category,
  metadata.tags,
  metadata.uploadedBy,

  -- Calculated fields
  CASE 
    WHEN file_size_mb > 100 THEN 'large'
    WHEN file_size_mb > 10 THEN 'medium'
    ELSE 'small'
  END as size_category,

  -- Compression information
  metadata.compressed as is_compressed,
  metadata.compressionRatio as compression_ratio,
  ROUND((original_size_mb - file_size_mb) / original_size_mb * 100, 1) as space_saved_percent,

  -- Access statistics
  metadata.accessCount as total_downloads,
  metadata.lastAccessed,

  -- Lifecycle information
  metadata.expirationDate,
  CASE 
    WHEN metadata.expirationDate < CURRENT_TIMESTAMP THEN 'expired'
    WHEN metadata.expirationDate < CURRENT_TIMESTAMP + INTERVAL '30 days' THEN 'expiring_soon'
    ELSE 'active'
  END as lifecycle_status

FROM gridfs_files
WHERE metadata.category = 'multimedia'
  AND upload_date >= CURRENT_TIMESTAMP - INTERVAL '90 days'
  AND file_size_mb BETWEEN 1 AND 1000
ORDER BY upload_date DESC, file_size_mb DESC
LIMIT 50;

-- Advanced file search with full-text capabilities
SELECT 
  file_id,
  filename,
  content_type,
  file_size_mb,
  metadata.description,
  metadata.tags,

  -- Search relevance scoring
  TEXTRANK() as relevance_score,

  -- File categorization
  metadata.category,

  -- Performance metrics
  metadata.compressionRatio,
  metadata.optimizations

FROM gridfs_files
WHERE TEXTSEARCH('training video multimedia')
   OR filename ILIKE '%training%'
   OR metadata.tags && ARRAY['training', 'video']
   OR metadata.description ILIKE '%training%'
ORDER BY relevance_score DESC, upload_date DESC;

-- Aggregated file statistics by category
WITH file_analytics AS (
  SELECT 
    metadata.category,
    COUNT(*) as file_count,
    SUM(file_size_mb) as total_size_mb,
    AVG(file_size_mb) as avg_size_mb,
    MIN(file_size_mb) as min_size_mb,
    MAX(file_size_mb) as max_size_mb,

    -- Compression analysis
    COUNT(*) FILTER (WHERE metadata.compressed = true) as compressed_files,
    AVG(metadata.compressionRatio) FILTER (WHERE metadata.compressed = true) as avg_compression_ratio,

    -- Access patterns
    SUM(metadata.accessCount) as total_downloads,
    AVG(metadata.accessCount) as avg_downloads_per_file,

    -- Date ranges
    MIN(upload_date) as earliest_upload,
    MAX(upload_date) as latest_upload,

    -- Content type distribution
    array_agg(DISTINCT content_type) as content_types

  FROM gridfs_files
  WHERE upload_date >= CURRENT_TIMESTAMP - INTERVAL '1 year'
  GROUP BY metadata.category
),

storage_efficiency AS (
  SELECT 
    category,
    file_count,
    total_size_mb,
    compressed_files,

    -- Storage efficiency metrics
    ROUND((compressed_files::numeric / file_count * 100), 1) as compression_rate_percent,
    ROUND(avg_compression_ratio, 2) as avg_compression_ratio,
    ROUND((total_size_mb * (1 - avg_compression_ratio)), 1) as estimated_space_saved_mb,

    -- Performance insights
    ROUND(avg_size_mb, 1) as avg_file_size_mb,
    ROUND(total_downloads::numeric / file_count, 1) as avg_downloads_per_file,

    -- Category health score
    CASE 
      WHEN compression_rate_percent > 80 AND avg_compression_ratio < 0.7 THEN 'excellent'
      WHEN compression_rate_percent > 60 AND avg_compression_ratio < 0.8 THEN 'good'
      WHEN compression_rate_percent > 40 THEN 'fair'
      ELSE 'poor'
    END as storage_efficiency_rating,

    content_types

  FROM file_analytics
)

SELECT 
  category,
  file_count,
  total_size_mb,
  avg_file_size_mb,
  compression_rate_percent,
  avg_compression_ratio,
  estimated_space_saved_mb,
  storage_efficiency_rating,
  avg_downloads_per_file,

  -- Recommendations
  CASE storage_efficiency_rating
    WHEN 'poor' THEN 'Consider enabling compression for more files in this category'
    WHEN 'fair' THEN 'Review compression settings to improve storage efficiency'
    WHEN 'good' THEN 'Storage efficiency is good, monitor for further optimization'
    ELSE 'Excellent storage efficiency - current settings are optimal'
  END as optimization_recommendation,

  ARRAY_LENGTH(content_types, 1) as content_type_variety,
  content_types

FROM storage_efficiency
ORDER BY total_size_mb DESC;

-- File lifecycle management with retention policies
WITH expiring_files AS (
  SELECT 
    file_id,
    filename,
    content_type,
    file_size_mb,
    upload_date,
    metadata.expirationDate,
    metadata.retentionDays,
    metadata.category AS category,
    metadata.uploadedBy AS uploadedBy,
    metadata.accessCount AS accessCount,
    metadata.lastAccessed AS lastAccessed,

    -- Calculate time until expiration
    metadata.expirationDate - CURRENT_TIMESTAMP as time_until_expiration,

    -- Classify expiration urgency
    CASE 
      WHEN metadata.expirationDate < CURRENT_TIMESTAMP THEN 'expired'
      WHEN metadata.expirationDate < CURRENT_TIMESTAMP + INTERVAL '7 days' THEN 'expires_this_week'
      WHEN metadata.expirationDate < CURRENT_TIMESTAMP + INTERVAL '30 days' THEN 'expires_this_month'
      WHEN metadata.expirationDate < CURRENT_TIMESTAMP + INTERVAL '90 days' THEN 'expires_this_quarter'
      ELSE 'expires_later'
    END as expiration_urgency,

    -- Calculate retention recommendation
    CASE 
      WHEN metadata.accessCount = 0 THEN 'delete_unused'
      WHEN metadata.lastAccessed < CURRENT_TIMESTAMP - INTERVAL '180 days' THEN 'archive_old'
      WHEN metadata.accessCount > 10 AND metadata.lastAccessed > CURRENT_TIMESTAMP - INTERVAL '30 days' THEN 'extend_retention'
      ELSE 'maintain_current'
    END as retention_recommendation

  FROM gridfs_files
  WHERE metadata.expirationDate IS NOT NULL
),

retention_actions AS (
  SELECT 
    expiration_urgency,
    retention_recommendation,
    COUNT(*) as file_count,
    SUM(file_size_mb) as total_size_mb,

    -- Files by category
    array_agg(DISTINCT category) as categories_affected,

    -- Size distribution
    ROUND(AVG(file_size_mb), 1) as avg_file_size_mb,
    ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY file_size_mb), 1) as median_file_size_mb,

    -- Access patterns
    ROUND(AVG(accessCount), 1) as avg_access_count,
    MIN(lastAccessed) as oldest_last_access,
    MAX(lastAccessed) as newest_last_access

  FROM expiring_files
  GROUP BY expiration_urgency, retention_recommendation
)

SELECT 
  expiration_urgency,
  retention_recommendation,
  file_count,
  total_size_mb,

  -- Action priority scoring
  CASE expiration_urgency
    WHEN 'expired' THEN 100
    WHEN 'expires_this_week' THEN 90
    WHEN 'expires_this_month' THEN 70
    WHEN 'expires_this_quarter' THEN 50
    ELSE 30
  END as action_priority_score,

  -- Recommended actions
  CASE retention_recommendation
    WHEN 'delete_unused' THEN 'DELETE - No access history, safe to remove'
    WHEN 'archive_old' THEN 'ARCHIVE - Move to cold storage or compress further'
    WHEN 'extend_retention' THEN 'EXTEND - Popular file, consider extending retention period'
    ELSE 'MONITOR - Continue with current retention policy'
  END as recommended_action,

  -- Resource impact
  CONCAT(ROUND((total_size_mb / 1024), 1), ' GB') as storage_impact,

  categories_affected,
  avg_file_size_mb,
  avg_access_count

FROM retention_actions
ORDER BY action_priority_score DESC, total_size_mb DESC;

-- Real-time file transfer monitoring
SELECT 
  transfer_id,
  operation_type, -- 'upload' or 'download'
  file_id,
  filename,

  -- Progress tracking
  bytes_transferred,
  total_bytes,
  ROUND((bytes_transferred::numeric / NULLIF(total_bytes, 0)) * 100, 1) as progress_percent,

  -- Performance metrics
  transfer_rate_mbps,
  estimated_time_remaining_seconds,

  -- Transfer details
  client_ip,
  user_agent,
  compression_enabled,
  encryption_enabled,

  -- Status and timing
  status, -- 'in_progress', 'completed', 'failed', 'paused'
  started_at,
  EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - started_at)) as duration_seconds,

  -- Quality metrics
  error_count,
  retry_count,

  CASE 
    WHEN transfer_rate_mbps > 10 THEN 'fast'
    WHEN transfer_rate_mbps > 1 THEN 'normal'
    ELSE 'slow'
  END as transfer_speed_rating

FROM active_file_transfers
WHERE status IN ('in_progress', 'paused')
  AND started_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
ORDER BY started_at DESC;

-- Performance optimization analysis
WITH transfer_performance AS (
  SELECT 
    DATE_TRUNC('hour', started_at) as time_bucket,
    operation_type,

    -- Volume metrics
    COUNT(*) as transfer_count,
    SUM(total_bytes) / (1024 * 1024 * 1024) as total_gb_transferred,

    -- Performance metrics
    AVG(transfer_rate_mbps) as avg_transfer_rate_mbps,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY transfer_rate_mbps) as p95_transfer_rate_mbps,
    MIN(transfer_rate_mbps) as min_transfer_rate_mbps,

    -- Success rates
    COUNT(*) FILTER (WHERE status = 'completed') as successful_transfers,
    COUNT(*) FILTER (WHERE status = 'failed') as failed_transfers,
    COUNT(*) FILTER (WHERE retry_count > 0) as transfers_with_retries,

    -- Timing analysis
    AVG(EXTRACT(EPOCH FROM (completed_at - started_at))) as avg_duration_seconds,
    MAX(EXTRACT(EPOCH FROM (completed_at - started_at))) as max_duration_seconds,

    -- Feature usage
    COUNT(*) FILTER (WHERE compression_enabled = true) as compressed_transfers,
    COUNT(*) FILTER (WHERE encryption_enabled = true) as encrypted_transfers

  FROM active_file_transfers
  WHERE started_at >= CURRENT_TIMESTAMP - INTERVAL '7 days'
    AND status IN ('completed', 'failed')
  GROUP BY DATE_TRUNC('hour', started_at), operation_type
)

SELECT 
  time_bucket,
  operation_type,
  transfer_count,
  total_gb_transferred,

  -- Performance indicators
  ROUND(avg_transfer_rate_mbps, 1) as avg_speed_mbps,
  ROUND(p95_transfer_rate_mbps, 1) as p95_speed_mbps,

  -- Success metrics
  ROUND((successful_transfers::numeric / transfer_count * 100), 1) as success_rate_percent,
  ROUND((transfers_with_retries::numeric / transfer_count * 100), 1) as retry_rate_percent,

  -- Duration insights
  ROUND(avg_duration_seconds / 60, 1) as avg_duration_minutes,
  ROUND(max_duration_seconds / 60, 1) as max_duration_minutes,

  -- Feature adoption
  ROUND((compressed_transfers::numeric / transfer_count * 100), 1) as compression_usage_percent,
  ROUND((encrypted_transfers::numeric / transfer_count * 100), 1) as encryption_usage_percent,

  -- Performance rating
  CASE 
    WHEN avg_transfer_rate_mbps > 50 THEN 'excellent'
    WHEN avg_transfer_rate_mbps > 20 THEN 'good'
    WHEN avg_transfer_rate_mbps > 5 THEN 'fair'
    ELSE 'poor'
  END as performance_rating,

  -- Optimization recommendations
  CASE 
    WHEN avg_transfer_rate_mbps < 5 AND compression_usage_percent < 50 THEN 'Enable compression to improve transfer speeds'
    WHEN retry_rate_percent > 20 THEN 'High retry rate indicates network issues or oversized chunks'
    WHEN success_rate_percent < 95 THEN 'Investigate transfer failures and improve error handling'
    WHEN max_duration_minutes > 60 THEN 'Consider chunked uploads for very large files'
    ELSE 'Performance within acceptable ranges'
  END as optimization_recommendation

FROM transfer_performance
ORDER BY time_bucket DESC;

-- QueryLeaf provides comprehensive GridFS capabilities:
-- 1. SQL-familiar file upload/download operations with streaming support
-- 2. Advanced compression and optimization through SQL parameters
-- 3. Full-text search capabilities for file metadata and content
-- 4. Comprehensive file analytics and storage optimization insights
-- 5. Automated lifecycle management with retention policy enforcement
-- 6. Real-time transfer monitoring and performance analysis
-- 7. Integration with MongoDB's native GridFS optimizations
-- 8. Familiar SQL patterns for complex file management operations

Best Practices for GridFS Production Deployment

Performance Optimization Strategies

Essential optimization techniques for high-throughput GridFS deployments:

  1. Chunk Size Optimization: Configure appropriate chunk sizes based on file types and access patterns (see the configuration sketch after this list)
  2. Index Strategy: Create compound indexes on file metadata for efficient queries
  3. Compression Algorithms: Choose optimal compression based on file type and performance requirements
  4. Connection Pooling: Implement appropriate connection pooling for concurrent file operations
  5. Caching Layer: Add CDN or caching layer for frequently accessed files
  6. Monitoring Setup: Implement comprehensive monitoring for file operations and storage usage
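As a minimal illustration of items 1 and 2 above, the sketch below uses the Node.js driver's GridFSBucket; the database name, bucket name, and index fields are assumptions rather than values from this article.

const { MongoClient, GridFSBucket } = require('mongodb');

async function configureMediaBucket(uri) {
  const client = await MongoClient.connect(uri);
  const db = client.db('media');

  // Item 1: larger chunks (4 MB vs the 255 KB default) mean fewer chunk documents
  // for large, sequentially streamed media files
  const bucket = new GridFSBucket(db, {
    bucketName: 'videos',
    chunkSizeBytes: 4 * 1024 * 1024
  });

  // Item 2: compound index on commonly filtered metadata fields of the files collection
  await db.collection('videos.files').createIndex(
    { 'metadata.category': 1, uploadDate: -1 },
    { name: 'category_upload_date_idx' }
  );

  return { client, bucket };
}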

Storage Architecture Design

Design principles for scalable GridFS storage systems:

  1. Sharding Strategy: Plan sharding keys for optimal file distribution across cluster nodes (a sharding sketch follows this list)
  2. Replica Configuration: Configure appropriate read preferences for file access patterns
  3. Storage Tiering: Implement hot/cold storage strategies for lifecycle management
  4. Backup Strategy: Design comprehensive backup and recovery procedures for file data
  5. Security Implementation: Implement encryption, access controls, and audit logging
  6. Capacity Planning: Plan storage growth and performance scaling requirements
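For item 1 in particular, a common pattern (sketched below in mongosh syntax; the database and bucket names are assumptions) is to shard the GridFS chunks collection on { files_id: 1, n: 1 } so that all chunks of a given file stay together while different files spread across shards.

// Run against a mongos router; 'media' database and 'videos' bucket are assumptions
sh.enableSharding("media")

// Chunks of one file share a shard; distinct files distribute across the cluster
sh.shardCollection("media.videos.chunks", { files_id: 1, n: 1 })

// The files (metadata) collection is typically small and often left unsharded;
// shard it only when metadata volume requires it
sh.shardCollection("media.videos.files", { _id: 1 })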

Conclusion

MongoDB GridFS Advanced Streaming and Compression provides enterprise-grade file storage capabilities that eliminate the complexity and limitations of traditional file systems while delivering sophisticated streaming, optimization, and management features. The ability to store, process, and serve large files with built-in compression, encryption, and metadata management makes building robust file storage systems both powerful and straightforward.

Key GridFS Advanced Features include:

  • Streaming Optimization: Efficient chunked uploads and downloads with progress tracking
  • Advanced Compression: Intelligent compression strategies based on file type and content
  • Metadata Integration: Rich metadata storage with full-text search capabilities
  • Performance Monitoring: Real-time transfer monitoring and optimization insights
  • Lifecycle Management: Automated retention policies and storage optimization
  • SQL Accessibility: Familiar file operations through QueryLeaf's SQL interface

Whether you're building content management systems, media platforms, document repositories, or backup solutions, MongoDB GridFS with QueryLeaf's SQL interface provides the foundation for scalable file storage that integrates seamlessly with your application data while maintaining familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically translates SQL file operations into MongoDB GridFS commands, making advanced file storage accessible through familiar SQL patterns. Complex streaming scenarios, compression settings, and metadata queries are seamlessly handled through standard SQL syntax, enabling developers to build powerful file management features without learning GridFS-specific APIs.

The combination of MongoDB's robust file storage capabilities with SQL-familiar operations creates an ideal platform for applications requiring both sophisticated file management and familiar database interaction patterns, ensuring your file storage systems remain both scalable and maintainable as your data requirements evolve.

MongoDB Full-Text Search and Advanced Text Indexing: Query Optimization and Natural Language Processing

Modern applications require sophisticated text search capabilities that go beyond simple pattern matching to deliver intelligent, contextual search experiences. MongoDB's full-text search features provide comprehensive text indexing, natural language processing, relevance scoring, and advanced query capabilities that rival dedicated search engines while maintaining the simplicity and integration benefits of database-native search functionality.

MongoDB text indexes automatically handle stemming, stop word filtering, language-specific text processing, and relevance scoring, enabling developers to build powerful search features without the complexity of maintaining separate search infrastructure. Combined with aggregation pipelines and flexible document structures, MongoDB delivers enterprise-grade text search capabilities with familiar SQL-style query patterns.

The Text Search Challenge

Traditional database text queries using LIKE patterns are inefficient and limited:

-- Traditional SQL text search - limited and slow
SELECT product_id, name, description, category
FROM products
WHERE name LIKE '%wireless%'
   OR description LIKE '%bluetooth%'
   OR category LIKE '%electronics%';

-- Problems with pattern matching:
-- 1. Case sensitivity issues
-- 2. No support for word variations (wireless vs wirelessly)
-- 3. No relevance scoring or ranking
-- 4. Poor performance on large text fields
-- 5. No natural language processing
-- 6. Limited multi-field search capabilities
-- 7. No fuzzy matching or typo tolerance

MongoDB text search provides sophisticated alternatives:

// Create comprehensive text index for advanced search
db.products.createIndex({
  "name": "text",
  "description": "text", 
  "category": "text",
  "tags": "text",
  "brand": "text",
  "specifications.features": "text"
}, {
  "default_language": "english",
  "language_override": "language",
  "name": "comprehensive_product_search",
  "weights": {
    "name": 10,           // Highest relevance for product names
    "category": 8,        // High relevance for categories
    "brand": 6,           // Medium-high relevance for brands
    "description": 4,     // Medium relevance for descriptions
    "tags": 3,            // Lower relevance for tags
    "specifications.features": 2  // Lowest relevance for detailed specs
  },
  "textIndexVersion": 3
})

// Sample product document structure optimized for text search
{
  "_id": ObjectId("..."),
  "product_id": "ELEC_HEADPHONE_001",
  "name": "Premium Wireless Noise-Canceling Headphones",
  "description": "Experience crystal-clear audio with advanced noise cancellation technology. Perfect for travel, office work, or immersive music listening sessions.",
  "category": "Electronics",
  "subcategory": "Audio Equipment",
  "brand": "TechAudio Pro",
  "price": 299.99,
  "currency": "USD",

  // Optimized for search
  "tags": ["wireless", "bluetooth", "noise-canceling", "premium", "travel", "office"],
  "search_keywords": ["headphones", "audio", "music", "wireless", "bluetooth", "noise-canceling"],

  // Product specifications with searchable features
  "specifications": {
    "features": [
      "Active noise cancellation",
      "Bluetooth 5.0 connectivity", 
      "40-hour battery life",
      "Fast charging capability",
      "Multi-device pairing",
      "Voice assistant integration"
    ],
    "technical": {
      "driver_size": "40mm",
      "frequency_response": "20Hz-20kHz",
      "impedance": "32 ohms",
      "weight": "250g"
    }
  },

  // Multi-language support
  "language": "english",
  "translations": {
    "spanish": {
      "name": "Auriculares Inalámbricos Premium con Cancelación de Ruido",
      "description": "Experimenta audio cristalino con tecnología avanzada de cancelación de ruido."
    }
  },

  // Search analytics and optimization
  "search_metadata": {
    "search_count": 0,
    "popular_queries": [],
    "last_updated": ISODate("2025-12-29T00:00:00Z"),
    "seo_optimized": true
  },

  // Business metadata
  "inventory": {
    "stock_count": 150,
    "availability": "in_stock",
    "warehouse_locations": ["US-WEST", "US-EAST", "EU-CENTRAL"]
  },
  "ratings": {
    "average_rating": 4.7,
    "total_reviews": 342,
    "rating_distribution": {
      "5_star": 198,
      "4_star": 89,
      "3_star": 35,
      "2_star": 12,
      "1_star": 8
    }
  },
  "created_at": ISODate("2025-12-01T00:00:00Z"),
  "updated_at": ISODate("2025-12-29T00:00:00Z")
}

Advanced Text Search Queries

Basic Full-Text Search with Relevance Scoring

// Comprehensive text search with relevance scoring
db.products.aggregate([
  // Stage 1: Text search with scoring
  {
    $match: {
      $text: {
        $search: "wireless bluetooth headphones",
        $caseSensitive: false,
        $diacriticSensitive: false
      }
    }
  },

  // Stage 2: Add relevance score and metadata
  {
    $addFields: {
      relevance_score: { $meta: "textScore" },
      search_query: "wireless bluetooth headphones",
      search_timestamp: "$$NOW"
    }
  },

  // Stage 3: Enhanced relevance calculation
  {
    $addFields: {
      // Boost scores based on business factors
      boosted_score: {
        $multiply: [
          "$relevance_score",
          {
            $add: [
              1, // Base multiplier

              // Availability boost
              { $cond: [{ $eq: ["$inventory.availability", "in_stock"] }, 0.3, 0] },

              // Rating boost (high-rated products get higher relevance)
              { $multiply: [{ $subtract: ["$ratings.average_rating", 3] }, 0.1] },

              // Popular product boost
              { $cond: [{ $gt: ["$ratings.total_reviews", 100] }, 0.2, 0] },

              // Price range boost (mid-range products favored)
              {
                $cond: [
                  { $and: [{ $gte: ["$price", 50] }, { $lte: ["$price", 500] }] },
                  0.15,
                  0
                ]
              }
            ]
          }
        ]
      }
    }
  },

  // Stage 4: Category and brand analysis
  {
    $addFields: {
      category_match: {
        $cond: [
          { $regexMatch: { input: "$category", regex: /electronics|audio/i } },
          true,
          false
        ]
      },
      brand_popularity_score: {
        $switch: {
          branches: [
            { case: { $in: ["$brand", ["TechAudio Pro", "SoundMaster", "AudioElite"]] }, then: 1.2 },
            { case: { $in: ["$brand", ["BasicSound", "EcoAudio", "ValueTech"]] }, then: 0.9 }
          ],
          default: 1.0
        }
      }
    }
  },

  // Stage 5: Final score calculation
  {
    $addFields: {
      final_relevance_score: {
        $multiply: [
          "$boosted_score",
          "$brand_popularity_score",
          { $cond: ["$category_match", 1.1, 1.0] }
        ]
      }
    }
  },

  // Stage 6: Filter and sort results
  {
    $match: {
      "relevance_score": { $gte: 0.5 }, // Minimum relevance threshold
      "inventory.availability": { $ne: "discontinued" }
    }
  },

  // Stage 7: Sort by relevance and business factors
  {
    $sort: {
      "final_relevance_score": -1,
      "ratings.average_rating": -1,
      "ratings.total_reviews": -1
    }
  },

  // Stage 8: Project search results
  {
    $project: {
      product_id: 1,
      name: 1,
      description: 1,
      category: 1,
      brand: 1,
      price: 1,
      currency: 1,

      // Search relevance information
      search_metadata: {
        relevance_score: { $round: ["$relevance_score", 3] },
        boosted_score: { $round: ["$boosted_score", 3] },
        final_score: { $round: ["$final_relevance_score", 3] },
        search_query: "$search_query",
        search_timestamp: "$search_timestamp"
      },

      // Business information
      availability: "$inventory.availability",
      stock_count: "$inventory.stock_count",
      rating_info: {
        average_rating: "$ratings.average_rating",
        total_reviews: "$ratings.total_reviews"
      },

      // Highlighted text fields for search result display
      search_highlights: {
        name_highlight: "$name",
        description_highlight: { $substr: ["$description", 0, 150] },
        category_match: "$category_match"
      }
    }
  },

  { $limit: 20 }
])

Multi-Language Text Search with Language Detection

// Multi-language text search with language detection
db.products.aggregate([
  // Stage 1: Detect search language and perform text search
  {
    $match: {
      $or: [
        // English search
        { 
          $and: [
            { $text: { $search: "auriculares inalámbricos" } },
            { "language": "spanish" }
          ]
        },
        // Spanish search
        {
          $and: [
            { $text: { $search: "wireless headphones" } },
            { "language": "english" }
          ]
        },
        // Language-agnostic search (fallback)
        { $text: { $search: "auriculares inalámbricos wireless headphones" } }
      ]
    }
  },

  // Stage 2: Language-aware scoring
  {
    $addFields: {
      base_score: { $meta: "textScore" },
      detected_language: {
        $cond: [
          { $regexMatch: { input: "auriculares inalámbricos", regex: /[áéíóúñü]/i } },
          "spanish",
          "english"
        ]
      }
    }
  },

  // Stage 3: Apply language-specific boosts
  {
    $addFields: {
      language_adjusted_score: {
        $multiply: [
          "$base_score",
          {
            $cond: [
              { $eq: ["$detected_language", "$language"] },
              1.5, // Boost for exact language match
              1.0  // Standard score for cross-language matches
            ]
          }
        ]
      }
    }
  },

  // Stage 4: Add localized content
  {
    $addFields: {
      localized_name: {
        $cond: [
          { $eq: ["$detected_language", "spanish"] },
          "$translations.spanish.name",
          "$name"
        ]
      },
      localized_description: {
        $cond: [
          { $eq: ["$detected_language", "spanish"] },
          "$translations.spanish.description", 
          "$description"
        ]
      }
    }
  },

  { $sort: { "language_adjusted_score": -1 } },
  { $limit: 15 }
])

Faceted Search with Text Queries

// Advanced faceted search combining text search with categorical filtering
db.products.aggregate([
  // Stage 1: Text search foundation
  {
    $match: {
      $text: { $search: "gaming laptop high performance" }
    }
  },

  // Stage 2: Add text relevance score
  {
    $addFields: {
      text_score: { $meta: "textScore" }
    }
  },

  // Stage 3: Create facet aggregations
  {
    $facet: {
      // Main search results
      "search_results": [
        {
          $match: {
            "text_score": { $gte: 0.5 },
            "inventory.availability": { $in: ["in_stock", "limited_stock"] }
          }
        },
        {
          $sort: { "text_score": -1, "ratings.average_rating": -1 }
        },
        {
          $project: {
            product_id: 1,
            name: 1,
            brand: 1,
            price: 1,
            category: 1,
            subcategory: 1,
            text_score: { $round: ["$text_score", 2] },
            rating: "$ratings.average_rating",
            availability: "$inventory.availability"
          }
        },
        { $limit: 50 }
      ],

      // Price range facets
      "price_facets": [
        {
          $bucket: {
            groupBy: "$price",
            boundaries: [0, 100, 300, 500, 1000, 2000, 5000],
            default: "5000+",
            output: {
              count: { $sum: 1 },
              avg_price: { $avg: "$price" },
              avg_rating: { $avg: "$ratings.average_rating" }
            }
          }
        }
      ],

      // Category facets
      "category_facets": [
        {
          $group: {
            _id: "$category",
            count: { $sum: 1 },
            avg_score: { $avg: "$text_score" },
            avg_price: { $avg: "$price" },
            avg_rating: { $avg: "$ratings.average_rating" }
          }
        },
        { $sort: { "avg_score": -1 } }
      ],

      // Brand facets
      "brand_facets": [
        {
          $group: {
            _id: "$brand",
            count: { $sum: 1 },
            avg_score: { $avg: "$text_score" },
            price_range: {
              min_price: { $min: "$price" },
              max_price: { $max: "$price" }
            },
            avg_rating: { $avg: "$ratings.average_rating" }
          }
        },
        { $sort: { "count": -1, "avg_score": -1 } },
        { $limit: 15 }
      ],

      // Rating facets
      "rating_facets": [
        {
          $bucket: {
            groupBy: "$ratings.average_rating",
            boundaries: [0, 2, 3, 4, 4.5, 5],
            default: "unrated",
            output: {
              count: { $sum: 1 },
              avg_score: { $avg: "$text_score" },
              price_range: {
                min_price: { $min: "$price" },
                max_price: { $max: "$price" }
              }
            }
          }
        }
      ],

      // Availability facets
      "availability_facets": [
        {
          $group: {
            _id: "$inventory.availability",
            count: { $sum: 1 },
            avg_score: { $avg: "$text_score" }
          }
        }
      ],

      // Search analytics
      "search_analytics": [
        {
          $group: {
            _id: null,
            total_results: { $sum: 1 },
            avg_relevance_score: { $avg: "$text_score" },
            score_distribution: {
              high_relevance: { $sum: { $cond: [{ $gte: ["$text_score", 1.5] }, 1, 0] } },
              medium_relevance: { $sum: { $cond: [{ $and: [{ $gte: ["$text_score", 1.0] }, { $lt: ["$text_score", 1.5] }] }, 1, 0] } },
              low_relevance: { $sum: { $cond: [{ $lt: ["$text_score", 1.0] }, 1, 0] } }
            },
            price_stats: {
              avg_price: { $avg: "$price" },
              min_price: { $min: "$price" },
              max_price: { $max: "$price" }
            }
          }
        }
      ]
    }
  },

  // Stage 4: Format faceted results
  {
    $project: {
      search_results: 1,
      facets: {
        price_ranges: "$price_facets",
        categories: "$category_facets", 
        brands: "$brand_facets",
        ratings: "$rating_facets",
        availability: "$availability_facets"
      },
      search_summary: {
        $arrayElemAt: ["$search_analytics", 0]
      }
    }
  }
])

Auto-Complete and Suggestion Engine

// Intelligent auto-complete system with typo tolerance
class MongoDBAutoCompleteEngine {
  constructor(db, collection) {
    this.db = db;
    this.collection = collection;
    this.suggestionCache = new Map();
  }

  async createAutoCompleteIndexes() {
    // Create text index for auto-complete
    await this.db[this.collection].createIndex({
      "name": "text",
      "tags": "text",
      "search_keywords": "text",
      "category": "text",
      "brand": "text"
    }, {
      name: "autocomplete_text_index",
      weights: {
        "name": 10,
        "brand": 8,
        "category": 6,
        "tags": 4,
        "search_keywords": 3
      }
    });

    // Create prefix-based indexes for fast auto-complete
    await this.db[this.collection].createIndex({
      "name": 1,
      "category": 1,
      "brand": 1
    }, {
      name: "prefix_autocomplete_index"
    });
  }

  async getAutoCompleteSuggestions(query, options = {}) {
    const maxSuggestions = options.maxSuggestions || 10;
    const includeCategories = options.includeCategories !== false;
    const includeBrands = options.includeBrands !== false;
    const minScore = options.minScore || 0.3;

    try {
      const suggestions = await this.db[this.collection].aggregate([
        // Stage 1: Multi-approach matching
        {
          $facet: {
            // Exact prefix matching
            "prefix_matches": [
              {
                $match: {
                  $or: [
                    { "name": { $regex: `^${this.escapeRegex(query)}`, $options: "i" } },
                    { "brand": { $regex: `^${this.escapeRegex(query)}`, $options: "i" } },
                    { "category": { $regex: `^${this.escapeRegex(query)}`, $options: "i" } }
                  ]
                }
              },
              {
                $addFields: {
                  suggestion_type: "prefix_match",
                  suggestion_text: {
                    $cond: [
                      { $regexMatch: { input: "$name", regex: new RegExp(`^${this.escapeRegex(query)}`, "i") } },
                      "$name",
                      {
                        $cond: [
                          { $regexMatch: { input: "$brand", regex: new RegExp(`^${this.escapeRegex(query)}`, "i") } },
                          "$brand",
                          "$category"
                        ]
                      }
                    ]
                  },
                  relevance_score: 2.0
                }
              },
              { $limit: 5 }
            ],

            // Full-text search suggestions
            "text_matches": [
              {
                $match: {
                  $text: { $search: query }
                }
              },
              {
                $addFields: {
                  suggestion_type: "text_match",
                  suggestion_text: "$name",
                  relevance_score: { $meta: "textScore" }
                }
              },
              {
                $match: { "relevance_score": { $gte: minScore } }
              },
              { $limit: 5 }
            ],

            // Fuzzy matching for typo tolerance
            "fuzzy_matches": [
              {
                $match: {
                  $or: [
                    { "name": { $regex: this.generateFuzzyRegex(query), $options: "i" } },
                    { "tags": { $elemMatch: { $regex: this.generateFuzzyRegex(query), $options: "i" } } }
                  ]
                }
              },
              {
                $addFields: {
                  suggestion_type: "fuzzy_match",
                  suggestion_text: "$name",
                  relevance_score: 1.0
                }
              },
              { $limit: 3 }
            ]
          }
        },

        // Stage 2: Combine and deduplicate suggestions
        {
          $project: {
            all_suggestions: {
              $concatArrays: ["$prefix_matches", "$text_matches", "$fuzzy_matches"]
            }
          }
        },

        // Stage 3: Unwind and process suggestions
        { $unwind: "$all_suggestions" },
        { $replaceRoot: { newRoot: "$all_suggestions" } },

        // Stage 4: Group to remove duplicates
        {
          $group: {
            _id: "$suggestion_text",
            suggestion_type: { $first: "$suggestion_type" },
            relevance_score: { $max: "$relevance_score" },
            product_count: { $sum: 1 },
            sample_product: { $first: "$$ROOT" }
          }
        },

        // Stage 5: Enhanced relevance scoring
        {
          $addFields: {
            final_score: {
              $multiply: [
                "$relevance_score",
                {
                  $switch: {
                    branches: [
                      { case: { $eq: ["$suggestion_type", "prefix_match"] }, then: 1.5 },
                      { case: { $eq: ["$suggestion_type", "text_match"] }, then: 1.2 },
                      { case: { $eq: ["$suggestion_type", "fuzzy_match"] }, then: 0.8 }
                    ],
                    default: 1.0
                  }
                },
                // Boost popular suggestions
                { $add: [1.0, { $multiply: [{ $ln: "$product_count" }, 0.1] }] }
              ]
            }
          }
        },

        // Stage 6: Sort and limit results
        { $sort: { "final_score": -1, "_id": 1 } },
        { $limit: maxSuggestions },

        // Stage 7: Format final suggestions
        {
          $project: {
            suggestion: "$_id",
            type: "$suggestion_type",
            score: { $round: ["$final_score", 2] },
            product_count: 1,
            category: "$sample_product.category",
            brand: "$sample_product.brand"
          }
        }
      ]).toArray();

      return suggestions;

    } catch (error) {
      console.error('Auto-complete error:', error);
      return [];
    }
  }

  escapeRegex(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  }

  generateFuzzyRegex(query) {
    // Simple fuzzy matching - allows one character difference per 4 characters
    const chars = query.split('');
    const pattern = chars.map((char, index) => {
      if (index % 4 === 0 && index > 0) {
        return `.?${this.escapeRegex(char)}`;
      }
      return this.escapeRegex(char);
    }).join('');

    return pattern;
  }

  async getSearchHistory(userId, limit = 20) {
    return await this.db.search_history.aggregate([
      { $match: { user_id: userId } },
      {
        $group: {
          _id: "$query",
          search_count: { $sum: 1 },
          last_searched: { $max: "$timestamp" },
          avg_results: { $avg: "$result_count" }
        }
      },
      { $sort: { "search_count": -1, "last_searched": -1 } },
      { $limit: limit },
      {
        $project: {
          query: "$_id",
          search_count: 1,
          last_searched: 1,
          suggestion_score: {
            $subtract: [
              { $multiply: [{ $ln: "$search_count" }, 0.3] },
              { $divide: [{ $subtract: ["$$NOW", "$last_searched"] }, 86400000] } // Penalize by days since last search
            ]
          }
        }
      }
    ]).toArray();
  }
}
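A brief usage sketch of the engine above; the db handle and the 'products' collection name are assumptions, and the mongosh-style collection access the class relies on is preserved.

// Usage sketch: build the indexes once, then query suggestions per keystroke
const engine = new MongoDBAutoCompleteEngine(db, 'products');
await engine.createAutoCompleteIndexes();

const suggestions = await engine.getAutoCompleteSuggestions('wirele', {
  maxSuggestions: 8,
  minScore: 0.4
});

suggestions.forEach(s =>
  console.log(`${s.suggestion} (${s.type}, score ${s.score})`)
);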

SQL-Style Text Search with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB text search operations:

-- Basic full-text search with SQL syntax
SELECT 
    product_id,
    name,
    description,
    brand,
    price,
    category,
    TEXTRANK() as relevance_score
FROM products
WHERE TEXTSEARCH('wireless bluetooth headphones')
  AND inventory.availability = 'in_stock'
ORDER BY relevance_score DESC
LIMIT 20;

-- Advanced text search with boosting and filtering
SELECT 
    name,
    brand,
    price,
    category,
    ratings.average_rating,
    TEXTRANK() * 
    CASE 
        WHEN ratings.average_rating >= 4.5 THEN 1.3
        WHEN ratings.average_rating >= 4.0 THEN 1.1
        ELSE 1.0
    END as boosted_score
FROM products
WHERE TEXTSEARCH('gaming laptop RTX', language='english')
  AND price BETWEEN 800 AND 3000
  AND ratings.average_rating >= 4.0
ORDER BY boosted_score DESC, price ASC;

-- Multi-field text search with field-specific weighting
SELECT 
    product_id,
    name,
    brand,
    description,
    category,
    price,
    TEXTRANK() as base_score,

    -- Calculate weighted scores for different fields
    CASE 
        WHEN name LIKE '%wireless%' THEN TEXTRANK() * 2.0
        WHEN brand LIKE '%tech%' THEN TEXTRANK() * 1.5
        WHEN category LIKE '%electronics%' THEN TEXTRANK() * 1.2
        ELSE TEXTRANK()
    END as weighted_score

FROM products
WHERE TEXTSEARCH('wireless technology premium')
ORDER BY weighted_score DESC;

-- Faceted search with text queries
WITH text_search_base AS (
    SELECT *,
           TEXTRANK() as relevance_score
    FROM products
    WHERE TEXTSEARCH('smartphone android camera')
      AND relevance_score >= 0.5
),

price_facets AS (
    SELECT 
        CASE 
            WHEN price < 200 THEN 'Under $200'
            WHEN price < 500 THEN '$200-$499'
            WHEN price < 800 THEN '$500-$799'
            WHEN price < 1200 THEN '$800-$1199'
            ELSE '$1200+'
        END as price_range,
        COUNT(*) as product_count,
        AVG(relevance_score) as avg_relevance
    FROM text_search_base
    GROUP BY 
        CASE 
            WHEN price < 200 THEN 'Under $200'
            WHEN price < 500 THEN '$200-$499'
            WHEN price < 800 THEN '$500-$799'
            WHEN price < 1200 THEN '$800-$1199'
            ELSE '$1200+'
        END
),

brand_facets AS (
    SELECT 
        brand,
        COUNT(*) as product_count,
        AVG(relevance_score) as avg_relevance,
        AVG(ratings.average_rating) as avg_rating
    FROM text_search_base
    GROUP BY brand
    ORDER BY avg_relevance DESC, product_count DESC
    LIMIT 10
)

SELECT 
    -- Main search results
    (SELECT JSON_AGG(JSON_BUILD_OBJECT(
        'product_id', product_id,
        'name', name,
        'brand', brand,
        'price', price,
        'relevance_score', ROUND(relevance_score, 2)
    )) 
    FROM (
        SELECT product_id, name, brand, price, relevance_score
        FROM text_search_base
        ORDER BY relevance_score DESC
        LIMIT 50
    ) results) as search_results,

    -- Price facets
    (SELECT JSON_AGG(JSON_BUILD_OBJECT(
        'range', price_range,
        'count', product_count,
        'avg_relevance', ROUND(avg_relevance, 2)
    ) ORDER BY avg_relevance DESC)
    FROM price_facets) as price_facets,

    -- Brand facets
    (SELECT JSON_AGG(JSON_BUILD_OBJECT(
        'brand', brand,
        'count', product_count,
        'avg_relevance', ROUND(avg_relevance, 2),
        'avg_rating', ROUND(avg_rating, 1)
    ))
    FROM brand_facets) as brand_facets;

-- Auto-complete suggestions with SQL
(SELECT DISTINCT
    name as suggestion,
    'product_name' as suggestion_type,
    TEXTRANK() as relevance_score
FROM products
WHERE TEXTSEARCH('wirele')  -- Partial query
   OR name ILIKE 'wirele%'   -- Prefix matching
ORDER BY relevance_score DESC
LIMIT 10)

UNION ALL

(SELECT DISTINCT
    brand as suggestion,
    'brand' as suggestion_type,
    2.0 as relevance_score  -- Fixed high score for brand matches
FROM products
WHERE brand ILIKE 'wirele%'
LIMIT 5)

UNION ALL

(SELECT DISTINCT
    category as suggestion,
    'category' as suggestion_type,
    1.5 as relevance_score
FROM products
WHERE category ILIKE 'wirele%'
LIMIT 5)

ORDER BY relevance_score DESC, suggestion ASC;

-- Search analytics and performance monitoring
WITH search_performance AS (
    SELECT 
        TEXTSEARCH('wireless headphones audio') as has_text_match,
        name,
        brand, 
        category,
        price,
        ratings.average_rating,
        TEXTRANK() as relevance_score,
        inventory.availability
    FROM products
    WHERE has_text_match
),

search_metrics AS (
    SELECT 
        COUNT(*) as total_results,
        AVG(relevance_score) as avg_relevance_score,
        COUNT(CASE WHEN relevance_score >= 1.5 THEN 1 END) as high_relevance_results,
        COUNT(CASE WHEN relevance_score >= 1.0 THEN 1 END) as medium_relevance_results,
        COUNT(CASE WHEN relevance_score < 1.0 THEN 1 END) as low_relevance_results,

        -- Price distribution in results
        AVG(price) as avg_price,
        MIN(price) as min_price,
        MAX(price) as max_price,

        -- Rating distribution
        AVG(ratings.average_rating) as avg_rating,
        COUNT(CASE WHEN ratings.average_rating >= 4.0 THEN 1 END) as high_rated_results,

        -- Availability distribution  
        COUNT(CASE WHEN inventory.availability = 'in_stock' THEN 1 END) as available_results
    FROM search_performance
)

SELECT 
    -- Search quality metrics
    'Search Performance Report' as report_type,
    CURRENT_TIMESTAMP as generated_at,
    total_results,
    ROUND(avg_relevance_score, 3) as avg_relevance,

    -- Relevance distribution
    JSON_BUILD_OBJECT(
        'high_relevance', high_relevance_results,
        'medium_relevance', medium_relevance_results, 
        'low_relevance', low_relevance_results,
        'high_relevance_percent', ROUND((high_relevance_results::FLOAT / total_results * 100), 1)
    ) as relevance_distribution,

    -- Price insights
    JSON_BUILD_OBJECT(
        'avg_price', ROUND(avg_price, 2),
        'price_range', CONCAT('$', min_price, ' - $', max_price)
    ) as price_insights,

    -- Quality indicators
    JSON_BUILD_OBJECT(
        'avg_rating', ROUND(avg_rating, 2),
        'high_rated_percent', ROUND((high_rated_results::FLOAT / total_results * 100), 1),
        'availability_percent', ROUND((available_results::FLOAT / total_results * 100), 1)
    ) as quality_metrics,

    -- Search recommendations
    CASE 
        WHEN avg_relevance_score < 1.0 THEN 'Consider expanding search terms or adjusting index weights'
        WHEN high_relevance_results < 5 THEN 'Results may benefit from relevance boost tuning'
        WHEN available_results::FLOAT / total_results < 0.7 THEN 'High proportion of unavailable items in results'
        ELSE 'Search performance within acceptable ranges'
    END as recommendations

FROM search_metrics;

-- QueryLeaf provides comprehensive MongoDB text search capabilities:
-- 1. TEXTSEARCH() function for full-text search with natural language processing
-- 2. TEXTRANK() function for relevance scoring and ranking
-- 3. Multi-language support with language parameter specification
-- 4. Advanced text search operators integrated with SQL WHERE clauses
-- 5. Fuzzy matching and auto-complete functionality through SQL pattern matching
-- 6. Faceted search capabilities using standard SQL aggregation functions
-- 7. Search analytics and performance monitoring with familiar SQL reporting
-- 8. Integration with business logic through SQL CASE statements and functions

Production Text Search Optimization

Index Strategy and Performance Tuning

// Comprehensive text search optimization strategy
class ProductionTextSearchOptimizer {
  constructor(db) {
    this.db = db;
    this.indexStats = new Map();
    this.searchMetrics = new Map();
  }

  async optimizeTextIndexes() {
    console.log('Optimizing text search indexes for production workload...');

    // Analyze current text search performance
    const currentIndexes = await this.analyzeCurrentIndexes();
    const queryPatterns = await this.analyzeSearchQueries();
    const fieldUsage = await this.analyzeFieldUsage();

    // Design optimal text index configuration
    const optimizedConfig = this.designOptimalIndexes(currentIndexes, queryPatterns, fieldUsage);

    // Implement new indexes with minimal downtime
    await this.implementOptimizedIndexes(optimizedConfig);

    return {
      optimization_summary: optimizedConfig,
      performance_improvements: await this.measurePerformanceImprovements(),
      recommendations: this.generateOptimizationRecommendations()
    };
  }

  async analyzeCurrentIndexes() {
    const indexStats = await this.db.products.aggregate([
      { $indexStats: {} },
      { 
        $match: { 
          "name": { $regex: /text/i }
        }
      },
      {
        $project: {
          index_name: "$name",
          usage_stats: {
            ops: "$accesses.ops",
            since: "$accesses.since"
          },
          reporting_host: "$host"  // $indexStats reports the serving host, not index size
        }
      }
    ]).toArray();

    return indexStats;
  }

  async analyzeSearchQueries() {
    // Analyze recent search patterns from application logs
    const searchPatterns = await this.db.search_logs.aggregate([
      {
        $match: {
          timestamp: { $gte: new Date(Date.now() - 7 * 24 * 3600 * 1000) } // Last 7 days
        }
      },
      {
        $group: {
          _id: "$query_type",
          query_count: { $sum: 1 },
          avg_response_time: { $avg: "$response_time_ms" },
          unique_queries: { $addToSet: "$search_terms" },
          popular_terms: { 
            $push: {
              terms: "$search_terms",
              result_count: "$result_count",
              click_through_rate: "$ctr"
            }
          }
        }
      },
      {
        $project: {
          query_type: "$_id",
          query_count: 1,
          avg_response_time: { $round: ["$avg_response_time", 2] },
          unique_query_count: { $size: "$unique_queries" },
          performance_score: {
            $multiply: [
              { $divide: [1000, "$avg_response_time"] }, // Speed factor
              { $ln: "$query_count" } // Volume factor
            ]
          }
        }
      }
    ]).toArray();

    return searchPatterns;
  }

  async createOptimizedTextIndex() {
    // Drop existing text indexes if they exist
    try {
      await this.db.products.dropIndex("comprehensive_product_search");
    } catch (error) {
      // Index may not exist, continue
    }

    // Create highly optimized text index
    const indexResult = await this.db.products.createIndex({
      // Primary searchable fields with optimized weights
      "name": "text",
      "brand": "text", 
      "category": "text",
      "subcategory": "text",
      "description": "text",
      "tags": "text",
      "search_keywords": "text",

      // Secondary searchable fields
      "specifications.features": "text",
      "specifications.technical.type": "text"
    }, {
      name: "optimized_product_search_v2",
      default_language: "english",
      language_override: "language",

      // Carefully tuned weights based on analysis
      weights: {
        "name": 15,                          // Highest priority - exact product names
        "brand": 12,                         // High priority - brand recognition
        "category": 10,                      // High priority - categorical searches  
        "subcategory": 8,                    // Medium-high priority
        "tags": 6,                           // Medium priority - user-generated tags
        "search_keywords": 6,                // Medium priority - SEO terms
        "description": 4,                    // Lower priority - detailed descriptions
        "specifications.features": 3,         // Low priority - technical features
        "specifications.technical.type": 2   // Lowest priority - technical specs
      },

      // Performance optimization
      textIndexVersion: 3,
      partialFilterExpression: {
        "inventory.availability": { $ne: "discontinued" },
        "active": true
      }
    });

    console.log('Optimized text index created:', indexResult);
    return indexResult;
  }

  async implementSearchPerformanceMonitoring() {
    // Create collection for search performance metrics
    await this.db.createCollection("search_performance_metrics");

    // Create TTL index for automatic cleanup of old metrics
    await this.db.search_performance_metrics.createIndex(
      { "timestamp": 1 },
      { expireAfterSeconds: 30 * 24 * 3600 } // 30 days
    );

    return {
      monitoring_enabled: true,
      retention_period: "30 days",
      metrics_collection: "search_performance_metrics"
    };
  }

  async measureSearchPerformance(query, options = {}) {
    const startTime = Date.now();

    try {
      // Execute the search with explain plan
      const searchResults = await this.db.products.find(
        { $text: { $search: query } },
        { score: { $meta: "textScore" } }
      )
      .sort({ score: { $meta: "textScore" } })
      .limit(options.limit || 20)
      .explain("executionStats");

      const endTime = Date.now();
      const executionTime = endTime - startTime;

      // Record performance metrics
      const performanceMetrics = {
        query: query,
        execution_time_ms: executionTime,
        documents_examined: searchResults.executionStats.totalDocsExamined,
        documents_returned: searchResults.executionStats.nReturned,
        keys_examined: searchResults.executionStats.totalKeysExamined,
        winning_plan: searchResults.queryPlanner.winningPlan.stage,
        timestamp: new Date(),

        // Performance classification
        performance_rating: this.classifyPerformance(executionTime, searchResults.executionStats),

        // Optimization recommendations
        optimization_suggestions: this.generatePerformanceRecommendations(searchResults)
      };

      // Store metrics for analysis
      await this.db.search_performance_metrics.insertOne(performanceMetrics);

      return performanceMetrics;

    } catch (error) {
      console.error('Search performance measurement failed:', error);
      return {
        query: query,
        error: error.message,
        timestamp: new Date()
      };
    }
  }

  classifyPerformance(executionTime, executionStats) {
    // Performance rating based on response time and efficiency
    if (executionTime < 50 && executionStats.totalDocsExamined < executionStats.nReturned * 2) {
      return "excellent";
    } else if (executionTime < 150 && executionStats.totalDocsExamined < executionStats.nReturned * 5) {
      return "good";
    } else if (executionTime < 500) {
      return "acceptable";
    } else {
      return "poor";
    }
  }

  generatePerformanceRecommendations(explainResult) {
    const recommendations = [];

    if (explainResult.executionStats.totalDocsExamined > explainResult.executionStats.nReturned * 10) {
      recommendations.push("High document examination ratio - consider more selective index or query optimization");
    }

    if (explainResult.executionStats.totalKeysExamined > explainResult.executionStats.nReturned * 5) {
      recommendations.push("High index key examination - consider compound index optimization");
    }

    if (!explainResult.queryPlanner.indexFilterSet) {
      recommendations.push("Query not using optimal index filtering - consider index hint or query restructuring");
    }

    return recommendations;
  }
}
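A short usage sketch for the optimizer class above; the db handle is an assumption, while all method names come from the class itself.

// Usage sketch: rebuild the tuned index, enable metrics, then sample a query
const optimizer = new ProductionTextSearchOptimizer(db);
await optimizer.createOptimizedTextIndex();
await optimizer.implementSearchPerformanceMonitoring();

const metrics = await optimizer.measureSearchPerformance(
  'wireless noise canceling headphones',
  { limit: 20 }
);
console.log(`${metrics.performance_rating}: ${metrics.execution_time_ms} ms`);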

Scaling and Performance Optimization

  1. Index Design: Create targeted text indexes with appropriate field weights and language settings
  2. Query Optimization: Use compound indexes combining text search with frequently filtered fields (see the compound index sketch after this list)
  3. Performance Monitoring: Implement comprehensive metrics collection for search query analysis
  4. Caching Strategy: Cache frequently searched terms and results to reduce database load
  5. Load Balancing: Distribute text search queries across multiple database nodes
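As a sketch of item 2, MongoDB allows compound indexes that pair a text component with regular index keys; with an ascending prefix key, queries must include an equality match on that field, which scopes the text scan before relevance scoring. The index name and fields below are assumptions.

// Compound text index with an equality prefix on category
db.products.createIndex(
  { category: 1, name: "text", description: "text" },
  { name: "category_scoped_text_search", weights: { name: 10, description: 4 } }
);

// Queries against this index must include an equality filter on the prefix field
db.products.find({
  category: "Electronics",
  $text: { $search: "wireless noise canceling" }
});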

Search Quality and User Experience

  1. Relevance Tuning: Continuously adjust text index weights based on user interaction data
  2. Auto-complete: Implement intelligent suggestion systems with typo tolerance
  3. Faceted Search: Provide multiple filtering dimensions to help users refine search results
  4. Search Analytics: Track search patterns, click-through rates, and conversion metrics (a logging sketch follows this list)
  5. Multi-language Support: Handle international search requirements with appropriate language processing
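For item 4, one lightweight option is to log a document per search into the same search_logs collection that the earlier analyzeSearchQueries() pipeline reads; the field names below follow that pipeline, while the helper itself is only a sketch.

// Minimal sketch: record a search event in the shape consumed by analyzeSearchQueries()
async function logSearchEvent(db, event) {
  await db.collection('search_logs').insertOne({
    query_type: event.queryType,
    search_terms: event.searchTerms,
    response_time_ms: event.responseTimeMs,
    result_count: event.resultCount,
    ctr: event.clickThroughRate,
    timestamp: new Date()
  });
}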

Conclusion

MongoDB's full-text search capabilities provide enterprise-grade text search functionality that integrates seamlessly with document-based data models. The combination of sophisticated text indexing, natural language processing, and flexible scoring enables building powerful search experiences without external search infrastructure.

Key MongoDB text search advantages include:

  • Native Integration: Built-in text search eliminates need for separate search servers
  • Advanced Linguistics: Automatic stemming, stop words, and language-specific processing
  • Flexible Scoring: Customizable relevance scoring with business logic integration
  • Performance Optimization: Specialized text indexes optimized for search workloads
  • SQL Accessibility: Familiar text search operations through QueryLeaf's SQL interface
  • Comprehensive Analytics: Built-in search performance monitoring and optimization tools

Whether you're building e-commerce platforms, content management systems, knowledge bases, or document repositories, MongoDB's text search capabilities with QueryLeaf's SQL interface provide the foundation for delivering sophisticated search experiences that scale with your application requirements.

QueryLeaf Integration: QueryLeaf automatically translates SQL text search operations into MongoDB's native full-text search queries, making advanced text search accessible through familiar SQL patterns. Complex search scenarios including relevance scoring, faceted search, and auto-complete functionality are seamlessly handled through standard SQL syntax, enabling developers to build powerful search features without learning MongoDB's text search specifics.

The combination of MongoDB's powerful text search engine with SQL-familiar query patterns creates an ideal platform for applications requiring both sophisticated search capabilities and familiar database interaction patterns, ensuring your search functionality can evolve and scale efficiently as your data and user requirements grow.

MongoDB Distributed Caching and Session Management: High-Performance Web Application State Management and Cache Optimization

Modern web applications require sophisticated caching and session management capabilities that can handle millions of concurrent users while maintaining consistent performance across distributed infrastructure. Traditional caching approaches rely on dedicated cache servers like Redis or Memcached, creating additional infrastructure complexity and potential single points of failure, while session management often involves complex synchronization between application servers and separate session stores.

MongoDB TTL Collections and advanced document modeling provide comprehensive caching and session management capabilities that integrate seamlessly with existing application data, offering automatic expiration, flexible data structures, and distributed consistency without requiring additional infrastructure components. Unlike traditional cache-aside patterns that require complex cache invalidation logic, MongoDB's integrated approach enables intelligent caching strategies with built-in consistency guarantees and sophisticated query capabilities.

The Traditional Caching and Session Challenge

Conventional approaches to distributed caching and session management introduce significant complexity and operational overhead:

-- Traditional PostgreSQL session storage - limited scalability and complex cleanup
CREATE TABLE user_sessions (
    session_id VARCHAR(128) PRIMARY KEY,
    user_id BIGINT NOT NULL,
    session_data JSONB NOT NULL,

    -- Session lifecycle management
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    last_accessed TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMPTZ NOT NULL,

    -- Session metadata
    ip_address INET,
    user_agent TEXT,
    device_fingerprint VARCHAR(256),

    -- Security and fraud detection
    login_method VARCHAR(50),
    mfa_verified BOOLEAN DEFAULT FALSE,
    risk_score DECIMAL(3,2) DEFAULT 0.0,

    -- Application state
    active BOOLEAN DEFAULT TRUE,
    invalidated_at TIMESTAMPTZ,
    invalidation_reason VARCHAR(100)
);

-- Manual session cleanup (requires scheduled maintenance)
CREATE OR REPLACE FUNCTION cleanup_expired_sessions()
RETURNS INTEGER AS $$
DECLARE
    deleted_count INTEGER;
BEGIN
    -- Delete expired sessions
    DELETE FROM user_sessions 
    WHERE expires_at < CURRENT_TIMESTAMP
       OR (active = FALSE AND invalidated_at < CURRENT_TIMESTAMP - INTERVAL '1 day');

    GET DIAGNOSTICS deleted_count = ROW_COUNT;

    -- Log cleanup activity
    INSERT INTO session_cleanup_log (cleanup_date, sessions_deleted)
    VALUES (CURRENT_TIMESTAMP, deleted_count);

    RETURN deleted_count;
END;
$$ LANGUAGE plpgsql;

-- Complex cache table design with manual expiration
CREATE TABLE application_cache (
    cache_key VARCHAR(512) PRIMARY KEY,
    cache_namespace VARCHAR(100) NOT NULL,
    cache_data JSONB NOT NULL,

    -- Expiration management
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMPTZ NOT NULL,
    last_accessed TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,

    -- Cache metadata
    data_size INTEGER,
    cache_tags TEXT[],
    cache_version VARCHAR(50),

    -- Usage statistics
    hit_count BIGINT DEFAULT 0,
    miss_count BIGINT DEFAULT 0,
    invalidation_count INTEGER DEFAULT 0
);

-- Indexing for cache lookups (expensive maintenance overhead)
CREATE INDEX idx_cache_namespace_key ON application_cache (cache_namespace, cache_key);
CREATE INDEX idx_cache_expires_at ON application_cache (expires_at);
CREATE INDEX idx_cache_tags ON application_cache USING GIN (cache_tags);
CREATE INDEX idx_cache_last_accessed ON application_cache (last_accessed);

-- Manual cache invalidation logic (complex and error-prone)
CREATE OR REPLACE FUNCTION invalidate_cache_by_tags(tag_names TEXT[])
RETURNS INTEGER AS $$
DECLARE
    invalidated_count INTEGER;
BEGIN
    -- Invalidate cache entries with matching tags
    DELETE FROM application_cache 
    WHERE cache_tags && tag_names;

    GET DIAGNOSTICS invalidated_count = ROW_COUNT;

    -- Update invalidation statistics
    UPDATE cache_statistics 
    SET tag_invalidations = tag_invalidations + invalidated_count,
        last_invalidation = CURRENT_TIMESTAMP
    WHERE stat_date = CURRENT_DATE;

    RETURN invalidated_count;
END;
$$ LANGUAGE plpgsql;

-- Session data queries with complex join logic
WITH active_sessions AS (
    SELECT 
        us.session_id,
        us.user_id,
        us.session_data,
        us.last_accessed,
        us.expires_at,
        us.ip_address,
        us.device_fingerprint,

        -- Calculate session duration
        EXTRACT(EPOCH FROM us.last_accessed - us.created_at) / 3600 as session_hours,

        -- Determine session freshness
        CASE 
            WHEN us.last_accessed > CURRENT_TIMESTAMP - INTERVAL '15 minutes' THEN 'active'
            WHEN us.last_accessed > CURRENT_TIMESTAMP - INTERVAL '1 hour' THEN 'recent'
            WHEN us.last_accessed > CURRENT_TIMESTAMP - INTERVAL '6 hours' THEN 'idle'
            ELSE 'stale'
        END as session_status,

        -- Extract user preferences from session data
        us.session_data->>'preferences' as user_preferences,
        us.session_data->>'shopping_cart' as shopping_cart,
        us.session_data->>'last_page' as last_page

    FROM user_sessions us
    WHERE us.active = TRUE 
    AND us.expires_at > CURRENT_TIMESTAMP
),

session_analytics AS (
    SELECT 
        COUNT(*) as total_active_sessions,
        COUNT(DISTINCT user_id) as unique_users,
        AVG(session_hours) as avg_session_duration,

        COUNT(*) FILTER (WHERE session_status = 'active') as active_sessions,
        COUNT(*) FILTER (WHERE session_status = 'recent') as recent_sessions,
        COUNT(*) FILTER (WHERE session_status = 'idle') as idle_sessions,
        COUNT(*) FILTER (WHERE session_status = 'stale') as stale_sessions,

        -- Risk analysis
        COUNT(*) FILTER (WHERE risk_score > 0.7) as high_risk_sessions,
        COUNT(DISTINCT ip_address) as unique_ip_addresses,

        -- Application state analysis
        COUNT(*) FILTER (WHERE shopping_cart IS NOT NULL) as sessions_with_cart,
        AVG(CAST(shopping_cart->>'item_count' AS INTEGER)) as avg_cart_items

    FROM active_sessions
)

SELECT 
    'Session Management Report' as report_type,
    CURRENT_TIMESTAMP as generated_at,

    -- Session statistics
    sa.total_active_sessions,
    sa.unique_users,
    ROUND(sa.avg_session_duration::NUMERIC, 2) as avg_session_hours,

    -- Session distribution
    JSON_BUILD_OBJECT(
        'active', sa.active_sessions,
        'recent', sa.recent_sessions, 
        'idle', sa.idle_sessions,
        'stale', sa.stale_sessions
    ) as session_distribution,

    -- Security metrics
    sa.high_risk_sessions,
    sa.unique_ip_addresses,
    ROUND((sa.high_risk_sessions::FLOAT / sa.total_active_sessions * 100)::NUMERIC, 2) as risk_percentage,

    -- Business metrics
    sa.sessions_with_cart,
    ROUND(sa.avg_cart_items::NUMERIC, 1) as avg_cart_items,
    ROUND((sa.sessions_with_cart::FLOAT / sa.total_active_sessions * 100)::NUMERIC, 2) as cart_conversion_rate

FROM session_analytics sa;

-- Cache performance queries (limited analytics capabilities)
WITH cache_performance AS (
    SELECT 
        cache_namespace,
        COUNT(*) as total_entries,
        SUM(data_size) as total_size_bytes,
        AVG(data_size) as avg_entry_size,

        -- Hit ratio calculation
        SUM(hit_count) as total_hits,
        SUM(miss_count) as total_misses,
        CASE 
            WHEN SUM(hit_count + miss_count) > 0 THEN
                ROUND((SUM(hit_count)::FLOAT / SUM(hit_count + miss_count) * 100)::NUMERIC, 2)
            ELSE 0
        END as hit_ratio_percent,

        -- Expiration analysis
        COUNT(*) FILTER (WHERE expires_at <= CURRENT_TIMESTAMP) as expired_entries,
        COUNT(*) FILTER (WHERE expires_at > CURRENT_TIMESTAMP + INTERVAL '1 hour') as long_lived_entries,

        -- Access patterns
        AVG(hit_count) as avg_hits_per_entry,
        MAX(last_accessed) as most_recent_access,
        MIN(last_accessed) as oldest_access

    FROM application_cache
    GROUP BY cache_namespace
)

SELECT 
    cp.cache_namespace,
    cp.total_entries,
    ROUND((cp.total_size_bytes / 1024.0 / 1024.0)::NUMERIC, 2) as size_mb,
    ROUND(cp.avg_entry_size::NUMERIC, 0) as avg_entry_size_bytes,
    cp.hit_ratio_percent,
    cp.expired_entries,
    cp.long_lived_entries,
    cp.avg_hits_per_entry,

    -- Cache efficiency assessment
    CASE 
        WHEN cp.hit_ratio_percent > 90 THEN 'excellent'
        WHEN cp.hit_ratio_percent > 70 THEN 'good'
        WHEN cp.hit_ratio_percent > 50 THEN 'acceptable'
        ELSE 'poor'
    END as cache_efficiency,

    -- Recommendations
    CASE 
        WHEN cp.expired_entries > cp.total_entries * 0.1 THEN 'Consider shorter TTL or more frequent cleanup'
        WHEN cp.hit_ratio_percent < 50 THEN 'Review caching strategy and key patterns'
        WHEN cp.avg_entry_size > 1048576 THEN 'Consider data compression or smaller cache objects'
        ELSE 'Cache performance within normal parameters'
    END as recommendation

FROM cache_performance cp
ORDER BY cp.total_entries DESC;

-- Problems with traditional cache and session management:
-- 1. Manual expiration cleanup with potential for orphaned data
-- 2. Complex indexing strategies and maintenance overhead
-- 3. Limited scalability for high-concurrency web applications
-- 4. No built-in distributed consistency or replication
-- 5. Complex cache invalidation logic prone to race conditions
-- 6. Separate infrastructure requirements for cache and session stores
-- 7. Limited analytics and monitoring capabilities
-- 8. Manual session lifecycle management and security tracking
-- 9. No automatic compression or storage optimization
-- 10. Complex failover and disaster recovery procedures

MongoDB provides sophisticated distributed caching and session management with automatic TTL handling and advanced capabilities:

// MongoDB Distributed Caching and Session Management System
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017/?replicaSet=rs0');
const db = client.db('distributed_cache_system');

// Comprehensive MongoDB Caching and Session Manager
class AdvancedCacheSessionManager {
  constructor(db, config = {}) {
    this.db = db;
    this.collections = {
      userSessions: db.collection('user_sessions'),
      applicationCache: db.collection('application_cache'),
      cacheStatistics: db.collection('cache_statistics'),
      sessionEvents: db.collection('session_events'),
      cacheMetrics: db.collection('cache_metrics'),
      deviceFingerprints: db.collection('device_fingerprints')
    };

    // Advanced configuration
    this.config = {
      // Session management
      defaultSessionTTL: config.defaultSessionTTL || 7200, // 2 hours
      extendedSessionTTL: config.extendedSessionTTL || 86400, // 24 hours for "remember me"
      maxSessionsPerUser: config.maxSessionsPerUser || 10,
      sessionCleanupInterval: config.sessionCleanupInterval || 300, // 5 minutes

      // Cache management
      defaultCacheTTL: config.defaultCacheTTL || 3600, // 1 hour
      maxCacheSize: config.maxCacheSize || 16 * 1024 * 1024, // 16MB per entry
      enableCompression: config.enableCompression !== false,
      enableDistribution: config.enableDistribution !== false,

      // Security settings
      enableDeviceTracking: config.enableDeviceTracking !== false,
      enableRiskScoring: config.enableRiskScoring !== false,
      maxRiskScore: config.maxRiskScore || 0.8,
      suspiciousActivityThreshold: config.suspiciousActivityThreshold || 5,

      // Performance optimization
      enableMetrics: config.enableMetrics !== false,
      metricsRetentionDays: config.metricsRetentionDays || 30,
      enableLazyLoading: config.enableLazyLoading !== false,
      cacheWarmupEnabled: config.cacheWarmupEnabled !== false
    };

    // Initialize TTL collections and indexes
    this.initializeCollections();
    this.setupMetricsCollection();
    this.startMaintenanceTasks();
  }

  async initializeCollections() {
    console.log('Initializing MongoDB TTL collections and indexes...');

    try {
      // Configure user sessions collection with TTL
      await this.collections.userSessions.createIndex(
        { "expiresAt": 1 },
        { 
          expireAfterSeconds: 0,
          name: "session_ttl_index"
        }
      );

      // Additional session indexes for performance
      await this.collections.userSessions.createIndexes([
        { key: { sessionId: 1 }, unique: true, name: "session_id_unique" },
        { key: { userId: 1, expiresAt: 1 }, name: "user_active_sessions" },
        { key: { deviceFingerprint: 1 }, name: "device_sessions" },
        { key: { ipAddress: 1, createdAt: 1 }, name: "ip_activity" },
        { key: { riskScore: 1 }, name: "risk_analysis" },
        { key: { "sessionData.shoppingCart.items": 1 }, name: "cart_sessions", sparse: true }
      ]);

      // Configure application cache collection with TTL
      await this.collections.applicationCache.createIndex(
        { "expiresAt": 1 },
        { 
          expireAfterSeconds: 0,
          name: "cache_ttl_index"
        }
      );

      // Cache performance indexes
      await this.collections.applicationCache.createIndexes([
        { key: { cacheKey: 1 }, unique: true, name: "cache_key_unique" },
        { key: { namespace: 1, cacheKey: 1 }, name: "namespace_key_lookup" },
        { key: { tags: 1 }, name: "cache_tags" },
        { key: { lastAccessed: 1 }, name: "access_patterns" },
        { key: { dataSize: 1 }, name: "size_analysis" }
      ]);

      // Cache metrics with TTL for automatic cleanup
      await this.collections.cacheMetrics.createIndex(
        { "timestamp": 1 },
        { 
          expireAfterSeconds: this.config.metricsRetentionDays * 24 * 3600,
          name: "metrics_ttl_index"
        }
      );

      console.log('TTL collections and indexes initialized successfully');

    } catch (error) {
      console.error('Error initializing collections:', error);
      throw error;
    }
  }

  async createUserSession(userId, sessionData, options = {}) {
    console.log(`Creating new session for user: ${userId}`);

    try {
      // Clean up old sessions if user has too many
      await this.enforceSessionLimits(userId);

      // Generate secure session ID
      const sessionId = await this.generateSecureSessionId();

      // Calculate expiration time
      const ttlSeconds = options.rememberMe ? 
        this.config.extendedSessionTTL : 
        this.config.defaultSessionTTL;

      const expiresAt = new Date(Date.now() + (ttlSeconds * 1000));

      // Calculate risk score
      const riskScore = await this.calculateSessionRiskScore(userId, sessionData, options);

      // Create session document
      const session = {
        _id: new ObjectId(),
        sessionId: sessionId,
        userId: userId,

        // Session lifecycle
        createdAt: new Date(),
        lastAccessed: new Date(),
        expiresAt: expiresAt,
        ttlSeconds: ttlSeconds,

        // Session data with flexible structure
        sessionData: {
          preferences: sessionData.preferences || {},
          shoppingCart: sessionData.shoppingCart || { items: [], total: 0 },
          navigation: sessionData.navigation || { lastPage: '/', referrer: null },
          applicationState: sessionData.applicationState || {},
          temporaryData: sessionData.temporaryData || {}
        },

        // Security and device tracking
        ipAddress: options.ipAddress,
        userAgent: options.userAgent,
        deviceFingerprint: await this.generateDeviceFingerprint(options),
        riskScore: riskScore,

        // Authentication metadata
        loginMethod: options.loginMethod || 'password',
        mfaVerified: options.mfaVerified || false,
        loginLocation: options.loginLocation,

        // Session flags
        active: true,
        rememberMe: options.rememberMe || false,

        // Usage statistics
        pageViews: 0,
        actionsPerformed: 0,
        dataTransferred: 0
      };

      // Insert session with automatic TTL handling
      const result = await this.collections.userSessions.insertOne(session);

      // Log session creation event
      await this.logSessionEvent(sessionId, 'session_created', {
        userId: userId,
        riskScore: riskScore,
        ttlSeconds: ttlSeconds,
        rememberMe: options.rememberMe
      });

      // Update device fingerprint tracking
      if (this.config.enableDeviceTracking) {
        await this.updateDeviceTracking(session.deviceFingerprint, userId, sessionId);
      }

      console.log(`Session created successfully: ${sessionId}`);

      return {
        sessionId: sessionId,
        expiresAt: expiresAt,
        riskScore: riskScore,
        success: true
      };

    } catch (error) {
      console.error('Error creating user session:', error);
      throw error;
    }
  }

  async getSessionData(sessionId, options = {}) {
    console.log(`Retrieving session data: ${sessionId}`);

    try {
      // Find active session
      const session = await this.collections.userSessions.findOne({
        sessionId: sessionId,
        active: true,
        expiresAt: { $gt: new Date() }
      });

      if (!session) {
        return { success: false, reason: 'session_not_found' };
      }

      // Update last accessed timestamp
      const updateData = {
        $set: { lastAccessed: new Date() },
        $inc: { 
          pageViews: options.incrementPageView ? 1 : 0,
          actionsPerformed: options.actionPerformed ? 1 : 0
        }
      };

      await this.collections.userSessions.updateOne(
        { sessionId: sessionId },
        updateData
      );

      // Check for risk score updates
      if (this.config.enableRiskScoring && options.updateRiskScore) {
        const newRiskScore = await this.calculateSessionRiskScore(
          session.userId, 
          session.sessionData, 
          options
        );

        if (newRiskScore > this.config.maxRiskScore) {
          await this.flagHighRiskSession(sessionId, newRiskScore);
        }
      }

      return {
        success: true,
        session: session,
        userId: session.userId,
        sessionData: session.sessionData,
        expiresAt: session.expiresAt,
        riskScore: session.riskScore
      };

    } catch (error) {
      console.error('Error retrieving session data:', error);
      throw error;
    }
  }

  async updateSessionData(sessionId, updateData, options = {}) {
    console.log(`Updating session data: ${sessionId}`);

    try {
      // Prepare update operations - only touch fields that were actually provided,
      // so a partial update does not overwrite the rest of the session data with nulls
      const update = {
        $set: {
          lastAccessed: new Date()
        }
      };

      if (updateData.preferences) update.$set['sessionData.preferences'] = updateData.preferences;
      if (updateData.shoppingCart) update.$set['sessionData.shoppingCart'] = updateData.shoppingCart;
      if (updateData.navigation) update.$set['sessionData.navigation'] = updateData.navigation;
      if (updateData.applicationState) update.$set['sessionData.applicationState'] = updateData.applicationState;

      // Optional TTL extension
      if (options.extendTTL) {
        const newExpirationTime = new Date(Date.now() + (this.config.defaultSessionTTL * 1000));
        update.$set.expiresAt = newExpirationTime;
      }

      // Merge temporary data if provided
      if (updateData.temporaryData) {
        update.$set['sessionData.temporaryData'] = {
          ...updateData.temporaryData
        };
      }

      const result = await this.collections.userSessions.updateOne(
        { 
          sessionId: sessionId, 
          active: true,
          expiresAt: { $gt: new Date() }
        },
        update
      );

      if (result.matchedCount === 0) {
        return { success: false, reason: 'session_not_found' };
      }

      // Log session update event
      await this.logSessionEvent(sessionId, 'session_updated', {
        updateFields: Object.keys(updateData),
        extendTTL: options.extendTTL
      });

      return { success: true };

    } catch (error) {
      console.error('Error updating session data:', error);
      throw error;
    }
  }

  async invalidateSession(sessionId, reason = 'user_logout') {
    console.log(`Invalidating session: ${sessionId}, reason: ${reason}`);

    try {
      const result = await this.collections.userSessions.updateOne(
        { sessionId: sessionId },
        {
          $set: {
            active: false,
            invalidatedAt: new Date(),
            invalidationReason: reason,
            // Set immediate expiration for automatic cleanup
            expiresAt: new Date()
          }
        }
      );

      if (result.matchedCount > 0) {
        // Log session invalidation
        await this.logSessionEvent(sessionId, 'session_invalidated', {
          reason: reason,
          invalidatedAt: new Date()
        });
      }

      return { success: result.matchedCount > 0 };

    } catch (error) {
      console.error('Error invalidating session:', error);
      throw error;
    }
  }

  async setCache(cacheKey, data, options = {}) {
    console.log(`Setting cache entry: ${cacheKey}`);

    try {
      // Validate data size
      const dataSize = JSON.stringify(data).length;
      if (dataSize > this.config.maxCacheSize) {
        throw new Error(`Cache data size ${dataSize} exceeds maximum ${this.config.maxCacheSize}`);
      }

      // Calculate expiration time
      const ttlSeconds = options.ttl || this.config.defaultCacheTTL;
      const expiresAt = new Date(Date.now() + (ttlSeconds * 1000));

      // Prepare cache document
      const cacheEntry = {
        cacheKey: cacheKey,
        namespace: options.namespace || 'default',
        data: this.config.enableCompression ? await this.compressData(data) : data,
        compressed: this.config.enableCompression,

        // Expiration handling
        createdAt: new Date(),
        expiresAt: expiresAt,
        ttlSeconds: ttlSeconds,
        lastAccessed: new Date(),

        // Metadata
        tags: options.tags || [],
        version: options.version || '1.0',
        dataSize: dataSize,
        contentType: options.contentType || 'application/json',

        // Usage statistics
        hitCount: 0,
        accessHistory: [],

        // Cache strategy metadata
        cacheStrategy: options.strategy || 'default',
        invalidationRules: options.invalidationRules || []
      };

      // Upsert cache entry with automatic TTL
      await this.collections.applicationCache.replaceOne(
        { cacheKey: cacheKey },
        cacheEntry,
        { upsert: true }
      );

      // Update cache metrics
      await this.updateCacheMetrics('set', cacheKey, {
        namespace: cacheEntry.namespace,
        dataSize: dataSize,
        ttlSeconds: ttlSeconds
      });

      console.log(`Cache entry set successfully: ${cacheKey}`);
      return { success: true, expiresAt: expiresAt };

    } catch (error) {
      console.error('Error setting cache entry:', error);
      throw error;
    }
  }

  async getCache(cacheKey, options = {}) {
    console.log(`Getting cache entry: ${cacheKey}`);

    try {
      const cacheEntry = await this.collections.applicationCache.findOneAndUpdate(
        {
          cacheKey: cacheKey,
          expiresAt: { $gt: new Date() }
        },
        {
          $set: { lastAccessed: new Date() },
          $inc: { hitCount: 1 },
          $push: {
            accessHistory: {
              $each: [{ timestamp: new Date(), source: options.source || 'application' }],
              $slice: -10 // Keep only last 10 access records
            }
          }
        },
        { returnDocument: 'after' }
      );

      if (!cacheEntry.value) {
        // Record cache miss
        await this.updateCacheMetrics('miss', cacheKey, {
          namespace: options.namespace || 'default'
        });

        return { success: false, reason: 'cache_miss' };
      }

      // Decompress data if needed
      const data = cacheEntry.value.compressed ? 
        await this.decompressData(cacheEntry.value.data) : 
        cacheEntry.value.data;

      // Record cache hit
      await this.updateCacheMetrics('hit', cacheKey, {
        namespace: cacheEntry.value.namespace,
        dataSize: cacheEntry.value.dataSize
      });

      return {
        success: true,
        data: data,
        metadata: {
          createdAt: cacheEntry.value.createdAt,
          expiresAt: cacheEntry.value.expiresAt,
          hitCount: cacheEntry.value.hitCount,
          tags: cacheEntry.value.tags,
          version: cacheEntry.value.version
        }
      };

    } catch (error) {
      console.error('Error getting cache entry:', error);
      throw error;
    }
  }

  async invalidateCache(criteria, options = {}) {
    console.log('Invalidating cache entries with criteria:', criteria);

    try {
      let query = {};

      // Build invalidation query based on criteria
      if (criteria.key) {
        query.cacheKey = criteria.key;
      } else if (criteria.pattern) {
        query.cacheKey = { $regex: criteria.pattern };
      } else if (criteria.namespace) {
        query.namespace = criteria.namespace;
      } else if (criteria.tags) {
        query.tags = { $in: criteria.tags };
      }

      // Immediate expiration for automatic cleanup
      const result = await this.collections.applicationCache.updateMany(
        query,
        {
          $set: { 
            expiresAt: new Date(),
            invalidatedAt: new Date(),
            invalidationReason: options.reason || 'manual_invalidation'
          }
        }
      );

      // Update invalidation metrics
      await this.updateCacheMetrics('invalidation', null, {
        criteriaType: Object.keys(criteria)[0],
        entriesInvalidated: result.modifiedCount,
        reason: options.reason
      });

      console.log(`Invalidated ${result.modifiedCount} cache entries`);
      return { success: true, invalidatedCount: result.modifiedCount };

    } catch (error) {
      console.error('Error invalidating cache entries:', error);
      throw error;
    }
  }

  async getUserSessionAnalytics(userId, options = {}) {
    console.log(`Generating session analytics for user: ${userId}`);

    try {
      const timeRange = options.timeRange || 24; // hours
      const startTime = new Date(Date.now() - (timeRange * 3600 * 1000));

      const analytics = await this.collections.userSessions.aggregate([
        {
          $match: {
            userId: userId,
            createdAt: { $gte: startTime }
          }
        },
        {
          $group: {
            _id: '$userId',

            // Session counts
            totalSessions: { $sum: 1 },
            activeSessions: {
              $sum: {
                $cond: [
                  { $and: [
                    { $eq: ['$active', true] },
                    { $gt: ['$expiresAt', new Date()] }
                  ]},
                  1,
                  0
                ]
              }
            },

            // Duration analysis
            averageSessionDuration: {
              $avg: {
                $divide: [
                  { $subtract: ['$lastAccessed', '$createdAt'] },
                  1000 * 60 // Convert to minutes
                ]
              }
            },

            // Activity metrics
            totalPageViews: { $sum: '$pageViews' },
            totalActions: { $sum: '$actionsPerformed' },

            // Risk analysis
            averageRiskScore: { $avg: '$riskScore' },
            highRiskSessions: {
              $sum: {
                $cond: [{ $gt: ['$riskScore', 0.7] }, 1, 0]
              }
            },

            // Device analysis
            uniqueDevices: { $addToSet: '$deviceFingerprint' },
            uniqueIpAddresses: { $addToSet: '$ipAddress' },

            // Authentication methods
            loginMethods: { $addToSet: '$loginMethod' },
            mfaUsage: {
              $sum: {
                $cond: [{ $eq: ['$mfaVerified', true] }, 1, 0]
              }
            }
          }
        },
        {
          $project: {
            userId: '$_id',
            totalSessions: 1,
            activeSessions: 1,
            averageSessionDuration: { $round: ['$averageSessionDuration', 2] },
            totalPageViews: 1,
            totalActions: 1,
            averageRiskScore: { $round: ['$averageRiskScore', 3] },
            highRiskSessions: 1,
            deviceCount: { $size: '$uniqueDevices' },
            ipAddressCount: { $size: '$uniqueIpAddresses' },
            loginMethods: 1,
            mfaUsagePercentage: {
              $round: [
                { $multiply: [
                  { $divide: ['$mfaUsage', '$totalSessions'] },
                  100
                ]},
                2
              ]
            }
          }
        }
      ]).toArray();

      return analytics[0] || null;

    } catch (error) {
      console.error('Error generating user session analytics:', error);
      throw error;
    }
  }

  async getCachePerformanceReport(namespace = null, options = {}) {
    console.log('Generating cache performance report...');

    try {
      const timeRange = options.timeRange || 24; // hours
      const startTime = new Date(Date.now() - (timeRange * 3600 * 1000));

      // Build match criteria
      const matchCriteria = {
        timestamp: { $gte: startTime }
      };

      if (namespace) {
        matchCriteria.namespace = namespace;
      }

      const report = await this.collections.cacheMetrics.aggregate([
        { $match: matchCriteria },
        {
          $group: {
            _id: '$namespace',

            // Hit/miss statistics
            totalHits: {
              $sum: {
                $cond: [{ $eq: ['$operation', 'hit'] }, 1, 0]
              }
            },
            totalMisses: {
              $sum: {
                $cond: [{ $eq: ['$operation', 'miss'] }, 1, 0]
              }
            },
            totalSets: {
              $sum: {
                $cond: [{ $eq: ['$operation', 'set'] }, 1, 0]
              }
            },
            totalInvalidations: {
              $sum: {
                $cond: [{ $eq: ['$operation', 'invalidation'] }, '$metadata.entriesInvalidated', 0]
              }
            },

            // Data size statistics
            totalDataSize: {
              $sum: {
                $cond: [{ $eq: ['$operation', 'set'] }, '$metadata.dataSize', 0]
              }
            },
            averageDataSize: {
              $avg: {
                $cond: [{ $eq: ['$operation', 'set'] }, '$metadata.dataSize', null]
              }
            },

            // TTL analysis
            averageTTL: {
              $avg: {
                $cond: [{ $eq: ['$operation', 'set'] }, '$metadata.ttlSeconds', null]
              }
            }
          }
        },
        {
          $project: {
            namespace: '$_id',

            // Performance metrics
            totalRequests: { $add: ['$totalHits', '$totalMisses'] },
            hitRatio: {
              $round: [
                {
                  $cond: [
                    { $gt: [{ $add: ['$totalHits', '$totalMisses'] }, 0] },
                    {
                      $multiply: [
                        { $divide: ['$totalHits', { $add: ['$totalHits', '$totalMisses'] }] },
                        100
                      ]
                    },
                    0
                  ]
                },
                2
              ]
            },

            totalHits: 1,
            totalMisses: 1,
            totalSets: 1,
            totalInvalidations: 1,

            // Data insights
            totalDataSizeMB: {
              $round: [{ $divide: ['$totalDataSize', 1024 * 1024] }, 2]
            },
            averageDataSizeKB: {
              $round: [{ $divide: ['$averageDataSize', 1024] }, 2]
            },
            averageTTLHours: {
              $round: [{ $divide: ['$averageTTL', 3600] }, 2]
            }
          }
        }
      ]).toArray();

      return report;

    } catch (error) {
      console.error('Error generating cache performance report:', error);
      throw error;
    }
  }

  // Utility methods
  async generateSecureSessionId() {
    const crypto = require('crypto');
    return crypto.randomBytes(32).toString('hex');
  }

  async generateDeviceFingerprint(options) {
    const crypto = require('crypto');
    // Hash only stable device attributes so repeat visits from the same device produce the same fingerprint
    const fingerprint = `${options.userAgent || ''}-${options.ipAddress || ''}`;
    return crypto.createHash('sha256').update(fingerprint).digest('hex');
  }

  async calculateSessionRiskScore(userId, sessionData, options) {
    let riskScore = 0.0;

    // IP-based risk assessment
    if (options.ipAddress) {
      const recentSessions = await this.collections.userSessions.countDocuments({
        userId: userId,
        ipAddress: { $ne: options.ipAddress },
        createdAt: { $gte: new Date(Date.now() - 24 * 3600 * 1000) }
      });

      if (recentSessions > 0) riskScore += 0.2;
    }

    // Time-based risk assessment
    const currentHour = new Date().getHours();
    if (currentHour < 6 || currentHour > 23) {
      riskScore += 0.1;
    }

    // Device change assessment
    if (this.config.enableDeviceTracking && options.userAgent) {
      const knownDevice = await this.collections.deviceFingerprints.findOne({
        userId: userId,
        userAgent: options.userAgent
      });

      if (!knownDevice) riskScore += 0.3;
    }

    return Math.min(riskScore, 1.0);
  }

  async enforceSessionLimits(userId) {
    const sessionCount = await this.collections.userSessions.countDocuments({
      userId: userId,
      active: true,
      expiresAt: { $gt: new Date() }
    });

    if (sessionCount >= this.config.maxSessionsPerUser) {
      // Remove oldest sessions
      const sessionsToRemove = await this.collections.userSessions
        .find({
          userId: userId,
          active: true,
          expiresAt: { $gt: new Date() }
        })
        .sort({ lastAccessed: 1 })
        .limit(sessionCount - this.config.maxSessionsPerUser + 1)
        .toArray();

      for (const session of sessionsToRemove) {
        await this.invalidateSession(session.sessionId, 'session_limit_exceeded');
      }
    }
  }

  async logSessionEvent(sessionId, eventType, eventData) {
    if (!this.config.enableMetrics) return;

    await this.collections.sessionEvents.insertOne({
      sessionId: sessionId,
      eventType: eventType,
      eventData: eventData,
      timestamp: new Date()
    });
  }

  async updateCacheMetrics(operation, cacheKey, metadata) {
    if (!this.config.enableMetrics) return;

    await this.collections.cacheMetrics.insertOne({
      operation: operation,
      cacheKey: cacheKey,
      namespace: metadata.namespace,
      metadata: metadata,
      timestamp: new Date()
    });
  }

  async compressData(data) {
    const zlib = require('zlib');
    const jsonString = JSON.stringify(data);
    return zlib.deflateSync(jsonString);
  }

  async decompressData(compressedData) {
    const zlib = require('zlib');
    const decompressed = zlib.inflateSync(compressedData);
    return JSON.parse(decompressed.toString());
  }

  async startMaintenanceTasks() {
    // TTL collections handle expiration automatically
    console.log('Maintenance tasks initialized - TTL collections managing automatic cleanup');

    // Optional: Set up additional cleanup for edge cases
    if (this.config.sessionCleanupInterval > 0) {
      setInterval(async () => {
        await this.performMaintenanceCleanup();
      }, this.config.sessionCleanupInterval * 1000);
    }
  }

  async performMaintenanceCleanup() {
    try {
      // Optional cleanup for orphaned records or additional maintenance
      const orphanedSessions = await this.collections.userSessions.countDocuments({
        active: false,
        invalidatedAt: { $lt: new Date(Date.now() - 24 * 3600 * 1000) }
      });

      if (orphanedSessions > 0) {
        console.log(`Found ${orphanedSessions} orphaned sessions for cleanup`);
        // TTL will handle automatic cleanup
      }
    } catch (error) {
      console.warn('Error during maintenance cleanup:', error.message);
    }
  }
}

// Benefits of MongoDB Distributed Caching and Session Management:
// - Automatic TTL expiration with no manual cleanup required
// - Flexible document structure for complex session and cache data
// - Built-in indexing and query optimization for cache and session operations
// - Integrated compression and storage optimization capabilities
// - Sophisticated analytics and metrics collection with automatic retention
// - Advanced security features including risk scoring and device tracking
// - High-performance concurrent access with MongoDB's native concurrency controls
// - Seamless integration with existing MongoDB infrastructure
// - SQL-compatible operations through QueryLeaf for familiar management patterns
// - Distributed consistency and replication support for high availability

module.exports = {
  AdvancedCacheSessionManager
};
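
A minimal usage sketch for the manager above follows; the module path, user identifier, and request values are assumptions, and production code would add error handling around each call.

// Minimal usage sketch for AdvancedCacheSessionManager (module path and values are assumptions)
const { MongoClient } = require('mongodb');
const { AdvancedCacheSessionManager } = require('./advanced-cache-session-manager');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017/?replicaSet=rs0');
  await client.connect();

  const manager = new AdvancedCacheSessionManager(client.db('distributed_cache_system'), {
    defaultSessionTTL: 3600,   // 1 hour sessions
    defaultCacheTTL: 600       // 10 minute cache entries
  });

  // Create a session at login time
  const { sessionId } = await manager.createUserSession('user_42', {
    preferences: { theme: 'dark' }
  }, { ipAddress: '203.0.113.10', userAgent: 'Mozilla/5.0', rememberMe: false });

  // Cache an expensive query result, then read it back
  await manager.setCache('catalog:featured', { productIds: [1, 2, 3] }, {
    namespace: 'catalog',
    ttl: 300,
    tags: ['product_catalog']
  });
  const cached = await manager.getCache('catalog:featured');
  console.log('cache hit:', cached.success);

  // Read session state on a subsequent request
  const session = await manager.getSessionData(sessionId, { incrementPageView: true });
  console.log('session active:', session.success);

  await client.close();
}

main().catch(console.error);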

Understanding MongoDB TTL Collections and Cache Architecture

Advanced TTL Configuration and Performance Optimization

Implement sophisticated TTL strategies for optimal performance and resource management:

// Production-ready MongoDB TTL and cache optimization
class EnterpriseCacheManager extends AdvancedCacheSessionManager {
  constructor(db, enterpriseConfig) {
    super(db, enterpriseConfig);

    this.enterpriseConfig = {
      ...enterpriseConfig,
      enableTieredStorage: true,
      enableCacheWarmup: true,
      enablePredictiveEviction: true,
      enableLoadBalancing: true,
      enableCacheReplication: true,
      enablePerformanceOptimization: true
    };

    this.setupEnterpriseFeatures();
    this.initializeAdvancedCaching();
    this.setupPerformanceMonitoring();
  }

  async implementTieredCaching() {
    console.log('Implementing enterprise tiered caching strategy...');

    const tieredConfig = {
      // Hot tier - frequently accessed data
      hotTier: {
        ttl: 900, // 15 minutes
        maxSize: 100 * 1024 * 1024, // 100MB
        compressionLevel: 'fast'
      },

      // Warm tier - moderately accessed data  
      warmTier: {
        ttl: 3600, // 1 hour
        maxSize: 500 * 1024 * 1024, // 500MB
        compressionLevel: 'balanced'
      },

      // Cold tier - rarely accessed data
      coldTier: {
        ttl: 86400, // 24 hours
        maxSize: 2048 * 1024 * 1024, // 2GB
        compressionLevel: 'maximum'
      }
    };

    return await this.deployTieredCaching(tieredConfig);
  }

  async setupAdvancedAnalytics() {
    console.log('Setting up advanced cache and session analytics...');

    const analyticsConfig = {
      // Real-time metrics
      realtimeMetrics: {
        hitRatioThreshold: 0.8,
        latencyThreshold: 100, // ms
        errorRateThreshold: 0.01,
        memoryUsageThreshold: 0.85
      },

      // Predictive analytics
      predictiveAnalytics: {
        accessPatternLearning: true,
        capacityForecasting: true,
        anomalyDetection: true,
        performanceOptimization: true
      },

      // Business intelligence
      businessIntelligence: {
        userBehaviorAnalysis: true,
        conversionTracking: true,
        sessionQualityScoring: true,
        cacheEfficiencyOptimization: true
      }
    };

    return await this.implementAdvancedAnalytics(analyticsConfig);
  }
}

SQL-Style Caching and Session Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB TTL collections and advanced caching operations:

-- QueryLeaf advanced caching and session management with SQL-familiar syntax

-- Configure TTL collections for automatic expiration management
CREATE COLLECTION user_sessions
WITH (
  ttl_field = 'expiresAt',
  expire_after_seconds = 0,
  storage_engine = 'wiredTiger',
  compression = 'snappy'
);

CREATE COLLECTION application_cache
WITH (
  ttl_field = 'expiresAt', 
  expire_after_seconds = 0,
  storage_engine = 'wiredTiger',
  compression = 'zlib'
);

-- Advanced session management with TTL and complex queries
WITH session_analytics AS (
  SELECT 
    user_id,
    session_id,
    created_at,
    last_accessed,
    expires_at,
    ip_address,
    device_fingerprint,
    risk_score,

    -- Session duration analysis
    EXTRACT(EPOCH FROM last_accessed - created_at) / 60 as session_duration_minutes,

    -- Session status classification
    CASE 
      WHEN last_accessed > CURRENT_TIMESTAMP - INTERVAL '5 minutes' THEN 'active'
      WHEN last_accessed > CURRENT_TIMESTAMP - INTERVAL '30 minutes' THEN 'recent' 
      WHEN last_accessed > CURRENT_TIMESTAMP - INTERVAL '2 hours' THEN 'idle'
      ELSE 'stale'
    END as session_status,

    -- Shopping cart analysis
    CAST(JSON_EXTRACT(session_data, '$.shoppingCart.items') AS JSONB) as cart_items,
    CAST(JSON_EXTRACT(session_data, '$.shoppingCart.total') AS DECIMAL(10,2)) as cart_total,

    -- User preferences analysis
    JSON_EXTRACT(session_data, '$.preferences.theme') as preferred_theme,
    JSON_EXTRACT(session_data, '$.preferences.language') as preferred_language,

    -- Navigation patterns
    JSON_EXTRACT(session_data, '$.navigation.lastPage') as last_page,
    JSON_EXTRACT(session_data, '$.navigation.referrer') as referrer,

    -- Activity metrics
    page_views,
    actions_performed,
    data_transferred,

    -- Security indicators  
    mfa_verified,
    login_method,

    -- Risk assessment
    CASE 
      WHEN risk_score > 0.8 THEN 'high'
      WHEN risk_score > 0.5 THEN 'medium'
      WHEN risk_score > 0.2 THEN 'low'
      ELSE 'minimal'
    END as risk_level

  FROM user_sessions
  WHERE active = true 
    AND expires_at > CURRENT_TIMESTAMP
),

session_aggregations AS (
  SELECT 
    -- Overall session metrics
    COUNT(*) as total_active_sessions,
    COUNT(DISTINCT user_id) as unique_active_users,
    AVG(session_duration_minutes) as avg_session_duration,

    -- Session status distribution
    COUNT(*) FILTER (WHERE session_status = 'active') as active_sessions,
    COUNT(*) FILTER (WHERE session_status = 'recent') as recent_sessions,
    COUNT(*) FILTER (WHERE session_status = 'idle') as idle_sessions,
    COUNT(*) FILTER (WHERE session_status = 'stale') as stale_sessions,

    -- Business metrics
    COUNT(*) FILTER (WHERE JSON_ARRAY_LENGTH(cart_items) > 0) as sessions_with_cart,
    AVG(cart_total) FILTER (WHERE cart_total > 0) as avg_cart_value,
    COUNT(*) FILTER (WHERE JSON_ARRAY_LENGTH(cart_items) > 5) as high_volume_carts,

    -- Security metrics
    COUNT(*) FILTER (WHERE risk_level = 'high') as high_risk_sessions,
    COUNT(*) FILTER (WHERE mfa_verified = true) as mfa_verified_sessions,
    COUNT(DISTINCT device_fingerprint) as unique_devices,
    COUNT(DISTINCT ip_address) as unique_ip_addresses,

    -- Activity analysis
    SUM(page_views) as total_page_views,
    SUM(actions_performed) as total_actions,
    AVG(page_views) as avg_page_views_per_session,
    AVG(actions_performed) as avg_actions_per_session,

    -- Performance insights
    SUM(data_transferred) / (1024 * 1024) as total_data_mb_transferred

  FROM session_analytics
),

cache_performance AS (
  SELECT 
    namespace,
    cache_key,
    created_at,
    expires_at,
    last_accessed,
    data_size,
    hit_count,
    tags,

    -- Cache efficiency metrics
    EXTRACT(EPOCH FROM expires_at - created_at) / 3600 as ttl_hours,
    EXTRACT(EPOCH FROM last_accessed - created_at) / 60 as lifetime_minutes,
    EXTRACT(EPOCH FROM CURRENT_TIMESTAMP - last_accessed) / 60 as idle_minutes,

    -- Data size analysis
    CASE 
      WHEN data_size > 1024 * 1024 THEN 'large'
      WHEN data_size > 100 * 1024 THEN 'medium'
      WHEN data_size > 10 * 1024 THEN 'small'
      ELSE 'tiny'
    END as size_category,

    -- Access pattern analysis
    CASE 
      WHEN hit_count > 100 THEN 'high_traffic'
      WHEN hit_count > 10 THEN 'medium_traffic'
      WHEN hit_count > 0 THEN 'low_traffic'
      ELSE 'unused'
    END as traffic_level,

    -- Cache effectiveness
    CASE 
      WHEN hit_count = 0 THEN 'ineffective'
      WHEN hit_count / GREATEST(lifetime_minutes / 60, 1) > 10 THEN 'highly_effective'
      WHEN hit_count / GREATEST(lifetime_minutes / 60, 1) > 1 THEN 'effective'
      ELSE 'moderately_effective'
    END as effectiveness_rating

  FROM application_cache
  WHERE expires_at > CURRENT_TIMESTAMP
),

cache_analytics AS (
  SELECT 
    namespace,

    -- Volume metrics
    COUNT(*) as total_entries,
    SUM(data_size) / (1024 * 1024) as total_size_mb,
    AVG(data_size) as avg_entry_size_bytes,

    -- Performance metrics
    SUM(hit_count) as total_hits,
    AVG(hit_count) as avg_hits_per_entry,
    COUNT(*) FILTER (WHERE hit_count = 0) as unused_entries,

    -- TTL analysis
    AVG(ttl_hours) as avg_ttl_hours,
    COUNT(*) FILTER (WHERE ttl_hours > 24) as long_lived_entries,
    COUNT(*) FILTER (WHERE ttl_hours < 1) as short_lived_entries,

    -- Size distribution
    COUNT(*) FILTER (WHERE size_category = 'large') as large_entries,
    COUNT(*) FILTER (WHERE size_category = 'medium') as medium_entries,
    COUNT(*) FILTER (WHERE size_category = 'small') as small_entries,
    COUNT(*) FILTER (WHERE size_category = 'tiny') as tiny_entries,

    -- Traffic analysis
    COUNT(*) FILTER (WHERE traffic_level = 'high_traffic') as high_traffic_entries,
    COUNT(*) FILTER (WHERE traffic_level = 'medium_traffic') as medium_traffic_entries,
    COUNT(*) FILTER (WHERE traffic_level = 'low_traffic') as low_traffic_entries,
    COUNT(*) FILTER (WHERE traffic_level = 'unused') as unused_traffic_entries,

    -- Effectiveness distribution
    COUNT(*) FILTER (WHERE effectiveness_rating = 'highly_effective') as highly_effective_entries,
    COUNT(*) FILTER (WHERE effectiveness_rating = 'effective') as effective_entries,
    COUNT(*) FILTER (WHERE effectiveness_rating = 'moderately_effective') as moderately_effective_entries,
    COUNT(*) FILTER (WHERE effectiveness_rating = 'ineffective') as ineffective_entries

  FROM cache_performance
  GROUP BY namespace
)

-- Comprehensive session and cache monitoring dashboard
SELECT 
  'System Performance Dashboard' as dashboard_title,
  CURRENT_TIMESTAMP as report_generated_at,

  -- Session management metrics
  JSON_OBJECT(
    'total_active_sessions', sa.total_active_sessions,
    'unique_active_users', sa.unique_active_users,
    'avg_session_duration_minutes', ROUND(sa.avg_session_duration::NUMERIC, 1),
    'session_distribution', JSON_OBJECT(
      'active', sa.active_sessions,
      'recent', sa.recent_sessions,
      'idle', sa.idle_sessions,
      'stale', sa.stale_sessions
    ),
    'security_metrics', JSON_OBJECT(
      'high_risk_sessions', sa.high_risk_sessions,
      'mfa_verified_sessions', sa.mfa_verified_sessions,
      'unique_devices', sa.unique_devices,
      'unique_ip_addresses', sa.unique_ip_addresses
    ),
    'business_metrics', JSON_OBJECT(
      'sessions_with_cart', sa.sessions_with_cart,
      'avg_cart_value', ROUND(sa.avg_cart_value::NUMERIC, 2),
      'high_volume_carts', sa.high_volume_carts,
      'cart_conversion_rate', 
        ROUND((sa.sessions_with_cart::FLOAT / sa.total_active_sessions * 100)::NUMERIC, 2)
    )
  ) as session_metrics,

  -- Cache performance metrics by namespace
  JSON_OBJECT_AGG(
    ca.namespace,
    JSON_OBJECT(
      'total_entries', ca.total_entries,
      'total_size_mb', ROUND(ca.total_size_mb::NUMERIC, 2),
      'avg_entry_size_kb', ROUND((ca.avg_entry_size_bytes / 1024)::NUMERIC, 1),
      'total_hits', ca.total_hits,
      'avg_hits_per_entry', ROUND(ca.avg_hits_per_entry::NUMERIC, 1),
      'unused_entry_percentage', 
        ROUND((ca.unused_entries::FLOAT / ca.total_entries * 100)::NUMERIC, 1),
      'cache_efficiency', 
        CASE 
          WHEN ca.unused_entries::FLOAT / ca.total_entries > 0.5 THEN 'poor'
          WHEN ca.unused_entries::FLOAT / ca.total_entries > 0.2 THEN 'fair'
          WHEN ca.unused_entries::FLOAT / ca.total_entries > 0.1 THEN 'good'
          ELSE 'excellent'
        END,
      'size_distribution', JSON_OBJECT(
        'large', ca.large_entries,
        'medium', ca.medium_entries,
        'small', ca.small_entries,
        'tiny', ca.tiny_entries
      ),
      'effectiveness_distribution', JSON_OBJECT(
        'highly_effective', ca.highly_effective_entries,
        'effective', ca.effective_entries,
        'moderately_effective', ca.moderately_effective_entries,
        'ineffective', ca.ineffective_entries
      )
    )
  ) as cache_metrics_by_namespace,

  -- System health indicators
  JSON_OBJECT(
    'session_system_health', 
      CASE 
        WHEN sa.high_risk_sessions::FLOAT / sa.total_active_sessions > 0.1 THEN 'critical'
        WHEN sa.avg_session_duration < 5 THEN 'warning'
        WHEN sa.unique_active_users::FLOAT / sa.total_active_sessions < 0.5 THEN 'warning'
        ELSE 'healthy'
      END,
    'cache_system_health',
      CASE 
        WHEN AVG(ca.unused_entries::FLOAT / ca.total_entries) > 0.5 THEN 'critical'
        WHEN AVG(ca.total_size_mb) > 1024 THEN 'warning'  
        WHEN AVG(ca.avg_hits_per_entry) < 5 THEN 'warning'
        ELSE 'healthy'
      END,
    'overall_system_status',
      CASE 
        WHEN sa.high_risk_sessions > 10 OR AVG(ca.unused_entries::FLOAT / ca.total_entries) > 0.5 THEN 'needs_attention'
        WHEN sa.avg_session_duration > 30 AND AVG(ca.avg_hits_per_entry) > 10 THEN 'optimal'
        ELSE 'normal'
      END
  ) as system_health,

  -- Operational recommendations
  ARRAY[
    CASE WHEN sa.high_risk_sessions > sa.total_active_sessions * 0.05 
         THEN 'Review session security policies and risk scoring algorithms' END,
    CASE WHEN AVG(ca.unused_entries::FLOAT / ca.total_entries) > 0.3
         THEN 'Optimize cache TTL settings and review caching strategies' END,
    CASE WHEN sa.avg_session_duration < 5
         THEN 'Investigate user engagement issues and session timeout settings' END,
    CASE WHEN AVG(ca.total_size_mb) > 512
         THEN 'Consider cache data compression and size optimization' END,
    CASE WHEN sa.sessions_with_cart::FLOAT / sa.total_active_sessions < 0.1
         THEN 'Review shopping cart functionality and user experience' END
  ]::TEXT[] as recommendations

FROM session_aggregations sa
CROSS JOIN cache_analytics ca
GROUP BY sa.total_active_sessions, sa.unique_active_users, sa.avg_session_duration,
         sa.active_sessions, sa.recent_sessions, sa.idle_sessions, sa.stale_sessions,
         sa.high_risk_sessions, sa.mfa_verified_sessions, sa.unique_devices, sa.unique_ip_addresses,
         sa.sessions_with_cart, sa.avg_cart_value, sa.high_volume_carts;

-- Advanced cache invalidation with pattern matching and conditional logic
UPDATE application_cache 
SET expires_at = CURRENT_TIMESTAMP,
    invalidated_at = CURRENT_TIMESTAMP,
    invalidation_reason = 'product_update_cascade'
WHERE 
  -- Pattern-based invalidation
  (cache_key LIKE 'product:%' OR cache_key LIKE 'catalog:%')
  AND 
  -- Conditional invalidation based on tags
  ('product_catalog' = ANY(tags) OR 'inventory' = ANY(tags))
  AND
  -- Time-based invalidation criteria
  created_at < CURRENT_TIMESTAMP - INTERVAL '1 hour'
  AND
  -- Size-based invalidation for large entries
  data_size > 1024 * 1024; -- 1MB

-- Session cleanup with advanced criteria and security considerations
UPDATE user_sessions 
SET active = false,
    expires_at = CURRENT_TIMESTAMP,
    invalidated_at = CURRENT_TIMESTAMP,
    invalidation_reason = 'security_cleanup'
WHERE 
  -- Risk-based cleanup
  risk_score > 0.8
  OR
  -- Inactive session cleanup
  (last_accessed < CURRENT_TIMESTAMP - INTERVAL '2 hours' AND remember_me = false)
  OR
  -- Device anomaly cleanup
  (device_fingerprint NOT IN (
    SELECT ds.device_fingerprint 
    FROM user_sessions ds
    WHERE ds.user_id = user_sessions.user_id 
      AND ds.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days'
    GROUP BY ds.device_fingerprint
    HAVING COUNT(*) > 5
  ))
  OR
  -- Geographic anomaly cleanup (IP prefix not seen for this user in the last 7 days)
  (SPLIT_PART(ip_address::TEXT, '.', 1) || '.' || SPLIT_PART(ip_address::TEXT, '.', 2) NOT IN (
    SELECT DISTINCT SPLIT_PART(recent_sessions.ip_address::TEXT, '.', 1) || '.' || SPLIT_PART(recent_sessions.ip_address::TEXT, '.', 2)
    FROM user_sessions recent_sessions
    WHERE recent_sessions.user_id = user_sessions.user_id
      AND recent_sessions.created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days'
  ));

-- Real-time TTL and performance monitoring
CREATE VIEW cache_session_health_monitor AS
WITH real_time_metrics AS (
  SELECT 
    -- Current timestamp for dashboard refresh
    CURRENT_TIMESTAMP as monitor_timestamp,

    -- Active session metrics
    (SELECT COUNT(*) FROM user_sessions 
     WHERE active = true AND expires_at > CURRENT_TIMESTAMP) as current_active_sessions,

    (SELECT COUNT(DISTINCT user_id) FROM user_sessions 
     WHERE active = true AND expires_at > CURRENT_TIMESTAMP) as current_unique_users,

    -- Cache metrics
    (SELECT COUNT(*) FROM application_cache 
     WHERE expires_at > CURRENT_TIMESTAMP) as current_cache_entries,

    (SELECT SUM(data_size) / (1024 * 1024) FROM application_cache 
     WHERE expires_at > CURRENT_TIMESTAMP) as current_cache_size_mb,

    -- Recent activity (last 5 minutes)
    (SELECT COUNT(*) FROM user_sessions 
     WHERE last_accessed >= CURRENT_TIMESTAMP - INTERVAL '5 minutes') as recent_session_activity,

    (SELECT COUNT(*) FROM application_cache 
     WHERE last_accessed >= CURRENT_TIMESTAMP - INTERVAL '5 minutes') as recent_cache_activity,

    -- TTL efficiency metrics
    (SELECT COUNT(*) FROM user_sessions 
     WHERE expires_at <= CURRENT_TIMESTAMP + INTERVAL '1 minute') as sessions_expiring_soon,

    (SELECT COUNT(*) FROM application_cache 
     WHERE expires_at <= CURRENT_TIMESTAMP + INTERVAL '1 minute') as cache_expiring_soon,

    -- Risk indicators
    (SELECT COUNT(*) FROM user_sessions 
     WHERE active = true AND risk_score > 0.7) as high_risk_active_sessions
)

SELECT 
  monitor_timestamp,

  -- Session health indicators
  current_active_sessions,
  current_unique_users,
  ROUND(current_unique_users::FLOAT / NULLIF(current_active_sessions, 0), 2) as user_session_ratio,
  recent_session_activity,
  sessions_expiring_soon,

  -- Cache health indicators  
  current_cache_entries,
  ROUND(current_cache_size_mb::NUMERIC, 2) as cache_size_mb,
  recent_cache_activity,
  cache_expiring_soon,

  -- Security indicators
  high_risk_active_sessions,
  CASE 
    WHEN high_risk_active_sessions > current_active_sessions * 0.1 THEN 'critical'
    WHEN high_risk_active_sessions > current_active_sessions * 0.05 THEN 'warning'
    ELSE 'normal'
  END as security_status,

  -- Performance indicators
  CASE 
    WHEN current_active_sessions > 10000 THEN 'high_load'
    WHEN current_active_sessions > 5000 THEN 'medium_load'
    WHEN current_active_sessions > 1000 THEN 'normal_load'
    ELSE 'low_load'
  END as system_load,

  -- Cache efficiency
  CASE 
    WHEN recent_cache_activity::FLOAT / NULLIF(current_cache_entries, 0) > 0.1 THEN 'highly_active'
    WHEN recent_cache_activity::FLOAT / NULLIF(current_cache_entries, 0) > 0.05 THEN 'moderately_active'
    ELSE 'low_activity'
  END as cache_activity_level,

  -- TTL management effectiveness
  CASE 
    WHEN sessions_expiring_soon + cache_expiring_soon > 100 THEN 'high_turnover'
    WHEN sessions_expiring_soon + cache_expiring_soon > 50 THEN 'moderate_turnover'
    ELSE 'stable_turnover'
  END as ttl_turnover_rate

FROM real_time_metrics;

-- QueryLeaf provides comprehensive MongoDB TTL and caching capabilities:
-- 1. Automatic TTL expiration management with SQL-familiar syntax
-- 2. Advanced session lifecycle management with security features
-- 3. Intelligent cache invalidation patterns and strategies
-- 4. Real-time performance monitoring and health assessments
-- 5. Flexible document structure support for complex cache and session data
-- 6. Built-in compression and storage optimization capabilities
-- 7. Sophisticated analytics and business intelligence integration
-- 8. Advanced security features including risk scoring and anomaly detection
-- 9. High-performance concurrent access with MongoDB's native capabilities
-- 10. Enterprise-grade scalability and distributed consistency support

Best Practices for Production Caching and Session Management

TTL Collection Strategy Design

Essential principles for effective MongoDB TTL implementation:

  1. TTL Configuration: Design TTL strategies that balance performance, storage costs, and data availability requirements
  2. Index Optimization: Implement appropriate indexing strategies for cache and session access patterns
  3. Data Compression: Use MongoDB compression features to optimize storage for large cache and session data
  4. Security Integration: Implement comprehensive security measures including risk scoring and device tracking
  5. Performance Monitoring: Deploy real-time monitoring and alerting for cache and session system health
  6. Scalability Planning: Design caching architecture that can scale with user growth and data volume increases
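
To make the TTL configuration, indexing, and compression principles concrete, here is a minimal sketch (assuming a user_sessions collection and the standard Node.js driver) that creates the TTL index, a supporting lookup index, and zstd block compression:

// Minimal TTL setup sketch - collection and field names are assumptions
const { MongoClient } = require('mongodb');

async function setupSessionCollection(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db('app');

  // Optional: zstd block compression for large session payloads
  await db.createCollection('user_sessions', {
    storageEngine: { wiredTiger: { configString: 'block_compressor=zstd' } }
  }).catch(err => { if (err.codeName !== 'NamespaceExists') throw err; });

  const sessions = db.collection('user_sessions');

  // TTL index: documents are removed once their expiresAt value has passed
  await sessions.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 });

  // Supporting index for the dominant access pattern (lookup by user, newest first)
  await sessions.createIndex({ userId: 1, lastAccessed: -1 });

  await client.close();
}

Setting expireAfterSeconds to 0 delegates the expiration time entirely to the expiresAt value stored in each document, which keeps per-session TTL flexible without rebuilding the index.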

Enterprise Deployment Considerations

Optimize caching and session management for production environments:

  1. High Availability: Implement distributed session and cache management across multiple nodes
  2. Data Consistency: Ensure cache and session consistency across distributed infrastructure
  3. Disaster Recovery: Design backup and recovery procedures for critical session and cache data
  4. Compliance Integration: Meet regulatory requirements for session data handling and cache security
  5. Cost Optimization: Monitor and optimize caching costs while maintaining performance requirements
  6. Operational Integration: Integrate with existing monitoring, alerting, and operational workflows
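
For the high-availability and data-consistency points above, a brief sketch (a hedged illustration with assumed database and collection names) shows how session writes can be acknowledged by a replica-set majority and read back with majority read concern:

// Replica-set-aware session write/read sketch - names are assumptions
const { MongoClient } = require('mongodb');

async function touchSession(uri, sessionId) {
  const client = new MongoClient(uri, {
    readConcern: { level: 'majority' },
    writeConcern: { w: 'majority', j: true }
  });
  await client.connect();
  const sessions = client.db('app').collection('user_sessions');

  // Acknowledged by a majority of replica-set members before returning
  await sessions.updateOne(
    { _id: sessionId },
    { $set: { lastAccessed: new Date() } }
  );

  // Majority read concern avoids observing writes that could later be rolled back
  const doc = await sessions.findOne({ _id: sessionId });
  await client.close();
  return doc;
}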

Conclusion

MongoDB TTL Collections provide comprehensive distributed caching and session management capabilities that eliminate the complexity of traditional cache servers and session stores while offering superior flexibility, performance, and integration with existing application infrastructure. Native TTL expiration, advanced document modeling, and integrated analytics enable sophisticated caching strategies without requiring additional operational overhead.

Key MongoDB caching and session management benefits include:

  • Automatic TTL Management: Built-in expiration handling with no manual cleanup or maintenance required
  • Flexible Data Models: Support for complex nested session and cache data structures with efficient querying
  • Integrated Security: Comprehensive security features including risk scoring, device tracking, and anomaly detection
  • High Performance: Native MongoDB performance optimizations for concurrent cache and session operations
  • Advanced Analytics: Sophisticated metrics and business intelligence capabilities for optimization insights
  • SQL Accessibility: Familiar SQL-style operations through QueryLeaf for accessible cache and session management

Whether you're building high-traffic web applications, e-commerce platforms, IoT systems, or enterprise applications requiring sophisticated state management, MongoDB TTL Collections with QueryLeaf's familiar SQL interface provide the foundation for reliable, scalable, and efficient distributed caching and session management.

QueryLeaf Integration: QueryLeaf automatically translates SQL-style TTL and caching operations into MongoDB's native TTL collections and indexing strategies, making advanced caching and session management accessible to SQL-oriented development teams. Complex cache invalidation patterns, session analytics, and performance optimization are seamlessly handled through familiar SQL constructs, enabling sophisticated distributed state management without requiring deep MongoDB TTL expertise.

The combination of MongoDB's robust TTL capabilities with SQL-style caching operations makes it an ideal platform for applications requiring both high-performance state management and familiar database interaction patterns, ensuring your distributed systems can maintain optimal performance and user experience as they scale and evolve.

MongoDB Transactions and ACID Compliance in Distributed Systems: Multi-Document Consistency Patterns with SQL-Familiar Transaction Management

Modern distributed applications require reliable data consistency guarantees across multiple operations, ensuring that complex business workflows maintain data integrity even in the presence of concurrent access, system failures, and network partitions. MongoDB's multi-document transactions provide ACID compliance that enables traditional database consistency patterns while maintaining the flexibility and scalability of document-based data models.

MongoDB transactions support full ACID properties (Atomicity, Consistency, Isolation, Durability) across multiple documents, collections, and even databases within replica sets and sharded clusters, enabling complex business operations to maintain consistency without sacrificing the performance and flexibility advantages of NoSQL document storage.
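
Before examining the traditional approach below, here is a minimal sketch of the core transaction API in the Node.js driver; database and collection names are placeholders:

// Minimal multi-document transaction sketch - names are placeholders
const { MongoClient } = require('mongodb');

async function atomicOrderExample(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db('shop');
  const session = client.startSession();

  try {
    // withTransaction commits all enclosed operations atomically and
    // retries transient errors automatically
    await session.withTransaction(async () => {
      await db.collection('inventory').updateOne(
        { sku: 'ABC-1', qty: { $gte: 1 } },
        { $inc: { qty: -1 } },
        { session }
      );
      await db.collection('orders').insertOne(
        { sku: 'ABC-1', qty: 1, createdAt: new Date() },
        { session }
      );
    }, { writeConcern: { w: 'majority' } });
  } finally {
    await session.endSession();
    await client.close();
  }
}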

The Distributed Data Consistency Challenge

Traditional approaches to maintaining consistency in distributed document systems often require complex application-level coordination:

// Traditional approach without transactions - complex error-prone coordination
async function transferFundsBetweenAccountsWithoutTransactions(fromAccountId, toAccountId, amount) {
  try {
    // Step 1: Check sufficient balance
    const fromAccount = await db.accounts.findOne({ _id: fromAccountId });
    if (!fromAccount || fromAccount.balance < amount) {
      throw new Error('Insufficient funds');
    }

    // Step 2: Deduct from source account
    const debitResult = await db.accounts.updateOne(
      { _id: fromAccountId, balance: { $gte: amount } },
      { $inc: { balance: -amount } }
    );

    if (debitResult.matchedCount === 0) {
      throw new Error('Concurrent modification - insufficient funds');
    }

    // Step 3: Add to destination account
    const creditResult = await db.accounts.updateOne(
      { _id: toAccountId },
      { $inc: { balance: amount } }
    );

    if (creditResult.matchedCount === 0) {
      // Rollback: Add money back to source account
      await db.accounts.updateOne(
        { _id: fromAccountId },
        { $inc: { balance: amount } }
      );
      throw new Error('Failed to credit destination account');
    }

    // Step 4: Record transaction history
    const historyResult = await db.transactionHistory.insertOne({
      fromAccount: fromAccountId,
      toAccount: toAccountId,
      amount: amount,
      type: 'transfer',
      timestamp: new Date(),
      status: 'completed'
    });

    if (!historyResult.insertedId) {
      // Complex rollback required
      await db.accounts.updateOne({ _id: fromAccountId }, { $inc: { balance: amount } });
      await db.accounts.updateOne({ _id: toAccountId }, { $inc: { balance: -amount } });
      throw new Error('Failed to record transaction history');
    }

    // Step 5: Update account statistics
    await db.accountStats.updateOne(
      { accountId: fromAccountId },
      { 
        $inc: { totalDebits: amount, transactionCount: 1 },
        $set: { lastActivity: new Date() }
      },
      { upsert: true }
    );

    await db.accountStats.updateOne(
      { accountId: toAccountId },
      { 
        $inc: { totalCredits: amount, transactionCount: 1 },
        $set: { lastActivity: new Date() }
      },
      { upsert: true }
    );

    return {
      success: true,
      transactionId: historyResult.insertedId,
      fromAccountBalance: fromAccount.balance - amount,
      timestamp: new Date()
    };

  } catch (error) {
    // Complex error recovery and partial rollback logic required
    console.error('Transfer failed:', error.message);

    // Attempt to verify and correct any partial updates
    try {
      // Check for orphaned updates and compensate
      await validateAndCompensatePartialTransfer(fromAccountId, toAccountId, amount);
    } catch (compensationError) {
      console.error('Compensation failed:', compensationError.message);
      // Manual intervention may be required
    }

    throw error;
  }
}

// Problems with non-transactional approaches:
// 1. Complex rollback logic for partial failures
// 2. Race conditions between concurrent operations
// 3. Potential data inconsistency during failure scenarios
// 4. Manual compensation logic for error recovery
// 5. Difficult to guarantee atomic multi-document operations
// 6. Complex error handling and state management
// 7. Risk of phantom reads and dirty reads
// 8. No isolation guarantees for concurrent access
// 9. Difficult to implement complex business rules atomically
// 10. Manual coordination across multiple collections and operations

async function validateAndCompensatePartialTransfer(fromAccountId, toAccountId, amount) {
  // Complex validation and compensation logic
  const fromAccount = await db.accounts.findOne({ _id: fromAccountId });
  const toAccount = await db.accounts.findOne({ _id: toAccountId });

  // Check for partial transfer state
  const recentHistory = await db.transactionHistory.findOne({
    fromAccount: fromAccountId,
    toAccount: toAccountId,
    amount: amount,
    timestamp: { $gte: new Date(Date.now() - 60000) } // Last minute
  });

  if (!recentHistory) {
    // No history recorded - check if money was debited but not credited
    const expectedFromBalance = fromAccount.originalBalance - amount; // This is problematic - we don't know original balance

    // Complex logic to determine correct state and compensate
    if (fromAccount.balance < expectedFromBalance) {
      // Money was debited but not credited - complete the transfer
      await db.accounts.updateOne(
        { _id: toAccountId },
        { $inc: { balance: amount } }
      );

      await db.transactionHistory.insertOne({
        fromAccount: fromAccountId,
        toAccount: toAccountId,
        amount: amount,
        type: 'transfer_compensation',
        timestamp: new Date(),
        status: 'compensated'
      });
    }
  }

  // Additional complex state validation and recovery logic...
}

// Traditional batch processing with manual consistency management
async function processOrderBatchWithoutTransactions(orders) {
  const processedOrders = [];
  const failedOrders = [];

  for (const order of orders) {
    try {
      // Step 1: Validate inventory
      const inventoryCheck = await db.inventory.findOne({ 
        productId: order.productId,
        quantity: { $gte: order.quantity }
      });

      if (!inventoryCheck) {
        failedOrders.push({ order, reason: 'insufficient_inventory' });
        continue;
      }

      // Step 2: Reserve inventory
      const inventoryUpdate = await db.inventory.updateOne(
        { 
          productId: order.productId, 
          quantity: { $gte: order.quantity }
        },
        { $inc: { quantity: -order.quantity, reserved: order.quantity } }
      );

      if (inventoryUpdate.matchedCount === 0) {
        failedOrders.push({ order, reason: 'inventory_update_failed' });
        continue;
      }

      // Step 3: Create order record
      const orderResult = await db.orders.insertOne({
        ...order,
        status: 'confirmed',
        createdAt: new Date(),
        inventoryReserved: true
      });

      if (!orderResult.insertedId) {
        // Rollback inventory reservation
        await db.inventory.updateOne(
          { productId: order.productId },
          { $inc: { quantity: order.quantity, reserved: -order.quantity } }
        );
        failedOrders.push({ order, reason: 'order_creation_failed' });
        continue;
      }

      // Step 4: Update customer statistics
      await db.customerStats.updateOne(
        { customerId: order.customerId },
        { 
          $inc: { 
            totalOrders: 1, 
            totalSpent: order.total 
          },
          $set: { lastOrderDate: new Date() }
        },
        { upsert: true }
      );

      processedOrders.push({
        orderId: orderResult.insertedId,
        customerId: order.customerId,
        productId: order.productId,
        status: 'completed'
      });

    } catch (error) {
      console.error('Order processing failed:', error);

      // Attempt partial cleanup - complex and error-prone
      try {
        await cleanupPartialOrder(order);
      } catch (cleanupError) {
        console.error('Cleanup failed for order:', order.orderId, cleanupError);
      }

      failedOrders.push({ 
        order, 
        reason: 'processing_error', 
        error: error.message 
      });
    }
  }

  return {
    processed: processedOrders,
    failed: failedOrders,
    summary: {
      total: orders.length,
      successful: processedOrders.length,
      failed: failedOrders.length
    }
  };
}

MongoDB transactions eliminate this complexity through ACID-compliant multi-document operations:

// MongoDB transactions - simple, reliable, ACID-compliant operations
const { MongoClient, ObjectId } = require('mongodb');

class DistributedTransactionManager {
  constructor(mongoClient) {
    this.client = mongoClient;
    this.db = mongoClient.db('financial_platform');

    // Collections for transactional operations
    this.collections = {
      accounts: this.db.collection('accounts'),
      transactions: this.db.collection('transactions'),
      accountStats: this.db.collection('account_statistics'),
      auditLog: this.db.collection('audit_log'),
      orders: this.db.collection('orders'),
      inventory: this.db.collection('inventory'),
      customers: this.db.collection('customers'),
      notifications: this.db.collection('notifications')
    };

    // Transaction configuration for different operation types
    this.transactionConfig = {
      // Financial operations require strict consistency
      financial: {
        readConcern: { level: 'snapshot' },
        writeConcern: { w: 'majority', j: true, wtimeout: 5000 },
        readPreference: 'primary',
        maxCommitTimeMS: 10000
      },

      // Business operations with balanced performance/consistency
      business: {
        readConcern: { level: 'majority' },
        writeConcern: { w: 'majority', j: true, wtimeout: 3000 },
        readPreference: 'primaryPreferred',
        maxCommitTimeMS: 8000
      },

      // Analytics operations allowing eventual consistency
      analytics: {
        readConcern: { level: 'available' },
        writeConcern: { w: 1, wtimeout: 2000 },
        readPreference: 'secondaryPreferred',
        maxCommitTimeMS: 5000
      }
    };
  }

  async transferFunds(fromAccountId, toAccountId, amount, metadata = {}) {
    const session = this.client.startSession();

    try {
      const result = await session.withTransaction(async () => {
        // All operations within this function are executed atomically

        // Step 1: Validate and lock source account with optimistic concurrency
        const fromAccount = await this.collections.accounts.findOneAndUpdate(
          { 
            _id: fromAccountId,
            balance: { $gte: amount },
            status: 'active',
            locked: { $ne: true }
          },
          { 
            $inc: { balance: -amount, version: 1 },
            $set: { 
              lastModified: new Date()
            }
          },
          { 
            session,
            returnDocument: 'before' // Get account state before modification
          }
        );

        if (!fromAccount.value) {
          throw new Error('Insufficient funds or account locked');
        }

        // Step 2: Credit destination account
        const toAccount = await this.collections.accounts.findOneAndUpdate(
          { 
            _id: toAccountId,
            status: 'active'
          },
          { 
            $inc: { balance: amount, version: 1 },
            $set: { 
              lastModified: new Date()
            }
          },
          { 
            session,
            returnDocument: 'after' // Get account state after modification
          }
        );

        if (!toAccount.value) {
          throw new Error('Invalid destination account');
        }

        // Step 3: Create transaction record with detailed information
        const transactionRecord = {
          type: 'transfer',
          fromAccount: {
            id: fromAccountId,
            balanceBefore: fromAccount.value.balance,
            balanceAfter: fromAccount.value.balance - amount
          },
          toAccount: {
            id: toAccountId,
            balanceBefore: toAccount.value.balance - amount,
            balanceAfter: toAccount.value.balance
          },
          amount: amount,
          currency: fromAccount.value.currency || 'USD',
          timestamp: new Date(),
          status: 'completed',
          metadata: {
            ...metadata,
            ipAddress: metadata.clientIp,
            userAgent: metadata.userAgent,
            requestId: metadata.requestId
          },
          fees: {
            transferFee: 0, // Could be calculated based on business rules
            exchangeFee: 0
          },
          compliance: {
            amlChecked: true,
            fraudScore: metadata.fraudScore || 0,
            riskLevel: metadata.riskLevel || 'low'
          }
        };

        const transactionResult = await this.collections.transactions.insertOne(
          transactionRecord,
          { session }
        );

        // Step 4: Update account statistics atomically
        // Note: operations that share a session must run sequentially - a MongoDB
        // session does not support concurrent operations within a transaction.
        await this.collections.accountStats.updateOne(
          { accountId: fromAccountId },
          {
            $inc: { 
              totalDebits: amount,
              transactionCount: 1,
              outgoingTransferCount: 1
            },
            $set: { 
              lastActivity: new Date(),
              lastDebitAmount: amount
            },
            $push: {
              recentTransactions: {
                $each: [transactionResult.insertedId],
                $slice: -100 // Keep only last 100 transactions
              }
            }
          },
          { session, upsert: true }
        );

        await this.collections.accountStats.updateOne(
          { accountId: toAccountId },
          {
            $inc: { 
              totalCredits: amount,
              transactionCount: 1,
              incomingTransferCount: 1
            },
            $set: { 
              lastActivity: new Date(),
              lastCreditAmount: amount
            },
            $push: {
              recentTransactions: {
                $each: [transactionResult.insertedId],
                $slice: -100
              }
            }
          },
          { session, upsert: true }
        );

        // Step 5: Create audit log entry
        await this.collections.auditLog.insertOne({
          eventType: 'funds_transfer',
          entityType: 'account',
          entities: [fromAccountId, toAccountId],
          transactionId: transactionResult.insertedId,
          changes: {
            fromAccount: {
              balanceChange: -amount,
              newBalance: fromAccount.value.balance - amount
            },
            toAccount: {
              balanceChange: amount,
              newBalance: toAccount.value.balance
            }
          },
          metadata: metadata,
          timestamp: new Date(),
          sessionId: session.id
        }, { session });

        // Step 6: Trigger notifications if required
        if (amount >= 1000 || metadata.notifyUsers) {
          await this.collections.notifications.insertMany([
            {
              userId: fromAccount.value.userId,
              type: 'debit_notification',
              title: 'Funds Transfer Sent',
              message: `$${amount} transferred to account ${toAccountId}`,
              amount: amount,
              relatedTransactionId: transactionResult.insertedId,
              createdAt: new Date(),
              status: 'pending',
              priority: amount >= 10000 ? 'high' : 'normal'
            },
            {
              userId: toAccount.value.userId,
              type: 'credit_notification',
              title: 'Funds Transfer Received',
              message: `$${amount} received from account ${fromAccountId}`,
              amount: amount,
              relatedTransactionId: transactionResult.insertedId,
              createdAt: new Date(),
              status: 'pending',
              priority: amount >= 10000 ? 'high' : 'normal'
            }
          ], { session });
        }

        // Return comprehensive transaction result
        return {
          success: true,
          transactionId: transactionResult.insertedId,
          fromAccount: {
            id: fromAccountId,
            previousBalance: fromAccount.value.balance,
            newBalance: fromAccount.value.balance - amount
          },
          toAccount: {
            id: toAccountId,
            previousBalance: toAccount.value.balance - amount,
            newBalance: toAccount.value.balance
          },
          amount: amount,
          timestamp: transactionRecord.timestamp,
          fees: transactionRecord.fees,
          metadata: transactionRecord.metadata
        };

      }, this.transactionConfig.financial);

      return result;

    } catch (error) {
      console.error('Transaction failed:', error.message);

      // All changes are automatically rolled back by MongoDB
      throw new Error(`Transfer failed: ${error.message}`);

    } finally {
      await session.endSession();
    }
  }

  async processComplexOrder(orderData) {
    const session = this.client.startSession();

    try {
      const result = await session.withTransaction(async () => {
        // Complex multi-collection atomic operation

        // Step 1: Validate customer and apply discounts
        const customer = await this.collections.customers.findOneAndUpdate(
          { _id: orderData.customerId, status: 'active' },
          {
            $inc: { orderCount: 1 },
            $set: { lastOrderDate: new Date() }
          },
          { session, returnDocument: 'after' }
        );

        if (!customer.value) {
          throw new Error('Invalid customer');
        }

        // Calculate dynamic pricing based on customer tier
        const discountRate = this.calculateCustomerDiscount(customer.value);
        const discountedTotal = orderData.subtotal * (1 - discountRate);

        // Step 2: Reserve inventory for all items atomically
        // Reservations run sequentially because a session cannot execute
        // concurrent operations inside a transaction.
        const reservedInventory = [];
        for (const item of orderData.items) {
          const inventoryResult = await this.collections.inventory.findOneAndUpdate(
            {
              productId: item.productId,
              quantity: { $gte: item.quantity },
              status: 'available'
            },
            {
              $inc: { 
                quantity: -item.quantity,
                reserved: item.quantity,
                totalSold: item.quantity
              },
              $set: { lastSaleDate: new Date() }
            },
            { session, returnDocument: 'after' }
          );

          if (!inventoryResult.value) {
            throw new Error(`Insufficient inventory for product ${item.productId}`);
          }

          reservedInventory.push({
            productId: item.productId,
            quantityReserved: item.quantity,
            newAvailableQuantity: inventoryResult.value.quantity,
            unitPrice: item.unitPrice,
            totalPrice: item.unitPrice * item.quantity
          });
        }

        // Step 3: Create comprehensive order record
        const order = {
          _id: orderData.orderId || new ObjectId(),
          customerId: orderData.customerId,
          customerTier: customer.value.tier,
          items: reservedInventory,
          pricing: {
            subtotal: orderData.subtotal,
            discountRate: discountRate,
            discountAmount: orderData.subtotal - discountedTotal,
            total: discountedTotal,
            currency: 'USD'
          },
          fulfillment: {
            status: 'confirmed',
            expectedShipDate: this.calculateShipDate(orderData.shippingMethod),
            shippingMethod: orderData.shippingMethod,
            trackingNumber: null
          },
          payment: {
            method: orderData.paymentMethod,
            status: 'pending',
            processingFee: this.calculateProcessingFee(discountedTotal)
          },
          timestamps: {
            ordered: new Date(),
            confirmed: new Date()
          },
          metadata: orderData.metadata || {}
        };

        const orderResult = await this.collections.orders.insertOne(order, { session });

        // Step 4: Process payment transaction
        if (orderData.paymentMethod === 'account_balance') {
          await this.processAccountPayment(
            orderData.customerId, 
            discountedTotal, 
            orderResult.insertedId,
            session
          );
        }

        // Step 5: Update customer statistics
        await this.collections.customers.updateOne(
          { _id: orderData.customerId },
          {
            $inc: { 
              totalSpent: discountedTotal,
              loyaltyPoints: Math.floor(discountedTotal * 0.1)
            },
            $push: {
              orderHistory: {
                $each: [orderResult.insertedId],
                $slice: -50 // Keep last 50 orders
              }
            }
          },
          { session }
        );

        // Step 6: Create fulfillment tasks
        await this.collections.notifications.insertOne({
          type: 'fulfillment_task',
          orderId: orderResult.insertedId,
          items: reservedInventory,
          priority: customer.value.tier === 'premium' ? 'high' : 'normal',
          assignedTo: null,
          status: 'pending',
          createdAt: new Date()
        }, { session });

        return {
          success: true,
          orderId: orderResult.insertedId,
          customer: {
            id: customer.value._id,
            tier: customer.value.tier,
            newOrderCount: customer.value.orderCount
          },
          order: {
            total: discountedTotal,
            itemsReserved: reservedInventory.length,
            status: 'confirmed'
          },
          inventory: reservedInventory
        };

      }, this.transactionConfig.business);

      return result;

    } catch (error) {
      console.error('Order processing failed:', error.message);
      throw error;

    } finally {
      await session.endSession();
    }
  }

  async batchProcessTransactions(transactions, batchSize = 10) {
    // Process transactions in batches with individual transaction isolation
    const results = [];
    const errors = [];

    for (let i = 0; i < transactions.length; i += batchSize) {
      const batch = transactions.slice(i, i + batchSize);

      const batchPromises = batch.map(async (txn, index) => {
        try {
          const result = await this.executeTransactionByType(txn);
          return { index: i + index, success: true, result };
        } catch (error) {
          return { index: i + index, success: false, error: error.message, transaction: txn };
        }
      });

      const batchResults = await Promise.allSettled(batchPromises);

      batchResults.forEach((promiseResult, batchIndex) => {
        if (promiseResult.status === 'fulfilled') {
          const txnResult = promiseResult.value;
          if (txnResult.success) {
            results.push(txnResult);
          } else {
            errors.push(txnResult);
          }
        } else {
          errors.push({
            index: i + batchIndex,
            success: false,
            error: promiseResult.reason.message,
            transaction: batch[batchIndex]
          });
        }
      });
    }

    return {
      totalProcessed: transactions.length,
      successful: results.length,
      failed: errors.length,
      results: results,
      errors: errors,
      successRate: (results.length / transactions.length) * 100
    };
  }

  async executeTransactionByType(transaction) {
    switch (transaction.type) {
      case 'transfer':
        return await this.transferFunds(
          transaction.fromAccount,
          transaction.toAccount,
          transaction.amount,
          transaction.metadata
        );

      case 'order':
        return await this.processComplexOrder(transaction.orderData);

      case 'payment':
        return await this.processPayment(transaction.paymentData);

      default:
        throw new Error(`Unknown transaction type: ${transaction.type}`);
    }
  }

  // Helper methods for business logic
  calculateCustomerDiscount(customer) {
    const tierDiscounts = {
      'premium': 0.15,
      'gold': 0.10,
      'silver': 0.05,
      'bronze': 0.02,
      'standard': 0.0
    };

    const baseDiscount = tierDiscounts[customer.tier] || 0;
    const orderCountBonus = Math.min(customer.orderCount * 0.001, 0.05);

    return Math.min(baseDiscount + orderCountBonus, 0.25); // Cap at 25%
  }

  calculateShipDate(shippingMethod) {
    const shippingDays = {
      'overnight': 1,
      'express': 2,
      'standard': 5,
      'economy': 7
    };

    const days = shippingDays[shippingMethod] || 5;
    const shipDate = new Date();
    shipDate.setDate(shipDate.getDate() + days);

    return shipDate;
  }

  calculateProcessingFee(amount) {
    return Math.max(amount * 0.029, 0.30); // 2.9% + $0.30 minimum
  }

  async processAccountPayment(customerId, amount, orderId, session) {
    // Simplified for illustration: transferFunds() manages its own session, so this
    // payment runs in a separate transaction from the calling order transaction.
    // A production implementation would thread the caller's `session` through so the
    // debit commits (or aborts) together with the order.
    return await this.transferFunds(
      customerId, // Assuming customer accounts for simplicity
      'merchant_account_id',
      amount,
      { 
        orderId: orderId,
        paymentType: 'order_payment'
      }
    );
  }
}

// Benefits of MongoDB transactions:
// 1. Automatic rollback on any failure - no manual cleanup required
// 2. ACID compliance ensures data consistency across multiple collections
// 3. Isolation levels prevent dirty reads and phantom reads
// 4. Durability guarantees with configurable write concerns
// 5. Simplified error handling - all-or-nothing semantics
// 6. Built-in deadlock detection and resolution
// 7. Performance optimization with snapshot isolation
// 8. Cross-shard transactions in distributed deployments
// 9. Integration with replica sets for high availability
// 10. Familiar transaction patterns for SQL developers
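
The automatic retry behavior noted above is driven by MongoDB's error labels. session.withTransaction() handles them internally; the sketch below shows the equivalent manual pattern with startTransaction/commitTransaction, as a hedged illustration rather than a drop-in replacement for the class above:

// Manual retry pattern for transient transaction errors (illustrative sketch)
async function commitWithRetry(session) {
  // Retry only the commit when its outcome is unknown (e.g. brief network failure)
  while (true) {
    try {
      await session.commitTransaction();
      return;
    } catch (err) {
      if (err.hasErrorLabel && err.hasErrorLabel('UnknownTransactionCommitResult')) {
        continue;
      }
      throw err;
    }
  }
}

async function runTransactionWithRetry(client, txnFn) {
  const session = client.startSession();
  try {
    while (true) {
      try {
        session.startTransaction({ writeConcern: { w: 'majority' } });
        await txnFn(session);
        await commitWithRetry(session);
        return;
      } catch (err) {
        if (session.inTransaction()) {
          await session.abortTransaction();
        }
        // Write conflicts and primary step-downs are labeled transient and safe to retry
        if (err.hasErrorLabel && err.hasErrorLabel('TransientTransactionError')) {
          continue;
        }
        throw err;
      }
    }
  } finally {
    await session.endSession();
  }
}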

Advanced Transaction Patterns and Isolation Levels

Multi-Level Transaction Management

// Advanced transaction patterns for complex distributed scenarios
class AdvancedTransactionPatterns {
  constructor(mongoClient) {
    this.client = mongoClient;
    this.db = mongoClient.db('enterprise_platform');
  }

  async executeNestedBusinessTransaction(businessOperation) {
    const session = this.client.startSession();

    try {
      const result = await session.withTransaction(async () => {
        // Nested transaction pattern with savepoints simulation
        const checkpoints = [];

        try {
          // Checkpoint 1: Customer validation and setup
          const customerValidation = await this.validateAndSetupCustomer(
            businessOperation.customerId, 
            session
          );
          checkpoints.push('customer_validation');

          // Checkpoint 2: Inventory allocation across multiple warehouses
          const inventoryAllocation = await this.allocateInventoryAcrossWarehouses(
            businessOperation.items,
            businessOperation.deliveryLocation,
            businessOperation.customerId,
            session
          );
          checkpoints.push('inventory_allocation');

          // Checkpoint 3: Financial authorization and holds
          const financialAuthorization = await this.authorizePaymentWithHolds(
            businessOperation.paymentDetails,
            inventoryAllocation.totalCost,
            session
          );
          checkpoints.push('financial_authorization');

          // Checkpoint 4: Complex business rules validation
          const businessRulesValidation = await this.validateComplexBusinessRules(
            customerValidation,
            inventoryAllocation,
            financialAuthorization,
            session
          );
          checkpoints.push('business_rules');

          // Checkpoint 5: Finalize all operations atomically
          const finalization = await this.finalizeBusinessOperation(
            businessOperation,
            {
              customer: customerValidation,
              inventory: inventoryAllocation,
              financial: financialAuthorization,
              rules: businessRulesValidation
            },
            session
          );

          return {
            success: true,
            businessOperationId: finalization.operationId,
            checkpointsCompleted: checkpoints,
            details: finalization
          };

        } catch (error) {
          // Enhanced error context with checkpoint information
          throw new Error(`Business transaction failed at ${checkpoints[checkpoints.length - 1] || 'initialization'}: ${error.message}`);
        }

      }, {
        readConcern: { level: 'snapshot' },
        writeConcern: { w: 'majority', j: true },
        maxCommitTimeMS: 30000 // Extended timeout for complex operations
      });

      return result;

    } finally {
      await session.endSession();
    }
  }

  async validateAndSetupCustomer(customerId, session) {
    // Customer validation with comprehensive business context
    const customer = await this.db.collection('customers').findOneAndUpdate(
      {
        _id: customerId,
        status: 'active',
        creditStatus: { $nin: ['suspended', 'blocked'] }
      },
      {
        $set: { lastActivityDate: new Date() },
        $inc: { transactionAttempts: 1 }
      },
      { session, returnDocument: 'after' }
    );

    if (!customer.value) {
      throw new Error('Customer validation failed');
    }

    // Check customer limits and restrictions
    const customerLimits = await this.db.collection('customer_limits').findOne(
      { customerId: customerId },
      { session }
    );

    const riskAssessment = await this.db.collection('risk_assessments').findOne(
      { customerId: customerId, status: 'active' },
      { session }
    );

    return {
      customer: customer.value,
      limits: customerLimits,
      riskProfile: riskAssessment,
      validated: true
    };
  }

  async allocateInventoryAcrossWarehouses(items, deliveryLocation, customerId, session) {
    // Complex inventory allocation across multiple warehouses
    const allocationResults = [];
    let totalCost = 0;

    for (const item of items) {
      // Find optimal warehouse allocation
      const warehouseAllocation = await this.db.collection('warehouse_inventory').aggregate([
        {
          $match: {
            productId: item.productId,
            availableQuantity: { $gte: item.requestedQuantity },
            status: 'active'
          }
        },
        {
          $addFields: {
            // Calculate shipping cost and delivery time
            shippingCost: {
              $multiply: [
                "$shippingRates.base",
                { $add: [1, "$shippingRates.distanceMultiplier"] }
              ]
            },
            estimatedDeliveryDays: {
              $ceil: { $divide: ["$distanceFromDelivery", 500] }
            }
          }
        },
        {
          $sort: {
            shippingCost: 1,
            estimatedDeliveryDays: 1,
            availableQuantity: -1
          }
        },
        { $limit: 1 }
      ], { session }).toArray();

      if (warehouseAllocation.length === 0) {
        throw new Error(`No suitable warehouse found for product ${item.productId}`);
      }

      const selectedWarehouse = warehouseAllocation[0];

      // Reserve inventory atomically
      const reservationResult = await this.db.collection('warehouse_inventory').findOneAndUpdate(
        {
          _id: selectedWarehouse._id,
          availableQuantity: { $gte: item.requestedQuantity }
        },
        {
          $inc: {
            availableQuantity: -item.requestedQuantity,
            reservedQuantity: item.requestedQuantity
          },
          $push: {
            reservations: {
              quantity: item.requestedQuantity,
              reservedAt: new Date(),
              expiresAt: new Date(Date.now() + 30 * 60 * 1000), // 30-minute expiry
              customerId: customerId
            }
          }
        },
        { session, returnDocument: 'after' }
      );

      if (!reservationResult.value) {
        throw new Error(`Failed to reserve inventory for product ${item.productId}`);
      }

      const itemCost = item.requestedQuantity * selectedWarehouse.unitPrice;
      totalCost += itemCost;

      allocationResults.push({
        productId: item.productId,
        warehouseId: selectedWarehouse.warehouseId,
        quantity: item.requestedQuantity,
        unitPrice: selectedWarehouse.unitPrice,
        totalPrice: itemCost,
        shippingCost: selectedWarehouse.shippingCost,
        estimatedDelivery: selectedWarehouse.estimatedDeliveryDays,
        reservationId: reservationResult.value.reservations[reservationResult.value.reservations.length - 1]
      });
    }

    return {
      allocations: allocationResults,
      totalCost: totalCost,
      warehousesInvolved: [...new Set(allocationResults.map(a => a.warehouseId))]
    };
  }

  async authorizePaymentWithHolds(paymentDetails, amount, session) {
    // Financial authorization with temporary holds
    const paymentMethod = await this.db.collection('payment_methods').findOne(
      {
        _id: paymentDetails.paymentMethodId,
        customerId: paymentDetails.customerId,
        status: 'active'
      },
      { session }
    );

    if (!paymentMethod) {
      throw new Error('Invalid payment method');
    }

    // Create payment authorization hold
    const authorizationHold = {
      customerId: paymentDetails.customerId,
      paymentMethodId: paymentDetails.paymentMethodId,
      amount: amount,
      currency: 'USD',
      authorizationCode: this.generateAuthorizationCode(),
      status: 'authorized',
      expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000), // 24-hour expiry
      createdAt: new Date()
    };

    const authResult = await this.db.collection('payment_authorizations').insertOne(
      authorizationHold,
      { session }
    );

    // Update customer available credit if applicable
    if (paymentMethod.type === 'credit_account') {
      await this.db.collection('credit_accounts').updateOne(
        {
          customerId: paymentDetails.customerId,
          availableCredit: { $gte: amount }
        },
        {
          $inc: {
            availableCredit: -amount,
            pendingCharges: amount
          }
        },
        { session }
      );
    }

    return {
      authorizationId: authResult.insertedId,
      authorizationCode: authorizationHold.authorizationCode,
      authorizedAmount: amount,
      expiresAt: authorizationHold.expiresAt,
      paymentMethod: paymentMethod.type
    };
  }

  async validateComplexBusinessRules(customer, inventory, financial, session) {
    // Complex business rules validation
    const businessRules = [];

    // Rule 1: Customer tier restrictions
    const tierRestrictions = await this.db.collection('tier_restrictions').findOne(
      { tier: customer.customer.tier },
      { session }
    );

    if (tierRestrictions && inventory.totalCost > tierRestrictions.maxOrderValue) {
      throw new Error(`Order exceeds maximum value for ${customer.customer.tier} tier`);
    }

    businessRules.push({
      rule: 'tier_restrictions',
      passed: true,
      details: `Order value ${inventory.totalCost} within limits for tier ${customer.customer.tier}`
    });

    // Rule 2: Geographic shipping restrictions
    const shippingRestrictions = await this.db.collection('shipping_restrictions').findOne(
      {
        countries: customer.customer.shippingAddress?.country,
        productCategories: { $in: inventory.allocations.map(a => a.productCategory) }
      },
      { session }
    );

    if (shippingRestrictions?.restricted) {
      throw new Error('Shipping restrictions apply to this order');
    }

    businessRules.push({
      rule: 'shipping_restrictions',
      passed: true,
      details: 'No shipping restrictions found'
    });

    // Rule 3: Fraud detection rules
    const fraudScore = await this.calculateFraudScore(customer, inventory, financial);

    if (fraudScore > 75) {
      throw new Error('Order flagged by fraud detection system');
    }

    businessRules.push({
      rule: 'fraud_detection',
      passed: true,
      details: `Fraud score: ${fraudScore}/100`
    });

    return {
      rulesValidated: businessRules,
      fraudScore: fraudScore,
      allRulesPassed: true
    };
  }

  async finalizeBusinessOperation(operation, validationResults, session) {
    // Create comprehensive business operation record
    const businessOperation = {
      operationType: operation.type,
      customerId: operation.customerId,

      customerDetails: validationResults.customer,
      inventoryAllocation: validationResults.inventory,
      financialAuthorization: validationResults.financial,
      businessRulesValidation: validationResults.rules,

      status: 'completed',
      timestamps: {
        initiated: operation.initiatedAt || new Date(),
        validated: new Date(),
        completed: new Date()
      },

      metadata: {
        requestId: operation.requestId,
        channel: operation.channel || 'api',
        userAgent: operation.userAgent,
        ipAddress: operation.ipAddress
      }
    };

    const operationResult = await this.db.collection('business_operations').insertOne(
      businessOperation,
      { session }
    );

    // Create audit trail
    await this.db.collection('audit_trail').insertOne({
      entityType: 'business_operation',
      entityId: operationResult.insertedId,
      action: 'completed',
      performedBy: operation.customerId,
      details: businessOperation,
      timestamp: new Date()
    }, { session });

    // Trigger post-transaction workflows
    await this.db.collection('workflow_triggers').insertOne({
      triggerType: 'business_operation_completed',
      operationId: operationResult.insertedId,
      workflowsToExecute: [
        'send_confirmation_email',
        'update_customer_analytics',
        'trigger_fulfillment_process',
        'update_inventory_forecasting'
      ],
      priority: 'normal',
      scheduledFor: new Date(),
      status: 'pending'
    }, { session });

    return {
      operationId: operationResult.insertedId,
      completedAt: new Date(),
      summary: {
        customer: validationResults.customer.customer.email,
        totalValue: validationResults.inventory.totalCost,
        itemsAllocated: validationResults.inventory.allocations.length,
        warehousesInvolved: validationResults.inventory.warehousesInvolved.length,
        authorizationCode: validationResults.financial.authorizationCode
      }
    };
  }

  generateAuthorizationCode() {
    // Illustrative only - use a cryptographically secure generator in production
    return Math.random().toString(36).slice(2, 11).toUpperCase();
  }

  async calculateFraudScore(customer, inventory, financial) {
    // Simplified fraud scoring algorithm
    let score = 0;

    // Customer history factor
    if (customer.customer.orderCount < 5) score += 20;
    if (customer.customer.accountAge < 30) score += 15;

    // Order size factor
    if (inventory.totalCost > 1000) score += 10;
    if (inventory.totalCost > 5000) score += 20;

    // Geographic factor
    if (customer.customer.shippingAddress?.country !== customer.customer.billingAddress?.country) {
      score += 15;
    }

    // Payment method factor
    if (financial.paymentMethod === 'new_credit_card') score += 25;

    return Math.min(score, 100);
  }
}

SQL Integration with QueryLeaf

QueryLeaf provides familiar SQL transaction syntax for MongoDB operations:

-- QueryLeaf SQL syntax for MongoDB transactions

-- Begin transaction with explicit isolation level
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Multi-table operations within transaction scope
UPDATE accounts 
SET balance = balance - 1000,
    last_modified = CURRENT_TIMESTAMP,
    version = version + 1
WHERE account_id = 'ACC001' 
  AND balance >= 1000
  AND status = 'active';

UPDATE accounts 
SET balance = balance + 1000,
    last_modified = CURRENT_TIMESTAMP,
    version = version + 1  
WHERE account_id = 'ACC002'
  AND status = 'active';

-- Insert transaction record within same transaction
INSERT INTO transactions (
    from_account_id,
    to_account_id,
    amount,
    transaction_type,
    status,
    created_at
) VALUES (
    'ACC001',
    'ACC002', 
    1000,
    'transfer',
    'completed',
    CURRENT_TIMESTAMP
);

-- Update statistics atomically
INSERT INTO account_statistics (
    account_id,
    total_debits,
    total_credits,
    transaction_count,
    last_activity
) VALUES (
    'ACC001',
    1000,
    0,
    1,
    CURRENT_TIMESTAMP
) ON DUPLICATE KEY UPDATE 
    total_debits = total_debits + 1000,
    transaction_count = transaction_count + 1,
    last_activity = CURRENT_TIMESTAMP;

INSERT INTO account_statistics (
    account_id,
    total_debits, 
    total_credits,
    transaction_count,
    last_activity
) VALUES (
    'ACC002',
    0,
    1000,
    1,
    CURRENT_TIMESTAMP
) ON DUPLICATE KEY UPDATE
    total_credits = total_credits + 1000,
    transaction_count = transaction_count + 1,
    last_activity = CURRENT_TIMESTAMP;

-- Commit transaction - all operations succeed or fail together
COMMIT;

-- Example of transaction rollback on error
BEGIN TRANSACTION;

-- Attempt complex multi-step operation
UPDATE inventory 
SET quantity = quantity - 5,
    reserved = reserved + 5
WHERE product_id = 'PROD123' 
  AND quantity >= 5;

-- Check if update succeeded
IF @@ROWCOUNT = 0 BEGIN
    ROLLBACK;
    THROW 50001, 'Insufficient inventory', 1;
END

INSERT INTO orders (
    customer_id,
    product_id,
    quantity,
    status,
    order_date
) VALUES (
    'CUST456',
    'PROD123',
    5,
    'confirmed',
    CURRENT_TIMESTAMP
);

INSERT INTO order_items (
    order_id,
    product_id,
    quantity,
    unit_price,
    total_price
) SELECT 
    LAST_INSERT_ID(),
    'PROD123',
    5,
    p.price,
    p.price * 5
FROM products p
WHERE p.product_id = 'PROD123';

COMMIT;

-- Advanced transaction with savepoints
BEGIN TRANSACTION;

-- Savepoint for customer validation
SAVEPOINT customer_validation;

UPDATE customers 
SET last_order_date = CURRENT_TIMESTAMP,
    order_count = order_count + 1
WHERE customer_id = 'CUST789'
  AND status = 'active';

IF @@ROWCOUNT = 0 BEGIN
    ROLLBACK TO customer_validation;
    THROW 50002, 'Invalid customer', 1;
END

-- Savepoint for inventory allocation  
SAVEPOINT inventory_allocation;

-- Complex inventory update across multiple warehouses
WITH warehouse_inventory AS (
    SELECT 
        warehouse_id,
        product_id,
        available_quantity,
        ROW_NUMBER() OVER (ORDER BY shipping_cost, available_quantity DESC) as priority
    FROM warehouse_stock 
    WHERE product_id = 'PROD456'
      AND available_quantity >= 3
),
selected_warehouse AS (
    SELECT warehouse_id, product_id, available_quantity
    FROM warehouse_inventory
    WHERE priority = 1
)
UPDATE ws 
SET available_quantity = ws.available_quantity - 3,
    reserved_quantity = ws.reserved_quantity + 3
FROM warehouse_stock ws
INNER JOIN selected_warehouse sw ON ws.warehouse_id = sw.warehouse_id
WHERE ws.product_id = 'PROD456';

IF @@ROWCOUNT = 0 BEGIN
    ROLLBACK TO inventory_allocation;
    THROW 50003, 'Inventory allocation failed', 1;
END

-- Financial authorization
SAVEPOINT financial_authorization;

INSERT INTO payment_authorizations (
    customer_id,
    amount,
    payment_method_id,
    authorization_code,
    status,
    expires_at
) VALUES (
    'CUST789',
    149.97,
    'PM001',
    NEWID(),
    'authorized',
    DATEADD(HOUR, 24, CURRENT_TIMESTAMP)
);

-- Final order creation
INSERT INTO orders (
    customer_id,
    total_amount,
    status,
    payment_authorization_id,
    created_at
) VALUES (
    'CUST789',
    149.97,
    'confirmed',
    LAST_INSERT_ID(),
    CURRENT_TIMESTAMP
);

COMMIT;

-- QueryLeaf transaction features:
-- 1. Standard SQL transaction syntax (BEGIN/COMMIT/ROLLBACK)
-- 2. Isolation level specification for consistency requirements
-- 3. Savepoint support for complex multi-step operations
-- 4. Automatic translation to MongoDB transaction sessions
-- 5. Cross-collection operations with ACID guarantees
-- 6. Error handling with conditional rollbacks
-- 7. Integration with MongoDB replica sets and sharding
-- 8. Performance optimization with appropriate read/write concerns
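
Conceptually, a BEGIN/COMMIT block like the transfer above corresponds to a single driver-level session transaction. The following is a hedged sketch of the equivalent native operations, not QueryLeaf's actual generated code:

// Native equivalent of the SQL transfer block (illustrative sketch only)
async function sqlStyleTransfer(client) {
  const db = client.db('financial_platform');
  const session = client.startSession();

  try {
    await session.withTransaction(async () => {
      // UPDATE accounts SET balance = balance - 1000 ... WHERE account_id = 'ACC001'
      const debit = await db.collection('accounts').updateOne(
        { account_id: 'ACC001', balance: { $gte: 1000 }, status: 'active' },
        { $inc: { balance: -1000, version: 1 }, $set: { last_modified: new Date() } },
        { session }
      );
      if (debit.modifiedCount === 0) {
        throw new Error('Insufficient funds'); // aborts the transaction, like ROLLBACK
      }

      // UPDATE accounts SET balance = balance + 1000 ... WHERE account_id = 'ACC002'
      await db.collection('accounts').updateOne(
        { account_id: 'ACC002', status: 'active' },
        { $inc: { balance: 1000, version: 1 }, $set: { last_modified: new Date() } },
        { session }
      );

      // INSERT INTO transactions (...)
      await db.collection('transactions').insertOne(
        {
          from_account_id: 'ACC001',
          to_account_id: 'ACC002',
          amount: 1000,
          transaction_type: 'transfer',
          status: 'completed',
          created_at: new Date()
        },
        { session }
      );
    }, { readConcern: { level: 'snapshot' }, writeConcern: { w: 'majority' } });
  } finally {
    await session.endSession();
  }
}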

Distributed Transaction Coordination

Cross-Shard Transaction Management

// Advanced distributed transaction patterns for sharded MongoDB clusters
class ShardedTransactionCoordinator {
  constructor(mongoClient, shardConfig) {
    this.client = mongoClient;
    this.shardConfig = shardConfig;
    this.databases = {
      financial: mongoClient.db('financial_shard'),
      inventory: mongoClient.db('inventory_shard'), 
      customer: mongoClient.db('customer_shard'),
      analytics: mongoClient.db('analytics_shard')
    };
  }

  async executeDistributedTransaction(distributedOperation) {
    // Distributed transaction across multiple shards
    const session = this.client.startSession();

    try {
      const result = await session.withTransaction(async () => {
        // Cross-shard transaction coordination
        const operationResults = [];

        // Phase 1: Customer shard operations
        const customerResult = await this.executeCustomerShardOperations(
          distributedOperation.customerOperations,
          session
        );
        operationResults.push({ shard: 'customer', result: customerResult });

        // Phase 2: Inventory shard operations
        const inventoryResult = await this.executeInventoryShardOperations(
          distributedOperation.inventoryOperations,
          session
        );
        operationResults.push({ shard: 'inventory', result: inventoryResult });

        // Phase 3: Financial shard operations
        const financialResult = await this.executeFinancialShardOperations(
          distributedOperation.financialOperations,
          session
        );
        operationResults.push({ shard: 'financial', result: financialResult });

        // Phase 4: Analytics shard operations (eventual consistency)
        const analyticsResult = await this.executeAnalyticsShardOperations(
          distributedOperation.analyticsOperations,
          session
        );
        operationResults.push({ shard: 'analytics', result: analyticsResult });

        // Phase 5: Cross-shard validation and coordination
        const coordinationResult = await this.validateCrossShardConsistency(
          operationResults,
          session
        );

        return {
          success: true,
          distributedTransactionId: this.generateTransactionId(),
          shardResults: operationResults,
          coordination: coordinationResult,
          completedAt: new Date()
        };

      }, {
        readConcern: { level: 'snapshot' },
        writeConcern: { w: 'majority', j: true, wtimeout: 10000 },
        maxCommitTimeMS: 30000
      });

      return result;

    } catch (error) {
      console.error('Distributed transaction failed:', error.message);

      // Enhanced error recovery for distributed scenarios
      await this.handleDistributedTransactionFailure(distributedOperation, error, session);
      throw error;

    } finally {
      await session.endSession();
    }
  }

  async executeCustomerShardOperations(operations, session) {
    const customerDb = this.databases.customer;
    const results = [];

    for (const operation of operations) {
      switch (operation.type) {
        case 'update_customer_profile':
          const customerUpdate = await customerDb.collection('customers').findOneAndUpdate(
            { _id: operation.customerId },
            {
              $set: operation.updateData,
              $inc: { version: 1 },
              $push: {
                updateHistory: {
                  timestamp: new Date(),
                  operation: operation.type,
                  data: operation.updateData
                }
              }
            },
            { session, returnDocument: 'after' }
          );
          results.push({ operation: operation.type, result: customerUpdate.value });
          break;

        case 'update_loyalty_points':
          const loyaltyUpdate = await customerDb.collection('loyalty_accounts').findOneAndUpdate(
            { customerId: operation.customerId },
            {
              $inc: { 
                points: operation.pointsChange,
                totalEarned: Math.max(0, operation.pointsChange),
                totalSpent: Math.max(0, -operation.pointsChange)
              },
              $set: { lastActivity: new Date() }
            },
            { session, upsert: true, returnDocument: 'after' }
          );
          results.push({ operation: operation.type, result: loyaltyUpdate.value });
          break;
      }
    }

    return results;
  }

  async executeInventoryShardOperations(operations, session) {
    const inventoryDb = this.databases.inventory;
    const results = [];

    for (const operation of operations) {
      switch (operation.type) {
        case 'reserve_inventory':
          // Reserve each item sequentially - a session cannot run concurrent
          // operations within a transaction.
          const reservationResults = [];
          for (const item of operation.items) {
            const reservation = await inventoryDb.collection('product_inventory').findOneAndUpdate(
              {
                productId: item.productId,
                warehouseId: item.warehouseId,
                availableQuantity: { $gte: item.quantity }
              },
              {
                $inc: {
                  availableQuantity: -item.quantity,
                  reservedQuantity: item.quantity
                },
                $push: {
                  reservations: {
                    customerId: operation.customerId,
                    quantity: item.quantity,
                    reservedAt: new Date(),
                    expiresAt: new Date(Date.now() + 30 * 60 * 1000) // 30 minutes
                  }
                }
              },
              { session, returnDocument: 'after' }
            );

            if (!reservation.value) {
              throw new Error(`Failed to reserve ${item.quantity} units of ${item.productId}`);
            }

            reservationResults.push(reservation.value);
          }

          results.push({ operation: operation.type, reservations: reservationResults });
          break;

        case 'update_product_metrics':
          const metricsUpdate = await inventoryDb.collection('product_metrics').updateMany(
            { productId: { $in: operation.productIds } },
            {
              $inc: { 
                totalSales: operation.salesIncrement,
                viewCount: operation.viewIncrement || 0
              },
              $set: { lastSaleDate: new Date() }
            },
            { session, upsert: true }
          );
          results.push({ operation: operation.type, result: metricsUpdate });
          break;
      }
    }

    return results;
  }

  async executeFinancialShardOperations(operations, session) {
    const financialDb = this.databases.financial;
    const results = [];

    for (const operation of operations) {
      switch (operation.type) {
        case 'process_payment':
          // Payment processing with fraud detection
          const fraudCheck = await financialDb.collection('fraud_detection').insertOne({
            customerId: operation.customerId,
            amount: operation.amount,
            paymentMethodId: operation.paymentMethodId,
            riskScore: await this.calculateRiskScore(operation),
            checkTimestamp: new Date(),
            status: 'pending'
          }, { session });

          const payment = await financialDb.collection('payments').insertOne({
            customerId: operation.customerId,
            amount: operation.amount,
            currency: operation.currency || 'USD',
            paymentMethodId: operation.paymentMethodId,
            fraudCheckId: fraudCheck.insertedId,
            status: 'authorized',
            processedAt: new Date(),
            metadata: operation.metadata
          }, { session });

          // Update payment method statistics
          await financialDb.collection('payment_method_stats').updateOne(
            { paymentMethodId: operation.paymentMethodId },
            {
              $inc: {
                transactionCount: 1,
                totalAmount: operation.amount
              },
              $set: { lastUsed: new Date() }
            },
            { session, upsert: true }
          );

          results.push({ 
            operation: operation.type, 
            paymentId: payment.insertedId,
            fraudCheckId: fraudCheck.insertedId
          });
          break;

        case 'update_account_balance':
          const balanceUpdate = await financialDb.collection('account_balances').findOneAndUpdate(
            { 
              customerId: operation.customerId,
              currency: operation.currency || 'USD'
            },
            {
              $inc: { balance: operation.balanceChange },
              $set: { lastModified: new Date() },
              $push: {
                transactionHistory: {
                  amount: operation.balanceChange,
                  timestamp: new Date(),
                  reference: operation.reference
                }
              }
            },
            { session, upsert: true, returnDocument: 'after' }
          );

          results.push({ operation: operation.type, result: balanceUpdate.value });
          break;
      }
    }

    return results;
  }

  async executeAnalyticsShardOperations(operations, session) {
    const analyticsDb = this.databases.analytics;
    const results = [];

    // Analytics operations with eventual consistency
    for (const operation of operations) {
      switch (operation.type) {
        case 'update_customer_analytics':
          const customerAnalytics = await analyticsDb.collection('customer_analytics').updateOne(
            { customerId: operation.customerId },
            {
              $inc: {
                totalOrders: operation.orderIncrement || 0,
                totalSpent: operation.spentIncrement || 0,
                loyaltyPointsEarned: operation.pointsEarned || 0
              },
              $set: { lastUpdated: new Date() },
              $push: {
                activityLog: {
                  timestamp: new Date(),
                  activity: operation.activity,
                  value: operation.value
                }
              }
            },
            { session, upsert: true }
          );
          results.push({ operation: operation.type, result: customerAnalytics });
          break;

        case 'update_product_analytics':
          const productAnalytics = await analyticsDb.collection('product_analytics').updateMany(
            { productId: { $in: operation.productIds } },
            {
              $inc: {
                salesCount: operation.salesIncrement || 0,
                revenue: operation.revenueIncrement || 0
              },
              $set: { lastSaleTimestamp: new Date() }
            },
            { session, upsert: true }
          );
          results.push({ operation: operation.type, result: productAnalytics });
          break;

        case 'record_business_event':
          const businessEvent = await analyticsDb.collection('business_events').insertOne({
            eventType: operation.eventType,
            customerId: operation.customerId,
            productIds: operation.productIds,
            metadata: operation.metadata,
            timestamp: new Date(),
            value: operation.value
          }, { session });
          results.push({ operation: operation.type, eventId: businessEvent.insertedId });
          break;
      }
    }

    return results;
  }

  async validateCrossShardConsistency(shardResults, session) {
    // Cross-shard consistency validation
    const consistencyChecks = [];

    // Check customer-financial consistency
    const customerData = shardResults.find(r => r.shard === 'customer')?.result;
    const financialData = shardResults.find(r => r.shard === 'financial')?.result;

    if (customerData && financialData) {
      const customerConsistency = await this.validateCustomerFinancialConsistency(
        customerData,
        financialData,
        session
      );
      consistencyChecks.push(customerConsistency);
    }

    // Check inventory-financial consistency
    const inventoryData = shardResults.find(r => r.shard === 'inventory')?.result;

    if (inventoryData && financialData) {
      const inventoryConsistency = await this.validateInventoryFinancialConsistency(
        inventoryData,
        financialData,
        session
      );
      consistencyChecks.push(inventoryConsistency);
    }

    // Record cross-shard transaction coordination
    const coordinationRecord = await this.databases.financial.collection('transaction_coordination').insertOne({
      distributedTransactionId: this.generateTransactionId(),
      shardsInvolved: shardResults.map(r => r.shard),
      consistencyChecks: consistencyChecks,
      status: 'validated',
      timestamp: new Date()
    }, { session });

    return {
      coordinationId: coordinationRecord.insertedId,
      consistencyChecks: consistencyChecks,
      allConsistent: consistencyChecks.every(check => check.consistent)
    };
  }

  async calculateRiskScore(operation) {
    // Simplified risk scoring
    let score = 0;

    if (operation.amount > 1000) score += 20;
    if (operation.amount > 5000) score += 40;

    // Add more sophisticated risk factors
    return Math.min(score, 100);
  }

  generateTransactionId() {
    return `DTX_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
  }

  async validateCustomerFinancialConsistency(customerData, financialData, session) {
    // Validate consistency between customer and financial data
    return {
      consistent: true,
      details: 'Customer-financial data consistency validated'
    };
  }

  async validateInventoryFinancialConsistency(inventoryData, financialData, session) {
    // Validate consistency between inventory and financial data
    return {
      consistent: true,
      details: 'Inventory-financial data consistency validated'
    };
  }

  async handleDistributedTransactionFailure(operation, error, session) {
    // Enhanced error handling for distributed scenarios
    console.log('Handling distributed transaction failure...');

    // Log failure for analysis and recovery
    await this.databases.financial.collection('transaction_failures').insertOne({
      operation: operation,
      error: error.message,
      timestamp: new Date(),
      sessionId: session.id
    }).catch(() => {}); // Don't fail on logging failure
  }
}

Best Practices for Production Transaction Management

Performance Optimization and Monitoring

  1. Transaction Scope: Keep transactions as short as possible to minimize lock contention
  2. Read Preferences: Use appropriate read preferences based on consistency requirements
  3. Write Concerns: Balance between performance and durability with suitable write concerns
  4. Session Management: Properly manage session lifecycle and cleanup
  5. Error Handling: Implement comprehensive error handling with appropriate retry logic (see the sketch after this list)
  6. Monitoring: Track transaction performance, abort rates, and deadlock frequency
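
The sketch below illustrates several of these practices together. It is a minimal example, assuming a connected MongoClient named client and a hypothetical loyalty_accounts collection: a deliberately small transaction scope, a majority write concern, automatic retry of transient errors via session.withTransaction(), and session cleanup in a finally block.

// Minimal transaction sketch - assumes `client` is a connected MongoClient and
// that a `loyalty_accounts` collection exists (both are illustrative assumptions)
async function transferLoyaltyPoints(client, fromCustomerId, toCustomerId, points) {
  const session = client.startSession();
  try {
    return await session.withTransaction(async () => {
      const accounts = client.db('commerce').collection('loyalty_accounts');

      // Keep the transaction body short: two targeted updates, nothing else
      await accounts.updateOne(
        { customerId: fromCustomerId, points: { $gte: points } },
        { $inc: { points: -points } },
        { session }
      );
      await accounts.updateOne(
        { customerId: toCustomerId },
        { $inc: { points: points } },
        { session }
      );
    }, {
      readConcern: { level: 'snapshot' },
      writeConcern: { w: 'majority' },   // balance durability against latency
      readPreference: 'primary'
    });
  } finally {
    await session.endSession();          // always release the session
  }
}

withTransaction() retries the callback on transient transaction errors and retries the commit on unknown commit results, which covers most of the retry logic that would otherwise be hand-written.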

Distributed System Considerations

  1. Network Partitions: Design for graceful degradation during network splits
  2. Shard Key Design: Choose shard keys that minimize cross-shard transactions
  3. Consistency Models: Understand and apply appropriate consistency levels
  4. Conflict Resolution: Implement strategies for handling concurrent modification conflicts
  5. Recovery Procedures: Plan for disaster recovery and data consistency restoration
  6. Performance Tuning: Optimize for distributed transaction performance characteristics (a monitoring sketch follows this list)
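
Abort rates and retry frequency are the clearest signals that a shard key or transaction scope needs rework. The following is a minimal monitoring sketch, assuming a connected MongoClient named client; the field names follow the serverStatus transactions section and may vary slightly by MongoDB version.

// Sample server-side transaction counters and derive an abort rate for alerting
async function sampleTransactionMetrics(client) {
  const status = await client.db('admin').command({ serverStatus: 1 });
  const tx = status.transactions || {};

  const started = tx.totalStarted || 0;
  const aborted = tx.totalAborted || 0;

  return {
    sampledAt: new Date(),
    started: started,
    committed: tx.totalCommitted || 0,
    aborted: aborted,
    currentActive: tx.currentActive || 0,
    abortRate: started > 0 ? aborted / started : 0   // guard against divide-by-zero
  };
}

Feeding these samples into a dashboard or alert (for example, flagging an abort rate above a few percent) gives early warning of cross-shard contention before it becomes user-visible latency.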

Conclusion

MongoDB's ACID-compliant transactions provide comprehensive data consistency guarantees for distributed applications while maintaining the flexibility and performance advantages of document-based storage. The integration with QueryLeaf enables familiar SQL transaction patterns for teams transitioning from relational databases.

Key advantages of MongoDB transactions include:

  • ACID Compliance: Full atomicity, consistency, isolation, and durability guarantees
  • Multi-Document Operations: Atomic operations across multiple documents and collections
  • Distributed Support: Cross-shard transactions in sharded cluster deployments
  • Flexible Consistency: Configurable read and write concerns for different requirements
  • SQL Familiarity: Traditional transaction syntax through QueryLeaf integration
  • Production Ready: Enterprise-grade transaction management with monitoring and recovery

Whether you're building financial systems, e-commerce platforms, or complex business applications, MongoDB transactions with QueryLeaf's SQL interface provide the foundation for maintaining data integrity while leveraging the scalability and flexibility of modern document databases.

QueryLeaf Integration: QueryLeaf seamlessly translates SQL transaction operations into MongoDB transaction sessions. Complex multi-table operations, isolation levels, and savepoint management are handled automatically while providing familiar SQL transaction semantics, making sophisticated distributed transaction patterns accessible to SQL-oriented development teams.

The combination of MongoDB's robust transaction capabilities with SQL-familiar transaction management creates an ideal platform for applications that require both strong consistency guarantees and the flexibility to evolve data models as business requirements change.

MongoDB Change Data Capture and Real-Time Streaming Applications: Building Event-Driven Architectures with Change Streams and SQL-Style Data Synchronization

Modern applications require real-time responsiveness to data changes, enabling live dashboards, instant notifications, collaborative editing, and synchronized multi-device experiences that react immediately to database modifications. Traditional polling-based approaches for detecting data changes introduce latency, consume unnecessary resources, and create scalability bottlenecks that limit real-time application performance.

MongoDB Change Data Capture (CDC) through Change Streams provides comprehensive real-time data change notification capabilities that enable reactive architectures, event-driven microservices, and live data synchronization across distributed systems. Unlike polling mechanisms that repeatedly query databases for changes, MongoDB Change Streams deliver immediate notifications of data modifications, enabling applications to react instantly to database events with minimal overhead.
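
The core primitive is small. A minimal sketch, assuming a replica set deployment (which change streams require) and an orders collection, shows the push-based model the rest of this article builds on; the next section first examines why the traditional alternatives fall short.

// Minimal change stream sketch - every insert/update/delete arrives as an event,
// with no polling loop (assumes a replica set and an `orders` collection)
const { MongoClient } = require('mongodb');

async function watchOrders() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const orders = client.db('realtime_commerce_platform').collection('orders');
  const changeStream = orders.watch([], { fullDocument: 'updateLookup' });

  for await (const change of changeStream) {
    // React immediately to each modification
    console.log(change.operationType, change.documentKey._id);
  }
}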

The Traditional Polling and Batch Processing Challenge

Conventional approaches to detecting data changes rely on polling, timestamp comparisons, or batch processing, all of which introduce latency and waste resources:

-- Traditional PostgreSQL change detection - inefficient polling and resource-intensive approaches

-- Timestamp-based change tracking with performance limitations
CREATE TABLE orders (
    order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID NOT NULL,
    product_id UUID NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    unit_price DECIMAL(10,2) NOT NULL,
    total_amount DECIMAL(10,2) NOT NULL,
    order_status VARCHAR(20) NOT NULL DEFAULT 'pending',
    shipping_address JSONB,
    payment_info JSONB,

    -- Change tracking fields (manual maintenance required)
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    version INTEGER DEFAULT 1,
    last_modified_by UUID,

    -- Inefficient change flags
    is_modified BOOLEAN DEFAULT FALSE,
    change_type VARCHAR(10) DEFAULT 'insert',
    sync_required BOOLEAN DEFAULT TRUE
);

-- Audit table for change history (storage overhead)
CREATE TABLE order_audit (
    audit_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID NOT NULL,
    operation_type VARCHAR(10) NOT NULL, -- 'INSERT', 'UPDATE', 'DELETE'
    old_values JSONB,
    new_values JSONB,
    changed_fields TEXT[],
    changed_by UUID,
    change_timestamp TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    change_source VARCHAR(50)
);

-- Trigger-based change tracking (complex maintenance)
CREATE OR REPLACE FUNCTION track_order_changes()
RETURNS TRIGGER AS $$
BEGIN
    -- Update timestamp and version
    NEW.updated_at = CURRENT_TIMESTAMP;
    NEW.version = CASE WHEN TG_OP = 'UPDATE' THEN COALESCE(OLD.version, 0) + 1 ELSE 1 END;
    NEW.is_modified = TRUE;
    NEW.sync_required = TRUE;

    -- Log to audit table
    INSERT INTO order_audit (
        order_id, 
        operation_type, 
        old_values, 
        new_values,
        changed_fields,
        changed_by,
        change_source
    ) VALUES (
        NEW.order_id,
        TG_OP,
        CASE WHEN TG_OP = 'UPDATE' THEN to_jsonb(OLD) ELSE NULL END,
        to_jsonb(NEW),
        CASE WHEN TG_OP = 'UPDATE' THEN (
            SELECT array_agg(changed.key)
            FROM jsonb_each(to_jsonb(NEW)) AS changed(key, value)
            WHERE to_jsonb(OLD) -> changed.key IS DISTINCT FROM changed.value
        ) ELSE NULL END,
        NEW.last_modified_by,
        'database_trigger'
    );

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_change_trigger
    BEFORE INSERT OR UPDATE ON orders
    FOR EACH ROW EXECUTE FUNCTION track_order_changes();

-- Inefficient polling-based change detection
WITH recent_changes AS (
    -- Polling approach - expensive and introduces latency
    SELECT 
        o.order_id,
        o.customer_id,
        o.order_status,
        o.total_amount,
        o.updated_at,
        o.version,
        o.is_modified,
        o.sync_required,

        -- Calculate time since last change
        EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - o.updated_at)) as seconds_since_change,

        -- Determine if change is recent enough for processing
        CASE 
            WHEN o.updated_at > CURRENT_TIMESTAMP - INTERVAL '5 minutes' THEN 'immediate'
            WHEN o.updated_at > CURRENT_TIMESTAMP - INTERVAL '30 minutes' THEN 'batch'
            ELSE 'delayed'
        END as processing_priority,

        -- Get audit information
        oa.operation_type,
        oa.changed_fields,
        oa.change_timestamp

    FROM orders o
    LEFT JOIN order_audit oa ON o.order_id = oa.order_id 
        AND oa.change_timestamp = (
            SELECT MAX(change_timestamp) 
            FROM order_audit oa2 
            WHERE oa2.order_id = o.order_id
        )
    WHERE 
        o.is_modified = TRUE 
        OR o.sync_required = TRUE
        OR o.updated_at > CURRENT_TIMESTAMP - INTERVAL '1 hour'  -- Polling window
),
change_processing AS (
    SELECT 
        rc.*,

        -- Categorize changes for different processing systems
        CASE rc.order_status
            WHEN 'confirmed' THEN 'inventory_update'
            WHEN 'shipped' THEN 'shipping_notification' 
            WHEN 'delivered' THEN 'delivery_confirmation'
            WHEN 'cancelled' THEN 'refund_processing'
            ELSE 'general_update'
        END as event_type,

        -- Calculate processing delay
        CASE 
            WHEN rc.seconds_since_change < 60 THEN 'real_time'
            WHEN rc.seconds_since_change < 300 THEN 'near_real_time'  
            WHEN rc.seconds_since_change < 1800 THEN 'delayed'
            ELSE 'stale'
        END as data_freshness,

        -- Determine notification requirements
        ARRAY[
            CASE WHEN rc.order_status IN ('shipped', 'delivered') THEN 'customer_sms' END,
            CASE WHEN rc.order_status = 'confirmed' THEN 'inventory_system' END,
            CASE WHEN rc.total_amount > 1000 THEN 'fraud_monitoring' END,
            CASE WHEN rc.changed_fields && ARRAY['shipping_address'] THEN 'logistics_update' END
        ] as notification_targets,

        -- Generate webhook payloads
        jsonb_build_object(
            'event_type', 'order_updated',
            'order_id', rc.order_id,
            'customer_id', rc.customer_id,
            'status', rc.order_status,
            'timestamp', rc.change_timestamp,
            'changed_fields', rc.changed_fields,
            'version', rc.version
        ) as webhook_payload

    FROM recent_changes rc
),
notification_queue AS (
    -- Build notification queue for external systems
    SELECT 
        cp.order_id,
        unnest(cp.notification_targets) as target_system,
        cp.webhook_payload,
        cp.event_type,
        cp.data_freshness,

        -- Priority scoring for queue processing
        CASE cp.event_type
            WHEN 'shipping_notification' THEN 5
            WHEN 'delivery_confirmation' THEN 5
            WHEN 'fraud_monitoring' THEN 10
            WHEN 'inventory_update' THEN 7
            ELSE 3
        END as priority_score,

        CURRENT_TIMESTAMP as queued_at,

        -- Retry logic configuration
        CASE cp.data_freshness
            WHEN 'real_time' THEN 3
            WHEN 'near_real_time' THEN 2
            ELSE 1
        END as max_retries

    FROM change_processing cp
    WHERE cp.notification_targets IS NOT NULL
)

-- Process changes and generate notifications
SELECT 
    nq.order_id,
    nq.target_system,
    nq.event_type,
    nq.priority_score,
    nq.max_retries,
    nq.webhook_payload,

    -- System-specific endpoint configuration
    CASE nq.target_system
        WHEN 'customer_sms' THEN 'https://api.sms.service.com/send'
        WHEN 'inventory_system' THEN 'https://inventory.internal/api/webhooks'
        WHEN 'fraud_monitoring' THEN 'https://fraud.security.com/api/alerts'
        WHEN 'logistics_update' THEN 'https://logistics.partner.com/api/updates'
    END as webhook_endpoint,

    -- Processing metadata
    nq.queued_at,
    nq.queued_at + INTERVAL '5 minutes' as max_processing_time,

    -- Performance impact assessment
    'polling_based_change_detection' as detection_method,
    EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - cp.change_timestamp)) as detection_latency_seconds

FROM notification_queue nq
JOIN change_processing cp ON nq.order_id = cp.order_id
WHERE nq.target_system IS NOT NULL
ORDER BY nq.priority_score DESC, nq.queued_at ASC;

-- Reset change flags (must be done manually)
UPDATE orders 
SET is_modified = FALSE, sync_required = FALSE
WHERE is_modified = TRUE 
  AND updated_at < CURRENT_TIMESTAMP - INTERVAL '1 minute';

-- Problems with traditional change detection approaches:
-- 1. Polling introduces significant latency between data changes and detection
-- 2. Constant polling consumes database resources even when no changes occur
-- 3. Complex trigger logic that's difficult to maintain and debug
-- 4. Manual synchronization flag management prone to race conditions
-- 5. Audit table storage overhead grows linearly with change volume
-- 6. No real-time notifications - applications must continuously poll
-- 7. Difficult to scale across multiple application instances
-- 8. Poor performance with high-frequency changes or large datasets
-- 9. Complex conflict resolution when multiple systems modify data
-- 10. No built-in filtering or transformation of change events

-- Batch processing approach (high latency)
WITH batch_changes AS (
    SELECT 
        o.order_id,
        o.customer_id,
        o.order_status,
        o.updated_at,

        -- Batch processing windows
        DATE_TRUNC('hour', o.updated_at) as processing_batch,

        -- Change detection via timestamp comparison
        CASE 
            WHEN o.updated_at > (
                SELECT COALESCE(MAX(last_processed_at), '1970-01-01'::timestamp)
                FROM processing_checkpoints 
                WHERE system_name = 'order_processor'
            ) THEN TRUE
            ELSE FALSE
        END as requires_processing,

        -- Lag calculation
        EXTRACT(EPOCH FROM (
            CURRENT_TIMESTAMP - o.updated_at
        )) / 60.0 as processing_delay_minutes

    FROM orders o
    WHERE o.updated_at > CURRENT_TIMESTAMP - INTERVAL '24 hours'
),
processing_statistics AS (
    SELECT 
        bc.processing_batch,
        COUNT(*) as total_changes,
        COUNT(*) FILTER (WHERE bc.requires_processing) as unprocessed_changes,
        AVG(bc.processing_delay_minutes) as avg_delay_minutes,
        MAX(bc.processing_delay_minutes) as max_delay_minutes,

        -- Batch processing efficiency
        CASE 
            WHEN COUNT(*) FILTER (WHERE bc.requires_processing) = 0 THEN 'up_to_date'
            WHEN AVG(bc.processing_delay_minutes) < 60 THEN 'acceptable_delay'
            WHEN AVG(bc.processing_delay_minutes) < 240 THEN 'moderate_delay'
            ELSE 'high_delay'
        END as processing_status

    FROM batch_changes bc
    GROUP BY bc.processing_batch
)

SELECT 
    processing_batch,
    total_changes,
    unprocessed_changes,
    ROUND(avg_delay_minutes::numeric, 2) as avg_delay_minutes,
    ROUND(max_delay_minutes::numeric, 2) as max_delay_minutes,
    processing_status,

    -- Performance assessment
    CASE processing_status
        WHEN 'high_delay' THEN 'Critical: Real-time requirements not met'
        WHEN 'moderate_delay' THEN 'Warning: Consider increasing processing frequency'
        WHEN 'acceptable_delay' THEN 'Good: Within acceptable parameters'
        ELSE 'Excellent: No backlog'
    END as performance_assessment

FROM processing_statistics
WHERE total_changes > 0
ORDER BY processing_batch DESC;

-- Traditional limitations:
-- 1. Batch processing introduces hours of latency for real-time requirements
-- 2. Resource waste from processing empty batches
-- 3. Complex checkpoint management and recovery logic
-- 4. Poor user experience with delayed updates and notifications
-- 5. Difficult horizontal scaling across multiple processing nodes
-- 6. No event ordering guarantees across different data modifications
-- 7. Limited ability to filter events based on content or business logic
-- 8. Manual coordination required between multiple consuming applications
-- 9. High operational overhead for monitoring and maintaining batch jobs
-- 10. Poor integration with modern event-driven and microservices architectures

MongoDB Change Data Capture provides efficient real-time change tracking:

// MongoDB Change Data Capture - real-time event-driven architecture with comprehensive change stream management
const { MongoClient } = require('mongodb');
const { EventEmitter } = require('events');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('realtime_commerce_platform');

// Advanced Change Data Capture and event-driven processing system
class AdvancedChangeCaptureEngine extends EventEmitter {
  constructor(db, configuration = {}) {
    super();
    this.db = db;
    this.collections = {
      orders: db.collection('orders'),
      customers: db.collection('customers'),
      products: db.collection('products'),
      inventory: db.collection('inventory'),
      payments: db.collection('payments'),
      notifications: db.collection('notifications'),
      eventLog: db.collection('event_log')
    };

    // Advanced CDC configuration
    this.config = {
      changeStreamConfig: {
        fullDocument: 'updateLookup',
        fullDocumentBeforeChange: 'whenAvailable',
        showExpandedEvents: true,
        batchSize: configuration.batchSize || 100,
        maxAwaitTimeMS: configuration.maxAwaitTimeMS || 1000
      },

      // Event processing configuration
      eventProcessing: {
        enableAsync: true,
        enableRetry: true,
        retryAttempts: configuration.retryAttempts || 3,
        retryDelayMs: configuration.retryDelayMs || 1000,
        deadLetterQueue: true,
        preserveOrdering: true
      },

      // Filtering and routing configuration
      eventFiltering: {
        enableContentFiltering: true,
        enableBusinessLogicFiltering: true,
        enableUserDefinedFilters: true
      },

      // Performance optimization
      performance: {
        enableEventBatching: configuration.enableEventBatching ?? true, // ?? so an explicit false is respected
        batchTimeoutMs: configuration.batchTimeoutMs || 500,
        enableParallelProcessing: true,
        maxConcurrentProcessors: configuration.maxConcurrentProcessors || 10
      },

      // Monitoring and observability
      monitoring: {
        enableMetrics: true,
        enableTracing: true,
        metricsIntervalMs: 30000,
        healthCheckIntervalMs: 5000
      }
    };

    // Internal state management
    this.changeStreams = new Map();
    this.eventProcessors = new Map();
    this.processingMetrics = {
      eventsProcessed: 0,
      eventsFailedProcessing: 0,
      averageProcessingTime: 0,
      lastProcessedTimestamp: null,
      activeChangeStreams: 0
    };

    // Event routing and transformation
    this.eventRouters = new Map();
    this.eventTransformers = new Map();
    this.businessRuleProcessors = new Map();

    this.initializeAdvancedCDC();
  }

  async initializeAdvancedCDC() {
    console.log('Initializing advanced MongoDB Change Data Capture system...');

    try {
      // Setup comprehensive change stream monitoring
      await this.setupCollectionChangeStreams();

      // Initialize event processing pipelines
      await this.initializeEventProcessors();

      // Setup business logic handlers
      await this.setupBusinessLogicHandlers();

      // Initialize monitoring and health checks
      await this.initializeMonitoring();

      console.log('Advanced CDC system initialized successfully');

    } catch (error) {
      console.error('Failed to initialize CDC system:', error);
      throw error;
    }
  }

  async setupCollectionChangeStreams() {
    console.log('Setting up collection change streams with advanced filtering...');

    // Orders collection change stream with comprehensive business logic
    const ordersChangeStream = this.collections.orders.watch([
      // Stage 1: Filter for relevant order events
      {
        $match: {
          $or: [
            // Order status changes
            { 'updateDescription.updatedFields.status': { $exists: true } },

            // Payment status changes
            { 'updateDescription.updatedFields.payment.status': { $exists: true } },

            // Shipping address changes
            { 'updateDescription.updatedFields.shipping.address': { $exists: true } },

            // High-value order insertions
            { 
              operationType: 'insert',
              'fullDocument.total': { $gte: 1000 }
            },

            // Order cancellations or refunds
            { 'updateDescription.updatedFields.cancellation': { $exists: true } },
            { 'updateDescription.updatedFields.refund': { $exists: true } }
          ]
        }
      },

      // Stage 2: Add enhanced metadata and business context
      {
        $addFields: {
          processedTimestamp: '$$NOW',
          changeStreamSource: 'orders_collection',

          // Extract key business events
          businessEvent: {
            $switch: {
              branches: [
                {
                  case: { 
                    $and: [
                      { $eq: ['$operationType', 'update'] },
                      { $eq: ['$updateDescription.updatedFields.status', 'confirmed'] }
                    ]
                  },
                  then: 'order_confirmed'
                },
                {
                  case: { 
                    $and: [
                      { $eq: ['$operationType', 'update'] },
                      { $eq: ['$updateDescription.updatedFields.status', 'shipped'] }
                    ]
                  },
                  then: 'order_shipped'
                },
                {
                  case: { 
                    $and: [
                      { $eq: ['$operationType', 'update'] },
                      { $eq: ['$updateDescription.updatedFields.payment.status', 'completed'] }
                    ]
                  },
                  then: 'payment_completed'
                },
                {
                  case: { 
                    $and: [
                      { $eq: ['$operationType', 'insert'] },
                      { $gte: ['$fullDocument.total', 1000] }
                    ]
                  },
                  then: 'high_value_order_created'
                }
              ],
              default: 'order_updated'
            }
          },

          // Priority scoring for event processing
          eventPriority: {
            $switch: {
              branches: [
                { case: { $eq: ['$updateDescription.updatedFields.payment.status', 'failed'] }, then: 10 },
                { case: { $eq: ['$updateDescription.updatedFields.status', 'cancelled'] }, then: 8 },
                { case: { $gte: ['$fullDocument.total', 5000] }, then: 7 },
                { case: { $eq: ['$updateDescription.updatedFields.status', 'shipped'] }, then: 6 },
                { case: { $eq: ['$updateDescription.updatedFields.status', 'confirmed'] }, then: 5 }
              ],
              default: 3
            }
          },

          // Determine required downstream actions
          requiredActions: {
            $switch: {
              branches: [
                {
                  case: { $eq: ['$updateDescription.updatedFields.status', 'confirmed'] },
                  then: ['inventory_update', 'customer_notification', 'logistics_preparation']
                },
                {
                  case: { $eq: ['$updateDescription.updatedFields.status', 'shipped'] },
                  then: ['shipping_notification', 'tracking_activation', 'delivery_estimation']
                },
                {
                  case: { $eq: ['$updateDescription.updatedFields.payment.status', 'completed'] },
                  then: ['receipt_generation', 'accounting_sync', 'loyalty_points_update']
                },
                {
                  case: { $gte: ['$fullDocument.total', 1000] },
                  then: ['fraud_screening', 'vip_handling', 'priority_processing']
                }
              ],
              default: ['general_processing']
            }
          }
        }
      }
    ], this.config.changeStreamConfig);

    // Register sophisticated event handlers
    ordersChangeStream.on('change', async (changeDocument) => {
      await this.processOrderChangeEvent(changeDocument);
    });

    ordersChangeStream.on('error', (error) => {
      console.error('Orders change stream error:', error);
      this.emit('changeStreamError', { collection: 'orders', error });
    });

    this.changeStreams.set('orders', ordersChangeStream);

    // Inventory collection change stream for real-time stock management
    const inventoryChangeStream = this.collections.inventory.watch([
      {
        $match: {
          $or: [
            // Stock level changes
            { 'updateDescription.updatedFields.quantity': { $exists: true } },
            { 'updateDescription.updatedFields.reservedQuantity': { $exists: true } },

            // Product availability changes
            { 'updateDescription.updatedFields.available': { $exists: true } },

            // Low stock alerts
            { 
              operationType: 'update',
              'fullDocument.quantity': { $lt: 10 }
            }
          ]
        }
      },
      {
        $addFields: {
          processedTimestamp: '$$NOW',
          changeStreamSource: 'inventory_collection',

          // Stock level categorization
          stockStatus: {
            $switch: {
              branches: [
                { case: { $lte: ['$fullDocument.quantity', 0] }, then: 'out_of_stock' },
                { case: { $lte: ['$fullDocument.quantity', 5] }, then: 'critical_low' },
                { case: { $lte: ['$fullDocument.quantity', 20] }, then: 'low_stock' },
                { case: { $gte: ['$fullDocument.quantity', 100] }, then: 'well_stocked' }
              ],
              default: 'normal_stock'
            }
          },

          // Calculate stock velocity and reorder triggers
          reorderRequired: {
            $cond: {
              if: {
                $and: [
                  { $lt: ['$fullDocument.quantity', '$fullDocument.reorderPoint'] },
                  { $ne: ['$fullDocument.reorderStatus', 'pending'] }
                ]
              },
              then: true,
              else: false
            }
          },

          // Urgency scoring for inventory management
          urgencyScore: {
            $add: [
              { $cond: [{ $lte: ['$fullDocument.quantity', 0] }, 10, 0] },
              { $cond: [{ $lte: ['$fullDocument.quantity', 5] }, 7, 0] },
              { $cond: [{ $gte: ['$fullDocument.demandForecast', 50] }, 3, 0] },
              { $cond: [{ $eq: ['$fullDocument.category', 'bestseller'] }, 2, 0] }
            ]
          }
        }
      }
    ], this.config.changeStreamConfig);

    inventoryChangeStream.on('change', async (changeDocument) => {
      await this.processInventoryChangeEvent(changeDocument);
    });

    this.changeStreams.set('inventory', inventoryChangeStream);

    // Customer collection change stream for personalization and CRM
    const customersChangeStream = this.collections.customers.watch([
      {
        $match: {
          $or: [
            // Profile updates
            { 'updateDescription.updatedFields.profile': { $exists: true } },

            // Preference changes
            { 'updateDescription.updatedFields.preferences': { $exists: true } },

            // Loyalty status changes
            { 'updateDescription.updatedFields.loyalty.tier': { $exists: true } },

            // New customer registrations
            { operationType: 'insert' }
          ]
        }
      },
      {
        $addFields: {
          processedTimestamp: '$$NOW',
          changeStreamSource: 'customers_collection',

          // Customer lifecycle events
          lifecycleEvent: {
            $switch: {
              branches: [
                { case: { $eq: ['$operationType', 'insert'] }, then: 'customer_registered' },
                { case: { $ne: ['$updateDescription.updatedFields.loyalty.tier', null] }, then: 'loyalty_tier_changed' },
                { case: { $ne: ['$updateDescription.updatedFields.preferences.marketing', null] }, then: 'communication_preferences_updated' }
              ],
              default: 'customer_profile_updated'
            }
          },

          // Personalization triggers
          personalizationActions: {
            $cond: {
              if: { $eq: ['$operationType', 'insert'] },
              then: ['welcome_sequence', 'preference_collection', 'recommendation_initialization'],
              else: {
                $switch: {
                  branches: [
                    {
                      case: { $ne: ['$updateDescription.updatedFields.preferences.categories', null] },
                      then: ['recommendation_refresh', 'content_personalization']
                    },
                    {
                      case: { $ne: ['$updateDescription.updatedFields.loyalty.tier', null] },
                      then: ['tier_benefits_notification', 'exclusive_offers_activation']
                    }
                  ],
                  default: ['profile_validation']
                }
              }
            }
          }
        }
      }
    ], this.config.changeStreamConfig);

    customersChangeStream.on('change', async (changeDocument) => {
      await this.processCustomerChangeEvent(changeDocument);
    });

    this.changeStreams.set('customers', customersChangeStream);

    this.processingMetrics.activeChangeStreams = this.changeStreams.size;
    console.log(`Initialized ${this.changeStreams.size} change streams with advanced filtering`);
  }

  async processOrderChangeEvent(changeDocument) {
    const startTime = Date.now();

    try {
      console.log(`Processing order change event: ${changeDocument.businessEvent}`);

      // Extract key information from the change document
      const orderId = changeDocument.documentKey._id;
      const operationType = changeDocument.operationType;
      const businessEvent = changeDocument.businessEvent;
      const eventPriority = changeDocument.eventPriority;
      const requiredActions = changeDocument.requiredActions;
      const fullDocument = changeDocument.fullDocument;

      // Create comprehensive event context
      const eventContext = {
        eventId: `order_${orderId}_${Date.now()}`,
        orderId: orderId,
        customerId: fullDocument?.customerId,
        operationType: operationType,
        businessEvent: businessEvent,
        priority: eventPriority,
        timestamp: changeDocument.processedTimestamp,
        requiredActions: requiredActions,

        // Change details
        changeDetails: {
          updatedFields: changeDocument.updateDescription?.updatedFields,
          removedFields: changeDocument.updateDescription?.removedFields,
          previousDocument: changeDocument.fullDocumentBeforeChange
        },

        // Business context
        businessContext: {
          orderValue: fullDocument?.total,
          orderStatus: fullDocument?.status,
          customerTier: fullDocument?.customer?.loyaltyTier,
          paymentMethod: fullDocument?.payment?.method,
          shippingMethod: fullDocument?.shipping?.method
        }
      };

      // Process each required action asynchronously
      const actionPromises = requiredActions.map(action => 
        this.executeBusinessAction(action, eventContext)
      );

      if (this.config.eventProcessing.enableAsync) {
        // Parallel processing for independent actions
        await Promise.allSettled(actionPromises);
      } else {
        // Sequential processing for dependent actions
        for (const actionPromise of actionPromises) {
          await actionPromise;
        }
      }

      // Log successful event processing
      await this.logEventProcessing(eventContext, 'success');

      // Update metrics
      this.updateProcessingMetrics(startTime, true);

      // Emit success event for monitoring
      this.emit('eventProcessed', {
        eventId: eventContext.eventId,
        businessEvent: businessEvent,
        processingTime: Date.now() - startTime
      });

    } catch (error) {
      console.error(`Error processing order change event:`, error);

      // Handle retry logic
      if (this.config.eventProcessing.enableRetry) {
        await this.retryEventProcessing(changeDocument, error);
      }

      // Update error metrics
      this.updateProcessingMetrics(startTime, false);

      // Emit error event for monitoring
      this.emit('eventProcessingError', {
        changeDocument: changeDocument,
        error: error,
        timestamp: new Date()
      });
    }
  }

  async executeBusinessAction(action, eventContext) {
    console.log(`Executing business action: ${action} for event: ${eventContext.eventId}`);

    try {
      switch (action) {
        case 'inventory_update':
          await this.updateInventoryForOrder(eventContext);
          break;

        case 'customer_notification':
          await this.sendCustomerNotification(eventContext);
          break;

        case 'logistics_preparation':
          await this.prepareLogistics(eventContext);
          break;

        case 'shipping_notification':
          await this.sendShippingNotification(eventContext);
          break;

        case 'tracking_activation':
          await this.activateOrderTracking(eventContext);
          break;

        case 'payment_processing':
          await this.processPayment(eventContext);
          break;

        case 'fraud_screening':
          await this.performFraudScreening(eventContext);
          break;

        case 'loyalty_points_update':
          await this.updateLoyaltyPoints(eventContext);
          break;

        case 'analytics_update':
          await this.updateAnalytics(eventContext);
          break;

        default:
          console.warn(`Unknown business action: ${action}`);
      }

    } catch (actionError) {
      console.error(`Error executing business action ${action}:`, actionError);
      throw actionError;
    }
  }

  async updateInventoryForOrder(eventContext) {
    console.log(`Updating inventory for order: ${eventContext.orderId}`);

    try {
      // Get order details
      const order = await this.collections.orders.findOne(
        { _id: eventContext.orderId }
      );

      if (!order || !order.items) {
        throw new Error(`Order ${eventContext.orderId} not found or has no items`);
      }

      // Process inventory updates for each order item
      const inventoryUpdates = order.items.map(async (item) => {
        const inventoryUpdate = {
          $inc: {
            reservedQuantity: item.quantity,
            availableQuantity: -item.quantity
          },
          $push: {
            reservations: {
              orderId: eventContext.orderId,
              quantity: item.quantity,
              reservedAt: new Date(),
              status: 'active'
            }
          },
          $set: {
            lastUpdated: new Date(),
            lastUpdateReason: 'order_confirmed'
          }
        };

        return this.collections.inventory.updateOne(
          { productId: item.productId },
          inventoryUpdate
        );
      });

      // Execute all inventory updates
      await Promise.all(inventoryUpdates);

      console.log(`Inventory updated successfully for order: ${eventContext.orderId}`);

    } catch (error) {
      console.error(`Failed to update inventory for order ${eventContext.orderId}:`, error);
      throw error;
    }
  }

  async sendCustomerNotification(eventContext) {
    console.log(`Sending customer notification for event: ${eventContext.businessEvent}`);

    try {
      // Get customer information
      const customer = await this.collections.customers.findOne(
        { _id: eventContext.customerId }
      );

      if (!customer) {
        throw new Error(`Customer ${eventContext.customerId} not found`);
      }

      // Determine notification content based on business event
      const notificationConfig = this.getNotificationConfig(
        eventContext.businessEvent, 
        eventContext.businessContext
      );

      // Create notification document
      const notification = {
        customerId: eventContext.customerId,
        orderId: eventContext.orderId,
        type: notificationConfig.type,
        channel: this.selectNotificationChannel(customer.preferences),

        content: {
          subject: notificationConfig.subject,
          message: this.personalizeMessage(
            notificationConfig.template,
            customer,
            { orderId: eventContext.orderId, ...eventContext.businessContext }
          ),
          actionUrl: notificationConfig.actionUrl,
          imageUrl: notificationConfig.imageUrl
        },

        priority: eventContext.priority,
        scheduledFor: this.calculateDeliveryTime(notificationConfig.timing),

        metadata: {
          eventId: eventContext.eventId,
          businessEvent: eventContext.businessEvent,
          createdAt: new Date()
        }
      };

      // Store notification for delivery
      const result = await this.collections.notifications.insertOne(notification);

      // Trigger immediate delivery for high-priority notifications
      if (eventContext.priority >= 7) {
        await this.deliverNotificationImmediately(notification);
      }

      console.log(`Notification created successfully: ${result.insertedId}`);

    } catch (error) {
      console.error(`Failed to send customer notification:`, error);
      throw error;
    }
  }

  async processInventoryChangeEvent(changeDocument) {
    const startTime = Date.now();

    try {
      console.log(`Processing inventory change event: ${changeDocument.stockStatus}`);

      const productId = changeDocument.documentKey._id;
      const stockStatus = changeDocument.stockStatus;
      const urgencyScore = changeDocument.urgencyScore;
      const reorderRequired = changeDocument.reorderRequired;
      const fullDocument = changeDocument.fullDocument;

      const eventContext = {
        eventId: `inventory_${productId}_${Date.now()}`,
        productId: productId,
        stockStatus: stockStatus,
        urgencyScore: urgencyScore,
        reorderRequired: reorderRequired,
        currentQuantity: fullDocument?.quantity,
        changeDetails: changeDocument.updateDescription
      };

      // Handle critical stock situations
      if (stockStatus === 'out_of_stock' || stockStatus === 'critical_low') {
        await this.handleCriticalStockSituation(eventContext);
      }

      // Trigger reorder process if needed
      if (reorderRequired) {
        await this.initiateReorderProcess(eventContext);
      }

      // Update product availability in real-time
      await this.updateProductAvailability(eventContext);

      // Notify relevant stakeholders
      await this.notifyStakeholders(eventContext);

      this.updateProcessingMetrics(startTime, true);

    } catch (error) {
      console.error(`Error processing inventory change event:`, error);
      this.updateProcessingMetrics(startTime, false);
    }
  }

  async processCustomerChangeEvent(changeDocument) {
    const startTime = Date.now();

    try {
      console.log(`Processing customer change event: ${changeDocument.lifecycleEvent}`);

      const customerId = changeDocument.documentKey._id;
      const lifecycleEvent = changeDocument.lifecycleEvent;
      const personalizationActions = changeDocument.personalizationActions;
      const fullDocument = changeDocument.fullDocument;

      const eventContext = {
        eventId: `customer_${customerId}_${Date.now()}`,
        customerId: customerId,
        lifecycleEvent: lifecycleEvent,
        personalizationActions: personalizationActions,
        customerData: fullDocument
      };

      // Execute personalization actions
      for (const action of personalizationActions) {
        await this.executePersonalizationAction(action, eventContext);
      }

      this.updateProcessingMetrics(startTime, true);

    } catch (error) {
      console.error(`Error processing customer change event:`, error);
      this.updateProcessingMetrics(startTime, false);
    }
  }

  async initializeEventProcessors() {
    console.log('Initializing specialized event processors...');

    // Order fulfillment processor
    this.eventProcessors.set('order_fulfillment', {
      process: async (eventContext) => {
        await this.processOrderFulfillment(eventContext);
      },
      concurrency: 5,
      retryPolicy: { maxAttempts: 3, backoffMs: 1000 }
    });

    // Payment processor
    this.eventProcessors.set('payment_processing', {
      process: async (eventContext) => {
        await this.processPaymentEvent(eventContext);
      },
      concurrency: 10,
      retryPolicy: { maxAttempts: 5, backoffMs: 2000 }
    });

    // Notification processor
    this.eventProcessors.set('notification_delivery', {
      process: async (eventContext) => {
        await this.processNotificationDelivery(eventContext);
      },
      concurrency: 20,
      retryPolicy: { maxAttempts: 3, backoffMs: 500 }
    });
  }

  async logEventProcessing(eventContext, status) {
    try {
      const logEntry = {
        eventId: eventContext.eventId,
        timestamp: new Date(),
        status: status,
        eventType: eventContext.businessEvent || eventContext.lifecycleEvent,
        processingTime: Date.now() - new Date(eventContext.timestamp).getTime(),
        context: eventContext,
        metadata: {
          changeStreamSource: eventContext.changeStreamSource,
          priority: eventContext.priority
        }
      };

      await this.collections.eventLog.insertOne(logEntry);

    } catch (logError) {
      console.error('Failed to log event processing:', logError);
      // Don't throw - logging failures shouldn't break event processing
    }
  }

  updateProcessingMetrics(startTime, success) {
    this.processingMetrics.eventsProcessed++;

    if (success) {
      const processingTime = Date.now() - startTime;
      this.processingMetrics.averageProcessingTime = 
        (this.processingMetrics.averageProcessingTime + processingTime) / 2;
      this.processingMetrics.lastProcessedTimestamp = new Date();
    } else {
      this.processingMetrics.eventsFailedProcessing++;
    }
  }

  // Additional utility methods for comprehensive CDC functionality

  getNotificationConfig(businessEvent, businessContext) {
    const notificationConfigs = {
      order_confirmed: {
        type: 'order_confirmation',
        subject: 'Order Confirmed - Thank You!',
        template: 'order_confirmation_template',
        timing: 'immediate',
        actionUrl: '/orders/{orderId}',
        imageUrl: '/images/order-confirmed.png'
      },
      order_shipped: {
        type: 'shipping_notification',
        subject: 'Your Order is On the Way!',
        template: 'shipping_notification_template',
        timing: 'immediate',
        actionUrl: '/orders/{orderId}/tracking',
        imageUrl: '/images/package-shipped.png'
      },
      payment_completed: {
        type: 'payment_confirmation',
        subject: 'Payment Received',
        template: 'payment_confirmation_template',
        timing: 'immediate',
        actionUrl: '/orders/{orderId}/receipt',
        imageUrl: '/images/payment-success.png'
      }
    };

    return notificationConfigs[businessEvent] || {
      type: 'general_notification',
      subject: 'Order Update',
      template: 'general_update_template',
      timing: 'delayed'
    };
  }

  selectNotificationChannel(customerPreferences) {
    if (!customerPreferences) return 'email';

    if (customerPreferences.notifications?.push?.enabled) return 'push';
    if (customerPreferences.notifications?.sms?.enabled) return 'sms';
    return 'email';
  }

  personalizeMessage(template, customer, businessContext) {
    // Simplified personalization - in production, use a templating engine
    return template
      .replace('{customerName}', customer.profile?.firstName || 'Valued Customer')
      .replace('{orderId}', businessContext.orderId)
      .replace('{orderValue}', businessContext.orderValue);
  }

  async retryEventProcessing(changeDocument, error) {
    console.log(`Retrying event processing for change document: ${changeDocument._id}`);
    // Implement exponential backoff retry logic
    // This is a simplified version - production should use a proper retry queue
  }

  async initializeMonitoring() {
    console.log('Initializing CDC monitoring and health checks...');

    // Set up periodic health checks
    setInterval(() => {
      this.performHealthCheck();
    }, this.config.monitoring.healthCheckIntervalMs);

    // Set up metrics collection
    setInterval(() => {
      this.collectMetrics();
    }, this.config.monitoring.metricsIntervalMs);
  }

  performHealthCheck() {
    // Check change stream health
    let healthyStreams = 0;
    this.changeStreams.forEach((stream, name) => {
      if (!stream.closed) {
        healthyStreams++;
      } else {
        console.warn(`Change stream ${name} is closed - attempting reconnection`);
        // Implement reconnection logic
      }
    });

    const healthStatus = {
      timestamp: new Date(),
      totalStreams: this.changeStreams.size,
      healthyStreams: healthyStreams,
      processingMetrics: this.processingMetrics
    };

    this.emit('healthCheck', healthStatus);
  }

  collectMetrics() {
    const metrics = {
      timestamp: new Date(),
      ...this.processingMetrics,
      changeStreamStatus: Array.from(this.changeStreams.entries()).map(([name, stream]) => ({
        name: name,
        closed: stream.closed,
        resumeToken: stream.resumeToken // hasNext() is async; report the last resume token instead
      }))
    };

    this.emit('metricsCollected', metrics);
  }
}

// Benefits of MongoDB Change Data Capture:
// - Real-time data change notifications without polling overhead
// - Comprehensive change document information including before/after states  
// - Built-in filtering and transformation capabilities within change streams
// - Automatic ordering and delivery guarantees for change events
// - Horizontal scalability with replica set and sharded cluster support
// - Integration with MongoDB's operational capabilities (backup, monitoring)
// - Event-driven architecture enablement for microservices and reactive systems
// - Minimal performance impact on primary database operations
// - Rich metadata and context information for intelligent event processing
// - Native MongoDB driver integration with automatic reconnection handling

module.exports = {
  AdvancedChangeCaptureEngine
};
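
The performHealthCheck() method above leaves stream reconnection as a placeholder. A common pattern is to checkpoint each event's resume token and reopen the stream with resumeAfter; the sketch below assumes a resume_tokens checkpoint collection and an application-specific handleChange() function, neither of which is part of the engine above.

// Resumable change stream sketch - persist the resume token after each event so a
// restarted consumer continues where the previous run stopped instead of replaying
async function watchWithResume(db, collectionName, handleChange) {
  const checkpoints = db.collection('resume_tokens');
  const saved = await checkpoints.findOne({ _id: collectionName });

  const options = { fullDocument: 'updateLookup' };
  if (saved?.token) {
    options.resumeAfter = saved.token;   // continue from the last checkpoint
  }

  const changeStream = db.collection(collectionName).watch([], options);

  for await (const change of changeStream) {
    await handleChange(change);          // application-specific processing

    // change._id is the event's resume token; checkpoint it after successful handling
    await checkpoints.updateOne(
      { _id: collectionName },
      { $set: { token: change._id, updatedAt: new Date() } },
      { upsert: true }
    );
  }
}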

Understanding MongoDB Change Streams Architecture

Advanced Event Processing and Business Logic Integration

Implement sophisticated change data capture strategies for production real-time applications:

// Production-ready MongoDB Change Data Capture with enterprise-grade event processing
class EnterpriseChangeDataCaptureSystem extends AdvancedChangeCaptureEngine {
  constructor(db, enterpriseConfig) {
    super(db, enterpriseConfig);

    this.enterpriseConfig = {
      ...enterpriseConfig,

      // Advanced event sourcing
      eventSourcing: {
        enableEventStore: true,
        eventRetentionDays: 365,
        snapshotFrequency: 1000,
        enableReplay: true
      },

      // Distributed processing
      distributedProcessing: {
        enableClusterMode: true,
        nodeId: process.env.NODE_ID || 'node-1',
        coordinationDatabase: 'cdc_coordination',
        leaderElection: true
      },

      // Advanced monitoring
      observability: {
        enableDistributedTracing: true,
        enableCustomMetrics: true,
        alertingThresholds: {
          processingLatency: 5000,
          errorRate: 0.05,
          backlogSize: 1000
        }
      }
    };

    this.setupEnterpriseFeatures();
  }

  async setupEventSourcingCapabilities() {
    console.log('Setting up enterprise event sourcing capabilities...');

    // Event store for complete audit trail
    const eventStore = this.db.collection('event_store');
    await eventStore.createIndex({ aggregateId: 1, version: 1 }, { unique: true });
    await eventStore.createIndex({ eventType: 1, timestamp: -1 });
    await eventStore.createIndex({ timestamp: -1 });

    // Snapshots for performance optimization
    const snapshots = this.db.collection('aggregate_snapshots');
    await snapshots.createIndex({ aggregateId: 1, version: -1 });

    return { eventStore, snapshots };
  }

  async implementAdvancedEventRouting() {
    console.log('Implementing advanced event routing and transformation...');

    // Dynamic event routing based on content and business rules
    const routingRules = [
      {
        name: 'high_value_order_routing',
        condition: (event) => event.businessContext?.orderValue > 5000,
        destinations: ['fraud_detection', 'vip_processing', 'management_alerts'],
        transformation: this.transformHighValueOrder.bind(this)
      },

      {
        name: 'inventory_critical_routing',
        condition: (event) => event.stockStatus === 'critical_low',
        destinations: ['procurement', 'sales_alerts', 'website_updates'],
        transformation: this.transformInventoryAlert.bind(this)
      },

      {
        name: 'customer_lifecycle_routing',
        condition: (event) => event.lifecycleEvent === 'customer_registered',
        destinations: ['marketing_automation', 'personalization_engine', 'crm_sync'],
        transformation: this.transformCustomerEvent.bind(this)
      }
    ];

    return routingRules;
  }

  async setupDistributedProcessing() {
    console.log('Setting up distributed CDC processing...');

    // Implement leader election for coordinated processing
    const coordination = {
      leaderElection: await this.setupLeaderElection(),
      workloadDistribution: await this.setupWorkloadDistribution(),
      failoverHandling: await this.setupFailoverHandling()
    };

    return coordination;
  }
}
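
The setupLeaderElection() call above is intentionally left abstract. One common approach, shown here only as an illustrative sketch rather than the class's actual implementation, is a lease document that each node tries to claim or renew; whichever node holds an unexpired lease acts as the CDC leader. The collection name and lease duration below are assumptions:

// Illustrative lease-based leader election for the coordination database
// referenced in distributedProcessing above (collection name is an assumption)
async function acquireLeadership(coordinationDb, nodeId, leaseMs = 15000) {
  const leases = coordinationDb.collection('cdc_leader_lease');
  const now = new Date();

  try {
    const result = await leases.updateOne(
      {
        _id: 'cdc-leader',
        // Claimable if the lease expired or this node already owns it
        $or: [{ expiresAt: { $lt: now } }, { ownerId: nodeId }]
      },
      { $set: { ownerId: nodeId, expiresAt: new Date(now.getTime() + leaseMs) } },
      { upsert: true }
    );

    // Either renewed an existing lease or inserted a fresh one: this node leads
    return result.matchedCount === 1 || result.upsertedCount === 1;
  } catch (error) {
    // Duplicate key on _id means another node currently holds a live lease
    if (error.code === 11000) return false;
    throw error;
  }
}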

SQL-Style Change Data Capture with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Change Data Capture and real-time streaming operations:

-- QueryLeaf advanced change data capture with SQL-familiar syntax

-- Create change data capture streams with comprehensive filtering and transformation
CREATE CHANGE STREAM order_events 
ON orders 
WITH (
  full_document = 'update_lookup',
  full_document_before_change = 'when_available',
  show_expanded_events = true
)
AS
SELECT 
  change_id() as event_id,
  operation_type(),
  document_key() as order_id,
  cluster_time() as event_timestamp,

  -- Enhanced change document information
  full_document() as current_order,
  full_document_before_change() as previous_order,
  update_description() as change_details,

  -- Business event classification
  CASE 
    WHEN operation_type() = 'insert' THEN 'order_created'
    WHEN JSON_EXTRACT(update_description(), '$.updatedFields.status') = 'confirmed' THEN 'order_confirmed'
    WHEN JSON_EXTRACT(update_description(), '$.updatedFields.status') = 'shipped' THEN 'order_shipped'
    WHEN JSON_EXTRACT(update_description(), '$.updatedFields.status') = 'delivered' THEN 'order_delivered'
    WHEN JSON_EXTRACT(update_description(), '$.updatedFields.status') = 'cancelled' THEN 'order_cancelled'
    WHEN JSON_EXTRACT(update_description(), '$.updatedFields.payment.status') = 'completed' THEN 'payment_completed'
    WHEN JSON_EXTRACT(update_description(), '$.updatedFields.payment.status') = 'failed' THEN 'payment_failed'
    ELSE 'order_updated'
  END as business_event,

  -- Priority scoring for event processing
  CASE 
    WHEN JSON_EXTRACT(full_document(), '$.total') > 5000 THEN 10
    WHEN JSON_EXTRACT(update_description(), '$.updatedFields.payment.status') = 'failed' THEN 9
    WHEN JSON_EXTRACT(update_description(), '$.updatedFields.status') = 'cancelled' THEN 8
    WHEN JSON_EXTRACT(update_description(), '$.updatedFields.status') = 'shipped' THEN 7
    WHEN JSON_EXTRACT(update_description(), '$.updatedFields.status') = 'confirmed' THEN 6
    ELSE 3
  END as event_priority,

  -- Required downstream actions
  CASE 
    WHEN JSON_EXTRACT(update_description(), '$.updatedFields.status') = 'confirmed' THEN 
      JSON_ARRAY('inventory_update', 'customer_notification', 'logistics_preparation')
    WHEN JSON_EXTRACT(update_description(), '$.updatedFields.status') = 'shipped' THEN 
      JSON_ARRAY('shipping_notification', 'tracking_activation', 'delivery_estimation')
    WHEN JSON_EXTRACT(update_description(), '$.updatedFields.payment.status') = 'completed' THEN 
      JSON_ARRAY('receipt_generation', 'accounting_sync', 'loyalty_points_update')
    WHEN JSON_EXTRACT(full_document(), '$.total') > 1000 THEN 
      JSON_ARRAY('fraud_screening', 'vip_handling', 'priority_processing')
    ELSE JSON_ARRAY('general_processing')
  END as required_actions,

  -- Customer and business context
  JSON_OBJECT(
    'customer_id', JSON_EXTRACT(full_document(), '$.customerId'),
    'order_value', JSON_EXTRACT(full_document(), '$.total'),
    'order_status', JSON_EXTRACT(full_document(), '$.status'),
    'payment_method', JSON_EXTRACT(full_document(), '$.payment.method'),
    'shipping_method', JSON_EXTRACT(full_document(), '$.shipping.method'),
    'customer_tier', JSON_EXTRACT(full_document(), '$.customer.loyaltyTier'),
    'order_items_count', JSON_LENGTH(JSON_EXTRACT(full_document(), '$.items'))
  ) as business_context

WHERE 
  -- Filter for relevant business events
  (
    operation_type() = 'insert' OR 
    JSON_EXTRACT(update_description(), '$.updatedFields.status') IS NOT NULL OR
    JSON_EXTRACT(update_description(), '$.updatedFields.payment.status') IS NOT NULL OR
    JSON_EXTRACT(update_description(), '$.updatedFields.shipping.address') IS NOT NULL OR
    JSON_EXTRACT(update_description(), '$.updatedFields.cancellation') IS NOT NULL
  )

  -- Additional business logic filters
  AND (
    operation_type() != 'insert' OR 
    JSON_EXTRACT(full_document(), '$.total') >= 10  -- Only track orders above minimum value
  );

-- Advanced change stream processing with business logic and real-time actions
WITH real_time_order_processing AS (
  SELECT 
    oe.*,

    -- Calculate processing urgency
    CASE 
      WHEN oe.event_priority >= 8 THEN 'critical'
      WHEN oe.event_priority >= 6 THEN 'high'
      WHEN oe.event_priority >= 4 THEN 'normal'
      ELSE 'low'
    END as processing_urgency,

    -- Determine notification channels
    CASE oe.business_event
      WHEN 'order_confirmed' THEN JSON_ARRAY('email', 'push_notification')
      WHEN 'order_shipped' THEN JSON_ARRAY('email', 'sms', 'push_notification')
      WHEN 'order_delivered' THEN JSON_ARRAY('email', 'push_notification', 'in_app')
      WHEN 'payment_failed' THEN JSON_ARRAY('email', 'sms', 'priority_alert')
      WHEN 'order_cancelled' THEN JSON_ARRAY('email', 'refund_processing')
      ELSE JSON_ARRAY('email')
    END as notification_channels,

    -- Generate webhook payloads for external systems
    JSON_OBJECT(
      'event_type', oe.business_event,
      'event_id', oe.event_id,
      'order_id', oe.order_id,
      'timestamp', oe.event_timestamp,
      'priority', oe.event_priority,
      'customer_context', oe.business_context,
      'change_details', oe.change_details
    ) as webhook_payload,

    -- Real-time analytics updates
    CASE oe.business_event
      WHEN 'order_created' THEN 'increment_daily_orders'
      WHEN 'payment_completed' THEN 'increment_revenue'
      WHEN 'order_cancelled' THEN 'increment_cancellations'
      WHEN 'order_delivered' THEN 'increment_completions'
      ELSE 'general_metric_update'
    END as analytics_action

  FROM order_events oe
),

-- Inventory change stream for real-time stock management
inventory_events AS (
  SELECT 
    change_id() as event_id,
    operation_type(),
    document_key() as product_id,
    cluster_time() as event_timestamp,
    full_document() as current_inventory,
    update_description() as change_details,

    -- Stock status classification
    CASE 
      WHEN JSON_EXTRACT(full_document(), '$.quantity') <= 0 THEN 'out_of_stock'
      WHEN JSON_EXTRACT(full_document(), '$.quantity') <= 5 THEN 'critical_low'
      WHEN JSON_EXTRACT(full_document(), '$.quantity') <= 20 THEN 'low_stock'
      WHEN JSON_EXTRACT(full_document(), '$.quantity') >= 100 THEN 'well_stocked'
      ELSE 'normal_stock'
    END as stock_status,

    -- Reorder trigger detection
    CASE 
      WHEN JSON_EXTRACT(full_document(), '$.quantity') < JSON_EXTRACT(full_document(), '$.reorderPoint')
           AND JSON_EXTRACT(full_document(), '$.reorderStatus') != 'pending' THEN true
      ELSE false
    END as reorder_required,

    -- Urgency scoring for inventory alerts
    (
      CASE WHEN JSON_EXTRACT(full_document(), '$.quantity') <= 0 THEN 10 ELSE 0 END +
      CASE WHEN JSON_EXTRACT(full_document(), '$.quantity') <= 5 THEN 7 ELSE 0 END +
      CASE WHEN JSON_EXTRACT(full_document(), '$.demandForecast') >= 50 THEN 3 ELSE 0 END +
      CASE WHEN JSON_EXTRACT(full_document(), '$.category') = 'bestseller' THEN 2 ELSE 0 END
    ) as urgency_score

  FROM CHANGE_STREAM(inventory)
  WHERE JSON_EXTRACT(update_description(), '$.updatedFields.quantity') IS NOT NULL
     OR JSON_EXTRACT(update_description(), '$.updatedFields.reservedQuantity') IS NOT NULL
     OR JSON_EXTRACT(update_description(), '$.updatedFields.available') IS NOT NULL
),

-- Customer lifecycle change stream for personalization and CRM
customer_events AS (
  SELECT 
    change_id() as event_id,
    operation_type(),
    document_key() as customer_id,
    cluster_time() as event_timestamp,
    full_document() as current_customer,
    update_description() as change_details,

    -- Lifecycle event classification
    CASE 
      WHEN operation_type() = 'insert' THEN 'customer_registered'
      WHEN JSON_EXTRACT(update_description(), '$.updatedFields.loyalty.tier') IS NOT NULL THEN 'loyalty_tier_changed'
      WHEN JSON_EXTRACT(update_description(), '$.updatedFields.preferences.marketing') IS NOT NULL THEN 'communication_preferences_updated'
      WHEN JSON_EXTRACT(update_description(), '$.updatedFields.profile') IS NOT NULL THEN 'profile_updated'
      ELSE 'customer_updated'
    END as lifecycle_event,

    -- Personalization trigger actions
    CASE 
      WHEN operation_type() = 'insert' THEN 
        JSON_ARRAY('welcome_sequence', 'preference_collection', 'recommendation_initialization')
      WHEN JSON_EXTRACT(update_description(), '$.updatedFields.preferences.categories') IS NOT NULL THEN 
        JSON_ARRAY('recommendation_refresh', 'content_personalization')
      WHEN JSON_EXTRACT(update_description(), '$.updatedFields.loyalty.tier') IS NOT NULL THEN 
        JSON_ARRAY('tier_benefits_notification', 'exclusive_offers_activation')
      ELSE JSON_ARRAY('profile_validation')
    END as personalization_actions

  FROM CHANGE_STREAM(customers)
  WHERE operation_type() = 'insert'
     OR JSON_EXTRACT(update_description(), '$.updatedFields.profile') IS NOT NULL
     OR JSON_EXTRACT(update_description(), '$.updatedFields.preferences') IS NOT NULL
     OR JSON_EXTRACT(update_description(), '$.updatedFields.loyalty.tier') IS NOT NULL
)

-- Comprehensive real-time event processing with cross-collection coordination
SELECT 
  -- Event identification and metadata
  'order' as event_source,
  rtop.event_id,
  rtop.business_event as event_type,
  rtop.event_timestamp,
  rtop.processing_urgency,

  -- Business context and payload
  rtop.business_context,
  rtop.webhook_payload,
  rtop.required_actions,
  rtop.notification_channels,

  -- Real-time processing instructions
  JSON_OBJECT(
    'immediate_actions', rtop.required_actions,
    'notification_config', JSON_OBJECT(
      'channels', rtop.notification_channels,
      'priority', rtop.event_priority,
      'urgency', rtop.processing_urgency
    ),
    'webhook_config', JSON_OBJECT(
      'payload', rtop.webhook_payload,
      'priority', rtop.event_priority,
      'retry_policy', CASE rtop.processing_urgency
        WHEN 'critical' THEN JSON_OBJECT('max_attempts', 5, 'backoff_ms', 1000)
        WHEN 'high' THEN JSON_OBJECT('max_attempts', 3, 'backoff_ms', 2000)
        ELSE JSON_OBJECT('max_attempts', 2, 'backoff_ms', 5000)
      END
    ),
    'analytics_config', JSON_OBJECT(
      'action', rtop.analytics_action,
      'metrics_update', rtop.business_context
    )
  ) as processing_configuration

FROM real_time_order_processing rtop

UNION ALL

SELECT 
  -- Inventory events
  'inventory' as event_source,
  ie.event_id,
  CONCAT('inventory_', ie.stock_status) as event_type,
  ie.event_timestamp,
  CASE 
    WHEN ie.urgency_score >= 8 THEN 'critical'
    WHEN ie.urgency_score >= 5 THEN 'high'
    ELSE 'normal'
  END as processing_urgency,

  -- Inventory context
  JSON_OBJECT(
    'product_id', ie.product_id,
    'stock_status', ie.stock_status,
    'current_quantity', JSON_EXTRACT(ie.current_inventory, '$.quantity'),
    'urgency_score', ie.urgency_score,
    'reorder_required', ie.reorder_required
  ) as business_context,

  -- Inventory webhook payload
  JSON_OBJECT(
    'event_type', CONCAT('inventory_', ie.stock_status),
    'product_id', ie.product_id,
    'stock_status', ie.stock_status,
    'quantity', JSON_EXTRACT(ie.current_inventory, '$.quantity'),
    'reorder_required', ie.reorder_required,
    'timestamp', ie.event_timestamp
  ) as webhook_payload,

  -- Inventory-specific actions
  CASE 
    WHEN ie.stock_status = 'out_of_stock' THEN 
      JSON_ARRAY('website_update', 'sales_alert', 'emergency_reorder')
    WHEN ie.stock_status = 'critical_low' THEN 
      JSON_ARRAY('reorder_trigger', 'low_stock_alert', 'sales_notification')
    WHEN ie.reorder_required THEN 
      JSON_ARRAY('procurement_notification', 'supplier_contact', 'reorder_automation')
    ELSE JSON_ARRAY('inventory_update')
  END as required_actions,

  -- Inventory notification channels
  CASE ie.stock_status
    WHEN 'out_of_stock' THEN JSON_ARRAY('email', 'slack', 'sms', 'dashboard_alert')
    WHEN 'critical_low' THEN JSON_ARRAY('email', 'slack', 'dashboard_alert')
    ELSE JSON_ARRAY('email', 'dashboard_alert')
  END as notification_channels,

  -- Inventory processing configuration
  JSON_OBJECT(
    'immediate_actions', CASE 
      WHEN ie.stock_status = 'out_of_stock' THEN 
        JSON_ARRAY('website_update', 'sales_alert', 'emergency_reorder')
      WHEN ie.stock_status = 'critical_low' THEN 
        JSON_ARRAY('reorder_trigger', 'low_stock_alert')
      ELSE JSON_ARRAY('inventory_sync')
    END,
    'notification_config', JSON_OBJECT(
      'channels', CASE ie.stock_status
        WHEN 'out_of_stock' THEN JSON_ARRAY('email', 'slack', 'sms')
        ELSE JSON_ARRAY('email', 'slack')
      END,
      'urgency', CASE 
        WHEN ie.urgency_score >= 8 THEN 'critical'
        WHEN ie.urgency_score >= 5 THEN 'high'
        ELSE 'normal'
      END
    ),
    'reorder_config', CASE 
      WHEN ie.reorder_required THEN JSON_OBJECT(
        'automatic_reorder', true,
        'supplier_notification', true,
        'quantity_calculation', 'demand_based'
      )
      ELSE NULL
    END
  ) as processing_configuration

FROM inventory_events ie

UNION ALL

SELECT 
  -- Customer events
  'customer' as event_source,
  ce.event_id,
  ce.lifecycle_event as event_type,
  ce.event_timestamp,
  CASE ce.lifecycle_event
    WHEN 'customer_registered' THEN 'high'
    WHEN 'loyalty_tier_changed' THEN 'high'
    ELSE 'normal'
  END as processing_urgency,

  -- Customer context
  JSON_OBJECT(
    'customer_id', ce.customer_id,
    'lifecycle_event', ce.lifecycle_event,
    'customer_tier', JSON_EXTRACT(ce.current_customer, '$.loyalty.tier'),
    'registration_date', JSON_EXTRACT(ce.current_customer, '$.createdAt'),
    'preferences', JSON_EXTRACT(ce.current_customer, '$.preferences')
  ) as business_context,

  -- Customer webhook payload
  JSON_OBJECT(
    'event_type', ce.lifecycle_event,
    'customer_id', ce.customer_id,
    'timestamp', ce.event_timestamp,
    'customer_data', ce.current_customer
  ) as webhook_payload,

  ce.personalization_actions as required_actions,

  -- Customer notification channels
  CASE ce.lifecycle_event
    WHEN 'customer_registered' THEN JSON_ARRAY('email', 'welcome_kit')
    WHEN 'loyalty_tier_changed' THEN JSON_ARRAY('email', 'push_notification', 'in_app')
    ELSE JSON_ARRAY('email')
  END as notification_channels,

  -- Customer processing configuration
  JSON_OBJECT(
    'immediate_actions', ce.personalization_actions,
    'personalization_config', JSON_OBJECT(
      'update_recommendations', true,
      'refresh_preferences', true,
      'trigger_campaigns', CASE ce.lifecycle_event
        WHEN 'customer_registered' THEN true
        ELSE false
      END
    ),
    'crm_sync_config', JSON_OBJECT(
      'sync_required', true,
      'priority', CASE ce.lifecycle_event
        WHEN 'customer_registered' THEN 'high'
        ELSE 'normal'
      END
    )
  ) as processing_configuration

FROM customer_events ce

ORDER BY 
  CASE processing_urgency
    WHEN 'critical' THEN 1
    WHEN 'high' THEN 2
    WHEN 'normal' THEN 3
    ELSE 4
  END,
  event_timestamp ASC;

-- Real-time analytics and monitoring for change data capture performance
WITH cdc_performance_metrics AS (
  SELECT 
    DATE_TRUNC('minute', event_timestamp) as time_bucket,
    event_source,
    event_type,
    processing_urgency,

    -- Event volume metrics
    COUNT(*) as events_per_minute,
    COUNT(DISTINCT CASE event_source
      WHEN 'order' THEN JSON_EXTRACT(business_context, '$.customer_id')
      WHEN 'customer' THEN JSON_EXTRACT(business_context, '$.customer_id')
      ELSE NULL
    END) as unique_customers_affected,

    -- Processing priority distribution
    COUNT(*) FILTER (WHERE processing_urgency = 'critical') as critical_events,
    COUNT(*) FILTER (WHERE processing_urgency = 'high') as high_priority_events,
    COUNT(*) FILTER (WHERE processing_urgency = 'normal') as normal_events,

    -- Business event analysis
    COUNT(*) FILTER (WHERE event_type LIKE 'order_%') as order_events,
    COUNT(*) FILTER (WHERE event_type LIKE 'inventory_%') as inventory_events,
    COUNT(*) FILTER (WHERE event_type LIKE 'customer_%') as customer_events,

    -- Revenue impact tracking
    SUM(
      CASE 
        WHEN event_type = 'payment_completed' THEN 
          CAST(JSON_EXTRACT(business_context, '$.order_value') AS DECIMAL(10,2))
        ELSE 0
      END
    ) as revenue_processed,

    -- Alert generation tracking
    COUNT(*) FILTER (WHERE processing_urgency IN ('critical', 'high')) as alerts_generated

  FROM (
    -- Use the main change stream query results
    SELECT * FROM (
      SELECT 
        'order' as event_source,
        rtop.business_event as event_type,
        rtop.event_timestamp,
        CASE 
          WHEN rtop.event_priority >= 8 THEN 'critical'
          WHEN rtop.event_priority >= 6 THEN 'high'
          ELSE 'normal'
        END as processing_urgency,
        rtop.business_context
      FROM real_time_order_processing rtop

      UNION ALL

      SELECT 
        'inventory' as event_source,
        CONCAT('inventory_', ie.stock_status) as event_type,
        ie.event_timestamp,
        CASE 
          WHEN ie.urgency_score >= 8 THEN 'critical'
          WHEN ie.urgency_score >= 5 THEN 'high'
          ELSE 'normal'
        END as processing_urgency,
        JSON_OBJECT(
          'product_id', ie.product_id,
          'stock_status', ie.stock_status
        ) as business_context
      FROM inventory_events ie

      UNION ALL

      SELECT 
        'customer' as event_source,
        ce.lifecycle_event as event_type,
        ce.event_timestamp,
        CASE ce.lifecycle_event
          WHEN 'customer_registered' THEN 'high'
          WHEN 'loyalty_tier_changed' THEN 'high'
          ELSE 'normal'
        END as processing_urgency,
        JSON_OBJECT(
          'customer_id', ce.customer_id
        ) as business_context
      FROM customer_events ce
    ) all_events
    WHERE event_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
  ) recent_events
  GROUP BY 
    DATE_TRUNC('minute', event_timestamp),
    event_source,
    event_type,
    processing_urgency
),

-- Real-time system health monitoring
system_health_metrics AS (
  SELECT 
    CURRENT_TIMESTAMP as health_check_time,

    -- Change stream performance indicators
    COUNT(*) as total_events_last_minute,
    AVG(events_per_minute) as avg_events_per_minute,
    MAX(events_per_minute) as peak_events_per_minute,

    -- Alert and priority distribution
    SUM(critical_events) as total_critical_events,
    SUM(high_priority_events) as total_high_priority_events,
    SUM(alerts_generated) as total_alerts_generated,

    -- Business impact metrics
    SUM(revenue_processed) as total_revenue_processed,
    SUM(unique_customers_affected) as total_customers_affected,

    -- Event type distribution
    SUM(order_events) as total_order_events,
    SUM(inventory_events) as total_inventory_events, 
    SUM(customer_events) as total_customer_events,

    -- Performance assessment
    CASE 
      WHEN MAX(events_per_minute) > 1000 THEN 'high_load'
      WHEN MAX(events_per_minute) > 500 THEN 'moderate_load'
      WHEN MAX(events_per_minute) > 100 THEN 'normal_load'
      ELSE 'low_load'
    END as system_load_status,

    -- Alert status assessment
    CASE 
      WHEN SUM(critical_events) > 50 THEN 'critical_alerts_high'
      WHEN SUM(critical_events) > 10 THEN 'critical_alerts_moderate'
      WHEN SUM(critical_events) > 0 THEN 'critical_alerts_low'
      ELSE 'no_critical_alerts'
    END as alert_status,

    -- Recommendations for system optimization
    CASE 
      WHEN MAX(events_per_minute) > 1000 AND SUM(critical_events) > 50 THEN 
        'Scale up processing capacity and review alert thresholds'
      WHEN MAX(events_per_minute) > 1000 THEN 
        'Consider horizontal scaling for change stream processing'
      WHEN SUM(critical_events) > 50 THEN 
        'Review alert sensitivity and business rule configuration'
      ELSE 'System operating within normal parameters'
    END as optimization_recommendation

  FROM cdc_performance_metrics
  WHERE time_bucket >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
)

-- Final comprehensive CDC monitoring dashboard
SELECT 
  shm.health_check_time,
  shm.total_events_last_minute,
  ROUND(shm.avg_events_per_minute, 1) as avg_events_per_minute,
  shm.peak_events_per_minute,
  shm.total_critical_events,
  shm.total_high_priority_events,
  ROUND(shm.total_revenue_processed, 2) as revenue_processed_usd,
  shm.total_customers_affected,
  shm.system_load_status,
  shm.alert_status,
  shm.optimization_recommendation,

  -- Event distribution summary
  JSON_OBJECT(
    'order_events', shm.total_order_events,
    'inventory_events', shm.total_inventory_events,
    'customer_events', shm.total_customer_events
  ) as event_distribution,

  -- Performance indicators
  JSON_OBJECT(
    'events_per_second', ROUND(shm.avg_events_per_minute / 60.0, 2),
    'peak_throughput', shm.peak_events_per_minute,
    'alert_rate', ROUND((shm.total_alerts_generated / NULLIF(shm.total_events_last_minute, 0)) * 100, 2),
    'critical_event_percentage', ROUND((shm.total_critical_events / NULLIF(shm.total_events_last_minute, 0)) * 100, 2)
  ) as performance_indicators,

  -- Business impact summary
  JSON_OBJECT(
    'revenue_velocity', ROUND(shm.total_revenue_processed / 60.0, 2),
    'customer_engagement_rate', shm.total_customers_affected,
    'business_event_diversity', (
      CASE WHEN shm.total_order_events > 0 THEN 1 ELSE 0 END +
      CASE WHEN shm.total_inventory_events > 0 THEN 1 ELSE 0 END +
      CASE WHEN shm.total_customer_events > 0 THEN 1 ELSE 0 END
    )
  ) as business_impact,

  -- Trend analysis from recent performance metrics
  (
    SELECT JSON_OBJECT(
      'event_trend', CASE 
        WHEN COUNT(*) > 1 AND 
             (MAX(events_per_minute) - MIN(events_per_minute)) / NULLIF(MIN(events_per_minute), 0) > 0.2 
        THEN 'increasing'
        WHEN COUNT(*) > 1 AND 
             (MIN(events_per_minute) - MAX(events_per_minute)) / NULLIF(MAX(events_per_minute), 0) > 0.2 
        THEN 'decreasing'
        ELSE 'stable'
      END,
      'alert_trend', CASE 
        WHEN SUM(critical_events) > LAG(SUM(critical_events)) OVER (ORDER BY time_bucket) 
        THEN 'increasing'
        ELSE 'stable'
      END
    )
    FROM cdc_performance_metrics
    WHERE time_bucket >= CURRENT_TIMESTAMP - INTERVAL '15 minutes'
    ORDER BY time_bucket DESC
    LIMIT 1
  ) as trend_analysis

FROM system_health_metrics shm;

-- QueryLeaf provides comprehensive change data capture capabilities:
-- 1. Real-time change stream processing with SQL-familiar syntax
-- 2. Advanced event filtering, classification, and routing
-- 3. Business logic integration for intelligent event processing
-- 4. Multi-collection coordination for complex business workflows
-- 5. Comprehensive monitoring and performance analytics
-- 6. Enterprise-grade event sourcing and audit trail capabilities
-- 7. Distributed processing support for high-availability scenarios
-- 8. SQL-style syntax for change stream configuration and management
-- 9. Integration with MongoDB's native change stream capabilities
-- 10. Production-ready scalability and operational monitoring

Best Practices for Change Data Capture Implementation

Event Processing Strategy Design

Essential principles for effective MongoDB Change Data Capture deployment:

  1. Event Filtering: Design comprehensive filtering strategies to process only relevant business events
  2. Business Logic Integration: Embed business rules directly into change stream pipelines for immediate processing
  3. Error Handling: Implement robust retry mechanisms and dead letter queues for failed event processing (see the sketch after this list)
  4. Performance Optimization: Configure change streams for optimal throughput with appropriate batch sizes
  5. Monitoring Strategy: Deploy comprehensive monitoring for change stream health and event processing metrics
  6. Scalability Planning: Design for horizontal scaling with distributed processing capabilities
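
A minimal sketch of the retry-and-dead-letter pattern from point 3, assuming a cdc_dead_letter collection and a caller-supplied handler function; the names are illustrative and not part of QueryLeaf or the engine above:

// Hypothetical retry wrapper with a dead letter collection for failed change events
async function processWithRetry(db, changeEvent, handler, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await handler(changeEvent);
      return true;
    } catch (err) {
      if (attempt === maxAttempts) {
        // Exhausted retries: park the event for offline inspection and replay
        await db.collection('cdc_dead_letter').insertOne({
          event: changeEvent,
          error: err.message,
          attempts: attempt,
          failedAt: new Date()
        });
        return false;
      }
      // Exponential backoff before the next attempt
      await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** attempt));
    }
  }
}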

Production Implementation

Optimize MongoDB Change Data Capture for enterprise-scale deployments:

  1. Distributed Processing: Implement leader election and workload distribution for high availability
  2. Event Sourcing: Maintain complete audit trails with event store and snapshot capabilities
  3. Real-time Analytics: Integrate change streams with analytics pipelines for immediate insights
  4. Security Implementation: Ensure proper authentication and authorization for change stream access
  5. Disaster Recovery: Plan for change stream recovery and replay capabilities (resume-token checkpointing is sketched after this list)
  6. Integration Patterns: Design microservices integration with event-driven architecture patterns
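
For point 5, every change event carries a resume token in its _id field, and the watch() options accept resumeAfter, so persisting the token after each processed event lets a restarted worker continue where it stopped. A minimal sketch, assuming a cdc_checkpoints collection for storing tokens:

// Hypothetical checkpointing so a restarted worker resumes the change stream
// from its last processed event instead of replaying or missing changes
async function watchWithCheckpoint(db, collectionName, handler) {
  const checkpoints = db.collection('cdc_checkpoints');
  const saved = await checkpoints.findOne({ _id: collectionName });

  const stream = db.collection(collectionName).watch([], {
    fullDocument: 'updateLookup',
    ...(saved ? { resumeAfter: saved.token } : {})
  });

  for await (const change of stream) {
    await handler(change);
    // Persist the resume token only after the event is fully processed
    await checkpoints.updateOne(
      { _id: collectionName },
      { $set: { token: change._id, updatedAt: new Date() } },
      { upsert: true }
    );
  }
}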

Conclusion

MongoDB Change Data Capture through Change Streams provides comprehensive real-time data change notification capabilities that enable responsive, event-driven applications without the performance overhead and latency of traditional polling approaches. The native MongoDB integration ensures that change capture benefits from the same reliability, scalability, and operational features as core database operations.

Key MongoDB Change Data Capture benefits include:

  • Real-Time Responsiveness: Immediate notification of data changes without polling latency or resource waste
  • Comprehensive Change Information: Complete change documents including before/after states and modification details
  • Advanced Filtering: Sophisticated change stream filtering and transformation capabilities within the database
  • Event Ordering: Guaranteed ordering and delivery of change events for consistent event processing
  • Horizontal Scalability: Native support for replica sets and sharded clusters with distributed change stream processing
  • Production Ready: Enterprise-grade reliability with automatic reconnection, resume tokens, and operational monitoring

Whether you're building real-time dashboards, event-driven microservices, collaborative applications, or reactive user experiences, MongoDB Change Data Capture with QueryLeaf's familiar SQL interface provides the foundation for responsive, event-driven architectures.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Change Streams while providing SQL-familiar syntax for change data capture configuration, event processing, and real-time monitoring. Advanced event routing, business logic integration, and distributed processing patterns are seamlessly handled through familiar SQL constructs, making sophisticated real-time capabilities accessible to SQL-oriented development teams.

The combination of MongoDB's robust change stream capabilities with SQL-style event processing makes it an ideal platform for modern applications that require both real-time responsiveness and familiar database interaction patterns, ensuring your event-driven solutions can scale efficiently while remaining maintainable and feature-rich.

MongoDB Time-Series Collections for IoT Data Processing: Edge Analytics and Real-Time Stream Processing

Modern IoT applications generate massive volumes of time-stamped sensor data requiring efficient storage, real-time analysis, and historical trend analysis. MongoDB's time-series collections provide specialized storage optimization and query capabilities designed specifically for time-ordered data workloads, enabling high-performance IoT data processing with familiar SQL-style analytics patterns.

Time-series collections in MongoDB automatically optimize storage layout, indexing strategies, and query execution for temporal data patterns, significantly improving performance for IoT sensor readings, device telemetry, financial market data, and application metrics while maintaining the flexibility to handle complex nested sensor data structures.

The IoT Data Processing Challenge

Consider a smart manufacturing facility with thousands of sensors generating continuous data streams:

// Traditional document storage - inefficient for time-series data
{
  "_id": ObjectId("..."),
  "device_id": "SENSOR_001_TEMP",
  "device_type": "temperature",
  "location": "assembly_line_1", 
  "timestamp": ISODate("2025-12-24T10:15:30.123Z"),
  "temperature_celsius": 23.5,
  "humidity_percent": 45.2,
  "pressure_bar": 1.013,
  "battery_level": 87,
  "signal_strength": -65,
  "metadata": {
    "firmware_version": "2.1.4",
    "calibration_date": ISODate("2025-11-15T00:00:00Z"),
    "maintenance_status": "ok"
  }
}

// Problems with traditional collections for IoT data:
// 1. Storage Overhead: Full document structure repeated for each reading
// 2. Index Inefficiency: Generic indexes not optimized for time-ordered queries
// 3. Query Performance: Range queries on timestamp fields are slow
// 4. Memory Usage: Large working sets for time-based aggregations
// 5. Disk I/O: Scattered document layout reduces sequential read performance
// 6. Scaling Issues: Hot-spotting on insertion due to monotonic timestamps
// 7. Compression: Limited compression opportunities with varied document structures

// Example of inefficient time-range query performance:
db.sensor_data.find({
  "device_id": "SENSOR_001_TEMP",
  "timestamp": {
    $gte: ISODate("2025-12-24T00:00:00Z"),
    $lt: ISODate("2025-12-24T01:00:00Z")
  }
}).explain("executionStats")
// Result: Full collection scan, high disk I/O, poor cache utilization

MongoDB time-series collections solve these challenges through specialized optimizations:

// Create optimized time-series collection for IoT data
db.createCollection("iot_sensor_readings", {
  timeseries: {
    timeField: "timestamp",           // Required: field containing the timestamp
    metaField: "device_info",         // Optional: field containing per-device metadata
    granularity: "seconds"            // Optimization hint: "seconds", "minutes", or "hours"
    // On MongoDB 6.3+, granularity can be replaced by explicit bucket controls:
    // bucketMaxSpanSeconds: 3600,    // Maximum time span per bucket (1 hour)
    // bucketRoundingSeconds: 3600    // Round bucket boundaries to hour marks
  },
  expireAfterSeconds: 7776000         // TTL: expire data after 90 days
})
// Time-series collections organize data into time-ordered buckets internally,
// so no separate clusteredIndex option is needed for time-ordered access

// Optimized IoT sensor data structure for time-series collections
{
  "timestamp": ISODate("2025-12-24T10:15:30.123Z"),    // Time field - always required
  "device_info": {                                      // Meta field - constant per device
    "device_id": "SENSOR_001_TEMP",
    "device_type": "temperature",
    "location": "assembly_line_1",
    "firmware_version": "2.1.4",
    "calibration_date": ISODate("2025-11-15T00:00:00Z")
  },
  // Measurement fields - vary over time
  "temperature_celsius": 23.5,
  "humidity_percent": 45.2,
  "pressure_bar": 1.013,
  "battery_level": 87,
  "signal_strength_dbm": -65,
  "status": "operational"
}
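
Secondary indexes still matter on time-series collections for selective access paths; recent MongoDB releases allow compound indexes that combine metaField subfields with the timeField. A small sketch targeting the per-device range query shown earlier (the index name is arbitrary):

// Compound secondary index to accelerate per-device time-range queries
db.iot_sensor_readings.createIndex(
  { "device_info.device_id": 1, "timestamp": 1 },
  { name: "device_id_timestamp" }
)

// The earlier range query can now use the index instead of scanning every bucket
db.iot_sensor_readings.find({
  "device_info.device_id": "SENSOR_001_TEMP",
  "timestamp": {
    $gte: ISODate("2025-12-24T00:00:00Z"),
    $lt: ISODate("2025-12-24T01:00:00Z")
  }
})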

IoT Data Ingestion and Streaming

High-Throughput Sensor Data Insertion

// Batch insertion for high-volume IoT data streams
const sensorReadings = [
  {
    timestamp: new Date("2025-12-24T10:15:00Z"),
    device_info: {
      device_id: "TEMP_001",
      location: "warehouse_zone_a",
      device_type: "environmental"
    },
    temperature_celsius: 22.1,
    humidity_percent: 43.5,
    battery_level: 89
  },
  {
    timestamp: new Date("2025-12-24T10:15:30Z"),
    device_info: {
      device_id: "TEMP_001", 
      location: "warehouse_zone_a",
      device_type: "environmental"
    },
    temperature_celsius: 22.3,
    humidity_percent: 43.2,
    battery_level: 89
  },
  {
    timestamp: new Date("2025-12-24T10:15:00Z"),
    device_info: {
      device_id: "PRESS_002",
      location: "hydraulic_system_1",
      device_type: "pressure"
    },
    pressure_bar: 15.7,
    flow_rate_lpm: 125.3,
    valve_position_percent: 67
  }
  // ... thousands more readings
];

// Efficient bulk insertion with time-series optimizations
const result = await db.iot_sensor_readings.insertMany(sensorReadings, {
  ordered: false,        // Allow parallel inserts for better performance
  writeConcern: {        // Balance between performance and durability
    w: 1,               // Acknowledge from primary only
    j: false            // Don't wait for journal sync for high throughput
  }
});

console.log(`Inserted ${result.insertedCount} sensor readings`);
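
With ordered: false, a single malformed reading does not abort the remaining inserts; the driver reports per-document failures through a bulk write error instead. A hedged sketch of handling that case follows; exact error property names can vary slightly between driver versions:

// Sketch: tolerate partial failures in an unordered bulk insert
try {
  const result = await db.iot_sensor_readings.insertMany(sensorReadings, { ordered: false });
  console.log(`Inserted ${result.insertedCount} sensor readings`);
} catch (error) {
  // Bulk write error: some documents failed, the rest were still written
  if (error.writeErrors) {
    console.warn(`Inserted ${error.result.insertedCount} readings, ${error.writeErrors.length} failed`);
    for (const writeError of error.writeErrors) {
      console.warn(`  index ${writeError.index}: ${writeError.errmsg}`);
    }
  } else {
    throw error;
  }
}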

Real-Time Data Streaming Pipeline

// MongoDB Change Streams for real-time IoT data processing
const changeStream = db.iot_sensor_readings.watch([
  {
    $match: {
      "operationType": "insert",
      // Filter for specific device types or locations
      "fullDocument.device_info.device_type": { $in: ["temperature", "pressure"] },
      "fullDocument.device_info.location": { $regex: /^production_line/ }
    }
  },
  {
    $project: {
      timestamp: "$fullDocument.timestamp",
      device_id: "$fullDocument.device_info.device_id",
      location: "$fullDocument.device_info.location",
      temperature: "$fullDocument.temperature_celsius",
      pressure: "$fullDocument.pressure_bar",
      inserted_at: "$clusterTime"   // clusterTime field from the change event document
    }
  }
], { fullDocument: 'updateLookup' });

// Real-time alert processing
changeStream.on('change', async (changeEvent) => {
  const { timestamp, device_id, location, temperature, pressure } = changeEvent;

  // Temperature threshold monitoring
  if (temperature !== undefined && temperature > 35.0) {
    await processTemperatureAlert({
      device_id,
      location,
      temperature,
      timestamp,
      severity: temperature > 40.0 ? 'critical' : 'warning'
    });
  }

  // Pressure threshold monitoring  
  if (pressure !== undefined && pressure > 20.0) {
    await processPressureAlert({
      device_id,
      location,
      pressure,
      timestamp,
      severity: pressure > 25.0 ? 'critical' : 'warning'
    });
  }

  // Update real-time dashboard
  await updateDashboardMetrics({
    device_id,
    location,
    latest_reading: { temperature, pressure, timestamp }
  });
});

async function processTemperatureAlert(alertData) {
  // Check for sustained high temperature
  const recentReadings = await db.iot_sensor_readings.aggregate([
    {
      $match: {
        "device_info.device_id": alertData.device_id,
        "timestamp": {
          $gte: new Date(Date.now() - 5 * 60 * 1000) // Last 5 minutes
        },
        "temperature_celsius": { $gt: 35.0 }
      }
    },
    {
      $group: {
        _id: null,
        avg_temperature: { $avg: "$temperature_celsius" },
        max_temperature: { $max: "$temperature_celsius" },
        reading_count: { $sum: 1 }
      }
    }
  ]).next();

  if (recentReadings && recentReadings.reading_count >= 3) {
    // Sustained high temperature - trigger maintenance alert
    await db.maintenance_alerts.insertOne({
      alert_type: "temperature_sustained_high",
      device_id: alertData.device_id,
      location: alertData.location,
      severity: alertData.severity,
      current_temperature: alertData.temperature,
      avg_temperature_5min: recentReadings.avg_temperature,
      max_temperature_5min: recentReadings.max_temperature,
      created_at: new Date(),
      acknowledged: false
    });

    // Send notification to operations team
    await sendAlert({
      type: 'email',
      recipients: ['operations@manufacturing.com'],
      subject: `High Temperature Alert - ${alertData.location}`,
      body: `Device ${alertData.device_id} reporting sustained high temperature: ${alertData.temperature}°C`
    });
  }
}

Time-Series Analytics and Aggregations

SQL-Style Time-Based Analytics

// Advanced time-series aggregation for IoT analytics
db.iot_sensor_readings.aggregate([
  // Stage 1: Filter recent sensor data
  {
    $match: {
      "timestamp": {
        $gte: ISODate("2025-12-24T00:00:00Z"),
        $lt: ISODate("2025-12-25T00:00:00Z")
      },
      "device_info.location": { $regex: /^production_line/ }
    }
  },

  // Stage 2: Time-based grouping (hourly buckets)
  {
    $group: {
      _id: {
        device_id: "$device_info.device_id",
        location: "$device_info.location", 
        device_type: "$device_info.device_type",
        hour: {
          $dateToString: {
            format: "%Y-%m-%d %H:00:00",
            date: "$timestamp"
          }
        }
      },

      // Temperature analytics
      avg_temperature: { $avg: "$temperature_celsius" },
      min_temperature: { $min: "$temperature_celsius" },
      max_temperature: { $max: "$temperature_celsius" },
      temperature_readings: { $sum: { $cond: [{ $ne: ["$temperature_celsius", null] }, 1, 0] } },

      // Pressure analytics
      avg_pressure: { $avg: "$pressure_bar" },
      min_pressure: { $min: "$pressure_bar" },
      max_pressure: { $max: "$pressure_bar" },
      pressure_readings: { $sum: { $cond: [{ $ne: ["$pressure_bar", null] }, 1, 0] } },

      // Humidity analytics
      avg_humidity: { $avg: "$humidity_percent" },
      min_humidity: { $min: "$humidity_percent" },
      max_humidity: { $max: "$humidity_percent" },

      // Battery level monitoring
      avg_battery: { $avg: "$battery_level" },
      min_battery: { $min: "$battery_level" },
      low_battery_count: { 
        $sum: { $cond: [{ $and: [{ $ne: ["$battery_level", null] }, { $lt: ["$battery_level", 20] }] }, 1, 0] }
      },

      // Data quality metrics
      total_readings: { $sum: 1 },
      missing_data_count: { 
        $sum: { 
          $cond: [
            {
              $and: [
                { $eq: ["$temperature_celsius", null] },
                { $eq: ["$pressure_bar", null] },
                { $eq: ["$humidity_percent", null] }
              ]
            }, 
            1, 
            0
          ]
        }
      },

      // Signal quality
      avg_signal_strength: { $avg: "$signal_strength_dbm" },
      weak_signal_count: {
        $sum: { $cond: [{ $and: [{ $ne: ["$signal_strength_dbm", null] }, { $lt: ["$signal_strength_dbm", -80] }] }, 1, 0] }
      },

      first_reading_time: { $min: "$timestamp" },
      last_reading_time: { $max: "$timestamp" }
    }
  },

  // Stage 3: Calculate derived metrics and data quality indicators
  {
    $addFields: {
      // Temperature variation coefficient
      temperature_variation_coefficient: {
        $cond: [
          { $gt: ["$avg_temperature", 0] },
          {
            $divide: [
              { $subtract: ["$max_temperature", "$min_temperature"] },
              "$avg_temperature"
            ]
          },
          null
        ]
      },

      // Pressure stability indicator
      pressure_stability_score: {
        $cond: [
          { $and: [{ $gt: ["$avg_pressure", 0] }, { $gt: ["$pressure_readings", 10] }] },
          {
            $subtract: [
              1,
              {
                $divide: [
                  { $subtract: ["$max_pressure", "$min_pressure"] },
                  { $multiply: ["$avg_pressure", 2] }
                ]
              }
            ]
          },
          null
        ]
      },

      // Data completeness percentage
      data_completeness_percent: {
        $multiply: [
          {
            $divide: [
              { $subtract: ["$total_readings", "$missing_data_count"] },
              "$total_readings"
            ]
          },
          100
        ]
      },

      // Equipment health score (composite metric)
      equipment_health_score: {
        $multiply: [
          {
            $avg: [
              // Battery health factor (0-1)
              { $divide: ["$avg_battery", 100] },

              // Signal quality factor (0-1)
              { 
                $cond: [
                  { $ne: ["$avg_signal_strength", null] },
                  { $divide: [{ $add: ["$avg_signal_strength", 100] }, 100] },
                  0.5
                ]
              },

              // Data quality factor (0-1) - computed inline because fields added
              // in the same $addFields stage cannot reference each other
              {
                $divide: [
                  { $subtract: ["$total_readings", "$missing_data_count"] },
                  "$total_readings"
                ]
              }
            ]
          },
          100
        ]
      }
    }
  },

  // Stage 4: Quality and threshold analysis
  {
    $addFields: {
      temperature_status: {
        $switch: {
          branches: [
            { case: { $gt: ["$max_temperature", 40] }, then: "critical" },
            { case: { $gt: ["$avg_temperature", 35] }, then: "warning" },
            { case: { $lt: ["$avg_temperature", 15] }, then: "too_cold" },
            { case: { $gt: ["$temperature_variation_coefficient", 0.3] }, then: "unstable" }
          ],
          default: "normal"
        }
      },

      pressure_status: {
        $switch: {
          branches: [
            { case: { $gt: ["$max_pressure", 25] }, then: "critical" },
            { case: { $gt: ["$avg_pressure", 20] }, then: "warning" },
            { case: { $lt: ["$pressure_stability_score", 0.7] }, then: "unstable" }
          ],
          default: "normal"
        }
      },

      battery_status: {
        $switch: {
          branches: [
            { case: { $lt: ["$min_battery", 10] }, then: "critical" },
            { case: { $lt: ["$avg_battery", 20] }, then: "low" },
            { case: { $gt: ["$low_battery_count", 5] }, then: "degrading" }
          ],
          default: "normal"
        }
      }
    }
  },

  // Stage 5: Overall status (separate stage so it can reference the status
  // fields computed in Stage 4; $addFields cannot read same-stage fields)
  {
    $addFields: {
      overall_status: {
        $switch: {
          branches: [
            { 
              case: { 
                $or: [
                  { $eq: ["$temperature_status", "critical"] },
                  { $eq: ["$pressure_status", "critical"] },
                  { $eq: ["$battery_status", "critical"] }
                ]
              }, 
              then: "critical" 
            },
            {
              case: {
                $or: [
                  { $eq: ["$temperature_status", "warning"] },
                  { $eq: ["$pressure_status", "warning"] },
                  { $eq: ["$battery_status", "low"] },
                  { $lt: ["$data_completeness_percent", 90] }
                ]
              },
              then: "warning"
            }
          ],
          default: "normal"
        }
      }
    }
  },

  // Stage 6: Sort and format results
  {
    $sort: {
      "_id.location": 1,
      "_id.device_id": 1,
      "_id.hour": 1
    }
  },

  // Stage 7: Project final analytics results
  {
    $project: {
      device_id: "$_id.device_id",
      location: "$_id.location",
      device_type: "$_id.device_type",
      hour: "$_id.hour",

      // Environmental metrics
      temperature_metrics: {
        average: { $round: ["$avg_temperature", 1] },
        minimum: { $round: ["$min_temperature", 1] },
        maximum: { $round: ["$max_temperature", 1] },
        variation_coefficient: { $round: ["$temperature_variation_coefficient", 3] },
        reading_count: "$temperature_readings",
        status: "$temperature_status"
      },

      pressure_metrics: {
        average: { $round: ["$avg_pressure", 2] },
        minimum: { $round: ["$min_pressure", 2] },
        maximum: { $round: ["$max_pressure", 2] },
        stability_score: { $round: ["$pressure_stability_score", 3] },
        reading_count: "$pressure_readings", 
        status: "$pressure_status"
      },

      humidity_metrics: {
        average: { $round: ["$avg_humidity", 1] },
        minimum: { $round: ["$min_humidity", 1] },
        maximum: { $round: ["$max_humidity", 1] }
      },

      // Equipment health
      equipment_metrics: {
        battery_average: { $round: ["$avg_battery", 1] },
        battery_minimum: "$min_battery",
        low_battery_incidents: "$low_battery_count",
        battery_status: "$battery_status",
        signal_strength_avg: { $round: ["$avg_signal_strength", 1] },
        weak_signal_count: "$weak_signal_count",
        health_score: { $round: ["$equipment_health_score", 1] }
      },

      // Data quality
      data_quality: {
        total_readings: "$total_readings",
        completeness_percent: { $round: ["$data_completeness_percent", 1] },
        missing_readings: "$missing_data_count",
        time_span_minutes: {
          $divide: [
            { $subtract: ["$last_reading_time", "$first_reading_time"] },
            60000
          ]
        }
      },

      overall_status: "$overall_status",
      analysis_timestamp: "$$NOW"
    }
  }
])

Moving Averages and Trend Analysis

// Calculate moving averages and trend detection for predictive maintenance
db.iot_sensor_readings.aggregate([
  {
    $match: {
      "device_info.device_id": "MOTOR_PUMP_001",
      "timestamp": {
        $gte: ISODate("2025-12-20T00:00:00Z"),
        $lt: ISODate("2025-12-25T00:00:00Z")
      }
    }
  },

  // Sort by timestamp for window functions
  { $sort: { "timestamp": 1 } },

  // Calculate moving averages using sliding windows
  {
    $setWindowFields: {
      partitionBy: "$device_info.device_id",
      sortBy: { "timestamp": 1 },
      output: {
        // 5-minute moving average for vibration
        vibration_ma_5min: {
          $avg: "$vibration_amplitude_mm",
          window: {
            range: [-300, 0], // 5 minutes in seconds
            unit: "second"
          }
        },

        // 15-minute moving average for temperature
        temperature_ma_15min: {
          $avg: "$temperature_celsius",
          window: {
            range: [-900, 0], // 15 minutes in seconds
            unit: "second"
          }
        },

        // 1-hour moving average for pressure
        pressure_ma_1hour: {
          $avg: "$pressure_bar",
          window: {
            range: [-3600, 0], // 1 hour in seconds
            unit: "second"
          }
        },

        // Rolling standard deviation for anomaly detection
        vibration_std_5min: {
          $stdDevSamp: "$vibration_amplitude_mm",
          window: {
            range: [-300, 0],
            unit: "second"
          }
        },

        // Previous reading for trend calculation
        prev_vibration: {
          $shift: {
            output: "$vibration_amplitude_mm",
            by: -1
          }
        },

        // Lagged 5-minute moving average (window ending one minute earlier),
        // used below for trend direction; $shift cannot wrap a window operator
        prev_vibration_ma: {
          $avg: "$vibration_amplitude_mm",
          window: {
            range: [-360, -60], // the 5-minute window that ended 1 minute ago
            unit: "second"
          }
        }
      }
    }
  },

  // Calculate derived trend metrics
  {
    $addFields: {
      // Vibration trend direction
      vibration_trend: {
        $cond: [
          { $and: [{ $ne: ["$vibration_ma_5min", null] }, { $ne: ["$prev_vibration_ma", null] }] },
          {
            $switch: {
              branches: [
                { 
                  case: { $gt: [{ $subtract: ["$vibration_ma_5min", "$prev_vibration_ma"] }, 0.1] },
                  then: "increasing"
                },
                {
                  case: { $lt: [{ $subtract: ["$vibration_ma_5min", "$prev_vibration_ma"] }, -0.1] },
                  then: "decreasing" 
                }
              ],
              default: "stable"
            }
          },
          null
        ]
      },

      // Anomaly detection using z-score
      vibration_anomaly_score: {
        $cond: [
          { $and: [{ $gt: ["$vibration_std_5min", 0] }, { $ne: ["$vibration_ma_5min", null] }] },
          {
            $abs: {
              $divide: [
                { $subtract: ["$vibration_amplitude_mm", "$vibration_ma_5min"] },
                "$vibration_std_5min"
              ]
            }
          },
          null
        ]
      }
    }
  },

  // Compute the composite risk score in a separate stage so it can reference
  // vibration_trend and vibration_anomaly_score added in the stage above
  {
    $addFields: {
      // Predictive maintenance indicators (weighted factors sum to a 0-100 scale)
      maintenance_risk_score: {
        $add: [
              // High vibration factor
              { $cond: [{ $gt: ["$vibration_ma_5min", 2.5] }, 25, 0] },

              // Increasing vibration trend factor
              { $cond: [{ $eq: ["$vibration_trend", "increasing"] }, 15, 0] },

              // High temperature factor
              { $cond: [{ $gt: ["$temperature_ma_15min", 75] }, 20, 0] },

              // Anomaly factor
              { $cond: [{ $gt: ["$vibration_anomaly_score", 2] }, 30, 0] },

              // Pressure variation factor
              { $cond: [{ $gt: [{ $abs: { $subtract: ["$pressure_bar", "$pressure_ma_1hour"] } }, 2] }, 10, 0] }
            ]
      }
    }
  },

  // Filter to significant readings and add maintenance recommendations
  {
    $match: {
      $or: [
        { "vibration_anomaly_score": { $gt: 1.5 } },
        { "maintenance_risk_score": { $gt: 30 } },
        { "vibration_trend": "increasing" }
      ]
    }
  },

  // Add maintenance recommendations
  {
    $addFields: {
      maintenance_recommendation: {
        $switch: {
          branches: [
            {
              case: { $gt: ["$maintenance_risk_score", 70] },
              then: {
                priority: "immediate",
                action: "schedule_emergency_inspection",
                description: "High risk indicators detected - immediate inspection required"
              }
            },
            {
              case: { $gt: ["$maintenance_risk_score", 50] },
              then: {
                priority: "high",
                action: "schedule_maintenance_window",
                description: "Elevated risk indicators - schedule maintenance within 24 hours"
              }
            },
            {
              case: { $gt: ["$maintenance_risk_score", 30] },
              then: {
                priority: "medium",
                action: "monitor_closely",
                description: "Potential issues detected - increase monitoring frequency"
              }
            }
          ],
          default: {
            priority: "low", 
            action: "continue_monitoring",
            description: "Minor anomalies detected - continue standard monitoring"
          }
        }
      }
    }
  },

  // Project final predictive maintenance report
  {
    $project: {
      timestamp: 1,
      device_id: "$device_info.device_id",

      current_readings: {
        vibration_amplitude: "$vibration_amplitude_mm",
        temperature: "$temperature_celsius",
        pressure: "$pressure_bar"
      },

      moving_averages: {
        vibration_5min: { $round: ["$vibration_ma_5min", 2] },
        temperature_15min: { $round: ["$temperature_ma_15min", 1] },
        pressure_1hour: { $round: ["$pressure_ma_1hour", 2] }
      },

      trend_analysis: {
        vibration_trend: "$vibration_trend",
        anomaly_score: { $round: ["$vibration_anomaly_score", 2] },
        risk_score: { $round: ["$maintenance_risk_score", 0] }
      },

      maintenance_recommendation: 1,
      analysis_timestamp: "$$NOW"
    }
  },

  { $sort: { "timestamp": -1 } },
  { $limit: 100 }
])

Edge Computing and Local Processing

Edge Analytics with Local Aggregation

// Edge device local aggregation before cloud synchronization
class IoTEdgeProcessor {
  constructor(deviceConfig) {
    this.deviceId = deviceConfig.deviceId;
    this.location = deviceConfig.location;
    this.aggregationWindow = deviceConfig.aggregationWindow || 60; // seconds
    this.localBuffer = [];
    this.thresholds = deviceConfig.thresholds || {};
    this.localAlertsCount = 0;
  }

  // Process incoming sensor reading at edge
  async processSensorReading(reading) {
    const enhancedReading = {
      ...reading,
      timestamp: new Date(),
      device_info: {
        device_id: this.deviceId,
        location: this.location,
        edge_processed: true
      }
    };

    // Add to local buffer
    this.localBuffer.push(enhancedReading);

    // Check for immediate alerts
    await this.checkAlertConditions(enhancedReading);

    // Perform local aggregation if buffer is full
    if (this.shouldAggregate()) {
      await this.performLocalAggregation();
    }

    return enhancedReading;
  }

  shouldAggregate() {
    if (this.localBuffer.length === 0) return false;

    const oldestReading = this.localBuffer[0];
    const currentTime = new Date();
    const timeDiff = (currentTime - oldestReading.timestamp) / 1000;

    return timeDiff >= this.aggregationWindow || this.localBuffer.length >= 100;
  }

  async performLocalAggregation() {
    if (this.localBuffer.length === 0) return;

    const aggregationPeriod = {
      start: this.localBuffer[0].timestamp,
      end: this.localBuffer[this.localBuffer.length - 1].timestamp
    };

    // Calculate edge aggregations
    const aggregatedData = {
      timestamp: aggregationPeriod.start,
      device_info: {
        device_id: this.deviceId,
        location: this.location,
        aggregation_type: "edge_local",
        reading_count: this.localBuffer.length
      },

      // Temperature aggregations
      temperature_metrics: this.calculateFieldMetrics(this.localBuffer, 'temperature_celsius'),

      // Pressure aggregations  
      pressure_metrics: this.calculateFieldMetrics(this.localBuffer, 'pressure_bar'),

      // Humidity aggregations
      humidity_metrics: this.calculateFieldMetrics(this.localBuffer, 'humidity_percent'),

      // Battery and signal quality
      battery_level: this.calculateFieldMetrics(this.localBuffer, 'battery_level'),
      signal_strength: this.calculateFieldMetrics(this.localBuffer, 'signal_strength_dbm'),

      // Data quality indicators
      data_quality: {
        total_readings: this.localBuffer.length,
        time_span_seconds: (aggregationPeriod.end - aggregationPeriod.start) / 1000,
        missing_data_count: this.countMissingData(),
        completeness_percent: this.calculateDataCompleteness()
      },

      // Edge-specific metadata
      edge_metadata: {
        aggregated_at: new Date(),
        local_alerts_triggered: this.localAlertsCount,
        network_quality: this.getNetworkQuality(),
        processing_latency_ms: Date.now() - aggregationPeriod.end.getTime()
      }
    };

    // Send to cloud database
    await this.sendToCloud(aggregatedData);

    // Keep recent raw data, clear older entries
    this.localBuffer = this.localBuffer.slice(-10); // Keep last 10 readings
    this.localAlertsCount = 0;
  }

  calculateFieldMetrics(buffer, fieldName) {
    const values = buffer
      .map(reading => reading[fieldName])
      .filter(value => value !== null && value !== undefined);

    if (values.length === 0) return null;

    const sorted = [...values].sort((a, b) => a - b);

    return {
      average: values.reduce((sum, val) => sum + val, 0) / values.length,
      minimum: Math.min(...values),
      maximum: Math.max(...values),
      median: sorted[Math.floor(sorted.length / 2)],
      standard_deviation: this.calculateStandardDeviation(values),
      reading_count: values.length,
      trend: this.calculateTrend(values)
    };
  }

  calculateStandardDeviation(values) {
    const avg = values.reduce((sum, val) => sum + val, 0) / values.length;
    const squaredDiffs = values.map(val => Math.pow(val - avg, 2));
    const variance = squaredDiffs.reduce((sum, val) => sum + val, 0) / values.length;
    return Math.sqrt(variance);
  }

  calculateTrend(values) {
    if (values.length < 3) return "insufficient_data";

    const firstHalf = values.slice(0, Math.floor(values.length / 2));
    const secondHalf = values.slice(Math.floor(values.length / 2));

    const firstAvg = firstHalf.reduce((sum, val) => sum + val, 0) / firstHalf.length;
    const secondAvg = secondHalf.reduce((sum, val) => sum + val, 0) / secondHalf.length;

    const difference = secondAvg - firstAvg;
    const threshold = Math.abs(firstAvg) * 0.05; // 5% threshold

    if (Math.abs(difference) < threshold) return "stable";
    return difference > 0 ? "increasing" : "decreasing";
  }

  async checkAlertConditions(reading) {
    const alerts = [];

    // Temperature alerts
    if (reading.temperature_celsius !== undefined) {
      if (reading.temperature_celsius > (this.thresholds.temperature_critical || 40)) {
        alerts.push({
          type: "temperature_critical",
          value: reading.temperature_celsius,
          threshold: this.thresholds.temperature_critical,
          severity: "critical"
        });
      } else if (reading.temperature_celsius > (this.thresholds.temperature_warning || 35)) {
        alerts.push({
          type: "temperature_warning", 
          value: reading.temperature_celsius,
          threshold: this.thresholds.temperature_warning,
          severity: "warning"
        });
      }
    }

    // Battery alerts
    if (reading.battery_level !== undefined && reading.battery_level < 15) {
      alerts.push({
        type: "battery_low",
        value: reading.battery_level,
        threshold: 15,
        severity: "warning"
      });
    }

    // Process alerts locally
    for (const alert of alerts) {
      await this.processEdgeAlert(alert, reading);
      this.localAlertsCount = (this.localAlertsCount || 0) + 1;
    }
  }

  async processEdgeAlert(alert, reading) {
    const alertData = {
      alert_id: `edge_${this.deviceId}_${Date.now()}`,
      device_id: this.deviceId,
      location: this.location,
      alert_type: alert.type,
      severity: alert.severity,
      triggered_value: alert.value,
      threshold_value: alert.threshold,
      reading_timestamp: reading.timestamp,
      processed_at_edge: new Date(),
      raw_reading: reading
    };

    // Store alert locally for immediate action
    await this.storeLocalAlert(alertData);

    // If critical, try immediate cloud notification
    if (alert.severity === "critical") {
      await this.sendCriticalAlertToCloud(alertData);
    }
  }

  async sendToCloud(aggregatedData) {
    try {
      await db.iot_edge_aggregations.insertOne(aggregatedData);
    } catch (error) {
      console.error('Failed to send aggregated data to cloud:', error);
      // Store locally for later retry
      await this.queueForRetry(aggregatedData);
    }
  }

  getNetworkQuality() {
    // Simulate network quality assessment
    return {
      signal_strength: Math.floor(Math.random() * 100),
      latency_ms: Math.floor(Math.random() * 200) + 50,
      bandwidth_mbps: Math.floor(Math.random() * 100) + 10
    };
  }
}

SQL Integration with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB time-series operations:

-- QueryLeaf SQL syntax for MongoDB time-series analytics

-- Basic time-series data selection with SQL syntax
SELECT 
    timestamp,
    device_info.device_id,
    device_info.location,
    temperature_celsius,
    pressure_bar,
    humidity_percent,
    battery_level
FROM iot_sensor_readings
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  AND device_info.location LIKE 'production_line_%'
  AND temperature_celsius IS NOT NULL
ORDER BY timestamp DESC
LIMIT 1000;

-- Time-based aggregation with SQL window functions
SELECT 
    device_info.device_id,
    device_info.location,
    DATE_TRUNC('hour', timestamp) AS hour,

    -- Temperature analytics
    AVG(temperature_celsius) AS avg_temperature,
    MIN(temperature_celsius) AS min_temperature,  
    MAX(temperature_celsius) AS max_temperature,
    STDDEV(temperature_celsius) AS temp_std_deviation,

    -- Pressure analytics
    AVG(pressure_bar) AS avg_pressure,
    MIN(pressure_bar) AS min_pressure,
    MAX(pressure_bar) AS max_pressure,

    -- Data quality metrics
    COUNT(*) AS total_readings,
    COUNT(temperature_celsius) AS temp_reading_count,
    COUNT(pressure_bar) AS pressure_reading_count,
    (COUNT(temperature_celsius) * 100.0 / COUNT(*)) AS temp_data_completeness_pct

FROM iot_sensor_readings
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days'
  AND device_info.device_type IN ('environmental', 'pressure')
GROUP BY device_info.device_id, device_info.location, DATE_TRUNC('hour', timestamp)
HAVING COUNT(*) >= 10  -- Ensure sufficient data points
ORDER BY device_info.location, device_info.device_id, hour;

-- Moving averages using SQL window functions
SELECT 
    timestamp,
    device_info.device_id,
    temperature_celsius,
    pressure_bar,

    -- Moving averages with time-based windows
    AVG(temperature_celsius) OVER (
        PARTITION BY device_info.device_id 
        ORDER BY timestamp 
        RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
    ) AS temperature_ma_5min,

    AVG(pressure_bar) OVER (
        PARTITION BY device_info.device_id
        ORDER BY timestamp
        RANGE BETWEEN INTERVAL '15 minutes' PRECEDING AND CURRENT ROW  
    ) AS pressure_ma_15min,

    -- Standard deviation for anomaly detection
    STDDEV(temperature_celsius) OVER (
        PARTITION BY device_info.device_id
        ORDER BY timestamp
        RANGE BETWEEN INTERVAL '10 minutes' PRECEDING AND CURRENT ROW
    ) AS temperature_rolling_std,

    -- Previous values for trend calculation
    LAG(temperature_celsius, 1) OVER (
        PARTITION BY device_info.device_id 
        ORDER BY timestamp
    ) AS prev_temperature,

    -- Z-score calculation for anomaly detection
    (temperature_celsius - AVG(temperature_celsius) OVER (
        PARTITION BY device_info.device_id
        ORDER BY timestamp  
        RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW
    )) / NULLIF(STDDEV(temperature_celsius) OVER (
        PARTITION BY device_info.device_id
        ORDER BY timestamp
        RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW  
    ), 0) AS temperature_z_score

FROM iot_sensor_readings
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 day'
  AND device_info.device_id = 'TEMP_SENSOR_001'
ORDER BY timestamp;

-- Anomaly detection with SQL pattern matching
WITH sensor_analytics AS (
    SELECT 
        timestamp,
        device_info.device_id,
        device_info.location,
        temperature_celsius,
        pressure_bar,

        -- Calculate moving statistics
        AVG(temperature_celsius) OVER w AS temp_avg,
        STDDEV(temperature_celsius) OVER w AS temp_std,
        AVG(pressure_bar) OVER w AS pressure_avg, 
        STDDEV(pressure_bar) OVER w AS pressure_std

    FROM iot_sensor_readings
    WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '2 days'
    WINDOW w AS (
        PARTITION BY device_info.device_id
        ORDER BY timestamp
        RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW
    )
),

anomaly_detection AS (
    SELECT *,
        -- Temperature anomaly score (Z-score)
        ABS(temperature_celsius - temp_avg) / NULLIF(temp_std, 0) AS temp_anomaly_score,

        -- Pressure anomaly score  
        ABS(pressure_bar - pressure_avg) / NULLIF(pressure_std, 0) AS pressure_anomaly_score,

        -- Classify anomalies
        CASE 
            WHEN ABS(temperature_celsius - temp_avg) / NULLIF(temp_std, 0) > 3 THEN 'severe'
            WHEN ABS(temperature_celsius - temp_avg) / NULLIF(temp_std, 0) > 2 THEN 'moderate'  
            WHEN ABS(temperature_celsius - temp_avg) / NULLIF(temp_std, 0) > 1.5 THEN 'mild'
            ELSE 'normal'
        END AS temperature_anomaly_level,

        CASE
            WHEN ABS(pressure_bar - pressure_avg) / NULLIF(pressure_std, 0) > 3 THEN 'severe'
            WHEN ABS(pressure_bar - pressure_avg) / NULLIF(pressure_std, 0) > 2 THEN 'moderate'
            WHEN ABS(pressure_bar - pressure_avg) / NULLIF(pressure_std, 0) > 1.5 THEN 'mild' 
            ELSE 'normal'
        END AS pressure_anomaly_level

    FROM sensor_analytics
    WHERE temp_std > 0 AND pressure_std > 0
)

SELECT 
    timestamp,
    device_id,
    location,
    temperature_celsius,
    pressure_bar,
    temp_anomaly_score,
    pressure_anomaly_score,
    temperature_anomaly_level,
    pressure_anomaly_level,

    -- Combined risk assessment
    CASE 
        WHEN temperature_anomaly_level IN ('severe', 'moderate') 
             OR pressure_anomaly_level IN ('severe', 'moderate') THEN 'high_risk'
        WHEN temperature_anomaly_level = 'mild' 
             OR pressure_anomaly_level = 'mild' THEN 'medium_risk'
        ELSE 'low_risk'
    END AS overall_risk_level,

    -- Maintenance recommendation
    CASE
        WHEN temperature_anomaly_level = 'severe' OR pressure_anomaly_level = 'severe' 
            THEN 'immediate_inspection_required'
        WHEN temperature_anomaly_level = 'moderate' OR pressure_anomaly_level = 'moderate'
            THEN 'schedule_maintenance_check'
        WHEN temperature_anomaly_level = 'mild' OR pressure_anomaly_level = 'mild'
            THEN 'monitor_closely'
        ELSE 'continue_normal_monitoring'
    END AS maintenance_action

FROM anomaly_detection  
WHERE temperature_anomaly_level != 'normal' OR pressure_anomaly_level != 'normal'
ORDER BY timestamp DESC, temp_anomaly_score DESC;

-- Predictive maintenance analytics
WITH equipment_health_trends AS (
    SELECT 
        device_info.device_id,
        device_info.location,
        DATE_TRUNC('day', timestamp) AS date,

        -- Daily health metrics
        AVG(temperature_celsius) AS avg_daily_temp,
        MAX(temperature_celsius) AS max_daily_temp,
        STDDEV(temperature_celsius) AS daily_temp_variation,

        AVG(pressure_bar) AS avg_daily_pressure,
        MAX(pressure_bar) AS max_daily_pressure,
        STDDEV(pressure_bar) AS daily_pressure_variation,

        AVG(battery_level) AS avg_daily_battery,
        MIN(battery_level) AS min_daily_battery,

        COUNT(*) AS daily_reading_count,
        COUNT(CASE WHEN temperature_celsius > 35 THEN 1 END) AS high_temp_incidents,
        COUNT(CASE WHEN pressure_bar > 20 THEN 1 END) AS high_pressure_incidents

    FROM iot_sensor_readings
    WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '30 days'
    GROUP BY device_info.device_id, device_info.location, DATE_TRUNC('day', timestamp)
),

health_score_calculation AS (
    SELECT *,
        -- Temperature health factor (0-100)
        GREATEST(0, 100 - (max_daily_temp - 20) * 5) AS temp_health_factor,

        -- Pressure health factor (0-100)  
        GREATEST(0, 100 - (max_daily_pressure - 15) * 10) AS pressure_health_factor,

        -- Battery health factor (0-100)
        avg_daily_battery AS battery_health_factor,

        -- Data quality factor (0-100)
        LEAST(100, daily_reading_count / 1440.0 * 100) AS data_quality_factor, -- Assuming 1 reading per minute ideal

        -- Stability factor (0-100) - lower variation is better
        GREATEST(0, 100 - daily_temp_variation * 10) AS temp_stability_factor,
        GREATEST(0, 100 - daily_pressure_variation * 20) AS pressure_stability_factor

    FROM equipment_health_trends
),

predictive_scoring AS (
    SELECT *,
        -- Overall equipment health score
        (temp_health_factor * 0.25 + 
         pressure_health_factor * 0.25 + 
         battery_health_factor * 0.20 + 
         data_quality_factor * 0.10 +
         temp_stability_factor * 0.10 +
         pressure_stability_factor * 0.10) AS daily_health_score,

        -- Trend analysis using moving average
        AVG(temp_health_factor) OVER (
            PARTITION BY device_id 
            ORDER BY date 
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS temp_health_trend_7day,

        AVG(pressure_health_factor) OVER (
            PARTITION BY device_id
            ORDER BY date
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW  
        ) AS pressure_health_trend_7day

    FROM health_score_calculation
)

SELECT 
    device_id,
    location,
    date,
    daily_health_score,
    temp_health_factor,
    pressure_health_factor,
    battery_health_factor,

    -- Health trend indicators
    temp_health_trend_7day,
    pressure_health_trend_7day,

    -- Predictive maintenance classification
    CASE 
        WHEN daily_health_score < 30 THEN 'critical_maintenance_needed'
        WHEN daily_health_score < 50 THEN 'maintenance_recommended' 
        WHEN daily_health_score < 70 THEN 'monitor_closely'
        ELSE 'healthy'
    END AS maintenance_status,

    -- Failure risk prediction
    CASE
        WHEN daily_health_score < 40 AND temp_health_trend_7day < temp_health_factor THEN 'high_failure_risk'
        WHEN daily_health_score < 60 AND (temp_health_trend_7day < temp_health_factor OR pressure_health_trend_7day < pressure_health_factor) THEN 'medium_failure_risk'
        ELSE 'low_failure_risk'
    END AS failure_risk_level,

    -- Recommended actions
    CASE
        WHEN daily_health_score < 30 THEN 'schedule_immediate_inspection'
        WHEN daily_health_score < 50 AND temp_health_trend_7day < 50 THEN 'schedule_preventive_maintenance'
        WHEN daily_health_score < 70 THEN 'increase_monitoring_frequency'
        ELSE 'continue_standard_monitoring'
    END AS recommended_action

FROM predictive_scoring
WHERE date >= CURRENT_DATE - INTERVAL '7 days'
ORDER BY device_id, date DESC;

-- QueryLeaf automatically translates these SQL operations to optimized MongoDB time-series aggregations:
-- 1. DATE_TRUNC functions become MongoDB date aggregation operators (see the pipeline sketch below)
-- 2. Window functions translate to MongoDB $setWindowFields operations  
-- 3. Statistical functions map to MongoDB aggregation operators
-- 4. Complex CASE statements become MongoDB $switch expressions
-- 5. Time-based WHERE clauses leverage time-series index optimizations
-- 6. Multi-table operations use MongoDB $lookup for cross-collection analytics
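As an illustration of points 1-3 above, the hourly aggregation query earlier in this section maps conceptually to a pipeline like the one below. This is a hand-written sketch rather than QueryLeaf's actual generated output; the collection name matches the examples above.

// Sketch: MongoDB equivalents for DATE_TRUNC grouping and a window-function moving average
db.iot_sensor_readings.aggregate([
  { $match: { timestamp: { $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) } } },

  // DATE_TRUNC('hour', timestamp) + GROUP BY -> $dateTrunc inside $group
  {
    $group: {
      _id: {
        device_id: "$device_info.device_id",
        hour: { $dateTrunc: { date: "$timestamp", unit: "hour" } }
      },
      avg_temperature: { $avg: "$temperature_celsius" },
      max_temperature: { $max: "$temperature_celsius" },
      total_readings: { $sum: 1 }
    }
  },

  // AVG(...) OVER (PARTITION BY ... ORDER BY ...) -> $setWindowFields
  {
    $setWindowFields: {
      partitionBy: "$_id.device_id",
      sortBy: { "_id.hour": 1 },
      output: {
        temperature_ma_3h: {
          $avg: "$avg_temperature",
          window: { documents: [-2, 0] }  // current hour plus the two before it
        }
      }
    }
  }
]);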

Best Practices for Production IoT Systems

Performance Optimization for High-Volume IoT Data

  1. Collection Design: Use appropriate time-series collection settings for your data granularity and retention requirements (see the setup sketch after this list)
  2. Index Strategy: Create compound indexes on metaField + timeField for optimal query performance
  3. Bucketing Configuration: Set granularity and bucket parameters based on your query patterns
  4. TTL Management: Implement data lifecycle policies with expireAfterSeconds for automatic data expiration
  5. Batch Processing: Use bulk insertions and optimize write operations for high-throughput scenarios
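As a minimal setup sketch covering points 1-4 above (the granularity, retention window, and index shape are assumptions to adapt to your own workload):

// Sketch: time-series collection with bucketing granularity, TTL expiry,
// and a compound metaField + timeField index
db.createCollection("iot_sensor_readings", {
  timeseries: {
    timeField: "timestamp",
    metaField: "device_info",
    granularity: "minutes"                 // roughly one reading per minute
  },
  expireAfterSeconds: 60 * 60 * 24 * 90    // retain about 90 days of raw data
});

db.iot_sensor_readings.createIndex({ "device_info.device_id": 1, timestamp: 1 });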

Data Quality and Monitoring

  1. Validation: Implement schema validation for IoT data structure consistency
  2. Anomaly Detection: Build real-time anomaly detection using statistical analysis and machine learning
  3. Data Completeness: Monitor and alert on missing data or device connectivity issues (a monitoring sketch follows this list)
  4. Performance Metrics: Track insertion rates, query performance, and storage utilization
  5. Alert Systems: Implement multi-level alerting for device health, data quality, and system performance
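A rough sketch of the completeness monitoring described in point 3, assuming one reading per minute as the expected rate and an 80% completeness alert threshold:

// Sketch: flag device-hours whose reading count falls below the expected rate
const EXPECTED_READINGS_PER_HOUR = 60;  // assumes one reading per minute

const completenessGaps = await db.iot_sensor_readings.aggregate([
  { $match: { timestamp: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) } } },
  {
    $group: {
      _id: {
        device_id: "$device_info.device_id",
        hour: { $dateTrunc: { date: "$timestamp", unit: "hour" } }
      },
      reading_count: { $sum: 1 }
    }
  },
  {
    $project: {
      reading_count: 1,
      completeness_pct: {
        $multiply: [{ $divide: ["$reading_count", EXPECTED_READINGS_PER_HOUR] }, 100]
      }
    }
  },
  { $match: { completeness_pct: { $lt: 80 } } },
  { $sort: { completeness_pct: 1 } }
]).toArray();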

Conclusion

MongoDB time-series collections provide specialized capabilities for IoT data processing, combining high-performance storage optimization with flexible analytics. The integration with QueryLeaf enables familiar SQL-style analytics while leveraging MongoDB's optimized time-series storage and indexing strategies.

Key advantages of MongoDB time-series collections for IoT include:

  • Storage Efficiency: Automatic compression and optimized storage layout for time-ordered data
  • Query Performance: Specialized indexing and query optimization for temporal data patterns
  • Real-Time Analytics: Built-in support for streaming analytics and real-time aggregations
  • Edge Integration: Seamless synchronization between edge devices and cloud databases
  • SQL Accessibility: Familiar time-series analytics through QueryLeaf's SQL interface
  • Scalable Architecture: Horizontal scaling capabilities for massive IoT data volumes

Whether you're building smart manufacturing systems, environmental monitoring networks, or industrial IoT platforms, MongoDB's time-series collections with SQL-familiar query patterns provide the foundation for building scalable, high-performance IoT analytics solutions.

QueryLeaf Integration: QueryLeaf seamlessly translates SQL time-series operations into optimized MongoDB time-series queries. Advanced analytics like window functions, moving averages, and anomaly detection are accessible through familiar SQL syntax while leveraging MongoDB's specialized time-series storage optimizations, making sophisticated IoT analytics approachable for SQL-oriented development teams.

The combination of MongoDB's time-series optimizations with SQL-familiar analytics patterns creates an ideal platform for IoT applications that require both high-performance data ingestion and sophisticated analytical capabilities at scale.

MongoDB Atlas App Services and Serverless Functions: SQL-Style Database Integration Patterns

Modern applications increasingly rely on serverless architectures for scalability, cost-effectiveness, and rapid development cycles. MongoDB Atlas App Services provides a comprehensive serverless platform that combines database operations, authentication, and business logic into a unified development experience.

Understanding how to leverage Atlas App Services with SQL-familiar patterns enables you to build robust, scalable applications while maintaining the development productivity and query patterns your team already knows.

The Serverless Database Challenge

Traditional application architectures require managing separate services for databases, authentication, APIs, and business logic:

// Traditional multi-service architecture complexity
const express = require('express');
const mongoose = require('mongoose');
const jwt = require('jsonwebtoken');
const bcrypt = require('bcrypt');

// Database connection
mongoose.connect('mongodb://localhost:27017/myapp');

// Express application setup (required for the route below)
const app = express();
app.use(express.json());

// Authentication middleware
const authenticateToken = (req, res, next) => {
  const authHeader = req.headers['authorization'];
  const token = authHeader && authHeader.split(' ')[1];

  if (!token) {
    return res.sendStatus(401);
  }

  jwt.verify(token, process.env.ACCESS_TOKEN_SECRET, (err, user) => {
    if (err) return res.sendStatus(403);
    req.user = user;
    next();
  });
};

// API endpoint with manual validation
app.post('/api/orders', authenticateToken, async (req, res) => {
  try {
    // Manual validation
    if (!req.body.items || req.body.items.length === 0) {
      return res.status(400).json({ error: 'Items required' });
    }

    // Business logic
    const total = req.body.items.reduce((sum, item) => sum + item.price * item.quantity, 0);

    // Database operation
    const order = new Order({
      user_id: req.user.id,
      items: req.body.items,
      total: total,
      status: 'pending'
    });

    await order.save();
    res.json(order);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

This approach requires managing infrastructure, scaling concerns, security implementations, and coordination between multiple services.

MongoDB Atlas App Services Architecture

Atlas App Services simplifies this by providing integrated serverless functions, authentication, and data access in a single platform:

// Atlas App Services Function - Serverless and integrated
exports = async function(changeEvent) {
  const { fullDocument } = changeEvent;

  // Automatic authentication and context
  const user = context.user;
  const mongodb = context.services.get("mongodb-atlas");
  const db = mongodb.db("ecommerce");

  if (fullDocument.status === 'completed') {
    // Update inventory automatically
    const inventoryUpdates = fullDocument.items.map(item => ({
      updateOne: {
        filter: { product_id: item.product_id },
        update: { $inc: { quantity: -item.quantity } }
      }
    }));

    await db.collection("inventory").bulkWrite(inventoryUpdates);

    // Send notification via integrated services
    await context.functions.execute("sendOrderConfirmation", user.id, fullDocument);
  }
};

SQL-Style Function Development

Atlas App Services functions can be approached using familiar SQL patterns for data access and business logic:

Database Functions

-- Traditional stored procedure pattern
CREATE OR REPLACE FUNCTION create_order(
  user_id UUID,
  items JSONB,
  shipping_address JSONB
) RETURNS JSONB AS $$
DECLARE
  order_total DECIMAL(10,2);
  new_order_id UUID;
BEGIN
  -- Calculate order total
  SELECT SUM((item->>'price')::DECIMAL * (item->>'quantity')::INTEGER)
  INTO order_total
  FROM jsonb_array_elements(items) AS item;

  -- Validate inventory
  IF EXISTS (
    SELECT 1 FROM jsonb_array_elements(items) AS item
    JOIN products p ON p.id = (item->>'product_id')::UUID
    WHERE p.quantity < (item->>'quantity')::INTEGER
  ) THEN
    RAISE EXCEPTION 'Insufficient inventory for one or more items';
  END IF;

  -- Create order
  INSERT INTO orders (user_id, items, total, status, shipping_address)
  VALUES (user_id, items, order_total, 'pending', shipping_address)
  RETURNING id INTO new_order_id;

  RETURN jsonb_build_object(
    'order_id', new_order_id,
    'total', order_total,
    'status', 'created'
  );
END;
$$ LANGUAGE plpgsql;

Atlas App Services equivalent using SQL-familiar logic:

// Atlas Function: createOrder
exports = async function(userId, items, shippingAddress) {
  const mongodb = context.services.get("mongodb-atlas");
  const db = mongodb.db("ecommerce");

  // Order total: SQL-style SUM over the items array, computed in application code
  const orderTotal = items.reduce(
    (sum, item) => sum + item.price * item.quantity,
    0
  );

  // SQL-style validation query
  const inventoryCheck = await db.collection("products").aggregate([
    {
      $match: {
        _id: { $in: items.map(item => BSON.ObjectId(item.product_id)) }
      }
    },
    {
      $project: {
        product_id: "$_id",
        available_quantity: "$quantity",
        requested_quantity: {
          $arrayElemAt: [
            {
              $map: {
                input: {
                  $filter: {
                    input: items,
                    cond: { $eq: ["$$this.product_id", { $toString: "$_id" }] }
                  }
                },
                as: "item",
                in: "$$item.quantity"
              }
            },
            0
          ]
        }
      }
    },
    {
      $match: {
        $expr: { $lt: ["$available_quantity", "$requested_quantity"] }
      }
    }
  ]).toArray();

  if (inventoryCheck.length > 0) {
    throw new Error(`Insufficient inventory for products: ${inventoryCheck.map(p => p.product_id).join(', ')}`);
  }

  // SQL-style insert with returning pattern
  const result = await db.collection("orders").insertOne({
    user_id: BSON.ObjectId(userId),
    items: items,
    total: orderTotal,
    status: 'pending',
    shipping_address: shippingAddress,
    created_at: new Date()
  });

  return {
    order_id: result.insertedId,
    total: orderTotal,
    status: 'created'
  };
};
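On the client side, this function could be invoked through the Realm Web SDK. The App ID and argument values below are placeholders for illustration, not values from this article:

// Hypothetical client-side call to the createOrder function
const Realm = require("realm-web");

const app = new Realm.App({ id: "ecommerce-app-xxxxx" });  // placeholder App ID

async function placeOrder() {
  const credentials = Realm.Credentials.emailPassword("user@example.com", "s3cret-password");
  const user = await app.logIn(credentials);

  const result = await user.callFunction(
    "createOrder",
    user.id,
    [{ product_id: "64f0c2a1b2c3d4e5f6a7b8c9", price: 19.99, quantity: 2 }],
    { street: "1 Main St", city: "Springfield", state: "IL", postal_code: "62701" }
  );

  console.log(`Order ${result.order_id} created for ${result.total}`);
}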

Authentication and Authorization Functions

// Atlas Function: User Registration with SQL-style validation
exports = async function(email, password, profile) {
  const mongodb = context.services.get("mongodb-atlas");
  const db = mongodb.db("userdata");

  // SQL-style uniqueness check
  const existingUser = await db.collection("users").findOne({ 
    email: { $regex: new RegExp(`^${email}$`, 'i') }
  });

  if (existingUser) {
    throw new Error('User with this email already exists');
  }

  // SQL-style validation patterns
  const emailPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
  if (!emailPattern.test(email)) {
    throw new Error('Invalid email format');
  }

  if (password.length < 8) {
    throw new Error('Password must be at least 8 characters long');
  }

  // Create the authentication user via a companion Atlas Function
  const userResult = await context.functions.execute("registerUser", email, password);

  // Create user profile with SQL-style structure
  const userProfile = await db.collection("user_profiles").insertOne({
    user_id: userResult.user_id,
    email: email,
    profile: profile,
    status: 'active',
    created_at: new Date(),
    updated_at: new Date(),
    preferences: {
      notifications: true,
      theme: 'auto',
      language: 'en'
    }
  });

  return {
    user_id: userResult.user_id,
    profile_id: userProfile.insertedId,
    status: 'created'
  };
};

Data Access Patterns with App Services

HTTP Endpoints with SQL-Style Routing

// Atlas HTTPS Endpoint: /api/orders
exports = async function(request, response) {
  const { httpMethod, query, body, headers } = request;

  // SQL-style route handling
  switch (httpMethod) {
    case 'GET':
      return await handleGetOrders(query, headers);
    case 'POST':
      return await handleCreateOrder(body, headers);
    case 'PUT':
      return await handleUpdateOrder(body, headers);
    case 'DELETE':
      return await handleDeleteOrder(query, headers);
    default:
      response.setStatusCode(405);
      return { error: 'Method not allowed' };
  }
};

async function handleGetOrders(query, headers) {
  const mongodb = context.services.get("mongodb-atlas");
  const db = mongodb.db("ecommerce");

  // SQL-style pagination and filtering
  const page = parseInt(query.page || '1');
  const limit = parseInt(query.limit || '20');
  const skip = (page - 1) * limit;

  // Build SQL-style filter conditions
  const filter = {};
  if (query.status) {
    filter.status = { $in: query.status.split(',') };
  }
  if (query.date_from) {
    filter.created_at = { $gte: new Date(query.date_from) };
  }
  if (query.date_to) {
    filter.created_at = { ...filter.created_at, $lte: new Date(query.date_to) };
  }

  // SQL-style aggregation with joins
  const orders = await db.collection("orders").aggregate([
    { $match: filter },
    {
      $lookup: {
        from: "users",
        localField: "user_id",
        foreignField: "_id",
        as: "user_info"
      }
    },
    {
      $unwind: "$user_info"
    },
    {
      $project: {
        order_id: "$_id",
        user_email: "$user_info.email",
        total: 1,
        status: 1,
        created_at: 1,
        item_count: { $size: "$items" }
      }
    },
    { $sort: { created_at: -1 } },
    { $skip: skip },
    { $limit: limit }
  ]).toArray();

  const totalCount = await db.collection("orders").countDocuments(filter);

  return {
    data: orders,
    pagination: {
      page: page,
      limit: limit,
      total: totalCount,
      pages: Math.ceil(totalCount / limit)
    }
  };
}
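For completeness, a hypothetical client request to this endpoint might look like the following; the endpoint base URL shape and API-key header are assumptions that depend on how the HTTPS endpoint and its authentication are configured:

// Hypothetical request to the /api/orders HTTPS endpoint with filtering and pagination
async function fetchOrders() {
  const endpointUrl =
    "https://data.mongodb-api.com/app/ecommerce-app-xxxxx/endpoint/api/orders" +
    "?status=pending,confirmed&date_from=2024-01-01&page=1&limit=20";

  const response = await fetch(endpointUrl, {
    headers: { "api-key": process.env.APP_SERVICES_API_KEY }  // assumed API-key auth
  });

  const { data, pagination } = await response.json();
  console.log(`Fetched ${data.length} of ${pagination.total} orders`);
  return data;
}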

GraphQL Integration with SQL Patterns

// Atlas GraphQL Custom Resolver
exports = async function(parent, args, context, info) {
  const { input } = args;
  const mongodb = context.services.get("mongodb-atlas");
  const db = mongodb.db("blog");

  // SQL-style full-text search with joins
  const articles = await db.collection("articles").aggregate([
    {
      $match: {
        $and: [
          { $text: { $search: input.searchTerm } },
          { status: "published" },
          input.category ? { category: input.category } : {}
        ]
      }
    },
    {
      $lookup: {
        from: "users",
        localField: "author_id",
        foreignField: "_id",
        as: "author"
      }
    },
    {
      $unwind: "$author"
    },
    {
      $addFields: {
        relevance_score: { $meta: "textScore" },
        engagement_score: {
          $add: [
            { $multiply: ["$view_count", 0.1] },
            { $multiply: ["$like_count", 0.3] },
            { $multiply: ["$comment_count", 0.6] }
          ]
        }
      }
    },
    {
      $project: {
        title: 1,
        excerpt: 1,
        author_name: "$author.name",
        author_avatar: "$author.avatar_url",
        published_date: 1,
        reading_time: 1,
        tags: 1,
        relevance_score: 1,
        engagement_score: 1,
        combined_score: {
          $add: [
            { $multiply: ["$relevance_score", 0.7] },
            { $multiply: ["$engagement_score", 0.3] }
          ]
        }
      }
    },
    { $sort: { combined_score: -1, published_date: -1 } },
    { $limit: input.limit || 20 }
  ]).toArray();

  return {
    articles: articles,
    total: articles.length,
    search_term: input.searchTerm
  };
};

Real-Time Data Synchronization

Database Triggers with SQL-Style Logic

// Atlas Database Trigger: Order Status Changes
exports = async function(changeEvent) {
  const { operationType, fullDocument, updateDescription } = changeEvent;

  if (operationType === 'update' && updateDescription.updatedFields.status) {
    const mongodb = context.services.get("mongodb-atlas");
    const db = mongodb.db("ecommerce");

    const order = fullDocument;
    const newStatus = updateDescription.updatedFields.status;

    // SQL-style cascading updates based on status
    switch (newStatus) {
      case 'confirmed':
        // Update inventory like SQL UPDATE with JOIN
        const inventoryUpdates = order.items.map(item => ({
          updateOne: {
            filter: { product_id: item.product_id },
            update: {
              $inc: { 
                quantity: -item.quantity,
                reserved_quantity: item.quantity
              }
            }
          }
        }));

        await db.collection("inventory").bulkWrite(inventoryUpdates);
        break;

      case 'shipped':
        // SQL-style insert into shipping records
        await db.collection("shipping_records").insertOne({
          order_id: order._id,
          tracking_number: generateTrackingNumber(),
          carrier: order.shipping_method,
          shipped_date: new Date(),
          estimated_delivery: calculateDeliveryDate(order.shipping_address)
        });

        // Update user loyalty points like SQL computed columns
        await db.collection("users").updateOne(
          { _id: order.user_id },
          {
            $inc: { 
              loyalty_points: Math.floor(order.total * 0.1),
              orders_completed: 1
            },
            $set: { last_order_date: new Date() }
          }
        );
        break;

      case 'delivered':
        // Release reserved inventory like SQL constraint updates
        const releaseUpdates = order.items.map(item => ({
          updateOne: {
            filter: { product_id: item.product_id },
            update: {
              $inc: { reserved_quantity: -item.quantity }
            }
          }
        }));

        await db.collection("inventory").bulkWrite(releaseUpdates);

        // Schedule review request like SQL scheduled jobs
        await context.functions.execute("scheduleReviewRequest", order._id, order.user_id);
        break;
    }
  }
};

function generateTrackingNumber() {
  return 'TRK' + Date.now() + Math.random().toString(36).substr(2, 5).toUpperCase();
}

function calculateDeliveryDate(address) {
  // Business logic for delivery estimation
  const baseDelivery = new Date();
  baseDelivery.setDate(baseDelivery.getDate() + 3); // 3 days standard

  // Add extra days for remote areas
  if (address.state && ['AK', 'HI'].includes(address.state)) {
    baseDelivery.setDate(baseDelivery.getDate() + 2);
  }

  return baseDelivery;
}

Scheduled Functions for Maintenance

// Atlas Scheduled Function: Daily Maintenance Tasks
exports = async function() {
  const mongodb = context.services.get("mongodb-atlas");
  const db = mongodb.db("ecommerce");

  // SQL-style cleanup operations
  const yesterday = new Date();
  yesterday.setDate(yesterday.getDate() - 1);

  // Clean up expired sessions like SQL DELETE with JOIN
  await db.collection("user_sessions").deleteMany({
    expires_at: { $lt: new Date() }
  });

  // Archive old orders like SQL INSERT INTO archive SELECT
  const oldOrders = await db.collection("orders").aggregate([
    {
      $match: {
        status: { $in: ['completed', 'cancelled'] },
        updated_at: { $lt: new Date(Date.now() - 90 * 24 * 60 * 60 * 1000) }
      }
    }
  ]).toArray();

  if (oldOrders.length > 0) {
    await db.collection("orders_archive").insertMany(oldOrders);
    const orderIds = oldOrders.map(order => order._id);
    await db.collection("orders").deleteMany({ _id: { $in: orderIds } });
  }

  // Update analytics like SQL materialized views
  await db.collection("daily_analytics").insertOne({
    date: yesterday,
    metrics: await calculateDailyMetrics(db, yesterday),
    generated_at: new Date()
  });

  console.log(`Daily maintenance completed: archived ${oldOrders.length} orders, updated analytics`);
};

async function calculateDailyMetrics(db, date) {
  const startOfDay = new Date(date);
  startOfDay.setHours(0, 0, 0, 0);
  const endOfDay = new Date(date);
  endOfDay.setHours(23, 59, 59, 999);

  // SQL-style aggregation for daily metrics
  const metrics = await db.collection("orders").aggregate([
    {
      $match: {
        created_at: { $gte: startOfDay, $lte: endOfDay }
      }
    },
    {
      $group: {
        _id: null,
        total_orders: { $sum: 1 },
        total_revenue: { $sum: "$total" },
        avg_order_value: { $avg: "$total" },
        unique_customers: { $addToSet: "$user_id" }
      }
    },
    {
      $project: {
        _id: 0,
        total_orders: 1,
        total_revenue: 1,
        avg_order_value: { $round: ["$avg_order_value", 2] },
        unique_customers: { $size: "$unique_customers" }
      }
    }
  ]).next();

  return metrics || {
    total_orders: 0,
    total_revenue: 0,
    avg_order_value: 0,
    unique_customers: 0
  };
}

Security and Authentication Patterns

Rule-Based Access Control

// Atlas App Services Rules: Collection-level security
{
  "roles": [
    {
      "name": "user",
      "apply_when": {
        "%%user.custom_data.role": "customer"
      },
      "read": {
        "user_id": "%%user.id"
      },
      "write": {
        "$and": [
          { "user_id": "%%user.id" },
          { "status": { "$nin": ["completed", "cancelled"] } }
        ]
      }
    },
    {
      "name": "admin",
      "apply_when": {
        "%%user.custom_data.role": "admin"
      },
      "read": true,
      "write": true
    }
  ]
}

SQL-Style Permission Checking

// Atlas Function: Check User Permissions
exports = async function(userId, resource, action) {
  const mongodb = context.services.get("mongodb-atlas");
  const db = mongodb.db("auth");

  // SQL-style permission lookup with joins
  const permissions = await db.collection("user_permissions").aggregate([
    {
      $match: { user_id: BSON.ObjectId(userId) }
    },
    {
      $lookup: {
        from: "roles",
        localField: "role_id",
        foreignField: "_id",
        as: "role"
      }
    },
    {
      $unwind: "$role"
    },
    {
      $lookup: {
        from: "role_permissions",
        localField: "role._id",
        foreignField: "role_id",
        as: "role_permissions"
      }
    },
    {
      $unwind: "$role_permissions"
    },
    {
      $lookup: {
        from: "permissions",
        localField: "role_permissions.permission_id",
        foreignField: "_id",
        as: "permission"
      }
    },
    {
      $unwind: "$permission"
    },
    {
      $match: {
        "permission.resource": resource,
        "permission.action": action
      }
    },
    {
      $project: {
        has_permission: true,
        permission_name: "$permission.name",
        role_name: "$role.name"
      }
    }
  ]).toArray();

  return {
    allowed: permissions.length > 0,
    permissions: permissions
  };
};
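Other functions can use this check as a guard before performing sensitive operations. The function name checkUserPermission and the collection below are assumptions for illustration:

// Hypothetical Atlas Function that guards a delete behind the permission check above
exports = async function(userId, orderId) {
  const { allowed } = await context.functions.execute(
    "checkUserPermission", userId, "orders", "delete"
  );

  if (!allowed) {
    throw new Error("User is not authorized to delete orders");
  }

  const db = context.services.get("mongodb-atlas").db("ecommerce");
  return db.collection("orders").deleteOne({ _id: BSON.ObjectId(orderId) });
};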

QueryLeaf Integration with Atlas App Services

QueryLeaf can seamlessly work with Atlas App Services to provide SQL interfaces for serverless applications:

-- QueryLeaf can generate Atlas Functions from SQL procedures
CREATE OR REPLACE FUNCTION get_user_dashboard_data(user_id UUID)
RETURNS TABLE (
  user_profile JSONB,
  recent_orders JSONB,
  recommendations JSONB,
  analytics JSONB
) AS $$
BEGIN
  -- This gets translated to Atlas App Services function
  RETURN QUERY
  WITH user_data AS (
    SELECT 
      u.name,
      u.email,
      u.preferences,
      u.loyalty_points
    FROM users u
    WHERE u._id = user_id
  ),
  order_data AS (
    SELECT json_agg(
      json_build_object(
        'order_id', o._id,
        'total', o.total,
        'status', o.status,
        'created_at', o.created_at
      ) ORDER BY o.created_at DESC
    ) AS recent_orders
    FROM orders o
    WHERE o.user_id = user_id
    AND o.created_at >= CURRENT_DATE - INTERVAL '30 days'
    LIMIT 10
  ),
  recommendation_data AS (
    SELECT json_agg(
      json_build_object(
        'product_id', p._id,
        'name', p.name,
        'price', p.price,
        'score', r.score
      ) ORDER BY r.score DESC
    ) AS recommendations
    FROM product_recommendations r
    JOIN products p ON r.product_id = p._id
    WHERE r.user_id = user_id
    LIMIT 5
  )
  SELECT 
    row_to_json(user_data.*) AS user_profile,
    order_data.recent_orders,
    recommendation_data.recommendations,
    json_build_object(
      'total_orders', (SELECT COUNT(*) FROM orders WHERE user_id = user_id),
      'total_spent', (SELECT SUM(total) FROM orders WHERE user_id = user_id)
    ) AS analytics
  FROM user_data, order_data, recommendation_data;
END;
$$ LANGUAGE plpgsql;

-- QueryLeaf automatically converts this to Atlas App Services function
-- Call the function using familiar SQL syntax
SELECT * FROM get_user_dashboard_data('507f1f77bcf86cd799439011');

Performance Optimization for Serverless Functions

Function Caching Strategies

// Atlas Function: Cached Product Catalog
const CACHE_TTL = 300; // 5 minutes
let catalogCache = null;
let cacheTimestamp = 0;

exports = async function(category, limit = 20) {
  const now = Date.now();

  // Check cache validity like SQL query caching
  if (catalogCache && (now - cacheTimestamp) < (CACHE_TTL * 1000)) {
    return filterCachedResults(catalogCache, category, limit);
  }

  const mongodb = context.services.get("mongodb-atlas");
  const db = mongodb.db("catalog");

  // SQL-style aggregation with caching
  catalogCache = await db.collection("products").aggregate([
    {
      $match: { 
        status: "active",
        inventory_count: { $gt: 0 }
      }
    },
    {
      $lookup: {
        from: "categories",
        localField: "category_id",
        foreignField: "_id",
        as: "category"
      }
    },
    {
      $unwind: "$category"
    },
    {
      $project: {
        name: 1,
        price: 1,
        category_name: "$category.name",
        inventory_count: 1,
        rating: 1,
        popularity_score: {
          $add: [
            { $multiply: ["$view_count", 0.3] },
            { $multiply: ["$purchase_count", 0.7] }
          ]
        }
      }
    },
    { $sort: { popularity_score: -1 } }
  ]).toArray();

  cacheTimestamp = now;

  return filterCachedResults(catalogCache, category, limit);
};

function filterCachedResults(cache, category, limit) {
  let results = cache;

  if (category) {
    results = cache.filter(product => product.category_name === category);
  }

  return results.slice(0, limit);
}

Batch Processing Patterns

// Atlas Function: Batch Order Processing
exports = async function() {
  const mongodb = context.services.get("mongodb-atlas");
  const db = mongodb.db("ecommerce");

  // SQL-style batch processing with transactions
  const session = mongodb.startSession();

  try {
    session.startTransaction();

    // Get pending orders like SQL SELECT FOR UPDATE
    const pendingOrders = await db.collection("orders")
      .find({ 
        status: "pending",
        created_at: { $lt: new Date(Date.now() - 60000) } // 1 minute old
      })
      .limit(100)
      .toArray();

    const batchResults = [];

    for (const order of pendingOrders) {
      try {
        // Process each order with SQL-style logic
        const processResult = await processOrder(db, order, session);
        batchResults.push({
          order_id: order._id,
          status: 'processed',
          result: processResult
        });

      } catch (error) {
        batchResults.push({
          order_id: order._id,
          status: 'failed',
          error: error.message
        });
      }
    }

    await session.commitTransaction();

    return {
      processed: batchResults.length,
      successful: batchResults.filter(r => r.status === 'processed').length,
      failed: batchResults.filter(r => r.status === 'failed').length,
      results: batchResults
    };

  } catch (error) {
    await session.abortTransaction();
    throw error;
  } finally {
    session.endSession();
  }
};

async function processOrder(db, order, session) {
  // SQL-style order processing logic
  const paymentResult = await context.functions.execute(
    "processPayment", 
    order._id, 
    order.total, 
    order.payment_method
  );

  if (paymentResult.status === 'success') {
    await db.collection("orders").updateOne(
      { _id: order._id },
      { 
        $set: { 
          status: 'confirmed',
          payment_id: paymentResult.payment_id,
          confirmed_at: new Date()
        }
      },
      { session }
    );

    return { payment_id: paymentResult.payment_id };
  } else {
    throw new Error(`Payment failed: ${paymentResult.error}`);
  }
}

Best Practices for Atlas App Services

  1. Function Design: Keep functions focused and single-purpose like SQL stored procedures
  2. Error Handling: Implement comprehensive error handling with meaningful messages (see the sketch after this list)
  3. Security: Use App Services rules for data access control and authentication
  4. Performance: Leverage caching and batch processing for optimal performance
  5. Monitoring: Implement logging and metrics collection for production visibility
  6. Testing: Develop comprehensive test suites for serverless functions
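A minimal sketch of points 2 and 5, assuming a function_logs collection for monitoring; the function name and log document shape are illustrative:

// Sketch: consistent error handling plus a log record for production monitoring
exports = async function(orderId) {
  const db = context.services.get("mongodb-atlas").db("ecommerce");
  const startedAt = Date.now();

  try {
    const order = await db.collection("orders").findOne({ _id: BSON.ObjectId(orderId) });
    if (!order) {
      throw new Error(`Order ${orderId} not found`);
    }
    return { status: "ok", order_id: order._id };
  } catch (error) {
    console.error(`getOrder failed: ${error.message}`);
    await db.collection("function_logs").insertOne({
      function_name: "getOrder",
      error: error.message,
      duration_ms: Date.now() - startedAt,
      logged_at: new Date()
    });
    throw error;  // surface a meaningful message to the caller
  }
};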

Conclusion

MongoDB Atlas App Services provides a powerful serverless platform that simplifies application development while maintaining the performance and scalability characteristics needed for production systems. By approaching serverless development with SQL-familiar patterns, teams can leverage their existing expertise while gaining the benefits of serverless architecture.

Key advantages of SQL-style serverless development:

  • Familiar Patterns: Use well-understood SQL concepts for business logic
  • Integrated Platform: Combine database, authentication, and compute in a single service
  • Automatic Scaling: Handle traffic spikes without infrastructure management
  • Cost Efficiency: Pay only for actual function execution time
  • Developer Productivity: Focus on business logic instead of infrastructure concerns

Whether you're building e-commerce platforms, content management systems, or data processing applications, Atlas App Services with SQL-style patterns provides a robust foundation for modern serverless applications.

The combination of MongoDB's document flexibility, Atlas's managed infrastructure, and QueryLeaf's familiar SQL interface creates an ideal environment for rapid development and deployment of scalable serverless applications.

With proper design patterns, security implementation, and performance optimization, Atlas App Services enables you to build enterprise-grade serverless applications that maintain the development velocity and operational simplicity that modern teams require.