MongoDB Capped Collections: Fixed-Size High-Performance Logging and Data Streaming for Real-Time Applications

Real-time applications require efficient data structures for continuous data capture, event streaming, and high-frequency logging without the overhead of traditional database management. Conventional database approaches struggle with scenarios requiring sustained high-throughput writes, automatic old data removal, and guaranteed insertion order preservation, often leading to performance degradation, storage bloat, and complex maintenance procedures in production environments.

MongoDB capped collections provide native fixed-size, high-performance data structures that maintain insertion order and automatically remove old documents when storage limits are reached. Unlike traditional database logging solutions that require complex archival processes and performance-degrading maintenance operations, MongoDB capped collections deliver consistent high-throughput writes, predictable storage usage, and automatic data lifecycle management through optimized storage allocation and write-optimized data structures.
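
The core mechanics are easy to see in isolation before walking through the full comparison. The short sketch below uses the Node.js driver against a hypothetical capped_demo database; the collection name, size limit, and document cap are illustrative only. It creates a capped collection, appends a few documents, and reads them back in reverse natural (insertion) order.

// Minimal capped collection sketch (Node.js driver) - names and sizes are illustrative
const { MongoClient } = require('mongodb');

async function cappedCollectionDemo() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('capped_demo');

  // Fixed-size collection: at most 10MB and 10,000 documents, oldest entries overwritten first
  const events = await db.createCollection('events', {
    capped: true,
    size: 10 * 1024 * 1024,
    max: 10000
  });

  // Inserts are appended in arrival order; no manual rotation or cleanup is required
  await events.insertMany([
    { ts: new Date(), level: 'INFO', msg: 'service started' },
    { ts: new Date(), level: 'WARN', msg: 'cache miss rate elevated' }
  ]);

  // Reverse natural order returns the most recently inserted documents first
  const latest = await events.find().sort({ $natural: -1 }).limit(5).toArray();
  console.log(latest);

  await client.close();
}

cappedCollectionDemo().catch(console.error);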

The Traditional High-Performance Logging Challenge

Conventional database logging approaches often encounter significant performance and maintenance challenges:

-- Traditional PostgreSQL high-performance logging - complex maintenance and performance issues

-- Basic application logging table with growing maintenance complexity
CREATE TABLE application_logs (
    log_id BIGSERIAL PRIMARY KEY,
    application_name VARCHAR(100) NOT NULL,
    log_level VARCHAR(20) NOT NULL,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    message TEXT NOT NULL,

    -- Additional context fields
    user_id BIGINT,
    session_id VARCHAR(100),
    request_id VARCHAR(100),

    -- Performance metadata
    duration_ms INTEGER,
    memory_usage_mb DECIMAL(8,2),
    cpu_usage_percent DECIMAL(5,2),

    -- Log metadata
    thread_id INTEGER,
    process_id INTEGER,
    hostname VARCHAR(100),

    -- Complex indexing for performance
    CONSTRAINT valid_log_level CHECK (log_level IN ('DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL'))
);

-- Multiple indexes required for different query patterns - increasing maintenance overhead
CREATE INDEX idx_logs_timestamp ON application_logs(timestamp DESC);
CREATE INDEX idx_logs_level_timestamp ON application_logs(log_level, timestamp DESC);
CREATE INDEX idx_logs_app_timestamp ON application_logs(application_name, timestamp DESC);
CREATE INDEX idx_logs_user_timestamp ON application_logs(user_id, timestamp DESC) WHERE user_id IS NOT NULL;
CREATE INDEX idx_logs_session_timestamp ON application_logs(session_id, timestamp DESC) WHERE session_id IS NOT NULL;

-- Complex partitioning strategy for log table management
CREATE TABLE application_logs_2024_01 (
    CHECK (timestamp >= '2024-01-01' AND timestamp < '2024-02-01')
) INHERITS (application_logs);

CREATE TABLE application_logs_2024_02 (
    CHECK (timestamp >= '2024-02-01' AND timestamp < '2024-03-01')
) INHERITS (application_logs);

-- Monthly partition maintenance (complex and error-prone)
CREATE OR REPLACE FUNCTION create_monthly_log_partition()
RETURNS VOID AS $$
DECLARE
    partition_name TEXT;
    start_date DATE;
    end_date DATE;
BEGIN
    start_date := DATE_TRUNC('month', CURRENT_DATE);
    end_date := start_date + INTERVAL '1 month';
    partition_name := 'application_logs_' || TO_CHAR(start_date, 'YYYY_MM');

    EXECUTE format('
        CREATE TABLE IF NOT EXISTS %I (
            CHECK (timestamp >= %L AND timestamp < %L)
        ) INHERITS (application_logs)', 
        partition_name, start_date, end_date);

    EXECUTE format('
        CREATE INDEX IF NOT EXISTS %I ON %I(timestamp DESC)',
        'idx_' || partition_name || '_timestamp', partition_name);
END;
$$ LANGUAGE plpgsql;

-- Automated cleanup process with significant performance impact
CREATE OR REPLACE FUNCTION cleanup_old_logs(retention_days INTEGER DEFAULT 90)
RETURNS TABLE(
    deleted_count BIGINT,
    cleanup_duration_ms BIGINT,
    affected_partitions TEXT[]
) AS $$
DECLARE
    cutoff_date TIMESTAMP;
    partition_record RECORD;
    total_deleted BIGINT := 0;
    start_time TIMESTAMP := clock_timestamp();
    dropped_partitions TEXT[] := '{}';
    rows_remaining BIGINT;
BEGIN
    cutoff_date := CURRENT_TIMESTAMP - (retention_days || ' days')::INTERVAL;

    -- Delete from main table (expensive operation)
    DELETE FROM ONLY application_logs 
    WHERE timestamp < cutoff_date;

    GET DIAGNOSTICS total_deleted = ROW_COUNT;

    -- Handle partitioned tables
    FOR partition_record IN 
        SELECT schemaname, tablename 
        FROM pg_tables 
        WHERE tablename LIKE 'application_logs_%'
        AND tablename ~ '^application_logs_\d{4}_\d{2}$'
    LOOP
        -- Count rows newer than the cutoff to decide between drop and partial cleanup
        EXECUTE format('
            SELECT COUNT(*) 
            FROM %I.%I 
            WHERE timestamp >= %L',
            partition_record.schemaname,
            partition_record.tablename,
            cutoff_date
        ) INTO rows_remaining;

        -- Drop the partition outright when it holds no data newer than the cutoff
        IF rows_remaining = 0 THEN
            EXECUTE format('DROP TABLE IF EXISTS %I.%I CASCADE',
                partition_record.schemaname, partition_record.tablename);
            dropped_partitions := dropped_partitions || partition_record.tablename;
        ELSE
            -- Partial cleanup within partition (expensive)
            EXECUTE format('
                DELETE FROM %I.%I WHERE timestamp < %L',
                partition_record.schemaname, partition_record.tablename, cutoff_date);
        END IF;
    END LOOP;

    -- Reindex (significant performance impact); VACUUM cannot run inside a
    -- function and must be scheduled separately, adding yet another maintenance step
    REINDEX TABLE application_logs;

    RETURN QUERY SELECT 
        total_deleted,
        (EXTRACT(EPOCH FROM clock_timestamp() - start_time) * 1000)::BIGINT,
        dropped_partitions;
END;
$$ LANGUAGE plpgsql;

-- High-frequency insert procedure with limited performance optimization
CREATE OR REPLACE FUNCTION batch_insert_logs(log_entries JSONB[])
RETURNS TABLE(
    inserted_count INTEGER,
    failed_count INTEGER,
    processing_time_ms INTEGER
) AS $$
DECLARE
    log_entry JSONB;
    success_count INTEGER := 0;
    error_count INTEGER := 0;
    start_time TIMESTAMP := clock_timestamp();
    temp_table_name TEXT := 'temp_log_batch_' || extract(epoch from now())::INTEGER;
BEGIN

    -- Create temporary table for batch processing
    EXECUTE format('
        CREATE TEMP TABLE %I (
            application_name VARCHAR(100),
            log_level VARCHAR(20),
            timestamp TIMESTAMP,
            message TEXT,
            user_id BIGINT,
            session_id VARCHAR(100),
            request_id VARCHAR(100),
            duration_ms INTEGER,
            memory_usage_mb DECIMAL(8,2),
            thread_id INTEGER,
            hostname VARCHAR(100)
        )', temp_table_name);

    -- Process each log entry individually (inefficient for high volume)
    FOREACH log_entry IN ARRAY log_entries
    LOOP
        BEGIN
            EXECUTE format('
                INSERT INTO %I (
                    application_name, log_level, timestamp, message,
                    user_id, session_id, request_id, duration_ms,
                    memory_usage_mb, thread_id, hostname
                ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)',
                temp_table_name
            ) USING 
                log_entry->>'application_name',
                log_entry->>'log_level',
                (log_entry->>'timestamp')::TIMESTAMP,
                log_entry->>'message',
                (log_entry->>'user_id')::BIGINT,
                log_entry->>'session_id',
                log_entry->>'request_id',
                (log_entry->>'duration_ms')::INTEGER,
                (log_entry->>'memory_usage_mb')::DECIMAL(8,2),
                (log_entry->>'thread_id')::INTEGER,
                log_entry->>'hostname';

            success_count := success_count + 1;

        EXCEPTION WHEN OTHERS THEN
            error_count := error_count + 1;
            -- Limited error handling for high-frequency operations
            CONTINUE;
        END;
    END LOOP;

    -- Batch insert into main table (still limited by indexing overhead)
    EXECUTE format('
        INSERT INTO application_logs (
            application_name, log_level, timestamp, message,
            user_id, session_id, request_id, duration_ms,
            memory_usage_mb, thread_id, hostname
        )
        SELECT * FROM %I', temp_table_name);

    -- Cleanup
    EXECUTE format('DROP TABLE %I', temp_table_name);

    RETURN QUERY SELECT 
        success_count,
        error_count,
        (EXTRACT(EPOCH FROM clock_timestamp() - start_time) * 1000)::INTEGER;
END;
$$ LANGUAGE plpgsql;

-- Real-time event streaming table with performance limitations
CREATE TABLE event_stream (
    event_id BIGSERIAL PRIMARY KEY,
    event_type VARCHAR(100) NOT NULL,
    event_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    user_id BIGINT,
    session_id VARCHAR(100),

    -- Event payload (limited JSON support)
    event_data JSONB,

    -- Stream metadata
    stream_partition VARCHAR(50),
    sequence_number BIGINT,

    -- Processing metadata
    processing_status VARCHAR(20) DEFAULT 'pending',
    processed_at TIMESTAMP,
    processor_id VARCHAR(100)
);

-- Complex trigger and sequence for sequence number management
CREATE SEQUENCE IF NOT EXISTS event_stream_sequence;

CREATE OR REPLACE FUNCTION update_sequence_number()
RETURNS TRIGGER AS $$
BEGIN
    NEW.sequence_number := nextval('event_stream_sequence');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER event_stream_sequence_trigger
    BEFORE INSERT ON event_stream
    FOR EACH ROW
    EXECUTE FUNCTION update_sequence_number();

-- Performance monitoring with complex aggregations
WITH log_performance_analysis AS (
    SELECT 
        application_name,
        log_level,
        DATE_TRUNC('hour', timestamp) as hour_bucket,
        COUNT(*) as log_count,

        -- Complex aggregations causing performance issues
        AVG(CASE WHEN duration_ms IS NOT NULL THEN duration_ms ELSE NULL END) as avg_duration,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ms) as p95_duration,
        AVG(CASE WHEN memory_usage_mb IS NOT NULL THEN memory_usage_mb ELSE NULL END) as avg_memory_usage,

        -- Storage analysis
        SUM(LENGTH(message)) as total_message_bytes,
        AVG(LENGTH(message)) as avg_message_length,

        -- Performance degradation over time
        COUNT(*) / EXTRACT(EPOCH FROM INTERVAL '1 hour') as logs_per_second

    FROM application_logs
    WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    GROUP BY application_name, log_level, DATE_TRUNC('hour', timestamp)
),
storage_growth_analysis AS (
    -- Complex storage growth calculations
    SELECT 
        DATE_TRUNC('day', timestamp) as day_bucket,
        COUNT(*) as daily_logs,
        SUM(LENGTH(message) + COALESCE(LENGTH(session_id), 0) + COALESCE(LENGTH(request_id), 0)) as daily_storage_bytes,

        -- Growth projections (expensive calculations)
        LAG(COUNT(*)) OVER (ORDER BY DATE_TRUNC('day', timestamp)) as prev_day_logs,
        LAG(SUM(LENGTH(message) + COALESCE(LENGTH(session_id), 0) + COALESCE(LENGTH(request_id), 0))) OVER (ORDER BY DATE_TRUNC('day', timestamp)) as prev_day_bytes

    FROM application_logs
    WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '30 days'
    GROUP BY DATE_TRUNC('day', timestamp)
)
SELECT 
    lpa.application_name,
    lpa.log_level,
    lpa.hour_bucket,
    lpa.log_count,

    -- Performance metrics
    ROUND(lpa.avg_duration, 2) as avg_duration_ms,
    ROUND(lpa.p95_duration, 2) as p95_duration_ms,
    ROUND(lpa.logs_per_second, 2) as throughput_logs_per_second,

    -- Storage efficiency
    ROUND(lpa.total_message_bytes / 1024.0 / 1024.0, 2) as message_storage_mb,
    ROUND(lpa.avg_message_length, 0) as avg_message_length,

    -- Growth indicators
    sga.daily_logs,
    ROUND(sga.daily_storage_bytes / 1024.0 / 1024.0, 2) as daily_storage_mb,

    -- Growth rate calculations (complex and expensive)
    CASE 
        WHEN sga.prev_day_logs IS NOT NULL THEN
            ROUND(((sga.daily_logs - sga.prev_day_logs) / sga.prev_day_logs::DECIMAL * 100), 1)
        ELSE NULL
    END as daily_log_growth_percent,

    CASE 
        WHEN sga.prev_day_bytes IS NOT NULL THEN
            ROUND(((sga.daily_storage_bytes - sga.prev_day_bytes) / sga.prev_day_bytes::DECIMAL * 100), 1)
        ELSE NULL
    END as daily_storage_growth_percent

FROM log_performance_analysis lpa
JOIN storage_growth_analysis sga ON DATE_TRUNC('day', lpa.hour_bucket) = sga.day_bucket
WHERE lpa.log_count > 0
ORDER BY lpa.application_name, lpa.log_level, lpa.hour_bucket DESC;

-- Traditional logging approach problems:
-- 1. Unbounded storage growth requiring complex partitioning and archival
-- 2. Performance degradation as table size increases due to indexing overhead
-- 3. Complex maintenance procedures for partition management and cleanup
-- 4. High-frequency writes causing lock contention and performance bottlenecks
-- 5. Expensive aggregation queries for real-time monitoring and analytics
-- 6. Limited support for truly high-throughput event streaming scenarios
-- 7. Complex error handling and recovery mechanisms for batch processing
-- 8. Storage bloat and fragmentation issues requiring regular maintenance
-- 9. No guarantee of insertion order preservation under concurrent access
-- 10. Resource-intensive cleanup and archival processes impacting performance

MongoDB capped collections provide elegant fixed-size, high-performance data structures for logging and streaming:

// MongoDB Capped Collections - high-performance logging and streaming with automatic size management
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('high_performance_logging');
// Recent drivers connect lazily on first operation; on older drivers call `await client.connect()` first

// Comprehensive MongoDB Capped Collections Manager
class CappedCollectionsManager {
  constructor(db, config = {}) {
    this.db = db;
    this.config = {
      // Default capped collection configurations
      defaultLogSize: config.defaultLogSize || 100 * 1024 * 1024, // 100MB
      defaultMaxDocuments: config.defaultMaxDocuments || 50000,

      // Performance optimization settings
      enableBulkOperations: config.enableBulkOperations !== false,
      enableAsyncOperations: config.enableAsyncOperations !== false,
      batchSize: config.batchSize || 1000,
      writeBufferSize: config.writeBufferSize || 16384,

      // Collection management
      enablePerformanceMonitoring: config.enablePerformanceMonitoring !== false,
      enableAutoOptimization: config.enableAutoOptimization !== false,
      enableMetricsCollection: config.enableMetricsCollection !== false,

      // Write concern and consistency
      writeConcern: config.writeConcern || {
        w: 1, // Fast writes for high-throughput logging
        j: false, // Disable journaling for maximum speed (trade-off with durability)
        wtimeout: 1000
      },

      // Advanced features
      enableTailableCursors: config.enableTailableCursors !== false,
      enableChangeStreams: config.enableChangeStreams !== false,
      enableRealTimeProcessing: config.enableRealTimeProcessing !== false,

      // Resource management
      maxConcurrentTails: config.maxConcurrentTails || 10,
      tailCursorTimeout: config.tailCursorTimeout || 30000,
      processingThreads: config.processingThreads || 4
    };

    // Collection references
    this.cappedCollections = new Map();
    this.tailableCursors = new Map();
    this.performanceMetrics = new Map();
    this.processingStats = {
      totalWrites: 0,
      totalReads: 0,
      averageWriteTime: 0,
      averageReadTime: 0,
      errorCount: 0
    };

    // Record start time for uptime/throughput estimates, then create collections (async, fire-and-forget)
    this.startTime = new Date();
    this.initializeCappedCollections();
  }

  async initializeCappedCollections() {
    console.log('Initializing capped collections for high-performance logging...');

    try {
      // Application logging with different retention strategies
      await this.createOptimizedCappedCollection('application_logs', {
        size: 200 * 1024 * 1024, // 200MB
        max: 100000, // Maximum 100k documents
        description: 'High-frequency application logs with automatic rotation'
      });

      // Real-time event streaming
      await this.createOptimizedCappedCollection('event_stream', {
        size: 500 * 1024 * 1024, // 500MB
        max: 250000, // Maximum 250k events
        description: 'Real-time event streaming with insertion order preservation'
      });

      // Performance metrics collection
      await this.createOptimizedCappedCollection('performance_metrics', {
        size: 100 * 1024 * 1024, // 100MB
        max: 50000, // Maximum 50k metric entries
        description: 'System performance metrics with circular buffer behavior'
      });

      // Audit trail with longer retention
      await this.createOptimizedCappedCollection('audit_trail', {
        size: 1024 * 1024 * 1024, // 1GB
        max: 1000000, // Maximum 1M audit entries
        description: 'Security audit trail with extended retention'
      });

      // User activity stream
      await this.createOptimizedCappedCollection('user_activity_stream', {
        size: 300 * 1024 * 1024, // 300MB
        max: 150000, // Maximum 150k activities
        description: 'User activity tracking with real-time processing'
      });

      // System health monitoring
      await this.createOptimizedCappedCollection('system_health_logs', {
        size: 150 * 1024 * 1024, // 150MB
        max: 75000, // Maximum 75k health checks
        description: 'System health monitoring with high-frequency updates'
      });

      // Initialize performance monitoring
      if (this.config.enablePerformanceMonitoring) {
        await this.setupPerformanceMonitoring();
      }

      // Setup real-time processing
      if (this.config.enableRealTimeProcessing) {
        await this.initializeRealTimeProcessing();
      }

      console.log('All capped collections initialized successfully');

    } catch (error) {
      console.error('Error initializing capped collections:', error);
      throw error;
    }
  }

  async createOptimizedCappedCollection(collectionName, options) {
    console.log(`Creating optimized capped collection: ${collectionName}...`);

    try {
      // Check if collection already exists
      const collections = await this.db.listCollections({ name: collectionName }).toArray();

      if (collections.length > 0) {
        // Collection exists - verify it's capped and get reference
        const collectionInfo = collections[0];
        if (!collectionInfo.options.capped) {
          throw new Error(`Collection ${collectionName} exists but is not capped`);
        }

        console.log(`Existing capped collection ${collectionName} found`);
        const collection = this.db.collection(collectionName);
        this.cappedCollections.set(collectionName, {
          collection: collection,
          options: collectionInfo.options,
          description: options.description
        });

      } else {
        // Create new capped collection
        const collection = await this.db.createCollection(collectionName, {
          capped: true,
          size: options.size,
          max: options.max,

          // Storage engine options for performance
          storageEngine: {
            wiredTiger: {
              configString: 'block_compressor=snappy' // Enable compression
            }
          }
        });

        // Create optimized indexes for capped collections
        await this.createCappedCollectionIndexes(collection, collectionName);

        this.cappedCollections.set(collectionName, {
          collection: collection,
          options: { capped: true, size: options.size, max: options.max },
          description: options.description,
          created: new Date()
        });

        console.log(`Created capped collection ${collectionName}: ${options.size} bytes, max ${options.max} documents`);
      }

    } catch (error) {
      console.error(`Error creating capped collection ${collectionName}:`, error);
      throw error;
    }
  }

  async createCappedCollectionIndexes(collection, collectionName) {
    console.log(`Creating optimized indexes for ${collectionName}...`);

    try {
      // Most capped collections benefit from a timestamp index for range queries
      // Note: Capped collections maintain insertion order, so _id is naturally ordered
      await collection.createIndex(
        { timestamp: -1 }, 
        { background: true, name: 'timestamp_desc' }
      );

      // Collection-specific indexes based on common query patterns
      switch (collectionName) {
        case 'application_logs':
          await collection.createIndexes([
            { key: { level: 1, timestamp: -1 }, background: true, name: 'level_timestamp' },
            { key: { application: 1, timestamp: -1 }, background: true, name: 'app_timestamp' },
            { key: { userId: 1 }, background: true, sparse: true, name: 'user_sparse' }
          ]);
          break;

        case 'event_stream':
          await collection.createIndexes([
            { key: { eventType: 1, timestamp: -1 }, background: true, name: 'event_type_timestamp' },
            { key: { userId: 1, timestamp: -1 }, background: true, sparse: true, name: 'user_timeline' },
            { key: { sessionId: 1 }, background: true, sparse: true, name: 'session_events' }
          ]);
          break;

        case 'performance_metrics':
          await collection.createIndexes([
            { key: { metricName: 1, timestamp: -1 }, background: true, name: 'metric_timeline' },
            { key: { hostname: 1, timestamp: -1 }, background: true, name: 'host_metrics' }
          ]);
          break;

        case 'audit_trail':
          await collection.createIndexes([
            { key: { action: 1, timestamp: -1 }, background: true, name: 'action_timeline' },
            { key: { userId: 1, timestamp: -1 }, background: true, name: 'user_audit' },
            { key: { resourceId: 1 }, background: true, sparse: true, name: 'resource_audit' }
          ]);
          break;
      }

    } catch (error) {
      console.error(`Error creating indexes for ${collectionName}:`, error);
      // Don't fail initialization for index creation issues
    }
  }

  async logApplicationEvent(application, level, message, metadata = {}) {
    const startTime = Date.now();

    try {
      const logCollection = this.cappedCollections.get('application_logs').collection;

      const logDocument = {
        timestamp: new Date(),
        application: application,
        level: level.toUpperCase(),
        message: message,

        // Enhanced metadata
        ...metadata,

        // System context
        hostname: metadata.hostname || require('os').hostname(),
        processId: process.pid,
        threadId: metadata.threadId,

        // Performance context
        memoryUsage: metadata.includeMemoryUsage ? process.memoryUsage() : undefined,

        // Request context
        requestId: metadata.requestId,
        sessionId: metadata.sessionId,
        userId: metadata.userId,

        // Application context
        version: metadata.version,
        environment: metadata.environment || process.env.NODE_ENV,

        // Timing information
        duration: metadata.duration,

        // Additional structured data
        tags: metadata.tags || [],
        customData: metadata.customData
      };

      // High-performance insert with minimal write concern
      const result = await logCollection.insertOne(logDocument, {
        writeConcern: this.config.writeConcern
      });

      // Update performance metrics
      this.updatePerformanceMetrics('application_logs', 'write', Date.now() - startTime);

      return {
        insertedId: result.insertedId,
        collection: 'application_logs',
        processingTime: Date.now() - startTime,
        logLevel: level,
        success: true
      };

    } catch (error) {
      console.error('Error logging application event:', error);
      this.processingStats.errorCount++;

      return {
        success: false,
        error: error.message,
        collection: 'application_logs',
        processingTime: Date.now() - startTime
      };
    }
  }

  async streamEvent(eventType, eventData, options = {}) {
    const startTime = Date.now();

    try {
      const streamCollection = this.cappedCollections.get('event_stream').collection;

      const eventDocument = {
        timestamp: new Date(),
        eventType: eventType,
        eventData: eventData,

        // Event metadata
        eventId: options.eventId || new ObjectId(),
        correlationId: options.correlationId,
        causationId: options.causationId,

        // User and session context
        userId: options.userId,
        sessionId: options.sessionId,

        // System context
        source: options.source || 'application',
        hostname: options.hostname || require('os').hostname(),

        // Event processing metadata
        priority: options.priority || 'normal',
        tags: options.tags || [],

        // Real-time processing flags
        requiresProcessing: options.requiresProcessing || false,
        processingStatus: options.processingStatus || 'pending',

        // Event relationships
        parentEventId: options.parentEventId,
        childEventIds: options.childEventIds || [],

        // Timing and sequence
        occurredAt: options.occurredAt || new Date(),
        sequenceNumber: options.sequenceNumber,

        // Custom event payload
        payload: eventData
      };

      // Insert event into capped collection
      const result = await streamCollection.insertOne(eventDocument, {
        writeConcern: this.config.writeConcern
      });

      // Trigger real-time processing if enabled
      if (this.config.enableRealTimeProcessing && eventDocument.requiresProcessing) {
        await this.triggerRealTimeProcessing(eventDocument);
      }

      // Update metrics
      this.updatePerformanceMetrics('event_stream', 'write', Date.now() - startTime);

      return {
        insertedId: result.insertedId,
        eventId: eventDocument.eventId,
        collection: 'event_stream',
        processingTime: Date.now() - startTime,
        success: true,
        sequenceOrder: result.insertedId // ObjectId provides natural ordering
      };

    } catch (error) {
      console.error('Error streaming event:', error);
      this.processingStats.errorCount++;

      return {
        success: false,
        error: error.message,
        collection: 'event_stream',
        processingTime: Date.now() - startTime
      };
    }
  }

  async recordPerformanceMetric(metricName, value, metadata = {}) {
    const startTime = Date.now();

    try {
      const metricsCollection = this.cappedCollections.get('performance_metrics').collection;

      const metricDocument = {
        timestamp: new Date(),
        metricName: metricName,
        value: value,

        // Metric metadata
        unit: metadata.unit || 'count',
        type: metadata.type || 'gauge', // gauge, counter, histogram, timer

        // System context
        hostname: metadata.hostname || require('os').hostname(),
        service: metadata.service || 'unknown',
        environment: metadata.environment || process.env.NODE_ENV,

        // Metric dimensions
        tags: metadata.tags || {},
        dimensions: metadata.dimensions || {},

        // Statistical data
        min: metadata.min,
        max: metadata.max,
        avg: metadata.avg,
        count: metadata.count,
        sum: metadata.sum,

        // Performance context
        duration: metadata.duration,
        sampleRate: metadata.sampleRate || 1.0,

        // Additional metadata
        source: metadata.source || 'system',
        category: metadata.category || 'performance',
        priority: metadata.priority || 'normal',

        // Custom data
        customMetadata: metadata.customMetadata
      };

      const result = await metricsCollection.insertOne(metricDocument, {
        writeConcern: this.config.writeConcern
      });

      // Update internal metrics
      this.updatePerformanceMetrics('performance_metrics', 'write', Date.now() - startTime);

      return {
        insertedId: result.insertedId,
        collection: 'performance_metrics',
        metricName: metricName,
        processingTime: Date.now() - startTime,
        success: true
      };

    } catch (error) {
      console.error('Error recording performance metric:', error);
      this.processingStats.errorCount++;

      return {
        success: false,
        error: error.message,
        collection: 'performance_metrics',
        processingTime: Date.now() - startTime
      };
    }
  }

  async createTailableCursor(collectionName, filter = {}, options = {}) {
    console.log(`Creating tailable cursor for ${collectionName}...`);

    try {
      const cappedCollection = this.cappedCollections.get(collectionName);
      if (!cappedCollection) {
        throw new Error(`Capped collection ${collectionName} not found`);
      }

      const collection = cappedCollection.collection;

      // Configure tailable cursor options
      const cursorOptions = {
        tailable: true,
        awaitData: true,
        noCursorTimeout: true,
        maxTimeMS: options.maxTimeMS || this.config.tailCursorTimeout,
        batchSize: options.batchSize || 100,
        ...options
      };

      // Create cursor starting from a specified position, from the current end, or from the beginning
      if (options.startAfter) {
        // Resume after a known _id (capped collections preserve insertion order)
        filter._id = { $gt: options.startAfter };
      } else if (options.startFromEnd) {
        // Skip existing documents: find the newest _id and only tail documents inserted after it
        const lastDocument = await collection.find({}).sort({ $natural: -1 }).limit(1).next();
        if (lastDocument) {
          filter._id = { $gt: lastDocument._id };
        }
      }
      const cursor = collection.find(filter, cursorOptions);

      // Store cursor for management
      const cursorId = options.cursorId || new ObjectId().toString();
      this.tailableCursors.set(cursorId, {
        cursor: cursor,
        collection: collectionName,
        filter: filter,
        options: cursorOptions,
        created: new Date(),
        active: true
      });

      console.log(`Tailable cursor ${cursorId} created for ${collectionName}`);

      return {
        cursorId: cursorId,
        cursor: cursor,
        collection: collectionName,
        success: true
      };

    } catch (error) {
      console.error(`Error creating tailable cursor for ${collectionName}:`, error);
      return {
        success: false,
        error: error.message,
        collection: collectionName
      };
    }
  }

  async processTailableCursor(cursorId, processingFunction, options = {}) {
    console.log(`Starting tailable cursor processing for ${cursorId}...`);

    try {
      const cursorInfo = this.tailableCursors.get(cursorId);
      if (!cursorInfo) {
        throw new Error(`Tailable cursor ${cursorId} not found`);
      }

      const cursor = cursorInfo.cursor;
      const processingStats = {
        documentsProcessed: 0,
        errors: 0,
        startTime: new Date(),
        lastProcessedAt: null
      };

      // Process documents as they arrive
      while (await cursor.hasNext() && cursorInfo.active) {
        try {
          const document = await cursor.next();

          if (document) {
            // Process the document
            const processingStartTime = Date.now();
            await processingFunction(document, cursorInfo.collection);

            // Update statistics
            processingStats.documentsProcessed++;
            processingStats.lastProcessedAt = new Date();

            // Update performance metrics
            this.updatePerformanceMetrics(
              cursorInfo.collection, 
              'tail_process', 
              Date.now() - processingStartTime
            );

            // Batch processing optimization
            if (options.batchProcessing && processingStats.documentsProcessed % options.batchSize === 0) {
              await this.flushBatchProcessing(cursorId, options);
            }
          }

        } catch (processingError) {
          console.error(`Error processing document from cursor ${cursorId}:`, processingError);
          processingStats.errors++;

          // Handle processing errors based on configuration
          if (options.stopOnError) {
            break;
          }
        }
      }

      console.log(`Tailable cursor processing completed for ${cursorId}:`, processingStats);

      return {
        success: true,
        cursorId: cursorId,
        processingStats: processingStats
      };

    } catch (error) {
      console.error(`Error in tailable cursor processing for ${cursorId}:`, error);
      return {
        success: false,
        error: error.message,
        cursorId: cursorId
      };
    }
  }

  async bulkInsertLogs(collectionName, documents, options = {}) {
    console.log(`Performing bulk insert to ${collectionName} with ${documents.length} documents...`);
    const startTime = Date.now();

    try {
      const cappedCollection = this.cappedCollections.get(collectionName);
      if (!cappedCollection) {
        throw new Error(`Capped collection ${collectionName} not found`);
      }

      const collection = cappedCollection.collection;

      // Prepare documents with consistent structure
      const preparedDocuments = documents.map((doc, index) => ({
        ...doc,
        timestamp: doc.timestamp || new Date(),
        batchId: options.batchId || new ObjectId(),
        batchIndex: index,
        bulkInsertMetadata: {
          batchSize: documents.length,
          insertedAt: new Date(),
          source: options.source || 'bulk_operation'
        }
      }));

      // Configure bulk insert options for maximum performance
      const insertOptions = {
        ordered: options.ordered || false, // Unordered for better performance
        writeConcern: options.writeConcern || this.config.writeConcern,
        bypassDocumentValidation: options.bypassValidation || false
      };

      // Execute bulk insert
      const result = await collection.insertMany(preparedDocuments, insertOptions);

      // Update performance metrics
      const processingTime = Date.now() - startTime;
      this.updatePerformanceMetrics(collectionName, 'bulk_write', processingTime);
      this.processingStats.totalWrites += result.insertedCount;

      console.log(`Bulk insert completed: ${result.insertedCount} documents in ${processingTime}ms`);

      return {
        success: true,
        collection: collectionName,
        insertedCount: result.insertedCount,
        insertedIds: Object.values(result.insertedIds),
        processingTime: processingTime,
        throughput: Math.round((result.insertedCount / processingTime) * 1000), // docs/second
        batchId: options.batchId
      };

    } catch (error) {
      console.error(`Error in bulk insert to ${collectionName}:`, error);
      this.processingStats.errorCount++;

      return {
        success: false,
        error: error.message,
        collection: collectionName,
        processingTime: Date.now() - startTime
      };
    }
  }

  async queryRecentDocuments(collectionName, filter = {}, options = {}) {
    const startTime = Date.now();

    try {
      const cappedCollection = this.cappedCollections.get(collectionName);
      if (!cappedCollection) {
        throw new Error(`Capped collection ${collectionName} not found`);
      }

      const collection = cappedCollection.collection;

      // Configure query options for optimal performance
      const queryOptions = {
        sort: { $natural: options.reverse ? 1 : -1 }, // Natural order (insertion order)
        limit: options.limit || 1000,
        projection: options.projection || {},
        maxTimeMS: options.maxTimeMS || 5000,
        batchSize: options.batchSize || 100
      };

      // Add time range filter if specified
      if (options.timeRange) {
        filter.timestamp = {
          $gte: options.timeRange.start,
          $lte: options.timeRange.end || new Date()
        };
      }

      // Execute query
      const documents = await collection.find(filter, queryOptions).toArray();

      // Update performance metrics
      const processingTime = Date.now() - startTime;
      this.updatePerformanceMetrics(collectionName, 'read', processingTime);
      this.processingStats.totalReads += documents.length;

      return {
        success: true,
        collection: collectionName,
        documents: documents,
        count: documents.length,
        processingTime: processingTime,
        query: filter,
        options: queryOptions
      };

    } catch (error) {
      console.error(`Error querying ${collectionName}:`, error);
      this.processingStats.errorCount++;

      return {
        success: false,
        error: error.message,
        collection: collectionName,
        processingTime: Date.now() - startTime
      };
    }
  }

  updatePerformanceMetrics(collectionName, operationType, duration) {
    if (!this.config.enablePerformanceMonitoring) return;

    const key = `${collectionName}_${operationType}`;

    if (!this.performanceMetrics.has(key)) {
      this.performanceMetrics.set(key, {
        totalOperations: 0,
        totalTime: 0,
        averageTime: 0,
        minTime: Infinity,
        maxTime: 0,
        lastOperation: null
      });
    }

    const metrics = this.performanceMetrics.get(key);

    metrics.totalOperations++;
    metrics.totalTime += duration;
    metrics.averageTime = metrics.totalTime / metrics.totalOperations;
    metrics.minTime = Math.min(metrics.minTime, duration);
    metrics.maxTime = Math.max(metrics.maxTime, duration);
    metrics.lastOperation = new Date();

    // Update global stats
    if (operationType === 'write' || operationType === 'bulk_write') {
      this.processingStats.averageWriteTime = 
        (this.processingStats.averageWriteTime + duration) / 2;
    } else if (operationType === 'read') {
      this.processingStats.averageReadTime = 
        (this.processingStats.averageReadTime + duration) / 2;
    }
  }

  async getCollectionStats() {
    console.log('Gathering capped collection statistics...');

    const stats = {};

    for (const [collectionName, cappedInfo] of this.cappedCollections.entries()) {
      try {
        const collection = cappedInfo.collection;

        // Get MongoDB collection stats (collection.stats() was removed from newer drivers)
        const mongoStats = await this.db.command({ collStats: collectionName });

        // Get performance metrics
        const performanceKey = `${collectionName}_write`;
        const performanceMetrics = this.performanceMetrics.get(performanceKey) || {};

        stats[collectionName] = {
          // Collection configuration
          configuration: cappedInfo.options,
          description: cappedInfo.description,
          created: cappedInfo.created,

          // MongoDB stats
          size: mongoStats.size,
          storageSize: mongoStats.storageSize,
          totalIndexSize: mongoStats.totalIndexSize,
          count: mongoStats.count,
          avgObjSize: mongoStats.avgObjSize,
          maxSize: mongoStats.maxSize,
          max: mongoStats.max,

          // Utilization metrics
          sizeUtilization: (mongoStats.size / mongoStats.maxSize * 100).toFixed(2) + '%',
          countUtilization: mongoStats.max ? (mongoStats.count / mongoStats.max * 100).toFixed(2) + '%' : 'N/A',

          // Performance metrics
          averageWriteTime: performanceMetrics.averageTime || 0,
          totalOperations: performanceMetrics.totalOperations || 0,
          minWriteTime: performanceMetrics.minTime === Infinity ? 0 : performanceMetrics.minTime || 0,
          maxWriteTime: performanceMetrics.maxTime || 0,
          lastOperation: performanceMetrics.lastOperation,

          // Health indicators
          isNearCapacity: mongoStats.size / mongoStats.maxSize > 0.8,
          hasRecentActivity: performanceMetrics.lastOperation && 
            (new Date() - performanceMetrics.lastOperation) < 300000, // 5 minutes

          // Estimated metrics
          estimatedDocumentsPerHour: this.estimateDocumentsPerHour(performanceMetrics),
          estimatedTimeToCapacity: this.estimateTimeToCapacity(mongoStats, performanceMetrics)
        };

      } catch (error) {
        stats[collectionName] = {
          error: error.message,
          available: false
        };
      }
    }

    return {
      collections: stats,
      globalStats: this.processingStats,
      summary: {
        totalCollections: this.cappedCollections.size,
        totalActiveCursors: this.tailableCursors.size,
        totalMemoryUsage: this.estimateMemoryUsage(),
        uptime: this.startTime ? Date.now() - this.startTime.getTime() : 0
      }
    };
  }

  estimateDocumentsPerHour(performanceMetrics) {
    if (!performanceMetrics || !performanceMetrics.lastOperation) return 0;

    const hoursActive = (new Date() - (this.startTime || new Date())) / (1000 * 60 * 60);
    if (hoursActive === 0) return 0;

    return Math.round((performanceMetrics.totalOperations || 0) / hoursActive);
  }

  estimateTimeToCapacity(mongoStats, performanceMetrics) {
    if (!performanceMetrics || !performanceMetrics.totalOperations) return 'Unknown';

    const remainingSpace = mongoStats.maxSize - mongoStats.size;
    const averageDocSize = mongoStats.avgObjSize || 1000;
    const remainingDocuments = Math.floor(remainingSpace / averageDocSize);

    const documentsPerHour = this.estimateDocumentsPerHour(performanceMetrics);
    if (documentsPerHour === 0) return 'Unknown';

    const hoursToCapacity = remainingDocuments / documentsPerHour;

    if (hoursToCapacity < 24) {
      return `${Math.round(hoursToCapacity)} hours`;
    } else {
      return `${Math.round(hoursToCapacity / 24)} days`;
    }
  }

  estimateMemoryUsage() {
    // Rough estimate based on active cursors and performance metrics
    const baseMem = 50 * 1024 * 1024; // 50MB base
    const cursorMem = this.tailableCursors.size * 1024 * 1024; // 1MB per cursor
    const metricsMem = this.performanceMetrics.size * 10 * 1024; // 10KB per metric set

    return baseMem + cursorMem + metricsMem;
  }

  async shutdown() {
    console.log('Shutting down capped collections manager...');

    // Close all tailable cursors
    for (const [cursorId, cursorInfo] of this.tailableCursors.entries()) {
      try {
        cursorInfo.active = false;
        await cursorInfo.cursor.close();
        console.log(`Closed tailable cursor: ${cursorId}`);
      } catch (error) {
        console.error(`Error closing cursor ${cursorId}:`, error);
      }
    }

    // Clear collections and metrics
    this.cappedCollections.clear();
    this.tailableCursors.clear();
    this.performanceMetrics.clear();

    console.log('Capped collections manager shutdown complete');
  }
}

// Benefits of MongoDB Capped Collections:
// - Fixed-size storage with automatic old document removal (circular buffer behavior)
// - Guaranteed insertion order preservation for event sequencing
// - High-performance writes without index maintenance overhead
// - Optimal read performance for recent document queries
// - Built-in document rotation without external management
// - Tailable cursors for real-time data streaming
// - Memory-efficient operations with predictable resource usage
// - No fragmentation or storage bloat issues
// - Ideal for logging, event streaming, and real-time analytics
// - SQL-compatible operations through QueryLeaf integration

module.exports = {
  CappedCollectionsManager
};
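
For orientation, here is a brief usage sketch of the manager above. The module path, connection string, and database name are assumptions; the monitoring and real-time hooks whose helper methods are not shown in the snippet are disabled via config, and a short delay stands in for the asynchronous collection setup the constructor kicks off.

// Usage sketch for CappedCollectionsManager (illustrative; paths and URIs are assumptions)
const { MongoClient } = require('mongodb');
const { CappedCollectionsManager } = require('./capped-collections-manager');

async function run() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  // Disable the hooks whose helper methods are not included in the snippet above
  const manager = new CappedCollectionsManager(client.db('high_performance_logging'), {
    enablePerformanceMonitoring: false,
    enableRealTimeProcessing: false
  });

  // The constructor starts collection creation asynchronously; give it a moment to finish
  await new Promise(resolve => setTimeout(resolve, 1000));

  // High-throughput logging and event streaming
  await manager.logApplicationEvent('web-server', 'info', 'User login successful', {
    userId: 'user123',
    sessionId: 'sess456'
  });
  await manager.streamEvent('user_action', { action: 'login', resource: '/session' }, {
    userId: 'user123',
    sessionId: 'sess456'
  });

  // Tail the event stream and print new events as they arrive (runs until shutdown)
  const tail = await manager.createTailableCursor('event_stream');
  manager.processTailableCursor(tail.cursorId, async (doc) => {
    console.log('new event:', doc.eventType, doc.timestamp);
  });

  // Inspect capacity utilization and basic statistics
  console.log(JSON.stringify(await manager.getCollectionStats(), null, 2));
}

run().catch(console.error);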

Understanding MongoDB Capped Collections Architecture

Advanced High-Performance Logging and Streaming Patterns

Implement sophisticated capped collection strategies for production MongoDB deployments:

// Production-ready MongoDB capped collections with advanced optimization and real-time processing
class ProductionCappedCollectionsManager extends CappedCollectionsManager {
  constructor(db, productionConfig) {
    super(db, productionConfig);

    this.productionConfig = {
      ...productionConfig,
      enableShardedDeployment: true,
      enableReplicationOptimization: true,
      enableAdvancedMonitoring: true,
      enableAutomaticSizing: true,
      enableCompression: true,
      enableRealTimeAlerts: true
    };

    this.setupProductionOptimizations();
    this.initializeAdvancedMonitoring();
    this.setupAutomaticManagement();
  }

  async implementShardedCappedCollections(collectionName, shardingStrategy) {
    console.log(`Implementing sharded capped collections for ${collectionName}...`);

    const shardingConfig = {
      // Shard key design for capped collections
      shardKey: shardingStrategy.shardKey || { timestamp: 1, hostname: 1 },

      // Chunk size optimization for high-throughput writes
      chunkSizeMB: shardingStrategy.chunkSize || 16,

      // Balancing strategy
      enableAutoSplit: true,
      enableBalancer: true,
      balancerWindowStart: "01:00",
      balancerWindowEnd: "06:00",

      // Write distribution
      enableEvenWriteDistribution: true,
      monitorHotShards: true,
      automaticRebalancing: true
    };

    return await this.deployShardedCappedCollection(collectionName, shardingConfig);
  }

  async setupAdvancedRealTimeProcessing() {
    console.log('Setting up advanced real-time processing for capped collections...');

    const processingPipeline = {
      // Stream processing configuration
      streamProcessing: {
        enableChangeStreams: true,
        enableAggregationPipelines: true,
        enableParallelProcessing: true,
        maxConcurrentProcessors: 8
      },

      // Real-time analytics
      realTimeAnalytics: {
        enableWindowedAggregations: true,
        windowSizes: ['1m', '5m', '15m', '1h'],
        enableTrendDetection: true,
        enableAnomalyDetection: true
      },

      // Event correlation
      eventCorrelation: {
        enableEventMatching: true,
        correlationTimeWindow: 300000, // 5 minutes
        enableComplexEventProcessing: true
      }
    };

    return await this.deployRealTimeProcessing(processingPipeline);
  }

  async implementAutomaticCapacityManagement() {
    console.log('Implementing automatic capacity management for capped collections...');

    const capacityManagement = {
      // Automatic sizing
      automaticSizing: {
        enableDynamicResize: true,
        growthThreshold: 0.8,  // 80% capacity
        shrinkThreshold: 0.3,  // 30% capacity
        maxSize: 10 * 1024 * 1024 * 1024, // 10GB max
        minSize: 100 * 1024 * 1024 // 100MB min
      },

      // Performance-based optimization
      performanceOptimization: {
        monitorWriteLatency: true,
        latencyThreshold: 100, // 100ms
        enableAutomaticIndexing: true,
        optimizeForWorkload: true
      },

      // Resource management
      resourceManagement: {
        monitorMemoryUsage: true,
        memoryThreshold: 0.7, // 70% memory usage
        enableBackpressure: true,
        enableLoadShedding: true
      }
    };

    return await this.deployCapacityManagement(capacityManagement);
  }
}

SQL-Style Capped Collections Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB capped collections and high-performance logging:

-- QueryLeaf capped collections operations with SQL-familiar syntax for MongoDB

-- Create capped collections with SQL-style DDL
CREATE CAPPED COLLECTION application_logs 
WITH (
  size = '200MB',
  max_documents = 100000,
  write_concern = 'fast',
  compression = 'snappy'
);

-- Alternative syntax for collection creation
CREATE TABLE event_stream (
  event_id UUID DEFAULT GENERATE_UUID(),
  timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  event_type VARCHAR(100) NOT NULL,
  event_data DOCUMENT,
  user_id VARCHAR(50),
  session_id VARCHAR(100),
  source VARCHAR(50) DEFAULT 'application',

  -- Capped collection metadata
  insertion_order BIGINT -- Natural insertion order in capped collections
)
WITH CAPPED (
  size = '500MB',
  max_documents = 250000,
  auto_rotation = true
);

-- High-performance log insertion with SQL syntax
INSERT INTO application_logs (
  application, level, message, timestamp, user_id, session_id, metadata
) VALUES 
  ('web-server', 'INFO', 'User login successful', CURRENT_TIMESTAMP, 'user123', 'sess456', 
   JSON_OBJECT('ip_address', '192.168.1.100', 'user_agent', 'Mozilla/5.0...')),
  ('web-server', 'WARN', 'Slow query detected', CURRENT_TIMESTAMP, 'user123', 'sess456',
   JSON_OBJECT('query_time', 2500, 'table', 'users')),
  ('payment-service', 'ERROR', 'Payment processing failed', CURRENT_TIMESTAMP, 'user789', 'sess789',
   JSON_OBJECT('amount', 99.99, 'error_code', 'CARD_DECLINED'));

-- Bulk insertion for high-throughput logging
INSERT INTO application_logs (application, level, message, timestamp, metadata)
WITH log_batch AS (
  SELECT 
    app_name as application,
    log_level as level,
    log_message as message,
    log_timestamp as timestamp,

    -- Enhanced metadata generation
    JSON_OBJECT(
      'hostname', hostname,
      'process_id', process_id,
      'thread_id', thread_id,
      'memory_usage_mb', memory_usage / 1024 / 1024,
      'request_duration_ms', request_duration,
      'tags', log_tags,
      'custom_data', custom_metadata
    ) as metadata

  FROM staging_logs
  WHERE processed = false
    AND log_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
)
SELECT application, level, message, timestamp, metadata
FROM log_batch
WHERE level IN ('INFO', 'WARN', 'ERROR', 'CRITICAL')

-- Capped collection bulk insert configuration
WITH BULK_OPTIONS (
  batch_size = 1000,
  ordered = false,
  write_concern = 'fast',
  bypass_validation = false
);

-- Event streaming with guaranteed insertion order
INSERT INTO event_stream (
  event_type, event_data, user_id, session_id, 
  correlation_id, source, priority, tags
) 
WITH event_preparation AS (
  SELECT 
    event_type,
    event_payload as event_data,
    user_id,
    session_id,

    -- Generate correlation context
    COALESCE(correlation_id, GENERATE_UUID()) as correlation_id,
    COALESCE(event_source, 'application') as source,
    COALESCE(event_priority, 'normal') as priority,

    -- Generate event tags for filtering
    ARRAY[
      event_category,
      'realtime',
      CASE WHEN event_priority = 'high' THEN 'urgent' ELSE 'standard' END
    ] as tags,

    -- Add timing metadata
    CURRENT_TIMESTAMP as insertion_timestamp,
    event_occurred_at

  FROM incoming_events
  WHERE processing_status = 'pending'
    AND event_occurred_at >= CURRENT_TIMESTAMP - INTERVAL '5 minutes'
)
SELECT 
  event_type,
  JSON_SET(
    event_data,
    '$.insertion_timestamp', insertion_timestamp,
    '$.occurred_at', event_occurred_at,
    '$.processing_context', JSON_OBJECT(
      'inserted_by', 'queryleaf',
      'capped_collection', true,
      'guaranteed_order', true
    )
  ) as event_data,
  user_id,
  session_id,
  correlation_id,
  source,
  priority,
  tags
FROM event_preparation
ORDER BY event_occurred_at, correlation_id;

-- Query recent logs with natural insertion order (most efficient for capped collections)
WITH recent_application_logs AS (
  SELECT 
    timestamp,
    application,
    level,
    message,
    user_id,
    session_id,
    metadata,

    -- Natural insertion order in capped collections
    _id as insertion_order,

    -- Extract metadata fields
    JSON_EXTRACT(metadata, '$.hostname') as hostname,
    JSON_EXTRACT(metadata, '$.request_duration_ms') as request_duration,
    JSON_EXTRACT(metadata, '$.memory_usage_mb') as memory_usage,

    -- Calculate log age
    EXTRACT(SECONDS FROM CURRENT_TIMESTAMP - timestamp) as age_seconds,

    -- Categorize log importance
    CASE level
      WHEN 'CRITICAL' THEN 1
      WHEN 'ERROR' THEN 2  
      WHEN 'WARN' THEN 3
      WHEN 'INFO' THEN 4
      WHEN 'DEBUG' THEN 5
    END as log_priority_numeric

  FROM application_logs
  WHERE 
    -- Time-based filtering (efficient with capped collections)
    timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'

    -- Application filtering
    AND (application = $1 OR $1 IS NULL)

    -- Level filtering
    AND level IN ('ERROR', 'WARN', 'INFO')

  -- Natural order query (most efficient for capped collections)
  ORDER BY $natural DESC
  LIMIT 1000
),

log_analysis AS (
  SELECT 
    ral.*,

    -- Session context analysis
    COUNT(*) OVER (
      PARTITION BY session_id 
      ORDER BY timestamp 
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as session_log_sequence,

    -- Error rate analysis
    COUNT(*) FILTER (WHERE level IN ('ERROR', 'CRITICAL')) OVER (
      PARTITION BY application, DATE_TRUNC('minute', timestamp)
    ) as errors_this_minute,

    -- Performance analysis
    AVG(request_duration) OVER (
      PARTITION BY application 
      ORDER BY timestamp 
      ROWS BETWEEN 10 PRECEDING AND CURRENT ROW
    ) as rolling_avg_duration,

    -- Anomaly detection
    CASE 
      WHEN request_duration > 
        AVG(request_duration) OVER (
          PARTITION BY application 
          ORDER BY timestamp 
          ROWS BETWEEN 100 PRECEDING AND CURRENT ROW
        ) * 3 
      THEN 'performance_anomaly'

      WHEN errors_this_minute > 10 THEN 'error_spike'

      WHEN memory_usage > 
        AVG(memory_usage) OVER (
          PARTITION BY hostname 
          ORDER BY timestamp 
          ROWS BETWEEN 50 PRECEDING AND CURRENT ROW
        ) * 2
      THEN 'memory_anomaly'

      ELSE 'normal'
    END as anomaly_status

  FROM recent_application_logs ral
)

SELECT 
  timestamp,
  application,
  level,
  message,
  user_id,
  session_id,
  hostname,

  -- Performance metrics
  request_duration,
  memory_usage,
  rolling_avg_duration,

  -- Context information
  session_log_sequence,
  errors_this_minute,

  -- Analysis results
  log_priority_numeric,
  anomaly_status,
  age_seconds,

  -- Helpful indicators
  CASE 
    WHEN age_seconds < 60 THEN 'very_recent'
    WHEN age_seconds < 300 THEN 'recent' 
    WHEN age_seconds < 1800 THEN 'moderate'
    ELSE 'older'
  END as recency_category,

  -- Alert conditions
  CASE 
    WHEN level = 'CRITICAL' OR anomaly_status != 'normal' THEN 'immediate_attention'
    WHEN level = 'ERROR' AND errors_this_minute > 5 THEN 'monitor_closely'
    WHEN level = 'WARN' AND session_log_sequence > 20 THEN 'session_issues'
    ELSE 'normal_monitoring'
  END as attention_level

FROM log_analysis
WHERE 
  -- Focus on actionable logs
  (level IN ('CRITICAL', 'ERROR') OR anomaly_status != 'normal')

ORDER BY 
  -- Prioritize by importance and recency
  CASE attention_level
    WHEN 'immediate_attention' THEN 1
    WHEN 'monitor_closely' THEN 2  
    WHEN 'session_issues' THEN 3
    ELSE 4
  END,
  timestamp DESC

LIMIT 500;

-- Real-time event stream processing with tailable cursor behavior
WITH LIVE_EVENT_STREAM AS (
  SELECT 
    event_id,
    timestamp,
    event_type,
    event_data,
    user_id,
    session_id,
    correlation_id,
    source,
    tags,

    -- Event sequence tracking
    _id as natural_order,

    -- Extract event payload details
    JSON_EXTRACT(event_data, '$.action') as action,
    JSON_EXTRACT(event_data, '$.resource') as resource,
    JSON_EXTRACT(event_data, '$.metadata') as event_metadata,

    -- Real-time processing flags
    JSON_EXTRACT(event_data, '$.requires_processing') as requires_processing,
    JSON_EXTRACT(event_data, '$.priority') as event_priority

  FROM event_stream
  WHERE 
    -- Process events from the last insertion point
    _id > $last_processed_id

    -- Focus on events requiring real-time processing
    AND (
      JSON_EXTRACT(event_data, '$.requires_processing') = true
      OR event_type IN ('user_action', 'system_alert', 'security_event')
      OR JSON_EXTRACT(event_data, '$.priority') = 'high'
    )

  -- Use natural insertion order for optimal capped collection performance
  ORDER BY $natural ASC
),

event_correlation AS (
  SELECT 
    les.*,

    -- Correlation analysis
    COUNT(*) OVER (
      PARTITION BY correlation_id
      ORDER BY natural_order
    ) as correlation_sequence,

    -- User behavior patterns
    COUNT(*) OVER (
      PARTITION BY user_id, event_type
      ORDER BY timestamp
      RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW  
    ) as recent_similar_events,

    -- Session context
    STRING_AGG(event_type, ' -> ') OVER (
      PARTITION BY session_id
      ORDER BY natural_order
      ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
    ) as session_event_sequence,

    -- Anomaly detection
    CASE 
      WHEN recent_similar_events > 10 THEN 'potential_abuse'
      WHEN correlation_sequence > 50 THEN 'long_running_process'
      WHEN event_type = 'security_event' THEN 'security_concern'
      ELSE 'normal_event'
    END as event_classification

  FROM live_event_stream les
),

processed_events AS (
  SELECT 
    ec.*,

    -- Generate processing instructions
    JSON_OBJECT(
      'processing_priority', 
      CASE event_classification
        WHEN 'security_concern' THEN 'critical'
        WHEN 'potential_abuse' THEN 'high'
        WHEN 'long_running_process' THEN 'monitor'
        ELSE 'standard'
      END,

      'correlation_context', JSON_OBJECT(
        'correlation_id', correlation_id,
        'sequence', correlation_sequence,
        'related_events', recent_similar_events
      ),

      'session_context', JSON_OBJECT(
        'session_id', session_id,
        'event_sequence', session_event_sequence,
        'user_id', user_id
      ),

      'processing_metadata', JSON_OBJECT(
        'inserted_at', CURRENT_TIMESTAMP,
        'natural_order', natural_order,
        'capped_collection_source', true
      )
    ) as processing_instructions,

    -- Determine next processing steps
    CASE event_classification
      WHEN 'security_concern' THEN 'immediate_alert'
      WHEN 'potential_abuse' THEN 'rate_limit_check'  
      WHEN 'long_running_process' THEN 'status_update'
      ELSE 'standard_processing'
    END as next_action

  FROM event_correlation ec
)

SELECT 
  event_id,
  timestamp,
  event_type,
  action,
  resource,
  user_id,
  session_id,

  -- Analysis results
  event_classification,
  correlation_sequence,
  recent_similar_events,
  next_action,

  -- Processing context
  processing_instructions,

  -- Natural ordering for downstream systems
  natural_order,

  -- Real-time indicators
  EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - timestamp)) as processing_latency_seconds,

  CASE 
    WHEN EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - timestamp)) < 5 THEN 'real_time'
    WHEN EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - timestamp)) < 30 THEN 'near_real_time'
    ELSE 'delayed_processing'
  END as processing_timeliness

FROM processed_events
WHERE event_classification != 'normal_event' OR requires_processing = true
ORDER BY 
  -- Process highest priority events first
  CASE next_action
    WHEN 'immediate_alert' THEN 1
    WHEN 'rate_limit_check' THEN 2
    WHEN 'status_update' THEN 3
    ELSE 4
  END,
  natural_order ASC;

-- Performance metrics and capacity monitoring for capped collections
WITH capped_collection_stats AS (
  SELECT 
    collection_name,

    -- Storage utilization
    current_size_mb,
    max_size_mb,
    (current_size_mb / max_size_mb * 100) as size_utilization_percent,

    -- Document utilization  
    document_count,
    max_documents,
    (document_count / NULLIF(max_documents, 0) * 100) as document_utilization_percent,

    -- Performance metrics
    avg_document_size,
    total_index_size_mb,

    -- Operation statistics
    total_inserts_today,
    avg_inserts_per_hour,
    peak_inserts_per_hour,

    -- Capacity projections
    estimated_hours_to_capacity,
    estimated_rotation_frequency

  FROM (
    -- This would be populated by MongoDB collection stats
    VALUES 
      ('application_logs', 150, 200, 75000, 100000, 2048, 5, 180000, 7500, 15000, 8, 'every_3_hours'),
      ('event_stream', 400, 500, 200000, 250000, 2048, 8, 480000, 20000, 35000, 4, 'every_hour'),
      ('performance_metrics', 80, 100, 40000, 50000, 2048, 3, 96000, 4000, 8000, 20, 'every_5_hours')
  ) AS stats(collection_name, current_size_mb, max_size_mb, document_count, max_documents, 
             avg_document_size, total_index_size_mb, total_inserts_today, avg_inserts_per_hour,
             peak_inserts_per_hour, estimated_hours_to_capacity, estimated_rotation_frequency)
),

performance_analysis AS (
  SELECT 
    ccs.*,

    -- Utilization status
    CASE 
      WHEN size_utilization_percent > 90 THEN 'critical'
      WHEN size_utilization_percent > 80 THEN 'warning'  
      WHEN size_utilization_percent > 60 THEN 'moderate'
      ELSE 'healthy'
    END as size_status,

    CASE 
      WHEN document_utilization_percent > 90 THEN 'critical'
      WHEN document_utilization_percent > 80 THEN 'warning'
      WHEN document_utilization_percent > 60 THEN 'moderate'  
      ELSE 'healthy'
    END as document_status,

    -- Performance indicators
    CASE 
      WHEN peak_inserts_per_hour / NULLIF(avg_inserts_per_hour, 0) > 3 THEN 'high_variance'
      WHEN peak_inserts_per_hour / NULLIF(avg_inserts_per_hour, 0) > 2 THEN 'moderate_variance'
      ELSE 'stable_load'
    END as load_pattern,

    -- Capacity recommendations
    CASE 
      WHEN estimated_hours_to_capacity < 24 THEN 'monitor_closely'
      WHEN estimated_hours_to_capacity < 72 THEN 'plan_expansion'
      WHEN estimated_hours_to_capacity > 168 THEN 'over_provisioned'
      ELSE 'adequate_capacity'
    END as capacity_recommendation,

    -- Optimization suggestions
    CASE 
      WHEN total_index_size_mb / current_size_mb > 0.3 THEN 'review_indexes'
      WHEN avg_document_size > 4096 THEN 'consider_compression'
      WHEN avg_inserts_per_hour < 100 THEN 'potentially_over_sized'
      ELSE 'well_optimized'
    END as optimization_suggestion

  FROM capped_collection_stats ccs
)

SELECT 
  collection_name,

  -- Current utilization
  ROUND(size_utilization_percent, 1) as size_used_percent,
  ROUND(document_utilization_percent, 1) as documents_used_percent,
  size_status,
  document_status,

  -- Capacity information  
  current_size_mb,
  max_size_mb,
  (max_size_mb - current_size_mb) as remaining_capacity_mb,
  document_count,
  max_documents,

  -- Performance metrics
  avg_document_size,
  total_index_size_mb,
  load_pattern,
  avg_inserts_per_hour,
  peak_inserts_per_hour,

  -- Projections and recommendations
  estimated_hours_to_capacity,
  estimated_rotation_frequency,
  capacity_recommendation,
  optimization_suggestion,

  -- Action items
  CASE 
    WHEN size_status = 'critical' OR document_status = 'critical' THEN 'immediate_action_required'
    WHEN capacity_recommendation = 'monitor_closely' THEN 'increase_monitoring_frequency'
    WHEN optimization_suggestion != 'well_optimized' THEN 'schedule_optimization_review'
    ELSE 'continue_normal_operations'
  END as recommended_action,

  -- Detailed recommendations
  CASE recommended_action
    WHEN 'immediate_action_required' THEN 'Increase capped collection size or reduce retention period'
    WHEN 'increase_monitoring_frequency' THEN 'Monitor every 15 minutes instead of hourly'
    WHEN 'schedule_optimization_review' THEN 'Review indexes, compression, and document structure'
    ELSE 'Collection is operating within normal parameters'
  END as action_details

FROM performance_analysis
ORDER BY 
  CASE size_status 
    WHEN 'critical' THEN 1
    WHEN 'warning' THEN 2
    WHEN 'moderate' THEN 3  
    ELSE 4
  END,
  collection_name;

-- QueryLeaf provides comprehensive capped collection capabilities:
-- 1. SQL-familiar capped collection creation and management
-- 2. High-performance bulk insertion with optimized batching
-- 3. Natural insertion order queries for optimal performance
-- 4. Real-time event streaming with tailable cursor behavior  
-- 5. Advanced analytics and anomaly detection on streaming data
-- 6. Automatic capacity monitoring and optimization recommendations
-- 7. Integration with MongoDB's native capped collection optimizations
-- 8. SQL-style operations for complex streaming data workflows
-- 9. Built-in performance monitoring and alerting capabilities
-- 10. Production-ready capped collections with enterprise features

Best Practices for Capped Collections Implementation

Performance Optimization and Design Strategy

Essential principles for effective MongoDB capped collections deployment:

  1. Size Planning: Calculate optimal collection sizes based on throughput, retention requirements, and query patterns (a sizing sketch follows this list)
  2. Write Optimization: Design write patterns that leverage capped collections' sequential write performance advantages
  3. Query Strategy: Utilize natural insertion order and time-based queries for optimal read performance
  4. Index Design: Implement minimal, strategic indexing that complements capped collection characteristics
  5. Monitoring Strategy: Track utilization, rotation frequency, and performance metrics for capacity planning
  6. Integration Patterns: Design applications that benefit from guaranteed insertion order and automatic data lifecycle
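
A minimal sizing sketch for the first principle above, using illustrative throughput and retention figures rather than measured values, shows how those numbers might translate into createCollection options:

// Capped-collection sizing sketch (workload figures are illustrative assumptions)
const { MongoClient } = require('mongodb');

async function createSizedCappedCollection() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('logging');

  // Assumed workload characteristics -- replace with measurements from your system
  const avgDocumentBytes = 2048;   // average log entry size
  const insertsPerHour = 15000;    // sustained peak write rate
  const retentionHours = 6;        // how much history must remain queryable

  // Roughly 20% headroom so rotation happens after, not before, the target window
  const sizeBytes = Math.ceil(avgDocumentBytes * insertsPerHour * retentionHours * 1.2);

  await db.createCollection('application_logs', {
    capped: true,                          // fixed-size, insertion-ordered collection
    size: sizeBytes,                       // hard byte limit; oldest documents are evicted first
    max: insertsPerHour * retentionHours   // optional document-count cap
  });

  await client.close();
}

createSizedCappedCollection().catch(console.error);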

Production Deployment and Operational Excellence

Optimize capped collections for enterprise-scale requirements:

  1. Capacity Management: Implement automated monitoring and alerting for collection utilization and performance
  2. Write Distribution: Design shard keys and distribution strategies for balanced writes across replica sets
  3. Real-Time Processing: Leverage tailable cursors and change streams for efficient real-time data processing (a cursor sketch follows this list)
  4. Backup Strategy: Account for capped collection characteristics in backup and disaster recovery planning
  5. Performance Monitoring: Track write throughput, query performance, and resource utilization continuously
  6. Operational Integration: Integrate capped collections with existing logging, monitoring, and alerting infrastructure
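
As a sketch of the real-time processing pattern referenced above, a tailable cursor on a capped collection streams newly inserted documents without polling; the database and collection names here are assumptions:

// Tailable-cursor sketch for consuming a capped collection in insertion order
const { MongoClient } = require('mongodb');

async function tailEventStream() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const eventStream = client.db('streaming').collection('event_stream');

  // tailable + awaitData keeps the cursor open and waits briefly for new inserts;
  // in production you would re-open the cursor if it closes (for example, when
  // the collection is still empty when the query first runs)
  const cursor = eventStream.find(
    {},
    { tailable: true, awaitData: true, maxAwaitTimeMS: 1000 }
  );

  for await (const event of cursor) {
    // Documents arrive in natural insertion order, which capped collections preserve
    console.log('new event:', event._id, event.eventType);
  }
}

tailEventStream().catch(console.error);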

Conclusion

MongoDB capped collections provide native high-performance data structures that eliminate the complexity of traditional logging and streaming solutions through fixed-size storage, guaranteed insertion order, and automatic data lifecycle management. The combination of predictable performance characteristics with real-time processing capabilities makes capped collections ideal for modern streaming data applications.

Key MongoDB Capped Collections benefits include:

  • High-Performance Writes: Sequential write optimization with minimal index maintenance overhead
  • Predictable Storage: Fixed-size collections with automatic old document removal and no storage bloat
  • Insertion Order Guarantee: Natural document ordering ideal for event sequencing and temporal data analysis
  • Real-Time Processing: Tailable cursors and change streams for efficient streaming data consumption
  • Resource Efficiency: Predictable memory usage and optimal performance characteristics for high-throughput scenarios
  • SQL Accessibility: Familiar SQL-style capped collection operations through QueryLeaf for accessible streaming data management

Whether you're implementing application logging, event streaming, performance monitoring, or real-time analytics, MongoDB capped collections with QueryLeaf's familiar SQL interface provide the foundation for efficient, predictable, and scalable streaming data solutions.

QueryLeaf Integration: QueryLeaf seamlessly manages MongoDB capped collections while providing SQL-familiar syntax for high-performance logging, real-time streaming, and circular buffer operations. Advanced capped collection patterns including capacity planning, real-time processing, and performance optimization are elegantly handled through familiar SQL constructs, making sophisticated streaming data management both powerful and accessible to SQL-oriented development teams.

The combination of MongoDB's robust capped collection capabilities with SQL-style streaming operations makes it an ideal platform for applications requiring both high-throughput data capture and familiar database interaction patterns, ensuring your streaming data infrastructure can scale efficiently while maintaining predictable performance and operational simplicity.

MongoDB TTL Collections: Automatic Data Lifecycle Management and Expiration for Efficient Storage

Modern applications generate vast amounts of transient data that needs careful lifecycle management to maintain performance and control storage costs. Traditional approaches to data cleanup involve complex batch jobs, scheduled maintenance scripts, and manual processes that are error-prone and resource-intensive.

MongoDB TTL (Time To Live) collections provide native automatic data expiration capabilities that eliminate the complexity of manual data lifecycle management. Unlike traditional database systems that require custom deletion processes or external job schedulers, MongoDB TTL indexes automatically remove expired documents, ensuring optimal storage utilization and performance without operational overhead.
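
In its simplest form the mechanism is a single index option; the sketch below (database, collection, and field names are assumptions) creates a TTL index so documents are removed automatically once their timestamp is older than the configured threshold. MongoDB's background TTL monitor runs roughly once a minute, so expiration is timely but not instantaneous.

// Minimal TTL sketch: sessions disappear about an hour after createdAt
const { MongoClient } = require('mongodb');

async function enableSessionTTL() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const sessions = client.db('app').collection('user_sessions');

  // Any document whose createdAt is older than 3600 seconds becomes eligible for deletion
  await sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });

  await sessions.insertOne({ userId: 'u1', createdAt: new Date() });

  await client.close();
}

enableSessionTTL().catch(console.error);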

The Traditional Data Lifecycle Challenge

Conventional approaches to managing data expiration and cleanup involve significant complexity and operational burden:

-- Traditional PostgreSQL data cleanup approach - complex and resource-intensive

-- Session cleanup with manual batch processing
CREATE TABLE user_sessions (
    session_id UUID PRIMARY KEY,
    user_id BIGINT NOT NULL,
    session_data JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL,
    last_accessed TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active BOOLEAN DEFAULT true
);

-- Scheduled cleanup job (requires external cron/scheduler)
-- This query must run regularly and can be resource-intensive
DELETE FROM user_sessions 
WHERE expires_at < CURRENT_TIMESTAMP 
   OR (last_accessed < CURRENT_TIMESTAMP - INTERVAL '30 days' AND is_active = false);

-- Complex log cleanup with multiple conditions
CREATE TABLE application_logs (
    log_id BIGSERIAL PRIMARY KEY,
    application_name VARCHAR(100) NOT NULL,
    log_level VARCHAR(20) NOT NULL,
    message TEXT,
    metadata JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Manual retention policy implementation
    retention_days INTEGER DEFAULT 30,
    should_archive BOOLEAN DEFAULT false
);

-- Multi-stage cleanup process
WITH logs_to_cleanup AS (
    SELECT log_id, application_name, created_at, retention_days
    FROM application_logs
    WHERE 
        -- Different retention periods by log level
        (log_level = 'DEBUG' AND created_at < CURRENT_TIMESTAMP - INTERVAL '7 days')
        OR (log_level = 'INFO' AND created_at < CURRENT_TIMESTAMP - INTERVAL '30 days')
        OR (log_level = 'WARN' AND created_at < CURRENT_TIMESTAMP - INTERVAL '90 days')
        OR (log_level = 'ERROR' AND created_at < CURRENT_TIMESTAMP - INTERVAL '365 days')
        OR (should_archive = false AND created_at < CURRENT_TIMESTAMP - retention_days * INTERVAL '1 day')
),
archival_candidates AS (
    -- Identify logs that should be archived before deletion
    SELECT ltc.log_id, ltc.application_name, ltc.created_at
    FROM logs_to_cleanup ltc
    JOIN application_logs al ON ltc.log_id = al.log_id
    WHERE al.log_level IN ('ERROR', 'CRITICAL') 
       OR al.metadata ? 'trace_id' -- Contains important debugging info
),
archive_process AS (
    -- Archive important logs (complex external process)
    INSERT INTO archived_application_logs 
    SELECT al.* FROM application_logs al
    JOIN archival_candidates ac ON al.log_id = ac.log_id
    RETURNING log_id
)
-- Finally delete the logs
DELETE FROM application_logs
WHERE log_id IN (
    SELECT log_id FROM logs_to_cleanup
    WHERE log_id NOT IN (SELECT log_id FROM archival_candidates)
       OR log_id IN (SELECT log_id FROM archive_process)
);

-- Traditional approach problems:
-- 1. Complex scheduling and orchestration required
-- 2. Resource-intensive batch operations during cleanup
-- 3. Risk of data loss if cleanup jobs fail
-- 4. Manual management of different retention policies
-- 5. No automatic optimization of storage and indexes
-- 6. Difficulty in handling timezone and date calculations
-- 7. Complex error handling and retry logic required
-- 8. Performance impact during large cleanup operations
-- 9. Manual coordination between cleanup and application logic
-- 10. Inconsistent cleanup behavior across different environments

-- Attempting MySQL-style events (limited functionality)
SET GLOBAL event_scheduler = ON;

CREATE EVENT cleanup_expired_sessions
ON SCHEDULE EVERY 1 HOUR
STARTS CURRENT_TIMESTAMP
DO
BEGIN
    DELETE FROM user_sessions 
    WHERE expires_at < NOW() 
    LIMIT 1000; -- Prevent long-running operations
END;

-- MySQL event limitations:
-- - Basic scheduling only
-- - No complex retention logic
-- - Limited error handling
-- - Manual management of batch sizes
-- - No integration with application lifecycle
-- - Poor visibility into cleanup operations

MongoDB TTL collections provide elegant automatic data expiration:

// MongoDB TTL Collections - automatic data lifecycle management
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('data_lifecycle_management');

// Comprehensive MongoDB TTL Data Lifecycle Manager
class MongoDBTTLManager {
  constructor(db, config = {}) {
    this.db = db;
    this.config = {
      defaultTTL: config.defaultTTL || 3600, // 1 hour default
      enableMetrics: config.enableMetrics !== false,
      enableIndexOptimization: config.enableIndexOptimization !== false,
      cleanupLogLevel: config.cleanupLogLevel || 'info',
      ...config
    };

    this.collections = {
      userSessions: db.collection('user_sessions'),
      applicationLogs: db.collection('application_logs'),
      temporaryData: db.collection('temporary_data'),
      eventStream: db.collection('event_stream'),
      apiRequests: db.collection('api_requests'),
      cacheEntries: db.collection('cache_entries'),
      ttlMetrics: db.collection('ttl_metrics')
    };

    this.ttlIndexes = new Map();
    this.expirationStrategies = new Map();
  }

  async initializeTTLCollections() {
    console.log('Initializing TTL collections and indexes...');

    try {
      // User sessions with 24-hour expiration
      await this.setupSessionTTL();

      // Application logs with variable retention based on log level
      await this.setupLogsTTL();

      // Temporary data with flexible expiration
      await this.setupTemporaryDataTTL();

      // Event stream with time-based partitioning
      await this.setupEventStreamTTL();

      // API request tracking with automatic cleanup
      await this.setupAPIRequestsTTL();

      // Cache entries with intelligent expiration
      await this.setupCacheTTL();

      // Metrics collection for monitoring TTL performance
      await this.setupTTLMetrics();

      console.log('All TTL collections initialized successfully');

    } catch (error) {
      console.error('Error initializing TTL collections:', error);
      throw error;
    }
  }

  async setupSessionTTL() {
    console.log('Setting up user session TTL...');

    const sessionCollection = this.collections.userSessions;

    // Create TTL index for automatic session expiration
    await sessionCollection.createIndex(
      { expiresAt: 1 },
      { 
        expireAfterSeconds: 0, // Expire based on document field value
        background: true,
        name: 'session_ttl_index'
      }
    );

    // Secondary TTL index for inactive sessions
    await sessionCollection.createIndex(
      { lastAccessedAt: 1 },
      { 
        expireAfterSeconds: 7 * 24 * 3600, // 7 days for inactive sessions
        background: true,
        name: 'session_inactivity_ttl_index'
      }
    );

    // Compound index for efficient session queries
    await sessionCollection.createIndex(
      { userId: 1, isActive: 1, expiresAt: 1 },
      { background: true }
    );

    this.ttlIndexes.set('userSessions', [
      { field: 'expiresAt', expireAfterSeconds: 0 },
      { field: 'lastAccessedAt', expireAfterSeconds: 7 * 24 * 3600 }
    ]);

    console.log('User session TTL configured');
  }

  async createUserSession(userId, sessionData, customTTL = null) {
    const expirationTime = new Date(Date.now() + ((customTTL || 24 * 3600) * 1000));

    const sessionDocument = {
      sessionId: new ObjectId(),
      userId: userId,
      sessionData: sessionData,
      createdAt: new Date(),
      expiresAt: expirationTime, // TTL field for automatic expiration
      lastAccessedAt: new Date(),
      isActive: true,

      // Session metadata
      userAgent: sessionData.userAgent,
      ipAddress: sessionData.ipAddress,
      deviceType: sessionData.deviceType,

      // Expiration strategy metadata
      ttlStrategy: 'fixed_expiration',
      customTTL: customTTL,
      renewalCount: 0
    };

    const result = await this.collections.userSessions.insertOne(sessionDocument);

    console.log(`Created session ${result.insertedId} for user ${userId}, expires at ${expirationTime}`);
    return result.insertedId;
  }

  async renewUserSession(sessionId, additionalTTL = 3600) {
    const newExpirationTime = new Date(Date.now() + (additionalTTL * 1000));

    const result = await this.collections.userSessions.updateOne(
      { sessionId: new ObjectId(sessionId), isActive: true },
      {
        $set: {
          expiresAt: newExpirationTime,
          lastAccessedAt: new Date()
        },
        $inc: { renewalCount: 1 }
      }
    );

    if (result.modifiedCount > 0) {
      console.log(`Renewed session ${sessionId} until ${newExpirationTime}`);
    }

    return result.modifiedCount > 0;
  }

  async setupLogsTTL() {
    console.log('Setting up application logs TTL with level-based retention...');

    const logsCollection = this.collections.applicationLogs;

    // Create partial TTL indexes for different log levels
    // Debug logs expire quickly
    await logsCollection.createIndex(
      { createdAt: 1 },
      {
        expireAfterSeconds: 7 * 24 * 3600, // 7 days
        partialFilterExpression: { logLevel: 'DEBUG' },
        background: true,
        name: 'debug_logs_ttl'
      }
    );

    // Info logs have moderate retention
    await logsCollection.createIndex(
      { createdAt: 1 },
      {
        expireAfterSeconds: 30 * 24 * 3600, // 30 days
        partialFilterExpression: { logLevel: 'INFO' },
        background: true,
        name: 'info_logs_ttl'
      }
    );

    // Warning logs kept longer
    await logsCollection.createIndex(
      { createdAt: 1 },
      {
        expireAfterSeconds: 90 * 24 * 3600, // 90 days
        partialFilterExpression: { logLevel: 'WARN' },
        background: true,
        name: 'warn_logs_ttl'
      }
    );

    // Error logs kept for a full year
    await logsCollection.createIndex(
      { createdAt: 1 },
      {
        expireAfterSeconds: 365 * 24 * 3600, // 365 days
        partialFilterExpression: { logLevel: { $in: ['ERROR', 'CRITICAL'] } },
        background: true,
        name: 'error_logs_ttl'
      }
    );

    // Compound index for efficient log queries
    await logsCollection.createIndex(
      { applicationName: 1, logLevel: 1, createdAt: -1 },
      { background: true }
    );

    this.expirationStrategies.set('applicationLogs', {
      DEBUG: 7 * 24 * 3600,
      INFO: 30 * 24 * 3600,
      WARN: 90 * 24 * 3600,
      ERROR: 365 * 24 * 3600,
      CRITICAL: 365 * 24 * 3600
    });

    console.log('Application logs TTL configured with level-based retention');
  }

  async createLogEntry(applicationName, logLevel, message, metadata = {}) {
    const logDocument = {
      logId: new ObjectId(),
      applicationName: applicationName,
      logLevel: logLevel.toUpperCase(),
      message: message,
      metadata: metadata,
      createdAt: new Date(), // TTL field used by level-specific indexes

      // Additional context
      hostname: metadata.hostname || 'unknown',
      processId: metadata.processId,
      threadId: metadata.threadId,
      traceId: metadata.traceId,

      // Automatic expiration via TTL indexes
      // No manual expiration field needed - handled by partial TTL indexes
    };

    const result = await this.collections.applicationLogs.insertOne(logDocument);

    // Log retention info based on level
    const retentionSeconds = this.expirationStrategies.get('applicationLogs')[logLevel.toUpperCase()];
    const expirationDate = new Date(Date.now() + (retentionSeconds * 1000));

    if (this.config.cleanupLogLevel === 'debug') {
      console.log(`Created ${logLevel} log entry ${result.insertedId}, will expire around ${expirationDate}`);
    }

    return result.insertedId;
  }

  async setupTemporaryDataTTL() {
    console.log('Setting up temporary data TTL with flexible expiration...');

    const tempCollection = this.collections.temporaryData;

    // Primary TTL index using document field
    await tempCollection.createIndex(
      { expiresAt: 1 },
      {
        expireAfterSeconds: 0, // Use document field value
        background: true,
        name: 'temp_data_ttl'
      }
    );

    // Backup TTL index with default expiration
    await tempCollection.createIndex(
      { createdAt: 1 },
      {
        expireAfterSeconds: 24 * 3600, // 24 hours default
        partialFilterExpression: { expiresAt: { $exists: false } },
        background: true,
        name: 'temp_data_default_ttl'
      }
    );

    // Index for data type queries
    await tempCollection.createIndex(
      { dataType: 1, createdAt: -1 },
      { background: true }
    );

    console.log('Temporary data TTL configured');
  }

  async storeTemporaryData(dataType, data, ttlSeconds = 3600) {
    const expirationTime = new Date(Date.now() + (ttlSeconds * 1000));

    const tempDocument = {
      tempId: new ObjectId(),
      dataType: dataType,
      data: data,
      createdAt: new Date(),
      expiresAt: expirationTime, // TTL field

      // Metadata
      sizeBytes: JSON.stringify(data).length,
      compressionType: data.compressionType || 'none',
      accessCount: 0,

      // TTL configuration
      ttlSeconds: ttlSeconds,
      autoExpire: true
    };

    const result = await this.collections.temporaryData.insertOne(tempDocument);

    console.log(`Stored temporary ${dataType} data ${result.insertedId}, expires at ${expirationTime}`);
    return result.insertedId;
  }

  async setupEventStreamTTL() {
    console.log('Setting up event stream TTL with sliding window retention...');

    const eventCollection = this.collections.eventStream;

    // TTL index for event stream with 30-day retention
    await eventCollection.createIndex(
      { timestamp: 1 },
      {
        expireAfterSeconds: 30 * 24 * 3600, // 30 days
        background: true,
        name: 'event_stream_ttl'
      }
    );

    // Compound index for event queries
    await eventCollection.createIndex(
      { eventType: 1, timestamp: -1 },
      { background: true }
    );

    // Index for user-specific events
    await eventCollection.createIndex(
      { userId: 1, timestamp: -1 },
      { background: true }
    );

    console.log('Event stream TTL configured');
  }

  async createEvent(eventType, userId, eventData) {
    const eventDocument = {
      eventId: new ObjectId(),
      eventType: eventType,
      userId: userId,
      eventData: eventData,
      timestamp: new Date(), // TTL field

      // Event metadata
      source: eventData.source || 'application',
      sessionId: eventData.sessionId,
      correlationId: eventData.correlationId,

      // Automatic expiration after 30 days via TTL index
    };

    const result = await this.collections.eventStream.insertOne(eventDocument);
    return result.insertedId;
  }

  async setupAPIRequestsTTL() {
    console.log('Setting up API requests TTL for monitoring and analytics...');

    const apiCollection = this.collections.apiRequests;

    // TTL index with 7-day retention for API requests
    await apiCollection.createIndex(
      { requestTime: 1 },
      {
        expireAfterSeconds: 7 * 24 * 3600, // 7 days
        background: true,
        name: 'api_requests_ttl'
      }
    );

    // Compound indexes for API analytics
    await apiCollection.createIndex(
      { endpoint: 1, requestTime: -1 },
      { background: true }
    );

    await apiCollection.createIndex(
      { statusCode: 1, requestTime: -1 },
      { background: true }
    );

    console.log('API requests TTL configured');
  }

  async logAPIRequest(endpoint, method, statusCode, responseTime, metadata = {}) {
    const requestDocument = {
      requestId: new ObjectId(),
      endpoint: endpoint,
      method: method.toUpperCase(),
      statusCode: statusCode,
      responseTime: responseTime,
      requestTime: new Date(), // TTL field

      // Request details
      userAgent: metadata.userAgent,
      ipAddress: metadata.ipAddress,
      userId: metadata.userId,
      sessionId: metadata.sessionId,

      // Performance metrics
      requestSize: metadata.requestSize || 0,
      responseSize: metadata.responseSize || 0,

      // Automatic expiration after 7 days
    };

    const result = await this.collections.apiRequests.insertOne(requestDocument);
    return result.insertedId;
  }

  async setupCacheTTL() {
    console.log('Setting up cache entries TTL with intelligent expiration...');

    const cacheCollection = this.collections.cacheEntries;

    // Primary TTL index using document field for custom expiration
    await cacheCollection.createIndex(
      { expiresAt: 1 },
      {
        expireAfterSeconds: 0, // Use document field
        background: true,
        name: 'cache_ttl'
      }
    );

    // Backup TTL for entries without explicit expiration
    await cacheCollection.createIndex(
      { lastAccessedAt: 1 },
      {
        expireAfterSeconds: 3600, // 1 hour default
        background: true,
        name: 'cache_access_ttl'
      }
    );

    // Index for cache key lookups
    await cacheCollection.createIndex(
      { cacheKey: 1 },
      { unique: true, background: true }
    );

    console.log('Cache TTL configured');
  }

  async setCacheEntry(cacheKey, value, ttlSeconds = 300) {
    const expirationTime = new Date(Date.now() + (ttlSeconds * 1000));

    const cacheDocument = {
      cacheKey: cacheKey,
      value: value,
      createdAt: new Date(),
      lastAccessedAt: new Date(),
      expiresAt: expirationTime, // TTL field

      // Cache metadata
      accessCount: 0,
      ttlSeconds: ttlSeconds,
      valueType: typeof value,
      sizeBytes: JSON.stringify(value).length,

      // Hit ratio tracking
      hitCount: 0,
      missCount: 0
    };

    // createdAt is applied only on first insert via $setOnInsert; including it
    // in $set as well would cause an update conflict on upsert
    delete cacheDocument.createdAt;

    const result = await this.collections.cacheEntries.updateOne(
      { cacheKey: cacheKey },
      {
        $set: cacheDocument,
        $setOnInsert: { createdAt: new Date() }
      },
      { upsert: true }
    );

    return result.upsertedId || result.modifiedCount > 0;
  }

  async getCacheEntry(cacheKey) {
    const result = await this.collections.cacheEntries.findOneAndUpdate(
      { cacheKey: cacheKey },
      {
        $set: { lastAccessedAt: new Date() },
        $inc: { accessCount: 1, hitCount: 1 }
      },
      { returnDocument: 'after' }
    );

    return result.value?.value || null;
  }

  async setupTTLMetrics() {
    console.log('Setting up TTL metrics collection...');

    const metricsCollection = this.collections.ttlMetrics;

    // TTL index for metrics with 90-day retention
    await metricsCollection.createIndex(
      { timestamp: 1 },
      {
        expireAfterSeconds: 90 * 24 * 3600, // 90 days
        background: true,
        name: 'metrics_ttl'
      }
    );

    // Index for metrics queries
    await metricsCollection.createIndex(
      { collectionName: 1, timestamp: -1 },
      { background: true }
    );

    console.log('TTL metrics collection configured');
  }

  async collectTTLMetrics() {
    console.log('Collecting TTL performance metrics...');

    try {
      const metrics = {
        timestamp: new Date(),
        collections: {}
      };

      // Collect metrics for each TTL collection
      for (const [collectionName, collection] of Object.entries(this.collections)) {
        if (collectionName === 'ttlMetrics') continue;

        const collectionStats = await collection.stats();
        const indexStats = await this.getTTLIndexStats(collection);

        metrics.collections[collectionName] = {
          documentCount: collectionStats.count,
          storageSize: collectionStats.storageSize,
          avgObjSize: collectionStats.avgObjSize,
          totalIndexSize: collectionStats.totalIndexSize,
          ttlIndexes: indexStats,

          // Calculate expiration rates
          estimatedExpirationRate: await this.estimateExpirationRate(collection)
        };
      }

      // Store metrics
      await this.collections.ttlMetrics.insertOne(metrics);

      if (this.config.enableMetrics) {
        console.log('TTL Metrics:', {
          totalCollections: Object.keys(metrics.collections).length,
          totalDocuments: Object.values(metrics.collections).reduce((sum, c) => sum + c.documentCount, 0),
          totalStorageSize: Object.values(metrics.collections).reduce((sum, c) => sum + c.storageSize, 0)
        });
      }

      return metrics;

    } catch (error) {
      console.error('Error collecting TTL metrics:', error);
      throw error;
    }
  }

  async getTTLIndexStats(collection) {
    const indexes = await collection.listIndexes().toArray();
    const ttlIndexes = indexes.filter(index => index.expireAfterSeconds !== undefined);

    return ttlIndexes.map(index => ({
      name: index.name,
      key: index.key,
      expireAfterSeconds: index.expireAfterSeconds,
      partialFilterExpression: index.partialFilterExpression
    }));
  }

  async estimateExpirationRate(collection) {
    // Simple estimation based on documents created vs documents existing
    const now = new Date();
    const oneDayAgo = new Date(now.getTime() - (24 * 60 * 60 * 1000));

    const recentDocuments = await collection.countDocuments({
      createdAt: { $gte: oneDayAgo }
    });

    const totalDocuments = await collection.countDocuments();

    return totalDocuments > 0 ? (recentDocuments / totalDocuments) : 0;
  }

  async optimizeTTLIndexes() {
    console.log('Optimizing TTL indexes for better performance...');

    try {
      for (const [collectionName, collection] of Object.entries(this.collections)) {
        if (collectionName === 'ttlMetrics') continue;

        // Analyze index usage
        const indexStats = await collection.aggregate([
          { $indexStats: {} }
        ]).toArray();

        // Identify underutilized TTL indexes
        for (const indexStat of indexStats) {
          if (indexStat.key && indexStat.key.expiresAt) {
            const usage = indexStat.accesses;
            console.log(`TTL index ${indexStat.name} usage:`, usage);

            // Suggest optimizations based on usage patterns
            if (usage.ops < 100 && usage.since) {
              console.log(`Consider reviewing TTL index ${indexStat.name} - low usage detected`);
            }
          }
        }
      }

    } catch (error) {
      console.error('Error optimizing TTL indexes:', error);
    }
  }

  async getTTLStatus() {
    const status = {
      collectionsWithTTL: 0,
      totalTTLIndexes: 0,
      activeExpirations: {},
      systemHealth: 'healthy'
    };

    for (const [collectionName, collection] of Object.entries(this.collections)) {
      if (collectionName === 'ttlMetrics') continue;

      const indexes = await collection.listIndexes().toArray();
      const ttlIndexes = indexes.filter(index => index.expireAfterSeconds !== undefined);

      if (ttlIndexes.length > 0) {
        status.collectionsWithTTL++;
        status.totalTTLIndexes += ttlIndexes.length;

        // Estimate documents that will expire soon
        const soonToExpire = await this.estimateSoonToExpire(collection, ttlIndexes);
        status.activeExpirations[collectionName] = soonToExpire;
      }
    }

    return status;
  }

  async estimateSoonToExpire(collection, ttlIndexes) {
    let totalSoonToExpire = 0;

    for (const index of ttlIndexes) {
      if (index.expireAfterSeconds === 0) {
        // Documents expire based on field value
        const fieldName = Object.keys(index.key)[0];
        const nextHour = new Date(Date.now() + (60 * 60 * 1000));

        const count = await collection.countDocuments({
          [fieldName]: { $lt: nextHour }
        });

        totalSoonToExpire += count;
      } else {
        // Documents expire based on index TTL
        const fieldName = Object.keys(index.key)[0];
        const cutoffTime = new Date(Date.now() - (index.expireAfterSeconds * 1000) + (60 * 60 * 1000));

        const count = await collection.countDocuments({
          [fieldName]: { $lt: cutoffTime }
        });

        totalSoonToExpire += count;
      }
    }

    return totalSoonToExpire;
  }

  async shutdown() {
    console.log('Shutting down TTL Manager...');

    // Final metrics collection
    if (this.config.enableMetrics) {
      await this.collectTTLMetrics();
    }

    // Display final status
    const status = await this.getTTLStatus();
    console.log('Final TTL Status:', status);

    console.log('TTL Manager shutdown complete');
  }
}

// Benefits of MongoDB TTL Collections:
// - Automatic data expiration without manual intervention
// - Multiple TTL strategies (fixed time, document field, partial indexes)
// - Built-in optimization and storage reclamation
// - Integration with MongoDB's index and query optimization
// - Flexible retention policies based on data characteristics
// - No external job scheduling required
// - Consistent behavior across replica sets and sharded clusters
// - Real-time metrics and monitoring capabilities
// - SQL-compatible TTL operations through QueryLeaf integration

module.exports = {
  MongoDBTTLManager
};
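
For completeness, a brief usage sketch for the manager above; the connection string and module path are assumptions:

// Usage sketch for MongoDBTTLManager (module path is hypothetical)
const { MongoClient } = require('mongodb');
const { MongoDBTTLManager } = require('./mongodb-ttl-manager');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const ttlManager = new MongoDBTTLManager(client.db('data_lifecycle_management'), {
    enableMetrics: true
  });

  await ttlManager.initializeTTLCollections();

  // A session that expires in two hours and a log entry with level-based retention
  await ttlManager.createUserSession('user123', { deviceType: 'web' }, 2 * 3600);
  await ttlManager.createLogEntry('web-server', 'INFO', 'Request processed', { hostname: 'web-01' });

  await ttlManager.collectTTLMetrics();
  await ttlManager.shutdown();
  await client.close();
}

main().catch(console.error);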

Understanding MongoDB TTL Architecture

Advanced TTL Patterns and Configuration Strategies

Implement sophisticated TTL patterns for different data lifecycle requirements:

// Advanced TTL patterns for production MongoDB deployments
class AdvancedTTLStrategies extends MongoDBTTLManager {
  constructor(db, advancedConfig) {
    super(db, advancedConfig);

    this.advancedConfig = {
      ...advancedConfig,
      enableTimezoneSupport: true,
      enableConditionalExpiration: true,
      enableGradualExpiration: true,
      enableExpirationNotifications: true,
      enableComplianceMode: true
    };
  }

  async setupConditionalTTL() {
    // TTL that expires documents based on multiple conditions
    console.log('Setting up conditional TTL with complex business logic...');

    const conditionalTTLCollection = this.db.collection('conditional_expiration');

    // Different TTL for different user tiers
    await conditionalTTLCollection.createIndex(
      { lastActivityAt: 1 },
      {
        expireAfterSeconds: 30 * 24 * 3600, // 30 days for free tier
        partialFilterExpression: { 
          userTier: 'free',
          isPremium: false 
        },
        background: true,
        name: 'free_user_data_ttl'
      }
    );

    await conditionalTTLCollection.createIndex(
      { lastActivityAt: 1 },
      {
        expireAfterSeconds: 365 * 24 * 3600, // 1 year for premium users
        partialFilterExpression: { 
          userTier: 'premium',
          isPremium: true 
        },
        background: true,
        name: 'premium_user_data_ttl'
      }
    );

    // Business-critical data is retained for seven years before compliance-driven expiration
    await conditionalTTLCollection.createIndex(
      { reviewDate: 1 },
      {
        expireAfterSeconds: 7 * 365 * 24 * 3600, // 7 years for compliance
        partialFilterExpression: { 
          dataClassification: 'business_critical',
          complianceRetentionRequired: true
        },
        background: true,
        name: 'compliance_data_ttl'
      }
    );
  }

  async setupGradualExpiration() {
    // Implement gradual expiration to reduce system load
    console.log('Setting up gradual expiration strategy...');

    const gradualCollection = this.db.collection('gradual_expiration');

    // Stagger expiration across time buckets
    const timeBuckets = [
      { hour: 2, expireSeconds: 7 * 24 * 3600 },   // 2 AM
      { hour: 14, expireSeconds: 14 * 24 * 3600 }, // 2 PM
      { hour: 20, expireSeconds: 21 * 24 * 3600 }  // 8 PM
    ];

    for (const bucket of timeBuckets) {
      await gradualCollection.createIndex(
        { createdAt: 1 },
        {
          expireAfterSeconds: bucket.expireSeconds,
          partialFilterExpression: {
            expirationBucket: bucket.hour
          },
          background: true,
          name: `gradual_ttl_${bucket.hour}h`
        }
      );
    }
  }

  async createDocumentWithGradualExpiration(data) {
    // Assign an expiration bucket: use the document's hash when available,
    // otherwise pick a random bucket to spread deletions across the day
    const buckets = [2, 14, 20];
    const bucketIndex = Number.isInteger(data.hashCode)
      ? Math.abs(data.hashCode) % buckets.length
      : Math.floor(Math.random() * buckets.length);
    const selectedBucket = buckets[bucketIndex];

    const document = {
      ...data,
      createdAt: new Date(),
      expirationBucket: selectedBucket,

      // Add jitter to prevent thundering herd
      expirationJitter: Math.floor(Math.random() * 3600) // 0-1 hour jitter
    };

    return await this.db.collection('gradual_expiration').insertOne(document);
  }

  async setupTimezoneTTL() {
    // TTL that respects business hours and timezones
    console.log('Setting up timezone-aware TTL...');

    const timezoneCollection = this.db.collection('timezone_expiration');

    // Create TTL based on business date rather than UTC
    await timezoneCollection.createIndex(
      { businessDateExpiry: 1 },
      {
        expireAfterSeconds: 0, // Use document field
        background: true,
        name: 'business_timezone_ttl'
      }
    );
  }

  async createBusinessHoursTTLDocument(data, businessTimezone = 'America/New_York', retentionDays = 30) {
    const moment = require('moment-timezone');

    // Calculate expiration at end of business day in specified timezone
    const businessExpiry = moment()
      .tz(businessTimezone)
      .add(retentionDays, 'days')
      .endOf('day') // Expire at end of business day
      .toDate();

    const document = {
      ...data,
      createdAt: new Date(),
      businessDateExpiry: businessExpiry,
      timezone: businessTimezone,
      retentionPolicy: 'business_hours_aligned'
    };

    return await this.db.collection('timezone_expiration').insertOne(document);
  }

  async setupComplianceTTL() {
    // TTL with compliance and audit requirements
    console.log('Setting up compliance-aware TTL...');

    const complianceCollection = this.db.collection('compliance_data');

    // Legal hold prevents automatic expiration
    await complianceCollection.createIndex(
      { scheduledDestructionDate: 1 },
      {
        expireAfterSeconds: 0,
        partialFilterExpression: {
          legalHold: false,
          complianceStatus: 'approved_for_destruction'
        },
        background: true,
        name: 'compliance_ttl'
      }
    );

    // Audit trail for expired documents
    await complianceCollection.createIndex(
      { auditExpirationDate: 1 },
      {
        expireAfterSeconds: 10 * 365 * 24 * 3600, // 10 years for audit trail
        background: true,
        name: 'audit_trail_ttl'
      }
    );
  }

  async createComplianceDocument(data, retentionYears = 7) {
    const scheduledDestruction = new Date();
    scheduledDestruction.setFullYear(scheduledDestruction.getFullYear() + retentionYears);

    const document = {
      ...data,
      createdAt: new Date(),
      retentionPeriodYears: retentionYears,
      scheduledDestructionDate: scheduledDestruction,

      // Compliance metadata
      legalHold: false,
      complianceStatus: 'under_retention',
      dataClassification: data.dataClassification || 'standard',

      // Audit requirements
      auditExpirationDate: new Date(scheduledDestruction.getTime() + (3 * 365 * 24 * 60 * 60 * 1000)) // +3 years
    };

    return await this.db.collection('compliance_data').insertOne(document);
  }

  async implementExpirationNotifications() {
    // Set up change streams to monitor expiring documents
    console.log('Setting up expiration notifications...');

    const expirationNotifier = this.db.collection('expiration_notifications');

    // Monitor documents that will expire soon
    setInterval(async () => {
      await this.checkUpcomingExpirations();
    }, 60 * 60 * 1000); // Check every hour
  }

  async checkUpcomingExpirations() {
    const collections = [
      'user_sessions', 
      'application_logs', 
      'temporary_data',
      'compliance_data'
    ];

    for (const collectionName of collections) {
      const collection = this.db.collection(collectionName);

      // Find documents expiring in the next 24 hours
      const tomorrow = new Date(Date.now() + (24 * 60 * 60 * 1000));

      const soonToExpire = await collection.find({
        $or: [
          { expiresAt: { $lt: tomorrow, $gte: new Date() } },
          { businessDateExpiry: { $lt: tomorrow, $gte: new Date() } },
          { scheduledDestructionDate: { $lt: tomorrow, $gte: new Date() } }
        ]
      }).toArray();

      if (soonToExpire.length > 0) {
        console.log(`${collectionName}: ${soonToExpire.length} documents expiring within 24 hours`);

        // Send notifications or trigger workflows
        await this.sendExpirationNotifications(collectionName, soonToExpire);
      }
    }
  }

  async sendExpirationNotifications(collectionName, documents) {
    // Implementation would integrate with notification systems
    const notification = {
      timestamp: new Date(),
      collection: collectionName,
      documentsCount: documents.length,
      urgency: 'medium',
      action: 'documents_expiring_soon'
    };

    console.log('Expiration notification:', notification);

    // Store notification for processing
    await this.db.collection('expiration_notifications').insertOne(notification);
  }
}
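
The notification helper above polls on a timer; as an alternative sketch, a change stream can observe the delete events that TTL removals produce. This requires MongoDB running as a replica set, and the pipeline shown is an assumption rather than part of the class above:

// Change-stream sketch: react to documents as the TTL monitor deletes them.
// Delete events carry only the _id of the removed document unless pre-images
// are enabled on the collection (MongoDB 6.0+).
const { MongoClient } = require('mongodb');

async function watchTTLDeletes() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const sessions = client.db('data_lifecycle_management').collection('user_sessions');

  const changeStream = sessions.watch([
    { $match: { operationType: 'delete' } }
  ]);

  for await (const change of changeStream) {
    // TTL-driven removals and application-issued deletes both appear here
    console.log('document removed:', change.documentKey._id);
  }
}

watchTTLDeletes().catch(console.error);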

SQL-Style TTL Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB TTL operations:

-- QueryLeaf TTL operations with SQL-familiar syntax

-- Create TTL-enabled collections with automatic expiration
CREATE TABLE user_sessions (
  session_id UUID PRIMARY KEY,
  user_id VARCHAR(50) NOT NULL,
  session_data DOCUMENT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at TIMESTAMP NOT NULL,
  last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  is_active BOOLEAN DEFAULT true
)
WITH TTL (
  -- Multiple TTL strategies
  expires_at EXPIRE_AFTER 0,  -- Use document field value
  last_accessed_at EXPIRE_AFTER '7 days' -- Inactive session cleanup
);

-- Create application logs with level-based retention
CREATE TABLE application_logs (
  log_id UUID PRIMARY KEY,
  application_name VARCHAR(100) NOT NULL,
  log_level VARCHAR(20) NOT NULL,
  message TEXT,
  metadata DOCUMENT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
WITH TTL (
  -- Different retention by log level using partial indexes
  created_at EXPIRE_AFTER '7 days' WHERE log_level = 'DEBUG',
  created_at EXPIRE_AFTER '30 days' WHERE log_level = 'INFO',
  created_at EXPIRE_AFTER '90 days' WHERE log_level = 'WARN',
  created_at EXPIRE_AFTER '365 days' WHERE log_level IN ('ERROR', 'CRITICAL')
);

-- Temporary data with flexible TTL
CREATE TABLE temporary_data (
  temp_id UUID PRIMARY KEY,
  data_type VARCHAR(100),
  data DOCUMENT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at TIMESTAMP,
  ttl_seconds INTEGER DEFAULT 3600
)
WITH TTL (
  expires_at EXPIRE_AFTER 0,  -- Use document field
  created_at EXPIRE_AFTER '24 hours' WHERE expires_at IS NULL  -- Default fallback
);

-- Insert session with custom TTL
INSERT INTO user_sessions (user_id, session_data, expires_at, is_active)
VALUES 
  ('user123', '{"preferences": {"theme": "dark"}}', CURRENT_TIMESTAMP + INTERVAL '2 hours', true),
  ('user456', '{"preferences": {"lang": "en"}}', CURRENT_TIMESTAMP + INTERVAL '1 day', true);

-- Insert log entries (automatic TTL based on level)
INSERT INTO application_logs (application_name, log_level, message, metadata)
VALUES 
  ('web-server', 'DEBUG', 'Request processed', '{"endpoint": "/api/users", "duration": 45}'),
  ('web-server', 'ERROR', 'Database connection failed', '{"error": "timeout", "retry_count": 3}'),
  ('payment-service', 'INFO', 'Payment processed', '{"amount": 99.99, "currency": "USD"}');

-- Query active sessions with TTL information
SELECT 
  session_id,
  user_id,
  created_at,
  expires_at,

  -- Calculate remaining TTL
  EXTRACT(EPOCH FROM (expires_at - CURRENT_TIMESTAMP)) as seconds_until_expiry,

  -- Expiration status
  CASE 
    WHEN expires_at <= CURRENT_TIMESTAMP THEN 'expired'
    WHEN expires_at <= CURRENT_TIMESTAMP + INTERVAL '1 hour' THEN 'expiring_soon'
    ELSE 'active'
  END as expiration_status,

  -- Session age
  EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - created_at)) as session_age_seconds

FROM user_sessions
WHERE is_active = true
ORDER BY expires_at ASC;

-- Extend session TTL (renew expiration)
UPDATE user_sessions 
SET 
  expires_at = CURRENT_TIMESTAMP + INTERVAL '2 hours',
  last_accessed_at = CURRENT_TIMESTAMP
WHERE session_id = 'session-uuid-here'
  AND is_active = true
  AND expires_at > CURRENT_TIMESTAMP;

-- Store temporary data with custom expiration
INSERT INTO temporary_data (data_type, data, expires_at, ttl_seconds)
VALUES 
  ('cache_entry', '{"result": [1,2,3], "computed_at": "2025-11-01T10:00:00Z"}', CURRENT_TIMESTAMP + INTERVAL '5 minutes', 300),
  ('user_upload', '{"filename": "document.pdf", "size": 1024000}', CURRENT_TIMESTAMP + INTERVAL '24 hours', 86400),
  ('temp_report', '{"report_data": {...}, "generated_for": "user123"}', CURRENT_TIMESTAMP + INTERVAL '1 hour', 3600);

-- Advanced TTL queries with business logic
WITH session_analytics AS (
  SELECT 
    user_id,
    COUNT(*) as total_sessions,
    AVG(EXTRACT(EPOCH FROM (expires_at - created_at))) as avg_session_duration,
    MAX(last_accessed_at) as last_activity,

    -- TTL health metrics
    COUNT(*) FILTER (WHERE expires_at <= CURRENT_TIMESTAMP) as expired_sessions,
    COUNT(*) FILTER (WHERE expires_at <= CURRENT_TIMESTAMP + INTERVAL '1 hour') as soon_to_expire,
    COUNT(*) FILTER (WHERE last_accessed_at < CURRENT_TIMESTAMP - INTERVAL '1 day') as inactive_sessions

  FROM user_sessions
  WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days'
  GROUP BY user_id
),
user_engagement AS (
  SELECT 
    sa.*,

    -- Engagement scoring
    CASE 
      WHEN avg_session_duration > 7200 AND inactive_sessions = 0 THEN 'highly_engaged'
      WHEN avg_session_duration > 1800 AND inactive_sessions < 2 THEN 'engaged'
      WHEN inactive_sessions > total_sessions * 0.5 THEN 'low_engagement'
      ELSE 'moderate_engagement'
    END as engagement_level,

    -- TTL optimization recommendations
    CASE 
      WHEN inactive_sessions > 5 THEN 'reduce_session_ttl'
      WHEN expired_sessions = 0 AND soon_to_expire = 0 THEN 'extend_session_ttl'
      ELSE 'current_ttl_optimal'
    END as ttl_recommendation

  FROM session_analytics sa
)
SELECT 
  user_id,
  total_sessions,
  ROUND(avg_session_duration / 60, 2) as avg_session_minutes,
  last_activity,
  engagement_level,
  ttl_recommendation,

  -- Session health indicators
  ROUND((total_sessions - expired_sessions)::numeric / total_sessions * 100, 1) as session_health_pct,

  -- TTL efficiency metrics
  expired_sessions,
  soon_to_expire,
  inactive_sessions

FROM user_engagement
WHERE total_sessions > 0
ORDER BY 
  CASE engagement_level 
    WHEN 'highly_engaged' THEN 1
    WHEN 'engaged' THEN 2
    WHEN 'moderate_engagement' THEN 3
    ELSE 4
  END,
  total_sessions DESC;

-- Log retention analysis with TTL monitoring
WITH log_retention_analysis AS (
  SELECT 
    application_name,
    log_level,
    DATE_TRUNC('day', created_at) as log_date,
    COUNT(*) as daily_log_count,
    AVG(LENGTH(message)) as avg_message_length,

    -- TTL calculation based on level-specific retention
    CASE log_level
      WHEN 'DEBUG' THEN created_at + INTERVAL '7 days'
      WHEN 'INFO' THEN created_at + INTERVAL '30 days'
      WHEN 'WARN' THEN created_at + INTERVAL '90 days'
      WHEN 'ERROR' THEN created_at + INTERVAL '365 days'
      WHEN 'CRITICAL' THEN created_at + INTERVAL '365 days'
      ELSE created_at + INTERVAL '30 days'
    END as estimated_expiry,

    -- Storage impact analysis
    SUM(LENGTH(message) + COALESCE(LENGTH(metadata::TEXT), 0)) as daily_storage_bytes

  FROM application_logs
  WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days'
  GROUP BY application_name, log_level, DATE_TRUNC('day', created_at)
),
storage_projections AS (
  SELECT 
    application_name,
    log_level,

    -- Current metrics
    SUM(daily_log_count) as total_logs,
    AVG(daily_log_count) as avg_daily_logs,
    SUM(daily_storage_bytes) as total_storage_bytes,
    AVG(daily_storage_bytes) as avg_daily_storage,

    -- TTL impact
    MIN(estimated_expiry) as earliest_expiry,
    MAX(estimated_expiry) as latest_expiry,

    -- Storage efficiency
    CASE log_level
      WHEN 'DEBUG' THEN SUM(daily_storage_bytes) * 7 / 30 -- 7-day retention
      WHEN 'INFO' THEN SUM(daily_storage_bytes) -- 30-day retention
      WHEN 'WARN' THEN SUM(daily_storage_bytes) * 3 -- 90-day retention
      ELSE SUM(daily_storage_bytes) * 12 -- 365-day retention
    END as projected_steady_state_storage

  FROM log_retention_analysis
  GROUP BY application_name, log_level
)
SELECT 
  application_name,
  log_level,
  total_logs,
  avg_daily_logs,

  -- Storage analysis
  ROUND(total_storage_bytes / 1024.0 / 1024.0, 2) as storage_mb,
  ROUND(avg_daily_storage / 1024.0 / 1024.0, 2) as avg_daily_mb,
  ROUND(projected_steady_state_storage / 1024.0 / 1024.0, 2) as steady_state_mb,

  -- TTL effectiveness
  earliest_expiry,
  latest_expiry,
  EXTRACT(DAY FROM (latest_expiry - earliest_expiry)) as retention_range_days,

  -- Storage optimization
  ROUND((total_storage_bytes - projected_steady_state_storage) / 1024.0 / 1024.0, 2) as storage_savings_mb,
  ROUND(((total_storage_bytes - projected_steady_state_storage) / total_storage_bytes * 100), 1) as storage_reduction_pct,

  -- Recommendations
  CASE 
    WHEN log_level = 'DEBUG' AND avg_daily_logs > 10000 THEN 'Consider shorter DEBUG retention or sampling'
    WHEN projected_steady_state_storage > total_storage_bytes * 2 THEN 'TTL may be too long for this log volume'
    WHEN projected_steady_state_storage < total_storage_bytes * 0.1 THEN 'TTL may be too aggressive'
    ELSE 'TTL appears well-configured'
  END as ttl_recommendation

FROM storage_projections
WHERE total_logs > 0
ORDER BY application_name, 
  CASE log_level 
    WHEN 'CRITICAL' THEN 1
    WHEN 'ERROR' THEN 2
    WHEN 'WARN' THEN 3
    WHEN 'INFO' THEN 4
    WHEN 'DEBUG' THEN 5
  END;

-- TTL index health monitoring
WITH ttl_index_health AS (
  SELECT 
    'user_sessions' as collection_name,
    'session_ttl' as index_name,
    'expires_at' as ttl_field,
    0 as expire_after_seconds,

    -- Health metrics
    COUNT(*) as total_documents,
    COUNT(*) FILTER (WHERE expires_at <= CURRENT_TIMESTAMP) as expired_documents,
    COUNT(*) FILTER (WHERE expires_at <= CURRENT_TIMESTAMP + INTERVAL '1 hour') as expiring_soon,

    -- Performance metrics
    AVG(EXTRACT(EPOCH FROM (expires_at - created_at))) as avg_document_lifetime,
    MIN(expires_at) as earliest_expiry,
    MAX(expires_at) as latest_expiry

  FROM user_sessions

  UNION ALL

  SELECT 
    'application_logs' as collection_name,
    'logs_level_ttl' as index_name,
    'created_at' as ttl_field,
    CASE log_level
      WHEN 'DEBUG' THEN 7 * 24 * 3600
      WHEN 'INFO' THEN 30 * 24 * 3600
      WHEN 'WARN' THEN 90 * 24 * 3600
      ELSE 365 * 24 * 3600
    END as expire_after_seconds,

    COUNT(*) as total_documents,
    COUNT(*) FILTER (WHERE 
      created_at <= CURRENT_TIMESTAMP - 
      CASE log_level
        WHEN 'DEBUG' THEN INTERVAL '7 days'
        WHEN 'INFO' THEN INTERVAL '30 days'
        WHEN 'WARN' THEN INTERVAL '90 days'
        ELSE INTERVAL '365 days'
      END
    ) as expired_documents,
    COUNT(*) FILTER (WHERE 
      created_at <= CURRENT_TIMESTAMP + INTERVAL '1 day' - 
      CASE log_level
        WHEN 'DEBUG' THEN INTERVAL '7 days'
        WHEN 'INFO' THEN INTERVAL '30 days'
        WHEN 'WARN' THEN INTERVAL '90 days'
        ELSE INTERVAL '365 days'
      END
    ) as expiring_soon,

    AVG(EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - created_at))) as avg_document_lifetime,
    MIN(created_at + CASE log_level
      WHEN 'DEBUG' THEN INTERVAL '7 days'
      WHEN 'INFO' THEN INTERVAL '30 days'
      WHEN 'WARN' THEN INTERVAL '90 days'
      ELSE INTERVAL '365 days'
    END) as earliest_expiry,
    MAX(created_at + CASE log_level
      WHEN 'DEBUG' THEN INTERVAL '7 days'
      WHEN 'INFO' THEN INTERVAL '30 days'
      WHEN 'WARN' THEN INTERVAL '90 days'
      ELSE INTERVAL '365 days'
    END) as latest_expiry

  FROM application_logs
  GROUP BY log_level
)
SELECT 
  collection_name,
  index_name,
  ttl_field,
  expire_after_seconds,
  total_documents,
  expired_documents,
  expiring_soon,

  -- TTL efficiency metrics
  ROUND(avg_document_lifetime / 3600, 2) as avg_lifetime_hours,
  CASE 
    WHEN total_documents > 0 
    THEN ROUND((expired_documents::numeric / total_documents) * 100, 2)
    ELSE 0
  END as expiration_rate_pct,

  -- TTL health indicators
  CASE 
    WHEN expired_documents > total_documents * 0.9 THEN 'unhealthy_high_expiration'
    WHEN expired_documents = 0 AND total_documents > 1000 THEN 'no_expiration_detected'
    WHEN expiring_soon > total_documents * 0.5 THEN 'high_upcoming_expiration'
    ELSE 'healthy'
  END as ttl_health_status,

  -- Performance impact assessment
  CASE 
    WHEN expired_documents > 10000 THEN 'high_cleanup_load'
    WHEN expiring_soon > 5000 THEN 'moderate_cleanup_load'
    ELSE 'low_cleanup_load'
  END as cleanup_load_assessment

FROM ttl_index_health
ORDER BY collection_name, expire_after_seconds;

-- TTL collection management commands
-- Monitor TTL operations
SHOW TTL STATUS;

-- Optimize TTL indexes
OPTIMIZE TTL INDEXES;

-- Modify TTL expiration times
ALTER TABLE user_sessions 
MODIFY TTL expires_at EXPIRE_AFTER 0,
MODIFY TTL last_accessed_at EXPIRE_AFTER '14 days';

-- Remove TTL from a collection
ALTER TABLE temporary_data DROP TTL created_at;

-- QueryLeaf provides comprehensive TTL capabilities:
-- 1. SQL-familiar TTL creation and management syntax
-- 2. Multiple TTL strategies (field-based, time-based, conditional)
-- 3. Advanced TTL monitoring and health assessment
-- 4. Automatic storage optimization and cleanup
-- 5. Business logic integration with TTL policies
-- 6. Compliance and audit-friendly TTL management
-- 7. Performance monitoring and optimization recommendations
-- 8. Integration with MongoDB's native TTL optimizations
-- 9. Flexible retention policies with partial index support
-- 10. Familiar SQL syntax for complex TTL operations

Best Practices for TTL Implementation

Data Lifecycle Strategy Design

Essential principles for effective TTL implementation:

  1. Business Alignment: Design TTL policies that align with business requirements and compliance needs
  2. Performance Optimization: Consider the impact of TTL operations on database performance
  3. Storage Management: Balance data retention needs with storage costs and performance
  4. Monitoring Strategy: Implement comprehensive monitoring for TTL effectiveness
  5. Gradual Implementation: Roll out TTL policies gradually to assess impact
  6. Backup Considerations: Ensure TTL policies don't conflict with backup and recovery strategies
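
To make the first and third principles concrete, here is a minimal sketch of the two common TTL index styles using the Node.js driver; the connection string, database, collection, and field names are illustrative assumptions rather than part of any existing system:

// Minimal sketch: declaring field-based and duration-based TTL indexes
const { MongoClient } = require('mongodb');

async function createTtlIndexes(uri = 'mongodb://localhost:27017') {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db('ttl_demo');

  // Expire each session exactly at the timestamp stored in expires_at
  await db.collection('user_sessions').createIndex(
    { expires_at: 1 },
    { name: 'session_ttl', expireAfterSeconds: 0 }
  );

  // Expire log documents 30 days after their created_at timestamp
  await db.collection('application_logs').createIndex(
    { created_at: 1 },
    { name: 'logs_ttl_30d', expireAfterSeconds: 30 * 24 * 3600 }
  );

  await client.close();
}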

Advanced TTL Configuration

Optimize TTL for production environments:

  1. Index Strategy: Design TTL indexes to minimize performance impact during cleanup
  2. Batch Operations: Configure TTL to avoid large batch deletions during peak hours
  3. Partial Indexes: Use partial indexes for complex retention policies
  4. Compound TTL: Combine TTL with other indexing strategies for optimal performance
  5. Timezone Handling: Account for business timezone requirements in TTL calculations
  6. Compliance Integration: Ensure TTL policies meet regulatory and audit requirements
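
For item 3 above, a partial TTL index keeps aggressive retention limited to one slice of the data. This hedged sketch assumes a connected db handle (as in the earlier sketch) and that only DEBUG-level log documents should expire after seven days:

// Minimal sketch: TTL combined with a partial filter so only DEBUG entries expire
async function createDebugLogTtl(db) {
  return db.collection('application_logs').createIndex(
    { created_at: 1 },
    {
      name: 'debug_logs_ttl_7d',
      expireAfterSeconds: 7 * 24 * 3600,
      partialFilterExpression: { log_level: 'DEBUG' }
    }
  );
}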

Conclusion

MongoDB TTL collections eliminate the complexity of manual data lifecycle management by providing native, automatic data expiration capabilities. The ability to configure flexible retention policies, monitor TTL effectiveness, and integrate with business logic makes TTL collections essential for modern data management strategies.

Key TTL benefits include:

  • Automatic Data Management: Hands-off data expiration without manual intervention
  • Flexible Retention Policies: Multiple TTL strategies for different data types and business requirements
  • Storage Optimization: Automatic cleanup reduces storage costs and improves performance
  • Compliance Support: Built-in capabilities for audit trails and regulatory compliance
  • Performance Benefits: Optimized cleanup operations with minimal impact on application performance
  • SQL Accessibility: Familiar SQL-style TTL operations through QueryLeaf integration

Whether you're managing user sessions, application logs, temporary data, or compliance-sensitive information, MongoDB TTL collections with QueryLeaf's familiar SQL interface provide the foundation for efficient, automated data lifecycle management.

QueryLeaf Integration: QueryLeaf seamlessly manages MongoDB TTL collections while providing SQL-familiar data lifecycle management syntax, retention policy configuration, and TTL monitoring capabilities. Advanced TTL patterns including conditional expiration, gradual cleanup, and compliance-aware retention are elegantly handled through familiar SQL constructs, making sophisticated data lifecycle management both powerful and accessible to SQL-oriented development teams.

The combination of MongoDB's robust TTL capabilities with SQL-style data lifecycle operations makes it an ideal platform for applications requiring both automated data management and familiar database interaction patterns, ensuring your TTL strategies remain both effective and maintainable as your data needs evolve and scale.

MongoDB Bulk Operations and Performance Optimization: Advanced Batch Processing for High-Throughput Applications

High-throughput applications require efficient data processing capabilities that can handle large volumes of documents with minimal latency and optimal resource utilization. Traditional single-document operations become performance bottlenecks when applications need to process thousands or millions of documents, leading to increased response times, inefficient network utilization, and poor system scalability under heavy data processing loads.

MongoDB's bulk operations provide sophisticated batch processing capabilities that enable applications to perform multiple document operations in a single request, dramatically improving throughput while reducing network overhead and server-side processing costs. Unlike traditional databases that require complex batching logic or application-level transaction management, MongoDB offers native bulk operation support with automatic optimization, error handling, and performance monitoring.
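
Before the detailed examples that follow, here is a minimal sketch of the core driver call this article builds on: a single bulkWrite() request carrying mixed inserts, updates, and deletes. The database, collection, and field names are illustrative assumptions:

// Minimal sketch: one unordered bulkWrite() batching mixed operations
const { MongoClient } = require('mongodb');

async function bulkWriteSketch(uri = 'mongodb://localhost:27017') {
  const client = new MongoClient(uri);
  await client.connect();
  const users = client.db('bulk_demo').collection('users');

  const result = await users.bulkWrite(
    [
      { insertOne: { document: { name: 'John Doe', status: 'pending' } } },
      { updateOne: { filter: { email: 'jane@example.com' },
                     update: { $set: { status: 'active' } } } },
      { updateMany: { filter: { status: 'inactive' },
                      update: { $set: { archived: true } } } },
      { deleteMany: { filter: { last_login: { $lt: new Date('2023-01-01') } } } }
    ],
    { ordered: false } // continue past individual failures for better throughput
  );

  console.log(result.insertedCount, result.modifiedCount, result.deletedCount);
  await client.close();
}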

The Single-Document Operation Challenge

Traditional document-by-document processing approaches face significant performance limitations in high-volume scenarios:

-- Traditional approach - processing documents one at a time (inefficient pattern)

-- Example: Processing user registration batch - individual operations
INSERT INTO users (name, email, registration_date, status) 
VALUES ('John Doe', 'john@example.com', CURRENT_TIMESTAMP, 'pending');

INSERT INTO users (name, email, registration_date, status) 
VALUES ('Jane Smith', 'jane@example.com', CURRENT_TIMESTAMP, 'pending');

INSERT INTO users (name, email, registration_date, status) 
VALUES ('Bob Johnson', 'bob@example.com', CURRENT_TIMESTAMP, 'pending');

-- Problems with single-document operations:
-- 1. High network round-trip overhead for each operation
-- 2. Individual index updates and lock acquisitions
-- 3. Inefficient resource utilization and memory allocation
-- 4. Poor scaling characteristics under high load
-- 5. Complex error handling for partial failures
-- 6. Limited transaction scope and atomicity guarantees

-- Example: Updating user statuses individually (performance bottleneck)
UPDATE users SET status = 'active', activated_at = CURRENT_TIMESTAMP 
WHERE email = 'john@example.com';

UPDATE users SET status = 'active', activated_at = CURRENT_TIMESTAMP 
WHERE email = 'jane@example.com';

UPDATE users SET status = 'active', activated_at = CURRENT_TIMESTAMP 
WHERE email = 'bob@example.com';

-- Individual updates result in:
-- - Multiple database connections and query parsing overhead
-- - Repeated index lookups and document retrieval operations  
-- - Inefficient write operations with individual lock acquisitions
-- - High latency due to network round trips
-- - Difficult error recovery and consistency management
-- - Poor resource utilization with context switching overhead

-- Example: Data cleanup operations (time-consuming individual deletes)
DELETE FROM users WHERE last_login < CURRENT_DATE - INTERVAL '2 years';
-- This approach processes each matching document individually

DELETE FROM user_sessions WHERE created_at < CURRENT_DATE - INTERVAL '30 days';
-- Again, individual document processing

DELETE FROM audit_logs WHERE log_date < CURRENT_DATE - INTERVAL '1 year';
-- More individual processing overhead

-- Single-document limitations:
-- 1. Long-running operations that block other requests
-- 2. Inefficient resource allocation and memory usage
-- 3. Poor progress tracking and monitoring capabilities
-- 4. Difficult to implement proper error handling
-- 5. No batch-level optimization opportunities
-- 6. Complex application logic for managing large datasets
-- 7. Limited ability to prioritize or throttle operations
-- 8. Inefficient use of database connection pooling

-- Traditional PostgreSQL bulk insert attempt (limited capabilities)
BEGIN;
INSERT INTO users (name, email, registration_date, status) VALUES
  ('User 1', 'user1@example.com', CURRENT_TIMESTAMP, 'pending'),
  ('User 2', 'user2@example.com', CURRENT_TIMESTAMP, 'pending'),
  ('User 3', 'user3@example.com', CURRENT_TIMESTAMP, 'pending');
  -- Limited to relatively small batches due to query size restrictions
  -- No advanced error handling or partial success reporting
  -- Limited optimization compared to native bulk operations
COMMIT;

-- PostgreSQL bulk update limitations
UPDATE users SET 
  status = CASE 
    WHEN email = 'user1@example.com' THEN 'active'
    WHEN email = 'user2@example.com' THEN 'suspended'
    WHEN email = 'user3@example.com' THEN 'active'
    ELSE status
  END,
  last_updated = CURRENT_TIMESTAMP
WHERE email IN ('user1@example.com', 'user2@example.com', 'user3@example.com');

-- Issues with traditional bulk approaches:
-- 1. Complex SQL syntax for conditional updates
-- 2. Limited flexibility for different operations per document
-- 3. No built-in error reporting for individual items
-- 4. Query size limitations for large batches
-- 5. Poor performance characteristics compared to native bulk operations
-- 6. Limited monitoring and progress reporting capabilities

MongoDB bulk operations provide comprehensive high-performance batch processing:

// MongoDB Advanced Bulk Operations - comprehensive batch processing with optimization

const { MongoClient } = require('mongodb');

// Advanced MongoDB Bulk Operations Manager
class MongoDBBulkOperationsManager {
  constructor(db) {
    this.db = db;
    this.performanceMetrics = {
      bulkInserts: { operations: 0, documentsProcessed: 0, totalTime: 0 },
      bulkUpdates: { operations: 0, documentsProcessed: 0, totalTime: 0 },
      bulkDeletes: { operations: 0, documentsProcessed: 0, totalTime: 0 },
      bulkWrites: { operations: 0, documentsProcessed: 0, totalTime: 0 }
    };
    this.errorTracking = new Map();
    this.optimizationSettings = {
      defaultBatchSize: 1000,
      maxBatchSize: 10000,
      enableOrdered: false, // Unordered operations for better performance
      enableBypassValidation: false,
      retryAttempts: 3,
      retryDelayMs: 1000
    };
  }

  // High-performance bulk insert operations
  async performBulkInsert(collectionName, documents, options = {}) {
    console.log(`Starting bulk insert of ${documents.length} documents into ${collectionName}`);

    const startTime = Date.now();
    const collection = this.db.collection(collectionName);

    // Configure bulk insert options for optimal performance
    const bulkOptions = {
      ordered: options.ordered !== undefined ? options.ordered : this.optimizationSettings.enableOrdered,
      bypassDocumentValidation: options.bypassValidation || this.optimizationSettings.enableBypassValidation,
      writeConcern: options.writeConcern || { w: 'majority', j: true }
    };

    try {
      // Process documents in optimal batch sizes
      const batchSize = Math.min(
        options.batchSize || this.optimizationSettings.defaultBatchSize,
        this.optimizationSettings.maxBatchSize
      );

      const results = [];
      let totalInserted = 0;
      let totalErrors = 0;

      for (let i = 0; i < documents.length; i += batchSize) {
        const batch = documents.slice(i, i + batchSize);

        try {
          console.log(`Processing batch ${Math.floor(i / batchSize) + 1} of ${Math.ceil(documents.length / batchSize)}`);

          // Add metadata to documents for tracking
          const enrichedBatch = batch.map(doc => ({
            ...doc,
            _bulk_operation_id: `bulk_insert_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`,
            _inserted_at: new Date(),
            _batch_number: Math.floor(i / batchSize) + 1
          }));

          const batchResult = await collection.insertMany(enrichedBatch, bulkOptions);

          results.push({
            batchIndex: Math.floor(i / batchSize),
            insertedCount: batchResult.insertedCount,
            insertedIds: batchResult.insertedIds,
            success: true
          });

          totalInserted += batchResult.insertedCount;

        } catch (error) {
          console.error(`Batch ${Math.floor(i / batchSize) + 1} failed:`, error.message);

          // Handle partial failures in unordered operations
          if (error.result && error.result.insertedCount) {
            totalInserted += error.result.insertedCount;
          }

          totalErrors += batch.length - (error.result?.insertedCount || 0);

          results.push({
            batchIndex: Math.floor(i / batchSize),
            insertedCount: error.result?.insertedCount || 0,
            error: error.message,
            success: false
          });

          // Track errors for analysis
          this.trackBulkOperationError('bulkInsert', error);
        }
      }

      const totalTime = Date.now() - startTime;

      // Update performance metrics
      this.updatePerformanceMetrics('bulkInserts', {
        operations: 1,
        documentsProcessed: totalInserted,
        totalTime: totalTime
      });

      const summary = {
        success: totalErrors === 0,
        totalDocuments: documents.length,
        insertedDocuments: totalInserted,
        failedDocuments: totalErrors,
        executionTimeMs: totalTime,
        throughputDocsPerSecond: Math.round((totalInserted / totalTime) * 1000),
        batchResults: results
      };

      console.log(`Bulk insert completed: ${totalInserted}/${documents.length} documents processed in ${totalTime}ms`);
      return summary;

    } catch (error) {
      console.error('Bulk insert operation failed:', error);
      this.trackBulkOperationError('bulkInsert', error);
      throw error;
    }
  }

  // Advanced bulk update operations with flexible patterns
  async performBulkUpdate(collectionName, updateOperations, options = {}) {
    console.log(`Starting bulk update of ${updateOperations.length} operations on ${collectionName}`);

    const startTime = Date.now();
    const collection = this.db.collection(collectionName);

    try {
      // Initialize ordered or unordered bulk operation
      const bulkOp = options.ordered ? collection.initializeOrderedBulkOp() : 
                                       collection.initializeUnorderedBulkOp();

      let operationCount = 0;

      // Process different types of update operations
      for (const operation of updateOperations) {
        const { filter, update, upsert = false, arrayFilters = null, hint = null } = operation;

        // Add operation metadata for tracking
        const enhancedUpdate = {
          ...update,
          $set: {
            ...update.$set,
            _last_bulk_update: new Date(),
            _bulk_operation_id: `bulk_update_${Date.now()}_${operationCount}`
          }
        };

        // Configure the operation via the fluent find() operators; upsert,
        // arrayFilters, and hint are chained rather than passed as options
        let findOp = bulkOp.find(filter);
        if (upsert) findOp = findOp.upsert();
        if (arrayFilters) findOp = findOp.arrayFilters(arrayFilters);
        if (hint) findOp = findOp.hint(hint);

        // Add to bulk operation (update() applies to all matching documents)
        if (operation.type === 'updateMany') {
          findOp.update(enhancedUpdate);
        } else {
          findOp.updateOne(enhancedUpdate);
        }

        operationCount++;

        // Log progress at batch-size intervals; the driver splits the queued
        // operations into appropriately sized commands when execute() runs
        if (operationCount % this.optimizationSettings.defaultBatchSize === 0) {
          console.log(`Queued ${operationCount} update operations so far`);
        }
      }

      // Execute all bulk update operations
      console.log(`Executing ${operationCount} bulk update operations`);
      const result = await bulkOp.execute({
        writeConcern: options.writeConcern || { w: 'majority', j: true }
      });

      const totalTime = Date.now() - startTime;

      // Update performance metrics
      this.updatePerformanceMetrics('bulkUpdates', {
        operations: 1,
        documentsProcessed: result.modifiedCount + result.upsertedCount,
        totalTime: totalTime
      });

      const summary = {
        success: true,
        totalOperations: operationCount,
        matchedDocuments: result.matchedCount,
        modifiedDocuments: result.modifiedCount,
        upsertedDocuments: result.upsertedCount,
        upsertedIds: result.upsertedIds,
        executionTimeMs: totalTime,
        throughputOpsPerSecond: Math.round((operationCount / totalTime) * 1000),
        writeErrors: result.writeErrors || [],
        writeConcernErrors: result.writeConcernErrors || []
      };

      console.log(`Bulk update completed: ${result.modifiedCount} documents modified, ${result.upsertedCount} upserted in ${totalTime}ms`);
      return summary;

    } catch (error) {
      console.error('Bulk update operation failed:', error);
      this.trackBulkOperationError('bulkUpdate', error);

      // Return partial results if available
      if (error.result) {
        const totalTime = Date.now() - startTime;
        return {
          success: false,
          error: error.message,
          partialResult: {
            matchedDocuments: error.result.matchedCount,
            modifiedDocuments: error.result.modifiedCount,
            upsertedDocuments: error.result.upsertedCount,
            executionTimeMs: totalTime
          }
        };
      }
      throw error;
    }
  }

  // Optimized bulk delete operations
  async performBulkDelete(collectionName, deleteOperations, options = {}) {
    console.log(`Starting bulk delete of ${deleteOperations.length} operations on ${collectionName}`);

    const startTime = Date.now();
    const collection = this.db.collection(collectionName);

    try {
      // Initialize bulk operation
      const bulkOp = options.ordered ? collection.initializeOrderedBulkOp() : 
                                       collection.initializeUnorderedBulkOp();

      let operationCount = 0;

      // Process delete operations
      for (const operation of deleteOperations) {
        const { filter, deleteType = 'deleteMany', hint = null } = operation;

        // Configure delete operation (hint is applied via the fluent find() operators)
        let deleteOp = bulkOp.find(filter);
        if (hint) deleteOp = deleteOp.hint(hint);

        // Add to bulk operation based on type
        if (deleteType === 'deleteOne') {
          deleteOp.deleteOne();
        } else {
          deleteOp.delete(); // removes all matching documents (deleteMany is the default)
        }

        operationCount++;
      }

      // Execute bulk delete operations
      console.log(`Executing ${operationCount} bulk delete operations`);
      const result = await bulkOp.execute({
        writeConcern: options.writeConcern || { w: 'majority', j: true }
      });

      const totalTime = Date.now() - startTime;

      // Update performance metrics
      this.updatePerformanceMetrics('bulkDeletes', {
        operations: 1,
        documentsProcessed: result.deletedCount,
        totalTime: totalTime
      });

      const summary = {
        success: true,
        totalOperations: operationCount,
        deletedDocuments: result.deletedCount,
        executionTimeMs: totalTime,
        throughputOpsPerSecond: Math.round((operationCount / totalTime) * 1000),
        writeErrors: result.writeErrors || [],
        writeConcernErrors: result.writeConcernErrors || []
      };

      console.log(`Bulk delete completed: ${result.deletedCount} documents deleted in ${totalTime}ms`);
      return summary;

    } catch (error) {
      console.error('Bulk delete operation failed:', error);
      this.trackBulkOperationError('bulkDelete', error);

      if (error.result) {
        const totalTime = Date.now() - startTime;
        return {
          success: false,
          error: error.message,
          partialResult: {
            deletedDocuments: error.result.deletedCount,
            executionTimeMs: totalTime
          }
        };
      }
      throw error;
    }
  }

  // Mixed bulk operations (insert, update, delete in single batch)
  async performMixedBulkOperations(collectionName, operations, options = {}) {
    console.log(`Starting mixed bulk operations: ${operations.length} operations on ${collectionName}`);

    const startTime = Date.now();
    const collection = this.db.collection(collectionName);

    try {
      const bulkOp = options.ordered ? collection.initializeOrderedBulkOp() : 
                                       collection.initializeUnorderedBulkOp();

      let insertCount = 0;
      let updateCount = 0;
      let deleteCount = 0;

      // Process mixed operations
      for (const operation of operations) {
        const { type, ...opData } = operation;

        switch (type) {
          case 'insert':
            const enrichedDoc = {
              ...opData.document,
              _bulk_operation_id: `bulk_mixed_${Date.now()}_${insertCount}`,
              _inserted_at: new Date()
            };
            bulkOp.insert(enrichedDoc);
            insertCount++;
            break;

          case 'updateOne':
            const updateOneData = {
              ...opData.update,
              $set: {
                ...opData.update.$set,
                _last_bulk_update: new Date(),
                _bulk_operation_id: `bulk_mixed_update_${Date.now()}_${updateCount}`
              }
            };
            // upsert is requested through the fluent API rather than an options object
            (opData.upsert ? bulkOp.find(opData.filter).upsert() : bulkOp.find(opData.filter))
              .updateOne(updateOneData);
            updateCount++;
            break;

          case 'updateMany':
            const updateManyData = {
              ...opData.update,
              $set: {
                ...opData.update.$set,
                _last_bulk_update: new Date(),
                _bulk_operation_id: `bulk_mixed_update_${Date.now()}_${updateCount}`
              }
            };
            // update() applies the change to every document matching the filter
            (opData.upsert ? bulkOp.find(opData.filter).upsert() : bulkOp.find(opData.filter))
              .update(updateManyData);
            updateCount++;
            break;

          case 'deleteOne':
            bulkOp.find(opData.filter).deleteOne();
            deleteCount++;
            break;

          case 'deleteMany':
            bulkOp.find(opData.filter).delete();
            deleteCount++;
            break;

          default:
            console.warn(`Unknown operation type: ${type}`);
        }
      }

      // Execute mixed bulk operations
      console.log(`Executing mixed bulk operations: ${insertCount} inserts, ${updateCount} updates, ${deleteCount} deletes`);
      const result = await bulkOp.execute({
        writeConcern: options.writeConcern || { w: 'majority', j: true }
      });

      const totalTime = Date.now() - startTime;
      const totalDocumentsProcessed = result.insertedCount + result.modifiedCount + result.deletedCount + result.upsertedCount;

      // Update performance metrics
      this.updatePerformanceMetrics('bulkWrites', {
        operations: 1,
        documentsProcessed: totalDocumentsProcessed,
        totalTime: totalTime
      });

      const summary = {
        success: true,
        totalOperations: operations.length,
        operationBreakdown: {
          inserts: insertCount,
          updates: updateCount,
          deletes: deleteCount
        },
        results: {
          insertedDocuments: result.insertedCount,
          insertedIds: result.insertedIds,
          matchedDocuments: result.matchedCount,
          modifiedDocuments: result.modifiedCount,
          deletedDocuments: result.deletedCount,
          upsertedDocuments: result.upsertedCount,
          upsertedIds: result.upsertedIds
        },
        executionTimeMs: totalTime,
        throughputOpsPerSecond: Math.round((operations.length / totalTime) * 1000),
        throughputDocsPerSecond: Math.round((totalDocumentsProcessed / totalTime) * 1000),
        writeErrors: result.writeErrors || [],
        writeConcernErrors: result.writeConcernErrors || []
      };

      console.log(`Mixed bulk operations completed: ${totalDocumentsProcessed} documents processed in ${totalTime}ms`);
      return summary;

    } catch (error) {
      console.error('Mixed bulk operations failed:', error);
      this.trackBulkOperationError('bulkWrite', error);

      if (error.result) {
        const totalTime = Date.now() - startTime;
        const totalDocumentsProcessed = error.result.insertedCount + error.result.modifiedCount + error.result.deletedCount + error.result.upsertedCount;

        return {
          success: false,
          error: error.message,
          partialResult: {
            insertedDocuments: error.result.insertedCount,
            modifiedDocuments: error.result.modifiedCount,
            deletedDocuments: error.result.deletedCount,
            upsertedDocuments: error.result.upsertedCount,
            totalDocumentsProcessed: totalDocumentsProcessed,
            executionTimeMs: totalTime
          }
        };
      }
      throw error;
    }
  }

  // Performance monitoring and optimization
  updatePerformanceMetrics(operationType, metrics) {
    const current = this.performanceMetrics[operationType];
    current.operations += metrics.operations;
    current.documentsProcessed += metrics.documentsProcessed;
    current.totalTime += metrics.totalTime;
  }

  trackBulkOperationError(operationType, error) {
    if (!this.errorTracking.has(operationType)) {
      this.errorTracking.set(operationType, []);
    }

    this.errorTracking.get(operationType).push({
      timestamp: new Date(),
      error: error.message,
      code: error.code,
      details: error.writeErrors || error.result
    });
  }

  getBulkOperationStatistics() {
    const stats = {};

    for (const [operationType, metrics] of Object.entries(this.performanceMetrics)) {
      if (metrics.operations > 0) {
        stats[operationType] = {
          totalOperations: metrics.operations,
          documentsProcessed: metrics.documentsProcessed,
          averageExecutionTimeMs: Math.round(metrics.totalTime / metrics.operations),
          averageThroughputDocsPerSecond: Math.round((metrics.documentsProcessed / metrics.totalTime) * 1000),
          totalExecutionTimeMs: metrics.totalTime
        };
      }
    }

    return stats;
  }

  getErrorStatistics() {
    const errorStats = {};

    for (const [operationType, errors] of this.errorTracking.entries()) {
      errorStats[operationType] = {
        totalErrors: errors.length,
        recentErrors: errors.filter(e => Date.now() - e.timestamp.getTime() < 3600000), // Last hour
        errorBreakdown: this.groupErrorsByCode(errors)
      };
    }

    return errorStats;
  }

  groupErrorsByCode(errors) {
    const breakdown = {};
    errors.forEach(error => {
      const code = error.code || 'Unknown';
      breakdown[code] = (breakdown[code] || 0) + 1;
    });
    return breakdown;
  }

  // Optimized data import functionality
  async performOptimizedDataImport(collectionName, dataSource, options = {}) {
    console.log(`Starting optimized data import for ${collectionName}`);

    const importOptions = {
      batchSize: options.batchSize || 5000,
      enableValidation: options.enableValidation !== false,
      createIndexes: options.createIndexes || false,
      dropExistingCollection: options.dropExisting || false,
      parallelBatches: options.parallelBatches || 1
    };

    try {
      const collection = this.db.collection(collectionName);

      // Drop existing collection if requested
      if (importOptions.dropExistingCollection) {
        try {
          await collection.drop();
          console.log(`Existing collection ${collectionName} dropped`);
        } catch (error) {
          console.log(`Collection ${collectionName} did not exist or could not be dropped`);
        }
      }

      // Create indexes before import if specified
      if (importOptions.createIndexes && options.indexes) {
        console.log('Creating indexes before data import...');
        for (const indexSpec of options.indexes) {
          await collection.createIndex(indexSpec.fields, indexSpec.options);
        }
      }

      // Process data in optimized batches
      let totalImported = 0;
      const startTime = Date.now();

      // Assuming dataSource is an array or iterable
      const documents = Array.isArray(dataSource) ? dataSource : await this.convertDataSource(dataSource);

      const result = await this.performBulkInsert(collectionName, documents, {
        batchSize: importOptions.batchSize,
        bypassValidation: !importOptions.enableValidation,
        ordered: false // Unordered for better performance
      });

      console.log(`Data import completed: ${result.insertedDocuments} documents imported in ${result.executionTimeMs}ms`);
      return result;

    } catch (error) {
      console.error(`Data import failed for ${collectionName}:`, error);
      throw error;
    }
  }

  async convertDataSource(dataSource) {
    // Convert various data sources (streams, iterators, etc.) to arrays
    // This is a placeholder - implement based on your specific data source types
    if (typeof dataSource.toArray === 'function') {
      return await dataSource.toArray();
    }

    if (Symbol.iterator in dataSource) {
      return Array.from(dataSource);
    }

    throw new Error('Unsupported data source type');
  }
}

// Example usage: High-performance bulk operations
async function demonstrateBulkOperations() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('bulk_operations_demo');

  const bulkManager = new MongoDBBulkOperationsManager(db);

  // Demonstrate bulk insert
  const usersToInsert = [];
  for (let i = 0; i < 10000; i++) {
    usersToInsert.push({
      name: `User ${i}`,
      email: `user${i}@example.com`,
      age: Math.floor(Math.random() * 50) + 18,
      department: ['Engineering', 'Sales', 'Marketing', 'HR'][Math.floor(Math.random() * 4)],
      salary: Math.floor(Math.random() * 100000) + 40000,
      join_date: new Date(Date.now() - Math.random() * 365 * 24 * 60 * 60 * 1000)
    });
  }

  const insertResult = await bulkManager.performBulkInsert('users', usersToInsert);
  console.log('Bulk Insert Result:', insertResult);

  // Demonstrate bulk update
  const updateOperations = [
    {
      type: 'updateMany',
      filter: { department: 'Engineering' },
      update: { 
        $set: { department: 'Software Engineering' },
        $inc: { salary: 5000 }
      }
    },
    {
      type: 'updateMany', 
      filter: { age: { $lt: 25 } },
      update: { $set: { employee_type: 'junior' } },
      upsert: false
    }
  ];

  const updateResult = await bulkManager.performBulkUpdate('users', updateOperations);
  console.log('Bulk Update Result:', updateResult);

  // Display performance statistics
  const stats = bulkManager.getBulkOperationStatistics();
  console.log('Performance Statistics:', stats);

  await client.close();
}

Understanding MongoDB Bulk Operations Architecture

Advanced Bulk Processing Patterns and Performance Optimization

Implement sophisticated bulk operation patterns for production-scale data processing:

// Production-ready MongoDB bulk operations with advanced optimization strategies
class EnterpriseMongoDBBulkManager extends MongoDBBulkOperationsManager {
  constructor(db, enterpriseConfig = {}) {
    super(db);

    this.enterpriseConfig = {
      enableShardingOptimization: enterpriseConfig.enableShardingOptimization || false,
      enableReplicationOptimization: enterpriseConfig.enableReplicationOptimization || false,
      enableCompressionOptimization: enterpriseConfig.enableCompressionOptimization || false,
      maxConcurrentOperations: enterpriseConfig.maxConcurrentOperations || 10,
      enableProgressTracking: enterpriseConfig.enableProgressTracking !== false,
      enableResourceMonitoring: enterpriseConfig.enableResourceMonitoring !== false
    };

    this.setupEnterpriseOptimizations();
  }

  async performParallelBulkOperations(collectionName, operationBatches, options = {}) {
    console.log(`Starting parallel bulk operations on ${collectionName} with ${operationBatches.length} batches`);

    const concurrency = Math.min(
      options.maxConcurrency || this.enterpriseConfig.maxConcurrentOperations,
      operationBatches.length
    );

    const results = [];
    const startTime = Date.now();

    // Process batches in parallel with controlled concurrency
    for (let i = 0; i < operationBatches.length; i += concurrency) {
      const batchPromises = [];

      for (let j = i; j < Math.min(i + concurrency, operationBatches.length); j++) {
        const batch = operationBatches[j];

        const promise = this.processSingleBatch(collectionName, batch, {
          ...options,
          batchIndex: j
        });

        batchPromises.push(promise);
      }

      // Wait for current set of concurrent batches to complete
      const batchResults = await Promise.allSettled(batchPromises);
      results.push(...batchResults);

      console.log(`Completed ${Math.min(i + concurrency, operationBatches.length)} of ${operationBatches.length} batches`);
    }

    const totalTime = Date.now() - startTime;

    return this.consolidateParallelResults(results, totalTime);
  }

  async processSingleBatch(collectionName, batch, options) {
    // Determine batch type and process accordingly
    if (batch.type === 'insert') {
      return await this.performBulkInsert(collectionName, batch.documents, options);
    } else if (batch.type === 'update') {
      return await this.performBulkUpdate(collectionName, batch.operations, options);
    } else if (batch.type === 'delete') {
      return await this.performBulkDelete(collectionName, batch.operations, options);
    } else if (batch.type === 'mixed') {
      return await this.performMixedBulkOperations(collectionName, batch.operations, options);
    }
  }

  async performShardOptimizedBulkOperations(collectionName, operations, shardKey) {
    console.log(`Performing shard-optimized bulk operations on ${collectionName}`);

    // Group operations by shard key for optimal routing
    const shardGroupedOps = this.groupOperationsByShardKey(operations, shardKey);

    const results = [];

    for (const [shardValue, shardOps] of shardGroupedOps.entries()) {
      console.log(`Processing ${shardOps.length} operations for shard key value: ${shardValue}`);

      const shardResult = await this.performMixedBulkOperations(collectionName, shardOps, {
        ordered: false // Better performance for sharded clusters
      });

      results.push({
        shardKey: shardValue,
        result: shardResult
      });
    }

    return this.consolidateShardResults(results);
  }

  groupOperationsByShardKey(operations, shardKey) {
    const grouped = new Map();

    for (const operation of operations) {
      let keyValue;

      if (operation.type === 'insert') {
        keyValue = operation.document[shardKey];
      } else {
        keyValue = operation.filter[shardKey];
      }

      if (!grouped.has(keyValue)) {
        grouped.set(keyValue, []);
      }

      grouped.get(keyValue).push(operation);
    }

    return grouped;
  }

  async performStreamingBulkOperations(collectionName, dataStream, options = {}) {
    console.log(`Starting streaming bulk operations on ${collectionName}`);

    const batchSize = options.batchSize || 1000;
    const processingOptions = {
      ordered: false,
      ...options
    };

    let batch = [];
    let totalProcessed = 0;
    const results = [];

    return new Promise((resolve, reject) => {
      dataStream.on('data', async (data) => {
        batch.push(data);

        if (batch.length >= batchSize) {
          // Pause the stream while this batch is written so new documents
          // are not appended to (or lost from) the batch mid-flight
          dataStream.pause();
          const currentBatch = batch;
          batch = [];

          try {
            const batchResult = await this.performBulkInsert(
              collectionName,
              currentBatch,
              processingOptions
            );

            results.push(batchResult);
            totalProcessed += batchResult.insertedDocuments;

            console.log(`Processed ${totalProcessed} documents so far`);
            dataStream.resume();

          } catch (error) {
            reject(error);
          }
        }
      });

      dataStream.on('end', async () => {
        try {
          // Process remaining documents
          if (batch.length > 0) {
            const finalResult = await this.performBulkInsert(
              collectionName, 
              batch, 
              processingOptions
            );
            results.push(finalResult);
            totalProcessed += finalResult.insertedDocuments;
          }

          resolve({
            success: true,
            totalProcessed: totalProcessed,
            batchResults: results
          });

        } catch (error) {
          reject(error);
        }
      });

      dataStream.on('error', reject);
    });
  }
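
  // The constructor and the parallel/shard helpers above call the methods
  // below; minimal sketches are provided so the class is self-contained.
  // Concrete tuning and reporting logic would be deployment-specific.
  setupEnterpriseOptimizations() {
    // Hook for sharding/replication/compression tuning driven by
    // this.enterpriseConfig; intentionally a no-op in this sketch
  }

  consolidateParallelResults(settledResults, totalTimeMs) {
    const fulfilled = settledResults.filter(r => r.status === 'fulfilled').map(r => r.value);
    const rejected = settledResults.filter(r => r.status === 'rejected');

    return {
      success: rejected.length === 0,
      completedBatches: fulfilled.length,
      failedBatches: rejected.length,
      totalDocumentsProcessed: fulfilled.reduce((sum, r) =>
        sum + (r.insertedDocuments || r.modifiedDocuments || r.deletedDocuments || 0), 0),
      executionTimeMs: totalTimeMs,
      batchResults: fulfilled,
      batchErrors: rejected.map(r => (r.reason && r.reason.message) || String(r.reason))
    };
  }

  consolidateShardResults(shardResults) {
    return {
      success: shardResults.every(r => r.result && r.result.success),
      shardsProcessed: shardResults.length,
      shardResults: shardResults
    };
  }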
}

QueryLeaf Bulk Operations Integration

QueryLeaf provides familiar SQL syntax for MongoDB bulk operations and batch processing:

-- QueryLeaf bulk operations with SQL-familiar syntax for MongoDB batch processing

-- Bulk insert with SQL VALUES syntax (automatically optimized for MongoDB bulk operations)
INSERT INTO users (name, email, age, department, salary, join_date)
VALUES 
  ('John Doe', 'john@example.com', 32, 'Engineering', 85000, CURRENT_DATE),
  ('Jane Smith', 'jane@example.com', 28, 'Sales', 75000, CURRENT_DATE - INTERVAL '1 month'),
  ('Bob Johnson', 'bob@example.com', 35, 'Marketing', 70000, CURRENT_DATE - INTERVAL '2 months'),
  ('Alice Brown', 'alice@example.com', 29, 'HR', 68000, CURRENT_DATE - INTERVAL '3 months'),
  ('Charlie Wilson', 'charlie@example.com', 31, 'Engineering', 90000, CURRENT_DATE - INTERVAL '4 months');

-- QueryLeaf automatically converts this to optimized MongoDB bulk insert:
-- db.users.insertMany([documents...], { ordered: false })

-- Bulk update operations using SQL UPDATE syntax
-- Update all engineers' salaries (automatically uses MongoDB bulk operations)
UPDATE users 
SET salary = salary * 1.1, 
    last_updated = CURRENT_TIMESTAMP,
    promotion_eligible = true
WHERE department = 'Engineering';

-- Update employees based on multiple conditions
UPDATE users 
SET employee_level = CASE 
  WHEN age > 35 AND salary > 80000 THEN 'Senior'
  WHEN age > 30 OR salary > 70000 THEN 'Mid-level'
  ELSE 'Junior'
END,
last_evaluation = CURRENT_DATE
WHERE join_date < CURRENT_DATE - INTERVAL '6 months';

-- QueryLeaf optimizes these as MongoDB bulk update operations:
-- Uses bulkWrite() with updateMany operations for optimal performance

-- Bulk delete operations
-- Clean up old inactive users
DELETE FROM users 
WHERE last_login < CURRENT_DATE - INTERVAL '2 years' 
  AND status = 'inactive';

-- Remove test data
DELETE FROM users 
WHERE email LIKE '%test%' OR email LIKE '%example%';

-- QueryLeaf converts to optimized bulk delete operations

-- Advanced bulk processing with data transformation and aggregation
WITH user_statistics AS (
  SELECT 
    department,
    COUNT(*) as employee_count,
    AVG(salary) as avg_salary,
    MAX(salary) as max_salary,
    MIN(join_date) as earliest_hire
  FROM users 
  GROUP BY department
),

salary_adjustments AS (
  SELECT 
    u._id,
    u.name,
    u.department,
    u.salary,
    us.avg_salary,

    -- Calculate adjustment based on department average
    CASE 
      WHEN u.salary < us.avg_salary * 0.8 THEN u.salary * 1.15  -- 15% increase
      WHEN u.salary < us.avg_salary * 0.9 THEN u.salary * 1.10  -- 10% increase  
      WHEN u.salary > us.avg_salary * 1.2 THEN u.salary * 1.02  -- 2% increase
      ELSE u.salary * 1.05  -- 5% standard increase
    END as new_salary,

    CURRENT_DATE as adjustment_date

  FROM users u
  JOIN user_statistics us ON u.department = us.department
  WHERE u.status = 'active'
)

-- Bulk update with calculated values (QueryLeaf optimizes this as bulk operation)
UPDATE users 
SET salary = sa.new_salary,
    last_salary_review = sa.adjustment_date,
    salary_review_reason = CONCAT('Department average adjustment - Previous: $', 
                                 CAST(sa.salary AS VARCHAR), 
                                 ', New: $', 
                                 CAST(sa.new_salary AS VARCHAR))
FROM salary_adjustments sa
WHERE users._id = sa._id;

-- Bulk data processing with conditional operations
-- Process employee performance reviews in batches
WITH performance_data AS (
  SELECT 
    _id,
    name,
    department,
    performance_score,

    -- Calculate performance category
    CASE 
      WHEN performance_score >= 90 THEN 'exceptional'
      WHEN performance_score >= 80 THEN 'exceeds_expectations'  
      WHEN performance_score >= 70 THEN 'meets_expectations'
      WHEN performance_score >= 60 THEN 'needs_improvement'
      ELSE 'unsatisfactory'
    END as performance_category,

    -- Calculate bonus eligibility
    CASE 
      WHEN performance_score >= 85 AND department IN ('Sales', 'Engineering') THEN true
      WHEN performance_score >= 90 THEN true
      ELSE false
    END as bonus_eligible,

    -- Calculate development plan requirement
    CASE 
      WHEN performance_score < 70 THEN true
      ELSE false  
    END as requires_development_plan

  FROM employees 
  WHERE review_period = '2025-Q3'
),

bonus_calculations AS (
  SELECT 
    pd._id,
    pd.bonus_eligible,

    -- Calculate bonus amount
    CASE 
      WHEN pd.performance_score >= 95 THEN u.salary * 0.15  -- 15% bonus
      WHEN pd.performance_score >= 90 THEN u.salary * 0.12  -- 12% bonus  
      WHEN pd.performance_score >= 85 THEN u.salary * 0.10  -- 10% bonus
      ELSE 0
    END as bonus_amount

  FROM performance_data pd
  JOIN users u ON pd._id = u._id
  WHERE pd.bonus_eligible = true
)

-- Execute bulk updates for performance review results
UPDATE users 
SET performance_category = pd.performance_category,
    bonus_eligible = pd.bonus_eligible,
    bonus_amount = COALESCE(bc.bonus_amount, 0),
    requires_development_plan = pd.requires_development_plan,
    last_performance_review = CURRENT_DATE,
    review_status = 'completed'
FROM performance_data pd
LEFT JOIN bonus_calculations bc ON pd._id = bc._id  
WHERE users._id = pd._id;

-- Advanced batch processing with data validation and error handling
-- Bulk data import with validation
INSERT INTO products (sku, name, category, price, stock_quantity, supplier_id, created_at)
SELECT 
  import_sku,
  import_name,
  import_category,
  CAST(import_price AS DECIMAL(10,2)),
  CAST(import_stock AS INTEGER),
  supplier_lookup.supplier_id,
  CURRENT_TIMESTAMP

FROM product_import_staging pis
JOIN suppliers supplier_lookup ON pis.supplier_name = supplier_lookup.name

-- Validation conditions
WHERE import_sku IS NOT NULL
  AND import_name IS NOT NULL  
  AND import_category IN ('Electronics', 'Clothing', 'Books', 'Home', 'Sports')
  AND import_price::DECIMAL(10,2) > 0
  AND import_stock::INTEGER >= 0
  AND supplier_lookup.supplier_id IS NOT NULL

  -- Duplicate check
  AND NOT EXISTS (
    SELECT 1 FROM products p 
    WHERE p.sku = pis.import_sku
  );

-- Bulk inventory adjustments with audit trail
WITH inventory_adjustments AS (
  SELECT 
    product_id,
    adjustment_quantity,
    adjustment_reason,
    adjustment_type, -- 'increase', 'decrease', 'recount'
    CURRENT_TIMESTAMP as adjustment_timestamp,
    'system' as adjusted_by
  FROM inventory_adjustment_queue
  WHERE processed = false
),

stock_calculations AS (
  SELECT 
    ia.product_id,
    p.stock_quantity as current_stock,

    CASE ia.adjustment_type
      WHEN 'increase' THEN p.stock_quantity + ia.adjustment_quantity
      WHEN 'decrease' THEN GREATEST(p.stock_quantity - ia.adjustment_quantity, 0)
      WHEN 'recount' THEN ia.adjustment_quantity
      ELSE p.stock_quantity
    END as new_stock_quantity,

    ia.adjustment_reason,
    ia.adjustment_timestamp,
    ia.adjusted_by

  FROM inventory_adjustments ia
  JOIN products p ON ia.product_id = p._id
)

-- Bulk update product stock levels
UPDATE products 
SET stock_quantity = sc.new_stock_quantity,
    last_stock_update = sc.adjustment_timestamp,
    stock_updated_by = sc.adjusted_by
FROM stock_calculations sc
WHERE products._id = sc.product_id;

-- Insert audit records for inventory changes
INSERT INTO inventory_audit_log (
  product_id,
  previous_stock,
  new_stock,
  adjustment_reason,
  adjustment_timestamp,
  adjusted_by
)
SELECT 
  sc.product_id,
  sc.current_stock,
  sc.new_stock_quantity,
  sc.adjustment_reason,
  sc.adjustment_timestamp,
  sc.adjusted_by
FROM stock_calculations sc;

-- Mark adjustment queue items as processed
UPDATE inventory_adjustment_queue 
SET processed = true,
    processed_at = CURRENT_TIMESTAMP
WHERE processed = false;

-- High-performance bulk operations with monitoring
-- Query for bulk operation performance analysis
WITH operation_metrics AS (
  SELECT 
    DATE_TRUNC('hour', operation_timestamp) as hour_bucket,
    operation_type, -- 'bulk_insert', 'bulk_update', 'bulk_delete'
    collection_name,

    -- Performance metrics
    COUNT(*) as operations_count,
    SUM(documents_processed) as total_documents,
    AVG(execution_time_ms) as avg_execution_time_ms,
    MAX(execution_time_ms) as max_execution_time_ms,
    MIN(execution_time_ms) as min_execution_time_ms,

    -- Throughput calculations
    AVG(throughput_docs_per_second) as avg_throughput_docs_per_sec,
    MAX(throughput_docs_per_second) as max_throughput_docs_per_sec,

    -- Error tracking
    COUNT(*) FILTER (WHERE success = false) as failed_operations,
    COUNT(*) FILTER (WHERE success = true) as successful_operations,

    -- Resource utilization
    AVG(memory_usage_mb) as avg_memory_usage_mb,
    AVG(cpu_utilization_percent) as avg_cpu_utilization

  FROM bulk_operation_log
  WHERE operation_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  GROUP BY DATE_TRUNC('hour', operation_timestamp), operation_type, collection_name
)

SELECT 
  hour_bucket,
  operation_type,
  collection_name,
  operations_count,
  total_documents,

  -- Performance summary
  ROUND(avg_execution_time_ms, 2) as avg_execution_time_ms,
  ROUND(avg_throughput_docs_per_sec, 0) as avg_throughput_docs_per_sec,
  max_throughput_docs_per_sec,

  -- Success rate
  successful_operations,
  failed_operations,
  ROUND((successful_operations::DECIMAL / (successful_operations + failed_operations)) * 100, 2) as success_rate_percent,

  -- Resource efficiency  
  ROUND(avg_memory_usage_mb, 1) as avg_memory_usage_mb,
  ROUND(avg_cpu_utilization, 1) as avg_cpu_utilization_percent,

  -- Performance assessment (the success ratio is recomputed here because the
  -- success_rate_percent alias cannot be referenced within the same SELECT)
  CASE 
    WHEN avg_execution_time_ms < 100 
         AND successful_operations::DECIMAL / NULLIF(successful_operations + failed_operations, 0) > 0.99 THEN 'excellent'
    WHEN avg_execution_time_ms < 500 
         AND successful_operations::DECIMAL / NULLIF(successful_operations + failed_operations, 0) > 0.95 THEN 'good'
    WHEN avg_execution_time_ms < 1000 
         AND successful_operations::DECIMAL / NULLIF(successful_operations + failed_operations, 0) > 0.90 THEN 'acceptable'
    ELSE 'needs_optimization'
  END as performance_rating

FROM operation_metrics
ORDER BY hour_bucket DESC, total_documents DESC;

-- QueryLeaf provides comprehensive bulk operation support:
-- 1. Automatic conversion of SQL batch operations to MongoDB bulk operations
-- 2. Optimal batching strategies based on operation types and data characteristics
-- 3. Advanced error handling with partial success reporting
-- 4. Performance monitoring and optimization recommendations
-- 5. Support for complex data transformations during bulk processing
-- 6. Intelligent resource utilization and concurrency management
-- 7. Integration with MongoDB's native bulk operation optimizations
-- 8. Familiar SQL syntax for complex batch processing workflows

Best Practices for MongoDB Bulk Operations

Performance Optimization Strategies

Essential principles for maximizing bulk operation performance:

  1. Batch Size Optimization: Choose optimal batch sizes based on document size, available memory, and network capacity
  2. Unordered Operations: Use unordered bulk operations when possible for better parallelization and performance
  3. Index Considerations: Account for index maintenance costs during bulk operations; for large initial loads it is usually faster to build secondary indexes after the bulk insert completes rather than before
  4. Write Concern Configuration: Balance consistency requirements with performance using appropriate write concern settings
  5. Error Handling Strategy: Implement comprehensive error handling with partial success reporting and retry logic
  6. Resource Monitoring: Monitor system resources during bulk operations and adjust batch sizes dynamically

Production Deployment Considerations

Optimize bulk operations for enterprise production environments:

  1. Sharding Awareness: Design bulk operations to work efficiently with MongoDB sharded clusters
  2. Replication Optimization: Configure operations to work optimally with replica sets and read preferences
  3. Concurrency Management: Implement appropriate concurrency controls to prevent resource contention
  4. Progress Tracking: Provide comprehensive progress reporting for long-running bulk operations
  5. Memory Management: Monitor and control memory usage during large-scale bulk processing
  6. Performance Monitoring: Implement detailed performance monitoring and alerting for bulk operations
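
For item 4 above, progress tracking can be as simple as slicing the workload and reporting after each batch; this hedged sketch assumes a caller-supplied onProgress callback and an already-connected collection handle:

// Minimal sketch: batched insert with progress reporting
async function importWithProgress(collection, documents, { batchSize = 1000, onProgress } = {}) {
  let processed = 0;
  for (let i = 0; i < documents.length; i += batchSize) {
    const batch = documents.slice(i, i + batchSize);
    const result = await collection.insertMany(batch, { ordered: false });
    processed += result.insertedCount;
    if (onProgress) {
      onProgress({
        processed,
        total: documents.length,
        percent: Math.round((processed / documents.length) * 100)
      });
    }
  }
  return processed;
}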

Conclusion

MongoDB bulk operations provide powerful capabilities for high-throughput data processing that dramatically improve performance compared to single-document operations through intelligent batching, automatic optimization, and comprehensive error handling. The native bulk operation support enables applications to efficiently process large volumes of data while maintaining consistency and providing detailed operational visibility.

Key MongoDB Bulk Operations benefits include:

  • High-Performance Processing: Optimal throughput through intelligent batching and reduced network overhead
  • Flexible Operation Types: Support for mixed bulk operations including inserts, updates, and deletes in single batches
  • Advanced Error Handling: Comprehensive error reporting with partial success tracking and recovery capabilities
  • Resource Optimization: Efficient memory and CPU utilization through optimized batch processing algorithms
  • Production Scalability: Enterprise-ready bulk processing with monitoring, progress tracking, and performance optimization
  • SQL Accessibility: Familiar SQL-style bulk operations through QueryLeaf for accessible high-performance data processing

Whether you're building data import systems, batch processing pipelines, ETL workflows, or high-throughput applications, MongoDB bulk operations with QueryLeaf's familiar SQL interface provide the foundation for efficient, scalable, and reliable batch data processing.

QueryLeaf Integration: QueryLeaf automatically optimizes SQL batch operations into MongoDB bulk operations while providing familiar SQL syntax for complex data processing workflows. Advanced bulk operation patterns, performance monitoring, and error handling are seamlessly handled through familiar SQL constructs, making high-performance batch processing accessible to SQL-oriented development teams.

The combination of MongoDB's robust bulk operation capabilities with SQL-style batch processing operations makes it an ideal platform for applications requiring both high-throughput data processing and familiar database operation patterns, ensuring your batch processing workflows can scale efficiently while maintaining performance and reliability.

MongoDB Backup and Recovery for Enterprise Data Protection: Advanced Disaster Recovery Strategies, Point-in-Time Recovery, and Operational Resilience

Enterprise applications require comprehensive data protection strategies that ensure business continuity during system failures, natural disasters, or data corruption events. Traditional database backup approaches often struggle with the complexity of distributed systems, large data volumes, and the stringent Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) demanded by modern business operations.

MongoDB's distributed architecture and flexible backup mechanisms provide sophisticated data protection capabilities that support everything from simple scheduled backups to complex multi-region disaster recovery scenarios. Unlike traditional relational systems that often require expensive specialized backup software and complex coordination across multiple database instances, MongoDB's replica sets, sharding, and oplog-based recovery enable native, high-performance backup strategies that integrate seamlessly with cloud storage systems and enterprise infrastructure.
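
As a small illustration of why the oplog matters for recovery objectives, the following sketch estimates the replica set's oplog window, the span of time for which oplog-based point-in-time recovery is possible without falling back to an older backup. It assumes a replica set reachable at the given URI and that the driver's Timestamp type exposes Long's getHighBits():

// Minimal sketch: measure the oplog window from the oldest and newest entries
const { MongoClient } = require('mongodb');

async function reportOplogWindow(uri = 'mongodb://localhost:27017') {
  const client = new MongoClient(uri);
  await client.connect();

  const oplog = client.db('local').collection('oplog.rs');
  const [first] = await oplog.find({}, { projection: { ts: 1 } })
    .sort({ $natural: 1 }).limit(1).toArray();
  const [last] = await oplog.find({}, { projection: { ts: 1 } })
    .sort({ $natural: -1 }).limit(1).toArray();

  // The high 32 bits of a BSON Timestamp encode seconds since the Unix epoch
  const windowSeconds = last.ts.getHighBits() - first.ts.getHighBits();
  console.log(`Oplog window: ~${(windowSeconds / 3600).toFixed(1)} hours`);

  await client.close();
  return windowSeconds;
}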

The Traditional Backup Challenge

Conventional database backup approaches face significant limitations when dealing with large-scale distributed applications:

-- Traditional PostgreSQL backup approach - complex and time-consuming

-- Full database backup (blocks database during backup)
pg_dump --host=localhost --port=5432 --username=postgres \
  --format=custom --blobs --verbose --file=full_backup_20240130.dump \
  --schema=public ecommerce_db;

-- Problems with traditional full backups:
-- 1. Database blocking during backup operations
-- 2. Exponentially growing backup sizes
-- 3. Long recovery times for large databases
-- 4. No granular recovery options
-- 5. Complex coordination across multiple database instances
-- 6. Limited point-in-time recovery capabilities
-- 7. Expensive storage requirements for frequent backups
-- 8. Manual intervention required for disaster recovery scenarios

-- Incremental backup simulation (requires complex custom scripting)
BEGIN;

-- Create backup tracking table
CREATE TABLE IF NOT EXISTS backup_tracking (
    backup_id SERIAL PRIMARY KEY,
    backup_type VARCHAR(20) NOT NULL, -- full, incremental, differential
    backup_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_lsn BIGINT,
    backup_size_bytes BIGINT,
    backup_location TEXT NOT NULL,
    backup_status VARCHAR(20) DEFAULT 'in_progress',
    completion_time TIMESTAMP,
    verification_status VARCHAR(20),
    retention_until TIMESTAMP
);

-- Track WAL position for incremental backups
CREATE TABLE IF NOT EXISTS wal_tracking (
    tracking_id SERIAL PRIMARY KEY,
    checkpoint_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    wal_position BIGINT NOT NULL,
    transaction_count BIGINT,
    database_size_bytes BIGINT,
    active_connections INTEGER
);

COMMIT;

-- Complex stored procedure for incremental backup coordination
CREATE OR REPLACE FUNCTION perform_incremental_backup(
    backup_location TEXT,
    compression_level INTEGER DEFAULT 6
)
RETURNS TABLE (
    backup_id INTEGER,
    backup_size_bytes BIGINT,
    duration_seconds INTEGER,
    success BOOLEAN
) AS $$
DECLARE
    current_lsn BIGINT;
    last_backup_lsn BIGINT;
    backup_start_time TIMESTAMP := clock_timestamp();
    new_backup_id INTEGER;
    backup_command TEXT;
    backup_result INTEGER;
BEGIN
    -- Get current WAL position
    SELECT pg_current_wal_lsn() INTO current_lsn;

    -- Get last backup LSN
    SELECT COALESCE(MAX(last_lsn), 0) 
    INTO last_backup_lsn 
    FROM backup_tracking 
    WHERE backup_status = 'completed';

    -- Check if incremental backup is needed
    IF current_lsn <= last_backup_lsn THEN
        RAISE NOTICE 'No changes since last backup, skipping incremental backup';
        RETURN;
    END IF;

    -- Create backup record
    INSERT INTO backup_tracking (
        backup_type, 
        last_lsn, 
        backup_location
    ) 
    VALUES (
        'incremental', 
        current_lsn, 
        backup_location
    ) 
    RETURNING backup_id INTO new_backup_id;

    -- Perform incremental backup (simplified - actual implementation much more complex)
    -- This would require complex WAL shipping and parsing logic
    backup_command := format(
        'pg_basebackup --host=localhost --username=postgres --wal-method=stream --compress=%s --format=tar --pgdata=%s/incremental_%s',
        compression_level,
        backup_location,
        new_backup_id
    );

    -- Execute backup command (in real implementation)
    -- SELECT * FROM system_command(backup_command) INTO backup_result;
    backup_result := 0; -- Simulate success

    IF backup_result = 0 THEN
        -- Update backup record with completion
        UPDATE backup_tracking 
        SET 
            backup_status = 'completed',
            completion_time = clock_timestamp(),
            backup_size_bytes = pg_database_size(current_database())
        WHERE backup_tracking.backup_id = new_backup_id;

        -- Record WAL tracking information
        INSERT INTO wal_tracking (
            wal_position,
            transaction_count,
            database_size_bytes,
            active_connections
        ) VALUES (
            current_lsn,
            (SELECT sum(xact_commit + xact_rollback) FROM pg_stat_database),
            pg_database_size(current_database()),
            (SELECT count(*) FROM pg_stat_activity WHERE state = 'active')
        );

        RETURN QUERY SELECT 
            new_backup_id,
            pg_database_size(current_database()),
            EXTRACT(EPOCH FROM clock_timestamp() - backup_start_time)::INTEGER,
            TRUE;
    ELSE
        -- Mark backup as failed
        UPDATE backup_tracking 
        SET backup_status = 'failed' 
        WHERE backup_tracking.backup_id = new_backup_id;

        RETURN QUERY SELECT 
            new_backup_id,
            0::BIGINT,
            EXTRACT(EPOCH FROM clock_timestamp() - backup_start_time)::INTEGER,
            FALSE;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Point-in-time recovery simulation (extremely complex in traditional systems)
CREATE OR REPLACE FUNCTION simulate_point_in_time_recovery(
    target_timestamp TIMESTAMP,
    recovery_location TEXT
)
RETURNS TABLE (
    recovery_success BOOLEAN,
    recovered_to_timestamp TIMESTAMP,
    recovery_duration_minutes INTEGER,
    data_loss_minutes INTEGER
) AS $$
DECLARE
    base_backup_id INTEGER;
    target_lsn PG_LSN;
    recovery_start_time TIMESTAMP := clock_timestamp();
    actual_recovery_timestamp TIMESTAMP;
BEGIN
    -- Find appropriate base backup
    SELECT backup_id 
    INTO base_backup_id
    FROM backup_tracking 
    WHERE backup_timestamp <= target_timestamp 
      AND backup_status = 'completed'
      AND backup_type IN ('full', 'differential')
    ORDER BY backup_timestamp DESC 
    LIMIT 1;

    IF base_backup_id IS NULL THEN
        RAISE EXCEPTION 'No suitable base backup found for timestamp %', target_timestamp;
    END IF;

    -- Find target LSN from WAL tracking
    SELECT wal_position 
    INTO target_lsn
    FROM wal_tracking 
    WHERE checkpoint_timestamp <= target_timestamp
    ORDER BY checkpoint_timestamp DESC 
    LIMIT 1;

    -- Simulate complex recovery process
    -- In reality, this involves:
    -- 1. Restoring base backup
    -- 2. Applying WAL files up to target point
    -- 3. Complex validation and consistency checks
    -- 4. Service coordination and failover

    -- Simulate recovery time based on data size and complexity
    PERFORM pg_sleep(
        CASE 
            WHEN pg_database_size(current_database()) > 1073741824 THEN 5 -- Large DB: 5+ minutes
            WHEN pg_database_size(current_database()) > 104857600 THEN 2  -- Medium DB: 2+ minutes
            ELSE 0.5 -- Small DB: 30+ seconds
        END
    );

    actual_recovery_timestamp := target_timestamp - INTERVAL '2 minutes'; -- Simulate slight data loss

    RETURN QUERY SELECT 
        TRUE as recovery_success,
        actual_recovery_timestamp,
        (EXTRACT(EPOCH FROM clock_timestamp() - recovery_start_time) / 60)::INTEGER,
        (EXTRACT(EPOCH FROM target_timestamp - actual_recovery_timestamp) / 60)::INTEGER;

END;
$$ LANGUAGE plpgsql;

-- Disaster recovery coordination (manual and error-prone)
CREATE OR REPLACE FUNCTION coordinate_disaster_recovery(
    disaster_scenario VARCHAR(100),
    recovery_site_location TEXT,
    maximum_data_loss_minutes INTEGER DEFAULT 15
)
RETURNS TABLE (
    step_number INTEGER,
    step_description TEXT,
    step_status VARCHAR(20),
    step_duration_minutes INTEGER,
    success BOOLEAN
) AS $$
DECLARE
    step_counter INTEGER := 0;
    total_start_time TIMESTAMP := clock_timestamp();
    step_start_time TIMESTAMP;
BEGIN
    -- Step 1: Assess disaster scope
    step_counter := step_counter + 1;
    step_start_time := clock_timestamp();

    -- Simulate disaster assessment
    PERFORM pg_sleep(0.5);

    RETURN QUERY SELECT 
        step_counter,
        'Assess disaster scope and determine recovery requirements',
        'completed',
        (EXTRACT(EPOCH FROM clock_timestamp() - step_start_time) / 60)::INTEGER,
        TRUE;

    -- Step 2: Activate disaster recovery site
    step_counter := step_counter + 1;
    step_start_time := clock_timestamp();

    PERFORM pg_sleep(2);

    RETURN QUERY SELECT 
        step_counter,
        'Activate disaster recovery site and initialize infrastructure',
        'completed',
        (EXTRACT(EPOCH FROM clock_timestamp() - step_start_time) / 60)::INTEGER,
        TRUE;

    -- Step 3: Restore latest backup
    step_counter := step_counter + 1;
    step_start_time := clock_timestamp();

    PERFORM pg_sleep(3);

    RETURN QUERY SELECT 
        step_counter,
        'Restore latest full backup to recovery site',
        'completed', 
        (EXTRACT(EPOCH FROM clock_timestamp() - step_start_time) / 60)::INTEGER,
        TRUE;

    -- Step 4: Apply incremental backups and WAL files
    step_counter := step_counter + 1;
    step_start_time := clock_timestamp();

    PERFORM pg_sleep(1.5);

    RETURN QUERY SELECT 
        step_counter,
        'Apply incremental backups and WAL files for point-in-time recovery',
        'completed',
        (EXTRACT(EPOCH FROM clock_timestamp() - step_start_time) / 60)::INTEGER,
        TRUE;

    -- Step 5: Validate data consistency and application connectivity
    step_counter := step_counter + 1;
    step_start_time := clock_timestamp();

    PERFORM pg_sleep(1);

    RETURN QUERY SELECT 
        step_counter,
        'Validate data consistency and test application connectivity',
        'completed',
        (EXTRACT(EPOCH FROM clock_timestamp() - step_start_time) / 60)::INTEGER,
        TRUE;

    -- Step 6: Switch application traffic to recovery site
    step_counter := step_counter + 1;
    step_start_time := clock_timestamp();

    PERFORM pg_sleep(0.5);

    RETURN QUERY SELECT 
        step_counter,
        'Switch application traffic to disaster recovery site',
        'completed',
        (EXTRACT(EPOCH FROM clock_timestamp() - step_start_time) / 60)::INTEGER,
        TRUE;

END;
$$ LANGUAGE plpgsql;

-- Problems with traditional disaster recovery approaches:
-- 1. Complex manual coordination across multiple systems and teams
-- 2. Long recovery times due to sequential restoration process
-- 3. High risk of human error during crisis situations
-- 4. Limited automation and orchestration capabilities
-- 5. Expensive infrastructure duplication requirements
-- 6. Difficult testing and validation of recovery procedures
-- 7. Poor integration with cloud storage and modern infrastructure
-- 8. Limited granular recovery options for specific collections or datasets
-- 9. Complex dependency management across related database systems
-- 10. High operational overhead for maintaining backup infrastructure

MongoDB provides comprehensive backup and recovery capabilities that address these traditional limitations, from native snapshot tooling to oplog-based point-in-time recovery.
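
As a baseline, MongoDB's native utilities already cover consistent snapshots and oplog-aware restores before any custom framework is involved. The sketch below is a minimal illustration of driving mongodump and mongorestore from Node.js; the connection URI and archive path are placeholders, it assumes both tools are installed on the host, and --oplog requires the source deployment to be a replica set:

// Minimal native backup/restore sketch (illustrative assumptions noted above)
const { execFile } = require('child_process');
const { promisify } = require('util');

const run = promisify(execFile);
const MONGO_URI = 'mongodb://localhost:27017'; // placeholder URI

async function snapshotWithMongodump(archivePath) {
  // --oplog captures the oplog tail for a consistent point-in-time snapshot;
  // --gzip compresses the archive stream as it is written
  await run('mongodump', [`--uri=${MONGO_URI}`, `--archive=${archivePath}`, '--gzip', '--oplog']);
}

async function restoreFromArchive(archivePath) {
  // --oplogReplay replays the captured oplog so the restore lands at the snapshot's point in time
  await run('mongorestore', [`--uri=${MONGO_URI}`, `--archive=${archivePath}`, '--gzip', '--oplogReplay']);
}

The manager class below layers scheduling, cataloguing, incremental oplog capture, and disaster recovery orchestration on top of the same primitives: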

// MongoDB Enterprise Backup and Recovery Management System
const { MongoClient, GridFSBucket, Timestamp } = require('mongodb');
const fs = require('fs');
const path = require('path');
const zlib = require('zlib');
const crypto = require('crypto');

// Advanced MongoDB backup and recovery management system
class MongoEnterpriseBackupManager {
  constructor(connectionUri, options = {}) {
    this.client = new MongoClient(connectionUri);
    this.db = null;
    this.gridFS = null;

    // Backup configuration
    this.config = {
      // Backup strategy settings
      backupStrategy: {
        enableFullBackups: options.backupStrategy?.enableFullBackups !== false,
        enableIncrementalBackups: options.backupStrategy?.enableIncrementalBackups !== false,
        fullBackupInterval: options.backupStrategy?.fullBackupInterval || '7d',
        incrementalBackupInterval: options.backupStrategy?.incrementalBackupInterval || '1h',
        retentionPeriod: options.backupStrategy?.retentionPeriod || '90d',
        compressionEnabled: options.backupStrategy?.compressionEnabled !== false,
        encryptionEnabled: options.backupStrategy?.encryptionEnabled || false
      },

      // Storage configuration
      storageSettings: {
        localBackupPath: options.storageSettings?.localBackupPath || './backups',
        cloudStorageEnabled: options.storageSettings?.cloudStorageEnabled || false,
        cloudProvider: options.storageSettings?.cloudProvider || 'aws', // aws, azure, gcp
        cloudBucket: options.storageSettings?.cloudBucket || 'mongodb-backups',
        storageClass: options.storageSettings?.storageClass || 'standard' // standard, infrequent, archive
      },

      // Recovery configuration
      recoverySettings: {
        enablePointInTimeRecovery: options.recoverySettings?.enablePointInTimeRecovery !== false,
        oplogRetentionHours: options.recoverySettings?.oplogRetentionHours || 72,
        parallelRecoveryThreads: options.recoverySettings?.parallelRecoveryThreads || 4,
        recoveryValidationEnabled: options.recoverySettings?.recoveryValidationEnabled !== false
      },

      // Disaster recovery configuration
      disasterRecovery: {
        enableCrossRegionReplication: options.disasterRecovery?.enableCrossRegionReplication || false,
        replicationRegions: options.disasterRecovery?.replicationRegions || [],
        automaticFailover: options.disasterRecovery?.automaticFailover || false,
        rpoMinutes: options.disasterRecovery?.rpoMinutes || 15, // Recovery Point Objective
        rtoMinutes: options.disasterRecovery?.rtoMinutes || 30   // Recovery Time Objective
      }
    };

    // Backup state tracking
    this.backupState = {
      lastFullBackup: null,
      lastIncrementalBackup: null,
      activeBackupOperations: new Map(),
      backupHistory: new Map(),
      recoveryOperations: new Map()
    };

    // Performance metrics
    this.metrics = {
      totalBackupsCreated: 0,
      totalDataBackedUp: 0,
      totalRecoveryOperations: 0,
      averageBackupTime: 0,
      averageRecoveryTime: 0,
      backupSuccessRate: 100,
      lastBackupTimestamp: null
    };
  }

  async initialize(databaseName) {
    console.log('Initializing MongoDB Enterprise Backup Manager...');

    try {
      await this.client.connect();
      this.db = this.client.db(databaseName);
      this.gridFS = new GridFSBucket(this.db, { bucketName: 'backups' });

      // Setup backup management collections
      await this.setupBackupCollections();

      // Initialize backup storage directories
      await this.initializeBackupStorage();

      // Load existing backup history
      await this.loadBackupHistory();

      // Setup automated backup scheduling if enabled
      if (this.config.backupStrategy.enableFullBackups || 
          this.config.backupStrategy.enableIncrementalBackups) {
        this.setupAutomatedBackups();
      }

      console.log('MongoDB Enterprise Backup Manager initialized successfully');

    } catch (error) {
      console.error('Error initializing backup manager:', error);
      throw error;
    }
  }

  // Create comprehensive full backup
  async createFullBackup(options = {}) {
    console.log('Starting full backup operation...');

    const backupId = this.generateBackupId();
    const startTime = Date.now();

    try {
      // Initialize backup operation tracking
      const backupOperation = {
        backupId: backupId,
        backupType: 'full',
        startTime: new Date(startTime),
        status: 'in_progress',
        collections: [],
        totalDocuments: 0,
        totalSize: 0,
        compressionRatio: 0,
        encryptionEnabled: this.config.backupStrategy.encryptionEnabled
      };

      this.backupState.activeBackupOperations.set(backupId, backupOperation);

      // Get list of collections to backup
      const collections = options.collections || await this.getBackupCollections();
      backupOperation.collections = collections.map(c => c.name);

      console.log(`Backing up ${collections.length} collections...`);

      // Create backup metadata
      const backupMetadata = {
        backupId: backupId,
        backupType: 'full',
        timestamp: new Date(),
        databaseName: this.db.databaseName,
        collections: collections.map(c => ({
          name: c.name,
          documentCount: 0,
          avgDocSize: 0,
          totalSize: 0,
          indexes: []
        })),
        backupSize: 0,
        compressionEnabled: this.config.backupStrategy.compressionEnabled,
        encryptionEnabled: this.config.backupStrategy.encryptionEnabled,
        version: '1.0'
      };

      // Backup each collection with metadata
      for (const collectionInfo of collections) {
        const collectionBackup = await this.backupCollection(
          collectionInfo.name, 
          backupId, 
          'full',
          options
        );

        // Update metadata
        const collectionMeta = backupMetadata.collections.find(c => c.name === collectionInfo.name);
        collectionMeta.documentCount = collectionBackup.documentCount;
        collectionMeta.avgDocSize = collectionBackup.avgDocSize;
        collectionMeta.totalSize = collectionBackup.totalSize;
        collectionMeta.indexes = collectionBackup.indexes;

        backupOperation.totalDocuments += collectionBackup.documentCount;
        backupOperation.totalSize += collectionBackup.totalSize;
      }

      // Backup database metadata and indexes
      await this.backupDatabaseMetadata(backupId, backupMetadata);

      // Create backup manifest
      const backupManifest = await this.createBackupManifest(backupId, backupMetadata);

      // Store backup in GridFS
      await this.storeBackupInGridFS(backupId, backupManifest);

      // Upload to cloud storage if enabled
      if (this.config.storageSettings.cloudStorageEnabled) {
        await this.uploadToCloudStorage(backupId, backupManifest);
      }

      // Calculate final metrics
      const endTime = Date.now();
      const duration = endTime - startTime;

      backupOperation.status = 'completed';
      backupOperation.endTime = new Date(endTime);
      backupOperation.duration = duration;
      backupOperation.compressionRatio = this.calculateCompressionRatio(backupOperation.totalSize, backupManifest.compressedSize);

      // Update backup history
      this.backupState.backupHistory.set(backupId, backupOperation);
      this.backupState.lastFullBackup = backupOperation;
      this.backupState.activeBackupOperations.delete(backupId);

      // Update metrics
      this.updateBackupMetrics(backupOperation);

      // Log backup completion
      await this.logBackupOperation(backupOperation);

      console.log(`Full backup completed successfully: ${backupId}`);
      console.log(`Duration: ${Math.round(duration / 1000)}s, Size: ${Math.round(backupOperation.totalSize / 1024 / 1024)}MB`);

      return {
        backupId: backupId,
        backupType: 'full',
        duration: duration,
        totalSize: backupOperation.totalSize,
        collections: backupOperation.collections.length,
        totalDocuments: backupOperation.totalDocuments,
        compressionRatio: backupOperation.compressionRatio,
        success: true
      };

    } catch (error) {
      console.error(`Full backup failed: ${backupId}`, error);

      // Update backup operation status
      const backupOperation = this.backupState.activeBackupOperations.get(backupId);
      if (backupOperation) {
        backupOperation.status = 'failed';
        backupOperation.error = error.message;
        backupOperation.endTime = new Date();

        // Move to history
        this.backupState.backupHistory.set(backupId, backupOperation);
        this.backupState.activeBackupOperations.delete(backupId);
      }

      throw error;
    }
  }

  // Create incremental backup based on oplog
  async createIncrementalBackup(options = {}) {
    console.log('Starting incremental backup operation...');

    if (!this.backupState.lastFullBackup) {
      throw new Error('No full backup found. Full backup required before incremental backup.');
    }

    const backupId = this.generateBackupId();
    const startTime = Date.now();

    try {
      // Get oplog entries since last backup
      const lastBackupTime = this.backupState.lastIncrementalBackup?.endTime || 
                             this.backupState.lastFullBackup.endTime;

      const oplogEntries = await this.getOplogEntries(lastBackupTime, options);

      console.log(`Processing ${oplogEntries.length} oplog entries for incremental backup...`);

      const backupOperation = {
        backupId: backupId,
        backupType: 'incremental',
        startTime: new Date(startTime),
        status: 'in_progress',
        baseBackupId: this.backupState.lastFullBackup.backupId,
        oplogEntries: oplogEntries.length,
        affectedCollections: new Set(),
        totalSize: 0
      };

      this.backupState.activeBackupOperations.set(backupId, backupOperation);

      // Process oplog entries and create incremental backup data
      const incrementalData = await this.processOplogForBackup(oplogEntries, backupId);

      // Update operation with processed data
      backupOperation.affectedCollections = Array.from(incrementalData.affectedCollections);
      backupOperation.totalSize = incrementalData.totalSize;

      // Create incremental backup manifest
      const incrementalManifest = {
        backupId: backupId,
        backupType: 'incremental',
        timestamp: new Date(),
        baseBackupId: this.backupState.lastFullBackup.backupId,
        oplogStartTime: lastBackupTime,
        oplogEndTime: new Date(),
        oplogEntries: oplogEntries.length,
        affectedCollections: backupOperation.affectedCollections,
        incrementalSize: incrementalData.totalSize
      };

      // Store incremental backup
      await this.storeIncrementalBackup(backupId, incrementalData, incrementalManifest);

      // Upload to cloud storage if enabled
      if (this.config.storageSettings.cloudStorageEnabled) {
        await this.uploadIncrementalToCloud(backupId, incrementalManifest);
      }

      // Complete backup operation
      const endTime = Date.now();
      const duration = endTime - startTime;

      backupOperation.status = 'completed';
      backupOperation.endTime = new Date(endTime);
      backupOperation.duration = duration;

      // Update backup state
      this.backupState.backupHistory.set(backupId, backupOperation);
      this.backupState.lastIncrementalBackup = backupOperation;
      this.backupState.activeBackupOperations.delete(backupId);

      // Update metrics
      this.updateBackupMetrics(backupOperation);

      // Log backup completion
      await this.logBackupOperation(backupOperation);

      console.log(`Incremental backup completed successfully: ${backupId}`);
      console.log(`Duration: ${Math.round(duration / 1000)}s, Oplog entries: ${oplogEntries.length}`);

      return {
        backupId: backupId,
        backupType: 'incremental',
        duration: duration,
        oplogEntries: oplogEntries.length,
        affectedCollections: backupOperation.affectedCollections.length,
        totalSize: backupOperation.totalSize,
        success: true
      };

    } catch (error) {
      console.error(`Incremental backup failed: ${backupId}`, error);

      const backupOperation = this.backupState.activeBackupOperations.get(backupId);
      if (backupOperation) {
        backupOperation.status = 'failed';
        backupOperation.error = error.message;
        backupOperation.endTime = new Date();

        this.backupState.backupHistory.set(backupId, backupOperation);
        this.backupState.activeBackupOperations.delete(backupId);
      }

      throw error;
    }
  }

  // Advanced point-in-time recovery
  async performPointInTimeRecovery(targetTimestamp, options = {}) {
    console.log(`Starting point-in-time recovery to ${targetTimestamp}...`);

    const recoveryId = this.generateRecoveryId();
    const startTime = Date.now();

    try {
      // Find appropriate backup chain for target timestamp
      const backupChain = await this.findBackupChain(targetTimestamp);

      if (!backupChain || backupChain.length === 0) {
        throw new Error(`No suitable backup found for timestamp: ${targetTimestamp}`);
      }

      console.log(`Using backup chain: ${backupChain.map(b => b.backupId).join(' -> ')}`);

      const recoveryOperation = {
        recoveryId: recoveryId,
        recoveryType: 'point_in_time',
        targetTimestamp: targetTimestamp,
        startTime: new Date(startTime),
        status: 'in_progress',
        backupChain: backupChain,
        recoveryDatabase: options.recoveryDatabase || `${this.db.databaseName}_recovery_${recoveryId}`,
        totalSteps: 0,
        completedSteps: 0
      };

      this.backupState.recoveryOperations.set(recoveryId, recoveryOperation);

      // Create recovery database
      const recoveryDb = this.client.db(recoveryOperation.recoveryDatabase);

      // Step 1: Restore base full backup
      console.log('Restoring base full backup...');
      await this.restoreFullBackup(backupChain[0], recoveryDb, recoveryOperation);
      recoveryOperation.completedSteps++;

      // Step 2: Apply incremental backups in sequence
      for (let i = 1; i < backupChain.length; i++) {
        console.log(`Applying incremental backup ${i}/${backupChain.length - 1}...`);
        await this.applyIncrementalBackup(backupChain[i], recoveryDb, recoveryOperation);
        recoveryOperation.completedSteps++;
      }

      // Step 3: Apply oplog entries up to target timestamp
      console.log('Applying oplog entries for point-in-time recovery...');
      await this.applyOplogToTimestamp(targetTimestamp, recoveryDb, recoveryOperation);
      recoveryOperation.completedSteps++;

      // Step 4: Validate recovered database
      if (this.config.recoverySettings.recoveryValidationEnabled) {
        console.log('Validating recovered database...');
        await this.validateRecoveredDatabase(recoveryDb, recoveryOperation);
        recoveryOperation.completedSteps++;
      }

      // Complete recovery operation
      const endTime = Date.now();
      const duration = endTime - startTime;

      recoveryOperation.status = 'completed';
      recoveryOperation.endTime = new Date(endTime);
      recoveryOperation.duration = duration;
      recoveryOperation.actualRecoveryTimestamp = await this.getLatestTimestampFromDb(recoveryDb);

      // Calculate data loss
      const dataLoss = targetTimestamp - recoveryOperation.actualRecoveryTimestamp;
      recoveryOperation.dataLossMs = Math.max(0, dataLoss);

      // Update metrics
      this.updateRecoveryMetrics(recoveryOperation);

      // Log recovery completion
      await this.logRecoveryOperation(recoveryOperation);

      console.log(`Point-in-time recovery completed successfully: ${recoveryId}`);
      console.log(`Recovery database: ${recoveryOperation.recoveryDatabase}`);
      console.log(`Duration: ${Math.round(duration / 1000)}s, Data loss: ${Math.round(dataLoss / 1000)}s`);

      return {
        recoveryId: recoveryId,
        recoveryType: 'point_in_time',
        duration: duration,
        recoveryDatabase: recoveryOperation.recoveryDatabase,
        actualRecoveryTimestamp: recoveryOperation.actualRecoveryTimestamp,
        dataLossMs: recoveryOperation.dataLossMs,
        backupChainLength: backupChain.length,
        success: true
      };

    } catch (error) {
      console.error(`Point-in-time recovery failed: ${recoveryId}`, error);

      const recoveryOperation = this.backupState.recoveryOperations.get(recoveryId);
      if (recoveryOperation) {
        recoveryOperation.status = 'failed';
        recoveryOperation.error = error.message;
        recoveryOperation.endTime = new Date();
      }

      throw error;
    }
  }

  // Disaster recovery orchestration
  async orchestrateDisasterRecovery(disasterScenario, options = {}) {
    console.log(`Orchestrating disaster recovery for scenario: ${disasterScenario}`);

    const recoveryId = this.generateRecoveryId();
    const startTime = Date.now();

    try {
      const disasterRecoveryOperation = {
        recoveryId: recoveryId,
        recoveryType: 'disaster_recovery',
        disasterScenario: disasterScenario,
        startTime: new Date(startTime),
        status: 'in_progress',
        steps: [],
        currentStep: 0,
        recoveryRegion: options.recoveryRegion || 'primary',
        targetRPO: this.config.disasterRecovery.rpoMinutes,
        targetRTO: this.config.disasterRecovery.rtoMinutes
      };

      this.backupState.recoveryOperations.set(recoveryId, disasterRecoveryOperation);

      // Define disaster recovery steps
      const recoverySteps = [
        {
          step: 1,
          description: 'Assess disaster scope and activate recovery procedures',
          action: this.assessDisasterScope.bind(this),
          estimatedDuration: 2
        },
        {
          step: 2, 
          description: 'Initialize disaster recovery infrastructure',
          action: this.initializeRecoveryInfrastructure.bind(this),
          estimatedDuration: 5
        },
        {
          step: 3,
          description: 'Locate and prepare latest backup chain',
          action: this.prepareDisasterRecoveryBackups.bind(this),
          estimatedDuration: 3
        },
        {
          step: 4,
          description: 'Restore database from backup chain',
          action: this.restoreDisasterRecoveryDatabase.bind(this),
          estimatedDuration: 15
        },
        {
          step: 5,
          description: 'Validate data consistency and integrity',
          action: this.validateDisasterRecoveryDatabase.bind(this),
          estimatedDuration: 3
        },
        {
          step: 6,
          description: 'Switch application traffic to recovery site',
          action: this.switchToRecoverySite.bind(this),
          estimatedDuration: 2
        }
      ];

      disasterRecoveryOperation.steps = recoverySteps;
      disasterRecoveryOperation.totalSteps = recoverySteps.length;

      // Execute recovery steps sequentially
      for (const step of recoverySteps) {
        console.log(`Executing step ${step.step}: ${step.description}`);
        disasterRecoveryOperation.currentStep = step.step;

        const stepStartTime = Date.now();

        try {
          await step.action(disasterRecoveryOperation, options);

          step.status = 'completed';
          step.actualDuration = Math.round((Date.now() - stepStartTime) / 1000 / 60);

          console.log(`Step ${step.step} completed in ${step.actualDuration} minutes`);

        } catch (stepError) {
          step.status = 'failed';
          step.error = stepError.message;
          step.actualDuration = Math.round((Date.now() - stepStartTime) / 1000 / 60);

          console.error(`Step ${step.step} failed:`, stepError);
          throw stepError;
        }
      }

      // Complete disaster recovery
      const endTime = Date.now();
      const totalDuration = Math.round((endTime - startTime) / 1000 / 60);

      disasterRecoveryOperation.status = 'completed';
      disasterRecoveryOperation.endTime = new Date(endTime);
      disasterRecoveryOperation.totalDuration = totalDuration;
      disasterRecoveryOperation.rtoAchieved = totalDuration <= this.config.disasterRecovery.rtoMinutes;

      // Update metrics
      this.updateRecoveryMetrics(disasterRecoveryOperation);

      // Log disaster recovery completion
      await this.logRecoveryOperation(disasterRecoveryOperation);

      console.log(`Disaster recovery completed successfully: ${recoveryId}`);
      console.log(`Total duration: ${totalDuration} minutes (RTO target: ${this.config.disasterRecovery.rtoMinutes} minutes)`);

      return {
        recoveryId: recoveryId,
        recoveryType: 'disaster_recovery',
        totalDuration: totalDuration,
        rtoAchieved: disasterRecoveryOperation.rtoAchieved,
        stepsCompleted: recoverySteps.filter(s => s.status === 'completed').length,
        totalSteps: recoverySteps.length,
        success: true
      };

    } catch (error) {
      console.error(`Disaster recovery failed: ${recoveryId}`, error);

      const recoveryOperation = this.backupState.recoveryOperations.get(recoveryId);
      if (recoveryOperation) {
        recoveryOperation.status = 'failed';
        recoveryOperation.error = error.message;
        recoveryOperation.endTime = new Date();
      }

      throw error;
    }
  }

  // Backup individual collection with compression and encryption
  async backupCollection(collectionName, backupId, backupType, options) {
    console.log(`Backing up collection: ${collectionName}`);

    const collection = this.db.collection(collectionName);
    const backupData = {
      collectionName: collectionName,
      backupId: backupId,
      backupType: backupType,
      timestamp: new Date(),
      documents: [],
      indexes: [],
      documentCount: 0,
      totalSize: 0,
      avgDocSize: 0
    };

    try {
      // Get collection stats via the collStats command (collection.stats() is deprecated in newer drivers)
      const stats = await this.db.command({ collStats: collectionName });
      backupData.documentCount = stats.count || 0;
      backupData.totalSize = stats.size || 0;
      backupData.avgDocSize = backupData.documentCount > 0 ? backupData.totalSize / backupData.documentCount : 0;

      // Backup collection indexes
      const indexes = await collection.listIndexes().toArray();
      backupData.indexes = indexes.filter(idx => idx.name !== '_id_'); // Exclude default _id index

      // Stream collection documents for memory-efficient backup
      const cursor = collection.find({});
      const documents = [];

      while (await cursor.hasNext()) {
        const doc = await cursor.next();
        documents.push(doc);

        // Process in batches to manage memory usage
        if (documents.length >= 1000) {
          await this.processBatch(documents, backupData, backupId, collectionName);
          documents.length = 0; // Clear batch
        }
      }

      // Process remaining documents
      if (documents.length > 0) {
        await this.processBatch(documents, backupData, backupId, collectionName);
      }

      console.log(`Collection backup completed: ${collectionName} (${backupData.documentCount} documents)`);

      return backupData;

    } catch (error) {
      console.error(`Error backing up collection ${collectionName}:`, error);
      throw error;
    }
  }

  // Process document batch with compression and encryption
  async processBatch(documents, backupData, backupId, collectionName) {
    // Serialize documents to JSON
    const batchData = JSON.stringify(documents);

    // Apply compression if enabled
    let processedData = Buffer.from(batchData, 'utf8');
    if (this.config.backupStrategy.compressionEnabled) {
      processedData = zlib.gzipSync(processedData);
    }

    // Apply encryption if enabled  
    if (this.config.backupStrategy.encryptionEnabled) {
      processedData = this.encryptData(processedData);
    }

    // Store batch data (implementation would store to GridFS or file system)
    const batchId = `${backupId}_${collectionName}_${Date.now()}`;
    await this.storeBatch(batchId, processedData);

    backupData.documents.push({
      batchId: batchId,
      documentCount: documents.length,
      compressedSize: processedData.length,
      originalSize: Buffer.byteLength(batchData, 'utf8')
    });
  }

  // Get oplog entries for incremental backup
  async getOplogEntries(fromTimestamp, options = {}) {
    console.log(`Retrieving oplog entries from ${fromTimestamp}...`);

    try {
      const oplogDb = this.client.db('local');
      const oplogCollection = oplogDb.collection('oplog.rs');

      // Oplog `ts` values are BSON Timestamps, so convert the Date boundary before querying
      const fromTs = new Timestamp({ t: Math.floor(new Date(fromTimestamp).getTime() / 1000), i: 0 });

      // Query oplog for entries since last backup
      const query = {
        ts: { $gt: fromTs },
        ns: { $regex: `^${this.db.databaseName}\\.` }, // Only our database
        op: { $in: ['i', 'u', 'd'] } // Insert, update, delete operations
      };

      // Exclude certain collections from oplog backup
      const excludeCollections = options.excludeCollections || ['backups.files', 'backups.chunks'];
      if (excludeCollections.length > 0) {
        query.ns = {
          $regex: `^${this.db.databaseName}\\.`,
          $nin: excludeCollections.map(col => `${this.db.databaseName}.${col}`)
        };
      }

      const oplogEntries = await oplogCollection
        .find(query)
        .sort({ ts: 1 })
        .limit(options.maxEntries || 100000)
        .toArray();

      console.log(`Retrieved ${oplogEntries.length} oplog entries`);

      return oplogEntries;

    } catch (error) {
      console.error('Error retrieving oplog entries:', error);
      throw error;
    }
  }

  // Process oplog entries for incremental backup
  async processOplogForBackup(oplogEntries, backupId) {
    console.log('Processing oplog entries for incremental backup...');

    const incrementalData = {
      backupId: backupId,
      oplogEntries: oplogEntries,
      affectedCollections: new Set(),
      totalSize: 0,
      operationCounts: {
        inserts: 0,
        updates: 0,
        deletes: 0
      }
    };

    // Group oplog entries by collection
    const collectionOps = new Map();

    for (const entry of oplogEntries) {
      const collectionName = entry.ns.slice(entry.ns.indexOf('.') + 1); // full collection name (may contain dots)
      incrementalData.affectedCollections.add(collectionName);

      if (!collectionOps.has(collectionName)) {
        collectionOps.set(collectionName, []);
      }
      collectionOps.get(collectionName).push(entry);

      // Count operation types
      switch (entry.op) {
        case 'i': incrementalData.operationCounts.inserts++; break;
        case 'u': incrementalData.operationCounts.updates++; break;  
        case 'd': incrementalData.operationCounts.deletes++; break;
      }
    }

    // Process and store oplog data per collection
    for (const [collectionName, ops] of collectionOps) {
      const collectionOplogData = JSON.stringify(ops);
      let processedData = Buffer.from(collectionOplogData, 'utf8');

      // Apply compression
      if (this.config.backupStrategy.compressionEnabled) {
        processedData = zlib.gzipSync(processedData);
      }

      // Apply encryption
      if (this.config.backupStrategy.encryptionEnabled) {
        processedData = this.encryptData(processedData);
      }

      // Store incremental data
      const incrementalId = `${backupId}_oplog_${collectionName}`;
      await this.storeIncrementalData(incrementalId, processedData);

      incrementalData.totalSize += processedData.length;
    }

    console.log(`Processed oplog for ${incrementalData.affectedCollections.size} collections`);

    return incrementalData;
  }

  // Comprehensive backup analytics and monitoring
  async getBackupAnalytics(timeRange = '30d') {
    console.log('Generating backup and recovery analytics...');

    const timeRanges = {
      '1d': 1,
      '7d': 7,
      '30d': 30,
      '90d': 90
    };

    const days = timeRanges[timeRange] || 30;
    const startDate = new Date(Date.now() - (days * 24 * 60 * 60 * 1000));

    try {
      // Get backup history from database
      const backupHistory = await this.db.collection('backup_operations')
        .find({
          startTime: { $gte: startDate }
        })
        .sort({ startTime: -1 })
        .toArray();

      // Get recovery history
      const recoveryHistory = await this.db.collection('recovery_operations')
        .find({
          startTime: { $gte: startDate }
        })
        .sort({ startTime: -1 })
        .toArray();

      // Calculate analytics
      const analytics = {
        reportGeneratedAt: new Date(),
        timeRange: timeRange,

        // Backup statistics
        backupStatistics: {
          totalBackups: backupHistory.length,
          fullBackups: backupHistory.filter(b => b.backupType === 'full').length,
          incrementalBackups: backupHistory.filter(b => b.backupType === 'incremental').length,
          successfulBackups: backupHistory.filter(b => b.status === 'completed').length,
          failedBackups: backupHistory.filter(b => b.status === 'failed').length,
          successRate: backupHistory.length > 0 
            ? (backupHistory.filter(b => b.status === 'completed').length / backupHistory.length) * 100 
            : 0,

          // Size and performance metrics
          totalDataBackedUp: backupHistory
            .filter(b => b.status === 'completed')
            .reduce((sum, b) => sum + (b.totalSize || 0), 0),
          averageBackupSize: 0,
          averageBackupDuration: 0,
          averageCompressionRatio: 0
        },

        // Recovery statistics  
        recoveryStatistics: {
          totalRecoveryOperations: recoveryHistory.length,
          pointInTimeRecoveries: recoveryHistory.filter(r => r.recoveryType === 'point_in_time').length,
          disasterRecoveries: recoveryHistory.filter(r => r.recoveryType === 'disaster_recovery').length,
          successfulRecoveries: recoveryHistory.filter(r => r.status === 'completed').length,
          failedRecoveries: recoveryHistory.filter(r => r.status === 'failed').length,
          recoverySuccessRate: recoveryHistory.length > 0 
            ? (recoveryHistory.filter(r => r.status === 'completed').length / recoveryHistory.length) * 100 
            : 0,

          // Performance metrics
          averageRecoveryDuration: 0,
          averageDataLoss: 0,
          rtoCompliance: 0,
          rpoCompliance: 0
        },

        // System health indicators
        systemHealth: {
          backupFrequency: this.calculateBackupFrequency(backupHistory),
          storageUtilization: await this.calculateStorageUtilization(),
          lastSuccessfulBackup: backupHistory.find(b => b.status === 'completed'),
          nextScheduledBackup: this.getNextScheduledBackup(),
          alertsAndWarnings: []
        },

        // Detailed backup history
        recentBackups: backupHistory.slice(0, 10),
        recentRecoveries: recoveryHistory.slice(0, 5)
      };

      // Calculate averages
      const completedBackups = backupHistory.filter(b => b.status === 'completed');
      if (completedBackups.length > 0) {
        analytics.backupStatistics.averageBackupSize = 
          analytics.backupStatistics.totalDataBackedUp / completedBackups.length;
        analytics.backupStatistics.averageBackupDuration = 
          completedBackups.reduce((sum, b) => sum + (b.duration || 0), 0) / completedBackups.length;
        analytics.backupStatistics.averageCompressionRatio = 
          completedBackups.reduce((sum, b) => sum + (b.compressionRatio || 1), 0) / completedBackups.length;
      }

      const completedRecoveries = recoveryHistory.filter(r => r.status === 'completed');
      if (completedRecoveries.length > 0) {
        analytics.recoveryStatistics.averageRecoveryDuration = 
          completedRecoveries.reduce((sum, r) => sum + (r.duration || 0), 0) / completedRecoveries.length;
        analytics.recoveryStatistics.averageDataLoss = 
          completedRecoveries.reduce((sum, r) => sum + (r.dataLossMs || 0), 0) / completedRecoveries.length;
      }

      // Generate alerts and warnings
      analytics.systemHealth.alertsAndWarnings = this.generateHealthAlerts(analytics);

      return analytics;

    } catch (error) {
      console.error('Error generating backup analytics:', error);
      throw error;
    }
  }

  // Utility methods
  async setupBackupCollections() {
    // Create indexes for backup management collections
    await this.db.collection('backup_operations').createIndexes([
      { key: { backupId: 1 }, unique: true },
      { key: { backupType: 1, startTime: -1 } },
      { key: { status: 1, startTime: -1 } },
      { key: { startTime: -1 } }
    ]);

    await this.db.collection('recovery_operations').createIndexes([
      { key: { recoveryId: 1 }, unique: true },
      { key: { recoveryType: 1, startTime: -1 } },
      { key: { status: 1, startTime: -1 } }
    ]);
  }

  async initializeBackupStorage() {
    // Create backup storage directories
    if (!fs.existsSync(this.config.storageSettings.localBackupPath)) {
      fs.mkdirSync(this.config.storageSettings.localBackupPath, { recursive: true });
    }
  }

  generateBackupId() {
    return `backup_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
  }

  generateRecoveryId() {
    return `recovery_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
  }

  calculateCompressionRatio(originalSize, compressedSize) {
    // Guard against missing or zero compressed sizes to avoid NaN/Infinity ratios
    return originalSize > 0 && compressedSize > 0 ? originalSize / compressedSize : 1;
  }

  encryptData(data) {
    // Simplified encryption - in production, derive keys from a KMS/secret store
    // and persist the IV (and an auth tag if using GCM) alongside the ciphertext
    const key = crypto.scryptSync('backup-encryption-key', 'backup-salt', 32);
    const iv = crypto.randomBytes(16);
    const cipher = crypto.createCipheriv('aes-256-cbc', key, iv);
    return Buffer.concat([iv, cipher.update(data), cipher.final()]);
  }

  async storeBatch(batchId, data) {
    // Store batch data in GridFS
    const uploadStream = this.gridFS.openUploadStream(batchId);
    uploadStream.end(data);
    return new Promise((resolve, reject) => {
      uploadStream.on('finish', resolve);
      uploadStream.on('error', reject);
    });
  }

  async logBackupOperation(backupOperation) {
    await this.db.collection('backup_operations').insertOne({
      ...backupOperation,
      loggedAt: new Date()
    });
  }

  async logRecoveryOperation(recoveryOperation) {
    await this.db.collection('recovery_operations').insertOne({
      ...recoveryOperation,
      loggedAt: new Date()
    });
  }

  // Placeholder methods for complex operations (stubs for every helper referenced above)
  async getBackupCollections() { /* Implementation */ return []; }
  async backupDatabaseMetadata(backupId, metadata) { /* Implementation */ }
  async createBackupManifest(backupId, metadata) { /* Implementation */ return {}; }
  async storeBackupInGridFS(backupId, manifest) { /* Implementation */ }
  async uploadToCloudStorage(backupId, manifest) { /* Implementation */ }
  async storeIncrementalBackup(backupId, data, manifest) { /* Implementation */ }
  async storeIncrementalData(incrementalId, data) { /* Implementation */ }
  async uploadIncrementalToCloud(backupId, manifest) { /* Implementation */ }
  async loadBackupHistory() { /* Implementation */ }
  setupAutomatedBackups() { /* Implementation */ }
  async findBackupChain(timestamp) { /* Implementation */ return []; }
  async restoreFullBackup(backup, db, operation) { /* Implementation */ }
  async applyIncrementalBackup(backup, db, operation) { /* Implementation */ }
  async applyOplogToTimestamp(timestamp, db, operation) { /* Implementation */ }
  async validateRecoveredDatabase(db, operation) { /* Implementation */ }
  async getLatestTimestampFromDb(db) { /* Implementation */ return new Date(); }
  async assessDisasterScope(operation, options) { /* Implementation */ }
  async initializeRecoveryInfrastructure(operation, options) { /* Implementation */ }
  async prepareDisasterRecoveryBackups(operation, options) { /* Implementation */ }
  async restoreDisasterRecoveryDatabase(operation, options) { /* Implementation */ }
  async validateDisasterRecoveryDatabase(operation, options) { /* Implementation */ }
  async switchToRecoverySite(operation, options) { /* Implementation */ }
  async calculateStorageUtilization() { /* Implementation */ return 0; }
  calculateBackupFrequency(backupHistory) { /* Implementation */ return null; }
  getNextScheduledBackup() { /* Implementation */ return null; }
  generateHealthAlerts(analytics) { /* Implementation */ return []; }

  updateBackupMetrics(operation) {
    this.metrics.totalBackupsCreated++;
    this.metrics.totalDataBackedUp += operation.totalSize || 0;
    this.metrics.lastBackupTimestamp = operation.endTime;
  }

  updateRecoveryMetrics(operation) {
    this.metrics.totalRecoveryOperations++;
    // Update other recovery metrics
  }
}

// Example usage demonstrating comprehensive backup and recovery
async function demonstrateEnterpriseBackupRecovery() {
  const backupManager = new MongoEnterpriseBackupManager('mongodb://localhost:27017');

  try {
    await backupManager.initialize('production_ecommerce');

    console.log('Performing full backup...');
    const fullBackupResult = await backupManager.createFullBackup();
    console.log('Full backup result:', fullBackupResult);

    // Simulate some data changes
    console.log('Simulating data changes...');
    await new Promise(resolve => setTimeout(resolve, 2000));

    console.log('Performing incremental backup...');
    const incrementalBackupResult = await backupManager.createIncrementalBackup();
    console.log('Incremental backup result:', incrementalBackupResult);

    // Demonstrate point-in-time recovery
    const recoveryTimestamp = new Date(Date.now() - 60000); // 1 minute ago
    console.log('Performing point-in-time recovery...');
    const recoveryResult = await backupManager.performPointInTimeRecovery(recoveryTimestamp);
    console.log('Recovery result:', recoveryResult);

    // Generate analytics report
    const analytics = await backupManager.getBackupAnalytics('30d');
    console.log('Backup Analytics:', JSON.stringify(analytics, null, 2));

  } catch (error) {
    console.error('Backup and recovery demonstration error:', error);
  }
}

module.exports = {
  MongoEnterpriseBackupManager,
  demonstrateEnterpriseBackupRecovery
};
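
The setupAutomatedBackups() hook referenced during initialization is left abstract above. A minimal scheduling sketch, assuming the '7d'/'1h' interval strings from the default configuration and deliberately omitting overlap protection and persistence, might look like this:

// Hypothetical scheduler for the setupAutomatedBackups() stub above
function parseIntervalMs(interval) {
  // Accepts simple duration strings such as '30s', '15m', '4h', '7d'
  const match = /^(\d+)([smhd])$/.exec(interval);
  if (!match) throw new Error(`Unsupported interval: ${interval}`);
  const unitMs = { s: 1000, m: 60 * 1000, h: 60 * 60 * 1000, d: 24 * 60 * 60 * 1000 };
  return Number(match[1]) * unitMs[match[2]];
}

function scheduleBackups(manager) {
  const { fullBackupInterval, incrementalBackupInterval } = manager.config.backupStrategy;

  // Full backups on the long interval, incremental (oplog-based) backups on the short one
  setInterval(() => {
    manager.createFullBackup().catch(err => console.error('Scheduled full backup failed:', err));
  }, parseIntervalMs(fullBackupInterval));

  setInterval(() => {
    manager.createIncrementalBackup().catch(err => console.error('Scheduled incremental backup failed:', err));
  }, parseIntervalMs(incrementalBackupInterval));
}

A production scheduler would also skip runs while a previous backup is still active and persist its schedule state across process restarts.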

QueryLeaf Backup and Recovery Integration

QueryLeaf provides SQL-familiar syntax for MongoDB backup and recovery operations, so backup strategies, restores, and monitoring queries can be expressed as declarative statements rather than bespoke scripts.
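
Conceptually, each QueryLeaf statement translates into driver-level operations like those implemented by the manager class above. The sketch below shows a hypothetical equivalent of a scheduled full backup; the option names and collection list are illustrative, not QueryLeaf's actual translation output:

// Hypothetical driver-level equivalent of a QueryLeaf full-backup statement
async function runWeeklyFullBackup() {
  const backupManager = new MongoEnterpriseBackupManager('mongodb://localhost:27017', {
    backupStrategy: { compressionEnabled: true, encryptionEnabled: true }
  });

  await backupManager.initialize('ecommerce');

  // Collections are normally discovered automatically; listed here only for illustration
  const result = await backupManager.createFullBackup({
    collections: [{ name: 'orders' }, { name: 'customers' }, { name: 'products' }]
  });

  console.log(`Backup ${result.backupId} finished in ${Math.round(result.duration / 1000)}s`);
}

The SQL-style statements below show the declarative surface QueryLeaf exposes for the same workflows: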

-- QueryLeaf backup and recovery with SQL-style commands

-- Create comprehensive backup strategy configuration
CREATE BACKUP_STRATEGY enterprise_production AS (
  -- Strategy identification
  strategy_name = 'enterprise_production_backups',
  strategy_description = 'Production environment backup strategy with disaster recovery',

  -- Backup scheduling configuration
  full_backup_schedule = JSON_OBJECT(
    'frequency', 'weekly',
    'day_of_week', 'sunday', 
    'time', '02:00:00',
    'timezone', 'UTC'
  ),

  incremental_backup_schedule = JSON_OBJECT(
    'frequency', 'hourly',
    'interval_hours', 4,
    'business_hours_only', false
  ),

  -- Data retention policy
  retention_policy = JSON_OBJECT(
    'full_backups_retention_days', 90,
    'incremental_backups_retention_days', 30,
    'archive_after_days', 365,
    'permanent_retention_monthly', true
  ),

  -- Storage configuration
  storage_configuration = JSON_OBJECT(
    'primary_storage', JSON_OBJECT(
      'type', 'cloud',
      'provider', 'aws',
      'bucket', 'enterprise-mongodb-backups',
      'region', 'us-east-1',
      'storage_class', 'standard'
    ),
    'secondary_storage', JSON_OBJECT(
      'type', 'cloud',
      'provider', 'azure',
      'container', 'backup-replica',
      'region', 'east-us-2',
      'storage_class', 'cool'
    ),
    'local_cache', JSON_OBJECT(
      'enabled', true,
      'path', '/backup/cache',
      'max_size_gb', 500
    )
  ),

  -- Compression and encryption settings
  data_protection = JSON_OBJECT(
    'compression_enabled', true,
    'compression_algorithm', 'gzip',
    'compression_level', 6,
    'encryption_enabled', true,
    'encryption_algorithm', 'AES-256',
    'key_rotation_days', 90
  ),

  -- Performance and resource limits
  performance_settings = JSON_OBJECT(
    'max_concurrent_backups', 3,
    'backup_bandwidth_limit_mbps', 100,
    'memory_limit_gb', 8,
    'backup_timeout_hours', 6,
    'parallel_collection_backups', true
  )
);

-- Execute full backup with comprehensive options
EXECUTE BACKUP full_backup_production WITH OPTIONS (
  -- Backup scope
  backup_type = 'full',
  databases = JSON_ARRAY('ecommerce', 'analytics', 'user_management'),
  include_system_collections = true,
  include_indexes = true,

  -- Quality and validation
  verify_backup_integrity = true,
  test_restore_sample = true,
  backup_checksum_validation = true,

  -- Performance optimization
  batch_size = 1000,
  parallel_collections = 4,
  compression_level = 6,

  -- Metadata and tracking
  backup_tags = JSON_OBJECT(
    'environment', 'production',
    'application', 'ecommerce_platform',
    'backup_tier', 'critical',
    'retention_class', 'long_term'
  ),

  backup_description = 'Weekly full backup for production ecommerce platform'
);

-- Monitor backup progress with real-time analytics
WITH backup_progress AS (
  SELECT 
    backup_id,
    backup_type,
    database_name,

    -- Progress tracking
    total_collections,
    completed_collections,
    ROUND((completed_collections::numeric / total_collections) * 100, 2) as progress_percentage,

    -- Performance metrics
    EXTRACT(MINUTES FROM CURRENT_TIMESTAMP - backup_start_time) as elapsed_minutes,
    CASE 
      WHEN completed_collections > 0 THEN
        ROUND(
          (total_collections - completed_collections) * 
          (EXTRACT(MINUTES FROM CURRENT_TIMESTAMP - backup_start_time) / completed_collections),
          0
        )
      ELSE NULL
    END as estimated_remaining_minutes,

    -- Size and throughput
    total_documents_processed,
    total_size_backed_up_mb,
    ROUND(
      total_size_backed_up_mb / 
      (EXTRACT(MINUTES FROM CURRENT_TIMESTAMP - backup_start_time) + 0.1), 
      2
    ) as throughput_mb_per_minute,

    -- Compression and efficiency
    original_size_mb,
    compressed_size_mb,
    ROUND(
      CASE 
        WHEN original_size_mb > 0 THEN 
          (1 - (compressed_size_mb / original_size_mb)) * 100 
        ELSE 0 
      END, 
      1
    ) as compression_ratio_percent,

    backup_status,
    error_count,
    warning_count

  FROM ACTIVE_BACKUP_OPERATIONS()
  WHERE backup_status IN ('running', 'finalizing')
),

-- Resource utilization analysis
resource_utilization AS (
  SELECT 
    backup_id,

    -- System resource usage
    cpu_usage_percent,
    memory_usage_mb,
    disk_io_mb_per_sec,
    network_io_mb_per_sec,

    -- Database performance impact
    active_connections_during_backup,
    query_response_time_impact_percent,
    replication_lag_seconds,

    -- Storage utilization
    backup_storage_used_gb,
    available_storage_gb,
    ROUND(
      (backup_storage_used_gb / (backup_storage_used_gb + available_storage_gb)) * 100, 
      1
    ) as storage_utilization_percent

  FROM BACKUP_RESOURCE_MONITORING()
  WHERE monitoring_timestamp >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
)

SELECT 
  -- Current backup status
  bp.backup_id,
  bp.backup_type,
  bp.database_name,
  bp.progress_percentage || '%' as progress,
  bp.backup_status,

  -- Time estimates
  bp.elapsed_minutes || ' min elapsed' as duration,
  COALESCE(bp.estimated_remaining_minutes || ' min remaining', 'Calculating...') as eta,

  -- Performance indicators
  bp.throughput_mb_per_minute || ' MB/min' as throughput,
  bp.compression_ratio_percent || '% compression' as compression,

  -- Quality indicators
  bp.error_count as errors,
  bp.warning_count as warnings,
  bp.total_documents_processed as documents,

  -- Resource impact
  ru.cpu_usage_percent || '%' as cpu_usage,
  ru.memory_usage_mb || 'MB' as memory_usage,
  ru.query_response_time_impact_percent || '% slower' as db_impact,
  ru.storage_utilization_percent || '%' as storage_used,

  -- Health assessment
  CASE 
    WHEN bp.error_count > 0 THEN 'Errors Detected'
    WHEN ru.cpu_usage_percent > 80 THEN 'High CPU Usage'
    WHEN ru.query_response_time_impact_percent > 20 THEN 'High DB Impact'
    WHEN bp.throughput_mb_per_minute < 10 THEN 'Low Throughput'
    WHEN ru.storage_utilization_percent > 90 THEN 'Storage Critical'
    ELSE 'Healthy'
  END as health_status,

  -- Recommendations
  CASE 
    WHEN bp.throughput_mb_per_minute < 10 THEN 'Consider increasing batch size or parallel operations'
    WHEN ru.cpu_usage_percent > 80 THEN 'Reduce concurrent operations or backup during off-peak hours'
    WHEN ru.query_response_time_impact_percent > 20 THEN 'Schedule backup during maintenance window'
    WHEN ru.storage_utilization_percent > 90 THEN 'Archive old backups or increase storage capacity'
    WHEN bp.progress_percentage > 95 THEN 'Backup nearing completion, prepare for verification'
    ELSE 'Backup proceeding normally'
  END as recommendation

FROM backup_progress bp
LEFT JOIN resource_utilization ru ON bp.backup_id = ru.backup_id
ORDER BY bp.backup_start_time DESC;

-- Advanced point-in-time recovery with SQL-style syntax
WITH recovery_analysis AS (
  SELECT 
    target_timestamp,

    -- Find optimal backup chain
    (SELECT backup_id FROM BACKUP_OPERATIONS 
     WHERE backup_type = 'full' 
       AND backup_timestamp <= target_timestamp 
       AND backup_status = 'completed'
     ORDER BY backup_timestamp DESC 
     LIMIT 1) as base_backup_id,

    -- Count incremental backups needed
    (SELECT COUNT(*) FROM BACKUP_OPERATIONS
     WHERE backup_type = 'incremental'
       AND backup_timestamp <= target_timestamp
       AND backup_timestamp > (
         SELECT backup_timestamp FROM BACKUP_OPERATIONS 
         WHERE backup_type = 'full' 
           AND backup_timestamp <= target_timestamp 
           AND backup_status = 'completed'
         ORDER BY backup_timestamp DESC 
         LIMIT 1
       )) as incremental_backups_needed,

    -- Estimate recovery time
    (SELECT 
       (backup_duration_minutes * 0.8) + -- Full restore (slightly faster than backup)
       (COUNT(*) * 5) + -- Incremental backups (5 min each)
       10 -- Oplog application and validation
     FROM BACKUP_OPERATIONS
     WHERE backup_type = 'incremental'
       AND backup_timestamp <= target_timestamp
     GROUP BY target_timestamp) as estimated_recovery_minutes,

    -- Calculate potential data loss
    TIMESTAMPDIFF(SECOND, target_timestamp, 
      (SELECT MAX(oplog_timestamp) FROM OPLOG_BACKUP_COVERAGE 
       WHERE oplog_timestamp <= target_timestamp)) as potential_data_loss_seconds

  FROM (SELECT TIMESTAMP('2024-01-30 14:30:00') as target_timestamp) t
)

-- Execute point-in-time recovery
EXECUTE RECOVERY point_in_time_recovery WITH OPTIONS (
  -- Recovery target
  recovery_target_timestamp = '2024-01-30 14:30:00',
  recovery_target_name = 'pre_deployment_state',

  -- Recovery destination  
  recovery_database = 'ecommerce_recovery_20240130',
  recovery_mode = 'new_database', -- new_database, replace_existing, parallel_validation

  -- Recovery scope
  include_databases = JSON_ARRAY('ecommerce', 'user_management'),
  exclude_collections = JSON_ARRAY('temp_data', 'cache_collection'),
  include_system_data = true,

  -- Performance and safety options
  parallel_recovery_threads = 4,
  recovery_batch_size = 500,
  validate_recovery = true,
  create_recovery_report = true,

  -- Backup chain configuration (auto-detected if not specified)
  base_backup_id = (SELECT base_backup_id FROM recovery_analysis),

  -- Safety and rollback
  enable_recovery_rollback = true,
  recovery_timeout_minutes = 120,

  -- Notification and logging
  notify_on_completion = JSON_ARRAY('dba@company.com', 'ops-team@company.com'),
  recovery_priority = 'high',

  recovery_metadata = JSON_OBJECT(
    'requested_by', 'database_admin',
    'business_justification', 'Rollback deployment due to data corruption',
    'ticket_number', 'INC-2024-0130-001',
    'approval_code', 'RECOVERY-AUTH-789'
  )
) RETURNING recovery_operation_id, estimated_completion_time, recovery_database_name;

-- Monitor point-in-time recovery progress
WITH recovery_progress AS (
  SELECT 
    recovery_operation_id,
    recovery_type,
    target_timestamp,
    recovery_database,

    -- Progress tracking
    total_recovery_steps,
    completed_recovery_steps,
    current_step_description,
    ROUND((completed_recovery_steps::numeric / total_recovery_steps) * 100, 2) as progress_percentage,

    -- Time analysis
    EXTRACT(MINUTES FROM CURRENT_TIMESTAMP - recovery_start_time) as elapsed_minutes,
    estimated_total_duration_minutes,
    estimated_remaining_minutes,

    -- Data recovery metrics
    total_collections_to_restore,
    collections_restored,
    documents_recovered,
    oplog_entries_applied,

    -- Quality and validation
    validation_errors,
    consistency_warnings,
    recovery_status,

    -- Performance metrics
    recovery_throughput_mb_per_minute,
    current_memory_usage_mb,
    current_cpu_usage_percent

  FROM ACTIVE_RECOVERY_OPERATIONS()
  WHERE recovery_status IN ('initializing', 'restoring', 'applying_oplog', 'validating')
),

-- Recovery validation and integrity checks
recovery_validation AS (
  SELECT 
    recovery_operation_id,

    -- Data integrity checks
    total_document_count_original,
    total_document_count_recovered,
    document_count_variance,

    -- Index validation
    total_indexes_original,
    total_indexes_recovered,  
    index_recreation_success_rate,

    -- Consistency validation
    referential_integrity_check_status,
    data_type_consistency_status,
    duplicate_detection_status,

    -- Business rule validation
    constraint_validation_errors,
    business_rule_violations,

    -- Performance baseline comparison
    query_performance_comparison_percent,
    storage_size_comparison_percent,

    -- Final validation score
    CASE 
      WHEN document_count_variance = 0 
        AND index_recreation_success_rate = 100
        AND referential_integrity_check_status = 'PASSED'
        AND constraint_validation_errors = 0
      THEN 'EXCELLENT'
      WHEN ABS(document_count_variance) < 0.1
        AND index_recreation_success_rate >= 95
        AND constraint_validation_errors < 10
      THEN 'GOOD'
      WHEN ABS(document_count_variance) < 1.0
        AND index_recreation_success_rate >= 90
      THEN 'ACCEPTABLE'
      ELSE 'NEEDS_REVIEW'
    END as overall_validation_status

  FROM RECOVERY_VALIDATION_RESULTS()
  WHERE validation_completed_at >= DATE_SUB(NOW(), INTERVAL 2 HOUR)
)

SELECT 
  -- Recovery operation overview
  rp.recovery_operation_id,
  rp.recovery_type,
  rp.target_timestamp,
  rp.recovery_database,
  rp.progress_percentage || '%' as progress,
  rp.recovery_status,

  -- Timing information
  rp.elapsed_minutes || ' min elapsed' as duration,
  rp.estimated_remaining_minutes || ' min remaining' as eta,
  rp.current_step_description as current_activity,

  -- Recovery metrics
  rp.collections_restored || '/' || rp.total_collections_to_restore as collections_progress,
  FORMAT_NUMBER(rp.documents_recovered) as documents_recovered,
  FORMAT_NUMBER(rp.oplog_entries_applied) as oplog_entries,

  -- Performance indicators
  rp.recovery_throughput_mb_per_minute || ' MB/min' as throughput,
  rp.current_memory_usage_mb || ' MB' as memory_usage,
  rp.current_cpu_usage_percent || '%' as cpu_usage,

  -- Quality metrics
  rp.validation_errors as errors,
  rp.consistency_warnings as warnings,

  -- Validation results (when available)
  COALESCE(rv.overall_validation_status, 'IN_PROGRESS') as validation_status,
  COALESCE(rv.document_count_variance || '%', 'Calculating...') as data_accuracy,
  COALESCE(rv.index_recreation_success_rate || '%', 'Pending...') as index_success,

  -- Health and status indicators
  CASE 
    WHEN rp.recovery_status = 'failed' THEN 'Recovery Failed'
    WHEN rp.validation_errors > 0 THEN 'Validation Errors Detected'
    WHEN rp.current_cpu_usage_percent > 90 THEN 'High Resource Usage'
    WHEN rp.progress_percentage > 95 AND rp.recovery_status = 'validating' THEN 'Final Validation'
    WHEN rp.recovery_status = 'completed' THEN 'Recovery Completed Successfully'
    ELSE 'Recovery In Progress'
  END as status_indicator,

  -- Recommendations and next steps
  CASE 
    WHEN rp.recovery_status = 'completed' AND rv.overall_validation_status = 'EXCELLENT' 
      THEN 'Recovery completed successfully. Database ready for use.'
    WHEN rp.recovery_status = 'completed' AND rv.overall_validation_status = 'GOOD'
      THEN 'Recovery completed. Minor inconsistencies detected, review validation report.'
    WHEN rp.recovery_status = 'completed' AND rv.overall_validation_status = 'NEEDS_REVIEW'
      THEN 'Recovery completed with issues. Manual review required before production use.'
    WHEN rp.validation_errors > 0 
      THEN 'Validation errors detected. Check recovery logs and consider retry.'
    WHEN rp.estimated_remaining_minutes < 10 
      THEN 'Recovery nearly complete. Prepare for validation phase.'
    WHEN rp.recovery_throughput_mb_per_minute < 5 
      THEN 'Low recovery throughput. Consider resource optimization.'
    ELSE 'Recovery progressing normally. Continue monitoring.'
  END as recommendations

FROM recovery_progress rp
LEFT JOIN recovery_validation rv ON rp.recovery_operation_id = rv.recovery_operation_id
ORDER BY rp.recovery_start_time DESC;

-- Disaster recovery orchestration dashboard
CREATE VIEW disaster_recovery_dashboard AS
SELECT 
  -- Current disaster recovery readiness
  (SELECT COUNT(*) FROM BACKUP_OPERATIONS 
   WHERE backup_status = 'completed' 
     AND backup_timestamp >= DATE_SUB(NOW(), INTERVAL 24 HOUR)) as backups_last_24h,

  (SELECT MIN(TIMESTAMPDIFF(HOUR, backup_timestamp, NOW())) 
   FROM BACKUP_OPERATIONS 
   WHERE backup_type = 'full' AND backup_status = 'completed') as hours_since_last_full_backup,

  (SELECT COUNT(*) FROM BACKUP_OPERATIONS 
   WHERE backup_type = 'incremental' 
     AND backup_timestamp >= DATE_SUB(NOW(), INTERVAL 4 HOUR)
     AND backup_status = 'completed') as recent_incremental_backups,

  -- Recovery capabilities
  (SELECT COUNT(*) FROM RECOVERY_TEST_OPERATIONS 
   WHERE test_timestamp >= DATE_SUB(NOW(), INTERVAL 30 DAY)
     AND test_status = 'successful') as successful_recovery_tests_30d,

  (SELECT AVG(recovery_duration_minutes) FROM RECOVERY_TEST_OPERATIONS
   WHERE test_timestamp >= DATE_SUB(NOW(), INTERVAL 90 DAY)
     AND test_status = 'successful') as avg_recovery_time_minutes,

  -- RPO/RTO compliance
  (SELECT 
     CASE 
       WHEN MIN(TIMESTAMPDIFF(MINUTE, backup_timestamp, NOW())) <= 15 THEN 'COMPLIANT'
       WHEN MIN(TIMESTAMPDIFF(MINUTE, backup_timestamp, NOW())) <= 30 THEN 'WARNING'  
       ELSE 'NON_COMPLIANT'
     END
   FROM BACKUP_OPERATIONS 
   WHERE backup_status = 'completed') as rpo_compliance_status,

  (SELECT 
     CASE 
       WHEN AVG(recovery_duration_minutes) <= 30 THEN 'COMPLIANT'
       WHEN AVG(recovery_duration_minutes) <= 60 THEN 'WARNING'
       ELSE 'NON_COMPLIANT'  
     END
   FROM RECOVERY_TEST_OPERATIONS
   WHERE test_timestamp >= DATE_SUB(NOW(), INTERVAL 90 DAY)
     AND test_status = 'successful') as rto_compliance_status,

  -- Storage and capacity
  (SELECT SUM(backup_size_mb) FROM BACKUP_OPERATIONS 
   WHERE backup_status = 'completed') as total_backup_storage_mb,

  (SELECT available_storage_gb FROM STORAGE_CAPACITY_MONITORING 
   ORDER BY monitoring_timestamp DESC LIMIT 1) as available_storage_gb,

  -- System health indicators
  (SELECT COUNT(*) FROM ACTIVE_BACKUP_OPERATIONS()) as active_backup_operations,
  (SELECT COUNT(*) FROM ACTIVE_RECOVERY_OPERATIONS()) as active_recovery_operations,

  -- Alert conditions
  JSON_ARRAYAGG(
    CASE 
      WHEN hours_since_last_full_backup > 168 THEN 'Full backup overdue'
      WHEN recent_incremental_backups = 0 THEN 'No recent incremental backups'
      WHEN successful_recovery_tests_30d = 0 THEN 'No recent recovery testing'
      WHEN available_storage_gb < 100 THEN 'Low storage capacity'
      WHEN rpo_compliance_status = 'NON_COMPLIANT' THEN 'RPO compliance violation'
      WHEN rto_compliance_status = 'NON_COMPLIANT' THEN 'RTO compliance violation'
    END
  ) as active_alerts,

  -- Overall disaster recovery readiness score
  CASE 
    WHEN hours_since_last_full_backup <= 24
      AND recent_incremental_backups >= 6  
      AND successful_recovery_tests_30d >= 2
      AND rpo_compliance_status = 'COMPLIANT'
      AND rto_compliance_status = 'COMPLIANT'
      AND available_storage_gb >= 500
    THEN 'EXCELLENT'
    WHEN hours_since_last_full_backup <= 48
      AND recent_incremental_backups >= 3
      AND successful_recovery_tests_30d >= 1  
      AND rpo_compliance_status != 'NON_COMPLIANT'
      AND available_storage_gb >= 200
    THEN 'GOOD'
    WHEN hours_since_last_full_backup <= 168
      AND recent_incremental_backups >= 1
      AND available_storage_gb >= 100
    THEN 'FAIR'
    ELSE 'CRITICAL'
  END as disaster_recovery_readiness,

  NOW() as dashboard_timestamp;

-- QueryLeaf backup and recovery capabilities provide:
-- 1. SQL-familiar backup strategy configuration and execution
-- 2. Real-time backup and recovery progress monitoring  
-- 3. Advanced point-in-time recovery with comprehensive validation
-- 4. Disaster recovery orchestration and readiness assessment
-- 5. Performance optimization and resource utilization tracking
-- 6. Comprehensive analytics and compliance reporting
-- 7. Integration with MongoDB's native backup capabilities
-- 8. Enterprise-grade automation and scheduling features
-- 9. Multi-storage tier management and lifecycle policies
-- 10. Complete audit trail and regulatory compliance support

Best Practices for MongoDB Backup and Recovery

Backup Strategy Design

Essential principles for comprehensive data protection (a minimal scheduling sketch follows the list):

  1. 3-2-1 Rule: Maintain 3 copies of data, on 2 different storage types, with 1 offsite copy
  2. Tiered Storage: Use different storage classes based on access patterns and retention requirements
  3. Incremental Backups: Implement frequent incremental backups to minimize data loss
  4. Testing and Validation: Regularly test backup restoration and validate data integrity
  5. Automation: Automate backup processes to reduce human error and ensure consistency
  6. Monitoring: Implement comprehensive monitoring for backup success and storage utilization
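
The sketch below wires a few of these principles together in Node.js: a full dump with mongodump, an offsite copy, and local retention pruning. It assumes mongodump and the AWS CLI are installed; the backup directory, bucket name, and retention window are placeholders, not a definitive implementation.

// Minimal backup scheduling sketch: full dump, offsite copy, local retention pruning.
// Assumes mongodump and the AWS CLI are on the PATH; paths and bucket are placeholders.
const { execFile } = require('child_process');
const { promisify } = require('util');
const fs = require('fs/promises');
const path = require('path');

const run = promisify(execFile);

const BACKUP_DIR = '/var/backups/mongodb';          // local copy (storage type 1)
const OFFSITE_BUCKET = 's3://example-dr-backups';   // offsite copy (storage type 2, placeholder)
const RETENTION_DAYS = 14;

async function runFullBackup(uri) {
  await fs.mkdir(BACKUP_DIR, { recursive: true });
  const stamp = new Date().toISOString().replace(/[:.]/g, '-');
  const archivePath = path.join(BACKUP_DIR, `full-${stamp}.archive.gz`);

  // Full dump with oplog capture for a consistent snapshot of a replica set
  await run('mongodump', ['--uri', uri, '--oplog', '--gzip', `--archive=${archivePath}`]);

  // Ship a second copy offsite (any object store or rsync target works here)
  await run('aws', ['s3', 'cp', archivePath, `${OFFSITE_BUCKET}/full/`]);

  return archivePath;
}

async function pruneOldBackups() {
  const cutoff = Date.now() - RETENTION_DAYS * 24 * 60 * 60 * 1000;
  for (const name of await fs.readdir(BACKUP_DIR)) {
    const filePath = path.join(BACKUP_DIR, name);
    const { mtimeMs } = await fs.stat(filePath);
    if (mtimeMs < cutoff) {
      await fs.unlink(filePath); // local retention only; offsite lifecycle rules handle the rest
    }
  }
}

module.exports = { runFullBackup, pruneOldBackups };

Restore testing (principle 4) still needs to be exercised separately, for example by replaying an archive with mongorestore into a scratch cluster on a regular schedule.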

Recovery Planning

Optimize recovery strategies for business continuity (an RPO compliance check sketch follows the list):

  1. RTO/RPO Definition: Clearly define Recovery Time and Point Objectives for different scenarios
  2. Recovery Testing: Conduct regular disaster recovery drills and document procedures
  3. Priority Classification: Classify data and applications by recovery priority
  4. Documentation: Maintain detailed recovery procedures and contact information
  5. Cross-Region Strategy: Implement geographic distribution for disaster resilience
  6. Validation Procedures: Establish data validation protocols for recovered systems
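
As one concrete example of tracking these objectives, the sketch below measures the current RPO gap from a hypothetical backup metadata collection and flags a breach against a 15-minute target. The database name, collection name, and field names are assumptions for illustration only.

// RPO gap check sketch: compare the newest completed backup against a 15-minute target.
// The ops_metadata database, backup_operations collection, and field names are assumptions.
const { MongoClient } = require('mongodb');

const RPO_TARGET_MINUTES = 15;

async function checkRpoCompliance(uri) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const backups = client.db('ops_metadata').collection('backup_operations');

    // Most recent successful backup of any type
    const latest = await backups
      .find({ backup_status: 'completed' })
      .sort({ backup_timestamp: -1 })
      .limit(1)
      .next();

    if (!latest) {
      return { status: 'NON_COMPLIANT', reason: 'no completed backups recorded' };
    }

    const gapMinutes = (Date.now() - latest.backup_timestamp.getTime()) / 60000;
    return {
      status: gapMinutes <= RPO_TARGET_MINUTES ? 'COMPLIANT' : 'NON_COMPLIANT',
      gapMinutes: Math.round(gapMinutes),
      lastBackupAt: latest.backup_timestamp
    };
  } finally {
    await client.close();
  }
}

module.exports = { checkRpoCompliance };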

Conclusion

MongoDB's comprehensive backup and recovery capabilities provide enterprise-grade data protection that supports complex disaster recovery scenarios, automated backup workflows, and granular point-in-time recovery operations. By implementing advanced backup strategies with QueryLeaf's familiar SQL interface, organizations can ensure business continuity while maintaining operational simplicity and regulatory compliance.

Key MongoDB backup and recovery benefits include:

  • Native Integration: Seamless integration with MongoDB's replica sets and sharding for optimal performance
  • Flexible Recovery Options: Point-in-time recovery, selective collection restore, and cross-region disaster recovery
  • Automated Workflows: Sophisticated scheduling, retention management, and cloud storage integration
  • Performance Optimization: Parallel processing, compression, and incremental backup strategies
  • Enterprise Features: Encryption, compliance reporting, and comprehensive audit trails
  • Operational Simplicity: Familiar SQL-style backup and recovery commands reduce learning curve

Whether you're protecting financial transaction data, healthcare records, or e-commerce platforms, MongoDB's backup and recovery capabilities with QueryLeaf's enterprise management interface provide the foundation for robust data protection strategies that scale with your organization's growth and compliance requirements.

QueryLeaf Integration: QueryLeaf automatically translates SQL-familiar backup and recovery commands into optimized MongoDB operations, providing familiar scheduling, monitoring, and validation capabilities. Advanced disaster recovery orchestration, compliance reporting, and performance optimization are seamlessly handled through SQL-style interfaces, making enterprise-grade data protection both comprehensive and accessible for database-oriented teams.

The combination of MongoDB's native backup capabilities with SQL-style operational commands makes it an ideal platform for mission-critical applications requiring both sophisticated data protection and familiar administrative workflows, ensuring your backup and recovery strategies remain both effective and maintainable as they evolve to meet changing business requirements.

MongoDB Schema Evolution and Migration Strategies: Advanced Patterns for Database Versioning, Backward Compatibility, and SQL-Style Schema Management

Production MongoDB applications face inevitable schema evolution challenges as business requirements change, data models mature, and application functionality expands. Traditional relational databases handle schema changes through DDL operations with strict versioning, but often require complex migration scripts, application downtime, and careful coordination between database and application deployments.

MongoDB's flexible document model provides powerful schema evolution capabilities that enable incremental data model changes, backward compatibility maintenance, and zero-downtime migrations. Unlike rigid relational schemas, MongoDB supports mixed document structures within collections, enabling gradual transitions and sophisticated migration strategies that adapt to real-world deployment constraints.
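
For example, documents written under different schema versions can live side by side in one collection while application code normalizes both shapes during a gradual transition. The short sketch below illustrates the idea; the database, collection, and _schema_version marker field are illustrative choices, not a prescribed convention.

// Sketch: two schema versions coexisting in one collection, read through a
// version-aware normalizer (database, collection, and field names are illustrative).
const { MongoClient } = require('mongodb');

async function demoMixedVersions(uri) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const users = client.db('demo').collection('users');

    await users.insertMany([
      // v1 shape: flat last_login field
      { email: 'a@example.com', last_login: new Date('2024-01-01'), _schema_version: '1.0' },
      // v2 shape: nested activity sub-document
      { email: 'b@example.com', activity: { last_login_at: new Date('2024-01-02') }, _schema_version: '2.0' }
    ]);

    // Readers branch on the version marker instead of requiring a big-bang migration
    const docs = await users.find({}).toArray();
    return docs.map(doc => ({
      email: doc.email,
      lastLogin: doc._schema_version === '2.0' ? doc.activity?.last_login_at : doc.last_login
    }));
  } finally {
    await client.close();
  }
}

module.exports = { demoMixedVersions };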

The Traditional Schema Migration Challenge

Conventional relational databases face significant limitations when implementing schema evolution and data migration:

-- Traditional PostgreSQL schema migration - rigid and disruptive approach

-- Step 1: Create backup table (downtime and storage overhead)
CREATE TABLE users_backup AS SELECT * FROM users;

-- Step 2: Add new columns with application downtime
ALTER TABLE users 
ADD COLUMN user_preferences JSONB DEFAULT '{}',
ADD COLUMN subscription_tier VARCHAR(50) DEFAULT 'basic',
ADD COLUMN last_login_timestamp TIMESTAMP,
ADD COLUMN account_status VARCHAR(20) DEFAULT 'active',
ADD COLUMN profile_completion_percentage INTEGER DEFAULT 0;

-- Step 3: Update existing data (potentially long-running operation)
BEGIN TRANSACTION;

-- Complex data transformation requiring application logic
UPDATE users 
SET user_preferences = jsonb_build_object(
  'email_notifications', true,
  'privacy_level', 'standard',
  'theme', 'light',
  'language', 'en'
)
WHERE user_preferences = '{}';

-- Derive subscription tier from existing data
UPDATE users 
SET subscription_tier = CASE 
  WHEN annual_subscription_fee > 120 THEN 'premium'
  WHEN annual_subscription_fee > 60 THEN 'plus' 
  ELSE 'basic'
END
WHERE subscription_tier = 'basic';

-- Calculate profile completion
UPDATE users 
SET profile_completion_percentage = (
  CASE WHEN email IS NOT NULL THEN 20 ELSE 0 END +
  CASE WHEN phone IS NOT NULL THEN 20 ELSE 0 END +
  CASE WHEN address IS NOT NULL THEN 20 ELSE 0 END +
  CASE WHEN birth_date IS NOT NULL THEN 20 ELSE 0 END +
  CASE WHEN bio IS NOT NULL AND LENGTH(bio) > 50 THEN 20 ELSE 0 END
)
WHERE profile_completion_percentage = 0;

COMMIT TRANSACTION;

-- Step 4: Create new indexes (long-running and resource-intensive, even with CONCURRENTLY)
CREATE INDEX CONCURRENTLY users_subscription_tier_idx ON users(subscription_tier);
CREATE INDEX CONCURRENTLY users_last_login_idx ON users(last_login_timestamp);
CREATE INDEX CONCURRENTLY users_account_status_idx ON users(account_status);

-- Step 5: Drop old columns (breaking change requiring application updates)
ALTER TABLE users 
DROP COLUMN IF EXISTS old_preferences_text,
DROP COLUMN IF EXISTS legacy_status_code,
DROP COLUMN IF EXISTS deprecated_login_count;

-- Step 6: Rename columns (coordinated deployment required; PostgreSQL allows only one rename per statement)
ALTER TABLE users RENAME COLUMN user_email TO email_address;
ALTER TABLE users RENAME COLUMN user_phone TO phone_number;

-- Step 7: Create migration log table (manual tracking)
CREATE TABLE schema_migrations (
    migration_id SERIAL PRIMARY KEY,
    migration_name VARCHAR(200) NOT NULL,
    applied_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    application_version VARCHAR(50),
    database_version VARCHAR(50),
    rollback_script TEXT,
    migration_notes TEXT
);

INSERT INTO schema_migrations (
    migration_name, 
    application_version, 
    database_version,
    rollback_script,
    migration_notes
) VALUES (
    'users_table_v2_migration',
    '2.1.0',
    '2.1.0',
    'ALTER TABLE users DROP COLUMN user_preferences, DROP COLUMN subscription_tier, DROP COLUMN last_login_timestamp, DROP COLUMN account_status, DROP COLUMN profile_completion_percentage;',
    'Added user preferences, subscription tiers, and profile completion tracking'
);

-- Problems with traditional schema migration approaches:
-- 1. Application downtime required for structural changes
-- 2. All-or-nothing migration approach with limited rollback capabilities
-- 3. Complex coordination between database and application deployments
-- 4. Risk of data loss during migration failures
-- 5. Performance impact during large table modifications
-- 6. Limited support for gradual migration and A/B testing scenarios
-- 7. Difficulty in maintaining multiple application versions simultaneously
-- 8. Complex rollback procedures requiring manual intervention
-- 9. Poor support for distributed systems and microservices architectures
-- 10. High operational overhead for migration planning and execution

MongoDB provides sophisticated schema evolution capabilities with flexible document structures:

// MongoDB Schema Evolution - flexible and non-disruptive approach
const { MongoClient } = require('mongodb');

// Advanced MongoDB Schema Migration and Evolution Management System
class MongoSchemaEvolutionManager {
  constructor(connectionUri, options = {}) {
    this.client = new MongoClient(connectionUri);
    this.db = null;
    this.collections = new Map();

    // Schema evolution configuration
    this.config = {
      // Migration strategy settings
      migrationStrategy: {
        approachType: options.migrationStrategy?.approachType || 'gradual', // gradual, immediate, hybrid
        batchSize: options.migrationStrategy?.batchSize || 1000,
        concurrentOperations: options.migrationStrategy?.concurrentOperations || 3,
        maxExecutionTimeMs: options.migrationStrategy?.maxExecutionTimeMs || 300000, // 5 minutes
        enableRollback: options.migrationStrategy?.enableRollback !== false
      },

      // Version management
      versionManagement: {
        trackDocumentVersions: options.versionManagement?.trackDocumentVersions !== false,
        versionField: options.versionManagement?.versionField || '_schema_version',
        migrationLogCollection: options.versionManagement?.migrationLogCollection || 'schema_migrations',
        enableVersionValidation: options.versionManagement?.enableVersionValidation !== false
      },

      // Backward compatibility
      backwardCompatibility: {
        maintainOldFields: options.backwardCompatibility?.maintainOldFields !== false,
        gracefulDegradation: options.backwardCompatibility?.gracefulDegradation !== false,
        compatibilityPeriodDays: options.backwardCompatibility?.compatibilityPeriodDays || 90,
        enableFieldAliasing: options.backwardCompatibility?.enableFieldAliasing !== false
      },

      // Performance optimization
      performanceSettings: {
        useIndexedMigration: options.performanceSettings?.useIndexedMigration !== false,
        enableProgressTracking: options.performanceSettings?.enableProgressTracking !== false,
        optimizeConcurrency: options.performanceSettings?.optimizeConcurrency !== false,
        memoryLimitMB: options.performanceSettings?.memoryLimitMB || 512
      }
    };

    // Schema version registry
    this.schemaVersions = new Map();
    this.migrationPlans = new Map();
    this.activeMigrations = new Map();

    // Migration execution state
    this.migrationProgress = new Map();
    this.rollbackStrategies = new Map();
  }

  async initialize(databaseName) {
    console.log('Initializing MongoDB Schema Evolution Manager...');

    try {
      await this.client.connect();
      this.db = this.client.db(databaseName);

      // Setup system collections for schema management
      await this.setupSchemaManagementCollections();

      // Load existing schema versions and migration history
      await this.loadSchemaVersionRegistry();

      console.log('Schema evolution manager initialized successfully');

    } catch (error) {
      console.error('Error initializing schema evolution manager:', error);
      throw error;
    }
  }

  async setupSchemaManagementCollections() {
    console.log('Setting up schema management collections...');

    // Schema version registry
    const schemaVersions = this.db.collection('schema_versions');
    await schemaVersions.createIndexes([
      { key: { collection_name: 1, version: 1 }, unique: true },
      { key: { is_active: 1 } },
      { key: { created_at: -1 } }
    ]);

    // Migration execution log
    const migrationLog = this.db.collection(this.config.versionManagement.migrationLogCollection);
    await migrationLog.createIndexes([
      { key: { migration_id: 1 }, unique: true },
      { key: { collection_name: 1, execution_timestamp: -1 } },
      { key: { migration_status: 1 } },
      { key: { schema_version_from: 1, schema_version_to: 1 } }
    ]);

    // Migration progress tracking
    const migrationProgress = this.db.collection('migration_progress');
    await migrationProgress.createIndexes([
      { key: { migration_id: 1 }, unique: true },
      { key: { collection_name: 1 } },
      { key: { status: 1 } }
    ]);
  }

  async defineSchemaVersion(collectionName, versionConfig) {
    console.log(`Defining schema version for collection: ${collectionName}`);

    const schemaVersion = {
      collection_name: collectionName,
      version: versionConfig.version,
      version_name: versionConfig.versionName || `v${versionConfig.version}`,

      // Schema definition
      schema_definition: {
        fields: versionConfig.fields || {},
        required_fields: versionConfig.requiredFields || [],
        optional_fields: versionConfig.optionalFields || [],
        deprecated_fields: versionConfig.deprecatedFields || [],

        // Field transformations and mappings
        field_mappings: versionConfig.fieldMappings || {},
        data_transformations: versionConfig.dataTransformations || {},
        validation_rules: versionConfig.validationRules || {}
      },

      // Migration configuration
      migration_config: {
        migration_type: versionConfig.migrationType || 'additive', // additive, transformative, breaking
        backward_compatible: versionConfig.backwardCompatible !== false,
        requires_reindex: versionConfig.requiresReindex || false,
        data_transformation_required: versionConfig.dataTransformationRequired || false,

        // Performance settings
        batch_processing: versionConfig.batchProcessing !== false,
        parallel_execution: versionConfig.parallelExecution || false,
        estimated_duration_minutes: versionConfig.estimatedDuration || 0
      },

      // Compatibility and rollback
      compatibility_info: {
        compatible_with_versions: versionConfig.compatibleVersions || [],
        breaking_changes: versionConfig.breakingChanges || [],
        rollback_strategy: versionConfig.rollbackStrategy || 'automatic',
        rollback_script: versionConfig.rollbackScript || null
      },

      // Metadata
      version_metadata: {
        created_by: versionConfig.createdBy || 'system',
        created_at: new Date(),
        is_active: versionConfig.isActive !== false,
        deployment_notes: versionConfig.deploymentNotes || '',
        business_justification: versionConfig.businessJustification || ''
      }
    };

    // Store schema version definition
    const schemaVersions = this.db.collection('schema_versions');
    await schemaVersions.replaceOne(
      { collection_name: collectionName, version: versionConfig.version },
      schemaVersion,
      { upsert: true }
    );

    // Cache schema version
    this.schemaVersions.set(`${collectionName}:${versionConfig.version}`, schemaVersion);

    console.log(`Schema version ${versionConfig.version} defined for ${collectionName}`);
    return schemaVersion;
  }

  async createMigrationPlan(collectionName, fromVersion, toVersion, options = {}) {
    console.log(`Creating migration plan: ${collectionName} v${fromVersion} → v${toVersion}`);

    const sourceSchema = this.schemaVersions.get(`${collectionName}:${fromVersion}`);
    const targetSchema = this.schemaVersions.get(`${collectionName}:${toVersion}`);

    if (!sourceSchema || !targetSchema) {
      throw new Error(`Schema version not found for migration: ${fromVersion} → ${toVersion}`);
    }

    const migrationPlan = {
      migration_id: this.generateMigrationId(),
      collection_name: collectionName,
      schema_version_from: fromVersion,
      schema_version_to: toVersion,

      // Migration analysis
      migration_analysis: {
        migration_type: this.analyzeMigrationType(sourceSchema, targetSchema),
        impact_assessment: await this.assessMigrationImpact(collectionName, sourceSchema, targetSchema),
        field_changes: this.analyzeFieldChanges(sourceSchema, targetSchema),
        data_transformation_required: this.requiresDataTransformation(sourceSchema, targetSchema)
      },

      // Execution plan
      execution_plan: {
        migration_steps: await this.generateMigrationSteps(sourceSchema, targetSchema),
        execution_order: options.executionOrder || 'sequential',
        batch_configuration: {
          batch_size: options.batchSize || this.config.migrationStrategy.batchSize,
          concurrent_batches: options.concurrentBatches || this.config.migrationStrategy.concurrentOperations,
          throttle_delay_ms: options.throttleDelay || 10
        },

        // Performance predictions
        estimated_execution_time: await this.estimateExecutionTime(collectionName, sourceSchema, targetSchema),
        resource_requirements: await this.calculateResourceRequirements(collectionName, sourceSchema, targetSchema)
      },

      // Safety and rollback
      safety_measures: {
        backup_required: options.backupRequired !== false,
        validation_checks: await this.generateValidationChecks(sourceSchema, targetSchema),
        rollback_plan: await this.generateRollbackPlan(sourceSchema, targetSchema),
        progress_checkpoints: options.progressCheckpoints || []
      },

      // Metadata
      plan_metadata: {
        created_at: new Date(),
        created_by: options.createdBy || 'system',
        plan_version: '1.0',
        approval_required: options.approvalRequired || false,
        deployment_window: options.deploymentWindow || null
      }
    };

    // Store migration plan
    await this.db.collection('migration_plans').replaceOne(
      { migration_id: migrationPlan.migration_id },
      migrationPlan,
      { upsert: true }
    );

    // Cache migration plan
    this.migrationPlans.set(migrationPlan.migration_id, migrationPlan);

    console.log(`Migration plan created: ${migrationPlan.migration_id}`);
    return migrationPlan;
  }

  async executeMigration(migrationId, options = {}) {
    console.log(`Executing migration: ${migrationId}`);

    const migrationPlan = this.migrationPlans.get(migrationId);
    if (!migrationPlan) {
      throw new Error(`Migration plan not found: ${migrationId}`);
    }

    const executionId = this.generateExecutionId();
    const startTime = Date.now();

    try {
      // Initialize migration execution tracking
      await this.initializeMigrationExecution(executionId, migrationPlan, options);

      // Pre-migration validation and preparation
      await this.performPreMigrationChecks(migrationPlan);

      // Execute migration based on strategy
      const migrationResult = await this.executeByStrategy(migrationPlan, executionId, options);

      // Post-migration validation
      await this.performPostMigrationValidation(migrationPlan, migrationResult);

      // Update migration log
      await this.logMigrationCompletion(executionId, migrationPlan, migrationResult, {
        start_time: startTime,
        end_time: Date.now(),
        status: 'success'
      });

      console.log(`Migration completed successfully: ${migrationId}`);
      return migrationResult;

    } catch (error) {
      console.error(`Migration failed: ${migrationId}`, error);

      // Attempt automatic rollback if enabled
      if (this.config.migrationStrategy.enableRollback && options.autoRollback !== false) {
        try {
          await this.executeRollback(executionId, migrationPlan);
        } catch (rollbackError) {
          console.error('Rollback failed:', rollbackError);
        }
      }

      // Log migration failure
      await this.logMigrationCompletion(executionId, migrationPlan, null, {
        start_time: startTime,
        end_time: Date.now(),
        status: 'failed',
        error: error.message
      });

      throw error;
    }
  }

  async executeByStrategy(migrationPlan, executionId, options) {
    const strategy = options.strategy || this.config.migrationStrategy.approachType;

    switch (strategy) {
      case 'gradual':
        return await this.executeGradualMigration(migrationPlan, executionId, options);
      case 'immediate':
        return await this.executeImmediateMigration(migrationPlan, executionId, options);
      case 'hybrid':
        return await this.executeHybridMigration(migrationPlan, executionId, options);
      default:
        throw new Error(`Unknown migration strategy: ${strategy}`);
    }
  }

  async executeGradualMigration(migrationPlan, executionId, options) {
    console.log('Executing gradual migration strategy...');

    const collection = this.db.collection(migrationPlan.collection_name);
    const batchConfig = migrationPlan.execution_plan.batch_configuration;

    let processedCount = 0;
    const totalCount = await collection.countDocuments({
      [this.config.versionManagement.versionField]: migrationPlan.schema_version_from
    });
    let lastId = null;

    console.log(`Processing ${totalCount} documents in batches of ${batchConfig.batch_size}`);

    while (processedCount < totalCount) {
      // Build batch query
      const batchQuery = lastId 
        ? { _id: { $gt: lastId }, [this.config.versionManagement.versionField]: migrationPlan.schema_version_from }
        : { [this.config.versionManagement.versionField]: migrationPlan.schema_version_from };

      // Get batch of documents
      const batch = await collection
        .find(batchQuery)
        .sort({ _id: 1 })
        .limit(batchConfig.batch_size)
        .toArray();

      if (batch.length === 0) {
        break; // No more documents to process
      }

      // Process batch
      const batchResult = await this.processMigrationBatch(
        collection, 
        batch, 
        migrationPlan.execution_plan.migration_steps,
        migrationPlan.schema_version_to
      );

      processedCount += batch.length;
      lastId = batch[batch.length - 1]._id;

      // Update progress
      await this.updateMigrationProgress(executionId, {
        processed_count: processedCount,
        total_count: totalCount,
        progress_percentage: (processedCount / totalCount) * 100,
        last_processed_id: lastId
      });

      // Throttle to avoid overwhelming the system
      if (batchConfig.throttle_delay_ms > 0) {
        await new Promise(resolve => setTimeout(resolve, batchConfig.throttle_delay_ms));
      }

      console.log(`Processed ${processedCount}/${totalCount} documents (${((processedCount / totalCount) * 100).toFixed(1)}%)`);
    }

    return {
      strategy: 'gradual',
      processed_count: processedCount,
      total_count: totalCount,
      batches_processed: Math.ceil(processedCount / batchConfig.batch_size),
      success: true
    };
  }

  async processMigrationBatch(collection, documents, migrationSteps, targetVersion) {
    const bulkOperations = [];

    for (const doc of documents) {
      let transformedDoc = { ...doc };

      // Apply each migration step
      for (const step of migrationSteps) {
        transformedDoc = await this.applyMigrationStep(transformedDoc, step);
      }

      // Update schema version
      transformedDoc[this.config.versionManagement.versionField] = targetVersion;
      transformedDoc._migration_timestamp = new Date();

      // Add to bulk operations
      bulkOperations.push({
        replaceOne: {
          filter: { _id: doc._id },
          replacement: transformedDoc
        }
      });
    }

    // Execute bulk operation
    if (bulkOperations.length > 0) {
      const result = await collection.bulkWrite(bulkOperations, { ordered: false });
      return {
        modified_count: result.modifiedCount,
        matched_count: result.matchedCount,
        errors: result.getWriteErrors()
      };
    }

    return { modified_count: 0, matched_count: 0, errors: [] };
  }

  async applyMigrationStep(document, migrationStep) {
    let transformedDoc = { ...document };

    switch (migrationStep.type) {
      case 'add_field':
        transformedDoc[migrationStep.field_name] = migrationStep.default_value;
        break;

      case 'rename_field':
        if (transformedDoc[migrationStep.old_field_name] !== undefined) {
          transformedDoc[migrationStep.new_field_name] = transformedDoc[migrationStep.old_field_name];
          delete transformedDoc[migrationStep.old_field_name];
        }
        break;

      case 'transform_field':
        if (transformedDoc[migrationStep.field_name] !== undefined) {
          transformedDoc[migrationStep.field_name] = await this.applyFieldTransformation(
            transformedDoc[migrationStep.field_name],
            migrationStep.transformation
          );
        }
        break;

      case 'nested_restructure':
        transformedDoc = await this.applyNestedRestructure(transformedDoc, migrationStep.restructure_config);
        break;

      case 'data_type_conversion':
        if (transformedDoc[migrationStep.field_name] !== undefined) {
          transformedDoc[migrationStep.field_name] = this.convertDataType(
            transformedDoc[migrationStep.field_name],
            migrationStep.target_type
          );
        }
        break;

      case 'conditional_transformation':
        if (this.evaluateCondition(transformedDoc, migrationStep.condition)) {
          transformedDoc = await this.applyConditionalTransformation(transformedDoc, migrationStep.transformation);
        }
        break;

      default:
        console.warn(`Unknown migration step type: ${migrationStep.type}`);
    }

    return transformedDoc;
  }

  async generateBackwardCompatibilityLayer(collectionName, fromVersion, toVersion) {
    console.log(`Generating backward compatibility layer: ${collectionName} v${fromVersion} ↔ v${toVersion}`);

    const sourceSchema = this.schemaVersions.get(`${collectionName}:${fromVersion}`);
    const targetSchema = this.schemaVersions.get(`${collectionName}:${toVersion}`);

    const compatibilityLayer = {
      collection_name: collectionName,
      source_version: fromVersion,
      target_version: toVersion,

      // Field mapping for backward compatibility
      field_mappings: {
        // Map old field names to new field names
        old_to_new: this.generateFieldMappings(sourceSchema, targetSchema, 'forward'),
        new_to_old: this.generateFieldMappings(targetSchema, sourceSchema, 'backward')
      },

      // Data transformation functions
      transformation_functions: {
        forward_transform: await this.generateTransformationFunction(sourceSchema, targetSchema, 'forward'),
        backward_transform: await this.generateTransformationFunction(targetSchema, sourceSchema, 'backward')
      },

      // API compatibility
      api_compatibility: {
        deprecated_fields: this.identifyDeprecatedFields(sourceSchema, targetSchema),
        field_aliases: this.generateFieldAliases(sourceSchema, targetSchema),
        default_values: this.generateDefaultValues(targetSchema)
      },

      // Migration instructions
      migration_instructions: {
        application_changes_required: this.identifyRequiredApplicationChanges(sourceSchema, targetSchema),
        breaking_changes: this.identifyBreakingChanges(sourceSchema, targetSchema),
        migration_timeline: this.generateMigrationTimeline(sourceSchema, targetSchema)
      }
    };

    // Store compatibility layer configuration
    await this.db.collection('compatibility_layers').replaceOne(
      { collection_name: collectionName, source_version: fromVersion, target_version: toVersion },
      compatibilityLayer,
      { upsert: true }
    );

    return compatibilityLayer;
  }

  async validateMigrationIntegrity(collectionName, migrationId, options = {}) {
    console.log(`Validating migration integrity: ${collectionName} (${migrationId})`);

    const collection = this.db.collection(collectionName);
    const migrationPlan = this.migrationPlans.get(migrationId);

    if (!migrationPlan) {
      throw new Error(`Migration plan not found: ${migrationId}`);
    }

    const validationResults = {
      migration_id: migrationId,
      collection_name: collectionName,
      validation_timestamp: new Date(),

      // Document count validation
      document_counts: {
        total_documents: await collection.countDocuments(),
        migrated_documents: await collection.countDocuments({
          [this.config.versionManagement.versionField]: migrationPlan.schema_version_to
        }),
        unmigrated_documents: await collection.countDocuments({
          [this.config.versionManagement.versionField]: { $ne: migrationPlan.schema_version_to }
        })
      },

      // Schema validation
      schema_validation: await this.validateSchemaCompliance(collection, migrationPlan.schema_version_to),

      // Data integrity checks
      data_integrity: await this.performDataIntegrityChecks(collection, migrationPlan),

      // Performance impact assessment
      performance_impact: await this.assessPerformanceImpact(collection, migrationPlan),

      // Compatibility verification
      compatibility_status: await this.verifyBackwardCompatibility(collection, migrationPlan)
    };

    // Calculate overall validation status
    validationResults.overall_status = this.calculateOverallValidationStatus(validationResults);

    // Store validation results
    await this.db.collection('migration_validations').insertOne(validationResults);

    console.log(`Migration validation completed: ${validationResults.overall_status}`);
    return validationResults;
  }

  // Utility methods for migration management
  generateMigrationId() {
    return `migration_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
  }

  generateExecutionId() {
    return `exec_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
  }

  async loadSchemaVersionRegistry() {
    const schemaVersions = await this.db.collection('schema_versions')
      .find({ 'version_metadata.is_active': true })
      .toArray();

    schemaVersions.forEach(schema => {
      this.schemaVersions.set(`${schema.collection_name}:${schema.version}`, schema);
    });

    console.log(`Loaded ${schemaVersions.length} active schema versions`);
  }

  analyzeMigrationType(sourceSchema, targetSchema) {
    const sourceFields = new Set(Object.keys(sourceSchema.schema_definition.fields));
    const targetFields = new Set(Object.keys(targetSchema.schema_definition.fields));

    const addedFields = [...targetFields].filter(field => !sourceFields.has(field));
    const removedFields = [...sourceFields].filter(field => !targetFields.has(field));
    const modifiedFields = [...sourceFields].filter(field => 
      targetFields.has(field) && 
      JSON.stringify(sourceSchema.schema_definition.fields[field]) !== 
      JSON.stringify(targetSchema.schema_definition.fields[field])
    );

    if (removedFields.length > 0 || modifiedFields.length > 0) {
      return 'breaking';
    } else if (addedFields.length > 0) {
      return 'additive';
    } else {
      return 'maintenance';
    }
  }
}

// Example usage demonstrating comprehensive MongoDB schema evolution
async function demonstrateSchemaEvolution() {
  const schemaManager = new MongoSchemaEvolutionManager('mongodb://localhost:27017');

  try {
    await schemaManager.initialize('ecommerce_platform');

    console.log('Defining initial user schema version...');

    // Define initial schema version
    await schemaManager.defineSchemaVersion('users', {
      version: '1.0',
      versionName: 'initial_user_schema',
      fields: {
        _id: { type: 'ObjectId', required: true },
        email: { type: 'String', required: true, unique: true },
        password_hash: { type: 'String', required: true },
        created_at: { type: 'Date', required: true },
        last_login: { type: 'Date', required: false }
      },
      requiredFields: ['_id', 'email', 'password_hash', 'created_at'],
      migrationType: 'initial',
      backwardCompatible: true
    });

    // Define enhanced schema version
    await schemaManager.defineSchemaVersion('users', {
      version: '2.0',
      versionName: 'enhanced_user_profile',
      fields: {
        _id: { type: 'ObjectId', required: true },
        email: { type: 'String', required: true, unique: true },
        password_hash: { type: 'String', required: true },

        // New profile fields
        profile: {
          type: 'Object',
          required: false,
          fields: {
            first_name: { type: 'String', required: false },
            last_name: { type: 'String', required: false },
            avatar_url: { type: 'String', required: false },
            bio: { type: 'String', required: false, max_length: 500 }
          }
        },

        // Enhanced user preferences
        preferences: {
          type: 'Object',
          required: false,
          fields: {
            email_notifications: { type: 'Boolean', default: true },
            privacy_level: { type: 'String', enum: ['public', 'friends', 'private'], default: 'public' },
            theme: { type: 'String', enum: ['light', 'dark'], default: 'light' },
            language: { type: 'String', default: 'en' }
          }
        },

        // Subscription and status
        subscription: {
          type: 'Object',
          required: false,
          fields: {
            tier: { type: 'String', enum: ['basic', 'plus', 'premium'], default: 'basic' },
            expires_at: { type: 'Date', required: false },
            auto_renewal: { type: 'Boolean', default: false }
          }
        },

        // Tracking and analytics
        activity: {
          type: 'Object',
          required: false,
          fields: {
            last_login: { type: 'Date', required: false },
            login_count: { type: 'Number', default: 0 },
            profile_completion: { type: 'Number', min: 0, max: 100, default: 0 }
          }
        },

        created_at: { type: 'Date', required: true },
        updated_at: { type: 'Date', required: true }
      },
      requiredFields: ['_id', 'email', 'password_hash', 'created_at', 'updated_at'],

      // Migration configuration
      migrationType: 'additive',
      backwardCompatible: true,

      // Field mappings and transformations
      fieldMappings: {
        last_login: 'activity.last_login'
      },

      dataTransformations: {
        // Transform old last_login field to new nested structure
        'activity.last_login': 'document.last_login',
        'activity.login_count': '1',
        'activity.profile_completion': 'calculateProfileCompletion(document)',
        'preferences': 'generateDefaultPreferences()',
        'subscription.tier': 'deriveTierFromHistory(document)'
      }
    });

    // Create migration plan
    const migrationPlan = await schemaManager.createMigrationPlan('users', '1.0', '2.0', {
      batchSize: 500,
      concurrentBatches: 2,
      backupRequired: true,
      deploymentWindow: {
        start: '2024-01-15T02:00:00Z',
        end: '2024-01-15T06:00:00Z'
      }
    });

    console.log('Migration plan created:', migrationPlan.migration_id);

    // Generate backward compatibility layer
    const compatibilityLayer = await schemaManager.generateBackwardCompatibilityLayer('users', '1.0', '2.0');
    console.log('Backward compatibility layer generated');

    // Execute migration (if approved and in deployment window)
    if (process.env.EXECUTE_MIGRATION === 'true') {
      const migrationResult = await schemaManager.executeMigration(migrationPlan.migration_id, {
        strategy: 'gradual',
        autoRollback: true
      });

      console.log('Migration executed:', migrationResult);

      // Validate migration integrity
      const validationResults = await schemaManager.validateMigrationIntegrity('users', migrationPlan.migration_id);
      console.log('Migration validation:', validationResults.overall_status);
    }

  } catch (error) {
    console.error('Schema evolution demonstration error:', error);
  }
}

module.exports = {
  MongoSchemaEvolutionManager,
  demonstrateSchemaEvolution
};

Understanding MongoDB Schema Evolution Patterns

Advanced Migration Strategies and Version Management

Implement sophisticated schema evolution with enterprise-grade version control and migration orchestration:

// Production-ready schema evolution with advanced migration patterns
// (assumes the MongoSchemaEvolutionManager class above is exported from a local module; the path is illustrative)
const { MongoSchemaEvolutionManager } = require('./mongo-schema-evolution-manager');

class EnterpriseSchemaEvolutionManager extends MongoSchemaEvolutionManager {
  constructor(connectionUri, enterpriseConfig) {
    super(connectionUri, enterpriseConfig);

    this.enterpriseFeatures = {
      // Advanced migration orchestration
      migrationOrchestration: {
        distributedMigration: true,
        crossCollectionDependencies: true,
        transactionalMigration: true,
        rollbackOrchestration: true
      },

      // Enterprise integration
      enterpriseIntegration: {
        cicdIntegration: true,
        approvalWorkflows: true,
        auditCompliance: true,
        performanceMonitoring: true
      },

      // Advanced compatibility management
      compatibilityManagement: {
        multiVersionSupport: true,
        apiVersioning: true,
        clientCompatibilityTracking: true,
        automaticDeprecation: true
      }
    };
  }

  async orchestrateDistributedMigration(migrationConfig) {
    console.log('Orchestrating distributed migration across collections...');

    const distributedPlan = {
      // Cross-collection dependency management
      dependencyGraph: await this.analyzeCrossCollectionDependencies(migrationConfig.collections),

      // Coordinated execution strategy
      executionStrategy: {
        coordinationMethod: 'transaction', // transaction, phased, eventually_consistent
        consistencyLevel: 'strong', // strong, eventual, causal
        isolationLevel: 'snapshot', // snapshot, read_committed, read_uncommitted
        rollbackStrategy: 'coordinated' // coordinated, independent, manual
      },

      // Performance optimization
      performanceOptimization: {
        parallelCollections: true,
        resourceBalancing: true,
        priorityQueueing: true,
        adaptiveThrottling: true
      }
    };

    return await this.executeDistributedMigration(distributedPlan);
  }

  async implementSmartRollback(migrationId, rollbackConfig) {
    console.log('Implementing smart rollback with data recovery...');

    const rollbackStrategy = {
      // Intelligent rollback analysis
      rollbackAnalysis: {
        dataImpactAssessment: true,
        dependencyReversal: true,
        performanceImpactMinimization: true,
        dataConsistencyVerification: true
      },

      // Recovery mechanisms
      recoveryMechanisms: {
        pointInTimeRecovery: rollbackConfig.pointInTimeRecovery || false,
        incrementalRollback: rollbackConfig.incrementalRollback || false,
        dataReconciliation: rollbackConfig.dataReconciliation !== false,
        consistencyRepair: rollbackConfig.consistencyRepair !== false
      }
    };

    return await this.executeSmartRollback(migrationId, rollbackStrategy);
  }
}

SQL-Style Schema Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB schema evolution and migration management:

-- QueryLeaf schema evolution with SQL-familiar migration patterns

-- Define comprehensive schema version with validation and constraints
CREATE SCHEMA_VERSION users_v2 FOR COLLECTION users AS (
  -- Schema version metadata
  version_number = '2.0',
  version_name = 'enhanced_user_profiles',
  migration_type = 'additive',
  backward_compatible = true,

  -- Field definitions with validation rules
  field_definitions = JSON_OBJECT(
    '_id', JSON_OBJECT('type', 'ObjectId', 'required', true, 'primary_key', true),
    'email', JSON_OBJECT('type', 'String', 'required', true, 'unique', true, 'format', 'email'),
    'password_hash', JSON_OBJECT('type', 'String', 'required', true, 'min_length', 60),

    -- New nested profile structure
    'profile', JSON_OBJECT(
      'type', 'Object',
      'required', false,
      'fields', JSON_OBJECT(
        'first_name', JSON_OBJECT('type', 'String', 'max_length', 50),
        'last_name', JSON_OBJECT('type', 'String', 'max_length', 50),
        'display_name', JSON_OBJECT('type', 'String', 'max_length', 100),
        'avatar_url', JSON_OBJECT('type', 'String', 'format', 'url'),
        'bio', JSON_OBJECT('type', 'String', 'max_length', 500),
        'date_of_birth', JSON_OBJECT('type', 'Date', 'format', 'YYYY-MM-DD'),
        'location', JSON_OBJECT(
          'type', 'Object',
          'fields', JSON_OBJECT(
            'city', JSON_OBJECT('type', 'String'),
            'country', JSON_OBJECT('type', 'String', 'length', 2),
            'timezone', JSON_OBJECT('type', 'String')
          )
        )
      )
    ),

    -- Enhanced user preferences with defaults
    'preferences', JSON_OBJECT(
      'type', 'Object',
      'required', false,
      'default', JSON_OBJECT(
        'email_notifications', true,
        'privacy_level', 'public',
        'theme', 'light',
        'language', 'en'
      ),
      'fields', JSON_OBJECT(
        'email_notifications', JSON_OBJECT('type', 'Boolean', 'default', true),
        'privacy_level', JSON_OBJECT('type', 'String', 'enum', JSON_ARRAY('public', 'friends', 'private'), 'default', 'public'),
        'theme', JSON_OBJECT('type', 'String', 'enum', JSON_ARRAY('light', 'dark', 'auto'), 'default', 'light'),
        'language', JSON_OBJECT('type', 'String', 'pattern', '^[a-z]{2}$', 'default', 'en'),
        'notification_settings', JSON_OBJECT(
          'type', 'Object',
          'fields', JSON_OBJECT(
            'push_notifications', JSON_OBJECT('type', 'Boolean', 'default', true),
            'email_frequency', JSON_OBJECT('type', 'String', 'enum', JSON_ARRAY('immediate', 'daily', 'weekly'), 'default', 'daily')
          )
        )
      )
    ),

    -- Subscription and billing information
    'subscription', JSON_OBJECT(
      'type', 'Object',
      'required', false,
      'fields', JSON_OBJECT(
        'tier', JSON_OBJECT('type', 'String', 'enum', JSON_ARRAY('free', 'basic', 'plus', 'premium'), 'default', 'free'),
        'status', JSON_OBJECT('type', 'String', 'enum', JSON_ARRAY('active', 'cancelled', 'expired', 'trial'), 'default', 'active'),
        'starts_at', JSON_OBJECT('type', 'Date'),
        'expires_at', JSON_OBJECT('type', 'Date'),
        'auto_renewal', JSON_OBJECT('type', 'Boolean', 'default', false),
        'billing_cycle', JSON_OBJECT('type', 'String', 'enum', JSON_ARRAY('monthly', 'yearly'), 'default', 'monthly')
      )
    ),

    -- Activity tracking and analytics
    'activity_metrics', JSON_OBJECT(
      'type', 'Object',
      'required', false,
      'fields', JSON_OBJECT(
        'last_login_at', JSON_OBJECT('type', 'Date'),
        'login_count', JSON_OBJECT('type', 'Integer', 'min', 0, 'default', 0),
        'profile_completion_score', JSON_OBJECT('type', 'Integer', 'min', 0, 'max', 100, 'default', 0),
        'account_verification_status', JSON_OBJECT('type', 'String', 'enum', JSON_ARRAY('pending', 'verified', 'rejected'), 'default', 'pending'),
        'last_profile_update', JSON_OBJECT('type', 'Date'),
        'feature_usage_stats', JSON_OBJECT(
          'type', 'Object',
          'fields', JSON_OBJECT(
            'dashboard_visits', JSON_OBJECT('type', 'Integer', 'default', 0),
            'api_calls_count', JSON_OBJECT('type', 'Integer', 'default', 0),
            'storage_usage_bytes', JSON_OBJECT('type', 'Long', 'default', 0)
          )
        )
      )
    ),

    -- Timestamps and audit trail
    'created_at', JSON_OBJECT('type', 'Date', 'required', true, 'immutable', true),
    'updated_at', JSON_OBJECT('type', 'Date', 'required', true, 'auto_update', true),
    '_schema_version', JSON_OBJECT('type', 'String', 'required', true, 'default', '2.0')
  ),

  -- Migration mapping from previous version
  migration_mappings = JSON_OBJECT(
    -- Direct field mappings
    'last_login', 'activity_metrics.last_login_at',

    -- Computed field mappings
    'activity_metrics.login_count', 'COALESCE(login_count, 1)',
    'activity_metrics.profile_completion_score', 'CALCULATE_PROFILE_COMPLETION(profile)',
    'subscription.tier', 'DERIVE_TIER_FROM_USAGE(usage_history)',
    'preferences', 'GENERATE_DEFAULT_PREFERENCES()',
    'updated_at', 'CURRENT_TIMESTAMP'
  ),

  -- Validation rules for data integrity
  validation_rules = JSON_ARRAY(
    JSON_OBJECT('rule', 'email_domain_validation', 'expression', 'email REGEXP ''^[^@]+@[^@]+\\.[^@]+$'''),
    JSON_OBJECT('rule', 'subscription_dates_consistency', 'expression', 'subscription.expires_at > subscription.starts_at'),
    JSON_OBJECT('rule', 'profile_completion_accuracy', 'expression', 'activity_metrics.profile_completion_score <= 100'),
    JSON_OBJECT('rule', 'timezone_validation', 'expression', 'profile.location.timezone IN (SELECT timezone FROM valid_timezones)')
  ),

  -- Index optimization for new schema
  index_definitions = JSON_ARRAY(
    JSON_OBJECT('fields', JSON_OBJECT('email', 1), 'unique', true, 'sparse', false),
    JSON_OBJECT('fields', JSON_OBJECT('subscription.tier', 1, 'subscription.status', 1), 'background', true),
    JSON_OBJECT('fields', JSON_OBJECT('activity_metrics.last_login_at', -1), 'background', true),
    JSON_OBJECT('fields', JSON_OBJECT('profile.location.country', 1), 'sparse', true),
    JSON_OBJECT('fields', JSON_OBJECT('_schema_version', 1), 'background', true)
  ),

  -- Compatibility and deprecation settings
  compatibility_settings = JSON_OBJECT(
    'maintain_old_fields_days', 90,
    'deprecated_fields', JSON_ARRAY('last_login', 'login_count'),
    'breaking_changes', JSON_ARRAY(),
    'migration_required_for', JSON_ARRAY('v1.0', 'v1.5')
  )
);

-- Create comprehensive migration plan with performance optimization
WITH migration_analysis AS (
  SELECT 
    collection_name,
    current_schema_version,
    target_schema_version,

    -- Document analysis for migration planning
    COUNT(*) as total_documents,
    AVG(BSON_SIZE(document)) as avg_document_size,
    SUM(BSON_SIZE(document)) / 1024 / 1024 as total_size_mb,

    -- Performance projections
    CASE 
      WHEN COUNT(*) > 10000000 THEN 'large_collection_parallel_required'
      WHEN COUNT(*) > 1000000 THEN 'medium_collection_batch_optimize'
      ELSE 'small_collection_standard_processing'
    END as processing_category,

    -- Migration complexity assessment
    CASE 
      WHEN target_schema_version LIKE '%.0' THEN 'major_version_comprehensive_testing'
      WHEN COUNT_SCHEMA_CHANGES(current_schema_version, target_schema_version) > 10 THEN 'complex_migration'
      ELSE 'standard_migration'
    END as migration_complexity,

    -- Resource requirements estimation
    CEIL(COUNT(*) / 1000.0) as estimated_batches,
    CEIL((SUM(BSON_SIZE(document)) / 1024 / 1024) / 100.0) * 2 as estimated_duration_minutes,
    CEIL(COUNT(*) / 10000.0) * 512 as estimated_memory_mb

  FROM users u
  JOIN schema_version_registry svr ON u._schema_version = svr.version
  WHERE svr.collection_name = 'users'
  GROUP BY collection_name, current_schema_version, target_schema_version
),

-- Generate optimized migration execution plan
migration_execution_plan AS (
  SELECT 
    ma.*,

    -- Batch processing configuration
    CASE ma.processing_category
      WHEN 'large_collection_parallel_required' THEN 
        JSON_OBJECT(
          'batch_size', 500,
          'concurrent_batches', 5,
          'parallel_collections', true,
          'memory_limit_per_batch_mb', 256,
          'throttle_delay_ms', 50
        )
      WHEN 'medium_collection_batch_optimize' THEN
        JSON_OBJECT(
          'batch_size', 1000,
          'concurrent_batches', 3,
          'parallel_collections', false,
          'memory_limit_per_batch_mb', 128,
          'throttle_delay_ms', 10
        )
      ELSE
        JSON_OBJECT(
          'batch_size', 2000,
          'concurrent_batches', 1,
          'parallel_collections', false,
          'memory_limit_per_batch_mb', 64,
          'throttle_delay_ms', 0
        )
    END as batch_configuration,

    -- Safety and rollback configuration
    JSON_OBJECT(
      'backup_required', CASE WHEN ma.total_documents > 100000 THEN true ELSE false END,
      'rollback_enabled', true,
      'validation_sample_size', LEAST(ma.total_documents * 0.1, 10000),
      'progress_checkpoint_interval', GREATEST(ma.estimated_batches / 10, 1),
      'failure_threshold_percent', 5.0
    ) as safety_configuration,

    -- Performance monitoring setup
    JSON_OBJECT(
      'monitor_memory_usage', true,
      'monitor_throughput', true,
      'monitor_lock_contention', true,
      'alert_on_slowdown_percent', 50,
      'performance_baseline_samples', 100
    ) as monitoring_configuration

  FROM migration_analysis ma
)

-- Create and execute migration plan
CREATE MIGRATION_PLAN users_v1_to_v2 AS (
  SELECT 
    mep.*,

    -- Migration steps with detailed transformations
    JSON_ARRAY(
      -- Step 1: Add new schema version field
      JSON_OBJECT(
        'step_number', 1,
        'step_type', 'add_field',
        'field_name', '_schema_version',
        'default_value', '2.0',
        'description', 'Add schema version tracking'
      ),

      -- Step 2: Restructure activity data
      JSON_OBJECT(
        'step_number', 2,
        'step_type', 'nested_restructure',
        'restructure_config', JSON_OBJECT(
          'create_nested_object', 'activity_metrics',
          'field_mappings', JSON_OBJECT(
            'last_login', 'activity_metrics.last_login_at',
            'login_count', 'activity_metrics.login_count'
          ),
          'computed_fields', JSON_OBJECT(
            'activity_metrics.profile_completion_score', 'CALCULATE_PROFILE_COMPLETION(profile)',
            'activity_metrics.account_verification_status', '''pending'''
          )
        )
      ),

      -- Step 3: Generate default preferences
      JSON_OBJECT(
        'step_number', 3,
        'step_type', 'add_field',
        'field_name', 'preferences',
        'transformation', 'GENERATE_DEFAULT_PREFERENCES()',
        'description', 'Add user preferences with smart defaults'
      ),

      -- Step 4: Initialize subscription data
      JSON_OBJECT(
        'step_number', 4,
        'step_type', 'add_field',
        'field_name', 'subscription',
        'transformation', 'DERIVE_SUBSCRIPTION_INFO(user_history)',
        'description', 'Initialize subscription information from usage history'
      ),

      -- Step 5: Update timestamps
      JSON_OBJECT(
        'step_number', 5,
        'step_type', 'add_field',
        'field_name', 'updated_at',
        'default_value', 'CURRENT_TIMESTAMP',
        'description', 'Add updated timestamp for audit trail'
      )
    ) as migration_steps,

    -- Validation and verification tests
    JSON_ARRAY(
      JSON_OBJECT(
        'test_name', 'schema_version_consistency',
        'test_query', 'SELECT COUNT(*) FROM users WHERE _schema_version != ''2.0''',
        'expected_result', 0,
        'severity', 'critical'
      ),
      JSON_OBJECT(
        'test_name', 'data_completeness_check',
        'test_query', 'SELECT COUNT(*) FROM users WHERE activity_metrics IS NULL',
        'expected_result', 0,
        'severity', 'critical'
      ),
      JSON_OBJECT(
        'test_name', 'preferences_initialization',
        'test_query', 'SELECT COUNT(*) FROM users WHERE preferences IS NULL',
        'expected_result', 0,
        'severity', 'high'
      ),
      JSON_OBJECT(
        'test_name', 'profile_completion_accuracy',
        'test_query', 'SELECT COUNT(*) FROM users WHERE activity_metrics.profile_completion_score < 0 OR activity_metrics.profile_completion_score > 100',
        'expected_result', 0,
        'severity', 'medium'
      )
    ) as validation_tests

  FROM migration_execution_plan mep
);

-- Execute migration with comprehensive monitoring and safety checks
EXECUTE MIGRATION users_v1_to_v2 WITH OPTIONS (
  -- Execution settings
  execution_mode = 'gradual',  -- gradual, immediate, test_mode
  safety_checks_enabled = true,
  automatic_rollback = true,

  -- Performance settings
  resource_limits = JSON_OBJECT(
    'max_memory_usage_mb', 1024,
    'max_execution_time_minutes', 120,
    'max_cpu_usage_percent', 80,
    'io_throttling_enabled', true
  ),

  -- Monitoring and alerting
  monitoring = JSON_OBJECT(
    'progress_reporting_interval_seconds', 30,
    'performance_metrics_collection', true,
    'alert_on_errors', true,
    'alert_email', 'dba@company.com'
  ),

  -- Backup and recovery
  backup_settings = JSON_OBJECT(
    'create_backup_before_migration', true,
    'backup_location', 'migrations/backup_users_v1_to_v2',
    'verify_backup_integrity', true
  )
);

-- Monitor migration progress with real-time analytics
WITH migration_progress AS (
  SELECT 
    migration_id,
    execution_id,
    collection_name,
    schema_version_from,
    schema_version_to,

    -- Progress tracking
    total_documents,
    processed_documents,
    ROUND((processed_documents::numeric / total_documents) * 100, 2) as progress_percentage,

    -- Performance metrics
    EXTRACT(EPOCH FROM CURRENT_TIMESTAMP - migration_started_at) as elapsed_seconds,
    ROUND(processed_documents::numeric / NULLIF(EXTRACT(EPOCH FROM CURRENT_TIMESTAMP - migration_started_at), 0), 2) as documents_per_second,

    -- Resource utilization
    current_memory_usage_mb,
    peak_memory_usage_mb,
    cpu_usage_percent,

    -- Quality indicators
    error_count,
    warning_count,
    validation_failures,

    -- ETA calculation
    CASE 
      WHEN processed_documents > 0 AND migration_status = 'running' THEN
        CURRENT_TIMESTAMP + 
        (INTERVAL '1 second' * 
         ((total_documents - processed_documents) / 
          (processed_documents::numeric / EXTRACT(EPOCH FROM CURRENT_TIMESTAMP - migration_started_at))))
      ELSE NULL
    END as estimated_completion_time,

    migration_status

  FROM migration_execution_status
  WHERE migration_status IN ('running', 'validating', 'finalizing')
),

-- Performance trend analysis
performance_trends AS (
  SELECT 
    migration_id,

    -- Throughput trends (last 5 minutes)
    AVG(documents_per_second) OVER (
      ORDER BY checkpoint_timestamp 
      ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
    ) as avg_throughput_5min,

    -- Memory usage trends
    AVG(memory_usage_mb) OVER (
      ORDER BY checkpoint_timestamp
      ROWS BETWEEN 9 PRECEDING AND CURRENT ROW  
    ) as avg_memory_usage_10min,

    -- Error rate trends
    SUM(errors_since_last_checkpoint) OVER (
      ORDER BY checkpoint_timestamp
      ROWS BETWEEN 19 PRECEDING AND CURRENT ROW
    ) as error_count_20min,

    -- Performance indicators
    CASE 
      WHEN documents_per_second < avg_documents_per_second * 0.7 THEN 'degraded_performance'
      WHEN memory_usage_mb > peak_memory_usage_mb * 0.9 THEN 'high_memory_usage'
      WHEN error_count > 0 THEN 'errors_detected'
      ELSE 'healthy'
    END as health_status

  FROM migration_performance_checkpoints
  WHERE checkpoint_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
)

-- Migration monitoring dashboard
SELECT 
  -- Current status overview
  mp.migration_id,
  mp.collection_name,
  mp.progress_percentage || '%' as progress,
  mp.documents_per_second || ' docs/sec' as throughput,
  mp.estimated_completion_time,
  mp.migration_status,

  -- Resource utilization
  mp.current_memory_usage_mb || 'MB (' || 
    ROUND((mp.current_memory_usage_mb::numeric / mp.peak_memory_usage_mb) * 100, 1) || '% of peak)' as memory_usage,
  mp.cpu_usage_percent || '%' as cpu_usage,

  -- Quality indicators
  mp.error_count as errors,
  mp.warning_count as warnings,
  mp.validation_failures as validation_issues,

  -- Performance health
  pt.health_status,
  pt.avg_throughput_5min || ' docs/sec (5min avg)' as recent_throughput,

  -- Recommendations
  CASE 
    WHEN pt.health_status = 'degraded_performance' THEN 'Consider reducing batch size or increasing resources'
    WHEN pt.health_status = 'high_memory_usage' THEN 'Monitor for potential memory issues'
    WHEN pt.health_status = 'errors_detected' THEN 'Review error logs and consider pausing migration'
    WHEN mp.progress_percentage > 95 THEN 'Migration nearing completion, prepare for validation'
    ELSE 'Migration proceeding normally'
  END as recommendation,

  -- Next actions
  CASE 
    WHEN mp.migration_status = 'running' AND mp.progress_percentage > 99 THEN 'Begin final validation phase'
    WHEN mp.migration_status = 'validating' THEN 'Performing post-migration validation tests'
    WHEN mp.migration_status = 'finalizing' THEN 'Completing migration and cleanup'
    ELSE 'Continue monitoring progress'
  END as next_action

FROM migration_progress mp
LEFT JOIN performance_trends pt ON mp.migration_id = pt.migration_id
WHERE mp.migration_id = (SELECT MAX(migration_id) FROM migration_progress)

UNION ALL

-- Historical migration performance summary
SELECT 
  'HISTORICAL_SUMMARY' as migration_id,
  collection_name,
  NULL as progress,
  AVG(final_throughput) || ' docs/sec avg' as throughput,
  NULL as estimated_completion_time,
  'completed' as migration_status,
  AVG(peak_memory_usage_mb) || 'MB avg peak' as memory_usage,
  AVG(avg_cpu_usage_percent) || '% avg' as cpu_usage,
  SUM(total_errors) as errors,
  SUM(total_warnings) as warnings,
  SUM(validation_failures) as validation_issues,

  CASE 
    WHEN AVG(success_rate) > 99 THEN 'excellent_historical_performance'
    WHEN AVG(success_rate) > 95 THEN 'good_historical_performance'
    ELSE 'performance_issues_detected'
  END as health_status,

  COUNT(*) || ' previous migrations' as recent_throughput,
  'Historical performance baseline' as recommendation,
  'Use for future migration planning' as next_action

FROM migration_history
WHERE migration_completed_at >= CURRENT_DATE - INTERVAL '6 months'
  AND collection_name = 'users'
GROUP BY collection_name;

-- QueryLeaf schema evolution capabilities:
-- 1. SQL-familiar schema version definition with comprehensive validation rules
-- 2. Automated migration plan generation with performance optimization
-- 3. Advanced batch processing configuration based on collection size and complexity
-- 4. Real-time migration monitoring with progress tracking and performance analytics
-- 5. Comprehensive safety checks including automatic rollback and validation testing
-- 6. Backward compatibility management with deprecated field handling
-- 7. Resource utilization monitoring and optimization recommendations
-- 8. Historical performance analysis for migration planning and optimization
-- 9. Enterprise-grade error handling and recovery mechanisms
-- 10. Integration with MongoDB's native document flexibility while maintaining SQL familiarity
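
For orientation, the nested restructure step defined above corresponds roughly to an updateMany with an aggregation-pipeline update in the native Node.js driver. The sketch below is an illustrative assumption about that mapping (the database name, connection handling, and exact pipeline are placeholders), not QueryLeaf's actual generated output:

// Minimal sketch: one migration step (restructure activity fields) expressed
// directly against the Node.js driver. Field names follow the example above;
// the pipeline QueryLeaf generates may differ.
const { MongoClient } = require('mongodb');

async function migrateActivityFields(uri) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const users = client.db('app').collection('users');

    // Only touch documents still on the old schema version
    const result = await users.updateMany(
      { _schema_version: { $ne: '2.0' } },
      [
        {
          $set: {
            activity_metrics: {
              last_login_at: '$last_login',
              login_count: { $ifNull: ['$login_count', 0] }
            },
            _schema_version: '2.0',
            updated_at: '$$NOW'
          }
        },
        { $unset: ['last_login', 'login_count'] }
      ]
    );

    console.log(`Migrated ${result.modifiedCount} documents`);
  } finally {
    await client.close();
  }
}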

Best Practices for MongoDB Schema Evolution

Migration Strategy Design

Essential principles for effective MongoDB schema evolution and migration management:

  1. Gradual Evolution: Implement incremental schema changes that support both old and new document structures during transition periods
  2. Version Tracking: Maintain explicit schema version fields in documents to enable targeted migration and compatibility management (see the sketch after this list)
  3. Backward Compatibility: Design migrations that preserve application functionality across deployment cycles and rollback scenarios
  4. Performance Optimization: Utilize batch processing, indexing strategies, and resource throttling to minimize production impact
  5. Validation and Testing: Implement comprehensive validation frameworks that verify data integrity and schema compliance
  6. Rollback Planning: Design robust rollback strategies with automated recovery mechanisms for migration failures
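
To make the first two principles concrete, one common pattern is to upgrade documents lazily as the application reads them, using the explicit version field to decide whether any work is needed. A minimal sketch, reusing the users collection and field names from the earlier example:

// Lazy, version-aware upgrade on read: old documents keep working, and each
// one is migrated the first time it is touched. Collection and field names
// are assumptions based on the earlier example.
async function getUser(db, userId) {
  const users = db.collection('users');
  const user = await users.findOne({ _id: userId });
  if (!user) return null;

  if (user._schema_version !== '2.0') {
    // Upgrade in place; the filter on _schema_version keeps this idempotent
    // even if two readers race to migrate the same document.
    await users.updateOne(
      { _id: userId, _schema_version: { $ne: '2.0' } },
      {
        $set: {
          _schema_version: '2.0',
          'activity_metrics.last_login_at': user.last_login || null,
          'activity_metrics.login_count': user.login_count || 0,
          updated_at: new Date()
        },
        $unset: { last_login: '', login_count: '' }
      }
    );
    return users.findOne({ _id: userId });
  }

  return user;
}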

Production Deployment Strategies

Optimize MongoDB schema evolution for enterprise-scale applications:

  1. Zero-Downtime Migrations: Implement rolling migration strategies that maintain application availability during schema transitions
  2. Resource Management: Configure memory limits, CPU throttling, and I/O optimization to prevent system impact during migrations (a batching sketch follows this list)
  3. Monitoring and Alerting: Deploy real-time monitoring systems that track migration progress, performance, and error conditions
  4. Documentation and Compliance: Maintain comprehensive migration documentation and audit trails for regulatory compliance
  5. Testing and Validation: Establish staging environments that replicate production conditions for migration testing and validation
  6. Team Coordination: Implement approval workflows and deployment coordination processes for enterprise migration management
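
The resource-management guidance above largely comes down to pacing: process documents in bounded batches and pause between them so the migration never saturates the primary. A minimal sketch of that pattern (batch size and delay values are illustrative, not tuned recommendations):

// Batched migration with throttling: bounded memory (one batch of _ids at a
// time) and a pause between batches to leave headroom for production traffic.
async function migrateInBatches(db, { batchSize = 1000, delayMs = 50 } = {}) {
  const users = db.collection('users');
  let migrated = 0;

  for (;;) {
    // Grab the next batch of unmigrated document ids
    const batch = await users
      .find({ _schema_version: { $ne: '2.0' } }, { projection: { _id: 1 } })
      .limit(batchSize)
      .toArray();

    if (batch.length === 0) break;

    const ids = batch.map(doc => doc._id);
    const result = await users.updateMany(
      { _id: { $in: ids }, _schema_version: { $ne: '2.0' } },
      { $set: { _schema_version: '2.0', updated_at: new Date() } }
    );
    migrated += result.modifiedCount;

    // Throttle so the migration shares resources with live traffic
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }

  return migrated;
}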

Conclusion

MongoDB schema evolution provides comprehensive capabilities for managing database structure changes through flexible document models, automated migration frameworks, and sophisticated compatibility management systems. The document-based architecture enables gradual schema transitions that maintain application stability while supporting continuous evolution of data models and business requirements.

Key MongoDB Schema Evolution benefits include:

  • Flexible Migration Strategies: Support for gradual, immediate, and hybrid migration approaches that adapt to different application requirements and constraints
  • Zero-Downtime Evolution: Advanced migration patterns that maintain application availability during schema transitions and data transformations
  • Comprehensive Version Management: Sophisticated version tracking and compatibility management that supports multiple application versions simultaneously
  • Performance Optimization: Intelligent batch processing and resource management that minimizes production system impact during migrations
  • Automated Validation: Built-in validation frameworks that ensure data integrity and schema compliance throughout migration processes
  • Enterprise Integration: Advanced orchestration capabilities that integrate with CI/CD pipelines, approval workflows, and enterprise monitoring systems

Whether you're evolving simple document structures, implementing complex data transformations, or managing enterprise-scale schema migrations, MongoDB's schema evolution capabilities with QueryLeaf's familiar SQL interface provide the foundation for robust, maintainable database evolution strategies.

QueryLeaf Integration: QueryLeaf automatically translates SQL-style schema definition and migration commands into optimized MongoDB operations, providing familiar DDL syntax for schema versions, migration plan creation, and execution monitoring. Advanced schema evolution patterns, backward compatibility management, and performance optimization are seamlessly accessible through SQL constructs, making sophisticated database evolution both powerful and approachable for SQL-oriented development teams.

The combination of MongoDB's flexible schema capabilities with SQL-style migration management makes it an ideal platform for modern applications requiring both database evolution flexibility and operational simplicity, ensuring your schema management processes can scale efficiently while maintaining data integrity and application stability throughout continuous development cycles.

MongoDB Concurrent Operations and Race Condition Management: Advanced Multi-User Data Integrity with Optimistic Locking and Conflict Resolution

Modern applications face increasing concurrency challenges as user bases grow and systems become more distributed. Multiple users modifying the same data simultaneously, background processes running automated updates, and microservices accessing shared resources create complex race condition scenarios that can lead to data corruption, inconsistent states, and lost updates.

Traditional approaches to concurrency control often rely on pessimistic locking mechanisms that can create bottlenecks, deadlocks, and reduced system throughput. MongoDB's flexible document model and powerful atomic operations provide sophisticated tools for managing concurrent operations while maintaining high performance and data integrity.

The Concurrency Challenge

Traditional relational databases handle concurrency through locking mechanisms that can limit scalability:

-- Traditional pessimistic locking approach - blocks other users
BEGIN TRANSACTION;

-- Exclusive lock prevents other transactions from reading/writing
SELECT account_balance 
FROM accounts 
WHERE account_id = 12345 
FOR UPDATE;  -- Blocks all other operations

-- Update after acquiring lock
UPDATE accounts 
SET account_balance = account_balance - 500.00,
    last_transaction = CURRENT_TIMESTAMP
WHERE account_id = 12345;

-- Transaction processing during exclusive lock
INSERT INTO transactions (
    account_id, 
    transaction_type, 
    amount, 
    timestamp
) VALUES (12345, 'withdrawal', -500.00, CURRENT_TIMESTAMP);

COMMIT TRANSACTION;

-- Problems with pessimistic locking:
-- - Reduced concurrency due to blocking
-- - Potential for deadlocks with multiple locks
-- - Performance bottlenecks under high load
-- - Lock timeouts and failed operations
-- - Complex lock hierarchy management
-- - Reduced system scalability

MongoDB provides optimistic concurrency control and atomic operations that maintain data integrity without blocking:

// MongoDB optimistic concurrency with atomic operations
async function transferFunds(fromAccount, toAccount, amount) {
  const session = client.startSession();

  try {
    return await session.withTransaction(async () => {
      // Read current state without locking
      const fromAccountDoc = await db.collection('accounts').findOne(
        { accountId: fromAccount }, 
        { session }
      );

      const toAccountDoc = await db.collection('accounts').findOne(
        { accountId: toAccount }, 
        { session }
      );

      // Verify sufficient balance
      if (fromAccountDoc.balance < amount) {
        throw new Error('Insufficient funds');
      }

      // Atomic update with optimistic concurrency control
      const fromResult = await db.collection('accounts').updateOne(
        { 
          accountId: fromAccount, 
          version: fromAccountDoc.version,  // Optimistic lock
          balance: { $gte: amount }         // Additional safety check
        },
        { 
          $inc: { 
            balance: -amount,
            version: 1                      // Increment version
          },
          $set: { 
            lastModified: new Date(),
            lastTransaction: ObjectId()
          }
        },
        { session }
      );

      // Check if update succeeded (no race condition)
      if (fromResult.modifiedCount === 0) {
        throw new Error('Account modified by another operation - retry');
      }

      // Atomic credit to destination account
      const toResult = await db.collection('accounts').updateOne(
        { 
          accountId: toAccount,
          version: toAccountDoc.version
        },
        { 
          $inc: { 
            balance: amount,
            version: 1
          },
          $set: { 
            lastModified: new Date(),
            lastTransaction: ObjectId()
          }
        },
        { session }
      );

      if (toResult.modifiedCount === 0) {
        throw new Error('Destination account modified - retry');
      }

      // Record transaction atomically (reuse one id so the caller receives the stored id)
      const transactionId = new ObjectId();
      await db.collection('transactions').insertOne({
        transactionId: transactionId,
        fromAccount: fromAccount,
        toAccount: toAccount,
        amount: amount,
        timestamp: new Date(),
        status: 'completed',
        sessionId: session.id
      }, { session });

      return { success: true, transactionId: transactionId };
    });

  } catch (error) {
    console.error('Transaction failed:', error.message);
    throw error;
  } finally {
    await session.endSession();
  }
}

// Benefits of optimistic concurrency:
// - High concurrency without blocking
// - No deadlock scenarios
// - Automatic conflict detection and retry
// - Maintains ACID properties through transactions
// - Scalable under high load
// - Flexible conflict resolution strategies
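
Because the version checks above fail fast instead of blocking, callers are expected to retry when a conflict is reported. A minimal sketch of such a caller, assuming the transferFunds function above is in scope; the retry limit and backoff schedule are illustrative:

// Caller-side retry loop for the optimistic transfer above: conflicts surface
// as errors, so we back off and try again a bounded number of times.
async function transferWithRetry(fromAccount, toAccount, amount, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await transferFunds(fromAccount, toAccount, amount);
    } catch (error) {
      const retryable = /retry|modified/i.test(error.message);
      if (!retryable || attempt === maxRetries) {
        throw error;
      }
      // Exponential backoff: 100ms, 200ms, 400ms, ...
      const delay = 100 * Math.pow(2, attempt);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}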

Understanding Concurrent Operations in MongoDB

Optimistic Locking and Version Control

Implement sophisticated version-based concurrency control:

// Advanced optimistic locking system
class OptimisticLockManager {
  constructor(db) {
    this.db = db;
    this.retryConfig = {
      maxRetries: 3,
      baseDelay: 100,
      maxDelay: 1000,
      backoffFactor: 2
    };
  }

  async updateWithOptimisticLock(collection, filter, update, options = {}) {
    const maxRetries = options.maxRetries || this.retryConfig.maxRetries;
    let attempt = 0;

    while (attempt <= maxRetries) {
      try {
        // Get current document with version
        const currentDoc = await this.db.collection(collection).findOne(filter);

        if (!currentDoc) {
          throw new Error('Document not found');
        }

        // Ensure document has version field
        const currentVersion = currentDoc.version || 0;

        // Prepare update with version increment
        const versionedUpdate = {
          ...update,
          $inc: {
            ...(update.$inc || {}),
            version: 1
          },
          $set: {
            ...(update.$set || {}),
            lastModified: new Date(),
            modifiedBy: options.userId || 'system'
          }
        };

        // Atomic update with version check
        const result = await this.db.collection(collection).updateOne(
          { 
            ...filter,
            version: currentVersion  // Optimistic lock condition
          },
          versionedUpdate,
          options.mongoOptions || {}
        );

        if (result.modifiedCount === 0) {
          // Document was modified by another operation
          throw new OptimisticLockError(
            `Document modified by another operation. Expected version: ${currentVersion}`
          );
        }

        // Success - return updated document info
        return {
          success: true,
          previousVersion: currentVersion,
          newVersion: currentVersion + 1,
          modifiedCount: result.modifiedCount,
          attempt: attempt + 1
        };

      } catch (error) {
        if (error instanceof OptimisticLockError && attempt < maxRetries) {
          // Retry with exponential backoff
          const delay = Math.min(
            this.retryConfig.baseDelay * Math.pow(this.retryConfig.backoffFactor, attempt),
            this.retryConfig.maxDelay
          );

          console.log(`Optimistic lock retry ${attempt + 1}/${maxRetries} after ${delay}ms`);
          await this.sleep(delay);
          attempt++;
          continue;
        }

        // Max retries exceeded or non-retryable error
        throw error;
      }
    }
  }

  async updateManyWithOptimisticLock(collection, documents, updateFunction, options = {}) {
    // Batch optimistic locking for multiple documents
    const session = this.db.client.startSession();
    const results = [];

    try {
      await session.withTransaction(async () => {
        for (const docFilter of documents) {
          const currentDoc = await this.db.collection(collection).findOne(
            docFilter, 
            { session }
          );

          if (!currentDoc) {
            throw new Error(`Document not found: ${JSON.stringify(docFilter)}`);
          }

          // Apply update function to get changes
          const update = await updateFunction(currentDoc, docFilter);
          const currentVersion = currentDoc.version || 0;

          // Atomic update with version check
          const result = await this.db.collection(collection).updateOne(
            { 
              ...docFilter,
              version: currentVersion
            },
            {
              ...update,
              $inc: {
                ...(update.$inc || {}),
                version: 1
              },
              $set: {
                ...(update.$set || {}),
                lastModified: new Date(),
                batchId: options.batchId || ObjectId()
              }
            },
            { session }
          );

          if (result.modifiedCount === 0) {
            throw new OptimisticLockError(
              `Batch update failed - document modified: ${JSON.stringify(docFilter)}`
            );
          }

          results.push({
            filter: docFilter,
            previousVersion: currentVersion,
            newVersion: currentVersion + 1,
            success: true
          });
        }
      });

      return {
        success: true,
        totalUpdated: results.length,
        results: results
      };

    } catch (error) {
      return {
        success: false,
        error: error.message,
        partialResults: results
      };
    } finally {
      await session.endSession();
    }
  }

  async compareAndSwap(collection, filter, expectedValue, newValue, options = {}) {
    // Compare-and-swap operation for atomic value updates
    const valueField = options.valueField || 'value';
    const versionField = options.versionField || 'version';

    const result = await this.db.collection(collection).updateOne(
      {
        ...filter,
        [valueField]: expectedValue,  // Current value must match
        ...(options.expectedVersion && { [versionField]: options.expectedVersion })
      },
      {
        $set: {
          [valueField]: newValue,
          lastModified: new Date(),
          modifiedBy: options.userId || 'system'
        },
        $inc: {
          [versionField]: 1
        }
      }
    );

    return {
      success: result.modifiedCount > 0,
      matched: result.matchedCount > 0,
      modified: result.modifiedCount,
      wasExpectedValue: result.matchedCount > 0
    };
  }

  async createVersionedDocument(collection, document, options = {}) {
    // Create new document with initial version
    const versionedDoc = {
      ...document,
      version: 1,
      createdAt: new Date(),
      lastModified: new Date(),
      createdBy: options.userId || 'system'
    };

    try {
      const result = await this.db.collection(collection).insertOne(
        versionedDoc,
        options.mongoOptions || {}
      );

      return {
        success: true,
        documentId: result.insertedId,
        version: 1
      };
    } catch (error) {
      if (error.code === 11000) { // Duplicate key error
        throw new Error('Document already exists with the same unique identifier');
      }
      throw error;
    }
  }

  async getDocumentVersion(collection, filter) {
    // Get current document version
    const doc = await this.db.collection(collection).findOne(
      filter, 
      { projection: { version: 1, lastModified: 1 } }
    );

    return doc ? {
      exists: true,
      version: doc.version || 0,
      lastModified: doc.lastModified
    } : {
      exists: false,
      version: null,
      lastModified: null
    };
  }

  async getVersionHistory(collection, filter, options = {}) {
    // Get version history if audit trail is maintained
    const limit = options.limit || 10;
    const auditCollection = `${collection}_audit`;

    const history = await this.db.collection(auditCollection).find(
      filter,
      { 
        sort: { version: -1, timestamp: -1 },
        limit: limit
      }
    ).toArray();

    return history;
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Custom error class for optimistic locking
class OptimisticLockError extends Error {
  constructor(message) {
    super(message);
    this.name = 'OptimisticLockError';
  }
}
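
A short usage sketch for the lock manager above, assuming a connected MongoClient, the OptimisticLockManager class in scope, and a hypothetical profiles collection; database and field names are placeholders:

// Example usage of OptimisticLockManager: version-checked profile update
// with automatic retry on conflict.
const { MongoClient } = require('mongodb');

async function updateDisplayName(uri, userId, newName) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const db = client.db('app');
    const lockManager = new OptimisticLockManager(db);

    const result = await lockManager.updateWithOptimisticLock(
      'profiles',
      { userId: userId },
      { $set: { displayName: newName } },
      { userId: userId, maxRetries: 5 }
    );

    console.log(`Updated to version ${result.newVersion} on attempt ${result.attempt}`);
    return result;
  } finally {
    await client.close();
  }
}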

Atomic Operations and Race Condition Prevention

Implement atomic operations to prevent race conditions:

// Advanced atomic operations for race condition prevention
class AtomicOperationManager {
  constructor(db) {
    this.db = db;
    this.operationLog = db.collection('atomic_operations_log');
  }

  async atomicIncrement(collection, filter, field, incrementValue = 1, options = {}) {
    // Thread-safe atomic increment with bounds checking
    const session = this.db.client.startSession();

    try {
      return await session.withTransaction(async () => {
        // Get current value
        const doc = await this.db.collection(collection).findOne(filter, { session });

        if (!doc) {
          throw new Error('Document not found for atomic increment');
        }

        const currentValue = doc[field] || 0;
        const newValue = currentValue + incrementValue;

        // Validate bounds if specified
        if (options.min !== undefined && newValue < options.min) {
          throw new Error(`Increment would violate minimum bound: ${options.min}`);
        }

        if (options.max !== undefined && newValue > options.max) {
          throw new Error(`Increment would violate maximum bound: ${options.max}`);
        }

        // Atomic increment with bounds re-checked in the filter so the update
        // still respects [min, max] even if another writer changed the value
        // between the read above and this update
        const updateFilter = {
          ...filter,
          [field]: {
            $gte: (options.min ?? Number.MIN_SAFE_INTEGER) - incrementValue,
            $lte: (options.max ?? Number.MAX_SAFE_INTEGER) - incrementValue
          }
        };

        const result = await this.db.collection(collection).updateOne(
          updateFilter,
          {
            $inc: { [field]: incrementValue },
            $set: { 
              lastModified: new Date(),
              lastIncrementBy: incrementValue
            }
          },
          { session }
        );

        if (result.modifiedCount === 0) {
          throw new Error('Atomic increment failed - bounds violated or document modified');
        }

        // Log successful operation
        await this.logAtomicOperation({
          operation: 'increment',
          collection: collection,
          filter: filter,
          field: field,
          incrementValue: incrementValue,
          previousValue: currentValue,
          newValue: newValue,
          timestamp: new Date()
        }, session);

        return {
          success: true,
          previousValue: currentValue,
          newValue: newValue,
          incrementValue: incrementValue
        };
      });
    } finally {
      await session.endSession();
    }
  }

  async atomicArrayOperation(collection, filter, arrayField, operation, value, options = {}) {
    // Thread-safe atomic array operations
    const session = this.db.client.startSession();

    try {
      return await session.withTransaction(async () => {
        const doc = await this.db.collection(collection).findOne(filter, { session });

        if (!doc) {
          throw new Error('Document not found for atomic array operation');
        }

        const currentArray = doc[arrayField] || [];
        let updateOperation = {};
        let operationResult = {};

        switch (operation) {
          case 'push':
            // Add element if not exists (optional uniqueness)
            if (options.unique && currentArray.includes(value)) {
              operationResult = {
                success: false,
                reason: 'duplicate_value',
                currentArray: currentArray
              };
            } else {
              updateOperation = { $push: { [arrayField]: value } };
              operationResult = {
                success: true,
                operation: 'push',
                value: value,
                newLength: currentArray.length + 1
              };
            }
            break;

          case 'pull':
            // Remove specific value
            if (!currentArray.includes(value)) {
              operationResult = {
                success: false,
                reason: 'value_not_found',
                currentArray: currentArray
              };
            } else {
              updateOperation = { $pull: { [arrayField]: value } };
              operationResult = {
                success: true,
                operation: 'pull',
                value: value,
                newLength: currentArray.length - 1
              };
            }
            break;

          case 'addToSet':
            // Add unique value to set
            updateOperation = { $addToSet: { [arrayField]: value } };
            operationResult = {
              success: true,
              operation: 'addToSet',
              value: value,
              wasAlreadyPresent: currentArray.includes(value)
            };
            break;

          case 'pop':
            // Remove last element
            if (currentArray.length === 0) {
              operationResult = {
                success: false,
                reason: 'array_empty',
                currentArray: currentArray
              };
            } else {
              updateOperation = { $pop: { [arrayField]: 1 } }; // Remove last
              operationResult = {
                success: true,
                operation: 'pop',
                removedValue: currentArray[currentArray.length - 1],
                newLength: currentArray.length - 1
              };
            }
            break;

          default:
            throw new Error(`Unsupported atomic array operation: ${operation}`);
        }

        if (operationResult.success && Object.keys(updateOperation).length > 0) {
          // Apply atomic update
          const result = await this.db.collection(collection).updateOne(
            filter,
            {
              ...updateOperation,
              $set: {
                lastModified: new Date(),
                lastArrayOperation: {
                  operation: operation,
                  value: value,
                  timestamp: new Date()
                }
              }
            },
            { session }
          );

          if (result.modifiedCount === 0) {
            throw new Error('Atomic array operation failed - document may have been modified');
          }
        }

        // Log operation
        await this.logAtomicOperation({
          operation: `array_${operation}`,
          collection: collection,
          filter: filter,
          arrayField: arrayField,
          value: value,
          result: operationResult,
          timestamp: new Date()
        }, session);

        return operationResult;
      });
    } finally {
      await session.endSession();
    }
  }

  async atomicUpsert(collection, filter, update, options = {}) {
    // Atomic upsert with race condition handling
    const session = this.db.client.startSession();

    try {
      return await session.withTransaction(async () => {
        // Try to find existing document
        const existingDoc = await this.db.collection(collection).findOne(filter, { session });

        if (existingDoc) {
          // Document exists - perform update with optimistic locking
          const currentVersion = existingDoc.version || 0;

          const result = await this.db.collection(collection).updateOne(
            {
              ...filter,
              version: currentVersion
            },
            {
              ...update,
              $inc: {
                ...(update.$inc || {}),
                version: 1
              },
              $set: {
                ...(update.$set || {}),
                lastModified: new Date(),
                operation: 'update'
              }
            },
            { session }
          );

          if (result.modifiedCount === 0) {
            throw new Error('Atomic upsert update failed - document modified concurrently');
          }

          return {
            operation: 'update',
            documentId: existingDoc._id,
            previousVersion: currentVersion,
            newVersion: currentVersion + 1,
            success: true
          };

        } else {
          // Document doesn't exist - try to insert
          const insertDoc = {
            ...filter,
            ...(update.$set || {}),
            version: 1,
            createdAt: new Date(),
            lastModified: new Date(),
            operation: 'insert'
          };

          // Apply increment operations to initial values
          if (update.$inc) {
            Object.keys(update.$inc).forEach(field => {
              if (field !== 'version') {
                insertDoc[field] = (insertDoc[field] || 0) + update.$inc[field];
              }
            });
          }

          try {
            const insertResult = await this.db.collection(collection).insertOne(
              insertDoc,
              { session }
            );

            return {
              operation: 'insert',
              documentId: insertResult.insertedId,
              version: 1,
              success: true
            };
          } catch (error) {
            if (error.code === 11000) {
              // Duplicate key - another process inserted concurrently;
              // surface a retryable error so the caller can re-run this upsert as an update
              throw new Error('Concurrent insert detected - retry the upsert');
            }
            throw error;
          }
        }
      });
    } finally {
      await session.endSession();
    }
  }

  async atomicSwapFields(collection, filter, field1, field2, options = {}) {
    // Atomically swap values between two fields
    const session = this.db.client.startSession();

    try {
      return await session.withTransaction(async () => {
        const doc = await this.db.collection(collection).findOne(filter, { session });

        if (!doc) {
          throw new Error('Document not found for atomic field swap');
        }

        const value1 = doc[field1];
        const value2 = doc[field2];

        // Perform atomic swap
        const result = await this.db.collection(collection).updateOne(
          filter,
          {
            $set: {
              [field1]: value2,
              [field2]: value1,
              lastModified: new Date(),
              lastSwapOperation: {
                field1: field1,
                field2: field2,
                timestamp: new Date()
              }
            },
            $inc: {
              version: 1
            }
          },
          { session }
        );

        if (result.modifiedCount === 0) {
          throw new Error('Atomic field swap failed');
        }

        return {
          success: true,
          swappedValues: {
            [field1]: { from: value1, to: value2 },
            [field2]: { from: value2, to: value1 }
          }
        };
      });
    } finally {
      await session.endSession();
    }
  }

  async bulkAtomicOperations(operations, options = {}) {
    // Execute multiple atomic operations in a single transaction
    const session = this.db.client.startSession();
    const results = [];

    try {
      await session.withTransaction(async () => {
        for (const [index, op] of operations.entries()) {
          try {
            let result;

            switch (op.type) {
              case 'increment':
                result = await this.atomicIncrement(
                  op.collection, op.filter, op.field, op.value, { ...op.options, session }
                );
                break;

              case 'arrayOperation':
                result = await this.atomicArrayOperation(
                  op.collection, op.filter, op.arrayField, op.operation, op.value, 
                  { ...op.options, session }
                );
                break;

              case 'upsert':
                result = await this.atomicUpsert(
                  op.collection, op.filter, op.update, { ...op.options, session }
                );
                break;

              default:
                throw new Error(`Unsupported bulk operation type: ${op.type}`);
            }

            results.push({
              index: index,
              operation: op.type,
              success: true,
              result: result
            });

          } catch (error) {
            results.push({
              index: index,
              operation: op.type,
              success: false,
              error: error.message
            });

            if (!options.continueOnError) {
              throw error;
            }
          }
        }
      });

      return {
        success: true,
        totalOperations: operations.length,
        successfulOperations: results.filter(r => r.success).length,
        results: results
      };

    } catch (error) {
      return {
        success: false,
        error: error.message,
        partialResults: results
      };
    } finally {
      await session.endSession();
    }
  }

  async logAtomicOperation(operationDetails, session) {
    // Log atomic operation for audit trail
    await this.operationLog.insertOne({
      ...operationDetails,
      operationId: ObjectId(),
      sessionId: session.id
    }, { session });
  }
}
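
A brief usage sketch for the atomic operation manager, reserving inventory with a lower bound of zero. The inventory collection, field names, and quantities are illustrative assumptions, and the class above is assumed to be in scope with a db handle whose client exposes startSession(), as the class expects:

// Example usage of AtomicOperationManager: decrement stock atomically while
// refusing to go below zero, then tag the order in the product's reservation list.
async function reserveStock(db, productId, quantity, orderId) {
  const atomicOps = new AtomicOperationManager(db);

  // Decrement is expressed as a negative increment with a min bound of 0
  const decrement = await atomicOps.atomicIncrement(
    'inventory',
    { productId: productId },
    'quantity',
    -quantity,
    { min: 0 }
  );

  // Record the reservation only once per order
  const reservation = await atomicOps.atomicArrayOperation(
    'inventory',
    { productId: productId },
    'reservedOrders',
    'addToSet',
    orderId
  );

  return { remaining: decrement.newValue, reservation: reservation };
}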

Transaction Isolation and Conflict Resolution

Implement sophisticated conflict resolution strategies:

// Advanced conflict resolution and transaction isolation
class ConflictResolutionManager {
  constructor(db) {
    this.db = db;
    this.conflictLog = db.collection('conflict_resolution_log');
  }

  async resolveWithStrategy(collection, conflictData, strategy = 'merge', options = {}) {
    // Resolve conflicts using various strategies
    const session = this.db.client.startSession();

    try {
      return await session.withTransaction(async () => {
        const { 
          documentId, 
          baseVersion, 
          localChanges, 
          remoteChanges 
        } = conflictData;

        // Get current document state
        const currentDoc = await this.db.collection(collection).findOne(
          { _id: ObjectId(documentId) }, 
          { session }
        );

        if (!currentDoc) {
          throw new Error('Document not found for conflict resolution');
        }

        if (currentDoc.version <= baseVersion) {
          // No conflict - apply changes directly
          return await this.applyChanges(
            collection, documentId, localChanges, session
          );
        }

        // Conflict detected - apply resolution strategy
        let resolvedChanges;

        switch (strategy) {
          case 'merge':
            resolvedChanges = await this.mergeChanges(
              currentDoc, localChanges, remoteChanges, options
            );
            break;

          case 'last_write_wins':
            resolvedChanges = await this.lastWriteWins(
              localChanges, remoteChanges, options
            );
            break;

          case 'first_write_wins':
            resolvedChanges = await this.firstWriteWins(
              currentDoc, localChanges, baseVersion, options
            );
            break;

          case 'user_resolution':
            resolvedChanges = await this.userResolution(
              currentDoc, localChanges, remoteChanges, options
            );
            break;

          case 'field_level_merge':
            resolvedChanges = await this.fieldLevelMerge(
              currentDoc, localChanges, remoteChanges, options
            );
            break;

          default:
            throw new Error(`Unknown conflict resolution strategy: ${strategy}`);
        }

        // Apply resolved changes
        const result = await this.applyResolvedChanges(
          collection, documentId, currentDoc.version, resolvedChanges, session
        );

        // Log conflict resolution
        await this.logConflictResolution({
          documentId: documentId,
          collection: collection,
          strategy: strategy,
          baseVersion: baseVersion,
          conflictVersion: currentDoc.version,
          localChanges: localChanges,
          remoteChanges: remoteChanges,
          resolvedChanges: resolvedChanges,
          resolvedAt: new Date(),
          resolvedBy: options.userId || 'system'
        }, session);

        return {
          success: true,
          strategy: strategy,
          conflictResolved: true,
          finalVersion: result.newVersion,
          resolvedChanges: resolvedChanges
        };
      });
    } finally {
      await session.endSession();
    }
  }

  async mergeChanges(currentDoc, localChanges, remoteChanges, options) {
    // Intelligent three-way merge
    const merged = { ...currentDoc };
    const conflicts = [];

    // Process local changes
    Object.keys(localChanges).forEach(field => {
      if (field === '_id' || field === 'version') return;

      const localValue = localChanges[field];
      const remoteValue = remoteChanges[field];
      const currentValue = currentDoc[field];

      if (remoteValue !== undefined && localValue !== remoteValue) {
        // Conflict detected - apply merge rules
        const mergeResult = this.mergeFieldValues(
          field, currentValue, localValue, remoteValue, options.mergeRules || {}
        );

        merged[field] = mergeResult.value;

        if (mergeResult.hadConflict) {
          conflicts.push({
            field: field,
            localValue: localValue,
            remoteValue: remoteValue,
            resolvedValue: mergeResult.value,
            mergeRule: mergeResult.rule
          });
        }
      } else {
        // No conflict - use local value
        merged[field] = localValue;
      }
    });

    // Process remote changes not in local changes
    Object.keys(remoteChanges).forEach(field => {
      if (field === '_id' || field === 'version') return;

      if (localChanges[field] === undefined) {
        merged[field] = remoteChanges[field];
      }
    });

    return {
      ...merged,
      conflicts: conflicts,
      mergeStrategy: 'three_way_merge',
      mergedAt: new Date()
    };
  }

  mergeFieldValues(fieldName, currentValue, localValue, remoteValue, mergeRules) {
    // Apply field-specific merge rules
    const fieldRule = mergeRules[fieldName];

    if (fieldRule) {
      switch (fieldRule.strategy) {
        case 'local_wins':
          return { value: localValue, hadConflict: true, rule: 'local_wins' };

        case 'remote_wins':  
          return { value: remoteValue, hadConflict: true, rule: 'remote_wins' };

        case 'max_value':
          return { 
            value: Math.max(localValue, remoteValue), 
            hadConflict: true, 
            rule: 'max_value' 
          };

        case 'min_value':
          return { 
            value: Math.min(localValue, remoteValue), 
            hadConflict: true, 
            rule: 'min_value' 
          };

        case 'concatenate':
          return { 
            value: `${localValue}${fieldRule.separator || ' '}${remoteValue}`, 
            hadConflict: true, 
            rule: 'concatenate' 
          };

        case 'array_merge':
          const localArray = Array.isArray(localValue) ? localValue : [];
          const remoteArray = Array.isArray(remoteValue) ? remoteValue : [];
          return { 
            value: [...new Set([...localArray, ...remoteArray])], 
            hadConflict: true, 
            rule: 'array_merge' 
          };
      }
    }

    // Default conflict resolution - prefer local changes
    return { value: localValue, hadConflict: true, rule: 'default_local' };
  }

  async lastWriteWins(localChanges, remoteChanges, options) {
    // Simple last write wins strategy
    const localTimestamp = localChanges.lastModified || new Date(0);
    const remoteTimestamp = remoteChanges.lastModified || new Date(0);

    return localTimestamp > remoteTimestamp ? localChanges : remoteChanges;
  }

  async firstWriteWins(currentDoc, localChanges, baseVersion, options) {
    // Keep current state, reject local changes
    return {
      ...currentDoc,
      rejectedChanges: localChanges,
      rejectionReason: 'first_write_wins',
      rejectedAt: new Date()
    };
  }

  async fieldLevelMerge(currentDoc, localChanges, remoteChanges, options) {
    // Merge at field level with timestamp tracking
    const merged = { ...currentDoc };
    const fieldMergeLog = [];

    // Get field timestamps if available
    const getFieldTimestamp = (changes, field) => {
      return changes.fieldTimestamps?.[field] || changes.lastModified || new Date(0);
    };

    // Merge each field independently
    const allFields = new Set([
      ...Object.keys(localChanges),
      ...Object.keys(remoteChanges)
    ]);

    allFields.forEach(field => {
      if (field === '_id' || field === 'version' || field === 'fieldTimestamps') return;

      const localValue = localChanges[field];
      const remoteValue = remoteChanges[field];
      const localTimestamp = getFieldTimestamp(localChanges, field);
      const remoteTimestamp = getFieldTimestamp(remoteChanges, field);

      if (localValue !== undefined && remoteValue !== undefined) {
        // Both have changes - use timestamp
        if (localTimestamp > remoteTimestamp) {
          merged[field] = localValue;
          fieldMergeLog.push({
            field: field,
            winner: 'local',
            localValue: localValue,
            remoteValue: remoteValue,
            reason: 'newer_timestamp'
          });
        } else {
          merged[field] = remoteValue;
          fieldMergeLog.push({
            field: field,
            winner: 'remote',
            localValue: localValue,
            remoteValue: remoteValue,
            reason: 'newer_timestamp'
          });
        }
      } else if (localValue !== undefined) {
        merged[field] = localValue;
      } else if (remoteValue !== undefined) {
        merged[field] = remoteValue;
      }
    });

    return {
      ...merged,
      fieldMergeLog: fieldMergeLog,
      mergeStrategy: 'field_level_timestamp',
      mergedAt: new Date()
    };
  }

  async applyResolvedChanges(collection, documentId, currentVersion, resolvedChanges, session) {
    // Apply conflict-resolved changes; strip _id (immutable) and version (managed by $inc below)
    const { _id, version, ...changes } = resolvedChanges;

    const result = await this.db.collection(collection).updateOne(
      { 
        _id: ObjectId(documentId),
        version: currentVersion
      },
      {
        $set: {
          ...changes,
          lastModified: new Date(),
          conflictResolved: true
        },
        $inc: { version: 1 }
      },
      { session }
    );

    if (result.modifiedCount === 0) {
      throw new Error('Failed to apply resolved changes - document modified during resolution');
    }

    return {
      success: true,
      previousVersion: currentVersion,
      newVersion: currentVersion + 1
    };
  }

  async detectConflicts(collection, documentId, baseVersion, proposedChanges) {
    // Detect potential conflicts before attempting resolution
    const currentDoc = await this.db.collection(collection).findOne({
      _id: ObjectId(documentId)
    });

    if (!currentDoc) {
      return { hasConflicts: false, reason: 'document_not_found' };
    }

    if (currentDoc.version <= baseVersion) {
      return { hasConflicts: false, reason: 'no_intervening_changes' };
    }

    // Analyze conflicts
    const conflicts = [];
    const changedFields = Object.keys(proposedChanges);

    changedFields.forEach(field => {
      if (field === '_id' || field === 'version') return;

      const proposedValue = proposedChanges[field];
      const currentValue = currentDoc[field];

      // Simple value comparison - in practice, this could be more sophisticated
      if (JSON.stringify(currentValue) !== JSON.stringify(proposedValue)) {
        conflicts.push({
          field: field,
          baseValue: 'unknown', // Would need to track base state
          currentValue: currentValue,
          proposedValue: proposedValue,
          conflictType: 'value_mismatch'
        });
      }
    });

    return {
      hasConflicts: conflicts.length > 0,
      conflictCount: conflicts.length,
      conflicts: conflicts,
      currentVersion: currentDoc.version,
      baseVersion: baseVersion
    };
  }

  async logConflictResolution(resolutionDetails, session) {
    // Log detailed conflict resolution information
    await this.conflictLog.insertOne({
      ...resolutionDetails,
      resolutionId: ObjectId()
    }, { session });
  }
}
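
A short usage sketch for the conflict resolution manager, assuming a documents collection keyed by its _id string and per-field merge rules; the rule choices shown are illustrative:

// Example usage of ConflictResolutionManager: merge concurrent edits to a
// shared document, preferring set-union for tags and the larger view count.
async function saveWithConflictResolution(db, documentId, baseVersion, localChanges, remoteChanges) {
  const resolver = new ConflictResolutionManager(db);

  return resolver.resolveWithStrategy(
    'documents',
    { documentId, baseVersion, localChanges, remoteChanges },
    'merge',
    {
      userId: 'editor-service',
      mergeRules: {
        tags: { strategy: 'array_merge' },
        viewCount: { strategy: 'max_value' },
        title: { strategy: 'local_wins' }
      }
    }
  );
}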

QueryLeaf Concurrency Control Integration

QueryLeaf provides SQL-familiar syntax for MongoDB concurrency operations:

-- QueryLeaf concurrency control with SQL-style syntax

-- Optimistic locking with version-based updates
BEGIN TRANSACTION ISOLATION LEVEL OPTIMISTIC;

-- Update with automatic version checking
UPDATE accounts 
SET balance = balance - @transfer_amount,
    version = version + 1,
    last_transaction_date = CURRENT_TIMESTAMP
WHERE account_id = @from_account 
  AND version = @expected_version  -- Optimistic lock condition
  AND balance >= @transfer_amount; -- Safety check

-- Check if update succeeded (no race condition)
IF @@ROWCOUNT = 0
BEGIN
    ROLLBACK TRANSACTION;
    RAISERROR('Account modified by another transaction or insufficient funds', 16, 1);
    RETURN;
END

-- Atomic credit to destination account  
UPDATE accounts
SET balance = balance + @transfer_amount,
    version = version + 1,
    last_transaction_date = CURRENT_TIMESTAMP
WHERE account_id = @to_account;

-- Log transaction with conflict detection
INSERT INTO transactions (
    from_account,
    to_account, 
    amount,
    transaction_date,
    transaction_type,
    session_id
)
VALUES (
    @from_account,
    @to_account,
    @transfer_amount,
    CURRENT_TIMESTAMP,
    'transfer',
    CONNECTION_ID()
);

COMMIT TRANSACTION;

-- Atomic increment operations with bounds checking
UPDATE inventory
SET quantity = quantity + @increment_amount,
    version = version + 1,
    last_modified = CURRENT_TIMESTAMP
WHERE product_id = @product_id
  AND quantity + @increment_amount >= 0      -- Prevent negative inventory
  AND quantity + @increment_amount <= @max_stock; -- Prevent overstocking

-- Atomic array operations
-- Add item to array if not already present
UPDATE user_preferences
SET favorite_categories = ARRAY_APPEND_UNIQUE(favorite_categories, @new_category),
    version = version + 1,
    last_modified = CURRENT_TIMESTAMP
WHERE user_id = @user_id
  AND NOT ARRAY_CONTAINS(favorite_categories, @new_category);

-- Remove item from array
UPDATE user_preferences  
SET favorite_categories = ARRAY_REMOVE(favorite_categories, @remove_category),
    version = version + 1,
    last_modified = CURRENT_TIMESTAMP
WHERE user_id = @user_id
  AND ARRAY_CONTAINS(favorite_categories, @remove_category);

-- Compare-and-swap operations
UPDATE configuration
SET setting_value = @new_value,
    version = version + 1,
    last_modified = CURRENT_TIMESTAMP,
    modified_by = @user_id
WHERE setting_key = @setting_key
  AND setting_value = @expected_current_value  -- Compare condition
  AND version = @expected_version;            -- Additional version check

-- Bulk atomic operations with conflict handling
WITH batch_updates AS (
    SELECT 
        order_id,
        new_status,
        expected_version,
        ROW_NUMBER() OVER (ORDER BY order_id) as batch_order
    FROM (VALUES 
        ('order_1', 'shipped', 5),
        ('order_2', 'shipped', 3), 
        ('order_3', 'shipped', 7)
    ) AS v(order_id, new_status, expected_version)
),
update_results AS (
    UPDATE orders o
    SET status = b.new_status,
        version = version + 1,
        status_changed_at = CURRENT_TIMESTAMP,
        batch_id = @batch_id
    FROM batch_updates b
    WHERE o.order_id = b.order_id
      AND o.version = b.expected_version  -- Optimistic lock per order
    RETURNING o.order_id, o.version as new_version, 'success' as result
)
SELECT 
    b.order_id,
    COALESCE(r.result, 'failed') as update_result,
    r.new_version,
    CASE 
        WHEN r.result IS NULL THEN 'Version conflict or order not found'
        ELSE 'Successfully updated'
    END as message
FROM batch_updates b
LEFT JOIN update_results r ON b.order_id = r.order_id
ORDER BY b.batch_order;

-- Conflict detection and resolution
WITH conflict_detection AS (
    SELECT 
        document_id,
        current_version,
        proposed_changes,
        base_version,
        CASE 
            WHEN current_version > base_version THEN 'conflict_detected'
            ELSE 'no_conflict'
        END as conflict_status,

        -- Analyze field-level conflicts
        JSON_EXTRACT_PATH(proposed_changes, 'field1') as proposed_field1,
        JSON_EXTRACT_PATH(current_data, 'field1') as current_field1,

        CASE 
            WHEN JSON_EXTRACT_PATH(proposed_changes, 'field1') != 
                 JSON_EXTRACT_PATH(current_data, 'field1') THEN 'field_conflict'
            ELSE 'no_field_conflict'
        END as field1_status
    FROM documents d
    JOIN proposed_updates p ON d.id = p.document_id
),
conflict_resolution AS (
    SELECT 
        document_id,
        conflict_status,

        -- Apply merge strategy based on conflict type
        CASE conflict_status
            WHEN 'no_conflict' THEN proposed_changes
            WHEN 'conflict_detected' THEN 
                CASE @resolution_strategy
                    WHEN 'merge' THEN MERGE_JSON(current_data, proposed_changes)
                    WHEN 'last_write_wins' THEN proposed_changes
                    WHEN 'first_write_wins' THEN current_data
                    ELSE proposed_changes
                END
        END as resolved_changes
    FROM conflict_detection
)
UPDATE documents d
SET data = r.resolved_changes,
    version = version + 1,
    last_modified = CURRENT_TIMESTAMP,
    conflict_resolved = CASE r.conflict_status 
        WHEN 'conflict_detected' THEN TRUE 
        ELSE FALSE 
    END,
    resolution_strategy = @resolution_strategy
FROM conflict_resolution r
WHERE d.id = r.document_id;

-- High-concurrency counter with atomic operations
-- Safe increment even under heavy concurrent load
UPDATE page_views
SET view_count = view_count + 1,
    last_view_timestamp = CURRENT_TIMESTAMP,
    version = version + 1
WHERE page_id = @page_id;

-- If page doesn't exist, create it atomically
INSERT INTO page_views (page_id, view_count, first_view_timestamp, last_view_timestamp, version)
SELECT @page_id, 1, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, 1
WHERE NOT EXISTS (SELECT 1 FROM page_views WHERE page_id = @page_id);

-- Distributed lock implementation for critical sections
WITH lock_acquisition AS (
    INSERT INTO distributed_locks (
        lock_key,
        acquired_by,
        acquired_at,
        expires_at,
        lock_version
    )
    SELECT 
        @lock_key,
        @process_id,
        CURRENT_TIMESTAMP,
        CURRENT_TIMESTAMP + (@timeout_seconds * INTERVAL '1 second'),
        1
    WHERE NOT EXISTS (
        SELECT 1 FROM distributed_locks 
        WHERE lock_key = @lock_key 
          AND expires_at > CURRENT_TIMESTAMP
    )
    RETURNING lock_key, acquired_by, acquired_at
)
SELECT 
    CASE 
        WHEN l.lock_key IS NOT NULL THEN 'acquired'
        ELSE 'failed'
    END as lock_status,
    l.acquired_by,
    l.acquired_at
FROM lock_acquisition l;

-- Release distributed lock
DELETE FROM distributed_locks
WHERE lock_key = @lock_key
  AND acquired_by = @process_id
  AND lock_version = @expected_version;

-- QueryLeaf automatically handles:
-- 1. Version-based optimistic locking
-- 2. Atomic increment and decrement operations  
-- 3. Array manipulation with uniqueness constraints
-- 4. Compare-and-swap semantics
-- 5. Bulk operations with per-document conflict detection
-- 6. Conflict resolution strategies (merge, last-wins, first-wins)
-- 7. Distributed locking mechanisms
-- 8. Transaction isolation levels
-- 9. Deadlock prevention and detection
-- 10. Performance optimization for high-concurrency scenarios

Best Practices for Concurrency Management

Design Guidelines

Essential practices for effective concurrency control:

  1. Version-Based Optimistic Locking: Implement version fields in documents that change frequently
  2. Atomic Operations: Use MongoDB's atomic update operations instead of read-modify-write patterns
  3. Transaction Boundaries: Keep transactions short and focused to minimize lock contention
  4. Conflict Resolution: Design clear conflict resolution strategies appropriate for your use case
  5. Retry Logic: Implement exponential backoff retry for optimistic locking failures (see the sketch after this list)
  6. Performance Monitoring: Monitor contention points and optimize high-conflict operations
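
The retry guidance above translates directly to driver code. Below is a minimal sketch of version-based optimistic locking with exponential backoff using the Node.js MongoDB driver; the 'app' database, 'orders' collection, numeric 'version' field, and retry limits are illustrative assumptions rather than anything prescribed by MongoDB or QueryLeaf.

// Minimal optimistic-locking retry sketch (Node.js MongoDB driver).
// Database, collection, field names, and retry limits are illustrative assumptions.
const { MongoClient } = require('mongodb');

async function updateOrderStatusWithRetry(uri, orderId, newStatus, maxAttempts = 5) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const orders = client.db('app').collection('orders');

    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      // Read the current document to capture the version we expect to replace
      const current = await orders.findOne({ _id: orderId });
      if (!current) return { ok: false, reason: 'not_found' };

      // Compare-and-swap: the filter only matches if nobody else bumped the version
      const result = await orders.updateOne(
        { _id: orderId, version: current.version },
        { $set: { status: newStatus, statusChangedAt: new Date() }, $inc: { version: 1 } }
      );
      if (result.modifiedCount === 1) {
        return { ok: true, newVersion: current.version + 1 };
      }

      // Version conflict: back off exponentially before retrying
      await new Promise(resolve => setTimeout(resolve, 100 * 2 ** attempt));
    }
    return { ok: false, reason: 'version_conflict' };
  } finally {
    await client.close();
  }
}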

Concurrency Patterns

Choose appropriate concurrency patterns:

  1. Document-Level Locking: Use optimistic locking for individual document updates
  2. Field-Level Granularity: Implement field-specific version control for large documents
  3. Event Sourcing: Consider event-driven architectures for high-conflict scenarios
  4. CQRS: Separate read and write operations to reduce contention
  5. Distributed Locking: Use distributed locks for cross-document consistency requirements (see the lock sketch after this list)
  6. Queue-Based Processing: Use message queues to serialize high-conflict operations
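
Pattern 5 can be built from nothing more than a unique index and a TTL index. The sketch below is a minimal lease-style lock over a dedicated 'locks' collection; the collection name, field names, and lease duration are illustrative assumptions. Note that the TTL monitor only removes expired documents roughly once per minute, so the expiry acts as a coarse safety net for crashed owners rather than a precise timeout.

// Minimal distributed-lock sketch over a dedicated 'locks' collection.
// Assumes the indexes below are created once at startup; all names are illustrative.
async function ensureLockIndexes(db) {
  await db.collection('locks').createIndex({ lockKey: 1 }, { unique: true });
  await db.collection('locks').createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 });
}

async function acquireLock(db, lockKey, ownerId, ttlMs = 30000) {
  try {
    await db.collection('locks').insertOne({
      lockKey,                                  // the unique index makes this insert the mutex
      ownerId,
      acquiredAt: new Date(),
      expiresAt: new Date(Date.now() + ttlMs)   // TTL index eventually reaps abandoned locks
    });
    return true;
  } catch (err) {
    if (err.code === 11000) return false;       // duplicate key: another process holds the lock
    throw err;
  }
}

async function releaseLock(db, lockKey, ownerId) {
  // Only the current owner matches; a stale or expired owner simply deletes nothing
  await db.collection('locks').deleteOne({ lockKey, ownerId });
}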

Conclusion

MongoDB's sophisticated concurrency control mechanisms provide powerful tools for managing race conditions and maintaining data integrity in high-throughput applications. Combined with SQL-familiar concurrency patterns, MongoDB enables robust multi-user applications that scale effectively under load.

Key concurrency management benefits include:

  • High Performance: Optimistic locking avoids blocking operations under normal conditions
  • Scalability: Non-blocking concurrency control scales with user load
  • Data Integrity: Automatic conflict detection prevents lost updates and inconsistent states
  • Flexible Resolution: Multiple conflict resolution strategies accommodate different business requirements
  • ACID Compliance: Multi-document transactions provide full ACID guarantees when needed (a brief transaction sketch follows this list)
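
As a concrete illustration of the ACID point above, here is a minimal multi-document transaction sketch. It assumes a replica set or sharded cluster (transactions are unavailable on standalone servers); the 'bank' database, 'accounts' collection, and balance fields are illustrative assumptions.

// Minimal multi-document transaction sketch (Node.js driver).
// Database, collection, and field names are illustrative assumptions.
const { MongoClient } = require('mongodb');

async function transferFunds(uri, fromAccountId, toAccountId, amount) {
  const client = new MongoClient(uri);
  await client.connect();
  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      const accounts = client.db('bank').collection('accounts');

      // Debit only succeeds when the balance covers the amount
      const debit = await accounts.updateOne(
        { _id: fromAccountId, balance: { $gte: amount } },
        { $inc: { balance: -amount } },
        { session }
      );
      if (debit.modifiedCount !== 1) {
        throw new Error('Insufficient funds'); // throwing aborts the whole transaction
      }

      await accounts.updateOne(
        { _id: toAccountId },
        { $inc: { balance: amount } },
        { session }
      );
    });
  } finally {
    await session.endSession();
    await client.close();
  }
}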

Whether you're building financial systems requiring strict consistency, collaborative platforms with concurrent editing, or high-throughput applications with frequent updates, MongoDB's concurrency control with QueryLeaf's familiar SQL interface provides the foundation for robust, scalable applications. This combination enables you to implement sophisticated concurrency patterns while preserving familiar database interaction models.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB concurrency control including optimistic locking, atomic operations, and conflict resolution while providing SQL-familiar transaction syntax. Complex concurrency patterns, version management, and conflict resolution strategies are seamlessly handled through familiar SQL constructs, making advanced concurrency control both powerful and accessible.

The integration of sophisticated concurrency control with SQL-style operations makes MongoDB an ideal platform for applications requiring both high-performance concurrent operations and familiar database development patterns, ensuring your concurrency solutions remain both effective and maintainable as they scale and evolve.

MongoDB Change Streams for Event-Driven Microservices: Advanced Real-Time Data Synchronization and Distributed System Architecture

Modern distributed systems require sophisticated event-driven architectures that can handle real-time data synchronization across multiple microservices while maintaining data consistency, service decoupling, and system resilience. Traditional approaches to inter-service communication often rely on polling mechanisms, message queues with complex configuration, or tightly coupled API calls that create bottlenecks, increase latency, and reduce system reliability under high load conditions.

MongoDB Change Streams provide comprehensive real-time event processing capabilities that enable microservices to react immediately to data changes through native database-level event streaming, advanced filtering mechanisms, and automatic resume token management. Unlike traditional message queue systems that require separate infrastructure and complex message routing logic, MongoDB Change Streams integrate event processing directly with the database layer, providing guaranteed event delivery, ordering semantics, and fault tolerance without additional middleware dependencies.
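
Before examining the traditional alternative and the full platform implementation later in this article, a minimal sketch of the core primitive helps frame what follows: a filtered collection.watch() that hands back a resume token after every event. The 'shop' database, 'orders' collection, and the way the token is persisted are illustrative assumptions.

// Minimal change stream sketch: filtered watch with resume-token handoff.
// Database and collection names are illustrative; persist lastToken wherever suits your service.
const { MongoClient } = require('mongodb');

async function watchOrders(uri, previousToken) {
  const client = new MongoClient(uri);
  await client.connect();
  const orders = client.db('shop').collection('orders');

  // Only surface inserts and updates; fullDocument returns the post-image for updates
  const pipeline = [{ $match: { operationType: { $in: ['insert', 'update'] } } }];
  const stream = orders.watch(pipeline, {
    fullDocument: 'updateLookup',
    ...(previousToken ? { resumeAfter: previousToken } : {})
  });

  let lastToken = previousToken;
  for await (const change of stream) {
    lastToken = change._id;  // store this token so a restart can resume without missing events
    console.log(`${change.operationType} on order ${change.documentKey._id}`);
  }
  return lastToken;
}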

The Traditional Microservices Communication Challenge

Conventional approaches to microservices event processing face significant limitations in reliability and performance:

-- Traditional PostgreSQL event processing - complex and unreliable approaches

-- Basic event log table (limited capabilities)
CREATE TABLE service_events (
    event_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    service_name VARCHAR(100) NOT NULL,
    event_type VARCHAR(100) NOT NULL,
    entity_id UUID NOT NULL,
    entity_type VARCHAR(100) NOT NULL,

    -- Event data (limited structure)
    event_data JSONB NOT NULL,
    event_metadata JSONB,

    -- Processing tracking (manual management)
    event_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    processing_status VARCHAR(50) DEFAULT 'pending', -- pending, processing, completed, failed
    processed_by VARCHAR(100),
    processed_at TIMESTAMP,

    -- Retry management (basic implementation)
    retry_count INTEGER DEFAULT 0,
    max_retries INTEGER DEFAULT 3,
    next_retry_at TIMESTAMP,

    -- Ordering and partitioning
    sequence_number BIGINT,
    partition_key VARCHAR(100),

    -- Audit fields
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Event subscriptions table (manual subscription management)
CREATE TABLE event_subscriptions (
    subscription_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    service_name VARCHAR(100) NOT NULL,
    event_type VARCHAR(100) NOT NULL,
    entity_type VARCHAR(100),

    -- Subscription configuration
    filter_conditions JSONB, -- Basic filtering capabilities
    delivery_endpoint VARCHAR(500) NOT NULL,
    delivery_method VARCHAR(50) DEFAULT 'webhook', -- webhook, queue, database

    -- Processing configuration
    batch_size INTEGER DEFAULT 1,
    max_delivery_attempts INTEGER DEFAULT 3,
    delivery_timeout_seconds INTEGER DEFAULT 30,

    -- Subscription status
    subscription_status VARCHAR(50) DEFAULT 'active', -- active, paused, disabled
    last_processed_event_id UUID,
    last_processing_error TEXT,

    -- Subscription metadata
    created_by VARCHAR(100) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Event processing queue (complex state management)
CREATE TABLE event_processing_queue (
    queue_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    subscription_id UUID NOT NULL REFERENCES event_subscriptions(subscription_id),
    event_id UUID NOT NULL REFERENCES service_events(event_id),

    -- Processing state
    queue_status VARCHAR(50) DEFAULT 'queued', -- queued, processing, completed, failed, dead_letter
    processing_attempts INTEGER DEFAULT 0,

    -- Timing information
    queued_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    processing_started_at TIMESTAMP,
    processing_completed_at TIMESTAMP,
    next_attempt_at TIMESTAMP,

    -- Error tracking
    last_error_message TEXT,
    last_error_details JSONB,

    -- Processing metadata
    processing_node VARCHAR(100),
    processing_duration_ms INTEGER,

    UNIQUE (subscription_id, event_id)
);

-- Complex stored procedure for event processing (error-prone and limited)
CREATE OR REPLACE FUNCTION process_pending_events()
RETURNS TABLE (
    events_processed INTEGER,
    events_failed INTEGER,
    processing_duration_seconds INTEGER
) AS $$
DECLARE
    event_record RECORD;
    subscription_record RECORD;
    processing_start TIMESTAMP := clock_timestamp();
    processed_count INTEGER := 0;
    failed_count INTEGER := 0;
    current_batch_size INTEGER;
    delivery_result BOOLEAN;
BEGIN

    -- Process events in batches for each active subscription
    FOR subscription_record IN 
        SELECT * FROM event_subscriptions 
        WHERE subscription_status = 'active'
        ORDER BY created_at
    LOOP
        current_batch_size := subscription_record.batch_size;

        -- Get pending events for this subscription
        FOR event_record IN
            WITH filtered_events AS (
                SELECT se.*, epq.queue_id, epq.processing_attempts
                FROM service_events se
                JOIN event_processing_queue epq ON se.event_id = epq.event_id
                WHERE epq.subscription_id = subscription_record.subscription_id
                  AND epq.queue_status = 'queued'
                  AND (epq.next_attempt_at IS NULL OR epq.next_attempt_at <= CURRENT_TIMESTAMP)
                ORDER BY se.event_timestamp, se.sequence_number
                LIMIT current_batch_size
            )
            SELECT * FROM filtered_events
        LOOP

            -- Update processing status
            UPDATE event_processing_queue 
            SET 
                queue_status = 'processing',
                processing_started_at = CURRENT_TIMESTAMP,
                processing_attempts = processing_attempts + 1,
                processing_node = 'sql_processor'
            WHERE queue_id = event_record.queue_id;

            BEGIN
                -- Apply subscription filters (limited filtering capability)
                IF subscription_record.filter_conditions IS NOT NULL THEN
                    IF NOT jsonb_path_exists(
                        event_record.event_data, 
                        subscription_record.filter_conditions::jsonpath
                    ) THEN
                        -- Skip this event
                        UPDATE event_processing_queue 
                        SET queue_status = 'completed',
                            processing_completed_at = CURRENT_TIMESTAMP
                        WHERE queue_id = event_record.queue_id;
                        CONTINUE;
                    END IF;
                END IF;

                -- Simulate event delivery (in real implementation, would make HTTP call)
                delivery_result := deliver_event_to_service(
                    subscription_record.delivery_endpoint,
                    event_record.event_data,
                    subscription_record.delivery_timeout_seconds
                );

                IF delivery_result THEN
                    -- Mark as completed
                    UPDATE event_processing_queue 
                    SET 
                        queue_status = 'completed',
                        processing_completed_at = CURRENT_TIMESTAMP,
                        processing_duration_ms = (EXTRACT(
                            EPOCH FROM CURRENT_TIMESTAMP - processing_started_at
                        ) * 1000)::INTEGER
                    WHERE queue_id = event_record.queue_id;

                    processed_count := processed_count + 1;

                ELSE
                    RAISE EXCEPTION 'Event delivery failed';
                END IF;

            EXCEPTION WHEN OTHERS THEN
                failed_count := failed_count + 1;

                -- Handle retry logic
                IF event_record.processing_attempts < subscription_record.max_delivery_attempts THEN
                    -- Schedule retry with exponential backoff
                    UPDATE event_processing_queue 
                    SET 
                        queue_status = 'queued',
                        next_attempt_at = CURRENT_TIMESTAMP + 
                            (INTERVAL '1 minute' * POWER(2, event_record.processing_attempts)),
                        last_error_message = SQLERRM,
                        last_error_details = jsonb_build_object(
                            'error_code', SQLSTATE,
                            'error_message', SQLERRM,
                            'processing_attempt', event_record.processing_attempts + 1,
                            'timestamp', CURRENT_TIMESTAMP
                        )
                    WHERE queue_id = event_record.queue_id;
                ELSE
                    -- Move to dead letter queue
                    UPDATE event_processing_queue 
                    SET 
                        queue_status = 'dead_letter',
                        last_error_message = SQLERRM,
                        processing_completed_at = CURRENT_TIMESTAMP
                    WHERE queue_id = event_record.queue_id;
                END IF;
            END;
        END LOOP;

        -- Update subscription's last processed event
        UPDATE event_subscriptions 
        SET 
            last_processed_event_id = (
                SELECT event_id FROM event_processing_queue 
                WHERE subscription_id = subscription_record.subscription_id 
                  AND queue_status = 'completed'
                ORDER BY processing_completed_at DESC 
                LIMIT 1
            ),
            updated_at = CURRENT_TIMESTAMP
        WHERE subscription_id = subscription_record.subscription_id;

    END LOOP;

    RETURN QUERY SELECT 
        processed_count,
        failed_count,
        EXTRACT(EPOCH FROM clock_timestamp() - processing_start)::INTEGER;

END;
$$ LANGUAGE plpgsql;

-- Manual trigger-based event creation (limited and unreliable)
CREATE OR REPLACE FUNCTION create_user_change_event()
RETURNS TRIGGER AS $$
DECLARE
    new_event_id UUID;
BEGIN
    -- Only create events for significant changes
    IF TG_OP = 'INSERT' OR 
       (TG_OP = 'UPDATE' AND (
           OLD.email != NEW.email OR 
           OLD.status != NEW.status OR
           OLD.user_type != NEW.user_type
       )) THEN

        INSERT INTO service_events (
            service_name,
            event_type,
            entity_id,
            entity_type,
            event_data,
            event_metadata,
            sequence_number,
            partition_key
        ) VALUES (
            'user_service',
            CASE TG_OP 
                WHEN 'INSERT' THEN 'user_created'
                WHEN 'UPDATE' THEN 'user_updated'
                WHEN 'DELETE' THEN 'user_deleted'
            END,
            COALESCE(NEW.user_id, OLD.user_id),
            'user',
            jsonb_build_object(
                'user_id', COALESCE(NEW.user_id, OLD.user_id),
                'email', COALESCE(NEW.email, OLD.email),
                'status', COALESCE(NEW.status, OLD.status),
                'user_type', COALESCE(NEW.user_type, OLD.user_type),
                'operation', TG_OP,
                'changed_fields', CASE 
                    WHEN TG_OP = 'INSERT' THEN jsonb_build_array('all')
                    WHEN TG_OP = 'UPDATE' THEN jsonb_build_array(
                        CASE WHEN OLD.email != NEW.email THEN 'email' END,
                        CASE WHEN OLD.status != NEW.status THEN 'status' END,
                        CASE WHEN OLD.user_type != NEW.user_type THEN 'user_type' END
                    )
                    ELSE jsonb_build_array('all')
                END
            ),
            jsonb_build_object(
                'source_table', TG_TABLE_NAME,
                'source_operation', TG_OP,
                'timestamp', CURRENT_TIMESTAMP,
                'transaction_id', txid_current()
            ),
            nextval('event_sequence'),
            COALESCE(NEW.user_id, OLD.user_id)::TEXT
        ) RETURNING event_id INTO new_event_id;

        -- Queue event for all matching subscriptions
        INSERT INTO event_processing_queue (subscription_id, event_id)
        SELECT 
            s.subscription_id,
            new_event_id
        FROM event_subscriptions s
        WHERE s.subscription_status = 'active'
          AND s.event_type IN ('user_created', 'user_updated', 'user_deleted', '*')
          AND (s.entity_type IS NULL OR s.entity_type = 'user');

    END IF;

    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

-- Problems with traditional event processing approaches:
-- 1. Complex manual event creation and subscription management
-- 2. Limited filtering and routing capabilities
-- 3. No guaranteed event ordering or delivery semantics
-- 4. Manual retry logic and error handling implementation
-- 5. Expensive polling mechanisms for event consumption
-- 6. No built-in support for resume tokens or fault tolerance
-- 7. Complex state management across multiple tables
-- 8. Limited scalability and performance under high event volumes
-- 9. No native integration with database transactions
-- 10. Manual implementation of event sourcing and CQRS patterns

MongoDB Change Streams eliminate these limitations with native event processing:

// MongoDB Change Streams - comprehensive event-driven microservices architecture
const { MongoClient, ObjectId } = require('mongodb');
const EventEmitter = require('events');

// Advanced microservices event processing system using MongoDB Change Streams
class MongoEventDrivenMicroservicesManager {
  constructor(connectionUri, options = {}) {
    this.client = new MongoClient(connectionUri);
    this.db = null;
    this.eventEmitter = new EventEmitter();
    this.activeStreams = new Map();
    this.subscriptions = new Map();

    // Configuration for event processing
    this.config = {
      // Change stream configuration
      changeStreamOptions: {
        fullDocument: 'updateLookup', // Include full document in updates
        fullDocumentBeforeChange: 'whenAvailable', // Include previous version
        maxAwaitTimeMS: 1000, // Reduce latency
        batchSize: 100 // Optimize batch processing
      },

      // Event processing configuration
      eventProcessing: {
        enableRetries: true,
        maxRetryAttempts: 3,
        retryDelayMs: 1000,
        exponentialBackoff: true,
        deadLetterQueueEnabled: true,
        preserveEventOrder: true
      },

      // Subscription management
      subscriptionManagement: {
        autoReconnect: true,
        resumeTokenPersistence: true,
        subscriptionHealthCheck: true,
        metricsCollection: true
      },

      // Performance optimization
      performanceSettings: {
        concurrentStreamLimit: 10,
        eventBatchSize: 50,
        processingTimeout: 30000,
        memoryBufferSize: 1000
      }
    };

    // Event processing metrics
    this.metrics = {
      totalEventsProcessed: 0,
      totalEventsReceived: 0,
      totalSubscriptions: 0,
      activeStreams: 0,
      eventProcessingErrors: 0,
      averageProcessingTime: 0,
      lastEventTimestamp: null
    };

    // Resume token storage for fault tolerance
    this.resumeTokens = new Map();
    this.subscriptionHealthStatus = new Map();
  }

  async initialize(databaseName) {
    console.log('Initializing MongoDB Event-Driven Microservices Manager...');

    try {
      await this.client.connect();
      this.db = this.client.db(databaseName);

      // Setup system collections for event management
      await this.setupEventManagementCollections();

      // Load existing subscriptions and resume tokens
      await this.loadExistingSubscriptions();

      // Setup health monitoring
      if (this.config.subscriptionManagement.subscriptionHealthCheck) {
        this.startHealthMonitoring();
      }

      console.log('Event-driven microservices manager initialized successfully');

    } catch (error) {
      console.error('Error initializing event manager:', error);
      throw error;
    }
  }

  // Create comprehensive event subscription for microservices
  async createEventSubscription(subscriptionConfig) {
    console.log(`Creating event subscription: ${subscriptionConfig.subscriptionId}`);

    const subscription = {
      subscriptionId: subscriptionConfig.subscriptionId,
      serviceName: subscriptionConfig.serviceName,

      // Event filtering configuration
      collections: subscriptionConfig.collections || [], // Collections to watch
      eventTypes: subscriptionConfig.eventTypes || ['insert', 'update', 'delete'], // Operation types
      pipeline: subscriptionConfig.pipeline || [], // Advanced filtering pipeline

      // Event processing configuration
      eventHandler: subscriptionConfig.eventHandler, // Function to process events
      batchProcessing: subscriptionConfig.batchProcessing || false,
      batchSize: subscriptionConfig.batchSize || 1,
      preserveOrder: subscriptionConfig.preserveOrder !== false,

      // Error handling configuration
      errorHandler: subscriptionConfig.errorHandler,
      retryPolicy: {
        maxRetries: subscriptionConfig.maxRetries || this.config.eventProcessing.maxRetryAttempts,
        retryDelay: subscriptionConfig.retryDelay || this.config.eventProcessing.retryDelayMs,
        exponentialBackoff: subscriptionConfig.exponentialBackoff !== false
      },

      // Subscription metadata
      createdAt: new Date(),
      lastEventProcessed: null,
      resumeToken: null,
      isActive: false,

      // Performance tracking
      metrics: {
        eventsReceived: 0,
        eventsProcessed: 0,
        eventsSkipped: 0,
        processingErrors: 0,
        averageProcessingTime: 0,
        lastProcessingTime: null
      }
    };

    // Store subscription configuration
    await this.db.collection('event_subscriptions').replaceOne(
      { subscriptionId: subscription.subscriptionId },
      subscription,
      { upsert: true }
    );

    // Cache subscription
    this.subscriptions.set(subscription.subscriptionId, subscription);

    console.log(`Event subscription created: ${subscription.subscriptionId}`);
    return subscription.subscriptionId;
  }

  // Start change streams for active subscriptions
  async startEventStreaming(subscriptionId) {
    console.log(`Starting event streaming for subscription: ${subscriptionId}`);

    const subscription = this.subscriptions.get(subscriptionId);
    if (!subscription) {
      throw new Error(`Subscription not found: ${subscriptionId}`);
    }

    // Build change stream pipeline based on subscription configuration
    const pipeline = this.buildChangeStreamPipeline(subscription);

    // Configure change stream options
    const changeStreamOptions = {
      ...this.config.changeStreamOptions,
      // Resume from the persisted token when one exists; otherwise start at the current point in the oplog
      ...(subscription.resumeToken ? { resumeAfter: subscription.resumeToken } : {})
    };

    try {
      let changeStream;

      // Create change stream based on collection scope
      if (subscription.collections.length === 1) {
        // Single collection stream
        const collection = this.db.collection(subscription.collections[0]);
        changeStream = collection.watch(pipeline, changeStreamOptions);
      } else if (subscription.collections.length > 1) {
        // Multiple collections stream (requires database-level watch)
        changeStream = this.db.watch(pipeline, changeStreamOptions);
      } else {
        // Database-level stream for all collections
        changeStream = this.db.watch(pipeline, changeStreamOptions);
      }

      // Store active stream
      this.activeStreams.set(subscriptionId, changeStream);
      subscription.isActive = true;
      this.metrics.activeStreams++;

      // Setup event processing
      changeStream.on('change', async (changeEvent) => {
        await this.processChangeEvent(subscriptionId, changeEvent);
      });

      // Handle stream errors
      changeStream.on('error', async (error) => {
        console.error(`Change stream error for ${subscriptionId}:`, error);
        await this.handleStreamError(subscriptionId, error);
      });

      // Handle stream close
      changeStream.on('close', () => {
        console.log(`Change stream closed for ${subscriptionId}`);
        subscription.isActive = false;
        this.activeStreams.delete(subscriptionId);
        this.metrics.activeStreams--;
      });

      console.log(`Event streaming started for subscription: ${subscriptionId}`);
      return true;

    } catch (error) {
      console.error(`Error starting event streaming for ${subscriptionId}:`, error);
      subscription.isActive = false;
      throw error;
    }
  }

  // Process individual change events with comprehensive handling
  async processChangeEvent(subscriptionId, changeEvent) {
    const startTime = Date.now();
    const subscription = this.subscriptions.get(subscriptionId);

    if (!subscription || !subscription.isActive) {
      return; // Skip if subscription is inactive
    }

    try {
      // Update resume token for fault tolerance
      subscription.resumeToken = changeEvent._id;
      this.resumeTokens.set(subscriptionId, changeEvent._id);

      // Apply subscription filtering
      if (!this.matchesSubscriptionCriteria(changeEvent, subscription)) {
        subscription.metrics.eventsSkipped++;
        return;
      }

      // Prepare enriched event data
      const enrichedEvent = await this.enrichChangeEvent(changeEvent, subscription);

      // Update metrics
      subscription.metrics.eventsReceived++;
      this.metrics.totalEventsReceived++;

      // Process event with retry logic
      await this.processEventWithRetries(subscription, enrichedEvent, 0);

      // Update processing metrics
      const processingTime = Date.now() - startTime;
      subscription.metrics.averageProcessingTime = 
        (subscription.metrics.averageProcessingTime + processingTime) / 2;
      subscription.metrics.lastProcessingTime = new Date();
      subscription.lastEventProcessed = new Date();

      this.metrics.averageProcessingTime = 
        (this.metrics.averageProcessingTime + processingTime) / 2;
      this.metrics.lastEventTimestamp = new Date();

      // Persist resume token periodically
      if (this.config.subscriptionManagement.resumeTokenPersistence) {
        await this.persistResumeToken(subscriptionId, changeEvent._id);
      }

    } catch (error) {
      console.error(`Error processing change event for ${subscriptionId}:`, error);
      subscription.metrics.processingErrors++;
      this.metrics.eventProcessingErrors++;

      // Handle error based on subscription configuration
      if (subscription.errorHandler) {
        try {
          await subscription.errorHandler(error, changeEvent, subscription);
        } catch (handlerError) {
          console.error('Error handler failed:', handlerError);
        }
      }
    }
  }

  // Advanced event processing with retry mechanisms
  async processEventWithRetries(subscription, enrichedEvent, attemptNumber) {
    try {
      // Execute event handler
      if (subscription.batchProcessing) {
        // Add to batch processing queue
        await this.addToBatchQueue(subscription.subscriptionId, enrichedEvent);
      } else {
        // Process event immediately
        await subscription.eventHandler(enrichedEvent, subscription);
      }

      // Mark as successfully processed
      subscription.metrics.eventsProcessed++;
      this.metrics.totalEventsProcessed++;

    } catch (error) {
      console.error(`Event processing error (attempt ${attemptNumber + 1}):`, error);

      if (attemptNumber < subscription.retryPolicy.maxRetries) {
        // Calculate retry delay with exponential backoff
        const delay = subscription.retryPolicy.exponentialBackoff
          ? subscription.retryPolicy.retryDelay * Math.pow(2, attemptNumber)
          : subscription.retryPolicy.retryDelay;

        console.log(`Retrying event processing in ${delay}ms...`);

        await new Promise(resolve => setTimeout(resolve, delay));
        return this.processEventWithRetries(subscription, enrichedEvent, attemptNumber + 1);
      } else {
        // Max retries reached, send to dead letter queue
        if (this.config.eventProcessing.deadLetterQueueEnabled) {
          await this.sendToDeadLetterQueue(subscription.subscriptionId, enrichedEvent, error);
        }
        throw error;
      }
    }
  }

  // Enrich change events with additional context and metadata
  async enrichChangeEvent(changeEvent, subscription) {
    const enrichedEvent = {
      // Original change event data
      ...changeEvent,

      // Event metadata
      eventMetadata: {
        subscriptionId: subscription.subscriptionId,
        serviceName: subscription.serviceName,
        processedAt: new Date(),
        eventId: this.generateEventId(),

        // Change event details
        operationType: changeEvent.operationType,
        collectionName: changeEvent.ns?.coll,
        databaseName: changeEvent.ns?.db,

        // Document information
        documentKey: changeEvent.documentKey,
        hasFullDocument: !!changeEvent.fullDocument,
        hasFullDocumentBeforeChange: !!changeEvent.fullDocumentBeforeChange,

        // Event context
        clusterTime: changeEvent.clusterTime,
        resumeToken: changeEvent._id,

        // Processing context
        processingTimestamp: Date.now(),
        correlationId: this.generateCorrelationId(changeEvent)
      },

      // Service-specific enrichment
      serviceContext: {
        serviceName: subscription.serviceName,
        subscriptionConfig: {
          preserveOrder: subscription.preserveOrder,
          batchProcessing: subscription.batchProcessing
        }
      }
    };

    // Add business context if available
    if (changeEvent.fullDocument) {
      enrichedEvent.businessContext = await this.extractBusinessContext(
        changeEvent.fullDocument, 
        changeEvent.ns?.coll
      );
    }

    return enrichedEvent;
  }

  // Build change stream pipeline based on subscription configuration
  buildChangeStreamPipeline(subscription) {
    const pipeline = [...subscription.pipeline];

    // Add operation type filtering
    if (subscription.eventTypes.length > 0 && 
        !subscription.eventTypes.includes('*')) {
      pipeline.push({
        $match: {
          operationType: { $in: subscription.eventTypes }
        }
      });
    }

    // Add collection filtering for database-level streams
    if (subscription.collections.length > 1) {
      pipeline.push({
        $match: {
          'ns.coll': { $in: subscription.collections }
        }
      });
    }

    // Add service-specific filtering
    pipeline.push({
      $addFields: {
        processedBy: subscription.serviceName,
        subscriptionId: subscription.subscriptionId
      }
    });

    return pipeline;
  }

  // Check if change event matches subscription criteria
  matchesSubscriptionCriteria(changeEvent, subscription) {
    // Check operation type
    if (subscription.eventTypes.length > 0 && 
        !subscription.eventTypes.includes('*') &&
        !subscription.eventTypes.includes(changeEvent.operationType)) {
      return false;
    }

    // Check collection name
    if (subscription.collections.length > 0 &&
        !subscription.collections.includes(changeEvent.ns?.coll)) {
      return false;
    }

    return true;
  }

  // Batch processing queue management
  async addToBatchQueue(subscriptionId, enrichedEvent) {
    if (!this.batchQueues) {
      this.batchQueues = new Map();
    }

    if (!this.batchQueues.has(subscriptionId)) {
      this.batchQueues.set(subscriptionId, []);
    }

    const queue = this.batchQueues.get(subscriptionId);
    queue.push(enrichedEvent);

    const subscription = this.subscriptions.get(subscriptionId);
    if (queue.length >= subscription.batchSize) {
      await this.processBatch(subscriptionId);
    }
  }

  // Process batched events
  async processBatch(subscriptionId) {
    const queue = this.batchQueues.get(subscriptionId);
    if (!queue || queue.length === 0) {
      return;
    }

    const subscription = this.subscriptions.get(subscriptionId);
    const batch = queue.splice(0, subscription.batchSize);

    try {
      await subscription.eventHandler(batch, subscription);
      subscription.metrics.eventsProcessed += batch.length;
      this.metrics.totalEventsProcessed += batch.length;
    } catch (error) {
      console.error(`Batch processing error for ${subscriptionId}:`, error);
      // Handle batch processing errors
      for (const event of batch) {
        await this.sendToDeadLetterQueue(subscriptionId, event, error);
      }
    }
  }

  // Dead letter queue management
  async sendToDeadLetterQueue(subscriptionId, enrichedEvent, error) {
    try {
      await this.db.collection('dead_letter_events').insertOne({
        subscriptionId: subscriptionId,
        originalEvent: enrichedEvent,
        error: {
          message: error.message,
          stack: error.stack,
          timestamp: new Date()
        },
        createdAt: new Date(),
        status: 'failed',
        retryAttempts: 0
      });

      console.log(`Event sent to dead letter queue for subscription: ${subscriptionId}`);
    } catch (dlqError) {
      console.error('Error sending event to dead letter queue:', dlqError);
    }
  }

  // Comprehensive event analytics and monitoring
  async getEventAnalytics(timeRange = '24h') {
    console.log('Generating event processing analytics...');

    const timeRanges = {
      '1h': 1,
      '6h': 6,
      '24h': 24,
      '7d': 168,
      '30d': 720
    };

    const hours = timeRanges[timeRange] || 24;
    const startTime = new Date(Date.now() - (hours * 60 * 60 * 1000));

    try {
      // Get subscription performance metrics
      const subscriptionMetrics = await this.db.collection('event_subscriptions')
        .aggregate([
          {
            $project: {
              subscriptionId: 1,
              serviceName: 1,
              isActive: 1,
              'metrics.eventsReceived': 1,
              'metrics.eventsProcessed': 1,
              'metrics.eventsSkipped': 1,
              'metrics.processingErrors': 1,
              'metrics.averageProcessingTime': 1,
              lastEventProcessed: 1,
              createdAt: 1
            }
          }
        ]).toArray();

      // Get event volume trends
      const eventTrends = await this.db.collection('event_processing_log')
        .aggregate([
          {
            $match: {
              timestamp: { $gte: startTime }
            }
          },
          {
            $group: {
              _id: {
                hour: { $hour: '$timestamp' },
                serviceName: '$serviceName'
              },
              eventCount: { $sum: 1 },
              avgProcessingTime: { $avg: '$processingTime' }
            }
          },
          {
            $sort: { '_id.hour': 1 }
          }
        ]).toArray();

      // Get error analysis
      const errorAnalysis = await this.db.collection('dead_letter_events')
        .aggregate([
          {
            $match: {
              createdAt: { $gte: startTime }
            }
          },
          {
            $group: {
              _id: {
                subscriptionId: '$subscriptionId',
                errorType: '$error.message'
              },
              errorCount: { $sum: 1 },
              latestError: { $max: '$createdAt' }
            }
          }
        ]).toArray();

      return {
        reportGeneratedAt: new Date(),
        timeRange: timeRange,

        // Overall system metrics
        systemMetrics: {
          ...this.metrics,
          activeSubscriptions: this.subscriptions.size,
          totalSubscriptions: subscriptionMetrics.length
        },

        // Subscription performance
        subscriptionPerformance: subscriptionMetrics,

        // Event volume trends
        eventTrends: eventTrends,

        // Error analysis
        errorAnalysis: errorAnalysis,

        // Health indicators
        healthIndicators: {
          subscriptionsWithErrors: errorAnalysis.length,
          averageProcessingTime: this.metrics.averageProcessingTime,
          eventProcessingRate: this.metrics.totalEventsProcessed / hours,
          systemHealth: this.calculateSystemHealth()
        }
      };

    } catch (error) {
      console.error('Error generating event analytics:', error);
      throw error;
    }
  }

  // System health monitoring
  calculateSystemHealth() {
    const totalReceived = this.metrics.totalEventsReceived || 1; // avoid division by zero before any events arrive
    const errorRate = this.metrics.eventProcessingErrors / totalReceived;
    const processingEfficiency = this.metrics.totalEventsProcessed / totalReceived;

    if (errorRate > 0.05) return 'Critical';
    if (errorRate > 0.01 || processingEfficiency < 0.95) return 'Warning';
    if (this.metrics.averageProcessingTime > 5000) return 'Degraded';
    return 'Healthy';
  }

  // Utility methods
  async setupEventManagementCollections() {
    // Create indexes for optimal performance
    await this.db.collection('event_subscriptions').createIndexes([
      { key: { subscriptionId: 1 }, unique: true },
      { key: { serviceName: 1 } },
      { key: { isActive: 1 } }
    ]);

    await this.db.collection('dead_letter_events').createIndexes([
      { key: { subscriptionId: 1, createdAt: -1 } },
      { key: { createdAt: 1 }, expireAfterSeconds: 30 * 24 * 60 * 60 } // 30 days TTL
    ]);
  }

  async loadExistingSubscriptions() {
    const subscriptions = await this.db.collection('event_subscriptions')
      .find({ isActive: true })
      .toArray();

    subscriptions.forEach(sub => {
      this.subscriptions.set(sub.subscriptionId, sub);
    });
  }

  generateEventId() {
    return `evt_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
  }

  generateCorrelationId(changeEvent) {
    return `corr_${changeEvent.ns?.coll}_${changeEvent.documentKey?._id}_${Date.now()}`;
  }

  async extractBusinessContext(document, collectionName) {
    // Extract relevant business context based on collection
    const context = {
      collectionName: collectionName,
      entityId: document._id,
      entityType: collectionName.replace(/s$/, '') // Simple singularization
    };

    // Add collection-specific context
    if (collectionName === 'users') {
      context.userEmail = document.email;
      context.userType = document.userType;
    } else if (collectionName === 'orders') {
      context.customerId = document.customerId;
      context.orderTotal = document.total;
    } else if (collectionName === 'products') {
      context.productCategory = document.category;
      context.productBrand = document.brand;
    }

    return context;
  }

  async persistResumeToken(subscriptionId, resumeToken) {
    await this.db.collection('event_subscriptions').updateOne(
      { subscriptionId: subscriptionId },
      { $set: { resumeToken: resumeToken, updatedAt: new Date() } }
    );
  }

  async handleStreamError(subscriptionId, error) {
    const subscription = this.subscriptions.get(subscriptionId);
    subscription.isActive = false;

    console.error(`Handling stream error for ${subscriptionId}:`, error);

    // Implement automatic reconnection logic
    if (this.config.subscriptionManagement.autoReconnect) {
      setTimeout(async () => {
        try {
          console.log(`Attempting to reconnect stream for ${subscriptionId}`);
          await this.startEventStreaming(subscriptionId);
        } catch (reconnectError) {
          console.error(`Failed to reconnect stream for ${subscriptionId}:`, reconnectError);
        }
      }, 5000); // Retry after 5 seconds
    }
  }

  startHealthMonitoring() {
    setInterval(async () => {
      try {
        for (const [subscriptionId, subscription] of this.subscriptions) {
          const isHealthy = subscription.isActive && 
            (Date.now() - (subscription.lastEventProcessed?.getTime() || Date.now())) < 300000; // 5 minutes

          this.subscriptionHealthStatus.set(subscriptionId, {
            isHealthy: isHealthy,
            lastCheck: new Date(),
            subscription: subscription
          });
        }
      } catch (error) {
        console.error('Health monitoring error:', error);
      }
    }, 60000); // Check every minute
  }

  // Graceful shutdown
  async shutdown() {
    console.log('Shutting down event-driven microservices manager...');

    // Close all active streams
    for (const [subscriptionId, stream] of this.activeStreams) {
      try {
        await stream.close();
        console.log(`Closed stream for subscription: ${subscriptionId}`);
      } catch (error) {
        console.error(`Error closing stream for ${subscriptionId}:`, error);
      }
    }

    // Close MongoDB connection
    await this.client.close();
    console.log('Event-driven microservices manager shutdown complete');
  }
}

// Example usage demonstrating comprehensive microservices event processing
async function demonstrateMicroservicesEventProcessing() {
  const eventManager = new MongoEventDrivenMicroservicesManager('mongodb://localhost:27017');

  try {
    await eventManager.initialize('microservices_platform');

    console.log('Setting up microservices event subscriptions...');

    // User service subscription for authentication events
    await eventManager.createEventSubscription({
      subscriptionId: 'user_auth_events',
      serviceName: 'authentication_service',
      collections: ['users'],
      eventTypes: ['insert', 'update'],
      pipeline: [
        {
          $match: {
            $or: [
              { operationType: 'insert' },
              { 
                operationType: 'update',
                'updateDescription.updatedFields.lastLogin': { $exists: true }
              }
            ]
          }
        }
      ],
      eventHandler: async (event, subscription) => {
        console.log(`Auth Service processing: ${event.operationType} for user ${event.documentKey._id}`);

        if (event.operationType === 'insert') {
          // Send welcome email
          console.log('Triggering welcome email workflow');
        } else if (event.operationType === 'update' && event.fullDocument.lastLogin) {
          // Log user activity
          console.log('Recording user login activity');
        }
      }
    });

    // Order service subscription for inventory management
    await eventManager.createEventSubscription({
      subscriptionId: 'inventory_management',
      serviceName: 'inventory_service',
      collections: ['orders'],
      eventTypes: ['insert', 'update'],
      batchProcessing: true,
      batchSize: 10,
      eventHandler: async (events, subscription) => {
        console.log(`Inventory Service processing batch of ${events.length} order events`);

        for (const event of events) {
          if (event.operationType === 'insert' && event.fullDocument.status === 'confirmed') {
            console.log(`Reducing inventory for order: ${event.documentKey._id}`);
            // Update inventory levels
          }
        }
      }
    });

    // Analytics service subscription for real-time metrics
    await eventManager.createEventSubscription({
      subscriptionId: 'realtime_analytics',
      serviceName: 'analytics_service',
      collections: ['orders', 'products', 'users'],
      eventTypes: ['insert', 'update', 'delete'],
      eventHandler: async (event, subscription) => {
        console.log(`Analytics Service processing: ${event.operationType} on ${event.ns.coll}`);

        // Update real-time dashboards
        if (event.ns.coll === 'orders' && event.operationType === 'insert') {
          console.log('Updating real-time sales metrics');
        }
      }
    });

    // Start event streaming for all subscriptions
    await eventManager.startEventStreaming('user_auth_events');
    await eventManager.startEventStreaming('inventory_management');
    await eventManager.startEventStreaming('realtime_analytics');

    console.log('All event streams started successfully');

    // Simulate some database changes to trigger events
    console.log('Simulating database changes...');

    // Insert a new user
    await eventManager.db.collection('users').insertOne({
      email: 'john.doe@example.com',
      name: 'John Doe',
      userType: 'premium',
      createdAt: new Date()
    });

    // Insert a new order
    await eventManager.db.collection('orders').insertOne({
      customerId: new ObjectId(),
      total: 299.99,
      status: 'confirmed',
      items: [
        { productId: new ObjectId(), quantity: 2, price: 149.99 }
      ],
      createdAt: new Date()
    });

    // Wait a bit for events to process
    await new Promise(resolve => setTimeout(resolve, 2000));

    // Get analytics report
    const analytics = await eventManager.getEventAnalytics('1h');
    console.log('Event Processing Analytics:', JSON.stringify(analytics, null, 2));

  } catch (error) {
    console.error('Microservices event processing demonstration error:', error);
  } finally {
    await eventManager.shutdown();
  }
}

// Export the event-driven microservices manager
module.exports = {
  MongoEventDrivenMicroservicesManager,
  demonstrateMicroservicesEventProcessing
};

SQL-Style Event Processing with QueryLeaf

QueryLeaf provides familiar SQL approaches to MongoDB Change Streams and event-driven architectures:

-- QueryLeaf event-driven microservices with SQL-familiar syntax

-- Create event subscription with comprehensive configuration
CREATE EVENT_SUBSCRIPTION user_service_events AS (
  -- Subscription identification
  subscription_id = 'user_lifecycle_events',
  service_name = 'user_service',

  -- Event source configuration
  watch_collections = JSON_ARRAY('users', 'user_profiles', 'user_preferences'),
  event_types = JSON_ARRAY('insert', 'update', 'delete'),

  -- Advanced event filtering with SQL-style conditions
  event_filter = JSON_OBJECT(
    'operationType', JSON_OBJECT('$in', JSON_ARRAY('insert', 'update')),
    '$or', JSON_ARRAY(
      JSON_OBJECT('operationType', 'insert'),
      JSON_OBJECT(
        'operationType', 'update',
        'updateDescription.updatedFields', JSON_OBJECT(
          '$or', JSON_ARRAY(
            JSON_OBJECT('email', JSON_OBJECT('$exists', true)),
            JSON_OBJECT('status', JSON_OBJECT('$exists', true)),
            JSON_OBJECT('subscription_tier', JSON_OBJECT('$exists', true))
          )
        )
      )
    )
  ),

  -- Event processing configuration
  batch_processing = false,
  preserve_order = true,
  full_document = 'updateLookup',
  full_document_before_change = 'whenAvailable',

  -- Error handling and retry policy
  max_retry_attempts = 3,
  retry_delay_ms = 1000,
  exponential_backoff = true,
  dead_letter_queue_enabled = true,

  -- Performance settings
  batch_size = 100,
  processing_timeout_ms = 30000,

  -- Subscription metadata
  created_by = 'user_service_admin',
  description = 'User lifecycle events for authentication and personalization services'
);

-- Monitor event processing with real-time analytics
WITH event_stream_metrics AS (
  SELECT 
    subscription_id,
    service_name,

    -- Event volume metrics
    COUNT(*) as total_events_received,
    COUNT(CASE WHEN processing_status = 'completed' THEN 1 END) as events_processed,
    COUNT(CASE WHEN processing_status = 'failed' THEN 1 END) as events_failed,
    COUNT(CASE WHEN processing_status = 'retrying' THEN 1 END) as events_retrying,

    -- Processing performance
    AVG(processing_duration_ms) as avg_processing_time_ms,
    MAX(processing_duration_ms) as max_processing_time_ms,
    MIN(processing_duration_ms) as min_processing_time_ms,

    -- Event type distribution
    COUNT(CASE WHEN event_type = 'insert' THEN 1 END) as insert_events,
    COUNT(CASE WHEN event_type = 'update' THEN 1 END) as update_events,
    COUNT(CASE WHEN event_type = 'delete' THEN 1 END) as delete_events,

    -- Collection distribution
    COUNT(CASE WHEN collection_name = 'users' THEN 1 END) as user_events,
    COUNT(CASE WHEN collection_name = 'user_profiles' THEN 1 END) as profile_events,
    COUNT(CASE WHEN collection_name = 'user_preferences' THEN 1 END) as preference_events,

    -- Time-based analysis
    DATE_FORMAT(event_timestamp, '%Y-%m-%d %H:00:00') as hour_bucket,
    COUNT(*) as hourly_event_count,

    -- Success rate calculation
    ROUND(
      (COUNT(CASE WHEN processing_status = 'completed' THEN 1 END) * 100.0) / 
      COUNT(*), 2
    ) as success_rate_percent

  FROM CHANGE_STREAM_EVENTS()
  WHERE event_timestamp >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
    AND subscription_id IN (
      'user_lifecycle_events', 
      'inventory_management', 
      'realtime_analytics',
      'notification_service',
      'audit_logging'
    )
  GROUP BY 
    subscription_id, 
    service_name, 
    DATE_FORMAT(event_timestamp, '%Y-%m-%d %H:00:00')
),

-- Event processing lag and performance analysis
processing_performance AS (
  SELECT 
    subscription_id,

    -- Latency metrics
    AVG(TIMESTAMPDIFF(MICROSECOND, event_timestamp, processing_completed_at) / 1000) as avg_processing_lag_ms,
    MAX(TIMESTAMPDIFF(MICROSECOND, event_timestamp, processing_completed_at) / 1000) as max_processing_lag_ms,

    -- Throughput calculations
    COUNT(*) / 
      (TIMESTAMPDIFF(SECOND, MIN(event_timestamp), MAX(event_timestamp)) / 3600.0) as events_per_hour,

    -- Error analysis
    COUNT(CASE WHEN retry_count > 0 THEN 1 END) as events_requiring_retry,
    AVG(retry_count) as avg_retry_count,

    -- Resume token health
    MAX(resume_token_timestamp) as latest_resume_token,
    TIMESTAMPDIFF(SECOND, MAX(resume_token_timestamp), NOW()) as resume_token_lag_seconds,

    -- Queue depth analysis
    COUNT(CASE WHEN processing_status = 'queued' THEN 1 END) as current_queue_depth,

    -- Service health indicators
    CASE 
      WHEN success_rate_percent >= 99 AND avg_processing_lag_ms < 1000 THEN 'Excellent'
      WHEN success_rate_percent >= 95 AND avg_processing_lag_ms < 5000 THEN 'Good'
      WHEN success_rate_percent >= 90 AND avg_processing_lag_ms < 15000 THEN 'Fair'
      ELSE 'Needs Attention'
    END as service_health_status

  FROM event_stream_metrics
  GROUP BY subscription_id
)

SELECT 
  esm.subscription_id,
  esm.service_name,
  esm.total_events_received,
  esm.events_processed,
  esm.success_rate_percent,
  esm.avg_processing_time_ms,

  -- Performance indicators
  pp.avg_processing_lag_ms,
  pp.events_per_hour,
  pp.service_health_status,

  -- Event distribution
  esm.insert_events,
  esm.update_events,
  esm.delete_events,

  -- Collection breakdown
  esm.user_events,
  esm.profile_events,
  esm.preference_events,

  -- Error and retry analysis
  esm.events_failed,
  pp.events_requiring_retry,
  pp.avg_retry_count,
  pp.current_queue_depth,

  -- Real-time status
  pp.resume_token_lag_seconds,
  CASE 
    WHEN pp.resume_token_lag_seconds > 300 THEN 'Stream Lagging'
    WHEN pp.current_queue_depth > 1000 THEN 'Queue Backlog'
    WHEN esm.success_rate_percent < 95 THEN 'High Error Rate'
    ELSE 'Healthy'
  END as real_time_status,

  -- Performance recommendations
  CASE 
    WHEN pp.avg_processing_lag_ms > 10000 THEN 'Increase processing capacity'
    WHEN pp.current_queue_depth > 500 THEN 'Enable batch processing'
    WHEN esm.success_rate_percent < 90 THEN 'Review error handling'
    WHEN pp.events_per_hour > 10000 THEN 'Consider partitioning'
    ELSE 'Performance optimal'
  END as optimization_recommendation

FROM event_stream_metrics esm
JOIN processing_performance pp ON esm.subscription_id = pp.subscription_id
ORDER BY esm.total_events_received DESC, esm.success_rate_percent ASC;

-- Advanced event correlation and business process tracking
WITH event_correlation AS (
  SELECT 
    correlation_id,
    business_process_id,

    -- Process timeline tracking
    MIN(event_timestamp) as process_start_time,
    MAX(event_timestamp) as process_end_time,
    TIMESTAMPDIFF(SECOND, MIN(event_timestamp), MAX(event_timestamp)) as process_duration_seconds,

    -- Event sequence analysis
    GROUP_CONCAT(
      CONCAT(service_name, ':', event_type, ':', collection_name) 
      ORDER BY event_timestamp 
      SEPARATOR ' -> '
    ) as event_sequence,

    COUNT(*) as total_events_in_process,
    COUNT(DISTINCT service_name) as services_involved,
    COUNT(DISTINCT collection_name) as collections_affected,

    -- Process completion analysis
    COUNT(CASE WHEN processing_status = 'completed' THEN 1 END) as completed_events,
    COUNT(CASE WHEN processing_status = 'failed' THEN 1 END) as failed_events,

    -- Business metrics
    SUM(CAST(JSON_EXTRACT(event_data, '$.order_total') AS DECIMAL(10,2))) as total_order_value,
    COUNT(CASE WHEN event_type = 'insert' AND collection_name = 'orders' THEN 1 END) as orders_created,
    COUNT(CASE WHEN event_type = 'update' AND collection_name = 'inventory' THEN 1 END) as inventory_updates,

    -- Process success indicators
    CASE 
      WHEN COUNT(CASE WHEN processing_status = 'failed' THEN 1 END) = 0 
        AND COUNT(CASE WHEN processing_status = 'completed' THEN 1 END) = COUNT(*) 
      THEN 'Success'
      WHEN COUNT(CASE WHEN processing_status = 'failed' THEN 1 END) > 0 THEN 'Failed'
      ELSE 'In Progress'
    END as process_status

  FROM CHANGE_STREAM_EVENTS()
  WHERE correlation_id IS NOT NULL
    AND event_timestamp >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
  GROUP BY correlation_id, business_process_id
),

-- Service dependency and interaction analysis
service_interactions AS (
  SELECT 
    source_service,
    target_service,
    interaction_type,

    -- Interaction volume and frequency
    COUNT(*) as interaction_count,
    COUNT(*) / (TIMESTAMPDIFF(SECOND, MIN(event_timestamp), MAX(event_timestamp)) / 60.0) as interactions_per_minute,

    -- Success and failure rates
    COUNT(CASE WHEN processing_status = 'completed' THEN 1 END) as successful_interactions,
    COUNT(CASE WHEN processing_status = 'failed' THEN 1 END) as failed_interactions,
    ROUND(
      (COUNT(CASE WHEN processing_status = 'completed' THEN 1 END) * 100.0) / COUNT(*), 2
    ) as interaction_success_rate,

    -- Performance metrics
    AVG(processing_duration_ms) as avg_interaction_time_ms,
    MAX(processing_duration_ms) as max_interaction_time_ms,

    -- Data volume analysis
    AVG(LENGTH(event_data)) as avg_event_size_bytes,
    SUM(LENGTH(event_data)) as total_data_transferred_bytes

  FROM CHANGE_STREAM_EVENTS()
  WHERE event_timestamp >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
    AND source_service IS NOT NULL
    AND target_service IS NOT NULL
  GROUP BY source_service, target_service, interaction_type
)

SELECT 
  -- Process correlation summary
  'BUSINESS_PROCESSES' as section,
  JSON_OBJECT(
    'total_processes', COUNT(*),
    'successful_processes', COUNT(CASE WHEN process_status = 'Success' THEN 1 END),
    'failed_processes', COUNT(CASE WHEN process_status = 'Failed' THEN 1 END),
    'in_progress_processes', COUNT(CASE WHEN process_status = 'In Progress' THEN 1 END),
    'avg_process_duration_seconds', AVG(process_duration_seconds),
    'total_business_value', SUM(total_order_value),
    'top_processes', JSON_ARRAYAGG(
      JSON_OBJECT(
        'correlation_id', correlation_id,
        'duration_seconds', process_duration_seconds,
        'services_involved', services_involved,
        'event_sequence', event_sequence,
        'status', process_status
      ) LIMIT 10
    )
  ) as process_analytics
FROM event_correlation

UNION ALL

SELECT 
  -- Service interaction summary
  'SERVICE_INTERACTIONS' as section,
  JSON_OBJECT(
    'total_interactions', SUM(interaction_count),
    'service_pairs', COUNT(*),
    'avg_success_rate', AVG(interaction_success_rate),
    'total_data_transferred_mb', SUM(total_data_transferred_bytes) / 1024 / 1024,
    'interaction_details', JSON_ARRAYAGG(
      JSON_OBJECT(
        'source_service', source_service,
        'target_service', target_service,
        'interaction_count', interaction_count,
        'success_rate', interaction_success_rate,
        'avg_time_ms', avg_interaction_time_ms
      )
    )
  ) as interaction_analytics
FROM service_interactions;

-- Real-time event stream monitoring dashboard
CREATE VIEW microservices_event_dashboard AS
SELECT 
  -- Current system status
  (SELECT COUNT(*) FROM ACTIVE_CHANGE_STREAMS()) as active_streams,
  (SELECT COUNT(*) FROM EVENT_SUBSCRIPTIONS() WHERE status = 'active') as active_subscriptions,
  (SELECT COUNT(*) FROM CHANGE_STREAM_EVENTS() WHERE event_timestamp >= DATE_SUB(NOW(), INTERVAL 1 MINUTE)) as events_per_minute,

  -- Processing queue status
  (SELECT COUNT(*) FROM CHANGE_STREAM_EVENTS() WHERE processing_status = 'queued') as queued_events,
  (SELECT COUNT(*) FROM CHANGE_STREAM_EVENTS() WHERE processing_status = 'processing') as processing_events,
  (SELECT COUNT(*) FROM CHANGE_STREAM_EVENTS() WHERE processing_status = 'retrying') as retrying_events,

  -- Error indicators
  (SELECT COUNT(*) FROM CHANGE_STREAM_EVENTS() 
   WHERE processing_status = 'failed' AND event_timestamp >= DATE_SUB(NOW(), INTERVAL 1 HOUR)) as errors_last_hour,
  (SELECT COUNT(*) FROM DEAD_LETTER_EVENTS() 
   WHERE created_at >= DATE_SUB(NOW(), INTERVAL 1 HOUR)) as dead_letter_events_hour,

  -- Performance indicators
  (SELECT AVG(processing_duration_ms) FROM CHANGE_STREAM_EVENTS() 
   WHERE processing_status = 'completed' AND event_timestamp >= DATE_SUB(NOW(), INTERVAL 5 MINUTE)) as avg_processing_time_5min,
  (SELECT MAX(resume_token_lag_seconds) FROM EVENT_SUBSCRIPTIONS()) as max_resume_token_lag,

  -- Service health summary
  (SELECT 
     JSON_ARRAYAGG(
       JSON_OBJECT(
         'service_name', service_name,
         'subscription_count', subscription_count,
         'success_rate', success_rate,
         'health_status', health_status
       )
     )
   FROM (
     SELECT 
       service_name,
       COUNT(*) as subscription_count,
       AVG(success_rate_percent) as success_rate,
       CASE 
         WHEN AVG(success_rate_percent) >= 99 THEN 'Excellent'
         WHEN AVG(success_rate_percent) >= 95 THEN 'Good'
         WHEN AVG(success_rate_percent) >= 90 THEN 'Warning'
         ELSE 'Critical'
       END as health_status
     FROM event_stream_metrics
     GROUP BY service_name
   ) service_health
  ) as service_health_summary,

  -- System health assessment
  CASE 
    WHEN (SELECT COUNT(*) FROM CHANGE_STREAM_EVENTS() WHERE processing_status = 'failed' 
          AND event_timestamp >= DATE_SUB(NOW(), INTERVAL 5 MINUTE)) > 100 THEN 'Critical'
    WHEN (SELECT MAX(resume_token_lag_seconds) FROM EVENT_SUBSCRIPTIONS()) > 300 THEN 'Warning'
    WHEN (SELECT AVG(processing_duration_ms) FROM CHANGE_STREAM_EVENTS() 
          WHERE event_timestamp >= DATE_SUB(NOW(), INTERVAL 5 MINUTE)) > 5000 THEN 'Degraded'
    ELSE 'Healthy'
  END as overall_system_health,

  NOW() as dashboard_timestamp;

-- QueryLeaf Change Streams provide:
-- 1. SQL-familiar event subscription creation and management
-- 2. Real-time event processing monitoring and analytics
-- 3. Advanced event correlation and business process tracking
-- 4. Service interaction analysis and dependency mapping
-- 5. Comprehensive error handling and dead letter queue management
-- 6. Performance optimization recommendations and health monitoring
-- 7. Integration with MongoDB's native Change Streams capabilities
-- 8. Familiar SQL syntax for complex event processing workflows
-- 9. Real-time dashboard views for operational monitoring
-- 10. Enterprise-grade event-driven architecture patterns

Best Practices for MongoDB Change Streams

Event-Driven Architecture Design

Essential principles for building robust microservices with Change Streams:

  1. Event Filtering: Use precise filtering to reduce network traffic and processing overhead
  2. Resume Token Management: Implement robust resume token persistence for fault tolerance (a minimal sketch follows this list)
  3. Batch Processing: Configure appropriate batch sizes for high-volume event scenarios
  4. Error Handling: Design comprehensive error handling with retry policies and dead letter queues
  5. Service Boundaries: Align Change Stream subscriptions with clear service boundaries and responsibilities
  6. Performance Monitoring: Implement real-time monitoring for event processing lag and system health
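
Resume token persistence (point 2) is the piece most often implemented incorrectly, so a concrete illustration helps. The following is a minimal Node.js sketch, assuming a hypothetical resume_tokens collection for persistence and an orders collection to watch; the database and collection names and the handleEvent function are illustrative assumptions, while watch(), resumeAfter, and fullDocument are standard MongoDB driver features.

// Minimal sketch: persist and reuse Change Stream resume tokens (assumed namespaces)
const { MongoClient } = require('mongodb');

async function watchOrdersWithResume(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db('ecommerce');                  // assumed database name
  const tokenStore = db.collection('resume_tokens');  // hypothetical token persistence collection

  // Load the last persisted token so processing resumes where it left off after a restart
  const saved = await tokenStore.findOne({ _id: 'orders_stream' });

  const changeStream = db.collection('orders').watch(
    [{ $match: { operationType: { $in: ['insert', 'update'] } } }],  // precise filtering
    {
      fullDocument: 'updateLookup',
      ...(saved?.token ? { resumeAfter: saved.token } : {})
    }
  );

  for await (const event of changeStream) {
    await handleEvent(event);  // application-specific processing

    // Persist the token only after the event is handled successfully
    await tokenStore.updateOne(
      { _id: 'orders_stream' },
      { $set: { token: event._id, updated_at: new Date() } },
      { upsert: true }
    );
  }
}

async function handleEvent(event) {
  console.log(`Processing ${event.operationType} on ${event.ns.coll}`);
}

Persisting the token after processing rather than before trades occasional duplicate delivery for zero lost events, so event handlers should be idempotent.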

Production Deployment Strategies

Optimize Change Streams for enterprise-scale microservices architectures:

  1. Connection Management: Use dedicated connections for Change Streams to avoid resource contention (see the configuration sketch after this list)
  2. Replica Set Configuration: Ensure proper read preferences for Change Stream operations
  3. Network Optimization: Configure appropriate network timeouts and connection pooling
  4. Scaling Patterns: Implement horizontal scaling strategies for high-volume event processing
  5. Security Integration: Secure Change Stream connections with proper authentication and encryption
  6. Operational Monitoring: Deploy comprehensive monitoring and alerting for Change Stream health
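
As a minimal sketch of the connection-management and read-preference points above, the configuration below keeps Change Stream traffic on its own client with a small dedicated pool; the specific option values, namespace, and appName are illustrative assumptions rather than recommended defaults.

// Minimal sketch: a dedicated client for Change Streams, separate from request-serving traffic
const { MongoClient } = require('mongodb');

function createChangeStreamClient(uri) {
  return new MongoClient(uri, {
    // Small, dedicated pool so long-lived change stream cursors don't starve application queries
    maxPoolSize: 5,
    minPoolSize: 1,

    // Offload the primary where the workload tolerates it (assumes a replica set deployment)
    readPreference: 'secondaryPreferred',

    // Long-running streams benefit from retryable reads and an identifiable appName in server logs
    retryReads: true,
    appName: 'change-stream-workers'
  });
}

// Usage: open streams with bounded batches and await times
async function openInventoryStream(client) {
  const coll = client.db('ecommerce').collection('inventory');  // assumed namespace
  return coll.watch([], {
    batchSize: 500,        // bounds memory per getMore round trip
    maxAwaitTimeMS: 1000   // how long the server waits for new events before returning an empty batch
  });
}

Keeping these connections on a separate client also makes it straightforward to apply different timeout and monitoring settings than those used for interactive request traffic.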

Conclusion

MongoDB Change Streams provide sophisticated event-driven capabilities that enable resilient microservices architectures through native database-level event processing, automatic fault tolerance, and comprehensive filtering mechanisms. By implementing advanced Change Stream patterns with QueryLeaf's familiar SQL interface, organizations can build robust distributed systems that maintain data consistency, service decoupling, and operational resilience at scale.

Key Change Streams benefits include:

  • Native Event Processing: Database-level event streaming without additional middleware dependencies
  • Guaranteed Delivery: Ordered event delivery with automatic resume token management for fault tolerance
  • Advanced Filtering: Sophisticated event filtering and routing capabilities with minimal network overhead
  • High Performance: Optimized event processing with configurable batching and concurrency controls
  • Service Decoupling: Clean separation of concerns enabling independent service evolution and scaling
  • Operational Simplicity: Reduced infrastructure complexity compared to traditional message queue systems

Whether you're building e-commerce platforms, financial services applications, or distributed data processing systems, MongoDB Change Streams with QueryLeaf's event processing interface provide the foundation for scalable, reliable event-driven microservices architectures that can evolve and scale with growing business requirements.

QueryLeaf Integration: QueryLeaf automatically translates SQL-familiar event processing commands into optimized MongoDB Change Stream operations, providing familiar subscription management, event correlation, and monitoring capabilities. Advanced event-driven patterns, service interaction analysis, and performance optimization are seamlessly handled through SQL-style interfaces, making sophisticated microservices architecture both powerful and accessible for database-oriented development teams.

The combination of MongoDB's native Change Streams with SQL-style event processing operations makes MongoDB an ideal platform for modern distributed systems that require both real-time event processing and familiar database administration patterns, keeping your microservices architecture scalable and maintainable as it grows to meet demanding production requirements.

MongoDB Data Archiving and Lifecycle Management: Advanced Strategies for Automated Data Retention, Performance Optimization, and Compliance

Production database systems accumulate vast amounts of data over time, creating significant challenges for performance optimization, storage cost management, and regulatory compliance. Traditional database systems often struggle with efficient data archiving strategies that balance query performance, storage costs, and data accessibility requirements while maintaining operational efficiency and compliance with data retention policies.

MongoDB provides comprehensive data lifecycle management capabilities that enable sophisticated archiving strategies through automated retention policies, performance-optimized data movement, and flexible storage tiering. Unlike traditional databases that require complex partitioning schemes and manual maintenance processes, MongoDB's document-based architecture and built-in features support seamless data archiving workflows that scale with growing data volumes while maintaining operational simplicity.

The Traditional Data Archiving Challenge

Conventional database systems face significant limitations when implementing data archiving and lifecycle management:

-- Traditional PostgreSQL data archiving - complex and maintenance-intensive approach

-- Create archive tables with identical structures (manual maintenance required)
CREATE TABLE orders_2023_archive (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    order_date TIMESTAMP NOT NULL,
    status VARCHAR(50) NOT NULL,
    total_amount DECIMAL(10,2) NOT NULL,
    items JSONB,
    shipping_address TEXT,
    billing_address TEXT,

    -- Archive-specific metadata
    archived_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    archived_by VARCHAR(100) DEFAULT current_user,
    archive_reason VARCHAR(200),
    original_table VARCHAR(100) DEFAULT 'orders',

    -- Compliance tracking
    retention_policy VARCHAR(100),
    scheduled_deletion_date DATE,
    legal_hold BOOLEAN DEFAULT false,

    -- Performance considerations
    CONSTRAINT orders_2023_archive_date_check 
        CHECK (order_date >= '2023-01-01' AND order_date < '2024-01-01')
);

-- Create indexes for archive table (must mirror production indexes)
CREATE INDEX orders_2023_archive_customer_id_idx ON orders_2023_archive(customer_id);
CREATE INDEX orders_2023_archive_date_idx ON orders_2023_archive(order_date);
CREATE INDEX orders_2023_archive_status_idx ON orders_2023_archive(status);
CREATE INDEX orders_2023_archive_archived_date_idx ON orders_2023_archive(archived_date);

-- Similar structure needed for each year and potentially each table
CREATE TABLE customer_interactions_2023_archive (
    interaction_id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    interaction_date TIMESTAMP NOT NULL,
    interaction_type VARCHAR(100),
    details JSONB,
    outcome VARCHAR(100),

    -- Archive metadata
    archived_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    archived_by VARCHAR(100) DEFAULT current_user,
    archive_reason VARCHAR(200),
    original_table VARCHAR(100) DEFAULT 'customer_interactions',

    CONSTRAINT customer_interactions_2023_archive_date_check 
        CHECK (interaction_date >= '2023-01-01' AND interaction_date < '2024-01-01')
);

-- Complex archiving procedure with limited automation
CREATE OR REPLACE FUNCTION archive_old_data(
    source_table VARCHAR(100),
    archive_table VARCHAR(100), 
    cutoff_date DATE,
    batch_size INTEGER DEFAULT 1000,
    archive_reason VARCHAR(200) DEFAULT 'automated_archiving'
) RETURNS TABLE (
    records_archived INTEGER,
    batches_processed INTEGER,
    total_processing_time_seconds INTEGER,
    errors_encountered INTEGER,
    last_archived_id BIGINT
) AS $$
DECLARE
    current_batch INTEGER := 0;
    total_archived INTEGER := 0;
    start_time TIMESTAMP := clock_timestamp();
    last_id BIGINT := 0;
    batch_result INTEGER;
    error_count INTEGER := 0;
    sql_command TEXT;
    archive_command TEXT;
BEGIN

    LOOP
        -- Dynamic SQL for flexible table handling (security risk)
        sql_command := FORMAT('
            WITH batch_data AS (
                SELECT * FROM %I 
                WHERE created_date < %L 
                AND id > %L
                ORDER BY id 
                LIMIT %L
            ),
            archived_batch AS (
                INSERT INTO %I 
                SELECT *, CURRENT_TIMESTAMP, %L, %L, %L
                FROM batch_data
                RETURNING id
            ),
            deleted_batch AS (
                DELETE FROM %I 
                WHERE id IN (SELECT id FROM archived_batch)
                RETURNING id
            )
            SELECT COUNT(*), MAX(id) FROM deleted_batch',
            source_table,
            cutoff_date,
            last_id,
            batch_size,
            archive_table,
            current_user,
            archive_reason,
            source_table,
            source_table
        );

        BEGIN
            EXECUTE sql_command INTO batch_result, last_id;

            -- Exit if no more records to process
            IF batch_result = 0 OR last_id IS NULL THEN
                EXIT;
            END IF;

            total_archived := total_archived + batch_result;
            current_batch := current_batch + 1;

            -- Commit every batch to avoid long-running transactions
            COMMIT;

            -- Brief pause to avoid overwhelming the system
            PERFORM pg_sleep(0.1);

        EXCEPTION WHEN OTHERS THEN
            error_count := error_count + 1;

            -- Log error details (basic error handling)
            INSERT INTO archive_error_log (
                source_table,
                archive_table,
                batch_number,
                last_processed_id,
                error_message,
                error_timestamp
            ) VALUES (
                source_table,
                archive_table,
                current_batch,
                last_id,
                SQLERRM,
                CURRENT_TIMESTAMP
            );

            -- Stop after too many errors
            IF error_count > 10 THEN
                EXIT;
            END IF;
        END;
    END LOOP;

    RETURN QUERY SELECT 
        total_archived,
        current_batch,
        EXTRACT(EPOCH FROM clock_timestamp() - start_time)::INTEGER,
        error_count,
        COALESCE(last_id, 0);

EXCEPTION WHEN OTHERS THEN
    -- Global error handling
    INSERT INTO archive_error_log (
        source_table,
        archive_table,
        batch_number,
        last_processed_id,
        error_message,
        error_timestamp
    ) VALUES (
        source_table,
        archive_table,
        -1,
        -1,
        'Global archiving error: ' || SQLERRM,
        CURRENT_TIMESTAMP
    );

    RETURN QUERY SELECT 0, 0, 0, 1, 0::BIGINT;
END;
$$ LANGUAGE plpgsql;

-- Manual data archiving execution (error-prone and inflexible)
DO $$
DECLARE
    archive_result RECORD;
    tables_to_archive VARCHAR(100)[] := ARRAY['orders', 'customer_interactions', 'payment_transactions', 'audit_logs'];
    current_table VARCHAR(100);
    archive_table_name VARCHAR(100);
    cutoff_date DATE := CURRENT_DATE - INTERVAL '2 years';
BEGIN

    FOREACH current_table IN ARRAY tables_to_archive
    LOOP
        -- Generate archive table name
        archive_table_name := current_table || '_' || EXTRACT(YEAR FROM cutoff_date) || '_archive';

        -- Check if archive table exists (manual verification)
        IF NOT EXISTS (SELECT 1 FROM information_schema.tables 
                      WHERE table_name = archive_table_name) THEN
            RAISE NOTICE 'Archive table % does not exist, skipping %', archive_table_name, current_table;
            CONTINUE;
        END IF;

        RAISE NOTICE 'Starting archival of % to %', current_table, archive_table_name;

        -- Execute archiving function
        FOR archive_result IN 
            SELECT * FROM archive_old_data(
                current_table, 
                archive_table_name, 
                cutoff_date,
                1000,  -- batch size
                'automated_yearly_archival'
            )
        LOOP
            RAISE NOTICE 'Archived % records from % in % batches, % errors, processing time: % seconds',
                archive_result.records_archived,
                current_table,
                archive_result.batches_processed,
                archive_result.errors_encountered,
                archive_result.total_processing_time_seconds;
        END LOOP;

        -- Basic statistics update (manual maintenance)
        EXECUTE FORMAT('ANALYZE %I', archive_table_name);

    END LOOP;
END;
$$;

-- Attempt at automated retention policy management (very limited)
CREATE TABLE data_retention_policies (
    policy_id SERIAL PRIMARY KEY,
    table_name VARCHAR(100) NOT NULL,
    retention_period_months INTEGER NOT NULL,
    archive_after_months INTEGER,
    delete_after_months INTEGER,

    -- Policy configuration
    policy_enabled BOOLEAN DEFAULT true,
    date_field VARCHAR(100) NOT NULL DEFAULT 'created_date',
    archive_storage_location VARCHAR(200),

    -- Compliance settings
    legal_hold_exemption BOOLEAN DEFAULT false,
    gdpr_applicable BOOLEAN DEFAULT false,
    custom_retention_rules JSONB,

    -- Execution tracking
    last_executed TIMESTAMP,
    last_execution_status VARCHAR(50),
    last_execution_error TEXT,

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Insert basic retention policies (manual configuration)
INSERT INTO data_retention_policies (
    table_name, retention_period_months, archive_after_months, delete_after_months,
    date_field, archive_storage_location
) VALUES 
('orders', 84, 24, 96, 'order_date', '/archives/orders/'),
('customer_interactions', 60, 12, 72, 'interaction_date', '/archives/interactions/'),
('payment_transactions', 120, 36, 144, 'transaction_date', '/archives/payments/'),
('audit_logs', 36, 6, 48, 'log_timestamp', '/archives/audit/');

-- Rudimentary retention policy execution function
CREATE OR REPLACE FUNCTION execute_retention_policies()
RETURNS TABLE (
    policy_name VARCHAR(100),
    execution_status VARCHAR(50),
    records_processed INTEGER,
    execution_time_seconds INTEGER,
    error_message TEXT
) AS $$
DECLARE
    policy_record RECORD;
    archive_cutoff DATE;
    delete_cutoff DATE;
    execution_start TIMESTAMP;
    archive_result RECORD;
    records_affected INTEGER;
BEGIN

    FOR policy_record IN 
        SELECT * FROM data_retention_policies 
        WHERE policy_enabled = true
    LOOP
        execution_start := clock_timestamp();

        BEGIN
            -- Calculate cutoff dates based on policy
            archive_cutoff := CURRENT_DATE - (policy_record.archive_after_months || ' months')::INTERVAL;
            delete_cutoff := CURRENT_DATE - (policy_record.delete_after_months || ' months')::INTERVAL;

            -- Archival phase (if configured)
            IF policy_record.archive_after_months IS NOT NULL THEN
                SELECT * INTO archive_result FROM archive_old_data(
                    policy_record.table_name,
                    policy_record.table_name || '_archive',
                    archive_cutoff,
                    500,
                    'retention_policy_execution'
                );

                records_affected := archive_result.records_archived;
            END IF;

            -- Update execution status
            UPDATE data_retention_policies 
            SET 
                last_executed = CURRENT_TIMESTAMP,
                last_execution_status = 'success',
                last_execution_error = NULL
            WHERE policy_id = policy_record.policy_id;

            RETURN QUERY SELECT 
                policy_record.table_name,
                'success'::VARCHAR(50),
                COALESCE(records_affected, 0),
                EXTRACT(EPOCH FROM clock_timestamp() - execution_start)::INTEGER,
                NULL::TEXT;

        EXCEPTION WHEN OTHERS THEN
            -- Update error status
            UPDATE data_retention_policies 
            SET 
                last_executed = CURRENT_TIMESTAMP,
                last_execution_status = 'error',
                last_execution_error = SQLERRM
            WHERE policy_id = policy_record.policy_id;

            RETURN QUERY SELECT 
                policy_record.table_name,
                'error'::VARCHAR(50),
                0,
                EXTRACT(EPOCH FROM clock_timestamp() - execution_start)::INTEGER,
                SQLERRM;
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Problems with traditional archiving approaches:
-- 1. Manual archive table creation and maintenance for each table and time period
-- 2. Complex partitioning schemes that require ongoing schema management
-- 3. Limited automation capabilities requiring extensive custom development
-- 4. Poor performance during archiving operations that impact production systems
-- 5. Inflexible retention policies that don't adapt to changing business requirements
-- 6. Minimal integration with cloud storage and tiered storage strategies
-- 7. Limited compliance tracking and audit trail capabilities
-- 8. No built-in data lifecycle automation or policy-driven management
-- 9. Complex disaster recovery for archived data across multiple table structures
-- 10. High maintenance overhead for managing archive table schemas and indexes

MongoDB provides sophisticated data lifecycle management with automated archiving capabilities:

// MongoDB Data Archiving and Lifecycle Management - comprehensive automation system
const { MongoClient, GridFSBucket } = require('mongodb');
const { createReadStream, createWriteStream } = require('fs');
const { S3Client, PutObjectCommand, GetObjectCommand } = require('@aws-sdk/client-s3');
const { promisify } = require('util');
const zlib = require('zlib');

// Advanced data lifecycle management and archiving system
class MongoDataLifecycleManager {
  constructor(connectionUri, options = {}) {
    this.client = new MongoClient(connectionUri);
    this.db = null;
    this.collections = new Map();

    // Lifecycle management configuration
    this.config = {
      // Archive storage configuration
      archiveStorage: {
        type: options.archiveStorage?.type || 'mongodb', // mongodb, gridfs, s3, filesystem
        location: options.archiveStorage?.location || 'archives',
        compression: options.archiveStorage?.compression || 'gzip',
        encryption: options.archiveStorage?.encryption || false,
        checksumVerification: options.archiveStorage?.checksumVerification !== false
      },

      // Performance optimization settings
      performance: {
        batchSize: options.performance?.batchSize || 1000,
        maxConcurrentOperations: options.performance?.maxConcurrentOperations || 3,
        throttleDelayMs: options.performance?.throttleDelayMs || 10,
        memoryLimitMB: options.performance?.memoryLimitMB || 512,
        indexOptimization: options.performance?.indexOptimization !== false
      },

      // Compliance and audit settings
      compliance: {
        auditLogging: options.compliance?.auditLogging !== false,
        legalHoldSupport: options.compliance?.legalHoldSupport !== false,
        gdprCompliance: options.compliance?.gdprCompliance || false,
        dataClassification: options.compliance?.dataClassification || {},
        retentionPolicyEnforcement: options.compliance?.retentionPolicyEnforcement !== false
      },

      // Automation settings
      automation: {
        scheduledExecution: options.automation?.scheduledExecution || false,
        executionInterval: options.automation?.executionInterval || 86400000, // 24 hours
        failureRetryAttempts: options.automation?.failureRetryAttempts || 3,
        alerting: options.automation?.alerting || false,
        monitoringEnabled: options.automation?.monitoringEnabled !== false
      }
    };

    // External storage clients
    this.s3Client = options.s3Config ? new S3Client(options.s3Config) : null;
    this.gridFSBucket = null;

    // Operational state management
    this.retentionPolicies = new Map();
    this.executionHistory = [];
    this.activeOperations = new Map();
    this.performanceMetrics = {
      totalRecordsArchived: 0,
      totalStorageSaved: 0,
      averageOperationTime: 0,
      lastExecutionTime: null
    };
  }

  async initialize(dbName) {
    console.log('Initializing MongoDB Data Lifecycle Management system...');

    try {
      await this.client.connect();
      this.db = this.client.db(dbName);

      // Initialize GridFS bucket if needed
      if (this.config.archiveStorage.type === 'gridfs') {
        this.gridFSBucket = new GridFSBucket(this.db, { 
          bucketName: this.config.archiveStorage.location || 'archives' 
        });
      }

      // Setup system collections
      await this.setupSystemCollections();

      // Load existing retention policies
      await this.loadRetentionPolicies();

      // Setup automation if enabled
      if (this.config.automation.scheduledExecution) {
        await this.setupAutomatedExecution();
      }

      console.log('Data lifecycle management system initialized successfully');

    } catch (error) {
      console.error('Error initializing data lifecycle management:', error);
      throw error;
    }
  }

  async setupSystemCollections() {
    console.log('Setting up system collections for data lifecycle management...');

    // Retention policies collection
    const retentionPolicies = this.db.collection('data_retention_policies');
    await retentionPolicies.createIndexes([
      { key: { collection_name: 1 }, unique: true },
      { key: { policy_enabled: 1 } },
      { key: { next_execution: 1 } }
    ]);

    // Archive metadata collection
    const archiveMetadata = this.db.collection('archive_metadata');
    await archiveMetadata.createIndexes([
      { key: { source_collection: 1, archive_date: -1 } },
      { key: { archive_id: 1 }, unique: true },
      { key: { retention_policy_id: 1 } },
      { key: { compliance_status: 1 } }
    ]);

    // Execution audit log
    const executionAudit = this.db.collection('lifecycle_execution_audit');
    await executionAudit.createIndexes([
      { key: { execution_timestamp: -1 } },
      { key: { policy_id: 1, execution_timestamp: -1 } },
      { key: { operation_type: 1 } }
    ]);

    // Legal hold registry (compliance feature)
    if (this.config.compliance.legalHoldSupport) {
      const legalHolds = this.db.collection('legal_hold_registry');
      await legalHolds.createIndexes([
        { key: { hold_id: 1 }, unique: true },
        { key: { affected_collections: 1 } },
        { key: { hold_status: 1 } }
      ]);
    }
  }

  async defineRetentionPolicy(policyConfig) {
    console.log(`Defining retention policy for collection: ${policyConfig.collectionName}`);

    const policy = {
      policy_id: policyConfig.policyId || this.generatePolicyId(),
      collection_name: policyConfig.collectionName,

      // Retention timeline configuration
      retention_phases: {
        active_period_days: policyConfig.activePeriod || 365,
        archive_after_days: policyConfig.archiveAfter || 730,
        delete_after_days: policyConfig.deleteAfter || 2555, // 7 years default

        // Advanced retention phases
        cold_storage_after_days: policyConfig.coldStorageAfter,
        compliance_review_after_days: policyConfig.complianceReviewAfter
      },

      // Data identification and filtering
      date_field: policyConfig.dateField || 'created_at',
      additional_filters: policyConfig.filters || {},
      exclusion_criteria: policyConfig.exclusions || {},

      // Archive configuration
      archive_settings: {
        storage_type: policyConfig.archiveStorage || this.config.archiveStorage.type,
        compression_enabled: policyConfig.compression !== false,
        encryption_required: policyConfig.encryption || false,
        batch_size: policyConfig.batchSize || this.config.performance.batchSize,

        // Performance optimization
        index_hints: policyConfig.indexHints || [],
        sort_optimization: policyConfig.sortField || policyConfig.dateField,
        memory_limit: policyConfig.memoryLimit || '200M'
      },

      // Compliance configuration
      compliance_settings: {
        legal_hold_exempt: policyConfig.legalHoldExempt || false,
        data_classification: policyConfig.dataClassification || 'standard',
        gdpr_applicable: policyConfig.gdprApplicable || false,
        audit_level: policyConfig.auditLevel || 'standard',

        // Data sensitivity handling
        pii_fields: policyConfig.piiFields || [],
        anonymization_rules: policyConfig.anonymizationRules || {}
      },

      // Execution configuration
      execution_settings: {
        policy_enabled: policyConfig.enabled !== false,
        execution_schedule: policyConfig.schedule || '0 2 * * *', // Daily at 2 AM
        max_execution_time_minutes: policyConfig.maxExecutionTime || 120,
        failure_retry_attempts: policyConfig.retryAttempts || 3,
        notification_settings: policyConfig.notifications || {}
      },

      // Metadata and tracking
      policy_metadata: {
        created_by: policyConfig.createdBy || 'system',
        created_at: new Date(),
        last_modified: new Date(),
        policy_version: policyConfig.version || '1.0',
        description: policyConfig.description || '',
        business_justification: policyConfig.businessJustification || ''
      }
    };

    // Store retention policy
    const retentionPolicies = this.db.collection('data_retention_policies');
    await retentionPolicies.replaceOne(
      { collection_name: policy.collection_name },
      policy,
      { upsert: true }
    );

    // Cache policy for operational use
    this.retentionPolicies.set(policy.collection_name, policy);

    console.log(`Retention policy defined successfully for ${policy.collection_name}`);
    return policy.policy_id;
  }

  async executeDataArchiving(collectionName, options = {}) {
    console.log(`Starting data archiving for collection: ${collectionName}`);

    const policy = this.retentionPolicies.get(collectionName);
    if (!policy || !policy.execution_settings.policy_enabled) {
      throw new Error(`No enabled retention policy found for collection: ${collectionName}`);
    }

    const operationId = this.generateOperationId();
    const startTime = Date.now();

    try {
      // Check for legal holds
      if (this.config.compliance.legalHoldSupport) {
        await this.checkLegalHolds(collectionName, policy);
      }

      // Calculate archive cutoff date
      const cutoffDate = new Date();
      cutoffDate.setDate(cutoffDate.getDate() - policy.retention_phases.archive_after_days);

      // Build archive query with optimization; apply $nor only when exclusion
      // criteria are actually defined (an empty {} inside $nor excludes everything)
      const archiveQuery = {
        [policy.date_field]: { $lt: cutoffDate },
        ...policy.additional_filters,
        ...(policy.exclusion_criteria && Object.keys(policy.exclusion_criteria).length > 0
          ? { $nor: [policy.exclusion_criteria] }
          : {})
      };

      // Count records to archive
      const sourceCollection = this.db.collection(collectionName);
      const recordCount = await sourceCollection.countDocuments(archiveQuery);

      if (recordCount === 0) {
        console.log(`No records found for archiving in ${collectionName}`);
        return { success: true, recordsProcessed: 0, operationId };
      }

      console.log(`Found ${recordCount} records to archive from ${collectionName}`);

      // Execute archiving in batches
      const archiveResult = await this.executeBatchArchiving(
        sourceCollection,
        archiveQuery,
        policy,
        operationId,
        options
      );

      // Create archive metadata record
      await this.createArchiveMetadata({
        archive_id: operationId,
        source_collection: collectionName,
        archive_date: new Date(),
        record_count: archiveResult.recordsArchived,
        archive_size: archiveResult.archiveSize,
        policy_id: policy.policy_id,
        archive_location: archiveResult.archiveLocation,
        checksum: archiveResult.checksum,

        compliance_info: {
          legal_hold_checked: this.config.compliance.legalHoldSupport,
          gdpr_compliant: policy.compliance_settings.gdpr_applicable,
          audit_trail: archiveResult.auditTrail
        }
      });

      // Log execution in audit trail
      await this.logExecutionAudit({
        operation_id: operationId,
        operation_type: 'archive',
        collection_name: collectionName,
        policy_id: policy.policy_id,
        execution_timestamp: new Date(),
        records_processed: archiveResult.recordsArchived,
        execution_duration_ms: Date.now() - startTime,
        status: 'success',
        performance_metrics: archiveResult.performanceMetrics
      });

      console.log(`Data archiving completed successfully for ${collectionName}`);
      return {
        success: true,
        operationId,
        recordsArchived: archiveResult.recordsArchived,
        archiveSize: archiveResult.archiveSize,
        executionTime: Date.now() - startTime
      };

    } catch (error) {
      console.error(`Error during data archiving for ${collectionName}:`, error);

      // Log error in audit trail
      await this.logExecutionAudit({
        operation_id: operationId,
        operation_type: 'archive',
        collection_name: collectionName,
        policy_id: policy?.policy_id,
        execution_timestamp: new Date(),
        execution_duration_ms: Date.now() - startTime,
        status: 'error',
        error_message: error.message
      });

      throw error;
    }
  }

  async executeBatchArchiving(sourceCollection, archiveQuery, policy, operationId, options) {
    console.log('Executing batch archiving with performance optimization...');

    const batchSize = policy.archive_settings.batch_size;
    const archiveLocation = await this.prepareArchiveLocation(operationId, policy);

    const operationStartTime = Date.now();  // overall start, used for throughput calculation
    let totalArchived = 0;
    let totalSize = 0;
    let batchNumber = 0;
    const auditTrail = [];
    const performanceMetrics = {
      avgBatchTime: 0,
      maxBatchTime: 0,
      totalBatches: 0,
      throughputRecordsPerSecond: 0
    };

    // Create cursor with optimization hints
    const cursor = sourceCollection.find(archiveQuery)
      .sort({ [policy.archive_settings.sort_optimization]: 1 })
      .batchSize(batchSize);

    // Add index hint if specified
    if (policy.archive_settings.index_hints.length > 0) {
      cursor.hint(policy.archive_settings.index_hints[0]);
    }

    let batch = [];
    let batchStartTime = Date.now();

    for await (const document of cursor) {
      batch.push(document);

      // Process batch when full
      if (batch.length >= batchSize) {
        const batchResult = await this.processBatch(
          batch,
          archiveLocation,
          policy,
          batchNumber,
          operationId
        );

        const batchTime = Date.now() - batchStartTime;
        performanceMetrics.avgBatchTime = (performanceMetrics.avgBatchTime * batchNumber + batchTime) / (batchNumber + 1);
        performanceMetrics.maxBatchTime = Math.max(performanceMetrics.maxBatchTime, batchTime);

        totalArchived += batchResult.recordsProcessed;
        totalSize += batchResult.batchSize;
        batchNumber++;

        auditTrail.push({
          batch_number: batchNumber,
          records_processed: batchResult.recordsProcessed,
          batch_size: batchResult.batchSize,
          processing_time_ms: batchTime
        });

        // Reset batch
        batch = [];
        batchStartTime = Date.now();

        // Throttle to avoid overwhelming the system
        if (this.config.performance.throttleDelayMs > 0) {
          await new Promise(resolve => setTimeout(resolve, this.config.performance.throttleDelayMs));
        }
      }
    }

    // Process final partial batch
    if (batch.length > 0) {
      const batchResult = await this.processBatch(
        batch,
        archiveLocation,
        policy,
        batchNumber,
        operationId
      );

      totalArchived += batchResult.recordsProcessed;
      totalSize += batchResult.batchSize;
      batchNumber++;
    }

    // Calculate final performance metrics
    performanceMetrics.totalBatches = batchNumber;
    performanceMetrics.throughputRecordsPerSecond = totalArchived / ((Date.now() - operationStartTime) / 1000);

    // Generate archive checksum for integrity verification
    const checksum = await this.generateArchiveChecksum(archiveLocation, totalArchived);

    console.log(`Batch archiving completed: ${totalArchived} records in ${batchNumber} batches`);

    return {
      recordsArchived: totalArchived,
      archiveSize: totalSize,
      archiveLocation,
      checksum,
      auditTrail,
      performanceMetrics
    };
  }

  async processBatch(batch, archiveLocation, policy, batchNumber, operationId) {
    const batchStartTime = Date.now();

    // Apply data transformations if needed (PII anonymization, etc.)
    const processedBatch = await this.applyDataTransformations(batch, policy);

    // Store batch based on configured storage type
    let batchSize;
    switch (this.config.archiveStorage.type) {
      case 'mongodb':
        batchSize = await this.storeBatchToMongoDB(processedBatch, archiveLocation);
        break;
      case 'gridfs':
        batchSize = await this.storeBatchToGridFS(processedBatch, archiveLocation, batchNumber);
        break;
      case 's3':
        batchSize = await this.storeBatchToS3(processedBatch, archiveLocation, batchNumber);
        break;
      case 'filesystem':
        batchSize = await this.storeBatchToFileSystem(processedBatch, archiveLocation, batchNumber);
        break;
      default:
        throw new Error(`Unsupported archive storage type: ${this.config.archiveStorage.type}`);
    }

    // Remove archived documents from source collection
    const documentIds = batch.map(doc => doc._id);
    const deleteResult = await this.db.collection(policy.collection_name).deleteMany({
      _id: { $in: documentIds }
    });

    console.log(`Batch ${batchNumber}: archived ${batch.length} records, size: ${batchSize} bytes`);

    return {
      recordsProcessed: batch.length,
      batchSize,
      deletedRecords: deleteResult.deletedCount,
      processingTime: Date.now() - batchStartTime
    };
  }

  async applyDataTransformations(batch, policy) {
    if (!policy.compliance_settings.pii_fields.length && 
        !Object.keys(policy.compliance_settings.anonymization_rules).length) {
      return batch; // No transformations needed
    }

    console.log('Applying data transformations for compliance...');

    return batch.map(document => {
      let processedDoc = { ...document };

      // Apply PII field anonymization
      policy.compliance_settings.pii_fields.forEach(field => {
        if (processedDoc[field]) {
          processedDoc[field] = this.anonymizeField(processedDoc[field], field);
        }
      });

      // Apply custom anonymization rules
      Object.entries(policy.compliance_settings.anonymization_rules).forEach(([field, rule]) => {
        if (processedDoc[field]) {
          processedDoc[field] = this.applyAnonymizationRule(processedDoc[field], rule);
        }
      });

      // Add transformation metadata
      processedDoc._archive_metadata = {
        original_id: document._id,
        archived_at: new Date(),
        transformations_applied: [
          ...policy.compliance_settings.pii_fields.map(field => `pii_anonymization:${field}`),
          ...Object.keys(policy.compliance_settings.anonymization_rules).map(field => `custom_rule:${field}`)
        ]
      };

      return processedDoc;
    });
  }

  async storeBatchToMongoDB(batch, archiveLocation) {
    const archiveCollection = this.db.collection(archiveLocation);
    const insertResult = await archiveCollection.insertMany(batch, { 
      ordered: false,
      writeConcern: { w: 'majority', j: true }
    });

    return JSON.stringify(batch).length; // Approximate size
  }

  async storeBatchToGridFS(batch, archiveLocation, batchNumber) {
    const fileName = `${archiveLocation}_batch_${batchNumber.toString().padStart(6, '0')}.json`;
    const batchData = JSON.stringify(batch);

    // Wait for the GridFS upload to complete so source documents are only
    // removed after the archive copy has been durably written
    const finishUpload = (stream, payload) =>
      new Promise((resolve, reject) => {
        stream.once('error', reject);
        stream.once('finish', resolve);
        stream.end(payload);
      });

    if (this.config.archiveStorage.compression === 'gzip') {
      const compressedData = await promisify(zlib.gzip)(batchData);
      const uploadStream = this.gridFSBucket.openUploadStream(`${fileName}.gz`, {
        metadata: {
          batch_number: batchNumber,
          record_count: batch.length,
          compression: 'gzip',
          archived_at: new Date()
        }
      });

      await finishUpload(uploadStream, compressedData);
      return compressedData.length;
    } else {
      const uploadStream = this.gridFSBucket.openUploadStream(fileName, {
        metadata: {
          batch_number: batchNumber,
          record_count: batch.length,
          archived_at: new Date()
        }
      });

      await finishUpload(uploadStream, Buffer.from(batchData));
      return batchData.length;
    }
  }

  async storeBatchToS3(batch, archiveLocation, batchNumber) {
    if (!this.s3Client) {
      throw new Error('S3 client not configured for archive storage');
    }

    const key = `${archiveLocation}/batch_${batchNumber.toString().padStart(6, '0')}.json`;
    let data = JSON.stringify(batch);

    if (this.config.archiveStorage.compression === 'gzip') {
      data = await promisify(zlib.gzip)(data);
    }

    const putCommand = new PutObjectCommand({
      Bucket: this.config.archiveStorage.location,
      Key: key,
      Body: data,
      ContentType: 'application/json',
      ContentEncoding: this.config.archiveStorage.compression === 'gzip' ? 'gzip' : undefined,
      Metadata: {
        batch_number: batchNumber.toString(),
        record_count: batch.length.toString(),
        archived_at: new Date().toISOString()
      }
    });

    await this.s3Client.send(putCommand);
    return data.length;
  }

  async setupAutomaticDataDeletion(collectionName, options = {}) {
    console.log(`Setting up automatic data deletion for: ${collectionName}`);

    const policy = this.retentionPolicies.get(collectionName);
    if (!policy) {
      throw new Error(`No retention policy found for collection: ${collectionName}`);
    }

    // Use MongoDB TTL index for automatic deletion where possible
    const collection = this.db.collection(collectionName);

    // Create TTL index based on retention policy
    const ttlSeconds = policy.retention_phases.delete_after_days * 24 * 60 * 60;

    try {
      await collection.createIndex(
        { [policy.date_field]: 1 },
        { 
          expireAfterSeconds: ttlSeconds,
          background: true,
          name: `ttl_${policy.date_field}_${ttlSeconds}s`
        }
      );

      console.log(`TTL index created for automatic deletion: ${ttlSeconds} seconds`);

      // Update policy to track TTL index usage
      await this.db.collection('data_retention_policies').updateOne(
        { collection_name: collectionName },
        { 
          $set: { 
            'deletion_settings.ttl_enabled': true,
            'deletion_settings.ttl_seconds': ttlSeconds,
            'deletion_settings.ttl_field': policy.date_field
          }
        }
      );

      return { success: true, ttlSeconds, indexName: `ttl_${policy.date_field}_${ttlSeconds}s` };

    } catch (error) {
      console.error('Error setting up TTL index:', error);
      throw error;
    }
  }

  async retrieveArchivedData(archiveId, query = {}, options = {}) {
    console.log(`Retrieving archived data for archive ID: ${archiveId}`);

    // Get archive metadata
    const archiveMetadata = await this.db.collection('archive_metadata')
      .findOne({ archive_id: archiveId });

    if (!archiveMetadata) {
      throw new Error(`Archive not found: ${archiveId}`);
    }

    const { limit = 100, skip = 0, projection = {} } = options;
    let retrievedData = [];

    // Retrieve data based on storage type
    switch (this.config.archiveStorage.type) {
      case 'mongodb':
        const archiveCollection = this.db.collection(archiveMetadata.archive_location);
        retrievedData = await archiveCollection
          .find(query, { projection })
          .skip(skip)
          .limit(limit)
          .toArray();
        break;

      case 'gridfs':
        retrievedData = await this.retrieveFromGridFS(archiveMetadata, query, options);
        break;

      case 's3':
        retrievedData = await this.retrieveFromS3(archiveMetadata, query, options);
        break;

      default:
        throw new Error(`Archive retrieval not supported for storage type: ${this.config.archiveStorage.type}`);
    }

    // Log retrieval for audit purposes
    await this.logExecutionAudit({
      operation_id: this.generateOperationId(),
      operation_type: 'retrieve',
      archive_id: archiveId,
      execution_timestamp: new Date(),
      records_retrieved: retrievedData.length,
      retrieval_query: query,
      status: 'success'
    });

    return {
      archiveMetadata,
      data: retrievedData,
      totalRecords: archiveMetadata.record_count,
      retrievedCount: retrievedData.length
    };
  }

  async generateComplianceReport(collectionName, options = {}) {
    console.log(`Generating compliance report for: ${collectionName}`);

    const {
      startDate = new Date(Date.now() - 365 * 24 * 60 * 60 * 1000), // 1 year ago
      endDate = new Date(),
      includeMetrics = true,
      includeAuditTrail = true
    } = options;

    const policy = this.retentionPolicies.get(collectionName);
    if (!policy) {
      throw new Error(`No retention policy found for collection: ${collectionName}`);
    }

    // Collect compliance data
    const complianceData = {
      collection_name: collectionName,
      policy_id: policy.policy_id,
      report_generated_at: new Date(),
      reporting_period: { start: startDate, end: endDate },

      // Policy compliance status
      policy_compliance: {
        policy_enabled: policy.execution_settings.policy_enabled,
        gdpr_compliant: policy.compliance_settings.gdpr_applicable,
        legal_hold_support: this.config.compliance.legalHoldSupport,
        audit_level: policy.compliance_settings.audit_level
      },

      // Archive operations summary
      archive_summary: await this.getArchiveSummary(collectionName, startDate, endDate),

      // Current data status
      data_status: await this.getCurrentDataStatus(collectionName, policy)
    };

    if (includeMetrics) {
      complianceData.performance_metrics = await this.getPerformanceMetrics(collectionName, startDate, endDate);
    }

    if (includeAuditTrail) {
      complianceData.audit_trail = await this.getAuditTrail(collectionName, startDate, endDate);
    }

    // Check for any compliance issues
    complianceData.compliance_issues = await this.identifyComplianceIssues(collectionName, policy);

    return complianceData;
  }

  async loadRetentionPolicies() {
    const policies = await this.db.collection('data_retention_policies')
      .find({ 'execution_settings.policy_enabled': true })
      .toArray();

    policies.forEach(policy => {
      this.retentionPolicies.set(policy.collection_name, policy);
    });

    console.log(`Loaded ${policies.length} retention policies`);
  }

  generatePolicyId() {
    return `policy_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
  }

  generateOperationId() {
    return `op_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
  }

  anonymizeField(value, fieldType) {
    // Simple anonymization - in production, use proper anonymization libraries
    if (typeof value === 'string') {
      if (fieldType.includes('email')) {
        return 'anonymized@example.com';
      } else if (fieldType.includes('name')) {
        return 'ANONYMIZED';
      } else {
        return '***REDACTED***';
      }
    }
    return null;
  }

  async createArchiveMetadata(metadata) {
    return await this.db.collection('archive_metadata').insertOne(metadata);
  }

  async logExecutionAudit(auditRecord) {
    if (this.config.compliance.auditLogging) {
      return await this.db.collection('lifecycle_execution_audit').insertOne(auditRecord);
    }
  }
}

// Benefits of MongoDB Data Lifecycle Management:
// - Automated retention policy enforcement with minimal manual intervention
// - Flexible storage tiering supporting MongoDB, GridFS, S3, and filesystem storage
// - Built-in compliance features including legal hold support and audit trails  
// - Performance-optimized batch processing with throttling and memory management
// - Comprehensive data transformation capabilities for PII protection and anonymization
// - TTL index integration for automatic deletion without application logic
// - Real-time monitoring and alerting for policy execution and compliance status
// - Scalable architecture supporting large-scale data archiving operations
// - Integrated backup and recovery capabilities for archived data
// - SQL-compatible lifecycle management operations through QueryLeaf integration

module.exports = {
  MongoDataLifecycleManager
};
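
To tie the class together, here is a brief usage sketch. The connection string, database name, module path, and policy values are placeholders, and the calls exercise only the public methods defined above (initialize, defineRetentionPolicy, executeDataArchiving, setupAutomaticDataDeletion); helpers they rely on, such as prepareArchiveLocation and generateArchiveChecksum, are not shown in this excerpt.

// Example wiring of the lifecycle manager defined above (all values are illustrative)
const { MongoDataLifecycleManager } = require('./mongo-data-lifecycle-manager'); // assumed module path

async function runOrderArchiving() {
  const manager = new MongoDataLifecycleManager('mongodb://localhost:27017', {
    archiveStorage: { type: 'gridfs', location: 'order_archives', compression: 'gzip' },
    performance: { batchSize: 1000, throttleDelayMs: 10 },
    compliance: { auditLogging: true, legalHoldSupport: true }
  });

  await manager.initialize('ecommerce');  // assumed database name

  // Register a policy: archive orders older than 2 years, delete after 7
  await manager.defineRetentionPolicy({
    collectionName: 'orders',
    dateField: 'order_date',
    archiveAfter: 730,
    deleteAfter: 2555,
    filters: { status: { $in: ['completed', 'shipped', 'delivered'] } }
  });

  // Run one archiving pass and let a TTL index handle eventual deletion
  const result = await manager.executeDataArchiving('orders');
  console.log(`Archived ${result.recordsArchived} documents in ${result.executionTime} ms`);

  await manager.setupAutomaticDataDeletion('orders');
}

runOrderArchiving().catch(console.error);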

Understanding MongoDB Data Lifecycle Architecture

Advanced Archiving Strategies and Compliance Management

Implement sophisticated data lifecycle policies with enterprise-grade compliance and automation:

// Production-ready data lifecycle automation with enterprise compliance features
class EnterpriseDataLifecycleManager extends MongoDataLifecycleManager {
  constructor(connectionUri, enterpriseConfig) {
    super(connectionUri, enterpriseConfig);

    this.enterpriseFeatures = {
      // Advanced compliance management
      complianceIntegration: {
        gdprAutomation: true,
        legalHoldWorkflows: true,
        auditTrailEncryption: true,
        regulatoryReporting: true,
        dataSubjectRequests: true
      },

      // Enterprise storage integration
      storageIntegration: {
        multiTierStorage: true,
        cloudStorageIntegration: true,
        compressionOptimization: true,
        encryptionAtRest: true,
        geographicReplication: true
      },

      // Advanced automation
      automationCapabilities: {
        mlPredictiveArchiving: true,
        workloadOptimization: true,
        costOptimization: true,
        capacityPlanning: true,
        performanceTuning: true
      }
    };

    this.initializeEnterpriseFeatures();
  }

  async implementIntelligentArchiving(collectionName, options = {}) {
    console.log('Implementing intelligent archiving with machine learning optimization...');

    const archivingStrategy = {
      // Predictive analysis for optimal archiving timing
      predictiveModeling: {
        accessPatternAnalysis: true,
        queryFrequencyPrediction: true,
        storageOptimization: true,
        performanceImpactMinimization: true
      },

      // Cost-optimized storage tiering
      costOptimization: {
        automaticTierSelection: true,
        compressionOptimization: true,
        geographicOptimization: true,
        providerOptimization: true
      },

      // Performance-aware archiving
      performanceOptimization: {
        nonBlockingArchiving: true,
        priorityBasedProcessing: true,
        resourceThrottling: true,
        systemImpactMinimization: true
      }
    };

    return await this.deployIntelligentArchiving(collectionName, archivingStrategy, options);
  }

  async setupAdvancedComplianceWorkflows(complianceConfig) {
    console.log('Setting up advanced compliance workflows...');

    const complianceWorkflows = {
      // GDPR compliance automation
      gdprCompliance: {
        dataSubjectRequestHandling: true,
        rightToErasureAutomation: true,
        dataPortabilitySupport: true,
        consentManagement: true,
        breachNotificationIntegration: true
      },

      // Industry-specific compliance
      industryCompliance: {
        soxCompliance: complianceConfig.sox || false,
        hipaaCompliance: complianceConfig.hipaa || false,
        pciDssCompliance: complianceConfig.pciDss || false,
        iso27001Compliance: complianceConfig.iso27001 || false
      },

      // Legal hold management
      legalHoldManagement: {
        automaticHoldEnforcement: true,
        holdNotificationWorkflows: true,
        custodyChainTracking: true,
        releaseAutomation: true
      }
    };

    return await this.deployComplianceWorkflows(complianceWorkflows);
  }
}

SQL-Style Data Lifecycle Management with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB data archiving and lifecycle management:

-- QueryLeaf data lifecycle management with SQL-familiar patterns for MongoDB

-- Define comprehensive data retention policy with advanced features
CREATE RETENTION_POLICY order_data_lifecycle AS (
  -- Target collection and identification
  collection_name = 'orders',
  policy_enabled = true,

  -- Retention phases with flexible timing
  active_retention_days = 365,     -- Keep in active storage for 1 year
  archive_after_days = 730,        -- Archive after 2 years
  cold_storage_after_days = 1825,  -- Move to cold storage after 5 years  
  delete_after_days = 2555,        -- Delete after 7 years (regulatory requirement)

  -- Data identification and filtering
  date_field = 'order_date',
  additional_filters = JSON_BUILD_OBJECT(
    'status', JSON_BUILD_ARRAY('completed', 'shipped', 'delivered'),
    'total_amount', JSON_BUILD_OBJECT('$gt', 0)
  ),

  -- Exclude from archiving (VIP customers, ongoing disputes, etc.)
  exclusion_criteria = JSON_BUILD_OBJECT(
    '$or', JSON_BUILD_ARRAY(
      JSON_BUILD_OBJECT('customer_tier', 'vip'),
      JSON_BUILD_OBJECT('dispute_status', 'active'),
      JSON_BUILD_OBJECT('legal_hold', true)
    )
  ),

  -- Archive storage configuration
  archive_storage_type = 'gridfs',
  compression_enabled = true,
  encryption_required = false,
  batch_size = 1000,

  -- Performance optimization
  index_hints = JSON_BUILD_ARRAY('order_date_status_idx', 'customer_id_idx'),
  sort_field = 'order_date',
  memory_limit = '512M',
  max_execution_time_minutes = 180,

  -- Compliance settings
  gdpr_applicable = true,
  legal_hold_exempt = false,
  audit_level = 'detailed',
  pii_fields = JSON_BUILD_ARRAY('customer_email', 'billing_address', 'shipping_address'),

  -- Automation configuration
  execution_schedule = '0 2 * * 0',  -- Weekly on Sunday at 2 AM
  failure_retry_attempts = 3,
  notification_enabled = true,

  -- Business metadata
  business_justification = 'Regulatory compliance and performance optimization',
  data_owner = 'sales_operations_team',
  policy_version = '2.1'
);

-- Advanced customer data retention with PII protection
CREATE RETENTION_POLICY customer_data_lifecycle AS (
  collection_name = 'customers',
  policy_enabled = true,

  -- GDPR-compliant retention periods
  active_retention_days = 1095,    -- 3 years active retention
  archive_after_days = 1825,       -- Archive after 5 years  
  delete_after_days = 2555,        -- Delete after 7 years

  date_field = 'last_activity_date',

  -- PII anonymization before archiving
  pii_protection = JSON_BUILD_OBJECT(
    'anonymize_before_archive', true,
    'pii_fields', JSON_BUILD_ARRAY(
      'email', 'phone', 'address', 'birth_date', 'social_security_number'
    ),
    'anonymization_method', 'hash_with_salt'
  ),

  -- Data subject request handling
  gdpr_compliance = JSON_BUILD_OBJECT(
    'right_to_erasure_enabled', true,
    'data_portability_enabled', true,
    'consent_tracking_required', true,
    'processing_lawfulness_basis', 'legitimate_interest'
  ),

  archive_storage_type = 's3',
  s3_configuration = JSON_BUILD_OBJECT(
    'bucket', 'customer-data-archives',
    'storage_class', 'STANDARD_IA',
    'encryption', 'AES256'
  )
);

-- Execute data archiving with comprehensive monitoring
WITH archiving_execution AS (
  SELECT 
    collection_name,
    policy_id,

    -- Calculate records eligible for archiving
    (SELECT COUNT(*) 
     FROM orders 
     WHERE order_date < CURRENT_DATE - INTERVAL '2 years'
       AND status IN ('completed', 'shipped', 'delivered')
       AND total_amount > 0
       AND NOT (customer_tier = 'vip' OR dispute_status = 'active' OR legal_hold = true)
    ) as eligible_records,

    -- Estimate archive size and processing time
    (SELECT 
       ROUND(AVG(LENGTH(to_jsonb(o)::text))::numeric, 0) * COUNT(*) / 1024 / 1024
     FROM orders o 
     WHERE order_date < CURRENT_DATE - INTERVAL '2 years'
    ) as estimated_archive_size_mb,

    -- Performance projections
    CASE 
      WHEN eligible_records > 100000 THEN 'large_dataset_optimization_required'
      WHEN eligible_records > 10000 THEN 'standard_optimization_recommended'
      ELSE 'minimal_optimization_needed'
    END as performance_category,

    -- Compliance checks
    CASE 
      WHEN EXISTS (
        SELECT 1 FROM legal_hold_registry 
        WHERE collection_name = 'orders' 
        AND hold_status = 'active'
      ) THEN 'legal_hold_active_check_required'
      ELSE 'cleared_for_archiving'
    END as compliance_status

  FROM data_retention_policies 
  WHERE collection_name = 'orders' 
    AND policy_enabled = true
),

-- Execute archiving with batch processing and monitoring
archiving_results AS (
  EXECUTE_ARCHIVING(
    collection_name => 'orders',

    -- Batch processing configuration
    batch_processing => JSON_BUILD_OBJECT(
      'batch_size', 1000,
      'max_concurrent_batches', 3,
      'throttle_delay_ms', 10,
      'memory_limit_per_batch', '100M'
    ),

    -- Performance optimization
    performance_options => JSON_BUILD_OBJECT(
      'use_index_hints', true,
      'parallel_processing', true,
      'compression_level', 'standard',
      'checksum_validation', true
    ),

    -- Archive destination
    archive_destination => JSON_BUILD_OBJECT(
      'storage_type', 'gridfs',
      'bucket_name', 'order_archives',
      'naming_pattern', 'orders_archive_{year}_{month}_{batch}',
      'metadata_tags', JSON_BUILD_OBJECT(
        'department', 'sales',
        'retention_policy', 'order_data_lifecycle',
        'compliance_level', 'standard'
      )
    ),

    -- Compliance and audit settings
    compliance_settings => JSON_BUILD_OBJECT(
      'audit_logging', 'detailed',
      'pii_anonymization', false,  -- Orders don't contain direct PII
      'legal_hold_check', true,
      'gdpr_processing_log', true
    )
  )
)

SELECT 
  ae.collection_name,
  ae.eligible_records,
  ae.estimated_archive_size_mb,
  ae.performance_category,
  ae.compliance_status,

  -- Archiving execution results
  ar.operation_id,
  ar.records_archived,
  ar.archive_size_actual_mb,
  ar.execution_time_seconds,
  ar.batches_processed,

  -- Performance metrics
  ROUND(ar.records_archived::numeric / ar.execution_time_seconds, 2) as records_per_second,
  ROUND(ar.archive_size_actual_mb::numeric / ar.execution_time_seconds, 3) as mb_per_second,

  -- Compliance verification
  ar.compliance_checks_passed,
  ar.audit_trail_id,
  ar.archive_location,
  ar.checksum_verified,

  -- Success indicators
  CASE 
    WHEN ar.records_archived = ae.eligible_records THEN 'complete_success'
    WHEN ar.records_archived > ae.eligible_records * 0.95 THEN 'successful_with_minor_issues'
    WHEN ar.records_archived > 0 THEN 'partial_success_requires_review'
    ELSE 'failed_requires_investigation'
  END as execution_status,

  -- Recommendations for optimization
  CASE 
    WHEN ar.records_archived::numeric / NULLIF(ar.execution_time_seconds, 0) < 10 THEN 'consider_batch_size_increase'
    WHEN ar.execution_time_seconds > 3600 THEN 'consider_parallel_processing_increase'
    WHEN ar.archive_size_actual_mb > ae.estimated_archive_size_mb * 1.5 THEN 'investigate_compression_efficiency'
    ELSE 'performance_within_expected_parameters'
  END as optimization_recommendation

FROM archiving_execution ae
CROSS JOIN archiving_results ar;

-- Monitor archiving operations with real-time dashboard
WITH current_archiving_operations AS (
  SELECT 
    operation_id,
    collection_name,
    policy_id,
    operation_type,
    started_at,

    -- Progress tracking
    records_processed,
    estimated_total_records,
    ROUND((records_processed::numeric / estimated_total_records) * 100, 1) as progress_percentage,

    -- Performance monitoring
    EXTRACT(EPOCH FROM CURRENT_TIMESTAMP - started_at) as elapsed_seconds,
    ROUND(records_processed::numeric / NULLIF(EXTRACT(EPOCH FROM CURRENT_TIMESTAMP - started_at), 0), 2) as current_throughput,

    -- Resource utilization
    memory_usage_mb,
    cpu_utilization_percent,
    io_operations_per_second,

    -- Status indicators
    operation_status,
    error_count,
    last_error_message,

    -- ETA calculation
    CASE 
      WHEN records_processed > 0 AND operation_status = 'running' THEN
        CURRENT_TIMESTAMP + 
        (INTERVAL '1 second' * 
         ((estimated_total_records - records_processed) / 
          (records_processed::numeric / EXTRACT(EPOCH FROM CURRENT_TIMESTAMP - started_at))))
      ELSE NULL
    END as estimated_completion_time

  FROM lifecycle_operation_status
  WHERE operation_status IN ('running', 'paused', 'starting')
),

-- Historical performance analysis
archiving_performance_trends AS (
  SELECT 
    DATE_TRUNC('day', execution_timestamp) as execution_date,
    collection_name,

    -- Daily aggregated metrics
    COUNT(*) as operations_executed,
    SUM(records_processed) as total_records_archived,
    AVG(execution_duration_seconds) as avg_execution_time,
    AVG(records_processed::numeric / execution_duration_seconds) as avg_throughput,

    -- Success rate tracking
    COUNT(*) FILTER (WHERE status = 'success') as successful_operations,
    ROUND(
      (COUNT(*) FILTER (WHERE status = 'success')::numeric / COUNT(*)) * 100, 1
    ) as success_rate_percent,

    -- Resource efficiency metrics
    AVG(archive_size_mb::numeric / execution_duration_seconds) as avg_mb_per_second,
    AVG(memory_peak_usage_mb) as avg_peak_memory_usage,

    -- Trend indicators
    LAG(SUM(records_processed)) OVER (
      PARTITION BY collection_name 
      ORDER BY DATE_TRUNC('day', execution_timestamp)
    ) as previous_day_records,

    LAG(AVG(records_processed::numeric / execution_duration_seconds)) OVER (
      PARTITION BY collection_name
      ORDER BY DATE_TRUNC('day', execution_timestamp)  
    ) as previous_day_throughput

  FROM lifecycle_execution_audit
  WHERE execution_timestamp >= CURRENT_DATE - INTERVAL '30 days'
    AND operation_type = 'archive'
  GROUP BY DATE_TRUNC('day', execution_timestamp), collection_name
),

-- Data retention compliance dashboard
retention_compliance_status AS (
  SELECT 
    drp.collection_name,
    drp.policy_id,
    drp.policy_enabled,

    -- Current data status
    (SELECT COUNT(*) 
     FROM dynamic_collection_query(drp.collection_name)) as active_record_count,

    -- Retention phase analysis
    CASE 
      WHEN drp.active_retention_days IS NOT NULL THEN
        (SELECT COUNT(*) 
         FROM dynamic_collection_query(drp.collection_name)
         WHERE date_field_value < CURRENT_DATE - (drp.active_retention_days || ' days')::INTERVAL)
      ELSE 0
    END as records_past_active_retention,

    CASE 
      WHEN drp.archive_after_days IS NOT NULL THEN
        (SELECT COUNT(*) 
         FROM dynamic_collection_query(drp.collection_name)
         WHERE date_field_value < CURRENT_DATE - (drp.archive_after_days || ' days')::INTERVAL)
      ELSE 0
    END as records_ready_for_archive,

    CASE 
      WHEN drp.delete_after_days IS NOT NULL THEN
        (SELECT COUNT(*) 
         FROM dynamic_collection_query(drp.collection_name)
         WHERE date_field_value < CURRENT_DATE - (drp.delete_after_days || ' days')::INTERVAL)
      ELSE 0
    END as records_past_deletion_date,

    -- Compliance indicators
    CASE 
      WHEN records_past_deletion_date > 0 THEN 'non_compliant_immediate_attention'
      WHEN records_ready_for_archive > 10000 THEN 'compliance_risk_action_needed'
      WHEN records_past_active_retention > active_record_count * 0.3 THEN 'optimization_opportunity'
      ELSE 'compliant'
    END as compliance_status,

    -- Archive statistics
    (SELECT COUNT(*) FROM archive_metadata WHERE source_collection = drp.collection_name) as total_archives_created,
    (SELECT SUM(record_count) FROM archive_metadata WHERE source_collection = drp.collection_name) as total_records_archived,
    (SELECT MAX(archive_date) FROM archive_metadata WHERE source_collection = drp.collection_name) as last_archive_date,

    -- Next scheduled execution
    drp.next_execution_scheduled,
    ROUND(EXTRACT(EPOCH FROM drp.next_execution_scheduled - CURRENT_TIMESTAMP) / 3600, 1) as hours_until_next_execution

  FROM data_retention_policies drp
  WHERE drp.policy_enabled = true
)

SELECT 
  -- Current operations status
  'ACTIVE_OPERATIONS' as section,
  JSON_AGG(
    JSON_BUILD_OBJECT(
      'operation_id', cao.operation_id,
      'collection', cao.collection_name,
      'progress', cao.progress_percentage || '%',
      'throughput', cao.current_throughput || ' rec/sec',
      'eta', cao.estimated_completion_time,
      'status', cao.operation_status
    )
  ) as current_operations

FROM current_archiving_operations cao
WHERE cao.operation_status = 'running'

UNION ALL

SELECT 
  -- Performance trends
  'PERFORMANCE_TRENDS' as section,
  JSON_AGG(
    JSON_BUILD_OBJECT(
      'date', apt.execution_date,
      'collection', apt.collection_name,
      'records_archived', apt.total_records_archived,
      'avg_throughput', apt.avg_throughput || ' rec/sec',
      'success_rate', apt.success_rate_percent || '%',
      'trend', CASE 
        WHEN apt.avg_throughput > apt.previous_day_throughput * 1.1 THEN 'improving'
        WHEN apt.avg_throughput < apt.previous_day_throughput * 0.9 THEN 'declining'
        ELSE 'stable'
      END
    )
  ) as performance_data

FROM archiving_performance_trends apt
WHERE apt.execution_date >= CURRENT_DATE - INTERVAL '7 days'

UNION ALL

SELECT 
  -- Compliance status
  'COMPLIANCE_STATUS' as section,
  JSON_AGG(
    JSON_BUILD_OBJECT(
      'collection', rcs.collection_name,
      'compliance_status', rcs.compliance_status,
      'active_records', rcs.active_record_count,
      'ready_for_archive', rcs.records_ready_for_archive,
      'past_deletion_date', rcs.records_past_deletion_date,
      'last_archive', rcs.last_archive_date,
      'next_execution', rcs.hours_until_next_execution || ' hours',
      'total_archived', rcs.total_records_archived
    )
  ) as compliance_data

FROM retention_compliance_status rcs;

-- Advanced archive data retrieval with query optimization
WITH archive_query_optimization AS (
  SELECT 
    archive_id,
    source_collection,
    archive_date,
    record_count,
    archive_size_mb,
    storage_type,
    archive_location,

    -- Query complexity assessment
    CASE 
      WHEN record_count > 1000000 THEN 'complex_query_optimization_required'
      WHEN record_count > 100000 THEN 'standard_optimization_recommended'  
      ELSE 'direct_query_suitable'
    END as query_complexity,

    -- Storage access strategy
    CASE storage_type
      WHEN 'mongodb' THEN 'direct_collection_access'
      WHEN 'gridfs' THEN 'streaming_batch_retrieval'
      WHEN 's3' THEN 'cloud_storage_download_and_parse'
      ELSE 'custom_retrieval_strategy'
    END as retrieval_strategy

  FROM archive_metadata
  WHERE source_collection = 'orders'
    AND archive_date >= CURRENT_DATE - INTERVAL '1 year'
)

-- Execute optimized archive data retrieval
SELECT 
  RETRIEVE_ARCHIVED_DATA(
    archive_id => aqo.archive_id,

    -- Query parameters
    query_filter => JSON_BUILD_OBJECT(
      'customer_id', '507f1f77bcf86cd799439011',
      'total_amount', JSON_BUILD_OBJECT('$gte', 100),
      'order_date', JSON_BUILD_OBJECT(
        '$gte', '2023-01-01',
        '$lte', '2023-12-31'
      )
    ),

    -- Retrieval optimization
    retrieval_options => JSON_BUILD_OBJECT(
      'batch_size', CASE 
        WHEN aqo.query_complexity = 'complex_query_optimization_required' THEN 100
        WHEN aqo.query_complexity = 'standard_optimization_recommended' THEN 500
        ELSE 1000
      END,
      'parallel_processing', aqo.query_complexity != 'direct_query_suitable',
      'result_streaming', aqo.record_count > 10000,
      'compression_handling', 'automatic'
    ),

    -- Performance settings
    performance_limits => JSON_BUILD_OBJECT(
      'max_execution_time_seconds', 300,
      'memory_limit_mb', 256,
      'max_results', 10000
    )
  ) as retrieval_results

FROM archive_query_optimization aqo
WHERE aqo.archive_id IN (
  SELECT archive_id 
  FROM archive_metadata 
  WHERE source_collection = 'orders'
  ORDER BY archive_date DESC 
  LIMIT 5
);

-- QueryLeaf data lifecycle management features:
-- 1. SQL-familiar syntax for MongoDB data retention policy definition
-- 2. Automated archiving execution with batch processing and performance optimization
-- 3. Comprehensive compliance management including GDPR, legal holds, and audit trails
-- 4. Real-time monitoring dashboard for archiving operations and performance metrics
-- 5. Advanced archive data retrieval with query optimization and result streaming
-- 6. Intelligent data lifecycle automation with predictive analysis capabilities
-- 7. Multi-tier storage integration supporting MongoDB, GridFS, S3, and custom storage
-- 8. Performance-aware processing with resource throttling and system impact minimization
-- 9. Enterprise compliance workflows with automated reporting and alert generation
-- 10. Cost optimization strategies with intelligent storage tiering and compression

Best Practices for MongoDB Data Lifecycle Management

Archiving Strategy Design

Essential principles for effective MongoDB data archiving and lifecycle management:

  1. Policy-Driven Approach: Define comprehensive retention policies based on business requirements, regulatory compliance, and performance optimization goals (a minimal policy and TTL-index sketch follows this list)
  2. Performance Optimization: Implement batch processing, indexing strategies, and resource throttling to minimize impact on production systems
  3. Compliance Integration: Build automated compliance workflows that address regulatory requirements like GDPR, HIPAA, and industry-specific standards
  4. Storage Optimization: Utilize multi-tier storage strategies with compression, encryption, and geographic distribution for cost and performance optimization
  5. Monitoring and Alerting: Deploy comprehensive monitoring systems that track archiving performance, compliance status, and operational health
  6. Recovery Planning: Design archive retrieval processes that support both routine access and emergency recovery scenarios
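
The sketch below shows what a policy-driven setup can look like in practice: a policy document that archiving jobs and auditors can read, plus a native TTL index for the simplest "delete after N days" tier. The data_retention_policies and events collection names, field names, and retention periods are illustrative assumptions, not MongoDB or QueryLeaf built-ins; only the TTL index itself is a native MongoDB feature.

// Minimal sketch: record a retention policy document and enforce a simple
// expiry tier with a native TTL index. Collection and field names below are
// illustrative assumptions.
const { MongoClient } = require('mongodb');

async function definePolicyAndTtl() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('lifecycle_demo');

  // 1. Store the policy so archiving jobs and auditors can read it
  await db.collection('data_retention_policies').updateOne(
    { collectionName: 'events' },
    {
      $set: {
        policyEnabled: true,
        activeRetentionDays: 90,    // keep hot for 90 days
        archiveAfterDays: 365,      // move to cheaper storage after 1 year
        deleteAfterDays: 2555,      // hard delete after ~7 years
        dateField: 'createdAt',
        policyVersion: '1.0'
      }
    },
    { upsert: true }
  );

  // 2. For the simplest "delete after N days" tier, a native TTL index suffices:
  //    MongoDB removes documents once createdAt is older than expireAfterSeconds
  await db.collection('events').createIndex(
    { createdAt: 1 },
    { expireAfterSeconds: 90 * 24 * 60 * 60 }
  );

  await client.close();
}

definePolicyAndTtl().catch(console.error);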

Production Deployment Strategies

Optimize MongoDB data lifecycle management for enterprise-scale requirements:

  1. Automated Execution: Implement scheduled archiving processes with intelligent failure recovery and retry mechanisms (see the batch-archiving sketch after this list)
  2. Resource Management: Configure memory limits, CPU throttling, and I/O optimization to prevent system impact during archiving operations
  3. Compliance Automation: Deploy automated compliance reporting, audit trail generation, and regulatory requirement enforcement
  4. Cost Optimization: Implement intelligent storage tiering that automatically moves data to appropriate storage classes based on access patterns
  5. Performance Monitoring: Monitor archiving throughput, resource utilization, and system performance to optimize operations
  6. Security Integration: Ensure data encryption, access controls, and audit logging meet enterprise security requirements
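
As a rough illustration of what such an automated pass can look like, here is a sketch of a scheduled archiving job that copies old documents to an archive collection in small batches and throttles between batches. The orders and orders_archive collection names, cutoff, batch size, and delay are illustrative assumptions rather than a prescribed implementation.

// Sketch of one archiving pass: copy documents older than the cutoff into an
// archive collection in small batches, delete each copied batch, and pause
// between batches to limit impact on the production workload.
const { MongoClient } = require('mongodb');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runArchivingPass({ batchSize = 1000, throttleMs = 50 } = {}) {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('lifecycle_demo');
  const cutoff = new Date(Date.now() - 2 * 365 * 24 * 60 * 60 * 1000); // ~2 years ago

  let archivedTotal = 0;
  try {
    while (true) {
      const batch = await db.collection('orders')
        .find({ orderDate: { $lt: cutoff }, legalHold: { $ne: true } })
        .limit(batchSize)
        .toArray();
      if (batch.length === 0) break;

      // A production job would also make this idempotent (e.g. tolerate
      // duplicate _id errors when a failed pass is retried)
      await db.collection('orders_archive').insertMany(batch);
      await db.collection('orders').deleteMany({
        _id: { $in: batch.map((doc) => doc._id) }
      });

      archivedTotal += batch.length;
      await sleep(throttleMs); // throttle between batches
    }
  } finally {
    await client.close();
  }
  return archivedTotal;
}

// Typically triggered by cron or a scheduler (e.g. weekly, off-peak)
runArchivingPass().then((n) => console.log(`Archived ${n} documents`)).catch(console.error);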

Conclusion

MongoDB data lifecycle management provides comprehensive capabilities for automated data archiving, compliance enforcement, and performance optimization that scale from simple retention policies to enterprise-wide governance programs. The flexible document-based architecture and built-in lifecycle features enable sophisticated archiving strategies that adapt to changing business requirements while maintaining operational efficiency.

Key MongoDB Data Lifecycle Management benefits include:

  • Automated Governance: Policy-driven data lifecycle management with minimal manual intervention and maximum compliance assurance
  • Performance Optimization: Intelligent archiving processes that maintain production system performance while managing large-scale data movement
  • Compliance Excellence: Built-in support for regulatory requirements including GDPR, industry standards, and legal hold management
  • Cost Efficiency: Multi-tier storage strategies with automated optimization that reduce storage costs while maintaining data accessibility
  • Operational Simplicity: Streamlined management processes that reduce administrative overhead while ensuring data governance
  • Scalable Architecture: Enterprise-ready capabilities that support growing data volumes and evolving compliance requirements

Whether you're building regulatory compliance systems, optimizing database performance, managing storage costs, or implementing enterprise data governance, MongoDB's data lifecycle management capabilities with QueryLeaf's familiar SQL interface provide the foundation for comprehensive, automated data archiving at scale.

QueryLeaf Integration: QueryLeaf automatically translates SQL-style data lifecycle management commands into optimized MongoDB operations, providing familiar retention policy syntax, archiving execution commands, and compliance reporting queries. Advanced lifecycle management patterns, performance optimization, and regulatory compliance workflows are seamlessly accessible through familiar SQL constructs, making sophisticated data governance both powerful and approachable for SQL-oriented operations teams.

The combination of MongoDB's flexible data lifecycle capabilities with SQL-style governance operations makes it an ideal platform for modern data management applications that require both comprehensive archiving functionality and operational simplicity. With that foundation in place, data governance programs can scale efficiently while meeting evolving regulatory and business requirements.

MongoDB GridFS File Storage Management: Advanced Strategies for Large File Handling, Streaming, and Content Distribution with SQL-Style File Operations

Modern applications require sophisticated file storage solutions that can handle large media files, document repositories, streaming content, and complex file management workflows while maintaining high performance, scalability, and reliability across distributed systems. Traditional file storage approaches often struggle with large file limitations, metadata management complexity, and the challenges of integrating file operations with database transactions, leading to performance bottlenecks, storage inefficiencies, and operational complexity in production environments.

MongoDB GridFS provides comprehensive large file storage through intelligent file chunking, rich metadata management, and tight integration with MongoDB's document database features. Applications can store, retrieve, and stream files far beyond the 16 MB BSON document limit while inheriting MongoDB's replication, consistency guarantees, and operational tooling. Unlike traditional file systems that impose size limitations and separate file metadata from database operations, GridFS integrates file storage directly with MongoDB's query engine, indexing capabilities, and replication features. (A minimal sketch of the underlying collections follows.)
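
To make the chunking model concrete, the short sketch below uploads a small payload and then reads the two collections GridFS maintains per bucket (the default bucket is named fs, with 255 KB chunks unless overridden); the database name and payload are illustrative.

// Minimal sketch: upload a small buffer, then inspect the fs.files metadata
// document and one fs.chunks document that GridFS created for it.
const { MongoClient, GridFSBucket } = require('mongodb');
const { Readable } = require('stream');

async function inspectGridFsCollections() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('gridfs_demo');
  const bucket = new GridFSBucket(db); // default bucket "fs", 255 KB chunks

  // Upload a small payload; metadata is free-form application data
  await new Promise((resolve, reject) => {
    Readable.from([Buffer.from('hello gridfs')])
      .pipe(bucket.openUploadStream('hello.txt', { metadata: { category: 'demo' } }))
      .on('finish', resolve)
      .on('error', reject);
  });

  // fs.files: one metadata document per file (length, chunkSize, uploadDate, filename, metadata)
  console.log(await db.collection('fs.files').findOne({ filename: 'hello.txt' }));

  // fs.chunks: the file body as ordered binary chunks ({ files_id, n, data })
  console.log(await db.collection('fs.chunks').findOne({}, { projection: { data: 0 } }));

  await client.close();
}

inspectGridFsCollections().catch(console.error);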

The Traditional File Storage Challenge

Conventional approaches to large file storage in enterprise applications face significant limitations in scalability and integration:

-- Traditional PostgreSQL file storage - limited and fragmented approach

-- Basic file metadata table (limited capabilities)
CREATE TABLE file_metadata (
    file_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    filename VARCHAR(255) NOT NULL,
    file_path VARCHAR(500) NOT NULL,
    file_size BIGINT NOT NULL,

    -- Basic file information
    mime_type VARCHAR(100),
    file_extension VARCHAR(10),
    original_filename VARCHAR(255),

    -- Simple metadata (limited structure)
    file_description TEXT,
    file_category VARCHAR(50),
    tags TEXT[], -- Basic array support

    -- Upload information
    uploaded_by UUID,
    uploaded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Storage information (filesystem dependent)
    storage_location VARCHAR(100) DEFAULT 'local', -- local, s3, azure, gcs
    storage_path VARCHAR(500),
    storage_bucket VARCHAR(100),

    -- Basic versioning (very limited)
    version_number INTEGER DEFAULT 1,
    is_current_version BOOLEAN DEFAULT TRUE,
    parent_file_id UUID REFERENCES file_metadata(file_id),

    -- Simple access control
    is_public BOOLEAN DEFAULT FALSE,
    access_permissions JSONB,

    -- Basic status tracking
    processing_status VARCHAR(20) DEFAULT 'uploaded', -- uploaded, processing, ready, error

    -- Audit fields
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- File chunks table for large file handling (manual implementation)
CREATE TABLE file_chunks (
    chunk_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    file_id UUID NOT NULL REFERENCES file_metadata(file_id) ON DELETE CASCADE,
    chunk_number INTEGER NOT NULL,
    chunk_size INTEGER NOT NULL,

    -- Chunk data (limited by database constraints)
    chunk_data BYTEA, -- Limited to ~1GB in PostgreSQL

    -- Chunk integrity
    chunk_checksum VARCHAR(64), -- MD5 or SHA256 hash

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    UNIQUE (file_id, chunk_number)
);

-- File access log (basic tracking)
CREATE TABLE file_access_log (
    access_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    file_id UUID NOT NULL REFERENCES file_metadata(file_id),

    -- Access information
    accessed_by UUID,
    access_type VARCHAR(20), -- read, write, delete, stream
    access_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Request details
    client_ip INET,
    user_agent TEXT,
    request_method VARCHAR(10),

    -- Response information
    bytes_transferred BIGINT,
    response_status INTEGER,
    response_time_ms INTEGER,

    -- Streaming information (limited)
    stream_start_position BIGINT DEFAULT 0,
    stream_end_position BIGINT,

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Complex query to manage file operations (expensive and limited)
WITH file_statistics AS (
    SELECT 
        fm.file_id,
        fm.filename,
        fm.file_size,
        fm.mime_type,
        fm.storage_location,
        fm.processing_status,

        -- Calculate chunk information (expensive operation)
        COUNT(fc.chunk_id) as total_chunks,
        SUM(fc.chunk_size) as total_chunk_size,

        -- Basic integrity check
        CASE 
            WHEN fm.file_size = SUM(fc.chunk_size) THEN 'intact'
            WHEN SUM(fc.chunk_size) IS NULL THEN 'no_chunks'
            ELSE 'corrupted'
        END as file_integrity,

        -- Recent access statistics (limited analysis)
        COUNT(CASE WHEN fal.access_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours' 
                   THEN 1 END) as daily_access_count,
        COUNT(CASE WHEN fal.access_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days' 
                   THEN 1 END) as weekly_access_count,

        -- Data transfer statistics
        SUM(CASE WHEN fal.access_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours' 
                 THEN fal.bytes_transferred ELSE 0 END) as daily_bytes_transferred,

        -- Performance metrics
        AVG(CASE WHEN fal.access_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours' 
                 THEN fal.response_time_ms END) as avg_response_time_ms

    FROM file_metadata fm
    LEFT JOIN file_chunks fc ON fm.file_id = fc.file_id
    LEFT JOIN file_access_log fal ON fm.file_id = fal.file_id
    WHERE fm.is_current_version = TRUE
    GROUP BY fm.file_id, fm.filename, fm.file_size, fm.mime_type, 
             fm.storage_location, fm.processing_status
),

storage_analysis AS (
    SELECT 
        storage_location,
        COUNT(*) as file_count,
        SUM(file_size) as total_storage_bytes,
        AVG(file_size) as avg_file_size,

        -- Storage health indicators
        COUNT(CASE WHEN file_integrity = 'corrupted' THEN 1 END) as corrupted_files,
        COUNT(CASE WHEN processing_status = 'error' THEN 1 END) as error_files,

        -- Access patterns
        AVG(daily_access_count) as avg_daily_access,
        SUM(daily_bytes_transferred) as total_daily_transfer,

        -- Performance indicators
        AVG(avg_response_time_ms) as avg_response_time

    FROM file_statistics
    GROUP BY storage_location
)

SELECT 
    fs.filename,
    fs.file_size,
    fs.mime_type,
    fs.storage_location,
    fs.total_chunks,
    fs.file_integrity,
    fs.processing_status,

    -- Access metrics
    fs.daily_access_count,
    fs.weekly_access_count,
    fs.avg_response_time_ms,

    -- Data transfer
    ROUND(fs.daily_bytes_transferred / 1024.0 / 1024.0, 2) as daily_mb_transferred,

    -- Storage efficiency (limited calculation)
    ROUND((fs.total_chunk_size::DECIMAL / fs.file_size) * 100, 2) as storage_efficiency_percent,

    -- Health indicators
    CASE 
        WHEN fs.file_integrity = 'corrupted' THEN 'Critical - File Corrupted'
        WHEN fs.processing_status = 'error' THEN 'Error - Processing Failed'
        WHEN fs.avg_response_time_ms > 5000 THEN 'Warning - Slow Response'
        WHEN fs.daily_access_count > 1000 THEN 'High Usage'
        ELSE 'Normal'
    END as file_status

FROM file_statistics fs
ORDER BY fs.daily_access_count DESC, fs.file_size DESC
LIMIT 100;

-- Problems with traditional file storage approach:
-- 1. Database size limitations prevent storing large files
-- 2. Manual chunking implementation is complex and error-prone
-- 3. Limited integration between file operations and database transactions
-- 4. Poor performance for streaming and partial file access
-- 5. Complex metadata management across multiple tables
-- 6. Limited support for file versioning and content management
-- 7. Expensive joins required for file operations
-- 8. No built-in support for distributed file storage
-- 9. Manual implementation of file integrity and consistency checks
-- 10. Limited indexing and query capabilities for file metadata

MongoDB GridFS eliminates these limitations with intelligent file management:

// MongoDB GridFS - comprehensive large file storage and management
const { MongoClient, GridFSBucket } = require('mongodb');
const crypto = require('crypto');
const fs = require('fs');
const path = require('path'); // needed for path.basename() in uploadFile()

// Advanced GridFS file management system
class MongoGridFSManager {
  constructor(client, databaseName, bucketName = 'files') {
    this.client = client;
    this.db = client.db(databaseName);
    this.bucketName = bucketName;
    this.chunkSizeBytes = 1024 * 1024; // 1MB chunks balance throughput and per-chunk overhead
    this.bucket = new GridFSBucket(this.db, { 
      bucketName: this.bucketName,
      chunkSizeBytes: this.chunkSizeBytes
    });

    this.fileMetrics = {
      totalUploads: 0,
      totalDownloads: 0,
      totalStreams: 0,
      bytesUploaded: 0,
      bytesDownloaded: 0,
      averageUploadTime: 0,
      averageDownloadTime: 0,
      errorCount: 0
    };
  }

  // Upload large files with comprehensive metadata and progress tracking
  async uploadFile(filePath, options = {}) {
    const startTime = Date.now();

    try {
      // Generate comprehensive file metadata
      const fileStats = fs.statSync(filePath);
      const filename = options.filename || path.basename(filePath);

      // Create file hash for integrity checking
      const fileHash = await this.generateFileHash(filePath);

      // Comprehensive metadata for advanced file management
      const metadata = {
        // Basic file information
        originalName: filename,
        uploadedAt: new Date(),
        fileSize: fileStats.size,
        mimeType: options.mimeType || this.detectMimeType(filename),

        // File integrity and versioning
        md5Hash: fileHash.md5,
        sha256Hash: fileHash.sha256,
        version: options.version || 1,
        parentFileId: options.parentFileId || null,

        // Content management
        description: options.description || '',
        category: options.category || 'general',
        tags: options.tags || [],

        // Access control and permissions
        uploadedBy: options.uploadedBy || 'system',
        isPublic: options.isPublic || false,
        accessPermissions: options.accessPermissions || { read: ['authenticated'] },

        // Processing and workflow
        processingStatus: 'uploaded',
        processingMetadata: {},

        // Content-specific metadata
        contentMetadata: options.contentMetadata || {},

        // Storage and performance optimization
        compressionType: options.compression || 'none',
        encryptionStatus: options.encrypted || false,
        storageClass: options.storageClass || 'standard', // standard, archival, frequent_access

        // Business context
        projectId: options.projectId,
        customFields: options.customFields || {},

        // Audit and compliance
        retentionPolicy: options.retentionPolicy || 'standard',
        complianceFlags: options.complianceFlags || [],

        // Performance tracking
        uploadDuration: null, // Will be set after upload completes
        lastAccessedAt: new Date(),
        accessCount: 0,
        totalBytesTransferred: 0,

        // Allow callers (e.g. createFileVersion) to pass a pre-built metadata object to merge
        ...(options.metadata || {})
      };

      return new Promise((resolve, reject) => {
        // Create upload stream with progress tracking
        const uploadStream = this.bucket.openUploadStream(filename, {
          metadata: metadata,
          chunkSizeBytes: options.chunkSize || (1024 * 1024) // 1MB default chunks
        });

        // Progress tracking variables
        let bytesUploaded = 0;
        const totalBytes = fileStats.size;

        // Create read stream from file
        const readStream = fs.createReadStream(filePath);

        // Progress tracking
        readStream.on('data', (chunk) => {
          bytesUploaded += chunk.length;

          if (options.onProgress) {
            const progress = {
              bytesUploaded: bytesUploaded,
              totalBytes: totalBytes,
              percentage: (bytesUploaded / totalBytes) * 100,
              remainingBytes: totalBytes - bytesUploaded,
              elapsedTime: Date.now() - startTime
            };
            options.onProgress(progress);
          }
        });

        // Handle upload completion
        uploadStream.on('finish', async () => {
          const uploadDuration = Date.now() - startTime;

          // Update file metadata with final upload information
          await this.db.collection(`${this.bucketName}.files`).updateOne(
            { _id: uploadStream.id },
            { 
              $set: { 
                'metadata.uploadDuration': uploadDuration,
                'metadata.uploadCompletedAt': new Date()
              }
            }
          );

          // Update metrics
          this.fileMetrics.totalUploads++;
          this.fileMetrics.bytesUploaded += totalBytes;
          // Incremental running average across all uploads
          this.fileMetrics.averageUploadTime +=
            (uploadDuration - this.fileMetrics.averageUploadTime) / this.fileMetrics.totalUploads;

          console.log(`File uploaded successfully: ${filename} (${totalBytes} bytes, ${uploadDuration}ms)`);

          resolve({
            fileId: uploadStream.id,
            filename: filename,
            size: totalBytes,
            uploadDuration: uploadDuration,
            metadata: metadata,
            chunksCount: Math.ceil(totalBytes / (options.chunkSize || (1024 * 1024)))
          });
        });

        // Handle upload errors
        uploadStream.on('error', (error) => {
          this.fileMetrics.errorCount++;
          console.error('Upload error:', error);
          reject(error);
        });

        // Start the upload
        readStream.pipe(uploadStream);
      });

    } catch (error) {
      this.fileMetrics.errorCount++;
      console.error('File upload error:', error);
      throw error;
    }
  }

  // Advanced file streaming with range support and performance optimization
  async streamFile(fileId, options = {}) {
    const startTime = Date.now();

    try {
      // Get file information for streaming optimization
      const fileInfo = await this.getFileInfo(fileId);
      if (!fileInfo) {
        throw new Error('File not found');
      }

      // Update access metrics
      await this.updateAccessMetrics(fileId);

      // Create download stream with optional range support
      const downloadOptions = {};

      // Support for HTTP range requests (partial content)
      if (options.start !== undefined || options.end !== undefined) {
        downloadOptions.start = options.start || 0;
        // Note: the driver treats `end` as exclusive (streaming stops before this byte offset)
        downloadOptions.end = options.end !== undefined ? options.end : fileInfo.length;

        console.log(`Streaming file range: ${downloadOptions.start}-${downloadOptions.end}/${fileInfo.length}`);
      }

      const downloadStream = this.bucket.openDownloadStream(fileId, downloadOptions);

      // Track streaming metrics
      let bytesStreamed = 0;

      downloadStream.on('data', (chunk) => {
        bytesStreamed += chunk.length;

        if (options.onProgress) {
          const progress = {
            bytesStreamed: bytesStreamed,
            totalBytes: fileInfo.length,
            percentage: (bytesStreamed / fileInfo.length) * 100,
            elapsedTime: Date.now() - startTime
          };
          options.onProgress(progress);
        }
      });

      downloadStream.on('end', () => {
        const streamDuration = Date.now() - startTime;

        // Update metrics
        this.fileMetrics.totalStreams++;
        this.fileMetrics.bytesDownloaded += bytesStreamed;
        // Incremental running average across all streams
        this.fileMetrics.averageDownloadTime +=
          (streamDuration - this.fileMetrics.averageDownloadTime) / this.fileMetrics.totalStreams;

        console.log(`File streamed: ${fileInfo.filename} (${bytesStreamed} bytes, ${streamDuration}ms)`);
      });

      downloadStream.on('error', (error) => {
        this.fileMetrics.errorCount++;
        console.error('Streaming error:', error);
      });

      return downloadStream;

    } catch (error) {
      this.fileMetrics.errorCount++;
      console.error('File streaming error:', error);
      throw error;
    }
  }

  // Comprehensive file search and metadata querying
  async searchFiles(query = {}, options = {}) {
    try {
      const searchCriteria = {};

      // Build comprehensive search query
      if (query.filename) {
        searchCriteria.filename = new RegExp(query.filename, 'i');
      }

      if (query.mimeType) {
        searchCriteria['metadata.mimeType'] = query.mimeType;
      }

      if (query.category) {
        searchCriteria['metadata.category'] = query.category;
      }

      if (query.tags && query.tags.length > 0) {
        searchCriteria['metadata.tags'] = { $in: query.tags };
      }

      if (query.uploadedBy) {
        searchCriteria['metadata.uploadedBy'] = query.uploadedBy;
      }

      if (query.dateRange) {
        searchCriteria.uploadDate = {};
        if (query.dateRange.from) {
          searchCriteria.uploadDate.$gte = new Date(query.dateRange.from);
        }
        if (query.dateRange.to) {
          searchCriteria.uploadDate.$lte = new Date(query.dateRange.to);
        }
      }

      if (query.sizeRange) {
        searchCriteria.length = {};
        if (query.sizeRange.min) {
          searchCriteria.length.$gte = query.sizeRange.min;
        }
        if (query.sizeRange.max) {
          searchCriteria.length.$lte = query.sizeRange.max;
        }
      }

      if (query.isPublic !== undefined) {
        searchCriteria['metadata.isPublic'] = query.isPublic;
      }

      // Full-text search in description and custom fields
      if (query.textSearch) {
        searchCriteria.$or = [
          { 'metadata.description': new RegExp(query.textSearch, 'i') },
          { 'metadata.customFields': new RegExp(query.textSearch, 'i') }
        ];
      }

      // Execute search with aggregation pipeline for advanced features
      const pipeline = [
        { $match: searchCriteria },

        // Add computed fields for enhanced results
        {
          $addFields: {
            fileSizeMB: { $divide: ['$length', 1024 * 1024] },
            uploadAge: { 
              $divide: [
                { $subtract: [new Date(), '$uploadDate'] },
                1000 * 60 * 60 * 24 // Convert to days
              ]
            }
          }
        },

        // Sort by relevance and recency
        {
          $sort: options.sortBy === 'size' ? { length: -1 } :
                 options.sortBy === 'name' ? { filename: 1 } :
                 { uploadDate: -1 } // Default: newest first
        },

        // Pagination
        { $skip: options.skip || 0 },
        { $limit: options.limit || 50 },

        // Project only needed fields for performance
        {
          $project: {
            _id: 1,
            filename: 1,
            length: 1,
            fileSizeMB: 1,
            uploadDate: 1,
            uploadAge: 1,
            md5: 1,
            'metadata.mimeType': 1,
            'metadata.category': 1,
            'metadata.tags': 1,
            'metadata.description': 1,
            'metadata.uploadedBy': 1,
            'metadata.isPublic': 1,
            'metadata.accessCount': 1,
            'metadata.lastAccessedAt': 1,
            'metadata.processingStatus': 1
          }
        }
      ];

      const files = await this.db.collection(`${this.bucketName}.files`)
        .aggregate(pipeline)
        .toArray();

      // Get total count for pagination
      const totalCount = await this.db.collection(`${this.bucketName}.files`)
        .countDocuments(searchCriteria);

      return {
        files: files,
        totalCount: totalCount,
        hasMore: (options.skip || 0) + files.length < totalCount,
        searchCriteria: searchCriteria,
        executedAt: new Date()
      };

    } catch (error) {
      console.error('File search error:', error);
      throw error;
    }
  }

  // File versioning and content management
  async createFileVersion(originalFileId, newFilePath, versionOptions = {}) {
    try {
      // Get original file information
      const originalFile = await this.getFileInfo(originalFileId);
      if (!originalFile) {
        throw new Error('Original file not found');
      }

      // Create new version with inherited metadata
      const versionMetadata = {
        ...originalFile.metadata,
        version: (originalFile.metadata.version || 1) + 1,
        parentFileId: originalFileId,
        versionDescription: versionOptions.description || '',
        versionCreatedAt: new Date(),
        versionCreatedBy: versionOptions.createdBy || 'system',
        changeLog: versionOptions.changeLog || []
      };

      // Upload new version
      const uploadResult = await this.uploadFile(newFilePath, {
        filename: originalFile.filename,
        metadata: versionMetadata,
        ...versionOptions
      });

      // Update version tracking
      await this.updateVersionHistory(originalFileId, uploadResult.fileId);

      return {
        newVersionId: uploadResult.fileId,
        versionNumber: versionMetadata.version,
        originalFileId: originalFileId,
        uploadResult: uploadResult
      };

    } catch (error) {
      console.error('File versioning error:', error);
      throw error;
    }
  }

  // Advanced file analytics and reporting
  async getFileAnalytics(timeRange = '30d') {
    try {
      const now = new Date();
      const timeRanges = {
        '1d': 1,
        '7d': 7,
        '30d': 30,
        '90d': 90,
        '365d': 365
      };

      const days = timeRanges[timeRange] || 30;
      const startDate = new Date(now.getTime() - (days * 24 * 60 * 60 * 1000));

      // Comprehensive analytics aggregation
      const analyticsResults = await Promise.all([

        // Storage analytics
        this.db.collection(`${this.bucketName}.files`).aggregate([
          {
            $group: {
              _id: null,
              totalFiles: { $sum: 1 },
              totalStorageBytes: { $sum: '$length' },
              averageFileSize: { $avg: '$length' },
              largestFile: { $max: '$length' },
              smallestFile: { $min: '$length' }
            }
          }
        ]).toArray(),

        // Upload trends
        this.db.collection(`${this.bucketName}.files`).aggregate([
          {
            $match: {
              uploadDate: { $gte: startDate }
            }
          },
          {
            $group: {
              _id: {
                year: { $year: '$uploadDate' },
                month: { $month: '$uploadDate' },
                day: { $dayOfMonth: '$uploadDate' }
              },
              dailyUploads: { $sum: 1 },
              dailyStorageAdded: { $sum: '$length' }
            }
          },
          {
            $sort: { '_id.year': 1, '_id.month': 1, '_id.day': 1 }
          }
        ]).toArray(),

        // File type distribution
        this.db.collection(`${this.bucketName}.files`).aggregate([
          {
            $group: {
              _id: '$metadata.mimeType',
              fileCount: { $sum: 1 },
              totalSize: { $sum: '$length' },
              averageSize: { $avg: '$length' }
            }
          },
          {
            $sort: { fileCount: -1 }
          },
          {
            $limit: 20
          }
        ]).toArray(),

        // Category analysis
        this.db.collection(`${this.bucketName}.files`).aggregate([
          {
            $group: {
              _id: '$metadata.category',
              fileCount: { $sum: 1 },
              totalSize: { $sum: '$length' }
            }
          },
          {
            $sort: { fileCount: -1 }
          }
        ]).toArray(),

        // Access patterns
        this.db.collection(`${this.bucketName}.files`).aggregate([
          {
            $match: {
              'metadata.lastAccessedAt': { $gte: startDate }
            }
          },
          {
            $group: {
              _id: null,
              averageAccessCount: { $avg: '$metadata.accessCount' },
              totalBytesTransferred: { $sum: '$metadata.totalBytesTransferred' },
              mostAccessedFiles: { $push: {
                filename: '$filename',
                accessCount: '$metadata.accessCount'
              }}
            }
          }
        ]).toArray()
      ]);

      // Compile comprehensive analytics report
      const [storageStats, uploadTrends, fileTypeStats, categoryStats, accessStats] = analyticsResults;

      const analytics = {
        reportGeneratedAt: new Date(),
        timeRange: timeRange,

        // Storage overview
        storage: storageStats[0] || {
          totalFiles: 0,
          totalStorageBytes: 0,
          averageFileSize: 0,
          largestFile: 0,
          smallestFile: 0
        },

        // Upload trends
        uploadTrends: uploadTrends,

        // File type distribution
        fileTypes: fileTypeStats,

        // Category distribution
        categories: categoryStats,

        // Access patterns
        accessPatterns: accessStats[0] || {
          averageAccessCount: 0,
          totalBytesTransferred: 0,
          mostAccessedFiles: []
        },

        // Performance metrics
        performanceMetrics: {
          ...this.fileMetrics,
          reportedAt: new Date()
        },

        // Storage efficiency calculations
        efficiency: {
          storageUtilizationMB: Math.round((storageStats[0]?.totalStorageBytes || 0) / (1024 * 1024)),
          averageFileSizeMB: Math.round((storageStats[0]?.averageFileSize || 0) / (1024 * 1024)),
          chunksPerFile: Math.ceil((storageStats[0]?.averageFileSize || 0) / (1024 * 1024)), // Assumes 1MB chunks
          compressionRatio: 1.0 // Would be calculated from actual compression data
        }
      };

      return analytics;

    } catch (error) {
      console.error('Analytics generation error:', error);
      throw error;
    }
  }

  // Utility methods for file operations
  async getFileInfo(fileId) {
    try {
      const fileInfo = await this.db.collection(`${this.bucketName}.files`)
        .findOne({ _id: fileId });
      return fileInfo;
    } catch (error) {
      console.error('Get file info error:', error);
      return null;
    }
  }

  async updateAccessMetrics(fileId) {
    try {
      await this.db.collection(`${this.bucketName}.files`).updateOne(
        { _id: fileId },
        {
          $inc: { 'metadata.accessCount': 1 },
          $set: { 'metadata.lastAccessedAt': new Date() }
        }
      );
    } catch (error) {
      console.error('Access metrics update error:', error);
    }
  }

  async generateFileHash(filePath) {
    return new Promise((resolve, reject) => {
      const md5Hash = crypto.createHash('md5');
      const sha256Hash = crypto.createHash('sha256');
      const stream = fs.createReadStream(filePath);

      stream.on('data', (data) => {
        md5Hash.update(data);
        sha256Hash.update(data);
      });

      stream.on('end', () => {
        resolve({
          md5: md5Hash.digest('hex'),
          sha256: sha256Hash.digest('hex')
        });
      });

      stream.on('error', reject);
    });
  }

  detectMimeType(filename) {
    const extension = filename.toLowerCase().split('.').pop();
    const mimeTypes = {
      'jpg': 'image/jpeg',
      'jpeg': 'image/jpeg',
      'png': 'image/png',
      'gif': 'image/gif',
      'pdf': 'application/pdf',
      'mp4': 'video/mp4',
      'mp3': 'audio/mpeg',
      'txt': 'text/plain',
      'json': 'application/json',
      'zip': 'application/zip'
    };
    return mimeTypes[extension] || 'application/octet-stream';
  }

  async updateVersionHistory(originalFileId, newVersionId) {
    // Implementation for version history tracking
    await this.db.collection('file_versions').insertOne({
      originalFileId: originalFileId,
      versionId: newVersionId,
      createdAt: new Date()
    });
  }

  // File cleanup and maintenance
  async deleteFile(fileId) {
    try {
      await this.bucket.delete(fileId);
      console.log(`File deleted: ${fileId}`);
      return { success: true, deletedAt: new Date() };
    } catch (error) {
      console.error('File deletion error:', error);
      throw error;
    }
  }

  // Get comprehensive system metrics
  getSystemMetrics() {
    return {
      ...this.fileMetrics,
      timestamp: new Date(),
      bucketName: this.bucketName,
      chunkSize: this.chunkSizeBytes
    };
  }
}

// Example usage demonstrating comprehensive GridFS functionality
async function demonstrateGridFSOperations() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const gridFSManager = new MongoGridFSManager(client, 'mediaStorage', 'uploads');

  try {
    console.log('Demonstrating MongoDB GridFS advanced file management...');

    // Upload a large file with comprehensive metadata
    console.log('Uploading large file...');
    const uploadResult = await gridFSManager.uploadFile('/path/to/large-video.mp4', {
      description: 'Corporate training video',
      category: 'training',
      tags: ['corporate', 'training', 'hr'],
      uploadedBy: 'admin',
      isPublic: false,
      contentMetadata: {
        duration: 3600, // seconds
        resolution: '1920x1080',
        codec: 'h264'
      },
      onProgress: (progress) => {
        console.log(`Upload progress: ${progress.percentage.toFixed(1)}%`);
      }
    });

    console.log('Upload completed:', uploadResult);

    // Search for files with comprehensive criteria
    console.log('Searching for video files...');
    const searchResults = await gridFSManager.searchFiles({
      mimeType: 'video/mp4',
      category: 'training',
      tags: ['corporate'],
      sizeRange: { min: 100 * 1024 * 1024 } // Files larger than 100MB
    }, {
      sortBy: 'size',
      limit: 10
    });

    console.log(`Found ${searchResults.totalCount} matching files`);
    searchResults.files.forEach(file => {
      console.log(`- ${file.filename} (${file.fileSizeMB.toFixed(1)} MB)`);
    });

    // Stream file with range support
    if (searchResults.files.length > 0) {
      const fileToStream = searchResults.files[0];
      console.log(`Streaming file: ${fileToStream.filename}`);

      const streamOptions = {
        start: 0,
        end: 1024 * 1024, // First 1MB
        onProgress: (progress) => {
          console.log(`Streaming progress: ${progress.percentage.toFixed(1)}%`);
        }
      };

      const stream = await gridFSManager.streamFile(fileToStream._id, streamOptions);

      // In a real application, you would pipe this to a response or file
      stream.on('end', () => {
        console.log('Streaming completed');
      });
    }

    // Generate comprehensive analytics
    console.log('Generating file analytics...');
    const analytics = await gridFSManager.getFileAnalytics('30d');

    console.log('Storage Analytics:');
    console.log(`Total Files: ${analytics.storage.totalFiles}`);
    console.log(`Total Storage: ${(analytics.storage.totalStorageBytes / (1024 * 1024 * 1024)).toFixed(2)} GB`);
    console.log(`Average File Size: ${(analytics.storage.averageFileSize / (1024 * 1024)).toFixed(2)} MB`);

    console.log('File Type Distribution:');
    analytics.fileTypes.slice(0, 5).forEach(type => {
      console.log(`- ${type._id}: ${type.fileCount} files (${(type.totalSize / (1024 * 1024)).toFixed(1)} MB)`);
    });

    // Get system metrics
    const metrics = gridFSManager.getSystemMetrics();
    console.log('System Performance Metrics:', metrics);

  } catch (error) {
    console.error('GridFS demonstration error:', error);
  } finally {
    await client.close();
  }
}

// Benefits of MongoDB GridFS:
// - Seamless large file storage without size limitations
// - Automatic file chunking with optimal performance
// - Comprehensive metadata management with flexible schemas
// - Built-in streaming support with range request capabilities (see the Express range-streaming sketch below)
// - Integration with MongoDB's query engine and indexing
// - File metadata and chunks stored and replicated alongside application data
// - Advanced search and analytics capabilities
// - Automatic replication and distributed storage
// - File versioning and content management features
// - High-performance concurrent access and streaming
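
Since range-request streaming is one of the headline benefits above, here is a sketch of how it can be exposed over HTTP with Express, reusing the mediaStorage database and uploads bucket from the example; the route path and error handling are illustrative assumptions rather than a complete implementation. Note that the driver's end option is exclusive, while HTTP Range headers use an inclusive end byte.

// Sketch: serving GridFS files with HTTP Range support via Express.
// Route, database, and bucket names are illustrative assumptions.
const express = require('express');
const { MongoClient, GridFSBucket, ObjectId } = require('mongodb');

async function startStreamingServer() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('mediaStorage');
  const bucket = new GridFSBucket(db, { bucketName: 'uploads' });

  const app = express();

  app.get('/api/files/stream/:id', async (req, res) => {
    const fileId = new ObjectId(req.params.id);
    const file = await db.collection('uploads.files').findOne({ _id: fileId });
    if (!file) return res.status(404).end();

    const range = req.headers.range; // e.g. "bytes=0-1048575"
    if (range) {
      const [startStr, endStr] = range.replace('bytes=', '').split('-');
      const start = parseInt(startStr, 10);
      const end = endStr ? parseInt(endStr, 10) : file.length - 1; // inclusive HTTP end byte

      res.status(206).set({
        'Content-Range': `bytes ${start}-${end}/${file.length}`,
        'Accept-Ranges': 'bytes',
        'Content-Length': end - start + 1,
        'Content-Type': file.metadata?.mimeType || 'application/octet-stream'
      });
      // GridFS treats `end` as exclusive, so add 1 to the inclusive HTTP range end
      bucket.openDownloadStream(fileId, { start, end: end + 1 }).pipe(res);
    } else {
      res.set('Content-Length', file.length);
      bucket.openDownloadStream(fileId).pipe(res);
    }
  });

  app.listen(3000, () => console.log('Streaming server listening on :3000'));
}

startStreamingServer().catch(console.error);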

SQL-Style File Operations with QueryLeaf

QueryLeaf provides familiar approaches to MongoDB GridFS file management and operations:

-- QueryLeaf GridFS file management with SQL-familiar syntax

-- Upload file with comprehensive metadata
INSERT INTO FILES (filename, file_data, metadata) VALUES (
  'corporate-training.mp4',
  UPLOAD_FILE('/path/to/video.mp4'),
  JSON_OBJECT(
    'description', 'Corporate training video on data security',
    'category', 'training',
    'tags', JSON_ARRAY('corporate', 'security', 'training'),
    'uploadedBy', 'admin@company.com',
    'isPublic', false,
    'contentMetadata', JSON_OBJECT(
      'duration', 3600,
      'resolution', '1920x1080',
      'codec', 'h264',
      'bitrate', '2000kbps'
    ),
    'processingStatus', 'uploaded',
    'retentionPolicy', 'business-7years',
    'complianceFlags', JSON_ARRAY('gdpr', 'sox')
  )
);

-- Search and query files with comprehensive criteria
SELECT 
  file_id,
  filename,
  file_size,
  ROUND(file_size / 1024.0 / 1024.0, 2) as file_size_mb,
  upload_date,

  -- Extract metadata fields
  JSON_EXTRACT(metadata, '$.description') as description,
  JSON_EXTRACT(metadata, '$.category') as category,
  JSON_EXTRACT(metadata, '$.tags') as tags,
  JSON_EXTRACT(metadata, '$.uploadedBy') as uploaded_by,
  JSON_EXTRACT(metadata, '$.contentMetadata.duration') as duration_seconds,
  JSON_EXTRACT(metadata, '$.processingStatus') as processing_status,

  -- Access metrics
  JSON_EXTRACT(metadata, '$.accessCount') as access_count,
  JSON_EXTRACT(metadata, '$.lastAccessedAt') as last_accessed,
  JSON_EXTRACT(metadata, '$.totalBytesTransferred') as total_bytes_transferred,

  -- File integrity
  md5_hash,
  chunk_count,

  -- Computed fields
  CASE 
    WHEN file_size > 1024*1024*1024 THEN 'Large (>1GB)'
    WHEN file_size > 100*1024*1024 THEN 'Medium (>100MB)'
    ELSE 'Small (<100MB)'
  END as size_category,

  DATEDIFF(CURRENT_DATE(), upload_date) as days_since_upload

FROM GRIDFS_FILES()
WHERE 
  -- File type filtering
  JSON_EXTRACT(metadata, '$.mimeType') LIKE 'video/%'

  -- Category and tag filtering
  AND JSON_EXTRACT(metadata, '$.category') = 'training'
  AND JSON_CONTAINS(JSON_EXTRACT(metadata, '$.tags'), '"corporate"')

  -- Size filtering
  AND file_size > 100 * 1024 * 1024  -- Files larger than 100MB

  -- Date range filtering
  AND upload_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)

  -- Access pattern filtering
  AND CAST(JSON_EXTRACT(metadata, '$.accessCount') AS UNSIGNED) > 5

  -- Processing status filtering
  AND JSON_EXTRACT(metadata, '$.processingStatus') = 'ready'

ORDER BY 
  CAST(JSON_EXTRACT(metadata, '$.accessCount') AS UNSIGNED) DESC,
  upload_date DESC
LIMIT 50;

-- File analytics and usage patterns
WITH file_analytics AS (
  SELECT 
    DATE_FORMAT(upload_date, '%Y-%m') as upload_month,
    JSON_EXTRACT(metadata, '$.category') as category,
    JSON_EXTRACT(metadata, '$.mimeType') as mime_type,

    -- File metrics
    COUNT(*) as file_count,
    SUM(file_size) as total_size_bytes,
    AVG(file_size) as avg_file_size,
    MIN(file_size) as min_file_size,
    MAX(file_size) as max_file_size,

    -- Access metrics
    SUM(CAST(JSON_EXTRACT(metadata, '$.accessCount') AS UNSIGNED)) as total_access_count,
    AVG(CAST(JSON_EXTRACT(metadata, '$.accessCount') AS UNSIGNED)) as avg_access_count,
    SUM(CAST(JSON_EXTRACT(metadata, '$.totalBytesTransferred') AS UNSIGNED)) as total_bytes_transferred,

    -- Performance metrics
    AVG(CAST(JSON_EXTRACT(metadata, '$.uploadDuration') AS UNSIGNED)) as avg_upload_time_ms

  FROM GRIDFS_FILES()
  WHERE upload_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 MONTH)
  GROUP BY 
    DATE_FORMAT(upload_date, '%Y-%m'),
    JSON_EXTRACT(metadata, '$.category'),
    JSON_EXTRACT(metadata, '$.mimeType')
),

category_summary AS (
  SELECT 
    category,

    -- Volume metrics
    SUM(file_count) as total_files,
    SUM(total_size_bytes) as category_total_size,
    ROUND(SUM(total_size_bytes) / 1024.0 / 1024.0 / 1024.0, 2) as category_total_gb,

    -- Access patterns
    SUM(total_access_count) as category_total_accesses,
    ROUND(AVG(avg_access_count), 2) as category_avg_access_per_file,

    -- Performance indicators
    ROUND(AVG(avg_upload_time_ms), 2) as category_avg_upload_time,

    -- Growth trends
    COUNT(DISTINCT upload_month) as active_months,

    -- Storage efficiency
    ROUND(AVG(avg_file_size) / 1024.0 / 1024.0, 2) as avg_file_size_mb,
    ROUND(SUM(total_bytes_transferred) / 1024.0 / 1024.0 / 1024.0, 2) as total_transfer_gb

  FROM file_analytics
  GROUP BY category
)

SELECT 
  category,
  total_files,
  category_total_gb,
  category_avg_access_per_file,
  avg_file_size_mb,
  total_transfer_gb,

  -- Storage cost estimation (example rates)
  ROUND(category_total_gb * 0.023, 2) as estimated_monthly_storage_cost_usd,
  ROUND(total_transfer_gb * 0.09, 2) as estimated_transfer_cost_usd,

  -- Performance assessment
  CASE 
    WHEN category_avg_upload_time < 1000 THEN 'Excellent'
    WHEN category_avg_upload_time < 5000 THEN 'Good'
    WHEN category_avg_upload_time < 15000 THEN 'Fair'
    ELSE 'Needs Optimization'
  END as upload_performance,

  -- Usage classification
  CASE 
    WHEN category_avg_access_per_file > 100 THEN 'High Usage'
    WHEN category_avg_access_per_file > 20 THEN 'Medium Usage'
    WHEN category_avg_access_per_file > 5 THEN 'Low Usage'
    ELSE 'Archived/Inactive'
  END as usage_pattern

FROM category_summary
ORDER BY category_total_gb DESC, category_avg_access_per_file DESC;

-- File streaming and download operations
SELECT 
  file_id,
  filename,
  file_size,

  -- Create streaming URLs with range support
  CONCAT('/api/files/stream/', file_id) as stream_url,
  CONCAT('/api/files/stream/', file_id, '?range=0-1048576') as preview_stream_url,
  CONCAT('/api/files/download/', file_id) as download_url,

  -- Content delivery optimization
  CASE 
    WHEN JSON_EXTRACT(metadata, '$.mimeType') LIKE 'video/%' THEN 'streaming'
    WHEN JSON_EXTRACT(metadata, '$.mimeType') LIKE 'audio/%' THEN 'streaming'
    WHEN JSON_EXTRACT(metadata, '$.mimeType') LIKE 'image/%' THEN 'direct'
    ELSE 'download'
  END as recommended_delivery_method,

  -- CDN configuration suggestions
  CASE 
    WHEN CAST(JSON_EXTRACT(metadata, '$.accessCount') AS UNSIGNED) > 1000 THEN 'edge-cache'
    WHEN CAST(JSON_EXTRACT(metadata, '$.accessCount') AS UNSIGNED) > 100 THEN 'regional-cache'
    ELSE 'origin-only'
  END as cdn_strategy,

  -- Access control
  JSON_EXTRACT(metadata, '$.isPublic') as is_public,
  JSON_EXTRACT(metadata, '$.accessPermissions') as access_permissions

FROM GRIDFS_FILES()
WHERE JSON_EXTRACT(metadata, '$.processingStatus') = 'ready'
ORDER BY CAST(JSON_EXTRACT(metadata, '$.accessCount') AS UNSIGNED) DESC;

-- File maintenance and cleanup operations
WITH file_maintenance AS (
  SELECT 
    file_id,
    filename,
    file_size,
    upload_date,

    -- Metadata analysis
    JSON_EXTRACT(metadata, '$.category') as category,
    JSON_EXTRACT(metadata, '$.retentionPolicy') as retention_policy,
    JSON_EXTRACT(metadata, '$.lastAccessedAt') as last_accessed,
    CAST(JSON_EXTRACT(metadata, '$.accessCount') AS UNSIGNED) as access_count,

    -- Age calculations
    DATEDIFF(CURRENT_DATE(), upload_date) as days_since_upload,
    DATEDIFF(CURRENT_DATE(), STR_TO_DATE(JSON_EXTRACT(metadata, '$.lastAccessedAt'), '%Y-%m-%d')) as days_since_access,

    -- Maintenance flags
    CASE 
      WHEN JSON_EXTRACT(metadata, '$.retentionPolicy') = 'business-7years' AND 
           DATEDIFF(CURRENT_DATE(), upload_date) > 2555 THEN 'DELETE'
      WHEN JSON_EXTRACT(metadata, '$.retentionPolicy') = 'business-3years' AND 
           DATEDIFF(CURRENT_DATE(), upload_date) > 1095 THEN 'DELETE'
      WHEN DATEDIFF(CURRENT_DATE(), STR_TO_DATE(JSON_EXTRACT(metadata, '$.lastAccessedAt'), '%Y-%m-%d')) > 365
           AND CAST(JSON_EXTRACT(metadata, '$.accessCount') AS UNSIGNED) = 0 THEN 'ARCHIVE'
      WHEN DATEDIFF(CURRENT_DATE(), STR_TO_DATE(JSON_EXTRACT(metadata, '$.lastAccessedAt'), '%Y-%m-%d')) > 180
           AND CAST(JSON_EXTRACT(metadata, '$.accessCount') AS UNSIGNED) < 5 THEN 'COLD_STORAGE'
      ELSE 'ACTIVE'
    END as maintenance_action

  FROM GRIDFS_FILES()
)

SELECT 
  maintenance_action,
  COUNT(*) as file_count,
  ROUND(SUM(file_size) / 1024.0 / 1024.0 / 1024.0, 2) as total_size_gb,

  -- Cost impact analysis
  ROUND((SUM(file_size) / 1024.0 / 1024.0 / 1024.0) * 0.023, 2) as current_monthly_cost_usd,

  -- Storage class optimization
  CASE maintenance_action
    WHEN 'COLD_STORAGE' THEN ROUND((SUM(file_size) / 1024.0 / 1024.0 / 1024.0) * 0.004, 2)
    WHEN 'ARCHIVE' THEN ROUND((SUM(file_size) / 1024.0 / 1024.0 / 1024.0) * 0.001, 2)
    WHEN 'DELETE' THEN 0
    ELSE ROUND((SUM(file_size) / 1024.0 / 1024.0 / 1024.0) * 0.023, 2)
  END as optimized_monthly_cost_usd,

  -- Sample files for review
  GROUP_CONCAT(
    CONCAT(filename, ' (', ROUND(file_size/1024/1024, 1), 'MB)')
    ORDER BY file_size DESC
    SEPARATOR '; '
  ) as sample_files

FROM file_maintenance
GROUP BY maintenance_action
ORDER BY total_size_gb DESC;

-- Real-time file system monitoring
CREATE VIEW file_system_health AS
SELECT 
  -- Current system status
  COUNT(*) as total_files,
  ROUND(SUM(file_size) / 1024.0 / 1024.0 / 1024.0, 2) as total_storage_gb,
  COUNT(CASE WHEN upload_date >= DATE_SUB(NOW(), INTERVAL 24 HOUR) THEN 1 END) as files_uploaded_24h,
  COUNT(CASE WHEN STR_TO_DATE(JSON_EXTRACT(metadata, '$.lastAccessedAt'), '%Y-%m-%d %H:%i:%s') >= DATE_SUB(NOW(), INTERVAL 1 HOUR) THEN 1 END) as files_accessed_1h,

  -- Performance indicators
  AVG(CAST(JSON_EXTRACT(metadata, '$.uploadDuration') AS UNSIGNED)) as avg_upload_time_ms,
  COUNT(CASE WHEN JSON_EXTRACT(metadata, '$.processingStatus') = 'error' THEN 1 END) as files_with_errors,
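  -- Integrity check below assumes a 1 MiB chunk size (1048576 bytes); adjust the divisor to the bucket's configured chunkSizeBytes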
  COUNT(CASE WHEN chunk_count != CEIL(file_size / 1048576.0) THEN 1 END) as files_with_integrity_issues,

  -- Storage distribution
  COUNT(CASE WHEN file_size > 1024*1024*1024 THEN 1 END) as large_files_1gb_plus,
  COUNT(CASE WHEN file_size BETWEEN 100*1024*1024 AND 1024*1024*1024 THEN 1 END) as medium_files_100mb_1gb,
  COUNT(CASE WHEN file_size < 100*1024*1024 THEN 1 END) as small_files_under_100mb,

  -- Health assessment
  CASE 
    WHEN COUNT(CASE WHEN JSON_EXTRACT(metadata, '$.processingStatus') = 'error' THEN 1 END) > 
         COUNT(*) * 0.05 THEN 'Critical - High Error Rate'
    WHEN AVG(CAST(JSON_EXTRACT(metadata, '$.uploadDuration') AS UNSIGNED)) > 30000 THEN 'Warning - Slow Uploads'
    WHEN COUNT(CASE WHEN chunk_count != CEIL(file_size / 1048576.0) THEN 1 END) > 0 THEN 'Warning - Integrity Issues'
    ELSE 'Healthy'
  END as system_health_status,

  NOW() as report_timestamp

FROM GRIDFS_FILES()
WHERE upload_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY);

-- QueryLeaf GridFS provides:
-- 1. SQL-familiar file upload and management operations
-- 2. Comprehensive file search and filtering capabilities
-- 3. Advanced analytics and usage pattern analysis
-- 4. Intelligent file lifecycle management and cleanup
-- 5. Real-time system health monitoring and alerting
-- 6. Cost optimization and storage class recommendations
-- 7. Integration with MongoDB's GridFS streaming capabilities
-- 8. Metadata-driven content management and organization
-- 9. Performance monitoring and optimization insights
-- 10. Enterprise-grade file operations with ACID guarantees

Best Practices for MongoDB GridFS

File Storage Strategy

Optimal GridFS configuration for different application types:

  1. Media Streaming Applications: Large chunk sizes for optimal streaming performance (see the chunk-size sketch after this list)
  2. Document Management Systems: Metadata-rich storage with comprehensive indexing
  3. Content Distribution Networks: Integration with CDN and caching strategies
  4. Backup and Archival Systems: Compression and long-term storage optimization
  5. Real-time Applications: Fast upload/download with minimal latency
  6. Multi-tenant Systems: Secure isolation and access control patterns
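
As a brief sketch of the chunk-size guidance in point 1, the Node.js driver's GridFSBucket accepts a chunkSizeBytes option when the bucket is created. The bucket name, file path, and sizes below are illustrative assumptions rather than recommendations from this article:

// Sketch: per-workload GridFS chunk sizing (values are illustrative assumptions)
const { MongoClient, GridFSBucket } = require('mongodb');
const { pipeline } = require('stream/promises');
const fs = require('fs');

async function uploadVideo(db, filePath) {
  // Larger chunks (here 4 MiB) reduce per-chunk round trips for large media read sequentially;
  // the driver's 255 kB default suits smaller files and random-access reads.
  const mediaBucket = new GridFSBucket(db, { bucketName: 'media', chunkSizeBytes: 4 * 1024 * 1024 });

  await pipeline(
    fs.createReadStream(filePath),
    mediaBucket.openUploadStream(filePath, { metadata: { category: 'video' } })
  );
}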

Performance Optimization Guidelines

Essential considerations for production GridFS deployments:

  1. Chunk Size Optimization: Balance between storage efficiency and streaming performance
  2. Index Strategy: Create appropriate indexes on metadata fields for fast queries (see the index sketch after this list)
  3. Replication Configuration: Optimize replica set configuration for file operations
  4. Connection Pooling: Configure connection pools for concurrent file operations
  5. Monitoring Integration: Implement comprehensive file operation monitoring
  6. Storage Management: Plan for growth and implement lifecycle management
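
As a sketch of the index strategy in point 2, metadata indexes live on the bucket's files collection (fs.files for the default bucket, where the native field names are length and uploadDate). The metadata fields below mirror the earlier examples in this article but are otherwise assumptions about your schema:

// Sketch: metadata indexes on the GridFS files collection (field names mirror the earlier examples)
async function createGridfsMetadataIndexes(db) {
  const files = db.collection('fs.files'); // use '<bucketName>.files' for a custom bucket

  await files.createIndexes([
    { key: { 'metadata.category': 1, uploadDate: -1 } },   // category browsing, newest first
    { key: { 'metadata.mimeType': 1 } },                   // delivery-method and CDN decisions
    { key: { 'metadata.processingStatus': 1 } },           // locate files still processing or errored
    { key: { 'metadata.lastAccessedAt': 1, length: 1 } }   // lifecycle and cleanup scans
  ]);
}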

Conclusion

MongoDB GridFS provides large file storage and management capabilities that integrate directly with MongoDB's document database features, supporting files well beyond the 16 MB BSON document limit along with efficient streaming and comprehensive metadata management. By applying sound file management patterns, streaming optimization, and automated analytics, applications can meet complex file storage requirements while maintaining high performance and operational efficiency.

Key GridFS benefits include:

  • Large File Storage: Files far beyond the 16 MB BSON document limit, stored through automatic chunking and distribution
  • Seamless Integration: Native integration with MongoDB queries, indexes, and transactions
  • Intelligent Streaming: High-performance streaming with range request support
  • Comprehensive Metadata: Flexible, searchable metadata with rich query capabilities
  • High Availability: Automatic replication and distributed storage across replica sets
  • Advanced Analytics: Built-in analytics and reporting for file usage and performance

Whether you're building media streaming platforms, document management systems, content distribution networks, or file-intensive applications, MongoDB GridFS with QueryLeaf's familiar file operation interface provides the foundation for scalable, efficient large file management. This combination enables you to leverage advanced file storage capabilities while maintaining familiar database administration patterns and SQL-style file operations.

QueryLeaf Integration: QueryLeaf automatically translates SQL-familiar file operations into optimal MongoDB GridFS commands while providing comprehensive file management and analytics through SQL-style queries. Advanced file storage patterns, streaming optimization, and lifecycle management are seamlessly handled through familiar database administration interfaces, making sophisticated file storage both powerful and accessible.

The integration of intelligent file storage with SQL-style file operations makes MongoDB an ideal platform for applications requiring both scalable file management and familiar database administration patterns, ensuring your files remain both accessible and efficiently managed as they scale to meet demanding production requirements.

MongoDB Aggregation Framework for Real-Time Analytics Dashboards: Advanced Data Processing and Visualization Pipelines

Modern data-driven applications require sophisticated analytics capabilities that can process large volumes of data in real-time, generate insights across multiple dimensions, and power interactive dashboards that provide immediate business intelligence. Traditional analytics approaches often involve complex ETL processes, separate analytics databases, and batch processing systems that introduce significant latency between data creation and insight availability, limiting the ability to make real-time business decisions.

MongoDB's Aggregation Framework provides comprehensive real-time analytics capabilities through powerful data processing pipelines that enable complex calculations, multi-stage transformations, and advanced statistical operations directly within the database. Unlike traditional analytics systems that require data movement and separate processing infrastructure, MongoDB aggregation pipelines can process operational data immediately, providing real-time insights with minimal latency and infrastructure complexity.
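
As a minimal illustration, a single pipeline can answer a dashboard question such as "revenue per region over the last hour" directly against the operational collection, with no ETL step. The collection and field names below match the dashboard code later in this article:

// Minimal sketch: last-hour revenue per region computed directly on operational data
const { MongoClient } = require('mongodb');

async function lastHourRevenueByRegion(uri = 'mongodb://localhost:27017') {
  const client = new MongoClient(uri);
  try {
    const sales = client.db('realtime_analytics_system').collection('sales_transactions');

    return await sales.aggregate([
      { $match: { transaction_date: { $gte: new Date(Date.now() - 60 * 60 * 1000) } } },
      { $group: { _id: '$region', revenue: { $sum: '$total_amount' }, transactions: { $sum: 1 } } },
      { $sort: { revenue: -1 } }
    ]).toArray();
  } finally {
    await client.close();
  }
}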

The Traditional Analytics Challenge

Conventional approaches to real-time analytics and dashboard creation have significant limitations for modern data-driven applications:

-- Traditional PostgreSQL analytics - complex and resource-intensive approaches

-- Basic analytics table structure with limited real-time capabilities
CREATE TABLE sales_transactions (
    transaction_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID NOT NULL,
    product_id UUID NOT NULL,
    transaction_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    total_amount DECIMAL(10,2) NOT NULL,
    discount_amount DECIMAL(10,2) DEFAULT 0,
    tax_amount DECIMAL(10,2) NOT NULL,
    payment_method VARCHAR(50) NOT NULL,
    sales_channel VARCHAR(50) NOT NULL,
    region VARCHAR(100) NOT NULL,

    -- Manual aggregation tracking (limited granularity)
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Basic customer demographics table
CREATE TABLE customers (
    customer_id UUID PRIMARY KEY,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    email VARCHAR(200) UNIQUE NOT NULL,
    age INTEGER,
    gender VARCHAR(20),
    city VARCHAR(100),
    state VARCHAR(50),
    country VARCHAR(50),
    customer_segment VARCHAR(50),
    registration_date TIMESTAMP NOT NULL,
    lifetime_value DECIMAL(15,2) DEFAULT 0
);

-- Product catalog with basic attributes
CREATE TABLE products (
    product_id UUID PRIMARY KEY,
    product_name VARCHAR(200) NOT NULL,
    category VARCHAR(100) NOT NULL,
    subcategory VARCHAR(100),
    brand VARCHAR(100),
    unit_cost DECIMAL(10,2) NOT NULL,
    list_price DECIMAL(10,2) NOT NULL,
    margin_percent DECIMAL(5,2),
    stock_quantity INTEGER DEFAULT 0,
    supplier_id UUID
);

-- Pre-aggregated summary tables (manual maintenance required)
CREATE TABLE daily_sales_summary (
    summary_date DATE NOT NULL,
    region VARCHAR(100) NOT NULL,
    category VARCHAR(100) NOT NULL,
    total_transactions INTEGER DEFAULT 0,
    total_revenue DECIMAL(15,2) DEFAULT 0,
    total_units_sold INTEGER DEFAULT 0,
    unique_customers INTEGER DEFAULT 0,
    avg_transaction_value DECIMAL(10,2) DEFAULT 0,

    -- Manual timestamp tracking
    calculated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (summary_date, region, category)
);

-- Complex materialized view for real-time dashboard (limited refresh capabilities)
CREATE MATERIALIZED VIEW current_sales_dashboard AS
WITH hourly_metrics AS (
    SELECT 
        DATE_TRUNC('hour', st.transaction_date) as hour_bucket,
        st.region,
        p.category,
        p.brand,
        c.customer_segment,

        -- Basic aggregations (limited computational capability)
        COUNT(*) as transaction_count,
        COUNT(DISTINCT st.customer_id) as unique_customers,
        SUM(st.total_amount) as total_revenue,
        SUM(st.quantity) as total_units,
        AVG(st.total_amount) as avg_transaction_value,
        SUM(st.discount_amount) as total_discounts,

        -- Limited statistical calculations
        STDDEV(st.total_amount) as revenue_stddev,
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY st.total_amount) as median_transaction_value,

        -- Payment method breakdown (basic pivot)
        COUNT(*) FILTER (WHERE st.payment_method = 'credit_card') as credit_card_transactions,
        COUNT(*) FILTER (WHERE st.payment_method = 'debit_card') as debit_card_transactions,
        COUNT(*) FILTER (WHERE st.payment_method = 'cash') as cash_transactions,
        COUNT(*) FILTER (WHERE st.payment_method = 'digital_wallet') as digital_wallet_transactions

    FROM sales_transactions st
    JOIN customers c ON st.customer_id = c.customer_id
    JOIN products p ON st.product_id = p.product_id
    WHERE st.transaction_date >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    GROUP BY 
        DATE_TRUNC('hour', st.transaction_date),
        st.region, p.category, p.brand, c.customer_segment
),

regional_performance AS (
    SELECT 
        hm.region,

        -- Regional aggregations (limited granularity)
        SUM(hm.transaction_count) as total_transactions,
        SUM(hm.total_revenue) as total_revenue,
        SUM(hm.unique_customers) as unique_customers,
        AVG(hm.avg_transaction_value) as avg_transaction_value,

        -- Simple ranking (no advanced analytics)
        RANK() OVER (ORDER BY SUM(hm.total_revenue) DESC) as revenue_rank,

        -- Basic percentage calculations
        SUM(hm.total_revenue) / SUM(SUM(hm.total_revenue)) OVER () * 100 as revenue_percentage,

        -- Limited trend analysis
        SUM(hm.total_revenue) FILTER (WHERE hm.hour_bucket >= CURRENT_TIMESTAMP - INTERVAL '12 hours') as revenue_last_12h,
        SUM(hm.total_revenue) FILTER (WHERE hm.hour_bucket < CURRENT_TIMESTAMP - INTERVAL '12 hours') as revenue_prev_12h

    FROM hourly_metrics hm
    GROUP BY hm.region
),

category_analysis AS (
    SELECT 
        hm.category,
        hm.brand,

        -- Category-level aggregations
        SUM(hm.transaction_count) as category_transactions,
        SUM(hm.total_revenue) as category_revenue,
        SUM(hm.total_units) as category_units,

        -- Limited cross-category analysis
        SUM(hm.total_revenue) / SUM(SUM(hm.total_revenue)) OVER () * 100 as category_revenue_share,
        DENSE_RANK() OVER (ORDER BY SUM(hm.total_revenue) DESC) as category_rank,

        -- Basic growth calculations (limited time series analysis)
        SUM(hm.total_revenue) FILTER (WHERE hm.hour_bucket >= CURRENT_TIMESTAMP - INTERVAL '6 hours') as recent_revenue,
        SUM(hm.total_revenue) FILTER (WHERE hm.hour_bucket < CURRENT_TIMESTAMP - INTERVAL '6 hours') as earlier_revenue

    FROM hourly_metrics hm
    GROUP BY hm.category, hm.brand
)

SELECT 
    CURRENT_TIMESTAMP as dashboard_last_updated,

    -- Overall metrics (basic calculations only)
    (SELECT SUM(total_transactions) FROM regional_performance) as total_transactions_24h,
    (SELECT SUM(total_revenue) FROM regional_performance) as total_revenue_24h,
    (SELECT SUM(unique_customers) FROM regional_performance) as unique_customers_24h,
    (SELECT AVG(avg_transaction_value) FROM regional_performance) as avg_transaction_value_24h,

    -- Regional performance (limited analysis depth)
    (SELECT JSON_AGG(
        JSON_BUILD_OBJECT(
            'region', region,
            'revenue', total_revenue,
            'transactions', total_transactions,
            'rank', revenue_rank,
            'percentage', ROUND(revenue_percentage, 2),
            'trend', CASE 
                WHEN revenue_last_12h > revenue_prev_12h THEN 'up'
                WHEN revenue_last_12h < revenue_prev_12h THEN 'down' 
                ELSE 'flat'
            END
        ) ORDER BY revenue_rank
    ) FROM regional_performance) as regional_data,

    -- Category analysis (basic breakdown only)
    (SELECT JSON_AGG(
        JSON_BUILD_OBJECT(
            'category', category,
            'brand', brand,
            'revenue', category_revenue,
            'units', category_units,
            'share', ROUND(category_revenue_share, 2),
            'rank', category_rank,
            'growth', CASE 
                WHEN recent_revenue > earlier_revenue THEN 'positive'
                WHEN recent_revenue < earlier_revenue THEN 'negative'
                ELSE 'neutral'
            END
        ) ORDER BY category_rank
    ) FROM category_analysis) as category_data,

    -- Payment method distribution (static breakdown)
    (SELECT JSON_BUILD_OBJECT(
        'credit_card', SUM(credit_card_transactions),
        'debit_card', SUM(debit_card_transactions), 
        'cash', SUM(cash_transactions),
        'digital_wallet', SUM(digital_wallet_transactions)
    ) FROM hourly_metrics) as payment_methods,

    -- Customer segment analysis (limited segmentation)
    (SELECT JSON_AGG(segment_row) FROM (
        SELECT JSON_BUILD_OBJECT(
            'segment', customer_segment,
            'transactions', SUM(transaction_count),
            'revenue', SUM(total_revenue),
            'avg_value', AVG(avg_transaction_value)
        ) as segment_row
        FROM hourly_metrics
        GROUP BY customer_segment
    ) segment_summary) as customer_segments;

-- Problems with traditional analytics approaches:
-- 1. Materialized views require manual refresh and don't support real-time updates
-- 2. Limited aggregation and statistical calculation capabilities
-- 3. Complex join operations impact performance with large datasets
-- 4. No support for advanced analytics like time series analysis or forecasting
-- 5. Difficult to handle nested data structures or dynamic schema requirements
-- 6. Pre-aggregation tables require significant maintenance and storage overhead
-- 7. Limited flexibility for ad-hoc analytics queries and dashboard customization
-- 8. No built-in support for complex data transformations or calculated metrics
-- 9. Poor scalability for high-volume real-time analytics workloads
-- 10. Complex query optimization and index management requirements

-- Manual refresh process (resource-intensive and not real-time)
REFRESH MATERIALIZED VIEW CONCURRENTLY current_sales_dashboard;

-- Attempt at real-time hourly summary calculation (performance bottleneck)
WITH real_time_hourly AS (
    SELECT 
        DATE_TRUNC('hour', CURRENT_TIMESTAMP) as current_hour,

        -- Current hour calculations (heavy resource usage)
        COUNT(*) as current_hour_transactions,
        SUM(total_amount) as current_hour_revenue,
        COUNT(DISTINCT customer_id) as current_hour_customers,
        AVG(total_amount) as current_hour_avg_value,

        -- Limited real-time comparisons
        COUNT(*) FILTER (WHERE transaction_date >= DATE_TRUNC('hour', CURRENT_TIMESTAMP)) as this_hour_so_far,
        COUNT(*) FILTER (WHERE transaction_date >= DATE_TRUNC('hour', CURRENT_TIMESTAMP - INTERVAL '1 hour')
                        AND transaction_date < DATE_TRUNC('hour', CURRENT_TIMESTAMP)) as previous_hour_full,

        -- Basic percentage calculations
        SUM(total_amount) FILTER (WHERE transaction_date >= DATE_TRUNC('hour', CURRENT_TIMESTAMP)) as revenue_this_hour,
        SUM(total_amount) FILTER (WHERE transaction_date >= DATE_TRUNC('hour', CURRENT_TIMESTAMP - INTERVAL '1 hour')
                                  AND transaction_date < DATE_TRUNC('hour', CURRENT_TIMESTAMP)) as revenue_previous_hour

    FROM sales_transactions
    WHERE transaction_date >= CURRENT_TIMESTAMP - INTERVAL '2 hours'
),

performance_indicators AS (
    SELECT 
        rth.*,

        -- Limited performance metrics
        CASE 
            WHEN revenue_previous_hour > 0 THEN
                ROUND(((revenue_this_hour - revenue_previous_hour) / revenue_previous_hour) * 100, 2)
            ELSE NULL
        END as revenue_change_percent,

        CASE 
            WHEN previous_hour_full > 0 THEN
                ROUND(((this_hour_so_far - previous_hour_full) / previous_hour_full::FLOAT) * 100, 2)
            ELSE NULL
        END as transaction_change_percent,

        -- Simple trend classification
        CASE 
            WHEN revenue_this_hour > revenue_previous_hour THEN 'increasing'
            WHEN revenue_this_hour < revenue_previous_hour THEN 'decreasing'
            ELSE 'stable'
        END as revenue_trend

    FROM real_time_hourly rth
)

SELECT 
    current_hour,
    current_hour_transactions,
    ROUND(current_hour_revenue::NUMERIC, 2) as current_hour_revenue,
    current_hour_customers,
    ROUND(current_hour_avg_value::NUMERIC, 2) as current_hour_avg_value,

    -- Trend indicators
    revenue_change_percent,
    transaction_change_percent,
    revenue_trend,

    -- Performance assessment (basic classification)
    CASE 
        WHEN revenue_change_percent > 20 THEN 'excellent'
        WHEN revenue_change_percent > 10 THEN 'good'
        WHEN revenue_change_percent > 0 THEN 'positive'
        WHEN revenue_change_percent > -10 THEN 'neutral'
        ELSE 'concerning'
    END as performance_status,

    CURRENT_TIMESTAMP as calculated_at

FROM performance_indicators;

-- Traditional limitations:
-- 1. No real-time dashboard updates - requires manual refresh or polling
-- 2. Limited analytical capabilities compared to specialized analytics databases
-- 3. Performance degrades significantly with large datasets and complex calculations
-- 4. Difficult to implement advanced analytics like cohort analysis or forecasting
-- 5. No support for nested document analysis or flexible schema structures
-- 6. Complex index management and query optimization requirements
-- 7. Limited ability to handle streaming data or event-driven analytics
-- 8. Poor integration with modern visualization tools and BI platforms
-- 9. Significant infrastructure and maintenance overhead for analytics workloads
-- 10. Inflexible aggregation patterns that don't adapt to changing business requirements

MongoDB provides sophisticated real-time analytics capabilities through its powerful Aggregation Framework:

// MongoDB Advanced Real-Time Analytics Dashboard System
const { MongoClient } = require('mongodb');
const { EventEmitter } = require('events');

const client = new MongoClient('mongodb://localhost:27017/?replicaSet=rs0');
const db = client.db('realtime_analytics_system');

// Comprehensive MongoDB Analytics Dashboard Manager
// Extends EventEmitter so real-time refreshes can be pushed to subscribers via emit()
class RealtimeAnalyticsDashboard extends EventEmitter {
  constructor(db, config = {}) {
    super();
    this.db = db;
    this.collections = {
      salesTransactions: db.collection('sales_transactions'),
      customers: db.collection('customers'),
      products: db.collection('products'),
      analyticsCache: db.collection('analytics_cache'),
      dashboardMetrics: db.collection('dashboard_metrics'),
      userSessions: db.collection('user_sessions')
    };

    // Advanced analytics configuration
    this.config = {
      // Real-time processing settings
      enableRealTimeUpdates: config.enableRealTimeUpdates !== false,
      updateInterval: config.updateInterval || 30000, // 30 seconds
      cacheExpiration: config.cacheExpiration || 300000, // 5 minutes

      // Performance optimization
      enableAggregationOptimization: config.enableAggregationOptimization !== false,
      useIndexes: config.useIndexes !== false,
      enableParallelProcessing: config.enableParallelProcessing !== false,
      maxConcurrentPipelines: config.maxConcurrentPipelines || 5,

      // Analytics features
      enableAdvancedMetrics: config.enableAdvancedMetrics !== false,
      enablePredictiveAnalytics: config.enablePredictiveAnalytics || false,
      enableCohortAnalysis: config.enableCohortAnalysis || false,
      enableAnomalyDetection: config.enableAnomalyDetection || false,

      // Dashboard customization
      timeWindows: config.timeWindows || ['1h', '6h', '24h', '7d', '30d'],
      metrics: config.metrics || ['revenue', 'transactions', 'customers', 'conversion'],
      dimensions: config.dimensions || ['region', 'category', 'channel', 'segment'],

      // Data retention
      rawDataRetention: config.rawDataRetention || 90, // days
      aggregatedDataRetention: config.aggregatedDataRetention || 365 // days
    };

    // Analytics state management
    this.dashboardState = {
      lastUpdate: null,
      activeConnections: 0,
      processingStats: {
        totalQueries: 0,
        avgResponseTime: 0,
        cacheHitRate: 0
      }
    };

    // Initialize analytics system
    this.initializeAnalyticsSystem();
  }

  async initializeAnalyticsSystem() {
    console.log('Initializing comprehensive MongoDB real-time analytics system...');

    try {
      // Setup analytics indexes for optimal performance
      await this.setupAnalyticsIndexes();

      // Initialize real-time data processing
      await this.setupRealTimeProcessing();

      // Setup analytics caching layer
      await this.setupAnalyticsCache();

      // Initialize dashboard metrics collection (time-ordered index supports trend queries and cleanup)
      await this.collections.dashboardMetrics.createIndex({ timestamp: 1 }, { background: true });

      // Setup performance monitoring
      await this.setupPerformanceMonitoring();

      console.log('Real-time analytics system initialized successfully');

    } catch (error) {
      console.error('Error initializing analytics system:', error);
      throw error;
    }
  }

  async setupAnalyticsIndexes() {
    console.log('Setting up analytics-optimized indexes...');

    try {
      // Sales transactions indexes for time-series analytics
      await this.collections.salesTransactions.createIndexes([
        { key: { transaction_date: 1, region: 1 }, background: true },
        { key: { transaction_date: 1, product_category: 1 }, background: true },
        { key: { customer_id: 1, transaction_date: 1 }, background: true },
        { key: { region: 1, sales_channel: 1, transaction_date: 1 }, background: true },
        { key: { product_id: 1, transaction_date: 1 }, background: true },
        { key: { payment_method: 1, transaction_date: 1 }, background: true }
      ]);

      // Customer analytics indexes
      await this.collections.customers.createIndexes([
        { key: { customer_segment: 1, registration_date: 1 }, background: true },
        { key: { region: 1, customer_segment: 1 }, background: true },
        { key: { lifetime_value: 1 }, background: true }
      ]);

      // Product catalog indexes
      await this.collections.products.createIndexes([
        { key: { category: 1, subcategory: 1 }, background: true },
        { key: { brand: 1, category: 1 }, background: true },
        { key: { margin_percent: 1 }, background: true }
      ]);

      console.log('Analytics indexes created successfully');

    } catch (error) {
      console.error('Error setting up analytics indexes:', error);
      throw error;
    }
  }

  async generateRealtimeSalesDashboard(timeWindow = '24h', filters = {}) {
    console.log(`Generating real-time sales dashboard for ${timeWindow} window...`);

    try {
      // Record query start time so enrichDashboardResults can report execution time
      this.queryStartTime = Date.now();

      // Calculate time range based on window
      const timeRange = this.calculateTimeRange(timeWindow);

      // Build comprehensive aggregation pipeline for dashboard metrics
      const dashboardPipeline = [
        // Stage 1: Time-based filtering with optional additional filters
        {
          $match: {
            transaction_date: {
              $gte: timeRange.startDate,
              $lte: timeRange.endDate
            },
            ...this.buildDynamicFilters(filters)
          }
        },

        // Stage 2: Join with customer data for segmentation
        {
          $lookup: {
            from: 'customers',
            localField: 'customer_id',
            foreignField: '_id',
            as: 'customer_info'
          }
        },

        // Stage 3: Join with product data for category analysis
        {
          $lookup: {
            from: 'products',
            localField: 'product_id',
            foreignField: '_id',
            as: 'product_info'
          }
        },

        // Stage 4: Flatten joined data and add computed fields
        {
          $addFields: {
            customer: { $arrayElemAt: ['$customer_info', 0] },
            product: { $arrayElemAt: ['$product_info', 0] },
            transaction_hour: { $dateToString: { format: '%Y-%m-%d %H:00:00', date: '$transaction_date' } },
            transaction_day: { $dateToString: { format: '%Y-%m-%d', date: '$transaction_date' } },
            profit_margin: {
              $multiply: [
                { $subtract: ['$unit_price', '$product.unit_cost'] },
                '$quantity'
              ]
            },
            is_weekend: {
              $in: [{ $dayOfWeek: '$transaction_date' }, [1, 7]]
            },
            time_of_day: {
              $switch: {
                branches: [
                  { case: { $lt: [{ $hour: '$transaction_date' }, 6] }, then: 'night' },
                  { case: { $lt: [{ $hour: '$transaction_date' }, 12] }, then: 'morning' },
                  { case: { $lt: [{ $hour: '$transaction_date' }, 18] }, then: 'afternoon' },
                  { case: { $lt: [{ $hour: '$transaction_date' }, 22] }, then: 'evening' }
                ],
                default: 'night'
              }
            }
          }
        },

        // Stage 5: Advanced multi-dimensional aggregations
        {
          $facet: {
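            // Each $facet sub-pipeline below runs over the same filtered documents,
            // returning one result document containing every dashboard section in a single round trip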
            // Overall metrics for the time period
            overallMetrics: [
              {
                $group: {
                  _id: null,
                  totalRevenue: { $sum: '$total_amount' },
                  totalTransactions: { $sum: 1 },
                  totalUnits: { $sum: '$quantity' },
                  uniqueCustomers: { $addToSet: '$customer_id' },
                  totalProfit: { $sum: '$profit_margin' },
                  avgTransactionValue: { $avg: '$total_amount' },
                  avgOrderSize: { $avg: '$quantity' },
                  totalDiscounts: { $sum: '$discount_amount' },
                  totalTax: { $sum: '$tax_amount' },

                  // Advanced statistical metrics
                  revenueStdDev: { $stdDevSamp: '$total_amount' },
                  transactionValuePercentiles: {
                    $push: '$total_amount'
                  }
                }
              },
              {
                $addFields: {
                  uniqueCustomerCount: { $size: '$uniqueCustomers' },
                  avgRevenuePerCustomer: {
                    $divide: ['$totalRevenue', { $size: '$uniqueCustomers' }]
                  },
                  profitMargin: {
                    $multiply: [
                      { $divide: ['$totalProfit', '$totalRevenue'] },
                      100
                    ]
                  },
                  discountRate: {
                    $multiply: [
                      { $divide: ['$totalDiscounts', '$totalRevenue'] },
                      100
                    ]
                  }
                }
              }
            ],

            // Time-based trend analysis (hourly breakdown)
            hourlyTrends: [
              {
                $group: {
                  _id: '$transaction_hour',
                  revenue: { $sum: '$total_amount' },
                  transactions: { $sum: 1 },
                  uniqueCustomers: { $addToSet: '$customer_id' },
                  avgTransactionValue: { $avg: '$total_amount' },
                  profit: { $sum: '$profit_margin' }
                }
              },
              {
                $addFields: {
                  uniqueCustomerCount: { $size: '$uniqueCustomers' },
                  hour: '$_id'
                }
              },
              {
                $sort: { _id: 1 }
              }
            ],

            // Regional performance analysis
            regionalPerformance: [
              {
                $group: {
                  _id: '$region',
                  revenue: { $sum: '$total_amount' },
                  transactions: { $sum: 1 },
                  uniqueCustomers: { $addToSet: '$customer_id' },
                  profit: { $sum: '$profit_margin' },
                  avgTransactionValue: { $avg: '$total_amount' },
                  topPaymentMethods: {
                    $push: '$payment_method'
                  }
                }
              },
              {
                $addFields: {
                  uniqueCustomerCount: { $size: '$uniqueCustomers' },
                  region: '$_id',
                  profitMargin: {
                    $multiply: [{ $divide: ['$profit', '$revenue'] }, 100]
                  }
                }
              },
              {
                $sort: { revenue: -1 }
              }
            ],

            // Product category analysis with advanced metrics
            categoryAnalysis: [
              {
                $group: {
                  _id: {
                    category: '$product.category',
                    subcategory: '$product.subcategory'
                  },
                  revenue: { $sum: '$total_amount' },
                  transactions: { $sum: 1 },
                  totalUnits: { $sum: '$quantity' },
                  profit: { $sum: '$profit_margin' },
                  avgUnitPrice: { $avg: '$unit_price' },
                  uniqueProducts: { $addToSet: '$product_id' },
                  brands: { $addToSet: '$product.brand' }
                }
              },
              {
                $addFields: {
                  category: '$_id.category',
                  subcategory: '$_id.subcategory',
                  uniqueProductCount: { $size: '$uniqueProducts' },
                  uniqueBrandCount: { $size: '$brands' },
                  profitMargin: {
                    $multiply: [{ $divide: ['$profit', '$revenue'] }, 100]
                  },
                  revenuePerProduct: {
                    $divide: ['$revenue', { $size: '$uniqueProducts' }]
                  }
                }
              },
              {
                $sort: { revenue: -1 }
              }
            ],

            // Customer segment performance
            customerSegmentAnalysis: [
              {
                $group: {
                  _id: '$customer.customer_segment',
                  revenue: { $sum: '$total_amount' },
                  transactions: { $sum: 1 },
                  uniqueCustomers: { $addToSet: '$customer_id' },
                  profit: { $sum: '$profit_margin' },
                  avgTransactionValue: { $avg: '$total_amount' },
                  avgAge: { $avg: '$customer.age' },
                  genderDistribution: { $push: '$customer.gender' }
                }
              },
              {
                $addFields: {
                  segment: '$_id',
                  uniqueCustomerCount: { $size: '$uniqueCustomers' },
                  revenuePerCustomer: {
                    $divide: ['$revenue', { $size: '$uniqueCustomers' }]
                  },
                  transactionsPerCustomer: {
                    $divide: ['$transactions', { $size: '$uniqueCustomers' }]
                  }
                }
              },
              {
                $sort: { revenuePerCustomer: -1 }
              }
            ],

            // Payment method and channel analysis
            paymentChannelAnalysis: [
              {
                $group: {
                  _id: {
                    paymentMethod: '$payment_method',
                    salesChannel: '$sales_channel'
                  },
                  revenue: { $sum: '$total_amount' },
                  transactions: { $sum: 1 },
                  avgTransactionValue: { $avg: '$total_amount' },
                  profit: { $sum: '$profit_margin' }
                }
              },
              {
                $addFields: {
                  paymentMethod: '$_id.paymentMethod',
                  salesChannel: '$_id.salesChannel'
                }
              },
              {
                $sort: { revenue: -1 }
              }
            ],

            // Time-of-day and weekend analysis
            temporalAnalysis: [
              {
                $group: {
                  _id: {
                    timeOfDay: '$time_of_day',
                    isWeekend: '$is_weekend'
                  },
                  revenue: { $sum: '$total_amount' },
                  transactions: { $sum: 1 },
                  avgTransactionValue: { $avg: '$total_amount' },
                  uniqueCustomers: { $addToSet: '$customer_id' }
                }
              },
              {
                $addFields: {
                  timeOfDay: '$_id.timeOfDay',
                  isWeekend: '$_id.isWeekend',
                  uniqueCustomerCount: { $size: '$uniqueCustomers' }
                }
              },
              {
                $sort: { revenue: -1 }
              }
            ]
          }
        }
      ];

      // Execute the aggregation pipeline
      const dashboardResults = await this.collections.salesTransactions
        .aggregate(dashboardPipeline, {
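          // allowDiskUse lets large group/sort stages spill to disk; the hint steers the planner to the time/region index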
          allowDiskUse: true,
          hint: { transaction_date: 1, region: 1 }
        })
        .toArray();

      // Process and enrich the results
      const enrichedResults = await this.enrichDashboardResults(dashboardResults[0], timeWindow);

      // Cache the results for performance
      await this.cacheDashboardResults(enrichedResults, timeWindow, filters);

      // Update dashboard metrics
      await this.updateDashboardMetrics(enrichedResults);

      return enrichedResults;

    } catch (error) {
      console.error('Error generating real-time sales dashboard:', error);
      throw error;
    }
  }

  async enrichDashboardResults(results, timeWindow) {
    console.log('Enriching dashboard results with advanced analytics...');

    try {
      const overallMetrics = results.overallMetrics[0] || {};

      // Calculate percentiles for transaction values
      if (overallMetrics.transactionValuePercentiles) {
        const sortedValues = overallMetrics.transactionValuePercentiles.sort((a, b) => a - b);

        overallMetrics.percentiles = {
          p25: this.calculatePercentile(sortedValues, 25),
          p50: this.calculatePercentile(sortedValues, 50),
          p75: this.calculatePercentile(sortedValues, 75),
          p90: this.calculatePercentile(sortedValues, 90),
          p95: this.calculatePercentile(sortedValues, 95)
        };

        delete overallMetrics.transactionValuePercentiles; // Remove raw data
      }

      // Add growth calculations (comparing with previous period)
      const previousPeriodMetrics = await this.getPreviousPeriodMetrics(timeWindow);
      if (previousPeriodMetrics) {
        overallMetrics.growth = {
          revenueGrowth: this.calculateGrowthRate(overallMetrics.totalRevenue, previousPeriodMetrics.totalRevenue),
          transactionGrowth: this.calculateGrowthRate(overallMetrics.totalTransactions, previousPeriodMetrics.totalTransactions),
          customerGrowth: this.calculateGrowthRate(overallMetrics.uniqueCustomerCount, previousPeriodMetrics.uniqueCustomerCount),
          avgValueGrowth: this.calculateGrowthRate(overallMetrics.avgTransactionValue, previousPeriodMetrics.avgTransactionValue)
        };
      }

      // Add revenue distribution analysis
      if (results.regionalPerformance) {
        const totalRevenue = overallMetrics.totalRevenue || 0;
        results.regionalPerformance = results.regionalPerformance.map(region => ({
          ...region,
          revenueShare: (region.revenue / totalRevenue * 100).toFixed(2),
          customerDensity: (region.uniqueCustomerCount / region.transactions * 100).toFixed(2)
        }));
      }

      // Add category performance rankings
      if (results.categoryAnalysis) {
        results.categoryAnalysis = results.categoryAnalysis.map((category, index) => ({
          ...category,
          rank: index + 1,
          performanceScore: this.calculateCategoryPerformanceScore(category)
        }));
      }

      // Add temporal insights
      if (results.temporalAnalysis) {
        results.temporalAnalysis = results.temporalAnalysis.map(period => ({
          ...period,
          efficiency: (period.revenue / period.transactions).toFixed(2),
          customerEngagement: (period.uniqueCustomerCount / period.transactions * 100).toFixed(2)
        }));
      }

      // Add dashboard metadata
      const enrichedResults = {
        ...results,
        metadata: {
          timeWindow: timeWindow,
          generatedAt: new Date(),
          dataFreshness: this.calculateDataFreshness(),
          performanceMetrics: {
            queryExecutionTime: Date.now() - this.queryStartTime,
            dataPoints: overallMetrics.totalTransactions,
            cacheStatus: 'fresh'
          }
        },
        overallMetrics: overallMetrics
      };

      return enrichedResults;

    } catch (error) {
      console.error('Error enriching dashboard results:', error);
      throw error;
    }
  }

  calculatePercentile(sortedArray, percentile) {
    const index = (percentile / 100) * (sortedArray.length - 1);
    const lower = Math.floor(index);
    const upper = Math.ceil(index);
    const weight = index % 1;

    return (sortedArray[lower] * (1 - weight) + sortedArray[upper] * weight).toFixed(2);
  }

  calculateGrowthRate(current, previous) {
    if (!previous || previous === 0) return null;
    return (((current - previous) / previous) * 100).toFixed(2);
  }

  calculateCategoryPerformanceScore(category) {
    // Weighted scoring based on revenue, profit margin, and transaction volume
    const revenueScore = Math.min(category.revenue / 10000, 100); // Scale revenue
    const profitScore = Math.max(0, Math.min(category.profitMargin || 0, 100));
    const volumeScore = Math.min(category.transactions / 100, 100);

    return ((revenueScore * 0.5) + (profitScore * 0.3) + (volumeScore * 0.2)).toFixed(2);
  }

  buildDynamicFilters(filters) {
    const mongoFilters = {};

    if (filters.regions && filters.regions.length > 0) {
      mongoFilters.region = { $in: filters.regions };
    }

    if (filters.categories && filters.categories.length > 0) {
      mongoFilters.product_category = { $in: filters.categories };
    }

    if (filters.paymentMethods && filters.paymentMethods.length > 0) {
      mongoFilters.payment_method = { $in: filters.paymentMethods };
    }

    if (filters.minAmount || filters.maxAmount) {
      mongoFilters.total_amount = {};
      if (filters.minAmount) mongoFilters.total_amount.$gte = filters.minAmount;
      if (filters.maxAmount) mongoFilters.total_amount.$lte = filters.maxAmount;
    }

    return mongoFilters;
  }

  calculateTimeRange(timeWindow) {
    const endDate = new Date();
    let startDate = new Date();

    switch (timeWindow) {
      case '1h':
        startDate.setHours(endDate.getHours() - 1);
        break;
      case '6h':
        startDate.setHours(endDate.getHours() - 6);
        break;
      case '24h':
        startDate.setDate(endDate.getDate() - 1);
        break;
      case '7d':
        startDate.setDate(endDate.getDate() - 7);
        break;
      case '30d':
        startDate.setDate(endDate.getDate() - 30);
        break;
      default:
        startDate.setDate(endDate.getDate() - 1);
    }

    return { startDate, endDate };
  }

  async generateCustomerLifetimeValueAnalysis() {
    console.log('Generating advanced customer lifetime value analysis...');

    try {
      const clvAnalysisPipeline = [
        // Stage 1: Join transactions with customer data
        {
          $lookup: {
            from: 'customers',
            localField: 'customer_id',
            foreignField: '_id',
            as: 'customer'
          }
        },

        // Stage 2: Flatten customer data
        {
          $addFields: {
            customer: { $arrayElemAt: ['$customer', 0] }
          }
        },

        // Stage 3: Calculate customer metrics
        {
          $group: {
            _id: '$customer_id',
            customerInfo: { $first: '$customer' },
            firstPurchase: { $min: '$transaction_date' },
            lastPurchase: { $max: '$transaction_date' },
            totalRevenue: { $sum: '$total_amount' },
            totalProfit: { 
              $sum: { 
                $multiply: [
                  { $subtract: ['$unit_price', { $ifNull: ['$unit_cost', 0] }] },
                  '$quantity'
                ]
              }
            },
            totalTransactions: { $sum: 1 },
            totalUnits: { $sum: '$quantity' },
            avgOrderValue: { $avg: '$total_amount' },
            purchaseFrequency: { $sum: 1 },
            categories: { $addToSet: '$product_category' },
            paymentMethods: { $push: '$payment_method' },
            channels: { $addToSet: '$sales_channel' }
          }
        },

        // Stage 4: Calculate advanced CLV metrics
        {
          $addFields: {
            customerLifespanDays: {
              $divide: [
                { $subtract: ['$lastPurchase', '$firstPurchase'] },
                1000 * 60 * 60 * 24
              ]
            },
            avgDaysBetweenPurchases: {
              $cond: {
                if: { $gt: ['$totalTransactions', 1] },
                then: {
                  $divide: [
                    { $divide: [
                      { $subtract: ['$lastPurchase', '$firstPurchase'] },
                      1000 * 60 * 60 * 24
                    ]},
                    { $subtract: ['$totalTransactions', 1] }
                  ]
                },
                else: null
              }
            },
            categoryDiversity: { $size: '$categories' },
            channelDiversity: { $size: '$channels' },
            profitMargin: {
              $multiply: [
                { $divide: ['$totalProfit', '$totalRevenue'] },
                100
              ]
            }
          }
        },

        // Stage 5: Calculate predicted CLV (simplified model)
        {
          $addFields: {
            predictedMonthlyValue: {
              $cond: {
                if: { $and: [
                  { $gt: ['$avgDaysBetweenPurchases', 0] },
                  { $lte: ['$avgDaysBetweenPurchases', 365] }
                ]},
                then: {
                  $multiply: [
                    '$avgOrderValue',
                    { $divide: [30, '$avgDaysBetweenPurchases'] }
                  ]
                },
                else: 0
              }
            },
            predictedAnnualValue: {
              $cond: {
                if: { $and: [
                  { $gt: ['$avgDaysBetweenPurchases', 0] },
                  { $lte: ['$avgDaysBetweenPurchases', 365] }
                ]},
                then: {
                  $multiply: [
                    '$avgOrderValue',
                    { $divide: [365, '$avgDaysBetweenPurchases'] }
                  ]
                },
                else: '$totalRevenue'
              }
            }
          }
        },

        // Stage 6: Customer segmentation
        {
          $addFields: {
            valueSegment: {
              $switch: {
                branches: [
                  { case: { $gte: ['$totalRevenue', 5000] }, then: 'high_value' },
                  { case: { $gte: ['$totalRevenue', 1000] }, then: 'medium_value' },
                  { case: { $gte: ['$totalRevenue', 100] }, then: 'low_value' }
                ],
                default: 'minimal_value'
              }
            },
            frequencySegment: {
              $switch: {
                branches: [
                  { case: { $gte: ['$totalTransactions', 20] }, then: 'very_frequent' },
                  { case: { $gte: ['$totalTransactions', 10] }, then: 'frequent' },
                  { case: { $gte: ['$totalTransactions', 5] }, then: 'occasional' }
                ],
                default: 'rare'
              }
            },
            recencySegment: {
              $switch: {
                branches: [
                  { 
                    case: { 
                      $gte: [
                        '$lastPurchase',
                        { $subtract: [new Date(), 30 * 24 * 60 * 60 * 1000] }
                      ]
                    },
                    then: 'recent'
                  },
                  {
                    case: {
                      $gte: [
                        '$lastPurchase',
                        { $subtract: [new Date(), 90 * 24 * 60 * 60 * 1000] }
                      ]
                    },
                    then: 'moderate'
                  }
                ],
                default: 'dormant'
              }
            }
          }
        },

        // Stage 7: Final CLV calculation and risk assessment
        {
          $addFields: {
            rfmScore: {
              $add: [
                {
                  $switch: {
                    branches: [
                      { case: { $eq: ['$recencySegment', 'recent'] }, then: 4 },
                      { case: { $eq: ['$recencySegment', 'moderate'] }, then: 2 }
                    ],
                    default: 1
                  }
                },
                {
                  $switch: {
                    branches: [
                      { case: { $eq: ['$frequencySegment', 'very_frequent'] }, then: 4 },
                      { case: { $eq: ['$frequencySegment', 'frequent'] }, then: 3 },
                      { case: { $eq: ['$frequencySegment', 'occasional'] }, then: 2 }
                    ],
                    default: 1
                  }
                },
                {
                  $switch: {
                    branches: [
                      { case: { $eq: ['$valueSegment', 'high_value'] }, then: 4 },
                      { case: { $eq: ['$valueSegment', 'medium_value'] }, then: 3 },
                      { case: { $eq: ['$valueSegment', 'low_value'] }, then: 2 }
                    ],
                    default: 1
                  }
                }
              ]
            },
            churnRisk: {
              $switch: {
                branches: [
                  {
                    case: {
                      $and: [
                        { $eq: ['$recencySegment', 'dormant'] },
                        { $lt: ['$avgDaysBetweenPurchases', 60] }
                      ]
                    },
                    then: 'high'
                  },
                  {
                    case: {
                      $and: [
                        { $eq: ['$recencySegment', 'moderate'] },
                        { $gt: ['$avgDaysBetweenPurchases', 30] }
                      ]
                    },
                    then: 'medium'
                  }
                ],
                default: 'low'
              }
            }
          }
        },

        // Stage 8: Sort by predicted annual value
        {
          $sort: { predictedAnnualValue: -1, totalRevenue: -1 }
        }
      ];

      const clvResults = await this.collections.salesTransactions
        .aggregate(clvAnalysisPipeline, { allowDiskUse: true })
        .toArray();

      return {
        customerAnalysis: clvResults,
        summary: await this.generateCLVSummary(clvResults),
        generatedAt: new Date()
      };

    } catch (error) {
      console.error('Error generating CLV analysis:', error);
      throw error;
    }
  }

  async generateCLVSummary(clvResults) {
    const totalCustomers = clvResults.length;
    const totalValue = clvResults.reduce((sum, customer) => sum + customer.totalRevenue, 0);
    const totalPredictedValue = clvResults.reduce((sum, customer) => sum + (customer.predictedAnnualValue || 0), 0);

    return {
      totalCustomers,
      totalHistoricalValue: totalValue,
      totalPredictedAnnualValue: totalPredictedValue,
      averageCustomerValue: totalValue / totalCustomers,
      averagePredictedValue: totalPredictedValue / totalCustomers,
      segmentBreakdown: {
        highValue: clvResults.filter(c => c.valueSegment === 'high_value').length,
        mediumValue: clvResults.filter(c => c.valueSegment === 'medium_value').length,
        lowValue: clvResults.filter(c => c.valueSegment === 'low_value').length,
        minimalValue: clvResults.filter(c => c.valueSegment === 'minimal_value').length
      },
      churnRiskDistribution: {
        high: clvResults.filter(c => c.churnRisk === 'high').length,
        medium: clvResults.filter(c => c.churnRisk === 'medium').length,
        low: clvResults.filter(c => c.churnRisk === 'low').length
      },
      topPerformers: clvResults.slice(0, 10).map(customer => ({
        customerId: customer._id,
        totalRevenue: customer.totalRevenue,
        predictedAnnualValue: customer.predictedAnnualValue,
        rfmScore: customer.rfmScore,
        segment: customer.valueSegment
      }))
    };
  }

  async cacheDashboardResults(results, timeWindow, filters) {
    try {
      const cacheKey = `dashboard_${timeWindow}_${JSON.stringify(filters)}`;

      await this.collections.analyticsCache.replaceOne(
        { cacheKey },
        {
          cacheKey,
          results,
          createdAt: new Date(),
          expiresAt: new Date(Date.now() + this.config.cacheExpiration)
        },
        { upsert: true }
      );
    } catch (error) {
      console.warn('Error caching dashboard results:', error.message);
    }
  }

  async getPreviousPeriodMetrics(timeWindow) {
    try {
      // Calculate previous period time range
      const previousTimeRange = this.calculatePreviousPeriodRange(timeWindow);

      const previousMetrics = await this.collections.salesTransactions.aggregate([
        {
          $match: {
            transaction_date: {
              $gte: previousTimeRange.startDate,
              $lte: previousTimeRange.endDate
            }
          }
        },
        {
          $group: {
            _id: null,
            totalRevenue: { $sum: '$total_amount' },
            totalTransactions: { $sum: 1 },
            uniqueCustomers: { $addToSet: '$customer_id' },
            avgTransactionValue: { $avg: '$total_amount' }
          }
        },
        {
          $addFields: {
            uniqueCustomerCount: { $size: '$uniqueCustomers' }
          }
        }
      ]).toArray();

      return previousMetrics[0] || null;

    } catch (error) {
      console.warn('Error getting previous period metrics:', error.message);
      return null;
    }
  }

  calculatePreviousPeriodRange(timeWindow) {
    const currentEndDate = new Date();
    let currentStartDate = new Date();

    // Calculate current period duration
    switch (timeWindow) {
      case '1h':
        currentStartDate.setHours(currentEndDate.getHours() - 1);
        break;
      case '6h':
        currentStartDate.setHours(currentEndDate.getHours() - 6);
        break;
      case '24h':
        currentStartDate.setDate(currentEndDate.getDate() - 1);
        break;
      case '7d':
        currentStartDate.setDate(currentEndDate.getDate() - 7);
        break;
      case '30d':
        currentStartDate.setDate(currentEndDate.getDate() - 30);
        break;
      default:
        currentStartDate.setDate(currentEndDate.getDate() - 1);
    }

    // Calculate previous period (same duration, preceding the current period)
    const periodDuration = currentEndDate.getTime() - currentStartDate.getTime();
    const previousEndDate = new Date(currentStartDate.getTime());
    const previousStartDate = new Date(currentStartDate.getTime() - periodDuration);

    return { startDate: previousStartDate, endDate: previousEndDate };
  }

  calculateDataFreshness() {
    // Calculate how fresh the data is based on the latest transaction
    const now = new Date();
    // This would typically query for the latest transaction timestamp
    // For demo purposes, assuming data is fresh within the last 5 minutes
    return 'fresh'; // 'fresh', 'stale', 'outdated'
  }

  async updateDashboardMetrics(results) {
    try {
      await this.collections.dashboardMetrics.insertOne({
        timestamp: new Date(),
        metrics: {
          totalRevenue: results.overallMetrics?.totalRevenue || 0,
          totalTransactions: results.overallMetrics?.totalTransactions || 0,
          uniqueCustomers: results.overallMetrics?.uniqueCustomerCount || 0,
          avgTransactionValue: results.overallMetrics?.avgTransactionValue || 0
        },
        performance: results.metadata?.performanceMetrics || {}
      });

      // Cleanup old metrics (keep last 1000 entries)
      const totalCount = await this.collections.dashboardMetrics.countDocuments();
      if (totalCount > 1000) {
        const oldestRecords = await this.collections.dashboardMetrics
          .find()
          .sort({ timestamp: 1 })
          .limit(totalCount - 1000)
          .toArray();

        const idsToDelete = oldestRecords.map(record => record._id);
        await this.collections.dashboardMetrics.deleteMany({
          _id: { $in: idsToDelete }
        });
      }
    } catch (error) {
      console.warn('Error updating dashboard metrics:', error.message);
    }
  }
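  // Hedged alternative (not part of the original design): if dashboardMetrics were
  // created as a capped collection, MongoDB would discard the oldest documents
  // automatically and the manual countDocuments/deleteMany cleanup above would
  // not be needed. A minimal provisioning sketch, with an assumed collection name:
  //
  //   await this.db.createCollection('dashboard_metrics', {
  //     capped: true,
  //     size: 10 * 1024 * 1024, // 10 MB storage ceiling for the collection
  //     max: 1000               // retain at most the 1000 most recent documents
  //   });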

  async setupRealTimeProcessing() {
    if (!this.config.enableRealTimeUpdates) return;

    console.log('Setting up real-time dashboard processing...');

    // Setup interval for dashboard updates
    setInterval(async () => {
      try {
        // Generate fresh dashboard data
        const dashboardData = await this.generateRealtimeSalesDashboard('1h');

        // Emit real-time updates (in a real implementation, this would push to connected clients)
        this.emit('dashboardUpdate', {
          timestamp: new Date(),
          data: dashboardData
        });

        this.dashboardState.lastUpdate = new Date();

      } catch (error) {
        console.error('Error in real-time processing:', error);
      }
    }, this.config.updateInterval);
  }

  async setupAnalyticsCache() {
    console.log('Setting up analytics caching layer...');

    try {
      // Create TTL index for cache expiration
      await this.collections.analyticsCache.createIndex(
        { expiresAt: 1 },
        { expireAfterSeconds: 0, background: true }
      );

      console.log('Analytics cache configured successfully');

    } catch (error) {
      console.error('Error setting up analytics cache:', error);
      throw error;
    }
  }

  async setupPerformanceMonitoring() {
    console.log('Setting up performance monitoring...');

    setInterval(async () => {
      try {
        // Monitor query performance
        const queryStats = await this.db.stats();

        // Update processing statistics
        this.dashboardState.processingStats.totalQueries++;

        // Log performance metrics
        console.log('Dashboard performance metrics:', {
          activeConnections: this.dashboardState.activeConnections,
          avgResponseTime: this.dashboardState.processingStats.avgResponseTime,
          cacheHitRate: this.dashboardState.processingStats.cacheHitRate,
          dbStats: {
            collections: queryStats.collections,
            dataSize: queryStats.dataSize,
            indexSize: queryStats.indexSize
          }
        });

      } catch (error) {
        console.warn('Error in performance monitoring:', error);
      }
    }, 60000); // Every minute
  }
}

// Benefits of MongoDB Advanced Real-Time Analytics:
// - Real-time dashboard updates with minimal latency through change streams integration
// - Complex multi-dimensional aggregations with advanced statistical calculations
// - Flexible data transformation and enrichment during query execution
// - Sophisticated customer segmentation and lifetime value analysis
// - Built-in performance optimization with intelligent caching strategies
// - Scalable architecture supporting high-volume analytics workloads
// - Native aggregation framework execution without separate ETL or analytics infrastructure
// - Advanced temporal analysis with time-series data processing capabilities
// - Comprehensive business intelligence with predictive analytics support
// - SQL-familiar analytics operations through QueryLeaf integration

module.exports = {
  RealtimeAnalyticsDashboard
};

Understanding MongoDB Analytics Architecture

Advanced Aggregation Pipeline Design and Performance Optimization

Implement sophisticated analytics patterns for production MongoDB deployments:

// Production-ready MongoDB analytics with enterprise-grade optimization
class EnterpriseAnalyticsPlatform extends RealtimeAnalyticsDashboard {
  constructor(db, enterpriseConfig) {
    super(db, enterpriseConfig);

    this.enterpriseConfig = {
      ...enterpriseConfig,
      enableDistributedAnalytics: true,
      enableMachineLearning: true,
      enablePredictiveModeling: true,
      enableDataGovernance: true,
      enableComplianceReporting: true,
      enableAdvancedVisualization: true
    };

    this.setupEnterpriseFeatures();
    this.initializeMLPipelines();
    this.setupDataGovernance();
  }

  async implementAdvancedTimeSeriesAnalytics() {
    console.log('Implementing advanced time series analytics with forecasting...');

    const timeSeriesConfig = {
      // Time series aggregation strategies
      temporalAggregation: {
        enableSeasonalityDetection: true,
        enableTrendAnalysis: true,
        enableAnomalyDetection: true,
        forecastHorizon: 30 // days
      },

      // Statistical modeling
      statisticalModeling: {
        enableMovingAverages: true,
        enableExponentialSmoothing: true,
        enableRegressionAnalysis: true,
        confidenceIntervals: true
      },

      // Performance optimization
      performanceOptimization: {
        enableTimeSeriesCollections: true,
        optimizedIndexes: true,
        compressionStrategies: true,
        partitioningSchemes: true
      }
    };

    return await this.deployTimeSeriesAnalytics(timeSeriesConfig);
  }

  async setupAdvancedMLPipelines() {
    console.log('Setting up machine learning pipelines for predictive analytics...');

    const mlPipelineConfig = {
      // Customer behavior prediction
      customerBehaviorML: {
        churnPredictionModel: true,
        clvPredictionModel: true,
        recommendationEngine: true,
        segmentationOptimization: true
      },

      // Sales forecasting
      salesForecastingML: {
        demandForecasting: true,
        inventoryOptimization: true,
        priceOptimization: true,
        seasonalityModeling: true
      },

      // Real-time decision making
      realTimeDecisionEngine: {
        dynamicPricing: true,
        inventoryAlerts: true,
        customerTargeting: true,
        fraudDetection: true
      }
    };

    return await this.implementMLPipelines(mlPipelineConfig);
  }
}
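
The timeSeriesConfig above only flags capabilities such as enableTimeSeriesCollections; the MongoDB-side setup is not shown in the class itself. The following minimal sketch shows what that option could translate to on MongoDB 5.0 or newer. The collection name sales_metrics_ts, its field names, and the 90-day retention period are illustrative assumptions rather than part of the platform class above.

// Hypothetical provisioning helper for a time series collection (assumed names)
async function createSalesTimeSeriesCollection(db) {
  await db.createCollection('sales_metrics_ts', {
    timeseries: {
      timeField: 'timestamp',   // when each measurement was recorded
      metaField: 'dimensions',  // e.g. { region, category, channel } tags
      granularity: 'minutes'    // bucketing hint matching the ingest rate
    },
    expireAfterSeconds: 60 * 60 * 24 * 90 // retain roughly 90 days of measurements
  });

  // Secondary index on a metadata subfield to support dimensional queries
  await db.collection('sales_metrics_ts').createIndex({ 'dimensions.region': 1 });
}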

SQL-Style Analytics with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB analytics and dashboard operations:

-- QueryLeaf advanced real-time analytics with SQL-familiar syntax for MongoDB

-- Configure comprehensive analytics dashboard with real-time updates
CONFIGURE ANALYTICS_DASHBOARD
SET real_time_updates = true,
    update_interval_seconds = 30,
    cache_expiration_minutes = 5,
    enable_predictive_analytics = true,
    enable_advanced_metrics = true,
    enable_cohort_analysis = true,
    time_windows = ['1h', '6h', '24h', '7d', '30d'],
    dimensions = ['region', 'category', 'channel', 'segment'];

-- Advanced real-time sales dashboard with comprehensive metrics and analytics
WITH sales_analytics AS (
  -- Primary transaction data with enriched customer and product information
  SELECT 
    st.transaction_id,
    st.transaction_date,
    st.customer_id,
    st.product_id,
    st.total_amount,
    st.quantity,
    st.unit_price,
    st.discount_amount,
    st.tax_amount,
    st.payment_method,
    st.sales_channel,
    st.region,

    -- Customer enrichment
    c.customer_segment,
    c.age,
    c.gender,
    c.city,
    c.state,
    c.registration_date,
    c.lifetime_value,

    -- Product enrichment
    p.category,
    p.subcategory,
    p.brand,
    p.unit_cost,
    p.list_price,
    p.margin_percent,

    -- Calculated fields for analytics
    (st.unit_price - p.unit_cost) * st.quantity as profit_margin,
    st.total_amount - st.discount_amount - st.tax_amount as net_revenue,

    -- Time-based dimensions
    DATE_TRUNC('hour', st.transaction_date) as transaction_hour,
    DATE_TRUNC('day', st.transaction_date) as transaction_day,
    EXTRACT(hour FROM st.transaction_date) as hour_of_day,
    EXTRACT(dow FROM st.transaction_date) as day_of_week,
    EXTRACT(dow FROM st.transaction_date) IN (0, 6) as is_weekend,

    -- Time categorization
    CASE 
      WHEN EXTRACT(hour FROM st.transaction_date) BETWEEN 6 AND 11 THEN 'morning'
      WHEN EXTRACT(hour FROM st.transaction_date) BETWEEN 12 AND 17 THEN 'afternoon'  
      WHEN EXTRACT(hour FROM st.transaction_date) BETWEEN 18 AND 21 THEN 'evening'
      ELSE 'night'
    END as time_of_day_category,

    -- Customer lifecycle stage
    CASE 
      WHEN st.transaction_date - c.registration_date <= INTERVAL '30 days' THEN 'new_customer'
      WHEN st.transaction_date - c.registration_date <= INTERVAL '365 days' THEN 'established_customer'
      ELSE 'loyal_customer'
    END as customer_lifecycle_stage

  FROM sales_transactions st
  JOIN customers c ON st.customer_id = c.customer_id
  JOIN products p ON st.product_id = p.product_id
  WHERE st.transaction_date >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
),

-- Overall metrics with advanced statistical calculations
overall_metrics AS (
  SELECT 
    -- Basic volume metrics
    COUNT(*) as total_transactions,
    COUNT(DISTINCT customer_id) as unique_customers,
    SUM(total_amount) as total_revenue,
    SUM(net_revenue) as total_net_revenue,
    SUM(quantity) as total_units_sold,
    SUM(profit_margin) as total_profit,
    SUM(discount_amount) as total_discounts,
    SUM(tax_amount) as total_tax,

    -- Advanced statistical metrics
    AVG(total_amount) as avg_transaction_value,
    STDDEV(total_amount) as transaction_value_stddev,
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY total_amount) as q1_transaction_value,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_amount) as median_transaction_value,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY total_amount) as q3_transaction_value,
    PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY total_amount) as p90_transaction_value,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_amount) as p95_transaction_value,

    -- Derived metrics
    AVG(quantity) as avg_order_size,
    AVG(profit_margin) as avg_profit_per_transaction,
    AVG(total_amount / NULLIF(quantity, 0)) as avg_price_per_unit,

    -- Efficiency metrics
    SUM(total_amount) / COUNT(DISTINCT customer_id) as revenue_per_customer,
    COUNT(*) / COUNT(DISTINCT customer_id) as transactions_per_customer,
    SUM(profit_margin) / SUM(total_amount) * 100 as overall_profit_margin_percent,
    SUM(discount_amount) / SUM(total_amount) * 100 as overall_discount_rate_percent,

    -- Time-based metrics
    MIN(transaction_date) as earliest_transaction,
    MAX(transaction_date) as latest_transaction,
    COUNT(DISTINCT transaction_hour) as active_hours,
    COUNT(DISTINCT transaction_day) as active_days

  FROM sales_analytics
),

-- Temporal trend analysis with pattern detection
temporal_trends AS (
  SELECT 
    transaction_hour,

    -- Hourly volume metrics
    COUNT(*) as hourly_transactions,
    COUNT(DISTINCT customer_id) as hourly_unique_customers,
    SUM(total_amount) as hourly_revenue,
    SUM(profit_margin) as hourly_profit,
    AVG(total_amount) as hourly_avg_transaction_value,
    SUM(quantity) as hourly_units_sold,

    -- Hour-over-hour growth calculations
    LAG(SUM(total_amount)) OVER (ORDER BY transaction_hour) as prev_hour_revenue,
    LAG(COUNT(*)) OVER (ORDER BY transaction_hour) as prev_hour_transactions,

    -- Moving averages for trend smoothing
    AVG(SUM(total_amount)) OVER (
      ORDER BY transaction_hour 
      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) as revenue_3h_moving_avg,

    AVG(COUNT(*)) OVER (
      ORDER BY transaction_hour 
      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW  
    ) as transactions_3h_moving_avg,

    -- Peak detection
    RANK() OVER (ORDER BY SUM(total_amount) DESC) as revenue_rank,
    RANK() OVER (ORDER BY COUNT(*) DESC) as transaction_rank,

    -- Performance classification
    CASE 
      WHEN SUM(total_amount) > AVG(SUM(total_amount)) OVER () * 1.5 THEN 'peak'
      WHEN SUM(total_amount) > AVG(SUM(total_amount)) OVER () THEN 'above_average'
      WHEN SUM(total_amount) > AVG(SUM(total_amount)) OVER () * 0.5 THEN 'below_average'
      ELSE 'low'
    END as performance_tier

  FROM sales_analytics
  GROUP BY transaction_hour
  ORDER BY transaction_hour
),

-- Regional performance analysis with competitive ranking
regional_performance AS (
  SELECT 
    region,

    -- Regional volume metrics
    COUNT(*) as region_transactions,
    COUNT(DISTINCT customer_id) as region_unique_customers,
    COUNT(DISTINCT product_id) as region_unique_products,
    SUM(total_amount) as region_revenue,
    SUM(profit_margin) as region_profit,
    SUM(quantity) as region_units_sold,

    -- Regional efficiency metrics
    AVG(total_amount) as region_avg_transaction_value,
    SUM(total_amount) / COUNT(DISTINCT customer_id) as region_revenue_per_customer,
    COUNT(*) / COUNT(DISTINCT customer_id) as region_frequency_per_customer,
    SUM(profit_margin) / SUM(total_amount) * 100 as region_profit_margin_percent,

    -- Market share calculations
    SUM(total_amount) / SUM(SUM(total_amount)) OVER () * 100 as region_revenue_share,
    COUNT(*) / SUM(COUNT(*)) OVER () * 100 as region_transaction_share,
    COUNT(DISTINCT customer_id) / SUM(COUNT(DISTINCT customer_id)) OVER () * 100 as region_customer_share,

    -- Regional ranking
    RANK() OVER (ORDER BY SUM(total_amount) DESC) as revenue_rank,
    RANK() OVER (ORDER BY SUM(profit_margin) DESC) as profit_rank,
    RANK() OVER (ORDER BY COUNT(DISTINCT customer_id) DESC) as customer_base_rank,

    -- Customer density and engagement
    COUNT(DISTINCT customer_id) / COUNT(*) * 100 as customer_density_percent,
    AVG(
      CASE WHEN customer_lifecycle_stage = 'new_customer' THEN 1 ELSE 0 END
    ) * 100 as new_customer_percent,

    -- Channel and payment preferences
    MODE() WITHIN GROUP (ORDER BY sales_channel) as dominant_sales_channel,
    MODE() WITHIN GROUP (ORDER BY payment_method) as dominant_payment_method,

    -- Performance indicators
    CASE 
      WHEN SUM(total_amount) / SUM(SUM(total_amount)) OVER () > 0.2 THEN 'market_leader'
      WHEN SUM(total_amount) / SUM(SUM(total_amount)) OVER () > 0.1 THEN 'major_market'
      WHEN SUM(total_amount) / SUM(SUM(total_amount)) OVER () > 0.05 THEN 'secondary_market'
      ELSE 'emerging_market'
    END as market_position

  FROM sales_analytics
  GROUP BY region
),

-- Advanced product category analysis with profitability insights
category_analysis AS (
  SELECT 
    category,
    subcategory,
    brand,

    -- Category performance metrics
    COUNT(*) as category_transactions,
    COUNT(DISTINCT customer_id) as category_customers,
    COUNT(DISTINCT product_id) as category_products,
    SUM(total_amount) as category_revenue,
    SUM(profit_margin) as category_profit,
    SUM(quantity) as category_units,

    -- Category efficiency and profitability
    AVG(total_amount) as category_avg_transaction_value,
    AVG(profit_margin) as category_avg_profit_per_transaction,
    SUM(profit_margin) / SUM(total_amount) * 100 as category_profit_margin_percent,
    AVG(unit_price) as category_avg_unit_price,
    AVG(margin_percent) as category_avg_product_margin,

    -- Market positioning
    SUM(total_amount) / SUM(SUM(total_amount)) OVER () * 100 as category_revenue_share,
    COUNT(*) / SUM(COUNT(*)) OVER () * 100 as category_transaction_share,

    -- Category rankings
    RANK() OVER (ORDER BY SUM(total_amount) DESC) as revenue_rank,
    RANK() OVER (ORDER BY SUM(profit_margin) DESC) as profit_rank,
    RANK() OVER (ORDER BY COUNT(*) DESC) as volume_rank,
    RANK() OVER (ORDER BY SUM(profit_margin) / SUM(total_amount) DESC) as margin_rank,

    -- Customer engagement
    COUNT(DISTINCT customer_id) / COUNT(*) * 100 as customer_diversity_percent,
    SUM(total_amount) / COUNT(DISTINCT customer_id) as revenue_per_customer,
    COUNT(*) / COUNT(DISTINCT customer_id) as repeat_purchase_rate,

    -- Product performance distribution
    AVG(list_price) as category_avg_list_price,
    AVG(unit_cost) as category_avg_unit_cost,
    STDDEV(unit_price) as category_price_variance,

    -- Growth and trend indicators
    SUM(total_amount) FILTER (WHERE transaction_date >= CURRENT_TIMESTAMP - INTERVAL '6 hours') as recent_6h_revenue,
    SUM(total_amount) FILTER (WHERE transaction_date < CURRENT_TIMESTAMP - INTERVAL '6 hours') as earlier_18h_revenue,

    -- Performance classification
    CASE 
      WHEN SUM(profit_margin) / SUM(total_amount) > 0.3 THEN 'high_margin'
      WHEN SUM(profit_margin) / SUM(total_amount) > 0.15 THEN 'medium_margin'
      ELSE 'low_margin'
    END as profitability_tier,

    CASE 
      WHEN SUM(total_amount) / SUM(SUM(total_amount)) OVER () > 0.15 THEN 'star_category'
      WHEN SUM(total_amount) / SUM(SUM(total_amount)) OVER () > 0.05 THEN 'growth_category'
      ELSE 'niche_category'
    END as strategic_category

  FROM sales_analytics
  GROUP BY category, subcategory, brand
),

-- Customer segmentation analysis with behavioral insights
customer_segment_analysis AS (
  SELECT 
    customer_segment,
    customer_lifecycle_stage,

    -- Segment volume metrics
    COUNT(*) as segment_transactions,
    COUNT(DISTINCT customer_id) as segment_customers,
    SUM(total_amount) as segment_revenue,
    SUM(profit_margin) as segment_profit,
    SUM(quantity) as segment_units,

    -- Segment behavior analysis
    AVG(total_amount) as segment_avg_transaction_value,
    SUM(total_amount) / COUNT(DISTINCT customer_id) as segment_revenue_per_customer,
    COUNT(*) / COUNT(DISTINCT customer_id) as segment_transactions_per_customer,
    AVG(age) as segment_avg_age,

    -- Demographic breakdown
    AVG(CASE WHEN gender = 'male' THEN 1 ELSE 0 END) * 100 as male_percentage,
    AVG(CASE WHEN gender = 'female' THEN 1 ELSE 0 END) * 100 as female_percentage,
    COUNT(DISTINCT city) as cities_represented,
    COUNT(DISTINCT state) as states_represented,

    -- Channel preferences  
    AVG(CASE WHEN sales_channel = 'online' THEN 1 ELSE 0 END) * 100 as online_preference_percent,
    AVG(CASE WHEN sales_channel = 'retail' THEN 1 ELSE 0 END) * 100 as retail_preference_percent,
    AVG(CASE WHEN sales_channel = 'mobile' THEN 1 ELSE 0 END) * 100 as mobile_preference_percent,

    -- Payment behavior
    AVG(CASE WHEN payment_method = 'credit_card' THEN 1 ELSE 0 END) * 100 as credit_card_usage_percent,
    AVG(CASE WHEN payment_method = 'digital_wallet' THEN 1 ELSE 0 END) * 100 as digital_wallet_usage_percent,

    -- Temporal behavior
    AVG(CASE WHEN is_weekend THEN 1 ELSE 0 END) * 100 as weekend_shopping_percent,
    MODE() WITHIN GROUP (ORDER BY time_of_day_category) as preferred_shopping_time,

    -- Value and profitability
    SUM(profit_margin) / SUM(total_amount) * 100 as segment_profit_margin_percent,
    AVG(lifetime_value) as segment_avg_lifetime_value,

    -- Segment rankings
    RANK() OVER (ORDER BY SUM(total_amount) DESC) as revenue_rank,
    RANK() OVER (ORDER BY SUM(total_amount) / COUNT(DISTINCT customer_id) DESC) as value_per_customer_rank,
    RANK() OVER (ORDER BY COUNT(*) / COUNT(DISTINCT customer_id) DESC) as engagement_rank,

    -- Segment classification
    CASE 
      WHEN SUM(total_amount) / COUNT(DISTINCT customer_id) > 1000 THEN 'high_value_segment'
      WHEN SUM(total_amount) / COUNT(DISTINCT customer_id) > 500 THEN 'medium_value_segment'
      ELSE 'opportunity_segment'
    END as value_classification

  FROM sales_analytics
  GROUP BY customer_segment, customer_lifecycle_stage
),

-- Payment method and channel effectiveness analysis  
channel_payment_analysis AS (
  SELECT 
    sales_channel,
    payment_method,

    -- Channel-payment combination metrics
    COUNT(*) as combination_transactions,
    SUM(total_amount) as combination_revenue,
    AVG(total_amount) as combination_avg_value,
    SUM(profit_margin) as combination_profit,
    COUNT(DISTINCT customer_id) as combination_customers,

    -- Effectiveness metrics
    SUM(total_amount) / COUNT(DISTINCT customer_id) as revenue_per_customer,
    COUNT(*) / COUNT(DISTINCT customer_id) as transactions_per_customer,
    SUM(profit_margin) / SUM(total_amount) * 100 as combination_profit_margin,

    -- Market share within channel
    SUM(total_amount) / SUM(SUM(total_amount)) OVER (PARTITION BY sales_channel) * 100 as payment_share_in_channel,

    -- Market share within payment method
    SUM(total_amount) / SUM(SUM(total_amount)) OVER (PARTITION BY payment_method) * 100 as channel_share_in_payment,

    -- Overall market share
    SUM(total_amount) / SUM(SUM(total_amount)) OVER () * 100 as overall_market_share,

    -- Customer behavior insights
    AVG(age) as combination_avg_customer_age,
    AVG(CASE WHEN customer_segment = 'premium' THEN 1 ELSE 0 END) * 100 as premium_customer_percent,
    AVG(CASE WHEN is_weekend THEN 1 ELSE 0 END) * 100 as weekend_usage_percent,

    -- Performance ranking
    RANK() OVER (ORDER BY SUM(total_amount) DESC) as revenue_rank,
    RANK() OVER (ORDER BY COUNT(*) DESC) as volume_rank,
    RANK() OVER (ORDER BY SUM(profit_margin) / SUM(total_amount) DESC) as profitability_rank

  FROM sales_analytics  
  GROUP BY sales_channel, payment_method
),

-- Advanced growth and trend analysis
growth_trend_analysis AS (
  SELECT 
    -- Current period metrics (last 6 hours)
    SUM(total_amount) FILTER (WHERE transaction_date >= CURRENT_TIMESTAMP - INTERVAL '6 hours') as current_6h_revenue,
    COUNT(*) FILTER (WHERE transaction_date >= CURRENT_TIMESTAMP - INTERVAL '6 hours') as current_6h_transactions,
    COUNT(DISTINCT customer_id) FILTER (WHERE transaction_date >= CURRENT_TIMESTAMP - INTERVAL '6 hours') as current_6h_customers,

    -- Previous period metrics (6-12 hours ago)
    SUM(total_amount) FILTER (
      WHERE transaction_date >= CURRENT_TIMESTAMP - INTERVAL '12 hours' 
      AND transaction_date < CURRENT_TIMESTAMP - INTERVAL '6 hours'
    ) as previous_6h_revenue,
    COUNT(*) FILTER (
      WHERE transaction_date >= CURRENT_TIMESTAMP - INTERVAL '12 hours' 
      AND transaction_date < CURRENT_TIMESTAMP - INTERVAL '6 hours'
    ) as previous_6h_transactions,
    COUNT(DISTINCT customer_id) FILTER (
      WHERE transaction_date >= CURRENT_TIMESTAMP - INTERVAL '12 hours' 
      AND transaction_date < CURRENT_TIMESTAMP - INTERVAL '6 hours'
    ) as previous_6h_customers,

    -- Earlier period metrics (12-18 hours ago) for trend detection
    SUM(total_amount) FILTER (
      WHERE transaction_date >= CURRENT_TIMESTAMP - INTERVAL '18 hours' 
      AND transaction_date < CURRENT_TIMESTAMP - INTERVAL '12 hours'
    ) as earlier_6h_revenue,
    COUNT(*) FILTER (
      WHERE transaction_date >= CURRENT_TIMESTAMP - INTERVAL '18 hours' 
      AND transaction_date < CURRENT_TIMESTAMP - INTERVAL '12 hours'
    ) as earlier_6h_transactions,

    -- Peak analysis
    MAX(total_amount) as peak_transaction_value,
    MIN(total_amount) as min_transaction_value,
    MODE() WITHIN GROUP (ORDER BY EXTRACT(hour FROM transaction_date)) as peak_hour,
    MODE() WITHIN GROUP (ORDER BY region) as dominant_region,
    MODE() WITHIN GROUP (ORDER BY category) as dominant_category

  FROM sales_analytics
)

-- Final dashboard results with comprehensive analytics
SELECT 
  CURRENT_TIMESTAMP as dashboard_generated_at,

  -- Overall performance summary
  JSON_OBJECT(
    'total_transactions', om.total_transactions,
    'total_revenue', ROUND(om.total_revenue::NUMERIC, 2),
    'total_net_revenue', ROUND(om.total_net_revenue::NUMERIC, 2), 
    'total_profit', ROUND(om.total_profit::NUMERIC, 2),
    'unique_customers', om.unique_customers,
    'avg_transaction_value', ROUND(om.avg_transaction_value::NUMERIC, 2),
    'median_transaction_value', ROUND(om.median_transaction_value::NUMERIC, 2),
    'profit_margin_percent', ROUND((om.total_profit / om.total_revenue * 100)::NUMERIC, 2),
    'discount_rate_percent', ROUND((om.total_discounts / om.total_revenue * 100)::NUMERIC, 2),
    'revenue_per_customer', ROUND(om.revenue_per_customer::NUMERIC, 2),
    'transactions_per_customer', ROUND(om.transactions_per_customer::NUMERIC, 2)
  ) as overall_metrics,

  -- Temporal trends with growth indicators
  (SELECT JSON_AGG(
    JSON_OBJECT(
      'hour', transaction_hour,
      'revenue', ROUND(hourly_revenue::NUMERIC, 2),
      'transactions', hourly_transactions,
      'customers', hourly_unique_customers,
      'avg_value', ROUND(hourly_avg_transaction_value::NUMERIC, 2),
      'units_sold', hourly_units_sold,
      'growth_rate_revenue', 
        CASE 
          WHEN prev_hour_revenue > 0 THEN
            ROUND(((hourly_revenue - prev_hour_revenue) / prev_hour_revenue * 100)::NUMERIC, 2)
          ELSE NULL
        END,
      'growth_rate_transactions',
        CASE 
          WHEN prev_hour_transactions > 0 THEN  
            ROUND(((hourly_transactions - prev_hour_transactions) / prev_hour_transactions::FLOAT * 100)::NUMERIC, 2)
          ELSE NULL
        END,
      'revenue_3h_moving_avg', ROUND(revenue_3h_moving_avg::NUMERIC, 2),
      'performance_tier', performance_tier,
      'revenue_rank', revenue_rank
    ) ORDER BY transaction_hour
  ) FROM temporal_trends) as hourly_trends,

  -- Regional performance with competitive analysis  
  (SELECT JSON_AGG(
    JSON_OBJECT(
      'region', region,
      'revenue', ROUND(region_revenue::NUMERIC, 2),
      'revenue_share', ROUND(region_revenue_share::NUMERIC, 2),
      'transactions', region_transactions,
      'customers', region_unique_customers,
      'products', region_unique_products,
      'avg_transaction_value', ROUND(region_avg_transaction_value::NUMERIC, 2),
      'revenue_per_customer', ROUND(region_revenue_per_customer::NUMERIC, 2),
      'profit_margin_percent', ROUND(region_profit_margin_percent::NUMERIC, 2),
      'revenue_rank', revenue_rank,
      'profit_rank', profit_rank,
      'customer_base_rank', customer_base_rank,
      'market_position', market_position,
      'dominant_channel', dominant_sales_channel,
      'dominant_payment', dominant_payment_method,
      'customer_density_percent', ROUND(customer_density_percent::NUMERIC, 2),
      'new_customer_percent', ROUND(new_customer_percent::NUMERIC, 2)
    ) ORDER BY revenue_rank
  ) FROM regional_performance) as regional_analysis,

  -- Category analysis with profitability insights
  (SELECT JSON_AGG(
    JSON_OBJECT(
      'category', category,
      'subcategory', subcategory,
      'brand', brand,
      'revenue', ROUND(category_revenue::NUMERIC, 2),
      'revenue_share', ROUND(category_revenue_share::NUMERIC, 2),
      'transactions', category_transactions,
      'customers', category_customers,
      'products', category_products,
      'profit_margin_percent', ROUND(category_profit_margin_percent::NUMERIC, 2),
      'avg_transaction_value', ROUND(category_avg_transaction_value::NUMERIC, 2),
      'revenue_per_customer', ROUND(revenue_per_customer::NUMERIC, 2),
      'revenue_rank', revenue_rank,
      'profit_rank', profit_rank,
      'margin_rank', margin_rank,
      'profitability_tier', profitability_tier,
      'strategic_category', strategic_category,
      'growth_indicator',
        CASE 
          WHEN earlier_18h_revenue > 0 THEN
            CASE 
              WHEN recent_6h_revenue > earlier_18h_revenue THEN 'growing'
              WHEN recent_6h_revenue < earlier_18h_revenue THEN 'declining'
              ELSE 'stable'
            END
          ELSE 'insufficient_data'
        END
    ) ORDER BY revenue_rank
  ) FROM category_analysis) as category_performance,

  -- Customer segment insights
  (SELECT JSON_AGG(
    JSON_OBJECT(
      'segment', customer_segment,
      'lifecycle_stage', customer_lifecycle_stage,
      'revenue', ROUND(segment_revenue::NUMERIC, 2),
      'customers', segment_customers,
      'transactions', segment_transactions,
      'revenue_per_customer', ROUND(segment_revenue_per_customer::NUMERIC, 2),
      'transactions_per_customer', ROUND(segment_transactions_per_customer::NUMERIC, 2),
      'avg_age', ROUND(segment_avg_age::NUMERIC, 1),
      'avg_lifetime_value', ROUND(segment_avg_lifetime_value::NUMERIC, 2),
      'profit_margin_percent', ROUND(segment_profit_margin_percent::NUMERIC, 2),
      'male_percentage', ROUND(male_percentage::NUMERIC, 1),
      'female_percentage', ROUND(female_percentage::NUMERIC, 1),
      'online_preference_percent', ROUND(online_preference_percent::NUMERIC, 1),
      'weekend_shopping_percent', ROUND(weekend_shopping_percent::NUMERIC, 1),
      'preferred_shopping_time', preferred_shopping_time,
      'value_classification', value_classification,
      'revenue_rank', revenue_rank
    ) ORDER BY revenue_rank
  ) FROM customer_segment_analysis) as segment_analysis,

  -- Channel and payment method effectiveness
  (SELECT JSON_AGG(
    JSON_OBJECT(
      'channel', sales_channel,
      'payment_method', payment_method,
      'revenue', ROUND(combination_revenue::NUMERIC, 2),
      'transactions', combination_transactions,
      'customers', combination_customers,
      'avg_transaction_value', ROUND(combination_avg_value::NUMERIC, 2),
      'revenue_per_customer', ROUND(revenue_per_customer::NUMERIC, 2),
      'profit_margin_percent', ROUND(combination_profit_margin::NUMERIC, 2),
      'overall_market_share', ROUND(overall_market_share::NUMERIC, 2),
      'payment_share_in_channel', ROUND(payment_share_in_channel::NUMERIC, 2),
      'channel_share_in_payment', ROUND(channel_share_in_payment::NUMERIC, 2),
      'premium_customer_percent', ROUND(premium_customer_percent::NUMERIC, 1),
      'weekend_usage_percent', ROUND(weekend_usage_percent::NUMERIC, 1),
      'revenue_rank', revenue_rank,
      'profitability_rank', profitability_rank
    ) ORDER BY revenue_rank
  ) FROM channel_payment_analysis) as channel_payment_effectiveness,

  -- Growth trends and momentum indicators
  (SELECT JSON_OBJECT(
    'current_6h_revenue', ROUND(current_6h_revenue::NUMERIC, 2),
    'current_6h_transactions', current_6h_transactions,
    'current_6h_customers', current_6h_customers,
    'previous_6h_revenue', ROUND(previous_6h_revenue::NUMERIC, 2),
    'previous_6h_transactions', previous_6h_transactions,
    'previous_6h_customers', previous_6h_customers,
    'revenue_growth_rate',
      CASE 
        WHEN previous_6h_revenue > 0 THEN
          ROUND(((current_6h_revenue - previous_6h_revenue) / previous_6h_revenue * 100)::NUMERIC, 2)
        ELSE NULL
      END,
    'transaction_growth_rate',
      CASE 
        WHEN previous_6h_transactions > 0 THEN
          ROUND(((current_6h_transactions - previous_6h_transactions) / previous_6h_transactions::FLOAT * 100)::NUMERIC, 2)
        ELSE NULL
      END,
    'customer_growth_rate',
      CASE 
        WHEN previous_6h_customers > 0 THEN
          ROUND(((current_6h_customers - previous_6h_customers) / previous_6h_customers::FLOAT * 100)::NUMERIC, 2)
        ELSE NULL
      END,
    'momentum_indicator',
      CASE 
        WHEN previous_6h_revenue > 0 AND earlier_6h_revenue > 0 THEN
          CASE 
            WHEN current_6h_revenue > previous_6h_revenue AND previous_6h_revenue > earlier_6h_revenue THEN 'accelerating'
            WHEN current_6h_revenue > previous_6h_revenue AND previous_6h_revenue <= earlier_6h_revenue THEN 'recovering'
            WHEN current_6h_revenue <= previous_6h_revenue AND previous_6h_revenue > earlier_6h_revenue THEN 'slowing'
            WHEN current_6h_revenue <= previous_6h_revenue AND previous_6h_revenue <= earlier_6h_revenue THEN 'declining'
            ELSE 'stable'
          END
        ELSE 'insufficient_data'
      END,
    'peak_transaction_value', ROUND(peak_transaction_value::NUMERIC, 2),
    'min_transaction_value', ROUND(min_transaction_value::NUMERIC, 2),
    'peak_hour', peak_hour,
    'dominant_region', dominant_region,
    'dominant_category', dominant_category
  ) FROM growth_trend_analysis) as growth_trends,

  -- Dashboard metadata and performance indicators
  JSON_OBJECT(
    'data_freshness_minutes',
      ROUND(EXTRACT(EPOCH FROM CURRENT_TIMESTAMP - (SELECT MAX(transaction_date) FROM sales_analytics)) / 60, 1),
    'analysis_time_window', '24 hours',
    'total_data_points', (SELECT total_transactions FROM overall_metrics),
    'analysis_depth', 'comprehensive',
    'last_updated', CURRENT_TIMESTAMP,
    'performance_indicators', JSON_OBJECT(
      'query_complexity', 'high',
      'data_completeness', 'complete',
      'analytical_accuracy', 'high',
      'real_time_capability', true
    )
  ) as dashboard_metadata

FROM overall_metrics om
CROSS JOIN growth_trend_analysis gta;

-- Advanced customer lifetime value analysis with SQL aggregations
WITH customer_transaction_history AS (
  SELECT 
    st.customer_id,
    c.customer_segment,
    c.registration_date,
    c.age,
    c.gender,
    c.city,
    c.state,

    -- Transaction aggregations
    MIN(st.transaction_date) as first_purchase_date,
    MAX(st.transaction_date) as last_purchase_date,
    COUNT(*) as total_transactions,
    SUM(st.total_amount) as lifetime_revenue,
    AVG(st.total_amount) as avg_transaction_value,
    SUM(st.quantity) as total_units_purchased,
    SUM(st.profit_margin) as lifetime_profit,

    -- Temporal behavior
    COUNT(DISTINCT DATE_TRUNC('month', st.transaction_date)) as active_months,
    COUNT(DISTINCT p.category) as categories_purchased,
    COUNT(DISTINCT st.sales_channel) as channels_used,
    COUNT(DISTINCT st.payment_method) as payment_methods_used,

    -- Calculated metrics
    EXTRACT(DAYS FROM MAX(st.transaction_date) - MIN(st.transaction_date)) as customer_lifespan_days,
    AVG(EXTRACT(DAYS FROM st.transaction_date - LAG(st.transaction_date) OVER (
      PARTITION BY st.customer_id ORDER BY st.transaction_date
    ))) as avg_days_between_purchases

  FROM sales_transactions st
  JOIN customers c ON st.customer_id = c.customer_id
  JOIN products p ON st.product_id = p.product_id
  WHERE st.transaction_date >= CURRENT_TIMESTAMP - INTERVAL '365 days'
  GROUP BY st.customer_id, c.customer_segment, c.registration_date, c.age, c.gender, c.city, c.state
),

customer_segmentation_and_prediction AS (
  SELECT 
    cth.*,

    -- CLV calculations
    CASE 
      WHEN avg_days_between_purchases > 0 AND avg_days_between_purchases <= 365 THEN
        avg_transaction_value * (365 / avg_days_between_purchases)
      ELSE lifetime_revenue
    END as predicted_annual_value,

    CASE 
      WHEN avg_days_between_purchases > 0 AND avg_days_between_purchases <= 365 THEN
        avg_transaction_value * (30 / avg_days_between_purchases)
      ELSE lifetime_revenue / GREATEST(active_months, 1)
    END as predicted_monthly_value,

    -- RFM scoring
    NTILE(5) OVER (ORDER BY last_purchase_date DESC) as recency_score,
    NTILE(5) OVER (ORDER BY total_transactions DESC) as frequency_score, 
    NTILE(5) OVER (ORDER BY lifetime_revenue DESC) as monetary_score,

    -- Customer lifecycle classification
    CASE 
      WHEN last_purchase_date >= CURRENT_TIMESTAMP - INTERVAL '30 days' THEN 'active'
      WHEN last_purchase_date >= CURRENT_TIMESTAMP - INTERVAL '90 days' THEN 'at_risk'
      WHEN last_purchase_date >= CURRENT_TIMESTAMP - INTERVAL '180 days' THEN 'dormant'
      ELSE 'churned'
    END as lifecycle_status,

    -- Value segmentation
    CASE 
      WHEN lifetime_revenue >= 5000 THEN 'vip'
      WHEN lifetime_revenue >= 1000 THEN 'high_value'
      WHEN lifetime_revenue >= 500 THEN 'medium_value'
      WHEN lifetime_revenue >= 100 THEN 'low_value'
      ELSE 'minimal_value'
    END as value_segment,

    -- Engagement classification
    CASE 
      WHEN total_transactions >= 20 THEN 'highly_engaged'
      WHEN total_transactions >= 10 THEN 'engaged'
      WHEN total_transactions >= 5 THEN 'moderately_engaged'
      ELSE 'low_engagement'
    END as engagement_level,

    -- Churn risk assessment
    CASE 
      WHEN last_purchase_date < CURRENT_TIMESTAMP - INTERVAL '90 days' AND avg_days_between_purchases < 60 THEN 'high_risk'
      WHEN last_purchase_date < CURRENT_TIMESTAMP - INTERVAL '60 days' AND avg_days_between_purchases < 45 THEN 'medium_risk'
      WHEN last_purchase_date < CURRENT_TIMESTAMP - INTERVAL '30 days' AND total_transactions > 5 THEN 'low_risk'
      ELSE 'minimal_risk'
    END as churn_risk

  FROM customer_transaction_history cth
)

SELECT 
  -- Customer lifetime value summary
  JSON_OBJECT(
    'total_customers_analyzed', COUNT(*),
    'total_historical_revenue', SUM(lifetime_revenue),
    'total_predicted_annual_revenue', SUM(predicted_annual_value),
    'avg_customer_lifetime_value', AVG(lifetime_revenue),
    'avg_predicted_annual_value', AVG(predicted_annual_value),
    'avg_customer_lifespan_days', AVG(customer_lifespan_days),
    'avg_purchase_frequency_days', AVG(avg_days_between_purchases)
  ) as clv_summary,

  -- Value segment distribution
  (SELECT JSON_OBJECT_AGG(
    value_segment,
    JSON_OBJECT(
      'customer_count', COUNT(*),
      'total_revenue', SUM(lifetime_revenue),
      'avg_revenue_per_customer', AVG(lifetime_revenue),
      'avg_predicted_annual_value', AVG(predicted_annual_value),
      'avg_transactions', AVG(total_transactions),
      'revenue_share_percent', ROUND(SUM(lifetime_revenue) / SUM(SUM(lifetime_revenue)) OVER () * 100, 2)
    )
  ) FROM customer_segmentation_and_prediction GROUP BY value_segment) as value_segments,

  -- Lifecycle status analysis
  (SELECT JSON_OBJECT_AGG(
    lifecycle_status,
    JSON_OBJECT(
      'customer_count', COUNT(*),
      'total_revenue', SUM(lifetime_revenue),
      'avg_revenue_per_customer', AVG(lifetime_revenue),
      'avg_recency_score', AVG(recency_score),
      'avg_frequency_score', AVG(frequency_score),
      'avg_monetary_score', AVG(monetary_score)
    )
  ) FROM customer_segmentation_and_prediction GROUP BY lifecycle_status) as lifecycle_analysis,

  -- Churn risk assessment
  (SELECT JSON_OBJECT_AGG(
    churn_risk,
    JSON_OBJECT(
      'customer_count', COUNT(*),
      'at_risk_revenue', SUM(lifetime_revenue),
      'avg_predicted_annual_loss', AVG(predicted_annual_value),
      'high_value_customers_at_risk', COUNT(*) FILTER (WHERE value_segment IN ('vip', 'high_value'))
    )
  ) FROM customer_segmentation_and_prediction GROUP BY churn_risk) as churn_risk_analysis,

  -- Top performers
  (SELECT JSON_AGG(
    JSON_OBJECT(
      'customer_id', customer_id,
      'lifetime_revenue', ROUND(lifetime_revenue::NUMERIC, 2),
      'predicted_annual_value', ROUND(predicted_annual_value::NUMERIC, 2),
      'total_transactions', total_transactions,
      'customer_lifespan_days', customer_lifespan_days,
      'avg_transaction_value', ROUND(avg_transaction_value::NUMERIC, 2),
      'value_segment', value_segment,
      'engagement_level', engagement_level,
      'rfm_combined_score', recency_score + frequency_score + monetary_score,
      'churn_risk', churn_risk
    ) ORDER BY lifetime_revenue DESC LIMIT 20
  )) as top_customers,

  CURRENT_TIMESTAMP as analysis_generated_at

FROM customer_segmentation_and_prediction;

-- Real-time analytics performance monitoring
CREATE VIEW analytics_performance_dashboard AS
WITH performance_metrics AS (
  SELECT 
    -- Query performance indicators
    COUNT(*) as total_dashboard_queries_24h,
    AVG(query_duration_ms) as avg_query_duration_ms,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY query_duration_ms) as p95_query_duration_ms,
    MAX(query_duration_ms) as max_query_duration_ms,

    -- Data freshness metrics
    AVG(EXTRACT(EPOCH FROM query_timestamp - data_timestamp) / 60) as avg_data_age_minutes,
    MAX(EXTRACT(EPOCH FROM query_timestamp - data_timestamp) / 60) as max_data_age_minutes,

    -- Cache performance
    COUNT(*) FILTER (WHERE cache_hit = true) as cache_hits,
    COUNT(*) FILTER (WHERE cache_hit = false) as cache_misses,

    -- Resource utilization
    AVG(memory_usage_mb) as avg_memory_usage_mb,
    MAX(memory_usage_mb) as peak_memory_usage_mb,
    AVG(cpu_utilization_percent) as avg_cpu_utilization,

    -- Error rates
    COUNT(*) FILTER (WHERE query_status = 'error') as query_errors,
    COUNT(*) FILTER (WHERE query_status = 'timeout') as query_timeouts

  FROM analytics_query_log
  WHERE query_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
)

SELECT 
  CURRENT_TIMESTAMP as dashboard_time,

  -- Performance indicators
  total_dashboard_queries_24h,
  ROUND(avg_query_duration_ms::NUMERIC, 2) as avg_response_time_ms,
  ROUND(p95_query_duration_ms::NUMERIC, 2) as p95_response_time_ms,
  ROUND(max_query_duration_ms::NUMERIC, 2) as max_response_time_ms,

  -- Data quality indicators
  ROUND(avg_data_age_minutes::NUMERIC, 2) as avg_data_freshness_minutes,
  ROUND(max_data_age_minutes::NUMERIC, 2) as max_data_age_minutes,

  -- Cache effectiveness
  cache_hits,
  cache_misses,
  CASE 
    WHEN (cache_hits + cache_misses) > 0 THEN
      ROUND((cache_hits::FLOAT / (cache_hits + cache_misses) * 100)::NUMERIC, 2)
    ELSE 0
  END as cache_hit_rate_percent,

  -- System resource utilization
  ROUND(avg_memory_usage_mb::NUMERIC, 2) as avg_memory_mb,
  ROUND(peak_memory_usage_mb::NUMERIC, 2) as peak_memory_mb,
  ROUND(avg_cpu_utilization::NUMERIC, 2) as avg_cpu_percent,

  -- Reliability indicators
  query_errors,
  query_timeouts,
  CASE 
    WHEN total_dashboard_queries_24h > 0 THEN
      ROUND(((total_dashboard_queries_24h - query_errors - query_timeouts)::FLOAT / total_dashboard_queries_24h * 100)::NUMERIC, 2)
    ELSE 100
  END as success_rate_percent,

  -- Health status
  CASE 
    WHEN avg_query_duration_ms > 5000 OR (query_errors + query_timeouts) > total_dashboard_queries_24h * 0.05 THEN 'critical'
    WHEN avg_query_duration_ms > 2000 OR (query_errors + query_timeouts) > total_dashboard_queries_24h * 0.02 THEN 'warning'
    ELSE 'healthy'
  END as system_health,

  -- Performance recommendations
  ARRAY[
    CASE WHEN avg_query_duration_ms > 3000 THEN 'Consider query optimization or caching improvements' END,
    CASE WHEN (cache_hits + cache_misses) > 0 AND cache_hits::FLOAT / (cache_hits + cache_misses) * 100 < 70 THEN 'Cache hit rate is low - review caching strategy' END,
    CASE WHEN avg_data_age_minutes > 10 THEN 'Data freshness may impact real-time insights' END,
    CASE WHEN peak_memory_usage_mb > 1000 THEN 'High memory usage detected - consider resource scaling' END
  ]::TEXT[] as recommendations

FROM performance_metrics;

-- QueryLeaf provides comprehensive MongoDB analytics capabilities:
-- 1. SQL-familiar syntax for complex aggregation pipelines and dashboard queries
-- 2. Advanced real-time analytics with multi-dimensional data processing
-- 3. Customer lifetime value analysis with predictive modeling capabilities
-- 4. Sophisticated segmentation and behavioral analysis through SQL constructs
-- 5. Real-time performance monitoring with comprehensive health indicators
-- 6. Advanced temporal trend analysis with growth rate calculations
-- 7. Production-ready analytics operations with caching and optimization
-- 8. Integration with MongoDB's native aggregation framework optimizations
-- 9. Comprehensive business intelligence with statistical analysis support
-- 10. Enterprise-grade analytics dashboards accessible through familiar SQL patterns

Best Practices for Production Analytics Implementation

Analytics Pipeline Design and Optimization

Essential principles for effective MongoDB analytics dashboard deployment:

  1. Data Modeling Strategy: Design analytics-optimized schemas with appropriate indexing strategies for time-series and dimensional queries
  2. Aggregation Optimization: Implement efficient aggregation pipelines with proper stage ordering and memory-conscious operations (see the pipeline sketch after this list)
  3. Caching Architecture: Deploy intelligent caching layers that balance data freshness with query performance requirements
  4. Real-Time Processing: Configure change stream integration for live dashboard updates without performance degradation
  5. Scalability Design: Architect analytics systems that can handle growing data volumes and increasing concurrent user loads
  6. Performance Monitoring: Implement comprehensive monitoring that tracks query performance, resource utilization, and user experience metrics
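
As a hedged illustration of points 1 and 2, the sketch below shows the general shape of an analytics-friendly pipeline: a selective $match stage runs first so an index on transaction_date can limit the scanned documents, a $project stage trims fields before grouping, and allowDiskUse handles large groupings. The collection and field names mirror the earlier examples, while the index definition and the $dateTrunc operator (MongoDB 5.0+) are assumptions to adapt to the actual workload.

// Sketch: index-backed, memory-conscious hourly revenue aggregation
async function hourlyRevenueByRegion(db, hoursBack = 24) {
  const since = new Date(Date.now() - hoursBack * 60 * 60 * 1000);

  // Supporting index for the $match/$sort shape below (assumed, tune per workload)
  await db.collection('sales_transactions').createIndex({ transaction_date: -1, region: 1 });

  return db.collection('sales_transactions').aggregate([
    // Filter early so the index on transaction_date limits the scanned documents
    { $match: { transaction_date: { $gte: since } } },

    // Keep only the fields the grouping needs to reduce per-document memory
    { $project: { region: 1, total_amount: 1, transaction_date: 1 } },

    // Group by region and hour bucket
    {
      $group: {
        _id: {
          region: '$region',
          hour: { $dateTrunc: { date: '$transaction_date', unit: 'hour' } }
        },
        revenue: { $sum: '$total_amount' },
        transactions: { $sum: 1 }
      }
    },

    { $sort: { '_id.hour': 1, revenue: -1 } }
  ], { allowDiskUse: true }).toArray();
}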

Enterprise Analytics Deployment

Optimize analytics platforms for production enterprise environments:

  1. Distributed Processing: Implement distributed analytics processing that leverages MongoDB's sharding capabilities for massive datasets (a minimal sharding sketch follows this list)
  2. Security Integration: Ensure analytics operations meet enterprise security requirements with proper access controls and data governance
  3. Compliance Framework: Design analytics systems that support regulatory requirements for data retention, audit trails, and reporting
  4. Operational Integration: Integrate analytics platforms with existing monitoring, alerting, and business intelligence infrastructure
  5. Multi-Tenant Architecture: Support multiple business units and use cases with scalable, isolated analytics environments
  6. Cost Optimization: Monitor and optimize analytics resource usage and processing costs for efficient operations
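
For point 1, a minimal sharding sketch is shown below. It assumes a sharded cluster is already deployed; the analytics database name, collection name, and shard key are illustrative and should be derived from real query and write patterns rather than copied as-is.

// Hypothetical sharding setup for a high-volume analytics collection
async function shardAnalyticsCollection(client) {
  const admin = client.db('admin');

  // Enable sharding on the (assumed) analytics database
  await admin.command({ enableSharding: 'analytics' });

  // Compound shard key: distributes writes by region while keeping
  // time-range queries reasonably targeted; evaluate hashed keys as well
  await admin.command({
    shardCollection: 'analytics.sales_transactions',
    key: { region: 1, transaction_date: 1 }
  });
}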

Conclusion

MongoDB's Aggregation Framework provides sophisticated real-time analytics capabilities that enable powerful dashboard creation, complex data processing, and comprehensive business intelligence without the complexity and infrastructure overhead of traditional analytics platforms. Native aggregation operations offer scalable, efficient, and flexible data processing directly within the operational database.

Key MongoDB Analytics benefits include:

  • Real-Time Processing: Immediate insight generation from operational data without ETL delays or separate analytics infrastructure
  • Advanced Aggregations: Sophisticated multi-stage data processing with statistical calculations, temporal analysis, and predictive modeling
  • Flexible Analytics: Dynamic dashboard creation with customizable metrics, dimensions, and filtering capabilities
  • Scalable Architecture: Native MongoDB integration that scales efficiently with data growth and analytical complexity
  • Performance Optimization: Built-in optimization features with intelligent caching, indexing, and query planning
  • SQL Accessibility: Familiar SQL-style analytics operations through QueryLeaf for accessible business intelligence development

Whether you're building executive dashboards, operational analytics, customer insights platforms, or real-time monitoring systems, MongoDB aggregation with QueryLeaf's familiar SQL interface provides the foundation for powerful, scalable, and efficient analytics solutions.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB aggregation pipelines while providing SQL-familiar syntax for complex analytics operations. Advanced dashboard creation, customer segmentation, and predictive analytics are seamlessly handled through familiar SQL constructs, making sophisticated business intelligence accessible to SQL-oriented development teams without requiring deep MongoDB aggregation expertise.

The combination of MongoDB's robust aggregation capabilities with SQL-style analytics operations makes it an ideal platform for applications requiring both real-time operational data processing and familiar business intelligence patterns, ensuring your analytics solutions can deliver immediate insights while maintaining development team productivity and system performance.