MongoDB Capped Collections: High-Performance Logging and Circular Buffer Management for Enterprise Data Streams
Modern applications generate continuous streams of time-series data, logs, events, and real-time messages that require efficient storage, retrieval, and automatic lifecycle management. Traditional relational databases struggle with these high-volume streaming scenarios: they need complex archival procedures, partition management, and manual cleanup processes that add operational complexity and performance overhead to data pipeline architectures.
MongoDB capped collections provide native circular buffer functionality with guaranteed insertion order, automatic size management, and storage patterns optimized for high-throughput streaming applications. Unlike traditional approaches that require external log rotation systems or complex partitioning strategies, capped collections enforce their storage limits automatically while preserving insertion order and supporting tailable cursors for real-time data consumption.
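As a quick orientation before the comparison below, here is a minimal sketch using the official Node.js driver. The 'demo_logs' collection name and the 10 MB / 50,000-document limits are illustrative placeholders, not recommendations.
// Minimal sketch (assumed names and sizes): create a capped collection and tail it
const { MongoClient } = require('mongodb');

async function cappedBufferSketch() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('demo');

  // Fixed-size circular buffer: oldest documents are removed automatically
  // once either the byte size or the document count limit is reached
  await db.createCollection('demo_logs', {
    capped: true,
    size: 10 * 1024 * 1024, // bytes
    max: 50000              // documents
  });

  const logs = db.collection('demo_logs');
  await logs.insertOne({ level: 'INFO', message: 'service started', ts: new Date() });

  // Tailable cursor: returns documents in insertion ($natural) order and waits for new ones
  const cursor = logs.find({}, { tailable: true, awaitData: true });
  for await (const doc of cursor) {
    console.log(doc.level, doc.message);
    break; // a real consumer would keep iterating indefinitely
  }

  await client.close();
}

cappedBufferSketch().catch(console.error);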
The Traditional High-Volume Logging Challenge
Conventional relational database approaches to high-volume logging and streaming data face significant operational limitations:
-- Traditional PostgreSQL high-volume logging - complex partition management and cleanup overhead
-- Application log management with manual partitioning and rotation
CREATE TABLE application_logs (
log_id BIGSERIAL PRIMARY KEY,
log_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
log_level VARCHAR(20) NOT NULL,
application VARCHAR(100) NOT NULL,
component VARCHAR(100) NOT NULL,
-- Log content and metadata
log_message TEXT NOT NULL,
log_data JSONB,
user_id INTEGER,
session_id VARCHAR(100),
request_id VARCHAR(100),
-- Performance tracking
execution_time_ms INTEGER,
memory_usage_mb DECIMAL(10,2),
cpu_usage_percent DECIMAL(5,2),
-- Context information
server_hostname VARCHAR(200),
process_id INTEGER,
thread_id INTEGER,
environment VARCHAR(50) DEFAULT 'production',
-- Correlation and tracing
trace_id VARCHAR(100),
parent_span_id VARCHAR(100),
operation_name VARCHAR(200),
CONSTRAINT valid_log_level CHECK (log_level IN ('DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL')),
CONSTRAINT valid_environment CHECK (environment IN ('development', 'testing', 'staging', 'production'))
) PARTITION BY RANGE (log_timestamp);
-- Create partitions for log data (manual partition management)
CREATE TABLE application_logs_2025_01 PARTITION OF application_logs
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE application_logs_2025_02 PARTITION OF application_logs
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
CREATE TABLE application_logs_2025_03 PARTITION OF application_logs
FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');
-- Performance indexes for log queries (per partition)
CREATE INDEX idx_app_logs_2025_01_timestamp ON application_logs_2025_01 (log_timestamp DESC);
CREATE INDEX idx_app_logs_2025_01_level_app ON application_logs_2025_01 (log_level, application, log_timestamp DESC);
CREATE INDEX idx_app_logs_2025_01_user_session ON application_logs_2025_01 (user_id, session_id, log_timestamp DESC);
CREATE INDEX idx_app_logs_2025_01_trace ON application_logs_2025_01 (trace_id);
-- Real-time event stream with manual buffer management
CREATE TABLE event_stream_buffer (
event_id BIGSERIAL PRIMARY KEY,
event_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
event_type VARCHAR(100) NOT NULL,
event_source VARCHAR(100) NOT NULL,
-- Event payload
event_data JSONB NOT NULL,
event_version VARCHAR(20) DEFAULT '1.0',
event_schema_version INTEGER DEFAULT 1,
-- Stream metadata
stream_name VARCHAR(200) NOT NULL,
partition_key VARCHAR(200),
sequence_number BIGINT,
-- Processing status
processed BOOLEAN DEFAULT FALSE,
processing_attempts INTEGER DEFAULT 0,
last_processed TIMESTAMP,
processing_error TEXT,
-- Buffer management
buffer_position INTEGER,
retention_priority INTEGER DEFAULT 5, -- 1 highest, 10 lowest
-- Performance metadata
event_size_bytes INTEGER GENERATED ALWAYS AS (length(event_data::text)) STORED,
ingestion_latency_ms INTEGER
);
-- Complex buffer management procedure with manual overflow handling
CREATE OR REPLACE FUNCTION manage_event_stream_buffer()
RETURNS INTEGER AS $$
DECLARE
buffer_max_size INTEGER := 1000000; -- 1 million events
buffer_max_age INTERVAL := '7 days';
cleanup_batch_size INTEGER := 10000;
current_buffer_size INTEGER;
events_to_remove INTEGER := 0;
removed_events INTEGER := 0;
cleanup_cursor CURSOR FOR
SELECT event_id, event_timestamp, event_size_bytes
FROM event_stream_buffer
WHERE (event_timestamp < CURRENT_TIMESTAMP - buffer_max_age
OR (processed = TRUE AND processing_attempts >= 3))
ORDER BY retention_priority DESC, event_timestamp ASC
LIMIT cleanup_batch_size;
event_record RECORD;
total_size_removed BIGINT := 0;
current_buffer_bytes BIGINT := 0;
management_start TIMESTAMP := clock_timestamp();
BEGIN
RAISE NOTICE 'Starting event stream buffer management...';
-- Check current buffer size
SELECT COUNT(*), COALESCE(SUM(event_size_bytes), 0)
INTO current_buffer_size, current_buffer_bytes
FROM event_stream_buffer;
RAISE NOTICE 'Current buffer: % events, % bytes', current_buffer_size, current_buffer_bytes;
-- Calculate events to remove if over capacity
IF current_buffer_size > buffer_max_size THEN
events_to_remove := current_buffer_size - buffer_max_size + (buffer_max_size * 0.1)::INTEGER;
RAISE NOTICE 'Buffer over capacity, removing % events', events_to_remove;
END IF;
-- Remove old and processed events
FOR event_record IN cleanup_cursor LOOP
BEGIN
-- Archive event before deletion (if required)
INSERT INTO event_stream_archive (
original_event_id, event_timestamp, event_type, event_source,
event_data, stream_name, archived_at, archive_reason
) VALUES (
event_record.event_id, event_record.event_timestamp,
(SELECT event_type FROM event_stream_buffer WHERE event_id = event_record.event_id),
(SELECT event_source FROM event_stream_buffer WHERE event_id = event_record.event_id),
(SELECT event_data FROM event_stream_buffer WHERE event_id = event_record.event_id),
(SELECT stream_name FROM event_stream_buffer WHERE event_id = event_record.event_id),
CURRENT_TIMESTAMP, 'buffer_management'
);
-- Remove event from buffer
DELETE FROM event_stream_buffer WHERE event_id = event_record.event_id;
removed_events := removed_events + 1;
total_size_removed := total_size_removed + event_record.event_size_bytes;
-- Exit if we've removed enough events
EXIT WHEN events_to_remove > 0 AND removed_events >= events_to_remove;
EXCEPTION WHEN OTHERS THEN
RAISE WARNING 'Error processing event % during buffer cleanup: %',
event_record.event_id, SQLERRM;
END;
END LOOP;
-- Update buffer positions for remaining events
WITH position_update AS (
SELECT event_id,
ROW_NUMBER() OVER (ORDER BY event_timestamp ASC) as new_position
FROM event_stream_buffer
)
UPDATE event_stream_buffer
SET buffer_position = pu.new_position
FROM position_update pu
WHERE event_stream_buffer.event_id = pu.event_id;
-- Log buffer management results
INSERT INTO buffer_management_log (
management_timestamp, events_removed, bytes_reclaimed,
buffer_size_after, management_duration_ms
) VALUES (
CURRENT_TIMESTAMP, removed_events, total_size_removed,
(SELECT COUNT(*) FROM event_stream_buffer),
(EXTRACT(EPOCH FROM (clock_timestamp() - management_start)) * 1000)::INTEGER
);
RAISE NOTICE 'Buffer management completed: % events removed, % bytes reclaimed',
removed_events, total_size_removed;
RETURN removed_events;
END;
$$ LANGUAGE plpgsql;
-- Scheduled buffer management (requires external cron job)
CREATE TABLE buffer_management_schedule (
schedule_name VARCHAR(100) PRIMARY KEY,
management_function VARCHAR(200) NOT NULL,
schedule_cron VARCHAR(100) NOT NULL,
last_execution TIMESTAMP,
next_execution TIMESTAMP,
-- Configuration
enabled BOOLEAN DEFAULT TRUE,
max_execution_time INTERVAL DEFAULT '30 minutes',
buffer_size_threshold INTEGER,
-- Performance tracking
average_execution_time INTERVAL,
average_events_processed INTEGER,
consecutive_failures INTEGER DEFAULT 0,
last_error_message TEXT
);
INSERT INTO buffer_management_schedule (schedule_name, management_function, schedule_cron) VALUES
('event_buffer_cleanup', 'manage_event_stream_buffer()', '*/15 * * * *'), -- Every 15 minutes
('log_partition_cleanup', 'cleanup_old_log_partitions()', '0 2 * * 0'), -- Weekly at 2 AM
('archive_processed_events', 'archive_old_processed_events()', '0 1 * * *'); -- Daily at 1 AM
-- Manual partition management for log tables
CREATE OR REPLACE FUNCTION create_monthly_log_partitions(months_ahead INTEGER DEFAULT 3)
RETURNS INTEGER AS $$
DECLARE
partition_count INTEGER := 0;
partition_date DATE;
partition_name TEXT;
partition_start DATE;
partition_end DATE;
month_counter INTEGER := 0;
BEGIN
-- Create partitions for upcoming months
WHILE month_counter <= months_ahead LOOP
partition_date := DATE_TRUNC('month', CURRENT_DATE) + (month_counter || ' months')::INTERVAL;
partition_start := partition_date;
partition_end := partition_start + INTERVAL '1 month';
partition_name := 'application_logs_' || TO_CHAR(partition_date, 'YYYY_MM');
-- Check if partition already exists
IF NOT EXISTS (
SELECT 1 FROM pg_tables
WHERE tablename = partition_name
AND schemaname = 'public'
) THEN
-- Create partition
EXECUTE format(
'CREATE TABLE %I PARTITION OF application_logs FOR VALUES FROM (%L) TO (%L)',
partition_name, partition_start, partition_end
);
-- Create indexes on new partition
EXECUTE format(
'CREATE INDEX %I ON %I (log_timestamp DESC)',
'idx_' || partition_name || '_timestamp', partition_name
);
EXECUTE format(
'CREATE INDEX %I ON %I (log_level, application, log_timestamp DESC)',
'idx_' || partition_name || '_level_app', partition_name
);
partition_count := partition_count + 1;
RAISE NOTICE 'Created partition: % for period % to %',
partition_name, partition_start, partition_end;
END IF;
month_counter := month_counter + 1;
END LOOP;
RETURN partition_count;
END;
$$ LANGUAGE plpgsql;
-- Complex log rotation and cleanup
CREATE OR REPLACE FUNCTION cleanup_old_log_partitions(retention_months INTEGER DEFAULT 6)
RETURNS INTEGER AS $$
DECLARE
partition_record RECORD;
dropped_partitions INTEGER := 0;
retention_threshold DATE;
partition_cursor CURSOR FOR
SELECT schemaname, tablename,
SUBSTRING(tablename FROM 'application_logs_([0-9]{4}_[0-9]{2})$') as period_str
FROM pg_tables
WHERE tablename LIKE 'application_logs_2%'
AND schemaname = 'public';
BEGIN
retention_threshold := DATE_TRUNC('month', CURRENT_DATE) - (retention_months || ' months')::INTERVAL;
RAISE NOTICE 'Cleaning up log partitions older than %', retention_threshold;
FOR partition_record IN partition_cursor LOOP
DECLARE
partition_date DATE;
BEGIN
-- Parse partition date from table name
partition_date := TO_DATE(partition_record.period_str, 'YYYY_MM');
-- Check if partition is old enough to drop
IF partition_date < retention_threshold THEN
-- Archive partition data before dropping (if required)
EXECUTE format(
'INSERT INTO application_logs_archive SELECT * FROM %I.%I',
partition_record.schemaname, partition_record.tablename
);
-- Drop the partition
EXECUTE format('DROP TABLE %I.%I',
partition_record.schemaname, partition_record.tablename);
dropped_partitions := dropped_partitions + 1;
RAISE NOTICE 'Dropped old partition: %', partition_record.tablename;
END IF;
EXCEPTION WHEN OTHERS THEN
RAISE WARNING 'Error processing partition %: %',
partition_record.tablename, SQLERRM;
END;
END LOOP;
RETURN dropped_partitions;
END;
$$ LANGUAGE plpgsql;
-- Monitor buffer and partition performance
WITH buffer_performance AS (
SELECT
'event_stream_buffer' as buffer_name,
COUNT(*) as total_events,
SUM(event_size_bytes) as total_size_bytes,
AVG(event_size_bytes) as avg_event_size,
MIN(event_timestamp) as oldest_event,
MAX(event_timestamp) as newest_event,
-- Processing metrics
COUNT(*) FILTER (WHERE processed = TRUE) as processed_events,
COUNT(*) FILTER (WHERE processing_error IS NOT NULL) as error_events,
AVG(processing_attempts) as avg_processing_attempts,
-- Buffer efficiency
EXTRACT(EPOCH FROM (MAX(event_timestamp) - MIN(event_timestamp))) / 3600 as timespan_hours,
COUNT(*) / NULLIF(EXTRACT(EPOCH FROM (MAX(event_timestamp) - MIN(event_timestamp))) / 3600, 0) as events_per_hour
FROM event_stream_buffer
),
partition_performance AS (
SELECT
schemaname || '.' || tablename as partition_name,
pg_total_relation_size(schemaname||'.'||tablename) as partition_size_bytes,
-- Estimate row count (approximate)
CASE
WHEN pg_total_relation_size(schemaname||'.'||tablename) > 0 THEN
pg_total_relation_size(schemaname||'.'||tablename) / 1024 -- Rough estimate
ELSE 0
END as estimated_rows,
SUBSTRING(tablename FROM 'application_logs_([0-9]{4}_[0-9]{2})$') as time_period
FROM pg_tables
WHERE tablename LIKE 'application_logs_2%'
AND schemaname = 'public'
)
SELECT
-- Buffer performance summary
bp.buffer_name,
bp.total_events,
ROUND(bp.total_size_bytes / (1024 * 1024)::decimal, 2) as total_size_mb,
ROUND(bp.avg_event_size::decimal, 2) as avg_event_size_bytes,
bp.timespan_hours,
ROUND(bp.events_per_hour::decimal, 2) as throughput_events_per_hour,
-- Processing efficiency
ROUND((bp.processed_events::decimal / bp.total_events::decimal) * 100, 1) as processing_success_rate,
bp.error_events,
ROUND(bp.avg_processing_attempts::decimal, 2) as avg_retry_attempts,
-- Operational assessment
CASE
WHEN bp.events_per_hour > 10000 THEN 'high_throughput'
WHEN bp.events_per_hour > 1000 THEN 'medium_throughput'
ELSE 'low_throughput'
END as throughput_classification,
-- Management recommendations
CASE
WHEN bp.total_events > 500000 THEN 'Buffer approaching capacity - increase cleanup frequency'
WHEN bp.error_events > bp.total_events * 0.1 THEN 'High error rate - investigate processing issues'
WHEN bp.avg_processing_attempts > 2 THEN 'Frequent retries - check downstream systems'
ELSE 'Buffer operating within normal parameters'
END as operational_recommendation
FROM buffer_performance bp
UNION ALL
SELECT
pp.partition_name,
pp.estimated_rows as total_events,
ROUND(pp.partition_size_bytes / (1024 * 1024)::decimal, 2) as total_size_mb,
CASE WHEN pp.estimated_rows > 0 THEN
ROUND(pp.partition_size_bytes::decimal / pp.estimated_rows::decimal, 2)
ELSE 0 END as avg_event_size_bytes,
NULL as timespan_hours,
NULL as throughput_events_per_hour,
NULL as processing_success_rate,
NULL as error_events,
NULL as avg_retry_attempts,
-- Partition classification
CASE
WHEN pp.partition_size_bytes > 1024 * 1024 * 1024 THEN 'large_partition' -- > 1GB
WHEN pp.partition_size_bytes > 100 * 1024 * 1024 THEN 'medium_partition' -- > 100MB
ELSE 'small_partition'
END as throughput_classification,
-- Partition management recommendations
CASE
WHEN pp.partition_size_bytes > 5 * 1024 * 1024 * 1024 THEN 'Large partition - consider archival' -- > 5GB
WHEN pp.time_period < TO_CHAR(CURRENT_DATE - INTERVAL '6 months', 'YYYY_MM') THEN 'Old partition - candidate for cleanup'
ELSE 'Partition within normal size parameters'
END as operational_recommendation
FROM partition_performance pp
ORDER BY total_size_mb DESC;
-- Traditional logging limitations:
-- 1. Complex partition management requiring manual creation and maintenance procedures
-- 2. Resource-intensive cleanup operations affecting application performance and availability
-- 3. Manual buffer overflow handling with complex archival and rotation logic
-- 4. Limited scalability for high-volume streaming data scenarios requiring constant maintenance
-- 5. Operational overhead of monitoring partition sizes, buffer utilization, and cleanup scheduling
-- 6. Complex indexing strategies required for efficient time-series queries across partitions
-- 7. Risk of data loss during partition management operations and buffer overflow conditions
-- 8. Difficult integration with real-time streaming applications requiring tail-able cursors
-- 9. Performance degradation as partition counts increase requiring complex query optimization
-- 10. Manual coordination of cleanup schedules across multiple data retention policies
MongoDB capped collections provide native circular buffer functionality with automatic size management and optimized performance:
// MongoDB Capped Collections - Native circular buffer management for high-performance streaming data
const { MongoClient, ObjectId } = require('mongodb');
// Enterprise-grade MongoDB Capped Collections Manager for High-Performance Data Streams
class MongoCappedCollectionManager {
constructor(client, config = {}) {
this.client = client;
this.db = client.db(config.database || 'streaming_platform');
this.config = {
// Capped collection configuration
enableTailableCursors: config.enableTailableCursors !== false,
enableOplogIntegration: config.enableOplogIntegration || false,
enableMetricsCollection: config.enableMetricsCollection !== false,
// Performance optimization
enableIndexOptimization: config.enableIndexOptimization !== false,
enableCompressionOptimization: config.enableCompressionOptimization || false,
enableShardingSupport: config.enableShardingSupport || false,
// Monitoring and alerts
enablePerformanceMonitoring: config.enablePerformanceMonitoring !== false,
enableCapacityAlerts: config.enableCapacityAlerts !== false,
alertThresholdPercent: config.alertThresholdPercent || 85,
// Advanced features
enableDataArchiving: config.enableDataArchiving || false,
enableReplicationOptimization: config.enableReplicationOptimization || false,
enableBulkInsertOptimization: config.enableBulkInsertOptimization !== false
};
// Collection management state
this.cappedCollections = new Map();
this.tailableCursors = new Map();
this.performanceMetrics = new Map();
this.capacityMonitors = new Map();
this.initializeManager();
}
async initializeManager() {
console.log('Initializing MongoDB Capped Collections Manager for high-performance streaming...');
try {
// Setup capped collections for different streaming scenarios
await this.setupApplicationLogsCappedCollection();
await this.setupEventStreamCappedCollection();
await this.setupRealTimeMetricsCappedCollection();
await this.setupAuditTrailCappedCollection();
await this.setupPerformanceMonitoringCollection();
// Initialize performance monitoring
if (this.config.enablePerformanceMonitoring) {
await this.initializePerformanceMonitoring();
}
// Setup capacity monitoring
if (this.config.enableCapacityAlerts) {
await this.initializeCapacityMonitoring();
}
console.log('Capped Collections Manager initialized successfully');
} catch (error) {
console.error('Error initializing capped collections manager:', error);
throw error;
}
}
async setupApplicationLogsCappedCollection() {
console.log('Setting up application logs capped collection...');
try {
const collectionName = 'application_logs';
const cappedOptions = {
capped: true,
size: 1024 * 1024 * 1024, // 1GB size limit
max: 1000000, // 1 million document limit
// Storage optimization
storageEngine: {
wiredTiger: {
configString: 'block_compressor=snappy'
}
}
};
// Create capped collection with optimized configuration
await this.db.createCollection(collectionName, cappedOptions);
const collection = this.db.collection(collectionName);
// Create optimal indexes for log queries (minimal indexing for capped collections)
await collection.createIndexes([
{ key: { logLevel: 1, timestamp: 1 }, background: true },
{ key: { application: 1, component: 1 }, background: true },
{ key: { traceId: 1 }, background: true, sparse: true }
]);
// Store collection configuration
this.cappedCollections.set(collectionName, {
collection: collection,
cappedOptions: cappedOptions,
insertionOrder: true,
tailableSupported: true,
useCase: 'application_logging',
performanceProfile: 'high_throughput',
// Monitoring configuration
monitoring: {
trackInsertRate: true,
trackSizeUtilization: true,
trackQueryPerformance: true
}
});
console.log(`Application logs capped collection created: ${cappedOptions.size} bytes, ${cappedOptions.max} documents max`);
} catch (error) {
if (error.code === 48) {
// Collection already exists and is capped
console.log('Application logs capped collection already exists');
const collection = this.db.collection('application_logs');
this.cappedCollections.set('application_logs', {
collection: collection,
existing: true,
useCase: 'application_logging'
});
} else {
console.error('Error creating application logs capped collection:', error);
throw error;
}
}
}
async setupEventStreamCappedCollection() {
console.log('Setting up event stream capped collection...');
try {
const collectionName = 'event_stream';
const cappedOptions = {
capped: true,
size: 2 * 1024 * 1024 * 1024, // 2GB size limit
max: 5000000, // 5 million document limit
// Write concern here applies to the create command itself;
// per-insert write concern is set on the insert calls below
writeConcern: { w: 1, j: false }
};
await this.db.createCollection(collectionName, cappedOptions);
const collection = this.db.collection(collectionName);
// Minimal indexing optimized for insertion order and tailable cursors
await collection.createIndexes([
{ key: { eventType: 1, timestamp: 1 }, background: true },
{ key: { streamName: 1 }, background: true },
{ key: { correlationId: 1 }, background: true, sparse: true }
]);
this.cappedCollections.set(collectionName, {
collection: collection,
cappedOptions: cappedOptions,
insertionOrder: true,
tailableSupported: true,
useCase: 'event_streaming',
performanceProfile: 'ultra_high_throughput',
// Advanced streaming features
streaming: {
enableTailableCursors: true,
enableChangeStreams: true,
bufferOptimized: true,
realTimeConsumption: true
}
});
console.log(`Event stream capped collection created: ${cappedOptions.size} bytes capacity`);
} catch (error) {
if (error.code === 48) {
console.log('Event stream capped collection already exists');
const collection = this.db.collection('event_stream');
this.cappedCollections.set('event_stream', {
collection: collection,
existing: true,
useCase: 'event_streaming'
});
} else {
console.error('Error creating event stream capped collection:', error);
throw error;
}
}
}
async setupRealTimeMetricsCappedCollection() {
console.log('Setting up real-time metrics capped collection...');
try {
const collectionName = 'realtime_metrics';
const cappedOptions = {
capped: true,
size: 512 * 1024 * 1024, // 512MB size limit
max: 2000000, // 2 million document limit
};
await this.db.createCollection(collectionName, cappedOptions);
const collection = this.db.collection(collectionName);
// Optimized indexes for metrics queries
await collection.createIndexes([
{ key: { metricType: 1, timestamp: 1 }, background: true },
{ key: { source: 1, timestamp: -1 }, background: true },
{ key: { aggregationLevel: 1 }, background: true }
]);
this.cappedCollections.set(collectionName, {
collection: collection,
cappedOptions: cappedOptions,
insertionOrder: true,
tailableSupported: true,
useCase: 'metrics_streaming',
performanceProfile: 'time_series_optimized',
// Metrics-specific configuration
metrics: {
enableAggregation: true,
timeSeriesOptimized: true,
enableRealTimeAlerts: true
}
});
console.log(`Real-time metrics capped collection created: ${cappedOptions.size} bytes capacity`);
} catch (error) {
if (error.code === 48) {
console.log('Real-time metrics capped collection already exists');
const collection = this.db.collection('realtime_metrics');
this.cappedCollections.set('realtime_metrics', {
collection: collection,
existing: true,
useCase: 'metrics_streaming'
});
} else {
console.error('Error creating real-time metrics capped collection:', error);
throw error;
}
}
}
async setupAuditTrailCappedCollection() {
console.log('Setting up audit trail capped collection...');
try {
const collectionName = 'audit_trail';
const cappedOptions = {
capped: true,
size: 256 * 1024 * 1024, // 256MB size limit
max: 500000, // 500k document limit
// Enhanced durability for audit data
writeConcern: { w: 'majority', j: true }
};
await this.db.createCollection(collectionName, cappedOptions);
const collection = this.db.collection(collectionName);
// Audit-optimized indexes
await collection.createIndexes([
{ key: { auditType: 1, timestamp: 1 }, background: true },
{ key: { userId: 1, timestamp: -1 }, background: true },
{ key: { resourceId: 1 }, background: true, sparse: true }
]);
this.cappedCollections.set(collectionName, {
collection: collection,
cappedOptions: cappedOptions,
insertionOrder: true,
tailableSupported: true,
useCase: 'audit_logging',
performanceProfile: 'compliance_optimized',
// Audit-specific features
audit: {
immutableInsertOrder: true,
tamperEvident: true,
complianceMode: true
}
});
console.log(`Audit trail capped collection created: ${cappedOptions.size} bytes capacity`);
} catch (error) {
if (error.code === 48) {
console.log('Audit trail capped collection already exists');
const collection = this.db.collection('audit_trail');
this.cappedCollections.set('audit_trail', {
collection: collection,
existing: true,
useCase: 'audit_logging'
});
} else {
console.error('Error creating audit trail capped collection:', error);
throw error;
}
}
}
async logApplicationEvent(logData) {
console.log('Logging application event to capped collection...');
try {
const logsCollection = this.cappedCollections.get('application_logs').collection;
const logEntry = {
_id: new ObjectId(),
timestamp: new Date(),
logLevel: logData.level || 'INFO',
application: logData.application,
component: logData.component,
// Log content
message: logData.message,
logData: logData.data || {},
// Context information
userId: logData.userId,
sessionId: logData.sessionId,
requestId: logData.requestId,
// Performance tracking
executionTime: logData.executionTime || null,
memoryUsage: logData.memoryUsage || null,
cpuUsage: logData.cpuUsage || null,
// Server context
hostname: logData.hostname || require('os').hostname(),
processId: process.pid,
environment: logData.environment || 'production',
// Distributed tracing
traceId: logData.traceId,
spanId: logData.spanId,
operation: logData.operation,
// Capped collection metadata
insertionOrder: true,
streamingOptimized: true
};
// High-performance insert optimized for capped collections
const result = await logsCollection.insertOne(logEntry, {
writeConcern: { w: 1, j: false } // Fast writes for logging
});
// Update performance metrics
await this.updateCollectionMetrics('application_logs', 'insert', logEntry);
console.log(`Application log inserted: ${result.insertedId}`);
return {
logId: result.insertedId,
timestamp: logEntry.timestamp,
cappedCollection: true,
insertionOrder: logEntry.insertionOrder
};
} catch (error) {
console.error('Error logging application event:', error);
throw error;
}
}
async streamEvent(eventData) {
console.log('Streaming event to capped collection...');
try {
const eventCollection = this.cappedCollections.get('event_stream').collection;
const streamEvent = {
_id: new ObjectId(),
timestamp: new Date(),
eventType: eventData.type,
eventSource: eventData.source,
// Event payload
eventData: eventData.payload || {},
eventVersion: eventData.version || '1.0',
schemaVersion: eventData.schemaVersion || 1,
// Stream metadata
streamName: eventData.streamName,
partitionKey: eventData.partitionKey,
sequenceNumber: Date.now(), // Millisecond timestamp used as a coarse sequence number (not unique under concurrency)
// Processing metadata
processed: false,
processingAttempts: 0,
// Correlation and tracing
correlationId: eventData.correlationId,
causationId: eventData.causationId,
// Performance optimization
eventSizeBytes: JSON.stringify(eventData.payload || {}).length,
ingestionLatency: eventData.ingestionLatency || null,
// Streaming optimization
tailableReady: true,
bufferOptimized: true
};
// Ultra-high-performance insert for streaming
const result = await eventCollection.insertOne(streamEvent, {
writeConcern: { w: 1, j: false }
});
// Update streaming metrics
await this.updateCollectionMetrics('event_stream', 'stream', streamEvent);
console.log(`Stream event inserted: ${result.insertedId}`);
return {
eventId: result.insertedId,
sequenceNumber: streamEvent.sequenceNumber,
streamName: streamEvent.streamName,
cappedOptimized: true
};
} catch (error) {
console.error('Error streaming event:', error);
throw error;
}
}
async recordMetric(metricData) {
console.log('Recording real-time metric to capped collection...');
try {
const metricsCollection = this.cappedCollections.get('realtime_metrics').collection;
const metric = {
_id: new ObjectId(),
timestamp: new Date(),
metricType: metricData.type,
metricName: metricData.name,
// Metric values
value: metricData.value,
unit: metricData.unit || 'count',
tags: metricData.tags || {},
// Source information
source: metricData.source,
sourceType: metricData.sourceType || 'application',
// Aggregation metadata
aggregationLevel: metricData.aggregationLevel || 'raw',
aggregationWindow: metricData.aggregationWindow || null,
// Time series optimization
timeSeriesOptimized: true,
bucketTimestamp: new Date(Math.floor(Date.now() / (60 * 1000)) * 60 * 1000), // 1-minute buckets
// Performance metadata
collectionTimestamp: Date.now(),
processingLatency: metricData.processingLatency || null
};
// Time-series optimized insert
const result = await metricsCollection.insertOne(metric, {
writeConcern: { w: 1, j: false }
});
// Update metrics collection performance
await this.updateCollectionMetrics('realtime_metrics', 'metric', metric);
console.log(`Real-time metric recorded: ${result.insertedId}`);
return {
metricId: result.insertedId,
metricType: metric.metricType,
timestamp: metric.timestamp,
timeSeriesOptimized: true
};
} catch (error) {
console.error('Error recording metric:', error);
throw error;
}
}
async createTailableCursor(collectionName, options = {}) {
console.log(`Creating tailable cursor for collection: ${collectionName}`);
try {
const collectionConfig = this.cappedCollections.get(collectionName);
if (!collectionConfig) {
throw new Error(`Collection ${collectionName} not found in capped collections`);
}
if (!collectionConfig.tailableSupported) {
throw new Error(`Collection ${collectionName} does not support tailable cursors`);
}
const collection = collectionConfig.collection;
// Configure tailable cursor options
const tailableOptions = {
tailable: true,
awaitData: true,
noCursorTimeout: true,
maxAwaitTimeMS: options.maxAwaitTimeMS || 1000, // How long each getMore waits for new data
batchSize: options.batchSize || 100,
// Starting position
sort: { $natural: 1 } // Natural insertion order
};
// Create cursor starting from specified position or latest
let cursor;
if (options.fromTimestamp) {
cursor = collection.find({
timestamp: { $gte: options.fromTimestamp },
...(options.additionalFilter || {})
}, tailableOptions);
} else if (options.fromLatest) {
// Start from the end of the collection
const lastDoc = await collection.findOne({}, { sort: { $natural: -1 } });
if (lastDoc) {
cursor = collection.find({
_id: { $gt: lastDoc._id },
...(options.additionalFilter || {})
}, tailableOptions);
} else {
cursor = collection.find(options.additionalFilter || {}, tailableOptions);
}
} else {
cursor = collection.find(options.additionalFilter || {}, tailableOptions);
}
// Store cursor for management
const cursorId = `${collectionName}_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
this.tailableCursors.set(cursorId, {
cursor: cursor,
collectionName: collectionName,
options: tailableOptions,
createdAt: new Date(),
active: true,
// Performance tracking
documentsRead: 0,
lastActivity: new Date()
});
console.log(`Tailable cursor created: ${cursorId} for collection ${collectionName}`);
return {
cursorId: cursorId,
cursor: cursor,
collectionName: collectionName,
tailableEnabled: true,
awaitData: tailableOptions.awaitData
};
} catch (error) {
console.error(`Error creating tailable cursor for ${collectionName}:`, error);
throw error;
}
}
async streamFromTailableCursor(cursorId, eventHandler, errorHandler) {
console.log(`Starting streaming from tailable cursor: ${cursorId}`);
try {
const cursorInfo = this.tailableCursors.get(cursorId);
if (!cursorInfo || !cursorInfo.active) {
throw new Error(`Tailable cursor ${cursorId} not found or inactive`);
}
const cursor = cursorInfo.cursor;
let streaming = true;
// Process documents as they arrive
while (streaming && cursorInfo.active) {
try {
const hasNext = await cursor.hasNext();
if (hasNext) {
const document = await cursor.next();
// Update cursor activity
cursorInfo.documentsRead++;
cursorInfo.lastActivity = new Date();
// Call event handler
if (eventHandler) {
const continueStreaming = await eventHandler(document, {
cursorId: cursorId,
collectionName: cursorInfo.collectionName,
documentsRead: cursorInfo.documentsRead
});
if (continueStreaming === false) {
streaming = false;
}
}
} else {
// No document available right now; pause briefly before polling again
await new Promise(resolve => setTimeout(resolve, 100));
}
} catch (cursorError) {
console.error(`Error in tailable cursor streaming:`, cursorError);
if (errorHandler) {
const shouldContinue = await errorHandler(cursorError, {
cursorId: cursorId,
collectionName: cursorInfo.collectionName
});
if (!shouldContinue) {
streaming = false;
}
} else {
streaming = false;
}
}
}
console.log(`Streaming completed for cursor: ${cursorId}`);
} catch (error) {
console.error(`Error streaming from tailable cursor ${cursorId}:`, error);
throw error;
}
}
async bulkInsertToStream(collectionName, documents, options = {}) {
console.log(`Performing bulk insert to capped collection: ${collectionName}`);
try {
const collectionConfig = this.cappedCollections.get(collectionName);
if (!collectionConfig) {
throw new Error(`Collection ${collectionName} not found in capped collections`);
}
const collection = collectionConfig.collection;
// Prepare documents with capped collection optimization
const optimizedDocuments = documents.map(doc => ({
_id: new ObjectId(),
timestamp: doc.timestamp || new Date(),
...doc,
// Capped collection metadata
insertionOrder: true,
bulkInserted: true,
batchId: options.batchId || new ObjectId().toString()
}));
// Perform optimized bulk insert
const bulkOptions = {
ordered: options.ordered !== false,
writeConcern: { w: 1, j: false }, // Optimized for throughput
bypassDocumentValidation: options.bypassValidation || false
};
const result = await collection.insertMany(optimizedDocuments, bulkOptions);
// Update bulk performance metrics
await this.updateCollectionMetrics(collectionName, 'bulk_insert', {
documentsInserted: optimizedDocuments.length,
batchSize: optimizedDocuments.length,
bulkOperation: true
});
console.log(`Bulk insert completed: ${result.insertedCount} documents inserted to ${collectionName}`);
return {
insertedCount: result.insertedCount,
insertedIds: Object.values(result.insertedIds),
batchId: options.batchId,
cappedOptimized: true,
insertionOrder: true
};
} catch (error) {
console.error(`Error performing bulk insert to ${collectionName}:`, error);
throw error;
}
}
async getCollectionStats(collectionName) {
console.log(`Retrieving statistics for capped collection: ${collectionName}`);
try {
const collectionConfig = this.cappedCollections.get(collectionName);
if (!collectionConfig) {
throw new Error(`Collection ${collectionName} not found`);
}
const collection = collectionConfig.collection;
// Get MongoDB collection statistics
const stats = await this.db.command({ collStats: collectionName });
// Get collection configuration
// Fall back to server-reported limits if the collection pre-existed and no local config was stored
const cappedOptions = collectionConfig.cappedOptions || { size: stats.maxSize, max: stats.max };
// Calculate utilization metrics
const sizeUtilization = (stats.size / cappedOptions.size) * 100;
const countUtilization = cappedOptions.max ? (stats.count / cappedOptions.max) * 100 : 0;
// Get recent activity metrics
const performanceMetrics = this.performanceMetrics.get(collectionName) || {};
const collectionStats = {
collectionName: collectionName,
cappedCollection: stats.capped,
useCase: collectionConfig.useCase,
performanceProfile: collectionConfig.performanceProfile,
// Size and capacity metrics
currentSize: stats.size,
maxSize: cappedOptions.size,
sizeUtilization: Math.round(sizeUtilization * 100) / 100,
currentCount: stats.count,
maxCount: cappedOptions.max || null,
countUtilization: Math.round(countUtilization * 100) / 100,
// Storage details
avgDocumentSize: stats.avgObjSize,
storageSize: stats.storageSize,
totalIndexSize: stats.totalIndexSize,
indexSizes: stats.indexSizes,
// Performance indicators
insertRate: performanceMetrics.insertRate || 0,
queryRate: performanceMetrics.queryRate || 0,
lastInsertTime: performanceMetrics.lastInsertTime || null,
// Capped collection specific
insertionOrder: collectionConfig.insertionOrder,
tailableSupported: collectionConfig.tailableSupported,
// Operational status
healthStatus: this.assessCollectionHealth(sizeUtilization, countUtilization),
recommendations: this.generateRecommendations(collectionName, sizeUtilization, performanceMetrics)
};
console.log(`Statistics retrieved for ${collectionName}: ${collectionStats.currentCount} documents, ${collectionStats.sizeUtilization}% capacity`);
return collectionStats;
} catch (error) {
console.error(`Error retrieving statistics for ${collectionName}:`, error);
throw error;
}
}
// Utility methods for capped collection management
async updateCollectionMetrics(collectionName, operation, metadata) {
if (!this.config.enableMetricsCollection) return;
const now = new Date();
const metrics = this.performanceMetrics.get(collectionName) || {
insertCount: 0,
insertRate: 0,
queryCount: 0,
queryRate: 0,
lastInsertTime: null,
lastQueryTime: null,
operationHistory: []
};
// Update operation counts and rates
if (operation === 'insert' || operation === 'stream' || operation === 'bulk_insert') {
metrics.insertCount += metadata.documentsInserted || 1;
metrics.lastInsertTime = now;
// Calculate insert rate (write operations over the last minute)
const oneMinuteAgo = new Date(now.getTime() - 60000);
const recentInserts = metrics.operationHistory.filter(
op => ['insert', 'stream', 'bulk_insert'].includes(op.type) && op.timestamp > oneMinuteAgo
).length;
metrics.insertRate = recentInserts;
}
// Record operation in history
metrics.operationHistory.push({
type: operation,
timestamp: now,
metadata: metadata
});
// Keep only last 1000 operations for performance
if (metrics.operationHistory.length > 1000) {
metrics.operationHistory = metrics.operationHistory.slice(-1000);
}
this.performanceMetrics.set(collectionName, metrics);
}
assessCollectionHealth(sizeUtilization, countUtilization) {
const maxUtilization = Math.max(sizeUtilization, countUtilization);
if (maxUtilization >= 95) return 'critical';
if (maxUtilization >= 85) return 'warning';
if (maxUtilization >= 70) return 'caution';
return 'healthy';
}
generateRecommendations(collectionName, sizeUtilization, performanceMetrics) {
const recommendations = [];
if (sizeUtilization > 85) {
recommendations.push('Consider increasing capped collection size limit');
}
if (performanceMetrics.insertRate > 10000) {
recommendations.push('High insert rate detected - consider bulk insert optimization');
}
if (sizeUtilization < 30 && performanceMetrics.insertRate < 100) {
recommendations.push('Collection may be oversized for current workload');
}
return recommendations;
}
async closeTailableCursor(cursorId) {
console.log(`Closing tailable cursor: ${cursorId}`);
try {
const cursorInfo = this.tailableCursors.get(cursorId);
if (cursorInfo) {
cursorInfo.active = false;
await cursorInfo.cursor.close();
this.tailableCursors.delete(cursorId);
console.log(`Tailable cursor closed: ${cursorId}`);
}
} catch (error) {
console.error(`Error closing tailable cursor ${cursorId}:`, error);
}
}
async cleanup() {
console.log('Cleaning up Capped Collections Manager...');
// Close all tailable cursors
for (const [cursorId, cursorInfo] of this.tailableCursors) {
try {
cursorInfo.active = false;
await cursorInfo.cursor.close();
} catch (error) {
console.error(`Error closing cursor ${cursorId}:`, error);
}
}
// Clear all management state
this.cappedCollections.clear();
this.tailableCursors.clear();
this.performanceMetrics.clear();
this.capacityMonitors.clear();
console.log('Capped Collections Manager cleanup completed');
}
}
// Example usage demonstrating high-performance streaming with capped collections
async function demonstrateHighPerformanceStreaming() {
const client = new MongoClient('mongodb://localhost:27017');
await client.connect();
const cappedManager = new MongoCappedCollectionManager(client, {
database: 'high_performance_streaming',
enableTailableCursors: true,
enableMetricsCollection: true,
enablePerformanceMonitoring: true
});
try {
// Demonstrate high-volume application logging
console.log('Demonstrating high-performance application logging...');
const logPromises = [];
for (let i = 0; i < 1000; i++) {
logPromises.push(cappedManager.logApplicationEvent({
level: ['INFO', 'WARN', 'ERROR'][Math.floor(Math.random() * 3)],
application: 'web-api',
component: 'user-service',
message: `Processing user request ${i}`,
data: {
userId: `user_${Math.floor(Math.random() * 1000)}`,
operation: 'profile_update',
executionTime: Math.floor(Math.random() * 100) + 10
},
traceId: `trace_${i}`,
requestId: `req_${Date.now()}_${i}`
}));
}
await Promise.all(logPromises);
console.log('High-volume logging completed');
// Demonstrate event streaming with tailable cursor
console.log('Demonstrating real-time event streaming...');
const tailableCursor = await cappedManager.createTailableCursor('event_stream', {
fromLatest: true,
batchSize: 50
});
// Start streaming events in background
const streamingPromise = cappedManager.streamFromTailableCursor(
tailableCursor.cursorId,
async (document, context) => {
console.log(`Streamed event: ${document.eventType} from ${document.eventSource}`);
return context.documentsRead < 100; // Stop after 100 events
},
async (error, context) => {
console.error(`Streaming error:`, error.message);
return false; // Stop on error
}
);
// Generate stream events
const eventPromises = [];
for (let i = 0; i < 100; i++) {
eventPromises.push(cappedManager.streamEvent({
type: ['page_view', 'user_action', 'system_event'][Math.floor(Math.random() * 3)],
source: 'web_application',
streamName: 'user_activity',
payload: {
userId: `user_${Math.floor(Math.random() * 100)}`,
action: 'click',
page: '/dashboard',
timestamp: new Date()
},
correlationId: `correlation_${i}`
}));
// Add small delay to demonstrate real-time streaming
if (i % 10 === 0) {
await new Promise(resolve => setTimeout(resolve, 10));
}
}
await Promise.all(eventPromises);
await streamingPromise;
// Demonstrate bulk metrics insertion
console.log('Demonstrating bulk metrics recording...');
const metrics = [];
for (let i = 0; i < 500; i++) {
metrics.push({
type: 'performance',
name: 'response_time',
value: Math.floor(Math.random() * 1000) + 50,
unit: 'milliseconds',
source: 'api-gateway',
tags: {
endpoint: '/api/users',
method: 'GET',
status_code: 200
}
});
}
await cappedManager.bulkInsertToStream('realtime_metrics', metrics, {
batchId: 'metrics_batch_' + Date.now()
});
// Get collection statistics
const logsStats = await cappedManager.getCollectionStats('application_logs');
const eventsStats = await cappedManager.getCollectionStats('event_stream');
const metricsStats = await cappedManager.getCollectionStats('realtime_metrics');
console.log('High-Performance Streaming Results:');
console.log('Application Logs Stats:', {
count: logsStats.currentCount,
sizeUtilization: logsStats.sizeUtilization,
healthStatus: logsStats.healthStatus
});
console.log('Event Stream Stats:', {
count: eventsStats.currentCount,
sizeUtilization: eventsStats.sizeUtilization,
healthStatus: eventsStats.healthStatus
});
console.log('Metrics Stats:', {
count: metricsStats.currentCount,
sizeUtilization: metricsStats.sizeUtilization,
healthStatus: metricsStats.healthStatus
});
return {
logsStats,
eventsStats,
metricsStats,
tailableCursorDemo: true,
bulkInsertDemo: true
};
} catch (error) {
console.error('Error demonstrating high-performance streaming:', error);
throw error;
} finally {
await cappedManager.cleanup();
await client.close();
}
}
// Benefits of MongoDB Capped Collections:
// - Native circular buffer functionality eliminates manual buffer overflow management
// - Guaranteed insertion order maintains chronological data integrity for time-series applications
// - Automatic size management prevents storage bloat without external cleanup procedures
// - Tailable cursors enable real-time streaming applications with minimal latency
// - Optimized storage patterns provide superior performance for high-volume append-only workloads
// - Zero-maintenance operation reduces operational overhead compared to traditional logging systems
// - Built-in FIFO behavior ensures oldest data is automatically removed when capacity limits are reached
// - Integration with MongoDB's replication and sharding for distributed streaming architectures
module.exports = {
MongoCappedCollectionManager,
demonstrateHighPerformanceStreaming
};
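When a workload starts on a regular collection and only later needs circular-buffer semantics, MongoDB's convertToCapped administrative command can rebuild it as a capped collection. The sketch below is illustrative: the 'legacy_logs' name and the 10 MB cap are assumptions, only the most recent documents that fit within the new size are retained, and the command's locking and index caveats in the MongoDB documentation should be reviewed before running it against production data.
// Minimal sketch (assumed names and sizes): convert an existing uncapped collection
// to a capped collection and verify the result via collStats.
const { MongoClient } = require('mongodb');

async function convertLegacyLogs() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('streaming_platform');

  // Rebuilds 'legacy_logs' as a 10 MB capped collection; documents that no longer fit are dropped
  await db.command({ convertToCapped: 'legacy_logs', size: 10 * 1024 * 1024 });

  // Confirm the conversion and the enforced size limit
  const stats = await db.command({ collStats: 'legacy_logs' });
  console.log('capped:', stats.capped, 'maxSize bytes:', stats.maxSize);

  await client.close();
}

convertLegacyLogs().catch(console.error);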
SQL-Style Capped Collection Management with QueryLeaf
QueryLeaf provides familiar SQL syntax for MongoDB capped collections and circular buffer management:
-- QueryLeaf capped collections with SQL-familiar circular buffer management syntax
-- Configure capped collection settings and performance optimization
SET capped_collection_monitoring = true;
SET enable_tailable_cursors = true;
SET enable_performance_metrics = true;
SET default_capped_size_mb = 1024; -- 1GB default
SET default_capped_max_documents = 1000000;
SET enable_bulk_insert_optimization = true;
-- Create capped collections with circular buffer functionality
WITH capped_collection_definitions AS (
SELECT
collection_name,
capped_size_bytes,
max_document_count,
use_case,
performance_profile,
-- Collection optimization settings
JSON_BUILD_OBJECT(
'capped', true,
'size', capped_size_bytes,
'max', max_document_count,
'storageEngine', JSON_BUILD_OBJECT(
'wiredTiger', JSON_BUILD_OBJECT(
'configString', 'block_compressor=snappy'
)
)
) as creation_options,
-- Index configuration for capped collections
CASE use_case
WHEN 'application_logging' THEN ARRAY[
JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('logLevel', 1, 'timestamp', 1)),
JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('application', 1, 'component', 1)),
JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('traceId', 1), 'sparse', true)
]
WHEN 'event_streaming' THEN ARRAY[
JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('eventType', 1, 'timestamp', 1)),
JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('streamName', 1)),
JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('correlationId', 1), 'sparse', true)
]
WHEN 'metrics_collection' THEN ARRAY[
JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('metricType', 1, 'timestamp', 1)),
JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('source', 1, 'timestamp', -1)),
JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('aggregationLevel', 1))
]
ELSE ARRAY[
JSON_BUILD_OBJECT('key', JSON_BUILD_OBJECT('timestamp', 1))
]
END as index_configuration
FROM (VALUES
('application_logs_capped', 1024 * 1024 * 1024, 1000000, 'application_logging', 'high_throughput'),
('event_stream_capped', 2048 * 1024 * 1024, 5000000, 'event_streaming', 'ultra_high_throughput'),
('realtime_metrics_capped', 512 * 1024 * 1024, 2000000, 'metrics_collection', 'time_series_optimized'),
('audit_trail_capped', 256 * 1024 * 1024, 500000, 'audit_logging', 'compliance_optimized'),
('system_events_capped', 128 * 1024 * 1024, 250000, 'system_monitoring', 'operational_tracking')
) AS collections(collection_name, capped_size_bytes, max_document_count, use_case, performance_profile)
),
-- High-performance application logging with capped collections
application_logs_streaming AS (
INSERT INTO application_logs_capped
SELECT
GENERATE_UUID() as log_id,
CURRENT_TIMESTAMP - (random() * INTERVAL '1 hour') as timestamp,
-- Log classification and severity
(ARRAY['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL'])
[1 + floor(random() * 5)] as log_level,
(ARRAY['web-api', 'auth-service', 'data-processor', 'notification-service'])
[1 + floor(random() * 4)] as application,
(ARRAY['controller', 'service', 'repository', 'middleware'])
[1 + floor(random() * 4)] as component,
-- Log content and context
'Processing request for user operation ' || generate_series(1, 10000) as message,
JSON_BUILD_OBJECT(
'userId', 'user_' || (1 + floor(random() * 1000)),
'operation', (ARRAY['create', 'read', 'update', 'delete', 'search'])[1 + floor(random() * 5)],
'executionTime', floor(random() * 500) + 10,
'memoryUsage', ROUND((random() * 100 + 50)::decimal, 2),
'requestSize', floor(random() * 10000) + 100
) as log_data,
-- Request correlation and tracing
'req_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) || '_' || generate_series(1, 10000) as request_id,
'session_' || (1 + floor(random() * 1000)) as session_id,
'trace_' || generate_series(1, 10000) as trace_id,
'span_' || generate_series(1, 10000) as span_id,
-- Server and environment context
('server_' || (1 + floor(random() * 10))) as hostname,
(1000 + floor(random() * 9000)) as process_id,
'production' as environment,
-- Capped collection metadata
true as insertion_order_guaranteed,
true as circular_buffer_managed,
'high_throughput' as performance_optimized
RETURNING log_id, timestamp, log_level, application
),
-- Real-time event streaming with automatic buffer management
event_stream_operations AS (
INSERT INTO event_stream_capped
SELECT
GENERATE_UUID() as event_id,
CURRENT_TIMESTAMP - (random() * INTERVAL '30 minutes') as timestamp,
-- Event classification
(ARRAY['page_view', 'user_action', 'system_event', 'api_call', 'data_change'])
[1 + floor(random() * 5)] as event_type,
(ARRAY['web_app', 'mobile_app', 'api_gateway', 'background_service'])
[1 + floor(random() * 4)] as event_source,
-- Event payload and metadata
JSON_BUILD_OBJECT(
'userId', 'user_' || (1 + floor(random() * 500)),
'action', (ARRAY['click', 'view', 'submit', 'navigate', 'search'])[1 + floor(random() * 5)],
'page', (ARRAY['/dashboard', '/profile', '/settings', '/reports', '/admin'])[1 + floor(random() * 5)],
'duration', floor(random() * 5000) + 100,
'userAgent', 'Mozilla/5.0 (Enterprise Browser)',
'ipAddress', '192.168.' || (1 + floor(random() * 254)) || '.' || (1 + floor(random() * 254))
) as event_data,
-- Streaming metadata
(ARRAY['user_activity', 'system_monitoring', 'api_analytics', 'security_events'])
[1 + floor(random() * 4)] as stream_name,
'partition_' || (1 + floor(random() * 10)) as partition_key,
EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) * 1000000 + generate_series(1, 50000) as sequence_number,
-- Processing and correlation
false as processed,
0 as processing_attempts,
'correlation_' || generate_series(1, 50000) as correlation_id,
-- Performance optimization metadata
JSON_LENGTH(event_data::text) as event_size_bytes,
floor(random() * 50) + 5 as ingestion_latency_ms,
-- Capped collection optimization
true as tailable_cursor_ready,
true as buffer_optimized,
true as insertion_order_maintained
RETURNING event_id, event_type, stream_name, sequence_number
),
-- High-frequency metrics collection with time-series optimization
metrics_collection_operations AS (
INSERT INTO realtime_metrics_capped
SELECT
GENERATE_UUID() as metric_id,
CURRENT_TIMESTAMP - (random() * INTERVAL '15 minutes') as timestamp,
-- Metric classification
(ARRAY['performance', 'business', 'system', 'security', 'custom'])
[1 + floor(random() * 5)] as metric_type,
(ARRAY['response_time', 'throughput', 'error_rate', 'cpu_usage', 'memory_usage', 'disk_io', 'network_latency'])
[1 + floor(random() * 7)] as metric_name,
-- Metric values and units
CASE
WHEN metric_name IN ('response_time', 'network_latency') THEN random() * 1000 + 10
WHEN metric_name = 'cpu_usage' THEN random() * 100
WHEN metric_name = 'memory_usage' THEN random() * 16 + 2 -- GB
WHEN metric_name = 'error_rate' THEN random() * 5
WHEN metric_name = 'throughput' THEN random() * 10000 + 100
ELSE random() * 1000
END as value,
CASE
WHEN metric_name IN ('response_time', 'network_latency') THEN 'milliseconds'
WHEN metric_name IN ('cpu_usage', 'error_rate') THEN 'percent'
WHEN metric_name = 'memory_usage' THEN 'gigabytes'
WHEN metric_name = 'throughput' THEN 'requests_per_second'
ELSE 'count'
END as unit,
-- Source and tagging
(ARRAY['api-gateway', 'web-server', 'database', 'cache', 'queue'])
[1 + floor(random() * 5)] as source,
'application' as source_type,
JSON_BUILD_OBJECT(
'environment', 'production',
'region', (ARRAY['us-east-1', 'us-west-2', 'eu-west-1'])[1 + floor(random() * 3)],
'service', (ARRAY['auth', 'users', 'orders', 'notifications'])[1 + floor(random() * 4)],
'instance', 'instance_' || (1 + floor(random() * 20))
) as tags,
-- Time series optimization
'raw' as aggregation_level,
NULL as aggregation_window,
-- Bucketing for time-series efficiency (1-minute buckets)
DATE_TRUNC('minute', CURRENT_TIMESTAMP) as bucket_timestamp,
-- Performance metadata
true as time_series_optimized,
EXTRACT(EPOCH FROM CURRENT_TIMESTAMP) * 1000 as collection_timestamp_ms,
floor(random() * 10) + 1 as processing_latency_ms
RETURNING metric_id, metric_type, metric_name, value, source
),
-- Monitor capped collection performance and utilization
capped_collection_monitoring AS (
SELECT
collection_name,
use_case,
performance_profile,
-- Collection capacity analysis
capped_size_bytes as max_size_bytes,
max_document_count as max_documents,
-- Simulated current utilization (in production would query actual stats)
CASE collection_name
WHEN 'application_logs_capped' THEN floor(random() * 800000) + 100000 -- 100k-900k docs
WHEN 'event_stream_capped' THEN floor(random() * 4000000) + 500000 -- 500k-4.5M docs
WHEN 'realtime_metrics_capped' THEN floor(random() * 1500000) + 200000 -- 200k-1.7M docs
WHEN 'audit_trail_capped' THEN floor(random() * 300000) + 50000 -- 50k-350k docs
ELSE floor(random() * 100000) + 10000
END as current_document_count,
-- Estimated current size (simplified calculation)
CASE collection_name
WHEN 'application_logs_capped' THEN floor(random() * 800000000) + 100000000 -- 100MB-800MB
WHEN 'event_stream_capped' THEN floor(random() * 1600000000) + 200000000 -- 200MB-1.6GB
WHEN 'realtime_metrics_capped' THEN floor(random() * 400000000) + 50000000 -- 50MB-400MB
WHEN 'audit_trail_capped' THEN floor(random() * 200000000) + 25000000 -- 25MB-200MB
ELSE floor(random() * 50000000) + 10000000
END as current_size_bytes,
-- Performance simulation
CASE performance_profile
WHEN 'ultra_high_throughput' THEN floor(random() * 50000) + 10000 -- 10k-60k inserts/sec
WHEN 'high_throughput' THEN floor(random() * 20000) + 5000 -- 5k-25k inserts/sec
WHEN 'time_series_optimized' THEN floor(random() * 15000) + 3000 -- 3k-18k inserts/sec
WHEN 'compliance_optimized' THEN floor(random() * 5000) + 1000 -- 1k-6k inserts/sec
ELSE floor(random() * 2000) + 500 -- 500-2.5k inserts/sec
END as estimated_insert_rate_per_sec
FROM capped_collection_definitions
),
-- Calculate utilization metrics and health assessment
capped_utilization_analysis AS (
SELECT
ccm.collection_name,
ccm.use_case,
ccm.performance_profile,
-- Capacity utilization
ccm.current_document_count,
ccm.max_documents,
ROUND((ccm.current_document_count::decimal / ccm.max_documents::decimal) * 100, 1) as document_utilization_percent,
ccm.current_size_bytes,
ccm.max_size_bytes,
ROUND((ccm.current_size_bytes::decimal / ccm.max_size_bytes::decimal) * 100, 1) as size_utilization_percent,
-- Performance metrics
ccm.estimated_insert_rate_per_sec,
ROUND(ccm.current_size_bytes::decimal / ccm.current_document_count::decimal, 2) as avg_document_size_bytes,
-- Storage efficiency
ROUND(ccm.current_size_bytes / (1024 * 1024)::decimal, 2) as current_size_mb,
ROUND(ccm.max_size_bytes / (1024 * 1024)::decimal, 2) as max_size_mb,
-- Operational assessment
CASE
WHEN GREATEST(
(ccm.current_document_count::decimal / ccm.max_documents::decimal) * 100,
(ccm.current_size_bytes::decimal / ccm.max_size_bytes::decimal) * 100
) >= 95 THEN 'critical'
WHEN GREATEST(
(ccm.current_document_count::decimal / ccm.max_documents::decimal) * 100,
(ccm.current_size_bytes::decimal / ccm.max_size_bytes::decimal) * 100
) >= 85 THEN 'warning'
WHEN GREATEST(
(ccm.current_document_count::decimal / ccm.max_documents::decimal) * 100,
(ccm.current_size_bytes::decimal / ccm.max_size_bytes::decimal) * 100
) >= 70 THEN 'caution'
ELSE 'healthy'
END as health_status,
-- Throughput assessment
CASE
WHEN ccm.estimated_insert_rate_per_sec > 25000 THEN 'ultra_high'
WHEN ccm.estimated_insert_rate_per_sec > 10000 THEN 'high'
WHEN ccm.estimated_insert_rate_per_sec > 5000 THEN 'medium'
WHEN ccm.estimated_insert_rate_per_sec > 1000 THEN 'moderate'
ELSE 'low'
END as throughput_classification
FROM capped_collection_monitoring ccm
),
-- Generate optimization recommendations
capped_optimization_recommendations AS (
SELECT
cua.collection_name,
cua.health_status,
cua.throughput_classification,
cua.document_utilization_percent,
cua.size_utilization_percent,
-- Capacity recommendations
CASE
WHEN cua.size_utilization_percent > 90 THEN 'Increase capped collection size immediately'
WHEN cua.document_utilization_percent > 90 THEN 'Increase document count limit immediately'
WHEN cua.size_utilization_percent > 80 THEN 'Monitor closely and consider size increase'
WHEN cua.size_utilization_percent < 30 AND cua.throughput_classification = 'low' THEN 'Consider reducing collection size for efficiency'
ELSE 'Capacity within optimal range'
END as capacity_recommendation,
-- Performance recommendations
CASE
WHEN cua.throughput_classification = 'ultra_high' THEN 'Optimize for maximum throughput with bulk inserts'
WHEN cua.throughput_classification = 'high' THEN 'Enable write optimization and consider sharding'
WHEN cua.throughput_classification = 'medium' THEN 'Standard configuration appropriate'
WHEN cua.throughput_classification = 'low' THEN 'Consider consolidating with other collections'
ELSE 'Review usage patterns'
END as performance_recommendation,
-- Operational recommendations
CASE
WHEN cua.health_status = 'critical' THEN 'Immediate intervention required'
WHEN cua.health_status = 'warning' THEN 'Plan capacity expansion within 24 hours'
WHEN cua.health_status = 'caution' THEN 'Monitor usage trends and prepare for expansion'
ELSE 'Continue monitoring with current configuration'
END as operational_recommendation,
-- Efficiency metrics
ROUND(cua.estimated_insert_rate_per_sec::decimal / (cua.size_utilization_percent / 100::decimal), 2) as efficiency_ratio,
-- Projected timeline to capacity
CASE
WHEN cua.estimated_insert_rate_per_sec > 0 AND cua.document_utilization_percent < 95 THEN
ROUND(
(cua.max_documents - cua.current_document_count)::decimal /
(cua.estimated_insert_rate_per_sec::decimal * 3600),
1
)
ELSE NULL
END as hours_to_document_capacity,
-- Circular buffer efficiency
CASE
WHEN cua.size_utilization_percent > 90 THEN 'Active circular buffer management'
WHEN cua.size_utilization_percent > 70 THEN 'Approaching circular buffer activation'
ELSE 'Pre-circular buffer phase'
END as circular_buffer_status
FROM capped_utilization_analysis cua
)
-- Comprehensive capped collections management dashboard
SELECT
cor.collection_name,
cua.use_case,
cor.throughput_classification,
cor.health_status,
-- Current state
cua.current_document_count as documents,
cua.document_utilization_percent || '%' as doc_utilization,
cua.current_size_mb || ' MB' as current_size,
cua.size_utilization_percent || '%' as size_utilization,
-- Performance metrics
cua.estimated_insert_rate_per_sec as inserts_per_second,
ROUND(cua.avg_document_size_bytes / 1024, 2) || ' KB' as avg_doc_size,
cor.efficiency_ratio as efficiency_score,
-- Capacity management
cor.circular_buffer_status,
COALESCE(cor.hours_to_document_capacity || ' hours', 'N/A') as time_to_capacity,
-- Operational guidance
cor.capacity_recommendation,
cor.performance_recommendation,
cor.operational_recommendation,
-- Capped collection benefits
JSON_BUILD_OBJECT(
'guaranteed_insertion_order', true,
'automatic_size_management', true,
'circular_buffer_behavior', true,
'tailable_cursor_support', true,
'high_performance_writes', true,
'zero_maintenance_required', true
) as capped_collection_features,
-- Next actions
CASE cor.health_status
WHEN 'critical' THEN 'Execute capacity expansion immediately'
WHEN 'warning' THEN 'Schedule capacity planning meeting'
WHEN 'caution' THEN 'Increase monitoring frequency'
ELSE 'Continue standard monitoring'
END as immediate_actions,
-- Optimization opportunities
CASE
WHEN cor.throughput_classification = 'ultra_high' AND cua.size_utilization_percent < 50 THEN
'Optimize collection size for current throughput'
WHEN cor.efficiency_ratio > 1000 THEN
'Excellent efficiency - consider as template for other collections'
WHEN cor.efficiency_ratio < 100 THEN
'Review configuration for efficiency improvements'
ELSE 'Configuration optimized for current workload'
END as optimization_opportunities
FROM capped_optimization_recommendations cor
JOIN capped_utilization_analysis cua ON cor.collection_name = cua.collection_name
ORDER BY
CASE cor.health_status
WHEN 'critical' THEN 1
WHEN 'warning' THEN 2
WHEN 'caution' THEN 3
ELSE 4
END,
cua.size_utilization_percent DESC;
-- QueryLeaf provides comprehensive MongoDB capped collection capabilities:
-- 1. Native circular buffer functionality with SQL-familiar collection management syntax
-- 2. Automatic size and document count management without manual cleanup procedures
-- 3. Tailable cursor and real-time processing support for high-performance streaming applications
-- 4. Time-series optimized storage patterns for metrics, logs, and event data
-- 5. Enterprise-grade monitoring with capacity utilization and performance analytics
-- 6. Guaranteed insertion order maintenance for chronological data integrity
-- 7. Integration with MongoDB's replication and sharding for distributed streaming architectures
-- 8. SQL-style capped collection operations for familiar database management workflows
-- 9. Advanced performance optimization with bulk insert and streaming operation support
-- 10. Zero-maintenance circular buffer management with automatic FIFO behavior and overflow handling
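The dashboard query above works from simulated counts and sizes; against a live deployment the same utilization figures can be read directly from the server. The sketch below uses the standard PyMongo driver and the collStats command rather than QueryLeaf SQL; the connection string, database name, and collection list are illustrative assumptions.
# Minimal capped collection utilization check with PyMongo -- an illustrative sketch,
# not QueryLeaf syntax; connection details and collection names are assumptions.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["streaming_platform"]

CAPPED_COLLECTIONS = [
    "application_logs_capped",
    "event_stream_capped",
    "realtime_metrics_capped",
    "audit_trail_capped",
]

def capped_utilization(name):
    """Read live utilization figures for one capped collection via collStats."""
    stats = db.command("collStats", name)
    if not stats.get("capped"):
        return None  # skip collections that are not capped
    report = {
        "collection": name,
        "documents": stats["count"],
        "size_bytes": stats["size"],
    }
    if stats.get("maxSize"):
        report["size_utilization_pct"] = round(100 * stats["size"] / stats["maxSize"], 1)
    if stats.get("max"):
        report["doc_utilization_pct"] = round(100 * stats["count"] / stats["max"], 1)
    return report

for collection_name in CAPPED_COLLECTIONS:
    metrics = capped_utilization(collection_name)
    if metrics:
        print(metrics)
These driver-level numbers can feed the same health thresholds used in the SQL dashboard above (caution at 70%, warning at 85%, critical at 95%).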
Best Practices for MongoDB Capped Collections Implementation
High-Performance Streaming Architecture
Essential practices for implementing capped collections effectively in production environments:
- Size Planning Strategy: Plan capped collection sizes based on data velocity, retention requirements, and query patterns for optimal performance
- Index Optimization: Use minimal, strategic indexing that supports query patterns without impacting insert performance
- Tailable Cursor Management: Implement robust tailable cursor patterns for real-time data consumption with proper error handling (a PyMongo sketch follows this list)
- Monitoring and Alerting: Establish comprehensive monitoring for collection capacity, insertion rates, and performance metrics
- Integration Patterns: Design application integration that leverages natural insertion order and circular buffer behavior
- Performance Baselines: Establish performance baselines for insert rates, query response times, and storage utilization
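As a concrete illustration of the tailable cursor pattern recommended above, the sketch below tails a capped collection with PyMongo, resuming from the last observed timestamp and re-opening the cursor whenever it dies. The collection name, the ts field, and the retry intervals are assumptions for this example, not prescribed values.
# Tailable cursor consumer for a capped collection -- an illustrative PyMongo sketch.
# The "event_stream_capped" collection and its "ts" field are assumed names.
import time
from pymongo import MongoClient, CursorType

client = MongoClient("mongodb://localhost:27017")
events = client["streaming_platform"]["event_stream_capped"]

def tail_events(handler, last_ts=None):
    """Consume documents in insertion order, forever, reopening dead cursors."""
    while True:
        query = {"ts": {"$gt": last_ts}} if last_ts else {}
        # TAILABLE_AWAIT keeps the cursor open and lets the server wait briefly for new data
        cursor = events.find(query, cursor_type=CursorType.TAILABLE_AWAIT)
        try:
            while cursor.alive:
                for doc in cursor:
                    handler(doc)
                    last_ts = doc.get("ts", last_ts)
                time.sleep(0.5)  # no new data yet; poll the same cursor again
        except Exception as exc:
            print(f"Tailable cursor error, reopening: {exc}")
            time.sleep(1)
        finally:
            cursor.close()

# Example usage: print every new event as it arrives
# tail_events(lambda doc: print(doc))
Tracking the last seen timestamp lets the consumer resume after a restart without reprocessing the entire buffer, while the outer loop handles the case where the cursor dies because no matching document existed yet or the oldest match was overwritten.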
Production Deployment and Scalability
Optimize capped collections for enterprise-scale streaming requirements:
- Capacity Management: Implement proactive capacity monitoring with automated alerting before reaching collection limits (see the expansion sketch after this list)
- Replication Strategy: Configure capped collections across replica sets with considerations for network bandwidth and lag
- Sharding Considerations: Capped collections cannot be sharded, so plan alternatives such as standard collections with TTL indexes for streams that must scale across shards
- Backup Integration: Design backup strategies that account for circular buffer behavior and data rotation patterns
- Operational Procedures: Create standardized procedures for capped collection management, capacity expansion, and performance tuning
- Disaster Recovery: Plan for capped collection recovery scenarios with considerations for data loss tolerance and restoration priorities
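To make the capacity management and capacity expansion points concrete, the sketch below checks size utilization with collStats and, once a threshold is crossed, grows the collection in place using collMod with cappedSize. Resizing capped collections this way requires MongoDB 6.0 or newer; the threshold, growth factor, and collection name are illustrative assumptions.
# Threshold-based capped collection expansion -- an illustrative PyMongo sketch.
# collMod with cappedSize requires MongoDB 6.0+; names, threshold, and growth factor are assumptions.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["streaming_platform"]

SIZE_ALERT_THRESHOLD = 0.85  # expand once 85% of the size cap is used
GROWTH_FACTOR = 2            # double the size cap on each expansion

def expand_if_needed(collection_name):
    """Grow a capped collection's size cap in place when utilization crosses the threshold."""
    stats = db.command("collStats", collection_name)
    if not stats.get("capped") or not stats.get("maxSize"):
        return
    utilization = stats["size"] / stats["maxSize"]
    if utilization < SIZE_ALERT_THRESHOLD:
        return
    new_size = int(stats["maxSize"] * GROWTH_FACTOR)
    # Existing documents and insertion order are preserved; only the cap changes
    db.command({"collMod": collection_name, "cappedSize": new_size})
    print(f"{collection_name}: cappedSize raised to {new_size} bytes "
          f"({utilization:.0%} of the previous cap was in use)")

expand_if_needed("application_logs_capped")
In production this check would typically run from a scheduler or monitoring agent and raise an alert alongside (or instead of) the automatic expansion, depending on how much headroom the team wants to manage manually.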
Conclusion
MongoDB capped collections provide enterprise-grade circular buffer functionality that eliminates manual buffer management complexity while delivering superior performance for high-volume streaming applications. The native FIFO behavior combined with guaranteed insertion order and tailable cursor support makes capped collections ideal for logging, event streaming, metrics collection, and real-time data processing scenarios.
Key MongoDB Capped Collection benefits include:
- Circular Buffer Management: Automatic size management with FIFO behavior eliminates manual cleanup and rotation procedures
- Guaranteed Insertion Order: Natural insertion order maintains chronological integrity for time-series and logging applications
- High-Performance Writes: Optimized storage patterns provide maximum throughput for append-heavy workloads
- Real-Time Streaming: Tailable cursors enable efficient real-time data consumption with minimal latency
- Zero Maintenance: No manual intervention required for buffer overflow management or data rotation
- SQL Accessibility: Familiar capped collection management through SQL-style syntax and operations
Whether you're building logging systems, event streaming platforms, metrics collection infrastructure, or real-time monitoring applications, MongoDB capped collections with QueryLeaf's familiar SQL interface provide the foundation for scalable, efficient, and maintainable streaming data architectures.
QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB capped collections while providing SQL-familiar syntax for circular buffer management, streaming operations, and performance monitoring. Advanced capped collection patterns, tailable cursor management, and high-throughput optimization techniques are seamlessly accessible through familiar SQL constructs, making sophisticated streaming data management both powerful and approachable for SQL-oriented development teams.
The combination of MongoDB's native circular buffer capabilities with SQL-style streaming operations makes MongoDB an ideal platform for applications that require both high-performance data ingestion and familiar operational patterns, ensuring your streaming architectures can handle enterprise-scale data volumes while remaining simple to operate.