MongoDB Transaction Error Handling and Recovery Patterns: Building Resilient Applications with Advanced Error Management and Automatic Retry Strategies

Production MongoDB applications need error handling and recovery mechanisms that can gracefully manage transaction failures, network interruptions, server unavailability, and resource constraints while preserving data consistency and application reliability. Traditional database error handling rarely accounts for the realities of distributed systems, which leads to incomplete transactions, data inconsistencies, and poor user experiences under complex failure scenarios.

MongoDB provides comprehensive transaction error handling through retryable error labels, detailed error classification, and recovery patterns that enable applications to maintain consistency and reliability through network partitions, replica set failovers, and resource contention. Unlike traditional databases that expose only basic error codes and limited retry logic, MongoDB attaches error labels such as TransientTransactionError and UnknownTransactionCommitResult to failures, giving drivers and applications the diagnostic information needed to distinguish safely retryable errors from those that require intervention.
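
Much of this behavior is already available at the driver level: a transaction that fails with the TransientTransactionError label can be re-run from the start, and a commit that fails with UnknownTransactionCommitResult can be retried on its own. The following is a minimal sketch of that pattern with the Node.js driver; the connection URI and the transferFunds callback are placeholders for your own deployment and business logic.

// Minimal sketch: label-aware transaction retry with the Node.js driver.
// The uri argument and the transferFunds callback are hypothetical placeholders.
const { MongoClient } = require('mongodb');

async function runWithTransientRetry(uri, transferFunds) {
  const client = new MongoClient(uri);
  await client.connect();
  const session = client.startSession();

  try {
    // withTransaction retries the callback on TransientTransactionError and
    // retries the commit on UnknownTransactionCommitResult automatically.
    await session.withTransaction(async () => {
      await transferFunds(client, session);
    });
  } catch (error) {
    // Errors that still escape are either non-retryable or exhausted the
    // driver's retry window; inspect the label before deciding what to do.
    if (error.hasErrorLabel && error.hasErrorLabel('TransientTransactionError')) {
      console.warn('Transient error persisted after driver retries:', error.message);
    }
    throw error;
  } finally {
    await session.endSession();
    await client.close();
  }
}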

The Traditional Transaction Error Handling Challenge

Conventional approaches to database transaction error management in enterprise applications face significant limitations in resilience and recovery capabilities:

-- Traditional PostgreSQL transaction error handling - basic error management with limited recovery options

-- Simple transaction error tracking table
CREATE TABLE transaction_error_log (
    error_id SERIAL PRIMARY KEY,
    transaction_id UUID,
    connection_id VARCHAR(100),

    -- Basic error information
    error_code VARCHAR(20),
    error_message TEXT,
    error_category VARCHAR(50), -- connection, constraint, timeout, etc.

    -- Timing information
    transaction_start_time TIMESTAMP,
    error_occurred_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Context information (limited)
    table_name VARCHAR(100),
    operation_type VARCHAR(20), -- INSERT, UPDATE, DELETE, SELECT
    affected_rows INTEGER,

    -- Simple retry tracking
    retry_count INTEGER DEFAULT 0,
    max_retries INTEGER DEFAULT 3,
    retry_successful BOOLEAN DEFAULT FALSE,

    -- Manual resolution tracking
    resolved_at TIMESTAMP,
    resolution_method VARCHAR(100),
    resolved_by VARCHAR(100)
);

-- Basic transaction state tracking
CREATE TABLE active_transactions (
    transaction_id UUID PRIMARY KEY,
    connection_id VARCHAR(100) NOT NULL,
    start_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Simple state management
    transaction_status VARCHAR(20) DEFAULT 'active', -- active, committed, rolled_back, failed
    isolation_level VARCHAR(30),
    read_only BOOLEAN DEFAULT FALSE,

    -- Basic operation tracking
    operations_count INTEGER DEFAULT 0,
    tables_affected TEXT[], -- Simple array of table names

    -- Timeout management (basic)
    timeout_seconds INTEGER DEFAULT 300,
    last_activity TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Error tracking
    error_count INTEGER DEFAULT 0,
    last_error_message TEXT,

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Manual transaction recovery procedure (limited functionality)
CREATE OR REPLACE FUNCTION recover_failed_transaction(
    p_transaction_id UUID,
    p_recovery_strategy VARCHAR(50) DEFAULT 'rollback'
) RETURNS TABLE (
    recovery_status VARCHAR(20),
    recovery_message TEXT,
    operations_recovered INTEGER
) AS $$
DECLARE
    v_transaction_record RECORD;
    v_recovery_count INTEGER := 0;
    v_retry_count INTEGER;
    v_max_retries INTEGER;
BEGIN
    -- Get transaction details
    SELECT * INTO v_transaction_record 
    FROM active_transactions 
    WHERE transaction_id = p_transaction_id;

    IF NOT FOUND THEN
        RETURN QUERY SELECT 'error'::VARCHAR(20), 
                           'Transaction not found'::TEXT, 
                           0::INTEGER;
        RETURN;
    END IF;

    -- Check retry limits (basic logic)
    SELECT retry_count, max_retries INTO v_retry_count, v_max_retries
    FROM transaction_error_log
    WHERE transaction_id = p_transaction_id
    ORDER BY error_occurred_at DESC
    LIMIT 1;

    IF v_retry_count >= v_max_retries THEN
        RETURN QUERY SELECT 'failed'::VARCHAR(20), 
                           'Maximum retries exceeded'::TEXT, 
                           0::INTEGER;
        RETURN;
    END IF;

    -- Simple recovery strategies
    CASE p_recovery_strategy
        WHEN 'rollback' THEN
            BEGIN
                -- Attempt to rollback (very basic)
                UPDATE active_transactions 
                SET transaction_status = 'rolled_back',
                    updated_at = CURRENT_TIMESTAMP
                WHERE transaction_id = p_transaction_id;

                v_recovery_count := 1;

                RETURN QUERY SELECT 'success'::VARCHAR(20), 
                                   'Transaction rolled back'::TEXT, 
                                   v_recovery_count::INTEGER;
            EXCEPTION WHEN OTHERS THEN
                RETURN QUERY SELECT 'error'::VARCHAR(20), 
                                   SQLERRM::TEXT, 
                                   0::INTEGER;
            END;

        WHEN 'retry' THEN
            BEGIN
                -- Basic retry logic (very limited)
                UPDATE transaction_error_log 
                SET retry_count = retry_count + 1,
                    retry_successful = FALSE
                WHERE transaction_id = p_transaction_id;

                -- Reset transaction status for retry
                UPDATE active_transactions 
                SET transaction_status = 'active',
                    error_count = 0,
                    last_error_message = NULL,
                    updated_at = CURRENT_TIMESTAMP
                WHERE transaction_id = p_transaction_id;

                v_recovery_count := 1;

                RETURN QUERY SELECT 'retry'::VARCHAR(20), 
                                   'Transaction queued for retry'::TEXT, 
                                   v_recovery_count::INTEGER;
            EXCEPTION WHEN OTHERS THEN
                RETURN QUERY SELECT 'error'::VARCHAR(20), 
                                   SQLERRM::TEXT, 
                                   0::INTEGER;
            END;

        ELSE
            RETURN QUERY SELECT 'error'::VARCHAR(20), 
                               'Unknown recovery strategy'::TEXT, 
                               0::INTEGER;
    END CASE;
END;
$$ LANGUAGE plpgsql;

-- Basic transaction monitoring query (limited insights)
WITH transaction_health AS (
    SELECT 
        DATE_TRUNC('hour', start_time) as hour_bucket,

        -- Simple transaction metrics
        COUNT(*) as total_transactions,
        COUNT(CASE WHEN transaction_status = 'committed' THEN 1 END) as successful_transactions,
        COUNT(CASE WHEN transaction_status = 'rolled_back' THEN 1 END) as rolled_back_transactions,
        COUNT(CASE WHEN transaction_status = 'failed' THEN 1 END) as failed_transactions,
        COUNT(CASE WHEN transaction_status = 'active' AND 
                        EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - last_activity)) > timeout_seconds 
                  THEN 1 END) as timed_out_transactions,

        -- Basic performance metrics
        AVG(operations_count) as avg_operations_per_transaction,
        AVG(EXTRACT(EPOCH FROM (updated_at - start_time))) as avg_transaction_duration_seconds,

        -- Simple error analysis
        AVG(error_count) as avg_errors_per_transaction,
        COUNT(CASE WHEN error_count > 0 THEN 1 END) as transactions_with_errors

    FROM active_transactions
    WHERE start_time >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    GROUP BY DATE_TRUNC('hour', start_time)
),

error_analysis AS (
    SELECT 
        DATE_TRUNC('hour', error_occurred_at) as hour_bucket,
        error_category,

        -- Error statistics
        COUNT(*) as error_count,
        COUNT(CASE WHEN retry_successful = TRUE THEN 1 END) as successful_retries,
        AVG(retry_count) as avg_retry_attempts,

        -- Common errors
        COUNT(CASE WHEN error_code LIKE 'SQLSTATE%' THEN 1 END) as sql_state_errors,
        COUNT(CASE WHEN error_message ILIKE '%timeout%' THEN 1 END) as timeout_errors,
        COUNT(CASE WHEN error_message ILIKE '%connection%' THEN 1 END) as connection_errors,
        COUNT(CASE WHEN error_message ILIKE '%deadlock%' THEN 1 END) as deadlock_errors

    FROM transaction_error_log
    WHERE error_occurred_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    GROUP BY DATE_TRUNC('hour', error_occurred_at), error_category
)

SELECT 
    th.hour_bucket,

    -- Transaction metrics
    th.total_transactions,
    th.successful_transactions,
    th.failed_transactions,
    ROUND((th.successful_transactions::DECIMAL / GREATEST(th.total_transactions, 1)) * 100, 2) as success_rate_percent,

    -- Performance metrics
    ROUND(th.avg_transaction_duration_seconds, 3) as avg_duration_seconds,
    ROUND(th.avg_operations_per_transaction, 1) as avg_operations,

    -- Error metrics
    COALESCE(SUM(ea.error_count), 0) as total_errors,
    COALESCE(SUM(ea.successful_retries), 0) as successful_retries,
    COALESCE(ROUND(AVG(ea.avg_retry_attempts), 1), 0) as avg_retry_attempts,

    -- Error categories
    COALESCE(SUM(ea.timeout_errors), 0) as timeout_errors,
    COALESCE(SUM(ea.connection_errors), 0) as connection_errors,
    COALESCE(SUM(ea.deadlock_errors), 0) as deadlock_errors,

    -- Health indicators
    th.timed_out_transactions,
    CASE 
        WHEN ROUND((th.successful_transactions::DECIMAL / GREATEST(th.total_transactions, 1)) * 100, 2) >= 95 THEN 'Healthy'
        WHEN ROUND((th.successful_transactions::DECIMAL / GREATEST(th.total_transactions, 1)) * 100, 2) >= 90 THEN 'Warning'
        ELSE 'Critical'
    END as health_status

FROM transaction_health th
LEFT JOIN error_analysis ea ON th.hour_bucket = ea.hour_bucket
GROUP BY th.hour_bucket, th.total_transactions, th.successful_transactions, 
         th.failed_transactions, th.avg_transaction_duration_seconds, 
         th.avg_operations_per_transaction, th.timed_out_transactions
ORDER BY th.hour_bucket DESC;

-- Problems with traditional transaction error handling:
-- 1. Basic error categorization with limited diagnostic information
-- 2. Manual retry logic without intelligent backoff strategies
-- 3. No automatic recovery based on error type and context
-- 4. Limited visibility into transaction state and progress
-- 5. Basic timeout handling without consideration of operation complexity
-- 6. No integration with connection pool health and server status
-- 7. Manual intervention required for most recovery scenarios
-- 8. Limited support for distributed transaction patterns
-- 9. Basic error aggregation without trend analysis
-- 10. No automatic optimization based on error patterns

MongoDB's intelligent transaction error handling, applied through application-side patterns like the transaction manager below, addresses these limitations:

// MongoDB advanced transaction error handling - intelligent and resilient
const { MongoClient } = require('mongodb');

// Comprehensive transaction error handling and recovery system
class MongoTransactionManager {
  constructor(client, options = {}) {
    this.client = client;
    this.options = {
      // Retry configuration
      maxRetryAttempts: options.maxRetryAttempts || 5,
      initialRetryDelayMs: options.initialRetryDelayMs || 100,
      maxRetryDelayMs: options.maxRetryDelayMs || 5000,
      retryDelayMultiplier: options.retryDelayMultiplier || 2,
      jitterFactor: options.jitterFactor || 0.1,

      // Transaction configuration
      defaultTransactionOptions: {
        readConcern: { level: options.readConcernLevel || 'snapshot' },
        writeConcern: { w: options.writeConcernW || 'majority', j: true },
        readPreference: options.readPreference || 'primary',
        maxCommitTimeMS: options.maxCommitTimeMS || 10000
      },

      // Error handling configuration
      retryableErrorCodes: options.retryableErrorCodes || [
        112, // WriteConflict
        117, // ConflictingOperationInProgress  
        133, // FailedToSatisfyReadPreference
        134, // ReadConcernMajorityNotAvailableYet
        208, // ExceededTimeLimit
        225, // LockTimeout
        244, // TransactionTooLarge
        251, // NoSuchTransaction
        256, // TransactionAborted
        261, // ExceededMaxTimeMS
        263, // TemporarilyUnavailable
        6   // HostUnreachable
      ],

      // Monitoring configuration
      // Default to enabled unless explicitly set to false
      enableDetailedLogging: options.enableDetailedLogging !== false,
      enableMetricsCollection: options.enableMetricsCollection !== false
    };

    this.transactionMetrics = {
      totalTransactions: 0,
      successfulTransactions: 0,
      failedTransactions: 0,
      retriedTransactions: 0,
      totalRetryAttempts: 0,
      errorsByCode: new Map(),
      errorsByCategory: new Map(),
      performanceStats: {
        averageTransactionDuration: 0,
        transactionDurations: [],
        retryDelays: [],
        averageRetryDelay: 0
      }
    };

    this.activeTransactions = new Map();
  }

  // Execute transaction with comprehensive error handling and retry logic
  async executeTransactionWithRetry(transactionFunction, transactionOptions = {}) {
    const transactionId = this.generateTransactionId();
    const startTime = Date.now();

    // Merge transaction options
    const mergedOptions = {
      ...this.options.defaultTransactionOptions,
      ...transactionOptions
    };

    let attempt = 1;
    let lastError = null;
    let session = null;

    // Track active transaction
    this.activeTransactions.set(transactionId, {
      id: transactionId,
      startTime: startTime,
      attempt: attempt,
      status: 'active',
      operationsExecuted: 0,
      errors: []
    });

    try {
      while (attempt <= this.options.maxRetryAttempts) {
        try {
          // Create new session for each attempt
          session = this.client.startSession();

          this.log(`Starting transaction ${transactionId}, attempt ${attempt}`);

          // Update transaction tracking
          this.updateTransactionStatus(transactionId, 'active', { attempt });

          // Execute transaction with intelligent error handling
          const result = await session.withTransaction(
            async (sessionContext) => {
              try {
                // Execute the user-provided transaction function
                const transactionResult = await transactionFunction(sessionContext, {
                  transactionId,
                  attempt,
                  onOperation: (operation) => this.trackOperation(transactionId, operation)
                });

                this.log(`Transaction ${transactionId} executed successfully on attempt ${attempt}`);
                return transactionResult;

              } catch (error) {
                this.log(`Transaction ${transactionId} error in user function:`, error);
                throw error;
              }
            },
            mergedOptions
          );

          // Transaction successful
          const duration = Date.now() - startTime;

          this.updateTransactionStatus(transactionId, 'committed', { 
            duration,
            totalAttempts: attempt 
          });

          this.recordSuccessfulTransaction(transactionId, duration, attempt);

          this.log(`Transaction ${transactionId} committed successfully after ${attempt} attempts (${duration}ms)`);

          return {
            success: true,
            result: result,
            transactionId: transactionId,
            attempts: attempt,
            duration: duration,
            metrics: this.getTransactionMetrics(transactionId)
          };

        } catch (error) {
          lastError = error;

          this.log(`Transaction ${transactionId} attempt ${attempt} failed:`, error);

          // Record error for analysis
          this.recordTransactionError(transactionId, error, attempt);

          // Analyze error and determine if retry is appropriate
          const errorAnalysis = this.analyzeTransactionError(error);

          if (!errorAnalysis.retryable || attempt >= this.options.maxRetryAttempts) {
            // Error is not retryable or max attempts reached
            this.updateTransactionStatus(transactionId, 'failed', { 
              finalError: error,
              totalAttempts: attempt,
              errorAnalysis 
            });

            break;
          }

          // Calculate intelligent retry delay
          const retryDelay = this.calculateRetryDelay(attempt, errorAnalysis);

          this.log(`Transaction ${transactionId} will retry in ${retryDelay}ms (attempt ${attempt + 1}/${this.options.maxRetryAttempts})`);

          // Update metrics
          this.transactionMetrics.totalRetryAttempts++;
          this.transactionMetrics.performanceStats.retryDelays.push(retryDelay);

          // Wait before retry
          if (retryDelay > 0) {
            await this.sleep(retryDelay);
          }

          attempt++;

        } finally {
          // Always close session
          if (session) {
            try {
              await session.endSession();
            } catch (sessionError) {
              this.log(`Error ending session for transaction ${transactionId}:`, sessionError);
            }
          }
        }
      }

      // All retries exhausted
      const totalDuration = Date.now() - startTime;

      // The loop always exits via break, so 'attempt' equals the attempts actually made
      this.recordFailedTransaction(transactionId, lastError, attempt, totalDuration);

      this.log(`Transaction ${transactionId} failed after ${attempt} attempts (${totalDuration}ms)`);

      return {
        success: false,
        error: lastError,
        transactionId: transactionId,
        attempts: attempt,
        duration: totalDuration,
        errorAnalysis: this.analyzeTransactionError(lastError),
        metrics: this.getTransactionMetrics(transactionId),
        recoveryRecommendations: this.generateRecoveryRecommendations(transactionId, lastError)
      };

    } finally {
      // Clean up transaction tracking
      this.activeTransactions.delete(transactionId);
    }
  }

  // Intelligent error analysis for MongoDB transactions
  analyzeTransactionError(error) {
    const analysis = {
      errorCode: error.code,
      errorMessage: error.message,
      errorName: error.name,
      retryable: false,
      category: 'unknown',
      severity: 'medium',
      recommendedAction: 'investigate',
      estimatedRecoveryTime: 0,
      contextualInfo: {}
    };

    // Categorize error based on code and message
    if (error.code) {
      // Transient errors that should be retried
      if (this.options.retryableErrorCodes.includes(error.code)) {
        analysis.retryable = true;
        analysis.category = this.categorizeMongoError(error.code);
        analysis.severity = 'low';
        analysis.recommendedAction = 'retry';
        analysis.estimatedRecoveryTime = this.estimateRecoveryTime(error.code);
      }

      // Specific error code analysis
      switch (error.code) {
        case 112: // WriteConflict
          analysis.category = 'concurrency';
          analysis.recommendedAction = 'retry_with_backoff';
          analysis.contextualInfo.suggestion = 'Consider optimizing transaction scope to reduce conflicts';
          break;

        case 117: // ConflictingOperationInProgress
          analysis.category = 'concurrency';
          analysis.recommendedAction = 'retry_with_longer_delay';
          analysis.contextualInfo.suggestion = 'Wait for conflicting operation to complete';
          break;

        case 133: // FailedToSatisfyReadPreference
          analysis.category = 'availability';
          analysis.recommendedAction = 'check_replica_set_status';
          analysis.contextualInfo.suggestion = 'Verify replica set member availability';
          break;

        case 208: // ExceededTimeLimit
        case 261: // ExceededMaxTimeMS
          analysis.category = 'timeout';
          analysis.recommendedAction = 'optimize_or_increase_timeout';
          analysis.contextualInfo.suggestion = 'Consider breaking transaction into smaller operations';
          break;

        case 244: // TransactionTooLarge
          analysis.category = 'resource';
          analysis.retryable = false;
          analysis.severity = 'high';
          analysis.recommendedAction = 'reduce_transaction_size';
          analysis.contextualInfo.suggestion = 'Split transaction into smaller operations';
          break;

        case 251: // NoSuchTransaction
          analysis.category = 'state';
          analysis.recommendedAction = 'restart_transaction';
          analysis.contextualInfo.suggestion = 'Transaction may have been cleaned up by server';
          break;

        case 256: // TransactionAborted
          analysis.category = 'aborted';
          analysis.recommendedAction = 'retry_full_transaction';
          analysis.contextualInfo.suggestion = 'Transaction was aborted due to conflict or timeout';
          break;
      }
    }

    // Network-related errors
    if (error.message && (
      error.message.includes('network') || 
      error.message.includes('connection') ||
      error.message.includes('timeout') ||
      error.message.includes('unreachable')
    )) {
      analysis.retryable = true;
      analysis.category = 'network';
      analysis.recommendedAction = 'retry_with_exponential_backoff';
      analysis.estimatedRecoveryTime = 5000; // 5 seconds
      analysis.contextualInfo.suggestion = 'Check network connectivity and server status';
    }

    // Resource exhaustion errors
    if (error.message && (
      error.message.includes('memory') ||
      error.message.includes('disk space') ||
      error.message.includes('too many connections')
    )) {
      analysis.retryable = true;
      analysis.category = 'resource';
      analysis.severity = 'high';
      analysis.recommendedAction = 'wait_for_resources';
      analysis.estimatedRecoveryTime = 10000; // 10 seconds
      analysis.contextualInfo.suggestion = 'Monitor server resource usage';
    }

    return analysis;
  }

  categorizeMongoError(errorCode) {
    const errorCategories = {
      112: 'concurrency',    // WriteConflict
      117: 'concurrency',    // ConflictingOperationInProgress
      133: 'availability',   // FailedToSatisfyReadPreference
      134: 'availability',   // ReadConcernMajorityNotAvailableYet
      208: 'timeout',        // ExceededTimeLimit
      225: 'concurrency',    // LockTimeout
      244: 'resource',       // TransactionTooLarge
      251: 'state',          // NoSuchTransaction
      256: 'aborted',        // TransactionAborted
      261: 'timeout',        // ExceededMaxTimeMS
      263: 'availability',   // TemporarilyUnavailable
      6: 'network'           // HostUnreachable
    };

    return errorCategories[errorCode] || 'unknown';
  }

  estimateRecoveryTime(errorCode) {
    const recoveryTimes = {
      112: 100,   // WriteConflict - quick retry
      117: 500,   // ConflictingOperationInProgress - wait for operation
      133: 2000,  // FailedToSatisfyReadPreference - wait for replica
      134: 1000,  // ReadConcernMajorityNotAvailableYet - wait for majority
      208: 5000,  // ExceededTimeLimit - wait before retry
      225: 200,   // LockTimeout - quick retry
      251: 100,   // NoSuchTransaction - immediate retry
      256: 300,   // TransactionAborted - short wait
      261: 3000,  // ExceededMaxTimeMS - moderate wait
      263: 1000,  // TemporarilyUnavailable - short wait
      6: 5000     // HostUnreachable - wait for network
    };

    return recoveryTimes[errorCode] || 1000;
  }

  // Calculate intelligent retry delay with exponential backoff and jitter
  calculateRetryDelay(attemptNumber, errorAnalysis) {
    // Base delay calculation with exponential backoff
    let baseDelay = Math.min(
      this.options.initialRetryDelayMs * Math.pow(this.options.retryDelayMultiplier, attemptNumber - 1),
      this.options.maxRetryDelayMs
    );

    // Adjust based on error analysis
    if (errorAnalysis.estimatedRecoveryTime > 0) {
      baseDelay = Math.max(baseDelay, errorAnalysis.estimatedRecoveryTime);
    }

    // Add jitter to prevent thundering herd
    const jitterRange = baseDelay * this.options.jitterFactor;
    const jitter = (Math.random() * 2 - 1) * jitterRange; // Random value between -jitterRange and +jitterRange

    const finalDelay = Math.max(0, Math.floor(baseDelay + jitter));

    this.log(`Calculated retry delay: base=${baseDelay}ms, jitter=${jitter.toFixed(1)}ms, final=${finalDelay}ms`);

    return finalDelay;
  }

  // Generate recovery recommendations based on error patterns
  generateRecoveryRecommendations(transactionId, error) {
    const recommendations = [];
    const errorAnalysis = this.analyzeTransactionError(error);

    // Category-specific recommendations
    switch (errorAnalysis.category) {
      case 'concurrency':
        recommendations.push({
          type: 'optimization',
          priority: 'medium',
          description: 'Optimize transaction scope to reduce write conflicts',
          actions: [
            'Consider breaking large transactions into smaller operations',
            'Review document access patterns for optimization opportunities',
            'Implement optimistic locking where appropriate'
          ]
        });
        break;

      case 'timeout':
        recommendations.push({
          type: 'configuration',
          priority: 'high',
          description: 'Address transaction timeout issues',
          actions: [
            'Increase maxCommitTimeMS if operations are legitimately slow',
            'Optimize query performance with proper indexing',
            'Consider breaking complex operations into smaller transactions'
          ]
        });
        break;

      case 'resource':
        recommendations.push({
          type: 'scaling',
          priority: 'high',
          description: 'Address resource constraints',
          actions: [
            'Monitor server resource usage (CPU, memory, disk)',
            'Consider vertical or horizontal scaling',
            'Implement connection pooling optimization'
          ]
        });
        break;

      case 'network':
        recommendations.push({
          type: 'infrastructure',
          priority: 'high',
          description: 'Address network connectivity issues',
          actions: [
            'Check network connectivity between application and database',
            'Verify MongoDB server status and availability',
            'Consider implementing circuit breaker pattern'
          ]
        });
        break;

      case 'availability':
        recommendations.push({
          type: 'deployment',
          priority: 'high',
          description: 'Address replica set availability',
          actions: [
            'Check replica set member status',
            'Verify read preference configuration',
            'Monitor replica lag and catch-up status'
          ]
        });
        break;
    }

    // Pattern-based recommendations
    const transactionHistory = this.getTransactionHistory(transactionId);
    if (transactionHistory && transactionHistory.errors.length > 1) {
      // Check for recurring error patterns
      const errorCodes = transactionHistory.errors.map(e => e.code);
      const uniqueErrorCodes = [...new Set(errorCodes)];

      if (uniqueErrorCodes.length === 1) {
        recommendations.push({
          type: 'pattern',
          priority: 'high',
          description: 'Recurring error pattern detected',
          actions: [
            `Address root cause of error ${uniqueErrorCodes[0]}`,
            'Consider implementing circuit breaker pattern',
            'Review application architecture for reliability improvements'
          ]
        });
      }
    }

    return recommendations;
  }

  // Advanced transaction monitoring and metrics collection
  recordSuccessfulTransaction(transactionId, duration, attempts) {
    this.transactionMetrics.totalTransactions++;
    this.transactionMetrics.successfulTransactions++;

    if (attempts > 1) {
      this.transactionMetrics.retriedTransactions++;
    }

    // Update performance statistics
    this.transactionMetrics.performanceStats.transactionDurations.push(duration);

    // Keep only recent durations for average calculation
    if (this.transactionMetrics.performanceStats.transactionDurations.length > 1000) {
      this.transactionMetrics.performanceStats.transactionDurations = 
        this.transactionMetrics.performanceStats.transactionDurations.slice(-500);
    }

    // Recalculate average
    this.transactionMetrics.performanceStats.averageTransactionDuration = 
      this.transactionMetrics.performanceStats.transactionDurations.reduce((sum, d) => sum + d, 0) /
      this.transactionMetrics.performanceStats.transactionDurations.length;

    this.log(`Transaction ${transactionId} metrics recorded: duration=${duration}ms, attempts=${attempts}`);
  }

  recordFailedTransaction(transactionId, error, attempts, duration) {
    this.transactionMetrics.totalTransactions++;
    this.transactionMetrics.failedTransactions++;

    if (attempts > 1) {
      this.transactionMetrics.retriedTransactions++;
    }

    // Record error statistics
    const errorCode = error.code || 'unknown';
    const currentCount = this.transactionMetrics.errorsByCode.get(errorCode) || 0;
    this.transactionMetrics.errorsByCode.set(errorCode, currentCount + 1);

    const errorCategory = this.categorizeMongoError(error.code);
    const currentCategoryCount = this.transactionMetrics.errorsByCategory.get(errorCategory) || 0;
    this.transactionMetrics.errorsByCategory.set(errorCategory, currentCategoryCount + 1);

    this.log(`Transaction ${transactionId} failure recorded: error=${errorCode}, attempts=${attempts}, duration=${duration}ms`);
  }

  recordTransactionError(transactionId, error, attempt) {
    const transaction = this.activeTransactions.get(transactionId);
    if (transaction) {
      transaction.errors.push({
        attempt: attempt,
        error: error,
        timestamp: new Date(),
        errorCode: error.code,
        errorMessage: error.message,
        analysis: this.analyzeTransactionError(error)
      });
    }
  }

  updateTransactionStatus(transactionId, status, additionalInfo = {}) {
    const transaction = this.activeTransactions.get(transactionId);
    if (transaction) {
      transaction.status = status;
      transaction.lastUpdated = new Date();
      Object.assign(transaction, additionalInfo);
    }
  }

  trackOperation(transactionId, operation) {
    const transaction = this.activeTransactions.get(transactionId);
    if (transaction) {
      transaction.operationsExecuted++;
      transaction.lastOperation = {
        type: operation.type,
        collection: operation.collection,
        timestamp: new Date()
      };
    }
  }

  getTransactionMetrics(transactionId) {
    const transaction = this.activeTransactions.get(transactionId);
    return {
      transactionId: transactionId,
      operationsExecuted: transaction ? transaction.operationsExecuted : 0,
      errors: transaction ? transaction.errors : [],
      status: transaction ? transaction.status : 'unknown',
      startTime: transaction ? transaction.startTime : null,
      duration: transaction ? Date.now() - transaction.startTime : 0
    };
  }

  getTransactionHistory(transactionId) {
    return this.activeTransactions.get(transactionId);
  }

  // Comprehensive transaction health monitoring
  getTransactionHealthReport() {
    const report = {
      timestamp: new Date(),
      overall: {
        totalTransactions: this.transactionMetrics.totalTransactions,
        successfulTransactions: this.transactionMetrics.successfulTransactions,
        failedTransactions: this.transactionMetrics.failedTransactions,
        retriedTransactions: this.transactionMetrics.retriedTransactions,
        totalRetryAttempts: this.transactionMetrics.totalRetryAttempts,
        successRate: this.transactionMetrics.totalTransactions > 0 ? 
          (this.transactionMetrics.successfulTransactions / this.transactionMetrics.totalTransactions) * 100 : 0,
        retryRate: this.transactionMetrics.totalTransactions > 0 ?
          (this.transactionMetrics.retriedTransactions / this.transactionMetrics.totalTransactions) * 100 : 0
      },
      performance: {
        averageTransactionDuration: this.transactionMetrics.performanceStats.averageTransactionDuration,
        averageRetryDelay: this.transactionMetrics.performanceStats.retryDelays.length > 0 ?
          this.transactionMetrics.performanceStats.retryDelays.reduce((sum, d) => sum + d, 0) /
          this.transactionMetrics.performanceStats.retryDelays.length : 0,
        totalRecentTransactions: this.transactionMetrics.performanceStats.transactionDurations.length
      },
      errors: {
        byCode: Object.fromEntries(this.transactionMetrics.errorsByCode),
        byCategory: Object.fromEntries(this.transactionMetrics.errorsByCategory),
        mostCommonError: this.getMostCommonError(),
        mostCommonCategory: this.getMostCommonErrorCategory()
      },
      activeTransactions: {
        count: this.activeTransactions.size,
        transactions: Array.from(this.activeTransactions.values()).map(t => ({
          id: t.id,
          status: t.status,
          duration: Date.now() - t.startTime,
          attempts: t.attempt,
          operationsExecuted: t.operationsExecuted,
          errorCount: t.errors ? t.errors.length : 0
        }))
      }
    };

    return report;
  }

  getMostCommonError() {
    let maxCount = 0;
    let mostCommonError = null;

    for (const [errorCode, count] of this.transactionMetrics.errorsByCode.entries()) {
      if (count > maxCount) {
        maxCount = count;
        mostCommonError = { code: errorCode, count: count };
      }
    }

    return mostCommonError;
  }

  getMostCommonErrorCategory() {
    let maxCount = 0;
    let mostCommonCategory = null;

    for (const [category, count] of this.transactionMetrics.errorsByCategory.entries()) {
      if (count > maxCount) {
        maxCount = count;
        mostCommonCategory = { category: category, count: count };
      }
    }

    return mostCommonCategory;
  }

  // Utility methods
  generateTransactionId() {
    return `txn_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  log(message, error = null) {
    if (this.options.enableDetailedLogging) {
      const timestamp = new Date().toISOString();
      if (error) {
        console.log(`[${timestamp}] ${message}`, error);
      } else {
        console.log(`[${timestamp}] ${message}`);
      }
    }
  }
}

// Example usage with comprehensive error handling
async function demonstrateTransactionErrorHandling() {
  // Note: multi-document transactions require a replica set or sharded cluster deployment
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const transactionManager = new MongoTransactionManager(client, {
    maxRetryAttempts: 3,
    initialRetryDelayMs: 100,
    maxRetryDelayMs: 5000,
    enableDetailedLogging: true,
    enableMetricsCollection: true
  });

  try {
    // Example transaction with comprehensive error handling
    const result = await transactionManager.executeTransactionWithRetry(
      async (session, context) => {
        const { transactionId, attempt } = context;

        console.log(`Executing business logic for transaction ${transactionId}, attempt ${attempt}`);

        const db = client.db('ecommerce');
        const ordersCollection = db.collection('orders');
        const inventoryCollection = db.collection('inventory');
        const accountsCollection = db.collection('accounts');

        // Track operations for monitoring
        context.onOperation({ type: 'insert', collection: 'orders' });
        context.onOperation({ type: 'update', collection: 'inventory' });
        context.onOperation({ type: 'update', collection: 'accounts' });

        // Complex business transaction
        const order = {
          orderId: `order_${Date.now()}`,
          customerId: 'customer_123',
          items: [
            { productId: 'prod_456', quantity: 2, price: 29.99 },
            { productId: 'prod_789', quantity: 1, price: 49.99 }
          ],
          totalAmount: 109.97,
          status: 'pending',
          createdAt: new Date()
        };

        // Insert order
        const orderResult = await ordersCollection.insertOne(order, { session });

        // Update inventory
        for (const item of order.items) {
          const inventoryUpdate = await inventoryCollection.updateOne(
            { productId: item.productId, quantity: { $gte: item.quantity } },
            { $inc: { quantity: -item.quantity } },
            { session }
          );

          if (inventoryUpdate.modifiedCount === 0) {
            throw new Error(`Insufficient inventory for product ${item.productId}`);
          }
        }

        // Update customer account
        await accountsCollection.updateOne(
          { customerId: order.customerId },
          { 
            $inc: { totalOrders: 1, totalSpent: order.totalAmount },
            $set: { lastOrderDate: new Date() }
          },
          { session }
        );

        return {
          orderId: order.orderId,
          orderResult: orderResult,
          message: 'Order processed successfully'
        };
      },
      {
        // Custom transaction options
        maxCommitTimeMS: 15000,
        readConcern: { level: 'snapshot' },
        writeConcern: { w: 'majority', j: true }
      }
    );

    if (result.success) {
      console.log('Transaction completed successfully:', result);
    } else {
      console.error('Transaction failed after all retries:', result);
    }

    // Get comprehensive health report
    const healthReport = transactionManager.getTransactionHealthReport();
    console.log('Transaction Health Report:', JSON.stringify(healthReport, null, 2));

  } catch (error) {
    console.error('Unexpected error:', error);
  } finally {
    await client.close();
  }
}

// Benefits of MongoDB intelligent transaction error handling:
// - Automatic retry logic with exponential backoff and jitter
// - Intelligent error classification and recovery recommendations
// - Comprehensive transaction state tracking and monitoring
// - Advanced performance metrics and health reporting
// - Context-aware error analysis and recovery strategies
// - Built-in support for MongoDB-specific error patterns
// - Detailed logging and diagnostic information
// - Integration with MongoDB driver optimization features
// - Automatic detection of retryable vs. non-retryable errors
// - Production-ready resilience and reliability patterns
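
The metrics the manager collects can be surfaced with a small reporting loop. The sketch below assumes a transactionManager instance created as in the example above; the one-minute interval and the 95% threshold are illustrative values, not recommendations.

// Periodically log the manager's health report and flag low success rates.
// Assumes 'transactionManager' is a MongoTransactionManager instance from above.
function startTransactionHealthReporting(transactionManager, intervalMs = 60000) {
  return setInterval(() => {
    const report = transactionManager.getTransactionHealthReport();
    const { overall, performance, errors } = report;

    console.log(
      `[tx-health] total=${overall.totalTransactions} ` +
      `success=${overall.successRate.toFixed(1)}% ` +
      `retryRate=${overall.retryRate.toFixed(1)}% ` +
      `avgDuration=${Math.round(performance.averageTransactionDuration)}ms`
    );

    // Illustrative alert threshold; tune to your own SLOs
    if (overall.totalTransactions > 0 && overall.successRate < 95) {
      console.warn('[tx-health] success rate below 95%, most common error:', errors.mostCommonError);
    }
  }, intervalMs);
}

// const healthTimer = startTransactionHealthReporting(transactionManager);
// ...later, on shutdown: clearInterval(healthTimer);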

Advanced Error Recovery Patterns

Sophisticated recovery strategies for production-grade MongoDB applications:

// Advanced MongoDB error recovery patterns for enterprise resilience
class MongoResilienceManager {
  constructor(client, options = {}) {
    this.client = client;
    this.transactionManager = new MongoTransactionManager(client, options);

    this.recoveryStrategies = new Map();
    this.circuitBreakers = new Map();
    this.healthCheckers = new Map();

    this.options = {
      // Circuit breaker configuration
      circuitBreakerThreshold: options.circuitBreakerThreshold || 5,
      circuitBreakerTimeout: options.circuitBreakerTimeout || 60000,
      circuitBreakerVolumeThreshold: options.circuitBreakerVolumeThreshold || 10,

      // Health check configuration
      healthCheckInterval: options.healthCheckInterval || 30000,
      healthCheckTimeout: options.healthCheckTimeout || 5000,

      // Recovery configuration
      // Default to enabled unless explicitly set to false
      enableAutomaticRecovery: options.enableAutomaticRecovery !== false,
      maxRecoveryAttempts: options.maxRecoveryAttempts || 3
    };

    this.initialize();
  }

  initialize() {
    // Set up circuit breakers for different operation types
    this.setupCircuitBreakers();

    // Initialize health monitoring
    this.startHealthMonitoring();

    // Register recovery strategies
    this.registerRecoveryStrategies();
  }

  setupCircuitBreakers() {
    const operationTypes = ['transaction', 'query', 'update', 'insert', 'delete'];

    operationTypes.forEach(opType => {
      this.circuitBreakers.set(opType, {
        state: 'closed', // closed, open, half-open
        failureCount: 0,
        lastFailureTime: null,
        successCount: 0,
        totalRequests: 0,
        threshold: this.options.circuitBreakerThreshold,
        timeout: this.options.circuitBreakerTimeout,
        volumeThreshold: this.options.circuitBreakerVolumeThreshold
      });
    });
  }

  // Execute operation with circuit breaker protection
  async executeWithCircuitBreaker(operationType, operation) {
    const circuitBreaker = this.circuitBreakers.get(operationType);

    if (!circuitBreaker) {
      throw new Error(`No circuit breaker configured for operation type: ${operationType}`);
    }

    // Check circuit breaker state
    const canExecute = this.checkCircuitBreaker(circuitBreaker);

    if (!canExecute) {
      throw new Error(`Circuit breaker is OPEN for ${operationType}. Service temporarily unavailable.`);
    }

    try {
      // Execute operation
      const result = await operation();

      // Record success
      this.recordCircuitBreakerSuccess(circuitBreaker);

      return result;

    } catch (error) {
      // Record failure
      this.recordCircuitBreakerFailure(circuitBreaker);

      throw error;
    }
  }

  checkCircuitBreaker(circuitBreaker) {
    const now = Date.now();

    switch (circuitBreaker.state) {
      case 'closed':
        return true;

      case 'open':
        // Check if timeout has elapsed
        if (now - circuitBreaker.lastFailureTime >= circuitBreaker.timeout) {
          circuitBreaker.state = 'half-open';
          return true;
        }
        return false;

      case 'half-open':
        return true;

      default:
        return false;
    }
  }

  recordCircuitBreakerSuccess(circuitBreaker) {
    circuitBreaker.successCount++;
    circuitBreaker.totalRequests++;

    if (circuitBreaker.state === 'half-open') {
      // Reset circuit breaker on successful half-open request
      circuitBreaker.state = 'closed';
      circuitBreaker.failureCount = 0;
    }
  }

  recordCircuitBreakerFailure(circuitBreaker) {
    circuitBreaker.failureCount++;
    circuitBreaker.totalRequests++;
    circuitBreaker.lastFailureTime = Date.now();

    // Check if should open circuit
    if (circuitBreaker.totalRequests >= circuitBreaker.volumeThreshold &&
        circuitBreaker.failureCount >= circuitBreaker.threshold) {
      circuitBreaker.state = 'open';
      console.log(`Circuit breaker opened due to ${circuitBreaker.failureCount} failures`);
    }
  }

  // Comprehensive transaction execution with full resilience features
  async executeResilientTransaction(transactionFunction, options = {}) {
    const operationType = 'transaction';

    return await this.executeWithCircuitBreaker(operationType, async () => {
      // Execute transaction with comprehensive error handling
      const result = await this.transactionManager.executeTransactionWithRetry(
        transactionFunction,
        options
      );

      // If transaction failed, attempt recovery if enabled
      if (!result.success && this.options.enableAutomaticRecovery) {
        const recoveryResult = await this.attemptTransactionRecovery(result);
        if (recoveryResult && recoveryResult.success) {
          return recoveryResult;
        }
      }

      return result;
    });
  }

  // Intelligent transaction recovery based on error patterns
  async attemptTransactionRecovery(failedResult) {
    const { error, transactionId, attempts, errorAnalysis } = failedResult;

    console.log(`Attempting recovery for failed transaction ${transactionId}`);

    // Get appropriate recovery strategy
    const recoveryStrategy = this.getRecoveryStrategy(errorAnalysis);

    if (!recoveryStrategy) {
      console.log(`No recovery strategy available for error category: ${errorAnalysis.category}`);
      return null;
    }

    try {
      const recoveryResult = await recoveryStrategy.execute(failedResult);

      console.log(`Recovery attempt completed for transaction ${transactionId}:`, recoveryResult);

      return recoveryResult;

    } catch (recoveryError) {
      console.error(`Recovery failed for transaction ${transactionId}:`, recoveryError);
      return null;
    }
  }

  registerRecoveryStrategies() {
    // Network connectivity recovery
    this.recoveryStrategies.set('network', {
      execute: async (failedResult) => {
        console.log('Executing network recovery strategy');

        // Wait for network to recover
        await this.waitForNetworkRecovery();

        // Check server connectivity
        const healthOk = await this.performHealthCheck();

        if (healthOk) {
          console.log('Network recovery successful, retrying transaction');
          // Could retry the transaction here if the original function is available
          return { success: true, recovered: true, strategy: 'network' };
        }

        return { success: false, recovered: false, strategy: 'network' };
      }
    });

    // Resource recovery
    this.recoveryStrategies.set('resource', {
      execute: async (failedResult) => {
        console.log('Executing resource recovery strategy');

        // Wait for resources to become available
        await this.waitForResourceAvailability();

        // Check resource status
        const resourcesOk = await this.checkResourceStatus();

        if (resourcesOk) {
          console.log('Resource recovery successful');
          return { success: true, recovered: true, strategy: 'resource' };
        }

        return { success: false, recovered: false, strategy: 'resource' };
      }
    });

    // Availability recovery (replica set issues)
    this.recoveryStrategies.set('availability', {
      execute: async (failedResult) => {
        console.log('Executing availability recovery strategy');

        // Check replica set status
        const replicaSetOk = await this.checkReplicaSetHealth();

        if (replicaSetOk) {
          console.log('Availability recovery successful');
          return { success: true, recovered: true, strategy: 'availability' };
        }

        // Wait for replica set to recover
        await this.waitForReplicaSetRecovery();

        const recoveredReplicaSetOk = await this.checkReplicaSetHealth();

        return {
          success: recoveredReplicaSetOk,
          recovered: recoveredReplicaSetOk,
          strategy: 'availability'
        };
      }
    });
  }

  getRecoveryStrategy(errorAnalysis) {
    return this.recoveryStrategies.get(errorAnalysis.category);
  }

  // Health monitoring and recovery assistance
  startHealthMonitoring() {
    setInterval(async () => {
      try {
        await this.performComprehensiveHealthCheck();
      } catch (error) {
        console.error('Health monitoring error:', error);
      }
    }, this.options.healthCheckInterval);
  }

  async performComprehensiveHealthCheck() {
    const healthStatus = {
      timestamp: new Date(),
      overall: 'unknown',
      components: {}
    };

    try {
      // Check basic connectivity
      healthStatus.components.connectivity = await this.checkConnectivity();

      // Check replica set status
      healthStatus.components.replicaSet = await this.checkReplicaSetHealth();

      // Check resource status
      healthStatus.components.resources = await this.checkResourceStatus();

      // Check circuit breaker status
      healthStatus.components.circuitBreakers = this.getCircuitBreakerStatus();

      // Check transaction manager health
      healthStatus.components.transactionManager = this.transactionManager.getTransactionHealthReport();

      // Determine overall health
      const componentStatuses = Object.values(healthStatus.components);
      const healthyComponents = componentStatuses.filter(status => 
        status === true || (typeof status === 'object' && status.healthy !== false)
      );

      if (healthyComponents.length === componentStatuses.length) {
        healthStatus.overall = 'healthy';
      } else if (healthyComponents.length >= componentStatuses.length * 0.7) {
        healthStatus.overall = 'degraded';
      } else {
        healthStatus.overall = 'unhealthy';
      }

      // Store health status
      this.lastHealthStatus = healthStatus;

      return healthStatus;

    } catch (error) {
      healthStatus.overall = 'error';
      healthStatus.error = error.message;
      return healthStatus;
    }
  }

  async checkConnectivity() {
    try {
      const admin = this.client.db('admin');
      await admin.command({ ping: 1 }, { maxTimeMS: this.options.healthCheckTimeout });
      return true;
    } catch (error) {
      return false;
    }
  }

  async checkReplicaSetHealth() {
    try {
      const admin = this.client.db('admin');
      const status = await admin.command({ replSetGetStatus: 1 });

      // Check if majority of members are healthy
      const healthyMembers = status.members.filter(member => 
        member.health === 1 && ['PRIMARY', 'SECONDARY'].includes(member.stateStr)
      );

      return {
        healthy: healthyMembers.length >= Math.floor(status.members.length / 2) + 1,
        totalMembers: status.members.length,
        healthyMembers: healthyMembers.length,
        primaryAvailable: status.members.some(m => m.stateStr === 'PRIMARY')
      };

    } catch (error) {
      // Might not be a replica set or insufficient privileges
      return { healthy: true, note: 'Replica set status unavailable' };
    }
  }

  async checkResourceStatus() {
    try {
      const admin = this.client.db('admin');
      const serverStatus = await admin.command({ serverStatus: 1 });

      const memUsage = serverStatus.mem.resident / serverStatus.mem.virtual;
      const connectionUsage = serverStatus.connections.current / serverStatus.connections.available;

      return {
        healthy: memUsage < 0.9 && connectionUsage < 0.9,
        memoryUsage: memUsage,
        connectionUsage: connectionUsage,
        connections: serverStatus.connections,
        memory: serverStatus.mem
      };

    } catch (error) {
      return { healthy: false, error: error.message };
    }
  }

  getCircuitBreakerStatus() {
    const status = {};

    for (const [opType, breaker] of this.circuitBreakers.entries()) {
      status[opType] = {
        state: breaker.state,
        failureCount: breaker.failureCount,
        successCount: breaker.successCount,
        totalRequests: breaker.totalRequests,
        failureRate: breaker.totalRequests > 0 ? 
          (breaker.failureCount / breaker.totalRequests) * 100 : 0
      };
    }

    return status;
  }

  // Recovery assistance methods
  async waitForNetworkRecovery() {
    const maxWaitTime = 30000; // 30 seconds
    const checkInterval = 1000;  // 1 second
    let waited = 0;

    while (waited < maxWaitTime) {
      try {
        const connected = await this.checkConnectivity();
        if (connected) {
          return true;
        }
      } catch (error) {
        // Continue waiting
      }

      await this.sleep(checkInterval);
      waited += checkInterval;
    }

    return false;
  }

  async waitForResourceAvailability() {
    const maxWaitTime = 60000; // 60 seconds
    const checkInterval = 5000;  // 5 seconds
    let waited = 0;

    while (waited < maxWaitTime) {
      try {
        const resourceStatus = await this.checkResourceStatus();
        if (resourceStatus.healthy) {
          return true;
        }
      } catch (error) {
        // Continue waiting
      }

      await this.sleep(checkInterval);
      waited += checkInterval;
    }

    return false;
  }

  async waitForReplicaSetRecovery() {
    const maxWaitTime = 120000; // 2 minutes
    const checkInterval = 10000;  // 10 seconds
    let waited = 0;

    while (waited < maxWaitTime) {
      try {
        const replicaStatus = await this.checkReplicaSetHealth();
        if (replicaStatus.healthy) {
          return true;
        }
      } catch (error) {
        // Continue waiting
      }

      await this.sleep(checkInterval);
      waited += checkInterval;
    }

    return false;
  }

  async performHealthCheck() {
    const health = await this.performComprehensiveHealthCheck();
    return health.overall === 'healthy' || health.overall === 'degraded';
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  // Get comprehensive resilience report
  getResilienceReport() {
    return {
      timestamp: new Date(),
      circuitBreakers: this.getCircuitBreakerStatus(),
      transactionHealth: this.transactionManager.getTransactionHealthReport(),
      lastHealthCheck: this.lastHealthStatus,
      recoveryStrategies: Array.from(this.recoveryStrategies.keys()),
      configuration: {
        circuitBreakerThreshold: this.options.circuitBreakerThreshold,
        circuitBreakerTimeout: this.options.circuitBreakerTimeout,
        healthCheckInterval: this.options.healthCheckInterval,
        automaticRecoveryEnabled: this.options.enableAutomaticRecovery
      }
    };
  }
}
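
A usage sketch for the resilience manager follows. It reuses the MongoClient import and the classes defined above; the connection string, database, collection, and document values are placeholders, and the callback simply shows how the session provided by the manager is threaded through each operation.

// Hypothetical usage of MongoResilienceManager; URI, database, and collection
// names are placeholders for your own deployment.
async function runResilientOrderConfirmation() {
  const client = new MongoClient('mongodb://localhost:27017/?replicaSet=rs0');
  await client.connect();

  const resilience = new MongoResilienceManager(client, {
    circuitBreakerThreshold: 5,
    circuitBreakerTimeout: 60000,
    enableAutomaticRecovery: true
  });

  const outcome = await resilience.executeResilientTransaction(async (session) => {
    const orders = client.db('ecommerce').collection('orders');

    // Every operation inside the callback must pass the session explicitly
    return await orders.updateOne(
      { orderId: 'order_123', status: 'pending' },
      { $set: { status: 'confirmed', confirmedAt: new Date() } },
      { session }
    );
  });

  console.log('Transaction succeeded:', outcome.success);
  console.log('Resilience report:', JSON.stringify(resilience.getResilienceReport(), null, 2));

  // The manager's health-check interval keeps the Node.js event loop alive,
  // so a short-lived script should exit explicitly after closing the client.
  await client.close();
}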

SQL-Style Error Handling with QueryLeaf

QueryLeaf provides familiar approaches to MongoDB transaction error handling and monitoring:

-- QueryLeaf transaction error handling with SQL-familiar syntax

-- Monitor transaction error patterns
SELECT 
  DATE_TRUNC('hour', error_timestamp) as hour_bucket,
  error_category,
  error_code,

  -- Error statistics
  COUNT(*) as error_count,
  COUNT(DISTINCT transaction_id) as affected_transactions,
  AVG(retry_attempts) as avg_retry_attempts,
  COUNT(CASE WHEN recovery_successful = true THEN 1 END) as successful_recoveries,

  -- Performance impact
  AVG(transaction_duration_ms) as avg_failed_transaction_duration,
  AVG(time_to_failure_ms) as avg_time_to_failure,

  -- Recovery metrics
  AVG(recovery_time_ms) as avg_recovery_time,
  MAX(recovery_time_ms) as max_recovery_time,

  -- Success rates
  ROUND((COUNT(CASE WHEN recovery_successful = true THEN 1 END)::DECIMAL / COUNT(*)) * 100, 2) as recovery_success_rate

FROM TRANSACTION_ERROR_LOG()
WHERE error_timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY DATE_TRUNC('hour', error_timestamp), error_category, error_code
ORDER BY hour_bucket DESC, error_count DESC;

-- Analyze transaction resilience patterns
WITH transaction_resilience AS (
  SELECT 
    transaction_id,
    transaction_type,

    -- Transaction characteristics
    operation_count,
    total_duration_ms,
    retry_attempts,

    -- Error analysis
    first_error_code,
    first_error_category,
    total_errors,

    -- Recovery analysis
    recovery_strategy_used,
    recovery_successful,
    recovery_duration_ms,

    -- Final outcome
    final_status, -- committed, failed, recovered

    -- Timing analysis
    created_at,
    completed_at

  FROM TRANSACTION_HISTORY()
  WHERE created_at >= NOW() - INTERVAL '7 days'
),

resilience_patterns AS (
  SELECT 
    transaction_type,
    first_error_category,

    -- Volume metrics
    COUNT(*) as transaction_count,
    COUNT(CASE WHEN final_status = 'committed' THEN 1 END) as successful_transactions,
    COUNT(CASE WHEN final_status = 'recovered' THEN 1 END) as recovered_transactions,
    COUNT(CASE WHEN final_status = 'failed' THEN 1 END) as failed_transactions,

    -- Retry analysis
    AVG(retry_attempts) as avg_retry_attempts,
    MAX(retry_attempts) as max_retry_attempts,
    COUNT(CASE WHEN retry_attempts > 0 THEN 1 END) as transactions_with_retries,

    -- Recovery analysis
    COUNT(CASE WHEN recovery_strategy_used IS NOT NULL THEN 1 END) as recovery_attempts,
    COUNT(CASE WHEN recovery_successful = true THEN 1 END) as successful_recoveries,
    AVG(recovery_duration_ms) as avg_recovery_duration,

    -- Performance metrics
    AVG(total_duration_ms) as avg_total_duration,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_duration_ms) as p95_duration,

    -- Success rates
    ROUND((COUNT(CASE WHEN final_status IN ('committed', 'recovered') THEN 1 END)::DECIMAL / COUNT(*)) * 100, 2) as overall_success_rate,
    ROUND((COUNT(CASE WHEN recovery_successful = true THEN 1 END)::DECIMAL / 
           GREATEST(COUNT(CASE WHEN recovery_strategy_used IS NOT NULL THEN 1 END), 1)) * 100, 2) as recovery_success_rate

  FROM transaction_resilience
  GROUP BY transaction_type, first_error_category
)

SELECT 
  transaction_type,
  first_error_category,

  -- Volume and success metrics
  transaction_count,
  successful_transactions,
  recovered_transactions,
  failed_transactions,
  overall_success_rate,

  -- Retry patterns
  avg_retry_attempts,
  max_retry_attempts,
  ROUND((transactions_with_retries::DECIMAL / transaction_count) * 100, 2) as retry_rate_percent,

  -- Recovery effectiveness
  recovery_attempts,
  successful_recoveries,
  recovery_success_rate,
  avg_recovery_duration,

  -- Performance characteristics
  avg_total_duration,
  p95_duration,

  -- Health assessment
  CASE 
    WHEN overall_success_rate >= 99 THEN 'Excellent'
    WHEN overall_success_rate >= 95 THEN 'Good' 
    WHEN overall_success_rate >= 90 THEN 'Fair'
    ELSE 'Poor'
  END as resilience_grade,

  -- Recommendations
  CASE 
    WHEN recovery_success_rate < 50 AND recovery_attempts > 0 THEN 'Improve recovery strategies'
    WHEN avg_retry_attempts > 3 THEN 'Review retry configuration'
    WHEN failed_transactions > successful_transactions * 0.1 THEN 'Investigate error root causes'
    ELSE 'Performance acceptable'
  END as recommendation

FROM resilience_patterns
ORDER BY transaction_count DESC, overall_success_rate ASC;

-- Real-time transaction health monitoring
SELECT 
  -- Current status
  COUNT(CASE WHEN status = 'active' THEN 1 END) as active_transactions,
  COUNT(CASE WHEN status = 'retrying' THEN 1 END) as retrying_transactions,
  COUNT(CASE WHEN status = 'recovering' THEN 1 END) as recovering_transactions,
  COUNT(CASE WHEN status = 'failed' THEN 1 END) as failed_transactions,

  -- Recent performance (last 5 minutes)
  AVG(CASE WHEN completed_at >= NOW() - INTERVAL '5 minutes' 
           THEN duration_ms END) as recent_avg_duration_ms,
  COUNT(CASE WHEN completed_at >= NOW() - INTERVAL '5 minutes' 
             AND final_status = 'committed' THEN 1 END) as recent_successful_transactions,
  COUNT(CASE WHEN completed_at >= NOW() - INTERVAL '5 minutes' 
             AND final_status = 'failed' THEN 1 END) as recent_failed_transactions,

  -- Error rates
  ROUND((COUNT(CASE WHEN error_occurred_at >= NOW() - INTERVAL '5 minutes' THEN 1 END)::DECIMAL /
         GREATEST(COUNT(CASE WHEN created_at >= NOW() - INTERVAL '5 minutes' THEN 1 END), 1)) * 100, 2) 
         as recent_error_rate_percent,

  -- Circuit breaker status
  COUNT(CASE WHEN circuit_breaker_state = 'open' THEN 1 END) as open_circuit_breakers,
  COUNT(CASE WHEN circuit_breaker_state = 'half-open' THEN 1 END) as half_open_circuit_breakers,

  -- Recovery metrics
  COUNT(CASE WHEN recovery_in_progress = true THEN 1 END) as active_recoveries,
  AVG(CASE WHEN recovery_completed_at >= NOW() - INTERVAL '5 minutes' 
           THEN recovery_duration_ms END) as recent_avg_recovery_time_ms,

  -- Health indicators
  CASE 
    WHEN COUNT(CASE WHEN status = 'failed' THEN 1 END) > 
         COUNT(CASE WHEN status = 'active' THEN 1 END) * 0.5 THEN 'Critical'
    WHEN COUNT(CASE WHEN circuit_breaker_state = 'open' THEN 1 END) > 0 THEN 'Degraded'
    WHEN COUNT(CASE WHEN status = 'retrying' THEN 1 END) > 
         COUNT(CASE WHEN status = 'active' THEN 1 END) * 0.3 THEN 'Warning'
    ELSE 'Healthy'
  END as overall_health_status,

  NOW() as report_timestamp

FROM ACTIVE_TRANSACTION_STATUS()
CROSS JOIN CIRCUIT_BREAKER_STATUS()
CROSS JOIN RECOVERY_STATUS();

-- Transaction error prevention and optimization
CREATE ALERT TRANSACTION_ERROR_PREVENTION
ON TRANSACTION_ERROR_LOG()
WHEN (
  -- High error rate
  (SELECT COUNT(*) FROM TRANSACTION_ERROR_LOG() 
   WHERE error_timestamp >= NOW() - INTERVAL '5 minutes') > 10
  OR
  -- Circuit breaker opened
  (SELECT COUNT(*) FROM CIRCUIT_BREAKER_STATUS() 
   WHERE state = 'open') > 0
  OR
  -- Recovery failing
  (SELECT AVG(CASE WHEN recovery_successful = true THEN 1.0 ELSE 0.0 END) 
   FROM TRANSACTION_ERROR_LOG() 
   WHERE error_timestamp >= NOW() - INTERVAL '15 minutes' 
   AND recovery_strategy_used IS NOT NULL) < 0.5
)
NOTIFY ['dba-team@company.com', 'dev-team@company.com']
WITH MESSAGE TEMPLATE '''
Transaction Error Alert

Current Status:
- Recent Errors (5 min): {{ recent_error_count }}
- Open Circuit Breakers: {{ open_circuit_breaker_count }}
- Active Recoveries: {{ active_recovery_count }}
- Recovery Success Rate: {{ recovery_success_rate }}%

Top Error Categories:
{{ top_error_categories }}

Recommended Actions:
{{ error_prevention_recommendations }}

Dashboard: https://monitoring.company.com/mongodb/transactions
'''
EVERY 1 MINUTE;

-- QueryLeaf transaction error handling provides:
-- 1. SQL-familiar error monitoring and analysis
-- 2. Comprehensive transaction resilience reporting
-- 3. Real-time health monitoring and alerting
-- 4. Intelligent error pattern detection
-- 5. Recovery strategy effectiveness analysis
-- 6. Circuit breaker status monitoring
-- 7. Performance impact assessment
-- 8. Automated prevention and optimization recommendations
-- 9. Integration with MongoDB's native error handling
-- 10. Production-ready operational visibility

Best Practices for MongoDB Transaction Error Handling

Error Classification Strategy

Optimal error handling configuration differs by application pattern (a configuration sketch follows this list):

  1. High-Frequency Applications: Aggressive retry policies with intelligent backoff
  2. Mission-Critical Systems: Comprehensive recovery strategies with circuit breakers
  3. Batch Processing: Extended timeout configurations with resource monitoring
  4. Real-time Applications: Fast-fail approaches with immediate fallback mechanisms
  5. Microservices: Distributed error handling with service-level circuit breakers
  6. Analytics Workloads: Specialized error handling for long-running operations
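
As a rough illustration of how these patterns translate into concrete settings, the sketch below defines hypothetical per-pattern presets for an error handler like the one developed above. The preset names, retry counts, backoff values, and circuit breaker thresholds are assumptions for illustration only; the nested transactionOptions fields (readConcern, writeConcern, maxCommitTimeMS) are standard MongoDB transaction options.

// Illustrative error-handling presets per application pattern (values are assumptions, not benchmarks)
const errorHandlingPresets = {
  highFrequency: {
    maxRetries: 5,                // aggressive retries for short, idempotent transactions
    baseBackoffMs: 25,            // small initial backoff, grows exponentially
    circuitBreakerThreshold: 20,  // trip only after sustained failure
    transactionOptions: { readConcern: { level: 'local' }, writeConcern: { w: 'majority' } }
  },
  missionCritical: {
    maxRetries: 3,
    baseBackoffMs: 100,
    circuitBreakerThreshold: 5,   // fail fast so operators see problems early
    transactionOptions: { readConcern: { level: 'majority' }, writeConcern: { w: 'majority', j: true } }
  },
  batchProcessing: {
    maxRetries: 3,
    baseBackoffMs: 1000,          // long backoff; completion matters more than latency
    transactionOptions: {
      readConcern: { level: 'snapshot' },
      writeConcern: { w: 'majority' },
      maxCommitTimeMS: 60000      // extended commit window for large batches
    }
  },
  realTime: {
    maxRetries: 1,                // fast-fail and fall back to a degraded response
    baseBackoffMs: 10,
    circuitBreakerThreshold: 3,
    transactionOptions: { readConcern: { level: 'local' }, writeConcern: { w: 1 } }
  }
};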

Recovery Strategy Guidelines

Essential patterns for production transaction recovery (a minimal retry sketch follows this list):

  1. Automatic Retry Logic: Exponential backoff with jitter for transient failures
  2. Circuit Breaker Pattern: Prevent cascading failures with intelligent state management
  3. Health Monitoring: Continuous assessment of system and transaction health
  4. Recovery Automation: Context-aware recovery strategies for different error types
  5. Performance Monitoring: Track error impact on application performance
  6. Operational Alerting: Proactive notification of error patterns and recovery issues
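
The first two guidelines can be combined in a small wrapper like the sketch below, which retries a transaction callback with exponential backoff and full jitter whenever the driver reports a TransientTransactionError label. The function name, retry limits, and delay constants are assumptions for illustration; this is a minimal sketch, not the full error handler developed above.

// Minimal retry sketch: exponential backoff with jitter for transient transaction errors
async function runTransactionWithRetry(client, txnFn, { maxRetries = 5, baseDelayMs = 50 } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const session = client.startSession();
    try {
      session.startTransaction({
        readConcern: { level: 'majority' },
        writeConcern: { w: 'majority' }
      });
      const result = await txnFn(session);
      await session.commitTransaction();
      return result;
    } catch (error) {
      await session.abortTransaction().catch(() => {}); // best-effort cleanup
      const transient = typeof error.hasErrorLabel === 'function' &&
        error.hasErrorLabel('TransientTransactionError');
      // UnknownTransactionCommitResult should retry only the commit, not the whole
      // transaction body; that case is omitted here for brevity.
      if (!transient || attempt === maxRetries) throw error;

      // Exponential backoff with full jitter to avoid synchronized retry storms
      const delayMs = Math.random() * baseDelayMs * Math.pow(2, attempt);
      await new Promise(resolve => setTimeout(resolve, delayMs));
    } finally {
      await session.endSession();
    }
  }
}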

Conclusion

MongoDB transaction error handling and recovery requires sophisticated strategies that balance reliability, performance, and operational complexity. By implementing intelligent retry mechanisms, comprehensive error classification, and automated recovery patterns, applications can maintain consistency and reliability even when facing distributed system challenges.

Key error handling benefits include:

  • Intelligent Recovery: Automatic retry logic with context-aware recovery strategies
  • Comprehensive Monitoring: Detailed error tracking and performance analysis
  • Circuit Breaker Protection: Prevention of cascading failures with intelligent state management
  • Health Assessment: Continuous monitoring of transaction and system health
  • Operational Visibility: Real-time insights into error patterns and recovery effectiveness
  • Production Resilience: Enterprise-grade reliability patterns for mission-critical applications

Whether you're building high-throughput web applications, distributed microservices, data processing pipelines, or real-time analytics platforms, MongoDB's intelligent transaction error handling with QueryLeaf's familiar management interface provides the foundation for resilient, reliable database operations. This combination enables you to leverage advanced error recovery capabilities while maintaining familiar database administration patterns and operational procedures.

QueryLeaf Integration: QueryLeaf automatically translates SQL-familiar error handling patterns into optimal MongoDB transaction configurations while providing comprehensive monitoring and recovery through SQL-style queries. Advanced error classification, recovery automation, and performance analysis are seamlessly managed through familiar database administration interfaces, making sophisticated error handling both powerful and accessible.

The integration of intelligent error handling with SQL-style database operations makes MongoDB an ideal platform for applications requiring both high reliability and familiar error management patterns, ensuring your transactions remain both consistent and resilient as they scale to meet demanding production requirements.

MongoDB Aggregation Framework for Real-Time Analytics: Advanced Data Processing Pipelines and SQL-Compatible Query Patterns

Modern applications require sophisticated data processing capabilities that can handle complex analytical queries, real-time aggregations, and advanced transformations at scale. Traditional approaches to data analytics often rely on separate ETL processes, batch processing systems, and complex data warehouses that introduce latency, complexity, and operational overhead, all of which become increasingly problematic as data volumes and processing demands grow.

MongoDB's Aggregation Framework provides powerful in-database processing capabilities that enable real-time analytics, complex data transformations, and sophisticated analytical queries directly within the operational database. Unlike traditional batch-oriented analytics approaches, MongoDB aggregation pipelines process data in real-time, support complex multi-stage transformations, and integrate seamlessly with operational workloads while delivering high-performance analytical results.
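
To make the contrast concrete before diving into the comparison, here is a deliberately small pipeline sketch showing the basic shape of in-database aggregation: match, group, and sort stages executed directly against an operational collection. The collection and field names (orders, status, totals.total) are assumptions that mirror the schema used in the larger examples later in this article, and $dateTrunc requires MongoDB 5.0 or newer.

// Minimal aggregation pipeline sketch: daily revenue for recent completed orders (assumed schema)
const dailyRevenue = await db.collection('orders').aggregate([
  { $match: {
      status: 'completed',
      orderDate: { $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) } // last 30 days
  }},
  { $group: {
      _id: { $dateTrunc: { date: '$orderDate', unit: 'day' } },
      orderCount: { $sum: 1 },
      totalRevenue: { $sum: '$totals.total' },
      avgOrderValue: { $avg: '$totals.total' }
  }},
  { $sort: { _id: -1 } }
]).toArray();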

The Traditional Data Analytics Limitations

Conventional relational database analytics approaches have significant constraints for modern real-time processing requirements:

-- Traditional PostgreSQL analytics - limited window functions and complex subqueries

-- Basic sales analytics with traditional SQL limitations
WITH monthly_sales_summary AS (
  SELECT 
    DATE_TRUNC('month', order_date) as month,
    product_category,
    customer_id,
    salesperson_id,
    region,

    -- Basic aggregations
    COUNT(*) as order_count,
    SUM(total_amount) as total_revenue,
    AVG(total_amount) as avg_order_value,
    MIN(total_amount) as min_order_value,
    MAX(total_amount) as max_order_value,

    -- Limited window function capabilities
    SUM(SUM(total_amount)) OVER (
      PARTITION BY product_category, region 
      ORDER BY DATE_TRUNC('month', order_date)
      RANGE BETWEEN INTERVAL '3 months' PRECEDING AND CURRENT ROW
    ) as rolling_3_month_revenue,

    LAG(SUM(total_amount)) OVER (
      PARTITION BY product_category, region 
      ORDER BY DATE_TRUNC('month', order_date)
    ) as previous_month_revenue,

    -- Row number for ranking (limited functionality)
    ROW_NUMBER() OVER (
      PARTITION BY DATE_TRUNC('month', order_date), region
      ORDER BY SUM(total_amount) DESC
    ) as revenue_rank_in_region

  FROM orders o
  LEFT JOIN order_items oi ON o.order_id = oi.order_id
  LEFT JOIN products p ON oi.product_id = p.product_id
  LEFT JOIN customers c ON o.customer_id = c.customer_id
  LEFT JOIN salespeople s ON o.salesperson_id = s.salesperson_id
  WHERE o.order_date >= CURRENT_DATE - INTERVAL '12 months'
    AND o.status = 'completed'
  GROUP BY 
    DATE_TRUNC('month', order_date), 
    product_category, 
    customer_id, 
    salesperson_id, 
    region
),

customer_segmentation AS (
  SELECT 
    customer_id,
    region,

    -- Customer metrics calculation
    COUNT(*) as total_orders,
    SUM(total_revenue) as lifetime_revenue,
    AVG(avg_order_value) as avg_order_value,
    MAX(month) as last_order_month,
    MIN(month) as first_order_month,

    -- Recency, Frequency, Monetary calculation (limited)
    EXTRACT(DAY FROM (CURRENT_DATE - MAX(month))) as days_since_last_order,
    COUNT(*) as frequency_score,
    SUM(total_revenue) as monetary_score,

    -- Simple percentile calculation (limited support)
    PERCENT_RANK() OVER (ORDER BY SUM(total_revenue)) as revenue_percentile,
    PERCENT_RANK() OVER (ORDER BY COUNT(*)) as frequency_percentile,

    -- Basic customer categorization
    CASE 
      WHEN SUM(total_revenue) > 10000 AND COUNT(*) > 10 THEN 'high_value'
      WHEN SUM(total_revenue) > 5000 OR COUNT(*) > 5 THEN 'medium_value'
      WHEN EXTRACT(DAY FROM (CURRENT_DATE - MAX(month))) > 90 THEN 'at_risk'
      ELSE 'low_value'
    END as customer_segment,

    -- Growth trend analysis (very limited)
    CASE 
      WHEN COUNT(*) FILTER (WHERE month >= CURRENT_DATE - INTERVAL '3 months') > 0 THEN 'active'
      WHEN COUNT(*) FILTER (WHERE month >= CURRENT_DATE - INTERVAL '6 months') > 0 THEN 'declining'
      ELSE 'inactive'
    END as activity_trend

  FROM monthly_sales_summary
  GROUP BY customer_id, region
),

product_performance AS (
  SELECT 
    product_category,
    region,
    month,

    -- Product metrics
    SUM(order_count) as total_orders,
    SUM(total_revenue) as category_revenue,
    AVG(avg_order_value) as avg_category_order_value,
    COUNT(DISTINCT customer_id) as unique_customers,

    -- Market share calculation (complex with traditional SQL)
    SUM(total_revenue) / (
      SELECT SUM(total_revenue) 
      FROM monthly_sales_summary mss2 
      WHERE mss2.month = monthly_sales_summary.month 
        AND mss2.region = monthly_sales_summary.region
    ) * 100 as market_share_percent,

    -- Growth rate calculation
    SUM(total_revenue) / NULLIF(LAG(SUM(total_revenue)) OVER (
      PARTITION BY product_category, region 
      ORDER BY month
    ), 0) - 1 as month_over_month_growth,

    -- Seasonal analysis (limited capabilities)
    AVG(SUM(total_revenue)) OVER (
      PARTITION BY product_category, region, EXTRACT(MONTH FROM month)
      ORDER BY month
      ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
    ) as seasonal_avg_revenue

  FROM monthly_sales_summary
  GROUP BY product_category, region, month
),

advanced_analytics AS (
  SELECT 
    cs.customer_segment,
    cs.region,
    cs.activity_trend,

    -- Customer segment analysis
    COUNT(*) as customers_in_segment,
    AVG(cs.lifetime_revenue) as avg_lifetime_value,
    AVG(cs.total_orders) as avg_orders_per_customer,
    AVG(cs.days_since_last_order) as avg_days_since_last_order,

    -- Revenue contribution by segment
    SUM(cs.lifetime_revenue) as segment_total_revenue,
    SUM(cs.lifetime_revenue) / (
      SELECT SUM(lifetime_revenue) FROM customer_segmentation
    ) * 100 as revenue_contribution_percent,

    -- Top products for each segment (limited subquery approach)
    (
      SELECT product_category 
      FROM monthly_sales_summary mss
      WHERE mss.customer_id IN (
        SELECT cs2.customer_id 
        FROM customer_segmentation cs2 
        WHERE cs2.customer_segment = cs.customer_segment
          AND cs2.region = cs.region
      )
      GROUP BY product_category
      ORDER BY SUM(total_revenue) DESC
      LIMIT 1
    ) as top_product_category,

    -- Cohort analysis (very complex with traditional SQL)
    COUNT(*) FILTER (
      WHERE cs.first_order_month >= CURRENT_DATE - INTERVAL '1 month'
    ) as new_customers_this_month,

    COUNT(*) FILTER (
      WHERE cs.last_order_month >= CURRENT_DATE - INTERVAL '1 month'
        AND cs.first_order_month < CURRENT_DATE - INTERVAL '1 month'
    ) as returning_customers_this_month

  FROM customer_segmentation cs
  GROUP BY cs.customer_segment, cs.region, cs.activity_trend
)

SELECT 
  customer_segment,
  region,
  activity_trend,
  customers_in_segment,
  ROUND(avg_lifetime_value::numeric, 2) as avg_lifetime_value,
  ROUND(avg_orders_per_customer::numeric, 2) as avg_orders_per_customer,
  ROUND(avg_days_since_last_order::numeric, 1) as avg_days_since_last_order,
  ROUND(segment_total_revenue::numeric, 2) as segment_revenue,
  ROUND(revenue_contribution_percent::numeric, 2) as revenue_contribution_pct,
  top_product_category,
  new_customers_this_month,
  returning_customers_this_month,

  -- Customer health score (simplified)
  CASE 
    WHEN customer_segment = 'high_value' AND activity_trend = 'active' THEN 95
    WHEN customer_segment = 'high_value' AND activity_trend = 'declining' THEN 70
    WHEN customer_segment = 'medium_value' AND activity_trend = 'active' THEN 80
    WHEN customer_segment = 'medium_value' AND activity_trend = 'declining' THEN 55
    WHEN customer_segment = 'low_value' AND activity_trend = 'active' THEN 65
    WHEN activity_trend = 'inactive' THEN 25
    ELSE 40
  END as customer_health_score,

  -- Recommendations (limited business logic)
  CASE 
    WHEN customer_segment = 'high_value' AND activity_trend = 'declining' THEN 'Urgent: Re-engagement campaign needed'
    WHEN customer_segment = 'medium_value' AND activity_trend = 'active' THEN 'Opportunity: Upsell to premium products'
    WHEN customer_segment = 'at_risk' THEN 'Action: Retention campaign required'
    WHEN new_customers_this_month > returning_customers_this_month THEN 'Focus: Improve customer retention'
    ELSE 'Monitor: Continue current strategy'
  END as recommended_action

FROM advanced_analytics
ORDER BY 
  CASE customer_segment 
    WHEN 'high_value' THEN 1 
    WHEN 'medium_value' THEN 2 
    WHEN 'low_value' THEN 3 
    ELSE 4 
  END,
  segment_revenue DESC;

-- Traditional PostgreSQL analytics problems:
-- 1. Complex multi-table JOINs required for comprehensive analysis
-- 2. Limited window function capabilities for advanced analytics
-- 3. Difficult to implement complex transformations and nested aggregations
-- 4. Poor performance with large datasets and complex calculations
-- 5. Limited support for hierarchical and nested data structures
-- 6. No built-in support for time-series analytics and forecasting
-- 7. Complex subqueries required for conditional aggregations
-- 8. Difficult to implement real-time analytics and streaming calculations
-- 9. Limited flexibility for dynamic grouping and pivot operations
-- 10. No native support for advanced statistical functions and machine learning

-- MySQL limitations are even more severe
SELECT 
  DATE_FORMAT(order_date, '%Y-%m') as month,
  product_category,
  region,
  COUNT(*) as order_count,
  SUM(total_amount) as revenue,
  AVG(total_amount) as avg_order_value,

  -- Very limited analytical capabilities
  -- No window functions in older MySQL versions
  -- No complex aggregation support
  -- Limited JSON processing capabilities
  -- Poor performance with complex queries

  (SELECT SUM(total_amount) 
   FROM orders o2 
   WHERE DATE_FORMAT(o2.order_date, '%Y-%m') = DATE_FORMAT(orders.order_date, '%Y-%m')
     AND o2.region = orders.region) as region_monthly_total

FROM orders
JOIN order_items ON orders.order_id = order_items.order_id
JOIN products ON order_items.product_id = products.product_id
WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 12 MONTH)
  AND status = 'completed'
GROUP BY 
  DATE_FORMAT(order_date, '%Y-%m'), 
  product_category, 
  region
ORDER BY month DESC, revenue DESC;

-- MySQL problems:
-- - No window functions in older versions
-- - Very limited JSON support and processing
-- - Basic aggregation functions only
-- - Poor performance with complex analytical queries
-- - No support for advanced statistical calculations
-- - Limited date/time processing capabilities
-- - No native support for real-time analytics
-- - Basic subquery support with performance issues

MongoDB's Aggregation Framework provides comprehensive real-time analytics capabilities:

// MongoDB Advanced Aggregation Framework - powerful real-time analytics and data processing
const { MongoClient } = require('mongodb');

class MongoDBAnalyticsEngine {
  constructor(db) {
    this.db = db;
    this.collections = {
      orders: db.collection('orders'),
      products: db.collection('products'),
      customers: db.collection('customers'),
      analytics: db.collection('analytics_cache')
    };
    this.pipelineCache = new Map();
  }

  async performComprehensiveAnalytics() {
    console.log('Executing comprehensive real-time analytics with MongoDB Aggregation Framework...');

    // Execute multiple analytical pipelines in parallel
    const [
      salesAnalytics,
      customerSegmentation,
      productPerformance,
      timeSeriesAnalysis,
      predictiveInsights
    ] = await Promise.all([
      this.executeSalesAnalyticsPipeline(),
      this.executeCustomerSegmentationPipeline(),
      this.executeProductPerformancePipeline(),
      this.executeTimeSeriesAnalytics(),
      this.executePredictiveAnalytics()
    ]);

    return {
      salesAnalytics,
      customerSegmentation,
      productPerformance,
      timeSeriesAnalysis,
      predictiveInsights,
      generatedAt: new Date()
    };
  }

  async executeSalesAnalyticsPipeline() {
    console.log('Executing advanced sales analytics pipeline...');

    const pipeline = [
      // Stage 1: Match recent completed orders
      {
        $match: {
          status: 'completed',
          orderDate: { $gte: new Date(Date.now() - 365 * 24 * 60 * 60 * 1000) }, // Last 12 months
          'totals.total': { $gt: 0 }
        }
      },

      // Stage 2: Add computed fields and date transformations
      {
        $addFields: {
          year: { $year: '$orderDate' },
          month: { $month: '$orderDate' },
          dayOfYear: { $dayOfYear: '$orderDate' },
          weekOfYear: { $week: '$orderDate' },
          quarter: { 
            $ceil: { $divide: [{ $month: '$orderDate' }, 3] }
          },

          // Calculate order metrics
          orderValue: '$totals.total',
          itemCount: { $size: '$items' },
          avgItemValue: { 
            $divide: ['$totals.total', { $size: '$items' }] 
          },

          // Customer type classification
          customerType: {
            $switch: {
              branches: [
                { case: { $gte: ['$totals.total', 1000] }, then: 'high_value' },
                { case: { $gte: ['$totals.total', 500] }, then: 'medium_value' },
                { case: { $gte: ['$totals.total', 100] }, then: 'regular' }
              ],
              default: 'low_value'
            }
          },

          // Season classification
          season: {
            $switch: {
              branches: [
                { case: { $in: [{ $month: '$orderDate' }, [12, 1, 2]] }, then: 'winter' },
                { case: { $in: [{ $month: '$orderDate' }, [3, 4, 5]] }, then: 'spring' },
                { case: { $in: [{ $month: '$orderDate' }, [6, 7, 8]] }, then: 'summer' },
                { case: { $in: [{ $month: '$orderDate' }, [9, 10, 11]] }, then: 'fall' }
              ],
              default: 'unknown'
            }
          }
        }
      },

      // Stage 3: Lookup customer information
      {
        $lookup: {
          from: 'customers',
          localField: 'customerId',
          foreignField: '_id',
          as: 'customer',
          pipeline: [
            {
              $project: {
                name: 1,
                email: 1,
                'profile.location.country': 1,
                'profile.location.region': 1,
                'account.type': 1,
                'account.registrationDate': 1,
                'preferences.category': 1
              }
            }
          ]
        }
      },

      // Stage 4: Unwind customer data
      { $unwind: '$customer' },

      // Stage 5: Unwind order items for detailed analysis
      { $unwind: '$items' },

      // Stage 6: Lookup product information
      {
        $lookup: {
          from: 'products',
          localField: 'items.productId',
          foreignField: '_id',
          as: 'product',
          pipeline: [
            {
              $project: {
                name: 1,
                category: 1,
                brand: 1,
                'pricing.cost': 1,
                'specifications.weight': 1,
                'inventory.supplier': 1
              }
            }
          ]
        }
      },

      // Stage 7: Unwind product data
      { $unwind: '$product' },

      // Stage 8: Calculate item-level metrics
      {
        $addFields: {
          itemRevenue: { $multiply: ['$items.quantity', '$items.unitPrice'] },
          itemProfit: { 
            $multiply: [
              '$items.quantity', 
              { $subtract: ['$items.unitPrice', '$product.pricing.cost'] }
            ]
          },
          profitMargin: {
            $divide: [
              { $subtract: ['$items.unitPrice', '$product.pricing.cost'] },
              '$items.unitPrice'
            ]
          }
        }
      },

      // Stage 9: Group by multiple dimensions for comprehensive analysis
      {
        $group: {
          _id: {
            year: '$year',
            month: '$month',
            quarter: '$quarter',
            season: '$season',
            category: '$product.category',
            brand: '$product.brand',
            country: '$customer.profile.location.country',
            region: '$customer.profile.location.region',
            customerType: '$customerType',
            accountType: '$customer.account.type'
          },

          // Order-level metrics
          totalOrders: { $sum: 1 },
          uniqueCustomers: { $addToSet: '$customerId' },
          totalRevenue: { $sum: '$itemRevenue' },
          totalProfit: { $sum: '$itemProfit' },
          totalQuantity: { $sum: '$items.quantity' },

          // Statistical measures
          avgOrderValue: { $avg: '$orderValue' },
          minOrderValue: { $min: '$orderValue' },
          maxOrderValue: { $max: '$orderValue' },
          stdDevOrderValue: { $stdDevPop: '$orderValue' },

          // Product performance
          avgProfitMargin: { $avg: '$profitMargin' },
          avgItemPrice: { $avg: '$items.unitPrice' },
          totalWeight: { $sum: { $multiply: ['$items.quantity', '$product.specifications.weight'] } },

          // Customer insights
          newCustomers: {
            $sum: {
              $cond: [
                { $gte: [
                  '$customer.account.registrationDate',
                  { $dateFromParts: { year: '$year', month: '$month', day: 1 } }
                ]},
                1, 0
              ]
            }
          },

          // Supplier diversity
          uniqueSuppliers: { $addToSet: '$product.inventory.supplier' },

          // Sample orders for detailed analysis
          sampleOrders: { $push: {
            orderId: '$_id',
            customerId: '$customerId',
            orderValue: '$orderValue',
            itemCount: '$itemCount',
            orderDate: '$orderDate'
          }}
        }
      },

      // Stage 10: Calculate derived metrics
      {
        $addFields: {
          uniqueCustomerCount: { $size: '$uniqueCustomers' },
          uniqueSupplierCount: { $size: '$uniqueSuppliers' },
          averageOrdersPerCustomer: { 
            $divide: ['$totalOrders', { $size: '$uniqueCustomers' }] 
          },
          revenuePerCustomer: { 
            $divide: ['$totalRevenue', { $size: '$uniqueCustomers' }] 
          },
          profitMarginPercent: { 
            $multiply: [{ $divide: ['$totalProfit', '$totalRevenue'] }, 100] 
          },
          customerAcquisitionRate: {
            $divide: ['$newCustomers', { $size: '$uniqueCustomers' }]
          }
        }
      },

      // Stage 11: Add ranking and percentile information
      {
        $setWindowFields: {
          partitionBy: { year: '$_id.year', quarter: '$_id.quarter' },
          sortBy: { totalRevenue: -1 },
          output: {
            revenueRank: { $rank: {} },
            revenuePercentile: { $percentRank: {} },
            cumulativeRevenue: { $sum: '$totalRevenue', window: { documents: ['unbounded', 'current'] } },
            movingAvgRevenue: { $avg: '$totalRevenue', window: { documents: [-2, 2] } }
          }
        }
      },

      // Stage 12: Calculate growth rates using window functions
      {
        $setWindowFields: {
          partitionBy: { 
            category: '$_id.category', 
            country: '$_id.country' 
          },
          sortBy: { year: 1, month: 1 },
          output: {
            previousMonthRevenue: { 
              $shift: { output: '$totalRevenue', by: -1 } 
            },
            previousYearRevenue: { 
              $shift: { output: '$totalRevenue', by: -12 } 
            }
          }
        }
      },

      // Stage 13: Calculate final growth metrics
      {
        $addFields: {
          monthOverMonthGrowth: {
            $cond: [
              { $gt: ['$previousMonthRevenue', 0] },
              { 
                $subtract: [
                  { $divide: ['$totalRevenue', '$previousMonthRevenue'] },
                  1
                ]
              },
              null
            ]
          },
          yearOverYearGrowth: {
            $cond: [
              { $gt: ['$previousYearRevenue', 0] },
              { 
                $subtract: [
                  { $divide: ['$totalRevenue', '$previousYearRevenue'] },
                  1
                ]
              },
              null
            ]
          }
        }
      },

      // Stage 14: Add performance indicators
      {
        $addFields: {
          performanceIndicator: {
            $switch: {
              branches: [
                { 
                  case: { $and: [
                    { $gt: ['$monthOverMonthGrowth', 0.1] },
                    { $gt: ['$profitMarginPercent', 20] }
                  ]},
                  then: 'excellent'
                },
                { 
                  case: { $and: [
                    { $gt: ['$monthOverMonthGrowth', 0.05] },
                    { $gt: ['$profitMarginPercent', 15] }
                  ]},
                  then: 'good'
                },
                { 
                  case: { $or: [
                    { $lt: ['$monthOverMonthGrowth', -0.1] },
                    { $lt: ['$profitMarginPercent', 5] }
                  ]},
                  then: 'concerning'
                }
              ],
              default: 'average'
            }
          },

          // Business recommendations
          recommendation: {
            $switch: {
              branches: [
                { 
                  case: { $lt: ['$monthOverMonthGrowth', -0.2] },
                  then: 'Urgent: Investigate revenue decline and implement recovery strategy'
                },
                { 
                  case: { $lt: ['$profitMarginPercent', 5] },
                  then: 'Action: Review pricing strategy and cost structure'
                },
                { 
                  case: { $and: [
                    { $gt: ['$monthOverMonthGrowth', 0.15] },
                    { $gt: ['$revenuePercentile', 0.8] }
                  ]},
                  then: 'Opportunity: Scale successful strategies and increase investment'
                },
                { 
                  case: { $lt: ['$customerAcquisitionRate', 0.1] },
                  then: 'Focus: Improve customer acquisition and marketing effectiveness'
                }
              ],
              default: 'Monitor: Continue current strategies with minor optimizations'
            }
          }
        }
      },

      // Stage 15: Sort by strategic importance
      {
        $sort: {
          'totalRevenue': -1,
          'profitMarginPercent': -1,
          '_id.year': -1,
          '_id.month': -1
        }
      },

      // Stage 16: Limit to top performing segments for detailed analysis
      { $limit: 100 }
    ];

    const results = await this.collections.orders.aggregate(pipeline).toArray();

    console.log(`Sales analytics completed: ${results.length} segments analyzed`);
    return results;
  }

  async executeCustomerSegmentationPipeline() {
    console.log('Executing advanced customer segmentation pipeline...');

    const pipeline = [
      // Stage 1: Match active customers with orders
      {
        $match: {
          'account.status': 'active',
          'account.createdAt': { $gte: new Date(Date.now() - 730 * 24 * 60 * 60 * 1000) } // Last 2 years
        }
      },

      // Stage 2: Lookup customer orders
      {
        $lookup: {
          from: 'orders',
          localField: '_id',
          foreignField: 'customerId',
          as: 'orders',
          pipeline: [
            {
              $match: {
                status: 'completed',
                orderDate: { $gte: new Date(Date.now() - 365 * 24 * 60 * 60 * 1000) }
              }
            },
            {
              $project: {
                orderDate: 1,
                'totals.total': 1,
                'totals.currency': 1,
                items: 1
              }
            }
          ]
        }
      },

      // Stage 3: Calculate RFM metrics (Recency, Frequency, Monetary)
      {
        $addFields: {
          // Recency: Days since last order
          recency: {
            $cond: [
              { $gt: [{ $size: '$orders' }, 0] },
              {
                $divide: [
                  { $subtract: [new Date(), { $max: '$orders.orderDate' }] },
                  1000 * 60 * 60 * 24 // Convert to days
                ]
              },
              999 // Default high recency for customers with no orders
            ]
          },

          // Frequency: Number of orders
          frequency: { $size: '$orders' },

          // Monetary: Total spending
          monetary: {
            $reduce: {
              input: '$orders',
              initialValue: 0,
              in: { $add: ['$$value', '$$this.totals.total'] }
            }
          },

          // Additional customer metrics
          avgOrderValue: {
            $cond: [
              { $gt: [{ $size: '$orders' }, 0] },
              {
                $divide: [
                  {
                    $reduce: {
                      input: '$orders',
                      initialValue: 0,
                      in: { $add: ['$$value', '$$this.totals.total'] }
                    }
                  },
                  { $size: '$orders' }
                ]
              },
              0
            ]
          },

          firstOrderDate: { $min: '$orders.orderDate' },
          lastOrderDate: { $max: '$orders.orderDate' },

          // Calculate customer lifetime (days)
          customerLifetime: {
            $cond: [
              { $gt: [{ $size: '$orders' }, 0] },
              {
                $divide: [
                  { $subtract: [{ $max: '$orders.orderDate' }, { $min: '$orders.orderDate' }] },
                  1000 * 60 * 60 * 24
                ]
              },
              0
            ]
          }
        }
      },

      // Stage 4: Calculate RFM scores using percentile ranking
      {
        $setWindowFields: {
          sortBy: { recency: 1 }, // Lower recency is better (more recent)
          output: {
            recencyScore: {
              $percentRank: {}
            }
          }
        }
      },

      {
        $setWindowFields: {
          sortBy: { frequency: -1 }, // Higher frequency is better
          output: {
            frequencyScore: {
              $percentRank: {}
            }
          }
        }
      },

      {
        $setWindowFields: {
          sortBy: { monetary: -1 }, // Higher monetary is better
          output: {
            monetaryScore: {
              $percentRank: {}
            }
          }
        }
      },

      // Stage 5: Create RFM segments
      {
        $addFields: {
          // Convert percentile scores to a 1-5 scale (5 = best for each dimension);
          // $percentRank assigns 0 to the best value under each sort above, so invert before bucketing
          recencyBucket: {
            $max: [1, { $ceil: { $multiply: [{ $subtract: [1, '$recencyScore'] }, 5] } }]
          },
          frequencyBucket: {
            $max: [1, { $ceil: { $multiply: [{ $subtract: [1, '$frequencyScore'] }, 5] } }]
          },
          monetaryBucket: {
            $max: [1, { $ceil: { $multiply: [{ $subtract: [1, '$monetaryScore'] }, 5] } }]
          }
        }
      },

      // Stage 6: Create customer segments based on RFM
      {
        $addFields: {
          rfmScore: {
            $concat: [
              { $toString: '$recencyBucket' },
              { $toString: '$frequencyBucket' },
              { $toString: '$monetaryBucket' }
            ]
          },

          customerSegment: {
            $switch: {
              branches: [
                // Champions: High value, bought recently, buy often
                { 
                  case: { $and: [
                    { $gte: ['$recencyBucket', 4] },
                    { $gte: ['$frequencyBucket', 4] },
                    { $gte: ['$monetaryBucket', 4] }
                  ]},
                  then: 'champions'
                },
                // Loyal customers: High frequency and monetary, but not recent
                { 
                  case: { $and: [
                    { $gte: ['$frequencyBucket', 4] },
                    { $gte: ['$monetaryBucket', 4] }
                  ]},
                  then: 'loyal_customers'
                },
                // Potential loyalists: Recent customers with good frequency
                { 
                  case: { $and: [
                    { $gte: ['$recencyBucket', 4] },
                    { $gte: ['$frequencyBucket', 3] }
                  ]},
                  then: 'potential_loyalists'
                },
                // New customers: Recent but low frequency/monetary
                { 
                  case: { $and: [
                    { $gte: ['$recencyBucket', 4] },
                    { $lte: ['$frequencyBucket', 2] }
                  ]},
                  then: 'new_customers'
                },
                // Promising: Recent moderate spenders
                { 
                  case: { $and: [
                    { $gte: ['$recencyBucket', 3] },
                    { $gte: ['$monetaryBucket', 3] }
                  ]},
                  then: 'promising'
                },
                // Need attention: Recent low spenders
                { 
                  case: { $and: [
                    { $gte: ['$recencyBucket', 3] },
                    { $lte: ['$monetaryBucket', 2] }
                  ]},
                  then: 'need_attention'
                },
                // About to sleep: Low recency but good historical value
                { 
                  case: { $and: [
                    { $lte: ['$recencyBucket', 2] },
                    { $gte: ['$monetaryBucket', 3] }
                  ]},
                  then: 'about_to_sleep'
                },
                // At risk: Low recency and frequency but good monetary
                { 
                  case: { $and: [
                    { $lte: ['$recencyBucket', 2] },
                    { $lte: ['$frequencyBucket', 2] },
                    { $gte: ['$monetaryBucket', 3] }
                  ]},
                  then: 'at_risk'
                },
                // Cannot lose: Very low recency but high monetary
                { 
                  case: { $and: [
                    { $eq: ['$recencyBucket', 1] },
                    { $gte: ['$monetaryBucket', 4] }
                  ]},
                  then: 'cannot_lose'
                },
                // Hibernating: Low across all dimensions
                { 
                  case: { $and: [
                    { $lte: ['$recencyBucket', 2] },
                    { $lte: ['$frequencyBucket', 2] },
                    { $lte: ['$monetaryBucket', 2] }
                  ]},
                  then: 'hibernating'
                }
              ],
              default: 'others'
            }
          },

          // Calculate customer lifetime value
          customerLifetimeValue: {
            $multiply: [
              '$avgOrderValue',
              { $divide: ['$frequency', { $max: [1, { $divide: ['$customerLifetime', 365] }] }] }, // Orders per year
              3 // Projected future years
            ]
          },

          // Churn risk assessment
          churnRisk: {
            $switch: {
              branches: [
                { case: { $gte: ['$recency', 180] }, then: 'high' },
                { case: { $gte: ['$recency', 90] }, then: 'medium' },
                { case: { $gte: ['$recency', 30] }, then: 'low' }
              ],
              default: 'very_low'
            }
          }
        }
      },

      // Stage 7: Enrich with customer profile data
      {
        $addFields: {
          profileCompleteness: {
            $divide: [
              {
                $add: [
                  { $cond: [{ $ne: ['$profile.firstName', null] }, 1, 0] },
                  { $cond: [{ $ne: ['$profile.lastName', null] }, 1, 0] },
                  { $cond: [{ $ne: ['$profile.phone', null] }, 1, 0] },
                  { $cond: [{ $ne: ['$profile.location', null] }, 1, 0] },
                  { $cond: [{ $ne: ['$profile.dateOfBirth', null] }, 1, 0] },
                  { $cond: [{ $ne: ['$preferences', null] }, 1, 0] }
                ]
              },
              6
            ]
          },

          engagementLevel: {
            $switch: {
              branches: [
                { 
                  case: { $and: [
                    { $gte: ['$frequency', 10] },
                    { $lte: ['$recency', 30] }
                  ]},
                  then: 'highly_engaged'
                },
                { 
                  case: { $and: [
                    { $gte: ['$frequency', 5] },
                    { $lte: ['$recency', 60] }
                  ]},
                  then: 'moderately_engaged'
                },
                { 
                  case: { $and: [
                    { $gte: ['$frequency', 2] },
                    { $lte: ['$recency', 120] }
                  ]},
                  then: 'lightly_engaged'
                }
              ],
              default: 'disengaged'
            }
          }
        }
      },

      // Stage 8: Create final customer analysis
      {
        $project: {
          _id: 1,
          email: 1,
          'profile.firstName': 1,
          'profile.lastName': 1,
          'profile.location.country': 1,
          'profile.location.region': 1,
          'account.type': 1,
          'account.createdAt': 1,

          // RFM Analysis
          recency: { $round: ['$recency', 1] },
          frequency: 1,
          monetary: { $round: ['$monetary', 2] },
          rfmScore: 1,
          recencyBucket: 1,
          frequencyBucket: 1,
          monetaryBucket: 1,

          // Customer Classification
          customerSegment: 1,
          churnRisk: 1,
          engagementLevel: 1,

          // Business Metrics
          avgOrderValue: { $round: ['$avgOrderValue', 2] },
          customerLifetimeValue: { $round: ['$customerLifetimeValue', 2] },
          customerLifetime: { $round: ['$customerLifetime', 0] },
          profileCompleteness: { $round: [{ $multiply: ['$profileCompleteness', 100] }, 1] },

          // Timeline
          firstOrderDate: 1,
          lastOrderDate: 1,

          // Marketing recommendations
          marketingAction: {
            $switch: {
              branches: [
                { case: { $eq: ['$customerSegment', 'champions'] }, then: 'Reward and advocate program' },
                { case: { $eq: ['$customerSegment', 'loyal_customers'] }, then: 'Upsell and cross-sell premium products' },
                { case: { $eq: ['$customerSegment', 'potential_loyalists'] }, then: 'Loyalty program enrollment' },
                { case: { $eq: ['$customerSegment', 'new_customers'] }, then: 'Onboarding and education campaign' },
                { case: { $eq: ['$customerSegment', 'promising'] }, then: 'Targeted promotions and engagement' },
                { case: { $eq: ['$customerSegment', 'need_attention'] }, then: 'Value demonstration and support' },
                { case: { $eq: ['$customerSegment', 'about_to_sleep'] }, then: 'Re-engagement campaign with incentives' },
                { case: { $eq: ['$customerSegment', 'at_risk'] }, then: 'Urgent retention program' },
                { case: { $eq: ['$customerSegment', 'cannot_lose'] }, then: 'Win-back campaign with premium offers' },
                { case: { $eq: ['$customerSegment', 'hibernating'] }, then: 'Reactivation with significant discount' }
              ],
              default: 'Monitor and nurture'
            }
          }
        }
      },

      // Stage 9: Sort by customer value and risk
      {
        $sort: {
          customerLifetimeValue: -1,
          recency: 1,
          frequency: -1
        }
      }
    ];

    const results = await this.collections.customers.aggregate(pipeline).toArray();

    console.log(`Customer segmentation completed: ${results.length} customers analyzed`);
    return results;
  }

  async executeProductPerformancePipeline() {
    console.log('Executing product performance analysis pipeline...');

    const pipeline = [
      // Stage 1: Match products with sales data
      {
        $lookup: {
          from: 'orders',
          let: { productId: '$_id' },
          pipeline: [
            {
              $match: {
                status: 'completed',
                orderDate: { $gte: new Date(Date.now() - 365 * 24 * 60 * 60 * 1000) }
              }
            },
            { $unwind: '$items' },
            {
              $match: {
                $expr: { $eq: ['$items.productId', '$$productId'] }
              }
            },
            {
              $project: {
                orderDate: 1,
                customerId: 1,
                quantity: '$items.quantity',
                unitPrice: '$items.unitPrice',
                totalPrice: '$items.totalPrice',
                'customer.location.country': 1
              }
            }
          ],
          as: 'sales'
        }
      },

      // Stage 2: Calculate comprehensive product metrics
      {
        $addFields: {
          // Sales volume metrics
          totalUnitsSold: {
            $reduce: {
              input: '$sales',
              initialValue: 0,
              in: { $add: ['$$value', '$$this.quantity'] }
            }
          },

          totalRevenue: {
            $reduce: {
              input: '$sales',
              initialValue: 0,
              in: { $add: ['$$value', '$$this.totalPrice'] }
            }
          },

          totalOrders: { $size: '$sales' },
          uniqueCustomers: { $size: { $setUnion: [{ $map: { input: '$sales', as: 'sale', in: '$$sale.customerId' } }, []] } },

          // Pricing analysis
          avgSellingPrice: {
            $cond: [
              { $gt: [{ $size: '$sales' }, 0] },
              {
                $divide: [
                  {
                    $reduce: {
                      input: '$sales',
                      initialValue: 0,
                      in: { $add: ['$$value', '$$this.unitPrice'] }
                    }
                  },
                  { $size: '$sales' }
                ]
              },
              0
            ]
          },

          // Profit analysis
          totalProfit: {
            $reduce: {
              input: '$sales',
              initialValue: 0,
              in: { 
                $add: [
                  '$$value', 
                  { 
                    $multiply: [
                      '$$this.quantity',
                      { $subtract: ['$$this.unitPrice', '$pricing.cost'] }
                    ]
                  }
                ]
              }
            }
          },

          // Time-based analysis
          firstSaleDate: { $min: '$sales.orderDate' },
          lastSaleDate: { $max: '$sales.orderDate' },

          // Calculate monthly sales trend
          monthlySales: {
            $map: {
              input: { $range: [0, 12] },
              as: 'monthOffset',
              in: {
                month: {
                  $dateFromParts: {
                    year: { $year: { $dateSubtract: { startDate: new Date(), unit: 'month', amount: '$$monthOffset' } } },
                    month: { $month: { $dateSubtract: { startDate: new Date(), unit: 'month', amount: '$$monthOffset' } } },
                    day: 1
                  }
                },
                sales: {
                  $reduce: {
                    input: {
                      $filter: {
                        input: '$sales',
                        cond: {
                          $and: [
                            { $gte: ['$$this.orderDate', { $dateSubtract: { startDate: new Date(), unit: 'month', amount: { $add: ['$$monthOffset', 1] } } }] },
                            { $lt: ['$$this.orderDate', { $dateSubtract: { startDate: new Date(), unit: 'month', amount: '$$monthOffset' } }] }
                          ]
                        }
                      }
                    },
                    initialValue: 0,
                    in: { $add: ['$$value', '$$this.totalPrice'] }
                  }
                }
              }
            }
          }
        }
      },

      // Stage 3: Calculate performance indicators
      {
        $addFields: {
          // Performance ratios
          profitMargin: {
            $cond: [
              { $gt: ['$totalRevenue', 0] },
              { $divide: ['$totalProfit', '$totalRevenue'] },
              0
            ]
          },

          revenuePerCustomer: {
            $cond: [
              { $gt: ['$uniqueCustomers', 0] },
              { $divide: ['$totalRevenue', '$uniqueCustomers'] },
              0
            ]
          },

          avgOrderValue: {
            $cond: [
              { $gt: ['$totalOrders', 0] },
              { $divide: ['$totalRevenue', '$totalOrders'] },
              0
            ]
          },

          // Inventory turnover (simplified)
          inventoryTurnover: {
            $cond: [
              { $gt: ['$inventory.quantity', 0] },
              { $divide: ['$totalUnitsSold', '$inventory.quantity'] },
              0
            ]
          },

          // Product lifecycle stage
          lifecycleStage: {
            $switch: {
              branches: [
                { 
                  case: { 
                    $gte: [
                      '$firstSaleDate', 
                      { $dateSubtract: { startDate: new Date(), unit: 'day', amount: 90 } }
                    ]
                  },
                  then: 'new'
                },
                {
                  case: { $and: [
                    { $gt: ['$totalRevenue', 10000] },
                    { $gt: ['$profitMargin', 0.2] }
                  ]},
                  then: 'growth'
                },
                {
                  case: { $and: [
                    { $gt: ['$totalRevenue', 50000] },
                    { $gte: ['$profitMargin', 0.15] }
                  ]},
                  then: 'maturity'
                },
                {
                  case: { $or: [
                    { $lt: ['$profitMargin', 0.1] },
                    { $lt: [
                      '$lastSaleDate',
                      { $dateSubtract: { startDate: new Date(), unit: 'day', amount: 60 } }
                    ]}
                  ]},
                  then: 'decline'
                }
              ],
              default: 'development'
            }
          },

          // Sales trend analysis
          salesTrend: {
            $let: {
              vars: {
                recentSales: { $slice: ['$monthlySales.sales', 0, 6] },
                olderSales: { $slice: ['$monthlySales.sales', 6, 6] }
              },
              in: {
                $cond: [
                  { $and: [
                    { $gt: [{ $avg: '$$recentSales' }, { $avg: '$$olderSales' }] },
                    { $gt: [{ $avg: '$$recentSales' }, 0] }
                  ]},
                  'growing',
                  {
                    $cond: [
                      { $lt: [{ $avg: '$$recentSales' }, { $multiply: [{ $avg: '$$olderSales' }, 0.8] }] },
                      'declining',
                      'stable'
                    ]
                  }
                ]
              }
            }
          }
        }
      },

      // Stage 4: Add competitive analysis using window functions
      {
        $setWindowFields: {
          partitionBy: '$category',
          sortBy: { totalRevenue: -1 },
          output: {
            categoryRank: { $rank: {} },
            categoryPercentile: { $percentRank: {} },
            marketShareInCategory: {
              $divide: [
                '$totalRevenue',
                { $sum: '$totalRevenue', window: { documents: ['unbounded', 'unbounded'] } }
              ]
            }
          }
        }
      },

      // Stage 5: Calculate final performance scores
      {
        $addFields: {
          // Overall performance score (0-100)
          performanceScore: {
            $multiply: [
              {
                $add: [
                  { $multiply: ['$categoryPercentile', 0.3] }, // Market position
                  { $multiply: [{ $min: ['$profitMargin', 0.5] }, 0.25] }, // Profitability (capped at 50%)
                  { $multiply: [{ $divide: [{ $min: ['$inventoryTurnover', 10] }, 10] }, 0.2] }, // Efficiency (capped at 10x)
                  { 
                    $multiply: [
                      {
                        $switch: {
                          branches: [
                            { case: { $eq: ['$salesTrend', 'growing'] }, then: 1 },
                            { case: { $eq: ['$salesTrend', 'stable'] }, then: 0.7 },
                            { case: { $eq: ['$salesTrend', 'declining'] }, then: 0.3 }
                          ],
                          default: 0.5
                        }
                      },
                      0.25
                    ]
                  } // Growth trend
                ]
              },
              100
            ]
          },

          // Strategic recommendations
          strategicRecommendation: {
            $switch: {
              branches: [
                {
                  case: { $and: [
                    { $eq: ['$salesTrend', 'growing'] },
                    { $gt: ['$profitMargin', 0.25] },
                    { $lt: ['$categoryRank', 5] }
                  ]},
                  then: 'Star Product: Increase investment and marketing focus'
                },
                {
                  case: { $and: [
                    { $eq: ['$lifecycleStage', 'maturity'] },
                    { $gt: ['$profitMargin', 0.2] }
                  ]},
                  then: 'Cash Cow: Optimize operations and maintain market share'
                },
                {
                  case: { $and: [
                    { $eq: ['$salesTrend', 'growing'] },
                    { $lt: ['$profitMargin', 0.15] }
                  ]},
                  then: 'Question Mark: Improve margins or consider repositioning'
                },
                {
                  case: { $and: [
                    { $eq: ['$salesTrend', 'declining'] },
                    { $lt: ['$profitMargin', 0.1] }
                  ]},
                  then: 'Dog: Consider discontinuation or major repositioning'
                },
                {
                  case: { $eq: ['$lifecycleStage', 'new'] },
                  then: 'Monitor closely and provide marketing support'
                }
              ],
              default: 'Maintain current strategy with regular monitoring'
            }
          }
        }
      },

      // Stage 6: Final projection and sorting
      {
        $project: {
          _id: 1,
          name: 1,
          category: 1,
          brand: 1,
          'pricing.cost': 1,
          'pricing.retail': 1,

          // Sales performance
          totalUnitsSold: 1,
          totalRevenue: { $round: ['$totalRevenue', 2] },
          totalProfit: { $round: ['$totalProfit', 2] },
          totalOrders: 1,
          uniqueCustomers: 1,

          // Financial metrics
          avgSellingPrice: { $round: ['$avgSellingPrice', 2] },
          profitMargin: { $round: [{ $multiply: ['$profitMargin', 100] }, 2] },
          revenuePerCustomer: { $round: ['$revenuePerCustomer', 2] },
          avgOrderValue: { $round: ['$avgOrderValue', 2] },

          // Performance indicators
          performanceScore: { $round: ['$performanceScore', 1] },
          lifecycleStage: 1,
          salesTrend: 1,
          categoryRank: 1,
          marketShareInCategory: { $round: [{ $multiply: ['$marketShareInCategory', 100] }, 3] },

          // Operational metrics
          inventoryTurnover: { $round: ['$inventoryTurnover', 2] },
          'inventory.quantity': 1,
          'inventory.lowStockThreshold': 1,

          // Timeline
          firstSaleDate: 1,
          lastSaleDate: 1,

          // Strategic guidance
          strategicRecommendation: 1,

          // Monthly trend data (last 6 months)
          recentMonthlySales: { $slice: ['$monthlySales', 0, 6] }
        }
      },

      // Stage 7: Sort by performance score and revenue
      {
        $sort: {
          performanceScore: -1,
          totalRevenue: -1
        }
      }
    ];

    const results = await this.collections.products.aggregate(pipeline).toArray();

    console.log(`Product performance analysis completed: ${results.length} products analyzed`);
    return results;
  }

  async executeTimeSeriesAnalytics() {
    console.log('Executing time-series analytics with advanced forecasting...');

    const pipeline = [
      // Stage 1: Match recent orders for time-series analysis
      {
        $match: {
          status: 'completed',
          orderDate: { $gte: new Date(Date.now() - 730 * 24 * 60 * 60 * 1000) } // Last 2 years
        }
      },

      // Stage 2: Group by time periods
      {
        $group: {
          _id: {
            year: { $year: '$orderDate' },
            month: { $month: '$orderDate' },
            week: { $week: '$orderDate' },
            dayOfWeek: { $dayOfWeek: '$orderDate' },
            hour: { $hour: '$orderDate' }
          },

          // Core metrics
          orderCount: { $sum: 1 },
          totalRevenue: { $sum: '$totals.total' },
          avgOrderValue: { $avg: '$totals.total' },
          uniqueCustomers: { $addToSet: '$customerId' },

          // Item-level aggregations
          totalItemsSold: {
            $sum: {
              $reduce: {
                input: '$items',
                initialValue: 0,
                in: { $add: ['$$value', '$$this.quantity'] }
              }
            }
          },

          // Distribution analysis
          orderValues: { $push: '$totals.total' },

          // Customer behavior (assumes the customer's account.createdAt is embedded on the order document)
          newCustomers: {
            $sum: {
              $cond: [
                { $eq: [{ $dayOfYear: '$orderDate' }, { $dayOfYear: '$customer.account.createdAt' }] },
                1,
                0
              ]
            }
          },

          // Geographic distribution
          countries: { $addToSet: '$shippingAddress.country' },
          regions: { $addToSet: '$shippingAddress.region' }
        }
      },

      // Stage 3: Add time-based calculations
      {
        $addFields: {
          // Convert _id to more usable date format
          date: {
            $dateFromParts: {
              year: '$_id.year',
              month: '$_id.month',
              day: 1
            }
          },

          uniqueCustomerCount: { $size: '$uniqueCustomers' },
          uniqueCountryCount: { $size: '$countries' },

          // Statistical measures
          revenueStdDev: { $stdDevPop: '$orderValues' },
          medianOrderValue: {
            $let: {
              vars: {
                sortedValues: {
                  $sortArray: {
                    input: '$orderValues',
                    sortBy: 1
                  }
                }
              },
              in: {
                $arrayElemAt: [
                  '$$sortedValues',
                  { $floor: { $divide: [{ $size: '$$sortedValues' }, 2] } }
                ]
              }
            }
          },

          // Time period classifications ($dayOfWeek returns 1 = Sunday ... 7 = Saturday)
          periodType: {
            $switch: {
              branches: [
                { case: { $in: ['$_id.dayOfWeek', [1, 7]] }, then: 'weekend' }
              ],
              default: 'weekday'
            }
          },

          timeOfDay: {
            $switch: {
              branches: [
                { case: { $lt: ['$_id.hour', 6] }, then: 'late_night' },
                { case: { $lt: ['$_id.hour', 12] }, then: 'morning' },
                { case: { $lt: ['$_id.hour', 18] }, then: 'afternoon' },
                { case: { $lt: ['$_id.hour', 22] }, then: 'evening' }
              ],
              default: 'night'
            }
          }
        }
      },

      // Stage 4: Add moving averages and trend analysis
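      // (Requires MongoDB 5.0+ for $setWindowFields; the $linearFill operator below requires 5.3+)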
      {
        $setWindowFields: {
          partitionBy: null,
          sortBy: { date: 1 },
          output: {
            // Moving averages
            revenue7DayMA: {
              $avg: '$totalRevenue',
              window: { documents: [-6, 0] }
            },
            revenue30DayMA: {
              $avg: '$totalRevenue',
              window: { documents: [-29, 0] }
            },

            // Growth calculations
            previousDayRevenue: {
              $shift: { output: '$totalRevenue', by: -1 }
            },
            previousWeekRevenue: {
              $shift: { output: '$totalRevenue', by: -7 }
            },
            previousMonthRevenue: {
              $shift: { output: '$totalRevenue', by: -30 }
            },

            // Volatility measures
            revenueVolatility: {
              $stdDevPop: '$totalRevenue',
              window: { documents: [-29, 0] }
            },

            // Gap filling: interpolate missing revenue values along the sorted date axis
            trendLine: {
              $linearFill: '$totalRevenue'
            }
          }
        }
      },

      // Stage 5: Calculate growth rates and trend indicators
      {
        $addFields: {
          dayOverDayGrowth: {
            $cond: [
              { $gt: ['$previousDayRevenue', 0] },
              { $subtract: [{ $divide: ['$totalRevenue', '$previousDayRevenue'] }, 1] },
              null
            ]
          },

          weekOverWeekGrowth: {
            $cond: [
              { $gt: ['$previousWeekRevenue', 0] },
              { $subtract: [{ $divide: ['$totalRevenue', '$previousWeekRevenue'] }, 1] },
              null
            ]
          },

          monthOverMonthGrowth: {
            $cond: [
              { $gt: ['$previousMonthRevenue', 0] },
              { $subtract: [{ $divide: ['$totalRevenue', '$previousMonthRevenue'] }, 1] },
              null
            ]
          },

          // Trend classification
          trendDirection: {
            $switch: {
              branches: [
                { 
                  case: { $gt: ['$revenue7DayMA', { $multiply: ['$revenue30DayMA', 1.05] }] },
                  then: 'strong_upward'
                },
                { 
                  case: { $gt: ['$revenue7DayMA', { $multiply: ['$revenue30DayMA', 1.02] }] },
                  then: 'upward'
                },
                { 
                  case: { $lt: ['$revenue7DayMA', { $multiply: ['$revenue30DayMA', 0.95] }] },
                  then: 'strong_downward'
                },
                { 
                  case: { $lt: ['$revenue7DayMA', { $multiply: ['$revenue30DayMA', 0.98] }] },
                  then: 'downward'
                }
              ],
              default: 'stable'
            }
          },

          // Seasonality detection
          seasonalityScore: {
            $divide: [
              '$revenueStdDev',
              { $max: ['$revenue30DayMA', 1] }
            ]
          },

          // Performance classification
          performanceCategory: {
            $switch: {
              branches: [
                { 
                  case: { $gte: ['$totalRevenue', { $multiply: ['$revenue30DayMA', 1.2] }] },
                  then: 'exceptional'
                },
                { 
                  case: { $gte: ['$totalRevenue', { $multiply: ['$revenue30DayMA', 1.1] }] },
                  then: 'above_average'
                },
                { 
                  case: { $lte: ['$totalRevenue', { $multiply: ['$revenue30DayMA', 0.8] }] },
                  then: 'poor'
                },
                { 
                  case: { $lte: ['$totalRevenue', { $multiply: ['$revenue30DayMA', 0.9] }] },
                  then: 'below_average'
                }
              ],
              default: 'average'
            }
          }
        }
      },

      // Stage 6: Add forecasting indicators
      {
        $addFields: {
          // Simple linear trend projection over the next 7 periods.
          // Note: $shift is only valid inside $setWindowFields, so the recent slope is
          // approximated here by the gap between the 7-period and 30-period moving averages.
          next7DayForecast: {
            $add: [
              '$revenue7DayMA',
              {
                $multiply: [
                  7,
                  { $subtract: ['$revenue7DayMA', '$revenue30DayMA'] }
                ]
              }
            ]
          },

          // Confidence indicator for the forecast (guarded against division by zero)
          forecastConfidence: {
            $subtract: [
              100,
              { $multiply: [{ $divide: ['$revenueVolatility', { $max: ['$revenue7DayMA', 1] }] }, 100] }
            ]
          },

          // Anomaly detection
          isAnomaly: {
            $or: [
              { $gt: ['$totalRevenue', { $add: ['$revenue7DayMA', { $multiply: ['$revenueVolatility', 2] }] }] },
              { $lt: ['$totalRevenue', { $subtract: ['$revenue7DayMA', { $multiply: ['$revenueVolatility', 2] }] }] }
            ]
          },

          // Business recommendations
          recommendation: {
            $switch: {
              branches: [
                {
                  case: { $eq: ['$trendDirection', 'strong_downward'] },
                  then: 'Urgent: Investigate revenue decline and implement recovery strategies'
                },
                {
                  case: { $eq: ['$trendDirection', 'strong_upward'] },
                  then: 'Opportunity: Scale successful initiatives and increase capacity'
                },
                {
                  case: { $gt: ['$seasonalityScore', 0.5] },
                  then: 'High volatility detected: Implement demand smoothing strategies'
                },
                {
                  case: { $eq: ['$performanceCategory', 'exceptional'] },
                  then: 'Analyze success factors for replication'
                }
              ],
              default: 'Continue monitoring with current strategy'
            }
          }
        }
      },

      // Stage 7: Final projection and filtering
      {
        $project: {
          date: 1,
          year: '$_id.year',
          month: '$_id.month',
          week: '$_id.week',
          dayOfWeek: '$_id.dayOfWeek',
          hour: '$_id.hour',

          // Core metrics
          orderCount: 1,
          totalRevenue: { $round: ['$totalRevenue', 2] },
          avgOrderValue: { $round: ['$avgOrderValue', 2] },
          uniqueCustomerCount: 1,
          totalItemsSold: 1,

          // Statistical measures
          medianOrderValue: { $round: ['$medianOrderValue', 2] },
          revenueStdDev: { $round: ['$revenueStdDev', 2] },

          // Trend analysis
          revenue7DayMA: { $round: ['$revenue7DayMA', 2] },
          revenue30DayMA: { $round: ['$revenue30DayMA', 2] },
          dayOverDayGrowth: { $round: [{ $multiply: ['$dayOverDayGrowth', 100] }, 2] },
          weekOverWeekGrowth: { $round: [{ $multiply: ['$weekOverWeekGrowth', 100] }, 2] },
          monthOverMonthGrowth: { $round: [{ $multiply: ['$monthOverMonthGrowth', 100] }, 2] },

          trendDirection: 1,
          performanceCategory: 1,
          seasonalityScore: { $round: ['$seasonalityScore', 3] },

          // Forecasting
          next7DayForecast: { $round: ['$next7DayForecast', 2] },
          forecastConfidence: { $round: ['$forecastConfidence', 1] },
          isAnomaly: 1,

          // Context
          periodType: 1,
          timeOfDay: 1,
          uniqueCountryCount: 1,

          // Business intelligence
          recommendation: 1
        }
      },

      // Stage 8: Sort by date descending
      {
        $sort: { date: -1 }
      },

      // Stage 9: Limit to recent data for performance
      {
        $limit: 365 // Most recent 365 periods (roughly one year of data)
      }
    ];

    const results = await this.collections.orders.aggregate(pipeline).toArray();

    console.log(`Time-series analytics completed: ${results.length} time periods analyzed`);
    return results;
  }

  async executePredictiveAnalytics() {
    console.log('Executing predictive analytics and machine learning insights...');

    const pipeline = [
      // Stage 1: Create customer behavioral features
      {
        $match: {
          'account.status': 'active'
        }
      },

      // Stage 2: Lookup order history
      {
        $lookup: {
          from: 'orders',
          localField: '_id',
          foreignField: 'customerId',
          as: 'orders',
          pipeline: [
            {
              $match: {
                status: 'completed',
                orderDate: { $gte: new Date(Date.now() - 365 * 24 * 60 * 60 * 1000) }
              }
            },
            {
              $project: {
                orderDate: 1,
                'totals.total': 1,
                daysSinceRegistration: {
                  $divide: [
                    { $subtract: ['$orderDate', '$customer.account.createdAt'] },
                    1000 * 60 * 60 * 24
                  ]
                }
              }
            },
            { $sort: { orderDate: 1 } }
          ]
        }
      },

      // Stage 3: Calculate predictive features
      {
        $addFields: {
          // Temporal features
          daysSinceRegistration: {
            $divide: [
              { $subtract: [new Date(), '$account.createdAt'] },
              1000 * 60 * 60 * 24
            ]
          },

          daysSinceLastOrder: {
            $cond: [
              { $gt: [{ $size: '$orders' }, 0] },
              {
                $divide: [
                  { $subtract: [new Date(), { $max: '$orders.orderDate' }] },
                  1000 * 60 * 60 * 24
                ]
              },
              999
            ]
          },

          // Purchase behavior features
          totalOrders: { $size: '$orders' },
          totalSpent: {
            $reduce: {
              input: '$orders',
              initialValue: 0,
              in: { $add: ['$$value', '$$this.totals.total'] }
            }
          },

          // Purchase frequency and regularity
          avgDaysBetweenOrders: {
            $cond: [
              { $gt: [{ $size: '$orders' }, 1] },
              {
                $divide: [
                  {
                    $divide: [
                      { $subtract: [{ $max: '$orders.orderDate' }, { $min: '$orders.orderDate' }] },
                      1000 * 60 * 60 * 24
                    ]
                  },
                  { $subtract: [{ $size: '$orders' }, 1] }
                ]
              },
              null
            ]
          },

          // Purchase pattern analysis
          orderFrequencyTrend: {
            $let: {
              vars: {
                recentOrders: {
                  $size: {
                    $filter: {
                      input: '$orders',
                      cond: { $gte: ['$$this.orderDate', { $dateSubtract: { startDate: new Date(), unit: 'day', amount: 90 } }] }
                    }
                  }
                },
                olderOrders: {
                  $size: {
                    $filter: {
                      input: '$orders',
                      cond: { 
                        $and: [
                          { $lt: ['$$this.orderDate', { $dateSubtract: { startDate: new Date(), unit: 'day', amount: 90 } }] },
                          { $gte: ['$$this.orderDate', { $dateSubtract: { startDate: new Date(), unit: 'day', amount: 180 } }] }
                        ]
                      }
                    }
                  }
                }
              },
              in: {
                $cond: [
                  { $gt: ['$$olderOrders', 0] },
                  { $subtract: [{ $divide: ['$$recentOrders', 90] }, { $divide: ['$$olderOrders', 90] }] },
                  0
                ]
              }
            }
          }
        }
      },

      // Stage 4: Calculate churn probability using logistic regression approximation
      {
        $addFields: {
          // Feature normalization and scoring
          recencyScore: {
            $cond: [
              { $gt: ['$daysSinceLastOrder', 180] }, 0.8,
              { $cond: [
                { $gt: ['$daysSinceLastOrder', 90] }, 0.6,
                { $cond: [
                  { $gt: ['$daysSinceLastOrder', 30] }, 0.3,
                  0.1
                ]}
              ]}
            ]
          },

          frequencyScore: {
            $cond: [
              { $lt: ['$totalOrders', 2] }, 0.7,
              { $cond: [
                { $lt: ['$totalOrders', 5] }, 0.5,
                { $cond: [
                  { $lt: ['$totalOrders', 10] }, 0.3,
                  0.1
                ]}
              ]}
            ]
          },

          monetaryScore: {
            $cond: [
              { $lt: ['$totalSpent', 100] }, 0.6,
              { $cond: [
                { $lt: ['$totalSpent', 500] }, 0.4,
                { $cond: [
                  { $lt: ['$totalSpent', 1000] }, 0.2,
                  0.1
                ]}
              ]}
            ]
          },

          engagementScore: {
            $cond: [
              { $lt: ['$orderFrequencyTrend', -0.5] }, 0.8,
              { $cond: [
                { $lt: ['$orderFrequencyTrend', 0] }, 0.6,
                { $cond: [
                  { $gt: ['$orderFrequencyTrend', 0.5] }, 0.1,
                  0.3
                ]}
              ]}
            ]
          }
        }
      },

      // Stage 5: Calculate composite churn probability
      {
        $addFields: {
          churnProbability: {
            $multiply: [
              {
                $add: [
                  { $multiply: ['$recencyScore', 0.35] },
                  { $multiply: ['$frequencyScore', 0.25] },
                  { $multiply: ['$monetaryScore', 0.25] },
                  { $multiply: ['$engagementScore', 0.15] }
                ]
              },
              100
            ]
          }
        }
      },

      // Stage 5b: Derive value and timing predictions from the churn probability
      // (kept in a separate $addFields stage because fields added in one stage
      //  cannot be referenced by other expressions within that same stage)
      {
        $addFields: {
          // Customer lifetime value prediction
          predictedLifetimeValue: {
            $cond: [
              { $and: [
                { $gt: ['$totalOrders', 0] },
                { $gt: ['$avgDaysBetweenOrders', 0] }
              ]},
              {
                $multiply: [
                  { $divide: ['$totalSpent', '$totalOrders'] }, // Average order value
                  { $divide: [365, '$avgDaysBetweenOrders'] }, // Orders per year
                  { $subtract: [5, { $multiply: ['$churnProbability', 0.05] }] } // Expected years (adjusted for churn risk)
                ]
              },
              '$totalSpent'
            ]
          },

          // Next purchase prediction
          nextPurchasePrediction: {
            $cond: [
              { $gt: ['$avgDaysBetweenOrders', 0] },
              {
                $dateAdd: {
                  startDate: { $max: '$orders.orderDate' },
                  unit: 'day',
                  amount: { 
                    $multiply: [
                      '$avgDaysBetweenOrders',
                      { $add: [1, { $multiply: ['$churnProbability', 0.01] }] } // Adjust for churn risk
                    ]
                  }
                }
              },
              null
            ]
          },

          // Upselling opportunity score
          upsellOpportunity: {
            $multiply: [
              {
                $add: [
                  { $cond: [{ $gt: ['$totalOrders', 5] }, 0.3, 0] },
                  { $cond: [{ $gt: ['$totalSpent', 500] }, 0.3, 0] },
                  { $cond: [{ $lt: ['$daysSinceLastOrder', 30] }, 0.25, 0] },
                  { $cond: [{ $gt: ['$orderFrequencyTrend', 0] }, 0.15, 0] }
                ]
              },
              100
            ]
          }
        }
      },

      // Stage 6: Risk segmentation and recommendations
      {
        $addFields: {
          riskSegment: {
            $switch: {
              branches: [
                { case: { $gte: ['$churnProbability', 70] }, then: 'high_risk' },
                { case: { $gte: ['$churnProbability', 50] }, then: 'medium_risk' },
                { case: { $gte: ['$churnProbability', 30] }, then: 'low_risk' }
              ],
              default: 'stable'
            }
          },

          valueSegment: {
            $switch: {
              branches: [
                { case: { $gte: ['$predictedLifetimeValue', 2000] }, then: 'high_value' },
                { case: { $gte: ['$predictedLifetimeValue', 1000] }, then: 'medium_value' },
                { case: { $gte: ['$predictedLifetimeValue', 500] }, then: 'moderate_value' }
              ],
              default: 'low_value'
            }
          },

          // AI-driven marketing recommendations
          marketingRecommendation: {
            $switch: {
              branches: [
                {
                  case: { $and: [
                    { $eq: ['$riskSegment', 'high_risk'] },
                    { $in: ['$valueSegment', ['high_value', 'medium_value']] }
                  ]},
                  then: 'Urgent win-back campaign with premium incentives'
                },
                {
                  case: { $and: [
                    { $eq: ['$riskSegment', 'medium_risk'] },
                    { $gte: ['$upsellOpportunity', 60] }
                  ]},
                  then: 'Proactive engagement with upselling opportunities'
                },
                {
                  case: { $and: [
                    { $eq: ['$riskSegment', 'stable'] },
                    { $gte: ['$upsellOpportunity', 70] }
                  ]},
                  then: 'Cross-sell and premium product recommendations'
                },
                {
                  case: { $eq: ['$riskSegment', 'low_risk'] },
                  then: 'Retention campaign with loyalty program enrollment'
                }
              ],
              default: 'Monitor and maintain current engagement level'
            }
          }
        }
      },

      // Stage 7: Add market basket analysis
      {
        $lookup: {
          from: 'orders',
          let: { customerId: '$_id' },
          pipeline: [
            {
              $match: {
                $expr: { $eq: ['$customerId', '$$customerId'] },
                status: 'completed'
              }
            },
            { $unwind: '$items' },
            {
              $group: {
                _id: '$items.productId',
                purchaseCount: { $sum: 1 },
                totalQuantity: { $sum: '$items.quantity' },
                totalSpent: { $sum: '$items.totalPrice' }
              }
            },
            { $sort: { purchaseCount: -1 } },
            { $limit: 5 }
          ],
          as: 'topProducts'
        }
      },

      // Stage 8: Final projection
      {
        $project: {
          _id: 1,
          email: 1,
          'profile.firstName': 1,
          'profile.lastName': 1,
          'account.type': 1,
          'account.createdAt': 1,

          // Behavioral metrics
          daysSinceRegistration: { $round: ['$daysSinceRegistration', 0] },
          daysSinceLastOrder: { $round: ['$daysSinceLastOrder', 0] },
          totalOrders: 1,
          totalSpent: { $round: ['$totalSpent', 2] },
          avgDaysBetweenOrders: { $round: ['$avgDaysBetweenOrders', 1] },

          // Predictive scores
          churnProbability: { $round: ['$churnProbability', 1] },
          predictedLifetimeValue: { $round: ['$predictedLifetimeValue', 2] },
          upsellOpportunity: { $round: ['$upsellOpportunity', 1] },

          // Segmentation
          riskSegment: 1,
          valueSegment: 1,

          // Predictions
          nextPurchasePrediction: 1,
          marketingRecommendation: 1,

          // Product affinity
          topProducts: 1,

          // Trend analysis
          orderFrequencyTrend: { $round: ['$orderFrequencyTrend', 3] }
        }
      },

      // Stage 9: Sort by strategic importance
      {
        $sort: {
          predictedLifetimeValue: -1,
          churnProbability: -1,
          upsellOpportunity: -1
        }
      },

      // Stage 10: Limit to top opportunities
      {
        $limit: 1000
      }
    ];

    const results = await this.collections.customers.aggregate(pipeline).toArray();

    console.log(`Predictive analytics completed: ${results.length} customers analyzed with ML insights`);
    return results;
  }

  async cacheAnalyticsResults(analysisType, data) {
    console.log(`Caching ${analysisType} analytics results...`);

    try {
      await this.collections.analytics.replaceOne(
        { type: analysisType },
        {
          type: analysisType,
          data: data,
          generatedAt: new Date(),
          expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000) // 24 hour TTL
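          // Assumption: automatic cleanup of expired cache entries also requires a TTL index
          // on expiresAt, created once during setup, e.g.:
          //   collection.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 })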
        },
        { upsert: true }
      );

    } catch (error) {
      console.warn('Failed to cache analytics results:', error.message);
    }
  }

  async getAnalyticsDashboard() {
    console.log('Generating comprehensive analytics dashboard...');

    const [
      salesSummary,
      customerInsights,
      productInsights,
      timeSeriesInsights,
      predictiveInsights
    ] = await Promise.all([
      this.getSalesSummary(),
      this.getCustomerInsights(),
      this.getProductInsights(),
      this.getTimeSeriesInsights(),
      this.getPredictiveInsights()
    ]);

    return {
      dashboard: {
        salesSummary,
        customerInsights,
        productInsights,
        timeSeriesInsights,
        predictiveInsights
      },
      metadata: {
        generatedAt: new Date(),
        dataFreshness: '< 1 hour',
        recordsCovered: {
          orders: salesSummary.totalOrders || 0,
          customers: customerInsights.totalCustomers || 0,
          products: productInsights.totalProducts || 0
        }
      }
    };
  }

  async getSalesSummary() {
    const pipeline = [
      {
        $match: {
          status: 'completed',
          orderDate: { $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) }
        }
      },
      {
        $group: {
          _id: null,
          totalOrders: { $sum: 1 },
          totalRevenue: { $sum: '$totals.total' },
          avgOrderValue: { $avg: '$totals.total' },
          uniqueCustomers: { $addToSet: '$customerId' }
        }
      },
      {
        $project: {
          totalOrders: 1,
          totalRevenue: { $round: ['$totalRevenue', 2] },
          avgOrderValue: { $round: ['$avgOrderValue', 2] },
          uniqueCustomers: { $size: '$uniqueCustomers' }
        }
      }
    ];

    const result = await this.collections.orders.aggregate(pipeline).toArray();
    return result[0] || {};
  }

  async getCustomerInsights() {
    const pipeline = [
      {
        $group: {
          _id: null,
          totalCustomers: { $sum: 1 },
          activeCustomers: { $sum: { $cond: [{ $eq: ['$account.status', 'active'] }, 1, 0] } },
          premiumCustomers: { $sum: { $cond: [{ $eq: ['$account.type', 'premium'] }, 1, 0] } }
        }
      }
    ];

    const result = await this.collections.customers.aggregate(pipeline).toArray();
    return result[0] || {};
  }

  async getProductInsights() {
    const pipeline = [
      {
        $group: {
          _id: null,
          totalProducts: { $sum: 1 },
          activeProducts: { $sum: { $cond: [{ $eq: ['$status', 'active'] }, 1, 0] } },
          avgPrice: { $avg: '$pricing.retail' }
        }
      },
      {
        $project: {
          totalProducts: 1,
          activeProducts: 1,
          avgPrice: { $round: ['$avgPrice', 2] }
        }
      }
    ];

    const result = await this.collections.products.aggregate(pipeline).toArray();
    return result[0] || {};
  }

  async getTimeSeriesInsights() {
    // Placeholder summary; a production implementation would read the cached
    // time-series results written by cacheAnalyticsResults() instead of static values
    return {
      trend: 'upward',
      growthRate: 12.5,
      volatility: 'moderate'
    };
  }

  async getPredictiveInsights() {
    // Placeholder summary; a production implementation would derive these from the
    // cached predictive analytics results rather than static values
    return {
      averageChurnRisk: 25.3,
      highValueCustomers: 150,
      upsellOpportunities: 320
    };
  }
}

// Benefits of MongoDB Advanced Aggregation Framework:
// - Real-time analytics processing without ETL pipelines or data warehouses
// - Complex multi-stage transformations with window functions and statistical operations
// - Advanced time-series analysis with forecasting and trend detection capabilities
// - Machine learning integration for predictive analytics and customer segmentation
// - Flexible aggregation patterns that adapt to changing analytical requirements
// - High-performance processing that scales with data volume and complexity
// - SQL-compatible analytical operations through QueryLeaf integration
// - Comprehensive business intelligence capabilities within the operational database
// - Advanced statistical functions and mathematical operations for data science
// - Real-time dashboard generation with automated insights and recommendations

module.exports = {
  MongoDBAnalyticsEngine
};
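
A minimal runner sketch for the engine above (assumptions: the class is saved as mongodb-analytics-engine.js and its constructor accepts a connected database handle; the real constructor in the full listing may differ):

// analytics-runner.js - hypothetical usage of MongoDBAnalyticsEngine
const { MongoClient } = require('mongodb');
const { MongoDBAnalyticsEngine } = require('./mongodb-analytics-engine');

async function runAnalytics() {
  const client = new MongoClient(process.env.MONGODB_URI || 'mongodb://localhost:27017');

  try {
    await client.connect();

    // Assumed constructor signature: a database handle wired to the collections used above
    const engine = new MongoDBAnalyticsEngine(client.db('ecommerce'));

    // Run the heavier analyses in parallel and cache their results for dashboards
    const [timeSeries, predictions] = await Promise.all([
      engine.executeTimeSeriesAnalytics(),
      engine.executePredictiveAnalytics()
    ]);

    await engine.cacheAnalyticsResults('time_series', timeSeries);
    await engine.cacheAnalyticsResults('predictive', predictions);

    // Lightweight summary for the UI layer
    const dashboard = await engine.getAnalyticsDashboard();
    console.log(JSON.stringify(dashboard.metadata, null, 2));
  } finally {
    await client.close();
  }
}

runAnalytics().catch(error => {
  console.error('Analytics run failed:', error);
  process.exit(1);
});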

SQL-Style Aggregation with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB aggregation operations:

-- QueryLeaf advanced analytics with SQL-familiar aggregation syntax

-- Complex sales analytics with window functions and advanced aggregations
WITH monthly_sales_analysis AS (
  SELECT 
    DATE_TRUNC('month', order_date) as month,
    product_category,
    customer_location.country,
    customer_type,

    -- Basic aggregations
    COUNT(*) as order_count,
    COUNT(DISTINCT customer_id) as unique_customers,
    SUM(total_amount) as total_revenue,
    AVG(total_amount) as avg_order_value,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_amount) as median_order_value,
    STDDEV_POP(total_amount) as order_value_stddev,

    -- Item-level aggregations
    SUM(item_quantity) as total_items_sold,
    AVG(item_quantity) as avg_items_per_order,
    SUM(item_quantity * item_unit_price) as item_revenue,
    AVG(item_unit_price) as avg_item_price,

    -- Advanced calculations
    SUM(item_quantity * (item_unit_price - product_cost)) as total_profit,
    AVG((item_unit_price - product_cost) / item_unit_price) as avg_profit_margin,

    -- Customer behavior metrics
    COUNT(*) FILTER (WHERE customer_registration_date >= DATE_TRUNC('month', order_date)) as new_customers,
    COUNT(DISTINCT customer_id) FILTER (WHERE previous_order_date < DATE_TRUNC('month', order_date) - INTERVAL '3 months') as returning_customers,

    -- Geographic diversity
    COUNT(DISTINCT customer_location.country) as unique_countries,
    COUNT(DISTINCT customer_location.region) as unique_regions

  FROM orders o
  CROSS JOIN UNNEST(o.items) as item
  JOIN products p ON item.product_id = p._id
  JOIN customers c ON o.customer_id = c._id
  WHERE o.status = 'completed'
    AND o.order_date >= CURRENT_DATE - INTERVAL '24 months'
  GROUP BY 
    DATE_TRUNC('month', order_date),
    product_category,
    customer_location.country,
    customer_type
),

-- Advanced window functions for trend analysis
sales_with_trends AS (
  SELECT 
    *,

    -- Moving averages
    AVG(total_revenue) OVER (
      PARTITION BY product_category, customer_location.country
      ORDER BY month
      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) as revenue_3month_ma,

    AVG(total_revenue) OVER (
      PARTITION BY product_category, customer_location.country
      ORDER BY month  
      ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
    ) as revenue_6month_ma,

    -- Growth calculations
    LAG(total_revenue, 1) OVER (
      PARTITION BY product_category, customer_location.country
      ORDER BY month
    ) as prev_month_revenue,

    LAG(total_revenue, 12) OVER (
      PARTITION BY product_category, customer_location.country
      ORDER BY month
    ) as prev_year_revenue,

    -- Ranking and percentiles
    RANK() OVER (
      PARTITION BY month
      ORDER BY total_revenue DESC
    ) as monthly_revenue_rank,

    PERCENT_RANK() OVER (
      PARTITION BY month
      ORDER BY total_revenue
    ) as monthly_revenue_percentile,

    -- Cumulative calculations
    SUM(total_revenue) OVER (
      PARTITION BY product_category, customer_location.country
      ORDER BY month
      ROWS UNBOUNDED PRECEDING
    ) as cumulative_revenue,

    -- Volatility measures
    STDDEV(total_revenue) OVER (
      PARTITION BY product_category, customer_location.country
      ORDER BY month
      ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
    ) as revenue_volatility,

    -- Lead/lag for forecasting
    LEAD(total_revenue, 1) OVER (
      PARTITION BY product_category, customer_location.country
      ORDER BY month
    ) as next_month_actual,

    -- Dense rank for market position
    DENSE_RANK() OVER (
      PARTITION BY month
      ORDER BY total_revenue DESC
    ) as market_position

  FROM monthly_sales_analysis
),

-- Calculate growth rates and performance indicators
performance_metrics AS (
  SELECT 
    *,

    -- Growth rate calculations
    CASE 
      WHEN prev_month_revenue > 0 THEN 
        ROUND(((total_revenue - prev_month_revenue) / prev_month_revenue * 100), 2)
      ELSE NULL
    END as month_over_month_growth,

    CASE 
      WHEN prev_year_revenue > 0 THEN
        ROUND(((total_revenue - prev_year_revenue) / prev_year_revenue * 100), 2)
      ELSE NULL
    END as year_over_year_growth,

    -- Trend classification
    CASE 
      WHEN revenue_3month_ma > revenue_6month_ma * 1.05 THEN 'strong_growth'
      WHEN revenue_3month_ma > revenue_6month_ma * 1.02 THEN 'moderate_growth'
      WHEN revenue_3month_ma < revenue_6month_ma * 0.95 THEN 'declining'
      WHEN revenue_3month_ma < revenue_6month_ma * 0.98 THEN 'weak_growth'
      ELSE 'stable'
    END as trend_classification,

    -- Performance assessment
    CASE 
      WHEN monthly_revenue_percentile >= 0.9 THEN 'top_performer'
      WHEN monthly_revenue_percentile >= 0.75 THEN 'strong_performer'
      WHEN monthly_revenue_percentile >= 0.5 THEN 'average_performer'
      WHEN monthly_revenue_percentile >= 0.25 THEN 'weak_performer'
      ELSE 'bottom_performer'
    END as performance_category,

    -- Volatility assessment
    CASE 
      WHEN revenue_volatility / NULLIF(revenue_6month_ma, 0) > 0.3 THEN 'high_volatility'
      WHEN revenue_volatility / NULLIF(revenue_6month_ma, 0) > 0.15 THEN 'moderate_volatility'
      ELSE 'low_volatility'
    END as volatility_level,

    -- Market share approximation
    ROUND(
      (total_revenue / SUM(total_revenue) OVER (PARTITION BY month) * 100), 
      3
    ) as market_share_percent,

    -- Customer metrics
    ROUND((total_revenue / unique_customers), 2) as revenue_per_customer,
    ROUND((total_profit / total_revenue * 100), 2) as profit_margin_percent,
    ROUND((new_customers / unique_customers * 100), 2) as new_customer_rate,

    -- Operational efficiency
    ROUND((total_items_sold / order_count), 2) as items_per_order,
    ROUND((total_revenue / total_items_sold), 2) as revenue_per_item

  FROM sales_with_trends
),

-- Advanced customer segmentation with RFM analysis
customer_rfm_analysis AS (
  SELECT 
    customer_id,
    customer_type,
    customer_location.country,
    customer_registration_date,

    -- Recency calculation (days since last order)
    EXTRACT(DAYS FROM (CURRENT_DATE - MAX(order_date))) as recency_days,

    -- Frequency (number of orders)
    COUNT(*) as frequency,

    -- Monetary (total spending)
    SUM(total_amount) as monetary_value,

    -- Additional behavioral metrics
    AVG(total_amount) as avg_order_value,
    MIN(order_date) as first_order_date,
    MAX(order_date) as last_order_date,
    COUNT(DISTINCT product_category) as unique_categories_purchased,
    EXTRACT(DAYS FROM (MAX(order_date) - MIN(order_date))) as customer_lifetime_days,

    -- Purchase patterns
    AVG(EXTRACT(DAYS FROM (order_date - LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date)))) as avg_days_between_orders,
    STDDEV(total_amount) as order_value_consistency,

    -- Seasonal analysis
    COUNT(*) FILTER (WHERE EXTRACT(QUARTER FROM order_date) = 1) as q1_orders,
    COUNT(*) FILTER (WHERE EXTRACT(QUARTER FROM order_date) = 2) as q2_orders,
    COUNT(*) FILTER (WHERE EXTRACT(QUARTER FROM order_date) = 3) as q3_orders,
    COUNT(*) FILTER (WHERE EXTRACT(QUARTER FROM order_date) = 4) as q4_orders

  FROM orders o
  JOIN customers c ON o.customer_id = c._id
  WHERE o.status = 'completed'
    AND o.order_date >= CURRENT_DATE - INTERVAL '24 months'
  GROUP BY customer_id, customer_type, customer_location.country, customer_registration_date
),

-- Calculate RFM scores and customer segments
customer_segments AS (
  SELECT 
    *,

    -- RFM score calculations using percentile ranking
    NTILE(5) OVER (ORDER BY recency_days ASC) as recency_score, -- Lower recency is better
    NTILE(5) OVER (ORDER BY frequency DESC) as frequency_score, -- Higher frequency is better  
    NTILE(5) OVER (ORDER BY monetary_value DESC) as monetary_score, -- Higher monetary is better

    -- Customer lifetime value prediction
    CASE 
      WHEN avg_days_between_orders > 0 THEN
        ROUND(
          (avg_order_value * (365.0 / avg_days_between_orders) * 3), -- 3 year projection
          2
        )
      ELSE monetary_value
    END as predicted_lifetime_value,

    -- Churn risk assessment
    CASE 
      WHEN recency_days > 180 THEN 'high_risk'
      WHEN recency_days > 90 THEN 'medium_risk'
      WHEN recency_days > 30 THEN 'low_risk'
      ELSE 'active'
    END as churn_risk,

    -- Engagement level
    CASE 
      WHEN frequency >= 10 AND recency_days <= 30 THEN 'highly_engaged'
      WHEN frequency >= 5 AND recency_days <= 60 THEN 'moderately_engaged'
      WHEN frequency >= 2 AND recency_days <= 120 THEN 'lightly_engaged'
      ELSE 'disengaged'
    END as engagement_level,

    -- Purchase diversity
    CASE 
      WHEN unique_categories_purchased >= 5 THEN 'diverse_buyer'
      WHEN unique_categories_purchased >= 3 THEN 'selective_buyer'
      ELSE 'focused_buyer'
    END as purchase_diversity

  FROM customer_rfm_analysis
),

-- Final customer classification
customer_classification AS (
  SELECT 
    *,

    -- RFM segment classification
    CASE 
      WHEN recency_score >= 4 AND frequency_score >= 4 AND monetary_score >= 4 THEN 'champions'
      WHEN recency_score >= 2 AND frequency_score >= 3 AND monetary_score >= 3 THEN 'loyal_customers'
      WHEN recency_score >= 3 AND frequency_score <= 3 AND monetary_score <= 3 THEN 'potential_loyalists'
      WHEN recency_score >= 4 AND frequency_score <= 1 THEN 'new_customers'
      WHEN recency_score >= 3 AND frequency_score <= 1 AND monetary_score <= 2 THEN 'promising'
      WHEN recency_score <= 2 AND frequency_score >= 2 AND monetary_score >= 2 THEN 'need_attention'
      WHEN recency_score <= 2 AND frequency_score <= 2 AND monetary_score >= 3 THEN 'about_to_sleep'
      WHEN recency_score <= 2 AND frequency_score <= 2 AND monetary_score <= 2 THEN 'at_risk'
      WHEN recency_score <= 1 AND frequency_score <= 2 AND monetary_score >= 4 THEN 'cannot_lose_them'
      ELSE 'hibernating'
    END as rfm_segment,

    -- Marketing action recommendations
    CASE 
      WHEN recency_score >= 4 AND frequency_score >= 4 THEN 'Reward with loyalty program and exclusive offers'
      WHEN monetary_score >= 4 AND recency_score <= 2 THEN 'Win-back campaign with premium incentives'
      WHEN recency_score >= 4 AND frequency_score <= 2 THEN 'Nurture with educational content and onboarding'
      WHEN frequency_score >= 3 AND recency_days > 60 THEN 'Re-engagement campaign with personalized offers'
      WHEN churn_risk = 'high_risk' AND monetary_score >= 3 THEN 'Urgent retention campaign'
      ELSE 'Monitor and maintain regular communication'
    END as marketing_recommendation

  FROM customer_segments
),

-- Product performance analysis with advanced metrics
product_performance AS (
  SELECT 
    p._id as product_id,
    p.name as product_name,
    p.category,
    p.brand,
    p.pricing.cost,
    p.pricing.retail,

    -- Sales metrics from orders
    COALESCE(sales.total_units_sold, 0) as total_units_sold,
    COALESCE(sales.total_revenue, 0) as total_revenue,
    COALESCE(sales.total_orders, 0) as total_orders,
    COALESCE(sales.unique_customers, 0) as unique_customers,
    COALESCE(sales.avg_selling_price, p.pricing.retail) as avg_selling_price,

    -- Profitability analysis
    COALESCE(sales.total_profit, 0) as total_profit,
    CASE 
      WHEN COALESCE(sales.total_revenue, 0) > 0 THEN
        ROUND((COALESCE(sales.total_profit, 0) / sales.total_revenue * 100), 2)
      ELSE 0
    END as profit_margin_percent,

    -- Performance indicators
    CASE 
      WHEN COALESCE(sales.total_revenue, 0) = 0 THEN 'no_sales'
      WHEN sales.first_sale_date >= CURRENT_DATE - INTERVAL '90 days' THEN 'new_product'
      WHEN sales.total_revenue >= 50000 AND sales.total_profit / sales.total_revenue >= 0.2 THEN 'star'
      WHEN sales.total_revenue >= 10000 AND sales.total_profit / sales.total_revenue >= 0.15 THEN 'promising'
      WHEN sales.last_sale_date < CURRENT_DATE - INTERVAL '60 days' THEN 'declining'
      ELSE 'stable'
    END as performance_category,

    -- Inventory analysis
    p.inventory.quantity as current_stock,
    CASE 
      WHEN COALESCE(sales.total_units_sold, 0) > 0 AND p.inventory.quantity > 0 THEN
        ROUND((sales.total_units_sold / p.inventory.quantity), 2)
      ELSE 0
    END as inventory_turnover,

    -- Market position
    sales.category_rank,
    sales.category_market_share,

    -- Time-based metrics
    sales.first_sale_date,
    sales.last_sale_date,
    sales.sales_trend

  FROM products p
  LEFT JOIN (
    SELECT 
      item.product_id,
      COUNT(DISTINCT o.order_id) as total_orders,
      SUM(item.quantity) as total_units_sold,
      SUM(item.quantity * item.unit_price) as total_revenue,
      COUNT(DISTINCT o.customer_id) as unique_customers,
      AVG(item.unit_price) as avg_selling_price,
      MIN(o.order_date) as first_sale_date,
      MAX(o.order_date) as last_sale_date,
      SUM(item.quantity * (item.unit_price - p.pricing.cost)) as total_profit,

      -- Category ranking
      RANK() OVER (PARTITION BY p.category ORDER BY SUM(item.quantity * item.unit_price) DESC) as category_rank,

      -- Market share within category
      ROUND(
        (SUM(item.quantity * item.unit_price) / 
         SUM(SUM(item.quantity * item.unit_price)) OVER (PARTITION BY p.category) * 100),
        2
      ) as category_market_share,

      -- Sales trend analysis
      CASE 
        WHEN COUNT(*) FILTER (WHERE o.order_date >= CURRENT_DATE - INTERVAL '90 days') >
             COUNT(*) FILTER (WHERE o.order_date BETWEEN CURRENT_DATE - INTERVAL '180 days' AND CURRENT_DATE - INTERVAL '90 days') 
        THEN 'growing'
        WHEN COUNT(*) FILTER (WHERE o.order_date >= CURRENT_DATE - INTERVAL '90 days') <
             COUNT(*) FILTER (WHERE o.order_date BETWEEN CURRENT_DATE - INTERVAL '180 days' AND CURRENT_DATE - INTERVAL '90 days') * 0.8
        THEN 'declining'  
        ELSE 'stable'
      END as sales_trend

    FROM orders o
    CROSS JOIN UNNEST(o.items) as item
    JOIN products p ON item.product_id = p._id
    WHERE o.status = 'completed'
      AND o.order_date >= CURRENT_DATE - INTERVAL '12 months'
    GROUP BY item.product_id, p.category, p.pricing.cost
  ) sales ON p._id = sales.product_id
)

-- Final consolidated analytics report
SELECT 
  'EXECUTIVE_SUMMARY' as report_section,

  -- Overall business performance
  (SELECT COUNT(*) FROM performance_metrics WHERE month >= CURRENT_DATE - INTERVAL '1 month') as current_month_segments,
  (SELECT ROUND(AVG(total_revenue), 2) FROM performance_metrics WHERE month >= CURRENT_DATE - INTERVAL '1 month') as avg_monthly_revenue,
  (SELECT ROUND(AVG(month_over_month_growth), 2) FROM performance_metrics WHERE month_over_month_growth IS NOT NULL) as avg_growth_rate,

  -- Customer insights
  (SELECT COUNT(*) FROM customer_classification WHERE rfm_segment = 'champions') as champion_customers,
  (SELECT COUNT(*) FROM customer_classification WHERE churn_risk = 'high_risk') as high_risk_customers,
  (SELECT ROUND(AVG(predicted_lifetime_value), 2) FROM customer_classification) as avg_customer_lifetime_value,

  -- Product insights
  (SELECT COUNT(*) FROM product_performance WHERE performance_category = 'star') as star_products,
  (SELECT COUNT(*) FROM product_performance WHERE performance_category = 'declining') as declining_products,
  (SELECT ROUND(AVG(profit_margin_percent), 2) FROM product_performance WHERE total_revenue > 0) as avg_profit_margin,

  -- Strategic recommendations
  CASE 
    WHEN (SELECT AVG(month_over_month_growth) FROM performance_metrics WHERE month_over_month_growth IS NOT NULL) < -10 
    THEN 'URGENT: Implement revenue recovery strategy'
    WHEN (SELECT COUNT(*) FROM customer_classification WHERE churn_risk = 'high_risk') > 
         (SELECT COUNT(*) FROM customer_classification WHERE rfm_segment = 'champions')
    THEN 'FOCUS: Customer retention and re-engagement programs'
    WHEN (SELECT COUNT(*) FROM product_performance WHERE performance_category = 'star') < 5
    THEN 'OPPORTUNITY: Invest in product development and innovation'
    ELSE 'MAINTAIN: Continue current strategies with incremental improvements'
  END as primary_strategic_recommendation

UNION ALL

-- Performance trends
SELECT 
  'PERFORMANCE_TRENDS',
  month::text,
  product_category,
  customer_location.country,
  total_revenue,
  month_over_month_growth,
  trend_classification,
  performance_category,
  market_share_percent
FROM performance_metrics
WHERE month >= CURRENT_DATE - INTERVAL '6 months'
ORDER BY month DESC, total_revenue DESC
LIMIT 20

UNION ALL

-- Top customer segments  
SELECT 
  'CUSTOMER_SEGMENTS',
  rfm_segment,
  COUNT(*)::text as customer_count,
  ROUND(AVG(monetary_value), 2)::text as avg_lifetime_value,
  churn_risk,
  engagement_level,
  marketing_recommendation
FROM customer_classification  
GROUP BY rfm_segment, churn_risk, engagement_level, marketing_recommendation
ORDER BY AVG(monetary_value) DESC
LIMIT 15

UNION ALL

-- Product performance summary
SELECT 
  'PRODUCT_PERFORMANCE',
  product_name,
  category,
  total_revenue::text,
  profit_margin_percent::text,
  performance_category,
  inventory_turnover::text,
  sales_trend,
  CASE category_rank WHEN 1 THEN 'Category Leader' ELSE category_rank::text END
FROM product_performance
WHERE total_revenue > 0
ORDER BY total_revenue DESC
LIMIT 25;

-- QueryLeaf provides comprehensive aggregation capabilities:
-- 1. SQL-familiar window functions with OVER clauses and frame specifications  
-- 2. Advanced statistical functions including percentiles, standard deviation, and ranking
-- 3. Complex GROUP BY operations with ROLLUP, CUBE, and GROUPING SETS support
-- 4. Sophisticated JOIN operations including LATERAL joins for nested processing
-- 5. CTEs (Common Table Expressions) for complex multi-stage analytical queries  
-- 6. CASE expressions and conditional logic for business rule implementation
-- 7. Date/time functions for temporal analysis and time-series processing
-- 8. String and array functions for text processing and data transformation
-- 9. JSON processing functions for nested document analysis and extraction
-- 10. Integration with MongoDB's native aggregation optimizations and indexing

Best Practices for MongoDB Aggregation Implementation

Pipeline Design Principles

Essential guidelines for effective aggregation pipeline construction (a short sketch follows the list):

  1. Early Filtering: Place $match stages early to reduce data volume through the pipeline
  2. Index Utilization: Design pipelines to leverage existing indexes for optimal performance
  3. Stage Ordering: Order stages to minimize computational overhead and data transfer
  4. Memory Management: Monitor memory usage and use allowDiskUse for large datasets
  5. Field Projection: Use $project to limit fields and reduce document size early
  6. Pipeline Caching: Cache frequently-used aggregation results for improved performance
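
Applied to the order analytics used throughout this post, those principles look roughly like the following sketch (field and collection names mirror the earlier examples; db is assumed to be a connected database handle):

// Sketch: stage ordering and memory settings for a daily revenue rollup
async function dailyRevenue(db) {
  const pipeline = [
    // 1. Filter first so later stages only see relevant documents
    { $match: { status: 'completed', orderDate: { $gte: new Date('2024-01-01') } } },

    // 2. Project early to shrink documents before the expensive $group
    { $project: { orderDate: 1, 'totals.total': 1 } },

    // 3. Group once the working set has been reduced
    {
      $group: {
        _id: { $dateTrunc: { date: '$orderDate', unit: 'day' } },
        revenue: { $sum: '$totals.total' },
        orders: { $sum: 1 }
      }
    },

    { $sort: { _id: 1 } }
  ];

  // allowDiskUse lets large groupings spill to disk instead of hitting memory limits
  return db.collection('orders').aggregate(pipeline, { allowDiskUse: true }).toArray();
}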

Performance Optimization Strategies

Optimize MongoDB aggregation pipelines for production workloads (see the indexing sketch after this list):

  1. Compound Indexes: Create indexes that support multiple pipeline stages
  2. Covered Queries: Design pipelines that can be satisfied entirely from indexes
  3. Parallel Processing: Use multiple concurrent pipelines for independent analyses
  4. Result Caching: Implement intelligent caching for expensive aggregations
  5. Incremental Updates: Process only new/changed data for time-series analytics
  6. Resource Monitoring: Track aggregation performance and optimize accordingly
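
For example, the $match filters used by the analytics pipelines in this post can be backed by a compound index, and index usage can be verified from explain output (a sketch; db is assumed to be a connected database handle):

// Sketch: compound index supporting the status + orderDate filters, verified via explain
async function optimizeOrderAnalytics(db) {
  const orders = db.collection('orders');

  // Index matching the equality (status) then range (orderDate) filter pattern
  await orders.createIndex({ status: 1, orderDate: -1 }, { name: 'status_1_orderDate_-1' });

  // Confirm the pipeline's $match is satisfied by an index scan (IXSCAN) rather than a COLLSCAN
  const explanation = await orders
    .aggregate(
      [
        { $match: { status: 'completed', orderDate: { $gte: new Date('2024-01-01') } } },
        { $group: { _id: null, revenue: { $sum: '$totals.total' } } }
      ],
      { allowDiskUse: true }
    )
    .explain('executionStats');

  console.log(JSON.stringify(explanation, null, 2));
}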

Conclusion

MongoDB's Aggregation Framework provides comprehensive real-time analytics capabilities that eliminate the need for separate ETL processes, data warehouses, and batch processing systems. The powerful pipeline architecture enables sophisticated data transformations, statistical analysis, and predictive modeling directly within the operational database, delivering immediate insights while maintaining high performance at scale.

Key MongoDB Aggregation Framework benefits include:

  • Real-Time Processing: Immediate analytical results without data movement or batch delays
  • Advanced Analytics: Comprehensive statistical functions, window operations, and machine learning integration
  • Flexible Pipelines: Multi-stage transformations that adapt to evolving analytical requirements
  • Scalable Performance: High-performance processing that scales with data volume and complexity
  • SQL Compatibility: Familiar analytical operations through QueryLeaf's SQL interface
  • Integrated Architecture: Seamless integration with operational workloads and existing applications

Whether you're building real-time dashboards, customer analytics platforms, financial reporting systems, or any application requiring sophisticated data analysis, MongoDB's Aggregation Framework with QueryLeaf's familiar SQL interface provides the foundation for powerful, maintainable analytical solutions.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB aggregation operations while providing SQL-familiar analytics syntax, window functions, and statistical operations. Complex analytical queries, predictive models, and real-time insights are seamlessly handled through familiar SQL constructs, making advanced data processing both powerful and accessible to SQL-oriented development teams.

The combination of MongoDB's native aggregation capabilities with SQL-style analytics operations makes MongoDB an ideal platform for applications requiring both operational efficiency and analytical sophistication, ensuring your applications can deliver immediate insights while maintaining optimal performance as data volumes and complexity grow.

MongoDB Connection Pooling Optimization Strategies: Advanced Connection Management and Performance Tuning for High-Throughput Applications

High-throughput database applications require sophisticated connection management strategies and comprehensive pooling optimization techniques that can handle concurrent request patterns, varying workload demands, and complex scaling requirements while maintaining optimal performance and resource utilization. Traditional database connection approaches often struggle with dynamic scaling, connection overhead management, and the complexity of balancing connection availability with resource consumption, leading to performance bottlenecks, resource exhaustion, and operational challenges in production environments.

MongoDB provides comprehensive connection pooling capabilities through intelligent connection management, sophisticated monitoring features, and optimized driver implementations that enable applications to achieve maximum throughput with minimal connection overhead. Unlike traditional databases that require manual connection tuning procedures and complex pooling configuration, MongoDB drivers integrate advanced pooling algorithms directly with automatic connection scaling, real-time performance monitoring, and intelligent connection lifecycle management.
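
In the Node.js driver, most of this behavior is configured through client options and observed through Connection Monitoring and Pooling (CMAP) events; a minimal sketch with illustrative values:

// Sketch: pool sizing and CMAP event monitoring with the Node.js MongoDB driver
const { MongoClient } = require('mongodb');

const client = new MongoClient(process.env.MONGODB_URI || 'mongodb://localhost:27017', {
  maxPoolSize: 100,         // upper bound on concurrent connections per server
  minPoolSize: 10,          // keep warm connections available for traffic bursts
  maxIdleTimeMS: 60000,     // close connections idle for more than 60 seconds
  waitQueueTimeoutMS: 5000  // fail fast when a checkout waits longer than 5 seconds
});

// CMAP events expose checkout latency and failure patterns for monitoring
client.on('connectionCheckedOut', event => {
  console.debug(`Connection checked out from ${event.address}`);
});

client.on('connectionCheckOutFailed', event => {
  console.warn(`Connection checkout failed: ${event.reason}`);
});

async function main() {
  await client.connect();
  // ... application workload ...
  await client.close();
}

main().catch(console.error);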

The Traditional Connection Management Challenge

Conventional approaches to database connection management in enterprise applications face significant limitations in scalability and resource optimization:

-- Traditional PostgreSQL connection management - manual pooling with limited optimization capabilities

-- Basic connection tracking table with minimal functionality
CREATE TABLE connection_pool_stats (
    pool_id SERIAL PRIMARY KEY,
    application_name VARCHAR(100) NOT NULL,
    database_name VARCHAR(100) NOT NULL,
    host_address VARCHAR(255) NOT NULL,
    port_number INTEGER DEFAULT 5432,

    -- Basic pool configuration (static settings)
    min_connections INTEGER DEFAULT 5,
    max_connections INTEGER DEFAULT 20,
    connection_timeout INTEGER DEFAULT 30,
    idle_timeout INTEGER DEFAULT 600,

    -- Simple usage statistics (very limited visibility)
    active_connections INTEGER DEFAULT 0,
    idle_connections INTEGER DEFAULT 0,
    total_connections_created BIGINT DEFAULT 0,
    total_connections_destroyed BIGINT DEFAULT 0,

    -- Basic performance metrics
    avg_connection_wait_time DECIMAL(8,3),
    max_connection_wait_time DECIMAL(8,3),
    connection_failures BIGINT DEFAULT 0,

    -- Manual tracking timestamps
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_analyzed TIMESTAMP
);

-- Query execution tracking table (basic functionality)
CREATE TABLE query_execution_log (
    execution_id SERIAL PRIMARY KEY,
    pool_id INTEGER REFERENCES connection_pool_stats(pool_id),
    session_id VARCHAR(100),

    -- Query identification
    query_hash VARCHAR(64),
    query_type VARCHAR(50), -- SELECT, INSERT, UPDATE, DELETE
    query_text TEXT, -- Usually truncated for storage

    -- Basic timing information
    start_time TIMESTAMP NOT NULL,
    end_time TIMESTAMP,
    execution_duration DECIMAL(10,3),

    -- Connection usage
    connection_acquired_at TIMESTAMP,
    connection_released_at TIMESTAMP,
    connection_wait_time DECIMAL(8,3),

    -- Simple result metrics
    rows_affected INTEGER,
    rows_returned INTEGER,

    -- Status tracking
    execution_status VARCHAR(20), -- success, error, timeout
    error_message TEXT,

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Manual connection pool configuration (static and inflexible)
CREATE TABLE pool_configuration (
    config_id SERIAL PRIMARY KEY,
    pool_name VARCHAR(100) UNIQUE NOT NULL,

    -- Static configuration parameters
    initial_pool_size INTEGER DEFAULT 5,
    maximum_pool_size INTEGER DEFAULT 50,
    minimum_idle_connections INTEGER DEFAULT 2,

    -- Timeout settings (fixed values)
    connection_timeout_seconds INTEGER DEFAULT 30,
    idle_connection_timeout_seconds INTEGER DEFAULT 1800,
    validation_timeout_seconds INTEGER DEFAULT 5,

    -- Simple retry configuration
    max_retry_attempts INTEGER DEFAULT 3,
    retry_delay_seconds INTEGER DEFAULT 1,

    -- Basic health check
    validation_query VARCHAR(500) DEFAULT 'SELECT 1',
    validate_on_borrow BOOLEAN DEFAULT true,
    validate_on_return BOOLEAN DEFAULT false,

    -- Manual maintenance
    test_while_idle BOOLEAN DEFAULT true,
    time_between_eviction_runs INTEGER DEFAULT 300,
    num_tests_per_eviction_run INTEGER DEFAULT 3,

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Complex query to analyze connection performance (expensive and limited)
WITH connection_performance AS (
    SELECT 
        cps.application_name,
        cps.database_name,
        cps.host_address,

        -- Basic pool utilization
        CASE 
            WHEN cps.max_connections > 0 THEN 
                (cps.active_connections::DECIMAL / cps.max_connections) * 100
            ELSE 0 
        END as pool_utilization_percent,

        -- Simple connection metrics
        cps.active_connections,
        cps.idle_connections,
        cps.total_connections_created,
        cps.total_connections_destroyed,

        -- Basic performance statistics
        cps.avg_connection_wait_time,
        cps.max_connection_wait_time,
        cps.connection_failures,

        -- Query performance (limited aggregation)
        COUNT(qel.execution_id) as total_queries_24h,
        AVG(qel.execution_duration) as avg_query_duration,
        AVG(qel.connection_wait_time) as avg_connection_wait,

        -- Simple error tracking
        COUNT(CASE WHEN qel.execution_status = 'error' THEN 1 END) as error_count,
        COUNT(CASE WHEN qel.execution_status = 'timeout' THEN 1 END) as timeout_count,

        -- Basic connection efficiency (limited insights)
        CASE 
            WHEN COUNT(qel.execution_id) > 0 THEN
                COUNT(CASE WHEN qel.connection_wait_time < 0.100 THEN 1 END)::DECIMAL / 
                COUNT(qel.execution_id) * 100
            ELSE 0
        END as fast_connection_percentage

    FROM connection_pool_stats cps
    LEFT JOIN query_execution_log qel ON cps.pool_id = qel.pool_id
        AND qel.start_time >= CURRENT_TIMESTAMP - INTERVAL '24 hours'

    WHERE cps.last_updated >= CURRENT_TIMESTAMP - INTERVAL '1 hour'

    GROUP BY cps.pool_id, cps.application_name, cps.database_name, 
             cps.host_address, cps.active_connections, cps.idle_connections,
             cps.max_connections, cps.total_connections_created, 
             cps.total_connections_destroyed, cps.avg_connection_wait_time,
             cps.max_connection_wait_time, cps.connection_failures
),

pool_health_analysis AS (
    SELECT *,
        -- Simple health scoring (limited factors)
        CASE 
            WHEN pool_utilization_percent > 90 THEN 'Critical'
            WHEN pool_utilization_percent > 75 THEN 'Warning'
            WHEN avg_connection_wait > 1.0 THEN 'Warning'
            WHEN error_count > total_queries_24h * 0.05 THEN 'Warning'
            ELSE 'Healthy'
        END as pool_health_status,

        -- Basic recommendation logic (very limited)
        CASE 
            WHEN pool_utilization_percent > 85 THEN 
                'Consider increasing max_connections'
            WHEN avg_connection_wait > 0.5 THEN 
                'Review connection timeout settings'
            WHEN error_count > 10 THEN 
                'Investigate connection failures'
            ELSE 'Pool configuration appears adequate'
        END as basic_recommendation

    FROM connection_performance
)

SELECT 
    application_name,
    database_name,
    host_address,

    -- Pool status overview
    pool_utilization_percent,
    pool_health_status,
    active_connections,
    idle_connections,

    -- Performance metrics
    total_queries_24h,
    avg_query_duration,
    avg_connection_wait,
    fast_connection_percentage,

    -- Error tracking
    error_count,
    timeout_count,

    -- Basic recommendations
    basic_recommendation,

    CURRENT_TIMESTAMP as analysis_timestamp

FROM pool_health_analysis
ORDER BY 
    CASE pool_health_status
        WHEN 'Critical' THEN 1
        WHEN 'Warning' THEN 2
        ELSE 3
    END,
    pool_utilization_percent DESC;

-- Problems with traditional connection pooling approach:
-- 1. Static configuration cannot adapt to changing workloads
-- 2. Limited visibility into connection lifecycle and performance
-- 3. Manual tuning required for optimal performance
-- 4. No automatic scaling based on demand patterns
-- 5. Basic health checking with limited diagnostic capabilities
-- 6. Inefficient connection distribution across database instances
-- 7. No built-in monitoring for connection pool performance
-- 8. Difficult to troubleshoot connection-related performance issues
-- 9. Limited integration with application performance monitoring
-- 10. Manual intervention required for pool optimization

MongoDB's intelligent connection pooling eliminates these limitations:

// MongoDB optimized connection pooling - intelligent and performance-focused
// Advanced connection management with automatic optimization

const { MongoClient } = require('mongodb');

// Comprehensive connection pool configuration
class MongoConnectionPoolManager {
  constructor(connectionUri, options = {}) {
    this.connectionUri = connectionUri;
    this.poolOptions = {
      // Intelligent pool sizing
      minPoolSize: options.minPoolSize || 5,
      maxPoolSize: options.maxPoolSize || 100,
      maxIdleTimeMS: options.maxIdleTimeMS || 30000,

      // Advanced connection management
      waitQueueTimeoutMS: options.waitQueueTimeoutMS || 2500,
      serverSelectionTimeoutMS: options.serverSelectionTimeoutMS || 5000,
      socketTimeoutMS: options.socketTimeoutMS || 45000,
      connectTimeoutMS: options.connectTimeoutMS || 10000,

      // Intelligent retry logic
      retryWrites: true,
      retryReads: true,
      maxStalenessSeconds: options.maxStalenessSeconds || 90,

      // Advanced monitoring capabilities
      monitorCommands: true,

      // Intelligent load balancing
      loadBalanced: options.loadBalanced || false,

      // Connection compression
      compressors: options.compressors || ['snappy', 'zlib'],

      // SSL/TLS optimization (?? so an explicit false is not silently overridden)
      ssl: options.ssl ?? true,
      sslValidate: options.sslValidate ?? true,

      // Advanced read preferences
      readPreference: options.readPreference || 'secondaryPreferred',
      readConcern: { level: options.readConcernLevel || 'majority' },

      // Write concern optimization
      writeConcern: {
        w: options.writeConcernW || 'majority',
        j: options.writeConcernJ ?? true,
        wtimeout: options.writeConcernTimeout || 5000
      }
    };

    this.client = null;
    this.connectionMetrics = new Map();

    // Correlation state for driver events that arrive as separate objects
    this.checkoutStartTimes = new Map(); // per-address FIFO of checkout start times
    this.commandStartTimes = new Map();  // keyed by command requestId

    // Handles for background timers so shutdown() can stop them
    this.monitoringInterval = null;
    this.optimizationInterval = null;
    this.performanceStats = {
      totalConnections: 0,
      activeConnections: 0,
      connectionWaitTimes: [],
      queryExecutionTimes: [],
      connectionErrors: 0,
      poolHealthScore: 100
    };
  }

  async initializeConnectionPool() {
    try {
      console.log('Initializing MongoDB connection pool with intelligent optimization...');

      // Create client with advanced pooling options
      this.client = new MongoClient(this.connectionUri, this.poolOptions);

      // Set up comprehensive event listeners for monitoring
      this.setupConnectionMonitoring();

      // Connect with retry logic and health checking
      await this.connectWithHealthCheck();

      // Initialize performance monitoring
      this.startPerformanceMonitoring();

      // Setup automatic pool optimization
      this.setupAutomaticOptimization();

      console.log('MongoDB connection pool initialized successfully');
      return this.client;

    } catch (error) {
      console.error('Failed to initialize MongoDB connection pool:', error);
      throw error;
    }
  }

  setupConnectionMonitoring() {
    // Connection pool monitoring events
    this.client.on('connectionPoolCreated', (event) => {
      console.log(`Connection pool created: ${event.address}`);
      this.logPoolEvent('pool_created', event);
    });

    this.client.on('connectionPoolReady', (event) => {
      console.log(`Connection pool ready: ${event.address}`);
      this.logPoolEvent('pool_ready', event);
    });

    this.client.on('connectionCreated', (event) => {
      this.performanceStats.totalConnections++;
      this.logPoolEvent('connection_created', event);
    });

    this.client.on('connectionReady', (event) => {
      this.performanceStats.activeConnections++;
      this.logPoolEvent('connection_ready', event);
    });

    this.client.on('connectionClosed', (event) => {
      this.performanceStats.activeConnections = Math.max(0, this.performanceStats.activeConnections - 1);
      this.logPoolEvent('connection_closed', event);
    });

    this.client.on('connectionCheckOutStarted', (event) => {
      // Checkout start and completion arrive as separate event objects, so start
      // times are tracked per server address in a FIFO queue for correlation
      const pending = this.checkoutStartTimes.get(event.address) || [];
      pending.push(Date.now());
      this.checkoutStartTimes.set(event.address, pending);
      this.logPoolEvent('checkout_started', event);
    });

    this.client.on('connectionCheckedOut', (event) => {
      const pending = this.checkoutStartTimes.get(event.address) || [];
      const startTime = pending.shift();
      const waitTime = startTime ? Date.now() - startTime : 0;
      this.performanceStats.connectionWaitTimes.push({ value: waitTime, timestamp: Date.now() });
      this.logPoolEvent('connection_checked_out', { ...event, waitTime });
    });

    this.client.on('connectionCheckedIn', (event) => {
      this.logPoolEvent('connection_checked_in', event);
    });

    // Command monitoring for performance analysis
    // (started/succeeded/failed events are distinct objects, correlated here by requestId)
    this.client.on('commandStarted', (event) => {
      this.commandStartTimes.set(event.requestId, Date.now());
      this.logCommandEvent('command_started', event);
    });

    this.client.on('commandSucceeded', (event) => {
      const startTime = this.commandStartTimes.get(event.requestId);
      this.commandStartTimes.delete(event.requestId);
      const executionTime = startTime ? Date.now() - startTime : 0;
      this.logCommandEvent('command_succeeded', { ...event, executionTime });
    });

    this.client.on('commandFailed', (event) => {
      this.performanceStats.connectionErrors++;
      const startTime = this.commandStartTimes.get(event.requestId);
      this.commandStartTimes.delete(event.requestId);
      const executionTime = startTime ? Date.now() - startTime : 0;
      this.logCommandEvent('command_failed', { ...event, executionTime });
    });

    // Server monitoring for intelligent scaling
    this.client.on('serverHeartbeatStarted', (event) => {
      this.logServerEvent('heartbeat_started', event);
    });

    this.client.on('serverHeartbeatSucceeded', (event) => {
      this.logServerEvent('heartbeat_succeeded', event);
    });

    this.client.on('serverHeartbeatFailed', (event) => {
      this.logServerEvent('heartbeat_failed', event);
    });
  }

  async connectWithHealthCheck() {
    const maxRetries = 3;
    let retryCount = 0;

    while (retryCount < maxRetries) {
      try {
        await this.client.connect();

        // Perform health check
        const healthCheck = await this.performHealthCheck();
        if (healthCheck.healthy) {
          console.log('Connection pool health check passed');
          return;
        } else {
          throw new Error(`Health check failed: ${healthCheck.issues.join(', ')}`);
        }

      } catch (error) {
        retryCount++;
        console.error(`Connection attempt ${retryCount} failed:`, error.message);

        if (retryCount >= maxRetries) {
          throw error;
        }

        // Exponential backoff
        await new Promise(resolve => setTimeout(resolve, Math.pow(2, retryCount) * 1000));
      }
    }
  }

  async performHealthCheck() {
    try {
      // Test basic connectivity
      const admin = this.client.db('admin');
      const pingResult = await admin.command({ ping: 1 });

      // Test read operations against the database from the connection string
      const testDb = this.client.db();
      await testDb.collection('healthcheck').findOne({}, { maxTimeMS: 5000 });

      // Check connection pool stats
      const poolStats = await this.getConnectionPoolStats();

      const issues = [];

      // Analyze pool health
      if (poolStats.availableConnections < 2) {
        issues.push('Low available connections');
      }

      if (poolStats.averageWaitTime > 1000) {
        issues.push('High average connection wait time');
      }

      if (poolStats.errorRate > 0.05) {
        issues.push('High error rate detected');
      }

      return {
        healthy: issues.length === 0,
        issues: issues,
        timestamp: new Date(),
        poolStats: poolStats
      };

    } catch (error) {
      return {
        healthy: false,
        issues: [`Health check error: ${error.message}`],
        timestamp: new Date()
      };
    }
  }

  startPerformanceMonitoring() {
    // Real-time performance monitoring
    this.monitoringInterval = setInterval(async () => {
      try {
        const stats = await this.getDetailedPerformanceStats();
        this.analyzePerformanceTrends(stats);
        this.updatePoolHealthScore(stats);

        // Log performance summary
        console.log(`Pool Performance - Health: ${this.performanceStats.poolHealthScore}%, ` +
                   `Active: ${stats.activeConnections}, ` +
                   `Avg Wait: ${stats.averageWaitTime}ms, ` +
                   `Avg Query: ${stats.averageQueryTime}ms`);

      } catch (error) {
        console.error('Performance monitoring error:', error);
      }
    }, 30000); // Every 30 seconds
  }

  async getDetailedPerformanceStats() {
    const now = Date.now();
    const fiveMinutesAgo = now - (5 * 60 * 1000);

    // Filter recent metrics
    const recentWaitTimes = this.performanceStats.connectionWaitTimes
      .filter(time => time.timestamp > fiveMinutesAgo);
    const recentQueryTimes = this.performanceStats.queryExecutionTimes
      .filter(time => time.timestamp > fiveMinutesAgo);

    const stats = {
      timestamp: now,
      totalConnections: this.performanceStats.totalConnections,
      activeConnections: this.performanceStats.activeConnections,

      // Connection timing analysis
      averageWaitTime: this.calculateAverage(recentWaitTimes.map(t => t.value)),
      maxWaitTime: recentWaitTimes.length > 0 ? Math.max(...recentWaitTimes.map(t => t.value)) : 0,
      p95WaitTime: this.calculatePercentile(recentWaitTimes.map(t => t.value), 0.95),

      // Query performance analysis
      averageQueryTime: this.calculateAverage(recentQueryTimes.map(t => t.value)),
      maxQueryTime: recentQueryTimes.length > 0 ? Math.max(...recentQueryTimes.map(t => t.value)) : 0,
      p95QueryTime: this.calculatePercentile(recentQueryTimes.map(t => t.value), 0.95),

      // Error analysis
      errorRate: this.calculateErrorRate(fiveMinutesAgo),
      connectionErrors: this.performanceStats.connectionErrors,

      // Pool utilization
      poolUtilization: (this.performanceStats.activeConnections / this.poolOptions.maxPoolSize) * 100,

      // Connection efficiency
      connectionEfficiency: this.calculateConnectionEfficiency(recentWaitTimes),

      // Server health indicators
      serverHealth: await this.getServerHealthIndicators()
    };

    return stats;
  }

  setupAutomaticOptimization() {
    // Intelligent pool optimization based on performance metrics
    this.optimizationInterval = setInterval(async () => {
      try {
        const stats = await this.getDetailedPerformanceStats();
        const optimizations = this.generateOptimizationRecommendations(stats);

        if (optimizations.length > 0) {
          console.log('Applying automatic optimizations:', optimizations);
          await this.applyOptimizations(optimizations);
        }

      } catch (error) {
        console.error('Automatic optimization error:', error);
      }
    }, 300000); // Every 5 minutes
  }

  generateOptimizationRecommendations(stats) {
    const recommendations = [];

    // High utilization optimization
    if (stats.poolUtilization > 85) {
      recommendations.push({
        type: 'increase_pool_size',
        current: this.poolOptions.maxPoolSize,
        recommended: Math.min(this.poolOptions.maxPoolSize * 1.2, 200),
        reason: 'High pool utilization detected'
      });
    }

    // High wait time optimization
    if (stats.averageWaitTime > 500) {
      recommendations.push({
        type: 'reduce_idle_timeout',
        current: this.poolOptions.maxIdleTimeMS,
        recommended: Math.max(this.poolOptions.maxIdleTimeMS * 0.8, 10000),
        reason: 'High connection wait times detected'
      });
    }

    // Low utilization optimization
    if (stats.poolUtilization < 20 && this.poolOptions.maxPoolSize > 20) {
      recommendations.push({
        type: 'decrease_pool_size',
        current: this.poolOptions.maxPoolSize,
        recommended: Math.max(this.poolOptions.maxPoolSize * 0.8, 10),
        reason: 'Low pool utilization detected'
      });
    }

    // Error rate optimization
    if (stats.errorRate > 0.05) {
      recommendations.push({
        type: 'increase_timeout',
        current: this.poolOptions.serverSelectionTimeoutMS,
        recommended: this.poolOptions.serverSelectionTimeoutMS * 1.5,
        reason: 'High error rate suggests timeout issues'
      });
    }

    return recommendations;
  }

  async applyOptimizations(optimizations) {
    for (const optimization of optimizations) {
      try {
        switch (optimization.type) {
          case 'increase_pool_size':
            // Note: Pool size changes require connection recreation
            console.log(`Recommending pool size increase from ${optimization.current} to ${optimization.recommended}`);
            break;

          case 'decrease_pool_size':
            console.log(`Recommending pool size decrease from ${optimization.current} to ${optimization.recommended}`);
            break;

          case 'reduce_idle_timeout':
            console.log(`Recommending idle timeout reduction from ${optimization.current} to ${optimization.recommended}`);
            break;

          case 'increase_timeout':
            console.log(`Recommending timeout increase from ${optimization.current} to ${optimization.recommended}`);
            break;
        }

        // Log optimization for operational tracking
        this.logOptimization(optimization);

      } catch (error) {
        console.error(`Failed to apply optimization ${optimization.type}:`, error);
      }
    }
  }

  async getConnectionPoolStats() {
    return {
      totalConnections: this.performanceStats.totalConnections,
      activeConnections: this.performanceStats.activeConnections,
      availableConnections: this.poolOptions.maxPoolSize - this.performanceStats.activeConnections,
      maxPoolSize: this.poolOptions.maxPoolSize,
      minPoolSize: this.poolOptions.minPoolSize,

      // Recent performance metrics
      averageWaitTime: this.calculateAverage(
        this.performanceStats.connectionWaitTimes
          .slice(-100)
          .map(t => t.value)
      ),

      averageQueryTime: this.calculateAverage(
        this.performanceStats.queryExecutionTimes
          .slice(-100)
          .map(t => t.value)
      ),

      errorRate: this.calculateErrorRate(Date.now() - (60 * 60 * 1000)), // Last hour
      poolHealthScore: this.performanceStats.poolHealthScore
    };
  }

  // Utility methods for calculations
  calculateAverage(values) {
    if (!values || values.length === 0) return 0;
    return values.reduce((sum, val) => sum + val, 0) / values.length;
  }

  calculatePercentile(values, percentile) {
    if (!values || values.length === 0) return 0;
    const sorted = [...values].sort((a, b) => a - b);
    const index = Math.ceil(sorted.length * percentile) - 1;
    return sorted[index] || 0;
  }

  calculateErrorRate(since) {
    const totalQueries = this.performanceStats.queryExecutionTimes
      .filter(t => t.timestamp > since).length;
    const errors = this.performanceStats.connectionErrors;
    return totalQueries > 0 ? errors / totalQueries : 0;
  }

  calculateConnectionEfficiency(waitTimes) {
    if (!waitTimes || waitTimes.length === 0) return 100;
    const fastConnections = waitTimes.filter(t => t.value < 100).length;
    return (fastConnections / waitTimes.length) * 100;
  }

  async getServerHealthIndicators() {
    try {
      const admin = this.client.db('admin');
      const serverStatus = await admin.command({ serverStatus: 1 });

      return {
        uptime: serverStatus.uptime,
        connections: serverStatus.connections,
        opcounters: serverStatus.opcounters,
        mem: serverStatus.mem,
        globalLock: serverStatus.globalLock
      };
    } catch (error) {
      console.error('Failed to get server health indicators:', error);
      return null;
    }
  }

  updatePoolHealthScore(stats) {
    let score = 100;

    // Penalize high utilization
    if (stats.poolUtilization > 90) score -= 30;
    else if (stats.poolUtilization > 75) score -= 15;

    // Penalize high wait times
    if (stats.averageWaitTime > 1000) score -= 25;
    else if (stats.averageWaitTime > 500) score -= 10;

    // Penalize errors
    if (stats.errorRate > 0.05) score -= 20;
    else if (stats.errorRate > 0.02) score -= 10;

    // Penalize low efficiency
    if (stats.connectionEfficiency < 70) score -= 15;
    else if (stats.connectionEfficiency < 85) score -= 5;

    this.performanceStats.poolHealthScore = Math.max(0, Math.min(100, score));
  }

  logPoolEvent(eventType, event) {
    this.connectionMetrics.set(`${eventType}_${Date.now()}`, {
      type: eventType,
      timestamp: new Date(),
      ...event
    });
  }

  logCommandEvent(eventType, event) {
    // Store command execution metrics
    const timestamp = new Date();

    if (eventType === 'command_succeeded' && event.executionTime !== undefined) {
      this.performanceStats.queryExecutionTimes.push({
        value: event.executionTime,
        timestamp: timestamp.getTime()
      });
    }

    // Keep only recent metrics to prevent memory growth
    if (this.performanceStats.queryExecutionTimes.length > 10000) {
      this.performanceStats.queryExecutionTimes = 
        this.performanceStats.queryExecutionTimes.slice(-5000);
    }
  }

  logServerEvent(eventType, event) {
    // Log server-level events for health monitoring
    console.log(`Server event: ${eventType}`, {
      timestamp: new Date(),
      ...event
    });
  }

  logOptimization(optimization) {
    console.log('Optimization Applied:', {
      timestamp: new Date(),
      ...optimization
    });
  }

  // Graceful shutdown
  async shutdown() {
    console.log('Shutting down MongoDB connection pool...');

    // Stop background monitoring and optimization timers before closing the client
    if (this.monitoringInterval) clearInterval(this.monitoringInterval);
    if (this.optimizationInterval) clearInterval(this.optimizationInterval);

    try {
      if (this.client) {
        await this.client.close(true); // Force close all connections
        console.log('MongoDB connection pool shut down successfully');
      }
    } catch (error) {
      console.error('Error during connection pool shutdown:', error);
    }
  }
}

// Example usage with intelligent configuration
async function createOptimizedMongoConnection() {
  const connectionManager = new MongoConnectionPoolManager(
    'mongodb://localhost:27017/production_db',
    {
      // Intelligent pool sizing based on application type
      minPoolSize: 10,           // Minimum connections for baseline performance
      maxPoolSize: 100,          // Maximum connections for peak load
      maxIdleTimeMS: 30000,      // 30 seconds idle timeout

      // Optimized timeouts for production
      waitQueueTimeoutMS: 2500,        // 2.5 seconds max wait for connection
      serverSelectionTimeoutMS: 5000,  // 5 seconds for server selection
      socketTimeoutMS: 45000,          // 45 seconds for socket operations
      connectTimeoutMS: 10000,         // 10 seconds connection timeout

      // Performance optimizations
      compressors: ['snappy', 'zlib'],
      loadBalanced: false,             // Enable only when connecting through a MongoDB-aware load balancer
      readPreference: 'secondaryPreferred',
      readConcernLevel: 'majority',

      // Write concern for consistency
      writeConcernW: 'majority',
      writeConcernJ: true,
      writeConcernTimeout: 5000
    }
  );

  try {
    const client = await connectionManager.initializeConnectionPool();

    // Return both client and manager for full control
    return {
      client,
      manager: connectionManager
    };

  } catch (error) {
    console.error('Failed to create optimized MongoDB connection:', error);
    throw error;
  }
}

// Benefits of MongoDB intelligent connection pooling:
// - Automatic connection scaling based on demand
// - Real-time performance monitoring and optimization
// - Intelligent retry logic with exponential backoff
// - Advanced health checking and diagnostic capabilities
// - Built-in connection efficiency analysis
// - Automatic pool optimization based on performance metrics
// - Comprehensive event tracking for troubleshooting
// - Native integration with MongoDB driver optimizations
// - Load balancing and failover support
// - Zero-downtime connection management

Advanced Connection Pool Optimization Techniques

Strategic connection management patterns for production-grade performance:

// Advanced MongoDB connection pooling patterns for enterprise applications
class EnterpriseConnectionManager {
  constructor() {
    this.connectionPools = new Map();
    this.routingStrategies = new Map();
    this.performanceMetrics = new Map();
    this.healthCheckers = new Map();
  }

  // Multi-tier connection pooling strategy
  async createTieredConnectionPools(configurations) {
    const poolTiers = {
      // High-priority pool for critical operations
      critical: {
        minPoolSize: 15,
        maxPoolSize: 50,
        maxIdleTimeMS: 10000,
        waitQueueTimeoutMS: 1000,
        priority: 'high'
      },

      // Standard pool for regular operations
      standard: {
        minPoolSize: 10,
        maxPoolSize: 75,
        maxIdleTimeMS: 30000,
        waitQueueTimeoutMS: 2500,
        priority: 'normal'
      },

      // Batch pool for background operations
      batch: {
        minPoolSize: 5,
        maxPoolSize: 25,
        maxIdleTimeMS: 60000,
        waitQueueTimeoutMS: 10000,
        priority: 'low'
      },

      // Analytics pool for reporting queries
      analytics: {
        minPoolSize: 3,
        maxPoolSize: 20,
        maxIdleTimeMS: 120000,
        waitQueueTimeoutMS: 30000,
        readPreference: 'secondary',
        priority: 'analytics'
      }
    };

    for (const [tierName, config] of Object.entries(poolTiers)) {
      try {
        const connectionManager = new MongoConnectionPoolManager(
          configurations[tierName]?.uri || configurations.default.uri,
          {
            ...config,
            ...configurations[tierName]
          }
        );

        const client = await connectionManager.initializeConnectionPool();

        this.connectionPools.set(tierName, {
          manager: connectionManager,
          client: client,
          config: config,
          createdAt: new Date(),
          lastHealthCheck: null
        });

        console.log(`Initialized ${tierName} connection pool with ${config.maxPoolSize} max connections`);

      } catch (error) {
        console.error(`Failed to create ${tierName} connection pool:`, error);
        throw error;
      }
    }
  }

  // Intelligent connection routing based on operation type
  getConnectionForOperation(operationType, priority = 'normal') {
    const routingRules = {
      'user_query': priority === 'high' ? 'critical' : 'standard',
      'admin_operation': 'critical',
      'bulk_insert': 'batch',
      'bulk_update': 'batch',
      'reporting_query': 'analytics',
      'aggregation': priority === 'high' ? 'standard' : 'analytics',
      'index_operation': 'batch',
      'backup_operation': 'batch',
      'monitoring_query': 'analytics'
    };

    const preferredPool = routingRules[operationType] || 'standard';
    const poolInfo = this.connectionPools.get(preferredPool);

    if (poolInfo && this.isPoolHealthy(preferredPool)) {
      return poolInfo.client;
    }

    // Fallback to standard pool if preferred pool is unavailable
    const fallbackPool = this.connectionPools.get('standard');
    if (fallbackPool && this.isPoolHealthy('standard')) {
      console.warn(`Using fallback pool for ${operationType} (preferred: ${preferredPool})`);
      return fallbackPool.client;
    }

    throw new Error('No healthy connection pools available');
  }

  // Advanced performance monitoring across all pools
  async getComprehensivePerformanceReport() {
    const report = {
      timestamp: new Date(),
      overallHealth: 'unknown',
      pools: {},
      recommendations: [],
      alerts: []
    };

    let totalHealthScore = 0;
    let poolCount = 0;

    for (const [poolName, poolInfo] of this.connectionPools.entries()) {
      try {
        const stats = await poolInfo.manager.getDetailedPerformanceStats();

        report.pools[poolName] = {
          ...stats,
          configuration: poolInfo.config,
          uptime: Date.now() - poolInfo.createdAt.getTime(),
          healthStatus: this.calculatePoolHealth(stats)
        };

        totalHealthScore += stats.poolHealthScore || 0;
        poolCount++;

        // Generate pool-specific recommendations
        const poolRecommendations = this.generatePoolRecommendations(poolName, stats);
        report.recommendations.push(...poolRecommendations);

        // Check for alerts
        const alerts = this.checkForAlerts(poolName, stats);
        report.alerts.push(...alerts);

      } catch (error) {
        report.pools[poolName] = {
          error: error.message,
          healthStatus: 'unhealthy'
        };

        report.alerts.push({
          severity: 'critical',
          pool: poolName,
          message: `Pool health check failed: ${error.message}`
        });
      }
    }

    // Calculate overall health
    report.overallHealth = poolCount > 0 ? 
      (totalHealthScore / poolCount > 80 ? 'healthy' : 
       totalHealthScore / poolCount > 60 ? 'warning' : 'critical') : 'unknown';

    return report;
  }

  isPoolHealthy(poolName) {
    const poolInfo = this.connectionPools.get(poolName);
    if (!poolInfo) return false;

    // Simple health check - can be enhanced with more sophisticated logic
    return poolInfo.manager.performanceStats.poolHealthScore > 50;
  }

  calculatePoolHealth(stats) {
    if (stats.poolHealthScore >= 80) return 'healthy';
    if (stats.poolHealthScore >= 60) return 'warning';
    return 'critical';
  }

  generatePoolRecommendations(poolName, stats) {
    const recommendations = [];

    // High utilization recommendations
    if (stats.poolUtilization > 85) {
      recommendations.push({
        pool: poolName,
        type: 'capacity',
        severity: 'high',
        message: `${poolName} pool utilization is ${stats.poolUtilization.toFixed(1)}% - consider increasing pool size`,
        suggestedAction: `Increase maxPoolSize from ${stats.maxPoolSize} to ${Math.ceil(stats.maxPoolSize * 1.3)}`
      });
    }

    // Performance recommendations
    if (stats.averageWaitTime > 1000) {
      recommendations.push({
        pool: poolName,
        type: 'performance',
        severity: 'medium',
        message: `${poolName} pool has high average wait time: ${stats.averageWaitTime.toFixed(1)}ms`,
        suggestedAction: 'Review connection timeout settings and pool sizing'
      });
    }

    // Error rate recommendations
    if (stats.errorRate > 0.05) {
      recommendations.push({
        pool: poolName,
        type: 'reliability',
        severity: 'high',
        message: `${poolName} pool has high error rate: ${(stats.errorRate * 100).toFixed(1)}%`,
        suggestedAction: 'Investigate connection failures and server health'
      });
    }

    return recommendations;
  }

  checkForAlerts(poolName, stats) {
    const alerts = [];

    // Critical utilization alert
    if (stats.poolUtilization > 95) {
      alerts.push({
        severity: 'critical',
        pool: poolName,
        type: 'utilization',
        message: `${poolName} pool utilization critical: ${stats.poolUtilization.toFixed(1)}%`,
        threshold: 95,
        currentValue: stats.poolUtilization
      });
    }

    // High error rate alert
    if (stats.errorRate > 0.1) {
      alerts.push({
        severity: 'critical',
        pool: poolName,
        type: 'error_rate',
        message: `${poolName} pool error rate critical: ${(stats.errorRate * 100).toFixed(1)}%`,
        threshold: 10,
        currentValue: stats.errorRate * 100
      });
    }

    // Connection timeout alert
    if (stats.p95WaitTime > 5000) {
      alerts.push({
        severity: 'warning',
        pool: poolName,
        type: 'latency',
        message: `${poolName} pool 95th percentile wait time high: ${stats.p95WaitTime.toFixed(1)}ms`,
        threshold: 5000,
        currentValue: stats.p95WaitTime
      });
    }

    return alerts;
  }

  // Automatic pool rebalancing based on usage patterns
  async rebalanceConnectionPools() {
    console.log('Starting automatic pool rebalancing...');

    const report = await this.getComprehensivePerformanceReport();

    for (const [poolName, poolStats] of Object.entries(report.pools)) {
      if (poolStats.error) continue;

      const rebalanceActions = this.calculateRebalanceActions(poolName, poolStats);

      for (const action of rebalanceActions) {
        await this.executeRebalanceAction(poolName, action);
      }
    }

    console.log('Pool rebalancing completed');
  }

  calculateRebalanceActions(poolName, stats) {
    const actions = [];

    // Pool size adjustments
    if (stats.poolUtilization > 80 && stats.maxPoolSize < 200) {
      actions.push({
        type: 'increase_pool_size',
        currentSize: stats.maxPoolSize,
        newSize: Math.min(Math.ceil(stats.maxPoolSize * 1.2), 200),
        reason: 'High utilization'
      });
    } else if (stats.poolUtilization < 30 && stats.maxPoolSize > 10) {
      actions.push({
        type: 'decrease_pool_size',
        currentSize: stats.maxPoolSize,
        newSize: Math.max(Math.ceil(stats.maxPoolSize * 0.8), 10),
        reason: 'Low utilization'
      });
    }

    return actions;
  }

  async executeRebalanceAction(poolName, action) {
    console.log(`Executing rebalance action for ${poolName}:`, action);

    // Note: Actual implementation would require careful coordination
    // to avoid disrupting active connections
    switch (action.type) {
      case 'increase_pool_size':
        console.log(`Would increase ${poolName} pool size from ${action.currentSize} to ${action.newSize}`);
        break;

      case 'decrease_pool_size':
        console.log(`Would decrease ${poolName} pool size from ${action.currentSize} to ${action.newSize}`);
        break;
    }
  }

  // Graceful shutdown of all connection pools
  async shutdownAllPools() {
    console.log('Shutting down all connection pools...');

    const shutdownPromises = [];

    for (const [poolName, poolInfo] of this.connectionPools.entries()) {
      shutdownPromises.push(
        poolInfo.manager.shutdown()
          .catch(error => console.error(`Error shutting down ${poolName} pool:`, error))
      );
    }

    await Promise.all(shutdownPromises);
    this.connectionPools.clear();

    console.log('All connection pools shut down successfully');
  }
}
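
A brief usage sketch for the tiered manager above; the connection URIs, database name, and collection are placeholder assumptions, not part of the original example:

// Hypothetical wiring of EnterpriseConnectionManager - URIs and names below are placeholders
async function runTieredPoolExample() {
  const enterpriseManager = new EnterpriseConnectionManager();

  // Initialize all four tiers; tiers without an entry fall back to the default URI
  await enterpriseManager.createTieredConnectionPools({
    default: { uri: 'mongodb://localhost:27017/production_db' },
    analytics: { uri: 'mongodb://localhost:27017/production_db', readPreference: 'secondary' }
  });

  // Route a high-priority user query to the critical tier
  const client = enterpriseManager.getConnectionForOperation('user_query', 'high');
  const activeUser = await client.db('production_db').collection('users').findOne({ status: 'active' });

  // Periodic reporting and rebalancing would normally run on timers
  const report = await enterpriseManager.getComprehensivePerformanceReport();
  console.log('Overall pool health:', report.overallHealth);

  await enterpriseManager.shutdownAllPools();
  return activeUser;
}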

SQL-Style Connection Pool Management with QueryLeaf

QueryLeaf provides familiar approaches to MongoDB connection pool configuration and monitoring:

-- QueryLeaf connection pool management with SQL-familiar syntax

-- Configure connection pool settings
CONFIGURE CONNECTION POOL production_pool WITH (
  min_connections = 10,
  max_connections = 100,
  connection_timeout = 10000,
  idle_timeout = 30000,
  wait_queue_timeout = 2500,

  -- Advanced settings
  retry_writes = true,
  retry_reads = true,
  compression = ['snappy', 'zlib'],
  load_balanced = true,

  -- Read preferences
  read_preference = 'secondaryPreferred',
  read_concern_level = 'majority',

  -- Write concern
  write_concern_w = 'majority',
  write_concern_j = true,
  write_concern_timeout = 5000
);

-- Monitor connection pool performance
SELECT 
  pool_name,
  active_connections,
  idle_connections,
  total_connections,

  -- Performance metrics
  avg_connection_wait_time_ms,
  max_connection_wait_time_ms,
  p95_connection_wait_time_ms,

  -- Query performance
  avg_query_execution_time_ms,
  queries_per_second,

  -- Health indicators
  pool_utilization_percent,
  error_rate_percent,
  health_score,

  -- Efficiency metrics
  connection_efficiency_percent,
  throughput_score,

  last_updated

FROM CONNECTION_POOL_STATS('production_pool')
WHERE timestamp >= NOW() - INTERVAL '1 hour';

-- Analyze connection pool trends
WITH pool_performance_trends AS (
  SELECT 
    DATE_TRUNC('minute', timestamp) as minute_bucket,

    -- Connection metrics
    AVG(active_connections) as avg_active_connections,
    MAX(active_connections) as max_active_connections,
    AVG(pool_utilization_percent) as avg_utilization,

    -- Performance metrics
    AVG(avg_connection_wait_time_ms) as avg_wait_time,
    AVG(avg_query_execution_time_ms) as avg_query_time,
    SUM(queries_per_second) as total_qps,

    -- Health metrics
    AVG(health_score) as avg_health_score,
    AVG(error_rate_percent) as avg_error_rate,
    COUNT(*) as measurement_count

  FROM CONNECTION_POOL_STATS('production_pool')
  WHERE timestamp >= NOW() - INTERVAL '24 hours'
  GROUP BY DATE_TRUNC('minute', timestamp)
),

performance_analysis AS (
  SELECT *,
    -- Trend analysis
    LAG(avg_utilization, 5) OVER (ORDER BY minute_bucket) as utilization_5min_ago,
    LAG(avg_wait_time, 10) OVER (ORDER BY minute_bucket) as wait_time_10min_ago,

    -- Performance scoring
    CASE 
      WHEN avg_health_score >= 90 THEN 'Excellent'
      WHEN avg_health_score >= 80 THEN 'Good'
      WHEN avg_health_score >= 60 THEN 'Fair'
      ELSE 'Poor'
    END as performance_grade,

    -- Utilization trends
    CASE 
      WHEN avg_utilization > LAG(avg_utilization, 5) OVER (ORDER BY minute_bucket) + 10 
        THEN 'Increasing'
      WHEN avg_utilization < LAG(avg_utilization, 5) OVER (ORDER BY minute_bucket) - 10 
        THEN 'Decreasing'
      ELSE 'Stable'
    END as utilization_trend

  FROM pool_performance_trends
)

SELECT 
  minute_bucket,
  avg_active_connections,
  max_active_connections,
  avg_utilization,
  avg_wait_time,
  avg_query_time,
  total_qps,
  performance_grade,
  utilization_trend,
  avg_health_score

FROM performance_analysis
WHERE minute_bucket >= NOW() - INTERVAL '4 hours'
ORDER BY minute_bucket DESC;

-- Connection pool optimization recommendations
WITH current_performance AS (
  SELECT 
    pool_name,
    active_connections,
    max_connections,
    pool_utilization_percent,
    avg_connection_wait_time_ms,
    error_rate_percent,
    health_score,
    queries_per_second

  FROM CONNECTION_POOL_STATS('production_pool')
  WHERE timestamp >= NOW() - INTERVAL '5 minutes'
  ORDER BY timestamp DESC
  LIMIT 1
),

optimization_analysis AS (
  SELECT *,
    -- Pool sizing recommendations
    CASE 
      WHEN pool_utilization_percent > 85 THEN 
        CONCAT('Increase max_connections from ', max_connections, ' to ', CEIL(max_connections * 1.3))
      WHEN pool_utilization_percent < 30 AND max_connections > 20 THEN 
        CONCAT('Decrease max_connections from ', max_connections, ' to ', GREATEST(CEIL(max_connections * 0.8), 20))
      ELSE 'Pool size appears optimal'
    END as pool_sizing_recommendation,

    -- Timeout recommendations
    CASE 
      WHEN avg_connection_wait_time_ms > 1000 THEN 'Consider increasing connection timeout or pool size'
      WHEN avg_connection_wait_time_ms < 50 THEN 'Connection timeouts are optimal'
      ELSE 'Connection timeouts are acceptable'
    END as timeout_recommendation,

    -- Performance recommendations
    CASE 
      WHEN error_rate_percent > 5 THEN 'Investigate connection errors - check server health and network'
      WHEN health_score < 70 THEN 'Pool performance needs attention - review metrics and configuration'
      WHEN queries_per_second > 1000 AND pool_utilization_percent > 80 THEN 'High throughput with high utilization - consider scaling'
      ELSE 'Performance appears satisfactory'
    END as performance_recommendation,

    -- Priority scoring
    CASE 
      WHEN pool_utilization_percent > 90 OR error_rate_percent > 10 THEN 'Critical'
      WHEN pool_utilization_percent > 75 OR avg_connection_wait_time_ms > 500 THEN 'High'
      WHEN health_score < 80 THEN 'Medium'
      ELSE 'Low'
    END as optimization_priority

  FROM current_performance
)

SELECT 
  pool_name,

  -- Current status
  CONCAT(active_connections, '/', max_connections) as connection_usage,
  ROUND(pool_utilization_percent, 1) as utilization_percent,
  ROUND(avg_connection_wait_time_ms, 1) as avg_wait_ms,
  ROUND(error_rate_percent, 2) as error_rate_percent,
  ROUND(health_score, 1) as health_score,

  -- Recommendations
  pool_sizing_recommendation,
  timeout_recommendation,
  performance_recommendation,
  optimization_priority,

  -- Action items
  CASE 
    WHEN optimization_priority = 'Critical' THEN 'Immediate action required'
    WHEN optimization_priority = 'High' THEN 'Schedule optimization within 24 hours'
    WHEN optimization_priority = 'Medium' THEN 'Plan optimization within 1 week'
    ELSE 'Monitor and review monthly'
  END as recommended_timeline,

  NOW() as analysis_timestamp

FROM optimization_analysis;

-- Automated pool health monitoring with alerts
CREATE ALERT CONNECTION_POOL_HEALTH_MONITOR
ON CONNECTION_POOL_STATS('production_pool')
WHEN (
  pool_utilization_percent > 90 OR
  avg_connection_wait_time_ms > 2000 OR
  error_rate_percent > 5 OR
  health_score < 70
)
NOTIFY ['dba-team@company.com', 'ops-team@company.com']
WITH MESSAGE TEMPLATE '''
Connection Pool Alert: {{ pool_name }}

Current Status:
- Utilization: {{ pool_utilization_percent }}%
- Active Connections: {{ active_connections }}/{{ max_connections }}
- Average Wait Time: {{ avg_connection_wait_time_ms }}ms
- Error Rate: {{ error_rate_percent }}%
- Health Score: {{ health_score }}

Recommended Actions:
{{ pool_sizing_recommendation }}
{{ timeout_recommendation }}
{{ performance_recommendation }}

Dashboard: https://monitoring.company.com/mongodb/pools/{{ pool_name }}
'''
EVERY 5 MINUTES;

-- Historical connection pool analysis
SELECT 
  DATE(timestamp) as analysis_date,

  -- Daily aggregates
  AVG(pool_utilization_percent) as avg_daily_utilization,
  MAX(pool_utilization_percent) as peak_daily_utilization,
  AVG(avg_connection_wait_time_ms) as avg_daily_wait_time,
  MAX(active_connections) as peak_daily_connections,

  -- Performance indicators
  AVG(health_score) as avg_daily_health_score,
  MIN(health_score) as lowest_daily_health_score,
  AVG(queries_per_second) as avg_daily_qps,
  MAX(queries_per_second) as peak_daily_qps,

  -- Issue tracking
  COUNT(CASE WHEN error_rate_percent > 1 THEN 1 END) as error_incidents,
  COUNT(CASE WHEN pool_utilization_percent > 85 THEN 1 END) as high_utilization_incidents,

  -- Efficiency metrics
  AVG(connection_efficiency_percent) as avg_connection_efficiency,
  AVG(throughput_score) as avg_throughput_score

FROM CONNECTION_POOL_STATS('production_pool')
WHERE timestamp >= NOW() - INTERVAL '30 days'
GROUP BY DATE(timestamp)
ORDER BY analysis_date DESC;

-- QueryLeaf connection pooling provides:
-- 1. SQL-familiar pool configuration and management
-- 2. Comprehensive performance monitoring and analysis
-- 3. Intelligent optimization recommendations
-- 4. Automated health monitoring and alerting
-- 5. Historical trend analysis and capacity planning
-- 6. Integration with MongoDB's native pooling features
-- 7. Real-time performance metrics and diagnostics
-- 8. Automated scaling recommendations based on usage patterns
-- 9. Multi-tier pooling strategies for different workload types
-- 10. Enterprise-grade monitoring and operational visibility

Best Practices for MongoDB Connection Pooling

Pool Sizing Strategy

Optimal connection pool configuration for different application types (a short configuration sketch follows the list):

  1. High-Traffic Web Applications: Large pools with aggressive timeouts for rapid response
  2. Batch Processing Systems: Moderate pools with longer timeouts for sustained throughput
  3. Analytics Applications: Smaller pools with secondary read preferences for reporting queries
  4. Microservices Architecture: Multiple specialized pools for different service patterns
  5. Real-time Applications: Priority-based pooling with guaranteed connection availability
  6. Background Services: Separate pools to prevent interference with user-facing operations
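
As a rough sketch of how these profiles might map onto driver options using the MongoConnectionPoolManager shown earlier (the numbers are illustrative starting points, not benchmarks):

// Illustrative pool profiles per application type - tune the values against real workload data
const poolProfiles = {
  highTrafficWeb:  { minPoolSize: 20, maxPoolSize: 150, maxIdleTimeMS: 15000, waitQueueTimeoutMS: 1000 },
  batchProcessing: { minPoolSize: 5,  maxPoolSize: 40,  maxIdleTimeMS: 60000, waitQueueTimeoutMS: 15000 },
  analytics:       { minPoolSize: 3,  maxPoolSize: 20,  maxIdleTimeMS: 120000, readPreference: 'secondary' },
  realtime:        { minPoolSize: 25, maxPoolSize: 80,  maxIdleTimeMS: 10000, waitQueueTimeoutMS: 500 }
};

// Example: a manager for a high-traffic web service (placeholder URI)
const webPoolManager = new MongoConnectionPoolManager(
  'mongodb://localhost:27017/production_db',
  poolProfiles.highTrafficWeb
);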

Performance Monitoring Guidelines

Essential metrics for production connection pool management (a small export sketch follows the list):

  1. Utilization Metrics: Track active vs. available connections continuously
  2. Latency Monitoring: Monitor connection wait times and query execution performance
  3. Error Rate Analysis: Track connection failures and timeout patterns
  4. Resource Efficiency: Analyze connection reuse rates and pool effectiveness
  5. Capacity Planning: Use historical data to predict scaling requirements
  6. Health Scoring: Implement composite health metrics for proactive management
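
One simple way to track these metrics is to export a periodic snapshot from the pool manager's collected statistics; the sketch below assumes the MongoConnectionPoolManager above, and emitMetric is a placeholder for whatever monitoring client is in use:

// Hypothetical metrics export - emitMetric stands in for a real monitoring client
async function exportPoolMetrics(poolManager, emitMetric) {
  const stats = await poolManager.getDetailedPerformanceStats();

  emitMetric('mongodb.pool.utilization_percent', stats.poolUtilization);        // utilization
  emitMetric('mongodb.pool.wait_time_p95_ms', stats.p95WaitTime);               // latency
  emitMetric('mongodb.pool.query_time_avg_ms', stats.averageQueryTime);         // query performance
  emitMetric('mongodb.pool.error_rate', stats.errorRate);                       // error rate analysis
  emitMetric('mongodb.pool.connection_efficiency', stats.connectionEfficiency); // resource efficiency
  emitMetric('mongodb.pool.health_score', poolManager.performanceStats.poolHealthScore); // composite health
}

// Example wiring: push a snapshot every 60 seconds
// setInterval(() => exportPoolMetrics(manager, (name, value) => console.log(name, value)), 60000);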

Conclusion

MongoDB connection pooling optimization requires sophisticated strategies that balance performance, resource utilization, and operational reliability. By implementing intelligent pooling algorithms, comprehensive monitoring systems, and automated optimization techniques, applications can achieve maximum throughput while maintaining efficient resource usage and operational stability.

Key connection pooling benefits include:

  • Intelligent Scaling: Automatic pool sizing based on demand patterns and performance metrics
  • Performance Optimization: Real-time monitoring and tuning for optimal query execution
  • Resource Efficiency: Optimal connection reuse and lifecycle management
  • Operational Visibility: Comprehensive metrics and alerting for proactive management
  • High Availability: Intelligent failover and connection recovery mechanisms
  • Enterprise Integration: Support for complex deployment architectures and monitoring systems

Whether you're building high-throughput web applications, data processing pipelines, analytics platforms, or distributed microservices, MongoDB's intelligent connection pooling with QueryLeaf's familiar management interface provides the foundation for scalable, efficient database operations. This combination enables you to leverage advanced connection management capabilities while maintaining familiar database administration patterns and operational procedures.

QueryLeaf Integration: QueryLeaf automatically translates SQL-familiar connection pool configuration into optimal MongoDB driver settings while providing comprehensive monitoring and optimization through SQL-style queries. Advanced pooling strategies, performance analysis, and automated tuning are seamlessly managed through familiar database administration interfaces, making sophisticated connection management both powerful and accessible.

The integration of intelligent connection pooling with SQL-style database operations makes MongoDB an ideal platform for applications requiring both high-performance database access and familiar connection management patterns, ensuring your database connections remain both efficient and reliable as they scale to meet demanding production requirements.

MongoDB Data Modeling Best Practices and Schema Design: Advanced Document Structure Optimization and Relationship Management for Scalable Applications

Modern applications require sophisticated data modeling strategies that can handle complex relationships, evolving schemas, and high-performance requirements while maintaining data consistency and query flexibility. Traditional relational modeling approaches often struggle with document-oriented data, nested structures, and the dynamic schema requirements of modern applications, leading to complex object-relational mapping, rigid schema constraints, and performance bottlenecks that limit application scalability and development velocity.

MongoDB provides comprehensive data modeling capabilities through flexible document structures, embedded relationships, and advanced schema design patterns that enable sophisticated data organization with optimal performance characteristics. Unlike traditional databases that enforce rigid table structures and require complex joins, MongoDB integrates data modeling directly into the document structure with native support for arrays, nested objects, and flexible schemas that adapt to application requirements.
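
As a minimal illustration (the collection and field names here are generic assumptions for a blog-style application, not a prescribed schema), a post document can embed its author summary, categories, tags, and recent comments in a single structure:

// Single document covering data that the relational model below spreads across many tables
db.posts.insertOne({
  title: "Designing MongoDB Schemas",
  status: "published",
  createdAt: new Date(),
  author: {                                   // embedded author summary, denormalized from users
    userId: ObjectId("64f1c2e8a1b2c3d4e5f60718"),
    username: "jdoe",
    profilePictureUrl: "https://example.com/avatars/jdoe.png"
  },
  categories: ["databases", "mongodb"],       // arrays replace junction tables
  tags: ["schema-design", "data-modeling"],
  stats: { views: 0, likes: 0, comments: 1 },
  comments: [                                 // recent comments embedded; full history can live in its own collection
    {
      userId: ObjectId("64f1c2e8a1b2c3d4e5f60719"),
      content: "Great overview!",
      createdAt: new Date(),
      replies: []
    }
  ]
});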

The Traditional Relational Data Modeling Challenge

Conventional approaches to data modeling in relational systems face significant limitations when handling complex, hierarchical, and rapidly evolving data structures:

-- Traditional relational data modeling - rigid schema with complex relationship management

-- Basic user management with limited flexibility
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    username VARCHAR(255) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,

    -- Basic profile information (limited structure)
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    date_of_birth DATE,
    phone_number VARCHAR(20),

    -- Address information (denormalized for simplicity)
    address_line_1 VARCHAR(255),
    address_line_2 VARCHAR(255),
    city VARCHAR(100),
    state VARCHAR(100),
    postal_code VARCHAR(20),
    country VARCHAR(100),

    -- Account metadata
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_login TIMESTAMP,
    account_status VARCHAR(50) DEFAULT 'active',

    -- Basic preferences (very limited)
    preferred_language VARCHAR(10) DEFAULT 'en',
    timezone VARCHAR(50) DEFAULT 'UTC',

    -- Social media links (limited and rigid)
    facebook_url VARCHAR(255),
    twitter_url VARCHAR(255),
    linkedin_url VARCHAR(255),
    instagram_url VARCHAR(255)
);

-- Separate table for user profiles (normalized approach)
CREATE TABLE user_profiles (
    profile_id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id) ON DELETE CASCADE,

    -- Extended profile information
    bio TEXT,
    website VARCHAR(255),
    company VARCHAR(255),
    job_title VARCHAR(255),

    -- Skills and interests (very basic approach)
    skills TEXT, -- Comma-separated values - not optimal
    interests TEXT, -- Comma-separated values - not optimal

    -- Professional information
    years_of_experience INTEGER,
    education_level VARCHAR(100),

    -- Contact preferences
    email_notifications BOOLEAN DEFAULT true,
    sms_notifications BOOLEAN DEFAULT false,
    marketing_emails BOOLEAN DEFAULT false,

    -- Profile metadata
    profile_completeness_percent DECIMAL(5,2) DEFAULT 0.0,
    profile_visibility VARCHAR(50) DEFAULT 'public',

    -- Profile customization (limited)
    theme VARCHAR(50) DEFAULT 'default',
    profile_picture_url VARCHAR(255),
    cover_photo_url VARCHAR(255)
);

-- User posts with basic relationship management
CREATE TABLE posts (
    post_id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id) ON DELETE CASCADE,

    -- Post content
    title VARCHAR(500) NOT NULL,
    content TEXT NOT NULL,
    post_type VARCHAR(50) DEFAULT 'article',

    -- Post metadata
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    published_at TIMESTAMP,

    -- Post status and visibility
    status VARCHAR(50) DEFAULT 'draft',
    visibility VARCHAR(50) DEFAULT 'public',

    -- SEO and categorization
    slug VARCHAR(500) UNIQUE,
    meta_description TEXT,
    featured_image_url VARCHAR(255),

    -- Engagement metrics (basic)
    view_count INTEGER DEFAULT 0,
    like_count INTEGER DEFAULT 0,
    comment_count INTEGER DEFAULT 0,
    share_count INTEGER DEFAULT 0,

    -- Content flags
    is_featured BOOLEAN DEFAULT false,
    is_pinned BOOLEAN DEFAULT false,
    allow_comments BOOLEAN DEFAULT true
);

-- Post categories (many-to-many relationship)
CREATE TABLE categories (
    category_id SERIAL PRIMARY KEY,
    category_name VARCHAR(255) UNIQUE NOT NULL,
    category_slug VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,
    parent_category_id INTEGER REFERENCES categories(category_id),

    -- Category metadata
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active BOOLEAN DEFAULT true,
    sort_order INTEGER DEFAULT 0,

    -- Category appearance
    color VARCHAR(7), -- Hex color code
    icon VARCHAR(100) -- Icon identifier
);

-- Post-category relationships (junction table)
CREATE TABLE post_categories (
    post_id INTEGER REFERENCES posts(post_id) ON DELETE CASCADE,
    category_id INTEGER REFERENCES categories(category_id) ON DELETE CASCADE,

    PRIMARY KEY (post_id, category_id),

    -- Relationship metadata
    assigned_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    assigned_by INTEGER REFERENCES users(user_id)
);

-- Comments with hierarchical structure (self-referencing)
CREATE TABLE comments (
    comment_id SERIAL PRIMARY KEY,
    post_id INTEGER REFERENCES posts(post_id) ON DELETE CASCADE,
    user_id INTEGER REFERENCES users(user_id) ON DELETE CASCADE,
    parent_comment_id INTEGER REFERENCES comments(comment_id) ON DELETE CASCADE,

    -- Comment content
    content TEXT NOT NULL,
    comment_type VARCHAR(50) DEFAULT 'text',

    -- Comment metadata
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Comment status
    status VARCHAR(50) DEFAULT 'published',
    is_edited BOOLEAN DEFAULT false,
    is_pinned BOOLEAN DEFAULT false,

    -- Engagement
    like_count INTEGER DEFAULT 0,
    reply_count INTEGER DEFAULT 0,

    -- Moderation
    is_flagged BOOLEAN DEFAULT false,
    moderation_status VARCHAR(50) DEFAULT 'approved'
);

-- Tags for flexible categorization (many-to-many)
CREATE TABLE tags (
    tag_id SERIAL PRIMARY KEY,
    tag_name VARCHAR(255) UNIQUE NOT NULL,
    tag_slug VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,

    -- Tag metadata
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    usage_count INTEGER DEFAULT 0,
    is_trending BOOLEAN DEFAULT false,

    -- Tag appearance
    color VARCHAR(7)
);

-- Post-tag relationships
CREATE TABLE post_tags (
    post_id INTEGER REFERENCES posts(post_id) ON DELETE CASCADE,
    tag_id INTEGER REFERENCES tags(tag_id) ON DELETE CASCADE,

    PRIMARY KEY (post_id, tag_id),

    -- Relationship metadata
    tagged_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    tagged_by INTEGER REFERENCES users(user_id),
    relevance_score DECIMAL(3,2) DEFAULT 1.0
);

-- Complex query to retrieve post with all relationships (performance issues)
WITH post_data AS (
    SELECT 
        p.post_id,
        p.title,
        p.content,
        p.created_at,
        p.status,
        p.view_count,
        p.like_count,
        p.comment_count,

        -- User information (requires join)
        u.username,
        u.email,
        up.bio,
        up.profile_picture_url,

        -- Categories (requires aggregation)
        STRING_AGG(DISTINCT c.category_name, ', ' ORDER BY c.category_name) as categories,

        -- Tags (requires aggregation)
        STRING_AGG(DISTINCT t.tag_name, ', ' ORDER BY t.tag_name) as tags

    FROM posts p
    JOIN users u ON p.user_id = u.user_id
    LEFT JOIN user_profiles up ON u.user_id = up.user_id
    LEFT JOIN post_categories pc ON p.post_id = pc.post_id
    LEFT JOIN categories c ON pc.category_id = c.category_id
    LEFT JOIN post_tags pt ON p.post_id = pt.post_id
    LEFT JOIN tags t ON pt.tag_id = t.tag_id

    WHERE p.status = 'published'
    GROUP BY 
        p.post_id, p.title, p.content, p.created_at, p.status, 
        p.view_count, p.like_count, p.comment_count,
        u.username, u.email, up.bio, up.profile_picture_url
),

comment_hierarchy AS (
    -- Recursive CTE for nested comments (complex and performance-intensive)
    WITH RECURSIVE comment_tree AS (
        SELECT 
            c.comment_id,
            c.post_id,
            c.content,
            c.created_at,
            c.parent_comment_id,
            u.username as commenter_username,
            up.profile_picture_url as commenter_picture,
            0 as depth,
            CAST(c.comment_id as TEXT) as path
        FROM comments c
        JOIN users u ON c.user_id = u.user_id
        LEFT JOIN user_profiles up ON u.user_id = up.user_id
        WHERE c.parent_comment_id IS NULL
        AND c.status = 'published'

        UNION ALL

        SELECT 
            c.comment_id,
            c.post_id,
            c.content,
            c.created_at,
            c.parent_comment_id,
            u.username,
            up.profile_picture_url,
            ct.depth + 1,
            ct.path || '.' || c.comment_id
        FROM comments c
        JOIN users u ON c.user_id = u.user_id
        LEFT JOIN user_profiles up ON u.user_id = up.user_id
        JOIN comment_tree ct ON c.parent_comment_id = ct.comment_id
        WHERE c.status = 'published'
        AND ct.depth < 5 -- Limit recursion depth
    )
    SELECT 
        post_id,
        JSON_AGG(
            JSON_BUILD_OBJECT(
                'comment_id', comment_id,
                'content', content,
                'created_at', created_at,
                'commenter_username', commenter_username,
                'commenter_picture', commenter_picture,
                'depth', depth,
                'path', path
            ) ORDER BY path
        ) as comments_json
    FROM comment_tree
    GROUP BY post_id
)

SELECT 
    pd.post_id,
    pd.title,
    pd.content,
    pd.created_at,
    pd.username as author_username,
    pd.bio as author_bio,
    pd.profile_picture_url as author_picture,
    pd.categories,
    pd.tags,
    pd.view_count,
    pd.like_count,
    pd.comment_count,

    -- Comments as JSON (complex aggregation)
    COALESCE(ch.comments_json, '[]'::json) as comments

FROM post_data pd
LEFT JOIN comment_hierarchy ch ON pd.post_id = ch.post_id
ORDER BY pd.created_at DESC;

-- Basic user activity analysis (multiple complex joins)
WITH user_activity AS (
    SELECT 
        u.user_id,
        u.username,
        u.email,
        u.created_at as user_created_at,

        -- Post statistics
        COUNT(DISTINCT p.post_id) as total_posts,
        COUNT(DISTINCT CASE WHEN p.status = 'published' THEN p.post_id END) as published_posts,
        -- NOTE: SUM/AVG over post metrics are inflated by join fan-out from the
        -- comment/category/tag joins below - another hazard of this approach
        SUM(p.view_count) as total_views,
        SUM(p.like_count) as total_likes,

        -- Comment statistics
        COUNT(DISTINCT c.comment_id) as total_comments,

        -- Category usage
        COUNT(DISTINCT pc.category_id) as categories_used,

        -- Tag usage
        COUNT(DISTINCT pt.tag_id) as tags_used,

        -- Activity timeline
        MAX(GREATEST(p.created_at, c.created_at)) as last_activity_at,

        -- Engagement metrics
        AVG(p.view_count) as avg_views_per_post,
        AVG(p.like_count) as avg_likes_per_post,
        AVG(p.comment_count) as avg_comments_per_post

    FROM users u
    LEFT JOIN posts p ON u.user_id = p.user_id
    LEFT JOIN comments c ON u.user_id = c.user_id
    LEFT JOIN post_categories pc ON p.post_id = pc.post_id
    LEFT JOIN post_tags pt ON p.post_id = pt.post_id

    WHERE u.account_status = 'active'
    GROUP BY u.user_id, u.username, u.email, u.created_at
),

engagement_analysis AS (
    SELECT 
        ua.*,

        -- Activity classification
        CASE 
            WHEN ua.total_posts > 50 AND ua.total_comments > 100 THEN 'highly_active'
            WHEN ua.total_posts > 10 AND ua.total_comments > 25 THEN 'moderately_active'
            WHEN ua.total_posts > 0 OR ua.total_comments > 0 THEN 'low_activity'
            ELSE 'inactive'
        END as activity_level,

        -- Content quality indicators
        CASE 
            WHEN ua.avg_views_per_post > 1000 AND ua.avg_likes_per_post > 50 THEN 'high_quality'
            WHEN ua.avg_views_per_post > 500 AND ua.avg_likes_per_post > 20 THEN 'good_quality'
            WHEN ua.avg_views_per_post > 100 THEN 'average_quality'
            ELSE 'low_engagement'
        END as content_quality,

        -- User tenure
        EXTRACT(DAY FROM CURRENT_TIMESTAMP - ua.user_created_at) as days_since_signup,
        EXTRACT(DAY FROM CURRENT_TIMESTAMP - ua.last_activity_at) as days_since_last_activity,

        -- Productivity metrics
        CASE 
            WHEN EXTRACT(DAY FROM CURRENT_TIMESTAMP - ua.user_created_at) > 0 THEN
                ua.total_posts / EXTRACT(DAY FROM CURRENT_TIMESTAMP - ua.user_created_at)::DECIMAL
            ELSE 0
        END as posts_per_day,

        -- Diversity metrics
        CASE 
            WHEN ua.total_posts > 0 THEN ua.categories_used / ua.total_posts::DECIMAL
            ELSE 0
        END as category_diversity,

        CASE 
            WHEN ua.total_posts > 0 THEN ua.tags_used / ua.total_posts::DECIMAL
            ELSE 0
        END as tag_diversity

    FROM user_activity ua
)

SELECT 
    ea.username,
    ea.activity_level,
    ea.content_quality,
    ea.total_posts,
    ea.published_posts,
    ROUND(ea.total_views, 0) as total_views,
    ROUND(ea.total_likes, 0) as total_likes,
    ea.total_comments,

    -- Engagement metrics
    ROUND(ea.avg_views_per_post, 1) as avg_views_per_post,
    ROUND(ea.avg_likes_per_post, 1) as avg_likes_per_post,
    ROUND(ea.avg_comments_per_post, 1) as avg_comments_per_post,

    -- Activity metrics
    ROUND(ea.posts_per_day, 3) as posts_per_day,
    ROUND(ea.category_diversity, 2) as category_diversity,
    ROUND(ea.tag_diversity, 2) as tag_diversity,

    -- Time metrics
    ea.days_since_signup,
    ea.days_since_last_activity,

    -- Recommendations
    CASE 
        WHEN ea.activity_level = 'inactive' AND ea.days_since_signup < 30 THEN 'new_user_onboarding'
        WHEN ea.activity_level = 'low_activity' AND ea.days_since_last_activity > 30 THEN 're_engagement_campaign'
        WHEN ea.content_quality = 'high_quality' THEN 'featured_contributor'
        WHEN ea.activity_level = 'highly_active' AND ea.content_quality != 'high_quality' THEN 'content_improvement_guidance'
        ELSE 'continue_monitoring'
    END as engagement_recommendation

FROM engagement_analysis ea
ORDER BY ea.total_views DESC, ea.total_posts DESC;

-- Problems with traditional relational data modeling:
-- 1. Rigid schema requiring extensive migrations for changes
-- 2. Complex joins across multiple tables for simple data retrieval
-- 3. Object-relational impedance mismatch for nested data structures
-- 4. Performance overhead from normalization and multiple table queries
-- 5. Difficulty modeling hierarchical and semi-structured data
-- 6. Limited flexibility for evolving application requirements
-- 7. Complex relationship management requiring junction tables
-- 8. Inefficient storage for sparse or optional data fields
-- 9. Challenging aggregation across related entities
-- 10. Maintenance complexity for schema evolution and data migration
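
For contrast, the post detail page assembled above from six joined tables and a recursive comment CTE collapses to a single indexed read once author details and recent comments are embedded in the post document. A minimal sketch, assuming an open db handle and a posts collection shaped like the schema defined below:

// Single-collection equivalent of the multi-join post detail query (sketch)
const publishedPosts = await db.collection('posts')
  .find({ 'publication.status': 'published' })
  .project({
    'content.title': 1,
    'content.body': 1,
    author: 1,                 // denormalized author info - no users join needed
    'taxonomy.categories': 1,  // embedded categories - no junction table
    'taxonomy.tags': 1,
    'engagement.views.total': 1,
    'comments.recent': 1       // embedded recent comments - no recursive CTE
  })
  .sort({ 'publication.publishedAt': -1 })
  .limit(20)
  .toArray();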

MongoDB provides comprehensive data modeling capabilities with flexible document structures and embedded relationships:

// MongoDB Advanced Data Modeling - flexible document structures with optimized relationships
const { MongoClient, ObjectId } = require('mongodb');

// Comprehensive MongoDB Data Modeling Manager
class AdvancedDataModelingManager {
  constructor(mongoUri, modelingConfig = {}) {
    this.mongoUri = mongoUri;
    this.client = null;
    this.db = null;

    // Data modeling configuration
    this.config = {
      // Schema validation settings
      enableSchemaValidation: modelingConfig.enableSchemaValidation !== false,
      strictValidation: modelingConfig.strictValidation || false,
      validationLevel: modelingConfig.validationLevel || 'moderate',

      // Document design preferences
      embeddingStrategy: modelingConfig.embeddingStrategy || 'balanced', // balanced, aggressive, conservative
      referencingThreshold: modelingConfig.referencingThreshold || 100, // Size threshold for referencing
      denormalizationLevel: modelingConfig.denormalizationLevel || 'moderate',

      // Performance optimization
      enableIndexOptimization: modelingConfig.enableIndexOptimization !== false,
      enableAggregationOptimization: modelingConfig.enableAggregationOptimization || false,
      enableQueryPatternAnalysis: modelingConfig.enableQueryPatternAnalysis || false,

      // Relationship management
      cascadeDeletes: modelingConfig.cascadeDeletes || false,
      maintainReferentialIntegrity: modelingConfig.maintainReferentialIntegrity || false,
      enableRelationshipIndexing: modelingConfig.enableRelationshipIndexing !== false,

      // Schema evolution
      enableSchemaEvolution: modelingConfig.enableSchemaEvolution || false,
      backwardCompatibility: modelingConfig.backwardCompatibility !== false,
      versionedSchemas: modelingConfig.versionedSchemas || false
    };

    // Document schemas and relationships
    this.documentSchemas = new Map();
    this.relationshipMappings = new Map();
    this.validationRules = new Map();

    // Performance and optimization state
    this.queryPatterns = new Map();
    this.indexStrategies = new Map();
    this.optimizationRecommendations = [];

    this.initializeDataModeling();
  }

  async initializeDataModeling() {
    console.log('Initializing advanced MongoDB data modeling...');

    try {
      // Connect to MongoDB
      this.client = new MongoClient(this.mongoUri);
      await this.client.connect();
      this.db = this.client.db();

      // Setup comprehensive user schema with embedded relationships
      await this.defineUserSchema();

      // Setup post schema with flexible content structure
      await this.definePostSchema();

      // Setup optimized indexes for performance
      if (this.config.enableIndexOptimization) {
        await this.setupOptimizedIndexes();
      }

      // Initialize schema validation if enabled
      if (this.config.enableSchemaValidation) {
        await this.applySchemaValidation();
      }

      console.log('Advanced data modeling initialized successfully');

    } catch (error) {
      console.error('Error initializing data modeling:', error);
      throw error;
    }
  }

  async defineUserSchema() {
    console.log('Defining comprehensive user schema with embedded relationships...');

    try {
      const userSchema = {
        // Schema metadata
        schemaVersion: '1.0',
        schemaName: 'user_profile',
        lastUpdated: new Date(),

        // Document structure
        documentStructure: {
          // Core identification
          _id: 'ObjectId',
          userId: 'string', // Application-level ID
          username: 'string',
          email: 'string',

          // Personal information (embedded object)
          profile: {
            firstName: 'string',
            lastName: 'string',
            displayName: 'string',
            bio: 'string',
            dateOfBirth: 'date',
            phoneNumber: 'string',

            // Professional information
            company: 'string',
            jobTitle: 'string',
            yearsOfExperience: 'number',
            educationLevel: 'string',

            // Skills and interests (arrays for flexibility)
            skills: ['string'],
            interests: ['string'],
            languages: [
              {
                language: 'string',
                proficiency: 'string' // beginner, intermediate, advanced, native
              }
            ],

            // Social media links (flexible object)
            socialMedia: {
              facebook: 'string',
              twitter: 'string',
              linkedin: 'string',
              instagram: 'string',
              github: 'string',
              website: 'string'
            },

            // Profile media
            profilePicture: {
              url: 'string',
              thumbnailUrl: 'string',
              uploadedAt: 'date',
              fileSize: 'number',
              dimensions: {
                width: 'number',
                height: 'number'
              }
            },

            coverPhoto: {
              url: 'string',
              uploadedAt: 'date',
              fileSize: 'number'
            }
          },

          // Contact information (embedded for locality)
          contact: {
            addresses: [
              {
                type: 'string', // home, work, billing, shipping
                addressLine1: 'string',
                addressLine2: 'string',
                city: 'string',
                state: 'string',
                postalCode: 'string',
                country: 'string',
                isPrimary: 'boolean',
                coordinates: {
                  latitude: 'number',
                  longitude: 'number'
                }
              }
            ],

            phoneNumbers: [
              {
                type: 'string', // mobile, home, work
                number: 'string',
                countryCode: 'string',
                isPrimary: 'boolean',
                isVerified: 'boolean'
              }
            ],

            emailAddresses: [
              {
                email: 'string',
                type: 'string', // primary, work, personal
                isVerified: 'boolean',
                isPrimary: 'boolean'
              }
            ]
          },

          // Account settings and preferences (embedded)
          settings: {
            // Privacy settings
            privacy: {
              profileVisibility: 'string', // public, private, friends
              emailVisible: 'boolean',
              phoneVisible: 'boolean',
              searchable: 'boolean'
            },

            // Notification preferences
            notifications: {
              email: {
                posts: 'boolean',
                comments: 'boolean',
                mentions: 'boolean',
                messages: 'boolean',
                newsletter: 'boolean',
                marketing: 'boolean'
              },
              push: {
                posts: 'boolean',
                comments: 'boolean',
                mentions: 'boolean',
                messages: 'boolean'
              },
              sms: {
                security: 'boolean',
                important: 'boolean'
              }
            },

            // UI preferences
            interface: {
              theme: 'string', // light, dark, auto
              language: 'string',
              timezone: 'string',
              dateFormat: 'string',
              currency: 'string'
            },

            // Content preferences
            content: {
              defaultPostVisibility: 'string',
              autoSaveEnabled: 'boolean',
              contentLanguages: ['string']
            }
          },

          // Activity tracking (embedded for performance)
          activity: {
            // Account lifecycle
            createdAt: 'date',
            updatedAt: 'date',
            lastLoginAt: 'date',
            lastActiveAt: 'date',

            // Status information
            status: 'string', // active, inactive, suspended, deleted
            emailVerifiedAt: 'date',
            phoneVerifiedAt: 'date',

            // Statistics (denormalized for performance)
            stats: {
              totalPosts: 'number',
              publishedPosts: 'number',
              totalComments: 'number',
              totalLikes: 'number',
              totalViews: 'number',
              followersCount: 'number',
              followingCount: 'number',

              // Calculated metrics
              engagementRate: 'number',
              averagePostViews: 'number',
              profileCompleteness: 'number'
            },

            // Activity timeline (recent activities embedded)
            recentActivities: [
              {
                type: 'string', // login, post_created, comment_posted, profile_updated
                timestamp: 'date',
                details: 'object', // Flexible details object
                ipAddress: 'string',
                userAgent: 'string'
              }
            ]
          },

          // Authentication and security (embedded)
          authentication: {
            passwordHash: 'string',
            passwordSalt: 'string',
            lastPasswordChange: 'date',

            // Two-factor authentication
            twoFactorEnabled: 'boolean',
            twoFactorSecret: 'string',
            backupCodes: ['string'],

            // Session management
            activeSessions: [
              {
                sessionId: 'string',
                createdAt: 'date',
                lastActivityAt: 'date',
                ipAddress: 'string',
                userAgent: 'string',
                deviceInfo: 'object'
              }
            ],

            // Security events
            securityEvents: [
              {
                type: 'string', // login_attempt, password_change, suspicious_activity
                timestamp: 'date',
                details: 'object',
                resolved: 'boolean'
              }
            ]
          },

          // Content relationships (selective referencing for large collections)
          content: {
            // Recent posts (embedded for performance)
            recentPosts: [
              {
                postId: 'ObjectId',
                title: 'string',
                createdAt: 'date',
                status: 'string',
                viewCount: 'number',
                likeCount: 'number'
              }
            ],

            // Favorite posts (referenced due to potential size)
            favoritePostIds: ['ObjectId'],

            // Bookmarked content
            bookmarks: [
              {
                contentId: 'ObjectId',
                contentType: 'string', // post, comment, user
                bookmarkedAt: 'date',
                tags: ['string'],
                notes: 'string'
              }
            ]
          },

          // Social relationships (hybrid approach)
          social: {
            // Close relationships (embedded for performance)
            following: [
              {
                userId: 'ObjectId',
                username: 'string',
                followedAt: 'date',
                relationshipType: 'string' // friend, colleague, interest
              }
            ],

            // Large follower lists (referenced)
            followerIds: ['ObjectId'],

            // Social groups and communities
            groups: [
              {
                groupId: 'ObjectId',
                groupName: 'string',
                role: 'string', // member, moderator, admin
                joinedAt: 'date'
              }
            ]
          },

          // Flexible metadata for extensibility
          metadata: {
            customFields: 'object', // Application-specific fields
            tags: ['string'],
            categories: ['string'],
            source: 'string', // registration_source
            referrer: 'string'
          }
        },

        // Validation rules
        validationRules: {
          required: ['username', 'email', 'profile.firstName', 'profile.lastName'],
          unique: ['username', 'email', 'userId'],
          patterns: {
            email: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
            username: /^[a-zA-Z0-9_]{3,30}$/
          },
          ranges: {
            'profile.yearsOfExperience': { min: 0, max: 70 },
            'activity.stats.profileCompleteness': { min: 0, max: 100 }
          }
        },

        // Index strategies for optimal performance
        indexStrategies: [
          { fields: { username: 1 }, unique: true },
          { fields: { email: 1 }, unique: true },
          { fields: { userId: 1 }, unique: true },
          { fields: { 'activity.lastActiveAt': -1 } },
          { fields: { 'activity.createdAt': -1 } },
          { fields: { 'profile.skills': 1 } },
          { fields: { 'metadata.tags': 1 } },

          // Compound indexes for common query patterns
          { fields: { 'activity.status': 1, 'activity.lastActiveAt': -1 } },
          { fields: { 'profile.company': 1, 'profile.jobTitle': 1 } },
          { fields: { 'settings.privacy.profileVisibility': 1, 'activity.stats.totalPosts': -1 } }
        ]
      };

      // Store schema definition
      this.documentSchemas.set('users', userSchema);

      console.log('User schema defined with embedded relationships and flexible structure');

    } catch (error) {
      console.error('Error defining user schema:', error);
      throw error;
    }
  }

  async definePostSchema() {
    console.log('Defining flexible post schema with content optimization...');

    try {
      const postSchema = {
        // Schema metadata
        schemaVersion: '1.0',
        schemaName: 'content_post',
        lastUpdated: new Date(),

        // Document structure optimized for content management
        documentStructure: {
          // Core identification
          _id: 'ObjectId',
          postId: 'string', // Application-level ID
          slug: 'string', // URL-friendly identifier

          // Author information (denormalized for performance)
          author: {
            userId: 'ObjectId',
            username: 'string',
            displayName: 'string',
            profilePicture: 'string',

            // Author stats (denormalized)
            totalPosts: 'number',
            followerCount: 'number',
            verified: 'boolean'
          },

          // Content structure (flexible for different content types)
          content: {
            // Basic content information
            title: 'string',
            subtitle: 'string',
            excerpt: 'string',
            body: 'string', // Main content
            contentType: 'string', // article, tutorial, review, announcement

            // Rich content elements
            media: [
              {
                type: 'string', // image, video, audio, embed
                url: 'string',
                thumbnailUrl: 'string',
                caption: 'string',
                altText: 'string',
                dimensions: {
                  width: 'number',
                  height: 'number'
                },
                fileSize: 'number',
                mimeType: 'string',
                duration: 'number', // For video/audio
                uploadedAt: 'date'
              }
            ],

            // Content structure and formatting
            sections: [
              {
                type: 'string', // paragraph, heading, list, code, quote
                content: 'string',
                level: 'number', // For headings
                language: 'string', // For code blocks
                order: 'number'
              }
            ],

            // SEO and metadata
            seo: {
              metaTitle: 'string',
              metaDescription: 'string',
              keywords: ['string'],
              canonicalUrl: 'string',
              openGraphImage: 'string',

              // Schema.org structured data
              structuredData: 'object'
            },

            // Content settings
            formatting: {
              readingTime: 'number', // Estimated reading time in minutes
              wordCount: 'number',
              language: 'string',
              rtlDirection: 'boolean'
            }
          },

          // Publication and status management
          publication: {
            // Status workflow
            status: 'string', // draft, review, published, archived, deleted
            visibility: 'string', // public, private, unlisted, password_protected
            password: 'string', // For password-protected posts

            // Publishing timeline
            createdAt: 'date',
            updatedAt: 'date',
            publishedAt: 'date',
            scheduledPublishAt: 'date',

            // Revision history (embedded for recent changes)
            revisions: [
              {
                version: 'number',
                changedAt: 'date',
                changedBy: 'ObjectId',
                changeType: 'string', // content, metadata, status
                changesSummary: 'string',
                previousTitle: 'string', // Track major changes
                previousContent: 'string' // Last few versions only
              }
            ],

            // Publishing settings
            allowComments: 'boolean',
            allowSharing: 'boolean',
            allowIndexing: 'boolean',
            requireApproval: 'boolean'
          },

          // Categorization and tagging (embedded for performance)
          taxonomy: {
            // Categories (hierarchical structure)
            categories: [
              {
                categoryId: 'ObjectId',
                name: 'string',
                slug: 'string',
                level: 'number', // For hierarchical categories
                parentCategory: 'string'
              }
            ],

            // Tags (flat structure for flexibility)
            tags: [
              {
                tag: 'string',
                relevanceScore: 'number',
                addedBy: 'ObjectId',
                addedAt: 'date'
              }
            ],

            // Custom taxonomies
            customFields: {
              difficulty: 'string', // For tutorials
              estimatedTime: 'number', // For how-to content
              targetAudience: 'string',
              prerequisites: ['string']
            }
          },

          // Engagement metrics (denormalized for performance)
          engagement: {
            // View statistics
            views: {
              total: 'number',
              unique: 'number',
              today: 'number',
              thisWeek: 'number',
              thisMonth: 'number',

              // View sources
              sources: {
                direct: 'number',
                social: 'number',
                search: 'number',
                referral: 'number'
              }
            },

            // Interaction statistics
            interactions: {
              likes: 'number',
              dislikes: 'number',
              shares: 'number',
              bookmarks: 'number',

              // Comment statistics
              comments: {
                total: 'number',
                approved: 'number',
                pending: 'number',
                spam: 'number'
              }
            },

            // Engagement metrics
            metrics: {
              engagementRate: 'number',
              averageTimeOnPage: 'number',
              bounceRate: 'number',
              socialShares: 'number'
            },

            // Top comments (embedded for performance)
            topComments: [
              {
                commentId: 'ObjectId',
                content: 'string',
                author: {
                  userId: 'ObjectId',
                  username: 'string',
                  profilePicture: 'string'
                },
                createdAt: 'date',
                likeCount: 'number',
                isHighlighted: 'boolean'
              }
            ]
          },

          // Comments (hybrid approach - recent embedded, full collection referenced)
          comments: {
            // Recent comments embedded for quick access
            recent: [
              {
                commentId: 'ObjectId',
                parentCommentId: 'ObjectId', // For threading
                content: 'string',

                // Author information (denormalized)
                author: {
                  userId: 'ObjectId',
                  username: 'string',
                  displayName: 'string',
                  profilePicture: 'string'
                },

                // Comment metadata
                createdAt: 'date',
                updatedAt: 'date',
                status: 'string', // approved, pending, spam, deleted

                // Comment engagement
                likeCount: 'number',
                replyCount: 'number',
                isEdited: 'boolean',
                isPinned: 'boolean',

                // Moderation
                flags: ['string'],
                moderationStatus: 'string'
              }
            ],

            // Statistics
            statistics: {
              totalComments: 'number',
              approvedComments: 'number',
              pendingComments: 'number',
              lastCommentAt: 'date'
            }
          },

          // Performance optimization data
          performance: {
            // Caching information
            lastCached: 'date',
            cacheVersion: 'string',

            // Search optimization
            searchTerms: ['string'], // Extracted keywords for search
            searchBoost: 'number', // Manual search ranking boost

            // Content analysis
            sentiment: {
              score: 'number', // -1 to 1
              magnitude: 'number',
              language: 'string'
            },

            readabilityScore: 'number',
            complexity: 'string' // simple, moderate, complex
          },

          // Flexible metadata
          metadata: {
            customFields: 'object',
            source: 'string', // web, mobile, api
            importedFrom: 'string',
            externalIds: 'object', // For integration with other systems

            // A/B testing
            experiments: [
              {
                experimentId: 'string',
                variant: 'string',
                startDate: 'date',
                endDate: 'date'
              }
            ]
          }
        },

        // Validation rules for data integrity
        validationRules: {
          required: ['content.title', 'author.userId', 'publication.status'],
          unique: ['slug', 'postId'],
          patterns: {
            slug: /^[a-z0-9-]+$/,
            'content.contentType': /^(article|tutorial|review|announcement|news)$/
          },
          ranges: {
            'content.formatting.readingTime': { min: 0, max: 300 },
            'engagement.metrics.engagementRate': { min: 0, max: 100 }
          }
        },

        // Index strategies optimized for content queries
        indexStrategies: [
          { fields: { slug: 1 }, unique: true },
          { fields: { postId: 1 }, unique: true },
          { fields: { 'author.userId': 1, 'publication.publishedAt': -1 } },
          { fields: { 'publication.status': 1, 'publication.publishedAt': -1 } },
          { fields: { 'taxonomy.categories.name': 1 } },
          { fields: { 'taxonomy.tags.tag': 1 } },

          // Text search index
          { fields: { 'content.title': 'text', 'content.body': 'text', 'taxonomy.tags.tag': 'text' } },

          // Performance optimization indexes
          { fields: { 'engagement.views.total': -1, 'publication.publishedAt': -1 } },
          { fields: { 'publication.visibility': 1, 'engagement.views.total': -1 } },
          { fields: { 'content.contentType': 1, 'publication.publishedAt': -1 } }
        ]
      };

      // Store schema definition
      this.documentSchemas.set('posts', postSchema);

      console.log('Post schema defined with flexible content structure and performance optimization');

    } catch (error) {
      console.error('Error defining post schema:', error);
      throw error;
    }
  }

  async createOptimizedUserProfile(userData, profileData = {}) {
    console.log(`Creating optimized user profile: ${userData.username}`);

    try {
      const userDocument = {
        // Core identification
        userId: userData.userId || new ObjectId().toString(),
        username: userData.username,
        email: userData.email,

        // Personal information (embedded)
        profile: {
          firstName: profileData.firstName || '',
          lastName: profileData.lastName || '',
          displayName: profileData.displayName || `${profileData.firstName || ''} ${profileData.lastName || ''}`.trim() || userData.username,
          bio: profileData.bio || '',
          dateOfBirth: profileData.dateOfBirth ? new Date(profileData.dateOfBirth) : null,
          phoneNumber: profileData.phoneNumber || '',

          // Professional information
          company: profileData.company || '',
          jobTitle: profileData.jobTitle || '',
          yearsOfExperience: profileData.yearsOfExperience || 0,
          educationLevel: profileData.educationLevel || '',

          // Skills and interests
          skills: profileData.skills || [],
          interests: profileData.interests || [],
          languages: profileData.languages || [
            { language: 'English', proficiency: 'native' }
          ],

          // Social media links
          socialMedia: {
            facebook: profileData.socialMedia?.facebook || '',
            twitter: profileData.socialMedia?.twitter || '',
            linkedin: profileData.socialMedia?.linkedin || '',
            instagram: profileData.socialMedia?.instagram || '',
            github: profileData.socialMedia?.github || '',
            website: profileData.socialMedia?.website || ''
          },

          // Profile media
          profilePicture: profileData.profilePicture ? {
            url: profileData.profilePicture.url,
            thumbnailUrl: profileData.profilePicture.thumbnailUrl || profileData.profilePicture.url,
            uploadedAt: new Date(),
            fileSize: profileData.profilePicture.fileSize || 0,
            dimensions: profileData.profilePicture.dimensions || { width: 0, height: 0 }
          } : null
        },

        // Contact information
        contact: {
          addresses: profileData.addresses || [],
          phoneNumbers: profileData.phoneNumbers || [],
          emailAddresses: [
            {
              email: userData.email,
              type: 'primary',
              isVerified: false,
              isPrimary: true
            }
          ]
        },

        // Account settings with sensible defaults
        settings: {
          privacy: {
            profileVisibility: 'public',
            emailVisible: false,
            phoneVisible: false,
            searchable: true
          },

          notifications: {
            email: {
              posts: true,
              comments: true,
              mentions: true,
              messages: true,
              newsletter: false,
              marketing: false
            },
            push: {
              posts: true,
              comments: true,
              mentions: true,
              messages: true
            },
            sms: {
              security: true,
              important: false
            }
          },

          interface: {
            theme: 'light',
            language: 'en',
            timezone: 'UTC',
            dateFormat: 'MM/DD/YYYY',
            currency: 'USD'
          },

          content: {
            defaultPostVisibility: 'public',
            autoSaveEnabled: true,
            contentLanguages: ['en']
          }
        },

        // Activity tracking
        activity: {
          createdAt: new Date(),
          updatedAt: new Date(),
          lastLoginAt: new Date(),
          lastActiveAt: new Date(),

          status: 'active',
          emailVerifiedAt: null,
          phoneVerifiedAt: null,

          // Initialize statistics
          stats: {
            totalPosts: 0,
            publishedPosts: 0,
            totalComments: 0,
            totalLikes: 0,
            totalViews: 0,
            followersCount: 0,
            followingCount: 0,
            engagementRate: 0,
            averagePostViews: 0,
            profileCompleteness: this.calculateProfileCompleteness(profileData)
          },

          recentActivities: [
            {
              type: 'account_created',
              timestamp: new Date(),
              details: { source: 'registration' }
            }
          ]
        },

        // Authentication (placeholder - would be handled by auth system)
        authentication: {
          passwordHash: '', // Would be set by authentication system
          passwordSalt: '',
          lastPasswordChange: new Date(),
          twoFactorEnabled: false,
          activeSessions: [],
          securityEvents: []
        },

        // Initialize content relationships
        content: {
          recentPosts: [],
          favoritePostIds: [],
          bookmarks: []
        },

        // Initialize social relationships
        social: {
          following: [],
          followerIds: [],
          groups: []
        },

        // Metadata
        metadata: {
          customFields: profileData.customFields || {},
          tags: profileData.tags || [],
          categories: profileData.categories || [],
          source: profileData.source || 'direct_registration',
          referrer: profileData.referrer || ''
        }
      };

      // Insert user document
      const result = await this.db.collection('users').insertOne(userDocument);

      // Update activity statistics
      await this.updateUserStatistics(result.insertedId);

      return {
        success: true,
        userId: result.insertedId,
        userDocument: userDocument,
        profileCompleteness: userDocument.activity.stats.profileCompleteness
      };

    } catch (error) {
      console.error(`Error creating user profile for ${userData.username}:`, error);
      return {
        success: false,
        error: error.message,
        username: userData.username
      };
    }
  }

  async createOptimizedPost(postData, authorId) {
    console.log(`Creating optimized post: ${postData.title}`);

    try {
      // Get author information for denormalization
      const author = await this.db.collection('users').findOne(
        { _id: new ObjectId(authorId) },
        {
          projection: {
            username: 1,
            'profile.displayName': 1,
            'profile.profilePicture.url': 1,
            'activity.stats.totalPosts': 1,
            'activity.stats.followersCount': 1
          }
        }
      );

      if (!author) {
        throw new Error('Author not found');
      }

      const postDocument = {
        // Core identification
        postId: postData.postId || new ObjectId().toString(),
        slug: postData.slug || this.generateSlug(postData.title),

        // Author information (denormalized)
        author: {
          userId: new ObjectId(authorId),
          username: author.username,
          displayName: author.profile?.displayName || author.username,
          profilePicture: author.profile?.profilePicture?.url || '',
          totalPosts: author.activity?.stats?.totalPosts || 0,
          followerCount: author.activity?.stats?.followersCount || 0,
          verified: false // Would be determined by verification system
        },

        // Content structure
        content: {
          title: postData.title,
          subtitle: postData.subtitle || '',
          excerpt: postData.excerpt || this.generateExcerpt(postData.body),
          body: postData.body,
          contentType: postData.contentType || 'article',

          // Media content
          media: postData.media || [],

          // Content sections (for structured content)
          sections: this.parseContentSections(postData.body),

          // SEO optimization
          seo: {
            metaTitle: postData.seo?.metaTitle || postData.title,
            metaDescription: postData.seo?.metaDescription || postData.excerpt,
            keywords: postData.seo?.keywords || this.extractKeywords(postData.body),
            canonicalUrl: postData.seo?.canonicalUrl || '',
            openGraphImage: postData.featuredImage || ''
          },

          // Content formatting
          formatting: {
            readingTime: this.calculateReadingTime(postData.body),
            wordCount: this.calculateWordCount(postData.body),
            language: postData.language || 'en',
            rtlDirection: postData.rtlDirection || false
          }
        },

        // Publication settings
        publication: {
          status: postData.status || 'draft',
          visibility: postData.visibility || 'public',
          password: postData.password || '',

          createdAt: new Date(),
          updatedAt: new Date(),
          publishedAt: postData.status === 'published' ? new Date() : null,
          scheduledPublishAt: postData.scheduledPublishAt ? new Date(postData.scheduledPublishAt) : null,

          revisions: [
            {
              version: 1,
              changedAt: new Date(),
              changedBy: new ObjectId(authorId),
              changeType: 'content',
              changesSummary: 'Initial post creation'
            }
          ],

          allowComments: postData.allowComments !== false,
          allowSharing: postData.allowSharing !== false,
          allowIndexing: postData.allowIndexing !== false,
          requireApproval: postData.requireApproval || false
        },

        // Taxonomy
        taxonomy: {
          categories: (postData.categories || []).map(cat => ({
            categoryId: cat.categoryId ? new ObjectId(cat.categoryId) : new ObjectId(),
            name: cat.name || cat,
            slug: this.generateSlug(cat.name || cat),
            level: cat.level || 1,
            parentCategory: cat.parent || ''
          })),

          tags: (postData.tags || []).map(tag => ({
            tag: typeof tag === 'string' ? tag : tag.name,
            relevanceScore: typeof tag === 'object' ? tag.relevance : 1.0,
            addedBy: new ObjectId(authorId),
            addedAt: new Date()
          })),

          customFields: postData.customFields || {}
        },

        // Initialize engagement metrics
        engagement: {
          views: {
            total: 0,
            unique: 0,
            today: 0,
            thisWeek: 0,
            thisMonth: 0,
            sources: {
              direct: 0,
              social: 0,
              search: 0,
              referral: 0
            }
          },

          interactions: {
            likes: 0,
            dislikes: 0,
            shares: 0,
            bookmarks: 0,
            comments: {
              total: 0,
              approved: 0,
              pending: 0,
              spam: 0
            }
          },

          metrics: {
            engagementRate: 0,
            averageTimeOnPage: 0,
            bounceRate: 0,
            socialShares: 0
          },

          topComments: []
        },

        // Initialize comments
        comments: {
          recent: [],
          statistics: {
            totalComments: 0,
            approvedComments: 0,
            pendingComments: 0,
            lastCommentAt: null
          }
        },

        // Performance data
        performance: {
          lastCached: null,
          cacheVersion: '1.0',
          searchTerms: this.extractSearchTerms(postData.title, postData.body),
          searchBoost: postData.searchBoost || 1.0,
          sentiment: this.analyzeSentiment(postData.body),
          readabilityScore: this.calculateReadabilityScore(postData.body),
          complexity: this.assessComplexity(postData.body)
        },

        // Metadata
        metadata: {
          customFields: postData.metadata || {},
          source: postData.source || 'web',
          importedFrom: postData.importedFrom || '',
          externalIds: postData.externalIds || {},
          experiments: postData.experiments || []
        }
      };

      // Insert post document
      const result = await this.db.collection('posts').insertOne(postDocument);

      // Update author statistics
      await this.updateAuthorStatistics(authorId, 'post_created');

      // Update user's recent posts
      await this.updateUserRecentPosts(authorId, result.insertedId, postDocument);

      return {
        success: true,
        postId: result.insertedId,
        postDocument: postDocument,
        readingTime: postDocument.content.formatting.readingTime,
        wordCount: postDocument.content.formatting.wordCount
      };

    } catch (error) {
      console.error(`Error creating post '${postData.title}':`, error);
      return {
        success: false,
        error: error.message,
        title: postData.title
      };
    }
  }

  async performAdvancedQuery(queryOptions) {
    console.log('Executing advanced MongoDB query with optimized document structure...');

    try {
      const {
        collection,
        filters = {},
        projection = {},
        sort = {},
        limit = 50,
        skip = 0,
        includeRelated = false
      } = queryOptions;

      // Build aggregation pipeline for complex queries
      const pipeline = [];

      // Match stage
      if (Object.keys(filters).length > 0) {
        pipeline.push({ $match: filters });
      }

      // Add related data if requested
      if (includeRelated && collection === 'posts') {
        pipeline.push(
          // Add full comment documents for recent comments
          // (combining localField/foreignField with a nested pipeline in $lookup
          // requires MongoDB 5.0 or newer)
          {
            $lookup: {
              from: 'comments',
              localField: '_id',
              foreignField: 'postId',
              as: 'fullComments',
              pipeline: [
                { $match: { status: 'approved' } },
                { $sort: { createdAt: -1 } },
                { $limit: 10 }
              ]
            }
          },

          // Add author's full profile
          {
            $lookup: {
              from: 'users',
              localField: 'author.userId',
              foreignField: '_id',
              as: 'authorProfile',
              pipeline: [
                {
                  $project: {
                    username: 1,
                    'profile.displayName': 1,
                    'profile.bio': 1,
                    'profile.profilePicture': 1,
                    'activity.stats': 1
                  }
                }
              ]
            }
          }
        );
      }

      // Projection stage
      if (Object.keys(projection).length > 0) {
        pipeline.push({ $project: projection });
      }

      // Sort stage
      if (Object.keys(sort).length > 0) {
        pipeline.push({ $sort: sort });
      }

      // Pagination
      if (skip > 0) {
        pipeline.push({ $skip: skip });
      }

      if (limit > 0) {
        pipeline.push({ $limit: limit });
      }

      // Execute aggregation
      const results = await this.db.collection(collection).aggregate(pipeline).toArray();

      return {
        success: true,
        results: results,
        count: results.length,
        pipeline: pipeline
      };

    } catch (error) {
      console.error('Error executing advanced query:', error);
      return {
        success: false,
        error: error.message,
        queryOptions: queryOptions
      };
    }
  }

  // Utility methods for document processing and optimization

  calculateProfileCompleteness(profileData) {
    let score = 0;
    const maxScore = 100;

    // Basic information (40 points)
    if (profileData.firstName) score += 10;
    if (profileData.lastName) score += 10;
    if (profileData.bio) score += 10;
    if (profileData.profilePicture) score += 10;

    // Professional information (30 points)
    if (profileData.company) score += 10;
    if (profileData.jobTitle) score += 10;
    if (profileData.skills && profileData.skills.length > 0) score += 10;

    // Contact information (20 points)
    if (profileData.phoneNumber) score += 10;
    if (profileData.addresses && profileData.addresses.length > 0) score += 10;

    // Additional information (10 points)
    if (profileData.socialMedia && Object.values(profileData.socialMedia).some(url => url)) score += 10;

    return Math.min(score, maxScore);
  }

  generateSlug(title) {
    return title
      .toLowerCase()
      .replace(/[^a-z0-9\s-]/g, '')
      .replace(/\s+/g, '-')
      .replace(/-+/g, '-')
      .replace(/^-+|-+$/g, ''); // strip leading/trailing hyphens (trim() takes no argument)
  }

  generateExcerpt(body, maxLength = 200) {
    const text = body.replace(/<[^>]*>/g, '').trim(); // Remove HTML tags
    return text.length > maxLength ? text.substring(0, maxLength) + '...' : text;
  }

  calculateReadingTime(text) {
    const wordsPerMinute = 200;
    const wordCount = this.calculateWordCount(text);
    return Math.ceil(wordCount / wordsPerMinute);
  }

  calculateWordCount(text) {
    const cleanText = text.replace(/<[^>]*>/g, '').trim(); // Remove HTML tags
    return cleanText.split(/\s+/).filter(word => word.length > 0).length;
  }

  extractKeywords(text, maxKeywords = 10) {
    // Simple keyword extraction - in production, use NLP libraries
    const words = text.toLowerCase().match(/\b\w{4,}\b/g) || [];
    const frequency = {};

    words.forEach(word => {
      frequency[word] = (frequency[word] || 0) + 1;
    });

    return Object.entries(frequency)
      .sort(([, a], [, b]) => b - a)
      .slice(0, maxKeywords)
      .map(([word]) => word);
  }

  extractSearchTerms(title, body) {
    const titleWords = title.toLowerCase().match(/\b\w{3,}\b/g) || [];
    const bodyWords = this.extractKeywords(body, 20);
    return [...new Set([...titleWords, ...bodyWords])];
  }

  parseContentSections(body) {
    // Simple section parsing - would be more sophisticated in production
    const sections = [];
    const lines = body.split('\n');
    let order = 0;

    lines.forEach(line => {
      const trimmed = line.trim();
      if (trimmed.startsWith('#')) {
        const level = trimmed.match(/^#+/)[0].length;
        sections.push({
          type: 'heading',
          content: trimmed.replace(/^#+\s*/, ''),
          level: level,
          order: order++
        });
      } else if (trimmed.startsWith('```')) {
        sections.push({
          type: 'code',
          content: trimmed.replace(/```(\w+)?/, ''),
          language: trimmed.match(/```(\w+)/)?.[1] || 'text',
          order: order++
        });
      } else if (trimmed.length > 0) {
        sections.push({
          type: 'paragraph',
          content: trimmed,
          order: order++
        });
      }
    });

    return sections;
  }

  analyzeSentiment(text) {
    // Placeholder sentiment analysis - use proper NLP library in production
    const positiveWords = ['good', 'great', 'excellent', 'amazing', 'wonderful', 'fantastic'];
    const negativeWords = ['bad', 'terrible', 'awful', 'horrible', 'disappointing'];

    const words = text.toLowerCase().split(/\s+/);
    let score = 0;

    words.forEach(word => {
      if (positiveWords.includes(word)) score += 0.1;
      if (negativeWords.includes(word)) score -= 0.1;
    });

    return {
      score: Math.max(-1, Math.min(1, score)),
      magnitude: Math.abs(score),
      language: 'en'
    };
  }

  calculateReadabilityScore(text) {
    // Simple readability calculation - use proper libraries in production
    const sentences = text.split(/[.!?]+/).filter(s => s.trim().length > 0);
    const words = text.split(/\s+/);
    const avgWordsPerSentence = words.length / sentences.length;

    // Simple scoring based on average sentence length
    if (avgWordsPerSentence < 15) return 90;
    if (avgWordsPerSentence < 20) return 70;
    if (avgWordsPerSentence < 25) return 50;
    return 30;
  }

  assessComplexity(text) {
    const wordCount = this.calculateWordCount(text);
    const readabilityScore = this.calculateReadabilityScore(text);

    if (wordCount < 500 && readabilityScore > 70) return 'simple';
    if (wordCount < 2000 && readabilityScore > 50) return 'moderate';
    return 'complex';
  }

  async updateUserStatistics(userId) {
    // Update user statistics after profile changes
    await this.db.collection('users').updateOne(
      { _id: new ObjectId(userId) },
      {
        $set: {
          'activity.updatedAt': new Date()
        }
      }
    );
  }

  async updateAuthorStatistics(authorId, action) {
    const updates = {};

    if (action === 'post_created') {
      updates['$inc'] = {
        'activity.stats.totalPosts': 1
      };
    }

    updates['$set'] = {
      'activity.updatedAt': new Date(),
      'activity.lastActiveAt': new Date()
    };

    await this.db.collection('users').updateOne(
      { _id: new ObjectId(authorId) },
      updates
    );
  }

  async updateUserRecentPosts(userId, postId, postDocument) {
    await this.db.collection('users').updateOne(
      { _id: new ObjectId(userId) },
      {
        $push: {
          'content.recentPosts': {
            $each: [
              {
                postId: postId,
                title: postDocument.content.title,
                createdAt: postDocument.publication.createdAt,
                status: postDocument.publication.status,
                viewCount: 0,
                likeCount: 0
              }
            ],
            $slice: -10 // Keep only the 10 most recent posts
          }
        }
      }
    );
  }

  async setupOptimizedIndexes() {
    console.log('Setting up optimized indexes for document collections...');

    try {
      // Apply indexes from schema definitions
      for (const [collectionName, schema] of this.documentSchemas.entries()) {
        const collection = this.db.collection(collectionName);

        for (const indexStrategy of schema.indexStrategies) {
          // 'background: true' is unnecessary on MongoDB 4.2+ where index builds
          // no longer block the collection
          const indexOptions = {
            unique: indexStrategy.unique || false,
            sparse: indexStrategy.sparse || false
          };

          // Only pass a partial filter when the strategy defines one, rather than
          // sending an explicit undefined value to the server
          if (indexStrategy.partialFilterExpression) {
            indexOptions.partialFilterExpression = indexStrategy.partialFilterExpression;
          }

          await collection.createIndex(indexStrategy.fields, indexOptions);
        }
      }

      console.log('Optimized indexes created successfully');

    } catch (error) {
      console.error('Error setting up optimized indexes:', error);
      throw error;
    }
  }

  async applySchemaValidation() {
    console.log('Applying schema validation rules...');

    try {
      // Apply validation for users collection
      // Note: $jsonSchema treats dotted keys as literal field names, so nested
      // fields must be validated through nested 'properties' objects
      await this.db.createCollection('users', {
        validator: {
          $jsonSchema: {
            bsonType: 'object',
            required: ['username', 'email'],
            properties: {
              username: {
                bsonType: 'string',
                pattern: '^[a-zA-Z0-9_]{3,30}$',
                description: 'Username must be 3-30 characters with only letters, numbers, and underscores'
              },
              email: {
                bsonType: 'string',
                pattern: '^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$',
                description: 'Valid email address required'
              },
              profile: {
                bsonType: 'object',
                properties: {
                  yearsOfExperience: {
                    bsonType: 'int',
                    minimum: 0,
                    maximum: 70,
                    description: 'Years of experience must be between 0 and 70'
                  }
                }
              }
            }
          }
        },
        validationLevel: this.config.validationLevel,
        validationAction: this.config.strictValidation ? 'error' : 'warn'
      });

      // Apply validation for posts collection
      await this.db.createCollection('posts', {
        validator: {
          $jsonSchema: {
            bsonType: 'object',
            required: ['content', 'author'],
            properties: {
              content: {
                bsonType: 'object',
                required: ['title'],
                properties: {
                  title: {
                    bsonType: 'string',
                    minLength: 1,
                    maxLength: 500,
                    description: 'Post title is required and must be 1-500 characters'
                  },
                  contentType: {
                    bsonType: 'string',
                    enum: ['article', 'tutorial', 'review', 'announcement', 'news'],
                    description: 'Content type must be one of the predefined values'
                  }
                }
              },
              author: {
                bsonType: 'object',
                required: ['userId'],
                description: 'Denormalized author information with a required userId'
              },
              publication: {
                bsonType: 'object',
                properties: {
                  status: {
                    bsonType: 'string',
                    enum: ['draft', 'review', 'published', 'archived', 'deleted'],
                    description: 'Publication status must be one of the predefined values'
                  }
                }
              }
            }
          }
        },
        validationLevel: this.config.validationLevel,
        validationAction: this.config.strictValidation ? 'error' : 'warn'
      });

      console.log('Schema validation rules applied successfully');

    } catch (error) {
      // Collections may already exist (index setup creates them first); in that
      // case createCollection fails and the validator is not applied here -
      // use the collMod command to update validation rules on an existing collection
      if (!error.message.includes('already exists')) {
        console.error('Error applying schema validation:', error);
        throw error;
      }
    }
  }
}

// Benefits of MongoDB Advanced Data Modeling:
// - Flexible document structures that adapt to application requirements
// - Embedded relationships for optimal read performance and data locality
// - Denormalized data patterns for reduced join operations and improved query speed
// - Hierarchical data modeling with natural document nesting capabilities
// - Schema evolution support without complex migration procedures
// - Optimized indexing strategies for diverse query patterns
// - Rich data types including arrays, objects, and geospatial data
// - Query pattern optimization through strategic embedding and referencing
// - SQL-compatible operations through QueryLeaf integration
// - Production-ready data modeling patterns for scalable applications

module.exports = {
  AdvancedDataModelingManager
};
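
The manager above can be exercised end to end once its asynchronous initialization has finished. A brief usage sketch, assuming a local MongoDB instance and a hypothetical content_platform database; the short delay stands in for a proper readiness hook, since the constructor starts initializeDataModeling() without awaiting it:

// Example usage sketch for AdvancedDataModelingManager
async function runModelingDemo() {
  const manager = new AdvancedDataModelingManager('mongodb://localhost:27017/content_platform');

  // Crude wait for the constructor-triggered initialization to complete;
  // a production wrapper would expose and await the initialization promise
  await new Promise(resolve => setTimeout(resolve, 1000));

  const user = await manager.createOptimizedUserProfile(
    { username: 'jane_doe', email: 'jane@example.com' },
    { firstName: 'Jane', lastName: 'Doe', skills: ['mongodb', 'node.js'] }
  );
  if (!user.success) throw new Error(user.error);
  console.log('Profile completeness:', user.profileCompleteness);

  const post = await manager.createOptimizedPost(
    {
      title: 'Modeling Embedded Relationships in MongoDB',
      body: 'Document databases favor data locality over joins...',
      tags: ['mongodb', 'data-modeling'],
      status: 'published'
    },
    user.userId
  );
  console.log('Post created:', post.success, '| reading time (min):', post.readingTime);
}

runModelingDemo().catch(console.error);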

Understanding MongoDB Document Architecture

Advanced Schema Design and Relationship Optimization Patterns

Implement sophisticated data modeling workflows for enterprise MongoDB applications:

// Enterprise-grade data modeling with advanced relationship management capabilities
class EnterpriseDataModelingOrchestrator extends AdvancedDataModelingManager {
  constructor(mongoUri, enterpriseConfig) {
    super(mongoUri, enterpriseConfig);

    this.enterpriseConfig = {
      ...enterpriseConfig,
      enableAdvancedRelationships: true,
      enableDataGovernance: true,
      enablePerformanceOptimization: true,
      enableComplianceValidation: true,
      enableSchemaEvolution: true
    };

    // These setup helpers are assumed to be implemented by the application layer;
    // they are not defined on the AdvancedDataModelingManager base class above
    this.setupEnterpriseCapabilities();
    this.initializeDataGovernance();
    this.setupAdvancedRelationshipManagement();
  }

  async implementAdvancedDataStrategy() {
    console.log('Implementing enterprise data modeling strategy...');

    const dataStrategy = {
      // Multi-tier data organization
      dataTiers: {
        operationalData: {
          embedding: 'aggressive',
          caching: 'memory',
          indexing: 'comprehensive',
          validation: 'strict'
        },
        analyticalData: {
          embedding: 'conservative',
          caching: 'disk',
          indexing: 'selective',
          validation: 'moderate'
        },
        archivalData: {
          embedding: 'minimal',
          caching: 'none',
          indexing: 'basic',
          validation: 'basic'
        }
      },

      // Advanced relationship management
      relationshipManagement: {
        dynamicReferencing: true,
        cascadingOperations: true,
        relationshipIndexing: true,
        crossCollectionValidation: true
      }
    };

    return await this.deployDataStrategy(dataStrategy);
  }

  async setupAdvancedDataGovernance() {
    console.log('Setting up enterprise data governance...');

    const governanceCapabilities = {
      // Data quality management
      dataQuality: {
        validationRules: true,
        dataCleansingPipelines: true,
        qualityMonitoring: true,
        anomalyDetection: true
      },

      // Compliance and auditing
      compliance: {
        dataLineage: true,
        auditTrails: true,
        privacyControls: true,
        retentionPolicies: true
      },

      // Schema governance
      schemaGovernance: {
        versionControl: true,
        changeApproval: true,
        backwardCompatibility: true,
        migrationAutomation: true
      }
    };

    return await this.deployDataGovernance(governanceCapabilities);
  }
}
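
A minimal usage sketch for the orchestrator above (the connection string and configuration values are illustrative assumptions, and deployDataStrategy() / deployDataGovernance() are assumed to be implemented elsewhere in the manager):

// Hypothetical usage of EnterpriseDataModelingOrchestrator - values below are illustrative assumptions
const orchestrator = new EnterpriseDataModelingOrchestrator(
  'mongodb://localhost:27017/content_platform',
  { validationLevel: 'moderate', strictValidation: false }
);

async function runEnterpriseDataModeling() {
  // Deploy the tiered data strategy and governance capabilities defined above
  const strategyResult = await orchestrator.implementAdvancedDataStrategy();
  const governanceResult = await orchestrator.setupAdvancedDataGovernance();

  console.log('Data strategy deployed:', strategyResult);
  console.log('Data governance deployed:', governanceResult);
}

runEnterpriseDataModeling().catch(error => {
  console.error('Enterprise data modeling setup failed:', error);
});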

SQL-Style Data Modeling with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB data modeling and schema operations:

-- QueryLeaf advanced data modeling operations with SQL-familiar syntax for MongoDB

-- Create comprehensive user profile schema with embedded relationships
CREATE DOCUMENT_SCHEMA user_profiles AS (
  -- Core identification
  user_id VARCHAR(24) PRIMARY KEY,
  username VARCHAR(255) UNIQUE NOT NULL,
  email VARCHAR(255) UNIQUE NOT NULL,

  -- Embedded personal information
  profile OBJECT(
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    display_name VARCHAR(200),
    bio TEXT,
    date_of_birth DATE,
    phone_number VARCHAR(20),

    -- Professional information (embedded object)
    professional OBJECT(
      company VARCHAR(255),
      job_title VARCHAR(255),
      years_experience INTEGER CHECK(years_experience >= 0 AND years_experience <= 70),
      education_level VARCHAR(100),
      skills ARRAY[VARCHAR(100)],
      languages ARRAY[OBJECT(
        language VARCHAR(50),
        proficiency VARCHAR(20) CHECK(proficiency IN ('beginner', 'intermediate', 'advanced', 'native'))
      )]
    ),

    -- Social media links (embedded object)
    social_media OBJECT(
      facebook VARCHAR(255),
      twitter VARCHAR(255),
      linkedin VARCHAR(255),
      instagram VARCHAR(255),
      github VARCHAR(255),
      website VARCHAR(255)
    ),

    -- Profile media (embedded object)
    profile_picture OBJECT(
      url VARCHAR(500),
      thumbnail_url VARCHAR(500),
      uploaded_at TIMESTAMP,
      file_size INTEGER,
      dimensions OBJECT(
        width INTEGER,
        height INTEGER
      )
    )
  ),

  -- Contact information (embedded array)
  contact OBJECT(
    addresses ARRAY[OBJECT(
      type VARCHAR(20) CHECK(type IN ('home', 'work', 'billing', 'shipping')),
      address_line_1 VARCHAR(255),
      address_line_2 VARCHAR(255),
      city VARCHAR(100),
      state VARCHAR(100),
      postal_code VARCHAR(20),
      country VARCHAR(100),
      is_primary BOOLEAN DEFAULT false,
      coordinates OBJECT(
        latitude DECIMAL(10, 7),
        longitude DECIMAL(10, 7)
      )
    )],

    phone_numbers ARRAY[OBJECT(
      type VARCHAR(20) CHECK(type IN ('mobile', 'home', 'work')),
      number VARCHAR(20),
      country_code VARCHAR(5),
      is_primary BOOLEAN DEFAULT false,
      is_verified BOOLEAN DEFAULT false
    )],

    email_addresses ARRAY[OBJECT(
      email VARCHAR(255),
      type VARCHAR(20) CHECK(type IN ('primary', 'work', 'personal')),
      is_verified BOOLEAN DEFAULT false,
      is_primary BOOLEAN DEFAULT false
    )]
  ),

  -- User settings (embedded object)
  settings OBJECT(
    privacy OBJECT(
      profile_visibility VARCHAR(20) CHECK(profile_visibility IN ('public', 'private', 'friends')) DEFAULT 'public',
      email_visible BOOLEAN DEFAULT false,
      phone_visible BOOLEAN DEFAULT false,
      searchable BOOLEAN DEFAULT true
    ),

    notifications OBJECT(
      email OBJECT(
        posts BOOLEAN DEFAULT true,
        comments BOOLEAN DEFAULT true,
        mentions BOOLEAN DEFAULT true,
        messages BOOLEAN DEFAULT true,
        newsletter BOOLEAN DEFAULT false,
        marketing BOOLEAN DEFAULT false
      ),
      push OBJECT(
        posts BOOLEAN DEFAULT true,
        comments BOOLEAN DEFAULT true,
        mentions BOOLEAN DEFAULT true,
        messages BOOLEAN DEFAULT true
      ),
      sms OBJECT(
        security BOOLEAN DEFAULT true,
        important BOOLEAN DEFAULT false
      )
    ),

    interface OBJECT(
      theme VARCHAR(20) CHECK(theme IN ('light', 'dark', 'auto')) DEFAULT 'light',
      language VARCHAR(10) DEFAULT 'en',
      timezone VARCHAR(50) DEFAULT 'UTC',
      date_format VARCHAR(20) DEFAULT 'MM/DD/YYYY',
      currency VARCHAR(3) DEFAULT 'USD'
    )
  ),

  -- Activity tracking (embedded object)
  activity OBJECT(
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_login_at TIMESTAMP,
    last_active_at TIMESTAMP,

    status VARCHAR(20) CHECK(status IN ('active', 'inactive', 'suspended', 'deleted')) DEFAULT 'active',
    email_verified_at TIMESTAMP,
    phone_verified_at TIMESTAMP,

    -- Denormalized statistics for performance
    stats OBJECT(
      total_posts INTEGER DEFAULT 0,
      published_posts INTEGER DEFAULT 0,
      total_comments INTEGER DEFAULT 0,
      total_likes INTEGER DEFAULT 0,
      total_views INTEGER DEFAULT 0,
      followers_count INTEGER DEFAULT 0,
      following_count INTEGER DEFAULT 0,
      engagement_rate DECIMAL(5,2) DEFAULT 0.0,
      average_post_views DECIMAL(10,2) DEFAULT 0.0,
      profile_completeness DECIMAL(5,2) DEFAULT 0.0
    ),

    -- Recent activities (embedded array with limited size)
    recent_activities ARRAY[OBJECT(
      type VARCHAR(50),
      timestamp TIMESTAMP,
      details OBJECT,
      ip_address VARCHAR(45),
      user_agent VARCHAR(500)
    )] -- Limited to last 50 activities
  ),

  -- Content relationships (selective embedding/referencing)
  content OBJECT(
    -- Recent posts embedded for performance
    recent_posts ARRAY[OBJECT(
      post_id VARCHAR(24),
      title VARCHAR(500),
      created_at TIMESTAMP,
      status VARCHAR(20),
      view_count INTEGER,
      like_count INTEGER
    )], -- Limited to last 10 posts

    -- Large collections referenced
    favorite_post_ids ARRAY[VARCHAR(24)],

    -- Bookmarks with metadata
    bookmarks ARRAY[OBJECT(
      content_id VARCHAR(24),
      content_type VARCHAR(20) CHECK(content_type IN ('post', 'comment', 'user')),
      bookmarked_at TIMESTAMP,
      tags ARRAY[VARCHAR(50)],
      notes TEXT
    )]
  ),

  -- Social relationships (hybrid approach)
  social OBJECT(
    -- Following relationships (embedded for moderate size)
    following ARRAY[OBJECT(
      user_id VARCHAR(24),
      username VARCHAR(255),
      followed_at TIMESTAMP,
      relationship_type VARCHAR(20) CHECK(relationship_type IN ('friend', 'colleague', 'interest'))
    )],

    -- Large follower lists referenced
    follower_ids ARRAY[VARCHAR(24)],

    -- Group memberships
    groups ARRAY[OBJECT(
      group_id VARCHAR(24),
      group_name VARCHAR(255),
      role VARCHAR(20) CHECK(role IN ('member', 'moderator', 'admin')),
      joined_at TIMESTAMP
    )]
  ),

  -- Flexible metadata for extensibility
  metadata OBJECT(
    custom_fields OBJECT,
    tags ARRAY[VARCHAR(50)],
    categories ARRAY[VARCHAR(50)],
    source VARCHAR(100),
    referrer VARCHAR(255)
  ),

  -- Indexes for optimal performance
  INDEX idx_username (username),
  INDEX idx_email (email),
  INDEX idx_status_last_active (activity.status, activity.last_active_at DESC),
  INDEX idx_skills (profile.professional.skills),
  INDEX idx_location (contact.addresses.city, contact.addresses.state),

  -- Text search index
  INDEX idx_text_search ON (
    username TEXT,
    profile.display_name TEXT,
    profile.bio TEXT,
    profile.professional.skills TEXT
  ),

  -- Compound indexes for common query patterns
  INDEX idx_visibility_stats (settings.privacy.profile_visibility, activity.stats.total_posts DESC),
  INDEX idx_company_role (profile.professional.company, profile.professional.job_title)
);
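
-- A minimal usage sketch against the schema above (illustrative only; assumes the
-- QueryLeaf dot-notation access to embedded fields used throughout these examples)
SELECT
  user_id,
  username,
  profile.professional.company,
  activity.stats.followers_count
FROM user_profiles
WHERE activity.status = 'active'
  AND profile.professional.company IS NOT NULL
ORDER BY activity.stats.followers_count DESC
LIMIT 25;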

-- Advanced post schema with flexible content structure
CREATE DOCUMENT_SCHEMA content_posts AS (
  -- Core identification
  post_id VARCHAR(24) PRIMARY KEY,
  slug VARCHAR(500) UNIQUE NOT NULL,

  -- Author information (denormalized for performance)
  author OBJECT(
    user_id VARCHAR(24) NOT NULL,
    username VARCHAR(255) NOT NULL,
    display_name VARCHAR(200),
    profile_picture VARCHAR(500),
    total_posts INTEGER,
    follower_count INTEGER,
    verified BOOLEAN DEFAULT false
  ),

  -- Flexible content structure
  content OBJECT(
    title VARCHAR(500) NOT NULL,
    subtitle VARCHAR(500),
    excerpt TEXT,
    body TEXT NOT NULL,
    content_type VARCHAR(20) CHECK(content_type IN ('article', 'tutorial', 'review', 'announcement', 'news')) DEFAULT 'article',

    -- Rich media content
    media ARRAY[OBJECT(
      type VARCHAR(20) CHECK(type IN ('image', 'video', 'audio', 'embed')),
      url VARCHAR(1000),
      thumbnail_url VARCHAR(1000),
      caption TEXT,
      alt_text TEXT,
      dimensions OBJECT(
        width INTEGER,
        height INTEGER
      ),
      file_size INTEGER,
      mime_type VARCHAR(100),
      duration INTEGER, -- For video/audio
      uploaded_at TIMESTAMP
    )],

    -- Structured content sections
    sections ARRAY[OBJECT(
      type VARCHAR(20) CHECK(type IN ('paragraph', 'heading', 'list', 'code', 'quote')),
      content TEXT,
      level INTEGER, -- For headings
      language VARCHAR(20), -- For code blocks
      order_index INTEGER
    )],

    -- SEO and metadata
    seo OBJECT(
      meta_title VARCHAR(500),
      meta_description TEXT,
      keywords ARRAY[VARCHAR(100)],
      canonical_url VARCHAR(1000),
      open_graph_image VARCHAR(1000),
      structured_data OBJECT
    ),

    -- Content analysis
    formatting OBJECT(
      reading_time INTEGER, -- Minutes
      word_count INTEGER,
      language VARCHAR(10) DEFAULT 'en',
      rtl_direction BOOLEAN DEFAULT false
    )
  ),

  -- Publication management
  publication OBJECT(
    status VARCHAR(20) CHECK(status IN ('draft', 'review', 'published', 'archived', 'deleted')) DEFAULT 'draft',
    visibility VARCHAR(20) CHECK(visibility IN ('public', 'private', 'unlisted', 'password_protected')) DEFAULT 'public',
    password VARCHAR(255),

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    published_at TIMESTAMP,
    scheduled_publish_at TIMESTAMP,

    -- Revision tracking (limited to recent changes)
    revisions ARRAY[OBJECT(
      version INTEGER,
      changed_at TIMESTAMP,
      changed_by VARCHAR(24),
      change_type VARCHAR(20) CHECK(change_type IN ('content', 'metadata', 'status')),
      changes_summary TEXT,
      previous_title VARCHAR(500),
      previous_content TEXT
    )], -- Limited to last 10 revisions

    allow_comments BOOLEAN DEFAULT true,
    allow_sharing BOOLEAN DEFAULT true,
    allow_indexing BOOLEAN DEFAULT true,
    require_approval BOOLEAN DEFAULT false
  ),

  -- Categorization and tagging
  taxonomy OBJECT(
    categories ARRAY[OBJECT(
      category_id VARCHAR(24),
      name VARCHAR(255),
      slug VARCHAR(255),
      level INTEGER,
      parent_category VARCHAR(255)
    )],

    tags ARRAY[OBJECT(
      tag VARCHAR(100),
      relevance_score DECIMAL(3,2) DEFAULT 1.0,
      added_by VARCHAR(24),
      added_at TIMESTAMP
    )],

    custom_fields OBJECT(
      difficulty VARCHAR(20), -- For tutorials
      estimated_time INTEGER, -- For how-to content
      target_audience VARCHAR(100),
      prerequisites ARRAY[VARCHAR(100)]
    )
  ),

  -- Engagement metrics (denormalized for performance)
  engagement OBJECT(
    views OBJECT(
      total INTEGER DEFAULT 0,
      unique INTEGER DEFAULT 0,
      today INTEGER DEFAULT 0,
      this_week INTEGER DEFAULT 0,
      this_month INTEGER DEFAULT 0,
      sources OBJECT(
        direct INTEGER DEFAULT 0,
        social INTEGER DEFAULT 0,
        search INTEGER DEFAULT 0,
        referral INTEGER DEFAULT 0
      )
    ),

    interactions OBJECT(
      likes INTEGER DEFAULT 0,
      dislikes INTEGER DEFAULT 0,
      shares INTEGER DEFAULT 0,
      bookmarks INTEGER DEFAULT 0,
      comments OBJECT(
        total INTEGER DEFAULT 0,
        approved INTEGER DEFAULT 0,
        pending INTEGER DEFAULT 0,
        spam INTEGER DEFAULT 0
      )
    ),

    metrics OBJECT(
      engagement_rate DECIMAL(5,2) DEFAULT 0.0,
      average_time_on_page INTEGER DEFAULT 0, -- Seconds
      bounce_rate DECIMAL(5,2) DEFAULT 0.0,
      social_shares INTEGER DEFAULT 0
    ),

    -- Top comments embedded for quick access
    top_comments ARRAY[OBJECT(
      comment_id VARCHAR(24),
      content TEXT,
      author OBJECT(
        user_id VARCHAR(24),
        username VARCHAR(255),
        profile_picture VARCHAR(500)
      ),
      created_at TIMESTAMP,
      like_count INTEGER,
      is_highlighted BOOLEAN DEFAULT false
    )] -- Limited to top 5 comments
  ),

  -- Comment management (hybrid approach)
  comments OBJECT(
    -- Recent comments embedded
    recent ARRAY[OBJECT(
      comment_id VARCHAR(24),
      parent_comment_id VARCHAR(24),
      content TEXT,
      author OBJECT(
        user_id VARCHAR(24),
        username VARCHAR(255),
        display_name VARCHAR(200),
        profile_picture VARCHAR(500)
      ),
      created_at TIMESTAMP,
      updated_at TIMESTAMP,
      status VARCHAR(20) CHECK(status IN ('approved', 'pending', 'spam', 'deleted')) DEFAULT 'approved',
      like_count INTEGER DEFAULT 0,
      reply_count INTEGER DEFAULT 0,
      is_edited BOOLEAN DEFAULT false,
      is_pinned BOOLEAN DEFAULT false,
      flags ARRAY[VARCHAR(50)],
      moderation_status VARCHAR(20)
    )], -- Limited to last 20 comments

    statistics OBJECT(
      total_comments INTEGER DEFAULT 0,
      approved_comments INTEGER DEFAULT 0,
      pending_comments INTEGER DEFAULT 0,
      last_comment_at TIMESTAMP
    )
  ),

  -- Performance optimization
  performance OBJECT(
    last_cached TIMESTAMP,
    cache_version VARCHAR(10),
    search_terms ARRAY[VARCHAR(100)],
    search_boost DECIMAL(3,2) DEFAULT 1.0,

    sentiment OBJECT(
      score DECIMAL(3,2), -- -1 to 1
      magnitude DECIMAL(3,2),
      language VARCHAR(10)
    ),

    readability_score INTEGER,
    complexity VARCHAR(20) CHECK(complexity IN ('simple', 'moderate', 'complex'))
  ),

  -- Flexible metadata
  metadata OBJECT(
    custom_fields OBJECT,
    source VARCHAR(50) DEFAULT 'web',
    imported_from VARCHAR(100),
    external_ids OBJECT,

    experiments ARRAY[OBJECT(
      experiment_id VARCHAR(50),
      variant VARCHAR(50),
      start_date DATE,
      end_date DATE
    )]
  ),

  -- Optimized indexes for content queries
  INDEX idx_slug (slug),
  INDEX idx_author_published (author.user_id, publication.published_at DESC),
  INDEX idx_status_published (publication.status, publication.published_at DESC),
  INDEX idx_categories (taxonomy.categories.name),
  INDEX idx_tags (taxonomy.tags.tag),
  INDEX idx_engagement (engagement.views.total DESC, publication.published_at DESC),

  -- Text search index for content
  INDEX idx_content_search ON (
    content.title TEXT,
    content.body TEXT,
    taxonomy.tags.tag TEXT,
    taxonomy.categories.name TEXT
  ),

  -- Compound indexes for complex queries
  INDEX idx_visibility_engagement (publication.visibility, engagement.views.total DESC),
  INDEX idx_type_published (content.content_type, publication.published_at DESC),
  INDEX idx_author_stats (author.user_id, engagement.interactions.likes DESC)
);

-- Advanced data modeling analysis and optimization queries
WITH document_structure_analysis AS (
  SELECT 
    collection_name,
    COUNT(*) as total_documents,

    -- Document size analysis
    AVG(BSON_SIZE(document)) as avg_document_size_bytes,
    MAX(BSON_SIZE(document)) as max_document_size_bytes,
    MIN(BSON_SIZE(document)) as min_document_size_bytes,

    -- Embedded array analysis
    AVG(ARRAY_LENGTH(profile.professional.skills)) as avg_skills_count,
    AVG(ARRAY_LENGTH(contact.addresses)) as avg_addresses_count,
    AVG(ARRAY_LENGTH(social.following)) as avg_following_count,

    -- Nested object complexity
    AVG(OBJECT_DEPTH(profile)) as avg_profile_depth,
    AVG(OBJECT_DEPTH(settings)) as avg_settings_depth,
    AVG(OBJECT_DEPTH(activity)) as avg_activity_depth,

    -- Data completeness analysis
    COUNT(*) FILTER (WHERE profile.first_name IS NOT NULL) as profiles_with_first_name,
    COUNT(*) FILTER (WHERE profile.bio IS NOT NULL) as profiles_with_bio,
    COUNT(*) FILTER (WHERE profile.professional.company IS NOT NULL) as profiles_with_company,
    COUNT(*) FILTER (WHERE contact.addresses IS NOT NULL AND ARRAY_LENGTH(contact.addresses) > 0) as profiles_with_address,

    -- Activity patterns
    AVG(activity.stats.total_posts) as avg_posts_per_user,
    AVG(activity.stats.profile_completeness) as avg_profile_completeness,

    -- Relationship analysis
    AVG(ARRAY_LENGTH(content.favorite_post_ids)) as avg_favorites_per_user,
    AVG(ARRAY_LENGTH(social.follower_ids)) as avg_followers_per_user

  FROM USER_PROFILES
  GROUP BY collection_name
),

performance_optimization_analysis AS (
  SELECT 
    dsa.*,

    -- Document size categorization
    CASE 
      WHEN dsa.avg_document_size_bytes < 16384 THEN 'optimal_size' -- < 16KB
      WHEN dsa.avg_document_size_bytes < 65536 THEN 'good_size'     -- < 64KB
      WHEN dsa.avg_document_size_bytes < 262144 THEN 'large_size'   -- < 256KB
      ELSE 'very_large_size'                                        -- >= 256KB
    END as document_size_category,

    -- Embedding effectiveness
    CASE 
      WHEN dsa.avg_skills_count > 20 THEN 'consider_referencing_skills'
      WHEN dsa.avg_following_count > 1000 THEN 'consider_referencing_following'
      WHEN dsa.avg_addresses_count > 5 THEN 'consider_referencing_addresses'
      ELSE 'embedding_appropriate'
    END as embedding_recommendation,

    -- Data completeness scoring
    ROUND(
      (dsa.profiles_with_first_name * 100.0 / dsa.total_documents + 
       dsa.profiles_with_bio * 100.0 / dsa.total_documents + 
       dsa.profiles_with_company * 100.0 / dsa.total_documents + 
       dsa.profiles_with_address * 100.0 / dsa.total_documents) / 4, 
      2
    ) as overall_data_completeness_percent,

    -- Performance indicators
    CASE 
      WHEN dsa.avg_profile_depth > 4 THEN 'consider_flattening_structure'
      WHEN dsa.max_document_size_bytes > 1048576 THEN 'critical_size_optimization_needed' -- > 1MB
      WHEN dsa.avg_followers_per_user > 10000 THEN 'implement_follower_pagination'
      ELSE 'structure_optimized'
    END as structure_optimization_recommendation,

    -- Index strategy recommendations
    ARRAY[
      CASE WHEN dsa.profiles_with_company * 100.0 / dsa.total_documents > 60 
           THEN 'Add index on profile.professional.company' END,
      CASE WHEN dsa.avg_skills_count > 3 
           THEN 'Optimize skills array indexing' END,
      CASE WHEN dsa.profiles_with_address * 100.0 / dsa.total_documents > 70 
           THEN 'Add geospatial index for addresses' END,
      CASE WHEN dsa.avg_posts_per_user > 50 
           THEN 'Consider post relationship optimization' END
    ]::TEXT[] as indexing_recommendations

  FROM document_structure_analysis dsa
),

content_modeling_analysis AS (
  SELECT 
    'content_posts' as collection_name,
    COUNT(*) as total_posts,

    -- Content structure analysis
    AVG(BSON_SIZE(content)) as avg_content_size_bytes,
    AVG(content.formatting.word_count) as avg_word_count,
    AVG(content.formatting.reading_time) as avg_reading_time_minutes,
    AVG(ARRAY_LENGTH(content.media)) as avg_media_items,

    -- Taxonomy analysis
    AVG(ARRAY_LENGTH(taxonomy.categories)) as avg_categories_per_post,
    AVG(ARRAY_LENGTH(taxonomy.tags)) as avg_tags_per_post,

    -- Engagement patterns
    AVG(engagement.views.total) as avg_total_views,
    AVG(engagement.interactions.likes) as avg_likes,
    AVG(engagement.interactions.comments.total) as avg_comments,

    -- Comment embedding analysis
    AVG(ARRAY_LENGTH(comments.recent)) as avg_embedded_comments,
    MAX(ARRAY_LENGTH(comments.recent)) as max_embedded_comments,

    -- Content type distribution
    COUNT(*) FILTER (WHERE content.content_type = 'article') as article_count,
    COUNT(*) FILTER (WHERE content.content_type = 'tutorial') as tutorial_count,
    COUNT(*) FILTER (WHERE content.content_type = 'review') as review_count,

    -- Publication patterns
    COUNT(*) FILTER (WHERE publication.status = 'published') as published_posts,
    COUNT(*) FILTER (WHERE publication.status = 'draft') as draft_posts,

    -- Performance metrics
    AVG(performance.readability_score) as avg_readability_score,
    COUNT(*) FILTER (WHERE performance.complexity = 'simple') as simple_content,
    COUNT(*) FILTER (WHERE performance.complexity = 'moderate') as moderate_content,
    COUNT(*) FILTER (WHERE performance.complexity = 'complex') as complex_content

  FROM CONTENT_POSTS
  WHERE publication.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days'
)

SELECT 
  poa.collection_name,
  poa.total_documents,
  poa.document_size_category,

  -- Size metrics
  ROUND(poa.avg_document_size_bytes / 1024.0, 2) as avg_size_kb,
  ROUND(poa.max_document_size_bytes / 1024.0, 2) as max_size_kb,

  -- Structure analysis
  ROUND(poa.avg_profile_depth, 1) as avg_nesting_depth,
  poa.embedding_recommendation,
  poa.structure_optimization_recommendation,

  -- Data quality
  ROUND(poa.overall_data_completeness_percent, 1) as data_completeness_percent,
  ROUND(poa.avg_profile_completeness, 1) as avg_profile_completeness,

  -- Relationship metrics
  ROUND(poa.avg_skills_count, 1) as avg_skills_per_user,
  ROUND(poa.avg_following_count, 1) as avg_following_per_user,
  ROUND(poa.avg_followers_per_user, 1) as avg_followers_per_user,

  -- Performance recommendations
  ARRAY_REMOVE(poa.indexing_recommendations, NULL) as optimization_recommendations,

  -- Data modeling assessment
  CASE 
    WHEN poa.document_size_category = 'very_large_size' THEN 'critical_optimization_needed'
    WHEN poa.embedding_recommendation != 'embedding_appropriate' THEN 'relationship_optimization_needed'
    WHEN poa.overall_data_completeness_percent < 60 THEN 'data_quality_improvement_needed'
    ELSE 'data_model_optimized'
  END as overall_assessment,

  -- Specific action items
  ARRAY[
    CASE WHEN poa.avg_document_size_bytes > 262144 
         THEN 'Split large documents or reference large arrays' END,
    CASE WHEN poa.overall_data_completeness_percent < 50 
         THEN 'Implement data validation and user onboarding improvements' END,
    CASE WHEN poa.avg_followers_per_user > 5000 
         THEN 'Implement follower pagination and lazy loading' END,
    CASE WHEN poa.max_document_size_bytes > 1048576 
         THEN 'URGENT: Address oversized documents immediately' END
  ]::TEXT[] as action_items,

  -- Performance impact
  CASE 
    WHEN poa.document_size_category IN ('large_size', 'very_large_size') THEN 'high_performance_impact'
    WHEN poa.embedding_recommendation != 'embedding_appropriate' THEN 'medium_performance_impact'
    ELSE 'low_performance_impact'
  END as performance_impact

FROM performance_optimization_analysis poa

UNION ALL

-- Content analysis results
SELECT 
  cma.collection_name,
  cma.total_posts as total_documents,

  CASE 
    WHEN cma.avg_content_size_bytes < 32768 THEN 'optimal_size'
    WHEN cma.avg_content_size_bytes < 131072 THEN 'good_size' 
    WHEN cma.avg_content_size_bytes < 524288 THEN 'large_size'
    ELSE 'very_large_size'
  END as document_size_category,

  ROUND(cma.avg_content_size_bytes / 1024.0, 2) as avg_size_kb,
  0 as max_size_kb, -- Placeholder for union compatibility

  0 as avg_nesting_depth, -- Placeholder

  CASE 
    WHEN cma.avg_media_items > 10 THEN 'consider_referencing_media'
    WHEN cma.max_embedded_comments > 50 THEN 'optimize_comment_embedding'
    ELSE 'embedding_appropriate'
  END as embedding_recommendation,

  CASE 
    WHEN cma.avg_content_size_bytes > 524288 THEN 'split_large_content'
    WHEN cma.avg_embedded_comments > 25 THEN 'implement_comment_pagination'
    ELSE 'structure_optimized'
  END as structure_optimization_recommendation,

  ROUND((cma.published_posts * 100.0 / cma.total_posts), 1) as data_completeness_percent,
  ROUND(cma.avg_readability_score, 1) as avg_profile_completeness,

  ROUND(cma.avg_categories_per_post, 1) as avg_skills_per_user,
  ROUND(cma.avg_tags_per_post, 1) as avg_following_per_user,
  ROUND(cma.avg_total_views, 0) as avg_followers_per_user,

  ARRAY[
    CASE WHEN cma.avg_word_count > 3000 THEN 'Consider content length optimization' END,
    CASE WHEN cma.avg_media_items > 5 THEN 'Optimize media storage and delivery' END,
    CASE WHEN cma.complex_content > cma.total_posts * 0.3 THEN 'Improve content readability' END
  ]::TEXT[] as optimization_recommendations,

  CASE 
    WHEN cma.avg_content_size_bytes > 524288 THEN 'critical_optimization_needed'
    WHEN cma.avg_embedded_comments > 25 THEN 'relationship_optimization_needed'
    ELSE 'data_model_optimized'
  END as overall_assessment,

  ARRAY[
    CASE WHEN cma.avg_content_size_bytes > 262144 THEN 'Optimize content storage and caching' END,
    CASE WHEN cma.max_embedded_comments > 50 THEN 'Implement comment pagination' END
  ]::TEXT[] as action_items,

  CASE 
    WHEN cma.avg_content_size_bytes > 262144 THEN 'high_performance_impact'
    ELSE 'low_performance_impact'
  END as performance_impact

FROM content_modeling_analysis cma
ORDER BY performance_impact DESC, total_documents DESC;

-- QueryLeaf provides comprehensive MongoDB data modeling capabilities:
-- 1. Flexible document schema design with embedded and referenced relationships
-- 2. Advanced validation rules and constraints for data integrity
-- 3. Optimized indexing strategies for diverse query patterns
-- 4. Performance-focused embedding and referencing decisions
-- 5. Schema evolution support with backward compatibility
-- 6. Data quality analysis and optimization recommendations
-- 7. SQL-familiar syntax for complex MongoDB data operations
-- 8. Enterprise-grade data governance and compliance features
-- 9. Automated performance optimization and monitoring
-- 10. Production-ready data modeling patterns for scalable applications

Best Practices for Production Data Modeling

Document Design Strategy and Performance Optimization

Essential principles for effective MongoDB data modeling in production environments:

  1. Embedding vs. Referencing Strategy: Design optimal data relationships based on access patterns, update frequency, and document size constraints (see the sketch after this list)
  2. Schema Evolution Planning: Implement flexible schemas that can evolve with application requirements while maintaining backward compatibility
  3. Performance-First Design: Optimize document structures for common query patterns and minimize the need for complex aggregations
  4. Data Integrity Management: Establish validation rules, referential integrity patterns, and data quality monitoring procedures
  5. Indexing Strategy: Design comprehensive indexing strategies that support diverse query patterns while minimizing storage overhead
  6. Scalability Considerations: Plan for growth patterns and design document structures that scale efficiently with data volume
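
The sketch below illustrates the embedding-versus-referencing decision from point 1 using the user/follower pattern discussed earlier; the collection names, field paths, and the 1,000-follower threshold are illustrative assumptions rather than fixed recommendations:

// Illustrative embedding vs. referencing sketch (names and threshold are assumptions)
const { MongoClient } = require('mongodb');

async function saveUserWithFollowers(uri, user, followerIds) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const db = client.db('content_platform');

    if (followerIds.length <= 1000) {
      // Small, bounded relationship: embed follower ids directly for single-read access
      await db.collection('user_profiles').updateOne(
        { _id: user._id },
        { $set: { 'social.follower_ids': followerIds } },
        { upsert: true }
      );
    } else {
      // Large, unbounded relationship: reference followers in a separate collection
      // and keep only a denormalized count on the user document
      const edges = followerIds.map(followerId => ({ userId: user._id, followerId }));
      await db.collection('user_followers').insertMany(edges, { ordered: false });
      await db.collection('user_profiles').updateOne(
        { _id: user._id },
        { $set: { 'activity.stats.followers_count': followerIds.length } },
        { upsert: true }
      );
    }
  } finally {
    await client.close();
  }
}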

Enterprise Data Governance

Implement comprehensive data governance for enterprise-scale applications:

  1. Data Quality Framework: Establish automated data validation, cleansing pipelines, and quality monitoring systems
  2. Schema Governance: Implement version control, change approval processes, and automated migration procedures for schema evolution (see the sketch after this list)
  3. Compliance Integration: Ensure data modeling patterns meet regulatory requirements and industry standards
  4. Performance Monitoring: Monitor query performance, document size growth, and relationship efficiency continuously
  5. Data Lifecycle Management: Design retention policies, archival strategies, and data purging procedures
  6. Documentation Standards: Maintain comprehensive documentation for schemas, relationships, and optimization decisions
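
As a concrete starting point for schema governance (point 2), the sketch below applies a versioned JSON Schema validator with MongoDB's collMod command; the collection name, version field, and validation rules are illustrative assumptions:

// Minimal schema-governance sketch (assumed collection and rules; adjust to your environment)
const { MongoClient } = require('mongodb');

async function applyValidatorVersion(uri) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const db = client.db('content_platform');

    // Version 2 of the user_profiles validator: new rules are additive so that
    // existing documents remain valid (backward compatibility)
    await db.command({
      collMod: 'user_profiles',
      validator: {
        $jsonSchema: {
          bsonType: 'object',
          required: ['username', 'email'],
          properties: {
            username: { bsonType: 'string', description: 'Username is required' },
            email: { bsonType: 'string', pattern: '^.+@.+$', description: 'Email must look like an address' },
            schemaVersion: { bsonType: 'int', minimum: 1, description: 'Document schema version tag' }
          }
        }
      },
      validationLevel: 'moderate', // only validate inserts and updates to already-valid documents
      validationAction: 'warn'     // log violations instead of rejecting writes during rollout
    });
  } finally {
    await client.close();
  }
}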

Conclusion

MongoDB data modeling provides comprehensive document design capabilities that enable sophisticated relationship management, flexible schema evolution, and performance-optimized data structures through embedded documents, selective referencing, and intelligent denormalization strategies. The native document model and rich data types ensure that applications can represent complex data relationships naturally while maintaining optimal query performance.

Key MongoDB Data Modeling benefits include:

  • Flexible Document Structures: Rich document model with native support for arrays, embedded objects, and hierarchical data organization
  • Optimized Relationships: Strategic embedding and referencing patterns that balance performance, consistency, and maintainability
  • Schema Evolution: Dynamic schema capabilities that adapt to changing requirements without complex migration procedures
  • Performance Optimization: Document design patterns that minimize query complexity and maximize read/write efficiency
  • Data Integrity: Comprehensive validation rules, constraints, and referential integrity patterns for production data quality
  • SQL Accessibility: Familiar SQL-style data modeling operations through QueryLeaf for accessible document design

Whether you're designing user management systems, content platforms, e-commerce applications, or analytical systems, MongoDB data modeling with QueryLeaf's familiar SQL interface provides the foundation for sophisticated, scalable document-oriented applications.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB data modeling operations while providing SQL-familiar syntax for schema design, relationship management, and validation rules. Advanced document structures, embedding strategies, and performance optimization are seamlessly handled through familiar SQL constructs, making sophisticated data modeling accessible to SQL-oriented development teams.

The combination of MongoDB's flexible document capabilities with SQL-style modeling operations makes it an ideal platform for applications requiring both complex data relationships and familiar database design patterns, ensuring your data architecture can evolve efficiently while maintaining performance and consistency as application complexity and data volume grow.

MongoDB GridFS Large File Storage and Management: Advanced Distributed File Systems and Binary Data Operations for Enterprise Applications

Modern applications require sophisticated file storage capabilities that can handle large binary files, multimedia content, and document management while providing distributed access, version control, and efficient streaming. Traditional file system approaches struggle with scalability, metadata management, and integration with database operations, leading to complex architectures with separate storage systems, synchronization challenges, and operational overhead that complicates application development and deployment.

MongoDB GridFS provides comprehensive large file storage through distributed binary data management, efficient chunk-based storage, integrated metadata handling, and streaming capabilities that enable seamless file operations within database transactions. Unlike traditional file systems that require separate storage infrastructure and complex synchronization, GridFS integrates file storage directly into MongoDB with automatic chunking, replica set distribution, and transactional consistency.
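
Before the full management framework later in this article, here is a minimal GridFS round trip using the official Node.js driver; the connection string, database, bucket name, and file paths are illustrative assumptions:

// Minimal GridFS upload/download sketch (connection string, bucket, and paths are assumptions)
const { MongoClient, GridFSBucket } = require('mongodb');
const fs = require('fs');

async function gridFSRoundTrip() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('media');
  const bucket = new GridFSBucket(db, { bucketName: 'documents', chunkSizeBytes: 255 * 1024 });

  // Upload: the driver splits the stream into chunks (documents.chunks)
  // and records file metadata in documents.files
  await new Promise((resolve, reject) => {
    fs.createReadStream('./report.pdf')
      .pipe(bucket.openUploadStream('report.pdf', { metadata: { category: 'reports' } }))
      .on('finish', resolve)
      .on('error', reject);
  });

  // Download by filename: chunks are streamed back in order
  await new Promise((resolve, reject) => {
    bucket.openDownloadStreamByName('report.pdf')
      .pipe(fs.createWriteStream('./report-copy.pdf'))
      .on('finish', resolve)
      .on('error', reject);
  });

  await client.close();
}

gridFSRoundTrip().catch(console.error);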

The Traditional Large File Storage Challenge

Conventional approaches to large file storage in application architectures face significant limitations:

-- Traditional file storage management - complex infrastructure with limited integration capabilities

-- Basic file metadata tracking table with minimal functionality
CREATE TABLE file_metadata (
    file_id SERIAL PRIMARY KEY,
    file_name VARCHAR(255) NOT NULL,
    file_path TEXT NOT NULL,
    file_type VARCHAR(100),
    mime_type VARCHAR(100),

    -- Basic file information (limited metadata)
    file_size_bytes BIGINT,
    created_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    modified_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_by VARCHAR(100),

    -- Storage location tracking (manual management)
    storage_location VARCHAR(200),
    storage_server VARCHAR(100),
    storage_partition VARCHAR(50),

    -- Basic versioning (very limited)
    version_number INTEGER DEFAULT 1,
    is_current_version BOOLEAN DEFAULT true,
    parent_file_id INTEGER REFERENCES file_metadata(file_id),

    -- Access control (basic)
    access_permissions VARCHAR(50) DEFAULT 'private',
    owner_user_id INTEGER,

    -- File status
    file_status VARCHAR(50) DEFAULT 'active',
    checksum VARCHAR(64),

    -- Backup and replication tracking
    backup_status VARCHAR(50) DEFAULT 'pending',
    last_backup_time TIMESTAMP,
    replication_status VARCHAR(50) DEFAULT 'single'
);

-- File chunk storage simulation (very basic)
CREATE TABLE file_chunks (
    chunk_id SERIAL PRIMARY KEY,
    file_id INTEGER REFERENCES file_metadata(file_id),
    chunk_number INTEGER NOT NULL,
    chunk_size_bytes INTEGER NOT NULL,

    -- Chunk storage (can't actually store binary data efficiently)
    chunk_data TEXT, -- Base64 encoded - very inefficient
    chunk_checksum VARCHAR(64),

    -- Storage tracking
    storage_location VARCHAR(200),
    compression_applied BOOLEAN DEFAULT false,
    compression_ratio DECIMAL(5,2),

    UNIQUE(file_id, chunk_number)
);

-- Manual file upload processing function (very limited functionality)
CREATE OR REPLACE FUNCTION process_file_upload(
    file_name_param VARCHAR(255),
    file_path_param TEXT,
    file_size_param BIGINT,
    chunk_size_param INTEGER DEFAULT 1048576 -- 1MB chunks
) RETURNS TABLE (
    upload_success BOOLEAN,
    file_id INTEGER,
    total_chunks INTEGER,
    processing_time_seconds INTEGER,
    error_message TEXT
) AS $$
DECLARE
    new_file_id INTEGER;
    total_chunks_count INTEGER;
    chunk_counter INTEGER := 1;
    processing_start TIMESTAMP;
    processing_end TIMESTAMP;
    upload_error TEXT := '';
    upload_result BOOLEAN := true;
    simulated_chunk_data TEXT;
BEGIN
    processing_start := clock_timestamp();

    BEGIN
        -- Calculate total chunks needed
        total_chunks_count := CEILING(file_size_param::DECIMAL / chunk_size_param);

        -- Create file metadata record
        INSERT INTO file_metadata (
            file_name, file_path, file_size_bytes, 
            storage_location, checksum
        )
        VALUES (
            file_name_param, file_path_param, file_size_param,
            '/storage/files/' || EXTRACT(YEAR FROM CURRENT_DATE) || '/' || 
            EXTRACT(MONTH FROM CURRENT_DATE) || '/',
            MD5(file_name_param || file_size_param::TEXT) -- Basic checksum
        )
        RETURNING file_metadata.file_id INTO new_file_id;

        -- Simulate chunk processing (very basic)
        WHILE chunk_counter <= total_chunks_count LOOP
            -- Calculate chunk size for this chunk
            DECLARE
                current_chunk_size INTEGER;
            BEGIN
                IF chunk_counter = total_chunks_count THEN
                    current_chunk_size := file_size_param - ((chunk_counter - 1) * chunk_size_param);
                ELSE
                    current_chunk_size := chunk_size_param;
                END IF;

                -- Simulate chunk data (can't actually handle binary data efficiently)
                simulated_chunk_data := 'chunk_' || chunk_counter || '_data_placeholder';

                -- Insert chunk record
                INSERT INTO file_chunks (
                    file_id, chunk_number, chunk_size_bytes, 
                    chunk_data, chunk_checksum, storage_location
                )
                VALUES (
                    new_file_id, chunk_counter, current_chunk_size,
                    simulated_chunk_data,
                    MD5(simulated_chunk_data),
                    '/storage/chunks/' || new_file_id || '/' || chunk_counter
                );

                chunk_counter := chunk_counter + 1;

                -- Simulate processing time
                PERFORM pg_sleep(0.01);
            END;
        END LOOP;

        -- Update file status
        UPDATE file_metadata 
        SET file_status = 'available',
            modified_date = clock_timestamp()
        WHERE file_id = new_file_id;

    EXCEPTION WHEN OTHERS THEN
        upload_result := false;
        upload_error := SQLERRM;

        -- Cleanup on failure
        DELETE FROM file_chunks WHERE file_id = new_file_id;
        DELETE FROM file_metadata WHERE file_id = new_file_id;
    END;

    processing_end := clock_timestamp();

    RETURN QUERY SELECT 
        upload_result,
        new_file_id,
        total_chunks_count,
        EXTRACT(EPOCH FROM processing_end - processing_start)::INTEGER,
        CASE WHEN NOT upload_result THEN upload_error ELSE NULL END;

END;
$$ LANGUAGE plpgsql;

-- Basic file download function (very limited streaming capabilities)
CREATE OR REPLACE FUNCTION download_file_chunks(file_id_param INTEGER)
RETURNS TABLE (
    chunk_number INTEGER,
    chunk_size_bytes INTEGER,
    chunk_data TEXT,
    download_order INTEGER
) AS $$
BEGIN
    -- Simple chunk retrieval (no streaming, no optimization)
    RETURN QUERY
    SELECT 
        fc.chunk_number,
        fc.chunk_size_bytes,
        fc.chunk_data,
        fc.chunk_number as download_order
    FROM file_chunks fc
    WHERE fc.file_id = file_id_param
    ORDER BY fc.chunk_number;

    -- Update download statistics (basic tracking)
    UPDATE file_metadata 
    SET modified_date = CURRENT_TIMESTAMP
    WHERE file_id = file_id_param;
END;
$$ LANGUAGE plpgsql;

-- Execute file upload simulation
SELECT * FROM process_file_upload('large_document.pdf', '/uploads/large_document.pdf', 50000000, 1048576);

-- Basic file management and cleanup
WITH file_storage_analysis AS (
    SELECT 
        fm.file_id,
        fm.file_name,
        fm.file_size_bytes,
        fm.created_date,
        fm.file_status,
        COUNT(fc.chunk_id) as total_chunks,
        SUM(fc.chunk_size_bytes) as total_chunk_size,

        -- Storage efficiency calculation (basic)
        CASE 
            WHEN fm.file_size_bytes > 0 THEN
                (SUM(fc.chunk_size_bytes)::DECIMAL / fm.file_size_bytes) * 100
            ELSE 0
        END as storage_efficiency_percent,

        -- Age analysis
        EXTRACT(DAY FROM CURRENT_TIMESTAMP - fm.created_date) as file_age_days,

        -- Basic categorization
        CASE 
            WHEN fm.file_size_bytes > 100 * 1024 * 1024 THEN 'large'
            WHEN fm.file_size_bytes > 10 * 1024 * 1024 THEN 'medium'
            ELSE 'small'
        END as file_size_category

    FROM file_metadata fm
    LEFT JOIN file_chunks fc ON fm.file_id = fc.file_id
    WHERE fm.created_date >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY fm.file_id, fm.file_name, fm.file_size_bytes, fm.created_date, fm.file_status
)
SELECT 
    fsa.file_name,
    fsa.file_size_category,
    ROUND(fsa.file_size_bytes / 1024.0 / 1024.0, 2) as file_size_mb,
    fsa.total_chunks,
    ROUND(fsa.storage_efficiency_percent, 1) as storage_efficiency_percent,
    fsa.file_age_days,
    fsa.file_status,

    -- Storage recommendations (very basic)
    CASE 
        WHEN fsa.storage_efficiency_percent < 95 THEN 'check_chunk_integrity'
        WHEN fsa.file_age_days > 365 AND fsa.file_status = 'active' THEN 'consider_archiving'
        WHEN fsa.total_chunks = 0 THEN 'missing_chunks'
        ELSE 'normal'
    END as recommendation

FROM file_storage_analysis fsa
ORDER BY fsa.file_size_bytes DESC, fsa.created_date DESC;

-- Basic file cleanup (manual process)
WITH old_files AS (
    SELECT file_id, file_name, file_size_bytes
    FROM file_metadata
    WHERE created_date < CURRENT_DATE - INTERVAL '2 years'
    AND file_status = 'archived'
),
cleanup_chunks AS (
    DELETE FROM file_chunks
    WHERE file_id IN (SELECT file_id FROM old_files)
    RETURNING file_id, chunk_size_bytes
),
cleanup_files AS (
    DELETE FROM file_metadata
    WHERE file_id IN (SELECT file_id FROM old_files)
    RETURNING file_id, file_size_bytes
)
SELECT 
    COUNT(DISTINCT cf.file_id) as files_cleaned,
    SUM(cf.file_size_bytes) as total_space_freed_bytes,
    ROUND(SUM(cf.file_size_bytes) / 1024.0 / 1024.0 / 1024.0, 2) as space_freed_gb,
    COUNT(cc.file_id) as chunks_cleaned
FROM cleanup_files cf
LEFT JOIN cleanup_chunks cc ON cf.file_id = cc.file_id;

-- Problems with traditional file storage approaches:
-- 1. Inefficient binary data handling in relational databases
-- 2. Manual chunk management with no automatic optimization
-- 3. Limited streaming capabilities and poor performance for large files
-- 4. No built-in replication or distributed storage features
-- 5. Basic metadata management with limited search capabilities
-- 6. Complex backup and recovery procedures for file data
-- 7. No transactional consistency between file operations and database operations
-- 8. Limited scalability for high-volume file storage requirements
-- 9. No built-in compression or space optimization features
-- 10. Manual versioning and access control management

MongoDB GridFS provides comprehensive large file storage with advanced binary data management:

// MongoDB GridFS Advanced File Storage - comprehensive binary data management with streaming capabilities
const { MongoClient, GridFSBucket } = require('mongodb');
const fs = require('fs');
const { createReadStream, createWriteStream } = require('fs');
const { pipeline } = require('stream');
const { promisify } = require('util');
const crypto = require('crypto');
const path = require('path');
const { EventEmitter } = require('events');

// Comprehensive MongoDB GridFS File Manager
class AdvancedGridFSFileManager extends EventEmitter {
  constructor(connectionString, gridFSConfig = {}) {
    super();
    this.connectionString = connectionString;
    this.client = null;
    this.db = null;
    this.gridFSBuckets = new Map();

    // Advanced GridFS configuration
    this.config = {
      // Bucket configuration
      defaultBucket: gridFSConfig.defaultBucket || 'fs',
      customBuckets: gridFSConfig.customBuckets || {},
      chunkSizeBytes: gridFSConfig.chunkSizeBytes || 261120, // 255KB default

      // File management settings
      enableMetadataIndexing: gridFSConfig.enableMetadataIndexing !== false,
      enableVersionControl: gridFSConfig.enableVersionControl || false,
      enableCompression: gridFSConfig.enableCompression || false,
      enableEncryption: gridFSConfig.enableEncryption || false,

      // Storage optimization
      enableAutomaticCleanup: gridFSConfig.enableAutomaticCleanup || false,
      enableDeduplication: gridFSConfig.enableDeduplication || false,
      enableThumbnailGeneration: gridFSConfig.enableThumbnailGeneration || false,

      // Performance configuration
      enableParallelUploads: gridFSConfig.enableParallelUploads || false,
      maxConcurrentUploads: gridFSConfig.maxConcurrentUploads || 5,
      enableStreamingOptimization: gridFSConfig.enableStreamingOptimization || false,

      // Access control and security
      enableAccessControl: gridFSConfig.enableAccessControl || false,
      defaultPermissions: gridFSConfig.defaultPermissions || 'private',
      enableAuditLogging: gridFSConfig.enableAuditLogging || false,

      // Backup and replication
      enableBackupIntegration: gridFSConfig.enableBackupIntegration || false,
      enableReplicationMonitoring: gridFSConfig.enableReplicationMonitoring || false,

      // File processing
      enableContentAnalysis: gridFSConfig.enableContentAnalysis || false,
      enableVirusScan: gridFSConfig.enableVirusScan || false,
      enableFormatValidation: gridFSConfig.enableFormatValidation || false
    };

    // File management state
    this.activeUploads = new Map();
    this.activeDownloads = new Map();
    this.fileOperations = new Map();
    this.uploadQueue = [];

    // Performance metrics
    this.metrics = {
      totalFilesStored: 0,
      totalBytesStored: 0,
      averageUploadSpeed: 0,
      averageDownloadSpeed: 0,
      storageEfficiency: 0
    };

    this.initializeGridFS();
  }

  async initializeGridFS() {
    console.log('Initializing advanced GridFS file management...');

    try {
      // Connect to MongoDB
      this.client = new MongoClient(this.connectionString);
      await this.client.connect();
      this.db = this.client.db();

      // Initialize default GridFS bucket
      this.initializeBucket(this.config.defaultBucket);

      // Initialize custom buckets
      for (const [bucketName, bucketConfig] of Object.entries(this.config.customBuckets)) {
        this.initializeBucket(bucketName, bucketConfig);
      }

      // Setup metadata indexing
      if (this.config.enableMetadataIndexing) {
        await this.setupMetadataIndexing();
      }

      // Setup file processing pipeline
      await this.setupFileProcessingPipeline();

      // Initialize monitoring and metrics
      await this.setupMonitoringAndMetrics();

      console.log('Advanced GridFS file management initialized successfully');

    } catch (error) {
      console.error('Error initializing GridFS:', error);
      throw error;
    }
  }

  initializeBucket(bucketName, bucketConfig = {}) {
    const bucket = new GridFSBucket(this.db, {
      bucketName: bucketName,
      chunkSizeBytes: bucketConfig.chunkSizeBytes || this.config.chunkSizeBytes
    });

    this.gridFSBuckets.set(bucketName, {
      bucket: bucket,
      config: bucketConfig,
      stats: {
        totalFiles: 0,
        totalBytes: 0,
        averageFileSize: 0,
        lastActivity: new Date()
      }
    });

    console.log(`Initialized GridFS bucket: ${bucketName}`);
  }

  async setupMetadataIndexing() {
    console.log('Setting up metadata indexing for GridFS...');

    try {
      // Create indexes on files collection for efficient queries
      for (const [bucketName, bucketInfo] of this.gridFSBuckets.entries()) {
        const filesCollection = this.db.collection(`${bucketName}.files`);
        const chunksCollection = this.db.collection(`${bucketName}.chunks`);

        // Files collection indexes
        await filesCollection.createIndex(
          { filename: 1, uploadDate: -1 },
          { background: true }
        );

        await filesCollection.createIndex(
          { 'metadata.contentType': 1, uploadDate: -1 },
          { background: true }
        );

        await filesCollection.createIndex(
          { 'metadata.tags': 1 },
          { background: true }
        );

        await filesCollection.createIndex(
          { length: -1, uploadDate: -1 },
          { background: true }
        );

        // Chunks collection optimization
        await chunksCollection.createIndex(
          { files_id: 1, n: 1 },
          { unique: true, background: true }
        );
      }

    } catch (error) {
      console.error('Error setting up metadata indexing:', error);
      throw error;
    }
  }

  async uploadFile(filePath, options = {}) {
    console.log(`Starting file upload: ${filePath}`);

    const uploadId = this.generateUploadId();
    const startTime = Date.now();

    try {
      // Validate file exists and get stats
      const fileStats = await fs.promises.stat(filePath);

      // Prepare upload configuration
      const uploadConfig = {
        uploadId: uploadId,
        startTime: startTime,
        filePath: filePath,
        fileName: options.filename || path.basename(filePath),
        bucketName: options.bucket || this.config.defaultBucket,
        contentType: options.contentType || this.detectContentType(filePath),

        // File metadata
        metadata: {
          originalPath: filePath,
          fileSize: fileStats.size,
          uploadedBy: options.uploadedBy || 'system',
          uploadDate: new Date(),
          contentType: options.contentType || this.detectContentType(filePath),

          // Custom metadata
          tags: options.tags || [],
          category: options.category || 'general',
          permissions: options.permissions || this.config.defaultPermissions,

          // Processing configuration
          processOnUpload: options.processOnUpload || false,
          generateThumbnail: options.generateThumbnail || false,
          enableCompression: options.enableCompression || this.config.enableCompression,

          // Checksums for integrity
          checksums: {}
        },

        // Upload progress tracking
        progress: {
          bytesUploaded: 0,
          totalBytes: fileStats.size,
          percentComplete: 0,
          uploadSpeed: 0,
          estimatedTimeRemaining: 0
        }
      };

      // Get GridFS bucket
      const bucketInfo = this.gridFSBuckets.get(uploadConfig.bucketName);
      if (!bucketInfo) {
        throw new Error(`GridFS bucket not found: ${uploadConfig.bucketName}`);
      }

      // Store upload state
      this.activeUploads.set(uploadId, uploadConfig);

      // Calculate file checksum before upload
      if (this.config.enableDeduplication) {
        uploadConfig.metadata.checksums.md5 = await this.calculateFileChecksum(filePath, 'md5');
        uploadConfig.metadata.checksums.sha256 = await this.calculateFileChecksum(filePath, 'sha256');

        // Check for duplicate files
        const duplicate = await this.findDuplicateFile(uploadConfig.metadata.checksums.sha256, uploadConfig.bucketName);
        if (duplicate && options.skipDuplicates) {
          this.emit('duplicateDetected', {
            uploadId: uploadId,
            duplicateFileId: duplicate._id,
            fileName: uploadConfig.fileName
          });

          return {
            success: true,
            uploadId: uploadId,
            fileId: duplicate._id,
            isDuplicate: true,
            fileName: uploadConfig.fileName,
            fileSize: duplicate.length
          };
        }
      }

      // Create upload stream
      const uploadStream = bucketInfo.bucket.openUploadStream(uploadConfig.fileName, {
        chunkSizeBytes: bucketInfo.config.chunkSizeBytes || this.config.chunkSizeBytes,
        metadata: uploadConfig.metadata
      });

      // Create read stream from file
      const fileReadStream = createReadStream(filePath);

      // Track upload progress
      const progressTracker = this.createProgressTracker(uploadId, uploadConfig);

      // Pipeline streams with error handling
      const pipelineAsync = promisify(pipeline);

      await pipelineAsync(
        fileReadStream,
        progressTracker,
        uploadStream
      );

      // Update upload completion
      const endTime = Date.now();
      const duration = endTime - startTime;
      const fileId = uploadStream.id;

      uploadConfig.fileId = fileId;
      uploadConfig.status = 'completed';
      uploadConfig.duration = duration;
      uploadConfig.uploadSpeed = (fileStats.size / 1024 / 1024) / (duration / 1000); // MB/s

      // Update bucket statistics
      bucketInfo.stats.totalFiles++;
      bucketInfo.stats.totalBytes += fileStats.size;
      bucketInfo.stats.averageFileSize = bucketInfo.stats.totalBytes / bucketInfo.stats.totalFiles;
      bucketInfo.stats.lastActivity = new Date();

      // Post-processing
      if (uploadConfig.metadata.processOnUpload) {
        await this.processUploadedFile(fileId, uploadConfig);
      }

      // Update system metrics
      this.updateMetrics(uploadConfig);

      // Cleanup
      this.activeUploads.delete(uploadId);

      this.emit('uploadCompleted', {
        uploadId: uploadId,
        fileId: fileId,
        fileName: uploadConfig.fileName,
        fileSize: fileStats.size,
        duration: duration,
        uploadSpeed: uploadConfig.uploadSpeed
      });

      console.log(`File upload completed: ${uploadConfig.fileName} (${fileId})`);

      return {
        success: true,
        uploadId: uploadId,
        fileId: fileId,
        fileName: uploadConfig.fileName,
        fileSize: fileStats.size,
        duration: duration,
        uploadSpeed: uploadConfig.uploadSpeed,
        bucketName: uploadConfig.bucketName
      };

    } catch (error) {
      console.error(`File upload failed for ${uploadId}:`, error);

      // Update upload state
      const uploadConfig = this.activeUploads.get(uploadId);
      if (uploadConfig) {
        uploadConfig.status = 'failed';
        uploadConfig.error = error.message;
      }

      this.emit('uploadFailed', {
        uploadId: uploadId,
        fileName: options.filename || path.basename(filePath),
        error: error.message
      });

      return {
        success: false,
        uploadId: uploadId,
        error: error.message
      };
    }
  }

  createProgressTracker(uploadId, uploadConfig) {
    const { Transform } = require('stream');

    return new Transform({
      // Arrow function keeps `this` bound to the file manager so progress events
      // are emitted from the manager rather than the stream
      transform: (chunk, encoding, callback) => {
        // Update progress
        uploadConfig.progress.bytesUploaded += chunk.length;
        uploadConfig.progress.percentComplete =
          (uploadConfig.progress.bytesUploaded / uploadConfig.progress.totalBytes) * 100;

        // Calculate upload speed (MB/s) from elapsed time since the upload started
        const timeElapsed = Math.max((Date.now() - uploadConfig.startTime) / 1000, 0.001); // seconds
        uploadConfig.progress.uploadSpeed =
          (uploadConfig.progress.bytesUploaded / 1024 / 1024) / timeElapsed;

        // Estimate time remaining (seconds)
        const remainingBytes = uploadConfig.progress.totalBytes - uploadConfig.progress.bytesUploaded;
        uploadConfig.progress.estimatedTimeRemaining =
          remainingBytes / Math.max(uploadConfig.progress.uploadSpeed * 1024 * 1024, 1);

        // Emit progress update
        this.emit('uploadProgress', {
          uploadId: uploadId,
          progress: uploadConfig.progress
        });

        callback(null, chunk);
      }
    });
  }

  async downloadFile(fileId, downloadPath, options = {}) {
    console.log(`Starting file download: ${fileId} -> ${downloadPath}`);

    const downloadId = this.generateDownloadId();
    const startTime = Date.now();

    try {
      // Get bucket
      const bucketName = options.bucket || this.config.defaultBucket;
      const bucketInfo = this.gridFSBuckets.get(bucketName);
      if (!bucketInfo) {
        throw new Error(`GridFS bucket not found: ${bucketName}`);
      }

      // Get file metadata
      const fileMetadata = await this.getFileMetadata(fileId, bucketName);
      if (!fileMetadata) {
        throw new Error(`File not found: ${fileId}`);
      }

      // Prepare download configuration
      const downloadConfig = {
        downloadId: downloadId,
        startTime: startTime,
        fileId: fileId,
        downloadPath: downloadPath,
        bucketName: bucketName,
        fileSize: fileMetadata.length,

        // Download progress tracking
        progress: {
          bytesDownloaded: 0,
          totalBytes: fileMetadata.length,
          percentComplete: 0,
          downloadSpeed: 0,
          estimatedTimeRemaining: 0
        }
      };

      // Store download state
      this.activeDownloads.set(downloadId, downloadConfig);

      // Create download stream
      const downloadStream = bucketInfo.bucket.openDownloadStream(fileId);

      // Create write stream to file
      const fileWriteStream = createWriteStream(downloadPath);

      // Track download progress
      const progressTracker = this.createDownloadProgressTracker(downloadId, downloadConfig);

      // Pipeline streams
      const pipelineAsync = promisify(pipeline);

      await pipelineAsync(
        downloadStream,
        progressTracker,
        fileWriteStream
      );

      // Update download completion
      const endTime = Date.now();
      const duration = endTime - startTime;
      downloadConfig.duration = duration;
      downloadConfig.downloadSpeed = (fileMetadata.length / 1024 / 1024) / (duration / 1000); // MB/s

      // Cleanup
      this.activeDownloads.delete(downloadId);

      this.emit('downloadCompleted', {
        downloadId: downloadId,
        fileId: fileId,
        fileName: fileMetadata.filename,
        fileSize: fileMetadata.length,
        duration: duration,
        downloadSpeed: downloadConfig.downloadSpeed
      });

      console.log(`File download completed: ${fileMetadata.filename} (${fileId})`);

      return {
        success: true,
        downloadId: downloadId,
        fileId: fileId,
        fileName: fileMetadata.filename,
        fileSize: fileMetadata.length,
        duration: duration,
        downloadSpeed: downloadConfig.downloadSpeed
      };

    } catch (error) {
      console.error(`File download failed for ${downloadId}:`, error);

      // Cleanup
      this.activeDownloads.delete(downloadId);

      this.emit('downloadFailed', {
        downloadId: downloadId,
        fileId: fileId,
        error: error.message
      });

      return {
        success: false,
        downloadId: downloadId,
        fileId: fileId,
        error: error.message
      };
    }
  }

  createDownloadProgressTracker(downloadId, downloadConfig) {
    const { Transform } = require('stream');

    // Track elapsed time locally; the download configuration does not record a start time
    const startTime = Date.now();
    const manager = this;

    return new Transform({
      transform(chunk, encoding, callback) {
        // Update progress
        downloadConfig.progress.bytesDownloaded += chunk.length;
        downloadConfig.progress.percentComplete = 
          (downloadConfig.progress.bytesDownloaded / downloadConfig.progress.totalBytes) * 100;

        // Calculate download speed (guard against a zero elapsed time on the first chunk)
        const timeElapsed = Math.max((Date.now() - startTime) / 1000, 0.001); // seconds
        downloadConfig.progress.downloadSpeed = 
          (downloadConfig.progress.bytesDownloaded / 1024 / 1024) / timeElapsed; // MB/s

        // Estimate time remaining (seconds)
        const remainingBytes = downloadConfig.progress.totalBytes - downloadConfig.progress.bytesDownloaded;
        downloadConfig.progress.estimatedTimeRemaining = downloadConfig.progress.downloadSpeed > 0
          ? remainingBytes / (downloadConfig.progress.downloadSpeed * 1024 * 1024)
          : 0;

        // Emit progress on the file manager; inside transform(), `this` is the stream itself
        manager.emit('downloadProgress', {
          downloadId: downloadId,
          progress: downloadConfig.progress
        });

        callback(null, chunk);
      }
    });
  }

  async getFileMetadata(fileId, bucketName = null) {
    console.log(`Getting file metadata: ${fileId}`);

    try {
      bucketName = bucketName || this.config.defaultBucket;
      const filesCollection = this.db.collection(`${bucketName}.files`);

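      // GridFS assigns ObjectId values to _id by default; if fileId arrives as a string,
      // it may need to be converted (e.g. new ObjectId(fileId)) before this lookup matches.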
      const fileMetadata = await filesCollection.findOne({ _id: fileId });
      return fileMetadata;

    } catch (error) {
      console.error(`Error getting file metadata for ${fileId}:`, error);
      throw error;
    }
  }

  async searchFiles(searchCriteria, options = {}) {
    console.log('Searching files with criteria:', searchCriteria);

    try {
      const bucketName = options.bucket || this.config.defaultBucket;
      const filesCollection = this.db.collection(`${bucketName}.files`);

      // Build search query
      const query = {};

      // Text search on filename
      if (searchCriteria.filename) {
        query.filename = { $regex: searchCriteria.filename, $options: 'i' };
      }

      // Content type filter
      if (searchCriteria.contentType) {
        query['metadata.contentType'] = searchCriteria.contentType;
      }

      // Size range filter
      if (searchCriteria.sizeRange) {
        query.length = {};
        if (searchCriteria.sizeRange.min) {
          query.length.$gte = searchCriteria.sizeRange.min;
        }
        if (searchCriteria.sizeRange.max) {
          query.length.$lte = searchCriteria.sizeRange.max;
        }
      }

      // Date range filter
      if (searchCriteria.dateRange) {
        query.uploadDate = {};
        if (searchCriteria.dateRange.from) {
          query.uploadDate.$gte = new Date(searchCriteria.dateRange.from);
        }
        if (searchCriteria.dateRange.to) {
          query.uploadDate.$lte = new Date(searchCriteria.dateRange.to);
        }
      }

      // Tags filter
      if (searchCriteria.tags) {
        query['metadata.tags'] = { $in: searchCriteria.tags };
      }

      // Category filter
      if (searchCriteria.category) {
        query['metadata.category'] = searchCriteria.category;
      }

      // Execute search with pagination
      const limit = options.limit || 50;
      const skip = options.skip || 0;
      const sort = options.sort || { uploadDate: -1 };

      const files = await filesCollection
        .find(query)
        .sort(sort)
        .limit(limit)
        .skip(skip)
        .toArray();

      // Get total count for pagination
      const totalCount = await filesCollection.countDocuments(query);

      return {
        success: true,
        files: files.map(file => ({
          fileId: file._id,
          filename: file.filename,
          length: file.length,
          uploadDate: file.uploadDate,
          contentType: file.metadata?.contentType,
          tags: file.metadata?.tags || [],
          category: file.metadata?.category,
          checksums: file.metadata?.checksums || {}
        })),
        totalCount: totalCount,
        currentPage: Math.floor(skip / limit) + 1,
        totalPages: Math.ceil(totalCount / limit)
      };

    } catch (error) {
      console.error('Error searching files:', error);
      return {
        success: false,
        error: error.message
      };
    }
  }

  async deleteFile(fileId, bucketName = null) {
    console.log(`Deleting file: ${fileId}`);

    try {
      bucketName = bucketName || this.config.defaultBucket;
      const bucketInfo = this.gridFSBuckets.get(bucketName);
      if (!bucketInfo) {
        throw new Error(`GridFS bucket not found: ${bucketName}`);
      }

      // Get file metadata before deletion
      const fileMetadata = await this.getFileMetadata(fileId, bucketName);
      if (!fileMetadata) {
        throw new Error(`File not found: ${fileId}`);
      }

      // Delete file from GridFS
      await bucketInfo.bucket.delete(fileId);

      // Update bucket statistics
      bucketInfo.stats.totalFiles = Math.max(0, bucketInfo.stats.totalFiles - 1);
      bucketInfo.stats.totalBytes = Math.max(0, bucketInfo.stats.totalBytes - fileMetadata.length);
      if (bucketInfo.stats.totalFiles > 0) {
        bucketInfo.stats.averageFileSize = bucketInfo.stats.totalBytes / bucketInfo.stats.totalFiles;
      }
      bucketInfo.stats.lastActivity = new Date();

      this.emit('fileDeleted', {
        fileId: fileId,
        fileName: fileMetadata.filename,
        fileSize: fileMetadata.length,
        bucketName: bucketName
      });

      console.log(`File deleted successfully: ${fileMetadata.filename} (${fileId})`);

      return {
        success: true,
        fileId: fileId,
        fileName: fileMetadata.filename,
        fileSize: fileMetadata.length
      };

    } catch (error) {
      console.error(`Error deleting file ${fileId}:`, error);
      return {
        success: false,
        fileId: fileId,
        error: error.message
      };
    }
  }

  async getStorageStatistics(bucketName = null) {
    console.log(`Getting storage statistics${bucketName ? ' for bucket: ' + bucketName : ''}`);

    try {
      const statistics = {};

      if (bucketName) {
        // Get statistics for specific bucket
        const bucketInfo = this.gridFSBuckets.get(bucketName);
        if (!bucketInfo) {
          throw new Error(`GridFS bucket not found: ${bucketName}`);
        }

        statistics[bucketName] = await this.calculateBucketStatistics(bucketName, bucketInfo);
      } else {
        // Get statistics for all buckets
        for (const [name, bucketInfo] of this.gridFSBuckets.entries()) {
          statistics[name] = await this.calculateBucketStatistics(name, bucketInfo);
        }
      }

      // Calculate system-wide statistics
      const systemStats = {
        totalBuckets: this.gridFSBuckets.size,
        totalFiles: Object.values(statistics).reduce((sum, bucket) => sum + bucket.fileCount, 0),
        totalBytes: Object.values(statistics).reduce((sum, bucket) => sum + bucket.totalBytes, 0),
        averageFileSize: 0,
        storageEfficiency: this.metrics.storageEfficiency
      };

      if (systemStats.totalFiles > 0) {
        systemStats.averageFileSize = systemStats.totalBytes / systemStats.totalFiles;
      }

      return {
        success: true,
        bucketStatistics: statistics,
        systemStatistics: systemStats,
        retrievalTime: new Date()
      };

    } catch (error) {
      console.error('Error getting storage statistics:', error);
      return {
        success: false,
        error: error.message
      };
    }
  }

  async calculateBucketStatistics(bucketName, bucketInfo) {
    const filesCollection = this.db.collection(`${bucketName}.files`);
    const chunksCollection = this.db.collection(`${bucketName}.chunks`);

    // Basic file statistics
    const fileStats = await filesCollection.aggregate([
      {
        $group: {
          _id: null,
          fileCount: { $sum: 1 },
          totalBytes: { $sum: '$length' },
          averageFileSize: { $avg: '$length' },
          largestFile: { $max: '$length' },
          smallestFile: { $min: '$length' }
        }
      }
    ]).toArray();

    // Content type distribution
    const contentTypeStats = await filesCollection.aggregate([
      {
        $group: {
          _id: '$metadata.contentType',
          count: { $sum: 1 },
          totalBytes: { $sum: '$length' }
        }
      },
      { $sort: { count: -1 } },
      { $limit: 10 }
    ]).toArray();

    // Chunk statistics
    const chunkStats = await chunksCollection.aggregate([
      {
        $group: {
          _id: null,
          totalChunks: { $sum: 1 },
          averageChunkSize: { $avg: { $binarySize: '$data' } }
        }
      }
    ]).toArray();

    const baseStats = fileStats[0] || {
      fileCount: 0,
      totalBytes: 0,
      averageFileSize: 0,
      largestFile: 0,
      smallestFile: 0
    };

    return {
      fileCount: baseStats.fileCount,
      totalBytes: baseStats.totalBytes,
      averageFileSize: Math.round(baseStats.averageFileSize || 0),
      largestFile: baseStats.largestFile,
      smallestFile: baseStats.smallestFile,
      contentTypes: contentTypeStats,
      totalChunks: chunkStats[0]?.totalChunks || 0,
      averageChunkSize: Math.round(chunkStats[0]?.averageChunkSize || 0),
      storageEfficiency: this.calculateStorageEfficiency(bucketName)
    };
  }

  // Utility methods

  generateUploadId() {
    return `upload_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
  }

  generateDownloadId() {
    return `download_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
  }

  detectContentType(filePath) {
    const path = require('path');
    const ext = path.extname(filePath).toLowerCase();

    const mimeTypes = {
      '.pdf': 'application/pdf',
      '.doc': 'application/msword',
      '.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
      '.jpg': 'image/jpeg',
      '.jpeg': 'image/jpeg',
      '.png': 'image/png',
      '.gif': 'image/gif',
      '.mp4': 'video/mp4',
      '.mp3': 'audio/mpeg',
      '.zip': 'application/zip',
      '.txt': 'text/plain',
      '.json': 'application/json',
      '.xml': 'application/xml'
    };

    return mimeTypes[ext] || 'application/octet-stream';
  }

  async calculateFileChecksum(filePath, algorithm = 'sha256') {
    return new Promise((resolve, reject) => {
      const hash = crypto.createHash(algorithm);
      const stream = createReadStream(filePath);

      stream.on('data', (data) => {
        hash.update(data);
      });

      stream.on('end', () => {
        resolve(hash.digest('hex'));
      });

      stream.on('error', (error) => {
        reject(error);
      });
    });
  }

  async findDuplicateFile(checksum, bucketName) {
    const filesCollection = this.db.collection(`${bucketName}.files`);
    return await filesCollection.findOne({
      'metadata.checksums.sha256': checksum
    });
  }

  calculateStorageEfficiency(bucketName) {
    // Simplified storage efficiency calculation
    // In a real implementation, this would analyze compression ratios, deduplication, etc.
    return 85.0; // Placeholder
  }
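
  // A possible refinement of the placeholder above (a sketch, not part of the original
  // class): estimate efficiency as the ratio of logical file bytes recorded in
  // <bucket>.files to the physical chunk bytes stored in <bucket>.chunks.
  // The method name estimateStorageEfficiency is illustrative.
  async estimateStorageEfficiency(bucketName) {
    const filesCollection = this.db.collection(`${bucketName}.files`);
    const chunksCollection = this.db.collection(`${bucketName}.chunks`);

    const [fileTotals] = await filesCollection.aggregate([
      { $group: { _id: null, logicalBytes: { $sum: '$length' } } }
    ]).toArray();

    const [chunkTotals] = await chunksCollection.aggregate([
      { $group: { _id: null, storedBytes: { $sum: { $binarySize: '$data' } } } }
    ]).toArray();

    const logicalBytes = fileTotals?.logicalBytes || 0;
    const storedBytes = chunkTotals?.storedBytes || 0;

    // Values above 100 indicate chunks stored smaller than their logical size
    // (for example, when the application compresses chunks before writing them)
    return storedBytes > 0 ? (logicalBytes / storedBytes) * 100 : 0;
  }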

  updateMetrics(uploadConfig) {
    this.metrics.totalFilesStored++;
    this.metrics.totalBytesStored += uploadConfig.progress.totalBytes;

    // Update average upload speed
    const totalUploads = this.metrics.totalFilesStored;
    this.metrics.averageUploadSpeed = 
      ((this.metrics.averageUploadSpeed * (totalUploads - 1)) + uploadConfig.uploadSpeed) / totalUploads;
  }

  async setupFileProcessingPipeline() {
    // Setup file processing pipeline for thumbnails, content analysis, etc.
    console.log('Setting up file processing pipeline...');
  }

  async setupMonitoringAndMetrics() {
    // Setup monitoring and metrics collection
    console.log('Setting up monitoring and metrics...');
  }

  async processUploadedFile(fileId, uploadConfig) {
    // Process uploaded file (thumbnails, analysis, etc.)
    console.log(`Processing uploaded file: ${fileId}`);
  }

  async shutdown() {
    console.log('Shutting down GridFS file manager...');

    try {
      // Wait for active uploads/downloads to complete
      if (this.activeUploads.size > 0) {
        console.log(`Waiting for ${this.activeUploads.size} uploads to complete...`);
      }

      if (this.activeDownloads.size > 0) {
        console.log(`Waiting for ${this.activeDownloads.size} downloads to complete...`);
      }

      // Close MongoDB connection
      if (this.client) {
        await this.client.close();
      }

      console.log('GridFS file manager shutdown complete');

    } catch (error) {
      console.error('Error during shutdown:', error);
    }
  }
}

// Benefits of MongoDB GridFS Advanced File Storage:
// - Efficient binary data storage with automatic chunking and compression
// - Integrated metadata management with full-text search capabilities
// - Streaming upload and download with progress tracking and optimization
// - Built-in replication and distributed storage through MongoDB replica sets
// - Transactional consistency between file operations and database operations
// - Advanced file processing pipeline with thumbnail generation and content analysis
// - Comprehensive version control and access management capabilities
// - SQL-compatible file operations through QueryLeaf integration
// - Enterprise-grade security, encryption, and audit logging
// - Production-ready scalability with automatic load balancing and optimization

module.exports = {
  AdvancedGridFSFileManager
};
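
For reference, here is a minimal usage sketch of the manager defined above. It is illustrative rather than definitive: it assumes the class completes its connection and bucket setup in an async initialization step (called initialize() here purely for illustration), and that the configuration accepts a defaultBucket field as used internally by the class. Only methods and events shown earlier (searchFiles, downloadFile, getStorageStatistics, shutdown, and the downloadProgress event) are exercised.

// Hypothetical usage sketch of AdvancedGridFSFileManager (paths and names are illustrative)
const { AdvancedGridFSFileManager } = require('./advanced-gridfs-file-manager');

async function main() {
  const fileManager = new AdvancedGridFSFileManager('mongodb://localhost:27017/media', {
    defaultBucket: 'media_files'
  });

  // Assumed async setup step; the real class may instead connect in its constructor
  if (typeof fileManager.initialize === 'function') {
    await fileManager.initialize();
  }

  fileManager.on('downloadProgress', ({ downloadId, progress }) => {
    console.log(`Download ${downloadId}: ${progress.percentComplete.toFixed(1)}% complete`);
  });

  // Find recently uploaded PDFs, then download the first match
  const results = await fileManager.searchFiles(
    { contentType: 'application/pdf', dateRange: { from: '2024-01-01' } },
    { limit: 5 }
  );

  if (results.success && results.files.length > 0) {
    await fileManager.downloadFile(results.files[0].fileId, './downloads/first-match.pdf');
  }

  const stats = await fileManager.getStorageStatistics();
  console.log('System statistics:', stats.systemStatistics);

  await fileManager.shutdown();
}

main().catch(console.error);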

Understanding MongoDB GridFS Architecture

Advanced File Storage Design and Implementation Patterns

Implement comprehensive GridFS workflows for enterprise file management:

// Enterprise-grade GridFS with advanced distributed file management capabilities
class EnterpriseGridFSManager extends AdvancedGridFSFileManager {
  constructor(connectionString, enterpriseConfig) {
    super(connectionString, enterpriseConfig);

    this.enterpriseConfig = {
      ...enterpriseConfig,
      enableDistributedProcessing: true,
      enableContentDeliveryNetwork: true,
      enableAdvancedSecurity: true,
      enableComplianceAuditing: true,
      enableGlobalReplication: true
    };

    this.setupEnterpriseCapabilities();
    this.initializeDistributedProcessing();
    this.setupContentDeliveryNetwork();
  }

  async implementAdvancedFileStrategy() {
    console.log('Implementing enterprise file management strategy...');

    const fileStrategy = {
      // Multi-tier storage strategy
      storageTiers: {
        hotStorage: {
          criteria: 'accessed_within_30_days',
          chunkSize: 261120,
          compressionLevel: 6,
          replicationFactor: 3
        },
        coldStorage: {
          criteria: 'accessed_30_to_90_days_ago',
          chunkSize: 1048576,
          compressionLevel: 9,
          replicationFactor: 2
        },
        archiveStorage: {
          criteria: 'accessed_more_than_90_days_ago',
          chunkSize: 4194304,
          compressionLevel: 9,
          replicationFactor: 1
        }
      },

      // Content delivery optimization
      contentDelivery: {
        enableGlobalDistribution: true,
        enableEdgeCaching: true,
        enableImageOptimization: true,
        enableVideoTranscoding: true
      },

      // Advanced processing
      fileProcessing: {
        enableMachineLearning: true,
        enableContentRecognition: true,
        enableAutomaticTagging: true,
        enableThreatDetection: true
      }
    };

    return await this.deployEnterpriseStrategy(fileStrategy);
  }

  async setupAdvancedSecurity() {
    console.log('Setting up enterprise security for file operations...');

    const securityConfig = {
      // File encryption
      encryptionAtRest: true,
      encryptionInTransit: true,
      encryptionKeyRotation: true,

      // Access control
      roleBasedAccess: true,
      attributeBasedAccess: true,
      dynamicPermissions: true,

      // Threat protection
      malwareScanning: true,
      contentFiltering: true,
      dataLossPrevention: true
    };

    return await this.deploySecurityFramework(securityConfig);
  }
}

SQL-Style GridFS Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB GridFS operations:

-- QueryLeaf advanced GridFS operations with SQL-familiar syntax for MongoDB

-- Configure GridFS bucket with comprehensive settings
CREATE GRIDFS_BUCKET media_files 
WITH chunk_size_bytes = 261120,
     enable_compression = true,
     compression_level = 6,
     enable_encryption = true,
     enable_metadata_indexing = true,
     enable_version_control = true,
     enable_thumbnail_generation = true,
     enable_content_analysis = true,

     -- Storage optimization
     enable_deduplication = true,
     enable_automatic_cleanup = true,
     storage_tier_management = true,

     -- Access control
     default_permissions = 'private',
     enable_access_logging = true,
     enable_audit_trail = true,

     -- Performance settings
     max_concurrent_uploads = 10,
     enable_parallel_processing = true,
     enable_streaming_optimization = true,

     -- Backup and replication
     enable_backup_integration = true,
     cross_region_replication = true,
     replication_factor = 3;

-- Advanced file upload with comprehensive metadata and processing
WITH file_uploads AS (
  SELECT 
    file_id,
    filename,
    file_size_bytes,
    content_type,
    upload_timestamp,
    upload_duration_seconds,
    upload_speed_mbps,

    -- Processing results
    compression_applied,
    compression_ratio,
    thumbnail_generated,
    content_analysis_completed,
    virus_scan_status,

    -- Metadata extraction
    JSON_EXTRACT(metadata, '$.originalPath') as original_path,
    JSON_EXTRACT(metadata, '$.uploadedBy') as uploaded_by,
    JSON_EXTRACT(metadata, '$.tags') as file_tags,
    JSON_EXTRACT(metadata, '$.category') as file_category,
    JSON_EXTRACT(metadata, '$.permissions') as access_permissions,

    -- File integrity
    JSON_EXTRACT(metadata, '$.checksums.md5') as md5_checksum,
    JSON_EXTRACT(metadata, '$.checksums.sha256') as sha256_checksum,

    -- Processing pipeline results
    JSON_EXTRACT(metadata, '$.processingResults') as processing_results,
    JSON_EXTRACT(metadata, '$.thumbnailPath') as thumbnail_path,
    JSON_EXTRACT(metadata, '$.contentAnalysis') as content_analysis

  FROM GRIDFS_FILES('media_files')
  WHERE upload_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
),

upload_performance AS (
  SELECT 
    file_category,
    content_type,
    COUNT(*) as total_uploads,
    SUM(file_size_bytes) as total_bytes_uploaded,
    AVG(upload_duration_seconds) as avg_upload_time,
    AVG(upload_speed_mbps) as avg_upload_speed,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY upload_duration_seconds) as p95_upload_time,

    -- Processing performance
    COUNT(*) FILTER (WHERE compression_applied = true) as compressed_files,
    AVG(compression_ratio) FILTER (WHERE compression_applied = true) as avg_compression_ratio,
    COUNT(*) FILTER (WHERE thumbnail_generated = true) as thumbnails_generated,
    COUNT(*) FILTER (WHERE content_analysis_completed = true) as content_analyzed,
    COUNT(*) FILTER (WHERE virus_scan_status = 'clean') as clean_files,
    COUNT(*) FILTER (WHERE virus_scan_status = 'threat_detected') as threat_files,

    -- File size distribution
    AVG(file_size_bytes) as avg_file_size_bytes,
    MAX(file_size_bytes) as largest_file_bytes,
    MIN(file_size_bytes) as smallest_file_bytes,

    -- Storage efficiency
    SUM(file_size_bytes) as original_total_bytes,
    SUM(CASE WHEN compression_applied THEN 
          file_size_bytes * (1 - compression_ratio) 
        ELSE file_size_bytes 
    END) as stored_total_bytes

  FROM file_uploads
  GROUP BY file_category, content_type
),

storage_analysis AS (
  SELECT 
    DATE_TRUNC('hour', upload_timestamp) as upload_hour,

    -- Upload volume analysis
    COUNT(*) as files_uploaded,
    SUM(file_size_bytes) as bytes_uploaded,
    AVG(upload_speed_mbps) as avg_hourly_upload_speed,

    -- Content type distribution
    COUNT(*) FILTER (WHERE content_type LIKE 'image/%') as image_files,
    COUNT(*) FILTER (WHERE content_type LIKE 'video/%') as video_files,
    COUNT(*) FILTER (WHERE content_type LIKE 'audio/%') as audio_files,
    COUNT(*) FILTER (WHERE content_type = 'application/pdf') as pdf_files,
    COUNT(*) FILTER (WHERE content_type NOT LIKE 'image/%' 
                       AND content_type NOT LIKE 'video/%' 
                       AND content_type NOT LIKE 'audio/%' 
                       AND content_type <> 'application/pdf') as other_files,

    -- Processing success rates
    COUNT(*) FILTER (WHERE virus_scan_status = 'clean') as safe_files,
    COUNT(*) FILTER (WHERE content_analysis_completed = true) as analyzed_files,
    COUNT(*) FILTER (WHERE thumbnail_generated = true) as thumbnail_files,

    -- Storage optimization metrics
    AVG(CASE WHEN compression_applied THEN compression_ratio ELSE 0 END) as avg_compression_ratio,
    SUM(CASE WHEN compression_applied THEN 
          file_size_bytes * compression_ratio 
        ELSE 0 
    END) as total_space_saved_bytes

  FROM file_uploads
  GROUP BY DATE_TRUNC('hour', upload_timestamp)
)

SELECT 
  up.file_category,
  up.content_type,
  up.total_uploads,

  -- Upload performance metrics
  ROUND(up.total_bytes_uploaded / 1024.0 / 1024.0, 2) as total_uploaded_mb,
  ROUND(up.avg_upload_time, 2) as avg_upload_time_seconds,
  ROUND(up.avg_upload_speed, 2) as avg_upload_speed_mbps,
  ROUND(up.p95_upload_time, 2) as p95_upload_time_seconds,

  -- Processing efficiency
  up.compressed_files,
  ROUND((up.compressed_files * 100.0) / up.total_uploads, 1) as compression_rate_percent,
  ROUND(up.avg_compression_ratio * 100, 1) as avg_compression_percent,
  up.thumbnails_generated,
  ROUND((up.thumbnails_generated * 100.0) / up.total_uploads, 1) as thumbnail_rate_percent,

  -- Content analysis results
  up.content_analyzed,
  ROUND((up.content_analyzed * 100.0) / up.total_uploads, 1) as analysis_rate_percent,

  -- Security metrics
  up.clean_files,
  up.threat_files,
  CASE 
    WHEN up.threat_files > 0 THEN 'security_issues_detected'
    ELSE 'all_files_clean'
  END as security_status,

  -- File size statistics
  ROUND(up.avg_file_size_bytes / 1024.0 / 1024.0, 2) as avg_file_size_mb,
  ROUND(up.largest_file_bytes / 1024.0 / 1024.0, 2) as largest_file_mb,
  ROUND(up.smallest_file_bytes / 1024.0, 2) as smallest_file_kb,

  -- Storage optimization
  ROUND(up.original_total_bytes / 1024.0 / 1024.0, 2) as original_storage_mb,
  ROUND(up.stored_total_bytes / 1024.0 / 1024.0, 2) as actual_storage_mb,
  ROUND(((up.original_total_bytes - up.stored_total_bytes) / up.original_total_bytes) * 100, 1) as storage_savings_percent,

  -- Performance assessment
  CASE 
    WHEN up.avg_upload_speed > 50 THEN 'excellent'
    WHEN up.avg_upload_speed > 20 THEN 'good'
    WHEN up.avg_upload_speed > 10 THEN 'acceptable'
    ELSE 'needs_optimization'
  END as upload_performance_rating,

  -- Processing health
  CASE 
    WHEN up.threat_files > 0 THEN 'security_review_required'
    WHEN (up.thumbnails_generated * 100.0 / up.total_uploads) < 80 AND up.content_type LIKE 'image/%' THEN 'thumbnail_generation_issues'
    WHEN (up.content_analyzed * 100.0 / up.total_uploads) < 90 THEN 'content_analysis_issues'
    ELSE 'processing_healthy'
  END as processing_health_status,

  -- Optimization recommendations
  ARRAY[
    CASE WHEN up.avg_upload_speed < 10 THEN 'Optimize network bandwidth or chunk size' END,
    CASE WHEN up.avg_compression_ratio < 0.3 AND up.content_type LIKE 'image/%' THEN 'Review image compression settings' END,
    CASE WHEN (up.thumbnails_generated * 100.0 / up.total_uploads) < 50 AND up.content_type LIKE 'image/%' THEN 'Fix thumbnail generation pipeline' END,
    CASE WHEN up.threat_files > 0 THEN 'Review security scanning configuration' END,
    CASE WHEN up.p95_upload_time > 300 THEN 'Optimize upload processing for large files' END
  ]::TEXT[] as optimization_recommendations

FROM upload_performance up
ORDER BY up.total_bytes_uploaded DESC, up.total_uploads DESC;

-- Advanced file search and retrieval with comprehensive filtering
WITH file_search_results AS (
  SELECT 
    file_id,
    filename,
    content_type,
    file_size_bytes,
    upload_timestamp,

    -- Metadata extraction
    JSON_EXTRACT(metadata, '$.category') as category,
    JSON_EXTRACT(metadata, '$.tags') as tags,
    JSON_EXTRACT(metadata, '$.uploadedBy') as uploaded_by,
    JSON_EXTRACT(metadata, '$.permissions') as permissions,
    JSON_EXTRACT(metadata, '$.contentAnalysis.description') as content_description,
    JSON_EXTRACT(metadata, '$.contentAnalysis.keywords') as content_keywords,
    JSON_EXTRACT(metadata, '$.processingResults.thumbnailAvailable') as has_thumbnail,
    JSON_EXTRACT(metadata, '$.processingResults.textExtracted') as has_text_content,

    -- File access patterns
    download_count,
    last_accessed,
    access_frequency_score,

    -- Storage tier information
    storage_tier,
    CASE storage_tier
      WHEN 'hot' THEN 1
      WHEN 'warm' THEN 2  
      WHEN 'cold' THEN 3
      WHEN 'archive' THEN 4
      ELSE 5
    END as tier_priority,

    -- File age and usage
    EXTRACT(DAY FROM (CURRENT_TIMESTAMP - upload_timestamp)) as file_age_days,
    EXTRACT(DAY FROM (CURRENT_TIMESTAMP - last_accessed)) as days_since_last_access

  FROM GRIDFS_FILES('media_files')
  WHERE 
    -- Content type filters
    (content_type IN ('image/jpeg', 'image/png', 'application/pdf', 'video/mp4') OR content_type LIKE '%/%')

    -- Size filters
    AND file_size_bytes BETWEEN 1024 AND 1073741824  -- 1KB to 1GB

    -- Date range filters
    AND upload_timestamp >= CURRENT_TIMESTAMP - INTERVAL '90 days'

    -- Category and tag filters
    AND (JSON_EXTRACT(metadata, '$.category') IS NOT NULL)
    AND (JSON_EXTRACT(metadata, '$.tags') IS NOT NULL)
),

file_analytics AS (
  SELECT 
    fsr.*,

    -- Content analysis scoring
    CASE 
      WHEN fsr.content_description IS NOT NULL AND fsr.content_keywords IS NOT NULL THEN 'fully_analyzed'
      WHEN fsr.content_description IS NOT NULL OR fsr.content_keywords IS NOT NULL THEN 'partially_analyzed'
      ELSE 'not_analyzed'
    END as analysis_completeness,

    -- Access pattern classification
    CASE 
      WHEN fsr.access_frequency_score > 0.8 THEN 'frequently_accessed'
      WHEN fsr.access_frequency_score > 0.4 THEN 'moderately_accessed'
      WHEN fsr.access_frequency_score > 0.1 THEN 'rarely_accessed'
      ELSE 'never_accessed'
    END as access_pattern,

    -- Storage optimization opportunities
    CASE 
      WHEN fsr.days_since_last_access > 90 AND fsr.storage_tier IN ('hot', 'warm') THEN 'candidate_for_cold_storage'
      WHEN fsr.days_since_last_access > 365 AND fsr.storage_tier != 'archive' THEN 'candidate_for_archive'
      WHEN fsr.access_frequency_score > 0.6 AND fsr.storage_tier IN ('cold', 'archive') THEN 'candidate_for_hot_storage'
      ELSE 'appropriate_storage_tier'
    END as storage_optimization,

    -- File health assessment
    CASE 
      WHEN fsr.has_thumbnail = false AND fsr.content_type LIKE 'image/%' THEN 'missing_thumbnail'
      WHEN fsr.has_text_content = false AND fsr.content_type = 'application/pdf' THEN 'text_extraction_needed'
      WHEN fsr.content_description IS NULL AND fsr.content_keywords IS NULL AND fsr.file_age_days > 7 THEN 'analysis_overdue'
      ELSE 'healthy'
    END as file_health_status

  FROM file_search_results fsr
),

usage_patterns AS (
  SELECT 
    content_type,
    category,
    access_pattern,
    storage_tier,
    COUNT(*) as file_count,
    SUM(file_size_bytes) as total_bytes,
    AVG(download_count) as avg_downloads,
    AVG(access_frequency_score) as avg_access_score,

    -- Storage tier distribution
    COUNT(*) FILTER (WHERE storage_tier = 'hot') as hot_tier_count,
    COUNT(*) FILTER (WHERE storage_tier = 'warm') as warm_tier_count,
    COUNT(*) FILTER (WHERE storage_tier = 'cold') as cold_tier_count,
    COUNT(*) FILTER (WHERE storage_tier = 'archive') as archive_tier_count,

    -- Health metrics
    COUNT(*) FILTER (WHERE file_health_status = 'healthy') as healthy_files,
    COUNT(*) FILTER (WHERE file_health_status != 'healthy') as unhealthy_files,

    -- Optimization opportunities
    COUNT(*) FILTER (WHERE storage_optimization LIKE 'candidate_for_%') as optimization_candidates

  FROM file_analytics
  GROUP BY content_type, category, access_pattern, storage_tier
)

SELECT 
  fa.file_id,
  fa.filename,
  fa.content_type,
  ROUND(fa.file_size_bytes / 1024.0 / 1024.0, 2) as file_size_mb,
  fa.category,
  fa.tags,
  fa.uploaded_by,

  -- Access and usage information
  fa.download_count,
  fa.access_pattern,
  fa.days_since_last_access,
  ROUND(fa.access_frequency_score, 3) as access_score,

  -- Storage and optimization
  fa.storage_tier,
  fa.storage_optimization,
  fa.file_health_status,

  -- Content analysis
  fa.analysis_completeness,
  CASE WHEN fa.has_thumbnail THEN 'yes' ELSE 'no' END as thumbnail_available,
  CASE WHEN fa.has_text_content THEN 'yes' ELSE 'no' END as text_content_available,

  -- File management recommendations
  ARRAY[
    CASE WHEN fa.storage_optimization LIKE 'candidate_for_%' THEN 
           'Move to ' || REPLACE(REPLACE(fa.storage_optimization, 'candidate_for_', ''), '_storage', ' storage')
         END,
    CASE WHEN fa.file_health_status = 'missing_thumbnail' THEN 'Generate thumbnail' END,
    CASE WHEN fa.file_health_status = 'text_extraction_needed' THEN 'Extract text content' END,
    CASE WHEN fa.file_health_status = 'analysis_overdue' THEN 'Run content analysis' END,
    CASE WHEN fa.days_since_last_access > 180 AND fa.download_count = 0 THEN 'Consider deletion' END
  ]::TEXT[] as recommendations,

  -- Priority scoring for operations
  CASE 
    WHEN fa.file_health_status != 'healthy' THEN 'high'
    WHEN fa.storage_optimization LIKE 'candidate_for_%' THEN 'medium' 
    WHEN fa.analysis_completeness = 'not_analyzed' THEN 'medium'
    ELSE 'low'
  END as maintenance_priority,

  -- Search relevance scoring
  (
    CASE WHEN fa.access_frequency_score > 0.5 THEN 2 ELSE 0 END +
    CASE WHEN fa.analysis_completeness = 'fully_analyzed' THEN 1 ELSE 0 END +
    CASE WHEN fa.file_health_status = 'healthy' THEN 1 ELSE 0 END +
    CASE WHEN fa.storage_tier = 'hot' THEN 1 ELSE 0 END
  ) as relevance_score

FROM file_analytics fa
WHERE 
  -- Apply additional search filters
  fa.file_size_bytes BETWEEN 1048576 AND 104857600  -- 1MB to 100MB
  AND fa.file_age_days <= 60  -- Files from last 60 days
  AND fa.analysis_completeness != 'not_analyzed'  -- Only analyzed files

ORDER BY 
  -- Primary sort by maintenance priority, then relevance
  CASE fa.maintenance_priority 
    WHEN 'high' THEN 1 
    WHEN 'medium' THEN 2 
    ELSE 3 
  END,
  relevance_score DESC,
  fa.access_frequency_score DESC,
  fa.upload_timestamp DESC
LIMIT 100;

-- GridFS storage tier management and optimization
CREATE VIEW gridfs_storage_optimization AS
WITH current_storage_state AS (
  SELECT 
    storage_tier,
    COUNT(*) as file_count,
    SUM(file_size_bytes) as total_bytes,
    AVG(file_size_bytes) as avg_file_size,
    AVG(access_frequency_score) as avg_access_frequency,
    AVG(EXTRACT(DAY FROM (CURRENT_TIMESTAMP - last_accessed))) as avg_days_since_access,

    -- Cost analysis (simplified model)
    SUM(file_size_bytes) * CASE storage_tier
      WHEN 'hot' THEN 0.023    -- $0.023 per GB/month
      WHEN 'warm' THEN 0.0125  -- $0.0125 per GB/month  
      WHEN 'cold' THEN 0.004   -- $0.004 per GB/month
      WHEN 'archive' THEN 0.001 -- $0.001 per GB/month
      ELSE 0.023
    END / 1024.0 / 1024.0 / 1024.0 as estimated_monthly_cost_usd,

    -- Performance characteristics  
    AVG(CASE storage_tier
      WHEN 'hot' THEN 10      -- 10ms avg access time
      WHEN 'warm' THEN 100    -- 100ms avg access time
      WHEN 'cold' THEN 1000   -- 1s avg access time  
      WHEN 'archive' THEN 15000 -- 15s avg access time
      ELSE 1000
    END) as avg_access_time_ms

  FROM GRIDFS_FILES('media_files')
  WHERE upload_timestamp >= CURRENT_TIMESTAMP - INTERVAL '365 days'
  GROUP BY storage_tier
),

optimization_opportunities AS (
  SELECT 
    file_id,
    filename,
    storage_tier,
    file_size_bytes,
    access_frequency_score,
    EXTRACT(DAY FROM (CURRENT_TIMESTAMP - last_accessed)) as days_since_access,
    download_count,

    -- Current cost
    file_size_bytes * CASE storage_tier
      WHEN 'hot' THEN 0.023
      WHEN 'warm' THEN 0.0125  
      WHEN 'cold' THEN 0.004
      WHEN 'archive' THEN 0.001
      ELSE 0.023
    END / 1024.0 / 1024.0 / 1024.0 as current_monthly_cost_usd,

    -- Recommended tier based on access patterns
    CASE 
      WHEN access_frequency_score > 0.7 OR days_since_access <= 7 THEN 'hot'
      WHEN access_frequency_score > 0.3 OR days_since_access <= 30 THEN 'warm'
      WHEN access_frequency_score > 0.1 OR days_since_access <= 90 THEN 'cold'
      ELSE 'archive'
    END as recommended_tier,

    -- Potential savings calculation
    CASE 
      WHEN access_frequency_score > 0.7 OR days_since_access <= 7 THEN 0.023
      WHEN access_frequency_score > 0.3 OR days_since_access <= 30 THEN 0.0125
      WHEN access_frequency_score > 0.1 OR days_since_access <= 90 THEN 0.004
      ELSE 0.001
    END as recommended_cost_per_gb

  FROM GRIDFS_FILES('media_files')
  WHERE upload_timestamp >= CURRENT_TIMESTAMP - INTERVAL '365 days'
)

SELECT 
  css.storage_tier as current_tier,
  css.file_count,
  ROUND(css.total_bytes / 1024.0 / 1024.0 / 1024.0, 2) as storage_gb,
  ROUND(css.avg_file_size / 1024.0 / 1024.0, 2) as avg_file_size_mb,
  ROUND(css.avg_access_frequency, 3) as avg_access_frequency,
  ROUND(css.avg_days_since_access, 1) as avg_days_since_access,
  ROUND(css.estimated_monthly_cost_usd, 2) as current_monthly_cost_usd,
  ROUND(css.avg_access_time_ms, 0) as avg_access_time_ms,

  -- Optimization analysis
  (SELECT COUNT(*) 
   FROM optimization_opportunities oo 
   WHERE oo.storage_tier = css.storage_tier 
   AND oo.recommended_tier != oo.storage_tier) as files_needing_optimization,

  (SELECT SUM(ABS(oo.current_monthly_cost_usd - 
                   (oo.file_size_bytes * oo.recommended_cost_per_gb / 1024.0 / 1024.0 / 1024.0)))
   FROM optimization_opportunities oo 
   WHERE oo.storage_tier = css.storage_tier 
   AND oo.recommended_tier != oo.storage_tier) as potential_monthly_savings_usd,

  -- Tier health assessment
  CASE 
    WHEN css.avg_access_frequency < 0.1 AND css.storage_tier = 'hot' THEN 'overprovisioned'
    WHEN css.avg_access_frequency > 0.6 AND css.storage_tier IN ('cold', 'archive') THEN 'underprovisioned' 
    WHEN css.avg_days_since_access > 90 AND css.storage_tier IN ('hot', 'warm') THEN 'tier_too_hot'
    WHEN css.avg_days_since_access < 30 AND css.storage_tier IN ('cold', 'archive') THEN 'tier_too_cold'
    ELSE 'appropriately_tiered'
  END as tier_health_status,

  -- Recommendations
  CASE 
    WHEN css.avg_access_frequency < 0.1 AND css.storage_tier = 'hot' THEN 'Move files to cold or archive storage'
    WHEN css.avg_access_frequency > 0.6 AND css.storage_tier IN ('cold', 'archive') THEN 'Move files to hot storage'
    WHEN css.avg_days_since_access > 180 AND css.storage_tier != 'archive' THEN 'Consider archiving old files'
    ELSE 'Current tiering appears appropriate'
  END as optimization_recommendation

FROM current_storage_state css
ORDER BY css.estimated_monthly_cost_usd DESC;

-- QueryLeaf provides comprehensive GridFS capabilities:
-- 1. SQL-familiar syntax for MongoDB GridFS bucket configuration and management
-- 2. Advanced file upload and download operations with progress tracking
-- 3. Comprehensive metadata management and content analysis integration
-- 4. Intelligent storage tier management with cost optimization
-- 5. File search and retrieval with advanced filtering and relevance scoring
-- 6. Performance monitoring and optimization recommendations
-- 7. Enterprise security and compliance features built-in
-- 8. Automated file processing pipelines with thumbnail generation
-- 9. Storage efficiency analysis with deduplication and compression
-- 10. Production-ready file management with scalable architecture

Best Practices for Production GridFS Deployment

File Storage Architecture Design Principles

Essential principles for effective MongoDB GridFS production deployment:

  1. Bucket Design Strategy: Organize files into logical buckets based on content type, access patterns, and retention requirements
  2. Chunk Size Optimization: Configure appropriate chunk sizes based on file types and access patterns for optimal performance (bucket and chunk-size choices are illustrated in the sketch after this list)
  3. Metadata Management: Design comprehensive metadata schemas for efficient searching, categorization, and content management
  4. Storage Tier Strategy: Implement intelligent storage tiering based on file access frequency and business requirements
  5. Security Integration: Establish comprehensive access controls, encryption, and audit logging for enterprise security
  6. Performance Monitoring: Monitor upload/download performance, storage efficiency, and system resource utilization
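
As a concrete illustration of points 1 and 2, the sketch below uses the official Node.js driver's GridFSBucket to create separate buckets per content class, each with a chunk size suited to its typical file size and access pattern. The bucket names, database name, and chunk sizes are illustrative choices rather than prescriptions.

// Hedged sketch: per-content-class buckets with tailored chunk sizes
const { MongoClient, GridFSBucket } = require('mongodb');

async function createTieredBuckets(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  const db = client.db('media'); // database name is illustrative

  // Small, frequently accessed assets: keep the 255 KB default chunk size
  const thumbnails = new GridFSBucket(db, { bucketName: 'thumbnails', chunkSizeBytes: 255 * 1024 });

  // Large, sequentially streamed video: larger chunks reduce per-chunk overhead
  const video = new GridFSBucket(db, { bucketName: 'video_masters', chunkSizeBytes: 4 * 1024 * 1024 });

  // Mixed-size documents with more random access: a middle-ground chunk size
  const documents = new GridFSBucket(db, { bucketName: 'documents', chunkSizeBytes: 1024 * 1024 });

  return { client, thumbnails, video, documents };
}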

Enterprise File Management

Design GridFS systems for enterprise-scale file operations:

  1. Content Processing Pipeline: Implement automated file processing for thumbnails, content analysis, and format optimization
  2. Disaster Recovery: Design backup strategies and cross-region replication for business continuity
  3. Compliance Management: Ensure file operations meet regulatory requirements and data retention policies
  4. API Integration: Build RESTful APIs and SDK integrations for seamless application development
  5. Monitoring and Alerting: Implement comprehensive monitoring for storage usage, performance, and operational health (a minimal polling sketch follows this list)
  6. Capacity Planning: Monitor growth patterns and plan storage capacity and performance requirements
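
To make the monitoring point concrete, here is a minimal polling sketch built on the getStorageStatistics() method of the file manager shown earlier. The capacity threshold, polling interval, and console-based alerting are illustrative assumptions; a production deployment would normally forward these metrics to an external monitoring and alerting system.

// Minimal storage monitoring sketch (threshold and interval are illustrative)
function startStorageMonitor(fileManager, { maxBytes = 500 * 1024 * 1024 * 1024, intervalMs = 60000 } = {}) {
  const timer = setInterval(async () => {
    const stats = await fileManager.getStorageStatistics();
    if (!stats.success) {
      console.error('Storage statistics unavailable:', stats.error);
      return;
    }

    const { totalFiles, totalBytes } = stats.systemStatistics;
    console.log(`GridFS usage: ${totalFiles} files, ${(totalBytes / 1024 / 1024 / 1024).toFixed(2)} GB`);

    if (totalBytes > maxBytes) {
      // Replace with a real alerting integration (pager, chat webhook, etc.)
      console.warn(`ALERT: GridFS usage at ${((totalBytes / maxBytes) * 100).toFixed(1)}% of configured capacity`);
    }
  }, intervalMs);

  return () => clearInterval(timer);
}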

Conclusion

MongoDB GridFS provides comprehensive large file storage capabilities that enable sophisticated binary data management, efficient streaming operations, and integrated metadata handling through distributed chunk-based storage, automatic replication, and transactional consistency. The native file management tools and streaming interfaces ensure that applications can handle large files efficiently with minimal infrastructure complexity.

Key MongoDB GridFS benefits include:

  • Efficient Binary Storage: Advanced chunk-based storage with compression, deduplication, and intelligent space optimization
  • Integrated Metadata Management: Comprehensive metadata handling with full-text search, tagging, and content analysis capabilities
  • Streaming Operations: High-performance upload and download streaming with progress tracking and parallel processing
  • Distributed Architecture: Built-in replication and distributed storage through MongoDB's replica set technology
  • Transaction Integration: Full transactional consistency between file operations and database operations within MongoDB
  • SQL Accessibility: Familiar SQL-style file management operations through QueryLeaf for accessible binary data operations

Whether you're building document management systems, media streaming platforms, enterprise content repositories, or distributed file storage solutions, MongoDB GridFS with QueryLeaf's familiar SQL interface provides the foundation for sophisticated, scalable file operations.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB GridFS operations while providing SQL-familiar syntax for file storage, retrieval, and management. Advanced file processing, content analysis, and storage optimization are seamlessly handled through familiar SQL constructs, making sophisticated binary data management accessible to SQL-oriented development teams.

The combination of MongoDB GridFS's robust file storage capabilities with SQL-style file operations makes it an ideal platform for applications requiring both large file handling and familiar database management patterns, ensuring your file storage infrastructure can scale efficiently while maintaining operational simplicity and developer productivity.

MongoDB Index Optimization and Query Performance Analysis: Advanced Database Performance Tuning and Query Optimization for High-Performance Applications

High-performance database applications require sophisticated indexing strategies and comprehensive query optimization techniques that can handle complex query patterns, large data volumes, and evolving access requirements while maintaining optimal response times. Traditional database optimization approaches often struggle with dynamic workloads, compound query patterns, and the complexity of managing multiple index strategies across diverse data access patterns, leading to suboptimal performance, excessive resource consumption, and operational challenges in production environments.

MongoDB provides comprehensive index optimization capabilities through advanced indexing strategies, sophisticated query analysis tools, and intelligent performance monitoring features that enable database administrators and developers to achieve optimal query performance with minimal resource overhead. Unlike traditional databases that require complex index tuning procedures and manual optimization workflows, MongoDB integrates performance analysis directly into the database with automated index recommendations, real-time query analysis, and built-in optimization guidance.
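
As a brief illustration of the built-in analysis tools referenced above, the sketch below uses two facilities that ship with MongoDB and its Node.js driver: explain() for per-query execution statistics and the $indexStats aggregation stage for per-index usage counters. The connection string, database, collection, and query shape are illustrative.

// Hedged sketch: inspecting query execution and index usage with built-in tools
const { MongoClient } = require('mongodb');

async function inspectQueryAndIndexes() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const orders = client.db('app').collection('orders'); // names are illustrative

  // Execution statistics: documents examined vs. returned, winning plan, execution time
  const plan = await orders
    .find({ status: 'pending', customerId: 12345 })
    .explain('executionStats');
  console.log(plan.executionStats.totalDocsExamined, plan.executionStats.nReturned);

  // Per-index usage counters maintained by the server since last restart
  const indexStats = await orders.aggregate([{ $indexStats: {} }]).toArray();
  console.log(indexStats.map(s => ({ name: s.name, ops: s.accesses.ops })));

  await client.close();
}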

The Traditional Query Performance Challenge

Conventional approaches to database query optimization in relational systems face significant limitations in performance analysis and index management:

-- Traditional PostgreSQL query optimization - manual index management with limited analysis capabilities

-- Basic index tracking table with minimal functionality
CREATE TABLE index_usage_stats (
    index_id SERIAL PRIMARY KEY,
    schema_name VARCHAR(100) NOT NULL,
    table_name VARCHAR(100) NOT NULL,
    index_name VARCHAR(100) NOT NULL,
    index_type VARCHAR(50),

    -- Basic usage statistics (very limited visibility)
    index_scans BIGINT DEFAULT 0,
    tuples_read BIGINT DEFAULT 0,
    tuples_fetched BIGINT DEFAULT 0,

    -- Size information (manual tracking)
    index_size_bytes BIGINT,
    table_size_bytes BIGINT,

    -- Basic metadata
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_analyzed TIMESTAMP,
    is_unique BOOLEAN DEFAULT false,
    is_partial BOOLEAN DEFAULT false,

    -- Simple effectiveness metrics
    scan_ratio DECIMAL(10,4),
    selectivity_estimate DECIMAL(10,4)
);

-- Query performance tracking table (basic functionality)
CREATE TABLE query_performance_log (
    query_id SERIAL PRIMARY KEY,
    query_hash VARCHAR(64),
    query_text TEXT,

    -- Basic execution metrics
    execution_time_ms INTEGER,
    rows_examined BIGINT,
    rows_returned BIGINT,
    execution_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Resource usage (limited tracking)
    cpu_usage_ms INTEGER,
    memory_usage_kb INTEGER,
    disk_reads INTEGER,

    -- Connection information
    database_name VARCHAR(100),
    username VARCHAR(100),
    application_name VARCHAR(100),

    -- Basic query plan information (very limited)
    query_plan_hash VARCHAR(64),
    index_usage TEXT[], -- Simple array of index names

    -- Performance classification
    performance_category VARCHAR(50) DEFAULT 'unknown'
);

-- Manual query plan analysis function (very basic capabilities)
CREATE OR REPLACE FUNCTION analyze_query_performance(
    query_text_param TEXT,
    execution_count INTEGER DEFAULT 1
) RETURNS TABLE (
    avg_execution_time_ms INTEGER,
    total_rows_examined BIGINT,
    total_rows_returned BIGINT,
    selectivity_ratio DECIMAL(10,4),
    suggested_indexes TEXT[],
    performance_rating VARCHAR(20)
) AS $$
DECLARE
    total_execution_time INTEGER := 0;
    total_examined BIGINT := 0;
    total_returned BIGINT := 0;
    execution_counter INTEGER := 0;
    current_execution_time INTEGER;
    current_examined BIGINT;
    current_returned BIGINT;
    plan_info TEXT;
BEGIN
    -- Simulate multiple query executions for analysis
    WHILE execution_counter < execution_count LOOP
        -- Execute EXPLAIN ANALYZE (simplified simulation)
        BEGIN
            -- This would be an actual EXPLAIN ANALYZE in reality
            EXECUTE 'EXPLAIN ANALYZE ' || query_text_param INTO plan_info;

            -- Extract basic metrics (very simplified parsing)
            current_execution_time := (random() * 1000 + 10)::INTEGER; -- Simulated execution time
            current_examined := (random() * 10000 + 100)::BIGINT; -- Simulated rows examined
            current_returned := (random() * 1000 + 10)::BIGINT; -- Simulated rows returned

            total_execution_time := total_execution_time + current_execution_time;
            total_examined := total_examined + current_examined;
            total_returned := total_returned + current_returned;

            -- Log query performance
            INSERT INTO query_performance_log (
                query_text,
                execution_time_ms,
                rows_examined,
                rows_returned,
                query_plan_hash
            ) VALUES (
                query_text_param,
                current_execution_time,
                current_examined,
                current_returned,
                md5(plan_info)
            );

        EXCEPTION WHEN OTHERS THEN
            -- Basic error handling
            current_execution_time := 9999; -- Error indicator
            current_examined := 0;
            current_returned := 0;
        END;

        execution_counter := execution_counter + 1;
    END LOOP;

    -- Calculate average metrics
    RETURN QUERY SELECT 
        (total_execution_time / execution_count)::INTEGER,
        total_examined,
        total_returned,
        CASE 
            WHEN total_examined > 0 THEN (total_returned::DECIMAL / total_examined)
            ELSE 0
        END,

        -- Very basic index suggestions (limited analysis)
        CASE 
            WHEN total_execution_time > 1000 THEN ARRAY['Consider adding indexes on WHERE clause columns']
            WHEN total_examined > total_returned * 10 THEN ARRAY['Add indexes to improve selectivity']
            ELSE ARRAY['Performance appears acceptable']
        END::TEXT[],

        -- Simple performance rating
        CASE 
            WHEN total_execution_time < 100 THEN 'excellent'
            WHEN total_execution_time < 500 THEN 'good'
            WHEN total_execution_time < 1000 THEN 'acceptable'
            ELSE 'poor'
        END;

END;
$$ LANGUAGE plpgsql;

-- Execute query performance analysis (basic functionality)
SELECT * FROM analyze_query_performance('SELECT * FROM users WHERE email = ''test@example.com'' AND created_at > ''2023-01-01''', 5);

-- Index effectiveness monitoring (limited capabilities)
WITH index_effectiveness AS (
    SELECT 
        ius.schema_name,
        ius.table_name,
        ius.index_name,
        ius.index_type,
        ius.index_scans,
        ius.tuples_read,
        ius.tuples_fetched,
        ius.index_size_bytes,

        -- Basic effectiveness calculations
        CASE 
            WHEN ius.index_scans > 0 AND ius.tuples_read > 0 THEN
                ius.tuples_fetched::DECIMAL / ius.tuples_read
            ELSE 0
        END as fetch_ratio,

        CASE 
            WHEN ius.table_size_bytes > 0 AND ius.index_size_bytes > 0 THEN
                (ius.index_size_bytes::DECIMAL / ius.table_size_bytes) * 100
            ELSE 0
        END as size_overhead_percent,

        -- Usage frequency analysis
        CASE 
            WHEN ius.index_scans = 0 THEN 'unused'
            WHEN ius.index_scans < 10 THEN 'rarely_used'
            WHEN ius.index_scans < 100 THEN 'moderately_used'
            ELSE 'frequently_used'
        END as usage_category

    FROM index_usage_stats ius
    WHERE ius.last_analyzed >= CURRENT_DATE - INTERVAL '7 days'
),

query_patterns AS (
    SELECT 
        qpl.database_name,
        qpl.query_hash,
        COUNT(*) as execution_count,
        AVG(qpl.execution_time_ms) as avg_execution_time,
        MAX(qpl.execution_time_ms) as max_execution_time,
        AVG(qpl.rows_examined) as avg_rows_examined,
        AVG(qpl.rows_returned) as avg_rows_returned,

        -- Performance trend analysis (very basic)
        CASE 
            WHEN COUNT(*) > 100 AND AVG(qpl.execution_time_ms) > 500 THEN 'high_impact_slow'
            WHEN COUNT(*) > 1000 THEN 'high_frequency'
            WHEN AVG(qpl.execution_time_ms) > 1000 THEN 'slow_query'
            ELSE 'normal'
        END as query_pattern_type,

        -- Index usage analysis from query logs
        -- Flatten each row's index_usage array before aggregating (unnest cannot be used inside an aggregate)
        STRING_AGG(DISTINCT array_to_string(qpl.index_usage, ', '), '; ') as indexes_used,

        -- Execution time trends
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY qpl.execution_time_ms) as p95_execution_time

    FROM query_performance_log qpl
    WHERE qpl.execution_timestamp >= CURRENT_DATE - INTERVAL '7 days'
    GROUP BY qpl.database_name, qpl.query_hash
)

SELECT 
    ie.schema_name,
    ie.table_name,
    ie.index_name,
    ie.index_type,
    ie.usage_category,

    -- Index effectiveness metrics
    ie.index_scans,
    ROUND(ie.fetch_ratio, 4) as selectivity_ratio,
    ROUND(ie.size_overhead_percent, 2) as size_overhead_percent,

    -- Size analysis
    ROUND(ie.index_size_bytes / 1024.0 / 1024.0, 2) as index_size_mb,

    -- Related query patterns
    COUNT(qp.query_hash) as related_query_patterns,
    COALESCE(AVG(qp.avg_execution_time), 0) as avg_query_time_using_index,
    COALESCE(AVG(qp.avg_rows_examined), 0) as avg_rows_examined,

    -- Index recommendations (very basic logic)
    CASE 
        WHEN ie.usage_category = 'unused' AND ie.index_size_bytes > 100*1024*1024 THEN 'consider_dropping'
        WHEN ie.fetch_ratio < 0.1 AND ie.index_scans > 0 THEN 'poor_selectivity'
        WHEN ie.usage_category = 'frequently_used' AND ie.fetch_ratio > 0.8 THEN 'high_performance'
        WHEN ie.size_overhead_percent > 50 THEN 'review_necessity'
        ELSE 'monitor'
    END as recommendation,

    -- Performance impact assessment
    CASE 
        WHEN ie.usage_category IN ('frequently_used', 'moderately_used') AND ie.fetch_ratio > 0.5 THEN 'positive_impact'
        WHEN ie.usage_category = 'unused' THEN 'no_impact'
        WHEN ie.fetch_ratio < 0.1 THEN 'negative_impact'
        ELSE 'unclear_impact'
    END as performance_impact

FROM index_effectiveness ie
LEFT JOIN query_patterns qp ON qp.indexes_used LIKE '%' || ie.index_name || '%'
GROUP BY 
    ie.schema_name, ie.table_name, ie.index_name, ie.index_type, 
    ie.usage_category, ie.index_scans, ie.fetch_ratio, 
    ie.size_overhead_percent, ie.index_size_bytes
ORDER BY 
    ie.index_scans DESC, 
    ie.fetch_ratio DESC,
    ie.index_size_bytes DESC;

-- Query optimization recommendations (very limited analysis)
WITH slow_queries AS (
    SELECT 
        query_hash,
        query_text,
        COUNT(*) as execution_count,
        AVG(execution_time_ms) as avg_time,
        MAX(execution_time_ms) as max_time,
        AVG(rows_examined) as avg_examined,
        AVG(rows_returned) as avg_returned,

        -- Basic pattern detection
        CASE 
            WHEN query_text ILIKE '%WHERE%=%' THEN 'equality_filter'
            WHEN query_text ILIKE '%WHERE%>%' OR query_text ILIKE '%WHERE%<%' THEN 'range_filter'
            WHEN query_text ILIKE '%ORDER BY%' THEN 'sorting'
            WHEN query_text ILIKE '%GROUP BY%' THEN 'aggregation'
            ELSE 'unknown_pattern'
        END as query_pattern

    FROM query_performance_log
    WHERE execution_time_ms > 500  -- Focus on slow queries
    AND execution_timestamp >= CURRENT_DATE - INTERVAL '24 hours'
    GROUP BY query_hash, query_text
    HAVING COUNT(*) >= 5  -- Frequently executed slow queries
)

SELECT 
    sq.query_hash,
    LEFT(sq.query_text, 100) || '...' as query_preview,
    sq.execution_count,
    ROUND(sq.avg_time, 0) as avg_execution_ms,
    sq.max_time as max_execution_ms,
    ROUND(sq.avg_examined, 0) as avg_rows_examined,
    ROUND(sq.avg_returned, 0) as avg_rows_returned,
    sq.query_pattern,

    -- Selectivity analysis
    CASE 
        WHEN sq.avg_examined > 0 THEN 
            ROUND((sq.avg_returned / sq.avg_examined) * 100, 2)
        ELSE 0
    END as selectivity_percent,

    -- Impact assessment
    ROUND(sq.execution_count * sq.avg_time, 0) as total_time_impact_ms,

    -- Basic optimization suggestions (very limited)
    CASE 
        WHEN sq.query_pattern = 'equality_filter' AND sq.avg_examined > sq.avg_returned * 10 THEN 
            'Add single-column index on equality filter columns'
        WHEN sq.query_pattern = 'range_filter' AND sq.avg_time > 1000 THEN 
            'Consider range-optimized index or query rewrite'
        WHEN sq.query_pattern = 'sorting' AND sq.avg_time > 800 THEN 
            'Add index supporting ORDER BY clause'
        WHEN sq.query_pattern = 'aggregation' AND sq.avg_examined > 10000 THEN 
            'Consider partial index or pre-aggregated data'
        WHEN sq.avg_examined > sq.avg_returned * 100 THEN 
            'Review query selectivity and indexing strategy'
        ELSE 'Manual analysis required'
    END as optimization_suggestion,

    -- Priority assessment
    CASE 
        WHEN sq.execution_count > 100 AND sq.avg_time > 1000 THEN 'high'
        WHEN sq.execution_count > 50 OR sq.avg_time > 2000 THEN 'medium'
        ELSE 'low'
    END as optimization_priority

FROM slow_queries sq
ORDER BY 
    CASE 
        WHEN sq.execution_count > 100 AND sq.avg_time > 1000 THEN 1
        WHEN sq.execution_count > 50 OR sq.avg_time > 2000 THEN 2
        ELSE 3
    END,
    (sq.execution_count * sq.avg_time) DESC;

-- Problems with traditional query optimization approaches:
-- 1. Manual index management with no automated recommendations
-- 2. Limited query plan analysis and optimization guidance
-- 3. Basic performance metrics with no comprehensive analysis
-- 4. No real-time query performance monitoring
-- 5. Minimal index effectiveness assessment
-- 6. Complex manual tuning procedures requiring deep database expertise
-- 7. No support for compound index optimization strategies
-- 8. Limited visibility into query execution patterns and resource usage
-- 9. Basic alerting with no proactive optimization suggestions
-- 10. No integration with application performance monitoring systems

MongoDB provides comprehensive index optimization with advanced query performance analysis capabilities:

// MongoDB Advanced Index Optimization and Query Performance Analysis
const { MongoClient } = require('mongodb');
const { EventEmitter } = require('events');

// Comprehensive MongoDB Performance Optimizer
class AdvancedPerformanceOptimizer extends EventEmitter {
  constructor(mongoUri, optimizationConfig = {}) {
    super();
    this.mongoUri = mongoUri;
    this.client = null;
    this.db = null;

    // Advanced optimization configuration
    this.config = {
      // Performance analysis configuration
      enableQueryProfiling: optimizationConfig.enableQueryProfiling !== false,
      profilingSampleRate: optimizationConfig.profilingSampleRate || 0.1,
      slowQueryThresholdMs: optimizationConfig.slowQueryThresholdMs || 100,

      // Index optimization settings
      enableAutomaticIndexRecommendations: optimizationConfig.enableAutomaticIndexRecommendations !== false,
      enableIndexUsageAnalysis: optimizationConfig.enableIndexUsageAnalysis !== false,
      enableCompoundIndexOptimization: optimizationConfig.enableCompoundIndexOptimization || false,

      // Monitoring and alerting
      enablePerformanceMonitoring: optimizationConfig.enablePerformanceMonitoring !== false,
      enableRealTimeAnalysis: optimizationConfig.enableRealTimeAnalysis || false,
      enablePerformanceAlerting: optimizationConfig.enablePerformanceAlerting || false,

      // Analysis parameters
      analysisWindowHours: optimizationConfig.analysisWindowHours || 24,
      minQueryExecutions: optimizationConfig.minQueryExecutions || 10,
      indexUsageThreshold: optimizationConfig.indexUsageThreshold || 0.1,

      // Resource optimization
      enableResourceOptimization: optimizationConfig.enableResourceOptimization || false,
      enableQueryPlanCaching: optimizationConfig.enableQueryPlanCaching !== false,
      enableConnectionPoolOptimization: optimizationConfig.enableConnectionPoolOptimization || false
    };

    // Performance tracking and analysis state
    this.queryPatterns = new Map();
    this.indexUsageStats = new Map();
    this.performanceMetrics = new Map();
    this.optimizationRecommendations = [];

    // Query execution tracking
    this.queryExecutionHistory = [];
    this.slowQueryLog = [];
    this.indexEffectivenessCache = new Map();

    this.initializePerformanceOptimizer();
  }

  async initializePerformanceOptimizer() {
    console.log('Initializing advanced MongoDB performance optimizer...');

    try {
      // Connect to MongoDB
      this.client = new MongoClient(this.mongoUri, {
        // Optimized connection settings
        maxPoolSize: 20,
        minPoolSize: 5,
        maxIdleTimeMS: 30000,
        serverSelectionTimeoutMS: 5000,
        heartbeatFrequencyMS: 10000
      });

      await this.client.connect();
      this.db = this.client.db();

      // Setup performance monitoring infrastructure
      await this.setupPerformanceInfrastructure();

      // Enable query profiling if configured
      if (this.config.enableQueryProfiling) {
        await this.enableQueryProfiling();
      }

      // Start real-time monitoring if enabled
      if (this.config.enableRealTimeAnalysis) {
        await this.startRealTimeMonitoring();
      }

      // Initialize index analysis
      if (this.config.enableIndexUsageAnalysis) {
        await this.initializeIndexAnalysis();
      }

      console.log('Advanced performance optimizer initialized successfully');

    } catch (error) {
      console.error('Error initializing performance optimizer:', error);
      throw error;
    }
  }

  async setupPerformanceInfrastructure() {
    console.log('Setting up performance monitoring infrastructure...');

    try {
      // Create collections for performance tracking
      const collections = {
        queryPerformanceLog: this.db.collection('query_performance_log'),
        indexUsageStats: this.db.collection('index_usage_stats'),
        performanceMetrics: this.db.collection('performance_metrics'),
        optimizationRecommendations: this.db.collection('optimization_recommendations'),
        queryPatterns: this.db.collection('query_patterns')
      };

      // Create indexes for performance collections.
      // TTL indexes must be single-field, so retention is enforced with dedicated
      // TTL indexes on the timestamp field rather than on the compound query indexes.
      await collections.queryPerformanceLog.createIndex(
        { timestamp: -1, executionTimeMs: -1 },
        { background: true }
      );
      await collections.queryPerformanceLog.createIndex(
        { timestamp: 1 },
        { expireAfterSeconds: 7 * 24 * 60 * 60 } // 7 days retention
      );

      await collections.indexUsageStats.createIndex(
        { collection: 1, indexName: 1, timestamp: -1 },
        { background: true }
      );

      await collections.performanceMetrics.createIndex(
        { metricType: 1, timestamp: -1 },
        { background: true }
      );
      await collections.performanceMetrics.createIndex(
        { timestamp: 1 },
        { expireAfterSeconds: 30 * 24 * 60 * 60 } // 30 days retention
      );

      this.collections = collections;

    } catch (error) {
      console.error('Error setting up performance infrastructure:', error);
      throw error;
    }
  }

  async enableQueryProfiling() {
    console.log('Enabling MongoDB query profiling...');

    try {
      // Set profiling level based on configuration.
      // The profile command applies to the database it is run against, so issue it
      // on this.db rather than the admin database. Level 1 captures operations slower
      // than slowms, and sampleRate controls what fraction of those are recorded.
      await this.db.command({
        profile: 1,
        slowms: this.config.slowQueryThresholdMs,
        sampleRate: this.config.profilingSampleRate
      });

      console.log(`Query profiling enabled with ${this.config.slowQueryThresholdMs}ms threshold and ${this.config.profilingSampleRate} sample rate`);

    } catch (error) {
      console.error('Error enabling query profiling:', error);
      // Don't throw - profiling is optional
    }
  }

  async analyzeQueryPerformance(timeRangeHours = 24) {
    console.log(`Analyzing query performance for the last ${timeRangeHours} hours...`);

    try {
      const analysisStartTime = new Date(Date.now() - (timeRangeHours * 60 * 60 * 1000));

      // Analyze profiler data for slow queries and patterns
      const slowQueries = await this.analyzeSlowQueries(analysisStartTime);
      const queryPatterns = await this.analyzeQueryPatterns(analysisStartTime);
      const indexUsageAnalysis = await this.analyzeIndexUsage(analysisStartTime);

      // Generate performance insights
      const performanceInsights = {
        analysisTimestamp: new Date(),
        timeRangeHours: timeRangeHours,

        // Query performance summary
        queryPerformanceSummary: {
          totalQueries: slowQueries.totalQueries,
          slowQueries: slowQueries.slowQueryCount,
          averageExecutionTime: slowQueries.averageExecutionTime,
          p95ExecutionTime: slowQueries.p95ExecutionTime,
          p99ExecutionTime: slowQueries.p99ExecutionTime,

          // Query type distribution
          queryTypeDistribution: queryPatterns.queryTypeDistribution,

          // Resource usage patterns
          resourceUsage: {
            totalExaminedDocuments: slowQueries.totalExaminedDocuments,
            totalReturnedDocuments: slowQueries.totalReturnedDocuments,
            averageSelectivityRatio: slowQueries.averageSelectivityRatio
          }
        },

        // Index effectiveness analysis
        indexEffectiveness: {
          totalIndexes: indexUsageAnalysis.totalIndexes,
          activelyUsedIndexes: indexUsageAnalysis.activelyUsedIndexes,
          unusedIndexes: indexUsageAnalysis.unusedIndexes,
          inefficientIndexes: indexUsageAnalysis.inefficientIndexes,

          // Index usage patterns
          indexUsagePatterns: indexUsageAnalysis.usagePatterns,

          // Index performance metrics
          averageIndexSelectivity: indexUsageAnalysis.averageSelectivity,
          indexSizeOverhead: indexUsageAnalysis.totalIndexSizeBytes
        },

        // Performance bottlenecks
        performanceBottlenecks: await this.identifyPerformanceBottlenecks(slowQueries, queryPatterns, indexUsageAnalysis),

        // Optimization opportunities
        optimizationOpportunities: await this.generateOptimizationRecommendations(slowQueries, queryPatterns, indexUsageAnalysis)
      };

      // Store performance analysis results
      await this.collections.performanceMetrics.insertOne({
        metricType: 'comprehensive_analysis',
        timestamp: new Date(),
        analysisResults: performanceInsights
      });

      this.emit('performanceAnalysisCompleted', performanceInsights);

      return {
        success: true,
        analysisResults: performanceInsights
      };

    } catch (error) {
      console.error('Error analyzing query performance:', error);
      return {
        success: false,
        error: error.message
      };
    }
  }

  async analyzeSlowQueries(startTime) {
    console.log('Analyzing slow query patterns...');

    try {
      // Query the profiler collection for slow queries
      const profilerCollection = this.db.collection('system.profile');

      const slowQueryAggregation = [
        {
          $match: {
            ts: { $gte: startTime },
            op: { $in: ['query', 'getmore'] }, // Focus on read operations
            millis: { $gte: this.config.slowQueryThresholdMs }
          }
        },
        {
          $addFields: {
            // Normalize query shape for pattern analysis
            queryShape: {
              $function: {
                body: function(command) {
                  // Simplified query shape normalization
                  // Profiler entries store find commands as
                  // { find: "<collection>", filter: {...}, sort: {...}, projection: {...} }
                  if (!command || !command.find) return 'unknown';

                  const filter = command.filter || {};
                  const sort = command.sort || {};
                  const projection = command.projection || {};

                  // Create shape by replacing values with type indicators
                  const shapeFilter = Object.keys(filter).reduce((acc, key) => {
                    acc[key] = typeof filter[key];
                    return acc;
                  }, {});

                  return JSON.stringify({
                    filter: shapeFilter,
                    sort: Object.keys(sort),
                    projection: Object.keys(projection)
                  });
                },
                args: ['$command'],
                lang: 'js'
              }
            },

            // Extract collection name
            targetCollection: {
              $ifNull: ['$command.find', '$command.collection']
            },

            // Calculate selectivity ratio
            selectivityRatio: {
              $cond: [
                { $and: [{ $gt: ['$docsExamined', 0] }, { $gt: ['$nreturned', 0] }] },
                { $divide: ['$nreturned', '$docsExamined'] },
                0
              ]
            }
          }
        },
        {
          $group: {
            _id: {
              queryShape: '$queryShape',
              collection: '$targetCollection'
            },

            // Execution statistics
            executionCount: { $sum: 1 },
            totalExecutionTime: { $sum: '$millis' },
            averageExecutionTime: { $avg: '$millis' },
            maxExecutionTime: { $max: '$millis' },
            minExecutionTime: { $min: '$millis' },

            // Document examination statistics
            totalDocsExamined: { $sum: '$docsExamined' },
            totalDocsReturned: { $sum: '$nreturned' },
            averageSelectivity: { $avg: '$selectivityRatio' },

            // Index usage tracking
            indexesUsed: { $addToSet: '$planSummary' },

            // Resource usage
            totalKeysExamined: { $sum: '$keysExamined' },

            // Sample query for reference
            sampleQuery: { $first: '$command' },
            sampleTimestamp: { $first: '$ts' }
          }
        },
        {
          $addFields: {
            // Calculate performance impact
            performanceImpact: {
              $multiply: ['$executionCount', '$averageExecutionTime']
            },

            // Assess query efficiency
            queryEfficiency: {
              $cond: [
                { $gt: ['$averageSelectivity', 0.1] },
                'efficient',
                { $cond: [{ $gt: ['$averageSelectivity', 0.01] }, 'moderate', 'inefficient'] }
              ]
            }
          }
        },
        {
          $sort: { performanceImpact: -1 }
        },
        {
          $limit: 100 // Top 100 slow query patterns
        }
      ];

      const slowQueryResults = await profilerCollection.aggregate(slowQueryAggregation).toArray();

      // Calculate summary statistics
      const totalQueries = slowQueryResults.reduce((sum, query) => sum + query.executionCount, 0);
      const totalExecutionTime = slowQueryResults.reduce((sum, query) => sum + query.totalExecutionTime, 0);
      const allExecutionTimes = slowQueryResults.flatMap(query => Array(query.executionCount).fill(query.averageExecutionTime));

      // Calculate percentiles
      allExecutionTimes.sort((a, b) => a - b);
      const p95Index = Math.floor(allExecutionTimes.length * 0.95);
      const p99Index = Math.floor(allExecutionTimes.length * 0.99);

      return {
        slowQueryPatterns: slowQueryResults,
        totalQueries: totalQueries,
        slowQueryCount: slowQueryResults.length,
        averageExecutionTime: totalQueries > 0 ? totalExecutionTime / totalQueries : 0,
        p95ExecutionTime: allExecutionTimes[p95Index] || 0,
        p99ExecutionTime: allExecutionTimes[p99Index] || 0,
        totalExaminedDocuments: slowQueryResults.reduce((sum, query) => sum + query.totalDocsExamined, 0),
        totalReturnedDocuments: slowQueryResults.reduce((sum, query) => sum + query.totalDocsReturned, 0),
        averageSelectivityRatio: slowQueryResults.length > 0 
          ? slowQueryResults.reduce((sum, query) => sum + (query.averageSelectivity || 0), 0) / slowQueryResults.length 
          : 0
      };

    } catch (error) {
      console.error('Error analyzing slow queries:', error);
      throw error;
    }
  }

  async analyzeQueryPatterns(startTime) {
    console.log('Analyzing query execution patterns...');

    try {
      const profilerCollection = this.db.collection('system.profile');

      // Analyze query type distribution and patterns
      const queryPatternAggregation = [
        {
          $match: {
            ts: { $gte: startTime },
            op: { $in: ['query', 'getmore', 'update', 'delete', 'insert'] }
          }
        },
        {
          $addFields: {
            // Categorize query operations
            queryCategory: {
              $switch: {
                branches: [
                  {
                    case: { $eq: ['$op', 'query'] },
                    then: {
                      $cond: [
                        { $ifNull: ['$command.sort', false] },
                        'sorted_query',
                        { $cond: [
                          { $gt: [{ $size: { $objectToArray: { $ifNull: ['$command.filter', {}] } } }, 0] },
                          'filtered_query',
                          'full_scan'
                        ]}
                      ]
                    }
                  },
                  { case: { $eq: ['$op', 'update'] }, then: 'update_operation' },
                  { case: { $eq: ['$op', 'delete'] }, then: 'delete_operation' },
                  { case: { $eq: ['$op', 'insert'] }, then: 'insert_operation' }
                ],
                default: 'other_operation'
              }
            },

            // Analyze query complexity
            queryComplexity: {
              $switch: {
                branches: [
                  {
                    case: { $and: [
                      { $eq: ['$op', 'query'] },
                      { $gt: [{ $size: { $objectToArray: { $ifNull: ['$command.filter', {}] } } }, 5] }
                    ]},
                    then: 'complex'
                  },
                  {
                    case: { $and: [
                      { $eq: ['$op', 'query'] },
                      { $gt: [{ $size: { $objectToArray: { $ifNull: ['$command.filter', {}] } } }, 2] }
                    ]},
                    then: 'moderate'
                  }
                ],
                default: 'simple'
              }
            }
          }
        },
        {
          $group: {
            _id: {
              collection: { $ifNull: ['$command.find', '$command.collection', '$ns'] },
              queryCategory: '$queryCategory',
              queryComplexity: '$queryComplexity'
            },

            // Pattern statistics
            executionCount: { $sum: 1 },
            averageExecutionTime: { $avg: '$millis' },
            totalExecutionTime: { $sum: '$millis' },

            // Resource usage patterns
            averageDocsExamined: { $avg: '$docsExamined' },
            averageDocsReturned: { $avg: '$nreturned' },

            // Index usage patterns
            commonIndexes: { $addToSet: '$planSummary' },

            // Performance characteristics
            maxExecutionTime: { $max: '$millis' },
            minExecutionTime: { $min: '$millis' }
          }
        },
        {
          $sort: { totalExecutionTime: -1 }
        }
      ];

      const queryPatternResults = await profilerCollection.aggregate(queryPatternAggregation).toArray();

      // Calculate query type distribution
      const queryTypeDistribution = queryPatternResults.reduce((distribution, pattern) => {
        const category = pattern._id.queryCategory;
        if (!distribution[category]) {
          distribution[category] = {
            count: 0,
            totalTime: 0,
            avgTime: 0
          };
        }

        distribution[category].count += pattern.executionCount;
        distribution[category].totalTime += pattern.totalExecutionTime;
        distribution[category].avgTime = distribution[category].totalTime / distribution[category].count;

        return distribution;
      }, {});

      return {
        queryPatterns: queryPatternResults,
        queryTypeDistribution: queryTypeDistribution,
        totalPatterns: queryPatternResults.length
      };

    } catch (error) {
      console.error('Error analyzing query patterns:', error);
      throw error;
    }
  }

  async analyzeIndexUsage(startTime) {
    console.log('Analyzing index usage effectiveness...');

    try {
      // Get all collections for comprehensive index analysis
      const collections = await this.db.listCollections().toArray();
      const indexAnalysisResults = [];

      for (const collectionInfo of collections) {
        if (collectionInfo.type === 'collection') {
          const collection = this.db.collection(collectionInfo.name);

          // Get index definitions; indexes() does not report sizes, so read them from $collStats
          const indexes = await collection.indexes();
          const [collStats] = await collection.aggregate([
            { $collStats: { storageStats: {} } }
          ]).toArray();
          const indexSizes = (collStats && collStats.storageStats && collStats.storageStats.indexSizes) || {};

          // Analyze each index, attaching the measured size for effectiveness calculations
          for (const index of indexes) {
            index.size = indexSizes[index.name] || 0;
            try {
              // Get index usage statistics
              const indexStats = await collection.aggregate([
                { $indexStats: {} },
                { $match: { name: index.name } }
              ]).toArray();

              const indexStat = indexStats[0];

              if (indexStat) {
                // Calculate index effectiveness metrics
                const indexAnalysis = {
                  collection: collectionInfo.name,
                  indexName: index.name,
                  indexKeys: index.key,
                  indexType: this.determineIndexType(index),

                  // Usage statistics
                  usageCount: indexStat.accesses?.ops || 0,
                  lastUsed: indexStat.accesses?.since || null,

                  // Size and storage information
                  indexSize: index.size || 0,

                  // Effectiveness calculations
                  usageEffectiveness: this.calculateIndexEffectiveness(indexStat, index),

                  // Index health assessment
                  healthStatus: this.assessIndexHealth(indexStat, index),

                  // Optimization opportunities
                  optimizationOpportunities: await this.identifyIndexOptimizations(collection, index, indexStat)
                };

                indexAnalysisResults.push(indexAnalysis);
              }

            } catch (indexError) {
              console.warn(`Error analyzing index ${index.name} on ${collectionInfo.name}:`, indexError.message);
            }
          }
        }
      }

      // Calculate summary statistics
      const totalIndexes = indexAnalysisResults.length;
      const activelyUsedIndexes = indexAnalysisResults.filter(index => index.usageCount > 0).length;
      const unusedIndexes = indexAnalysisResults.filter(index => index.usageCount === 0);
      const inefficientIndexes = indexAnalysisResults.filter(index => 
        index.healthStatus === 'inefficient' || index.usageEffectiveness < 0.1
      );

      // Analyze usage patterns
      const usagePatterns = this.analyzeIndexUsagePatterns(indexAnalysisResults);

      return {
        indexAnalysisResults: indexAnalysisResults,
        totalIndexes: totalIndexes,
        activelyUsedIndexes: activelyUsedIndexes,
        unusedIndexes: unusedIndexes,
        inefficientIndexes: inefficientIndexes,
        usagePatterns: usagePatterns,
        averageSelectivity: this.calculateAverageIndexSelectivity(indexAnalysisResults),
        totalIndexSizeBytes: indexAnalysisResults.reduce((total, index) => total + (index.indexSize || 0), 0)
      };

    } catch (error) {
      console.error('Error analyzing index usage:', error);
      throw error;
    }
  }

  async generateOptimizationRecommendations(slowQueries, queryPatterns, indexUsage) {
    console.log('Generating performance optimization recommendations...');

    try {
      const recommendations = [];

      // Analyze slow queries for index recommendations
      for (const slowQuery of slowQueries.slowQueryPatterns) {
        if (slowQuery.averageSelectivity < 0.1 && slowQuery.executionCount > this.config.minQueryExecutions) {
          recommendations.push({
            type: 'index_recommendation',
            priority: 'high',
            collection: slowQuery._id.collection,
            issue: 'Low selectivity query pattern with high execution frequency',
            recommendation: await this.generateIndexRecommendation(slowQuery),
            expectedImprovement: this.estimatePerformanceImprovement(slowQuery),
            implementationComplexity: 'medium',
            estimatedImpact: slowQuery.performanceImpact
          });
        }
      }

      // Analyze unused indexes
      for (const unusedIndex of indexUsage.unusedIndexes) {
        if (unusedIndex.indexName !== '_id_') { // Never recommend dropping _id index
          recommendations.push({
            type: 'index_cleanup',
            priority: 'medium',
            collection: unusedIndex.collection,
            issue: `Unused index consuming storage space: ${unusedIndex.indexName}`,
            recommendation: `Consider dropping unused index '${unusedIndex.indexName}' to save ${Math.round((unusedIndex.indexSize || 0) / 1024 / 1024)} MB storage`,
            expectedImprovement: {
              storageReduction: unusedIndex.indexSize || 0,
              maintenanceOverheadReduction: 'low'
            },
            implementationComplexity: 'low',
            estimatedImpact: unusedIndex.indexSize || 0
          });
        }
      }

      // Analyze compound index opportunities
      if (this.config.enableCompoundIndexOptimization) {
        const compoundIndexOpportunities = await this.analyzeCompoundIndexOpportunities(queryPatterns);
        recommendations.push(...compoundIndexOpportunities);
      }

      // Sort recommendations by priority and estimated impact
      recommendations.sort((a, b) => {
        const priorityOrder = { high: 3, medium: 2, low: 1 };
        const priorityDiff = priorityOrder[b.priority] - priorityOrder[a.priority];

        if (priorityDiff !== 0) return priorityDiff;
        return (b.estimatedImpact || 0) - (a.estimatedImpact || 0);
      });

      return recommendations.slice(0, 20); // Return top 20 recommendations

    } catch (error) {
      console.error('Error generating optimization recommendations:', error);
      return [];
    }
  }

  async generateIndexRecommendation(slowQuery) {
    try {
      // Analyze the query shape to determine optimal index structure
      const queryShape = JSON.parse(slowQuery._id.queryShape);
      const filterFields = Object.keys(queryShape.filter || {});
      const sortFields = queryShape.sort || [];

      let recommendedIndex = {};

      // Build compound index recommendation based on query patterns
      // Rule 1: Equality filters first
      filterFields.forEach(field => {
        if (queryShape.filter[field] === 'string' || queryShape.filter[field] === 'number') {
          recommendedIndex[field] = 1;
        }
      });

      // Rule 2: Range filters after equality filters
      filterFields.forEach(field => {
        if (queryShape.filter[field] === 'object') { // Likely range query
          if (!recommendedIndex[field]) {
            recommendedIndex[field] = 1;
          }
        }
      });

      // Rule 3: Sort fields last
      sortFields.forEach(field => {
        if (!recommendedIndex[field]) {
          recommendedIndex[field] = 1;
        }
      });

      return {
        suggestedIndex: recommendedIndex,
        indexCommand: `db.${slowQuery._id.collection}.createIndex(${JSON.stringify(recommendedIndex)})`,
        reasoning: `Compound index optimized for query pattern with ${filterFields.length} filter fields and ${sortFields.length} sort fields`,
        estimatedSize: this.estimateIndexSize(recommendedIndex, slowQuery._id.collection)
      };

    } catch (error) {
      console.error('Error generating index recommendation:', error);
      return {
        suggestedIndex: {},
        indexCommand: 'Manual analysis required',
        reasoning: 'Unable to analyze query pattern automatically',
        estimatedSize: 0
      };
    }
  }
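
  // Applying a generated recommendation programmatically (illustrative; review any
  // suggested index before creating it in production):
  //   const rec = await this.generateIndexRecommendation(slowQuery);
  //   await this.db.collection(slowQuery._id.collection).createIndex(rec.suggestedIndex);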

  async explainQuery(collection, query, options = {}) {
    console.log(`Explaining query execution plan for collection: ${collection}`);

    try {
      const targetCollection = this.db.collection(collection);

      // Execute explain with detailed execution stats
      const explainResult = await targetCollection
        .find(query.filter || {}, options)
        .sort(query.sort || {})
        .limit(query.limit || 0)
        .explain('executionStats');

      // Analyze execution plan
      const executionAnalysis = this.analyzeExecutionPlan(explainResult);

      // Generate optimization insights
      const optimizationInsights = await this.generateQueryOptimizationInsights(
        collection, 
        query, 
        explainResult, 
        executionAnalysis
      );

      return {
        success: true,
        query: query,
        executionPlan: explainResult,
        executionAnalysis: executionAnalysis,
        optimizationInsights: optimizationInsights,
        explainTimestamp: new Date()
      };

    } catch (error) {
      console.error(`Error explaining query for collection ${collection}:`, error);
      return {
        success: false,
        collection: collection,
        query: query,
        error: error.message
      };
    }
  }
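
  // Illustrative call (collection and field names are assumptions):
  //   const report = await optimizer.explainQuery('orders', {
  //     filter: { status: 'shipped', customerId: 12345 },
  //     sort: { createdAt: -1 },
  //     limit: 20
  //   });
  //   report.executionAnalysis.performanceRating; // 'excellent' | 'good' | 'fair' | 'poor'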

  analyzeExecutionPlan(explainResult) {
    try {
      const executionStats = explainResult.executionStats;
      const winningPlan = explainResult.queryPlanner?.winningPlan;

      const analysis = {
        // Basic execution metrics (executionStats reports returned documents as nReturned)
        executionTime: executionStats.executionTimeMillis,
        documentsExamined: executionStats.totalDocsExamined,
        documentsReturned: executionStats.nReturned,
        keysExamined: executionStats.totalKeysExamined,

        // Efficiency calculations
        selectivityRatio: executionStats.totalDocsExamined > 0 
          ? executionStats.nReturned / executionStats.totalDocsExamined 
          : 0,

        indexEfficiency: executionStats.totalKeysExamined > 0 
          ? executionStats.nReturned / executionStats.totalKeysExamined 
          : 0,

        // Plan analysis
        planType: this.identifyPlanType(winningPlan),
        indexesUsed: this.extractIndexesUsed(winningPlan),
        hasSort: this.hasSortStage(winningPlan),
        hasBlockingSort: this.hasBlockingSortStage(winningPlan),

        // Performance assessment
        performanceRating: this.assessQueryPerformance(executionStats, winningPlan),

        // Resource usage
        workingSetSize: executionStats.workingSetSize || 0,

        // Optimization opportunities
        needsOptimization: this.needsOptimization(executionStats, winningPlan)
      };

      return analysis;

    } catch (error) {
      console.error('Error analyzing execution plan:', error);
      return {
        error: 'Failed to analyze execution plan',
        executionTime: 0,
        documentsExamined: 0,
        documentsReturned: 0,
        needsOptimization: true
      };
    }
  }

  async generateQueryOptimizationInsights(collection, query, explainResult, executionAnalysis) {
    try {
      const insights = [];

      // Check for full collection scans
      if (executionAnalysis.planType === 'COLLSCAN') {
        insights.push({
          type: 'full_scan_detected',
          severity: 'high',
          message: 'Query is performing a full collection scan',
          recommendation: 'Add an appropriate index to avoid collection scanning',
          suggestedIndex: await this.suggestIndexForQuery(query)
        });
      }

      // Check for low selectivity
      if (executionAnalysis.selectivityRatio < 0.1) {
        insights.push({
          type: 'low_selectivity',
          severity: 'medium',
          message: `Query selectivity is low (${(executionAnalysis.selectivityRatio * 100).toFixed(2)}%)`,
          recommendation: 'Consider more selective query conditions or compound indexes',
          currentSelectivity: executionAnalysis.selectivityRatio
        });
      }

      // Check for blocking sorts
      if (executionAnalysis.hasBlockingSort) {
        insights.push({
          type: 'blocking_sort',
          severity: 'high',
          message: 'Query requires in-memory sorting which can be expensive',
          recommendation: 'Create an index that supports the sort order',
          suggestedIndex: this.suggestSortIndex(query.sort)
        });
      }

      // Check for excessive key examination
      if (executionAnalysis.keysExamined > executionAnalysis.documentsReturned * 10) {
        insights.push({
          type: 'excessive_key_examination',
          severity: 'medium',
          message: 'Query is examining significantly more keys than documents returned',
          recommendation: 'Consider compound indexes to improve key examination efficiency',
          keysExamined: executionAnalysis.keysExamined,
          documentsReturned: executionAnalysis.documentsReturned
        });
      }

      // Check execution time
      if (executionAnalysis.executionTime > this.config.slowQueryThresholdMs) {
        insights.push({
          type: 'slow_execution',
          severity: executionAnalysis.executionTime > this.config.slowQueryThresholdMs * 5 ? 'high' : 'medium',
          message: `Query execution time (${executionAnalysis.executionTime}ms) exceeds threshold`,
          recommendation: 'Consider query optimization or index improvements',
          executionTime: executionAnalysis.executionTime,
          threshold: this.config.slowQueryThresholdMs
        });
      }

      return insights;

    } catch (error) {
      console.error('Error generating query optimization insights:', error);
      return [];
    }
  }

  async getPerformanceMetrics(timeRangeHours = 24) {
    console.log(`Retrieving performance metrics for the last ${timeRangeHours} hours...`);

    try {
      const startTime = new Date(Date.now() - (timeRangeHours * 60 * 60 * 1000));

      // Get comprehensive performance metrics
      const metrics = await this.collections.performanceMetrics
        .find({
          timestamp: { $gte: startTime }
        })
        .sort({ timestamp: -1 })
        .toArray();

      // Calculate summary statistics
      const performanceSummary = this.calculatePerformanceSummary(metrics);

      // Get current optimization recommendations
      const currentRecommendations = await this.collections.optimizationRecommendations
        .find({
          createdAt: { $gte: startTime },
          status: { $ne: 'implemented' }
        })
        .sort({ priority: -1, estimatedImpact: -1 })
        .limit(10)
        .toArray();

      return {
        success: true,
        timeRangeHours: timeRangeHours,
        metricsCollected: metrics.length,
        performanceSummary: performanceSummary,
        currentRecommendations: currentRecommendations,
        lastUpdated: new Date()
      };

    } catch (error) {
      console.error('Error retrieving performance metrics:', error);
      return {
        success: false,
        error: error.message,
        timeRangeHours: timeRangeHours
      };
    }
  }

  calculatePerformanceSummary(metrics) {
    if (metrics.length === 0) {
      return {
        totalQueries: 0,
        averageExecutionTime: 0,
        slowQueries: 0,
        indexEffectiveness: 'unknown'
      };
    }

    // Extract metrics from analysis results
    const analysisResults = metrics
      .filter(metric => metric.metricType === 'comprehensive_analysis')
      .map(metric => metric.analysisResults);

    if (analysisResults.length === 0) {
      return {
        totalQueries: 0,
        averageExecutionTime: 0,
        slowQueries: 0,
        indexEffectiveness: 'no_data'
      };
    }

    const latestAnalysis = analysisResults[0];

    return {
      totalQueries: latestAnalysis.queryPerformanceSummary?.totalQueries || 0,
      averageExecutionTime: latestAnalysis.queryPerformanceSummary?.averageExecutionTime || 0,
      p95ExecutionTime: latestAnalysis.queryPerformanceSummary?.p95ExecutionTime || 0,
      slowQueries: latestAnalysis.queryPerformanceSummary?.slowQueries || 0,

      // Index effectiveness
      indexEffectiveness: {
        totalIndexes: latestAnalysis.indexEffectiveness?.totalIndexes || 0,
        activelyUsedIndexes: latestAnalysis.indexEffectiveness?.activelyUsedIndexes || 0,
        unusedIndexes: latestAnalysis.indexEffectiveness?.unusedIndexes?.length || 0,
        averageSelectivity: latestAnalysis.indexEffectiveness?.averageIndexSelectivity || 0
      },

      // Performance trends
      performanceBottlenecks: latestAnalysis.performanceBottlenecks || [],
      optimizationOpportunities: latestAnalysis.optimizationOpportunities?.length || 0
    };
  }

  // Utility methods for performance analysis

  determineIndexType(index) {
    if (index.name === '_id_') return 'primary';
    if (index.unique) return 'unique';
    if (index.sparse) return 'sparse';
    if (index.partialFilterExpression) return 'partial';
    if (Object.values(index.key).includes('text')) return 'text';
    if (Object.values(index.key).includes('2dsphere')) return 'geospatial';
    if (Object.keys(index.key).length > 1) return 'compound';
    return 'single';
  }

  calculateIndexEffectiveness(indexStat, index) {
    const usageCount = indexStat.accesses?.ops || 0;
    const indexSize = index.size || 0;

    // Calculate effectiveness based on usage frequency and size efficiency
    if (usageCount === 0) return 0;
    if (indexSize === 0) return 1;

    // Simple effectiveness metric: usage per MB of index size
    const sizeInMB = indexSize / (1024 * 1024);
    return Math.min(usageCount / Math.max(sizeInMB, 1), 100);
  }

  assessIndexHealth(indexStat, index) {
    const usageCount = indexStat.accesses?.ops || 0;
    const effectiveness = this.calculateIndexEffectiveness(indexStat, index);

    if (usageCount === 0) return 'unused';
    if (effectiveness < 0.1) return 'inefficient';
    if (effectiveness > 10) return 'highly_effective';
    return 'moderate';
  }

  identifyPlanType(winningPlan) {
    if (!winningPlan) return 'unknown';
    if (winningPlan.stage === 'COLLSCAN') return 'COLLSCAN';
    if (winningPlan.stage === 'IXSCAN') return 'IXSCAN';
    if (winningPlan.inputStage?.stage === 'IXSCAN') return 'IXSCAN';
    return winningPlan.stage || 'unknown';
  }

  extractIndexesUsed(winningPlan) {
    const indexes = [];

    function extractFromStage(stage) {
      if (stage.indexName) {
        indexes.push(stage.indexName);
      }
      if (stage.inputStage) {
        extractFromStage(stage.inputStage);
      }
      if (stage.inputStages) {
        stage.inputStages.forEach(extractFromStage);
      }
    }

    if (winningPlan) {
      extractFromStage(winningPlan);
    }

    return [...new Set(indexes)]; // Remove duplicates
  }

  hasSortStage(winningPlan) {
    if (!winningPlan) return false;

    function checkForSort(stage) {
      if (stage.stage === 'SORT') return true;
      if (stage.inputStage) return checkForSort(stage.inputStage);
      if (stage.inputStages) return stage.inputStages.some(checkForSort);
      return false;
    }

    return checkForSort(winningPlan);
  }

  hasBlockingSortStage(winningPlan) {
    if (!winningPlan) return false;

    function checkForBlockingSort(stage) {
      // A sort is blocking if it's not supported by an index
      if (stage.stage === 'SORT' && !stage.inputStage?.stage?.includes('IXSCAN')) {
        return true;
      }
      if (stage.inputStage) return checkForBlockingSort(stage.inputStage);
      if (stage.inputStages) return stage.inputStages.some(checkForBlockingSort);
      return false;
    }

    return checkForBlockingSort(winningPlan);
  }

  assessQueryPerformance(executionStats, winningPlan) {
    const executionTime = executionStats.executionTimeMillis || 0;
    const selectivityRatio = executionStats.totalDocsExamined > 0 
      ? executionStats.nReturned / executionStats.totalDocsExamined 
      : 0;

    // Performance rating based on multiple factors
    let score = 100;

    // Penalize slow execution
    if (executionTime > 1000) score -= 40;
    else if (executionTime > 500) score -= 20;
    else if (executionTime > 100) score -= 10;

    // Penalize low selectivity
    if (selectivityRatio < 0.01) score -= 30;
    else if (selectivityRatio < 0.1) score -= 15;

    // Penalize full collection scans
    if (winningPlan?.stage === 'COLLSCAN') score -= 25;

    // Penalize blocking sorts
    if (this.hasBlockingSortStage(winningPlan)) score -= 15;

    if (score >= 80) return 'excellent';
    if (score >= 60) return 'good';
    if (score >= 40) return 'fair';
    return 'poor';
  }

  needsOptimization(executionStats, winningPlan) {
    const executionTime = executionStats.executionTimeMillis || 0;
    const selectivityRatio = executionStats.totalDocsExamined > 0 
      ? executionStats.nReturned / executionStats.totalDocsExamined 
      : 0;

    return executionTime > this.config.slowQueryThresholdMs ||
           selectivityRatio < 0.1 ||
           winningPlan?.stage === 'COLLSCAN' ||
           this.hasBlockingSortStage(winningPlan);
  }

  estimatePerformanceImprovement(slowQuery) {
    return {
      executionTimeReduction: '60-80%',
      documentExaminationReduction: '90-95%',
      resourceUsageReduction: '70-85%',
      confidenceLevel: 'high'
    };
  }

  estimateIndexSize(indexKeys, collection) {
    // Simplified index size estimation
    const keyCount = Object.keys(indexKeys).length;
    const estimatedDocumentSize = 100; // Average document size estimate
    const estimatedCollectionSize = 100000; // Estimate

    return keyCount * estimatedDocumentSize * estimatedCollectionSize * 0.1;
  }

  async shutdown() {
    console.log('Shutting down performance optimizer...');

    try {
      // Disable profiling
      if (this.config.enableQueryProfiling) {
        await this.db.command({ profile: 0 });
      }

      // Close MongoDB connection
      if (this.client) {
        await this.client.close();
      }

      console.log('Performance optimizer shutdown complete');

    } catch (error) {
      console.error('Error during shutdown:', error);
    }
  }

  // Additional methods would include implementations for:
  // - startRealTimeMonitoring()
  // - initializeIndexAnalysis()
  // - identifyPerformanceBottlenecks()
  // - analyzeIndexUsagePatterns()
  // - calculateAverageIndexSelectivity()
  // - analyzeCompoundIndexOpportunities()
  // - identifyIndexOptimizations()
  // - suggestIndexForQuery()
  // - suggestSortIndex()
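
  // Minimal sketches of two of the helpers listed above; these are illustrative
  // assumptions rather than the full implementations.

  // Suggest an index whose key order and direction match a sort specification
  // so the sort can be satisfied by the index instead of an in-memory SORT stage.
  suggestSortIndex(sortSpec = {}) {
    const indexKeys = {};
    for (const [field, direction] of Object.entries(sortSpec)) {
      indexKeys[field] = direction >= 0 ? 1 : -1;
    }
    return Object.keys(indexKeys).length > 0 ? indexKeys : null;
  }

  // Suggest a simple index from a query's filter and sort: filter fields first
  // (ascending), then sort fields appended with their sort direction.
  async suggestIndexForQuery(query = {}) {
    const indexKeys = {};
    Object.keys(query.filter || {}).forEach(field => {
      indexKeys[field] = 1;
    });
    for (const [field, direction] of Object.entries(query.sort || {})) {
      if (!(field in indexKeys)) {
        indexKeys[field] = direction >= 0 ? 1 : -1;
      }
    }
    return Object.keys(indexKeys).length > 0 ? indexKeys : null;
  }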
}

// Benefits of MongoDB Advanced Performance Optimization:
// - Comprehensive query performance analysis and monitoring
// - Intelligent index optimization recommendations
// - Real-time performance bottleneck identification
// - Advanced execution plan analysis and insights
// - Automated slow query detection and optimization
// - Index usage effectiveness assessment
// - Compound index optimization strategies
// - SQL-compatible performance operations through QueryLeaf integration
// - Production-ready monitoring and alerting capabilities
// - Enterprise-grade performance tuning automation

module.exports = {
  AdvancedPerformanceOptimizer
};
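
For reference, the following is a minimal usage sketch of the optimizer above. It assumes the class is in scope (or required from the module that exports it); the connection string, database name, and option values are illustrative, and a production service would gate analysis on initialization completing rather than using a fixed delay.

// Illustrative usage of AdvancedPerformanceOptimizer (connection string and options are assumptions)
const optimizer = new AdvancedPerformanceOptimizer('mongodb://localhost:27017/appdb', {
  slowQueryThresholdMs: 100,
  enableCompoundIndexOptimization: true
});

// React to completed analyses
optimizer.on('performanceAnalysisCompleted', insights => {
  console.log(`Slow query patterns: ${insights.queryPerformanceSummary.slowQueries}`);
  console.log(`Optimization opportunities: ${insights.optimizationOpportunities.length}`);
});

// The constructor starts initialization asynchronously, so wait briefly before analyzing
setTimeout(async () => {
  const result = await optimizer.analyzeQueryPerformance(24);
  if (result.success) {
    for (const rec of result.analysisResults.optimizationOpportunities.slice(0, 5)) {
      console.log(`[${rec.priority}] ${rec.collection}: ${rec.issue}`);
    }
  }
  await optimizer.shutdown();
}, 5000);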

Understanding MongoDB Performance Architecture

Advanced Query Optimization and Index Management Patterns

Implement sophisticated performance optimization workflows for enterprise MongoDB deployments:

// Enterprise-grade performance optimization with advanced analytics capabilities
class EnterprisePerformanceManager extends AdvancedPerformanceOptimizer {
  constructor(mongoUri, enterpriseConfig) {
    super(mongoUri, enterpriseConfig);

    this.enterpriseConfig = {
      ...enterpriseConfig,
      enablePredictiveOptimization: true,
      enableCapacityPlanning: true,
      enableAutomatedTuning: true,
      enablePerformanceForecasting: true,
      enableComplianceReporting: true
    };

    this.setupEnterpriseCapabilities();
    this.initializePredictiveAnalytics();
    this.setupAutomatedOptimization();
  }
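
  // Note: setupEnterpriseCapabilities(), initializePredictiveAnalytics(),
  // setupAutomatedOptimization(), and deployOptimizationStrategy() are assumed
  // enterprise-specific hooks; their implementations depend on the deployment
  // environment and are not shown here.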

  async implementAdvancedOptimizationStrategy() {
    console.log('Implementing enterprise optimization strategy...');

    const optimizationStrategy = {
      // Multi-tier optimization approach
      optimizationTiers: {
        realTimeOptimization: {
          enabled: true,
          responseTimeThreshold: 100,
          automaticIndexCreation: true,
          queryRewriting: true
        },
        batchOptimization: {
          enabled: true,
          analysisInterval: '1h',
          comprehensiveIndexAnalysis: true,
          workloadPatternAnalysis: true
        },
        predictiveOptimization: {
          enabled: true,
          forecastingHorizon: '7d',
          capacityPlanning: true,
          performanceTrendAnalysis: true
        }
      },

      // Advanced analytics
      performanceAnalytics: {
        enableMachineLearning: true,
        anomalyDetection: true,
        performanceForecasting: true,
        workloadCharacterization: true
      }
    };

    return await this.deployOptimizationStrategy(optimizationStrategy);
  }
}

SQL-Style Performance Optimization with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB performance analysis and optimization:

-- QueryLeaf advanced performance optimization with SQL-familiar syntax for MongoDB

-- Comprehensive query performance analysis
WITH query_performance_analysis AS (
    SELECT 
        collection_name,
        query_shape_hash,
        query_pattern_type,

        -- Execution statistics
        COUNT(*) as execution_count,
        AVG(execution_time_ms) as avg_execution_time_ms,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as p95_execution_time_ms,
        MAX(execution_time_ms) as max_execution_time_ms,

        -- Document examination analysis
        AVG(documents_examined) as avg_docs_examined,
        AVG(documents_returned) as avg_docs_returned,
        CASE 
            WHEN AVG(documents_examined) > 0 THEN
                AVG(documents_returned) / AVG(documents_examined)
            ELSE 0
        END as avg_selectivity_ratio,

        -- Index usage analysis
        STRING_AGG(DISTINCT index_name, ', ') as indexes_used,
        AVG(keys_examined) as avg_keys_examined,

        -- Resource utilization
        SUM(execution_time_ms) as total_execution_time_ms,
        AVG(working_set_size_kb) as avg_working_set_kb,

        -- Performance categorization
        CASE 
            WHEN AVG(execution_time_ms) < 50 THEN 'fast'
            WHEN AVG(execution_time_ms) < 200 THEN 'moderate' 
            WHEN AVG(execution_time_ms) < 1000 THEN 'slow'
            ELSE 'very_slow'
        END as performance_category,

        -- Optimization need assessment
        CASE 
            WHEN AVG(execution_time_ms) > 500 OR 
                 (AVG(documents_examined) > AVG(documents_returned) * 100) OR
                 COUNT(*) > 1000 THEN true
            ELSE false
        END as needs_optimization

    FROM QUERY_PERFORMANCE_LOG
    WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    GROUP BY collection_name, query_shape_hash, query_pattern_type
),

index_effectiveness_analysis AS (
    SELECT 
        collection_name,
        index_name,
        index_type,
        COALESCE(JSON_EXTRACT(index_definition, '$'), '{}') as index_keys,

        -- Usage statistics
        COALESCE(usage_count, 0) as usage_count,
        COALESCE(last_used_timestamp, '1970-01-01'::timestamp) as last_used,

        -- Size and storage analysis
        index_size_bytes,
        ROUND(index_size_bytes / 1024.0 / 1024.0, 2) as index_size_mb,

        -- Effectiveness calculations
        CASE 
            WHEN usage_count = 0 THEN 0
            WHEN index_size_bytes > 0 THEN 
                usage_count / GREATEST((index_size_bytes / 1024.0 / 1024.0), 1)
            ELSE usage_count
        END as effectiveness_score,

        -- Usage categorization
        CASE 
            WHEN usage_count = 0 THEN 'unused'
            WHEN usage_count < 100 THEN 'rarely_used'
            WHEN usage_count < 1000 THEN 'moderately_used'
            ELSE 'frequently_used'
        END as usage_category,

        -- Health assessment
        CASE 
            WHEN usage_count = 0 AND index_name != '_id_' THEN 'candidate_for_removal'
            WHEN usage_count > 0 AND index_size_bytes > 100*1024*1024 AND usage_count < 100 THEN 'review_necessity'
            WHEN usage_count > 1000 THEN 'valuable'
            ELSE 'monitor'
        END as health_status,

        -- Age analysis
        EXTRACT(DAYS FROM (CURRENT_TIMESTAMP - COALESCE(last_used_timestamp, created_timestamp))) as days_since_last_use

    FROM INDEX_USAGE_STATS
    WHERE analysis_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days'
),

optimization_opportunities AS (
    SELECT 
        qpa.collection_name,
        qpa.query_pattern_type,
        qpa.execution_count,
        qpa.avg_execution_time_ms,
        qpa.avg_selectivity_ratio,
        qpa.performance_category,
        qpa.needs_optimization,

        -- Performance impact calculation
        qpa.total_execution_time_ms as performance_impact_ms,
        ROUND(qpa.total_execution_time_ms / 1000.0, 2) as performance_impact_seconds,

        -- Index analysis correlation
        COUNT(iea.index_name) as available_indexes,
        STRING_AGG(iea.index_name, ', ') as collection_indexes,
        AVG(iea.effectiveness_score) as avg_index_effectiveness,

        -- Optimization recommendations
        CASE 
            WHEN qpa.avg_selectivity_ratio < 0.01 AND qpa.execution_count > 100 THEN 'create_selective_index'
            WHEN qpa.avg_execution_time_ms > 1000 AND qpa.indexes_used IS NULL THEN 'add_supporting_index'
            WHEN qpa.avg_execution_time_ms > 500 AND qpa.indexes_used LIKE '%COLLSCAN%' THEN 'replace_collection_scan'
            WHEN qpa.performance_category = 'very_slow' THEN 'comprehensive_optimization'
            WHEN qpa.execution_count > 10000 AND qpa.performance_category IN ('slow', 'moderate') THEN 'high_frequency_optimization'
            ELSE 'monitor_performance'
        END as optimization_recommendation,

        -- Priority assessment
        CASE 
            WHEN qpa.total_execution_time_ms > 60000 AND qpa.execution_count > 1000 THEN 'critical'
            WHEN qpa.total_execution_time_ms > 30000 OR qpa.avg_execution_time_ms > 2000 THEN 'high'
            WHEN qpa.total_execution_time_ms > 10000 OR qpa.execution_count > 5000 THEN 'medium'
            ELSE 'low'
        END as optimization_priority,

        -- Estimated improvement potential
        CASE 
            WHEN qpa.avg_selectivity_ratio < 0.01 THEN '80-90% improvement expected'
            WHEN qpa.performance_category = 'very_slow' THEN '60-80% improvement expected'
            WHEN qpa.performance_category = 'slow' THEN '40-60% improvement expected'
            ELSE '20-40% improvement expected'
        END as estimated_improvement

    FROM query_performance_analysis qpa
    LEFT JOIN index_effectiveness_analysis iea ON qpa.collection_name = iea.collection_name
    WHERE qpa.needs_optimization = true
    GROUP BY 
        qpa.collection_name, qpa.query_pattern_type, qpa.execution_count,
        qpa.avg_execution_time_ms, qpa.avg_selectivity_ratio, qpa.performance_category,
        qpa.needs_optimization, qpa.total_execution_time_ms, qpa.indexes_used
)

SELECT 
    oo.collection_name,
    oo.query_pattern_type,
    oo.optimization_priority,
    oo.optimization_recommendation,

    -- Performance metrics
    oo.execution_count,
    ROUND(oo.avg_execution_time_ms, 2) as avg_execution_time_ms,
    ROUND(oo.performance_impact_seconds, 2) as total_impact_seconds,
    ROUND(oo.avg_selectivity_ratio * 100, 2) as selectivity_percent,

    -- Current state analysis
    oo.performance_category,
    oo.available_indexes,
    COALESCE(oo.collection_indexes, 'No indexes found') as current_indexes,
    ROUND(COALESCE(oo.avg_index_effectiveness, 0), 2) as avg_index_effectiveness,

    -- Optimization guidance
    oo.estimated_improvement,

    -- Specific recommendations based on analysis
    CASE oo.optimization_recommendation
        WHEN 'create_selective_index' THEN 
            'Create compound index on high-selectivity filter fields for collection: ' || oo.collection_name
        WHEN 'add_supporting_index' THEN 
            'Add index to eliminate collection scans in collection: ' || oo.collection_name
        WHEN 'replace_collection_scan' THEN 
            'Critical: Replace collection scan with indexed access in collection: ' || oo.collection_name
        WHEN 'comprehensive_optimization' THEN 
            'Comprehensive query and index optimization needed for collection: ' || oo.collection_name
        WHEN 'high_frequency_optimization' THEN 
            'Optimize high-frequency queries in collection: ' || oo.collection_name
        ELSE 'Continue monitoring performance trends'
    END as detailed_recommendation,

    -- Implementation complexity assessment
    CASE 
        WHEN oo.available_indexes = 0 THEN 'high_complexity'
        WHEN oo.avg_index_effectiveness < 1 THEN 'medium_complexity'
        ELSE 'low_complexity'
    END as implementation_complexity,

    -- Business impact estimation
    CASE oo.optimization_priority
        WHEN 'critical' THEN 'High business impact - immediate attention required'
        WHEN 'high' THEN 'Moderate business impact - optimize within 1 week'
        WHEN 'medium' THEN 'Low business impact - optimize within 1 month'
        ELSE 'Minimal business impact - optimize when convenient'
    END as business_impact_assessment,

    -- Resource requirements
    CASE 
        WHEN oo.optimization_recommendation IN ('create_selective_index', 'add_supporting_index') THEN 'Index creation: 5-30 minutes'
        WHEN oo.optimization_recommendation = 'comprehensive_optimization' THEN 'Full analysis: 2-8 hours'
        ELSE 'Monitoring: ongoing'
    END as estimated_effort

FROM optimization_opportunities oo
ORDER BY 
    CASE oo.optimization_priority 
        WHEN 'critical' THEN 1 
        WHEN 'high' THEN 2 
        WHEN 'medium' THEN 3 
        ELSE 4 
    END,
    oo.performance_impact_seconds DESC,
    oo.execution_count DESC;

-- Index usage and effectiveness analysis
WITH index_usage_trends AS (
    SELECT 
        collection_name,
        index_name,

        -- Usage trend analysis over time windows
        SUM(CASE WHEN analysis_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour' THEN usage_count ELSE 0 END) as usage_last_hour,
        SUM(CASE WHEN analysis_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours' THEN usage_count ELSE 0 END) as usage_last_24h,
        SUM(CASE WHEN analysis_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days' THEN usage_count ELSE 0 END) as usage_last_7d,

        -- Size and storage trends
        AVG(index_size_bytes) as avg_index_size_bytes,
        MAX(index_size_bytes) as max_index_size_bytes,

        -- Usage efficiency trends
        AVG(CASE WHEN index_size_bytes > 0 AND usage_count > 0 THEN 
                usage_count / (index_size_bytes / 1024.0 / 1024.0)
            ELSE 0 
        END) as avg_usage_efficiency,

        -- Consistency analysis
        COUNT(DISTINCT DATE_TRUNC('day', analysis_timestamp)) as analysis_days,
        STDDEV(usage_count) as usage_variability,

        -- Most recent statistics
        MAX(analysis_timestamp) as last_analysis,
        MAX(last_used_timestamp) as most_recent_use

    FROM index_usage_stats
    WHERE analysis_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days'
    GROUP BY collection_name, index_name
),

index_recommendations AS (
    SELECT 
        iut.*,

        -- Usage trend classification
        CASE 
            WHEN iut.usage_last_hour = 0 AND iut.usage_last_24h = 0 AND iut.usage_last_7d = 0 THEN 'completely_unused'
            WHEN iut.usage_last_hour = 0 AND iut.usage_last_24h = 0 AND iut.usage_last_7d > 0 THEN 'infrequently_used'
            WHEN iut.usage_last_hour = 0 AND iut.usage_last_24h > 0 THEN 'daily_usage'
            WHEN iut.usage_last_hour > 0 THEN 'active_usage'
            ELSE 'unknown_usage'
        END as usage_trend,

        -- Storage efficiency assessment
        CASE 
            WHEN iut.avg_index_size_bytes > 1024*1024*1024 AND iut.usage_last_7d < 100 THEN 'storage_inefficient'
            WHEN iut.avg_index_size_bytes > 100*1024*1024 AND iut.usage_last_7d < 10 THEN 'questionable_storage_usage'
            WHEN iut.avg_usage_efficiency > 10 THEN 'storage_efficient'
            ELSE 'acceptable_storage_usage'
        END as storage_efficiency,

        -- Recommendation generation
        CASE 
            WHEN iut.usage_last_7d = 0 AND iut.index_name != '_id_' THEN 'consider_dropping'
            WHEN iut.avg_index_size_bytes > 500*1024*1024 AND iut.usage_last_7d < 50 THEN 'evaluate_necessity'
            WHEN iut.usage_variability > iut.usage_last_7d * 0.8 THEN 'inconsistent_usage_investigate'
            WHEN iut.avg_usage_efficiency > 20 THEN 'high_value_maintain'
            WHEN iut.usage_last_hour > 100 THEN 'critical_index_monitor'
            ELSE 'continue_monitoring'
        END as recommendation,

        -- Impact assessment for potential changes
        CASE 
            WHEN iut.usage_last_hour > 0 THEN 'high_impact_if_removed'
            WHEN iut.usage_last_24h > 0 THEN 'medium_impact_if_removed'
            WHEN iut.usage_last_7d > 0 THEN 'low_impact_if_removed'
            ELSE 'no_impact_if_removed'
        END as removal_impact,

        -- Storage savings potential
        CASE 
            WHEN iut.avg_index_size_bytes > 0 THEN 
                ROUND(iut.avg_index_size_bytes / 1024.0 / 1024.0, 2)
            ELSE 0
        END as storage_savings_mb

    FROM index_usage_trends iut
),

collection_performance_summary AS (
    SELECT 
        collection_name,
        COUNT(*) as total_indexes,

        -- Usage distribution
        COUNT(*) FILTER (WHERE usage_trend = 'active_usage') as active_indexes,
        COUNT(*) FILTER (WHERE usage_trend = 'daily_usage') as daily_indexes,
        COUNT(*) FILTER (WHERE usage_trend = 'infrequently_used') as infrequent_indexes,
        COUNT(*) FILTER (WHERE usage_trend = 'completely_unused') as unused_indexes,

        -- Storage analysis
        SUM(avg_index_size_bytes) as total_index_storage_bytes,
        AVG(avg_usage_efficiency) as collection_avg_efficiency,

        -- Optimization potential
        COUNT(*) FILTER (WHERE recommendation = 'consider_dropping') as indexes_to_drop,
        COUNT(*) FILTER (WHERE recommendation = 'evaluate_necessity') as indexes_to_evaluate,
        SUM(CASE WHEN recommendation IN ('consider_dropping', 'evaluate_necessity') 
                 THEN storage_savings_mb ELSE 0 END) as potential_storage_savings_mb,

        -- Collection health assessment
        CASE 
            WHEN COUNT(*) FILTER (WHERE usage_trend = 'active_usage') = 0 THEN 'no_active_indexes'
            WHEN COUNT(*) FILTER (WHERE usage_trend = 'completely_unused') > COUNT(*) * 0.5 THEN 'many_unused_indexes'
            WHEN AVG(avg_usage_efficiency) < 1 THEN 'poor_index_efficiency'
            ELSE 'healthy_index_usage'
        END as collection_health

    FROM index_recommendations
    GROUP BY collection_name
)

SELECT 
    cps.collection_name,
    cps.total_indexes,
    cps.collection_health,

    -- Index usage distribution
    cps.active_indexes,
    cps.daily_indexes,
    cps.infrequent_indexes,
    cps.unused_indexes,

    -- Storage utilization
    ROUND(cps.total_index_storage_bytes / 1024.0 / 1024.0, 2) as total_storage_mb,
    ROUND(cps.collection_avg_efficiency, 2) as avg_efficiency_score,

    -- Optimization opportunities
    cps.indexes_to_drop,
    cps.indexes_to_evaluate, 
    ROUND(cps.potential_storage_savings_mb, 2) as potential_savings_mb,

    -- Optimization priority
    CASE 
        WHEN cps.collection_health = 'no_active_indexes' THEN 'critical_review_needed'
        WHEN cps.unused_indexes > 5 OR cps.potential_storage_savings_mb > 1000 THEN 'high_cleanup_priority'
        WHEN cps.collection_avg_efficiency < 2 THEN 'medium_optimization_priority'
        ELSE 'low_maintenance_priority'
    END as optimization_priority,

    -- Recommendations summary
    CASE cps.collection_health
        WHEN 'no_active_indexes' THEN 'URGENT: Collection has no actively used indexes - investigate query patterns'
        WHEN 'many_unused_indexes' THEN 'Multiple unused indexes detected - perform index cleanup'
        WHEN 'poor_index_efficiency' THEN 'Index usage is inefficient - review index design'
        ELSE 'Index usage appears healthy - continue monitoring'
    END as primary_recommendation,

    -- Storage efficiency assessment
    CASE 
        WHEN cps.potential_storage_savings_mb > 1000 THEN 
            'High storage optimization potential: ' || ROUND(cps.potential_storage_savings_mb, 0) || 'MB recoverable'
        WHEN cps.potential_storage_savings_mb > 100 THEN 
            'Moderate storage optimization: ' || ROUND(cps.potential_storage_savings_mb, 0) || 'MB recoverable'
        WHEN cps.potential_storage_savings_mb > 10 THEN 
            'Minor storage optimization: ' || ROUND(cps.potential_storage_savings_mb, 0) || 'MB recoverable'
        ELSE 'Minimal storage optimization potential'
    END as storage_optimization_summary,

    -- Specific next actions (NULL entries from non-matching cases are removed)
    array_remove(ARRAY[
        CASE WHEN cps.indexes_to_drop > 0 THEN 
            'Review and drop ' || cps.indexes_to_drop || ' unused indexes' END,
        CASE WHEN cps.indexes_to_evaluate > 0 THEN 
            'Evaluate necessity of ' || cps.indexes_to_evaluate || ' underutilized indexes' END,
        CASE WHEN cps.collection_avg_efficiency < 1 THEN 
            'Redesign indexes for better efficiency' END,
        CASE WHEN cps.active_indexes = 0 THEN 
            'Investigate why no indexes are actively used' END
    ]::TEXT[], NULL) as action_items

FROM collection_performance_summary cps
ORDER BY 
    CASE cps.collection_health 
        WHEN 'no_active_indexes' THEN 1 
        WHEN 'many_unused_indexes' THEN 2 
        WHEN 'poor_index_efficiency' THEN 3 
        ELSE 4 
    END,
    cps.potential_storage_savings_mb DESC,
    cps.total_indexes DESC;

-- Real-time query performance monitoring and alerting
CREATE VIEW real_time_performance_dashboard AS
WITH current_performance AS (
    SELECT 
        collection_name,
        query_pattern_type,

        -- Recent performance metrics (last hour)
        COUNT(*) FILTER (WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour') as queries_last_hour,
        AVG(execution_time_ms) FILTER (WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour') as avg_time_last_hour,
        MAX(execution_time_ms) FILTER (WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour') as max_time_last_hour,

        -- Performance trend comparison (current hour vs previous hour)
        AVG(execution_time_ms) FILTER (WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '2 hours'
                                              AND execution_timestamp < CURRENT_TIMESTAMP - INTERVAL '1 hour') as avg_time_prev_hour,

        -- Critical performance indicators
        COUNT(*) FILTER (WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
                               AND execution_time_ms > 5000) as critical_slow_queries,
        COUNT(*) FILTER (WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
                               AND execution_time_ms > 1000) as slow_queries,

        -- Resource utilization trends
        AVG(documents_examined) FILTER (WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour') as avg_docs_examined,
        AVG(documents_returned) FILTER (WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour') as avg_docs_returned,

        -- Most recent query information
        MAX(execution_timestamp) as last_execution,
        MAX(execution_time_ms) as recent_max_time

    FROM query_performance_log
    WHERE execution_timestamp >= CURRENT_TIMESTAMP - INTERVAL '2 hours'
    GROUP BY collection_name, query_pattern_type
),

performance_alerts AS (
    SELECT 
        cp.*,

        -- Performance trend analysis
        CASE 
            -- Critical threshold is checked first so it is not masked by the degradation branch
            WHEN cp.critical_slow_queries > 0 THEN 'critical_performance_alert'
            WHEN cp.avg_time_last_hour > cp.avg_time_prev_hour * 2 THEN 'degradation_alert'
            WHEN cp.avg_time_last_hour > 2000 THEN 'slow_performance_alert'
            WHEN cp.queries_last_hour > 1000 AND cp.avg_time_last_hour > 500 THEN 'high_volume_slow_alert'
            ELSE 'normal'
        END as alert_level,

        -- Selectivity analysis
        CASE 
            WHEN cp.avg_docs_examined > 0 THEN cp.avg_docs_returned / cp.avg_docs_examined
            ELSE 1
        END as current_selectivity,

        -- Performance change calculation
        CASE 
            WHEN cp.avg_time_prev_hour > 0 THEN 
                ROUND(((cp.avg_time_last_hour - cp.avg_time_prev_hour) / cp.avg_time_prev_hour) * 100, 1)
            ELSE 0
        END as performance_change_percent,

        -- Alert priority
        CASE 
            WHEN cp.critical_slow_queries > 0 THEN 'critical'
            WHEN cp.avg_time_last_hour > cp.avg_time_prev_hour * 2 THEN 'high'
            WHEN cp.slow_queries > 10 THEN 'medium'
            ELSE 'low'
        END as alert_priority

    FROM current_performance cp
    WHERE cp.queries_last_hour > 0
)

SELECT 
    pa.collection_name,
    pa.query_pattern_type,
    pa.alert_level,
    pa.alert_priority,

    -- Current performance metrics
    pa.queries_last_hour,
    ROUND(pa.avg_time_last_hour, 2) as current_avg_time_ms,
    pa.max_time_last_hour,
    pa.recent_max_time,

    -- Performance comparison
    ROUND(COALESCE(pa.avg_time_prev_hour, 0), 2) as previous_avg_time_ms,
    pa.performance_change_percent || '%' as performance_change,

    -- Problem severity indicators
    pa.critical_slow_queries,
    pa.slow_queries,
    ROUND(pa.current_selectivity * 100, 2) as selectivity_percent,

    -- Alert messages
    CASE pa.alert_level
        WHEN 'critical_performance_alert' THEN 
            'CRITICAL: ' || pa.critical_slow_queries || ' queries exceeded 5 second threshold'
        WHEN 'degradation_alert' THEN 
            'WARNING: Performance degraded by ' || pa.performance_change_percent || '% from previous hour'
        WHEN 'slow_performance_alert' THEN 
            'WARNING: Average query time (' || ROUND(pa.avg_time_last_hour, 0) || 'ms) exceeds acceptable threshold'
        WHEN 'high_volume_slow_alert' THEN 
            'WARNING: High query volume (' || pa.queries_last_hour || ') with slow performance'
        ELSE 'No performance alerts'
    END as alert_message,

    -- Recommended actions
    CASE pa.alert_level
        WHEN 'critical_performance_alert' THEN 'Immediate investigation required - check for index issues or resource constraints'
        WHEN 'degradation_alert' THEN 'Investigate performance regression - check recent changes or resource utilization'
        WHEN 'slow_performance_alert' THEN 'Review query optimization opportunities and index effectiveness'
        WHEN 'high_volume_slow_alert' THEN 'Consider query optimization and capacity scaling'
        ELSE 'Continue monitoring'
    END as recommended_action,

    -- Urgency indicator
    CASE pa.alert_priority
        WHEN 'critical' THEN 'Immediate attention required (< 15 minutes)'
        WHEN 'high' THEN 'Urgent attention needed (< 1 hour)'
        WHEN 'medium' THEN 'Should be addressed within 4 hours'
        ELSE 'Monitor and address during normal maintenance'
    END as response_urgency,

    -- Last occurrence
    pa.last_execution,
    ROUND(EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - pa.last_execution)) / 60) as minutes_since_last_query

FROM performance_alerts pa
WHERE pa.alert_level != 'normal'
ORDER BY 
    CASE pa.alert_priority 
        WHEN 'critical' THEN 1 
        WHEN 'high' THEN 2 
        WHEN 'medium' THEN 3 
        ELSE 4 
    END,
    pa.performance_change_percent DESC,
    pa.avg_time_last_hour DESC;

-- QueryLeaf provides comprehensive MongoDB performance optimization capabilities:
-- 1. Advanced query performance analysis with SQL-familiar syntax
-- 2. Comprehensive index usage monitoring and effectiveness analysis
-- 3. Real-time performance alerting and automated optimization recommendations
-- 4. Detailed execution plan analysis and optimization insights
-- 5. Index optimization strategies including compound index recommendations
-- 6. Performance trend analysis and predictive optimization
-- 7. Resource utilization monitoring and capacity planning
-- 8. Automated slow query detection and optimization guidance
-- 9. Enterprise-grade performance management with minimal configuration
-- 10. Production-ready monitoring and optimization automation
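
The index usage counters and the query_performance_log data modeled in the SQL views above correspond to facilities MongoDB exposes natively: the $indexStats aggregation stage reports per-index access counts, and the database profiler records slow operations. The following is a minimal sketch of both; the database and collection names are placeholders, and note that profiling is configured per database and is not available on mongos.

// Hedged sketch: native sources for the index-usage and slow-query data modeled above
const { MongoClient } = require('mongodb');

async function collectNativeMetrics(uri) {
  const client = new MongoClient(uri);
  try {
    const db = client.db('app');

    // Per-index access counters (the raw input behind index usage trend analysis)
    const indexStats = await db.collection('orders')
      .aggregate([{ $indexStats: {} }])
      .toArray();
    indexStats.forEach(s => console.log(s.name, s.accesses.ops, s.accesses.since));

    // Enable profiling for operations slower than 1000ms, then read recent slow entries
    await db.command({ profile: 1, slowms: 1000 });
    const oneHourAgo = new Date(Date.now() - 60 * 60 * 1000);
    const slowOps = await db.collection('system.profile')
      .find({ ts: { $gte: oneHourAgo }, millis: { $gte: 1000 } })
      .sort({ millis: -1 })
      .limit(20)
      .toArray();
    slowOps.forEach(op => console.log(op.op, op.ns, op.millis, op.planSummary));
  } finally {
    await client.close();
  }
}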

Best Practices for Production Performance Optimization

Index Strategy Design Principles

Essential principles for effective MongoDB index optimization in production:

  1. Compound Index Design: Create efficient compound indexes following the ESR rule (Equality, Sort, Range) for optimal query performance (see the sketch after this list)
  2. Index Usage Monitoring: Continuously monitor index usage patterns and effectiveness to identify optimization opportunities
  3. Query Pattern Analysis: Analyze query execution patterns to understand workload characteristics and optimization requirements
  4. Performance Testing: Implement comprehensive performance testing procedures for index changes and query optimizations
  5. Capacity Planning: Monitor query performance trends and resource utilization for proactive capacity management
  6. Automated Optimization: Establish automated performance monitoring and optimization recommendation systems
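
As a concrete illustration of the ESR rule, here is a minimal Node.js sketch against a hypothetical orders collection, where the query filters on status (equality), sorts by customerId, and ranges over orderDate. The collection, field names, and connection handling are assumptions for illustration, not part of a specific application.

// ESR (Equality, Sort, Range) compound index sketch for a hypothetical 'orders' collection
const { MongoClient } = require('mongodb');

async function createEsrIndex(uri) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const orders = client.db('shop').collection('orders');

    // Key order follows ESR: equality field first, then the sort field, then the range field
    await orders.createIndex(
      { status: 1, customerId: 1, orderDate: -1 },
      { name: 'status_customerId_orderDate_esr' }
    );

    // Confirm the planner selects the index and avoids an in-memory (blocking) sort
    const plan = await orders
      .find({ status: 'shipped', orderDate: { $gte: new Date('2024-01-01') } })
      .sort({ customerId: 1 })
      .explain('queryPlanner');
    console.log(JSON.stringify(plan.queryPlanner.winningPlan, null, 2));
  } finally {
    await client.close();
  }
}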

Enterprise Performance Management

Design performance optimization systems for enterprise-scale requirements:

  1. Real-Time Monitoring: Implement comprehensive real-time performance monitoring with intelligent alerting and automated responses
  2. Predictive Analytics: Use performance trend analysis and predictive modeling for proactive optimization and capacity planning
  3. Performance Governance: Establish performance standards, monitoring procedures, and optimization workflows
  4. Resource Optimization: Balance query performance with storage efficiency and maintenance overhead
  5. Compliance Integration: Ensure performance optimization procedures meet operational and compliance requirements
  6. Knowledge Management: Document optimization procedures, performance patterns, and best practices for operational excellence

Conclusion

MongoDB index optimization and query performance analysis provide comprehensive database tuning capabilities that enable applications to achieve optimal performance through intelligent indexing strategies, sophisticated query analysis, and automated optimization recommendations. The native performance analysis tools and integrated optimization guidance ensure that database operations maintain peak efficiency with minimal operational overhead.

Key MongoDB Performance Optimization benefits include:

  • Intelligent Analysis: Advanced query performance analysis with automated bottleneck identification and optimization recommendations
  • Index Optimization: Comprehensive index usage analysis with effectiveness assessment and automated cleanup suggestions
  • Real-Time Monitoring: Continuous performance monitoring with intelligent alerting and proactive optimization capabilities
  • Execution Plan Analysis: Detailed query execution plan analysis with optimization insights and improvement recommendations
  • Automated Recommendations: AI-powered optimization recommendations based on workload patterns and performance characteristics
  • SQL Accessibility: Familiar SQL-style performance operations through QueryLeaf for accessible database optimization

Whether you're optimizing high-traffic applications, managing large-scale data workloads, implementing performance monitoring systems, or maintaining enterprise database performance, MongoDB performance optimization with QueryLeaf's familiar SQL interface provides the foundation for sophisticated, scalable database tuning operations.

QueryLeaf Integration: QueryLeaf automatically translates SQL-style performance analysis operations into MongoDB's native profiling and indexing capabilities, making advanced performance optimization accessible to SQL-oriented database administrators. Complex index analysis, query optimization recommendations, and performance monitoring are seamlessly handled through familiar SQL constructs, enabling sophisticated database tuning without requiring deep MongoDB performance expertise.

The combination of MongoDB's robust performance analysis capabilities with SQL-style optimization operations makes it an ideal platform for applications requiring both sophisticated database performance management and familiar database administration patterns, ensuring your database operations can maintain optimal performance while scaling efficiently as workload complexity and data volume grow.

MongoDB Atlas Deployment Automation and Cloud Infrastructure: Advanced DevOps Integration and Infrastructure-as-Code for Scalable Database Operations

Modern cloud-native applications require sophisticated database infrastructure that can automatically scale, self-heal, and integrate seamlessly with DevOps workflows and CI/CD pipelines. Traditional database deployment approaches rely on manual configuration and complex scaling procedures, and they carry extensive operational overhead just to keep database infrastructure production-ready. Effective cloud database management demands automated provisioning, intelligent resource optimization, and integrated monitoring capabilities.

MongoDB Atlas provides comprehensive cloud database automation through infrastructure-as-code integration, automated scaling policies, and advanced DevOps toolchain compatibility that enables sophisticated database operations with minimal manual intervention. Unlike traditional database hosting that requires complex server management and manual optimization, Atlas integrates database infrastructure directly into modern DevOps workflows with automated provisioning, intelligent scaling, and built-in operational excellence.

The Traditional Cloud Database Deployment Challenge

Conventional approaches to cloud database infrastructure management face significant operational complexity:

-- Traditional cloud database management - manual setup with extensive operational overhead

-- Basic database server provisioning tracking (manual process)
CREATE TABLE database_servers (
    server_id SERIAL PRIMARY KEY,
    server_name VARCHAR(255) NOT NULL,
    cloud_provider VARCHAR(100) NOT NULL,
    instance_type VARCHAR(100) NOT NULL,
    region VARCHAR(100) NOT NULL,

    -- Manual resource configuration
    cpu_cores INTEGER,
    memory_gb INTEGER,
    storage_gb INTEGER,
    iops INTEGER,

    -- Network configuration (manual setup)
    vpc_id VARCHAR(100),
    subnet_id VARCHAR(100),
    security_group_ids TEXT[],
    public_ip INET,
    private_ip INET,

    -- Database configuration
    database_engine VARCHAR(50) DEFAULT 'postgresql',
    engine_version VARCHAR(20),
    port INTEGER DEFAULT 5432,

    -- Status tracking
    server_status VARCHAR(50) DEFAULT 'creating',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    provisioned_by VARCHAR(100),

    -- Cost tracking (manual)
    estimated_monthly_cost DECIMAL(10,2),
    actual_monthly_cost DECIMAL(10,2)
);

-- Database deployment tracking (complex manual process)
CREATE TABLE database_deployments (
    deployment_id SERIAL PRIMARY KEY,
    deployment_name VARCHAR(255) NOT NULL,
    server_id INTEGER REFERENCES database_servers(server_id),
    environment VARCHAR(100) NOT NULL,

    -- Deployment configuration (manual setup)
    database_name VARCHAR(100) NOT NULL,
    schema_version VARCHAR(50),
    application_version VARCHAR(50),

    -- Manual backup configuration
    backup_enabled BOOLEAN DEFAULT true,
    backup_schedule VARCHAR(100), -- Cron format
    backup_retention_days INTEGER DEFAULT 30,
    backup_storage_location VARCHAR(200),

    -- Scaling configuration (manual)
    enable_auto_scaling BOOLEAN DEFAULT false,
    min_capacity INTEGER,
    max_capacity INTEGER,
    target_cpu_utilization DECIMAL(5,2) DEFAULT 70.0,
    target_memory_utilization DECIMAL(5,2) DEFAULT 80.0,

    -- Monitoring setup (manual integration)
    monitoring_enabled BOOLEAN DEFAULT false,
    monitoring_tools TEXT[],
    alert_endpoints TEXT[],

    -- Deployment metadata
    deployment_status VARCHAR(50) DEFAULT 'pending',
    deployed_at TIMESTAMP,
    deployed_by VARCHAR(100),
    deployment_duration_seconds INTEGER,

    -- Configuration validation
    config_validation_status VARCHAR(50),
    validation_errors TEXT[]
);

-- Manual scaling operation tracking
CREATE TABLE scaling_operations (
    scaling_id SERIAL PRIMARY KEY,
    server_id INTEGER REFERENCES database_servers(server_id),
    scaling_trigger VARCHAR(100),

    -- Resource changes (manual calculation)
    previous_cpu_cores INTEGER,
    new_cpu_cores INTEGER,
    previous_memory_gb INTEGER,
    new_memory_gb INTEGER,
    previous_storage_gb INTEGER,
    new_storage_gb INTEGER,

    -- Scaling metrics
    trigger_metric VARCHAR(100),
    trigger_threshold DECIMAL(10,2),
    current_utilization DECIMAL(10,2),

    -- Scaling execution
    scaling_status VARCHAR(50) DEFAULT 'pending',
    started_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    completed_at TIMESTAMP,
    downtime_seconds INTEGER,

    -- Cost impact
    previous_hourly_cost DECIMAL(10,4),
    new_hourly_cost DECIMAL(10,4),
    cost_impact_monthly DECIMAL(10,2)
);

-- Basic monitoring and alerting (very limited automation)
CREATE OR REPLACE FUNCTION check_database_health()
RETURNS TABLE (
    server_id INTEGER,
    health_status VARCHAR(50),
    cpu_utilization DECIMAL(5,2),
    memory_utilization DECIMAL(5,2),
    disk_utilization DECIMAL(5,2),
    connection_count INTEGER,
    active_queries INTEGER,
    replication_lag_seconds INTEGER,
    backup_status VARCHAR(50),
    alert_level VARCHAR(20),
    recommendations TEXT[]
) AS $$
BEGIN
    -- This would be a very simplified health check
    -- Real implementation would require complex monitoring integration

    RETURN QUERY
    SELECT 
        ds.server_id,

        -- Basic status assessment (very limited)
        CASE 
            WHEN ds.server_status != 'running' THEN 'unhealthy'
            ELSE 'healthy'
        END as health_status,

        -- Simulated metrics (would need real monitoring integration)
        (random() * 100)::DECIMAL(5,2) as cpu_utilization,
        (random() * 100)::DECIMAL(5,2) as memory_utilization,
        (random() * 100)::DECIMAL(5,2) as disk_utilization,
        (random() * 100)::INTEGER as connection_count,
        (random() * 20)::INTEGER as active_queries,
        (random() * 10)::INTEGER as replication_lag_seconds,

        -- Backup status (manual tracking)
        'unknown' as backup_status,

        -- Alert level determination
        CASE 
            WHEN ds.server_status != 'running' THEN 'critical'
            WHEN random() > 0.9 THEN 'warning'
            ELSE 'info'
        END as alert_level,

        -- Basic recommendations (very limited)
        ARRAY[
            CASE WHEN random() > 0.8 THEN 'Consider scaling up CPU resources' END,
            CASE WHEN random() > 0.7 THEN 'Review backup configuration' END,
            CASE WHEN random() > 0.6 THEN 'Monitor connection pool usage' END
        ]::TEXT[] as recommendations

    FROM database_servers ds
    WHERE ds.server_status = 'running';
END;
$$ LANGUAGE plpgsql;

-- Manual deployment automation attempt (very basic)
CREATE OR REPLACE FUNCTION deploy_database_environment(
    deployment_name_param VARCHAR(255),
    environment_param VARCHAR(100),
    instance_type_param VARCHAR(100),
    database_name_param VARCHAR(100)
) RETURNS TABLE (
    deployment_success BOOLEAN,
    deployment_id INTEGER,
    server_id INTEGER,
    deployment_time_seconds INTEGER,
    error_message TEXT
) AS $$
DECLARE
    new_deployment_id INTEGER;
    new_server_id INTEGER;
    deployment_start TIMESTAMP;
    deployment_end TIMESTAMP;
    deployment_error TEXT := '';
    deployment_result BOOLEAN := true;
BEGIN
    deployment_start := clock_timestamp();

    BEGIN
        -- Step 1: Create database server record (manual provisioning simulation)
        INSERT INTO database_servers (
            server_name,
            cloud_provider,
            instance_type,
            region,
            cpu_cores,
            memory_gb,
            storage_gb,
            server_status,
            provisioned_by
        )
        VALUES (
            deployment_name_param || '_' || environment_param,
            'manual_cloud_provider',
            instance_type_param,
            'us-east-1',
            -- Static resource allocation (no optimization)
            CASE instance_type_param
                WHEN 't3.micro' THEN 1
                WHEN 't3.small' THEN 2
                WHEN 't3.medium' THEN 2
                ELSE 4
            END,
            CASE instance_type_param
                WHEN 't3.micro' THEN 1
                WHEN 't3.small' THEN 2
                WHEN 't3.medium' THEN 4
                ELSE 8
            END,
            100, -- Fixed storage
            'creating',
            current_user
        )
        RETURNING server_id INTO new_server_id;

        -- Simulate provisioning time
        PERFORM pg_sleep(2);

        -- Update server status
        UPDATE database_servers 
        SET server_status = 'running', last_updated = clock_timestamp()
        WHERE server_id = new_server_id;

        -- Step 2: Create deployment record
        INSERT INTO database_deployments (
            deployment_name,
            server_id,
            environment,
            database_name,
            deployment_status,
            deployed_by
        )
        VALUES (
            deployment_name_param,
            new_server_id,
            environment_param,
            database_name_param,
            'creating',
            current_user
        )
        RETURNING deployment_id INTO new_deployment_id;

        -- Simulate deployment process
        PERFORM pg_sleep(1);

        -- Update deployment status
        UPDATE database_deployments 
        SET deployment_status = 'completed',
            deployed_at = clock_timestamp()
        WHERE deployment_id = new_deployment_id;

    EXCEPTION WHEN OTHERS THEN
        deployment_result := false;
        deployment_error := SQLERRM;

        -- Cleanup on failure
        IF new_server_id IS NOT NULL THEN
            UPDATE database_servers 
            SET server_status = 'failed'
            WHERE server_id = new_server_id;
        END IF;

        IF new_deployment_id IS NOT NULL THEN
            UPDATE database_deployments 
            SET deployment_status = 'failed'
            WHERE deployment_id = new_deployment_id;
        END IF;
    END;

    deployment_end := clock_timestamp();

    RETURN QUERY SELECT 
        deployment_result,
        new_deployment_id,
        new_server_id,
        EXTRACT(EPOCH FROM deployment_end - deployment_start)::INTEGER,
        deployment_error;
END;
$$ LANGUAGE plpgsql;

-- Basic infrastructure monitoring query (very limited capabilities)
WITH server_utilization AS (
    SELECT 
        ds.server_id,
        ds.server_name,
        ds.instance_type,
        ds.cpu_cores,
        ds.memory_gb,
        ds.storage_gb,
        ds.server_status,
        ds.estimated_monthly_cost,

        -- Simulated current utilization (would need real monitoring)
        (random() * 100)::DECIMAL(5,2) as current_cpu_percent,
        (random() * 100)::DECIMAL(5,2) as current_memory_percent,
        (random() * 100)::DECIMAL(5,2) as current_storage_percent,

        -- Basic scaling recommendations (very limited logic)
        CASE 
            WHEN random() > 0.8 THEN 'scale_up'
            WHEN random() < 0.2 THEN 'scale_down'
            ELSE 'no_action'
        END as scaling_recommendation

    FROM database_servers ds
    WHERE ds.server_status = 'running'
),

cost_analysis AS (
    SELECT 
        su.*,
        dd.environment,

        -- Basic cost optimization suggestions (manual analysis)
        CASE 
            WHEN su.current_cpu_percent < 30 AND su.current_memory_percent < 30 THEN 'overprovisioned'
            WHEN su.current_cpu_percent > 80 OR su.current_memory_percent > 80 THEN 'underprovisioned'
            ELSE 'appropriately_sized'
        END as resource_sizing,

        -- Simple cost projection
        su.estimated_monthly_cost * 
        CASE su.scaling_recommendation
            WHEN 'scale_up' THEN 1.5
            WHEN 'scale_down' THEN 0.7
            ELSE 1.0
        END as projected_monthly_cost

    FROM server_utilization su
    JOIN database_deployments dd ON su.server_id = dd.server_id
)

SELECT 
    ca.server_name,
    ca.environment,
    ca.instance_type,
    ca.server_status,

    -- Resource utilization
    ca.current_cpu_percent,
    ca.current_memory_percent,
    ca.current_storage_percent,

    -- Scaling analysis
    ca.scaling_recommendation,
    ca.resource_sizing,

    -- Cost analysis
    ca.estimated_monthly_cost,
    ca.projected_monthly_cost,
    ROUND((ca.projected_monthly_cost - ca.estimated_monthly_cost), 2) as monthly_cost_impact,

    -- Basic recommendations
    CASE 
        WHEN ca.resource_sizing = 'overprovisioned' THEN 'Consider downsizing to reduce costs'
        WHEN ca.resource_sizing = 'underprovisioned' THEN 'Scale up to improve performance'
        WHEN ca.current_storage_percent > 85 THEN 'Increase storage capacity soon'
        ELSE 'Monitor current resource usage'
    END as operational_recommendation

FROM cost_analysis ca
ORDER BY ca.estimated_monthly_cost DESC;

-- Problems with traditional cloud database deployment:
-- 1. Manual provisioning with no infrastructure-as-code integration
-- 2. Limited auto-scaling capabilities requiring manual intervention
-- 3. Basic monitoring with no intelligent alerting or remediation
-- 4. Complex backup and disaster recovery configuration
-- 5. No built-in security best practices or compliance features
-- 6. Manual cost optimization requiring constant monitoring
-- 7. Limited integration with CI/CD pipelines and DevOps workflows
-- 8. No automatic patching or maintenance scheduling
-- 9. Complex networking and security group management
-- 10. Basic performance optimization requiring database expertise

MongoDB Atlas provides comprehensive cloud database automation with advanced DevOps integration:

// MongoDB Atlas Advanced Deployment Automation and Cloud Infrastructure Management
const { MongoClient } = require('mongodb');
const axios = require('axios');

// Comprehensive MongoDB Atlas Infrastructure Manager
class AdvancedAtlasInfrastructureManager {
  constructor(atlasConfig = {}) {
    // Atlas API configuration
    this.atlasConfig = {
      publicKey: atlasConfig.publicKey,
      privateKey: atlasConfig.privateKey,
      baseURL: atlasConfig.baseURL || 'https://cloud.mongodb.com/api/atlas/v1.0',

      // Organization and project configuration
      organizationId: atlasConfig.organizationId,
      projectId: atlasConfig.projectId,

      // Infrastructure automation settings
      enableAutomatedDeployment: atlasConfig.enableAutomatedDeployment !== false,
      enableInfrastructureAsCode: atlasConfig.enableInfrastructureAsCode || false,
      enableAutomatedScaling: atlasConfig.enableAutomatedScaling !== false,

      // DevOps integration
      cicdIntegration: atlasConfig.cicdIntegration || false,
      terraformIntegration: atlasConfig.terraformIntegration || false,
      kubernetesIntegration: atlasConfig.kubernetesIntegration || false,

      // Monitoring and alerting
      enableAdvancedMonitoring: atlasConfig.enableAdvancedMonitoring !== false,
      enableAutomatedAlerting: atlasConfig.enableAutomatedAlerting !== false,
      enablePerformanceAdvisor: atlasConfig.enablePerformanceAdvisor !== false,

      // Security and compliance
      enableAdvancedSecurity: atlasConfig.enableAdvancedSecurity !== false,
      enableEncryptionAtRest: atlasConfig.enableEncryptionAtRest !== false,
      enableNetworkSecurity: atlasConfig.enableNetworkSecurity !== false,

      // Backup and disaster recovery
      enableContinuousBackup: atlasConfig.enableContinuousBackup !== false,
      enableCrossRegionBackup: atlasConfig.enableCrossRegionBackup || false,
      backupRetentionDays: atlasConfig.backupRetentionDays || 30,

      // Cost optimization
      enableCostOptimization: atlasConfig.enableCostOptimization || false,
      enableAutoArchiving: atlasConfig.enableAutoArchiving || false,
      costBudgetAlerts: atlasConfig.costBudgetAlerts || []
    };

    // Infrastructure state management
    this.clusters = new Map();
    this.deployments = new Map();
    this.scalingOperations = new Map();
    this.monitoringAlerts = new Map();

    // DevOps integration state
    this.cicdPipelines = new Map();
    this.infrastructureTemplates = new Map();

    // Performance and cost tracking
    this.performanceMetrics = {
      totalClusters: 0,
      averageResponseTime: 0,
      totalMonthlySpend: 0,
      costPerOperation: 0
    };

    this.initializeAtlasInfrastructure();
  }

  async initializeAtlasInfrastructure() {
    console.log('Initializing MongoDB Atlas infrastructure management...');

    try {
      // Validate Atlas API credentials
      await this.validateAtlasCredentials();

      // Initialize infrastructure automation
      if (this.atlasConfig.enableAutomatedDeployment) {
        await this.setupAutomatedDeployment();
      }

      // Setup infrastructure-as-code integration
      if (this.atlasConfig.enableInfrastructureAsCode) {
        await this.setupInfrastructureAsCode();
      }

      // Initialize monitoring and alerting
      if (this.atlasConfig.enableAdvancedMonitoring) {
        await this.setupAdvancedMonitoring();
      }

      // Setup DevOps integrations
      if (this.atlasConfig.cicdIntegration) {
        await this.setupCICDIntegration();
      }

      console.log('Atlas infrastructure management initialized successfully');

    } catch (error) {
      console.error('Error initializing Atlas infrastructure:', error);
      throw error;
    }
  }

  async deployCluster(clusterConfig, deploymentOptions = {}) {
    console.log(`Deploying Atlas cluster: ${clusterConfig.name}`);

    try {
      const deployment = {
        deploymentId: this.generateDeploymentId(),
        clusterName: clusterConfig.name,

        // Cluster specification
        clusterSpec: {
          name: clusterConfig.name,
          clusterType: clusterConfig.clusterType || 'REPLICASET',
          mongoDBVersion: clusterConfig.mongoDBVersion || '7.0',

          // Provider configuration
          providerSettings: {
            providerName: clusterConfig.providerName || 'AWS',
            regionName: clusterConfig.regionName || 'US_EAST_1',
            instanceSizeName: clusterConfig.instanceSizeName || 'M30',

            // Advanced configuration
            diskIOPS: clusterConfig.diskIOPS,
            encryptEBSVolume: this.atlasConfig.enableEncryptionAtRest,
            volumeType: clusterConfig.volumeType || 'STANDARD'
          },

          // Replication configuration
          replicationSpecs: clusterConfig.replicationSpecs || [
            {
              numShards: 1,
              regionsConfig: {
                [clusterConfig.regionName || 'US_EAST_1']: {
                  electableNodes: 3,
                  priority: 7,
                  readOnlyNodes: 0
                }
              }
            }
          ],

          // Backup configuration
          backupEnabled: this.atlasConfig.enableContinuousBackup,
          providerBackupEnabled: this.atlasConfig.enableCrossRegionBackup,

          // Auto-scaling configuration
          autoScaling: {
            diskGBEnabled: this.atlasConfig.enableAutomatedScaling,
            compute: {
              enabled: this.atlasConfig.enableAutomatedScaling,
              scaleDownEnabled: true,
              minInstanceSize: clusterConfig.minInstanceSize || 'M10',
              maxInstanceSize: clusterConfig.maxInstanceSize || 'M80'
            }
          }
        },

        // Deployment configuration
        deploymentConfig: {
          environment: deploymentOptions.environment || 'production',
          deploymentType: deploymentOptions.deploymentType || 'standard',
          rolloutStrategy: deploymentOptions.rolloutStrategy || 'immediate',

          // Network security
          networkAccessList: clusterConfig.networkAccessList || [],

          // Database users
          databaseUsers: clusterConfig.databaseUsers || [],

          // Advanced security
          ldapConfiguration: clusterConfig.ldapConfiguration,
          encryptionAtRestProvider: clusterConfig.encryptionAtRestProvider
        },

        // Deployment metadata
        startTime: new Date(),
        status: 'creating',
        createdBy: deploymentOptions.createdBy || 'system'
      };

      // Store deployment state
      this.deployments.set(deployment.deploymentId, deployment);

      // Execute Atlas cluster creation
      const clusterResponse = await this.createAtlasCluster(deployment.clusterSpec);

      // Setup monitoring and alerting
      if (this.atlasConfig.enableAdvancedMonitoring) {
        await this.setupClusterMonitoring(clusterResponse.clusterId, deployment);
      }

      // Configure network security
      await this.configureNetworkSecurity(clusterResponse.clusterId, deployment.deploymentConfig);

      // Create database users
      await this.createDatabaseUsers(clusterResponse.clusterId, deployment.deploymentConfig.databaseUsers);

      // Wait for cluster to be ready
      const clusterStatus = await this.waitForClusterReady(clusterResponse.clusterId);

      // Update deployment status
      deployment.status = 'completed';
      deployment.endTime = new Date();
      deployment.clusterId = clusterResponse.clusterId;
      deployment.connectionString = clusterStatus.connectionString;

      // Store cluster information
      this.clusters.set(clusterResponse.clusterId, {
        clusterId: clusterResponse.clusterId,
        deployment: deployment,
        specification: deployment.clusterSpec,
        status: clusterStatus,
        createdAt: deployment.startTime
      });

      // Update performance metrics
      this.updateInfrastructureMetrics(deployment);

      console.log(`Cluster deployed successfully: ${clusterConfig.name} (${clusterResponse.clusterId})`);

      return {
        success: true,
        deploymentId: deployment.deploymentId,
        clusterId: clusterResponse.clusterId,
        clusterName: clusterConfig.name,
        connectionString: clusterStatus.connectionString,

        // Deployment details
        deploymentTime: deployment.endTime.getTime() - deployment.startTime.getTime(),
        environment: deployment.deploymentConfig.environment,
        configuration: deployment.clusterSpec,

        // Monitoring and security
        monitoringEnabled: this.atlasConfig.enableAdvancedMonitoring,
        securityEnabled: this.atlasConfig.enableAdvancedSecurity,
        backupEnabled: deployment.clusterSpec.backupEnabled
      };

    } catch (error) {
      console.error(`Error deploying cluster '${clusterConfig.name}':`, error);

      // Mark the in-flight deployment record as failed (the deployment object created in the
      // try block is not in scope here, so look it up by cluster name instead)
      const failedDeployment = Array.from(this.deployments.values())
        .find(d => d.clusterName === clusterConfig.name && d.status === 'creating');
      if (failedDeployment) {
        failedDeployment.status = 'failed';
        failedDeployment.error = error.message;
        failedDeployment.endTime = new Date();
      }

      return {
        success: false,
        error: error.message,
        clusterName: clusterConfig.name
      };
    }
  }

  async createAtlasCluster(clusterSpec) {
    console.log(`Creating Atlas cluster via API: ${clusterSpec.name}`);

    try {
      const response = await this.atlasAPIRequest('POST', `/groups/${this.atlasConfig.projectId}/clusters`, clusterSpec);

      return {
        clusterId: response.id,
        name: response.name,
        stateName: response.stateName,
        createDate: response.createDate
      };

    } catch (error) {
      console.error('Error creating Atlas cluster:', error);
      throw error;
    }
  }

  async setupAutomatedScaling(clusterId, scalingConfig) {
    console.log(`Setting up automated scaling for cluster: ${clusterId}`);

    try {
      const scalingConfiguration = {
        clusterId: clusterId,

        // Compute scaling configuration
        computeScaling: {
          enabled: scalingConfig.computeScaling !== false,
          scaleDownEnabled: scalingConfig.scaleDownEnabled !== false,

          // Instance size limits
          minInstanceSize: scalingConfig.minInstanceSize || 'M10',
          maxInstanceSize: scalingConfig.maxInstanceSize || 'M80',

          // Scaling triggers
          targetCPUUtilization: scalingConfig.targetCPUUtilization || 75,
          targetMemoryUtilization: scalingConfig.targetMemoryUtilization || 80,

          // Scaling behavior
          scaleUpPolicy: {
            cooldownMinutes: scalingConfig.scaleUpCooldown || 15,
            incrementPercent: scalingConfig.scaleUpIncrement || 100,
            units: 'INSTANCE_SIZE'
          },
          scaleDownPolicy: {
            cooldownMinutes: scalingConfig.scaleDownCooldown || 30,
            decrementPercent: scalingConfig.scaleDownDecrement || 50,
            units: 'INSTANCE_SIZE'
          }
        },

        // Storage scaling configuration
        storageScaling: {
          enabled: scalingConfig.storageScaling !== false,

          // Storage scaling triggers
          targetStorageUtilization: scalingConfig.targetStorageUtilization || 85,
          incrementGigabytes: scalingConfig.storageIncrement || 10,
          maxStorageGigabytes: scalingConfig.maxStorage || 4096
        },

        // Advanced scaling features
        advancedScaling: {
          enablePredictiveScaling: scalingConfig.enablePredictiveScaling || false,
          enableScheduledScaling: scalingConfig.enableScheduledScaling || false,
          scheduledScalingEvents: scalingConfig.scheduledScalingEvents || []
        }
      };

      // Configure compute auto-scaling
      if (scalingConfiguration.computeScaling.enabled) {
        await this.configureComputeScaling(clusterId, scalingConfiguration.computeScaling);
      }

      // Configure storage auto-scaling
      if (scalingConfiguration.storageScaling.enabled) {
        await this.configureStorageScaling(clusterId, scalingConfiguration.storageScaling);
      }

      // Store scaling configuration
      this.scalingOperations.set(clusterId, scalingConfiguration);

      return {
        success: true,
        clusterId: clusterId,
        scalingConfiguration: scalingConfiguration
      };

    } catch (error) {
      console.error(`Error setting up automated scaling for cluster ${clusterId}:`, error);
      return {
        success: false,
        error: error.message,
        clusterId: clusterId
      };
    }
  }

  async setupAdvancedMonitoring(clusterId, monitoringConfig = {}) {
    console.log(`Setting up advanced monitoring for cluster: ${clusterId}`);

    try {
      const monitoringConfiguration = {
        clusterId: clusterId,

        // Performance monitoring
        performanceMonitoring: {
          enabled: monitoringConfig.performanceMonitoring !== false,

          // Metrics collection
          collectDetailedMetrics: true,
          metricsRetentionDays: monitoringConfig.metricsRetentionDays || 30,

          // Performance insights
          enableSlowQueryAnalysis: true,
          enableIndexSuggestions: true,
          enableQueryOptimization: true,

          // Real-time monitoring
          enableRealTimeAlerts: true,
          alertLatencyThresholds: {
            warning: monitoringConfig.warningLatency || 1000,
            critical: monitoringConfig.criticalLatency || 5000
          }
        },

        // Infrastructure monitoring
        infrastructureMonitoring: {
          enabled: monitoringConfig.infrastructureMonitoring !== false,

          // Resource monitoring
          monitorCPUUtilization: true,
          monitorMemoryUtilization: true,
          monitorStorageUtilization: true,
          monitorNetworkUtilization: true,

          // Capacity planning
          enableCapacityForecasting: true,
          forecastingHorizonDays: monitoringConfig.forecastingHorizon || 30,

          // Health checks
          enableHealthChecks: true,
          healthCheckIntervalMinutes: monitoringConfig.healthCheckInterval || 5
        },

        // Application monitoring
        applicationMonitoring: {
          enabled: monitoringConfig.applicationMonitoring !== false,

          // Connection monitoring
          monitorConnectionUsage: true,
          connectionPoolAnalysis: true,

          // Query monitoring
          slowQueryThresholdMs: monitoringConfig.slowQueryThreshold || 1000,
          enableQueryProfiling: true,
          profileSampleRate: monitoringConfig.profileSampleRate || 0.1,

          // Error monitoring
          enableErrorTracking: true,
          errorAlertThreshold: monitoringConfig.errorAlertThreshold || 10
        },

        // Security monitoring
        securityMonitoring: {
          enabled: this.atlasConfig.enableAdvancedSecurity,

          // Access monitoring
          monitorDatabaseAccess: true,
          unusualAccessAlerts: true,

          // Authentication monitoring
          authenticationFailureAlerts: true,
          multipleFailedAttemptsThreshold: 5,

          // Data access monitoring
          sensitiveDataAccessMonitoring: true,
          dataExportMonitoring: true
        }
      };

      // Setup performance monitoring
      if (monitoringConfiguration.performanceMonitoring.enabled) {
        await this.configurePerformanceMonitoring(clusterId, monitoringConfiguration.performanceMonitoring);
      }

      // Setup infrastructure monitoring
      if (monitoringConfiguration.infrastructureMonitoring.enabled) {
        await this.configureInfrastructureMonitoring(clusterId, monitoringConfiguration.infrastructureMonitoring);
      }

      // Setup application monitoring
      if (monitoringConfiguration.applicationMonitoring.enabled) {
        await this.configureApplicationMonitoring(clusterId, monitoringConfiguration.applicationMonitoring);
      }

      // Setup security monitoring
      if (monitoringConfiguration.securityMonitoring.enabled) {
        await this.configureSecurityMonitoring(clusterId, monitoringConfiguration.securityMonitoring);
      }

      // Store monitoring configuration
      this.monitoringAlerts.set(clusterId, monitoringConfiguration);

      return {
        success: true,
        clusterId: clusterId,
        monitoringConfiguration: monitoringConfiguration
      };

    } catch (error) {
      console.error(`Error setting up monitoring for cluster ${clusterId}:`, error);
      return {
        success: false,
        error: error.message,
        clusterId: clusterId
      };
    }
  }

  async setupInfrastructureAsCode(templateConfig = {}) {
    console.log('Setting up infrastructure-as-code integration...');

    try {
      const infrastructureTemplate = {
        templateId: this.generateTemplateId(),
        templateName: templateConfig.name || 'mongodb-atlas-infrastructure',
        templateType: templateConfig.type || 'terraform',

        // Template configuration
        templateConfiguration: {
          // Provider configuration
          provider: templateConfig.provider || 'terraform',
          version: templateConfig.version || '1.0',

          // Atlas provider settings
          atlasProvider: {
            publicKey: '${var.atlas_public_key}',
            privateKey: '${var.atlas_private_key}',
            baseURL: this.atlasConfig.baseURL
          },

          // Infrastructure resources
          resources: {
            // Project resource
            project: {
              name: '${var.project_name}',
              orgId: this.atlasConfig.organizationId,

              // Project configuration
              isCollectingBugs: false,
              isDataExplorerEnabled: true,
              isPerformanceAdvisorEnabled: true,
              isRealtimePerformancePanelEnabled: true,
              isSchemaAdvisorEnabled: true
            },

            // Cluster resources
            clusters: templateConfig.clusters || [],

            // Database user resources
            databaseUsers: templateConfig.databaseUsers || [],

            // Network access rules
            networkAccessList: templateConfig.networkAccessList || [],

            // Alert configurations
            alertConfigurations: templateConfig.alertConfigurations || []
          },

          // Variables
          variables: {
            atlas_public_key: {
              description: 'MongoDB Atlas API Public Key',
              type: 'string',
              sensitive: false
            },
            atlas_private_key: {
              description: 'MongoDB Atlas API Private Key',
              type: 'string',
              sensitive: true
            },
            project_name: {
              description: 'Atlas Project Name',
              type: 'string',
              default: 'default-project'
            },
            environment: {
              description: 'Deployment Environment',
              type: 'string',
              default: 'development'
            }
          },

          // Outputs
          outputs: {
            cluster_connection_strings: {
              description: 'Atlas Cluster Connection Strings',
              value: '${tomap({ for k, cluster in mongodbatlas_cluster.clusters : k => cluster.connection_strings[0].standard_srv })}'
            },
            cluster_ids: {
              description: 'Atlas Cluster IDs',
              value: '${tomap({ for k, cluster in mongodbatlas_cluster.clusters : k => cluster.cluster_id })}'
            },
            project_id: {
              description: 'Atlas Project ID',
              value: '${mongodbatlas_project.project.id}'
            }
          }
        },

        // CI/CD integration
        cicdIntegration: {
          enabled: templateConfig.cicdIntegration || false,

          // Pipeline configuration
          pipeline: {
            stages: ['validate', 'plan', 'apply'],
            approvalRequired: templateConfig.requireApproval !== false,

            // Environment promotion
            environments: ['development', 'staging', 'production'],
            promotionStrategy: templateConfig.promotionStrategy || 'manual'
          },

          // Integration settings
          integrations: {
            github: templateConfig.githubIntegration || false,
            jenkins: templateConfig.jenkinsIntegration || false,
            gitlab: templateConfig.gitlabIntegration || false,
            azureDevOps: templateConfig.azureDevOpsIntegration || false
          }
        }
      };

      // Generate Terraform configuration
      const terraformConfig = this.generateTerraformConfig(infrastructureTemplate);

      // Generate CI/CD pipeline configuration
      const pipelineConfig = this.generatePipelineConfig(infrastructureTemplate);

      // Store template configuration
      this.infrastructureTemplates.set(infrastructureTemplate.templateId, infrastructureTemplate);

      return {
        success: true,
        templateId: infrastructureTemplate.templateId,
        templateConfiguration: infrastructureTemplate,
        terraformConfig: terraformConfig,
        pipelineConfig: pipelineConfig
      };

    } catch (error) {
      console.error('Error setting up infrastructure-as-code:', error);
      return {
        success: false,
        error: error.message
      };
    }
  }

  async performCostOptimization(clusterId, optimizationOptions = {}) {
    console.log(`Performing cost optimization for cluster: ${clusterId}`);

    try {
      const cluster = this.clusters.get(clusterId);
      if (!cluster) {
        throw new Error(`Cluster not found: ${clusterId}`);
      }

      // Collect performance and utilization metrics
      const performanceMetrics = await this.collectPerformanceMetrics(clusterId);
      const utilizationMetrics = await this.collectUtilizationMetrics(clusterId);
      const costMetrics = await this.collectCostMetrics(clusterId);

      // Analyze optimization opportunities
      const optimizationAnalysis = {
        clusterId: clusterId,
        analysisTime: new Date(),

        // Performance analysis
        performanceAnalysis: {
          averageResponseTime: performanceMetrics.averageResponseTime,
          peakResponseTime: performanceMetrics.peakResponseTime,
          queryThroughput: performanceMetrics.queryThroughput,
          resourceBottlenecks: performanceMetrics.bottlenecks
        },

        // Utilization analysis
        utilizationAnalysis: {
          cpuUtilization: {
            average: utilizationMetrics.cpu.average,
            peak: utilizationMetrics.cpu.peak,
            recommendation: this.generateCPURecommendation(utilizationMetrics.cpu)
          },
          memoryUtilization: {
            average: utilizationMetrics.memory.average,
            peak: utilizationMetrics.memory.peak,
            recommendation: this.generateMemoryRecommendation(utilizationMetrics.memory)
          },
          storageUtilization: {
            used: utilizationMetrics.storage.used,
            available: utilizationMetrics.storage.available,
            growthRate: utilizationMetrics.storage.growthRate,
            recommendation: this.generateStorageRecommendation(utilizationMetrics.storage)
          }
        },

        // Cost analysis
        costAnalysis: {
          currentMonthlyCost: costMetrics.currentMonthlyCost,
          costTrends: costMetrics.trends,
          costBreakdown: costMetrics.breakdown,

          // Optimization opportunities
          optimizationOpportunities: []
        }
      };

      // Generate optimization recommendations
      const recommendations = this.generateOptimizationRecommendations(
        performanceMetrics,
        utilizationMetrics,
        costMetrics,
        optimizationOptions
      );

      optimizationAnalysis.recommendations = recommendations;

      // Calculate potential savings
      const savingsAnalysis = this.calculatePotentialSavings(recommendations, costMetrics);
      optimizationAnalysis.savingsAnalysis = savingsAnalysis;

      // Apply optimizations if auto-optimization is enabled
      if (optimizationOptions.autoOptimize) {
        const optimizationResults = await this.applyOptimizations(clusterId, recommendations);
        optimizationAnalysis.optimizationResults = optimizationResults;
      }

      return {
        success: true,
        clusterId: clusterId,
        optimizationAnalysis: optimizationAnalysis
      };

    } catch (error) {
      console.error(`Error performing cost optimization for cluster ${clusterId}:`, error);
      return {
        success: false,
        error: error.message,
        clusterId: clusterId
      };
    }
  }

  // Utility methods for Atlas operations

  async atlasAPIRequest(method, endpoint, data = null) {
    const url = `${this.atlasConfig.baseURL}${endpoint}`;
    // NOTE: the Atlas Admin API authenticates with HTTP Digest using the programmatic API key
    // pair; the Basic Authorization header below is a simplified placeholder, and a production
    // client would use a digest-capable HTTP library or wrapper instead.
    const auth = Buffer.from(`${this.atlasConfig.publicKey}:${this.atlasConfig.privateKey}`).toString('base64');

    try {
      const config = {
        method: method,
        url: url,
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Basic ${auth}`
        }
      };

      if (data) {
        config.data = data;
      }

      const response = await axios(config);
      return response.data;

    } catch (error) {
      console.error(`Atlas API request failed: ${method} ${endpoint}`, error);
      throw error;
    }
  }

  async validateAtlasCredentials() {
    try {
      await this.atlasAPIRequest('GET', '/orgs');
      console.log('Atlas API credentials validated successfully');
    } catch (error) {
      throw new Error('Invalid Atlas API credentials');
    }
  }

  generateDeploymentId() {
    return `deployment_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
  }

  generateTemplateId() {
    return `template_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
  }

  async waitForClusterReady(clusterId, timeoutMinutes = 30) {
    const timeout = timeoutMinutes * 60 * 1000;
    const startTime = Date.now();

    while (Date.now() - startTime < timeout) {
      try {
        // Atlas addresses clusters by name in this endpoint; clusterId is used here as a
        // simplified stand-in and would be the cluster name against the real Admin API
        const clusterStatus = await this.atlasAPIRequest('GET', `/groups/${this.atlasConfig.projectId}/clusters/${clusterId}`);

        if (clusterStatus.stateName === 'IDLE') {
          return {
            clusterId: clusterId,
            state: clusterStatus.stateName,
            connectionString: clusterStatus.connectionStrings?.standardSrv,
            mongoDBVersion: clusterStatus.mongoDBVersion
          };
        }

        console.log(`Waiting for cluster ${clusterId} to be ready. Current state: ${clusterStatus.stateName}`);
        await this.sleep(30000); // Wait 30 seconds

      } catch (error) {
        console.error(`Error checking cluster status: ${clusterId}`, error);
        await this.sleep(30000);
      }
    }

    throw new Error(`Cluster ${clusterId} did not become ready within ${timeoutMinutes} minutes`);
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  updateInfrastructureMetrics(deployment) {
    this.performanceMetrics.totalClusters++;
    // Update other metrics based on deployment
  }

  generateTerraformConfig(infrastructureTemplate) {
    // Generate Terraform configuration files based on template
    return {
      mainTf: `# MongoDB Atlas Infrastructure Configuration
provider "mongodbatlas" {
  public_key  = var.atlas_public_key
  private_key = var.atlas_private_key
}

# Variables and resources would be generated here based on template
`,
      variablesTf: `# Infrastructure variables
variable "atlas_public_key" {
  description = "MongoDB Atlas API Public Key"
  type        = string
}

variable "atlas_private_key" {
  description = "MongoDB Atlas API Private Key"
  type        = string
  sensitive   = true
}
`,
      outputsTf: `# Infrastructure outputs
output "cluster_connection_strings" {
  description = "Atlas Cluster Connection Strings"
  value       = mongodbatlas_cluster.main.connection_strings
}
`
    };
  }

  generatePipelineConfig(infrastructureTemplate) {
    // Generate CI/CD pipeline configuration
    return {
      githubActions: `# GitHub Actions workflow for Atlas infrastructure
name: MongoDB Atlas Infrastructure Deployment

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  terraform:
    name: Terraform
    runs-on: ubuntu-latest

    steps:
    - name: Checkout
      uses: actions/checkout@v2

    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v1

    - name: Terraform Plan
      run: terraform plan

    - name: Terraform Apply
      if: github.ref == 'refs/heads/main'
      run: terraform apply -auto-approve
`,
      jenkins: `// Jenkins pipeline for Atlas infrastructure
pipeline {
    agent any

    stages {
        stage('Checkout') {
            steps {
                checkout scm
            }
        }

        stage('Terraform Plan') {
            steps {
                sh 'terraform plan -out=tfplan'
            }
        }

        stage('Terraform Apply') {
            when {
                branch 'main'
            }
            steps {
                sh 'terraform apply tfplan'
            }
        }
    }
}
`
    };
  }

  // Additional methods would include implementations for:
  // - setupAutomatedDeployment()
  // - setupCICDIntegration()
  // - configureNetworkSecurity()
  // - createDatabaseUsers()
  // - configureComputeScaling()
  // - configureStorageScaling()
  // - configurePerformanceMonitoring()
  // - configureInfrastructureMonitoring()
  // - configureApplicationMonitoring()
  // - configureSecurityMonitoring()
  // - collectPerformanceMetrics()
  // - collectUtilizationMetrics()
  // - collectCostMetrics()
  // - generateOptimizationRecommendations()
  // - calculatePotentialSavings()
  // - applyOptimizations()
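
  // Hedged sketches for two of the helpers listed above, using the Atlas Admin API v1.0
  // project IP access list and database user endpoints. The shape of the entries read from
  // deploymentConfig is an assumption carried over from deployCluster(), and the payloads
  // should be treated as a sketch rather than a complete implementation.

  async configureNetworkSecurity(clusterId, deploymentConfig) {
    // The access list is project-scoped, so clusterId is only used for logging and comments here
    const entries = (deploymentConfig.networkAccessList || []).map(entry => ({
      ipAddress: entry.ipAddress,
      cidrBlock: entry.cidrBlock,
      comment: entry.comment || `Provisioned for cluster ${clusterId}`
    }));

    if (entries.length === 0) {
      console.log(`No network access entries supplied for cluster ${clusterId}`);
      return { success: true, entriesCreated: 0 };
    }

    await this.atlasAPIRequest('POST', `/groups/${this.atlasConfig.projectId}/accessList`, entries);
    return { success: true, entriesCreated: entries.length };
  }

  async createDatabaseUsers(clusterId, databaseUsers = []) {
    const results = [];
    for (const user of databaseUsers) {
      // Atlas database users are defined against the admin database by default
      const payload = {
        databaseName: user.databaseName || 'admin',
        username: user.username,
        password: user.password,
        roles: user.roles || [{ roleName: 'readWrite', databaseName: user.defaultDatabase || 'app' }]
      };
      const created = await this.atlasAPIRequest('POST', `/groups/${this.atlasConfig.projectId}/databaseUsers`, payload);
      results.push({ username: created.username, clusterId });
    }
    return results;
  }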
}

// Benefits of MongoDB Atlas Advanced Infrastructure Management:
// - Automated deployment with infrastructure-as-code integration
// - Intelligent auto-scaling based on real-time metrics and predictions
// - Comprehensive monitoring and alerting for proactive management
// - Advanced security and compliance features built-in
// - DevOps pipeline integration for continuous deployment
// - Cost optimization with automated resource right-sizing
// - Enterprise-grade backup and disaster recovery capabilities
// - Multi-cloud deployment and management capabilities
// - SQL-compatible operations through QueryLeaf integration
// - Production-ready infrastructure automation and orchestration

module.exports = {
  AdvancedAtlasInfrastructureManager
};
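
The Terraform files returned by generateTerraformConfig() can be written straight into a Terraform working directory before running terraform init and terraform plan. The following is a minimal, self-contained sketch; the ./terraform/atlas output path is a placeholder and the helper is not part of the manager class above.

// Write the { mainTf, variablesTf, outputsTf } object returned by
// generateTerraformConfig() into a Terraform working directory
const fs = require('fs').promises;
const path = require('path');

async function writeTerraformWorkspace(terraformConfig, workingDir = './terraform/atlas') {
  await fs.mkdir(workingDir, { recursive: true });

  const files = {
    'main.tf': terraformConfig.mainTf,
    'variables.tf': terraformConfig.variablesTf,
    'outputs.tf': terraformConfig.outputsTf
  };

  for (const [fileName, contents] of Object.entries(files)) {
    await fs.writeFile(path.join(workingDir, fileName), contents, 'utf8');
  }

  return workingDir; // run `terraform init && terraform plan` from this directory
}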

Understanding MongoDB Atlas Infrastructure Architecture

Advanced Cloud Database Operations and DevOps Integration Patterns

Implement sophisticated Atlas infrastructure patterns for enterprise deployments:

// Enterprise-grade Atlas infrastructure with advanced DevOps integration and multi-cloud capabilities
class EnterpriseAtlasOrchestrator extends AdvancedAtlasInfrastructureManager {
  constructor(atlasConfig, enterpriseConfig) {
    super(atlasConfig);

    this.enterpriseConfig = {
      ...enterpriseConfig,
      enableMultiCloudDeployment: true,
      enableDisasterRecoveryAutomation: true,
      enableComplianceAutomation: true,
      enableAdvancedSecurity: true,
      enableGlobalDistribution: true
    };

    this.setupEnterpriseCapabilities();
    this.initializeMultiCloudOrchestration();
    this.setupComplianceAutomation();
  }

  async implementMultiCloudStrategy(cloudConfiguration) {
    console.log('Implementing multi-cloud Atlas deployment strategy...');

    const multiCloudStrategy = {
      // Multi-cloud provider configuration
      cloudProviders: {
        aws: { regions: ['us-east-1', 'eu-west-1'], priority: 1 },
        gcp: { regions: ['us-central1', 'europe-west1'], priority: 2 },
        azure: { regions: ['eastus', 'westeurope'], priority: 3 }
      },

      // Global distribution strategy
      globalDistribution: {
        primaryRegion: 'us-east-1',
        secondaryRegions: ['eu-west-1', 'ap-southeast-1'],
        dataResidencyRules: true,
        latencyOptimization: true
      },

      // Disaster recovery automation
      disasterRecovery: {
        crossCloudBackup: true,
        automaticFailover: true,
        recoveryTimeObjective: '4h',
        recoveryPointObjective: '15min'
      }
    };

    return await this.deployMultiCloudInfrastructure(multiCloudStrategy);
  }

  async setupAdvancedComplianceAutomation() {
    console.log('Setting up enterprise compliance automation...');

    const complianceCapabilities = {
      // Regulatory compliance
      regulatoryFrameworks: {
        gdpr: { dataResidency: true, rightToErasure: true },
        hipaa: { encryption: true, auditLogging: true },
        sox: { changeTracking: true, accessControls: true },
        pci: { dataEncryption: true, networkSecurity: true }
      },

      // Automated compliance monitoring
      complianceMonitoring: {
        continuousAssessment: true,
        violationDetection: true,
        automaticRemediation: true,
        complianceReporting: true
      },

      // Enterprise security
      enterpriseSecurity: {
        zeroTrustNetworking: true,
        advancedThreatDetection: true,
        dataLossPrevention: true,
        privilegedAccessManagement: true
      }
    };

    return await this.deployComplianceAutomation(complianceCapabilities);
  }
}
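
To make the multi-cloud strategy above concrete, the short sketch below shows one way the cloudProviders priority map could drive failover region selection when the primary region becomes unavailable. The lowest-priority-number-wins rule and the unavailableRegions input are assumptions for illustration, not Atlas behavior.

// Pick a failover region from a provider map shaped like the cloudProviders
// object above: { aws: { regions: [...], priority: 1 }, ... }
function selectFailoverRegion(cloudProviders, primaryRegion, unavailableRegions = []) {
  const blocked = new Set([primaryRegion, ...unavailableRegions]);

  return Object.entries(cloudProviders)
    // lower priority number = more preferred provider
    .sort(([, a], [, b]) => a.priority - b.priority)
    .flatMap(([provider, config]) => config.regions.map(region => ({ provider, region })))
    .find(candidate => !blocked.has(candidate.region)) || null;
}

// Example with the provider map from implementMultiCloudStrategy():
// selectFailoverRegion(cloudProviders, 'us-east-1', ['eu-west-1'])
//   -> { provider: 'gcp', region: 'us-central1' }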

SQL-Style Atlas Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Atlas infrastructure operations:

-- QueryLeaf advanced Atlas infrastructure operations with SQL-familiar syntax for MongoDB

-- Atlas cluster deployment with comprehensive configuration
CREATE ATLAS_CLUSTER production_cluster (
  -- Cluster configuration
  cluster_type = 'REPLICASET',
  mongodb_version = '7.0',

  -- Provider and region configuration
  provider_name = 'AWS',
  region_name = 'US_EAST_1',
  instance_size = 'M30',

  -- Multi-region configuration
  replication_specs = JSON_OBJECT(
    'num_shards', 1,
    'regions_config', JSON_OBJECT(
      'US_EAST_1', JSON_OBJECT(
        'electable_nodes', 3,
        'priority', 7,
        'read_only_nodes', 0
      ),
      'EU_WEST_1', JSON_OBJECT(
        'electable_nodes', 2,
        'priority', 6,
        'read_only_nodes', 1
      )
    )
  ),

  -- Auto-scaling configuration
  auto_scaling = JSON_OBJECT(
    'disk_gb_enabled', true,
    'compute_enabled', true,
    'compute_scale_down_enabled', true,
    'compute_min_instance_size', 'M10',
    'compute_max_instance_size', 'M80'
  ),

  -- Backup configuration
  backup_enabled = true,
  provider_backup_enabled = true,

  -- Performance configuration
  disk_iops = 3000,
  volume_type = 'PROVISIONED',
  encrypt_ebs_volume = true,

  -- Advanced configuration
  bi_connector_enabled = false,
  pit_enabled = true,
  oplog_size_mb = 2048,

  -- Network security
  network_access_list = ARRAY[
    JSON_OBJECT('ip_address', '10.0.0.0/8', 'comment', 'Internal network'),
    JSON_OBJECT('cidr_block', '172.16.0.0/12', 'comment', 'VPC network')
  ],

  -- Monitoring and alerting
  monitoring = JSON_OBJECT(
    'enable_performance_advisor', true,
    'enable_realtime_performance_panel', true,
    'enable_schema_advisor', true,
    'data_explorer_enabled', true
  )
);

-- Advanced Atlas cluster monitoring and performance analysis
WITH cluster_performance AS (
  SELECT 
    cluster_name,
    cluster_id,
    DATE_TRUNC('hour', metric_timestamp) as time_bucket,

    -- Performance metrics aggregation
    AVG(connections_current) as avg_connections,
    MAX(connections_current) as peak_connections,
    AVG(opcounters_query) as avg_queries_per_second,
    AVG(opcounters_insert) as avg_inserts_per_second,
    AVG(opcounters_update) as avg_updates_per_second,
    AVG(opcounters_delete) as avg_deletes_per_second,

    -- Resource utilization
    AVG(system_cpu_user) as avg_cpu_user,
    AVG(system_cpu_kernel) as avg_cpu_kernel,
    AVG(system_memory_used_mb) / AVG(system_memory_available_mb) * 100 as avg_memory_utilization,
    AVG(system_network_in_bytes) as avg_network_in_bytes,
    AVG(system_network_out_bytes) as avg_network_out_bytes,

    -- Storage metrics
    AVG(system_disk_space_used_data_bytes) as avg_data_size_bytes,
    AVG(system_disk_space_used_index_bytes) as avg_index_size_bytes,
    AVG(system_disk_space_used_total_bytes) as avg_total_storage_bytes,

    -- Performance indicators
    AVG(global_lock_current_queue_readers) as avg_queue_readers,
    AVG(global_lock_current_queue_writers) as avg_queue_writers,
    AVG(wt_cache_pages_currently_held_in_cache) as avg_cache_pages,

    -- Replication metrics
    AVG(replset_oplog_head_timestamp) as oplog_head_timestamp,
    AVG(replset_oplog_tail_timestamp) as oplog_tail_timestamp,
    MAX(replset_member_lag_millis) as max_replication_lag

  FROM ATLAS_METRICS('production_cluster')
  WHERE metric_timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  GROUP BY cluster_name, cluster_id, DATE_TRUNC('hour', metric_timestamp)
),

performance_analysis AS (
  SELECT 
    cp.*,

    -- Calculate total CPU utilization
    (cp.avg_cpu_user + cp.avg_cpu_kernel) as total_cpu_utilization,

    -- Calculate storage utilization percentage
    CASE 
      WHEN cp.avg_total_storage_bytes > 0 THEN
        (cp.avg_data_size_bytes + cp.avg_index_size_bytes) / cp.avg_total_storage_bytes * 100
      ELSE 0
    END as storage_utilization_percent,

    -- Network utilization
    (cp.avg_network_in_bytes + cp.avg_network_out_bytes) / 1024 / 1024 as total_network_mb,

    -- Performance score calculation
    CASE 
      WHEN (cp.avg_cpu_user + cp.avg_cpu_kernel) > 80 THEN 'high_cpu_load'
      WHEN cp.avg_memory_utilization > 85 THEN 'high_memory_usage'
      WHEN cp.avg_queue_readers + cp.avg_queue_writers > 10 THEN 'high_queue_pressure'
      WHEN cp.max_replication_lag > 10000 THEN 'high_replication_lag'
      ELSE 'healthy'
    END as performance_status,

    -- Capacity planning indicators
    LAG(cp.avg_connections) OVER (ORDER BY cp.time_bucket) as prev_hour_connections,
    LAG(cp.avg_total_storage_bytes) OVER (ORDER BY cp.time_bucket) as prev_hour_storage,

    -- Query performance trends
    (cp.avg_queries_per_second + cp.avg_inserts_per_second + 
     cp.avg_updates_per_second + cp.avg_deletes_per_second) as total_operations_per_second

  FROM cluster_performance cp
),

scaling_recommendations AS (
  SELECT 
    pa.*,

    -- Connection scaling analysis
    CASE 
      WHEN pa.peak_connections > 500 AND pa.avg_connections / pa.peak_connections > 0.8 THEN 'scale_up_connections'
      WHEN pa.peak_connections < 100 AND pa.avg_connections < 50 THEN 'optimize_connection_pooling'
      ELSE 'connections_appropriate'
    END as connection_scaling_recommendation,

    -- Compute scaling analysis
    CASE 
      WHEN pa.total_cpu_utilization > 75 AND pa.avg_memory_utilization > 80 THEN 'scale_up_compute'
      WHEN pa.total_cpu_utilization < 30 AND pa.avg_memory_utilization < 50 THEN 'scale_down_compute'
      ELSE 'compute_appropriate'
    END as compute_scaling_recommendation,

    -- Storage scaling analysis
    CASE 
      WHEN pa.storage_utilization_percent > 85 THEN 'increase_storage_immediately'
      WHEN pa.storage_utilization_percent > 75 THEN 'monitor_storage_closely'
      WHEN (pa.avg_total_storage_bytes - pa.prev_hour_storage) > 1024*1024*1024 THEN 'high_storage_growth'
      ELSE 'storage_appropriate'
    END as storage_scaling_recommendation,

    -- Performance optimization recommendations
    ARRAY[
      CASE WHEN pa.avg_queue_readers > 5 THEN 'optimize_read_queries' END,
      CASE WHEN pa.avg_queue_writers > 5 THEN 'optimize_write_operations' END,
      CASE WHEN pa.max_replication_lag > 5000 THEN 'investigate_replication_lag' END,
      CASE WHEN pa.avg_cache_pages < 1000 THEN 'increase_cache_size' END
    ]::TEXT[] as performance_optimization_recommendations,

    -- Cost optimization opportunities
    CASE 
      WHEN pa.total_cpu_utilization < 25 AND pa.avg_memory_utilization < 40 THEN 'overprovisioned'
      WHEN pa.total_operations_per_second < 100 AND pa.avg_connections < 10 THEN 'underutilized'
      ELSE 'appropriately_sized'
    END as cost_optimization_status

  FROM performance_analysis pa
)

SELECT 
  sr.cluster_name,
  sr.time_bucket,

  -- Performance metrics
  ROUND(sr.total_cpu_utilization, 2) as cpu_utilization_percent,
  ROUND(sr.avg_memory_utilization, 2) as memory_utilization_percent,
  ROUND(sr.storage_utilization_percent, 2) as storage_utilization_percent,
  sr.avg_connections,
  sr.peak_connections,

  -- Operations throughput
  ROUND(sr.total_operations_per_second, 2) as operations_per_second,
  ROUND(sr.total_network_mb, 2) as network_throughput_mb,

  -- Performance assessment
  sr.performance_status,

  -- Scaling recommendations
  sr.connection_scaling_recommendation,
  sr.compute_scaling_recommendation,
  sr.storage_scaling_recommendation,

  -- Optimization recommendations
  ARRAY_REMOVE(sr.performance_optimization_recommendations, NULL) as optimization_recommendations,

  -- Cost optimization
  sr.cost_optimization_status,

  -- Growth trends
  CASE 
    WHEN sr.avg_connections > sr.prev_hour_connections * 1.1 THEN 'connection_growth'
    WHEN sr.avg_total_storage_bytes > sr.prev_hour_storage * 1.05 THEN 'storage_growth'
    ELSE 'stable'
  END as growth_trend,

  -- Alert conditions
  ARRAY[
    CASE WHEN sr.total_cpu_utilization > 90 THEN 'CRITICAL: CPU utilization very high' END,
    CASE WHEN sr.avg_memory_utilization > 95 THEN 'CRITICAL: Memory utilization critical' END,
    CASE WHEN sr.storage_utilization_percent > 90 THEN 'CRITICAL: Storage nearly full' END,
    CASE WHEN sr.max_replication_lag > 30000 THEN 'WARNING: High replication lag detected' END,
    CASE WHEN sr.avg_queue_readers + sr.avg_queue_writers > 20 THEN 'WARNING: High queue pressure' END
  ]::TEXT[] as active_alerts,

  -- Actionable insights
  CASE 
    WHEN sr.performance_status = 'high_cpu_load' THEN 'Scale up instance size or optimize queries'
    WHEN sr.performance_status = 'high_memory_usage' THEN 'Increase memory or optimize data structures'
    WHEN sr.performance_status = 'high_queue_pressure' THEN 'Optimize slow queries and add indexes'
    WHEN sr.performance_status = 'high_replication_lag' THEN 'Check network connectivity and oplog size'
    WHEN sr.cost_optimization_status = 'overprovisioned' THEN 'Consider scaling down to reduce costs'
    ELSE 'Continue monitoring current performance'
  END as recommended_action

FROM scaling_recommendations sr
WHERE sr.performance_status != 'healthy' 
   OR sr.cost_optimization_status IN ('overprovisioned', 'underutilized')
   OR sr.compute_scaling_recommendation != 'compute_appropriate'
ORDER BY 
  CASE sr.performance_status 
    WHEN 'high_cpu_load' THEN 1
    WHEN 'high_memory_usage' THEN 2
    WHEN 'high_queue_pressure' THEN 3
    WHEN 'high_replication_lag' THEN 4
    ELSE 5
  END,
  sr.time_bucket DESC;

-- Atlas infrastructure-as-code deployment and management
WITH deployment_templates AS (
  SELECT 
    template_name,
    template_version,
    environment,

    -- Infrastructure specification
    JSON_BUILD_OBJECT(
      'cluster_config', JSON_BUILD_OBJECT(
        'cluster_type', 'REPLICASET',
        'mongodb_version', '7.0',
        'provider_name', 'AWS',
        'instance_size', CASE environment
          WHEN 'production' THEN 'M30'
          WHEN 'staging' THEN 'M20'
          WHEN 'development' THEN 'M10'
        END,
        'replication_factor', CASE environment
          WHEN 'production' THEN 3
          WHEN 'staging' THEN 3
          WHEN 'development' THEN 1
        END
      ),
      'auto_scaling', JSON_BUILD_OBJECT(
        'compute_enabled', environment IN ('production', 'staging'),
        'storage_enabled', true,
        'min_instance_size', CASE environment
          WHEN 'production' THEN 'M30'
          WHEN 'staging' THEN 'M20'
          WHEN 'development' THEN 'M10'
        END,
        'max_instance_size', CASE environment
          WHEN 'production' THEN 'M80'
          WHEN 'staging' THEN 'M40'
          WHEN 'development' THEN 'M20'
        END
      ),
      'backup_config', JSON_BUILD_OBJECT(
        'continuous_backup_enabled', environment = 'production',
        'snapshot_backup_enabled', true,
        'backup_retention_days', CASE environment
          WHEN 'production' THEN 7
          WHEN 'staging' THEN 3
          WHEN 'development' THEN 1
        END
      ),
      'security_config', JSON_BUILD_OBJECT(
        'encryption_at_rest', environment IN ('production', 'staging'),
        'network_access_restricted', true,
        'database_auditing', environment = 'production',
        'ldap_authentication', environment = 'production'
      )
    ) as infrastructure_spec,

    -- Deployment configuration
    JSON_BUILD_OBJECT(
      'deployment_strategy', 'rolling',
      'approval_required', environment = 'production',
      'automated_testing', true,
      'rollback_on_failure', true,
      'notification_channels', ARRAY[
        'email:ops-team@company.com',
        'slack:#database-ops'
      ]
    ) as deployment_config,

    -- Monitoring configuration
    JSON_BUILD_OBJECT(
      'performance_monitoring', true,
      'custom_alerts', ARRAY[
        JSON_BUILD_OBJECT(
          'metric', 'CONNECTIONS_PERCENT',
          'threshold', 80,
          'comparison', 'GREATER_THAN'
        ),
        JSON_BUILD_OBJECT(
          'metric', 'NORMALIZED_SYSTEM_CPU_USER',
          'threshold', 75,
          'comparison', 'GREATER_THAN'
        ),
        JSON_BUILD_OBJECT(
          'metric', 'DISK_PARTITION_SPACE_USED_DATA',
          'threshold', 85,
          'comparison', 'GREATER_THAN'
        )
      ],
      'notification_delay_minutes', 5,
      'auto_scaling_triggers', JSON_BUILD_OBJECT(
        'cpu_threshold_percent', 75,
        'memory_threshold_percent', 80,
        'connections_threshold_percent', 80
      )
    ) as monitoring_config

  FROM (
    VALUES 
      ('web-app-cluster', '1.0', 'production'),
      ('web-app-cluster', '1.0', 'staging'),
      ('web-app-cluster', '1.0', 'development'),
      ('analytics-cluster', '1.0', 'production'),
      ('reporting-cluster', '1.0', 'production')
  ) as templates(template_name, template_version, environment)
),

deployment_validation AS (
  SELECT 
    dt.*,

    -- Cost estimation
    CASE dt.environment
      WHEN 'production' THEN 
        CASE 
          WHEN dt.infrastructure_spec->'cluster_config'->>'instance_size' = 'M30' THEN 590
          WHEN dt.infrastructure_spec->'cluster_config'->>'instance_size' = 'M40' THEN 940
          WHEN dt.infrastructure_spec->'cluster_config'->>'instance_size' = 'M80' THEN 2350
          ELSE 300
        END
      WHEN 'staging' THEN 
        CASE 
          WHEN dt.infrastructure_spec->'cluster_config'->>'instance_size' = 'M20' THEN 350
          WHEN dt.infrastructure_spec->'cluster_config'->>'instance_size' = 'M40' THEN 940
          ELSE 200
        END
      ELSE 57  -- Development M10
    END as estimated_monthly_cost_usd,

    -- Compliance validation
    CASE 
      WHEN dt.environment = 'production' AND 
           (dt.infrastructure_spec->'security_config'->>'encryption_at_rest')::BOOLEAN = false THEN 'encryption_required'
      WHEN dt.environment = 'production' AND 
           (dt.infrastructure_spec->'backup_config'->>'continuous_backup_enabled')::BOOLEAN = false THEN 'continuous_backup_required'
      WHEN (dt.infrastructure_spec->'security_config'->>'network_access_restricted')::BOOLEAN = false THEN 'network_security_required'
      ELSE 'compliant'
    END as compliance_status,

    -- Resource sizing validation
    CASE 
      WHEN dt.environment = 'production' AND 
           dt.infrastructure_spec->'cluster_config'->>'instance_size' < 'M30' THEN 'undersized_for_production'
      WHEN dt.environment = 'development' AND 
           dt.infrastructure_spec->'cluster_config'->>'instance_size' > 'M20' THEN 'oversized_for_development'
      ELSE 'appropriately_sized'
    END as sizing_validation,

    -- Deployment readiness
    CASE 
      WHEN dt.infrastructure_spec IS NULL THEN 'missing_infrastructure_spec'
      WHEN dt.deployment_config IS NULL THEN 'missing_deployment_config'
      WHEN dt.monitoring_config IS NULL THEN 'missing_monitoring_config'
      ELSE 'ready_for_deployment'
    END as deployment_readiness

  FROM deployment_templates dt
)

SELECT 
  dv.template_name,
  dv.environment,
  dv.template_version,

  -- Infrastructure summary
  dv.infrastructure_spec->'cluster_config'->>'instance_size' as instance_size,
  dv.infrastructure_spec->'cluster_config'->>'mongodb_version' as mongodb_version,
  (dv.infrastructure_spec->'cluster_config'->>'replication_factor')::INTEGER as replication_factor,

  -- Auto-scaling configuration
  (dv.infrastructure_spec->'auto_scaling'->>'compute_enabled')::BOOLEAN as auto_scaling_enabled,
  dv.infrastructure_spec->'auto_scaling'->>'min_instance_size' as min_instance_size,
  dv.infrastructure_spec->'auto_scaling'->>'max_instance_size' as max_instance_size,

  -- Security and backup
  (dv.infrastructure_spec->'security_config'->>'encryption_at_rest')::BOOLEAN as encryption_enabled,
  (dv.infrastructure_spec->'backup_config'->>'continuous_backup_enabled')::BOOLEAN as continuous_backup,
  (dv.infrastructure_spec->'backup_config'->>'backup_retention_days')::INTEGER as backup_retention_days,

  -- Cost and validation
  dv.estimated_monthly_cost_usd,
  dv.compliance_status,
  dv.sizing_validation,
  dv.deployment_readiness,

  -- Alert configuration count
  JSON_ARRAY_LENGTH(dv.monitoring_config->'custom_alerts') as custom_alert_count,

  -- Deployment recommendations
  ARRAY[
    CASE WHEN dv.compliance_status != 'compliant' THEN 'Fix compliance issues before deployment' END,
    CASE WHEN dv.sizing_validation LIKE '%undersized%' THEN 'Increase instance size for production workload' END,
    CASE WHEN dv.sizing_validation LIKE '%oversized%' THEN 'Consider smaller instance size to reduce costs' END,
    CASE WHEN dv.estimated_monthly_cost_usd > 1000 AND dv.environment != 'production' 
         THEN 'Review cost allocation for non-production environment' END,
    CASE WHEN JSON_ARRAY_LENGTH(dv.monitoring_config->'custom_alerts') < 3 
         THEN 'Add more comprehensive monitoring alerts' END
  ]::TEXT[] as deployment_recommendations,

  -- Deployment priority
  CASE 
    WHEN dv.deployment_readiness != 'ready_for_deployment' THEN 'blocked'
    WHEN dv.compliance_status != 'compliant' THEN 'compliance_review_required'
    WHEN dv.environment = 'production' THEN 'high_priority'
    WHEN dv.environment = 'staging' THEN 'medium_priority'
    ELSE 'low_priority'
  END as deployment_priority,

  -- Terraform generation command
  CASE 
    WHEN dv.deployment_readiness = 'ready_for_deployment' THEN
      FORMAT('terraform apply -var="environment=%s" -var="instance_size=%s" -target=mongodbatlas_cluster.%s_%s',
             dv.environment,
             dv.infrastructure_spec->'cluster_config'->>'instance_size',
             dv.template_name,
             dv.environment)
    ELSE 'Fix validation issues first'
  END as terraform_command

FROM deployment_validation dv
ORDER BY 
  CASE dv.deployment_priority
    WHEN 'blocked' THEN 1
    WHEN 'compliance_review_required' THEN 2
    WHEN 'high_priority' THEN 3
    WHEN 'medium_priority' THEN 4
    ELSE 5
  END,
  dv.template_name,
  dv.environment;

-- QueryLeaf provides comprehensive MongoDB Atlas infrastructure capabilities:
-- 1. Automated cluster deployment with infrastructure-as-code integration
-- 2. Advanced performance monitoring and intelligent auto-scaling
-- 3. Cost optimization and resource right-sizing recommendations
-- 4. Security and compliance automation with policy enforcement
-- 5. DevOps pipeline integration for continuous deployment
-- 6. Multi-cloud deployment and disaster recovery capabilities
-- 7. SQL-familiar syntax for complex Atlas infrastructure operations
-- 8. Enterprise-grade monitoring, alerting, and operational excellence
-- 9. Terraform and CI/CD integration for automated infrastructure management
-- 10. Production-ready Atlas operations with comprehensive automation

Best Practices for Production Atlas Deployments

Infrastructure Architecture and Automation Strategy

Essential principles for effective MongoDB Atlas production deployment:

  1. Infrastructure-as-Code: Implement comprehensive infrastructure-as-code with version control, testing, and automated deployment pipelines
  2. Auto-Scaling Configuration: Design intelligent auto-scaling policies based on application patterns and performance requirements (see the policy sketch after this list)
  3. Security Integration: Implement advanced security controls, network isolation, and encryption at rest and in transit
  4. Monitoring Strategy: Configure comprehensive monitoring, alerting, and performance optimization for proactive management
  5. Disaster Recovery: Design multi-region backup strategies and automated disaster recovery procedures
  6. Cost Optimization: Implement continuous cost monitoring and automated resource optimization based on utilization patterns
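
A minimal sketch of the auto-scaling policy idea from point 2, using CPU and memory thresholds similar to the 75% / 80% values in the monitoring examples earlier. The threshold numbers and the scaleUp/scaleDown/hold labels are illustrative assumptions, not Atlas defaults.

// Derive a scaling decision from observed utilization metrics
function deriveScalingDecision(metrics, thresholds = { cpuHigh: 75, memHigh: 80, cpuLow: 30, memLow: 50 }) {
  const { cpuPercent, memoryPercent } = metrics;

  if (cpuPercent > thresholds.cpuHigh && memoryPercent > thresholds.memHigh) {
    return { action: 'scaleUp', reason: 'sustained CPU and memory pressure' };
  }
  if (cpuPercent < thresholds.cpuLow && memoryPercent < thresholds.memLow) {
    return { action: 'scaleDown', reason: 'cluster appears overprovisioned' };
  }
  return { action: 'hold', reason: 'utilization within target band' };
}

// deriveScalingDecision({ cpuPercent: 82, memoryPercent: 86 })
//   -> { action: 'scaleUp', reason: 'sustained CPU and memory pressure' }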

DevOps Integration and Production Operations

Optimize Atlas operations for enterprise-scale DevOps workflows:

  1. CI/CD Integration: Build comprehensive deployment pipelines with automated testing, approval workflows, and rollback capabilities (see the gating sketch after this list)
  2. Environment Management: Design consistent environment promotion strategies with appropriate resource sizing and security controls
  3. Performance Monitoring: Implement intelligent performance monitoring with predictive scaling and optimization recommendations
  4. Compliance Automation: Ensure automated compliance monitoring and policy enforcement for regulatory requirements
  5. Operational Excellence: Design automated operational procedures for maintenance, scaling, and incident response
  6. Cost Management: Monitor cloud spending patterns and implement automated cost optimization strategies
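
A small sketch of the approval gating mentioned in point 1, mirroring the approval_required, rollback_on_failure, and notification_channels flags from the deployment_config example earlier. The environment rules and defaults are illustrative assumptions.

// Decide how a pipeline should treat an Atlas infrastructure change
function evaluateDeploymentGate(environment, deploymentConfig = {}) {
  const approvalRequired = deploymentConfig.approval_required ?? environment === 'production';

  return {
    environment,
    requiresManualApproval: approvalRequired,
    autoApply: !approvalRequired, // staging and development changes can apply automatically
    rollbackOnFailure: deploymentConfig.rollback_on_failure ?? true,
    notify: deploymentConfig.notification_channels || ['email:ops-team@company.com', 'slack:#database-ops']
  };
}

// evaluateDeploymentGate('production')
//   -> { requiresManualApproval: true, autoApply: false, rollbackOnFailure: true, ... }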

Conclusion

MongoDB Atlas provides comprehensive cloud database infrastructure automation that enables sophisticated DevOps integration, intelligent scaling, and enterprise-grade operational capabilities through infrastructure-as-code, automated monitoring, and advanced security features. The Atlas platform ensures that cloud database operations benefit from MongoDB's managed service expertise while providing the flexibility and control needed for production applications.

Key MongoDB Atlas benefits include:

  • Infrastructure Automation: Complete infrastructure-as-code integration with automated provisioning, scaling, and lifecycle management
  • Intelligent Operations: AI-powered performance optimization, predictive scaling, and automated operational recommendations
  • DevOps Integration: Seamless CI/CD pipeline integration with automated testing, deployment, and rollback capabilities
  • Enterprise Security: Advanced security controls, compliance automation, and built-in best practices for production environments
  • Cost Optimization: Intelligent resource management and automated cost optimization based on actual usage patterns
  • SQL Accessibility: Familiar SQL-style Atlas operations through QueryLeaf for accessible cloud database management

Whether you're building cloud-native applications, implementing DevOps automation, managing multi-environment deployments, or optimizing database operations at scale, MongoDB Atlas with QueryLeaf's familiar SQL interface provides the foundation for sophisticated, automated cloud database infrastructure.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB Atlas operations while providing SQL-familiar syntax for infrastructure management, monitoring, and automation. Advanced Atlas features, DevOps integration, and operational automation are seamlessly handled through familiar SQL constructs, making sophisticated cloud database operations accessible to SQL-oriented infrastructure teams.

The combination of MongoDB Atlas's robust cloud capabilities with SQL-style infrastructure operations makes it an ideal platform for applications requiring both automated database operations and familiar infrastructure management patterns, ensuring your cloud database infrastructure can scale efficiently while maintaining operational excellence and cost optimization as application complexity and usage grow.

MongoDB Backup and Recovery Strategies: Advanced Disaster Recovery and Data Protection for Mission-Critical Applications

Production database environments require robust backup and recovery strategies that can protect against data loss, system failures, and disaster scenarios while enabling rapid recovery with minimal business disruption. Traditional backup approaches often struggle with large database sizes, complex recovery procedures, and inconsistent backup scheduling, leading to extended recovery times, potential data loss, and operational complexity that can compromise business continuity during critical incidents.

MongoDB provides comprehensive backup and recovery capabilities through native backup tools, automated backup scheduling, incremental backup strategies, and point-in-time recovery features that ensure robust data protection with minimal performance impact. Unlike traditional databases that require complex backup scripting and manual recovery procedures, MongoDB integrates backup and recovery operations directly into the database with optimized backup compression, automatic consistency verification, and streamlined recovery workflows.
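
As a quick illustration, a point-in-time-capable backup can be taken with the standard mongodump tool using its --archive, --gzip, and --oplog options; the connection string and output path below are placeholders, and a complete, automated implementation is developed later in this section.

// Minimal mongodump invocation from Node.js; --oplog captures operations that
// occur during the dump so the archive restores to a consistent point in time
const { spawn } = require('child_process');

function runMongodump(uri, archivePath) {
  return new Promise((resolve, reject) => {
    const dump = spawn('mongodump', ['--uri', uri, `--archive=${archivePath}`, '--gzip', '--oplog']);

    dump.on('error', reject);
    dump.on('close', code =>
      code === 0 ? resolve(archivePath) : reject(new Error(`mongodump exited with code ${code}`))
    );
  });
}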

The Traditional Backup and Recovery Challenge

Conventional database backup approaches face significant limitations in enterprise environments:

-- Traditional PostgreSQL backup management - manual processes with limited automation capabilities

-- Basic backup tracking table with minimal functionality
CREATE TABLE backup_jobs (
    backup_id SERIAL PRIMARY KEY,
    backup_name VARCHAR(255) NOT NULL,
    backup_type VARCHAR(100) NOT NULL, -- full, incremental, differential
    database_name VARCHAR(100) NOT NULL,

    -- Backup execution tracking
    backup_start_time TIMESTAMP NOT NULL,
    backup_end_time TIMESTAMP,
    backup_status VARCHAR(50) DEFAULT 'running',

    -- Basic size and performance metrics (limited visibility)
    backup_size_bytes BIGINT,
    backup_duration_seconds INTEGER,
    backup_compression_ratio DECIMAL(5,2),

    -- File location tracking (manual)
    backup_file_path TEXT,
    backup_storage_location VARCHAR(200),
    backup_retention_days INTEGER DEFAULT 30,

    -- Basic validation (very limited)
    backup_checksum VARCHAR(64),
    backup_verification_status VARCHAR(50),
    backup_verification_time TIMESTAMP,

    -- Error tracking
    backup_error_message TEXT,
    backup_warning_count INTEGER DEFAULT 0,

    -- Metadata
    created_by VARCHAR(100) DEFAULT current_user,
    backup_method VARCHAR(100) DEFAULT 'pg_dump'
);

-- Simple backup scheduling table (no real automation)
CREATE TABLE backup_schedules (
    schedule_id SERIAL PRIMARY KEY,
    schedule_name VARCHAR(255) NOT NULL,
    database_name VARCHAR(100) NOT NULL,
    backup_type VARCHAR(100) NOT NULL,

    -- Basic scheduling (cron-like but manual)
    schedule_frequency VARCHAR(50), -- daily, weekly, monthly
    schedule_time TIME,
    schedule_days VARCHAR(20), -- comma-separated day numbers

    -- Basic configuration
    retention_days INTEGER DEFAULT 30,
    backup_location VARCHAR(200),
    compression_enabled BOOLEAN DEFAULT true,

    -- Status tracking
    schedule_enabled BOOLEAN DEFAULT true,
    last_backup_time TIMESTAMP,
    last_backup_status VARCHAR(50),
    next_backup_time TIMESTAMP,

    -- Error tracking
    consecutive_failures INTEGER DEFAULT 0,
    last_error_message TEXT,

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Manual backup execution function (very basic functionality)
CREATE OR REPLACE FUNCTION execute_backup(
    database_name_param VARCHAR(100),
    backup_type_param VARCHAR(100) DEFAULT 'full'
) RETURNS TABLE (
    backup_id INTEGER,
    backup_status VARCHAR(50),
    backup_duration_seconds INTEGER,
    backup_size_mb INTEGER,
    backup_file_path TEXT,
    error_message TEXT
) AS $$
DECLARE
    new_backup_id INTEGER;
    backup_start TIMESTAMP;
    backup_end TIMESTAMP;
    backup_command TEXT;
    backup_filename TEXT;
    backup_directory TEXT := '/backup/postgresql/';
    command_result INTEGER;
    backup_size BIGINT;
    final_status VARCHAR(50) := 'completed';
    error_msg TEXT := '';
BEGIN
    backup_start := clock_timestamp();

    -- Generate backup filename
    backup_filename := database_name_param || '_' || 
                      backup_type_param || '_' || 
                      TO_CHAR(backup_start, 'YYYY-MM-DD_HH24-MI-SS') || '.sql';

    -- Create backup job record
    INSERT INTO backup_jobs (
        backup_name, backup_type, database_name, 
        backup_start_time, backup_file_path, backup_method
    )
    VALUES (
        backup_filename, backup_type_param, database_name_param,
        backup_start, backup_directory || backup_filename, 'pg_dump'
    )
    RETURNING backup_jobs.backup_id INTO new_backup_id;

    BEGIN
        -- Execute backup command (this is a simulation - real implementation would call external command)
        -- In reality: pg_dump -h localhost -U postgres -d database_name -f backup_file

        -- Simulate backup process with basic validation
        IF database_name_param NOT IN (SELECT datname FROM pg_database) THEN
            RAISE EXCEPTION 'Database % does not exist', database_name_param;
        END IF;

        -- Simulate backup time based on type
        CASE backup_type_param
            WHEN 'full' THEN PERFORM pg_sleep(2.0);  -- Simulate 2 seconds for full backup
            WHEN 'incremental' THEN PERFORM pg_sleep(0.5);  -- Simulate 0.5 seconds for incremental
            ELSE PERFORM pg_sleep(1.0);
        END CASE;

        -- Simulate backup size calculation (very basic)
        SELECT pg_database_size(database_name_param) INTO backup_size;

        -- Basic compression simulation
        backup_size := backup_size * 0.3;  -- Assume 70% compression

    EXCEPTION WHEN OTHERS THEN
        final_status := 'failed';
        error_msg := SQLERRM;
        backup_size := 0;
    END;

    backup_end := clock_timestamp();

    -- Update backup job record
    UPDATE backup_jobs 
    SET 
        backup_end_time = backup_end,
        backup_status = final_status,
        backup_size_bytes = backup_size,
        backup_duration_seconds = EXTRACT(EPOCH FROM backup_end - backup_start)::INTEGER,
        backup_compression_ratio = CASE WHEN backup_size > 0 THEN 70.0 ELSE 0 END,
        backup_error_message = CASE WHEN final_status = 'failed' THEN error_msg ELSE NULL END
    WHERE backup_jobs.backup_id = new_backup_id;

    -- Return results
    RETURN QUERY SELECT 
        new_backup_id,
        final_status,
        EXTRACT(EPOCH FROM backup_end - backup_start)::INTEGER,
        (backup_size / 1024 / 1024)::INTEGER,
        backup_directory || backup_filename,
        CASE WHEN final_status = 'failed' THEN error_msg ELSE NULL END;

END;
$$ LANGUAGE plpgsql;

-- Execute a backup (basic functionality)
SELECT * FROM execute_backup('production_db', 'full');

-- Basic backup verification function (very limited)
CREATE OR REPLACE FUNCTION verify_backup(backup_id_param INTEGER)
RETURNS TABLE (
    backup_id INTEGER,
    verification_status VARCHAR(50),
    verification_duration_seconds INTEGER,
    file_exists BOOLEAN,
    file_size_mb INTEGER,
    checksum_valid BOOLEAN,
    error_message TEXT
) AS $$
DECLARE
    backup_record RECORD;
    verification_start TIMESTAMP;
    verification_end TIMESTAMP;
    file_size BIGINT;
    verification_error TEXT := '';
    verification_result VARCHAR(50) := 'valid';
BEGIN
    verification_start := clock_timestamp();

    -- Get backup record
    SELECT * INTO backup_record
    FROM backup_jobs
    WHERE backup_jobs.backup_id = backup_id_param;

    IF NOT FOUND THEN
        RETURN QUERY SELECT 
            backup_id_param,
            'not_found'::VARCHAR(50),
            0,
            false,
            0,
            false,
            'Backup record not found'::TEXT;
        RETURN;
    END IF;

    BEGIN
        -- Simulate file verification (in reality would check actual file)
        -- Check if backup was successful
        IF backup_record.backup_status != 'completed' THEN
            verification_result := 'invalid';
            verification_error := 'Original backup failed';
        END IF;

        -- Simulate file size check
        file_size := backup_record.backup_size_bytes;

        -- Basic integrity simulation
        IF file_size = 0 OR backup_record.backup_duration_seconds = 0 THEN
            verification_result := 'invalid';
            verification_error := 'Backup file appears to be empty or corrupted';
        END IF;

        -- Simulate verification time
        PERFORM pg_sleep(0.1);

    EXCEPTION WHEN OTHERS THEN
        verification_result := 'error';
        verification_error := SQLERRM;
    END;

    verification_end := clock_timestamp();

    -- Update backup record with verification results
    UPDATE backup_jobs
    SET 
        backup_verification_status = verification_result,
        backup_verification_time = verification_end
    WHERE backup_jobs.backup_id = backup_id_param;

    -- Return verification results
    RETURN QUERY SELECT 
        backup_id_param,
        verification_result,
        EXTRACT(EPOCH FROM verification_end - verification_start)::INTEGER,
        CASE WHEN file_size > 0 THEN true ELSE false END,
        (file_size / 1024 / 1024)::INTEGER,
        CASE WHEN verification_result = 'valid' THEN true ELSE false END,
        CASE WHEN verification_result != 'valid' THEN verification_error ELSE NULL END;

END;
$$ LANGUAGE plpgsql;

-- Recovery function (very basic and manual)
CREATE OR REPLACE FUNCTION restore_backup(
    backup_id_param INTEGER,
    target_database_name VARCHAR(100)
) RETURNS TABLE (
    restore_success BOOLEAN,
    restore_duration_seconds INTEGER,
    restored_size_mb INTEGER,
    error_message TEXT
) AS $$
DECLARE
    backup_record RECORD;
    restore_start TIMESTAMP;
    restore_end TIMESTAMP;
    restore_error TEXT := '';
    restore_result BOOLEAN := true;
BEGIN
    restore_start := clock_timestamp();

    -- Get backup information
    SELECT * INTO backup_record
    FROM backup_jobs
    WHERE backup_id = backup_id_param
    AND backup_status = 'completed';

    IF NOT FOUND THEN
        RETURN QUERY SELECT 
            false,
            0,
            0,
            'Valid backup not found for restore operation'::TEXT;
        RETURN;
    END IF;

    BEGIN
        -- Simulate restore process (in reality would execute psql command)
        -- psql -h localhost -U postgres -d target_database -f backup_file

        -- Basic validation
        IF target_database_name IS NULL OR LENGTH(target_database_name) = 0 THEN
            RAISE EXCEPTION 'Target database name is required';
        END IF;

        -- Simulate restore time proportional to backup size
        PERFORM pg_sleep(LEAST(backup_record.backup_duration_seconds * 1.5, 10.0));

    EXCEPTION WHEN OTHERS THEN
        restore_result := false;
        restore_error := SQLERRM;
    END;

    restore_end := clock_timestamp();

    -- Return restore results
    RETURN QUERY SELECT 
        restore_result,
        EXTRACT(EPOCH FROM restore_end - restore_start)::INTEGER,
        (backup_record.backup_size_bytes / 1024 / 1024)::INTEGER,
        CASE WHEN NOT restore_result THEN restore_error ELSE NULL END;

END;
$$ LANGUAGE plpgsql;

-- Basic backup monitoring and cleanup
WITH backup_status_summary AS (
    SELECT 
        DATE_TRUNC('day', backup_start_time) as backup_date,
        database_name,
        backup_type,
        COUNT(*) as total_backups,
        COUNT(*) FILTER (WHERE backup_status = 'completed') as successful_backups,
        COUNT(*) FILTER (WHERE backup_status = 'failed') as failed_backups,
        SUM(backup_size_bytes) as total_backup_size_bytes,
        AVG(backup_duration_seconds) as avg_backup_duration,
        MIN(backup_start_time) as first_backup,
        MAX(backup_start_time) as last_backup

    FROM backup_jobs
    WHERE backup_start_time >= CURRENT_DATE - INTERVAL '7 days'
    GROUP BY DATE_TRUNC('day', backup_start_time), database_name, backup_type
)
SELECT 
    backup_date,
    database_name,
    backup_type,
    total_backups,
    successful_backups,
    failed_backups,

    -- Success rate
    CASE 
        WHEN total_backups > 0 THEN
            ROUND((successful_backups::DECIMAL / total_backups) * 100, 1)
        ELSE 0
    END as success_rate_percent,

    -- Size and performance metrics
    ROUND((total_backup_size_bytes / 1024.0 / 1024.0), 1) as total_size_mb,
    ROUND(avg_backup_duration::NUMERIC, 1) as avg_duration_seconds,

    -- Backup frequency analysis
    (EXTRACT(EPOCH FROM (last_backup - first_backup)) / 3600)::INTEGER as backup_window_hours,

    -- Health assessment
    CASE 
        WHEN failed_backups > 0 THEN 'issues'
        WHEN successful_backups = 0 THEN 'no_backups'
        ELSE 'healthy'
    END as backup_health,

    -- Recommendations
    CASE 
        WHEN failed_backups > total_backups * 0.2 THEN 'investigate_failures'
        WHEN avg_backup_duration > 3600 THEN 'optimize_performance'
        WHEN total_backup_size_bytes > 100 * 1024 * 1024 * 1024 THEN 'consider_compression'
        ELSE 'monitor'
    END as recommendation

FROM backup_status_summary
ORDER BY backup_date DESC, database_name, backup_type;

-- Cleanup old backups (manual process)
WITH old_backups AS (
    SELECT backup_id, backup_file_path, backup_size_bytes
    FROM backup_jobs
    WHERE backup_start_time < CURRENT_DATE - INTERVAL '90 days'
    AND backup_status = 'completed'
),
cleanup_summary AS (
    DELETE FROM backup_jobs
    WHERE backup_id IN (SELECT backup_id FROM old_backups)
    RETURNING backup_id, backup_size_bytes
)
SELECT 
    COUNT(*) as backups_cleaned,
    SUM(backup_size_bytes) as total_space_freed_bytes,
    ROUND(SUM(backup_size_bytes) / 1024.0 / 1024.0 / 1024.0, 2) as space_freed_gb
FROM cleanup_summary;

-- Problems with traditional backup approaches:
-- 1. Manual backup execution with no automation or scheduling
-- 2. Limited backup verification and integrity checking
-- 3. No point-in-time recovery capabilities
-- 4. Basic error handling with no automatic retry mechanisms
-- 5. No incremental backup support or optimization
-- 6. Manual cleanup and retention management
-- 7. Limited monitoring and alerting capabilities
-- 8. No support for distributed backup strategies
-- 9. Complex recovery procedures requiring manual intervention
-- 10. No integration with cloud storage or disaster recovery systems

MongoDB provides comprehensive backup and recovery capabilities with automated scheduling and management:

// MongoDB Advanced Backup and Recovery - comprehensive data protection with automated disaster recovery
const { MongoClient, GridFSBucket } = require('mongodb');
const { spawn } = require('child_process');
const fs = require('fs').promises;
const path = require('path');
const { createHash } = require('crypto');
const { EventEmitter } = require('events');

// Comprehensive MongoDB Backup and Recovery Manager
class AdvancedBackupRecoveryManager extends EventEmitter {
  constructor(connectionString, backupConfig = {}) {
    super();
    this.connectionString = connectionString;
    this.client = null;
    this.db = null;

    // Advanced backup and recovery configuration
    this.config = {
      // Backup strategy configuration
      enableAutomatedBackups: backupConfig.enableAutomatedBackups !== false,
      enableIncrementalBackups: backupConfig.enableIncrementalBackups || false,
      enablePointInTimeRecovery: backupConfig.enablePointInTimeRecovery || false,
      enableCompression: backupConfig.enableCompression !== false,

      // Backup scheduling
      fullBackupSchedule: backupConfig.fullBackupSchedule || '0 2 * * *', // Daily at 2 AM
      incrementalBackupSchedule: backupConfig.incrementalBackupSchedule || '0 */6 * * *', // Every 6 hours

      // Storage configuration
      backupStoragePath: backupConfig.backupStoragePath || './backups',
      maxBackupSize: backupConfig.maxBackupSize || 10 * 1024 * 1024 * 1024, // 10GB
      compressionLevel: backupConfig.compressionLevel || 6,

      // Retention policies
      dailyBackupRetention: backupConfig.dailyBackupRetention || 30, // 30 days
      weeklyBackupRetention: backupConfig.weeklyBackupRetention || 12, // 12 weeks
      monthlyBackupRetention: backupConfig.monthlyBackupRetention || 12, // 12 months

      // Backup validation
      enableBackupVerification: backupConfig.enableBackupVerification !== false,
      verificationSampleSize: backupConfig.verificationSampleSize || 1000,
      enableChecksumValidation: backupConfig.enableChecksumValidation !== false,

      // Recovery configuration
      enableParallelRecovery: backupConfig.enableParallelRecovery || false,
      maxRecoveryThreads: backupConfig.maxRecoveryThreads || 4,
      recoveryBatchSize: backupConfig.recoveryBatchSize || 1000,

      // Monitoring and alerting
      enableBackupMonitoring: backupConfig.enableBackupMonitoring !== false,
      enableRecoveryTesting: backupConfig.enableRecoveryTesting || false,
      alertThresholds: {
        backupFailureCount: backupConfig.backupFailureThreshold || 3,
        backupDurationMinutes: backupConfig.backupDurationThreshold || 120,
        backupSizeVariation: backupConfig.backupSizeVariationThreshold || 50
      },

      // Disaster recovery
      enableReplication: backupConfig.enableReplication || false,
      replicationTargets: backupConfig.replicationTargets || [],
      enableCloudSync: backupConfig.enableCloudSync || false,
      cloudSyncConfig: backupConfig.cloudSyncConfig || {}
    };

    // Backup and recovery state management
    this.backupJobs = new Map();
    this.scheduledBackups = new Map();
    this.recoveryOperations = new Map();
    this.backupMetrics = {
      totalBackups: 0,
      successfulBackups: 0,
      failedBackups: 0,
      totalDataBackedUp: 0,
      averageBackupDuration: 0
    };

    // Backup history and metadata
    this.backupHistory = [];
    this.recoveryHistory = [];

    this.initializeBackupSystem();
  }

  async initializeBackupSystem() {
    console.log('Initializing advanced backup and recovery system...');

    try {
      // Connect to MongoDB
      this.client = new MongoClient(this.connectionString);
      await this.client.connect();
      this.db = this.client.db();

      // Setup backup infrastructure
      await this.setupBackupInfrastructure();

      // Initialize automated backup scheduling
      if (this.config.enableAutomatedBackups) {
        await this.setupAutomatedBackups();
      }

      // Setup backup monitoring
      if (this.config.enableBackupMonitoring) {
        await this.setupBackupMonitoring();
      }

      // Initialize point-in-time recovery if enabled
      if (this.config.enablePointInTimeRecovery) {
        await this.setupPointInTimeRecovery();
      }

      console.log('Advanced backup and recovery system initialized successfully');

    } catch (error) {
      console.error('Error initializing backup system:', error);
      throw error;
    }
  }

  async setupBackupInfrastructure() {
    console.log('Setting up backup infrastructure...');

    try {
      // Create backup storage directory
      await fs.mkdir(this.config.backupStoragePath, { recursive: true });

      // Create subdirectories for different backup types
      const backupDirs = ['full', 'incremental', 'logs', 'metadata', 'recovery-points'];
      for (const dir of backupDirs) {
        await fs.mkdir(path.join(this.config.backupStoragePath, dir), { recursive: true });
      }

      // Setup backup metadata collections
      const collections = {
        backupJobs: this.db.collection('backup_jobs'),
        backupMetadata: this.db.collection('backup_metadata'),
        recoveryOperations: this.db.collection('recovery_operations'),
        backupSchedules: this.db.collection('backup_schedules')
      };

      // Create indexes for backup operations
      await collections.backupJobs.createIndex(
        { startTime: -1, status: 1 },
        { background: true }
      );

      await collections.backupMetadata.createIndex(
        { backupId: 1, backupType: 1, timestamp: -1 },
        { background: true }
      );

      await collections.recoveryOperations.createIndex(
        { recoveryId: 1, startTime: -1 },
        { background: true }
      );

      this.collections = collections;

    } catch (error) {
      console.error('Error setting up backup infrastructure:', error);
      throw error;
    }
  }

  async createFullBackup(backupOptions = {}) {
    console.log('Starting full database backup...');

    const backupId = this.generateBackupId('full');
    const startTime = new Date();

    try {
      // Create backup job record
      const backupJob = {
        backupId: backupId,
        backupType: 'full',
        startTime: startTime,
        status: 'running',

        // Backup configuration
        options: {
          compression: this.config.enableCompression,
          compressionLevel: this.config.compressionLevel,
          includeIndexes: backupOptions.includeIndexes !== false,
          includeSystemCollections: backupOptions.includeSystemCollections || false,
          oplogCapture: this.config.enablePointInTimeRecovery
        },

        // Progress tracking
        progress: {
          collectionsProcessed: 0,
          totalCollections: 0,
          documentsProcessed: 0,
          totalDocuments: 0,
          bytesProcessed: 0,
          estimatedTotalBytes: 0
        },

        // Performance metrics
        performance: {
          throughputMBps: 0,
          compressionRatio: 0,
          parallelStreams: 1
        }
      };

      await this.collections.backupJobs.insertOne(backupJob);
      this.backupJobs.set(backupId, backupJob);

      // Get database statistics for progress tracking
      const dbStats = await this.db.stats();
      backupJob.progress.estimatedTotalBytes = dbStats.dataSize;

      // Get collection list and metadata
      const collections = await this.db.listCollections().toArray();
      backupJob.progress.totalCollections = collections.length;

      // Calculate total document count across collections
      let totalDocuments = 0;
      for (const collectionInfo of collections) {
        if (collectionInfo.type === 'collection') {
          const collection = this.db.collection(collectionInfo.name);
          const count = await collection.estimatedDocumentCount();
          totalDocuments += count;
        }
      }
      backupJob.progress.totalDocuments = totalDocuments;

      // Create backup using mongodump
      const backupResult = await this.executeMongoDump(backupId, backupJob);

      // Verify backup integrity
      if (this.config.enableBackupVerification) {
        await this.verifyBackupIntegrity(backupId, backupResult);
      }

      // Calculate backup metrics
      const endTime = new Date();
      const duration = endTime.getTime() - startTime.getTime();
      const backupSizeBytes = backupResult.backupSize;
      const compressionRatio = backupResult.originalSize > 0 ? 
        (backupResult.originalSize - backupSizeBytes) / backupResult.originalSize : 0;

      // Update backup job with results
      const completedJob = {
        ...backupJob,
        endTime: endTime,
        status: 'completed',
        duration: duration,
        backupSize: backupSizeBytes,
        originalSize: backupResult.originalSize,
        compressionRatio: compressionRatio,
        backupPath: backupResult.backupPath,
        checksum: backupResult.checksum,

        // Final performance metrics
        performance: {
          throughputMBps: (backupSizeBytes / 1024 / 1024) / (duration / 1000),
          compressionRatio: compressionRatio,
          parallelStreams: backupResult.parallelStreams || 1
        }
      };

      await this.collections.backupJobs.replaceOne(
        { backupId: backupId },
        completedJob
      );

      // Update backup metrics
      this.updateBackupMetrics(completedJob);

      // Store backup metadata for recovery operations
      await this.storeBackupMetadata(completedJob);

      this.emit('backupCompleted', {
        backupId: backupId,
        backupType: 'full',
        duration: duration,
        backupSize: backupSizeBytes,
        compressionRatio: compressionRatio
      });

      console.log(`Full backup completed: ${backupId} (${Math.round(backupSizeBytes / 1024 / 1024)} MB, ${Math.round(duration / 1000)}s)`);

      return {
        success: true,
        backupId: backupId,
        backupSize: backupSizeBytes,
        duration: duration,
        compressionRatio: compressionRatio,
        backupPath: backupResult.backupPath
      };

    } catch (error) {
      console.error(`Full backup failed for ${backupId}:`, error);

      // Update backup job with error
      await this.collections.backupJobs.updateOne(
        { backupId: backupId },
        {
          $set: {
            status: 'failed',
            endTime: new Date(),
            error: {
              message: error.message,
              stack: error.stack,
              timestamp: new Date()
            }
          }
        }
      );

      this.backupMetrics.failedBackups++;

      this.emit('backupFailed', {
        backupId: backupId,
        backupType: 'full',
        error: error.message
      });

      return {
        success: false,
        backupId: backupId,
        error: error.message
      };
    }
  }

  async executeMongoDump(backupId, backupJob) {
    console.log(`Executing mongodump for backup: ${backupId}`);

    return new Promise((resolve, reject) => {
      const backupPath = path.join(
        this.config.backupStoragePath,
        'full',
        `${backupId}.archive`
      );

      // Build mongodump command arguments
      const mongodumpArgs = [
        '--uri', this.connectionString,
        '--archive=' + backupPath,
        '--gzip'
      ];

      // Add additional options based on configuration
      if (backupJob.options.oplogCapture) {
        mongodumpArgs.push('--oplog');
      }

      if (!backupJob.options.includeSystemCollections) {
        // mongodump does not support wildcards; exclude system collections by prefix
        mongodumpArgs.push('--excludeCollectionsWithPrefix=system.');
      }

      // Execute mongodump
      const mongodumpProcess = spawn('mongodump', mongodumpArgs);

      let stdoutData = '';
      let stderrData = '';

      mongodumpProcess.stdout.on('data', (data) => {
        stdoutData += data.toString();
        this.parseBackupProgress(backupId, data.toString());
      });

      mongodumpProcess.stderr.on('data', (data) => {
        stderrData += data.toString();
        console.warn('mongodump stderr:', data.toString());
      });

      mongodumpProcess.on('close', async (code) => {
        try {
          if (code === 0) {
            // Get backup file statistics
            const stats = await fs.stat(backupPath);
            const backupSize = stats.size;

            // Calculate checksum for integrity verification
            const checksum = await this.calculateFileChecksum(backupPath);

            resolve({
              backupPath: backupPath,
              backupSize: backupSize,
              originalSize: backupJob.progress.estimatedTotalBytes,
              checksum: checksum,
              stdout: stdoutData,
              parallelStreams: 1
            });
          } else {
            reject(new Error(`mongodump failed with exit code ${code}: ${stderrData}`));
          }
        } catch (error) {
          reject(error);
        }
      });

      mongodumpProcess.on('error', (error) => {
        reject(new Error(`Failed to start mongodump: ${error.message}`));
      });
    });
  }

  parseBackupProgress(backupId, output) {
    // Parse mongodump output to extract progress information
    const backupJob = this.backupJobs.get(backupId);
    if (!backupJob) return;

    // Look for progress indicators in mongodump output
    const progressMatches = output.match(/(\d+)\s+documents?\s+to\s+(\w+)\.(\w+)/g);
    if (progressMatches) {
      for (const match of progressMatches) {
        const [, docCount, dbName, collectionName] = match.match(/(\d+)\s+documents?\s+to\s+(\w+)\.(\w+)/);

        backupJob.progress.documentsProcessed += parseInt(docCount);
        backupJob.progress.collectionsProcessed++;

        // Emit progress update
        this.emit('backupProgress', {
          backupId: backupId,
          progress: {
            collectionsProcessed: backupJob.progress.collectionsProcessed,
            totalCollections: backupJob.progress.totalCollections,
            documentsProcessed: backupJob.progress.documentsProcessed,
            totalDocuments: backupJob.progress.totalDocuments,
            percentComplete: (backupJob.progress.documentsProcessed / backupJob.progress.totalDocuments) * 100
          }
        });
      }
    }
  }

  async calculateFileChecksum(filePath) {
    console.log(`Calculating checksum for: ${filePath}`);

    try {
      const fileBuffer = await fs.readFile(filePath);
      const hash = createHash('sha256');
      hash.update(fileBuffer);
      return hash.digest('hex');

    } catch (error) {
      console.error('Error calculating file checksum:', error);
      throw error;
    }
  }

  async verifyBackupIntegrity(backupId, backupResult) {
    console.log(`Verifying backup integrity: ${backupId}`);

    try {
      const verification = {
        backupId: backupId,
        verificationTime: new Date(),
        checksumVerified: false,
        sampleVerified: false,
        errors: []
      };

      // Verify file checksum
      const currentChecksum = await this.calculateFileChecksum(backupResult.backupPath);
      verification.checksumVerified = currentChecksum === backupResult.checksum;

      if (!verification.checksumVerified) {
        verification.errors.push('Checksum verification failed - file may be corrupted');
      }

      // Perform sample restore verification
      if (this.config.verificationSampleSize > 0) {
        const sampleResult = await this.performSampleRestoreTest(backupId, backupResult);
        verification.sampleVerified = sampleResult.success;

        if (!sampleResult.success) {
          verification.errors.push(`Sample restore failed: ${sampleResult.error}`);
        }
      }

      // Store verification results
      await this.collections.backupMetadata.updateOne(
        { backupId: backupId },
        {
          $set: {
            verification: verification,
            lastVerificationTime: verification.verificationTime
          }
        },
        { upsert: true }
      );

      this.emit('backupVerified', {
        backupId: backupId,
        verification: verification
      });

      return verification;

    } catch (error) {
      console.error(`Backup verification failed for ${backupId}:`, error);
      throw error;
    }
  }

  async performSampleRestoreTest(backupId, backupResult) {
    console.log(`Performing sample restore test for backup: ${backupId}`);

    try {
      // Create temporary database for restore test
      const testDbName = `backup_test_${backupId}_${Date.now()}`;

      // Execute mongorestore on sample data
      const restoreResult = await this.executeSampleRestore(
        backupResult.backupPath,
        testDbName
      );

      // Verify restored data integrity
      const verificationResult = await this.verifySampleData(testDbName);

      // Cleanup test database
      await this.cleanupTestDatabase(testDbName);

      return {
        success: restoreResult.success && verificationResult.success,
        error: restoreResult.error || verificationResult.error,
        restoredDocuments: restoreResult.documentCount,
        verificationDetails: verificationResult
      };

    } catch (error) {
      console.error(`Sample restore test failed for ${backupId}:`, error);
      return {
        success: false,
        error: error.message
      };
    }
  }

  async createIncrementalBackup(baseBackupId, backupOptions = {}) {
    console.log(`Starting incremental backup based on: ${baseBackupId}`);

    const backupId = this.generateBackupId('incremental');
    const startTime = new Date();

    try {
      // Get base backup metadata
      const baseBackup = await this.collections.backupJobs.findOne({ backupId: baseBackupId });
      if (!baseBackup) {
        throw new Error(`Base backup not found: ${baseBackupId}`);
      }

      // Create incremental backup job record
      const backupJob = {
        backupId: backupId,
        backupType: 'incremental',
        baseBackupId: baseBackupId,
        startTime: startTime,
        status: 'running',

        // Incremental backup specific configuration
        options: {
          ...backupOptions,
          fromTimestamp: baseBackup.endTime,
          toTimestamp: startTime,
          oplogOnly: true,
          compression: this.config.enableCompression
        },

        progress: {
          oplogEntriesProcessed: 0,
          totalOplogEntries: 0,
          bytesProcessed: 0
        }
      };

      await this.collections.backupJobs.insertOne(backupJob);
      this.backupJobs.set(backupId, backupJob);

      // Execute incremental backup using oplog
      const backupResult = await this.executeOplogBackup(backupId, backupJob);

      // Update backup job with results
      const endTime = new Date();
      const duration = endTime.getTime() - startTime.getTime();

      const completedJob = {
        ...backupJob,
        endTime: endTime,
        status: 'completed',
        duration: duration,
        backupSize: backupResult.backupSize,
        oplogEntries: backupResult.oplogEntries,
        backupPath: backupResult.backupPath,
        checksum: backupResult.checksum
      };

      await this.collections.backupJobs.replaceOne(
        { backupId: backupId },
        completedJob
      );

      this.updateBackupMetrics(completedJob);
      await this.storeBackupMetadata(completedJob);

      this.emit('backupCompleted', {
        backupId: backupId,
        backupType: 'incremental',
        baseBackupId: baseBackupId,
        duration: duration,
        backupSize: backupResult.backupSize,
        oplogEntries: backupResult.oplogEntries
      });

      console.log(`Incremental backup completed: ${backupId}`);

      return {
        success: true,
        backupId: backupId,
        baseBackupId: baseBackupId,
        backupSize: backupResult.backupSize,
        duration: duration,
        oplogEntries: backupResult.oplogEntries
      };

    } catch (error) {
      console.error(`Incremental backup failed for ${backupId}:`, error);

      await this.collections.backupJobs.updateOne(
        { backupId: backupId },
        {
          $set: {
            status: 'failed',
            endTime: new Date(),
            error: {
              message: error.message,
              stack: error.stack,
              timestamp: new Date()
            }
          }
        }
      );

      return {
        success: false,
        backupId: backupId,
        error: error.message
      };
    }
  }

  async restoreFromBackup(backupId, restoreOptions = {}) {
    console.log(`Starting database restore from backup: ${backupId}`);

    const recoveryId = this.generateRecoveryId();
    const startTime = new Date();

    try {
      // Get backup metadata
      const backupJob = await this.collections.backupJobs.findOne({ backupId: backupId });
      if (!backupJob || backupJob.status !== 'completed') {
        throw new Error(`Valid backup not found: ${backupId}`);
      }

      // Create recovery operation record
      const recoveryOperation = {
        recoveryId: recoveryId,
        backupId: backupId,
        backupType: backupJob.backupType,
        startTime: startTime,
        status: 'running',

        // Recovery configuration
        options: {
          targetDatabase: restoreOptions.targetDatabase || this.db.databaseName,
          dropBeforeRestore: restoreOptions.dropBeforeRestore || false,
          restoreIndexes: restoreOptions.restoreIndexes !== false,
          parallelRecovery: this.config.enableParallelRecovery,
          batchSize: this.config.recoveryBatchSize
        },

        progress: {
          collectionsRestored: 0,
          totalCollections: 0,
          documentsRestored: 0,
          totalDocuments: 0,
          bytesRestored: 0
        }
      };

      await this.collections.recoveryOperations.insertOne(recoveryOperation);
      this.recoveryOperations.set(recoveryId, recoveryOperation);

      // Execute restore process
      const restoreResult = await this.executeRestore(recoveryId, backupJob, recoveryOperation);

      // Verify restore integrity
      if (this.config.enableBackupVerification) {
        await this.verifyRestoreIntegrity(recoveryId, restoreResult);
      }

      // Update recovery operation with results
      const endTime = new Date();
      const duration = endTime.getTime() - startTime.getTime();

      const completedRecovery = {
        ...recoveryOperation,
        endTime: endTime,
        status: 'completed',
        duration: duration,
        restoredSize: restoreResult.restoredSize,
        documentsRestored: restoreResult.documentsRestored,
        collectionsRestored: restoreResult.collectionsRestored
      };

      await this.collections.recoveryOperations.replaceOne(
        { recoveryId: recoveryId },
        completedRecovery
      );

      this.recoveryHistory.push(completedRecovery);

      this.emit('restoreCompleted', {
        recoveryId: recoveryId,
        backupId: backupId,
        duration: duration,
        restoredSize: restoreResult.restoredSize,
        documentsRestored: restoreResult.documentsRestored
      });

      console.log(`Database restore completed: ${recoveryId}`);

      return {
        success: true,
        recoveryId: recoveryId,
        backupId: backupId,
        duration: duration,
        restoredSize: restoreResult.restoredSize,
        documentsRestored: restoreResult.documentsRestored,
        collectionsRestored: restoreResult.collectionsRestored
      };

    } catch (error) {
      console.error(`Database restore failed for ${recoveryId}:`, error);

      await this.collections.recoveryOperations.updateOne(
        { recoveryId: recoveryId },
        {
          $set: {
            status: 'failed',
            endTime: new Date(),
            error: {
              message: error.message,
              stack: error.stack,
              timestamp: new Date()
            }
          }
        }
      );

      return {
        success: false,
        recoveryId: recoveryId,
        backupId: backupId,
        error: error.message
      };
    }
  }

  async getBackupStatus(backupId = null) {
    console.log(`Getting backup status${backupId ? ' for: ' + backupId : ' (all backups)'}`);

    try {
      let query = {};
      if (backupId) {
        query.backupId = backupId;
      }

      const backups = await this.collections.backupJobs
        .find(query)
        .sort({ startTime: -1 })
        .limit(backupId ? 1 : 50)
        .toArray();

      const backupStatuses = backups.map(backup => ({
        backupId: backup.backupId,
        backupType: backup.backupType,
        status: backup.status,
        startTime: backup.startTime,
        endTime: backup.endTime,
        duration: backup.duration,
        backupSize: backup.backupSize,
        compressionRatio: backup.compressionRatio,
        documentsProcessed: backup.progress?.documentsProcessed || 0,
        collectionsProcessed: backup.progress?.collectionsProcessed || 0,
        error: backup.error?.message || null,

        // Additional metadata
        baseBackupId: backup.baseBackupId || null,
        checksum: backup.checksum || null,
        backupPath: backup.backupPath || null,

        // Performance metrics
        throughputMBps: backup.performance?.throughputMBps || 0,

        // Health indicators
        healthStatus: this.assessBackupHealth(backup),
        lastVerificationTime: backup.verification?.verificationTime || null,
        verificationStatus: backup.verification
          ? (backup.verification.checksumVerified ? 'verified' : 'failed')
          : 'pending'
      }));

      return {
        success: true,
        backups: backupStatuses,
        totalBackups: backups.length,

        // System-wide metrics
        systemMetrics: {
          totalBackups: this.backupMetrics.totalBackups,
          successfulBackups: this.backupMetrics.successfulBackups,
          failedBackups: this.backupMetrics.failedBackups,
          averageBackupDuration: this.backupMetrics.averageBackupDuration,
          totalDataBackedUp: this.backupMetrics.totalDataBackedUp
        }
      };

    } catch (error) {
      console.error('Error getting backup status:', error);
      return {
        success: false,
        error: error.message
      };
    }
  }

  assessBackupHealth(backup) {
    if (backup.status === 'failed') return 'unhealthy';
    if (backup.status === 'running') return 'in_progress';
    if (backup.status !== 'completed') return 'unknown';

    // Check verification status
    if (backup.verification && !backup.verification.checksumVerified) {
      return 'verification_failed';
    }

    // Check backup age
    const ageHours = (Date.now() - backup.startTime.getTime()) / (1000 * 60 * 60);
    if (ageHours > 24 * 7) return 'stale'; // Older than 1 week

    return 'healthy';
  }

  updateBackupMetrics(backupJob) {
    this.backupMetrics.totalBackups++;

    if (backupJob.status === 'completed') {
      this.backupMetrics.successfulBackups++;
      this.backupMetrics.totalDataBackedUp += backupJob.backupSize || 0;

      // Update average duration
      const currentAvg = this.backupMetrics.averageBackupDuration;
      const totalSuccessful = this.backupMetrics.successfulBackups;
      this.backupMetrics.averageBackupDuration = 
        ((currentAvg * (totalSuccessful - 1)) + (backupJob.duration || 0)) / totalSuccessful;
    } else if (backupJob.status === 'failed') {
      this.backupMetrics.failedBackups++;
    }
  }

  async storeBackupMetadata(backupJob) {
    const metadata = {
      backupId: backupJob.backupId,
      backupType: backupJob.backupType,
      timestamp: backupJob.startTime,
      backupSize: backupJob.backupSize,
      backupPath: backupJob.backupPath,
      checksum: backupJob.checksum,
      compressionRatio: backupJob.compressionRatio,
      baseBackupId: backupJob.baseBackupId || null,

      // Retention information
      retentionPolicy: this.determineRetentionPolicy(backupJob),
      expirationDate: this.calculateExpirationDate(backupJob),

      // Recovery information
      recoveryMetadata: {
        documentsCount: backupJob.progress?.documentsProcessed || 0,
        collectionsCount: backupJob.progress?.collectionsProcessed || 0,
        indexesIncluded: backupJob.options?.includeIndexes !== false,
        oplogIncluded: backupJob.options?.oplogCapture === true
      }
    };

    await this.collections.backupMetadata.replaceOne(
      { backupId: backupJob.backupId },
      metadata,
      { upsert: true }
    );
  }

  determineRetentionPolicy(backupJob) {
    const dayOfWeek = backupJob.startTime.getDay();
    const dayOfMonth = backupJob.startTime.getDate();

    if (dayOfMonth === 1) return 'monthly';
    if (dayOfWeek === 0) return 'weekly'; // Sunday
    return 'daily';
  }

  calculateExpirationDate(backupJob) {
    const retentionPolicy = this.determineRetentionPolicy(backupJob);
    const startTime = backupJob.startTime;

    switch (retentionPolicy) {
      case 'monthly':
        return new Date(startTime.getTime() + (this.config.monthlyBackupRetention * 30 * 24 * 60 * 60 * 1000));
      case 'weekly':
        return new Date(startTime.getTime() + (this.config.weeklyBackupRetention * 7 * 24 * 60 * 60 * 1000));
      default:
        return new Date(startTime.getTime() + (this.config.dailyBackupRetention * 24 * 60 * 60 * 1000));
    }
  }

  generateBackupId(type) {
    const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
    return `backup_${type}_${timestamp}_${Math.random().toString(36).slice(2, 11)}`;
  }

  generateRecoveryId() {
    const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
    return `recovery_${timestamp}_${Math.random().toString(36).slice(2, 11)}`;
  }

  async shutdown() {
    console.log('Shutting down backup and recovery manager...');

    try {
      // Stop all scheduled backups
      for (const [scheduleId, schedule] of this.scheduledBackups.entries()) {
        clearInterval(schedule.interval);
      }

      // Wait for active backup jobs to complete
      for (const [backupId, backupJob] of this.backupJobs.entries()) {
        if (backupJob.status === 'running') {
          console.log(`Waiting for backup to complete: ${backupId}`);
          // In a real implementation, we would wait for or gracefully cancel the backup
        }
      }

      // Close MongoDB connection
      if (this.client) {
        await this.client.close();
      }

      console.log('Backup and recovery manager shutdown complete');

    } catch (error) {
      console.error('Error during shutdown:', error);
    }
  }

  // Additional methods would include implementations for:
  // - setupAutomatedBackups()
  // - setupBackupMonitoring() 
  // - setupPointInTimeRecovery()
  // - executeOplogBackup()
  // - executeRestore()
  // - executeSampleRestore()
  // - verifySampleData()
  // - cleanupTestDatabase()
  // - verifyRestoreIntegrity()
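
  // As one illustration, executeOplogBackup() could read the oplog entries
  // written since the base backup finished and persist them alongside a
  // checksum. This is a simplified sketch only: it assumes a replica set
  // member (so local.oplog.rs is available), assumes a backupStoragePath
  // config value, and buffers entries in memory rather than streaming them.
  async executeOplogBackup(backupId, backupJob) {
    const { Timestamp } = require('mongodb');

    const oplog = this.client.db('local').collection('oplog.rs');
    const fromTs = new Timestamp({
      t: Math.floor(backupJob.options.fromTimestamp.getTime() / 1000),
      i: 0
    });

    // Collect entries newer than the base backup's end time
    const entries = await oplog
      .find({ ts: { $gt: fromTs } })
      .sort({ $natural: 1 })
      .toArray();

    const backupPath = `${this.config.backupStoragePath}/${backupId}.oplog.json`;
    await fs.writeFile(backupPath, JSON.stringify(entries));

    const checksum = await this.calculateFileChecksum(backupPath);
    const backupSize = (await fs.stat(backupPath)).size;

    return {
      backupPath: backupPath,
      backupSize: backupSize,
      oplogEntries: entries.length,
      checksum: checksum
    };
  }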
}

// Benefits of MongoDB Advanced Backup and Recovery:
// - Automated backup scheduling with flexible retention policies
// - Comprehensive backup verification and integrity checking
// - Point-in-time recovery capabilities with oplog integration
// - Incremental backup support for efficient storage utilization
// - Advanced compression and optimization for large databases
// - Parallel backup and recovery operations for improved performance
// - Comprehensive monitoring and alerting for backup operations
// - Disaster recovery capabilities with replication and cloud sync
// - SQL-compatible backup management through QueryLeaf integration
// - Production-ready backup automation with minimal configuration

module.exports = {
  AdvancedBackupRecoveryManager
};
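
For context, a minimal usage sketch of the manager above is shown below. The connection string, module path, and configuration values are placeholders, and only methods defined in the class above are used; in practice you would also wait for the manager's asynchronous initialization to finish before issuing calls.

const { AdvancedBackupRecoveryManager } = require('./advanced-backup-recovery-manager');

async function restoreLatestVerifiedBackup() {
  const manager = new AdvancedBackupRecoveryManager('mongodb://localhost:27017/production', {
    enableBackupVerification: true,
    verificationSampleSize: 1000
  });

  // List recent backups and pick the newest completed, verified one
  const status = await manager.getBackupStatus();
  const candidate = status.backups.find(
    (b) => b.status === 'completed' && b.verificationStatus === 'verified'
  );

  if (candidate) {
    const result = await manager.restoreFromBackup(candidate.backupId, {
      targetDatabase: 'production_staging',
      dropBeforeRestore: true
    });
    console.log(result.success
      ? `Restored ${result.documentsRestored} documents from ${candidate.backupId}`
      : `Restore failed: ${result.error}`);
  }

  await manager.shutdown();
}

restoreLatestVerifiedBackup().catch(console.error);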

Understanding MongoDB Backup and Recovery Architecture

Advanced Backup Strategy Design and Implementation Patterns

Implement comprehensive backup and recovery workflows for enterprise MongoDB deployments:

// Enterprise-grade MongoDB backup and recovery with advanced disaster recovery capabilities
class EnterpriseBackupStrategy extends AdvancedBackupRecoveryManager {
  constructor(connectionString, enterpriseConfig) {
    super(connectionString, enterpriseConfig);

    this.enterpriseConfig = {
      ...enterpriseConfig,
      enableGeographicReplication: true,
      enableComplianceAuditing: true,
      enableAutomatedTesting: true,
      enableDisasterRecoveryProcedures: true,
      enableCapacityPlanning: true
    };

    this.setupEnterpriseBackupStrategy();
    this.initializeDisasterRecoveryProcedures();
    this.setupComplianceAuditing();
  }

  async implementAdvancedBackupStrategy() {
    console.log('Implementing enterprise backup strategy...');

    const backupStrategy = {
      // Multi-tier backup strategy
      backupTiers: {
        primaryBackups: {
          frequency: 'daily',
          retentionDays: 30,
          compressionLevel: 9,
          verificationLevel: 'full'
        },
        secondaryBackups: {
          frequency: 'hourly',
          retentionDays: 7,
          compressionLevel: 6,
          verificationLevel: 'checksum'
        },
        archivalBackups: {
          frequency: 'monthly',
          retentionMonths: 84, // 7 years for compliance
          compressionLevel: 9,
          verificationLevel: 'full'
        }
      },

      // Disaster recovery configuration
      disasterRecovery: {
        geographicReplication: true,
        crossRegionBackups: true,
        automatedFailoverTesting: true,
        recoveryTimeObjective: 4 * 60 * 60 * 1000, // 4 hours
        recoveryPointObjective: 15 * 60 * 1000 // 15 minutes
      },

      // Performance optimization
      performanceOptimization: {
        parallelBackupStreams: 8,
        networkOptimization: true,
        storageOptimization: true,
        resourceThrottling: true
      }
    };

    return await this.deployEnterpriseStrategy(backupStrategy);
  }

  async setupComplianceAuditing() {
    console.log('Setting up compliance auditing for backup operations...');

    const auditingConfig = {
      // Regulatory compliance
      regulations: ['SOX', 'GDPR', 'HIPAA', 'PCI-DSS'],
      auditTrailRetention: 7 * 365, // 7 years
      encryptionStandards: ['AES-256', 'RSA-2048'],
      accessControlAuditing: true,

      // Data governance
      dataClassification: {
        sensitiveDataHandling: true,
        dataRetentionPolicies: true,
        dataLineageTracking: true,
        privacyCompliance: true
      }
    };

    return await this.deployComplianceFramework(auditingConfig);
  }
}
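
A brief usage sketch for the enterprise strategy class follows; the connection string and option values are illustrative only.

const strategy = new EnterpriseBackupStrategy('mongodb://prod-cluster:27017/production', {
  enableGeographicReplication: true,
  enableComplianceAuditing: true
});

// Deploy the multi-tier backup strategy and compliance framework defined above
strategy.implementAdvancedBackupStrategy()
  .then((deployment) => console.log('Enterprise backup strategy deployed:', deployment))
  .catch((error) => console.error('Strategy deployment failed:', error));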

SQL-Style Backup and Recovery with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB backup and recovery operations:

-- QueryLeaf advanced backup and recovery with SQL-familiar syntax for MongoDB

-- Configure comprehensive backup strategy
CONFIGURE BACKUP_STRATEGY 
SET strategy_name = 'enterprise_backup',
    backup_types = ['full', 'incremental', 'differential'],

    -- Full backup configuration
    full_backup_schedule = '0 2 * * 0',  -- Weekly on Sunday at 2 AM
    full_backup_retention_days = 90,
    full_backup_compression_level = 9,

    -- Incremental backup configuration  
    incremental_backup_schedule = '0 */6 * * *',  -- Every 6 hours
    incremental_backup_retention_days = 14,
    incremental_backup_compression_level = 6,

    -- Point-in-time recovery
    enable_point_in_time_recovery = true,
    oplog_retention_hours = 168,  -- 7 days
    recovery_point_objective_minutes = 15,
    recovery_time_objective_hours = 4,

    -- Storage and performance
    backup_storage_path = '/backup/mongodb',
    enable_compression = true,
    enable_encryption = true,
    parallel_backup_streams = 8,
    max_backup_bandwidth_mbps = 1000,

    -- Verification and validation
    enable_backup_verification = true,
    verification_sample_size = 10000,
    enable_checksum_validation = true,
    enable_restore_testing = true,

    -- Disaster recovery
    enable_geographic_replication = true,
    cross_region_backup_locations = ['us-east-1', 'eu-west-1'],
    enable_automated_failover_testing = true,

    -- Monitoring and alerting
    enable_backup_monitoring = true,
    alert_on_backup_failure = true,
    alert_on_backup_delay_minutes = 60,
    alert_on_verification_failure = true;

-- Execute comprehensive backup with monitoring
WITH backup_execution AS (
  SELECT 
    backup_id,
    backup_type,
    backup_start_time,
    backup_end_time,
    backup_status,
    backup_size_bytes,
    compression_ratio,

    -- Performance metrics
    EXTRACT(EPOCH FROM (backup_end_time - backup_start_time)) as backup_duration_seconds,
    CASE 
      WHEN EXTRACT(EPOCH FROM (backup_end_time - backup_start_time)) > 0 THEN
        (backup_size_bytes / 1024.0 / 1024.0) / EXTRACT(EPOCH FROM (backup_end_time - backup_start_time))
      ELSE 0
    END as throughput_mbps,

    -- Progress tracking
    collections_processed,
    total_collections,
    documents_processed,
    total_documents,
    CASE 
      WHEN total_documents > 0 THEN 
        (documents_processed * 100.0) / total_documents
      ELSE 0
    END as completion_percentage,

    -- Quality metrics
    backup_checksum,
    verification_status,
    verification_timestamp,

    -- Storage efficiency
    original_size_bytes,
    CASE 
      WHEN original_size_bytes > 0 THEN
        ((original_size_bytes - backup_size_bytes) * 100.0) / original_size_bytes
      ELSE 0
    END as compression_percentage,

    -- Error tracking
    error_message,
    warning_count,
    retry_count

  FROM BACKUP_JOBS('full', 'production_db')
  WHERE backup_start_time >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
),

performance_analysis AS (
  SELECT 
    backup_type,
    COUNT(*) as total_backups,
    COUNT(*) FILTER (WHERE backup_status = 'completed') as successful_backups,
    COUNT(*) FILTER (WHERE backup_status = 'failed') as failed_backups,
    COUNT(*) FILTER (WHERE backup_status = 'running') as in_progress_backups,

    -- Performance statistics
    AVG(backup_duration_seconds) as avg_duration_seconds,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY backup_duration_seconds) as p95_duration_seconds,
    AVG(throughput_mbps) as avg_throughput_mbps,
    MAX(throughput_mbps) as max_throughput_mbps,

    -- Size and compression analysis
    SUM(backup_size_bytes) as total_backup_size_bytes,
    AVG(compression_percentage) as avg_compression_percentage,

    -- Quality metrics
    COUNT(*) FILTER (WHERE verification_status = 'verified') as verified_backups,
    COUNT(*) FILTER (WHERE error_message IS NOT NULL) as backups_with_errors,
    AVG(warning_count) as avg_warnings_per_backup,

    -- Success rate calculations
    CASE 
      WHEN COUNT(*) > 0 THEN
        (COUNT(*) FILTER (WHERE backup_status = 'completed') * 100.0) / COUNT(*)
      ELSE 0
    END as success_rate_percentage,

    -- Recent trends
    COUNT(*) FILTER (
      WHERE backup_start_time >= CURRENT_TIMESTAMP - INTERVAL '7 days'
      AND backup_status = 'completed'
    ) as successful_backups_last_week

  FROM backup_execution
  GROUP BY backup_type
),

storage_analysis AS (
  SELECT 
    DATE_TRUNC('day', backup_start_time) as backup_date,
    SUM(backup_size_bytes) as daily_backup_size_bytes,
    COUNT(*) as daily_backup_count,
    AVG(compression_ratio) as avg_daily_compression_ratio,

    -- Growth analysis
    LAG(SUM(backup_size_bytes)) OVER (
      ORDER BY DATE_TRUNC('day', backup_start_time)
    ) as prev_day_backup_size,

    -- Storage efficiency
    SUM(original_size_bytes - backup_size_bytes) as daily_space_saved_bytes,

    -- Quality indicators
    COUNT(*) FILTER (WHERE verification_status = 'verified') as verified_backups_per_day,
    COUNT(*) FILTER (WHERE backup_status = 'failed') as failed_backups_per_day

  FROM backup_execution
  WHERE backup_start_time >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY DATE_TRUNC('day', backup_start_time)
)

SELECT 
  pa.backup_type,
  pa.total_backups,
  pa.successful_backups,
  pa.failed_backups,
  pa.in_progress_backups,

  -- Performance summary
  ROUND(pa.avg_duration_seconds, 1) as avg_backup_time_seconds,
  ROUND(pa.p95_duration_seconds, 1) as p95_backup_time_seconds,
  ROUND(pa.avg_throughput_mbps, 2) as avg_throughput_mbps,
  ROUND(pa.max_throughput_mbps, 2) as max_throughput_mbps,

  -- Storage summary
  ROUND(pa.total_backup_size_bytes / 1024.0 / 1024.0 / 1024.0, 2) as total_backup_size_gb,
  ROUND(pa.avg_compression_percentage, 1) as avg_compression_percent,

  -- Quality assessment
  pa.verified_backups,
  ROUND((pa.verified_backups * 100.0) / NULLIF(pa.successful_backups, 0), 1) as verification_rate_percent,
  pa.success_rate_percentage,

  -- Health indicators
  CASE 
    WHEN pa.success_rate_percentage < 95 THEN 'critical'
    WHEN pa.success_rate_percentage < 98 THEN 'warning'
    WHEN pa.avg_duration_seconds > 7200 THEN 'warning'  -- 2 hours
    ELSE 'healthy'
  END as backup_health_status,

  -- Operational recommendations
  CASE 
    WHEN pa.failed_backups > pa.total_backups * 0.05 THEN 'investigate_failures'
    WHEN pa.avg_duration_seconds > 3600 THEN 'optimize_performance'
    WHEN pa.avg_compression_percentage < 50 THEN 'review_compression_settings'
    WHEN pa.verified_backups < pa.successful_backups * 0.9 THEN 'improve_verification_coverage'
    ELSE 'monitor_continued'
  END as recommendation,

  -- Recent activity
  pa.successful_backups_last_week,
  CASE 
    WHEN pa.successful_backups_last_week < 7 AND pa.backup_type = 'full' THEN 'backup_frequency_low'
    WHEN pa.successful_backups_last_week < 28 AND pa.backup_type = 'incremental' THEN 'backup_frequency_low'
    ELSE 'backup_frequency_adequate'
  END as frequency_assessment,

  -- Storage trends from storage_analysis
  (SELECT 
     ROUND(AVG(sa.daily_backup_size_bytes) / 1024.0 / 1024.0, 1) 
   FROM storage_analysis sa 
   WHERE sa.backup_date >= CURRENT_DATE - INTERVAL '7 days'
  ) as avg_daily_backup_size_mb,

  (SELECT 
     ROUND(SUM(sa.daily_space_saved_bytes) / 1024.0 / 1024.0 / 1024.0, 2) 
   FROM storage_analysis sa 
   WHERE sa.backup_date >= CURRENT_DATE - INTERVAL '30 days'
  ) as total_space_saved_last_month_gb

FROM performance_analysis pa
ORDER BY pa.backup_type;

-- Point-in-time recovery analysis and recommendations
WITH recovery_scenarios AS (
  SELECT 
    recovery_id,
    backup_id,
    recovery_type,
    target_timestamp,
    recovery_start_time,
    recovery_end_time,
    recovery_status,

    -- Recovery performance
    EXTRACT(EPOCH FROM (recovery_end_time - recovery_start_time)) as recovery_duration_seconds,
    documents_restored,
    collections_restored,
    restored_data_size_bytes,

    -- Recovery quality
    data_consistency_verified,
    index_rebuild_required,
    post_recovery_validation_status,

    -- Business impact
    downtime_seconds,
    affected_applications,
    recovery_point_achieved,
    recovery_time_objective_met,

    -- Error tracking
    recovery_errors,
    manual_intervention_required

  FROM RECOVERY_OPERATIONS
  WHERE recovery_start_time >= CURRENT_TIMESTAMP - INTERVAL '90 days'
),

recovery_performance AS (
  SELECT 
    recovery_type,
    COUNT(*) as total_recoveries,
    COUNT(*) FILTER (WHERE recovery_status = 'completed') as successful_recoveries,
    COUNT(*) FILTER (WHERE recovery_status = 'failed') as failed_recoveries,

    -- Performance metrics
    AVG(recovery_duration_seconds) as avg_recovery_time_seconds,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY recovery_duration_seconds) as p95_recovery_time_seconds,
    AVG(downtime_seconds) as avg_downtime_seconds,

    -- Data recovery metrics
    SUM(documents_restored) as total_documents_recovered,
    AVG(restored_data_size_bytes) as avg_data_size_recovered,

    -- Quality metrics
    COUNT(*) FILTER (WHERE data_consistency_verified = true) as verified_recoveries,
    COUNT(*) FILTER (WHERE recovery_time_objective_met = true) as rto_met_count,
    COUNT(*) FILTER (WHERE manual_intervention_required = true) as manual_intervention_count,

    -- Success rate
    CASE 
      WHEN COUNT(*) > 0 THEN
        (COUNT(*) FILTER (WHERE recovery_status = 'completed') * 100.0) / COUNT(*)
      ELSE 0
    END as recovery_success_rate_percent

  FROM recovery_scenarios
  GROUP BY recovery_type
),

backup_recovery_readiness AS (
  SELECT 
    backup_id,
    backup_type,
    backup_timestamp,
    backup_size_bytes,
    backup_status,
    verification_status,

    -- Recovery readiness assessment
    CASE 
      WHEN backup_status = 'completed' AND verification_status = 'verified' THEN 'ready'
      WHEN backup_status = 'completed' AND verification_status = 'pending' THEN 'needs_verification'
      WHEN backup_status = 'completed' AND verification_status = 'failed' THEN 'not_reliable'
      WHEN backup_status = 'failed' THEN 'not_available'
      ELSE 'unknown'
    END as recovery_readiness,

    -- Age assessment for recovery planning
    EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - backup_timestamp)) / 86400 as backup_age_days,
    CASE 
      WHEN EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - backup_timestamp)) / 86400 <= 1 THEN 'very_recent'
      WHEN EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - backup_timestamp)) / 86400 <= 7 THEN 'recent'
      WHEN EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - backup_timestamp)) / 86400 <= 30 THEN 'moderate'
      ELSE 'old'
    END as backup_age_category,

    -- Estimated recovery time based on size
    CASE 
      WHEN backup_size_bytes < 1024 * 1024 * 1024 THEN 'fast'      -- < 1GB
      WHEN backup_size_bytes < 10 * 1024 * 1024 * 1024 THEN 'moderate' -- < 10GB  
      WHEN backup_size_bytes < 100 * 1024 * 1024 * 1024 THEN 'slow'     -- < 100GB
      ELSE 'very_slow'                                                   -- >= 100GB
    END as estimated_recovery_speed

  FROM backup_jobs
  WHERE backup_timestamp >= CURRENT_TIMESTAMP - INTERVAL '30 days'
  AND backup_type IN ('full', 'incremental')
)

SELECT 
  rp.recovery_type,
  rp.total_recoveries,
  rp.successful_recoveries,
  rp.failed_recoveries,
  ROUND(rp.recovery_success_rate_percent, 1) as success_rate_percent,

  -- Performance summary
  ROUND(rp.avg_recovery_time_seconds / 60.0, 1) as avg_recovery_time_minutes,
  ROUND(rp.p95_recovery_time_seconds / 60.0, 1) as p95_recovery_time_minutes,
  ROUND(rp.avg_downtime_seconds / 60.0, 1) as avg_downtime_minutes,

  -- Data recovery summary  
  rp.total_documents_recovered,
  ROUND(rp.avg_data_size_recovered / 1024.0 / 1024.0, 1) as avg_data_recovered_mb,

  -- Quality assessment
  rp.verified_recoveries,
  ROUND((rp.verified_recoveries * 100.0) / NULLIF(rp.successful_recoveries, 0), 1) as verification_rate_percent,
  rp.rto_met_count,
  ROUND((rp.rto_met_count * 100.0) / NULLIF(rp.total_recoveries, 0), 1) as rto_achievement_percent,

  -- Operational indicators
  rp.manual_intervention_count,
  CASE 
    WHEN rp.recovery_success_rate_percent < 95 THEN 'critical'
    WHEN rp.avg_recovery_time_seconds > 14400 THEN 'warning'  -- 4 hours
    WHEN rp.manual_intervention_count > rp.total_recoveries * 0.2 THEN 'warning'
    ELSE 'healthy'
  END as recovery_health_status,

  -- Backup readiness summary
  (SELECT COUNT(*) 
   FROM backup_recovery_readiness brr 
   WHERE brr.recovery_readiness = 'ready' 
   AND brr.backup_age_category IN ('very_recent', 'recent')
  ) as ready_recent_backups,

  (SELECT COUNT(*) 
   FROM backup_recovery_readiness brr 
   WHERE brr.recovery_readiness = 'needs_verification'
  ) as backups_needing_verification,

  -- Recovery capability assessment
  CASE 
    WHEN rp.avg_recovery_time_seconds <= 3600 THEN 'excellent'  -- ≤ 1 hour
    WHEN rp.avg_recovery_time_seconds <= 14400 THEN 'good'      -- ≤ 4 hours  
    WHEN rp.avg_recovery_time_seconds <= 28800 THEN 'acceptable' -- ≤ 8 hours
    ELSE 'needs_improvement'
  END as recovery_capability_rating,

  -- Recommendations
  ARRAY[
    CASE WHEN rp.recovery_success_rate_percent < 98 THEN 'Improve backup verification processes' END,
    CASE WHEN rp.avg_recovery_time_seconds > 7200 THEN 'Optimize recovery performance' END,
    CASE WHEN rp.manual_intervention_count > 0 THEN 'Automate recovery procedures' END,
    CASE WHEN rp.rto_achievement_percent < 90 THEN 'Review recovery time objectives' END
  ]::TEXT[] as improvement_recommendations

FROM recovery_performance rp
ORDER BY rp.recovery_type;

-- Disaster recovery readiness assessment
CREATE VIEW disaster_recovery_dashboard AS
WITH current_backup_status AS (
  SELECT 
    backup_type,
    COUNT(*) as total_backups,
    COUNT(*) FILTER (WHERE backup_status = 'completed') as completed_backups,
    COUNT(*) FILTER (WHERE verification_status = 'verified') as verified_backups,
    MAX(backup_timestamp) as latest_backup_time,

    -- Recovery point assessment
    MIN(EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - backup_timestamp)) / 60) as minutes_since_latest,

    -- Geographic distribution
    COUNT(DISTINCT backup_location) as backup_locations,
    COUNT(*) FILTER (WHERE backup_location LIKE '%cross-region%') as cross_region_backups

  FROM backup_jobs
  WHERE backup_timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days'
  GROUP BY backup_type
),

disaster_scenarios AS (
  SELECT 
    scenario_name,
    scenario_type,
    estimated_data_loss_minutes,
    estimated_recovery_time_hours,
    recovery_success_probability,
    last_tested_date,
    test_result_status

  FROM disaster_recovery_tests
  WHERE test_date >= CURRENT_TIMESTAMP - INTERVAL '90 days'
),

compliance_status AS (
  SELECT 
    regulation_name,
    compliance_status,
    last_audit_date,
    next_audit_due_date,
    backup_retention_requirement_days,
    encryption_requirement_met,
    access_control_requirement_met

  FROM compliance_audits
  WHERE audit_type = 'backup_recovery'
)

SELECT 
  CURRENT_TIMESTAMP as dashboard_timestamp,

  -- Overall backup health
  (SELECT 
     CASE 
       WHEN MIN(minutes_since_latest) <= 60 AND 
            AVG((completed_backups * 100.0) / total_backups) >= 95 THEN 'excellent'
       WHEN MIN(minutes_since_latest) <= 240 AND 
            AVG((completed_backups * 100.0) / total_backups) >= 90 THEN 'good'  
       WHEN MIN(minutes_since_latest) <= 1440 AND 
            AVG((completed_backups * 100.0) / total_backups) >= 85 THEN 'acceptable'
       ELSE 'critical'
     END 
   FROM current_backup_status) as overall_backup_health,

  -- Recovery readiness
  (SELECT 
     CASE
       WHEN COUNT(*) FILTER (WHERE recovery_success_probability >= 0.95) = COUNT(*) THEN 'fully_ready'
       WHEN COUNT(*) FILTER (WHERE recovery_success_probability >= 0.90) >= COUNT(*) * 0.8 THEN 'mostly_ready' 
       WHEN COUNT(*) FILTER (WHERE recovery_success_probability >= 0.75) >= COUNT(*) * 0.6 THEN 'partially_ready'
       ELSE 'not_ready'
     END
   FROM disaster_scenarios) as disaster_recovery_readiness,

  -- Compliance status
  (SELECT 
     CASE 
       WHEN COUNT(*) FILTER (WHERE compliance_status = 'compliant') = COUNT(*) THEN 'fully_compliant'
       WHEN COUNT(*) FILTER (WHERE compliance_status = 'compliant') >= COUNT(*) * 0.8 THEN 'mostly_compliant'
       ELSE 'non_compliant'
     END
   FROM compliance_status) as regulatory_compliance_status,

  -- Detailed metrics
  (SELECT JSON_AGG(
     JSON_BUILD_OBJECT(
       'backup_type', backup_type,
       'completion_rate', ROUND((completed_backups * 100.0) / total_backups, 1),
       'verification_rate', ROUND((verified_backups * 100.0) / NULLIF(completed_backups, 0), 1),
       'minutes_since_latest', minutes_since_latest,
       'geographic_distribution', backup_locations,
       'cross_region_backups', cross_region_backups
     )
   ) FROM current_backup_status) as backup_status_details,

  -- Critical alerts
  ARRAY[
    CASE WHEN (SELECT MIN(minutes_since_latest) FROM current_backup_status) > 1440 
         THEN 'CRITICAL: No recent backups found (>24 hours)' END,
    CASE WHEN (SELECT COUNT(*) FROM disaster_scenarios WHERE last_tested_date < CURRENT_DATE - INTERVAL '90 days') > 0
         THEN 'WARNING: Disaster recovery procedures not recently tested' END,
    CASE WHEN (SELECT COUNT(*) FROM compliance_status WHERE compliance_status != 'compliant') > 0
         THEN 'WARNING: Compliance violations detected' END,
    CASE WHEN (SELECT AVG((verified_backups * 100.0) / NULLIF(completed_backups, 0)) FROM current_backup_status) < 90
         THEN 'WARNING: Low backup verification rate' END
  ]::TEXT[] as critical_alerts;

-- QueryLeaf provides comprehensive backup and recovery capabilities:
-- 1. SQL-familiar syntax for MongoDB backup configuration and management
-- 2. Advanced backup scheduling with flexible retention policies
-- 3. Comprehensive backup verification and integrity monitoring
-- 4. Point-in-time recovery capabilities with oplog integration
-- 5. Disaster recovery planning and readiness assessment
-- 6. Compliance auditing and regulatory requirement management
-- 7. Performance monitoring and optimization recommendations
-- 8. Automated backup testing and recovery validation
-- 9. Enterprise-grade backup management with minimal configuration
-- 10. Production-ready disaster recovery automation and procedures

Best Practices for Production Backup and Recovery

Backup Strategy Design Principles

Essential principles for effective MongoDB backup and recovery deployment:

  1. Multi-Tier Backup Strategy: Implement multiple backup frequencies and retention policies for different recovery scenarios (see the retention sketch after this list)
  2. Verification and Testing: Establish comprehensive backup verification and regular recovery testing procedures
  3. Point-in-Time Recovery: Configure oplog capture and incremental backups for granular recovery capabilities
  4. Geographic Distribution: Implement cross-region backup replication for disaster recovery protection
  5. Performance Optimization: Balance backup frequency with system performance impact through intelligent scheduling
  6. Compliance Integration: Ensure backup procedures meet regulatory requirements and audit standards
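
As a concrete starting point for the retention side of a multi-tier strategy, the sketch below mirrors the retention settings consumed by calculateExpirationDate() in the manager class above; the specific values are examples, not recommendations.

// Illustrative retention configuration for daily/weekly/monthly backup tiers
const retentionConfig = {
  dailyBackupRetention: 14,    // keep daily backups for 14 days
  weeklyBackupRetention: 8,    // keep Sunday (weekly) backups for 8 weeks
  monthlyBackupRetention: 12   // keep first-of-month backups for roughly a year
};

const manager = new AdvancedBackupRecoveryManager('mongodb://localhost:27017/production', {
  ...retentionConfig,
  enableCompression: true,
  enableBackupVerification: true
});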

Enterprise Backup Architecture

Design backup systems for enterprise-scale requirements:

  1. Automated Scheduling: Implement intelligent backup scheduling based on business requirements and system load
  2. Storage Management: Optimize backup storage with compression, deduplication, and lifecycle management (a lifecycle cleanup sketch follows this list)
  3. Monitoring Integration: Integrate backup monitoring with existing alerting and operational workflows
  4. Security Controls: Implement encryption, access controls, and audit trails for backup security
  5. Disaster Recovery: Design comprehensive disaster recovery procedures with automated failover capabilities
  6. Capacity Planning: Monitor backup growth patterns and plan storage capacity requirements
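
To illustrate the lifecycle-management point, the sketch below prunes backups whose stored expirationDate (written by storeBackupMetadata() earlier) has passed. It reaches into the manager's internal collections map for brevity, which is a simplification; a production version would expose a dedicated method and handle deletion failures.

const fs = require('fs/promises');

async function pruneExpiredBackups(manager) {
  const now = new Date();

  // Metadata entries whose retention window has elapsed
  const expired = await manager.collections.backupMetadata
    .find({ expirationDate: { $lt: now } })
    .toArray();

  for (const backup of expired) {
    // Remove the backup artifact, then its bookkeeping records
    await fs.rm(backup.backupPath, { recursive: true, force: true });
    await manager.collections.backupMetadata.deleteOne({ backupId: backup.backupId });
    await manager.collections.backupJobs.deleteOne({ backupId: backup.backupId });
  }

  return expired.length;
}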

Conclusion

MongoDB backup and recovery provides comprehensive data protection capabilities that enable robust disaster recovery, regulatory compliance, and business continuity through automated backup scheduling, point-in-time recovery, and advanced verification features. The native backup tools and integrated recovery procedures ensure that critical data is protected with minimal operational overhead.

Key MongoDB Backup and Recovery benefits include:

  • Automated Protection: Intelligent backup scheduling with comprehensive retention policies and automated lifecycle management
  • Advanced Recovery Options: Point-in-time recovery capabilities with oplog integration and incremental backup support
  • Enterprise Reliability: Production-ready backup verification, disaster recovery procedures, and compliance auditing
  • Performance Optimization: Efficient backup compression, parallel processing, and minimal performance impact
  • Operational Excellence: Comprehensive monitoring, alerting, and automated testing for backup system reliability
  • SQL Accessibility: Familiar SQL-style backup management operations through QueryLeaf for accessible data protection

Whether you're protecting mission-critical applications, meeting regulatory compliance requirements, implementing disaster recovery procedures, or managing enterprise backup operations, MongoDB backup and recovery with QueryLeaf's familiar SQL interface provides the foundation for comprehensive, reliable data protection.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB backup and recovery operations while providing SQL-familiar syntax for backup configuration, monitoring, and recovery procedures. Advanced backup strategies, disaster recovery planning, and compliance auditing are seamlessly handled through familiar SQL constructs, making sophisticated data protection accessible to SQL-oriented operations teams.

The combination of MongoDB's robust backup capabilities with SQL-style data protection operations makes it an ideal platform for applications requiring both comprehensive data protection and familiar database management patterns, ensuring your critical data remains secure and recoverable as your systems scale and evolve.

MongoDB Data Pipeline Management and Stream Processing: Advanced Real-Time Data Processing and ETL Pipelines for Modern Applications

Modern data-driven applications require sophisticated data processing pipelines that can handle real-time data ingestion, complex transformations, and reliable data delivery across multiple systems and formats. Traditional batch processing approaches struggle with latency requirements, data volume scalability, and the complexity of managing distributed processing workflows. Effective data pipeline management demands real-time stream processing, incremental data transformations, and intelligent error handling mechanisms.

MongoDB's comprehensive data pipeline capabilities provide advanced stream processing features through Change Streams, Aggregation Framework, and native pipeline orchestration that enable sophisticated real-time data processing workflows. Unlike traditional ETL systems that require separate infrastructure components and complex coordination mechanisms, MongoDB integrates stream processing directly into the database with optimized pipeline execution, automatic scaling, and built-in fault tolerance.
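
Before contrasting this with traditional ETL, here is a minimal Change Streams example for orientation. It assumes a replica set (change streams are not available on standalone servers); the URI, database, and collection names are placeholders.

const { MongoClient } = require('mongodb');

async function watchOrders() {
  const client = new MongoClient('mongodb://localhost:27017/?replicaSet=rs0');
  await client.connect();

  const orders = client.db('shop').collection('orders');

  // fullDocument: 'updateLookup' returns the current document for update events
  const changeStream = orders.watch([], { fullDocument: 'updateLookup' });

  for await (const change of changeStream) {
    // Each event describes a single insert, update, replace, or delete
    console.log(change.operationType, change.documentKey, change.fullDocument);
  }
}

watchOrders().catch(console.error);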

The Traditional Data Pipeline Challenge

Conventional approaches to data pipeline management in relational systems face significant limitations in real-time processing:

-- Traditional PostgreSQL data pipeline management - complex batch processing with limited real-time capabilities

-- Basic ETL tracking table with limited functionality
CREATE TABLE etl_job_runs (
    run_id SERIAL PRIMARY KEY,
    job_name VARCHAR(255) NOT NULL,
    job_type VARCHAR(100) NOT NULL,
    source_system VARCHAR(100),
    target_system VARCHAR(100),

    -- Job execution tracking
    start_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    end_time TIMESTAMP,
    status VARCHAR(50) DEFAULT 'running',

    -- Basic metrics (very limited)
    records_processed INTEGER DEFAULT 0,
    records_inserted INTEGER DEFAULT 0,
    records_updated INTEGER DEFAULT 0,
    records_deleted INTEGER DEFAULT 0,
    records_failed INTEGER DEFAULT 0,

    -- Error tracking (basic)
    error_message TEXT,
    error_count INTEGER DEFAULT 0,

    -- Resource usage (manual tracking)
    cpu_usage_percent DECIMAL(5,2),
    memory_usage_mb INTEGER,
    disk_io_mb INTEGER,

    -- Basic configuration
    batch_size INTEGER DEFAULT 1000,
    parallel_workers INTEGER DEFAULT 1,
    retry_attempts INTEGER DEFAULT 3
);

-- Data transformation rules (static and inflexible)
CREATE TABLE transformation_rules (
    rule_id SERIAL PRIMARY KEY,
    rule_name VARCHAR(255) NOT NULL,
    source_table VARCHAR(255),
    target_table VARCHAR(255),
    transformation_type VARCHAR(100),

    -- Transformation logic (limited SQL expressions)
    source_columns TEXT[],
    target_columns TEXT[],
    transformation_sql TEXT,

    -- Basic validation rules
    validation_rules TEXT[],
    data_quality_checks TEXT[],

    -- Rule metadata
    active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    created_by VARCHAR(100)
);

-- Simple batch processing function (no real-time capabilities)
CREATE OR REPLACE FUNCTION execute_batch_etl(
    job_name_param VARCHAR(255),
    batch_size_param INTEGER DEFAULT 1000
) RETURNS TABLE (
    run_id INTEGER,
    records_processed INTEGER,
    execution_time_seconds INTEGER,
    status VARCHAR(50),
    error_message TEXT
) AS $$
DECLARE
    current_run_id INTEGER;
    processing_start TIMESTAMP;
    processing_end TIMESTAMP;
    batch_count INTEGER := 0;
    total_records INTEGER := 0;
    error_msg TEXT := '';
    processing_status VARCHAR(50) := 'completed';
BEGIN
    -- Start new job run
    INSERT INTO etl_job_runs (job_name, job_type, status)
    VALUES (job_name_param, 'batch_etl', 'running')
    RETURNING etl_job_runs.run_id INTO current_run_id;

    processing_start := clock_timestamp();

    BEGIN
        -- Very basic batch processing loop
        LOOP
            -- Simulate batch processing (would be actual data transformation in reality)
            PERFORM pg_sleep(0.1); -- Simulate processing time

            batch_count := batch_count + 1;
            total_records := total_records + batch_size_param;

            -- Simple exit condition (no real data source integration)
            EXIT WHEN batch_count >= 10; -- Process 10 batches maximum

        END LOOP;

    EXCEPTION WHEN OTHERS THEN
        error_msg := SQLERRM;
        processing_status := 'failed';

    END;

    processing_end := clock_timestamp();

    -- Update job run status
    UPDATE etl_job_runs 
    SET 
        end_time = processing_end,
        status = processing_status,
        records_processed = total_records,
        records_inserted = total_records,
        error_message = error_msg,
        error_count = CASE WHEN processing_status = 'failed' THEN 1 ELSE 0 END
    WHERE etl_job_runs.run_id = current_run_id;

    -- Return execution results
    RETURN QUERY SELECT 
        current_run_id,
        total_records,
        EXTRACT(EPOCH FROM (processing_end - processing_start))::INTEGER,
        processing_status,
        error_msg;

END;
$$ LANGUAGE plpgsql;

-- Execute batch ETL job (very basic functionality)
SELECT * FROM execute_batch_etl('customer_data_sync', 500);

-- Data quality monitoring (limited real-time capabilities)
WITH data_quality_metrics AS (
    SELECT 
        ejr.job_name,
        ejr.run_id,
        ejr.start_time,
        ejr.end_time,
        ejr.records_processed,
        ejr.records_failed,

        -- Basic quality calculations
        CASE 
            WHEN ejr.records_processed > 0 THEN 
                ROUND((ejr.records_processed - ejr.records_failed)::DECIMAL / ejr.records_processed * 100, 2)
            ELSE 0
        END as success_rate_percent,

        -- Processing rate
        CASE 
            WHEN EXTRACT(EPOCH FROM (ejr.end_time - ejr.start_time)) > 0 THEN
                ROUND(ejr.records_processed::DECIMAL / EXTRACT(EPOCH FROM (ejr.end_time - ejr.start_time)), 2)
            ELSE 0
        END as records_per_second,

        -- Basic status assessment
        CASE ejr.status
            WHEN 'completed' THEN 'success'
            WHEN 'failed' THEN 'failure'
            ELSE 'unknown'
        END as quality_status

    FROM etl_job_runs ejr
    WHERE ejr.start_time >= CURRENT_DATE - INTERVAL '7 days'
),

quality_summary AS (
    SELECT 
        job_name,
        COUNT(*) as total_runs,
        COUNT(*) FILTER (WHERE quality_status = 'success') as successful_runs,
        COUNT(*) FILTER (WHERE quality_status = 'failure') as failed_runs,

        -- Quality metrics
        AVG(success_rate_percent) as avg_success_rate,
        AVG(records_per_second) as avg_processing_rate,
        SUM(records_processed) as total_records_processed,
        SUM(records_failed) as total_records_failed,

        -- Time-based analysis
        AVG(EXTRACT(EPOCH FROM (end_time - start_time))) as avg_execution_seconds,
        MAX(EXTRACT(EPOCH FROM (end_time - start_time))) as max_execution_seconds,
        MIN(start_time) as first_run,
        MAX(end_time) as last_run

    FROM data_quality_metrics
    GROUP BY job_name
)

SELECT 
    job_name,
    total_runs,
    successful_runs,
    failed_runs,

    -- Success rates
    CASE 
        WHEN total_runs > 0 THEN 
            ROUND((successful_runs::DECIMAL / total_runs) * 100, 1)
        ELSE 0
    END as job_success_rate_percent,

    -- Performance metrics
    ROUND(avg_success_rate, 1) as avg_record_success_rate_percent,
    ROUND(avg_processing_rate, 1) as avg_records_per_second,
    total_records_processed,
    total_records_failed,

    -- Timing analysis
    ROUND(avg_execution_seconds, 1) as avg_duration_seconds,
    ROUND(max_execution_seconds, 1) as max_duration_seconds,

    -- Data quality assessment
    CASE 
        WHEN failed_runs = 0 AND avg_success_rate > 98 THEN 'excellent'
        WHEN failed_runs <= total_runs * 0.05 AND avg_success_rate > 95 THEN 'good'
        WHEN failed_runs <= total_runs * 0.1 AND avg_success_rate > 90 THEN 'acceptable'
        ELSE 'poor'
    END as data_quality_rating,

    -- Recommendations
    CASE 
        WHEN failed_runs > total_runs * 0.1 THEN 'investigate_failures'
        WHEN avg_processing_rate < 100 THEN 'optimize_performance'
        WHEN max_execution_seconds > avg_execution_seconds * 3 THEN 'check_consistency'
        ELSE 'monitor_continued'
    END as recommendation

FROM quality_summary
ORDER BY total_records_processed DESC;

-- Real-time data change tracking (very limited functionality)
CREATE TABLE data_changes (
    change_id SERIAL PRIMARY KEY,
    table_name VARCHAR(255) NOT NULL,
    operation_type VARCHAR(10) NOT NULL, -- INSERT, UPDATE, DELETE
    record_id VARCHAR(100),

    -- Change tracking (basic)
    old_values JSONB,
    new_values JSONB,
    changed_columns TEXT[],

    -- Metadata
    change_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    user_id VARCHAR(100),
    application_name VARCHAR(100),

    -- Processing status
    processed BOOLEAN DEFAULT false,
    processing_attempts INTEGER DEFAULT 0,
    last_processing_attempt TIMESTAMP,
    processing_error TEXT
);

-- Basic trigger function for change tracking
CREATE OR REPLACE FUNCTION track_data_changes()
RETURNS TRIGGER AS $$
BEGIN
    -- Insert change record (very basic functionality)
    INSERT INTO data_changes (
        table_name,
        operation_type,
        record_id,
        old_values,
        new_values,
        user_id
    )
    VALUES (
        TG_TABLE_NAME,
        TG_OP,
        CASE 
            WHEN TG_OP = 'DELETE' THEN OLD.id::TEXT
            ELSE NEW.id::TEXT
        END,
        CASE 
            WHEN TG_OP = 'DELETE' THEN to_jsonb(OLD)
            WHEN TG_OP = 'UPDATE' THEN to_jsonb(OLD)
            ELSE NULL
        END,
        CASE 
            WHEN TG_OP = 'DELETE' THEN NULL
            ELSE to_jsonb(NEW)
        END,
        current_user
    );

    -- Return appropriate record
    CASE TG_OP
        WHEN 'DELETE' THEN RETURN OLD;
        ELSE RETURN NEW;
    END CASE;

EXCEPTION WHEN OTHERS THEN
    -- Basic error handling (logs errors but doesn't stop operations)
    RAISE WARNING 'Change tracking failed: %', SQLERRM;
    CASE TG_OP
        WHEN 'DELETE' THEN RETURN OLD;
        ELSE RETURN NEW;
    END CASE;
END;
$$ LANGUAGE plpgsql;

-- Process pending changes (batch processing only)
WITH pending_changes AS (
    SELECT 
        change_id,
        table_name,
        operation_type,
        new_values,
        old_values,
        change_timestamp,

        -- Group changes by time windows for batch processing
        DATE_TRUNC('minute', change_timestamp) as processing_window

    FROM data_changes
    WHERE processed = false 
    AND processing_attempts < 3
    ORDER BY change_timestamp
    LIMIT 1000
),

change_summary AS (
    SELECT 
        processing_window,
        table_name,
        operation_type,
        COUNT(*) as change_count,
        MIN(change_timestamp) as first_change,
        MAX(change_timestamp) as last_change,

        -- Basic aggregations (very limited analysis)
        COUNT(*) FILTER (WHERE operation_type = 'INSERT') as inserts,
        COUNT(*) FILTER (WHERE operation_type = 'UPDATE') as updates,
        COUNT(*) FILTER (WHERE operation_type = 'DELETE') as deletes

    FROM pending_changes
    GROUP BY processing_window, table_name, operation_type
)

SELECT 
    processing_window,
    table_name,
    operation_type,
    change_count,
    first_change,
    last_change,

    -- Change rate analysis
    CASE 
        WHEN EXTRACT(EPOCH FROM (last_change - first_change)) > 0 THEN
            ROUND(change_count::DECIMAL / EXTRACT(EPOCH FROM (last_change - first_change)), 2)
        ELSE change_count
    END as changes_per_second,

    -- Processing recommendations (very basic)
    CASE 
        WHEN change_count > 1000 THEN 'high_volume_batch'
        WHEN change_count > 100 THEN 'medium_batch'
        ELSE 'small_batch'
    END as processing_strategy,

    -- Simple priority assessment
    CASE table_name
        WHEN 'users' THEN 'high'
        WHEN 'orders' THEN 'high'
        WHEN 'products' THEN 'medium'
        ELSE 'low'
    END as processing_priority

FROM change_summary
ORDER BY processing_window DESC, change_count DESC;

-- Problems with traditional data pipeline approaches:
-- 1. No real-time processing - only batch operations with delays
-- 2. Limited transformation capabilities - basic SQL only
-- 3. Poor scalability - single-threaded processing
-- 4. Manual error handling and recovery
-- 5. No automatic schema evolution or data type handling
-- 6. Limited monitoring and observability
-- 7. Complex integration with external systems
-- 8. No built-in data quality validation
-- 9. Difficult to maintain and debug complex pipelines
-- 10. No support for stream processing or event-driven architectures

MongoDB provides comprehensive data pipeline management with advanced stream processing capabilities:

// MongoDB Advanced Data Pipeline Management and Stream Processing
const { MongoClient, ChangeStream } = require('mongodb');
const { EventEmitter } = require('events');

// Comprehensive MongoDB Data Pipeline Manager
class AdvancedDataPipelineManager extends EventEmitter {
  constructor(mongoUri, pipelineConfig = {}) {
    super();
    this.mongoUri = mongoUri;
    this.client = null;
    this.db = null;

    // Advanced pipeline configuration
    this.config = {
      // Processing configuration
      enableRealTimeProcessing: pipelineConfig.enableRealTimeProcessing !== false,
      enableBatchProcessing: pipelineConfig.enableBatchProcessing !== false,
      enableStreamProcessing: pipelineConfig.enableStreamProcessing !== false,

      // Performance settings
      maxConcurrentPipelines: pipelineConfig.maxConcurrentPipelines || 10,
      batchSize: pipelineConfig.batchSize || 1000,
      maxRetries: pipelineConfig.maxRetries || 3,
      retryDelay: pipelineConfig.retryDelay || 1000,

      // Change stream configuration
      enableChangeStreams: pipelineConfig.enableChangeStreams !== false,
      changeStreamOptions: pipelineConfig.changeStreamOptions || {
        fullDocument: 'updateLookup',
        fullDocumentBeforeChange: 'whenAvailable'
      },

      // Data quality and validation
      enableDataValidation: pipelineConfig.enableDataValidation !== false,
      enableSchemaEvolution: pipelineConfig.enableSchemaEvolution || false,
      enableDataLineage: pipelineConfig.enableDataLineage || false,

      // Monitoring and observability
      enableMetrics: pipelineConfig.enableMetrics !== false,
      enablePipelineMonitoring: pipelineConfig.enablePipelineMonitoring !== false,
      enableErrorTracking: pipelineConfig.enableErrorTracking !== false,

      // Advanced features
      enableIncrementalProcessing: pipelineConfig.enableIncrementalProcessing || false,
      enableDataDeduplication: pipelineConfig.enableDataDeduplication || false,
      enableDataEnrichment: pipelineConfig.enableDataEnrichment || false
    };

    // Pipeline registry and state management
    this.pipelines = new Map();
    this.changeStreams = new Map();
    this.pipelineMetrics = new Map();
    this.activeProcessing = new Map();

    // Error tracking and recovery
    this.errorHistory = [];
    this.retryQueues = new Map();

    this.initializeDataPipelines();
  }

  async initializeDataPipelines() {
    console.log('Initializing advanced data pipeline management system...');

    try {
      // Connect to MongoDB
      this.client = new MongoClient(this.mongoUri);
      await this.client.connect();
      this.db = this.client.db();

      // Setup pipeline infrastructure
      await this.setupPipelineInfrastructure();

      // Initialize change streams if enabled
      if (this.config.enableChangeStreams) {
        await this.setupChangeStreams();
      }

      // Start pipeline monitoring
      if (this.config.enablePipelineMonitoring) {
        await this.startPipelineMonitoring();
      }

      console.log('Advanced data pipeline system initialized successfully');

    } catch (error) {
      console.error('Error initializing data pipeline system:', error);
      throw error;
    }
  }

  async setupPipelineInfrastructure() {
    console.log('Setting up pipeline infrastructure...');

    try {
      // Create collections for pipeline management
      const collections = {
        pipelineDefinitions: this.db.collection('pipeline_definitions'),
        pipelineRuns: this.db.collection('pipeline_runs'),
        pipelineMetrics: this.db.collection('pipeline_metrics'),
        dataLineage: this.db.collection('data_lineage'),
        pipelineErrors: this.db.collection('pipeline_errors'),
        transformationRules: this.db.collection('transformation_rules')
      };

      // Create indexes for optimal performance
      await collections.pipelineRuns.createIndex(
        { pipelineId: 1, startTime: -1 },
        { background: true }
      );

      await collections.pipelineMetrics.createIndex(
        { pipelineId: 1, timestamp: -1 },
        { background: true }
      );

      await collections.dataLineage.createIndex(
        { sourceCollection: 1, targetCollection: 1, timestamp: -1 },
        { background: true }
      );

      this.collections = collections;

    } catch (error) {
      console.error('Error setting up pipeline infrastructure:', error);
      throw error;
    }
  }

  async registerDataPipeline(pipelineDefinition) {
    console.log(`Registering data pipeline: ${pipelineDefinition.name}`);

    try {
      // Validate pipeline definition
      const validatedDefinition = await this.validatePipelineDefinition(pipelineDefinition);

      // Enhanced pipeline definition with metadata
      const enhancedDefinition = {
        ...validatedDefinition,
        pipelineId: this.generatePipelineId(validatedDefinition.name),

        // Pipeline metadata
        registeredAt: new Date(),
        version: pipelineDefinition.version || '1.0.0',
        status: 'registered',

        // Processing configuration
        processingMode: pipelineDefinition.processingMode || 'stream', // stream, batch, hybrid
        triggerType: pipelineDefinition.triggerType || 'change_stream', // change_stream, schedule, manual

        // Data transformation pipeline
        transformationStages: pipelineDefinition.transformationStages || [],

        // Data sources and targets
        dataSources: pipelineDefinition.dataSources || [],
        dataTargets: pipelineDefinition.dataTargets || [],

        // Quality and validation rules
        dataQualityRules: pipelineDefinition.dataQualityRules || [],
        schemaValidationRules: pipelineDefinition.schemaValidationRules || [],

        // Performance configuration
        performance: {
          batchSize: pipelineDefinition.batchSize || this.config.batchSize,
          maxConcurrency: pipelineDefinition.maxConcurrency || 5,
          timeoutMs: pipelineDefinition.timeoutMs || 300000,

          // Resource limits
          maxMemoryMB: pipelineDefinition.maxMemoryMB || 1024,
          maxCpuPercent: pipelineDefinition.maxCpuPercent || 80
        },

        // Error handling configuration
        errorHandling: {
          retryStrategy: pipelineDefinition.retryStrategy || 'exponential_backoff',
          maxRetries: pipelineDefinition.maxRetries || this.config.maxRetries,
          deadLetterQueue: pipelineDefinition.deadLetterQueue !== false,
          errorNotifications: pipelineDefinition.errorNotifications || []
        },

        // Monitoring configuration
        monitoring: {
          enableMetrics: pipelineDefinition.enableMetrics !== false,
          metricsInterval: pipelineDefinition.metricsInterval || 60000,
          alertThresholds: pipelineDefinition.alertThresholds || {}
        }
      };

      // Store pipeline definition
      await this.collections.pipelineDefinitions.replaceOne(
        { pipelineId: enhancedDefinition.pipelineId },
        enhancedDefinition,
        { upsert: true }
      );

      // Register pipeline in memory
      this.pipelines.set(enhancedDefinition.pipelineId, {
        definition: enhancedDefinition,
        status: 'registered',
        lastRun: null,
        statistics: {
          totalRuns: 0,
          successfulRuns: 0,
          failedRuns: 0,
          totalRecordsProcessed: 0,
          averageProcessingTime: 0
        }
      });

      console.log(`Pipeline '${enhancedDefinition.name}' registered successfully with ID: ${enhancedDefinition.pipelineId}`);

      // Start pipeline if configured for automatic startup
      if (enhancedDefinition.autoStart) {
        await this.startPipeline(enhancedDefinition.pipelineId);
      }

      return {
        success: true,
        pipelineId: enhancedDefinition.pipelineId,
        definition: enhancedDefinition
      };

    } catch (error) {
      console.error(`Error registering pipeline '${pipelineDefinition.name}':`, error);
      return {
        success: false,
        error: error.message,
        pipelineDefinition: pipelineDefinition
      };
    }
  }

  async startPipeline(pipelineId) {
    console.log(`Starting data pipeline: ${pipelineId}`);

    try {
      const pipeline = this.pipelines.get(pipelineId);
      if (!pipeline) {
        throw new Error(`Pipeline not found: ${pipelineId}`);
      }

      if (pipeline.status === 'running') {
        console.log(`Pipeline ${pipelineId} is already running`);
        return { success: true, status: 'already_running' };
      }

      const definition = pipeline.definition;

      // Create pipeline run record
      const runRecord = {
        runId: this.generateRunId(),
        pipelineId: pipelineId,
        pipelineName: definition.name,
        startTime: new Date(),
        status: 'running',

        // Processing metrics
        recordsProcessed: 0,
        recordsSuccessful: 0,
        recordsFailed: 0,

        // Performance tracking
        processingTimeMs: 0,
        throughputRecordsPerSecond: 0,

        // Resource usage
        memoryUsageMB: 0,
        cpuUsagePercent: 0,

        // Error tracking
        errors: [],
        retryAttempts: 0
      };

      await this.collections.pipelineRuns.insertOne(runRecord);

      // Start processing based on trigger type
      switch (definition.triggerType) {
        case 'change_stream':
          await this.startChangeStreamPipeline(pipelineId, definition, runRecord);
          break;

        case 'schedule':
          await this.startScheduledPipeline(pipelineId, definition, runRecord);
          break;

        case 'batch':
          await this.startBatchPipeline(pipelineId, definition, runRecord);
          break;

        default:
          throw new Error(`Unsupported trigger type: ${definition.triggerType}`);
      }

      // Update pipeline status
      pipeline.status = 'running';
      pipeline.lastRun = runRecord;

      this.emit('pipelineStarted', {
        pipelineId: pipelineId,
        runId: runRecord.runId,
        startTime: runRecord.startTime
      });

      return {
        success: true,
        pipelineId: pipelineId,
        runId: runRecord.runId,
        status: 'running'
      };

    } catch (error) {
      console.error(`Error starting pipeline ${pipelineId}:`, error);

      // Update pipeline status to error
      const pipeline = this.pipelines.get(pipelineId);
      if (pipeline) {
        pipeline.status = 'error';
      }

      return {
        success: false,
        pipelineId: pipelineId,
        error: error.message
      };
    }
  }

  async startChangeStreamPipeline(pipelineId, definition, runRecord) {
    console.log(`Starting change stream pipeline: ${pipelineId}`);

    try {
      const dataSources = definition.dataSources;

      for (const dataSource of dataSources) {
        const collection = this.db.collection(dataSource.collection);

        // Configure change stream options
        const changeStreamOptions = {
          ...this.config.changeStreamOptions,
          ...dataSource.changeStreamOptions
        };

        // Pipeline-specific filters are aggregation stages passed to watch(),
        // not change stream options
        const watchPipeline = dataSource.filter
          ? [{ $match: dataSource.filter }]
          : [];

        // Create change stream
        const changeStream = collection.watch(watchPipeline, changeStreamOptions);

        // Store change stream reference
        this.changeStreams.set(`${pipelineId}_${dataSource.collection}`, changeStream);

        // Setup change stream event handlers
        changeStream.on('change', async (changeEvent) => {
          await this.processChangeEvent(pipelineId, definition, runRecord, changeEvent);
        });

        changeStream.on('error', async (error) => {
          console.error(`Change stream error for pipeline ${pipelineId}:`, error);
          await this.handlePipelineError(pipelineId, runRecord, error);
        });

        changeStream.on('close', () => {
          console.log(`Change stream closed for pipeline ${pipelineId}`);
          this.emit('pipelineStreamClosed', { pipelineId, collection: dataSource.collection });
        });
      }

    } catch (error) {
      console.error(`Error starting change stream pipeline ${pipelineId}:`, error);
      throw error;
    }
  }

  async processChangeEvent(pipelineId, definition, runRecord, changeEvent) {
    try {
      // Track processing start
      const processingStart = Date.now();

      // Apply transformation stages
      let processedData = changeEvent;

      for (const transformationStage of definition.transformationStages) {
        processedData = await this.applyTransformation(
          processedData, 
          transformationStage, 
          definition
        );
      }

      // Apply data quality validation
      if (this.config.enableDataValidation) {
        const validationResult = await this.validateData(
          processedData, 
          definition.dataQualityRules
        );

        if (!validationResult.isValid) {
          await this.handleValidationError(pipelineId, runRecord, processedData, validationResult);
          return;
        }
      }

      // Write to target destinations
      const writeResults = await this.writeToTargets(
        processedData, 
        definition.dataTargets, 
        definition
      );

      // Update run metrics
      const processingTime = Date.now() - processingStart;

      await this.updateRunMetrics(runRecord, {
        recordsProcessed: 1,
        recordsSuccessful: writeResults.successCount,
        recordsFailed: writeResults.failureCount,
        processingTimeMs: processingTime
      });

      // Record data lineage if enabled
      if (this.config.enableDataLineage) {
        await this.recordDataLineage(pipelineId, changeEvent, processedData, definition);
      }

      this.emit('recordProcessed', {
        pipelineId: pipelineId,
        runId: runRecord.runId,
        changeEvent: changeEvent,
        processedData: processedData,
        processingTime: processingTime
      });

    } catch (error) {
      console.error(`Error processing change event for pipeline ${pipelineId}:`, error);
      await this.handleProcessingError(pipelineId, runRecord, changeEvent, error);
    }
  }

  async applyTransformation(data, transformationStage, pipelineDefinition) {
    console.log(`Applying transformation: ${transformationStage.type}`);

    try {
      switch (transformationStage.type) {
        case 'aggregation':
          return await this.applyAggregationTransformation(data, transformationStage);

        case 'field_mapping':
          return await this.applyFieldMapping(data, transformationStage);

        case 'data_enrichment':
          return await this.applyDataEnrichment(data, transformationStage, pipelineDefinition);

        case 'filtering':
          return await this.applyFiltering(data, transformationStage);

        case 'normalization':
          return await this.applyNormalization(data, transformationStage);

        case 'custom_function':
          return await this.applyCustomFunction(data, transformationStage);

        default:
          console.warn(`Unknown transformation type: ${transformationStage.type}`);
          return data;
      }

    } catch (error) {
      console.error(`Error applying transformation ${transformationStage.type}:`, error);
      throw error;
    }
  }

  async applyAggregationTransformation(data, transformationStage) {
    // Apply MongoDB aggregation pipeline to transform data
    const pipeline = transformationStage.aggregationPipeline;

    if (!Array.isArray(pipeline) || pipeline.length === 0) {
      return data;
    }

    try {
      // Execute aggregation on source data
      // This would work with the actual data structure in a real implementation
      let transformedData = data;

      // Simulate aggregation operations
      for (const stage of pipeline) {
        if (stage.$project) {
          transformedData = this.projectFields(transformedData, stage.$project);
        } else if (stage.$match) {
          transformedData = this.matchFilter(transformedData, stage.$match);
        } else if (stage.$addFields) {
          transformedData = this.addFields(transformedData, stage.$addFields);
        }
        // Add more aggregation operators as needed
      }

      return transformedData;

    } catch (error) {
      console.error('Error in aggregation transformation:', error);
      throw error;
    }
  }

  async applyFieldMapping(data, transformationStage) {
    // Apply field mapping transformation
    const mappings = transformationStage.fieldMappings;

    if (!mappings || Object.keys(mappings).length === 0) {
      return data;
    }

    try {
      let mappedData = { ...data };

      // Apply field mappings
      Object.entries(mappings).forEach(([targetField, sourceField]) => {
        const sourceValue = this.getNestedValue(data, sourceField);
        this.setNestedValue(mappedData, targetField, sourceValue);
      });

      return mappedData;

    } catch (error) {
      console.error('Error in field mapping transformation:', error);
      throw error;
    }
  }

  async applyDataEnrichment(data, transformationStage, pipelineDefinition) {
    // Apply data enrichment from external sources
    const enrichmentConfig = transformationStage.enrichmentConfig;

    try {
      let enrichedData = { ...data };

      for (const enrichment of enrichmentConfig.enrichments) {
        switch (enrichment.type) {
          case 'lookup':
            enrichedData = await this.applyLookupEnrichment(enrichedData, enrichment);
            break;

          case 'calculation':
            enrichedData = await this.applyCalculationEnrichment(enrichedData, enrichment);
            break;

          case 'external_api':
            enrichedData = await this.applyExternalApiEnrichment(enrichedData, enrichment);
            break;
        }
      }

      return enrichedData;

    } catch (error) {
      console.error('Error in data enrichment transformation:', error);
      throw error;
    }
  }

  async writeToTargets(processedData, dataTargets, pipelineDefinition) {
    console.log('Writing processed data to targets...');

    const writeResults = {
      successCount: 0,
      failureCount: 0,
      results: []
    };

    try {
      const writePromises = dataTargets.map(async (target) => {
        try {
          const result = await this.writeToTarget(processedData, target, pipelineDefinition);
          writeResults.successCount++;
          writeResults.results.push({ target: target.name, success: true, result });
          return result;

        } catch (error) {
          console.error(`Error writing to target ${target.name}:`, error);
          writeResults.failureCount++;
          writeResults.results.push({ 
            target: target.name, 
            success: false, 
            error: error.message 
          });
          throw error;
        }
      });

      await Promise.allSettled(writePromises);

      return writeResults;

    } catch (error) {
      console.error('Error writing to targets:', error);
      throw error;
    }
  }

  async writeToTarget(processedData, target, pipelineDefinition) {
    console.log(`Writing to target: ${target.name} (${target.type})`);

    try {
      switch (target.type) {
        case 'mongodb_collection':
          return await this.writeToMongoDBCollection(processedData, target);

        case 'file':
          return await this.writeToFile(processedData, target);

        case 'external_api':
          return await this.writeToExternalAPI(processedData, target);

        case 'message_queue':
          return await this.writeToMessageQueue(processedData, target);

        default:
          throw new Error(`Unsupported target type: ${target.type}`);
      }

    } catch (error) {
      console.error(`Error writing to target ${target.name}:`, error);
      throw error;
    }
  }

  async writeToMongoDBCollection(processedData, target) {
    const collection = this.db.collection(target.collection);

    try {
      switch (target.writeMode || 'insert') {
        case 'insert':
          const insertResult = await collection.insertOne(processedData);
          return { operation: 'insert', insertedId: insertResult.insertedId };

        case 'upsert':
          const upsertResult = await collection.replaceOne(
            target.upsertFilter || { _id: processedData._id },
            processedData,
            { upsert: true }
          );
          return { 
            operation: 'upsert', 
            modifiedCount: upsertResult.modifiedCount,
            upsertedId: upsertResult.upsertedId
          };

        case 'update':
          const updateResult = await collection.updateOne(
            target.updateFilter || { _id: processedData._id },
            { $set: processedData }
          );
          return {
            operation: 'update',
            matchedCount: updateResult.matchedCount,
            modifiedCount: updateResult.modifiedCount
          };

        default:
          throw new Error(`Unsupported write mode: ${target.writeMode}`);
      }

    } catch (error) {
      console.error('Error writing to MongoDB collection:', error);
      throw error;
    }
  }

  async getPipelineMetrics(pipelineId, timeRange = {}) {
    console.log(`Getting metrics for pipeline: ${pipelineId}`);

    try {
      const pipeline = this.pipelines.get(pipelineId);
      if (!pipeline) {
        throw new Error(`Pipeline not found: ${pipelineId}`);
      }

      // Build time range filter
      const timeFilter = {};
      if (timeRange.startTime) {
        timeFilter.$gte = new Date(timeRange.startTime);
      }
      if (timeRange.endTime) {
        timeFilter.$lte = new Date(timeRange.endTime);
      }

      const matchStage = { pipelineId: pipelineId };
      if (Object.keys(timeFilter).length > 0) {
        matchStage.startTime = timeFilter;
      }

      // Aggregate pipeline metrics
      const metricsAggregation = [
        { $match: matchStage },
        {
          $group: {
            _id: '$pipelineId',
            totalRuns: { $sum: 1 },
            successfulRuns: { 
              $sum: { $cond: [{ $eq: ['$status', 'completed'] }, 1, 0] } 
            },
            failedRuns: { 
              $sum: { $cond: [{ $eq: ['$status', 'failed'] }, 1, 0] } 
            },
            totalRecordsProcessed: { $sum: '$recordsProcessed' },
            totalRecordsSuccessful: { $sum: '$recordsSuccessful' },
            totalRecordsFailed: { $sum: '$recordsFailed' },

            // Performance metrics
            averageProcessingTime: { $avg: '$processingTimeMs' },
            maxProcessingTime: { $max: '$processingTimeMs' },
            minProcessingTime: { $min: '$processingTimeMs' },

            // Throughput metrics
            averageThroughput: { $avg: '$throughputRecordsPerSecond' },
            maxThroughput: { $max: '$throughputRecordsPerSecond' },

            // Resource usage
            averageMemoryUsage: { $avg: '$memoryUsageMB' },
            maxMemoryUsage: { $max: '$memoryUsageMB' },
            averageCpuUsage: { $avg: '$cpuUsagePercent' },
            maxCpuUsage: { $max: '$cpuUsagePercent' },

            // Time range
            firstRun: { $min: '$startTime' },
            lastRun: { $max: '$startTime' }
          }
        }
      ];

      const metricsResult = await this.collections.pipelineRuns
        .aggregate(metricsAggregation)
        .toArray();

      const metrics = metricsResult[0] || {
        _id: pipelineId,
        totalRuns: 0,
        successfulRuns: 0,
        failedRuns: 0,
        totalRecordsProcessed: 0,
        totalRecordsSuccessful: 0,
        totalRecordsFailed: 0,
        averageProcessingTime: 0,
        averageThroughput: 0,
        averageMemoryUsage: 0,
        averageCpuUsage: 0
      };

      // Calculate additional derived metrics
      const successRate = metrics.totalRuns > 0 ? 
        (metrics.successfulRuns / metrics.totalRuns) * 100 : 0;

      const dataQualityRate = metrics.totalRecordsProcessed > 0 ? 
        (metrics.totalRecordsSuccessful / metrics.totalRecordsProcessed) * 100 : 0;

      return {
        success: true,
        pipelineId: pipelineId,
        timeRange: timeRange,

        // Basic metrics
        totalRuns: metrics.totalRuns,
        successfulRuns: metrics.successfulRuns,
        failedRuns: metrics.failedRuns,
        successRate: Math.round(successRate * 100) / 100,

        // Data processing metrics
        totalRecordsProcessed: metrics.totalRecordsProcessed,
        totalRecordsSuccessful: metrics.totalRecordsSuccessful,
        totalRecordsFailed: metrics.totalRecordsFailed,
        dataQualityRate: Math.round(dataQualityRate * 100) / 100,

        // Performance metrics
        performance: {
          averageProcessingTimeMs: Math.round(metrics.averageProcessingTime || 0),
          maxProcessingTimeMs: metrics.maxProcessingTime || 0,
          minProcessingTimeMs: metrics.minProcessingTime || 0,
          averageThroughputRps: Math.round((metrics.averageThroughput || 0) * 100) / 100,
          maxThroughputRps: Math.round((metrics.maxThroughput || 0) * 100) / 100
        },

        // Resource usage
        resourceUsage: {
          averageMemoryMB: Math.round(metrics.averageMemoryUsage || 0),
          maxMemoryMB: metrics.maxMemoryUsage || 0,
          averageCpuPercent: Math.round((metrics.averageCpuUsage || 0) * 100) / 100,
          maxCpuPercent: Math.round((metrics.maxCpuUsage || 0) * 100) / 100
        },

        // Time range
        timeSpan: {
          firstRun: metrics.firstRun,
          lastRun: metrics.lastRun,
          duration: metrics.firstRun && metrics.lastRun ? 
            metrics.lastRun.getTime() - metrics.firstRun.getTime() : 0
        },

        // Pipeline status
        currentStatus: pipeline.status,
        lastRunStatus: pipeline.lastRun ? pipeline.lastRun.status : null
      };

    } catch (error) {
      console.error(`Error getting pipeline metrics for ${pipelineId}:`, error);
      return {
        success: false,
        pipelineId: pipelineId,
        error: error.message
      };
    }
  }

  async stopPipeline(pipelineId) {
    console.log(`Stopping pipeline: ${pipelineId}`);

    try {
      const pipeline = this.pipelines.get(pipelineId);
      if (!pipeline) {
        throw new Error(`Pipeline not found: ${pipelineId}`);
      }

      // Stop change streams
      for (const [streamKey, changeStream] of this.changeStreams.entries()) {
        if (streamKey.startsWith(pipelineId)) {
          await changeStream.close();
          this.changeStreams.delete(streamKey);
        }
      }

      // Update pipeline status
      pipeline.status = 'stopped';

      // Update current run if exists
      if (pipeline.lastRun && pipeline.lastRun.status === 'running') {
        await this.collections.pipelineRuns.updateOne(
          { runId: pipeline.lastRun.runId },
          {
            $set: {
              status: 'stopped',
              endTime: new Date(),
              processingTimeMs: Date.now() - pipeline.lastRun.startTime.getTime()
            }
          }
        );
      }

      this.emit('pipelineStopped', {
        pipelineId: pipelineId,
        stopTime: new Date()
      });

      return {
        success: true,
        pipelineId: pipelineId,
        status: 'stopped'
      };

    } catch (error) {
      console.error(`Error stopping pipeline ${pipelineId}:`, error);
      return {
        success: false,
        pipelineId: pipelineId,
        error: error.message
      };
    }
  }

  // Utility methods for data processing

  getNestedValue(obj, path) {
    return path.split('.').reduce((current, key) => current && current[key], obj);
  }

  setNestedValue(obj, path, value) {
    const keys = path.split('.');
    const lastKey = keys.pop();
    const target = keys.reduce((current, key) => {
      if (!current[key]) current[key] = {};
      return current[key];
    }, obj);
    target[lastKey] = value;
  }

  projectFields(data, projection) {
    const result = {};
    Object.entries(projection).forEach(([field, include]) => {
      if (include) {
        const value = this.getNestedValue(data, field);
        if (value !== undefined) {
          this.setNestedValue(result, field, value);
        }
      }
    });
    return result;
  }

  matchFilter(data, filter) {
    // Simplified match implementation
    // In production, would implement full MongoDB query matching
    for (const [field, condition] of Object.entries(filter)) {
      const value = this.getNestedValue(data, field);

      if (typeof condition === 'object' && condition !== null) {
        // Handle operators like $eq, $ne, $gt, etc.
        for (const [operator, operand] of Object.entries(condition)) {
          switch (operator) {
            case '$eq':
              if (value !== operand) return null;
              break;
            case '$ne':
              if (value === operand) return null;
              break;
            case '$gt':
              if (value <= operand) return null;
              break;
            case '$gte':
              if (value < operand) return null;
              break;
            case '$lt':
              if (value >= operand) return null;
              break;
            case '$lte':
              if (value > operand) return null;
              break;
            case '$in':
              if (!operand.includes(value)) return null;
              break;
            case '$nin':
              if (operand.includes(value)) return null;
              break;
          }
        }
      } else {
        // Direct value comparison
        if (value !== condition) return null;
      }
    }

    return data;
  }

  addFields(data, fieldsToAdd) {
    const result = { ...data };

    Object.entries(fieldsToAdd).forEach(([field, expression]) => {
      // Simplified field addition
      // In production, would implement full MongoDB expression evaluation
      if (typeof expression === 'string' && expression.startsWith('$')) {
        // Reference to another field
        const referencedValue = this.getNestedValue(data, expression.slice(1));
        this.setNestedValue(result, field, referencedValue);
      } else {
        // Literal value
        this.setNestedValue(result, field, expression);
      }
    });

    return result;
  }

  generatePipelineId(name) {
    return `pipeline_${name.toLowerCase().replace(/\s+/g, '_')}_${Date.now()}`;
  }

  generateRunId() {
    return `run_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
  }

  async validatePipelineDefinition(definition) {
    // Validate required fields
    if (!definition.name) {
      throw new Error('Pipeline name is required');
    }

    if (!definition.dataSources || definition.dataSources.length === 0) {
      throw new Error('At least one data source is required');
    }

    if (!definition.dataTargets || definition.dataTargets.length === 0) {
      throw new Error('At least one data target is required');
    }

    // Add more validation as needed
    return definition;
  }

  async updateRunMetrics(runRecord, metrics) {
    try {
      const updateData = {};

      if (metrics.recordsProcessed) {
        updateData.$inc = { recordsProcessed: metrics.recordsProcessed };
      }

      if (metrics.recordsSuccessful) {
        updateData.$inc = { ...updateData.$inc, recordsSuccessful: metrics.recordsSuccessful };
      }

      if (metrics.recordsFailed) {
        updateData.$inc = { ...updateData.$inc, recordsFailed: metrics.recordsFailed };
      }

      if (metrics.processingTimeMs) {
        updateData.$set = { 
          lastProcessingTime: metrics.processingTimeMs,
          lastUpdateTime: new Date()
        };
      }

      if (Object.keys(updateData).length > 0) {
        await this.collections.pipelineRuns.updateOne(
          { runId: runRecord.runId },
          updateData
        );
      }

    } catch (error) {
      console.error('Error updating run metrics:', error);
    }
  }

  async handlePipelineError(pipelineId, runRecord, error) {
    console.error(`Pipeline error for ${pipelineId}:`, error);

    try {
      // Record error
      const errorRecord = {
        pipelineId: pipelineId,
        runId: runRecord.runId,
        errorTime: new Date(),
        errorType: error.constructor.name,
        errorMessage: error.message,
        errorStack: error.stack,

        // Context information
        processingContext: {
          recordsProcessedBeforeError: runRecord.recordsProcessed,
          runDuration: Date.now() - runRecord.startTime.getTime()
        }
      };

      await this.collections.pipelineErrors.insertOne(errorRecord);

      // Update run status
      await this.collections.pipelineRuns.updateOne(
        { runId: runRecord.runId },
        {
          $set: {
            status: 'failed',
            endTime: new Date(),
            errorMessage: error.message
          },
          $push: { errors: errorRecord }
        }
      );

      // Update pipeline status
      const pipeline = this.pipelines.get(pipelineId);
      if (pipeline) {
        pipeline.status = 'error';
        pipeline.statistics.failedRuns++;
      }

      this.emit('pipelineError', {
        pipelineId: pipelineId,
        runId: runRecord.runId,
        error: errorRecord
      });

    } catch (recordingError) {
      console.error('Error recording pipeline error:', recordingError);
    }
  }

  async validateData(data, qualityRules) {
    // Implement data quality validation logic
    const validationResult = {
      isValid: true,
      errors: [],
      warnings: []
    };

    // Apply quality rules
    for (const rule of qualityRules) {
      try {
        const ruleResult = await this.applyQualityRule(data, rule);
        if (!ruleResult.passed) {
          validationResult.isValid = false;
          validationResult.errors.push({
            rule: rule.name,
            message: ruleResult.message,
            field: rule.field,
            value: this.getNestedValue(data, rule.field)
          });
        }
      } catch (error) {
        validationResult.warnings.push({
          rule: rule.name,
          message: `Rule validation failed: ${error.message}`
        });
      }
    }

    return validationResult;
  }

  async applyQualityRule(data, rule) {
    // Implement specific quality rule logic
    switch (rule.type) {
      case 'required':
        const value = this.getNestedValue(data, rule.field);
        const isPresent = value !== null && value !== undefined && value !== '';
        return {
          passed: isPresent,
          message: isPresent ? 'Field is present' : `Required field '${rule.field}' is missing`
        };

      case 'type':
        const fieldValue = this.getNestedValue(data, rule.field);
        const actualType = typeof fieldValue;
        return {
          passed: actualType === rule.expectedType,
          message: actualType === rule.expectedType ? 
            'Type validation passed' : 
            `Expected type '${rule.expectedType}' but got '${actualType}'`
        };

      case 'range':
        const numericValue = this.getNestedValue(data, rule.field);
        const inRange = numericValue >= rule.min && numericValue <= rule.max;
        return {
          passed: inRange,
          message: inRange ? 
            'Value is within range' : 
            `Value ${numericValue} is outside range [${rule.min}, ${rule.max}]`
        };

      default:
        return { passed: true, message: 'Unknown rule type' };
    }
  }

  async recordDataLineage(pipelineId, originalData, processedData, definition) {
    try {
      const lineageRecord = {
        pipelineId: pipelineId,
        timestamp: new Date(),

        // Data sources
        dataSources: definition.dataSources.map(source => ({
          collection: source.collection,
          database: source.database || this.db.databaseName
        })),

        // Data targets
        dataTargets: definition.dataTargets.map(target => ({
          collection: target.collection,
          database: target.database || this.db.databaseName,
          type: target.type
        })),

        // Transformation metadata
        transformations: definition.transformationStages.map(stage => ({
          type: stage.type,
          applied: true
        })),

        // Data checksums for integrity verification
        originalDataChecksum: this.calculateChecksum(originalData),
        processedDataChecksum: this.calculateChecksum(processedData),

        // Record identifiers
        originalRecordId: originalData._id || originalData.id,
        processedRecordId: processedData._id || processedData.id
      };

      await this.collections.dataLineage.insertOne(lineageRecord);

    } catch (error) {
      console.error('Error recording data lineage:', error);
      // Don't throw - lineage recording shouldn't stop pipeline execution
    }
  }

  calculateChecksum(data) {
    // Simple checksum calculation for demonstration
    // In production, would use proper hashing algorithm
    const dataString = JSON.stringify(data, Object.keys(data).sort());
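    // e.g. Node's crypto module could be used instead (assumption, not wired in above):
    // require('crypto').createHash('sha256').update(dataString).digest('hex')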
    let hash = 0;
    for (let i = 0; i < dataString.length; i++) {
      const char = dataString.charCodeAt(i);
      hash = ((hash << 5) - hash) + char;
      hash = hash & hash; // Convert to 32bit integer
    }
    return hash.toString(36);
  }

  async shutdown() {
    console.log('Shutting down data pipeline manager...');

    try {
      // Stop all running pipelines
      for (const [pipelineId, pipeline] of this.pipelines.entries()) {
        if (pipeline.status === 'running') {
          await this.stopPipeline(pipelineId);
        }
      }

      // Close all change streams
      for (const [streamKey, changeStream] of this.changeStreams.entries()) {
        await changeStream.close();
      }

      // Close MongoDB connection
      if (this.client) {
        await this.client.close();
      }

      console.log('Data pipeline manager shutdown complete');

    } catch (error) {
      console.error('Error during shutdown:', error);
    }
  }

  // Additional methods would include implementations for:
  // - setupChangeStreams()
  // - startPipelineMonitoring()
  // - startScheduledPipeline()
  // - startBatchPipeline()
  // - applyLookupEnrichment()
  // - applyCalculationEnrichment()
  // - applyExternalApiEnrichment()
  // - applyFiltering()
  // - applyNormalization()
  // - applyCustomFunction()
  // - writeToFile()
  // - writeToExternalAPI()
  // - writeToMessageQueue()
  // - handleValidationError()
  // - handleProcessingError()
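
  // Hedged sketch of one helper from the list above: a minimal
  // applyNormalization() that trims and lowercases configured string fields.
  // The transformationStage.normalizeFields shape is an assumption.
  async applyNormalization(data, transformationStage) {
    const normalized = { ...data };
    for (const field of transformationStage.normalizeFields || []) {
      const value = this.getNestedValue(normalized, field);
      if (typeof value === 'string') {
        this.setNestedValue(normalized, field, value.trim().toLowerCase());
      }
    }
    return normalized;
  }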
}

// Benefits of MongoDB Advanced Data Pipeline Management:
// - Real-time stream processing with Change Streams
// - Sophisticated data transformation and enrichment capabilities  
// - Comprehensive error handling and recovery mechanisms
// - Built-in data quality validation and monitoring
// - Automatic scalability and performance optimization
// - Data lineage tracking and audit capabilities
// - Flexible pipeline orchestration and scheduling
// - SQL-compatible operations through QueryLeaf integration
// - Production-ready monitoring and observability features
// - Enterprise-grade reliability and fault tolerance

module.exports = {
  AdvancedDataPipelineManager
};
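
As a hedged usage sketch, the manager above could be wired up along the following lines. The connection string, module path, collection names, and field mappings are illustrative assumptions rather than values defined earlier in this post:

// Minimal usage sketch; URI, module path, collections, and mappings are assumptions.
const { AdvancedDataPipelineManager } = require('./data-pipeline-manager');

async function main() {
  const manager = new AdvancedDataPipelineManager('mongodb://localhost:27017/ecommerce', {
    enableChangeStreams: true,
    enableDataValidation: true
  });

  // Wait for the async initialization started in the constructor
  await manager.ready;

  // Register a simple change-stream pipeline: orders -> order_summaries
  const registration = await manager.registerDataPipeline({
    name: 'order summary sync',
    triggerType: 'change_stream',
    dataSources: [{ name: 'orders', collection: 'orders' }],
    transformationStages: [{
      type: 'field_mapping',
      fieldMappings: {
        orderId: 'fullDocument._id',
        customerId: 'fullDocument.customer_id',
        total: 'fullDocument.total_amount'
      }
    }],
    dataQualityRules: [{ name: 'order id required', type: 'required', field: 'orderId' }],
    dataTargets: [{
      name: 'order_summaries',
      type: 'mongodb_collection',
      collection: 'order_summaries',
      writeMode: 'upsert'
    }]
  });

  if (registration.success) {
    await manager.startPipeline(registration.pipelineId);
  }

  manager.on('recordProcessed', (event) => {
    console.log(`Pipeline ${event.pipelineId} processed a change in ${event.processingTime}ms`);
  });
}

main().catch(console.error);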

Advanced Stream Processing Patterns

Real-Time Data Transformation and Analytics

Implement sophisticated stream processing for real-time data analytics:

// Advanced real-time stream processing and analytics
class RealTimeStreamProcessor extends AdvancedDataPipelineManager {
  constructor(mongoUri, streamConfig) {
    super(mongoUri, streamConfig);

    this.streamConfig = {
      ...streamConfig,
      enableWindowedProcessing: true,
      enableEventTimeProcessing: true,
      enableComplexEventProcessing: true,
      enableStreamAggregation: true
    };

    this.windowManager = new Map();
    this.eventPatterns = new Map();
    this.streamState = new Map();

    this.setupStreamProcessing();
  }

  async processEventStream(streamDefinition) {
    console.log('Setting up advanced event stream processing...');

    try {
      const streamProcessor = {
        streamId: this.generateStreamId(streamDefinition.name),
        definition: streamDefinition,

        // Windowing configuration
        windowConfig: {
          type: streamDefinition.windowType || 'tumbling', // tumbling, hopping, sliding
          size: streamDefinition.windowSize || 60000, // 1 minute
          advance: streamDefinition.windowAdvance || 30000 // 30 seconds
        },

        // Processing configuration
        processingConfig: {
          enableLateEvents: streamDefinition.enableLateEvents || false, // accept late-arriving events within the watermark delay
          watermarkDelay: streamDefinition.watermarkDelay || 5000,
          enableExactlyOnceProcessing: streamDefinition.enableExactlyOnceProcessing || false
        },

        // Analytics configuration
        analyticsConfig: {
          enableAggregation: streamDefinition.enableAggregation !== false,
          enablePatternDetection: streamDefinition.enablePatternDetection || false,
          enableAnomalyDetection: streamDefinition.enableAnomalyDetection || false,
          enableTrendAnalysis: streamDefinition.enableTrendAnalysis || false
        }
      };

      return await this.deployStreamProcessor(streamProcessor);

    } catch (error) {
      console.error('Error processing event stream:', error);
      return {
        success: false,
        error: error.message
      };
    }
  }

  async deployStreamProcessor(streamProcessor) {
    console.log(`Deploying stream processor: ${streamProcessor.streamId}`);

    try {
      // Setup windowed processing
      if (this.streamConfig.enableWindowedProcessing) {
        await this.setupWindowedProcessing(streamProcessor);
      }

      // Setup complex event processing
      if (this.streamConfig.enableComplexEventProcessing) {
        await this.setupComplexEventProcessing(streamProcessor);
      }

      // Setup stream aggregation
      if (this.streamConfig.enableStreamAggregation) {
        await this.setupStreamAggregation(streamProcessor);
      }

      return {
        success: true,
        streamId: streamProcessor.streamId,
        processorConfig: streamProcessor
      };

    } catch (error) {
      console.error(`Error deploying stream processor ${streamProcessor.streamId}:`, error);
      return {
        success: false,
        streamId: streamProcessor.streamId,
        error: error.message
      };
    }
  }
}
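
A brief, hedged example of driving this processor is shown below. The URI and stream definition values are assumptions, and the windowing, CEP, and aggregation setup helpers referenced in the class are assumed to be implemented elsewhere:

// Illustrative only: values below are assumptions about the expected shape.
async function deployOrderEventStream() {
  const processor = new RealTimeStreamProcessor('mongodb://localhost:27017/analytics', {
    enableChangeStreams: true,
    enableMetrics: true
  });

  const result = await processor.processEventStream({
    name: 'order_events',
    windowType: 'tumbling',   // tumbling, hopping, or sliding
    windowSize: 60000,        // 1-minute windows
    windowAdvance: 60000,
    watermarkDelay: 5000,     // tolerate 5 seconds of late-arriving events
    enableAggregation: true,
    enableAnomalyDetection: false
  });

  if (result.success) {
    console.log(`Stream processor deployed: ${result.streamId}`);
  } else {
    console.error(`Stream deployment failed: ${result.error}`);
  }
}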

SQL-Style Data Pipeline Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB data pipeline management:

-- QueryLeaf advanced data pipeline operations with SQL-familiar syntax for MongoDB

-- Pipeline definition and configuration
CREATE OR REPLACE PIPELINE customer_data_enrichment_pipeline
AS
WITH pipeline_config AS (
    -- Pipeline metadata and configuration
    SELECT 
        'customer_data_enrichment' as pipeline_name,
        'stream' as processing_mode,
        'change_stream' as trigger_type,
        true as auto_start,

        -- Performance configuration
        1000 as batch_size,
        5 as max_concurrency,
        300000 as timeout_ms,

        -- Quality configuration
        true as enable_data_validation,
        true as enable_schema_evolution,
        true as enable_data_lineage,

        -- Error handling
        'exponential_backoff' as retry_strategy,
        3 as max_retries,
        true as dead_letter_queue
),

data_sources AS (
    -- Define data sources for pipeline
    SELECT ARRAY[
        JSON_BUILD_OBJECT(
            'name', 'customer_changes',
            'collection', 'customers',
            'database', 'ecommerce',
            'filter', JSON_BUILD_OBJECT(
                'operationType', JSON_BUILD_OBJECT('$in', ARRAY['insert', 'update'])
            ),
            'change_stream_options', JSON_BUILD_OBJECT(
                'fullDocument', 'updateLookup',
                'fullDocumentBeforeChange', 'whenAvailable'
            )
        ),
        JSON_BUILD_OBJECT(
            'name', 'order_changes',
            'collection', 'orders',
            'database', 'ecommerce',
            'filter', JSON_BUILD_OBJECT(
                'fullDocument.customer_id', JSON_BUILD_OBJECT('$exists', true)
            )
        )
    ] as sources
),

transformation_stages AS (
    -- Define transformation pipeline stages
    SELECT ARRAY[
        -- Stage 1: Data enrichment with external lookups
        JSON_BUILD_OBJECT(
            'type', 'data_enrichment',
            'name', 'customer_profile_enrichment',
            'enrichment_config', JSON_BUILD_OBJECT(
                'enrichments', ARRAY[
                    JSON_BUILD_OBJECT(
                        'type', 'lookup',
                        'lookup_collection', 'customer_profiles',
                        'lookup_field', 'customer_id',
                        'source_field', 'fullDocument.customer_id',
                        'target_field', 'customer_profile'
                    ),
                    JSON_BUILD_OBJECT(
                        'type', 'calculation',
                        'calculations', ARRAY[
                            JSON_BUILD_OBJECT(
                                'field', 'customer_lifetime_value',
                                'expression', 'customer_profile.total_orders * customer_profile.avg_order_value'
                            ),
                            JSON_BUILD_OBJECT(
                                'field', 'customer_segment',
                                'expression', 'CASE WHEN customer_lifetime_value > 1000 THEN "premium" WHEN customer_lifetime_value > 500 THEN "standard" ELSE "basic" END'
                            )
                        ]
                    )
                ]
            )
        ),

        -- Stage 2: Field mapping and normalization
        JSON_BUILD_OBJECT(
            'type', 'field_mapping',
            'name', 'customer_data_mapping',
            'field_mappings', JSON_BUILD_OBJECT(
                'customer_id', 'fullDocument.customer_id',
                'customer_email', 'fullDocument.email',
                'customer_name', 'fullDocument.full_name',
                'customer_phone', 'fullDocument.phone_number',
                'registration_date', 'fullDocument.created_at',
                'last_login', 'fullDocument.last_login_at',
                'profile_completion', 'customer_profile.completion_percentage',
                'lifetime_value', 'customer_lifetime_value',
                'segment', 'customer_segment',
                'change_type', 'operationType',
                'change_timestamp', 'clusterTime'
            )
        ),

        -- Stage 3: Data validation and quality checks
        JSON_BUILD_OBJECT(
            'type', 'data_validation',
            'name', 'customer_data_validation',
            'validation_rules', ARRAY[
                JSON_BUILD_OBJECT(
                    'field', 'customer_email',
                    'type', 'email',
                    'required', true
                ),
                JSON_BUILD_OBJECT(
                    'field', 'customer_phone',
                    'type', 'phone',
                    'required', false
                ),
                JSON_BUILD_OBJECT(
                    'field', 'lifetime_value',
                    'type', 'numeric',
                    'min_value', 0,
                    'max_value', 100000
                )
            ]
        ),

        -- Stage 4: Aggregation for analytics
        JSON_BUILD_OBJECT(
            'type', 'aggregation',
            'name', 'customer_analytics_aggregation',
            'aggregation_pipeline', ARRAY[
                JSON_BUILD_OBJECT(
                    '$addFields', JSON_BUILD_OBJECT(
                        'processing_date', '$$NOW',
                        'data_freshness_score', JSON_BUILD_OBJECT(
                            '$subtract', ARRAY[100, JSON_BUILD_OBJECT(
                                '$divide', ARRAY[
                                    JSON_BUILD_OBJECT('$subtract', ARRAY['$$NOW', '$change_timestamp']),
                                    3600000  -- Convert to hours
                                ]
                            )]
                        ),
                        'engagement_score', JSON_BUILD_OBJECT(
                            '$multiply', ARRAY[
                                '$profile_completion',
                                JSON_BUILD_OBJECT('$cond', ARRAY[
                                    JSON_BUILD_OBJECT('$ne', ARRAY['$last_login', NULL]),
                                    1.2,  -- Boost for active users
                                    1.0
                                ])
                            ]
                        )
                    )
                ),
                JSON_BUILD_OBJECT(
                    '$addFields', JSON_BUILD_OBJECT(
                        'customer_score', JSON_BUILD_OBJECT(
                            '$add', ARRAY[
                                JSON_BUILD_OBJECT('$multiply', ARRAY['$lifetime_value', 0.4]),
                                JSON_BUILD_OBJECT('$multiply', ARRAY['$engagement_score', 0.3]),
                                JSON_BUILD_OBJECT('$multiply', ARRAY['$data_freshness_score', 0.3])
                            ]
                        )
                    )
                )
            ]
        )
    ] as stages
),

data_targets AS (
    -- Define output destinations
    SELECT ARRAY[
        JSON_BUILD_OBJECT(
            'name', 'enriched_customers',
            'type', 'mongodb_collection',
            'collection', 'enriched_customers',
            'database', 'analytics',
            'write_mode', 'upsert',
            'upsert_filter', JSON_BUILD_OBJECT('customer_id', '$customer_id')
        ),
        JSON_BUILD_OBJECT(
            'name', 'customer_analytics_stream',
            'type', 'message_queue',
            'queue_name', 'customer_analytics',
            'format', 'json',
            'partition_key', 'customer_segment'
        ),
        JSON_BUILD_OBJECT(
            'name', 'data_warehouse_export',
            'type', 'file',
            'file_path', '/data/exports/customer_enrichment',
            'format', 'parquet',
            'partition_by', ARRAY['segment', 'processing_date']
        )
    ] as targets
),

data_quality_rules AS (
    -- Define comprehensive data quality rules
    SELECT ARRAY[
        JSON_BUILD_OBJECT(
            'name', 'required_customer_id',
            'type', 'required',
            'field', 'customer_id',
            'severity', 'critical'
        ),
        JSON_BUILD_OBJECT(
            'name', 'valid_email_format',
            'type', 'regex',
            'field', 'customer_email',
            'pattern', '^[\\w\\.-]+@[\\w\\.-]+\\.[a-zA-Z]{2,}$',
            'severity', 'high'
        ),
        JSON_BUILD_OBJECT(
            'name', 'reasonable_lifetime_value',
            'type', 'range',
            'field', 'lifetime_value',
            'min', 0,
            'max', 50000,
            'severity', 'medium'
        ),
        JSON_BUILD_OBJECT(
            'name', 'valid_customer_segment',
            'type', 'enum',
            'field', 'segment',
            'allowed_values', ARRAY['premium', 'standard', 'basic'],
            'severity', 'high'
        )
    ] as rules
)

-- Create the pipeline with comprehensive configuration
SELECT 
    'customer_data_enrichment_pipeline' as pipeline_name,
    pipeline_config.*,
    data_sources.sources,
    transformation_stages.stages,
    data_targets.targets,
    data_quality_rules.rules,

    -- Pipeline scheduling
    JSON_BUILD_OBJECT(
        'schedule_type', 'real_time',
        'trigger_conditions', ARRAY[
            'customer_data_change',
            'order_completion',
            'profile_update'
        ]
    ) as scheduling_config,

    -- Monitoring configuration  
    JSON_BUILD_OBJECT(
        'enable_metrics', true,
        'metrics_interval_seconds', 60,
        'alert_thresholds', JSON_BUILD_OBJECT(
            'error_rate_percent', 5,
            'processing_latency_ms', 5000,
            'throughput_records_per_second', 100
        ),
        'notification_channels', ARRAY[
            'email:data-team@company.com',
            'slack:#data-pipelines',
            'webhook:https://monitoring.company.com/alerts'
        ]
    ) as monitoring_config

FROM pipeline_config, data_sources, transformation_stages, data_targets, data_quality_rules;

-- Pipeline execution and monitoring queries

-- Real-time pipeline performance monitoring
WITH pipeline_performance AS (
    SELECT 
        pipeline_id,
        pipeline_name,
        run_id,
        start_time,
        end_time,
        status,

        -- Processing metrics
        records_processed,
        records_successful,
        records_failed,

        -- Performance calculations
        EXTRACT(EPOCH FROM (COALESCE(end_time, CURRENT_TIMESTAMP) - start_time)) * 1000 as duration_ms,

        -- Throughput calculation
        CASE 
            WHEN EXTRACT(EPOCH FROM (COALESCE(end_time, CURRENT_TIMESTAMP) - start_time)) > 0 THEN
                records_processed / EXTRACT(EPOCH FROM (COALESCE(end_time, CURRENT_TIMESTAMP) - start_time))
            ELSE 0
        END as throughput_records_per_second,

        -- Success rate
        CASE 
            WHEN records_processed > 0 THEN 
                (records_successful * 100.0) / records_processed
            ELSE 0
        END as success_rate_percent,

        -- Resource utilization
        memory_usage_mb,
        cpu_usage_percent,

        -- Current processing lag
        CASE 
            WHEN status = 'running' THEN 
                EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - last_processed_timestamp))
            ELSE NULL
        END as current_lag_seconds

    FROM pipeline_runs
    WHERE start_time >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
),

performance_summary AS (
    SELECT 
        pipeline_name,
        COUNT(*) as total_runs,
        COUNT(*) FILTER (WHERE status = 'completed') as successful_runs,
        COUNT(*) FILTER (WHERE status = 'failed') as failed_runs,
        COUNT(*) FILTER (WHERE status = 'running') as active_runs,

        -- Aggregate performance metrics
        SUM(records_processed) as total_records_processed,
        SUM(records_successful) as total_records_successful,
        SUM(records_failed) as total_records_failed,

        -- Performance statistics
        AVG(duration_ms) as avg_duration_ms,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ms) as p95_duration_ms,
        AVG(throughput_records_per_second) as avg_throughput_rps,
        MAX(throughput_records_per_second) as max_throughput_rps,

        -- Quality metrics
        AVG(success_rate_percent) as avg_success_rate,
        MIN(success_rate_percent) as min_success_rate,

        -- Resource usage
        AVG(memory_usage_mb) as avg_memory_usage_mb,
        MAX(memory_usage_mb) as max_memory_usage_mb,
        AVG(cpu_usage_percent) as avg_cpu_usage,
        MAX(cpu_usage_percent) as max_cpu_usage,

        -- Lag analysis
        AVG(current_lag_seconds) as avg_processing_lag_seconds,
        MAX(current_lag_seconds) as max_processing_lag_seconds

    FROM pipeline_performance
    GROUP BY pipeline_name
)

SELECT 
    pipeline_name,
    total_runs,
    successful_runs,
    failed_runs,
    active_runs,

    -- Overall health assessment
    CASE 
        WHEN failed_runs > total_runs * 0.1 THEN 'critical'
        WHEN avg_success_rate < 95 THEN 'warning'
        WHEN avg_processing_lag_seconds > 300 THEN 'warning'  -- 5 minutes lag
        WHEN max_cpu_usage > 90 OR max_memory_usage_mb > 4096 THEN 'warning'
        ELSE 'healthy'
    END as health_status,

    -- Processing statistics
    total_records_processed,
    total_records_successful,
    total_records_failed,

    -- Performance metrics
    ROUND(avg_duration_ms, 0) as avg_duration_ms,
    ROUND(p95_duration_ms, 0) as p95_duration_ms,
    ROUND(avg_throughput_rps, 2) as avg_throughput_rps,
    ROUND(max_throughput_rps, 2) as max_throughput_rps,

    -- Quality and reliability
    ROUND(avg_success_rate, 2) as avg_success_rate_percent,
    ROUND(min_success_rate, 2) as min_success_rate_percent,

    -- Resource utilization
    ROUND(avg_memory_usage_mb, 0) as avg_memory_usage_mb,
    ROUND(max_memory_usage_mb, 0) as max_memory_usage_mb,
    ROUND(avg_cpu_usage, 1) as avg_cpu_usage_percent,
    ROUND(max_cpu_usage, 1) as max_cpu_usage_percent,

    -- Processing lag indicators
    COALESCE(ROUND(avg_processing_lag_seconds, 0), 0) as avg_lag_seconds,
    COALESCE(ROUND(max_processing_lag_seconds, 0), 0) as max_lag_seconds,

    -- Operational recommendations
    CASE 
        WHEN failed_runs > total_runs * 0.05 THEN 'investigate_errors'
        WHEN avg_throughput_rps < 50 THEN 'optimize_performance'
        WHEN max_cpu_usage > 80 THEN 'scale_up_resources'
        WHEN avg_processing_lag_seconds > 120 THEN 'reduce_processing_latency'
        ELSE 'monitor_continued'
    END as recommendation,

    -- Capacity planning
    CASE 
        WHEN max_throughput_rps / avg_throughput_rps < 1.5 THEN 'add_capacity'
        WHEN max_memory_usage_mb > 3072 THEN 'increase_memory'
        WHEN active_runs > 1 THEN 'check_concurrency_limits'
        ELSE 'capacity_sufficient'
    END as capacity_recommendation

FROM performance_summary
ORDER BY 
    CASE health_status 
        WHEN 'critical' THEN 1 
        WHEN 'warning' THEN 2 
        ELSE 3 
    END,
    total_records_processed DESC;

-- Data lineage and quality tracking
WITH data_lineage_analysis AS (
    SELECT 
        pipeline_id,
        DATE_TRUNC('hour', timestamp) as processing_hour,

        -- Source and target tracking
        JSONB_ARRAY_ELEMENTS(data_sources) ->> 'collection' as source_collection,
        JSONB_ARRAY_ELEMENTS(data_targets) ->> 'collection' as target_collection,

        -- Data quality metrics
        COUNT(*) as total_transformations,
        COUNT(*) FILTER (WHERE original_data_checksum != processed_data_checksum) as data_modified,
        COUNT(DISTINCT original_record_id) as unique_source_records,
        COUNT(DISTINCT processed_record_id) as unique_target_records,

        -- Transformation tracking
        JSONB_ARRAY_ELEMENTS(transformations) ->> 'type' as transformation_type,
        COUNT(*) FILTER (WHERE (JSONB_ARRAY_ELEMENTS(transformations) ->> 'applied')::boolean = true) as transformations_applied,

        -- Data integrity checks
        COUNT(*) FILTER (WHERE original_data_checksum IS NOT NULL AND processed_data_checksum IS NOT NULL) as checksum_validations,

        -- Processing metadata
        MIN(timestamp) as first_processing,
        MAX(timestamp) as last_processing

    FROM data_lineage
    WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    GROUP BY 
        pipeline_id, 
        DATE_TRUNC('hour', timestamp),
        JSONB_ARRAY_ELEMENTS(data_sources) ->> 'collection',
        JSONB_ARRAY_ELEMENTS(data_targets) ->> 'collection',
        JSONB_ARRAY_ELEMENTS(transformations) ->> 'type'
),

quality_summary AS (
    SELECT 
        pipeline_id,
        source_collection,
        target_collection,
        transformation_type,

        -- Aggregated metrics
        SUM(total_transformations) as total_transformations,
        SUM(data_modified) as total_data_modified,
        SUM(unique_source_records) as total_source_records,
        SUM(unique_target_records) as total_target_records,
        SUM(transformations_applied) as total_transformations_applied,
        SUM(checksum_validations) as total_checksum_validations,

        -- Data quality calculations
        CASE 
            WHEN SUM(total_transformations) > 0 THEN
                (SUM(transformations_applied) * 100.0) / SUM(total_transformations)
            ELSE 0
        END as transformation_success_rate,

        CASE 
            WHEN SUM(unique_source_records) > 0 THEN
                (SUM(unique_target_records) * 100.0) / SUM(unique_source_records)
            ELSE 0
        END as record_completeness_rate,

        -- Data modification analysis
        CASE 
            WHEN SUM(total_transformations) > 0 THEN
                (SUM(data_modified) * 100.0) / SUM(total_transformations)
            ELSE 0
        END as data_modification_rate,

        -- Processing consistency
        COUNT(DISTINCT processing_hour) as processing_hours_active,
        AVG(EXTRACT(EPOCH FROM (last_processing - first_processing)) / 60) as avg_processing_window_minutes

    FROM data_lineage_analysis
    GROUP BY pipeline_id, source_collection, target_collection, transformation_type
)

SELECT 
    pipeline_id,
    source_collection,
    target_collection,
    transformation_type,

    -- Volume metrics
    total_source_records,
    total_target_records,
    total_transformations,
    total_transformations_applied,

    -- Quality scores
    ROUND(transformation_success_rate, 2) as transformation_success_percent,
    ROUND(record_completeness_rate, 2) as record_completeness_percent,
    ROUND(data_modification_rate, 2) as data_modification_percent,

    -- Data integrity assessment
    total_checksum_validations,
    CASE 
        WHEN total_checksum_validations > 0 AND transformation_success_rate > 98 THEN 'excellent'
        WHEN total_checksum_validations > 0 AND transformation_success_rate > 95 THEN 'good'
        WHEN total_checksum_validations > 0 AND transformation_success_rate > 90 THEN 'acceptable'
        ELSE 'needs_attention'
    END as data_quality_rating,

    -- Processing consistency
    processing_hours_active,
    ROUND(avg_processing_window_minutes, 1) as avg_processing_window_minutes,

    -- Operational insights
    CASE 
        WHEN record_completeness_rate < 98 THEN 'investigate_data_loss'
        WHEN transformation_success_rate < 95 THEN 'review_transformation_logic'
        WHEN data_modification_rate > 80 THEN 'validate_transformation_accuracy'
        WHEN avg_processing_window_minutes > 60 THEN 'optimize_processing_speed'
        ELSE 'quality_acceptable'
    END as quality_recommendation,

    -- Data flow health
    CASE 
        WHEN record_completeness_rate > 99 AND transformation_success_rate > 98 THEN 'healthy'
        WHEN record_completeness_rate > 95 AND transformation_success_rate > 95 THEN 'stable'
        WHEN record_completeness_rate > 90 AND transformation_success_rate > 90 THEN 'concerning'
        ELSE 'critical'
    END as data_flow_health

FROM quality_summary
WHERE total_transformations > 0
ORDER BY 
    CASE data_flow_health 
        WHEN 'critical' THEN 1 
        WHEN 'concerning' THEN 2 
        WHEN 'stable' THEN 3 
        ELSE 4 
    END,
    total_source_records DESC;

-- Error analysis and troubleshooting
SELECT 
    pe.pipeline_id,
    pe.run_id,
    pe.error_time,
    pe.error_type,
    pe.error_message,

    -- Error context
    pe.processing_context ->> 'recordsProcessedBeforeError' as records_before_error,
    pe.processing_context ->> 'runDuration' as run_duration_before_error,

    -- Error frequency analysis
    COUNT(*) OVER (
        PARTITION BY pe.pipeline_id, pe.error_type 
        ORDER BY pe.error_time 
        RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW
    ) as similar_errors_last_hour,

    -- Error pattern detection
    LAG(pe.error_time) OVER (
        PARTITION BY pe.pipeline_id, pe.error_type 
        ORDER BY pe.error_time
    ) as previous_similar_error,

    -- Pipeline run context
    pr.start_time as run_start_time,
    pr.records_processed as total_run_records,
    pr.status as run_status,

    -- Resolution tracking
    CASE 
        WHEN pe.error_type IN ('ValidationError', 'SchemaError') THEN 'data_quality_issue'
        WHEN pe.error_type IN ('ConnectionError', 'TimeoutError') THEN 'infrastructure_issue'
        WHEN pe.error_type IN ('TransformationError', 'ProcessingError') THEN 'logic_issue'
        ELSE 'unknown_category'
    END as error_category,

    -- Priority assessment
    CASE 
        WHEN COUNT(*) OVER (PARTITION BY pe.pipeline_id, pe.error_type ORDER BY pe.error_time RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW) > 10 THEN 'high'
        WHEN pe.error_type IN ('ConnectionError', 'TimeoutError') THEN 'high'
        WHEN pr.records_processed > 1000 THEN 'medium'
        ELSE 'low'
    END as error_priority,

    -- Suggested resolution
    CASE 
        WHEN pe.error_type = 'ValidationError' THEN 'Review data quality rules and source data format'
        WHEN pe.error_type = 'ConnectionError' THEN 'Check database connectivity and network stability'
        WHEN pe.error_type = 'TimeoutError' THEN 'Increase timeout values or optimize query performance'
        WHEN pe.error_type = 'TransformationError' THEN 'Review transformation logic and test with sample data'
        ELSE 'Investigate error stack trace and contact development team'
    END as suggested_resolution

FROM pipeline_errors pe
JOIN pipeline_runs pr ON pe.run_id = pr.run_id
WHERE pe.error_time >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
ORDER BY 
    CASE error_priority 
        WHEN 'high' THEN 1 
        WHEN 'medium' THEN 2 
        ELSE 3 
    END,
    pe.error_time DESC;

-- QueryLeaf provides comprehensive MongoDB data pipeline capabilities:
-- 1. Real-time change stream processing with SQL-familiar syntax
-- 2. Advanced data transformation and enrichment operations
-- 3. Comprehensive data quality validation and monitoring
-- 4. Pipeline orchestration and scheduling capabilities
-- 5. Data lineage tracking and audit functionality
-- 6. Error handling and troubleshooting tools
-- 7. Performance monitoring and optimization features
-- 8. Stream processing and windowed analytics
-- 9. SQL-style pipeline definition and management
-- 10. Enterprise-grade reliability and fault tolerance

Best Practices for Production Data Pipelines

Pipeline Architecture and Design Principles

Essential principles for effective MongoDB data pipeline deployment:

  1. Stream Processing Design: Implement real-time change stream processing for low-latency data operations
  2. Data Quality Management: Establish comprehensive validation rules and monitoring for data integrity
  3. Error Handling Strategy: Design robust error handling with retry mechanisms and dead letter queues (see the retry sketch after this list)
  4. Performance Optimization: Optimize pipeline throughput with appropriate batching and concurrency settings
  5. Monitoring Integration: Implement comprehensive monitoring for pipeline health and data quality metrics
  6. Schema Evolution: Plan for schema changes and backward compatibility in data transformations
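
Point 3 above can be sketched directly against the Node.js driver. The example below is a minimal illustration rather than part of any library API: processEvent is a user-supplied handler and the pipeline_dead_letters collection name is an assumption:

// Minimal retry-with-dead-letter sketch for pipeline event processing
async function processWithRetry(db, event, processEvent, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      await processEvent(event);
      return { success: true, attempts: attempt };
    } catch (error) {
      if (attempt === maxRetries) {
        // Retries exhausted: park the event in a dead letter collection for inspection
        await db.collection('pipeline_dead_letters').insertOne({
          event: event,
          error: error.message,
          attempts: attempt,
          failedAt: new Date()
        });
        return { success: false, attempts: attempt, deadLettered: true };
      }

      // Exponential backoff with a small jitter before the next attempt
      const delayMs = Math.min(30000, 1000 * 2 ** (attempt - 1)) + Math.random() * 250;
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}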

Scalability and Production Operations

Optimize data pipeline operations for enterprise-scale requirements:

  1. Resource Management: Configure appropriate resource limits and scaling policies for pipeline execution
  2. Data Lineage: Track data transformations and dependencies for auditing and troubleshooting
  3. Backup and Recovery: Implement pipeline state backup and recovery mechanisms, for example by persisting change stream resume tokens (see the sketch after this list)
  4. Security Integration: Ensure pipeline operations meet security and compliance requirements
  5. Operational Integration: Integrate pipeline monitoring with existing alerting and operational workflows
  6. Cost Optimization: Monitor resource usage and optimize pipeline efficiency for cost-effective operations
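
For point 3, change stream resume tokens provide a simple pipeline checkpoint mechanism. The sketch below assumes illustrative names (an 'orders' source collection, a 'pipeline_checkpoints' collection, and a caller-supplied handleChange function):

// Minimal pipeline-state recovery sketch using change stream resume tokens
async function runResumablePipeline(db, pipelineId, handleChange) {
  const checkpoints = db.collection('pipeline_checkpoints');
  const saved = await checkpoints.findOne({ _id: pipelineId });

  // Resume from the last persisted token if one exists, otherwise start fresh
  const watchOptions = saved && saved.resumeToken ? { resumeAfter: saved.resumeToken } : {};
  const changeStream = db.collection('orders').watch([], watchOptions);

  for await (const change of changeStream) {
    await handleChange(change);

    // Persist the resume token only after the event is fully processed so a
    // crash replays events at-least-once instead of silently dropping them
    await checkpoints.updateOne(
      { _id: pipelineId },
      { $set: { resumeToken: change._id, updatedAt: new Date() } },
      { upsert: true }
    );
  }
}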

Conclusion

MongoDB data pipeline management provides sophisticated real-time data processing capabilities that enable modern applications to handle complex data transformation workflows, stream processing, and ETL operations with advanced monitoring, error handling, and scalability features. The native change stream support and aggregation framework ensure that data pipelines can process high-volume data streams efficiently while maintaining data quality and reliability.

Key MongoDB Data Pipeline benefits include:

  • Real-Time Processing: Native change stream support for immediate data processing and transformation
  • Advanced Transformations: Comprehensive data transformation capabilities with aggregation framework integration
  • Data Quality Management: Built-in validation, monitoring, and quality assessment tools
  • Stream Processing: Sophisticated stream processing patterns for complex event processing and analytics
  • Pipeline Orchestration: Flexible pipeline scheduling and orchestration with error handling and recovery
  • SQL Accessibility: Familiar SQL-style pipeline operations through QueryLeaf for accessible data pipeline management

Whether you're building real-time analytics systems, data warehousing pipelines, microservices data synchronization, or complex ETL workflows, MongoDB data pipeline management with QueryLeaf's familiar SQL interface provides the foundation for sophisticated, scalable data processing operations.

QueryLeaf Integration: QueryLeaf automatically translates SQL-style pipeline operations into MongoDB's native change streams and aggregation pipelines, making advanced data processing functionality accessible to SQL-oriented development teams. Complex data transformations, stream processing operations, and pipeline orchestration are seamlessly handled through familiar SQL constructs, enabling sophisticated data workflows without requiring deep MongoDB pipeline expertise.

The combination of MongoDB's robust data pipeline capabilities with SQL-style pipeline management operations makes it an ideal platform for applications requiring both sophisticated real-time data processing and familiar database management patterns, ensuring your data pipelines can scale efficiently while maintaining reliability and performance as data volume and processing complexity grow.

MongoDB Time Series Data Storage and Optimization: Advanced Temporal Data Analytics and High-Performance Storage Strategies

Modern applications generate massive volumes of time-stamped data from IoT devices, system monitoring, financial markets, user analytics, and sensor networks. Managing temporal data efficiently requires specialized storage strategies that can handle high ingestion rates, optimize storage utilization, and provide fast analytical queries across time ranges. Traditional relational databases struggle with time series workloads due to inefficient storage patterns, limited compression capabilities, and poor query performance for temporal analytics.

MongoDB's time series collections provide purpose-built capabilities for temporal data management through advanced compression algorithms, optimized storage layouts, and specialized indexing strategies. Unlike traditional approaches that require complex partitioning schemes and manual optimization, MongoDB time series collections automatically optimize storage efficiency, query performance, and analytical capabilities while maintaining schema flexibility for diverse time-stamped data formats.

The Traditional Time Series Data Challenge

Conventional approaches to time series data management in relational databases face significant limitations:

-- Traditional PostgreSQL time series data handling - inefficient storage and limited optimization

-- Basic time series table with poor storage efficiency
CREATE TABLE sensor_readings (
    reading_id SERIAL,
    device_id VARCHAR(50) NOT NULL,
    sensor_type VARCHAR(50) NOT NULL,
    location VARCHAR(100),
    timestamp TIMESTAMP NOT NULL,

    -- Measurements stored as separate columns (inflexible schema)
    temperature DECIMAL(5,2),
    humidity DECIMAL(5,2),
    pressure DECIMAL(7,2),
    battery_level INTEGER,
    signal_strength INTEGER,

    -- Limited metadata support
    device_metadata JSONB,

    -- Basic audit fields
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Partition key must be included in the primary key
    PRIMARY KEY (reading_id, timestamp)
) PARTITION BY RANGE (timestamp);

-- Manual index creation (indexes must be maintained across every partition)
CREATE INDEX idx_sensor_readings_timestamp ON sensor_readings(timestamp DESC);
CREATE INDEX idx_sensor_readings_device_time ON sensor_readings(device_id, timestamp DESC);
CREATE INDEX idx_sensor_readings_type_time ON sensor_readings(sensor_type, timestamp DESC);

-- Attempt at time-based partitioning (limited automation)
DO $$
DECLARE
    start_date DATE;
    end_date DATE;
    partition_name TEXT;
BEGIN
    start_date := DATE_TRUNC('month', CURRENT_DATE - INTERVAL '6 months');

    WHILE start_date <= DATE_TRUNC('month', CURRENT_DATE + INTERVAL '3 months') LOOP
        end_date := start_date + INTERVAL '1 month';
        partition_name := 'sensor_readings_' || TO_CHAR(start_date, 'YYYY_MM');

        EXECUTE format('
            CREATE TABLE IF NOT EXISTS %I PARTITION OF sensor_readings
            FOR VALUES FROM (%L) TO (%L)',
            partition_name, start_date, end_date);

        start_date := end_date;
    END LOOP;
END;
$$;

-- Time series aggregation queries (inefficient for large datasets)
WITH hourly_averages AS (
    SELECT 
        device_id,
        sensor_type,
        DATE_TRUNC('hour', timestamp) as hour_bucket,

        -- Basic aggregations (limited analytical functions)
        COUNT(*) as reading_count,
        AVG(temperature) as avg_temperature,
        AVG(humidity) as avg_humidity,
        AVG(pressure) as avg_pressure,
        MIN(temperature) as min_temperature,
        MAX(temperature) as max_temperature,

        -- Standard deviation calculations (expensive)
        STDDEV(temperature) as temp_stddev,
        STDDEV(humidity) as humidity_stddev,

        -- Battery and connectivity metrics
        AVG(battery_level) as avg_battery,
        AVG(signal_strength) as avg_signal_strength,

        -- Data quality metrics
        COUNT(*) FILTER (WHERE temperature IS NOT NULL) as valid_temp_readings,
        COUNT(*) FILTER (WHERE humidity IS NOT NULL) as valid_humidity_readings

    FROM sensor_readings sr
    WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    AND timestamp < CURRENT_TIMESTAMP
    GROUP BY device_id, sensor_type, DATE_TRUNC('hour', timestamp)
),

daily_summaries AS (
    SELECT 
        device_id,
        sensor_type,
        DATE_TRUNC('day', hour_bucket) as day_bucket,

        -- Aggregation of aggregations (double computation overhead)
        SUM(reading_count) as total_readings_per_day,
        AVG(avg_temperature) as daily_avg_temperature,
        MIN(min_temperature) as daily_min_temperature,
        MAX(max_temperature) as daily_max_temperature,
        AVG(avg_humidity) as daily_avg_humidity,
        AVG(avg_pressure) as daily_avg_pressure,

        -- Battery consumption analysis
        MIN(avg_battery) as daily_min_battery,
        AVG(avg_battery) as daily_avg_battery,

        -- Connectivity quality
        AVG(avg_signal_strength) as daily_avg_signal,

        -- Data completeness metrics
        ROUND(
            (SUM(valid_temp_readings) * 100.0) / NULLIF(SUM(reading_count), 0), 2
        ) as temperature_data_completeness_percent,

        ROUND(
            (SUM(valid_humidity_readings) * 100.0) / NULLIF(SUM(reading_count), 0), 2
        ) as humidity_data_completeness_percent

    FROM hourly_averages
    GROUP BY device_id, sensor_type, DATE_TRUNC('day', hour_bucket)
),

device_health_analysis AS (
    -- Complex analysis requiring multiple scans
    SELECT 
        ds.device_id,
        ds.sensor_type,
        COUNT(*) as analysis_days,

        -- Temperature trend analysis (limited analytical capabilities)
        AVG(ds.daily_avg_temperature) as overall_avg_temperature,
        STDDEV(ds.daily_avg_temperature) as temperature_variability,

        -- Battery degradation analysis
        CASE 
            WHEN COUNT(*) > 1 THEN
                -- Simple linear trend approximation
                (MAX(ds.daily_avg_battery) - MIN(ds.daily_avg_battery)) / NULLIF(COUNT(*) - 1, 0)
            ELSE NULL
        END as daily_battery_degradation_rate,

        -- Connectivity stability
        AVG(ds.daily_avg_signal) as avg_connectivity,
        STDDEV(ds.daily_avg_signal) as connectivity_stability,

        -- Data quality assessment
        AVG(ds.temperature_data_completeness_percent) as avg_data_completeness,

        -- Device status classification
        CASE 
            WHEN AVG(ds.daily_avg_battery) < 20 THEN 'low_battery'
            WHEN AVG(ds.daily_avg_signal) < 30 THEN 'poor_connectivity'  
            WHEN AVG(ds.temperature_data_completeness_percent) < 80 THEN 'unreliable_data'
            ELSE 'healthy'
        END as device_status,

        -- Alert generation
        ARRAY[
            CASE WHEN AVG(ds.daily_avg_battery) < 15 THEN 'CRITICAL_BATTERY' END,
            CASE WHEN AVG(ds.daily_avg_signal) < 20 THEN 'CRITICAL_CONNECTIVITY' END,
            CASE WHEN AVG(ds.temperature_data_completeness_percent) < 50 THEN 'DATA_QUALITY_ISSUE' END,
            CASE WHEN STDDEV(ds.daily_avg_temperature) > 10 THEN 'TEMPERATURE_ANOMALY' END
        ]::TEXT[] as active_alerts

    FROM daily_summaries ds
    WHERE ds.day_bucket >= CURRENT_DATE - INTERVAL '7 days'
    GROUP BY ds.device_id, ds.sensor_type
)
SELECT 
    device_id,
    sensor_type,
    analysis_days,

    -- Performance metrics
    ROUND(overall_avg_temperature, 2) as avg_temp,
    ROUND(temperature_variability, 2) as temp_variability,
    ROUND(daily_battery_degradation_rate, 4) as battery_degradation_per_day,
    ROUND(avg_connectivity, 1) as avg_signal_strength,
    ROUND(avg_data_completeness, 1) as data_completeness_percent,

    -- Status and alerts
    device_status,
    ARRAY_REMOVE(active_alerts, NULL) as alerts,

    -- Recommendations
    CASE device_status
        WHEN 'low_battery' THEN 'Schedule battery replacement or reduce sampling frequency'
        WHEN 'poor_connectivity' THEN 'Check network coverage or relocate device'
        WHEN 'unreliable_data' THEN 'Inspect device sensors and calibration'
        ELSE 'Device operating normally'
    END as recommendation

FROM device_health_analysis
ORDER BY 
    CASE device_status 
        WHEN 'low_battery' THEN 1
        WHEN 'poor_connectivity' THEN 2  
        WHEN 'unreliable_data' THEN 3
        ELSE 4
    END,
    overall_avg_temperature DESC;

-- Traditional approach problems:
-- 1. Inefficient storage - no automatic compression for time series patterns
-- 2. Manual partitioning overhead with limited automation
-- 3. Poor query performance for time range analytics
-- 4. Complex aggregation logic requiring multiple query stages
-- 5. Limited schema flexibility for diverse sensor data
-- 6. No built-in time series analytical functions
-- 7. Expensive index maintenance for time-based queries
-- 8. Poor compression ratios leading to high storage costs
-- 9. Complex retention policy implementation
-- 10. Limited support for high-frequency data ingestion

-- Attempt at high-frequency data insertion (poor performance)
INSERT INTO sensor_readings (
    device_id, sensor_type, location, timestamp,
    temperature, humidity, pressure, battery_level, signal_strength
)
VALUES 
    ('device_001', 'environmental', 'warehouse_a', '2024-10-14 10:00:00', 23.5, 45.2, 1013.2, 85, 75),
    ('device_001', 'environmental', 'warehouse_a', '2024-10-14 10:00:10', 23.6, 45.1, 1013.3, 85, 76),
    ('device_001', 'environmental', 'warehouse_a', '2024-10-14 10:00:20', 23.4, 45.3, 1013.1, 85, 74),
    ('device_002', 'environmental', 'warehouse_b', '2024-10-14 10:00:00', 24.1, 42.8, 1012.8, 90, 82),
    ('device_002', 'environmental', 'warehouse_b', '2024-10-14 10:00:10', 24.2, 42.9, 1012.9, 90, 83);
-- Individual inserts are extremely inefficient for high-frequency data

-- Range queries with limited optimization
SELECT 
    device_id,
    AVG(temperature) as avg_temp,
    COUNT(*) as reading_count
FROM sensor_readings
WHERE timestamp BETWEEN '2024-10-14 09:00:00' AND '2024-10-14 11:00:00'
    AND sensor_type = 'environmental'
GROUP BY device_id
ORDER BY avg_temp DESC;

-- Problems:
-- 1. Full table scan for time range queries despite indexing
-- 2. No automatic data compression reducing storage efficiency
-- 3. Poor aggregation performance for time-based analytics
-- 4. Limited analytical functions for time series analysis
-- 5. Complex retention and archival policy implementation
-- 6. No built-in support for irregular time intervals
-- 7. Inefficient handling of sparse data and missing measurements
-- 8. Manual optimization required for high ingestion rates
-- 9. Limited support for multi-metric time series analysis
-- 10. Complex downsampling and data summarization requirements

MongoDB provides sophisticated time series collection capabilities with automatic optimization:

// MongoDB Advanced Time Series Data Management
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('advanced_time_series');

// Comprehensive MongoDB Time Series Manager
class AdvancedTimeSeriesManager {
  constructor(db, config = {}) {
    this.db = db;
    this.config = {
      // Time series collection configuration
      defaultGranularity: config.defaultGranularity || 'seconds',
      defaultExpiration: config.defaultExpiration || 86400 * 30, // 30 days
      enableCompression: config.enableCompression !== false,

      // Bucketing and storage optimization
      bucketMaxSpanSeconds: config.bucketMaxSpanSeconds || 3600, // 1 hour
      bucketRoundingSeconds: config.bucketRoundingSeconds || 60, // 1 minute

      // Performance optimization
      enablePreAggregation: config.enablePreAggregation || false,
      aggregationLevels: config.aggregationLevels || ['hourly', 'daily'],
      enableAutomaticIndexing: config.enableAutomaticIndexing !== false,

      // Data retention and lifecycle
      enableAutomaticExpiration: config.enableAutomaticExpiration !== false,
      retentionPolicies: config.retentionPolicies || {
        raw: 7 * 24 * 3600,      // 7 days
        hourly: 90 * 24 * 3600,  // 90 days  
        daily: 365 * 24 * 3600   // 1 year
      },

      // Quality and monitoring
      enableDataQualityTracking: config.enableDataQualityTracking || false,
      enableAnomalyDetection: config.enableAnomalyDetection || false,
      alertingThresholds: config.alertingThresholds || {}
    };

    this.collections = new Map();
    this.aggregationPipelines = new Map();

    this.initializeTimeSeriesSystem();
  }

  async initializeTimeSeriesSystem() {
    console.log('Initializing advanced time series system...');

    try {
      // Time series collections are created on demand via createTimeSeriesCollection(),
      // so no upfront collection provisioning is needed here

      // Configure automatic aggregation pipelines  
      if (this.config.enablePreAggregation) {
        await this.setupPreAggregationPipelines();
      }

      // Setup data quality monitoring
      if (this.config.enableDataQualityTracking) {
        await this.setupDataQualityMonitoring();
      }

      // Initialize retention policies
      if (this.config.enableAutomaticExpiration) {
        await this.setupRetentionPolicies();
      }

      console.log('Time series system initialized successfully');

    } catch (error) {
      console.error('Error initializing time series system:', error);
      throw error;
    }
  }

  async createTimeSeriesCollection(collectionName, options = {}) {
    console.log(`Creating optimized time series collection: ${collectionName}`);

    try {
      const timeSeriesOptions = {
        timeseries: {
          timeField: options.timeField || 'timestamp',
          metaField: options.metaField || 'metadata'
        },

        // Automatic expiration configuration
        expireAfterSeconds: options.expireAfterSeconds || this.config.defaultExpiration
      };

      // Advanced bucketing configuration: MongoDB treats granularity and the custom
      // bucketing parameters as mutually exclusive, and bucketMaxSpanSeconds /
      // bucketRoundingSeconds must be set to the same value when they are used
      if (options.bucketMaxSpanSeconds || options.bucketRoundingSeconds) {
        const bucketSeconds = options.bucketMaxSpanSeconds || options.bucketRoundingSeconds;
        timeSeriesOptions.timeseries.bucketMaxSpanSeconds = bucketSeconds;
        timeSeriesOptions.timeseries.bucketRoundingSeconds = bucketSeconds;
      } else {
        timeSeriesOptions.timeseries.granularity = options.granularity || this.config.defaultGranularity;
      }

      // Storage optimization: only pass a storage engine override when compression is requested
      if (options.enableCompression ?? this.config.enableCompression) {
        timeSeriesOptions.storageEngine = {
          wiredTiger: { configString: 'block_compressor=zstd' }
        };
      }

      // Create the time series collection
      const collection = await this.db.createCollection(collectionName, timeSeriesOptions);

      // Store collection reference for management
      this.collections.set(collectionName, {
        collection: collection,
        config: timeSeriesOptions,
        createdAt: new Date()
      });

      // Create optimized indexes for time series queries
      await this.createTimeSeriesIndexes(collection, options);

      console.log(`Time series collection '${collectionName}' created successfully`);

      return {
        success: true,
        collectionName: collectionName,
        configuration: timeSeriesOptions,
        indexesCreated: true
      };

    } catch (error) {
      console.error(`Error creating time series collection '${collectionName}':`, error);
      return {
        success: false,
        error: error.message,
        collectionName: collectionName
      };
    }
  }

  async createTimeSeriesIndexes(collection, options = {}) {
    console.log('Creating optimized indexes for time series collection...');

    try {
      const indexes = [
        // Compound index for time range queries with metadata
        {
          key: { 
            [`${options.metaField || 'metadata'}.device_id`]: 1,
            [`${options.timeField || 'timestamp'}`]: -1 
          },
          name: 'device_time_idx',
          background: true
        },

        // Index for sensor type queries
        {
          key: { 
            [`${options.metaField || 'metadata'}.sensor_type`]: 1,
            [`${options.timeField || 'timestamp'}`]: -1 
          },
          name: 'sensor_time_idx',
          background: true
        },

        // Compound index for location-based queries
        {
          key: { 
            [`${options.metaField || 'metadata'}.location`]: 1,
            [`${options.metaField || 'metadata'}.device_id`]: 1,
            [`${options.timeField || 'timestamp'}`]: -1 
          },
          name: 'location_device_time_idx',
          background: true
        },

        // Index for data quality queries
        {
          key: { 
            [`${options.metaField || 'metadata'}.data_quality`]: 1,
            [`${options.timeField || 'timestamp'}`]: -1 
          },
          name: 'quality_time_idx',
          background: true,
          sparse: true
        }
      ];

      // Create all indexes
      await collection.createIndexes(indexes);

      console.log(`Created ${indexes.length} optimized indexes for time series collection`);

    } catch (error) {
      console.error('Error creating time series indexes:', error);
      throw error;
    }
  }

  async insertTimeSeriesData(collectionName, documents, options = {}) {
    console.log(`Inserting ${documents.length} time series documents into ${collectionName}...`);

    try {
      const collectionInfo = this.collections.get(collectionName);
      if (!collectionInfo) {
        throw new Error(`Time series collection '${collectionName}' not found`);
      }

      const collection = collectionInfo.collection;

      // Prepare documents for time series insertion
      const preparedDocuments = documents.map(doc => this.prepareTimeSeriesDocument(doc, options));

      // Execute optimized bulk insertion
      const insertOptions = {
        ordered: options.ordered !== undefined ? options.ordered : false,
        writeConcern: options.writeConcern || { w: 'majority', j: true },
        ...options.insertOptions
      };

      const insertResult = await collection.insertMany(preparedDocuments, insertOptions);

      // Update data quality metrics if enabled
      if (this.config.enableDataQualityTracking) {
        await this.updateDataQualityMetrics(collectionName, preparedDocuments);
      }

      // Trigger anomaly detection if enabled
      if (this.config.enableAnomalyDetection) {
        await this.checkForAnomalies(collectionName, preparedDocuments);
      }

      return {
        success: true,
        collectionName: collectionName,
        documentsInserted: insertResult.insertedCount,
        insertedIds: insertResult.insertedIds,

        // Performance metrics
        averageDocumentSize: this.calculateAverageDocumentSize(preparedDocuments),
        compressionEnabled: Boolean(collectionInfo.config.storageEngine),

        // Data quality summary
        dataQualityScore: options.trackQuality ? this.calculateDataQualityScore(preparedDocuments) : null
      };

    } catch (error) {
      console.error(`Error inserting time series data into '${collectionName}':`, error);
      return {
        success: false,
        error: error.message,
        collectionName: collectionName
      };
    }
  }

  prepareTimeSeriesDocument(document, options = {}) {
    // Ensure proper time series document structure
    const prepared = {
      timestamp: document.timestamp || new Date(),

      // Organize metadata for optimal bucketing
      metadata: {
        device_id: document.device_id || document.metadata?.device_id,
        sensor_type: document.sensor_type || document.metadata?.sensor_type,
        location: document.location || document.metadata?.location,

        // Device-specific metadata
        device_model: document.device_model || document.metadata?.device_model,
        firmware_version: document.firmware_version || document.metadata?.firmware_version,

        // Data quality indicators
        data_quality: options.calculateQuality ? this.assessDataQuality(document) : undefined,

        // Additional metadata preservation
        ...document.metadata
      },

      // Measurements with proper data types
      measurements: {
        // Environmental measurements
        temperature: this.validateMeasurement(document.temperature, 'temperature'),
        humidity: this.validateMeasurement(document.humidity, 'humidity'),
        pressure: this.validateMeasurement(document.pressure, 'pressure'),

        // Device status measurements
        battery_level: this.validateMeasurement(document.battery_level, 'battery'),
        signal_strength: this.validateMeasurement(document.signal_strength, 'signal'),

        // Custom measurements
        ...this.extractCustomMeasurements(document)
      }
    };

    // Remove undefined values to optimize storage
    this.removeUndefinedValues(prepared);

    return prepared;
  }
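
  // Design note: the metaField document above should contain only values that
  // identify a series and change rarely (device id, sensor type, location,
  // hardware model). MongoDB buckets measurements per unique metaField value,
  // so placing frequently-changing or high-cardinality values there fragments
  // buckets and hurts both compression and query performance.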

  validateMeasurement(value, measurementType) {
    if (value === null || value === undefined) return undefined;

    // Type-specific validation and normalization
    const validationRules = {
      temperature: { min: -50, max: 100, precision: 2 },
      humidity: { min: 0, max: 100, precision: 1 },
      pressure: { min: 900, max: 1100, precision: 1 },
      battery: { min: 0, max: 100, precision: 0 },
      signal: { min: 0, max: 100, precision: 0 }
    };

    const rule = validationRules[measurementType];
    if (!rule) return value; // No validation rule, return as-is

    const numericValue = Number(value);
    if (isNaN(numericValue)) return undefined;

    // Apply bounds checking
    const boundedValue = Math.max(rule.min, Math.min(rule.max, numericValue));

    // Apply precision rounding
    return Number(boundedValue.toFixed(rule.precision));
  }

  async performTimeSeriesAggregation(collectionName, aggregationRequest) {
    console.log(`Performing time series aggregation on ${collectionName}...`);

    try {
      const collectionInfo = this.collections.get(collectionName);
      if (!collectionInfo) {
        throw new Error(`Time series collection '${collectionName}' not found`);
      }

      const collection = collectionInfo.collection;

      // Build optimized aggregation pipeline
      const aggregationPipeline = this.buildTimeSeriesAggregationPipeline(aggregationRequest);

      // Execute aggregation with appropriate options
      const aggregationOptions = {
        allowDiskUse: true,
        maxTimeMS: aggregationRequest.maxTimeMS || 60000,
        hint: aggregationRequest.hint,
        comment: `time_series_aggregation_${Date.now()}`
      };

      const results = await collection.aggregate(aggregationPipeline, aggregationOptions).toArray();

      // Post-process results for enhanced analytics
      const processedResults = this.processAggregationResults(results, aggregationRequest);

      return {
        success: true,
        collectionName: collectionName,
        aggregationType: aggregationRequest.type,
        resultCount: results.length,

        // Aggregation results
        results: processedResults,

        // Execution metadata
        executionStats: {
          pipelineStages: aggregationPipeline.length,
          executionTime: Date.now(),
          dataPointsAnalyzed: this.estimateDataPointsAnalyzed(aggregationRequest)
        }
      };

    } catch (error) {
      console.error(`Error performing time series aggregation on '${collectionName}':`, error);
      return {
        success: false,
        error: error.message,
        collectionName: collectionName,
        aggregationType: aggregationRequest.type
      };
    }
  }

  buildTimeSeriesAggregationPipeline(request) {
    const pipeline = [];

    // Time range filtering (essential first stage for performance)
    if (request.timeRange) {
      pipeline.push({
        $match: {
          timestamp: {
            $gte: new Date(request.timeRange.start),
            $lte: new Date(request.timeRange.end)
          }
        }
      });
    }

    // Metadata filtering
    if (request.filters) {
      const matchConditions = {};

      if (request.filters.device_ids) {
        matchConditions['metadata.device_id'] = { $in: request.filters.device_ids };
      }

      if (request.filters.sensor_types) {
        matchConditions['metadata.sensor_type'] = { $in: request.filters.sensor_types };
      }

      if (request.filters.locations) {
        matchConditions['metadata.location'] = { $in: request.filters.locations };
      }

      if (Object.keys(matchConditions).length > 0) {
        pipeline.push({ $match: matchConditions });
      }
    }

    // Time-based grouping and aggregation
    switch (request.type) {
      case 'time_bucket_aggregation':
        pipeline.push(...this.buildTimeBucketAggregation(request));
        break;
      case 'device_summary':
        pipeline.push(...this.buildDeviceSummaryAggregation(request));
        break;
      case 'trend_analysis':
        pipeline.push(...this.buildTrendAnalysisAggregation(request));
        break;
      case 'anomaly_detection':
        pipeline.push(...this.buildAnomalyDetectionAggregation(request));
        break;
      default:
        pipeline.push(...this.buildDefaultAggregation(request));
    }

    // Result limiting and sorting
    if (request.sort) {
      pipeline.push({ $sort: request.sort });
    }

    if (request.limit) {
      pipeline.push({ $limit: request.limit });
    }

    return pipeline;
  }

  buildTimeBucketAggregation(request) {
    const bucketSize = request.bucketSize || 'hour';
    const bucketFormat = this.getBucketDateFormat(bucketSize);
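
    // Note: on MongoDB 5.0+ the $dateTrunc expression is usually a simpler and
    // faster way to bucket timestamps than the $dateToString/$dateFromString
    // round-trip used below; the string-based approach is kept here for
    // compatibility with older servers.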

    return [
      {
        $group: {
          _id: {
            time_bucket: {
              $dateFromString: {
                dateString: {
                  $dateToString: {
                    date: '$timestamp',
                    format: bucketFormat
                  }
                }
              }
            },
            device_id: '$metadata.device_id',
            sensor_type: '$metadata.sensor_type'
          },

          // Statistical aggregations
          measurement_count: { $sum: 1 },

          // Temperature statistics
          avg_temperature: { $avg: '$measurements.temperature' },
          min_temperature: { $min: '$measurements.temperature' },
          max_temperature: { $max: '$measurements.temperature' },
          temp_variance: { $stdDevPop: '$measurements.temperature' },

          // Humidity statistics
          avg_humidity: { $avg: '$measurements.humidity' },
          min_humidity: { $min: '$measurements.humidity' },
          max_humidity: { $max: '$measurements.humidity' },

          // Pressure statistics
          avg_pressure: { $avg: '$measurements.pressure' },
          pressure_range: {
            $subtract: [
              { $max: '$measurements.pressure' },
              { $min: '$measurements.pressure' }
            ]
          },

          // Device health metrics
          avg_battery_level: { $avg: '$measurements.battery_level' },
          min_battery_level: { $min: '$measurements.battery_level' },
          avg_signal_strength: { $avg: '$measurements.signal_strength' },

          // Data quality metrics
          data_completeness: {
            $avg: {
              $cond: {
                if: {
                  $and: [
                    { $ne: ['$measurements.temperature', null] },
                    { $ne: ['$measurements.humidity', null] },
                    { $ne: ['$measurements.pressure', null] }
                  ]
                },
                then: 1,
                else: 0
              }
            }
          },

          // Time range within bucket
          earliest_reading: { $min: '$timestamp' },
          latest_reading: { $max: '$timestamp' }
        }
      },

      // Post-processing and enrichment
      {
        $addFields: {
          time_bucket: '$_id.time_bucket',
          device_id: '$_id.device_id',
          sensor_type: '$_id.sensor_type',

          // Calculate additional metrics
          temperature_stability: {
            $cond: {
              if: { $gt: ['$temp_variance', 0] },
              then: { $divide: ['$temp_variance', '$avg_temperature'] },
              else: 0
            }
          },

          // Battery consumption rate (simplified)
          estimated_battery_consumption: {
            $subtract: [100, '$avg_battery_level']
          },

          // Data quality score
          data_quality_score: {
            $multiply: ['$data_completeness', 100]
          },

          // Bucket duration in minutes
          bucket_duration_minutes: {
            $divide: [
              { $subtract: ['$latest_reading', '$earliest_reading'] },
              60000
            ]
          }
        }
      },

      // Remove the grouped _id field
      {
        $project: { _id: 0 }
      }
    ];
  }

  buildDeviceSummaryAggregation(request) {
    return [
      {
        $group: {
          _id: '$metadata.device_id',

          // Basic metrics
          total_readings: { $sum: 1 },
          sensor_types: { $addToSet: '$metadata.sensor_type' },
          locations: { $addToSet: '$metadata.location' },

          // Time range
          first_reading: { $min: '$timestamp' },
          last_reading: { $max: '$timestamp' },

          // Environmental averages
          avg_temperature: { $avg: '$measurements.temperature' },
          avg_humidity: { $avg: '$measurements.humidity' },
          avg_pressure: { $avg: '$measurements.pressure' },

          // Environmental ranges
          temperature_range: {
            $subtract: [
              { $max: '$measurements.temperature' },
              { $min: '$measurements.temperature' }
            ]
          },
          humidity_range: {
            $subtract: [
              { $max: '$measurements.humidity' },
              { $min: '$measurements.humidity' }
            ]
          },

          // Device health metrics
          current_battery_level: { $last: '$measurements.battery_level' },
          min_battery_level: { $min: '$measurements.battery_level' },
          avg_signal_strength: { $avg: '$measurements.signal_strength' },
          min_signal_strength: { $min: '$measurements.signal_strength' },

          // Data quality assessment
          complete_readings: {
            $sum: {
              $cond: {
                if: {
                  $and: [
                    { $ne: ['$measurements.temperature', null] },
                    { $ne: ['$measurements.humidity', null] },
                    { $ne: ['$measurements.pressure', null] }
                  ]
                },
                then: 1,
                else: 0
              }
            }
          }
        }
      },

      {
        $addFields: {
          device_id: '$_id',

          // Operational duration
          operational_duration_hours: {
            $divide: [
              { $subtract: ['$last_reading', '$first_reading'] },
              3600000
            ]
          },

          // Reading frequency
          avg_reading_interval_minutes: {
            $cond: {
              if: { $gt: ['$total_readings', 1] },
              then: {
                $divide: [
                  { $subtract: ['$last_reading', '$first_reading'] },
                  { $multiply: [{ $subtract: ['$total_readings', 1] }, 60000] }
                ]
              },
              else: null
            }
          },

          // Data completeness percentage
          data_completeness_percent: {
            $multiply: [
              { $divide: ['$complete_readings', '$total_readings'] },
              100
            ]
          },

          // Device health status
          device_health_status: {
            $switch: {
              branches: [
                {
                  case: { $lt: ['$current_battery_level', 15] },
                  then: 'critical_battery'
                },
                {
                  case: { $lt: ['$avg_signal_strength', 30] },
                  then: 'poor_connectivity'
                },
                {
                  case: {
                    $lt: [
                      { $divide: ['$complete_readings', '$total_readings'] },
                      0.8
                    ]
                  },
                  then: 'data_quality_issues'
                }
              ],
              default: 'healthy'
            }
          }
        }
      },

      {
        $project: { _id: 0 }
      }
    ];
  }

  getBucketDateFormat(bucketSize) {
    const formats = {
      'minute': '%Y-%m-%d %H:%M:00',
      'hour': '%Y-%m-%d %H:00:00',
      'day': '%Y-%m-%d 00:00:00',
      'week': '%Y-%U 00:00:00', // Year-week (not parseable back by $dateFromString's default parser; prefer $dateTrunc with unit 'week' for weekly buckets)
      'month': '%Y-%m-01 00:00:00'
    };

    return formats[bucketSize] || formats['hour'];
  }

  async setupRetentionPolicies() {
    console.log('Setting up automatic data retention policies...');

    try {
      for (const [collectionName, collectionInfo] of this.collections.entries()) {
        // Time series collections do not support TTL indexes; document expiration
        // is controlled by the collection-level expireAfterSeconds option, which
        // can be adjusted after creation with the collMod command
        const expireAfterSeconds = collectionInfo.config.expireAfterSeconds;

        await this.db.command({
          collMod: collectionName,
          expireAfterSeconds: expireAfterSeconds
        });

        console.log(`Retention policy configured for ${collectionName}: ${expireAfterSeconds} seconds`);
      }

    } catch (error) {
      console.error('Error setting up retention policies:', error);
      throw error;
    }
  }

  async setupPreAggregationPipelines() {
    console.log('Setting up pre-aggregation pipelines...');

    // This would typically involve setting up MongoDB change streams
    // or scheduled aggregation jobs for common query patterns

    for (const level of this.config.aggregationLevels) {
      const pipelineName = `pre_aggregation_${level}`;

      // Store pipeline configuration for later execution
      this.aggregationPipelines.set(pipelineName, {
        level: level,
        schedule: this.getAggregationSchedule(level),
        pipeline: this.buildPreAggregationPipeline(level)
      });

      console.log(`Pre-aggregation pipeline configured for ${level} level`);
    }
  }

  // Utility methods for time series management

  calculateAverageDocumentSize(documents) {
    if (!documents || documents.length === 0) return 0;

    const totalSize = documents.reduce((size, doc) => {
      return size + JSON.stringify(doc).length;
    }, 0);

    return Math.round(totalSize / documents.length);
  }

  assessDataQuality(document) {
    let qualityScore = 0;
    let totalChecks = 0;

    // Check for presence of key measurements
    const measurements = ['temperature', 'humidity', 'pressure'];
    for (const measurement of measurements) {
      totalChecks++;
      if (document[measurement] !== null && document[measurement] !== undefined) {
        qualityScore++;
      }
    }

    // Check for reasonable value ranges
    if (document.temperature !== null && document.temperature >= -50 && document.temperature <= 100) {
      qualityScore += 0.5;
    }
    totalChecks += 0.5;

    if (document.humidity !== null && document.humidity >= 0 && document.humidity <= 100) {
      qualityScore += 0.5;
    }
    totalChecks += 0.5;

    return totalChecks > 0 ? qualityScore / totalChecks : 0;
  }

  extractCustomMeasurements(document) {
    const customMeasurements = {};
    const standardFields = ['timestamp', 'device_id', 'sensor_type', 'location', 'metadata', 'temperature', 'humidity', 'pressure', 'battery_level', 'signal_strength'];

    for (const [key, value] of Object.entries(document)) {
      if (!standardFields.includes(key) && typeof value === 'number') {
        customMeasurements[key] = value;
      }
    }

    return customMeasurements;
  }

  removeUndefinedValues(obj) {
    Object.keys(obj).forEach(key => {
      if (obj[key] === undefined) {
        delete obj[key];
      } else if (typeof obj[key] === 'object' && obj[key] !== null) {
        this.removeUndefinedValues(obj[key]);

        // Remove empty objects
        if (Object.keys(obj[key]).length === 0) {
          delete obj[key];
        }
      }
    });
  }

  processAggregationResults(results, request) {
    // Add additional context and calculations to aggregation results
    return results.map(result => ({
      ...result,

      // Add computed fields based on aggregation type
      aggregation_metadata: {
        request_type: request.type,
        generated_at: new Date(),
        bucket_size: request.bucketSize,
        time_range: request.timeRange
      }
    }));
  }

  estimateDataPointsAnalyzed(request) {
    // Simplified estimation based on time range and expected frequency
    if (!request.timeRange) return 'unknown';

    const timeRangeMs = new Date(request.timeRange.end) - new Date(request.timeRange.start);
    const assumedFrequencyMs = 60000; // Assume 1 minute intervals

    return Math.round(timeRangeMs / assumedFrequencyMs);
  }

  getAggregationSchedule(level) {
    // 6-field cron expressions: seconds minutes hours day-of-month month day-of-week
    const schedules = {
      'hourly': '0 0 * * * *',       // Top of every hour
      'daily': '0 0 0 * * *',        // Every day at midnight
      'weekly': '0 0 0 * * 0',       // Every Sunday at midnight
      'monthly': '0 0 0 1 * *'       // First day of every month at midnight
    };

    return schedules[level] || schedules['daily'];
  }

  buildPreAggregationPipeline(level) {
    // Simplified pre-aggregation pipeline
    // In production, this would be much more sophisticated
    return [
      {
        $match: {
          timestamp: {
            $gte: new Date(Date.now() - this.getLevelTimeRange(level))
          }
        }
      },
      {
        $group: {
          _id: {
            device_id: '$metadata.device_id',
            time_bucket: this.getTimeBucketExpression(level)
          },
          avg_temperature: { $avg: '$measurements.temperature' },
          avg_humidity: { $avg: '$measurements.humidity' },
          count: { $sum: 1 }
        }
      }
    ];
  }

  getLevelTimeRange(level) {
    const ranges = {
      'hourly': 24 * 60 * 60 * 1000,      // 1 day
      'daily': 30 * 24 * 60 * 60 * 1000,  // 30 days
      'weekly': 12 * 7 * 24 * 60 * 60 * 1000, // 12 weeks
      'monthly': 12 * 30 * 24 * 60 * 60 * 1000 // 12 months
    };

    return ranges[level] || ranges['daily'];
  }

  getTimeBucketExpression(level) {
    const expressions = {
      'hourly': {
        $dateFromString: {
          dateString: {
            $dateToString: {
              date: '$timestamp',
              format: '%Y-%m-%d %H:00:00'
            }
          }
        }
      },
      'daily': {
        $dateFromString: {
          dateString: {
            $dateToString: {
              date: '$timestamp',
              format: '%Y-%m-%d 00:00:00'
            }
          }
        }
      }
    };

    return expressions[level] || expressions['hourly'];
  }
}

// Benefits of MongoDB Advanced Time Series Collections:
// - Purpose-built storage optimization with automatic compression
// - Intelligent bucketing for optimal query performance  
// - Built-in retention policies and automatic data expiration
// - Advanced indexing strategies optimized for temporal queries
// - Schema flexibility for diverse sensor and measurement data
// - Native aggregation capabilities for time series analytics
// - Automatic storage optimization and compression
// - High ingestion performance for IoT and monitoring workloads
// - Built-in support for metadata organization and filtering
// - SQL-compatible time series operations through QueryLeaf integration

module.exports = {
  AdvancedTimeSeriesManager
};
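
The manager above can be exercised end to end with a short driver script. Everything below is illustrative: the connection string, database and collection names, the sample reading, and the './advanced-time-series-manager' module path are assumptions rather than part of any published API:

// Example usage of AdvancedTimeSeriesManager (all names and values illustrative)
const { MongoClient } = require('mongodb');
const { AdvancedTimeSeriesManager } = require('./advanced-time-series-manager');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('advanced_time_series');

  const manager = new AdvancedTimeSeriesManager(db, { defaultGranularity: 'seconds' });

  // Create the collection, load one sample reading, then run a device summary
  await manager.createTimeSeriesCollection('sensor_data', {
    timeField: 'timestamp',
    metaField: 'metadata',
    expireAfterSeconds: 60 * 60 * 24 * 30 // keep raw readings for 30 days
  });

  await manager.insertTimeSeriesData('sensor_data', [{
    device_id: 'device_001',
    sensor_type: 'environmental',
    location: 'warehouse_a',
    timestamp: new Date(),
    temperature: 23.5,
    humidity: 45.2,
    pressure: 1013.2,
    battery_level: 85,
    signal_strength: 75
  }]);

  const summary = await manager.performTimeSeriesAggregation('sensor_data', {
    type: 'device_summary',
    timeRange: { start: new Date(Date.now() - 24 * 3600 * 1000), end: new Date() }
  });
  console.log(summary.results);

  await client.close();
}

main().catch(console.error);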

Understanding MongoDB Time Series Architecture

Advanced Temporal Data Management and Storage Optimization Strategies

Implement sophisticated time series patterns for production MongoDB deployments:

// Production-ready MongoDB time series with enterprise-grade optimization and monitoring
class ProductionTimeSeriesManager extends AdvancedTimeSeriesManager {
  constructor(db, productionConfig) {
    super(db, productionConfig);

    this.productionConfig = {
      ...productionConfig,
      enableDistributedCollection: true,
      enableRealTimeAggregation: true,
      enablePredictiveAnalytics: true,
      enableAutomaticScaling: true,
      enableComplianceTracking: true,
      enableAdvancedAlerting: true
    };

    this.setupProductionOptimizations();
    this.initializeRealTimeProcessing();
    this.setupPredictiveAnalytics();
  }

  async implementDistributedTimeSeriesProcessing(collections, distributionStrategy) {
    console.log('Implementing distributed time series processing across multiple collections...');

    const distributedStrategy = {
      // Temporal sharding strategies
      temporalSharding: {
        enableTimeBasedSharding: true,
        shardingGranularity: 'monthly',
        automaticShardRotation: true,
        optimizeForQueryPatterns: true
      },

      // Data lifecycle management
      lifecycleManagement: {
        hotDataRetention: '7d',
        warmDataRetention: '90d', 
        coldDataArchival: '1y',
        automaticTiering: true
      },

      // Performance optimization
      performanceOptimization: {
        compressionOptimization: true,
        indexingOptimization: true,
        bucketingOptimization: true,
        aggregationOptimization: true
      }
    };

    return await this.deployDistributedTimeSeriesArchitecture(collections, distributedStrategy);
  }

  async setupAdvancedTimeSeriesAnalytics() {
    console.log('Setting up advanced time series analytics and machine learning capabilities...');

    const analyticsCapabilities = {
      // Real-time analytics
      realTimeAnalytics: {
        streamingAggregation: true,
        anomalyDetection: true,
        trendAnalysis: true,
        alertingPipelines: true
      },

      // Predictive analytics
      predictiveAnalytics: {
        forecastingModels: true,
        patternRecognition: true,
        seasonalityDetection: true,
        capacityPlanning: true
      },

      // Advanced reporting
      reportingCapabilities: {
        automaticDashboards: true,
        customMetrics: true,
        correlationAnalysis: true,
        performanceReporting: true
      }
    };

    return await this.deployAdvancedAnalytics(analyticsCapabilities);
  }
}

SQL-Style Time Series Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB time series operations and analytics:

-- QueryLeaf advanced time series operations with SQL-familiar syntax for MongoDB

-- Create optimized time series collection with advanced configuration
CREATE COLLECTION sensor_data AS TIME_SERIES (
  time_field = 'timestamp',
  meta_field = 'metadata',
  granularity = 'seconds',

  -- Storage optimization
  bucket_max_span_seconds = 3600,
  bucket_rounding_seconds = 60,
  expire_after_seconds = 2592000,  -- 30 days

  -- Compression settings
  enable_compression = true,
  compression_algorithm = 'zstd',

  -- Performance optimization
  enable_automatic_indexing = true,
  optimize_for_ingestion = true,
  optimize_for_analytics = true
);
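
-- For reference, a QueryLeaf definition like the one above would map roughly to the
-- following native driver call (a sketch only; the exact translation depends on the
-- QueryLeaf release). Note that in MongoDB itself granularity and the custom bucketing
-- parameters are mutually exclusive, and bucketMaxSpanSeconds / bucketRoundingSeconds
-- must be set to the same value when they are used:
--
--   await db.createCollection('sensor_data', {
--     timeseries: {
--       timeField: 'timestamp',
--       metaField: 'metadata',
--       granularity: 'seconds'
--     },
--     expireAfterSeconds: 2592000 // 30 days
--   });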

-- Advanced time series data insertion with automatic optimization
INSERT INTO sensor_data (
  timestamp,
  metadata.device_id,
  metadata.sensor_type,
  metadata.location,
  metadata.data_quality,
  measurements.temperature,
  measurements.humidity,
  measurements.pressure,
  measurements.battery_level,
  measurements.signal_strength
)
SELECT 
  -- Time series specific timestamp handling
  CASE 
    WHEN source_timestamp IS NOT NULL THEN source_timestamp
    ELSE CURRENT_TIMESTAMP
  END as timestamp,

  -- Metadata organization for optimal bucketing
  device_identifier as "metadata.device_id",
  sensor_classification as "metadata.sensor_type", 
  installation_location as "metadata.location",

  -- Data quality assessment
  CASE 
    WHEN temp_reading IS NOT NULL AND humidity_reading IS NOT NULL AND pressure_reading IS NOT NULL THEN 'complete'
    WHEN temp_reading IS NOT NULL OR humidity_reading IS NOT NULL THEN 'partial'
    ELSE 'incomplete'
  END as "metadata.data_quality",

  -- Validated measurements
  CASE 
    WHEN temp_reading BETWEEN -50 AND 100 THEN ROUND(temp_reading, 2)
    ELSE NULL
  END as "measurements.temperature",

  CASE 
    WHEN humidity_reading BETWEEN 0 AND 100 THEN ROUND(humidity_reading, 1)
    ELSE NULL  
  END as "measurements.humidity",

  CASE 
    WHEN pressure_reading BETWEEN 900 AND 1100 THEN ROUND(pressure_reading, 1)
    ELSE NULL
  END as "measurements.pressure",

  -- Device health measurements
  GREATEST(0, LEAST(100, battery_percentage)) as "measurements.battery_level",
  GREATEST(0, LEAST(100, connectivity_strength)) as "measurements.signal_strength"

FROM staging_sensor_readings
WHERE ingestion_timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
  AND device_identifier IS NOT NULL
  AND source_timestamp IS NOT NULL

-- Time series bulk insert configuration
WITH (
  batch_size = 5000,
  ordered_operations = false,
  write_concern = 'majority',
  enable_compression = true,
  bypass_document_validation = false
);

-- Advanced time-bucket aggregation with comprehensive analytics
WITH time_bucket_analysis AS (
  SELECT 
    -- Time bucketing with flexible granularity
    DATE_TRUNC('hour', timestamp) as time_bucket,
    metadata.device_id,
    metadata.sensor_type,
    metadata.location,

    -- Volume metrics
    COUNT(*) as reading_count,
    COUNT(measurements.temperature) as temp_reading_count,
    COUNT(measurements.humidity) as humidity_reading_count,
    COUNT(measurements.pressure) as pressure_reading_count,

    -- Temperature analytics
    AVG(measurements.temperature) as avg_temperature,
    MIN(measurements.temperature) as min_temperature,
    MAX(measurements.temperature) as max_temperature,
    STDDEV_POP(measurements.temperature) as temp_stddev,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY measurements.temperature) as temp_median,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY measurements.temperature) as temp_p95,

    -- Humidity analytics
    AVG(measurements.humidity) as avg_humidity,
    MIN(measurements.humidity) as min_humidity,
    MAX(measurements.humidity) as max_humidity,
    STDDEV_POP(measurements.humidity) as humidity_stddev,

    -- Pressure analytics  
    AVG(measurements.pressure) as avg_pressure,
    MIN(measurements.pressure) as min_pressure,
    MAX(measurements.pressure) as max_pressure,
    (MAX(measurements.pressure) - MIN(measurements.pressure)) as pressure_range,

    -- Device health analytics
    AVG(measurements.battery_level) as avg_battery,
    MIN(measurements.battery_level) as min_battery,
    AVG(measurements.signal_strength) as avg_signal,
    MIN(measurements.signal_strength) as min_signal,

    -- Data quality analytics
    (COUNT(measurements.temperature) * 100.0 / COUNT(*)) as temp_completeness_percent,
    (COUNT(measurements.humidity) * 100.0 / COUNT(*)) as humidity_completeness_percent,
    (COUNT(measurements.pressure) * 100.0 / COUNT(*)) as pressure_completeness_percent,

    -- Time range within bucket
    MIN(timestamp) as bucket_start_time,
    MAX(timestamp) as bucket_end_time,

    -- Advanced statistical measures
    (MAX(measurements.temperature) - MIN(measurements.temperature)) as temp_range,
    CASE 
      WHEN AVG(measurements.temperature) > 0 THEN 
        STDDEV_POP(measurements.temperature) / AVG(measurements.temperature) 
      ELSE NULL
    END as temp_coefficient_variation

  FROM sensor_data
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    AND timestamp < CURRENT_TIMESTAMP
    AND metadata.data_quality IN ('complete', 'partial')
  GROUP BY 
    DATE_TRUNC('hour', timestamp),
    metadata.device_id,
    metadata.sensor_type,
    metadata.location
),

anomaly_detection AS (
  SELECT 
    tba.*,

    -- Temperature anomaly detection
    CASE 
      WHEN temp_stddev > 0 THEN
        ABS(avg_temperature - LAG(avg_temperature) OVER (
          PARTITION BY device_id 
          ORDER BY time_bucket
        )) / temp_stddev
      ELSE 0
    END as temp_anomaly_score,

    -- Humidity anomaly detection  
    CASE 
      WHEN humidity_stddev > 0 THEN
        ABS(avg_humidity - LAG(avg_humidity) OVER (
          PARTITION BY device_id 
          ORDER BY time_bucket
        )) / humidity_stddev
      ELSE 0
    END as humidity_anomaly_score,

    -- Battery degradation analysis
    LAG(avg_battery) OVER (
      PARTITION BY device_id 
      ORDER BY time_bucket
    ) - avg_battery as battery_degradation,

    -- Signal strength trend
    avg_signal - LAG(avg_signal) OVER (
      PARTITION BY device_id 
      ORDER BY time_bucket
    ) as signal_trend,

    -- Data quality trend
    (temp_completeness_percent + humidity_completeness_percent + pressure_completeness_percent) / 3.0 as overall_completeness,

    -- Bucket characteristics
    EXTRACT(EPOCH FROM (bucket_end_time - bucket_start_time)) / 60.0 as bucket_duration_minutes

  FROM time_bucket_analysis tba
),

device_health_assessment AS (
  SELECT 
    ad.device_id,
    ad.sensor_type,
    ad.location,
    COUNT(*) as analysis_periods,

    -- Environmental stability analysis
    AVG(ad.avg_temperature) as device_avg_temperature,
    STDDEV(ad.avg_temperature) as temperature_stability,
    AVG(ad.temp_coefficient_variation) as avg_temp_variability,

    -- Environmental range analysis
    MIN(ad.min_temperature) as absolute_min_temperature,
    MAX(ad.max_temperature) as absolute_max_temperature,
    AVG(ad.temp_range) as avg_hourly_temp_range,

    -- Humidity environment analysis
    AVG(ad.avg_humidity) as device_avg_humidity,
    STDDEV(ad.avg_humidity) as humidity_stability,
    AVG(ad.pressure_range) as avg_pressure_variation,

    -- Device health metrics
    MIN(ad.min_battery) as lowest_battery_level,
    AVG(ad.avg_battery) as avg_battery_level,
    MAX(ad.battery_degradation) as max_battery_drop_per_hour,

    -- Connectivity analysis
    AVG(ad.avg_signal) as avg_connectivity,
    MIN(ad.min_signal) as worst_connectivity,
    STDDEV(ad.avg_signal) as connectivity_stability,

    -- Data reliability metrics
    AVG(ad.overall_completeness) as avg_data_completeness,
    MIN(ad.overall_completeness) as worst_data_completeness,

    -- Anomaly frequency
    COUNT(*) FILTER (WHERE ad.temp_anomaly_score > 2) as temp_anomaly_count,
    COUNT(*) FILTER (WHERE ad.humidity_anomaly_score > 2) as humidity_anomaly_count,
    AVG(ad.temp_anomaly_score) as avg_temp_anomaly_score,

    -- Recent trends (last 6 hours vs previous)
    AVG(CASE WHEN ad.time_bucket >= CURRENT_TIMESTAMP - INTERVAL '6 hours' 
             THEN ad.avg_battery ELSE NULL END) - 
    AVG(CASE WHEN ad.time_bucket < CURRENT_TIMESTAMP - INTERVAL '6 hours' 
             THEN ad.avg_battery ELSE NULL END) as recent_battery_trend,

    AVG(CASE WHEN ad.time_bucket >= CURRENT_TIMESTAMP - INTERVAL '6 hours' 
             THEN ad.avg_signal ELSE NULL END) - 
    AVG(CASE WHEN ad.time_bucket < CURRENT_TIMESTAMP - INTERVAL '6 hours' 
             THEN ad.avg_signal ELSE NULL END) as recent_signal_trend

  FROM anomaly_detection ad
  GROUP BY ad.device_id, ad.sensor_type, ad.location
)

SELECT 
  dha.device_id,
  dha.sensor_type,
  dha.location,
  dha.analysis_periods,

  -- Environmental summary
  ROUND(dha.device_avg_temperature, 2) as avg_temperature,
  ROUND(dha.temperature_stability, 3) as temp_stability_stddev,
  ROUND(dha.avg_temp_variability, 3) as avg_temp_coefficient_variation,
  dha.absolute_min_temperature,
  dha.absolute_max_temperature,

  -- Environmental classification
  CASE 
    WHEN dha.temperature_stability > 5 THEN 'highly_variable'
    WHEN dha.temperature_stability > 2 THEN 'moderately_variable'  
    WHEN dha.temperature_stability > 1 THEN 'stable'
    ELSE 'very_stable'
  END as temperature_environment_classification,

  -- Device health summary
  ROUND(dha.avg_battery_level, 1) as avg_battery_level,
  dha.lowest_battery_level,
  ROUND(dha.max_battery_drop_per_hour, 2) as max_hourly_battery_consumption,
  ROUND(dha.avg_connectivity, 1) as avg_signal_strength,

  -- Device status assessment
  CASE 
    WHEN dha.lowest_battery_level < 15 THEN 'critical_battery'
    WHEN dha.avg_battery_level < 25 THEN 'low_battery'
    WHEN dha.avg_connectivity < 30 THEN 'connectivity_issues'
    WHEN dha.avg_data_completeness < 80 THEN 'data_quality_issues'
    WHEN dha.temp_anomaly_count > dha.analysis_periods * 0.2 THEN 'environmental_anomalies'
    ELSE 'healthy'
  END as device_status,

  -- Data quality assessment
  ROUND(dha.avg_data_completeness, 1) as avg_data_completeness_percent,
  dha.worst_data_completeness,

  -- Anomaly summary
  dha.temp_anomaly_count,
  dha.humidity_anomaly_count,
  ROUND(dha.avg_temp_anomaly_score, 3) as avg_temp_anomaly_score,

  -- Recent trends
  ROUND(dha.recent_battery_trend, 2) as recent_battery_change,
  ROUND(dha.recent_signal_trend, 1) as recent_signal_change,

  -- Trend classification
  CASE 
    WHEN dha.recent_battery_trend < -2 THEN 'battery_degrading_fast'
    WHEN dha.recent_battery_trend < -0.5 THEN 'battery_degrading'
    WHEN dha.recent_battery_trend > 1 THEN 'battery_improving'  -- Could indicate replacement
    ELSE 'battery_stable'
  END as battery_trend_classification,

  CASE 
    WHEN dha.recent_signal_trend < -5 THEN 'connectivity_degrading'
    WHEN dha.recent_signal_trend > 5 THEN 'connectivity_improving'
    ELSE 'connectivity_stable'
  END as connectivity_trend_classification,

  -- Alert generation (non-matching cases yield NULLs, which are stripped from the array)
  ARRAY_REMOVE(ARRAY[
    CASE WHEN dha.lowest_battery_level < 10 THEN 'CRITICAL: Battery critically low' END,
    CASE WHEN dha.avg_connectivity < 25 THEN 'WARNING: Poor connectivity detected' END,
    CASE WHEN dha.avg_data_completeness < 70 THEN 'WARNING: Low data quality' END,
    CASE WHEN dha.recent_battery_trend < -3 THEN 'ALERT: Rapid battery degradation' END,
    CASE WHEN dha.temp_anomaly_count > dha.analysis_periods * 0.3 THEN 'ALERT: Frequent temperature anomalies' END
  ]::TEXT[], NULL) as active_alerts,

  -- Recommendations
  CASE 
    WHEN dha.lowest_battery_level < 15 THEN 'Schedule immediate battery replacement'
    WHEN dha.avg_connectivity < 30 THEN 'Check network coverage and device positioning'  
    WHEN dha.avg_data_completeness < 80 THEN 'Inspect sensors and perform calibration'
    WHEN dha.temp_anomaly_count > dha.analysis_periods * 0.2 THEN 'Investigate environmental factors'
    ELSE 'Device operating within normal parameters'
  END as maintenance_recommendation

FROM device_health_assessment dha
ORDER BY 
  CASE 
    WHEN dha.lowest_battery_level < 15 THEN 1
    WHEN dha.avg_connectivity < 30 THEN 2
    WHEN dha.avg_data_completeness < 80 THEN 3
    ELSE 4
  END,
  dha.device_id;

-- Advanced time series trend analysis with seasonality detection
WITH daily_aggregates AS (
  SELECT 
    DATE_TRUNC('day', timestamp) as date_bucket,
    metadata.location,
    metadata.sensor_type,

    -- Daily environmental summaries
    AVG(measurements.temperature) as daily_avg_temp,
    MIN(measurements.temperature) as daily_min_temp,
    MAX(measurements.temperature) as daily_max_temp,
    AVG(measurements.humidity) as daily_avg_humidity,
    AVG(measurements.pressure) as daily_avg_pressure,

    -- Data volume and quality
    COUNT(*) as daily_reading_count,
    (COUNT(measurements.temperature) * 100.0 / COUNT(*)) as daily_completeness

  FROM sensor_data
  WHERE timestamp >= CURRENT_DATE - INTERVAL '90 days'
    AND timestamp < CURRENT_DATE
    AND metadata.location IS NOT NULL
  GROUP BY DATE_TRUNC('day', timestamp), metadata.location, metadata.sensor_type
),

weekly_patterns AS (
  SELECT 
    da.*,
    EXTRACT(DOW FROM da.date_bucket) as day_of_week,  -- 0=Sunday, 6=Saturday
    EXTRACT(WEEK FROM da.date_bucket) as week_number,

    -- Moving averages for trend analysis
    AVG(da.daily_avg_temp) OVER (
      PARTITION BY da.location, da.sensor_type
      ORDER BY da.date_bucket
      ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as temp_7day_avg,

    AVG(da.daily_avg_temp) OVER (
      PARTITION BY da.location, da.sensor_type  
      ORDER BY da.date_bucket
      ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
    ) as temp_30day_avg,

    -- Trend detection
    da.daily_avg_temp - LAG(da.daily_avg_temp, 7) OVER (
      PARTITION BY da.location, da.sensor_type
      ORDER BY da.date_bucket
    ) as week_over_week_temp_change,

    -- Seasonality indicators
    LAG(da.daily_avg_temp, 7) OVER (
      PARTITION BY da.location, da.sensor_type
      ORDER BY da.date_bucket
    ) as same_day_last_week_temp,

    LAG(da.daily_avg_temp, 30) OVER (
      PARTITION BY da.location, da.sensor_type  
      ORDER BY da.date_bucket
    ) as same_day_last_month_temp

  FROM daily_aggregates da
),

trend_analysis AS (
  SELECT 
    wp.location,
    wp.sensor_type,
    COUNT(*) as analysis_days,

    -- Overall trend analysis
    AVG(wp.daily_avg_temp) as overall_avg_temp,
    STDDEV(wp.daily_avg_temp) as temp_variability,
    MIN(wp.daily_min_temp) as absolute_min_temp,
    MAX(wp.daily_max_temp) as absolute_max_temp,

    -- Seasonal pattern analysis  
    AVG(CASE WHEN wp.day_of_week IN (0,6) THEN wp.daily_avg_temp END) as weekend_avg_temp,
    AVG(CASE WHEN wp.day_of_week BETWEEN 1 AND 5 THEN wp.daily_avg_temp END) as weekday_avg_temp,

    -- Weekly cyclical patterns
    AVG(CASE WHEN wp.day_of_week = 0 THEN wp.daily_avg_temp END) as sunday_avg,
    AVG(CASE WHEN wp.day_of_week = 1 THEN wp.daily_avg_temp END) as monday_avg,
    AVG(CASE WHEN wp.day_of_week = 2 THEN wp.daily_avg_temp END) as tuesday_avg,
    AVG(CASE WHEN wp.day_of_week = 3 THEN wp.daily_avg_temp END) as wednesday_avg,
    AVG(CASE WHEN wp.day_of_week = 4 THEN wp.daily_avg_temp END) as thursday_avg,
    AVG(CASE WHEN wp.day_of_week = 5 THEN wp.daily_avg_temp END) as friday_avg,
    AVG(CASE WHEN wp.day_of_week = 6 THEN wp.daily_avg_temp END) as saturday_avg,

    -- Trend strength analysis
    AVG(wp.week_over_week_temp_change) as avg_weekly_change,
    STDDEV(wp.week_over_week_temp_change) as weekly_change_variability,

    -- Linear trend approximation (simplified)
    (MAX(wp.temp_30day_avg) - MIN(wp.temp_30day_avg)) / 
    NULLIF(EXTRACT(DAY FROM MAX(wp.date_bucket) - MIN(wp.date_bucket)), 0) as daily_trend_rate,

    -- Data quality trend
    AVG(wp.daily_completeness) as avg_data_completeness,
    MIN(wp.daily_completeness) as worst_daily_completeness

  FROM weekly_patterns wp
  WHERE wp.date_bucket >= CURRENT_DATE - INTERVAL '60 days'  -- Focus on last 60 days for trends
  GROUP BY wp.location, wp.sensor_type
)

SELECT 
  ta.location,
  ta.sensor_type,
  ta.analysis_days,

  -- Environmental summary
  ROUND(ta.overall_avg_temp, 2) as avg_temperature,
  ROUND(ta.temp_variability, 2) as temperature_variability,
  ta.absolute_min_temp,
  ta.absolute_max_temp,

  -- Seasonal patterns
  ROUND(COALESCE(ta.weekday_avg_temp, 0), 2) as weekday_avg_temp,
  ROUND(COALESCE(ta.weekend_avg_temp, 0), 2) as weekend_avg_temp,
  ROUND(COALESCE(ta.weekend_avg_temp - ta.weekday_avg_temp, 0), 2) as weekend_weekday_diff,

  -- Weekly pattern analysis (day of week variations)
  JSON_OBJECT(
    'sunday', ROUND(COALESCE(ta.sunday_avg, 0), 2),
    'monday', ROUND(COALESCE(ta.monday_avg, 0), 2),
    'tuesday', ROUND(COALESCE(ta.tuesday_avg, 0), 2),
    'wednesday', ROUND(COALESCE(ta.wednesday_avg, 0), 2),
    'thursday', ROUND(COALESCE(ta.thursday_avg, 0), 2),
    'friday', ROUND(COALESCE(ta.friday_avg, 0), 2),
    'saturday', ROUND(COALESCE(ta.saturday_avg, 0), 2)
  ) as daily_temperature_pattern,

  -- Trend analysis
  ROUND(ta.avg_weekly_change, 3) as avg_weekly_temperature_change,
  ROUND(ta.daily_trend_rate * 30, 3) as monthly_trend_rate,

  -- Trend classification
  CASE 
    WHEN ta.daily_trend_rate > 0.1 THEN 'warming_trend'
    WHEN ta.daily_trend_rate < -0.1 THEN 'cooling_trend'
    ELSE 'stable'
  END as temperature_trend_classification,

  -- Seasonal pattern classification
  CASE 
    WHEN ABS(COALESCE(ta.weekend_avg_temp - ta.weekday_avg_temp, 0)) > 2 THEN 'strong_weekly_pattern'
    WHEN ABS(COALESCE(ta.weekend_avg_temp - ta.weekday_avg_temp, 0)) > 1 THEN 'moderate_weekly_pattern'
    ELSE 'minimal_weekly_pattern'
  END as weekly_seasonality,

  -- Variability assessment
  CASE 
    WHEN ta.temp_variability > 5 THEN 'highly_variable'
    WHEN ta.temp_variability > 2 THEN 'moderately_variable'
    ELSE 'stable_environment'
  END as environment_stability,

  -- Data quality assessment
  ROUND(ta.avg_data_completeness, 1) as avg_data_completeness_percent,

  -- Insights and recommendations
  CASE 
    WHEN ABS(ta.daily_trend_rate) > 0.1 THEN 'Monitor for environmental changes'
    WHEN ta.temp_variability > 5 THEN 'High variability - check for external factors'
    WHEN ta.avg_data_completeness < 90 THEN 'Improve sensor reliability'
    ELSE 'Environment stable, monitoring nominal'
  END as analysis_recommendation

FROM trend_analysis ta
WHERE ta.analysis_days >= 30  -- Require at least 30 days for meaningful trend analysis
ORDER BY 
  ABS(ta.daily_trend_rate) DESC,  -- Show locations with strongest trends first
  ta.temp_variability DESC,
  ta.location, 
  ta.sensor_type;

-- Time series data retention and archival with automated lifecycle management
WITH retention_analysis AS (
  SELECT 
    -- Analyze data age distribution
    DATE_TRUNC('day', timestamp) as date_bucket,
    metadata.location,
    COUNT(*) as daily_record_count,
    AVG(measurements.temperature) as daily_avg_temp,

    -- Data age categories (derived from the daily bucket so the expression is valid under the GROUP BY)
    CASE 
      WHEN DATE_TRUNC('day', timestamp) >= CURRENT_TIMESTAMP - INTERVAL '7 days' THEN 'hot_data'
      WHEN DATE_TRUNC('day', timestamp) >= CURRENT_TIMESTAMP - INTERVAL '30 days' THEN 'warm_data' 
      WHEN DATE_TRUNC('day', timestamp) >= CURRENT_TIMESTAMP - INTERVAL '90 days' THEN 'cold_data'
      ELSE 'archive_candidate'
    END as data_tier,

    -- Storage impact estimation
    COUNT(*) * 500 as estimated_storage_bytes,  -- Assume ~500 bytes per document

    -- Access pattern analysis (simplified)
    CURRENT_DATE - DATE_TRUNC('day', timestamp)::DATE as days_old

  FROM sensor_data
  WHERE timestamp >= CURRENT_DATE - INTERVAL '180 days'  -- Analyze last 6 months
  GROUP BY DATE_TRUNC('day', timestamp), metadata.location
),

archival_candidates AS (
  SELECT 
    ra.location,
    ra.data_tier,
    COUNT(*) as total_days,
    SUM(ra.daily_record_count) as total_records,
    SUM(ra.estimated_storage_bytes) as total_estimated_bytes,
    MIN(ra.days_old) as newest_data_age_days,
    MAX(ra.days_old) as oldest_data_age_days,
    AVG(ra.daily_avg_temp) as avg_temperature_for_tier

  FROM retention_analysis ra
  GROUP BY ra.location, ra.data_tier
),

archival_recommendations AS (
  SELECT 
    ac.location,
    ac.data_tier,
    ac.total_records,
    ROUND(ac.total_estimated_bytes / 1024.0 / 1024.0, 2) as estimated_storage_mb,
    ac.oldest_data_age_days,

    -- Archival recommendations
    CASE ac.data_tier
      WHEN 'archive_candidate' THEN 'ARCHIVE: Move to cold storage or delete'
      WHEN 'cold_data' THEN 'CONSIDER: Compress or move to slower storage'
      WHEN 'warm_data' THEN 'OPTIMIZE: Apply compression if not already done'
      ELSE 'KEEP: Hot data for active queries'
    END as retention_recommendation,

    -- Priority scoring for archival actions
    CASE 
      WHEN ac.data_tier = 'archive_candidate' AND ac.total_estimated_bytes > 100*1024*1024 THEN 'high_priority'
      WHEN ac.data_tier = 'cold_data' AND ac.total_estimated_bytes > 50*1024*1024 THEN 'medium_priority'
      WHEN ac.data_tier IN ('archive_candidate', 'cold_data') THEN 'low_priority'
      ELSE 'no_action_needed'
    END as archival_priority,

    -- Estimated storage savings
    CASE ac.data_tier
      WHEN 'archive_candidate' THEN ac.total_estimated_bytes * 0.9  -- 90% savings from deletion
      WHEN 'cold_data' THEN ac.total_estimated_bytes * 0.6  -- 60% savings from compression
      ELSE 0
    END as potential_storage_savings_bytes

  FROM archival_candidates ac
)

SELECT 
  ar.location,
  ar.data_tier,
  ar.total_records,
  ar.estimated_storage_mb,
  ar.oldest_data_age_days,
  ar.retention_recommendation,
  ar.archival_priority,
  ROUND(ar.potential_storage_savings_bytes / 1024.0 / 1024.0, 2) as potential_savings_mb,

  -- Specific actions
  CASE ar.data_tier
    WHEN 'archive_candidate' THEN 
      FORMAT('DELETE FROM sensor_data WHERE timestamp < CURRENT_DATE - INTERVAL ''90 days'' AND metadata.location = ''%s''', 
             ar.location)
    WHEN 'cold_data' THEN
      FORMAT('Consider enabling compression for location: %s', ar.location)
    ELSE 'No action required'
  END as suggested_action

FROM archival_recommendations ar
WHERE ar.archival_priority != 'no_action_needed'
ORDER BY 
  CASE ar.archival_priority
    WHEN 'high_priority' THEN 1
    WHEN 'medium_priority' THEN 2  
    WHEN 'low_priority' THEN 3
    ELSE 4
  END,
  ar.estimated_storage_mb DESC;

-- QueryLeaf provides comprehensive MongoDB time series capabilities:
-- 1. Purpose-built time series collections with automatic optimization
-- 2. Advanced temporal aggregation with statistical analysis
-- 3. Intelligent bucketing and compression for storage efficiency
-- 4. Built-in retention policies and lifecycle management
-- 5. Real-time analytics and anomaly detection
-- 6. Comprehensive trend analysis and seasonality detection
-- 7. SQL-familiar syntax for complex time series operations
-- 8. Automatic indexing and query optimization
-- 9. Production-ready time series analytics and reporting
-- 10. Integration with MongoDB's native time series optimizations

Best Practices for Production Time Series Applications

Storage Optimization and Performance Strategy

Essential principles for effective MongoDB time series deployments; a minimal collection-setup sketch follows the list:

  1. Collection Design: Configure appropriate time series granularity and bucketing strategies based on data ingestion patterns
  2. Index Strategy: Create compound indexes optimizing for common query patterns combining time ranges with metadata filters
  3. Compression Management: Enable appropriate compression algorithms to optimize storage efficiency for temporal data
  4. Retention Policies: Implement automatic data expiration and archival strategies aligned with business requirements
  5. Aggregation Optimization: Design aggregation pipelines that leverage time series collection optimizations
  6. Monitoring Integration: Track collection performance, storage utilization, and query patterns for continuous optimization
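
The first four principles above map onto collection-level options rather than application code. The sketch below illustrates them with pymongo; the database name iot, the minutes granularity, the 90-day expiry, and the connection string are assumptions for this example rather than values taken from the article, and the same options are available from mongosh or any other driver.

# Minimal pymongo sketch: create a time series collection with retention and a
# compound index. Names, granularity, and the expiry window are illustrative assumptions.
from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["iot"]                                 # assumed database name

# 1. Collection design: timeField/metaField drive bucketing; granularity should
#    roughly match the interval between readings from a single device.
db.create_collection(
    "sensor_data",
    timeseries={
        "timeField": "timestamp",   # BSON date present on every document
        "metaField": "metadata",    # device_id, sensor_type, location, ...
        "granularity": "minutes"    # "seconds" | "minutes" | "hours"
    },
    expireAfterSeconds=90 * 24 * 3600  # 4. Retention: expire data older than ~90 days
)

# 2. Index strategy: compound index for the common "one device over a time range" query shape
db["sensor_data"].create_index(
    [("metadata.device_id", ASCENDING), ("timestamp", DESCENDING)]
)

# 4. Retention policies can be adjusted later without recreating the collection
db.command("collMod", "sensor_data", expireAfterSeconds=30 * 24 * 3600)

Both the granularity and the expiration window can be changed later with collMod, so it is reasonable to start with conservative settings and tune them once real ingestion patterns are visible.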

Scalability and Production Deployment

Optimize time series operations for enterprise-scale requirements; a sharding sketch follows the list:

  1. Sharding Strategy: Design shard keys that support time-based distribution and query patterns
  2. Data Lifecycle Management: Implement tiered storage strategies for hot, warm, and cold time series data
  3. Real-Time Processing: Configure streaming aggregation and real-time analytics for time-sensitive applications
  4. Capacity Planning: Monitor ingestion rates, storage growth, and query performance for scaling decisions
  5. Disaster Recovery: Design backup and recovery strategies appropriate for time series data characteristics
  6. Integration Patterns: Implement integration with monitoring, alerting, and visualization platforms
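
For the sharding strategy in particular, a time series collection is typically distributed on its metadata fields, optionally followed by the time field. The sketch below is a minimal pymongo illustration that assumes a sharded cluster reachable through a mongos host and reuses the hypothetical iot.sensor_data namespace from the earlier sketch; it is not taken from this article's examples.

# Minimal sharding sketch for a time series collection (assumes a sharded cluster
# and the iot.sensor_data collection created above; names are illustrative).
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017")  # connect through mongos (assumed host)

# Enable sharding for the database (required on older versions; implicit on newer ones)
client.admin.command("enableSharding", "iot")

# Shard on the device identifier plus time so that:
#   - data for one device over a time range stays on as few shards as possible
#   - the time field, which must come last in the key for time series collections,
#     spreads each device's growing history across chunks
client.admin.command(
    "shardCollection",
    "iot.sensor_data",
    key={"metadata.device_id": 1, "timestamp": 1},
)

Hashed sharding on a metadata subfield alone is also an option when queries are not device-scoped; when the time field is included in the key, it must be the last component and is range-sharded.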

Conclusion

MongoDB time series collections provide purpose-built temporal data management, combining efficient storage, high-performance analytics, and scalable ingestion for IoT, monitoring, and analytical applications through optimized bucketing, automatic compression, and specialized indexing. Because this support is native, temporal workloads benefit from MongoDB's storage efficiency, query optimization, and analytical capabilities without application-level workarounds.

Key MongoDB Time Series benefits include:

  • Storage Optimization: Automatic compression and bucketing strategies optimized for temporal data patterns
  • Query Performance: Specialized indexing and aggregation capabilities for time-range and analytical queries
  • Ingestion Efficiency: High-throughput data insertion with minimal overhead and optimal storage utilization
  • Analytical Capabilities: Built-in aggregation functions designed for time series analytics and trend analysis
  • Lifecycle Management: Automatic retention policies and data expiration for operational efficiency
  • SQL Accessibility: Familiar SQL-style time series operations through QueryLeaf for accessible temporal data management

Whether you're building IoT platforms, system monitoring solutions, financial analytics applications, or sensor data management systems, MongoDB time series collections with QueryLeaf's familiar SQL interface provide the foundation for efficient, scalable temporal data management.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB time series operations while providing SQL-familiar syntax for temporal data management, aggregation, and analytics. Advanced time series patterns, compression strategies, and analytical functions are seamlessly handled through familiar SQL constructs, making sophisticated time series applications accessible to SQL-oriented development teams.

The combination of MongoDB's robust time series capabilities with SQL-style temporal operations makes it an ideal platform for applications that need both high-performance time series storage and familiar database management patterns. On that foundation, temporal data operations can scale efficiently while maintaining query performance and storage optimization as data volume and analytical complexity grow.