MongoDB Replica Sets and Fault Tolerance: SQL-Style High Availability with Automatic Failover
Production applications demand high availability, data redundancy, and automatic failover capabilities to ensure uninterrupted service and data protection. Traditional SQL databases achieve high availability through complex clustering solutions, master-slave replication, and expensive failover systems that often require manual intervention and specialized expertise.
MongoDB replica sets provide built-in high availability with automatic failover, distributed consensus, and flexible read/write routing - all managed through a simple, unified interface. Unlike traditional database clustering solutions that require separate technologies and complex configuration, MongoDB replica sets deliver enterprise-grade availability features as a core part of the database platform.
The High Availability Challenge
Traditional SQL database high availability approaches involve significant complexity:
-- Traditional SQL high availability setup challenges
-- (illustrative syntax; exact DDL varies by vendor)
-- Master-Slave replication requires manual failover
-- Primary database server
CREATE SERVER primary_db
CONNECTION 'host=db-master.example.com port=5432 dbname=production';
-- Read replica servers
CREATE SERVER replica_db_1
CONNECTION 'host=db-replica-1.example.com port=5432 dbname=production';
CREATE SERVER replica_db_2
CONNECTION 'host=db-replica-2.example.com port=5432 dbname=production';
-- Application must handle connection routing
-- Read queries to replicas
SELECT customer_id, name, email
FROM customers@replica_db_1
WHERE status = 'active';
-- Write queries to master
INSERT INTO orders (customer_id, product_id, quantity)
VALUES (123, 456, 2);
-- Must go to primary_db
-- Problems with traditional approaches:
-- - Manual failover when primary fails
-- - Complex connection string management
-- - Application-level routing logic required
-- - No automatic primary election
-- - Split-brain scenarios possible
-- - Expensive clustering solutions
-- - Recovery requires manual intervention
-- - No built-in consistency guarantees
MongoDB replica sets provide automatic high availability:
// MongoDB replica set - automatic high availability
// Single connection string handles all routing
const { MongoClient, ObjectId } = require('mongodb');

const mongoUrl = 'mongodb://db1.example.com,db2.example.com,db3.example.com/production?replicaSet=prodRS';
const client = new MongoClient(mongoUrl, {
// Automatic failover and reconnection
maxPoolSize: 10,
serverSelectionTimeoutMS: 5000,
heartbeatFrequencyMS: 10000,
// Write concern for durability
writeConcern: {
w: 'majority', // Write to majority of replica set members
j: true, // Ensure write to journal
wtimeout: 5000 // Timeout for write acknowledgment
},
// Read preference for load distribution
readPreference: 'secondaryPreferred', // Use secondaries when available
readConcern: { level: 'majority' } // Consistent reads
});
// Application code remains unchanged - replica set handles routing
const db = client.db('production');
const orders = db.collection('orders');
// Writes automatically go to primary
await orders.insertOne({
customerId: ObjectId('64f1a2c4567890abcdef1234'),
productId: ObjectId('64f1a2c4567890abcdef5678'),
quantity: 2,
orderDate: new Date(),
status: 'pending'
});
// Reads can use secondaries based on read preference
const recentOrders = await orders.find({
orderDate: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }
}).toArray();
// Benefits:
// - Automatic primary election on failure
// - Transparent failover (no connection string changes)
// - Built-in data consistency guarantees
// - Distributed consensus prevents split-brain
// - Hot standby replicas ready immediately
// - Rolling updates without downtime
// - Geographic distribution support
// - No additional clustering software needed
Understanding MongoDB Replica Sets
Replica Set Architecture and Consensus
Implement robust replica set configurations with proper consensus:
// Replica set configuration and management
class ReplicaSetManager {
constructor() {
this.replicaSetConfig = {
_id: 'prodRS',
version: 1,
members: [
{
_id: 0,
host: 'db-primary.example.com:27017',
priority: 10, // High priority for primary preference
arbiterOnly: false,
buildIndexes: true,
hidden: false,
secondaryDelaySecs: 0, // Renamed from slaveDelay in MongoDB 5.0
votes: 1
},
{
_id: 1,
host: 'db-secondary-1.example.com:27017',
priority: 5, // Medium priority for secondary
arbiterOnly: false,
buildIndexes: true,
hidden: false,
secondaryDelaySecs: 0,
votes: 1
},
{
_id: 2,
host: 'db-secondary-2.example.com:27017',
priority: 5, // Medium priority for secondary
arbiterOnly: false,
buildIndexes: true,
hidden: false,
secondaryDelaySecs: 0,
votes: 1
},
// NOTE: an arbiter alongside three data-bearing members yields four voters
// (an even count); in practice add an arbiter only to restore an odd count
{
  _id: 3,
  host: 'db-arbiter.example.com:27017',
priority: 0, // Arbiters cannot become primary
arbiterOnly: true, // Voting member but no data
buildIndexes: false,
hidden: false,
secondaryDelaySecs: 0,
votes: 1
}
],
settings: {
chainingAllowed: true, // Allow secondary-to-secondary sync
heartbeatIntervalMillis: 2000, // Heartbeat frequency
heartbeatTimeoutSecs: 10, // Heartbeat timeout
electionTimeoutMillis: 10000, // Election timeout
catchUpTimeoutMillis: 60000, // Catchup period for new primary
getLastErrorModes: {
  // Custom write concern mode: keys name a member tag, and the value is how
  // many distinct tag values must acknowledge (members need 'dc' tags for this)
  'datacenter': {
    'dc': 2
  }
},
getLastErrorDefaults: {
w: 'majority',
wtimeout: 5000
}
}
};
}
async initializeReplicaSet(primaryConnection) {
try {
// Initialize replica set on primary node
const result = await primaryConnection.db('admin').command({
replSetInitiate: this.replicaSetConfig
});
console.log('Replica set initialization result:', result);
// Wait for replica set to stabilize
await this.waitForReplicaSetReady(primaryConnection);
return { success: true, config: this.replicaSetConfig };
} catch (error) {
throw new Error(`Replica set initialization failed: ${error.message}`);
}
}
async waitForReplicaSetReady(connection, maxWaitMs = 60000) {
const startTime = Date.now();
while (Date.now() - startTime < maxWaitMs) {
try {
const status = await connection.db('admin').command({ replSetGetStatus: 1 });
const primaryCount = status.members.filter(member => member.state === 1).length;
const secondaryCount = status.members.filter(member => member.state === 2).length;
if (primaryCount === 1 && secondaryCount >= 1) {
console.log('Replica set is ready:', {
primary: primaryCount,
secondaries: secondaryCount,
total: status.members.length
});
return true;
}
console.log('Waiting for replica set to stabilize...', {
primary: primaryCount,
secondaries: secondaryCount
});
} catch (error) {
console.log('Replica set not ready yet:', error.message);
}
await new Promise(resolve => setTimeout(resolve, 2000));
}
throw new Error('Replica set failed to become ready within timeout');
}
async addReplicaSetMember(primaryConnection, newMemberConfig) {
try {
// Get current configuration
const currentConfig = await primaryConnection.db('admin').command({
replSetGetConfig: 1
});
// Add new member to configuration
const updatedConfig = currentConfig.config;
updatedConfig.version += 1;
updatedConfig.members.push(newMemberConfig);
// Reconfigure replica set
const result = await primaryConnection.db('admin').command({
replSetReconfig: updatedConfig
});
console.log('Member added successfully:', result);
return result;
} catch (error) {
throw new Error(`Failed to add replica set member: ${error.message}`);
}
}
async removeReplicaSetMember(primaryConnection, memberId) {
try {
const currentConfig = await primaryConnection.db('admin').command({
replSetGetConfig: 1
});
const updatedConfig = currentConfig.config;
updatedConfig.version += 1;
updatedConfig.members = updatedConfig.members.filter(member => member._id !== memberId);
const result = await primaryConnection.db('admin').command({
replSetReconfig: updatedConfig
});
console.log('Member removed successfully:', result);
return result;
} catch (error) {
throw new Error(`Failed to remove replica set member: ${error.message}`);
}
}
async performStepDown(primaryConnection, stepDownSecs = 60) {
try {
// Force primary to step down (useful for maintenance)
const result = await primaryConnection.db('admin').command({
replSetStepDown: stepDownSecs,
secondaryCatchUpPeriodSecs: 15
});
console.log('Primary step down initiated:', result);
return result;
} catch (error) {
// Step down command typically causes connection error as primary changes
if (error.message.includes('connection') || error.message.includes('network')) {
console.log('Step down successful - connection closed as expected');
return { success: true, message: 'Primary stepped down successfully' };
}
throw error;
}
}
}
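To see the manager in action, here is a minimal bootstrap sketch with the ReplicaSetManager class above in scope. It assumes the first node is reachable as a standalone mongod before initiation; the directConnection flag tells the driver not to expect a replica set yet, and the hostname is a placeholder:

const { MongoClient } = require('mongodb');

async function bootstrapReplicaSet() {
  // Connect directly to the first node before the replica set exists
  const seedClient = new MongoClient('mongodb://db-primary.example.com:27017/?directConnection=true');
  await seedClient.connect();
  try {
    const manager = new ReplicaSetManager();
    const result = await manager.initializeReplicaSet(seedClient);
    console.log('Replica set initialized:', result.success);
  } finally {
    await seedClient.close();
  }
}

bootstrapReplicaSet().catch(console.error);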
Write Concerns and Read Preferences
Configure appropriate consistency and performance settings:
// Advanced write concern and read preference management
class ReplicaSetConsistencyManager {
constructor(client) {
this.client = client;
this.db = client.db();
// Define write concern levels for different operations
this.writeConcerns = {
critical: {
w: 'majority', // Wait for majority acknowledgment
j: true, // Wait for journal sync
wtimeout: 10000 // 10 second timeout
},
standard: {
w: 'majority',
j: false, // Don't wait for journal (faster)
wtimeout: 5000
},
fast: {
w: 1, // Only primary acknowledgment
j: false,
wtimeout: 1000
},
datacenter: {
w: 'datacenter', // Custom write concern mode
j: true,
wtimeout: 15000
}
};
// Define read preference strategies
this.readPreferences = {
primaryOnly: { mode: 'primary' },
secondaryPreferred: {
mode: 'secondaryPreferred',
tagSets: [
{ region: 'us-west', datacenter: 'dc1' }, // Prefer specific tags
{ region: 'us-west' }, // Fall back to region
{} // Fall back to any
],
maxStalenessSeconds: 90 // Max replication lag
},
nearestRead: {
mode: 'nearest',
tagSets: [{ region: 'us-west' }],
maxStalenessSeconds: 90 // Drivers reject values below the 90-second minimum
},
analyticsReads: {
mode: 'secondary',
tagSets: [{ usage: 'analytics' }], // Dedicated analytics secondaries
maxStalenessSeconds: 300
}
};
}
async performCriticalWrite(collection, operation, data, options = {}) {
// High consistency write for critical data
const session = this.client.startSession();
try {
  const result = await session.withTransaction(async () => {
// Pass options at collection lookup (the Node.js driver has no withOptions method)
const coll = this.db.collection(collection, {
  writeConcern: this.writeConcerns.critical,
  readPreference: this.readPreferences.primaryOnly
});
let operationResult;
switch (operation) {
case 'insert':
operationResult = await coll.insertOne(data, { session });
break;
case 'update':
operationResult = await coll.updateOne(data.filter, data.update, {
session,
...options
});
break;
case 'replace':
operationResult = await coll.replaceOne(data.filter, data.replacement, {
session,
...options
});
break;
case 'delete':
operationResult = await coll.deleteOne(data.filter, { session });
break;
default:
throw new Error(`Unsupported operation: ${operation}`);
}
// Verify write was acknowledged by majority
if (operationResult.acknowledged &&
(operationResult.insertedId || operationResult.modifiedCount || operationResult.deletedCount)) {
// Add audit log for critical operations
await this.db.collection('audit_log').insertOne({
operation: operation,
collection: collection,
timestamp: new Date(),
writeConcern: 'critical',
sessionId: session.id,
result: {
acknowledged: operationResult.acknowledged,
insertedId: operationResult.insertedId,
modifiedCount: operationResult.modifiedCount,
deletedCount: operationResult.deletedCount
}
}, { session });
}
return operationResult;
}, {
readConcern: { level: 'majority' },
writeConcern: this.writeConcerns.critical
});
return result;
} catch (error) {
  throw new Error(`Critical write failed: ${error.message}`);
} finally {
  await session.endSession(); // Release the session even when the transaction fails
}
}
async performFastWrite(collection, operation, data, options = {}) {
// Fast write for non-critical data
const coll = this.db.collection(collection, {
  writeConcern: this.writeConcerns.fast
});
switch (operation) {
case 'insert':
return await coll.insertOne(data, options);
case 'insertMany':
return await coll.insertMany(data, options);
case 'update':
return await coll.updateOne(data.filter, data.update, options);
case 'updateMany':
return await coll.updateMany(data.filter, data.update, options);
default:
throw new Error(`Unsupported fast write operation: ${operation}`);
}
}
async performConsistentRead(collection, query, options = {}) {
// Read with strong consistency
const coll = this.db.collection(collection, {
  readPreference: this.readPreferences.primaryOnly,
  readConcern: { level: 'majority' }
});
if (options.findOne) {
return await coll.findOne(query, options);
} else {
return await coll.find(query, options).toArray();
}
}
async performEventuallyConsistentRead(collection, query, options = {}) {
// Read from secondaries for better performance
const coll = this.db.collection(collection, {
  readPreference: this.readPreferences.secondaryPreferred,
  readConcern: { level: 'local' }
});
if (options.findOne) {
return await coll.findOne(query, options);
} else {
return await coll.find(query, options).toArray();
}
}
async performAnalyticsRead(collection, pipeline, options = {}) {
// Long-running analytics queries on dedicated secondaries
const coll = this.db.collection(collection, {
  readPreference: this.readPreferences.analyticsReads,
  readConcern: { level: 'available' } // Fastest read concern
});
return await coll.aggregate(pipeline, {
...options,
allowDiskUse: true, // Allow large aggregations
maxTimeMS: 300000, // 5 minute timeout
batchSize: 1000 // Optimize batch size
}).toArray();
}
async checkReplicationLag() {
// Monitor replication lag across replica set
try {
const status = await this.db.admin().command({ replSetGetStatus: 1 });
const primary = status.members.find(member => member.state === 1);
const secondaries = status.members.filter(member => member.state === 2);
if (!primary) {
return { error: 'No primary found in replica set' };
}
const lagInfo = secondaries.map(secondary => {
const lagMs = primary.optimeDate.getTime() - secondary.optimeDate.getTime();
return {
member: secondary.name,
lagSeconds: Math.round(lagMs / 1000),
health: secondary.health,
state: secondary.stateStr,
lastHeartbeat: secondary.lastHeartbeat
};
});
const maxLag = Math.max(...lagInfo.map(info => info.lagSeconds));
return {
primary: primary.name,
secondaries: lagInfo,
maxLagSeconds: maxLag,
healthy: maxLag < 10, // Consider healthy if under 10 seconds lag
timestamp: new Date()
};
} catch (error) {
return { error: `Failed to check replication lag: ${error.message}` };
}
}
async adaptWriteConcernBasedOnLag() {
// Dynamically adjust write concern based on replication lag
const lagInfo = await this.checkReplicationLag();
if (lagInfo.error || !lagInfo.healthy) {
console.warn('Replication issues detected, using primary-only writes');
return this.writeConcerns.fast; // Fallback to primary-only
}
if (lagInfo.maxLagSeconds < 5) {
return this.writeConcerns.critical; // Normal high consistency
} else if (lagInfo.maxLagSeconds < 30) {
return this.writeConcerns.standard; // Medium consistency
} else {
return this.writeConcerns.fast; // Primary-only for performance
}
}
async performAdaptiveWrite(collection, operation, data, options = {}) {
// Automatically choose write concern based on replica set health
const adaptedWriteConcern = await this.adaptWriteConcernBasedOnLag();
const coll = this.db.collection(collection, {
  writeConcern: adaptedWriteConcern
});
console.log(`Using adaptive write concern:`, adaptedWriteConcern);
switch (operation) {
case 'insert':
return await coll.insertOne(data, options);
case 'update':
return await coll.updateOne(data.filter, data.update, options);
case 'replace':
return await coll.replaceOne(data.filter, data.replacement, options);
case 'delete':
return await coll.deleteOne(data.filter, options);
default:
throw new Error(`Unsupported adaptive write operation: ${operation}`);
}
}
}
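As a usage sketch with the ReplicaSetConsistencyManager class above in scope (collection and field names here are illustrative, not part of any schema defined earlier), each call site picks the consistency level it actually needs:

const { MongoClient } = require('mongodb');

async function main() {
  const client = new MongoClient(
    'mongodb://db1.example.com,db2.example.com,db3.example.com/production?replicaSet=prodRS'
  );
  await client.connect();
  try {
    const consistency = new ReplicaSetConsistencyManager(client);

    // Durable, majority-acknowledged write for money movement
    await consistency.performCriticalWrite('payments', 'insert', {
      accountId: 'A-100',
      amount: 250.0,
      createdAt: new Date()
    });

    // Cheap, primary-only write for telemetry
    await consistency.performFastWrite('page_views', 'insert', {
      path: '/pricing',
      viewedAt: new Date()
    });

    // Inspect lag before deciding on a write concern for a batch job
    console.log(await consistency.checkReplicationLag());
  } finally {
    await client.close();
  }
}

main().catch(console.error);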
Failover Testing and Disaster Recovery
Implement comprehensive failover testing and recovery procedures:
// Failover testing and disaster recovery automation
const { MongoClient, ObjectId } = require('mongodb');

class FailoverTestingManager {
constructor(replicaSetUrl) {
this.replicaSetUrl = replicaSetUrl;
this.testResults = [];
}
async simulateNetworkPartition(duration = 30000) {
// Simulate network partition by stepping down primary
console.log('Starting network partition simulation...');
const client = new MongoClient(this.replicaSetUrl);
await client.connect();
try {
const startTime = Date.now();
// Record initial replica set status
const initialStatus = await this.getReplicaSetStatus(client);
console.log('Initial status:', {
primary: initialStatus.primary,
secondaries: initialStatus.secondaries.length
});
// Force primary step down (the command often drops the connection as the primary changes)
try {
  await client.db('admin').command({
    replSetStepDown: Math.ceil(duration / 1000),
    secondaryCatchUpPeriodSecs: 10
  });
} catch (error) {
  if (!/connection|network|closed/i.test(error.message)) throw error;
}
// Monitor failover process
const failoverResult = await this.monitorFailover(client, duration);
const testResult = {
testType: 'network_partition',
startTime: new Date(startTime),
duration: duration,
initialPrimary: initialStatus.primary,
failoverTime: failoverResult.failoverTime,
newPrimary: failoverResult.newPrimary,
dataConsistency: await this.verifyDataConsistency(client),
success: failoverResult.success
};
this.testResults.push(testResult);
return testResult;
} finally {
await client.close();
}
}
async simulateSecondaryFailure(secondaryHost) {
// Simulate secondary node failure
console.log(`Simulating failure of secondary: ${secondaryHost}`);
const client = new MongoClient(this.replicaSetUrl);
await client.connect();
try {
const startTime = Date.now();
const initialStatus = await this.getReplicaSetStatus(client);
// Simulate removing secondary from replica set
const config = await client.db('admin').command({ replSetGetConfig: 1 });
const targetMember = config.config.members.find(m => m.host.includes(secondaryHost));
if (!targetMember) {
throw new Error(`Secondary ${secondaryHost} not found in replica set`);
}
const updatedConfig = { ...config.config };
updatedConfig.version += 1;
updatedConfig.members = updatedConfig.members.filter(m => m._id !== targetMember._id);
await client.db('admin').command({ replSetReconfig: updatedConfig });
// Wait for configuration change to take effect
await this.waitForConfigurationChange(client, 30000);
// Test write operations during reduced redundancy
const writeTestResult = await this.testWriteOperations(client);
// Restore secondary after test period
setTimeout(async () => {
await this.restoreSecondary(client, targetMember);
}, 60000);
const testResult = {
testType: 'secondary_failure',
startTime: new Date(startTime),
failedSecondary: secondaryHost,
initialSecondaries: initialStatus.secondaries.length,
writeTestResult: writeTestResult,
success: writeTestResult.success
};
this.testResults.push(testResult);
return testResult;
} finally {
await client.close();
}
}
async testWriteOperations(client, testCount = 100) {
// Test write operations during failure scenarios
const testCollection = client.db('test').collection('failover_test');
const results = {
attempted: testCount,
successful: 0,
failed: 0,
errors: [],
averageLatency: 0,
success: false
};
const latencies = [];
for (let i = 0; i < testCount; i++) {
const startTime = Date.now();
try {
await testCollection.insertOne({
testId: i,
timestamp: new Date(),
data: `Test document ${i}`,
failoverTest: true
}, {
writeConcern: { w: 'majority', wtimeout: 5000 }
});
const latency = Date.now() - startTime;
latencies.push(latency);
results.successful++;
} catch (error) {
results.failed++;
results.errors.push({
testId: i,
error: error.message,
timestamp: new Date()
});
}
// Small delay between writes
await new Promise(resolve => setTimeout(resolve, 100));
}
if (latencies.length > 0) {
results.averageLatency = latencies.reduce((a, b) => a + b, 0) / latencies.length;
}
results.success = results.successful >= (testCount * 0.95); // 95% success rate
// Clean up test data
await testCollection.deleteMany({ failoverTest: true });
return results;
}
async monitorFailover(client, maxWaitTime) {
// Monitor replica set during failover
const startTime = Date.now();
let newPrimary = null;
let failoverTime = null;
while (Date.now() - startTime < maxWaitTime) {
try {
const status = await this.getReplicaSetStatus(client);
if (status.primary && status.primary !== 'No primary') {
newPrimary = status.primary;
failoverTime = Date.now() - startTime;
console.log(`New primary elected: ${newPrimary} (${failoverTime}ms)`);
break;
}
console.log('Waiting for new primary election...');
await new Promise(resolve => setTimeout(resolve, 1000));
} catch (error) {
console.log('Error during failover monitoring:', error.message);
await new Promise(resolve => setTimeout(resolve, 2000));
}
}
return {
success: newPrimary !== null,
newPrimary: newPrimary,
failoverTime: failoverTime
};
}
async getReplicaSetStatus(client) {
// Get current replica set status
try {
const status = await client.db('admin').command({ replSetGetStatus: 1 });
const primary = status.members.find(m => m.state === 1);
const secondaries = status.members.filter(m => m.state === 2);
const arbiters = status.members.filter(m => m.state === 7);
return {
primary: primary ? primary.name : 'No primary',
secondaries: secondaries.map(s => ({ name: s.name, health: s.health })),
arbiters: arbiters.map(a => ({ name: a.name, health: a.health })),
ok: status.ok
};
} catch (error) {
return {
error: error.message,
primary: 'Unknown',
secondaries: [],
arbiters: []
};
}
}
async verifyDataConsistency(client) {
// Verify data consistency across replica set
try {
// Insert test document with strong consistency
const testDoc = {
_id: new ObjectId(),
consistencyTest: true,
timestamp: new Date(),
randomValue: Math.random()
};
const testCollection = client.db('test').collection('consistency_test');
await testCollection.insertOne(testDoc, {
writeConcern: { w: 'majority', j: true }
});
// Wait for replication
await new Promise(resolve => setTimeout(resolve, 2000));
// Read from primary
const primaryResult = await testCollection.findOne(
  { _id: testDoc._id },
  { readPreference: 'primary' }
);
// Read from a secondary when available (a maxStalenessSeconds below the
// 90-second minimum would be rejected, so it is omitted here)
const secondaryResult = await testCollection.findOne(
  { _id: testDoc._id },
  { readPreference: 'secondaryPreferred' }
);
// Clean up
await testCollection.deleteOne({ _id: testDoc._id });
const consistent = primaryResult && secondaryResult &&
primaryResult.randomValue === secondaryResult.randomValue;
return {
consistent: consistent,
primaryResult: primaryResult ? 'found' : 'not found',
secondaryResult: secondaryResult ? 'found' : 'not found',
timestamp: new Date()
};
} catch (error) {
return {
consistent: false,
error: error.message,
timestamp: new Date()
};
}
}
async generateFailoverReport() {
// Generate comprehensive failover test report
if (this.testResults.length === 0) {
return { message: 'No failover tests have been run' };
}
const report = {
totalTests: this.testResults.length,
successfulTests: this.testResults.filter(t => t.success).length,
failedTests: this.testResults.filter(t => !t.success).length,
averageFailoverTime: 0,
testTypes: {},
consistency: {
passed: 0,
failed: 0
},
recommendations: []
};
// Calculate statistics
const failoverTimes = this.testResults
.filter(t => t.failoverTime)
.map(t => t.failoverTime);
if (failoverTimes.length > 0) {
report.averageFailoverTime = failoverTimes.reduce((a, b) => a + b, 0) / failoverTimes.length;
}
// Group by test type
this.testResults.forEach(result => {
if (!report.testTypes[result.testType]) {
report.testTypes[result.testType] = {
count: 0,
successful: 0,
failed: 0
};
}
report.testTypes[result.testType].count++;
if (result.success) {
report.testTypes[result.testType].successful++;
} else {
report.testTypes[result.testType].failed++;
}
});
// Consistency check summary
this.testResults.forEach(result => {
if (result.dataConsistency) {
if (result.dataConsistency.consistent) {
report.consistency.passed++;
} else {
report.consistency.failed++;
}
}
});
// Generate recommendations
if (report.averageFailoverTime > 30000) {
report.recommendations.push('Consider tuning election timeout settings for faster failover');
}
if (report.consistency.failed > 0) {
report.recommendations.push('Data consistency issues detected - review read/write concern settings');
}
if (report.failedTests > report.totalTests * 0.1) {
report.recommendations.push('High failure rate detected - review replica set configuration');
}
report.generatedAt = new Date();
return report;
}
// Utility methods
async waitForConfigurationChange(client, maxWait) {
const startTime = Date.now();
while (Date.now() - startTime < maxWait) {
try {
await client.db('admin').command({ replSetGetStatus: 1 });
return true;
} catch (error) {
await new Promise(resolve => setTimeout(resolve, 1000));
}
}
return false;
}
async restoreSecondary(client, memberConfig) {
try {
const config = await client.db('admin').command({ replSetGetConfig: 1 });
const updatedConfig = { ...config.config };
updatedConfig.version += 1;
updatedConfig.members.push(memberConfig);
await client.db('admin').command({ replSetReconfig: updatedConfig });
console.log('Secondary restored successfully');
} catch (error) {
console.error('Failed to restore secondary:', error.message);
}
}
}
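A drill might be wired up as follows. This is a sketch that assumes a disposable staging replica set (stepping down a production primary on purpose is rarely acceptable), with the FailoverTestingManager class above in scope:

async function runFailoverDrill() {
  const tester = new FailoverTestingManager(
    'mongodb://db1.example.com,db2.example.com,db3.example.com/?replicaSet=prodRS'
  );

  // Step the primary down and time the election (30-second window)
  const result = await tester.simulateNetworkPartition(30000);
  console.log(`Failover took ${result.failoverTime}ms; new primary: ${result.newPrimary}`);

  // Summarize every drill run so far, including consistency checks
  console.log(await tester.generateFailoverReport());
}

runFailoverDrill().catch(console.error);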
Advanced Replica Set Patterns
Geographic Distribution and Multi-Region Setup
Implement geographically distributed replica sets:
// Multi-region replica set configuration
const { MongoClient } = require('mongodb');

class GeographicReplicaSetManager {
constructor(globalReplicaSetUrl) {
this.globalReplicaSetUrl = globalReplicaSetUrl; // Used by the routing and monitoring methods below
this.multiRegionConfig = {
_id: 'globalRS',
version: 1,
members: [
// Primary region (US East)
{
_id: 0,
host: 'db-primary-us-east.example.com:27017',
priority: 10,
tags: {
region: 'us-east',
datacenter: 'dc1',
usage: 'primary'
}
},
{
_id: 1,
host: 'db-secondary-us-east.example.com:27017',
priority: 8,
tags: {
region: 'us-east',
datacenter: 'dc1',
usage: 'secondary'
}
},
// Secondary region (US West)
{
_id: 2,
host: 'db-secondary-us-west.example.com:27017',
priority: 6,
tags: {
region: 'us-west',
datacenter: 'dc2',
usage: 'secondary'
}
},
{
_id: 3,
host: 'db-secondary-us-west-2.example.com:27017',
priority: 5,
tags: {
region: 'us-west',
datacenter: 'dc2',
usage: 'analytics'
}
},
// Tertiary region (Europe)
{
_id: 4,
host: 'db-secondary-eu-west.example.com:27017',
priority: 4,
tags: {
region: 'eu-west',
datacenter: 'dc3',
usage: 'secondary'
}
},
// Arbiter for odd number voting
{
_id: 5,
host: 'arbiter-us-central.example.com:27017',
priority: 0,
arbiterOnly: true,
tags: {
region: 'us-central',
usage: 'arbiter'
}
}
],
settings: {
getLastErrorModes: {
'multiRegion': {
'region': 2 // Require writes to reach 2 different regions
},
'crossDatacenter': {
'datacenter': 2 // Require writes to reach 2 different datacenters
}
},
getLastErrorDefaults: {
w: 'multiRegion',
wtimeout: 10000
}
}
};
}
createRegionalReadPreferences() {
return {
// US East users - prefer local region
usEastUsers: {
mode: 'secondaryPreferred',
tagSets: [
{ region: 'us-east' },
{ region: 'us-west' },
{}
],
maxStalenessSeconds: 90
},
// US West users - prefer local region
usWestUsers: {
mode: 'secondaryPreferred',
tagSets: [
{ region: 'us-west' },
{ region: 'us-east' },
{}
],
maxStalenessSeconds: 90
},
// European users - prefer local region
europeanUsers: {
mode: 'secondaryPreferred',
tagSets: [
{ region: 'eu-west' },
{ region: 'us-east' },
{}
],
maxStalenessSeconds: 120
},
// Analytics workloads - dedicated secondaries
analytics: {
mode: 'secondary',
tagSets: [
{ usage: 'analytics' }
],
maxStalenessSeconds: 300
}
};
}
async routeRequestByRegion(clientRegion, operation, collection, data) {
const readPreferences = this.createRegionalReadPreferences();
const regionPreference = readPreferences[`${clientRegion}Users`] || readPreferences.usEastUsers;
// Create region-optimized connection
const client = new MongoClient(this.globalReplicaSetUrl, {
readPreference: regionPreference,
writeConcern: { w: 'multiRegion', wtimeout: 10000 }
});
try {
await client.connect();
const db = client.db();
const coll = db.collection(collection);
switch (operation.type) {
case 'read':
return await coll.find(data.query).toArray();
case 'write':
// Ensure cross-region durability for writes
return await coll.insertOne(data, {
writeConcern: { w: 'multiRegion', j: true, wtimeout: 15000 }
});
case 'update':
return await coll.updateOne(data.filter, data.update, {
writeConcern: { w: 'crossDatacenter', j: true, wtimeout: 12000 }
});
default:
throw new Error(`Unsupported operation: ${operation.type}`);
}
} finally {
await client.close();
}
}
async monitorCrossRegionLatency() {
// Monitor latency between regions
const regions = ['us-east', 'us-west', 'eu-west'];
const latencyResults = {};
for (const region of regions) {
try {
const startTime = Date.now();
// Connect with region-specific preference
const client = new MongoClient(this.globalReplicaSetUrl, {
readPreference: {
mode: 'secondary',
tagSets: [{ region: region }]
}
});
await client.connect();
// Perform test read
await client.db('test').collection('ping').findOne({});
const latency = Date.now() - startTime;
latencyResults[region] = {
latency: latency,
status: latency < 200 ? 'good' : latency < 500 ? 'acceptable' : 'poor'
};
await client.close();
} catch (error) {
latencyResults[region] = {
latency: null,
status: 'error',
error: error.message
};
}
}
return {
timestamp: new Date(),
regions: latencyResults,
averageLatency: Object.values(latencyResults)
.filter(r => r.latency)
.reduce((sum, r, _, arr) => sum + r.latency / arr.length, 0)
};
}
}
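Routing a request through the manager then looks like this. The region key must match one of the read preference names defined above ('usEast', 'usWest', or 'european'), and the hostnames are placeholders:

async function serveRegionalRead() {
  const geo = new GeographicReplicaSetManager(
    'mongodb://db-primary-us-east.example.com,db-secondary-eu-west.example.com/production?replicaSet=globalRS'
  );

  // European users read pending orders from their nearest tagged secondary
  const orders = await geo.routeRequestByRegion('european', { type: 'read' }, 'orders', {
    query: { status: 'pending' }
  });
  console.log(`Fetched ${orders.length} orders via eu-west-preferred routing`);
}

serveRegionalRead().catch(console.error);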
Rolling Maintenance and Zero-Downtime Updates
Implement maintenance procedures without service interruption:
// Zero-downtime maintenance manager
const { MongoClient } = require('mongodb');

class MaintenanceManager {
constructor(replicaSetUrl) {
this.replicaSetUrl = replicaSetUrl;
this.maintenanceLog = [];
}
async performRollingMaintenance(maintenanceConfig) {
// Perform rolling maintenance across replica set
console.log('Starting rolling maintenance:', maintenanceConfig);
const maintenanceSession = {
id: `maintenance_${Date.now()}`,
startTime: new Date(),
config: maintenanceConfig,
steps: [],
status: 'running'
};
try {
// Step 1: Perform maintenance on secondaries first
await this.maintainSecondaries(maintenanceSession);
// Step 2: Step down primary and maintain
await this.maintainPrimary(maintenanceSession);
// Step 3: Verify replica set health
await this.verifyPostMaintenanceHealth(maintenanceSession);
maintenanceSession.status = 'completed';
maintenanceSession.endTime = new Date();
} catch (error) {
maintenanceSession.status = 'failed';
maintenanceSession.error = error.message;
maintenanceSession.endTime = new Date();
throw error;
} finally {
this.maintenanceLog.push(maintenanceSession);
}
return maintenanceSession;
}
async maintainSecondaries(maintenanceSession) {
const client = new MongoClient(this.replicaSetUrl);
await client.connect();
try {
const status = await client.db('admin').command({ replSetGetStatus: 1 });
const secondaries = status.members.filter(m => m.state === 2);
for (const secondary of secondaries) {
console.log(`Maintaining secondary: ${secondary.name}`);
const stepStart = Date.now();
// Remove secondary from replica set temporarily
await this.removeSecondaryForMaintenance(client, secondary);
// Perform maintenance operations
await this.performMaintenanceOperations(secondary, maintenanceSession.config);
// Add secondary back to replica set
await this.addSecondaryAfterMaintenance(client, secondary);
// Wait for secondary to catch up
await this.waitForSecondaryCatchup(client, secondary.name);
maintenanceSession.steps.push({
type: 'secondary_maintenance',
member: secondary.name,
startTime: new Date(stepStart),
endTime: new Date(),
duration: Date.now() - stepStart,
success: true
});
console.log(`Secondary ${secondary.name} maintenance completed`);
}
} finally {
await client.close();
}
}
async maintainPrimary(maintenanceSession) {
const client = new MongoClient(this.replicaSetUrl);
await client.connect();
try {
const stepStart = Date.now();
// Get current primary
const status = await client.db('admin').command({ replSetGetStatus: 1 });
const primary = status.members.find(m => m.state === 1);
if (!primary) {
throw new Error('No primary found in replica set');
}
console.log(`Maintaining primary: ${primary.name}`);
// Step down primary to trigger election
await client.db('admin').command({
replSetStepDown: 300, // 5 minutes
secondaryCatchUpPeriodSecs: 30
});
// Wait for new primary to be elected
await this.waitForNewPrimary(client, primary.name);
// Perform maintenance on the stepped-down primary (now secondary)
await this.performMaintenanceOperations(primary, maintenanceSession.config);
// Wait for maintenance to complete and node to rejoin
await this.waitForNodeRejoin(client, primary.name);
maintenanceSession.steps.push({
type: 'primary_maintenance',
member: primary.name,
startTime: new Date(stepStart),
endTime: new Date(),
duration: Date.now() - stepStart,
success: true
});
console.log(`Primary ${primary.name} maintenance completed`);
} finally {
await client.close();
}
}
async performMaintenanceOperations(member, config) {
// Simulate maintenance operations
console.log(`Performing maintenance operations on ${member.name}`);
const operations = [];
if (config.operations.includes('system_update')) {
operations.push(this.simulateSystemUpdate(member));
}
if (config.operations.includes('mongodb_upgrade')) {
operations.push(this.simulateMongoDBUpgrade(member));
}
if (config.operations.includes('index_rebuild')) {
operations.push(this.simulateIndexRebuild(member));
}
if (config.operations.includes('disk_maintenance')) {
operations.push(this.simulateDiskMaintenance(member));
}
// Execute all maintenance operations
await Promise.all(operations);
console.log(`Maintenance operations completed for ${member.name}`);
}
async simulateSystemUpdate(member) {
console.log(`Applying system updates to ${member.name}`);
// Simulate system update time
await new Promise(resolve => setTimeout(resolve, 30000)); // 30 seconds
}
async simulateMongoDBUpgrade(member) {
console.log(`Upgrading MongoDB on ${member.name}`);
// Simulate MongoDB upgrade time
await new Promise(resolve => setTimeout(resolve, 60000)); // 1 minute
}
async simulateIndexRebuild(member) {
console.log(`Rebuilding indexes on ${member.name}`);
// Simulate index rebuild time
await new Promise(resolve => setTimeout(resolve, 120000)); // 2 minutes
}
async simulateDiskMaintenance(member) {
console.log(`Performing disk maintenance on ${member.name}`);
// Simulate disk maintenance time
await new Promise(resolve => setTimeout(resolve, 45000)); // 45 seconds
}
async waitForNewPrimary(client, oldPrimaryName, maxWait = 60000) {
const startTime = Date.now();
while (Date.now() - startTime < maxWait) {
try {
const status = await client.db('admin').command({ replSetGetStatus: 1 });
const primary = status.members.find(m => m.state === 1);
if (primary && primary.name !== oldPrimaryName) {
console.log(`New primary elected: ${primary.name}`);
return primary;
}
} catch (error) {
console.log('Waiting for primary election...', error.message);
}
await new Promise(resolve => setTimeout(resolve, 2000));
}
throw new Error('New primary not elected within timeout');
}
async waitForSecondaryCatchup(client, memberName, maxWait = 120000) {
const startTime = Date.now();
while (Date.now() - startTime < maxWait) {
try {
const status = await client.db('admin').command({ replSetGetStatus: 1 });
const member = status.members.find(m => m.name === memberName);
if (member && member.state === 2) { // Secondary state
const primary = status.members.find(m => m.state === 1);
if (primary) {
const lag = primary.optimeDate.getTime() - member.optimeDate.getTime();
if (lag < 10000) { // Less than 10 seconds lag
console.log(`${memberName} caught up (lag: ${lag}ms)`);
return true;
}
}
}
} catch (error) {
console.log(`Waiting for ${memberName} to catch up...`, error.message);
}
await new Promise(resolve => setTimeout(resolve, 5000));
}
throw new Error(`${memberName} failed to catch up within timeout`);
}
async verifyPostMaintenanceHealth(maintenanceSession) {
const client = new MongoClient(this.replicaSetUrl);
await client.connect();
try {
const healthCheck = {
timestamp: new Date(),
replicaSetStatus: null,
primaryElected: false,
allMembersHealthy: false,
replicationLag: null,
writeTest: null,
readTest: null
};
// Check replica set status
const status = await client.db('admin').command({ replSetGetStatus: 1 });
healthCheck.replicaSetStatus = status.ok === 1 ? 'healthy' : 'unhealthy';
// Check primary election
const primary = status.members.find(m => m.state === 1);
healthCheck.primaryElected = !!primary;
// Check member health
const unhealthyMembers = status.members.filter(m => m.health !== 1);
healthCheck.allMembersHealthy = unhealthyMembers.length === 0;
// Check replication lag
if (primary) {
const secondaries = status.members.filter(m => m.state === 2);
const maxLag = Math.max(...secondaries.map(s =>
primary.optimeDate.getTime() - s.optimeDate.getTime()
));
healthCheck.replicationLag = Math.round(maxLag / 1000); // seconds
}
// Test write operations
try {
await client.db('test').collection('maintenance_test').insertOne({
test: 'post_maintenance_write',
timestamp: new Date()
}, { writeConcern: { w: 'majority', wtimeout: 5000 } });
healthCheck.writeTest = 'passed';
} catch (error) {
healthCheck.writeTest = `failed: ${error.message}`;
}
// Test read operations
try {
await client.db('test').collection('maintenance_test').findOne({
test: 'post_maintenance_write'
});
healthCheck.readTest = 'passed';
// Clean up test document
await client.db('test').collection('maintenance_test').deleteOne({
test: 'post_maintenance_write'
});
} catch (error) {
healthCheck.readTest = `failed: ${error.message}`;
}
maintenanceSession.postMaintenanceHealth = healthCheck;
const isHealthy = healthCheck.replicaSetStatus === 'healthy' &&
healthCheck.primaryElected &&
healthCheck.allMembersHealthy &&
healthCheck.replicationLag < 30 &&
healthCheck.writeTest === 'passed' &&
healthCheck.readTest === 'passed';
if (!isHealthy) {
throw new Error(`Post-maintenance health check failed: ${JSON.stringify(healthCheck)}`);
}
console.log('Post-maintenance health check passed:', healthCheck);
return healthCheck;
} finally {
await client.close();
}
}
// Utility methods for maintenance operations
async removeSecondaryForMaintenance(client, secondary) {
// Temporarily remove secondary from replica set
console.log(`Removing ${secondary.name} for maintenance`);
// Implementation would remove member from config
}
async addSecondaryAfterMaintenance(client, secondary) {
// Add secondary back to replica set
console.log(`Adding ${secondary.name} back after maintenance`);
// Implementation would add member back to config
}
async waitForNodeRejoin(client, memberName, maxWait = 180000) {
// Wait for node to rejoin and become healthy
const startTime = Date.now();
while (Date.now() - startTime < maxWait) {
try {
const status = await client.db('admin').command({ replSetGetStatus: 1 });
const member = status.members.find(m => m.name === memberName);
if (member && (member.state === 1 || member.state === 2) && member.health === 1) {
console.log(`${memberName} rejoined as ${member.stateStr}`);
return true;
}
} catch (error) {
console.log(`Waiting for ${memberName} to rejoin...`, error.message);
}
await new Promise(resolve => setTimeout(resolve, 5000));
}
throw new Error(`${memberName} failed to rejoin within timeout`);
}
}
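Kicking off a maintenance window is then a single call. The operations list below matches the names the manager recognizes, and in practice this would be driven by a scheduler rather than invoked ad hoc:

async function runNightlyMaintenance() {
  const maintenance = new MaintenanceManager(
    'mongodb://db1.example.com,db2.example.com,db3.example.com/?replicaSet=prodRS'
  );

  const session = await maintenance.performRollingMaintenance({
    operations: ['system_update', 'index_rebuild']
  });

  console.log(`Maintenance ${session.id} finished with status: ${session.status}`);
  for (const step of session.steps) {
    console.log(`${step.type} on ${step.member}: ${step.duration}ms`);
  }
}

runNightlyMaintenance().catch(console.error);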
SQL-Style High Availability with QueryLeaf
QueryLeaf provides familiar SQL approaches to MongoDB replica set management:
-- QueryLeaf high availability operations with SQL-style syntax
-- Monitor replica set status
SELECT
member_name,
member_state,
member_health,
priority,
votes,
CASE member_state
WHEN 1 THEN 'PRIMARY'
WHEN 2 THEN 'SECONDARY'
WHEN 7 THEN 'ARBITER'
ELSE 'OTHER'
END as role_description
FROM REPLICA_SET_STATUS()
ORDER BY member_state, priority DESC;
-- Check replication lag across members
WITH replication_status AS (
SELECT
primary_optime,
member_name,
member_optime,
member_state,
EXTRACT(EPOCH FROM (primary_optime - member_optime)) as lag_seconds
FROM REPLICA_SET_STATUS()
WHERE member_state IN (1, 2) -- Primary and Secondary only
)
SELECT
member_name,
CASE
WHEN lag_seconds <= 1 THEN 'Excellent'
WHEN lag_seconds <= 5 THEN 'Good'
WHEN lag_seconds <= 30 THEN 'Acceptable'
ELSE 'Poor'
END as replication_health,
lag_seconds,
CASE
WHEN lag_seconds > 60 THEN 'CRITICAL: High replication lag'
WHEN lag_seconds > 30 THEN 'WARNING: Monitor replication lag'
ELSE 'OK'
END as alert_level
FROM replication_status
WHERE member_state = 2 -- Secondaries only
ORDER BY lag_seconds DESC;
-- High availability connection management
-- QueryLeaf automatically handles connection routing
SELECT
customer_id,
order_date,
total_amount,
status
FROM orders
WITH READ_PREFERENCE = 'secondaryPreferred'
WHERE order_date >= CURRENT_DATE - INTERVAL '7 days'
AND status = 'pending'
ORDER BY order_date DESC;
-- Critical writes with strong consistency
INSERT INTO financial_transactions (
account_id,
transaction_type,
amount,
timestamp,
reference_number
)
VALUES (
'12345',
'withdrawal',
500.00,
CURRENT_TIMESTAMP,
'TXN_' || EXTRACT(EPOCH FROM CURRENT_TIMESTAMP)
)
WITH WRITE_CONCERN = ('w=majority', 'j=true', 'wtimeout=10000');
-- Geographic read routing
SELECT
product_id,
name,
price,
inventory_count
FROM products
WITH READ_PREFERENCE = 'secondary',
TAG_SETS = '[{"region": "us-west"}, {"region": "us-east"}, {}]',
MAX_STALENESS = 90
WHERE category = 'electronics'
AND inventory_count > 0;
-- Multi-region write durability
UPDATE customer_profiles
SET last_login = CURRENT_TIMESTAMP,
login_count = login_count + 1
WHERE customer_id = @customer_id
WITH WRITE_CONCERN = ('w=multiRegion', 'j=true', 'wtimeout=15000');
-- Failover testing and monitoring
WITH failover_metrics AS (
SELECT
test_timestamp,
test_type,
failover_duration_ms,
success,
old_primary,
new_primary
FROM FAILOVER_TEST_RESULTS()
WHERE test_timestamp >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
test_type,
COUNT(*) as total_tests,
SUM(CASE WHEN success THEN 1 ELSE 0 END) as successful_tests,
AVG(failover_duration_ms) as avg_failover_time,
MIN(failover_duration_ms) as min_failover_time,
MAX(failover_duration_ms) as max_failover_time,
ROUND(
(SUM(CASE WHEN success THEN 1 ELSE 0 END)::FLOAT / COUNT(*)) * 100,
2
) as success_rate_percent
FROM failover_metrics
GROUP BY test_type
ORDER BY success_rate_percent DESC;
-- Maintenance scheduling and coordination
BEGIN;
-- Check replica set health before maintenance
IF EXISTS(
SELECT 1 FROM REPLICA_SET_STATUS()
WHERE member_health != 1
OR (member_state = 2 AND replication_lag_seconds > 30)
)
BEGIN
ROLLBACK;
RAISERROR('Replica set unhealthy - maintenance postponed', 16, 1);
RETURN;
END;
-- Schedule rolling maintenance
EXEC SCHEDULE_MAINTENANCE
@maintenance_type = 'rolling_update',
@operations = 'mongodb_upgrade,index_rebuild',
@start_time = '2025-09-10 02:00:00 UTC',
@max_duration_hours = 4,
@notification_endpoints = 'ops-team@example.com,slack-ops-channel';
COMMIT;
-- Performance monitoring across replica set members
SELECT
member_name,
member_type,
-- Connection metrics
active_connections,
available_connections,
connections_created_per_second,
-- Operation metrics
queries_per_second,
inserts_per_second,
updates_per_second,
deletes_per_second,
-- Resource utilization
cpu_utilization_percent,
memory_usage_mb,
disk_usage_percent,
network_io_mb_per_second,
-- Replica set specific metrics
replication_lag_seconds,
replication_batch_size,
-- Health indicators
CASE
WHEN cpu_utilization_percent > 90 THEN 'CPU_HIGH'
WHEN memory_usage_mb > memory_limit_mb * 0.9 THEN 'MEMORY_HIGH'
WHEN disk_usage_percent > 85 THEN 'DISK_HIGH'
WHEN replication_lag_seconds > 60 THEN 'REPLICATION_LAG'
WHEN active_connections > available_connections * 0.8 THEN 'CONNECTION_HIGH'
ELSE 'HEALTHY'
END as health_status
FROM REPLICA_SET_PERFORMANCE_METRICS()
WHERE sample_timestamp >= CURRENT_TIMESTAMP - INTERVAL '5 minutes'
ORDER BY
CASE member_type WHEN 'PRIMARY' THEN 1 WHEN 'SECONDARY' THEN 2 ELSE 3 END,
member_name;
-- Automatic failover and recovery tracking
WITH failover_events AS (
SELECT
event_timestamp,
event_type,
old_primary,
new_primary,
cause,
recovery_time_seconds,
data_loss_detected,
applications_affected
FROM REPLICA_SET_EVENT_LOG
WHERE event_type IN ('failover', 'stepdown', 'election')
AND event_timestamp >= CURRENT_DATE - INTERVAL '90 days'
)
SELECT
DATE_TRUNC('week', event_timestamp) as week_start,
COUNT(*) as total_events,
SUM(CASE WHEN event_type = 'failover' THEN 1 ELSE 0 END) as failover_count,
AVG(recovery_time_seconds) as avg_recovery_time,
SUM(CASE WHEN data_loss_detected THEN 1 ELSE 0 END) as data_loss_events,
STRING_AGG(DISTINCT cause, ', ') as failure_causes,
-- Calculate availability
ROUND(
(1 - (SUM(recovery_time_seconds) / (7 * 24 * 3600))) * 100,
4
) as weekly_availability_percent
FROM failover_events
GROUP BY DATE_TRUNC('week', event_timestamp)
ORDER BY week_start DESC;
-- QueryLeaf provides comprehensive replica set management:
-- 1. Automatic connection routing based on read preferences
-- 2. Write concern enforcement for data durability
-- 3. Geographic distribution with tag-based routing
-- 4. Built-in failover testing and monitoring
-- 5. Maintenance coordination and scheduling
-- 6. Performance monitoring across all replica set members
-- 7. SQL-familiar syntax for all high availability operations
Best Practices for MongoDB High Availability
Replica Set Configuration Guidelines
Essential practices for production replica sets, with a minimal configuration sketch after the list:
- Odd Number of Voting Members: Use odd numbers (3, 5, 7) to prevent election ties
- Geographic Distribution: Spread members across availability zones or regions
- Appropriate Member Types: Use arbiters judiciously for voting without data storage
- Priority Settings: Configure priorities to influence primary election preference
- Write Concerns: Choose appropriate write concerns balancing durability and performance
- Read Preferences: Distribute read load while maintaining consistency requirements
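A minimal three-member configuration reflecting these guidelines, as it might be issued from mongosh (hostnames and tag values are placeholders):

rs.initiate({
  _id: 'prodRS',
  members: [
    // Three voting, data-bearing members: an odd count prevents election ties
    { _id: 0, host: 'db-a.example.com:27017', priority: 2, tags: { dc: 'dc1' } },
    { _id: 1, host: 'db-b.example.com:27017', priority: 1, tags: { dc: 'dc2' } },
    { _id: 2, host: 'db-c.example.com:27017', priority: 1, tags: { dc: 'dc3' } }
  ]
})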
Monitoring and Alerting
Implement comprehensive monitoring for replica sets:
- Health Monitoring: Track member health, state, and connectivity
- Replication Lag: Monitor and alert on excessive replication lag (see the polling sketch after this list)
- Performance Metrics: Track throughput, latency, and resource utilization
- Failover Detection: Automated detection and response to failover events
- Capacity Planning: Monitor growth trends and capacity requirements
- Security Monitoring: Track authentication failures and unauthorized access
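As a starting point for lag alerting, a polling loop built on the checkReplicationLag() helper shown earlier can feed whatever alerting channel you already use; the threshold and interval below are illustrative:

function watchReplicationLag(consistencyManager, thresholdSeconds = 30) {
  // Poll every 15 seconds and surface a warning when lag breaches the threshold
  return setInterval(async () => {
    const lag = await consistencyManager.checkReplicationLag();
    if (lag.error) {
      console.error('Replication lag check failed:', lag.error);
    } else if (lag.maxLagSeconds > thresholdSeconds) {
      console.warn(
        `Replication lag ${lag.maxLagSeconds}s exceeds ${thresholdSeconds}s (primary: ${lag.primary})`
      );
    }
  }, 15000);
}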
Conclusion
MongoDB replica sets provide enterprise-grade high availability with automatic failover, distributed consensus, and flexible consistency controls. Unlike traditional database clustering solutions that require complex setup and manual intervention, MongoDB replica sets deliver robust availability features as core database functionality.
Key high availability benefits include:
- Automatic Failover: Transparent primary election and failover without manual intervention
- Data Redundancy: Multiple synchronized copies ensure data protection and availability
- Geographic Distribution: Support for multi-region deployments with local read performance
- Flexible Consistency: Tunable read and write concerns to balance performance and consistency
- Zero-Downtime Maintenance: Rolling updates and maintenance without service interruption
Whether you're building mission-critical applications, global platforms, or systems requiring 99.9%+ availability, MongoDB replica sets with QueryLeaf's familiar SQL interface provide the foundation for robust, highly available database infrastructure. This combination enables you to implement sophisticated availability patterns while preserving familiar administration and query approaches.
QueryLeaf Integration: QueryLeaf automatically manages MongoDB replica set connections, read/write routing, and failover handling while providing SQL-familiar syntax for high availability operations. Complex replica set management, geographic distribution, and consistency controls are seamlessly handled through familiar SQL patterns, making enterprise-grade availability both powerful and accessible.
The integration of automatic high availability with SQL-style administration makes MongoDB an ideal platform for applications requiring both robust availability guarantees and familiar database management patterns, ensuring your high availability strategy remains both effective and maintainable as it scales and evolves.