MongoDB Aggregation Framework Optimization and Performance Tuning: Advanced Pipeline Design with SQL-Style Query Performance

Modern analytics workloads require sophisticated data processing pipelines that can handle complex transformations and aggregations across large datasets efficiently. Traditional SQL approaches often struggle with nested data structures, multi-stage transformations, and the performance overhead of the multiple query roundtrips that complex analytics workflows demand.

MongoDB's Aggregation Framework provides a powerful pipeline-based approach that enables complex data transformations and analytics in a single, optimized operation. Unlike traditional SQL aggregation that requires multiple queries or complex subqueries, MongoDB aggregations can perform sophisticated multi-stage processing with intelligent optimization and index utilization.
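For example, a single pipeline can filter, group, and sort in one server-side pass. The following minimal sketch uses an illustrative orders collection and field names rather than the schema from the examples below:

// Minimal illustrative pipeline - collection and field names are hypothetical
const monthlyRevenue = await db.collection('orders').aggregate([
  { $match: { status: 'completed' } },   // filter early so an index can be used
  { $group: {                            // aggregate per calendar month
      _id: { $dateToString: { format: '%Y-%m', date: '$createdAt' } },
      revenue: { $sum: '$totalAmount' },
      orders: { $sum: 1 }
  } },
  { $sort: { _id: 1 } }                  // sort the already-reduced result
]).toArray();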

The Traditional Analytics Performance Challenge

Traditional approaches to complex data aggregation and analytics have significant performance and architectural limitations:

-- Traditional SQL approach - multiple queries and complex joins

-- PostgreSQL complex analytics query with performance challenges
WITH user_segments AS (
  SELECT 
    user_id,
    email,
    registration_date,
    subscription_tier,

    -- User activity aggregation (expensive subquery)
    (SELECT COUNT(*) FROM user_activities ua WHERE ua.user_id = u.user_id) as total_activities,
    (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.user_id) as total_orders,
    (SELECT COALESCE(SUM(o.total_amount), 0) FROM orders o WHERE o.user_id = u.user_id) as lifetime_value,

    -- Recent activity indicators (more expensive subqueries)
    (SELECT COUNT(*) FROM user_activities ua 
     WHERE ua.user_id = u.user_id 
       AND ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') as recent_activities,
    (SELECT COUNT(*) FROM orders o 
     WHERE o.user_id = u.user_id 
       AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') as recent_orders,

    -- Engagement scoring (complex calculation)
    CASE 
      WHEN (SELECT COUNT(*) FROM user_activities ua WHERE ua.user_id = u.user_id AND ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days') > 10 THEN 'high'
      WHEN (SELECT COUNT(*) FROM user_activities ua WHERE ua.user_id = u.user_id AND ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') > 5 THEN 'medium'
      ELSE 'low'
    END as engagement_level

  FROM users u
  WHERE u.status = 'active'
),

order_analytics AS (
  SELECT 
    o.user_id,
    COUNT(*) as order_count,
    SUM(o.total_amount) as total_spent,
    AVG(o.total_amount) as avg_order_value,
    MAX(o.created_at) as last_order_date,

    -- Product category analysis (expensive join)
    (SELECT string_agg(DISTINCT p.category, ',') 
     FROM order_items oi 
     JOIN products p ON oi.product_id = p.product_id 
     WHERE oi.order_id = o.order_id) as purchased_categories,

    -- Time-based patterns (complex calculations)
    EXTRACT(DOW FROM o.created_at) as order_day_of_week,
    EXTRACT(HOUR FROM o.created_at) as order_hour,

    -- Seasonality analysis
    CASE 
      WHEN EXTRACT(MONTH FROM o.created_at) IN (12, 1, 2) THEN 'winter'
      WHEN EXTRACT(MONTH FROM o.created_at) IN (3, 4, 5) THEN 'spring'
      WHEN EXTRACT(MONTH FROM o.created_at) IN (6, 7, 8) THEN 'summer'
      ELSE 'fall'
    END as season

  FROM orders o
  WHERE o.status = 'completed'
    AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 year'
  GROUP BY o.user_id, EXTRACT(DOW FROM o.created_at), EXTRACT(HOUR FROM o.created_at),
    CASE 
      WHEN EXTRACT(MONTH FROM o.created_at) IN (12, 1, 2) THEN 'winter'
      WHEN EXTRACT(MONTH FROM o.created_at) IN (3, 4, 5) THEN 'spring'  
      WHEN EXTRACT(MONTH FROM o.created_at) IN (6, 7, 8) THEN 'summer'
      ELSE 'fall'
    END
),

product_preferences AS (
  -- Complex product affinity analysis (the window function must be computed in a
  -- derived table before it can be aggregated - PostgreSQL rejects AVG(LAG(...)))
  SELECT 
    user_id,
    category,
    COUNT(*) as category_purchases,
    SUM(line_total) as category_spend,

    -- Preference scoring
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY COUNT(*) DESC) as category_rank,

    -- Purchase timing patterns
    AVG(days_since_prev_purchase) as avg_days_between_category_purchases

  FROM (
    SELECT 
      o.user_id,
      p.category,
      oi.quantity * oi.unit_price as line_total,
      EXTRACT(EPOCH FROM (o.created_at - LAG(o.created_at) OVER (PARTITION BY o.user_id, p.category ORDER BY o.created_at))) / 86400 as days_since_prev_purchase
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    JOIN products p ON oi.product_id = p.product_id
    WHERE o.status = 'completed'
  ) category_lines
  GROUP BY user_id, category
),

final_analytics AS (
  SELECT 
    us.user_id,
    us.email,
    us.subscription_tier,
    us.total_activities,
    us.total_orders,
    us.lifetime_value,
    us.engagement_level,

    -- Order analytics
    COALESCE(oa.order_count, 0) as recent_order_count,
    COALESCE(oa.total_spent, 0) as recent_total_spent,
    COALESCE(oa.avg_order_value, 0) as recent_avg_order_value,

    -- Product preferences (expensive array aggregation)
    ARRAY(
      SELECT pp.category 
      FROM product_preferences pp 
      WHERE pp.user_id = us.user_id 
        AND pp.category_rank <= 3
      ORDER BY pp.category_rank
    ) as top_product_categories,

    -- Customer lifetime value prediction (complex calculation)
    CASE
      WHEN us.lifetime_value > 1000 AND us.recent_orders > 2 THEN us.lifetime_value * 1.2
      WHEN us.lifetime_value > 500 AND us.recent_activities > 10 THEN us.lifetime_value * 1.1
      ELSE us.lifetime_value
    END as predicted_ltv,

    -- Churn risk assessment
    CASE
      WHEN us.recent_activities = 0 AND oa.last_order_date < CURRENT_TIMESTAMP - INTERVAL '90 days' THEN 'high'
      WHEN us.recent_activities < 5 AND oa.last_order_date < CURRENT_TIMESTAMP - INTERVAL '45 days' THEN 'medium'
      ELSE 'low'
    END as churn_risk,

    -- Segmentation
    CASE
      WHEN us.lifetime_value > 1000 AND us.engagement_level = 'high' THEN 'vip'
      WHEN us.lifetime_value > 500 OR us.engagement_level = 'high' THEN 'loyal'
      WHEN us.total_orders > 0 THEN 'customer'
      ELSE 'prospect'
    END as user_segment

  FROM user_segments us
  LEFT JOIN order_analytics oa ON us.user_id = oa.user_id
)

SELECT *
FROM final_analytics
ORDER BY predicted_ltv DESC, engagement_level DESC;

-- Problems with traditional SQL aggregation:
-- 1. Multiple expensive subqueries for each user
-- 2. Complex joins across many tables with poor performance
-- 3. Difficult to optimize with multiple aggregation layers
-- 4. Limited support for complex nested data transformations
-- 5. Poor performance with large datasets due to multiple passes
-- 6. Complex window functions with high memory usage
-- 7. Difficulty handling semi-structured data efficiently
-- 8. Limited parallelization opportunities
-- 9. Complex query plans that are hard to optimize
-- 10. High resource usage for multi-stage analytics

-- MySQL approach (even more limited)
SELECT 
  u.user_id,
  u.email,
  u.subscription_tier,
  COUNT(DISTINCT ua.activity_id) as total_activities,
  COUNT(DISTINCT o.order_id) as total_orders,
  COALESCE(SUM(o.total_amount), 0) as lifetime_value,

  -- Limited aggregation capabilities
  CASE 
    WHEN COUNT(DISTINCT CASE WHEN ua.created_at >= DATE_SUB(NOW(), INTERVAL 7 DAY) THEN ua.activity_id END) > 10 THEN 'high'
    WHEN COUNT(DISTINCT CASE WHEN ua.created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY) THEN ua.activity_id END) > 5 THEN 'medium'
    ELSE 'low'
  END as engagement_level,

  -- Basic JSON aggregation (limited functionality - JSON_ARRAYAGG does not
  -- support DISTINCT, so duplicate categories must be filtered elsewhere)
  JSON_ARRAYAGG(p.category) as purchased_categories

FROM users u
LEFT JOIN user_activities ua ON u.user_id = ua.user_id
LEFT JOIN orders o ON u.user_id = o.user_id AND o.status = 'completed'
LEFT JOIN order_items oi ON o.order_id = oi.order_id
LEFT JOIN products p ON oi.product_id = p.product_id
WHERE u.status = 'active'
GROUP BY u.user_id, u.email, u.subscription_tier;

-- MySQL limitations:
-- - Very limited JSON and array processing capabilities
-- - Poor window function support in older versions
-- - Basic aggregation functions with limited customization
-- - No sophisticated data transformation capabilities
-- - Limited support for complex analytical queries
-- - Poor performance with large result sets
-- - Minimal support for nested data structures

MongoDB's Aggregation Framework addresses these limitations with optimized, pipeline-based analytics:

// MongoDB Aggregation Framework - optimized pipeline-based analytics
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('analytics_platform'); // call client.connect() before running pipelines

// Advanced aggregation pipeline optimization strategies
class MongoAggregationOptimizer {
  constructor(db) {
    this.db = db;
    this.pipelineStats = new Map();
    this.indexRecommendations = [];
  }

  async optimizeUserAnalyticsPipeline() {
    console.log('Running optimized user analytics aggregation pipeline...');

    const users = this.db.collection('users');

    // Highly optimized aggregation pipeline
    const pipeline = [
      // Stage 1: Initial filtering - leverage indexes early
      {
        $match: {
          status: 'active',
          registrationDate: { 
            $gte: new Date(Date.now() - 365 * 24 * 60 * 60 * 1000) 
          }
        }
      },

      // Stage 2: Early projection to reduce document size
      {
        $project: {
          _id: 1,
          email: 1,
          subscriptionTier: 1,
          registrationDate: 1,
          lastLoginAt: 1,
          preferences: 1
        }
      },

      // Stage 3: Lookup user activities with optimized pipeline
      {
        $lookup: {
          from: 'user_activities',
          let: { userId: '$_id' },
          pipeline: [
            {
              $match: {
                $expr: { $eq: ['$userId', '$$userId'] },
                createdAt: { 
                  $gte: new Date(Date.now() - 365 * 24 * 60 * 60 * 1000) 
                }
              }
            },
            {
              $group: {
                _id: null,
                totalActivities: { $sum: 1 },
                recentActivities: {
                  $sum: {
                    $cond: {
                      if: { 
                        $gte: ['$createdAt', new Date(Date.now() - 30 * 24 * 60 * 60 * 1000)] 
                      },
                      then: 1,
                      else: 0
                    }
                  }
                },
                weeklyActivities: {
                  $sum: {
                    $cond: {
                      if: { 
                        $gte: ['$createdAt', new Date(Date.now() - 7 * 24 * 60 * 60 * 1000)] 
                      },
                      then: 1,
                      else: 0
                    }
                  }
                },
                activityTypes: { $addToSet: '$activityType' },
                lastActivity: { $max: '$createdAt' },
                avgSessionDuration: { $avg: '$sessionDuration' }
              }
            }
          ],
          as: 'activityStats'
        }
      },

      // Stage 4: Lookup order data with aggregated calculations
      {
        $lookup: {
          from: 'orders',
          let: { userId: '$_id' },
          pipeline: [
            {
              $match: {
                $expr: { $eq: ['$userId', '$$userId'] },
                status: 'completed',
                createdAt: { 
                  $gte: new Date(Date.now() - 365 * 24 * 60 * 60 * 1000) 
                }
              }
            },
            {
              $group: {
                _id: null,
                totalOrders: { $sum: 1 },
                lifetimeValue: { $sum: '$totalAmount' },
                avgOrderValue: { $avg: '$totalAmount' },
                lastOrderDate: { $max: '$createdAt' },
                recentOrders: {
                  $sum: {
                    $cond: {
                      if: { 
                        $gte: ['$createdAt', new Date(Date.now() - 30 * 24 * 60 * 60 * 1000)] 
                      },
                      then: 1,
                      else: 0
                    }
                  }
                },
                recentSpend: {
                  $sum: {
                    $cond: {
                      if: { 
                        $gte: ['$createdAt', new Date(Date.now() - 30 * 24 * 60 * 60 * 1000)] 
                      },
                      then: '$totalAmount',
                      else: 0
                    }
                  }
                },
                orderDaysOfWeek: { $push: { $dayOfWeek: '$createdAt' } },
                orderHours: { $push: { $hour: '$createdAt' } },
                seasonality: {
                  $push: {
                    $switch: {
                      branches: [
                        { case: { $in: [{ $month: '$createdAt' }, [12, 1, 2]] }, then: 'winter' },
                        { case: { $in: [{ $month: '$createdAt' }, [3, 4, 5]] }, then: 'spring' },
                        { case: { $in: [{ $month: '$createdAt' }, [6, 7, 8]] }, then: 'summer' }
                      ],
                      default: 'fall'
                    }
                  }
                }
              }
            }
          ],
          as: 'orderStats'
        }
      },

      // Stage 5: Product preference analysis
      {
        $lookup: {
          from: 'orders',
          let: { userId: '$_id' },
          pipeline: [
            {
              $match: {
                $expr: { $eq: ['$userId', '$$userId'] },
                status: 'completed'
              }
            },
            {
              $unwind: '$items'
            },
            {
              $lookup: {
                from: 'products',
                localField: 'items.productId',
                foreignField: '_id',
                as: 'product'
              }
            },
            {
              $unwind: '$product'
            },
            {
              $group: {
                _id: '$product.category',
                categoryPurchases: { $sum: 1 },
                categorySpend: { $sum: '$items.totalPrice' },
                firstPurchase: { $min: '$createdAt' },
                lastPurchase: { $max: '$createdAt' }
              }
            },
            {
              // Average gap between purchases within the category, in days
              // (accumulators like $min cannot be nested inside $avg in a $group)
              $addFields: {
                avgDaysBetweenPurchases: {
                  $cond: {
                    if: { $gt: ['$categoryPurchases', 1] },
                    then: {
                      $divide: [
                        { $divide: [{ $subtract: ['$lastPurchase', '$firstPurchase'] }, 86400000] }, // milliseconds to days
                        { $subtract: ['$categoryPurchases', 1] }
                      ]
                    },
                    else: null
                  }
                }
              }
            },
            {
              $sort: { categoryPurchases: -1 }
            },
            {
              $limit: 5 // Top 5 categories only
            },
            {
              $group: {
                _id: null,
                topCategories: {
                  $push: {
                    category: '$_id',
                    purchases: '$categoryPurchases',
                    spend: '$categorySpend',
                    avgDaysBetween: '$avgDaysBetweenPurchases'
                  }
                }
              }
            }
          ],
          as: 'productPreferences'
        }
      },

      // Stage 6: Flatten and calculate derived metrics
      {
        $addFields: {
          // Extract activity stats
          activityStats: { $arrayElemAt: ['$activityStats', 0] },
          orderStats: { $arrayElemAt: ['$orderStats', 0] },
          productPreferences: { $arrayElemAt: ['$productPreferences', 0] }
        }
      },

      // Stage 7: Advanced calculated fields and scoring
      {
        $addFields: {
          // Engagement scoring
          engagementScore: {
            $add: [
              { $multiply: [{ $ifNull: ['$activityStats.weeklyActivities', 0] }, 2] },
              { $multiply: [{ $ifNull: ['$activityStats.recentActivities', 0] }, 1] },
              { $multiply: [{ $ifNull: ['$orderStats.recentOrders', 0] }, 5] }
            ]
          },

          // Engagement level classification
          engagementLevel: {
            $switch: {
              branches: [
                {
                  case: { $gt: [{ $ifNull: ['$activityStats.weeklyActivities', 0] }, 10] },
                  then: 'high'
                },
                {
                  case: { $gt: [{ $ifNull: ['$activityStats.recentActivities', 0] }, 5] },
                  then: 'medium'
                }
              ],
              default: 'low'
            }
          },

          // Customer lifetime value prediction
          predictedLTV: {
            $switch: {
              branches: [
                {
                  case: {
                    $and: [
                      { $gt: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 1000] },
                      { $gt: [{ $ifNull: ['$orderStats.recentOrders', 0] }, 2] }
                    ]
                  },
                  then: { $multiply: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 1.2] }
                },
                {
                  case: {
                    $and: [
                      { $gt: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 500] },
                      { $gt: [{ $ifNull: ['$activityStats.recentActivities', 0] }, 10] }
                    ]
                  },
                  then: { $multiply: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 1.1] }
                }
              ],
              default: { $ifNull: ['$orderStats.lifetimeValue', 0] }
            }
          },

          // Churn risk assessment
          churnRisk: {
            $switch: {
              branches: [
                {
                  case: {
                    $and: [
                      { $eq: [{ $ifNull: ['$activityStats.recentActivities', 0] }, 0] },
                      {
                        $lt: [
                          { $ifNull: ['$orderStats.lastOrderDate', new Date(0)] },
                          new Date(Date.now() - 90 * 24 * 60 * 60 * 1000)
                        ]
                      }
                    ]
                  },
                  then: 'high'
                },
                {
                  case: {
                    $and: [
                      { $lt: [{ $ifNull: ['$activityStats.recentActivities', 0] }, 5] },
                      {
                        $lt: [
                          { $ifNull: ['$orderStats.lastOrderDate', new Date(0)] },
                          new Date(Date.now() - 45 * 24 * 60 * 60 * 1000)
                        ]
                      }
                    ]
                  },
                  then: 'medium'
                }
              ],
              default: 'low'
            }
          }
        }
      },

      // Stage 7b: Segmentation and behavior patterns - a separate stage so the
      // engagementLevel computed above is visible (fields added in the same
      // $addFields stage cannot reference each other)
      {
        $addFields: {
          // User segmentation
          userSegment: {
            $switch: {
              branches: [
                {
                  case: {
                    $and: [
                      { $gt: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 1000] },
                      { $eq: ['$engagementLevel', 'high'] }
                    ]
                  },
                  then: 'vip'
                },
                {
                  case: {
                    $or: [
                      { $gt: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 500] },
                      { $eq: ['$engagementLevel', 'high'] }
                    ]
                  },
                  then: 'loyal'
                },
                {
                  case: { $gt: [{ $ifNull: ['$orderStats.totalOrders', 0] }, 0] },
                  then: 'customer'
                }
              ],
              default: 'prospect'
            }
          },

          // Behavioral patterns
          behaviorPattern: {
            $let: {
              vars: {
                dayOfWeekMode: {
                  // Pick the most frequent order day (the mode), not just the first element
                  $reduce: {
                    input: {
                      $map: {
                        input: { $range: [1, 8] },
                        as: 'day',
                        in: {
                          day: '$$day',
                          count: {
                            $size: {
                              $filter: {
                                input: { $ifNull: ['$orderStats.orderDaysOfWeek', []] },
                                cond: { $eq: ['$$this', '$$day'] }
                              }
                            }
                          }
                        }
                      }
                    },
                    initialValue: { day: null, count: -1 },
                    in: {
                      $cond: [{ $gt: ['$$this.count', '$$value.count'] }, '$$this', '$$value']
                    }
                  }
                }
              },
              in: {
                preferredOrderDay: '$$dayOfWeekMode.day',
                orderFrequency: {
                  $cond: {
                    if: { $gt: [{ $ifNull: ['$orderStats.totalOrders', 0] }, 1] },
                    then: {
                      $divide: [
                        365,
                        {
                          $divide: [
                            { 
                              $subtract: [
                                { $ifNull: ['$orderStats.lastOrderDate', new Date()] },
                                '$registrationDate'
                              ] 
                            },
                            86400000
                          ]
                        }
                      ]
                    },
                    else: 0
                  }
                }
              }
            }
          }
        }
      },

      // Stage 8: Final projection with optimized field selection
      {
        $project: {
          _id: 1,
          email: 1,
          subscriptionTier: 1,
          registrationDate: 1,

          // Activity metrics
          totalActivities: { $ifNull: ['$activityStats.totalActivities', 0] },
          recentActivities: { $ifNull: ['$activityStats.recentActivities', 0] },
          weeklyActivities: { $ifNull: ['$activityStats.weeklyActivities', 0] },
          activityTypes: { $ifNull: ['$activityStats.activityTypes', []] },
          lastActivity: '$activityStats.lastActivity',
          avgSessionDuration: { $round: [{ $ifNull: ['$activityStats.avgSessionDuration', 0] }, 2] },

          // Order metrics
          totalOrders: { $ifNull: ['$orderStats.totalOrders', 0] },
          lifetimeValue: { $round: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 2] },
          avgOrderValue: { $round: [{ $ifNull: ['$orderStats.avgOrderValue', 0] }, 2] },
          lastOrderDate: '$orderStats.lastOrderDate',
          recentOrders: { $ifNull: ['$orderStats.recentOrders', 0] },
          recentSpend: { $round: [{ $ifNull: ['$orderStats.recentSpend', 0] }, 2] },

          // Product preferences
          topProductCategories: { 
            $ifNull: ['$productPreferences.topCategories', []] 
          },

          // Calculated metrics
          engagementScore: { $round: ['$engagementScore', 0] },
          engagementLevel: 1,
          predictedLTV: { $round: ['$predictedLTV', 2] },
          churnRisk: 1,
          userSegment: 1,
          behaviorPattern: 1,

          // Performance indicators
          isHighValue: { $gte: [{ $ifNull: ['$orderStats.lifetimeValue', 0] }, 1000] },
          isRecentlyActive: { 
            $gte: [{ $ifNull: ['$activityStats.recentActivities', 0] }, 5] 
          },
          isAtRisk: { $eq: ['$churnRisk', 'high'] },

          // Days since last activity/order
          daysSinceLastActivity: {
            $cond: {
              if: { $ne: ['$activityStats.lastActivity', null] },
              then: {
                $divide: [
                  { $subtract: [new Date(), '$activityStats.lastActivity'] },
                  86400000
                ]
              },
              else: 999
            }
          },
          daysSinceLastOrder: {
            $cond: {
              if: { $ne: ['$orderStats.lastOrderDate', null] },
              then: {
                $divide: [
                  { $subtract: [new Date(), '$orderStats.lastOrderDate'] },
                  86400000
                ]
              },
              else: 999
            }
          }
        }
      },

      // Stage 9: Sorting for optimal performance
      {
        $sort: {
          predictedLTV: -1,
          engagementScore: -1,
          lastActivity: -1
        }
      },

      // Stage 10: Optional limit for performance
      {
        $limit: 10000
      }
    ];

    // Execute pipeline with performance tracking
    const startTime = Date.now();
    const results = await users.aggregate(pipeline).toArray();
    const executionTime = Date.now() - startTime;

    console.log(`Aggregation completed in ${executionTime}ms, ${results.length} results`);

    // Track pipeline performance
    this.pipelineStats.set('userAnalytics', {
      executionTime,
      resultCount: results.length,
      pipelineStages: pipeline.length,
      timestamp: new Date()
    });

    return results;
  }

  async optimizeProductAnalyticsPipeline() {
    console.log('Running optimized product analytics aggregation pipeline...');

    const orders = this.db.collection('orders');

    const pipeline = [
      // Stage 1: Filter completed orders from last year
      {
        $match: {
          status: 'completed',
          createdAt: { 
            $gte: new Date(Date.now() - 365 * 24 * 60 * 60 * 1000) 
          }
        }
      },

      // Stage 2: Unwind order items for product-level analysis
      {
        $unwind: '$items'
      },

      // Stage 3: Lookup product details
      {
        $lookup: {
          from: 'products',
          localField: 'items.productId',
          foreignField: '_id',
          as: 'product'
        }
      },

      // Stage 4: Unwind product array
      {
        $unwind: '$product'
      },

      // Stage 5: Add time-based fields for analysis
      {
        $addFields: {
          orderMonth: { $month: '$createdAt' },
          orderDayOfWeek: { $dayOfWeek: '$createdAt' },
          orderHour: { $hour: '$createdAt' },
          season: {
            $switch: {
              branches: [
                { case: { $in: [{ $month: '$createdAt' }, [12, 1, 2]] }, then: 'winter' },
                { case: { $in: [{ $month: '$createdAt' }, [3, 4, 5]] }, then: 'spring' },
                { case: { $in: [{ $month: '$createdAt' }, [6, 7, 8]] }, then: 'summer' }
              ],
              default: 'fall'
            }
          },
          revenue: '$items.totalPrice',
          profit: {
            $subtract: ['$items.totalPrice', { $multiply: ['$items.quantity', '$product.cost'] }]
          },
          profitMargin: {
            $cond: {
              if: { $gt: ['$items.totalPrice', 0] },
              then: {
                $multiply: [
                  {
                    $divide: [
                      { $subtract: ['$items.totalPrice', { $multiply: ['$items.quantity', '$product.cost'] }] },
                      '$items.totalPrice'
                    ]
                  },
                  100
                ]
              },
              else: 0
            }
          }
        }
      },

      // Stage 6: Group by product for comprehensive analytics
      {
        $group: {
          _id: '$items.productId',
          productName: { $first: '$product.name' },
          category: { $first: '$product.category' },
          price: { $first: '$product.price' },
          cost: { $first: '$product.cost' },

          // Volume metrics
          totalSold: { $sum: '$items.quantity' },
          totalOrders: { $sum: 1 },
          uniqueCustomers: { $addToSet: '$userId' },

          // Revenue metrics
          totalRevenue: { $sum: '$revenue' },
          totalProfit: { $sum: '$profit' },
          avgOrderValue: { $avg: '$revenue' },
          avgProfitMargin: { $avg: '$profitMargin' },

          // Time-based patterns
          salesByMonth: {
            $push: {
              month: '$orderMonth',
              quantity: '$items.quantity',
              revenue: '$revenue'
            }
          },
          salesByDayOfWeek: {
            $push: {
              dayOfWeek: '$orderDayOfWeek',
              quantity: '$items.quantity'
            }
          },
          salesByHour: {
            $push: {
              hour: '$orderHour',
              quantity: '$items.quantity'
            }
          },
          salesBySeason: {
            $push: {
              season: '$season',
              quantity: '$items.quantity',
              revenue: '$revenue'
            }
          },

          // Performance indicators
          firstSale: { $min: '$createdAt' },
          lastSale: { $max: '$createdAt' },
          peakSaleMonth: {
            $max: {
              month: '$orderMonth',
              quantity: '$items.quantity'
            }
          }
        }
      },

      // Stage 7: Calculate advanced metrics
      {
        $addFields: {
          uniqueCustomerCount: { $size: '$uniqueCustomers' },
          avgQuantityPerOrder: { $divide: ['$totalSold', '$totalOrders'] },
          revenuePerCustomer: { 
            $divide: ['$totalRevenue', { $size: '$uniqueCustomers' }] 
          },
          daysSinceLastSale: {
            $divide: [
              { $subtract: [new Date(), '$lastSale'] },
              86400000
            ]
          },
          productLifespanDays: {
            $divide: [
              { $subtract: ['$lastSale', '$firstSale'] },
              86400000
            ]
          },

          // Monthly sales distribution
          monthlySalesStats: {
            $let: {
              vars: {
                monthlyAgg: {
                  // Per-month rollup; $map with named variables avoids the
                  // $$this shadowing that nesting $filter inside $reduce causes
                  $map: {
                    input: { $range: [1, 13] },
                    as: 'm',
                    in: {
                      month: '$$m',
                      totalQuantity: {
                        $sum: {
                          $map: {
                            input: {
                              $filter: {
                                input: '$salesByMonth',
                                as: 'sale',
                                cond: { $eq: ['$$sale.month', '$$m'] }
                              }
                            },
                            as: 'sale',
                            in: '$$sale.quantity'
                          }
                        }
                      },
                      totalRevenue: {
                        $sum: {
                          $map: {
                            input: {
                              $filter: {
                                input: '$salesByMonth',
                                as: 'sale',
                                cond: { $eq: ['$$sale.month', '$$m'] }
                              }
                            },
                            as: 'sale',
                            in: '$$sale.revenue'
                          }
                        }
                      }
                    }
                  }
                }
              },
              in: {
                bestMonth: {
                  $arrayElemAt: [
                    {
                      $filter: {
                        input: '$$monthlyAgg',
                        cond: {
                          $eq: [
                            '$$this.totalQuantity',
                            { $max: '$$monthlyAgg.totalQuantity' }
                          ]
                        }
                      }
                    },
                    0
                  ]
                },
                monthlyTrend: '$$monthlyAgg'
              }
            }
          }
        }
      },

      // Stage 8: Product performance classification
      {
        $addFields: {
          performanceCategory: {
            $switch: {
              branches: [
                {
                  case: {
                    $and: [
                      { $gt: ['$totalRevenue', 10000] },
                      { $gt: ['$avgProfitMargin', 20] },
                      { $gt: ['$uniqueCustomerCount', 100] }
                    ]
                  },
                  then: 'star'
                },
                {
                  case: {
                    $and: [
                      { $gt: ['$totalRevenue', 5000] },
                      { $gt: ['$avgProfitMargin', 10] }
                    ]
                  },
                  then: 'strong'
                },
                {
                  case: {
                    $and: [
                      { $gt: ['$totalRevenue', 1000] },
                      { $gt: ['$totalSold', 10] }
                    ]
                  },
                  then: 'moderate'
                },
                {
                  case: { $lt: ['$daysSinceLastSale', 30] },
                  then: 'active'
                }
              ],
              default: 'underperforming'
            }
          },

          inventoryStatus: {
            $switch: {
              branches: [
                { case: { $gt: ['$daysSinceLastSale', 90] }, then: 'stale' },
                { case: { $gt: ['$daysSinceLastSale', 30] }, then: 'slow_moving' },
                { case: { $lt: ['$daysSinceLastSale', 7] }, then: 'hot' }
              ],
              default: 'normal'
            }
          },

          // Demand predictability
          demandConsistency: {
            $let: {
              vars: {
                monthlyQuantities: '$monthlySalesStats.monthlyTrend.totalQuantity',
                avgMonthly: {
                  $avg: '$monthlySalesStats.monthlyTrend.totalQuantity'
                }
              },
              in: {
                $cond: {
                  if: { $gt: ['$$avgMonthly', 0] },
                  then: {
                    $divide: [
                      {
                        $stdDevPop: '$$monthlyQuantities'
                      },
                      '$$avgMonthly'
                    ]
                  },
                  else: 0
                }
              }
            }
          }
        }
      },

      // Stage 9: Final projection
      {
        $project: {
          productId: '$_id',
          productName: 1,
          category: 1,
          price: 1,
          cost: 1,

          // Sales metrics
          totalSold: 1,
          totalOrders: 1,
          uniqueCustomerCount: 1,
          avgQuantityPerOrder: { $round: ['$avgQuantityPerOrder', 2] },

          // Financial metrics
          totalRevenue: { $round: ['$totalRevenue', 2] },
          totalProfit: { $round: ['$totalProfit', 2] },
          avgOrderValue: { $round: ['$avgOrderValue', 2] },
          avgProfitMargin: { $round: ['$avgProfitMargin', 1] },
          revenuePerCustomer: { $round: ['$revenuePerCustomer', 2] },

          // Performance classification
          performanceCategory: 1,
          inventoryStatus: 1,
          demandConsistency: { $round: ['$demandConsistency', 3] },

          // Time-based insights
          daysSinceLastSale: { $round: ['$daysSinceLastSale', 0] },
          productLifespanDays: { $round: ['$productLifespanDays', 0] },
          bestSellingMonth: '$monthlySalesStats.bestMonth.month',
          bestMonthQuantity: '$monthlySalesStats.bestMonth.totalQuantity',
          bestMonthRevenue: { 
            $round: ['$monthlySalesStats.bestMonth.totalRevenue', 2] 
          },

          // Flags for business decisions
          isTopPerformer: { $eq: ['$performanceCategory', 'star'] },
          needsAttention: { $in: ['$performanceCategory', ['underperforming']] },
          isInventoryRisk: { $in: ['$inventoryStatus', ['stale', 'slow_moving']] },
          isHighDemand: { $eq: ['$inventoryStatus', 'hot'] },
          isPredictableDemand: { $lt: ['$demandConsistency', 0.5] }
        }
      },

      // Stage 10: Sort by business priority
      {
        $sort: {
          totalRevenue: -1,
          totalProfit: -1,
          uniqueCustomerCount: -1
        }
      }
    ];

    const startTime = Date.now();
    const results = await orders.aggregate(pipeline).toArray();
    const executionTime = Date.now() - startTime;

    console.log(`Product analytics completed in ${executionTime}ms, ${results.length} results`);

    this.pipelineStats.set('productAnalytics', {
      executionTime,
      resultCount: results.length,
      pipelineStages: pipeline.length,
      timestamp: new Date()
    });

    return results;
  }

  async analyzeAggregationPerformance(collection, pipeline, sampleSize = 1000) {
    console.log('Analyzing aggregation performance...');

    // Get explain plan for the pipeline
    const explainResult = await collection.aggregate(pipeline).explain('executionStats');

    // Run with different hints and options to compare performance
    const performanceTests = [];

    // Cap result size with an explicit $limit stage (portable across driver versions)
    const sampledPipeline = [...pipeline, { $limit: sampleSize }];

    // Test 1: Default execution
    const test1Start = Date.now();
    const test1Results = await collection.aggregate(sampledPipeline).toArray();
    const test1Time = Date.now() - test1Start;

    performanceTests.push({
      name: 'default',
      executionTime: test1Time,
      resultCount: test1Results.length,
      avgTimePerResult: test1Time / test1Results.length
    });

    // Test 2: With allowDiskUse for large datasets
    const test2Start = Date.now();
    const test2Results = await collection.aggregate(sampledPipeline, { 
      allowDiskUse: true 
    }).toArray();
    const test2Time = Date.now() - test2Start;

    performanceTests.push({
      name: 'allowDiskUse',
      executionTime: test2Time,
      resultCount: test2Results.length,
      avgTimePerResult: test2Time / test2Results.length
    });

    // Test 3: With maxTimeMS limit
    try {
      const test3Start = Date.now();
      const test3Results = await collection.aggregate(sampledPipeline, { 
        maxTimeMS: 30000 
      }).toArray();
      const test3Time = Date.now() - test3Start;

      performanceTests.push({
        name: 'maxTimeMS_30s',
        executionTime: test3Time,
        resultCount: test3Results.length,
        avgTimePerResult: test3Time / test3Results.length
      });
    } catch (error) {
      performanceTests.push({
        name: 'maxTimeMS_30s',
        error: error.message,
        executionTime: 30000,
        resultCount: 0
      });
    }

    // Analyze pipeline stages
    const stageAnalysis = pipeline.map((stage, index) => {
      const stageType = Object.keys(stage)[0];
      return {
        stage: index + 1,
        type: stageType,
        complexity: this.analyzeStageComplexity(stage),
        indexUtilization: this.analyzeIndexUsage(stage),
        optimizationOpportunities: this.identifyOptimizations(stage)
      };
    });

    return {
      explainPlan: explainResult,
      performanceTests: performanceTests,
      stageAnalysis: stageAnalysis,
      recommendations: this.generateOptimizationRecommendations(performanceTests, stageAnalysis)
    };
  }

  analyzeStageComplexity(stage) {
    const stageType = Object.keys(stage)[0];
    const complexityScores = {
      '$match': 1,
      '$project': 2,
      '$addFields': 3,
      '$group': 5,
      '$lookup': 7,
      '$unwind': 3,
      '$sort': 4,
      '$limit': 1,
      '$skip': 1,
      '$facet': 8,
      '$bucket': 6,
      '$sortByCount': 4
    };

    return complexityScores[stageType] || 3;
  }

  analyzeIndexUsage(stage) {
    const stageType = Object.keys(stage)[0];

    if (stageType === '$match') {
      const matchFields = Object.keys(stage[stageType]);
      return {
        canUseIndex: true,
        indexFields: matchFields,
        recommendation: `Ensure compound index exists for fields: ${matchFields.join(', ')}`
      };
    } else if (stageType === '$sort') {
      const sortFields = Object.keys(stage[stageType]);
      return {
        canUseIndex: true,
        indexFields: sortFields,
        recommendation: `Create index with sort field order: ${sortFields.join(', ')}`
      };
    }

    return {
      canUseIndex: false,
      recommendation: 'Stage cannot directly utilize indexes'
    };
  }

  identifyOptimizations(stage) {
    const stageType = Object.keys(stage)[0];
    const optimizations = [];

    switch (stageType) {
      case '$match':
        optimizations.push('Place $match stages as early as possible in pipeline');
        optimizations.push('Use indexes for filter conditions');
        break;
      case '$project':
        optimizations.push('Project only necessary fields to reduce document size');
        optimizations.push('Place projection early to reduce pipeline data volume');
        break;
      case '$lookup':
        optimizations.push('Use pipeline in $lookup for better performance');
        optimizations.push('Ensure foreign collection has appropriate indexes');
        optimizations.push('Consider embedding documents instead of lookups if data size permits');
        break;
      case '$group':
        optimizations.push('Group operations may require memory - consider allowDiskUse');
        optimizations.push('Use $bucket or $bucketAuto for large groupings');
        break;
      case '$sort':
        optimizations.push('Use indexes for sorting when possible');
        optimizations.push('Limit sort data with early $match and $limit stages');
        break;
    }

    return optimizations;
  }

  generateOptimizationRecommendations(performanceTests, stageAnalysis) {
    const recommendations = [];

    // Performance analysis
    const fastest = performanceTests.reduce((prev, current) => 
      prev.executionTime < current.executionTime ? prev : current
    );

    if (fastest.name !== 'default') {
      recommendations.push(`Best performance achieved with ${fastest.name} option`);
    }

    // High complexity stages
    const highComplexityStages = stageAnalysis.filter(s => s.complexity >= 6);
    if (highComplexityStages.length > 0) {
      recommendations.push(`High complexity stages detected: ${highComplexityStages.map(s => s.type).join(', ')}`);
    }

    // Index recommendations
    const indexableStages = stageAnalysis.filter(s => s.indexUtilization.canUseIndex);
    if (indexableStages.length > 0) {
      recommendations.push(`Create indexes for stages: ${indexableStages.map(s => s.type).join(', ')}`);
    }

    // General optimization
    const totalComplexity = stageAnalysis.reduce((sum, s) => sum + s.complexity, 0);
    if (totalComplexity > 30) {
      recommendations.push('Consider breaking pipeline into smaller parts');
      recommendations.push('Use $limit early to reduce dataset size');
    }

    return recommendations;
  }

  async createOptimalIndexes(collection, aggregationPatterns) {
    console.log('Creating optimal indexes for aggregation patterns...');

    const indexRecommendations = [];

    for (const pattern of aggregationPatterns) {
      const { pipeline, frequency, avgExecutionTime } = pattern;

      // Analyze pipeline for index opportunities
      const matchStages = pipeline.filter(stage => stage.$match);
      const sortStages = pipeline.filter(stage => stage.$sort);
      const lookupStages = pipeline.filter(stage => stage.$lookup);

      // Create compound indexes for $match + $sort combinations
      for (const matchStage of matchStages) {
        const matchFields = Object.keys(matchStage.$match);

        for (const sortStage of sortStages) {
          const sortFields = Object.keys(sortStage.$sort);

          // Combine match and sort fields following ESR rule
          const indexSpec = {};

          // Equality fields first
          matchFields.forEach(field => {
            if (typeof matchStage.$match[field] !== 'object') {
              indexSpec[field] = 1;
            }
          });

          // Sort fields next
          sortFields.forEach(field => {
            if (!indexSpec[field]) {
              indexSpec[field] = sortStage.$sort[field];
            }
          });

          // Range fields last
          matchFields.forEach(field => {
            if (typeof matchStage.$match[field] === 'object' && !indexSpec[field]) {
              indexSpec[field] = 1;
            }
          });

          if (Object.keys(indexSpec).length > 1) {
            indexRecommendations.push({
              collection: collection.collectionName,
              indexSpec: indexSpec,
              reason: 'Compound index for $match + $sort optimization',
              frequency: frequency,
              priority: frequency * avgExecutionTime,
              estimatedBenefit: this.estimateIndexBenefit(indexSpec, pattern)
            });
          }
        }
      }

      // Create indexes for $lookup foreign collections
      for (const lookupStage of lookupStages) {
        const { from, foreignField } = lookupStage.$lookup;

        if (foreignField) {
          indexRecommendations.push({
            collection: from,
            indexSpec: { [foreignField]: 1 },
            reason: 'Index for $lookup foreign field',
            frequency: frequency,
            priority: frequency * avgExecutionTime * 0.8,
            estimatedBenefit: 'High - improves lookup performance significantly'
          });
        }
      }
    }

    // Sort by priority and create top indexes
    const topRecommendations = indexRecommendations
      .sort((a, b) => b.priority - a.priority)
      .slice(0, 10);

    for (const rec of topRecommendations) {
      try {
        const targetCollection = this.db.collection(rec.collection);
        const indexName = `idx_agg_${Object.keys(rec.indexSpec).join('_')}`;

        await targetCollection.createIndex(rec.indexSpec, {
          name: indexName,
          background: true
        });

        console.log(`Created index ${indexName} on ${rec.collection}`);

      } catch (error) {
        console.error(`Failed to create index for ${rec.collection}:`, error.message);
      }
    }

    return topRecommendations;
  }

  estimateIndexBenefit(indexSpec, pattern) {
    const fieldCount = Object.keys(indexSpec).length;
    const pipelineComplexity = pattern.pipeline.length;

    if (fieldCount >= 3 && pipelineComplexity >= 5) {
      return 'Very High - Complex compound index for multi-stage pipeline';
    } else if (fieldCount >= 2) {
      return 'High - Compound index provides significant benefit';
    } else {
      return 'Medium - Single field index provides moderate benefit';
    }
  }

  async getPipelinePerformanceMetrics() {
    const metrics = {
      totalPipelines: this.pipelineStats.size,
      pipelines: Array.from(this.pipelineStats.entries()).map(([name, stats]) => ({
        name: name,
        executionTime: stats.executionTime,
        resultCount: stats.resultCount,
        stageCount: stats.pipelineStages,
        throughput: Math.round(stats.resultCount / (stats.executionTime / 1000)),
        lastRun: stats.timestamp
      })),
      indexRecommendations: this.indexRecommendations,

      // Performance categories
      fastPipelines: Array.from(this.pipelineStats.entries())
        .filter(([_, stats]) => stats.executionTime < 1000),
      slowPipelines: Array.from(this.pipelineStats.entries())
        .filter(([_, stats]) => stats.executionTime > 5000),

      // Overall health
      avgExecutionTime: Array.from(this.pipelineStats.values())
        .reduce((sum, stats) => sum + stats.executionTime, 0) / this.pipelineStats.size || 0
    };

    return metrics;
  }
}

// Benefits of MongoDB Aggregation Framework:
// - Single-pass processing eliminates multiple query roundtrips
// - Intelligent pipeline optimization with automatic stage reordering
// - Native index utilization throughout the pipeline stages
// - Memory-efficient streaming processing for large datasets
// - Built-in parallelization across shards in distributed deployments
// - Rich expression language for complex transformations and calculations
// - Integration with MongoDB's query optimizer for optimal execution plans
// - Support for complex nested document operations and transformations
// - Automatic spill-to-disk capabilities for memory-intensive operations
// - Native support for advanced analytics patterns and statistical functions

module.exports = {
  MongoAggregationOptimizer
};
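To verify that a pipeline actually benefits from these optimizations, inspect the planner's output with explain. Below is a minimal sketch, assuming a users collection with an index covering status and registrationDate; the collection and field names are illustrative:

// Minimal sketch: inspect how the server executes a pipeline (names are illustrative)
async function explainUserPipeline(db) {
  const cursor = db.collection('users').aggregate([
    { $match: { status: 'active' } },
    { $sort: { registrationDate: -1 } },
    { $limit: 100 }
  ]);

  // 'executionStats' verbosity reports per-stage behavior, including whether
  // the leading $match/$sort were pushed down to an index scan (IXSCAN)
  const plan = await cursor.explain('executionStats');
  console.log(JSON.stringify(plan, null, 2));
}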

Understanding MongoDB Aggregation Performance Architecture

Advanced Pipeline Optimization Strategies

Implement sophisticated aggregation optimization techniques for maximum performance:

// Advanced aggregation optimization patterns
class AggregationPerformanceTuner {
  constructor(db) {
    this.db = db;
    this.performanceProfiles = new Map();
    this.optimizationRules = this.loadOptimizationRules();
  }

  async optimizePipelineOrder(pipeline) {
    console.log('Optimizing pipeline stage order for maximum performance...');

    // Analyze current pipeline
    const analysis = this.analyzePipelineStages(pipeline);

    // Apply optimization rules
    const optimizedPipeline = this.applyOptimizationRules(pipeline, analysis);

    // Estimate performance improvement
    const improvement = this.estimatePerformanceImprovement(pipeline, optimizedPipeline);

    return {
      originalPipeline: pipeline,
      optimizedPipeline: optimizedPipeline,
      optimizations: analysis.optimizations,
      estimatedImprovement: improvement
    };
  }

  analyzePipelineStages(pipeline) {
    const analysis = {
      stages: [],
      optimizations: [],
      indexOpportunities: [],
      memoryUsage: 0,
      diskUsage: false
    };

    pipeline.forEach((stage, index) => {
      const stageType = Object.keys(stage)[0];
      const stageAnalysis = {
        index: index,
        type: stageType,
        selectivity: this.calculateSelectivity(stage),
        memoryImpact: this.estimateMemoryUsage(stage),
        indexable: this.isIndexable(stage),
        earlyPlacement: this.canPlaceEarly(stage)
      };

      analysis.stages.push(stageAnalysis);

      // Track memory usage
      analysis.memoryUsage += stageAnalysis.memoryImpact;

      // Check for disk usage requirements
      if (stageType === '$group' || stageType === '$sort') {
        analysis.diskUsage = true;
      }

      // Identify optimization opportunities
      if (stageAnalysis.earlyPlacement && index > 2) {
        analysis.optimizations.push({
          type: 'move_early',
          stage: stageType,
          currentIndex: index,
          suggestedIndex: 0,
          reason: 'High selectivity stage should be placed early'
        });
      }

      if (stageAnalysis.indexable && !this.hasAppropriateIndex(stage)) {
        analysis.indexOpportunities.push({
          stage: stageType,
          indexSpec: this.suggestIndexSpec(stage),
          priority: stageAnalysis.selectivity * 10
        });
      }
    });

    return analysis;
  }

  applyOptimizationRules(pipeline, analysis) {
    let optimizedPipeline = [...pipeline];

    // Rule 1: Move high-selectivity $match stages to the beginning.
    // (Simplified heuristic: hoisting a $match past a $group or $unwind it
    // depends on can change results, so verify semantics before applying.)
    const matchStages = optimizedPipeline
      .filter(stage => stage.$match)
      .sort((a, b) =>
        this.calculateSelectivity(a) - this.calculateSelectivity(b) // most selective (lowest factor) first
      );

    // Reorder: remove the $match stages, then re-insert them at the front
    optimizedPipeline = optimizedPipeline.filter(stage => !stage.$match);
    optimizedPipeline.unshift(...matchStages);

    // Rule 2: Place $project stages early to reduce document size
    const projectIndex = optimizedPipeline.findIndex(stage => stage.$project);
    if (projectIndex > 2) {
      const projectStage = optimizedPipeline.splice(projectIndex, 1)[0];
      optimizedPipeline.splice(2, 0, projectStage);
    }

    // Rule 3: Move $limit stages as early as possible (only safe when no $sort or
    // $group between the old and new position decides which documents are kept)
    const limitIndex = optimizedPipeline.findIndex(stage => stage.$limit);
    if (limitIndex > -1) {
      const limitStage = optimizedPipeline[limitIndex];

      // Find appropriate position after filtering stages
      let insertPosition = 0;
      for (let i = 0; i < optimizedPipeline.length; i++) {
        const stageType = Object.keys(optimizedPipeline[i])[0];
        if (['$match', '$project'].includes(stageType)) {
          insertPosition = i + 1;
        } else {
          break;
        }
      }

      if (limitIndex !== insertPosition) {
        optimizedPipeline.splice(limitIndex, 1);
        optimizedPipeline.splice(insertPosition, 0, limitStage);
      }
    }

    // Rule 4: Combine adjacent $addFields stages
    optimizedPipeline = this.combineAdjacentAddFields(optimizedPipeline);

    // Rule 5: Push $match conditions into $lookup pipelines
    optimizedPipeline = this.optimizeLookupStages(optimizedPipeline);

    return optimizedPipeline;
  }

  calculateSelectivity(stage) {
    const stageType = Object.keys(stage)[0];

    switch (stageType) {
      case '$match':
        return this.calculateMatchSelectivity(stage.$match);
      case '$limit':
        return 0.1; // Very high selectivity
      case '$project':
        return 0.8; // Reduces document size
      case '$addFields':
        return 1.0; // No selectivity change
      case '$group':
        return 0.3; // Significant reduction typically
      case '$lookup':
        return 1.2; // May increase document size
      case '$unwind':
        return 1.5; // Increases document count
      case '$sort':
        return 1.0; // No selectivity change
      default:
        return 1.0;
    }
  }

  calculateMatchSelectivity(matchCondition) {
    let selectivity = 1.0;

    for (const [field, condition] of Object.entries(matchCondition)) {
      if (typeof condition === 'object') {
        // Range or complex conditions
        if (condition.$gte || condition.$lte || condition.$lt || condition.$gt) {
          selectivity *= 0.3; // Range queries are moderately selective
        } else if (condition.$in) {
          selectivity *= Math.min(0.5, condition.$in.length / 10);
        } else if (condition.$ne || condition.$nin) {
          selectivity *= 0.9; // Negative conditions are less selective
        } else if (condition.$exists) {
          selectivity *= condition.$exists ? 0.8 : 0.2;
        }
      } else {
        // Equality condition
        selectivity *= 0.1; // Equality is highly selective
      }
    }

    return Math.max(selectivity, 0.01); // Minimum selectivity
  }

  estimateMemoryUsage(stage) {
    const stageType = Object.keys(stage)[0];
    const memoryScores = {
      '$match': 10,
      '$project': 20,
      '$addFields': 30,
      '$group': 500,
      '$lookup': 200,
      '$unwind': 50,
      '$sort': 300,
      '$limit': 5,
      '$skip': 5,
      '$facet': 800,
      '$bucket': 400
    };

    return memoryScores[stageType] || 50;
  }

  isIndexable(stage) {
    const stageType = Object.keys(stage)[0];
    return ['$match', '$sort'].includes(stageType);
  }

  canPlaceEarly(stage) {
    const stageType = Object.keys(stage)[0];
    return ['$match', '$limit', '$project'].includes(stageType);
  }

  combineAdjacentAddFields(pipeline) {
    const optimized = [];
    let pendingAddFields = null;

    for (const stage of pipeline) {
      const stageType = Object.keys(stage)[0];

      if (stageType === '$addFields') {
        if (pendingAddFields) {
          // Merge with previous $addFields
          pendingAddFields.$addFields = {
            ...pendingAddFields.$addFields,
            ...stage.$addFields
          };
        } else {
          pendingAddFields = { ...stage };
        }
      } else {
        // Flush pending $addFields
        if (pendingAddFields) {
          optimized.push(pendingAddFields);
          pendingAddFields = null;
        }
        optimized.push(stage);
      }
    }

    // Flush any remaining $addFields
    if (pendingAddFields) {
      optimized.push(pendingAddFields);
    }

    return optimized;
  }

  optimizeLookupStages(pipeline) {
    return pipeline.map(stage => {
      if (stage.$lookup && !stage.$lookup.pipeline) {
        // Convert simple lookup to pipeline-based lookup for better performance
        const { from, localField, foreignField, as } = stage.$lookup;

        return {
          $lookup: {
            from: from,
            let: { localValue: `$${localField}` },
            pipeline: [
              {
                $match: {
                  $expr: { $eq: [`$${foreignField}`, '$$localValue'] }
                }
              }
            ],
            as: as
          }
        };
      }
      return stage;
    });
  }

  estimatePerformanceImprovement(originalPipeline, optimizedPipeline) {
    const originalScore = this.scorePipeline(originalPipeline);
    const optimizedScore = this.scorePipeline(optimizedPipeline);

    const improvement = (optimizedScore - originalScore) / originalScore * 100;

    return {
      originalScore: originalScore,
      optimizedScore: optimizedScore,
      improvementPercentage: Math.round(improvement),
      category: improvement > 50 ? 'Significant' :
                improvement > 20 ? 'Moderate' :
                improvement > 5 ? 'Minor' : 'Negligible'
    };
  }

  scorePipeline(pipeline) {
    let score = 100;
    let documentSizeMultiplier = 1;

    for (let i = 0; i < pipeline.length; i++) {
      const stage = pipeline[i];
      const stageType = Object.keys(stage)[0];

      // Penalties for poor stage ordering
      switch (stageType) {
        case '$match':
          if (i > 2) score -= 20; // Should be early
          break;
        case '$limit':
          if (i > 3) score -= 15; // Should be early
          break;
        case '$project':
          if (i > 1) score -= 10; // Should be early
          break;
        case '$sort':
          if (i === pipeline.length - 1) score += 5; // Good at end
          break;
        case '$group':
          score -= this.estimateMemoryUsage(stage) / 10;
          break;
        case '$lookup':
          score -= 20; // Expensive operation
          if (!stage.$lookup.pipeline) score -= 10; // No pipeline optimization
          break;
      }

      // Track document size changes
      const selectivity = this.calculateSelectivity(stage);
      documentSizeMultiplier *= selectivity;

      // Penalty for processing large documents through expensive stages
      if (documentSizeMultiplier > 1.5 && ['$group', '$lookup'].includes(stageType)) {
        score -= 25;
      }
    }

    return Math.max(score, 10);
  }

  loadOptimizationRules() {
    return [
      {
        name: 'early_filtering',
        description: 'Move high-selectivity $match stages early in pipeline',
        priority: 10
      },
      {
        name: 'index_utilization',
        description: 'Ensure indexable stages can use appropriate indexes',
        priority: 9
      },
      {
        name: 'document_size_reduction',
        description: 'Use $project early to reduce document size',
        priority: 8
      },
      {
        name: 'memory_optimization',
        description: 'Minimize memory usage in aggregation stages',
        priority: 7
      },
      {
        name: 'lookup_optimization',
        description: 'Optimize $lookup operations with pipelines',
        priority: 6
      }
    ];
  }

  async benchmarkPipelineVariations(collection, basePipeline, variations = []) {
    console.log('Benchmarking pipeline variations...');

    const results = [];
    const testDataSize = 1000;

    // Test base pipeline
    const baseResult = await this.benchmarkSinglePipeline(
      collection, 
      basePipeline, 
      'original', 
      testDataSize
    );
    results.push(baseResult);

    // Test optimized version
    const optimizationResult = await this.optimizePipelineOrder(basePipeline);
    const optimizedResult = await this.benchmarkSinglePipeline(
      collection,
      optimizationResult.optimizedPipeline,
      'optimized',
      testDataSize
    );
    results.push(optimizedResult);

    // Test custom variations
    for (let i = 0; i < variations.length; i++) {
      const variationResult = await this.benchmarkSinglePipeline(
        collection,
        variations[i].pipeline,
        variations[i].name || `variation_${i + 1}`,
        testDataSize
      );
      results.push(variationResult);
    }

    // Analyze results
    const analysis = this.analyzePerformanceResults(results);

    return {
      results: results,
      analysis: analysis,
      recommendation: this.generatePerformanceRecommendation(results, analysis)
    };
  }

  async benchmarkSinglePipeline(collection, pipeline, name, limit) {
    const iterations = 3;
    const times = [];

    for (let i = 0; i < iterations; i++) {
      const startTime = Date.now();

      try {
        const results = await collection.aggregate([
          ...pipeline,
          { $limit: limit }
        ]).toArray();

        const endTime = Date.now();
        times.push({
          executionTime: endTime - startTime,
          resultCount: results.length,
          success: true
        });

      } catch (error) {
        times.push({
          executionTime: null,
          resultCount: 0,
          success: false,
          error: error.message
        });
      }
    }

    const successfulRuns = times.filter(t => t.success);
    const avgTime = successfulRuns.length > 0 ? 
      successfulRuns.reduce((sum, t) => sum + t.executionTime, 0) / successfulRuns.length : null;

    return {
      name: name,
      pipeline: pipeline,
      iterations: iterations,
      successfulRuns: successfulRuns.length,
      averageTime: avgTime,
      minTime: successfulRuns.length > 0 ? Math.min(...successfulRuns.map(t => t.executionTime)) : null,
      maxTime: successfulRuns.length > 0 ? Math.max(...successfulRuns.map(t => t.executionTime)) : null,
      resultCount: successfulRuns.length > 0 ? successfulRuns[0].resultCount : 0,
      errors: times.filter(t => !t.success).map(t => t.error)
    };
  }

  analyzePerformanceResults(results) {
    const analysis = {
      bestPerforming: null,
      worstPerforming: null,
      performanceGains: [],
      consistencyAnalysis: []
    };

    // Find best and worst performing
    const validResults = results.filter(r => r.averageTime !== null);
    if (validResults.length > 0) {
      analysis.bestPerforming = validResults.reduce((best, current) => 
        current.averageTime < best.averageTime ? current : best
      );

      analysis.worstPerforming = validResults.reduce((worst, current) => 
        current.averageTime > worst.averageTime ? current : worst
      );
    }

    // Calculate performance gains
    const baseline = results.find(r => r.name === 'original');
    if (baseline && baseline.averageTime) {
      results.forEach(result => {
        if (result.name !== 'original' && result.averageTime) {
          const improvementPercent = ((baseline.averageTime - result.averageTime) / baseline.averageTime) * 100;
          analysis.performanceGains.push({
            name: result.name,
            improvementPercent: Math.round(improvementPercent),
            absoluteImprovement: baseline.averageTime - result.averageTime
          });
        }
      });
    }

    // Consistency analysis
    results.forEach(result => {
      if (result.minTime && result.maxTime && result.averageTime) {
        const variance = result.maxTime - result.minTime;
        const consistency = variance / result.averageTime;

        analysis.consistencyAnalysis.push({
          name: result.name,
          variance: variance,
          consistencyScore: consistency,
          rating: consistency < 0.1 ? 'Excellent' :
                  consistency < 0.3 ? 'Good' :
                  consistency < 0.5 ? 'Fair' : 'Poor'
        });
      }
    });

    return analysis;
  }

  generatePerformanceRecommendation(results, analysis) {
    const recommendations = [];

    if (analysis.bestPerforming) {
      recommendations.push(`Best performance achieved with: ${analysis.bestPerforming.name} (${analysis.bestPerforming.averageTime}ms average)`);
    }

    const significantGains = analysis.performanceGains.filter(g => g.improvementPercent > 20);
    if (significantGains.length > 0) {
      recommendations.push(`Significant performance improvements found: ${significantGains.map(g => `${g.name} (+${g.improvementPercent}%)`).join(', ')}`);
    }

    const poorConsistency = analysis.consistencyAnalysis.filter(c => c.rating === 'Poor');
    if (poorConsistency.length > 0) {
      recommendations.push(`Poor consistency detected in: ${poorConsistency.map(c => c.name).join(', ')} - consider allowDiskUse or a different approach`);
    }

    if (recommendations.length === 0) {
      recommendations.push('All pipeline variations perform similarly - current implementation is adequate');
    }

    return recommendations;
  }
}

SQL-Style Aggregation Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB aggregation operations:

-- QueryLeaf aggregation operations with SQL-familiar syntax

-- Complex user analytics with optimized aggregation
WITH user_activity_stats AS (
  SELECT 
    u.user_id,
    u.email,
    u.subscription_tier,
    u.registration_date,

    -- Activity metrics using MongoDB aggregation expressions
    COUNT(ua.activity_id) as total_activities,
    COUNT(ua.activity_id) FILTER (WHERE ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') as recent_activities,
    COUNT(ua.activity_id) FILTER (WHERE ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days') as weekly_activities,

    -- Engagement scoring with MongoDB operators
    ARRAY_AGG(DISTINCT ua.activity_type) as activity_types,
    MAX(ua.created_at) as last_activity,
    AVG(ua.session_duration) as avg_session_duration,

    -- Complex engagement calculation
    (COUNT(ua.activity_id) FILTER (WHERE ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days') * 2) +
    (COUNT(ua.activity_id) FILTER (WHERE ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') * 1) as engagement_score

  FROM users u
  LEFT JOIN user_activities ua ON u.user_id = ua.user_id 
    AND ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 year'
  WHERE u.status = 'active'
    AND u.registration_date >= CURRENT_TIMESTAMP - INTERVAL '1 year'
  GROUP BY u.user_id, u.email, u.subscription_tier, u.registration_date
),

order_analytics AS (
  SELECT 
    o.user_id,
    COUNT(*) as total_orders,
    SUM(o.total_amount) as lifetime_value,
    AVG(o.total_amount) as avg_order_value,
    MAX(o.created_at) as last_order_date,
    COUNT(*) FILTER (WHERE o.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') as recent_orders,
    SUM(o.total_amount) FILTER (WHERE o.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days') as recent_spend,

    -- Time-based patterns using MongoDB date operators
    MODE() WITHIN GROUP (ORDER BY EXTRACT(DOW FROM o.created_at)) as preferred_order_day,
    ARRAY_AGG(
      CASE 
        WHEN EXTRACT(MONTH FROM o.created_at) IN (12, 1, 2) THEN 'winter'
        WHEN EXTRACT(MONTH FROM o.created_at) IN (3, 4, 5) THEN 'spring'
        WHEN EXTRACT(MONTH FROM o.created_at) IN (6, 7, 8) THEN 'summer'
        ELSE 'fall'
      END
    ) as seasonal_patterns

  FROM orders o
  WHERE o.status = 'completed'
    AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 year'
  GROUP BY o.user_id
),

product_preferences AS (
  -- Optimized product affinity analysis
  SELECT 
    o.user_id,
    -- Use MongoDB aggregation for complex transformations
    JSON_AGG(
      JSON_BUILD_OBJECT(
        'category', p.category,
        'purchases', COUNT(*),
        'spend', SUM(oi.quantity * oi.unit_price),
        'avg_days_between', AVG(
          EXTRACT(EPOCH FROM (o.created_at - LAG(o.created_at) OVER (
            PARTITION BY o.user_id, p.category 
            ORDER BY o.created_at
          ))) / 86400
        )
      )
      ORDER BY COUNT(*) DESC
      LIMIT 5
    ) as top_categories

  FROM orders o
  JOIN order_items oi ON o.order_id = oi.order_id
  JOIN products p ON oi.product_id = p.product_id
  WHERE o.status = 'completed'
  GROUP BY o.user_id
),

final_user_analytics AS (
  SELECT 
    uas.user_id,
    uas.email,
    uas.subscription_tier,
    uas.registration_date,

    -- Activity metrics
    uas.total_activities,
    uas.recent_activities,
    uas.weekly_activities,
    uas.activity_types,
    uas.last_activity,
    ROUND(uas.avg_session_duration::numeric, 2) as avg_session_duration,
    uas.engagement_score,

    -- Order metrics
    COALESCE(oa.total_orders, 0) as total_orders,
    COALESCE(oa.lifetime_value, 0) as lifetime_value,
    COALESCE(oa.avg_order_value, 0) as avg_order_value,
    oa.last_order_date,
    COALESCE(oa.recent_orders, 0) as recent_orders,
    COALESCE(oa.recent_spend, 0) as recent_spend,

    -- Product preferences
    pp.top_categories,

    -- Calculated fields using MongoDB-style conditional logic
    CASE 
      WHEN uas.weekly_activities > 10 THEN 'high'
      WHEN uas.recent_activities > 5 THEN 'medium'
      ELSE 'low'
    END as engagement_level,

    -- Predictive LTV using MongoDB conditional expressions
    CASE
      WHEN COALESCE(oa.lifetime_value, 0) > 1000 AND COALESCE(oa.recent_orders, 0) > 2 
        THEN COALESCE(oa.lifetime_value, 0) * 1.2
      WHEN COALESCE(oa.lifetime_value, 0) > 500 AND uas.recent_activities > 10 
        THEN COALESCE(oa.lifetime_value, 0) * 1.1
      ELSE COALESCE(oa.lifetime_value, 0)
    END as predicted_ltv,

    -- Churn risk assessment
    CASE
      WHEN uas.recent_activities = 0 AND oa.last_order_date < CURRENT_TIMESTAMP - INTERVAL '90 days' THEN 'high'
      WHEN uas.recent_activities < 5 AND oa.last_order_date < CURRENT_TIMESTAMP - INTERVAL '45 days' THEN 'medium'
      ELSE 'low'
    END as churn_risk,

    -- User segmentation
    CASE
      WHEN COALESCE(oa.lifetime_value, 0) > 1000 AND uas.engagement_score > 50 THEN 'vip'
      WHEN COALESCE(oa.lifetime_value, 0) > 500 OR uas.engagement_score > 30 THEN 'loyal'
      WHEN COALESCE(oa.total_orders, 0) > 0 THEN 'customer'
      ELSE 'prospect'
    END as user_segment,

    -- Behavioral patterns
    oa.preferred_order_day,
    CASE 
      WHEN COALESCE(oa.total_orders, 0) > 1 THEN
        365.0 / GREATEST(
          EXTRACT(EPOCH FROM (oa.last_order_date - uas.registration_date)) / 86400.0,
          1
        )
      ELSE 0
    END as order_frequency,

    -- Performance indicators
    COALESCE(oa.lifetime_value, 0) >= 1000 as is_high_value,
    uas.recent_activities >= 5 as is_recently_active,
    CASE
      WHEN uas.recent_activities = 0 AND oa.last_order_date < CURRENT_TIMESTAMP - INTERVAL '90 days' THEN true
      ELSE false
    END as is_at_risk,

    -- Time since last activity/order
    CASE 
      WHEN uas.last_activity IS NOT NULL THEN
        EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - uas.last_activity)) / 86400
      ELSE 999
    END as days_since_last_activity,

    CASE 
      WHEN oa.last_order_date IS NOT NULL THEN
        EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - oa.last_order_date)) / 86400
      ELSE 999
    END as days_since_last_order

  FROM user_activity_stats uas
  LEFT JOIN order_analytics oa ON uas.user_id = oa.user_id
  LEFT JOIN product_preferences pp ON uas.user_id = pp.user_id
)

SELECT *
FROM final_user_analytics
ORDER BY predicted_ltv DESC, engagement_score DESC, last_activity DESC
LIMIT 1000;

-- Advanced product performance analytics
WITH product_sales_analysis AS (
  SELECT 
    p.product_id,
    p.name as product_name,
    p.category,
    p.price,
    p.cost,

    -- Volume metrics using MongoDB aggregation
    SUM(oi.quantity) as total_sold,
    COUNT(DISTINCT o.order_id) as total_orders,
    COUNT(DISTINCT o.user_id) as unique_customers,
    AVG(oi.quantity) as avg_quantity_per_order,

    -- Revenue and profit calculations
    SUM(oi.quantity * oi.unit_price) as total_revenue,
    SUM(oi.quantity * (oi.unit_price - p.cost)) as total_profit,
    AVG(oi.quantity * oi.unit_price) as avg_order_value,
    AVG((oi.unit_price - p.cost) / oi.unit_price * 100) as avg_profit_margin,
    SUM(oi.quantity * oi.unit_price) / COUNT(DISTINCT o.user_id) as revenue_per_customer,

    -- Time-based analysis using MongoDB date functions
    MIN(o.created_at) as first_sale,
    MAX(o.created_at) as last_sale,
    EXTRACT(EPOCH FROM (MAX(o.created_at) - MIN(o.created_at))) / 86400 as product_lifespan_days,
    EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - MAX(o.created_at))) / 86400 as days_since_last_sale,

    -- Monthly sales pattern analysis
    JSON_OBJECT_AGG(
      EXTRACT(MONTH FROM o.created_at),
      JSON_BUILD_OBJECT(
        'quantity', SUM(oi.quantity),
        'revenue', SUM(oi.quantity * oi.unit_price)
      )
    ) as monthly_sales,

    -- Day of week patterns
    JSON_OBJECT_AGG(
      EXTRACT(DOW FROM o.created_at),
      SUM(oi.quantity)
    ) as dow_sales_pattern,

    -- Seasonal analysis
    JSON_OBJECT_AGG(
      CASE 
        WHEN EXTRACT(MONTH FROM o.created_at) IN (12, 1, 2) THEN 'winter'
        WHEN EXTRACT(MONTH FROM o.created_at) IN (3, 4, 5) THEN 'spring'
        WHEN EXTRACT(MONTH FROM o.created_at) IN (6, 7, 8) THEN 'summer'
        ELSE 'fall'
      END,
      JSON_BUILD_OBJECT(
        'quantity', SUM(oi.quantity),
        'revenue', SUM(oi.quantity * oi.unit_price)
      )
    ) as seasonal_performance

  FROM products p
  JOIN order_items oi ON p.product_id = oi.product_id
  JOIN orders o ON oi.order_id = o.order_id
  WHERE o.status = 'completed'
    AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 year'
  GROUP BY p.product_id, p.name, p.category, p.price, p.cost
),

product_performance_classification AS (
  SELECT *,
    -- Performance scoring using MongoDB-style conditional logic
    CASE 
      WHEN total_revenue > 10000 AND avg_profit_margin > 20 AND unique_customers > 100 THEN 'star'
      WHEN total_revenue > 5000 AND avg_profit_margin > 10 THEN 'strong'
      WHEN total_revenue > 1000 AND total_sold > 10 THEN 'moderate'
      WHEN days_since_last_sale < 30 THEN 'active'
      ELSE 'underperforming'
    END as performance_category,

    -- Inventory status
    CASE 
      WHEN days_since_last_sale > 90 THEN 'stale'
      WHEN days_since_last_sale > 30 THEN 'slow_moving'
      WHEN days_since_last_sale < 7 THEN 'hot'
      ELSE 'normal'
    END as inventory_status,

    -- Demand predictability using MongoDB expressions
    -- Calculate coefficient of variation for monthly sales
    (
      SELECT STDDEV(monthly_quantity) / AVG(monthly_quantity)
      FROM (
        SELECT (monthly_sales->>month_num)::numeric as monthly_quantity
        FROM generate_series(1, 12) as month_num
      ) monthly_data
      WHERE monthly_quantity > 0
    ) as demand_consistency,

    -- Best performing periods
    (
      SELECT month_num
      FROM (
        SELECT 
          month_num,
          (monthly_sales->>month_num)::numeric as quantity
        FROM generate_series(1, 12) as month_num
      ) monthly_rank
      ORDER BY quantity DESC NULLS LAST
      LIMIT 1
    ) as best_month,

    -- Performance flags
    total_revenue >= 10000 as is_top_performer,
    performance_category = 'underperforming' as needs_attention,
    inventory_status IN ('stale', 'slow_moving') as is_inventory_risk,
    inventory_status = 'hot' as is_high_demand,
    demand_consistency < 0.5 as is_predictable_demand

  FROM product_sales_analysis
)

SELECT 
  product_id,
  product_name,
  category,
  price,
  cost,

  -- Volume metrics
  total_sold,
  total_orders,
  unique_customers,
  ROUND(avg_quantity_per_order::numeric, 2) as avg_quantity_per_order,

  -- Financial metrics
  ROUND(total_revenue::numeric, 2) as total_revenue,
  ROUND(total_profit::numeric, 2) as total_profit,
  ROUND(avg_order_value::numeric, 2) as avg_order_value,
  ROUND(avg_profit_margin::numeric, 1) as avg_profit_margin_pct,
  ROUND(revenue_per_customer::numeric, 2) as revenue_per_customer,

  -- Performance classification
  performance_category,
  inventory_status,
  ROUND(demand_consistency::numeric, 3) as demand_consistency,

  -- Time-based insights
  ROUND(days_since_last_sale::numeric, 0) as days_since_last_sale,
  ROUND(product_lifespan_days::numeric, 0) as product_lifespan_days,
  best_month,

  -- Business flags
  is_top_performer,
  needs_attention,
  is_inventory_risk,
  is_high_demand,
  is_predictable_demand,

  -- Additional insights
  monthly_sales,
  seasonal_performance

FROM product_performance_classification
ORDER BY total_revenue DESC, total_profit DESC, unique_customers DESC
LIMIT 500;

-- Real-time aggregation with windowed analytics
SELECT 
  user_id,
  activity_type,
  DATE_TRUNC('hour', created_at) as hour_bucket,

  -- Window functions with MongoDB-style aggregations
  COUNT(*) as activities_this_hour,
  SUM(session_duration) as total_session_time,
  AVG(session_duration) as avg_session_duration,

  -- Moving averages over time windows
  AVG(COUNT(*)) OVER (
    PARTITION BY user_id, activity_type 
    ORDER BY DATE_TRUNC('hour', created_at)
    ROWS BETWEEN 23 PRECEDING AND CURRENT ROW
  ) as avg_activities_24h,

  -- Rank activities within user sessions
  DENSE_RANK() OVER (
    PARTITION BY user_id, DATE_TRUNC('day', created_at)
    ORDER BY COUNT(*) DESC
  ) as daily_activity_rank,

  -- Calculate cumulative metrics
  SUM(COUNT(*)) OVER (
    PARTITION BY user_id 
    ORDER BY DATE_TRUNC('hour', created_at)
  ) as cumulative_activities,

  -- Detect anomalies using MongoDB statistical functions
  COUNT(*) > (
    AVG(COUNT(*)) OVER (
      PARTITION BY user_id, activity_type
      ORDER BY DATE_TRUNC('hour', created_at)
      ROWS BETWEEN 167 PRECEDING AND 1 PRECEDING
    ) + 2 * STDDEV(COUNT(*)) OVER (
      PARTITION BY user_id, activity_type
      ORDER BY DATE_TRUNC('hour', created_at)
      ROWS BETWEEN 167 PRECEDING AND 1 PRECEDING
    )
  ) as is_anomaly,

  -- Performance indicators
  CASE
    WHEN COUNT(*) > 100 THEN 'high_activity'
    WHEN COUNT(*) > 50 THEN 'moderate_activity'
    ELSE 'low_activity'
  END as activity_level

FROM user_activities
WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days'
GROUP BY user_id, activity_type, DATE_TRUNC('hour', created_at)
ORDER BY user_id, hour_bucket DESC;

-- QueryLeaf provides comprehensive aggregation optimization:
-- 1. SQL-familiar aggregation syntax with MongoDB performance benefits
-- 2. Automatic pipeline optimization and stage reordering
-- 3. Intelligent index utilization for aggregation stages
-- 4. Memory-efficient processing for large dataset analytics
-- 5. Advanced window functions and statistical operations
-- 6. Real-time aggregation with streaming analytics capabilities
-- 7. Integration with MongoDB's native aggregation optimizations
-- 8. Familiar SQL patterns for complex analytical queries
-- 9. Automatic spill-to-disk handling for memory-intensive operations
-- 10. Performance monitoring and optimization recommendations
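
The window-function query above maps naturally onto MongoDB's $setWindowFields stage (available in MongoDB 5.0+). The pipeline below is a hand-written, simplified sketch of that mapping, partitioning by user only; it is an illustrative assumption, not the pipeline QueryLeaf actually generates.

// Hand-written sketch of the moving-average portion of the windowed query above.
// Field names follow the user_activities examples in this post; window bounds of
// [-23, 'current'] mirror ROWS BETWEEN 23 PRECEDING AND CURRENT ROW.
const hourlyActivityPipeline = [
  { $match: { created_at: { $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) } } },
  { $group: {
      _id: {
        user_id: '$user_id',
        hour_bucket: { $dateTrunc: { date: '$created_at', unit: 'hour' } }
      },
      activities_this_hour: { $sum: 1 },
      total_session_time: { $sum: '$session_duration' },
      avg_session_duration: { $avg: '$session_duration' }
  } },
  { $setWindowFields: {
      partitionBy: '$_id.user_id',
      sortBy: { '_id.hour_bucket': 1 },
      output: {
        // trailing 24 hourly buckets
        avg_activities_24h: {
          $avg: '$activities_this_hour',
          window: { documents: [-23, 'current'] }
        },
        // running total, like an unbounded cumulative SUM
        cumulative_activities: {
          $sum: '$activities_this_hour',
          window: { documents: ['unbounded', 'current'] }
        }
      }
  } },
  { $sort: { '_id.user_id': 1, '_id.hour_bucket': -1 } }
];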

Best Practices for Aggregation Optimization

Pipeline Design Strategy

Essential principles for optimal aggregation performance (a short pipeline sketch follows the list):

  1. Early Filtering: Place $match stages as early as possible to reduce dataset size
  2. Index Utilization: Design indexes that support aggregation stages effectively
  3. Memory Management: Monitor memory usage and use allowDiskUse when necessary
  4. Stage Ordering: Follow optimization rules for stage placement and combination
  5. Document Size: Use $project early to reduce document size through the pipeline
  6. Parallelization: Design pipelines that can leverage MongoDB's parallel processing
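
As a concrete illustration of these principles, here is a minimal sketch assuming a hypothetical orders collection with status, created_at, user_id, and total_amount fields and a compound index on { status: 1, created_at: -1 }; it filters and projects first, groups the reduced documents, and only then sorts the small grouped output.

// Minimal sketch of the ordering principles above (hypothetical schema and index).
const { MongoClient } = require('mongodb');

async function monthlySpendByUser(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  try {
    const orders = client.db('shop').collection('orders');

    const pipeline = [
      // 1. Filter first so the { status: 1, created_at: -1 } index shrinks the working set
      { $match: { status: 'completed', created_at: { $gte: new Date('2024-01-01') } } },
      // 2. Project early so later stages only carry the fields they need
      { $project: { user_id: 1, total_amount: 1, created_at: 1 } },
      // 3. Group the reduced documents
      { $group: {
          _id: {
            user: '$user_id',
            month: { $dateTrunc: { date: '$created_at', unit: 'month' } }
          },
          total_spend: { $sum: '$total_amount' },
          order_count: { $sum: 1 }
      } },
      // 4. Sort the much smaller grouped output, then cap the result size
      { $sort: { total_spend: -1 } },
      { $limit: 100 }
    ];

    // allowDiskUse lets memory-heavy stages spill to disk instead of failing
    return await orders.aggregate(pipeline, { allowDiskUse: true }).toArray();
  } finally {
    await client.close();
  }
}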

Performance and Scalability

Optimize aggregations for production workloads (see the explain() sketch after this list):

  1. Pipeline Optimization: Use MongoDB's explain functionality to understand execution plans
  2. Resource Planning: Plan memory and CPU resources for aggregation processing
  3. Sharding Strategy: Design aggregations that work efficiently across sharded clusters
  4. Caching Strategy: Implement appropriate caching for frequently run aggregations
  5. Monitoring Setup: Track aggregation performance and resource usage
  6. Testing Strategy: Benchmark different pipeline approaches with realistic data volumes
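
A quick way to verify these points in practice is to ask the server for an execution plan. The sketch below uses the Node.js driver's explain() on an aggregation cursor; the exact output shape varies by server version and topology, but it reports per-stage statistics and whether the first stage used an index scan (IXSCAN) or a collection scan (COLLSCAN).

// Inspect an aggregation's execution plan; collection and pipeline are whatever you are tuning.
async function explainPipeline(collection, pipeline) {
  const plan = await collection.aggregate(pipeline).explain('executionStats');
  console.log(JSON.stringify(plan, null, 2));
  return plan;
}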

Conclusion

MongoDB's Aggregation Framework provides sophisticated data processing capabilities that eliminate the performance limitations and complexity of traditional SQL analytics approaches. The combination of pipeline-based processing, intelligent optimization, and native index utilization makes building high-performance analytics both powerful and efficient.

Key Aggregation Framework benefits include:

  • Single-Pass Processing: Eliminates multiple query roundtrips for complex analytics
  • Intelligent Optimization: Automatic pipeline optimization and stage reordering
  • Native Index Integration: Comprehensive index utilization throughout pipeline stages
  • Memory-Efficient Processing: Streaming processing with automatic spill-to-disk capabilities
  • Parallel Execution: Built-in parallelization across distributed deployments
  • Rich Expression Language: Comprehensive transformation and analytical capabilities

Whether you're building business intelligence dashboards, real-time analytics platforms, data science workflows, or any application requiring sophisticated data processing, MongoDB's Aggregation Framework with QueryLeaf's familiar SQL interface provides the foundation for high-performance analytics solutions.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB aggregation operations while providing SQL-familiar analytics syntax, pipeline optimization, and performance monitoring. Advanced aggregation patterns, index optimization, and performance tuning are seamlessly handled through familiar SQL constructs, making sophisticated analytics both powerful and accessible to SQL-oriented development teams.

The integration of advanced aggregation capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both complex analytical processing and familiar database interaction patterns, ensuring your analytics solutions remain both performant and maintainable as they scale and evolve.

MongoDB Transactions and ACID Compliance: Building Reliable Distributed Systems with SQL-Style Transaction Management

Modern distributed applications require robust data consistency guarantees and transaction support to ensure business-critical operations maintain data integrity across complex workflows. Traditional NoSQL databases often sacrifice ACID properties for scalability, forcing developers to implement complex application-level consistency mechanisms that are error-prone and difficult to maintain.

MongoDB Multi-Document Transactions provide full ACID compliance across multiple documents and collections, enabling developers to build reliable distributed systems with the same consistency guarantees as traditional relational databases while maintaining MongoDB's horizontal scalability and flexible document model. Unlike eventual consistency models that require complex conflict resolution, MongoDB transactions ensure immediate consistency with familiar commit/rollback semantics.

The Traditional Distributed Consistency Challenge

Conventional approaches to maintaining consistency in distributed systems have significant limitations for modern applications:

-- Traditional relational approach - limited scalability and flexibility

-- PostgreSQL distributed transaction with complex state management
BEGIN;

-- Order creation with inventory checks
WITH inventory_check AS (
  SELECT 
    product_id,
    available_quantity,
    reserved_quantity,
    CASE 
      WHEN available_quantity >= 5 THEN true 
      ELSE false 
    END as sufficient_inventory
  FROM inventory 
  WHERE product_id = 'prod_12345'
  FOR UPDATE
),
order_validation AS (
  SELECT 
    user_id,
    account_balance,
    credit_limit,
    account_status,
    CASE 
      WHEN account_status = 'active' AND (account_balance + credit_limit) >= 299.99 THEN true
      ELSE false
    END as payment_valid
  FROM user_accounts 
  WHERE user_id = 'user_67890'
  FOR UPDATE
)
INSERT INTO orders (
  order_id,
  user_id, 
  product_id,
  quantity,
  total_amount,
  order_status,
  created_at
)
SELECT 
  'order_' || nextval('order_seq'),
  'user_67890',
  'prod_12345', 
  5,
  299.99,
  CASE 
    WHEN ic.sufficient_inventory AND ov.payment_valid THEN 'confirmed'
    ELSE 'failed'
  END,
  CURRENT_TIMESTAMP
FROM inventory_check ic, order_validation ov;

-- Update inventory with complex validation
UPDATE inventory 
SET 
  available_quantity = available_quantity - 5,
  reserved_quantity = reserved_quantity + 5,
  updated_at = CURRENT_TIMESTAMP
WHERE product_id = 'prod_12345' 
  AND available_quantity >= 5;

-- Update user account balance
UPDATE user_accounts 
SET 
  account_balance = account_balance - 299.99,
  last_transaction = CURRENT_TIMESTAMP
WHERE user_id = 'user_67890' 
  AND account_status = 'active'
  AND (account_balance + credit_limit) >= 299.99;

-- Create order items with foreign key constraints
INSERT INTO order_items (
  order_id,
  product_id,
  quantity,
  unit_price,
  line_total
)
SELECT 
  o.order_id,
  'prod_12345',
  5,
  59.99,
  299.95
FROM orders o 
WHERE o.user_id = 'user_67890' 
  AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute';

-- Create audit trail
INSERT INTO transaction_audit (
  transaction_id,
  transaction_type,
  user_id,
  order_id,
  amount,
  status,
  created_at
)
SELECT 
  txid_current(),
  'order_creation',
  'user_67890',
  o.order_id,
  299.99,
  o.order_status,
  CURRENT_TIMESTAMP
FROM orders o 
WHERE o.user_id = 'user_67890' 
  AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute';

-- Complex validation before commit
DO $$
DECLARE
  order_count INTEGER;
  inventory_count INTEGER;
  balance_valid BOOLEAN;
BEGIN
  -- Verify order was created
  SELECT COUNT(*) INTO order_count
  FROM orders 
  WHERE user_id = 'user_67890' 
    AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute';

  -- Verify inventory was updated
  SELECT COUNT(*) INTO inventory_count
  FROM inventory 
  WHERE product_id = 'prod_12345' 
    AND reserved_quantity >= 5;

  -- Verify account balance
  SELECT (account_balance >= 0) INTO balance_valid
  FROM user_accounts 
  WHERE user_id = 'user_67890';

  IF order_count = 0 OR inventory_count = 0 OR NOT balance_valid THEN
    RAISE EXCEPTION 'Transaction validation failed';
  END IF;
END
$$;

COMMIT;

-- Problems with traditional distributed transactions:
-- 1. Complex multi-table validation and rollback logic
-- 2. Poor performance with long-running transactions and locks
-- 3. Difficulty scaling across multiple database instances
-- 4. Limited flexibility with rigid relational schema constraints
-- 5. Complex error handling and partial failure scenarios
-- 6. Manual coordination of distributed transaction state
-- 7. Poor integration with modern microservices architectures
-- 8. Limited support for document-based data structures
-- 9. Complex deadlock detection and resolution
-- 10. High operational overhead for distributed consistency

-- MySQL distributed transactions (even more limitations)
START TRANSACTION;

-- Basic order processing with limited validation
INSERT INTO mysql_orders (
  user_id, 
  product_id,
  quantity,
  amount,
  status,
  created_at
) VALUES (
  'user_67890',
  'prod_12345', 
  5,
  299.99,
  'pending',
  NOW()
);

-- Update inventory without proper validation
UPDATE mysql_inventory 
SET quantity = quantity - 5 
WHERE product_id = 'prod_12345' 
  AND quantity >= 5;

-- Update account balance
UPDATE mysql_accounts 
SET balance = balance - 299.99
WHERE user_id = 'user_67890' 
  AND balance >= 299.99;

-- Check if all updates succeeded
SELECT 
  (SELECT COUNT(*) FROM mysql_orders WHERE user_id = 'user_67890' AND created_at >= DATE_SUB(NOW(), INTERVAL 1 MINUTE)) as order_created,
  (SELECT quantity FROM mysql_inventory WHERE product_id = 'prod_12345') as remaining_inventory,
  (SELECT balance FROM mysql_accounts WHERE user_id = 'user_67890') as remaining_balance;

COMMIT;

-- MySQL limitations:
-- - Limited JSON support for complex document structures  
-- - Basic transaction isolation levels
-- - Poor support for distributed transactions
-- - Limited cross-table validation capabilities
-- - Simple error handling and rollback mechanisms
-- - No native support for document relationships
-- - Minimal support for complex business logic in transactions

MongoDB Multi-Document Transactions provide comprehensive ACID compliance:

// MongoDB Multi-Document Transactions - full ACID compliance with document flexibility
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('ecommerce_platform');

// Advanced transaction processing with complex business logic
class TransactionManager {
  constructor(db) {
    this.db = db;
    this.collections = {
      orders: db.collection('orders'),
      inventory: db.collection('inventory'),
      users: db.collection('users'),
      payments: db.collection('payments'),
      auditLog: db.collection('audit_log'),
      loyalty: db.collection('loyalty_program'),
      promotions: db.collection('promotions'),
      shipping: db.collection('shipping_addresses')
    };
    this.transactionOptions = {
      readPreference: 'primary',
      readConcern: { level: 'local' },
      writeConcern: { w: 'majority', j: true }
    };
  }

  async processComplexOrder(orderData) {
    const session = client.startSession();

    try {
      // Start multi-document transaction with full ACID properties
      const result = await session.withTransaction(async () => {

        // Step 1: Validate and reserve inventory
        const inventoryResult = await this.validateAndReserveInventory(
          orderData.items, session
        );

        if (!inventoryResult.success) {
          throw new Error(`Insufficient inventory: ${inventoryResult.message}`);
        }

        // Step 2: Validate user account and payment method
        const userValidation = await this.validateUserAccount(
          orderData.userId, orderData.totalAmount, session
        );

        if (!userValidation.success) {
          throw new Error(`Payment validation failed: ${userValidation.message}`);
        }

        // Step 3: Apply promotions and calculate final pricing
        const pricingResult = await this.calculateFinalPricing(
          orderData, userValidation.user, session
        );

        // Step 4: Create order with complete transaction context
        const order = await this.createOrder({
          ...orderData,
          ...pricingResult,
          inventoryReservations: inventoryResult.reservations,
          userId: orderData.userId
        }, session);

        // Step 5: Process payment transaction
        const paymentResult = await this.processPaymentTransaction(
          order, userValidation.user.paymentMethods, session
        );

        if (!paymentResult.success) {
          throw new Error(`Payment processing failed: ${paymentResult.message}`);
        }

        // Step 6: Update user loyalty points
        await this.updateLoyaltyProgram(
          orderData.userId, pricingResult.finalAmount, session
        );

        // Step 7: Create shipping record
        await this.createShippingRecord(order, session);

        // Step 8: Create comprehensive audit trail
        await this.createTransactionAuditTrail({
          orderId: order._id,
          userId: orderData.userId,
          amount: pricingResult.finalAmount,
          inventoryChanges: inventoryResult.changes,
          paymentId: paymentResult.paymentId,
          timestamp: new Date()
        }, session);

        return {
          success: true,
          orderId: order._id,
          paymentId: paymentResult.paymentId,
          finalAmount: pricingResult.finalAmount,
          loyaltyPointsEarned: pricingResult.loyaltyPoints
        };

      }, this.transactionOptions);

      console.log('Complex order transaction completed successfully:', result);
      return result;

    } catch (error) {
      console.error('Transaction failed, automatic rollback initiated:', error);
      throw error;
    } finally {
      await session.endSession();
    }
  }

  async validateAndReserveInventory(items, session) {
    console.log('Validating and reserving inventory for items:', items);

    const reservations = [];
    const changes = [];

    for (const item of items) {
      // Read current inventory state within transaction
      const inventoryDoc = await this.collections.inventory.findOne(
        { productId: item.productId },
        { session }
      );

      if (!inventoryDoc) {
        return {
          success: false,
          message: `Product not found: ${item.productId}`
        };
      }

      // Validate availability including existing reservations
      const availableQuantity = inventoryDoc.quantity - inventoryDoc.reservedQuantity;

      if (availableQuantity < item.quantity) {
        return {
          success: false,
          message: `Insufficient stock for ${item.productId}. Available: ${availableQuantity}, Requested: ${item.quantity}`
        };
      }

      // Reserve inventory within transaction
      const updateResult = await this.collections.inventory.updateOne(
        {
          productId: item.productId,
          quantity: { $gte: inventoryDoc.reservedQuantity + item.quantity }
        },
        {
          $inc: { reservedQuantity: item.quantity },
          $push: {
            reservationHistory: {
              reservationId: new ObjectId(),
              quantity: item.quantity,
              timestamp: new Date(),
              type: 'order_reservation'
            }
          },
          $set: { lastUpdated: new Date() }
        },
        { session }
      );

      if (updateResult.modifiedCount === 0) {
        return {
          success: false,
          message: `Failed to reserve inventory for ${item.productId}`
        };
      }

      reservations.push({
        productId: item.productId,
        quantityReserved: item.quantity,
        previousAvailable: availableQuantity
      });

      changes.push({
        productId: item.productId,
        action: 'reserved',
        quantity: item.quantity,
        newReservedQuantity: inventoryDoc.reservedQuantity + item.quantity
      });
    }

    return {
      success: true,
      reservations: reservations,
      changes: changes
    };
  }

  async validateUserAccount(userId, totalAmount, session) {
    console.log(`Validating user account: ${userId} for amount: ${totalAmount}`);

    // Fetch user data within transaction
    const user = await this.collections.users.findOne(
      { _id: userId },
      { session }
    );

    if (!user) {
      return {
        success: false,
        message: 'User account not found'
      };
    }

    // Validate account status
    if (user.accountStatus !== 'active') {
      return {
        success: false,
        message: `Account is ${user.accountStatus} - cannot process orders`
      };
    }

    // Validate payment methods
    if (!user.paymentMethods || user.paymentMethods.length === 0) {
      return {
        success: false,
        message: 'No valid payment methods on file'
      };
    }

    // Check credit limits and available balance
    const totalAvailableCredit = user.accountBalance + 
      user.paymentMethods.reduce((sum, pm) => sum + (pm.creditLimit || 0), 0);

    if (totalAvailableCredit < totalAmount) {
      return {
        success: false,
        message: `Insufficient funds. Available: ${totalAvailableCredit}, Required: ${totalAmount}`
      };
    }

    // Check for fraud indicators
    if (user.riskScore && user.riskScore > 0.8) {
      return {
        success: false,
        message: 'Transaction blocked due to high risk score'
      };
    }

    return {
      success: true,
      user: user,
      availableCredit: totalAvailableCredit
    };
  }

  async calculateFinalPricing(orderData, user, session) {
    console.log('Calculating final pricing with promotions and discounts');

    let totalAmount = orderData.subtotal;
    let discountAmount = 0;
    let loyaltyPoints = 0;
    const appliedPromotions = [];

    // Check for applicable promotions within transaction
    const activePromotions = await this.collections.promotions.find(
      {
        active: true,
        startDate: { $lte: new Date() },
        endDate: { $gte: new Date() },
        $or: [
          { applicableUsers: orderData.userId },
          { applicableUserTiers: user.loyaltyTier },
          { globalPromotion: true }
        ]
      },
      { session }
    ).toArray();

    // Apply best available promotion
    for (const promotion of activePromotions) {
      if (this.isPromotionApplicable(promotion, orderData, user)) {
        const promotionDiscount = this.calculatePromotionDiscount(promotion, totalAmount);

        if (promotionDiscount > discountAmount) {
          discountAmount = promotionDiscount;
          appliedPromotions.length = 0; // keep only the single best promotion
          appliedPromotions.push({
            promotionId: promotion._id,
            promotionName: promotion.name,
            discountAmount: promotionDiscount,
            discountType: promotion.discountType
          });
        }
      }
    }

    // Calculate loyalty points earned
    const loyaltyMultiplier = user.loyaltyTier === 'gold' ? 1.5 : 
                            user.loyaltyTier === 'silver' ? 1.2 : 1.0;
    loyaltyPoints = Math.floor((totalAmount - discountAmount) * 0.01 * loyaltyMultiplier);

    // Calculate taxes and final amount
    const taxRate = orderData.shippingAddress?.taxRate || 0.08;
    const subtotalAfterDiscount = totalAmount - discountAmount;
    const taxAmount = subtotalAfterDiscount * taxRate;
    const finalAmount = subtotalAfterDiscount + taxAmount + (orderData.shippingCost || 0);

    return {
      originalAmount: totalAmount,
      discountAmount: discountAmount,
      taxAmount: taxAmount,
      shippingCost: orderData.shippingCost || 0,
      finalAmount: finalAmount,
      loyaltyPoints: loyaltyPoints,
      appliedPromotions: appliedPromotions
    };
  }

  async createOrder(orderData, session) {
    console.log('Creating order with full transaction context');

    const order = {
      _id: new ObjectId(),
      userId: orderData.userId,
      orderNumber: `ORD-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`,
      status: 'confirmed',

      // Order items with detailed information
      items: orderData.items.map(item => ({
        productId: item.productId,
        productName: item.productName,
        quantity: item.quantity,
        unitPrice: item.unitPrice,
        lineTotal: item.quantity * item.unitPrice,

        // Product snapshot for historical accuracy
        productSnapshot: {
          name: item.productName,
          description: item.description,
          category: item.category,
          sku: item.sku
        }
      })),

      // Pricing breakdown
      pricing: {
        subtotal: orderData.originalAmount,
        discountAmount: orderData.discountAmount,
        taxAmount: orderData.taxAmount,
        shippingCost: orderData.shippingCost,
        finalAmount: orderData.finalAmount
      },

      // Applied promotions
      promotions: orderData.appliedPromotions || [],

      // Customer information
      customer: {
        userId: orderData.userId,
        email: orderData.customerEmail,
        loyaltyTier: orderData.customerLoyaltyTier
      },

      // Shipping information
      shipping: {
        address: orderData.shippingAddress,
        method: orderData.shippingMethod,
        estimatedDelivery: orderData.estimatedDelivery,
        cost: orderData.shippingCost
      },

      // Order lifecycle
      lifecycle: {
        createdAt: new Date(),
        confirmedAt: new Date(),
        estimatedFulfillmentDate: new Date(Date.now() + 24 * 60 * 60 * 1000), // 24 hours
        status: 'confirmed'
      },

      // Transaction metadata
      transaction: {
        sessionId: session.id,
        ipAddress: orderData.ipAddress,
        userAgent: orderData.userAgent,
        referrer: orderData.referrer
      },

      // Inventory reservations
      inventoryReservations: orderData.inventoryReservations
    };

    const insertResult = await this.collections.orders.insertOne(order, { session });

    if (!insertResult.acknowledged) {
      throw new Error('Failed to create order');
    }

    return order;
  }

  async processPaymentTransaction(order, paymentMethods, session) {
    console.log(`Processing payment for order: ${order._id}`);

    // Select best payment method
    const primaryPaymentMethod = paymentMethods.find(pm => pm.primary) || paymentMethods[0];

    if (!primaryPaymentMethod) {
      return {
        success: false,
        message: 'No valid payment method available'
      };
    }

    // Create payment record within transaction
    const payment = {
      _id: new ObjectId(),
      orderId: order._id,
      userId: order.userId,
      amount: order.pricing.finalAmount,
      currency: 'USD',

      paymentMethod: {
        type: primaryPaymentMethod.type,
        maskedNumber: primaryPaymentMethod.maskedNumber,
        provider: primaryPaymentMethod.provider
      },

      status: 'completed', // Simulated successful payment

      transactionDetails: {
        authorizationCode: `AUTH_${Date.now()}`,
        transactionId: `TXN_${Math.random().toString(36).substr(2, 16)}`,
        processedAt: new Date(),
        processingFee: order.pricing.finalAmount * 0.029, // 2.9% processing fee

        // Risk assessment
        riskScore: Math.random() * 0.3, // Simulated low risk
        fraudChecks: {
          addressVerification: 'pass',
          cvvVerification: 'pass',
          velocityCheck: 'pass'
        }
      },

      // Gateway information
      gateway: {
        provider: 'stripe',
        gatewayTransactionId: `pi_${Math.random().toString(36).substr(2, 24)}`,
        gatewayFee: order.pricing.finalAmount * 0.029 + 0.30
      },

      createdAt: new Date(),
      updatedAt: new Date()
    };

    const paymentResult = await this.collections.payments.insertOne(payment, { session });

    if (!paymentResult.acknowledged) {
      return {
        success: false,
        message: 'Payment processing failed'
      };
    }

    // Update user account balance if using account credit
    if (primaryPaymentMethod.type === 'account_balance') {
      await this.collections.users.updateOne(
        { _id: order.userId },
        {
          $inc: { accountBalance: -order.pricing.finalAmount },
          $push: {
            transactionHistory: {
              type: 'debit',
              amount: order.pricing.finalAmount,
              description: `Order payment: ${order.orderNumber}`,
              timestamp: new Date()
            }
          }
        },
        { session }
      );
    }

    return {
      success: true,
      paymentId: payment._id,
      transactionId: payment.transactionDetails.transactionId,
      amount: payment.amount
    };
  }

  async updateLoyaltyProgram(userId, orderAmount, session) {
    console.log(`Updating loyalty program for user: ${userId}`);

    // Calculate loyalty points (roughly 1% of the order amount, consistent with calculateFinalPricing)
    const pointsEarned = Math.floor(orderAmount * 0.01);

    // Update loyalty program within transaction
    const loyaltyUpdate = await this.collections.loyalty.updateOne(
      { userId: userId },
      {
        $inc: { 
          totalPoints: pointsEarned,
          lifetimePoints: pointsEarned,
          totalSpend: orderAmount
        },
        $push: {
          pointsHistory: {
            type: 'earned',
            points: pointsEarned,
            description: 'Order purchase',
            timestamp: new Date()
          }
        },
        $set: { lastUpdated: new Date() }
      },
      { upsert: true, session }
    );

    // Check for tier upgrades
    const loyaltyAccount = await this.collections.loyalty.findOne(
      { userId: userId },
      { session }
    );

    if (loyaltyAccount) {
      const newTier = this.calculateLoyaltyTier(loyaltyAccount.totalSpend, loyaltyAccount.totalPoints);

      if (newTier !== loyaltyAccount.currentTier) {
        await this.collections.loyalty.updateOne(
          { userId: userId },
          {
            $set: { 
              currentTier: newTier,
              tierUpgradedAt: new Date()
            },
            $push: {
              tierHistory: {
                previousTier: loyaltyAccount.currentTier,
                newTier: newTier,
                upgradedAt: new Date()
              }
            }
          },
          { session }
        );

        // Update user's tier in main user document
        await this.collections.users.updateOne(
          { _id: userId },
          { $set: { loyaltyTier: newTier } },
          { session }
        );
      }
    }

    return {
      pointsEarned: pointsEarned,
      newTier: loyaltyAccount?.currentTier
    };
  }

  async createShippingRecord(order, session) {
    console.log(`Creating shipping record for order: ${order._id}`);

    const shippingRecord = {
      _id: new ObjectId(),
      orderId: order._id,
      userId: order.userId,

      shippingAddress: order.shipping.address,
      shippingMethod: order.shipping.method,

      status: 'pending',
      trackingNumber: null, // Will be assigned when shipped

      estimatedDelivery: order.shipping.estimatedDelivery,
      actualDelivery: null,

      carrier: this.selectShippingCarrier(order.shipping.method),

      shippingCost: order.shipping.cost,

      items: order.items.map(item => ({
        productId: item.productId,
        quantity: item.quantity,
        weight: item.estimatedWeight || 1, // Default weight
        dimensions: item.dimensions
      })),

      lifecycle: {
        createdAt: new Date(),
        status: 'pending',
        statusHistory: [{
          status: 'pending',
          timestamp: new Date(),
          note: 'Shipping record created'
        }]
      }
    };

    await this.collections.shipping.insertOne(shippingRecord, { session });
    return shippingRecord;
  }

  async createTransactionAuditTrail(auditData, session) {
    console.log('Creating comprehensive audit trail');

    const auditEntry = {
      _id: new ObjectId(),

      // Transaction identification
      transactionId: auditData.sessionId || new ObjectId(),
      transactionType: 'order_creation',

      // Entity information
      orderId: auditData.orderId,
      userId: auditData.userId,
      paymentId: auditData.paymentId,

      // Transaction details
      amount: auditData.amount,
      currency: 'USD',

      // Changes made
      changes: {
        orderCreated: {
          orderId: auditData.orderId,
          status: 'confirmed',
          timestamp: auditData.timestamp
        },
        inventoryChanges: auditData.inventoryChanges,
        paymentProcessed: {
          paymentId: auditData.paymentId,
          amount: auditData.amount,
          status: 'completed'
        },
        loyaltyUpdated: true
      },

      // Compliance and security
      compliance: {
        dataRetentionPeriod: 7 * 365 * 24 * 60 * 60 * 1000, // 7 years
        encryptionRequired: true,
        auditLevel: 'full'
      },

      // System metadata
      system: {
        applicationVersion: process.env.APP_VERSION || '1.0.0',
        nodeId: process.env.NODE_ID || 'node-1',
        environment: process.env.NODE_ENV || 'development'
      },

      timestamp: auditData.timestamp,
      createdAt: new Date()
    };

    await this.collections.auditLog.insertOne(auditEntry, { session });
    return auditEntry;
  }

  // Helper methods
  isPromotionApplicable(promotion, orderData, user) {
    // Implement promotion applicability logic
    if (promotion.minOrderAmount && orderData.subtotal < promotion.minOrderAmount) {
      return false;
    }

    if (promotion.applicableUserTiers && !promotion.applicableUserTiers.includes(user.loyaltyTier)) {
      return false;
    }

    if (promotion.maxUsesPerUser) {
      // Check usage count (would need to query promotion usage history)
      return true; // Simplified for example
    }

    return true;
  }

  calculatePromotionDiscount(promotion, orderAmount) {
    switch (promotion.discountType) {
      case 'percentage':
        return orderAmount * (promotion.discountValue / 100);
      case 'fixed_amount':
        return Math.min(promotion.discountValue, orderAmount);
      default:
        return 0;
    }
  }

  calculateLoyaltyTier(totalSpend, totalPoints) {
    if (totalSpend >= 10000) return 'platinum';
    if (totalSpend >= 5000) return 'gold';
    if (totalSpend >= 1000) return 'silver';
    return 'bronze';
  }

  selectShippingCarrier(shippingMethod) {
    const carrierMap = {
      'standard': 'USPS',
      'expedited': 'FedEx',
      'overnight': 'UPS',
      'two_day': 'FedEx'
    };
    return carrierMap[shippingMethod] || 'USPS';
  }

  // Advanced transaction patterns
  async processBulkTransactions(transactions) {
    console.log(`Processing ${transactions.length} bulk transactions`);

    // Each order already runs in its own transaction inside processComplexOrder,
    // so the batch is processed sequentially rather than wrapped in an outer
    // transaction (MongoDB does not support nesting transactions). A critical
    // error stops further processing; already-committed orders can be undone
    // with processCompensatingTransaction if required.
    const results = [];

    for (const transactionData of transactions) {
      try {
        const result = await this.processComplexOrder(transactionData);
        results.push({
          success: true,
          orderId: result.orderId,
          data: result
        });
      } catch (error) {
        results.push({
          success: false,
          error: error.message,
          transactionData: transactionData
        });

        // Decide whether to continue or abort the rest of the batch
        if (error.critical) {
          console.error('Critical error - aborting remaining bulk transactions:', error);
          break;
        }
      }
    }

    return results;
  }

  async processCompensatingTransaction(originalOrderId, compensationType) {
    console.log(`Processing compensating transaction for order: ${originalOrderId}`);

    const session = client.startSession();

    try {
      return await session.withTransaction(async () => {

        // Fetch original order
        const originalOrder = await this.collections.orders.findOne(
          { _id: originalOrderId },
          { session }
        );

        if (!originalOrder) {
          throw new Error('Original order not found');
        }

        switch (compensationType) {
          case 'full_refund':
            return await this.processFullRefund(originalOrder, session);
          case 'partial_refund':
            return await this.processPartialRefund(originalOrder, session);
          case 'order_cancellation':
            return await this.processOrderCancellation(originalOrder, session);
          default:
            throw new Error(`Unknown compensation type: ${compensationType}`);
        }
      });

    } finally {
      await session.endSession();
    }
  }

  async processFullRefund(originalOrder, session) {
    console.log(`Processing full refund for order: ${originalOrder._id}`);

    // Release inventory reservations
    for (const item of originalOrder.items) {
      await this.collections.inventory.updateOne(
        { productId: item.productId },
        {
          $inc: { reservedQuantity: -item.quantity },
          $push: {
            reservationHistory: {
              reservationId: new ObjectId(),
              quantity: -item.quantity,
              timestamp: new Date(),
              type: 'refund_release'
            }
          }
        },
        { session }
      );
    }

    // Process refund payment
    const refundPayment = {
      _id: new ObjectId(),
      originalOrderId: originalOrder._id,
      originalPaymentId: originalOrder.paymentId,
      userId: originalOrder.userId,
      amount: originalOrder.pricing.finalAmount,
      currency: 'USD',
      type: 'refund',
      status: 'completed',
      processedAt: new Date(),
      createdAt: new Date()
    };

    await this.collections.payments.insertOne(refundPayment, { session });

    // Update order status
    await this.collections.orders.updateOne(
      { _id: originalOrder._id },
      {
        $set: {
          status: 'refunded',
          'lifecycle.refundedAt': new Date(),
          'lifecycle.status': 'refunded'
        },
        $push: {
          'lifecycle.statusHistory': {
            status: 'refunded',
            timestamp: new Date(),
            note: 'Full refund processed'
          }
        }
      },
      { session }
    );

    // Update user account balance
    await this.collections.users.updateOne(
      { _id: originalOrder.userId },
      {
        $inc: { accountBalance: originalOrder.pricing.finalAmount },
        $push: {
          transactionHistory: {
            type: 'credit',
            amount: originalOrder.pricing.finalAmount,
            description: `Refund for order: ${originalOrder.orderNumber}`,
            timestamp: new Date()
          }
        }
      },
      { session }
    );

    // Create audit trail
    await this.createTransactionAuditTrail({
      orderId: originalOrder._id,
      userId: originalOrder.userId,
      amount: originalOrder.pricing.finalAmount,
      type: 'full_refund',
      timestamp: new Date()
    }, session);

    return {
      success: true,
      refundId: refundPayment._id,
      amount: originalOrder.pricing.finalAmount
    };
  }
}

// Benefits of MongoDB Multi-Document Transactions:
// - Full ACID compliance across multiple documents and collections
// - Automatic rollback on failure with consistent data state
// - Session-based transaction isolation with configurable read/write concerns
// - Support for complex business logic within transaction boundaries
// - Seamless integration with MongoDB's document model and flexible schemas
// - Distributed transaction support across replica sets and sharded clusters  
// - Rich error handling and transaction state management
// - Integration with MongoDB's change streams for real-time transaction monitoring
// - Optimistic concurrency control with automatic retry mechanisms
// - Native support for document relationships and embedded data structures

module.exports = {
  TransactionManager
};

Understanding MongoDB Transaction Architecture

Advanced Transaction Patterns and Isolation Levels

Implement sophisticated transaction patterns for different business scenarios:

// Advanced transaction patterns and isolation management
class AdvancedTransactionPatterns {
  constructor(db) {
    this.db = db;
    // Loose analogy only: MongoDB read concerns do not map one-to-one to SQL
    // isolation levels, and the 'linearizable' read concern cannot be used
    // inside multi-document transactions (transactions support 'local',
    // 'majority', and 'snapshot').
    this.isolationLevels = {
      readUncommitted: { level: 'available' },
      readCommitted: { level: 'local' },
      repeatableRead: { level: 'majority' },
      serializable: { level: 'linearizable' }
    };
  }

  async demonstrateIsolationLevels() {
    console.log('Demonstrating MongoDB transaction isolation levels...');

    // Read Committed isolation (MongoDB default)
    const readCommittedSession = client.startSession();
    try {
      await readCommittedSession.withTransaction(async () => {

        // Reads only committed data
        const userData = await this.db.collection('users').findOne(
          { _id: 'user123' },
          { 
            session: readCommittedSession,
            readConcern: this.isolationLevels.readCommitted
          }
        );

        // Updates are isolated from other transactions
        await this.db.collection('users').updateOne(
          { _id: 'user123' },
          { $set: { lastActivity: new Date() } },
          { session: readCommittedSession }
        );

      }, {
        readConcern: this.isolationLevels.readCommitted,
        writeConcern: { w: 'majority', j: true }
      });
    } finally {
      await readCommittedSession.endSession();
    }

    // Snapshot isolation for consistent reads
    const snapshotSession = client.startSession();
    try {
      await snapshotSession.withTransaction(async () => {

        // All reads within transaction see consistent snapshot
        const orders = await this.db.collection('orders').find(
          { userId: 'user123' },
          { session: snapshotSession }
        ).toArray();

        const inventory = await this.db.collection('inventory').find(
          { productId: { $in: orders.map(o => o.productId) } },
          { session: snapshotSession }
        ).toArray();

        // Both reads see data from same point in time
        console.log(`Found ${orders.length} orders and ${inventory.length} inventory items`);

      }, {
        readConcern: { level: 'snapshot' },
        writeConcern: { w: 'majority', j: true }
      });
    } finally {
      await snapshotSession.endSession();
    }
  }

  async implementSagaPattern(sagaSteps) {
    // Saga pattern for distributed transaction coordination
    console.log('Implementing Saga pattern for distributed transactions...');

    const sagaId = new ObjectId();
    const saga = {
      _id: sagaId,
      status: 'started',
      steps: sagaSteps,
      currentStep: 0,
      compensations: [],
      createdAt: new Date()
    };

    // Create saga record
    await this.db.collection('sagas').insertOne(saga);

    try {
      for (let i = 0; i < sagaSteps.length; i++) {
        const step = sagaSteps[i];

        console.log(`Executing saga step ${i + 1}/${sagaSteps.length}: ${step.name}`);

        const session = client.startSession();
        try {
          await session.withTransaction(async () => {

            // Execute step within transaction
            const stepResult = await this.executeSagaStep(step, session);

            // Update saga progress
            await this.db.collection('sagas').updateOne(
              { _id: sagaId },
              {
                $set: {
                  currentStep: i + 1,
                  status: i === sagaSteps.length - 1 ? 'completed' : 'in_progress',
                  lastUpdated: new Date()
                },
                $push: {
                  stepResults: {
                    stepIndex: i,
                    stepName: step.name,
                    result: stepResult,
                    completedAt: new Date()
                  }
                }
              },
              { session }
            );

          });
        } finally {
          await session.endSession();
        }

        // Track progress locally so a failure is compensated from the correct step
        saga.currentStep = i + 1;
      }

      console.log(`Saga ${sagaId} completed successfully`);
      return { success: true, sagaId };

    } catch (error) {
      console.error(`Saga ${sagaId} failed at step ${saga.currentStep}:`, error);

      // Execute compensating transactions
      await this.compensateSaga(sagaId, saga.currentStep);

      throw error;
    }
  }

  async compensateSaga(sagaId, failedStepIndex) {
    console.log(`Compensating saga ${sagaId} from step ${failedStepIndex}`);

    const saga = await this.db.collection('sagas').findOne({ _id: sagaId });

    // Execute compensations in reverse order
    for (let i = failedStepIndex - 1; i >= 0; i--) {
      const step = saga.steps[i];

      if (step.compensation) {
        console.log(`Executing compensation for step ${i + 1}: ${step.compensation.name}`);

        const session = client.startSession();
        try {
          await session.withTransaction(async () => {
            await this.executeCompensation(step.compensation, session);

            await this.db.collection('sagas').updateOne(
              { _id: sagaId },
              {
                $push: {
                  compensationsExecuted: {
                    stepIndex: i,
                    compensationName: step.compensation.name,
                    executedAt: new Date()
                  }
                }
              },
              { session }
            );
          });
        } finally {
          await session.endSession();
        }
      }
    }

    // Mark saga as compensated
    await this.db.collection('sagas').updateOne(
      { _id: sagaId },
      {
        $set: {
          status: 'compensated',
          compensatedAt: new Date()
        }
      }
    );
  }

  async implementOptimisticLocking() {
    // Optimistic locking pattern for concurrent updates
    console.log('Implementing optimistic locking pattern...');

    const session = client.startSession();
    const maxRetries = 3;
    let retryCount = 0;

    while (retryCount < maxRetries) {
      try {
        await session.withTransaction(async () => {

          // Read document with current version
          const document = await this.db.collection('accounts').findOne(
            { _id: 'account123' },
            { session }
          );

          if (!document) {
            throw new Error('Account not found');
          }

          // Simulate business logic processing time
          await new Promise(resolve => setTimeout(resolve, 100));

          // Update with version check
          const updateResult = await this.db.collection('accounts').updateOne(
            { 
              _id: 'account123',
              version: document.version  // Optimistic lock check
            },
            {
              $set: { 
                balance: document.balance - 100,
                lastUpdated: new Date()
              },
              $inc: { version: 1 }  // Increment version
            },
            { session }
          );

          if (updateResult.modifiedCount === 0) {
            throw new Error('Optimistic lock conflict - document was modified by another transaction');
          }

          console.log('Optimistic lock update successful');

        });

        break; // Success - exit retry loop

      } catch (error) {
        retryCount++;

        if (error.message.includes('Optimistic lock conflict') && retryCount < maxRetries) {
          console.log(`Optimistic lock conflict, retrying (${retryCount}/${maxRetries})...`);

          // Exponential backoff before retry
          await new Promise(resolve => setTimeout(resolve, Math.pow(2, retryCount) * 100));

        } else {
          console.error('Optimistic locking failed:', error);
          await session.endSession(); // release the session before propagating the error
          throw error;
        }
      }
    }

    await session.endSession();
  }

  async implementDistributedLocking() {
    // Distributed locking for coordinating access across instances
    console.log('Implementing distributed locking pattern...');

    const lockId = 'global-critical-section-lock'; // fixed, well-known id so all instances contend for the same lock document
    const lockTimeout = 30000; // 30 seconds
    const acquireTimeout = 5000; // 5 seconds to acquire

    const session = client.startSession();

    try {
      // Attempt to acquire distributed lock
      const lockAcquired = await this.acquireDistributedLock(
        lockId, lockTimeout, acquireTimeout, session
      );

      if (!lockAcquired) {
        throw new Error('Failed to acquire distributed lock');
      }

      console.log(`Distributed lock acquired: ${lockId}`);

      // Perform critical section operations within transaction
      await session.withTransaction(async () => {

        // Critical operations that require distributed coordination
        await this.performCriticalOperations(session);

        // Refresh lock if needed for long operations
        await this.refreshDistributedLock(lockId, lockTimeout, session);

      });

    } finally {
      // Always release the lock
      await this.releaseDistributedLock(lockId, session);
      await session.endSession();
    }
  }

  async acquireDistributedLock(lockId, timeout, acquireTimeout, session) {
    const expiration = new Date(Date.now() + timeout);
    const acquireDeadline = Date.now() + acquireTimeout;

    while (Date.now() < acquireDeadline) {
      try {
        const result = await this.db.collection('distributed_locks').insertOne(
          {
            _id: lockId,
            owner: process.env.NODE_ID || 'unknown',
            acquiredAt: new Date(),
            expiresAt: expiration
          },
          { session }
        );

        if (result.acknowledged) {
          return true; // Lock acquired
        }

      } catch (error) {
        if (error.code === 11000) { // Duplicate key error - lock exists

          // Check if lock is expired and can be claimed
          const existingLock = await this.db.collection('distributed_locks').findOne(
            { _id: lockId },
            { session }
          );

          if (existingLock && existingLock.expiresAt < new Date()) {
            // Lock is expired, try to claim it
            const claimResult = await this.db.collection('distributed_locks').replaceOne(
              { 
                _id: lockId, 
                expiresAt: existingLock.expiresAt 
              },
              {
                _id: lockId,
                owner: process.env.NODE_ID || 'unknown',
                acquiredAt: new Date(),
                expiresAt: expiration
              },
              { session }
            );

            if (claimResult.modifiedCount > 0) {
              return true; // Successfully claimed expired lock
            }
          }

          // Lock is held by someone else, wait and retry
          await new Promise(resolve => setTimeout(resolve, 50));

        } else {
          throw error;
        }
      }
    }

    return false; // Failed to acquire lock within timeout
  }

  async releaseDistributedLock(lockId, session) {
    await this.db.collection('distributed_locks').deleteOne(
      { 
        _id: lockId,
        owner: process.env.NODE_ID || 'unknown'
      },
      { session }
    );

    console.log(`Distributed lock released: ${lockId}`);
  }

  async implementTransactionRetryLogic() {
    // Advanced retry logic for transaction conflicts
    console.log('Implementing advanced transaction retry logic...');

    const retryConfig = {
      maxRetries: 5,
      initialDelay: 100,
      maxDelay: 2000,
      backoffMultiplier: 2,
      jitterRange: 0.1
    };

    let attempt = 0;

    while (attempt < retryConfig.maxRetries) {
      const session = client.startSession();

      try {
        const result = await session.withTransaction(async () => {

          // Simulate transaction work that might conflict
          const account = await this.db.collection('accounts').findOne(
            { _id: 'account123' },
            { session }
          );

          if (!account) {
            throw new Error('Account not found');
          }

          // Business logic that might conflict with other transactions
          const newBalance = account.balance - 50;

          if (newBalance < 0) {
            throw new Error('Insufficient funds');
          }

          await this.db.collection('accounts').updateOne(
            { _id: 'account123' },
            { 
              $set: { 
                balance: newBalance,
                lastUpdated: new Date() 
              }
            },
            { session }
          );

          return { success: true, newBalance };

        }, {
          readConcern: { level: 'majority' },
          writeConcern: { w: 'majority', j: true },
          maxCommitTimeMS: 30000
        });

        console.log('Transaction succeeded:', result);
        return result;

      } catch (error) {
        attempt++;

        // Check if error is retryable
        const isRetryable = this.isTransactionRetryable(error);

        if (isRetryable && attempt < retryConfig.maxRetries) {
          // Calculate retry delay with exponential backoff and jitter
          const baseDelay = Math.min(
            retryConfig.initialDelay * Math.pow(retryConfig.backoffMultiplier, attempt - 1),
            retryConfig.maxDelay
          );

          const jitter = baseDelay * retryConfig.jitterRange * (Math.random() - 0.5);
          const delay = baseDelay + jitter;

          console.log(`Transaction failed (attempt ${attempt}), retrying in ${delay}ms:`, error.message);

          await new Promise(resolve => setTimeout(resolve, delay));

        } else {
          console.error('Transaction failed after all retries:', error);
          throw error;
        }
      } finally {
        await session.endSession();
      }
    }
  }

  isTransactionRetryable(error) {
    // Determine if transaction error is retryable
    const retryableErrors = [
      'WriteConflict',
      'TransientTransactionError',
      'UnknownTransactionCommitResult',
      'LockTimeout',
      'TemporarilyUnavailable'
    ];

    return retryableErrors.some(retryableError =>
      error.message.includes(retryableError) ||
      error.code === 112 || // WriteConflict
      error.code === 50 ||  // MaxTimeMSExpired
      (typeof error.hasErrorLabel === 'function' && (
        error.hasErrorLabel('TransientTransactionError') ||
        error.hasErrorLabel('UnknownTransactionCommitResult')
      ))
    );
  }

  async performTransactionPerformanceTesting() {
    console.log('Performing transaction performance testing...');

    const testConfig = {
      concurrentTransactions: 10,
      transactionsPerThread: 100,
      documentCount: 1000
    };

    // Setup test data
    await this.setupPerformanceTestData(testConfig.documentCount);

    const startTime = Date.now();
    const promises = [];

    // Launch concurrent transaction threads
    for (let i = 0; i < testConfig.concurrentTransactions; i++) {
      const promise = this.runTransactionThread(i, testConfig.transactionsPerThread);
      promises.push(promise);
    }

    // Wait for all threads to complete
    const results = await Promise.allSettled(promises);
    const endTime = Date.now();

    // Analyze results
    const successful = results.filter(r => r.status === 'fulfilled').length;
    const failed = results.filter(r => r.status === 'rejected').length;
    const totalTransactions = testConfig.concurrentTransactions * testConfig.transactionsPerThread;
    const throughput = totalTransactions / ((endTime - startTime) / 1000);

    console.log('Transaction Performance Results:');
    console.log(`- Total transactions: ${totalTransactions}`);
    console.log(`- Successful threads: ${successful}/${testConfig.concurrentTransactions}`);
    console.log(`- Failed threads: ${failed}`);
    console.log(`- Total time: ${endTime - startTime}ms`);
    console.log(`- Throughput: ${throughput.toFixed(2)} transactions/second`);

    return {
      totalTransactions,
      successful,
      failed,
      duration: endTime - startTime,
      throughput
    };
  }

  async runTransactionThread(threadId, transactionCount) {
    console.log(`Starting transaction thread ${threadId} with ${transactionCount} transactions`);

    for (let i = 0; i < transactionCount; i++) {
      const session = client.startSession();

      try {
        await session.withTransaction(async () => {

          // Simulate realistic transaction workload
          const fromAccount = `account_${threadId}_${Math.floor(Math.random() * 10)}`;
          const toAccount = `account_${(threadId + 1) % 10}_${Math.floor(Math.random() * 10)}`;
          const amount = Math.floor(Math.random() * 100) + 1;

          // Transfer funds between accounts
          const fromDoc = await this.db.collection('test_accounts').findOne(
            { _id: fromAccount },
            { session }
          );

          if (fromDoc && fromDoc.balance >= amount) {
            await this.db.collection('test_accounts').updateOne(
              { _id: fromAccount },
              { $inc: { balance: -amount } },
              { session }
            );

            await this.db.collection('test_accounts').updateOne(
              { _id: toAccount },
              { $inc: { balance: amount } },
              { upsert: true, session }
            );

            // Create transaction record
            await this.db.collection('test_transactions').insertOne(
              {
                fromAccount,
                toAccount,
                amount,
                timestamp: new Date(),
                threadId,
                transactionIndex: i
              },
              { session }
            );
          }

        });

      } catch (error) {
        console.error(`Transaction ${i} in thread ${threadId} failed:`, error.message);
      } finally {
        await session.endSession();
      }
    }

    console.log(`Thread ${threadId} completed`);
  }

  async setupPerformanceTestData(documentCount) {
    console.log(`Setting up ${documentCount} test accounts...`);

    // Clear existing test data
    await this.db.collection('test_accounts').deleteMany({});
    await this.db.collection('test_transactions').deleteMany({});

    // Create test accounts
    const accounts = [];
    for (let i = 0; i < documentCount; i++) {
      accounts.push({
        _id: `account_${Math.floor(i / 100)}_${i % 100}`,
        balance: Math.floor(Math.random() * 1000) + 100,
        createdAt: new Date()
      });
    }

    await this.db.collection('test_accounts').insertMany(accounts);

    console.log('Test data setup completed');
  }
}

SQL-Style Transaction Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB transaction management:

-- QueryLeaf transaction operations with SQL-familiar syntax

-- Begin transaction with isolation level
BEGIN TRANSACTION 
WITH (
  isolation_level = 'read_committed',
  read_concern = 'majority',
  write_concern = 'majority',
  max_timeout = '30s'
);

-- Complex multi-collection transaction
WITH order_validation AS (
  -- Validate inventory availability
  SELECT 
    product_id,
    available_quantity,
    reserved_quantity,
    CASE 
      WHEN available_quantity >= 5 THEN true 
      ELSE false 
    END as inventory_available
  FROM inventory 
  WHERE product_id = 'prod_12345'
),
payment_validation AS (
  -- Validate user payment capability
  SELECT 
    user_id,
    account_balance,
    credit_limit,
    account_status,
    CASE 
      WHEN account_status = 'active' AND (account_balance + credit_limit) >= 299.99 THEN true
      ELSE false
    END as payment_valid
  FROM users 
  WHERE user_id = 'user_67890'
),
promotion_calculation AS (
  -- Calculate applicable promotions
  SELECT 
    promotion_id,
    discount_type,
    discount_value,
    CASE discount_type
      WHEN 'percentage' THEN 299.99 * (discount_value / 100.0)
      WHEN 'fixed' THEN LEAST(discount_value, 299.99)
      ELSE 0
    END as discount_amount
  FROM promotions 
  WHERE active = true 
    AND start_date <= CURRENT_TIMESTAMP 
    AND end_date >= CURRENT_TIMESTAMP
    AND (global_promotion = true OR 'user_67890' = ANY(applicable_users))
  ORDER BY discount_amount DESC
  LIMIT 1
)

-- Create order within transaction
INSERT INTO orders (
  order_id,
  user_id,
  order_number,
  status,

  -- Order items as nested documents
  items,

  -- Pricing breakdown
  pricing,

  -- Customer information  
  customer,

  -- Shipping details
  shipping,

  -- Lifecycle tracking
  lifecycle,

  created_at
)
SELECT 
  gen_random_uuid() as order_id,
  'user_67890' as user_id,
  'ORD-' || EXTRACT(EPOCH FROM NOW())::bigint || '-' || SUBSTRING(MD5(RANDOM()::text), 1, 9) as order_number,
  'confirmed' as status,

  -- Items array with product details
  JSON_BUILD_ARRAY(
    JSON_BUILD_OBJECT(
      'product_id', 'prod_12345',
      'product_name', 'Premium Widget',
      'quantity', 5,
      'unit_price', 59.99,
      'line_total', 299.95,
      'product_snapshot', JSON_BUILD_OBJECT(
        'name', 'Premium Widget',
        'category', 'electronics',
        'sku', 'WID-12345'
      )
    )
  ) as items,

  -- Pricing structure
  JSON_BUILD_OBJECT(
    'subtotal', 299.99,
    'discount_amount', COALESCE(pc.discount_amount, 0),
    'tax_amount', (299.99 - COALESCE(pc.discount_amount, 0)) * 0.08,
    'shipping_cost', 15.99,
    'final_amount', (299.99 - COALESCE(pc.discount_amount, 0)) * 1.08 + 15.99
  ) as pricing,

  -- Customer data
  JSON_BUILD_OBJECT(
    'user_id', 'user_67890',
    'email', 'customer@example.com',
    'loyalty_tier', 'gold'
  ) as customer,

  -- Shipping information
  JSON_BUILD_OBJECT(
    'address', JSON_BUILD_OBJECT(
      'street', '123 Main St',
      'city', 'Anytown',
      'state', 'CA',
      'zip', '12345'
    ),
    'method', 'standard',
    'estimated_delivery', CURRENT_TIMESTAMP + INTERVAL '5 days',
    'cost', 15.99
  ) as shipping,

  -- Lifecycle tracking
  JSON_BUILD_OBJECT(
    'created_at', CURRENT_TIMESTAMP,
    'confirmed_at', CURRENT_TIMESTAMP,
    'status', 'confirmed',
    'estimated_fulfillment', CURRENT_TIMESTAMP + INTERVAL '1 day'
  ) as lifecycle,

  CURRENT_TIMESTAMP as created_at

FROM order_validation ov
CROSS JOIN payment_validation pv  
LEFT JOIN promotion_calculation pc ON true
WHERE ov.inventory_available = true 
  AND pv.payment_valid = true;

-- Update inventory within same transaction
UPDATE inventory 
SET 
  reserved_quantity = reserved_quantity + 5,
  reservation_history = ARRAY_APPEND(
    reservation_history,
    JSON_BUILD_OBJECT(
      'reservation_id', gen_random_uuid(),
      'quantity', 5,
      'timestamp', CURRENT_TIMESTAMP,
      'type', 'order_reservation'
    )
  ),
  last_updated = CURRENT_TIMESTAMP
WHERE product_id = 'prod_12345' 
  AND (quantity - reserved_quantity) >= 5;

-- Process payment within transaction
INSERT INTO payments (
  payment_id,
  order_id,
  user_id,
  amount,
  currency,
  payment_method,
  status,
  transaction_details,
  gateway,
  created_at
)
SELECT 
  gen_random_uuid() as payment_id,
  o.order_id,
  o.user_id,
  (o.pricing->>'final_amount')::numeric as amount,
  'USD' as currency,

  -- Payment method details
  JSON_BUILD_OBJECT(
    'type', 'card',
    'masked_number', '****1234',
    'provider', 'visa'
  ) as payment_method,

  'completed' as status,

  -- Transaction details
  JSON_BUILD_OBJECT(
    'authorization_code', 'AUTH_' || EXTRACT(EPOCH FROM NOW())::bigint,
    'transaction_id', 'TXN_' || SUBSTRING(MD5(RANDOM()::text), 1, 16),
    'processed_at', CURRENT_TIMESTAMP,
    'processing_fee', (o.pricing->>'final_amount')::numeric * 0.029,
    'risk_score', RANDOM() * 0.3,
    'fraud_checks', JSON_BUILD_OBJECT(
      'address_verification', 'pass',
      'cvv_verification', 'pass', 
      'velocity_check', 'pass'
    )
  ) as transaction_details,

  -- Gateway information
  JSON_BUILD_OBJECT(
    'provider', 'stripe',
    'gateway_transaction_id', 'pi_' || SUBSTRING(MD5(RANDOM()::text), 1, 24),
    'gateway_fee', (o.pricing->>'final_amount')::numeric * 0.029 + 0.30
  ) as gateway,

  CURRENT_TIMESTAMP as created_at

FROM orders o 
WHERE o.user_id = 'user_67890' 
  AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute';

-- Update user loyalty program
UPDATE loyalty_program 
SET 
  total_points = total_points + FLOOR((SELECT pricing->>'final_amount' FROM orders WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute')::numeric),
  lifetime_points = lifetime_points + FLOOR((SELECT pricing->>'final_amount' FROM orders WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute')::numeric),
  total_spend = total_spend + (SELECT pricing->>'final_amount' FROM orders WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute')::numeric,

  points_history = ARRAY_APPEND(
    points_history,
    JSON_BUILD_OBJECT(
      'type', 'earned',
      'points', FLOOR((SELECT pricing->>'final_amount' FROM orders WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute')::numeric),
      'description', 'Order purchase',
      'timestamp', CURRENT_TIMESTAMP
    )
  ),

  last_updated = CURRENT_TIMESTAMP
WHERE user_id = 'user_67890';

-- Create comprehensive audit trail
INSERT INTO audit_log (
  audit_id,
  transaction_id,
  transaction_type,
  entities_affected,
  changes_made,
  user_id,
  amount,
  compliance,
  timestamp
)
SELECT 
  gen_random_uuid() as audit_id,
  txid_current() as transaction_id,
  'order_creation' as transaction_type,

  -- Entities affected by transaction
  JSON_BUILD_OBJECT(
    'order_id', o.order_id,
    'payment_id', p.payment_id,
    'user_id', o.user_id,
    'product_ids', JSON_BUILD_ARRAY('prod_12345')
  ) as entities_affected,

  -- Detailed changes made
  JSON_BUILD_OBJECT(
    'order_created', JSON_BUILD_OBJECT(
      'order_id', o.order_id,
      'status', 'confirmed',
      'amount', (o.pricing->>'final_amount')::numeric
    ),
    'inventory_reserved', JSON_BUILD_OBJECT(
      'product_id', 'prod_12345',
      'quantity_reserved', 5
    ),
    'payment_processed', JSON_BUILD_OBJECT(
      'payment_id', p.payment_id,
      'amount', p.amount,
      'status', 'completed'
    ),
    'loyalty_updated', JSON_BUILD_OBJECT(
      'points_earned', FLOOR(p.amount),
      'total_spend_increase', p.amount
    )
  ) as changes_made,

  o.user_id,
  (o.pricing->>'final_amount')::numeric as amount,

  -- Compliance information
  JSON_BUILD_OBJECT(
    'retention_period', 2557, -- 7 years in days
    'encryption_required', true,
    'audit_level', 'full'
  ) as compliance,

  CURRENT_TIMESTAMP as timestamp

FROM orders o
JOIN payments p ON o.order_id = p.order_id
WHERE o.user_id = 'user_67890' 
  AND o.created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute';

-- Transaction validation before commit
SELECT 
  -- Verify order creation
  (SELECT COUNT(*) FROM orders WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute') as orders_created,

  -- Verify payment processing
  (SELECT COUNT(*) FROM payments WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute') as payments_processed,

  -- Verify inventory reservation
  (SELECT reserved_quantity FROM inventory WHERE product_id = 'prod_12345') as inventory_reserved,

  -- Verify loyalty update
  (SELECT total_points FROM loyalty_program WHERE user_id = 'user_67890') as loyalty_points,

  -- Overall validation
  CASE 
    WHEN (SELECT COUNT(*) FROM orders WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute') = 1
     AND (SELECT COUNT(*) FROM payments WHERE user_id = 'user_67890' AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute') = 1
     AND (SELECT reserved_quantity FROM inventory WHERE product_id = 'prod_12345') >= 5
    THEN 'TRANSACTION_VALID'
    ELSE 'TRANSACTION_INVALID'
  END as validation_result;

-- Conditional commit based on validation
COMMIT TRANSACTION
WHERE validation_result = 'TRANSACTION_VALID';

-- Automatic rollback if validation fails
-- ROLLBACK TRANSACTION IF validation_result = 'TRANSACTION_INVALID';

-- Advanced transaction patterns with QueryLeaf

-- Nested transaction with savepoints
BEGIN TRANSACTION;

  -- Create savepoint for partial rollback
  SAVEPOINT order_creation;

  -- Create initial order
  INSERT INTO orders (order_id, user_id, status, created_at)
  VALUES (gen_random_uuid(), 'user_123', 'pending', CURRENT_TIMESTAMP);

  -- Create savepoint before inventory updates
  SAVEPOINT inventory_updates;

  -- Update inventory (might fail)
  UPDATE inventory 
  SET reserved_quantity = reserved_quantity + 10
  WHERE product_id = 'prod_456' AND quantity >= reserved_quantity + 10;

  -- Check if inventory update succeeded
  SELECT 
    CASE 
      WHEN ROW_COUNT() = 0 THEN 'INSUFFICIENT_INVENTORY'
      ELSE 'INVENTORY_UPDATED'
    END as inventory_status;

  -- Conditional rollback to savepoint
  ROLLBACK TO SAVEPOINT inventory_updates 
  WHERE inventory_status = 'INSUFFICIENT_INVENTORY';

  -- Alternative inventory handling
  UPDATE orders 
  SET status = 'backordered',
      backorder_reason = 'Insufficient inventory'
  WHERE order_id IN (
    SELECT order_id FROM orders 
    WHERE user_id = 'user_123' 
      AND created_at >= CURRENT_TIMESTAMP - INTERVAL '1 minute'
  )
  AND inventory_status = 'INSUFFICIENT_INVENTORY';

COMMIT TRANSACTION;

-- Distributed transaction across collections
BEGIN DISTRIBUTED_TRANSACTION 
WITH (
  collections = ['orders', 'inventory', 'payments', 'audit_log'],
  coordinator = 'two_phase_commit',
  timeout = '60s'
);

  -- Phase 1: Prepare all operations
  PREPARE TRANSACTION 'order_tx_001' ON orders, inventory, payments, audit_log;

  -- Phase 2: Commit if all participants are ready
  COMMIT PREPARED 'order_tx_001';

-- Transaction with retry logic
BEGIN TRANSACTION 
WITH (
  retry_attempts = 3,
  retry_delay = '100ms',
  exponential_backoff = true,
  max_delay = '2s'
);

  -- Operations that might conflict with concurrent transactions
  UPDATE accounts 
  SET balance = balance - 100,
      version = version + 1,
      last_updated = CURRENT_TIMESTAMP
  WHERE account_id = 'acc_789' 
    AND balance >= 100
    AND version = (
      SELECT version FROM accounts WHERE account_id = 'acc_789'
    ); -- Optimistic locking

COMMIT TRANSACTION 
WITH (
  on_conflict = 'retry',
  conflict_resolution = 'last_writer_wins'
);

-- Real-time transaction monitoring
WITH transaction_metrics AS (
  SELECT 
    DATE_TRUNC('minute', created_at) as time_bucket,
    COUNT(*) as total_transactions,
    COUNT(*) FILTER (WHERE status = 'completed') as successful_transactions,
    COUNT(*) FILTER (WHERE status = 'failed') as failed_transactions,
    COUNT(*) FILTER (WHERE status = 'rolled_back') as rolled_back_transactions,

    -- Performance metrics
    AVG(EXTRACT(EPOCH FROM (completed_at - created_at))) as avg_duration_seconds,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY EXTRACT(EPOCH FROM (completed_at - created_at))) as p95_duration,
    MAX(EXTRACT(EPOCH FROM (completed_at - created_at))) as max_duration,

    -- Error analysis
    array_agg(DISTINCT error_code) FILTER (WHERE status = 'failed') as error_codes,
    array_agg(DISTINCT error_message) FILTER (WHERE status = 'failed') as error_messages,

    -- Lock analysis
    AVG(lock_wait_time_ms) as avg_lock_wait_time,
    COUNT(*) FILTER (WHERE lock_timeout = true) as lock_timeouts,

    -- Resource usage
    AVG(documents_read) as avg_docs_read,
    AVG(documents_written) as avg_docs_written,
    SUM(bytes_transferred) / (1024 * 1024) as total_mb_transferred

  FROM transaction_log
  WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
  GROUP BY DATE_TRUNC('minute', created_at)
),

transaction_health AS (
  SELECT 
    time_bucket,
    total_transactions,
    successful_transactions,
    failed_transactions,
    rolled_back_transactions,

    -- Success rate
    ROUND((successful_transactions::numeric / NULLIF(total_transactions, 0)) * 100, 1) as success_rate_percent,

    -- Performance assessment
    ROUND(avg_duration_seconds, 3) as avg_duration_sec,
    ROUND(p95_duration, 3) as p95_duration_sec,
    ROUND(max_duration, 3) as max_duration_sec,

    -- Performance status
    CASE 
      WHEN avg_duration_seconds > 30 THEN 'SLOW'
      WHEN avg_duration_seconds > 10 THEN 'DEGRADED'
      WHEN p95_duration > 60 THEN 'INCONSISTENT'
      ELSE 'NORMAL'
    END as performance_status,

    -- Error analysis
    CASE 
      WHEN (failed_transactions + rolled_back_transactions)::numeric / NULLIF(total_transactions, 0) > 0.1 THEN 'HIGH_ERROR_RATE'
      WHEN (failed_transactions + rolled_back_transactions)::numeric / NULLIF(total_transactions, 0) > 0.05 THEN 'ELEVATED_ERRORS'
      ELSE 'NORMAL_ERROR_RATE'
    END as error_status,

    error_codes,
    error_messages,

    -- Lock performance
    ROUND(avg_lock_wait_time, 1) as avg_lock_wait_ms,
    lock_timeouts,

    -- Resource efficiency
    ROUND(avg_docs_read, 1) as avg_docs_read,
    ROUND(avg_docs_written, 1) as avg_docs_written,
    ROUND(total_mb_transferred, 2) as mb_transferred

  FROM transaction_metrics
)

SELECT 
  time_bucket,
  total_transactions,
  success_rate_percent,
  performance_status,
  error_status,
  avg_duration_sec,
  p95_duration_sec,

  -- Alerts and recommendations
  CASE 
    WHEN performance_status = 'SLOW' THEN 'Transaction performance is degraded - investigate slow operations'
    WHEN performance_status = 'INCONSISTENT' THEN 'Inconsistent transaction performance - check for lock contention'
    WHEN error_status = 'HIGH_ERROR_RATE' THEN 'High transaction error rate - review application logic and retry mechanisms'
    WHEN lock_timeouts > total_transactions * 0.1 THEN 'Frequent lock timeouts - consider optimistic locking or shorter transactions'
    ELSE 'Transaction performance within normal parameters'
  END as recommendation,

  -- Detailed metrics for investigation
  error_codes,
  avg_lock_wait_ms,
  lock_timeouts,
  mb_transferred

FROM transaction_health
WHERE performance_status != 'NORMAL' OR error_status != 'NORMAL_ERROR_RATE'
ORDER BY time_bucket DESC;

-- Transaction isolation level testing
SELECT 
  isolation_level,
  transaction_id,
  operation_type,
  collection_name,

  -- Read phenomena detection
  CASE 
    WHEN EXISTS(
      SELECT 1 FROM transaction_operations o2 
      WHERE o2.transaction_id != t.transaction_id 
        AND o2.document_id = t.document_id
        AND o2.timestamp BETWEEN t.start_timestamp AND t.end_timestamp
        AND o2.operation_type = 'UPDATE'
    ) THEN 'DIRTY_READ_POSSIBLE'

    WHEN EXISTS(
      SELECT 1 FROM transaction_operations o2
      WHERE o2.transaction_id = t.transaction_id
        AND o2.document_id = t.document_id  
        AND o2.operation_type = 'READ'
        AND o2.timestamp < t.timestamp
        AND o2.value != t.value
    ) THEN 'NON_REPEATABLE_READ'

    ELSE 'CONSISTENT_READ'
  END as read_consistency_status,

  -- Lock analysis
  lock_type,
  lock_duration_ms,
  lock_conflicts,

  -- Performance impact
  operation_duration_ms,
  documents_affected,

  -- Concurrency metrics
  concurrent_transactions,
  wait_time_ms

FROM transaction_operations t
WHERE operation_timestamp >= CURRENT_TIMESTAMP - INTERVAL '5 minutes'
ORDER BY transaction_id, operation_timestamp;

-- QueryLeaf provides comprehensive transaction capabilities:
-- 1. SQL-familiar transaction syntax with BEGIN/COMMIT/ROLLBACK
-- 2. Advanced isolation level control and read/write concern specification
-- 3. Nested transactions with savepoint support for partial rollback
-- 4. Distributed transaction coordination across multiple collections
-- 5. Automatic retry logic with exponential backoff for conflict resolution
-- 6. Real-time transaction performance monitoring and health assessment
-- 7. Optimistic locking patterns with version-based conflict detection
-- 8. Complex multi-collection operations with full ACID guarantees
-- 9. Integration with MongoDB's native transaction optimizations
-- 10. Familiar SQL patterns for complex business logic within transactions

Best Practices for MongoDB Transaction Implementation

Transaction Design Guidelines

Essential principles for optimal MongoDB transaction design (a short sketch follows the list):

  1. Transaction Scope: Keep transactions as small and focused as possible to minimize lock contention
  2. Read/Write Patterns: Design transactions to minimize conflicts through strategic ordering of operations
  3. Retry Logic: Implement robust retry mechanisms for transient transaction failures
  4. Timeout Configuration: Set appropriate timeouts based on expected transaction duration
  5. Isolation Levels: Choose appropriate isolation levels based on consistency requirements
  6. Error Handling: Design comprehensive error handling with meaningful business-level responses
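
A minimal sketch of these guidelines in practice, assuming a connected MongoClient named client and illustrative orders and inventory collections: the transaction touches only two documents, sets explicit read/write concerns and a commit timeout, and relies on withTransaction's built-in retry of transient errors.

// Small, focused transaction following the guidelines above (illustrative names)
async function reserveAndConfirmOrder(client, orderId, productId, quantity) {
  const db = client.db('shop'); // assumed database name
  const session = client.startSession();

  try {
    // withTransaction retries the callback on TransientTransactionError and
    // retries the commit on UnknownTransactionCommitResult
    return await session.withTransaction(async () => {

      // Keep the write set small: one inventory update, one order update
      const reserved = await db.collection('inventory').updateOne(
        { _id: productId, available: { $gte: quantity } },
        { $inc: { available: -quantity, reserved: quantity } },
        { session }
      );

      if (reserved.modifiedCount === 0) {
        // Fail fast with a business-level error instead of holding locks
        throw new Error('Insufficient inventory');
      }

      await db.collection('orders').updateOne(
        { _id: orderId },
        { $set: { status: 'confirmed', confirmedAt: new Date() } },
        { session }
      );

      return { confirmed: true };
    }, {
      readConcern: { level: 'majority' },
      writeConcern: { w: 'majority' },
      maxCommitTimeMS: 10000 // bound commit time for this short transaction
    });
  } finally {
    await session.endSession();
  }
}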

Performance and Scalability

Optimize MongoDB transactions for production workloads (see the sketch after this list):

  1. Lock Minimization: Structure operations to minimize lock duration and scope
  2. Index Strategy: Ensure proper indexing to support transaction query patterns
  3. Connection Management: Use appropriate connection pooling for transaction workloads
  4. Monitoring Setup: Implement comprehensive transaction performance monitoring
  5. Resource Planning: Plan memory and CPU resources for transaction processing overhead
  6. Testing Strategy: Implement thorough testing for concurrent transaction scenarios
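
As a sketch of points 1-3 under stated assumptions (collection names, field names, and pool sizes are illustrative), the snippet below creates indexes that match the transaction query patterns and sizes the connection pool for concurrent transaction sessions.

// Supporting indexes and connection pool sizing for transaction workloads
const { MongoClient } = require('mongodb');

async function prepareForTransactionWorkload(uri) {
  // Each in-flight transaction holds a session on a pooled connection, so size
  // the pool for expected transaction concurrency plus regular query traffic
  const client = new MongoClient(uri, {
    maxPoolSize: 100,
    minPoolSize: 10,
    maxIdleTimeMS: 60000
  });
  await client.connect();

  const db = client.db('shop'); // assumed database name

  // Index the fields transactions filter on so documents are located by index
  // lookup instead of collection scan, keeping lock durations short
  await db.collection('orders').createIndex({ userId: 1, status: 1, createdAt: -1 });
  await db.collection('inventory').createIndex({ productId: 1 });

  return client;
}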

Conclusion

MongoDB Multi-Document Transactions provide comprehensive ACID guarantees that eliminate much of the complexity of traditional distributed consistency approaches while maintaining the flexibility and scalability of MongoDB's document model. The ability to perform complex multi-collection operations with guaranteed consistency makes it far simpler to build reliable distributed systems.

Key MongoDB Transaction benefits include:

  • Full ACID Compliance: Complete atomicity, consistency, isolation, and durability across multiple documents
  • Flexible Document Operations: Support for complex document structures and relationships within transactions
  • Distributed Consistency: Seamless operation across replica sets and sharded clusters
  • Automatic Rollback: Comprehensive rollback capabilities on failure with consistent state restoration
  • Performance Optimization: Intelligent locking and concurrency control for optimal throughput
  • Familiar Patterns: SQL-style transaction semantics with commit/rollback operations

Whether you're building e-commerce platforms, financial systems, inventory management applications, or any system requiring strong consistency guarantees, MongoDB Transactions with QueryLeaf's familiar SQL interface provide the foundation for reliable distributed applications. This combination enables sophisticated transaction processing while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB transaction operations while providing SQL-familiar transaction control, isolation level management, and consistency guarantees. Advanced transaction patterns, retry logic, and performance monitoring are seamlessly handled through familiar SQL syntax, making robust distributed systems both powerful and accessible to SQL-oriented development teams.

The integration of native ACID transaction capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both strong consistency and familiar database interaction patterns, ensuring your distributed systems remain both reliable and maintainable as they scale and evolve.

MongoDB Compound Indexes and Multi-Field Query Optimization: Advanced Indexing Strategies with SQL-Style Query Performance

Modern applications require query patterns that filter, sort, and aggregate data across multiple fields simultaneously, demanding carefully designed indexing strategies. Traditional database approaches often struggle to support multi-field queries efficiently, requiring complex index planning, manual query optimization, and extensive performance tuning to achieve acceptable response times.

MongoDB Compound Indexes provide advanced multi-field indexing capabilities that enable efficient querying across multiple dimensions with automatic query optimization, intelligent index selection, and sophisticated query planning. Unlike simple single-field indexes, compound indexes support complex query patterns including range queries, equality matches, and sorting operations across multiple fields with optimal performance characteristics.
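
As a brief illustration (collection and field names here are assumptions, not taken from the examples below), a compound index whose fields follow the equality, sort, range ordering can typically satisfy the filter, the sort, and the range predicate of a query from a single index scan:

// Compound index ordered as equality fields, then sort field, then range field
const { MongoClient } = require('mongodb');

async function findRecentSlowEvents(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  const events = client.db('app').collection('events'); // illustrative names

  // Equality fields first (status, userId), then the sort key (createdAt),
  // then the range field (durationMs)
  await events.createIndex(
    { status: 1, userId: 1, createdAt: -1, durationMs: 1 },
    { name: 'idx_status_user_created_duration' }
  );

  // Served by the index above: equality on status/userId, sort on createdAt,
  // range filter on durationMs
  const results = await events
    .find({ status: 'completed', userId: 12345, durationMs: { $gte: 1000 } })
    .sort({ createdAt: -1 })
    .limit(20)
    .toArray();

  await client.close();
  return results;
}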

The Traditional Multi-Field Query Challenge

Conventional approaches to multi-field indexing and query optimization have significant limitations for modern applications:

-- Traditional relational multi-field indexing - limited and complex

-- PostgreSQL approach with multiple single indexes
CREATE TABLE user_activities (
    activity_id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL,
    application_id VARCHAR(100) NOT NULL,
    activity_type VARCHAR(50) NOT NULL,
    status VARCHAR(20) NOT NULL,
    priority INTEGER DEFAULT 5,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    completed_at TIMESTAMP,

    -- User context
    session_id VARCHAR(100),
    ip_address INET,
    user_agent TEXT,

    -- Activity data
    activity_data JSONB,
    metadata JSONB,

    -- Performance tracking
    execution_time_ms INTEGER,
    error_count INTEGER DEFAULT 0,
    retry_count INTEGER DEFAULT 0,

    -- Categorization
    category VARCHAR(100),
    subcategory VARCHAR(100),
    tags TEXT[],

    -- Geographic data
    country_code CHAR(2),
    region VARCHAR(100),
    city VARCHAR(100)
);

-- Multiple single-field indexes (inefficient for compound queries)
CREATE INDEX idx_user_activities_user_id ON user_activities (user_id);
CREATE INDEX idx_user_activities_app_id ON user_activities (application_id);
CREATE INDEX idx_user_activities_type ON user_activities (activity_type);
CREATE INDEX idx_user_activities_status ON user_activities (status);
CREATE INDEX idx_user_activities_created ON user_activities (created_at);
CREATE INDEX idx_user_activities_priority ON user_activities (priority);

-- Attempt at compound indexes (order matters significantly)
CREATE INDEX idx_user_app_status ON user_activities (user_id, application_id, status);
CREATE INDEX idx_app_type_created ON user_activities (application_id, activity_type, created_at);
CREATE INDEX idx_status_priority_created ON user_activities (status, priority, created_at);

-- Complex multi-field query with suboptimal performance
EXPLAIN (ANALYZE, BUFFERS) 
SELECT 
    ua.activity_id,
    ua.user_id,
    ua.application_id,
    ua.activity_type,
    ua.status,
    ua.priority,
    ua.created_at,
    ua.execution_time_ms,
    ua.activity_data,

    -- Derived metrics
    CASE 
        WHEN ua.completed_at IS NOT NULL THEN 
            EXTRACT(EPOCH FROM (ua.completed_at - ua.created_at)) * 1000
        ELSE NULL 
    END as total_duration_ms,

    -- Window functions for ranking
    ROW_NUMBER() OVER (
        PARTITION BY ua.user_id, ua.application_id 
        ORDER BY ua.priority DESC, ua.created_at DESC
    ) as user_app_rank,

    -- Activity scoring
    CASE
        WHEN ua.error_count = 0 AND ua.status = 'completed' THEN 100
        WHEN ua.error_count = 0 AND ua.status = 'in_progress' THEN 75
        WHEN ua.error_count > 0 AND ua.retry_count <= 3 THEN 50
        ELSE 25
    END as activity_score

FROM user_activities ua
WHERE 
    -- Multi-field filtering (challenging for optimizer)
    ua.user_id IN (12345, 23456, 34567, 45678)
    AND ua.application_id IN ('web_app', 'mobile_app', 'api_service')
    AND ua.activity_type IN ('login', 'purchase', 'api_call', 'data_export')
    AND ua.status IN ('completed', 'in_progress', 'failed')
    AND ua.priority >= 3
    AND ua.created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days'
    AND ua.created_at <= CURRENT_TIMESTAMP - INTERVAL '1 hour'

    -- Geographic filtering
    AND ua.country_code IN ('US', 'CA', 'GB', 'DE')
    AND ua.region IS NOT NULL

    -- Performance filtering
    AND (ua.execution_time_ms IS NULL OR ua.execution_time_ms < 10000)
    AND ua.error_count <= 5

    -- Category filtering
    AND ua.category IN ('user_interaction', 'system_process', 'data_operation')

    -- JSON data filtering (expensive)
    AND ua.activity_data->>'source' IN ('web', 'mobile', 'api')
    AND COALESCE((ua.activity_data->>'amount')::numeric, 0) > 10

ORDER BY 
    ua.priority DESC,
    ua.created_at DESC,
    ua.user_id ASC
LIMIT 50;

-- Problems with traditional compound indexing:
-- 1. Index order critically affects query performance
-- 2. Limited flexibility for varying query patterns
-- 3. Index intersection overhead for multiple conditions
-- 4. Complex query planning with unpredictable performance
-- 5. Maintenance overhead with multiple specialized indexes
-- 6. Poor support for mixed equality and range conditions
-- 7. Difficulty optimizing for sorting requirements
-- 8. Limited support for JSON/document field indexing

-- Query performance analysis
WITH index_usage AS (
    SELECT 
        schemaname,
        tablename,
        indexname,
        idx_scan,
        idx_tup_read,
        idx_tup_fetch,

        -- Index effectiveness metrics
        CASE 
            WHEN idx_scan > 0 THEN idx_tup_read::numeric / idx_scan 
            ELSE 0 
        END as avg_tuples_per_scan,

        CASE 
            WHEN idx_tup_read > 0 THEN idx_tup_fetch::numeric / idx_tup_read * 100
            ELSE 0 
        END as fetch_ratio_percent

    FROM pg_stat_user_indexes
    WHERE tablename = 'user_activities'
),
table_performance AS (
    SELECT 
        schemaname,
        tablename,
        seq_scan,
        seq_tup_read,
        idx_scan,
        idx_tup_fetch,
        n_tup_ins,
        n_tup_upd,
        n_tup_del,

        -- Table scan ratios
        CASE 
            WHEN (seq_scan + idx_scan) > 0 
            THEN seq_scan::numeric / (seq_scan + idx_scan) * 100
            ELSE 0 
        END as seq_scan_ratio_percent

    FROM pg_stat_user_tables
    WHERE tablename = 'user_activities'
)
SELECT 
    -- Index usage analysis
    iu.indexname,
    iu.idx_scan as index_scans,
    ROUND(iu.avg_tuples_per_scan, 2) as avg_tuples_per_scan,
    ROUND(iu.fetch_ratio_percent, 1) as fetch_efficiency_pct,

    -- Index effectiveness assessment
    CASE
        WHEN iu.idx_scan = 0 THEN 'unused'
        WHEN iu.avg_tuples_per_scan > 100 THEN 'inefficient'
        WHEN iu.fetch_ratio_percent < 50 THEN 'poor_selectivity'
        ELSE 'effective'
    END as index_status,

    -- Table-level performance
    tp.seq_scan as table_scans,
    ROUND(tp.seq_scan_ratio_percent, 1) as seq_scan_pct,

    -- Recommendations
    CASE 
        WHEN iu.idx_scan = 0 THEN 'Consider dropping unused index'
        WHEN iu.avg_tuples_per_scan > 100 THEN 'Improve index selectivity or reorder fields'
        WHEN tp.seq_scan_ratio_percent > 20 THEN 'Add missing indexes for common queries'
        ELSE 'Index performing within acceptable parameters'
    END as recommendation

FROM index_usage iu
CROSS JOIN table_performance tp
ORDER BY iu.idx_scan DESC, iu.avg_tuples_per_scan DESC;

-- MySQL compound indexing (more limited capabilities)
CREATE TABLE mysql_activities (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id BIGINT NOT NULL,
    app_id VARCHAR(100) NOT NULL,
    activity_type VARCHAR(50) NOT NULL,
    status VARCHAR(20) NOT NULL,
    priority INT DEFAULT 5,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    activity_data JSON,

    -- Compound indexes (limited optimization capabilities)
    INDEX idx_user_app_status (user_id, app_id, status),
    INDEX idx_app_type_created (app_id, activity_type, created_at),
    INDEX idx_status_priority (status, priority)
);

-- Basic multi-field query in MySQL
SELECT 
    user_id,
    app_id,
    activity_type,
    status,
    priority,
    created_at,
    JSON_EXTRACT(activity_data, '$.source') as source
FROM mysql_activities
WHERE user_id IN (12345, 23456)
  AND app_id = 'web_app'
  AND status = 'completed'
  AND priority >= 3
  AND created_at >= DATE_SUB(NOW(), INTERVAL 7 DAY)
ORDER BY priority DESC, created_at DESC
LIMIT 50;

-- MySQL limitations for compound indexing:
-- - Limited query optimization capabilities
-- - Poor JSON field indexing support
-- - Restrictive index intersection algorithms
-- - Basic query planning with limited statistics
-- - Limited support for complex sorting requirements
-- - Poor performance with large result sets
-- - Minimal support for index-only scans

MongoDB Compound Indexes provide comprehensive multi-field optimization:

// MongoDB Compound Indexes - advanced multi-field query optimization
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('optimization_platform');

// Create collection with comprehensive compound index strategy
const setupAdvancedIndexing = async () => {
  const userActivities = db.collection('user_activities');

  // 1. Primary compound index for user-centric queries
  await userActivities.createIndex(
    {
      userId: 1,
      applicationId: 1,
      status: 1,
      createdAt: -1
    },
    {
      name: 'idx_user_app_status_time',
      background: true
    }
  );

  // 2. Application-centric compound index
  await userActivities.createIndex(
    {
      applicationId: 1,
      activityType: 1,
      priority: -1,
      createdAt: -1
    },
    {
      name: 'idx_app_type_priority_time',
      background: true
    }
  );

  // 3. Status and performance monitoring index
  await userActivities.createIndex(
    {
      status: 1,
      priority: -1,
      executionTimeMs: 1,
      createdAt: -1
    },
    {
      name: 'idx_status_priority_performance',
      background: true
    }
  );

  // 4. Geographic and categorization index
  await userActivities.createIndex(
    {
      countryCode: 1,
      region: 1,
      category: 1,
      subcategory: 1,
      createdAt: -1
    },
    {
      name: 'idx_geo_category_time',
      background: true
    }
  );

  // 5. Advanced compound index with embedded document fields
  await userActivities.createIndex(
    {
      'metadata.source': 1,
      activityType: 1,
      'activityData.amount': -1,
      createdAt: -1
    },
    {
      name: 'idx_source_type_amount_time',
      background: true,
      partialFilterExpression: {
        'metadata.source': { $exists: true },
        'activityData.amount': { $exists: true, $gt: 0 }
      }
    }
  );

  // 6. Text search compound index
  await userActivities.createIndex(
    {
      userId: 1,
      applicationId: 1,
      activityType: 1,
      title: 'text',
      description: 'text',
      'metadata.keywords': 'text'
    },
    {
      name: 'idx_user_app_type_text',
      background: true,
      weights: {
        title: 10,
        description: 5,
        'metadata.keywords': 3
      }
    }
  );

  // 7. Sparse index for optional fields
  await userActivities.createIndex(
    {
      completedAt: -1,
      userId: 1,
      'performance.totalDuration': -1
    },
    {
      name: 'idx_completed_user_duration',
      sparse: true,
      background: true
    }
  );

  // 8. TTL index for automatic data cleanup
  await userActivities.createIndex(
    {
      createdAt: 1
    },
    {
      name: 'idx_ttl_cleanup',
      expireAfterSeconds: 60 * 60 * 24 * 90, // 90 days
      background: true
    }
  );

  console.log('Advanced compound indexes created successfully');
};

// High-performance multi-field query examples
const performAdvancedQueries = async () => {
  const userActivities = db.collection('user_activities');

  // Query 1: User activity dashboard with compound index optimization
  const userDashboard = await userActivities.aggregate([
    // Stage 1: Efficient filtering using compound index
    {
      $match: {
        userId: { $in: [12345, 23456, 34567, 45678] },
        applicationId: { $in: ['web_app', 'mobile_app', 'api_service'] },
        status: { $in: ['completed', 'in_progress', 'failed'] },
        createdAt: {
          $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000),
          $lte: new Date(Date.now() - 60 * 60 * 1000)
        }
      }
    },

    // Stage 2: Additional filtering leveraging partial indexes
    {
      $match: {
        priority: { $gte: 3 },
        countryCode: { $in: ['US', 'CA', 'GB', 'DE'] },
        region: { $exists: true },
        $or: [
          { executionTimeMs: null },
          { executionTimeMs: { $lt: 10000 } }
        ],
        errorCount: { $lte: 5 },
        category: { $in: ['user_interaction', 'system_process', 'data_operation'] },
        'metadata.source': { $in: ['web', 'mobile', 'api'] },
        'activityData.amount': { $gt: 10 }
      }
    },

    // Stage 3: Add computed fields
    {
      $addFields: {
        totalDurationMs: {
          $cond: {
            if: { $ne: ['$completedAt', null] },
            then: { $subtract: ['$completedAt', '$createdAt'] },
            else: null
          }
        },

        activityScore: {
          $switch: {
            branches: [
              {
                case: { 
                  $and: [
                    { $eq: ['$errorCount', 0] },
                    { $eq: ['$status', 'completed'] }
                  ]
                },
                then: 100
              },
              {
                case: { 
                  $and: [
                    { $eq: ['$errorCount', 0] },
                    { $eq: ['$status', 'in_progress'] }
                  ]
                },
                then: 75
              },
              {
                case: { 
                  $and: [
                    { $gt: ['$errorCount', 0] },
                    { $lte: ['$retryCount', 3] }
                  ]
                },
                then: 50
              }
            ],
            default: 25
          }
        }
      }
    },

    // Stage 4: Window functions for ranking
    {
      $setWindowFields: {
        partitionBy: { userId: '$userId', applicationId: '$applicationId' },
        sortBy: { priority: -1, createdAt: -1 },
        output: {
          userAppRank: {
            $denseRank: {}
          },

          // Rolling statistics
          rollingAvgDuration: {
            $avg: '$executionTimeMs',
            window: {
              documents: [-4, 0] // Last 5 activities
            }
          }
        }
      }
    },

    // Stage 5: Final sorting leveraging compound indexes
    {
      $sort: {
        priority: -1,
        createdAt: -1,
        userId: 1
      }
    },

    // Stage 6: Limit results
    {
      $limit: 50
    },

    // Stage 7: Project final structure
    {
      $project: {
        activityId: '$_id',
        userId: 1,
        applicationId: 1,
        activityType: 1,
        status: 1,
        priority: 1,
        createdAt: 1,
        executionTimeMs: 1,
        activityData: 1,
        totalDurationMs: 1,
        userAppRank: 1,
        activityScore: 1,
        rollingAvgDuration: { $round: ['$rollingAvgDuration', 2] },

        // Performance indicators
        isHighPriority: { $gte: ['$priority', 8] },
        isRecentActivity: { 
          $gte: ['$createdAt', new Date(Date.now() - 24 * 60 * 60 * 1000)]
        },
        hasPerformanceIssue: { $gt: ['$executionTimeMs', 5000] }
      }
    }
  ]).toArray();

  console.log('User dashboard query completed:', userDashboard.length, 'results');

  // Query 2: Application performance analysis with optimized grouping
  const appPerformanceAnalysis = await userActivities.aggregate([
    {
      $match: {
        createdAt: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) },
        executionTimeMs: { $exists: true }
      }
    },

    // Group by application and activity type
    {
      $group: {
        _id: {
          applicationId: '$applicationId',
          activityType: '$activityType',
          status: '$status'
        },

        // Volume metrics
        totalActivities: { $sum: 1 },
        uniqueUsers: { $addToSet: '$userId' },

        // Performance metrics
        avgExecutionTime: { $avg: '$executionTimeMs' },
        minExecutionTime: { $min: '$executionTimeMs' },
        maxExecutionTime: { $max: '$executionTimeMs' },
        p95ExecutionTime: { 
          $percentile: { 
            input: '$executionTimeMs', 
            p: [0.95], 
            method: 'approximate' 
          } 
        },

        // Error metrics
        errorCount: { $sum: '$errorCount' },
        retryCount: { $sum: '$retryCount' },

        // Success metrics
        successCount: {
          $sum: { $cond: [{ $eq: ['$status', 'completed'] }, 1, 0] }
        },

        // Time distribution
        activitiesByHour: {
          $push: { $hour: '$createdAt' }
        },

        // Priority distribution
        avgPriority: { $avg: '$priority' },
        maxPriority: { $max: '$priority' }
      }
    },

    // Calculate derived metrics
    {
      $addFields: {
        uniqueUserCount: { $size: '$uniqueUsers' },
        successRate: {
          $multiply: [
            { $divide: ['$successCount', '$totalActivities'] },
            100
          ]
        },
        errorRate: {
          $multiply: [
            { $divide: ['$errorCount', '$totalActivities'] },
            100
          ]
        },

        // Performance classification
        performanceCategory: {
          $switch: {
            branches: [
              {
                case: { $lt: ['$avgExecutionTime', 1000] },
                then: 'fast'
              },
              {
                case: { $lt: ['$avgExecutionTime', 5000] },
                then: 'moderate'
              },
              {
                case: { $lt: ['$avgExecutionTime', 10000] },
                then: 'slow'
              }
            ],
            default: 'critical'
          }
        }
      }
    },

    // Sort worst performers first (numeric fields only; a descending sort on the
    // performanceCategory string would be lexicographic, so 'critical' would not
    // actually sort first)
    {
      $sort: {
        errorRate: -1,
        avgExecutionTime: -1
      }
    }
  ]).toArray();

  console.log('Application performance analysis completed:', appPerformanceAnalysis.length, 'results');

  // Query 3: Advanced text search with compound index
  const textSearchResults = await userActivities.aggregate([
    {
      $match: {
        userId: { $in: [12345, 23456, 34567] },
        applicationId: 'web_app',
        activityType: 'search_query',
        $text: {
          $search: 'performance optimization mongodb',
          $caseSensitive: false,
          $diacriticSensitive: false
        }
      }
    },

    {
      $addFields: {
        textScore: { $meta: 'textScore' },
        relevanceScore: {
          $multiply: [
            { $meta: 'textScore' },
            {
              $switch: {
                branches: [
                  { case: { $eq: ['$priority', 10] }, then: 1.5 },
                  { case: { $gte: ['$priority', 8] }, then: 1.2 },
                  { case: { $gte: ['$priority', 5] }, then: 1.0 }
                ],
                default: 0.8
              }
            }
          ]
        }
      }
    },

    {
      $sort: {
        relevanceScore: -1,
        createdAt: -1
      }
    },

    {
      $limit: 20
    }
  ]).toArray();

  console.log('Text search results:', textSearchResults.length, 'matches');

  return {
    userDashboard,
    appPerformanceAnalysis,
    textSearchResults
  };
};

// Index performance analysis and optimization
const analyzeIndexPerformance = async () => {
  const userActivities = db.collection('user_activities');

  // Get index statistics
  const indexStats = await userActivities.aggregate([
    { $indexStats: {} }
  ]).toArray();

  // Analyze query execution plans
  const explainPlan = await userActivities.find({
    userId: { $in: [12345, 23456] },
    applicationId: 'web_app',
    status: 'completed',
    createdAt: { $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) }
  }).explain('executionStats');

  // Index usage recommendations
  const indexRecommendations = indexStats.map(index => {
    const usage = index.accesses;

    // accesses.since marks when usage tracking began; derive an ops-per-millisecond
    // rate from the elapsed time rather than dividing by the raw epoch timestamp
    const elapsedMs = Math.max(Date.now() - usage.since.getTime(), 1);
    const effectiveness = usage.ops / elapsedMs;

    return {
      indexName: index.name,
      keyPattern: index.key,
      usage: usage,
      effectiveness: effectiveness,
      recommendation: effectiveness < 0.001 ? 'Consider dropping - low usage' :
                     effectiveness < 0.01 ? 'Monitor usage patterns' :
                     effectiveness < 0.1 ? 'Optimize query patterns' :
                     'Performing well',

      // Size and memory impact
      estimatedSize: index.spec?.storageSize || 'N/A',

      // Usage patterns
      opsPerDay: Math.round(usage.ops / Math.max(elapsedMs / 86400000, 1))
    };
  });

  console.log('Index Performance Analysis:');
  console.log(JSON.stringify(indexRecommendations, null, 2));

  return {
    indexStats,
    explainPlan,
    indexRecommendations
  };
};

// Advanced compound index patterns for specific use cases
const setupSpecializedIndexes = async () => {
  const userActivities = db.collection('user_activities');

  // 1. Multikey index for array fields
  await userActivities.createIndex(
    {
      tags: 1,
      category: 1,
      createdAt: -1
    },
    {
      name: 'idx_tags_category_time',
      background: true
    }
  );

  // 2. Compound index with hashed sharding key
  await userActivities.createIndex(
    {
      userId: 'hashed',
      createdAt: -1,
      applicationId: 1
    },
    {
      name: 'idx_user_hash_time_app',
      background: true
    }
  );

  // 3. Compound wildcard index for dynamic schemas
  // Note: wildcardProjection is only valid on an all-fields '$**' key, so the
  // wildcard is scoped to the metadata subtree through the key path instead
  await userActivities.createIndex(
    {
      'metadata.$**': 1,
      activityType: 1
    },
    {
      name: 'idx_metadata_wildcard_type',
      background: true
    }
  );

  // 4. Compound 2dsphere index for geospatial queries
  await userActivities.createIndex(
    {
      'location.coordinates': '2dsphere',
      activityType: 1,
      createdAt: -1
    },
    {
      name: 'idx_geo_type_time',
      background: true
    }
  );

  // 5. Compound partial index for conditional optimization
  await userActivities.createIndex(
    {
      status: 1,
      'performance.executionTimeMs': -1,
      userId: 1
    },
    {
      name: 'idx_status_performance_user_partial',
      background: true,
      partialFilterExpression: {
        status: { $in: ['failed', 'timeout'] },
        'performance.executionTimeMs': { $gt: 5000 }
      }
    }
  );

  console.log('Specialized compound indexes created');
};

// Benefits of MongoDB Compound Indexes:
// - Efficient multi-field query optimization with automatic index selection
// - Support for complex query patterns including range and equality conditions
// - Intelligent query planning with cost-based optimization
// - Index intersection capabilities for optimal query performance
// - Support for sorting and filtering in a single index scan
// - Flexible index ordering to match query patterns
// - Integration with aggregation pipeline optimization
// - Advanced index types including text, geospatial, and wildcard
// - Partial and sparse indexing for memory efficiency
// - Background index building for zero-downtime optimization

module.exports = {
  setupAdvancedIndexing,
  performAdvancedQueries,
  analyzeIndexPerformance,
  setupSpecializedIndexes
};

Understanding MongoDB Compound Index Architecture

Advanced Compound Index Design Patterns

Implement sophisticated compound indexing strategies for different query scenarios:

// Advanced compound indexing design patterns
class CompoundIndexOptimizer {
  constructor(db) {
    this.db = db;
    this.indexAnalytics = new Map();
    this.queryPatterns = new Map();
  }

  async analyzeQueryPatterns(collection, sampleSize = 10000) {
    console.log(`Analyzing query patterns for ${collection.collectionName}...`);

    // Capture query patterns from operations
    const operations = await this.db.admin().command({
      currentOp: 1,
      $all: true,
      ns: { $regex: collection.collectionName }
    });

    // Analyze existing queries from profiler data
    const profilerData = await this.db.collection('system.profile')
      .find({
        ns: `${this.db.databaseName}.${collection.collectionName}`,
        op: { $in: ['query', 'find', 'aggregate'] }
      })
      .sort({ ts: -1 })
      .limit(sampleSize)
      .toArray();

    // Extract query patterns
    const queryPatterns = this.extractQueryPatterns(profilerData);

    console.log(`Found ${queryPatterns.length} unique query patterns`);
    return queryPatterns;
  }

  extractQueryPatterns(profilerData) {
    const patterns = new Map();

    profilerData.forEach(op => {
      if (op.command && op.command.filter) {
        const filterFields = Object.keys(op.command.filter);
        const sortFields = op.command.sort ? Object.keys(op.command.sort) : [];

        const patternKey = JSON.stringify({
          filter: filterFields.sort(),
          sort: sortFields
        });

        if (!patterns.has(patternKey)) {
          patterns.set(patternKey, {
            filterFields,
            sortFields,
            frequency: 0,
            avgExecutionTime: 0,
            totalExecutionTime: 0
          });
        }

        const pattern = patterns.get(patternKey);
        pattern.frequency++;
        pattern.totalExecutionTime += op.millis || 0;
        pattern.avgExecutionTime = pattern.totalExecutionTime / pattern.frequency;
      }
    });

    return Array.from(patterns.values());
  }

  async generateOptimalIndexes(collection, queryPatterns) {
    console.log('Generating optimal compound indexes...');

    const indexRecommendations = [];

    // Sort patterns by frequency and performance impact
    const sortedPatterns = queryPatterns.sort((a, b) => 
      (b.frequency * b.avgExecutionTime) - (a.frequency * a.avgExecutionTime)
    );

    for (const pattern of sortedPatterns.slice(0, 10)) { // Top 10 patterns
      const indexSpec = this.designCompoundIndex(pattern);

      if (indexSpec && indexSpec.fields.length > 0) {
        indexRecommendations.push({
          pattern: pattern,
          indexSpec: indexSpec,
          estimatedBenefit: pattern.frequency * pattern.avgExecutionTime,
          priority: this.calculateIndexPriority(pattern)
        });
      }
    }

    return indexRecommendations;
  }

  designCompoundIndex(queryPattern) {
    const { filterFields, sortFields } = queryPattern;

    // ESR rule: Equality, Sort, Range
    const equalityFields = [];
    const rangeFields = [];

    // Analyze field types (would need actual query analysis)
    filterFields.forEach(field => {
      // This is simplified - in practice, analyze actual query operators
      if (this.isEqualityField(field)) {
        equalityFields.push(field);
      } else {
        rangeFields.push(field);
      }
    });

    // Construct compound index following ESR rule
    const indexFields = [
      ...equalityFields,
      ...sortFields.filter(field => !equalityFields.includes(field)),
      ...rangeFields.filter(field => 
        !equalityFields.includes(field) && !sortFields.includes(field)
      )
    ];

    return {
      fields: indexFields,
      spec: this.buildIndexSpec(indexFields, sortFields),
      rule: 'ESR (Equality, Sort, Range)',
      rationale: this.explainIndexDesign(equalityFields, sortFields, rangeFields)
    };
  }

  buildIndexSpec(indexFields, sortFields) {
    const spec = {};

    indexFields.forEach(field => {
      // Determine sort order based on usage pattern
      if (sortFields.includes(field)) {
        // Use descending for time-based fields, ascending for others
        spec[field] = field.includes('time') || field.includes('date') || 
                     field.includes('created') || field.includes('updated') ? -1 : 1;
      } else {
        spec[field] = 1; // Default ascending for filtering
      }
    });

    return spec;
  }

  isEqualityField(field) {
    // Heuristic to determine if field is typically used for equality
    const equalityHints = ['id', 'status', 'type', 'category', 'code'];
    return equalityHints.some(hint => field.toLowerCase().includes(hint));
  }

  explainIndexDesign(equalityFields, sortFields, rangeFields) {
    return {
      equalityFields: equalityFields,
      sortFields: sortFields,
      rangeFields: rangeFields,
      reasoning: [
        'Equality fields placed first for maximum selectivity',
        'Sort fields positioned to enable index-based sorting',
        'Range fields placed last to minimize index scan overhead'
      ]
    };
  }

  calculateIndexPriority(pattern) {
    const frequencyWeight = 0.4;
    const performanceWeight = 0.6;

    const normalizedFrequency = Math.min(pattern.frequency / 100, 1);
    const normalizedPerformance = Math.min(pattern.avgExecutionTime / 1000, 1);

    return (normalizedFrequency * frequencyWeight) + 
           (normalizedPerformance * performanceWeight);
  }

  async implementIndexRecommendations(collection, recommendations) {
    console.log(`Implementing ${recommendations.length} index recommendations...`);

    const results = [];

    for (const rec of recommendations) {
      try {
        const indexName = `idx_optimized_${rec.pattern.filterFields.join('_')}`;

        await collection.createIndex(rec.indexSpec.spec, {
          name: indexName,
          background: true
        });

        results.push({
          indexName: indexName,
          spec: rec.indexSpec.spec,
          status: 'created',
          estimatedBenefit: rec.estimatedBenefit,
          priority: rec.priority
        });

        console.log(`Created index: ${indexName}`);

      } catch (error) {
        results.push({
          indexName: `idx_failed_${rec.pattern.filterFields.join('_')}`,
          spec: rec.indexSpec.spec,
          status: 'failed',
          error: error.message
        });

        console.error(`Failed to create index:`, error.message);
      }
    }

    return results;
  }

  async monitorIndexEffectiveness(collection, duration = 24 * 60 * 60 * 1000) {
    console.log('Starting index effectiveness monitoring...');

    const initialStats = await collection.aggregate([{ $indexStats: {} }]).toArray();

    // Wait for monitoring period
    await new Promise(resolve => setTimeout(resolve, duration));

    const finalStats = await collection.aggregate([{ $indexStats: {} }]).toArray();

    // Compare statistics
    const effectiveness = this.compareIndexStats(initialStats, finalStats, duration);

    return effectiveness;
  }

  compareIndexStats(initialStats, finalStats, durationMs) {
    const effectiveness = [];

    finalStats.forEach(finalStat => {
      const initialStat = initialStats.find(stat => stat.name === finalStat.name);

      if (initialStat) {
        const opsChange = finalStat.accesses.ops - initialStat.accesses.ops;
        // accesses.since does not advance between samples, so derive the
        // hourly rate from the monitoring window duration instead
        const opsPerHour = durationMs > 0 ? (opsChange / durationMs) * 3600000 : 0;

        effectiveness.push({
          indexName: finalStat.name,
          keyPattern: finalStat.key,
          operationsChange: opsChange,
          operationsPerHour: Math.round(opsPerHour),
          effectiveness: this.assessEffectiveness(opsPerHour),
          recommendation: this.getEffectivenessRecommendation(opsPerHour)
        });
      }
    });

    return effectiveness;
  }

  assessEffectiveness(opsPerHour) {
    if (opsPerHour < 0.1) return 'unused';
    if (opsPerHour < 1) return 'low';
    if (opsPerHour < 10) return 'moderate';
    if (opsPerHour < 100) return 'high';
    return 'critical';
  }

  getEffectivenessRecommendation(opsPerHour) {
    if (opsPerHour < 0.1) return 'Consider dropping this index';
    if (opsPerHour < 1) return 'Monitor usage patterns';
    if (opsPerHour < 10) return 'Index is providing moderate benefit';
    return 'Index is highly effective';
  }

  async performCompoundIndexBenchmark(collection, testQueries) {
    console.log('Running compound index benchmark...');

    const benchmarkResults = [];

    for (const query of testQueries) {
      console.log(`Testing query: ${JSON.stringify(query.filter)}`);

      // Benchmark without hint (let MongoDB choose)
      const autoResult = await this.benchmarkQuery(collection, query, null);

      // Benchmark with different index hints
      const hintResults = [];
      const indexes = await collection.indexes();

      for (const index of indexes) {
        if (Object.keys(index.key).length > 1) { // Compound indexes only
          const hintResult = await this.benchmarkQuery(collection, query, index.key);
          hintResults.push({
            indexHint: index.key,
            indexName: index.name,
            ...hintResult
          });
        }
      }

      benchmarkResults.push({
        query: query,
        automatic: autoResult,
        withHints: hintResults.sort((a, b) => a.executionTime - b.executionTime)
      });
    }

    return benchmarkResults;
  }

  async benchmarkQuery(collection, query, indexHint, iterations = 5) {
    const times = [];

    for (let i = 0; i < iterations; i++) {
      const startTime = Date.now();

      let cursor = collection.find(query.filter);

      if (indexHint) {
        cursor = cursor.hint(indexHint);
      }

      if (query.sort) {
        cursor = cursor.sort(query.sort);
      }

      if (query.limit) {
        cursor = cursor.limit(query.limit);
      }

      const results = await cursor.toArray();
      const endTime = Date.now();

      times.push({
        executionTime: endTime - startTime,
        resultCount: results.length
      });
    }

    const avgTime = times.reduce((sum, t) => sum + t.executionTime, 0) / times.length;
    const minTime = Math.min(...times.map(t => t.executionTime));
    const maxTime = Math.max(...times.map(t => t.executionTime));

    return {
      averageExecutionTime: Math.round(avgTime),
      minExecutionTime: minTime,
      maxExecutionTime: maxTime,
      resultCount: times[0].resultCount,
      consistency: maxTime - minTime
    };
  }

  async optimizeExistingIndexes(collection) {
    console.log('Analyzing existing indexes for optimization opportunities...');

    const indexes = await collection.indexes();
    const indexStats = await collection.aggregate([{ $indexStats: {} }]).toArray();

    const optimizations = [];

    // Identify unused indexes
    const unusedIndexes = indexStats.filter(stat => 
      stat.accesses.ops === 0 && stat.name !== '_id_'
    );

    // Identify overlapping indexes
    const overlappingIndexes = this.findOverlappingIndexes(indexes);

    // Identify missing indexes based on query patterns
    const queryPatterns = await this.analyzeQueryPatterns(collection);
    const missingIndexes = this.identifyMissingIndexes(indexes, queryPatterns);

    optimizations.push({
      type: 'unused_indexes',
      count: unusedIndexes.length,
      indexes: unusedIndexes.map(idx => idx.name),
      recommendation: 'Consider dropping these indexes to save storage and maintenance overhead'
    });

    optimizations.push({
      type: 'overlapping_indexes',
      count: overlappingIndexes.length,
      indexes: overlappingIndexes,
      recommendation: 'Consolidate overlapping indexes to improve efficiency'
    });

    optimizations.push({
      type: 'missing_indexes',
      count: missingIndexes.length,
      recommendations: missingIndexes,
      recommendation: 'Create these indexes to improve query performance'
    });

    return optimizations;
  }

  findOverlappingIndexes(indexes) {
    const overlapping = [];

    for (let i = 0; i < indexes.length; i++) {
      for (let j = i + 1; j < indexes.length; j++) {
        const idx1 = indexes[i];
        const idx2 = indexes[j];

        if (this.areIndexesOverlapping(idx1.key, idx2.key)) {
          overlapping.push({
            index1: idx1.name,
            index2: idx2.name,
            keys1: idx1.key,
            keys2: idx2.key,
            overlapType: this.getOverlapType(idx1.key, idx2.key)
          });
        }
      }
    }

    return overlapping;
  }

  areIndexesOverlapping(keys1, keys2) {
    const fields1 = Object.keys(keys1);
    const fields2 = Object.keys(keys2);

    // Check if one index is a prefix of another
    return this.isPrefix(fields1, fields2) || this.isPrefix(fields2, fields1);
  }

  isPrefix(fields1, fields2) {
    if (fields1.length > fields2.length) return false;

    for (let i = 0; i < fields1.length; i++) {
      if (fields1[i] !== fields2[i]) return false;
    }

    return true;
  }

  getOverlapType(keys1, keys2) {
    const fields1 = Object.keys(keys1);
    const fields2 = Object.keys(keys2);

    if (this.isPrefix(fields1, fields2)) {
      return `${fields1.join(',')} is prefix of ${fields2.join(',')}`;
    } else if (this.isPrefix(fields2, fields1)) {
      return `${fields2.join(',')} is prefix of ${fields1.join(',')}`;
    }

    return 'partial_overlap';
  }

  identifyMissingIndexes(existingIndexes, queryPatterns) {
    const missing = [];
    const existingSpecs = existingIndexes.map(idx => JSON.stringify(idx.key));

    queryPatterns.forEach(pattern => {
      const recommendedIndex = this.designCompoundIndex(pattern);
      const specStr = JSON.stringify(recommendedIndex.spec);

      if (!existingSpecs.includes(specStr) && recommendedIndex.fields.length > 0) {
        missing.push({
          pattern: pattern,
          recommendedIndex: recommendedIndex,
          priority: this.calculateIndexPriority(pattern)
        });
      }
    });

    return missing.sort((a, b) => b.priority - a.priority);
  }
}

SQL-Style Compound Index Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB compound index management:

-- QueryLeaf compound index operations with SQL-familiar syntax

-- Create comprehensive compound indexes
CREATE COMPOUND INDEX idx_user_app_status_time ON user_activities (
  user_id ASC,
  application_id ASC, 
  status ASC,
  created_at DESC
) WITH (
  background = true,
  unique = false
);

CREATE COMPOUND INDEX idx_app_type_priority_performance ON user_activities (
  application_id ASC,
  activity_type ASC,
  priority DESC,
  execution_time_ms ASC,
  created_at DESC
) WITH (
  background = true,
  partial_filter = 'execution_time_ms IS NOT NULL AND priority >= 5'
);

-- Create compound text search index
CREATE COMPOUND INDEX idx_user_app_text_search ON user_activities (
  user_id ASC,
  application_id ASC,
  activity_type ASC,
  title TEXT,
  description TEXT,
  keywords TEXT
) WITH (
  weights = JSON_BUILD_OBJECT('title', 10, 'description', 5, 'keywords', 3),
  background = true
);

-- Optimized multi-field queries leveraging compound indexes
WITH user_activity_analysis AS (
  SELECT 
    user_id,
    application_id,
    activity_type,
    status,
    priority,
    created_at,
    execution_time_ms,
    error_count,
    retry_count,
    activity_data,

    -- Performance categorization
    CASE 
      WHEN execution_time_ms IS NULL THEN 'no_data'
      WHEN execution_time_ms < 1000 THEN 'fast'
      WHEN execution_time_ms < 5000 THEN 'moderate' 
      WHEN execution_time_ms < 10000 THEN 'slow'
      ELSE 'critical'
    END as performance_category,

    -- Activity scoring
    CASE
      WHEN error_count = 0 AND status = 'completed' THEN 100
      WHEN error_count = 0 AND status = 'in_progress' THEN 75
      WHEN error_count > 0 AND retry_count <= 3 THEN 50
      ELSE 25
    END as activity_score,

    -- Time-based metrics
    EXTRACT(hour FROM created_at) as activity_hour,
    DATE_TRUNC('day', created_at) as activity_date,

    -- User context
    activity_data->>'source' as source_system,
    CAST(activity_data->>'amount' AS NUMERIC) as transaction_amount,
    activity_data->>'category' as data_category

  FROM user_activities
  WHERE 
    -- Multi-field filtering optimized by compound index
    user_id IN (12345, 23456, 34567, 45678)
    AND application_id IN ('web_app', 'mobile_app', 'api_service')
    AND status IN ('completed', 'in_progress', 'failed')
    AND created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days'
    AND created_at <= CURRENT_TIMESTAMP - INTERVAL '1 hour'
    AND priority >= 3
    AND (execution_time_ms IS NULL OR execution_time_ms < 30000)
    AND error_count <= 5
),

performance_metrics AS (
  SELECT 
    user_id,
    application_id,
    activity_type,

    -- Volume metrics
    COUNT(*) as total_activities,
    COUNT(DISTINCT DATE_TRUNC('day', created_at)) as active_days,
    COUNT(DISTINCT activity_hour) as active_hours,

    -- Performance distribution
    COUNT(*) FILTER (WHERE performance_category = 'fast') as fast_activities,
    COUNT(*) FILTER (WHERE performance_category = 'moderate') as moderate_activities,
    COUNT(*) FILTER (WHERE performance_category = 'slow') as slow_activities,
    COUNT(*) FILTER (WHERE performance_category = 'critical') as critical_activities,

    -- Execution time statistics
    AVG(execution_time_ms) as avg_execution_time,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY execution_time_ms) as median_execution_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY execution_time_ms) as p95_execution_time,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY execution_time_ms) as p99_execution_time,
    MIN(execution_time_ms) as min_execution_time,
    MAX(execution_time_ms) as max_execution_time,
    STDDEV_POP(execution_time_ms) as execution_time_stddev,

    -- Status distribution
    COUNT(*) FILTER (WHERE status = 'completed') as completed_count,
    COUNT(*) FILTER (WHERE status = 'failed') as failed_count,
    COUNT(*) FILTER (WHERE status = 'in_progress') as in_progress_count,

    -- Error and retry analysis
    SUM(error_count) as total_errors,
    SUM(retry_count) as total_retries,
    AVG(error_count) as avg_error_rate,
    MAX(error_count) as max_errors_per_activity,

    -- Quality metrics
    AVG(activity_score) as avg_activity_score,
    MIN(activity_score) as min_activity_score,
    MAX(activity_score) as max_activity_score,

    -- Transaction analysis
    AVG(transaction_amount) FILTER (WHERE transaction_amount > 0) as avg_transaction_amount,
    SUM(transaction_amount) FILTER (WHERE transaction_amount > 0) as total_transaction_amount,
    COUNT(*) FILTER (WHERE transaction_amount > 100) as high_value_transactions,

    -- Activity timing patterns
    mode() WITHIN GROUP (ORDER BY activity_hour) as most_active_hour,
    COUNT(DISTINCT source_system) as unique_source_systems,

    -- Recent activity indicators
    MAX(created_at) as last_activity_time,
    COUNT(*) FILTER (WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours') as recent_24h_activities,
    COUNT(*) FILTER (WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '1 hour') as recent_1h_activities

  FROM user_activity_analysis
  GROUP BY user_id, application_id, activity_type
),

ranked_performance AS (
  SELECT *,
    -- Performance rankings
    ROW_NUMBER() OVER (
      PARTITION BY application_id 
      ORDER BY avg_execution_time DESC
    ) as slowest_rank,

    ROW_NUMBER() OVER (
      PARTITION BY application_id
      ORDER BY total_errors DESC
    ) as error_rank,

    ROW_NUMBER() OVER (
      PARTITION BY application_id
      ORDER BY total_activities DESC
    ) as volume_rank,

    -- Efficiency scoring
    CASE 
      WHEN avg_execution_time IS NULL THEN 0
      WHEN avg_execution_time > 0 THEN 
        (completed_count::numeric / total_activities) / (avg_execution_time / 1000.0) * 1000
      ELSE 0
    END as efficiency_score,

    -- Performance categorization
    CASE
      WHEN p95_execution_time > 10000 THEN 'critical'
      WHEN p95_execution_time > 5000 THEN 'poor'
      WHEN p95_execution_time > 2000 THEN 'moderate'
      WHEN p95_execution_time > 1000 THEN 'good'
      ELSE 'excellent'
    END as performance_grade,

    -- Error rate classification
    CASE 
      WHEN total_activities > 0 THEN
        CASE
          WHEN (total_errors::numeric / total_activities) > 0.1 THEN 'high_error'
          WHEN (total_errors::numeric / total_activities) > 0.05 THEN 'moderate_error'
          WHEN (total_errors::numeric / total_activities) > 0.01 THEN 'low_error'
          ELSE 'minimal_error'
        END
      ELSE 'no_data'
    END as error_grade

  FROM performance_metrics
),

final_analysis AS (
  SELECT 
    user_id,
    application_id,
    activity_type,
    total_activities,
    active_days,

    -- Performance summary
    ROUND(avg_execution_time::numeric, 2) as avg_execution_time_ms,
    ROUND(median_execution_time::numeric, 2) as median_execution_time_ms,
    ROUND(p95_execution_time::numeric, 2) as p95_execution_time_ms,
    ROUND(p99_execution_time::numeric, 2) as p99_execution_time_ms,
    performance_grade,

    -- Success metrics
    ROUND((completed_count::numeric / total_activities) * 100, 1) as success_rate_pct,
    ROUND((failed_count::numeric / total_activities) * 100, 1) as failure_rate_pct,
    error_grade,

    -- Volume and efficiency
    volume_rank,
    ROUND(efficiency_score::numeric, 2) as efficiency_score,

    -- Financial metrics
    ROUND(total_transaction_amount::numeric, 2) as total_transaction_value,
    high_value_transactions,

    -- Activity patterns
    most_active_hour,
    recent_24h_activities,
    recent_1h_activities,

    -- Rankings and alerts
    slowest_rank,
    error_rank,

    CASE 
      WHEN performance_grade = 'critical' OR error_grade = 'high_error' THEN 'immediate_attention'
      WHEN performance_grade = 'poor' OR error_grade = 'moderate_error' THEN 'needs_optimization'
      WHEN slowest_rank <= 3 OR error_rank <= 3 THEN 'monitor_closely'
      ELSE 'performing_normally'
    END as alert_level,

    -- Recommendations
    CASE 
      WHEN performance_grade = 'critical' THEN 'Investigate performance bottlenecks immediately'
      WHEN error_grade = 'high_error' THEN 'Review error patterns and implement fixes'
      WHEN efficiency_score < 50 THEN 'Optimize processing efficiency'
      WHEN recent_1h_activities = 0 AND recent_24h_activities > 0 THEN 'Monitor for potential issues'
      ELSE 'Continue normal monitoring'
    END as recommendation

  FROM ranked_performance
)
SELECT *
FROM final_analysis
ORDER BY 
  CASE alert_level
    WHEN 'immediate_attention' THEN 1
    WHEN 'needs_optimization' THEN 2
    WHEN 'monitor_closely' THEN 3
    ELSE 4
  END,
  performance_grade DESC,
  total_activities DESC;

-- Advanced compound index analysis and optimization
WITH index_performance AS (
  SELECT 
    index_name,
    key_pattern,
    index_size_mb,

    -- Usage statistics
    total_operations,
    operations_per_day,
    avg_operations_per_query,

    -- Performance impact
    index_hit_ratio,
    avg_query_time_with_index,
    avg_query_time_without_index,
    performance_improvement_pct,

    -- Maintenance overhead
    build_time_minutes,
    storage_overhead_pct,
    update_overhead_ms,

    -- Effectiveness scoring
    (operations_per_day * performance_improvement_pct * index_hit_ratio) / 
    (index_size_mb * update_overhead_ms) as effectiveness_score

  FROM INDEX_PERFORMANCE_STATS()
  WHERE index_type = 'compound'
),

index_recommendations AS (
  SELECT 
    index_name,
    key_pattern,
    operations_per_day,
    ROUND(effectiveness_score::numeric, 4) as effectiveness_score,

    -- Performance classification
    CASE 
      WHEN effectiveness_score > 1000 THEN 'highly_effective'
      WHEN effectiveness_score > 100 THEN 'effective'
      WHEN effectiveness_score > 10 THEN 'moderately_effective' 
      WHEN effectiveness_score > 1 THEN 'minimally_effective'
      ELSE 'ineffective'
    END as effectiveness_category,

    -- Optimization recommendations
    CASE
      WHEN operations_per_day < 1 AND index_size_mb > 100 THEN 'Consider dropping - low usage, high storage cost'
      WHEN effectiveness_score < 1 THEN 'Review index design and query patterns'
      WHEN performance_improvement_pct < 10 THEN 'Minimal performance benefit - evaluate necessity'
      WHEN index_hit_ratio < 0.5 THEN 'Poor selectivity - consider reordering fields'
      WHEN update_overhead_ms > 100 THEN 'High maintenance cost - optimize for write workload'
      ELSE 'Index performing within acceptable parameters'
    END as recommendation,

    -- Priority for attention
    CASE
      WHEN effectiveness_score < 0.1 THEN 'high_priority'
      WHEN effectiveness_score < 1 THEN 'medium_priority'
      ELSE 'low_priority'
    END as optimization_priority,

    -- Storage and performance details
    ROUND(index_size_mb::numeric, 2) as size_mb,
    ROUND(performance_improvement_pct::numeric, 1) as performance_gain_pct,
    ROUND(index_hit_ratio::numeric, 3) as selectivity_ratio,
    build_time_minutes

  FROM index_performance
)
SELECT 
  index_name,
  key_pattern,
  effectiveness_category,
  effectiveness_score,
  operations_per_day,
  performance_gain_pct,
  selectivity_ratio,
  size_mb,
  optimization_priority,
  recommendation

FROM index_recommendations
ORDER BY 
  CASE optimization_priority
    WHEN 'high_priority' THEN 1
    WHEN 'medium_priority' THEN 2
    ELSE 3
  END,
  effectiveness_score DESC;

-- Query execution plan analysis for compound indexes
EXPLAIN (ANALYZE true, VERBOSE true)
SELECT 
  user_id,
  application_id,
  activity_type,
  status,
  priority,
  execution_time_ms,
  created_at
FROM user_activities
WHERE user_id IN (12345, 23456, 34567)
  AND application_id = 'web_app'
  AND status IN ('completed', 'failed')
  AND priority >= 5
  AND created_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
ORDER BY priority DESC, created_at DESC
LIMIT 100;

-- Index intersection analysis
WITH query_analysis AS (
  SELECT 
    query_pattern,
    execution_count,
    avg_execution_time_ms,
    index_used,
    index_intersection_count,

    -- Index effectiveness
    rows_examined,
    rows_returned, 
    CASE 
      WHEN rows_examined > 0 THEN rows_returned::numeric / rows_examined
      ELSE 0
    END as index_selectivity,

    -- Performance indicators
    CASE
      WHEN avg_execution_time_ms > 5000 THEN 'slow'
      WHEN avg_execution_time_ms > 1000 THEN 'moderate'
      ELSE 'fast'
    END as performance_category

  FROM QUERY_EXECUTION_STATS()
  WHERE query_type = 'multi_field'
    AND time_period >= CURRENT_TIMESTAMP - INTERVAL '7 days'
)
SELECT 
  query_pattern,
  execution_count,
  ROUND(avg_execution_time_ms::numeric, 2) as avg_time_ms,
  performance_category,
  index_used,
  index_intersection_count,
  ROUND(index_selectivity::numeric, 4) as selectivity,

  -- Optimization opportunities
  CASE 
    WHEN index_selectivity < 0.1 THEN 'Poor index selectivity - consider compound index'
    WHEN index_intersection_count > 2 THEN 'Multiple index intersection - create compound index'
    WHEN performance_category = 'slow' THEN 'Performance issue - review indexing strategy'
    ELSE 'Acceptable performance'
  END as optimization_opportunity,

  rows_examined,
  rows_returned

FROM query_analysis
WHERE execution_count > 10  -- Focus on frequently executed queries
ORDER BY avg_execution_time_ms DESC, execution_count DESC;

-- QueryLeaf provides comprehensive compound indexing capabilities:
-- 1. SQL-familiar compound index creation with advanced options
-- 2. Multi-field query optimization with automatic index selection  
-- 3. Performance analysis and index effectiveness monitoring
-- 4. Query execution plan analysis with detailed statistics
-- 5. Index intersection detection and optimization recommendations
-- 6. Background index building for zero-downtime optimization
-- 7. Partial and sparse indexing for memory and storage efficiency
-- 8. Text search integration with compound field indexing
-- 9. Integration with MongoDB's query planner and optimization
-- 10. Familiar SQL syntax for complex multi-dimensional queries

Best Practices for Compound Index Implementation

Index Design Strategy

Essential principles for optimal compound index design:

  1. ESR Rule: Follow Equality, Sort, Range field ordering for maximum effectiveness (see the sketch after this list)
  2. Query Pattern Analysis: Analyze actual query patterns before designing indexes
  3. Cardinality Optimization: Place high-cardinality fields first for better selectivity
  4. Sort Integration: Design indexes that support both filtering and sorting requirements
  5. Prefix Optimization: Ensure indexes support multiple query patterns through prefixes
  6. Maintenance Balance: Balance query performance with index maintenance overhead
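
To make the ESR rule concrete, here is a minimal driver-level sketch. It assumes a query that filters on status (equality), sorts on createdAt descending, and ranges over executionTimeMs; the database and collection names mirror earlier examples but are illustrative assumptions, not part of any specific deployment.

// Minimal ESR (Equality, Sort, Range) sketch using the Node.js MongoDB driver
// Assumed query shape:
//   find({ status: 'completed', executionTimeMs: { $lt: 5000 } })
//     .sort({ createdAt: -1 })
const { MongoClient } = require('mongodb');

async function createEsrIndex(uri) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const activities = client.db('analytics_platform').collection('user_activities');

    await activities.createIndex(
      {
        status: 1,           // Equality: equality predicate first for selectivity
        createdAt: -1,       // Sort: matches the query's sort direction
        executionTimeMs: 1   // Range: range predicates go last
      },
      { name: 'idx_status_created_exectime_esr' }
    );
  } finally {
    await client.close();
  }
}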

Performance and Scalability

Optimize compound indexes for production workloads:

  1. Index Intersection: Understand when MongoDB uses multiple indexes vs. compound indexes
  2. Memory Utilization: Monitor index memory usage and working set requirements
  3. Write Performance: Balance read optimization with write performance impact
  4. Partial Indexes: Use partial indexes to reduce storage and maintenance overhead
  5. Index Statistics: Regularly analyze index usage patterns and effectiveness (see the explain() sketch after this list)
  6. Background Building: Use background index creation for zero-downtime deployments
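
One lightweight way to verify these principles in production is to check the winning plan and the examined-to-returned ratio with explain('executionStats'). The query shape below is an illustrative assumption, and the plan-shape handling is intentionally simplified, since real winning plans can nest additional stages.

// Hedged sketch: confirm a compound index drives the plan and gauge its efficiency
async function checkIndexUsage(collection) {
  const explain = await collection
    .find({ status: 'completed', createdAt: { $gte: new Date(Date.now() - 86400000) } })
    .sort({ createdAt: -1 })
    .explain('executionStats');

  const stats = explain.executionStats;
  const winningStage = explain.queryPlanner.winningPlan.inputStage?.stage
    || explain.queryPlanner.winningPlan.stage;

  return {
    usedIndexScan: winningStage === 'IXSCAN',   // true when an index scan drives the plan
    keysExamined: stats.totalKeysExamined,
    docsExamined: stats.totalDocsExamined,
    returned: stats.nReturned,
    // A ratio near 1 means the index is highly selective for this query
    selectivity: stats.totalDocsExamined > 0
      ? stats.nReturned / stats.totalDocsExamined
      : 1
  };
}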

Conclusion

MongoDB Compound Indexes provide sophisticated multi-field query optimization that eliminates the complexity and limitations of traditional relational indexing approaches. The integration of intelligent query planning, automatic index selection, and flexible field ordering makes building high-performance multi-dimensional queries both powerful and efficient.

Key Compound Index benefits include:

  • Advanced Query Optimization: Intelligent index selection and query path optimization
  • Multi-Field Efficiency: Single index supporting complex filtering, sorting, and range queries
  • Flexible Design Patterns: Support for various query patterns through strategic field ordering
  • Performance Monitoring: Comprehensive index usage analytics and optimization recommendations
  • Scalable Architecture: Efficient performance across large datasets and high-concurrency workloads
  • Developer Familiarity: SQL-style compound index creation and management patterns

Whether you're building analytics platforms, real-time dashboards, e-commerce applications, or any system requiring complex multi-field queries, MongoDB Compound Indexes with QueryLeaf's familiar SQL interface provide the foundation for optimal query performance. This combination enables sophisticated indexing strategies while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically optimizes MongoDB compound index operations while providing SQL-familiar index creation, query optimization, and performance analysis. Advanced indexing strategies, query planning, and index effectiveness monitoring are seamlessly handled through familiar SQL patterns, making sophisticated database optimization both powerful and accessible.

The integration of advanced compound indexing capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both complex multi-field query performance and familiar database interaction patterns, ensuring your optimization strategies remain both effective and maintainable as they scale and evolve.

MongoDB Change Streams and Event-Driven Architecture: Building Reactive Applications with SQL-Style Event Processing

Modern applications increasingly require real-time responsiveness and event-driven architectures that can react instantly to data changes across distributed systems. Traditional polling-based approaches for change detection introduce significant latency, resource overhead, and scaling challenges that make building responsive applications complex and inefficient.

MongoDB Change Streams provide native event streaming capabilities that enable applications to watch for data changes in real-time, triggering immediate reactions without polling overhead. Unlike traditional database triggers or external change data capture systems, MongoDB Change Streams offer a unified, scalable approach to event-driven architecture that works seamlessly across replica sets and sharded clusters.
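
As a quick preview of the mechanics before examining the traditional alternatives, a single watch() call subscribes to changes with no polling loop. The database, collection, and handler below are illustrative assumptions, not the full event-driven platform built later in this article.

// Minimal change stream sketch with the Node.js MongoDB driver
// (requires a replica set or sharded cluster; names are assumptions)
const { MongoClient } = require('mongodb');

async function watchNewOrders(uri) {
  const client = new MongoClient(uri);
  await client.connect();

  const orders = client.db('shop').collection('orders');

  // React only to newly inserted orders; fullDocument is present on inserts
  const changeStream = orders.watch([
    { $match: { operationType: 'insert' } }
  ]);

  changeStream.on('change', change => {
    console.log('New order received:', change.fullDocument._id);
  });

  return changeStream; // caller closes the stream and client when done
}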

The Traditional Change Detection Challenge

Traditional approaches to detecting and reacting to data changes have significant architectural and performance limitations:

-- Traditional polling approach - inefficient and high-latency

-- PostgreSQL polling-based change detection
CREATE TABLE user_activities (
    activity_id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL,
    activity_type VARCHAR(50) NOT NULL,
    activity_data JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    processed BOOLEAN DEFAULT FALSE
);

-- Polling query runs every few seconds
SELECT 
    activity_id,
    user_id,
    activity_type,
    activity_data,
    created_at
FROM user_activities 
WHERE processed = FALSE 
ORDER BY created_at ASC 
LIMIT 100;

-- Mark as processed after handling
UPDATE user_activities 
SET processed = TRUE, updated_at = CURRENT_TIMESTAMP
WHERE activity_id IN (1, 2, 3, ...);

-- Problems with polling approach:
-- 1. High latency - changes only detected on poll intervals
-- 2. Resource waste - constant querying even when no changes
-- 3. Scaling issues - increased polling frequency impacts performance
-- 4. Race conditions - multiple consumers competing for same records
-- 5. Complex state management - tracking processed vs unprocessed
-- 6. Poor real-time experience - delays in reaction to changes

-- Database trigger approach (limited and complex)
CREATE OR REPLACE FUNCTION notify_activity_change()
RETURNS TRIGGER AS $$
BEGIN
    PERFORM pg_notify('activity_changes', 
        json_build_object(
            'activity_id', NEW.activity_id,
            'user_id', NEW.user_id,
            'activity_type', NEW.activity_type,
            'operation', TG_OP
        )::text
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER activity_change_trigger
AFTER INSERT OR UPDATE OR DELETE ON user_activities
FOR EACH ROW EXECUTE FUNCTION notify_activity_change();

-- Trigger limitations:
-- - Limited to single database instance
-- - No ordering guarantees across tables
-- - Difficult error handling and retry logic
-- - Complex setup for distributed systems
-- - No built-in filtering or transformation
-- - Poor integration with modern event architectures

-- MySQL limitations (even more restrictive)
CREATE TABLE change_log (
    id INT AUTO_INCREMENT PRIMARY KEY,
    table_name VARCHAR(100),
    record_id VARCHAR(100), 
    operation VARCHAR(10),
    change_data JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Basic trigger for change tracking
DELIMITER $$
CREATE TRIGGER user_change_tracker
AFTER INSERT ON users
FOR EACH ROW
BEGIN
    INSERT INTO change_log (table_name, record_id, operation, change_data)
    VALUES ('users', NEW.id, 'INSERT', JSON_OBJECT('user_id', NEW.id));
END$$
DELIMITER ;

-- MySQL trigger limitations:
-- - Very limited JSON functionality
-- - No advanced event routing capabilities
-- - Poor performance with high-volume changes
-- - Complex maintenance and debugging
-- - No distributed system support

MongoDB Change Streams provide comprehensive event-driven capabilities:

// MongoDB Change Streams - native event-driven architecture
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('event_driven_platform');

// Advanced Change Stream implementation for event-driven architecture
class EventDrivenMongoDBPlatform {
  constructor(db) {
    this.db = db;
    this.changeStreams = new Map();
    this.eventHandlers = new Map();
    this.metrics = {
      eventsProcessed: 0,
      lastEvent: null,
      errorCount: 0
    };
  }

  async setupEventDrivenCollections() {
    // Create collections for different event types
    const collections = {
      userActivities: db.collection('user_activities'),
      orderEvents: db.collection('order_events'),
      inventoryChanges: db.collection('inventory_changes'),
      systemEvents: db.collection('system_events'),
      auditLog: db.collection('audit_log')
    };

    // Create indexes for optimal change stream performance
    for (const [name, collection] of Object.entries(collections)) {
      await collection.createIndex({ userId: 1, timestamp: -1 });
      await collection.createIndex({ eventType: 1, status: 1 });
      await collection.createIndex({ createdAt: -1 });
    }

    return collections;
  }

  async startChangeStreamWatchers() {
    console.log('Starting change stream watchers...');

    // 1. Watch all changes across entire database
    await this.watchDatabaseChanges();

    // 2. Watch specific collection changes with filtering
    await this.watchUserActivityChanges();

    // 3. Watch order processing pipeline
    await this.watchOrderEvents();

    // 4. Watch inventory for real-time stock updates
    await this.watchInventoryChanges();

    console.log('All change stream watchers started');
  }

  async watchDatabaseChanges() {
    console.log('Setting up database-level change stream...');

    const changeStream = this.db.watch(
      [
        // Pipeline to filter and transform events
        {
          $match: {
            // Only watch insert, update, delete operations
            operationType: { $in: ['insert', 'update', 'delete', 'replace'] },

            // Exclude system collections and temporary data
            // (regex literal keeps the '\.' escape; a quoted string would drop it)
            'ns.coll': {
              $not: /^(system\.|temp_)/
            }
          }
        },
        {
          $addFields: {
            // Add event metadata (the change event _id is the resume token document,
            // so expose its _data string rather than $toString-ing an object)
            eventId: '$_id._data',
            eventTimestamp: '$clusterTime',
            database: '$ns.db',
            collection: '$ns.coll',

            // Create standardized event structure
            eventData: {
              $switch: {
                branches: [
                  {
                    case: { $eq: ['$operationType', 'insert'] },
                    then: {
                      operation: 'created',
                      document: '$fullDocument'
                    }
                  },
                  {
                    case: { $eq: ['$operationType', 'update'] },
                    then: {
                      operation: 'updated', 
                      documentKey: '$documentKey',
                      updatedFields: '$updateDescription.updatedFields',
                      removedFields: '$updateDescription.removedFields'
                    }
                  },
                  {
                    case: { $eq: ['$operationType', 'delete'] },
                    then: {
                      operation: 'deleted',
                      documentKey: '$documentKey'
                    }
                  }
                ],
                default: {
                  operation: '$operationType',
                  documentKey: '$documentKey'
                }
              }
            }
          }
        }
      ],
      {
        fullDocument: 'updateLookup', // Include full document for updates
        fullDocumentBeforeChange: 'whenAvailable' // Include before state
      }
    );

    this.changeStreams.set('database', changeStream);

    // Handle database-level events
    changeStream.on('change', async (changeEvent) => {
      try {
        await this.handleDatabaseEvent(changeEvent);
        this.updateMetrics('database', changeEvent);
      } catch (error) {
        console.error('Error handling database event:', error);
        this.metrics.errorCount++;
      }
    });

    changeStream.on('error', (error) => {
      console.error('Database change stream error:', error);
      this.handleChangeStreamError('database', error);
    });
  }

  async watchUserActivityChanges() {
    console.log('Setting up user activity change stream...');

    const userActivities = this.db.collection('user_activities');

    const changeStream = userActivities.watch(
      [
        {
          $match: {
            operationType: { $in: ['insert', 'update'] },

            // Only watch for significant user activities
            $or: [
              { 'fullDocument.activityType': 'login' },
              { 'fullDocument.activityType': 'purchase' },
              { 'fullDocument.activityType': 'subscription_change' },
              { 'fullDocument.status': 'completed' },
              { 'updateDescription.updatedFields.status': 'completed' }
            ]
          }
        }
      ],
      {
        fullDocument: 'updateLookup',
        fullDocumentBeforeChange: 'whenAvailable'
      }
    );

    this.changeStreams.set('userActivities', changeStream);

    changeStream.on('change', async (changeEvent) => {
      try {
        await this.handleUserActivityEvent(changeEvent);

        // Trigger downstream events based on activity type
        await this.triggerDownstreamEvents('user_activity', changeEvent);

      } catch (error) {
        console.error('Error handling user activity event:', error);
        await this.logEventError('user_activities', changeEvent, error);
      }
    });
  }

  async watchOrderEvents() {
    console.log('Setting up order events change stream...');

    const orderEvents = this.db.collection('order_events');

    const changeStream = orderEvents.watch(
      [
        {
          $match: {
            operationType: 'insert',

            // Order lifecycle events
            'fullDocument.eventType': {
              $in: ['order_created', 'payment_processed', 'order_shipped', 
                   'order_delivered', 'order_cancelled', 'refund_processed']
            }
          }
        },
        {
          $addFields: {
            // Enrich with order context
            orderStage: {
              $switch: {
                branches: [
                  { case: { $eq: ['$fullDocument.eventType', 'order_created'] }, then: 'pending' },
                  { case: { $eq: ['$fullDocument.eventType', 'payment_processed'] }, then: 'confirmed' },
                  { case: { $eq: ['$fullDocument.eventType', 'order_shipped'] }, then: 'in_transit' },
                  { case: { $eq: ['$fullDocument.eventType', 'order_delivered'] }, then: 'completed' },
                  { case: { $eq: ['$fullDocument.eventType', 'order_cancelled'] }, then: 'cancelled' }
                ],
                default: 'unknown'
              }
            },

            // Priority for event processing
            processingPriority: {
              $switch: {
                branches: [
                  { case: { $eq: ['$fullDocument.eventType', 'payment_processed'] }, then: 1 },
                  { case: { $eq: ['$fullDocument.eventType', 'order_created'] }, then: 2 },
                  { case: { $eq: ['$fullDocument.eventType', 'order_cancelled'] }, then: 1 },
                  { case: { $eq: ['$fullDocument.eventType', 'refund_processed'] }, then: 1 }
                ],
                default: 3
              }
            }
          }
        }
      ],
      { fullDocument: 'updateLookup' }
    );

    this.changeStreams.set('orderEvents', changeStream);

    changeStream.on('change', async (changeEvent) => {
      try {
        // Route to appropriate order processing handler
        await this.processOrderEventChange(changeEvent);

        // Update order state machine
        await this.updateOrderStateMachine(changeEvent);

        // Trigger business logic workflows
        await this.triggerOrderWorkflows(changeEvent);

      } catch (error) {
        console.error('Error processing order event:', error);
        await this.handleOrderEventError(changeEvent, error);
      }
    });
  }

  async watchInventoryChanges() {
    console.log('Setting up inventory change stream...');

    const inventoryChanges = this.db.collection('inventory_changes');

    const changeStream = inventoryChanges.watch(
      [
        {
          $match: {
            $or: [
              // Stock level changes
              { 
                operationType: 'update',
                'updateDescription.updatedFields.stockLevel': { $exists: true }
              },
              // New inventory items
              {
                operationType: 'insert',
                'fullDocument.itemType': 'product'
              },
              // Inventory alerts
              {
                operationType: 'insert',
                'fullDocument.alertType': { $in: ['low_stock', 'out_of_stock', 'restock'] }
              }
            ]
          }
        }
      ],
      {
        fullDocument: 'updateLookup',
        fullDocumentBeforeChange: 'whenAvailable'
      }
    );

    this.changeStreams.set('inventoryChanges', changeStream);

    changeStream.on('change', async (changeEvent) => {
      try {
        // Real-time inventory updates
        await this.handleInventoryChange(changeEvent);

        // Check for low stock alerts
        await this.checkInventoryAlerts(changeEvent);

        // Update product availability in real-time
        await this.updateProductAvailability(changeEvent);

        // Notify relevant systems (pricing, recommendations, etc.)
        await this.notifyInventorySubscribers(changeEvent);

      } catch (error) {
        console.error('Error handling inventory change:', error);
        await this.logInventoryError(changeEvent, error);
      }
    });
  }

  async handleDatabaseEvent(changeEvent) {
    const { database, collection, eventData, operationType } = changeEvent;

    console.log(`Database Event: ${operationType} in ${database}.${collection}`);

    // Global event logging
    await this.logGlobalEvent({
      eventId: changeEvent.eventId,
      timestamp: new Date(changeEvent.clusterTime),
      database: database,
      collection: collection,
      operation: operationType,
      eventData: eventData
    });

    // Route to collection-specific handlers
    await this.routeCollectionEvent(collection, changeEvent);

    // Update global metrics and monitoring
    await this.updateGlobalMetrics(changeEvent);
  }

  async handleUserActivityEvent(changeEvent) {
    const { fullDocument, operationType } = changeEvent;
    const activity = fullDocument;

    console.log(`User Activity: ${activity.activityType} for user ${activity.userId}`);

    // Real-time user analytics
    if (activity.activityType === 'login') {
      await this.updateUserSession(activity);
      await this.trackUserLocation(activity);
    }

    // Purchase events
    if (activity.activityType === 'purchase') {
      await this.processRealtimePurchase(activity);
      await this.updateRecommendations(activity.userId);
      await this.triggerLoyaltyUpdates(activity);
    }

    // Subscription changes
    if (activity.activityType === 'subscription_change') {
      await this.processSubscriptionChange(activity);
      await this.updateBilling(activity);
    }

    // Create reactive events for downstream systems
    await this.publishUserEvent(activity, operationType);
  }

  async processOrderEventChange(changeEvent) {
    const { fullDocument: orderEvent } = changeEvent;

    console.log(`Order Event: ${orderEvent.eventType} for order ${orderEvent.orderId}`);

    switch (orderEvent.eventType) {
      case 'order_created':
        await this.processNewOrder(orderEvent);
        break;

      case 'payment_processed':
        await this.confirmOrderPayment(orderEvent);
        await this.triggerFulfillment(orderEvent);
        break;

      case 'order_shipped':
        await this.updateShippingTracking(orderEvent);
        await this.notifyCustomer(orderEvent);
        break;

      case 'order_delivered':
        await this.completeOrder(orderEvent);
        await this.triggerPostDeliveryWorkflow(orderEvent);
        break;

      case 'order_cancelled':
        await this.processCancellation(orderEvent);
        await this.handleRefund(orderEvent);
        break;
    }

    // Update order analytics in real-time
    await this.updateOrderAnalytics(orderEvent);
  }

  async handleInventoryChange(changeEvent) {
    const { fullDocument: inventory, operationType } = changeEvent;

    console.log(`Inventory Change: ${operationType} for item ${inventory.itemId}`);

    // Real-time stock updates
    if (changeEvent.updateDescription?.updatedFields?.stockLevel !== undefined) {
      const newStock = changeEvent.fullDocument.stockLevel;
      const previousStock = changeEvent.fullDocumentBeforeChange?.stockLevel || 0;

      await this.handleStockLevelChange({
        itemId: inventory.itemId,
        previousStock: previousStock,
        newStock: newStock,
        changeAmount: newStock - previousStock
      });
    }

    // Product availability updates
    await this.updateProductCatalog(inventory);

    // Pricing adjustments based on stock levels
    await this.updateDynamicPricing(inventory);
  }

  async triggerDownstreamEvents(eventType, changeEvent) {
    // Message queue integration for external systems
    const event = {
      eventId: generateEventId(),
      eventType: eventType,
      timestamp: new Date(),
      source: 'mongodb-change-stream',
      data: changeEvent,
      version: '1.0'
    };

    // Publish to different channels based on event type
    await this.publishToEventBus(event);
    await this.updateEventSourcing(event);
    await this.triggerWebhooks(event);
  }

  async publishToEventBus(event) {
    // Integration with message queues (Kafka, RabbitMQ, etc.)
    console.log(`Publishing event ${event.eventId} to event bus`);

    // Route to appropriate topics/queues
    const routingKey = `${event.eventType}.${event.data.operationType}`;

    // Simulate message queue publishing
    // await messageQueue.publish(routingKey, event);
  }

  async setupResumeTokenPersistence() {
    // Persist resume tokens for fault tolerance
    const resumeTokens = this.db.collection('change_stream_resume_tokens');

    // Save resume tokens periodically
    setInterval(async () => {
      for (const [streamName, changeStream] of this.changeStreams.entries()) {
        try {
          const resumeToken = changeStream.resumeToken;
          if (resumeToken) {
            await resumeTokens.updateOne(
              { streamName: streamName },
              {
                $set: {
                  resumeToken: resumeToken,
                  lastUpdated: new Date()
                }
              },
              { upsert: true }
            );
          }
        } catch (error) {
          console.error(`Error saving resume token for ${streamName}:`, error);
        }
      }
    }, 10000); // Every 10 seconds
  }

  async handleChangeStreamError(streamName, error) {
    console.error(`Change stream ${streamName} encountered error:`, error);

    // Retry after a delay (a production implementation would back off exponentially)
    setTimeout(async () => {
      try {
        console.log(`Attempting to restart change stream: ${streamName}`);

        // Load last known resume token
        const resumeTokenDoc = await this.db.collection('change_stream_resume_tokens')
          .findOne({ streamName: streamName });

        // Restart stream from last known position
        if (resumeTokenDoc?.resumeToken) {
          // Restart with resume token
          await this.restartChangeStream(streamName, resumeTokenDoc.resumeToken);
        } else {
          // Restart from current time
          await this.restartChangeStream(streamName);
        }

      } catch (retryError) {
        console.error(`Failed to restart change stream ${streamName}:`, retryError);
        // Implement exponential backoff retry
      }
    }, 5000); // Initial 5-second delay
  }

  async getChangeStreamMetrics() {
    return {
      activeStreams: this.changeStreams.size,
      eventsProcessed: this.metrics.eventsProcessed,
      lastEventTime: this.metrics.lastEvent,
      errorCount: this.metrics.errorCount,

      streamHealth: Array.from(this.changeStreams.entries()).map(([name, stream]) => ({
        name: name,
        isActive: !stream.closed,
        hasResumeToken: !!stream.resumeToken
      }))
    };
  }

  updateMetrics(streamName, changeEvent) {
    this.metrics.eventsProcessed++;
    this.metrics.lastEvent = new Date();

    console.log(`Processed event from ${streamName}: ${changeEvent.operationType}`);
  }

  async shutdown() {
    console.log('Shutting down change streams...');

    // Close all change streams gracefully
    for (const [name, changeStream] of this.changeStreams.entries()) {
      try {
        await changeStream.close();
        console.log(`Closed change stream: ${name}`);
      } catch (error) {
        console.error(`Error closing change stream ${name}:`, error);
      }
    }

    this.changeStreams.clear();
    console.log('All change streams closed');
  }
}

// Usage example
const startEventDrivenPlatform = async () => {
  try {
    const platform = new EventDrivenMongoDBPlatform(db);

    // Setup collections and indexes
    await platform.setupEventDrivenCollections();

    // Start change stream watchers
    await platform.startChangeStreamWatchers();

    // Setup fault tolerance
    await platform.setupResumeTokenPersistence();

    // Monitor platform health
    setInterval(async () => {
      const metrics = await platform.getChangeStreamMetrics();
      console.log('Platform Metrics:', metrics);
    }, 30000); // Every 30 seconds

    console.log('Event-driven platform started successfully');
    return platform;

  } catch (error) {
    console.error('Error starting event-driven platform:', error);
    throw error;
  }
};

// Benefits of MongoDB Change Streams:
// - Real-time event processing without polling overhead
// - Ordered, durable event streams with resume token support  
// - Cluster-wide change detection across replica sets and shards
// - Rich filtering and transformation capabilities through aggregation pipelines
// - Built-in fault tolerance and automatic failover
// - Integration with MongoDB's ACID transactions
// - Scalable event-driven architecture foundation
// - Native integration with MongoDB ecosystem and tools

module.exports = {
  EventDrivenMongoDBPlatform,
  startEventDrivenPlatform
};

Understanding MongoDB Change Streams Architecture

Advanced Change Stream Patterns

Implement sophisticated change stream patterns for different event-driven scenarios:

// Advanced change stream patterns and event processing
class AdvancedChangeStreamPatterns {
  constructor(db) {
    this.db = db;
    this.eventProcessors = new Map();
    this.eventStore = db.collection('event_store');
    this.eventProjections = db.collection('event_projections');
  }

  async setupEventSourcingPattern() {
    // Event sourcing with change streams
    console.log('Setting up event sourcing pattern...');

    const aggregateCollections = [
      'user_aggregates',
      'order_aggregates', 
      'inventory_aggregates',
      'payment_aggregates'
    ];

    for (const collectionName of aggregateCollections) {
      const collection = this.db.collection(collectionName);

      const changeStream = collection.watch(
        [
          {
            $match: {
              operationType: { $in: ['insert', 'update', 'replace'] }
            }
          },
          {
            $addFields: {
              // Create event sourcing envelope
              eventEnvelope: {
                eventId: '$_id._data', // the resume token payload serves as a unique event id
                eventType: '$operationType',
                aggregateId: '$documentKey._id',
                aggregateType: collectionName,
                eventVersion: { $ifNull: ['$fullDocument.version', 1] },
                eventData: '$fullDocument',
                eventMetadata: {
                  timestamp: '$clusterTime',
                  source: 'change-stream',
                  causationId: '$fullDocument.causationId',
                  correlationId: '$fullDocument.correlationId'
                }
              }
            }
          }
        ],
        {
          fullDocument: 'updateLookup',
          fullDocumentBeforeChange: 'whenAvailable'
        }
      );

      changeStream.on('change', async (changeEvent) => {
        await this.processEventSourcingEvent(changeEvent);
      });

      this.eventProcessors.set(`${collectionName}_eventsourcing`, changeStream);
    }
  }

  async processEventSourcingEvent(changeEvent) {
    const { eventEnvelope } = changeEvent;

    // Store event in event store
    await this.eventStore.insertOne({
      ...eventEnvelope,
      storedAt: new Date(),
      processedBy: [],
      projectionStatus: 'pending'
    });

    // Update read model projections
    await this.updateProjections(eventEnvelope);

    // Trigger sagas and process managers
    await this.triggerSagas(eventEnvelope);
  }

  async setupCQRSPattern() {
    // Command Query Responsibility Segregation with change streams
    console.log('Setting up CQRS pattern...');

    const commandCollections = ['commands', 'command_results'];

    for (const collectionName of commandCollections) {
      const collection = this.db.collection(collectionName);

      const changeStream = collection.watch(
        [
          {
            $match: {
              operationType: 'insert',
              'fullDocument.status': { $ne: 'processed' }
            }
          }
        ],
        { fullDocument: 'updateLookup' }
      );

      changeStream.on('change', async (changeEvent) => {
        await this.processCommand(changeEvent.fullDocument);
      });

      this.eventProcessors.set(`${collectionName}_cqrs`, changeStream);
    }
  }

  async setupSagaOrchestration() {
    // Saga pattern for distributed transaction coordination
    console.log('Setting up saga orchestration...');

    const sagaCollection = this.db.collection('sagas');

    const changeStream = sagaCollection.watch(
      [
        {
          $match: {
            $or: [
              { operationType: 'insert' },
              { 
                operationType: 'update',
                'updateDescription.updatedFields.status': { $exists: true }
              }
            ]
          }
        }
      ],
      { fullDocument: 'updateLookup' }
    );

    changeStream.on('change', async (changeEvent) => {
      await this.processSagaEvent(changeEvent);
    });

    this.eventProcessors.set('saga_orchestration', changeStream);
  }

  async processSagaEvent(changeEvent) {
    const saga = changeEvent.fullDocument;
    const { sagaId, status, currentStep, steps } = saga;

    console.log(`Processing saga ${sagaId}: ${status} at step ${currentStep}`);

    switch (status) {
      case 'started':
        await this.executeSagaStep(saga, 0);
        break;

      case 'step_completed':
        if (currentStep + 1 < steps.length) {
          await this.executeSagaStep(saga, currentStep + 1);
        } else {
          await this.completeSaga(sagaId);
        }
        break;

      case 'step_failed':
        await this.compensateSaga(saga, currentStep);
        break;

      case 'compensating':
        if (currentStep > 0) {
          await this.executeCompensation(saga, currentStep - 1);
        } else {
          await this.failSaga(sagaId);
        }
        break;
    }
  }

  async setupStreamProcessing() {
    // Stream processing with windowed aggregations
    console.log('Setting up stream processing...');

    const eventStream = this.db.collection('events');

    const changeStream = eventStream.watch(
      [
        {
          $match: {
            operationType: 'insert',
            'fullDocument.eventType': { $in: ['user_activity', 'transaction', 'system_event'] }
          }
        },
        {
          $addFields: {
            processingWindow: {
              $dateTrunc: {
                date: '$fullDocument.timestamp',
                unit: 'minute',
                binSize: 5 // 5-minute windows
              }
            }
          }
        }
      ],
      { fullDocument: 'updateLookup' }
    );

    let windowBuffer = new Map();

    changeStream.on('change', async (changeEvent) => {
      await this.processStreamEvent(changeEvent, windowBuffer);
    });

    // Process window aggregations every minute
    setInterval(async () => {
      await this.processWindowedAggregations(windowBuffer);
    }, 60000);

    this.eventProcessors.set('stream_processing', changeStream);
  }

  async processStreamEvent(changeEvent, windowBuffer) {
    const event = changeEvent.fullDocument;
    const window = changeEvent.processingWindow;
    const windowKey = window.toISOString();

    if (!windowBuffer.has(windowKey)) {
      windowBuffer.set(windowKey, {
        window: window,
        events: [],
        aggregations: {
          count: 0,
          userActivities: 0,
          transactions: 0,
          systemEvents: 0,
          totalValue: 0
        }
      });
    }

    const windowData = windowBuffer.get(windowKey);
    windowData.events.push(event);
    windowData.aggregations.count++;

    // Type-specific aggregations
    switch (event.eventType) {
      case 'user_activity':
        windowData.aggregations.userActivities++;
        break;
      case 'transaction':
        windowData.aggregations.transactions++;
        windowData.aggregations.totalValue += event.amount || 0;
        break;
      case 'system_event':
        windowData.aggregations.systemEvents++;
        break;
    }

    // Real-time alerting for anomalies
    if (windowData.aggregations.count > 1000) {
      await this.triggerVolumeAlert(windowKey, windowData);
    }
  }

  async setupMultiCollectionCoordination() {
    // Coordinate changes across multiple collections
    console.log('Setting up multi-collection coordination...');

    const coordinationConfig = [
      {
        collections: ['users', 'user_preferences', 'user_activities'],
        coordinator: 'userProfileCoordinator'
      },
      {
        collections: ['orders', 'order_items', 'payments', 'shipping'],
        coordinator: 'orderProcessingCoordinator' 
      },
      {
        collections: ['products', 'inventory', 'pricing', 'reviews'],
        coordinator: 'productManagementCoordinator'
      }
    ];

    for (const config of coordinationConfig) {
      await this.setupCollectionCoordinator(config);
    }
  }

  async setupCollectionCoordinator(config) {
    const { collections, coordinator } = config;

    for (const collectionName of collections) {
      const collection = this.db.collection(collectionName);

      const changeStream = collection.watch(
        [
          {
            $match: {
              operationType: { $in: ['insert', 'update', 'delete'] }
            }
          },
          {
            $addFields: {
              coordinationContext: {
                coordinator: coordinator,
                sourceCollection: collectionName,
                relatedCollections: collections.filter(c => c !== collectionName)
              }
            }
          }
        ],
        { fullDocument: 'updateLookup' }
      );

      changeStream.on('change', async (changeEvent) => {
        await this.processCoordinatedChange(changeEvent);
      });

      this.eventProcessors.set(`${collectionName}_${coordinator}`, changeStream);
    }
  }

  async processCoordinatedChange(changeEvent) {
    const { coordinationContext, fullDocument, operationType } = changeEvent;
    const { coordinator, sourceCollection, relatedCollections } = coordinationContext;

    console.log(`Coordinated change in ${sourceCollection} via ${coordinator}`);

    // Execute coordination logic based on coordinator type
    switch (coordinator) {
      case 'userProfileCoordinator':
        await this.coordinateUserProfileChanges(changeEvent);
        break;

      case 'orderProcessingCoordinator':
        await this.coordinateOrderProcessing(changeEvent);
        break;

      case 'productManagementCoordinator':
        await this.coordinateProductManagement(changeEvent);
        break;
    }
  }

  async coordinateUserProfileChanges(changeEvent) {
    const { fullDocument, operationType, ns } = changeEvent;
    const sourceCollection = ns.coll;

    if (sourceCollection === 'users' && operationType === 'update') {
      // User profile updated - sync preferences and activities
      await this.syncUserPreferences(fullDocument._id);
      await this.updateUserActivityContext(fullDocument._id);
    }

    if (sourceCollection === 'user_activities' && operationType === 'insert') {
      // New activity - update user profile analytics
      await this.updateUserAnalytics(fullDocument.userId, fullDocument);
    }
  }

  async setupChangeStreamHealthMonitoring() {
    // Health monitoring and metrics collection
    console.log('Setting up change stream health monitoring...');

    const healthMetrics = {
      totalStreams: 0,
      activeStreams: 0,
      eventsProcessed: 0,
      errorCount: 0,
      lastProcessedEvent: null,
      streamLatency: new Map()
    };

    // Monitor each change stream
    for (const [streamName, changeStream] of this.eventProcessors.entries()) {
      healthMetrics.totalStreams++;

      if (!changeStream.closed) {
        healthMetrics.activeStreams++;
      }

      // Monitor stream latency (clusterTime is a BSON Timestamp; its high 32 bits
      // hold seconds since the epoch)
      const originalEmit = changeStream.emit;
      changeStream.emit = function(event, ...args) {
        if (event === 'change') {
          const eventTimeMs = args[0].clusterTime.getHighBits() * 1000;
          const latency = Date.now() - eventTimeMs;
          healthMetrics.streamLatency.set(streamName, latency);
          healthMetrics.lastProcessedEvent = new Date();
          healthMetrics.eventsProcessed++;
        }
        return originalEmit.call(this, event, ...args);
      };

      // Monitor errors
      changeStream.on('error', (error) => {
        healthMetrics.errorCount++;
        console.error(`Stream ${streamName} error:`, error);
      });
    }

    // Periodic health reporting
    setInterval(() => {
      this.reportHealthMetrics(healthMetrics);
    }, 30000); // Every 30 seconds

    return healthMetrics;
  }

  reportHealthMetrics(metrics) {
    const avgLatency = Array.from(metrics.streamLatency.values())
      .reduce((sum, latency) => sum + latency, 0) / metrics.streamLatency.size || 0;

    console.log('Change Stream Health Report:', {
      totalStreams: metrics.totalStreams,
      activeStreams: metrics.activeStreams,
      eventsProcessed: metrics.eventsProcessed,
      errorCount: metrics.errorCount,
      averageLatency: Math.round(avgLatency) + 'ms',
      lastActivity: metrics.lastProcessedEvent
    });
  }

  async shutdown() {
    console.log('Shutting down advanced change stream patterns...');

    for (const [name, processor] of this.eventProcessors.entries()) {
      try {
        await processor.close();
        console.log(`Closed processor: ${name}`);
      } catch (error) {
        console.error(`Error closing processor ${name}:`, error);
      }
    }

    this.eventProcessors.clear();
  }
}

SQL-Style Change Stream Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Change Stream operations:

-- QueryLeaf change stream operations with SQL-familiar syntax

-- Create change stream watchers with SQL-style syntax
CREATE CHANGE_STREAM user_activity_watcher ON user_activities
WITH (
  operations = ['insert', 'update'],
  full_document = 'updateLookup',
  full_document_before_change = 'whenAvailable'
)
FILTER (
  activity_type IN ('login', 'purchase', 'subscription_change')
  OR status = 'completed'
);

-- Advanced change stream with aggregation pipeline
CREATE CHANGE_STREAM order_processing_watcher ON order_events
WITH (
  operations = ['insert'],
  full_document = 'updateLookup'
)
PIPELINE (
  FILTER (
    event_type IN ('order_created', 'payment_processed', 'order_shipped', 'order_delivered')
  ),
  ADD_FIELDS (
    order_stage = CASE 
      WHEN event_type = 'order_created' THEN 'pending'
      WHEN event_type = 'payment_processed' THEN 'confirmed'
      WHEN event_type = 'order_shipped' THEN 'in_transit'
      WHEN event_type = 'order_delivered' THEN 'completed'
      ELSE 'unknown'
    END,
    processing_priority = CASE
      WHEN event_type = 'payment_processed' THEN 1
      WHEN event_type = 'order_created' THEN 2
      ELSE 3
    END
  )
);

-- Database-level change stream monitoring
CREATE CHANGE_STREAM database_monitor ON DATABASE
WITH (
  operations = ['insert', 'update', 'delete'],
  full_document = 'updateLookup'
)
FILTER (
  -- Exclude system collections
  ns.coll NOT LIKE 'system.%'
  AND ns.coll NOT LIKE 'temp_%'
)
PIPELINE (
  ADD_FIELDS (
    event_id = CAST(_id AS VARCHAR),
    event_timestamp = cluster_time,
    database_name = ns.db,
    collection_name = ns.coll,
    event_data = CASE operation_type
      WHEN 'insert' THEN JSON_BUILD_OBJECT('operation', 'created', 'document', full_document)
      WHEN 'update' THEN JSON_BUILD_OBJECT(
        'operation', 'updated',
        'document_key', document_key,
        'updated_fields', update_description.updated_fields,
        'removed_fields', update_description.removed_fields
      )
      WHEN 'delete' THEN JSON_BUILD_OBJECT('operation', 'deleted', 'document_key', document_key)
      ELSE JSON_BUILD_OBJECT('operation', operation_type, 'document_key', document_key)
    END
  )
);

-- Event-driven reactive queries
WITH CHANGE_STREAM inventory_changes AS (
  SELECT 
    document_key._id as item_id,
    full_document.item_name,
    full_document.stock_level,
    full_document_before_change.stock_level as previous_stock_level,
    operation_type,
    cluster_time as event_time,

    -- Calculate stock change
    full_document.stock_level - COALESCE(full_document_before_change.stock_level, 0) as stock_change

  FROM CHANGE_STREAM ON inventory 
  WHERE operation_type IN ('insert', 'update')
    AND (full_document.stock_level != full_document_before_change.stock_level OR operation_type = 'insert')
),
stock_alerts AS (
  SELECT *,
    CASE 
      WHEN stock_level = 0 THEN 'OUT_OF_STOCK'
      WHEN stock_level <= 10 THEN 'LOW_STOCK' 
      WHEN stock_change > 0 AND previous_stock_level = 0 THEN 'RESTOCKED'
      ELSE 'NORMAL'
    END as alert_type,

    CASE
      WHEN stock_level = 0 THEN 'critical'
      WHEN stock_level <= 10 THEN 'warning'
      WHEN stock_change > 100 THEN 'info'
      ELSE 'normal'
    END as alert_severity

  FROM inventory_changes
)
SELECT 
  item_id,
  item_name,
  stock_level,
  previous_stock_level,
  stock_change,
  alert_type,
  alert_severity,
  event_time,

  -- Generate alert message
  CASE alert_type
    WHEN 'OUT_OF_STOCK' THEN CONCAT('Item ', item_name, ' is now out of stock')
    WHEN 'LOW_STOCK' THEN CONCAT('Item ', item_name, ' is running low (', stock_level, ' remaining)')
    WHEN 'RESTOCKED' THEN CONCAT('Item ', item_name, ' has been restocked (', stock_level, ' units)')
    ELSE CONCAT('Stock updated for ', item_name, ': ', stock_change, ' units')
  END as alert_message

FROM stock_alerts
WHERE alert_type != 'NORMAL'
ORDER BY alert_severity DESC, event_time DESC;

-- Real-time user activity aggregation
WITH CHANGE_STREAM user_events AS (
  SELECT 
    full_document.user_id,
    full_document.activity_type,
    full_document.session_id,
    full_document.timestamp,
    full_document.metadata,
    cluster_time as event_time

  FROM CHANGE_STREAM ON user_activities
  WHERE operation_type = 'insert'
    AND full_document.activity_type IN ('page_view', 'click', 'purchase', 'login')
),
session_aggregations AS (
  SELECT 
    user_id,
    session_id,
    TIME_WINDOW('5 minutes', event_time) as time_window,

    -- Activity counts
    COUNT(*) as total_activities,
    COUNT(*) FILTER (WHERE activity_type = 'page_view') as page_views,
    COUNT(*) FILTER (WHERE activity_type = 'click') as clicks, 
    COUNT(*) FILTER (WHERE activity_type = 'purchase') as purchases,

    -- Session metrics
    MIN(timestamp) as session_start,
    MAX(timestamp) as session_end,
    MAX(timestamp) - MIN(timestamp) as session_duration,

    -- Engagement scoring
    COUNT(DISTINCT metadata.page_url) as unique_pages_visited,
    AVG(EXTRACT(EPOCH FROM (LEAD(timestamp) OVER (ORDER BY timestamp) - timestamp))) as avg_time_between_activities

  FROM user_events
  GROUP BY user_id, session_id, TIME_WINDOW('5 minutes', event_time)
),
user_behavior_insights AS (
  SELECT *,
    -- Engagement level
    CASE 
      WHEN session_duration > INTERVAL '30 minutes' AND clicks > 20 THEN 'highly_engaged'
      WHEN session_duration > INTERVAL '10 minutes' AND clicks > 5 THEN 'engaged'
      WHEN session_duration > INTERVAL '2 minutes' THEN 'browsing'
      ELSE 'quick_visit'
    END as engagement_level,

    -- Conversion indicators
    purchases > 0 as converted_session,
    clicks / GREATEST(page_views, 1) as click_through_rate,

    -- Behavioral patterns
    CASE 
      WHEN unique_pages_visited > 10 THEN 'explorer'
      WHEN avg_time_between_activities > 60 THEN 'reader'
      WHEN clicks > page_views * 2 THEN 'active_clicker'
      ELSE 'standard'
    END as behavior_pattern

  FROM session_aggregations
)
SELECT 
  user_id,
  session_id,
  time_window,
  total_activities,
  page_views,
  clicks,
  purchases,
  session_duration,
  engagement_level,
  behavior_pattern,
  converted_session,
  ROUND(click_through_rate, 3) as ctr,

  -- Real-time recommendations
  CASE behavior_pattern
    WHEN 'explorer' THEN 'Show product recommendations based on browsed categories'
    WHEN 'reader' THEN 'Provide detailed product information and reviews'
    WHEN 'active_clicker' THEN 'Present clear call-to-action buttons and offers'
    ELSE 'Standard personalization approach'
  END as recommendation_strategy

FROM user_behavior_insights
WHERE engagement_level IN ('engaged', 'highly_engaged')
ORDER BY session_start DESC;

-- Event sourcing with change streams
CREATE EVENT_STORE aggregate_events AS
SELECT 
  CAST(cluster_time AS VARCHAR) as event_id,
  operation_type as event_type,
  document_key._id as aggregate_id,
  ns.coll as aggregate_type,
  COALESCE(full_document.version, 1) as event_version,
  full_document as event_data,

  -- Event metadata
  JSON_BUILD_OBJECT(
    'timestamp', cluster_time,
    'source', 'change-stream',
    'causation_id', full_document.causation_id,
    'correlation_id', full_document.correlation_id,
    'user_id', full_document.user_id
  ) as event_metadata

FROM CHANGE_STREAM ON DATABASE
WHERE operation_type IN ('insert', 'update', 'replace')
  AND ns.coll LIKE '%_aggregates'
ORDER BY cluster_time ASC;

-- CQRS read model projections
CREATE MATERIALIZED VIEW user_profile_projection AS
WITH user_events AS (
  SELECT *
  FROM aggregate_events
  WHERE aggregate_type = 'user_aggregates'
    AND event_type IN ('insert', 'update')
  ORDER BY event_version ASC
),
profile_changes AS (
  SELECT 
    aggregate_id as user_id,
    event_data.email,
    event_data.first_name,
    event_data.last_name,
    event_data.preferences,
    event_data.subscription_status,
    event_data.total_orders,
    event_data.lifetime_value,
    event_metadata.timestamp as last_updated,

    -- Calculate derived fields
    ROW_NUMBER() OVER (PARTITION BY aggregate_id ORDER BY event_version DESC) as rn

  FROM user_events
)
SELECT 
  user_id,
  email,
  CONCAT(first_name, ' ', last_name) as full_name,
  preferences,
  subscription_status,
  total_orders,
  lifetime_value,
  last_updated,

  -- User segments
  CASE 
    WHEN lifetime_value > 1000 THEN 'premium'
    WHEN total_orders > 10 THEN 'loyal'
    WHEN total_orders > 0 THEN 'customer'
    ELSE 'prospect'
  END as user_segment,

  -- Activity status
  CASE 
    WHEN last_updated >= CURRENT_TIMESTAMP - INTERVAL '7 days' THEN 'active'
    WHEN last_updated >= CURRENT_TIMESTAMP - INTERVAL '30 days' THEN 'recent'
    WHEN last_updated >= CURRENT_TIMESTAMP - INTERVAL '90 days' THEN 'inactive'
    ELSE 'dormant'
  END as activity_status

FROM profile_changes
WHERE rn = 1; -- Latest version only

-- Saga orchestration monitoring
WITH CHANGE_STREAM saga_events AS (
  SELECT 
    full_document.saga_id,
    full_document.saga_type,
    full_document.status,
    full_document.current_step,
    full_document.steps,
    full_document.started_at,
    full_document.completed_at,
    cluster_time as event_time,
    operation_type

  FROM CHANGE_STREAM ON sagas
  WHERE operation_type IN ('insert', 'update')
),
saga_monitoring AS (
  SELECT 
    saga_id,
    saga_type,
    status,
    current_step,
    ARRAY_LENGTH(steps, 1) as total_steps,
    started_at,
    completed_at,
    event_time,

    -- Progress calculation
    CASE 
      WHEN status = 'completed' THEN 100.0
      WHEN status = 'failed' THEN 0.0
      WHEN total_steps > 0 THEN (current_step::numeric / total_steps) * 100.0
      ELSE 0.0
    END as progress_percentage,

    -- Duration tracking
    CASE 
      WHEN completed_at IS NOT NULL THEN completed_at - started_at
      ELSE CURRENT_TIMESTAMP - started_at
    END as duration,

    -- Status classification
    CASE status
      WHEN 'completed' THEN 'success'
      WHEN 'failed' THEN 'error'
      WHEN 'compensating' THEN 'warning'
      WHEN 'started' THEN 'in_progress'
      ELSE 'unknown'
    END as status_category

  FROM saga_events
),
saga_health AS (
  SELECT 
    saga_type,
    status_category,
    COUNT(*) as saga_count,
    AVG(progress_percentage) as avg_progress,
    AVG(EXTRACT(EPOCH FROM duration)) as avg_duration_seconds,

    -- Performance metrics
    COUNT(*) FILTER (WHERE status = 'completed') as success_count,
    COUNT(*) FILTER (WHERE status = 'failed') as failure_count,
    COUNT(*) FILTER (WHERE duration > INTERVAL '5 minutes') as slow_saga_count

  FROM saga_monitoring
  WHERE event_time >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
  GROUP BY saga_type, status_category
)
SELECT 
  saga_type,
  status_category,
  saga_count,
  ROUND(avg_progress, 1) as avg_progress_pct,
  ROUND(avg_duration_seconds, 2) as avg_duration_sec,
  success_count,
  failure_count,
  slow_saga_count,

  -- Health indicators
  CASE 
    WHEN failure_count > success_count THEN 'unhealthy'
    WHEN slow_saga_count > saga_count * 0.5 THEN 'degraded'
    ELSE 'healthy'
  END as health_status,

  -- Success rate
  CASE 
    WHEN (success_count + failure_count) > 0 
    THEN ROUND((success_count::numeric / (success_count + failure_count)) * 100, 1)
    ELSE 0.0
  END as success_rate_pct

FROM saga_health
ORDER BY saga_type, status_category;

-- Resume token management for fault tolerance
CREATE TABLE change_stream_resume_tokens (
  stream_name VARCHAR(100) PRIMARY KEY,
  resume_token DOCUMENT NOT NULL,
  last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  stream_config DOCUMENT,

  -- Health tracking
  last_event_time TIMESTAMP,
  error_count INTEGER DEFAULT 0,
  restart_count INTEGER DEFAULT 0
);

-- Monitoring and alerting for change streams
WITH stream_health AS (
  SELECT 
    stream_name,
    resume_token,
    last_updated,
    last_event_time,
    error_count,
    restart_count,

    -- Health calculation
    CURRENT_TIMESTAMP - last_event_time as time_since_last_event,
    CURRENT_TIMESTAMP - last_updated as time_since_update,

    CASE 
      WHEN last_event_time IS NULL THEN 'never_active'
      WHEN CURRENT_TIMESTAMP - last_event_time > INTERVAL '5 minutes' THEN 'stalled'
      WHEN error_count > 5 THEN 'error_prone'
      WHEN restart_count > 3 THEN 'unstable'
      ELSE 'healthy'
    END as health_status

  FROM change_stream_resume_tokens
)
SELECT 
  stream_name,
  health_status,
  EXTRACT(EPOCH FROM time_since_last_event) as seconds_since_last_event,
  error_count,
  restart_count,

  -- Alert conditions
  CASE health_status
    WHEN 'never_active' THEN 'Stream has never processed events - check configuration'
    WHEN 'stalled' THEN 'Stream has not processed events recently - investigate connectivity'
    WHEN 'error_prone' THEN 'High error rate - review error logs and handlers'
    WHEN 'unstable' THEN 'Frequent restarts - check resource limits and stability'
    ELSE 'Stream operating normally'
  END as alert_message,

  CASE health_status
    WHEN 'never_active' THEN 'critical'
    WHEN 'stalled' THEN 'warning'  
    WHEN 'error_prone' THEN 'warning'
    WHEN 'unstable' THEN 'info'
    ELSE 'normal'
  END as alert_severity

FROM stream_health
WHERE health_status != 'healthy'
ORDER BY 
  CASE health_status
    WHEN 'never_active' THEN 1
    WHEN 'stalled' THEN 2
    WHEN 'error_prone' THEN 3
    WHEN 'unstable' THEN 4
    ELSE 5
  END;

-- QueryLeaf provides comprehensive change stream capabilities:
-- 1. SQL-familiar change stream creation and management syntax
-- 2. Real-time event processing with filtering and transformation
-- 3. Event-driven architecture patterns (CQRS, Event Sourcing, Sagas)
-- 4. Advanced stream processing with windowed aggregations
-- 5. Fault tolerance with resume token management
-- 6. Health monitoring and alerting for change streams
-- 7. Integration with MongoDB's native change stream optimizations
-- 8. Reactive query patterns for real-time analytics
-- 9. Multi-collection coordination and event correlation
-- 10. Familiar SQL syntax for complex event-driven applications

Best Practices for Change Stream Implementation

Event-Driven Architecture Design

Essential patterns for building robust event-driven systems:

  1. Event Schema Design: Create consistent event schemas with proper versioning and backward compatibility
  2. Resume Token Management: Implement reliable resume token persistence for fault tolerance (see the sketch after this list)
  3. Error Handling: Design comprehensive error handling with retry logic and dead letter queues
  4. Ordering Guarantees: Understand MongoDB's ordering guarantees and design accordingly
  5. Filtering Optimization: Use aggregation pipelines to filter events at the database level
  6. Resource Management: Monitor memory usage and connection limits for change streams
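
The sketch below ties points 2, 3, and 5 together in a single watcher. It is illustrative rather than production-ready: it assumes an existing db handle, an orders collection, and a change_stream_resume_tokens collection, and handleOrderEvent stands in for application-specific processing.

// Minimal sketch: a filtered change stream that persists its resume token
// and restarts with exponential backoff when the stream errors.
async function watchOrders(db, attempt = 0) {
  const tokens = db.collection('change_stream_resume_tokens');
  const saved = await tokens.findOne({ streamName: 'orders' });

  const changeStream = db.collection('orders').watch(
    [
      // Filter on the server so only relevant events cross the wire (point 5)
      { $match: { operationType: 'insert', 'fullDocument.status': 'pending' } }
    ],
    {
      fullDocument: 'updateLookup',
      ...(saved?.resumeToken ? { resumeAfter: saved.resumeToken } : {})
    }
  );

  changeStream.on('change', async (event) => {
    await handleOrderEvent(event); // placeholder for application logic

    // Persist the resume token only after successful processing (point 2)
    await tokens.updateOne(
      { streamName: 'orders' },
      { $set: { resumeToken: event._id, lastUpdated: new Date() } },
      { upsert: true }
    );
  });

  changeStream.on('error', (err) => {
    console.error('orders change stream failed:', err);
    // Exponential backoff before resuming from the saved token (point 3)
    const delayMs = Math.min(30000, 1000 * 2 ** attempt);
    setTimeout(() => watchOrders(db, attempt + 1), delayMs);
  });

  return changeStream;
}

async function handleOrderEvent(event) {
  console.log('new pending order:', event.documentKey._id);
}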

Performance and Scalability

Optimize change streams for high-performance event processing:

  1. Connection Pooling: Use appropriate connection pooling for change stream connections
  2. Batch Processing: Process events in batches where possible to improve throughput, as sketched below
  3. Parallel Processing: Design for parallel event processing while maintaining ordering
  4. Resource Limits: Set appropriate limits on change stream cursors and connections
  5. Monitoring: Implement comprehensive monitoring for stream health and performance
  6. Graceful Degradation: Design fallback mechanisms for change stream failures
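
As a rough illustration of point 2, the following sketch buffers change events in memory and flushes them to a downstream collection in batches. It assumes an existing db handle; the events_raw and events_batched collection names, batch size, and flush interval are illustrative.

// Minimal sketch: batched projection of change events for higher throughput
function startBatchedProjector(db, { batchSize = 500, flushMs = 2000 } = {}) {
  const buffer = [];
  const target = db.collection('events_batched');

  const changeStream = db.collection('events_raw').watch(
    [{ $match: { operationType: 'insert' } }],
    { fullDocument: 'updateLookup' }
  );

  async function flush() {
    if (buffer.length === 0) return;
    const docs = buffer.splice(0); // drain everything currently buffered

    try {
      // Unordered inserts let the server process the batch in parallel
      await target.insertMany(docs, { ordered: false });
    } catch (error) {
      console.error('Batch flush failed, re-queuing documents:', error);
      buffer.unshift(...docs);
    }
  }

  changeStream.on('change', (event) => {
    buffer.push({
      receivedAt: new Date(),
      source: event.ns,
      doc: event.fullDocument
    });
    if (buffer.length >= batchSize) flush();
  });

  const timer = setInterval(flush, flushMs);

  return {
    async stop() {
      clearInterval(timer);
      await changeStream.close();
      await flush(); // drain any remaining events
    }
  };
}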

Conclusion

MongoDB Change Streams provide native event-driven architecture capabilities that eliminate the complexity and limitations of traditional polling and trigger-based approaches. The ability to react to data changes in real-time with ordered, resumable event streams makes building responsive, scalable applications both powerful and elegant.

Key Change Streams benefits include:

  • Real-Time Reactivity: Instant response to data changes without polling overhead
  • Ordered Event Processing: Guaranteed ordering within shards with resume token support
  • Scalable Architecture: Works seamlessly across replica sets and sharded clusters
  • Rich Filtering: Aggregation pipeline support for sophisticated event filtering and transformation
  • Fault Tolerance: Built-in resume capabilities and error handling for production reliability
  • Ecosystem Integration: Native integration with MongoDB's ACID transactions and tooling

Whether you're building microservices architectures, real-time dashboards, event sourcing systems, or any application requiring immediate response to data changes, MongoDB Change Streams with QueryLeaf's familiar SQL interface provides the foundation for modern event-driven applications.

QueryLeaf Integration: QueryLeaf seamlessly manages MongoDB Change Streams while providing SQL-familiar event processing syntax, change detection patterns, and reactive query capabilities. Advanced event-driven architecture patterns including CQRS, Event Sourcing, and Sagas are elegantly handled through familiar SQL constructs, making sophisticated reactive applications both powerful and accessible to SQL-oriented development teams.

The combination of native change stream capabilities with SQL-style event processing makes MongoDB an ideal platform for applications requiring both real-time responsiveness and familiar database interaction patterns, ensuring your event-driven solutions remain both effective and maintainable as they evolve and scale.

MongoDB Capped Collections and Circular Buffers: High-Performance Logging and Event Storage with SQL-Style Data Management

High-performance applications generate massive volumes of log data, events, and operational metrics that require specialized storage patterns optimized for write-heavy workloads, automatic size management, and chronological data access. Traditional database approaches for logging and event storage struggle with write performance bottlenecks, complex rotation mechanisms, and inefficient space utilization when dealing with continuous data streams.

MongoDB Capped Collections provide purpose-built capabilities for circular buffer patterns, offering fixed-size collections with automatic document rotation, natural insertion-order preservation, and optimized write performance. Unlike traditional logging solutions that require complex partitioning schemes or external rotation tools, capped collections automatically manage storage limits while maintaining chronological access patterns essential for debugging, monitoring, and real-time analytics.
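
Before comparing this with traditional approaches, a small sketch makes the rotation behavior concrete. It assumes a local mongod and the Node.js driver; the collection name and the deliberately tiny size limits are illustrative.

// Small illustration of capped-collection rotation
const { MongoClient } = require('mongodb');

async function demoCappedRotation() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('capped_demo');

  // Fixed-size circular buffer: at most ~64KB or 100 documents, whichever is hit first
  await db.createCollection('demo_log', { capped: true, size: 64 * 1024, max: 100 });
  const log = db.collection('demo_log');

  // Insert more documents than the cap allows
  for (let i = 0; i < 150; i++) {
    await log.insertOne({ seq: i, message: `entry ${i}`, ts: new Date() });
  }

  // Reads return insertion (natural) order; the oldest 50 entries have rotated out
  const remaining = await log.find().toArray();
  console.log(remaining[0].seq, '...', remaining[remaining.length - 1].seq); // 50 ... 149

  await client.close();
}

demoCappedRotation().catch(console.error);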

The Traditional Logging Storage Challenge

Conventional approaches to high-volume logging and event storage have significant limitations for modern applications:

-- Traditional relational logging approach - complex and performance-limited

-- PostgreSQL log storage with manual partitioning and rotation
CREATE TABLE application_logs (
    log_id BIGSERIAL PRIMARY KEY,
    application_name VARCHAR(100) NOT NULL,
    service_name VARCHAR(100) NOT NULL,
    instance_id VARCHAR(100),
    log_level VARCHAR(20) NOT NULL,
    message TEXT NOT NULL,

    -- Structured log data
    request_id VARCHAR(100),
    user_id BIGINT,
    session_id VARCHAR(100),
    trace_id VARCHAR(100),
    span_id VARCHAR(100),

    -- Context information  
    source_file VARCHAR(255),
    source_line INTEGER,
    function_name VARCHAR(255),
    thread_id INTEGER,

    -- Metadata
    hostname VARCHAR(255),
    environment VARCHAR(50),
    version VARCHAR(50),

    -- Log data
    log_data JSONB,
    error_stack TEXT,

    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Partitioning key
    partition_date DATE GENERATED ALWAYS AS (created_at::date) STORED

) PARTITION BY RANGE (partition_date);

-- Create monthly partitions (manual maintenance required)
CREATE TABLE application_logs_2024_01 PARTITION OF application_logs
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE application_logs_2024_02 PARTITION OF application_logs  
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE TABLE application_logs_2024_03 PARTITION OF application_logs
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
-- ... manual partition creation continues

-- Indexes for log queries (high overhead on writes)
CREATE INDEX idx_logs_app_service_time ON application_logs (application_name, service_name, created_at);
CREATE INDEX idx_logs_level_time ON application_logs (log_level, created_at);
CREATE INDEX idx_logs_request_id ON application_logs (request_id) WHERE request_id IS NOT NULL;
CREATE INDEX idx_logs_user_id_time ON application_logs (user_id, created_at) WHERE user_id IS NOT NULL;
CREATE INDEX idx_logs_trace_id ON application_logs (trace_id) WHERE trace_id IS NOT NULL;

-- Complex log rotation and cleanup procedure
CREATE OR REPLACE FUNCTION cleanup_old_log_partitions()
RETURNS void AS $$
DECLARE
    partition_name TEXT;
    cutoff_date DATE;
BEGIN
    -- Calculate cutoff date (e.g., 90 days retention)
    cutoff_date := CURRENT_DATE - INTERVAL '90 days';

    -- Find and drop old partitions
    FOR partition_name IN 
        SELECT schemaname||'.'||tablename 
        FROM pg_tables 
        WHERE tablename ~ '^application_logs_[0-9]{4}_[0-9]{2}$'
        AND tablename < 'application_logs_' || to_char(cutoff_date, 'YYYY_MM')
    LOOP
        EXECUTE 'DROP TABLE IF EXISTS ' || partition_name || ' CASCADE';
        RAISE NOTICE 'Dropped old partition: %', partition_name;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Schedule cleanup job (requires external scheduler)
-- SELECT cron.schedule('cleanup-logs', '0 2 * * 0', 'SELECT cleanup_old_log_partitions();');

-- Complex log analysis query with performance issues
WITH recent_logs AS (
    SELECT 
        application_name,
        service_name,
        log_level,
        message,
        request_id,
        user_id,
        trace_id,
        log_data,
        created_at,

        -- Row number for chronological ordering
        ROW_NUMBER() OVER (
            PARTITION BY application_name, service_name 
            ORDER BY created_at DESC
        ) as rn,

        -- Lag for time between log entries
        LAG(created_at) OVER (
            PARTITION BY application_name, service_name 
            ORDER BY created_at
        ) as prev_log_time

    FROM application_logs
    WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
      AND log_level IN ('ERROR', 'WARN', 'INFO')
),
error_analysis AS (
    SELECT 
        application_name,
        service_name,
        COUNT(*) as total_logs,
        COUNT(*) FILTER (WHERE log_level = 'ERROR') as error_count,
        COUNT(*) FILTER (WHERE log_level = 'WARN') as warning_count,
        COUNT(*) FILTER (WHERE log_level = 'INFO') as info_count,

        -- Error patterns
        array_agg(DISTINCT message) FILTER (WHERE log_level = 'ERROR') as error_messages,
        COUNT(DISTINCT request_id) as unique_requests,
        COUNT(DISTINCT user_id) as affected_users,

        -- Timing analysis
        AVG(EXTRACT(EPOCH FROM (created_at - prev_log_time))) as avg_log_interval,

        -- Recent errors for immediate attention
        array_agg(
            json_build_object(
                'message', message,
                'created_at', created_at,
                'trace_id', trace_id,
                'request_id', request_id
            ) ORDER BY created_at DESC
        ) FILTER (WHERE log_level = 'ERROR' AND rn <= 10) as recent_errors

    FROM recent_logs
    GROUP BY application_name, service_name
),
log_volume_trends AS (
    SELECT 
        application_name,
        service_name,
        DATE_TRUNC('minute', created_at) as minute_bucket,
        COUNT(*) as logs_per_minute,
        COUNT(*) FILTER (WHERE log_level = 'ERROR') as errors_per_minute
    FROM application_logs
    WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '2 hours'
    GROUP BY application_name, service_name, DATE_TRUNC('minute', created_at)
)
SELECT 
    ea.application_name,
    ea.service_name,
    ea.total_logs,
    ea.error_count,
    ea.warning_count,
    ea.info_count,
    ROUND((ea.error_count::numeric / ea.total_logs) * 100, 2) as error_rate_percent,
    ea.unique_requests,
    ea.affected_users,
    ROUND(ea.avg_log_interval::numeric, 3) as avg_seconds_between_logs,

    -- Volume trend analysis
    (
        SELECT AVG(logs_per_minute)
        FROM log_volume_trends lvt 
        WHERE lvt.application_name = ea.application_name 
          AND lvt.service_name = ea.service_name
    ) as avg_logs_per_minute,

    (
        SELECT MAX(logs_per_minute)
        FROM log_volume_trends lvt
        WHERE lvt.application_name = ea.application_name
          AND lvt.service_name = ea.service_name  
    ) as peak_logs_per_minute,

    -- Top error messages
    (
        SELECT string_agg(error_msg, '; ') 
        FROM unnest(ea.error_messages) as error_msg
        LIMIT 3
    ) as top_error_messages,

    ea.recent_errors

FROM error_analysis ea
ORDER BY ea.error_count DESC, ea.total_logs DESC;

-- Problems with traditional logging approach:
-- 1. Complex partition management and maintenance overhead
-- 2. Write performance degradation with increasing indexes
-- 3. Manual log rotation and cleanup procedures
-- 4. Storage space management challenges
-- 5. Query performance issues across multiple partitions
-- 6. Complex chronological ordering requirements
-- 7. High operational overhead for high-volume logging
-- 8. Scalability limitations with increasing log volumes
-- 9. Backup and restore complexity with partitioned tables
-- 10. Limited flexibility for varying log data structures

-- MySQL logging limitations (even more restrictive)
CREATE TABLE mysql_logs (
    id BIGINT AUTO_INCREMENT,
    app_name VARCHAR(100),
    level VARCHAR(20),
    message TEXT,
    log_data JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- MySQL partitioning limitation: the partition column must appear in every unique key
    PRIMARY KEY (id, created_at),
    INDEX idx_time_level (created_at, level),
    INDEX idx_app_time (app_name, created_at)
) 
-- Basic range partitioning (limited functionality)
PARTITION BY RANGE (UNIX_TIMESTAMP(created_at)) (
    PARTITION p2024_q1 VALUES LESS THAN (UNIX_TIMESTAMP('2024-04-01')),
    PARTITION p2024_q2 VALUES LESS THAN (UNIX_TIMESTAMP('2024-07-01')),
    PARTITION p2024_q3 VALUES LESS THAN (UNIX_TIMESTAMP('2024-10-01')),
    PARTITION p2024_q4 VALUES LESS THAN (UNIX_TIMESTAMP('2025-01-01'))
);

-- Basic log query in MySQL (limited analytical capabilities)
SELECT 
    app_name,
    level,
    COUNT(*) as log_count,
    MAX(created_at) as latest_log
FROM mysql_logs
WHERE created_at >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
  AND level IN ('ERROR', 'WARN')
GROUP BY app_name, level
ORDER BY log_count DESC
LIMIT 20;

-- MySQL limitations:
-- - Limited JSON functionality compared to PostgreSQL
-- - Basic partitioning capabilities only  
-- - Poor performance with high-volume inserts
-- - Limited analytical query capabilities
-- - No advanced window functions
-- - Complex maintenance procedures
-- - Storage engine limitations for write-heavy workloads

MongoDB Capped Collections provide optimized circular buffer capabilities:

// MongoDB Capped Collections - purpose-built for high-performance logging
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('logging_platform');

// Create capped collections for different log types and performance requirements
const createOptimizedCappedCollections = async () => {
  try {
    // High-volume application logs - 1GB circular buffer
    await db.createCollection('application_logs', {
      capped: true,
      size: 1024 * 1024 * 1024, // 1GB maximum size
      max: 10000000 // Maximum 10 million documents (optional limit)
    });

    // Error logs - smaller, longer retention
    await db.createCollection('error_logs', {
      capped: true,
      size: 256 * 1024 * 1024, // 256MB maximum size
      max: 1000000 // Maximum 1 million error documents
    });

    // Access logs - high throughput, shorter retention
    await db.createCollection('access_logs', {
      capped: true,
      size: 2 * 1024 * 1024 * 1024, // 2GB maximum size
      // No max document limit for maximum throughput
    });

    // Performance metrics - structured time-series data
    await db.createCollection('performance_metrics', {
      capped: true,
      size: 512 * 1024 * 1024, // 512MB maximum size
      max: 5000000 // Maximum 5 million metric points
    });

    // Audit trail - compliance and security logs
    await db.createCollection('audit_logs', {
      capped: true,
      size: 128 * 1024 * 1024, // 128MB maximum size
      max: 500000 // Maximum 500k audit events
    });

    console.log('Capped collections created successfully');

    // Create indexes for common query patterns (minimal overhead)
    await createOptimalIndexes();

    return {
      applicationLogs: db.collection('application_logs'),
      errorLogs: db.collection('error_logs'),
      accessLogs: db.collection('access_logs'),
      performanceMetrics: db.collection('performance_metrics'),
      auditLogs: db.collection('audit_logs')
    };

  } catch (error) {
    console.error('Error creating capped collections:', error);
    throw error;
  }
};

async function createOptimalIndexes() {
  // Minimal indexes for capped collections to maintain write performance
  // Note: Capped collections maintain insertion order automatically

  // Application logs - service and level queries
  await db.collection('application_logs').createIndex({ 
    'service': 1, 
    'level': 1 
  });

  // Error logs - application and timestamp queries
  await db.collection('error_logs').createIndex({ 
    'application': 1, 
    'timestamp': -1 
  });

  // Access logs - endpoint performance analysis
  await db.collection('access_logs').createIndex({ 
    'endpoint': 1, 
    'status_code': 1 
  });

  // Performance metrics - metric type and timestamp
  await db.collection('performance_metrics').createIndex({ 
    'metric_type': 1, 
    'instance_id': 1 
  });

  // Audit logs - user and action queries
  await db.collection('audit_logs').createIndex({ 
    'user_id': 1, 
    'action': 1 
  });

  console.log('Optimal indexes created for capped collections');
}

// High-performance log ingestion with batch processing
const logIngestionSystem = {
  collections: null,
  buffers: new Map(),
  batchSizes: {
    application_logs: 1000,
    error_logs: 100,
    access_logs: 2000,
    performance_metrics: 500,
    audit_logs: 50
  },
  flushIntervals: new Map(),

  async initialize() {
    this.collections = await createOptimizedCappedCollections();

    // Start batch flush timers for each collection
    for (const [collectionName, batchSize] of Object.entries(this.batchSizes)) {
      this.buffers.set(collectionName, []);

      // Flush timer based on expected volume
      const flushInterval = collectionName === 'access_logs' ? 1000 : // 1 second
                           collectionName === 'application_logs' ? 2000 : // 2 seconds
                           5000; // 5 seconds for others

      const intervalId = setInterval(
        () => this.flushBuffer(collectionName), 
        flushInterval
      );

      this.flushIntervals.set(collectionName, intervalId);
    }

    console.log('Log ingestion system initialized');
  },

  async logApplicationEvent(logEntry) {
    // Structured application log entry
    const document = {
      timestamp: new Date(),
      application: logEntry.application || 'unknown',
      service: logEntry.service || 'unknown',
      instance: logEntry.instance || process.env.HOSTNAME || 'unknown',
      level: logEntry.level || 'INFO',
      message: logEntry.message,

      // Request context
      request: {
        id: logEntry.requestId,
        method: logEntry.method,
        endpoint: logEntry.endpoint,
        user_id: logEntry.userId,
        session_id: logEntry.sessionId,
        ip_address: logEntry.ipAddress
      },

      // Trace context
      trace: {
        trace_id: logEntry.traceId,
        span_id: logEntry.spanId,
        parent_span_id: logEntry.parentSpanId,
        flags: logEntry.traceFlags
      },

      // Source information
      source: {
        file: logEntry.sourceFile,
        line: logEntry.sourceLine,
        function: logEntry.functionName,
        thread: logEntry.threadId
      },

      // Environment context
      environment: {
        name: logEntry.environment || process.env.NODE_ENV || 'development',
        version: logEntry.version || process.env.APP_VERSION || '1.0.0',
        build: logEntry.build || process.env.BUILD_ID,
        commit: logEntry.commit || process.env.GIT_COMMIT
      },

      // Structured data
      data: logEntry.data || {},

      // Performance metrics
      metrics: {
        duration_ms: logEntry.duration,
        memory_mb: logEntry.memoryUsage,
        cpu_percent: logEntry.cpuUsage
      },

      // Error context (if applicable)
      error: logEntry.error ? {
        name: logEntry.error.name,
        message: logEntry.error.message,
        stack: logEntry.error.stack,
        code: logEntry.error.code,
        details: logEntry.error.details
      } : null
    };

    await this.bufferDocument('application_logs', document);
  },

  async logAccessEvent(accessEntry) {
    // HTTP access log optimized for high throughput
    const document = {
      timestamp: new Date(),

      // Request details
      method: accessEntry.method,
      endpoint: accessEntry.endpoint,
      path: accessEntry.path,
      query_string: accessEntry.queryString,

      // Response details
      status_code: accessEntry.statusCode,
      response_size: accessEntry.responseSize,
      content_type: accessEntry.contentType,

      // Timing information
      duration_ms: accessEntry.duration,
      queue_time_ms: accessEntry.queueTime,
      process_time_ms: accessEntry.processTime,

      // Client information
      client: {
        ip: accessEntry.clientIp,
        user_agent: accessEntry.userAgent,
        referer: accessEntry.referer,
        user_id: accessEntry.userId,
        session_id: accessEntry.sessionId
      },

      // Geographic data (if available)
      geo: accessEntry.geo ? {
        country: accessEntry.geo.country,
        region: accessEntry.geo.region,
        city: accessEntry.geo.city,
        coordinates: accessEntry.geo.coordinates
      } : null,

      // Application context
      application: accessEntry.application,
      service: accessEntry.service,
      instance: accessEntry.instance || process.env.HOSTNAME,
      version: accessEntry.version,

      // Cache information
      cache: {
        hit: accessEntry.cacheHit,
        key: accessEntry.cacheKey,
        ttl: accessEntry.cacheTTL
      },

      // Load balancing and routing
      routing: {
        backend: accessEntry.backend,
        upstream_time: accessEntry.upstreamTime,
        retry_count: accessEntry.retryCount
      }
    };

    await this.bufferDocument('access_logs', document);
  },

  async logPerformanceMetric(metricEntry) {
    // System and application performance metrics
    const document = {
      timestamp: new Date(),

      metric_type: metricEntry.type, // 'cpu', 'memory', 'disk', 'network', 'application'
      metric_name: metricEntry.name,
      value: metricEntry.value,
      unit: metricEntry.unit,

      // Instance information
      instance_id: metricEntry.instanceId || process.env.HOSTNAME,
      application: metricEntry.application,
      service: metricEntry.service,

      // Dimensional metadata
      dimensions: metricEntry.dimensions || {},

      // Aggregation information
      aggregation: {
        type: metricEntry.aggregationType, // 'gauge', 'counter', 'histogram', 'summary'
        interval_seconds: metricEntry.intervalSeconds,
        sample_count: metricEntry.sampleCount
      },

      // Statistical data (for histograms/summaries)
      statistics: metricEntry.statistics ? {
        min: metricEntry.statistics.min,
        max: metricEntry.statistics.max,
        mean: metricEntry.statistics.mean,
        median: metricEntry.statistics.median,
        p95: metricEntry.statistics.p95,
        p99: metricEntry.statistics.p99,
        std_dev: metricEntry.statistics.stdDev
      } : null,

      // Alerts and thresholds
      alerts: {
        warning_threshold: metricEntry.warningThreshold,
        critical_threshold: metricEntry.criticalThreshold,
        is_anomaly: metricEntry.isAnomaly,
        anomaly_score: metricEntry.anomalyScore
      }
    };

    await this.bufferDocument('performance_metrics', document);
  },

  async logAuditEvent(auditEntry) {
    // Security and compliance audit logging
    const document = {
      timestamp: new Date(),

      // Event classification
      event_type: auditEntry.eventType, // 'authentication', 'authorization', 'data_access', 'configuration'
      event_category: auditEntry.category, // 'security', 'compliance', 'operational'
      severity: auditEntry.severity || 'INFO',

      // Actor information
      actor: {
        user_id: auditEntry.userId,
        username: auditEntry.username,
        email: auditEntry.email,
        roles: auditEntry.roles || [],
        groups: auditEntry.groups || [],
        is_service_account: auditEntry.isServiceAccount || false,
        authentication_method: auditEntry.authMethod
      },

      // Target resource
      target: {
        resource_type: auditEntry.resourceType,
        resource_id: auditEntry.resourceId,
        resource_name: auditEntry.resourceName,
        owner: auditEntry.resourceOwner,
        classification: auditEntry.dataClassification
      },

      // Action details
      action: {
        type: auditEntry.action, // 'create', 'read', 'update', 'delete', 'login', 'logout'
        description: auditEntry.description,
        result: auditEntry.result, // 'success', 'failure', 'partial'
        reason: auditEntry.reason
      },

      // Request context
      request: {
        id: auditEntry.requestId,
        source_ip: auditEntry.sourceIp,
        user_agent: auditEntry.userAgent,
        session_id: auditEntry.sessionId,
        api_key: auditEntry.apiKey ? 'REDACTED' : null
      },

      // Data changes (for modification events)
      changes: auditEntry.changes ? {
        before: auditEntry.changes.before,
        after: auditEntry.changes.after,
        fields_changed: auditEntry.changes.fieldsChanged || []
      } : null,

      // Compliance and regulatory
      compliance: {
        regulation: auditEntry.regulation, // 'GDPR', 'SOX', 'HIPAA', 'PCI-DSS'
        retention_period: auditEntry.retentionPeriod,
        encryption_required: auditEntry.encryptionRequired || false
      },

      // Application context
      application: auditEntry.application,
      service: auditEntry.service,
      environment: auditEntry.environment
    };

    await this.bufferDocument('audit_logs', document);
  },

  async bufferDocument(collectionName, document) {
    const buffer = this.buffers.get(collectionName);
    if (!buffer) {
      console.error(`Unknown collection: ${collectionName}`);
      return;
    }

    buffer.push(document);

    // Flush buffer if it reaches batch size
    if (buffer.length >= this.batchSizes[collectionName]) {
      await this.flushBuffer(collectionName);
    }
  },

  async flushBuffer(collectionName) {
    const buffer = this.buffers.get(collectionName);
    if (!buffer || buffer.length === 0) {
      return;
    }

    // Move buffer contents to local array and clear buffer
    const documents = buffer.splice(0);

    try {
      const collection = this.collections[this.getCollectionProperty(collectionName)];
      if (!collection) {
        console.error(`Collection not found: ${collectionName}`);
        return;
      }

      // High-performance batch insert
      const result = await collection.insertMany(documents, {
        ordered: false, // Allow parallel inserts
        writeConcern: { w: 1, j: false } // Optimize for speed
      });

      if (result.insertedCount !== documents.length) {
        console.warn(`Partial insert: ${result.insertedCount}/${documents.length} documents inserted to ${collectionName}`);
      }

    } catch (error) {
      console.error(`Error flushing buffer for ${collectionName}:`, error);

      // Re-add documents to buffer for retry (optional)
      if (error.code !== 11000) { // Not a duplicate key error
        buffer.unshift(...documents);
      }
    }
  },

  getCollectionProperty(collectionName) {
    const mapping = {
      'application_logs': 'applicationLogs',
      'error_logs': 'errorLogs',
      'access_logs': 'accessLogs',
      'performance_metrics': 'performanceMetrics',
      'audit_logs': 'auditLogs'
    };
    return mapping[collectionName];
  },

  async shutdown() {
    console.log('Shutting down log ingestion system...');

    // Clear all flush intervals
    for (const intervalId of this.flushIntervals.values()) {
      clearInterval(intervalId);
    }

    // Flush all remaining buffers
    const flushPromises = [];
    for (const collectionName of this.buffers.keys()) {
      flushPromises.push(this.flushBuffer(collectionName));
    }

    await Promise.all(flushPromises);

    console.log('Log ingestion system shutdown complete');
  }
};

// Advanced log analysis and monitoring
const logAnalysisEngine = {
  collections: null,

  async initialize(collections) {
    this.collections = collections;
  },

  async analyzeRecentErrors(timeRangeMinutes = 60) {
    console.log(`Analyzing errors from last ${timeRangeMinutes} minutes...`);

    const cutoffTime = new Date(Date.now() - timeRangeMinutes * 60 * 1000);

    const errorAnalysis = await this.collections.applicationLogs.aggregate([
      {
        $match: {
          timestamp: { $gte: cutoffTime },
          level: { $in: ['ERROR', 'FATAL'] }
        }
      },

      // Group by error patterns
      {
        $group: {
          _id: {
            application: '$application',
            service: '$service',
            errorMessage: {
              $substr: ['$message', 0, 100] // Truncate for grouping
            }
          },

          count: { $sum: 1 },
          firstOccurrence: { $min: '$timestamp' },
          lastOccurrence: { $max: '$timestamp' },
          affectedInstances: { $addToSet: '$instance' },
          affectedUsers: { $addToSet: '$request.user_id' },

          // Sample error details
          sampleErrors: {
            $push: {
              timestamp: '$timestamp',
              message: '$message',
              request_id: '$request.id',
              trace_id: '$trace.trace_id',
              stack: '$error.stack'
            }
          }
        }
      },

      // Calculate error characteristics
      {
        $addFields: {
          duration: {
            $divide: [
              { $subtract: ['$lastOccurrence', '$firstOccurrence'] },
              1000 // Convert to seconds
            ]
          },
          errorRate: {
            $divide: ['$count', timeRangeMinutes] // Errors per minute
          },
          instanceCount: { $size: '$affectedInstances' },
          userCount: { $size: '$affectedUsers' },

          // Take only recent sample errors
          recentSamples: { $slice: ['$sampleErrors', -5] }
        }
      },

      // Sort by error frequency and recency
      {
        $sort: {
          count: -1,
          lastOccurrence: -1
        }
      },

      {
        $limit: 50 // Top 50 error patterns
      },

      // Format for analysis output
      {
        $project: {
          application: '$_id.application',
          service: '$_id.service',
          errorPattern: '$_id.errorMessage',
          count: 1,
          errorRate: { $round: ['$errorRate', 2] },
          duration: { $round: ['$duration', 1] },
          firstOccurrence: 1,
          lastOccurrence: 1,
          instanceCount: 1,
          userCount: 1,
          affectedInstances: 1,
          recentSamples: 1,

          // Severity assessment
          severity: {
            $switch: {
              branches: [
                {
                  case: { $gt: ['$errorRate', 10] }, // > 10 errors/minute
                  then: 'CRITICAL'
                },
                {
                  case: { $gt: ['$errorRate', 5] }, // > 5 errors/minute
                  then: 'HIGH'
                },
                {
                  case: { $gt: ['$errorRate', 1] }, // > 1 error/minute
                  then: 'MEDIUM'
                }
              ],
              default: 'LOW'
            }
          }
        }
      }
    ]).toArray();

    console.log(`Found ${errorAnalysis.length} error patterns`);
    return errorAnalysis;
  },

  async analyzeAccessPatterns(timeRangeMinutes = 30) {
    console.log(`Analyzing access patterns from last ${timeRangeMinutes} minutes...`);

    const cutoffTime = new Date(Date.now() - timeRangeMinutes * 60 * 1000);

    const accessAnalysis = await this.collections.accessLogs.aggregate([
      {
        $match: {
          timestamp: { $gte: cutoffTime }
        }
      },

      // Group by endpoint and status
      {
        $group: {
          _id: {
            endpoint: '$endpoint',
            method: '$method',
            statusClass: {
              $switch: {
                branches: [
                  { case: { $lt: ['$status_code', 300] }, then: '2xx' },
                  { case: { $lt: ['$status_code', 400] }, then: '3xx' },
                  { case: { $lt: ['$status_code', 500] }, then: '4xx' },
                  { case: { $gte: ['$status_code', 500] }, then: '5xx' }
                ],
                default: 'unknown'
              }
            }
          },

          requestCount: { $sum: 1 },
          avgDuration: { $avg: '$duration_ms' },
          minDuration: { $min: '$duration_ms' },
          maxDuration: { $max: '$duration_ms' },

          // Percentile approximations
          durations: { $push: '$duration_ms' },

          totalResponseSize: { $sum: '$response_size' },
          uniqueClients: { $addToSet: '$client.ip' },
          uniqueUsers: { $addToSet: '$client.user_id' },

          // Error details for non-2xx responses
          errorSamples: {
            $push: {
              $cond: [
                { $gte: ['$status_code', 400] },
                {
                  timestamp: '$timestamp',
                  status: '$status_code',
                  client_ip: '$client.ip',
                  user_id: '$client.user_id',
                  duration: '$duration_ms'
                },
                null
              ]
            }
          }
        }
      },

      // Calculate additional metrics
      {
        $addFields: {
          requestsPerMinute: { $divide: ['$requestCount', timeRangeMinutes] },
          avgResponseSize: { $divide: ['$totalResponseSize', '$requestCount'] },
          uniqueClientCount: { $size: '$uniqueClients' },
          uniqueUserCount: { $size: '$uniqueUsers' },

          // Filter out null error samples
          errorSamples: {
            $filter: {
              input: '$errorSamples',
              cond: { $ne: ['$$this', null] }
            }
          },

          // Approximate percentiles (simplified)
          p95Duration: {
            $let: {
              vars: {
                sortedDurations: {
                  $sortArray: {
                    input: '$durations',
                    sortBy: 1
                  }
                }
              },
              in: {
                $arrayElemAt: [
                  '$$sortedDurations',
                  { $floor: { $multiply: [{ $size: '$$sortedDurations' }, 0.95] } }
                ]
              }
            }
          }
        }
      },

      // Sort by request volume
      {
        $sort: {
          requestCount: -1
        }
      },

      {
        $limit: 100 // Top 100 endpoints
      },

      // Format output
      {
        $project: {
          endpoint: '$_id.endpoint',
          method: '$_id.method',
          statusClass: '$_id.statusClass',
          requestCount: 1,
          requestsPerMinute: { $round: ['$requestsPerMinute', 2] },
          avgDuration: { $round: ['$avgDuration', 1] },
          minDuration: 1,
          maxDuration: 1,
          p95Duration: { $round: ['$p95Duration', 1] },
          avgResponseSize: { $round: ['$avgResponseSize', 0] },
          uniqueClientCount: 1,
          uniqueUserCount: 1,
          errorSamples: { $slice: ['$errorSamples', -5] }, // Last 5 error samples

          // Performance assessment
          performanceStatus: {
            $switch: {
              branches: [
                {
                  case: { $gt: ['$avgDuration', 5000] }, // > 5 seconds
                  then: 'SLOW'
                },
                {
                  case: { $gt: ['$avgDuration', 2000] }, // > 2 seconds
                  then: 'WARNING'
                }
              ],
              default: 'NORMAL'
            }
          }
        }
      }
    ]).toArray();

    console.log(`Analyzed ${accessAnalysis.length} endpoint patterns`);
    return accessAnalysis;
  },

  async generatePerformanceReport(timeRangeMinutes = 60) {
    console.log(`Generating performance report for last ${timeRangeMinutes} minutes...`);

    const cutoffTime = new Date(Date.now() - timeRangeMinutes * 60 * 1000);

    const performanceReport = await this.collections.performanceMetrics.aggregate([
      {
        $match: {
          timestamp: { $gte: cutoffTime }
        }
      },

      // Group by metric type and instance
      {
        $group: {
          _id: {
            metricType: '$metric_type',
            metricName: '$metric_name',
            instanceId: '$instance_id'
          },

          sampleCount: { $sum: 1 },
          avgValue: { $avg: '$value' },
          minValue: { $min: '$value' },
          maxValue: { $max: '$value' },
          latestValue: { $last: '$value' },

          // Time series data for trending
          timeSeries: {
            $push: {
              timestamp: '$timestamp',
              value: '$value'
            }
          },

          // Alert information
          alertCount: {
            $sum: {
              $cond: [
                {
                  // Guard against missing thresholds, which compare as null and would
                  // otherwise count every sample as an alert
                  $or: [
                    {
                      $and: [
                        { $ne: ['$alerts.critical_threshold', null] },
                        { $gte: ['$value', '$alerts.critical_threshold'] }
                      ]
                    },
                    {
                      $and: [
                        { $ne: ['$alerts.warning_threshold', null] },
                        { $gte: ['$value', '$alerts.warning_threshold'] }
                      ]
                    }
                  ]
                },
                1,
                0
              ]
            }
          }
        }
      },

      // Calculate trend and status
      {
        $addFields: {
          // Simple trend calculation (comparing first and last values)
          trend: {
            $let: {
              vars: {
                firstValue: { $arrayElemAt: ['$timeSeries', 0] },
                lastValue: { $arrayElemAt: ['$timeSeries', -1] }
              },
              in: {
                $cond: [
                  { $gt: ['$$lastValue.value', '$$firstValue.value'] },
                  'INCREASING',
                  {
                    $cond: [
                      { $lt: ['$$lastValue.value', '$$firstValue.value'] },
                      'DECREASING',
                      'STABLE'
                    ]
                  }
                ]
              }
            }
          },

          // Alert status
          alertStatus: {
            $cond: [
              { $gt: ['$alertCount', 0] },
              'ALERTS_TRIGGERED',
              'NORMAL'
            ]
          }
        }
      },

      // Group by metric type for summary
      {
        $group: {
          _id: '$_id.metricType',

          metrics: {
            $push: {
              name: '$_id.metricName',
              instance: '$_id.instanceId',
              sampleCount: '$sampleCount',
              avgValue: '$avgValue',
              minValue: '$minValue',
              maxValue: '$maxValue',
              latestValue: '$latestValue',
              trend: '$trend',
              alertStatus: '$alertStatus',
              alertCount: '$alertCount'
            }
          },

          totalSamples: { $sum: '$sampleCount' },
          instanceCount: { $addToSet: '$_id.instanceId' },
          totalAlerts: { $sum: '$alertCount' }
        }
      },

      {
        $addFields: {
          instanceCount: { $size: '$instanceCount' }
        }
      },

      {
        $sort: { _id: 1 }
      }
    ]).toArray();

    console.log(`Performance report generated for ${performanceReport.length} metric types`);
    return performanceReport;
  },

  async getTailLogs(collectionName, limit = 100) {
    // Get most recent logs (natural order in capped collections)
    const collection = this.collections[this.getCollectionProperty(collectionName)];
    if (!collection) {
      throw new Error(`Collection not found: ${collectionName}`);
    }

    // Capped collections maintain insertion order, so we can use natural order
    const logs = await collection.find()
      .sort({ $natural: -1 }) // Reverse natural order (most recent first)
      .limit(limit)
      .toArray();

    return logs.reverse(); // Return in chronological order (oldest first)
  },

  getCollectionProperty(collectionName) {
    const mapping = {
      'application_logs': 'applicationLogs',
      'error_logs': 'errorLogs', 
      'access_logs': 'accessLogs',
      'performance_metrics': 'performanceMetrics',
      'audit_logs': 'auditLogs'
    };
    return mapping[collectionName];
  }
};

// Benefits of MongoDB Capped Collections:
// - Automatic size management with guaranteed space limits
// - Natural insertion order preservation without indexes
// - Optimized write performance for high-throughput logging
// - Circular buffer behavior with automatic old document removal
// - No fragmentation or maintenance overhead
// - Tailable cursors for real-time log streaming
// - Atomic document rotation without application logic
// - Consistent performance regardless of collection size
// - Integration with MongoDB ecosystem and tools
// - Built-in clustering and replication support

module.exports = {
  createOptimizedCappedCollections,
  logIngestionSystem,
  logAnalysisEngine
};

Understanding MongoDB Capped Collections Architecture

Advanced Capped Collection Management and Patterns

Implement sophisticated capped collection strategies for different logging scenarios:

// Advanced capped collection management system
class CappedCollectionManager {
  constructor(db, options = {}) {
    this.db = db;
    this.options = {
      // Default configurations
      defaultSize: 100 * 1024 * 1024, // 100MB
      retentionPeriods: {
        application_logs: 7 * 24 * 60 * 60 * 1000, // 7 days
        error_logs: 30 * 24 * 60 * 60 * 1000, // 30 days  
        access_logs: 24 * 60 * 60 * 1000, // 24 hours
        audit_logs: 365 * 24 * 60 * 60 * 1000 // 1 year
      },
      ...options
    };

    this.collections = new Map();
    this.tails = new Map();
    this.statistics = new Map();
  }

  async createCappedCollectionHierarchy() {
    // Create hierarchical capped collections for different log levels and retention

    // Critical logs - smallest size, longest retention
    await this.createTieredCollection('critical_logs', {
      size: 50 * 1024 * 1024, // 50MB
      max: 100000,
      retention: 'critical'
    });

    // Error logs - medium size and retention  
    await this.createTieredCollection('error_logs', {
      size: 200 * 1024 * 1024, // 200MB
      max: 500000,
      retention: 'error'
    });

    // Warning logs - larger size, medium retention
    await this.createTieredCollection('warning_logs', {
      size: 300 * 1024 * 1024, // 300MB  
      max: 1000000,
      retention: 'warning'
    });

    // Info logs - large size, shorter retention
    await this.createTieredCollection('info_logs', {
      size: 500 * 1024 * 1024, // 500MB
      max: 2000000, 
      retention: 'info'
    });

    // Debug logs - largest size, shortest retention
    await this.createTieredCollection('debug_logs', {
      size: 1024 * 1024 * 1024, // 1GB
      max: 5000000,
      retention: 'debug'
    });

    // Specialized collections
    await this.createSpecializedCollections();

    console.log('Capped collection hierarchy created');
  }

  async createTieredCollection(name, config) {
    try {
      const collection = await this.db.createCollection(name, {
        capped: true,
        size: config.size,
        max: config.max
      });

      this.collections.set(name, collection);

      // Initialize statistics tracking
      this.statistics.set(name, {
        documentsInserted: 0,
        totalSize: 0,
        lastInsert: null,
        insertRate: 0,
        retentionType: config.retention
      });

      console.log(`Created capped collection: ${name} (${config.size} bytes, max ${config.max} docs)`);

    } catch (error) {
      if (error.code === 48) { // Collection already exists
        console.log(`Capped collection ${name} already exists`);
        const collection = this.db.collection(name);
        this.collections.set(name, collection);
      } else {
        throw error;
      }
    }
  }

  async createSpecializedCollections() {
    // Real-time metrics collection
    await this.createTieredCollection('realtime_metrics', {
      size: 100 * 1024 * 1024, // 100MB
      max: 1000000,
      retention: 'realtime'
    });

    // Security events collection
    await this.createTieredCollection('security_events', {
      size: 50 * 1024 * 1024, // 50MB
      max: 200000,
      retention: 'security'
    });

    // Business events collection  
    await this.createTieredCollection('business_events', {
      size: 200 * 1024 * 1024, // 200MB
      max: 1000000,
      retention: 'business'
    });

    // System health collection
    await this.createTieredCollection('system_health', {
      size: 150 * 1024 * 1024, // 150MB
      max: 500000,
      retention: 'system'
    });

    // Create minimal indexes for specialized queries
    await this.createSpecializedIndexes();
  }

  async createSpecializedIndexes() {
    // Minimal indexes to maintain write performance

    // Real-time metrics - by type and timestamp
    await this.collections.get('realtime_metrics').createIndex({
      metric_type: 1,
      timestamp: -1
    });

    // Security events - by severity and event type
    await this.collections.get('security_events').createIndex({
      severity: 1,
      event_type: 1
    });

    // Business events - by event category
    await this.collections.get('business_events').createIndex({
      category: 1,
      user_id: 1
    });

    // System health - by component and status
    await this.collections.get('system_health').createIndex({
      component: 1,
      status: 1
    });
  }

  async insertWithRouting(logLevel, document) {
    // Route documents to appropriate capped collection based on level
    const routingMap = {
      FATAL: 'critical_logs',
      ERROR: 'error_logs', 
      WARN: 'warning_logs',
      INFO: 'info_logs',
      DEBUG: 'debug_logs',
      TRACE: 'debug_logs'
    };

    const collectionName = routingMap[logLevel] || 'info_logs';
    const collection = this.collections.get(collectionName);

    if (!collection) {
      throw new Error(`Collection not found: ${collectionName}`);
    }

    // Add routing metadata
    const enrichedDocument = {
      ...document,
      _routed_to: collectionName,
      _inserted_at: new Date()
    };

    try {
      const result = await collection.insertOne(enrichedDocument);

      // Update statistics
      this.updateInsertionStatistics(collectionName, enrichedDocument);

      return result;
    } catch (error) {
      console.error(`Error inserting to ${collectionName}:`, error);
      throw error;
    }
  }

  updateInsertionStatistics(collectionName, document) {
    const stats = this.statistics.get(collectionName);
    if (!stats) return;

    stats.documentsInserted++;
    stats.totalSize += this.estimateDocumentSize(document);
    stats.lastInsert = new Date();

    // Calculate insertion rate (documents per second)
    if (stats.documentsInserted > 1) {
      const timeSpan = stats.lastInsert - stats.firstInsert || 1;
      stats.insertRate = (stats.documentsInserted / (timeSpan / 1000)).toFixed(2);
    } else {
      stats.firstInsert = stats.lastInsert;
    }
  }

  estimateDocumentSize(document) {
    // Rough estimation of document size in bytes
    return JSON.stringify(document).length * 2; // UTF-8 approximation
  }

  async setupTailableStreams() {
    // Set up tailable cursors for real-time log streaming
    console.log('Setting up tailable cursors for real-time streaming...');

    for (const [collectionName, collection] of this.collections.entries()) {
      const tail = collection.find().addCursorFlag('tailable', true)
                             .addCursorFlag('awaitData', true);

      this.tails.set(collectionName, tail);

      // Start async processing of tailable cursor
      this.processTailableStream(collectionName, tail);
    }
  }

  async processTailableStream(collectionName, cursor) {
    console.log(`Starting tailable stream for: ${collectionName}`);

    try {
      for await (const document of cursor) {
        // Process real-time log document
        await this.processRealtimeLog(collectionName, document);
      }
    } catch (error) {
      console.error(`Tailable stream error for ${collectionName}:`, error);

      // Attempt to restart the stream
      setTimeout(() => {
        this.restartTailableStream(collectionName);
      }, 5000);
    }
  }

  async processRealtimeLog(collectionName, document) {
    // Real-time processing of log entries
    const stats = this.statistics.get(collectionName);

    // Update real-time statistics
    if (stats) {
      stats.documentsInserted++;
      stats.lastInsert = new Date();
    }

    // Trigger alerts for critical conditions
    if (collectionName === 'critical_logs' || collectionName === 'error_logs') {
      await this.checkForAlertConditions(document);
    }

    // Real-time analytics
    if (collectionName === 'realtime_metrics') {
      await this.updateRealtimeMetrics(document);
    }

    // Security monitoring
    if (collectionName === 'security_events') {
      await this.analyzeSecurityEvent(document);
    }

    // Emit to external systems (WebSocket, message queues, etc.)
    this.emitRealtimeEvent(collectionName, document);
  }

  async checkForAlertConditions(document) {
    // Implement alert logic for critical conditions
    const alertConditions = [
      // High error rate
      document.level === 'ERROR' && document.error_count > 10,

      // Security incidents
      document.category === 'security' && document.severity === 'high',

      // System failures
      document.component === 'database' && document.status === 'down',

      // Performance degradation
      document.metric_type === 'response_time' && document.value > 10000
    ];

    if (alertConditions.some(condition => condition)) {
      await this.triggerAlert({
        type: 'critical_condition',
        document: document,
        timestamp: new Date()
      });
    }
  }

  async triggerAlert(alert) {
    console.log('ALERT TRIGGERED:', JSON.stringify(alert, null, 2));

    // Store alert in dedicated collection
    const alertsCollection = this.db.collection('alerts');
    await alertsCollection.insertOne({
      ...alert,
      // _id is generated automatically by the MongoDB driver
      acknowledged: false,
      created_at: new Date()
    });

    // Send external notifications (email, Slack, PagerDuty, etc.)
    // Implementation depends on notification system
  }

  emitRealtimeEvent(collectionName, document) {
    // Emit to WebSocket connections, message queues, etc.
    console.log(`Real-time event: ${collectionName}`, {
      id: document._id,
      timestamp: document._inserted_at || document.timestamp,
      level: document.level,
      message: document.message ? document.message.substring(0, 100) + '...' : undefined
    });
  }

  async getCollectionStatistics(collectionName) {
    const collection = this.collections.get(collectionName);
    if (!collection) {
      throw new Error(`Collection not found: ${collectionName}`);
    }

    // Get MongoDB collection statistics
    const stats = await this.db.runCommand({ collStats: collectionName });
    const customStats = this.statistics.get(collectionName);

    return {
      // MongoDB statistics
      size: stats.size,
      count: stats.count,
      avgObjSize: stats.avgObjSize,
      storageSize: stats.storageSize,
      capped: stats.capped,
      max: stats.max,
      maxSize: stats.maxSize,

      // Custom statistics
      insertRate: customStats?.insertRate || 0,
      lastInsert: customStats?.lastInsert,
      retentionType: customStats?.retentionType,

      // Calculated metrics
      utilizationPercent: ((stats.size / stats.maxSize) * 100).toFixed(2),
      documentsPerMB: Math.round(stats.count / (stats.size / 1024 / 1024)),

      // Health assessment
      healthStatus: this.assessCollectionHealth(stats, customStats)
    };
  }

  assessCollectionHealth(mongoStats, customStats) {
    const utilizationPercent = (mongoStats.size / mongoStats.maxSize) * 100;
    const timeSinceLastInsert = customStats?.lastInsert ? 
      Date.now() - customStats.lastInsert.getTime() : Infinity;

    if (utilizationPercent > 95) {
      return 'NEAR_CAPACITY';
    } else if (timeSinceLastInsert > 300000) { // 5 minutes
      return 'INACTIVE';
    } else if (customStats?.insertRate > 1000) {
      return 'HIGH_VOLUME';
    } else {
      return 'HEALTHY';
    }
  }

  async performMaintenance() {
    console.log('Performing capped collection maintenance...');

    const maintenanceReport = {
      timestamp: new Date(),
      collections: {},
      recommendations: []
    };

    for (const collectionName of this.collections.keys()) {
      const stats = await this.getCollectionStatistics(collectionName);
      maintenanceReport.collections[collectionName] = stats;

      // Generate recommendations based on statistics
      if (stats.healthStatus === 'NEAR_CAPACITY') {
        maintenanceReport.recommendations.push({
          collection: collectionName,
          type: 'SIZE_WARNING',
          message: `Collection ${collectionName} is at ${stats.utilizationPercent}% capacity`
        });
      }

      if (stats.healthStatus === 'INACTIVE') {
        maintenanceReport.recommendations.push({
          collection: collectionName,
          type: 'INACTIVE_WARNING',
          message: `Collection ${collectionName} has not received data recently`
        });
      }

      if (stats.insertRate > 1000) {
        maintenanceReport.recommendations.push({
          collection: collectionName,
          type: 'HIGH_VOLUME',
          message: `Collection ${collectionName} has high insertion rate: ${stats.insertRate}/sec`
        });
      }
    }

    console.log('Maintenance report generated:', maintenanceReport);
    return maintenanceReport;
  }

  async shutdown() {
    console.log('Shutting down capped collection manager...');

    // Close all tailable cursors
    for (const [collectionName, cursor] of this.tails.entries()) {
      try {
        await cursor.close();
        console.log(`Closed tailable cursor for: ${collectionName}`);
      } catch (error) {
        console.error(`Error closing cursor for ${collectionName}:`, error);
      }
    }

    this.tails.clear();
    this.collections.clear();
    this.statistics.clear();

    console.log('Capped collection manager shutdown complete');
  }
}

// Real-time log aggregation and analysis
class RealtimeLogAggregator {
  constructor(cappedManager) {
    this.cappedManager = cappedManager;
    this.aggregationWindows = new Map();
    this.alertThresholds = {
      errorRate: 0.05, // 5% error rate
      responseTime: 5000, // 5 seconds
      memoryUsage: 0.85, // 85% memory usage
      cpuUsage: 0.90 // 90% CPU usage
    };
  }

  async startRealtimeAggregation() {
    console.log('Starting real-time log aggregation...');

    // Set up sliding window aggregations
    this.startSlidingWindow('error_rate', 300000); // 5-minute window
    this.startSlidingWindow('response_time', 60000); // 1-minute window
    this.startSlidingWindow('throughput', 60000); // 1-minute window
    this.startSlidingWindow('resource_usage', 120000); // 2-minute window

    console.log('Real-time aggregation started');
  }

  startSlidingWindow(metricType, windowSizeMs) {
    const windowData = {
      data: [],
      windowSize: windowSizeMs,
      lastCleanup: Date.now()
    };

    this.aggregationWindows.set(metricType, windowData);

    // Start cleanup interval
    setInterval(() => {
      this.cleanupWindow(metricType);
    }, windowSizeMs / 10); // Cleanup every 1/10th of window size
  }

  cleanupWindow(metricType) {
    const window = this.aggregationWindows.get(metricType);
    if (!window) return;

    const cutoffTime = Date.now() - window.windowSize;
    window.data = window.data.filter(entry => entry.timestamp > cutoffTime);
    window.lastCleanup = Date.now();
  }

  addDataPoint(metricType, value, metadata = {}) {
    const window = this.aggregationWindows.get(metricType);
    if (!window) return;

    window.data.push({
      timestamp: Date.now(),
      value: value,
      metadata: metadata
    });

    // Check for alerts
    this.checkAggregationAlerts(metricType);
  }

  checkAggregationAlerts(metricType) {
    const window = this.aggregationWindows.get(metricType);
    if (!window || window.data.length === 0) return;

    const recentData = window.data.slice(-10); // Last 10 data points
    const avgValue = recentData.reduce((sum, point) => sum + point.value, 0) / recentData.length;

    let alertTriggered = false;
    let alertMessage = '';

    switch (metricType) {
      case 'error_rate':
        if (avgValue > this.alertThresholds.errorRate) {
          alertTriggered = true;
          alertMessage = `High error rate: ${(avgValue * 100).toFixed(2)}%`;
        }
        break;

      case 'response_time':
        if (avgValue > this.alertThresholds.responseTime) {
          alertTriggered = true;
          alertMessage = `High response time: ${avgValue.toFixed(0)}ms`;
        }
        break;

      case 'resource_usage':
        const memoryAlert = recentData.some(p => p.metadata.memory > this.alertThresholds.memoryUsage);
        const cpuAlert = recentData.some(p => p.metadata.cpu > this.alertThresholds.cpuUsage);

        if (memoryAlert || cpuAlert) {
          alertTriggered = true;
          alertMessage = `High resource usage: Memory ${memoryAlert ? 'HIGH' : 'OK'}, CPU ${cpuAlert ? 'HIGH' : 'OK'}`;
        }
        break;
    }

    if (alertTriggered) {
      this.cappedManager.triggerAlert({
        type: 'aggregation_alert',
        metricType: metricType,
        message: alertMessage,
        value: avgValue,
        threshold: {
          error_rate: this.alertThresholds.errorRate,
          response_time: this.alertThresholds.responseTime
        }[metricType] ?? 'N/A', // resource_usage spans two thresholds, so no single value applies
        recentData: recentData.slice(-3) // Last 3 data points
      });
    }
  }

  getWindowSummary(metricType) {
    const window = this.aggregationWindows.get(metricType);
    if (!window || window.data.length === 0) {
      return { metricType, dataPoints: 0, summary: null };
    }

    const values = window.data.map(point => point.value);
    const sortedValues = [...values].sort((a, b) => a - b);

    return {
      metricType: metricType,
      dataPoints: window.data.length,
      windowSizeMs: window.windowSize,
      summary: {
        min: Math.min(...values),
        max: Math.max(...values),
        avg: values.reduce((sum, val) => sum + val, 0) / values.length,
        median: sortedValues[Math.floor(sortedValues.length / 2)],
        p95: sortedValues[Math.floor(sortedValues.length * 0.95)],
        p99: sortedValues[Math.floor(sortedValues.length * 0.99)]
      },
      trend: this.calculateTrend(window.data),
      lastUpdate: window.data[window.data.length - 1].timestamp
    };
  }

  calculateTrend(dataPoints) {
    if (dataPoints.length < 2) return 'INSUFFICIENT_DATA';

    const firstHalf = dataPoints.slice(0, Math.floor(dataPoints.length / 2));
    const secondHalf = dataPoints.slice(Math.floor(dataPoints.length / 2));

    const firstHalfAvg = firstHalf.reduce((sum, p) => sum + p.value, 0) / firstHalf.length;
    const secondHalfAvg = secondHalf.reduce((sum, p) => sum + p.value, 0) / secondHalf.length;

    const change = (secondHalfAvg - firstHalfAvg) / firstHalfAvg;

    if (Math.abs(change) < 0.05) return 'STABLE'; // Less than 5% change
    return change > 0 ? 'INCREASING' : 'DECREASING';
  }

  getAllWindowSummaries() {
    const summaries = {};
    for (const metricType of this.aggregationWindows.keys()) {
      summaries[metricType] = this.getWindowSummary(metricType);
    }
    return summaries;
  }
}

SQL-Style Capped Collection Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Capped Collection management and querying:

-- QueryLeaf capped collection operations with SQL-familiar syntax

-- Create capped collections with size and document limits
CREATE CAPPED COLLECTION application_logs 
WITH (
  size = '1GB',
  max_documents = 10000000,
  auto_rotate = true
);

CREATE CAPPED COLLECTION error_logs 
WITH (
  size = '256MB', 
  max_documents = 1000000
);

CREATE CAPPED COLLECTION access_logs
WITH (
  size = '2GB'
  -- No document limit for maximum throughput
);

-- High-performance log insertion
INSERT INTO application_logs 
VALUES (
  CURRENT_TIMESTAMP,
  'user-service',
  'payment-processor', 
  'prod-instance-01',
  'ERROR',
  'Payment processing failed for transaction tx_12345',

  -- Structured request context
  ROW(
    'req_98765',
    'POST',
    '/api/payments/process',
    'user_54321',
    'sess_abcdef',
    '192.168.1.100'
  ) AS request_context,

  -- Trace information
  ROW(
    'trace_xyz789',
    'span_456',
    'span_123',
    1
  ) AS trace_info,

  -- Error details
  ROW(
    'PaymentValidationError',
    'Invalid payment method: expired_card',
    'PaymentProcessor.validateCard() line 245',
    'PM001'
  ) AS error_details,

  -- Additional data
  JSON_BUILD_OBJECT(
    'transaction_id', 'tx_12345',
    'user_id', 'user_54321', 
    'payment_amount', 299.99,
    'payment_method', 'card_****1234',
    'merchant_id', 'merchant_789'
  ) AS log_data
);

-- Real-time log tailing (most recent entries first)
SELECT 
  timestamp,
  service,
  level,
  message,
  request_context.request_id,
  request_context.user_id,
  trace_info.trace_id,
  error_details.error_code,
  log_data
FROM application_logs
ORDER BY $natural DESC  -- Natural order in capped collections
LIMIT 100;

-- Log analysis with time-based aggregation
WITH recent_logs AS (
  SELECT 
    service,
    level,
    timestamp,
    message,
    request_context.user_id,
    error_details.error_code,

    -- Time bucketing for analysis
    DATE_TRUNC('minute', timestamp) as minute_bucket,
    DATE_TRUNC('hour', timestamp) as hour_bucket
  FROM application_logs
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '4 hours'
),

error_summary AS (
  SELECT 
    service,
    hour_bucket,
    level,
    COUNT(*) as log_count,
    COUNT(DISTINCT request_context.user_id) as affected_users,
    COUNT(DISTINCT error_details.error_code) as unique_errors,

    -- Error patterns
    mode() WITHIN GROUP (ORDER BY error_details.error_code) as most_common_error,
    array_agg(DISTINCT error_details.error_code) as error_codes,

    -- Sample messages for investigation
    array_agg(
      json_build_object(
        'timestamp', timestamp,
        'message', SUBSTRING(message, 1, 100),
        'user_id', request_context.user_id,
        'error_code', error_details.error_code
      ) ORDER BY timestamp DESC
    )[1:5] as recent_samples

  FROM recent_logs
  WHERE level IN ('ERROR', 'FATAL')
  GROUP BY service, hour_bucket, level
),

service_health AS (
  SELECT 
    service,
    hour_bucket,

    -- Overall metrics
    SUM(log_count) as total_logs,
    SUM(log_count) FILTER (WHERE level = 'ERROR') as error_count,
    SUM(log_count) FILTER (WHERE level = 'WARN') as warning_count,
    SUM(affected_users) as total_affected_users,

    -- Error rate calculation
    CASE 
      WHEN SUM(log_count) > 0 THEN 
        (SUM(log_count) FILTER (WHERE level = 'ERROR')::numeric / SUM(log_count)) * 100
      ELSE 0
    END as error_rate_percent,

    -- Service status assessment
    CASE 
      WHEN SUM(log_count) FILTER (WHERE level = 'ERROR') > 100 THEN 'CRITICAL'
      WHEN (SUM(log_count) FILTER (WHERE level = 'ERROR')::numeric / NULLIF(SUM(log_count), 0)) > 0.05 THEN 'DEGRADED'
      WHEN SUM(log_count) FILTER (WHERE level = 'WARN') > 50 THEN 'WARNING'
      ELSE 'HEALTHY'
    END as service_status

  FROM error_summary
  GROUP BY service, hour_bucket
)

SELECT 
  sh.service,
  sh.hour_bucket,
  sh.total_logs,
  sh.error_count,
  sh.warning_count,
  ROUND(sh.error_rate_percent, 2) as error_rate_pct,
  sh.total_affected_users,
  sh.service_status,

  -- Top error details
  es.most_common_error,
  es.unique_errors,
  es.error_codes,
  es.recent_samples,

  -- Trend analysis
  LAG(sh.error_count, 1) OVER (
    PARTITION BY sh.service 
    ORDER BY sh.hour_bucket
  ) as prev_hour_errors,

  sh.error_count - LAG(sh.error_count, 1) OVER (
    PARTITION BY sh.service 
    ORDER BY sh.hour_bucket
  ) as error_count_change

FROM service_health sh
LEFT JOIN error_summary es ON (
  sh.service = es.service AND 
  sh.hour_bucket = es.hour_bucket AND 
  es.level = 'ERROR'
)
WHERE sh.service_status != 'HEALTHY'
ORDER BY sh.service_status DESC, sh.error_rate_percent DESC, sh.hour_bucket DESC;

-- Access log analysis for performance monitoring
WITH access_metrics AS (
  SELECT 
    endpoint,
    method,
    DATE_TRUNC('minute', timestamp) as minute_bucket,

    -- Request metrics
    COUNT(*) as request_count,
    AVG(duration_ms) as avg_duration,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY duration_ms) as median_duration,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ms) as p95_duration,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY duration_ms) as p99_duration,
    MIN(duration_ms) as min_duration,
    MAX(duration_ms) as max_duration,

    -- Status code distribution
    COUNT(*) FILTER (WHERE status_code < 300) as success_count,
    COUNT(*) FILTER (WHERE status_code >= 300 AND status_code < 400) as redirect_count,
    COUNT(*) FILTER (WHERE status_code >= 400 AND status_code < 500) as client_error_count,
    COUNT(*) FILTER (WHERE status_code >= 500) as server_error_count,

    -- Data transfer metrics
    AVG(response_size) as avg_response_size,
    SUM(response_size) as total_response_size,

    -- Client metrics
    COUNT(DISTINCT client.ip) as unique_clients,
    COUNT(DISTINCT client.user_id) as unique_users

  FROM access_logs
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '2 hours'
  GROUP BY endpoint, method, minute_bucket
),

performance_analysis AS (
  SELECT 
    endpoint,
    method,

    -- Aggregated performance metrics
    SUM(request_count) as total_requests,
    AVG(avg_duration) as overall_avg_duration,
    MAX(p95_duration) as max_p95_duration,
    MAX(p99_duration) as max_p99_duration,

    -- Error rates
    (SUM(client_error_count + server_error_count)::numeric / SUM(request_count)) * 100 as error_rate_percent,
    SUM(server_error_count) as total_server_errors,

    -- Throughput metrics
    AVG(request_count) as avg_requests_per_minute,
    MAX(request_count) as peak_requests_per_minute,

    -- Data transfer
    AVG(avg_response_size) as avg_response_size,
    SUM(total_response_size) / (1024 * 1024) as total_mb_transferred,

    -- Client diversity
    AVG(unique_clients) as avg_unique_clients,
    AVG(unique_users) as avg_unique_users,

    -- Performance assessment
    CASE 
      WHEN AVG(avg_duration) > 5000 THEN 'SLOW'
      WHEN AVG(avg_duration) > 2000 THEN 'DEGRADED' 
      WHEN MAX(p95_duration) > 10000 THEN 'INCONSISTENT'
      ELSE 'NORMAL'
    END as performance_status,

    -- Time series data for trending
    array_agg(
      json_build_object(
        'minute', minute_bucket,
        'requests', request_count,
        'avg_duration', avg_duration,
        'p95_duration', p95_duration,
        'error_rate', (client_error_count + server_error_count)::numeric / request_count * 100
      ) ORDER BY minute_bucket
    ) as time_series_data

  FROM access_metrics
  GROUP BY endpoint, method
),

endpoint_ranking AS (
  SELECT *,
    ROW_NUMBER() OVER (ORDER BY total_requests DESC) as request_rank,
    ROW_NUMBER() OVER (ORDER BY error_rate_percent DESC) as error_rank,
    ROW_NUMBER() OVER (ORDER BY overall_avg_duration DESC) as duration_rank
  FROM performance_analysis
)

SELECT 
  endpoint,
  method,
  total_requests,
  ROUND(overall_avg_duration, 1) as avg_duration_ms,
  ROUND(max_p95_duration, 1) as max_p95_ms,
  ROUND(max_p99_duration, 1) as max_p99_ms,
  ROUND(error_rate_percent, 2) as error_rate_pct,
  total_server_errors,
  ROUND(avg_requests_per_minute, 1) as avg_rpm,
  peak_requests_per_minute as peak_rpm,
  ROUND(total_mb_transferred, 1) as total_mb,
  performance_status,

  -- Rankings
  request_rank,
  error_rank, 
  duration_rank,

  -- Alerts and recommendations
  CASE 
    WHEN performance_status = 'SLOW' THEN 'Optimize endpoint performance - average response time exceeds 5 seconds'
    WHEN performance_status = 'DEGRADED' THEN 'Monitor endpoint performance - response times elevated'
    WHEN performance_status = 'INCONSISTENT' THEN 'Investigate performance spikes - P95 latency exceeds 10 seconds'
    WHEN error_rate_percent > 5 THEN 'High error rate detected - investigate client and server errors'
    WHEN total_server_errors > 100 THEN 'Significant server errors detected - check application health'
    ELSE 'Performance within normal parameters'
  END as recommendation,

  time_series_data

FROM endpoint_ranking
WHERE (
  performance_status != 'NORMAL' OR 
  error_rate_percent > 1 OR 
  request_rank <= 20
)
ORDER BY 
  CASE performance_status
    WHEN 'SLOW' THEN 1
    WHEN 'DEGRADED' THEN 2
    WHEN 'INCONSISTENT' THEN 3
    ELSE 4
  END,
  error_rate_percent DESC,
  total_requests DESC;

-- Real-time metrics aggregation from capped collections
CREATE VIEW real_time_metrics AS
WITH metric_windows AS (
  SELECT 
    metric_type,
    metric_name,
    instance_id,

    -- Current values
    LAST_VALUE(value ORDER BY timestamp) as current_value,
    FIRST_VALUE(value ORDER BY timestamp) as first_value,

    -- Statistical aggregations
    AVG(value) as avg_value,
    MIN(value) as min_value,
    MAX(value) as max_value,
    STDDEV_POP(value) as stddev_value,
    COUNT(*) as sample_count,

    -- Trend calculation
    CASE 
      WHEN COUNT(*) >= 2 THEN
        (LAST_VALUE(value ORDER BY timestamp) - FIRST_VALUE(value ORDER BY timestamp)) / 
        NULLIF(FIRST_VALUE(value ORDER BY timestamp), 0) * 100
      ELSE 0
    END as trend_percent,

    -- Alert thresholds
    MAX(alerts.warning_threshold) as warning_threshold,
    MAX(alerts.critical_threshold) as critical_threshold,

    -- Time range
    MIN(timestamp) as window_start,
    MAX(timestamp) as window_end

  FROM performance_metrics
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '5 minutes'
  GROUP BY metric_type, metric_name, instance_id
)

SELECT 
  metric_type,
  metric_name,
  instance_id,
  current_value,
  ROUND(avg_value::numeric, 2) as avg_value,
  min_value,
  max_value,
  ROUND(stddev_value::numeric, 2) as stddev,
  sample_count,
  ROUND(trend_percent::numeric, 1) as trend_pct,

  -- Alert status
  CASE 
    WHEN critical_threshold IS NOT NULL AND current_value >= critical_threshold THEN 'CRITICAL'
    WHEN warning_threshold IS NOT NULL AND current_value >= warning_threshold THEN 'WARNING'
    ELSE 'NORMAL'
  END as alert_status,

  warning_threshold,
  critical_threshold,
  window_start,
  window_end,

  -- Performance assessment
  CASE metric_type
    WHEN 'cpu_percent' THEN 
      CASE WHEN current_value > 90 THEN 'HIGH' 
           WHEN current_value > 70 THEN 'ELEVATED'
           ELSE 'NORMAL' END
    WHEN 'memory_percent' THEN
      CASE WHEN current_value > 85 THEN 'HIGH'
           WHEN current_value > 70 THEN 'ELEVATED' 
           ELSE 'NORMAL' END
    WHEN 'response_time_ms' THEN
      CASE WHEN current_value > 5000 THEN 'SLOW'
           WHEN current_value > 2000 THEN 'ELEVATED'
           ELSE 'NORMAL' END
    ELSE 'NORMAL'
  END as performance_status

FROM metric_windows
ORDER BY 
  CASE alert_status
    WHEN 'CRITICAL' THEN 1
    WHEN 'WARNING' THEN 2
    ELSE 3
  END,
  metric_type,
  metric_name;

-- Capped collection maintenance and monitoring
SELECT 
  collection_name,
  is_capped,
  max_size_bytes / (1024 * 1024) as max_size_mb,
  current_size_bytes / (1024 * 1024) as current_size_mb,
  document_count,
  max_documents,

  -- Utilization metrics
  ROUND((current_size_bytes::numeric / max_size_bytes) * 100, 1) as size_utilization_pct,
  ROUND((document_count::numeric / NULLIF(max_documents, 0)) * 100, 1) as document_utilization_pct,

  -- Health assessment
  CASE 
    WHEN (current_size_bytes::numeric / max_size_bytes) > 0.95 THEN 'NEAR_CAPACITY'
    WHEN (current_size_bytes::numeric / max_size_bytes) > 0.80 THEN 'HIGH_UTILIZATION'
    WHEN document_count = 0 THEN 'EMPTY'
    ELSE 'HEALTHY'
  END as health_status,

  -- Performance metrics
  avg_document_size_bytes,
  ROUND(avg_document_size_bytes / 1024.0, 1) as avg_document_size_kb,

  -- Recommendations
  CASE 
    WHEN (current_size_bytes::numeric / max_size_bytes) > 0.95 THEN 
      'Consider increasing collection size or reducing retention period'
    WHEN document_count = 0 THEN 
      'Collection is empty - verify data ingestion is working'
    WHEN avg_document_size_bytes > 16384 THEN 
      'Large average document size - consider data optimization'
    ELSE 'Collection operating within normal parameters'
  END as recommendation

FROM CAPPED_COLLECTION_STATS()
WHERE is_capped = true
ORDER BY size_utilization_pct DESC;

-- QueryLeaf provides comprehensive capped collection capabilities:
-- 1. SQL-familiar capped collection creation and management
-- 2. High-performance log insertion with structured data support
-- 3. Real-time log tailing and streaming with natural ordering
-- 4. Advanced log analysis with time-based aggregations
-- 5. Access pattern analysis for performance monitoring
-- 6. Real-time metrics aggregation and alerting
-- 7. Capped collection health monitoring and maintenance
-- 8. Integration with MongoDB's circular buffer optimizations
-- 9. Automatic size management without manual intervention
-- 10. Familiar SQL patterns for log analysis and troubleshooting

Best Practices for Capped Collection Implementation

Design Guidelines

Essential practices for optimal capped collection configuration:

  1. Size Planning: Calculate appropriate collection sizes based on expected data volume and retention requirements (a sizing sketch follows this list)
  2. Index Strategy: Use minimal indexes to maintain write performance while supporting essential queries
  3. Document Structure: Design documents for optimal compression and query performance
  4. Retention Alignment: Align capped collection sizes with business retention and compliance requirements
  5. Monitoring Setup: Implement continuous monitoring of collection utilization and performance
  6. Alert Configuration: Set up alerts for capacity utilization and performance degradation
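
As a rough illustration of the size-planning and retention-alignment guidelines above, the sketch below estimates a capped collection size from an assumed insert rate, average document size, and retention window; the figures, the `application_logs` name, and the headroom factor are placeholders rather than measured values, and a connected `db` handle in an async context is assumed:

// Rough capped-collection sizing sketch (all inputs are illustrative assumptions)
function estimateCappedCollectionSize({ docsPerSecond, avgDocSizeBytes, retentionHours, headroom = 1.2 }) {
  const retentionSeconds = retentionHours * 60 * 60;
  const rawBytes = docsPerSecond * avgDocSizeBytes * retentionSeconds;
  return Math.ceil(rawBytes * headroom); // extra headroom for per-document storage overhead
}

// Example: ~500 logs/sec at ~1 KB each, retained for roughly 24 hours
const sizeBytes = estimateCappedCollectionSize({
  docsPerSecond: 500,
  avgDocSizeBytes: 1024,
  retentionHours: 24
});

// Create the collection with the estimated size (assumes a connected `db` handle)
await db.createCollection('application_logs', {
  capped: true,
  size: sizeBytes,
  max: 500 * 60 * 60 * 24 // optional document cap covering the same window
});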

Performance and Scalability

Optimize capped collections for high-throughput logging scenarios:

  1. Write Performance: Minimize indexes and use batch insertion for maximum throughput (a batched-write sketch follows this list)
  2. Tailable Cursors: Leverage tailable cursors for real-time log streaming and processing
  3. Collection Sizing: Balance collection size with query performance and storage efficiency
  4. Replica Set Configuration: Optimize replica set settings for write-heavy workloads
  5. Hardware Considerations: Use fast storage and adequate memory for optimal performance
  6. Network Optimization: Configure network settings for high-volume log ingestion
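
To make the write-performance and replica-set guidance concrete, here is a minimal batched-write sketch using an unordered insert and a relaxed write concern; the batching is simplified and the write-concern settings are assumptions to weigh against your own durability requirements:

// Minimal batched-write sketch for a write-heavy capped collection
// (write concern and batching are illustrative; relax durability only where acceptable)
async function writeLogBatch(db, entries) {
  if (entries.length === 0) return;

  await db.collection('application_logs').insertMany(entries, {
    ordered: false,                  // let the server apply inserts in parallel
    writeConcern: { w: 1, j: false } // acknowledge from the primary only, skip the journal wait
  });
}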

Conclusion

MongoDB Capped Collections provide purpose-built capabilities for high-performance logging and circular buffer patterns that eliminate the complexity and overhead of traditional database approaches while delivering consistent performance and automatic space management. The natural ordering preservation and optimized write characteristics make capped collections ideal for log processing, event storage, and real-time data applications.

Key Capped Collection benefits include:

  • Automatic Size Management: Fixed-size collections with automatic document rotation
  • Write-Optimized Performance: Optimized for high-throughput, sequential write operations
  • Natural Ordering: Insertion order preservation without additional indexing overhead
  • Circular Buffer Behavior: Automatic old document removal when size limits are reached
  • Real-Time Streaming: Tailable cursor support for live log streaming and processing
  • Operational Simplicity: No manual maintenance or complex rotation procedures required

Whether you're building logging systems, event processors, real-time analytics platforms, or any application requiring circular buffer patterns, MongoDB Capped Collections with QueryLeaf's familiar SQL interface provide the foundation for high-performance data storage. This combination enables you to implement sophisticated logging capabilities while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Capped Collection operations while providing SQL-familiar collection creation, log analysis, and real-time querying syntax. Advanced circular buffer management, performance monitoring, and maintenance operations are seamlessly handled through familiar SQL patterns, making high-performance logging both powerful and accessible.

The integration of native capped collection capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both high-performance logging and familiar database interaction patterns, ensuring your logging solutions remain both effective and maintainable as they scale and evolve.

MongoDB Geospatial Queries and Location-Based Services: SQL-Style Spatial Operations for Modern Applications

Location-aware applications have become fundamental to modern software experiences - from ride-sharing platforms and delivery services to social networks and retail applications. These applications require sophisticated spatial data processing capabilities including proximity searches, route optimization, geofencing, and real-time location tracking that traditional relational databases struggle to handle efficiently.

MongoDB provides comprehensive geospatial functionality with support for 2D and 3D coordinates, multiple coordinate reference systems, and advanced spatial operations. Unlike traditional databases that require complex extensions for spatial data, MongoDB natively supports geospatial indexes, queries, and aggregation operations that can handle billions of location data points with sub-second query performance.
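
As a minimal preview of that native support, the sketch below creates a 2dsphere index over a GeoJSON point field and runs a $near proximity query; the `places` collection name and coordinates are illustrative, and a connected `db` handle in an async context is assumed:

// Minimal geospatial sketch: index GeoJSON points and query by proximity
const places = db.collection('places');

// 2dsphere index over a GeoJSON "location" field
await places.createIndex({ location: '2dsphere' });

// Find documents within 5 km of a point, nearest first ([longitude, latitude] order)
const nearby = await places.find({
  location: {
    $near: {
      $geometry: { type: 'Point', coordinates: [-122.4194, 37.7749] },
      $maxDistance: 5000 // meters
    }
  }
}).limit(20).toArray();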

The Traditional Spatial Data Challenge

Relational databases face significant limitations when handling geospatial data and location-based queries:

-- Traditional PostgreSQL/PostGIS approach - complex setup and limited performance
-- Location-based application with spatial data

CREATE EXTENSION IF NOT EXISTS postgis;

-- Store locations with geometry data
CREATE TABLE locations (
    location_id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    category VARCHAR(100),
    address TEXT,
    city VARCHAR(100),
    state VARCHAR(50),
    country VARCHAR(100),

    -- PostGIS geometry column (complex setup required)
    coordinates GEOMETRY(POINT, 4326), -- WGS84 coordinate system

    -- Additional spatial data
    service_area GEOMETRY(POLYGON, 4326), -- Service coverage area
    delivery_zones GEOMETRY(MULTIPOLYGON, 4326), -- Multiple delivery zones

    -- Business data
    rating DECIMAL(3,2),
    total_reviews INTEGER DEFAULT 0,
    is_active BOOLEAN DEFAULT true,
    hours_of_operation JSONB,

    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create spatial indexes (requires PostGIS extension)
CREATE INDEX idx_locations_coordinates ON locations USING GIST (coordinates);
CREATE INDEX idx_locations_service_area ON locations USING GIST (service_area);

-- Store user locations and activities
CREATE TABLE user_locations (
    user_location_id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    coordinates GEOMETRY(POINT, 4326),
    accuracy_meters DECIMAL(8,2),
    recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    activity_type VARCHAR(50), -- 'check-in', 'delivery', 'movement'
    device_info JSONB
);

CREATE INDEX idx_user_locations_coordinates ON user_locations USING GIST (coordinates);
CREATE INDEX idx_user_locations_user_time ON user_locations (user_id, recorded_at);

-- Complex proximity search query
WITH nearby_locations AS (
    SELECT 
        l.location_id,
        l.name,
        l.category,
        l.rating,

        -- Distance calculation in meters
        ST_Distance(
            l.coordinates,
            ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326) -- San Francisco coordinates
        ) as distance_meters,

        -- Check if point is within service area
        ST_Contains(
            l.service_area,
            ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)
        ) as is_in_service_area,

        -- Convert coordinates back to lat/lng for application
        ST_Y(l.coordinates) as latitude,
        ST_X(l.coordinates) as longitude

    FROM locations l
    WHERE 
        l.is_active = true
        AND ST_DWithin(
            l.coordinates,
            ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326),
            5000 -- 5km radius in meters
        )
),
location_analytics AS (
    -- Add user activity data for locations
    SELECT 
        nl.*,
        COUNT(DISTINCT ul.user_id) as unique_visitors_last_30_days,
        COUNT(ul.user_location_id) as total_activities_last_30_days,
        AVG(ul.accuracy_meters) as avg_location_accuracy
    FROM nearby_locations nl
    LEFT JOIN user_locations ul ON ST_DWithin(
        ST_SetSRID(ST_MakePoint(nl.longitude, nl.latitude), 4326),
        ul.coordinates,
        100 -- Within 100 meters of location
    )
    AND ul.recorded_at >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY nl.location_id, nl.name, nl.category, nl.rating, 
             nl.distance_meters, nl.is_in_service_area, 
             nl.latitude, nl.longitude
)
SELECT 
    location_id,
    name,
    category,
    rating,
    ROUND(distance_meters::numeric, 0) as distance_meters,
    is_in_service_area,
    latitude,
    longitude,
    unique_visitors_last_30_days,
    total_activities_last_30_days,
    ROUND(avg_location_accuracy::numeric, 1) as avg_accuracy_meters,

    -- Relevance scoring based on distance, rating, and activity
    (
        (1000 - LEAST(distance_meters, 1000)) / 1000 * 0.4 + -- Distance factor (40%)
        (rating / 5.0) * 0.3 + -- Rating factor (30%)
        (LEAST(unique_visitors_last_30_days, 50) / 50.0) * 0.3 -- Activity factor (30%)
    ) as relevance_score

FROM location_analytics
ORDER BY relevance_score DESC, distance_meters ASC
LIMIT 20;

-- Problems with traditional spatial approach:
-- 1. Complex PostGIS extension setup and maintenance
-- 2. Requires specialized spatial database knowledge
-- 3. Limited coordinate system support without additional configuration
-- 4. Performance degrades with large datasets and complex queries
-- 5. Difficult integration with application object models
-- 6. Complex geometry data types and manipulation functions
-- 7. Limited aggregation capabilities for spatial analytics
-- 8. Challenging horizontal scaling for global applications
-- 9. Memory-intensive spatial operations
-- 10. Complex backup and restore procedures for spatial data

-- MySQL spatial limitations (even more restrictive):
CREATE TABLE locations_mysql (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    -- MySQL spatial support limited and less capable
    coordinates POINT NOT NULL,
    SPATIAL INDEX(coordinates)
);

-- Basic proximity query in MySQL (limited functionality)
SELECT 
    id, name,
    ST_Distance_Sphere(
        coordinates, 
        POINT(-122.4194, 37.7749)
    ) as distance_meters
FROM locations_mysql
WHERE ST_Distance_Sphere(
    coordinates, 
    POINT(-122.4194, 37.7749)
) < 5000
ORDER BY distance_meters
LIMIT 10;

-- MySQL limitations:
-- - Limited spatial functions compared to PostGIS
-- - Poor performance with large spatial datasets
-- - No advanced spatial analytics capabilities
-- - Limited coordinate system support
-- - Basic geometry types only
-- - No spatial aggregation functions
-- - Difficult to implement complex spatial business logic

MongoDB provides comprehensive geospatial capabilities with simple, intuitive syntax:

// MongoDB native geospatial support - powerful and intuitive
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('location_services'); // await client.connect() before running the helpers below

// MongoDB geospatial document structure - native and flexible
const createLocationServiceDataModel = async () => {
  // Create locations collection with rich geospatial data
  const locations = db.collection('locations');

  // Example location document with geospatial data
  const locationDocument = {
    _id: new ObjectId(),

    // Basic business information
    name: "Blue Bottle Coffee - Ferry Building",
    category: "cafe",
    subcategory: "specialty_coffee",
    chain: "Blue Bottle Coffee",

    // Address information
    address: {
      street: "1 Ferry Building",
      unit: "Shop 7",
      city: "San Francisco",
      state: "CA",
      country: "USA",
      postalCode: "94111",
      formattedAddress: "1 Ferry Building, Shop 7, San Francisco, CA 94111"
    },

    // Primary location - GeoJSON Point format
    location: {
      type: "Point",
      coordinates: [-122.3937, 37.7955] // [longitude, latitude] - NOTE: MongoDB uses [lng, lat]
    },

    // Service area - GeoJSON Polygon format
    serviceArea: {
      type: "Polygon",
      coordinates: [[
        [-122.4050, 37.7850], // Southwest corner
        [-122.3850, 37.7850], // Southeast corner  
        [-122.3850, 37.8050], // Northeast corner
        [-122.4050, 37.8050], // Northwest corner
        [-122.4050, 37.7850]  // Close polygon
      ]]
    },

    // Multiple delivery zones - GeoJSON MultiPolygon
    deliveryZones: {
      type: "MultiPolygon", 
      coordinates: [
        [[ // First delivery zone
          [-122.4000, 37.7900],
          [-122.3900, 37.7900],
          [-122.3900, 37.8000],
          [-122.4000, 37.8000],
          [-122.4000, 37.7900]
        ]],
        [[ // Second delivery zone
          [-122.4100, 37.7800],
          [-122.3950, 37.7800],
          [-122.3950, 37.7900],
          [-122.4100, 37.7900],
          [-122.4100, 37.7800]
        ]]
      ]
    },

    // Business information
    business: {
      rating: 4.6,
      totalReviews: 1247,
      priceRange: "$$",
      phoneNumber: "+1-415-555-0123",
      website: "https://bluebottlecoffee.com",
      isActive: true,
      isChain: true,

      // Hours of operation with geospatial considerations
      hours: {
        monday: { open: "06:00", close: "19:00", timezone: "America/Los_Angeles" },
        tuesday: { open: "06:00", close: "19:00", timezone: "America/Los_Angeles" },
        wednesday: { open: "06:00", close: "19:00", timezone: "America/Los_Angeles" },
        thursday: { open: "06:00", close: "19:00", timezone: "America/Los_Angeles" },
        friday: { open: "06:00", close: "20:00", timezone: "America/Los_Angeles" },
        saturday: { open: "07:00", close: "20:00", timezone: "America/Los_Angeles" },
        sunday: { open: "07:00", close: "19:00", timezone: "America/Los_Angeles" }
      },

      // Services and amenities
      amenities: ["wifi", "outdoor_seating", "takeout", "delivery", "mobile_payment"],
      specialties: ["single_origin", "cold_brew", "espresso", "pour_over"]
    },

    // Geospatial metadata
    geoMetadata: {
      coordinateSystem: "WGS84",
      accuracyMeters: 5,
      elevationMeters: 15,
      dataSource: "GPS_verified",
      lastVerified: new Date("2024-09-01"),

      // Nearby landmarks for context
      nearbyLandmarks: [
        {
          name: "Ferry Building Marketplace",
          distance: 50,
          bearing: "north"
        },
        {
          name: "Embarcadero BART Station", 
          distance: 200,
          bearing: "west"
        }
      ]
    },

    // Analytics and performance data
    analytics: {
      monthlyVisitors: 12500,
      averageVisitDuration: 25, // minutes
      peakHours: ["08:00-09:00", "12:00-13:00", "15:00-16:00"],
      popularDays: ["monday", "tuesday", "wednesday", "friday"],

      // Location-specific metrics
      locationMetrics: {
        averageWalkingTime: 3.5, // minutes from nearest transit
        parkingAvailability: "limited",
        accessibilityRating: 4.2,
        noiseLevel: "moderate",
        crowdLevel: "busy"
      }
    },

    // SEO and discovery
    searchTerms: [
      "coffee shop ferry building", 
      "blue bottle san francisco",
      "specialty coffee embarcadero",
      "third wave coffee downtown sf"
    ],

    tags: ["coffee", "cafe", "specialty", "artisan", "downtown", "waterfront"],

    createdAt: new Date("2024-01-15"),
    updatedAt: new Date("2024-09-14")
  };

  // Insert the location document
  await locations.insertOne(locationDocument);

  // Create geospatial index - 2dsphere for spherical geometry (Earth)
  await locations.createIndex({ location: "2dsphere" });
  await locations.createIndex({ serviceArea: "2dsphere" });
  await locations.createIndex({ deliveryZones: "2dsphere" });

  // Additional indexes for common queries
  await locations.createIndex({ category: 1, "business.rating": -1 });
  await locations.createIndex({ "business.isActive": 1, "location": "2dsphere" });
  await locations.createIndex({ tags: 1, "location": "2dsphere" });

  console.log("Location document and indexes created successfully");
  return locations;
};

// Advanced geospatial queries and operations
const performGeospatialOperations = async () => {
  const locations = db.collection('locations');

  // 1. Proximity Search - Find nearby locations
  console.log("=== Proximity Search ===");
  const userLocation = [-122.4194, 37.7749]; // San Francisco coordinates [lng, lat]

  const nearbyLocations = await locations.find({
    location: {
      $near: {
        $geometry: {
          type: "Point",
          coordinates: userLocation
        },
        $maxDistance: 5000, // 5km in meters
        $minDistance: 0
      }
    },
    "business.isActive": true
  }).limit(10).toArray();

  console.log(`Found ${nearbyLocations.length} locations within 5km`);

  // 2. Geo Within - Find locations within a specific area
  console.log("\n=== Geo Within Search ===");
  const searchPolygon = {
    type: "Polygon", 
    coordinates: [[
      [-122.4270, 37.7609], // Southwest corner
      [-122.3968, 37.7609], // Southeast corner
      [-122.3968, 37.7908], // Northeast corner  
      [-122.4270, 37.7908], // Northwest corner
      [-122.4270, 37.7609]  // Close polygon
    ]]
  };

  const locationsInArea = await locations.find({
    location: {
      $geoWithin: {
        $geometry: searchPolygon
      }
    },
    category: "restaurant"
  }).toArray();

  console.log(`Found ${locationsInArea.length} restaurants in specified area`);

  // 3. Geospatial Aggregation - Complex analytics
  console.log("\n=== Geospatial Analytics ===");
  const geospatialAnalytics = await locations.aggregate([
    // Match active locations
    {
      $match: {
        "business.isActive": true,
        location: {
          $geoWithin: {
            $centerSphere: [userLocation, 10 / 3963.2] // 10 miles radius
          }
        }
      }
    },

    // Calculate distance from user location
    {
      $addFields: {
        distanceFromUser: {
          $divide: [
            {
              $sqrt: {
                $add: [
                  {
                    $pow: [
                      { $subtract: [{ $arrayElemAt: ["$location.coordinates", 0] }, userLocation[0]] },
                      2
                    ]
                  },
                  {
                    $pow: [
                      { $subtract: [{ $arrayElemAt: ["$location.coordinates", 1] }, userLocation[1]] },
                      2
                    ]
                  }
                ]
              }
            },
            0.000009 // Approximate degrees to meters conversion
          ]
        }
      }
    },

    // Group by category and analyze
    {
      $group: {
        _id: "$category",
        totalLocations: { $sum: 1 },
        averageRating: { $avg: "$business.rating" },
        averageDistance: { $avg: "$distanceFromUser" },
        closestLocation: {
          // $min compares embedded documents field by field, so distance is
          // listed first to make this effectively a minimum-by-distance
          $min: {
            distance: "$distanceFromUser",
            name: "$name",
            coordinates: "$location.coordinates"
          }
        },

        // Collect all locations in category
        locations: {
          $push: {
            name: "$name",
            rating: "$business.rating",
            distance: "$distanceFromUser",
            coordinates: "$location.coordinates"
          }
        },

        // Rating distribution
        highRatedCount: {
          $sum: { $cond: [{ $gte: ["$business.rating", 4.5] }, 1, 0] }
        },
        mediumRatedCount: {
          $sum: { $cond: [{ $and: [{ $gte: ["$business.rating", 3.5] }, { $lt: ["$business.rating", 4.5] }] }, 1, 0] }
        },
        lowRatedCount: {
          $sum: { $cond: [{ $lt: ["$business.rating", 3.5] }, 1, 0] }
        }
      }
    },

    // Calculate additional metrics
    {
      $addFields: {
        categoryDensity: { $divide: ["$totalLocations", 814] }, // per square km (10 mile radius ≈ 814 sq km)
        highRatedPercentage: { $multiply: [{ $divide: ["$highRatedCount", "$totalLocations"] }, 100] },
        averageDistanceKm: { $divide: ["$averageDistance", 1000] } // averageDistance is already in approximate meters
      }
    },

    // Sort by total locations and rating
    {
      $sort: {
        totalLocations: -1,
        averageRating: -1
      }
    },

    // Format output
    {
      $project: {
        category: "$_id",
        totalLocations: 1,
        averageRating: { $round: ["$averageRating", 2] },
        averageDistanceKm: { $round: ["$averageDistanceKm", 2] },
        categoryDensity: { $round: ["$categoryDensity", 2] },
        highRatedPercentage: { $round: ["$highRatedPercentage", 1] },
        closestLocation: 1,
        ratingDistribution: {
          high: "$highRatedCount",
          medium: "$mediumRatedCount", 
          low: "$lowRatedCount"
        }
      }
    }
  ]).toArray();

  console.log("Geospatial Analytics Results:");
  console.log(JSON.stringify(geospatialAnalytics, null, 2));
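
  // A more accurate alternative (sketch): let $geoNear compute true spherical
  // distances in meters instead of the planar approximation used above.
  // $geoNear must be the first pipeline stage and relies on the 2dsphere index.
  const accurateCategoryDistances = await locations.aggregate([
    {
      $geoNear: {
        near: { type: "Point", coordinates: userLocation },
        distanceField: "distanceFromUser", // meters, calculated on the sphere
        maxDistance: 16093,                // ~10 miles in meters
        spherical: true,
        query: { "business.isActive": true }
      }
    },
    {
      $group: {
        _id: "$category",
        averageDistanceMeters: { $avg: "$distanceFromUser" },
        totalLocations: { $sum: 1 }
      }
    }
  ]).toArray();

  console.log(`Accurate distance summary for ${accurateCategoryDistances.length} categories`);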

  // 4. Route optimization - Find optimal path through multiple locations
  console.log("\n=== Route Optimization ===");
  const waypointLocations = [
    [-122.4194, 37.7749], // Start: San Francisco
    [-122.4094, 37.7849], // Waypoint 1
    [-122.3994, 37.7949], // Waypoint 2
    [-122.4194, 37.7749]  // End: Back to start
  ];

  // Find locations near each waypoint
  const routeAnalysis = await Promise.all(
    waypointLocations.map(async (waypoint, index) => {
      const nearbyOnRoute = await locations.find({
        location: {
          $near: {
            $geometry: {
              type: "Point",
              coordinates: waypoint
            },
            $maxDistance: 500 // 500m radius
          }
        },
        "business.isActive": true
      }).limit(5).toArray();

      return {
        waypointIndex: index,
        coordinates: waypoint,
        nearbyLocations: nearbyOnRoute.map(loc => ({
          name: loc.name,
          category: loc.category,
          rating: loc.business.rating,
          coordinates: loc.location.coordinates
        }))
      };
    })
  );

  console.log("Route Analysis:");
  console.log(JSON.stringify(routeAnalysis, null, 2));

  return {
    nearbyLocations: nearbyLocations.length,
    locationsInArea: locationsInArea.length,
    analyticsResults: geospatialAnalytics.length,
    routeWaypoints: routeAnalysis.length
  };
};

// Real-time location tracking and geofencing
const setupLocationTracking = async () => {
  const userLocations = db.collection('user_locations');
  const geofences = db.collection('geofences');

  // Create user location tracking document
  const userLocationDocument = {
    _id: new ObjectId(),
    userId: new ObjectId("64a1b2c3d4e5f6789012347a"),

    // Current location
    currentLocation: {
      type: "Point",
      coordinates: [-122.4194, 37.7749]
    },

    // Location metadata
    locationMetadata: {
      accuracy: 10, // meters
      altitude: 15, // meters above sea level
      heading: 45, // degrees from north
      speed: 1.5, // meters per second
      timestamp: new Date(),
      source: "GPS", // GPS, WiFi, Cellular, Manual
      batteryLevel: 85,

      // Device context
      device: {
        platform: "iOS",
        version: "17.1",
        model: "iPhone 15 Pro",
        appVersion: "2.1.0"
      }
    },

    // Location history (recent positions)
    locationHistory: [
      {
        location: {
          type: "Point", 
          coordinates: [-122.4204, 37.7739]
        },
        timestamp: new Date(Date.now() - 300000), // 5 minutes ago
        accuracy: 15,
        source: "GPS"
      },
      {
        location: {
          type: "Point",
          coordinates: [-122.4214, 37.7729] 
        },
        timestamp: new Date(Date.now() - 600000), // 10 minutes ago
        accuracy: 12,
        source: "GPS"
      }
    ],

    // Privacy and permissions
    privacy: {
      shareLocation: true,
      accuracyLevel: "precise", // precise, approximate, city
      shareWithFriends: true,
      shareWithBusiness: false,
      trackingEnabled: true
    },

    // Activity context
    activity: {
      type: "walking", // walking, driving, cycling, stationary
      confidence: 0.85,
      detectedTransition: null,
      lastActivity: "stationary"
    },

    createdAt: new Date(),
    updatedAt: new Date()
  };

  // Create indexes for location tracking
  await userLocations.createIndex({ currentLocation: "2dsphere" });
  await userLocations.createIndex({ userId: 1, "locationMetadata.timestamp": -1 });
  await userLocations.createIndex({ "locationHistory.location": "2dsphere" });

  await userLocations.insertOne(userLocationDocument);

  // Create geofence system
  const geofenceDocument = {
    _id: new ObjectId(),
    name: "Downtown Coffee Shop Promo Zone",
    description: "Special promotions for coffee shops in downtown area",

    // Geofence area
    area: {
      type: "Polygon",
      coordinates: [[
        [-122.4200, 37.7700],
        [-122.4100, 37.7700], 
        [-122.4100, 37.7800],
        [-122.4200, 37.7800],
        [-122.4200, 37.7700]
      ]]
    },

    // Geofence configuration
    config: {
      type: "promotional", // promotional, security, analytics, notification
      radius: null, // For circular geofences
      isActive: true,

      // Trigger conditions
      triggers: {
        onEnter: true,
        onExit: true,
        onDwell: true,
        dwellTimeMinutes: 5,

        // Rate limiting
        minTimeBetweenTriggers: 300, // seconds
        maxTriggersPerDay: 10
      },

      // Actions to take
      actions: {
        notification: {
          enabled: true,
          title: "Coffee Deals Nearby!",
          message: "Check out special offers at local coffee shops",
          deepLink: "app://offers/coffee"
        },
        analytics: {
          trackEntry: true,
          trackExit: true,
          trackDwellTime: true
        },
        webhook: {
          enabled: false,
          url: "https://api.example.com/geofence-trigger",
          method: "POST"
        }
      }
    },

    // Analytics
    analytics: {
      totalEnters: 1456,
      totalExits: 1423,
      avgDwellTimeMinutes: 12.5,
      uniqueUsers: 342,

      // Time-based patterns
      hourlyActivity: {
        "08": 45, "09": 78, "10": 23, "11": 34,
        "12": 89, "13": 67, "14": 45, "15": 56,
        "16": 78, "17": 123, "18": 89, "19": 34
      },

      dailyActivity: {
        "monday": 234, "tuesday": 189, "wednesday": 267,
        "thursday": 201, "friday": 298, "saturday": 156, "sunday": 111
      }
    },

    createdAt: new Date("2024-09-01"),
    updatedAt: new Date("2024-09-14")
  };

  await geofences.createIndex({ area: "2dsphere" });
  await geofences.createIndex({ "config.isActive": 1, "config.type": 1 });

  await geofences.insertOne(geofenceDocument);

  // Real-time geofence checking function
  const checkGeofences = async (userId, currentLocation) => {
    console.log("Checking geofences for user location...");

    // Find all active geofences that contain the user's location
    const triggeredGeofences = await geofences.find({
      "config.isActive": true,
      area: {
        $geoIntersects: {
          $geometry: {
            type: "Point",
            coordinates: currentLocation
          }
        }
      }
    }).toArray();

    console.log(`Found ${triggeredGeofences.length} triggered geofences`);

    // Process each triggered geofence
    for (const geofence of triggeredGeofences) {
      console.log(`Processing geofence: ${geofence.name}`);

      // Update analytics
      await geofences.updateOne(
        { _id: geofence._id },
        {
          $inc: { 
            "analytics.totalEnters": 1,
            [`analytics.hourlyActivity.${new Date().getHours().toString().padStart(2, '0')}`]: 1,
            [`analytics.dailyActivity.${new Date().toLocaleDateString('en-US', { weekday: 'long' }).toLowerCase()}`]: 1
          },
          $set: { updatedAt: new Date() }
        }
      );

      // Trigger actions (notifications, webhooks, etc.)
      if (geofence.config.actions.notification.enabled) {
        console.log(`Sending notification: ${geofence.config.actions.notification.title}`);
        // Implementation would send actual notification
      }
    }

    return triggeredGeofences;
  };

  // Test geofence checking
  const testLocation = [-122.4150, 37.7750]; // Point within the geofence
  const triggeredFences = await checkGeofences(userLocationDocument.userId, testLocation);

  return {
    userLocationDocument,
    geofenceDocument,
    triggeredGeofences: triggeredFences.length
  };
};
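
// Sketch: keep a user's currentLocation fresh while capping locationHistory at
// the 50 most recent points using $push with $each/$slice. Field names follow
// the userLocationDocument structure above; the 50-point cap is an assumed policy.
const recordLocationUpdate = async (userId, coordinates, accuracyMeters) => {
  const userLocations = db.collection('user_locations');

  return userLocations.updateOne(
    { userId },
    {
      $set: {
        currentLocation: { type: "Point", coordinates },
        "locationMetadata.timestamp": new Date(),
        "locationMetadata.accuracy": accuracyMeters,
        updatedAt: new Date()
      },
      $push: {
        locationHistory: {
          $each: [{
            location: { type: "Point", coordinates },
            timestamp: new Date(),
            accuracy: accuracyMeters,
            source: "GPS"
          }],
          $position: 0, // newest entries first, matching the example history above
          $slice: 50    // retain only the 50 most recent points
        }
      }
    }
  );
};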

// Advanced spatial analytics and heatmap generation
const generateSpatialAnalytics = async () => {
  const locations = db.collection('locations');
  const userLocations = db.collection('user_locations');

  console.log("=== Generating Spatial Analytics ===");

  // 1. Location Density Analysis
  const locationDensityAnalysis = await locations.aggregate([
    {
      $match: {
        "business.isActive": true
      }
    },

    // Create grid cells for density analysis
    {
      $addFields: {
        gridCell: {
          lat: {
            $floor: {
              $multiply: [
                { $arrayElemAt: ["$location.coordinates", 1] }, // latitude
                1000 // Create 0.001 degree grid cells (~100m)
              ]
            }
          },
          lng: {
            $floor: {
              $multiply: [
                { $arrayElemAt: ["$location.coordinates", 0] }, // longitude  
                1000
              ]
            }
          }
        }
      }
    },

    // Group by grid cell
    {
      $group: {
        _id: "$gridCell",
        locationCount: { $sum: 1 },
        avgRating: { $avg: "$business.rating" },
        categories: { $push: "$category" },

        // Calculate center point of grid cell
        centerCoordinates: {
          $first: {
            type: "Point",
            coordinates: [
              { $divide: ["$gridCell.lng", 1000] },
              { $divide: ["$gridCell.lat", 1000] }
            ]
          }
        },

        // Business metrics
        totalReviews: { $sum: "$business.totalReviews" },
        uniqueCategories: { $addToSet: "$category" }
      }
    },

    // Calculate density metrics
    {
      $addFields: {
        densityScore: {
          $multiply: [
            "$locationCount",
            { $divide: ["$avgRating", 5] } // Weight by average rating
          ]
        },
        categoryDiversity: { $size: "$uniqueCategories" }
      }
    },

    // Sort by density
    {
      $sort: { densityScore: -1 }
    },

    {
      $limit: 20 // Top 20 densest areas
    },

    {
      $project: {
        gridId: "$_id",
        locationCount: 1,
        densityScore: { $round: ["$densityScore", 2] },
        avgRating: { $round: ["$avgRating", 2] },
        categoryDiversity: 1,
        totalReviews: 1,
        centerCoordinates: 1
      }
    }
  ]).toArray();

  console.log(`Location Density Analysis - Found ${locationDensityAnalysis.length} high-density areas`);

  // 2. User Movement Patterns
  const userMovementAnalysis = await userLocations.aggregate([
    {
      $match: {
        "locationMetadata.timestamp": {
          $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) // Last 7 days
        }
      }
    },

    // Unwind location history
    { $unwind: "$locationHistory" },

    // Calculate movement vectors
    {
      $addFields: {
        movement: {
          fromLat: { $arrayElemAt: ["$locationHistory.location.coordinates", 1] },
          fromLng: { $arrayElemAt: ["$locationHistory.location.coordinates", 0] },
          toLat: { $arrayElemAt: ["$currentLocation.coordinates", 1] },
          toLng: { $arrayElemAt: ["$currentLocation.coordinates", 0] },
          timestamp: "$locationHistory.timestamp"
        }
      }
    },

    // Calculate distance and bearing
    {
      $addFields: {
        "movement.distance": {
          // Haversine formula approximation
          $multiply: [
            6371000, // Earth radius in meters
            {
              $acos: {
                $add: [
                  {
                    $multiply: [
                      { $sin: { $multiply: [{ $degreesToRadians: "$movement.fromLat" }, 1] } },
                      { $sin: { $multiply: [{ $degreesToRadians: "$movement.toLat" }, 1] } }
                    ]
                  },
                  {
                    $multiply: [
                      { $cos: { $multiply: [{ $degreesToRadians: "$movement.fromLat" }, 1] } },
                      { $cos: { $multiply: [{ $degreesToRadians: "$movement.toLat" }, 1] } },
                      { $cos: {
                        $multiply: [
                          { $degreesToRadians: { $subtract: ["$movement.toLng", "$movement.fromLng"] } },
                          1
                        ]
                      } }
                    ]
                  }
                ]
              }
            }
          ]
        }
      }
    },

    // Group movement patterns
    {
      $group: {
        _id: {
          hour: { $hour: "$movement.timestamp" },
          dayOfWeek: { $dayOfWeek: "$movement.timestamp" }
        },

        totalMovements: { $sum: 1 },
        avgDistance: { $avg: "$movement.distance" },
        totalDistance: { $sum: "$movement.distance" },
        uniqueUsers: { $addToSet: "$userId" },

        // Movement characteristics
        shortMovements: {
          $sum: { $cond: [{ $lt: ["$movement.distance", 100] }, 1, 0] } // < 100m
        },
        mediumMovements: {
          $sum: { $cond: [
            { $and: [
              { $gte: ["$movement.distance", 100] },
              { $lt: ["$movement.distance", 1000] }
            ]}, 1, 0
          ] } // 100m - 1km
        },
        longMovements: {
          $sum: { $cond: [{ $gte: ["$movement.distance", 1000] }, 1, 0] } // > 1km
        }
      }
    },

    // Calculate additional metrics
    {
      $addFields: {
        uniqueUserCount: { $size: "$uniqueUsers" },
        avgMovementsPerUser: { $divide: ["$totalMovements", { $size: "$uniqueUsers" }] },
        movementDistribution: {
          short: { $divide: ["$shortMovements", "$totalMovements"] },
          medium: { $divide: ["$mediumMovements", "$totalMovements"] },
          long: { $divide: ["$longMovements", "$totalMovements"] }
        }
      }
    },

    {
      $sort: { totalMovements: -1 }
    },

    {
      $project: {
        hour: "$_id.hour",
        dayOfWeek: "$_id.dayOfWeek", 
        totalMovements: 1,
        uniqueUserCount: 1,
        avgDistance: { $round: ["$avgDistance", 1] },
        avgMovementsPerUser: { $round: ["$avgMovementsPerUser", 1] },
        movementDistribution: {
          short: { $round: ["$movementDistribution.short", 3] },
          medium: { $round: ["$movementDistribution.medium", 3] },
          long: { $round: ["$movementDistribution.long", 3] }
        }
      }
    }
  ]).toArray();

  console.log(`User Movement Analysis - Analyzed ${userMovementAnalysis.length} time periods`);

  // 3. Geographic Performance Analysis
  const geoPerformanceAnalysis = await locations.aggregate([
    {
      $match: {
        "business.isActive": true,
        "analytics.monthlyVisitors": { $exists: true }
      }
    },

    // Create geographic regions
    {
      $addFields: {
        region: {
          $switch: {
            branches: [
              {
                case: {
                  $and: [
                    { $gte: [{ $arrayElemAt: ["$location.coordinates", 1] }, 37.77] }, // North of 37.77°N
                    { $lte: [{ $arrayElemAt: ["$location.coordinates", 0] }, -122.41] } // West of -122.41°W
                  ]
                },
                then: "Northwest"
              },
              {
                case: {
                  $and: [
                    { $gte: [{ $arrayElemAt: ["$location.coordinates", 1] }, 37.77] },
                    { $gt: [{ $arrayElemAt: ["$location.coordinates", 0] }, -122.41] }
                  ]
                },
                then: "Northeast"
              },
              {
                case: {
                  $and: [
                    { $lt: [{ $arrayElemAt: ["$location.coordinates", 1] }, 37.77] },
                    { $lte: [{ $arrayElemAt: ["$location.coordinates", 0] }, -122.41] }
                  ]
                },
                then: "Southwest"
              },
              {
                case: {
                  $and: [
                    { $lt: [{ $arrayElemAt: ["$location.coordinates", 1] }, 37.77] },
                    { $gt: [{ $arrayElemAt: ["$location.coordinates", 0] }, -122.41] }
                  ]
                },
                then: "Southeast"
              }
            ],
            default: "Other"
          }
        }
      }
    },

    // Group by region and category
    {
      $group: {
        _id: {
          region: "$region",
          category: "$category"
        },

        locationCount: { $sum: 1 },
        avgRating: { $avg: "$business.rating" },
        avgMonthlyVisitors: { $avg: "$analytics.monthlyVisitors" },
        totalMonthlyVisitors: { $sum: "$analytics.monthlyVisitors" },

        // Performance metrics
        highPerformers: {
          $sum: {
            $cond: [
              {
                $and: [
                  { $gte: ["$business.rating", 4.5] },
                  { $gte: ["$analytics.monthlyVisitors", 10000] }
                ]
              }, 1, 0
            ]
          }
        },

        topLocation: {
          // visitors listed first so $max effectively picks the most-visited location
          $max: {
            visitors: "$analytics.monthlyVisitors",
            name: "$name",
            rating: "$business.rating"
          }
        }
      }
    },

    // Calculate regional metrics
    {
      $group: {
        _id: "$_id.region",

        categories: {
          $push: {
            category: "$_id.category",
            locationCount: "$locationCount",
            avgRating: "$avgRating",
            avgMonthlyVisitors: "$avgMonthlyVisitors",
            totalMonthlyVisitors: "$totalMonthlyVisitors",
            highPerformers: "$highPerformers",
            topLocation: "$topLocation"
          }
        },

        regionalTotals: {
          totalLocations: { $sum: "$locationCount" },
          totalMonthlyVisitors: { $sum: "$totalMonthlyVisitors" },
          totalHighPerformers: { $sum: "$highPerformers" }
        }
      }
    },

    // Sort by total visitors
    {
      $sort: { "regionalTotals.totalMonthlyVisitors": -1 }
    },

    {
      $project: {
        region: "$_id",
        categories: 1,
        regionalTotals: 1,

        // Calculate regional performance metrics
        performanceMetrics: {
          avgVisitorsPerLocation: {
            $divide: ["$regionalTotals.totalMonthlyVisitors", "$regionalTotals.totalLocations"]
          },
          highPerformerRatio: {
            $divide: ["$regionalTotals.totalHighPerformers", "$regionalTotals.totalLocations"]
          }
        }
      }
    }
  ]).toArray();

  console.log(`Geographic Performance Analysis - Analyzed ${geoPerformanceAnalysis.length} regions`);

  return {
    densityAnalysis: locationDensityAnalysis,
    movementAnalysis: userMovementAnalysis,
    performanceAnalysis: geoPerformanceAnalysis,

    summary: {
      densityHotspots: locationDensityAnalysis.length,
      movementPatterns: userMovementAnalysis.length,
      regionalInsights: geoPerformanceAnalysis.length
    }
  };
};
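
// Sketch: turn the density grid produced by generateSpatialAnalytics into a
// GeoJSON FeatureCollection that a map heatmap layer can render directly.
// Field names match the $project stage of the density analysis above.
const toHeatmapFeatureCollection = (densityAnalysis) => ({
  type: "FeatureCollection",
  features: densityAnalysis.map(cell => ({
    type: "Feature",
    geometry: cell.centerCoordinates, // GeoJSON Point at the grid cell center
    properties: {
      weight: cell.densityScore,      // heat intensity for rendering
      locations: cell.locationCount,
      avgRating: cell.avgRating
    }
  }))
});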

// Benefits of MongoDB Geospatial Features:
// - Native GeoJSON support with automatic validation
// - Two geospatial index types: 2d for flat geometry and 2dsphere for spherical (Earth) geometry
// - Built-in spatial operators and aggregation functions
// - Automatic spatial indexing built on B-trees over geohash/S2 cell coverings
// - Spherical geometry calculations for Earth-based applications
// - Integration with aggregation framework for complex analytics
// - Real-time geofencing and location tracking capabilities
// - Scalable to billions of location data points
// - Simple query syntax compared to PostGIS extensions
// - No additional setup required - works out of the box

module.exports = {
  createLocationServiceDataModel,
  performGeospatialOperations,
  setupLocationTracking,
  generateSpatialAnalytics
};
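
A minimal driver script can exercise these helpers end to end. This is a sketch: the ./location-services path is hypothetical, and client.connect() must be awaited (for example in a separate bootstrap step) before the helpers run.

// Hypothetical usage of the exported helpers against a local mongod
const {
  createLocationServiceDataModel,
  performGeospatialOperations,
  setupLocationTracking,
  generateSpatialAnalytics
} = require('./location-services');

(async () => {
  await createLocationServiceDataModel();           // seed sample data and indexes
  const operations = await performGeospatialOperations();
  const tracking = await setupLocationTracking();
  const analytics = await generateSpatialAnalytics();

  console.log({
    operations,
    triggeredGeofences: tracking.triggeredGeofences,
    ...analytics.summary
  });
})().catch(console.error);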

Understanding MongoDB Geospatial Architecture

Coordinate Systems and Indexing Strategies

MongoDB supports multiple geospatial indexing approaches optimized for different use cases:

// Advanced geospatial indexing and coordinate system management
class GeospatialIndexManager {
  constructor(db) {
    this.db = db;
    this.collections = new Map();
  }

  async setupGeospatialIndexing() {
    // 1. 2dsphere Index - For spherical geometry (Earth-based coordinates)
    const locations = this.db.collection('locations');

    // Create 2dsphere index for GeoJSON objects
    await locations.createIndex({ location: "2dsphere" });

    // Compound index for filtered geospatial queries
    await locations.createIndex({ 
      category: 1, 
      "business.isActive": 1, 
      location: "2dsphere" 
    });

    // Wildcard text index for keyword search - kept separate because a
    // compound text index cannot include geospatial index fields
    await locations.createIndex({ "$**": "text" });

    console.log("2dsphere indexes created for global location queries");

    // 2. 2d Index - For flat geometry (game maps, floor plans)
    const gameLocations = this.db.collection('game_locations');

    // 2d index for flat coordinate system (e.g., game world coordinates)
    await gameLocations.createIndex({ position: "2d" });

    // Example game location document
    const gameLocationDoc = {
      _id: new ObjectId(),
      playerId: new ObjectId(),
      characterName: "DragonSlayer42",

      // Flat 2D coordinates for game world
      position: [1250.5, 875.2], // [x, y] coordinates in game units

      // Game-specific data
      level: 45,
      zone: "Enchanted Forest",
      server: "US-East-1",

      // Bounding box for area of influence
      areaOfInfluence: {
        bottomLeft: [1200, 825],
        topRight: [1300, 925]
      },

      lastUpdated: new Date()
    };

    await gameLocations.insertOne(gameLocationDoc);
    console.log("2d index created for flat coordinate system");

    // 3. Specialized indexing for different data patterns
    const trajectories = this.db.collection('vehicle_trajectories');

    // Index for trajectory lines and paths
    await trajectories.createIndex({ route: "2dsphere" });
    await trajectories.createIndex({ vehicleId: 1, timestamp: 1 });

    // Example trajectory document
    const trajectoryDoc = {
      _id: new ObjectId(),
      vehicleId: "TRUCK_001",
      driverId: new ObjectId(),

      // LineString geometry for route
      route: {
        type: "LineString",
        coordinates: [
          [-122.4194, 37.7749], // Start point
          [-122.4184, 37.7759], // Waypoint 1
          [-122.4174, 37.7769], // Waypoint 2
          [-122.4164, 37.7779]  // End point
        ]
      },

      // Route metadata
      routeMetadata: {
        totalDistance: 2.3, // km
        estimatedTime: 8, // minutes
        actualTime: 9.5, // minutes
        fuelUsed: 0.45, // liters
        trafficConditions: "moderate"
      },

      // Time-based tracking
      startTime: new Date("2024-09-18T14:30:00Z"),
      endTime: new Date("2024-09-18T14:39:30Z"),

      // Performance metrics
      metrics: {
        averageSpeed: 14.5, // km/h
        maxSpeed: 25.0,
        idleTime: 45, // seconds
        hardBrakingEvents: 1,
        hardAccelerationEvents: 0
      }
    };

    await trajectories.insertOne(trajectoryDoc);
    console.log("Trajectory tracking setup completed");

    return {
      sphericalIndexes: ["locations.location", "locations.compound"],
      flatIndexes: ["game_locations.position"],
      trajectoryIndexes: ["trajectories.route"]
    };
  }

  async performAdvancedSpatialQueries() {
    const locations = this.db.collection('locations');

    // 1. Multi-stage geospatial aggregation
    console.log("=== Advanced Spatial Aggregation ===");

    const complexSpatialAnalysis = await locations.aggregate([
      // Stage 1: Geospatial filtering
      {
        $geoNear: {
          near: {
            type: "Point",
            coordinates: [-122.4194, 37.7749]
          },
          distanceField: "calculatedDistance",
          maxDistance: 10000, // 10km
          spherical: true,
          query: { "business.isActive": true }
        }
      },

      // Stage 2: Spatial relationship analysis
      {
        $addFields: {
          // Distance categories
          distanceCategory: {
            $switch: {
              branches: [
                { case: { $lte: ["$calculatedDistance", 1000] }, then: "nearby" },
                { case: { $lte: ["$calculatedDistance", 5000] }, then: "moderate" },
                { case: { $lte: ["$calculatedDistance", 10000] }, then: "distant" }
              ],
              default: "very_distant"
            }
          },

          // Spatial density calculation
          spatialDensity: {
            $divide: ["$analytics.monthlyVisitors", { $add: ["$calculatedDistance", 1] }]
          }
        }
      },

      // Stage 3: Complex geospatial grouping
      {
        $group: {
          _id: {
            category: "$category",
            distanceCategory: "$distanceCategory"
          },

          locations: { $push: "$$ROOT" },
          avgDistance: { $avg: "$calculatedDistance" },
          avgRating: { $avg: "$business.rating" },
          avgDensity: { $avg: "$spatialDensity" },
          count: { $sum: 1 },

          // Geospatial aggregations - centroid as per-axis coordinate averages
          centroidLng: { $avg: { $arrayElemAt: ["$location.coordinates", 0] } },
          centroidLat: { $avg: { $arrayElemAt: ["$location.coordinates", 1] } },

          // Bounding box calculation
          minLat: { $min: { $arrayElemAt: ["$location.coordinates", 1] } },
          maxLat: { $max: { $arrayElemAt: ["$location.coordinates", 1] } },
          minLng: { $min: { $arrayElemAt: ["$location.coordinates", 0] } },
          maxLng: { $max: { $arrayElemAt: ["$location.coordinates", 0] } }
        }
      },

      // Stage 4: Spatial statistics
      {
        $addFields: {
          boundingBox: {
            type: "Polygon",
            coordinates: [[
              ["$minLng", "$minLat"],
              ["$maxLng", "$minLat"], 
              ["$maxLng", "$maxLat"],
              ["$minLng", "$maxLat"],
              ["$minLng", "$minLat"]
            ]]
          },

          // Geographic spread calculation
          geographicSpread: {
            $sqrt: {
              $add: [
                { $pow: [{ $subtract: ["$maxLat", "$minLat"] }, 2] },
                { $pow: [{ $subtract: ["$maxLng", "$minLng"] }, 2] }
              ]
            }
          }
        }
      },

      {
        $sort: { count: -1, avgDensity: -1 }
      }
    ]).toArray();

    console.log(`Complex Spatial Analysis - ${complexSpatialAnalysis.length} category/distance combinations`);

    // 2. Intersection and overlay queries
    console.log("\n=== Spatial Intersection Analysis ===");

    const intersectionAnalysis = await locations.aggregate([
      {
        $match: {
          "business.isActive": true,
          deliveryZones: { $exists: true }
        }
      },

      // Find intersections between delivery zones
      {
        $lookup: {
          from: "locations",
          let: { currentZones: "$deliveryZones", currentId: "$_id" },
          pipeline: [
            {
              $match: {
                $expr: {
                  $and: [
                    { $ne: ["$_id", "$$ROOT._id"] }, // Different location
                    { $ne: ["$$currentZones", null] },
                    {
                      $gt: [{
                        $size: {
                          $filter: {
                            input: "$deliveryZones.coordinates",
                            cond: {
                              // Coarse overlap proxy - $geoIntersects needs a literal geometry,
                              // so a true polygon intersection test is not expressible inline here
                              $anyElementTrue: {
                                $map: {
                                  input: "$$currentZones.coordinates",
                                  in: { $ne: ["$$this", null] }
                                }
                              }
                            }
                          }
                        }
                      }, 0]
                    }
                  ]
                }
              }
            },
            {
              $project: {
                name: 1,
                category: 1,
                "business.rating": 1
              }
            }
          ],
          as: "overlappingLocations"
        }
      },

      // Calculate overlap metrics
      {
        $addFields: {
          overlapCount: { $size: "$overlappingLocations" },
          hasOverlap: { $gt: [{ $size: "$overlappingLocations" }, 0] },
          competitionLevel: {
            $switch: {
              branches: [
                { case: { $gte: [{ $size: "$overlappingLocations" }, 5] }, then: "high" },
                { case: { $gte: [{ $size: "$overlappingLocations" }, 2] }, then: "medium" },
                { case: { $gt: [{ $size: "$overlappingLocations" }, 0] }, then: "low" }
              ],
              default: "none"
            }
          }
        }
      },

      {
        $match: { hasOverlap: true }
      },

      {
        $group: {
          _id: "$category",
          avgOverlapCount: { $avg: "$overlapCount" },
          locationsWithOverlap: { $sum: 1 },
          highCompetitionAreas: {
            $sum: { $cond: [{ $eq: ["$competitionLevel", "high"] }, 1, 0] }
          }
        }
      },

      { $sort: { avgOverlapCount: -1 } }
    ]).toArray();

    console.log(`Intersection Analysis - ${intersectionAnalysis.length} categories with delivery zone overlaps`);

    // 3. Temporal-spatial analysis
    console.log("\n=== Temporal-Spatial Analysis ===");

    const temporalSpatialAnalysis = await this.db.collection('user_locations').aggregate([
      {
        $match: {
          "locationMetadata.timestamp": {
            $gte: new Date(Date.now() - 24 * 60 * 60 * 1000) // Last 24 hours
          }
        }
      },

      // Unwind location history for temporal analysis
      { $unwind: "$locationHistory" },

      // Create time buckets
      {
        $addFields: {
          timeBucket: {
            $dateTrunc: {
              date: "$locationHistory.timestamp",
              unit: "hour"
            }
          },

          // Grid cell for spatial grouping
          spatialGrid: {
            lat: {
              $floor: {
                $multiply: [
                  { $arrayElemAt: ["$locationHistory.location.coordinates", 1] },
                  1000 // 0.001 degree precision
                ]
              }
            },
            lng: {
              $floor: {
                $multiply: [
                  { $arrayElemAt: ["$locationHistory.location.coordinates", 0] },
                  1000
                ]
              }
            }
          }
        }
      },

      // Group by time and space
      {
        $group: {
          _id: {
            timeBucket: "$timeBucket",
            spatialGrid: "$spatialGrid"
          },

          uniqueUsers: { $addToSet: "$userId" },
          totalEvents: { $sum: 1 },
          avgAccuracy: { $avg: "$locationHistory.accuracy" },

          // Location cluster center
          centerLat: { $avg: { $arrayElemAt: ["$locationHistory.location.coordinates", 1] } },
          centerLng: { $avg: { $arrayElemAt: ["$locationHistory.location.coordinates", 0] } }
        }
      },

      // Calculate density metrics
      {
        $addFields: {
          userDensity: { $size: "$uniqueUsers" },
          eventDensity: "$totalEvents",
          densityScore: { $multiply: [{ $size: "$uniqueUsers" }, { $ln: { $add: ["$totalEvents", 1] } }] } // $ln takes a single argument; $log requires an explicit base
        }
      },

      // Temporal pattern analysis
      {
        $group: {
          _id: { $hour: "$_id.timeBucket" },

          totalGridCells: { $sum: 1 },
          avgUserDensity: { $avg: "$userDensity" },
          maxUserDensity: { $max: "$userDensity" },
          totalUniqueUsers: { $sum: "$userDensity" },

          // Hotspot identification
          hotspots: {
            $push: {
              $cond: [
                { $gte: ["$densityScore", 10] },
                {
                  center: { type: "Point", coordinates: ["$centerLng", "$centerLat"] },
                  userDensity: "$userDensity",
                  densityScore: "$densityScore"
                },
                null
              ]
            }
          }
        }
      },

      // Clean up hotspots array
      {
        $addFields: {
          hotspots: {
            $filter: {
              input: "$hotspots",
              cond: { $ne: ["$$this", null] }
            }
          }
        }
      },

      { $sort: { "_id": 1 } },

      {
        $project: {
          hour: "$_id",
          totalGridCells: 1,
          avgUserDensity: { $round: ["$avgUserDensity", 2] },
          maxUserDensity: 1,
          totalUniqueUsers: 1,
          hotspotCount: { $size: "$hotspots" },
          topHotspots: { $slice: ["$hotspots", 5] }
        }
      }
    ]).toArray();

    console.log(`Temporal-Spatial Analysis - ${temporalSpatialAnalysis.length} hourly patterns`);

    return {
      complexSpatialResults: complexSpatialAnalysis.length,
      intersectionResults: intersectionAnalysis.length,  
      temporalSpatialResults: temporalSpatialAnalysis.length,

      insights: {
        spatialComplexity: complexSpatialAnalysis,
        deliveryOverlaps: intersectionAnalysis,
        hourlyPatterns: temporalSpatialAnalysis
      }
    };
  }

  async optimizeGeospatialPerformance() {
    console.log("=== Geospatial Performance Optimization ===");

    // 1. Index performance analysis
    const locations = this.db.collection('locations');

    // Test different query patterns
    const performanceTests = [
      {
        name: "Simple Proximity Query",
        query: {
          location: {
            $near: {
              $geometry: { type: "Point", coordinates: [-122.4194, 37.7749] },
              $maxDistance: 5000
            }
          }
        }
      },
      {
        name: "Filtered Proximity Query", 
        query: {
          location: {
            $near: {
              $geometry: { type: "Point", coordinates: [-122.4194, 37.7749] },
              $maxDistance: 5000
            }
          },
          category: "restaurant",
          "business.isActive": true
        }
      },
      {
        name: "Geo Within Query",
        query: {
          location: {
            $geoWithin: {
              $centerSphere: [[-122.4194, 37.7749], 5 / 3963.2] // 5 miles
            }
          }
        }
      }
    ];

    const performanceResults = [];

    for (const test of performanceTests) {
      const startTime = Date.now();

      const results = await locations.find(test.query)
        .limit(20)
        .explain("executionStats");

      const executionTime = Date.now() - startTime;

      performanceResults.push({
        testName: test.name,
        executionTimeMs: executionTime,
        documentsExamined: results.executionStats.totalDocsExamined,
        documentsReturned: results.executionStats.nReturned,
        indexUsed: results.executionStats.executionStages?.indexName ||
                   results.executionStats.executionStages?.inputStage?.indexName || "none",
        efficiency: results.executionStats.nReturned / Math.max(results.executionStats.totalDocsExamined, 1)
      });
    }

    console.log("Performance Test Results:");
    performanceResults.forEach(result => {
      console.log(`${result.testName}: ${result.executionTimeMs}ms, Efficiency: ${(result.efficiency * 100).toFixed(1)}%`);
    });

    // 2. Index recommendations
    const indexRecommendations = await this.analyzeIndexUsage(locations);

    // 3. Memory usage optimization
    const memoryOptimization = await this.optimizeMemoryUsage(locations);

    return {
      performanceResults,
      indexRecommendations,
      memoryOptimization,

      recommendations: [
        "Use 2dsphere indexes for Earth-based coordinates",
        "Include commonly filtered fields in compound indexes",
        "Limit result sets with appropriate $maxDistance values", 
        "Use $geoNear aggregation for complex distance-based analytics",
        "Monitor index usage and query patterns regularly"
      ]
    };
  }

  async analyzeIndexUsage(collection) {
    // Get index usage statistics
    const indexStats = await collection.aggregate([
      { $indexStats: {} }
    ]).toArray();

    const recommendations = [];

    indexStats.forEach(stat => {
      // Normalize to operations per day since the server began tracking this index
      const daysSinceTracking = Math.max((Date.now() - (stat.accesses.since?.getTime() ?? Date.now())) / 86400000, 1);
      const usageRatio = stat.accesses.ops / daysSinceTracking;

      if (usageRatio < 0.001) {
        recommendations.push({
          type: "remove",
          index: stat.name,
          reason: "Low usage index - consider removing",
          usage: usageRatio
        });
      } else if (usageRatio > 10) {
        recommendations.push({
          type: "optimize",
          index: stat.name, 
          reason: "High usage index - ensure optimal configuration",
          usage: usageRatio
        });
      }
    });

    return {
      totalIndexes: indexStats.length,
      recommendations: recommendations,
      indexStats: indexStats
    };
  }

  async optimizeMemoryUsage(collection) {
    // Analyze document sizes and memory patterns
    const sizeAnalysis = await collection.aggregate([
      {
        $project: {
          documentSize: { $bsonSize: "$$ROOT" },
          hasLocationHistory: { $ne: ["$locationHistory", null] },
          locationHistorySize: { $size: { $ifNull: ["$locationHistory", []] } },
          hasDeliveryZones: { $ne: ["$deliveryZones", null] }
        }
      },
      {
        $group: {
          _id: null,

          avgDocumentSize: { $avg: "$documentSize" },
          maxDocumentSize: { $max: "$documentSize" },
          minDocumentSize: { $min: "$documentSize" },

          largeDocuments: { $sum: { $cond: [{ $gt: ["$documentSize", 16384] }, 1, 0] } }, // > 16KB
          documentsWithHistory: { $sum: { $cond: ["$hasLocationHistory", 1, 0] } },
          avgHistorySize: { $avg: "$locationHistorySize" },

          totalDocuments: { $sum: 1 }
        }
      }
    ]).toArray();

    const analysis = sizeAnalysis[0] || {};

    const optimizationTips = [];

    if (analysis.avgDocumentSize > 8192) {
      optimizationTips.push("Consider splitting large documents or using references");
    }

    if (analysis.avgHistorySize > 100) {
      optimizationTips.push("Limit location history array size or archive old data");
    }

    if (analysis.largeDocuments > analysis.totalDocuments * 0.1) {
      optimizationTips.push("High number of large documents - review document structure");
    }

    return {
      sizeAnalysis: analysis,
      optimizationTips: optimizationTips,

      recommendations: {
        documentSize: "Keep documents under 16MB, optimal under 1MB",
        arrays: "Limit embedded arrays to prevent unbounded growth", 
        indexing: "Use partial indexes for sparse geospatial data",
        sharding: "Consider sharding key that includes geospatial distribution"
      }
    };
  }
}
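
A short sketch of how the index manager might be wired up in practice (assuming a reachable local mongod; the database name mirrors the earlier examples):

// Hypothetical wiring for the GeospatialIndexManager class defined above
const { MongoClient } = require('mongodb');

const runGeospatialOptimization = async () => {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  try {
    const manager = new GeospatialIndexManager(client.db('location_services'));

    await manager.setupGeospatialIndexing();                  // create 2dsphere / 2d / trajectory indexes
    const queryInsights = await manager.performAdvancedSpatialQueries();
    const tuning = await manager.optimizeGeospatialPerformance();

    console.log(`Spatial query result sets: ${queryInsights.complexSpatialResults}`);
    console.log(tuning.recommendations);
  } finally {
    await client.close();
  }
};

runGeospatialOptimization().catch(console.error);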

SQL-Style Geospatial Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB's powerful geospatial capabilities:

-- QueryLeaf geospatial operations with SQL-familiar syntax

-- Create geospatial-enabled table/collection
CREATE TABLE locations (
  id OBJECTID PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  category VARCHAR(100),

  -- Geospatial columns with native GeoJSON support
  location POINT NOT NULL, -- GeoJSON Point
  service_area POLYGON,    -- GeoJSON Polygon
  delivery_zones MULTIPOLYGON, -- GeoJSON MultiPolygon

  -- Business data
  rating DECIMAL(3,2),
  total_reviews INTEGER DEFAULT 0,
  is_active BOOLEAN DEFAULT true,

  -- Address information
  address DOCUMENT {
    street VARCHAR(255),
    city VARCHAR(100),
    state VARCHAR(50),
    country VARCHAR(100),
    postal_code VARCHAR(20)
  },

  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create geospatial indexes
CREATE SPATIAL INDEX idx_locations_location ON locations (location);
CREATE SPATIAL INDEX idx_locations_service_area ON locations (service_area);
CREATE COMPOUND INDEX idx_locations_category_geo ON locations (category, location);

-- Insert location data with geospatial coordinates
INSERT INTO locations (name, category, location, service_area, address, rating, total_reviews)
VALUES (
  'Blue Bottle Coffee',
  'cafe', 
  ST_POINT(-122.3937, 37.7955), -- Longitude, Latitude
  ST_POLYGON(ARRAY[
    ARRAY[-122.4050, 37.7850], -- Southwest
    ARRAY[-122.3850, 37.7850], -- Southeast  
    ARRAY[-122.3850, 37.8050], -- Northeast
    ARRAY[-122.4050, 37.8050], -- Northwest
    ARRAY[-122.4050, 37.7850]  -- Close polygon
  ]),
  {
    street: '1 Ferry Building',
    city: 'San Francisco',
    state: 'CA',
    country: 'USA',
    postal_code: '94111'
  },
  4.6,
  1247
);

-- Proximity search - find nearby locations
SELECT 
  id,
  name,
  category,
  rating,

  -- Calculate distance in meters
  ST_DISTANCE(location, ST_POINT(-122.4194, 37.7749)) as distance_meters,

  -- Extract coordinates for display
  ST_X(location) as longitude,
  ST_Y(location) as latitude,

  -- Address information
  address.street,
  address.city,
  address.state

FROM locations
WHERE 
  is_active = true
  AND ST_DISTANCE(location, ST_POINT(-122.4194, 37.7749)) <= 5000 -- Within 5km
  AND category IN ('cafe', 'restaurant', 'retail')
ORDER BY ST_DISTANCE(location, ST_POINT(-122.4194, 37.7749))
LIMIT 20;

-- Advanced proximity search with relevance scoring
WITH nearby_locations AS (
  SELECT 
    *,
    ST_DISTANCE(location, ST_POINT(-122.4194, 37.7749)) as distance_meters
  FROM locations
  WHERE 
    is_active = true
    AND ST_DWITHIN(location, ST_POINT(-122.4194, 37.7749), 10000) -- 10km radius
),
scored_locations AS (
  SELECT *,
    -- Relevance scoring: distance (40%) + rating (30%) + reviews (30%)
    (
      (1000 - LEAST(distance_meters, 1000)) / 1000 * 0.4 +
      (rating / 5.0) * 0.3 +
      (LEAST(total_reviews, 1000) / 1000.0) * 0.3
    ) as relevance_score,

    -- Distance categories
    CASE 
      WHEN distance_meters <= 1000 THEN 'nearby'
      WHEN distance_meters <= 5000 THEN 'moderate'
      ELSE 'distant'
    END as distance_category

  FROM nearby_locations
)
SELECT 
  name,
  category, 
  rating,
  total_reviews,
  ROUND(distance_meters) as distance_m,
  distance_category,
  ROUND(relevance_score, 3) as relevance,

  -- Format coordinates for maps
  CONCAT(
    ROUND(ST_Y(location), 6), ',', 
    ROUND(ST_X(location), 6)
  ) as lat_lng

FROM scored_locations
ORDER BY relevance_score DESC, distance_meters ASC
LIMIT 25;

-- Geospatial area queries
SELECT 
  l.name,
  l.category,
  l.rating,

  -- Check if location is within specific area
  ST_CONTAINS(
    ST_POLYGON(ARRAY[
      ARRAY[-122.4270, 37.7609], -- Downtown SF polygon
      ARRAY[-122.3968, 37.7609],
      ARRAY[-122.3968, 37.7908], 
      ARRAY[-122.4270, 37.7908],
      ARRAY[-122.4270, 37.7609]
    ]),
    l.location
  ) as is_in_downtown,

  -- Check service area coverage
  ST_CONTAINS(l.service_area, ST_POINT(-122.4194, 37.7749)) as serves_user_location

FROM locations l
WHERE 
  l.is_active = true
  AND ST_INTERSECTS(
    l.location,
    ST_POLYGON(ARRAY[
      ARRAY[-122.4270, 37.7609],
      ARRAY[-122.3968, 37.7609], 
      ARRAY[-122.3968, 37.7908],
      ARRAY[-122.4270, 37.7908],
      ARRAY[-122.4270, 37.7609]
    ])
  );

-- Complex geospatial analytics with aggregation
WITH location_analytics AS (
  SELECT 
    category,

    -- Spatial clustering analysis
    ST_CLUSTERKMEANS(location, 5) OVER () as cluster_id,

    -- Distance from city center
    ST_DISTANCE(location, ST_POINT(-122.4194, 37.7749)) as distance_from_center,

    -- Geospatial grid for density analysis
    ST_SNAPGRID(location, 0.001, 0.001) as grid_cell,

    name,
    rating,
    total_reviews,
    location

  FROM locations
  WHERE is_active = true
),
cluster_analysis AS (
  SELECT 
    cluster_id,
    category,
    COUNT(*) as location_count,
    AVG(rating) as avg_rating,
    AVG(distance_from_center) as avg_distance_from_center,

    -- Calculate cluster centroid
    ST_CENTROID(ST_COLLECT(location)) as cluster_center,

    -- Calculate cluster bounds
    ST_ENVELOPE(ST_COLLECT(location)) as cluster_bounds,

    -- Business metrics
    SUM(total_reviews) as total_reviews,
    AVG(total_reviews) as avg_reviews_per_location

  FROM location_analytics
  GROUP BY cluster_id, category
),
grid_density AS (
  SELECT 
    grid_cell,
    COUNT(DISTINCT category) as category_diversity,
    COUNT(*) as location_density,
    AVG(rating) as avg_rating,

    -- Calculate grid cell center
    ST_CENTROID(grid_cell) as grid_center

  FROM location_analytics
  GROUP BY grid_cell
  HAVING COUNT(*) >= 3 -- Only dense grid cells
)
SELECT 
  ca.cluster_id,
  ca.category,
  ca.location_count,
  ROUND(ca.avg_rating, 2) as avg_rating,
  ROUND(ca.avg_distance_from_center) as avg_distance_m,

  -- Cluster geographic data
  ST_X(ca.cluster_center) as cluster_lng,
  ST_Y(ca.cluster_center) as cluster_lat,

  -- Calculate cluster area in square meters
  ST_AREA(ca.cluster_bounds, true) as cluster_area_sqm,

  -- Density metrics
  ROUND(ca.location_count / ST_AREA(ca.cluster_bounds, true) * 1000000, 2) as density_per_sqkm,

  -- Business performance
  ca.total_reviews,
  ROUND(ca.avg_reviews_per_location) as avg_reviews,

  -- Nearby high-density areas
  (
    SELECT COUNT(*)
    FROM grid_density gd
    WHERE ST_DISTANCE(ca.cluster_center, gd.grid_center) <= 1000
  ) as nearby_dense_areas

FROM cluster_analysis ca
WHERE ca.location_count >= 2
ORDER BY ca.location_count DESC, ca.avg_rating DESC;

-- Geofencing and real-time location queries
CREATE TABLE geofences (
  id OBJECTID PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  geofence_area POLYGON NOT NULL,
  geofence_type VARCHAR(50) DEFAULT 'notification',
  is_active BOOLEAN DEFAULT true,

  -- Trigger configuration
  config DOCUMENT {
    on_enter BOOLEAN DEFAULT true,
    on_exit BOOLEAN DEFAULT true,
    on_dwell BOOLEAN DEFAULT false,
    dwell_time_minutes INTEGER DEFAULT 5,
    max_triggers_per_day INTEGER DEFAULT 10
  },

  -- Analytics tracking
  analytics DOCUMENT {
    total_enters INTEGER DEFAULT 0,
    total_exits INTEGER DEFAULT 0,
    unique_users INTEGER DEFAULT 0,
    avg_dwell_minutes DECIMAL(8,2) DEFAULT 0
  },

  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE SPATIAL INDEX idx_geofences_area ON geofences (geofence_area);

-- Check geofence triggers for user location
SELECT 
  gf.id,
  gf.name,
  gf.geofence_type,

  -- Check if user location triggers geofence
  ST_CONTAINS(gf.geofence_area, ST_POINT(-122.4150, 37.7750)) as is_triggered,

  -- Calculate distance to geofence edge
  ST_DISTANCE(
    ST_POINT(-122.4150, 37.7750),
    ST_BOUNDARY(gf.geofence_area)
  ) as distance_to_edge_m,

  -- Geofence area and perimeter
  ST_AREA(gf.geofence_area, true) as area_sqm,
  ST_PERIMETER(gf.geofence_area, true) as perimeter_m,

  -- Configuration and analytics
  gf.config,
  gf.analytics

FROM geofences gf
WHERE 
  gf.is_active = true
  AND (
    ST_CONTAINS(gf.geofence_area, ST_POINT(-122.4150, 37.7750)) -- Inside geofence
    OR ST_DISTANCE(
      ST_POINT(-122.4150, 37.7750), 
      gf.geofence_area
    ) <= 100 -- Within 100m of geofence
  );

-- Time-based geospatial analysis
CREATE TABLE user_location_history (
  id OBJECTID PRIMARY KEY,
  user_id OBJECTID NOT NULL,
  location POINT NOT NULL,
  recorded_at TIMESTAMP NOT NULL,
  accuracy_meters DECIMAL(8,2),
  activity_type VARCHAR(50),

  -- Movement data
  speed_mps DECIMAL(8,2), -- meters per second
  heading_degrees INTEGER, -- 0-360 degrees from north

  -- Context information
  context DOCUMENT {
    battery_level INTEGER,
    connection_type VARCHAR(50),
    app_state VARCHAR(50)
  }
);

CREATE COMPOUND INDEX idx_user_location_time_geo ON user_location_history (
  user_id, recorded_at, location
);

-- Movement pattern analysis
WITH user_movements AS (
  SELECT 
    user_id,
    location,
    recorded_at,

    -- Calculate distance from previous location
    ST_DISTANCE(
      location,
      LAG(location) OVER (
        PARTITION BY user_id 
        ORDER BY recorded_at
      )
    ) as movement_distance,

    -- Time since previous location
    EXTRACT(EPOCH FROM (
      recorded_at - LAG(recorded_at) OVER (
        PARTITION BY user_id 
        ORDER BY recorded_at
      )
    )) as time_elapsed_seconds,

    -- Previous location for trajectory analysis
    LAG(location) OVER (
      PARTITION BY user_id 
      ORDER BY recorded_at
    ) as previous_location

  FROM user_location_history
  WHERE recorded_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
),
movement_metrics AS (
  SELECT 
    user_id,
    COUNT(*) as location_points,
    SUM(movement_distance) as total_distance_m,
    AVG(movement_distance / NULLIF(time_elapsed_seconds, 0)) as avg_speed_mps,
    MAX(movement_distance / NULLIF(time_elapsed_seconds, 0)) as max_speed_mps,

    -- Create trajectory line
    ST_MAKELINE(ARRAY_AGG(location ORDER BY recorded_at)) as trajectory,

    -- Calculate bounding box of movement
    ST_ENVELOPE(ST_COLLECT(location)) as movement_bounds,

    -- Time-based metrics
    MIN(recorded_at) as journey_start,
    MAX(recorded_at) as journey_end,
    EXTRACT(EPOCH FROM (MAX(recorded_at) - MIN(recorded_at))) as journey_duration_seconds,

    -- Movement patterns
    COUNT(DISTINCT ST_SNAPGRID(location, 0.001, 0.001)) as unique_areas_visited

  FROM user_movements
  WHERE movement_distance IS NOT NULL
    AND time_elapsed_seconds > 0
    AND movement_distance < 10000 -- Filter out GPS errors
  GROUP BY user_id
)
SELECT 
  user_id,
  location_points,
  ROUND(total_distance_m) as total_distance_m,
  ROUND(total_distance_m / 1000.0, 2) as total_distance_km,
  ROUND(avg_speed_mps * 3.6, 1) as avg_speed_kmh, -- Convert to km/h
  ROUND(max_speed_mps * 3.6, 1) as max_speed_kmh,

  -- Journey characteristics
  journey_start,
  journey_end,
  ROUND(journey_duration_seconds / 3600.0, 1) as journey_hours,
  unique_areas_visited,

  -- Trajectory analysis
  ST_LENGTH(trajectory, true) as trajectory_length_m,
  ST_AREA(movement_bounds, true) as coverage_area_sqm,

  -- Movement efficiency (straight-line vs actual distance)
  ROUND(
    ST_DISTANCE(
      ST_STARTPOINT(trajectory),
      ST_ENDPOINT(trajectory)
    ) / NULLIF(ST_LENGTH(trajectory, true), 0) * 100, 1
  ) as movement_efficiency_pct,

  -- Geographic extent
  ST_XMIN(movement_bounds) as min_longitude,
  ST_XMAX(movement_bounds) as max_longitude, 
  ST_YMIN(movement_bounds) as min_latitude,
  ST_YMAX(movement_bounds) as max_latitude

FROM movement_metrics
WHERE total_distance_m > 100 -- Minimum movement threshold
ORDER BY total_distance_m DESC
LIMIT 50;

-- Location-based recommendations engine
WITH user_context AS ( -- renamed to avoid clashing with the user_preferences table joined below
  SELECT 
    u.user_id,
    u.location as current_location,

    -- User preference analysis based on visit history
    up.preferred_categories,
    up.avg_rating_threshold,
    up.max_distance_preference,
    up.price_range_preference

  FROM user_profiles u
  JOIN user_preferences up ON u.user_id = up.user_id
  WHERE u.is_active = true
),
location_scoring AS (
  SELECT 
    l.*,
    up.user_id,

    -- Distance scoring
    ST_DISTANCE(l.location, up.current_location) as distance_m,
    EXP(-ST_DISTANCE(l.location, up.current_location) / 2000.0) as distance_score,

    -- Category preference scoring
    CASE 
      WHEN l.category = ANY(up.preferred_categories) THEN 1.0
      WHEN COALESCE(ARRAY_LENGTH(up.preferred_categories, 1), 0) = 0 THEN 0.5
      ELSE 0.2
    END as category_score,

    -- Rating scoring
    l.rating / 5.0 as rating_score,

    -- Popularity scoring based on reviews
    LN(l.total_reviews + 1) / LN(1000) as popularity_score,

    -- Time-based scoring (open/closed)
    CASE 
      WHEN EXTRACT(DOW FROM CURRENT_TIMESTAMP) = 0 THEN -- Sunday
        CASE WHEN l.hours.sunday.is_open THEN 1.0 ELSE 0.3 END
      WHEN EXTRACT(DOW FROM CURRENT_TIMESTAMP) = 1 THEN -- Monday
        CASE WHEN l.hours.monday.is_open THEN 1.0 ELSE 0.3 END
      -- ... other days
      ELSE 0.8
    END as availability_score

  FROM locations l
  CROSS JOIN user_context up
  WHERE 
    l.is_active = true
    AND ST_DISTANCE(l.location, up.current_location) <= up.max_distance_preference
    AND l.rating >= up.avg_rating_threshold
),
final_recommendations AS (
  SELECT *,
    -- Combined relevance score
    (
      distance_score * 0.25 +
      category_score * 0.30 +
      rating_score * 0.20 +
      popularity_score * 0.15 +
      availability_score * 0.10
    ) as relevance_score

  FROM location_scoring
)
SELECT 
  user_id,
  name as location_name,
  category,
  rating,
  total_reviews,
  ROUND(distance_m) as distance_meters,
  ROUND(relevance_score, 3) as relevance,

  -- Location details for display
  ST_X(location) as longitude,
  ST_Y(location) as latitude,
  address.street || ', ' || address.city as display_address,

  -- Recommendation reasoning
  CASE 
    WHEN category_score = 1.0 THEN 'Matches your preferences'
    WHEN distance_score > 0.8 THEN 'Very close to you'
    WHEN rating_score >= 0.9 THEN 'Highly rated'
    WHEN popularity_score > 0.5 THEN 'Popular destination'
    ELSE 'Good option nearby'
  END as recommendation_reason

FROM final_recommendations
WHERE relevance_score > 0.3
ORDER BY user_id, relevance_score DESC
LIMIT 10 PER user_id;

-- QueryLeaf geospatial features provide:
-- 1. Native GeoJSON support with SQL-familiar geometry functions
-- 2. Spatial indexing with automatic optimization for Earth-based coordinates
-- 3. Distance calculations and proximity queries with intuitive syntax
-- 4. Complex geospatial aggregations and analytics using familiar SQL patterns
-- 5. Geofencing capabilities with real-time trigger detection
-- 6. Movement pattern analysis and trajectory tracking
-- 7. Location-based recommendation engines with multi-factor scoring
-- 8. Integration with MongoDB's native geospatial operators and functions
-- 9. Performance optimization through intelligent query planning
-- 10. Seamless scaling from simple proximity queries to complex spatial analytics

Best Practices for Geospatial Implementation

Coordinate System Selection

Choose the appropriate coordinate system and indexing strategy (a brief code sketch follows the list below):

  1. 2dsphere Index: Use for Earth-based coordinates with spherical geometry calculations
  2. 2d Index: Use for flat coordinate systems like game maps or floor plans
  3. Coordinate Format: MongoDB uses [longitude, latitude] format (opposite of many mapping APIs)
  4. Precision Considerations: Balance coordinate precision with storage and performance requirements
  5. Projection Selection: Choose appropriate coordinate reference system for your geographic region
  6. Distance Units: Ensure consistent distance units throughout your application
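
To make these conventions concrete, here is a minimal sketch using the Node.js driver. The database, collection, and field names (geo_demo, locations, location) are illustrative assumptions rather than names used elsewhere in this article.

// Coordinate system selection sketch (Node.js driver, illustrative names)
const { MongoClient } = require('mongodb');

async function setupLocationIndexes() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const locations = client.db('geo_demo').collection('locations');

  // GeoJSON points use [longitude, latitude] order
  await locations.insertOne({
    name: 'Sample Cafe',
    category: 'cafe',
    location: { type: 'Point', coordinates: [-122.4194, 37.7749] } // lng, lat
  });

  // 2dsphere index for Earth-based (spherical) geometry calculations
  await locations.createIndex({ location: '2dsphere' });

  // A 2d index would instead suit flat planes such as game maps or floor plans:
  // await flatMaps.createIndex({ position: '2d' });

  await client.close();
}

setupLocationIndexes().catch(console.error);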

Performance Optimization

Optimize geospatial queries for high performance and scalability (see the sketch after this list):

  1. Index Strategy: Create compound indexes that support your most common query patterns
  2. Query Limits: Use $maxDistance and $minDistance to limit search scope
  3. Result Pagination: Implement proper pagination for large result sets
  4. Memory Management: Monitor working set size and optimize document structure
  5. Aggregation Optimization: Use $geoNear for distance-based aggregations when possible
  6. Sharding Strategy: Consider geospatial distribution when designing sharding keys
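
As a hedged illustration of these guidelines, the sketch below pairs a compound index led by a 2dsphere key with a $geoNear stage bounded by maxDistance. The collection and field names (locations, category, rating, is_active) mirror the SQL examples above but are otherwise assumptions. $geoNear already returns results in ascending distance order, so only a limit is applied afterward.

// Proximity query sketch applying the guidelines above (illustrative names)
const { MongoClient } = require('mongodb');

async function nearbyCafes(userLng, userLat) {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const locations = client.db('geo_demo').collection('locations');

  // Index supporting the common pattern: spatial filter plus category/rating predicates
  await locations.createIndex({ location: '2dsphere', category: 1, rating: -1 });

  // $geoNear must be the first pipeline stage; maxDistance bounds the search scope in meters
  const results = await locations.aggregate([
    {
      $geoNear: {
        near: { type: 'Point', coordinates: [userLng, userLat] },
        distanceField: 'distance_m',
        maxDistance: 5000,                    // only documents within 5 km
        query: { category: 'cafe', is_active: true },
        spherical: true
      }
    },
    { $limit: 20 }                            // paginate rather than returning everything
  ]).toArray();

  await client.close();
  return results;
}

nearbyCafes(-122.4194, 37.7749).then(console.log).catch(console.error);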

Conclusion

MongoDB's geospatial capabilities provide comprehensive location-aware functionality that eliminates the complexity of traditional spatial database extensions while delivering strong performance and scalability. The native support for GeoJSON, multiple coordinate systems, and sophisticated spatial operations makes building location-based applications both powerful and intuitive.

Key geospatial benefits include:

  • Native Spatial Support: Built-in GeoJSON support without additional extensions or setup
  • High Performance: Optimized spatial indexing and query execution for billions of documents
  • Rich Query Capabilities: Comprehensive spatial operators for proximity, intersection, and containment
  • Flexible Data Models: Store complex location data with business context in single documents
  • Real-time Processing: Efficient geofencing and location tracking for live applications
  • Scalable Architecture: Horizontal scaling across distributed clusters with location-aware sharding

Whether you're building ride-sharing platforms, delivery applications, location-based social networks, or IoT sensor networks, MongoDB's geospatial features with QueryLeaf's familiar SQL interface provide the foundation for sophisticated location-aware applications. This combination enables you to implement complex spatial functionality while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB geospatial operations while providing SQL-familiar spatial query syntax, coordinate system handling, and geographic analysis functions. Advanced geospatial indexing, proximity calculations, and spatial analytics are seamlessly handled through familiar SQL patterns, making location-based application development both powerful and accessible.

The integration of native geospatial capabilities with SQL-style spatial operations makes MongoDB an ideal platform for applications requiring both sophisticated location functionality and familiar database interaction patterns, ensuring your geospatial solutions remain both effective and maintainable as they scale and evolve.

MongoDB Time Series Collections and IoT Data Management: SQL-Style Time Series Analytics with High-Performance Data Ingestion

Modern IoT applications generate massive volumes of time-stamped data from sensors, devices, and monitoring systems, and this data requires specialized storage, querying, and analysis capabilities. Traditional relational databases struggle with time series workloads due to their rigid schema requirements, poor compression for temporal data, and inefficient querying patterns for time-based aggregations and analytics.

MongoDB Time Series Collections provide purpose-built capabilities for storing, querying, and analyzing time-stamped data with automatic partitioning, compression, and optimized indexing. Unlike traditional collection storage, time series collections automatically organize data by time ranges, apply sophisticated compression algorithms, and provide specialized query patterns optimized for temporal analytics and IoT workloads.

The Traditional Time Series Challenge

Relational database approaches to time series data have significant performance and scalability limitations:

-- Traditional relational time series design - inefficient and complex

-- PostgreSQL time series approach with partitioning
CREATE TABLE sensor_readings (
    reading_id BIGSERIAL,
    sensor_id VARCHAR(100) NOT NULL,
    device_id VARCHAR(100) NOT NULL,
    location VARCHAR(200),
    timestamp TIMESTAMP NOT NULL,
    temperature DECIMAL(5,2),
    humidity DECIMAL(5,2),
    pressure DECIMAL(7,2),
    battery_level DECIMAL(3,2),
    signal_strength INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) PARTITION BY RANGE (timestamp);

-- Create monthly partitions (manual maintenance required)
CREATE TABLE sensor_readings_2024_01 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE sensor_readings_2024_02 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE TABLE sensor_readings_2024_03 PARTITION OF sensor_readings
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
-- ... manual partition creation for each month

-- Indexes for time series queries
CREATE INDEX idx_sensor_readings_timestamp ON sensor_readings (timestamp);
CREATE INDEX idx_sensor_readings_sensor_id_timestamp ON sensor_readings (sensor_id, timestamp);
CREATE INDEX idx_sensor_readings_device_timestamp ON sensor_readings (device_id, timestamp);

-- Complex time series aggregation query
SELECT 
    sensor_id,
    device_id,
    DATE_TRUNC('hour', timestamp) as hour_bucket,

    -- Statistical aggregations
    COUNT(*) as reading_count,
    AVG(temperature) as avg_temperature,
    MIN(temperature) as min_temperature,
    MAX(temperature) as max_temperature,
    STDDEV(temperature) as temp_stddev,

    AVG(humidity) as avg_humidity,
    AVG(pressure) as avg_pressure,
    AVG(battery_level) as avg_battery,

    -- Time-based calculations (first/last reading per hour bucket;
    -- raw columns cannot be used in window functions alongside GROUP BY)
    (ARRAY_AGG(temperature ORDER BY timestamp ASC))[1] as first_temp,
    (ARRAY_AGG(temperature ORDER BY timestamp DESC))[1] as last_temp,

    -- Lag calculations for trends
    LAG(AVG(temperature)) OVER (
        PARTITION BY sensor_id 
        ORDER BY DATE_TRUNC('hour', timestamp)
    ) as prev_hour_avg_temp

FROM sensor_readings
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
    AND sensor_id IN ('TEMP_001', 'TEMP_002', 'TEMP_003')
GROUP BY sensor_id, device_id, DATE_TRUNC('hour', timestamp)
ORDER BY sensor_id, hour_bucket;

-- Problems with traditional time series approach:
-- 1. Manual partition management and maintenance overhead
-- 2. Poor compression ratios for time-stamped data
-- 3. Complex query patterns for time-based aggregations
-- 4. Limited scalability for high-frequency data ingestion
-- 5. Inefficient storage for sparse or irregular time series
-- 6. Difficult downsampling and data retention management
-- 7. Poor performance for cross-time-range analytics
-- 8. Complex indexing strategies for temporal queries

-- InfluxDB-style approach (specialized but limited)
-- INSERT INTO sensor_data,sensor_id=TEMP_001,device_id=DEV_001,location=warehouse_A 
--   temperature=23.5,humidity=65.2,pressure=1013.25,battery_level=85.3 1640995200000000000

-- InfluxDB limitations:
-- - Specialized query language (InfluxQL/Flux) not SQL compatible
-- - Limited JOIN capabilities across measurements
-- - Complex data modeling for hierarchical sensor networks
-- - Difficult integration with existing application stacks
-- - Limited support for complex business logic
-- - Vendor lock-in with proprietary tools and ecosystem
-- - Complex migration paths from existing SQL-based systems

MongoDB Time Series Collections provide comprehensive time series capabilities:

// MongoDB Time Series Collections - purpose-built for temporal data
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('iot_platform');

// Create time series collection with automatic optimization
const createTimeSeriesCollection = async () => {
  try {
    // Create time series collection with comprehensive configuration
    const collection = await db.createCollection('sensor_readings', {
      timeseries: {
        // Time field - required, used for automatic partitioning
        timeField: 'timestamp',

        // Meta field - optional, groups related time series together
        metaField: 'metadata',

        // Granularity for automatic bucketing and compression
        granularity: 'minutes' // 'seconds', 'minutes', 'hours'
      },

      // Automatic expiration for data retention (top-level option, not part of timeseries)
      expireAfterSeconds: 60 * 60 * 24 * 365 // 1 year retention
    });

    console.log('Time series collection created successfully');
    return collection;

  } catch (error) {
    console.error('Error creating time series collection:', error);
    throw error;
  }
};

// High-performance time series data ingestion
const ingestSensorData = async () => {
  const sensorReadings = db.collection('sensor_readings');

  // Batch insert for optimal performance
  const batchData = [];
  const batchSize = 1000;
  const currentTime = new Date();

  // Generate realistic IoT sensor data
  for (let i = 0; i < batchSize; i++) {
    const timestamp = new Date(currentTime.getTime() - (i * 60000)); // Every minute

    // Multiple sensors per batch
    ['TEMP_001', 'TEMP_002', 'TEMP_003', 'HUM_001', 'PRESS_001'].forEach(sensorId => {
      batchData.push({
        // Time field (required for time series)
        timestamp: timestamp,

        // Metadata field - groups related measurements
        metadata: {
          sensorId: sensorId,
          deviceId: sensorId.startsWith('TEMP') ? 'CLIMATE_DEV_001' : 
                   sensorId.startsWith('HUM') ? 'CLIMATE_DEV_001' : 'PRESSURE_DEV_001',
          location: {
            building: 'Warehouse_A',
            floor: 1,
            room: 'Storage_Room_1',
            coordinates: {
              x: Math.floor(Math.random() * 100),
              y: Math.floor(Math.random() * 100)
            }
          },
          sensorType: sensorId.startsWith('TEMP') ? 'temperature' :
                     sensorId.startsWith('HUM') ? 'humidity' : 'pressure',
          unit: sensorId.startsWith('TEMP') ? 'celsius' :
                sensorId.startsWith('HUM') ? 'percent' : 'hPa',
          calibrationDate: new Date('2024-01-01'),
          firmwareVersion: '2.1.3'
        },

        // Measurement data - varies by sensor type
        measurements: generateMeasurements(sensorId, timestamp),

        // System metadata
        ingestionTime: new Date(),
        dataQuality: {
          isValid: Math.random() > 0.02, // 2% invalid readings
          confidence: 0.95 + (Math.random() * 0.05), // 95-100% confidence
          calibrationStatus: 'valid',
          lastCalibration: new Date('2024-01-01')
        },

        // Device health metrics
        deviceHealth: {
          batteryLevel: 85 + Math.random() * 15, // 85-100%
          signalStrength: -30 - Math.random() * 40, // -30 to -70 dBm
          temperature: 20 + Math.random() * 10, // Device temp 20-30°C
          uptime: Math.floor(Math.random() * 86400 * 30) // Up to 30 days
        }
      });
    });
  }

  // Batch insert for optimal ingestion performance
  try {
    const result = await sensorReadings.insertMany(batchData, { 
      ordered: false, // Allow parallel insertions
      writeConcern: { w: 1 } // Optimize for ingestion speed
    });

    console.log(`Inserted ${result.insertedCount} sensor readings`);
    return result;

  } catch (error) {
    console.error('Error inserting sensor data:', error);
    throw error;
  }
};

function generateMeasurements(sensorId, timestamp) {
  const baseValues = {
    'TEMP_001': { value: 22, variance: 5 },
    'TEMP_002': { value: 24, variance: 3 },
    'TEMP_003': { value: 20, variance: 4 },
    'HUM_001': { value: 65, variance: 15 },
    'PRESS_001': { value: 1013.25, variance: 5 }
  };

  const base = baseValues[sensorId];
  if (!base) return {};

  // Add some realistic patterns and noise
  const hourOfDay = timestamp.getHours();
  const seasonalEffect = Math.sin((timestamp.getMonth() * Math.PI) / 6) * 2;
  const dailyEffect = Math.sin((hourOfDay * Math.PI) / 12) * 1.5;
  const randomNoise = (Math.random() - 0.5) * base.variance;

  const value = base.value + seasonalEffect + dailyEffect + randomNoise;

  return {
    value: Math.round(value * 100) / 100,
    rawValue: value,
    processed: true,

    // Statistical context
    range: {
      min: base.value - base.variance,
      max: base.value + base.variance
    },

    // Quality indicators
    outlierScore: Math.abs(randomNoise) / base.variance,
    trend: dailyEffect > 0 ? 'increasing' : 'decreasing'
  };
}

// Advanced time series queries and analytics
const performTimeSeriesAnalytics = async () => {
  const sensorReadings = db.collection('sensor_readings');

  // 1. Real-time dashboard data - last 24 hours
  const realtimeDashboard = await sensorReadings.aggregate([
    // Filter to last 24 hours
    {
      $match: {
        timestamp: {
          $gte: new Date(Date.now() - 24 * 60 * 60 * 1000)
        },
        'dataQuality.isValid': true
      }
    },

    // Ensure chronological order so $first/$last per bucket reflect the actual first/last readings
    { $sort: { timestamp: 1 } },

    // Group by sensor and time bucket for aggregation
    {
      $group: {
        _id: {
          sensorId: '$metadata.sensorId',
          sensorType: '$metadata.sensorType',
          location: '$metadata.location.room',
          // 15-minute time buckets
          timeBucket: {
            $dateTrunc: {
              date: '$timestamp',
              unit: 'minute',
              binSize: 15
            }
          }
        },

        // Statistical aggregations
        count: { $sum: 1 },
        avgValue: { $avg: '$measurements.value' },
        minValue: { $min: '$measurements.value' },
        maxValue: { $max: '$measurements.value' },
        stdDev: { $stdDevPop: '$measurements.value' },

        // First and last readings in bucket
        firstReading: { $first: '$measurements.value' },
        lastReading: { $last: '$measurements.value' },

        // Data quality metrics
        validReadings: {
          $sum: { $cond: ['$dataQuality.isValid', 1, 0] }
        },
        avgConfidence: { $avg: '$dataQuality.confidence' },

        // Device health aggregations
        avgBatteryLevel: { $avg: '$deviceHealth.batteryLevel' },
        avgSignalStrength: { $avg: '$deviceHealth.signalStrength' }
      }
    },

    // Calculate derived metrics
    {
      $addFields: {
        // Value change within bucket
        valueChange: { $subtract: ['$lastReading', '$firstReading'] },

        // Coefficient of variation (relative variability)
        coefficientOfVariation: {
          $cond: {
            if: { $ne: ['$avgValue', 0] },
            then: { $divide: ['$stdDev', '$avgValue'] },
            else: 0
          }
        },

        // Data quality ratio
        dataQualityRatio: { $divide: ['$validReadings', '$count'] },

        // Device health status
        deviceHealthStatus: {
          $switch: {
            branches: [
              {
                case: { 
                  $and: [
                    { $gte: ['$avgBatteryLevel', 80] },
                    { $gte: ['$avgSignalStrength', -50] }
                  ]
                },
                then: 'excellent'
              },
              {
                case: { 
                  $and: [
                    { $gte: ['$avgBatteryLevel', 50] },
                    { $gte: ['$avgSignalStrength', -65] }
                  ]
                },
                then: 'good'
              },
              {
                case: { 
                  $or: [
                    { $lt: ['$avgBatteryLevel', 20] },
                    { $lt: ['$avgSignalStrength', -80] }
                  ]
                },
                then: 'critical'
              }
            ],
            default: 'warning'
          }
        }
      }
    },

    // Sort by sensor and time
    {
      $sort: {
        '_id.sensorId': 1,
        '_id.timeBucket': 1
      }
    },

    // Format output for dashboard consumption
    {
      $group: {
        _id: '$_id.sensorId',
        sensorType: { $first: '$_id.sensorType' },
        location: { $first: '$_id.location' },

        // Time series data points
        timeSeries: {
          $push: {
            timestamp: '$_id.timeBucket',
            value: '$avgValue',
            min: '$minValue',
            max: '$maxValue',
            count: '$count',
            quality: '$dataQualityRatio',
            deviceHealth: '$deviceHealthStatus'
          }
        },

        // Aggregate statistics across all time buckets
        overallStats: {
          $push: {
            avg: '$avgValue',
            stdDev: '$stdDev',
            cv: '$coefficientOfVariation'
          }
        },

        // Latest values
        latestValue: { $last: '$avgValue' },
        latestChange: { $last: '$valueChange' },
        latestQuality: { $last: '$dataQualityRatio' }
      }
    },

    // Calculate final sensor-level statistics
    {
      $addFields: {
        overallAvg: { $avg: '$overallStats.avg' },
        overallStdDev: { $avg: '$overallStats.stdDev' },
        avgCV: { $avg: '$overallStats.cv' },

        // Trend analysis
        trend: {
          $cond: {
            if: { $gt: ['$latestChange', 0.1] },
            then: 'increasing',
            else: {
              $cond: {
                if: { $lt: ['$latestChange', -0.1] },
                then: 'decreasing',
                else: 'stable'
              }
            }
          }
        }
      }
    }
  ]).toArray();

  console.log('Real-time dashboard data:', JSON.stringify(realtimeDashboard, null, 2));

  // 2. Anomaly detection using statistical methods
  const anomalyDetection = await sensorReadings.aggregate([
    {
      $match: {
        timestamp: {
          $gte: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000) // Last 7 days
        }
      }
    },

    // Calculate rolling statistics for anomaly detection
    {
      $setWindowFields: {
        partitionBy: '$metadata.sensorId',
        sortBy: { timestamp: 1 },
        output: {
          // Rolling 30-point average and standard deviation
          rollingAvg: {
            $avg: '$measurements.value',
            window: {
              documents: [-15, 15] // 30-point centered window
            }
          },
          rollingStdDev: {
            $stdDevPop: '$measurements.value',
            window: {
              documents: [-15, 15]
            }
          },

          // Previous values for change detection
          prevValue: {
            $first: '$measurements.value',
            window: {
              documents: [-1, -1]
            }
          }
        }
      }
    },

    // Identify anomalies using statistical thresholds
    {
      $addFields: {
        // Z-score calculation
        zScore: {
          $cond: {
            if: { $ne: ['$rollingStdDev', 0] },
            then: {
              $divide: [
                { $subtract: ['$measurements.value', '$rollingAvg'] },
                '$rollingStdDev'
              ]
            },
            else: 0
          }
        },

        // Rate of change
        rateOfChange: {
          $cond: {
            if: { $and: ['$prevValue', { $ne: ['$prevValue', 0] }] },
            then: {
              $divide: [
                { $subtract: ['$measurements.value', '$prevValue'] },
                '$prevValue'
              ]
            },
            else: 0
          }
        }
      }
    },

    // Filter to potential anomalies
    {
      $match: {
        $or: [
          { zScore: { $gt: 3 } }, // Values > 3 standard deviations
          { zScore: { $lt: -3 } },
          { rateOfChange: { $gt: 0.5 } }, // > 50% change
          { rateOfChange: { $lt: -0.5 } }
        ]
      }
    },

    // Classify anomaly types
    {
      $addFields: {
        anomalyType: {
          $switch: {
            branches: [
              {
                case: { $gt: ['$zScore', 3] },
                then: 'statistical_high'
              },
              {
                case: { $lt: ['$zScore', -3] },
                then: 'statistical_low'
              },
              {
                case: { $gt: ['$rateOfChange', 0.5] },
                then: 'rapid_increase'
              },
              {
                case: { $lt: ['$rateOfChange', -0.5] },
                then: 'rapid_decrease'
              }
            ],
            default: 'unknown'
          }
        },

        anomalySeverity: {
          $switch: {
            branches: [
              {
                case: { 
                  $or: [
                    { $gt: ['$zScore', 5] },
                    { $lt: ['$zScore', -5] }
                  ]
                },
                then: 'critical'
              },
              {
                case: { 
                  $or: [
                    { $gt: ['$zScore', 4] },
                    { $lt: ['$zScore', -4] }
                  ]
                },
                then: 'high'
              }
            ],
            default: 'medium'
          }
        }
      }
    },

    // Group anomalies by sensor and type
    {
      $group: {
        _id: {
          sensorId: '$metadata.sensorId',
          anomalyType: '$anomalyType'
        },
        count: { $sum: 1 },
        avgSeverity: { $avg: '$zScore' },
        latestAnomaly: { $max: '$timestamp' },
        anomalies: {
          $push: {
            timestamp: '$timestamp',
            value: '$measurements.value',
            zScore: '$zScore',
            rateOfChange: '$rateOfChange',
            severity: '$anomalySeverity'
          }
        }
      }
    },

    {
      $sort: {
        '_id.sensorId': 1,
        count: -1
      }
    }
  ]).toArray();

  console.log('Anomaly detection results:', JSON.stringify(anomalyDetection, null, 2));

  // 3. Predictive maintenance analysis
  const predictiveMaintenance = await sensorReadings.aggregate([
    {
      $match: {
        timestamp: {
          $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) // Last 30 days
        }
      }
    },

    // Calculate device health trends
    {
      $group: {
        _id: {
          deviceId: '$metadata.deviceId',
          day: {
            $dateTrunc: {
              date: '$timestamp',
              unit: 'day'
            }
          }
        },

        avgBatteryLevel: { $avg: '$deviceHealth.batteryLevel' },
        avgSignalStrength: { $avg: '$deviceHealth.signalStrength' },
        readingCount: { $sum: 1 },
        errorRate: {
          $avg: { $cond: ['$dataQuality.isValid', 0, 1] }
        }
      }
    },

    // Estimate per-day trends with the $derivative window operator
    // ($linearFill only interpolates missing values; it does not compute a slope)
    {
      $setWindowFields: {
        partitionBy: '$_id.deviceId',
        sortBy: { '_id.day': 1 },
        output: {
          batteryTrend: {
            $derivative: { input: '$avgBatteryLevel', unit: 'day' },
            window: { range: [-7, 0], unit: 'day' }
          },
          signalTrend: {
            $derivative: { input: '$avgSignalStrength', unit: 'day' },
            window: { range: [-7, 0], unit: 'day' }
          }
        }
      }
    },

    // Predict maintenance needs
    {
      $addFields: {
        batteryDaysRemaining: {
          $cond: {
            if: { $lt: ['$batteryTrend', 0] },
            then: {
              $ceil: {
                $divide: ['$avgBatteryLevel', { $abs: '$batteryTrend' }]
              }
            },
            else: 365 // Battery not declining
          }
        },

        maintenanceRisk: {
          $switch: {
            branches: [
              {
                case: {
                  $or: [
                    { $lt: ['$avgBatteryLevel', 20] },
                    { $gt: ['$errorRate', 0.1] }
                  ]
                },
                then: 'immediate'
              },
              {
                case: {
                  $or: [
                    { $lt: ['$avgBatteryLevel', 40] },
                    { $lt: ['$avgSignalStrength', -70] }
                  ]
                },
                then: 'high'
              },
              {
                case: { $lt: ['$avgBatteryLevel', 60] },
                then: 'medium'
              }
            ],
            default: 'low'
          }
        }
      }
    },

    // Group by device with latest status
    {
      $group: {
        _id: '$_id.deviceId',
        latestBatteryLevel: { $last: '$avgBatteryLevel' },
        latestSignalStrength: { $last: '$avgSignalStrength' },
        batteryTrend: { $last: '$batteryTrend' },
        signalTrend: { $last: '$signalTrend' },
        estimatedBatteryDays: { $last: '$batteryDaysRemaining' },
        maintenanceRisk: { $last: '$maintenanceRisk' },
        avgErrorRate: { $avg: '$errorRate' }
      }
    },

    // Rank risk levels explicitly; a plain string sort would put 'high' before 'immediate'
    {
      $addFields: {
        riskRank: { $indexOfArray: [['immediate', 'high', 'medium', 'low'], '$maintenanceRisk'] }
      }
    },
    {
      $sort: {
        riskRank: 1, // immediate first
        estimatedBatteryDays: 1
      }
    }
  ]).toArray();

  console.log('Predictive maintenance analysis:', JSON.stringify(predictiveMaintenance, null, 2));

  return {
    realtimeDashboard,
    anomalyDetection,
    predictiveMaintenance
  };
};

// Benefits of MongoDB Time Series Collections:
// - Automatic data partitioning and compression optimized for time-based data
// - Built-in retention policies with automatic expiration
// - Optimized indexes and query patterns for temporal analytics
// - High-performance ingestion with automatic bucketing
// - Native aggregation framework support for complex time series analysis
// - Flexible schema evolution for changing IoT device requirements
// - Horizontal scaling across sharded clusters
// - Integration with existing MongoDB ecosystem and tools
// - Real-time analytics with change streams for live dashboards
// - Cost-effective storage with intelligent compression algorithms

Understanding MongoDB Time Series Architecture

Time Series Collection Design Patterns

Implement comprehensive time series patterns for different IoT scenarios:

// Advanced time series collection design patterns
class IoTTimeSeriesManager {
  constructor(db) {
    this.db = db;
    this.collections = new Map();
    this.ingestionBuffers = new Map();
  }

  async createIoTTimeSeriesCollections() {
    // Pattern 1: High-frequency sensor data
    const highFrequencyConfig = {
      timeseries: {
        timeField: 'timestamp',
        metaField: 'sensor',
        // Custom bucketing replaces granularity (the two options are mutually exclusive);
        // MongoDB requires bucketMaxSpanSeconds and bucketRoundingSeconds to be equal
        bucketMaxSpanSeconds: 3600, // 1-hour buckets
        bucketRoundingSeconds: 3600
      },
      expireAfterSeconds: 60 * 60 * 24 * 30 // 30 days retention
    };

    const highFrequencySensors = await this.db.createCollection(
      'high_frequency_sensors', 
      highFrequencyConfig
    );

    // Pattern 2: Environmental monitoring (medium frequency)
    const environmentalConfig = {
      timeseries: {
        timeField: 'timestamp',
        metaField: 'location',
        // Custom bucketing instead of granularity; both values must match
        bucketMaxSpanSeconds: 86400, // 24-hour buckets
        bucketRoundingSeconds: 86400
      },
      expireAfterSeconds: 60 * 60 * 24 * 365 // 1 year retention
    };

    const environmentalData = await this.db.createCollection(
      'environmental_monitoring',
      environmentalConfig
    );

    // Pattern 3: Device health metrics (low frequency)
    const deviceHealthConfig = {
      timeseries: {
        timeField: 'timestamp',
        metaField: 'device',
        // Custom bucketing instead of granularity; both values must match
        bucketMaxSpanSeconds: 86400 * 7, // weekly buckets
        bucketRoundingSeconds: 86400 * 7
      },
      expireAfterSeconds: 60 * 60 * 24 * 365 * 5 // 5 years retention
    };

    const deviceHealth = await this.db.createCollection(
      'device_health_metrics',
      deviceHealthConfig
    );

    // Pattern 4: Event-based time series (irregular intervals)
    const eventBasedConfig = {
      timeseries: {
        timeField: 'timestamp',
        metaField: 'eventSource',
        granularity: 'minutes' // Flexible for irregular events
      },
      expireAfterSeconds: 60 * 60 * 24 * 90 // 90 days retention
    };

    const eventTimeSeries = await this.db.createCollection(
      'event_time_series',
      eventBasedConfig
    );

    // Store collection references
    this.collections.set('highFrequency', highFrequencySensors);
    this.collections.set('environmental', environmentalData);
    this.collections.set('deviceHealth', deviceHealth);
    this.collections.set('events', eventTimeSeries);

    console.log('Time series collections created successfully');
    return this.collections;
  }

  async setupOptimalIndexes() {
    // Create compound indexes for common query patterns
    for (const [name, collection] of this.collections.entries()) {
      try {
        // Metadata + time range queries
        await collection.createIndex({
          'sensor.id': 1,
          'timestamp': 1
        });

        // Location-based queries
        await collection.createIndex({
          'sensor.location.building': 1,
          'sensor.location.floor': 1,
          'timestamp': 1
        });

        // Device type queries
        await collection.createIndex({
          'sensor.type': 1,
          'timestamp': 1
        });

        // Data quality queries
        await collection.createIndex({
          'quality.isValid': 1,
          'timestamp': 1
        });

        console.log(`Indexes created for ${name} collection`);

      } catch (error) {
        console.error(`Error creating indexes for ${name}:`, error);
      }
    }
  }

  async ingestHighFrequencyData(sensorData) {
    // High-performance ingestion with batching
    const collection = this.collections.get('highFrequency');
    const batchSize = 10000;
    const batches = [];

    // Prepare optimized document structure
    const documents = sensorData.map(reading => ({
      timestamp: new Date(reading.timestamp),

      // Metadata field - groups related time series
      sensor: {
        id: reading.sensorId,
        type: reading.sensorType,
        model: reading.model || 'Unknown',
        location: {
          building: reading.building,
          floor: reading.floor,
          room: reading.room,
          coordinates: reading.coordinates
        },
        specifications: {
          accuracy: reading.accuracy,
          range: reading.range,
          units: reading.units
        }
      },

      // Measurements - optimized for compression
      temp: reading.temperature,
      hum: reading.humidity,
      press: reading.pressure,

      // Device status
      batt: reading.batteryLevel,
      signal: reading.signalStrength,

      // Data quality indicators
      quality: {
        isValid: reading.isValid !== false,
        confidence: reading.confidence || 1.0,
        source: reading.source || 'sensor'
      }
    }));

    // Split into batches for optimal ingestion
    for (let i = 0; i < documents.length; i += batchSize) {
      batches.push(documents.slice(i, i + batchSize));
    }

    // Parallel batch ingestion
    const ingestionPromises = batches.map(async (batch, index) => {
      try {
        const result = await collection.insertMany(batch, {
          ordered: false,
          writeConcern: { w: 1 }
        });

        console.log(`Batch ${index + 1}: Inserted ${result.insertedCount} documents`);
        return result.insertedCount;

      } catch (error) {
        console.error(`Batch ${index + 1} failed:`, error);
        return 0;
      }
    });

    const results = await Promise.all(ingestionPromises);
    const totalInserted = results.reduce((sum, count) => sum + count, 0);

    console.log(`Total documents inserted: ${totalInserted}`);
    return totalInserted;
  }

  async performRealTimeAnalytics(timeRange = '1h', sensorIds = []) {
    const collection = this.collections.get('highFrequency');

    // Calculate time range
    const timeRangeMs = {
      '15m': 15 * 60 * 1000,
      '1h': 60 * 60 * 1000,
      '6h': 6 * 60 * 60 * 1000,
      '24h': 24 * 60 * 60 * 1000
    };

    const startTime = new Date(Date.now() - timeRangeMs[timeRange]);

    const pipeline = [
      // Time range and sensor filtering
      {
        $match: {
          timestamp: { $gte: startTime },
          ...(sensorIds.length > 0 && { 'sensor.id': { $in: sensorIds } }),
          'quality.isValid': true
        }
      },

      // Sort chronologically so $first/$last values within each bucket are meaningful
      { $sort: { timestamp: 1 } },

      // Time-based bucketing for aggregation
      {
        $group: {
          _id: {
            sensorId: '$sensor.id',
            sensorType: '$sensor.type',
            location: '$sensor.location.room',
            // Dynamic time bucketing based on range
            timeBucket: {
              $dateTrunc: {
                date: '$timestamp',
                unit: 'minute',
                binSize: timeRange === '15m' ? 1 : 
                        timeRange === '1h' ? 5 : 
                        timeRange === '6h' ? 15 : 60
              }
            }
          },

          // Statistical aggregations
          count: { $sum: 1 },

          // Temperature metrics
          tempAvg: { $avg: '$temp' },
          tempMin: { $min: '$temp' },
          tempMax: { $max: '$temp' },
          tempStdDev: { $stdDevPop: '$temp' },

          // Humidity metrics
          humAvg: { $avg: '$hum' },
          humMin: { $min: '$hum' },
          humMax: { $max: '$hum' },

          // Pressure metrics
          pressAvg: { $avg: '$press' },
          pressMin: { $min: '$press' },
          pressMax: { $max: '$press' },

          // Device health metrics
          battAvg: { $avg: '$batt' },
          battMin: { $min: '$batt' },
          signalAvg: { $avg: '$signal' },
          signalMin: { $min: '$signal' },

          // Data quality metrics
          validReadings: { $sum: 1 },
          avgConfidence: { $avg: '$quality.confidence' },

          // First and last values for trend calculation
          firstTemp: { $first: '$temp' },
          lastTemp: { $last: '$temp' },
          firstTimestamp: { $first: '$timestamp' },
          lastTimestamp: { $last: '$timestamp' }
        }
      },

      // Calculate derived metrics
      {
        $addFields: {
          // Temperature trends
          tempTrend: { $subtract: ['$lastTemp', '$firstTemp'] },
          tempCV: {
            $cond: {
              if: { $ne: ['$tempAvg', 0] },
              then: { $divide: ['$tempStdDev', '$tempAvg'] },
              else: 0
            }
          },

          // Time span for rate calculations
          timeSpanMinutes: {
            $divide: [
              { $subtract: ['$lastTimestamp', '$firstTimestamp'] },
              60000
            ]
          },

          // Device health status
          deviceStatus: {
            $switch: {
              branches: [
                {
                  case: { 
                    $and: [
                      { $gte: ['$battAvg', 80] },
                      { $gte: ['$signalAvg', -50] }
                    ]
                  },
                  then: 'excellent'
                },
                {
                  case: {
                    $and: [
                      { $gte: ['$battAvg', 50] },
                      { $gte: ['$signalAvg', -65] }
                    ]
                  },
                  then: 'good'
                },
                {
                  case: {
                    $or: [
                      { $lt: ['$battAvg', 20] },
                      { $lt: ['$signalAvg', -80] }
                    ]
                  },
                  then: 'critical'
                }
              ],
              default: 'warning'
            }
          }
        }
      },

      // Sort for time series presentation
      {
        $sort: {
          '_id.sensorId': 1,
          '_id.timeBucket': 1
        }
      },

      // Format for dashboard consumption
      {
        $group: {
          _id: '$_id.sensorId',
          sensorType: { $first: '$_id.sensorType' },
          location: { $first: '$_id.location' },

          // Time series data
          timeSeries: {
            $push: {
              timestamp: '$_id.timeBucket',
              temperature: {
                avg: '$tempAvg',
                min: '$tempMin',
                max: '$tempMax',
                trend: '$tempTrend',
                cv: '$tempCV'
              },
              humidity: {
                avg: '$humAvg',
                min: '$humMin',
                max: '$humMax'
              },
              pressure: {
                avg: '$pressAvg',
                min: '$pressMin',
                max: '$pressMax'
              },
              deviceHealth: {
                battery: '$battAvg',
                signal: '$signalAvg',
                status: '$deviceStatus'
              },
              dataQuality: {
                readingCount: '$count',
                confidence: '$avgConfidence'
              }
            }
          },

          // Summary statistics ($group accumulators must be top-level fields)
          totalReadings: { $sum: '$count' },
          avgTemperature: { $avg: '$tempAvg' },
          maxTemperature: { $max: '$tempMax' },
          minTemperature: { $min: '$tempMin' },
          overallDeviceStatus: { $last: '$deviceStatus' }
        }
      },

      // Derive the overall temperature range for each sensor
      {
        $addFields: {
          temperatureRange: { $subtract: ['$maxTemperature', '$minTemperature'] }
        }
      }
    ];

    const results = await collection.aggregate(pipeline).toArray();

    // Add metadata about the query
    return {
      timeRange: timeRange,
      queryTime: new Date(),
      startTime: startTime,
      endTime: new Date(),
      sensorCount: results.length,
      data: results
    };
  }

  async detectAnomaliesAdvanced(sensorId, lookbackHours = 168) { // 1 week default
    const collection = this.collections.get('highFrequency');
    const lookbackTime = new Date(Date.now() - lookbackHours * 60 * 60 * 1000);

    const pipeline = [
      {
        $match: {
          'sensor.id': sensorId,
          timestamp: { $gte: lookbackTime },
          'quality.isValid': true
        }
      },

      { $sort: { timestamp: 1 } },

      // Calculate rolling statistics using window functions
      {
        $setWindowFields: {
          sortBy: { timestamp: 1 },
          output: {
            // Rolling 50-point statistics for anomaly detection
            rollingMean: {
              $avg: '$temp',
              window: { documents: [-25, 25] }
            },
            rollingStd: {
              $stdDevPop: '$temp',
              window: { documents: [-25, 25] }
            },

            // Seasonal decomposition (24-hour pattern)
            dailyMean: {
              $avg: '$temp',
              window: { range: [-12, 12], unit: 'hour' }
            },

            // Gap filling: linear interpolation of missing temperature readings
            // (note: $linearFill interpolates nulls, it does not compute a trend)
            interpolatedTemp: {
              $linearFill: '$temp'
            },

            // Previous values for rate of change
            prevTemp: {
              $first: '$temp',
              window: { documents: [-1, -1] }
            }
          }
        }
      },

      // Calculate anomaly scores
      {
        $addFields: {
          // Z-score anomaly detection
          zScore: {
            $cond: {
              if: { $ne: ['$rollingStd', 0] },
              then: {
                $divide: [
                  { $subtract: ['$temp', '$rollingMean'] },
                  '$rollingStd'
                ]
              },
              else: 0
            }
          },

          // Seasonal anomaly (deviation from daily pattern)
          seasonalAnomaly: {
            $cond: {
              if: { $ne: ['$dailyMean', 0] },
              then: {
                $abs: {
                  $divide: [
                    { $subtract: ['$temp', '$dailyMean'] },
                    '$dailyMean'
                  ]
                }
              },
              else: 0
            }
          },

          // Rate of change anomaly
          rateOfChange: {
            $cond: {
              if: { $and: ['$prevTemp', { $ne: ['$prevTemp', 0] }] },
              then: {
                $abs: {
                  $divide: [
                    { $subtract: ['$temp', '$prevTemp'] },
                    '$prevTemp'
                  ]
                }
              },
              else: 0
            }
          }
        }
      },

      // Identify anomalies using multiple criteria
      {
        $addFields: {
          isAnomaly: {
            $or: [
              { $gt: [{ $abs: '$zScore' }, 3] }, // Statistical outlier
              { $gt: ['$seasonalAnomaly', 0.3] }, // 30% deviation from seasonal
              { $gt: ['$rateOfChange', 0.5] } // 50% rate of change
            ]
          },

          anomalyType: {
            $switch: {
              branches: [
                {
                  case: { $gt: ['$zScore', 3] },
                  then: 'statistical_high'
                },
                {
                  case: { $lt: ['$zScore', -3] },
                  then: 'statistical_low'
                },
                {
                  case: { $gt: ['$seasonalAnomaly', 0.3] },
                  then: 'seasonal_deviation'
                },
                {
                  case: { $gt: ['$rateOfChange', 0.5] },
                  then: 'rapid_change'
                }
              ],
              default: 'normal'
            }
          },

          anomalySeverity: {
            $switch: {
              branches: [
                {
                  case: { $gt: [{ $abs: '$zScore' }, 5] },
                  then: 'critical'
                },
                {
                  case: { $gt: [{ $abs: '$zScore' }, 4] },
                  then: 'high'
                },
                {
                  case: { $gt: [{ $abs: '$zScore' }, 3] },
                  then: 'medium'
                }
              ],
              default: 'low'
            }
          }
        }
      },

      // Filter to anomalies only
      { $match: { isAnomaly: true } },

      // Group consecutive anomalies into events
      {
        $group: {
          _id: {
            $dateToString: {
              format: '%Y-%m-%d-%H',
              date: '$timestamp'
            }
          },

          anomalyCount: { $sum: 1 },
          avgSeverityScore: { $avg: { $abs: '$zScore' } },

          anomalies: {
            $push: {
              timestamp: '$timestamp',
              value: '$temp',
              zScore: '$zScore',
              type: '$anomalyType',
              severity: '$anomalySeverity',
              seasonalDeviation: '$seasonalAnomaly',
              rateOfChange: '$rateOfChange'
            }
          },

          startTime: { $min: '$timestamp' },
          endTime: { $max: '$timestamp' }
        }
      },

      { $sort: { startTime: -1 } }
    ];

    return await collection.aggregate(pipeline).toArray();
  }

  async generatePerformanceReports(reportType = 'daily') {
    const collection = this.collections.get('highFrequency');

    // Calculate report time range
    const timeRanges = {
      'hourly': 60 * 60 * 1000,
      'daily': 24 * 60 * 60 * 1000,
      'weekly': 7 * 24 * 60 * 60 * 1000,
      'monthly': 30 * 24 * 60 * 60 * 1000
    };

    const startTime = new Date(Date.now() - timeRanges[reportType]);

    const pipeline = [
      {
        $match: {
          timestamp: { $gte: startTime }
        }
      },

      // Group by sensor and time period
      {
        $group: {
          _id: {
            sensorId: '$sensor.id',
            sensorType: '$sensor.type',
            location: '$sensor.location',
            period: {
              $dateTrunc: {
                date: '$timestamp',
                unit: reportType === 'hourly' ? 'hour' :
                      reportType === 'daily' ? 'day' :
                      reportType === 'weekly' ? 'week' : 'month'
              }
            }
          },

          // Data volume metrics
          totalReadings: { $sum: 1 },
          validReadings: {
            $sum: { $cond: ['$quality.isValid', 1, 0] }
          },

          // Data quality metrics
          avgConfidence: { $avg: '$quality.confidence' },
          dataQualityRatio: {
            $avg: { $cond: ['$quality.isValid', 1, 0] }
          },

          // Measurement statistics (accumulators must be top-level $group fields)
          tempAvg: { $avg: '$temp' },
          tempMin: { $min: '$temp' },
          tempMax: { $max: '$temp' },
          tempStdDev: { $stdDevPop: '$temp' },

          // Device health metrics
          avgBatteryLevel: { $avg: '$batt' },
          minBatteryLevel: { $min: '$batt' },
          avgSignalStrength: { $avg: '$signal' },
          minSignalStrength: { $min: '$signal' },

          // Time coverage
          firstReading: { $min: '$timestamp' },
          lastReading: { $max: '$timestamp' }
        }
      },

      // Calculate performance indicators
      {
        $addFields: {
          // Coverage percentage
          coveragePercentage: {
            $multiply: [
              {
                $divide: [
                  { $subtract: ['$lastReading', '$firstReading'] },
                  timeRanges[reportType]
                ]
              },
              100
            ]
          },

          // Device health score
          deviceHealthScore: {
            $multiply: [
              {
                $add: [
                  { $divide: ['$avgBatteryLevel', 100] }, // Battery factor
                  { $divide: [{ $add: ['$avgSignalStrength', 100] }, 50] } // Signal factor
                ]
              },
              50
            ]
          },

          // Overall performance score
          performanceScore: {
            $multiply: [
              {
                $add: [
                  { $multiply: ['$dataQualityRatio', 0.4] },
                  { $multiply: [{ $divide: ['$avgConfidence', 1] }, 0.3] },
                  { $multiply: [{ $divide: ['$avgBatteryLevel', 100] }, 0.2] },
                  { $multiply: [{ $divide: [{ $add: ['$avgSignalStrength', 100] }, 50] }, 0.1] }
                ]
              },
              100
            ]
          }
        }
      },

      // Generate recommendations
      {
        $addFields: {
          recommendations: {
            $switch: {
              branches: [
                {
                  case: { $lt: ['$dataQualityRatio', 0.9] },
                  then: ['Investigate data quality issues', 'Check sensor calibration']
                },
                {
                  case: { $lt: ['$avgBatteryLevel', 30] },
                  then: ['Schedule battery replacement', 'Consider solar charging']
                },
                {
                  case: { $lt: ['$avgSignalStrength', -75] },
                  then: ['Check network connectivity', 'Consider signal boosters']
                },
                {
                  case: { $lt: ['$coveragePercentage', 95] },
                  then: ['Investigate data gaps', 'Check device uptime']
                }
              ],
              default: ['Performance within normal parameters']
            }
          },

          alertLevel: {
            $switch: {
              branches: [
                {
                  case: { $lt: ['$performanceScore', 60] },
                  then: 'critical'
                },
                {
                  case: { $lt: ['$performanceScore', 80] },
                  then: 'warning'
                }
              ],
              default: 'normal'
            }
          }
        }
      },

      {
        $sort: {
          performanceScore: 1, // Lowest scores first
          '_id.sensorId': 1
        }
      }
    ];

    const results = await collection.aggregate(pipeline).toArray();

    return {
      reportType: reportType,
      generatedAt: new Date(),
      timeRange: {
        start: startTime,
        end: new Date()
      },
      summary: {
        totalSensors: results.length,
        criticalAlerts: results.filter(r => r.alertLevel === 'critical').length,
        warnings: results.filter(r => r.alertLevel === 'warning').length,
        avgPerformanceScore: results.length > 0
          ? results.reduce((sum, r) => sum + r.performanceScore, 0) / results.length
          : 0
      },
      sensorReports: results
    };
  }
}

SQL-Style Time Series Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Time Series operations:

-- QueryLeaf time series operations with SQL-familiar syntax

-- Create time series collection
CREATE TIME_SERIES COLLECTION sensor_readings (
  timestamp TIMESTAMP NOT NULL, -- time field
  sensor_id VARCHAR(100) NOT NULL,
  location VARCHAR(200),
  device_id VARCHAR(100),

  -- Measurements
  temperature DECIMAL(5,2),
  humidity DECIMAL(5,2),
  pressure DECIMAL(7,2),

  -- Device health
  battery_level DECIMAL(5,2),
  signal_strength INTEGER,

  -- Data quality
  is_valid BOOLEAN DEFAULT true,
  confidence DECIMAL(3,2) DEFAULT 1.00
) WITH (
  meta_field = 'sensor_metadata',
  granularity = 'minutes',
  expire_after_seconds = 2678400 -- 31 days
);

-- High-performance batch insert for IoT data
INSERT INTO sensor_readings 
VALUES 
  ('2024-09-17 10:00:00', 'TEMP_001', 'Warehouse_A', 'DEV_001', 23.5, 65.2, 1013.25, 85.3, -45, true, 0.98),
  ('2024-09-17 10:01:00', 'TEMP_001', 'Warehouse_A', 'DEV_001', 23.7, 65.0, 1013.30, 85.2, -46, true, 0.97),
  ('2024-09-17 10:02:00', 'TEMP_001', 'Warehouse_A', 'DEV_001', 23.6, 64.8, 1013.28, 85.1, -44, true, 0.99);

-- Real-time dashboard query with time bucketing
SELECT 
  sensor_id,
  location,
  TIME_BUCKET('15 minutes', timestamp) as time_bucket,

  -- Statistical aggregations
  COUNT(*) as reading_count,
  AVG(temperature) as avg_temperature,
  MIN(temperature) as min_temperature,
  MAX(temperature) as max_temperature,
  STDDEV_POP(temperature) as temp_stddev,

  AVG(humidity) as avg_humidity,
  AVG(pressure) as avg_pressure,

  -- Device health metrics
  AVG(battery_level) as avg_battery,
  MIN(battery_level) as min_battery,
  AVG(signal_strength) as avg_signal,

  -- Data quality metrics
  SUM(CASE WHEN is_valid THEN 1 ELSE 0 END) as valid_readings,
  AVG(confidence) as avg_confidence,

  -- Trend indicators
  FIRST_VALUE(temperature ORDER BY timestamp) as first_temp,
  LAST_VALUE(temperature ORDER BY timestamp) as last_temp,
  LAST_VALUE(temperature ORDER BY timestamp) - FIRST_VALUE(temperature ORDER BY timestamp) as temp_change

FROM sensor_readings
WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
  AND sensor_id IN ('TEMP_001', 'TEMP_002', 'TEMP_003')
  AND is_valid = true
GROUP BY sensor_id, location, TIME_BUCKET('15 minutes', timestamp)
ORDER BY sensor_id, time_bucket;

-- Advanced anomaly detection with window functions
WITH statistical_baseline AS (
  SELECT 
    sensor_id,
    timestamp,
    temperature,

    -- Rolling statistics for anomaly detection
    AVG(temperature) OVER (
      PARTITION BY sensor_id
      ORDER BY timestamp
      ROWS BETWEEN 25 PRECEDING AND 25 FOLLOWING
    ) as rolling_avg,

    STDDEV_POP(temperature) OVER (
      PARTITION BY sensor_id  
      ORDER BY timestamp
      ROWS BETWEEN 25 PRECEDING AND 25 FOLLOWING
    ) as rolling_stddev,

    -- Seasonal baseline (same hour of day pattern)
    AVG(temperature) OVER (
      PARTITION BY sensor_id, EXTRACT(hour FROM timestamp)
      ORDER BY timestamp
      RANGE BETWEEN INTERVAL '7 days' PRECEDING AND INTERVAL '7 days' FOLLOWING
    ) as seasonal_avg,

    -- Previous value for rate of change
    LAG(temperature, 1) OVER (
      PARTITION BY sensor_id 
      ORDER BY timestamp
    ) as prev_temperature

  FROM sensor_readings
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '7 days'
    AND is_valid = true
),
anomaly_scores AS (
  SELECT *,
    -- Z-score calculation
    CASE 
      WHEN rolling_stddev > 0 THEN (temperature - rolling_avg) / rolling_stddev
      ELSE 0 
    END as z_score,

    -- Seasonal deviation
    ABS(temperature - seasonal_avg) / GREATEST(seasonal_avg, 0.1) as seasonal_deviation,

    -- Rate of change
    CASE 
      WHEN prev_temperature IS NOT NULL AND prev_temperature != 0 
      THEN ABS(temperature - prev_temperature) / ABS(prev_temperature)
      ELSE 0 
    END as rate_of_change

  FROM statistical_baseline
),
classified_anomalies AS (
  SELECT *,
    -- Anomaly classification
    CASE
      WHEN ABS(z_score) > 3 OR seasonal_deviation > 0.3 OR rate_of_change > 0.5 THEN true
      ELSE false
    END as is_anomaly,

    CASE 
      WHEN z_score > 3 THEN 'statistical_high'
      WHEN z_score < -3 THEN 'statistical_low'
      WHEN seasonal_deviation > 0.3 THEN 'seasonal_deviation'
      WHEN rate_of_change > 0.5 THEN 'rapid_change'
      ELSE 'normal'
    END as anomaly_type,

    CASE
      WHEN ABS(z_score) > 5 THEN 'critical'
      WHEN ABS(z_score) > 4 THEN 'high'
      WHEN ABS(z_score) > 3 THEN 'medium'
      ELSE 'low'
    END as severity

  FROM anomaly_scores
)
SELECT 
  sensor_id,
  DATE_TRUNC('hour', timestamp) as anomaly_hour,
  COUNT(*) as anomaly_count,
  AVG(ABS(z_score)) as avg_severity_score,

  -- Anomaly details
  json_agg(
    json_build_object(
      'timestamp', timestamp,
      'temperature', temperature,
      'z_score', ROUND(z_score::numeric, 3),
      'type', anomaly_type,
      'severity', severity
    ) ORDER BY timestamp
  ) as anomalies,

  MIN(timestamp) as first_anomaly,
  MAX(timestamp) as last_anomaly

FROM classified_anomalies
WHERE is_anomaly = true
GROUP BY sensor_id, DATE_TRUNC('hour', timestamp)
ORDER BY sensor_id, anomaly_hour DESC;

-- Predictive maintenance analysis
WITH device_health_trends AS (
  SELECT 
    device_id,
    sensor_id,
    DATE_TRUNC('day', timestamp) as day,

    AVG(battery_level) as daily_battery_avg,
    MIN(battery_level) as daily_battery_min,
    AVG(signal_strength) as daily_signal_avg,
    MIN(signal_strength) as daily_signal_min,
    COUNT(*) as daily_reading_count,

    -- Data quality metrics
    AVG(CASE WHEN is_valid THEN 1.0 ELSE 0.0 END) as data_quality_ratio,
    AVG(confidence) as avg_confidence

  FROM sensor_readings
  WHERE timestamp >= CURRENT_TIMESTAMP - INTERVAL '30 days'
  GROUP BY device_id, sensor_id, DATE_TRUNC('day', timestamp)
),
trend_analysis AS (
  SELECT *,
    -- Linear trend approximation using least squares (windowed per device)
    REGR_SLOPE(daily_battery_avg, EXTRACT(epoch FROM day)) OVER (PARTITION BY device_id) * 86400 as battery_daily_slope,
    REGR_SLOPE(daily_signal_avg, EXTRACT(epoch FROM day)) OVER (PARTITION BY device_id) * 86400 as signal_daily_slope,

    -- Device health scoring
    (daily_battery_avg * 0.4 + 
     (daily_signal_avg + 100) / 50 * 100 * 0.3 +
     data_quality_ratio * 100 * 0.3) as health_score

  FROM device_health_trends
),
maintenance_predictions AS (
  SELECT 
    device_id,

    -- Latest status
    LAST_VALUE(daily_battery_avg ORDER BY day) as current_battery,
    LAST_VALUE(daily_signal_avg ORDER BY day) as current_signal,
    LAST_VALUE(data_quality_ratio ORDER BY day) as current_quality,
    LAST_VALUE(health_score ORDER BY day) as current_health_score,

    -- Trends
    AVG(battery_daily_slope) as battery_trend,
    AVG(signal_daily_slope) as signal_trend,

    -- Predictions
    CASE 
      WHEN AVG(battery_daily_slope) < -0.5 THEN 
        CEIL(LAST_VALUE(daily_battery_avg ORDER BY day) / ABS(AVG(battery_daily_slope)))
      ELSE 365 
    END as estimated_battery_days,

    -- Risk assessment
    CASE
      WHEN LAST_VALUE(daily_battery_avg ORDER BY day) < 20 OR 
           LAST_VALUE(data_quality_ratio ORDER BY day) < 0.8 THEN 'immediate'
      WHEN LAST_VALUE(daily_battery_avg ORDER BY day) < 40 OR 
           LAST_VALUE(daily_signal_avg ORDER BY day) < -70 THEN 'high'
      WHEN LAST_VALUE(daily_battery_avg ORDER BY day) < 60 THEN 'medium'
      ELSE 'low'
    END as maintenance_risk,

    COUNT(*) as days_monitored

  FROM trend_analysis
  GROUP BY device_id
)
SELECT 
  device_id,
  ROUND(current_battery, 1) as battery_level,
  ROUND(current_signal, 1) as signal_strength,
  ROUND(current_quality * 100, 1) as data_quality_pct,
  ROUND(current_health_score, 1) as health_score,

  -- Trends
  CASE 
    WHEN battery_trend < -0.1 THEN 'declining'
    WHEN battery_trend > 0.1 THEN 'improving'
    ELSE 'stable'
  END as battery_trend_status,

  estimated_battery_days,
  maintenance_risk,

  -- Recommendations
  CASE maintenance_risk
    WHEN 'immediate' THEN 'Schedule maintenance within 24 hours'
    WHEN 'high' THEN 'Schedule maintenance within 1 week'  
    WHEN 'medium' THEN 'Schedule maintenance within 1 month'
    ELSE 'Monitor normal schedule'
  END as recommendation,

  days_monitored

FROM maintenance_predictions
ORDER BY 
  CASE maintenance_risk
    WHEN 'immediate' THEN 1
    WHEN 'high' THEN 2
    WHEN 'medium' THEN 3
    ELSE 4
  END,
  estimated_battery_days ASC;

-- Time series downsampling and data retention
CREATE MATERIALIZED VIEW hourly_sensor_summary AS
SELECT 
  sensor_id,
  location,
  device_id,
  TIME_BUCKET('1 hour', timestamp) as hour_bucket,

  -- Statistical summaries
  COUNT(*) as reading_count,
  AVG(temperature) as avg_temperature,
  MIN(temperature) as min_temperature,  
  MAX(temperature) as max_temperature,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY temperature) as median_temperature,
  STDDEV_POP(temperature) as temp_stddev,

  AVG(humidity) as avg_humidity,
  AVG(pressure) as avg_pressure,

  -- Device health summaries
  AVG(battery_level) as avg_battery,
  MIN(battery_level) as min_battery,
  AVG(signal_strength) as avg_signal,

  -- Quality metrics
  AVG(CASE WHEN is_valid THEN 1.0 ELSE 0.0 END) as data_quality,
  AVG(confidence) as avg_confidence,

  -- Time range
  MIN(timestamp) as period_start,
  MAX(timestamp) as period_end

FROM sensor_readings
WHERE is_valid = true
GROUP BY sensor_id, location, device_id, TIME_BUCKET('1 hour', timestamp);

-- Performance monitoring and optimization
WITH collection_stats AS (
  SELECT 
    'sensor_readings' as collection_name,
    COUNT(*) as total_documents,

    -- Time range analysis
    MIN(timestamp) as earliest_data,
    MAX(timestamp) as latest_data,
    MAX(timestamp) - MIN(timestamp) as time_span,

    -- Data volume analysis  
    COUNT(*) / EXTRACT(days FROM (MAX(timestamp) - MIN(timestamp))) as avg_docs_per_day,

    -- Quality metrics
    AVG(CASE WHEN is_valid THEN 1.0 ELSE 0.0 END) as overall_quality,
    COUNT(DISTINCT sensor_id) as unique_sensors,
    COUNT(DISTINCT device_id) as unique_devices

  FROM sensor_readings
),
performance_metrics AS (
  SELECT 
    cs.*,

    -- Storage efficiency estimates
    total_documents * 200 as estimated_storage_bytes, -- Rough estimate

    -- Query performance indicators
    CASE 
      WHEN avg_docs_per_day > 100000 THEN 'high_volume'
      WHEN avg_docs_per_day > 10000 THEN 'medium_volume'
      ELSE 'low_volume'
    END as volume_category,

    -- Recommendations
    CASE
      WHEN overall_quality < 0.9 THEN 'Review data validation and sensor calibration'
      WHEN avg_docs_per_day > 100000 THEN 'Consider additional indexing and archiving strategy'
      WHEN time_span > INTERVAL '6 months' THEN 'Implement data lifecycle management'
      ELSE 'Performance within normal parameters'
    END as recommendation

  FROM collection_stats cs
)
SELECT 
  collection_name,
  total_documents,
  TO_CHAR(earliest_data, 'YYYY-MM-DD HH24:MI') as data_start,
  TO_CHAR(latest_data, 'YYYY-MM-DD HH24:MI') as data_end,
  EXTRACT(days FROM time_span) as retention_days,
  ROUND(avg_docs_per_day::numeric, 0) as daily_ingestion_rate,
  ROUND(overall_quality * 100, 1) as quality_percentage,
  unique_sensors,
  unique_devices,
  volume_category,
  ROUND(estimated_storage_bytes / 1024.0 / 1024.0, 1) as estimated_storage_mb,
  recommendation
FROM performance_metrics;

-- QueryLeaf provides comprehensive time series capabilities:
-- 1. SQL-familiar time series collection creation and management
-- 2. High-performance batch data ingestion optimized for IoT workloads  
-- 3. Advanced time bucketing and statistical aggregations
-- 4. Sophisticated anomaly detection using multiple algorithms
-- 5. Predictive maintenance analysis with trend forecasting
-- 6. Automatic data lifecycle management and retention policies
-- 7. Performance monitoring and optimization recommendations
-- 8. Integration with MongoDB's native time series optimizations
-- 9. Real-time analytics with materialized view support
-- 10. Familiar SQL syntax for complex temporal queries and analysis

Best Practices for Time Series Implementation

Data Modeling and Schema Design

Essential practices for optimal time series performance (a minimal setup sketch follows the list):

  1. Granularity Selection: Choose appropriate time granularity based on data frequency and query patterns
  2. Metadata Organization: Structure metadata fields to optimize automatic bucketing and compression
  3. Measurement Optimization: Use efficient data types and avoid deep nesting for measurements
  4. Index Strategy: Create compound indexes supporting common time range and metadata queries
  5. Retention Policies: Implement automatic expiration aligned with business requirements
  6. Batch Ingestion: Use bulk operations for high-throughput IoT data ingestion
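
The following is a minimal Node.js sketch tying these practices together. It assumes the driver's time series collection options and the field names used in the earlier examples; adjust names and retention to your own schema.

// Minimal sketch: time series collection setup, index, and batch ingestion
const { MongoClient } = require('mongodb');

async function setupSensorReadings() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('iot_platform');

  // Granularity, metadata field, and retention chosen up front
  await db.createCollection('sensor_readings', {
    timeseries: {
      timeField: 'timestamp',
      metaField: 'sensor',            // sensor metadata drives automatic bucketing
      granularity: 'minutes'
    },
    expireAfterSeconds: 31 * 24 * 60 * 60   // ~31 day retention
  });

  const readings = db.collection('sensor_readings');

  // Compound index supporting common metadata + time range queries
  await readings.createIndex({ 'sensor.id': 1, timestamp: -1 });

  // Bulk, unordered insert for high-throughput ingestion
  await readings.insertMany([
    { timestamp: new Date(), sensor: { id: 'TEMP_001', type: 'temperature', location: 'Warehouse_A' }, temp: 23.5, batt: 85.3, signal: -45 },
    { timestamp: new Date(), sensor: { id: 'TEMP_002', type: 'temperature', location: 'Warehouse_B' }, temp: 22.1, batt: 91.0, signal: -52 }
  ], { ordered: false });

  await client.close();
}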

Performance and Scalability

Optimize time series collections for high-performance analytics (a downsampling sketch follows the list):

  1. Bucket Sizing: Configure bucket parameters for optimal compression and query performance
  2. Query Optimization: Leverage time series specific aggregation patterns and operators
  3. Resource Planning: Size clusters appropriately for expected data volumes and query loads
  4. Archival Strategy: Implement data lifecycle management with cold storage integration
  5. Monitoring Setup: Track collection performance and optimize based on usage patterns
  6. Downsampling: Use materialized views and pre-aggregated summaries for historical analysis
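
As a sketch of the downsampling point, the hourly summary defined by the SQL materialized view above can also be maintained directly in MongoDB with an aggregation that buckets readings by hour and upserts into a summary collection via $merge. The collection and field names below are assumed from the earlier examples.

// Minimal sketch: downsample raw readings into an hourly summary collection
async function downsampleHourly(db, since) {
  await db.collection('sensor_readings').aggregate([
    { $match: { timestamp: { $gte: since }, 'quality.isValid': true } },
    {
      $group: {
        _id: {
          sensorId: '$sensor.id',
          hour: { $dateTrunc: { date: '$timestamp', unit: 'hour' } }
        },
        readingCount: { $sum: 1 },
        avgTemp: { $avg: '$temp' },
        minTemp: { $min: '$temp' },
        maxTemp: { $max: '$temp' },
        avgBattery: { $avg: '$batt' }
      }
    },
    // Upsert each hourly bucket; re-running the job simply replaces existing summaries
    {
      $merge: {
        into: 'hourly_sensor_summary',
        on: '_id',
        whenMatched: 'replace',
        whenNotMatched: 'insert'
      }
    }
  ]).toArray();
}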

Conclusion

MongoDB Time Series Collections provide purpose-built capabilities for IoT data management and temporal analytics that eliminate the complexity and limitations of traditional relational approaches. The integration of automatic compression, optimized indexing, and specialized query patterns makes building high-performance time series applications both powerful and efficient.

Key Time Series benefits include:

  • Purpose-Built Storage: Automatic partitioning and compression optimized for temporal data
  • High-Performance Ingestion: Optimized for high-frequency IoT data streams
  • Advanced Analytics: Native support for complex time-based aggregations and window functions
  • Automatic Lifecycle: Built-in retention policies and data expiration management
  • Scalable Architecture: Horizontal scaling across sharded clusters for massive datasets
  • Developer Familiar: SQL-style query patterns with specialized time series operations

Whether you're building IoT monitoring platforms, sensor networks, financial trading systems, or applications requiring time-based analytics, MongoDB Time Series Collections with QueryLeaf's familiar SQL interface provide the foundation for modern temporal data management. This combination enables you to implement sophisticated time series capabilities while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Time Series Collections while providing SQL-familiar time bucketing, statistical aggregations, and temporal analytics. Advanced time series features, anomaly detection, and performance optimization are seamlessly handled through familiar SQL patterns, making high-performance time series analytics both powerful and accessible.

The integration of native time series capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both sophisticated temporal analytics and familiar database interaction patterns, ensuring your time series solutions remain both effective and maintainable as they scale and evolve.

MongoDB Change Streams and Real-Time Data Processing: SQL-Style Event-Driven Architecture for Reactive Applications

Modern applications require real-time responsiveness to data changes - instant notifications, live dashboards, automatic workflow triggers, and synchronized data across distributed systems. Traditional approaches of polling databases for changes create significant performance overhead, introduce latency delays, and consume unnecessary resources while missing the precision and immediacy that users expect from contemporary applications.

MongoDB Change Streams provide enterprise-grade real-time data processing capabilities that monitor database changes as they occur, delivering instant event notifications with complete change context, ordering guarantees, and resumability features. Unlike polling-based approaches or complex trigger systems, Change Streams integrate seamlessly with application architectures to enable reactive programming patterns and event-driven workflows.
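
Before looking at how traditional systems approximate this, a minimal sketch shows how little code the happy path requires: open a change stream, iterate events, and persist each event's `_id` as a resume token so processing can continue after a restart. The collection name and filter below are illustrative, not part of the examples that follow.

// Minimal sketch: watch a collection and keep a resume token for fault tolerance
const { MongoClient } = require('mongodb');

async function watchOrders(previousResumeToken) {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('app');

  const changeStream = db.collection('orders').watch(
    [{ $match: { operationType: { $in: ['insert', 'update'] } } }],
    {
      fullDocument: 'updateLookup',
      ...(previousResumeToken ? { resumeAfter: previousResumeToken } : {})
    }
  );

  for await (const change of changeStream) {
    console.log(change.operationType, change.documentKey);
    // Persist change._id somewhere durable to resume after a crash or restart
    previousResumeToken = change._id;
  }
}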

The Traditional Change Detection Challenge

Conventional approaches to detecting data changes have significant limitations for real-time applications:

-- Traditional polling approach - inefficient and high-latency
-- Application repeatedly queries database for changes

-- PostgreSQL change detection with polling
CREATE TABLE user_activities (
    activity_id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    activity_type VARCHAR(100) NOT NULL,
    activity_data JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    processed_at TIMESTAMP,
    is_processed BOOLEAN DEFAULT false
);

-- Trigger to update timestamp on changes
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER update_user_activities_updated_at
    BEFORE UPDATE ON user_activities
    FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();

-- Application polling for changes (inefficient)
-- This query runs continuously every few seconds
SELECT 
    activity_id,
    user_id,
    activity_type,
    activity_data,
    created_at,
    updated_at
FROM user_activities 
WHERE (updated_at > @last_poll_time OR created_at > @last_poll_time)
  AND is_processed = false
ORDER BY created_at, updated_at
LIMIT 1000;

-- Update processed records
UPDATE user_activities 
SET is_processed = true, processed_at = CURRENT_TIMESTAMP
WHERE activity_id IN (@processed_ids);

-- Problems with polling approach:
-- 1. High database load from constant polling queries
-- 2. Polling frequency vs. latency tradeoff (faster polling = more load)
-- 3. Potential race conditions with concurrent processors
-- 4. No ordering guarantees across multiple tables
-- 5. Missed changes during application downtime
-- 6. Complex state management for resuming processing
-- 7. Difficult to scale across multiple application instances
-- 8. Resource waste during periods of no activity

-- Database triggers approach - limited and fragile
CREATE OR REPLACE FUNCTION notify_change()
RETURNS TRIGGER AS $$
BEGIN
    -- Limited payload size in PostgreSQL notifications
    PERFORM pg_notify(
        'user_activity_change',
        json_build_object(
            'operation', TG_OP,
            'table', TG_TABLE_NAME,
            'id', COALESCE(NEW.activity_id, OLD.activity_id)
        )::text
    );

    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_activities_change_trigger
    AFTER INSERT OR UPDATE OR DELETE ON user_activities
    FOR EACH ROW EXECUTE FUNCTION notify_change();

-- Application listening for notifications
-- Limited payload, no automatic reconnection, fragile connections
LISTEN user_activity_change;

-- Trigger limitations:
-- - Limited payload size (8000 bytes in PostgreSQL)
-- - Connection-based, not resilient to network issues  
-- - No built-in resume capability after disconnection
-- - Complex coordination across multiple database connections
-- - Difficult to filter events at database level
-- - No ordering guarantees across transactions
-- - Performance impact on write operations

MongoDB Change Streams provide comprehensive real-time change processing:

// MongoDB Change Streams - enterprise-grade real-time data processing
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('production_app');

// Comprehensive change stream with advanced filtering and processing
async function setupAdvancedChangeStream() {
  // Create change stream with sophisticated pipeline filtering
  const changeStream = db.collection('user_activities').watch([
    // Match specific operations and conditions
    {
      $match: {
        $and: [
          // Only monitor insert and update operations
          { operationType: { $in: ['insert', 'update', 'replace'] } },

          // Filter by activity types we care about
          {
            $or: [
              { 'fullDocument.activity_type': { $in: ['purchase', 'login', 'signup'] } },
              { 'updateDescription.updatedFields.status': { $exists: true } },
              { 'fullDocument.priority': 'high' }
            ]
          },

          // Only process activities for active users
          { 'fullDocument.user_status': 'active' },

          // Exclude system-generated activities
          { 'fullDocument.source': { $ne: 'system_maintenance' } }
        ]
      }
    },

    // Enrich change events with additional context
    // Note: MongoDB restricts change stream pipelines to a subset of aggregation
    // stages ($match, $project, $addFields/$set, $unset, $replaceRoot/$replaceWith,
    // $redact); if the server rejects $lookup here, perform this enrichment
    // inside the event handler instead.
    {
      $lookup: {
        from: 'users',
        localField: 'fullDocument.user_id',
        foreignField: '_id',
        as: 'user_info'
      }
    },

    // Add computed fields for processing
    {
      $addFields: {
        processedAt: new Date(),
        changeId: { $toString: '$_id' },
        user: { $arrayElemAt: ['$user_info', 0] },

        // Categorize change types
        changeCategory: {
          $switch: {
            branches: [
              { case: { $eq: ['$operationType', 'insert'] }, then: 'new_activity' },
              { 
                case: { 
                  $and: [
                    { $eq: ['$operationType', 'update'] },
                    { $ifNull: ['$updateDescription.updatedFields.status', false] }
                  ]
                }, 
                then: 'status_change' 
              },
              { case: { $eq: ['$operationType', 'replace'] }, then: 'activity_replaced' }
            ],
            default: 'other_change'
          }
        },

        // Priority scoring
        priorityScore: {
          $switch: {
            branches: [
              { case: { $eq: ['$fullDocument.activity_type', 'purchase'] }, then: 10 },
              { case: { $eq: ['$fullDocument.activity_type', 'signup'] }, then: 8 },
              { case: { $eq: ['$fullDocument.activity_type', 'login'] }, then: 3 },
              { case: { $eq: ['$fullDocument.priority', 'high'] }, then: 9 }
            ],
            default: 5
          }
        }
      }
    },

    // Project final change document structure
    {
      $project: {
        changeId: 1,
        operationType: 1,
        changeCategory: 1,
        priorityScore: 1,
        processedAt: 1,
        clusterTime: 1,

        // Original document data
        documentKey: 1,
        fullDocument: 1,
        updateDescription: 1,

        // User context
        'user.username': 1,
        'user.email': 1,
        'user.subscription_type': 1,
        'user.segment': 1,

        // Metadata
        ns: 1,
        to: 1
      }
    }
  ], {
    // Change stream options
    fullDocument: 'updateLookup',        // Always include full document
    fullDocumentBeforeChange: 'whenAvailable', // Include before-change document
    resumeAfter: null,                   // Resume token (set from previous session)
    startAtOperationTime: null,          // Start from specific time
    maxAwaitTimeMS: 1000,               // Maximum time to wait for changes
    batchSize: 100,                      // Batch size for change events
    collation: { locale: 'en', strength: 2 } // Collation for text matching
  });

  // Process change stream events
  console.log('Monitoring user activities for real-time changes...');

  for await (const change of changeStream) {
    try {
      await processChangeEvent(change);

      // Store resume token for fault tolerance
      await storeResumeToken(change._id);

    } catch (error) {
      console.error('Error processing change event:', error);

      // Implement error handling strategy
      await handleChangeProcessingError(change, error);
    }
  }
}

// Sophisticated change event processing
async function processChangeEvent(change) {
  console.log(`Processing ${change.changeCategory} event:`, {
    changeId: change.changeId,
    operationType: change.operationType,
    priority: change.priorityScore,
    user: change.user?.username,
    timestamp: change.processedAt
  });

  // Route change events based on type and priority
  switch (change.changeCategory) {
    case 'new_activity':
      await handleNewActivity(change);
      break;

    case 'status_change':
      await handleStatusChange(change);
      break;

    case 'activity_replaced':
      await handleActivityReplacement(change);
      break;

    default:
      await handleGenericChange(change);
  }

  // Emit real-time event to connected clients
  await emitRealTimeEvent(change);

  // Update analytics and metrics
  await updateRealtimeMetrics(change);
}

async function handleNewActivity(change) {
  const activity = change.fullDocument;
  const user = change.user;

  // Process high-priority activities immediately
  if (change.priorityScore >= 8) {
    await processHighPriorityActivity(activity, user);
  }

  // Trigger automated workflows
  switch (activity.activity_type) {
    case 'purchase':
      await triggerPurchaseWorkflow(activity, user);
      break;

    case 'signup':
      await triggerOnboardingWorkflow(activity, user);
      break;

    case 'login':
      await updateUserSession(activity, user);
      break;
  }

  // Update real-time dashboards
  await updateLiveDashboard('new_activity', {
    activityType: activity.activity_type,
    userId: activity.user_id,
    userSegment: user.segment,
    timestamp: activity.created_at
  });
}

async function handleStatusChange(change) {
  const updatedFields = change.updateDescription.updatedFields;
  const activity = change.fullDocument;

  // Process status-specific logic
  if (updatedFields.status) {
    console.log(`Activity status changed: ${updatedFields.status}`);

    switch (updatedFields.status) {
      case 'completed':
        await handleActivityCompletion(activity);
        break;

      case 'failed':
        await handleActivityFailure(activity);
        break;

      case 'cancelled':
        await handleActivityCancellation(activity);
        break;
    }
  }

  // Notify interested parties
  await sendStatusChangeNotification(change);
}

// Benefits of MongoDB Change Streams:
// - Real-time event delivery with sub-second latency
// - Complete change context including before/after state
// - Resumable streams with automatic fault tolerance
// - Advanced filtering and transformation capabilities
// - Ordering guarantees within and across collections
// - Integration with existing MongoDB infrastructure
// - Scalable across sharded clusters and replica sets
// - Built-in authentication and authorization
// - No polling overhead or resource waste
// - Developer-friendly API with powerful aggregation pipeline

Understanding MongoDB Change Streams Architecture

Advanced Change Stream Configuration and Management

Implement comprehensive change stream management for production environments:

// Advanced change stream management system
class MongoChangeStreamManager {
  constructor(client, options = {}) {
    this.client = client;
    this.db = client.db(options.database || 'production');
    this.options = {
      // Stream configuration
      maxRetries: options.maxRetries || 10,
      retryDelay: options.retryDelay || 1000,
      batchSize: options.batchSize || 100,
      maxAwaitTimeMS: options.maxAwaitTimeMS || 1000,

      // Resume configuration
      enableResume: options.enableResume !== false,
      resumeTokenStorage: options.resumeTokenStorage || 'mongodb',

      // Error handling
      errorRetryStrategies: options.errorRetryStrategies || ['exponential_backoff', 'circuit_breaker'],

      // Monitoring
      enableMetrics: options.enableMetrics !== false,
      metricsInterval: options.metricsInterval || 30000,

      ...options
    };

    this.activeStreams = new Map();
    this.resumeTokens = new Map();
    this.streamMetrics = new Map();
    this.eventHandlers = new Map();
    this.isShuttingDown = false;
  }

  async createChangeStream(streamConfig) {
    const {
      streamId,
      collection,
      pipeline = [],
      options = {},
      eventHandlers = {}
    } = streamConfig;

    if (this.activeStreams.has(streamId)) {
      throw new Error(`Change stream with ID '${streamId}' already exists`);
    }

    // Build comprehensive change stream pipeline
    const changeStreamPipeline = [
      // Base filtering
      {
        $match: {
          $and: [
            // Operation type filtering
            streamConfig.operationTypes ? {
              operationType: { $in: streamConfig.operationTypes }
            } : {},

            // Namespace filtering
            streamConfig.namespaces ? {
              'ns.coll': { $in: streamConfig.namespaces.map(ns => ns.collection || ns) }
            } : {},

            // Custom filtering
            ...(streamConfig.filters || [])
          ].filter(filter => Object.keys(filter).length > 0)
        }
      },

      // Enrichment lookups (see the earlier note: if the server rejects $lookup
      // in a change stream pipeline, run these lookups in the event handler)
      ...(streamConfig.enrichments || []).map(enrichment => ({
        $lookup: {
          from: enrichment.from,
          localField: enrichment.localField,
          foreignField: enrichment.foreignField,
          as: enrichment.as,
          pipeline: enrichment.pipeline || []
        }
      })),

      // Computed fields
      {
        $addFields: {
          streamId: streamId,
          processedAt: new Date(),
          changeId: { $toString: '$_id' },

          // Change categorization
          changeCategory: streamConfig.categorization || {
            $switch: {
              branches: [
                { case: { $eq: ['$operationType', 'insert'] }, then: 'create' },
                { case: { $eq: ['$operationType', 'update'] }, then: 'update' },
                { case: { $eq: ['$operationType', 'replace'] }, then: 'replace' },
                { case: { $eq: ['$operationType', 'delete'] }, then: 'delete' }
              ],
              default: 'other'
            }
          },

          // Priority scoring
          priority: streamConfig.priorityScoring || 5,

          // Custom computed fields
          ...streamConfig.computedFields || {}
        }
      },

      // Additional pipeline stages
      ...pipeline,

      // Final projection
      {
        $project: {
          _id: 1,
          streamId: 1,
          changeId: 1,
          processedAt: 1,
          operationType: 1,
          changeCategory: 1,
          priority: 1,
          clusterTime: 1,
          documentKey: 1,
          fullDocument: 1,
          updateDescription: 1,
          ns: 1,
          to: 1,
          ...streamConfig.additionalProjection || {}
        }
      }
    ];

    // Configure change stream options
    const changeStreamOptions = {
      fullDocument: streamConfig.fullDocument || 'updateLookup',
      fullDocumentBeforeChange: streamConfig.fullDocumentBeforeChange || 'whenAvailable',
      resumeAfter: await this.getStoredResumeToken(streamId),
      maxAwaitTimeMS: this.options.maxAwaitTimeMS,
      batchSize: this.options.batchSize,
      ...options
    };

    // Create change stream
    const changeStream = collection ? 
      this.db.collection(collection).watch(changeStreamPipeline, changeStreamOptions) :
      this.db.watch(changeStreamPipeline, changeStreamOptions);

    // Store stream configuration, including the fully built pipeline so restarts reuse it
    this.activeStreams.set(streamId, {
      stream: changeStream,
      config: streamConfig,
      pipeline: changeStreamPipeline,
      options: changeStreamOptions,
      createdAt: new Date(),
      lastEventAt: null,
      eventCount: 0,
      errorCount: 0,
      retryCount: 0
    });

    // Initialize metrics
    this.streamMetrics.set(streamId, {
      eventsProcessed: 0,
      errorsEncountered: 0,
      avgProcessingTime: 0,
      lastProcessingTime: 0,
      throughputHistory: [],
      errorHistory: [],
      resumeHistory: []
    });

    // Store event handlers
    this.eventHandlers.set(streamId, eventHandlers);

    // Start processing
    this.processChangeStream(streamId);

    console.log(`Change stream '${streamId}' created and started`);
    return streamId;
  }

  async processChangeStream(streamId) {
    const streamInfo = this.activeStreams.get(streamId);
    const metrics = this.streamMetrics.get(streamId);
    const handlers = this.eventHandlers.get(streamId);

    if (!streamInfo) {
      console.error(`Change stream '${streamId}' not found`);
      return;
    }

    const { stream, config } = streamInfo;

    try {
      console.log(`Starting event processing for stream: ${streamId}`);

      for await (const change of stream) {
        if (this.isShuttingDown) {
          console.log(`Shutting down stream: ${streamId}`);
          break;
        }

        const processingStartTime = Date.now();

        try {
          // Process the change event
          await this.processChangeEvent(streamId, change, handlers);

          // Update metrics
          const processingTime = Date.now() - processingStartTime;
          this.updateStreamMetrics(streamId, processingTime, true);

          // Store resume token
          await this.storeResumeToken(streamId, change._id);

          // Update stream info
          streamInfo.lastEventAt = new Date();
          streamInfo.eventCount++;

        } catch (error) {
          console.error(`Error processing change event in stream '${streamId}':`, error);

          // Update error metrics
          const processingTime = Date.now() - processingStartTime;
          this.updateStreamMetrics(streamId, processingTime, false);

          streamInfo.errorCount++;

          // Handle processing error
          await this.handleProcessingError(streamId, change, error);
        }
      }

    } catch (error) {
      console.error(`Change stream '${streamId}' encountered error:`, error);

      if (!this.isShuttingDown) {
        await this.handleStreamError(streamId, error);
      }
    }
  }

  async processChangeEvent(streamId, change, handlers) {
    // Route to appropriate handler based on change type
    const handlerKey = change.changeCategory || change.operationType;
    const handler = handlers[handlerKey] || handlers.default || this.defaultEventHandler;

    if (typeof handler === 'function') {
      await handler(change, {
        streamId,
        metrics: this.streamMetrics.get(streamId),
        resumeToken: change._id
      });
    } else {
      console.warn(`No handler found for change type '${handlerKey}' in stream '${streamId}'`);
    }
  }

  async defaultEventHandler(change, context) {
    console.log(`Default handler processing change:`, {
      streamId: context.streamId,
      changeId: change.changeId,
      operationType: change.operationType,
      collection: change.ns?.coll
    });
  }

  updateStreamMetrics(streamId, processingTime, success) {
    const metrics = this.streamMetrics.get(streamId);
    if (!metrics) return;

    metrics.eventsProcessed++;
    metrics.lastProcessingTime = processingTime;

    // Update average processing time (exponential moving average)
    metrics.avgProcessingTime = (metrics.avgProcessingTime * 0.9) + (processingTime * 0.1);

    if (success) {
      // Update throughput history
      metrics.throughputHistory.push({
        timestamp: Date.now(),
        processingTime: processingTime
      });

      // Keep only recent history
      if (metrics.throughputHistory.length > 1000) {
        metrics.throughputHistory.shift();
      }
    } else {
      metrics.errorsEncountered++;

      // Record error
      metrics.errorHistory.push({
        timestamp: Date.now(),
        processingTime: processingTime
      });

      // Keep only recent error history
      if (metrics.errorHistory.length > 100) {
        metrics.errorHistory.shift();
      }
    }
  }

  async handleProcessingError(streamId, change, error) {
    const streamInfo = this.activeStreams.get(streamId);
    const config = streamInfo?.config;

    // Log error details
    console.error(`Processing error in stream '${streamId}':`, {
      changeId: change.changeId,
      operationType: change.operationType,
      error: error.message
    });

    // Apply error handling strategies
    if (config?.errorHandling) {
      const strategy = config.errorHandling.strategy || 'log';

      switch (strategy) {
        case 'retry':
          await this.retryChangeEvent(streamId, change, error);
          break;

        case 'deadletter':
          await this.sendToDeadLetter(streamId, change, error);
          break;

        case 'skip':
          console.warn(`Skipping failed change event: ${change.changeId}`);
          break;

        case 'stop_stream':
          console.error(`Stopping stream '${streamId}' due to processing error`);
          await this.stopChangeStream(streamId);
          break;

        default:
          console.error(`Unhandled processing error in stream '${streamId}'`);
      }
    }
  }

  async handleStreamError(streamId, error) {
    const streamInfo = this.activeStreams.get(streamId);
    if (!streamInfo) return;

    console.error(`Stream error in '${streamId}':`, error.message);

    // Increment retry count
    streamInfo.retryCount++;

    // Check if we should retry
    if (streamInfo.retryCount <= this.options.maxRetries) {
      console.log(`Retrying stream '${streamId}' (attempt ${streamInfo.retryCount})`);

      // Exponential backoff
      const delay = this.options.retryDelay * Math.pow(2, streamInfo.retryCount - 1);
      await this.sleep(delay);

      // Record resume attempt
      const metrics = this.streamMetrics.get(streamId);
      if (metrics) {
        metrics.resumeHistory.push({
          timestamp: Date.now(),
          attempt: streamInfo.retryCount,
          error: error.message
        });
      }

      // Restart the stream
      await this.restartChangeStream(streamId);
    } else {
      console.error(`Maximum retries exceeded for stream '${streamId}'. Marking as failed.`);
      streamInfo.status = 'failed';
      streamInfo.lastError = error;
    }
  }

  async restartChangeStream(streamId) {
    const streamInfo = this.activeStreams.get(streamId);
    if (!streamInfo) return;

    console.log(`Restarting change stream: ${streamId}`);

    try {
      // Close existing stream
      await streamInfo.stream.close();
    } catch (closeError) {
      console.warn(`Error closing stream '${streamId}':`, closeError.message);
    }

    // Update stream options with resume token
    const resumeToken = await this.getStoredResumeToken(streamId);
    if (resumeToken) {
      streamInfo.options.resumeAfter = resumeToken;
      console.log(`Resuming stream '${streamId}' from stored token`);
    }

    // Create new change stream, reusing the pipeline built when the stream was created
    const changeStreamPipeline = streamInfo.pipeline || streamInfo.config.pipeline || [];
    const newStream = streamInfo.config.collection ? 
      this.db.collection(streamInfo.config.collection).watch(changeStreamPipeline, streamInfo.options) :
      this.db.watch(changeStreamPipeline, streamInfo.options);

    // Update stream reference
    streamInfo.stream = newStream;
    streamInfo.restartedAt = new Date();

    // Resume processing
    this.processChangeStream(streamId);
  }

  async storeResumeToken(streamId, resumeToken) {
    if (!this.options.enableResume) return;

    this.resumeTokens.set(streamId, {
      token: resumeToken,
      timestamp: new Date()
    });

    // Store persistently based on configuration
    if (this.options.resumeTokenStorage === 'mongodb') {
      await this.db.collection('change_stream_resume_tokens').updateOne(
        { streamId: streamId },
        {
          $set: {
            resumeToken: resumeToken,
            updatedAt: new Date()
          }
        },
        { upsert: true }
      );
    } else if (this.options.resumeTokenStorage === 'redis' && this.redisClient) {
      await this.redisClient.set(
        `resume_token:${streamId}`,
        JSON.stringify({
          token: resumeToken,
          timestamp: new Date()
        })
      );
    }
  }

  async getStoredResumeToken(streamId) {
    if (!this.options.enableResume) return null;

    // Check memory first
    const memoryToken = this.resumeTokens.get(streamId);
    if (memoryToken) {
      return memoryToken.token;
    }

    // Load from persistent storage
    try {
      if (this.options.resumeTokenStorage === 'mongodb') {
        const tokenDoc = await this.db.collection('change_stream_resume_tokens').findOne(
          { streamId: streamId }
        );
        return tokenDoc?.resumeToken || null;
      } else if (this.options.resumeTokenStorage === 'redis' && this.redisClient) {
        const tokenData = await this.redisClient.get(`resume_token:${streamId}`);
        return tokenData ? JSON.parse(tokenData).token : null;
      }
    } catch (error) {
      console.warn(`Error loading resume token for stream '${streamId}':`, error.message);
    }

    return null;
  }

  async stopChangeStream(streamId) {
    const streamInfo = this.activeStreams.get(streamId);
    if (!streamInfo) {
      console.warn(`Change stream '${streamId}' not found`);
      return;
    }

    console.log(`Stopping change stream: ${streamId}`);

    try {
      await streamInfo.stream.close();
      streamInfo.stoppedAt = new Date();
      streamInfo.status = 'stopped';

      console.log(`Change stream '${streamId}' stopped successfully`);
    } catch (error) {
      console.error(`Error stopping stream '${streamId}':`, error);
    }
  }

  async getStreamMetrics(streamId) {
    if (streamId) {
      return {
        streamInfo: this.activeStreams.get(streamId),
        metrics: this.streamMetrics.get(streamId)
      };
    } else {
      // Return metrics for all streams
      const allMetrics = {};
      for (const [id, streamInfo] of this.activeStreams.entries()) {
        allMetrics[id] = {
          streamInfo: streamInfo,
          metrics: this.streamMetrics.get(id)
        };
      }
      return allMetrics;
    }
  }

  async startMonitoring() {
    if (this.monitoringInterval) return;

    console.log('Starting change stream monitoring');

    this.monitoringInterval = setInterval(async () => {
      try {
        await this.performHealthCheck();
      } catch (error) {
        console.error('Monitoring check failed:', error);
      }
    }, this.options.metricsInterval);
  }

  async performHealthCheck() {
    for (const [streamId, streamInfo] of this.activeStreams.entries()) {
      const metrics = this.streamMetrics.get(streamId);

      // Check stream health
      const health = this.assessStreamHealth(streamId, streamInfo, metrics);

      if (health.status !== 'healthy') {
        console.warn(`Stream '${streamId}' health check:`, health);
      }

      // Log throughput metrics
      if (metrics.throughputHistory.length > 0) {
        const recentEvents = metrics.throughputHistory.filter(
          event => Date.now() - event.timestamp < 60000 // Last minute
        );

        if (recentEvents.length > 0) {
          const avgThroughput = recentEvents.length; // Events per minute
          console.log(`Stream '${streamId}' throughput: ${avgThroughput} events/minute`);
        }
      }
    }
  }

  assessStreamHealth(streamId, streamInfo, metrics) {
    const health = {
      streamId: streamId,
      status: 'healthy',
      issues: [],
      recommendations: []
    };

    // Check error rate
    if (metrics.errorsEncountered > 0 && metrics.eventsProcessed > 0) {
      const errorRate = (metrics.errorsEncountered / metrics.eventsProcessed) * 100;
      if (errorRate > 10) {
        health.status = 'unhealthy';
        health.issues.push(`High error rate: ${errorRate.toFixed(2)}%`);
        health.recommendations.push('Investigate error patterns and processing logic');
      } else if (errorRate > 5) {
        health.status = 'warning';
        health.issues.push(`Elevated error rate: ${errorRate.toFixed(2)}%`);
      }
    }

    // Check processing performance
    if (metrics.avgProcessingTime > 5000) {
      health.issues.push(`Slow processing: ${metrics.avgProcessingTime.toFixed(0)}ms average`);
      health.recommendations.push('Optimize event processing logic');
      if (health.status === 'healthy') health.status = 'warning';
    }

    // Check stream activity
    const timeSinceLastEvent = streamInfo.lastEventAt ? 
      Date.now() - streamInfo.lastEventAt.getTime() : 
      Date.now() - streamInfo.createdAt.getTime();

    if (timeSinceLastEvent > 3600000) { // 1 hour
      health.issues.push(`No events for ${Math.round(timeSinceLastEvent / 60000)} minutes`);
      health.recommendations.push('Verify data source and stream configuration');
    }

    // Check retry count
    if (streamInfo.retryCount > 3) {
      health.issues.push(`Multiple retries: ${streamInfo.retryCount} attempts`);
      health.recommendations.push('Investigate connection stability and error causes');
      if (health.status === 'healthy') health.status = 'warning';
    }

    return health;
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  async shutdown() {
    console.log('Shutting down change stream manager...');

    this.isShuttingDown = true;

    // Stop monitoring
    if (this.monitoringInterval) {
      clearInterval(this.monitoringInterval);
      this.monitoringInterval = null;
    }

    // Close all active streams
    const closePromises = [];
    for (const [streamId] of this.activeStreams.entries()) {
      closePromises.push(this.stopChangeStream(streamId));
    }

    await Promise.all(closePromises);

    console.log('Change stream manager shutdown complete');
  }
}

Real-Time Event Processing Patterns

Implement sophisticated event processing patterns for different application scenarios:

// Specialized change stream patterns for different use cases
const { EventEmitter } = require('events');

class RealtimeEventPatterns {
  constructor(changeStreamManager) {
    this.csm = changeStreamManager;
    this.eventBus = new EventEmitter();
    this.processors = new Map();
  }

  async setupUserActivityStream() {
    // Real-time user activity monitoring
    return await this.csm.createChangeStream({
      streamId: 'user_activities',
      collection: 'user_activities',
      operationTypes: ['insert', 'update'],

      filters: [
        { 'fullDocument.activity_type': { $in: ['login', 'purchase', 'view', 'search'] } },
        { 'fullDocument.user_id': { $exists: true } }
      ],

      enrichments: [
        {
          from: 'users',
          localField: 'fullDocument.user_id',
          foreignField: '_id',
          as: 'user_data'
        },
        {
          from: 'user_sessions',
          localField: 'fullDocument.session_id',
          foreignField: '_id',
          as: 'session_data'
        }
      ],

      computedFields: {
        activityScore: {
          $switch: {
            branches: [
              { case: { $eq: ['$fullDocument.activity_type', 'purchase'] }, then: 100 },
              { case: { $eq: ['$fullDocument.activity_type', 'login'] }, then: 10 },
              { case: { $eq: ['$fullDocument.activity_type', 'search'] }, then: 5 },
              { case: { $eq: ['$fullDocument.activity_type', 'view'] }, then: 1 }
            ],
            default: 0
          }
        },

        userSegment: { $arrayElemAt: ['$user_data.segment', 0] },
        sessionDuration: { $arrayElemAt: ['$session_data.duration', 0] }
      },

      eventHandlers: {
        insert: async (change, context) => {
          await this.handleNewUserActivity(change);
        },
        update: async (change, context) => {
          await this.handleUserActivityUpdate(change);
        }
      },

      errorHandling: {
        strategy: 'retry',
        maxRetries: 3
      }
    });
  }

  async handleNewUserActivity(change) {
    const activity = change.fullDocument;
    const user = change.user_data?.[0];

    console.log(`New user activity: ${activity.activity_type}`, {
      userId: activity.user_id,
      username: user?.username,
      activityScore: change.activityScore,
      timestamp: activity.created_at
    });

    // Real-time user engagement tracking
    await this.updateUserEngagement(activity, user);

    // Trigger personalization engine
    if (change.activityScore >= 5) {
      await this.triggerPersonalizationUpdate(activity, user);
    }

    // Real-time recommendations
    if (activity.activity_type === 'view' || activity.activity_type === 'search') {
      await this.updateRecommendations(activity, user);
    }

    // Fraud detection for high-value activities
    if (activity.activity_type === 'purchase') {
      await this.analyzeFraudRisk(activity, user, change.session_data?.[0]);
    }

    // Live dashboard updates
    this.eventBus.emit('user_activity', {
      type: 'new_activity',
      activity: activity,
      user: user,
      score: change.activityScore
    });
  }

  async setupOrderProcessingStream() {
    // Real-time order processing and fulfillment
    return await this.csm.createChangeStream({
      streamId: 'order_processing',
      collection: 'orders',
      operationTypes: ['insert', 'update'],

      filters: [
        {
          $or: [
            { operationType: 'insert' },
            { 'updateDescription.updatedFields.status': { $exists: true } }
          ]
        }
      ],

      enrichments: [
        {
          from: 'customers',
          localField: 'fullDocument.customer_id',
          foreignField: '_id',
          as: 'customer_data'
        },
        {
          from: 'inventory',
          localField: 'fullDocument.items.product_id',
          foreignField: '_id',
          as: 'inventory_data'
        }
      ],

      computedFields: {
        orderValue: '$fullDocument.total_amount',
        orderPriority: {
          $switch: {
            branches: [
              { case: { $gt: ['$fullDocument.total_amount', 1000] }, then: 'high' },
              { case: { $gt: ['$fullDocument.total_amount', 500] }, then: 'medium' }
            ],
            default: 'normal'
          }
        },
        customerTier: { $arrayElemAt: ['$customer_data.tier', 0] }
      },

      eventHandlers: {
        insert: async (change, context) => {
          await this.handleNewOrder(change);
        },
        update: async (change, context) => {
          await this.handleOrderStatusChange(change);
        }
      }
    });
  }

  async handleNewOrder(change) {
    const order = change.fullDocument;
    const customer = change.customer_data?.[0];

    console.log(`New order received:`, {
      orderId: order._id,
      customerId: order.customer_id,
      customerTier: change.customerTier,
      orderValue: change.orderValue,
      priority: change.orderPriority
    });

    // Inventory allocation
    await this.allocateInventory(order, change.inventory_data);

    // Payment processing
    if (order.payment_method) {
      await this.processPayment(order, customer);
    }

    // Shipping calculation
    await this.calculateShipping(order, customer);

    // Notification systems
    await this.sendOrderConfirmation(order, customer);

    // Analytics and reporting
    this.eventBus.emit('new_order', {
      order: order,
      customer: customer,
      priority: change.orderPriority,
      value: change.orderValue
    });
  }

  async handleOrderStatusChange(change) {
    const updatedFields = change.updateDescription.updatedFields;
    const order = change.fullDocument;

    if (updatedFields.status) {
      console.log(`Order status changed: ${order._id} -> ${updatedFields.status}`);

      switch (updatedFields.status) {
        case 'confirmed':
          await this.handleOrderConfirmation(order);
          break;
        case 'shipped':
          await this.handleOrderShipment(order);
          break;
        case 'delivered':
          await this.handleOrderDelivery(order);
          break;
        case 'cancelled':
          await this.handleOrderCancellation(order);
          break;
      }

      // Customer notifications
      await this.sendStatusUpdateNotification(order, updatedFields.status);
    }
  }

  async setupInventoryManagementStream() {
    // Real-time inventory tracking and alerts
    return await this.csm.createChangeStream({
      streamId: 'inventory_management',
      collection: 'inventory',
      operationTypes: ['update'],

      filters: [
        {
          $or: [
            { 'updateDescription.updatedFields.quantity': { $exists: true } },
            { 'updateDescription.updatedFields.reserved_quantity': { $exists: true } },
            { 'updateDescription.updatedFields.available_quantity': { $exists: true } }
          ]
        }
      ],

      enrichments: [
        {
          from: 'products',
          localField: 'documentKey._id',
          foreignField: 'inventory_id',
          as: 'product_data'
        }
      ],

      computedFields: {
        stockLevel: '$fullDocument.available_quantity',
        reorderThreshold: '$fullDocument.reorder_level',
        stockStatus: {
          $cond: {
            if: { $lte: ['$fullDocument.available_quantity', '$fullDocument.reorder_level'] },
            then: 'low_stock',
            else: 'in_stock'
          }
        }
      },

      eventHandlers: {
        update: async (change, context) => {
          await this.handleInventoryChange(change);
        }
      }
    });
  }

  async handleInventoryChange(change) {
    const inventory = change.fullDocument;
    const updatedFields = change.updateDescription.updatedFields;
    const product = change.product_data?.[0];

    console.log(`Inventory updated:`, {
      productId: product?._id,
      productName: product?.name,
      updatedQuantity: updatedFields.quantity,
      currentQuantity: inventory.available_quantity,
      stockStatus: change.stockStatus
    });

    // Low stock alerts
    if (change.stockStatus === 'low_stock') {
      await this.triggerLowStockAlert(inventory, product);
    }

    // Out of stock handling
    if (inventory.available_quantity <= 0) {
      await this.handleOutOfStock(inventory, product);
    }

    // Automatic reordering
    if (inventory.auto_reorder && inventory.available_quantity <= inventory.reorder_level) {
      await this.triggerAutomaticReorder(inventory, product);
    }

    // Live inventory dashboard
    this.eventBus.emit('inventory_change', {
      inventory: inventory,
      product: product,
      stockStatus: change.stockStatus,
      quantityChange: updatedFields.quantity ? 
        inventory.available_quantity - updatedFields.quantity : 0
    });
  }

  async setupMultiCollectionStream() {
    // Monitor changes across multiple collections
    return await this.csm.createChangeStream({
      streamId: 'multi_collection_monitor',
      operationTypes: ['insert', 'update', 'delete'],

      filters: [
        {
          'ns.coll': { 
            $in: ['users', 'orders', 'products', 'reviews'] 
          }
        }
      ],

      computedFields: {
        collectionType: '$ns.coll',
        businessImpact: {
          $switch: {
            branches: [
              { case: { $eq: ['$ns.coll', 'orders'] }, then: 'high' },
              { case: { $eq: ['$ns.coll', 'users'] }, then: 'medium' },
              { case: { $eq: ['$ns.coll', 'products'] }, then: 'medium' },
              { case: { $eq: ['$ns.coll', 'reviews'] }, then: 'low' }
            ],
            default: 'unknown'
          }
        }
      },

      eventHandlers: {
        insert: async (change, context) => {
          await this.handleMultiCollectionInsert(change);
        },
        update: async (change, context) => {
          await this.handleMultiCollectionUpdate(change);
        },
        delete: async (change, context) => {
          await this.handleMultiCollectionDelete(change);
        }
      }
    });
  }

  async handleMultiCollectionInsert(change) {
    const collection = change.ns.coll;

    switch (collection) {
      case 'users':
        await this.handleNewUser(change.fullDocument);
        break;
      case 'orders':
        await this.handleNewOrder(change);
        break;
      case 'products':
        await this.handleNewProduct(change.fullDocument);
        break;
      case 'reviews':
        await this.handleNewReview(change.fullDocument);
        break;
    }

    // Cross-collection analytics
    await this.updateCrossCollectionMetrics(collection, 'insert');
  }

  async setupAggregationUpdateStream() {
    // Monitor changes that require aggregation updates
    return await this.csm.createChangeStream({
      streamId: 'aggregation_updates',
      operationTypes: ['insert', 'update', 'delete'],

      filters: [
        {
          $or: [
            // Order changes affecting customer metrics
            { 
              $and: [
                { 'ns.coll': 'orders' },
                { 'fullDocument.status': 'completed' }
              ]
            },
            // Review changes affecting product ratings
            { 'ns.coll': 'reviews' },
            // Activity changes affecting user engagement
            { 
              $and: [
                { 'ns.coll': 'user_activities' },
                { 'fullDocument.activity_type': { $in: ['purchase', 'view', 'like'] } }
              ]
            }
          ]
        }
      ],

      eventHandlers: {
        default: async (change, context) => {
          await this.handleAggregationUpdate(change);
        }
      }
    });
  }

  async handleAggregationUpdate(change) {
    const collection = change.ns.coll;
    const document = change.fullDocument;

    switch (collection) {
      case 'orders':
        if (document.status === 'completed') {
          await this.updateCustomerMetrics(document.customer_id);
          await this.updateProductSalesMetrics(document.items);
        }
        break;

      case 'reviews':
        await this.updateProductRatings(document.product_id);
        break;

      case 'user_activities':
        await this.updateUserEngagementMetrics(document.user_id);
        break;
    }
  }

  // Analytics and Metrics Updates
  async updateUserEngagement(activity, user) {
    // Update real-time user engagement metrics
    const engagementUpdate = {
      $inc: {
        'metrics.total_activities': 1,
        [`metrics.activity_counts.${activity.activity_type}`]: 1
      },
      $set: {
        'metrics.last_activity': activity.created_at,
        'metrics.updated_at': new Date()
      }
    };

    await this.csm.db.collection('user_engagement').updateOne(
      { user_id: activity.user_id },
      engagementUpdate,
      { upsert: true }
    );
  }

  async updateCustomerMetrics(customerId) {
    // Recalculate customer lifetime value and order metrics
    const pipeline = [
      { $match: { customer_id: customerId, status: 'completed' } },
      {
        $group: {
          _id: '$customer_id',
          totalOrders: { $sum: 1 },
          totalSpent: { $sum: '$total_amount' },
          avgOrderValue: { $avg: '$total_amount' },
          lastOrderDate: { $max: '$created_at' },
          firstOrderDate: { $min: '$created_at' }
        }
      }
    ];

    const result = await this.csm.db.collection('orders').aggregate(pipeline).toArray();

    if (result.length > 0) {
      const metrics = result[0];
      await this.csm.db.collection('customer_metrics').updateOne(
        { customer_id: customerId },
        {
          $set: {
            ...metrics,
            updated_at: new Date()
          }
        },
        { upsert: true }
      );
    }
  }

  // Event Bus Integration
  setupEventBusHandlers() {
    this.eventBus.on('user_activity', (data) => {
      // Emit to external systems (WebSocket, message queue, etc.)
      this.emitToExternalSystems('user_activity', data);
    });

    this.eventBus.on('new_order', (data) => {
      this.emitToExternalSystems('new_order', data);
    });

    this.eventBus.on('inventory_change', (data) => {
      this.emitToExternalSystems('inventory_change', data);
    });
  }

  async emitToExternalSystems(eventType, data) {
    // WebSocket broadcasting
    if (this.wsServer) {
      this.wsServer.broadcast(JSON.stringify({
        type: eventType,
        data: data,
        timestamp: new Date()
      }));
    }

    // Message queue publishing
    if (this.messageQueue) {
      await this.messageQueue.publish(eventType, data);
    }

    // Webhook notifications
    if (this.webhookHandler) {
      await this.webhookHandler.notify(eventType, data);
    }
  }

  async shutdown() {
    console.log('Shutting down real-time event patterns...');
    this.eventBus.removeAllListeners();
    await this.csm.shutdown();
  }
}

SQL-Style Change Stream Operations with QueryLeaf

QueryLeaf provides familiar SQL approaches to MongoDB Change Stream configuration and monitoring:

-- QueryLeaf change stream operations with SQL-familiar syntax

-- Create change stream with advanced filtering
CREATE CHANGE_STREAM user_activities_stream ON user_activities
WITH (
  operations = ARRAY['insert', 'update'],
  resume_token_storage = 'mongodb',
  batch_size = 100,
  max_await_time_ms = 1000
)
FILTER (
  activity_type IN ('login', 'purchase', 'view', 'search') AND
  user_id IS NOT NULL AND
  created_at >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
)
ENRICH WITH (
  users ON user_activities.user_id = users._id AS user_data,
  user_sessions ON user_activities.session_id = user_sessions._id AS session_data
)
COMPUTE (
  activity_score = CASE 
    WHEN activity_type = 'purchase' THEN 100
    WHEN activity_type = 'login' THEN 10
    WHEN activity_type = 'search' THEN 5
    WHEN activity_type = 'view' THEN 1
    ELSE 0
  END,
  user_segment = user_data.segment,
  session_duration = session_data.duration
);

-- Monitor change stream with real-time processing
SELECT 
  change_id,
  operation_type,
  collection_name,
  document_key,
  cluster_time,

  -- Document data
  full_document,
  update_description,

  -- Computed fields from stream
  activity_score,
  user_segment,
  session_duration,

  -- Change categorization
  CASE 
    WHEN operation_type = 'insert' THEN 'new_activity'
    WHEN operation_type = 'update' AND update_description.updated_fields ? 'status' THEN 'status_change'
    WHEN operation_type = 'update' THEN 'activity_updated'
    ELSE 'other'
  END as change_category,

  -- Priority assessment
  CASE
    WHEN activity_score >= 50 THEN 'high'
    WHEN activity_score >= 10 THEN 'medium'
    ELSE 'low'
  END as priority_level,

  processed_at

FROM CHANGE_STREAM('user_activities_stream')
WHERE activity_score > 0
ORDER BY activity_score DESC, cluster_time ASC;

-- Multi-collection change stream monitoring
CREATE CHANGE_STREAM business_events_stream
WITH (
  operations = ARRAY['insert', 'update', 'delete'],
  full_document = 'updateLookup',
  full_document_before_change = 'whenAvailable'
)
FILTER (
  collection_name IN ('orders', 'users', 'products', 'inventory') AND
  (
    -- High-impact order changes
    (collection_name = 'orders' AND operation_type IN ('insert', 'update')) OR
    -- User registration and profile updates
    (collection_name = 'users' AND (operation_type = 'insert' OR update_description.updated_fields ? 'subscription_type')) OR
    -- Product catalog changes
    (collection_name = 'products' AND update_description.updated_fields ? 'price') OR
    -- Inventory level changes
    (collection_name = 'inventory' AND update_description.updated_fields ? 'available_quantity')
  )
);

-- Real-time analytics from change streams
WITH change_stream_analytics AS (
  SELECT 
    collection_name,
    operation_type,
    DATE_TRUNC('minute', cluster_time) as time_bucket,

    -- Event counts
    COUNT(*) as event_count,
    COUNT(*) FILTER (WHERE operation_type = 'insert') as inserts,
    COUNT(*) FILTER (WHERE operation_type = 'update') as updates,
    COUNT(*) FILTER (WHERE operation_type = 'delete') as deletes,

    -- Business metrics
    CASE collection_name
      WHEN 'orders' THEN 
        SUM(CASE WHEN operation_type = 'insert' THEN (full_document->>'total_amount')::numeric ELSE 0 END)
      ELSE 0
    END as revenue_impact,

    CASE collection_name
      WHEN 'inventory' THEN
        SUM(CASE 
          WHEN update_description.updated_fields ? 'available_quantity' 
          THEN (full_document->>'available_quantity')::int - (update_description.updated_fields->>'available_quantity')::int
          ELSE 0
        END)
      ELSE 0  
    END as inventory_change,

    -- Processing performance
    AVG(EXTRACT(EPOCH FROM (processed_at - cluster_time))) as avg_processing_latency_seconds,
    MAX(EXTRACT(EPOCH FROM (processed_at - cluster_time))) as max_processing_latency_seconds

  FROM CHANGE_STREAM('business_events_stream')
  WHERE cluster_time >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
  GROUP BY collection_name, operation_type, DATE_TRUNC('minute', cluster_time)
),

real_time_dashboard AS (
  SELECT 
    time_bucket,

    -- Overall activity metrics
    SUM(event_count) as total_events,
    SUM(inserts) as total_inserts,
    SUM(updates) as total_updates,
    SUM(deletes) as total_deletes,

    -- Business KPIs
    SUM(revenue_impact) as minute_revenue,
    SUM(inventory_change) as net_inventory_change,

    -- Performance metrics
    AVG(avg_processing_latency_seconds) as avg_latency,
    MAX(max_processing_latency_seconds) as max_latency,

    -- Collection breakdown
    json_object_agg(
      collection_name,
      json_build_object(
        'events', event_count,
        'inserts', inserts,
        'updates', updates,
        'deletes', deletes
      )
    ) as collection_breakdown,

    -- Alerts and anomalies
    CASE 
      WHEN SUM(event_count) > 1000 THEN 'high_volume'
      WHEN AVG(avg_processing_latency_seconds) > 5 THEN 'high_latency'
      WHEN SUM(revenue_impact) < 0 THEN 'revenue_concern'
      ELSE 'normal'
    END as alert_status

  FROM change_stream_analytics
  GROUP BY time_bucket
)

SELECT 
  time_bucket,
  total_events,
  total_inserts,
  total_updates,
  total_deletes,
  ROUND(minute_revenue, 2) as revenue_per_minute,
  net_inventory_change,
  ROUND(avg_latency, 3) as avg_processing_seconds,
  ROUND(max_latency, 3) as max_processing_seconds,
  collection_breakdown,
  alert_status,

  -- Trend indicators
  LAG(total_events, 1) OVER (ORDER BY time_bucket) as prev_minute_events,
  ROUND(
    (total_events - LAG(total_events, 1) OVER (ORDER BY time_bucket))::numeric / 
    NULLIF(LAG(total_events, 1) OVER (ORDER BY time_bucket), 0) * 100,
    1
  ) as event_growth_pct,

  ROUND(
    (minute_revenue - LAG(minute_revenue, 1) OVER (ORDER BY time_bucket))::numeric / 
    NULLIF(LAG(minute_revenue, 1) OVER (ORDER BY time_bucket), 0) * 100,
    1
  ) as revenue_growth_pct

FROM real_time_dashboard
ORDER BY time_bucket DESC
LIMIT 60; -- Last hour of minute-by-minute data

-- Change stream error handling and monitoring
SELECT 
  stream_name,
  stream_status,
  created_at,
  last_event_at,
  event_count,
  error_count,
  retry_count,

  -- Health assessment
  CASE 
    WHEN error_count::float / NULLIF(event_count, 0) > 0.1 THEN 'UNHEALTHY'
    WHEN error_count::float / NULLIF(event_count, 0) > 0.05 THEN 'WARNING'  
    WHEN last_event_at < CURRENT_TIMESTAMP - INTERVAL '1 hour' THEN 'INACTIVE'
    ELSE 'HEALTHY'
  END as health_status,

  -- Performance metrics
  ROUND(error_count::numeric / NULLIF(event_count, 0) * 100, 2) as error_rate_pct,
  EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - last_event_at)) / 60 as minutes_since_last_event,

  -- Resume token status
  CASE 
    WHEN resume_token IS NOT NULL THEN 'RESUMABLE'
    ELSE 'NOT_RESUMABLE'
  END as resume_status,

  -- Recommendations
  CASE 
    WHEN error_count::float / NULLIF(event_count, 0) > 0.1 THEN 'Investigate error patterns and processing logic'
    WHEN retry_count > 5 THEN 'Check connection stability and resource limits'
    WHEN last_event_at < CURRENT_TIMESTAMP - INTERVAL '2 hours' THEN 'Verify data source and stream configuration'
    ELSE 'Stream operating normally'
  END as recommendation

FROM CHANGE_STREAM_STATUS()
ORDER BY 
  CASE health_status
    WHEN 'UNHEALTHY' THEN 1
    WHEN 'WARNING' THEN 2
    WHEN 'INACTIVE' THEN 3
    ELSE 4
  END,
  error_rate_pct DESC NULLS LAST;

-- Event-driven workflow triggers
CREATE TRIGGER real_time_order_processing
ON CHANGE_STREAM('business_events_stream')
WHEN (
  collection_name = 'orders' AND 
  operation_type = 'insert' AND
  full_document->>'status' = 'pending'
)
EXECUTE PROCEDURE (
  -- Inventory allocation
  UPDATE inventory 
  SET reserved_quantity = reserved_quantity + (
    SELECT SUM((item->>'quantity')::int)
    FROM json_array_elements(NEW.full_document->'items') AS item
    WHERE inventory.product_id = (item->>'product_id')::uuid
  ),
  available_quantity = available_quantity - (
    SELECT SUM((item->>'quantity')::int) 
    FROM json_array_elements(NEW.full_document->'items') AS item
    WHERE inventory.product_id = (item->>'product_id')::uuid
  )
  WHERE product_id IN (
    SELECT DISTINCT (item->>'product_id')::uuid
    FROM json_array_elements(NEW.full_document->'items') AS item
  );

  -- Payment processing trigger
  INSERT INTO payment_processing_queue (
    order_id,
    customer_id,
    amount,
    payment_method,
    priority,
    created_at
  )
  VALUES (
    (NEW.full_document->>'_id')::uuid,
    (NEW.full_document->>'customer_id')::uuid,
    (NEW.full_document->>'total_amount')::numeric,
    NEW.full_document->>'payment_method',
    CASE 
      WHEN (NEW.full_document->>'total_amount')::numeric > 1000 THEN 'high'
      ELSE 'normal'
    END,
    CURRENT_TIMESTAMP
  );

  -- Customer notification
  INSERT INTO notification_queue (
    recipient_id,
    notification_type,
    channel,
    message_data,
    created_at
  )
  VALUES (
    (NEW.full_document->>'customer_id')::uuid,
    'order_confirmation',
    'email',
    json_build_object(
      'order_id', NEW.full_document->>'_id',
      'order_total', NEW.full_document->>'total_amount',
      'items_count', json_array_length(NEW.full_document->'items')
    ),
    CURRENT_TIMESTAMP
  );
);

-- Change stream performance optimization
WITH stream_performance AS (
  SELECT 
    stream_name,
    AVG(processing_time_ms) as avg_processing_time,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY processing_time_ms) as p95_processing_time,
    MAX(processing_time_ms) as max_processing_time,
    COUNT(*) as total_events,
    SUM(CASE WHEN processing_time_ms > 1000 THEN 1 ELSE 0 END) as slow_events,
    AVG(batch_size) as avg_batch_size
  FROM CHANGE_STREAM_METRICS()
  WHERE recorded_at >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
  GROUP BY stream_name
)
SELECT 
  stream_name,
  ROUND(avg_processing_time, 2) as avg_processing_ms,
  ROUND(p95_processing_time, 2) as p95_processing_ms,
  max_processing_time as max_processing_ms,
  total_events,
  ROUND((slow_events::numeric / total_events) * 100, 2) as slow_event_pct,
  ROUND(avg_batch_size, 1) as avg_batch_size,

  -- Performance assessment
  CASE 
    WHEN avg_processing_time > 2000 THEN 'SLOW'
    WHEN slow_events::numeric / total_events > 0.1 THEN 'INCONSISTENT'  
    WHEN avg_batch_size < 10 THEN 'UNDERUTILIZED'
    ELSE 'OPTIMAL'
  END as performance_status,

  -- Optimization recommendations
  CASE
    WHEN avg_processing_time > 2000 THEN 'Optimize event processing logic and reduce complexity'
    WHEN slow_events::numeric / total_events > 0.1 THEN 'Investigate processing bottlenecks and resource constraints'
    WHEN avg_batch_size < 10 THEN 'Increase batch size for better throughput'
    WHEN p95_processing_time > 5000 THEN 'Add error handling and timeout management'
    ELSE 'Performance is within acceptable limits'
  END as optimization_recommendation

FROM stream_performance
ORDER BY avg_processing_time DESC;

-- QueryLeaf provides comprehensive change stream capabilities:
-- 1. SQL-familiar change stream creation and configuration
-- 2. Advanced filtering with complex business logic
-- 3. Real-time enrichment with related collection data
-- 4. Computed fields for event categorization and scoring
-- 5. Multi-collection monitoring with unified interface
-- 6. Real-time analytics and dashboard integration
-- 7. Event-driven workflow automation and triggers
-- 8. Performance monitoring and optimization recommendations
-- 9. Error handling and automatic retry mechanisms
-- 10. Resume capability for fault-tolerant processing

Best Practices for Change Stream Implementation

Design Guidelines

Essential practices for optimal change stream configuration:

  1. Strategic Filtering: Design filters to process only relevant changes and minimize resource usage
  2. Resume Strategy: Implement robust resume token storage for fault-tolerant processing (see the sketch after this list)
  3. Error Handling: Build comprehensive error handling with retry strategies and dead letter queues
  4. Performance Monitoring: Track processing latency, throughput, and error rates continuously
  5. Resource Management: Size change stream configurations based on expected data volumes
  6. Event Ordering: Understand and leverage MongoDB's ordering guarantees within and across collections
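
Resume handling and error recovery (points 2 and 3 above) are worth seeing in code. The following is a minimal sketch using the Node.js driver; it assumes an existing db handle, an application-specific processChange() handler, and an illustrative stream_checkpoints collection for token storage.

// Minimal resume-token persistence and retry sketch (names are illustrative)
async function watchWithResume(db, streamId) {
  const checkpoints = db.collection('stream_checkpoints');
  const saved = await checkpoints.findOne({ _id: streamId });

  const changeStream = db.collection('orders').watch([], {
    fullDocument: 'updateLookup',
    ...(saved && saved.resumeToken ? { resumeAfter: saved.resumeToken } : {})
  });

  try {
    for await (const change of changeStream) {
      await processChange(change); // assumed application-specific handler

      // Persist the resume token only after the event is fully processed,
      // so a crash replays events (at-least-once) instead of dropping them
      await checkpoints.updateOne(
        { _id: streamId },
        { $set: { resumeToken: change._id, updatedAt: new Date() } },
        { upsert: true }
      );
    }
  } catch (err) {
    console.error(`Change stream ${streamId} interrupted, resuming:`, err.message);
    return watchWithResume(db, streamId); // simple retry; add backoff and dead-lettering in production
  }
}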

Scalability and Performance

Optimize change streams for high-throughput, low-latency processing:

  1. Batch Processing: Configure appropriate batch sizes for optimal throughput (a configuration sketch follows this list)
  2. Parallel Processing: Distribute change processing across multiple consumers when possible
  3. Resource Allocation: Ensure adequate compute and network resources for real-time processing
  4. Connection Management: Use connection pooling and proper resource cleanup
  5. Monitoring Integration: Integrate with observability tools for production monitoring
  6. Load Testing: Test change stream performance under expected and peak loads
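
As a concrete starting point for batch tuning (point 1 above), the sketch below shows the relevant Node.js driver options; the specific values and the workQueue hand-off are illustrative assumptions rather than universal recommendations.

// Throughput-oriented change stream options (illustrative values)
const changeStream = db.collection('orders').watch(
  [
    // Keep the pipeline selective so only relevant events cross the wire
    { $match: { operationType: { $in: ['insert', 'update'] } } }
  ],
  {
    batchSize: 500,        // larger batches reduce getMore round trips on busy collections
    maxAwaitTimeMS: 1000,  // how long the server waits for new events before returning an empty batch
    fullDocument: 'updateLookup'
  }
);

changeStream.on('change', (change) => {
  // Hand events to a worker queue so slow processing never stalls the cursor;
  // workQueue is an assumed application-level queue, not a driver feature
  workQueue.push(change);
});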

Conclusion

MongoDB Change Streams provide enterprise-grade real-time data processing capabilities that eliminate the complexity and overhead of polling-based change detection while delivering immediate, ordered, and resumable event notifications. The integration of sophisticated filtering, enrichment, and processing capabilities makes building reactive applications and event-driven architectures both powerful and maintainable.

Key Change Stream benefits include:

  • Real-Time Processing: Sub-second latency for immediate response to data changes
  • Complete Change Context: Full document state and change details for comprehensive processing
  • Fault Tolerance: Automatic resume capability and robust error handling mechanisms
  • Scalable Architecture: Support for high-throughput processing across sharded clusters
  • Developer Experience: Intuitive API with powerful aggregation pipeline integration
  • Production Ready: Built-in monitoring, authentication, and operational capabilities

Whether you're building live dashboards, automated workflows, real-time analytics, or event-driven microservices, MongoDB Change Streams with QueryLeaf's familiar SQL interface provide the foundation for reactive data processing. This combination enables you to implement sophisticated real-time capabilities while preserving familiar development patterns and operational approaches.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Change Stream operations while providing SQL-familiar change detection, event filtering, and real-time processing syntax. Advanced stream configuration, error handling, and performance optimization are seamlessly handled through familiar SQL patterns, making real-time data processing both powerful and accessible.

The integration of native change stream capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both real-time responsiveness and familiar database interaction patterns, ensuring your event-driven architecture remains both effective and maintainable as it scales and evolves.

MongoDB Data Modeling and Schema Design Patterns: SQL-Style Database Design for NoSQL Performance and Flexibility

Modern applications require database designs that can handle complex data relationships, evolving requirements, and massive scale while maintaining query performance and data consistency. Traditional relational database design relies on normalization principles and rigid schema constraints, but often struggles with nested data structures, dynamic attributes, and horizontal scaling demands that characterize modern applications.

MongoDB's document-based data model provides flexible schema design that can adapt to changing requirements while delivering high performance through strategic denormalization and document structure optimization. Unlike relational databases that require complex joins to reassemble related data, MongoDB document modeling can embed related data within single documents, reducing query complexity and improving performance for read-heavy workloads.

The Relational Database Design Challenge

Traditional relational database design approaches face significant limitations with modern application requirements:

-- Traditional relational database design - rigid and join-heavy
-- E-commerce product catalog with complex relationships

CREATE TABLE categories (
    category_id SERIAL PRIMARY KEY,
    category_name VARCHAR(100) NOT NULL,
    parent_category_id INTEGER REFERENCES categories(category_id),
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE brands (
    brand_id SERIAL PRIMARY KEY,
    brand_name VARCHAR(100) NOT NULL UNIQUE,
    brand_description TEXT,
    brand_website VARCHAR(255),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(255) NOT NULL,
    product_description TEXT,
    category_id INTEGER NOT NULL REFERENCES categories(category_id),
    brand_id INTEGER NOT NULL REFERENCES brands(brand_id),
    base_price DECIMAL(10, 2) NOT NULL,
    weight DECIMAL(8, 3),
    dimensions_length DECIMAL(8, 2),
    dimensions_width DECIMAL(8, 2), 
    dimensions_height DECIMAL(8, 2),
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE product_attributes (
    attribute_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    attribute_name VARCHAR(100) NOT NULL,
    attribute_value TEXT NOT NULL,
    attribute_type VARCHAR(50) DEFAULT 'string',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    UNIQUE(product_id, attribute_name)
);

CREATE TABLE product_images (
    image_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    image_url VARCHAR(500) NOT NULL,
    image_alt_text VARCHAR(255),
    display_order INTEGER DEFAULT 0,
    is_primary BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE product_variants (
    variant_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    variant_name VARCHAR(255) NOT NULL,
    sku VARCHAR(100) UNIQUE,
    price_adjustment DECIMAL(10, 2) DEFAULT 0,
    stock_quantity INTEGER DEFAULT 0,
    variant_attributes JSONB,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE product_reviews (
    review_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    rating INTEGER CHECK (rating >= 1 AND rating <= 5),
    review_title VARCHAR(200),
    review_text TEXT,
    is_verified_purchase BOOLEAN DEFAULT false,
    helpful_votes INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Complex query to get product details with all related data
SELECT 
    p.product_id,
    p.product_name,
    p.product_description,
    p.base_price,

    -- Category hierarchy (requires recursive CTE for full path)
    c.category_name,
    parent_c.category_name as parent_category,

    -- Brand information
    b.brand_name,
    b.brand_description,

    -- Product dimensions
    CASE 
        WHEN p.dimensions_length IS NOT NULL THEN 
            CONCAT(p.dimensions_length, ' x ', p.dimensions_width, ' x ', p.dimensions_height)
        ELSE NULL
    END as dimensions,

    -- Aggregate attributes (problematic with large numbers)
    STRING_AGG(
        CONCAT(pa.attribute_name, ': ', pa.attribute_value), 
        ', ' 
        ORDER BY pa.attribute_name
    ) as attributes,

    -- Primary image
    pi_primary.image_url as primary_image,

    -- Review statistics
    COUNT(DISTINCT pr.review_id) as review_count,
    ROUND(AVG(pr.rating), 2) as average_rating,

    -- Variant count
    COUNT(DISTINCT pv.variant_id) as variant_count,

    -- Stock availability across variants
    SUM(pv.stock_quantity) as total_stock

FROM products p
JOIN categories c ON p.category_id = c.category_id
LEFT JOIN categories parent_c ON c.parent_category_id = parent_c.category_id
JOIN brands b ON p.brand_id = b.brand_id
LEFT JOIN product_attributes pa ON p.product_id = pa.product_id
LEFT JOIN product_images pi_primary ON p.product_id = pi_primary.product_id 
    AND pi_primary.is_primary = true
LEFT JOIN product_variants pv ON p.product_id = pv.product_id 
    AND pv.is_active = true
LEFT JOIN product_reviews pr ON p.product_id = pr.product_id

WHERE p.is_active = true
    AND p.product_id = $1

GROUP BY 
    p.product_id, p.product_name, p.product_description, p.base_price,
    c.category_name, parent_c.category_name,
    b.brand_name, b.brand_description,
    p.dimensions_length, p.dimensions_width, p.dimensions_height,
    pi_primary.image_url;

-- Problems with relational approach:
-- 1. Complex multi-table joins for simple product queries
-- 2. Difficult to add new product attributes without schema changes
-- 3. Poor performance with large numbers of attributes and images
-- 4. Rigid schema prevents storing varying product structures
-- 5. N+1 query problems when loading product catalogs
-- 6. Difficult to handle hierarchical categories efficiently
-- 7. Complex aggregation queries for review statistics
-- 8. Schema migrations required for new product types
-- 9. Inefficient storage of sparse attributes
-- 10. Challenging to implement full-text search across attributes

MongoDB's document-based design eliminates many of these issues:

// MongoDB optimized document design - flexible and performance-oriented
// Single document contains all product information

// Example product document with embedded data
const productDocument = {
  _id: ObjectId("64a1b2c3d4e5f6789012345a"),

  // Basic product information
  name: "MacBook Pro 16-inch M3 Max",
  description: "Powerful laptop for professional workflows with M3 Max chip, stunning Liquid Retina XDR display, and all-day battery life.",
  sku: "MACBOOK-PRO-16-M3MAX-512GB",

  // Category with embedded hierarchy
  category: {
    primary: "Electronics",
    secondary: "Computers & Tablets", 
    tertiary: "Laptops",
    path: ["Electronics", "Computers & Tablets", "Laptops"],
    categoryId: "electronics-computers-laptops"
  },

  // Brand information embedded
  brand: {
    name: "Apple",
    description: "Innovative technology products and solutions",
    website: "https://www.apple.com",
    brandId: "apple"
  },

  // Pricing structure
  pricing: {
    basePrice: 3499.00,
    currency: "USD",
    priceHistory: [
      { price: 3499.00, effectiveDate: ISODate("2024-01-15"), reason: "launch_price" },
      { price: 3299.00, effectiveDate: ISODate("2024-06-01"), reason: "promotional_discount" }
    ],
    currentPrice: 3299.00,
    msrp: 3499.00
  },

  // Physical specifications
  specifications: {
    dimensions: {
      length: 35.57,
      width: 24.81,
      height: 1.68,
      unit: "cm"
    },
    weight: {
      value: 2.16,
      unit: "kg"
    },

    // Technical specifications as flexible object
    technical: {
      processor: "Apple M3 Max chip with 12-core CPU and 38-core GPU",
      memory: "36GB unified memory",
      storage: "512GB SSD storage",
      display: {
        size: "16.2-inch",
        resolution: "3456 x 2234",
        technology: "Liquid Retina XDR",
        brightness: "1000 nits sustained, 1600 nits peak"
      },
      connectivity: [
        "Three Thunderbolt 4 ports",
        "HDMI port", 
        "SDXC card slot",
        "MagSafe 3 charging port",
        "3.5mm headphone jack"
      ],
      wireless: {
        wifi: "Wi-Fi 6E",
        bluetooth: "Bluetooth 5.3"
      },
      operatingSystem: "macOS Sonoma"
    }
  },

  // Flexible attributes array for varying product features
  attributes: [
    { name: "Color", value: "Space Black", type: "string", searchable: true },
    { name: "Screen Size", value: 16.2, type: "number", unit: "inches" },
    { name: "Battery Life", value: "Up to 22 hours", type: "string" },
    { name: "Warranty", value: "1 Year Limited", type: "string" },
    { name: "Touch ID", value: true, type: "boolean" }
  ],

  // Images embedded for faster loading
  images: [
    {
      url: "https://images.example.com/macbook-pro-16-space-black-1.jpg",
      altText: "MacBook Pro 16-inch in Space Black - front view",
      isPrimary: true,
      displayOrder: 1,
      imageType: "product_shot",
      dimensions: { width: 2000, height: 1500 }
    },
    {
      url: "https://images.example.com/macbook-pro-16-space-black-2.jpg", 
      altText: "MacBook Pro 16-inch in Space Black - side view",
      isPrimary: false,
      displayOrder: 2,
      imageType: "product_shot",
      dimensions: { width: 2000, height: 1500 }
    }
  ],

  // Product variants embedded for related configurations
  variants: [
    {
      _id: ObjectId("64a1b2c3d4e5f6789012345b"),
      name: "MacBook Pro 16-inch M3 Max - 1TB",
      sku: "MACBOOK-PRO-16-M3MAX-1TB",
      priceAdjustment: 500.00,
      specifications: {
        storage: "1TB SSD storage",
        memory: "36GB unified memory"
      },
      stockQuantity: 45,
      isActive: true,
      attributes: [
        { name: "Storage", value: "1TB", type: "string" }
      ]
    },
    {
      _id: ObjectId("64a1b2c3d4e5f6789012345c"),
      name: "MacBook Pro 16-inch M3 Max - Silver",
      sku: "MACBOOK-PRO-16-M3MAX-SILVER",
      priceAdjustment: 0.00,
      attributes: [
        { name: "Color", value: "Silver", type: "string" }
      ],
      stockQuantity: 23,
      isActive: true
    }
  ],

  // Inventory and availability
  inventory: {
    stockQuantity: 67,
    reservedQuantity: 3,
    availableQuantity: 64,
    reorderLevel: 10,
    reorderQuantity: 50,
    lastRestocked: ISODate("2024-09-01"),
    supplier: {
      name: "Apple Inc.",
      supplierId: "APPLE_DIRECT",
      leadTimeDays: 7
    }
  },

  // Reviews embedded with summary statistics
  reviews: {
    // Summary statistics for quick access
    summary: {
      totalReviews: 347,
      averageRating: 4.7,
      ratingDistribution: {
        "5": 245,
        "4": 78, 
        "3": 18,
        "2": 4,
        "1": 2
      },
      lastUpdated: ISODate("2024-09-14")
    },

    // Recent reviews embedded (with pagination for full list)
    recent: [
      {
        _id: ObjectId("64a1b2c3d4e5f6789012346a"),
        customerId: ObjectId("64a1b2c3d4e5f678901234aa"),
        customerName: "Sarah Chen",
        rating: 5,
        title: "Exceptional performance for video editing",
        text: "The M3 Max chip handles 4K video editing effortlessly. Battery life is impressive for such a powerful machine.",
        isVerifiedPurchase: true,
        helpfulVotes: 23,
        createdAt: ISODate("2024-09-10"),
        updatedAt: ISODate("2024-09-10")
      }
    ]
  },

  // SEO and search optimization
  seo: {
    metaTitle: "MacBook Pro 16-inch M3 Max - Professional Performance",
    metaDescription: "Experience unmatched performance with the MacBook Pro featuring M3 Max chip, 36GB memory, and stunning 16-inch Liquid Retina XDR display.",
    keywords: ["MacBook Pro", "M3 Max", "16-inch", "laptop", "Apple", "professional"],
    searchTerms: [
      "macbook pro 16 inch",
      "apple laptop", 
      "m3 max",
      "professional laptop",
      "video editing laptop"
    ]
  },

  // Status and metadata
  status: {
    isActive: true,
    isPublished: true,
    isFeatured: true,
    publishedAt: ISODate("2024-01-15"),
    lastModified: ISODate("2024-09-14"),
    version: 3
  },

  // Analytics and performance tracking
  analytics: {
    views: {
      total: 15420,
      thisMonth: 2341,
      uniqueVisitors: 12087
    },
    conversions: {
      addToCart: 892,
      purchases: 156,
      conversionRate: 17.5
    },
    searchPerformance: {
      avgPosition: 2.3,
      clickThroughRate: 8.7,
      impressions: 45230
    }
  },

  // Timestamps for auditing and tracking
  createdAt: ISODate("2024-01-15"),
  updatedAt: ISODate("2024-09-14")
};

// Benefits of MongoDB document design:
// - Single query retrieves complete product information
// - Flexible schema accommodates different product types
// - Embedded related data eliminates joins
// - Rich nested structures for complex specifications
// - Easy to add new attributes without schema changes
// - Efficient storage and retrieval of product hierarchies
// - Native support for arrays and nested objects
// - Simplified application logic with document-oriented design
// - Better performance for product catalog queries
// - Natural fit for JSON-based APIs and front-end applications
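
// To make the first benefit concrete: a single findOne() with a projection
// can serve an entire product page from this one document. A minimal sketch
// follows; the 'products' collection name is assumed and field paths match
// the example document above.

const product = await db.collection('products').findOne(
  { sku: "MACBOOK-PRO-16-M3MAX-512GB", "status.isActive": true },
  {
    projection: {
      name: 1,
      "pricing.currentPrice": 1,
      "category.path": 1,
      images: { $slice: 2 },            // only the first two images
      "reviews.summary": 1,             // rating summary without review bodies
      "inventory.availableQuantity": 1
    }
  }
);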

Understanding MongoDB Data Modeling Patterns

Document Structure and Embedding Strategies

Strategic document design patterns for optimal performance and maintainability:

// Advanced MongoDB data modeling patterns for different use cases
class MongoDataModelingPatterns {
  constructor(db) {
    this.db = db;
    this.modelingPatterns = new Map();
  }

  // Pattern 1: Embedded Document Pattern
  // Use when: Related data is accessed together, 1:1 or 1:few relationships
  createUserProfileEmbeddedPattern() {
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Basic user information
      username: "sarah_dev",
      email: "sarah@example.com",

      // Embedded profile information (1:1 relationship)
      profile: {
        firstName: "Sarah",
        lastName: "Johnson",
        dateOfBirth: ISODate("1990-05-15"),
        avatar: {
          url: "https://images.example.com/avatars/sarah_dev.jpg",
          uploadedAt: ISODate("2024-03-12"),
          size: { width: 200, height: 200 }
        },
        bio: "Full-stack developer passionate about clean code and user experience",
        location: {
          city: "San Francisco",
          state: "CA",
          country: "USA",
          timezone: "America/Los_Angeles"
        },
        socialMedia: {
          github: "https://github.com/sarahdev",
          linkedin: "https://linkedin.com/in/sarah-johnson-dev",
          twitter: "@sarah_codes"
        }
      },

      // Embedded preferences (1:1 relationship)
      preferences: {
        theme: "dark",
        language: "en",
        notifications: {
          email: true,
          push: false,
          sms: false
        },
        privacy: {
          profileVisibility: "public",
          showEmail: false,
          showLocation: true
        }
      },

      // Embedded contact methods (1:few relationship)  
      contactMethods: [
        {
          type: "email",
          value: "sarah@example.com", 
          isPrimary: true,
          isVerified: true,
          verifiedAt: ISODate("2024-01-15")
        },
        {
          type: "phone",
          value: "+1-555-123-4567",
          isPrimary: false,
          isVerified: true,
          verifiedAt: ISODate("2024-01-20")
        }
      ],

      // Embedded skills (1:many but limited)
      skills: [
        { name: "JavaScript", level: "expert", yearsExperience: 8 },
        { name: "Python", level: "advanced", yearsExperience: 5 },
        { name: "MongoDB", level: "intermediate", yearsExperience: 3 },
        { name: "React", level: "expert", yearsExperience: 6 }
      ],

      // Account status and metadata
      account: {
        status: "active",
        type: "premium",
        createdAt: ISODate("2024-01-15"),
        lastLoginAt: ISODate("2024-09-14"),
        loginCount: 342,
        isEmailVerified: true,
        twoFactorEnabled: true
      },

      createdAt: ISODate("2024-01-15"),
      updatedAt: ISODate("2024-09-14")
    };
  }

  // Pattern 2: Reference Pattern  
  // Use when: Large documents, many:many relationships, frequently changing data
  createBlogPostReferencePattern() {
    // Main blog post document
    const blogPost = {
      _id: ObjectId("64a1b2c3d4e5f6789012348a"),
      title: "Advanced MongoDB Data Modeling Techniques",
      slug: "advanced-mongodb-data-modeling-techniques",
      content: "Content of the blog post...",
      excerpt: "Learn advanced techniques for MongoDB data modeling...",

      // Reference to author (many posts : 1 author)
      authorId: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Reference to category (many posts : 1 category)
      categoryId: ObjectId("64a1b2c3d4e5f6789012349a"),

      // References to tags (many posts : many tags)
      tagIds: [
        ObjectId("64a1b2c3d4e5f67890123401"),
        ObjectId("64a1b2c3d4e5f67890123402"), 
        ObjectId("64a1b2c3d4e5f67890123403")
      ],

      // Post metadata
      metadata: {
        publishedAt: ISODate("2024-09-10"),
        status: "published",
        featuredImageUrl: "https://images.example.com/blog/mongodb-modeling.jpg",
        readingTime: 12,
        wordCount: 2400
      },

      // SEO information
      seo: {
        metaTitle: "Advanced MongoDB Data Modeling - Complete Guide",
        metaDescription: "Master MongoDB data modeling with patterns, best practices, and real-world examples.",
        keywords: ["MongoDB", "data modeling", "NoSQL", "database design"]
      },

      // Analytics data
      stats: {
        views: 2340,
        likes: 89,
        shares: 23,
        commentsCount: 15, // Denormalized count maintained by the application
        averageRating: 4.6
      },

      createdAt: ISODate("2024-09-08"),
      updatedAt: ISODate("2024-09-14")
    };

    // Separate comments collection for scalability
    const blogComments = [
      {
        _id: ObjectId("64a1b2c3d4e5f67890123501"),
        postId: ObjectId("64a1b2c3d4e5f6789012348a"), // Reference to blog post
        authorId: ObjectId("64a1b2c3d4e5f67890123470"), // Reference to user
        content: "Great article! Very helpful examples.",

        // Embedded author info for faster loading (denormalization)
        author: {
          username: "dev_mike",
          avatar: "https://images.example.com/avatars/dev_mike.jpg",
          displayName: "Mike Chen"
        },

        // Support for nested replies
        parentCommentId: null, // Top-level comment
        replyCount: 2,

        // Comment moderation
        status: "approved",
        moderatedBy: ObjectId("64a1b2c3d4e5f67890123500"),
        moderatedAt: ISODate("2024-09-11"),

        // Engagement metrics
        likes: 5,
        dislikes: 0,
        isReported: false,

        createdAt: ISODate("2024-09-11"),
        updatedAt: ISODate("2024-09-11")
      }
    ];

    return { blogPost, blogComments };
  }

  // Pattern 3: Hybrid Pattern (Embedding + Referencing)
  // Use when: Need benefits of both patterns for different aspects
  createOrderHybridPattern() {
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012350a"),
      orderNumber: "ORD-2024-091401",

      // Customer reference (frequent lookups, separate profile management)
      customerId: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Embedded customer snapshot for order history queries
      customerSnapshot: {
        name: "Sarah Johnson",
        email: "sarah@example.com",
        phone: "+1-555-123-4567",
        // Capture customer state at time of order
        membershipLevel: "gold",
        snapshotDate: ISODate("2024-09-14")
      },

      // Embedded order items (order-specific, not shared)
      items: [
        {
          productId: ObjectId("64a1b2c3d4e5f6789012345a"), // Reference for inventory updates

          // Embedded product snapshot to preserve order history
          productSnapshot: {
            name: "MacBook Pro 16-inch M3 Max",
            sku: "MACBOOK-PRO-16-M3MAX-512GB",
            description: "Powerful laptop for professional workflows...",
            image: "https://images.example.com/macbook-pro-16-1.jpg",
            // Capture product state at time of order
            snapshotDate: ISODate("2024-09-14")
          },

          quantity: 1,
          unitPrice: 3299.00,
          totalPrice: 3299.00,

          // Item-specific information
          selectedVariant: {
            color: "Space Black",
            storage: "512GB",
            variantId: ObjectId("64a1b2c3d4e5f6789012345b")
          },

          // Embedded pricing breakdown
          pricing: {
            basePrice: 3499.00,
            discount: 200.00,
            discountReason: "promotional_discount",
            finalPrice: 3299.00,
            tax: 263.92,
            taxRate: 8.0
          }
        }
      ],

      // Embedded shipping information
      shipping: {
        method: "express",
        carrier: "FedEx",
        trackingNumber: "1234567890123456",
        cost: 15.99,

        // Embedded shipping address (snapshot)
        address: {
          name: "Sarah Johnson",
          company: null,
          addressLine1: "123 Tech Street",
          addressLine2: "Apt 4B",
          city: "San Francisco",
          state: "CA",
          postalCode: "94107",
          country: "USA",
          phone: "+1-555-123-4567"
        },

        estimatedDelivery: ISODate("2024-09-16"),
        actualDelivery: null,
        deliveryInstructions: "Leave at door if not home"
      },

      // Embedded billing information
      billing: {
        // Reference to payment method for future use
        paymentMethodId: ObjectId("64a1b2c3d4e5f67890123600"),

        // Embedded payment snapshot
        paymentSnapshot: {
          method: "credit_card",
          last4: "4242",
          brand: "visa",
          expiryMonth: 12,
          expiryYear: 2027,
          // Capture payment method state at time of order
          snapshotDate: ISODate("2024-09-14")
        },

        // Billing address (may differ from shipping)
        address: {
          name: "Sarah Johnson",
          addressLine1: "456 Billing Ave",
          city: "San Francisco",
          state: "CA", 
          postalCode: "94107",
          country: "USA"
        },

        // Payment processing details
        transactionId: "txn_1234567890abcdef",
        processorResponse: "approved",
        authorizationCode: "AUTH123456",
        capturedAt: ISODate("2024-09-14")
      },

      // Order totals and calculations
      totals: {
        subtotal: 3299.00,
        taxAmount: 263.92,
        shippingAmount: 15.99,
        discountAmount: 200.00,
        totalAmount: 3578.91, // subtotal (discount already applied) + tax + shipping
        currency: "USD"
      },

      // Order status and timeline
      status: {
        current: "processing",
        timeline: [
          {
            status: "placed",
            timestamp: ISODate("2024-09-14T10:30:00Z"),
            note: "Order successfully placed"
          },
          {
            status: "paid", 
            timestamp: ISODate("2024-09-14T10:30:15Z"),
            note: "Payment processed successfully"
          },
          {
            status: "processing",
            timestamp: ISODate("2024-09-14T11:15:00Z"),
            note: "Order sent to fulfillment center"
          }
        ]
      },

      // Order metadata
      metadata: {
        source: "web",
        campaign: "fall_promotion_2024",
        referrer: "google_ads",
        userAgent: "Mozilla/5.0...",
        ipAddress: "192.168.1.1",
        sessionId: "sess_abcd1234efgh5678"
      },

      createdAt: ISODate("2024-09-14T10:30:00Z"),
      updatedAt: ISODate("2024-09-14T11:15:00Z")
    };
  }

  // Pattern 4: Polymorphic Pattern
  // Use when: Similar documents have different structures based on type
  createNotificationPolymorphicPattern() {
    const notifications = [
      // Email notification type
      {
        _id: ObjectId("64a1b2c3d4e5f6789012351a"),
        type: "email",
        userId: ObjectId("64a1b2c3d4e5f6789012347a"),

        // Common notification fields
        title: "Welcome to our platform!",
        priority: "normal",
        status: "sent",
        createdAt: ISODate("2024-09-14T10:00:00Z"),

        // Email-specific fields
        emailData: {
          from: "noreply@example.com",
          to: "sarah@example.com",
          subject: "Welcome to our platform!",
          templateId: "welcome_email_v2",
          templateVariables: {
            firstName: "Sarah",
            activationLink: "https://example.com/activate/abc123"
          },
          deliveryAttempts: 1,
          deliveredAt: ISODate("2024-09-14T10:01:30Z"),
          openedAt: ISODate("2024-09-14T10:15:22Z"),
          clickedAt: ISODate("2024-09-14T10:16:10Z")
        }
      },

      // Push notification type
      {
        _id: ObjectId("64a1b2c3d4e5f6789012351b"),
        type: "push",
        userId: ObjectId("64a1b2c3d4e5f6789012347a"),

        // Common notification fields
        title: "Your order has shipped!",
        priority: "high",
        status: "delivered",
        createdAt: ISODate("2024-09-14T14:30:00Z"),

        // Push-specific fields
        pushData: {
          deviceTokens: [
            "device_token_1234567890abcdef",
            "device_token_abcdef1234567890"
          ],
          payload: {
            alert: {
              title: "Order Shipped",
              body: "Your MacBook Pro is on the way! Track: 1234567890123456"
            },
            badge: 1,
            sound: "default",
            category: "order_update",
            customData: {
              orderId: "ORD-2024-091401",
              trackingNumber: "1234567890123456",
              deepLink: "app://orders/ORD-2024-091401"
            }
          },
          deliveryResults: [
            {
              deviceToken: "device_token_1234567890abcdef",
              status: "delivered",
              deliveredAt: ISODate("2024-09-14T14:31:15Z")
            },
            {
              deviceToken: "device_token_abcdef1234567890", 
              status: "failed",
              error: "invalid_token",
              attemptedAt: ISODate("2024-09-14T14:31:15Z")
            }
          ]
        }
      },

      // SMS notification type
      {
        _id: ObjectId("64a1b2c3d4e5f6789012351c"),
        type: "sms",
        userId: ObjectId("64a1b2c3d4e5f6789012347a"),

        // Common notification fields
        title: "Security Alert",
        priority: "urgent",
        status: "sent",
        createdAt: ISODate("2024-09-14T16:45:00Z"),

        // SMS-specific fields
        smsData: {
          to: "+15551234567",
          from: "+15559876543",
          message: "Security Alert: New login detected from San Francisco, CA. If this wasn't you, secure your account immediately.",
          provider: "twilio",
          messageId: "SMabcdef1234567890",
          segments: 1,
          cost: 0.0075,
          deliveredAt: ISODate("2024-09-14T16:45:12Z"),
          deliveryStatus: "delivered"
        }
      }
    ];

    return notifications;
  }

  // Pattern 5: Bucket Pattern
  // Use when: Time-series data or high-volume data needs grouping
  createMetricsBucketPattern() {
    // Group metrics by hour to reduce document count
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012352a"),

      // Bucket identifier
      type: "user_activity_metrics",
      userId: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Time bucket information
      bucketDate: ISODate("2024-09-14T10:00:00Z"), // Hour bucket start
      bucketSize: "hourly",

      // Metadata for the bucket
      metadata: {
        userName: "sarah_dev",
        userSegment: "premium",
        deviceType: "desktop",
        location: "San Francisco, CA"
      },

      // Count of events in this bucket
      eventCount: 45,

      // Array of individual events within the time bucket
      events: [
        {
          timestamp: ISODate("2024-09-14T10:05:23Z"),
          eventType: "page_view",
          page: "/dashboard",
          sessionId: "sess_abc123",
          loadTime: 1250,
          userAgent: "Mozilla/5.0..."
        },
        {
          timestamp: ISODate("2024-09-14T10:07:45Z"),
          eventType: "click",
          element: "export_button",
          page: "/reports",
          sessionId: "sess_abc123"
        },
        {
          timestamp: ISODate("2024-09-14T10:12:10Z"),
          eventType: "api_call",
          endpoint: "/api/v1/reports/generate",
          responseTime: 2340,
          statusCode: 200,
          sessionId: "sess_abc123"
        }
        // ... more events up to reasonable bucket size (e.g., 100-1000 events)
      ],

      // Pre-aggregated summary statistics for the bucket
      summary: {
        pageViews: 15,
        clicks: 8,
        apiCalls: 12,
        errors: 2,
        uniquePages: 6,
        totalLoadTime: 18750,
        avgLoadTime: 1250,
        maxLoadTime: 3200,
        minLoadTime: 450,
        totalSessionTime: 1800000 // 30 minutes
      },

      // Bucket management
      bucketMetadata: {
        isFull: false,
        maxEvents: 1000,
        createdAt: ISODate("2024-09-14T10:05:23Z"),
        lastUpdated: ISODate("2024-09-14T10:59:45Z"),
        nextBucketId: null // Set when bucket is full
      }
    };
  }

  // Pattern 6: Attribute Pattern  
  // Use when: Documents have many similar fields or sparse attributes
  createProductAttributePattern() {
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012353a"),
      productName: "Gaming Desktop Computer",
      category: "Electronics",

      // Attribute pattern for flexible, searchable specifications
      attributes: [
        {
          key: "processor",
          value: "Intel Core i9-13900K",
          type: "string",
          unit: null,
          isSearchable: true,
          isFilterable: true,
          displayOrder: 1,
          category: "performance"
        },
        {
          key: "ram",
          value: 32,
          type: "number",
          unit: "GB",
          isSearchable: true,
          isFilterable: true,
          displayOrder: 2,
          category: "performance"
        },
        {
          key: "storage",
          value: "1TB NVMe SSD + 2TB HDD",
          type: "string", 
          unit: null,
          isSearchable: true,
          isFilterable: false,
          displayOrder: 3,
          category: "storage"
        },
        {
          key: "graphics_card",
          value: "NVIDIA GeForce RTX 4080",
          type: "string",
          unit: null,
          isSearchable: true,
          isFilterable: true,
          displayOrder: 4,
          category: "performance"
        },
        {
          key: "power_consumption",
          value: 750,
          type: "number",
          unit: "watts",
          isSearchable: false,
          isFilterable: true,
          displayOrder: 10,
          category: "specifications"
        },
        {
          key: "warranty_years",
          value: 3,
          type: "number", 
          unit: "years",
          isSearchable: false,
          isFilterable: true,
          displayOrder: 15,
          category: "warranty"
        },
        {
          key: "rgb_lighting",
          value: true,
          type: "boolean",
          unit: null,
          isSearchable: false,
          isFilterable: true,
          displayOrder: 20,
          category: "aesthetics"
        }
      ],

      // Pre-computed attribute indexes for faster queries
      attributeIndex: {
        // String attributes for text search
        stringAttributes: {
          "processor": "Intel Core i9-13900K",
          "storage": "1TB NVMe SSD + 2TB HDD",
          "graphics_card": "NVIDIA GeForce RTX 4080"
        },

        // Numeric attributes for range queries
        numericAttributes: {
          "ram": 32,
          "power_consumption": 750,
          "warranty_years": 3
        },

        // Boolean attributes for exact matching
        booleanAttributes: {
          "rgb_lighting": true
        },

        // Searchable attribute values for text search
        searchableValues: [
          "Intel Core i9-13900K",
          "1TB NVMe SSD + 2TB HDD", 
          "NVIDIA GeForce RTX 4080"
        ],

        // Filterable attributes for faceted search
        filterableAttributes: [
          "processor", "ram", "graphics_card", 
          "power_consumption", "warranty_years", "rgb_lighting"
        ]
      },

      createdAt: ISODate("2024-09-14"),
      updatedAt: ISODate("2024-09-14")
    };
  }

  // Pattern 7: Computed Pattern
  // Use when: Expensive calculations need to be pre-computed and stored
  createUserAnalyticsComputedPattern() {
    return {
      _id: ObjectId("64a1b2c3d4e5f6789012354a"),
      userId: ObjectId("64a1b2c3d4e5f6789012347a"),

      // Computed metrics updated periodically
      computedMetrics: {
        // User engagement metrics
        engagement: {
          totalSessions: 342,
          totalSessionTime: 45600000, // milliseconds
          avgSessionDuration: 133333, // milliseconds (~2.2 minutes)
          lastActiveDate: ISODate("2024-09-14"),
          daysSinceLastActive: 0,

          // Activity patterns
          mostActiveHour: 14, // 2 PM
          mostActiveDay: "tuesday",
          peakActivityScore: 8.7,

          // Engagement trends (last 30 days)
          dailyAverages: {
            sessions: 11.4,
            sessionTime: 1520000, // milliseconds
            pageViews: 23.7
          }
        },

        // Purchase behavior analytics
        purchasing: {
          totalOrders: 23,
          totalSpent: 12485.67,
          avgOrderValue: 542.86,
          daysSinceLastPurchase: 12,

          // Purchase patterns
          preferredCategories: [
            { category: "Electronics", orderCount: 12, totalSpent: 8234.50 },
            { category: "Books", orderCount: 8, totalSpent: 2145.32 },
            { category: "Clothing", orderCount: 3, totalSpent: 2105.85 }
          ],

          // Customer lifecycle metrics  
          lifetimeValue: 12485.67,
          predictedLifetimeValue: 24750.00,
          churnProbability: 0.15,
          nextPurchasePrediction: ISODate("2024-09-28"),

          // RFM scores
          rfmScores: {
            recency: 4, // Recent purchase
            frequency: 3, // Moderate purchase frequency
            monetary: 5, // High spending
            combined: "435",
            segment: "Loyal Customer"
          }
        },

        // Content interaction metrics
        contentEngagement: {
          articlesRead: 45,
          videosWatched: 23,
          totalReadingTime: 54000000, // milliseconds (15 hours)
          avgReadingSpeed: 250, // words per minute

          // Content preferences
          preferredTopics: [
            { topic: "Technology", interactionScore: 9.2, articles: 18 },
            { topic: "Programming", interactionScore: 8.8, articles: 15 },
            { topic: "Career", interactionScore: 7.5, articles: 12 }
          ],

          // Engagement quality
          completionRate: 0.78, // 78% of articles read to completion
          shareRate: 0.12, // 12% of articles shared
          bookmarkRate: 0.25 // 25% of articles bookmarked
        },

        // Social interaction metrics
        socialMetrics: {
          connectionsCount: 156,
          followersCount: 234,
          followingCount: 189,

          // Interaction patterns
          postsCreated: 67,
          commentsPosted: 234,
          likesGiven: 1567,
          sharesGiven: 89,

          // Influence metrics
          avgLikesPerPost: 12.4,
          avgCommentsPerPost: 3.8,
          influenceScore: 7.3,
          engagementRate: 0.065 // 6.5%
        }
      },

      // Computation metadata
      computationMetadata: {
        lastComputedAt: ISODate("2024-09-14T06:00:00Z"),
        nextComputationAt: ISODate("2024-09-15T06:00:00Z"),
        computationFrequency: "daily",
        computationDuration: 2340, // milliseconds
        dataFreshness: "6_hours", // Data is 6 hours old

        // Data sources used in computation
        dataSources: [
          {
            collection: "user_sessions",
            lastProcessedRecord: ISODate("2024-09-14T00:00:00Z"),
            recordsProcessed: 342
          },
          {
            collection: "orders",
            lastProcessedRecord: ISODate("2024-09-13T23:59:59Z"),
            recordsProcessed: 23
          },
          {
            collection: "content_interactions", 
            lastProcessedRecord: ISODate("2024-09-14T00:00:00Z"),
            recordsProcessed: 1456
          }
        ],

        // Computation version for tracking changes
        version: "2.1.0",
        algorithmVersion: "analytics_v2_1"
      },

      createdAt: ISODate("2024-01-15"),
      updatedAt: ISODate("2024-09-14T06:00:00Z")
    };
  }

  // Method to choose optimal pattern based on use case
  recommendDataPattern(useCase) {
    const recommendations = {
      "user_profile": {
        pattern: "embedded",
        reason: "Related data accessed together, relatively small size",
        example: "createUserProfileEmbeddedPattern()"
      },
      "blog_system": {
        pattern: "reference",
        reason: "Large documents, many-to-many relationships, separate lifecycle",
        example: "createBlogPostReferencePattern()"
      },
      "ecommerce_order": {
        pattern: "hybrid",
        reason: "Need historical snapshots and current references",
        example: "createOrderHybridPattern()"
      },
      "notification_system": {
        pattern: "polymorphic", 
        reason: "Different document structures based on notification type",
        example: "createNotificationPolymorphicPattern()"
      },
      "time_series_data": {
        pattern: "bucket",
        reason: "High-volume data with time-based grouping",
        example: "createMetricsBucketPattern()"
      },
      "product_catalog": {
        pattern: "attribute",
        reason: "Flexible attributes with search and filtering needs",
        example: "createProductAttributePattern()"
      },
      "user_analytics": {
        pattern: "computed",
        reason: "Expensive calculations need pre-computation",
        example: "createUserAnalyticsComputedPattern()"
      }
    };

    return recommendations[useCase] || {
      pattern: "hybrid",
      reason: "Consider combining patterns based on specific requirements",
      example: "Analyze access patterns and choose appropriate combination"
    };
  }
}
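
Choosing a pattern is only half the work; the document shape should also be checked against the queries it is meant to serve. The sketch below is a minimal, hypothetical usage example: the class name DataPatternCatalog, the db handle, and the products collection are assumptions, but the key/value compound index and the $elemMatch query are the standard way to exploit the attribute pattern shown above.

// Hypothetical usage sketch - class name, db handle, and collection names are assumptions
const catalog = new DataPatternCatalog();

const recommendation = catalog.recommendDataPattern("product_catalog");
console.log(recommendation.pattern); // "attribute"
console.log(recommendation.reason);  // "Flexible attributes with search and filtering needs"

// The attribute pattern pays off at query time: a single compound index on
// the key/value pair supports filtering on any attribute
await db.collection("products").createIndex({
  "attributes.key": 1,
  "attributes.value": 1
});

// Find products with at least 16 GB of RAM; $elemMatch keeps both
// conditions on the same array element
const highMemoryProducts = await db.collection("products").find({
  attributes: {
    $elemMatch: { key: "ram", value: { $gte: 16 } }
  }
}).toArray();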

Schema Design and Migration Strategies

Implement effective schema evolution and migration patterns:

// Advanced schema design and migration strategies
class MongoSchemaManager {
  constructor(db) {
    this.db = db;
    this.schemaVersions = new Map();
    this.migrationHistory = [];
  }

  async createSchemaVersioningSystem(collection) {
    // Schema versioning pattern for gradual migrations
    const schemaVersionedDocument = {
      _id: ObjectId("64a1b2c3d4e5f6789012355a"),

      // Schema version metadata
      _schema: {
        version: "2.1.0",
        createdAt: ISODate("2024-09-14"),
        lastMigrated: ISODate("2024-09-14T08:30:00Z"),
        migrationHistory: [
          {
            fromVersion: "1.0.0",
            toVersion: "2.0.0",
            migratedAt: ISODate("2024-08-15T10:00:00Z"),
            migrationId: "migration_20240815_v2",
            changes: ["Added user preferences", "Restructured contact methods"]
          },
          {
            fromVersion: "2.0.0",
            toVersion: "2.1.0",
            migratedAt: ISODate("2024-09-14T08:30:00Z"),
            migrationId: "migration_20240914_v21",
            changes: ["Added analytics tracking", "Enhanced profile structure"]
          }
        ]
      },

      // Document data with current schema structure
      username: "sarah_dev",
      email: "sarah@example.com",
      profile: {
        firstName: "Sarah",
        lastName: "Johnson",
        // ... rest of profile data
      },

      // Optional: Keep old field names for backward compatibility during transition
      _deprecated: {
        // Old structure maintained during migration period
        full_name: "Sarah Johnson", // Deprecated in v2.0.0
        user_preferences: { /* old structure */ }, // Deprecated in v2.1.0
        deprecatedFields: ["full_name", "user_preferences"],
        removalScheduled: ISODate("2024-12-01") // When to remove deprecated fields
      },

      createdAt: ISODate("2024-01-15"),
      updatedAt: ISODate("2024-09-14")
    };

    return schemaVersionedDocument;
  }

  async performGradualMigration(collection, fromVersion, toVersion, migrationConfig) {
    // Gradual migration strategy to avoid downtime
    const migrationPlan = {
      migrationId: `migration_${Date.now()}`,
      collection: collection,
      fromVersion: fromVersion,
      toVersion: toVersion,
      startedAt: new Date(),

      // Migration phases
      phases: [
        {
          phase: 1,
          name: "preparation",
          description: "Create indexes and validate migration logic",
          status: "pending"
        },
        {
          phase: 2,
          name: "gradual_migration", 
          description: "Migrate documents in batches",
          batchSize: migrationConfig.batchSize || 1000,
          status: "pending"
        },
        {
          phase: 3,
          name: "validation",
          description: "Validate migrated data integrity",
          status: "pending"
        },
        {
          phase: 4,
          name: "cleanup",
          description: "Remove deprecated fields and indexes",
          status: "pending"
        }
      ]
    };

    try {
      // Phase 1: Preparation
      console.log("Phase 1: Preparing migration...");
      migrationPlan.phases[0].status = "in_progress";

      // Create necessary indexes for migration
      if (migrationConfig.newIndexes) {
        for (const index of migrationConfig.newIndexes) {
          await this.db.collection(collection).createIndex(index.fields, index.options);
          console.log(`Created index: ${JSON.stringify(index.fields)}`);
        }
      }

      migrationPlan.phases[0].status = "completed";
      migrationPlan.phases[0].completedAt = new Date();

      // Phase 2: Gradual migration in batches
      console.log("Phase 2: Starting gradual migration...");
      migrationPlan.phases[1].status = "in_progress";
      migrationPlan.phases[1].startedAt = new Date();

      let totalProcessed = 0;
      let batchNumber = 0;

      while (true) {
        batchNumber++;

        // Find documents that need migration
        const documentsToMigrate = await this.db.collection(collection).find({
          "_schema.version": { $ne: toVersion },
          "_migrationLock": { $exists: false } // Avoid concurrent migration
        })
        .limit(migrationConfig.batchSize || 1000)
        .toArray();

        if (documentsToMigrate.length === 0) {
          break; // No more documents to migrate
        }

        console.log(`Processing batch ${batchNumber}: ${documentsToMigrate.length} documents`);

        // Process batch with write concern for durability
        const bulkOperations = [];

        for (const doc of documentsToMigrate) {
          // Set migration lock to prevent concurrent updates
          await this.db.collection(collection).updateOne(
            { _id: doc._id },
            { $set: { "_migrationLock": true } }
          );

          try {
            // Apply migration transformation
            const migratedDoc = await this.applyMigrationTransformation(doc, fromVersion, toVersion);

            // Record the migration inside the transformed document instead of
            // using a separate $push, which would conflict with $set on "_schema"
            migratedDoc._schema = migratedDoc._schema || {};
            migratedDoc._schema.migrationHistory = migratedDoc._schema.migrationHistory || [];
            migratedDoc._schema.migrationHistory.push({
              fromVersion: fromVersion,
              toVersion: toVersion,
              migratedAt: new Date(),
              migrationId: migrationPlan.migrationId
            });

            // Exclude the immutable _id field from the update payload
            const { _id, ...migratedFields } = migratedDoc;

            bulkOperations.push({
              updateOne: {
                filter: { _id: doc._id },
                update: {
                  $set: migratedFields,
                  $unset: { "_migrationLock": 1 }
                }
              }
            });

          } catch (error) {
            console.error(`Migration failed for document ${doc._id}:`, error);

            // Remove migration lock on failure
            await this.db.collection(collection).updateOne(
              { _id: doc._id },
              { $unset: { "_migrationLock": 1 } }
            );
          }
        }

        // Execute bulk operations
        if (bulkOperations.length > 0) {
          const result = await this.db.collection(collection).bulkWrite(bulkOperations, {
            writeConcern: { w: "majority" }
          });

          totalProcessed += result.modifiedCount;
          console.log(`Batch ${batchNumber} completed: ${result.modifiedCount} documents migrated`);
        }

        // Add delay between batches to reduce system load
        if (migrationConfig.batchDelayMs) {
          await new Promise(resolve => setTimeout(resolve, migrationConfig.batchDelayMs));
        }
      }

      migrationPlan.phases[1].status = "completed";
      migrationPlan.phases[1].completedAt = new Date();
      migrationPlan.phases[1].documentsProcessed = totalProcessed;

      // Phase 3: Validation
      console.log("Phase 3: Validating migration...");
      migrationPlan.phases[2].status = "in_progress";

      const validationResult = await this.validateMigration(collection, toVersion);

      if (validationResult.success) {
        migrationPlan.phases[2].status = "completed";
        migrationPlan.phases[2].validationResult = validationResult;
        console.log("Migration validation successful");
      } else {
        migrationPlan.phases[2].status = "failed";
        migrationPlan.phases[2].validationResult = validationResult;
        throw new Error(`Migration validation failed: ${validationResult.errors.join(", ")}`);
      }

      // Phase 4: Cleanup (optional, scheduled for later)
      if (migrationConfig.immediateCleanup) {
        console.log("Phase 4: Cleanup...");
        migrationPlan.phases[3].status = "in_progress";

        await this.cleanupDeprecatedFields(collection, migrationConfig.fieldsToRemove);

        migrationPlan.phases[3].status = "completed";
        migrationPlan.phases[3].completedAt = new Date();
      } else {
        migrationPlan.phases[3].status = "scheduled";
        migrationPlan.phases[3].scheduledFor = migrationConfig.cleanupScheduledFor;
      }

      migrationPlan.status = "completed";
      migrationPlan.completedAt = new Date();

      // Record migration in history
      this.migrationHistory.push(migrationPlan);

      return migrationPlan;

    } catch (error) {
      migrationPlan.status = "failed";
      migrationPlan.error = error.message;
      migrationPlan.failedAt = new Date();

      console.error("Migration failed:", error);

      // Attempt to clean up any migration locks
      await this.db.collection(collection).updateMany(
        { "_migrationLock": true },
        { $unset: { "_migrationLock": 1 } }
      );

      throw error;
    }
  }

  async applyMigrationTransformation(document, fromVersion, toVersion) {
    // Apply specific transformation based on version upgrade path
    const transformations = {
      "1.0.0_to_2.0.0": (doc) => {
        // Example: Restructure user contact information
        if (doc.full_name && !doc.profile) {
          const nameParts = doc.full_name.split(" ");
          doc.profile = {
            firstName: nameParts[0] || "",
            lastName: nameParts.slice(1).join(" ") || ""
          };

          // Mark old field as deprecated but keep for backward compatibility
          doc._deprecated = doc._deprecated || {};
          doc._deprecated.full_name = doc.full_name;
          delete doc.full_name;
        }

        // Update schema version
        doc._schema = doc._schema || {};
        doc._schema.version = "2.0.0";
        doc._schema.lastMigrated = new Date();

        return doc;
      },

      "2.0.0_to_2.1.0": (doc) => {
        // Example: Add analytics tracking structure
        if (!doc.analytics) {
          doc.analytics = {
            totalLogins: 0,
            lastLoginAt: null,
            createdAt: doc.createdAt,
            engagement: {
              level: "new",
              score: 0
            }
          };
        }

        // Migrate user preferences structure
        if (doc.user_preferences && !doc.preferences) {
          doc.preferences = {
            theme: doc.user_preferences.theme || "light",
            language: doc.user_preferences.lang || "en",
            notifications: doc.user_preferences.notifications || {}
          };

          // Mark old field as deprecated
          doc._deprecated = doc._deprecated || {};
          doc._deprecated.user_preferences = doc.user_preferences;
          delete doc.user_preferences;
        }

        // Update schema version
        doc._schema.version = "2.1.0";
        doc._schema.lastMigrated = new Date();

        return doc;
      }
    };

    const transformationKey = `${fromVersion}_to_${toVersion}`;
    const transformation = transformations[transformationKey];

    if (!transformation) {
      throw new Error(`No transformation defined for ${transformationKey}`);
    }

    return transformation({ ...document }); // Shallow copy protects top-level fields; nested objects are still shared
  }

  async validateMigration(collection, expectedVersion) {
    const validationResult = {
      success: true,
      errors: [],
      warnings: [],
      statistics: {}
    };

    try {
      // Check all documents have the correct schema version
      const totalDocuments = await this.db.collection(collection).countDocuments({});
      const migratedDocuments = await this.db.collection(collection).countDocuments({
        "_schema.version": expectedVersion
      });

      validationResult.statistics.totalDocuments = totalDocuments;
      validationResult.statistics.migratedDocuments = migratedDocuments;
      validationResult.statistics.migrationCompleteness = migratedDocuments / totalDocuments;

      if (migratedDocuments !== totalDocuments) {
        validationResult.errors.push(
          `Migration incomplete: ${migratedDocuments}/${totalDocuments} documents migrated`
        );
        validationResult.success = false;
      }

      // Check for migration locks (indicates failed migrations)
      const lockedDocuments = await this.db.collection(collection).countDocuments({
        "_migrationLock": true
      });

      if (lockedDocuments > 0) {
        validationResult.warnings.push(
          `${lockedDocuments} documents have migration locks - may indicate failed migrations`
        );
      }

      // Validate sample documents have expected structure
      const sampleSize = Math.min(100, migratedDocuments);
      const sampleDocuments = await this.db.collection(collection).aggregate([
        { $match: { "_schema.version": expectedVersion } },
        { $sample: { size: sampleSize } }
      ]).toArray();

      let structureValidationErrors = 0;

      for (const doc of sampleDocuments) {
        try {
          await this.validateDocumentStructure(doc, expectedVersion);
        } catch (error) {
          structureValidationErrors++;
        }
      }

      if (structureValidationErrors > 0) {
        validationResult.errors.push(
          `${structureValidationErrors}/${sampleSize} sample documents have structure validation errors`
        );
        validationResult.success = false;
      }

      validationResult.statistics.sampleSize = sampleSize;
      validationResult.statistics.structureValidationErrors = structureValidationErrors;

    } catch (error) {
      validationResult.success = false;
      validationResult.errors.push(`Validation error: ${error.message}`);
    }

    return validationResult;
  }

  async validateDocumentStructure(document, schemaVersion) {
    // Define expected structure for each schema version
    const schemaValidators = {
      "2.1.0": (doc) => {
        // Required fields for version 2.1.0
        const requiredFields = ["_schema", "username", "email", "profile", "createdAt"];

        for (const field of requiredFields) {
          if (!doc.hasOwnProperty(field)) {
            throw new Error(`Missing required field: ${field}`);
          }
        }

        // Validate _schema structure
        if (!doc._schema.version || !doc._schema.lastMigrated) {
          throw new Error("Invalid _schema structure");
        }

        // Validate profile structure
        if (!doc.profile.firstName || !doc.profile.lastName) {
          throw new Error("Invalid profile structure");
        }

        return true;
      }
    };

    const validator = schemaValidators[schemaVersion];
    if (!validator) {
      throw new Error(`No validator defined for schema version ${schemaVersion}`);
    }

    return validator(document);
  }

  async cleanupDeprecatedFields(collection, fieldsToRemove) {
    // Remove deprecated fields after successful migration
    console.log(`Cleaning up deprecated fields: ${fieldsToRemove.join(", ")}`);

    const unsetFields = fieldsToRemove.reduce((acc, field) => {
      acc[field] = 1;
      acc[`_deprecated.${field}`] = 1;
      return acc;
    }, {});

    const result = await this.db.collection(collection).updateMany(
      {}, // Update all documents
      {
        $unset: unsetFields,
        $set: {
          "cleanupCompletedAt": new Date()
        }
      }
    );

    console.log(`Cleanup completed: ${result.modifiedCount} documents updated`);
    return result;
  }

  async createSchemaValidationRules(collection, schemaVersion) {
    // Create MongoDB schema validation rules
    const validationRules = {
      "2.1.0": {
        $jsonSchema: {
          bsonType: "object",
          required: ["_schema", "username", "email", "profile", "createdAt"],
          properties: {
            _schema: {
              bsonType: "object",
              required: ["version"],
              properties: {
                version: {
                  bsonType: "string",
                  enum: ["2.1.0"]
                },
                lastMigrated: {
                  bsonType: "date"
                }
              }
            },
            username: {
              bsonType: "string",
              minLength: 3,
              maxLength: 30
            },
            email: {
              bsonType: "string",
              pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
            },
            profile: {
              bsonType: "object",
              required: ["firstName", "lastName"],
              properties: {
                firstName: { bsonType: "string", maxLength: 50 },
                lastName: { bsonType: "string", maxLength: 50 },
                dateOfBirth: { bsonType: "date" },
                avatar: {
                  bsonType: "object",
                  properties: {
                    url: { bsonType: "string" },
                    uploadedAt: { bsonType: "date" }
                  }
                }
              }
            },
            createdAt: { bsonType: "date" },
            updatedAt: { bsonType: "date" }
          }
        }
      }
    };

    const rule = validationRules[schemaVersion];
    if (!rule) {
      throw new Error(`No validation rule defined for schema version ${schemaVersion}`);
    }

    // Apply validation rule to collection
    await this.db.command({
      collMod: collection,
      validator: rule,
      validationLevel: "moderate", // Validate inserts and updates to documents that already satisfy the rules
      validationAction: "warn" // Log validation errors but allow operations
    });

    console.log(`Schema validation rules applied to ${collection} for version ${schemaVersion}`);
    return rule;
  }

  async getMigrationStatus(collection) {
    // Get comprehensive migration status for a collection
    const status = {
      collection: collection,
      currentTime: new Date(),
      schemaVersions: {},
      totalDocuments: 0,
      migrationLocks: 0,
      deprecatedFields: [],
      recentMigrations: []
    };

    // Count documents by schema version
    const versionCounts = await this.db.collection(collection).aggregate([
      {
        $group: {
          _id: "$_schema.version",
          count: { $sum: 1 },
          lastMigrated: { $max: "$_schema.lastMigrated" }
        }
      },
      { $sort: { "_id": 1 } }
    ]).toArray();

    versionCounts.forEach(version => {
      status.schemaVersions[version._id || "unknown"] = {
        count: version.count,
        lastMigrated: version.lastMigrated
      };
      status.totalDocuments += version.count;
    });

    // Count migration locks
    status.migrationLocks = await this.db.collection(collection).countDocuments({
      "_migrationLock": true
    });

    // Find documents with deprecated fields
    const deprecatedFieldsAnalysis = await this.db.collection(collection).aggregate([
      { $match: { "_deprecated": { $exists: true } } },
      {
        $project: {
          deprecatedFields: { $objectToArray: "$_deprecated" }
        }
      },
      { $unwind: "$deprecatedFields" },
      {
        $group: {
          _id: "$deprecatedFields.k",
          count: { $sum: 1 }
        }
      }
    ]).toArray();

    status.deprecatedFields = deprecatedFieldsAnalysis.map(field => ({
      fieldName: field._id,
      documentCount: field.count
    }));

    // Get recent migration history
    status.recentMigrations = this.migrationHistory
      .filter(migration => migration.collection === collection)
      .slice(-5) // Last 5 migrations
      .map(migration => ({
        migrationId: migration.migrationId,
        fromVersion: migration.fromVersion,
        toVersion: migration.toVersion,
        status: migration.status,
        completedAt: migration.completedAt,
        documentsProcessed: migration.phases[1]?.documentsProcessed
      }));

    return status;
  }
}
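
A gradual migration is normally started from a small operational script. The following is a minimal sketch of how the manager above might be invoked; the connection string, database name, and index name are placeholders, while the option names (batchSize, batchDelayMs, newIndexes, immediateCleanup, cleanupScheduledFor) mirror the migrationConfig fields the class actually reads.

// Hypothetical migration driver script - connection details are placeholders
const { MongoClient } = require('mongodb');

async function runUserMigration() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();

  const schemaManager = new MongoSchemaManager(client.db('app'));

  const plan = await schemaManager.performGradualMigration('users', '2.0.0', '2.1.0', {
    batchSize: 500,          // documents per batch
    batchDelayMs: 200,       // pause between batches to limit load
    newIndexes: [
      { fields: { '_schema.version': 1 }, options: { name: 'idx_schema_version' } }
    ],
    immediateCleanup: false, // keep deprecated fields during the transition period
    cleanupScheduledFor: new Date('2024-12-01')
  });

  console.log(`Migration ${plan.migrationId} finished with status: ${plan.status}`);

  // Check how many documents are on each schema version after the run
  const status = await schemaManager.getMigrationStatus('users');
  console.log(status.schemaVersions);

  await client.close();
}

runUserMigration().catch(console.error);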

SQL-Style Data Modeling with QueryLeaf

QueryLeaf provides familiar SQL approaches to MongoDB data modeling and schema design:

-- QueryLeaf data modeling with SQL-familiar schema design syntax

-- Define document structure similar to CREATE TABLE
CREATE DOCUMENT_SCHEMA users (
  _id OBJECTID PRIMARY KEY,
  username VARCHAR(30) NOT NULL UNIQUE,
  email VARCHAR(255) NOT NULL UNIQUE,

  -- Embedded document structure
  profile DOCUMENT {
    firstName VARCHAR(50) NOT NULL,
    lastName VARCHAR(50) NOT NULL,
    dateOfBirth DATE,
    avatar DOCUMENT {
      url VARCHAR(500),
      uploadedAt TIMESTAMP,
      size DOCUMENT {
        width INTEGER,
        height INTEGER
      }
    },
    bio TEXT,
    location DOCUMENT {
      city VARCHAR(100),
      state VARCHAR(50),
      country VARCHAR(100),
      timezone VARCHAR(50)
    }
  },

  -- Array of embedded documents
  contactMethods ARRAY OF DOCUMENT {
    type ENUM('email', 'phone', 'address'),
    value VARCHAR(255) NOT NULL,
    isPrimary BOOLEAN DEFAULT false,
    isVerified BOOLEAN DEFAULT false,
    verifiedAt TIMESTAMP
  },

  -- Array of simple values with constraints
  skills ARRAY OF DOCUMENT {
    name VARCHAR(100) NOT NULL,
    level ENUM('beginner', 'intermediate', 'advanced', 'expert'),
    yearsExperience INTEGER CHECK (yearsExperience >= 0)
  },

  -- Reference to another collection
  departmentId OBJECTID REFERENCES departments(_id),

  -- Embedded metadata
  account DOCUMENT {
    status ENUM('active', 'inactive', 'suspended') DEFAULT 'active',
    type ENUM('free', 'premium', 'enterprise') DEFAULT 'free',
    createdAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    lastLoginAt TIMESTAMP,
    loginCount INTEGER DEFAULT 0,
    isEmailVerified BOOLEAN DEFAULT false,
    twoFactorEnabled BOOLEAN DEFAULT false
  },

  -- Flexible attributes using attribute pattern
  attributes ARRAY OF DOCUMENT {
    key VARCHAR(100) NOT NULL,
    value MIXED, -- Can be string, number, boolean, etc.
    type ENUM('string', 'number', 'boolean', 'date'),
    isSearchable BOOLEAN DEFAULT false,
    isFilterable BOOLEAN DEFAULT false,
    category VARCHAR(50)
  },

  -- Timestamps for auditing
  createdAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create indexes for optimal query performance
CREATE INDEX idx_users_username ON users (username);
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_users_profile_name ON users (profile.firstName, profile.lastName);
CREATE INDEX idx_users_skills ON users (skills.name, skills.level);
CREATE INDEX idx_users_location ON users (profile.location.city, profile.location.state);

-- Compound index for complex queries
CREATE INDEX idx_users_active_premium ON users (account.status, account.type, createdAt);

-- Text index for full-text search
CREATE TEXT INDEX idx_users_search ON users (
  username,
  profile.firstName,
  profile.lastName,
  profile.bio,
  skills.name
);

-- Schema versioning and migration management
ALTER DOCUMENT_SCHEMA users ADD COLUMN analytics DOCUMENT {
  totalLogins INTEGER DEFAULT 0,
  lastLoginAt TIMESTAMP,
  engagement DOCUMENT {
    level ENUM('new', 'active', 'power', 'inactive') DEFAULT 'new',
    score DECIMAL(3,2) DEFAULT 0.00
  }
} WITH MIGRATION_STRATEGY gradual;

-- Polymorphic document schema for notifications
CREATE DOCUMENT_SCHEMA notifications (
  _id OBJECTID PRIMARY KEY,
  userId OBJECTID NOT NULL REFERENCES users(_id),
  type ENUM('email', 'push', 'sms') NOT NULL,

  -- Common fields for all notification types
  title VARCHAR(200) NOT NULL,
  priority ENUM('low', 'normal', 'high', 'urgent') DEFAULT 'normal',
  status ENUM('pending', 'sent', 'delivered', 'failed') DEFAULT 'pending',

  -- Polymorphic data based on type using VARIANT
  notificationData VARIANT {
    WHEN type = 'email' THEN DOCUMENT {
      from VARCHAR(255) NOT NULL,
      to VARCHAR(255) NOT NULL,
      subject VARCHAR(500) NOT NULL,
      templateId VARCHAR(100),
      templateVariables DOCUMENT,
      deliveryAttempts INTEGER DEFAULT 0,
      deliveredAt TIMESTAMP,
      openedAt TIMESTAMP,
      clickedAt TIMESTAMP
    },

    WHEN type = 'push' THEN DOCUMENT {
      deviceTokens ARRAY OF VARCHAR(255),
      payload DOCUMENT {
        alert DOCUMENT {
          title VARCHAR(200),
          body VARCHAR(500)
        },
        badge INTEGER,
        sound VARCHAR(50),
        category VARCHAR(100),
        customData DOCUMENT
      },
      deliveryResults ARRAY OF DOCUMENT {
        deviceToken VARCHAR(255),
        status ENUM('delivered', 'failed'),
        error VARCHAR(255),
        timestamp TIMESTAMP
      }
    },

    WHEN type = 'sms' THEN DOCUMENT {
      to VARCHAR(20) NOT NULL,
      from VARCHAR(20),
      message VARCHAR(1600) NOT NULL,
      provider VARCHAR(50),
      messageId VARCHAR(255),
      segments INTEGER DEFAULT 1,
      cost DECIMAL(6,4),
      deliveredAt TIMESTAMP,
      deliveryStatus VARCHAR(50)
    }
  },

  createdAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Bucket pattern for time-series metrics
CREATE DOCUMENT_SCHEMA user_activity_buckets (
  _id OBJECTID PRIMARY KEY,

  -- Bucket identification
  userId OBJECTID NOT NULL REFERENCES users(_id),
  bucketDate TIMESTAMP NOT NULL, -- Hour/day bucket start time
  bucketType ENUM('hourly', 'daily') NOT NULL,

  -- Bucket metadata
  metadata DOCUMENT {
    userName VARCHAR(30),
    userSegment VARCHAR(50),
    deviceType VARCHAR(50),
    location VARCHAR(100)
  },

  -- Event counter
  eventCount INTEGER DEFAULT 0,

  -- Array of events within the bucket
  events ARRAY OF DOCUMENT {
    timestamp TIMESTAMP NOT NULL,
    eventType ENUM('page_view', 'click', 'api_call', 'error') NOT NULL,
    page VARCHAR(500),
    element VARCHAR(200),
    sessionId VARCHAR(100),
    responseTime INTEGER,
    statusCode INTEGER,
    userAgent TEXT
  } VALIDATE (ARRAY_LENGTH(events) <= 1000), -- Limit bucket size

  -- Pre-computed summary statistics
  summary DOCUMENT {
    pageViews INTEGER DEFAULT 0,
    clicks INTEGER DEFAULT 0,
    apiCalls INTEGER DEFAULT 0,
    errors INTEGER DEFAULT 0,
    uniquePages INTEGER DEFAULT 0,
    totalResponseTime BIGINT DEFAULT 0,
    avgResponseTime DECIMAL(8,2),
    maxResponseTime INTEGER,
    minResponseTime INTEGER
  },

  -- Bucket management
  bucketMetadata DOCUMENT {
    isFull BOOLEAN DEFAULT false,
    maxEvents INTEGER DEFAULT 1000,
    nextBucketId OBJECTID
  },

  createdAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Compound index for efficient bucket queries
CREATE INDEX idx_activity_buckets_user_time ON user_activity_buckets (
  userId, bucketType, bucketDate
);

-- Complex analytics queries with document modeling
WITH user_engagement AS (
  SELECT 
    u._id as user_id,
    u.username,
    u.profile.firstName || ' ' || u.profile.lastName as full_name,
    u.account.type as account_type,

    -- Aggregate metrics from activity buckets
    SUM(ab.summary.pageViews) as total_page_views,
    SUM(ab.summary.clicks) as total_clicks,
    AVG(ab.summary.avgResponseTime) as avg_response_time,
    COUNT(DISTINCT ab.bucketDate) as active_days,

    -- Calculate engagement score
    (SUM(ab.summary.pageViews) * 0.1 + 
     SUM(ab.summary.clicks) * 0.3 + 
     COUNT(DISTINCT ab.bucketDate) * 0.6) as engagement_score,

    -- User profile attributes
    ARRAY_AGG(
      CASE WHEN ua.attributes->key = 'department' 
           THEN ua.attributes->value 
      END
    ) FILTER (WHERE ua.attributes->key = 'department') as departments,

    -- Location information
    u.profile.location.city as city,
    u.profile.location.state as state

  FROM users u
  LEFT JOIN user_activity_buckets ab ON u._id = ab.userId
    AND ab.bucketDate >= CURRENT_DATE - INTERVAL '30 days'
  LEFT JOIN UNNEST(u.attributes) as ua ON true

  WHERE u.account.status = 'active'
    AND u.createdAt >= CURRENT_DATE - INTERVAL '1 year'

  GROUP BY u._id, u.username, u.profile.firstName, u.profile.lastName,
           u.account.type, u.profile.location.city, u.profile.location.state
),

engagement_segments AS (
  SELECT *,
    CASE 
      WHEN engagement_score >= 50 THEN 'High Engagement'
      WHEN engagement_score >= 20 THEN 'Medium Engagement' 
      WHEN engagement_score >= 5 THEN 'Low Engagement'
      ELSE 'Inactive'
    END as engagement_segment,

    -- Percentile ranking within account type
    PERCENT_RANK() OVER (
      PARTITION BY account_type 
      ORDER BY engagement_score
    ) as engagement_percentile

  FROM user_engagement
)

SELECT 
  engagement_segment,
  account_type,
  COUNT(*) as user_count,
  AVG(engagement_score) as avg_engagement_score,
  AVG(total_page_views) as avg_page_views,
  AVG(active_days) as avg_active_days,

  -- Top cities by user count in each segment
  ARRAY_AGG(
    JSON_BUILD_OBJECT(
      'city', city,
      'state', state,
      'count', COUNT(*) OVER (PARTITION BY city, state)
    ) ORDER BY COUNT(*) OVER (PARTITION BY city, state) DESC LIMIT 5
  ) as top_locations,

  -- Engagement distribution
  JSON_BUILD_OBJECT(
    'min', MIN(engagement_score),
    'p25', PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY engagement_score),
    'p50', PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY engagement_score),
    'p75', PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY engagement_score),
    'p95', PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY engagement_score),
    'max', MAX(engagement_score)
  ) as engagement_distribution

FROM engagement_segments
GROUP BY engagement_segment, account_type
ORDER BY engagement_segment, account_type;

-- Schema validation and data quality checks
SELECT 
  collection_name,
  schema_version,
  document_count,

  -- Data quality metrics
  (SELECT COUNT(*) FROM users WHERE username IS NULL) as missing_usernames,
  (SELECT COUNT(*) FROM users WHERE email IS NULL) as missing_emails,
  (SELECT COUNT(*) FROM users WHERE profile IS NULL) as missing_profiles,

  -- Schema compliance
  (SELECT COUNT(*) FROM users WHERE _schema.version != '2.1.0') as outdated_schema,
  (SELECT COUNT(*) FROM users WHERE _migrationLock = true) as migration_locks,

  -- Index usage analysis
  JSON_BUILD_OBJECT(
    'username_index_usage', INDEX_USAGE_STATS('users', 'idx_users_username'),
    'email_index_usage', INDEX_USAGE_STATS('users', 'idx_users_email'),
    'profile_name_index_usage', INDEX_USAGE_STATS('users', 'idx_users_profile_name')
  ) as index_statistics,

  -- Storage efficiency metrics
  AVG_DOCUMENT_SIZE('users') as avg_document_size_kb,
  DOCUMENT_SIZE_DISTRIBUTION('users') as size_distribution,

  CURRENT_TIMESTAMP as analysis_timestamp

FROM DOCUMENT_SCHEMA_STATS('users');

-- Migration management with SQL-style syntax
CREATE MIGRATION migrate_users_v2_to_v3 AS
BEGIN
  -- Add new analytics structure
  ALTER DOCUMENT_SCHEMA users 
  ADD COLUMN detailed_analytics DOCUMENT {
    sessions ARRAY OF DOCUMENT {
      sessionId VARCHAR(100),
      startTime TIMESTAMP,
      endTime TIMESTAMP,
      pageViews INTEGER,
      actions ARRAY OF VARCHAR(100)
    },
    preferences DOCUMENT {
      communicationChannels ARRAY OF ENUM('email', 'sms', 'push'),
      contentTopics ARRAY OF VARCHAR(100),
      frequencySettings DOCUMENT {
        marketing ENUM('never', 'weekly', 'monthly'),
        updates ENUM('immediate', 'daily', 'weekly')
      }
    }
  };

  -- Update existing documents with default values
  UPDATE users 
  SET detailed_analytics = {
    sessions: [],
    preferences: {
      communicationChannels: ['email'],
      contentTopics: [],
      frequencySettings: {
        marketing: 'monthly',
        updates: 'weekly'
      }
    }
  }
  WHERE detailed_analytics IS NULL;

  -- Update schema version
  UPDATE users 
  SET 
    _schema.version = '3.0.0',
    _schema.lastMigrated = CURRENT_TIMESTAMP,
    updatedAt = CURRENT_TIMESTAMP;

END;

-- Execute migration with options
EXECUTE MIGRATION migrate_users_v2_to_v3 WITH OPTIONS (
  batch_size = 1000,
  batch_delay_ms = 100,
  validation_sample_size = 50,
  cleanup_schedule = '2024-12-01'
);

-- Monitor migration progress
SELECT 
  migration_name,
  status,
  current_phase,
  documents_processed,
  estimated_completion,
  error_count,
  last_error_message
FROM MIGRATION_STATUS('migrate_users_v2_to_v3');

-- QueryLeaf data modeling provides:
-- 1. SQL-familiar schema definition with document structure support
-- 2. Flexible embedded documents and arrays with validation
-- 3. Polymorphic schemas with variant types based on discriminator fields
-- 4. Advanced indexing strategies for document queries
-- 5. Schema versioning and gradual migration management
-- 6. Data quality validation and compliance checking
-- 7. Storage efficiency analysis and optimization recommendations
-- 8. Integration with MongoDB's native document features
-- 9. SQL-style complex queries across embedded structures
-- 10. Automated migration execution with rollback capabilities

Best Practices for MongoDB Data Modeling

Design Decision Framework

Strategic approach to document design decisions:

  1. Access Pattern Analysis: Design documents based on how data will be queried and updated
  2. Cardinality Considerations: Choose embedding vs. referencing based on relationship cardinality (see the sketch after this list)
  3. Data Growth Patterns: Consider how document size and collection size will grow over time
  4. Update Frequency: Factor in how often different parts of documents will be updated
  5. Consistency Requirements: Balance performance with data consistency needs
  6. Query Performance: Optimize document structure for most common query patterns
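
To make the cardinality consideration concrete, here is a minimal sketch contrasting embedding with referencing; the collection and field names are illustrative rather than taken from the examples above.

// One-to-few data that is read with its parent: embed (e.g. shipping addresses)
const customer = {
  _id: ObjectId(),
  name: "Sarah Johnson",
  addresses: [
    { label: "home", city: "Portland", isDefault: true },
    { label: "work", city: "Seattle", isDefault: false }
  ]
};

// One-to-many data with unbounded growth and its own lifecycle: reference
// (e.g. orders), so the parent stays small and children can be paged independently
const order = {
  _id: ObjectId(),
  customerId: customer._id, // reference back to the parent document
  total: 129.99,
  createdAt: new Date()
};

// Typical access pattern: load the customer once, then page their orders
// const recentOrders = await db.collection("orders")
//   .find({ customerId: customer._id })
//   .sort({ createdAt: -1 })
//   .limit(10)
//   .toArray();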

Performance Optimization Guidelines

Essential practices for high-performance document modeling:

  1. Document Size Management: Keep documents well under the 16MB limit and optimize for the working set
  2. Index Strategy: Create indexes that support your access patterns and query requirements
  3. Denormalization Strategy: Strategic denormalization for read performance vs. update complexity
  4. Array Size Limits: Monitor array growth to prevent performance degradation (see the guardrail example after this list)
  5. Embedding Depth: Limit nesting levels to maintain query performance and readability
  6. Schema Evolution: Plan for schema changes without downtime using versioning strategies
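
As a guardrail for array growth, a validator can cap embedded array length at the database level. This is a minimal sketch, assuming the user_activity_buckets collection and the 1000-event bucket limit used earlier; maxItems in a $jsonSchema validator is a standard way to enforce such a cap.

// Cap the events array so buckets cannot grow without bound
await db.command({
  collMod: "user_activity_buckets",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      properties: {
        events: {
          bsonType: "array",
          maxItems: 1000 // matches the bucket-size limit used above
        }
      }
    }
  },
  validationLevel: "strict",
  validationAction: "error"
});

// Application-side alternative: only append while the bucket has room, so a
// new bucket is started once the limit is reached
// await db.collection("user_activity_buckets").updateOne(
//   { userId: userId, bucketDate: bucketDate, eventCount: { $lt: 1000 } },
//   { $push: { events: event }, $inc: { eventCount: 1 } },
//   { upsert: true }
// );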

Conclusion

MongoDB data modeling requires a fundamental shift from relational thinking to document-oriented design principles. By understanding when to embed versus reference data, how to structure documents for optimal performance, and how to implement effective schema evolution strategies, you can create database designs that are both flexible and performant.

Key data modeling benefits include:

  • Flexible Schema Design: Documents can evolve naturally with application requirements
  • Optimal Performance: Strategic embedding eliminates complex joins for read-heavy workloads
  • Natural Data Structures: Document structure aligns with object-oriented programming models
  • Horizontal Scalability: Document design supports sharding and distributed architectures
  • Rich Data Types: Native support for arrays, nested objects, and complex data structures
  • Schema Evolution: Gradual migration strategies enable schema changes without downtime

Whether you're building content management systems, e-commerce platforms, real-time analytics applications, or any system requiring flexible data structures, MongoDB's document modeling with QueryLeaf's familiar SQL interface provides the foundation for scalable, maintainable database designs. This combination enables you to leverage advanced NoSQL capabilities while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically translates SQL-familiar schema definitions into optimal MongoDB document structures while providing familiar syntax for complex document queries, schema evolution, and migration management. Advanced document patterns, validation rules, and performance optimization are seamlessly handled through SQL-style operations, making flexible schema design both powerful and accessible.

The integration of flexible document modeling with SQL-style database operations makes MongoDB an ideal platform for applications requiring both sophisticated data structures and familiar database interaction patterns, ensuring your data models remain both efficient and maintainable as they scale and evolve.

MongoDB Atlas Search and Full-Text Indexing: SQL-Style Text Search with Advanced Analytics and Ranking

Modern applications require sophisticated search capabilities that go beyond simple text matching: semantic understanding, relevance scoring, faceted search, auto-completion, and real-time search analytics. Traditional relational databases provide basic full-text search through built-in features like PostgreSQL's tsvector/tsquery or MySQL's MATCH ... AGAINST, but struggle with advanced search features, relevance ranking, and the performance demands of modern search applications.

MongoDB Atlas Search provides enterprise-grade search capabilities built on Apache Lucene, delivering advanced full-text search, semantic search, vector search, and search analytics directly integrated with your MongoDB data. Unlike external search engines that require complex data synchronization, Atlas Search maintains real-time consistency with your database while providing powerful search features typically found only in dedicated search platforms.

The Traditional Search Challenge

Relational database search approaches have significant limitations for modern applications:

-- Traditional SQL full-text search - limited and inefficient

-- PostgreSQL full-text search approach
CREATE TABLE articles (
    article_id SERIAL PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    content TEXT NOT NULL,
    author_id INTEGER REFERENCES users(user_id),
    category VARCHAR(100),
    tags TEXT[],
    published_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    view_count INTEGER DEFAULT 0,

    -- Full-text search vectors
    title_tsvector TSVECTOR,
    content_tsvector TSVECTOR,
    combined_tsvector TSVECTOR
);

-- Create full-text search indexes
CREATE INDEX idx_articles_title_fts ON articles USING GIN(title_tsvector);
CREATE INDEX idx_articles_content_fts ON articles USING GIN(content_tsvector);
CREATE INDEX idx_articles_combined_fts ON articles USING GIN(combined_tsvector);

-- Maintain search vectors with triggers
CREATE OR REPLACE FUNCTION update_article_search_vectors()
RETURNS TRIGGER AS $$
BEGIN
    NEW.title_tsvector := to_tsvector('english', NEW.title);
    NEW.content_tsvector := to_tsvector('english', NEW.content);
    NEW.combined_tsvector := to_tsvector('english', 
        NEW.title || ' ' || NEW.content || ' ' || array_to_string(NEW.tags, ' '));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_update_search_vectors
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE FUNCTION update_article_search_vectors();

-- Basic full-text search query
SELECT 
    a.article_id,
    a.title,
    a.published_date,
    a.view_count,

    -- Simple relevance ranking
    ts_rank(a.combined_tsvector, query) as relevance_score,

    -- Highlight search terms (basic)
    ts_headline('english', a.content, query, 
        'MaxWords=50, MinWords=10, ShortWord=3') as snippet

FROM articles a,
     plainto_tsquery('english', 'machine learning algorithms') as query
WHERE a.combined_tsvector @@ query
ORDER BY ts_rank(a.combined_tsvector, query) DESC
LIMIT 20;

-- Problems with traditional full-text search:
-- 1. Limited language support and stemming capabilities
-- 2. Basic relevance scoring without advanced ranking factors
-- 3. No semantic understanding or synonym handling
-- 4. Limited faceting and aggregation capabilities
-- 5. Poor auto-completion and suggestion features
-- 6. No built-in analytics or search performance metrics
-- 7. Complex maintenance of search vectors and triggers
-- 8. Limited scalability for large document collections

-- MySQL full-text search (even more limited)
CREATE TABLE documents (
    doc_id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255),
    content LONGTEXT,
    category VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    FULLTEXT(title, content)
) ENGINE=InnoDB;

-- Basic MySQL full-text search
SELECT 
    doc_id,
    title,
    created_at,
    MATCH(title, content) AGAINST('machine learning' IN NATURAL LANGUAGE MODE) as score
FROM documents 
WHERE MATCH(title, content) AGAINST('machine learning' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC
LIMIT 20;

-- MySQL limitations:
-- - Minimum word length restrictions
-- - Limited boolean query syntax
-- - Poor performance with large datasets
-- - No advanced ranking or analytics
-- - Limited customization options

MongoDB Atlas Search provides comprehensive search capabilities:

// MongoDB Atlas Search - enterprise-grade search with advanced features
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb+srv://cluster.mongodb.net');
const db = client.db('content_platform');
const articles = db.collection('articles');

// Advanced Atlas Search query with multiple search techniques
const searchQuery = [
  {
    $search: {
      index: "articles_search_index", // Custom search index
      compound: {
        must: [
          // Text search with fuzzy matching
          {
            text: {
              query: "machine learning algorithms",
              path: ["title", "content"],
              fuzzy: {
                maxEdits: 2,
                prefixLength: 1,
                maxExpansions: 50
              }
            }
          }
        ],
        should: [
          // Boost title matches
          {
            text: {
              query: "machine learning algorithms",
              path: "title",
              score: { boost: { value: 3.0 } }
            }
          },
          // Phrase matching with slop
          {
            phrase: {
              query: "machine learning",
              path: ["title", "content"],
              slop: 2,
              score: { boost: { value: 2.0 } }
            }
          },
          // Semantic search using synonyms
          {
            text: {
              query: "machine learning algorithms",
              path: ["title", "content"],
              synonyms: "tech_synonyms"
            }
          }
        ],
        filter: [
          // Date range filtering
          {
            range: {
              path: "publishedDate",
              gte: new Date("2023-01-01"),
              lte: new Date("2025-12-31")
            }
          },
          // Category filtering
          {
            text: {
              query: ["technology", "science", "ai"],
              path: "category"
            }
          }
        ],
        mustNot: [
          // Exclude draft articles
          {
            equals: {
              path: "status",
              value: "draft"
            }
          }
        ]
      },

      // Advanced highlighting
      highlight: {
        path: ["title", "content"],
        maxCharsToExamine: 500000,
        maxNumPassages: 3
      },

      // Count total matches
      count: {
        type: "total"
      }
    }
  },

  // Add computed relevance and metadata
  {
    $addFields: {
      searchScore: { $meta: "searchScore" },
      searchHighlights: { $meta: "searchHighlights" },

      // Custom scoring factors
      popularityScore: {
        $divide: [
          { $add: ["$viewCount", "$likeCount"] },
          { $max: [{ $divide: [{ $subtract: [new Date(), "$publishedDate"] }, 86400000] }, 1] }
        ]
      },

      // Content quality indicators
      contentQuality: {
        $cond: {
          if: { $gte: [{ $strLenCP: "$content" }, 1000] },
          then: { $min: [{ $divide: [{ $strLenCP: "$content" }, 500] }, 5] },
          else: 1
        }
      }
    }
  },

  // Faceted aggregations for search filters
  {
    $facet: {
      // Main search results
      results: [
        {
          $addFields: {
            finalScore: {
              $add: [
                "$searchScore",
                { $multiply: ["$popularityScore", 0.2] },
                { $multiply: ["$contentQuality", 0.1] }
              ]
            }
          }
        },
        { $sort: { finalScore: -1 } },
        { $limit: 20 },
        {
          $project: {
            articleId: "$_id",
            title: 1,
            author: 1,
            category: 1,
            tags: 1,
            publishedDate: 1,
            viewCount: 1,
            searchScore: 1,
            finalScore: 1,
            searchHighlights: 1,
            snippet: { $substr: ["$content", 0, 200] }
          }
        }
      ],

      // Category facets
      categoryFacets: [
        {
          $group: {
            _id: "$category",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 10 }
      ],

      // Author facets
      authorFacets: [
        {
          $group: {
            _id: "$author.name",
            count: { $sum: 1 },
            articles: { $push: "$title" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 10 }
      ],

      // Date range facets
      dateFacets: [
        {
          $group: {
            _id: {
              year: { $year: "$publishedDate" },
              month: { $month: "$publishedDate" }
            },
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { "_id.year": -1, "_id.month": -1 } }
      ],

      // Search analytics
      searchAnalytics: [
        {
          $group: {
            _id: null,
            totalResults: { $sum: 1 },
            avgScore: { $avg: "$searchScore" },
            maxScore: { $max: "$searchScore" },
            scoreDistribution: {
              $push: {
                $switch: {
                  branches: [
                    { case: { $gte: ["$searchScore", 10] }, then: "excellent" },
                    { case: { $gte: ["$searchScore", 5] }, then: "good" },
                    { case: { $gte: ["$searchScore", 2] }, then: "fair" }
                  ],
                  default: "poor"
                }
              }
            }
          }
        }
      ]
    }
  }
];

// Execute search with comprehensive results
const searchResults = await articles.aggregate(searchQuery).toArray();
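
// The $facet stage returns a single document whose fields hold the facet
// arrays defined above. A minimal, illustrative way to consume it (field
// names match the pipeline; the logging itself is an assumption):
const [faceted] = searchResults;

console.log(`Total matches: ${faceted.searchAnalytics[0]?.totalResults ?? 0}`);
console.log(`Top category: ${faceted.categoryFacets[0]?._id ?? 'n/a'}`);

for (const article of faceted.results) {
  console.log(`${article.title} (score: ${article.finalScore.toFixed(2)})`);
}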

// Benefits of MongoDB Atlas Search:
// - Advanced relevance scoring with custom ranking factors
// - Semantic search with synonym support and fuzzy matching
// - Real-time search index updates synchronized with data changes
// - Faceted search with complex aggregations
// - Advanced highlighting and snippet generation
// - Built-in analytics and search performance metrics
// - Support for multiple languages and custom analyzers
// - Vector search capabilities for AI and machine learning
// - Auto-completion and suggestion features
// - Geospatial search integration
// - Security and access control integration

Understanding MongoDB Atlas Search Architecture

Search Index Creation and Management

Implement comprehensive search indexes for optimal performance:

// Advanced Atlas Search index management system
class AtlasSearchManager {
  constructor(db) {
    this.db = db;
    this.searchIndexes = new Map();
    this.searchAnalytics = db.collection('search_analytics');
  }

  async createComprehensiveSearchIndex(collection, indexName, indexDefinition) {
    // Create sophisticated search index with multiple field types
    const advancedIndexDefinition = {
      name: indexName,
      definition: {
        // Text search fields with different analyzers
        mappings: {
          dynamic: false,
          fields: {
            // Title field with enhanced text analysis
            title: {
              type: "string",
              analyzer: "lucene.english",
              searchAnalyzer: "lucene.keyword",
              highlightAnalyzer: "lucene.english",
              store: true,
              indexOptions: "freqs"
            },

            // Content field with full-text capabilities
            content: {
              type: "string",
              analyzer: "content_analyzer",
              store: true
            },

            // Category as both text and facet
            category: [
              {
                type: "string",
                analyzer: "lucene.keyword"
              },
              {
                type: "stringFacet"
              }
            ],

            // Tags for exact and fuzzy matching
            tags: {
              type: "string",
              analyzer: "lucene.standard",
              multi: {
                keyword: {
                  type: "string",
                  analyzer: "lucene.keyword"
                }
              }
            },

            // Author information
            "author.name": {
              type: "string",
              analyzer: "lucene.standard",
              store: true
            },

            "author.expertise": {
              type: "stringFacet"
            },

            // Numeric fields for sorting and filtering
            publishedDate: {
              type: "date"
            },

            viewCount: {
              type: "number",
              indexIntegers: true,
              indexDoubles: false
            },

            likeCount: {
              type: "number"
            },

            readingTime: {
              type: "number"
            },

            // Geospatial data
            "location.coordinates": {
              type: "geo"
            },

            // Vector field for semantic search
            contentEmbedding: {
              type: "knnVector",
              dimensions: 1536,
              similarity: "cosine"
            }
          }
        },

        // Custom analyzers
        analyzers: [
          {
            name: "content_analyzer",
            charFilters: [
              {
                type: "htmlStrip"
              },
              {
                type: "mapping",
                mappings: {
                  "&": "and",
                  "@": "at"
                }
              }
            ],
            tokenizer: {
              type: "standard"
            },
            tokenFilters: [
              {
                type: "lowercase"
              },
              {
                type: "stopword",
                tokens: ["the", "a", "an", "and", "or", "but"]
              },
              {
                type: "snowballStemming",
                stemmerName: "english"
              },
              {
                type: "length",
                min: 2,
                max: 100
              }
            ]
          },

          {
            name: "autocomplete_analyzer",
            tokenizer: {
              type: "edgeGram",
              minGrams: 1,
              maxGrams: 20
            },
            tokenFilters: [
              {
                type: "lowercase"
              }
            ]
          }
        ],

        // Synonym mappings
        synonyms: [
          {
            name: "tech_synonyms",
            source: {
              collection: "synonyms",
              analyzer: "lucene.standard"
            }
          }
        ],

        // Search configuration
        storedSource: {
          include: ["title", "author.name", "category", "publishedDate"],
          exclude: ["content", "internalNotes"]
        }
      }
    };

    try {
      // Create the search index
      const result = await this.db.collection(collection).createSearchIndex(advancedIndexDefinition);

      // Store index metadata
      this.searchIndexes.set(indexName, {
        collection: collection,
        indexName: indexName,
        definition: advancedIndexDefinition,
        createdAt: new Date(),
        status: 'creating'
      });

      console.log(`Search index '${indexName}' created for collection '${collection}'`);
      return result;

    } catch (error) {
      console.error(`Failed to create search index '${indexName}':`, error);
      throw error;
    }
  }

  async createAutoCompleteIndex(collection, fields, indexName = 'autocomplete_index') {
    // Create specialized index for auto-completion
    const autoCompleteIndex = {
      name: indexName,
      definition: {
        mappings: {
          dynamic: false,
          fields: fields.reduce((acc, field) => {
            acc[field.path] = {
              type: "autocomplete",
              analyzer: "autocomplete_analyzer",
              tokenization: "edgeGram",
              maxGrams: field.maxGrams || 15,
              minGrams: field.minGrams || 2,
              foldDiacritics: true
            };
            return acc;
          }, {})
        },
        analyzers: [
          {
            name: "autocomplete_analyzer",
            tokenizer: {
              type: "edgeGram",
              minGrams: 2,
              maxGrams: 15
            },
            tokenFilters: [
              {
                type: "lowercase"
              },
              {
                type: "icuFolding"
              }
            ]
          }
        ]
      }
    };

    return await this.db.collection(collection).createSearchIndex(autoCompleteIndex);
  }

  async performAdvancedSearch(collection, searchParams) {
    // Execute sophisticated search with multiple techniques
    const pipeline = [];

    // Build complex search stage
    const searchStage = {
      $search: {
        index: searchParams.index || 'default_search_index',
        compound: {
          must: [],
          should: [],
          filter: [],
          mustNot: []
        }
      }
    };

    // Text search with boosting
    if (searchParams.query) {
      searchStage.$search.compound.must.push({
        text: {
          query: searchParams.query,
          path: searchParams.searchFields || ['title', 'content'],
          fuzzy: searchParams.fuzzy || {
            maxEdits: 2,
            prefixLength: 1
          }
        }
      });

      // Boost title matches
      searchStage.$search.compound.should.push({
        text: {
          query: searchParams.query,
          path: 'title',
          score: { boost: { value: 3.0 } }
        }
      });

      // Phrase matching
      if (searchParams.phraseSearch) {
        searchStage.$search.compound.should.push({
          phrase: {
            query: searchParams.query,
            path: ['title', 'content'],
            slop: 2,
            score: { boost: { value: 2.0 } }
          }
        });
      }
    }

    // Vector search for semantic similarity
    // Note: knnBeta must be the top-level $search operator, so it replaces any
    // compound text clauses built above; on current Atlas versions the dedicated
    // $vectorSearch stage (used later in this class) is the preferred approach
    if (searchParams.vectorQuery) {
      searchStage.$search = {
        knnBeta: {
          vector: searchParams.vectorQuery,
          path: "contentEmbedding",
          k: searchParams.vectorK || 50,
          score: {
            boost: {
              value: searchParams.vectorBoost || 1.5
            }
          }
        }
      };
    }

    // Filters (compound clauses only apply when the text search path is in use)
    if (searchParams.filters && searchStage.$search.compound) {
      if (searchParams.filters.category) {
        searchStage.$search.compound.filter.push({
          text: {
            query: searchParams.filters.category,
            path: "category"
          }
        });
      }

      if (searchParams.filters.dateRange) {
        searchStage.$search.compound.filter.push({
          range: {
            path: "publishedDate",
            gte: new Date(searchParams.filters.dateRange.start),
            lte: new Date(searchParams.filters.dateRange.end)
          }
        });
      }

      if (searchParams.filters.author) {
        searchStage.$search.compound.filter.push({
          text: {
            query: searchParams.filters.author,
            path: "author.name"
          }
        });
      }

      if (searchParams.filters.minViewCount) {
        searchStage.$search.compound.filter.push({
          range: {
            path: "viewCount",
            gte: searchParams.filters.minViewCount
          }
        });
      }
    }

    // Highlighting
    if (searchParams.highlight !== false) {
      searchStage.$search.highlight = {
        path: searchParams.highlightFields || ['title', 'content'],
        maxCharsToExamine: 500000,
        maxNumPassages: 5
      };
    }

    // Count configuration
    if (searchParams.count) {
      searchStage.$search.count = {
        type: searchParams.count.type || 'total',
        threshold: searchParams.count.threshold || 1000
      };
    }

    pipeline.push(searchStage);

    // Add scoring and ranking
    pipeline.push({
      $addFields: {
        searchScore: { $meta: "searchScore" },
        searchHighlights: { $meta: "searchHighlights" },

        // Custom relevance scoring
        relevanceScore: {
          $add: [
            "$searchScore",
            // Boost recent content
            {
              $multiply: [
                {
                  $max: [
                    0,
                    {
                      $subtract: [
                        30,
                        {
                          $divide: [
                            { $subtract: [new Date(), "$publishedDate"] },
                            86400000
                          ]
                        }
                      ]
                    }
                  ]
                },
                0.1
              ]
            },
            // Boost popular content
            {
              $multiply: [
                { $log10: { $max: [1, "$viewCount"] } },
                0.2
              ]
            },
            // Boost quality content
            {
              $multiply: [
                { $min: [{ $divide: [{ $strLenCP: "$content" }, 1000] }, 3] },
                0.15
              ]
            }
          ]
        }
      }
    });

    // Faceted search results
    if (searchParams.facets) {
      pipeline.push({
        $facet: {
          results: [
            { $sort: { relevanceScore: -1 } },
            { $skip: searchParams.skip || 0 },
            { $limit: searchParams.limit || 20 },
            {
              $project: {
                _id: 1,
                title: 1,
                author: 1,
                category: 1,
                tags: 1,
                publishedDate: 1,
                viewCount: 1,
                likeCount: 1,
                searchScore: 1,
                relevanceScore: 1,
                searchHighlights: 1,
                snippet: { $substr: ["$content", 0, 250] },
                readingTime: 1
              }
            }
          ],

          facets: this.buildFacetPipeline(searchParams.facets),

          totalCount: [
            { $count: "total" }
          ]
        }
      });
    } else {
      // Simple results without faceting
      pipeline.push(
        { $sort: { relevanceScore: -1 } },
        { $skip: searchParams.skip || 0 },
        { $limit: searchParams.limit || 20 }
      );
    }

    // Execute search and track analytics
    const startTime = Date.now();
    const results = await this.db.collection(collection).aggregate(pipeline).toArray();
    const executionTime = Date.now() - startTime;

    // Log search analytics
    await this.logSearchAnalytics(searchParams, results, executionTime);

    return results;
  }

  buildFacetPipeline(facetConfig) {
    const facetPipeline = {};

    if (facetConfig.category) {
      facetPipeline.categories = [
        {
          $group: {
            _id: "$category",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 20 }
      ];
    }

    if (facetConfig.author) {
      facetPipeline.authors = [
        {
          $group: {
            _id: "$author.name",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" },
            expertise: { $first: "$author.expertise" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 15 }
      ];
    }

    if (facetConfig.tags) {
      facetPipeline.tags = [
        { $unwind: "$tags" },
        {
          $group: {
            _id: "$tags",
            count: { $sum: 1 },
            avgScore: { $avg: "$searchScore" }
          }
        },
        { $sort: { count: -1 } },
        { $limit: 25 }
      ];
    }

    if (facetConfig.dateRanges) {
      facetPipeline.dateRanges = [
        {
          $bucket: {
            groupBy: "$publishedDate",
            boundaries: [
              new Date("2020-01-01"),
              new Date("2022-01-01"),
              new Date("2023-01-01"),
              new Date("2024-01-01"),
              new Date("2025-01-01"),
              new Date("2030-01-01")
            ],
            default: "older",
            output: {
              count: { $sum: 1 },
              avgScore: { $avg: "$searchScore" }
            }
          }
        }
      ];
    }

    if (facetConfig.viewRanges) {
      facetPipeline.viewRanges = [
        {
          $bucket: {
            groupBy: "$viewCount",
            boundaries: [0, 100, 1000, 10000, 100000, 1000000],
            default: "very_popular",
            output: {
              count: { $sum: 1 },
              avgScore: { $avg: "$searchScore" }
            }
          }
        }
      ];
    }

    return facetPipeline;
  }

  async performAutoComplete(collection, query, field, limit = 10) {
    // Auto-completion search
    const pipeline = [
      {
        $search: {
          index: 'autocomplete_index',
          autocomplete: {
            query: query,
            path: field,
            tokenOrder: "sequential",
            fuzzy: {
              maxEdits: 1,
              prefixLength: 1
            }
          }
        }
      },
      {
        $group: {
          _id: `$${field}`,
          score: { $max: { $meta: "searchScore" } },
          count: { $sum: 1 }
        }
      },
      { $sort: { score: -1, count: -1 } },
      { $limit: limit },
      {
        $project: {
          suggestion: "$_id",
          score: 1,
          frequency: "$count",
          _id: 0
        }
      }
    ];

    return await this.db.collection(collection).aggregate(pipeline).toArray();
  }

  async performSemanticSearch(collection, queryVector, filters = {}, limit = 20) {
    // Vector-based semantic search
    const pipeline = [
      {
        $vectorSearch: {
          index: "vector_search_index",
          path: "contentEmbedding",
          queryVector: queryVector,
          numCandidates: limit * 10,
          limit: limit,
          filter: filters
        }
      },
      {
        $addFields: {
          vectorScore: { $meta: "vectorSearchScore" }
        }
      },
      {
        $project: {
          title: 1,
          content: { $substr: ["$content", 0, 200] },
          author: 1,
          category: 1,
          publishedDate: 1,
          vectorScore: 1,
          similarity: { $multiply: ["$vectorScore", 100] }
        }
      }
    ];

    return await this.db.collection(collection).aggregate(pipeline).toArray();
  }

  async createSearchSuggestions(collection, userQuery, suggestionTypes = ['spelling', 'query', 'category']) {
    // Generate search suggestions and corrections
    const suggestions = {
      spelling: [],
      queries: [],
      categories: [],
      authors: []
    };

    // Spelling suggestions using fuzzy search
    if (suggestionTypes.includes('spelling')) {
      const spellingPipeline = [
        {
          $search: {
            index: 'default_search_index',
            text: {
              query: userQuery,
              path: ['title', 'content'],
              fuzzy: {
                maxEdits: 2,
                prefixLength: 0
              }
            }
          }
        },
        { $limit: 5 },
        {
          $project: {
            title: 1,
            score: { $meta: "searchScore" }
          }
        }
      ];

      suggestions.spelling = await this.db.collection(collection).aggregate(spellingPipeline).toArray();
    }

    // Query suggestions from search history
    if (suggestionTypes.includes('query')) {
      suggestions.queries = await this.searchAnalytics.find({
        query: new RegExp(userQuery, 'i'),
        resultCount: { $gt: 0 }
      })
      .sort({ searchCount: -1 })
      .limit(5)
      .project({ query: 1, resultCount: 1 })
      .toArray();
    }

    // Category suggestions
    if (suggestionTypes.includes('category')) {
      const categoryPipeline = [
        {
          $search: {
            index: 'default_search_index',
            text: {
              query: userQuery,
              path: 'category'
            }
          }
        },
        {
          $group: {
            _id: "$category",
            count: { $sum: 1 },
            score: { $max: { $meta: "searchScore" } }
          }
        },
        { $sort: { score: -1, count: -1 } },
        { $limit: 5 }
      ];

      suggestions.categories = await this.db.collection(collection).aggregate(categoryPipeline).toArray();
    }

    return suggestions;
  }

  async logSearchAnalytics(searchParams, results, executionTime) {
    // Track search analytics for optimization
    const analyticsDoc = {
      query: searchParams.query,
      searchType: this.determineSearchType(searchParams),
      filters: searchParams.filters || {},
      resultCount: Array.isArray(results) ? results.length : 
                   (results[0] && results[0].totalCount ? results[0].totalCount[0]?.total : 0),
      executionTime: executionTime,
      timestamp: new Date(),

      // Search quality metrics
      avgScore: this.calculateAverageScore(results),
      scoreDistribution: this.analyzeScoreDistribution(results),

      // User experience metrics
      hasResults: (results && results.length > 0),
      fastResponse: executionTime < 500,

      // Technical metrics
      index: searchParams.index,
      facetsRequested: !!searchParams.facets,
      highlightRequested: searchParams.highlight !== false
    };

    await this.searchAnalytics.insertOne(analyticsDoc);

    // Update search frequency
    await this.searchAnalytics.updateOne(
      { 
        query: searchParams.query,
        searchType: analyticsDoc.searchType 
      },
      { 
        $inc: { searchCount: 1 },
        $set: { lastSearched: new Date() }
      },
      { upsert: true }
    );
  }

  determineSearchType(searchParams) {
    if (searchParams.vectorQuery) return 'vector';
    if (searchParams.phraseSearch) return 'phrase';
    if (searchParams.fuzzy) return 'fuzzy';
    return 'text';
  }

  calculateAverageScore(results) {
    if (!results || !results.length) return 0;

    const scores = results.map(r => r.searchScore || r.relevanceScore || 0);
    return scores.reduce((sum, score) => sum + score, 0) / scores.length;
  }

  analyzeScoreDistribution(results) {
    if (!results || !results.length) return {};

    const scores = results.map(r => r.searchScore || r.relevanceScore || 0);
    const distribution = {
      excellent: scores.filter(s => s >= 10).length,
      good: scores.filter(s => s >= 5 && s < 10).length,
      fair: scores.filter(s => s >= 2 && s < 5).length,
      poor: scores.filter(s => s < 2).length
    };

    return distribution;
  }

  async getSearchAnalytics(dateRange = {}, groupBy = 'day') {
    // Comprehensive search analytics
    const matchStage = {
      timestamp: {
        $gte: dateRange.start || new Date(Date.now() - 30 * 24 * 60 * 60 * 1000),
        $lte: dateRange.end || new Date()
      }
    };

    const pipeline = [
      { $match: matchStage },

      {
        $group: {
          _id: this.getGroupingExpression(groupBy),
          totalSearches: { $sum: 1 },
          uniqueQueries: { $addToSet: "$query" },
          avgExecutionTime: { $avg: "$executionTime" },
          avgResultCount: { $avg: "$resultCount" },
          successfulSearches: {
            $sum: { $cond: [{ $gt: ["$resultCount", 0] }, 1, 0] }
          },
          fastSearches: {
            $sum: { $cond: [{ $lt: ["$executionTime", 500] }, 1, 0] }
          },
          searchTypes: { $push: "$searchType" },
          popularQueries: { $push: "$query" }
        }
      },

      {
        $addFields: {
          uniqueQueryCount: { $size: "$uniqueQueries" },
          successRate: { $divide: ["$successfulSearches", "$totalSearches"] },
          performanceRate: { $divide: ["$fastSearches", "$totalSearches"] },
          // Note: each query is pushed with a count of 1, so this yields a
          // sample of recent queries; exact frequency ranking requires a
          // separate $unwind/$group pass over popularQueries
          topQueries: {
            $slice: [
              {
                $sortArray: {
                  input: {
                    $reduce: {
                      input: "$popularQueries",
                      initialValue: [],
                      in: {
                        $concatArrays: [
                          "$$value",
                          [{ query: "$$this", count: 1 }]
                        ]
                      }
                    }
                  },
                  sortBy: { count: -1 }
                }
              },
              10
            ]
          }
        }
      },

      { $sort: { _id: -1 } }
    ];

    return await this.searchAnalytics.aggregate(pipeline).toArray();
  }

  getGroupingExpression(groupBy) {
    const dateExpressions = {
      hour: {
        year: { $year: "$timestamp" },
        month: { $month: "$timestamp" },
        day: { $dayOfMonth: "$timestamp" },
        hour: { $hour: "$timestamp" }
      },
      day: {
        year: { $year: "$timestamp" },
        month: { $month: "$timestamp" },
        day: { $dayOfMonth: "$timestamp" }
      },
      week: {
        year: { $year: "$timestamp" },
        week: { $week: "$timestamp" }
      },
      month: {
        year: { $year: "$timestamp" },
        month: { $month: "$timestamp" }
      }
    };

    return dateExpressions[groupBy] || dateExpressions.day;
  }

  async optimizeSearchPerformance(collection, analysisRange = 30) {
    // Analyze and optimize search performance
    const analysisDate = new Date(Date.now() - analysisRange * 24 * 60 * 60 * 1000);

    const performanceAnalysis = await this.searchAnalytics.aggregate([
      { $match: { timestamp: { $gte: analysisDate } } },

      {
        $group: {
          _id: null,
          totalSearches: { $sum: 1 },
          avgExecutionTime: { $avg: "$executionTime" },
          slowSearches: {
            $sum: { $cond: [{ $gt: ["$executionTime", 2000] }, 1, 0] }
          },
          emptyResults: {
            $sum: { $cond: [{ $eq: ["$resultCount", 0] }, 1, 0] }
          },
          commonQueries: { $push: "$query" },
          slowQueries: {
            $push: {
              $cond: [
                { $gt: ["$executionTime", 1000] },
                { query: "$query", executionTime: "$executionTime" },
                null
              ]
            }
          }
        }
      }
    ]).toArray();

    const analysis = performanceAnalysis[0];
    const recommendations = [];

    // No search activity in the analysis window - nothing to analyze
    if (!analysis) {
      return { analysis: null, recommendations, generatedAt: new Date() };
    }

    // Performance recommendations
    if (analysis.avgExecutionTime > 1000) {
      recommendations.push({
        type: 'performance',
        issue: 'High average execution time',
        recommendation: 'Consider index optimization or query refinement',
        priority: 'high'
      });
    }

    if (analysis.slowSearches / analysis.totalSearches > 0.1) {
      recommendations.push({
        type: 'performance',
        issue: 'High percentage of slow searches',
        recommendation: 'Review index configuration and query complexity',
        priority: 'high'
      });
    }

    if (analysis.emptyResults / analysis.totalSearches > 0.3) {
      recommendations.push({
        type: 'relevance',
        issue: 'High percentage of searches with no results',
        recommendation: 'Improve fuzzy matching and synonyms configuration',
        priority: 'medium'
      });
    }

    return {
      analysis: analysis,
      recommendations: recommendations,
      generatedAt: new Date()
    };
  }
}
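
A minimal usage sketch of the search manager follows. It assumes the class above is exposed as AtlasSearchManager and that its constructor accepts a connected database handle; both names are illustrative rather than fixed parts of the implementation.

// Usage sketch (assumptions: the class above is exported as AtlasSearchManager
// and wires this.db and this.searchAnalytics from the connected database)
const { MongoClient } = require('mongodb');

async function runSearchExample() {
  const client = new MongoClient(process.env.MONGODB_URI);
  await client.connect();

  const db = client.db('content_platform');           // illustrative database name
  const searchManager = new AtlasSearchManager(db);   // hypothetical constructor

  // Filtered, faceted text search against the articles collection
  const results = await searchManager.performAdvancedSearch('articles', {
    query: 'machine learning algorithms',
    searchFields: ['title', 'content'],
    filters: { category: 'technology', minViewCount: 100 },
    facets: { category: true, tags: true },
    limit: 20
  });

  console.log(JSON.stringify(results[0], null, 2));
  await client.close();
}

runSearchExample().catch(console.error);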

SQL-Style Search Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Atlas Search operations:

-- QueryLeaf Atlas Search operations with SQL-familiar syntax

-- Create full-text search index
CREATE SEARCH INDEX articles_search_idx ON articles (
  -- Text fields with different analyzers
  title WITH (analyzer='lucene.english', boost=3.0),
  content WITH (analyzer='content_analyzer', store=true),

  -- Faceted fields
  category AS FACET,
  "author.name" AS FACET,
  tags AS FACET,

  -- Numeric and date fields
  publishedDate AS DATE,
  viewCount AS NUMBER,
  likeCount AS NUMBER,

  -- Auto-completion fields
  title AS AUTOCOMPLETE WITH (maxGrams=15, minGrams=2),

  -- Vector field for semantic search
  contentEmbedding AS VECTOR WITH (dimensions=1536, similarity='cosine')
);

-- Advanced text search with ranking
SELECT 
  article_id,
  title,
  author,
  category,
  published_date,
  view_count,

  -- Search relevance scoring
  SEARCH_SCORE() as search_score,
  SEARCH_HIGHLIGHTS('title', 'content') as highlights,

  -- Custom relevance calculation
  (SEARCH_SCORE() + 
   LOG10(GREATEST(1, view_count)) * 0.2 +
   CASE 
     WHEN published_date >= CURRENT_DATE - INTERVAL '30 days' THEN 2.0
     WHEN published_date >= CURRENT_DATE - INTERVAL '90 days' THEN 1.0
     ELSE 0
   END) as final_score

FROM articles
WHERE SEARCH_TEXT('machine learning algorithms', 
  fields => ARRAY['title', 'content'],
  fuzzy => JSON_BUILD_OBJECT('maxEdits', 2, 'prefixLength', 1),
  boost => JSON_BUILD_OBJECT('title', 3.0, 'content', 1.0)
)
AND category IN ('technology', 'science', 'ai')
AND published_date >= '2023-01-01'
AND status != 'draft'

ORDER BY final_score DESC
LIMIT 20;

-- Faceted search with aggregations
WITH search_results AS (
  SELECT *,
    SEARCH_SCORE() as search_score,
    SEARCH_HIGHLIGHTS('title', 'content') as highlights
  FROM articles
  WHERE SEARCH_TEXT('artificial intelligence',
    fields => ARRAY['title', 'content'],
    synonyms => 'tech_synonyms'
  )
)
SELECT 
  -- Main results
  json_build_object(
    'results', json_agg(
      json_build_object(
        'article_id', article_id,
        'title', title,
        'author', author,
        'category', category,
        'search_score', search_score,
        'highlights', highlights
      ) ORDER BY search_score DESC LIMIT 20
    ),

    -- Category facets
    'categoryFacets', (
      SELECT json_agg(
        json_build_object(
          'category', category,
          'count', doc_count,
          'avgScore', avg_score
        )
      )
      FROM (
        SELECT category,
               COUNT(*) as doc_count,
               AVG(search_score) as avg_score
        FROM search_results
        GROUP BY category
        ORDER BY COUNT(*) DESC
      ) cat_data
    ),

    -- Author facets
    'authorFacets', (
      SELECT json_agg(
        json_build_object(
          'author', author_name,
          'count', doc_count,
          'expertise', author_expertise
        )
      )
      FROM (
        SELECT author->>'name' as author_name,
               author->>'expertise' as author_expertise,
               COUNT(*) as doc_count
        FROM search_results
        GROUP BY author->>'name', author->>'expertise'
        ORDER BY COUNT(*) DESC
        LIMIT 10
      ) author_data
    ),

    -- Search analytics
    'analytics', json_build_object(
      'totalResults', COUNT(*),
      'avgScore', AVG(search_score),
      'maxScore', MAX(search_score),
      'scoreDistribution', json_build_object(
        'excellent', COUNT(*) FILTER (WHERE search_score >= 10),
        'good', COUNT(*) FILTER (WHERE search_score >= 5 AND search_score < 10),
        'fair', COUNT(*) FILTER (WHERE search_score >= 2 AND search_score < 5),
        'poor', COUNT(*) FILTER (WHERE search_score < 2)
      )
    )
  )
FROM search_results;

-- Auto-completion search
SELECT 
  suggestion,
  score,
  frequency
FROM AUTOCOMPLETE_SEARCH('machine lear', 
  field => 'title',
  limit => 10,
  fuzzy => JSON_BUILD_OBJECT('maxEdits', 1)
)
ORDER BY score DESC, frequency DESC;

-- Semantic vector search
SELECT 
  article_id,
  title,
  author,
  category,
  published_date,
  VECTOR_SCORE() as similarity_score,
  ROUND(VECTOR_SCORE() * 100, 2) as similarity_percentage
FROM articles
WHERE VECTOR_SEARCH(@query_embedding,
  field => 'contentEmbedding',
  k => 20,
  filter => JSON_BUILD_OBJECT('category', ARRAY['technology', 'ai'])
)
ORDER BY similarity_score DESC;

-- Combined text and vector search (hybrid search)
WITH text_search AS (
  SELECT article_id, title, author, category, published_date,
    SEARCH_SCORE() as text_score,
    1 as search_type
  FROM articles
  WHERE SEARCH_TEXT('neural networks deep learning')
  ORDER BY SEARCH_SCORE() DESC
  LIMIT 50
),
vector_search AS (
  SELECT article_id, title, author, category, published_date,
    VECTOR_SCORE() as vector_score,
    2 as search_type
  FROM articles
  WHERE VECTOR_SEARCH(@neural_networks_embedding, field => 'contentEmbedding', k => 50)
),
combined_results AS (
  -- Combine and re-rank results
  SELECT 
    COALESCE(t.article_id, v.article_id) as article_id,
    COALESCE(t.title, v.title) as title,
    COALESCE(t.author, v.author) as author,
    COALESCE(t.category, v.category) as category,
    COALESCE(t.published_date, v.published_date) as published_date,

    -- Hybrid scoring
    COALESCE(t.text_score, 0) * 0.6 + COALESCE(v.vector_score, 0) * 0.4 as hybrid_score,

    CASE 
      WHEN t.article_id IS NOT NULL AND v.article_id IS NOT NULL THEN 'both'
      WHEN t.article_id IS NOT NULL THEN 'text_only'
      ELSE 'vector_only'
    END as match_type
  FROM text_search t
  FULL OUTER JOIN vector_search v ON t.article_id = v.article_id
)
SELECT * FROM combined_results
ORDER BY hybrid_score DESC, match_type = 'both' DESC
LIMIT 20;

-- Search with custom scoring and boosting
SELECT 
  article_id,
  title,
  author,
  category,
  published_date,
  view_count,
  like_count,

  -- Multi-factor scoring
  (
    SEARCH_SCORE() * 1.0 +                                    -- Base search relevance
    LOG10(GREATEST(1, view_count)) * 0.3 +                   -- Popularity boost
    LOG10(GREATEST(1, like_count)) * 0.2 +                   -- Engagement boost
    CASE 
      WHEN published_date >= CURRENT_DATE - INTERVAL '7 days' THEN 3.0
      WHEN published_date >= CURRENT_DATE - INTERVAL '30 days' THEN 2.0  
      WHEN published_date >= CURRENT_DATE - INTERVAL '90 days' THEN 1.0
      ELSE 0
    END +                                                     -- Recency boost
    CASE 
      WHEN LENGTH(content) >= 2000 THEN 1.5
      WHEN LENGTH(content) >= 1000 THEN 1.0
      ELSE 0.5
    END                                                       -- Content quality boost
  ) as comprehensive_score

FROM articles
WHERE SEARCH_COMPOUND(
  must => ARRAY[
    SEARCH_TEXT('blockchain cryptocurrency', fields => ARRAY['title', 'content'])
  ],
  should => ARRAY[
    SEARCH_TEXT('blockchain', field => 'title', boost => 3.0),
    SEARCH_PHRASE('blockchain technology', fields => ARRAY['title', 'content'], slop => 2)
  ],
  filter => ARRAY[
    SEARCH_RANGE('published_date', gte => '2022-01-01'),
    SEARCH_TERMS('category', values => ARRAY['technology', 'finance'])
  ],
  must_not => ARRAY[
    SEARCH_TERM('status', value => 'draft')
  ]
)
ORDER BY comprehensive_score DESC;

-- Search analytics and performance monitoring  
SELECT 
  DATE_TRUNC('day', search_timestamp) as search_date,
  search_query,
  COUNT(*) as search_count,
  AVG(execution_time_ms) as avg_execution_time,
  AVG(result_count) as avg_results,

  -- Performance metrics
  COUNT(*) FILTER (WHERE execution_time_ms < 500) as fast_searches,
  COUNT(*) FILTER (WHERE result_count > 0) as successful_searches,
  COUNT(*) FILTER (WHERE result_count = 0) as empty_searches,

  -- Search quality metrics
  AVG(CASE WHEN result_count > 0 THEN avg_search_score END) as avg_relevance,

  -- User behavior indicators
  COUNT(DISTINCT user_id) as unique_searchers,
  AVG(click_through_rate) as avg_ctr

FROM search_analytics
WHERE search_timestamp >= CURRENT_DATE - INTERVAL '30 days'
  AND search_query IS NOT NULL
GROUP BY DATE_TRUNC('day', search_timestamp), search_query
HAVING COUNT(*) >= 10  -- Only frequent searches
ORDER BY search_count DESC, avg_execution_time ASC;

-- Search optimization recommendations
WITH search_performance AS (
  SELECT 
    search_query,
    COUNT(*) as frequency,
    AVG(execution_time_ms) as avg_time,
    AVG(result_count) as avg_results,
    STDDEV(execution_time_ms) as time_variance
  FROM search_analytics
  WHERE search_timestamp >= CURRENT_DATE - INTERVAL '7 days'
  GROUP BY search_query
  HAVING COUNT(*) >= 5
),
optimization_analysis AS (
  SELECT *,
    CASE 
      WHEN avg_time > 2000 THEN 'slow_query'
      WHEN avg_results = 0 THEN 'no_results'
      WHEN avg_results < 5 THEN 'few_results'
      WHEN time_variance > avg_time THEN 'inconsistent_performance'
      ELSE 'optimal'
    END as performance_category,

    CASE 
      WHEN avg_time > 2000 THEN 'Add more specific indexes or optimize query complexity'
      WHEN avg_results = 0 THEN 'Improve fuzzy matching and synonym configuration'
      WHEN avg_results < 5 THEN 'Review relevance scoring and boost popular content'
      WHEN time_variance > avg_time THEN 'Investigate index fragmentation or resource contention'
      ELSE 'Query performing well'
    END as recommendation
  FROM search_performance
)
SELECT 
  search_query,
  frequency,
  ROUND(avg_time, 2) as avg_execution_time_ms,
  ROUND(avg_results, 1) as avg_result_count,
  performance_category,
  recommendation,

  -- Priority scoring
  CASE 
    WHEN performance_category = 'slow_query' AND frequency > 100 THEN 1
    WHEN performance_category = 'no_results' AND frequency > 50 THEN 2
    WHEN performance_category = 'inconsistent_performance' AND frequency > 75 THEN 3
    ELSE 4
  END as optimization_priority

FROM optimization_analysis
WHERE performance_category != 'optimal'
ORDER BY optimization_priority, frequency DESC;

-- QueryLeaf provides comprehensive Atlas Search capabilities:
-- 1. SQL-familiar search index creation and management
-- 2. Advanced text search with custom scoring and boosting
-- 3. Faceted search with aggregations and analytics
-- 4. Auto-completion and suggestion generation
-- 5. Vector search for semantic similarity
-- 6. Hybrid search combining text and vector approaches
-- 7. Search analytics and performance monitoring
-- 8. Automated optimization recommendations
-- 9. Real-time search index synchronization
-- 10. Integration with MongoDB's native Atlas Search features

Best Practices for Atlas Search Implementation

Search Index Optimization

Essential practices for optimal search performance, with a minimal index definition sketch after the list:

  1. Index Design Strategy: Design indexes specifically for your search patterns and query types
  2. Field Analysis: Use appropriate analyzers for different content types and languages
  3. Relevance Tuning: Implement custom scoring with business logic and user behavior
  4. Performance Monitoring: Track search analytics and optimize based on real usage patterns
  5. Faceting Strategy: Design facets to support filtering and discovery workflows
  6. Auto-completion Design: Implement sophisticated suggestion systems for user experience
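
As a concrete starting point, the sketch below defines an index scoped to one well-understood query pattern (title search with autocomplete plus category facets) instead of a dynamic catch-all mapping. Field names follow this article's examples and are assumptions about your own schema.

// Sketch: purpose-built index for a known query pattern; adjust field names
// and analyzers to the content you actually query
const focusedIndexDefinition = {
  name: 'articles_focused_idx',
  definition: {
    mappings: {
      dynamic: false,                       // index only the fields the queries touch
      fields: {
        title: [
          { type: 'string', analyzer: 'lucene.english' },
          { type: 'autocomplete', minGrams: 2, maxGrams: 15 }
        ],
        content: { type: 'string', analyzer: 'lucene.english' },
        category: { type: 'stringFacet' },
        publishedDate: { type: 'date' },
        viewCount: { type: 'number' }
      }
    }
  }
};

// await db.collection('articles').createSearchIndex(focusedIndexDefinition);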

Search Quality and Relevance

Optimize search quality through comprehensive relevance engineering; a compact scoring sketch follows the list:

  1. Multi-factor Scoring: Combine text relevance with business metrics and user behavior
  2. Semantic Enhancement: Use synonyms and vector search for better understanding
  3. Query Understanding: Implement fuzzy matching and error correction
  4. Content Quality: Factor content quality metrics into relevance scoring
  5. Personalization: Incorporate user preferences and search history
  6. A/B Testing: Continuously test and optimize search relevance algorithms
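
The sketch below illustrates multi-factor scoring by blending the $search text score with popularity and engagement signals. The weights, the query string, and the viewCount/likeCount fields mirror earlier examples in this article and should be treated as tunable assumptions rather than recommended values.

// Sketch: blend text relevance with business signals, then rank on the result
const relevancePipeline = [
  {
    $search: {
      index: 'articles_focused_idx',                    // assumed index name
      compound: {
        must: [
          {
            text: {
              query: 'vector databases',
              path: ['title', 'content'],
              fuzzy: { maxEdits: 1 }
            }
          }
        ],
        should: [
          // Exact title hits count more than body hits
          {
            text: {
              query: 'vector databases',
              path: 'title',
              score: { boost: { value: 3.0 } }
            }
          }
        ]
      }
    }
  },
  { $addFields: { textScore: { $meta: 'searchScore' } } },
  {
    $addFields: {
      blendedScore: {
        $add: [
          '$textScore',                                                   // base relevance
          { $multiply: [{ $log10: { $max: [1, '$viewCount'] } }, 0.2] }, // popularity
          { $multiply: [{ $log10: { $max: [1, '$likeCount'] } }, 0.1] }  // engagement
        ]
      }
    }
  },
  { $sort: { blendedScore: -1 } },
  { $limit: 20 }
];

// const hits = await db.collection('articles').aggregate(relevancePipeline).toArray();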

Conclusion

MongoDB Atlas Search provides enterprise-grade search capabilities that eliminate the complexity of external search engines while delivering sophisticated full-text search, semantic understanding, and search analytics. The integration of advanced search features with familiar SQL syntax makes implementing modern search applications both powerful and accessible.

Key Atlas Search benefits include:

  • Native Integration: Built-in search without external dependencies or synchronization
  • Advanced Relevance: Sophisticated scoring with custom business logic
  • Real-time Updates: Automatic search index synchronization with data changes
  • Comprehensive Analytics: Built-in search performance and user behavior tracking
  • Scalable Architecture: Enterprise-grade performance with horizontal scaling
  • Developer Friendly: Familiar query syntax with powerful search capabilities

Whether you're building e-commerce search, content discovery platforms, knowledge bases, or applications requiring sophisticated text analysis, MongoDB Atlas Search with QueryLeaf's familiar SQL interface provides the foundation for modern search experiences. This combination enables you to implement advanced search capabilities while preserving familiar database interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Atlas Search operations while providing SQL-familiar search index creation, query syntax, and analytics. Advanced search features, relevance tuning, and performance optimization are seamlessly handled through familiar SQL patterns, making enterprise-grade search both powerful and accessible.

The integration of native search capabilities with SQL-style operations makes MongoDB an ideal platform for applications requiring both sophisticated search functionality and familiar database interaction patterns, ensuring your search solutions remain both effective and maintainable as they scale and evolve.