MongoDB Geospatial Data Management: SQL-Style Approaches to Location Queries

MongoDB offers powerful geospatial capabilities for storing and querying location-based data. Whether you're building a ride-sharing app, store locator, or IoT sensor network, understanding how to work with coordinates, distances, and geographic boundaries is essential.

While MongoDB's native geospatial operators like $near and $geoWithin handle spatial calculations, applying SQL thinking to location data helps structure queries and optimize performance for common location-based scenarios.

The Geospatial Challenge

Consider a food delivery application that needs to:

  • Find restaurants within 2km of a customer
  • Check if a delivery address is within a restaurant's service area
  • Calculate delivery routes and estimated travel times
  • Analyze order density by geographic regions

Traditional MongoDB geospatial queries require understanding multiple operators and coordinate systems:

// Sample restaurant document
{
  "_id": ObjectId("..."),
  "name": "Mario's Pizza",
  "cuisine": "Italian",
  "rating": 4.6,
  "location": {
    "type": "Point",
    "coordinates": [-122.4194, 37.7749] // [longitude, latitude]
  },
  "serviceArea": {
    "type": "Polygon",
    "coordinates": [[
      [-122.4294, 37.7649],
      [-122.4094, 37.7649], 
      [-122.4094, 37.7849],
      [-122.4294, 37.7849],
      [-122.4294, 37.7649]
    ]]
  },
  "address": "123 Mission St, San Francisco, CA",
  "phone": "+1-555-0123",
  "deliveryFee": 2.99
}

Native MongoDB proximity search:

// Find restaurants within 2km
db.restaurants.find({
  location: {
    $near: {
      $geometry: {
        type: "Point",
        coordinates: [-122.4194, 37.7749]
      },
      $maxDistance: 2000
    }
  }
})

// Check if a point falls within a restaurant's delivery area
db.restaurants.find({
  serviceArea: {
    $geoIntersects: {
      $geometry: {
        type: "Point",
        coordinates: [-122.4150, 37.7700]
      }
    }
  }
})

SQL-Style Location Data Modeling

Using SQL concepts, we can structure location queries more systematically. While QueryLeaf doesn't directly support spatial functions, we can model location data using standard SQL patterns and coordinate these with MongoDB's native geospatial features:

-- Structure location data using SQL patterns
SELECT 
  name,
  cuisine,
  rating,
  location,
  address
FROM restaurants
WHERE location IS NOT NULL
ORDER BY rating DESC
LIMIT 10

-- Coordinate-based filtering (for approximate area queries)  
SELECT 
  name,
  cuisine,
  rating
FROM restaurants
WHERE latitude BETWEEN 37.7700 AND 37.7800
  AND longitude BETWEEN -122.4250 AND -122.4150
ORDER BY rating DESC

Setting Up Location Indexes

For location-based queries, proper indexing is crucial:

Coordinate Field Indexes

-- Index individual coordinate fields for range queries
CREATE INDEX idx_restaurants_coordinates 
ON restaurants (latitude, longitude)

-- Index the location field (for true geospatial queries, prefer the native 2dsphere indexes below)
CREATE INDEX idx_restaurants_location
ON restaurants (location)

MongoDB geospatial indexes (use native MongoDB commands):

// For GeoJSON Point data
db.restaurants.createIndex({ location: "2dsphere" })

// For legacy coordinate pairs  
db.restaurants.createIndex({ coordinates: "2d" })

// Compound index combining location with other filters
db.restaurants.createIndex({ location: "2dsphere", cuisine: 1, rating: 1 })
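
A proximity query shaped like the following can take advantage of that compound index. This is a minimal sketch reusing the sample restaurant fields from earlier:

// Proximity search that also filters on cuisine and rating,
// allowing MongoDB to use the compound 2dsphere index above
db.restaurants.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [-122.4194, 37.7749] },
      $maxDistance: 2000
    }
  },
  cuisine: "Italian",
  rating: { $gte: 4.0 }
})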

Location Query Patterns with QueryLeaf

Bounding Box Queries

Use SQL range queries to implement approximate location searches:

-- Find restaurants in a rectangular area (bounding box approach)
SELECT 
  name,
  cuisine,  
  rating,
  latitude,
  longitude
FROM restaurants
WHERE latitude BETWEEN 37.7650 AND 37.7850
  AND longitude BETWEEN -122.4300 AND -122.4100
  AND rating >= 4.0
ORDER BY rating DESC
LIMIT 20

-- More precise filtering with nested location fields
SELECT 
  name,
  cuisine,
  rating,
  location.coordinates[0] AS longitude,
  location.coordinates[1] AS latitude  
FROM restaurants
WHERE location.coordinates[1] BETWEEN 37.7650 AND 37.7850
  AND location.coordinates[0] BETWEEN -122.4300 AND -122.4100
ORDER BY rating DESC

Coordinate-Based Filtering

QueryLeaf supports standard SQL operations on coordinate fields:

-- Find restaurants near a specific point using coordinate ranges
SELECT 
  name,
  cuisine,
  rating,
  deliveryFee,
  latitude,
  longitude
FROM restaurants
WHERE latitude BETWEEN 37.7694 AND 37.7794  -- ~1km north-south
  AND longitude BETWEEN -122.4244 AND -122.4144  -- ~1km east-west  
  AND rating >= 4.0
  AND deliveryFee <= 5.00
ORDER BY rating DESC
LIMIT 15

Polygon Containment

-- Check if delivery address is within service areas
SELECT 
  r.name,
  r.phone,
  r.deliveryFee,
  'Available' AS delivery_status
FROM restaurants r
WHERE ST_CONTAINS(r.serviceArea, ST_POINT(-122.4150, 37.7700))
  AND r.cuisine IN ('Italian', 'Chinese', 'Mexican')

-- Find all restaurants serving a specific neighborhood
WITH neighborhood AS (
  SELECT ST_POLYGON(ARRAY[
    ST_POINT(-122.4300, 37.7650),
    ST_POINT(-122.4100, 37.7650),
    ST_POINT(-122.4100, 37.7850),
    ST_POINT(-122.4300, 37.7850),
    ST_POINT(-122.4300, 37.7650)
  ]) AS boundary
)
SELECT 
  r.name,
  r.cuisine,
  r.rating
FROM restaurants r, neighborhood n
WHERE ST_INTERSECTS(r.serviceArea, n.boundary)

Advanced Geospatial Operations

Bounding Box Queries

-- Find restaurants in a rectangular area (bounding box)
SELECT name, cuisine, rating
FROM restaurants
WHERE ST_WITHIN(
  location,
  ST_BOX(
    ST_POINT(-122.4400, 37.7600),  -- Southwest corner
    ST_POINT(-122.4000, 37.7800)   -- Northeast corner
  )
)
ORDER BY rating DESC

Circular Area Queries

-- Find all locations within a circular delivery zone
SELECT 
  name,
  address,
  ST_DISTANCE(location, ST_POINT(-122.4194, 37.7749)) AS distance
FROM restaurants
WHERE ST_WITHIN(
  location,
  ST_BUFFER(ST_POINT(-122.4194, 37.7749), 1500)
)
ORDER BY distance ASC
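
At the MongoDB layer, the closest native analogue to this distance-plus-filter pattern is the $geoNear aggregation stage, which computes a distance for each matched document. A minimal sketch against the sample restaurants collection (requires the 2dsphere index on location):

// $geoNear must be the first pipeline stage and returns documents
// sorted by computed distance (meters, stored in distanceField)
db.restaurants.aggregate([
  {
    $geoNear: {
      near: { type: "Point", coordinates: [-122.4194, 37.7749] },
      distanceField: "distance",
      maxDistance: 1500,
      spherical: true
    }
  },
  { $project: { name: 1, address: 1, distance: 1 } }
])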

Route and Path Analysis

-- Calculate total distance along a delivery route
WITH route_points AS (
  SELECT UNNEST(ARRAY[
    ST_POINT(-122.4194, 37.7749),  -- Start: Customer
    ST_POINT(-122.4150, 37.7700),  -- Stop 1: Restaurant A  
    ST_POINT(-122.4250, 37.7800),  -- Stop 2: Restaurant B
    ST_POINT(-122.4194, 37.7749)   -- End: Back to customer
  ]) AS point,
  ROW_NUMBER() OVER () AS seq
)
SELECT 
  SUM(ST_DISTANCE(curr.point, next.point)) AS total_distance_meters,
  SUM(ST_DISTANCE(curr.point, next.point)) / 1609.34 AS total_distance_miles
FROM route_points curr
JOIN route_points next ON curr.seq = next.seq - 1

Real-World Implementation Examples

Store Locator System

-- Comprehensive store locator with business hours
SELECT 
  s.name,
  s.address,
  s.phone,
  s.storeType,
  ST_DISTANCE(s.location, ST_POINT(?, ?)) AS distance_meters,
  CASE 
    WHEN EXTRACT(HOUR FROM CURRENT_TIMESTAMP) BETWEEN s.openHour AND s.closeHour 
    THEN 'Open'
    ELSE 'Closed'
  END AS status
FROM stores s
WHERE ST_DWITHIN(s.location, ST_POINT(?, ?), 10000)  -- 10km radius
  AND s.isActive = true
ORDER BY distance_meters ASC
LIMIT 20

Real Estate Property Search

-- Find properties near amenities
WITH user_location AS (
  SELECT ST_POINT(-122.4194, 37.7749) AS point
),
nearby_amenities AS (
  SELECT 
    p._id AS property_id,
    COUNT(CASE WHEN a.type = 'school' THEN 1 END) AS schools_nearby,
    COUNT(CASE WHEN a.type = 'grocery' THEN 1 END) AS groceries_nearby,
    COUNT(CASE WHEN a.type = 'transit' THEN 1 END) AS transit_nearby
  FROM properties p
  JOIN amenities a ON ST_DWITHIN(p.location, a.location, 1000)
  GROUP BY p._id
)
SELECT 
  p.address,
  p.price,
  p.bedrooms,
  p.bathrooms,
  ST_DISTANCE(p.location, ul.point) AS distance_to_user,
  na.schools_nearby,
  na.groceries_nearby,
  na.transit_nearby
FROM properties p
JOIN user_location ul ON ST_DWITHIN(p.location, ul.point, 5000)
LEFT JOIN nearby_amenities na ON p._id = na.property_id
WHERE p.price BETWEEN 500000 AND 800000
  AND p.bedrooms >= 2
ORDER BY 
  (na.schools_nearby + na.groceries_nearby + na.transit_nearby) DESC,
  distance_to_user ASC

IoT Sensor Network

// Sample IoT sensor document
{
  "_id": ObjectId("..."),
  "sensorId": "temp_001",
  "type": "temperature",
  "location": {
    "type": "Point", 
    "coordinates": [-122.4194, 37.7749]
  },
  "readings": [
    {
      "timestamp": ISODate("2025-08-20T10:00:00Z"),
      "value": 22.5,
      "unit": "celsius"
    }
  ],
  "battery": 87,
  "lastSeen": ISODate("2025-08-20T10:05:00Z")
}

Spatial analysis of sensor data:

-- Find sensors in a specific area with recent anomalous readings
SELECT 
  s.sensorId,
  s.type,
  s.battery,
  s.lastSeen,
  r.timestamp,
  r.value,
  ST_DISTANCE(
    s.location, 
    ST_POINT(-122.4200, 37.7750)
  ) AS distance_from_center
FROM sensors s
CROSS JOIN UNNEST(s.readings) AS r
WHERE ST_WITHIN(
  s.location,
  ST_BOX(
    ST_POINT(-122.4300, 37.7700),
    ST_POINT(-122.4100, 37.7800) 
  )
)
AND r.timestamp >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
AND (
  (s.type = 'temperature' AND (r.value < 0 OR r.value > 40)) OR
  (s.type = 'humidity' AND (r.value < 10 OR r.value > 90))
)
ORDER BY r.timestamp DESC

Performance Optimization

Spatial Query Optimization

-- Optimize queries by limiting search area first
SELECT 
  name,
  cuisine,
  ST_DISTANCE(location, ST_POINT(-122.4194, 37.7749)) AS exact_distance
FROM restaurants
WHERE 
  -- Use bounding box for initial filtering (uses index efficiently)
  ST_WITHIN(location, ST_BOX(
    ST_POINT(-122.4244, 37.7699),  -- Southwest
    ST_POINT(-122.4144, 37.7799)   -- Northeast  
  ))
  -- Then apply precise distance filter
  AND ST_DWITHIN(location, ST_POINT(-122.4194, 37.7749), 2000)
ORDER BY exact_distance ASC

Compound Index Strategy

-- Create indexes that support both spatial and attribute filtering
CREATE INDEX idx_restaurants_location_rating_cuisine
ON restaurants (location, rating, cuisine)
USING GEO2DSPHERE

-- Query that leverages the compound index
SELECT name, rating, cuisine
FROM restaurants  
WHERE ST_DWITHIN(location, ST_POINT(-122.4194, 37.7749), 3000)
  AND rating >= 4.0
  AND cuisine = 'Italian'

Data Import and Coordinate Systems

Converting Address to Coordinates

-- Geocoded restaurant data insertion
INSERT INTO restaurants (
  name,
  address, 
  location,
  cuisine
) VALUES (
  'Giuseppe''s Italian',
  '456 Columbus Ave, San Francisco, CA',
  ST_POINT(-122.4075, 37.7983),  -- Geocoded coordinates
  'Italian'
)

-- Bulk geocoding update for existing records
UPDATE restaurants 
SET location = ST_POINT(longitude, latitude)
WHERE location IS NULL
  AND longitude IS NOT NULL 
  AND latitude IS NOT NULL

Working with Different Coordinate Systems

-- Convert between coordinate systems (if needed)
SELECT 
  name,
  location AS wgs84_point,
  ST_TRANSFORM(location, 3857) AS web_mercator_point
FROM restaurants
WHERE name LIKE '%Pizza%'

Aggregation with Geospatial Data

Density Analysis

-- Analyze restaurant density by geographic grid
WITH grid_cells AS (
  SELECT 
    FLOOR((ST_X(location) + 122.45) * 100) AS grid_x,
    FLOOR((ST_Y(location) - 37.75) * 100) AS grid_y,
    COUNT(*) AS restaurant_count,
    AVG(rating) AS avg_rating
  FROM restaurants
  WHERE ST_WITHIN(location, ST_BOX(
    ST_POINT(-122.45, 37.75),
    ST_POINT(-122.40, 37.80)
  ))
  GROUP BY grid_x, grid_y
)
SELECT 
  grid_x,
  grid_y,
  restaurant_count,
  ROUND(avg_rating, 2) AS avg_rating
FROM grid_cells
WHERE restaurant_count >= 5
ORDER BY restaurant_count DESC

Service Coverage Analysis

-- Calculate total area covered by delivery services
SELECT 
  cuisine,
  COUNT(*) AS restaurant_count,
  SUM(ST_AREA(serviceArea)) AS total_coverage_sqm,
  AVG(deliveryFee) AS avg_delivery_fee
FROM restaurants
WHERE serviceArea IS NOT NULL
GROUP BY cuisine
HAVING COUNT(*) >= 3
ORDER BY total_coverage_sqm DESC

Combining QueryLeaf with MongoDB Geospatial Features

While QueryLeaf doesn't directly support spatial functions, you can combine SQL-style queries with MongoDB's native geospatial capabilities:

-- Use QueryLeaf for business logic and data filtering
SELECT 
  name,
  cuisine,
  rating,
  deliveryFee,
  estimatedDeliveryTime,
  location,
  isOpen,
  acceptingOrders
FROM restaurants
WHERE rating >= 4.0
  AND deliveryFee <= 5.00
  AND isOpen = true
  AND acceptingOrders = true
  AND location IS NOT NULL
ORDER BY rating DESC

Then apply MongoDB geospatial operators in a second step:

// Follow up with native MongoDB geospatial query
const candidateRestaurants = await queryLeaf.execute(sqlQuery);

// Filter by proximity using MongoDB's native operators
const nearbyRestaurants = await db.collection('restaurants').find({
  _id: { $in: candidateRestaurants.map(r => r._id) },
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [-122.4194, 37.7749] },
      $maxDistance: 2000  // 2km
    }
  }
}).toArray();

Best Practices for Geospatial Data

  1. Coordinate Order: Always use [longitude, latitude] order in GeoJSON
  2. Index Strategy: Create 2dsphere indexes on all spatial fields used in queries
  3. Query Optimization: Use bounding boxes for initial filtering before precise distance calculations
  4. Data Validation: Ensure coordinates are within valid ranges (-180 to 180 for longitude, -90 to 90 for latitude); a validation sketch follows this list
  5. Units Awareness: MongoDB distances are in meters by default
  6. Precision: Consider coordinate precision needs (6 decimal places ≈ 10cm accuracy)
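
As a sketch of the data-validation point above, a $jsonSchema validator can reject documents whose coordinates fall outside valid ranges. Collection name and field layout assume the sample restaurant documents shown earlier:

// Validator enforcing GeoJSON Point structure and coordinate ranges
db.runCommand({
  collMod: "restaurants",
  validator: {
    $jsonSchema: {
      properties: {
        location: {
          bsonType: "object",
          required: ["type", "coordinates"],
          properties: {
            type: { enum: ["Point"] },
            coordinates: {
              bsonType: "array",
              items: [
                { bsonType: "number", minimum: -180, maximum: 180 }, // longitude
                { bsonType: "number", minimum: -90, maximum: 90 }    // latitude
              ]
            }
          }
        }
      }
    }
  }
})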

Conclusion

Working with location data in MongoDB requires understanding both SQL-style data modeling and MongoDB's native geospatial capabilities. While QueryLeaf doesn't directly support spatial functions, applying SQL thinking to location data helps structure queries and optimize performance.

Key strategies for location-based applications:

  • Data Modeling: Store coordinates in both individual fields and GeoJSON format for flexibility
  • Query Patterns: Use SQL range queries for approximate location searches and coordinate validation
  • Hybrid Approach: Combine QueryLeaf's SQL capabilities with MongoDB's native geospatial operators
  • Performance: Leverage proper indexing strategies for both coordinate fields and GeoJSON data

Whether you're building delivery platforms, store locators, or IoT monitoring systems, understanding how to structure location queries gives you a solid foundation. You can start with SQL-style coordinate filtering using QueryLeaf, then enhance with MongoDB's powerful geospatial features when precise distance calculations and complex spatial relationships are needed.

The combination of familiar SQL patterns with MongoDB's document flexibility and native geospatial capabilities provides the tools needed for sophisticated location-aware applications that scale effectively.

MongoDB Transactions and ACID Operations: SQL-Style Data Consistency

One of the most significant differences between traditional SQL databases and MongoDB has historically been transaction support. While MongoDB has supported ACID properties within single documents since its inception, multi-document transactions were only introduced in version 4.0, with cross-shard support added in version 4.2.

Understanding how to implement robust transactional patterns in MongoDB using SQL-style syntax ensures your applications maintain data consistency while leveraging document database flexibility.

The Transaction Challenge

Consider a financial application where you need to transfer money between accounts. This operation requires updating multiple documents atomically - if any part fails, the entire operation must be rolled back.

Traditional SQL makes this straightforward:

BEGIN TRANSACTION;

UPDATE accounts 
SET balance = balance - 100 
WHERE account_id = 'account_001';

UPDATE accounts 
SET balance = balance + 100 
WHERE account_id = 'account_002';

INSERT INTO transaction_log (from_account, to_account, amount, timestamp)
VALUES ('account_001', 'account_002', 100, NOW());

COMMIT;

In MongoDB, the same operation requires explicit session and transaction management in application code:

// Multi-document transaction using the MongoDB driver's session API
const session = client.startSession();

try {
  await session.withTransaction(async () => {
    const accounts = db.collection('accounts');
    const logs = db.collection('transaction_log');

    // Check source account balance
    const sourceAccount = await accounts.findOne(
      { account_id: 'account_001' }, 
      { session }
    );

    if (!sourceAccount || sourceAccount.balance < 100) {
      throw new Error('Insufficient funds');
    }

    // Update accounts
    await accounts.updateOne(
      { account_id: 'account_001' },
      { $inc: { balance: -100 } },
      { session }
    );

    await accounts.updateOne(
      { account_id: 'account_002' },
      { $inc: { balance: 100 } },
      { session }
    );

    // Log transaction
    await logs.insertOne({
      from_account: 'account_001',
      to_account: 'account_002', 
      amount: 100,
      timestamp: new Date()
    }, { session });
  });
} finally {
  await session.endSession();
}

SQL-Style Transaction Syntax

Using SQL patterns makes transaction handling much more intuitive:

-- Begin transaction
BEGIN TRANSACTION;

-- Verify sufficient funds
SELECT balance 
FROM accounts 
WHERE account_id = 'account_001' 
  AND balance >= 100;

-- Update accounts atomically
UPDATE accounts 
SET balance = balance - 100,
    last_modified = CURRENT_TIMESTAMP
WHERE account_id = 'account_001';

UPDATE accounts 
SET balance = balance + 100,
    last_modified = CURRENT_TIMESTAMP  
WHERE account_id = 'account_002';

-- Create audit trail
INSERT INTO transaction_log (
  transaction_id,
  from_account, 
  to_account, 
  amount,
  transaction_type,
  timestamp,
  status
) VALUES (
  'txn_' + RANDOM_UUID(),
  'account_001',
  'account_002', 
  100,
  'transfer',
  CURRENT_TIMESTAMP,
  'completed'
);

-- Commit the transaction
COMMIT;

Transaction Isolation Levels

MongoDB's transaction read and write concerns map loosely to familiar SQL isolation levels:
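
Under the hood, these SQL-style isolation choices translate to read and write concern options on the MongoDB transaction. A minimal Node.js driver sketch (database, collection, and field names are illustrative):

// Run a transaction with snapshot read concern and majority write concern
const session = client.startSession();
try {
  await session.withTransaction(
    async () => {
      const accounts = client.db("bank").collection("accounts");
      // All reads inside the transaction see one consistent snapshot
      await accounts.find({ customer_id: "cust_123" }, { session }).toArray();
    },
    {
      readConcern: { level: "snapshot" },
      writeConcern: { w: "majority" },
      readPreference: "primary"
    }
  );
} finally {
  await session.endSession();
}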

Read Uncommitted

-- Set transaction isolation
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

BEGIN TRANSACTION;

-- This might read uncommitted data from other transactions
SELECT SUM(balance) FROM accounts 
WHERE account_type = 'checking';

COMMIT;

Read Committed (Default)

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

BEGIN TRANSACTION;

-- Only sees data committed before transaction started
SELECT account_id, balance, last_modified
FROM accounts 
WHERE customer_id = 'cust_123'
ORDER BY last_modified DESC;

COMMIT;

Snapshot Isolation

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;

-- Consistent snapshot of data throughout transaction
SELECT 
  c.customer_name,
  c.email,
  SUM(a.balance) AS total_balance,
  COUNT(a.account_id) AS account_count
FROM customers c
JOIN accounts a ON c.customer_id = a.customer_id
WHERE c.status = 'active'
GROUP BY c.customer_id, c.customer_name, c.email
HAVING SUM(a.balance) > 10000;

COMMIT;

Complex Business Workflows

E-commerce Order Processing

Consider placing an order that involves inventory management, payment processing, and order creation:

BEGIN TRANSACTION;

-- Verify product availability
SELECT 
  p.product_id,
  p.name,
  p.price,
  i.quantity_available,
  i.reserved_quantity
FROM products p
JOIN inventory i ON p.product_id = i.product_id  
WHERE p.product_id IN ('prod_001', 'prod_002')
  AND i.quantity_available >= CASE p.product_id 
    WHEN 'prod_001' THEN 2
    WHEN 'prod_002' THEN 1
    ELSE 0
  END;

-- Reserve inventory
UPDATE inventory
SET reserved_quantity = reserved_quantity + 2,
    quantity_available = quantity_available - 2,
    last_updated = CURRENT_TIMESTAMP
WHERE product_id = 'prod_001';

UPDATE inventory  
SET reserved_quantity = reserved_quantity + 1,
    quantity_available = quantity_available - 1,
    last_updated = CURRENT_TIMESTAMP
WHERE product_id = 'prod_002';

-- Create order
INSERT INTO orders (
  order_id,
  customer_id,
  order_date,
  status,
  total_amount,
  payment_status,
  items
) VALUES (
  'order_' + RANDOM_UUID(),
  'cust_456',
  CURRENT_TIMESTAMP,
  'pending_payment',
  359.97,
  'processing',
  JSON_ARRAY(
    JSON_OBJECT(
      'product_id', 'prod_001',
      'quantity', 2,
      'price', 149.99
    ),
    JSON_OBJECT(
      'product_id', 'prod_002', 
      'quantity', 1,
      'price', 59.99
    )
  )
);

-- Process payment
INSERT INTO payments (
  payment_id,
  order_id,
  customer_id,
  amount,
  payment_method,
  status,
  processed_at
) VALUES (
  'pay_' + RANDOM_UUID(),
  LAST_INSERT_ID(),
  'cust_456',
  359.97,
  'credit_card',
  'completed',
  CURRENT_TIMESTAMP
);

-- Update order status
UPDATE orders
SET status = 'confirmed',
    payment_status = 'completed',
    confirmed_at = CURRENT_TIMESTAMP
WHERE order_id = LAST_INSERT_ID();

COMMIT;

Handling Transaction Failures

BEGIN TRANSACTION;

-- Savepoint for partial rollback
SAVEPOINT before_payment;

UPDATE accounts
SET balance = balance - 500
WHERE account_id = 'checking_001';

-- Attempt payment processing
INSERT INTO payment_attempts (
  account_id,
  amount, 
  merchant,
  attempt_time,
  status
) VALUES (
  'checking_001',
  500,
  'ACME Store',
  CURRENT_TIMESTAMP,
  'processing'
);

-- Check if payment succeeded (simulated)
SELECT status FROM payment_gateway 
WHERE transaction_ref = LAST_INSERT_ID();

-- If payment failed, rollback to savepoint
-- ROLLBACK TO SAVEPOINT before_payment;

-- If successful, complete the transaction
UPDATE payment_attempts
SET status = 'completed',
    completed_at = CURRENT_TIMESTAMP
WHERE transaction_ref = LAST_INSERT_ID();

COMMIT;

Multi-Collection Consistency Patterns

Master-Detail Relationships

Maintain consistency between header and detail records:

// Sample order document structure
{
  "_id": ObjectId("..."),
  "order_id": "order_12345",
  "customer_id": "cust_456", 
  "order_date": ISODate("2025-08-19"),
  "status": "pending",
  "total_amount": 0,  // Calculated from items
  "item_count": 0,    // Calculated from items
  "last_modified": ISODate("2025-08-19")
}

// Order items in separate collection
{
  "_id": ObjectId("..."),
  "order_id": "order_12345",
  "line_number": 1,
  "product_id": "prod_001",
  "quantity": 2,
  "unit_price": 149.99,
  "line_total": 299.98
}

Update both collections atomically:

BEGIN TRANSACTION;

-- Insert order header
INSERT INTO orders (
  order_id,
  customer_id,
  order_date,
  status,
  total_amount,
  item_count
) VALUES (
  'order_12345',
  'cust_456', 
  CURRENT_TIMESTAMP,
  'pending',
  0,
  0
);

-- Insert order items
INSERT INTO order_items (
  order_id,
  line_number,
  product_id,
  quantity,
  unit_price,
  line_total
) VALUES 
  ('order_12345', 1, 'prod_001', 2, 149.99, 299.98),
  ('order_12345', 2, 'prod_002', 1, 59.99, 59.99);

-- Update order totals
UPDATE orders
SET total_amount = (
  SELECT SUM(line_total) 
  FROM order_items 
  WHERE order_id = 'order_12345'
),
item_count = (
  SELECT SUM(quantity)
  FROM order_items
  WHERE order_id = 'order_12345'  
),
last_modified = CURRENT_TIMESTAMP
WHERE order_id = 'order_12345';

COMMIT;

Performance Optimization for Transactions

Transaction Scope Minimization

Keep transactions short and focused:

-- Good: Minimal transaction scope
BEGIN TRANSACTION;

UPDATE inventory 
SET quantity = quantity - 1
WHERE product_id = 'prod_001'
  AND quantity > 0;

INSERT INTO reservations (product_id, customer_id, reserved_at)
VALUES ('prod_001', 'cust_123', CURRENT_TIMESTAMP);

COMMIT;

-- Avoid: Long-running transactions
-- BEGIN TRANSACTION;
-- Complex calculations...
-- External API calls...
-- COMMIT;

Batching Operations

Group related operations efficiently:

BEGIN TRANSACTION;

-- Batch inventory updates
UPDATE inventory 
SET quantity = CASE product_id
  WHEN 'prod_001' THEN quantity - 2
  WHEN 'prod_002' THEN quantity - 1
  WHEN 'prod_003' THEN quantity - 3
  ELSE quantity
END,
reserved = reserved + CASE product_id
  WHEN 'prod_001' THEN 2
  WHEN 'prod_002' THEN 1  
  WHEN 'prod_003' THEN 3
  ELSE 0
END
WHERE product_id IN ('prod_001', 'prod_002', 'prod_003');

-- Batch order item insertion
INSERT INTO order_items (order_id, product_id, quantity, price)
VALUES 
  ('order_456', 'prod_001', 2, 29.99),
  ('order_456', 'prod_002', 1, 49.99),
  ('order_456', 'prod_003', 3, 19.99);

COMMIT;

Error Handling and Retry Logic

Transient Error Recovery

-- Implement retry logic for write conflicts
RETRY_TRANSACTION: BEGIN TRANSACTION;

-- Critical business operation
UPDATE accounts
SET balance = balance - CASE 
  WHEN account_type = 'checking' THEN 100
  WHEN account_type = 'savings' THEN 95  -- Fee discount
  ELSE 105  -- Premium fee
END,
transaction_count = transaction_count + 1,
last_transaction_date = CURRENT_TIMESTAMP
WHERE customer_id = 'cust_789'
  AND account_status = 'active'
  AND balance >= 100;

-- Verify update succeeded
SELECT ROW_COUNT() AS updated_rows;

-- Create transaction record
INSERT INTO account_transactions (
  transaction_id,
  customer_id,
  transaction_type,
  amount,
  balance_after,
  processed_at
) 
SELECT 
  'txn_' + RANDOM_UUID(),
  'cust_789',
  'withdrawal',
  100,
  balance,
  CURRENT_TIMESTAMP
FROM accounts 
WHERE customer_id = 'cust_789'
  AND account_type = 'checking';

-- If write conflict occurs, retry with exponential backoff
-- ON WRITE_CONFLICT RETRY RETRY_TRANSACTION AFTER DELAY(RANDOM() * 1000);

COMMIT;
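
At the driver level, MongoDB labels recoverable failures with the TransientTransactionError label, and the Node.js driver's withTransaction helper already retries the callback for you. Below is a minimal sketch of an explicit retry loop for cases where you manage transactions manually; the backoff values are illustrative:

// Manual retry loop for transient transaction errors
async function runWithRetry(client, txnFn, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const session = client.startSession();
    try {
      session.startTransaction({ writeConcern: { w: "majority" } });
      await txnFn(session);
      await session.commitTransaction();
      return;
    } catch (err) {
      try { await session.abortTransaction(); } catch (abortErr) { /* already aborted */ }
      // Only retry errors MongoDB marks as transient
      if (err.hasErrorLabel && err.hasErrorLabel("TransientTransactionError") && attempt < maxAttempts) {
        await new Promise(r => setTimeout(r, 100 * 2 ** attempt)); // exponential backoff
        continue;
      }
      throw err;
    } finally {
      await session.endSession();
    }
  }
}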

Advanced Transaction Patterns

Compensating Transactions

Implement saga patterns for distributed operations:

-- Order placement saga
BEGIN TRANSACTION 'order_placement_saga';

-- Step 1: Reserve inventory
INSERT INTO saga_steps (
  saga_id,
  step_name, 
  operation_type,
  compensation_sql,
  status
) VALUES (
  'saga_order_123',
  'reserve_inventory',
  'UPDATE',
  'UPDATE inventory SET reserved = reserved - 2 WHERE product_id = ''prod_001''',
  'pending'
);

UPDATE inventory 
SET reserved = reserved + 2
WHERE product_id = 'prod_001';

-- Step 2: Process payment
INSERT INTO saga_steps (
  saga_id,
  step_name,
  operation_type, 
  compensation_sql,
  status
) VALUES (
  'saga_order_123',
  'process_payment',
  'INSERT',
  'DELETE FROM payments WHERE payment_id = ''pay_456''',
  'pending'
);

INSERT INTO payments (payment_id, amount, status)
VALUES ('pay_456', 199.98, 'processed');

-- Step 3: Create order
INSERT INTO orders (order_id, customer_id, status, total_amount)
VALUES ('order_123', 'cust_456', 'confirmed', 199.98);

-- Mark saga as completed
UPDATE saga_steps 
SET status = 'completed'
WHERE saga_id = 'saga_order_123';

COMMIT;

Read-Only Transactions for Analytics

Ensure consistent reporting across multiple collections:

-- Consistent financial reporting
BEGIN TRANSACTION READ ONLY;

-- Get snapshot timestamp
SELECT CURRENT_TIMESTAMP AS report_timestamp;

-- Account balances
SELECT 
  account_type,
  COUNT(*) AS account_count,
  SUM(balance) AS total_balance,
  AVG(balance) AS average_balance
FROM accounts
WHERE status = 'active'
GROUP BY account_type;

-- Transaction volume
SELECT 
  DATE(transaction_date) AS date,
  transaction_type,
  COUNT(*) AS transaction_count,
  SUM(amount) AS total_amount
FROM transactions
WHERE transaction_date >= CURRENT_DATE - INTERVAL '30 days'
  AND status = 'completed'
GROUP BY DATE(transaction_date), transaction_type
ORDER BY date DESC, transaction_type;

-- Customer activity
SELECT
  c.customer_segment,
  COUNT(DISTINCT t.customer_id) AS active_customers,
  AVG(t.amount) AS avg_transaction_amount
FROM customers c
JOIN transactions t ON c.customer_id = t.customer_id  
WHERE t.transaction_date >= CURRENT_DATE - INTERVAL '30 days'
  AND t.status = 'completed'
GROUP BY c.customer_segment;

COMMIT;

MongoDB-Specific Transaction Features

Working with Sharded Collections

-- Cross-shard transaction
BEGIN TRANSACTION;

-- Update documents across multiple shards
UPDATE user_profiles
SET last_login = CURRENT_TIMESTAMP,
    login_count = login_count + 1
WHERE user_id = 'user_123';  -- Shard key

UPDATE user_activity_log
SET login_events = ARRAY_APPEND(
  login_events,
  JSON_OBJECT(
    'timestamp', CURRENT_TIMESTAMP,
    'ip_address', '192.168.1.1',
    'user_agent', 'Mozilla/5.0...'
  )
)
WHERE user_id = 'user_123';  -- Same shard key

COMMIT;

Time-Based Data Operations

-- Session cleanup transaction
BEGIN TRANSACTION;

-- Archive expired sessions
INSERT INTO archived_sessions
SELECT * FROM active_sessions
WHERE expires_at < CURRENT_TIMESTAMP;

-- Remove expired sessions  
DELETE FROM active_sessions
WHERE expires_at < CURRENT_TIMESTAMP;

-- Update session statistics
UPDATE session_stats
SET expired_count = expired_count + ROW_COUNT(),
    last_cleanup = CURRENT_TIMESTAMP
WHERE date = CURRENT_DATE;

COMMIT;

QueryLeaf Transaction Integration

QueryLeaf provides seamless transaction support, automatically handling MongoDB session management and translating SQL transaction syntax:

-- QueryLeaf handles session lifecycle automatically
BEGIN TRANSACTION;

-- Complex business logic with joins and aggregations
WITH customer_orders AS (
  SELECT 
    c.customer_id,
    c.customer_tier,
    SUM(o.total_amount) AS total_spent,
    COUNT(o.order_id) AS order_count
  FROM customers c
  JOIN orders o ON c.customer_id = o.customer_id
  WHERE o.order_date >= '2025-01-01'
    AND o.status = 'completed'
  GROUP BY c.customer_id, c.customer_tier
  HAVING SUM(o.total_amount) > 1000
)
UPDATE customers
SET customer_tier = CASE
  WHEN co.total_spent > 5000 THEN 'platinum'
  WHEN co.total_spent > 2500 THEN 'gold'  
  WHEN co.total_spent > 1000 THEN 'silver'
  ELSE customer_tier
END,
tier_updated_at = CURRENT_TIMESTAMP
FROM customer_orders co
WHERE customers.customer_id = co.customer_id;

-- Insert tier change log
INSERT INTO tier_changes (
  customer_id,
  old_tier,
  new_tier, 
  change_reason,
  changed_at
)
SELECT 
  c.customer_id,
  c.previous_tier,
  c.customer_tier,
  'purchase_volume',
  CURRENT_TIMESTAMP
FROM customers c
WHERE c.tier_updated_at = CURRENT_TIMESTAMP;

COMMIT;

QueryLeaf automatically optimizes transaction boundaries, manages MongoDB sessions, and provides proper error handling and retry logic.

Best Practices for MongoDB Transactions

  1. Keep Transactions Short: Minimize transaction duration to reduce lock contention
  2. Use Appropriate Isolation: Choose the right isolation level for your use case
  3. Handle Write Conflicts: Implement retry logic for transient errors
  4. Optimize Document Structure: Design schemas to minimize cross-document transactions
  5. Monitor Performance: Track transaction metrics and identify bottlenecks
  6. Test Failure Scenarios: Ensure your application handles rollbacks correctly

Conclusion

MongoDB's transaction support, combined with SQL-style syntax, provides robust ACID guarantees while maintaining document database flexibility. Understanding how to structure transactions effectively ensures your applications maintain data consistency across complex business operations.

Key benefits of SQL-style transaction management:

  • Familiar Patterns: Use well-understood SQL transaction syntax
  • Clear Semantics: Explicit transaction boundaries and error handling
  • Cross-Document Consistency: Maintain data integrity across collections
  • Business Logic Clarity: Express complex workflows in readable SQL
  • Performance Control: Fine-tune transaction scope and isolation levels

Whether you're building financial applications, e-commerce platforms, or complex business workflows, proper transaction management is essential for data integrity. With QueryLeaf's SQL-to-MongoDB translation, you can leverage familiar transaction patterns while taking advantage of MongoDB's document model flexibility.

The combination of MongoDB's ACID transaction support with SQL's expressive transaction syntax creates a powerful foundation for building reliable, scalable applications that maintain data consistency without sacrificing performance or development productivity.

MongoDB Text Search and Full-Text Indexing: SQL-Style Search Queries

Building search functionality in MongoDB can be complex when working with the native operators. While MongoDB's $text and $regex operators are powerful, implementing comprehensive search features often requires understanding multiple MongoDB-specific concepts and syntax patterns.

Using SQL-style search queries makes text search more intuitive and maintainable, especially for teams familiar with traditional database search patterns.

The Text Search Challenge

Consider a content management system with articles, products, and user profiles. Traditional MongoDB text search involves multiple operators and complex aggregation pipelines:

// Sample article document
{
  "_id": ObjectId("..."),
  "title": "Getting Started with MongoDB Indexing",
  "content": "MongoDB provides several types of indexes to optimize query performance. Understanding compound indexes, text indexes, and partial indexes is crucial for building scalable applications.",
  "author": "Jane Developer",
  "category": "Database",
  "tags": ["mongodb", "indexing", "performance", "databases"],
  "publishDate": ISODate("2025-08-15"),
  "status": "published",
  "wordCount": 1250,
  "readTime": 5
}

Native MongoDB search requires multiple approaches:

// Basic text search
db.articles.find({
  $text: {
    $search: "mongodb indexing performance"
  }
})

// Complex search with multiple conditions
db.articles.find({
  $and: [
    { $text: { $search: "mongodb indexing" } },
    { status: "published" },
    { category: "Database" },
    { publishDate: { $gte: ISODate("2025-01-01") } }
  ]
}).sort({ score: { $meta: "textScore" } })

// Regex-based partial matches
db.articles.find({
  $or: [
    { title: { $regex: "mongodb", $options: "i" } },
    { content: { $regex: "mongodb", $options: "i" } }
  ]
})

The same searches become much more readable with SQL syntax:

-- Basic full-text search
SELECT title, author, publishDate, 
       MATCH_SCORE(title, content) AS relevance
FROM articles
WHERE MATCH(title, content) AGAINST ('mongodb indexing performance')
  AND status = 'published'
ORDER BY relevance DESC

-- Advanced search with multiple criteria
SELECT title, author, category, readTime,
       MATCH_SCORE(title, content) AS score
FROM articles  
WHERE MATCH(title, content) AGAINST ('mongodb indexing')
  AND category = 'Database'
  AND publishDate >= '2025-01-01'
  AND status = 'published'
ORDER BY score DESC, publishDate DESC

Setting Up Text Indexes

Before performing text searches, you need appropriate indexes. Here's how to create them:

Basic Text Index

-- Create text index on multiple fields
CREATE TEXT INDEX idx_articles_search 
ON articles (title, content)

MongoDB equivalent:

db.articles.createIndex({ 
  title: "text", 
  content: "text" 
})

Weighted Text Index

Give different importance to various fields:

-- Create weighted text index
CREATE TEXT INDEX idx_articles_weighted_search 
ON articles (title, content, tags)
WITH WEIGHTS (title: 10, content: 5, tags: 1)

MongoDB syntax:

db.articles.createIndex(
  { title: "text", content: "text", tags: "text" },
  { weights: { title: 10, content: 5, tags: 1 } }
)

Language-Specific Text Index

-- Create text index with language specification
CREATE TEXT INDEX idx_articles_english_search 
ON articles (title, content)
WITH LANGUAGE 'english'

MongoDB equivalent:

db.articles.createIndex(
  { title: "text", content: "text" },
  { default_language: "english" }
)

Search Query Patterns

-- Search for exact phrases
SELECT title, author, MATCH_SCORE(title, content) AS score
FROM articles
WHERE MATCH(title, content) AGAINST ('"compound indexes"')
  AND status = 'published'
ORDER BY score DESC

Boolean Search Operations

-- Advanced boolean search
SELECT title, author, category
FROM articles
WHERE MATCH(title, content) AGAINST ('mongodb +indexing -aggregation')
  AND status = 'published'

-- Search with OR conditions
SELECT title, author
FROM articles  
WHERE MATCH(title, content) AGAINST ('indexing OR performance OR optimization')
  AND category IN ('Database', 'Performance')

Case-Insensitive Pattern Matching

-- Partial string matching
SELECT title, author, category
FROM articles
WHERE title ILIKE '%mongodb%'
   OR content ILIKE '%mongodb%'
   OR ARRAY_TO_STRING(tags, ' ') ILIKE '%mongodb%'

-- Using REGEX for complex patterns
SELECT title, author
FROM articles
WHERE title REGEX '(?i)mongo.*db'
   OR content REGEX '(?i)index(ing|es)?'

Advanced Search Features

Search with Aggregations

Combine text search with analytical queries:

-- Search results with category breakdown
SELECT 
  category,
  COUNT(*) AS articleCount,
  AVG(MATCH_SCORE(title, content)) AS avgRelevance,
  AVG(readTime) AS avgReadTime
FROM articles
WHERE MATCH(title, content) AGAINST ('mongodb performance')
  AND status = 'published'
  AND publishDate >= '2025-01-01'
GROUP BY category
ORDER BY avgRelevance DESC
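
The corresponding native pipeline pairs a $text match with a $group stage. A rough sketch of how the category breakdown above might be expressed directly in MongoDB:

// Text search followed by per-category aggregation
db.articles.aggregate([
  {
    $match: {
      $text: { $search: "mongodb performance" },
      status: "published",
      publishDate: { $gte: ISODate("2025-01-01") }
    }
  },
  // Expose the text score so it can be averaged per category
  { $project: { category: 1, readTime: 1, relevance: { $meta: "textScore" } } },
  {
    $group: {
      _id: "$category",
      articleCount: { $sum: 1 },
      avgRelevance: { $avg: "$relevance" },
      avgReadTime: { $avg: "$readTime" }
    }
  },
  { $sort: { avgRelevance: -1 } }
])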

Search with JOIN Operations

-- Search articles with author information
SELECT 
  a.title,
  a.publishDate,
  u.name AS authorName,
  u.expertise,
  MATCH_SCORE(a.title, a.content) AS relevance
FROM articles a
JOIN users u ON a.author = u.username
WHERE MATCH(a.title, a.content) AGAINST ('indexing strategies')
  AND a.status = 'published'
  AND u.isActive = true
ORDER BY relevance DESC, a.publishDate DESC

Faceted Search Results

-- Get search results with facet counts
WITH search_results AS (
  SELECT *,
         MATCH_SCORE(title, content) AS score
  FROM articles
  WHERE MATCH(title, content) AGAINST ('mongodb optimization')
    AND status = 'published'
)
SELECT 
  'results' AS type,
  COUNT(*) AS count,
  JSON_AGG(
    JSON_BUILD_OBJECT(
      'title', title,
      'author', author,
      'category', category,
      'score', score
    )
  ) AS data
FROM search_results
WHERE score > 0.5

UNION ALL

SELECT 
  'categories' AS type,
  COUNT(*) AS count,
  JSON_AGG(
    JSON_BUILD_OBJECT(
      'category', category,
      'count', category_count
    )
  ) AS data
FROM (
  SELECT category, COUNT(*) AS category_count
  FROM search_results
  GROUP BY category
) category_facets

Performance Optimization

Create compound indexes that support both search and filtering:

-- Compound index for search + filtering
CREATE INDEX idx_articles_search_filter 
ON articles (status, category, publishDate)

-- Combined with text index for optimal performance
CREATE TEXT INDEX idx_articles_content_search
ON articles (title, content)

Search Result Pagination

-- Efficient pagination for search results
SELECT title, author, publishDate,
       MATCH_SCORE(title, content) AS score
FROM articles
WHERE MATCH(title, content) AGAINST ('mongodb tutorial')
  AND status = 'published'
ORDER BY score DESC, _id ASC
LIMIT 20 OFFSET 40
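
A sketch of how this pagination maps onto MongoDB's native text search, projecting and sorting by the text score before applying skip and limit:

// Page 3 of results (20 per page), ranked by text score
db.articles.find(
  { $text: { $search: "mongodb tutorial" }, status: "published" },
  { title: 1, author: 1, publishDate: 1, score: { $meta: "textScore" } }
)
  .sort({ score: { $meta: "textScore" }, _id: 1 })
  .skip(40)
  .limit(20)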

Search Performance Analysis

-- Analyze search query performance
EXPLAIN ANALYZE
SELECT title, author, MATCH_SCORE(title, content) AS score
FROM articles
WHERE MATCH(title, content) AGAINST ('performance optimization')
  AND category = 'Database'
  AND publishDate >= '2025-01-01'
ORDER BY score DESC
LIMIT 10
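
Natively, the same analysis comes from explain with executionStats, which shows whether the text index and supporting filter indexes were used:

// Inspect index usage and execution timing for the search
db.articles.find({
  $text: { $search: "performance optimization" },
  category: "Database",
  publishDate: { $gte: ISODate("2025-01-01") }
}).explain("executionStats")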

Real-World Search Implementation

// Sample product document
{
  "_id": ObjectId("..."),
  "name": "MacBook Pro 16-inch M3",
  "description": "Powerful laptop with M3 chip, perfect for development and creative work",
  "brand": "Apple",
  "category": "Laptops",
  "subcategory": "Professional",
  "price": 2499.99,
  "features": ["M3 chip", "16GB RAM", "1TB SSD", "Liquid Retina Display"],
  "tags": ["laptop", "apple", "macbook", "professional", "development"],
  "inStock": true,
  "rating": 4.8,
  "reviewCount": 1247
}

Comprehensive product search query:

SELECT 
  p.name,
  p.brand,
  p.price,
  p.rating,
  p.reviewCount,
  MATCH_SCORE(p.name, p.description) AS textScore,
  -- Boost score based on rating and reviews
  (MATCH_SCORE(p.name, p.description) * 0.7 + 
   (p.rating / 5.0) * 0.2 + 
   LOG(p.reviewCount + 1) * 0.1) AS finalScore
FROM products p
WHERE MATCH(p.name, p.description) AGAINST ('macbook pro development')
  AND p.inStock = true
  AND p.price BETWEEN 1000 AND 5000
  AND p.rating >= 4.0
ORDER BY finalScore DESC, p.reviewCount DESC
LIMIT 20

Content Discovery System

-- Find related articles based on search terms and user preferences
WITH user_interests AS (
  SELECT UNNEST(interests) AS interest
  FROM users 
  WHERE _id = ?
),
search_matches AS (
  SELECT 
    a.*,
    MATCH_SCORE(a.title, a.content) AS textScore
  FROM articles a
  WHERE MATCH(a.title, a.content) AGAINST (?)
    AND a.status = 'published'
    AND a.publishDate >= CURRENT_DATE - INTERVAL '90 days'
)
SELECT 
  s.title,
  s.author,
  s.category,
  s.publishDate,
  s.readTime,
  s.textScore,
  -- Boost articles matching user interests
  CASE 
    WHEN s.category IN (SELECT interest FROM user_interests) THEN s.textScore * 1.5
    WHEN EXISTS (
      SELECT 1 FROM user_interests ui 
      WHERE s.tags @> ARRAY[ui.interest]
    ) THEN s.textScore * 1.2
    ELSE s.textScore
  END AS personalizedScore
FROM search_matches s
ORDER BY personalizedScore DESC, s.publishDate DESC
LIMIT 15

Multi-Language Search Support

Language Detection and Indexing

-- Create language-specific indexes
CREATE TEXT INDEX idx_articles_english 
ON articles (title, content) 
WHERE language = 'english'
WITH LANGUAGE 'english'

CREATE TEXT INDEX idx_articles_spanish 
ON articles (title, content) 
WHERE language = 'spanish'
WITH LANGUAGE 'spanish'

Multi-Language Search Query

-- Search across multiple languages
SELECT 
  title,
  author,
  language,
  MATCH_SCORE(title, content) AS score
FROM articles
WHERE (
  (language = 'english' AND MATCH(title, content) AGAINST ('database performance'))
  OR 
  (language = 'spanish' AND MATCH(title, content) AGAINST ('rendimiento base datos'))
)
AND status = 'published'
ORDER BY score DESC

Search Analytics and Insights

Search Term Analysis

-- Analyze popular search terms (from search logs)
SELECT 
  searchTerm,
  COUNT(*) AS searchCount,
  AVG(resultCount) AS avgResults,
  AVG(clickThroughRate) AS avgCTR
FROM search_logs
WHERE searchDate >= CURRENT_DATE - INTERVAL '30 days'
  AND resultCount > 0
GROUP BY searchTerm
HAVING COUNT(*) >= 10
ORDER BY searchCount DESC, avgCTR DESC
LIMIT 20

Content Gap Analysis

-- Find search terms with low result counts
SELECT 
  sl.searchTerm,
  COUNT(*) AS searchFrequency,
  AVG(sl.resultCount) AS avgResultCount
FROM search_logs sl
WHERE sl.searchDate >= CURRENT_DATE - INTERVAL '30 days'
  AND sl.resultCount < 5
GROUP BY sl.searchTerm
HAVING COUNT(*) >= 5
ORDER BY searchFrequency DESC

QueryLeaf Integration

When using QueryLeaf for MongoDB text search, you gain several advantages:

-- QueryLeaf automatically optimizes this complex search query
SELECT 
  a.title,
  a.author,
  a.publishDate,
  u.name AS authorFullName,
  u.expertise,
  MATCH_SCORE(a.title, a.content) AS relevance,
  -- Complex scoring with user engagement metrics
  (MATCH_SCORE(a.title, a.content) * 0.6 + 
   LOG(a.viewCount + 1) * 0.2 + 
   a.socialShares * 0.2) AS engagementScore
FROM articles a
JOIN users u ON a.author = u.username
WHERE MATCH(a.title, a.content) AGAINST ('mongodb indexing performance optimization')
  AND a.status = 'published'
  AND a.publishDate >= '2025-01-01'
  AND u.isActive = true
  AND a.category IN ('Database', 'Performance', 'Tutorial')
ORDER BY engagementScore DESC, a.publishDate DESC
LIMIT 25

QueryLeaf handles the complex MongoDB aggregation pipeline generation, text index utilization, and query optimization automatically.

Best Practices for MongoDB Text Search

  1. Index Strategy: Create appropriate text indexes for your search fields
  2. Query Optimization: Use compound indexes to support filtering alongside text search
  3. Result Ranking: Implement scoring algorithms that consider relevance and business metrics
  4. Performance Monitoring: Regularly analyze search query performance and user behavior
  5. Content Quality: Maintain good content structure to improve search effectiveness

Conclusion

MongoDB's text search capabilities are powerful, but SQL-style queries make them much more accessible and maintainable. By using familiar SQL patterns, you can build sophisticated search functionality that performs well and is easy to understand.

Key benefits of SQL-style text search:

  • Intuitive query syntax for complex search operations
  • Easy integration of search with business logic and analytics
  • Better performance through optimized query planning
  • Simplified maintenance and debugging of search functionality

Whether you're building content discovery systems, e-commerce product search, or knowledge management platforms, SQL-style text search queries provide the clarity and power needed to create effective search experiences.

With QueryLeaf, you can leverage MongoDB's document flexibility while maintaining the search query patterns your team already knows, creating the best of both worlds for modern applications.

MongoDB Schema Design Patterns: Building Scalable Document Structures

MongoDB's flexible document model offers freedom from rigid table schemas, but this flexibility can be overwhelming. Unlike SQL databases with normalized tables, MongoDB requires careful consideration of how to structure documents to balance query performance, data consistency, and application scalability.

Understanding proven schema design patterns helps you leverage MongoDB's strengths while avoiding common pitfalls that can hurt performance and maintainability.

The Schema Design Challenge

Consider an e-commerce application with users, orders, and products. In SQL, you'd normalize this into separate tables:

-- SQL normalized approach
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255) UNIQUE,
  name VARCHAR(255),
  address_street VARCHAR(255),
  address_city VARCHAR(255),
  address_country VARCHAR(255)
);

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  order_date TIMESTAMP,
  total_amount DECIMAL(10,2),
  status VARCHAR(50)
);

CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INTEGER REFERENCES orders(id),
  product_id INTEGER REFERENCES products(id),
  quantity INTEGER,
  price DECIMAL(10,2)
);

In MongoDB, you have multiple design options, each with different tradeoffs. Let's explore the main patterns.

Pattern 1: Embedding (Denormalization)

Embedding stores related data within a single document, reducing the need for joins.

// Embedded approach - Order with items embedded
{
  "_id": ObjectId("..."),
  "userId": ObjectId("..."),
  "userEmail": "john@example.com",
  "userName": "John Smith",
  "orderDate": ISODate("2025-08-17"),
  "status": "completed",
  "shippingAddress": {
    "street": "123 Main St",
    "city": "Seattle",
    "state": "WA",
    "zipCode": "98101",
    "country": "USA"
  },
  "items": [
    {
      "productId": ObjectId("..."),
      "name": "MacBook Pro",
      "price": 1299.99,
      "quantity": 1,
      "category": "Electronics"
    },
    {
      "productId": ObjectId("..."),
      "name": "USB-C Cable",
      "price": 19.99,
      "quantity": 2,
      "category": "Accessories"
    }
  ],
  "totalAmount": 1339.97
}

Benefits of Embedding:

  • Single Query Performance: Retrieve all related data in one operation
  • Atomic Updates: MongoDB guarantees ACID properties within a single document
  • Reduced Network Round Trips: No need for multiple queries or joins

SQL-Style Queries for Embedded Data:

-- Find orders with expensive items
SELECT 
  _id,
  userId,
  orderDate,
  items[0].name AS primaryItem,
  totalAmount
FROM orders
WHERE items[0].price > 1000
  AND status = 'completed'

-- Analyze spending by product category
SELECT 
  i.category,
  COUNT(*) AS orderCount,
  SUM(i.price * i.quantity) AS totalRevenue
FROM orders o
CROSS JOIN UNNEST(o.items) AS i
WHERE o.status = 'completed'
  AND o.orderDate >= '2025-01-01'
GROUP BY i.category
ORDER BY totalRevenue DESC

When to Use Embedding:

  • One-to-few relationships (typically < 100 subdocuments)
  • Child documents are always accessed with the parent
  • Child documents don't need independent querying
  • Document size stays under 16MB limit
  • Update patterns favor atomic operations

Pattern 2: References (Normalization)

References store related data in separate collections, similar to SQL foreign keys.

// Users collection
{
  "_id": ObjectId("user123"),
  "email": "john@example.com", 
  "name": "John Smith",
  "addresses": [
    {
      "type": "shipping",
      "street": "123 Main St",
      "city": "Seattle",
      "state": "WA",
      "zipCode": "98101",
      "country": "USA"
    }
  ]
}

// Orders collection
{
  "_id": ObjectId("order456"),
  "userId": ObjectId("user123"),
  "orderDate": ISODate("2025-08-17"),
  "status": "completed",
  "itemIds": [
    ObjectId("item789"),
    ObjectId("item790")
  ],
  "totalAmount": 1339.97
}

// Order Items collection  
{
  "_id": ObjectId("item789"),
  "orderId": ObjectId("order456"),
  "productId": ObjectId("prod001"),
  "name": "MacBook Pro",
  "price": 1299.99,
  "quantity": 1,
  "category": "Electronics"
}

SQL-Style Queries with References:

-- Join orders with user information
SELECT 
  o._id AS orderId,
  o.orderDate,
  o.totalAmount,
  u.name AS userName,
  u.email
FROM orders o
JOIN users u ON o.userId = u._id
WHERE o.status = 'completed'
  AND o.orderDate >= '2025-08-01'

-- Get detailed order information with items
SELECT 
  o._id AS orderId,
  o.orderDate,
  u.name AS customerName,
  i.name AS itemName,
  i.price,
  i.quantity
FROM orders o
JOIN users u ON o.userId = u._id
JOIN order_items i ON o._id = i.orderId
WHERE o.status = 'completed'
ORDER BY o.orderDate DESC, i.name

When to Use References:

  • One-to-many relationships with many children
  • Child documents need independent querying
  • Child documents are frequently updated
  • Need to maintain data consistency across documents
  • Document size would exceed MongoDB's 16MB limit

Pattern 3: Hybrid Approach

Combines embedding and referencing based on access patterns and data characteristics.

// Order with embedded frequently-accessed data and references for detailed data
{
  "_id": ObjectId("order456"),
  "userId": ObjectId("user123"),

  // Embedded user snapshot for quick access
  "userSnapshot": {
    "name": "John Smith",
    "email": "john@example.com",
    "membershipLevel": "gold"
  },

  "orderDate": ISODate("2025-08-17"),
  "status": "completed",

  // Embedded order items for atomic updates
  "items": [
    {
      "productId": ObjectId("prod001"),
      "name": "MacBook Pro", 
      "price": 1299.99,
      "quantity": 1
    }
  ],

  // Reference to detailed shipping info
  "shippingAddressId": ObjectId("addr123"),

  // Reference to payment information
  "paymentId": ObjectId("payment456"),

  "totalAmount": 1339.97
}

Benefits of Hybrid Approach:

  • Optimized Queries: Fast access to commonly needed data
  • Reduced Duplication: Reference detailed data that changes infrequently
  • Flexible Updates: Update embedded snapshots as needed

Advanced Schema Patterns

1. Polymorphic Pattern

Store different document types in the same collection:

// Products collection with different product types
{
  "_id": ObjectId("..."),
  "type": "book",
  "name": "MongoDB Definitive Guide",
  "price": 39.99,
  "isbn": "978-1449344689",
  "author": "Kristina Chodorow",
  "pages": 432
}

{
  "_id": ObjectId("..."),
  "type": "electronics",
  "name": "iPhone 15",
  "price": 799.99,
  "brand": "Apple",
  "model": "iPhone 15",
  "storage": "128GB"
}

Query with type-specific logic:

SELECT 
  name,
  price,
  CASE type
    WHEN 'book' THEN CONCAT(author, ' - ', pages, ' pages')
    WHEN 'electronics' THEN CONCAT(brand, ' ', model)
    ELSE 'Unknown product type'
  END AS productDetails
FROM products
WHERE price BETWEEN 30 AND 100
ORDER BY price DESC

2. Bucket Pattern

Group related documents to optimize for time-series or IoT data:

// Sensor readings bucketed by hour
{
  "_id": ObjectId("..."),
  "sensorId": "temp_sensor_01",
  "bucketDate": ISODate("2025-08-17T10:00:00Z"),
  "readings": [
    { "timestamp": ISODate("2025-08-17T10:00:00Z"), "value": 22.1 },
    { "timestamp": ISODate("2025-08-17T10:01:00Z"), "value": 22.3 },
    { "timestamp": ISODate("2025-08-17T10:02:00Z"), "value": 22.0 }
  ],
  "readingCount": 3,
  "minValue": 22.0,
  "maxValue": 22.3,
  "avgValue": 22.13
}
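
New readings are typically appended to the current bucket with a single upsert. A minimal sketch assuming hourly buckets keyed by sensorId and bucketDate (the collection name sensorReadings is illustrative):

// Append a reading to the current hourly bucket, creating it if needed
const reading = { timestamp: new Date("2025-08-17T10:03:00Z"), value: 22.4 };
const bucketDate = new Date("2025-08-17T10:00:00Z"); // reading's hour, truncated

db.sensorReadings.updateOne(
  { sensorId: "temp_sensor_01", bucketDate: bucketDate },
  {
    $push: { readings: reading },
    $inc: { readingCount: 1 },
    $min: { minValue: reading.value },
    $max: { maxValue: reading.value }
    // avgValue can be recomputed from readings when the bucket is read or closed
  },
  { upsert: true }
);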

3. Outlier Pattern

Separate frequently accessed data from rare edge cases:

// Normal product document
{
  "_id": ObjectId("prod001"),
  "name": "Standard Widget",
  "price": 19.99,
  "category": "Widgets",
  "inStock": true,
  "hasOutliers": false
}

// Product with outlier data stored separately  
{
  "_id": ObjectId("prod002"), 
  "name": "Premium Widget",
  "price": 199.99,
  "category": "Widgets",
  "inStock": true,
  "hasOutliers": true
}

// Separate outlier collection
{
  "_id": ObjectId("..."),
  "productId": ObjectId("prod002"),
  "detailedSpecs": { /* large technical specifications */ },
  "userManual": "http://example.com/manual.pdf",
  "warrantyInfo": { /* detailed warranty terms */ }
}

Schema Design Decision Framework

1. Analyze Access Patterns

-- Common query: Get user's recent orders
SELECT * FROM orders 
WHERE userId = ? 
ORDER BY orderDate DESC 
LIMIT 10

-- This suggests embedding user snapshot in orders
-- Or at least indexing userId + orderDate

2. Consider Update Frequency

  • High Update Frequency: Use references to avoid document growth
  • Low Update Frequency: Embedding may be optimal
  • Atomic Updates Needed: Embed related data

3. Evaluate Data Growth

  • Bounded Growth: Embedding works well
  • Unbounded Growth: Use references
  • Predictable Growth: Hybrid approach

4. Query Performance Requirements

-- If this query is critical:
SELECT o.*, u.name, u.email
FROM orders o
JOIN users u ON o.userId = u._id
WHERE o.status = 'pending'

-- Consider embedding user snapshot in orders:
-- { "userSnapshot": { "name": "...", "email": "..." } }

Indexing Strategy for Different Patterns

Embedded Documents

// Index embedded array elements
db.orders.createIndex({ "items.productId": 1 })
db.orders.createIndex({ "items.category": 1, "orderDate": -1 })

// Index nested object fields
db.orders.createIndex({ "shippingAddress.city": 1 })

Referenced Documents

// Standard foreign key indexes
db.orders.createIndex({ "userId": 1, "orderDate": -1 })
db.orderItems.createIndex({ "orderId": 1 })
db.orderItems.createIndex({ "productId": 1 })

Migration Strategies

When your schema needs to evolve:

1. Adding New Fields (Easy)

// Add versioning to handle schema changes
{
  "_id": ObjectId("..."),
  "schemaVersion": 2,
  "userId": ObjectId("..."),
  // ... existing fields
  "newField": "new value"  // Added in version 2
}

2. Restructuring Documents (Complex)

-- Use aggregation to transform documents
UPDATE orders 
SET items = [
  {
    "productId": productId,
    "name": productName, 
    "price": price,
    "quantity": quantity
  }
]
WHERE schemaVersion = 1
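
Natively, MongoDB 4.2+ can perform this kind of reshaping with an update that takes an aggregation pipeline. A rough sketch, assuming version-1 orders stored productId, productName, price, and quantity as top-level fields:

// Restructure flat item fields into an embedded items array
db.orders.updateMany(
  { schemaVersion: 1 },
  [
    {
      $set: {
        items: [
          {
            productId: "$productId",
            name: "$productName",
            price: "$price",
            quantity: "$quantity"
          }
        ],
        schemaVersion: 2
      }
    },
    { $unset: ["productId", "productName", "price", "quantity"] }
  ]
);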

Performance Testing Your Schema

Test different patterns with realistic data volumes:

// Load test embedded approach
for (let i = 0; i < 100000; i++) {
  db.orders.insertOne({
    userId: ObjectId(),
    items: generateRandomItems(1, 10),
    // ... other fields
  })
}

// Compare query performance
db.orders.find({ "userId": userId }).explain("executionStats")

QueryLeaf Schema Optimization

When using QueryLeaf for SQL-to-MongoDB translation, your schema design becomes even more critical. QueryLeaf can analyze your SQL query patterns and suggest optimal schema structures:

-- QueryLeaf can detect this join pattern
SELECT 
  o.orderDate,
  o.totalAmount,
  u.name AS customerName,
  i.productName,
  i.price
FROM orders o
JOIN users u ON o.userId = u._id
JOIN order_items i ON o._id = i.orderId
WHERE o.orderDate >= '2025-01-01'

-- And recommend either:
-- 1. Embedding user snapshots in orders
-- 2. Creating specific indexes for join performance
-- 3. Hybrid approach based on query frequency

Conclusion

Effective MongoDB schema design requires balancing multiple factors: query patterns, data relationships, update frequency, and performance requirements. There's no one-size-fits-all solution – the best approach depends on your specific use case.

Key principles:

  • Start with your queries: Design schemas to support your most important access patterns
  • Consider data lifecycle: How your data grows and changes over time
  • Measure performance: Test different approaches with realistic data volumes
  • Plan for evolution: Build in flexibility for future schema changes
  • Use appropriate indexes: Support your chosen schema pattern with proper indexing

Whether you choose embedding, referencing, or a hybrid approach, understanding these patterns helps you build MongoDB applications that scale efficiently while maintaining data integrity and query performance.

The combination of thoughtful schema design with tools like QueryLeaf gives you the flexibility of MongoDB documents with the query power of SQL – letting you build applications that are both performant and maintainable.

MongoDB Indexing Strategies: Optimizing Queries with SQL-Driven Approaches

MongoDB's indexing system is powerful, but designing effective indexes can be challenging when you're thinking in SQL terms. Understanding how your SQL queries translate to MongoDB operations is crucial for creating indexes that actually improve performance.

This guide shows how to design MongoDB indexes that support SQL-style queries, ensuring your applications run efficiently while maintaining query readability.

Understanding Index Types in MongoDB

MongoDB supports several index types that map well to SQL concepts:

  1. Single Field Indexes - Similar to SQL column indexes
  2. Compound Indexes - Like SQL multi-column indexes
  3. Text Indexes - For full-text search capabilities
  4. Partial Indexes - Equivalent to SQL conditional indexes
  5. TTL Indexes - For automatic document expiration

Basic Indexing for SQL-Style Queries

Single Field Indexes

Consider this user query pattern:

SELECT name, email, registrationDate
FROM users
WHERE email = 'john@example.com'

Create a supporting index:

CREATE INDEX idx_users_email ON users (email)

In MongoDB shell syntax:

db.users.createIndex({ email: 1 })

Compound Indexes for Complex Queries

For queries involving multiple fields:

SELECT productName, price, category, inStock
FROM products
WHERE category = 'Electronics'
  AND price BETWEEN 100 AND 500
  AND inStock = true
ORDER BY price ASC

Create an optimized compound index:

CREATE INDEX idx_products_category_instock_price 
ON products (category, inStock, price)

MongoDB equivalent:

db.products.createIndex({ 
  category: 1, 
  inStock: 1, 
  price: 1 
})

The index field order matters: place equality filters first, then sort fields, then range filters (the ESR rule). Here price acts as both the range filter and the sort key, so it comes last.

Indexing for Array Operations

When working with embedded arrays, index specific array positions for known access patterns:

// Sample order document
{
  "customerId": ObjectId("..."),
  "items": [
    { "product": "iPhone", "price": 999, "category": "Electronics" },
    { "product": "Case", "price": 29, "category": "Accessories" }
  ],
  "orderDate": ISODate("2025-01-15")
}

For this SQL query accessing the first item:

SELECT customerId, orderDate, items[0].product
FROM orders
WHERE items[0].category = 'Electronics'
  AND items[0].price > 500
ORDER BY orderDate DESC

Create targeted indexes:

-- Index for first item queries
CREATE INDEX idx_orders_first_item 
ON orders (items[0].category, items[0].price, orderDate)

-- General array element index (covers any position)
CREATE INDEX idx_orders_items_category 
ON orders (items.category, items.price)
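
In MongoDB shell syntax these map to dotted field paths: the positional form indexes the category and price of the element at position 0, while the general form creates a multikey index over every element. A sketch of the translation:

// First-item queries (positional path)
db.orders.createIndex({ "items.0.category": 1, "items.0.price": 1, orderDate: 1 })

// Any-position queries (multikey index)
db.orders.createIndex({ "items.category": 1, "items.price": 1 })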

Advanced Indexing Patterns

Text Search Indexes

For content search across multiple fields:

SELECT title, content, author
FROM articles
WHERE MATCH(title, content) AGAINST ('mongodb indexing')
ORDER BY score DESC

Create a text index:

CREATE TEXT INDEX idx_articles_search 
ON articles (title, content) 
WITH WEIGHTS (title: 2, content: 1)

MongoDB syntax:

db.articles.createIndex(
  { title: "text", content: "text" },
  { weights: { title: 2, content: 1 } }
)

Partial Indexes for Conditional Data

Index only relevant documents to save space:

-- Only index active users for login queries
CREATE INDEX idx_users_active_email 
ON users (email)
WHERE status = 'active'

MongoDB equivalent:

db.users.createIndex(
  { email: 1 },
  { partialFilterExpression: { status: "active" } }
)

TTL Indexes for Time-Based Data

Automatically expire temporary data:

-- Sessions expire after 24 hours
CREATE TTL INDEX idx_sessions_expiry 
ON sessions (createdAt)
EXPIRE AFTER 86400 SECONDS

MongoDB syntax:

db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 86400 }
)

JOIN-Optimized Indexing

When using SQL JOINs, ensure both collections have appropriate indexes:

SELECT 
  o.orderDate,
  o.totalAmount,
  c.name,
  c.region
FROM orders o
JOIN customers c ON o.customerId = c._id
WHERE c.region = 'North America'
  AND o.orderDate >= '2025-01-01'
ORDER BY o.orderDate DESC

Required indexes:

-- Index foreign key field in orders
CREATE INDEX idx_orders_customer_date 
ON orders (customerId, orderDate)

-- Index join condition and filter in customers  
CREATE INDEX idx_customers_region_id 
ON customers (region, _id)
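
MongoDB shell equivalents (orderDate descending to match the sort):

db.orders.createIndex({ customerId: 1, orderDate: -1 })
db.customers.createIndex({ region: 1, _id: 1 })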

Index Performance Analysis

Monitoring Index Usage

Check if your indexes are being used effectively:

-- Analyze query performance
EXPLAIN SELECT name, email
FROM users  
WHERE email = 'test@example.com'
  AND status = 'active'

This helps identify: - Which indexes are used - Query execution time - Documents examined vs returned - Whether sorts use indexes

Index Optimization Tips

  1. Use Covered Queries: Include all selected fields in the index

    -- This query can be fully satisfied by the index
    CREATE INDEX idx_users_covered 
    ON users (email, status, name)
    
    SELECT name FROM users 
    WHERE email = 'test@example.com' AND status = 'active'
    

  2. Optimize Sort Operations: Include sort fields in compound indexes

    CREATE INDEX idx_orders_status_date 
    ON orders (status, orderDate)
    
    SELECT * FROM orders 
    WHERE status = 'pending'
    ORDER BY orderDate DESC
    

  3. Consider Index Intersection: Sometimes multiple single-field indexes work better than one compound index

Real-World Indexing Strategy

E-commerce Platform Example

For a typical e-commerce application, here's a comprehensive indexing strategy:

-- Product catalog queries
CREATE INDEX idx_products_category_price ON products (category, price)
CREATE INDEX idx_products_search ON products (name, description) -- text index
CREATE INDEX idx_products_instock ON products (inStock, category)

-- Order management  
CREATE INDEX idx_orders_customer_date ON orders (customerId, orderDate)
CREATE INDEX idx_orders_status_date ON orders (status, orderDate)
CREATE INDEX idx_orders_items_category ON orders (items.category, items.price)

-- User management
CREATE INDEX idx_users_email ON users (email) -- unique
CREATE INDEX idx_users_region_status ON users (region, status)

-- Analytics queries
CREATE INDEX idx_orders_analytics ON orders (orderDate, status, totalAmount)

Query Pattern Matching

Design indexes based on your most common query patterns:

-- Pattern 1: Customer order history
SELECT * FROM orders 
WHERE customerId = ? 
ORDER BY orderDate DESC

-- Supporting index:
CREATE INDEX idx_orders_customer_date ON orders (customerId, orderDate)

-- Pattern 2: Product search with filters  
SELECT * FROM products
WHERE category = ? AND price BETWEEN ? AND ?
ORDER BY price ASC

-- Supporting index:
CREATE INDEX idx_products_category_price ON products (category, price)

-- Pattern 3: Recent activity analytics
SELECT DATE(orderDate), COUNT(*), SUM(totalAmount)
FROM orders
WHERE orderDate >= ?
GROUP BY DATE(orderDate)

-- Supporting index:
CREATE INDEX idx_orders_date_amount ON orders (orderDate, totalAmount)

Index Maintenance and Monitoring

Identifying Missing Indexes

Use query analysis to find slow operations:

-- Queries scanning many documents suggest missing indexes
EXPLAIN ANALYZE SELECT * FROM orders 
WHERE status = 'pending' AND items[0].category = 'Electronics'

If the explain plan shows a high totalDocsExamined relative to nReturned, you likely need better indexes.
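
The same check in the MongoDB shell, reading the executionStats section of the explain output:

const stats = db.orders
  .find({ status: "pending", "items.0.category": "Electronics" })
  .explain("executionStats")
  .executionStats

printjson({
  totalDocsExamined: stats.totalDocsExamined,
  nReturned: stats.nReturned,
  executionTimeMillis: stats.executionTimeMillis
})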

Removing Unused Indexes

Monitor index usage and remove unnecessary ones:

// MongoDB command to see index usage stats
db.orders.aggregate([{ $indexStats: {} }])

Remove indexes that haven't been used:

DROP INDEX idx_orders_unused ON orders
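
MongoDB equivalent:

db.orders.dropIndex("idx_orders_unused")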

Performance Best Practices

  1. Limit Index Count: Too many indexes slow down writes
  2. Use Ascending Order: Unless you specifically need descending sorts
  3. Index Selectivity: Put most selective fields first in compound indexes
  4. Monitor Index Size: Large indexes impact memory usage
  5. Regular Maintenance: Rebuild indexes periodically in busy systems

QueryLeaf Integration

When using QueryLeaf for SQL-to-MongoDB translation, your indexing strategy becomes even more important. QueryLeaf can provide index recommendations based on your SQL query patterns:

-- QueryLeaf can suggest optimal indexes for complex queries
SELECT 
  c.region,
  COUNT(DISTINCT o.customerId) AS uniqueCustomers,
  SUM(i.price * i.quantity) AS totalRevenue
FROM customers c
JOIN orders o ON c._id = o.customerId  
CROSS JOIN UNNEST(o.items) AS i
WHERE o.orderDate >= '2025-01-01'
  AND o.status = 'completed'
GROUP BY c.region
HAVING totalRevenue > 10000
ORDER BY totalRevenue DESC

QueryLeaf analyzes such queries and can recommend compound indexes that support the JOIN conditions, array operations, filtering, grouping, and sorting requirements.

Conclusion

Effective MongoDB indexing requires understanding how your SQL queries translate to document operations. By thinking about indexes in terms of your query patterns rather than just individual fields, you can create an indexing strategy that significantly improves application performance.

Key takeaways: - Design indexes to match your SQL query patterns - Use compound indexes for multi-field queries and sorts - Consider partial indexes for conditional data - Monitor and maintain indexes based on actual usage - Test index effectiveness with realistic data volumes

With proper indexing aligned to your SQL query patterns, MongoDB can deliver excellent performance while maintaining the query readability you're used to from SQL databases.

MongoDB Data Modeling: Managing Relationships with SQL-Style Queries

One of the biggest challenges when transitioning from relational databases to MongoDB is understanding how to model relationships between data. MongoDB's flexible document structure offers multiple ways to represent relationships, but choosing the right approach can be confusing.

This guide shows how to design and query MongoDB relationships using familiar SQL patterns, making data modeling decisions clearer and queries more intuitive.

Understanding MongoDB Relationship Patterns

MongoDB provides several ways to model relationships:

  1. Embedded Documents - Store related data within the same document
  2. References - Store ObjectId references to other documents
  3. Hybrid Approach - Combine embedding and referencing strategically

Let's explore each pattern with practical examples.

Pattern 1: Embedded Relationships

When to Embed

Use embedded documents when: - Related data is always accessed together - The embedded data has a clear ownership relationship - The embedded collection size is bounded and relatively small

Example: Blog Posts with Comments

// Embedded approach
{
  "_id": ObjectId("..."),
  "title": "Getting Started with MongoDB",
  "content": "MongoDB is a powerful NoSQL database...",
  "author": "Jane Developer",
  "publishDate": ISODate("2025-01-10"),
  "comments": [
    {
      "author": "John Reader",
      "text": "Great article!",
      "date": ISODate("2025-01-11")
    },
    {
      "author": "Alice Coder",
      "text": "Very helpful examples",
      "date": ISODate("2025-01-12")
    }
  ]
}

Querying embedded data with SQL is straightforward:

-- Find posts with comments containing specific text
SELECT title, author, publishDate
FROM posts
WHERE comments[0].text LIKE '%helpful%'
   OR comments[1].text LIKE '%helpful%'
   OR comments[2].text LIKE '%helpful%'

-- Get posts with recent comments
SELECT title, comments[0].author, comments[0].date
FROM posts  
WHERE comments[0].date >= '2025-01-01'
ORDER BY comments[0].date DESC

The native MongoDB version uses a regex match against the comments array (which inspects every comment, not just the first three positions):

db.posts.aggregate([
  {
    $match: {
      "comments.text": { $regex: /helpful/i }
    }
  },
  {
    $project: {
      title: 1,
      author: 1, 
      publishDate: 1
    }
  }
])

Pattern 2: Referenced Relationships

When to Reference

Use references when: - Related documents are large or frequently updated independently - You need to avoid duplication across multiple parent documents - Relationship cardinality is one-to-many or many-to-many

Example: E-commerce with Separate Collections

// Orders collection
{
  "_id": ObjectId("..."),
  "customerId": ObjectId("507f1f77bcf86cd799439011"),
  "orderDate": ISODate("2025-01-15"),
  "totalAmount": 1299.97,
  "status": "processing"
}

// Customers collection  
{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "name": "Sarah Johnson",
  "email": "sarah@example.com",
  "address": {
    "street": "123 Main St",
    "city": "Seattle", 
    "state": "WA"
  },
  "memberSince": ISODate("2024-03-15")
}

SQL JOINs make working with references intuitive:

-- Get order details with customer information
SELECT 
  o.orderDate,
  o.totalAmount,
  o.status,
  c.name AS customerName,
  c.email,
  c.address.city
FROM orders o
JOIN customers c ON o.customerId = c._id
WHERE o.orderDate >= '2025-01-01'
ORDER BY o.orderDate DESC

Advanced Reference Queries

-- Find customers with multiple high-value orders
SELECT 
  c.name,
  c.email,
  COUNT(o._id) AS orderCount,
  SUM(o.totalAmount) AS totalSpent
FROM customers c
JOIN orders o ON c._id = o.customerId
WHERE o.totalAmount > 500
GROUP BY c._id, c.name, c.email
HAVING COUNT(o._id) >= 3
ORDER BY totalSpent DESC
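
Under the hood, a query like this compiles to a $lookup, $unwind, $group, and $match pipeline. A rough sketch (not QueryLeaf's exact output):

db.customers.aggregate([
  {
    $lookup: {
      from: "orders",
      localField: "_id",
      foreignField: "customerId",
      as: "orders"
    }
  },
  { $unwind: "$orders" },
  { $match: { "orders.totalAmount": { $gt: 500 } } },
  {
    $group: {
      _id: "$_id",
      name: { $first: "$name" },
      email: { $first: "$email" },
      orderCount: { $sum: 1 },
      totalSpent: { $sum: "$orders.totalAmount" }
    }
  },
  { $match: { orderCount: { $gte: 3 } } },
  { $sort: { totalSpent: -1 } }
])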

Pattern 3: Hybrid Approach

When to Use Hybrid Modeling

Combine embedding and referencing when: - You need both immediate access to summary data and detailed information - Some related data changes frequently while other parts remain stable - You want to optimize for different query patterns

Example: User Profiles with Activity History

// Users collection with embedded recent activity + references
{
  "_id": ObjectId("..."),
  "username": "developer_mike",
  "profile": {
    "name": "Mike Chen",
    "avatar": "/images/avatars/mike.jpg",
    "bio": "Full-stack developer"
  },
  "recentActivity": [
    {
      "type": "post_created",
      "title": "MongoDB Best Practices", 
      "date": ISODate("2025-01-14"),
      "postId": ObjectId("...")
    },
    {
      "type": "comment_added",
      "text": "Great point about indexing",
      "date": ISODate("2025-01-13"), 
      "postId": ObjectId("...")
    }
  ],
  "stats": {
    "totalPosts": 127,
    "totalComments": 892,
    "reputation": 2450
  }
}

// Separate Posts collection for full content
{
  "_id": ObjectId("..."),
  "authorId": ObjectId("..."),
  "title": "MongoDB Best Practices",
  "content": "When working with MongoDB...",
  "publishDate": ISODate("2025-01-14")
}

Query both embedded and referenced data:

-- Get user dashboard with recent activity and full post details
SELECT 
  u.username,
  u.profile.name,
  u.recentActivity[0].title AS latestActivityTitle,
  u.recentActivity[0].date AS latestActivityDate,
  u.stats.totalPosts,
  p.content AS latestPostContent
FROM users u
LEFT JOIN posts p ON u.recentActivity[0].postId = p._id
WHERE u.recentActivity[0].type = 'post_created'
  AND u.recentActivity[0].date >= '2025-01-01'
ORDER BY u.recentActivity[0].date DESC

Performance Optimization for Relationships

Indexing Strategies

-- Index embedded array fields for efficient queries
CREATE INDEX ON orders (items[0].category, items[0].price)

-- Index reference fields
CREATE INDEX ON orders (customerId, orderDate)

-- Compound indexes for complex queries
CREATE INDEX ON posts (authorId, publishDate, status)

Query Optimization Patterns

-- Efficient pagination with references
SELECT 
  o._id,
  o.orderDate,
  o.totalAmount,
  c.name
FROM orders o
JOIN customers c ON o.customerId = c._id
WHERE o.orderDate >= '2025-01-01'
ORDER BY o.orderDate DESC
LIMIT 20 OFFSET 0

Choosing the Right Pattern

Decision Matrix

  • User profiles with preferences: Embedded (preferences are small and always accessed with the user)
  • Blog posts with comments: Embedded (comments belong to the post and the array size stays bounded)
  • Orders with customer data: Referenced (customer data is large and shared across orders)
  • Products with inventory tracking: Referenced (inventory changes frequently and independently)
  • Shopping cart items: Embedded (cart items are temporary and belong to the session)
  • Order items with product details: Hybrid (embed order-specific data, reference the product catalog)

Performance Guidelines

-- Good: Query embedded data directly
SELECT customerId, items[0].name, items[0].price
FROM orders
WHERE items[0].category = 'Electronics'

-- Better: Use references for large related documents
SELECT o.orderDate, c.name, c.address.city
FROM orders o  
JOIN customers c ON o.customerId = c._id
WHERE c.address.state = 'CA'

-- Best: Hybrid approach for optimal queries
SELECT 
  u.username,
  u.stats.reputation,
  u.recentActivity[0].title,
  p.content
FROM users u
JOIN posts p ON u.recentActivity[0].postId = p._id
WHERE u.stats.reputation > 1000

Data Consistency Patterns

Maintaining Reference Integrity

-- Find orphaned records
SELECT o._id, o.customerId
FROM orders o
LEFT JOIN customers c ON o.customerId = c._id
WHERE c._id IS NULL

-- Update related documents atomically
UPDATE users
SET stats.totalPosts = stats.totalPosts + 1
WHERE _id = '507f1f77bcf86cd799439011'
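
The MongoDB equivalents of these two operations, roughly:

// Orphaned orders: customerId values with no matching customer
db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
    }
  },
  { $match: { customer: { $size: 0 } } },
  { $project: { _id: 1, customerId: 1 } }
])

// Atomic counter increment on the embedded stats object
db.users.updateOne(
  { _id: ObjectId("507f1f77bcf86cd799439011") },
  { $inc: { "stats.totalPosts": 1 } }
)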

Querying with QueryLeaf

All the SQL examples in this guide work seamlessly with QueryLeaf, which translates your familiar SQL syntax into optimized MongoDB operations. You get the modeling flexibility of MongoDB with the query clarity of SQL.

For more details on advanced relationship queries, see our guides on JOINs and nested field access.

Conclusion

MongoDB relationship modeling doesn't have to be complex. By understanding when to embed, reference, or use hybrid approaches, you can design schemas that are both performant and maintainable.

Using SQL syntax for relationship queries provides several advantages: - Familiar patterns for developers with SQL background - Clear expression of business logic and data relationships - Easier debugging and query optimization - Better collaboration across teams with mixed database experience

The key is choosing the right modeling pattern for your use case and then leveraging SQL's expressive power to query your MongoDB data effectively. With the right approach, you get MongoDB's document flexibility combined with SQL's query clarity.

MongoDB Aggregation Pipelines Simplified: From Complex Pipelines to Simple SQL

MongoDB's aggregation framework is powerful, but its multi-stage pipeline syntax can be overwhelming for developers coming from SQL backgrounds. Complex operations that would be straightforward in SQL often require lengthy aggregation pipelines with multiple stages, operators, and nested expressions.

What if you could achieve the same results using familiar SQL syntax? Let's explore how to transform complex MongoDB aggregations into readable SQL queries.

The Aggregation Pipeline Challenge

Consider an e-commerce database with orders and customers. A common business requirement is to analyze sales by region and product category. Here's what this looks like with MongoDB's native aggregation:

// Sample documents
// Orders collection:
{
  "_id": ObjectId("..."),
  "customerId": ObjectId("..."),
  "orderDate": ISODate("2025-07-15"),
  "items": [
    { "product": "iPhone 15", "category": "Electronics", "price": 999, "quantity": 1 },
    { "product": "Case", "category": "Accessories", "price": 29, "quantity": 2 }
  ],
  "status": "completed"
}

// Customers collection:
{
  "_id": ObjectId("..."),
  "name": "John Smith",
  "email": "john@example.com",
  "region": "North America",
  "registrationDate": ISODate("2024-03-10")
}

To get sales by region and category, you'd need this complex aggregation pipeline:

db.orders.aggregate([
  // Stage 1: Match completed orders from last 30 days
  {
    $match: {
      status: "completed",
      orderDate: { $gte: new Date(Date.now() - 30*24*60*60*1000) }
    }
  },

  // Stage 2: Unwind the items array
  { $unwind: "$items" },

  // Stage 3: Join with customers
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
    }
  },

  // Stage 4: Unwind customer (since lookup returns array)
  { $unwind: "$customer" },

  // Stage 5: Calculate item total and group by region/category
  {
    $group: {
      _id: {
        region: "$customer.region",
        category: "$items.category"
      },
      totalRevenue: { 
        $sum: { $multiply: ["$items.price", "$items.quantity"] }
      },
      orderCount: { $sum: 1 },
      avgOrderValue: { 
        $avg: { $multiply: ["$items.price", "$items.quantity"] }
      }
    }
  },

  // Stage 6: Sort by revenue descending
  { $sort: { totalRevenue: -1 } },

  // Stage 7: Format output
  {
    $project: {
      _id: 0,
      region: "$_id.region",
      category: "$_id.category",
      totalRevenue: 1,
      orderCount: 1,
      avgOrderValue: { $round: ["$avgOrderValue", 2] }
    }
  }
])

This pipeline has 7 stages and is difficult to read, modify, or debug. The logic is spread across multiple stages, making it hard to understand the business intent.

SQL: Clear and Concise

The same analysis becomes much more readable with SQL:

SELECT 
  c.region,
  i.category,
  SUM(i.price * i.quantity) AS totalRevenue,
  COUNT(*) AS orderCount,
  ROUND(AVG(i.price * i.quantity), 2) AS avgOrderValue
FROM orders o
JOIN customers c ON o.customerId = c._id
CROSS JOIN UNNEST(o.items) AS i
WHERE o.status = 'completed'
  AND o.orderDate >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY c.region, i.category
ORDER BY totalRevenue DESC

The SQL version is much more concise and follows a logical flow that matches how we think about the problem. Let's break down more examples.

Common Aggregation Patterns in SQL

1. Time-Based Analytics

MongoDB aggregation for daily sales trends:

db.orders.aggregate([
  {
    $match: {
      orderDate: { $gte: ISODate("2025-07-01") },
      status: "completed"
    }
  },
  {
    $group: {
      _id: {
        year: { $year: "$orderDate" },
        month: { $month: "$orderDate" },
        day: { $dayOfMonth: "$orderDate" }
      },
      dailySales: { $sum: "$totalAmount" },
      orderCount: { $sum: 1 }
    }
  },
  {
    $project: {
      _id: 0,
      date: {
        $dateFromParts: {
          year: "$_id.year",
          month: "$_id.month",
          day: "$_id.day"
        }
      },
      dailySales: 1,
      orderCount: 1
    }
  },
  { $sort: { date: 1 } }
])

SQL equivalent:

SELECT 
  DATE(orderDate) AS date,
  SUM(totalAmount) AS dailySales,
  COUNT(*) AS orderCount
FROM orders
WHERE orderDate >= '2025-07-01'
  AND status = 'completed'
GROUP BY DATE(orderDate)
ORDER BY date

2. Complex Filtering and Grouping

Finding top customers by spending in each region:

db.orders.aggregate([
  { $match: { status: "completed" } },
  {
    $lookup: {
      from: "customers",
      localField: "customerId", 
      foreignField: "_id",
      as: "customer"
    }
  },
  { $unwind: "$customer" },
  {
    $group: {
      _id: {
        customerId: "$customerId",
        region: "$customer.region"
      },
      customerName: { $first: "$customer.name" },
      totalSpent: { $sum: "$totalAmount" },
      orderCount: { $sum: 1 }
    }
  },
  { $sort: { "_id.region": 1, totalSpent: -1 } },
  {
    $group: {
      _id: "$_id.region",
      topCustomers: {
        $push: {
          customerId: "$_id.customerId",
          name: "$customerName",
          totalSpent: "$totalSpent",
          orderCount: "$orderCount"
        }
      }
    }
  }
])

SQL with window functions:

SELECT 
  region,
  customerId,
  customerName,
  totalSpent,
  orderCount,
  regionRank
FROM (
  SELECT 
    c.region,
    o.customerId,
    c.name AS customerName,
    SUM(o.totalAmount) AS totalSpent,
    COUNT(*) AS orderCount,
    RANK() OVER (PARTITION BY c.region ORDER BY SUM(o.totalAmount) DESC) AS regionRank
  FROM orders o
  JOIN customers c ON o.customerId = c._id
  WHERE o.status = 'completed'
  GROUP BY c.region, o.customerId, c.name
) customer_totals
WHERE regionRank <= 5
ORDER BY region, totalSpent DESC
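
For reference, MongoDB 5.0+ can produce the same top-5-per-region result natively by appending a $setWindowFields stage to the grouping pipeline shown earlier; a sketch:

db.orders.aggregate([
  { $match: { status: "completed" } },
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
    }
  },
  { $unwind: "$customer" },
  {
    $group: {
      _id: { customerId: "$customerId", region: "$customer.region" },
      customerName: { $first: "$customer.name" },
      totalSpent: { $sum: "$totalAmount" },
      orderCount: { $sum: 1 }
    }
  },
  {
    $setWindowFields: {
      partitionBy: "$_id.region",
      sortBy: { totalSpent: -1 },
      output: { regionRank: { $rank: {} } }
    }
  },
  { $match: { regionRank: { $lte: 5 } } },
  { $sort: { "_id.region": 1, totalSpent: -1 } }
])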

3. Advanced Array Processing

Analyzing product performance across all orders:

db.orders.aggregate([
  { $match: { status: "completed" } },
  { $unwind: "$items" },
  {
    $group: {
      _id: "$items.product",
      category: { $first: "$items.category" },
      totalQuantity: { $sum: "$items.quantity" },
      totalRevenue: { $sum: { $multiply: ["$items.price", "$items.quantity"] } },
      avgPrice: { $avg: "$items.price" },
      orderFrequency: { $sum: 1 }
    }
  },
  { $sort: { totalRevenue: -1 } },
  {
    $project: {
      _id: 0,
      product: "$_id",
      category: 1,
      totalQuantity: 1,
      totalRevenue: 1,
      avgPrice: { $round: ["$avgPrice", 2] },
      orderFrequency: 1
    }
  }
])

SQL equivalent:

SELECT 
  i.product,
  i.category,
  SUM(i.quantity) AS totalQuantity,
  SUM(i.price * i.quantity) AS totalRevenue,
  ROUND(AVG(i.price), 2) AS avgPrice,
  COUNT(*) AS orderFrequency
FROM orders o
CROSS JOIN UNNEST(o.items) AS i
WHERE o.status = 'completed'
GROUP BY i.product, i.category
ORDER BY totalRevenue DESC

Advanced SQL Features for MongoDB

Conditional Aggregations

Instead of multiple MongoDB pipeline stages for conditional logic:

SELECT 
  customerId,
  COUNT(*) AS totalOrders,
  COUNT(CASE WHEN totalAmount > 100 THEN 1 END) AS highValueOrders,
  COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completedOrders,
  ROUND(
    COUNT(CASE WHEN totalAmount > 100 THEN 1 END) * 100.0 / COUNT(*), 
    2
  ) AS highValuePercentage
FROM orders
WHERE orderDate >= '2025-01-01'
GROUP BY customerId
HAVING COUNT(*) >= 5
ORDER BY highValuePercentage DESC
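
For comparison, the same conditional counts collapse into a single $group stage using $sum with $cond; a sketch:

db.orders.aggregate([
  { $match: { orderDate: { $gte: ISODate("2025-01-01") } } },
  {
    $group: {
      _id: "$customerId",
      totalOrders: { $sum: 1 },
      highValueOrders: { $sum: { $cond: [{ $gt: ["$totalAmount", 100] }, 1, 0] } },
      completedOrders: { $sum: { $cond: [{ $eq: ["$status", "completed"] }, 1, 0] } }
    }
  },
  { $match: { totalOrders: { $gte: 5 } } },
  {
    $addFields: {
      highValuePercentage: {
        $round: [
          { $multiply: [{ $divide: ["$highValueOrders", "$totalOrders"] }, 100] },
          2
        ]
      }
    }
  },
  { $sort: { highValuePercentage: -1 } }
])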

Window Functions for Rankings

-- Top 3 products in each category by revenue
SELECT *
FROM (
  SELECT 
    i.category,
    i.product,
    SUM(i.price * i.quantity) AS revenue,
    ROW_NUMBER() OVER (PARTITION BY i.category ORDER BY SUM(i.price * i.quantity) DESC) as rank
  FROM orders o
  CROSS JOIN UNNEST(o.items) AS i
  WHERE o.status = 'completed'
  GROUP BY i.category, i.product
) ranked_products
WHERE rank <= 3
ORDER BY category, rank

Performance Benefits

SQL queries often perform better because:

  1. Query Optimization: SQL engines optimize entire queries, while MongoDB processes each pipeline stage separately
  2. Index Usage: SQL can better utilize compound indexes across JOINs
  3. Memory Efficiency: No need to pass large intermediate result sets between pipeline stages
  4. Parallel Processing: SQL engines can parallelize operations more effectively

When to Use SQL vs Native Aggregation

Use SQL-style queries when: - Writing complex analytics and reporting queries - Team members are more familiar with SQL - You need readable, maintainable code - Working with multiple collections (JOINs)

Stick with MongoDB aggregation when: - Using MongoDB-specific features like $facet or $bucket - Need fine-grained control over pipeline stages - Working with highly specialized MongoDB operators - Performance testing shows aggregation pipeline is faster for your specific use case

Real-World Example: Customer Segmentation

Here's a practical customer segmentation analysis that would be complex in MongoDB but straightforward in SQL:

SELECT 
  CASE 
    WHEN totalSpent > 1000 THEN 'VIP'
    WHEN totalSpent > 500 THEN 'Premium'
    WHEN totalSpent > 100 THEN 'Regular'
    ELSE 'New'
  END AS customerSegment,
  COUNT(*) AS customerCount,
  AVG(totalSpent) AS avgSpending,
  AVG(orderCount) AS avgOrders,
  MIN(lastOrderDate) AS earliestLastOrder,
  MAX(lastOrderDate) AS latestLastOrder
FROM (
  SELECT 
    c._id,
    c.name,
    COALESCE(SUM(o.totalAmount), 0) AS totalSpent,
    COUNT(o._id) AS orderCount,
    MAX(o.orderDate) AS lastOrderDate
  FROM customers c
  LEFT JOIN orders o ON c._id = o.customerId AND o.status = 'completed'
  GROUP BY c._id, c.name
) customer_summary
GROUP BY customerSegment
ORDER BY 
  CASE customerSegment
    WHEN 'VIP' THEN 1
    WHEN 'Premium' THEN 2  
    WHEN 'Regular' THEN 3
    ELSE 4
  END

Getting Started with QueryLeaf

Ready to simplify your MongoDB aggregations? QueryLeaf allows you to write SQL queries that automatically compile to optimized MongoDB operations. You get the readability of SQL with the flexibility of MongoDB's document model.

For more information about advanced SQL features, check out our guides on GROUP BY operations and working with JOINs.

Conclusion

MongoDB aggregation pipelines are powerful but can become unwieldy for complex analytics. SQL provides a more intuitive way to express these operations, making your code more readable and maintainable.

By using SQL syntax for MongoDB operations, you can: - Reduce complexity in data analysis queries - Make code more accessible to SQL-familiar team members - Improve query maintainability and debugging - Leverage familiar patterns for complex business logic

The combination of SQL's expressiveness with MongoDB's document flexibility gives you the best of both worlds – clear, concise queries that work with your existing MongoDB data structures.

MongoDB Array Operations Made Simple with SQL Syntax

Working with arrays in MongoDB can be challenging, especially if you come from a SQL background. MongoDB's native query syntax for arrays involves complex aggregation pipelines and operators that can be intimidating for developers used to straightforward SQL queries.

What if you could query MongoDB arrays using the SQL syntax you already know? Let's explore how to make MongoDB array operations intuitive and readable.

The Array Query Challenge in MongoDB

Consider a typical e-commerce scenario where you have orders with arrays of items:

{
  "_id": ObjectId("..."),
  "customerId": "user123",
  "orderDate": "2025-01-10",
  "items": [
    { "name": "Laptop", "price": 999.99, "category": "Electronics" },
    { "name": "Mouse", "price": 29.99, "category": "Electronics" },
    { "name": "Keyboard", "price": 79.99, "category": "Electronics" }
  ],
  "status": "shipped"
}

In native MongoDB, finding orders where the first item costs more than $500 requires this aggregation pipeline:

db.orders.aggregate([
  {
    $match: {
      "items.0.price": { $gt: 500 }
    }
  },
  {
    $project: {
      customerId: 1,
      orderDate: 1,
      firstItemName: "$items.0.name",
      firstItemPrice: "$items.0.price",
      status: 1
    }
  }
])

This works, but it's verbose and not intuitive for developers familiar with SQL.

SQL Array Access: Intuitive and Readable

With SQL syntax for MongoDB, the same query becomes straightforward:

SELECT 
  customerId,
  orderDate,
  items[0].name AS firstItemName,
  items[0].price AS firstItemPrice,
  status
FROM orders
WHERE items[0].price > 500

Much cleaner, right? Let's explore more array operations.

Common Array Operations with SQL

1. Accessing Specific Array Elements

Query orders where the second item is in the Electronics category:

SELECT customerId, orderDate, items[1].name, items[1].category
FROM orders
WHERE items[1].category = 'Electronics'

This translates directly to MongoDB's items.1.category field path; the SQL bracket syntax uses the same zero-based positions as MongoDB's dotted paths.
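
For example, the filter portion becomes a plain find on the dotted path:

db.orders.find({ "items.1.category": "Electronics" })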

2. Working with Nested Arrays

For documents with nested arrays, like product reviews with ratings arrays:

{
  "productId": "prod456",
  "reviews": [
    {
      "user": "alice",
      "rating": 5,
      "tags": ["excellent", "fast-delivery"]
    },
    {
      "user": "bob", 
      "rating": 4,
      "tags": ["good", "value-for-money"]
    }
  ]
}

Find products where the first review's second tag is "fast-delivery":

SELECT productId, reviews[0].user, reviews[0].rating
FROM products
WHERE reviews[0].tags[1] = 'fast-delivery'

3. Filtering and Projecting Array Elements

Get order details showing only the first two items:

SELECT 
  customerId,
  orderDate,
  items[0].name AS item1Name,
  items[0].price AS item1Price,
  items[1].name AS item2Name,
  items[1].price AS item2Price
FROM orders
WHERE status = 'shipped'

4. Array Operations in JOINs

When joining collections that contain arrays, SQL syntax makes relationships clear:

SELECT 
  u.name,
  u.email,
  o.orderDate,
  o.items[0].name AS primaryItem
FROM users u
JOIN orders o ON u._id = o.customerId
WHERE o.items[0].price > 100

This joins users with orders and filters by the first item's price, automatically handling ObjectId conversion.

Advanced Array Patterns

Working with Dynamic Array Access

While direct array indexing works well for known positions, you can also combine array access with other SQL features:

-- Get orders where any item exceeds $500
SELECT customerId, orderDate, status
FROM orders
WHERE items[0].price > 500 
   OR items[1].price > 500 
   OR items[2].price > 500

For more complex array queries that need to check all elements regardless of position, you'd still use MongoDB's native array operators, but for specific positional queries, SQL array access is perfect.

Updating Array Elements

Updating specific array positions is also intuitive with SQL syntax:

-- Update the price of the first item in an order
UPDATE orders
SET items[0].price = 899.99
WHERE _id = '507f1f77bcf86cd799439011'

-- Update nested array values
UPDATE products
SET reviews[0].tags[1] = 'super-fast-delivery'
WHERE productId = 'prod456'
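
These map to MongoDB updates that $set positional dotted paths, roughly:

// Update the price of the first item in an order
db.orders.updateOne(
  { _id: ObjectId("507f1f77bcf86cd799439011") },
  { $set: { "items.0.price": 899.99 } }
)

// Update nested array values
db.products.updateOne(
  { productId: "prod456" },
  { $set: { "reviews.0.tags.1": "super-fast-delivery" } }
)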

Performance Considerations

When working with array operations:

  1. Index Array Elements: Create indexes on frequently queried array positions like items.0.price
  2. Limit Deep Nesting: Accessing deeply nested arrays (items[0].details[2].specs[1]) can be slow
  3. Consider Array Size: Operations on large arrays may impact performance
  4. Use Compound Indexes: For queries combining array access with other fields

Real-World Example: E-commerce Analytics

Here's a practical example analyzing order patterns:

-- Find high-value orders where the primary item is expensive
SELECT 
  customerId,
  orderDate,
  items[0].name AS primaryProduct,
  items[0].price AS primaryPrice,
  items[0].category,
  status
FROM orders
WHERE items[0].price > 200
  AND status IN ('shipped', 'delivered')
  AND orderDate >= '2025-01-01'
ORDER BY items[0].price DESC
LIMIT 50

This query helps identify customers who purchase high-value primary items, useful for marketing campaigns or inventory planning.

When to Use Array Indexing vs Native MongoDB Queries

Use SQL array indexing when: - Accessing specific, known array positions - Working with fixed-structure arrays - Writing readable queries for specific business logic - Team members are more comfortable with SQL

Use native MongoDB queries when: - Need to query all array elements regardless of position - Working with variable-length arrays where position doesn't matter - Requires complex array aggregations - Performance is critical and you need MongoDB's optimized array operators

Getting Started

To start using SQL syntax for MongoDB array operations, you can use tools that translate SQL to MongoDB queries. The key is having a system that understands both SQL array syntax and MongoDB's document structure.

For more information about working with nested document structures in SQL, check out our guide on working with nested fields which complements array operations perfectly.

Conclusion

MongoDB arrays don't have to be intimidating. With SQL syntax, you can leverage familiar patterns to query and manipulate array data effectively. This approach bridges the gap between SQL knowledge and MongoDB's document model, making your database operations more intuitive and maintainable.

Whether you're building e-commerce platforms, content management systems, or analytics dashboards, SQL-style array operations can simplify your MongoDB development workflow while keeping your queries readable and maintainable.

QueryLeaf Integration: QueryLeaf automatically translates SQL array syntax into MongoDB's native array operators, making complex array operations accessible through familiar SQL patterns. Array indexing, element filtering, and nested array queries are seamlessly handled through standard SQL syntax, enabling developers to work with MongoDB arrays using the SQL knowledge they already possess without learning MongoDB's aggregation pipeline syntax.

The combination of SQL's clarity with MongoDB's flexibility gives you the best of both worlds – familiar syntax with powerful document database capabilities.

MongoDB Full-Text Search for Intelligent Content Discovery: Advanced Text Indexing, Ranking, and Relevance Scoring with SQL-Compatible Operations

Modern applications require sophisticated full-text search capabilities to enable users to discover relevant content across large document collections, knowledge bases, and content management systems. Traditional database text search approaches often provide limited functionality, poor performance with large datasets, and insufficient relevance ranking mechanisms that fail to deliver the intelligent search experiences users expect.

MongoDB Full-Text Search provides native support for intelligent text indexing, advanced relevance scoring, and high-performance text queries with comprehensive linguistic features including stemming, stop word filtering, and multi-language support. Unlike basic SQL LIKE operations or external search engine integrations that require complex infrastructure management, MongoDB's Full-Text Search delivers enterprise-grade search capabilities while maintaining unified data access patterns and transactional consistency.

The Traditional Text Search Challenge

Building effective text search with conventional database approaches creates significant performance and functionality limitations:

-- Traditional PostgreSQL text search - limited functionality and performance issues

-- Basic text table with conventional indexing
CREATE TABLE documents (
    document_id BIGSERIAL PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    content TEXT NOT NULL,
    author VARCHAR(200),
    category VARCHAR(100),
    tags TEXT[],
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    -- Metadata for search optimization
    document_language VARCHAR(10) DEFAULT 'en',
    content_type VARCHAR(50),
    word_count INTEGER,

    -- Search optimization fields
    search_vector TSVECTOR,
    title_search_vector TSVECTOR
);

-- Traditional full-text indexes (limited configuration options)
CREATE INDEX idx_documents_search_vector ON documents USING GIN(search_vector);
CREATE INDEX idx_documents_title_search ON documents USING GIN(title_search_vector);
CREATE INDEX idx_documents_category_search ON documents USING GIN(category, search_vector);  -- needs btree_gin for the varchar column

-- Triggers for maintaining search vectors
CREATE OR REPLACE FUNCTION update_document_search_vector()
RETURNS TRIGGER AS $$
BEGIN
    NEW.search_vector := 
        setweight(to_tsvector(coalesce(NEW.title, '')), 'A') ||
        setweight(to_tsvector(coalesce(NEW.content, '')), 'B') ||
        setweight(to_tsvector(coalesce(array_to_string(NEW.tags, ' '), '')), 'C');

    NEW.title_search_vector := to_tsvector(coalesce(NEW.title, ''));

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tr_update_search_vector
    BEFORE INSERT OR UPDATE ON documents
    FOR EACH ROW EXECUTE FUNCTION update_document_search_vector();

-- Complex text search query with limited ranking capabilities
WITH search_query AS (
    SELECT 
        to_tsquery($1) as query_vector,  -- User search query
        $1 as original_query
),

basic_text_search AS (
    SELECT 
        d.document_id,
        d.title,
        d.content,
        d.author,
        d.category,
        d.tags,
        d.created_at,
        d.word_count,
        sq.original_query,

        -- Basic relevance scoring (limited functionality)
        ts_rank(d.search_vector, sq.query_vector) as basic_rank,
        ts_rank_cd(d.search_vector, sq.query_vector) as cover_density_rank,

        -- Title match scoring
        ts_rank(d.title_search_vector, sq.query_vector) as title_rank,

        -- Query matching analysis
        ts_headline(d.content, sq.query_vector, 'MaxWords=30, MinWords=5') as content_highlight,
        ts_headline(d.title, sq.query_vector) as title_highlight,

        -- Manual relevance factors (limited sophistication)
        CASE 
            WHEN d.category = $2 THEN 0.2  -- Category boost parameter
            ELSE 0.0 
        END as category_boost,

        CASE 
            WHEN d.created_at >= CURRENT_TIMESTAMP - INTERVAL '30 days' THEN 0.1
            WHEN d.created_at >= CURRENT_TIMESTAMP - INTERVAL '90 days' THEN 0.05
            ELSE 0.0
        END as recency_boost,

        -- Tag matching (manual implementation)
        CASE 
            WHEN d.tags && string_to_array($3, ',') THEN 0.1  -- Tag filter parameter
            ELSE 0.0
        END as tag_boost

    FROM documents d
    CROSS JOIN search_query sq
    WHERE d.search_vector @@ sq.query_vector
),

enhanced_ranking AS (
    SELECT 
        bts.*,

        -- Combined relevance scoring (manual calculation)
        (
            basic_rank * 0.4 +                    -- Primary text relevance
            title_rank * 0.3 +                    -- Title match importance
            category_boost +                      -- Category relevance
            recency_boost +                       -- Time decay factor
            tag_boost +                          -- Tag matching

            -- Word count normalization (rough approximation)
            CASE 
                WHEN word_count BETWEEN 300 AND 2000 THEN 0.1
                WHEN word_count BETWEEN 100 AND 300 THEN 0.05
                ELSE 0.0
            END
        ) as composite_relevance_score,

        -- Search quality indicators (limited analysis)
        CASE 
            WHEN basic_rank > 0.1 AND title_rank > 0.1 THEN 'high_relevance'
            WHEN basic_rank > 0.05 THEN 'medium_relevance'
            ELSE 'low_relevance'
        END as relevance_category,

        -- Content analysis (basic)
        LENGTH(content) as content_length,
        ROUND((LENGTH(content) / NULLIF(word_count, 0))::numeric, 1) as avg_word_length

    FROM basic_text_search bts
),

search_results AS (
    SELECT 
        er.*,

        -- Result ranking and grouping
        ROW_NUMBER() OVER (ORDER BY composite_relevance_score DESC) as search_rank,

        -- Diversity scoring (limited implementation)
        ROW_NUMBER() OVER (
            PARTITION BY category 
            ORDER BY composite_relevance_score DESC
        ) as category_rank,

        -- Search metadata
        CURRENT_TIMESTAMP as search_performed_at,
        original_query as search_query

    FROM enhanced_ranking er
    WHERE composite_relevance_score > 0.01  -- Minimum relevance threshold
)

SELECT 
    document_id,
    title,
    CASE 
        WHEN LENGTH(content) > 300 THEN LEFT(content, 300) || '...'
        ELSE content
    END as content_preview,
    author,
    category,
    tags,
    TO_CHAR(created_at, 'YYYY-MM-DD') as published_date,

    -- Relevance and ranking metrics
    ROUND(composite_relevance_score::numeric, 4) as relevance_score,
    search_rank,
    relevance_category,

    -- Search highlights (limited formatting)
    title_highlight,
    content_highlight,

    -- Content characteristics
    word_count,
    content_length,

    -- Quality indicators
    CASE 
        WHEN word_count >= 500 AND composite_relevance_score > 0.1 THEN 'comprehensive_match'
        WHEN title_rank > 0.2 THEN 'title_focused_match'
        WHEN basic_rank > 0.1 THEN 'content_focused_match'
        ELSE 'partial_match'
    END as match_quality,

    -- Search context
    search_performed_at,
    search_query

FROM search_results
WHERE 
    -- Result filtering and diversity constraints
    search_rank <= 50                           -- Limit total results
    AND category_rank <= 5                      -- Max per category for diversity
    AND composite_relevance_score >= 0.02       -- Quality threshold

ORDER BY composite_relevance_score DESC, search_rank
LIMIT 20;

-- Problems with traditional text search approaches:
-- 1. Limited linguistic processing and language-specific features
-- 2. Complex manual relevance scoring with poor ranking algorithms
-- 3. Expensive full-text index maintenance and update overhead
-- 4. Limited support for fuzzy matching and typo tolerance
-- 5. Poor performance with large document collections and complex queries
-- 6. No built-in support for search analytics and query optimization
-- 7. Difficult integration of business logic into ranking algorithms
-- 8. Limited faceted search and filtering capabilities
-- 9. Complex infrastructure required for distributed search scenarios
-- 10. Manual implementation of search features like auto-complete and suggestions

MongoDB Full-Text Search provides native intelligent text search capabilities:

// MongoDB Full-Text Search - native intelligent content discovery with advanced ranking
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('content_management_platform');

// Advanced Full-Text Search Management System
class MongoFullTextSearchManager {
  constructor(db, searchConfig = {}) {
    this.db = db;
    this.config = {
      // Text search configuration
      enableFullTextSearch: searchConfig.enableFullTextSearch !== false,
      defaultLanguage: searchConfig.defaultLanguage || 'en',
      supportedLanguages: searchConfig.supportedLanguages || ['en', 'es', 'fr', 'de'],

      // Index configuration
      textIndexWeights: searchConfig.textIndexWeights || {
        title: 10,
        content: 5,
        tags: 8,
        category: 3,
        description: 6
      },

      // Search optimization
      enableFuzzyMatching: searchConfig.enableFuzzyMatching !== false,
      enableAutoComplete: searchConfig.enableAutoComplete !== false,
      defaultSearchLimit: searchConfig.defaultSearchLimit || 20,
      maxSearchResults: searchConfig.maxSearchResults || 100,

      // Relevance scoring
      enableAdvancedScoring: searchConfig.enableAdvancedScoring !== false,
      scoringWeights: searchConfig.scoringWeights || {
        textScore: 0.4,
        titleBoost: 0.3,
        recencyFactor: 0.1,
        popularityBoost: 0.1,
        categoryBoost: 0.1
      },

      // Performance optimization
      enableSearchCache: searchConfig.enableSearchCache !== false,
      enableSearchAnalytics: searchConfig.enableSearchAnalytics !== false,

      ...searchConfig
    };

    // Collection references
    this.collections = {
      documents: db.collection('documents'),
      searchQueries: db.collection('search_queries'),
      searchAnalytics: db.collection('search_analytics'),
      searchSuggestions: db.collection('search_suggestions')
    };

    this.initializeFullTextSearch();
  }

  async initializeFullTextSearch() {
    console.log('Initializing MongoDB Full-Text Search system...');

    try {
      // Create optimized text indexes
      await this.createTextIndexes();

      // Setup search analytics infrastructure
      await this.setupSearchAnalytics();

      // Initialize auto-complete and suggestions
      await this.setupSearchSuggestions();

      console.log('Full-Text Search system initialized successfully');

    } catch (error) {
      console.error('Error initializing full-text search:', error);
      throw error;
    }
  }

  async createTextIndexes() {
    console.log('Creating optimized full-text search indexes...');

    const documentsCollection = this.collections.documents;

    try {
      // Comprehensive text index with weighted fields
      await documentsCollection.createIndex(
        {
          title: 'text',
          content: 'text',
          tags: 'text',
          category: 'text',
          description: 'text',
          author: 'text'
        },
        {
          name: 'comprehensive_text_index',
          weights: this.config.textIndexWeights,
          default_language: this.config.defaultLanguage,
          language_override: 'language',
          background: true
        }
      );

      // Note: a collection can have at most one text index, so a separate
      // category-specific text index cannot coexist with the comprehensive index above.
      // Category-scoped searches combine that single text index with the category
      // filter and the supporting indexes created below.

      // Supporting indexes for search optimization
      await documentsCollection.createIndexes([
        {
          key: { category: 1, createdAt: -1 },
          name: 'category_date_index',
          background: true
        },
        {
          key: { author: 1, createdAt: -1 },
          name: 'author_date_index', 
          background: true
        },
        {
          key: { tags: 1, createdAt: -1 },
          name: 'tags_date_index',
          background: true
        },
        {
          key: { 'analytics.viewCount': -1 },
          name: 'popularity_index',
          background: true
        }
      ]);

      console.log('✅ Full-text search indexes created successfully');

    } catch (error) {
      console.error('Error creating text indexes:', error);
      throw error;
    }
  }

  async performIntelligentTextSearch(searchQuery, options = {}) {
    console.log(`Performing intelligent text search for: "${searchQuery}"`);

    const searchStartTime = Date.now();

    try {
      const searchConfig = {
        query: searchQuery,
        language: options.language || this.config.defaultLanguage,
        categoryFilter: options.categoryFilter,
        authorFilter: options.authorFilter,
        dateRange: options.dateRange,
        tagFilter: options.tagFilter,
        limit: Math.min(options.limit || this.config.defaultSearchLimit, this.config.maxSearchResults),
        enableFuzzy: options.enableFuzzy !== false,
        sortBy: options.sortBy || 'relevance',
        includeHighlights: options.includeHighlights !== false,
        enableFacets: options.enableFacets !== false
      };

      // Build comprehensive search aggregation pipeline
      const searchPipeline = [
        // Stage 1: Text search with scoring
        {
          $match: {
            $text: {
              $search: searchConfig.query,
              $language: searchConfig.language,
              ...(searchConfig.enableFuzzy && { 
                $caseSensitive: false,
                $diacriticSensitive: false 
              })
            },

            // Apply filters
            ...(searchConfig.categoryFilter && { 
              category: { $in: Array.isArray(searchConfig.categoryFilter) ? searchConfig.categoryFilter : [searchConfig.categoryFilter] }
            }),
            ...(searchConfig.authorFilter && { author: searchConfig.authorFilter }),
            ...(searchConfig.dateRange && {
              createdAt: {
                $gte: new Date(searchConfig.dateRange.start),
                $lte: new Date(searchConfig.dateRange.end)
              }
            }),
            ...(searchConfig.tagFilter && { 
              tags: { $in: Array.isArray(searchConfig.tagFilter) ? searchConfig.tagFilter : [searchConfig.tagFilter] }
            }),

            // Quality filters
            isPublished: true,
            isActive: { $ne: false }
          }
        },

        // Stage 2: Add text score and metadata
        {
          $addFields: {
            textScore: { $meta: 'textScore' },
            searchMetadata: {
              query: searchConfig.query,
              searchTime: new Date(),
              language: searchConfig.language
            }
          }
        },

        // Stage 3: Enhanced relevance scoring
        {
          $addFields: {
            // Advanced relevance calculation
            relevanceScore: {
              $add: [
                // Primary text score component
                {
                  $multiply: [
                    { $meta: 'textScore' },
                    this.config.scoringWeights.textScore
                  ]
                },

                // Title boost (if query terms appear in title)
                {
                  $cond: [
                    {
                      $regexMatch: {
                        input: { $toLower: '$title' },
                        regex: { $toLower: searchConfig.query },
                        options: 'i'
                      }
                    },
                    this.config.scoringWeights.titleBoost,
                    0
                  ]
                },

                // Recency factor (newer content gets boost)
                {
                  $multiply: [
                    {
                      $max: [
                        0,
                        {
                          $subtract: [
                            1,
                            {
                              $divide: [
                                { $subtract: [new Date(), '$createdAt'] },
                                365 * 24 * 60 * 60 * 1000  // 1 year in milliseconds
                              ]
                            }
                          ]
                        }
                      ]
                    },
                    this.config.scoringWeights.recencyFactor
                  ]
                },

                // Popularity boost (view count factor)
                {
                  $multiply: [
                    {
                      $min: [
                        1,
                        {
                          $divide: [
                            { $ifNull: ['$analytics.viewCount', 0] },
                            1000
                          ]
                        }
                      ]
                    },
                    this.config.scoringWeights.popularityBoost
                  ]
                },

                // Category match boost
                {
                  $cond: [
                    { $in: ['$category', searchConfig.categoryFilter || []] },
                    this.config.scoringWeights.categoryBoost,
                    0
                  ]
                }
              ]
            },

            // Content quality indicators
            contentQuality: {
              $switch: {
                branches: [
                  {
                    case: { 
                      $and: [
                        { $gte: ['$wordCount', 1000] },
                        { $gte: [{ $ifNull: ['$analytics.averageRating', 0] }, 4] }
                      ]
                    },
                    then: 'high_quality'
                  },
                  {
                    case: {
                      $and: [
                        { $gte: ['$wordCount', 300] },
                        { $gte: [{ $ifNull: ['$analytics.averageRating', 0] }, 3] }
                      ]
                    },
                    then: 'good_quality'
                  },
                  {
                    case: { $gte: ['$wordCount', 100] },
                    then: 'basic_quality'
                  }
                ],
                default: 'short_content'
              }
            },

            // Search match type classification
            matchType: {
              $switch: {
                branches: [
                  {
                    case: {
                      $regexMatch: {
                        input: { $toLower: '$title' },
                        regex: { $toLower: searchConfig.query },
                        options: 'i'
                      }
                    },
                    then: 'title_match'
                  },
                  {
                    case: { $gte: [{ $meta: 'textScore' }, 2.0] },
                    then: 'strong_content_match'
                  },
                  {
                    case: { $gte: [{ $meta: 'textScore' }, 1.0] },
                    then: 'good_content_match'
                  }
                ],
                default: 'weak_content_match'
              }
            }
          }
        },

        // Stage 4: Content analysis and enrichment
        {
          $lookup: {
            from: 'user_interactions',
            let: { docId: '$_id' },
            pipeline: [
              {
                $match: {
                  $expr: {
                    $and: [
                      { $eq: ['$documentId', '$$docId'] },
                      { $gte: ['$interactionDate', { $subtract: [new Date(), 30 * 24 * 60 * 60 * 1000] }] }
                    ]
                  }
                }
              },
              {
                $group: {
                  _id: null,
                  recentViews: { $sum: 1 },
                  averageRating: { $avg: '$rating' },
                  uniqueUsers: { $addToSet: '$userId' }
                }
              }
            ],
            as: 'recentEngagement'
          }
        },

        // Stage 5: Related content discovery
        {
          $lookup: {
            from: 'documents',
            let: {
              currentCategory: '$category',
              currentTags: '$tags',
              currentId: '$_id'
            },
            pipeline: [
              {
                $match: {
                  $expr: {
                    $and: [
                      { $ne: ['$_id', '$$currentId'] },
                      { $eq: ['$isPublished', true] },
                      {
                        $or: [
                          { $eq: ['$category', '$$currentCategory'] },
                          { $gt: [{ $size: { $setIntersection: ['$tags', '$$currentTags'] } }, 0] }
                        ]
                      }
                    ]
                  }
                }
              },
              {
                $sample: { size: 3 }
              },
              {
                $project: {
                  _id: 1,
                  title: 1,
                  category: 1
                }
              }
            ],
            as: 'relatedContent'
          }
        },

        // Stage 6: Generate search highlights
        ...(searchConfig.includeHighlights ? [
          {
            $addFields: {
              searchHighlights: {
                title: {
                  $regexFind: {
                    input: '$title',
                    regex: searchConfig.query,
                    options: 'i'
                  }
                },
                contentSnippet: {
                  $let: {
                    vars: {
                      contentLower: { $toLower: '$content' },
                      queryLower: { $toLower: searchConfig.query }
                    },
                    in: {
                      $cond: [
                        { $gt: [{ $indexOfCP: ['$$contentLower', '$$queryLower'] }, -1] },
                        {
                          $let: {
                            vars: {
                              matchIndex: { $indexOfCP: ['$$contentLower', '$$queryLower'] },
                              snippetStart: {
                                $max: [
                                  0,
                                  { $subtract: [{ $indexOfCP: ['$$contentLower', '$$queryLower'] }, 50] }
                                ]
                              }
                            },
                            in: {
                              $concat: [
                                { $cond: [{ $gt: ['$$snippetStart', 0] }, '...', ''] },
                                { $substrCP: ['$content', '$$snippetStart', 200] },
                                '...'
                              ]
                            }
                          }
                        },
                        { $substrCP: ['$content', 0, 200] }
                      ]
                    }
                  }
                }
              }
            }
          }
        ] : []),

        // Stage 7: Final enrichment and formatting
        {
          $addFields: {
            // Engagement metrics
            engagementMetrics: {
              $cond: [
                { $gt: [{ $size: '$recentEngagement' }, 0] },
                {
                  recentViews: { $arrayElemAt: ['$recentEngagement.recentViews', 0] },
                  averageRating: { $arrayElemAt: ['$recentEngagement.averageRating', 0] },
                  uniqueUsers: { $size: { $arrayElemAt: ['$recentEngagement.uniqueUsers', 0] } }
                },
                {
                  recentViews: 0,
                  averageRating: null,
                  uniqueUsers: 0
                }
              ]
            },

            // Search result metadata
            searchResultMetadata: {
              rank: 0,  // Placeholder - final rank is assigned after results are retrieved
              resultType: 'standard',
              searchAlgorithm: 'mongodb_text_search_v1'
            }
          }
        },

        // Stage 8: Final projection and cleanup
        {
          $project: {
            // Core document information
            title: 1,
            author: 1,
            category: 1,
            tags: 1,
            createdAt: 1,
            updatedAt: 1,
            language: 1,

            // Content preview
            contentPreview: {
              $cond: [
                searchConfig.includeHighlights,
                '$searchHighlights.contentSnippet',
                { $substrCP: ['$content', 0, 200] }
              ]
            },

            // Search relevance metrics
            textScore: { $round: ['$textScore', 4] },
            relevanceScore: { $round: ['$relevanceScore', 4] },
            matchType: 1,
            contentQuality: 1,

            // Content metrics (analytics fields retained so popularity/rating sorts work downstream)
            wordCount: 1,
            'analytics.viewCount': 1,
            'analytics.averageRating': 1,

            // Engagement and quality indicators
            engagementMetrics: 1,

            // Related content
            relatedContent: 1,

            // Search highlights
            ...(searchConfig.includeHighlights && { searchHighlights: 1 }),

            // Metadata
            searchMetadata: 1,
            searchResultMetadata: 1
          }
        },

        // Stage 9: Sorting based on configuration (unrecognized values fall back to relevance)
        {
          $sort:
            searchConfig.sortBy === 'date' ? { createdAt: -1 } :
            searchConfig.sortBy === 'popularity' ? { 'analytics.viewCount': -1 } :
            searchConfig.sortBy === 'rating' ? { 'analytics.averageRating': -1 } :
            { relevanceScore: -1, textScore: -1 }
        },

        // Stage 10: Limit results (final rank is assigned after retrieval)
        {
          $limit: searchConfig.limit
        }
      ];

      // Execute search pipeline
      const searchResults = await this.collections.documents
        .aggregate(searchPipeline, {
          allowDiskUse: true,
          maxTimeMS: 30000
        })
        .toArray();

      // Assign the final search rank from the sorted result order
      searchResults.forEach((result, index) => {
        result.searchResultMetadata.rank = index + 1;
      });

      const searchLatency = Date.now() - searchStartTime;

      // Generate search facets if requested
      let facets = null;
      if (searchConfig.enableFacets) {
        facets = await this.generateSearchFacets(searchQuery, searchConfig);
      }

      // Log search analytics
      await this.logSearchAnalytics({
        query: searchConfig.query,
        resultsCount: searchResults.length,
        searchLatency: searchLatency,
        searchConfig: searchConfig,
        timestamp: new Date()
      });

      console.log(`✅ Text search completed: ${searchResults.length} results in ${searchLatency}ms`);

      return {
        success: true,
        query: searchConfig.query,
        results: searchResults,
        facets: facets,
        searchMetadata: {
          latency: searchLatency,
          resultsCount: searchResults.length,
          language: searchConfig.language,
          algorithm: 'mongodb_text_search_v1'
        }
      };

    } catch (error) {
      console.error('Error performing text search:', error);
      const searchLatency = Date.now() - searchStartTime;

      return {
        success: false,
        error: error.message,
        searchMetadata: {
          latency: searchLatency,
          resultsCount: 0
        }
      };
    }
  }

  async generateSearchFacets(searchQuery, searchConfig) {
    console.log('Generating search facets...');

    try {
      const facetPipeline = [
        // Match documents that would appear in search results
        {
          $match: {
            $text: {
              $search: searchQuery,
              $language: searchConfig.language
            },
            isPublished: true,
            isActive: { $ne: false }
          }
        },

        // Generate facet aggregations
        {
          $facet: {
            // Category distribution
            categories: [
              {
                $group: {
                  _id: '$category',
                  count: { $sum: 1 },
                  averageRelevance: { $avg: { $meta: 'textScore' } }
                }
              },
              {
                $sort: { count: -1 }
              },
              {
                $limit: 10
              }
            ],

            // Author distribution
            authors: [
              {
                $group: {
                  _id: '$author',
                  count: { $sum: 1 },
                  latestDocument: { $max: '$createdAt' }
                }
              },
              {
                $sort: { count: -1 }
              },
              {
                $limit: 10
              }
            ],

            // Tag distribution
            tags: [
              {
                $unwind: '$tags'
              },
              {
                $group: {
                  _id: '$tags',
                  count: { $sum: 1 }
                }
              },
              {
                $sort: { count: -1 }
              },
              {
                $limit: 15
              }
            ],

            // Date range distribution
            dateRanges: [
              {
                $group: {
                  _id: {
                    $switch: {
                      branches: [
                        {
                          case: { $gte: ['$createdAt', { $subtract: [new Date(), 7 * 24 * 60 * 60 * 1000] }] },
                          then: 'last_week'
                        },
                        {
                          case: { $gte: ['$createdAt', { $subtract: [new Date(), 30 * 24 * 60 * 60 * 1000] }] },
                          then: 'last_month'
                        },
                        {
                          case: { $gte: ['$createdAt', { $subtract: [new Date(), 90 * 24 * 60 * 60 * 1000] }] },
                          then: 'last_quarter'
                        },
                        {
                          case: { $gte: ['$createdAt', { $subtract: [new Date(), 365 * 24 * 60 * 60 * 1000] }] },
                          then: 'last_year'
                        }
                      ],
                      default: 'older'
                    }
                  },
                  count: { $sum: 1 }
                }
              }
            ],

            // Content quality distribution
            contentQuality: [
              {
                $group: {
                  _id: {
                    $switch: {
                      branches: [
                        {
                          case: { $gte: ['$wordCount', 1000] },
                          then: 'comprehensive'
                        },
                        {
                          case: { $gte: ['$wordCount', 500] },
                          then: 'detailed'
                        },
                        {
                          case: { $gte: ['$wordCount', 200] },
                          then: 'standard'
                        }
                      ],
                      default: 'brief'
                    }
                  },
                  count: { $sum: 1 }
                }
              }
            ]
          }
        }
      ];

      const facetResults = await this.collections.documents
        .aggregate(facetPipeline)
        .toArray();

      return facetResults[0];

    } catch (error) {
      console.error('Error generating search facets:', error);
      return null;
    }
  }

  async generateAutoCompleteData(query, limit = 10) {
    console.log(`Generating auto-complete suggestions for: "${query}"`);

    try {
      // Use text search with partial matching for auto-complete
      const autoCompletePipeline = [
        {
          $match: {
            $or: [
              { title: { $regex: query, $options: 'i' } },
              { tags: { $regex: query, $options: 'i' } },
              { category: { $regex: query, $options: 'i' } }
            ],
            isPublished: true
          }
        },
        {
          $group: {
            _id: null,
            titleSuggestions: {
              $addToSet: {
                $cond: [
                  { $regexMatch: { input: '$title', regex: query, options: 'i' } },
                  '$title',
                  null
                ]
              }
            },
            tagSuggestions: { $addToSet: '$tags' },
            categorySuggestions: { $addToSet: '$category' }
          }
        },
        {
          $project: {
            suggestions: {
              $setUnion: [
                { $filter: { input: '$titleSuggestions', cond: { $ne: ['$$this', null] } } },
                { $reduce: {
                  input: '$tagSuggestions',
                  initialValue: [],
                  in: { $concatArrays: ['$$value', '$$this'] }
                }},
                '$categorySuggestions'
              ]
            }
          }
        },
        {
          $unwind: '$suggestions'
        },
        {
          $match: {
            suggestions: { $regex: query, $options: 'i' }
          }
        },
        {
          $group: {
            _id: '$suggestions',
            relevance: { $sum: 1 }
          }
        },
        {
          $sort: { relevance: -1 }
        },
        {
          $limit: limit
        }
      ];

      const autoCompleteResults = await this.collections.documents
        .aggregate(autoCompletePipeline)
        .toArray();

      return {
        suggestions: autoCompleteResults.map(result => ({
          text: result._id,
          relevance: result.relevance
        }))
      };

    } catch (error) {
      console.error('Error generating auto-complete suggestions:', error);
      return { suggestions: [] };
    }
  }

  async setupSearchAnalytics() {
    console.log('Setting up search analytics infrastructure...');

    try {
      // Create indexes for search analytics
      await this.collections.searchAnalytics.createIndexes([
        {
          key: { timestamp: -1 },
          name: 'timestamp_index',
          background: true
        },
        {
          key: { 'query': 1, timestamp: -1 },
          name: 'query_time_index',
          background: true
        },
        {
          key: { resultsCount: 1, searchLatency: 1 },
          name: 'performance_index',
          background: true
        }
      ]);

      console.log('✅ Search analytics infrastructure setup completed');

    } catch (error) {
      console.error('Error setting up search analytics:', error);
      throw error;
    }
  }

  async setupSearchSuggestions() {
    console.log('Setting up search suggestions system...');

    try {
      // Create collection for search suggestions with text index
      await this.collections.searchSuggestions.createIndex(
        { suggestion: 'text' },
        {
          name: 'suggestion_text_index',
          background: true
        }
      );

      console.log('✅ Search suggestions system setup completed');

    } catch (error) {
      console.error('Error setting up search suggestions:', error);
    }
  }

  async logSearchAnalytics(searchData) {
    try {
      await this.collections.searchAnalytics.insertOne({
        ...searchData,
        createdAt: new Date()
      });
    } catch (error) {
      console.error('Error logging search analytics:', error);
    }
  }

  async getSearchPerformanceReport(timeRange = '24h') {
    console.log('Generating search performance report...');

    try {
      const timeRangeMs = this.parseTimeRange(timeRange);
      const startTime = new Date(Date.now() - timeRangeMs);

      const performanceReport = await this.collections.searchAnalytics.aggregate([
        {
          $match: {
            timestamp: { $gte: startTime }
          }
        },
        {
          $group: {
            _id: null,
            totalSearches: { $sum: 1 },
            averageLatency: { $avg: '$searchLatency' },
            medianLatency: { $percentile: { input: '$searchLatency', p: [0.5], method: 'approximate' } },  // requires MongoDB 7.0+
            p95Latency: { $percentile: { input: '$searchLatency', p: [0.95], method: 'approximate' } },
            averageResultsCount: { $avg: '$resultsCount' },
            uniqueQueries: { $addToSet: '$query' },
            zeroResultQueries: {
              $sum: {
                $cond: [{ $eq: ['$resultsCount', 0] }, 1, 0]
              }
            }
          }
        },
        {
          $project: {
            totalSearches: 1,
            averageLatency: { $round: ['$averageLatency', 2] },
            medianLatency: { $round: [{ $arrayElemAt: ['$medianLatency', 0] }, 2] },
            p95Latency: { $round: [{ $arrayElemAt: ['$p95Latency', 0] }, 2] },
            averageResultsCount: { $round: ['$averageResultsCount', 1] },
            uniqueQueryCount: { $size: '$uniqueQueries' },
            zeroResultRate: {
              $round: [
                { $multiply: [{ $divide: ['$zeroResultQueries', '$totalSearches'] }, 100] },
                2
              ]
            }
          }
        }
      ]).toArray();

      return performanceReport[0] || {};

    } catch (error) {
      console.error('Error generating search performance report:', error);
      return {};
    }
  }

  parseTimeRange(timeRange) {
    const timeMap = {
      '1h': 60 * 60 * 1000,
      '6h': 6 * 60 * 60 * 1000,
      '24h': 24 * 60 * 60 * 1000,
      '7d': 7 * 24 * 60 * 60 * 1000,
      '30d': 30 * 24 * 60 * 60 * 1000
    };
    return timeMap[timeRange] || timeMap['24h'];
  }

  async shutdown() {
    console.log('Shutting down Full-Text Search manager...');

    try {
      if (this.client) {
        await this.client.close();
      }
      console.log('Full-Text Search manager shutdown completed');
    } catch (error) {
      console.error('Error during shutdown:', error);
    }
  }
}

// Benefits of MongoDB Full-Text Search:
// - Native text indexing with advanced linguistic processing and language support
// - Intelligent relevance scoring with customizable ranking algorithms
// - High-performance text queries with automatic optimization and caching
// - Advanced search features including fuzzy matching, auto-complete, and faceted search
// - Seamless integration with existing MongoDB data and operations
// - Real-time search analytics and query performance monitoring
// - Flexible scoring mechanisms incorporating business logic and user behavior
// - Multi-language support with language-specific stemming and stop word processing
// - SQL-compatible text search operations through QueryLeaf integration
// - Enterprise-ready scalability with distributed search capabilities

module.exports = {
  MongoFullTextSearchManager
};

Understanding MongoDB Full-Text Search Architecture

Advanced Search Patterns for Content Management

MongoDB Full-Text Search enables sophisticated content discovery patterns for modern applications:

// Enterprise Content Discovery Platform with Advanced Search Capabilities
class EnterpriseContentSearchPlatform extends MongoFullTextSearchManager {
  constructor(db, enterpriseConfig) {
    super(db, enterpriseConfig);

    this.enterpriseConfig = {
      ...enterpriseConfig,
      enableSemanticSearch: true,
      enablePersonalization: true,
      enableSearchInsights: true,
      enableContentRecommendations: true
    };

    this.setupEnterpriseSearchCapabilities();
  }

  async implementAdvancedSearchPatterns() {
    console.log('Implementing enterprise search patterns...');

    const searchPatterns = {
      // Semantic search enhancement
      semanticSearchEnhancement: {
        enableConceptualMatching: true,
        enableEntityRecognition: true,
        enableTopicModeling: true,
        enableContextualSearch: true
      },

      // Personalized search results
      personalizedSearch: {
        userProfileIntegration: true,
        behaviorBasedRanking: true,
        preferenceBasedFiltering: true,
        collaborativeFiltering: true
      },

      // Content recommendation engine
      contentRecommendations: {
        similarContentDiscovery: true,
        trendingContentIdentification: true,
        relatedTopicSuggestions: true,
        expertiseBasedRecommendations: true
      },

      // Search quality optimization
      searchQualityOptimization: {
        queryUnderstandingEnhancement: true,
        resultDiversification: true,
        relevanceFeedbackIntegration: true,
        searchResultOptimization: true
      }
    };

    return await this.deployEnterpriseSearchPatterns(searchPatterns);
  }

  async implementPersonalizedSearch(userId, searchQuery, searchOptions = {}) {
    console.log(`Implementing personalized search for user: ${userId}`);

    // Get user search profile and preferences
    const userProfile = await this.getUserSearchProfile(userId);

    // Enhance search with personalization
    const personalizedSearchOptions = {
      ...searchOptions,
      userProfile: userProfile,
      personalizedScoring: true,
      contentPreferences: userProfile.contentPreferences,
      expertiseLevel: userProfile.expertiseLevel,
      preferredAuthors: userProfile.preferredAuthors,
      preferredCategories: userProfile.preferredCategories
    };

    // Execute personalized search
    const personalizedResults = await this.performIntelligentTextSearch(
      searchQuery, 
      personalizedSearchOptions
    );

    // Update user search behavior
    await this.updateUserSearchBehavior(userId, searchQuery, personalizedResults);

    return personalizedResults;
  }

  async getUserSearchProfile(userId) {
    // Aggregate user search behavior and preferences
    const userProfilePipeline = [
      {
        $match: {
          userId: userId,
          timestamp: {
            $gte: new Date(Date.now() - 90 * 24 * 60 * 60 * 1000) // Last 90 days
          }
        }
      },
      {
        $group: {
          _id: '$userId',
          searchQueries: { $push: '$query' },
          clickedResults: { $push: '$clickedDocuments' },
          preferredCategories: { $push: '$categoryInteractions' },
          searchPatterns: { $push: '$searchMetadata' }
        }
      }
    ];

    const profileData = await this.collections.searchAnalytics
      .aggregate(userProfilePipeline)
      .toArray();

    return this.buildUserSearchProfile(profileData[0] || {});
  }

  buildUserSearchProfile(rawProfileData) {
    return {
      contentPreferences: this.extractContentPreferences(rawProfileData),
      expertiseLevel: this.assessExpertiseLevel(rawProfileData),
      preferredAuthors: this.identifyPreferredAuthors(rawProfileData),
      preferredCategories: this.identifyPreferredCategories(rawProfileData),
      searchBehaviorPatterns: this.analyzeSearchPatterns(rawProfileData)
    };
  }
}

SQL-Style Full-Text Search Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Full-Text Search operations:

-- QueryLeaf full-text search operations with SQL-familiar syntax

-- Create full-text searchable table
CREATE TABLE content_documents (
    document_id UUID PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    description TEXT,
    author VARCHAR(200),
    category VARCHAR(100),
    tags TEXT[],
    language VARCHAR(10) DEFAULT 'en',

    -- Content metadata
    word_count INTEGER,
    reading_time_minutes INTEGER,
    content_type VARCHAR(50),

    -- Publishing information
    published_date DATE,
    is_published BOOLEAN DEFAULT false,
    is_featured BOOLEAN DEFAULT false,

    -- Analytics
    view_count BIGINT DEFAULT 0,
    like_count BIGINT DEFAULT 0,
    share_count BIGINT DEFAULT 0,
    average_rating DECIMAL(3,2),

    -- Timestamps
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP

) WITH (
    -- MongoDB full-text search configuration
    text_indexes = [
        {
            name: 'comprehensive_text_search',
            fields: {
                title: 10,      -- Higher weight for title matches
                content: 5,     -- Standard weight for content
                description: 8, -- High weight for descriptions
                tags: 7,        -- High weight for tag matches
                category: 3,    -- Lower weight for category
                author: 2       -- Lowest weight for author name
            },
            language: 'en',
            language_override: 'language'
        },
        {
            name: 'category_focused_search',
            fields: {
                title: 'text',
                content: 'text'
            },
            filters: ['category']
        }
    ]
);

-- Advanced full-text search with relevance scoring
WITH intelligent_search_results AS (
    SELECT 
        document_id,
        title,
        content,
        description,
        author,
        category,
        tags,
        published_date,
        word_count,
        view_count,
        average_rating,

        -- MongoDB text search score
        TEXT_SEARCH_SCORE() as text_relevance_score,

        -- Enhanced relevance calculation
        (
            -- Base text relevance (40%)
            TEXT_SEARCH_SCORE() * 0.4 +

            -- Title match bonus (30%)
            CASE 
                WHEN LOWER(title) LIKE '%' || LOWER($1) || '%' THEN 0.3
                WHEN title ILIKE '%' || $1 || '%' THEN 0.2  -- Case insensitive partial match
                ELSE 0.0
            END +

            -- Recency factor (10%)
            CASE 
                WHEN published_date >= CURRENT_DATE - INTERVAL '30 days' THEN 0.1
                WHEN published_date >= CURRENT_DATE - INTERVAL '90 days' THEN 0.05
                ELSE 0.0
            END +

            -- Popularity boost (10%)
            LEAST(0.1, (view_count / 10000.0)) +

            -- Quality indicator (10%)
            CASE 
                WHEN average_rating >= 4.5 THEN 0.1
                WHEN average_rating >= 4.0 THEN 0.05
                WHEN average_rating >= 3.5 THEN 0.025
                ELSE 0.0
            END
        ) as enhanced_relevance_score,

        -- Content quality assessment
        CASE 
            WHEN word_count >= 1500 AND average_rating >= 4.0 THEN 'comprehensive_high_quality'
            WHEN word_count >= 800 AND average_rating >= 3.5 THEN 'detailed_good_quality'
            WHEN word_count >= 300 AND average_rating >= 3.0 THEN 'standard_quality'
            WHEN word_count >= 100 THEN 'brief_content'
            ELSE 'minimal_content'
        END as content_quality_tier,

        -- Match type classification
        CASE 
            WHEN LOWER(title) LIKE '%' || LOWER($1) || '%' THEN 'title_match'
            WHEN TEXT_SEARCH_SCORE() > 2.0 THEN 'strong_content_match'
            WHEN TEXT_SEARCH_SCORE() > 1.0 THEN 'good_content_match'
            ELSE 'weak_content_match'
        END as match_type

    FROM content_documents
    WHERE 
        -- Full-text search condition
        TEXT_SEARCH(title, content, description, tags, $1) -- $1 = search query

        -- Quality and availability filters
        AND is_published = true
        AND word_count >= 50  -- Minimum content length

        -- Optional category filter
        AND ($2 IS NULL OR category = $2)  -- $2 = optional category filter

        -- Optional date range filter  
        AND ($3 IS NULL OR published_date >= $3::DATE)  -- $3 = optional start date
        AND ($4 IS NULL OR published_date <= $4::DATE)  -- $4 = optional end date
),

search_with_highlights AS (
    SELECT 
        isr.*,

        -- Generate search highlights
        TEXT_HIGHLIGHT(title, $1, 'MaxWords=10') as title_highlight,
        TEXT_HIGHLIGHT(content, $1, 'MaxWords=30, MinWords=10') as content_highlight,
        TEXT_HIGHLIGHT(description, $1, 'MaxWords=20') as description_highlight,

        -- Content preview generation
        CASE 
            WHEN LENGTH(content) <= 200 THEN content
            WHEN POSITION(LOWER($1) IN LOWER(content)) > 0 THEN
                -- Extract snippet around search term
                SUBSTRING(
                    content, 
                    GREATEST(1, POSITION(LOWER($1) IN LOWER(content)) - 50), 
                    200
                ) || '...'
            ELSE 
                LEFT(content, 200) || '...'
        END as content_preview,

        -- Tag matching analysis
        ARRAY(
            SELECT tag 
            FROM UNNEST(tags) AS tag 
            WHERE tag ILIKE '%' || $1 || '%'
        ) as matching_tags,

        -- Related content scoring
        ARRAY_LENGTH(
            ARRAY(
                SELECT tag 
                FROM UNNEST(tags) AS tag 
                WHERE tag ILIKE '%' || $1 || '%'
            ),
            1
        ) as tag_match_count

    FROM intelligent_search_results isr
),

search_analytics AS (
    SELECT 
        swh.*,

        -- Search result ranking
        ROW_NUMBER() OVER (ORDER BY enhanced_relevance_score DESC, text_relevance_score DESC) as search_rank,

        -- Category diversity ranking
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY enhanced_relevance_score DESC) as category_rank,

        -- Author diversity ranking  
        ROW_NUMBER() OVER (PARTITION BY author ORDER BY enhanced_relevance_score DESC) as author_rank,

        -- Quality tier ranking
        ROW_NUMBER() OVER (PARTITION BY content_quality_tier ORDER BY enhanced_relevance_score DESC) as quality_tier_rank,

        -- Engagement metrics
        (view_count + like_count * 2 + share_count * 3) as engagement_score,

        -- Search confidence scoring
        CASE 
            WHEN enhanced_relevance_score >= 1.5 THEN 'high_confidence'
            WHEN enhanced_relevance_score >= 0.8 THEN 'medium_confidence'
            WHEN enhanced_relevance_score >= 0.4 THEN 'low_confidence'
            ELSE 'very_low_confidence'
        END as search_confidence

    FROM search_with_highlights swh
)

SELECT 
    document_id,
    title,
    content_preview,
    description,
    author,
    category,
    tags,
    TO_CHAR(published_date, 'YYYY-MM-DD') as published_date,

    -- Search relevance metrics
    ROUND(text_relevance_score::NUMERIC, 4) as text_score,
    ROUND(enhanced_relevance_score::NUMERIC, 4) as relevance_score,
    search_rank,
    match_type,
    search_confidence,

    -- Content characteristics
    word_count,
    content_quality_tier,
    ROUND((word_count / NULLIF(reading_time_minutes, 0))::NUMERIC, 0) as reading_speed_wpm,

    -- Search highlights
    title_highlight,
    content_highlight,
    description_highlight,

    -- Tag analysis
    matching_tags,
    tag_match_count,

    -- Engagement and quality
    view_count,
    like_count,
    share_count,
    average_rating,
    engagement_score,

    -- Diversity indicators
    category_rank,
    author_rank,
    quality_tier_rank,

    -- Search metadata
    CASE 
        WHEN search_confidence = 'high_confidence' THEN 'Excellent match for your search'
        WHEN match_type = 'title_match' THEN 'Title contains your search terms'
        WHEN content_quality_tier = 'comprehensive_high_quality' THEN 'Comprehensive, high-quality content'
        WHEN engagement_score > 100 THEN 'Popular content with high engagement'
        ELSE 'Relevant match found'
    END as result_description,

    CURRENT_TIMESTAMP as search_performed_at

FROM search_analytics
WHERE 
    -- Result quality thresholds
    enhanced_relevance_score >= 0.2
    AND text_relevance_score >= 0.5

    -- Diversity constraints for better result variety
    AND category_rank <= 3          -- Max 3 results per category
    AND author_rank <= 2            -- Max 2 results per author  
    AND quality_tier_rank <= 5      -- Max 5 per quality tier

ORDER BY 
    enhanced_relevance_score DESC,
    search_rank ASC,
    engagement_score DESC
LIMIT 20;

-- Search faceting for advanced filtering
-- (intelligent_search_results below refers to the CTE defined in the previous query;
--  as a standalone statement it would need that CTE repeated or materialized)
WITH search_facets AS (
    SELECT 
        category,
        author,
        content_quality_tier,
        EXTRACT(YEAR FROM published_date) as publication_year,
        COUNT(*) as result_count,
        AVG(enhanced_relevance_score) as avg_relevance,
        MAX(enhanced_relevance_score) as max_relevance

    FROM intelligent_search_results
    GROUP BY category, author, content_quality_tier, EXTRACT(YEAR FROM published_date)
    HAVING COUNT(*) >= 2  -- Minimum results threshold
)

SELECT 
    'category' as facet_type,
    category as facet_value,
    result_count,
    ROUND(avg_relevance::NUMERIC, 3) as avg_relevance_score,
    ROUND(max_relevance::NUMERIC, 3) as max_relevance_score
FROM search_facets
WHERE category IS NOT NULL

UNION ALL

SELECT 
    'author' as facet_type,
    author as facet_value,
    result_count,
    ROUND(avg_relevance::NUMERIC, 3) as avg_relevance_score,
    ROUND(max_relevance::NUMERIC, 3) as max_relevance_score
FROM search_facets  
WHERE author IS NOT NULL AND result_count >= 3

UNION ALL

SELECT 
    'content_quality' as facet_type,
    content_quality_tier as facet_value,
    result_count,
    ROUND(avg_relevance::NUMERIC, 3) as avg_relevance_score,
    ROUND(max_relevance::NUMERIC, 3) as max_relevance_score
FROM search_facets
WHERE content_quality_tier IS NOT NULL

UNION ALL

SELECT 
    'publication_year' as facet_type,
    publication_year::TEXT as facet_value,
    result_count,
    ROUND(avg_relevance::NUMERIC, 3) as avg_relevance_score,
    ROUND(max_relevance::NUMERIC, 3) as max_relevance_score
FROM search_facets
WHERE publication_year IS NOT NULL

ORDER BY facet_type, result_count DESC;

-- Search analytics and insights
CREATE VIEW search_performance_insights AS
WITH search_statistics AS (
    SELECT 
        DATE_TRUNC('day', search_performed_at) as search_date,
        search_query,
        COUNT(*) as search_frequency,
        AVG(results_count) as avg_results_returned,
        AVG(search_latency) as avg_search_latency,

        -- Query analysis
        LENGTH(search_query) as query_length,
        ARRAY_LENGTH(STRING_TO_ARRAY(search_query, ' '), 1) as query_word_count,

        -- Zero result tracking
        COUNT(*) FILTER (WHERE results_count = 0) as zero_result_searches,
        COUNT(*) FILTER (WHERE results_count > 0) as successful_searches,

        -- Performance classification
        COUNT(*) FILTER (WHERE search_latency <= 100) as fast_searches,
        COUNT(*) FILTER (WHERE search_latency <= 500) as acceptable_searches,
        COUNT(*) FILTER (WHERE search_latency > 500) as slow_searches

    FROM search_analytics_log
    WHERE search_performed_at >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY DATE_TRUNC('day', search_performed_at), search_query
    HAVING COUNT(*) >= 2  -- Focus on repeated queries
),

search_trends AS (
    SELECT 
        search_query,
        SUM(search_frequency) as total_searches,
        AVG(avg_results_returned) as overall_avg_results,
        AVG(avg_search_latency) as overall_avg_latency,

        -- Success rate calculation
        (SUM(successful_searches)::DECIMAL / NULLIF(SUM(search_frequency), 0)) * 100 as success_rate_percent,

        -- Performance score
        (SUM(fast_searches)::DECIMAL / NULLIF(SUM(search_frequency), 0)) * 100 as fast_search_percent,

        -- Query characteristics
        AVG(query_length) as avg_query_length,
        AVG(query_word_count) as avg_query_words,

        -- Trend analysis
        COUNT(DISTINCT search_date) as search_days,
        MIN(search_date) as first_search_date,
        MAX(search_date) as last_search_date

    FROM search_statistics
    GROUP BY search_query
),

query_insights AS (
    SELECT 
        st.*,

        -- Classification
        CASE 
            WHEN success_rate_percent >= 95 THEN 'high_performing_query'
            WHEN success_rate_percent >= 80 THEN 'good_performing_query'
            WHEN success_rate_percent >= 60 THEN 'fair_performing_query'
            ELSE 'poor_performing_query'
        END as query_performance_class,

        CASE 
            WHEN total_searches >= 100 THEN 'very_popular'
            WHEN total_searches >= 50 THEN 'popular'  
            WHEN total_searches >= 20 THEN 'moderately_popular'
            WHEN total_searches >= 10 THEN 'occasionally_used'
            ELSE 'rarely_used'
        END as query_popularity_class,

        CASE 
            WHEN fast_search_percent >= 90 THEN 'excellent_performance'
            WHEN fast_search_percent >= 70 THEN 'good_performance'
            WHEN fast_search_percent >= 50 THEN 'acceptable_performance'
            ELSE 'poor_performance'
        END as latency_performance_class,

        -- Recommendations
        CASE 
            WHEN success_rate_percent < 60 THEN 'Improve content coverage for this query'
            WHEN fast_search_percent < 50 THEN 'Optimize search performance'
            WHEN overall_avg_results < 5 THEN 'Expand content in this area'
            ELSE 'Query performing well'
        END as optimization_recommendation

    FROM search_trends st
)

SELECT 
    search_query,
    query_popularity_class,
    query_performance_class,  
    latency_performance_class,

    -- Key metrics
    total_searches,
    ROUND(success_rate_percent::NUMERIC, 1) as success_rate_pct,
    ROUND(overall_avg_results::NUMERIC, 1) as avg_results_returned,
    ROUND(overall_avg_latency::NUMERIC, 0) as avg_latency_ms,
    ROUND(fast_search_percent::NUMERIC, 1) as fast_searches_pct,

    -- Query characteristics
    ROUND(avg_query_length::NUMERIC, 1) as avg_character_length,
    ROUND(avg_query_words::NUMERIC, 1) as avg_word_count,

    -- Usage patterns
    search_days as days_active,
    TO_CHAR(first_search_date, 'YYYY-MM-DD') as first_seen,
    TO_CHAR(last_search_date, 'YYYY-MM-DD') as last_seen,

    -- Insights and recommendations
    optimization_recommendation,

    -- Priority scoring for optimization efforts
    CASE 
        WHEN query_popularity_class IN ('very_popular', 'popular') AND query_performance_class IN ('poor_performing_query', 'fair_performing_query') THEN 1
        WHEN query_popularity_class = 'very_popular' AND latency_performance_class = 'poor_performance' THEN 2
        WHEN query_performance_class = 'poor_performing_query' THEN 3
        ELSE 4
    END as optimization_priority

FROM query_insights
ORDER BY 
    optimization_priority ASC,
    total_searches DESC,
    success_rate_percent ASC;

-- QueryLeaf provides comprehensive full-text search capabilities:
-- 1. SQL-familiar TEXT_SEARCH function for complex text queries
-- 2. Advanced relevance scoring with customizable ranking algorithms  
-- 3. Built-in search highlighting and snippet generation
-- 4. Faceted search capabilities with aggregation-based filtering
-- 5. Search analytics and performance monitoring with SQL queries
-- 6. Auto-complete and suggestion generation using pattern matching
-- 7. Multi-language text search support with language-specific processing
-- 8. Enterprise search patterns with personalization and recommendations
-- 9. Native integration with MongoDB text indexing optimizations
-- 10. Familiar SQL patterns for complex search and content discovery requirements

Best Practices for Full-Text Search Implementation

Index Design and Optimization

Essential practices for production full-text search deployments:

  1. Weight Configuration: Assign appropriate weights to different fields based on their importance for relevance scoring (see the index sketch after this list)
  2. Language Support: Configure language-specific processing for stemming, stop words, and linguistic analysis
  3. Index Maintenance: Monitor index performance and rebuild indexes when content patterns change significantly
  4. Query Optimization: Design search queries that leverage index capabilities while minimizing performance overhead
  5. Result Caching: Implement intelligent caching strategies for frequently executed search queries
  6. Performance Monitoring: Track search latency, relevance quality, and user satisfaction metrics
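
As a concrete starting point for items 1 and 2, here is a minimal sketch that creates a weighted text index with a per-document language override using the Node.js driver. The collection and field names (articles, title, summary, body, lang) are illustrative assumptions, not part of the manager class above:

// Minimal sketch: weighted text index with per-document language override.
// Collection and field names (articles, title, summary, body, lang) are illustrative.
const { MongoClient } = require('mongodb');

async function createWeightedTextIndex(uri = 'mongodb://localhost:27017') {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const articles = client.db('content_platform').collection('articles');

    await articles.createIndex(
      { title: 'text', summary: 'text', body: 'text' },
      {
        name: 'weighted_article_text_index',
        weights: { title: 10, summary: 8, body: 5 },  // title matches count most
        default_language: 'english',                  // default stemming / stop words
        language_override: 'lang'                     // per-document language field
      }
    );
  } finally {
    await client.close();
  }
}

createWeightedTextIndex().catch(console.error);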

Enterprise Search Architecture

Design full-text search systems for enterprise-scale content discovery:

  1. Personalization Integration: Implement user behavior tracking and personalized ranking algorithms
  2. Content Quality Assessment: Develop metrics for content quality that influence search ranking
  3. Search Analytics: Establish comprehensive search analytics for query optimization and content gap analysis (a minimal event-logging sketch follows this list)
  4. Faceted Navigation: Design intuitive faceted search interfaces that help users refine search results
  5. Auto-Complete Systems: Implement intelligent auto-complete that learns from user behavior and content
  6. Multi-Modal Search: Integrate text search with other search capabilities like vector similarity and geospatial queries
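
A minimal sketch of the kind of click-through logging that items 1 and 3 depend on is shown below. The search_click_events collection and its field names are illustrative assumptions for this example, not part of the search manager above:

// Minimal sketch: record search click-through events and summarize them.
// The search_click_events collection and field names are illustrative assumptions.
async function logSearchClick(db, { userId, query, documentId, rank }) {
  await db.collection('search_click_events').insertOne({
    userId,
    query,
    documentId,
    rank,                  // position of the clicked result in the result list
    clickedAt: new Date()
  });
}

// Click volume and average clicked rank per query over the last 7 days
async function clickThroughByQuery(db) {
  const since = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);
  return db.collection('search_click_events').aggregate([
    { $match: { clickedAt: { $gte: since } } },
    { $group: { _id: '$query', clicks: { $sum: 1 }, avgClickedRank: { $avg: '$rank' } } },
    { $sort: { clicks: -1 } },
    { $limit: 20 }
  ]).toArray();
}

Feeding these per-query click summaries back into relevance weights or the personalized ranking described earlier closes the loop between search analytics and result quality.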

Conclusion

MongoDB Full-Text Search provides comprehensive intelligent content discovery capabilities that eliminate the complexity and limitations of traditional database text search approaches. The combination of advanced linguistic processing, intelligent relevance scoring, and sophisticated analytical capabilities enables modern applications to deliver the intelligent search experiences users expect while maintaining familiar database interaction patterns.

Key Full-Text Search benefits include:

  • Advanced Linguistic Processing: Native support for stemming, stop word filtering, and multi-language text analysis
  • Intelligent Relevance Scoring: Sophisticated ranking algorithms with customizable business logic integration
  • High-Performance Text Queries: Optimized text indexing with automatic query optimization and caching
  • Enterprise Search Features: Built-in support for faceted search, auto-complete, and search analytics
  • Seamless MongoDB Integration: Unified data access patterns with existing MongoDB operations and security
  • SQL Compatibility: Familiar search operations through QueryLeaf for accessible content discovery development

Whether you're building knowledge management systems, content discovery platforms, e-commerce search functionality, or any application requiring intelligent text search, MongoDB Full-Text Search with QueryLeaf's SQL-familiar interface provides the foundation for modern content discovery that scales efficiently while maintaining familiar interaction patterns.

QueryLeaf Integration: QueryLeaf automatically manages MongoDB Full-Text Search operations while providing SQL-familiar syntax for text search queries, relevance scoring, and search analytics. Advanced content discovery patterns including personalized search, faceted navigation, and search optimization are seamlessly accessible through familiar SQL constructs, making sophisticated search development both powerful and approachable for SQL-oriented development teams.

The combination of MongoDB's robust text search capabilities with SQL-style operations makes it an ideal platform for applications requiring both intelligent content discovery and familiar database query patterns, ensuring your search solutions remain both effective and maintainable as content volumes and user expectations evolve.

MongoDB Change Streams and Event-Driven Architecture: Real-Time Data Processing and Reactive Application Development with SQL-Compatible Operations

Modern applications increasingly require real-time responsiveness to data changes, enabling immediate updates across distributed systems, live dashboards, notification systems, and collaborative features. Traditional polling-based approaches create significant performance overhead, increase database load, and introduce unacceptable latency for responsive user experiences.

MongoDB Change Streams provide native event-driven capabilities that eliminate polling overhead through real-time change notifications, enabling sophisticated reactive architectures with guaranteed delivery, resumability, and comprehensive filtering. Unlike traditional database triggers or external message queues that require complex infrastructure management, Change Streams deliver enterprise-grade real-time data processing with automatic failover, distributed coordination, and seamless integration with MongoDB's operational model.
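
Before the fully featured implementation later in this section, a minimal sketch of the core API helps: open a change stream on a collection, filter operations in the stream's pipeline, and keep the resume token so processing can continue after a restart. Change streams require a replica set or sharded cluster; the orders collection and logging handler below are illustrative assumptions:

// Minimal sketch: watch a collection and keep the resume token for restarts.
// Collection name (orders) and the logging handler are illustrative assumptions.
const { MongoClient } = require('mongodb');

async function watchOrders(uri = 'mongodb://localhost:27017') {
  const client = new MongoClient(uri);
  await client.connect();
  const orders = client.db('ecommerce_platform').collection('orders');

  let resumeToken = null; // persist this (e.g. in a checkpoint collection) and pass it as resumeAfter on restart

  const changeStream = orders.watch(
    [{ $match: { operationType: { $in: ['insert', 'update', 'replace'] } } }],
    { fullDocument: 'updateLookup', ...(resumeToken && { resumeAfter: resumeToken }) }
  );

  for await (const change of changeStream) {
    resumeToken = change._id;  // opaque token that lets the stream resume from this event
    console.log(change.operationType, change.documentKey, change.fullDocument);
  }
}

watchOrders().catch(console.error);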

The Traditional Change Detection Challenge

Conventional approaches to detecting data changes involve significant complexity, performance penalties, and reliability issues:

-- Traditional PostgreSQL change detection - complex polling with performance overhead

-- Audit table approach with triggers (complex maintenance and performance impact)
CREATE TABLE product_audit (
    audit_id BIGSERIAL PRIMARY KEY,
    product_id BIGINT NOT NULL,
    operation_type VARCHAR(10) NOT NULL, -- INSERT, UPDATE, DELETE
    old_data JSONB,
    new_data JSONB,
    changed_fields TEXT[],
    change_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    user_id BIGINT,
    session_id VARCHAR(100),
    application_context JSONB
);

-- Performance indexes (PostgreSQL requires separate CREATE INDEX statements)
CREATE INDEX audit_product_time_idx ON product_audit (product_id, change_timestamp DESC);
CREATE INDEX audit_operation_time_idx ON product_audit (operation_type, change_timestamp DESC);

-- Complex trigger function for change tracking
CREATE OR REPLACE FUNCTION track_product_changes()
RETURNS TRIGGER AS $$
DECLARE
    old_json JSONB;
    new_json JSONB;
    changed_fields TEXT[] := ARRAY[]::TEXT[];
    field_name TEXT;
    field_value_old TEXT;
    field_value_new TEXT;
BEGIN
    -- Handle different operation types
    CASE TG_OP
        WHEN 'INSERT' THEN
            new_json := row_to_json(NEW)::JSONB;
            INSERT INTO product_audit (
                product_id, operation_type, new_data, 
                changed_fields, user_id, session_id
            ) VALUES (
                NEW.product_id, 'INSERT', new_json,
                array(select key from jsonb_each(new_json)),
                NEW.last_modified_by, NEW.session_id
            );
            RETURN NEW;

        WHEN 'UPDATE' THEN
            old_json := row_to_json(OLD)::JSONB;
            new_json := row_to_json(NEW)::JSONB;

            -- Complex field-by-field comparison for change detection
            FOR field_name IN SELECT key FROM jsonb_each(new_json) LOOP
                field_value_old := COALESCE((old_json->>field_name), '');
                field_value_new := COALESCE((new_json->>field_name), '');

                IF field_value_old != field_value_new THEN
                    changed_fields := array_append(changed_fields, field_name);
                END IF;
            END LOOP;

            -- Only log if there are actual changes
            IF array_length(changed_fields, 1) > 0 THEN
                INSERT INTO product_audit (
                    product_id, operation_type, old_data, new_data,
                    changed_fields, user_id, session_id
                ) VALUES (
                    NEW.product_id, 'UPDATE', old_json, new_json,
                    changed_fields, NEW.last_modified_by, NEW.session_id
                );
            END IF;
            RETURN NEW;

        WHEN 'DELETE' THEN
            old_json := row_to_json(OLD)::JSONB;
            INSERT INTO product_audit (
                product_id, operation_type, old_data,
                changed_fields, user_id, session_id
            ) VALUES (
                OLD.product_id, 'DELETE', old_json,
                array(select key from jsonb_each(old_json)),
                OLD.last_modified_by, OLD.session_id
            );
            RETURN OLD;
    END CASE;

    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Create triggers on multiple tables (maintenance overhead)
CREATE TRIGGER product_changes_trigger
    AFTER INSERT OR UPDATE OR DELETE ON products
    FOR EACH ROW EXECUTE FUNCTION track_product_changes();

CREATE TRIGGER inventory_changes_trigger  
    AFTER INSERT OR UPDATE OR DELETE ON inventory
    FOR EACH ROW EXECUTE FUNCTION track_inventory_changes();

-- Polling-based change consumption (expensive and unreliable)
WITH recent_changes AS (
    SELECT 
        pa.audit_id,
        pa.product_id,
        pa.operation_type,
        pa.old_data,
        pa.new_data,
        pa.changed_fields,
        pa.change_timestamp,
        pa.user_id,
        pa.session_id,

        -- Product context enrichment (expensive joins)
        p.name as product_name,
        p.category_id,
        p.current_price,
        p.status,
        c.category_name,

        -- Change analysis
        CASE 
            WHEN pa.operation_type = 'INSERT' THEN 'Product Created'
            WHEN pa.operation_type = 'UPDATE' AND 'current_price' = ANY(pa.changed_fields) THEN 'Price Updated'
            WHEN pa.operation_type = 'UPDATE' AND 'status' = ANY(pa.changed_fields) THEN 'Status Changed'
            WHEN pa.operation_type = 'UPDATE' THEN 'Product Modified'
            WHEN pa.operation_type = 'DELETE' THEN 'Product Removed'
        END as change_description,

        -- Business impact assessment
        CASE 
            WHEN pa.operation_type = 'UPDATE' AND 'current_price' = ANY(pa.changed_fields) THEN
                CASE 
                    WHEN (pa.new_data->>'current_price')::DECIMAL > (pa.old_data->>'current_price')::DECIMAL 
                    THEN 'Price Increase'
                    ELSE 'Price Decrease'
                END
            WHEN pa.operation_type = 'UPDATE' AND 'inventory_count' = ANY(pa.changed_fields) THEN
                CASE 
                    WHEN (pa.new_data->>'inventory_count')::INTEGER = 0 THEN 'Out of Stock'
                    WHEN (pa.new_data->>'inventory_count')::INTEGER <= 5 THEN 'Low Stock Alert'
                    ELSE 'Inventory Updated'
                END
        END as business_impact,

        -- Notification targeting
        CASE 
            WHEN pa.operation_type = 'INSERT' THEN ARRAY['product_managers', 'inventory_team']
            WHEN pa.operation_type = 'UPDATE' AND 'current_price' = ANY(pa.changed_fields) 
                THEN ARRAY['pricing_team', 'sales_team', 'customers']
            WHEN pa.operation_type = 'UPDATE' AND 'inventory_count' = ANY(pa.changed_fields) 
                THEN ARRAY['inventory_team', 'fulfillment']
            WHEN pa.operation_type = 'DELETE' THEN ARRAY['product_managers', 'customers']
            ELSE ARRAY['general_subscribers']
        END as notification_targets

    FROM product_audit pa
    LEFT JOIN products p ON pa.product_id = p.product_id
    LEFT JOIN categories c ON p.category_id = c.category_id
    WHERE pa.change_timestamp > (
        -- Get last processed timestamp (requires external state management)
        SELECT COALESCE(last_processed_timestamp, CURRENT_TIMESTAMP - INTERVAL '5 minutes')
        FROM change_processing_checkpoint 
        WHERE processor_name = 'product_change_handler'
    )
    ORDER BY pa.change_timestamp ASC
),

change_aggregation AS (
    -- Complex aggregation for batch processing
    SELECT 
        rc.product_id,
        rc.product_name,
        rc.category_name,
        COUNT(*) as total_changes,

        -- Change type counts
        COUNT(*) FILTER (WHERE operation_type = 'INSERT') as creates,
        COUNT(*) FILTER (WHERE operation_type = 'UPDATE') as updates, 
        COUNT(*) FILTER (WHERE operation_type = 'DELETE') as deletes,

        -- Business impact analysis
        COUNT(*) FILTER (WHERE business_impact LIKE '%Price%') as price_changes,
        COUNT(*) FILTER (WHERE business_impact LIKE '%Stock%') as inventory_changes,

        -- Change timeline
        MIN(change_timestamp) as first_change,
        MAX(change_timestamp) as last_change,
        EXTRACT(EPOCH FROM (MAX(change_timestamp) - MIN(change_timestamp))) as change_window_seconds,

        -- Most recent change details
        (array_agg(rc.operation_type ORDER BY rc.change_timestamp DESC))[1] as latest_operation,
        (array_agg(rc.change_description ORDER BY rc.change_timestamp DESC))[1] as latest_description,
        (array_agg(rc.business_impact ORDER BY rc.change_timestamp DESC))[1] as latest_impact,

        -- Notification consolidation (set-returning functions cannot be nested inside
        -- aggregates, so flatten and de-duplicate the target arrays with a correlated subquery)
        (
            SELECT array_agg(DISTINCT target)
            FROM recent_changes rc2,
                 unnest(rc2.notification_targets) AS target
            WHERE rc2.product_id = rc.product_id
        ) as all_notification_targets,

        -- Change velocity (changes per minute)
        CASE
            WHEN EXTRACT(EPOCH FROM (MAX(change_timestamp) - MIN(change_timestamp))) > 0
            THEN COUNT(*)::DECIMAL / (EXTRACT(EPOCH FROM (MAX(change_timestamp) - MIN(change_timestamp))) / 60)
            ELSE COUNT(*)::DECIMAL
        END as changes_per_minute

    FROM recent_changes rc
    GROUP BY rc.product_id, rc.product_name, rc.category_name
),

notification_prioritization AS (
    SELECT 
        ca.*,

        -- Priority scoring
        (
            -- Change frequency component
            LEAST(changes_per_minute * 2, 10) +

            -- Business impact component  
            CASE 
                WHEN price_changes > 0 THEN 5
                WHEN inventory_changes > 0 THEN 4
                WHEN creates > 0 THEN 3
                WHEN deletes > 0 THEN 6
                ELSE 1
            END +

            -- Recency component
            CASE 
                WHEN change_window_seconds < 300 THEN 3  -- Within 5 minutes
                WHEN change_window_seconds < 3600 THEN 2 -- Within 1 hour
                ELSE 1
            END
        ) as priority_score,

        -- Alert classification
        CASE 
            WHEN deletes > 0 THEN 'critical'
            WHEN price_changes > 0 AND changes_per_minute > 1 THEN 'high'
            WHEN inventory_changes > 0 THEN 'medium'
            WHEN creates > 0 THEN 'low'
            ELSE 'informational'
        END as alert_level,

        -- Message formatting
        CASE 
            WHEN total_changes = 1 THEN latest_description
            ELSE CONCAT(total_changes, ' changes to ', product_name, ' (', latest_description, ')')
        END as notification_message

    FROM change_aggregation ca
)

-- Final change processing output (still requires external message queue)
SELECT 
    np.product_id,
    np.product_name, 
    np.category_name,
    np.total_changes,
    np.priority_score,
    np.alert_level,
    np.notification_message,
    np.all_notification_targets,
    np.last_change,

    -- Processing metadata
    CURRENT_TIMESTAMP as processed_at,
    'product_change_handler' as processor_name,

    -- External system integration requirements
    CASE alert_level
        WHEN 'critical' THEN 'immediate_push_notification'
        WHEN 'high' THEN 'priority_email_and_push' 
        WHEN 'medium' THEN 'email_notification'
        ELSE 'dashboard_update_only'
    END as delivery_method,

    -- Routing information for message queue
    CASE 
        WHEN 'customers' = ANY(all_notification_targets) THEN 'customer_notifications_queue'
        WHEN 'pricing_team' = ANY(all_notification_targets) THEN 'internal_alerts_queue'
        ELSE 'general_updates_queue'
    END as routing_key,

    -- Deduplication key (manual implementation required)
    MD5(CONCAT(product_id, ':', array_to_string(all_notification_targets, ','), ':', DATE_TRUNC('minute', last_change))) as deduplication_key

FROM notification_prioritization np
WHERE priority_score >= 3  -- Filter low-priority notifications
ORDER BY priority_score DESC, last_change DESC;

-- Update checkpoint after processing (manual transaction management)
UPDATE change_processing_checkpoint 
SET 
    last_processed_timestamp = CURRENT_TIMESTAMP,
    processed_count = processed_count + (SELECT COUNT(*) FROM recent_changes),
    last_updated = CURRENT_TIMESTAMP
WHERE processor_name = 'product_change_handler';

-- Traditional polling approach problems:
-- 1. Expensive polling operations creating unnecessary database load
-- 2. Complex trigger-based audit tables requiring extensive maintenance
-- 3. Race conditions and missed changes during high-concurrency periods
-- 4. Manual checkpoint management and external state tracking required
-- 5. Complex field-level change detection with performance overhead
-- 6. No guaranteed delivery or automatic failure recovery mechanisms
-- 7. Difficult horizontal scaling of change processing systems
-- 8. External message queue infrastructure required for reliability
-- 9. Manual deduplication and ordering logic implementation required
-- 10. Limited filtering capabilities and expensive context enrichment queries

MongoDB Change Streams eliminate polling complexity with native real-time change notifications:

// MongoDB Change Streams - native real-time change processing with comprehensive event handling
const { MongoClient, ObjectId } = require('mongodb');

const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('ecommerce_platform');

// Advanced MongoDB Change Streams Manager
class MongoDBChangeStreamsManager {
  constructor(db, config = {}) {
    this.db = db;
    this.config = {
      // Change stream configuration
      enableChangeStreams: config.enableChangeStreams !== false,
      resumeAfterFailure: config.resumeAfterFailure !== false,
      batchSize: config.batchSize || 100,
      maxAwaitTimeMS: config.maxAwaitTimeMS || 1000,

      // Event processing configuration
      enableEventEnrichment: config.enableEventEnrichment !== false,
      enableEventFiltering: config.enableEventFiltering !== false,
      enableEventAggregation: config.enableEventAggregation !== false,
      enableEventRouting: config.enableEventRouting !== false,

      // Reliability and resilience  
      enableAutoResume: config.enableAutoResume !== false,
      enableDeadLetterQueue: config.enableDeadLetterQueue !== false,
      maxRetries: config.maxRetries || 3,
      retryDelayMs: config.retryDelayMs || 1000,

      // Performance optimization
      enableParallelProcessing: config.enableParallelProcessing !== false,
      processingConcurrency: config.processingConcurrency || 10,
      enableBatchProcessing: config.enableBatchProcessing !== false,
      batchProcessingWindowMs: config.batchProcessingWindowMs || 5000,

      // Monitoring and observability
      enableMetrics: config.enableMetrics !== false,
      enableLogging: config.enableLogging !== false,
      logLevel: config.logLevel || 'info',

      ...config
    };

    // Collection references
    this.collections = {
      products: db.collection('products'),
      inventory: db.collection('inventory'),
      orders: db.collection('orders'),
      customers: db.collection('customers'),

      // Event processing collections
      changeEvents: db.collection('change_events'),
      processingCheckpoints: db.collection('processing_checkpoints'),
      deadLetterQueue: db.collection('dead_letter_queue'),
      eventMetrics: db.collection('event_metrics'),

      // Downstream routing and analytics collections used by the handlers below
      notifications: db.collection('notifications'),
      searchIndexUpdates: db.collection('search_index_updates'),
      analytics: db.collection('analytics')
    };

    // Change stream management
    this.changeStreams = new Map();
    this.eventProcessors = new Map();
    this.processingQueues = new Map();
    this.resumeTokens = new Map();

    // Performance metrics
    this.metrics = {
      eventsProcessed: 0,
      eventsFailed: 0,
      averageProcessingTime: 0,
      totalProcessingTime: 0,
      lastProcessedAt: null,
      processingErrors: []
    };

    this.initializeChangeStreams();
  }

  async initializeChangeStreams() {
    console.log('Initializing MongoDB Change Streams for real-time data processing...');

    try {
      // Restore persisted resume tokens so streams resume where they left off
      await this.loadResumeTokens();

      // Setup change streams for different collections
      await this.setupProductChangeStream();
      await this.setupInventoryChangeStream();
      await this.setupOrderChangeStream();
      await this.setupCustomerChangeStream();

      // Setup cross-collection change aggregation
      await this.setupDatabaseChangeStream();

      // Initialize event processing infrastructure
      await this.setupEventProcessingInfrastructure();

      console.log('Change streams initialized successfully');

    } catch (error) {
      console.error('Error initializing change streams:', error);
      throw error;
    }
  }

  async setupProductChangeStream() {
    console.log('Setting up product change stream...');

    const productsCollection = this.collections.products;

    // Advanced change stream pipeline with filtering and enrichment
    const changeStreamPipeline = [
      // Stage 1: Filter relevant operations
      {
        $match: {
          $or: [
            { 'operationType': 'insert' },
            { 'operationType': 'update' },
            { 'operationType': 'delete' },
            { 'operationType': 'replace' }
          ],

          // Optional namespace filtering
          'ns.db': this.db.databaseName,
          'ns.coll': 'products'
        }
      },

      // Note: $lookup is not a supported stage in change stream pipelines, so
      // business-context enrichment (category data, related documents) happens
      // later in enrichProductChangeEvent() after the event is received.

      // Stage 2: Add computed fields and change analysis
      {
        $addFields: {
          // Event metadata
          eventId: { $toString: '$_id' },
          eventTimestamp: '$$NOW',
          collectionName: '$ns.coll',

          // Change analysis
          changedFields: {
            $cond: {
              if: { $eq: ['$operationType', 'update'] },
              then: { $objectToArray: '$updateDescription.updatedFields' },
              else: []
            }
          },

          // Priority assessment
          eventPriority: {
            $switch: {
              branches: [
                {
                  case: { $eq: ['$operationType', 'delete'] },
                  then: 'critical'
                },
                {
                  case: {
                    $and: [
                      { $eq: ['$operationType', 'update'] },
                      { $ne: [{ $type: '$updateDescription.updatedFields.price' }, 'missing'] }
                    ]
                  },
                  then: 'high'
                },
                {
                  case: {
                    $and: [
                      { $eq: ['$operationType', 'update'] },
                      { $ne: [{ $type: '$updateDescription.updatedFields.inventory' }, 'missing'] }
                    ]
                  },
                  then: 'medium'
                },
                {
                  case: { $eq: ['$operationType', 'insert'] },
                  then: 'low'
                }
              ],
              default: 'informational'
            }
          },

          // Notification routing
          notificationTargets: {
            $switch: {
              branches: [
                {
                  case: { $eq: ['$operationType', 'insert'] },
                  then: ['product_managers', 'inventory_team']
                },
                {
                  case: {
                    $and: [
                      { $eq: ['$operationType', 'update'] },
                      { $ne: [{ $type: '$updateDescription.updatedFields.price' }, 'missing'] }
                    ]
                  },
                  then: ['pricing_team', 'sales_team', 'customers']
                },
                {
                  case: {
                    $and: [
                      { $eq: ['$operationType', 'update'] },
                      { $ne: [{ $type: '$updateDescription.updatedFields.inventory' }, 'missing'] }
                    ]
                  },
                  then: ['inventory_team', 'fulfillment']
                },
                {
                  case: { $eq: ['$operationType', 'delete'] },
                  then: ['product_managers', 'customers']
                }
              ],
              default: ['general_subscribers']
            }
          }
        }
      }
    ];

    // Create change stream with pipeline and options
    const productChangeStream = productsCollection.watch(changeStreamPipeline, {
      fullDocument: 'updateLookup',
      fullDocumentBeforeChange: 'whenAvailable',
      batchSize: this.config.batchSize,
      maxAwaitTimeMS: this.config.maxAwaitTimeMS,
      resumeAfter: this.resumeTokens.get('products')
    });

    // Event processing handler
    productChangeStream.on('change', async (changeEvent) => {
      await this.processProductChangeEvent(changeEvent);
    });

    // Error handling and resume token management
    productChangeStream.on('error', async (error) => {
      console.error('Product change stream error:', error);
      await this.handleChangeStreamError('products', error);
    });

    productChangeStream.on('resumeTokenChanged', (resumeToken) => {
      this.resumeTokens.set('products', resumeToken);
      this.persistResumeToken('products', resumeToken);
    });

    this.changeStreams.set('products', productChangeStream);
    console.log('Product change stream setup complete');
  }

  async processProductChangeEvent(changeEvent) {
    const startTime = Date.now();

    try {
      console.log(`Processing product change event: ${changeEvent.operationType} for product ${changeEvent.documentKey._id}`);

      // Enrich change event with additional context
      const enrichedEvent = await this.enrichProductChangeEvent(changeEvent);

      // Apply business logic and routing
      const processedEvent = await this.applyProductBusinessLogic(enrichedEvent);

      // Route to appropriate handlers
      await this.routeProductChangeEvent(processedEvent);

      // Store event for audit and analytics
      await this.storeChangeEvent(processedEvent);

      // Update metrics
      this.updateProcessingMetrics(startTime, 'success');

    } catch (error) {
      console.error('Error processing product change event:', error);

      // Handle processing failure
      await this.handleEventProcessingError(changeEvent, error);
      this.updateProcessingMetrics(startTime, 'error');
    }
  }

  async enrichProductChangeEvent(changeEvent) {
    console.log('Enriching product change event with business context...');

    try {
      const enrichedEvent = {
        ...changeEvent,

        // Processing metadata
        processingId: new ObjectId(),
        processingTimestamp: new Date(),
        processorVersion: '1.0',

        // Document context (current and previous state)
        currentDocument: changeEvent.fullDocument,
        previousDocument: changeEvent.fullDocumentBeforeChange,

        // Change analysis
        changeAnalysis: await this.analyzeProductChange(changeEvent),

        // Business impact assessment
        businessImpact: await this.assessProductBusinessImpact(changeEvent),

        // Related data enrichment
        relatedData: await this.getRelatedProductData(changeEvent.documentKey._id),

        // Notification configuration
        notificationConfig: await this.getProductNotificationConfig(changeEvent),

        // Processing context
        processingContext: {
          correlationId: changeEvent.eventId,
          sourceCollection: changeEvent.collectionName,
          processingPipeline: 'product_changes',
          retryCount: 0,
          maxRetries: this.config.maxRetries
        }
      };

      return enrichedEvent;

    } catch (error) {
      console.error('Error enriching product change event:', error);
      throw error;
    }
  }

  async analyzeProductChange(changeEvent) {
    const analysis = {
      operationType: changeEvent.operationType,
      affectedFields: [],
      fieldChanges: {},
      changeType: 'unknown',
      changeSignificance: 'low'
    };

    switch (changeEvent.operationType) {
      case 'insert':
        analysis.changeType = 'product_creation';
        analysis.changeSignificance = 'medium';
        analysis.affectedFields = Object.keys(changeEvent.fullDocument || {});
        break;

      case 'update':
        if (changeEvent.updateDescription && changeEvent.updateDescription.updatedFields) {
          analysis.affectedFields = Object.keys(changeEvent.updateDescription.updatedFields);

          // Analyze specific field changes
          const updatedFields = changeEvent.updateDescription.updatedFields;

          for (const [field, newValue] of Object.entries(updatedFields)) {
            const oldValue = changeEvent.fullDocumentBeforeChange?.[field];

            analysis.fieldChanges[field] = {
              oldValue,
              newValue,
              changeType: this.classifyFieldChange(field, oldValue, newValue)
            };
          }

          // Determine change type and significance
          if ('price' in updatedFields) {
            analysis.changeType = 'price_update';
            analysis.changeSignificance = 'high';
          } else if ('inventory' in updatedFields) {
            analysis.changeType = 'inventory_update';
            analysis.changeSignificance = 'medium';
          } else if ('status' in updatedFields) {
            analysis.changeType = 'status_change';
            analysis.changeSignificance = 'medium';
          } else {
            analysis.changeType = 'product_modification';
            analysis.changeSignificance = 'low';
          }
        }
        break;

      case 'delete':
        analysis.changeType = 'product_deletion';
        analysis.changeSignificance = 'critical';
        break;

      case 'replace':
        analysis.changeType = 'product_replacement';
        analysis.changeSignificance = 'high';
        break;
    }

    return analysis;
  }

  classifyFieldChange(fieldName, oldValue, newValue) {
    switch (fieldName) {
      case 'price':
        if (newValue > oldValue) return 'price_increase';
        if (newValue < oldValue) return 'price_decrease';
        return 'price_change';

      case 'inventory':
        if (newValue === 0) return 'out_of_stock';
        if (newValue <= 5) return 'low_stock';
        if (newValue > oldValue) return 'stock_increase';
        if (newValue < oldValue) return 'stock_decrease';
        return 'inventory_adjustment';

      case 'status':
        if (newValue === 'discontinued') return 'product_discontinued';
        if (newValue === 'active' && oldValue !== 'active') return 'product_activated';
        if (newValue !== 'active' && oldValue === 'active') return 'product_deactivated';
        return 'status_change';

      default:
        return 'field_update';
    }
  }

  async assessProductBusinessImpact(changeEvent) {
    const impact = {
      impactLevel: 'low',
      impactAreas: [],
      affectedSystems: [],
      businessMetrics: {},
      actionRequired: false,
      recommendations: []
    };

    const productId = changeEvent.documentKey._id;
    const analysis = await this.analyzeProductChange(changeEvent);

    // Assess impact based on change type
    switch (analysis.changeType) {
      case 'price_update':
        impact.impactLevel = 'high';
        impact.impactAreas = ['revenue', 'customer_experience', 'competitive_positioning'];
        impact.affectedSystems = ['pricing_engine', 'recommendation_system', 'customer_notifications'];
        impact.actionRequired = true;
        impact.recommendations = [
          'Notify customers of price changes',
          'Update marketing materials',
          'Review competitive pricing'
        ];

        // Calculate price change impact
        const priceChange = analysis.fieldChanges?.price;
        if (priceChange) {
          impact.businessMetrics.priceChangePercentage = 
            ((priceChange.newValue - priceChange.oldValue) / priceChange.oldValue * 100).toFixed(2);
        }
        break;

      case 'inventory_update':
        impact.impactLevel = 'medium';
        impact.impactAreas = ['fulfillment', 'customer_experience'];
        impact.affectedSystems = ['inventory_management', 'order_processing'];

        const inventoryChange = analysis.fieldChanges?.inventory;
        if (inventoryChange) {
          if (inventoryChange.newValue === 0) {
            impact.impactLevel = 'high';
            impact.actionRequired = true;
            impact.recommendations = ['Update product availability', 'Notify backordered customers'];
          } else if (inventoryChange.newValue <= 5) {
            impact.recommendations = ['Monitor inventory levels', 'Plan restocking'];
          }
        }
        break;

      case 'product_deletion':
        impact.impactLevel = 'critical';
        impact.impactAreas = ['customer_experience', 'revenue', 'data_integrity'];
        impact.affectedSystems = ['catalog_management', 'order_processing', 'recommendations'];
        impact.actionRequired = true;
        impact.recommendations = [
          'Handle existing orders',
          'Update customer wishlists',
          'Archive product data',
          'Redirect product URLs'
        ];
        break;

      case 'product_creation':
        impact.impactLevel = 'medium';
        impact.impactAreas = ['catalog_expansion', 'revenue_opportunity'];
        impact.affectedSystems = ['search_indexing', 'recommendation_system', 'inventory_tracking'];
        impact.recommendations = [
          'Index for search',
          'Generate recommendations',
          'Create marketing content'
        ];
        break;
    }

    return impact;
  }

  async getRelatedProductData(productId) {
    try {
      // Get product relationships and context
      const relatedData = await Promise.allSettled([
        // Category information
        this.collections.products.aggregate([
          { $match: { _id: productId } },
          {
            $lookup: {
              from: 'categories',
              localField: 'categoryId',
              foreignField: '_id',
              as: 'category'
            }
          },
          { $project: { category: { $arrayElemAt: ['$category', 0] } } }
        ]).toArray(),

        // Inventory information
        this.collections.inventory.findOne({ productId: productId }),

        // Recent orders for this product
        this.collections.orders.find({
          'items.productId': productId,
          createdAt: { $gte: new Date(Date.now() - 30 * 24 * 60 * 60 * 1000) } // Last 30 days
        }).limit(10).toArray(),

        // Customer interest metrics
        this.collections.analytics.findOne({
          productId: productId,
          type: 'product_engagement'
        })
      ]);

      const [categoryResult, inventoryResult, ordersResult, analyticsResult] = relatedData;

      return {
        category: categoryResult.status === 'fulfilled' ? categoryResult.value[0]?.category : null,
        inventory: inventoryResult.status === 'fulfilled' ? inventoryResult.value : null,
        recentOrders: ordersResult.status === 'fulfilled' ? ordersResult.value : [],
        analytics: analyticsResult.status === 'fulfilled' ? analyticsResult.value : null,
        dataRetrievedAt: new Date()
      };

    } catch (error) {
      console.error('Error getting related product data:', error);
      return { error: error.message };
    }
  }

  async getProductNotificationConfig(changeEvent) {
    const config = {
      enableNotifications: true,
      notificationTargets: changeEvent.notificationTargets || [],
      deliveryMethods: ['push', 'email'],
      priority: changeEvent.eventPriority || 'low',
      batching: {
        enabled: true,
        windowMs: 60000, // 1 minute
        maxBatchSize: 10
      },
      filtering: {
        enabled: true,
        rules: []
      }
    };

    // Customize based on event type and priority
    switch (changeEvent.operationType) {
      case 'delete':
        config.deliveryMethods = ['push', 'email', 'sms'];
        config.batching.enabled = false; // Immediate delivery
        break;

      case 'update':
        if (changeEvent.eventPriority === 'high') {
          config.deliveryMethods = ['push', 'email'];
          config.batching.windowMs = 30000; // 30 seconds
        }
        break;
    }

    return config;
  }

  async applyProductBusinessLogic(enrichedEvent) {
    console.log('Applying business logic to product change event...');

    try {
      const processedEvent = {
        ...enrichedEvent,

        // Business rules execution results
        businessRules: await this.executeProductBusinessRules(enrichedEvent),

        // Workflow triggers
        workflowTriggers: await this.identifyWorkflowTriggers(enrichedEvent),

        // Integration requirements
        integrationRequirements: await this.identifyIntegrationRequirements(enrichedEvent),

        // Compliance and governance
        complianceChecks: await this.performComplianceChecks(enrichedEvent)
      };

      return processedEvent;

    } catch (error) {
      console.error('Error applying business logic:', error);
      throw error;
    }
  }

  async executeProductBusinessRules(enrichedEvent) {
    const rules = [];
    const analysis = enrichedEvent.changeAnalysis;

    // Price change rules
    if (analysis.changeType === 'price_update') {
      const priceChange = analysis.fieldChanges.price;
      const changePercent = Math.abs(
        ((priceChange.newValue - priceChange.oldValue) / priceChange.oldValue) * 100
      );

      if (changePercent > 20) {
        rules.push({
          rule: 'significant_price_change',
          triggered: true,
          severity: 'high',
          action: 'require_manager_approval',
          details: `Price change of ${changePercent.toFixed(2)}% requires approval`
        });
      }

      if (priceChange.newValue < priceChange.oldValue * 0.5) {
        rules.push({
          rule: 'deep_discount_alert',
          triggered: true,
          severity: 'medium',
          action: 'fraud_detection_review',
          details: 'Price reduced by more than 50%'
        });
      }
    }

    // Inventory rules
    if (analysis.changeType === 'inventory_update') {
      const inventoryChange = analysis.fieldChanges.inventory;

      if (inventoryChange?.newValue === 0) {
        rules.push({
          rule: 'out_of_stock',
          triggered: true,
          severity: 'high',
          action: 'update_product_availability',
          details: 'Product is now out of stock'
        });
      }

      if (inventoryChange?.newValue <= 5 && inventoryChange?.newValue > 0) {
        rules.push({
          rule: 'low_stock_warning',
          triggered: true,
          severity: 'medium',
          action: 'reorder_notification',
          details: `Low stock: ${inventoryChange.newValue} units remaining`
        });
      }
    }

    // Product lifecycle rules
    if (analysis.changeType === 'product_deletion') {
      rules.push({
        rule: 'product_deletion',
        triggered: true,
        severity: 'critical',
        action: 'cleanup_related_data',
        details: 'Product deleted - cleanup required'
      });
    }

    return rules;
  }

  async routeProductChangeEvent(processedEvent) {
    console.log('Routing product change event to appropriate handlers...');

    try {
      const routingTasks = [];

      // Real-time notification routing
      if (processedEvent.notificationConfig.enableNotifications) {
        routingTasks.push(this.routeToNotificationSystem(processedEvent));
      }

      // Search index updates
      if (['insert', 'update', 'replace'].includes(processedEvent.operationType)) {
        routingTasks.push(this.routeToSearchIndexing(processedEvent));
      }

      // Analytics and reporting
      routingTasks.push(this.routeToAnalytics(processedEvent));

      // Integration webhooks
      if (processedEvent.integrationRequirements?.webhooks?.length > 0) {
        routingTasks.push(this.routeToWebhooks(processedEvent));
      }

      // Workflow automation
      if (processedEvent.workflowTriggers?.length > 0) {
        routingTasks.push(this.routeToWorkflowEngine(processedEvent));
      }

      // Business intelligence
      routingTasks.push(this.routeToBusinessIntelligence(processedEvent));

      // Execute routing tasks concurrently
      await Promise.allSettled(routingTasks);

    } catch (error) {
      console.error('Error routing product change event:', error);
      throw error;
    }
  }

  async routeToNotificationSystem(processedEvent) {
    console.log('Routing to notification system...');

    const notification = {
      eventId: processedEvent.processingId,
      eventType: 'product_change',
      operationType: processedEvent.operationType,
      productId: processedEvent.documentKey._id,
      priority: processedEvent.eventPriority,
      targets: processedEvent.notificationTargets,
      deliveryMethods: processedEvent.notificationConfig.deliveryMethods,

      message: this.generateNotificationMessage(processedEvent),
      payload: {
        productDetails: processedEvent.currentDocument,
        changeAnalysis: processedEvent.changeAnalysis,
        businessImpact: processedEvent.businessImpact
      },

      routing: {
        immediate: processedEvent.eventPriority === 'critical',
        batchable: processedEvent.notificationConfig.batching.enabled,
        batchWindowMs: processedEvent.notificationConfig.batching.windowMs
      },

      createdAt: new Date()
    };

    // Route to notification queue (could be MongoDB collection, message queue, etc.)
    await this.collections.notifications.insertOne(notification);

    return notification;
  }

  generateNotificationMessage(processedEvent) {
    const analysis = processedEvent.changeAnalysis;
    const product = processedEvent.currentDocument;

    switch (analysis.changeType) {
      case 'product_creation':
        return `New product added: ${product.name}`;

      case 'price_update':
        const priceChange = analysis.fieldChanges.price;
        const direction = priceChange.newValue > priceChange.oldValue ? 'increased' : 'decreased';
        return `Price ${direction} for ${product.name}: $${priceChange.oldValue} → $${priceChange.newValue}`;

      case 'inventory_update':
        const inventoryChange = analysis.fieldChanges.inventory;
        if (inventoryChange.newValue === 0) {
          return `${product.name} is now out of stock`;
        } else if (inventoryChange.newValue <= 5) {
          return `Low stock alert: ${product.name} (${inventoryChange.newValue} remaining)`;
        } else {
          return `Inventory updated for ${product.name}: ${inventoryChange.newValue} units`;
        }

      case 'product_deletion':
        return `Product removed: ${product.name}`;

      default:
        return `Product updated: ${product.name}`;
    }
  }

  async routeToSearchIndexing(processedEvent) {
    console.log('Routing to search indexing system...');

    const indexUpdate = {
      eventId: processedEvent.processingId,
      operationType: processedEvent.operationType,
      documentId: processedEvent.documentKey._id,
      collection: 'products',

      document: processedEvent.currentDocument,
      priority: processedEvent.eventPriority === 'critical' ? 'immediate' : 'normal',

      indexingInstructions: {
        fullReindex: processedEvent.operationType === 'insert',
        partialUpdate: processedEvent.operationType === 'update',
        deleteFromIndex: processedEvent.operationType === 'delete',
        affectedFields: processedEvent.changeAnalysis.affectedFields
      },

      createdAt: new Date()
    };

    await this.collections.searchIndexUpdates.insertOne(indexUpdate);
    return indexUpdate;
  }

  async setupDatabaseChangeStream() {
    console.log('Setting up database-wide change stream for cross-collection analytics...');

    // Database-level change stream for comprehensive monitoring
    const databaseChangeStream = this.db.watch([
      {
        $match: {
          'operationType': { $in: ['insert', 'update', 'delete'] },
          'ns.db': this.db.databaseName,
          'ns.coll': { $in: ['products', 'orders', 'customers', 'inventory'] }
        }
      },
      {
        $addFields: {
          eventId: { $toString: '$_id' },
          eventTimestamp: '$$NOW',

          // Cross-collection correlation
          correlationContext: {
            $switch: {
              branches: [
                {
                  case: { $eq: ['$ns.coll', 'products'] },
                  then: {
                    type: 'product_event',
                    productId: '$documentKey._id',
                    correlationKey: '$documentKey._id'
                  }
                },
                {
                  case: { $eq: ['$ns.coll', 'orders'] },
                  then: {
                    type: 'order_event',
                    orderId: '$documentKey._id',
                    correlationKey: '$fullDocument.customerId'
                  }
                }
              ],
              default: { type: 'generic_event' }
            }
          }
        }
      }
    ], {
      fullDocument: 'updateLookup'
    });

    databaseChangeStream.on('change', async (changeEvent) => {
      await this.processDatabaseChangeEvent(changeEvent);
    });

    this.changeStreams.set('database', databaseChangeStream);
    console.log('Database change stream setup complete');
  }

  async processDatabaseChangeEvent(changeEvent) {
    try {
      // Cross-collection event correlation and analytics
      await this.performCrossCollectionAnalytics(changeEvent);

      // Real-time business metrics updates
      await this.updateRealTimeMetrics(changeEvent);

      // Event pattern detection
      await this.detectEventPatterns(changeEvent);

    } catch (error) {
      console.error('Error processing database change event:', error);
    }
  }

  async storeChangeEvent(processedEvent) {
    try {
      const changeEventRecord = {
        eventId: processedEvent.processingId,
        resumeToken: processedEvent._id,

        // Event identification
        operationType: processedEvent.operationType,
        collection: processedEvent.ns?.coll,
        documentId: processedEvent.documentKey._id,

        // Timing information
        clusterTime: processedEvent.clusterTime,
        eventTimestamp: processedEvent.eventTimestamp,
        processingTimestamp: processedEvent.processingTimestamp,

        // Change details
        changeAnalysis: processedEvent.changeAnalysis,
        businessImpact: processedEvent.businessImpact,

        // Processing results
        businessRules: processedEvent.businessRules,
        routingResults: processedEvent.routingResults,

        // Status and metadata
        processingStatus: 'completed',
        processingVersion: processedEvent.processorVersion,

        // Audit trail
        createdAt: new Date(),
        retentionPolicy: 'standard' // Keep for standard retention period
      };

      await this.collections.changeEvents.insertOne(changeEventRecord);

    } catch (error) {
      console.error('Error storing change event:', error);
      // Don't throw - storage failure shouldn't stop processing
    }
  }

  async handleEventProcessingError(changeEvent, error) {
    console.log('Handling event processing error...');

    try {
      const errorRecord = {
        eventId: new ObjectId(),
        originalEventId: changeEvent.eventId,

        // Error details
        error: {
          name: error.name,
          message: error.message,
          stack: error.stack,
          code: error.code
        },

        // Event context
        changeEvent: changeEvent,
        processingAttempt: (changeEvent.processingContext?.retryCount || 0) + 1,
        maxRetries: this.config.maxRetries,

        // Status
        status: 'pending_retry',
        nextRetryAt: new Date(Date.now() + this.config.retryDelayMs),

        createdAt: new Date()
      };

      // Store in dead letter queue if max retries exceeded
      if (errorRecord.processingAttempt >= this.config.maxRetries) {
        errorRecord.status = 'dead_letter';
        errorRecord.nextRetryAt = null;
      }

      await this.collections.deadLetterQueue.insertOne(errorRecord);

      // Schedule retry if applicable
      if (errorRecord.status === 'pending_retry') {
        setTimeout(() => {
          this.retryEventProcessing(errorRecord);
        }, this.config.retryDelayMs);
      }

    } catch (storeError) {
      console.error('Error storing failed event:', storeError);
    }
  }

  updateProcessingMetrics(startTime, status) {
    const processingTime = Date.now() - startTime;

    this.metrics.eventsProcessed++;
    this.metrics.totalProcessingTime += processingTime;
    this.metrics.averageProcessingTime = this.metrics.totalProcessingTime / this.metrics.eventsProcessed;
    this.metrics.lastProcessedAt = new Date();

    if (status === 'error') {
      this.metrics.eventsFailed++;
    }

    if (this.config.enableMetrics) {
      // Log metrics periodically
      if (this.metrics.eventsProcessed % 100 === 0) {
        console.log(`Processing metrics: ${this.metrics.eventsProcessed} events processed, ` +
                   `${this.metrics.averageProcessingTime.toFixed(2)}ms avg processing time, ` +
                   `${this.metrics.eventsFailed} failures`);
      }
    }
  }

  async persistResumeToken(streamName, resumeToken) {
    try {
      await this.collections.processingCheckpoints.updateOne(
        { streamName: streamName },
        {
          $set: {
            resumeToken: resumeToken,
            lastUpdated: new Date()
          }
        },
        { upsert: true }
      );
    } catch (error) {
      console.error(`Error persisting resume token for ${streamName}:`, error);
    }
  }

  async loadResumeTokens() {
    try {
      const checkpoints = await this.collections.processingCheckpoints.find({}).toArray();

      for (const checkpoint of checkpoints) {
        this.resumeTokens.set(checkpoint.streamName, checkpoint.resumeToken);
      }

      console.log(`Loaded ${checkpoints.length} resume tokens`);
    } catch (error) {
      console.error('Error loading resume tokens:', error);
    }
  }

  async getProcessingStatistics() {
    return {
      activeStreams: this.changeStreams.size,
      eventsProcessed: this.metrics.eventsProcessed,
      eventsFailed: this.metrics.eventsFailed,
      averageProcessingTime: this.metrics.averageProcessingTime,
      successRate: this.metrics.eventsProcessed > 0
        ? ((this.metrics.eventsProcessed - this.metrics.eventsFailed) / this.metrics.eventsProcessed * 100).toFixed(2)
        : '100.00',
      lastProcessedAt: this.metrics.lastProcessedAt,

      // Stream-specific metrics
      streamMetrics: Object.fromEntries([...this.changeStreams.keys()].map(name => [
        name,
        { active: true, resumeToken: this.resumeTokens.has(name) }
      ]))
    };
  }

  async shutdown() {
    console.log('Shutting down Change Streams Manager...');

    // Close all change streams
    for (const [name, stream] of this.changeStreams) {
      try {
        await stream.close();
        console.log(`Closed change stream: ${name}`);
      } catch (error) {
        console.error(`Error closing change stream ${name}:`, error);
      }
    }

    // Final metrics log
    const stats = await this.getProcessingStatistics();
    console.log('Final processing statistics:', stats);

    console.log('Change Streams Manager shutdown complete');
  }
}

// Benefits of MongoDB Change Streams:
// - Real-time change notifications without polling overhead
// - Guaranteed delivery with automatic resume capability and failure recovery
// - Advanced filtering and aggregation pipelines for targeted event processing
// - Comprehensive change context including before/after document state
// - Native integration with MongoDB's replica set and sharding architecture
// - Atomic change detection with cluster-wide ordering guarantees
// - Efficient resource utilization with intelligent batching and buffering
// - Seamless integration with existing MongoDB operations and security
// - SQL-compatible event processing through QueryLeaf integration
// - Production-ready reliability with built-in error handling and retry logic

module.exports = {
  MongoDBChangeStreamsManager
};
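
A minimal bootstrap sketch for the manager above. It assumes the class is exported from a local module (the './change-streams-manager' path is illustrative) and that MongoDB runs as a replica set, since change streams are not available on standalone deployments:

// Illustrative bootstrap for MongoDBChangeStreamsManager
const { MongoClient } = require('mongodb');
const { MongoDBChangeStreamsManager } = require('./change-streams-manager'); // hypothetical module path

async function main() {
  // Change streams require a replica set or sharded cluster
  const client = new MongoClient('mongodb://localhost:27017/?replicaSet=rs0');
  await client.connect();

  const db = client.db('ecommerce_platform');
  const manager = new MongoDBChangeStreamsManager(db, {
    batchSize: 50,
    maxRetries: 5
  });

  // Close streams and the client on shutdown so resume tokens are persisted
  process.on('SIGINT', async () => {
    await manager.shutdown();
    await client.close();
    process.exit(0);
  });
}

main().catch(err => {
  console.error('Change stream processor failed to start:', err);
  process.exit(1);
});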

Understanding MongoDB Change Streams Architecture

Advanced Event-Driven Patterns for Real-Time Applications

Implement sophisticated change stream patterns for production event-driven systems:

// Production-ready Change Streams with advanced event processing and routing
class ProductionChangeStreamsProcessor extends MongoDBChangeStreamsManager {
  constructor(db, productionConfig) {
    super(db, productionConfig);

    this.productionConfig = {
      ...productionConfig,
      enableEventSourcing: true,
      enableCQRS: true,
      enableEventStore: true,
      enableSagaOrchestration: true,
      enableEventProjections: true,
      enableSnapshotting: true
    };

    this.setupProductionEventProcessing();
    this.initializeEventSourcing();
    this.setupCQRSProjections();
    this.setupSagaOrchestration();
  }

  async implementEventSourcingPattern() {
    console.log('Implementing event sourcing pattern with Change Streams...');

    const eventSourcingStrategy = {
      // Event store management
      eventStore: {
        enableEventPersistence: true,
        enableEventReplay: true,
        enableSnapshotting: true,
        snapshotFrequency: 1000
      },

      // Command handling
      commandHandling: {
        enableCommandValidation: true,
        enableCommandProjections: true,
        enableCommandSagas: true
      },

      // Query projections
      queryProjections: {
        enableRealTimeProjections: true,
        enableMaterializedViews: true,
        enableProjectionRecovery: true
      }
    };

    return await this.deployEventSourcing(eventSourcingStrategy);
  }

  async setupAdvancedEventRouting() {
    console.log('Setting up advanced event routing and distribution...');

    const routingStrategy = {
      // Message routing
      messageRouting: {
        enableTopicRouting: true,
        enableContentRouting: true,
        enableGeographicRouting: true,
        enableLoadBalancing: true
      },

      // Event transformation
      eventTransformation: {
        enableEventEnrichment: true,
        enableEventFiltering: true,
        enableEventAggregation: true,
        enableEventSplitting: true
      },

      // Delivery guarantees
      deliveryGuarantees: {
        enableAtLeastOnceDelivery: true,
        enableExactlyOnceDelivery: true,
        enableOrderedDelivery: true,
        enableDuplicateDetection: true
      }
    };

    return await this.deployAdvancedRouting(routingStrategy);
  }

  async implementReactiveStreams() {
    console.log('Implementing reactive streams for backpressure management...');

    const reactiveConfig = {
      // Backpressure handling
      backpressure: {
        enableFlowControl: true,
        bufferStrategy: 'drop_oldest',
        maxBufferSize: 10000,
        backpressureThreshold: 0.8
      },

      // Stream processing
      streamProcessing: {
        enableParallelProcessing: true,
        parallelismLevel: 10,
        enableBatching: true,
        batchSize: 100
      },

      // Error handling
      errorHandling: {
        enableCircuitBreaker: true,
        enableRetryLogic: true,
        enableDeadLetterQueue: true,
        enableGracefulDegradation: true
      }
    };

    return await this.deployReactiveStreams(reactiveConfig);
  }
}

SQL-Style Change Stream Operations with QueryLeaf

QueryLeaf provides familiar SQL syntax for MongoDB Change Streams and event-driven operations:

-- QueryLeaf change stream operations with SQL-familiar syntax

-- Create change stream monitoring with SQL-style syntax
CREATE CHANGE_STREAM product_changes 
ON products 
WITH (
  -- Change stream configuration
  full_document = 'updateLookup',
  full_document_before_change = 'whenAvailable',
  batch_size = 100,
  max_await_time_ms = 1000,

  -- Event filtering
  FILTER (
    operation_type IN ('insert', 'update', 'delete') AND
    namespace.database = 'ecommerce' AND
    namespace.collection = 'products'
  ),

  -- Event enrichment pipeline
  ENRICH (
    -- Add business context
    category_info FROM categories USING fullDocument.categoryId,
    inventory_info FROM inventory USING documentKey._id,

    -- Compute derived fields
    event_priority = CASE 
      WHEN operation_type = 'delete' THEN 'critical'
      WHEN operation_type = 'update' AND updateDescription.updatedFields.price IS NOT NULL THEN 'high'
      WHEN operation_type = 'update' AND updateDescription.updatedFields.inventory IS NOT NULL THEN 'medium'
      ELSE 'low'
    END,

    -- Change analysis
    change_type = CASE
      WHEN operation_type = 'insert' THEN 'product_creation'
      WHEN operation_type = 'update' AND updateDescription.updatedFields.price IS NOT NULL THEN 'price_update'
      WHEN operation_type = 'update' AND updateDescription.updatedFields.inventory IS NOT NULL THEN 'inventory_update'
      WHEN operation_type = 'delete' THEN 'product_deletion'
      ELSE 'product_modification'
    END
  )
);

-- Monitor change events with SQL queries
SELECT 
  event_id,
  operation_type,
  document_key._id as product_id,
  full_document.name as product_name,
  full_document.price as current_price,

  -- Change analysis
  change_type,
  event_priority,
  cluster_time,

  -- Business context
  category_info.name as category_name,
  inventory_info.quantity as current_inventory,

  -- Change details for updates
  CASE 
    WHEN operation_type = 'update' THEN
      JSON_BUILD_OBJECT(
        'updated_fields', updateDescription.updatedFields,
        'removed_fields', updateDescription.removedFields,
        'truncated_arrays', updateDescription.truncatedArrays
      )
    ELSE NULL
  END as update_details,

  -- Price change analysis
  CASE 
    WHEN change_type = 'price_update' THEN
      JSON_BUILD_OBJECT(
        'old_price', fullDocumentBeforeChange.price,
        'new_price', fullDocument.price,
        'change_amount', fullDocument.price - fullDocumentBeforeChange.price,
        'change_percentage', 
          ROUND(
            ((fullDocument.price - fullDocumentBeforeChange.price) / 
             fullDocumentBeforeChange.price) * 100, 
            2
          )
      )
    ELSE NULL
  END as price_change_analysis,

  -- Inventory change analysis
  CASE 
    WHEN change_type = 'inventory_update' THEN
      JSON_BUILD_OBJECT(
        'old_inventory', fullDocumentBeforeChange.inventory,
        'new_inventory', fullDocument.inventory,
        'change_amount', fullDocument.inventory - fullDocumentBeforeChange.inventory,
        'stock_status', 
          CASE 
            WHEN fullDocument.inventory = 0 THEN 'out_of_stock'
            WHEN fullDocument.inventory <= 5 THEN 'low_stock'
            WHEN fullDocument.inventory > fullDocumentBeforeChange.inventory THEN 'restocked'
            ELSE 'normal'
          END
      )
    ELSE NULL
  END as inventory_change_analysis

FROM CHANGE_STREAM product_changes
WHERE cluster_time > TIMESTAMP '2025-01-05 00:00:00'
ORDER BY cluster_time DESC;

-- Event aggregation and analytics
WITH change_events AS (
  SELECT 
    *,
    DATE_TRUNC('hour', cluster_time) as hour_bucket,
    DATE_TRUNC('day', cluster_time) as day_bucket
  FROM CHANGE_STREAM product_changes
  WHERE cluster_time >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
),

hourly_change_metrics AS (
  SELECT 
    hour_bucket,

    -- Operation counts
    COUNT(*) as total_events,
    COUNT(*) FILTER (WHERE operation_type = 'insert') as product_creates,
    COUNT(*) FILTER (WHERE operation_type = 'update') as product_updates,
    COUNT(*) FILTER (WHERE operation_type = 'delete') as product_deletes,

    -- Change type analysis
    COUNT(*) FILTER (WHERE change_type = 'price_update') as price_changes,
    COUNT(*) FILTER (WHERE change_type = 'inventory_update') as inventory_changes,
    COUNT(*) FILTER (WHERE change_type = 'product_creation') as new_products,

    -- Priority distribution
    COUNT(*) FILTER (WHERE event_priority = 'critical') as critical_events,
    COUNT(*) FILTER (WHERE event_priority = 'high') as high_priority_events,
    COUNT(*) FILTER (WHERE event_priority = 'medium') as medium_priority_events,
    COUNT(*) FILTER (WHERE event_priority = 'low') as low_priority_events,

    -- Business impact metrics
    AVG(CAST(price_change_analysis->>'change_percentage' AS DECIMAL)) as avg_price_change_pct,
    COUNT(*) FILTER (WHERE inventory_change_analysis->>'stock_status' = 'out_of_stock') as out_of_stock_events,
    COUNT(*) FILTER (WHERE inventory_change_analysis->>'stock_status' = 'low_stock') as low_stock_events,

    -- Unique products affected
    COUNT(DISTINCT document_key._id) as unique_products_affected,
    COUNT(DISTINCT category_info.name) as categories_affected

  FROM change_events
  GROUP BY hour_bucket
),

change_velocity_analysis AS (
  SELECT 
    hcm.*,

    -- Change velocity metrics
    total_events / 60.0 as events_per_minute,
    unique_products_affected / 60.0 as products_changed_per_minute,

    -- Change intensity scoring
    CASE 
      WHEN critical_events > 10 THEN 'very_high_intensity'
      WHEN high_priority_events > 50 THEN 'high_intensity'
      WHEN total_events > 100 THEN 'moderate_intensity'
      ELSE 'normal_intensity'
    END as change_intensity,

    -- Business activity classification
    CASE 
      WHEN price_changes > total_events * 0.3 THEN 'pricing_focused'
      WHEN inventory_changes > total_events * 0.4 THEN 'inventory_focused'
      WHEN new_products > total_events * 0.2 THEN 'catalog_expansion'
      ELSE 'general_maintenance'
    END as activity_pattern,

    -- Alert thresholds
    CASE 
      WHEN critical_events > 5 OR out_of_stock_events > 20 THEN 'alert_required'
      WHEN high_priority_events > 30 OR total_events / 60.0 > 5 THEN 'monitoring_required'
      ELSE 'normal_operations'
    END as operational_status

  FROM hourly_change_metrics hcm
)

SELECT 
  TO_CHAR(hour_bucket, 'YYYY-MM-DD HH24:00') as hour,

  -- Core metrics
  total_events,
  ROUND(events_per_minute, 2) as events_per_minute,
  unique_products_affected,
  categories_affected,

  -- Operation breakdown
  product_creates,
  product_updates,
  product_deletes,

  -- Change type breakdown  
  price_changes,
  inventory_changes,

  -- Priority breakdown
  critical_events,
  high_priority_events,
  medium_priority_events,
  low_priority_events,

  -- Business insights
  change_intensity,
  activity_pattern,
  operational_status,

  -- Impact metrics
  ROUND(COALESCE(avg_price_change_pct, 0), 2) as avg_price_change_pct,
  out_of_stock_events,
  low_stock_events,

  -- Health indicators
  ROUND((total_events - critical_events)::DECIMAL / total_events * 100, 1) as operational_health_pct,

  -- Recommendations
  CASE operational_status
    WHEN 'alert_required' THEN 'Immediate attention required - high critical event volume'
    WHEN 'monitoring_required' THEN 'Increased monitoring recommended'
    ELSE 'Normal operations - continue monitoring'
  END as recommendation

FROM change_velocity_analysis
ORDER BY hour_bucket DESC;

-- Real-time event routing and notifications
CREATE TRIGGER change_event_router
  ON CHANGE_STREAM product_changes
  FOR EACH CHANGE_EVENT
  EXECUTE FUNCTION (
    -- Route critical events immediately
    WHEN event_priority = 'critical' THEN
      NOTIFY 'critical_alerts' WITH PAYLOAD JSON_BUILD_OBJECT(
        'event_id', event_id,
        'product_id', document_key._id,
        'operation', operation_type,
        'priority', event_priority,
        'timestamp', cluster_time
      ),

    -- Batch medium/low priority events
    WHEN event_priority IN ('medium', 'low') THEN
      INSERT INTO event_batch_queue (
        event_id, event_priority, event_data, batch_window
      ) VALUES (
        event_id, 
        event_priority, 
        JSON_BUILD_OBJECT(
          'product_id', document_key._id,
          'operation', operation_type,
          'change_type', change_type,
          'details', full_document
        ),
        DATE_TRUNC('minute', CURRENT_TIMESTAMP, 5) -- 5-minute batching window
      ),

    -- Route to search indexing
    WHEN operation_type IN ('insert', 'update') THEN
      INSERT INTO search_index_updates (
        document_id, collection_name, operation_type, 
        document_data, priority, created_at
      ) VALUES (
        document_key._id, 
        'products', 
        operation_type,
        full_document,
        CASE WHEN event_priority = 'critical' THEN 'immediate' ELSE 'normal' END,
        CURRENT_TIMESTAMP
      ),

    -- Route to analytics pipeline
    ALWAYS THEN
      INSERT INTO analytics_events (
        event_id, event_type, collection_name, document_id,
        operation_type, event_data, processing_priority, created_at
      ) VALUES (
        event_id,
        'change_stream_event',
        'products',
        document_key._id,
        operation_type,
        JSON_BUILD_OBJECT(
          'change_type', change_type,
          'priority', event_priority,
          'business_context', JSON_BUILD_OBJECT(
            'category', category_info.name,
            'inventory', inventory_info.quantity
          )
        ),
        event_priority,
        CURRENT_TIMESTAMP
      )
  );

-- Event sourcing and audit trail queries
CREATE VIEW product_change_audit AS
SELECT 
  event_id,
  document_key._id as product_id,
  operation_type,
  cluster_time as event_time,

  -- Change details
  change_type,

  -- Document states
  full_document as current_state,
  full_document_before_change as previous_state,

  -- Change delta for updates
  CASE 
    WHEN operation_type = 'update' THEN
      JSON_BUILD_OBJECT(
        'updated_fields', updateDescription.updatedFields,
        'removed_fields', updateDescription.removedFields
      )
    ELSE NULL
  END as change_delta,

  -- Business impact
  event_priority,

  -- Audit metadata
  resume_token,
  wall_time,

  -- Computed audit fields
  ROW_NUMBER() OVER (
    PARTITION BY document_key._id 
    ORDER BY cluster_time
  ) as change_sequence,

  LAG(cluster_time) OVER (
    PARTITION BY document_key._id 
    ORDER BY cluster_time
  ) as previous_change_time,

  -- Time between changes
  EXTRACT(EPOCH FROM (
    cluster_time - LAG(cluster_time) OVER (
      PARTITION BY document_key._id 
      ORDER BY cluster_time
    )
  )) as seconds_since_last_change

FROM CHANGE_STREAM product_changes
ORDER BY document_key._id, cluster_time;

-- Advanced pattern detection
WITH product_lifecycle_events AS (
  SELECT 
    document_key._id as product_id,
    operation_type,
    change_type,
    cluster_time,

    -- Lifecycle stage detection
    CASE 
      WHEN operation_type = 'insert' THEN 'creation'
      WHEN operation_type = 'delete' THEN 'deletion'
      WHEN change_type = 'price_update' THEN 'pricing_management'
      WHEN change_type = 'inventory_update' THEN 'inventory_management'
      ELSE 'maintenance'
    END as lifecycle_stage,

    -- Change frequency analysis
    COUNT(*) OVER (
      PARTITION BY document_key._id 
      ORDER BY cluster_time 
      RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW
    ) as changes_last_hour,

    -- Pattern detection
    LAG(change_type) OVER (
      PARTITION BY document_key._id 
      ORDER BY cluster_time
    ) as previous_change_type,

    LEAD(change_type) OVER (
      PARTITION BY document_key._id 
      ORDER BY cluster_time
    ) as next_change_type

  FROM CHANGE_STREAM product_changes
  WHERE cluster_time >= CURRENT_TIMESTAMP - INTERVAL '7 days'
),

change_patterns AS (
  SELECT 
    product_id,
    lifecycle_stage,
    change_type,
    previous_change_type,
    next_change_type,
    changes_last_hour,
    cluster_time,

    -- Pattern identification
    CASE 
      WHEN changes_last_hour > 10 THEN 'high_frequency_changes'
      WHEN change_type = 'price_update' AND previous_change_type = 'price_update' THEN 'price_oscillation'
      WHEN change_type = 'inventory_update' AND changes_last_hour > 5 THEN 'inventory_volatility'
      WHEN lifecycle_stage = 'creation' AND next_change_type = 'price_update' THEN 'immediate_pricing_adjustment'
      WHEN lifecycle_stage = 'deletion' AND previous_change_type = 'inventory_update' THEN 'clearance_deletion'
      ELSE 'normal_pattern'
    END as pattern_type,

    -- Anomaly scoring
    CASE 
      WHEN changes_last_hour > 20 THEN 5  -- Very high frequency
      WHEN changes_last_hour > 10 THEN 3  -- High frequency
      WHEN change_type = previous_change_type AND change_type = next_change_type THEN 2  -- Repetitive changes
      ELSE 0
    END as anomaly_score

  FROM product_lifecycle_events
),

pattern_alerts AS (
  SELECT 
    cp.*,

    -- Alert classification
    CASE 
      WHEN anomaly_score >= 5 THEN 'critical_pattern_anomaly'
      WHEN anomaly_score >= 3 THEN 'unusual_pattern_detected'
      WHEN pattern_type IN ('price_oscillation', 'inventory_volatility') THEN 'business_pattern_concern'
      ELSE 'normal_pattern'
    END as alert_level,

    -- Recommended actions
    CASE pattern_type
      WHEN 'high_frequency_changes' THEN 'Investigate automated system behavior'
      WHEN 'price_oscillation' THEN 'Review pricing strategy and rules'
      WHEN 'inventory_volatility' THEN 'Check inventory management system'
      WHEN 'clearance_deletion' THEN 'Verify clearance process completion'
      ELSE 'Continue monitoring'
    END as recommended_action

  FROM change_patterns cp
  WHERE anomaly_score > 0 OR pattern_type != 'normal_pattern'
)

SELECT 
  product_id,
  lifecycle_stage,
  pattern_type,
  alert_level,
  anomaly_score,
  changes_last_hour,
  TO_CHAR(cluster_time, 'YYYY-MM-DD HH24:MI:SS') as event_time,
  recommended_action,

  -- Pattern context
  CASE 
    WHEN alert_level = 'critical_pattern_anomaly' THEN 
      'CRITICAL: Unusual change frequency detected - immediate investigation required'
    WHEN alert_level = 'unusual_pattern_detected' THEN
      'WARNING: Pattern anomaly detected - monitoring recommended'
    WHEN alert_level = 'business_pattern_concern' THEN
      'BUSINESS ALERT: Review business process associated with detected pattern'
    ELSE 'INFO: Pattern identified for awareness'
  END as alert_description

FROM pattern_alerts
ORDER BY anomaly_score DESC, cluster_time DESC
LIMIT 100;

-- QueryLeaf provides comprehensive change stream capabilities:
-- 1. SQL-familiar change stream creation and monitoring syntax
-- 2. Advanced event filtering and enrichment with business context
-- 3. Real-time event routing and notification triggers
-- 4. Comprehensive change analytics and velocity analysis
-- 5. Pattern detection and anomaly identification
-- 6. Event sourcing and audit trail capabilities
-- 7. Business rule integration and automated responses  
-- 8. Cross-collection change correlation and analysis
-- 9. Production-ready error handling and resume capabilities
-- 10. Native integration with MongoDB Change Streams performance optimization

Best Practices for Change Streams Implementation

Event Processing Strategy and Performance Optimization

Essential principles for effective MongoDB Change Streams deployment:

  1. Resume Token Management: Implement robust resume token persistence and recovery strategies for guaranteed delivery (a minimal persistence sketch follows this list)
  2. Pipeline Optimization: Design change stream pipelines that minimize network traffic and processing overhead
  3. Error Handling: Implement comprehensive error handling with retry logic and dead letter queue management
  4. Filtering Strategy: Apply server-side filtering to reduce client processing load and network usage
  5. Batch Processing: Implement intelligent batching for high-volume event processing scenarios
  6. Performance Monitoring: Track change stream performance metrics and optimize based on usage patterns
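
As a concrete illustration of points 1 and 4, the sketch below persists a resume token after each processed event and applies a server-side $match filter so only relevant events reach the client. The connection string, the orders and processing_checkpoints collections, and the fullDocument.total threshold are illustrative assumptions:

// Minimal resume-token persistence with server-side filtering (illustrative names)
const { MongoClient } = require('mongodb');

async function watchHighValueOrders() {
  const client = new MongoClient('mongodb://localhost:27017/?replicaSet=rs0');
  await client.connect();

  const db = client.db('ecommerce_platform');
  const checkpoints = db.collection('processing_checkpoints');

  // Load the last persisted resume token, if any
  const checkpoint = await checkpoints.findOne({ streamName: 'orders' });

  const changeStream = db.collection('orders').watch(
    [
      // Server-side filtering: only high-value inserts reach this process
      { $match: { operationType: 'insert', 'fullDocument.total': { $gte: 100 } } }
    ],
    checkpoint?.resumeToken ? { resumeAfter: checkpoint.resumeToken } : {}
  );

  for await (const change of changeStream) {
    console.log('High-value order created:', change.documentKey._id);

    // Persist the resume token only after the event is successfully processed
    await checkpoints.updateOne(
      { streamName: 'orders' },
      { $set: { resumeToken: change._id, lastUpdated: new Date() } },
      { upsert: true }
    );
  }
}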

Production Event-Driven Architecture

Optimize Change Streams for enterprise-scale event-driven systems:

  1. Scalability Design: Plan for horizontal scaling with appropriate sharding and replica set configurations
  2. Fault Tolerance: Implement automatic failover and recovery mechanisms for change stream processors (an auto-resume loop is sketched after this list)
  3. Event Enrichment: Design efficient event enrichment patterns that balance context with performance
  4. Integration Patterns: Establish clear integration patterns with external systems and message queues
  5. Security Considerations: Implement proper authentication and authorization for change stream access
  6. Operational Monitoring: Deploy comprehensive monitoring and alerting for change stream health
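
For the fault tolerance point above, a common pattern is to wrap the change stream in a loop that re-opens it from the last seen resume token whenever it fails. A minimal sketch, with the onChange handler and backoff deliberately simplified:

// Sketch of an auto-resuming watch loop (error handling simplified for illustration)
async function resilientWatch(collection, pipeline, onChange) {
  let resumeToken = null;

  // Re-open the stream after transient failures, resuming where processing stopped
  while (true) {
    const options = resumeToken ? { resumeAfter: resumeToken } : {};
    const stream = collection.watch(pipeline, options);

    try {
      for await (const change of stream) {
        await onChange(change);
        resumeToken = change._id; // remember progress for the next attempt
      }
    } catch (err) {
      console.error('Change stream interrupted, resuming shortly:', err.message);
      await new Promise(resolve => setTimeout(resolve, 1000)); // simple fixed backoff
    } finally {
      await stream.close().catch(() => {});
    }
  }
}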

Conclusion

MongoDB Change Streams provide comprehensive real-time event-driven capabilities that eliminate the complexity and performance overhead of traditional polling-based change detection. The combination of guaranteed delivery, automatic resume capabilities, and sophisticated filtering makes Change Streams ideal for building responsive, event-driven applications that scale efficiently with growing data volumes.

Key MongoDB Change Streams benefits include:

  • Real-Time Notifications: Native change notifications without polling overhead or database performance impact
  • Guaranteed Delivery: Automatic resume capability and failure recovery with cluster-wide ordering guarantees
  • Advanced Filtering: Server-side aggregation pipelines for targeted event processing and context enrichment
  • Production Ready: Built-in error handling, retry logic, and integration with MongoDB's operational model
  • Event-Driven Architecture: Native support for reactive patterns, event sourcing, and CQRS implementations
  • SQL Accessibility: Familiar SQL-style change stream operations through QueryLeaf for accessible event processing

Whether you're building real-time dashboards, notification systems, data synchronization services, or comprehensive event-driven architectures, MongoDB Change Streams with QueryLeaf's familiar SQL interface provide the foundation for scalable, responsive applications.

QueryLeaf Integration: QueryLeaf seamlessly manages MongoDB Change Streams while providing SQL-familiar syntax for change event monitoring, filtering, and processing. Advanced event-driven patterns including real-time analytics, pattern detection, and automated routing are elegantly handled through familiar SQL constructs, making sophisticated event processing both powerful and accessible to SQL-oriented development teams.

The combination of MongoDB's robust change stream capabilities with SQL-style event operations makes it an ideal platform for applications requiring both real-time responsiveness and familiar database interaction patterns, ensuring your event-driven systems can evolve with changing requirements while maintaining reliable, high-performance operation.