What:
Stream processing engines (e.g. Apache Flink, Spark Streaming) that continuously aggregate and transform unbounded message streams.
Primary purpose:
Extracting analytical insights, metric counters, or fraud signals from event streams with low latency, as the events occur.
Usually used for:
Live ETAs computed from GPS coordinate streams, real-time payment fraud alerts, and sliding-window system health trackers.
How should I think about this inside system architectures?
🕰️ Event vs Processing Time
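Event Time is when the event actually occurred at its source (e.g. on the client). Processing Time is the wall-clock time at which the engine processes it, which can lag behind because of network delays, retries, or offline devices. Prefer Event Time whenever results must reflect when things happened, so that late or out-of-order events still land in the correct window. Processing Time is simpler and lower-latency, but its results depend on arrival order and are not reproducible on replay.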
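To illustrate why the choice of timestamp matters, here is a minimal sketch in plain Python (not a Flink or Spark API). The field names `event_time` and `processing_time`, the 1-minute window size, and the sample events are all assumptions made up for this example. It counts the same three events per tumbling window twice, once keyed by each timestamp.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # hypothetical 1-minute tumbling window


def window_start(ts_ms, size_ms=WINDOW_MS):
    """Return the start of the tumbling window that contains ts_ms."""
    return ts_ms - (ts_ms % size_ms)


def count_per_window(events, time_field):
    """Count events per window, bucketing by the given timestamp field."""
    counts = defaultdict(int)
    for event in events:
        counts[window_start(event[time_field])] += 1
    return dict(counts)


# Made-up sample data: timestamps are milliseconds since an arbitrary origin.
events = [
    {"event_time": 59_000, "processing_time": 61_000},   # delayed in transit
    {"event_time": 60_500, "processing_time": 61_200},   # arrived promptly
    {"event_time": 30_000, "processing_time": 125_000},  # device was offline
]

by_event = count_per_window(events, "event_time")
by_processing = count_per_window(events, "processing_time")
print(by_event)       # {0: 2, 60000: 1}
print(by_processing)  # {60000: 2, 120000: 1}
```

Bucketing by event time puts the delayed and offline events back into the first minute, where they actually happened. Bucketing by processing time attributes them to whichever minute they happened to arrive in.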
🌊 Watermark advancement
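A progress marker in event time: a watermark of T asserts 'We assume all events with Event Time < T have arrived.' When the watermark advances past the end of a window, the engine closes that window and emits its result. Events that show up after their window has closed are late and must be dropped or handled separately.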
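The watermark rule can be sketched in a few lines of plain Python. This is a simplified illustration, not how any particular engine implements it: the class name `WatermarkedWindowCounter`, the "largest event time seen minus a fixed allowed delay" heuristic, and the parameter values are assumptions chosen for the example. A window `[start, end)` closes once the watermark reaches `end`, matching the 'Event Time < T' reading above.

```python
from collections import defaultdict


class WatermarkedWindowCounter:
    """Hypothetical tumbling-window counter driven by a watermark."""

    def __init__(self, window_ms, max_delay_ms):
        self.window_ms = window_ms
        self.max_delay_ms = max_delay_ms    # assumed bound on out-of-orderness
        self.max_event_time = None
        self.open_windows = defaultdict(int)  # window start -> running count
        self.late_events = []

    @property
    def watermark(self):
        """Largest event time seen so far, minus the allowed delay."""
        if self.max_event_time is None:
            return float("-inf")
        return self.max_event_time - self.max_delay_ms

    def process(self, event_time):
        """Ingest one event; return the (window_start, count) pairs it closes."""
        start = event_time - event_time % self.window_ms
        if start + self.window_ms <= self.watermark:
            # The window was already closed: treat the event as late.
            self.late_events.append(event_time)
            return []
        self.open_windows[start] += 1
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        # Close every window whose end the watermark has now reached.
        closed = sorted(
            s for s in self.open_windows if s + self.window_ms <= self.watermark
        )
        return [(s, self.open_windows.pop(s)) for s in closed]


counter = WatermarkedWindowCounter(window_ms=10_000, max_delay_ms=2_000)
emitted = []
for ts in [1_000, 9_000, 11_000, 8_000, 12_500, 3_000]:
    emitted.extend(counter.process(ts))

print(emitted)               # [(0, 3)]
print(counter.late_events)   # [3000]
```

The out-of-order event at 8,000 is still counted because the watermark (9,000) had not yet reached the end of the first window. The event at 3,000 arrives after the watermark has moved to 10,500, so it is recorded as late. A larger `max_delay_ms` tolerates more disorder at the cost of emitting results later.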
🗄️ Stateful RocksDB Store
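Store running aggregate counters (e.g. click counts) in a RocksDB key-value store embedded in each worker, so that state updates avoid a network round trip and typically complete in microseconds. Because this state is local to the worker, it must be checkpointed to durable storage to survive failures.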
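As a rough sketch of the access pattern, the following plain Python class keeps per-key counters as serialized bytes, the way an embedded key-value store would. It uses a `dict` as an in-memory stand-in and does not call RocksDB itself; the class name `LocalKeyedCounterState`, its methods, and the 8-byte big-endian encoding are assumptions made up for this example.

```python
import struct


class LocalKeyedCounterState:
    """In-memory stand-in for an embedded key-value store such as RocksDB."""

    def __init__(self):
        self._store = {}  # bytes key -> bytes value, as a byte-oriented store would hold

    def get(self, key):
        """Read the current counter for key, defaulting to 0."""
        raw = self._store.get(key.encode("utf-8"))
        return struct.unpack(">q", raw)[0] if raw is not None else 0

    def increment(self, key, delta=1):
        """Read-modify-write the counter locally, with no network call."""
        updated = self.get(key) + delta
        self._store[key.encode("utf-8")] = struct.pack(">q", updated)
        return updated

    def snapshot(self):
        """Copy the state, e.g. to hand off to a checkpoint writer."""
        return dict(self._store)


state = LocalKeyedCounterState()
for user in ["alice", "bob", "alice"]:
    state.increment(user)

print(state.get("alice"))  # 2
print(state.get("bob"))    # 1
```

In a real deployment the `dict` would be replaced by the engine's state backend, and `snapshot` would correspond to the engine's own checkpoint mechanism rather than a hand-written copy.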