What:
A decentralized peer-to-peer communication protocol where nodes periodically share membership and state details with randomly selected neighbors.
Primary purpose:
Managing cluster membership lists, broadcasting system metadata, and detecting failed nodes without a central coordinator.
Usually used for:
Propagating membership and token ownership around a Cassandra ring, agent membership and failure detection in Consul (via its Serf gossip layer), and spreading cluster configuration in distributed key-value stores.
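A minimal sketch of the basic exchange follows. It assumes a hypothetical `Node` class whose membership table maps each member name to a heartbeat counter. Every round, each node increments its own counter and pushes its table to one random peer, and that peer keeps the higher counter for each member. A member whose counter stops rising for several rounds is the one a failure detector would start to suspect. Production implementations carry richer versioned state; this only shows the shape of the merge.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cluster member holding its own view of the membership table."""

    name: str
    # Hypothetical layout: member name -> highest heartbeat counter seen so far.
    members: dict[str, int] = field(default_factory=dict)

    def tick(self) -> None:
        # Bump our own heartbeat so peers can see we are still alive.
        self.members[self.name] = self.members.get(self.name, 0) + 1

    def merge(self, incoming: dict[str, int]) -> None:
        # Counters only grow, so keeping the max per member makes the merge
        # order-independent and idempotent.
        for member, beat in incoming.items():
            if beat > self.members.get(member, -1):
                self.members[member] = beat


def gossip_round(nodes: list[Node], rng: random.Random) -> None:
    """Every node ticks, then pushes a copy of its table to one random peer."""
    for node in nodes:
        node.tick()
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        peer.merge(dict(node.members))


rng = random.Random(7)
cluster = [Node(f"n{i}") for i in range(8)]
for _ in range(10):
    gossip_round(cluster, rng)
for node in cluster:
    print(node.name, len(node.members), "members known")
```

No node ever contacts a coordinator; full membership knowledge emerges only from repeated random pairwise merges.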
How should I think about this in a system architecture?
🦠 Epidemic Dissemination
Like a virus spreading through a crowd. If Node A learns about a new server, it gossips to B and C. Each of them then gossips to further random peers such as D and E, so the news reaches the whole cluster in O(log N) gossip rounds.
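A small simulation can check the logarithmic spread. This sketch assumes push-only gossip with a hypothetical fanout of 2, meaning each informed node contacts two random peers per round. It counts the rounds until every node is informed. Printing `log2(N)` alongside should show the round count growing with the logarithm of the cluster size rather than with N itself.

```python
import math
import random


def rounds_to_inform(n_nodes: int, fanout: int = 2, seed: int = 0) -> int:
    """Simulate push gossip from node 0 and return rounds until all nodes know."""
    rng = random.Random(seed)
    informed = {0}  # node 0 plays Node A, the first to learn the news
    rounds = 0
    while len(informed) < n_nodes:
        newly_told = set()
        for node in informed:
            for _ in range(fanout):
                # Pick a uniformly random peer other than ourselves.
                peer = rng.randrange(n_nodes - 1)
                if peer >= node:
                    peer += 1
                newly_told.add(peer)
        informed |= newly_told
        rounds += 1
    return rounds


for n in (10, 100, 1_000, 10_000):
    print(f"N={n:>6}  rounds={rounds_to_inform(n):>3}  log2(N)={math.log2(n):.1f}")
```

Raising `fanout` shortens the spread further but multiplies the messages each node sends per round. This is the usual trade-off when tuning gossip.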
📊 Phi Accrual Suspicion
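Avoid binary alive/dead verdicts. Compute a continuous suspicion level (φ) from the history of heartbeat inter-arrival times: the longer a heartbeat is overdue relative to that history, the higher φ climbs. Each consumer picks its own φ threshold, trading faster crash detection against false alarms caused by network lag.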
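The following sketches the calculation. Like the original phi accrual paper by Hayashibara et al., it assumes that inter-arrival times are normally distributed; other implementations may model them differently. φ is -log10 of the probability that the next heartbeat would arrive even later than the current silence. With a threshold of φ = 1, the chance of wrongly suspecting a merely slow node is about 10%; with φ = 3, it is about 0.1%. The class name, window size, and `min_std` floor are illustrative assumptions.

```python
from __future__ import annotations

import math
import statistics
from collections import deque


class PhiAccrualDetector:
    """Phi accrual sketch assuming normally distributed heartbeat intervals."""

    def __init__(self, window: int = 100, min_std: float = 0.1) -> None:
        self.intervals: deque[float] = deque(maxlen=window)  # recent inter-arrival times
        self.last_heartbeat: float | None = None
        # Floor on the std-dev so perfectly regular heartbeats don't divide by zero.
        self.min_std = min_std

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat arriving at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        """Suspicion level at time `now`; higher means a crash is more likely."""
        if self.last_heartbeat is None or len(self.intervals) < 2:
            return 0.0  # not enough history to judge
        mean = statistics.fmean(self.intervals)
        std = max(statistics.stdev(self.intervals), self.min_std)
        elapsed = now - self.last_heartbeat
        # P_later: chance the next heartbeat arrives even later than `elapsed`,
        # i.e. the upper tail of a normal distribution N(mean, std^2).
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        return -math.log10(max(p_later, 1e-300))  # clamp avoids log10(0)


detector = PhiAccrualDetector()
for t in range(10):  # heartbeats roughly every second, with small jitter
    detector.heartbeat(t + (0.05 if t % 2 else 0.0))
for now in (9.5, 10.5, 12.0):
    print(f"t={now}: phi={detector.phi(now):.2f}")
```

The detector only reports a number; whether φ = 5 or φ = 8 means "dead" is a policy decision left to the caller.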
🔄 Anti-Entropy Sync
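Reconcile divergent states. Nodes periodically pick a random peer and compare Merkle tree hashes of their data instead of the full datasets. Only subtrees whose hashes differ are descended into, so only the divergent key ranges are exchanged and repaired.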
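This minimal sketch illustrates why Merkle trees avoid comparing full datasets. It assumes each replica hashes its keys into a fixed, power-of-two number of buckets (8 here, a hypothetical choice) and builds a SHA-256 Merkle tree over them. The comparison starts at the root and only descends where hashes differ, so a single comparison skips each identical subtree.

```python
from __future__ import annotations

import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucketize(store: dict[str, str], n_buckets: int = 8) -> list[bytes]:
    """Serialize a replica into n_buckets leaves by hashing each key to a bucket."""
    buckets: list[list[str]] = [[] for _ in range(n_buckets)]
    for key in sorted(store):  # sorted so both replicas serialize identically
        idx = int.from_bytes(h(key.encode())[:4], "big") % n_buckets
        buckets[idx].append(f"{key}={store[key]}")
    return ["\n".join(b).encode() for b in buckets]


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Return tree levels bottom-up (level 0 = leaf hashes); len(leaves) must be a power of two."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def diff_leaves(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Compare two same-shaped trees from the root, descending only where hashes differ."""
    stack = [(len(a) - 1, 0)]  # start at the root
    differing = []
    while stack:
        level, idx = stack.pop()
        if a[level][idx] == b[level][idx]:
            continue  # identical subtree: every bucket beneath it already matches
        if level == 0:
            differing.append(idx)
        else:
            stack.extend([(level - 1, 2 * idx), (level - 1, 2 * idx + 1)])
    return sorted(differing)


replica_a = {f"user:{i}": "v1" for i in range(100)}
replica_b = {**replica_a, "user:42": "v2"}  # one divergent write
print(diff_leaves(build_tree(bucketize(replica_a)), build_tree(bucketize(replica_b))))
```

Only the bucket index returned by `diff_leaves` would then need to be streamed between the two replicas. Production systems typically build much deeper trees over key or token ranges; the depth here is kept tiny for readability.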