
Categories: Distributed Systems | Tags: MIT 6.824, Raft, Distributed Consensus, Replicated Log

Practical Raft: A Deep Dive into Distributed Replicated Log Systems


1. Everyday Analogy: Team “Leader” Election and Task Synchronization

Imagine a project team that elects a leader by vote. The leader then assigns tasks and makes sure everyone follows the same plan. Raft is the distributed-systems version of this "democratic" mechanism: it lets multiple nodes agree on a leader and then stay consistent by following that leader's instructions.


2. Core Design of Raft

1. Roles and States

Node Roles:
- Leader: Handles client requests and manages log replication
- Follower: Passively accepts commands from the leader
- Candidate: Competes to become the leader
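Before going further, it helps to pin down the state each node carries. The following is a minimal Go sketch of the roles and per-node fields that the later snippets refer to; all type and field names here are illustrative assumptions (loosely modeled on the MIT 6.824 lab skeleton), not a fixed API.

package raft

import (
    "sync"
    "time"
)

// Role is a node's current role in the protocol.
type Role int

const (
    Follower Role = iota
    Candidate
    Leader
)

// LogEntry is one replicated command plus the term in which it was received.
type LogEntry struct {
    Term    int
    Command interface{}
}

// ApplyMsg carries a committed command up to the service's state machine.
type ApplyMsg struct {
    CommandIndex int
    Command      interface{}
}

// Raft is the per-node state assumed by the snippets in this article.
type Raft struct {
    mu              sync.Mutex
    me              int   // this node's ID (also its index into peers)
    peers           []int // IDs of every node in the cluster, including me
    role            Role
    currentTerm     int
    votedFor        int        // candidate voted for in currentTerm, -1 if none
    log             []LogEntry // log[0] is a sentinel; real entries start at index 1
    commitIndex     int        // highest log index known to be committed
    lastApplied     int        // highest log index applied to the state machine
    nextIndex       []int      // leader only: next log index to send to each peer
    matchIndex      []int      // leader only: highest index known replicated on each peer
    lastHeartbeat   time.Time  // last time we heard from a valid leader
    electionTimeout time.Duration
    applyCh         chan ApplyMsg // delivers committed entries to the service
}

The role constants and field names (rf.role, rf.lastHeartbeat, rf.electionTimeout, and so on) are the ones used throughout the rest of the article.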

2. Election Mechanism

  • Each follower waits for a randomized election timeout before becoming a candidate (sketched after this list)
  • Candidates request votes from their peers; whoever gathers a majority of votes becomes the leader
  • Leader periodically sends heartbeats (AppendEntries RPC) to prevent new elections
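The randomized timeout in the first bullet is what prevents repeated split votes. Here is a minimal sketch, assuming the Raft struct above plus Go's math/rand and time packages; the 150–300 ms range follows the Raft paper's suggestion, and resetElectionTimer is a hypothetical helper rather than something Raft itself prescribes.

// resetElectionTimer picks a fresh randomized election timeout and records the
// current time. Followers would call this whenever they receive a valid
// heartbeat or grant a vote; if the timer expires first, they become candidates.
func (rf *Raft) resetElectionTimer() {
    const minTimeout = 150 * time.Millisecond // range suggested by the Raft paper
    const maxTimeout = 300 * time.Millisecond
    rf.electionTimeout = minTimeout +
        time.Duration(rand.Int63n(int64(maxTimeout-minTimeout)))
    rf.lastHeartbeat = time.Now()
}

Because each node draws a different timeout, one of them usually times out well before the others, wins the election, and suppresses further elections with its heartbeats.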

3. Log Replication

  • Leader receives client commands and appends them to its log
  • Leader replicates logs concurrently to all followers
  • Once a log entry is replicated on a majority of nodes, it is committed and applied to the state machine
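The commit rule in the last bullet is usually implemented by the leader scanning the matchIndex it keeps for each follower. The sketch below is one way to do that counting under the assumptions of the struct above (matchIndex per peer ID, sentinel entry at log index 0); it is not the only possible layout.

// advanceCommitIndex moves commitIndex forward to the highest log index that a
// majority of nodes are known to have replicated, restricted to entries from
// the leader's current term. The caller must hold rf.mu.
func (rf *Raft) advanceCommitIndex() {
    for n := len(rf.log) - 1; n > rf.commitIndex; n-- {
        if rf.log[n].Term != rf.currentTerm {
            continue // only entries from the current term are committed directly
        }
        count := 1 // the leader itself has the entry
        for _, peer := range rf.peers {
            if peer != rf.me && rf.matchIndex[peer] >= n {
                count++
            }
        }
        if count > len(rf.peers)/2 {
            rf.commitIndex = n
            return
        }
    }
}

Restricting the scan to current-term entries is the safety rule from §5.4.2 of the Raft paper: older-term entries are committed only indirectly, once a current-term entry on top of them commits.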

3. Detailed Workflow

Raft Workflow:

Client Request
    ↓
Leader appends entry to log
    ↓
Sends AppendEntries RPC concurrently to Followers
    ↓
Followers append the entries to their logs and reply success
    ↓
Leader sees success from a majority and commits the entry
    ↓
Apply to state machine
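The final "apply to state machine" step is typically a separate background goroutine that hands committed entries to the service in order. This is a polling sketch using the applyCh, commitIndex, and lastApplied fields assumed earlier; real implementations usually wait on a sync.Cond instead of sleeping.

// applier runs in its own goroutine and delivers newly committed entries, in
// log order, to the service's state machine via applyCh.
func (rf *Raft) applier() {
    for {
        rf.mu.Lock()
        var msgs []ApplyMsg
        for rf.lastApplied < rf.commitIndex {
            rf.lastApplied++
            msgs = append(msgs, ApplyMsg{
                CommandIndex: rf.lastApplied,
                Command:      rf.log[rf.lastApplied].Command,
            })
        }
        rf.mu.Unlock()
        for _, m := range msgs {
            rf.applyCh <- m // deliver outside the lock so the protocol is never blocked
        }
        time.Sleep(10 * time.Millisecond) // simple polling; a sync.Cond is nicer
    }
}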

4. Core Code Examples (Go)

1. Election Timeout Triggering Election

// checkElectionTimeout starts an election if this node is not the leader and
// has not heard from one within its randomized election timeout.
func (rf *Raft) checkElectionTimeout() {
    rf.mu.Lock()
    defer rf.mu.Unlock()
    if rf.role != Leader && time.Since(rf.lastHeartbeat) > rf.electionTimeout {
        rf.startElection()
    }
}
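This check only does something if it is called regularly. One simple (assumed) way to drive it is a background goroutine started when the node boots; the 10 ms polling interval is an arbitrary choice, just well below the election timeout.

// ticker periodically re-checks the election timer; it would be started with
// `go rf.ticker()` when the Raft instance is created.
func (rf *Raft) ticker() {
    for {
        rf.checkElectionTimeout()
        time.Sleep(10 * time.Millisecond)
    }
}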

2. Sending Vote Requests

// startElection transitions this node to candidate and solicits votes from the
// other peers. The caller must hold rf.mu.
func (rf *Raft) startElection() {
    rf.currentTerm++
    rf.role = Candidate
    rf.votedFor = rf.me
    rf.lastHeartbeat = time.Now() // restart the election timer for this attempt
    term := rf.currentTerm        // the term this election belongs to
    votes := 1                    // we always vote for ourselves

    for _, peer := range rf.peers {
        if peer == rf.me {
            continue
        }
        go func(p int) {
            if !rf.sendRequestVote(p) {
                return
            }
            rf.mu.Lock()
            defer rf.mu.Unlock()
            // Ignore stale replies: the election may already be over, or a
            // newer term may have started in the meantime.
            if rf.role != Candidate || rf.currentTerm != term {
                return
            }
            votes++ // counted under rf.mu, so there is no data race
            if votes > len(rf.peers)/2 {
                rf.becomeLeader() // assumed to set rf.role = Leader
            }
        }(peer)
    }
}
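The snippet above shows only the candidate's side. For completeness, here is a sketch of the receiving side of the vote RPC. The argument and reply field names follow the Raft paper (Term, CandidateID, LastLogIndex, LastLogTerm, VoteGranted) but are otherwise assumptions, as is the -1 convention for "not voted yet".

// RequestVoteArgs and RequestVoteReply follow the field names in the Raft paper.
type RequestVoteArgs struct {
    Term         int
    CandidateID  int
    LastLogIndex int
    LastLogTerm  int
}

type RequestVoteReply struct {
    Term        int
    VoteGranted bool
}

// RequestVote grants at most one vote per term, and only to candidates whose
// log is at least as up to date as ours (election restriction, Raft §5.4.1).
func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
    rf.mu.Lock()
    defer rf.mu.Unlock()
    if args.Term > rf.currentTerm {
        // A newer term always demotes us to follower and clears our old vote.
        rf.currentTerm = args.Term
        rf.role = Follower
        rf.votedFor = -1
    }
    reply.Term = rf.currentTerm
    if args.Term < rf.currentTerm {
        reply.VoteGranted = false // stale candidate
        return
    }
    lastIndex := len(rf.log) - 1
    lastTerm := rf.log[lastIndex].Term
    upToDate := args.LastLogTerm > lastTerm ||
        (args.LastLogTerm == lastTerm && args.LastLogIndex >= lastIndex)
    if (rf.votedFor == -1 || rf.votedFor == args.CandidateID) && upToDate {
        rf.votedFor = args.CandidateID
        reply.VoteGranted = true
        rf.lastHeartbeat = time.Now() // granting a vote also defers our own election
        return
    }
    reply.VoteGranted = false
}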

3. Appending Log Entries

// AppendEntries handles both heartbeats and log replication from the leader.
func (rf *Raft) AppendEntries(args *AppendEntriesArgs, reply *AppendEntriesReply) {
    rf.mu.Lock()
    defer rf.mu.Unlock()
    reply.Term = rf.currentTerm
    if args.Term < rf.currentTerm {
        reply.Success = false // stale leader; it should step down
        return
    }
    rf.lastHeartbeat = time.Now()
    rf.role = Follower
    rf.currentTerm = args.Term
    // Consistency check (Raft §5.3): reject unless our log matches at PrevLogIndex.
    if args.PrevLogIndex >= len(rf.log) || rf.log[args.PrevLogIndex].Term != args.PrevLogTerm {
        reply.Success = false
        return
    }
    // Drop everything after PrevLogIndex and append the leader's entries
    // (a simplification; a full implementation truncates only on a real conflict).
    rf.log = append(rf.log[:args.PrevLogIndex+1], args.Entries...)
    // Learn the leader's commit point, but never beyond our own last entry.
    if args.LeaderCommit > rf.commitIndex {
        rf.commitIndex = min(args.LeaderCommit, len(rf.log)-1) // built-in min (Go 1.21+)
    }
    reply.Success = true
}
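On the other side of this handler, the leader periodically broadcasts AppendEntries; an empty Entries slice acts as a pure heartbeat. The sketch below assumes the nextIndex/matchIndex fields and the advanceCommitIndex helper from earlier, an args struct carrying the Raft paper's fields, and a hypothetical sendAppendEntries RPC stub; nextIndex is assumed to be initialized to len(rf.log) when the node becomes leader.

// broadcastHeartbeats sends one round of AppendEntries to every follower. A
// leader would call this in a loop, e.g. every 100 ms, well below the election
// timeout, so followers never time out while it is alive.
func (rf *Raft) broadcastHeartbeats() {
    rf.mu.Lock()
    if rf.role != Leader {
        rf.mu.Unlock()
        return
    }
    term := rf.currentTerm
    rf.mu.Unlock()

    for _, peer := range rf.peers {
        if peer == rf.me {
            continue
        }
        go func(p int) {
            rf.mu.Lock()
            if rf.role != Leader || rf.currentTerm != term {
                rf.mu.Unlock()
                return // lost leadership while this goroutine was starting
            }
            // nextIndex[p] starts at len(rf.log), so prevIndex is always >= 0.
            prevIndex := rf.nextIndex[p] - 1
            args := &AppendEntriesArgs{
                Term:         term,
                PrevLogIndex: prevIndex,
                PrevLogTerm:  rf.log[prevIndex].Term,
                Entries:      append([]LogEntry(nil), rf.log[prevIndex+1:]...),
                LeaderCommit: rf.commitIndex,
            }
            rf.mu.Unlock()

            var reply AppendEntriesReply
            if !rf.sendAppendEntries(p, args, &reply) {
                return // RPC failed; the next round will retry
            }

            rf.mu.Lock()
            defer rf.mu.Unlock()
            if rf.role != Leader || rf.currentTerm != term {
                return // stale reply
            }
            if reply.Success {
                rf.nextIndex[p] = prevIndex + 1 + len(args.Entries)
                rf.matchIndex[p] = rf.nextIndex[p] - 1
                rf.advanceCommitIndex()
            }
            // On failure, a full implementation would step down if reply.Term
            // is newer, or back nextIndex[p] up and retry after a log mismatch.
        }(peer)
    }
}

Because the same RPC carries both heartbeats and new entries, followers reset their election timers and catch up on the log in a single round trip.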

5. Debugging Tips and Practical Experience

  • Simulate network delays and partitions to test election stability
  • Monitor log consistency to avoid log loss or out-of-order entries
  • Use Go’s race detector (run tests with go test -race) to catch data races
  • Add detailed logs for state transitions to diagnose role changes

6. Terminology Mapping Table

| Everyday Term | Technical Term | Explanation |
| --- | --- | --- |
| Team leader | Leader | Manages the log and directs the cluster |
| Team member | Follower | Receives and executes the leader's commands |
| Candidate | Candidate | Runs for leadership |
| Vote | RequestVote RPC | Message requesting votes |
| Heartbeat | AppendEntries RPC | Leader's periodic message asserting its authority |

7. Thought Exercises and Practice

  • How does Raft prevent multiple leaders during network partitions?
  • Design log compaction and snapshot mechanisms to improve performance.
  • Implement AppendEntries RPC with retries and timeout handling.

8. Conclusion: Master Distributed Consensus with Raft

Raft’s clear role definitions and workflows make it a cornerstone of distributed system consistency. Understanding and implementing Raft is key to mastering distributed log replication and fault tolerance design.