Sharded Key-Value Store in Practice: Design and Implementation

| Categories: Distributed Systems | Tags: MIT 6.824, Key-Value Store, Sharding, Load Balancing, Distributed Systems |

1. Everyday Analogy: Division of Labor and Ledger Partitioning

Imagine a team responsible for one massive ledger: a single person handling it alone would be slow and error-prone. Instead, the team splits the ledger into parts, each maintained by a different person, coordinating so that the books stay consistent. This lowers the burden on any individual while keeping the data correct. A sharded key-value store works the same way: it splits a large dataset across many nodes that cooperate efficiently.


2. System Goals and Challenges

  • Shard management: Partition data so load is spread evenly across nodes
  • Request routing: Deliver each client request to the shard that owns its key
  • Data replication and fault tolerance: Replicate shard data to avoid loss and single points of failure
  • Dynamic scaling and migration: Rebalance shards as nodes join or leave without destabilizing the system

3. Architecture Overview and Workflow

Overall Architecture:

Client
   ↓ Request shard mapping
Shard Controller (manages shard mappings)
   ↓ Specifies target shard
Shard Servers (shard node clusters)
   ↓ Data storage and replication

Request Flow:

Client
   └── Queries Shard Controller for shard info
        └── Sends request to specific Shard Server
            └── Read/write operations

4. Key Design Points

1. Shard Mapping Management

  • Maintain a mapping table recording which shard each key belongs to
  • Use consistent hashing or range partitioning for mapping
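The consistent-hashing option can be sketched as a ring of virtual nodes: each server is hashed onto the ring several times, and a key is owned by the first node point clockwise from the key's hash. The `Ring` type, replica count, and node names below are illustrative assumptions, not code from this project:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a minimal consistent-hashing ring. Each physical node is
// inserted at several "virtual" points to smooth out the distribution.
type Ring struct {
	keys    []uint32          // sorted hash points on the ring
	mapping map[uint32]string // hash point -> node name
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places `replicas` virtual points per node on the ring.
func NewRing(nodes []string, replicas int) *Ring {
	r := &Ring{mapping: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			p := hash32(fmt.Sprintf("%s#%d", n, i))
			r.keys = append(r.keys, p)
			r.mapping[p] = n
		}
	}
	sort.Slice(r.keys, func(i, j int) bool { return r.keys[i] < r.keys[j] })
	return r
}

// Lookup returns the node owning key: the first hash point at or after
// the key's hash, wrapping around to the start of the ring if needed.
func (r *Ring) Lookup(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.keys), func(i int) bool { return r.keys[i] >= h })
	if i == len(r.keys) {
		i = 0 // wrap around the ring
	}
	return r.mapping[r.keys[i]]
}

func main() {
	ring := NewRing([]string{"node-a", "node-b", "node-c"}, 16)
	fmt.Println(ring.Lookup("user:42"))
}
```

The advantage over plain `hash(key) % N` is that adding or removing one node only moves the keys adjacent to its points, rather than reshuffling almost everything.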

2. Request Routing Strategy

  • Client or proxy first accesses shard controller to get shard info
  • Requests are routed directly to the corresponding shard server to reduce forwarding
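A minimal sketch of that routing loop, assuming a 6.824-style `Config` whose `Shards` array maps shard index to replica-group ID; the `Controller` stub and `Get` helper here are hypothetical stand-ins for the real RPCs:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const NShards = 10

// Config is a hypothetical snapshot of the controller's mapping:
// Shards[i] holds the replica-group ID currently serving shard i.
type Config struct {
	Num    int // configuration version
	Shards [NShards]int
}

// Controller stands in for the shard controller; in the real system
// Query would be an RPC to the controller's replica group.
type Controller struct{ latest Config }

func (c *Controller) Query() Config { return c.latest }

func key2shard(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % NShards)
}

// Get sketches the client routing loop: send to the group named by the
// cached config; if that group rejects the request (stale config),
// re-query the controller and retry with the fresh mapping.
func Get(ctrl *Controller, cached *Config, key string,
	call func(gid int, key string) (string, bool)) string {
	for {
		gid := cached.Shards[key2shard(key)]
		if v, ok := call(gid, key); ok {
			return v
		}
		*cached = ctrl.Query() // mapping was stale: refresh and retry
	}
}

func main() {
	latest := Config{Num: 2}
	for s := range latest.Shards {
		latest.Shards[s] = 101 // group 101 now owns every shard
	}
	ctrl := &Controller{latest: latest}
	stale := Config{Num: 1} // all zeros: points at a defunct group
	v := Get(ctrl, &stale, "user:42", func(gid int, key string) (string, bool) {
		if gid != 101 {
			return "", false // simulated ErrWrongGroup
		}
		return "value-of-" + key, true
	})
	fmt.Println(v)
}
```

Because the cached config is only refreshed on a rejection, the controller sits off the hot path: correctly-routed requests never touch it.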

3. Shard Data Replication

  • Use Raft within each shard to guarantee consistency and fault tolerance
  • Multi-replica mechanism ensures data durability when nodes fail

4. Shard Migration and Scaling

  • When new nodes join, coordinate migration of partial data from old nodes
  • Ensure data consistency and availability during migration
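The first step of any migration is computing which shards a group gains or loses between two configurations. Assuming configs are represented as a `Shards` array mapping shard index to group ID (as in the routing discussion above), a minimal sketch of that diff — names here are illustrative, not the article's code:

```go
package main

import "fmt"

const NShards = 10

// diffShards computes a migration plan for one replica group between two
// configurations: `gained` are shards the group must pull from their
// previous owners before serving them; `lost` are shards it must hand
// off before it may delete the data.
func diffShards(old, next [NShards]int, gid int) (gained, lost []int) {
	for s := 0; s < NShards; s++ {
		if old[s] != gid && next[s] == gid {
			gained = append(gained, s)
		}
		if old[s] == gid && next[s] != gid {
			lost = append(lost, s)
		}
	}
	return
}

func main() {
	old := [NShards]int{1, 1, 1, 1, 1, 2, 2, 2, 2, 2}
	next := [NShards]int{1, 1, 1, 3, 3, 2, 2, 2, 3, 3} // group 3 joins
	fmt.Println(diffShards(old, next, 3))              // shards group 3 must pull
	fmt.Println(diffShards(old, next, 1))              // shards group 1 must hand off
}
```

To keep reads and writes consistent mid-migration, a group would refuse requests for a `gained` shard until its data has arrived, and answer ErrWrongGroup for a `lost` shard once the new owner has it.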

5. Key Code Examples (Go)

1. Get Shard Number (Hash Function)

// Shard index from key via FNV-1a hashing (requires "hash/fnv").
func key2shard(key string, shardCount int) int {
    h := fnv.New32a()
    h.Write([]byte(key))
    // Take the modulo in uint32 so the result can never be negative,
    // even on platforms where int is 32 bits.
    return int(h.Sum32() % uint32(shardCount))
}

2. Client Requests Shard Controller for Routing Info

// QueryShard returns the replica group responsible for key,
// based on the client's cached configuration.
func (client *Clerk) QueryShard(key string) int {
    shard := key2shard(key, client.shardCount)
    return client.config.Shards[shard] // ID of the group owning this shard
}

3. Shard Server Handles Write Request (Invoking Raft)

func (kv *ShardKV) Put(args *PutArgs, reply *PutReply) {
    op := Op{Key: args.Key, Value: args.Value, Type: "Put"}
    // Start already reports leadership, so a separate IsLeader
    // pre-check would be redundant and racy.
    index, _, isLeader := kv.rf.Start(op)
    if !isLeader {
        reply.Err = ErrWrongLeader
        return
    }
    // Block until the log entry at index is applied; fail if
    // leadership was lost before the entry committed.
    if !kv.waitForCommit(index) {
        reply.Err = ErrWrongLeader
        return
    }
    reply.Err = OK
}

6. Debugging and Practical Tips

  • Simulate shard node dynamic join/leave to verify migration mechanism
  • Test cross-shard requests to ensure accurate routing
  • Stress test shard balancing to avoid hotspot nodes
  • Use logs and monitoring to track shard states
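The balancing check above can be automated offline: hash a batch of synthetic keys with the same FNV scheme as section 5 and count how many land in each shard. An empty or wildly oversized bucket signals a hotspot risk. A quick sketch (the key format and counts here are arbitrary test inputs):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

func key2shard(key string, shardCount int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(shardCount))
}

// shardHistogram counts how many of n synthetic keys land in each
// shard — a quick offline check for hotspot-prone distributions.
func shardHistogram(shardCount, n int) []int {
	counts := make([]int, shardCount)
	for i := 0; i < n; i++ {
		counts[key2shard(fmt.Sprintf("key-%d", i), shardCount)]++
	}
	return counts
}

func main() {
	// With a well-behaved hash, each of 10 shards should receive
	// roughly 1000 of 10000 keys.
	fmt.Println(shardHistogram(10, 10000))
}
```

In production the same histogram should be computed over real key traffic, since synthetic keys cannot reveal workload skew (one celebrity key can overload a shard even under a perfect hash).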

7. Terminology Mapping Table

| Everyday Term | Technical Term | Description |
| --- | --- | --- |
| Ledger partition | Data sharding | Splitting data into parts for distributed storage |
| Chief accountant | Shard controller | Manages shard info and routing rules |
| Ledger manager | Shard server | Server storing a shard's data |
| Ledger migration | Shard migration | Reallocation of data among nodes |

8. Thought Exercises and Practice

  • How would you implement dynamic shard scaling without interrupting service?
  • Design client-side shard mapping caching to reduce shard controller load.
  • Implement leader election and failure recovery for shard replicas.

9. Conclusion: The Path to Scalable Sharded Key-Value Stores

Sharded key-value systems combine shard management, load balancing, and Raft replication to deliver highly available and high-performance data services. Mastering these design principles and practical skills is key to building large-scale distributed storage.