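Redis BigKey and HotKey Problems Explained

Overview

In day-to-day Redis use, big keys (bigkey) and hot keys (hotkey) are two common problems that can seriously hurt performance. They can slow down responses, skew memory usage across nodes, and in the worst case make the service unavailable.

A bigkey is a key whose value takes up too much memory.

A hotkey is a key that is accessed far more often than other keys.

The two problems often appear together and aggravate each other, so prevention and remediation need to be systematic.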

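The bigkey problem

What is a bigkey

A bigkey is a key whose value occupies too much memory. Redis itself does not enforce a limit at this scale; a widely cited rule of thumb is:

string: a single value should not exceed 10 KB
hash, list, set, zset: no more than 5,000 elements

Keys beyond these thresholds can be treated as bigkeys.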
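The thresholds above can be written down as a small check. This is only a sketch: the function name `is_bigkey` and the two constants are made up for illustration, and the numbers are the rule-of-thumb values quoted above, which you would normally tune for your own workload.

```python
# Rule-of-thumb thresholds quoted above (assumptions, not Redis limits).
STRING_MAX_BYTES = 10 * 1024   # 10 KB
COLLECTION_MAX_ITEMS = 5000


def is_bigkey(key_type: str, size: int) -> bool:
    """Return True if a key exceeds the rule-of-thumb thresholds.

    key_type: Redis type name ("string", "hash", "list", "set", "zset").
    size: value length in bytes for strings, element count for collections.
    """
    if key_type == "string":
        return size > STRING_MAX_BYTES
    if key_type in ("hash", "list", "set", "zset"):
        return size > COLLECTION_MAX_ITEMS
    raise ValueError(f"unsupported key type: {key_type}")
```

For example, `is_bigkey("hash", 5012)` is true, while `is_bigkey("string", 10240)` is false because the value sits exactly on the threshold.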

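Why bigkeys are harmful

1. Uneven memory usage

A bigkey takes up a lot of memory and can make memory usage uneven. In Redis Cluster, the node holding the bigkey can end up with far higher memory usage than the other nodes, which is known as memory skew.

2. Blocking the main thread

Redis executes commands on a single thread, so a slow operation on a bigkey blocks every other request:

Operation	Impact
del	Deleting a large collection blocks the main thread; the cost is O(N) in the number of elements
hgetall	Fetching every field blocks the main thread
lrange	Fetching a large range of list elements blocks
keys	Iterating over every key blocks severely
flushdb/flushall	Flushing the database blocks for a long time

3. Network bandwidth consumption

Reading and writing a bigkey consumes a lot of network bandwidth and slows down responses to other requests.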

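4. Persistence problems

rdb: the child process forked for an RDB snapshot shares memory with the parent through copy-on-write; if bigkeys are modified while the snapshot runs, the copied pages can push memory usage up sharply, approaching double in the worst case
aof: writes to a bigkey bloat the AOF file and make rewrites take longer

5. Replication lag

Replicating a bigkey increases master-replica replication lag, which affects data consistency.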

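Detecting bigkeys

1. Use redis-cli --bigkeys

The --bigkeys option built into redis-cli scans the keyspace and reports the biggest key of each type. It measures strings in bytes and collections in number of elements, not in actual memory used: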

redis-cli --bigkeys -i 0.1

Options:

-i 0.1: sleep 0.1 seconds per 100 scan commands, to limit the load on the server

Sample output:

-------- summary -------
Sampled 506 keys in the keyspace!
Total key length in bytes is 1885 (avg len 3.73)

Biggest string found 'user:1001:profile' has 10240 bytes
Biggest   list found 'order:queue' has 10003 items
Biggest    set found 'online:users' has 8005 items
Biggest   hash found 'product:info' has 5012 fields

506 keys with 506 types

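2. Use the scan command

Write a script that iterates over all keys with scan and checks the size of each:

#!/bin/bash
# Walk the keyspace with SCAN and report keys larger than 10 KB.
redis-cli --scan --pattern "*" | while IFS= read -r key; do
    size=$(redis-cli memory usage "$key")
    # memory usage prints nothing if the key disappeared in the meantime
    if [ -n "$size" ] && [ "$size" -gt 10240 ]; then
        echo "bigkey: $key, size: $size bytes"
    fi
done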

3. Use the memory usage command

redis-cli memory usage your_key
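The same scan-then-measure idea can be sketched in Python. The function below does not import any Redis library; it assumes `client` behaves like a redis-py `redis.Redis` object, i.e. it offers `scan_iter(count=...)` and `memory_usage(key)`. The name `find_bigkeys` and the default threshold are illustrative assumptions.

```python
def find_bigkeys(client, threshold_bytes=10 * 1024, scan_count=100):
    """Yield (key, size_in_bytes) for keys larger than threshold_bytes.

    client is assumed to expose scan_iter() and memory_usage() the way
    a redis-py client does. SCAN is incremental, so the server is not
    blocked the way it would be by KEYS.
    """
    for key in client.scan_iter(count=scan_count):
        size = client.memory_usage(key)
        # memory_usage returns None if the key expired or was deleted
        # between the SCAN step and this call.
        if size is not None and size > threshold_bytes:
            yield key, size
```

With a real connection this might be used as `for key, size in find_bigkeys(redis.Redis()): print(key, size)`. Each key costs one extra round trip, so on a large keyspace it is usually better to run this against a replica or during off-peak hours.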

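4. Use the Redis slow log

Set the slow-log threshold (in microseconds); commands that operate on bigkeys tend to show up here: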

redis-cli config set slowlog-log-slower-than 10000  # 10ms
redis-cli slowlog get 10

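5. Use Redis modules and tools

redis modules: modules such as RedisJSON and RedisTimeSeries report memory usage for their own data types (for example JSON.DEBUG MEMORY, or the memoryUsage field of TS.INFO)
redis insight: the official GUI tool, which can show how memory is distributed across keys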

Solutions for bigkeys

1. Split the bigkey

Hash splitting example:

Original structure:

user:1001:info -> {name: "Zhang San", age: 30, address: "...", ...} (5000+ fields)

After splitting:

user:1001:info:base -> {name: "Zhang San", age: 30}
user:1001:info:contact -> {phone: "...", email: "..."}
user:1001:info:address -> {province: "...", city: "..."}

List splitting example:

Original structure:

order:queue -> [order1, order2, ..., order10000]

After splitting:

order:queue:0 -> [order1, ..., order1000]
order:queue:1 -> [order1001, ..., order2000]
...
order:queue:9 -> [order9001, ..., order10000]
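A minimal sketch of how an application might compute which sub-key to use. `list_shard_key` mirrors the list example above (1,000 elements per shard). `hash_shard_key` is an alternative to the hand-picked groups shown above: it buckets fields by a CRC32 of the field name, which is an assumption of this sketch, useful when fields have no natural grouping. Both function names are hypothetical.

```python
import zlib


def list_shard_key(base_key: str, index: int, shard_size: int = 1000) -> str:
    """Map a zero-based element index to its shard key.

    With shard_size=1000, indexes 0..999 go to "<base_key>:0",
    1000..1999 to "<base_key>:1", and so on.
    """
    return f"{base_key}:{index // shard_size}"


def hash_shard_key(base_key: str, field: str, shards: int = 16) -> str:
    """Map a hash field to one of `shards` sub-hashes.

    CRC32 is stable across processes (unlike Python's built-in hash()),
    so every client computes the same bucket for the same field.
    """
    bucket = zlib.crc32(field.encode("utf-8")) % shards
    return f"{base_key}:{bucket}"
```

Here order1 (index 0) maps to `order:queue:0` and order10000 (index 9999) maps to `order:queue:9`, matching the layout above. The shard count for `hash_shard_key` has to stay fixed once data is written, otherwise fields can no longer be found in their old buckets.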

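2. Use an appropriate data structure

Scenario	Recommended structure	Avoid
Simple key-value pair	string	hash (for only a few fields)
Object attributes	hash	string (json)
Deduplicated collection	set	list
Sorted collection	zset	set + sorting
Counter	string (incr)	hash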

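3. Compress data

Use a more compact serialization format (such as MessagePack or Protobuf)
Compress string values
Rely on the compact ziplist encoding for hashes (listpack since Redis 7.0), which applies while the hash has few, small elements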
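As an illustration of compressing string values, the sketch below serializes an object to compact JSON and compresses it with zlib from the standard library before it would be written to Redis. The helper names `pack` and `unpack` are hypothetical, and JSON plus zlib is just one possible combination; MessagePack or Protobuf would slot into the same place.

```python
import json
import zlib


def pack(obj) -> bytes:
    """Serialize obj to compact JSON, then compress it with zlib."""
    raw = json.dumps(obj, ensure_ascii=False, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level=6)


def unpack(blob: bytes):
    """Reverse of pack(): decompress, then parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

The packed bytes can be stored with a plain set and read back with get followed by `unpack`. Compression trades CPU time on the client for memory and bandwidth on the server, and it helps little on values that are already small or already compressed.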

4. Delete bigkeys asynchronously

Use the unlink command instead of del:

redis-cli unlink your_big_key

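unlink (available since Redis 4.0) removes the key from the keyspace right away and reclaims its memory in a background thread, so the main thread is not blocked by the cleanup.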

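5. Operate in batches

Process large collections in batches:

# Original approach (blocks)
redis.hgetall("big_hash")

# Improved approach (in batches)
cursor = 0
while True:
    # hscan returns the next cursor and a batch of field/value pairs
    cursor, data = redis.hscan("big_hash", cursor, count=100)
    process(data)
    if cursor == 0:  # a cursor of 0 means the iteration is complete
        break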

6. Set an expiration time

Give bigkeys a sensible expiration time so that data does not accumulate indefinitely:

redis-cli expire your_key 3600

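The hotkey problem

What is a hotkey

A hotkey is a key that is accessed far more frequently than other keys. Typical signs:

the key's QPS is far higher than that of other keys
read and write requests for the key burst within a short period
the key accounts for a large share of the whole instance's traffic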

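Why hotkeys are harmful

1. Uneven CPU load

In Redis Cluster, the node that holds the hotkey carries a far higher CPU load than the other nodes:

node a: cpu 95% (holds the hotkey)
node b: cpu 20%
node c: cpu 15%

2. Network bandwidth bottleneck

High-frequency access to a hotkey consumes a lot of network bandwidth and affects other requests.

3. Cache breakdown

When a hotkey expires, a large number of requests miss the cache at the same moment and fall through to the backend database, causing a sudden spike in database load.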
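One common way to soften this stampede is to let only one caller rebuild the expired entry while the others wait. The sketch below is process-local and uses only the standard library; the class name `SingleFlightLoader` and its three callables (`cache_get`, `cache_set`, `load_from_db`) are assumptions made for illustration, standing in for a Redis read, a Redis write, and a database query.

```python
import threading


class SingleFlightLoader:
    """Let one thread per key reload a missing cache entry."""

    def __init__(self, cache_get, cache_set, load_from_db):
        self._cache_get = cache_get
        self._cache_set = cache_set
        self._load_from_db = load_from_db
        self._locks = {}                       # key -> threading.Lock
        self._locks_guard = threading.Lock()   # protects self._locks

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key):
        value = self._cache_get(key)
        if value is not None:
            return value
        with self._lock_for(key):
            # Re-check: another thread may have refilled the cache
            # while this one was waiting for the lock.
            value = self._cache_get(key)
            if value is None:
                value = self._load_from_db(key)
                self._cache_set(key, value)
            return value
```

This only deduplicates loads inside one process. With many application instances, each instance may still issue one database query, so a distributed lock or a "never expire, refresh in the background" policy is often combined with it.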

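4. Request backlog

Because Redis executes commands on a single thread, the time spent serving a hotkey makes other requests queue up.

5. Replication pressure

Frequent updates to a hotkey add to the replication load between master and replicas.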

Detecting hotkeys

1. Use the Redis info command

redis-cli info stats

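Watch the keyspace_hits and keyspace_misses metrics. These are instance-wide counters: they show overall read pressure and hit ratio, but they cannot by themselves tell you which key is hot.

2. Use the monitor command (with caution)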

redis-cli monitor | grep "your_key"

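Note: monitor streams every command the server processes and severely degrades performance. Use it only for short debugging sessions and avoid it in production.

3. Use the Redis slow log

redis-cli config set slowlog-log-slower-than 0  # 0 logs every command; restore the old threshold afterwards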
redis-cli slowlog get 100

4. Collect statistics on the client

Record the access frequency of each key in the application layer:

from collections import defaultdict

access_stats = defaultdict(int)

def get_redis(key):
    access_stats[key] += 1
    return redis.get(key)

# Print the ten most accessed keys; call this periodically
def print_stats():
    for key, count in sorted(access_stats.items(), key=lambda x: x[1], reverse=True)[:10]:
        print(f"{key}: {count}")

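5. Use the LFU eviction policy (Redis 4.0+)

Configure an LFU (least frequently used) eviction policy. Note that this also changes which keys Redis evicts when memory is full:

redis-cli config set maxmemory-policy allkeys-lfu

Then use the object freq command to inspect a key's access frequency. The value is a logarithmic counter that decays over time, not an exact hit count, and the command only works while an LFU policy is active: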

redis-cli object freq your_key

6. Use third-party tools

Redis exporter + Prometheus + Grafana: monitor Redis metrics
Alibaba Cloud Redis: provides hotkey analysis
Tencent Cloud Redis: provides hotkey monitoring

Solutions for hotkeys

1. Local cache

Cache hotkeys in an application-level local cache (such as Guava Cache or Caffeine):

// Local cache backed by Caffeine
Cache<String, String> localCache = Caffeine.newBuilder()
    .maximumSize(1000)
    .expireAfterWrite(1, TimeUnit.MINUTES)
    .build();

public String get(String key) {
    // Check the local cache first
    String value = localCache.getIfPresent(key);
    if (value != null) {
        return value;
    }
    // Fall back to Redis
    value = redis.get(key);
    if (value != null) {
        localCache.put(key, value);
    }
    return value;
}

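2. Read-write splitting

For hotkeys that are read often and written rarely, split reads from writes. Replication is asynchronous, so replicas may briefly return stale data:

app -> read requests -> Redis replica nodes
app -> write requests -> Redis master node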

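3. Key sharding

Split the hotkey into several keys:

Original layout:

hot_product:1001 -> product info

Sharded layout:

hot_product:1001:0 -> product info
hot_product:1001:1 -> product info (copy)
hot_product:1001:2 -> product info (copy)

Pick a shard at random on each read:

import random

def get_hot_product(product_id):
    # randint is inclusive on both ends, so this picks shard 0, 1 or 2
    shard = random.randint(0, 2)
    return redis.get(f"hot_product:{product_id}:{shard}")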

4. Backup keys

Create several backups of the hotkey to spread the requests:

import random

# On write, update every backup together
def set_hot_key(key, value):
    pipe = redis.pipeline()
    for i in range(3):
        pipe.set(f"{key}:backup:{i}", value)
    pipe.execute()

# On read, pick one backup at random
def get_hot_key(key):
    backup = random.randint(0, 2)
    return redis.get(f"{key}:backup:{backup}")

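5. Use Redis Cluster

Redis Cluster spreads keys across nodes by hash slot, so different hotkeys tend to land on different nodes. A single key always maps to exactly one slot, so one hotkey still hits one node unless it is split into several keys first. Points to watch:

Make sure data is sharded evenly
Avoid hash tags that concentrate data on one node

Hash tag example (concentrates data):

user:{1001}:profile
user:{1001}:orders
user:{1001}:cart

These keys all hash to the same slot and therefore live on the same node.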
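To see why the tagged keys collide, it helps to compute the slot by hand. Redis Cluster takes the CRC16 (XMODEM variant) of the key modulo 16384, and if the key contains a non-empty `{...}` section, only the text inside the first such section is hashed. The sketch below reproduces that rule with `binascii.crc_hqx` from the standard library; the function name `key_slot` is made up for this example.

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" means the whole key is hashed.
        if end != -1 and end > start + 1:
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is CRC16/XMODEM.
    return binascii.crc_hqx(data, 0) % SLOT_COUNT
```

With this, `key_slot("user:{1001}:profile")`, `key_slot("user:{1001}:orders")` and `key_slot("1001")` all return the same number. On a live cluster the result can be cross-checked with the cluster keyslot command.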

6. Rate limiting

Rate-limit access to hotkeys:

from functools import wraps
import time

class RateLimiter:
    def __init__(self, max_calls, period):
        self.max_calls = max_calls
        self.period = period
        self.calls = {}

    def allow(self, key):
        now = time.time()
        if key not in self.calls:
            self.calls[key] = []
        # Drop timestamps that fell out of the window
        self.calls[key] = [t for t in self.calls[key] if now - t < self.period]
        if len(self.calls[key]) >= self.max_calls:
            return False
        self.calls[key].append(now)
        return True

limiter = RateLimiter(max_calls=1000, period=1)  # at most 1000 calls per second

def rate_limit(func):
    @wraps(func)
    def wrapper(key, *args, **kwargs):
        if not limiter.allow(key):
            raise Exception("rate limit exceeded")
        return func(key, *args, **kwargs)
    return wrapper

@rate_limit
def get_hot_key(key):
    return redis.get(key)

7. Cache warm-up

Load hotkeys into the cache ahead of time, at startup or during off-peak hours:

def warm_up_cache():
    hot_keys = get_hot_keys_from_db()  # fetch the list of hot keys from the database
    for key in hot_keys:
        value = db.get(key)
        redis.set(key, value, ex=3600)

8. Use a multi-level cache

Build a multi-level cache architecture:

app -> local cache -> Redis -> database
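The lookup order in the chain above can be sketched as follows. The class `TwoLevelCache` and its callables (`remote_get`, `remote_set`, `load_from_db`) are hypothetical stand-ins for a Redis read, a Redis write, and a database query; the local level is a plain dict with a short TTL, which is a simplification of what a library such as Caffeine provides.

```python
import time


class TwoLevelCache:
    """Look up: local dict -> remote cache -> database."""

    def __init__(self, remote_get, remote_set, load_from_db,
                 local_ttl=5.0, clock=time.monotonic):
        self._remote_get = remote_get
        self._remote_set = remote_set
        self._load_from_db = load_from_db
        self._local_ttl = local_ttl
        self._clock = clock
        self._local = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._local.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                    # level 1: local hit
        value = self._remote_get(key)          # level 2: remote cache
        if value is None:
            value = self._load_from_db(key)    # level 3: database
            if value is not None:
                self._remote_set(key, value)
        if value is not None:
            self._local[key] = (now + self._local_ttl, value)
        return value
```

The local TTL bounds how stale a value can be: after a write to Redis, each process may keep serving the old value for up to `local_ttl` seconds. A short TTL of a few seconds is usually enough to absorb most hotkey reads.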

9. Smooth peaks with a message queue

For hotkeys with heavy write traffic, use a message queue to smooth out the peaks:

app -> message queue -> consumer -> Redis
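A minimal sketch of the consumer side, assuming the writes are counter increments. Python's in-process `queue.Queue` stands in for a real message queue here, and `apply_increment` is a hypothetical callable that would issue something like incrby against Redis. The point is that the consumer merges many queued writes to the same key into a single command.

```python
import queue
from collections import Counter


def drain_and_merge(q, max_items=1000):
    """Take up to max_items (key, delta) messages and sum deltas per key."""
    merged = Counter()
    for _ in range(max_items):
        try:
            key, delta = q.get_nowait()
        except queue.Empty:
            break
        merged[key] += delta
    return dict(merged)


def flush(q, apply_increment, max_items=1000):
    """Apply one merged increment per key; return the number of keys written."""
    merged = drain_and_merge(q, max_items)
    for key, total in merged.items():
        apply_increment(key, total)
    return len(merged)
```

Producers call `q.put(("counter:views", 1))` and a background worker calls `flush` on a timer. A thousand queued increments to the same key then reach Redis as one write, at the cost of the counter lagging behind by up to one flush interval.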

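Best practices

1. Design phase

Design key names and structures carefully: avoid creating bigkeys
Estimate data volume: plan the data scale in advance and choose suitable data structures
Set expiration times: give all keys a reasonable expiration time

2. Development phase

Use pipelines: batch operations to reduce network overhead
Avoid keys: use scan instead of keys
Monitor key sizes: check regularly whether bigkeys are appearing

3. Operations phase

Regular inspection: use tools to check for bigkeys and hotkeys on a schedule
Set up alerts: alert on metrics such as memory usage and slow queries
Capacity planning: plan capacity ahead of business growth

4. Emergency handling

Emergency scale-out: add capacity promptly when a problem is detected
Rate limiting and degradation: throttle and degrade abnormal requests
Data migration: move bigkeys to a dedicated instance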

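Recommended tools

1. Official Redis tools

Tool	Purpose
redis-cli --bigkeys	Detect bigkeys
redis-cli --memkeys	Detect the keys that use the most memory
redis-cli --hotkeys	Detect hotkeys (requires an LFU eviction policy)
redis insight	GUI management tool

2. Third-party tools

Tool	Notes
redis-rdb-tools	Analyzes RDB files to find bigkeys
redis-faina	Analyzes monitor output to compute access frequency
redis exporter	Prometheus metrics exporter
redis commander	Web management UI
medis	Redis client for macOS

3. Cloud services

Alibaba Cloud Redis: provides bigkey and hotkey analysis
Tencent Cloud Redis: provides hotkey monitoring
AWS ElastiCache: provides CloudWatch monitoring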

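Summary

Bigkeys and hotkeys are common problems in Redis deployments, but careful design, effective monitoring, and timely optimization go a long way toward avoiding and resolving them.

Key points:

Prevention first: design with bigkeys and hotkeys in mind from the start
Regular inspection: check with tools on a schedule and catch problems early
Sensible splitting: split bigkeys and spread out hotkeys
Multi-level caching: build a multi-level cache to take pressure off Redis
Monitoring and alerting: put solid monitoring and alerting in place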
