A Complete Workflow for Troubleshooting High MySQL CPU Usage

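When MySQL shows sustained high CPU usage, the cause often lies beyond the database itself and may involve:

SQL execution efficiency
System resource bottlenecks
The concurrency model
Kernel scheduling
I/O or network behavior

This article lays out a three-stage, engineering-style troubleshooting method:

Host-level localization → system-level analysis → root cause inside MySQL

The goal is to answer three questions precisely:

What is consuming the CPU?
Why is it being consumed?
How can it be optimized?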

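Stage 1: Confirm the Scope of the Problem (Identify the CPU Consumer)

Goal: determine what is consuming the CPU.

MySQL as a whole?
A single thread?
SQL computation?
Kernel system calls?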

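1.1 Check overall host CPU load

Command

top -c

Reading the CPU line

%Cpu(s): 10.4 us, 2.6 sy, 0.0 ni, 86.5 id, 0.1 wa, 0.0 hi, 0.4 si, 0.0 st

Field	Meaning	Interpretation
us	User-space CPU	High → compute-heavy SQL
sy	Kernel-space CPU	High → frequent system calls
wa	I/O wait	High → disk bottleneck
id	Idle	Low → the CPU is genuinely busy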

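Rules of thumb

us > 70% → very likely a SQL problem
sy high → lock contention or kernel scheduling overhead
wa high → I/O-bound; the CPU only looks busy while it waits on disk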
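The rules of thumb above can be applied mechanically to a captured top summary line. The following is a minimal sketch, not a definitive tool: the function names are hypothetical, the 70% cutoff for us comes from the text, and the cutoffs for sy and wa are assumptions that should be tuned per host. It assumes the "10.4 us" layout shown above; older top versions that print "10.4%us" would need a different pattern.

```python
import re

# Matches pairs such as "10.4 us" in a top summary line.
_FIELD = re.compile(r"([\d.]+)\s+(us|sy|ni|id|wa|hi|si|st)\b")


def parse_cpu_line(line: str) -> dict[str, float]:
    """Turn '%Cpu(s): 10.4 us, 2.6 sy, ...' into {'us': 10.4, 'sy': 2.6, ...}."""
    return {name: float(value) for value, name in _FIELD.findall(line)}


def classify_cpu(
    fields: dict[str, float],
    us_limit: float = 70.0,  # from the rule of thumb in the text
    sy_limit: float = 30.0,  # assumed cutoff, not from the article
    wa_limit: float = 20.0,  # assumed cutoff, not from the article
) -> list[str]:
    """Return one hint per field that exceeds its cutoff."""
    hints = []
    if fields.get("us", 0.0) > us_limit:
        hints.append("us high: suspect SQL")
    if fields.get("sy", 0.0) > sy_limit:
        hints.append("sy high: suspect locks or kernel scheduling")
    if fields.get("wa", 0.0) > wa_limit:
        hints.append("wa high: suspect disk I/O")
    return hints
```

For the sample line shown earlier, classify_cpu returns an empty list, since the host is mostly idle.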

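Reading load average

load average: 8.2, 7.9, 6.5

Rule:

load > number of CPU cores = the system is overloaded

On Linux the load average also counts tasks in uninterruptible I/O wait, so read it together with wa.

Example:

4-core CPU
load = 8
➡ tasks are piling up in the run queue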
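The load rule is simple arithmetic: divide the load average by the core count and compare against 1. A small sketch, with hypothetical function names, that uses only the Python standard library:

```python
import os


def load_per_core(load_1m: float, cores: int) -> float:
    """Normalize a load average by the number of CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_1m / cores


def is_overloaded(load_1m: float, cores: int) -> bool:
    """Apply the rule from the text: load above the core count means overload."""
    return load_per_core(load_1m, cores) > 1.0


def current_load_per_core() -> float:
    """Read the live 1-minute load average (Unix only) and normalize it."""
    load_1m, _, _ = os.getloadavg()
    return load_per_core(load_1m, os.cpu_count() or 1)
```

With the example above, load_per_core(8.0, 4) gives 2.0, meaning roughly two tasks compete for each core.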

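1.2 Find the MySQL process PID

ps -ef | grep mysqld
# or
pidof mysqld

Note the PID; the examples below use 12345.

1.3 Check CPU usage per MySQL thread (key step)

MySQL uses a multithreaded model:
one connection ≈ one thread.

top -H -p 12345 -d 1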

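Scenarios

✅ Scenario A: a single thread at 100%

Meaning:

A single slow SQL statement

Action:

Record the thread ID
Look up the SQL inside MySQL

✅ Scenario B: many threads all running high

Meaning:

Excessive concurrency / a connection storm

Action:

Check the connection pool
Cap the maximum number of connections

✅ Scenario C: no individual thread is high, but overall CPU is high

Possible causes:

MySQL background threads
Lock contention
Context switching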

1.4 Separate user-space and kernel-space CPU

pidstat -p 12345 -u -h 1 5

Field	Meaning
%usr	User-space time (SQL computation)
%system	Kernel-space time
%CPU	Total usage

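Stage 2: System-Level Checks

Once mysqld is confirmed as the CPU consumer, rule out performance degradation caused by the operating system.

2.1 Context switch check

vmstat 1 5

Key fields:

Field	Meaning
cs	Context switches per second
in	Interrupts per second

Interpretation (rough guide):

Normal: a few thousand per second
Abnormal: > 20,000 per second

Causes:

Too many threads
Lock contention
CPU preemption

Drill down:

pidstat -w -p 12345 1 5

Focus on:

cswch/s (voluntary switches, typically waiting on locks or I/O)
nvcswch/s (involuntary switches, typically CPU preemption)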
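The vmstat threshold check in this section can be scripted against captured output. The sketch below is one way to do it, assuming the standard column header that vmstat prints (a line containing both "in" and "cs"); the function names are hypothetical, and the 20,000/s default is the rough guide from the text rather than a hard limit.

```python
def parse_vmstat(text: str) -> list[dict[str, int]]:
    """Parse `vmstat 1 5` output into one dict per sample row."""
    rows = []
    header = None
    for line in text.splitlines():
        tokens = line.split()
        if "cs" in tokens and "in" in tokens:
            # Column-name line, e.g. "r b swpd free ... in cs us sy id wa st".
            header = tokens
        elif header and len(tokens) == len(header) and all(t.isdigit() for t in tokens):
            rows.append(dict(zip(header, map(int, tokens))))
    return rows


def high_context_switch_samples(rows: list[dict[str, int]], limit: int = 20000) -> list[int]:
    """Return the cs values that exceed the limit."""
    return [row["cs"] for row in rows if row["cs"] > limit]
```

Note that the first row vmstat prints is an average since boot, so it is usually worth ignoring rows[0] when judging current behavior. The same parsed rows also expose the si and so columns.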

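2.2 Memory and swap check

free -m
vmstat 1 5

Key fields:

Field	Meaning
si	Swap in
so	Swap out

⚠️ Sustained non-zero si/so = a serious problem

Impact:

CPU sy spikes
Database performance falls off a cliff

Fixes:

Add memory
Adjust the buffer pool size
swapoff -a (only when there is enough free RAM to absorb the swapped-out pages)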

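2.3 Network connection check

netstat -an | grep ESTABLISHED | wc -l
netstat -an | grep TIME_WAIT | wc -l
ss -ant | grep :3306 | wc -l

Interpretation:

Symptom	Meaning
ESTABLISHED high	The connection pool is ineffective
TIME_WAIT high	A storm of short-lived connections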
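Counting states per port gives a clearer picture than separate grep pipelines. A minimal sketch, assuming the default column layout of `ss -ant` (state in the first column, local address in the fourth) and a hypothetical function name; note that ss abbreviates the states as ESTAB and TIME-WAIT, unlike netstat.

```python
from collections import Counter


def count_states(ss_output: str, port: int = 3306) -> Counter:
    """Count TCP states for sockets whose local address ends with the given port."""
    counts: Counter = Counter()
    suffix = f":{port}"
    for line in ss_output.splitlines():
        fields = line.split()
        # Data rows have at least 5 columns; the header row fails the suffix test.
        if len(fields) >= 5 and fields[3].endswith(suffix):
            counts[fields[0]] += 1
    return counts
```

Feeding it the text captured from `ss -ant` returns a Counter such as {"ESTAB": 2, "TIME-WAIT": 1}, which can be compared against the pool size the application is supposed to use.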

Fixes:

Use a connection pool
net.ipv4.tcp_tw_reuse (applies to outgoing connections, so it helps on the client side)

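2.4 Disk I/O and its relation to CPU

iostat -x -k 1 5

Focus on:

Field	Interpretation
%util	Close to 100% = saturated
await	> 10 ms = slow disk

If both of these hold:

wa is high
%util is high

➡ the CPU is passively waiting on I/O.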

2.5 NUMA architecture check

numactl --hardware
dmesg | grep -i numa

Problem:

CPUs accessing memory across NUMA nodes

Recommendation:

numactl --interleave=all /usr/sbin/mysqld

2.6 Hardware interrupt check

watch -n 1 'cat /proc/interrupts | grep -E "CPU|eth|nvme|sda"'

If the interrupt count on one CPU climbs sharply:

➡ IRQs are not balanced across CPUs

Fix:

irqbalance

Summary of Stages 1 and 2

Check	Command	Abnormal sign
CPU	top	us/wa high
Threads	top -H	Single thread at 100%
Context switches	vmstat	cs high
Swap	vmstat	si/so > 0
Network	ss	Many TIME_WAIT
NUMA	numactl	Not bound

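Stage 3: MySQL-Level Checks (Core Stage)

When the system level shows nothing abnormal:

The problem is almost certainly in the SQL or in MySQL's internal mechanisms

3.1 Live session analysis (catch it in the act)

SELECT id, user, host, db, command, time, state, info
FROM information_schema.processlist
WHERE command != 'Sleep'
ORDER BY time DESC
LIMIT 20;

What the states mean

State	Meaning
Sending data	Reading and processing rows, often a full table scan
Sorting result	Sorting
Creating tmp table	Building a temporary table
Waiting for ... lock	Lock contention (for example, Waiting for table metadata lock)
purging	Undo cleanup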

Mapping to OS threads (8.0)

Use:

performance_schema.threads

to correlate:

PROCESSLIST_ID
THREAD_OS_ID

THREAD_OS_ID is the thread ID that top -H shows for the hot thread.
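To go from a hot thread seen in top to the SQL it is running, the OS thread ID can be looked up in performance_schema.threads. The sketch below only builds the query and its parameters; the function name is hypothetical, and the %s placeholder assumes a DB-API driver that uses that style (PyMySQL and mysql-connector-python do). The column names are those of the performance_schema.threads table.

```python
# Query that maps an OS thread ID to the MySQL session and its current statement.
THREAD_LOOKUP_SQL = """
SELECT t.THREAD_OS_ID,
       t.PROCESSLIST_ID,
       t.NAME,
       t.PROCESSLIST_USER,
       t.PROCESSLIST_TIME,
       t.PROCESSLIST_INFO
FROM performance_schema.threads AS t
WHERE t.THREAD_OS_ID = %s
"""


def thread_lookup(os_tid: int) -> tuple[str, tuple[int]]:
    """Return (sql, params) for looking up the session behind an OS thread ID."""
    if os_tid <= 0:
        raise ValueError("os_tid must be a positive thread ID")
    return THREAD_LOOKUP_SQL.strip(), (os_tid,)
```

The pair would typically be passed straight to a cursor, as in cursor.execute(*thread_lookup(tid)). A row with a NULL PROCESSLIST_ID indicates a background thread rather than a client connection.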

3.2 Slow query analysis (historical problems)

Enable:

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.1;
SET GLOBAL log_queries_not_using_indexes = 'ON';

mysqldumpslow

mysqldumpslow -s t -t 10 slow.log

pt-query-digest (recommended)

pt-query-digest slow.log

Focus on:

Rows examine
Response time

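3.3 Status metric analysis

Threads

SHOW GLOBAL STATUS LIKE 'Threads_running';

Rule of thumb:

Threads_running ≤ number of CPU cores

Temporary tables

SHOW GLOBAL STATUS LIKE 'Created_tmp%';

A high count of on-disk temporary tables → a problem with the SQL or with tmp_table_size.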
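"High" is easier to judge as a ratio of on-disk temporary tables to all temporary tables. A small sketch with a hypothetical function name, assuming the status counters have already been read into a dict keyed by the names SHOW GLOBAL STATUS returns:

```python
def tmp_disk_ratio(status: dict[str, int]) -> float:
    """Share of temporary tables that spilled to disk (0.0 when none were created)."""
    total = status.get("Created_tmp_tables", 0)
    if total == 0:
        return 0.0
    return status.get("Created_tmp_disk_tables", 0) / total
```

Both counters are cumulative since server start, so comparing two snapshots taken a few minutes apart reflects current behavior better than the raw totals.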

Buffer pool hit ratio

Formula:

1 - Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests

Target:

≥ 99%
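The formula is straightforward to compute from the two counters returned by SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'. A minimal sketch with hypothetical function names; the 0.99 default mirrors the target stated above.

```python
def buffer_pool_hit_ratio(reads: int, read_requests: int) -> float:
    """Compute 1 - Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests."""
    if read_requests <= 0:
        raise ValueError("read_requests must be positive")
    return 1 - reads / read_requests


def meets_target(ratio: float, target: float = 0.99) -> bool:
    """Check the ratio against the target from the text."""
    return ratio >= target
```

For example, 1,000 disk reads against 1,000,000 read requests gives a ratio of about 0.999, which meets the target, while 50,000 disk reads against the same number of requests gives 0.95 and does not.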

3.4 Lock and transaction analysis

SHOW ENGINE INNODB STATUS\G

Focus on:

TRANSACTIONS
SEMAPHORES

Many spins/waits → lock contention.

SELECT * FROM sys.innodb_lock_waits;

Check for:

Long-running transactions
Blocking by DDL

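3.5 Execution plan analysis (final pinpointing)

EXPLAIN FORMAT=JSON SELECT ...

Key fields:

Field	Danger sign
type	ALL
key	NULL
rows	Very large
Extra	Using filesort
Extra	Using temporary

Common reasons an index is not used

Violating the leftmost-prefix rule
Applying a function to the indexed column
Implicit type conversion
Leading-wildcard LIKE patterns such as '%abc'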

3.6 Deep analysis with Performance Schema

SELECT event_name, count_star, sum_timer_wait
FROM performance_schema.events_statements_summary_by_global_by_event_name
ORDER BY sum_timer_wait DESC
LIMIT 10;

Real time (sys.x$session exposes the latency as a raw number, so the sort is numeric rather than textual):

SELECT *
FROM sys.x$session
ORDER BY current_statement_latency DESC;

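Stage 3 decision table

Symptom	Cause	Fix
Single slow SQL statement	Full table scan	Add an index
Many fast SQL statements	High concurrency	Rate-limit
High temporary table count	Sorting	Tune memory
Buffer pool misses	Too little memory	Tune the buffer pool
Lock waits	Long transactions	Split transactions
purging	Heavy writes	Tune purge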
