A k8s worker node outage caused by the oom-killer

Coming in to work today, a colleague reported that the business system could not be reached. Investigation showed that one of the k8s worker nodes had gone down and could not even be reached over SSH, so the only option was to force a reboot. The logs showed the following:

Jul 27 08:54:15 k8s-worker02 kernel: agent invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=1000
Jul 27 08:54:17 k8s-worker02 kernel: agent cpuset=bfabbea08d8f02561adc88889f407a9f7b3efcc0e3fa6ad441456cf846bc3c57 mems_allowed=0
Jul 27 08:54:20 k8s-worker02 kernel: CPU: 7 PID: 5495 Comm: agent Kdump: loaded Tainted: G ------------ T 3.10.0-1160.59.1.el7.x86_64 #1
Jul 27 08:54:22 k8s-worker02 kernel: Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
Jul 27 08:54:25 k8s-worker02 kernel: Call Trace:
Jul 27 08:54:31 k8s-worker02 kernel: [<ffffffff91d865b9>] dump_stack+0x19/0x1b
Jul 27 08:54:38 k8s-worker02 kernel: [<ffffffff91d81658>] dump_header+0x90/0x229
Jul 27 08:54:44 k8s-worker02 kernel: [<ffffffff91706992>] ? ktime_get_ts64+0x52/0xf0
Jul 27 08:54:50 k8s-worker02 kernel: [<ffffffff9175e01f>] ? delayacct_end+0x8f/0xb0
Jul 27 08:54:59 k8s-worker02 kernel: [<ffffffff917c254d>] oom_kill_process+0x2cd/0x490
Jul 27 08:55:07 k8s-worker02 kernel: [<ffffffff917c1f3d>] ? oom_unkillable_task+0xcd/0x120
Jul 27 08:55:23 k8s-worker02 kernel: [<ffffffff917c2c3a>] out_of_memory+0x31a/0x500
Jul 27 08:55:36 k8s-worker02 kernel: [<ffffffff917c9854>] __alloc_pages_nodemask+0xad4/0xbe0
Jul 27 08:55:49 k8s-worker02 kernel: [<ffffffff918193b8>] alloc_pages_current+0x98/0x110
Jul 27 08:56:07 k8s-worker02 kernel: [<ffffffff917be007>] __page_cache_alloc+0x97/0xb0
Jul 27 08:56:23 k8s-worker02 kernel: [<ffffffff917c0fa0>] filemap_fault+0x270/0x420
Jul 27 08:56:41 k8s-worker02 kernel: [<ffffffffc03f691e>] __xfs_filemap_fault+0x7e/0x1d0 [xfs]
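
For reference, once the node is back up, the same OOM evidence can usually be recovered from the logs that survive a reboot; on a CentOS 7 node, a check along these lines works:

# past OOM events are kept in syslog
grep -iE 'invoked oom-killer|Killed process|Out of memory' /var/log/messages | tail -n 20
# for the current boot only, the kernel ring buffer holds the same trace
dmesg -T | grep -iE 'oom-killer|Out of memory'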

The line agent invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=1000 stands out: the node ran out of memory. Since Kubernetes officially advises against using a SWAP partition, and under normal load the node has more than enough memory, the practical fix is to set appropriate resource requests and limits on every Pod. This helps ensure each container gets enough memory and stops a single container from consuming the node's resources.

resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"

Of course, in a non-production environment you can enable a SWAP partition; it costs some performance, but it keeps memory pressure from turning straight into an OOM that makes the service unreachable.

  1. Adding swap space provides extra virtual memory when physical memory runs low. Check the current swap with swapon -s and extend it as needed:
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
  2. Then make the swap persistent across reboots by adding it to /etc/fstab (a quick check of the result follows below):
/swapfile none swap sw 0 0
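
Once the swap is enabled, it is worth confirming it is actually active, for example:

swapon -s    # list active swap areas
free -h      # the Swap line should now be non-zero

Keep in mind that the kubelet refuses to start when swap is enabled unless it is run with --fail-swap-on=false, which is another reason this is only suggested for non-production nodes.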

Beyond that, you can tune kernel parameters to change the behavior of the OOM killer so the system does not just sit there killing one process after another. For example, you can make the kernel panic as soon as an OOM is triggered and have it reboot automatically 10 seconds after the panic:

# echo "vm.panic_on_oom=1" >> /etc/sysctl.conf
# echo "kernel.panic=10" >> /etc/sysctl.conf

You can also modify oom_score_adj to steer the OOM killer on a per-process basis: raising the value makes a process more likely to be killed, lowering it makes it less likely, and -1000 exempts it entirely:

echo -1000 > /proc/<pid>/oom_score_adj
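
To see how a given process currently ranks, read the scores straight from /proc (replace <pid> as above):

cat /proc/<pid>/oom_score        # the badness score the OOM killer compares
cat /proc/<pid>/oom_score_adj    # the manual adjustment, from -1000 to 1000

In the log above, the agent process had oom_score_adj=1000, which is the value the kubelet assigns to containers in BestEffort pods (no requests or limits); setting requests and limits as shown earlier is the more durable way to lower it, since anything written to /proc by hand is lost when the process restarts.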