
Troubleshooting: systemd-resolved.service intermittently fails to start on Red Hat/CentOS 8

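systemd is a system and service manager for Linux, compatible with SysV and LSB init scripts. It provides aggressive parallelization capabilities, uses socket and D-Bus activation to start services, offers on-demand starting of daemons, and tracks processes with Linux cgroups.
However, the systemd-239-31.el8 build has an issue that can disrupt the operating system's network components.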

 

Symptoms:

systemd-resolved.service is started by default on RHEL/CentOS 8, but during boot it intermittently fails to start. The boot console then shows output like the following:

[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
[  OK  ] Started Create list of required sta…vice nodes for the current kernel.
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Stopped Network Name Resolution.
         Starting Network Name Resolution...
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
[  OK  ] Stopped Network Name Resolution.
         Starting Network Name Resolution...
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
[  OK  ] Stopped Network Name Resolution.
         Starting Network Name Resolution...
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
[  OK  ] Stopped Network Name Resolution.
         Starting Network Name Resolution...
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.
[  OK  ] Stopped Network Name Resolution.
[FAILED] Failed to start Network Name Resolution.
See 'systemctl status systemd-resolved.service' for details.

 

Troubleshooting:

Check /var/log/messages, which contains the following error:

systemd-resolved.service: Failed at step NAMESPACE spawning /usr/lib/systemd/systemd-resolved: Read-only file system
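
To confirm that a host shows this exact signature, the log text can be filtered for it. The following is a minimal sketch, not part of the original write-up: the helper name count_namespace_failures is hypothetical, and it only counts lines that contain both the unit name and the "Failed at step NAMESPACE" message read from standard input.

```shell
#!/bin/sh
# Hypothetical helper: count log lines on stdin that carry the
# "Failed at step NAMESPACE" error for systemd-resolved.service.
count_namespace_failures() {
    # grep -c prints the count; "|| true" keeps the exit status clean
    # when the count is zero.
    grep 'systemd-resolved.service' | grep -c 'Failed at step NAMESPACE' || true
}

# Usage on a real host (either source should work on RHEL/CentOS 8):
#   count_namespace_failures < /var/log/messages
#   journalctl -b -u systemd-resolved.service | count_namespace_failures
```

A non-zero count for the current boot suggests the host hit the problem described here; a count of zero does not rule out other causes of a failed start.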

 

Root cause:

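Because of a misconfiguration of systemd-resolved.service in the systemd-239-31.el8 package, this critical service can be started before systemd-remount-fs.service is ready.
File system reads and writes are affected as a result: systemd-resolved.service performs an additional remount operation when it starts, and the /tmp directory involved in the PrivateDevices parameter ends up read-only.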
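
One way to look at the ordering described above is to inspect the unit's After= property. The sketch below is an illustration under assumptions: the helper name ordered_after_remount is hypothetical, and it only detects a direct After= entry, so ordering that is inherited indirectly through other targets would not be visible to it.

```shell
#!/bin/sh
# Hypothetical helper: read "After=..." text on stdin and succeed only if
# systemd-remount-fs.service is listed as a direct ordering dependency.
ordered_after_remount() {
    # Put every unit name on its own line, then match the whole line.
    tr ' =' '\n\n' | grep -qx 'systemd-remount-fs.service'
}

# Usage on a real host:
#   systemctl show -p After systemd-resolved.service | ordered_after_remount \
#       && echo "ordered after systemd-remount-fs.service" \
#       || echo "no direct ordering found"
```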

 

Verification:

Run the following commands to check whether the installed systemd version is systemd-239-31.el8:

rpm -qa systemd
grep systemd /var/log/dnf.rpm*
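
The version check can also be scripted. This is a rough sketch with assumptions: the helper name has_fix is hypothetical, the fixed build 239-40.el8 is the one named in this article, and the comparison uses GNU sort -V, which approximates but is not identical to RPM's own version comparison.

```shell
#!/bin/sh
# Build that contains the fix, as stated in this article.
fixed_build="239-40.el8"

# Hypothetical helper: succeed if the given VERSION-RELEASE string is
# the fixed build or newer, judged by GNU "sort -V" ordering.
has_fix() {
    installed="$1"
    lowest=$(printf '%s\n%s\n' "$installed" "$fixed_build" | sort -V | head -n 1)
    [ "$lowest" = "$fixed_build" ]
}

# Usage on a real host:
#   has_fix "$(rpm -q --qf '%{VERSION}-%{RELEASE}' systemd)" \
#       && echo "fixed build installed" \
#       || echo "older build, may be affected"
```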

 

Resolution:

Temporary workaround (RHEL/CentOS 8 only):
1. Create a drop-in configuration file for systemd-resolved.service that sets PrivateTmp=yes and ProtectSystem=strict:

mkdir -p /etc/systemd/system/systemd-resolved.service.d
echo -e "[Service]\nPrivateTmp=yes\nProtectSystem=strict" > /etc/systemd/system/systemd-resolved.service.d/override.conf

2. Reload the systemd configuration:

systemctl daemon-reload

3. Verify that the output matches the following:

systemctl show -p PrivateTmp -p ProtectSystem systemd-resolved.service
PrivateTmp=yes
ProtectSystem=strict
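
The three steps above can be wrapped in small helpers so they are repeatable across hosts. This is a sketch under assumptions: the function names write_resolved_override and override_is_active are hypothetical, the drop-in directory is passed as a parameter so the helper can be tried against a scratch directory first, and printf replaces echo -e for portability.

```shell
#!/bin/sh
# Hypothetical helper: write the drop-in from step 1 into the given directory.
write_resolved_override() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nPrivateTmp=yes\nProtectSystem=strict\n' \
        > "$dropin_dir/override.conf"
}

# Hypothetical helper: read "systemctl show" output on stdin and succeed
# only if both properties have the values expected in step 3.
override_is_active() {
    props=$(cat)
    printf '%s\n' "$props" | grep -qx 'PrivateTmp=yes' &&
        printf '%s\n' "$props" | grep -qx 'ProtectSystem=strict'
}

# Usage on a real host (as root):
#   write_resolved_override /etc/systemd/system/systemd-resolved.service.d
#   systemctl daemon-reload
#   systemctl show -p PrivateTmp -p ProtectSystem systemd-resolved.service \
#       | override_is_active && echo "override active"
```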

 

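Permanent fix:
Red Hat has published an advisory confirming that the issue is fixed in systemd-239-40.el8: RHSA-2020:4553 – Security Advisory – Red Hat Customer Portal
If you see exactly these symptoms, try upgrading the systemd packages manually: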

dnf -y install systemd

 

References:

  1. Issue report: systemd-resolved.service sometimes fails to start – Red Hat Customer Portal
  2. Package update details: RHSA-2020:4553 – Security Advisory – Red Hat Customer Portal