The case of the privileged systemd container and the hardware clock
Today we debugged yet another issue that eventually turned out to be a time synchronization problem across multiple machines.
Let us first look at the case we solved today:
The setup was a 3-node, 1-master OpenShift deployment. Whenever we started a privileged systemd container on a node, that node would enter the NotReady state. The nodes were VMs hosted on a VMware ESXi server, and the peculiar part was that the issue did not appear on a similar OpenShift setup hosted on a libvirt hypervisor.
To truly understand the issue, you need to know about the hardware clock and the system clock. The hardware clock (also called the real-time clock, or RTC) is maintained on a chip and is powered by a battery while the system is shut down. The system clock is maintained by the OS in memory. When the computer boots, the OS reads the time from the hardware clock and uses it to set the system clock. To limit clock drift, the OS then corrects the system clock at regular intervals using ntp, chrony, or a similar service. The hardware clock has no timezone attribute, so it is the admin's duty to tell the OS how to interpret it (as UTC or as local time).
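To make the "no timezone attribute" point concrete, here is a minimal Python sketch. The function name `interpret_rtc`, the sample reading, and the +05:30 offset are all made up for illustration; they are not taken from the setup we debugged. It shows how the same raw hardware clock reading maps to two different instants depending on what the OS has been told.

```python
from datetime import datetime, timedelta, timezone


def interpret_rtc(rtc_reading: datetime, rtc_in_utc: bool,
                  local_utc_offset_hours: float) -> datetime:
    """Turn a timezone-less hardware clock reading into an aware UTC datetime.

    rtc_in_utc is the piece of information the admin has to supply:
    the clock itself cannot say whether it holds UTC or local time.
    """
    if rtc_reading.tzinfo is not None:
        raise ValueError("a hardware clock reading carries no timezone")
    if rtc_in_utc:
        return rtc_reading.replace(tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=local_utc_offset_hours))
    return rtc_reading.replace(tzinfo=local_zone).astimezone(timezone.utc)


# Hypothetical reading: the chip says "2024-01-01 12:00:00" and nothing more.
reading = datetime(2024, 1, 1, 12, 0, 0)
as_utc = interpret_rtc(reading, rtc_in_utc=True, local_utc_offset_hours=5.5)
as_local = interpret_rtc(reading, rtc_in_utc=False, local_utc_offset_hours=5.5)

print(as_utc.isoformat())
print(as_local.isoformat())
print("difference:", as_utc - as_local)
```

With the wrong interpretation, the system clock starts out wrong by the full timezone offset, which is more than enough to upset anything that compares timestamps across machines.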
As you might have already figured, this hypervisor had the wrong date/time set in its hardware clock. It was also configured to hand its own hardware clock time to the VM as the VM's hardware clock time at boot. Once the VM was up, it corrected its system clock using NTP. However, when the privileged systemd container started, it picked up the wrong date/time from the hardware clock and set the system clock again. This made most of the services on the node behave strangely, and the node went into the NotReady state.
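A quick way to spot this kind of disagreement is to compare the two clocks directly. The sketch below is not what we ran during the debugging session; it assumes a Linux node that exposes its first RTC at `/sys/class/rtc/rtc0/since_epoch`, and the 5-second threshold is an arbitrary choice. Note that the sysfs value is only meaningful as an epoch if the hardware clock is kept in UTC.

```python
import time
from pathlib import Path

# Assumption: Linux sysfs path of the first RTC device.
RTC_EPOCH_FILE = Path("/sys/class/rtc/rtc0/since_epoch")
# Assumption: arbitrary threshold, tune it for your environment.
MAX_SKEW_SECONDS = 5.0


def clock_skew_seconds(rtc_epoch: float, system_epoch: float) -> float:
    """Return how far the hardware clock is ahead of the system clock."""
    return rtc_epoch - system_epoch


def rtc_disagrees(rtc_epoch: float, system_epoch: float,
                  max_skew: float = MAX_SKEW_SECONDS) -> bool:
    """True when the two clocks differ by more than max_skew seconds."""
    return abs(clock_skew_seconds(rtc_epoch, system_epoch)) > max_skew


def read_rtc_epoch(path: Path = RTC_EPOCH_FILE) -> float:
    """Read the hardware clock as seconds since the epoch from sysfs."""
    return float(path.read_text().strip())


if __name__ == "__main__":
    try:
        rtc_now = read_rtc_epoch()
    except (OSError, ValueError) as err:
        print(f"could not read the hardware clock: {err}")
    else:
        skew = clock_skew_seconds(rtc_now, time.time())
        state = "DISAGREE" if rtc_disagrees(rtc_now, time.time()) else "agree"
        print(f"hardware clock is {skew:+.1f}s relative to system clock ({state})")
```

On a node like ours, a large skew reported here would have pointed at the hardware clock long before the container had a chance to copy it into the system clock.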
You can avoid many errors like this if you take care to have a time sync service running on your machines.
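If you want to check this from a script, the sketch below parses the `KEY=value` output of `timedatectl show` and looks at the `NTPSynchronized` property. It assumes a reasonably recent systemd that supports the `show` verb; the helper names and the sample output are illustrative only, not captured from our nodes.

```python
import subprocess


def parse_timedatectl_show(output: str) -> dict:
    """Parse KEY=value lines, as printed by `timedatectl show`, into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that are not KEY=value
            props[key.strip()] = value.strip()
    return props


def is_clock_synchronized(props: dict) -> bool:
    """True when systemd reports that the system clock is synchronized."""
    return props.get("NTPSynchronized") == "yes"


def query_timedatectl() -> dict:
    """Run `timedatectl show` on this machine (assumes the verb is available)."""
    result = subprocess.run(["timedatectl", "show"],
                            capture_output=True, text=True, check=True)
    return parse_timedatectl_show(result.stdout)


# Illustrative sample, not real output from the affected setup.
sample = "Timezone=UTC\nLocalRTC=no\nNTP=yes\nNTPSynchronized=yes\n"
print(is_clock_synchronized(parse_timedatectl_show(sample)))
```

On a real machine you would call `is_clock_synchronized(query_timedatectl())` instead of feeding it the sample string.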
This puts "time sync" in third position on the list of things to check when debugging a setup:
1. firewall
2. selinux
3. time sync