Summary:
An attempt by Oracle Clusterware to reboot a node may freeze the OS instead of rebooting it.
Symptoms:
Under certain failure conditions (such as loss of network or storage connectivity), Oracle Clusterware fences a node by rebooting it. The node may hang instead of rebooting; if it does, the server or VM must be reset manually.
Affected products:
Any Oracle RAC or Failover HA cluster
Affected versions:
- RHEL 8 based clusters on Azure with 8.5 kernel version 4.18.0-348.y.z.el8_5
- OL 8 based clusters on Azure with RHCK 8.5 kernel version 4.18.0-348.y.z.el8_5
Note: RHEL/OL 7 based clusters are NOT affected
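To check a node against the version pattern listed above, a minimal sketch along the following lines could be used. It only inspects the kernel release string (as reported by `uname -r`); it does not verify that the node runs on Azure or is part of an Oracle cluster. The function name `is_affected_kernel` is illustrative and not part of any Oracle or Red Hat tool.

```python
import platform
import re

# Matches release strings of the form 4.18.0-348.y.z.el8_5, for example
# "4.18.0-348.7.1.el8_5.x86_64". The pattern is taken from the list above.
AFFECTED_PATTERN = re.compile(r"^4\.18\.0-348\.\d+\.\d+\.el8_5(\.|$)")


def is_affected_kernel(release: str) -> bool:
    """Return True if the release string matches the affected 8.5 kernel pattern."""
    return AFFECTED_PATTERN.match(release) is not None


if __name__ == "__main__":
    running = platform.release()  # same value as `uname -r` on Linux
    state = "matches" if is_affected_kernel(running) else "does not match"
    print(f"Running kernel {running} {state} the affected pattern")
```

UEK kernels and RHEL/OL 7 kernels do not match the pattern, which agrees with the note above.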
Root cause:
On Azure, the RHEL 8.5 kernel may hang when it receives two SysRq Reboot commands simultaneously. Oracle Clusterware may issue two simultaneous SysRq Reboot commands during node fencing.
Resolution:
RHEL 8: Update to 8.6 kernel version 4.18.0-372.16.1.el8_6 or newer.
Oracle Linux 8: Update to 8.6 kernel version 4.18.0-372.16.1.el8_6 or newer, or switch to UEKR6 kernel.
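After updating, the running kernel can be compared against the minimum fixed version 4.18.0-372.16.1.el8_6. The following is a rough sketch, not an official check: it assumes the release string follows the usual `4.18.0-<build>.el8...` layout and compares only the numeric build components. The helper names are hypothetical. UEK kernels are not evaluated by this sketch and are reported as not meeting the 4.18.0 fixed level, so a node switched to UEKR6 needs to be confirmed separately.

```python
import platform
import re

# Build components of the minimum fixed kernel 4.18.0-372.16.1.el8_6.
FIXED_BUILD = (372, 16, 1)


def rhck_build(release: str):
    """Return the numeric build tuple of a 4.18.0 el8 kernel, or None if not one."""
    m = re.match(r"^4\.18\.0-(\d+(?:\.\d+)*)\.el8", release)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


def meets_fixed_level(release: str) -> bool:
    """Return True if release is a 4.18.0 el8 kernel at or above the fixed build."""
    build = rhck_build(release)
    return build is not None and build >= FIXED_BUILD


if __name__ == "__main__":
    running = platform.release()  # same value as `uname -r` on Linux
    if meets_fixed_level(running):
        print(f"{running}: at or above 4.18.0-372.16.1.el8_6")
    else:
        print(f"{running}: below the fixed level, or not a 4.18.0 el8 kernel")
```

Remember that a newly installed kernel only takes effect after the node has been rebooted into it, so the check should be run after the reboot.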