Summary
An attempt by Oracle Clusterware to reboot a node may cause the OS to freeze instead of rebooting.
Symptoms
Under certain failure conditions (such as loss of network or storage connectivity), Oracle Clusterware fences a node by rebooting it. On affected systems the node may hang instead of rebooting. If the node hangs, the server or VM must be reset manually.
Affected Products
FlashGrid Cluster on Azure using Oracle Linux 8 with UEKR6 kernel
Affected Versions
FlashGrid Cluster version 23.04 or earlier.
Oracle Linux 8 UEKR6 kernel versions 5.4.17-2136.320.7.el8uek and above.
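To check whether a node's running kernel falls in the affected range, the version comparison can be sketched in Python as below. This is a minimal illustration, not a FlashGrid or Oracle tool: the names `parse_release` and `is_affected` are hypothetical, and the check assumes that UEKR6 kernels are identified by the 5.4.17 base version and the `el8uek` tag in the release string. It treats every later UEKR6 release as affected, following the "and above" wording, and it does not check the FlashGrid Cluster version.

```python
import platform
import re

# First affected UEKR6 release, taken from the "Affected Versions" list.
FIRST_AFFECTED = (5, 4, 17, 2136, 320, 7)


def parse_release(release):
    """Return the numeric fields that precede the distribution tag as ints."""
    # Drop the distribution tag and architecture, e.g. ".el8uek.x86_64".
    numeric_part = re.split(r"\.el\d", release, maxsplit=1)[0]
    return tuple(int(field) for field in re.findall(r"\d+", numeric_part))


def is_affected(release):
    """Return True if the release string is a UEKR6 el8 kernel at or above
    the first affected version."""
    # Assumption: Oracle Linux 8 UEK kernels carry the "el8uek" tag.
    if "el8uek" not in release:
        return False
    fields = parse_release(release)
    # Assumption: UEKR6 kernels all share the 5.4.17 base version.
    if fields[:3] != (5, 4, 17):
        return False
    # Tuples compare field by field, so 2136.319.x sorts below 2136.320.7.
    return fields >= FIRST_AFFECTED


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` reports
    status = "affected" if is_affected(running) else "not in the affected range"
    print(f"{running}: {status}")
```

Running the script on a cluster node prints the kernel release together with the result of the comparison. A node reported as affected still needs its FlashGrid Cluster version checked separately.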
Root Cause
Kernel bug.
Resolution
Update to FlashGrid Cluster version 23.06 or later. The update adds a kernel parameter that prevents the hang.
Alternative workaround: Keep the kernel at version 5.4.17-2136.319.1.4.el8uek or earlier, downgrading if a later version is already installed.