Summary:
On Azure-based clusters, the hv_balloon and hyperv_fb driver modules, if loaded, may under certain conditions disrupt communication with a node. This can lead to a cluster brown-out and potential database downtime.
Symptoms:
- Intermittent network disruptions with a node, recorded by the FlashGrid Node Monitor service and by the Oracle CSSD service over an extended period of time
- iSCSI connection errors recorded over an extended period of time
- Disks taken offline
- Disk groups being dismounted
Affected products:
FlashGrid Cluster on Azure
Affected versions:
Clusters deployed on Azure with FlashGrid Launcher version 19.10 or earlier are affected.
Note: FlashGrid software updates prior to version 21.08.100 (released on 2021-11-09) did not fix the issue.
Note: Your cluster is affected if the /etc/modprobe.d/blacklist.conf file is missing on any node.
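The check described in the note above can be sketched as a small Python script to run on each node. This is a minimal sketch, not a FlashGrid tool: the function names are hypothetical, and reading /proc/modules to see whether the two modules are currently loaded is an additional check assumed here for convenience. Only the file path and module names come from this article.

```python
from pathlib import Path

# Module names taken from this article.
MODULES = ("hv_balloon", "hyperv_fb")


def loaded_modules(proc_modules_text):
    """Return the module names found in /proc/modules-style text.

    Each non-empty line starts with the module name.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def node_looks_affected(blacklist_path="/etc/modprobe.d/blacklist.conf",
                        proc_modules_path="/proc/modules"):
    """Hypothetical helper: report the two indicators for one node.

    Returns (blacklist_missing, loaded), where loaded is the sorted list
    of the modules of concern that are currently loaded.
    """
    blacklist_missing = not Path(blacklist_path).exists()
    proc = Path(proc_modules_path)
    text = proc.read_text() if proc.exists() else ""
    loaded = sorted(m for m in MODULES if m in loaded_modules(text))
    return blacklist_missing, loaded


if __name__ == "__main__":
    missing, loaded = node_looks_affected()
    print("blacklist.conf missing:", missing)
    print("loaded modules of concern:", ", ".join(loaded) or "none")
```

Run the script on every node of the cluster. A node where the file is missing matches the condition in the note above.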
Root cause:
The hv_balloon and hyperv_fb modules are not necessary on FlashGrid Cluster and may disrupt the operation of the VM. These modules are blacklisted in FlashGrid Clusters deployed with FlashGrid Launcher 20.02 (released in February 2020) and newer. However, update packages prior to version 21.08.100 did not include blacklisting of these modules.
Resolution:
Update the FlashGrid Cluster software using the flashgrid_node_update package, version 21.08.100 or newer. The update will blacklist the hv_balloon and hyperv_fb modules. See update instructions here.
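After the update, one way to confirm the result on a node is to look for blacklist entries for both modules. The Python sketch below assumes that the entries are written to /etc/modprobe.d/blacklist.conf using the standard modprobe.d syntax (`blacklist <module>`). The exact contents written by the update package are not described in this article, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Module names taken from this article.
REQUIRED = ("hv_balloon", "hyperv_fb")


def blacklisted_modules(conf_text):
    """Return module names that have a 'blacklist <name>' line in the text."""
    names = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = re.match(r"^blacklist\s+(\S+)\s*$", line)
        if match:
            names.add(match.group(1))
    return names


def missing_blacklist_entries(path="/etc/modprobe.d/blacklist.conf",
                              required=REQUIRED):
    """Hypothetical helper: list required modules with no blacklist entry.

    Assumption: the entries live in the given file. A missing file means
    that every required module is reported as missing.
    """
    conf = Path(path)
    present = blacklisted_modules(conf.read_text()) if conf.exists() else set()
    return [m for m in required if m not in present]


if __name__ == "__main__":
    missing = missing_blacklist_entries()
    if missing:
        print("No blacklist entry found for:", ", ".join(missing))
    else:
        print("Both modules have blacklist entries.")
```

An empty result on every node is consistent with the update having applied the blacklist. A non-empty result may only mean that the entries are stored in a different file than assumed here.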