Troubleshooting: Disk(s) taken offline – FlashGrid Help Center

Symptoms

One or more disks show as offline in Oracle ASM.

Solution

In most cases, the FlashGrid software automatically brings back online, within 10 minutes, any disk(s) taken offline because of transient storage or network errors. No user action is necessary if the disk is brought online successfully and stays online.

However, if the problem repeats, the disk(s) may be left offline to avoid further repeated errors. In that case, follow the steps below:

1. Identify which disks are offline, and in which disk group, by running the following command on any node: flashgrid-cluster drives
2. Bring the disk(s) online as soon as possible to restore full redundancy (unless the problem recurs after the disks are brought online):
   - To online all disks attached to a particular node, run as user fg on that node: flashgrid-node online
   - To online a particular disk (except quorum disks), run as user grid: asmcmd online -G DGNAME -D 'HOSTNAME$DISKNAME'
   - To online a particular quorum disk (a disk on a quorum node), run as user grid: asmcmd online -G DGNAME -q -D 'HOSTNAME$DISKNAME'
   - To online all disks in a disk group, run as user grid: asmcmd online -G DGNAME -a
   - Note: the asmcmd command can be run on any database node. It is not available on quorum or storage nodes.
3. Wait for the disk group resync to complete, then confirm that the disk stays online by running: flashgrid-cluster drives
4. To prevent the problem from recurring, identify its cause:
   - Check /opt/flashgrid-diags/log/node_monitor-all.log on the node for clues that may explain why the disk went offline.
   - Use the table below to identify the probable reason the disk(s) went offline.
   - Upload cluster diags to FlashGrid support and open a FlashGrid support ticket with a description of the problem.
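Choosing between the per-disk online commands in step 2 can be captured in a small helper. The sketch below only prints the asmcmd command for a given disk group and disk; it does not run anything. The function name fg_online_cmd and its argument order are hypothetical and not part of FlashGrid or asmcmd. The printed command would then be run as user grid on a database node.

```shell
# Hypothetical helper (not a FlashGrid or Oracle tool): print the asmcmd
# command from step 2 that brings a single disk back online.
# It only prints the command; it does not execute it.
# Usage: fg_online_cmd DGNAME HOSTNAME DISKNAME [quorum]
fg_online_cmd() {
    dg=$1 host=$2 disk=$3 kind=${4:-regular}
    # Disk names have the form HOSTNAME$DISKNAME. Wrapping the name in single
    # quotes keeps the shell from expanding "$DISKNAME" when the printed
    # command is pasted and run.
    name="'${host}\$${disk}'"
    if [ "$kind" = quorum ]; then
        # Quorum disks (disks on a quorum node) take the extra -q option.
        printf 'asmcmd online -G %s -q -D %s\n' "$dg" "$name"
    else
        printf 'asmcmd online -G %s -D %s\n' "$dg" "$name"
    fi
}
```

For example, fg_online_cmd DATA rac1 DISK1 prints asmcmd online -G DATA -D 'rac1$DISK1'. Printing instead of executing keeps the helper safe to try on any node, including quorum and storage nodes where asmcmd is not available.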

Determining the cause of the problem

Several failure types may cause ASM disk(s) to be taken offline. The table below, together with the log files, can help determine the exact cause of the problem.

| Symptoms | Possible causes |
|---|---|
| A single disk is offline | Disk problem (may be transient); stability problem on the node where the disk is attached; network disruption (less likely) |
| Multiple disks belonging to the same node are offline, but some disks on the same node are online | Stability problem on the node where the offline disks are attached; storage disruption in the cloud AZ where the disks are located; network disruption (less likely) |
| All disks belonging to the same node are offline | Stability problem on the node where the offline disks are attached; storage disruption in the cloud AZ where the disks are located; network disruption |
| Multiple disks belonging to two or more different nodes are offline | Stability problem on one of the nodes; network disruption |

Notes:

A single disk going offline may be caused by a transient error that will not repeat. However, if the same disk is taken offline more than once, the disk may be damaged and might need to be replaced.
The most common causes of node instability are connection storms and server hardware problems. Other possible causes include heavy swapping, out-of-memory conditions, and excessive CPU load, although these are less likely on a properly configured system.
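The symptoms table above can be read as a simple decision procedure. The sketch below encodes it as a shell function. The function name fg_probable_causes and its three count arguments are hypothetical; the counts would be filled in by hand from the flashgrid-cluster drives output. The function only returns the candidate causes listed in the table, not a diagnosis.

```shell
# Hypothetical helper mirroring the symptoms table; not a FlashGrid tool.
# Usage: fg_probable_causes NODES_AFFECTED OFFLINE_ON_NODE DISKS_ON_NODE
#   NODES_AFFECTED   number of nodes with at least one offline disk
#   OFFLINE_ON_NODE  offline disks on the affected node (used when NODES_AFFECTED is 1)
#   DISKS_ON_NODE    total disks attached to that node
fg_probable_causes() {
    nodes=$1 offline=$2 total=$3
    if [ "$nodes" -eq 0 ]; then
        echo "No offline disks"
    elif [ "$nodes" -ge 2 ]; then
        echo "Stability problem on one of the nodes; network disruption"
    elif [ "$offline" -ge "$total" ]; then
        # Checked before the single-disk case, so a node whose only disk is
        # offline is treated as "all disks on the node offline".
        echo "Stability problem on the node; storage disruption in the cloud AZ; network disruption"
    elif [ "$offline" -eq 1 ]; then
        echo "Disk problem (may be transient); stability problem on the node; network disruption (less likely)"
    else
        # Several, but not all, disks on one node are offline.
        echo "Stability problem on the node; storage disruption in the cloud AZ; network disruption (less likely)"
    fi
}
```

For example, fg_probable_causes 1 3 4 (three of four disks on one node offline) returns the causes for the "multiple disks, but some disks on the same node are online" row. The check order is a judgment call for the one case the table leaves ambiguous: a node with a single disk that goes offline.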