Symptoms
One or more disks were taken offline by ASM.
Solution
- Identify which disks are offline, and in which disk group, by running the following command on any node:
flashgrid-cluster drives
- Bring the disk(s) online as soon as possible to restore full redundancy, unless the problem has already recurred after the disks were previously brought back online.
  - To online all disks attached to a particular node, run the following command as user fg@ on that node:
    flashgrid-node online
  - To online a particular disk (except quorum disks), run as user grid@:
    asmcmd online -G DGNAME -D 'HOSTNAME$DISKNAME'
  - To online a particular quorum disk (a disk on a quorum node), run as user grid@:
    asmcmd online -G DGNAME -q -D 'HOSTNAME$DISKNAME'
  - To online all disks in a disk group, run as user grid@:
    asmcmd online -G DGNAME -a
  - Note: the asmcmd command can be executed on any database node. It is not available on quorum or storage nodes.
- Wait for the disk group resync to complete, then confirm that the disk stays online by running:
flashgrid-cluster drives
- To prevent the problem from recurring, identify its cause:
  - Check the following log on the affected node for clues that may explain why the disk went offline:
    /opt/flashgrid-diags/log/node_monitor-all.log
  - Use the table below to identify the probable reason for the disk(s) being offline.
  - Upload cluster diags to FlashGrid support and open a FlashGrid support ticket with a problem description.
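In the single-disk commands above, the `$` in `'HOSTNAME$DISKNAME'` is a literal part of the ASM disk name, which is why the name is single-quoted: without the quotes, the shell would expand `$DISKNAME` as a variable. The minimal Bash sketch below only prints the command for review rather than running it. The function name `online_disk_cmd` and the sample names `DATA`, `node1`, and `DISK1` are hypothetical; it assumes you then run the printed command as user grid@ on a database node.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of FlashGrid or Oracle tooling): prints the
# asmcmd command that would bring one disk online. It executes nothing; review
# the output, then run it as user grid@ on a database node.
online_disk_cmd() {
  local dg="$1" host="$2" disk="$3" kind="${4:-regular}"
  # ASM disk names have the form HOSTNAME$DISKNAME; "\$" yields a literal '$'.
  local name="${host}\$${disk}"
  if [ "$kind" = "quorum" ]; then
    # Quorum disks (disks on a quorum node) take the additional -q option.
    printf "asmcmd online -G %s -q -D '%s'\n" "$dg" "$name"
  else
    # The single quotes in the output stop the shell from expanding $DISKNAME.
    printf "asmcmd online -G %s -D '%s'\n" "$dg" "$name"
  fi
}
```

For example, `online_disk_cmd DATA node1 DISK1` prints `asmcmd online -G DATA -D 'node1$DISK1'`. Printing instead of executing keeps a human in the loop, which matters here because a disk that keeps going offline should not simply be brought online again.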
Determining the cause of the problem
Several failure types may cause ASM disk(s) to be taken offline. The table below, together with the log files, can help narrow down the cause of the problem.
Symptoms | Possible causes |
---|---|
A single disk is offline | |
Multiple disks belonging to the same node are offline, but some disks on the same node are online | |
All disks belonging to the same node are offline | |
Multiple disks belonging to 2 or more different nodes are offline | |
Notes:
- A single disk going offline may be caused by a transient error that will not repeat. However, if the same disk is taken offline more than once, the disk may be damaged and may need to be replaced.
- The most common causes of node instability are connection storms and server hardware problems. Other possible causes include heavy swapping, out-of-memory conditions, and excessive CPU load; these are less likely on a properly configured system.
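To check whether the same disk has been taken offline more than once, a quick scan of the node log can help. The minimal Bash sketch below assumes, without confirmation from FlashGrid documentation, that entries in /opt/flashgrid-diags/log/node_monitor-all.log recording a disk going offline contain both the ASM disk name and the word "offline". The function names `show_offline_events` and `count_offline_events` are hypothetical, and the matching lines should still be reviewed by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for scanning a FlashGrid node log.
# ASSUMPTION: lines that record a disk going offline contain both the disk
# name and the word "offline". This log format is not confirmed; always
# review the matching lines manually.

# Print matching lines, each prefixed with its line number in the log.
show_offline_events() {
  local log="$1" disk="$2"
  # -F: fixed string (the name contains a literal '$'); -i: ignore case;
  # -w: whole-word match, so DISK1 does not also match DISK10; -n: line numbers.
  grep -inwF -- "$disk" "$log" | grep -i 'offline'
}

# Print the number of matching lines (0 if there are none).
count_offline_events() {
  local log="$1" disk="$2"
  grep -iwF -- "$disk" "$log" | grep -ci 'offline'
}
```

For example: `count_offline_events /opt/flashgrid-diags/log/node_monitor-all.log 'node1$DISK1'`. Keep the disk name in single quotes for the same reason as in the asmcmd commands.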