Symptoms
One or more disks were taken offline by ASM
Solution
- Bring the disk(s) online as soon as possible to restore full redundancy (unless the problem repeats after the disks are onlined)
  - To online all disks attached to a particular node, run as user fg@ on that node:
    flashgrid-node online
  - To online a particular disk (except quorum disks), run as user grid@:
    asmcmd online -G DGNAME -D 'HOSTNAME$DISKNAME'
  - To online a particular quorum disk (a disk on a quorum node), run as user grid@:
    asmcmd online -G DGNAME -q -D 'HOSTNAME$DISKNAME'
  - To online all disks in a disk group, run as user grid@:
    asmcmd online -G DGNAME -a
  - Note: the asmcmd command can be executed on any database node. It is not available on quorum or storage nodes.
- Wait for the disk group resync to complete and confirm that the disk stays online by running
flashgrid-dg
- Determine the cause of the problem to avoid it happening again
  - Check /opt/flashgrid-diags/logs/flashgrid-node-monitor-all.log on the node that the disk belongs to for clues about why the disk went offline.
  - Use the table below to determine the likely cause of the disk(s) going offline.
  - Collect cluster diags (flashgrid-diags --all), upload them to FlashGrid support, and contact FlashGrid support with a description of the problem.
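
The asmcmd variants listed in the Solution steps differ only in their flags, so a small wrapper can reduce typing mistakes during an incident. The following is a minimal sketch, not a FlashGrid-supplied tool: the function name online_cmd, its modes (all, disk, quorum), and its argument order are hypothetical. It only prints the command so that it can be reviewed before being run as user grid@ on a database node.

```shell
# Hypothetical helper: prints (does not execute) one of the asmcmd online
# commands from the Solution steps.
# Usage: online_cmd DGNAME all
#        online_cmd DGNAME disk   HOSTNAME DISKNAME
#        online_cmd DGNAME quorum HOSTNAME DISKNAME
online_cmd() {
    local dg="$1" mode="$2" host="${3:-}" disk="${4:-}"

    # The per-disk modes need both a host name and a disk name.
    if [ "$mode" = "disk" ] || [ "$mode" = "quorum" ]; then
        if [ -z "$host" ] || [ -z "$disk" ]; then
            echo "mode '$mode' requires HOSTNAME and DISKNAME" >&2
            return 1
        fi
    fi

    case "$mode" in
        all)
            printf "asmcmd online -G %s -a\n" "$dg"
            ;;
        disk)
            # The printed single quotes keep the shell from expanding $DISKNAME
            # when the command is later pasted and run.
            printf "asmcmd online -G %s -D '%s\$%s'\n" "$dg" "$host" "$disk"
            ;;
        quorum)
            printf "asmcmd online -G %s -q -D '%s\$%s'\n" "$dg" "$host" "$disk"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

For example, with a made-up disk group DATA, host rac1, and disk xvdba, online_cmd DATA disk rac1 xvdba prints asmcmd online -G DATA -D 'rac1$xvdba'.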
Determining the cause of the problem
Several failure types may cause ASM disk(s) to be taken offline. The table below together with the log files can help with determining the exact cause of the problem.
Symptoms | Possible causes |
---|---|
A single disk is offline | Disk problem (may be transient); stability problem on the node where the disk is attached; network disruption (less likely) |
Multiple disks belonging to the same node are offline, but some disks on the same node are online | Stability problem on the node where the offline disks are attached; network disruption (less likely) |
All disks belonging to the same node are offline | Stability problem on the node where the offline disks are attached; network disruption |
Multiple disks belonging to 2 or more different nodes are offline | Stability problem on one of the nodes; network disruption |
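
The table can also be read as a simple decision procedure. The sketch below encodes it as a shell function for use in triage notes or scripts. The function name likely_causes and its three numeric arguments are hypothetical, and the counts must be gathered by hand (for example from the flashgrid-dg output). One assumption goes beyond the table: when a node has only one disk and it is offline, the "single disk" row is applied.

```shell
# Hypothetical helper encoding the symptoms table.
# Arguments:
#   $1 - number of nodes that have at least one offline disk
#   $2 - total number of offline disks
#   $3 - total number of disks on the affected node (only used when $1 is 1)
likely_causes() {
    local nodes="$1" offline="$2" total="${3:-0}"

    if [ "$offline" -lt 1 ]; then
        echo "no disks offline" >&2
        return 1
    fi

    if [ "$nodes" -ge 2 ]; then
        echo "stability problem on one of the nodes; network disruption"
    elif [ "$offline" -eq 1 ]; then
        # Assumption: the single-disk row wins even if the node has one disk.
        echo "disk problem (may be transient); node stability problem; network disruption (less likely)"
    elif [ "$offline" -lt "$total" ]; then
        echo "node stability problem; network disruption (less likely)"
    else
        echo "node stability problem; network disruption"
    fi
}
```

The output is only a starting point for the investigation. The monitor log and cluster diags remain the authoritative sources.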
Notes:
- A single disk going offline may be caused by a transient error that will not repeat. However, if the same disk is taken offline more than once then the disk may be damaged and might need replacement.
- A stability problem on a node may be caused by heavy swapping, out-of-memory conditions, excessive CPU load, or kernel problems that do not result in an immediate node crash but severely impact performance.
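
The stability factors named in the notes (swapping, memory exhaustion, CPU load) can be spot-checked with standard Linux tools on the affected node. The sketch below is generic Linux, not FlashGrid-specific. The function names swap_used_kb and node_stability_report are hypothetical, and no thresholds are implied, so the output has to be interpreted against the node's normal baseline.

```shell
# Prints swap in use (kB), computed from the SwapTotal and SwapFree fields of
# a /proc/meminfo-style file. Defaults to /proc/meminfo.
swap_used_kb() {
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {free = $2}
         END {print total - free}' "${1:-/proc/meminfo}"
}

# Hypothetical one-shot report; each check tolerates a missing tool or
# insufficient privileges instead of aborting.
node_stability_report() {
    echo "Swap used (kB): $(swap_used_kb)"
    echo "Load averages (1, 5, 15 min) and run queue:"
    cat /proc/loadavg
    echo "Swap activity (si/so columns), 3 samples at 5 s intervals:"
    vmstat 5 3 || echo "vmstat not available"
    echo "Kernel out-of-memory events:"
    dmesg 2>/dev/null | grep -i "out of memory" || echo "none found (or dmesg not readable)"
}
```

Running node_stability_report shortly after a disk goes offline, and again once the node is idle, gives a rough comparison. Sustained non-zero si/so values or out-of-memory messages would point toward the node stability causes in the table.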