To replace a failed SSD in a running cluster
Use the flashgrid-cluster drives command to determine the following information about the failed SSD:
- FlashGrid name of the SSD, e.g. rac2.failedserialnumber
- ASM name of the SSD, e.g. RAC2$FAILEDSERIALNUMBER
- slot number where the SSD is installed
- whether the ASM disk is online, offline, or dropped (ASMStatus=N/A)
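Judging only from the two example names above, the ASM name looks like the FlashGrid name in upper case with the dot replaced by a dollar sign. The sketch below encodes that assumption in a hypothetical helper, flashgrid_to_asm_name, which is not part of any FlashGrid tool. Always confirm the real ASM name in the flashgrid-cluster drives output rather than relying on this mapping.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a FlashGrid utility): derive the ASM disk name
# from the FlashGrid drive name.
# ASSUMPTION: the mapping is "upper-case, '.' becomes '$'", inferred only
# from the examples rac2.failedserialnumber -> RAC2$FAILEDSERIALNUMBER.
flashgrid_to_asm_name() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | tr '.' '$'
}

# Single quotes are not needed here because the input contains no '$'.
flashgrid_to_asm_name rac2.failedserialnumber
```

When passing the resulting ASM name to other shell commands, keep it in single quotes so the shell does not try to expand the part after the dollar sign as a variable.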
Drop the failed SSD from the ASM disk group if it has not been dropped yet. Examples:
a. If the failing ASM disk is still online:
SQL> alter diskgroup MYDG drop disk RAC2$FAILEDSERIALNUMBER rebalance wait;
b. If the failed ASM disk is offline, but has not been dropped by ASM:
SQL> alter diskgroup MYDG drop disk RAC2$FAILEDSERIALNUMBER force;
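The choice between the two statements can be sketched as a small shell function. This is only an illustration: asm_drop_statement is a hypothetical helper, and the status labels online, offline, and dropped are assumed inputs that you supply by hand, not values parsed from flashgrid-cluster output. The function prints the statement and does not execute it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ALTER DISKGROUP statement that matches
# the state of the failed disk. Nothing is executed against ASM.
# ASSUMPTION: the caller passes one of: online, offline, dropped.
asm_drop_statement() {
    local diskgroup="$1" disk="$2" status="$3"
    case "$status" in
        online)
            # Disk still online: normal drop, wait for the rebalance.
            printf 'alter diskgroup %s drop disk %s rebalance wait;\n' "$diskgroup" "$disk"
            ;;
        offline)
            # Disk offline but not yet dropped by ASM: forced drop.
            printf 'alter diskgroup %s drop disk %s force;\n' "$diskgroup" "$disk"
            ;;
        dropped)
            # Already dropped by ASM: there is nothing to run.
            ;;
        *)
            echo "unknown status: $status" >&2
            return 1
            ;;
    esac
}

# Single quotes keep the shell from expanding $FAILEDSERIALNUMBER.
asm_drop_statement MYDG 'RAC2$FAILEDSERIALNUMBER' online
```

Printing the statement instead of running it leaves the operator to review it and paste it into SQL*Plus, which seems the safer default for a destructive operation.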
Use the flashgrid-node utility to power off the node where the failed SSD is located:
# flashgrid-node poweroff
Physically remove the failed SSD.
Install the new SSD in the same PCIe slot.
Power on the node.
Determine the FlashGrid name of the new SSD, e.g. rac2.newserialnumber
Add the new SSD to the ASM disk group that the failed SSD was in. Example:
$ flashgrid-dg add-disks -G MYDG -d /dev/flashgrid/rac2.newserialnumber
If you have to re-add the same SSD that was used before, or add a different SSD that already has ASM metadata on it, use the force option:
$ flashgrid-dg add-disks -G MYDG -d /dev/flashgrid/rac2.newserialnumber -f
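As a rough sketch, the two add-disks variants can be assembled by a small wrapper that appends -f only when asked. The function name build_add_disks_cmd and its yes/no third argument are hypothetical; the only flashgrid-dg options used are the ones shown in the examples above (-G, -d, -f). The function prints the command line for review and does not run it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the flashgrid-dg command for adding a disk.
# Arguments: disk group, FlashGrid drive name, optional "yes" to force.
# Only the -G, -d and -f options from the examples above are used.
build_add_disks_cmd() {
    local diskgroup="$1" fg_name="$2" force="${3:-no}"
    local cmd="flashgrid-dg add-disks -G $diskgroup -d /dev/flashgrid/$fg_name"
    if [ "$force" = "yes" ]; then
        # The SSD already carries ASM metadata (re-added or reused drive).
        cmd="$cmd -f"
    fi
    printf '%s\n' "$cmd"
}

build_add_disks_cmd MYDG rac2.newserialnumber        # brand-new SSD
build_add_disks_cmd MYDG rac2.newserialnumber yes    # SSD with old ASM metadata
```

Defaulting to the non-forced form means the force option has to be requested explicitly, so existing ASM metadata is not overwritten by accident.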