This article provides instructions for replacing a damaged OS (root) disk of one database node by cloning it from a healthy database node. This may be needed when no good backup of the disk is available.
Prerequisites:
- The disk containing the Oracle software binaries filesystem (typically /u01) on the damaged node is unaffected; this procedure replaces the OS (root) disk only.
- At least one database node of the cluster is healthy.
Note: the instructions in this article should be used only in an emergency situation when the OS (root) disk is damaged and no valid backup of it exists. If you have been following a proper backup procedure, see the instructions for restoring from a backup instead.
Note: before using this procedure, open a ticket with FlashGrid Support.
Note: throughout this article, rac1 is used as the host name of the damaged node and rac2 as the host name of the healthy node.
Cloning OS disk from a healthy database node
a) create a snapshot of the rac2-root volume (the OS disk of the healthy node rac2), choosing the "Full" snapshot type and "Disable public and private access" in the Networking options
b) locate the snapshot and select the "Create disk" option. Name the new disk rac1-new-root and create it in the availability zone where the affected instance is located
c) stop the damaged cluster node (rac1)
d) on the damaged node (rac1), open the Disks blade, select "Swap OS Disks", and choose the new disk rac1-new-root
e) start the node (rac1)
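Note: the same sequence can also be performed with the Azure CLI instead of the portal. The commands below are only a minimal sketch: the resource group name rac-rg, the snapshot name rac2-root-snap, and the zone number 1 are placeholder assumptions, and the parameters should be verified against your Azure CLI version before use. A full (non-incremental) snapshot is the default for az snapshot create.
- create a full snapshot of the healthy node's OS disk with network access disabled:
az snapshot create --resource-group rac-rg --name rac2-root-snap --source rac2-root --network-access-policy DenyAll
- create the new disk from the snapshot in the availability zone of the affected instance (zone 1 assumed):
az disk create --resource-group rac-rg --name rac1-new-root --source rac2-root-snap --zone 1
- stop (deallocate) the damaged node:
az vm deallocate --resource-group rac-rg --name rac1
- swap the OS disk of rac1 to the new disk:
az vm update --resource-group rac-rg --name rac1 --os-disk "$(az disk show --resource-group rac-rg --name rac1-new-root --query id --output tsv)"
- start the node:
az vm start --resource-group rac-rg --name rac1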
Reconfiguration steps after OS volume replacement
Log in to the node (rac1). Note: you will receive a warning about a changed hostname
a) stop CLAN service
# systemctl stop flashgrid-clan.service
b) change the hostname rac2 -> rac1:
# hostnamectl set-hostname rac1.<mydomain>
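The change can be verified immediately; hostnamectl should now report the static hostname rac1.<mydomain>:
# hostnamectl status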
c) change the HOSTNAME value in /etc/sysconfig/network (e.g. rac2 to rac1)
Expected value example: HOSTNAME=rac1
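If a non-interactive edit is preferred, a simple substitution can be used, assuming rac2 appears in this file only in the HOSTNAME line:
# sed -i 's/rac2/rac1/g' /etc/sysconfig/network
# cat /etc/sysconfig/network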
d) adjust iptables rules
- temporarily remove the immutable attribute
# chattr -i /etc/sysconfig/iptables
- replace 192.168.1.X with 192.168.1.Y in the /etc/sysconfig/iptables file, where 192.168.1.X and 192.168.1.Y are the fg-pub network IP addresses of the rac1 and rac2 nodes respectively (e.g. 192.168.1.1 and 192.168.1.2)
- restart iptables service:
# systemctl restart iptables
- add back the immutable attribute:
# chattr +i /etc/sysconfig/iptables
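The address substitution described above can also be done with sed (run it between the chattr -i and chattr +i commands). The command below uses the example addresses, with X = 192.168.1.1 and Y = 192.168.1.2; verify the actual fg-pub addresses of both nodes before running it:
# sed -i 's/192\.168\.1\.1/192.168.1.2/g' /etc/sysconfig/iptables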
e) edit the iSCSI initiator name:
- temporarily remove the immutable attribute
# chattr -i /etc/iscsi/initiatorname.iscsi
- edit the /etc/iscsi/initiatorname.iscsi file and replace the hostname (rac2 with rac1)
Expected value: InitiatorName=iqn.2015-04.io.flashgrid:rac1
- add back the immutable attribute:
# chattr +i /etc/iscsi/initiatorname.iscsi
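As in the previous step, the replacement can be scripted (run it between the chattr -i and chattr +i commands); this assumes the file contains the single InitiatorName line shown above:
# sed -i 's/rac2/rac1/' /etc/iscsi/initiatorname.iscsi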
f) deploy the CLAN configuration
# flashgrid-clan-cfg deploy-config-local
g) deploy FlashGrid Storage Fabric configuration
# flashgrid-cluster deploy-config-local
h) rename /etc/oracle/scls_scr/rac2
# mv /etc/oracle/scls_scr/rac2 /etc/oracle/scls_scr/rac1
i) edit the /etc/oracle/olr.loc file so that olrconfig_loc points to /u01/app/grid/crsdata/rac1/olr/rac1_19.olr (in case of a customized GI home path, adjust the path accordingly).
Example of the file contents:
[root@rac1 ~]# cat /etc/oracle/olr.loc
olrconfig_loc=/u01/app/grid/crsdata/rac1/olr/rac1_19.olr
crs_home=/u01/app/19.3.0/grid
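Because the hostname appears twice in the olrconfig_loc path (in the directory name and in the file name), a global substitution is usually sufficient; verify the result against the example above, especially if the GI home path was customized:
# sed -i 's/rac2/rac1/g' /etc/oracle/olr.loc
# cat /etc/oracle/olr.loc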
j) reboot rac1 node
# sync; sync
# reboot
k) check cluster status
# flashgrid-cluster
Wait until all disks are back online and resyncing operations have completed on all disk groups before starting the database instances on the node.
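To monitor the resynchronization without retyping the command, it can be re-run periodically, for example every 30 seconds with watch:
# watch -n 30 flashgrid-cluster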
l) on the rac1 node, edit the .bashrc file in the Grid Infrastructure owner's (grid) home directory and set the correct ASM instance number in the ORACLE_SID variable. Example:
export ORACLE_SID=+ASM1
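The edit can also be made with a one-line sed command run as the grid user on rac1; this assumes a single export ORACLE_SID line in .bashrc and that +ASM1 is the correct instance number for this node:
sed -i 's/^export ORACLE_SID=.*/export ORACLE_SID=+ASM1/' ~/.bashrc
grep ORACLE_SID ~/.bashrc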
m) on all other cluster nodes, run the following command to remove the old rac1 host key as user fg (all nodes) and as users grid and oracle (database nodes):
ssh-keygen -R rac1
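As a convenience, the per-user commands can be wrapped in a single loop run as root on each node (shorten the user list on nodes that only have the fg user):
# for u in fg grid oracle; do su - "$u" -c "ssh-keygen -R rac1"; done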
n) on the rac1 node, any third-party software (e.g. monitoring agents) that has host-specific configuration must be reconfigured to reflect the new rac1 environment.
o) After all the steps above are complete, upload diags to FlashGrid support. A new license file will need to be generated for the rac1 host.