This article provides instructions for replacing a damaged OS (root) disk of one database node by cloning it from a healthy database node. This may be needed when no good backup of the disk is available.
Prerequisites:
- The disk containing the Oracle software binaries filesystem (typically /u01) on the damaged node is unaffected. The procedure replaces the OS (root) disk only.
- At least one database node of the cluster is healthy.
Note: before using this procedure, open a ticket with FlashGrid Support.
Note: throughout this article, rac1 is used as the host name of the damaged node and rac2 as the host name of the healthy node.
Cloning OS disk from a healthy database node
a) create a snapshot of the /dev/sda1 volume of the healthy database node rac2
b) use the snapshot to create a new volume in the availability zone where the affected instance (rac1) is located
c) stop the damaged cluster node (rac1)
d) detach the /dev/sda1 volume from node rac1
e) attach the volume created at step b) as /dev/sda1
f) start the node (rac1)
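For reference, steps a) through f) can be performed with the AWS CLI, assuming the cluster runs on AWS EC2 (the /dev/sda1 device name and availability-zone terminology suggest this); the volume, snapshot, and instance IDs and the zone name below are placeholders, and the same operations can be done from the cloud console instead:
$ aws ec2 create-snapshot --volume-id vol-RAC2ROOT --description "clone of rac2 root volume"
$ aws ec2 create-volume --snapshot-id snap-FROMRAC2 --availability-zone us-east-1a
$ aws ec2 stop-instances --instance-ids i-RAC1
$ aws ec2 detach-volume --volume-id vol-RAC1ROOT
$ aws ec2 attach-volume --volume-id vol-NEW --instance-id i-RAC1 --device /dev/sda1
$ aws ec2 start-instances --instance-ids i-RAC1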
Reconfiguration steps after OS volume replacement
a) stop CLAN service
# systemctl stop flashgrid-clan.service
b) change the hostname rac2 -> rac1:
# hostnamectl set-hostname rac1.<mydomain>
c) change the HOSTNAME value in /etc/sysconfig/network from rac2 to rac1
Expected value example: HOSTNAME=rac1
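One way to make this change (shown with the example hostnames used in this article):
# sed -i 's/^HOSTNAME=.*/HOSTNAME=rac1/' /etc/sysconfig/network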
d) adjust iptables rules
- temporarily remove the immutable attribute
# chattr -i /etc/sysconfig/iptables
- replace 192.168.1.X with 192.168.1.Y in the /etc/sysconfig/iptables file, where 192.168.1.X and 192.168.1.Y are the fg-pub network IP addresses of the rac1 and rac2 nodes respectively, e.g. 192.168.1.1 and 192.168.1.2
- restart the iptables service:
# systemctl restart iptables
- add back the immutable attribute:
# chattr +i /etc/sysconfig/iptables
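For reference, the IP substitution in step d) can be done with sed while the immutable attribute is removed; the addresses below are only the example values from this article and must be replaced with the actual fg-pub addresses of the cluster (verify the result before restarting iptables):
# sed -i 's/192\.168\.1\.1/192.168.1.2/g' /etc/sysconfig/iptables
# grep 192.168.1. /etc/sysconfig/iptables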
e) edit the iSCSI initiator name:
- temporarily remove the immutable attribute
# chattr -i /etc/iscsi/initiatorname.iscsi
- edit the /etc/iscsi/initiatorname.iscsi file and replace the hostname (rac2 with rac1)
Expected value: InitiatorName=iqn.2015-04.io.flashgrid:rac1
- add back the immutable attribute:
# chattr +i /etc/iscsi/initiatorname.iscsi
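Similarly, the initiator name edit in step e) can be done with sed while the immutable attribute is removed (using the example hostnames from this article):
# sed -i 's/rac2/rac1/' /etc/iscsi/initiatorname.iscsi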
f) deploy the CLAN configuration
# flashgrid-clan-cfg deploy-config-local
g) deploy the FlashGrid Storage Fabric configuration
# flashgrid-cluster deploy-config-local
h) rename /etc/oracle/scls_scr/rac2 to /etc/oracle/scls_scr/rac1
# mv /etc/oracle/scls_scr/rac2 /etc/oracle/scls_scr/rac1
i) edit the /etc/oracle/olr.loc file so that olrconfig_loc points to /u01/app/grid/crsdata/rac1/olr/rac1_19.olr (if a customized GI home path is used, adjust the path accordingly).
Example of the file contents:
[root@rac1 ~]# cat /etc/oracle/olr.loc
olrconfig_loc=/u01/app/grid/crsdata/rac1/olr/rac1_19.olr
crs_home=/u01/app/19.3.0/grid
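With the default paths shown above, the edit can also be done with sed (adjust if a customized GI home path is used):
# sed -i 's/rac2/rac1/g' /etc/oracle/olr.loc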
j) reboot rac1 node
# sync; sync
# reboot
k) check cluster status
# flashgrid-cluster
Wait until all disks are back online and resync operations on all disk groups have completed before starting the database instances on the node.
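One convenient way to monitor the progress is to re-run the status command periodically, e.g.:
# watch -n 30 flashgrid-cluster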
l) on the rac1 node, edit the .bashrc file in the home directory of the GI owner (grid user) and set the correct ASM instance number in the ORACLE_SID variable. Example:
export ORACLE_SID=+ASM1
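If it is unclear which ASM instance number belongs to this node, one way to confirm it (an assumption about method, not part of the original procedure) is to check the ASM instance running locally after Grid Infrastructure has started:
# ps -ef | grep asm_pmon
The suffix of the process name (e.g. asm_pmon_+ASM1) is the value to use for ORACLE_SID.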
m) on all other cluster nodes run as users fg (all nodes), grid (database nodes), and oracle (database nodes):
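As one example of such a per-user command (an assumption, not confirmed by this article), stale SSH known_hosts entries for rac1 can be removed, since the cloned OS disk changes rac1's SSH host keys:
$ ssh-keygen -R rac1
$ ssh-keygen -R rac1.<mydomain>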
n) on the rac1 node, reconfigure any third-party software (e.g. monitoring agents) that has host-specific configuration so that it reflects the rac1 environment.
o) after all the steps above are complete, upload diags to FlashGrid Support.