You may need to resize cluster node VMs for performance or cost reasons. Nodes can be resized one at a time without causing database downtime.
ATTENTION! The procedure below is mandatory when resizing from a newer instance type (r5b, r6i, m6i, c6i) to an older instance type (r5, m5, c5). Resizing instances while the cluster is stopped (disk groups not mounted) will cause the disk groups to fail to mount after the resize.
Preparation
Before resizing nodes, confirm that the target instance type is supported by the FlashGrid software.
On each node to be resized, run the following command to list the supported instance types:
$ flashgrid-clan-cfg show-supported-instance-types
Detected EC2 instance type: c5.xlarge
Supported instance types:
c5.12xlarge
c5.18xlarge
c5.24xlarge
c5.2xlarge
[...]
You can use grep to find a particular instance type, for example:
$ flashgrid-clan-cfg show-supported-instance-types | grep i3.metal
i3.metal
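A plain grep also prints the "Detected EC2 instance type" line whenever the target type is the node's current type, so grep c5.xlarge on the output above would print a line even if c5.xlarge were missing from the supported list. A minimal sketch of a stricter check, assuming the command prints one instance type per line as shown above; is_supported_type is a hypothetical helper, not a FlashGrid command:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FlashGrid): succeeds only if the instance
# type given as $1 appears as a whole line in the list read from stdin.
# Assumes one instance type per line, as in the example output above.
is_supported_type() {
  local wanted="$1"
  # Trim leading/trailing whitespace, then match the whole line as a fixed
  # string, so "Detected EC2 instance type: c5.xlarge" never counts as a match.
  sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -Fxq -- "$wanted"
}

# Usage on a node to be resized:
#   if flashgrid-clan-cfg show-supported-instance-types | is_supported_type r6a.2xlarge
#   then echo "supported"; else echo "NOT supported"; fi
```

Matching whole lines as fixed strings (grep -Fx) avoids both partial matches and the regular-expression meaning of the dot in instance type names.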
If the target instance type is not listed in the output, it is not supported by the installed version of the FlashGrid CLAN software.
You can check the Release Notes: Cloud Area Network software to determine whether the instance type has been added in a newer release. To check which version is installed, run:
$ rpm -qa flashgrid-clan
flashgrid-clan-21.8.292.58899.3a5ad336.release-1.el8.x86_64
To get access to a newer version of the FlashGrid software, or to request a review of your resizing options, open a support case with FlashGrid.
Resize Node
To resize database nodes in a running cluster, repeat the following steps on each database node, one node at a time:
- Update the SGA and PGA sizing parameters of the databases to match the new VM memory size.
- Skip this step unless you have manually configured the vm.nr_hugepages parameter in /etc/sysctl.conf. If you have, update it according to the new VM memory size. Note that starting with Storage Fabric 19.02, HugePages are configured automatically by default and no manual change is required.
- Make sure that no other node is offline or resyncing. All disk groups must have zero offline disks and Resync = No:
# flashgrid-cluster
- Stop all local database instances running on the node.
- Stop Oracle CRS on the node:
# crsctl stop crs
- Stop the FlashGrid Storage Fabric services on the node:
# flashgrid-node stop
- Stop the VM using the AWS console.
- Resize the VM (change its instance type) using the AWS console: Actions > Instance settings > Change instance type.
- Start the VM using the AWS console.
- Wait until all disks are back online and resync operations have completed on all disk groups. All disk groups must have zero offline disks and Resync = No:
# flashgrid-cluster
- Start all database instances on the node.
- Proceed to the next node.
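If vm.nr_hugepages is configured manually, the new value can be estimated as the total SGA of all database instances on the node divided by the huge page size, rounded up. A minimal arithmetic sketch, assuming you already know the SGA sizes planned for the new VM size and that the page size is taken from Hugepagesize in /proc/meminfo (commonly 2048 kB on x86_64); hugepages_for_sga is a hypothetical helper, and any headroom you add on top is your own choice:

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of huge pages needed to hold a given total SGA.
#   $1 = total SGA of all database instances on the node, in MiB
#   $2 = huge page size in kB (defaults to Hugepagesize from /proc/meminfo)
# Rounds up so that a final partial page is still counted.
hugepages_for_sga() {
  local sga_mib="$1"
  local page_kb="${2:-$(awk '/^Hugepagesize:/ {print $2}' /proc/meminfo)}"
  local sga_kb=$(( sga_mib * 1024 ))
  echo $(( (sga_kb + page_kb - 1) / page_kb ))
}

# Example: two instances with 24 GiB and 8 GiB SGA on the resized VM,
# 2 MiB huge pages:
#   hugepages_for_sga $(( 24576 + 8192 )) 2048    # prints 16384
```

The result would then go into the vm.nr_hugepages line in /etc/sysctl.conf.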
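On a database node, the stop steps above must run in order: local database instances, then Oracle CRS, then the FlashGrid Storage Fabric services. A minimal sketch of that order, assuming it runs as root, the database software owner is the OS user oracle with srvctl on its PATH, crsctl is on root's PATH, and the databases are passed by db_unique_name; stop_db_node and start_db_instances are hypothetical helpers. Setting RUN=echo prints the commands instead of executing them:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for ONE database node. Assumptions (not from FlashGrid
# docs): run as root; OS user "oracle" owns the database software and has
# srvctl on its PATH; $1 is the Clusterware node name; remaining arguments are
# db_unique_names with an instance on this node.
# Set RUN=echo for a dry run that only prints the commands.
stop_db_node() {
  local node="$1"; shift
  local db
  for db in "$@"; do
    # Stop only this database's instance on this node; other nodes keep running.
    ${RUN:-} su - oracle -c "srvctl stop instance -db $db -node $node" || return 1
  done
  # Then stop Oracle Clusterware on this node only.
  ${RUN:-} crsctl stop crs || return 1
  # Finally stop the FlashGrid Storage Fabric services on this node.
  ${RUN:-} flashgrid-node stop
}

# After the VM is back and all disk groups show Resync = No.
start_db_instances() {
  local node="$1"; shift
  local db
  for db in "$@"; do
    ${RUN:-} su - oracle -c "srvctl start instance -db $db -node $node" || return 1
  done
}

# Dry run: RUN=echo stop_db_node rac1 orcl
```

Stopping on the first failure (return 1) keeps the node from reaching the FlashGrid stop step while an instance or CRS is still running.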
To resize storage or quorum nodes in a running cluster, repeat the following steps on each storage/quorum node, one node at a time:
- Make sure that no other node is offline or resyncing. All disk groups must have zero offline disks and Resync = No:
# flashgrid-cluster
- Stop the FlashGrid Storage Fabric services on the node:
# flashgrid-node stop
- Stop the VM using the AWS console.
- Resize the VM (change its instance type) using the AWS console.
- Start the VM using the AWS console.
- Wait until all disks are back online and resync operations have completed on all disk groups. All disk groups must have zero offline disks and Resync = No:
# flashgrid-cluster
- Proceed to the next node.
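Both procedures stop, resize, and start the VM in the AWS console. The same three steps can also be done with the AWS CLI. A minimal sketch, assuming the AWS CLI is installed and configured with credentials and region on a workstation or bastion host (not on the node being resized), and that it is run only after the node-side stop steps above; resize_ec2_instance is a hypothetical helper and RUN=echo prints the commands instead of running them:

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop, resize, and start one EC2 instance with the AWS
# CLI, as an alternative to the three AWS console steps.
#   $1 = EC2 instance ID, $2 = new instance type
# Run from a workstation/bastion, never from the node being resized.
# Set RUN=echo for a dry run that only prints the commands.
resize_ec2_instance() {
  local instance_id="$1" new_type="$2"
  ${RUN:-} aws ec2 stop-instances --instance-ids "$instance_id" || return 1
  # The instance type can only be changed while the instance is stopped.
  ${RUN:-} aws ec2 wait instance-stopped --instance-ids "$instance_id" || return 1
  ${RUN:-} aws ec2 modify-instance-attribute --instance-id "$instance_id" \
    --instance-type "{\"Value\": \"$new_type\"}" || return 1
  ${RUN:-} aws ec2 start-instances --instance-ids "$instance_id" || return 1
  ${RUN:-} aws ec2 wait instance-running --instance-ids "$instance_id"
}

# Example (dry run):
#   RUN=echo resize_ec2_instance i-0123456789abcdef0 r6i.2xlarge
```

Confirm the target type with flashgrid-clan-cfg show-supported-instance-types before running it, then continue with the "wait until all disks are back online" step.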
Troubleshooting
Unsupported instance type
If a node has been resized to an instance type that the installed FlashGrid software does not support, the FlashGrid software will not start after the VM is powered on:
# flashgrid-cluster
FlashGrid 22.3.113.61512 #xxxxxxxxxxxxxxxxxxxxxxxx
License: Active, Expires 2022-11-02
Licensee: Company_test
Support plan: Demo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FlashGrid service is not running on this node. To start the service:
$ sudo flashgrid-node start
================================================================================
The systemd journal (journalctl -xe) shows that flashgrid-clan.service failed because the instance type is not supported:
# journalctl -xe
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://support.oracle.com
--
-- The unit flashgrid-clan.service has entered the 'failed' state with result 'exit-code'.
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: 2022-08-31 06:35:23+0000 [-] Stopping reactor due to fatal error
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: Traceback (most recent call last):
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 322, in addCallback
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 311, in addCallbacks
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 1652, in execute
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: --- <exception caught here> ---
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 1459, in init_service
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 151, in maybeDeferred
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 1160, in reload_config
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 1370, in build_tree
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 1297, in add_master_if
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 1117, in eval_root_bw
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: builtins.Exception: AWS instance type 'r6a.2xlarge' is not supported!
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: 2022-08-31 06:35:23+0000 [-] Main loop terminated.
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: 2022-08-31 06:35:23+0000 [-] Exiting with code 1
Aug 31 06:35:24 rac1 systemd[1]: flashgrid-clan.service: Main process exited, code=exited, status=1/FAILURE
Aug 31 06:35:24 rac1 systemd[1]: flashgrid-clan.service: Failed with result 'exit-code'.
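On a node with a long journal, the relevant message is easier to find by limiting the output to the flashgrid-clan.service unit and the current boot (journalctl -u flashgrid-clan.service -b --no-pager). A minimal sketch that extracts the rejected instance type from that output, assuming the message format shown above; unsupported_type_from_log is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the instance type(s) named in
# "AWS instance type '<type>' is not supported!" messages read from stdin.
# Assumes the message format shown in the log above.
unsupported_type_from_log() {
  sed -n "s/.*AWS instance type '\([^']*\)' is not supported.*/\1/p" | sort -u
}

# Usage on the affected node (as root):
#   journalctl -u flashgrid-clan.service -b --no-pager | unsupported_type_from_log
```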
Remediation options
- Roll back to the previous instance type, and use the Preparation steps at the top of this document to check which instance types are supported.
- Run "flashgrid-clan-cfg show-supported-instance-types" to identify the supported instance types, and resize to one that is listed.
- Check the Release Notes: Cloud Area Network software to determine whether the instance type has been added in a newer release. If it has, you may be able to update the FlashGrid software in place: open a support case with FlashGrid requesting access to the newer release. After the new release is installed, the node requires a reboot. Upgrade the remaining nodes in the cluster before resizing any other nodes.