Resizing cluster node VMs may be needed for performance or cost reasons. Resizing can be done for one node at a time without causing database downtime.
ATTENTION! If you are using an AWS Marketplace Private Offer and plan to switch to a different instance type (e.g. from R6i to R7i), first confirm that your Private Offer covers the new instance type. If your existing Private Offer does not include the required instance type (this may happen if the new instance type was introduced after your Private Offer was created), contact FlashGrid Sales to have a new Private Offer created for you.
ATTENTION! The procedure below is mandatory for resizing from a newer instance type (r5b, r6i, m6i, c6i) to an older instance type (r5, m5, c5). An attempt to resize instances while the cluster is stopped (disk groups not mounted) will result in failure to mount the disk groups after the resize. More details available here.
Preparation
Prior to resizing nodes, first confirm that your chosen instance type is supported by FlashGrid software.
On the nodes that are to be resized, execute the following command to display a list of supported instance types:
$ flashgrid-clan-cfg show-supported-instance-types
Detected EC2 instance type: c5.xlarge
Supported instance types:
c5.12xlarge
c5.18xlarge
c5.24xlarge
c5.2xlarge
[...]
You can use grep to find a particular instance type, e.g.
$ flashgrid-clan-cfg show-supported-instance-types | grep i3.metal
i3.metal
If your chosen instance type is not shown in the output, then it is not supported by the installed version of FlashGrid CLAN software.
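A plain grep treats the dot in an instance type as a wildcard and also matches the "Detected EC2 instance type:" header line, so searching for the type the node currently runs on can report a match even when that type is missing from the supported list. The sketch below does an exact, whole-line match instead. It assumes the command prints one instance type per line, as in the sample output above; `is_supported_type` is a hypothetical helper name, not part of FlashGrid tooling.

```shell
#!/usr/bin/env bash
# is_supported_type TYPE
# Reads a list of instance types on stdin and succeeds only if TYPE appears
# as a complete line. Hypothetical helper, not a FlashGrid command.
is_supported_type() {
    local target="$1"
    # Trim leading/trailing whitespace, then match exactly:
    #   -F  fixed string (the dot is not a wildcard)
    #   -x  the whole line must match (header lines are ignored)
    #   -q  quiet; the result is reported through the exit status
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -Fxq -- "$target"
}

# Usage on a cluster node (r6i.2xlarge is only an example target):
#   flashgrid-clan-cfg show-supported-instance-types | is_supported_type r6i.2xlarge \
#       && echo "supported" || echo "not supported"
```

The function only inspects text on stdin, so it can be tried against saved output before running it on a node.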
You can review Release Notes: Cloud Area Network software to determine if the instance type has been added in a newer release. To check which version you have installed, run:
$ rpm -q flashgrid-clan
flashgrid-clan-21.8.292.58899.3a5ad336.release-1.el8.x86_64
For access to download a newer version of FlashGrid software, or to request a review of your resizing options, raise a support case with FlashGrid.
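Once the release notes name the version that added an instance type, the installed version can be compared against it. The following is a minimal sketch that relies on GNU `sort -V` for version ordering; `version_at_least` is a hypothetical helper name, and `22.3` in the usage comment is only a placeholder, not a statement about which release supports which type.

```shell
#!/usr/bin/env bash
# version_at_least INSTALLED REQUIRED
# Succeeds when INSTALLED is equal to or newer than REQUIRED under
# version-sort ordering (GNU sort -V). Hypothetical helper.
version_at_least() {
    local installed="$1" required="$2"
    # The lower of the two versions sorts first. If that is REQUIRED,
    # then INSTALLED is the same version or a newer one.
    [ "$(printf '%s\n' "$installed" "$required" | sort -V | head -n 1)" = "$required" ]
}

# Usage on a cluster node (22.3 is a placeholder for the version from the release notes):
#   installed="$(rpm -q --queryformat '%{VERSION}' flashgrid-clan)"
#   version_at_least "$installed" 22.3 && echo "new enough" || echo "upgrade needed"
```

The `--queryformat '%{VERSION}'` option prints only the version field of the package, which avoids parsing the full package name by hand.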
Resize Node
To resize the nodes in a running cluster, repeat the following steps on each node, one node at a time:
- Make sure that no other nodes are in an offline or re-syncing state. All disk groups must have zero offline disks and Resync = No:
  # flashgrid-cluster
- If the node is a database node:
  - Update SGA and PGA sizing parameters for the databases according to the new VM memory size.
  - Skip this step unless you have the vm.nr_hugepages parameter manually configured in /etc/sysctl.conf. If you have it manually configured, update the parameter according to the new VM size. Note that starting with Storage Fabric 19.02, HugePages are configured automatically by default and no manual change is required.
  - Stop all local database instances running on the node.
- Power off the node:
  # flashgrid-node poweroff
- Resize the EC2 instance using the AWS console. Custom CPU options can be set under Advanced details in the Change instance type view if needed.
- Start the EC2 instance using the AWS console.
- Wait until all disks are back online and resyncing operations are complete on all disk groups. All disk groups must have zero offline disks and Resync = No:
  # flashgrid-cluster
- If the node is a database node:
  - Start all database instances on the node.
- Proceed to the next node.
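The resize and start steps are described above for the AWS console. For anyone scripting them, a rough AWS CLI equivalent might look like the sketch below. It is not part of the documented procedure and rests on several assumptions: the node has already been powered off with `flashgrid-node poweroff`, the instance's shutdown behavior is "stop" rather than "terminate", and the AWS CLI is configured with permission to modify the instance. `resize_instance`, `run`, and `DRY_RUN` are hypothetical names, and custom CPU options are not covered.

```shell
#!/usr/bin/env bash
# Sketch: change the type of an already powered-off EC2 instance with the AWS CLI.
# resize_instance, run and DRY_RUN are illustrative names, not FlashGrid tooling.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

resize_instance() {
    local instance_id="$1" new_type="$2"
    # The node was powered off from the OS; wait until EC2 reports it as stopped.
    run aws ec2 wait instance-stopped --instance-ids "$instance_id" || return 1
    # The instance type can only be changed while the instance is stopped.
    run aws ec2 modify-instance-attribute --instance-id "$instance_id" \
        --instance-type "Value=$new_type" || return 1
    run aws ec2 start-instances --instance-ids "$instance_id" || return 1
    run aws ec2 wait instance-running --instance-ids "$instance_id"
}

# Example with a placeholder instance ID and type; DRY_RUN=1 only prints the commands:
#   DRY_RUN=1 resize_instance i-0123456789abcdef0 r6i.2xlarge
```

Running with `DRY_RUN=1` first makes it possible to review the exact commands before anything changes. The disk group checks with `flashgrid-cluster` before and after each node still have to be done as described in the steps.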
Troubleshooting
Unsupported instance type
If you have resized to an instance type that is unsupported by the current FlashGrid software installation, FlashGrid software will not start after the server is powered on:
# flashgrid-cluster
FlashGrid 22.3.113.61512 #xxxxxxxxxxxxxxxxxxxxxxxx
License: Active, Expires 2022-11-02
Licensee: Company_test
Support plan: Demo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FlashGrid service is not running on this node. To start the service:
$ sudo flashgrid-node start
================================================================================
A check of system services (journalctl -xe) shows that the instance type is not supported:
# journalctl -xe
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://support.oracle.com
--
-- The unit flashgrid-clan.service has entered the 'failed' state with result 'exit-code'.
lines 1970-2003/2003 (END)
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: 2022-08-31 06:35:23+0000 [-] Stopping reactor due to fatal error
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: Traceback (most recent call last):
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 322, in addCallback
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 311, in addCallbacks
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 1652, in execute
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: --- <exception caught here> ---
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 1459, in init_service
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 151, in maybeDeferred
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 1160, in reload_config
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 1370, in build_tree
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 1297, in add_master_if
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 1117, in eval_root_bw
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: builtins.Exception: AWS instance type 'r6a.2xlarge' is not supported!
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]:
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: 2022-08-31 06:35:23+0000 [-] Main loop terminated.
Aug 31 06:35:23 rac1 flashgrid-clan-daemon[2932]: 2022-08-31 06:35:23+0000 [-] Exiting with code 1
Aug 31 06:35:24 rac1 systemd[1]: flashgrid-clan.service: Main process exited, code=exited, status=1/FAILURE
Aug 31 06:35:24 rac1 systemd[1]: flashgrid-clan.service: Failed with result 'exit-code'.
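The relevant line is easy to miss inside the traceback. As a small sketch, the journal can be filtered for that one message and the rejected instance type extracted. The pattern is taken from the error text in the log above and may differ in other software versions; `unsupported_instance_type` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Reads journal lines on stdin and prints the instance type named in the
# first "AWS instance type '...' is not supported" message, if there is one.
# Hypothetical helper, not a FlashGrid command.
unsupported_instance_type() {
    sed -n "s/.*AWS instance type '\([^']*\)' is not supported.*/\1/p" | head -n 1
}

# Usage on the affected node (current boot, flashgrid-clan unit only, no pager):
#   journalctl -u flashgrid-clan.service -b --no-pager | unsupported_instance_type
```

An empty result means the message was not found in the log that was searched, in which case the CLAN service may have failed for a different reason.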
Remediation options
- Roll back to the previous instance type
- Review the Preparation section at the top of this document to confirm supported instance types
- If the above does not work, open a ticket with FlashGrid Support
Keyword: EC2 instance upgrade