Resizing cluster node VMs may be needed for performance or cost reasons. Nodes can be resized one at a time without causing database downtime.
Preparation
Before resizing nodes, confirm that your chosen instance type is supported by the installed FlashGrid software.
On each node that is to be resized, run the following command to display the list of supported instance types:
$ flashgrid-clan-cfg show-supported-instance-types
Detected Azure instance type: Standard_D8s_v5
Supported instance types:
Standard_D16ads_v5
Standard_D16as_v4
Standard_D16as_v5
[...]
You can use grep to find a particular instance type, e.g.:
$ flashgrid-clan-cfg show-supported-instance-types | grep E48bs
Standard_E48bs_v5
If your chosen instance type is not shown in the output then it is not supported by the installed version of FlashGrid CLAN software.
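The grep example above matches substrings, so a short pattern can match more than one size. As a minimal sketch, the check can be made exact with a small helper. The function name is_supported_type is hypothetical, and the helper assumes the command prints one instance type per line, as in the sample output above.

```shell
# is_supported_type TYPE  (hypothetical helper, not part of FlashGrid software)
# Reads a list of instance types on stdin, one per line, and succeeds only
# if TYPE appears as a whole line, ignoring surrounding whitespace.
is_supported_type() {
    awk -v target="$1" '
        { gsub(/^[ \t]+|[ \t]+$/, "", $0) }   # trim leading and trailing blanks
        $0 == target { found = 1 }            # exact match, not a substring
        END { exit found ? 0 : 1 }
    '
}
```

On a cluster node it could be used as `flashgrid-clan-cfg show-supported-instance-types | is_supported_type Standard_E48bs_v5 && echo supported`.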
You can review Release Notes: Cloud Area Network software to determine if the instance type has been added in a newer release. To check which version you have installed, run:
$ rpm -q flashgrid-clan
flashgrid-clan-21.8.292.58899.3a5ad336.release-1.el8.x86_64
For access to download a newer version of FlashGrid software, or to request a review of your resizing options, raise a support case with FlashGrid.
Additionally, a VM with a local temp disk cannot be resized to a VM size without one, or vice versa. Only the following combinations are allowed:
- VM size with a local temp disk → VM size with a local temp disk (Eadsv5, Ebdsv5, Edsv5, Easv4, Edsv4, Esv3, Dadsv5, Ddsv5, Dasv4, Ddsv4, Dsv3, Lsv2, M, Mv2, Mdsv2, FX) OR
- VM size without a local temp disk → VM size without a local temp disk (Easv5, Ebsv5, Esv5, Esv4, Dasv5, Dsv5, Dsv4, Msv2)
Review Azure documentation to find out whether or not a VM size has a local temp disk.
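The rule above can be sketched as a lookup over the two family lists. This is only an illustration: the function names temp_disk_class and resize_allowed are hypothetical, the family names must be spelled exactly as in the lists above, and any family not listed there is reported as unknown rather than guessed.

```shell
# temp_disk_class FAMILY  (hypothetical helper)
# Prints "with" or "without" for a VM family spelled as in the lists above,
# or "unknown" for anything else.
temp_disk_class() {
    case "$1" in
        Eadsv5|Ebdsv5|Edsv5|Easv4|Edsv4|Esv3|Dadsv5|Ddsv5|Dasv4|Ddsv4|Dsv3|Lsv2|M|Mv2|Mdsv2|FX)
            echo with ;;
        Easv5|Ebsv5|Esv5|Esv4|Dasv5|Dsv5|Dsv4|Msv2)
            echo without ;;
        *)
            echo unknown ;;
    esac
}

# resize_allowed CURRENT_FAMILY TARGET_FAMILY  (hypothetical helper)
# Succeeds only when both families are known and belong to the same class.
resize_allowed() {
    from=$(temp_disk_class "$1")
    to=$(temp_disk_class "$2")
    [ "$from" != unknown ] && [ "$from" = "$to" ]
}
```

For example, `resize_allowed Dsv5 Ebsv5` succeeds because neither family has a local temp disk, while `resize_allowed Dsv3 Esv5` fails.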
Resize Node
To resize the nodes in a running cluster, repeat the following steps on each node, one node at a time:
- Make sure that no other nodes are offline or re-syncing. All disk groups must have zero offline disks and Resync = No:
  # flashgrid-cluster
- If the node is a database node:
  - Update SGA and PGA sizing parameters for the databases according to the new VM memory size.
  - Skip this step unless you have manually configured the vm.nr_hugepages parameter in /etc/sysctl.conf. If you have configured it manually, update it according to the new VM size. Note that starting with Storage Fabric 19.02, HugePages are configured automatically by default and no manual change is required.
  - Stop all local database instances running on the node.
- Stop the FlashGrid Storage Fabric services on the node:
  # flashgrid-node stop
- Stop the VM using the Azure console.
- Resize the VM using the Azure console.
- Start the VM using the Azure console.
- Wait until all disks are back online and resyncing operations complete on all disk groups. All disk groups must have zero offline disks and Resync = No:
  # flashgrid-cluster
- If the node is a database node:
  - Start all database instances on the node.
- Proceed to the next node.
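The procedure above uses the Azure console for the stop, resize and start steps. For those who prefer the Azure CLI, the sketch below prints what are assumed to be the equivalent az commands. It is an alternative to the console, not part of the documented procedure. The function name print_resize_commands is hypothetical, and the resource group, VM name and size are placeholders. The function only prints the commands, so they can be reviewed and run one at a time after the database instances and FlashGrid services on the node have been stopped.

```shell
# print_resize_commands RESOURCE_GROUP VM_NAME NEW_SIZE  (hypothetical helper)
# Prints Azure CLI equivalents of the console stop, resize and start steps.
# Nothing is executed; review the output before running any of it.
print_resize_commands() {
    rg=$1
    vm=$2
    size=$3
    # Deallocate (rather than only power off) so the VM can move to new hardware.
    printf 'az vm deallocate --resource-group %s --name %s\n' "$rg" "$vm"
    printf 'az vm resize --resource-group %s --name %s --size %s\n' "$rg" "$vm" "$size"
    printf 'az vm start --resource-group %s --name %s\n' "$rg" "$vm"
}
```

For example, `print_resize_commands my-rg rac1 Standard_E48bs_v5` prints three commands for a VM named rac1 in a resource group named my-rg.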
Troubleshooting
Unsupported instance type
If you have resized to an instance type that is unsupported by the current FlashGrid software installation, FlashGrid software will not start after the server is powered on:
# flashgrid-cluster
FlashGrid 21.11.23.45125 #xxxxxxxxxxxxxxxxxxxxxxxx
License: Active, Expires 2022-09-29
Licensee: Company_test
Support plan: Demo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FlashGrid service is not running on this node. To start the service:
$ sudo flashgrid-node start
================================================================================
A check of system services (journalctl -xe) shows that the instance type is not supported:
# journalctl -xe
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit flashgrid-clan.service has finished starting up.
--
-- The start-up result is done.
Aug 30 05:11:06 rac1 flashgrid-clan-daemon[2986]: 2022-08-30 05:11:06+0000 [-] Log opened.
Aug 30 05:11:06 rac1 flashgrid-clan-daemon[2986]: 2022-08-30 05:11:06+0000 [-] GCP API failed with code 404
Aug 30 05:11:06 rac1 flashgrid-clan-daemon[2986]: 2022-08-30 05:11:06+0000 [-] EC2 token fetch failed with code 400
Aug 30 05:11:06 rac1 flashgrid-clan-daemon[2986]: 2022-08-30 05:11:06+0000 [-] Detected Azure instance type: Standard_E8bds_v5
Aug 30 05:11:06 rac1 flashgrid-clan-daemon[2986]: 2022-08-30 05:11:06+0000 [-] Starting ebpf filter...
[...]
Aug 30 05:11:07 rac1 flashgrid-clan-daemon[2986]: builtins.Exception: Azure instance type 'Standard_E8bds_v5' is not supported!
Aug 30 05:11:07 rac1 flashgrid-clan-daemon[2986]: 2022-08-30 05:11:07+0000 [-] Main loop terminated.
Aug 30 05:11:07 rac1 flashgrid-clan-daemon[2986]: 2022-08-30 05:11:07+0000 [-] Exiting with code 1
Aug 30 05:11:07 rac1 systemd[1]: flashgrid-clan.service: main process exited, code=exited, status=1/FAILURE
Aug 30 05:11:07 rac1 systemd[1]: Unit flashgrid-clan.service entered failed state.
Aug 30 05:11:07 rac1 systemd[1]: flashgrid-clan.service failed.
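The journal output can be long, so it may help to pull out only the rejected instance type. The sketch below assumes the exception message has the same wording as in the log excerpt above. The function name unsupported_type_from_log is hypothetical.

```shell
# unsupported_type_from_log  (hypothetical helper)
# Reads journal text on stdin and prints the instance type named in the
# "is not supported" exception line, if one is present.
unsupported_type_from_log() {
    sed -n "s/.*Azure instance type '\([^']*\)' is not supported.*/\1/p"
}
```

On the affected node it could be fed from the service journal for the current boot, for example `journalctl -u flashgrid-clan.service -b --no-pager | unsupported_type_from_log`.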
Remediation options
- Roll back to the previous instance type
- Review the Preparation section at the top of this document to confirm supported instance types
- If the above does not work, open a ticket with FlashGrid Support