Resizing cluster node VMs may be needed for performance or cost reasons. Resizing can be done for one node at a time without causing database downtime.
Preparation
Before resizing nodes, confirm that your chosen instance type is supported by the installed FlashGrid software.
On each node that is to be resized, run the following command to display the list of supported instance types:
$ flashgrid-clan-cfg show-supported-instance-types
Detected GCP instance type: n1-standard-32
Supported instance types:
m1-megamem-96
m1-ultramem-160
m1-ultramem-40
[...]
You can use grep to find a particular instance type, for example:
$ flashgrid-clan-cfg show-supported-instance-types | grep n1-standard-8
n1-standard-8
If your chosen instance type is not shown in the output, then it is not supported by the installed version of FlashGrid CLAN software.
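Note that a plain grep matches substrings, so it also matches the "Detected GCP instance type" line when the node already runs that type. A minimal sketch of an exact-match check follows, assuming the command prints one instance type per line. The function name is_supported_type is hypothetical and not part of FlashGrid software.

```shell
# is_supported_type TYPE
# Reads the output of `flashgrid-clan-cfg show-supported-instance-types` on stdin.
# Succeeds only if TYPE appears as a whole line (assumption: one type per line).
is_supported_type() {
    # Trim surrounding whitespace, then require a fixed-string, whole-line match.
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -Fxq -- "$1"
}
```

For example, `flashgrid-clan-cfg show-supported-instance-types | is_supported_type n1-standard-8 && echo supported` prints "supported" only when n1-standard-8 is listed as its own line.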
You can review Release Notes: Cloud Area Network software to determine if the instance type has been added in a newer release. To check which version you have installed, run:
$ rpm -q flashgrid-clan
flashgrid-clan-21.8.292.58899.3a5ad336.release-1.el8.x86_64
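When comparing the installed package against the release notes, the leading numeric fields of the package string are usually the part of interest. The sketch below extracts them from the rpm -q output. It assumes the release is identified by the first three dot-separated fields (21.8.292 in the example above), which this document does not confirm, and clan_release is a hypothetical helper name.

```shell
# clan_release PACKAGE_STRING
# Prints the first three dot-separated fields that follow the package name.
# Assumption: these fields are what the release notes use to name a release.
clan_release() {
    # Strip the "flashgrid-clan-" prefix, then keep fields 1 to 3.
    printf '%s\n' "${1#flashgrid-clan-}" | cut -d. -f1-3
}
```

A possible invocation is `clan_release "$(rpm -q flashgrid-clan)"`.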
For access to download a newer version of FlashGrid software, or to request a review of your resizing options, raise a support case with FlashGrid.
Resize Node
To resize the nodes in a running cluster, repeat the following steps on each node, one node at a time:
- Make sure there are no other nodes that are in offline or re-syncing state. All disk groups must have zero offline disks and Resync = No:
  # flashgrid-cluster
- If the node is a database node:
  - Update SGA and PGA sizing parameters for the databases according to the new VM memory size.
  - Skip this step unless you have the vm.nr_hugepages parameter in /etc/sysctl.conf manually configured. If you have it manually configured, then update the parameter according to the new VM size. Note that starting with Storage Fabric 19.02, HugePages are configured automatically by default and a manual change is not required.
  - Stop all local database instances running on the node.
- Stop the FlashGrid Storage Fabric services on the node:
  # flashgrid-node stop
- Stop the VM using the GCP console.
- Resize the VM using the GCP console.
- Start the VM using the GCP console.
- Wait until all disks are back online and resyncing operations complete on all disk groups. All disk groups must have zero offline disks and Resync = No:
  # flashgrid-cluster
- If the node is a database node:
  - Start all database instances on the node.
- Proceed to the next node.
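The disk group check that appears before stopping a node and again after restarting it can be scripted. The sketch below is illustrative only: it assumes that flashgrid-cluster prints a disk group table whose header row contains columns named OfflineDisks and Resync, a layout this document does not show. The function name disk_groups_healthy is hypothetical. Verify the column names against the real output before relying on it.

```shell
# disk_groups_healthy
# Reads `flashgrid-cluster` output on stdin. Succeeds only if a disk group
# table is found and every row has OfflineDisks = 0 and Resync = No.
# Assumption: the table header contains "OfflineDisks" and "Resync" columns.
disk_groups_healthy() {
    awk '
        # Header row: remember the positions of the two columns of interest.
        /OfflineDisks/ && /Resync/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "OfflineDisks") off = i
                if ($i == "Resync") rs = i
            }
            found = 1
            next
        }
        # Data rows: blank lines and separator lines have too few fields.
        found && NF >= off && NF >= rs {
            rows++
            if ($off != "0" || $rs != "No") bad = 1
        }
        # Fail when no table was seen, no rows were read, or any row is bad.
        END { exit (found && rows > 0 && !bad) ? 0 : 1 }
    '
}
```

With this helper, waiting for resync could look like `until flashgrid-cluster | disk_groups_healthy; do sleep 30; done`. Because a missing table counts as a failure, the check also fails while the FlashGrid service is not running.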
Troubleshooting
Unsupported instance type
If you have resized to an instance type that is unsupported by the current FlashGrid software installation, FlashGrid software will not start after the server is powered on:
# flashgrid-cluster
FlashGrid 21.6.36.54852 #xxxxxxxxxxxxxxxxxxxxxxxxxx
License: Active, Marketplace
Licensee: FlashGrid
Support plan: 24x7
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FlashGrid service is not running on this node. To start the service:
$ sudo flashgrid-node start
================================================================================
The systemd journal (journalctl -xe) shows that the instance type is not supported:
# journalctl -xe
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit flashgrid-clan.service has begun starting up.
Aug 31 09:09:20 rac1.example.com systemd[1]: Started FlashGrid CLAN.
-- Subject: Unit flashgrid-clan.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit flashgrid-clan.service has finished starting up.
--
-- The start-up result is done.
Aug 31 09:09:21 rac1.example.com flashgrid-clan-daemon[2440]: 2022-08-31 09:09:21+0000 [-] Log opened.
Aug 31 09:09:21 rac1.example.com flashgrid-clan-daemon[2440]: 2022-08-31 09:09:21+0000 [-] Detected GCP instance type: n2d-standard-32
Aug 31 09:09:21 rac1.example.com flashgrid-clan-daemon[2440]: 2022-08-31 09:09:21+0000 [-] Starting ebpf filter...
Aug 31 09:09:22 rac1.example.com kernel: python2[2445] is installing a program with bpf_override_return helper that may cause unexpected behavior!
Aug 31 09:09:22 rac1.example.com kernel: python2[2445] is installing a program with bpf_override_return helper that may cause unexpected behavior!
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: 2022-08-31 09:09:22+0000 [-] Traffic filter enabled
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: 2022-08-31 09:09:22+0000 [-] Stopping reactor due to fatal error
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: Traceback (most recent call last):
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 322, in addCallback
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 311, in addCallbacks
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 1652, in execute
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: --- <exception caught here> ---
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 1155, in init_service
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/twisted/internet/defer.py", line 151, in maybeDeferred
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 861, in reload_config
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 1071, in build_tree
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 998, in add_master_if
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: File "/opt/flashgrid-clan/lib/python3.6/site-packages/vxlan/flashgrid_vxlan.py", line 840, in eval_root_bw
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: builtins.Exception: GCP instance type 'n2d-standard-32' is not supported!
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: 2022-08-31 09:09:22+0000 [-] Main loop terminated.
Aug 31 09:09:22 rac1.example.com flashgrid-clan-daemon[2440]: 2022-08-31 09:09:22+0000 [-] Exiting with code 1
Aug 31 09:09:22 rac1.example.com systemd[1]: flashgrid-clan.service: main process exited, code=exited, status=1/FAILURE
Aug 31 09:09:22 rac1.example.com systemd[1]: Unit flashgrid-clan.service entered failed state.
Aug 31 09:09:22 rac1.example.com systemd[1]: flashgrid-clan.service failed.
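The decisive line in the log above is the exception message naming the unsupported instance type. A minimal sketch for pulling that type out of a long journal follows. It matches the exact message text shown above, so it may need adjusting if another software version words the message differently. The function name unsupported_type_from_log is hypothetical.

```shell
# unsupported_type_from_log
# Reads journal text on stdin and prints the instance type named in the
# most recent "GCP instance type '...' is not supported" message, if any.
unsupported_type_from_log() {
    # Capture the quoted type, print only matching lines, keep the last one.
    sed -n "s/.*GCP instance type '\([^']*\)' is not supported.*/\1/p" | tail -n 1
}
```

One way to feed it is `journalctl -u flashgrid-clan --no-pager | unsupported_type_from_log`, which limits the journal to the flashgrid-clan unit. Empty output means no such message was found.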
Remediation options
- Roll back to the previous instance type.
- Review the Preparation section at the top of this document to confirm which instance types are supported, then resize to a supported type.
- If the above does not work, open a ticket with FlashGrid Support.
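A rollback repeats the stop, resize, and start steps from the Resize Node section with the previous instance type. This document performs those steps in the GCP console. As an alternative, the sketch below prints the equivalent gcloud CLI commands for review instead of running them. The function name and the VM name and zone in the example are hypothetical placeholders.

```shell
# print_resize_commands VM ZONE MACHINE_TYPE
# Prints the gcloud commands that stop a VM, change its machine type, and
# start it again. Nothing is executed, so the output can be reviewed first.
print_resize_commands() {
    printf 'gcloud compute instances stop %s --zone=%s\n' "$1" "$2"
    printf 'gcloud compute instances set-machine-type %s --zone=%s --machine-type=%s\n' "$1" "$2" "$3"
    printf 'gcloud compute instances start %s --zone=%s\n' "$1" "$2"
}
```

For example, `print_resize_commands rac1 us-central1-a n1-standard-32` prints three commands for a VM named rac1. The machine type can only be changed while the VM is stopped, which is why the stop command comes first.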