Summary
On Oracle Linux 8 on AWS, the /u01 volume (or any other LVM volume) may fail to mount after a reboot, preventing Oracle services from starting.
Symptoms
When the EC2 instance boots, the LVM volumes are not mounted (/u01 is missing from the df command output):
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G   25M   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/nvme0n1p1   35G  8.0G   28G  23% /
tmpfs           3.1G     0  3.1G   0% /run/user/1001
tmpfs           3.1G     0  3.1G   0% /run/user/1002
tmpfs           3.1G     0  3.1G   0% /run/user/0
The following messages are logged in /var/log/messages:
systemd[1]: dev-testvg-u01.device: Job dev-testvg-u01.device/start timed out.
systemd[1]: Timed out waiting for device dev-testvg-u01.device.
systemd[1]: Dependency failed for /u01.
systemd[1]: Dependency failed for Remote File Systems.
systemd[1]: remote-fs.target: Job remote-fs.target/start failed with result 'dependency'.
systemd[1]: u01.mount: Job u01.mount/start failed with result 'dependency'.
systemd[1]: dev-testvg-u01.device: Job dev-testvg-u01.device/start failed with result 'timeout'.
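To check a log file for the same symptom, a search along the following lines may help. This is a minimal sketch: the function name find_device_timeouts is hypothetical, and it assumes the log uses the exact "Timed out waiting for device" wording shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FlashGrid or systemd): list the device
# units that systemd timed out waiting for.
# Usage: find_device_timeouts [logfile]   (defaults to /var/log/messages)
find_device_timeouts() {
    local logfile="${1:-/var/log/messages}"
    # Extract the unit name from "Timed out waiting for device <unit>."
    # and print each distinct unit once.
    sed -n 's/.*Timed out waiting for device \(.*\.device\)\.$/\1/p' "$logfile" | sort -u
}

# Example (as root, on the affected instance):
#   find_device_timeouts /var/log/messages
```

For the log excerpt above, the function would print dev-testvg-u01.device. An empty result only means that this particular message was not found in the given file.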
Affected Products
FlashGrid Cluster on AWS running Oracle Linux 8
FlashGrid Server on AWS running Oracle Linux 8
Affected Versions
The following systemd versions are affected:
- systemd-239-78.0.3 or later
To determine the installed systemd version, please run:
$ rpm -q systemd
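The comparison against the first affected build can be scripted roughly as follows. This is a sketch under stated assumptions: the helper name is_affected_systemd is hypothetical, and GNU sort -V is used as an approximation of RPM's own version ordering, which it does not reproduce exactly.

```shell
#!/usr/bin/env bash
# First affected build, taken from the "Affected Versions" list.
FIRST_AFFECTED="239-78.0.3"

# Hypothetical helper: succeed if the given VERSION-RELEASE string sorts at
# or after FIRST_AFFECTED. Relies on GNU "sort -V", which only approximates
# RPM's version comparison.
is_affected_systemd() {
    local installed="$1" lowest
    lowest=$(printf '%s\n%s\n' "$FIRST_AFFECTED" "$installed" | sort -V | head -n 1)
    # If the threshold sorts first (or both are equal), the build is affected.
    [ "$lowest" = "$FIRST_AFFECTED" ]
}

# Example (on the instance):
#   ver=$(rpm -q --qf '%{VERSION}-%{RELEASE}' systemd)
#   is_affected_systemd "$ver" && echo "systemd $ver is in the affected range"
```

When the result is borderline, compare the rpm -q output against the list by hand rather than relying on the script.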
Root Cause (preliminary)
The likely cause is a bug in the Oracle Linux build of the systemd RPM.
The Oracle Linux build of systemd includes systemd-fstab-generator-reload-targets.service, which is not present in RHEL 8. Disabling this service prevents the issue.
Workaround
To prevent the issue:
A. Either update to FlashGrid software version 24.05 or later, or mask systemd-fstab-generator-reload-targets.service manually:
# systemctl mask systemd-fstab-generator-reload-targets.service
B. Add the x-systemd.before=local-fs.target option to the /u01 filesystem entry in /etc/fstab (or to the entry for the filesystem that holds the Oracle Grid Infrastructure home, if it is not /u01).
For example, change
/dev/fgvg/u01 /u01 xfs defaults 0 0
to
/dev/fgvg/u01 /u01 xfs defaults,x-systemd.before=local-fs.target 0 0
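The fstab change in step B could be scripted along these lines. This is a minimal sketch, not a FlashGrid tool: the function name add_fstab_option is hypothetical, it writes the modified content to standard output instead of editing the file in place, and it rewrites the whitespace of the line it changes to single spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the fstab content with an option appended to
# the entry for the given mount point. The input file is not modified.
# Usage: add_fstab_option <fstab-file> <mount-point> <option>
add_fstab_option() {
    local fstab="$1" mountpoint="$2" option="$3"
    awk -v mp="$mountpoint" -v opt="$option" '
        # Pass comments and blank lines through unchanged.
        /^[ \t]*#/ || NF == 0 { print; next }
        # Field 2 is the mount point, field 4 the comma-separated options.
        # Append the option only if it is not already present.
        $2 == mp && index("," $4 ",", "," opt ",") == 0 { $4 = $4 "," opt }
        { print }
    ' "$fstab"
}

# Example (as root), reviewing the result before replacing /etc/fstab:
#   add_fstab_option /etc/fstab /u01 x-systemd.before=local-fs.target > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new
```

Writing to a separate file first is a deliberate choice: a malformed /etc/fstab can prevent the instance from booting, so the diff should be reviewed before the new file is copied into place.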
If the system still experiences the issue after implementing A and B, switch to an R6i or newer EC2 instance type.
To resolve the issue after it has already happened:
Run vgchange -a y as root to activate the LVM volumes.
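To confirm that the filesystem is mounted after activation, a check like the one below could be used. It is a sketch only: the helper name is_mounted is hypothetical, /u01 is assumed to be the affected mount point, and the explicit mount step in the commented example is an assumption for the case where the filesystem does not get mounted on its own after the volumes are activated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the mount point appears in the mount table.
# The optional second argument (default /proc/mounts) exists to make the
# check easy to try against a sample file.
is_mounted() {
    local mountpoint="$1" table="${2:-/proc/mounts}"
    # Field 2 of each /proc/mounts line is the mount point.
    awk -v mp="$mountpoint" '$2 == mp { found = 1 } END { exit !found }' "$table"
}

# Example recovery flow (as root):
#   is_mounted /u01 || vgchange -a y
#   is_mounted /u01 || mount /u01   # assumption: only if still not mounted
#   is_mounted /u01 && df -h /u01
```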
Resolution
FlashGrid is working with Oracle to identify a permanent fix.