Symptoms
Cluster nodes that have an AMD CPU and a Mellanox NIC reboot unexpectedly.
In the dmesg log, you see this error:
AMD-Vi: Event logged [IO_PAGE_FAULT device=41:00.0 domain=0x002d address=0x00000000be6ca980 flags=0x0020]
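To check several nodes for this symptom, the message above can be matched in saved dmesg output. The following is a minimal sketch, not an official diagnostic tool. The pattern assumes the exact field layout shown in the sample line; other kernel versions may print the event differently. The names IO_PAGE_FAULT_RE and find_io_page_faults are hypothetical.

```python
import re

# Assumption: the event is formatted exactly like the sample line in this
# article. Kernels that print extra or differently ordered fields will not match.
IO_PAGE_FAULT_RE = re.compile(
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"device=(?P<device>[0-9a-fA-F:.]+) "
    r"domain=(?P<domain>0x[0-9a-fA-F]+) "
    r"address=(?P<address>0x[0-9a-fA-F]+) "
    r"flags=(?P<flags>0x[0-9a-fA-F]+)\]"
)


def find_io_page_faults(log_text):
    """Return one dict (device, domain, address, flags) per matching event."""
    return [m.groupdict() for m in IO_PAGE_FAULT_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "AMD-Vi: Event logged [IO_PAGE_FAULT device=41:00.0 domain=0x002d "
        "address=0x00000000be6ca980 flags=0x0020]"
    )
    for event in find_io_page_faults(sample):
        print(event["device"], event["address"])
```

The reported device value is a PCI bus address; comparing it with the lspci output on the node shows whether the faulting device is the Mellanox NIC.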
Cause
This is a known issue specific to this configuration (AMD CPU with Mellanox NIC). See the Troubleshooting section of this Mellanox community article.