This guide describes two scenarios in which Longhorn fails to mount a PVC, and the steps to resolve each.
Problem with multipathd service
In some cases, Longhorn fails to mount Persistent Volume Claims (PVC) to pods in a Kubernetes cluster. This issue is typically caused by conflicts with the multipathd service, which may mistakenly identify Longhorn volumes as being in use, preventing the filesystem from being created.
The multipathd service is responsible for managing multiple paths to the same storage device. When it incorrectly identifies a Longhorn volume as being in use, it blocks the filesystem creation process, resulting in mount failures.
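One way to recognise this case is to look for the mke2fs refusal text ("is apparently in use by the system") in the pod events. The following is a minimal sketch, not a Longhorn tool: the function name has_multipathd_symptom is hypothetical, and it only checks text supplied on standard input.

```shell
# Hypothetical helper (not part of Longhorn or Kubernetes).
# Reads event text on stdin and succeeds (exit 0) when it contains the
# mke2fs refusal message that this multipathd conflict produces.
has_multipathd_symptom() {
    grep -q 'is apparently in use by the system; will not make a filesystem here'
}
```

Assuming kubectl is configured for the cluster, it could be used as: kubectl describe pod nextgen-gw-0 | has_multipathd_symptom && echo "possible multipathd conflict"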
You might encounter the following error message in your Kubernetes environment:
Error Message:
Warning  FailedMount  12s (x6 over 28s)  kubelet  
MountVolume.MountDevice failed for volume "pvc-87285c92-26c4-40bd-842d-7f608d9db2d8": 
rpc error: code = Internal desc = format of disk "/dev/longhorn/pvc-87285c92-26c4-40bd-842d-7f608d9db2d8" failed:
type: ("ext4")
target: ("/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/1e70ad7ff7c1222b1d656429fcc03679fdfa8ed3d9ae0739e656b2e161bfc08d/globalmount")
options: ("defaults")
errcode: (exit status 1)
output: (
  mke2fs 1.46.4 (18-Aug-2021)
  /dev/longhorn/pvc-87285c92-26c4-40bd-842d-7f608d9db2d8 is apparently in use by the system; will not make a filesystem here!
)
Solution
Follow these steps to resolve the issue:
Step 1: Edit the multipath.conf File
- Open the multipath.conf file for editing:
  vi /etc/multipath.conf
- Add the following configuration to the multipath.conf file on all nodes in the cluster:
  blacklist {
      devnode "^sd[a-z0-9]+"
  }
- After adding the configuration, the file should look like this:
  defaults {
      user_friendly_names yes
  }
  blacklist {
      devnode "^sd[a-z0-9]+"
  }
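To see which device names the blacklist pattern covers, and whether a given multipath.conf already carries the entry, a small check along these lines may help. This is a sketch under assumptions: the function names are hypothetical, and grep -E is used only as an approximation of how multipath evaluates the devnode expression.

```shell
# Pattern copied from the blacklist entry in multipath.conf.
BLACKLIST_PATTERN='^sd[a-z0-9]+'

# Hypothetical helper: succeeds when a device node name (for example "sda")
# matches the blacklist pattern. grep -E approximates multipath's matching.
matches_blacklist() {
    printf '%s\n' "$1" | grep -Eq "$BLACKLIST_PATTERN"
}

# Hypothetical helper: succeeds when the file given as $1 already contains
# the devnode blacklist line (matched as a fixed string).
has_blacklist_entry() {
    grep -Fq 'devnode "^sd[a-z0-9]+"' "$1"
}
```

For example, matches_blacklist sda succeeds while matches_blacklist nvme0n1 does not, and has_blacklist_entry /etc/multipath.conf can be run on each node before and after the edit.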
Step 2: Restart the multipathd Service
After updating the multipath.conf file, restart the multipathd service on all nodes in the cluster:
systemctl restart multipathd.service
Step 3: Delete and Recreate the Affected Pods
To apply the changes and resolve the issue, delete the affected pods so that Kubernetes can recreate them with the corrected configuration:
kubectl delete pod nextgen-gw-0 nextgen-gw-redis-master-0
Problem with Longhorn file corruption
- Longhorn cannot remount a volume whose filesystem is corrupted, so the workload fails to restart.
- Longhorn cannot repair this automatically; you must resolve it manually.
You might encounter the following error message in your Kubernetes environment:
Error Message:
Events:
  Type     Reason       Age                  From     Message
  ----     ------       ----                 ----     -------
  Warning  FailedMount  56s (x5809 over 8d)  kubelet  MountVolume.MountDevice failed for volume "pvc-b3ca140a-dab9-49f6-9f39-063594e58521" : rpc error: code = Internal desc = 'fsck' found errors on device /dev/longhorn/pvc-b3ca140a-dab9-49f6-9f39-063594e58521 but could not correct them: fsck from util-linux 2.39.3
/dev/longhorn/pvc-b3ca140a-dab9-49f6-9f39-063594e58521 contains a file system with errors, check forced.
/dev/longhorn/pvc-b3ca140a-dab9-49f6-9f39-063594e58521: Unattached inode 1555
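The device that fsck complains about is named in the event text itself. As a rough sketch (the function name is hypothetical and the pattern assumes the /dev/longhorn/pvc-<uuid> naming seen in the message), the path can be pulled out of the events like this:

```shell
# Hypothetical helper: reads event text on stdin and prints the first
# /dev/longhorn/pvc-... device path found in it (nothing if there is none).
longhorn_device_from_events() {
    grep -oE '/dev/longhorn/pvc-[0-9a-f-]+' | head -n 1
}
```

Assuming kubectl access, kubectl describe pod nextgen-gw-0 | longhorn_device_from_events would print the device path to use in the repair step.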
Solution
Follow these steps to resolve the issue:
Step 1: Identify the Node Running the Pod
Run the following command to find the node where the gateway pod is running:
kubectl get pods -o wide
Sample Response:
root@opsramp-gateway:/home/gateway-admin# kubectl get pods -o wide
NAME                        READY   STATUS              RESTARTS   AGE   IP           NODE              NOMINATED NODE   READINESS GATES
nextgen-gw-0                0/3     ContainerCreating   0          12m   10.42.0.31   opsramp-gateway   <none>           <none>
nextgen-gw-redis-master-0   1/1     Running             0          25m   10.42.0.29   opsramp-gateway   <none>           <none>
From this output, we see that the gateway pod is scheduled on the opsramp-gateway node.
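If the pod list is long, the node can be picked out of the same output with a small filter. The sketch below is illustrative only: node_for_pod is a hypothetical name, and it assumes the NOMINATED NODE and READINESS GATES values are single tokens such as <none>.

```shell
# Hypothetical helper: reads `kubectl get pods -o wide` output on stdin and
# prints the NODE value for the pod named in $1. NODE is read as the third
# field from the end, so a multi-word RESTARTS value does not shift it.
node_for_pod() {
    awk -v pod="$1" '$1 == pod { print $(NF-2) }'
}
```

For example: kubectl get pods -o wide | node_for_pod nextgen-gw-0. Alternatively, kubectl get pod nextgen-gw-0 -o jsonpath='{.spec.nodeName}' returns the node name directly.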
Step 2: Log In to the Node and Fix the File Corruption Issue
Log in to the node (opsramp-gateway) where the pod is scheduled. Then, run the following command to repair the corrupted filesystem:
fsck -y <file-path>
Note
Obtain the <file-path> by describing the pod:
kubectl describe pod nextgen-gw-0
Sample Response:
Events: 
  Type     Reason       Age                  From     Message 
  ----     ------       ----                 ----     ------- 
  Warning  FailedMount  56s (x5809 over 8d)  kubelet  MountVolume.MountDevice failed for volume "pvc-b3ca140a-dab9-49f6-9f39-063594e58521" : rpc error: code = Internal desc = 'fsck' found errors on device /dev/longhorn/pvc-b3ca140a-dab9-49f6-9f39-063594e58521 but could not correct them: fsck from util-linux 2.39.3
/dev/longhorn/pvc-b3ca140a-dab9-49f6-9f39-063594e58521 contains a file system with errors, check forced.
/dev/longhorn/pvc-b3ca140a-dab9-49f6-9f39-063594e58521: Unattached inode 1555
In this case, the file path is /dev/longhorn/pvc-b3ca140a-dab9-49f6-9f39-063594e58521, so run:
fsck -y /dev/longhorn/pvc-b3ca140a-dab9-49f6-9f39-063594e58521
Step 3: Delete the Affected Pods
To apply the fixes, delete the affected pod so Kubernetes can recreate it:
kubectl delete pod nextgen-gw-0
If multiple pods are affected, repeat the deletion process for each.
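When several pods may be affected, a quick way to list candidates is to filter the pod list for pods whose READY count is below the total. This is only a sketch: pods_not_ready is a hypothetical name, and a pod can be not ready for unrelated reasons, so the list should be reviewed before deleting anything.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints the names of pods whose READY column shows fewer ready
# containers than total (for example 0/3).
pods_not_ready() {
    awk '{ split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1 }'
}
```

For example, kubectl get pods --no-headers | pods_not_ready prints one pod name per line; each confirmed pod can then be removed with kubectl delete pod <name>.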