From 04bf190d9339a6cd5adef1a4b789517ff9838fdb Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 28 Jun 2011 16:59:42 +1000 Subject: [PATCH] md: avoid endless recovery loop when waiting for fail device to complete. commit 4274215d24633df7302069e51426659d4759c5ed upstream. If a device fails in a way that causes pending request to take a while to complete, md will not be able to immediately remove it from the array in remove_and_add_spares. It will then incorrectly look like a spare device and md will try to recover it even though it is failed. This leads to a recovery process starting and instantly aborting over and over again. We should check if the device is faulty before considering it to be a spare. This will avoid trying to start a recovery that cannot proceed. This bug was introduced in 2.6.26 so that patch is suitable for any kernel since then. Reported-by: Jim Paradis Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman Signed-off-by: Andi Kleen --- drivers/md/md.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/md/md.c b/drivers/md/md.c index 010cbdc9506..0f88c456025 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -6967,6 +6967,7 @@ static int remove_and_add_spares(mddev_t *mddev) list_for_each_entry(rdev, &mddev->disks, same_set) { if (rdev->raid_disk >= 0 && !test_bit(In_sync, &rdev->flags) && + !test_bit(Faulty, &rdev->flags) && !test_bit(Blocked, &rdev->flags)) spares++; if (rdev->raid_disk < 0 -- 2.11.4.GIT