Can mdadm say if an array is broken?

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

When a device is missing from an array, /proc/mdstat puts an "_" in the status field. If only one device is missing in RAID5, or one or two in RAID6, the array can be repaired by adding a new volume: after "mdadm -a", recovery starts automatically.

When two devices are faulty in RAID5, or three in RAID6, mdadm reports things the same way, yet the array cannot be repaired by adding more drives and the data can never be recovered. After adding more drives, mdadm does NOT start recovery; it just lists the new devices as "spare". In this second case the data are lost forever, and that is not made clear at all, neither in /proc/mdstat nor in mdadm -D.

Did I miss something? Or is mdadm just unable to "measure" this?

I understand that during boot the kernel adds volumes to arrays one by one, so for a few milliseconds the arrays are in degraded mode until all their members are found. Degraded mode is therefore acceptable for a few seconds at boot time, which makes it an essential and "not so alarming" state for an array. Still, I would like mdadm or the kernel to tell me: "right now the array is not usable, it is broken, and <<if you don't add devices pretty fast, and you perform any write on the array, you will lose data forever>>".
_________________
DEMAINE Benoît-Pierre (aka DoubleHP ) http://www.demaine.info/
>o_/ Coin coin coin \_o<
to contact me (MSN,ICQ, JABBER, Skype ... ) http://benoit.demaine.info/contact.png

drescherjm · Posted: Wed Jan 06, 2010 4:06 am Post subject:

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

I have only used RAID0 and RAID1 in the past. I bought 4 drives this morning and am running heavy tests to understand how mdadm works and what my computer will tell me when one drive fails, then a second one. For now I am only simulating failures with mdadm -f and -r.

Still, if I do -f vol4 -f vol3 -r vol4 -r vol3, /proc/mdstat says that everything is normal, while I would expect an error, a warning, or an explicit message about possible data loss. IMHO it should show a visible difference between "you are losing redundancy", "you have lost redundancy" and "your data are corrupt". mdadm -D is not really better than mdstat.
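The three states asked for here follow directly from the RAID level and the number of missing members, so they can be derived outside mdadm. Below is a minimal Python sketch of that arithmetic. The function names and the wording of the states are hypothetical, and the tolerance table only covers the simple levels (RAID10 is left out because its tolerance depends on the layout).

```python
def tolerance(level: str, raid_devices: int) -> int:
    """How many member failures the level survives (assumed table, simple levels only)."""
    if level == "raid1":
        return raid_devices - 1  # a mirror survives as long as one member is left
    return {"raid0": 0, "linear": 0, "raid4": 1, "raid5": 1, "raid6": 2}[level]


def classify(level: str, raid_devices: int, missing: int) -> str:
    """Map (level, total members, missing members) to one of four hypothetical states."""
    if missing == 0:
        return "healthy"
    margin = tolerance(level, raid_devices) - missing
    if margin > 0:
        return "degraded, redundancy remains"
    if margin == 0:
        return "degraded, no redundancy left"
    return "failed, data lost"


# The cases discussed in this thread, for a 4-drive array.
for level, missing in [("raid5", 1), ("raid5", 2), ("raid6", 2), ("raid6", 3)]:
    print(level, missing, classify(level, 4, missing))
```

With four drives, RAID5 missing two members and RAID6 missing three both come out as "failed, data lost", which is the case that mdstat does not distinguish visually.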

drescherjm · Posted: Wed Jan 06, 2010 4:14 am Post subject:

drescherjm · Posted: Wed Jan 06, 2010 4:18 am Post subject:

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

It just says, as you expect, [UU__] ... and *I* have to know that if the array is RAID5 it is dead, and if it is RAID6 I have to hurry up.

Unless the messages are missing because I removed the members manually? Maybe things are different when the kernel detects the faults by itself? Still, before letting me mark a volume as faulty, it should warn me: if another volume breaks just before I mark one faulty, I may quickly end up with a broken system.

mdadm really seems to have absolutely no safeguard against PEBCAK or human mistakes.
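Since the [UU__] field has to be interpreted by hand, a small script can at least pull the level and the member counts out of /proc/mdstat. The following Python sketch shows one way to do it. The sample text is made up for illustration (device names and block counts are hypothetical), and the regular expressions only handle the common "mdX : active raidN ..." form, not inactive or read-only arrays.

```python
import re

# Hypothetical /proc/mdstat content: a 4-member RAID5 with two members gone.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0]
      2930279808 blocks level 5, 64k chunk, algorithm 2 [4/2] [UU__]

unused devices: <none>
"""

HEAD_RE = re.compile(r"(md\w+) : (\w+) (raid\d+|linear)\b")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def parse_mdstat(text: str) -> dict:
    """Return {array: {state, level, raid_devices, missing}} for each array found."""
    arrays = {}
    current = None
    for line in text.splitlines():
        head = HEAD_RE.match(line)
        if head:
            current = head.group(1)
            arrays[current] = {"state": head.group(2), "level": head.group(3)}
            continue
        status = STATUS_RE.search(line)
        if status and current:
            # [n/m]: n members expected, m in use; each "_" marks a missing member.
            arrays[current]["raid_devices"] = int(status.group(1))
            arrays[current]["missing"] = status.group(3).count("_")
    return arrays


print(parse_mdstat(SAMPLE))
```

On a real machine the same function could be fed `open("/proc/mdstat").read()`. Deciding whether the reported number of missing members is fatal for the given level is still up to the caller.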

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

Should mdadm send me email/SMS automatically? Where do I set this?

drescherjm · Posted: Wed Jan 06, 2010 4:24 am Post subject:

drescherjm · Posted: Wed Jan 06, 2010 4:25 am Post subject:

drescherjm · Posted: Wed Jan 06, 2010 4:27 am Post subject:

Also, /proc/mdstat will say degraded for an array that does not have all of its drives.
_________________
John

My gentoo overlay
Instructons for overlay

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

I have never used the conf file; I always let the magic do things. Can't mdadm record the email address directly on the drives? Or can the monitoring daemon do it without the arrays being declared in the conf?

I tried several times to declare the drives in the conf, and always ran into trouble.
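On the question of monitoring without ARRAY lines: mdadm's monitor mode accepts the mail address on the command line (--mail), and with --scan it also watches arrays that appear in /proc/mdstat, so the conf file is not strictly required (the usual alternative is a MAILADDR line in mdadm.conf). Here is a minimal Python sketch that only builds and launches such a command. The helper names and the address are hypothetical, and actual mail delivery assumes a working local sendmail.

```python
import subprocess


def monitor_argv(mail_to: str, oneshot: bool = False, test: bool = False) -> list:
    """Build an mdadm monitor command line that needs no ARRAY lines in mdadm.conf."""
    argv = ["mdadm", "--monitor", "--scan", f"--mail={mail_to}"]
    if oneshot:
        argv.append("--oneshot")  # check the arrays once, then exit
    if test:
        argv.append("--test")  # send a test alert for every array found at startup
    return argv


def run_monitor(mail_to: str, **options) -> int:
    """Run the monitor (needs root and mdadm installed); returns its exit status."""
    return subprocess.run(monitor_argv(mail_to, **options), check=False).returncode


# Hypothetical address; only prints the command, nothing is executed here.
print(" ".join(monitor_argv("root@localhost", oneshot=True, test=True)))
```

Running the one-shot form with --test is a convenient way to check that mail actually arrives before relying on the daemon.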

drescherjm · Posted: Wed Jan 06, 2010 4:29 am Post subject:

There is some good info here:
http://en.gentoo-wiki.com/wiki/RAID/Software

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

drescherjm · Posted: Wed Jan 06, 2010 4:33 am Post subject:

I may be wrong about it saying degraded in /proc/mdstat. I do know that it sends out emails, though.

drescherjm · Posted: Wed Jan 06, 2010 4:36 am Post subject:

BTW, here are examples of the email Nagios can send:

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

Monkeh · Veteran Joined: 06 Aug 2005 Posts: 1656 Location: England

doublehp · Guru Joined: 11 Apr 2005 Posts: 473 Location: FRANCE

drescherjm · Posted: Wed Jan 06, 2010 2:26 pm Post subject:

Here is some info on where to begin with Nagios:

http://www.gentoo.org/doc/en/nagios-guide.xml