I have an HP D2700 disk enclosure with twelve (12) HP 3TB SAS HDDs installed, configured in RAID 6 as a single 30TB volume formatted with XFS.
The array has been working fine but saw very little use until recently. Now that I actually want to read from and write to it, it periodically fails, returning I/O errors to the OS (RHEL 7.2).
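When the failure happens, I capture the kernel messages for reference. The grep patterns below are just my guesses at what the HBA driver logs; the exact wording will depend on the driver:

```shell
# Sketch: pull recent kernel messages that look like block-layer I/O
# errors. The grep patterns are assumptions, not the exact messages.
capture_io_errors() {
    dmesg 2>/dev/null | grep -iE 'i/o error|sense key|blk_update_request|rejecting' || true
    echo "capture complete"
}
capture_io_errors
```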
When I inspect the enclosure, I see ALL green lights everywhere, except that ALL the drives are showing steady amber. If only ONE drive showed amber or red, I would know to replace that drive and let the rebuild process do its thing. However, all twelve drives clearly have not failed simultaneously.
If I cycle power on the array, all the drives glow happily green again, but the server will not access the volume unless I reboot the (production) server to which it is attached. If I do reboot it, all is well until I try to do any substantial disk activity on the array, and then it happens again.
If I try to get the OS to recognize the volume without rebooting the server, the RAID controller marks it as bad, and I then have to manually tell the RAID controller, at the console during boot, to put the array back online, since it has by that point taken it offline.
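For context, "recognize the volume without rebooting" means a SCSI host rescan along these lines (a sketch using the standard Linux sysfs layout; the host numbers on my server differ):

```shell
# Sketch of the no-reboot rescan attempt (standard sysfs paths; host
# numbering varies by system). "- - -" is the documented wildcard
# meaning all channels, all targets, all LUNs on that host.
rescan_scsi_hosts() {
    local scan
    for scan in /sys/class/scsi_host/host*/scan; do
        if [ -w "$scan" ]; then
            echo "- - -" > "$scan"
        fi
    done
    echo "rescan issued"
}
rescan_scsi_hosts
```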
The manual for the D2700 does not mention this error condition. Can anyone tell me what it means, whether anything else can be done to troubleshoot it (perhaps one drive really is bad, but which one do I swap out?), and how to address the issue?
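In case it helps: would polling each drive's SMART health be a sensible way to spot a single bad drive? I had something like this in mind (a sketch; it assumes smartmontools is installed and the SAS drives appear as /dev/sg* pass-through devices, which may not match my setup):

```shell
# Hypothetical per-drive SMART health sweep (assumes smartmontools and
# /dev/sg* pass-through devices for the SAS drives).
smart_sweep() {
    if ! command -v smartctl >/dev/null 2>&1; then
        echo "smartctl not installed"
        return 0
    fi
    local dev
    for dev in /dev/sg*; do
        [ -e "$dev" ] || continue          # skip if glob did not match
        printf '%s: ' "$dev"
        # SAS drives report a "SMART Health Status" line
        smartctl -H "$dev" | grep -i 'health' || echo "no health line"
    done
    echo "sweep complete"
}
smart_sweep
```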
The drives and the enclosure all have been updated to the latest firmware.
Thanks!