SmartArray RAID 5 stripe consistency check
Posted: Fri May 18, 2012 14:34
by HONG
We recently ran into a Smart Array issue that has so far affected two servers.
The scenario is as follows:
One of the HDDs in the RAID 5 array fails.
The Smart Array controller automatically starts a rebuild onto the hot spare drive.
However, partway through the rebuild onto the hot spare, the rebuild stops because the RAID controller detects a bad sector on one of the remaining data drives.
So the hot spare does not actually protect the array against a failed HDD.
We now have to recreate the entire RAID 5 LUN and restore from backup.
We are very dissatisfied as a result.
It seems this situation could be avoided if the Smart Array had a feature that performs a background consistency check on all the data stripes in the RAID 5 drive set.
Does the Smart Array have such a feature? If so, how can it be invoked?
Re: SmartArray RAID 5 stripe consistency check
Posted: Fri May 18, 2012 14:35
by YAN
The Smart Array controllers have this feature, and it runs in the background.
Whenever the controller is idle, it scans the entire array from beginning to end, then goes back to the start and repeats.
Unfortunately, on an active array it can take some time to get all the way through, because the scan is constantly interrupted by I/O.
The controller waits a set amount of idle time before continuing the scan, and on a busy array it might not reach that idle time often enough.
To help mitigate this, you can lower the Surface Scan Delay value in the Controller Settings in ACU.
Setting this value lower can negatively affect performance, although I believe the impact would be minimal.
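To illustrate why a lower Surface Scan Delay helps on a busy array, here is a toy model in Python. It is only a sketch under assumptions: the function name `scan_seconds`, the I/O timestamps, and the rule "scan resumes once the array has been idle for `delay` seconds" are simplifications for illustration, not the controller's actual firmware logic.

```python
def scan_seconds(io_times, window_end, delay):
    """Estimate how many seconds of background scanning fit into a window.

    io_times   -- timestamps (seconds) at which host I/O arrives (hypothetical)
    window_end -- end of the observation window (seconds)
    delay      -- idle time the controller waits before resuming the scan
    """
    total = 0.0
    boundaries = sorted(io_times) + [window_end]
    # Each gap between consecutive I/O events is idle time; the scan only
    # runs for the part of the gap that exceeds the configured delay.
    for start, end in zip(boundaries, boundaries[1:]):
        total += max(0.0, (end - start) - delay)
    return total


# Hypothetical workload: I/O at t = 0, 5, 10 and 30 s in a 60 s window.
io_times = [0, 5, 10, 30]
long_delay = scan_seconds(io_times, 60, delay=15)
short_delay = scan_seconds(io_times, 60, delay=3)
```

In this toy workload the shorter delay leaves far more of the window available for scanning, which is the effect the setting is meant to have; the real gain depends on the actual I/O pattern.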
Re: SmartArray RAID 5 stripe consistency check
Posted: Fri May 18, 2012 14:37
by MUDBOY
The Smart Array controllers have a feature called Dynamic Sector Repair.
From the Smart Array controller tech brief:
Dynamic sector repair
Under normal operating conditions and over time, disk drive media can develop defects caused by variances in the drive mechanisms. To protect data from media defects, HP built a dynamic sector repair feature into Smart Array controllers.
During inactive periods, Smart Array controllers configured with a fault-tolerant logical drive perform a background surface analysis, continually scanning all drives for media defects. During busy periods, Smart Array controllers can also detect media defects when accessing a bad sector. If a Smart Array controller finds a recoverable media defect, the controller automatically remaps the bad sector to a reserve area on the disk drive. If the controller finds an unrecoverable media defect and a fault-tolerant logical drive is configured, the controller automatically regenerates the data and writes it to the remapped reserved area on the disk drive.
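As a rough illustration of what "regenerates the data" means on RAID 5, the sketch below rebuilds one unreadable block from the other blocks in the stripe plus parity using XOR. The block contents and the helper name `xor_blocks` are made up for the example; this shows the general RAID 5 parity principle, not HP's implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A hypothetical stripe: three data blocks, with parity computed over them.
data = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Suppose the sector holding data[1] is unreadable. XOR-ing the surviving
# data blocks with the parity block yields the missing block again.
recovered = xor_blocks([data[0], data[2], parity])
```

The same arithmetic explains the failure in the original post: during a rebuild one block per stripe is already missing, so a second unreadable block in the same stripe leaves RAID 5 with nothing to reconstruct from. Finding and repairing bad sectors while the array is still healthy avoids that.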
This is where this feature is configured in ACU:
ACU.png
The surface analysis process can be monitored via ADU.
Make sure the latest Smart Array firmware is loaded; HP has released firmware updates that enhance bad block handling.
For example, from the P400 firmware 7.22 release notes:
• Note: The Smart Array Firmware allows for early media error detection so Firmware can detect UREs sooner and mark sectors bad. To enable this feature in ACU version 8.60 or later, the end user must set the Surface Scan Analysis Priority to "High".
Finally, depending on the Smart Array model, consider using RAID 6, which offers double parity protection and greatly reduces the risk of rebuild failures.
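For a feel of the rebuild risk on RAID 5, here is a back-of-the-envelope estimate in Python. Everything numeric is an assumption for illustration: a URE rate of 1 per 10^14 bits read (a figure often quoted on drive datasheets), 1 TB drives, and independent errors. The function name `rebuild_ure_probability` is likewise just for this sketch.

```python
import math


def rebuild_ure_probability(surviving_drives, drive_bytes, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error (URE) while
    reading every surviving drive in full during a rebuild.

    Assumes independent errors at a fixed per-bit rate, which is a
    simplification; real drives do not fail this uniformly.
    """
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Hypothetical 4-drive RAID 5 with 1 TB drives: 3 drives are read in full.
p_fail = rebuild_ure_probability(surviving_drives=3, drive_bytes=10**12)
print(f"Chance of hitting a URE during the rebuild: {p_fail:.1%}")
```

On RAID 5 a single URE during the rebuild can stop it, as described in the first post. With RAID 6 the second parity can still reconstruct that stripe while one drive is missing, so a lone URE does not have to be fatal.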