SmartArray RAID 5 stripe consistency check

HONG
Registered user
Posts: 87
Joined: Wednesday, 26 January 2011, 23:03

SmartArray RAID 5 stripe consistency check

Post by HONG » Friday, 18 May 2012, 14:34

We recently ran into a SmartArray issue that has so far affected two servers.

The scenario is as follows:

One of the HDDs in the RAID 5 array fails.
The SmartArray controller automatically starts a rebuild onto the hot spare drive.
However, partway through the rebuild onto the hot spare, the process stops because the RAID controller detects a bad sector on one of the remaining data drives.
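
To illustrate why a single bad sector is enough to stop the rebuild, here is a minimal sketch of RAID 5 reconstruction with XOR parity. It is a toy model, not the Smart Array firmware: the function names and the use of `None` to stand for an unreadable sector are assumptions made for illustration.

```python
from functools import reduce
from typing import Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(surviving: Sequence[Optional[bytes]]) -> bytes:
    """Rebuild the failed drive's block from the surviving blocks of one stripe.

    `None` stands for a sector that returned an unrecoverable read error.
    """
    if any(block is None for block in surviving):
        raise IOError("unreadable sector on a surviving drive; stripe cannot be rebuilt")
    return xor_blocks(surviving)


# One stripe across four drives: three data blocks plus parity.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xaa\xbb"
parity = xor_blocks([d0, d1, d2])

# The drive holding d1 fails: the other three blocks are enough to recover it.
recovered = rebuild_block([d0, d2, parity])

# Same failure, but d2 now has a bad sector: single parity has nothing left to use.
try:
    rebuild_block([d0, None, parity])
    rebuild_failed = False
except IOError:
    rebuild_failed = True
```

With one drive already gone, every remaining block in a stripe is needed, so one latent bad sector leaves that stripe unrecoverable.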

So the hot spare cannot actually protect the array from a failed HDD.
We now have to recreate the entire RAID 5 LUN and restore from a backup.
We are very dissatisfied with this outcome.

I believe this situation could be avoided if the SmartArray had a feature that performs a background consistency check on all the data stripes in the RAID 5 drive set.

Does the SmartArray have such a feature? If so, how can it be invoked?

YAN
Registered user
Posts: 67
Joined: Thursday, 30 December 2010, 22:49

Re: SmartArray RAID 5 stripe consistency check

Post by YAN » Friday, 18 May 2012, 14:35

The Smart Array controllers do have this feature, and it runs in the background.
Whenever the controller is idle, it scans the entire array from beginning to end, then goes back to the start and repeats.

Unfortunately, on an active array it can take some time to get all the way through if the scan is constantly being interrupted by IO.
The controller waits a set amount of idle time before continuing the scan, and on a busy array it might not reach that idle time often enough.
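
The effect of the idle timer can be sketched with a small simulation. This is a toy model under stated assumptions, not the controller's actual scheduling: time advances in one-second ticks, the scan covers one block per idle second once the array has been idle for more than `delay` seconds, and any host IO resets the idle counter. The function name and parameters are hypothetical.

```python
from typing import Sequence


def blocks_scanned(io_activity: Sequence[bool], delay: int) -> int:
    """Count blocks scanned given per-second host IO activity.

    io_activity[t] is True when the host issued IO during second t.
    The scan only runs once the array has been idle for more than
    `delay` consecutive seconds, and any IO resets the idle counter.
    """
    idle_seconds = 0
    scanned = 0
    for busy in io_activity:
        if busy:
            idle_seconds = 0
            continue
        idle_seconds += 1
        if idle_seconds > delay:
            scanned += 1
    return scanned


# 60 seconds of a busy array: one IO burst every 5 seconds.
busy_array = [t % 5 == 0 for t in range(60)]

long_delay = blocks_scanned(busy_array, delay=15)  # idle gaps never reach 15 s
short_delay = blocks_scanned(busy_array, delay=3)  # one block per 4-second gap
```

In this model the scan makes no progress at all with the longer delay, while the shorter delay lets it advance a little in every gap between IO bursts.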

To help mitigate this somewhat, you can set the Surface Scan Delay value in the Controller Settings in ACU.
Setting this value lower can negatively affect performance, although I believe the impact would be minimal.

MUDBOY
Founder
Posts: 3882
Joined: Tuesday, 28 December 2010, 21:17

Re: SmartArray RAID 5 stripe consistency check

Post by MUDBOY » Friday, 18 May 2012, 14:37

The Smart Array controllers have a feature called Dynamic Sector Repair.
From the SmartArray controller tech brief:
Dynamic sector repair

Under normal operating conditions and over time, disk drive media can develop defects caused by variances in the drive mechanisms. To protect data from media defects, HP built a dynamic sector repair feature into Smart Array controllers.
During inactive periods, Smart Array controllers configured with a fault-tolerant logical drive perform a background surface analysis, continually scanning all drives for media defects. During busy periods, Smart Array controllers can also detect media defects when accessing a bad sector. If a Smart Array controller finds a recoverable media defect, the controller automatically remaps the bad sector to a reserve area on the disk drive. If the controller finds an unrecoverable media defect and a fault-tolerant logical drive is configured, the controller automatically regenerates the data and writes it to the remapped reserved area on the disk drive.
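
The repair step described in the tech brief can be sketched as follows. This is only an illustrative model under stated assumptions, not HP's implementation: each stripe is a list with one block per drive, `None` marks a sector with an unrecoverable media defect, and overwriting the list entry stands in for remapping the sector and writing the regenerated data.

```python
from typing import List, Optional

Stripe = List[Optional[bytes]]  # one block per drive; None = unreadable sector


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def surface_scan(stripes: List[Stripe]) -> int:
    """Walk every stripe and regenerate any single unreadable block.

    Returns the number of blocks repaired. A stripe with two or more
    unreadable blocks is left alone, because single parity cannot recover it.
    """
    repaired = 0
    for stripe in stripes:
        bad = [i for i, block in enumerate(stripe) if block is None]
        if len(bad) != 1:
            continue
        good = [block for block in stripe if block is not None]
        # Stand-in for "remap the sector and write the regenerated data".
        stripe[bad[0]] = xor_blocks(good)
        repaired += 1
    return repaired


d0, d1 = b"\x0f\xf0", b"\x33\x55"
parity = xor_blocks([d0, d1])
array = [[d0, d1, parity], [d0, None, parity]]  # second stripe has a latent defect

repaired_count = surface_scan(array)
```

The point of running this while all drives are still healthy is that the defect is fixed while the redundancy needed to fix it still exists.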
This is where this feature is configured in ACU:
[Attachment: ACU.png]
The surface analysis process can be monitored via ADU.

Make sure the latest Smart Array firmware is loaded. HP has released Smart Array firmware updates that improve bad block handling.
For example, from the P400 FW 7.22 release notes:
• Note: The Smart Array Firmware allows for early media error detection so Firmware can detect UREs sooner and mark sectors bad. To enable this feature in ACU version 8.60 or later, the end user must set the Surface Scan Analysis Priority to "High".
Finally, depending on the Smart Array model, I suggest using RAID 6, which offers double parity protection and greatly reduces the risk of rebuild failures.
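
As a rough illustration of the rebuild risk, the chance of hitting at least one unrecoverable read error (URE) while reading every surviving drive in full can be estimated as 1 - (1 - p)^n, where p is the per-bit error rate and n is the number of bits read. The figures below (1 URE per 10^14 bits, 1 TB drives, independent errors) are assumptions chosen for illustration, not values from HP documentation, and the function name is hypothetical.

```python
import math


def p_ure_during_rebuild(surviving_drives: int, drive_bytes: float, ure_per_bit: float) -> float:
    """Probability of at least one URE while reading every surviving drive in full.

    Assumes errors are independent and equally likely for every bit read.
    """
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Hypothetical array: 4 drives of 1 TB each, one failed, so 3 are read in full.
risk = p_ure_during_rebuild(surviving_drives=3, drive_bytes=1e12, ure_per_bit=1e-14)
print(f"Chance of a URE during the rebuild: {risk:.1%}")
```

Under RAID 5 any such error leaves a stripe that cannot be rebuilt. With RAID 6, a single unreadable sector during a one-drive rebuild can still be regenerated from the second parity block.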