We have recently experienced a SmartArray issue which has so far affected two of the servers.
The scenario is as follows:
One of the HDDs in their RAID 5 array has failed.
The SmartArray controller automatically starts a rebuild on the hot spare drive.
However, partway through the rebuild onto the hot spare, the rebuild stops because the RAID controller detects a bad sector on one of the remaining data drives.
So the hot spare cannot effectively save them from a failed HDD.
We will now have to recreate the entire RAID 5 LUN and restore from a backup.
We are very dissatisfied as a result.
It seems this situation could be avoided if the SmartArray had a feature to perform a background consistency check on all the data stripes of that RAID 5 drive set.
Does the SmartArray have such a feature? If so, how can it be invoked?
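To make the failure mode concrete, here is a rough Python sketch of single-parity (RAID 5 style) XOR reconstruction. It only illustrates the general principle: the function names (`xor_blocks`, `rebuild_missing`, `scrub`) are made up for this example and say nothing about how the Smart Array firmware is actually implemented. It shows why one unreadable sector on a surviving drive stops a rebuild, and why a background check that runs while all drives are still healthy can repair that sector in time.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(stripe, failed):
    """Regenerate the block of drive `failed` from every other block in the stripe.

    `stripe` holds one block per drive (data blocks plus one parity block);
    None marks a block that cannot be read.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed]
    if any(blk is None for blk in survivors):
        raise IOError("unreadable sector on a surviving drive; stripe cannot be rebuilt")
    return xor_blocks(survivors)

def scrub(stripe):
    """Background check on a healthy array: repair one unreadable block from parity."""
    bad = [i for i, blk in enumerate(stripe) if blk is None]
    if len(bad) == 1:
        stripe[bad[0]] = rebuild_missing(stripe, bad[0])
    return bad

data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
stripe = data + [xor_blocks(data)]           # three data blocks plus parity

# Case 1: drive 0 fails, all other blocks are readable, so the rebuild succeeds.
assert rebuild_missing(stripe, 0) == data[0]

# Case 2: drive 0 fails AND drive 2 has a latent bad sector, so the rebuild stops.
broken = list(stripe)
broken[2] = None
try:
    rebuild_missing(broken, 0)
except IOError as err:
    print("rebuild stopped:", err)

# Case 3: the bad sector is found by a scrub before any drive fails, and is repaired.
healthy = list(stripe)
healthy[2] = None
scrub(healthy)
assert healthy == stripe
```

With single parity there is exactly one redundant block per stripe, so a stripe tolerates one missing block. A failed drive uses up that allowance, which is why a latent bad sector has to be found and fixed before a drive fails rather than during the rebuild.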
SmartArray RAID 5 stripe consistency check
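Moderator: xyevolve
Forum rules
1. This board is a technical discussion area for HP servers.
2. This board encourages posting so that technical problems are discussed in the open, and discourages private exchanges via site messages; knowledge shared by many is better than knowledge kept by one.
3. Reposting or quoting other people's work is allowed, but the original author must be credited.
4. Posts about selling, buying, or other non-technical topics are prohibited.
5. Spam is prohibited, including but not limited to replies unrelated to the topic under discussion, meaningless characters, and direct copies of other replies.
6. Attachments on this site must not be used for commercial purposes; please delete them within 24 hours of downloading. This site accepts no responsibility for any consequences.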
-
- Registered user
- Posts: 67
- Registered: Thursday, 30 December 2010, 22:49
Re: SmartArray RAID 5 stripe consistency check
The Smart Array controllers have this feature and it is actively working in the background.
Whenever the controller is idle it is actively scanning the entire array from beginning to end, then going back to the start and repeating.
Unfortunately on an active array it can take some time to get all the way through if it is constantly being interrupted by IO.
The controller will wait a set amount of idle time before continuing the scan, and on a busy array it might not reach that idle time often enough.
To help mitigate this somewhat, you can lower the Surface Scan Delay value in the Controller Settings in ACU.
Setting this value lower can negatively affect performance, although I believe the impact would be minimal.
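As a rough illustration of why the delay matters on a busy array, here is a toy Python model of an idle-triggered scan. It is a simplified sketch under my own assumptions, not the controller's actual algorithm: the function `scan_progress`, its parameters, and the numbers used are all hypothetical. The model assumes scanning starts only after the array has been idle for `delay` seconds and stops again at the next host IO.

```python
def scan_progress(io_times, delay, total_time, blocks_per_second=1):
    """Count blocks scanned during `total_time` seconds of wall-clock time.

    `io_times` are the moments (in seconds, between 0 and total_time) at which
    host IO arrives. After each IO the scan waits `delay` idle seconds before
    resuming, and it is interrupted again by the next IO.
    """
    scanned = 0
    last_io = 0                      # treat time 0 as the most recent IO
    for t in sorted(io_times) + [total_time]:
        idle_gap = t - last_io
        # Only the part of the idle gap beyond the delay is spent scanning.
        scanned += max(0, idle_gap - delay) * blocks_per_second
        last_io = t
    return scanned

# Host IO arrives every 5 seconds over a 60 second window.
busy = range(5, 60, 5)
print(scan_progress(busy, delay=15, total_time=60))  # 0: the idle timer never expires
print(scan_progress(busy, delay=3, total_time=60))   # 24
print(scan_progress(busy, delay=1, total_time=60))   # 48
```

In this model, a delay longer than the typical gap between IOs means the scan makes no progress at all, while a shorter delay lets it use the small idle windows, at the cost of the scan competing with host IO more often.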
- MUDBOY
- Founder
- Posts: 3882
- Registered: Tuesday, 28 December 2010, 21:17
Re: SmartArray RAID 5 stripe consistency check
The Smart Array controllers have a feature called Dynamic Sector Repair.
From the SmartArray controller tech brief:
Dynamic sector repair
Under normal operating conditions and over time, disk drive media can develop defects caused by variances in the drive mechanisms. To protect data from media defects, HP built a dynamic sector repair feature into Smart Array controllers.
During inactive periods, Smart Array controllers configured with a fault-tolerant logical drive perform a background surface analysis, continually scanning all drives for media defects. During busy periods, Smart Array controllers can also detect media defects when accessing a bad sector. If a Smart Array controller finds a recoverable media defect, the controller automatically remaps the bad sector to a reserve area on the disk drive. If the controller finds an unrecoverable media defect and a fault-tolerant logical drive is configured, the controller automatically regenerates the data and writes it to the remapped reserved area on the disk drive.
This is where this feature is configured in ACU: (screenshot attachment)
The surface analysis process can be monitored via ADU.
Make sure the latest SA firmware is loaded; HP has released SA firmware updates that enhance bad block handling.
E.g. P400 FW 7.22 release notes:
• Note: The Smart Array Firmware allows for early media error detection so Firmware can detect UREs sooner and mark sectors bad. To enable this feature in ACU version 8.60 or later, the end user must set the Surface Scan Analysis Priority to "High".
And finally, depending on the SA model, I suggest using RAID-6, which offers double parity protection and greatly reduces the risk of rebuild failures.
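To put the RAID-6 suggestion in rough numbers, here is a back-of-the-envelope Python estimate of the chance of hitting at least one unrecoverable read error (URE) while reading all surviving drives during a RAID 5 rebuild. The URE rate of 1 error per 10^14 bits is an assumed spec-sheet style figure, not a value from HP, and the model assumes errors are independent; the function name `p_read_error` and the drive sizes are only for illustration.

```python
import math

def p_read_error(bytes_read, ure_per_bit=1e-14):
    """Probability of at least one URE while reading `bytes_read` bytes.

    Computes 1 - (1 - p)^n for n bits, using log1p/expm1 so the tiny
    per-bit probability p does not get lost to floating point rounding.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))

# RAID 5 with four 1 TB drives: a rebuild has to read the three survivors in full.
surviving_bytes = 3 * 10**12
print(f"{p_read_error(surviving_bytes):.1%}")   # roughly 21% under these assumptions
```

With single parity, any one of those errors stops the rebuild of the affected stripe. With RAID-6, a stripe that hits a single URE during a one-drive rebuild can still be regenerated from the second parity, so the rebuild only fails if two problems land on the same stripe.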