→Software RAID: A small cleanup
As noted in Talk:Error recovery control#Completely agree - preoccupation with hardware vs. software RAID, this should improve things (the whole article could use much more work, of course)
Line 1:
{{Refimprove|date=April 2010}}
In [[computing]], '''error recovery control''' ('''ERC''') ([[Western Digital]]: '''time-limited error recovery''' ('''TLER'''), [[Samsung]]/[[Hitachi GST|Hitachi]]: '''command completion time limit''' ('''CCTL''')) is a feature of [[hard disk]]s which allows a system administrator to configure the amount of time a drive's [[firmware]] is allowed to spend recovering from a read or write error. Limiting the recovery time allows for improved error handling in
==Overview==
Modern [[hard drive]]s feature an ability to recover from some read/write errors by internally remapping [[Disk sector|sectors]] and performing other forms of self-test and recovery. This process can sometimes take several seconds or (under heavy usage) minutes, during which time the drive is unresponsive. Hardware RAID controllers and software RAID implementations are designed to recognise a drive which does not respond within a few seconds, and mark it as unreliable, indicating that it should be withdrawn from use and the array rebuilt from [[Parity bit#Parity block|parity data]]. Rebuilding is a long process that degrades performance, and if more drives fail under the resulting additional workload, the outcome may be catastrophic.
If the drive itself is inherently reliable but has some bad sectors, then TLER and similar features prevent a disk from being unnecessarily marked as 'failed' by limiting the time spent on correcting detected errors before advising the array controller of a failed operation. The array controller can then handle the data recovery for the limited amount involved, rather than marking the entire drive as faulty.
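The interaction described above can be sketched as a small model. This is only an illustration, not the behaviour of any particular controller or firmware: the function and field names are hypothetical, and the 8-second controller timeout is an assumed stand-in for "a few seconds". The 7-second limit matches the enterprise default mentioned in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    drive_dropped: bool            # controller marked the whole drive as failed
    rebuilt_from_redundancy: bool  # controller recovered just the affected data
    seconds_waited: float          # how long the controller waited for the drive


def handle_read_error(recovery_needed_s: float,
                      controller_timeout_s: float,
                      erc_limit_s: Optional[float]) -> Outcome:
    """Toy model: erc_limit_s=None means ERC/TLER is disabled (hypothetical API)."""
    if erc_limit_s is not None and recovery_needed_s > erc_limit_s:
        # The drive gives up at the limit and reports the failed operation.
        response_time_s = erc_limit_s
        drive_reports_error = True
    else:
        # The drive keeps trying until its internal recovery finishes.
        response_time_s = recovery_needed_s
        drive_reports_error = False

    if response_time_s > controller_timeout_s:
        # No answer in time: the controller treats the drive as unreliable.
        return Outcome(True, False, controller_timeout_s)
    return Outcome(False, drive_reports_error, response_time_s)


# A sector that needs 30 s of internal recovery, with an assumed 8 s controller timeout.
with_erc = handle_read_error(30.0, 8.0, erc_limit_s=7.0)
without_erc = handle_read_error(30.0, 8.0, erc_limit_s=None)
```

In this model the drive with a 7-second limit stays in the array and only the affected data is recovered from redundancy, while the same drive without a limit is dropped once the controller's timeout expires.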
==Desktop computers and TLER==
Effectively, TLER and similar features limit the performance of on-drive error handling, to allow hardware RAID controllers and software RAID implementations to handle the error if problematic
Generally, Western Digital [[Enterprise disk drive|enterprise drives]] such as [[Western Digital Raptor|Raptor]], Caviar RE2 and RE2-GP (RAID Edition) come with TLER Read "Enabled" (7 seconds) and TLER Write "Enabled" (7 seconds) while desktop drives such as Caviar SE, SE16, and GP come with TLER Read and Write Disabled (0 seconds).
== Standalone vs. RAID considerations ==
It is best for TLER to be "
In a stand-alone configuration, TLER should be disabled. As the drive is not redundant, reporting sectors as failed will only increase manual intervention. Without a hardware RAID controller or a software RAID implementation to drop the disk, normal (no TLER) recovery ability is most stable.
In a software RAID configuration, whether or not TLER is helpful depends on the operating system. For example, in FreeBSD the ATA/CAM stack controls the timeouts and is set to increase them progressively as timeouts occur. Thus, if a desktop disk without TLER starts delaying a response to a sector read, FreeBSD will retry the read with successively longer timeouts to prevent prematurely dropping the disk out of the array.
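The retry-with-longer-timeouts idea can be illustrated with a minimal sketch. The numbers and function names here are hypothetical and are not taken from FreeBSD; the model also assumes, for simplicity, that the drive needs the same amount of time to answer on every attempt.

```python
from typing import List, Optional


def timeout_schedule(initial_s: float, factor: float, attempts: int) -> List[float]:
    """Timeouts that grow geometrically with each retry (assumed policy)."""
    return [initial_s * factor ** i for i in range(attempts)]


def first_successful_attempt(drive_response_s: float,
                             schedule: List[float]) -> Optional[int]:
    """Return the 1-based attempt on which the drive answers in time, or None."""
    for attempt, timeout_s in enumerate(schedule, start=1):
        if drive_response_s <= timeout_s:
            return attempt
    # Every retry timed out: only now would the disk be dropped.
    return None


schedule = timeout_schedule(initial_s=5.0, factor=2.0, attempts=4)  # 5, 10, 20, 40 s
slow_but_alive = first_successful_attempt(15.0, schedule)
unresponsive = first_successful_attempt(60.0, schedule)
```

With these assumed values, a disk that takes 15 seconds to respond is kept in the array because the third attempt allows 20 seconds, whereas a disk that never responds within 40 seconds exhausts the schedule.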