Error recovery control

{{Refimprove|date=April 2010}}
In [[computing]], '''error recovery control''' ('''ERC''') ([[Western Digital]]: '''time-limited error recovery''' ('''TLER'''), [[Samsung]]/[[Hitachi GST|Hitachi]]: '''command completion time limit''' ('''CCTL''')) is a feature of [[hard disk]]s that allows a system administrator to configure the amount of time a drive's [[firmware]] is allowed to spend recovering from a read or write error. Limiting the recovery time allows for improved error handling in hardware or software [[RAID]] environments. In some cases, it is unclear whether error handling should be undertaken by the hard drive or by the RAID implementation; this conflict can lead to drives being marked as unusable and to significant performance degradation that could otherwise have been avoided.
 
==Overview==
Modern [[hard drive]]s feature an ability to recover from some read/write errors by internally remapping [[Disk sector|sectors]] and performing other forms of self-test and recovery. The process for this can sometimes take several seconds or (under heavy usage) minutes, during which time the drive is unresponsive. Hardware RAID controllers and software RAID implementations are designed to recognize a drive which does not respond within a few seconds, and mark it as unreliable, indicating that it should be withdrawn from use and the array rebuilt from [[Parity bit#Parity block|parity data]]. This is a long process that degrades performance, and if more drives fail under the resulting additional workload, the result may be catastrophic.
 
If the drive itself is inherently reliable but has some bad sectors, then TLER and similar features prevent a disk from being unnecessarily marked as 'failed' by limiting the time spent on correcting detected errors before advising the array controller of a failed operation. The array controller can then handle the data recovery for the limited amount involved, rather than marking the entire drive as faulty.
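
The interaction described above can be illustrated with a small model. This is only a sketch under simplifying assumptions, not any vendor's firmware or controller logic: the function name <code>read_outcome</code> and all timings used with it are hypothetical values chosen for the illustration.

```python
from typing import Optional


def read_outcome(recovery_needed_s: float,
                 controller_timeout_s: float,
                 erc_limit_s: Optional[float] = None) -> str:
    """Model a single read of a bad sector (hypothetical helper).

    recovery_needed_s: time the drive would need to finish internal recovery.
    controller_timeout_s: how long the RAID layer waits before giving up.
    erc_limit_s: ERC/TLER limit in seconds; None or 0 means no limit is set.
    """
    # The limit only matters if it is set and recovery would run past it.
    limited = bool(erc_limit_s) and recovery_needed_s > erc_limit_s
    # With a limit in force, the drive answers no later than erc_limit_s.
    busy_s = erc_limit_s if limited else recovery_needed_s
    if busy_s > controller_timeout_s:
        return "drive dropped from array"
    if limited:
        return "sector error reported; RAID layer repairs from redundancy"
    return "drive recovered the sector itself"
```

In this model a 60-second recovery with an 8-second controller timeout drops the drive, while the same recovery with a 7-second limit only reports a failed sector. A limit longer than the controller timeout does not help, which is why the limit has to be shorter than the timeout of the RAID layer.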
 
==TLER in a non-RAID environment==
Effectively, TLER and similar features limit the performance of on-drive error handling, so that hardware RAID controllers and software RAID implementations can handle the error instead. In a non-RAID environment, such features are unhelpful, and manufacturers do not recommend their use.

==Typical defaults==
Generally, Western Digital [[Enterprise disk drive|enterprise drives]] such as [[Western Digital Raptor|Raptor]], Caviar RE2 and RE2-GP (RAID Edition) come with TLER Read "Enabled" (7 seconds) and TLER Write "Enabled" (7 seconds), while desktop drives such as Caviar SE, SE16, and GP come with TLER Read and Write Disabled (configured as 0 seconds, to disable).

TLER can be enabled or disabled on certain Western Digital drives using the WDTLER tool on a DOS boot disk. Western Digital states that this feature cannot be disabled. [http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1478 WD Customer Help FAQ 1478] However, users and independent editors have reported that this feature can be disabled. [http://techreport.com/reviews/2006q3/wd-500s/index.x?pg=1 TechReport.com] [http://www.silentpcreview.com/forums/viewtopic.php?t=38385&sid=16f0f601b1174a4ebf3fd6fd295b2351 SilentPCReview.com] [http://www.fatwallet.com/forums/messageview.php?catid=18&threadid=660910&start=0 FatWallet.com] [http://www.anandtech.com/storage/showdoc.aspx?i=2922 AnandTech.com] The tool disables the feature by setting the values for read and write to 0 seconds.
 
== Standalone vs. RAID considerations ==
It is best for TLER to be "enabled" when in a RAID array to prevent the recovery time from a disk read or write error from exceeding the RAID implementation's timeout threshold. If a drive times out, the hard disk will need to be manually re-added to the array, requiring a re-build and re-synchronization of the hard disk. Enabling TLER seeks to prevent this by interrupting error correction before timeout, to report failures only for data segments. The result is increased reliability in a RAID array.
 
In a stand-alone configuration TLER should be disabled. As the drive is not redundant, reporting segments as failed will only increase manual intervention. Without a hardware RAID controller or a software RAID implementation to drop the disk, normal (no TLER) recovery ability is most stable.
 
In a software RAID configuration whether or not TLER is helpful is dependent on the operating system. For example, in FreeBSD the ATA/CAM stack controls the timeouts, and is set to progressively increase the timeouts as they occur. Thus, if a desktop disk without TLER starts delaying a response to a sector read, FreeBSD will retry the read with successively longer timeouts to prevent prematurely dropping the disk out of the array.
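
The idea of retrying with successively longer timeouts can be sketched as follows. This is a generic illustration, not FreeBSD's actual algorithm: the function name, the initial timeout, the growth factor and the attempt limit are all made-up values, and the model assumes the drive needs the same time to respond on every attempt.

```python
from typing import Optional


def attempts_until_response(response_time_s: float,
                            initial_timeout_s: float = 5.0,
                            growth: float = 2.0,
                            max_attempts: int = 4) -> Optional[int]:
    """Return the 1-based attempt on which the read completes, or None.

    All defaults are illustration values only, not FreeBSD's settings.
    """
    timeout_s = initial_timeout_s
    for attempt in range(1, max_attempts + 1):
        if response_time_s <= timeout_s:
            return attempt  # the drive answered within this attempt's timeout
        timeout_s *= growth  # wait longer on the next retry
    return None  # every attempt timed out; the disk would be given up on
```

With these illustrative values, a drive that needs 12 seconds is not dropped after the first 5-second timeout; it succeeds on the third attempt, once the timeout has grown to 20 seconds.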
 
{| class="wikitable"
|-
! Model
! TLER default (read / write)
! Stand-alone recommendation
! RAID recommendation
|-
| '''Caviar, SE, SE16, GP, Raptor'''
| ''Disabled ( 0s / 0s )''
| ''Default''
| Enabled (if possible)
|-
| '''Caviar RE2, RE2-GP, Red'''
| ''Enabled ( 7s / 7s )''
| Disabled ( 0s / 0s )
| ''Default''
|}
 
=== ZFS ===
The [[ZFS|ZFS filesystem]] was designed to immediately write data to a sector that reports as bad or takes an excessively long time to read (such as non-TLER drives); this will usually force an immediate sector remap on a weak sector in most drives.{{Citation needed|date=March 2022}}
 
===RAID controllers===
Disconnect timeout values for different hardware [[Disk array controller|RAID controllers]] may vary between vendors, so the TLER limit should expire before the controller times out the drive. For example, 3ware 9650SE uses 20 seconds as the timeout,<ref>{{cite web|url=http://kb.lsi.com/KnowledgebaseArticle15639.aspx|archiveurl=https://web.archive.org/web/20120203053819/http://kb.lsi.com/KnowledgebaseArticle15639.aspx|title=User Guide for 9650SE 9690SA from 9.5.2 Complete Codeset|archivedate=3 February 2012|work=lsi.com|accessdate=10 June 2015}}</ref> while for the LSI Logic used in IBM x-series it is 10 seconds.<ref>Available in BIOS Raid Config Utility > Advanced Device Properties</ref>

Widely available [[Intel Rapid Storage Technology|Intel Matrix RAID / Intel Rapid Storage Technology]], embedded in [[Intel]] server motherboards and modern desktop motherboards, is a pseudo-hardware controller, not a true hardware RAID controller.

===Software RAID===
Linux [[mdadm]] simply holds and lets the drive complete its recovery. However, the default command timeout for the SCSI disk layer ({{Mono|/sys/block/sd?/device/timeout}}) is 30 seconds,<ref>{{cite web|url=https://github.com/torvalds/linux/blob/master/drivers/scsi/sd.h#LC12|title=linux/sd.h at master · torvalds/linux · GitHub|work=GitHub}}</ref> after which it will attempt to reset the drive and, if that fails, put the drive offline.<ref>{{cite web|url=https://www.kernel.org/doc/html/v5.9/scsi/scsi_eh.html|title= Linux SCSI Subsystem: SCSI EH|work=kernel.org}}</ref>

== Changing ERC ==

===ATA-8 standard===
The 2006 ATA-8 standard defines an SCT {{tt|Error Recovery Control}} command.<ref>[https://www.singlix.org/trdos/8086/archive/specs/D1699r3e-ATA8_ACS.pdf ATA/ATAPI Command Set (ATA8-ACS) ]</ref> For hard drives that implement this interface, the {{Mono|smartctl}} utility (part of the [[smartmontools]] package) can be used to change the error-recovery timeout via {{code|-l scterc}}.<ref name=greg>{{Cite web |author=Richard Gregory |url=http://abatis.org.uk/projects/erc/ |title=Author's description of the original patch to smartctl that implemented that feature |access-date=2013-02-15 |archive-url=https://web.archive.org/web/20130910034510/http://cgi.csc.liv.ac.uk:80/~greg/projects/erc/ |archive-date=2013-09-10 |url-status=live }}</ref> In 2018, ACS-4 added functionality for the setting to persist across reboots; it is now supported by smartctl.<ref>{{cite web |title=#1427 (Add support for SCT Error Recovery Timer features added in ACS-4) – smartmontools |url=https://www.smartmontools.org/ticket/1427 |website=www.smartmontools.org}}</ref>

Controlling the timeout behavior through the {{Mono|smartctl}} utility may not work on all hard disk drives because some manufacturers have changed their desktop drives not to include the support for the ERC parameter,<ref>{{cite web|url=http://www.spinics.net/lists/raid/msg38964.html|title=Re: md RAID with enterprise-class SATA or SAS drives|work=spinics.net}}</ref><ref>{{cite web|url=http://knowledge.seagate.com/articles/en_US/FAQ/203991en|title=Seagate FAQ: What is Error Recovery Control?|work=seagate.com}}</ref> purportedly to force sales of their more expensive RAID/enterprise models.{{Citation needed|date=April 2016}} Richard Gregory, who wrote the original ERC patch for smartctl, reports that Western Digital retracted ERC support by releasing a new model without notice.<ref name=greg/>
 
On Windows, the HDAT2 program is available in addition to smartctl (which is cross-platform).<ref name=greg/>
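
As a hedged illustration of the {{Mono|smartctl}} route, the sketch below only builds the command line and does not run it. {{Mono|smartctl}} takes the read and write limits in tenths of a second, so seven seconds is written as 70. The helper names and the device path <code>/dev/sda</code> are placeholders chosen for this example, and the 30-second figure is the Linux SCSI disk layer default quoted in this article.

```python
from typing import List


def seconds_to_deciseconds(seconds: float) -> int:
    """Convert seconds to the tenths-of-a-second unit used by scterc."""
    return round(seconds * 10)


def build_scterc_command(device: str, read_s: float, write_s: float) -> List[str]:
    """Build (but do not execute) a smartctl call that sets the ERC limits.

    A value of 0 corresponds to the "disabled" setting described in the text.
    """
    read_ds = seconds_to_deciseconds(read_s)
    write_ds = seconds_to_deciseconds(write_s)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


def erc_below_command_timeout(erc_s: float, command_timeout_s: float = 30.0) -> bool:
    """True if a non-zero ERC limit expires before the OS command timeout."""
    return 0 < erc_s < command_timeout_s
```

The resulting list can be passed to a process launcher such as <code>subprocess.run</code>; changing the setting normally requires administrator privileges, and drives without ERC support will reject the command.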
<!-- Text Output from DOS applications is considered "computer generated screen output" and usage is considered "fair use" under copyright law. The Copyright notice shown is "screen output" from the copyrighted WDTLER utility and not the screen output. -->
 
=== SCSI standard ===
SBC-4 describes a RECOVERY TIME LIMIT field in the Read-Write Error Recovery mode page used to define how the drive performs error recovery.<ref>{{cite web |title=INCITS 506-202x - Information technology - SCSI Block Commands - 4 (SBC-4) draft revision 22 |url=https://standards.incits.org/apps/group_public/download.php/124286/livelink |access-date=22 May 2023 |date=15 September 2020}}</ref> The sdparm program can change this setting with {{code|1=--set=RTL}}.<ref>{{man|8|sdparm|Linux}}</ref>

=== Vendor utilities ===

==== Western Digital ====
A {{Mono|WDTLER.EXE}} utility allows the enabling or disabling of the TLER parameter on Western Digital hard drives. This utility is written for [[DOS]] and is run from a DOS boot disk. The utility works on and makes changes to all compatible Western Digital hard disk drives connected to the computer; to change only specific drives, the other drives have to be disconnected before the utility is run. The change survives power-cycling. Western Digital used to mention the tool in an FAQ.<ref name=customer-service>{{cite web |title=TLER / CCTL / ERC thread |url=https://hardforum.com/threads/tler-cctl-erc-thread.1562128/ |website=[H]ard{{!}}Forum |date=16 November 2010}}</ref>

The WDTLER utility comes with three batch files: {{Mono|TLERSCAN.BAT}} to get the current state of the TLER setting on all the hard drives, {{Mono|TLER-ON.BAT}} to enable TLER, and {{Mono|TLER-OFF.BAT}} to disable TLER. The included {{Mono|TLER-ON.BAT}} will set the read and write TLER time to seven seconds. It is possible to use the {{Mono|WDTLER.EXE}} utility directly with the <code>-r# -w#</code> parameters for a custom timeout.

On Western Digital Caviar SE16 320 GB and 500 GB drives, the utility reports "Read TLER is disabled." and "Write TLER is disabled." for each drive in the default configuration. After TLER has been enabled with a limit of 7 seconds for read and write, the output is:

<pre>
WDTLER Version 1.03
Copyright (C) 2004-2006 Western Digital Corporation
Western Digital Time Limit Error Recovery Utility

Model: WDC WD3200KS-00PFB0 Serial Number: WD-WCAPD1234567
Read TLER time is 7.000 seconds.
Write TLER time is 7.000 seconds.

Model: WDC WD3200KS-00PFB0 Serial Number: WD-WCAPD1234567
Read TLER time is 7.000 seconds.
Write TLER time is 7.000 seconds.

Model: WDC WD5000KS-00MNB0 Serial Number: WD-WMANU1234567
Read TLER time is 7.000 seconds.
Write TLER time is 7.000 seconds.

Model: WDC WD5000KS-00MNB0 Serial Number: WD-WMANU1234567
Read TLER time is 7.000 seconds.
Write TLER time is 7.000 seconds.
</pre>
<small>''Legend: WD3200KS - Western Digital Caviar SE16 320 GB, WD5000KS - Western Digital Caviar SE16 500 GB''</small>

Western Digital claims that using the {{Mono|WDTLER.EXE}} utility on newer drives can damage the firmware and make the disk unusable. The utility is no longer available from Western Digital, and the TLER setting cannot be changed on new drives. RE disks are only suitable for RAID arrays and Caviar disks are only suitable for non-RAID use. The utility still works for older drives{{which|date=May 2023}}<!-- what is the cutoff? -->.

==== Hitachi ====
Hitachi customer service stated in 2009 that there is a Feature Tool for changing ERC (referred to as CCTL).<ref name=customer-service/>

==== Seagate ====
Seagate provides an {{Mono|openSeaChest}} utility for interrogating and changing many firmware settings, including TLER. If <code>smartctl -l scterc,x,y</code> cannot be used to set the TLER, the relevant commands are <code>openSeaChest_Configure -d /dev/sg0 --sctReadTimer</code> and <code>openSeaChest_Configure -d /dev/sg0 --sctWriteTimer</code>.
 
==References==
{{Reflist}}
 
==External links==
* [https://raid.wiki.kernel.org/index.php/Timeout_Mismatch Linux Raid wiki: Timeout Mismatch]
* [https://archive.today/20130121054825/http://wdc.custhelp.com/app/answers/detail/a_id/1397/p/227,283/session/L3RpbWUvMTMyMTQzOTc4NS9zaWQvdVhvYmpmSms%3D Western Digital FAQ answer ID 1397: Difference between Desktop edition and RAID (Enterprise) edition drives]
* [http://www.wdc.com/wdproducts/library/other/2579-001098.pdf Time-Limited Error Recovery (TLER) Information Sheet], Western Digital, January 2013
* [https://web.archive.org/web/20071103042201/http://www.samsung.com/global/business/hdd/learningresource/whitepapers/LearningResource_CCTL.html Samsung CCTL]
 
[[Category:Rotating disc computer storage media]]
[[Category:Hard disk computer storage]]
[[de:TLER]]
[[nl:Time-Limited Error Recovery]]