Self-Monitoring, Analysis and Reporting Technology

Self-Monitoring, Analysis, and Reporting Technology, or S.M.A.R.T., is a monitoring system for hard disk drives that detects and reports various reliability indicators, in the hope of anticipating failures.

Operation

Hard disk failures fall essentially into two types:

Predictable failures, in which faults, especially those due to wear or aging, appear gradually. A monitoring system can detect them, much as the temperature warning light on a car's dashboard can alert the driver that the engine is starting to overheat before serious damage occurs.
Unpredictable failures, which happen suddenly and without warning, as when an electronic component burns out.

Monitoring a hard disk can predict about 60% of possible failures. The purpose of S.M.A.R.T. is to warn the user or system administrator that the hard disk is about to fail, leaving time to copy the data to another storage device.

Compaq was the first company to support S.M.A.R.T., but today most major hard disk and motherboard manufacturers support it at least in part. Many motherboards warn the user when the hard disk is about to fail. However, S.M.A.R.T. is currently not implemented correctly on many platforms, owing to the lack of a standard for exchanging S.M.A.R.T. data.

From a legal standpoint, the term "S.M.A.R.T." refers only to the exchange of data between the hard disk's electromechanical sensors and the computer, so some manufacturers include sensors for a single physical quantity and declare the product S.M.A.R.T.-compatible. For example, some manufacturers claim to support S.M.A.R.T. but do not include a temperature sensor. For electronic devices, reliability generally decreases as temperature rises, so this factor is crucial for predicting failures.

During periods of heavy use (such as defragmentation or operation as a web server), the temperature can exceed the manufacturer's specifications. Damage caused by excessive temperature is cumulative over time. A S.M.A.R.T. temperature sensor can inform the user before the disk is damaged by excessive heat, but many manufacturers do not include one in their S.M.A.R.T. feature set. As a result, the term S.M.A.R.T. is practically meaningless as a standard, because many manufacturers claim to support it but refuse to disclose which physical characteristics are monitored. This creates confusion and prevents users from comparing products properly.
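As a rough illustration of why a temperature reading matters, the following sketch totals the time a drive spends above a limit and finds the first reading that comes close to it. The function names, the 55 °C limit, the 5 °C warning margin, and the sample readings are all hypothetical; real limits come from the manufacturer's specifications.

```python
from typing import Iterable, List


def over_limit_minutes(samples_c: Iterable[float], limit_c: float,
                       interval_min: float = 1.0) -> float:
    """Total time above limit_c, assuming evenly spaced samples."""
    return sum(interval_min for temp in samples_c if temp > limit_c)


def first_warning_index(samples_c: List[float], limit_c: float,
                        margin_c: float = 5.0) -> int:
    """Index of the first sample within margin_c of the limit, or -1."""
    for index, temp in enumerate(samples_c):
        if temp >= limit_c - margin_c:
            return index
    return -1


# Hypothetical one-minute readings in degrees Celsius, for illustration only.
readings = [41.0, 47.5, 52.0, 56.5, 58.0, 54.0]
minutes_hot = over_limit_minutes(readings, limit_c=55.0)
warn_at = first_warning_index(readings, limit_c=55.0)
print(minutes_hot, warn_at)
```

Because heat damage is cumulative, the running total is the more useful figure; the early-warning index only shows that a sensor could raise an alert before the limit is actually crossed.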

Some controllers can duplicate write operations onto a secondary backup. This technology is known as RAID. However, many S.M.A.R.T. programs do not work when RAID is in use. The computer industry will probably support S.M.A.R.T. properly only when a significant number of users demand compatibility, standardization, and transparent openness about this technology from hard disk manufacturers.

Attributes

Each manufacturer defines a set of S.M.A.R.T. attributes and sets threshold values that the attributes should not cross during normal operation. An attribute's value ranges from 1 to 253 (1 is the worst and 253 the best). Depending on the manufacturer, a value of about 100 or 200 is chosen as the "normal" value. Manufacturers may not agree on attribute definitions or units of measurement.
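The relationship between an attribute's value and its threshold can be sketched as follows. This is a minimal model, not a vendor API: the class and method names are hypothetical, the readings are invented for illustration, and it assumes the common convention that an attribute counts as failed once its normalized value drops to or below the threshold.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One S.M.A.R.T. attribute as reported by a drive (hypothetical model)."""
    attr_id: int     # decimal ID, e.g. 5
    name: str
    value: int       # normalized value: 1 is worst, 253 is best
    threshold: int   # vendor-defined limit for this attribute

    def __post_init__(self) -> None:
        if not 1 <= self.value <= 253:
            raise ValueError(f"normalized value out of range: {self.value}")

    def has_failed(self) -> bool:
        # Assumption: lower is worse, so reaching the threshold means failure.
        return self.value <= self.threshold

    def margin(self) -> int:
        """Distance between the current value and the threshold."""
        return self.value - self.threshold


# Hypothetical readings, for illustration only.
healthy = SmartAttribute(5, "Reallocated Sectors Count", value=100, threshold=36)
worn = SmartAttribute(5, "Reallocated Sectors Count", value=30, threshold=36)
print(healthy.has_failed(), healthy.margin())
print(worn.has_failed(), worn.margin())
```

Since manufacturers choose different "normal" values, the margin is only comparable between readings of the same attribute on the same drive model.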

Legend
	A high raw value is better		A low raw value is worse
Critical		Potential indicator of an imminent electromechanical failure

ID	Hex	Attribute name	Better	Description
01	01	Read Error Rate		Indicates the number of hardware read errors that occurred while reading data from the disk surface. A non-zero value indicates a problem with the disk surface or the read/write heads. Note that Seagate hard disks often report a raw value that does not indicate problems and can be high even on newly purchased disks.
02	02	Throughput Performance		Overall throughput performance of the disk. If the value of this attribute drops, there is a high probability that the disk has a problem.
03	03	Spin-Up Time		Average time of spindle spin-up, from zero RPM to fully operational, in milliseconds.
04	04	Start/Stop Count		A tally of spindle start/stop cycles.
05	05	Reallocated Sectors Count		Count of reallocated sectors. When the hard drive finds a read/write/verification error, it marks this sector as "reallocated" and transfers data to a special reserved area (spare area). This process is also known as remapping and "reallocated" sectors are called remaps. This is why, on modern hard disks, "bad blocks" cannot be found while testing the surface — all bad blocks are hidden in reallocated sectors. However, the more sectors that are reallocated, the more read/write speed will decrease. A decrease in the attribute value indicates bad sectors.
06	06	Read Channel Margin		Margin of a channel while reading data. The function of this attribute is not specified.
07	07	Seek Error Rate		Rate of seek errors of the magnetic heads. Seek errors arise if there is a failure in the mechanical positioning system, servo damage, or thermal widening of the hard disk. More seek errors indicate a worsening condition of the disk surface and the mechanical subsystem.
08	08	Seek Time Performance		Average performance of seek operations of the magnetic heads. If this attribute is decreasing, it is a sign of problems in the mechanical subsystem.
09	09	Power-On Hours (POH)		Count of hours in power-on state. The raw value of this attribute shows total count of hours (or minutes, or seconds, depending on manufacturer) in power-on state.
10	0A	Spin Retry Count		Count of retry of spin start attempts. This attribute stores a total count of the spin start attempts to reach the fully operational speed (under the condition that the first attempt was unsuccessful). An increase of this attribute value is a sign of problems in the hard disk mechanical subsystem.
11	0B	Recalibration Retries		This attribute indicates the number of times recalibration was requested (under the condition that the first attempt was unsuccessful). A decrease of this attribute value is a sign of problems in the hard disk mechanical subsystem.
12	0C	Device Power Cycle Count		This attribute indicates the count of full hard disk power on/off cycles.
13	0D	Soft Read Error Rate		Uncorrected read errors reported to the operating system. If the value is non-zero, you should back up your data.
190	BE	Airflow Temperature (WDC)		Airflow temperature on Western Digital hard disks. Same as Temperature (C2), but the current value is 50 less for some models. Marked as obsolete.
190	BE	Temperature Difference from 100		Value is equal to (100 - temp °C), allowing the manufacturer to set a minimum threshold which corresponds to a maximum temperature. (Seagate only?) Reported present on Seagate ST910021AS, ST3802110A (2007-02-13), ST980825AS (2007-04-05), ST3320620AS (2007-04-23 and 2008-06-12), ST3500641AS (2007-06-12), ST3250824AS (2007-08-07), ST31000340AS (2008-02-05), ST3160211AS (2008-06-12), and ST3400620AS (2008-06-12), and on Samsung HD501LJ under the name "Airflow Temperature" (2008-03-02). [citation needed]
191	BF	G-sense error rate		Frequency of mistakes as a result of impact loads. [citation needed]
192	C0	Power-off Retract Count		Number of times the heads are loaded off the media. Heads can be unloaded without actually powering off. [citation needed] Called Emergency Retract Cycle Count by Fujitsu. [citation needed]
193	C1	Load/Unload Cycle		Count of load/unload cycles into head landing zone position. [citation needed]
194	C2	Temperature		Current internal temperature.
195	C3	Hardware ECC Recovered		Time between ECC-corrected errors.
196	C4	Reallocation Event Count		Count of remap operations. The raw value of this attribute shows the total number of attempts to transfer data from reallocated sectors to a spare area. Both successful & unsuccessful attempts are counted.
197	C5	Current Pending Sector Count		Number of "unstable" sectors (waiting to be remapped). If the unstable sector is subsequently written or read successfully, this value is decreased and the sector is not remapped. Read errors on the sector will not remap it; it is remapped only on a failed write attempt. This can be problematic to test because cached writes will not remap the sector, only direct I/O writes to the disk.
198	C6	Uncorrectable Sector Count		The total number of uncorrectable errors when reading/writing a sector. A rise in the value of this attribute indicates defects of the disk surface and/or problems in the mechanical subsystem.
199	C7	UltraDMA CRC Error Count		The number of errors in data transfer via the interface cable as determined by ICRC (Interface Cyclic Redundancy Check).
200	C8	Write Error Rate / Multi-Zone Error Rate		The total number of errors when writing a sector.
201	C9	Soft Read Error Rate		Number of off-track errors. If non-zero, make a backup.
202	CA	Data Address Mark errors		Number of Data Address Mark errors (or vendor-specific). [citation needed]
203	CB	Run Out Cancel		Number of ECC errors
204	CC	Soft ECC Correction		Number of errors corrected by software ECC. [citation needed]
205	CD	Thermal Asperity Rate (TAR)		Number of thermal asperity errors. [citation needed]
206	CE	Flying Height	?	Height of heads above the disk surface. [citation needed]
207	CF	Spin High Current	?	Amount of high current used to spin up the drive. [citation needed]
208	D0	Spin Buzz	?	Number of buzz routines to spin up the drive. [citation needed]
209	D1	Offline Seek Performance	?	Drive’s seek performance during offline operations. [citation needed]
220	DC	Disk Shift		Distance the disk has shifted relative to the spindle (usually due to shock). Unit of measure is unknown.
221	DD	G-Sense Error Rate		The number of errors resulting from externally-induced shock & vibration.
222	DE	Loaded Hours	?	Time spent operating under data load (movement of magnetic head armature). [citation needed]
223	DF	Load/Unload Retry Count	?	Number of times head changes position. [citation needed]
224	E0	Load Friction		Resistance caused by friction in mechanical parts while operating. [citation needed]
225	E1	Load/Unload Cycle Count		Total number of load cycles. [citation needed]
226	E2	Load 'In'-time	?	Total time of loading on the magnetic heads actuator (time not spent in parking area). [citation needed]
227	E3	Torque Amplification Count		Number of attempts to compensate for platter speed variations. [citation needed]
228	E4	Power-Off Retract Cycle		The number of times the magnetic armature was retracted automatically as a result of cutting power. [citation needed]
230	E6	GMR Head Amplitude	?	Amplitude of "thrashing" (distance of repetitive forward/reverse head motion). [citation needed]
231	E7	Temperature		Drive Temperature
240	F0	Head Flying Hours	?	Time while head is positioning. [citation needed]
250	FA	Read Error Retry Rate		Number of errors while reading from a disk
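A few conventions from the table can be expressed directly. The sketch below derives the Hex column from the decimal ID, inverts the "Temperature Difference from 100" encoding of attribute 190, and converts a Power-On Hours raw value whose unit depends on the manufacturer. The function names and the `unit` parameter are hypothetical, and the sample numbers are invented for illustration.

```python
def id_to_hex(attr_id: int) -> str:
    """Two-digit uppercase hex form of a decimal attribute ID."""
    return f"{attr_id:02X}"


def temperature_from_difference(value: int) -> int:
    """Attribute 190 variant: value = 100 - temp (°C), so temp = 100 - value."""
    return 100 - value


# Power-On Hours raw values may be in hours, minutes, or seconds.
SECONDS_PER_UNIT = {"hours": 3600, "minutes": 60, "seconds": 1}


def power_on_hours(raw: int, unit: str = "hours") -> float:
    """Convert a raw Power-On Hours reading to hours, given its unit."""
    return raw * SECONDS_PER_UNIT[unit] / 3600


print(id_to_hex(190), id_to_hex(10))
print(temperature_from_difference(62))
print(power_on_hours(90_000, "minutes"))
```

The unit of a given drive's raw value is not reported alongside it, so in practice it has to be known from the manufacturer or inferred from how fast the counter grows.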

Bibliography

The meaning of S.M.A.R.T. attributes. PalickSoft.

External links

Software

Many programs (specific to each operating system) can read the S.M.A.R.T. status of the host machine's hard disks. These programs can also distinguish gradual deterioration (the normal behavior) from sudden changes (which indicate more serious problems).

Zbigniew Chlondowski: various links to S.M.A.R.T. tools
smartmontools: open source, for Windows and Linux. Also notable for its extensive documentation on S.M.A.R.T.
DiskView: shareware for Windows. Integrates with Windows Explorer
DriveSitter: shareware for Windows
HDDlife: shareware for Windows
DiskCheckup: free for personal use, for Windows
SMART Disk Monitor: shareware for Windows, Linux, and Unix
ActiveSMART: shareware for Windows
HDD Health: freeware for Windows
S.M.A.R.T.: commercial software for Apple Macintosh
Disk Utility: software developed by Apple Computer for hard disk maintenance which, among other things, can also display S.M.A.R.T. status. It is included in the Mac OS X installation.
SpeedFan: freeware for Windows
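The distinction these programs draw between gradual deterioration and sudden change can be sketched with a simple rule over a history of normalized attribute values. This is an illustrative heuristic, not the method used by any of the tools listed: the function name, the labels, and the default `sudden_drop` of 10 points are assumptions.

```python
from typing import Sequence


def classify_trend(history: Sequence[int], sudden_drop: int = 10) -> str:
    """Label a series of normalized values as stable, gradual, or sudden."""
    if len(history) < 2:
        return "stable"
    # Drop between each pair of consecutive readings (positive = got worse).
    drops = [earlier - later for earlier, later in zip(history, history[1:])]
    if max(drops) >= sudden_drop:
        return "sudden"
    if history[-1] < history[0]:
        return "gradual"
    return "stable"


# Hypothetical histories, for illustration only.
slow_wear = classify_trend([100, 99, 99, 98, 97])
abrupt = classify_trend([100, 100, 72, 71])
steady = classify_trend([100, 100, 100])
print(slow_wear, abrupt, steady)
```

A real monitor would also need to account for the sampling interval and for attributes whose normalized values legitimately fluctuate, such as temperature.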