Avoiding Hard Disk Failures
Temperature is the enemy of hard disks. Hot disks fail sooner.
If you have a well-designed server with lightly loaded disks in a cool room, your drives are probably running around 25°C (77°F). If you have a 4U box that you chucked full of drives, you may find some of your drives running much hotter. For instance, I just found two running at 41°C (106°F) in a box with otherwise cool drives.
Drives are typically specified to operate up to 55°C (131°F), but running that hot shortens their lives. Estimating from Google data, it looks like your failure rate within three years is about triple for 45°C versus 25°C. In the third year, about 1 in 6 of the hot drives will fail.
So protect your servers:
Know the problem: In Debian land, install hddtemp and run it to see which drives are hot. Windows users might use DTemp.
    vev# for v in a b c d; do hddtemp /dev/sd$v; done
    /dev/sda: ST3750640AS: 34°C
    /dev/sdb: ST3750640AS: 27°C
    /dev/sdc: ST3750640AS: 41°C
    /dev/sdd: ST3750640AS: 40°C
    vev# # I have two hot drives.
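With more than a handful of drives it gets tedious to pick out the hot ones by eye. Here is a rough sketch of a filter that reads hddtemp-style lines and prints only the drives at or above a limit. The function name `hot_drives` and the default limit of 40°C are my own choices for illustration, not anything hddtemp provides.

```shell
#!/bin/sh
# hot_drives [LIMIT]: read hddtemp-style lines on stdin, e.g.
#   /dev/sdc: ST3750640AS: 41°C
# and print "device temperature" for each drive at or above LIMIT.
# LIMIT defaults to 40 (an assumed threshold, adjust to taste).
hot_drives() {
    limit="${1:-40}"
    awk -F': ' -v limit="$limit" '
        {
            temp = $NF
            # Drop everything from the first non-digit on ("41°C" becomes "41").
            sub(/[^0-9].*$/, "", temp)
        }
        # Lines with no numeric reading (for example a sleeping drive) are skipped.
        temp != "" && temp + 0 >= limit { print $1, temp }
    '
}
```

You would feed it the same loop as above: `for v in a b c d; do hddtemp /dev/sd$v; done | hot_drives 40`.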
Mitigate: If you have hot drives, move them around or adjust airflow, perhaps by making little cardstock air dams inside the server to cool the drives. Add a fan blowing on them if you have a large case.
Monitor: Record their temperatures and check in on them once in a while to make sure things aren’t going badly.
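One low-effort way to keep a record is to timestamp each reading and append it to a log file. This is only a sketch: the function name `log_temps` and the default log path `/var/log/hddtemp.log` are hypothetical, so pick whatever suits your system.

```shell
#!/bin/sh
# log_temps [LOGFILE]: prefix each hddtemp-style line on stdin with a UTC
# timestamp and append it to LOGFILE.
# The default path below is a made-up example, not a standard location.
log_temps() {
    logfile="${1:-/var/log/hddtemp.log}"
    # One timestamp per run, so all drives from the same reading share it.
    stamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    while IFS= read -r line; do
        printf '%s %s\n' "$stamp" "$line"
    done >> "$logfile"
}
```

Run something like `for v in a b c d; do hddtemp /dev/sd$v; done | log_temps` from cron every hour or so and you have a history to look back on when a drive starts acting up.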
Debian users might also want to install smartmontools, which will track your S.M.A.R.T. data and notify you of problems.
Note: for SATA drives you will need “-d ata”, or it will misaddress them as SCSI drives, and you need “-m foo@example.com” if you want to be notified by email.
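For the smartd daemon these options go on each drive's line in /etc/smartd.conf. A sketch of what that might look like follows. The device names and the address are placeholders, and the `-W` temperature thresholds are my own assumption based on the numbers above, so check the smartd.conf man page for your smartmontools version before relying on them.

```
# /etc/smartd.conf (sketch; device names and address are examples)
#   -a          monitor all S.M.A.R.T. attributes
#   -d ata      address the drive as ATA rather than SCSI
#   -m ADDRESS  email warnings to ADDRESS
#   -W 4,45,55  temperature: report changes of 4°C or more, log at 45°C,
#               warn at 55°C (assumed thresholds, adjust to taste)
/dev/sda -a -d ata -m foo@example.com -W 4,45,55
/dev/sdb -a -d ata -m foo@example.com -W 4,45,55
```

Restart smartd after editing the file so it picks up the new settings.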