Server note: hard disk fail

Started by Toruk Makto, May 27, 2011, 01:14:08 AM

Toruk Makto

 Just a note for anyone keeping score:

The disk drive on channel four of the RAID5 array failed this evening. That drive has been hot-swapped with a new unit and the array controller is currently rebuilding the RAID set on the fly. During this process, the website and associated LearnNavi services may slow down occasionally, but will remain online. The rebuild could take up to 6 hours depending on the system load, after which we will be back to normal.
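For anyone curious how a RAID5 set can rebuild a missing drive at all, here is a minimal sketch of the parity idea. It is an illustration only, not what the array controller actually runs: the function names are made up for this example, the blocks are tiny byte strings, and parity sits in a fixed position, whereas real RAID5 rotates parity across the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (hypothetical layout)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, failed_index):
    """Recompute the block at failed_index from all surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


# Three data "drives" plus one parity "drive"; pretend drive 1 died.
stripe = make_stripe([b"navi", b"raid", b"disk"])
recovered = rebuild(stripe, failed_index=1)
print(recovered == stripe[1])
```

Because XOR is its own inverse, any single missing block, data or parity, equals the XOR of the remaining ones. That is why the array stays online with one drive down, and why the rebuild has to read every surviving drive.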

This is actually our first hardware failure on this server. The system status reporting system, on-site spares, and colo personnel performed as I had anticipated, so we had no downtime. I love it when a plan comes together!

ta Markì

Lì'fyari leNa'vi 'Rrtamì, vay set 'almong a fra'u zera'u ta ngrrpongu
Na'vi Dictionary: http://files.learnnavi.org/dicts/NaviDictionary.pdf

bommel

RAID ftw! ;D

I know about a mainframe at a datacenter where they have to swap at least one HDD/week (okay they have over 5000 TB total disk space) ^^

Swoka Ikran

Quote from: bommel on May 27, 2011, 04:21:09 AM
I know about a mainframe at a datacenter where they have to swap at least one HDD/week (okay they have over 5000 TB total disk space) ^^
I'm curious what they use the 5000TB for.

@Markì: How long did that drive last? IIRC, we just got the new server a few months ago. (Asking because the lifespan of that HDD didn't seem to be very long...)
2010 was the year of the Na'vi.Vivar 'ivong Na'vi!


 

omängum fra'uti

1 HDD a week?  There are some data centers where there are people whose full-time job is, basically, to swap HDDs (and other hardware as it fails).  (Also, 5000TB = 5PB)
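A rough back-of-the-envelope sketch of why one swap a week is plausible at that scale. Only the 5000 TB figure comes from this thread; the drive size and annual failure rate below are assumptions picked purely for illustration, so plug in your own numbers.

```python
TOTAL_TB = 5000   # from the thread
DRIVE_TB = 1.0    # assumption: capacity of each drive, in TB
ANNUAL_FAILURE_RATE = 0.02  # assumption: 2% of drives fail per year


def expected_weekly_failures(total_tb, drive_tb, annual_failure_rate):
    """Expected drive failures per week for a fleet of identical drives."""
    drive_count = total_tb / drive_tb
    return drive_count * annual_failure_rate / 52


print(TOTAL_TB / 1000, "PB")
print(expected_weekly_failures(TOTAL_TB, DRIVE_TB, ANNUAL_FAILURE_RATE))
```

With those assumed values the estimate lands at a couple of drives per week, the same ballpark as the "at least one HDD/week" figure.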
Ftxey lu nga tokx ftxey lu nga tirea? Lu oe tìkeftxo.
Listen to my Na'vi Lessons podcast!

Toruk Makto

Quote from: Swoka Ikran on May 27, 2011, 12:08:04 PM
@Markì: How long did that drive last? IIRC, we just got the new server a few months ago. (Asking because the lifespan of that HDD didn't seem to be very long...)

You're right, it didn't. However, the server receives over 250,000 web hits per day and also runs the Minecraft server, so the drive hardware may really be wearing out that fast. Hope not, but we'll see. If another one fails in the next few weeks, we'll know it is the amount of use and not just premature failure.

omängum fra'uti

It was likely premature failure.  Commodity desktop drives aren't built to handle the vibrations of constant use, but even then that's a pretty short time.  If they are server drives, then it's certainly just premature failure.  Even hard drives under constant load should last longer than that.  But the expected life of a hard drive follows a curve from day one; it is not perfectly level for years and then a sudden drop.

Sіr. Ηaxalot

Quote from: Swoka Ikran on May 27, 2011, 12:08:04 PM
Quote from: bommel on May 27, 2011, 04:21:09 AM
I know about a mainframe at a datacenter where they have to swap at least one HDD/week (okay they have over 5000 TB total disk space) ^^
I'm curious what they use the 5000TB for.

@Markì: How long did that drive last? IIRC, we just got the new server a few months ago. (Asking because the lifespan of that HDD didn't seem to be very long...)

The lifespan of an HDD is usually way longer. I had an HDD failure in my backup server too, but I'm pretty sure it was the same drive that had been in the server since it was first bought back in 2001-2002, which means that it would have been ~10 years old.

That server was a storage server for a number of accounts in a school network. I got it unformatted; everything the students had stored in their network storage was still there :o

Seze

I've had some drives drop out of a RAID array on me before that were still good.  The joys of using software RAID with drives that were made almost specifically to not function well in a RAID array.  The drive would try to fix itself if it detected something odd, and the time it spent doing that would make the controller think the drive had stopped responding, so it would be marked as a failed drive...


Learn Na'vi Mobile App - Now Available

Toruk Makto

These drives don't try to do anything fancy. The Areca ARC-1120 RAID controller runs some pretty definitive tests when a drive starts to act up before dropping it from the array, so I am fairly convinced this was a genuine event.



Human No More

Servers do burn through HDDs at a much higher rate, just due to load.
"I can barely remember my old life. I don't know who I am any more."

HNM, not 'Human' :)

"God was invented to explain mystery. God is always invented to explain those things that you do not understand."
- Richard P. Feynman