Folding@home catchall

Started by bommel, December 28, 2010, 07:19:24 PM

Swoka Ikran

Quote from: Tsmuktengan on February 15, 2012, 07:32:08 PM
Doesn't S.M.A.R.T. display 'read error' counts or other disk failures, such as problems when the disk spins up? It should; I can see no other reason for the errors you get. Obviously, if there aren't bad sectors it must be another issue, such as the lens starting to show signs of fatigue or the disk not rotating properly... or something else.
It does, but they all say 0. Zero read errors, zero spinup errors, zero seek errors, and zero reallocated sectors, but Windows' event log seems to disagree with it being healthy.
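For anyone who wants to check the same counters from a script, here is a minimal sketch that parses an attribute table in the layout printed by `smartctl -A` (from smartmontools). The sample table is made up for illustration and is not the data from this drive. The list of watched attributes is an assumption, and Current_Pending_Sector is an addition that was not mentioned in the post.

```python
# Illustrative sample in the layout printed by `smartctl -A`.
# These numbers are invented; they are NOT the values from the drive in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
"""

# Assumption: these are the attributes worth watching for a suspected bad disk.
WATCHED = (
    "Raw_Read_Error_Rate",
    "Reallocated_Sector_Ct",
    "Seek_Error_Rate",
    "Spin_Retry_Count",
    "Current_Pending_Sector",
)


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for the data rows of the table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have at least 10 columns and start with a numeric ID;
        # the header row starts with "ID#" and is skipped.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():
                attrs[fields[1]] = int(raw)
    return attrs


def nonzero_watched(attrs):
    """Return only the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


if __name__ == "__main__":
    print(nonzero_watched(parse_smart_attributes(SAMPLE)))
```

An empty result only means the drive itself has not logged anything in those counters. As the event log here shows, that is not proof the disk is healthy.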

Quote from: Tsmuktengan on February 15, 2012, 07:32:08 PM
You should move your data to another disk and send this disk back to the manufacturer. You can run tests on it first to check whether any issues show up, but it does seem there is something wrong with it.

What is the hard drive's model by the way? Hopefully Western Digital or Seagate...  :)
Western Digital (a 500GB SATA unit). Mfg date is 6 days before Avatar came out in the U.S. (12/12/09), and it has a 3 year warranty.

I'll see about doing a long format on the bad partition this weekend. If that doesn't make any difference, I'll move everything onto a spare HDD and send this one back for replacement.

Also worth noting: I moved the pagefile back to the Windows partition and haven't had the BSODs since.
2010 was the year of the Na'vi.Vivar 'ivong Na'vi!


 
Avatray | NWOTD Sigbars | Sacred's Sigbar Tool | My collection of Avatar merchandise

Tsmuktengan

Ah, now that you mention the pagefile generating errors... well, that makes sense then. The pagefile is used by Windows' virtual memory management (equivalent in principle to Linux's swap partition). So the errors do not come from the hard drive itself at all.

If you wish to not use memory paging at all, there is a setting somewhere in Windows to disable it. You need to have enough RAM for this however.


Swoka Ikran

Quote from: Tsmuktengan on February 15, 2012, 08:45:35 PM
Ah, now that you mention the pagefile generating errors... well, that makes sense then. The pagefile is used by Windows' virtual memory management (equivalent in principle to Linux's swap partition). So the errors do not come from the hard drive itself at all.
The BSODs were KERNEL_DATA_INPAGE_ERRORs, which are often caused by bad sectors in the pagefile or defective RAM. The HDD is still the cause of the problem.

If I try to use software stored on the bad partition, that software still crashes randomly :( Also, some of my VMs that are kept in that partition no longer boot or complain of corrupt files.

I can't disable paging since I've got 3GB of RAM and run multiple VMs and other memory-heavy tasks quite frequently.

Tsmuktengan

Aw all right...

...now this is just complicated. Try swapping your hard drive for another one, run some tests on the old drive, possibly a low-level format too, and see what errors you get. There should be some read/write issues; I can't understand why S.M.A.R.T. doesn't see anything (is it enabled for this drive in the BIOS?).

You've got the warranty for the hard drive. In a week or two you could indeed get a new one. Even when Western Digital's factories were still flooded, they were able to replace my Caviar Blue.  :)


Swoka Ikran

Quote from: Tsmuktengan on February 16, 2012, 07:27:27 AM
Aw all right...

...now this is just complicated. Try swapping your hard drive for another one, run some tests on the old drive, possibly a low-level format too, and see what errors you get.
I ran Windows disk check overnight and the event log filled up with bad block messages again. :( I'll format the damaged partition this weekend and swap the drive if the format makes no difference.

Quote from: Tsmuktengan on February 16, 2012, 07:27:27 AM
There should be some read/write issues; I can't understand why S.M.A.R.T. doesn't see anything (is it enabled for this drive in the BIOS?).
I've never had any success with SMART...for me, it always: a) says a bad disk is fine, or b) tells me only after it has already obviously failed (when such warning is useless).

My neighbor had me fix a laptop for him recently...HDD was clearly bad (hanging for minutes attempting to read sectors, spewing errors saying sectors couldn't even be found let alone read, Windows refusing to install), yet the disk insisted it was fine. Only after I tried formatting it twice did the drive tell me that it was bad. I put a new one in and it's been fine since.

Quote from: Tsmuktengan on February 16, 2012, 07:27:27 AM
You've got the warranty for the hard drive. In a week or two you could indeed get a new one. Even when Western Digital's factories were still flooded, they were able to replace my Caviar Blue.  :)
Based on the results of the disk check last night, it looks like I'll be using it. Glad to hear they're good at replacing them. :)

Tsmuktengan

Using the warranty is the best thing you can do with a bad disk.

SMART is supposed to be a very reliable source of information on a disk's state. If SMART does not count read/write errors or bad sectors while there are issues with the disk, this means SMART is not enabled in the BIOS settings for hard drives. In many BIOSes, it is not enabled by default. Make sure either that the BIOS start screen clearly says SMART is enabled for the hard drive or that the setting is turned on specifically for your hard drive in the BIOS settings. The setting is sometimes not that easy to find.

I have had dozens of drives pass through my hands, having to work out whether they were safe and to maintain them. SMART has always shown reliable, real-time data on a drive's state (I have a hard drive that is still working, even with 8 million reallocated sectors).

Under Linux, I find Palimpsest very helpful for reading SMART stats and running SMART tests and benchmarks to check a disk's health. Listening to the disk is also a good way to tell whether the metal arm is hitting or scratching the platter, for example.


Swoka Ikran

Quote from: Tsmuktengan on February 16, 2012, 08:26:14 PM
Using the warranty is the best thing you can do with a bad disk.

SMART is supposed to be a very reliable source of information on a disk's state. If SMART does not count read/write errors or bad sectors while there are issues with the disk, this means SMART is not enabled in the BIOS settings for hard drives. In many BIOSes, it is not enabled by default. Make sure either that the BIOS start screen clearly says SMART is enabled for the hard drive or that the setting is turned on specifically for your hard drive in the BIOS settings. The setting is sometimes not that easy to find.
I'll check it again, but I think it's enabled already.

FWIW, I've had the opposite problem with SMART as well: two HDDs I got in a broken yard-sale PC both report themselves as bad, yet they work fine. I used the disks for non-vital stuff since I didn't expect them to last...that was 2 years ago.

EDIT: Checked BIOS settings. It was already enabled, and the hardware monitor there says "Supported, Status OK" for the drive. Windows decided to run chkdsk during startup.

EDIT2: Argh. Just discovered that one of the 2 HDDs in the RAID in one of my file server/F@H boxes failed. That's my second HDD problem this week. Unlike my main PC's WD drive though, I can't complain about this one. I got 10 years out of it...

Swoka Ikran

An update on this...I put the 750GB HDD in today and transferred the contents with GHOST. Working fine.

Now I'm waiting for the thing to defrag...my E: partition is 93% fragmented! D: is 87% fragmented, and C: is 45%...

As for my server that broke down on Thursday...HDD 0 in that is toast. I'm running SpinRite on it for laughs, but it's spewing SMART errors and has more bad blocks than I care to count.

Tsmuktengan

* Tsmuktengan looks at his Seagate hard drives coming from the 90's...

I wonder how people can run into so many issues with hard drives. *whistles*

;D


bommel

I'm quite impressed by my GTX 580. It is not as loud as I thought and I receive almost 1.4k points/WU now (GTX 260: ~350) :)

Swoka Ikran

Quote from: bommel on February 19, 2012, 08:40:36 AM
I'm quite impressed by my GTX 580. It is not as loud as I thought and I receive almost 1.4k points/WU now (GTX 260: ~350) :)
How long does a unit take?

bommel

Quote from: Swoka Ikran on February 19, 2012, 10:24:51 AM
How long does a unit take?
Approx. 2 hours. I don't know exactly how long my GTX 260 needed, but I think it was about the same (different WUs, though).
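To put those figures side by side, here is a rough points-per-day (PPD) estimate. It is only a sketch: it assumes the card folds back to back at a constant rate, takes the approximate numbers quoted above (about 1.4k points and about 2 hours per WU for the GTX 580, about 350 points for the GTX 260), and assumes the GTX 260 also needed about 2 hours, which is a guess since the WUs were different.

```python
def points_per_day(points_per_wu, hours_per_wu):
    """Estimate PPD, assuming work units run back to back at a constant rate."""
    if hours_per_wu <= 0:
        raise ValueError("hours_per_wu must be positive")
    return points_per_wu * 24.0 / hours_per_wu


if __name__ == "__main__":
    # Approximate figures from the posts above; the 2-hour time for the
    # GTX 260 is an assumption, not a measured value.
    print(f"GTX 580: ~{points_per_day(1400, 2):.0f} PPD")
    print(f"GTX 260: ~{points_per_day(350, 2):.0f} PPD")
```

With these inputs the estimate comes out at roughly 16,800 PPD for the GTX 580 against roughly 4,200 PPD for the GTX 260.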

Ftiafpi

I really need to time my 560Ti but I imagine it's not far off from the 580.

bommel

Quote from: Ftiafpi on February 20, 2012, 07:30:36 PM
I really need to time my 560Ti but I imagine it's not far off from the 580.
Depending on the GPU/VRAM clock speeds they are maybe just 2k-3k PPD apart. The GTX 560 is quite good at f@h. If you want to know the exact time have a look at the logfiles. They are usually located where you installed the f@h client.
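If you would rather not read the timestamps by eye, a small script can estimate the time per WU from the progress lines. This is only a sketch: it ASSUMES the log contains lines of the form `[HH:MM:SS] Completed N%`, which may not match your client version, so check your own logfile and adjust the pattern. The sample log below is invented for illustration.

```python
import re
from datetime import timedelta

# ASSUMED progress-line format: "[HH:MM:SS] Completed N%".
# Verify against your own logfile; other client versions may log differently.
PROGRESS = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+)%")

# Invented sample, not taken from a real log.
SAMPLE_LOG = """\
[23:58:00] Completed 98%
[23:59:12] Completed 99%
[00:00:24] Completed 100%
"""


def seconds_per_percent(log_text):
    """Average seconds per 1% of progress, or None if it cannot be computed."""
    stamps = []
    for line in log_text.splitlines():
        match = PROGRESS.match(line.strip())
        if match:
            hours, minutes, seconds, percent = map(int, match.groups())
            stamps.append((hours * 3600 + minutes * 60 + seconds, percent))

    rates = []
    for (t0, p0), (t1, p1) in zip(stamps, stamps[1:]):
        if p1 <= p0:
            continue  # progress went backwards: new work unit or a restart
        elapsed = (t1 - t0) % 86400  # timestamps have no date, so wrap at midnight
        rates.append(elapsed / (p1 - p0))

    if not rates:
        return None
    return sum(rates) / len(rates)


def estimated_wu_time(log_text):
    """Extrapolate the average rate to a full work unit (100%)."""
    per_percent = seconds_per_percent(log_text)
    if per_percent is None:
        return None
    return timedelta(seconds=per_percent * 100)


if __name__ == "__main__":
    print(estimated_wu_time(SAMPLE_LOG))
```

The estimate is only as good as the assumption that the client was folding continuously between the progress lines; pauses longer than a day cannot be detected from time-only stamps.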

bommel

I just want to report an issue with the latest Nvidia WHQL driver (295.73): if you get error messages like CoreStatus = 63 after completing a GPU-WU, it's caused by the new driver. Since I started using it, I have had this issue after a WU finishes. The client tries to download a new core version but fails and wants to sleep for a day. With the older WHQL driver it works fine. I've heard from other users with the same problem, but it doesn't seem to be a general problem (most affected users have a GTX 580).

Swoka Ikran

Good to know. I was planning to update sometime this week.

I'll pass on 295.73 now.

bommel

I'm not yet sure but it may be related to power management. When the display power management is enabled and Windows shuts down the display, the GPU will enter a sleep state to save power. As long as the fah-core is running, it actually doesn't enter the sleep mode. But as soon as the core finishes a WU, the GPU will sleep and won't wake up when the new WU is ready to fold. This is the point where folding@home decides to stop working. Shutting off the display power management seems to be a workaround for this issue (i.e. set to "never turn off display"). As I switch off my display manually when not in use this is no big deal for me.

Here is a link to a post in the official Nvidia forums describing the issue and workaround.
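The "never turn off display" setting can also be applied from a script instead of the Control Panel. The sketch below builds a call to Windows' `powercfg /change` command, where a timeout of 0 minutes means never. The helper names are made up for this example, and it has not been tested against the driver issue itself; it only changes the display timeout.

```python
import platform
import subprocess


def monitor_timeout_command(minutes=0, on_battery=False):
    """Build a powercfg command line; 0 minutes means the display never turns off."""
    if minutes < 0:
        raise ValueError("minutes must be zero or positive")
    setting = "monitor-timeout-dc" if on_battery else "monitor-timeout-ac"
    return ["powercfg", "/change", setting, str(minutes)]


def apply_workaround():
    """Set the display timeout on AC power to 'never' (Windows only)."""
    if platform.system() != "Windows":
        raise RuntimeError("powercfg is only available on Windows")
    subprocess.run(monitor_timeout_command(0), check=True)


if __name__ == "__main__":
    # Only print the command here; call apply_workaround() to actually run it.
    print(" ".join(monitor_timeout_command(0)))
```

To restore the old behaviour later, run the same command with the previous timeout in minutes.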

Swoka Ikran

I don't turn my monitor off except at night, so the power management is kind of important...

I'll wait until the next release. I have 285.58 now, and everything is already working fine, so there's no real need to update at the moment anyway.

bommel

I can confirm that this is the issue. After turning the power management off, it doesn't crash during the night.

bommel

We're getting closer to rank 400. But we have 25 inactive users :(