
Wednesday, February 10, 2010

Die! *smash*

I have squashed the mystery.

I did some chatting with Merlin (never mind who that is, just that he knows a fair bit about the electrical engineering of computers, as well as programming). He gave me a couple of theoretical possibilities for what the problem could be. Some of them could be tested; others were simply components on the motherboard that I had no way to test individually.

One thing was the power supply. According to him, when data is sent from the memory to the controller (in 8 blocks of 64 bits), a flaky power supply could let the voltage droop over the course of the transfer, which could hypothetically explain why the bit fails only once in each cache line even if the problem is in one of those 64 data lines. However, this was easily disproved by swapping in one of my other power supplies.
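To make the "8 blocks of 64 bits" point concrete, here is a rough sketch of which bits of a cache line travel over a given data line. It assumes a 64-byte cache line sent as 8 transfers in simple ascending order, with bit 0 of each transfer on data line 0; real controllers may reorder the transfers, and the function names are made up for illustration.

```python
BEATS_PER_LINE = 8    # assumed: 8 transfers ("blocks") per cache line
BUS_WIDTH_BITS = 64   # assumed: 64 data lines between memory and controller


def cache_line_bits_on_data_line(data_line):
    """Return the bit offsets (0..511) within a cache line carried by one data line."""
    if not 0 <= data_line < BUS_WIDTH_BITS:
        raise ValueError("data_line must be in the range 0..63")
    # Each transfer moves 64 bits, so the same wire carries one bit per transfer.
    return [beat * BUS_WIDTH_BITS + data_line for beat in range(BEATS_PER_LINE)]


def locate_bit(bit_offset):
    """Map a bit offset within a cache line to (transfer index, data line)."""
    if not 0 <= bit_offset < BEATS_PER_LINE * BUS_WIDTH_BITS:
        raise ValueError("bit_offset must be in the range 0..511")
    return divmod(bit_offset, BUS_WIDTH_BITS)


print(cache_line_bits_on_data_line(5))
print(locate_bit(133))
```

Under those assumptions a single bad data line touches eight bit positions per cache line, one in each transfer. That is why a failure that shows up at only one of those positions needs some extra explanation, such as a voltage that sags partway through the burst.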

More importantly, while talking to him I thought of my grandma's computer (my computer prior to 2001), which had the same type of memory. While this computer is too old to support 512 meg DIMMs (this was why I couldn't just use it to verify the DIMMs worked and be done with it), it did have some smaller DIMMs in it (256 meg). See where this is going?

Now I had more than two DIMMs, and with them I was able to demonstrate that the same bit failure occurred with any combination of two DIMMs (although the frequency of the error did vary some depending on the pair used). This proved conclusively that the DIMMs themselves were not responsible, and the problem had to reside in the common element - the motherboard or the CPU. It appears to be a problem that only shows up when both DIMM slots are full.
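The elimination argument can be sketched in a few lines. The DIMM labels below are placeholders, not the actual modules, and the function name is made up for illustration; the idea is just that if no single DIMM appears in every failing pair, no single DIMM can be blamed.

```python
from itertools import combinations


def dimms_common_to_all_failures(failing_pairs):
    """Return the set of DIMMs that appear in every failing pair."""
    pair_sets = [set(pair) for pair in failing_pairs]
    if not pair_sets:
        return set()
    return set.intersection(*pair_sets)


# Placeholder labels: every possible pair is assumed to have shown the failure.
dimms = ["A", "B", "C", "D"]
failing_pairs = list(combinations(dimms, 2))

suspects = dimms_common_to_all_failures(failing_pairs)
print(suspects or "no single DIMM explains every failure")
```

If only the pairs containing one particular DIMM had failed, that DIMM would come back as the lone suspect. When every pair fails, the intersection is empty and the blame moves to whatever all the tests shared.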

Now, it's still possible that it could be a software (BIOS) problem that could be fixed by updating the BIOS, but I don't care to try that, for the reason I mentioned previously.
