I got a BP6 just recently. I got it for nothing, with no guarantee that it worked.
I put in the processors and all the other components and switched it on. It
booted fine, so I upgraded the BIOS to the latest available release. Since I was
aware of the BP6's known issues, I tested it with memtest86+. The board failed
tests #8 and #10. I knew it was a board problem, because I was sure the memory
sticks were OK: they had been tested in another machine.
Replacing the EC10 capacitor with a 1500µF, 6.3V one solved the problem.
Installing Linux was a breeze, and after a short while I had the BP6 up and
running. The machine was quite stable, but occasionally there were nasty DMA
timeouts, especially while copying big files, and these led to data corruption.
I also found that the /proc/interrupts statistics show some errors, especially
during long compiles. From what I've learned from the kernel docs, these are not
dangerous, since they are errors detected by the IO-APIC, which repeats the
failed transaction. But they do indicate some issue with interrupt distribution,
and I suspect the DMA timeouts are somehow related to that. As I said earlier,
the timeouts happened rather rarely, and apart from that the machine was very
stable. So I thought all the problems were caused by heat (those were very, very
hot summer days), but after installing lm_sensors I realized I was mistaken:
all the crucial temperatures were within limits (and outdoor temperatures have
dropped significantly since then as well). The same goes for the voltages: all
fluctuations are well within limits.
Installing X-Window made the problem even worse. I had to switch off GLX (and
DRI), since otherwise the machine sooner or later froze completely. From what I
can tell, a lock-up occurs when there are outstanding interrupts from the
graphics board and the IO subsystem at the same time (that is, when accelerated
X is running and a disk request happens to be serviced). It happens more often
when the machine is under heavy load, but quite often when it is almost idle as
well. I suspect interrupt distribution: lost interrupts from the IO subsystem
cause the DMA timeouts and, similarly, lost interrupts from the graphics board
lock up X.
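The ERR counter in /proc/interrupts mentioned above is easy to watch over time.
Below is a minimal Python sketch, assuming the 2.6-era x86 layout in which
/proc/interrupts ends with single-total summary lines such as "ERR: 42" and
"MIS: 0". The helper name parse_error_counters is hypothetical, not part of any
existing tool.

```python
import os

def parse_error_counters(text):
    """Return the ERR/MIS totals found in /proc/interrupts-style text.

    Assumes summary lines of the form 'ERR:   12' carrying a single total
    (no per-CPU columns), as printed by 2.6-era x86 kernels.
    """
    counters = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        # Only the ERR and MIS summary lines are of interest here.
        if sep and label in ("ERR", "MIS"):
            fields = rest.split()
            if fields and fields[0].isdigit():
                counters[label] = int(fields[0])
    return counters

if __name__ == "__main__":
    if os.path.exists("/proc/interrupts"):
        with open("/proc/interrupts") as f:
            print(parse_error_counters(f.read()))
```

Sampling it before and after a long compile would show whether the ERR count
keeps climbing.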
I was aware of the BP6's IRQ sharing issue from the beginning and placed all the
add-on cards so that none of them shares an interrupt, so that shouldn't be a
problem.
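One way to double-check that no IRQ is actually shared is to look for
/proc/interrupts lines whose last column lists more than one handler, separated
by commas. The Python sketch below assumes that 2.6-style layout and device
names without embedded spaces or commas; shared_irqs is a hypothetical helper
name.

```python
import os

def shared_irqs(text):
    """Return {irq: [devices]} for every IRQ line that lists several handlers.

    Assumes the 2.6-style layout 'NN: <per-CPU counts> <type> dev1, dev2',
    where shared handlers are comma-separated in the last column and device
    names contain no spaces or commas.
    """
    shared = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        # Skip the CPU header, the NMI/LOC/ERR/MIS summary lines and
        # any IRQ line with a single handler.
        if not sep or not label.isdigit() or "," not in rest:
            continue
        pieces = rest.split(",")
        # The first piece still carries the counts and controller type;
        # its last whitespace-separated token is the first device name.
        first = pieces[0].split()
        if not first:
            continue
        shared[int(label)] = [first[-1]] + [p.strip() for p in pieces[1:]]
    return shared

if __name__ == "__main__":
    if os.path.exists("/proc/interrupts"):
        with open("/proc/interrupts") as f:
            for irq, devices in sorted(shared_irqs(f.read()).items()):
                print(irq, ", ".join(devices))
```

An empty result means every IRQ line has a single handler.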
I also set every IRQ assignment manually in the BIOS. I've tried with ACPI on as
well as off, and with MP specification v1.4 as well as v1.1.
However, as you might imagine, none of this helped. One solution might be to set
the interrupt affinity to a single processor, but since the APIC errors happen
on both CPUs, it probably wouldn't work. On the other hand, what did help was
removing one processor from its socket. I tried both CPUs in each socket, and
with only one CPU in either socket the machine was rock solid: no DMA timeouts,
no APIC errors, no lock-ups whatsoever. At the risk of repeating myself, the
whole problem seems to lie in the distribution of interrupts among the CPUs,
which doesn't happen when there is only one CPU in the system. As mentioned
above, I have already replaced the EC10 capacitor, and the rest of the caps seem
to be in good shape (no leakage, no voltage fluctuations). Would replacing all
the capacitors help? Has anybody had a similar problem? Any bright ideas?
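For completeness, pinning an IRQ to one processor (the option considered above)
is done on 2.6 kernels by writing a hexadecimal CPU bitmask to
/proc/irq/<N>/smp_affinity. Below is a minimal Python sketch, assuming root
access; affinity_mask and pin_irq are hypothetical helper names, and whether
this helps on the BP6 is exactly the open question.

```python
import os

def affinity_mask(cpus):
    """Build the hex bitmask written to /proc/irq/<N>/smp_affinity.

    Bit i set means CPU i may service the IRQ: {0} -> '1', {0, 1} -> '3'.
    """
    if not cpus:
        raise ValueError("at least one CPU is required")
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")

def pin_irq(irq, cpus, proc_root="/proc"):
    """Restrict one IRQ to the given CPUs (needs root on a real system)."""
    path = os.path.join(proc_root, "irq", str(irq), "smp_affinity")
    with open(path, "w") as f:
        f.write(affinity_mask(cpus) + "\n")
```

For example, pin_irq(14, {0}) would keep IRQ 14 (normally the primary IDE
channel) on CPU0.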
Configuration:
Abit BP6, BIOS RV release, HPT 1.30b
|-> Celeron 500MHz, FSB 66MHz (no overclock)
|-> Celeron 500MHz, FSB 66MHz (no overclock)
|-> 128MB, 100MHz SDRAM
|-> 128MB, 100MHz SDRAM
|-> 128MB, 100MHz SDRAM
|-> AGP -> Radeon 7000, 64MB DDR
|-> PCI
|-> PCI
|-> PCI
|-> PCI -> SoundBlaster 128
|-> PCI
|-> ISA
|-> ISA
|-> PIIX4 IDE -> Seagate Medalist ST33210A, 3.2GB
|-> PIIX4 IDE -> Teac CD-W552E 52x CD-RW
|-> HPT366 -> Western Digital Protege WD200AA 20GB
|-> HPT366 -> Western Digital Caviar WD205EE 20.5GB
Linux 2.6.17, GCC 4.1.1, Xorg 7.1
