List of Node Failures
| Node | Failure Mode | Solution / Notes |
|---|---|---|
| [1,5] | PXE Halt - Locks up during execution of PXE code | Multiple resets (more than 1) may be required. Might require node change. |
| [1,5] | Dead Node ID box top LED (the blinking one) | Power cycle fixed it. Rabbit issue? |
| [3,8] | First Power On Halt | Locks during the first attempt. POSTs after reset. |
| [17,4] | First Power On Halt | Locks during the first attempt. No serial console output. |
| [1,14] | First Power On Halt | Locks during the first attempt. Reset fixes it. Has new disk. |
| [20,19] | Disk Failure | Kernel throws errors during imaging. Disk changed. |
| [12,9] | Disk Controller Failure | Disk controller was having issues; disks were being incorrectly recognised |
| [3,18] | Disk Failure | Disk write errors. Disk replaced. |
| [5,11] | Disk Failure | Disk write errors. Disk replaced. |
| [14,11] | Disk Failure | Disk write errors. Disk replaced. |
| [13,5] | Lock Up | Rabbit and node were halted. Power cycled. |
| [4,11] | Disk Failure | Disk write errors. Disk replaced. |
| [5,9] | Disk Failure | Disk write errors. Disk replaced. |
| [9,11] | Disk Failure | Disk write errors. Disk replaced. |
| [3,19] | Bad Node | Motherboard failure, refused to boot. Replaced. |
| [14,8] | Disk Failure | Kernel throws disk errors. Disk changed. |
| [17,9] | Disk Failure | Disk write halts, imaging times out. Disk replaced. |
| [18,3] | Overheat | CM measures internal temp at 106 °F; fails to boot reliably |
| [20,2] | Disk Failure | Disk write errors. Disk replaced. |
| [8,13] | Disk Failure | Disk write errors. Disk replaced. |
| [9,10] | Disk Failure | Disk write errors. Disk replaced. |
| [5,2] | Disk Failure | Disk write errors. Disk replaced. |
| [17,13] | Disk Failure | Disk write errors. Disk replaced. |
| [12,1] | Disk Failure | Disk write errors. Disk replaced. |
| [6,14] | Disk Failure | Disk write errors. Disk replaced. |
| [17,19] | Memory Failure | Memory pins did not make proper contact. Bent case and reinserted memory. |
| [7,2] | Disk Failure | Disk write errors. Disk replaced. |
| [5,15] | Lock Up | Rabbit and node were halted, node ID box LED was solid. Power cycled. |
| [7,2] | Lock Up | Rabbit and node were halted, node ID box LED was off. Power cycled. |
| [16,1] | Lock Up | Rabbit and node were halted. Power cycled. |
| [1,9] | Intermittent failure | Power cycled |
| [1,5] | Disk Failure | Failing disk caused disk controller to fail. CM had issues also; both replaced. |
| [9,4] | Disk Failure | Failing disk caused disk controller to fail. CM had issues also; both replaced. |
| [15,6] | Disk Failure | Disk write errors. Disk replaced. |
| [18,16] | Disk Failure | Disk write errors. Disk replaced. |
| [3,11] | Disk Failure | Disk write errors. Disk replaced. |
| [16,19] | Disk Failure | Disk write errors. Disk replaced. |
| [5,17] | Disk Failure | Disk write errors. Disk replaced. |
| [20,4] | Node Failure | Node was replaced | 
| [15,4] | Node Failure | Node was replaced, bad left antenna connector. Replacement was used | 
| [5,14] | Overheat | Fan was not plugged in | 
| [17,4] | Disk Failure | Smartctl reports impending disk death | 
| [9,9] | Memory Failure | Memory pins did not make proper contact. Bent case and reinserted memory. |
| [11,4] | Disk Failure | Disk write errors. Disk replaced. |
| [12,7] | Disk Failure | Disk write errors. Disk replaced. |
| [13,2] | Disk Failure | Successfully booted from disk, but kernel was throwing disk errors | 
| [16,6] | Disk Failure | SMART overall-health self-assessment test result: FAILED! | 
| [13,5] | Disk Failure | Kernel throwing disk errors |
| [17,3] | Disk Failure | Kernel throwing disk errors |
| [14,12] | PXE Halt - Locks up during execution of PXE code | Not fixed |
| [11,15] | Network Failure | PXE gives media check failure. Node replaced. |
| [19,6] | PXE Halt | Powers down during PXE |
| [15,7] | PXE Halt | Halts at random stages in the PXE image download process, before control is handed over to the kernel |
| [16,8] | CM crash | Power cycled |
| [20,20] | CM crash | CM light stays solid. Power cycled. |
| [7,2] | CM crash | Node ID light stays off. Power cycled. |
| [2,20] | CM crash | CM light stays solid. Power cycled. |
| [14,12] | Disk Failure | Disk write errors. Disk replaced. |
| [10,7] | Disk Failure | Disk write errors. Disk replaced. |
| [11,18] | Disk Failure | Disk write errors. Disk replaced. |
| [1,15] | Disk Failure | Disk write errors. Disk replaced. |
| [8,3] | Disk Failure | Disk write errors. Disk replaced. |
| [2,11] | Disk Failure | Disk write errors. Disk replaced. |
| [11,16] | Disk Failure | Disk write errors. Disk replaced. |
| [7,8] | Disk Failure | BIOS does not detect disk. Disk replaced. |
| [18,7] | Disk Failure | BIOS does not detect disk. Disk replaced. |
| [2,17] | Disk Failure | BIOS does not detect disk. Disk replaced. |
| [5,19] | Disk Failure | BIOS does not detect disk. Disk replaced. |
| [7,2] | Disk Failure | Kernel throwing disk errors. Disk replaced. |
| [12,4] | Disk Failure | Kernel throwing disk errors. Disk replaced. |
| [1,8] | Disk Failure | Kernel throwing disk errors. Disk replaced. |
| [18,18] | Disk Failure | Kernel throwing disk errors. Disk replaced. |
| [14,20] | Disk Failure | Kernel throwing disk errors. Disk replaced. |
| [9,16] | Disk Failure | Kernel throwing disk errors. Disk replaced. |
| [4,6] | Disk Failure | Kernel throwing disk errors. Disk replaced. |
| [6,8] | Disk Failure | Kernel throwing disk errors. Disk replaced. |
| [3,13] | Disk Failure | Kernel throwing disk errors. Disk replaced. |
| [5,4] | Disk Failure | Kernel throwing disk errors. Disk replaced. |
| [10,5] | Disk Failure | Kernel throwing disk errors. Disk replaced. |
| [10,8] | Disk Failure | Kernel throwing disk errors. Disk replaced. |
| [8,8] | Network Failure | Kernel throws network hardware complaint during DHCP. Log: eth0: -- ERROR -- Class: Hardware failure Nr: 0x270 Msg: 2 Pair Downshift detected. eth0: network connection up using port A speed: 100 autonegotiation: yes |
| [12,4] | Disk Failure | BIOS does not detect disk. Disk replaced. |
| [8,10] | Disk Failure | BIOS does not detect disk. Disk replaced. |
| [15,17] | Disk Failure | BIOS does not detect disk. Disk replaced. |
| [10,2] | Disk Failure | BIOS does not detect disk. Disk replaced. |
| [1,6] | Disk Failure | Kernel throwing disk errors (hda: dma_timer_expiry: dma status == 0x21). Disk replaced. |
| [18,12] | Disk Failure | Kernel throwing disk errors (hda: dma_timer_expiry: dma status == 0x21). Disk replaced. |
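
Several nodes (for example [7,2] and [1,5]) appear in the list more than once, and most entries are disk failures. A rough way to tally the list, assuming the table has been saved as plain text with one `| [x,y] | Failure Mode | Notes |` row per line, is sketched below. The function names `parse_rows` and `tally` are made up for this illustration and are not part of any existing tool.

```python
import re
from collections import Counter

# Matches rows such as "| [7,2] | Lock Up | Power cycled |".
# Header and separator rows have no "[x,y]" node cell, so they are skipped.
ROW = re.compile(
    r"^\|\s*\[(\d+),(\d+)\]\s*\|\s*([^|]*?)\s*\|\s*([^|]*?)\s*\|\s*$"
)


def parse_rows(text):
    """Yield ((x, y), failure_mode, notes) for every line that matches ROW."""
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            yield (int(m.group(1)), int(m.group(2))), m.group(3), m.group(4)


def tally(text):
    """Return (count per failure mode, count per node).

    Failure modes are lower-cased so "Pxe Halt" and "PXE Halt" count together.
    """
    by_mode, by_node = Counter(), Counter()
    for node, mode, _notes in parse_rows(text):
        by_mode[mode.lower()] += 1
        by_node[node] += 1
    return by_mode, by_node


if __name__ == "__main__":
    # Three rows taken from the list above (assumed plain-text export).
    sample = (
        "| [7,2] | Disk Failure | Disk write errors. Disk replaced. |\n"
        "| [7,2] | Lock Up | Rabbit and node were halted. Power cycled. |\n"
        "| [1,5] | Disk Failure | Failing disk caused disk controller to fail. |\n"
    )
    by_mode, by_node = tally(sample)
    print(by_mode.most_common())                            # most frequent modes first
    print([n for n, c in by_node.items() if c > 1])         # nodes listed more than once
```

Nodes that show up repeatedly with different failure modes may be worth checking for a shared cause (CM, power, cabling) rather than treating each entry separately.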