Changes between Version 27 and Version 28 of Internal/NodeFailureModes


Timestamp:
Jun 25, 2012, 7:07:35 PM
Author:
ssugrim

  • Internal/NodeFailureModes

    v27 v28  
    1 1 = List of Node Failures =
      2 1 = good, 0 = bad
    2 3
    3 || ''' Node ''' || ''' Failure Mode ''' || ''' Solution / Notes ''' ||
    4 || [1,5] ||Pxe Halt - Locks up during execution of PXE code ||Multiple resets (more than 1) [[BR]] may be required [[BR]] Might require node Change ||
    5 || [1,5] ||Dead Node ID box top LED (the blinking one) || Power cycle Fixed it [[BR]] Rabbit Issue? ||
    6 || [3,8] ||First Power on Halt || Locks during the first attempt [[BR]] Post after reset ||
    7 || [17,4] ||First Power on Halt || Locks during the first attempt [[BR]] no serial console output ||
    8 || [1,14] ||First Power on Halt || Locks during the first attempt [[BR]] Reset Fixes it [[BR]] has new disk [[BR]] ||
    9 || [20,19] ||Disk Failure || Kernel throws errors during imaging [[BR]] Disk Changed ||
    10 || [12,9] ||Disk Controller Failure || Disk controller was having issues, disks were being incorrectly recognised||
    11 || [3,18] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    12 || [5,11] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    13 || [14,11] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    14 || [13,5] ||Lock Up || Rabbit and Node were halted [[BR]] Power cycled||
    15 || [4,11] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    16 || [5,9] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    17 || [9,11] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    18 || [3,19] ||Bad Node || Motherboard failure, refused to boot [[BR]] Replaced ||
    19 || [14,8] ||Disk Failure || Kernel Throws Disk Errors [[BR]] Disk Changed||
    20 || [17,9] ||Disk Failure || Disk write halts, imaging times out[[BR]] Disk replaced ||
    21 || [18,3] ||Overheat || CM measures internal temp at 106F, fails to boot reliably ||
    22 || [20,2] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    23 || [8,13] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    24 || [9,10] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    25 || [5,2] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    26 || [17,13] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    27 || [12,1] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    28 || [6,14] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    29 || [17,19] || Memory Failure || Memory Pins did not make proper contact, Bent case and reinserted memory ||
    30 || [7,2] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    31 || [5,15] ||Lock Up || Rabbit and Node were halted, node ID box LED was solid [[BR]] Power cycled||
    32 || [7,2] ||Lock Up || Rabbit and Node were halted, node ID box LED was off [[BR]] Power cycled||
    33 || [16,1] ||Lock Up || Rabbit and Node were halted [[BR]] Power cycled||
    34 || [1,9] ||Intermittent failure || Power cycled||
    35 || [1,5] ||Disk Failure || Failing disk caused disk controller to fail[[BR]] CM had issues also, both replaced||
    36 || [9,4] ||Disk Failure || Failing disk caused disk controller to fail[[BR]] CM had issues also, both replaced||
    37 || [15,6] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    38 || [18,16] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    39 || [3,11] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    40 || [16,19] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    41 || [5,17] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    42 || [20,4] || Node Failure|| Node was replaced ||
    43 || [15,4] || Node Failure|| Node was replaced, bad left antenna connector. Replacement was used ||
    44 || [5,14] || Overheat|| Fan was not plugged in ||
    45 || [17,4] || Disk Failure || Smartctl reports impending disk death||
    46 || [9,9] || Memory Failure || Memory Pins did not make proper contact, Bent case and reinserted memory ||
    47 || [11,4] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    48 || [12,7] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    49 || [13,2] ||Disk Failure || Successfully booted from disk, but kernel was throwing disk errors ||
    50 || [16,6] ||Disk Failure || SMART overall-health self-assessment test result: FAILED! ||
    51 || [13,5] ||Disk Failure || kernel throwing disk errors ||
    52 || [17,3] ||Disk Failure || kernel throwing disk errors ||
    53 || [14,12] ||Pxe Halt - Locks up during execution of PXE code || '''Not Fixed''' ||
    54 || [11,15] ||Network Failure || PXE gives media check failure [[BR]] Node replaced ||
    55 || [19,6] ||Pxe Halt ||Powers down during PXE ||
    56 || [15,7] ||Pxe Halt ||Halts at random stages in the PXE image download process, before control is handed over to the kernel ||
    57 || [16,8] ||CM crash ||Power Cycled ||
    58 || [20,20] ||CM crash ||CM light stays solid, Power Cycled ||
    59 || [7,2] || CM crash ||Node ID light stays off, Power Cycled ||
    60 || [2,20] ||CM crash ||CM light stays solid, Power Cycled ||
    61 || [14,12] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    62 || [10,7] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    63 || [11,18] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    64 || [1,15] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    65 || [8,3] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    66 || [2,11] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    67 || [11,16] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    68 || [7,8] ||Disk Failure || Bios Does not detect disk [[BR]] Disk replaced ||
    69 || [18,7] ||Disk Failure || Bios Does not detect disk [[BR]] Disk replaced ||
    70 || [2,17] ||Disk Failure || Bios Does not detect disk [[BR]] Disk replaced ||
    71 || [5,19] ||Disk Failure || Bios Does not detect disk [[BR]] Disk replaced ||
    72 || [7,2] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced||
    73 || [12,4] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    74 || [1,8] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    75 || [18,18] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    76 || [14,20] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    77 || [9,16] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    78 || [4,6] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    79 || [6,8] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    80 || [3,13] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    81 || [5,4] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    82 || [10,5] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    83 || [10,8] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    84 || [8,8] ||Network Failure || Kernel throws network hardware complaint [wiki:Internal/NodeFailureModes/Node8.8 during DHCP]||
    85 || [12,4] || Disk Failure || Bios Does not detect disk [[BR]] Disk replaced ||
    86 ||[8,10]|| Disk Failure || Bios Does not detect disk [[BR]] Disk replaced ||
    87 ||[15,17] || Disk Failure || Bios Does not detect disk [[BR]] Disk replaced ||
    88 ||[10,2] || Disk Failure || Bios Does not detect disk [[BR]] Disk replaced ||
    89 ||[1,6] || Disk Failure || kernel throwing disk errors [[BR]] Disk replaced [[BR]] hda: dma_timer_expiry: dma status == 0x21||
    90 ||[18,12] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced [[BR]] hda: dma_timer_expiry: dma status == 0x21||
    91 || [1,10] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    92 || [13,8] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    93 || [12,12] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    94 || [8,8] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    95 || [2,3] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    96 || [2,14] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    97 || [13,17] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    98 || [16,17] ||Disk Failure || kernel throwing disk errors [[BR]] Disk replaced ||
    99 || [1,2] || Node Failure || Cannot isolate problem: seems to overheat and [wiki:Internal/NodeFailureModes/Node1.2 kernel panic] ||
    100 || [6,20] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    101 || [20,20] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    102 || [18,19] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    103 || [16,17] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    104 || [11,17] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    105 || [1,3] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    106 || [2,3] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    107 || [6,15] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    108 || [13,9] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    109 || [11,10] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    110 || [15,2] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    111 || [13,7] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    112 || [11,1] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    113 || [15,10] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    114 || [7,10] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    115 || [11,9] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    116 || [3,14] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    117 || [8,12] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    118 || [15,5] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    119 || [2,13] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    120 || [19,2] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
    121 || [14,7] ||Disk Failure || Disk Write errors [[BR]] Disk replaced ||
     4 || ''' Front Face # ''' || ''' Disk ''' || ''' Memory ''' || ''' CPU ''' || ''' Fan ''' || ''' Network ''' || ''' Solution / Notes ''' ||
     5 || 1 || 1 || 1 || 1 || 1 || 1 || Dies in the middle of PXE ||
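
The v28 layout replaces the free-form failure-mode column with one flag per component (1 = good, 0 = bad). As a rough sketch of how such a row could be read by a script, the following splits one `||`-delimited row into named fields. The field names and the `parse_row` helper are hypothetical and chosen only for illustration; they are not part of the wiki page or of any tooling mentioned on it.

```python
# Hypothetical field names, following the v28 header order:
# Front Face #, Disk, Memory, CPU, Fan, Network, Solution / Notes
COLUMNS = ["front_face", "disk", "memory", "cpu", "fan", "network", "notes"]


def parse_row(line):
    """Split one '||'-delimited wiki table row into a dict.

    Component columns are converted to booleans (1 = good, 0 = bad);
    the first and last columns are kept as stripped strings.
    """
    # The row starts and ends with '||', so drop the empty outer pieces.
    cells = [cell.strip() for cell in line.strip().split("||")[1:-1]]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    row = dict(zip(COLUMNS, cells))
    for key in COLUMNS[1:-1]:
        if row[key] not in ("0", "1"):
            raise ValueError(f"{key} must be 0 or 1, got {row[key]!r}")
        row[key] = row[key] == "1"
    return row


row = parse_row("|| 1 || 1 || 1 || 1 || 1 || 1 || Dies in the middle of PXE ||")
print(row["front_face"], row["disk"], row["notes"])
```

With the flags stored as booleans, a row whose components all read good but whose notes still describe a failure (as in the row above) is easy to pick out for follow-up.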