Cisco 6509 Damaged POE Modules

posted 30 Jun 2011, 03:05 by Tristan Self
We had a severe lightning storm, with power cuts and multiple lightning bolts hitting our buildings and the grounds around our core server rooms. Luckily the UPSes carried the load and gobbled up the overvoltages etc. with no problem.
 
However, the lightning induced an overvoltage in the copper cables that run from the Cisco 6509 POE blades (48 ports) to our workstations and POE telephones. Once the lightning was over, we started getting calls about phones being down all over the place.
 
When I checked the switch I saw the output below. Even after running a "hw-module" reset on each of the affected modules (blades), we still got no power (POE) to the phones, although the directly connected computers were working fine.
 
On reseating or resetting a module, we got these errors as it booted up:
 
%PM_SCP-SP-2-LCP_FW_ERR_INFORM: Module 6 is experiencing the following error: Inline Power Module - PS voltage bad
 
%CONST_DIAG-SP-3-BOOTUP_TEST_FAIL: Module 6: TestVDB failed
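
If you have a lot of these messages to trawl through after a storm, a small script can pull out which module each one refers to. This is only a rough sketch in Python, assuming the messages look exactly like the two above; the function name parse_log and the field names are made up for illustration and are not part of any Cisco tool.

```python
import re

# Sample lines copied from the module boot-up messages above.
LOG = """\
%PM_SCP-SP-2-LCP_FW_ERR_INFORM: Module 6 is experiencing the following error: Inline Power Module - PS voltage bad
%CONST_DIAG-SP-3-BOOTUP_TEST_FAIL: Module 6: TestVDB failed
"""

# Assumed layout: %FACILITY-SEVERITY-MNEMONIC: text
# The facility part may itself contain hyphens (e.g. PM_SCP-SP).
MSG_RE = re.compile(
    r"^%(?P<facility>[A-Z0-9_-]+?)-(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+): (?P<text>.*)$"
)
MODULE_RE = re.compile(r"Module (\d+)")


def parse_log(text):
    """Return one dict per recognised message; unrecognised lines are skipped."""
    events = []
    for line in text.splitlines():
        m = MSG_RE.match(line.strip())
        if not m:
            continue
        mod = MODULE_RE.search(m.group("text"))
        events.append({
            "facility": m.group("facility"),
            "severity": int(m.group("severity")),
            "mnemonic": m.group("mnemonic"),
            "module": int(mod.group(1)) if mod else None,
            "text": m.group("text"),
        })
    return events


for event in parse_log(LOG):
    print(event["module"], event["severity"], event["mnemonic"])
```

Messages from other IOS versions or other facilities may not fit this pattern, so treat anything the script skips as something to read by hand.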
 
A "show module" was showing this:
 
Mod  Online Diag Status
---- -------------------
  1  Pass
  3  Pass
  4  Pass
  5  Pass
  6  Minor Error
  7  Minor Error
  8  Minor Error
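
With more than one chassis to check, it can help to pick out the unhappy modules automatically. The following is a minimal sketch in Python that assumes you have captured the two-column table above as text; failing_modules is a hypothetical helper name, and the full "show module" output has other sections this does not try to handle.

```python
# The "Online Diag Status" table as shown above.
SHOW_MODULE = """\
Mod  Online Diag Status
---- -------------------
  1  Pass
  3  Pass
  4  Pass
  5  Pass
  6  Minor Error
  7  Minor Error
  8  Minor Error
"""


def failing_modules(text):
    """Map module number -> status for every module whose status is not 'Pass'."""
    bad = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # module number, then the rest of the line
        # Header and separator rows are skipped because they do not start with a number.
        if len(parts) == 2 and parts[0].isdigit():
            status = parts[1].strip()
            if status != "Pass":
                bad[int(parts[0])] = status
    return bad


print(failing_modules(SHOW_MODULE))
# {6: 'Minor Error', 7: 'Minor Error', 8: 'Minor Error'}
```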

show diagnostic mod 7

Current bootup diagnostic level: complete

Module 7: SFM-capable 48 port 10/100/1000mb RJ45  SerialNo : xxxxxx

  Overall Diagnostic Result for Module 7 : MINOR ERROR
  Diagnostic level at card bootup: complete

  Test results: (. = Pass, F = Fail, U = Untested)

    1) TestLoopback:

   Port  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
   ----------------------------------------------------------------------------
         .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .

   Port 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
   ----------------------------------------------------------------------------
         .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .


    2) TestSynchedFabChannel -----------> .
    3) TestL3VlanMet -------------------> .
    4) TestIngressSpan -----------------> .
    5) TestEgressSpan ------------------> .
    6) TestAsicMemory ------------------> U
    7) TestFirmwareDiagStatus ----------> .
    8) TestEobcStressPing --------------> U
    9) TestAsicSync --------------------> .
   10) TestUnusedPortLoopback:

   Port  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
   ----------------------------------------------------------------------------
         U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U

   Port 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
   ----------------------------------------------------------------------------
         U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U


   11) TestErrorCounterMonitor ---------> .
   12) TestIntPortLoopback -------------> .
   13) TestPortTxMonitoring:

   Port  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
   ----------------------------------------------------------------------------
         U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U

   Port 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
   ----------------------------------------------------------------------------
         U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U  U


   14) TestVDB -------------------------> F
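
The one line that matters in all of that is the last one. As a rough sketch, assuming the single-result lines always look like "N) TestName ----> X" as they do above, a few lines of Python can list just the failed tests. The name summarise is hypothetical, the sample is a trimmed copy of the output above, and the per-port tests (the ones followed by port grids) are deliberately ignored.

```python
import re

# A few single-result lines copied from the diagnostic output above.
DIAG = """\
    2) TestSynchedFabChannel -----------> .
    6) TestAsicMemory ------------------> U
    7) TestFirmwareDiagStatus ----------> .
   14) TestVDB -------------------------> F
"""

# Matches "<number>) <TestName> ----...> <result>" where result is ., F or U.
RESULT_RE = re.compile(r"^\s*(\d+)\)\s+(\w+)\s+-+>\s+([.FU])\s*$")

# Legend taken from the output itself: . = Pass, F = Fail, U = Untested
LEGEND = {".": "pass", "F": "fail", "U": "untested"}


def summarise(text):
    """Map test name -> 'pass' / 'fail' / 'untested' for single-result tests."""
    results = {}
    for line in text.splitlines():
        m = RESULT_RE.match(line)
        if m:
            results[m.group(2)] = LEGEND[m.group(3)]
    return results


results = summarise(DIAG)
failed = [name for name, outcome in results.items() if outcome == "fail"]
print(failed)
# ['TestVDB']
```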
 
From what I've found, a TestVDB failure means that the POE component on the blade has failed and that it will refuse to deliver power. At the time of writing we don't know whether the whole chassis is fried. We assume it isn't, as all the other blades are working and data traffic is fine, and one of the other POE blades in the switch survived and still delivers power.
 
Fault logged with Cisco; fingers crossed that the chassis isn't toast too.
 
Moral of the Story:
 
It doesn't matter how much UPS protection you add: a direct lightning strike will still cause significant damage and disruption to systems. The only protection you can have is enough staff to deal with the problems, and enough money to fix or replace the faulty kit.
 