Author Topic: Unplanned Server Outage - 21aug2014  (Read 4481 times)

Offline quadz

  • Loquaciously Multiloquent Member
  • ****
  • Posts: 5352
Unplanned Server Outage - 21aug2014
« on: August 21, 2014, 10:59:38 PM »
Salutations,

Here's a synopsis of the unplanned server outage that occurred today.

At 1330 days of uptime, our primary server fragbait.tastyspleen.net crashed with a kernel panic in the Ethernet driver.

Subsequent attempts to reboot and/or power cycle the server were met with failures to bring up the network driver:

[    1.921773] e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k2
[    1.921836] e1000e: Copyright (c) 1999-2008 Intel Corporation.
[    1.921945] e1000e 0000:04:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[    1.922025] e1000e 0000:04:00.0: setting latency timer to 64
[    1.922140] e1000e 0000:04:00.0: irq 54 for MSI/MSI-X
[    2.172397] e1000e 0000:04:00.0: PCI INT A disabled
[    2.172467] e1000e: probe of 0000:04:00.0 failed with error -2
[    2.172552] e1000e 0000:04:00.1: PCI INT B -> GSI 19 (level, low) -> IRQ 19
[    2.172629] e1000e 0000:04:00.1: setting latency timer to 64
[    2.172754] e1000e 0000:04:00.1: irq 54 for MSI/MSI-X
[    2.423008] e1000e 0000:04:00.1: PCI INT B disabled
[    2.423070] e1000e: probe of 0000:04:00.1 failed with error -2
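
For anyone who wants to spot this pattern in their own boot logs, here is a minimal sketch (not something that was run on fragbait) that scans kernel log text for "probe of ... failed" lines like the ones above. The function name `find_probe_failures` and the `SAMPLE` string are made up for illustration; the sample lines are copied from the log above. On Linux, errno 2 is ENOENT, so "error -2" shows up as ENOENT here. That only names the error code; it does not explain the underlying cause.

```python
import errno
import re

# Matches kernel log lines such as:
# [    2.172467] e1000e: probe of 0000:04:00.0 failed with error -2
PROBE_FAILED = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+): probe of (?P<dev>\S+) "
    r"failed with error (?P<code>-?\d+)"
)


def find_probe_failures(log_text):
    """Return one dict per 'probe of ... failed' line found in log_text."""
    failures = []
    for line in log_text.splitlines():
        m = PROBE_FAILED.search(line)
        if m is None:
            continue
        code = int(m.group("code"))
        failures.append({
            "time": float(m.group("ts")),
            "driver": m.group("driver"),
            "device": m.group("dev"),
            "code": code,
            # Kernel drivers return negative errno values, so look up abs(code).
            "errno_name": errno.errorcode.get(abs(code), "unknown"),
        })
    return failures


# Hypothetical sample input: four lines taken from the log excerpt above.
SAMPLE = """\
[    2.172397] e1000e 0000:04:00.0: PCI INT A disabled
[    2.172467] e1000e: probe of 0000:04:00.0 failed with error -2
[    2.423008] e1000e 0000:04:00.1: PCI INT B disabled
[    2.423070] e1000e: probe of 0000:04:00.1 failed with error -2
"""

if __name__ == "__main__":
    for f in find_probe_failures(SAMPLE):
        print(f"{f['device']}: {f['driver']} probe failed, "
              f"error {f['code']} ({f['errno_name']})")
```

In practice the input would be the saved output of `dmesg` rather than a hard-coded string.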


After I opened a support ticket, I was able to watch the tech assigned by SoftLayer over the virtual KVM as he attempted to troubleshoot.

Sometimes when the tech would boot into a "rescue" kernel, the network driver would load. Sometimes after that when he booted back into the fragbait kernel the networking would still be OK--which is why we had a brief interval in the afternoon where the system was back up for a while. However, on subsequent reboots the networking would go back to the error condition above.

The tech suggested updating the BIOS on the motherboard, but unfortunately this had no noticeable effect on the problem.

The tech then continued running various diagnostic programs at occasional intervals. However, those programs depended on the Ethernet devices existing, and the devices did not exist because the driver was unable to load (the error above), so the diagnostic tools were unable to run.

So it was an interesting experience watching sometimes lengthy periods of inactivity punctuated by the running of another diagnostic tool which had no chance of working.

Eventually the problem was escalated to a higher-tier tech who was aware of a rather bizarre problem with the particular motherboard on the fragbait server.

He was able to get the networking back up, and explained the issue as follows:

Quote
I am going to go ahead and put an account note on your account so that we can get this fixed in a faster fashion in the future.

There is a specific issue with the motherboard in this server. There is a trick of sorts that has to be performed as this motherboard can lose network access at times.

basically for the server fragbait2 which has the X7DBR-E we have to shutdown the machine, set the network interfaces to 0 then push the config to our switches.

Then wait a minute or two, then power back on the machine and then reset the speed of the interfaces back to the required speed which for this one is 1Gbps.

After that networking starts to work and is a documented issue for this board.

I did go ahead and add the notes for this server so hopefully any issues will be fixed in a quick manner.

So.... that's definitely an odd one. I'm glad somebody there knew a workaround. It will be interesting to see how long it lasts. (If it can last another 1330 days, I won't complain much. But who knows...)


:exqueezeme:
"He knew all the tricks, dramatic irony, metaphor, bathos, puns, parody, litotes and... satire. He was vicious."

Offline VaeVictis

  • i was -1 because you fucking suck
  • Brobdingnagian Member
  • *
  • Posts: 4498
Re: Unplanned Server Outage - 21aug2014
« Reply #1 on: August 22, 2014, 01:22:36 AM »
Never seen such an odd issue with a Supermicro board like that. Usually Supermicro boards are rock solid without any kind of issues.

Offline fishxz

  • Newbie
  • *
  • Posts: 28
Re: Unplanned Server Outage - 21aug2014
« Reply #2 on: August 22, 2014, 05:53:58 AM »
sad to hear quadz :( my heart is bleeding for ur uptime :(

Offline bluemeanies

  • Swanky Member
  • *****
  • Posts: 520
Re: Unplanned Server Outage - 21aug2014
« Reply #3 on: August 22, 2014, 10:50:02 AM »
1330

I think you misspelled 1337. Thanks for all your hard work and free service quadz.

And yes nerds...we've all heard the story of that company finding a server that had been up for 20 years dry walled in behind a wall in some basement. That uptime seems impressive to me for what it handles and all the bullshit that probably gets thrown at it too.
Quote from: John Kreese
Mercy is for the weak. Here, in the streets, in competition: A man confronts you, he is the enemy. An enemy deserves no mercy.

Offline |iR|Focalor

  • Irrepressibly Profuse Member
  • *
  • Posts: 15769
  • Help Destroy America: VOTE DEMOCRAT
Re: Unplanned Server Outage - 21aug2014
« Reply #4 on: August 22, 2014, 12:48:31 PM »
It's a good thing, I was beginning to revert back to my pre-internet stone-age instincts. I had thrown all the furniture outside over the back deck rail and was getting ready to set it on fire and dance naked around it with yak shit and deer blood smeared all over my body, a gesture which I seem to remember back in the 80's was used to lure in fertile and receptive females for the mating ritual.

Offline haunted

  • Irrepressibly Profuse Member
  • *
  • Posts: 10149
  • I am hollywood.
Re: Unplanned Server Outage - 21aug2014
« Reply #5 on: August 22, 2014, 01:09:16 PM »
Quote from: |iR|Focalor
It's a good thing, I was beginning to revert back to my pre-internet stone-age instincts. I had thrown all the furniture outside over the back deck rail and was getting ready to set it on fire and dance naked around it with yak shit and deer blood smeared all over my body, a gesture which I seem to remember back in the 80's was used to lure in fertile and receptive females for the mating ritual.

You do that too when you can't get into dev/random to talk about football?

Offline |iR|Focalor

  • Irrepressibly Profuse Member
  • *
  • Posts: 15769
  • Help Destroy America: VOTE DEMOCRAT
Re: Unplanned Server Outage - 21aug2014
« Reply #6 on: August 22, 2014, 01:41:26 PM »
Of corpse! I remember when we all did that before the internet. Before President Bill Clinton invented the internet in 1994, we didn't have yahoo and wikipedia to tell us everything, so we were adamantly convinced that two slot toasters were actually fancy sock drying machines.

 
