This morning my site went down because SQL Server bonked out. SQL Server went down completely and wouldn't restart until I rebooted the machine. I monitor my sites with West Wind Web Monitor, and it correctly detected the failures immediately (all 361 of them <s>). I have it set up to restart the Web Service and, for some of the monitored requests, SQL Server as well. Usually, when one of these rare failures occurs, restarting the Web Service and SQL Server brings the site right back up.
Today, however, even the SQL restart didn't do the trick. The site kept coming back up and immediately failing with more SQL errors, so it went through an endless cycle of SQL and Web restarts. Yuk. When I finally rolled out of bed this morning and looked at my cellphone, there were 300 SMS messages waiting. Shit! <g>
I tried restarting SQL Server manually with no luck, so after a few minutes I gave in and rebooted the machine. All is well again: SQL Server is back up and so is the rest of the site.
SQL Server has been super stable for me on this box, so much so that I don't worry much about it falling down in my recovery protocols. In fact, only one of the applications I run has a flag set to restart SQL Server (on the premise that if SQL's down, it'll hit that application soon enough <s>). This is the first time in the 5-plus-year history of this box and its SQL installation that it has bonked. Not a bad record, I guess.
I have had issues before, though, with services mysteriously shutting down and not coming back up. The same sort of thing happens about once every 3 or 4 months with IIS: IIS just stops and won't come back up until the box reboots.
No harm done, except for a nice Cingular bill for 300 received SMS messages, I bet <s>… but it does bother me to see these odd lockups. I can understand a server dying. I hope that won't happen, but I can deal with that eventuality just fine. A server dying and then being unable to restart, however, is a bit scarier and more difficult to trap.
Worse, the event log shows no trace of a failure whatsoever. There are no errors that suggest instability (other than a failed backup a few hours earlier, followed by a set of completed backups), no unexpected authentication events, nothing that hints at the failure. It's the same as the issues I had with IIS in the past: one minute the server's up, the next it's down and doesn't want to come back up…
Oh well, stuff happens, I guess. I should be grateful that things are so much more stable than, say, 5 years ago, when it was a milestone to get through a few days without a failure <g>…