Comcast suffered a strange semi-brownout today here in the Rockford area. A full outage is easy to diagnose: Your Internet just doesn’t work. This one was weird because the problem sat between Comcast and roughly a third of the Internet, and it took about 1½ hours to resolve. Afterward, Comcast seemed to pin the blame on Level 3, a Colorado-based Internet interconnection company that connects service providers.
Level 3 is one of those companies that are crucial to a smoothly operating Internet, and therefore a smoothly operating workplace, but few people have heard of it because we never see its name on a bill.
Our first notification came when our telephone servers, hosted at Google, lost connection with their handsets all over Rockford. Our monitoring software checks the connection between every server and a VoIP phone onsite with our clients. Whenever a problem arises, an email is fired off for us to investigate, and a second email is sent once it’s resolved. We quickly narrowed the issue to Comcast: when we called one of the associated telephone numbers, we heard the correct outgoing message, which meant the phone servers themselves were operating correctly and the fault lay somewhere between them and the handsets.
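The alerting pattern described above (one email when a problem starts, a second when it clears) comes down to reporting only changes of state. Here is a minimal sketch of that idea in Python. It is not the monitoring software we actually run; the class name, the message wording and the extension label are all made up for illustration, and the step that would send the email is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TransitionMonitor:
    """Remembers the last known state of each target and reports only changes."""

    # Hypothetical structure: target name -> True if it was reachable last time.
    state: Dict[str, bool] = field(default_factory=dict)

    def observe(self, target: str, is_up: bool) -> Optional[str]:
        """Record a check result; return an alert message only on a state change."""
        # Assumption: a target we have never seen before is treated as healthy.
        previous = self.state.get(target, True)
        self.state[target] = is_up
        if previous and not is_up:
            return f"PROBLEM: {target} is unreachable"
        if not previous and is_up:
            return f"RESOLVED: {target} is reachable again"
        return None  # nothing changed, so no email
```

Feeding it the results of a periodic check for a handset labeled, say, "ext-101" would produce one PROBLEM message when the handset drops off, silence while it stays down, and one RESOLVED message when it returns.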
During this stage, the Bose music system here at Thinker continued playing Spotify and Gmail continued to work, but all of my SSH sessions into our servers dropped. We discovered that all of our servers hosted at Google were unreachable, but all of the servers hosted at Amazon were fine.
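That kind of triage, where some destinations answer and others don't, can be roughed out with a few lines of Python. This is only a sketch: it tries a TCP connection to each host and labels the overall picture. The function names and category labels are invented for this example, and any hostnames you feed it are your own choice, not a list we used.

```python
import socket
from typing import Dict, Iterable


def can_connect(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return False


def probe_all(hosts: Iterable[str]) -> Dict[str, bool]:
    """Check each host once and collect the results."""
    return {host: can_connect(host) for host in hosts}


def classify(results: Dict[str, bool]) -> str:
    """Summarize a set of reachability results as one of four labels."""
    if not results:
        return "no data"
    reachable = sum(1 for ok in results.values() if ok)
    if reachable == len(results):
        return "all reachable"
    if reachable == 0:
        return "full outage or local problem"
    return "partial outage"
```

Running `classify(probe_all([...]))` once over the office connection and once over a phone hotspot, with the same list of hosts, gives a quick comparison: a "partial outage" on one network and "all reachable" on the other points at the first network's provider rather than at the servers.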
Our phone servers are generally configured to offer an option to call people by extension and even have links to a phone book. Most extensions fail over to cellphones, so people could still be reached.
At this stage, my next tool is the great Down Detector website, http://downdetector.com/status/comcast-xfinity. The website was, however, unreachable over our Comcast connection. A quick look on my phone's cellular connection showed us where the fault lay.
Using my phone’s hotspot mode, we determined that our websites at Google were working fine for the rest of the Internet and our phone servers continued to work, so we alerted a few of our clients to the issues before they realized what was happening. At this stage, we also discovered that Facebook was unreachable. With Facebook down, we realized that Comcast would be well aware of the outage and would be prioritizing a fix.
We used our online monitoring tools to keep track of which phone extensions across town were disconnected and watched them come back online. At first they had very high latency, which settled back to normal over the next hour or so.
By 2 p.m., the outage was mostly resolved in Rockford. Some reports indicate that Comcast has bypassed Level 3 for the time being. About one-third of Internet traffic normally goes through Level 3, so you may find that some services, such as Netflix, load much more slowly until the issue is fully fixed. Essentially, it would be like the state shutting down Interstate 90, forcing all of us to take U.S. 20 to Chicago.