In January, the state of Illinois launched a website to help parents of kids in private schools apply for scholarships to go toward their children’s education.
The site was developed in conjunction with the schools, and on launch day many schools threw scholarship parties where parents could sign up on the spot. However, when the big reveal came, Empower Illinois’ server setup was not up to the task, and the site promptly crashed.
Understandably, parents waiting for their chance to apply for part of the scholarship fund were disappointed. The crash made the local news and the front pages of newspapers throughout the state.
As I write, the website is still not back up.
On Feb. 6, SpaceX’s Falcon Heavy rocket launched a Tesla Roadster into outer space. The launch delighted me and other Hitchhiker’s Guide to the Galaxy fans by displaying the words “Don’t Panic” in large, friendly letters on the Roadster’s console during the flight. It also showed us how slow SpaceX’s website became under load. The site didn’t crash, but with millions of people watching online at the same time, it took a long time to respond to requests. That is typical when demand outstrips the infrastructure’s capacity to respond.
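The slowdown follows from simple arithmetic: when requests arrive faster than a fixed pool of servers can finish them, the surplus piles up in a queue and every new visitor waits behind it. A minimal sketch, using hypothetical request rates (not figures from the launch) and a hypothetical function name:

```python
def backlog_over_time(arrival_rate, service_rate, seconds):
    """Simulate a fixed-capacity server pool in one-second steps.

    arrival_rate: new requests arriving per second (hypothetical).
    service_rate: requests the pool can finish per second (hypothetical).
    Returns the number of requests still waiting at the end of each second.
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += arrival_rate                # new visitors join the queue
        backlog -= min(backlog, service_rate)  # servers finish what they can
        history.append(backlog)
    return history


# Demand 20% above capacity: the queue grows by 200 requests every second.
history = backlog_over_time(arrival_rate=1200, service_rate=1000, seconds=5)
print(history)  # [200, 400, 600, 800, 1000]

# A visitor arriving now waits roughly backlog / service_rate seconds
# before their request is even started.
print(history[-1] / 1000)  # 1.0
```

Nothing crashes in this model; the wait simply keeps growing for as long as demand stays above capacity. Raising `service_rate` above `arrival_rate`, which is what autoscaling does, lets the queue drain.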
These two events reminded us why we spent so much time creating a very fast, autoscaling web server infrastructure.
Nobody can really tell whether their idea will go viral, or how that would affect their web host. We have discovered that load tests behave differently from real-world traffic. We have learned so much about scaling up extremely fast that we have actually had to modify code used by Google, because it was too conservative and would cause our system to lock up. One of our clients, which has a very loyal following, loves to offer timed, limited-availability flash sales. During these sales, they see about four times as many visitors as there are products available. With a hard start time, the site scales up from two small servers to 90 in three minutes, and the items sell out within 11 minutes. Without the extra measures we put in place, the website would crash within 30 seconds of the start of the sale.
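To make the flash-sale numbers concrete, here is a minimal sketch of a target-tracking scaling rule, the ratio Kubernetes’ Horizontal Pod Autoscaler documents: desired servers = ceil(current servers * observed utilization / target utilization). It is not Thinker’s implementation. The 60% target, the 100 requests per second each server can handle, and the names `desired_servers` and `simulate` are hypothetical; only the 2-server floor and 90-server ceiling come from the example above.

```python
import math


def desired_servers(current, utilization, target=0.6, min_servers=2, max_servers=90):
    """Target-tracking rule: resize the pool so utilization returns to target.

    utilization is observed demand divided by current capacity; it can exceed
    1.0 when requests are queuing. The result is clamped to [min, max].
    """
    desired = math.ceil(current * utilization / target)
    return max(min_servers, min(max_servers, desired))


def simulate(demand_per_step, capacity_per_server=100, start=2):
    """Apply the rule once per step to a sequence of demand levels.

    capacity_per_server (requests/second per server) is a hypothetical figure.
    New servers are assumed ready by the next step; real instances need
    time to boot.
    """
    servers = start
    history = []
    for demand in demand_per_step:
        utilization = demand / (servers * capacity_per_server)
        servers = desired_servers(servers, utilization)
        history.append(servers)
    return history


# Quiet traffic, the sale opens, demand doubles, then the items sell out.
print(simulate([100, 5000, 5000, 10000, 200]))  # [2, 84, 84, 90, 4]
```

The drop from 90 straight to 4 servers shows why production autoscalers add a stabilization window before scaling in and an initialization period for new instances; this sketch leaves both out to keep the ratio visible.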
During the soccer World Cup in Brazil four years ago, Google and Coca-Cola demonstrated how quickly this technology can scale. Coca-Cola, with the assistance of CI&T, a Brazilian software engineering company, ran a campaign in which people could upload a picture to help create the Happiness Flag (http://www.ciandt.com/card/coca-cola-happiness-flag). All the photos received were stitched together into one huge flag of crowd-sourced images. During the opening celebration, a URL was posted where anyone could view the flag. The servers then sent an email or text message to each entrant with the coordinates of their image, so they could find themselves among millions of other fans. Google recorded how large and how fast its servers scaled out worldwide as more than 1 billion simultaneous users accessed the website.
At Thinker, we’re really proud of our autoscaling web server platform, built on Google’s platform. Within the next few weeks, we plan to publish a running graph showing how the system scales during an average week. If your site is not hosted on infrastructure that can scale on demand, you too could become a lack-of-service victim if you go viral.