Keeping Your Web Application Alive • Edric Teo

I recently caught up with Avengers: Endgame three weeks after its initial release. Unlike me, most Marvel enthusiasts rushed to get their tickets, which caused Singapore’s online cinema booking sites to crash.

That got me thinking about what I could do in advance to prepare for the Avengers release if I were the administrator.

Even with only 24 hours left, there must be some way to cope with the surge in traffic.

The “Cloud”

Let’s start off with the “cloud”.

One might describe the “cloud” as another computer on the other side of the world.

Most cloud services like Amazon Web Services, Microsoft Azure, and Google Cloud Platform offer some sort of auto-scaling.

With a few clicks of a button, your web application will magically grow and shrink as and when needed.

For example, Troy Hunt has been using Azure services to scale Have I Been Pwned (HIBP) since its inception. He wrote a post specifically on how he used Azure’s auto-scale feature to keep the service alive at an affordable cost.

A year later, he wrote another post on how he handles huge traffic spikes. This time HIBP received such a huge spike that even auto-scale wasn’t able to keep up.

Unfortunately, we do not have the luxury of building the web application from scratch, so this will have to be implemented another day.

DNS Round Robin

If the web application is built so that we can easily spawn another instance, and the servers all talk to a single database or a group of databases that are kept in sync, then we can simply create new instances and add new A records to our DNS pointing to them.

However, this does not guarantee that the load is actually balanced.

Here is an example. If a corporation’s internal DNS resolver caches the first response, then every subsequent visit from within that network will be sent to the same address.

To address this, we could lower the time-to-live (TTL) value on the records so that cached answers expire sooner and resolvers have to query again, which spreads the load more evenly.
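The caching effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: the addresses, the class names `RoundRobinDNS` and `CachingResolver`, and the one-request-per-second traffic are all made up for illustration, and the DNS server is simplified to hand out one address per query, whereas a real server usually returns all the A records in a rotated order.

```python
from collections import Counter
from itertools import cycle

# Hypothetical addresses standing in for the A records of the web servers.
A_RECORDS = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]


class RoundRobinDNS:
    """Simplified authoritative server that hands out the A records in rotation."""

    def __init__(self, records):
        self._rotation = cycle(records)

    def resolve(self):
        return next(self._rotation)


class CachingResolver:
    """Resolver (e.g. a corporate one) that reuses an answer until its TTL expires."""

    def __init__(self, upstream, ttl_seconds):
        self.upstream = upstream
        self.ttl = ttl_seconds
        self._cached = None
        self._expires_at = 0

    def resolve(self, now):
        # Only ask upstream when there is no cached answer or it has expired.
        if self._cached is None or now >= self._expires_at:
            self._cached = self.upstream.resolve()
            self._expires_at = now + self.ttl
        return self._cached


def simulate(ttl_seconds, duration_seconds=600):
    """Send one request per second through a single caching resolver."""
    resolver = CachingResolver(RoundRobinDNS(A_RECORDS), ttl_seconds)
    return Counter(resolver.resolve(now) for now in range(duration_seconds))
```

With a one-hour TTL, `simulate(3600)` sends all 600 requests to the first address. With a 60-second TTL, `simulate(60)` spreads them over all three servers, although still in blocks of 60 requests rather than one by one.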

The other downside of this setup is that if a server goes down, we need some mechanism to 1) detect the dead server and 2) remove the A record of the dead web application server from the DNS.

However, this then increases the load on the remaining servers, so new instances of the web application have to be set up and added to the DNS records.
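The detect-and-remove loop could look roughly like the sketch below. It assumes that "alive" simply means the server accepts a TCP connection on port 443, and the names `tcp_probe` and `reconcile` are hypothetical. Actually updating the DNS records is left out, because that depends entirely on the DNS provider's API.

```python
import socket


def tcp_probe(address, port=443, timeout=2.0):
    """Treat a server as alive if it accepts a TCP connection in time."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def reconcile(records, desired_count, is_alive=tcp_probe):
    """Return the live addresses and how many new instances are needed.

    `records` are the addresses currently in DNS; `desired_count` is the
    number of servers we want to keep serving traffic.
    """
    alive = [address for address in records if is_alive(address)]
    missing = max(0, desired_count - len(alive))
    return alive, missing
```

Run on a schedule, the returned list would become the new set of A records, and `missing` tells us how many replacement instances to spin up before adding them back.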

If implemented properly, this is a pretty affordable way of dealing with an increase in traffic.

Load Balancer

A load balancer is similar to DNS round robin; however, it actively splits the load among the servers according to a configured policy.
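To make "policy" a little more concrete, here is a toy sketch of two common ones, round robin and least connections. The `LoadBalancer` class and the server names are hypothetical, and a real load balancer does far more (health checks, timeouts, connection handling), so this only shows how the choice of server differs.

```python
from itertools import cycle


class LoadBalancer:
    """Toy load balancer that picks a backend according to a named policy."""

    def __init__(self, servers, policy="round_robin"):
        if policy not in ("round_robin", "least_connections"):
            raise ValueError(f"unknown policy: {policy}")
        servers = list(servers)
        self.policy = policy
        self.active = {server: 0 for server in servers}  # open connections per server
        self._rotation = cycle(servers)

    def route(self):
        """Choose a server for a new request and count the connection as open."""
        if self.policy == "round_robin":
            server = next(self._rotation)
        else:
            # Least connections: the server with the fewest open connections
            # wins; ties go to the server listed first.
            server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def finish(self, server):
        """Mark one request on `server` as completed."""
        self.active[server] -= 1
```

Round robin ignores how busy each server is, while least connections steers new requests away from servers that are stuck on slow ones. DNS round robin cannot make that second kind of decision, because the DNS server never sees the actual traffic.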

The load balancer itself would first have to be able to withstand the amount of traffic. If it cannot handle the traffic itself, the site would still crash, since users would never reach the web application.

To fix this issue, we could run two or more instances of the load balancer, with the additional ones acting as backups.

This GIF by DigitalOcean shows how to create a high-availability setup:

This is probably the ideal solution, since load balancers are designed specifically to distribute traffic.

Restrict User Access (Self-DOS)

This final method is a last resort, probably something that could be done 30 minutes before the release.

The idea is to create a temporary front-facing web page (GitLab Pages and GitHub Pages are free) with a JavaScript snippet that generates a random number and redirects only a certain percentage of users to the application.

So for example, if we generate a number from 1 to 100 and only redirect users whose number is between 1 and 10, then we are allowing roughly 10% of the traffic to pass through. This is not an exact percentage; however, by the law of large numbers, it is a pretty good estimate when there are many visitors.
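The arithmetic is easy to check with a quick simulation. The real snippet would run as JavaScript in the visitor's browser and perform a redirect; the Python below only mimics the dice roll, and the names `should_admit` and `admitted_fraction` are made up for this sketch.

```python
import random

ADMIT_UP_TO = 10  # rolls from 1 to 10 out of 100 are let through, i.e. about 10%


def should_admit(roll, admit_up_to=ADMIT_UP_TO):
    """Decide whether a visitor who rolled `roll` (1 to 100) is redirected to the app."""
    return 1 <= roll <= admit_up_to


def admitted_fraction(visitors, admit_up_to=ADMIT_UP_TO, seed=None):
    """Simulate `visitors` independent rolls and return the share that got through."""
    rng = random.Random(seed)
    admitted = sum(
        should_admit(rng.randint(1, 100), admit_up_to) for _ in range(visitors)
    )
    return admitted / visitors
```

With only a handful of visitors the admitted share can be well off 10%, but with something like `admitted_fraction(100_000)` it settles very close to 0.10, which is the law of large numbers at work.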

Of course, this can easily be bypassed, since the JavaScript runs in the client’s browser.

It will probably work for the non-technical majority of the public, and it is definitely better to have a few people buying tickets than no one buying tickets because of a crash.

Resources:

https://serverfault.com/questions/101053/is-round-robin-dns-good-enough-for-load-balancing-static-content

https://www.educative.io/api/collection/5668639101419520/5649050225344512/page/5747976207073280/image/5696459148099584.png