Keeping Your Web Application Alive

Ah, ha, ha, ha, stayin' alive, stayin' alive

I recently caught up with Avengers: Endgame, three weeks after its initial release. Unlike me, most Marvel enthusiasts rushed to get their tickets, which caused Singapore’s online cinema booking sites to crash.

And it got me thinking: what could I have done in advance to prepare for the release of Avengers if I were the administrator?

Even if there were only 24 hours left, there must be some way to cope with the surge in traffic.


The “Cloud”

Let’s start off with the “cloud”.

One could describe the “cloud” as another computer on the other side of the world.

Most cloud services, like Amazon Web Services, Microsoft Azure and Google Cloud Platform, offer some sort of auto-scaling.

With a few clicks of a button, your web application will magically grow and shrink as and when needed.
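To make the "grow and shrink" idea concrete, here is a minimal sketch of the kind of decision an auto-scaler makes. The function name, the requests-per-second metric and the limits are all assumptions for illustration; real cloud providers use their own metrics, thresholds and cooldown periods.

```python
import math


def desired_instances(requests_per_second: float,
                      capacity_per_instance: float,
                      min_instances: int = 1,
                      max_instances: int = 20) -> int:
    """Return how many instances the current load calls for.

    Every parameter here is hypothetical. The result is clamped so the
    fleet never shrinks to zero and never grows past a cost ceiling.
    """
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


# Quiet night, normal evening, ticket-release rush.
for load in (50, 950, 5000):
    print(load, "req/s ->", desired_instances(load, capacity_per_instance=100))
```

Note the upper limit: once the load needs more instances than the ceiling allows, scaling alone can no longer keep up.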

For example, Troy Hunt has been using Azure services to scale Have I Been Pwned (HIBP) since its launch. He wrote a post specifically on how he used Azure’s auto-scale feature to keep the service alive at an affordable cost.

A year later, he wrote another post on how he handles huge traffic spikes. This time HIBP received such a huge spike that even auto-scale wasn’t able to catch up.

Unfortunately, we do not have the luxury of rebuilding the web application from scratch for the cloud, so this will have to be implemented another day.

DNS Round Robin

If the web application is built in such a way that we can easily spawn another instance, and if the servers communicate with a single database or a group of databases that are kept in sync, then we could simply create new instances and add new A records to our DNS pointing to them.

However, this does not guarantee that the load is actually balanced.

Here is an example. Imagine a corporation’s internal DNS caches the response to the initial request; every subsequent visit from within that network will then be sent to the same address.

To address this, we could lower our time-to-live (TTL) value so that cached entries expire sooner and a new DNS request has to be made, which brings back the load-balancing effect.
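The caching problem can be illustrated with a small simulation. This is a simplified model, not how any particular DNS server behaves: it assumes a single caching resolver, one visit per second, and an authoritative server that hands out one address at a time in rotation. The IP addresses are made up.

```python
from collections import Counter
from itertools import cycle


def simulate(a_records, ttl, visits):
    """Count which server each visit lands on, one visit per second.

    Models one caching resolver (e.g. a corporate DNS) sitting in front
    of an authoritative server that rotates through its A records.
    """
    authoritative = cycle(a_records)  # next answer the DNS would hand out
    cached, expires_at = None, 0
    hits = Counter()
    for now in range(visits):
        if cached is None or now >= expires_at:
            cached = next(authoritative)  # cache expired: ask again
            expires_at = now + ttl
        hits[cached] += 1
    return hits


servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
long_ttl = simulate(servers, ttl=3600, visits=600)
short_ttl = simulate(servers, ttl=60, visits=600)
print("TTL 3600s:", dict(long_ttl))
print("TTL 60s:  ", dict(short_ttl))
```

With the long TTL every visit in the ten-minute window hits the same server, while the short TTL spreads the visits across all three, at the cost of more DNS lookups.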

The other downside of this setup is that if a server goes down, we need some mechanism to 1) detect the dead server and 2) remove the corresponding entry from the DNS records.

However, this then increases the load on the remaining servers, so new instances of the web application have to be set up and added back to the DNS records.

If implemented properly, this is a pretty affordable way of dealing with an increase in traffic.
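The two steps, and the extra load they put on the survivors, might look something like the sketch below. The `is_alive` check is a hypothetical callable; in practice it could open a TCP connection or request a health endpoint with a short timeout, and the actual DNS update would go through your DNS provider's own tooling.

```python
def prune_dead_servers(a_records, is_alive):
    """Split A records into (healthy, dead) using the supplied health check."""
    healthy = [ip for ip in a_records if is_alive(ip)]
    dead = [ip for ip in a_records if ip not in healthy]
    return healthy, dead


def load_per_server(total_rps, servers):
    """Requests per second each server must absorb if load is spread evenly."""
    if not servers:
        raise ValueError("no healthy servers left")
    return total_rps / len(servers)


records = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
down = {"10.0.0.2"}  # pretend this server has crashed

healthy, dead = prune_dead_servers(records, lambda ip: ip not in down)
print("keep in DNS:", healthy, "remove:", dead)
print("load per server:", load_per_server(900, records),
      "->", load_per_server(900, healthy))
```

Losing one of three servers pushes half as much traffic again onto each survivor, which is why replacement instances need to be added quickly.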

Load Balancer

It’s similar to DNS round robin; however, a load balancer splits the load among the servers according to a configurable policy.

The load balancer itself would first have to be able to withstand the amount of traffic. If it cannot handle the traffic itself, then the site would still crash, since users are never directed to the web application.

To fix this issue, we could have two or more instances of the load balancer, with the second one acting as a backup.

This GIF by DigitalOcean shows how to create a high-availability setup:

This is probably the ideal solution, since load balancers are built specifically to distribute traffic.
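As a rough illustration of what a "policy" means, here are two common ones, round robin and least connections, as a minimal sketch. The class names, server names and connection counts are invented for the example; real load balancers track connections themselves and offer more policies than these.

```python
from itertools import cycle


class RoundRobin:
    """Hand requests to each server in turn, ignoring how busy it is."""

    def __init__(self, servers):
        self._next = cycle(servers)

    def pick(self, active_connections):
        return next(self._next)


class LeastConnections:
    """Hand the request to whichever server has the fewest open connections."""

    def __init__(self, servers):
        self._servers = list(servers)

    def pick(self, active_connections):
        return min(self._servers, key=lambda s: active_connections.get(s, 0))


servers = ["app-1", "app-2", "app-3"]
active = {"app-1": 12, "app-2": 3, "app-3": 7}  # hypothetical snapshot

rr = RoundRobin(servers)
lc = LeastConnections(servers)
rr_picks = [rr.pick(active) for _ in range(4)]
lc_pick = lc.pick(active)
print("round robin:", rr_picks)
print("least connections:", lc_pick)
```

Unlike DNS round robin, the balancer sees every request, so a policy such as least connections can react to how busy each server is right now.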

Restrict User Access (Self-DOS)

This method is the last resort, probably something that could be done 30 minutes before the release.

The idea is to create a temporary front-facing web page (GitLab Pages and GitHub Pages are free) with a JavaScript snippet that generates a random number, and to redirect only a certain percentage of users to the application.

For example, if we generate a number from 1 to 100 and only redirect users whose number is between 1 and 10, then we allow roughly 10% of the traffic to pass through. This is not an exact percentage; however, by the law of large numbers, it is a pretty good estimate when there are many visitors.

Of course, this could easily be bypassed, as JavaScript runs in the client’s browser.

It will probably work on the majority of the public, who are non-technical, and it is definitely better to have a few people buying tickets than no one buying tickets because of a crash.
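The real snippet would be JavaScript running in the visitor's browser, but the logic and the law-of-large-numbers claim can be checked with a quick simulation. The sketch below is in Python purely for illustration; the function names and the fixed seed are assumptions.

```python
import random


def should_redirect(allowed_percent, rng=random):
    """Same check the browser would do: draw 1..100, admit the low numbers."""
    return rng.randint(1, 100) <= allowed_percent


def admitted_fraction(visitors, allowed_percent, seed=0):
    """Simulate many independent visitors and return the share let through."""
    rng = random.Random(seed)  # seeded only so runs are repeatable
    admitted = sum(should_redirect(allowed_percent, rng) for _ in range(visitors))
    return admitted / visitors


for visitors in (100, 10_000, 100_000):
    print(visitors, "visitors ->", admitted_fraction(visitors, allowed_percent=10))
```

With only a hundred visitors the admitted share can wander noticeably from 10%, but it settles close to 10% as the crowd grows, which is exactly the situation on release day.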


Resources:

https://serverfault.com/questions/101053/is-round-robin-dns-good-enough-for-load-balancing-static-content

https://www.educative.io/api/collection/5668639101419520/5649050225344512/page/5747976207073280/image/5696459148099584.png