Uncategorized Archives • Page 7 of 13 • Edric Teo

Swing Bias

Swing Bias occurs when a person starts with an original thesis, swings to the opposing view to avoid confirmation bias, realizes they may have overcorrected, and swings back to the original thesis.

***

Confirmation bias is the tendency to believe in things that align with our existing beliefs. It influences our daily search, interpretation, and recollection of information. Schools teach this bias to help students develop critical thinking skills.

However, this bias has second-order effects.

Since we are aware that confirmation bias exists, we have to consider opposing viewpoints to ensure our evaluation is well-rounded.

We can begin by identifying obvious points that are on the extreme opposite end. These points fall into the category of known-unknowns, where we simply invert our current viewpoint.

However, points along the spectrum live in the realm of unknown-unknowns.

As a result, we rely on readily available information for our research. For example, both our online searches and the people we speak to are biased.

Consider this: our primary source of information is search engines, which rank information based on what their engineers deem useful for consumers. How can a search engine truly understand a user’s intent and determine what is useful?

The role of a search engine is to determine what is most useful to the majority.

This bias also extends to the people we engage with for discussions. Even when we seek the viewpoints of experts in a particular field, not all of them are accessible for a discussion. As such, our discussion points are biased towards the experts who are more available.

Adding to the complexity, most viewpoints fall along a spectrum rather than at opposite ends. This means that, in addition to identifying these points (if we can even identify them), the influence they have on a thesis needs to be weighted. And of course, the weightage assigned to these viewpoints is itself biased.

Balancing all of these factors is more of an art than a science.

While gathering information on opposing viewpoints, we may inadvertently overcorrect our original thesis, resulting in a counter confirmation bias.

Now that we are aware of this possibility, there is nothing preventing us from overcorrecting again.

List Comprehension (Python) Addiction

I first encountered the term list comprehension while watching a talk on optimizing Python code by Sebastian Witowski.

I always knew this term existed but didn’t include it in my arsenal of tools while writing personal projects.

I mean, I don’t have to, right?

So long as the code works and is maintainable, I’m good.

Now look at the following code:

output = []
for i in list:
    if i % 2 == 0:
        output.append(i)
return output

Isn’t it clear and easy to understand?

There is no need for fancy one-liners.

We simply iterate through the list and append element i to the output only if it is an even number.

But as I started working on more projects, I realized that this pattern occurs over and over again: returning the subset of a list that meets certain conditions.

It may not seem like much if it’s just for a simple script or project.

However, as your script/project grows, all these extra lines of code will have an impact.

The above code can be simply replaced by:

return [i for i in list if i % 2 == 0]

For those who are unfamiliar with list comprehension, here is the syntax:

[expression for i in list if condition]

It is equivalent to this:

result = []
for i in list:
    if condition:
        result.append(expression)
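To convince myself that the two forms really do the same thing, here is a minimal sketch that runs them side by side. The function names evens_loop and evens_comprehension and the parameter name numbers are my own, chosen so that the built-in list is not shadowed.

```python
def evens_loop(numbers):
    """Collect the even numbers with an explicit loop."""
    output = []
    for i in numbers:
        if i % 2 == 0:
            output.append(i)
    return output


def evens_comprehension(numbers):
    """Collect the even numbers with a list comprehension."""
    return [i for i in numbers if i % 2 == 0]


sample = [1, 2, 3, 4, 5, 6]
print(evens_loop(sample))           # [2, 4, 6]
print(evens_comprehension(sample))  # [2, 4, 6]
```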

List comprehension can also be useful for assertions.

For example, if we want to ensure that the list is in sorted order, we could simply use:

assert all([list[i] <= list[i+1] for i in range(len(list)-1)])

all() is a built-in function that returns True if every element of the iterable passed to it is truthy.

So our list comprehension iterates i from 0 up to the length of the list minus 2, which is the last index that still has a next element.

Our expression, list[i] <= list[i+1], then checks whether the value at list[i] is less than or equal to the next one (note that list[i] never reaches the last element) and yields the corresponding boolean value.
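Here is a small runnable sketch of the same check wrapped in a function. The names is_sorted and values are hypothetical, and dropping the square brackets is a variation of mine rather than something the assertion needs: it turns the comprehension into a generator expression, so all() can stop at the first out-of-order pair.

```python
def is_sorted(values):
    """Return True if values is in non-decreasing order."""
    # Generator expression: pairs are compared lazily, so all() can
    # return False as soon as one pair is out of order.
    return all(values[i] <= values[i + 1] for i in range(len(values) - 1))


assert is_sorted([1, 2, 2, 5])
assert not is_sorted([3, 1, 2])
assert is_sorted([])  # all() of an empty iterable is True
```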

We can also use it to assert that every value in a dictionary is an integer.

assert all([type(value) is int for key, value in dictionary.items()])
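A minimal sketch of that dictionary check as a function, with the hypothetical name all_values_are_ints. Since the key is never used, iterating over .values() is enough. One detail worth knowing: type(value) is int rejects True and False, whereas isinstance(value, int) would accept them because bool is a subclass of int.

```python
def all_values_are_ints(mapping):
    """Return True if every value in the dictionary is exactly an int."""
    # type(...) is int is stricter than isinstance: it rejects bool.
    return all(type(value) is int for value in mapping.values())


assert all_values_are_ints({"a": 1, "b": 2})
assert not all_values_are_ints({"a": 1, "b": "2"})
assert not all_values_are_ints({"a": True})
```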

Keeping Your Web Application Alive

I recently caught up with Avengers: Endgame, three weeks after its initial release. Unlike me, most Marvel enthusiasts rushed to get their tickets, which caused Singapore’s online cinema booking to crash.

It got me thinking about what I could do in advance to prepare for the release of Avengers if I were the administrator.

Even if there were only 24 hours left, there must be some way to cope with the surge in traffic.

The “Cloud”

Let’s start off with the “cloud”.

One could describe the “cloud” as another computer on the other side of the world.

Most cloud services like Amazon Web Services, Microsoft Azure, and Google Cloud Platform offer some sort of auto-scaling.

With a few clicks of a button, your web application will magically grow and shrink as and when needed.

For example, Troy Hunt has been using Azure services to scale Have I Been Pwned (HIBP) since its inception. He wrote a post specifically on how he used Azure’s auto-scale feature to keep the service alive at an affordable cost.

A year later, he wrote another post on how he handles huge traffic spikes. This time HIBP received such a huge spike that even auto-scale wasn’t able to catch up.

Unfortunately, we do not have the luxury of rebuilding the web application from scratch, so this will have to be implemented another day.

DNS Round Robin

If the web application is built so that we can easily spawn another instance, and all the servers talk to a single database (or a group of databases that are kept in sync), then we can simply create new instances and add new A records to our DNS pointing to them.

However, this does not guarantee that the load is actually balanced.

Here is an example. Imagine that a corporation’s internal DNS caches the answer to the initial request. Every subsequent visit from within that network will then be sent to the same address.

To address this, we could lower the time-to-live (TTL) value of our records so that cached answers expire sooner and a fresh lookup has to be made, which spreads the load more evenly.
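The effect of the TTL can be sketched with a toy simulation. Everything here is an assumption for illustration: the classes RoundRobinDNS and CachingResolver, the three private IP addresses, and the one-lookup-per-second traffic pattern. It is not how a real DNS server is implemented, only a model of "rotate the answers, cache one answer per TTL".

```python
from collections import Counter
from itertools import cycle


class RoundRobinDNS:
    """Toy authoritative server: hands out its A records in rotation."""

    def __init__(self, addresses):
        self._rotation = cycle(addresses)

    def resolve(self):
        return next(self._rotation)


class CachingResolver:
    """Toy shared resolver (e.g. an office DNS) caching one answer per TTL."""

    def __init__(self, upstream, ttl):
        self.upstream = upstream
        self.ttl = ttl
        self._cached = None
        self._expires_at = 0

    def resolve(self, now):
        # Ask upstream again only when nothing is cached or the TTL ran out.
        if self._cached is None or now >= self._expires_at:
            self._cached = self.upstream.resolve()
            self._expires_at = now + self.ttl
        return self._cached


def simulate(ttl, seconds=60):
    """One lookup per second through the cache; count hits per server."""
    dns = RoundRobinDNS(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    resolver = CachingResolver(dns, ttl)
    return Counter(resolver.resolve(now) for now in range(seconds))


print(simulate(ttl=300))  # all 60 lookups land on 10.0.0.1
print(simulate(ttl=5))    # 20 lookups on each of the three servers
```

With a 300-second TTL the cache never expires during the minute, so one server takes everything. With a 5-second TTL the cache refreshes twelve times and the rotation gets a chance to work.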

The other downside of this setup is that if a server goes down, something needs to 1) detect the dead server and 2) remove the corresponding entry from the DNS records.

However, this increases the load on the remaining servers, so new instances of the web application have to be set up and added to the DNS records.
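The detection half of that could look something like the sketch below. The names tcp_probe and prune_dead_servers, the port 443, and the two-second timeout are all assumptions of mine. Actually removing or adding records depends on the DNS provider’s API, so that step is deliberately left out.

```python
import socket


def tcp_probe(address, port=443, timeout=2.0):
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def prune_dead_servers(records, probe=tcp_probe):
    """Split the A records into (alive, dead) according to the probe."""
    alive, dead = [], []
    for address in records:
        (alive if probe(address) else dead).append(address)
    return alive, dead


# Example with a stand-in probe, so no network access is needed.
fake_status = {"10.0.0.1": True, "10.0.0.2": False, "10.0.0.3": True}
alive, dead = prune_dead_servers(list(fake_status), probe=fake_status.get)
print(alive)  # ['10.0.0.1', '10.0.0.3']
print(dead)   # ['10.0.0.2']
```

A successful TCP connection only shows that something is listening; a real check would more likely request a health endpoint of the application itself.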

If implemented properly, this is a pretty affordable way of dealing with an increase in traffic.

Load Balancer

This is similar to DNS round robin; however, a load balancer splits the load among the servers according to a policy.
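To make "policy" a little more concrete, here is a rough sketch of two common ones, round robin and least connections. The class names, the pick method, and the server names are hypothetical and not taken from any particular load balancer.

```python
from itertools import cycle


class RoundRobinPolicy:
    """Send each new request to the next server in turn."""

    def __init__(self, servers):
        self._rotation = cycle(servers)

    def pick(self, active_connections):
        # The connection counts are ignored; servers simply take turns.
        return next(self._rotation)


class LeastConnectionsPolicy:
    """Send each new request to the server with the fewest open connections."""

    def pick(self, active_connections):
        return min(active_connections, key=active_connections.get)


# Hypothetical snapshot: server name -> number of open connections.
active = {"app-1": 12, "app-2": 3, "app-3": 7}

round_robin = RoundRobinPolicy(list(active))
print([round_robin.pick(active) for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']
print(LeastConnectionsPolicy().pick(active))         # app-2
```

Unlike DNS round robin, the balancer sees every request, so a policy such as least connections can react to how busy each server is right now.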

The load balancer itself would first have to be able to withstand the amount of traffic. If it is unable to handle the traffic itself, then the site would still crash since users are not directed to the web application.

To fix this issue, we could run two or more instances of the load balancer, with the second one acting as a backup.

This GIF by DigitalOcean shows how to create a high-availability setup:

This is probably the ideal solution, since load balancers are built specifically to distribute traffic.

Restrict User Access (Self-DOS)

This method is a last resort, probably something that could be done 30 minutes before the release.

The idea is to create a temporary front-facing web page (GitLab Pages and GitHub Pages are free) with a JavaScript snippet that generates a random number, and to redirect only a certain percentage of users to the application.

For example, if we generate a number from 1 to 100 and only redirect users whose number falls between 1 and 10, then we allow roughly 10% of the traffic through. The percentage is not exact; however, by the law of large numbers, it is a pretty good estimate when there are many visitors.
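The real gate would be a JavaScript snippet in the browser, but the arithmetic is easy to check with a quick Python simulation. This is only a sketch: the function names admit and admitted_fraction, the fixed seed, and the visitor counts are my own choices.

```python
import random


def admit(threshold=10, rng=random):
    """Draw a number from 1 to 100; admit the visitor if it is <= threshold."""
    return rng.randint(1, 100) <= threshold


def admitted_fraction(visitors, threshold=10, seed=0):
    """Simulate independent draws and return the share of visitors admitted."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    admitted = sum(admit(threshold, rng) for _ in range(visitors))
    return admitted / visitors


for visitors in (100, 10_000, 100_000):
    print(visitors, admitted_fraction(visitors))
```

With only a hundred visitors the admitted share can land noticeably above or below 10%, and it settles closer to 0.10 as the number of visitors grows.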

Of course, this could simply be bypassed as JavaScript runs on the client’s browser.

It will probably work on the majority of the public, who are non-technical, and it is definitely better to have a few people buying tickets than no one buying tickets because of a crash.

Resources:

https://serverfault.com/questions/101053/is-round-robin-dns-good-enough-for-load-balancing-static-content

https://www.educative.io/api/collection/5668639101419520/5649050225344512/page/5747976207073280/image/5696459148099584.png