The Need for Prior as Structure in Machine Learning

Arvind Mehrotra

Machine learning (ML) is a complex business, and one of the hardest parts of developing ML algorithms is making them business-ready. Unfortunately, the real world and the testing world are not identical, which means that the ML algorithms we build using training datasets often struggle to perform when faced with a real-world task.

The situation has become more complex because of Covid-19: processes and behaviors have shifted so dramatically that past data may no longer be valid for predicting future behavior.

For example, a simple problem like differentiating cats from dogs might be solvable via ML if you fed the algorithm enough pictures of cats and dogs.

However, more nuanced challenges, like distinguishing cancer from non-cancer or high-value legitimate transactions from fraud, call for a different approach to training ML: introducing a prior, or pre-existing bias, into the learning process to give it structure.

Even if this seems a little counter-productive at first, here is why you need to hear me out.

Structure Is a Primitive Prerequisite for Decision-Making

To understand the need for a prior in ML, we have to look at the age-old debate of nature vs. nurture. In the former approach, the end-to-end learning process happens without prior (that is, existing) knowledge, which makes the decision-making engine (the human cognitive mind or an ML algorithm) more robust and objective.

On the flip side, it would take an extraordinary amount of time to reach a level of firm and confident decision-making on par with the needs of the real world.

In contrast, nurture holds that we base some of our knowledge and decision-making on pre-existing facts, opinions, and ideas, which in the context of ML is called the prior. The prior gives the learner a structure on which to base future learning, so it learns more quickly and applies what it has learned more efficiently.

A simple example is having someone teach us good from bad during childhood instead of learning through trial, error, punishment, and consequence. Similarly, in ML, a prior can be defined as a probability distribution that expresses how strongly we hold a belief about a variable before any evidence about it has come to light.

For instance, you may not have gone to jail for theft, and you may not have met anyone who has, but you know that stealing is wrong.
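To make the idea of "a belief held before the evidence arrives" concrete, here is a minimal sketch in Python. It assumes a Beta distribution as the prior over a fraud rate, which is one common modeling choice and not something this article prescribes. The numbers (a 2% starting belief, 3 frauds in 100 observed transactions) and the helper names `beta_mean` and `update` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def beta_mean(alpha: int, beta: int) -> Fraction:
    """Mean of a Beta(alpha, beta) distribution, i.e. the expected rate."""
    return Fraction(alpha, alpha + beta)


def update(alpha: int, beta: int, hits: int, misses: int) -> tuple[int, int]:
    """Conjugate update: add the observed counts to the prior's pseudo-counts."""
    return alpha + hits, beta + misses


# Hypothetical prior: before seeing any data, we believe about 2% of
# transactions are fraudulent, held with the weight of 100 imagined cases.
prior = (2, 98)
prior_mean = beta_mean(*prior)

# Hypothetical evidence: 3 fraudulent transactions among 100 observed.
posterior = update(*prior, hits=3, misses=97)
posterior_mean = beta_mean(*posterior)

print(f"belief before evidence: {float(prior_mean):.3f}")
print(f"belief after evidence:  {float(posterior_mean):.3f}")
```

The prior does not fix the answer; it only gives the learner a sensible starting point that the evidence then shifts. With a larger prior weight the belief would move more slowly, and with a smaller one it would follow the data more closely.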

Augmenting Prior with Likelihood

Likelihood captures how probable an outcome is, given historical patterns observed under the same conditions. When you combine likelihood with a prior, the forecasting capability (and thereby the decision-making capability) of both humans and ML improves.
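In the standard Bayesian formulation, the prior and the likelihood are combined through Bayes' rule. The sketch below applies it to the cancer-screening kind of problem mentioned earlier. All figures (a 1% base rate, 90% sensitivity, 5% false-positive rate) are hypothetical and chosen only to show the mechanics, and the function name `posterior` is my own.

```python
def posterior(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Bayes' rule for a yes/no hypothesis.

    prior               -- belief that the hypothesis is true before the evidence
    p_evidence_if_true  -- likelihood of the evidence when the hypothesis is true
    p_evidence_if_false -- likelihood of the evidence when the hypothesis is false
    """
    numerator = prior * p_evidence_if_true
    evidence = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / evidence


# Hypothetical screening test; none of these numbers come from real data.
base_rate = 0.01            # prior: 1% of patients have the condition
sensitivity = 0.90          # likelihood of a positive test if the condition is present
false_positive_rate = 0.05  # likelihood of a positive test if it is absent

p = posterior(base_rate, sensitivity, false_positive_rate)
print(f"probability of the condition given a positive test: {p:.3f}")
```

Under these assumed numbers, a positive result from a fairly accurate test still leaves the probability well below one half, because the prior is so low. That is why neither the likelihood nor the prior is enough on its own.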

Let me illustrate with an example. Likelihood will tell you that, in a world with 1.4 billion Chinese nationals and a global population of 7.6 billion, the chance of encountering a Chinese national on the street is about 18%.

But when you factor in what you know about the people in your region, which is a prior because you have not counted every person yourself, you get a far more accurate estimate of your chance of actually meeting a Chinese national.
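The arithmetic behind this example is short enough to check directly. The global figures below are the ones quoted above; the local figures (a town of 50,000 people, 500 of them Chinese nationals) are purely hypothetical and stand in for whatever you believe about your own region.

```python
# Figures quoted in the article.
chinese_nationals_worldwide = 1.4e9
world_population = 7.6e9
global_share = chinese_nationals_worldwide / world_population

# Hypothetical local knowledge: an assumed town, not real census data.
local_population = 50_000
local_chinese_nationals = 500
local_share = local_chinese_nationals / local_population

print(f"estimate from global figures:  {global_share:.1%}")
print(f"estimate from local knowledge: {local_share:.1%}")
```

The gap between the two estimates is the point: the global ratio alone can be far off for any particular street, and local knowledge is what corrects it.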