What is Supervised Learning?

Supervised learning is a branch of machine learning, a method of data analysis in which algorithms learn iteratively from data, allowing computers to find hidden insights without being explicitly programmed where to look. It is one of three ways in which machines “learn”: supervised learning, unsupervised learning, and optimization.

Supervised Learning Diagram

Supervised learning solves known problems and uses a labeled data set to train an algorithm to perform specific tasks. It trains models to predict known kinds of outcomes, such as “What is the color of the image?”, “How many people are in the image?”, or “What factors are driving fraud or product defects?” For example, a supervised learning task could be to classify vehicles as two-wheelers or four-wheelers from their images. The training data would have to be correctly labeled to identify whether each vehicle is a two-wheeler or a four-wheeler. Supervised learning enables algorithms to ‘learn’ from historical/training data and apply what they have learned to unknown inputs to derive the correct output. Commonly used supervised learning algorithms include decision trees, random forests, and gradient boosting machines.

In contrast, unsupervised learning is a type of machine learning that is used to identify new patterns and detect anomalies. The data that is fed into unsupervised learning algorithms is unlabeled. The algorithm (or model) tries to make sense of the data on its own by finding features and patterns. A sample question that unsupervised machine learning could answer is “Are there new fraud clusters or buying patterns or failure modes emerging?” Unsupervised learning uses techniques such as clustering and principal component analysis; neural networks (for example, autoencoders) and one-class support vector machines can also be applied to unlabeled data.
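
As a rough illustration of clustering on unlabeled data, the sketch below runs a one-dimensional k-means loop. The transaction amounts, the starting centers, and the function name `kmeans_1d` are assumptions made up for this example; they do not come from any particular product or library.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group unlabeled 1-D values around k centers (basic k-means)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical, unlabeled transaction amounts.
amounts = [12.0, 15.0, 11.0, 14.0, 980.0, 1010.0, 995.0]
centers, clusters = kmeans_1d(amounts, centers=[0.0, 500.0])
print(centers)   # one center per discovered group
print(clusters)  # the values assigned to each group
```

No labels are supplied anywhere: the two groups (small and very large amounts) emerge from the data alone, which is the point of contrast with supervised learning.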

Optimization, the third type of machine learning, finds the best solution even when there are complex constraints. For instance, optimization could answer the question “What is the optimum route to take or allocation of resources or equipment maintenance schedule?” Optimization utilizes techniques such as genetic algorithms, which are inspired by Darwin’s theory of evolution.
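
A minimal sketch of a genetic algorithm follows, assuming a toy resource-allocation problem: choose which items to take without exceeding a capacity. The item values, weights, capacity, and the function names `fitness` and `evolve` are all hypothetical, and real genetic algorithms vary widely in how they select, cross over, and mutate.

```python
import random

# Hypothetical items: pick a subset with high total value within the capacity.
values = [6, 5, 8, 9, 6, 7, 3]
weights = [2, 3, 6, 7, 5, 9, 4]
capacity = 15


def fitness(genome):
    """Total value of the chosen items, or 0 if the constraint is violated."""
    weight = sum(w for w, g in zip(weights, genome) if g)
    value = sum(v for v, g in zip(values, genome) if g)
    return value if weight <= capacity else 0


def evolve(fitness, genome_length, population_size=20, generations=30, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population unchanged.
        population.sort(key=fitness, reverse=True)
        parents = population[:population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)   # single-point crossover
            child = mother[:cut] + father[cut:]
            flip = rng.randrange(genome_length)     # mutation: flip one bit
            child[flip] = 1 - child[flip]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


best = evolve(fitness, genome_length=len(values))
print(best, fitness(best))
```

Because the fitter half survives each generation, the best solution found so far is never lost; the result is a good candidate, not a guaranteed optimum.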

What is Classification in Supervised Learning?

There are two major types of supervised learning: classification and regression. Classification is where an algorithm is trained to assign input data to one of a set of discrete classes. During training, algorithms are given training input data with a ‘class’ label. For example, training data might consist of the last credit card bills of a set of customers, labeled with whether they made a future purchase or not. When a new customer’s credit balance is presented to the algorithm, it classifies the customer into either the ‘will purchase’ or the ‘will not purchase’ group.
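
The credit card example can be sketched with a very simple classifier that learns a single balance threshold from labeled data (a one-level decision tree, sometimes called a decision stump). The balances, labels, and function names below are invented for illustration.

```python
def train_stump(balances, labels):
    """Pick the balance threshold that misclassifies the fewest training examples."""
    best_threshold, best_errors = None, len(labels) + 1
    for threshold in sorted(set(balances)):
        # Rule under test: predict 1 ("will purchase") when balance >= threshold.
        errors = sum((b >= threshold) != bool(y)
                     for b, y in zip(balances, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(threshold, balance):
    return "will purchase" if balance >= threshold else "will not purchase"


# Hypothetical labeled training data: 1 = made a future purchase, 0 = did not.
balances = [120, 300, 450, 800, 950, 1200]
labels = [0, 0, 0, 1, 1, 1]

threshold = train_stump(balances, labels)
print(predict(threshold, 1000))
print(predict(threshold, 200))
```

The output is always one of two discrete labels, which is what distinguishes classification from regression.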

What is Regression in Supervised Learning?

In contrast with classification, regression is a supervised learning method where an algorithm is trained to predict an output from a continuous range of possible values. For example, real estate training data would take note of the location, area, and other relevant parameters. The output is the price of the specific real estate.

In regression, an algorithm needs to identify a functional relationship between the input parameters and the output. The output value is not discrete as in classification; instead, it is a function of the input parameters. The quality of a regression model is measured by the error between the actual output and the predicted output, for example the mean squared error.
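
One common way to measure the gap between actual and predicted outputs is the mean squared error (MSE), sketched below. The price figures are hypothetical, and other metrics such as mean absolute error are equally valid choices.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical real estate prices (in thousands) and a model's predictions.
actual_prices = [200.0, 310.0, 150.0]
predicted_prices = [210.0, 300.0, 155.0]

mse = mean_squared_error(actual_prices, predicted_prices)
rmse = math.sqrt(mse)  # root of the MSE, in the same units as the prices
print(mse, rmse)
```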

Real-life Applications of Classification


Binary classification

A binary classification algorithm assigns input data to one of two possible groups. Often one of the classes indicates a ‘normal/desired’ state, and the other indicates an ‘abnormal/undesired’ state. Real-world applications of binary classification include:

Spam detection

The algorithm is given sample emails that are labeled as ‘spam’ or ‘not spam’ during the supervised learning phase. Later, when the algorithm is presented with a new email, it predicts whether the email is ‘spam’ or ‘not spam.’

Churn prediction

The algorithm uses a training data set of past customers, labeled by whether they unsubscribed from the service. Based on this training, the algorithm predicts whether a new customer will end the subscription, given the input parameters.

Conversion prediction

The algorithm is trained with data about past buyers, labeled by whether they bought the item or not. Then, based on this training, the algorithm predicts whether a new customer will make a purchase or not.

The main algorithms used for binary classification include logistic regression and support vector machines.

Multi-class Classification

In multi-class classification, each example in the training data set is labeled with exactly one of three or more possible classes. In contrast to binary classification, which distinguishes only two classes, a multi-class algorithm must pick one class out of many. The applications for multi-class classification include:

  • Face classification: Based on the training data, a model categorizes a photo and maps it to a specific person. Note that the number of class labels can be huge; in this case, there may be thousands of people.
  • Email classification: Multi-class classification is used to sort emails into categories such as social, education, work, and family.

The main algorithms used for multi-class classification are Random Forest, Naive Bayes, Decision Trees, K-nearest neighbors, and Gradient Boosting.
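
As a small illustration of one of these algorithms, the sketch below implements k-nearest neighbors for three email categories. The two numeric features per email and all data points are hypothetical stand-ins for whatever features a real system would extract.

```python
from collections import Counter


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(training, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(training, key=lambda item: squared_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled emails: (feature vector, category).
training = [
    ((1.0, 1.0), "social"), ((1.0, 2.0), "social"), ((2.0, 1.0), "social"),
    ((8.0, 8.0), "work"), ((9.0, 8.0), "work"), ((8.0, 9.0), "work"),
    ((1.0, 8.0), "family"), ((2.0, 9.0), "family"), ((1.0, 9.0), "family"),
]

print(knn_predict(training, (8.5, 8.5)))
print(knn_predict(training, (1.5, 8.5)))
print(knn_predict(training, (1.5, 1.5)))
```

Each query receives exactly one of the three labels, which is the defining property of multi-class classification.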

Multi-label Classification

Unlike binary and multi-class classification, where each input is assigned exactly one class, a multi-label output can belong to several classes at once. That means the same input might be placed in more than one bucket. The applications of multi-label classification include:

  • Photo detection: In cases where photos have several objects, such as a vehicle, an animal, and people, the photo can carry multiple labels.
  • Audio/video classification: Songs and videos might fit into various genres and moods. Multi-label classification can be used to assign these multiple labels.
  • Text categorization: Articles can be categorized based on their content, and a single article may belong to several categories.
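
One common way to represent multi-label targets is as indicator vectors, with one 0/1 slot per possible label, so that a separate yes/no decision can be made for each label. The sketch below shows that encoding and the reverse step of turning per-label scores into a label list; the labels, scores, and 0.5 threshold are assumptions for illustration.

```python
def to_indicator(label_sets, all_labels):
    """Encode each set of labels as a 0/1 vector with one slot per label."""
    return [[1 if label in labels else 0 for label in all_labels]
            for labels in label_sets]


def decode(scores, all_labels, threshold=0.5):
    """Keep every label whose score reaches the threshold (possibly several)."""
    return [label for label, s in zip(all_labels, scores) if s >= threshold]


all_labels = ["vehicle", "animal", "person"]

# Hypothetical photos, each tagged with one or more labels.
photo_labels = [{"vehicle", "person"}, {"animal"}, {"vehicle", "animal", "person"}]
targets = to_indicator(photo_labels, all_labels)
print(targets)

# Hypothetical per-label scores that a trained model might output for a new photo.
print(decode([0.9, 0.2, 0.7], all_labels))
```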

Imbalanced Classification

This is a special case of binary classification, where there is an imbalance of classes in the training data set. Most of the examples in the training data belong to one class, and only a small portion belongs to the other. Unfortunately, many machine learning algorithms work best when the classes are roughly evenly split. For instance, let’s say your training data contains 10,000 genuine customer transactions and only 100 fraudulent ones. A model that labeled every transaction as genuine would be about 99% accurate yet would catch no fraud at all, so specialized techniques are needed to compensate for the imbalance in the data. The applications of imbalanced classification could be:

  • Fraud detection: In the labeled data set used for training, only a small number of inputs are labeled as fraud.
  • Medical diagnostics: In a large pool of samples, the ones with a positive case of a disease might be far fewer.

Specialized techniques like cost-based approaches and sampling-based approaches are used to help deal with imbalanced classification cases.
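
Both families of techniques can be sketched briefly. The first function computes class weights inversely proportional to class frequency (a cost-based approach, using the common "balanced" heuristic of total / (number of classes × class count)); the second randomly duplicates minority examples (a sampling-based approach). The data mirrors the 10,000 versus 100 example, and the function names and transaction ids are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by total / (number_of_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


def oversample_minority(examples, labels, minority, seed=0):
    """Randomly duplicate minority examples until both classes are the same size."""
    rng = random.Random(seed)
    minority_examples = [x for x, y in zip(examples, labels) if y == minority]
    majority_count = sum(1 for y in labels if y != minority)
    extra = [rng.choice(minority_examples)
             for _ in range(majority_count - len(minority_examples))]
    return examples + extra, labels + [minority] * len(extra)


labels = ["genuine"] * 10_000 + ["fraud"] * 100
examples = list(range(len(labels)))  # hypothetical transaction ids

weights = balanced_class_weights(labels)
print(weights)  # errors on the rare class cost far more than errors on the common one

new_examples, new_labels = oversample_minority(examples, labels, "fraud")
print(Counter(new_labels))
```

Oversampling by duplication is only one sampling strategy; undersampling the majority class or generating synthetic minority examples are alternatives with different trade-offs.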

Real-life Applications of Regression


Linear regression

Linear regression in supervised learning trains an algorithm to find a linear relationship between the input and output data. It is the simplest model used, where the outputs represent a linearly weighted combination of the inputs. Linear regression predicts values within a continuous range (e.g. sales or price forecasting); classifying inputs into categories (e.g. cat or dog) is instead the job of related models such as logistic regression. In the training data for simple linear regression, an input (independent) variable and a corresponding output (dependent) variable are provided. From the labeled input data provided, the regression algorithm calculates the intercept and the slope (the x-coefficient) of the linear function. Applications of linear regression may include:

Forecasting: One of the most significant applications of linear regression is forecasting, which takes many forms. Businesses use linear regression to forecast sales or the buying behavior of their customers. It is also used in predicting economic growth, real estate sales, and the prices of commodities like petroleum. Linear regression is also used to estimate an appropriate salary for a new employee, based on historical salary data.
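
For a single input variable, the intercept and slope can be computed directly with the ordinary least squares formulas, as in the sketch below. The experience and salary figures are invented for illustration, and the function name `fit_line` is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical historical data: years of experience vs. salary (in thousands).
years = [1, 2, 3, 4, 5]
salaries = [40, 45, 50, 55, 60]

intercept, slope = fit_line(years, salaries)
print(intercept, slope)

# Estimate the salary for a new employee with 6 years of experience.
print(intercept + slope * 6)
```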

Logistic regression

Logistic regression is used to determine the probability that an event will happen. The training data has an independent variable and a binary outcome label (0 or 1), and the output of the model is a probability between 0 and 1. Once the algorithm is trained with logistic regression, it can predict the probability of the outcome (the dependent variable) based on the value of the independent variable (the input). Logistic regression uses the classic S-shaped sigmoid function. In the supervised learning context, the algorithm estimates the beta coefficient values b0 and b1 from the training data provided, where the odds are the ratio p / (1 - p) of the probability p that the event occurs to the probability that it does not:

odds = e^(b0 + b1 * X)
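
The odds formula can be turned into a probability with p = odds / (1 + odds), which is the sigmoid function. The sketch below shows that conversion and one simple way to estimate b0 and b1, plain gradient descent on the log loss; the training data, learning rate, and epoch count are assumptions, and production libraries typically use more sophisticated solvers.

```python
import math


def predict_probability(b0, b1, x):
    """Convert the odds e^(b0 + b1 * x) into a probability between 0 and 1."""
    odds = math.exp(b0 + b1 * x)
    return odds / (1 + odds)


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Estimate b0 and b1 by gradient descent on the average log loss."""
    b0 = b1 = 0.0
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = predict_probability(b0, b1, x) - y
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / len(xs)
        b1 -= learning_rate * grad1 / len(xs)
    return b0, b1


# Hypothetical training data: the event (1) becomes more likely as x grows.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(predict_probability(b0, b1, 1.0))  # low x: probability below 0.5
print(predict_probability(b0, b1, 3.5))  # high x: probability above 0.5
```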

Applications of logistic regression include:

  • Determining the probability: One of the main applications of logistic regression is to determine the likelihood of an event. The probability of any event lies between 0 and 1, and that is the output of a logistic function. Logistic regression algorithms in machine learning can be used to predict election results, probabilities of a natural calamity, and other such events.
  • Classification: Even though logistic regression outputs a continuous probability, many of its applications are in classification, where the probability is compared against a cutoff such as 0.5. It can be used to sort images into categories and for related classification problems.

Polynomial regression

Polynomial regression is used for more complex, labeled data sets that do not fit well under a straight regression line. If such training data is used with linear regression, the result may be under-fitting, where the algorithm does not capture the true trends in the data. Polynomial regression allows for more curvature in the regression line and hence a better approximation of the relationship between the dependent and independent variables.

Bias and variance are two main terms associated with polynomial regression. Bias is the error that arises from over-simplifying the fitting function, which leads to under-fitting. Variance is the error that arises from using an over-complex function that is too sensitive to the particular training data, which leads to over-fitting.
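
To make under-fitting concrete, the sketch below fits both a straight line (degree 1) and a quadratic (degree 2) to data that follows a curve, then compares their errors. The data and function names are made up for illustration, and the fit uses the least-squares normal equations solved by Gaussian elimination, which is fine for tiny examples but not how numerical libraries usually do it.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients [c0, c1, ..., c_degree]."""
    size = degree + 1
    # Normal equations (X^T X) c = X^T y, where X has columns x^0 .. x^degree.
    matrix = [[sum(x ** (i + j) for x in xs) for j in range(size)]
              for i in range(size)]
    vector = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(matrix[r][col]))
        matrix[col], matrix[pivot] = matrix[pivot], matrix[col]
        vector[col], vector[pivot] = vector[pivot], vector[col]
        for row in range(col + 1, size):
            factor = matrix[row][col] / matrix[col][col]
            for k in range(col, size):
                matrix[row][k] -= factor * matrix[col][k]
            vector[row] -= factor * vector[col]
    # Back substitution.
    coeffs = [0.0] * size
    for row in reversed(range(size)):
        tail = sum(matrix[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (vector[row] - tail) / matrix[row][row]
    return coeffs


def mse(xs, ys, coeffs):
    """Mean squared error of the polynomial with the given coefficients."""
    predictions = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(xs)


# Hypothetical curved data: y = x^2 + 1.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 + 1 for x in xs]

linear = fit_polynomial(xs, ys, degree=1)
quadratic = fit_polynomial(xs, ys, degree=2)
print(mse(xs, ys, linear))     # large error: the straight line under-fits (high bias)
print(mse(xs, ys, quadratic))  # near zero: the curve matches the data
```

Raising the degree much further on noisy data would push the error on the training set down while making predictions on new data worse, which is the variance side of the trade-off.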

The Basic Steps of Supervised Learning

To execute and solve a problem using supervised machine learning, one must:

  • Select the type of training data: The first step in supervised learning is to determine the nature of the data to be used for training. For example, in the case of handwriting analysis, this could be a single letter, a word, or a sentence.
  • Gather and cleanse the training data: In this step, the training data is collected from various sources and undergoes rigorous data cleansing.
  • Choose a model using a supervised learning algorithm: Based on the nature of the input data and the desired use, choose either a classification or regression algorithm. This might be Decision trees, SVM, Naïve Bayes, or Random Forest. The primary considerations when selecting an algorithm are the training speed, memory usage, accuracy of the predictions on new data, and transparency/interpretability of the algorithm.
  • Train the model: The fitting function is fine-tuned over multiple passes through the training data to improve the accuracy and speed of prediction.
  • Make predictions and evaluate the model: Once the fitting function is satisfactory, the algorithm can be given new data sets to make new predictions, and those predictions are compared with known outcomes to evaluate the model.
  • Optimize and retrain the model: Data decay is a natural part of machine learning. Therefore, models must be regularly retrained with refreshed data to ensure accuracy.
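
The steps above can be sketched end to end on a toy problem: hold back part of the labeled data, train on the rest, and evaluate on the held-back part. Everything here is an assumption for illustration, including the single numeric feature, the "low"/"high" labels, the 25% test fraction, and the choice of a nearest-centroid model.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the labeled rows and hold back a fraction for evaluation."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_centroids(rows):
    """Nearest-centroid model: the mean feature value of each class."""
    sums, counts = {}, {}
    for feature, label in rows:
        sums[label] = sums.get(label, 0.0) + feature
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, feature):
    """Return the class whose centroid is closest to the feature value."""
    return min(model, key=lambda label: abs(feature - model[label]))


def accuracy(model, rows):
    return sum(predict(model, f) == y for f, y in rows) / len(rows)


# Gathered and cleansed data (hypothetical): one numeric feature per example.
rows = ([(float(v), "low") for v in range(0, 10)]
        + [(float(v), "high") for v in range(20, 30)])

train, test = train_test_split(rows)   # keep unseen data for evaluation
model = train_centroids(train)         # train the model
print(accuracy(model, test))           # evaluate on data the model has not seen
print(predict(model, 24.0))            # predict for a new input
```

Retraining amounts to rerunning the same training step on refreshed data and checking that accuracy on recent held-back examples has not dropped.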