Performance Metrics of Classification Models

By Monisha Swami

Chintu and Chutki have recently learned the concept of classification, and they are now ready to learn how to measure a model's performance.


Consider this example –

Two new chocolate vending machines have been installed in the nearby shopping complex. Each vending machine uses a logistic regression classifier: when a coin is inserted, it predicts whether the coin is fake or not fake.

Chintu claims that vending machine A performs better than vending machine B, which Chutki disagrees with. So, they decide to perform an experiment. A total of 100 coins, of which 94 are real and 6 are fake, are inserted into vending machine A and its predictions are recorded. The same task is performed for vending machine B.

Before moving forward, we will first understand the concept of the confusion matrix. The class of interest is called the positive class. Here the vending machine is trying to detect whether a coin is fake, which makes Fake the positive class.

Confusion Matrix

|                     | Actual: Fake    | Actual: Not Fake |
|---------------------|-----------------|------------------|
| Predicted: Fake     | True Positives  | False Positives  |
| Predicted: Not Fake | False Negatives | True Negatives   |

Confusion Matrix

True Positive – A fake coin classified as fake.

False Positive – A real coin classified as fake.

False Negative – A fake coin classified as not fake.

True Negative – A real coin classified as not fake.
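The four cells can be counted directly from lists of actual and predicted labels. Below is a minimal sketch in Python; the `confusion_counts` helper and the example labels are illustrative, not data from the article's experiment.

```python
# Minimal sketch: counting confusion-matrix cells for a binary classifier.
# "fake" is the positive class; the labels below are made-up examples.

def confusion_counts(actual, predicted, positive="fake"):
    """Return (TP, FP, FN, TN) counts for one positive class."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    return tp, fp, fn, tn

actual    = ["fake", "fake", "real", "real", "real"]
predicted = ["fake", "real", "real", "fake", "real"]
print(confusion_counts(actual, predicted))  # (1, 1, 1, 2)
```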

The observations from the experiment on the two vending machines are as follows:

|                     | Actual: Fake        | Actual: Not Fake    |
|---------------------|---------------------|---------------------|
| Predicted: Fake     | 2 (True Positives)  | 0 (False Positives) |
| Predicted: Not Fake | 4 (False Negatives) | 94 (True Negatives) |

Vending Machine A

|                     | Actual: Fake        | Actual: Not Fake    |
|---------------------|---------------------|---------------------|
| Predicted: Fake     | 4 (True Positives)  | 4 (False Positives) |
| Predicted: Not Fake | 2 (False Negatives) | 90 (True Negatives) |

Vending Machine B


Accuracy

It is defined as the ratio of correctly classified observations to the total number of observations.

Accuracy = Correctly classified observations/ All observations

i.e. Accuracy = (True Positives + True Negatives)/(True Positives + True Negatives + False Positives + False Negatives)

Accuracy (Machine A) = (2 + 94)/100 = 0.96

Accuracy (Machine B) = (4 + 90)/100 = 0.94
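These figures can be reproduced with a few lines of Python. The `accuracy` helper below is a sketch of the formula above, applied to the counts from the two vending-machine tables.

```python
# Accuracy from the four confusion-matrix counts (tp, fp, fn, tn);
# the counts below come from the vending-machine tables in the article.

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

print(accuracy(2, 0, 4, 94))  # 0.96, Vending Machine A
print(accuracy(4, 4, 2, 90))  # 0.94, Vending Machine B
```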

Even though the accuracy of Vending Machine A is higher, Chutki is still not satisfied. So, the two of them approach their teacher, who explains that accuracy is not always the best metric for measuring model performance when dealing with class imbalance. Class imbalance is a situation where one class is far more frequent than the other.

For example-

In a fraud-classification model where 98% of transactions are genuine and only 2% are fraudulent, a classifier could be built that predicts all transactions as genuine, and thus the model will have an accuracy of 98%. While the accuracy is high, the model fails terribly at its real purpose: detecting fraudulent transactions. Thus, we need a more nuanced metric to assess the performance of a model in such cases.
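This "accuracy paradox" can be demonstrated with the numbers from the fraud example: a sketch of a do-nothing classifier that labels every one of 98 genuine and 2 fraudulent transactions as genuine.

```python
# A trivial classifier that always predicts "genuine", evaluated on
# the article's 98-genuine / 2-fraud split ("fraud" is the positive class).

actual = ["genuine"] * 98 + ["fraud"] * 2
predicted = ["genuine"] * 100

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
caught = sum(a == "fraud" and p == "fraud" for a, p in zip(actual, predicted))
recall = caught / 2  # there are 2 actual fraud cases

print(accuracy)  # 0.98: looks impressive
print(recall)    # 0.0: catches no fraud at all
```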

Sensitivity (Recall or True Positive Rate)

We can define Sensitivity as the ratio of true positives to the actual positives in the data, i.e. out of the total actual positives, what percentage has been correctly predicted as positive by the model.

Sensitivity = Number of fake coins correctly predicted as fake / Total number of fake coins

i.e. Sensitivity = True Positive / (True Positive + False Negative)

Sensitivity (Machine A) = 2/(2 + 4) ≈ 0.33 (only 33.33% of fake coins are correctly classified as fake by vending machine A)

Sensitivity (Machine B) = 4/(4 + 2) ≈ 0.67 (66.67% of fake coins are correctly classified as fake by vending machine B)
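The same calculation in code, using the counts from the vending-machine tables (the `sensitivity` helper name is my own):

```python
# Sensitivity (recall) from confusion-matrix counts.

def sensitivity(tp, fn):
    return tp / (tp + fn)

print(round(sensitivity(2, 4), 4))  # 0.3333, Vending Machine A
print(round(sensitivity(4, 2), 4))  # 0.6667, Vending Machine B
```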

This brings Chintu and Chutki to the conclusion that Vending Machine B performs better at detecting fake coins. But vending machine B can also be improved by optimizing the model for sensitivity, i.e. the model should rather classify a real coin as fake (a False Positive) than classify a fake coin as not fake (a False Negative).

Sensitivity is also known as Recall and True Positive Rate.


Specificity

We define Specificity as the ratio of true negatives to the actual negatives in the data, i.e. out of the total actual negatives, what percentage has been correctly predicted as negative by the model.

Specificity = True Negative / (True Negative + False Positive)

Consider a model that predicts whether an email is spam. Here, spam is our positive class, so specificity will be

Specificity = Genuine mail predicted not spam / Total number of genuine mail

Specificity is useful for models like spam filters, where an email user would rather have a spam email land in the inbox (a False Negative) than have a genuine email sent to the spam folder (a False Positive).
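Applying the formula to the vending-machine counts gives the sketch below; note that these two specificity values are computed here for illustration and are not stated in the article.

```python
# Specificity from confusion-matrix counts (tn, fp),
# using the vending-machine tables above.

def specificity(tn, fp):
    return tn / (tn + fp)

print(specificity(94, 0))            # 1.0, Vending Machine A
print(round(specificity(90, 4), 4))  # 0.9574, Vending Machine B
```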


Precision

It is the ratio of true positives to the total number of observations predicted as positive, i.e. out of the observations predicted as positive, what percentage is correct?

Precision = True Positive/ (True positive + False Positive)

In the spam-email example,

Precision = Spam mail predicted as spam/Total number of mails predicted spam

A high precision implies that not many genuine emails are predicted as spam, while a high sensitivity implies that most spam emails are predicted as spam, even if some genuine emails are predicted as spam as well.
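The same formula applied to the vending-machine counts; as with specificity, these two values are computed here for illustration rather than stated in the article.

```python
# Precision from confusion-matrix counts (tp, fp),
# using the vending-machine tables above.

def precision(tp, fp):
    return tp / (tp + fp)

print(precision(2, 0))  # 1.0, Vending Machine A
print(precision(4, 4))  # 0.5, Vending Machine B
```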

F1 Score

It is the harmonic mean of precision and recall.

F1 Score = 2 * Precision * Recall/ (Precision + Recall)
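A short sketch of the harmonic-mean formula; the vending-machine F1 values shown are computed here from the precision and recall figures above, not stated in the article.

```python
# F1 score as the harmonic mean of precision and recall.

def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f1_score(1.0, 2 / 6))            # Vending Machine A (0.5)
print(round(f1_score(0.5, 4 / 6), 4))  # Vending Machine B (0.5714)
```

Because the harmonic mean punishes whichever of the two is smaller, a model cannot achieve a high F1 score by being good at only precision or only recall.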

To summarize, the various performance metrics are collected in the table below:

|              | Actual: 1                    | Actual: 0             | Measures                   |
|--------------|------------------------------|-----------------------|----------------------------|
| Predicted: 1 | True Positives (a)           | False Positives (b)   | Precision = a/(a+b)        |
| Predicted: 0 | False Negatives (c)          | True Negatives (d)    |                            |
| Measures     | Sensitivity/Recall = a/(a+c) | Specificity = d/(b+d) | Accuracy = (a+d)/(a+b+c+d) |

Confusion Matrix
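All of the metrics in the summary table can be computed together from the four confusion-matrix cells. The helper below is a sketch (the function and key names are my own), run on both vending machines' counts from the article.

```python
# Compute every metric from the summary table in one pass,
# given the four confusion-matrix cells a=tp, b=fp, c=fn, d=tn.

def classification_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }

for name, cells in [("A", (2, 0, 4, 94)), ("B", (4, 4, 2, 90))]:
    metrics = classification_metrics(*cells)
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```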
