THE ROLE OF THE CONFUSION MATRIX

Mahima Gautam
Jun 6, 2021

What is a confusion matrix and why is it important?

A confusion matrix is a table that displays and compares a model’s predicted values against the actual values. In machine learning, it is used to analyze how a classifier performed on a dataset, and metrics like precision, accuracy, specificity, and recall are computed directly from it.

The confusion matrix is particularly useful because it gives a more complete picture of how a model performed. Relying only on a metric like accuracy can lead to a situation where the model completely and consistently misidentifies one class, yet this goes unnoticed because average performance looks good. The confusion matrix, by contrast, breaks performance down into False Negatives, True Negatives, False Positives, and True Positives.
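
To make this concrete, here is a minimal sketch of the accuracy trap described above. It uses scikit-learn (an assumption; the post does not name a library) and invented labels: a heavily imbalanced dataset and a “model” that always predicts the majority class.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Invented example: 95 negatives, 5 positives, and a "model" that
# always predicts the majority class (0).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))   # 0.95 -- looks great on average
print(confusion_matrix(y_true, y_pred))
# scikit-learn orders rows/columns by label value (0 first),
# so the layout here is [[TN, FP], [FN, TP]]:
# [[95  0]
#  [ 5  0]]
# The second row shows every actual positive was misclassified,
# which the single accuracy number completely hides.
```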

Confusion Matrix

                  Predicted Positive     Predicted Negative
Actual Positive   True Positive (TP)     False Negative (FN)
Actual Negative   False Positive (FP)    True Negative (TN)

The Positive/Negative label refers to the model’s predicted outcome, while True/False refers to whether that prediction matches the actual outcome.

Now let’s understand each term (a small counting sketch follows the list):

  • True Positive — when the actual class of a data point is 1 and the model predicted 1. (The model is truly saying positive; it can be trusted.)
  • False Negative — when the actual class of a data point is 1 and the model predicted 0. (The model is falsely saying negative; not reliable.)
  • False Positive — when the actual class of a data point is 0 and the model predicted 1. (The model is falsely saying positive; not reliable.)
  • True Negative — when the actual class of a data point is 0 and the model predicted 0. (The model is truly saying negative; trustworthy.)
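
As a quick illustration of these four terms, here is a small sketch in plain Python (the labels are invented for this post) that tallies each count from paired actual/predicted values:

```python
# Invented labels for illustration (1 = positive class, 0 = negative class).
actual    = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # actual 1, predicted 1
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # actual 1, predicted 0
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # actual 0, predicted 1
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # actual 0, predicted 0

print(tp, fn, fp, tn)  # 3 1 1 3
```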

Given below is a list of rates that are often computed from a confusion matrix for a binary classifier (a short code sketch computing all of them follows the list):

  • Accuracy: Overall, how often is the classifier correct?

(TP+TN)/total

  • Misclassification Rate: Overall, how often is it wrong?

(FP+FN)/total

It is equivalent to 1 minus Accuracy and is also known as the “Error Rate.”

  • True Positive Rate: When it’s actually yes, how often does it predict yes?

TP/actual yes, also known as “Sensitivity” or “Recall”

  • False Positive Rate: When it’s actually no, how often does it predict yes?

FP/actual no

  • True Negative Rate: When it’s actually no, how often does it predict no?

TN/actual no

It is equivalent to 1 minus the False Positive Rate and is also known as “Specificity.”

  • Precision: When it predicts yes, how often is it correct?

TP/predicted yes

  • Prevalence: How often does the yes condition actually occur in our sample?

actual yes/total
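
As promised, here is a sketch in plain Python that computes all of the rates above from the four cell counts (the counts themselves are hypothetical, chosen only for illustration):

```python
# Hypothetical cell counts, for illustration only.
tp, fn, fp, tn = 100, 5, 10, 50

total         = tp + fn + fp + tn
actual_yes    = tp + fn   # data points whose actual class is 1
actual_no     = fp + tn   # data points whose actual class is 0
predicted_yes = tp + fp   # data points the model labeled 1

accuracy    = (tp + tn) / total
error_rate  = (fp + fn) / total   # misclassification rate = 1 - accuracy
tpr         = tp / actual_yes     # sensitivity / recall
fpr         = fp / actual_no
specificity = tn / actual_no      # true negative rate = 1 - FPR
precision   = tp / predicted_yes
prevalence  = actual_yes / total

print(f"accuracy={accuracy:.3f} error={error_rate:.3f} recall={tpr:.3f}")
print(f"fpr={fpr:.3f} specificity={specificity:.3f} "
      f"precision={precision:.3f} prevalence={prevalence:.3f}")
```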

Now let’s get back to our models, look at their confusion matrices again, and interpret the results.

Table 2. Performance evaluation of the existing Naïve Bayes classifier vs the Enhanced Random Forest classifier

In Table 2, the performance assessment of the existing Naïve Bayes classifier versus the Enhanced Random Forest classifier is tabulated.

Comparing the values obtained in the confusion matrices:

  • True Positive — higher for the Enhanced Random Forest, which is good: the number of cases that were predicted correctly is high.
  • False Negative — zero for the Enhanced Random Forest, which is really good: no positive cases were predicted wrongly.

Accordingly, the accuracy rate for the Enhanced Random Forest is very high: 99.58%.
