How confusion matrix is used in cybersecurity?

3 min readJun 9, 2021

What is a confusion matrix?

A confusion matrix is a table that is often used to describe the performance of a model on a set of test data for which the true values are known. The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing.

Let’s understand this using this simple example. We have a total of 165 records or observations amongst which 50 are those which are predicted as No and are actually NO, 10 are those which are predicted YES but actually are NO, and so on.

  • true positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.
  • true negatives (TN): We predicted no, and they don’t have the disease.
  • false positives (FP): We predicted yes, but they don’t actually have the disease. (Also known as a Type I error.)
  • false negatives (FN): We predicted no, but they actually do have the disease. (Also known as a Type II error.)

How can we use a confusion matrix in cybersecurity?

We can use this in multiple use cases. Some of them are listed below:

  1. BotNet detection
  2. IDS
  3. In the classification of attack

and many more.

This is a list of rates that are often computed from a confusion matrix for a binary classifier:

  • Accuracy: Overall, how often is the classifier correct?
  • (TP+TN)/total = (100+50)/165 = 0.91
  • Misclassification Rate: Overall, how often is it wrong?
  • (FP+FN)/total = (10+5)/165 = 0.09
  • equivalent to 1 minus Accuracy
  • also known as “Error Rate”
  • True Positive Rate: When it’s actually yes, how often does it predict yes?
  • TP/actual yes = 100/105 = 0.95
  • also known as “Sensitivity” or “Recall”
  • False Positive Rate: When it’s actually no, how often does it predict yes?
  • FP/actual no = 10/60 = 0.17
  • True Negative Rate: When it’s actually no, how often does it predict no?
  • TN/actual no = 50/60 = 0.83
  • equivalent to 1 minus False Positive Rate
  • also known as “Specificity”
  • Precision: When it predicts yes, how often is it correct?
  • TP/predicted yes = 100/110 = 0.91
  • Prevalence: How often does the yes condition actually occur in our sample?
  • actual yes/total = 105/165 = 0.64

A couple of other terms are also worth mentioning:

  • Null Error Rate: This is how often you would be wrong if you always predicted the majority class. (In our example, the null error rate would be 60/165=0.36 because if you always predicted yes, you would only be wrong for the 60 “no” cases.) This can be a useful baseline metric to compare your classifier against. However, the best classifier for a particular application will sometimes have a higher error rate than the null error rate, as demonstrated by the Accuracy Paradox.
  • Cohen’s Kappa: This is essentially a measure of how well the classifier performed as compared to how well it would have performed simply by chance. In other words, a model will have a high Kappa score if there is a big difference between the accuracy and the null error rate.
  • F Score: This is a weighted average of the true positive rate (recall) and precision.
  • ROC Curve: This is a commonly used graph that summarizes the performance of a classifier over all possible thresholds. It is generated by plotting the True Positive Rate (y-axis) against the False Positive Rate (x-axis) as you vary the threshold for assigning observations to a given class.




C|EH | Cybersecurity researcher | MLOps | Hybrid Multi Cloud | Devops assembly line | Openshift | AWS EKS | Docker