Mean Average Precision (mAP) in Object Detection

A metric for evaluating Object Detection Models

Jul 15, 2023

What is Object Detection?

Object detection is the task of identifying and localizing objects within images or video frames.

Object Detection = Classification + Localization

So, how do we evaluate the performance of an object detection model? This is where the mean Average Precision (mAP) metric comes into play.

To assess the performance of an object detection model, we examine its ability to correctly identify the object’s class and accurately predict the bounding box coordinates of the object.

To understand mAP, we first need to understand the following:
1. Intersection Over Union
2. Confusion Matrix
3. Precision
4. Recall
5. Precision Recall Curve
6. Average Precision

Intersection Over Union (IoU)

Intersection Over Union is a measurement used to evaluate the accuracy of an object detection algorithm.

It measures the amount of overlap between the ground truth box and the predicted bounding box. The overlap is calculated as the ratio of the intersection area to the union area of the ground truth and predicted boxes.

The IoU value ranges from 0 to 1, where a value closer to 1 indicates a higher degree of overlap and better alignment between the predicted and ground truth bounding boxes.

How to calculate IoU?

Let’s consider the below example:

Source: https://machinelearningspace.com/

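Taking (x₀, y₀) and (x₁, y₁) as the minimum and maximum corners of each box, the intersecting box has corners (x₀ᴵ, y₀ᴵ) = (max(x₀ᴬ, x₀ᴮ), max(y₀ᴬ, y₀ᴮ)) and (x₁ᴵ, y₁ᴵ) = (min(x₁ᴬ, x₁ᴮ), min(y₁ᴬ, y₁ᴮ)). Once we have these coordinates of the intersecting box, we can easily calculate the area of overlap and the area of union.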

Area of overlap = (x₁ᴵ - x₀ᴵ) * (y₁ᴵ - y₀ᴵ)

Area of union = (x₁ᴬ - x₀ᴬ) * (y₁ᴬ - y₀ᴬ) + (x₁ᴮ - x₀ᴮ) * (y₁ᴮ - y₀ᴮ) - (x₁ᴵ - x₀ᴵ) * (y₁ᴵ - y₀ᴵ)

Note: Area of overlap is 0 if (x₁ᴵ - x₀ᴵ) or (y₁ᴵ - y₀ᴵ) is negative.
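The formulas above can be sketched in a few lines of Python. This is a minimal illustration, assuming each box is given as a tuple (x0, y0, x1, y1) with x0 < x1 and y0 < y1; the function name `iou` and the sample boxes are made up for this example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x0, y0, x1, y1)."""
    # Corners of the intersecting box.
    x0_i = max(box_a[0], box_b[0])
    y0_i = max(box_a[1], box_b[1])
    x1_i = min(box_a[2], box_b[2])
    y1_i = min(box_a[3], box_b[3])

    # A negative width or height means the boxes do not overlap,
    # so each side is clamped at zero before multiplying.
    overlap = max(0.0, x1_i - x0_i) * max(0.0, y1_i - y0_i)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap

    # Assumption: degenerate boxes with zero union get an IoU of 0.
    return overlap / union if union > 0 else 0.0


# Two 10x10 boxes sharing a 5x5 corner: overlap 25, union 175.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

Clamping each side separately matters: if both the width and the height were negative, their product would otherwise come out positive and report a false overlap.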

Confusion Matrix

A confusion matrix is a table that summarizes the performance of a machine learning model on a set of data.

Correct Prediction- The class of the predicted bounding box and the ground truth bounding box is the same, and the IoU between the predicted and ground truth box is greater than or equal to the set threshold value.

True Positive- A correct prediction, IoU ≥ threshold

False Positive- An incorrect prediction, IoU < threshold

False Negative- A ground truth box was present but not detected.

True Negative- True negatives represent background in object detection. A true negative means that no bounding box was predicted for the background and no bounding box was present in the ground truth.

Precision

Precision tells us what percentage of all the predictions made by the model are actually correct: Precision = TP / (TP + FP).

Recall

Recall tells us what percentage of all the ground truths were correctly predicted: Recall = TP / (TP + FN).
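As a small illustration of these two definitions, the sketch below computes both values from raw counts. The function name `precision_recall` and the counts are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something the definitions prescribe.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive and false negative counts."""
    # Precision: share of all predictions that are correct.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: share of all ground truths that were found.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical counts: 3 correct boxes, 1 wrong box, 2 missed objects.
precision, recall = precision_recall(tp=3, fp=1, fn=2)
print(precision, recall)
```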

Precision Recall Calculation

Note: All the calculations below should be done for each class separately. True Negatives are not considered in the calculation of Precision and Recall.

Let's consider the sample output of an object detection model below to understand how to calculate the precision and recall values.

There are 3 images with 4 ground truth boxes (green) and 6 prediction boxes (red), each with a confidence score.

To calculate precision and recall we need the TP and FP counts, and to get those we first need to set an IoU threshold.

Let’s consider an IoU threshold of 0.5. If the IoU of a predicted box is ≥ 0.5, the box is a TP; otherwise, if IoU < 0.5, it is an FP.

Note: If one object has multiple predictions, the one with the highest overlap is considered the TP and the others are counted as FP.
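The labelling rule above can be sketched as follows for a single class in a single image. This is only an illustration under stated assumptions: boxes are (x0, y0, x1, y1) tuples, and the names `iou` and `label_predictions` as well as the sample boxes are invented for this example. It follows the highest-overlap rule from the note; evaluation tools such as the PASCAL VOC devkit instead walk through predictions in order of confidence, so their results can differ when several predictions hit the same object.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x0, y0, x1, y1) boxes."""
    w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    overlap = w * h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap
    return overlap / union if union > 0 else 0.0


def label_predictions(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Label each predicted box "TP" or "FP" and count missed ground truths."""
    labels = ["FP"] * len(pred_boxes)
    best_for_gt = {}  # ground truth index -> (best IoU so far, prediction index)

    for p_idx, pred in enumerate(pred_boxes):
        # Find the ground truth box this prediction overlaps the most.
        best_iou, best_gt = 0.0, None
        for g_idx, gt in enumerate(gt_boxes):
            value = iou(pred, gt)
            if value > best_iou:
                best_iou, best_gt = value, g_idx
        if best_gt is None or best_iou < iou_threshold:
            continue  # below the threshold: stays FP
        # Keep only the highest-overlap prediction for each ground truth box.
        if best_gt not in best_for_gt or best_iou > best_for_gt[best_gt][0]:
            best_for_gt[best_gt] = (best_iou, p_idx)

    for _, p_idx in best_for_gt.values():
        labels[p_idx] = "TP"

    false_negatives = len(gt_boxes) - len(best_for_gt)
    return labels, false_negatives


# Hypothetical boxes: two predictions on the first object, one on background,
# and nothing on the second object.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(2, 2, 10, 10), (1, 1, 10, 10), (50, 50, 60, 60)]
print(label_predictions(preds, gt))
```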

Precision Recall Curve

The precision-recall curve is a graphical representation of the trade-off between the precision and recall of an object detection model. It is commonly used to evaluate the performance of models, especially in cases where the data is imbalanced.

To plot a precision-recall curve, the following steps are performed:

Sort the predictions in descending order of the confidence score of each bounding box.
Calculate the accumulated precision and accumulated recall after each prediction in that order. (Refer table below)

Now we can use the precision and recall values calculated above to plot the precision-recall curve.
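The sorting and accumulation steps can be sketched as below. The function name `precision_recall_points` and the detections are hypothetical (they are not the values from the example images), and the sketch assumes each detection has already been labelled TP or FP and that there is at least one ground truth box.

```python
def precision_recall_points(detections, num_ground_truths):
    """Accumulated (precision, recall) after each detection, ranked by confidence.

    detections: list of (confidence, is_true_positive) pairs for one class.
    """
    # Step 1: sort by confidence, highest first.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    # Step 2: accumulate TP and FP counts down the ranked list.
    points = []
    acc_tp = 0
    acc_fp = 0
    for _confidence, is_tp in ranked:
        if is_tp:
            acc_tp += 1
        else:
            acc_fp += 1
        precision = acc_tp / (acc_tp + acc_fp)
        recall = acc_tp / num_ground_truths
        points.append((precision, recall))
    return points


# Hypothetical detections for one class with 3 ground truth boxes.
detections = [(0.6, False), (0.9, True), (0.5, True), (0.8, False), (0.7, True)]
for precision, recall in precision_recall_points(detections, num_ground_truths=3):
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Plotting recall on the x-axis against precision on the y-axis for these points gives the precision-recall curve.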

Average Precision (AP)

Average precision can be calculated using the area under the curve (AUC) of the precision recall curve as shown in the plot above.

Note: Average precision is calculated for each class separately; mAP is the mean of these per-class AP values.

Calculating Area Under Curve (AUC)

Average precision is the area under the precision recall curve. To find the area under the curve the following methods can be used:

1. Approximating Area under curve with rectangles

The average precision can be calculated by approximating the area under the curve using rectangles. This method is also called the rectangular approximation method or the method of rectangles.

This method divides the area under the curve into a series of rectangles and sums their areas to estimate the total area. Refer to the diagram below.

Approximating area under the curve using rectangles

Width and height of each rectangle can be calculated as follows:

Then, the average precision will be sum of areas of these rectangles.
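A minimal sketch of the rectangle method is shown below. The width and height used here are an assumption, one common choice: each rectangle's width is the increase in recall since the previous point (starting from recall 0), and its height is the precision at the current point. The function name `average_precision_rectangles` and the sample points are hypothetical.

```python
def average_precision_rectangles(recalls, precisions):
    """Approximate the area under the precision-recall curve with rectangles.

    recalls and precisions are the accumulated values, in ranked order.
    """
    ap = 0.0
    previous_recall = 0.0
    for recall, precision in zip(recalls, precisions):
        width = recall - previous_recall  # assumed: step in recall
        height = precision                # assumed: precision at this point
        ap += width * height
        previous_recall = recall
    return ap


# Hypothetical accumulated values for 5 ranked detections and 3 ground truths.
recalls = [1 / 3, 1 / 3, 2 / 3, 2 / 3, 1.0]
precisions = [1.0, 0.5, 2 / 3, 0.5, 0.6]
print(average_precision_rectangles(recalls, precisions))
```

With this choice, points where recall does not increase (false positives) contribute rectangles of zero width, so only the steps at true positives add area.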

2. 11 Point Interpolation

11-point interpolation approximates the precision-recall curve by taking an interpolated precision value at each of 11 fixed recall levels.

The 11-point interpolation method was introduced in the 2007 PASCAL VOC challenge. It involves calculating Precision values at 11 equally spaced Recall values.

Recall values in the range [0, 1.0] are considered in increments of 0.1.

For the recall values at 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 the precision is calculated as follows:

Take the maximum Precision value to the right of each Recall value. In other words, find the highest Precision among all points whose Recall is greater than or equal to the current Recall value.

Start from the last precision value and keep moving to the left; as soon as a higher precision value is found, update the precision value.

Average precision is then the average of the interpolated precisions at this set of 11 recall points.

For the example above the average will be:

The precision is interpolated only for 11 recall points to mitigate the influence of minor fluctuations in the precision/recall curve. Since the evaluation dataset is typically large, plotting the precision/recall graph for all predictions would result in very small differences between adjacent points. Therefore, the 11-point interpolation provides a sufficient basis for comparing and evaluating different models.
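The 11-point procedure can be sketched as follows. The function name `average_precision_11_point` and the sample points are hypothetical; the sketch assumes that a recall level no prediction reaches contributes a precision of 0.

```python
def average_precision_11_point(recalls, precisions):
    """11-point interpolated average precision for one class."""
    total = 0.0
    for i in range(11):
        level = i / 10  # recall levels 0.0, 0.1, ..., 1.0
        # Interpolated precision: the highest precision among all points
        # whose recall is at least the current level.
        candidates = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(candidates) if candidates else 0.0
    return total / 11


# Hypothetical accumulated values for 5 ranked detections and 3 ground truths.
recalls = [1 / 3, 1 / 3, 2 / 3, 2 / 3, 1.0]
precisions = [1.0, 0.5, 2 / 3, 0.5, 0.6]
print(average_precision_11_point(recalls, precisions))
```

The recall levels are generated as i / 10 rather than by repeatedly adding 0.1, which avoids floating-point drift when comparing against the last level of 1.0.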