Multiclass Classification Metrics with Scikit-learn

Evaluating how well a multiclass classification model performs is a crucial part of any modeling workflow, and scikit-learn provides a comprehensive set of metrics for the job. This article focuses on four key metrics: precision, recall, accuracy, and F1-score.

Understanding the Metrics

Precision

Precision measures the proportion of correctly predicted positive instances among all instances predicted as positive.

Recall

Recall measures the proportion of correctly predicted positive instances among all actual positive instances.

Accuracy

Accuracy measures the proportion of correctly classified instances among all instances.

F1-Score

The F1-score is the harmonic mean of precision and recall, giving a single balanced measure that is high only when both precision and recall are high.
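In the multiclass setting, precision, recall, and F1-score are computed per class in a one-vs-rest fashion and then averaged (see the `average` parameter below). To make the definitions concrete, here is a minimal sketch of the underlying formulas; the helper functions are illustrative and not part of scikit-learn:


# Illustrative one-vs-rest formulas for a single class (hypothetical helpers,
# not scikit-learn APIs). tp, fp, fn are the true-positive, false-positive and
# false-negative counts for that class; correct and total describe the dataset.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(correct, total):
    return correct / total

def f1(p, r):
    return 2 * p * r / (p + r)

# Class 0 in the example data used below has tp=4, fp=1, fn=0:
print(precision(4, 1))                     # 0.8
print(recall(4, 0))                        # 1.0
print(f1(precision(4, 1), recall(4, 0)))   # ~0.889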

Computing Metrics with Scikit-learn

Here’s a step-by-step guide on how to compute these metrics using scikit-learn:

1. Import Libraries


from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score

2. Define True Labels and Predicted Labels


# Example data
y_true = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
y_pred = [0, 1, 2, 0, 1, 0, 0, 1, 2, 0]
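
Note that the labels do not have to be integers; scikit-learn's metric functions also accept string class labels. A small optional sketch (the animal labels are made up for illustration):


from sklearn.metrics import precision_score

# The same macro-averaged call works with string labels.
y_true_str = ['cat', 'dog', 'bird', 'cat', 'dog', 'bird']
y_pred_str = ['cat', 'dog', 'cat', 'cat', 'dog', 'bird']
print(precision_score(y_true_str, y_pred_str, average='macro'))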

3. Calculate Metrics


# Calculate precision
precision = precision_score(y_true, y_pred, average='macro')
print(f"Precision: {precision:.3f}")

# Calculate recall
recall = recall_score(y_true, y_pred, average='macro')
print(f"Recall: {recall:.3f}")

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy:.3f}")

# Calculate F1-score
f1 = f1_score(y_true, y_pred, average='macro')
print(f"F1-Score: {f1:.3f}")

Output:


Precision: 0.933
Recall: 0.889
Accuracy: 0.900
F1-Score: 0.896
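
These four calls can also be cross-checked with classification_report, which summarizes the per-class precision, recall, and F1-score along with their macro and weighted averages in one table:


from sklearn.metrics import classification_report

# Prints a per-class breakdown plus macro and weighted averages;
# the macro rows should match the numbers computed above.
print(classification_report(y_true, y_pred, digits=3))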

4. Understanding the `average` Parameter

  • `'macro'`: Computes the metric for each class separately and then takes the unweighted mean, giving equal weight to every class regardless of its size.
  • `'micro'`: Computes the metric globally by counting the total true positives, false positives, and false negatives across all classes, giving equal weight to every individual prediction.
  • `'weighted'`: Computes the metric for each class and then averages them, weighted by each class's support (number of true instances), giving more weight to larger classes.
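
To see how these choices play out on the example above, you can compute the same metric under each averaging strategy. With only one misclassified instance (a class 2 sample predicted as class 0), 'micro' coincides with plain accuracy, while 'macro' and 'weighted' come out slightly lower. A small sketch, reusing y_true and y_pred from step 2:


from sklearn.metrics import f1_score

for avg in ('macro', 'micro', 'weighted'):
    score = f1_score(y_true, y_pred, average=avg)
    print(f"F1 ({avg}): {score:.3f}")

# F1 (macro): 0.896
# F1 (micro): 0.900  <- equal to accuracy for single-label multiclass data
# F1 (weighted): 0.896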

5. Example with Confusion Matrix

In multiclass classification, the confusion matrix gives a per-class breakdown of the results: each row corresponds to a true class and each column to a predicted class, so the diagonal holds the correct predictions. You can use it to interpret the metrics above more concretely.


from sklearn.metrics import confusion_matrix
import pandas as pd

cm = confusion_matrix(y_true, y_pred)
cm_df = pd.DataFrame(cm,
                     index=['Class 0', 'Class 1', 'Class 2'],
                     columns=['Predicted Class 0', 'Predicted Class 1', 'Predicted Class 2'])
print(cm_df)

Output:


         Predicted Class 0  Predicted Class 1  Predicted Class 2
Class 0                  4                  0                  0
Class 1                  0                  3                  0
Class 2                  1                  0                  2
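
A convenient consequence of this layout is that per-class precision and recall can be read straight off the matrix: each class's precision is its diagonal entry divided by its column sum, and its recall is the diagonal entry divided by its row sum. A short sketch, reusing cm from the snippet above:


import numpy as np

# Precision per class: diagonal / column sums (how many predictions of each
# class were correct). Recall per class: diagonal / row sums (how many actual
# instances of each class were recovered).
per_class_precision = np.diag(cm) / cm.sum(axis=0)   # [0.8, 1.0, 1.0]
per_class_recall = np.diag(cm) / cm.sum(axis=1)      # [1.0, 1.0, 0.667]

print(per_class_precision.mean())   # macro precision, ~0.933
print(per_class_recall.mean())      # macro recall, ~0.889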

Conclusion

By utilizing the provided scikit-learn functions and understanding the different averaging methods, you can effectively evaluate the performance of your multiclass classification models.

