Scikit-learn: Calculate Precision and Recall with cross_val_score
Introduction
Scikit-learn (sklearn) is a powerful Python library for machine learning. It provides numerous tools and functions for model building, evaluation, and analysis. One crucial aspect of model evaluation is assessing a model’s performance with metrics such as precision and recall.
This article will guide you on using the `cross_val_score` function in Scikit-learn to efficiently compute precision and recall scores.
Understanding Precision and Recall
* **Precision**: The proportion of correctly predicted positive instances out of all instances predicted as positive, i.e. TP / (TP + FP).
* **Recall**: The proportion of correctly predicted positive instances out of all actual positive instances, i.e. TP / (TP + FN). A short worked example of both metrics follows this list.
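To make these definitions concrete, here is a small sketch with hand-made binary labels (the `y_true` and `y_pred` arrays are purely illustrative):

```python
from sklearn.metrics import precision_score, recall_score

# Illustrative labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 0, 0, 0]  # three actual positives
y_pred = [1, 1, 0, 1, 0, 0]  # three predicted positives, two of them correct

# Precision: 2 correct out of 3 predicted positives -> 2/3
print(precision_score(y_true, y_pred))  # 0.666...

# Recall: 2 found out of 3 actual positives -> 2/3
print(recall_score(y_true, y_pred))  # 0.666...
```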
Implementing Cross-Validation with `cross_val_score`
The `cross_val_score` function performs cross-validation, a technique for estimating how a model will perform on unseen data. It splits the data into several folds, trains the model on all but one fold, and evaluates it on the held-out fold, repeating this until every fold has served once as the evaluation set.
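Conceptually, this is close to the hand-rolled loop below (a minimal sketch, not the library's internal implementation; it assumes the iris dataset and logistic regression model used later in this article, plus `StratifiedKFold`, the splitter `cross_val_score` uses by default for classifiers):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)  # extra iterations so lbfgs converges cleanly

# Train on four folds, score on the held-out fold, repeat for each fold
scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    model.fit(X[train_idx], y[train_idx])
    y_pred = model.predict(X[test_idx])
    scores.append(precision_score(y[test_idx], y_pred, average='macro'))

print(scores)  # one precision value per fold
```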
Steps to Calculate Precision and Recall using `cross_val_score`
1. **Import necessary libraries:**
```python
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, precision_score, recall_score
```
2. **Define custom scorers for precision and recall:**
```python
precision_scorer = make_scorer(precision_score, average='macro')
recall_scorer = make_scorer(recall_score, average='macro')
```
3. **Load and prepare your dataset:**
```python
# Load your dataset (replace with your actual loading code)
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
```
4. **Choose a classification model:**
```python
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
```
5. **Perform cross-validation with `cross_val_score`:**
```python
precision_scores = cross_val_score(model, X, y, scoring=precision_scorer, cv=5)
recall_scores = cross_val_score(model, X, y, scoring=recall_scorer, cv=5)
```
6. **Analyze the results:**
```python
print(f"Precision scores: {precision_scores}")
print(f"Recall scores: {recall_scores}")
```
Output:
Precision scores: [0.95833333 0.94444444 0.95833333 0.94444444 0.97222222]
Recall scores: [0.91666667 0.91666667 0.91666667 0.95833333 0.95833333]
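Each array holds one score per fold. A common follow-up, continuing from the `precision_scores` and `recall_scores` arrays computed above, is to summarize them with their mean and standard deviation (a small sketch):

```python
# cross_val_score returns NumPy arrays, so mean() and std() are available directly
print(f"Precision: {precision_scores.mean():.3f} +/- {precision_scores.std():.3f}")
print(f"Recall: {recall_scores.mean():.3f} +/- {recall_scores.std():.3f}")
```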
Example Code:
```python
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load dataset
X, y = load_iris(return_X_y=True)

# Define custom scorers
precision_scorer = make_scorer(precision_score, average='macro')
recall_scorer = make_scorer(recall_score, average='macro')

# Create model
model = LogisticRegression()

# Perform cross-validation
precision_scores = cross_val_score(model, X, y, scoring=precision_scorer, cv=5)
recall_scores = cross_val_score(model, X, y, scoring=recall_scorer, cv=5)

# Display results
print(f"Precision scores: {precision_scores}")
print(f"Recall scores: {recall_scores}")
```
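As an aside, if the default metric parameters are enough, scikit-learn also accepts predefined scorer names as strings, which lets you skip the `make_scorer` step. Here is a minimal sketch of the same evaluation using the built-in `'precision_macro'` and `'recall_macro'` names:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)  # extra iterations so lbfgs converges cleanly

# Built-in scorer names replace the custom make_scorer objects
precision_scores = cross_val_score(model, X, y, scoring='precision_macro', cv=5)
recall_scores = cross_val_score(model, X, y, scoring='recall_macro', cv=5)

print(f"Precision scores: {precision_scores}")
print(f"Recall scores: {recall_scores}")
```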
Conclusion
Using `cross_val_score` with custom scorers lets you evaluate the precision and recall of your classification models across multiple cross-validation folds. This helps you understand the model’s ability to correctly identify positive instances and how well it generalizes to unseen data.
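If running `cross_val_score` once per metric feels wasteful, a related option is `cross_validate`, which can evaluate several metrics in a single cross-validation run. A sketch under the same setup as the example above:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)  # extra iterations so lbfgs converges cleanly

# One cross-validation pass that computes both metrics
results = cross_validate(
    model, X, y, cv=5,
    scoring={'precision': 'precision_macro', 'recall': 'recall_macro'},
)

print(results['test_precision'])  # per-fold precision
print(results['test_recall'])     # per-fold recall
```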