Convert array of indices to one-hot encoded array in NumPy
Introduction
One-hot encoding is a common technique used in machine learning to represent categorical data. In this technique, each category is represented by a binary vector with a single “1” at the index corresponding to that category, and all other elements are “0”. NumPy provides a convenient way to convert an array of indices to a one-hot encoded array using the np.eye function.
Using np.eye
The np.eye function generates an identity matrix, which is a square matrix with ones on the diagonal and zeros elsewhere. Row k of that matrix is exactly the one-hot vector for class k, so indexing the matrix with an array of indices yields a one-hot encoded array.
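As a quick illustration, here is a minimal sketch of what np.eye returns for a small size and how a single row maps to a class. The variable name identity is chosen only for this example.

```python
import numpy as np

# 3x3 identity matrix; the default dtype is float64
identity = np.eye(3)
print(identity)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# Row k is the one-hot vector for class k
print(identity[2])  # [0. 0. 1.]
```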
Steps
- Generate an identity matrix whose size equals the number of classes (at least the largest index plus one).
- Index the identity matrix using the array of indices.
Code Example
import numpy as np
# Array of indices
indices = np.array([1, 2, 0, 1, 3])
# Number of classes (must be greater than the largest index)
num_classes = 4
# One-hot encode the indices
one_hot = np.eye(num_classes)[indices]
# Print the one-hot encoded array
print(one_hot)
Output
[[0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 0. 1.]]
Explanation
- np.eye(num_classes) generates a 4×4 identity matrix.
- np.eye(num_classes)[indices] selects rows from the identity matrix based on the values in the indices array.
- The resulting array one_hot is a one-hot encoded representation of the original indices.
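The same indexing idea can be wrapped in a small helper. The sketch below is only one possible arrangement: the function name one_hot_encode and its parameters are hypothetical, and it assumes the classes are labelled 0 through the largest index when num_classes is not given. It also passes dtype to np.eye so that the result holds integers rather than the default floats.

```python
import numpy as np

def one_hot_encode(indices, num_classes=None, dtype=int):
    """Hypothetical helper: one-hot encode integer indices via np.eye."""
    indices = np.asarray(indices)
    if num_classes is None:
        # Assumption: classes are labelled 0..max(indices)
        num_classes = int(indices.max()) + 1
    # Build the identity matrix, then pick one row per index
    return np.eye(num_classes, dtype=dtype)[indices]

encoded = one_hot_encode([1, 2, 0, 1, 3])
print(encoded)
# [[0 1 0 0]
#  [0 0 1 0]
#  [1 0 0 0]
#  [0 1 0 0]
#  [0 0 0 1]]
```

Passing num_classes explicitly is the safer choice when some classes may be absent from a particular batch of indices, since inferring it from the data would then produce too few columns.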
Benefits of using np.eye
- Simplicity: The code is concise and easy to understand.
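- Efficiency: np.eye and integer-array indexing are both vectorized NumPy operations, so no Python loop is needed. Note that the full num_classes × num_classes identity matrix is allocated first, which can be costly when the number of classes is very large.
- Flexibility: This approach works with integer index arrays of any shape; the result has one extra trailing axis of length num_classes. Non-integer indices (for example floats) must be cast to an integer type first.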
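To make the limits of this approach concrete, the following sketch checks which index arrays it accepts. The array names are illustrative only. It shows that a 2-D array of integer indices works directly, while an array of floats is rejected by NumPy and needs an explicit cast.

```python
import numpy as np

num_classes = 3

# A 2-D array of integer indices yields a 3-D one-hot array
batch = np.array([[0, 2], [1, 1]])
one_hot_batch = np.eye(num_classes)[batch]
print(one_hot_batch.shape)  # (2, 2, 3)

# Float indices raise an IndexError, even when the values are whole numbers
float_indices = np.array([0.0, 2.0])
try:
    np.eye(num_classes)[float_indices]
    float_ok = True
except IndexError:
    float_ok = False
print(float_ok)  # False

# Casting to an integer type first makes the lookup valid
one_hot_cast = np.eye(num_classes)[float_indices.astype(int)]
```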
Conclusion
Using NumPy’s np.eye function provides a simple and efficient way to convert an array of indices to a one-hot encoded array. This technique is widely used in machine learning for representing categorical data and can be easily implemented in your NumPy code.