Convert array of indices to one-hot encoded array in NumPy

Introduction

One-hot encoding is a common technique used in machine learning to represent categorical data. In this technique, each category is represented by a binary vector with a single “1” at the index corresponding to that category, and all other elements are “0”. NumPy provides a convenient way to convert an array of indices to a one-hot encoded array using the np.eye function.

Using np.eye

The np.eye function generates an identity matrix, which is a square matrix with ones on the diagonal and zeros elsewhere. This can be utilized to create a one-hot encoded array from an array of indices.
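To make the connection concrete, here is a minimal sketch of what np.eye returns for a small size. The variable name identity is chosen only for illustration; the point is that row k of the identity matrix is already the one-hot vector for class k.

```python
import numpy as np

# A 3x3 identity matrix (float64 by default)
identity = np.eye(3)
print(identity)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# Row k of the identity matrix is the one-hot vector for class k
print(identity[2])  # [0. 0. 1.]
```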

Steps

  1. Generate an identity matrix whose size equals the number of classes, that is, the largest index plus one (this can exceed the number of distinct values present in the array).
  2. Index the identity matrix using the array of indices.

Code Example


import numpy as np

# Array of indices
indices = np.array([1, 2, 0, 1, 3])

# Number of classes (largest index + 1)
num_classes = 4

# One-hot encode the indices
one_hot = np.eye(num_classes)[indices]

# Print the one-hot encoded array
print(one_hot)

Output


[[0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 0. 1.]]

Explanation

  • np.eye(num_classes) generates a 4×4 identity matrix.
  • np.eye(num_classes)[indices] selects rows from the identity matrix based on the values in the indices array.
  • The resulting array one_hot is a one-hot encoded representation of the original indices.
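Because the encoding is plain row selection, the same expression also accepts a multi-dimensional index array, and np.eye takes a dtype argument if integers are preferred over the default floats. The sketch below uses illustrative names (indices_2d, one_hot_int, recovered) and shows that argmax along the last axis recovers the original indices.

```python
import numpy as np

num_classes = 4
indices_2d = np.array([[1, 2],
                       [0, 3]])

# dtype controls the element type of the identity matrix (float64 by default)
one_hot_int = np.eye(num_classes, dtype=np.int64)[indices_2d]

# One trailing axis of length num_classes is appended to the index shape
print(one_hot_int.shape)  # (2, 2, 4)
print(one_hot_int[0, 1])  # [0 0 1 0]

# argmax over the last axis inverts the encoding
recovered = one_hot_int.argmax(axis=-1)
print(np.array_equal(recovered, indices_2d))  # True
```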

Benefits of using np.eye

  • Simplicity: The code is concise and easy to understand.
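  • Efficiency: np.eye and the row indexing are both vectorized, so no Python loop is needed. Note that np.eye(num_classes) allocates a full num_classes × num_classes matrix, which becomes wasteful when the number of classes is large.
  • Flexibility: This approach works with an integer index array of any shape; the result has one extra trailing axis of length num_classes. The indices must have an integer dtype, since NumPy raises an IndexError when an array is indexed with floats.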
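np.eye(num_classes) builds the whole num_classes × num_classes identity matrix before any rows are selected. When the number of classes is large, one possible alternative is to allocate only the output array and set the ones directly. The helper below, one_hot_zeros, is a hypothetical name and not part of NumPy; it is a sketch that assumes non-negative integer labels smaller than num_classes.

```python
import numpy as np

def one_hot_zeros(indices, num_classes):
    """One-hot encode integer indices without building an identity matrix.

    Hypothetical helper for illustration; assumes 0 <= index < num_classes.
    """
    indices = np.asarray(indices)
    if not np.issubdtype(indices.dtype, np.integer):
        # Float indices are rejected rather than silently truncated
        raise TypeError("indices must have an integer dtype")

    flat = indices.ravel()
    out = np.zeros((flat.size, num_classes), dtype=np.float64)
    # For each row i, set column flat[i] to 1
    out[np.arange(flat.size), flat] = 1.0
    # Restore the original index shape, with the class axis last
    return out.reshape(indices.shape + (num_classes,))

indices = np.array([1, 2, 0, 1, 3])
result = one_hot_zeros(indices, 4)
print(np.array_equal(result, np.eye(4)[indices]))  # True
```

If the labels arrive as floats (for example after loading from a CSV file), casting them with indices.astype(np.int64) before encoding is a reasonable option, provided the values are known to be whole numbers.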

Conclusion

Using NumPy’s np.eye function provides a simple and efficient way to convert an array of indices to a one-hot encoded array. This technique is widely used in machine learning for representing categorical data and can be easily implemented in your NumPy code.
