OneHotEncoder categorical_features deprecated, how to transform specific column

By jacksparrow August 31, 2024

OneHotEncoder: categorical_features Deprecated

The categorical_features parameter of scikit-learn’s OneHotEncoder was deprecated in version 0.20 and removed in version 0.22. This article explains the change and shows how to encode specific columns without it.

Understanding the Change

In older versions of scikit-learn, OneHotEncoder accepted a categorical_features parameter listing the indices of the columns to encode. The parameter no longer exists, so passing it raises a TypeError. Column selection now happens outside the encoder.

New Approach: handle_unknown and drop

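The updated OneHotEncoder encodes every column it is given, regardless of type. To encode only specific columns, pass just those columns to the encoder (for example df[['City']]) or wrap the encoder in a ColumnTransformer. Two parameters then control how the selected columns are encoded: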

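handle_unknown: Controls how the encoder handles categories at transform time that were not seen during fit. Options include:

‘error’: Raises an error for unseen categories. (default)
‘ignore’: Encodes an unseen category as all zeros in that feature’s columns.
‘infrequent_if_exist’: Maps unseen categories to the infrequent-category column if one exists, otherwise behaves like ‘ignore’. (scikit-learn 1.1 and later)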

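drop: Determines whether one category per feature is dropped from the output, which avoids perfectly collinear columns. The default, None, keeps every category. Options include:

‘first’: Drops the first category of each feature.
‘if_binary’: Drops the first category only for features that have exactly two categories.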
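The effect of both parameters is easiest to see on a toy dataset. The following is a minimal sketch, not an official recipe: the Member column and the 'Tokyo' value are made up for illustration, and it assumes a scikit-learn version that supports drop='if_binary' (0.23 or later).

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: one 3-category column, one 2-category column
train = pd.DataFrame({
    "City": ["New York", "London", "Paris", "London"],
    "Member": ["yes", "no", "yes", "no"],
})

# drop: count how many output columns each setting produces
widths = {}
for drop in (None, "first", "if_binary"):
    enc = OneHotEncoder(drop=drop).fit(train)
    widths[str(drop)] = enc.transform(train).toarray().shape[1]
print(widths)

# handle_unknown: 'Tokyo' never appeared during fit
test = pd.DataFrame({"City": ["London", "Tokyo"]})

# 'ignore' encodes the unseen category as a row of zeros
lenient = OneHotEncoder(handle_unknown="ignore").fit(train[["City"]])
unseen_row = lenient.transform(test).toarray()[1]
print(unseen_row)

# 'error' (the default) refuses to transform unseen categories
strict = OneHotEncoder(handle_unknown="error").fit(train[["City"]])
try:
    strict.transform(test)
    raised = False
except ValueError:
    raised = True
print("strict encoder raised ValueError:", raised)
```

With drop=None the two columns expand to 3 + 2 outputs, with 'first' to 2 + 1, and with 'if_binary' to 3 + 1, since only the two-category column loses a category.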

Example: Encoding a Specific Column

Let’s illustrate the process with an example:

Column Name	Data Type
City	Categorical
Age	Numeric
Income	Numeric

We want to encode the ‘City’ column only.

Code


import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Sample data
data = {'City': ['New York', 'London', 'Paris', 'New York', 'London'],
        'Age': [25, 30, 28, 32, 27],
        'Income': [50000, 60000, 45000, 70000, 55000]}
df = pd.DataFrame(data)

# Create the encoder
# (combining handle_unknown='ignore' with drop requires scikit-learn 1.0 or later)
encoder = OneHotEncoder(handle_unknown='ignore', drop='first')

# Fit the encoder on the 'City' column
encoder.fit(df[['City']])

# Transform the 'City' column
encoded_city = encoder.transform(df[['City']]).toarray()

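# Create a new DataFrame with the encoded features.
# categories_[0] is sorted (London, New York, Paris) and drop='first'
# removes London, so the remaining column names are categories_[0][1:]
encoded_df = pd.DataFrame(encoded_city, columns=encoder.categories_[0][1:], index=df.index)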

# Concatenate the encoded features with the original DataFrame
df = pd.concat([df, encoded_df], axis=1)

# Print the result
print(df)

Output


       City  Age  Income  New York  Paris
0  New York   25   50000       1.0    0.0
1    London   30   60000       0.0    0.0
2     Paris   28   45000       0.0    1.0
3  New York   32   70000       1.0    0.0
4    London   27   55000       0.0    0.0

In this example, the ‘City’ column has been one-hot encoded. Because the encoder sorts categories alphabetically, drop='first' removes ‘London’, which becomes the reference category: a London row has 0 in both new columns. The encoded columns are appended to the original DataFrame.
