Saving a Scikit-learn Pipeline with Keras Regressor

Introduction

This article shows how to save a Scikit-learn pipeline that contains a Keras regressor to disk and load it back later. Saving the whole pipeline preserves the fitted preprocessing steps together with the trained model, so the same transformations are applied at prediction time when the pipeline is deployed or reused.

Steps

  1. Define the Pipeline:
    • Import the necessary libraries: scikit-learn, Keras (via TensorFlow), pickle, and pandas for the sample data.
    • Create a pipeline with data preprocessing steps (e.g., StandardScaler) followed by your Keras regressor.
  2. Train the Pipeline:
    • Fit the pipeline to your training data.
  3. Save the Pipeline:
    • Use the pickle library to serialize the trained pipeline to a file.
  4. Load the Pipeline:
    • Use the pickle library to deserialize the saved pipeline from the file.
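
Steps 3 and 4 can be sketched as a pair of small helpers. This is a minimal sketch rather than the article's own code: the names save_pipeline and load_pipeline are hypothetical, and a plain dictionary stands in for the trained pipeline so that the example runs without Keras installed.

```python
import os
import pickle
import tempfile


def save_pipeline(obj, path):
    """Serialize any picklable object to the file at path."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pipeline(path):
    """Deserialize an object previously written by save_pipeline."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a fitted pipeline (an assumption for illustration only).
stand_in = {"scaler_mean": [3.0, 6.0], "weights": [0.5, 1.25]}

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "keras_pipeline.pkl")
    save_pipeline(stand_in, path)
    restored = load_pipeline(path)

print(restored == stand_in)
```

Opening the file in a with-block closes the handle even if pickling raises, which a bare open() call passed straight to pickle.dump does not guarantee.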

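Code Example

    import pickle

    import pandas as pd
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    # Define a small sample dataset
    data = pd.DataFrame({
        'feature1': [1, 2, 3, 4, 5],
        'feature2': [2, 4, 6, 8, 10],
        'target': [3, 6, 9, 12, 15]
    })

    # Build and compile the Keras regressor: one hidden layer, one linear output
    def create_keras_regressor():
        model = Sequential()
        model.add(Dense(units=10, activation='relu', input_shape=(2,)))
        model.add(Dense(units=1, activation='linear'))
        model.compile(loss='mean_squared_error', optimizer='adam')
        return model

    # Define the pipeline: scale the features, then pass them to the Keras model.
    # The compiled model is used directly as the final step because it exposes
    # fit() and predict(); whether this works can depend on library versions.
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('keras_regressor', create_keras_regressor())
    ])

    # Train the pipeline. Keras runs a single epoch by default, so the epoch
    # count is forwarded to the model's fit() via the step-name prefix
    # (500 is an illustrative value, not a tuned one).
    X = data[['feature1', 'feature2']]
    y = data['target'].to_numpy()
    pipeline.fit(X, y, keras_regressor__epochs=500, keras_regressor__verbose=0)

    # Save the pipeline; the with-block closes the file even if pickling fails
    filename = 'keras_pipeline.pkl'
    with open(filename, 'wb') as f:
        pickle.dump(pipeline, f)

    # Load the pipeline
    with open(filename, 'rb') as f:
        loaded_pipeline = pickle.load(f)

    # Make predictions using the loaded pipeline
    predictions = loaded_pipeline.predict(X)
    print(predictions)

Code Output

The values below are illustrative; exact numbers vary with random weight initialization, training length, and library versions.

    [[ 2.997683 ]
     [ 5.997707 ]
     [ 9.000225 ]
     [12.000254 ]
     [15.000282 ]]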

Explanation

    • The code first defines a sample dataset and creates a Keras regressor.
    • A pipeline is created with a StandardScaler for preprocessing and the Keras regressor.
    • The pipeline is trained using the training data.
    • The trained pipeline is then saved to a file “keras_pipeline.pkl” using pickle.dump.
    • The saved pipeline is loaded back using pickle.load, allowing you to reuse the entire model structure.
    • Finally, predictions are made using the loaded pipeline, demonstrating that the model retains its functionality after being saved and loaded.
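
The last point can be checked directly by comparing predictions made before saving with predictions made after loading. The sketch below is an illustration under stated assumptions rather than the article's code: LinearRegression stands in for the Keras regressor so the example runs with scikit-learn alone, and pickle.dumps and pickle.loads keep the round trip in memory instead of writing a file.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same toy data as the article's example.
X = np.array([[1, 2], [2, 4], [3, 6], [4, 8], [5, 10]], dtype=float)
y = np.array([3, 6, 9, 12, 15], dtype=float)

# LinearRegression is a stand-in for the Keras regressor (assumption).
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", LinearRegression()),
])
pipeline.fit(X, y)
before = pipeline.predict(X)

# Round trip through pickle in memory.
loaded_pipeline = pickle.loads(pickle.dumps(pipeline))
after = loaded_pipeline.predict(X)

# The restored pipeline should reproduce the original predictions.
round_trip_ok = np.allclose(before, after)
print(round_trip_ok)
```

The same comparison applies to the Keras pipeline: call predict before pickle.dump and again on the loaded object, then compare the two arrays.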

Conclusion

By combining the Scikit-learn Pipeline with the pickle library, you can save and load a complete machine learning model that includes a Keras regressor. Because the fitted preprocessing steps travel with the model, the loaded pipeline applies the same transformations and weights as the original, which simplifies deployment.
