Plot PCA loadings and loading in biplot in sklearn (like R’s autoplot)

By jacksparrow August 31, 2024

Plotting PCA Loadings and Biplots in scikit-learn

Visualizing Principal Component Analysis (PCA) in scikit-learn: Loadings and Biplots

Principal Component Analysis (PCA) is a powerful dimensionality reduction technique widely used in data analysis. After performing PCA, visualizing the loadings and creating biplots can provide valuable insights into the relationships between variables and principal components.

Understanding PCA Loadings

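PCA loadings describe how much each original feature contributes to each principal component, that is, how strongly a variable “loads” onto it. In scikit-learn, the rows of `pca.components_` are the unit-length principal axes, and this article, like many tutorials, plots them as loadings; stricter definitions scale each axis by the square root of its explained variance. Loadings are often visualized as a bar chart or a heatmap to show which variables are most influential in defining each principal component.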
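To make the distinction between principal axes and scaled loadings concrete, here is a minimal sketch. The helper name `scaled_loadings` is an assumption introduced for illustration and is not part of scikit-learn; it only combines the documented attributes `components_` and `explained_variance_`. For standardized inputs, the scaled values come out close to the correlation between each feature and each component's scores.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def scaled_loadings(pca):
    """Hypothetical helper: eigenvectors scaled by sqrt(explained variance).

    Returns an array of shape (n_features, n_components).
    """
    return pca.components_.T * np.sqrt(pca.explained_variance_)


X_scaled = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=2).fit(X_scaled)

raw = pca.components_.T        # unit-length directions, one column per PC
scaled = scaled_loadings(pca)  # same directions, stretched by sqrt(variance)

print(np.linalg.norm(raw, axis=0))  # length of each raw column
print(scaled.round(3))
```

Either version can be plotted; the raw axes show direction only, while the scaled version also reflects how much variance each component carries.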

Creating a Loadings Plot in scikit-learn

Let’s illustrate how to plot loadings using scikit-learn. We’ll use the famous Iris dataset as an example:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import pandas as pd

# Load the Iris dataset
iris = load_iris()
X = iris.data
features = iris.feature_names

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA with 2 components
pca = PCA(n_components=2)
pca.fit(X_scaled)

# Get the loadings (one row per component, one column per feature)
loadings = pca.components_

# Create a DataFrame for easier visualization
loadings_df = pd.DataFrame(loadings, columns=features)
loadings_df.index = ['PC1', 'PC2']

# Plot the loadings; passing ax keeps pandas from opening a second figure
fig, ax = plt.subplots(figsize=(10, 6))
loadings_df.T.plot(kind='bar', rot=0, ax=ax)
plt.xlabel('Features')
plt.ylabel('Loadings')
plt.title('PCA Loadings')
plt.legend(title='Principal Components')
plt.show()

[Figure: bar chart titled "PCA Loadings", showing the PC1 and PC2 loadings for each feature]

In the above code, we first load and scale the Iris dataset. Then, we perform PCA with two components and obtain the loadings. Finally, we create a pandas DataFrame for the loadings and plot them using a bar chart. This visualization shows the contribution of each feature to PC1 and PC2.
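The same loadings can also be shown as a heatmap, the other visualization mentioned earlier. The following is a minimal sketch using Matplotlib's `imshow`; the function name `plot_loadings_heatmap` and the choice of the `coolwarm` colormap are assumptions made for illustration, not something prescribed by scikit-learn.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def plot_loadings_heatmap(loadings_df, ax=None):
    """Hypothetical helper: components-by-features heatmap of loadings."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 3))
    # Symmetric colour limits keep zero at the midpoint of the diverging map.
    limit = loadings_df.abs().to_numpy().max()
    image = ax.imshow(loadings_df.to_numpy(), cmap="coolwarm",
                      vmin=-limit, vmax=limit, aspect="auto")
    ax.set_xticks(range(loadings_df.shape[1]))
    ax.set_xticklabels(loadings_df.columns, rotation=30, ha="right")
    ax.set_yticks(range(loadings_df.shape[0]))
    ax.set_yticklabels(loadings_df.index)
    # Write the numeric loading into each cell.
    for row in range(loadings_df.shape[0]):
        for col in range(loadings_df.shape[1]):
            ax.text(col, row, f"{loadings_df.iloc[row, col]:.2f}",
                    ha="center", va="center")
    ax.figure.colorbar(image, ax=ax, label="Loading")
    ax.set_title("PCA Loadings")
    return ax


iris = load_iris()
X_scaled = StandardScaler().fit_transform(iris.data)
pca = PCA(n_components=2).fit(X_scaled)
loadings_df = pd.DataFrame(pca.components_, columns=iris.feature_names,
                           index=["PC1", "PC2"])
ax = plot_loadings_heatmap(loadings_df)
ax.figure.tight_layout()
```

Call `plt.show()` afterwards when running interactively. A heatmap tends to scale better than grouped bars once there are many features or more than two or three components.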

Biplots: Visualizing Data and Loadings Together

A biplot combines the scores of the principal components with the loadings. This allows you to see both the data points and the relationships between variables in the reduced space.

Creating a Biplot in scikit-learn

To create a biplot, we’ll need to calculate the principal component scores and then use Matplotlib’s `plt.scatter` and `plt.arrow` functions:

import matplotlib.pyplot as plt

# Get the principal component scores
scores = pca.transform(X_scaled)

# Create the biplot
plt.figure(figsize=(10, 6))
plt.scatter(scores[:, 0], scores[:, 1], c=iris.target, cmap='viridis', alpha=0.7)

# Plot the loadings: transpose so each row holds one feature's (PC1, PC2) pair
for i, (x, y) in enumerate(loadings.T):
    plt.arrow(0, 0, x, y, color='k', head_width=0.05, head_length=0.1, alpha=0.8)
    plt.text(x, y, features[i], fontsize=10)

plt.xlabel('PC1 ({}%)'.format(round(pca.explained_variance_ratio_[0] * 100, 2)))
plt.ylabel('PC2 ({}%)'.format(round(pca.explained_variance_ratio_[1] * 100, 2)))
plt.title('Biplot of Iris Dataset')
plt.grid(True)
plt.show()

[Figure: scatter plot titled "Biplot of Iris Dataset", with PC1 and PC2 scores colored by species and one loading arrow per feature]

In this code, we calculate the principal component scores, plot the data points as a scatter plot, and then draw one arrow per original feature. Each arrow's horizontal and vertical extents are that feature's loadings on PC1 and PC2, so a longer arrow means the feature weighs more heavily in this two-component plane, and its direction shows which component it aligns with.
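Because the principal axes have unit length while standardized scores often span several units, the arrows can look small next to the point cloud. Below is a hedged sketch of a reusable function that stretches the arrows for display only. The name `biplot`, its signature, and the 80% scaling rule are assumptions chosen for illustration; this is not how R's `autoplot` or any scikit-learn function is defined.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def biplot(scores, components, feature_names, labels=None, ax=None):
    """Hypothetical helper: scatter two score columns and overlay arrows.

    `components` uses scikit-learn's layout: (n_components, n_features).
    Returns the axes and the display scale applied to the arrows.
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 6))
    cmap = "viridis" if labels is not None else None
    ax.scatter(scores[:, 0], scores[:, 1], c=labels, cmap=cmap, alpha=0.7)

    # Assumption: stretch arrows so the longest reaches about 80% of the
    # largest absolute score. This changes the drawing, not the loadings.
    vectors = components[:2].T  # shape (n_features, 2)
    longest = np.linalg.norm(vectors, axis=1).max()
    arrow_scale = 0.8 * np.abs(scores[:, :2]).max() / longest

    for (x, y), name in zip(vectors * arrow_scale, feature_names):
        ax.arrow(0, 0, x, y, color="k", head_width=0.08,
                 length_includes_head=True, alpha=0.8)
        ax.text(x * 1.05, y * 1.05, name, fontsize=10)

    ax.axhline(0, color="grey", linewidth=0.5)
    ax.axvline(0, color="grey", linewidth=0.5)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    return ax, arrow_scale


iris = load_iris()
X_scaled = StandardScaler().fit_transform(iris.data)
pca = PCA(n_components=2).fit(X_scaled)
scores = pca.transform(X_scaled)

ax, arrow_scale = biplot(scores, pca.components_, iris.feature_names,
                         labels=iris.target)
ratio = pca.explained_variance_ratio_
ax.set_xlabel(f"PC1 ({ratio[0]:.1%})")
ax.set_ylabel(f"PC2 ({ratio[1]:.1%})")
ax.set_title("Biplot of Iris Dataset")
```

Call `plt.show()` to display the figure. Since the arrows are rescaled, only their directions and relative lengths should be interpreted; the returned `arrow_scale` records the factor that was applied.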

Benefits of Visualization

Plotting loadings and creating biplots offers several advantages:

Data Interpretation: Loadings help understand which variables are most responsible for each principal component, providing insights into the underlying structure of the data.
Variable Relationships: Biplots reveal the relationships between variables and how they contribute to the principal components. Arrows that point in similar directions suggest variables that tend to vary together, which can help identify potential correlations and dependencies.
Data Exploration: Visualizing the data and loadings together facilitates a comprehensive analysis and exploration of the reduced dimensionality space.
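How much the reduced space can tell you depends on how much variance the plotted components retain. As a minimal sketch, the following fits PCA with all components on the standardized Iris data and prints the per-component and cumulative share of variance, using scikit-learn's `explained_variance_ratio_` attribute. The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)

# Fit all components so the full variance breakdown is available.
pca_full = PCA().fit(X_scaled)
ratios = pca_full.explained_variance_ratio_
cumulative = np.cumsum(ratios)

for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: {r:.1%} of variance, {c:.1%} cumulative")
```

If the first two components account for most of the variance, a two-dimensional biplot is a reasonable summary; if not, relationships seen in the plot may be distorted by the discarded components.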

Conclusion

Visualizing PCA loadings and creating biplots in scikit-learn allows you to gain deeper insights into the relationships between variables and principal components. By plotting these components, you can enhance your understanding of the underlying data structure and make informed decisions based on your findings.
