How to find cluster centroid with Scikit-learn

By jacksparrow September 9, 2024

Finding Cluster Centroids with Scikit-learn

Cluster analysis is a fundamental task in machine learning, and finding the centroids of clusters is often a key step. Scikit-learn provides powerful tools for clustering, and in this article, we’ll explore how to obtain the cluster centroids using different methods.

K-means Clustering

K-means is a popular clustering algorithm that partitions data points into ‘k’ clusters by iteratively assigning points to the nearest centroid and updating the centroids based on the assigned points.
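The assign-and-update loop described above can be sketched in plain NumPy. This is a minimal illustration, not Scikit-learn's implementation: the function name `lloyd_kmeans`, its parameters, and the random initialization from data points are assumptions made for the example.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Toy K-means (Lloyd's algorithm); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # Centroids stopped moving, so the assignment is stable.
        centroids = new_centroids
    return centroids, labels


X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])
centroids, labels = lloyd_kmeans(X, k=2)
print(centroids)
```

On this small dataset the loop settles on one centroid near (1.17, 1.47) and one near (7.33, 9.0); which row comes first depends on the random starting points.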

1. Implementing K-means

    from sklearn.cluster import KMeans
    import numpy as np

    # Sample data
    X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

    # Initialize KMeans with 2 clusters
    kmeans = KMeans(n_clusters=2, random_state=0)

    # Fit the model to the data
    kmeans.fit(X)

    # Get the cluster centroids
    centroids = kmeans.cluster_centers_

    # Print the centroids
    print(centroids)

2. Output

    [[1.16666667 1.46666667]
     [7.33333333 9.        ]]

The two rows may appear in either order, because cluster indices are arbitrary.

The code above performs K-means clustering with 2 clusters and reads the centroids from the fitted model's cluster_centers_ attribute, an array of shape (n_clusters, n_features). Each row is the mean of the data points assigned to that cluster.
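To connect the centroids to the points they summarize, the sketch below compares each row of cluster_centers_ with the mean of the samples that labels_ assigns to it, then uses predict() on new data. The query points in `new_points` are made up for illustration, and `n_init=10` is set explicitly only to keep behavior the same across Scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# labels_[i] is the index of the centroid that sample i was assigned to.
for label, center in enumerate(kmeans.cluster_centers_):
    members = X[kmeans.labels_ == label]
    # On this data the fit converges, so centroid and member mean should agree.
    matches = np.allclose(center, members.mean(axis=0))
    print(label, center, len(members), matches)

# predict() assigns new points to the nearest learned centroid.
new_points = np.array([[0, 0], [10, 10]])  # hypothetical query points
predicted = kmeans.predict(new_points)
print(predicted)
```

If a fit stops early (for example because max_iter is reached), the centroids and labels are not guaranteed to agree exactly, so the comparison is best treated as a sanity check.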

Other Clustering Methods

Scikit-learn also provides other clustering algorithms, such as DBSCAN and Agglomerative Clustering. Unlike KMeans, these estimators do not expose a cluster_centers_ attribute, so the centroids have to be computed from the cluster labels, for example as the mean of the points that share a label.
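One label-based approach that works with any estimator exposing labels_ can be sketched as a small helper. The function name `centroids_from_labels` and its `noise_label` parameter are hypothetical, and the labels below are written by hand purely to show the behavior.

```python
import numpy as np


def centroids_from_labels(X, labels, noise_label=-1):
    """Return {label: mean of the points with that label}, skipping noise."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return {
        int(label): X[labels == label].mean(axis=0)
        for label in np.unique(labels)
        if label != noise_label
    }


X = np.array([[0, 0], [2, 0], [10, 10], [12, 10], [50, 50]])
labels = np.array([0, 0, 1, 1, -1])  # hand-written labels for illustration
centroids = centroids_from_labels(X, labels)
print(centroids)
```

Here cluster 0 averages to (1, 0) and cluster 1 to (11, 10); the point labeled -1 is ignored, which matches how DBSCAN marks noise.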

1. DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) uses density-based connectivity to identify clusters.

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_blobs

    # Generate sample data
    X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

    # Initialize DBSCAN
    dbscan = DBSCAN(eps=0.5, min_samples=5)

    # Fit the model
    dbscan.fit(X)

    # Get the cluster labels (-1 marks noise points)
    labels = dbscan.labels_

    # Find unique labels (including noise)
    unique_labels = np.unique(labels)

    # Calculate the centroid of each cluster as the mean of its points
    centroids = []
    for label in unique_labels:
        if label != -1:  # Ignore noise points
            cluster_points = X[labels == label]
            centroid = np.mean(cluster_points, axis=0)
            centroids.append(centroid)

    print(centroids)
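Because DBSCAN decides the number of clusters itself, it can help to check how many clusters and noise points a given eps produces before trusting the centroids. The sketch below compares two eps values on the same data; the values are illustrative rather than recommended settings, and the `summary` dictionary is just a convenience for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

summary = {}
for eps in (0.5, 1.0):  # illustrative values, not recommendations
    labels = DBSCAN(eps=eps, min_samples=5).fit(X).labels_
    n_clusters = len(set(labels) - {-1})  # -1 is the noise label
    n_noise = int(np.sum(labels == -1))
    summary[eps] = (n_clusters, n_noise)
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```

With min_samples fixed, a larger eps can only keep or reduce the number of noise points, so the centroid list may change substantially as eps is tuned.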

2. Agglomerative Clustering

Agglomerative Clustering is a hierarchical clustering approach. It builds a hierarchy of clusters by merging smaller clusters into larger ones.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.datasets import make_blobs

    # Generate sample data
    X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

    # Initialize Agglomerative Clustering
    agg_clustering = AgglomerativeClustering(n_clusters=3)

    # Fit the model
    agg_clustering.fit(X)

    # Get the cluster labels
    labels = agg_clustering.labels_

    # Calculate the centroid of each cluster as the mean of its points
    centroids = []
    for label in range(3):
        cluster_points = X[labels == label]
        centroid = np.mean(cluster_points, axis=0)
        centroids.append(centroid)

    print(centroids)
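As an alternative to the manual loop, Scikit-learn's NearestCentroid classifier can be fitted on the cluster labels, and its centroids_ attribute then holds one mean vector per label. This is a sketch of one possible approach rather than a built-in feature of AgglomerativeClustering; the cross-check against the manual means is included only to show that the two agree.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestCentroid

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Fit a nearest-centroid classifier on the cluster labels; centroids_
# holds one mean vector per label, ordered as in classes_.
nc = NearestCentroid().fit(X, labels)
centroids = nc.centroids_

# Cross-check against the manual per-label mean.
manual = np.array([X[labels == label].mean(axis=0) for label in nc.classes_])
print(np.allclose(centroids, manual))
```

A side benefit is that the fitted NearestCentroid object has a predict() method, which AgglomerativeClustering lacks, so new points can be assigned to the nearest centroid.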

Summary

This article showed how to retrieve cluster centroids with Scikit-learn: read cluster_centers_ from a fitted KMeans model, or, for algorithms such as DBSCAN and Agglomerative Clustering that only return labels, average the points in each cluster. Remember to choose the appropriate clustering algorithm based on the nature of your dataset and the specific task at hand.
