
K-means clustering for breast-cancer-wisconsin.data from scratch


tarunkolla/K-Means


K-Means

Data Set:

breast-cancer-wisconsin.data (a copy is also available on GitHub here)

The data set contains 11 comma-separated columns. The first column is the example ID and is ignored. The second to tenth columns are the 9 features on which the K-means algorithm operates. The last column is the class label and is also ignored.
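The column layout above can be read with a short parser. This is a minimal sketch, not the repository's own loader: the function name `load_features` is hypothetical, and skipping rows that contain `?` is an assumption. The UCI copy of this file marks missing values with `?`, but this README does not say how they are handled.

```python
import csv
from typing import Iterable, List


def load_features(lines: Iterable[str]) -> List[List[float]]:
    """Parse breast-cancer-wisconsin.data rows into 9-dimensional feature vectors.

    Column 0 (example ID) and column 10 (class label) are ignored.
    Assumption: rows containing the missing-value marker '?' are skipped.
    """
    features = []
    for row in csv.reader(lines):
        if len(row) != 11:
            continue  # skip blank or malformed lines
        values = row[1:10]  # the 9 feature columns
        if "?" in values:
            continue  # assumed handling of missing values
        features.append([float(v) for v in values])
    return features
```

Typical use would be `with open("breast-cancer-wisconsin.data") as f: X = load_features(f)`, since a file object is an iterable of lines that `csv.reader` accepts.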

Implementation:

The K-means algorithm clusters the above data set for K = 2, 3, 4, 5, 6, 7, 8. For each value of K, the algorithm is run first, and the potential function L(K) is then computed as follows:

Potential function:

$$L(K) = \sum_{j=1}^{m} \lVert x_j - \mu_{C(j)} \rVert^2$$

where $m$ is the number of examples, $x_j$ denotes the feature vector of the $j$-th example, and $\mu_{C(j)}$ is the centroid of the cluster to which $x_j$ belongs.
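The potential function and the sweep over K can be sketched in Python as below. This is a hedged illustration, not the author's code: `potential`, `potential_curve`, and the `cluster_fn` callback are hypothetical names. `cluster_fn` stands for any K-means routine that returns the centroids plus, for each example, the index of its cluster (the role played by C(j) in the formula).

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Point = Sequence[float]
Clustering = Tuple[List[List[float]], List[int]]  # (centroids, assignment)


def potential(points: List[Point], centroids: List[List[float]],
              assignment: List[int]) -> float:
    """L(K): sum of squared Euclidean distances from each x_j to mu_C(j)."""
    return sum(
        sum((xi - mi) ** 2 for xi, mi in zip(x, centroids[c]))
        for x, c in zip(points, assignment)
    )


def potential_curve(points: List[Point],
                    cluster_fn: Callable[[List[Point], int], Clustering],
                    ks: Iterable[int] = range(2, 9)) -> Dict[int, float]:
    """Run the (hypothetical) clustering routine for each K and record L(K)."""
    curve = {}
    for k in ks:
        centroids, assignment = cluster_fn(points, k)
        curve[k] = potential(points, centroids, assignment)
    return curve
```

The default `ks` covers K = 2 through 8, matching the range described above; the resulting dictionary holds the (K, L(K)) pairs to be plotted.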

If a cluster becomes empty during an iteration, it is dropped, and the largest cluster is split randomly into two clusters so that the total number of clusters stays at K.
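One way to apply this empty-cluster rule inside a standard Lloyd-style K-means loop is sketched below. Several details are assumptions not stated in the README: the initial centroids are K examples sampled at random without replacement, a "random split" means shuffling the largest cluster's members and cutting them into two halves, and iteration stops when no centroid moves or after `max_iter` rounds. The names `kmeans`, `max_iter`, and `seed` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def _nearest(p: Point, centroids: List[List[float]]) -> int:
    # Ties go to the lowest centroid index.
    return min(range(len(centroids)), key=lambda c: _sq_dist(p, centroids[c]))


def _mean(cluster: List[Point]) -> List[float]:
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Assumed initialization: k examples sampled without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: group every point under its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            clusters[_nearest(p, centroids)].append(p)
        # Drop clusters that received no points in this iteration.
        clusters = [c for c in clusters if c]
        # Randomly split the largest cluster until there are k clusters again.
        # Since len(points) >= k, the largest cluster always has >= 2 members here.
        while len(clusters) < k:
            largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
            members = clusters.pop(largest)
            rng.shuffle(members)
            half = len(members) // 2
            clusters += [members[:half], members[half:]]
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [_mean(c) for c in clusters]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    assignment = [_nearest(p, centroids) for p in points]
    return centroids, assignment
```

With feature vectors `X` already loaded, `centroids, assignment = kmeans(X, 3)` would return three centroids and a per-example cluster index, which is the input the potential function needs. Splitting by random halves is only one reading of "randomly split"; another would reseed two centroids from random members of the largest cluster.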

Results:

The graph below plots L(K) against K.

[Figure: L(K) plotted against K for K = 2 to 8]
