K-means clustering cyber crime use case !

darshil shah
2 min readSep 8, 2021

the history

the term “k-means” was first used by james macqueen in 1967 as part of his paper on “some methods for classification and analysis of multivariate observations”. the standard algorithm was also used in bell labs as part of a technique in pulse code modulation in 1957. it was also published by in 1965 by e. w. forgy and typically is also known as the lloyd-forgy method.

what is k-means?

clustering is the task of dividing the population or data points into a number of groups such that data points in the same groups are more similar to other data points in the same group than those in other groups. in simple words, the aim is to segregate groups with similar traits and assign them into clusters. the goal of the k-means algorithm is to find groups in the data, with the number of groups represented by the variable k. the algorithm works iteratively to assign each data point to one of k groups based on the features that are provided. in the reference image below, k=2, and there are two clusters identified from the source dataset.

cyber-profiling criminals

cyber-profiling is the process of collecting data from individuals and groups to identify significant co-relations. the idea of cyber profiling is derived from criminal profiles, which provide information on the investigation division to classify the types of criminals who were at the crime scene. here is an interesting white paper on how to cyber-profile users in an academic environment based on user data preferences.

--

--