What is Clustering anyway?

Clustering is a popular unsupervised machine-learning technique that groups similar data points together, and it can greatly improve the quality and speed of analysis. The following account traces a conversation between two friends who start out with opposing views on it, and how they end up on the same page.

“Aarghh… what kind of IDE is that?” shrieked Anita. “C’mon, you need to check this out! It lets you run your code a few lines at a time, on the go! You can even generate and regenerate graphs and other outputs as many times as you want!” blurted Ramesh, with the fervor and excitement of a nine-year-old.

To make sense of what Ramesh was trying to convince Anita of, let us look at the events leading up to this exchange. Anita is a graduate of a renowned business school. She leads a reputed FMCG firm called Hexene as an executive officer. She is witty, with sharp critical-thinking and problem-solving skills. However, she isn’t tech-savvy, and the bare mention of the word “code” gets on her nerves. Her firm does not hire “data analysts” for some reason.

Anyway, a lot of her time is spent sitting through presentations and reports prepared by a team of business analysts who report directly to her. They prepare these after extracting meaning from the sales and customer data that the company receives on a fortnightly basis.

There are revenue targets and margins to be met; customer feedback has to be monitored, and the firm must ensure that it does not lose out to its competitor Indiana, a firm that relies heavily on technical finesse to increase its market share. In fact, the word in corporate circles is that Indiana’s owner firmly believes in the power of analytics, and so the company has actively invested in the services of data analysts and incorporated their suggestions into its customer acquisition and sales growth strategies.

Ramesh, Anita’s friend from her undergrad days, is a programming enthusiast who went on to build a career as a professional developer and aims to launch his own products in the near future. He is quite open to taking risks and embracing the changes that have become commonplace in his domain over the past few decades.

He is inspired by Anita’s hard work and her commitment to her passion. At the same time, he is an ardent believer in the adage, “You do not need to be a programmer in order to program”, and desperately longs to change her views and rid her of her inhibitions about the power that code possesses. He wondered how the idea could be driven home.

For the past few weeks, Anita has been fretting over what she calls the missing piece of a puzzling question. After running a detailed competitor analysis, she has understood that Hexene and its chief competitor, Indiana, make similar products and target similar demographics. However, Hexene is not able to adapt to the changing circumstances and demands of the market as quickly as Indiana does, and the reason remains unclear. She decided to take up the problem with Ramesh at their next weekly meeting.

Halfway through Anita’s description of the situation, Ramesh exclaimed, “Would you take a look at my laptop for a moment? You may be surprised to hear it, but I believe I’ve understood the problem. You mustn’t be narrow-minded about it, though.” While she was not exactly appalled, her spirits were dampened at the mention of his PC, which was, according to her, a “repository of technical jargon and tainted English sentences”. Nevertheless, she decided to keep an open mind and hear him out.

Ramesh launched Jupyter Notebook and opened a blank .ipynb file. No sooner had he begun typing code than Anita recoiled, and the uneasy exchange described at the beginning took place.

Ramesh asked Anita to wait a bit longer. He quickly typed out some lines and created a chart resembling the one shown below:

[Figure: chart of customer clusters]

She could not believe her eyes. The grouping, which Ramesh later referred to as clustering, mirrored the way her team understood and worked out the customer segmentation problem on the basis of customers’ satisfaction with the products Hexene made. She recognized the patterns, which enabled her to put names to these clusters (collections of data points).

Roamers (the ones who did not buy a lot and were not very satisfied), Loyalists (who bought a lot despite not being highly satisfied), and Fans (who bought the most and were the most satisfied) were visible to her in a single chart. Doing something like this wasn’t impossible for her team, but it was unimaginable for her that it could be done with so little time and effort.
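The story does not say which algorithm or data Ramesh used, so the following is only a minimal sketch of how such a grouping might be produced. It assumes k-means (a common choice for this kind of segmentation) with a simple farthest-point initialization, and it runs on synthetic customers invented purely for illustration: each customer is a hypothetical pair of (purchase index, satisfaction score), both on a 0 to 10 scale. The function names and the rule that names clusters by purchase level are likewise assumptions, not anything taken from Hexene's data.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def farthest_point_init(points, k):
    """Pick k spread-out starting centroids deterministically."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest chosen centroid.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Plain k-means; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels

# SYNTHETIC data (assumption): 30 made-up customers around each of three
# invented centers, given as (purchase index, satisfaction score).
rng = random.Random(42)
assumed_centers = [(2.0, 3.0), (7.0, 4.0), (9.0, 9.0)]
customers = [
    (cx + rng.uniform(-0.7, 0.7), cy + rng.uniform(-0.7, 0.7))
    for cx, cy in assumed_centers
    for _ in range(30)
]

centroids, labels = kmeans(customers, k=3)

# Naming rule (assumption): order clusters by average purchase index.
order = sorted(range(3), key=lambda c: centroids[c][0])
names = dict(zip(order, ["Roamers", "Loyalists", "Fans"]))
counts = Counter(names[lab] for lab in labels)

for c in order:
    purchases, satisfaction = centroids[c]
    print(f"{names[c]:<10} size={counts[names[c]]:>2}  "
          f"purchases={purchases:.1f}  satisfaction={satisfaction:.1f}")
```

The algorithm only finds the groups; labels such as Roamers, Loyalists, and Fans still come from someone like Anita who knows the business. In a notebook, one would more likely reach for a library implementation such as scikit-learn's `KMeans` and plot the result, but the dependency-free version above keeps the two steps (assign, then update) visible.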

By now, she could no longer hear Ramesh. She began wondering whether there could be a similarly elegant way to work out the other calculations her team had to do. “Python is going to free me from your death grip, Indiana,” she thought to herself.

Companies and multinational corporations adapt to the changing trends of the market in order to stay relevant. Relying on the power of data can often seal the deal, and this is what Anita ended up learning too. Clustering in Python with the help of Jupyter Notebook was an unheard-of proposition a decade ago. Yet here we are today!


Written by: Ayush Pareek

I am a programmer who loves all things code. I have been writing about data science and allied disciplines like machine learning and artificial intelligence since June 2021. You can check out my articles at pickl.ai/blog/author/ayushpareek/. I have been doing my undergrad in engineering at Jadavpur University since 2019. When not debugging issues, I can be found reading articles online about history, languages, and economics, among other topics. I can be reached on LinkedIn and via my email.