Embarking on a new journey is always both scary and fun, but you learn a lot of new things and gain a ton of experience along the way. In this blog post, I will summarize my journey working with TransOrg Analytics in the vast domain of machine learning. I joined them in March 2022 and got the amazing opportunity to learn and get my hands dirty with different models and projects, especially on the Google Cloud Platform.
How it started
During the initial period, we went over many core machine learning concepts and approaches. These built a deeper understanding of how the domain works and, at the same time, helped me develop a data mindset and the ability to view and analyze datasets. Later on, we were introduced to the Google Cloud Platform, often referred to as GCP, and had a hands-on session on how the platform works and the various capabilities it offers. To get familiar with the console, we worked on a simple Natural Language Processing chatbot based on AutoML.
This helped me understand how the machine views and processes queries and performs sentiment analysis to give the right response to the user and point them in the right direction. Playing around on GCP showed me how this process can be streamlined and made more accurate and robust with different possible changes.
Getting onto the personal project
After gaining a good understanding of everything required, it was time to build a personal project. For this project, I wanted to build something that was relevant to a vast number of users and, at the same time, easy to access and use. I started looking at healthcare as a possible domain of application, as it is relevant to a majority of users. I came across a dataset that contained the health insurance cost of an individual along with various factors such as their age, sex, BMI, etc. I downloaded this dataset, explored it in Python using various tools, and visualized it to get a deeper understanding of the data. I found some inconsistencies in the data but was quickly able to fix them, and I transformed the relevant rows to work better with the linear regression algorithm.
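To give a feel for what this kind of clean-up can look like, here is a minimal sketch using only the Python standard library. The column names (`age`, `sex`, `bmi`, `charges`), the sample values, and the `clean_rows` helper are all made up for illustration; they are not the actual dataset or the exact steps I ran.

```python
import csv
import io

# Made-up sample in roughly the shape described above. Column names and
# values are assumptions for illustration, not the real dataset.
RAW_CSV = """age,sex,bmi,charges
25,female,22.5,3000.0
40,male,30.0,7500.0
31,Male,,4100.0
52,male,27.0,9800.0
"""


def clean_rows(text):
    """Drop incomplete rows and convert every field to a numeric value."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Skip rows where any field is empty (one kind of inconsistency).
        if any(value.strip() == "" for value in row.values()):
            continue
        cleaned.append({
            "age": int(row["age"]),
            # Normalise capitalisation, then encode the category as 0/1
            # so that a linear regression can use it as a number.
            "sex": 1 if row["sex"].strip().lower() == "male" else 0,
            "bmi": float(row["bmi"]),
            "charges": float(row["charges"]),
        })
    return cleaned


rows = clean_rows(RAW_CSV)
print(rows)
```

The row with the missing BMI is dropped, and the text column becomes a 0/1 number, which is the form a linear regression needs.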
The following libraries were used for data correction, exploration, visualisation and model fitting:
Cleaned dataset’s insights
After training and some minor tweaks, the model performed consistently and gave accurate results. The training data yielded an R-squared value of 0.751505643411174, and I was able to achieve a very close R-squared value of 0.7447273869684077 on the test data, which suggests the model was not overfitting to the training data.
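The train-versus-test comparison can be sketched as follows. This is a simplified illustration, not the project code: it uses synthetic data, a single feature (age), and hand-written helpers (`fit_line`, `r_squared`) in place of the libraries and the multi-feature dataset used in the project.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predictions):
    """R-squared = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data (an assumption for illustration): cost grows with age,
# plus random noise.
rng = random.Random(0)
ages = [rng.uniform(18, 64) for _ in range(500)]
costs = [250 * age + rng.gauss(0, 2000) for age in ages]

# Hold out the last 20% of the samples as test data.
split = int(0.8 * len(ages))
train_x, test_x = ages[:split], ages[split:]
train_y, test_y = costs[:split], costs[split:]

# Fit on the training data only, then score both splits.
slope, intercept = fit_line(train_x, train_y)
train_r2 = r_squared(train_y, [slope * x + intercept for x in train_x])
test_r2 = r_squared(test_y, [slope * x + intercept for x in test_x])
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

If the test score were much lower than the training score, that would be a sign that the model had memorised the training data instead of learning a general pattern.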
It was now time to move the entire thing to Google Cloud Platform so that it could be accessed remotely and run on the cloud itself. The treated dataset was used for this purpose. Vertex AI was used to train a brand-new model that performed similarly to the local model, predicting the health insurance cost a person should expect to pay based on various parameters. With the help of GCP's built-in tools, the model was further optimised to reach an R-squared score of 0.948 for the cloud model, giving very accurate results to users when deployed.
Now, to make it accessible to everyone, it needed to be deployed on the internet. I chose to create a web application for this purpose using bubble.io. The application takes the user's information as input and sends it through an API to the linear regression model, which predicts the cost and sends it back to the web application to be displayed to the user.
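The round trip between the web application and the model can be sketched roughly like this. It is only an illustration of the flow, with no network calls: the `instances` and `predictions` keys follow the JSON shape used by Vertex AI online prediction, while the field names, the helper functions, and the `stub_model` with its made-up coefficients are hypothetical stand-ins for the deployed model.

```python
import json


def build_request(age, sex, bmi):
    """What the web app would send: user input wrapped in 'instances'."""
    return json.dumps({"instances": [{"age": age, "sex": sex, "bmi": bmi}]})


def handle_request(body, predict):
    """What the model endpoint would do: one prediction per instance."""
    instances = json.loads(body)["instances"]
    return json.dumps({"predictions": [predict(item) for item in instances]})


def parse_response(body):
    """What the web app would do with the reply: read the predicted cost."""
    return json.loads(body)["predictions"][0]


def stub_model(instance):
    # Hypothetical coefficients, NOT the trained model. This only stands
    # in for the deployed model so the flow can be run locally.
    return 100.0 * instance["age"] + 50.0 * instance["bmi"]


request_body = build_request(age=30, sex="female", bmi=25.0)
response_body = handle_request(request_body, stub_model)
cost = parse_response(response_body)
print(cost)
```

In the real application, the middle step happens on the cloud endpoint, and the web application only builds the request and displays the value it gets back.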