Radicl ML


Data has always played a central role in the insurance industry, and today insurance carriers have access to more of it than ever before. More data has been created in the past five years than in all of human history before that. Insurers, like organisations in most industries, are overwhelmed by the explosion in data from a host of sources, including telematics, online and social media activity, voice analytics, connected sensors and wearable devices. They need machines to process this information and unearth analytical insights, yet most insurers are struggling to maximise the benefits of machine learning. This situation is changing gradually but steadily, driven by an environment characterised by increased competition, elastic marketplaces, complex claims and fraud behaviour, higher customer expectations and tighter regulation. Insurers are being forced to explore ways to use predictive modelling and machine learning to maintain their competitive edge, boost business operations and enhance customer satisfaction.

Key Factors Driving This Change

Smart everything:
Enterprises are looking to use advanced machine learning to drive smart, automated applications in fields such as healthcare diagnosis, predictive maintenance, customer service, and automated data centres.

Open source everywhere:
As data becomes omnipresent, open source protocols will emerge to ensure data is shared and used across organisations. Public and private entities will come together to create ecosystems for sharing data on multiple use cases under a common regulatory and cybersecurity framework.

Harnessing Internet of things (IoT) data:
The volume and velocity of data from IoT will drive the need to automate the generation of actionable insight using advanced machine learning tools. According to Gartner, by 2025, 35 percent of enterprises will employ dedicated people to monitor and guide machine learning (such as neural networks). The notion of training rather than programming systems will become increasingly important.

Ability to talk back:
Natural-language processing algorithms are continuously advancing. AI is becoming proficient at understanding spoken language and at facial recognition, helping to make it more useful and intuitive. These algorithms are evolving in unexpected ways, as Google found when Google Translate invented its own language to help it translate more effectively.


The myriad opportunities in the insurance sector, especially in health insurance, motivated me to undertake this capstone project. In this ever-advancing world, health has taken a back seat, but the pandemic has made us realise its importance. Access to healthcare is still a challenge, primarily because of the high costs associated with it. Insurance is the single most important factor that improves access to healthcare for individuals.

My goal is to predict the insurance premium charged to a policyholder from their characteristics, which include:

  • Age
  • BMI
  • Biological Sex
  • Smoker
  • Region
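Before a regression model can use these characteristics, the categorical ones have to be turned into numbers. The sketch below shows one common way to do that. The region labels, the "male"/"female" coding and the function name `encode_policyholder` are assumptions made for illustration; the project's actual encoding is not described here.

```python
# Hypothetical region labels; "Region" is listed as a feature, but its
# actual values are not given, so these are placeholders.
REGIONS = ["north", "south", "east", "west"]

def encode_policyholder(age, bmi, sex, smoker, region):
    """Turn one policyholder's characteristics into a numeric feature vector."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region!r}")
    features = [
        float(age),
        float(bmi),
        1.0 if sex == "male" else 0.0,  # binary indicator (assumed coding)
        1.0 if smoker else 0.0,         # binary indicator
    ]
    # One-hot encode region, dropping the first level so the columns are
    # not redundant with the model's intercept.
    features += [1.0 if region == r else 0.0 for r in REGIONS[1:]]
    return features

example = encode_policyholder(age=40, bmi=27.5, sex="female", smoker=True, region="east")
print(example)
```

Dropping one region level is a design choice that suits linear regression; tree-based models would work equally well with all levels kept.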

Exploratory data analysis yielded excellent insights into the data.

The data had no NULL values, which is an important factor for data quality. None of the independent features had outliers, and data wrangling was completed successfully.
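Checks like these can be scripted in a few lines. The following is a minimal sketch, assuming the records are held as a list of dictionaries and that outliers are judged by the 1.5 × IQR rule; the write-up does not say which outlier rule was used, and the sample rows are made up.

```python
import statistics

# Made-up sample rows for illustration only; not the project's data.
rows = [
    {"age": 19, "bmi": 22.0},
    {"age": 27, "bmi": 24.5},
    {"age": 34, "bmi": 26.0},
    {"age": 41, "bmi": 27.5},
    {"age": 48, "bmi": 29.0},
    {"age": 55, "bmi": 31.0},
    {"age": 62, "bmi": 33.0},
]

def count_missing(rows, column):
    """Count rows where the column is absent or None."""
    return sum(1 for row in rows if row.get(column) is None)

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

for column in ("age", "bmi"):
    values = [row[column] for row in rows]
    print(column, "missing:", count_missing(rows, column),
          "outliers:", iqr_outliers(values))
```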


I applied a regression algorithm to predict insurance charges for the policyholders. The fitted model achieved an R-squared value of 0.7488, meaning it explains roughly 75 percent of the variance in charges.
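For readers unfamiliar with the metric, the sketch below shows how R-squared is computed for a least-squares fit. It uses a single feature and made-up numbers so that it stays short, so it will not reproduce the value reported above; the function names are illustrative, not the project's code.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single feature."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up ages and charges, for illustration only.
ages = [20, 30, 40, 50, 60]
charges = [2000, 3500, 4200, 6100, 6900]

intercept, slope = fit_simple_ols(ages, charges)
predictions = [intercept + slope * a for a in ages]
r2 = r_squared(charges, predictions)
print(f"slope={slope:.1f} intercept={intercept:.1f} R^2={r2:.4f}")
```

An R-squared of 1 means the predictions match the observed charges exactly, while 0 means the model does no better than always predicting the mean charge.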

Scope for Improvement

One could use other machine learning algorithms to improve accuracy and fine-tune the model. A linear regression model’s performance could be improved by:

  • Adding interaction terms to model how two or more independent variables together impact the target variable
  • Adding polynomial terms to model the nonlinear relationship between an independent variable and the target variable
  • Adding splines to approximate piecewise linear relationships
  • Fitting isotonic regression, which assumes only that the target is monotonic in the feature rather than a specific functional form
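The first three ideas amount to adding derived columns before fitting the same linear model. The sketch below illustrates one of each. The particular choices (squaring age, multiplying BMI by the smoker flag, and a BMI knot at 30) are assumptions picked for illustration, not results from this project.

```python
def expand_features(age, bmi, smoker, bmi_knot=30.0):
    """Add polynomial, interaction and linear-spline (hinge) terms.

    bmi_knot is a hypothetical knot location chosen for illustration.
    """
    age = float(age)
    bmi = float(bmi)
    smoker_flag = 1.0 if smoker else 0.0
    return {
        "age": age,
        "bmi": bmi,
        "smoker": smoker_flag,
        "age_squared": age ** 2,                # polynomial term
        "bmi_x_smoker": bmi * smoker_flag,      # interaction term
        "bmi_hinge": max(0.0, bmi - bmi_knot),  # piecewise linear term
    }

print(expand_features(age=30, bmi=34.0, smoker=True))
```

The hinge term is zero below the knot and grows linearly above it, which lets the fitted slope for BMI change at that point while the model stays linear in its coefficients.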

One could also use non-linear ML algorithms like regression trees to improve performance.
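To give a feel for how a regression tree works, here is a minimal sketch of its basic building block: a single split (a "stump") on one feature, chosen to minimise squared error. A full tree repeats this step recursively on each side of the split. The data and function names are made up for illustration.

```python
def best_split(x, y):
    """Return (threshold, left_mean, right_mean) minimising squared error."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yv for _, yv in pairs[:i]]
        right = [yv for _, yv in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]

def predict(value, threshold, left_mean, right_mean):
    """Predict the mean of whichever side of the split the value falls on."""
    return left_mean if value <= threshold else right_mean

# Made-up BMI values and charges, for illustration only.
bmi = [20, 22, 24, 31, 33, 35]
charges = [3000, 3200, 3100, 9000, 9500, 9200]

threshold, left_mean, right_mean = best_split(bmi, charges)
print(threshold, left_mean, round(right_mean, 2))
```

Because each prediction is the average of a region of the data, a tree can capture jumps and thresholds that a straight line cannot.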

Challenges insurers typically encounter when adopting machine learning

Training requirements:
AI-powered intelligent systems must be trained in a specific domain, e.g., claims or billing for an insurer. This requires a separate training environment, which insurers find hard to provide. Models need to be trained on huge volumes of documents and transactions to cover all possible scenarios.

Right data source:
In machine learning, the quality of the data used to train predictive models is as important as the quantity. The datasets need to be representative and balanced so that they give an accurate picture and avoid bias. Generally, insurers struggle to provide relevant data for training AI models.

Difficulty in predicting returns:
It is not easy to predict the improvements that machine learning can bring to a project. For example, such a project is hard to plan or budget, because funding needs may change during the project as findings emerge. It is therefore almost impossible to predict the return on investment, which makes it hard to get everyone on board with the concept and willing to invest in it.

Data security:
The huge amount of data used by machine learning algorithms has created an additional security risk for insurance companies. With more data being collected and more connectivity among applications, the risk of data leaks and security breaches grows. A security incident could lead to personal information falling into the wrong hands, which makes insurers wary.