AGEAS-Project

Insurance Claim Modeling with Random Forest

Frequency–Severity Approach

TL;DR

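Two-part insurance risk model: a frequency model (how often claims occur) and a severity model (how costly a claim is when it occurs).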

Combining both:

Expected Loss = Frequency × Severity

This project applies Random Forest models to capture nonlinear risk patterns and analyze insurance portfolio behavior.


Problem

Insurance risk is not a single prediction problem.

Instead, expected loss is modeled as:

Expected Loss = Frequency × Severity

Where frequency is how often claims occur and severity is the cost of a claim when one occurs.

This two-part structure reflects how insurance is priced in practice more closely than a single prediction target does.
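As a small worked example of the decomposition, the sketch below multiplies frequency by severity for two policies. The policy names and all numbers are hypothetical and chosen only for illustration; they are not taken from the project data.

```python
# Hypothetical policies; every number here is illustrative, not project data.
policies = {
    "frequent_small": {"frequency": 0.20, "severity": 1_000.0},
    "rare_large": {"frequency": 0.02, "severity": 10_000.0},
}

# Expected loss per policy-year = frequency x severity.
expected_loss = {
    name: p["frequency"] * p["severity"] for name, p in policies.items()
}

for name, loss in expected_loss.items():
    print(f"{name}: expected loss per policy-year = {loss:.2f}")
```

Both policies come out at about 200 per policy-year even though their risk profiles are very different, which is why the two components are modeled separately.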


Methodology

Frequency Model

Severity Model

Final Output
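The two models and the final output can be sketched roughly as follows, using scikit-learn. This is a minimal sketch under assumptions, not the project's actual pipeline: the data is synthetic, the hyperparameters are arbitrary, and frequency is treated here as the probability of at least one claim (a classifier), whereas the project may model claim counts instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data; the four features have no real-world meaning.
n = 2000
X = rng.normal(size=(n, 4))

# Claim occurrence depends nonlinearly on the first two features.
p_claim = 1.0 / (1.0 + np.exp(-(X[:, 0] * X[:, 1] - 1.0)))
has_claim = rng.random(n) < p_claim

# Claim amounts are positive only where a claim occurred.
raw_amount = rng.gamma(shape=2.0, scale=1000.0 * np.exp(0.3 * X[:, 2]))
amount = np.where(has_claim, raw_amount, 0.0)

# Frequency model: probability that a policy has a claim.
freq_model = RandomForestClassifier(
    n_estimators=50, min_samples_leaf=20, random_state=0
)
freq_model.fit(X, has_claim)

# Severity model: claim amount, trained only on policies with a claim.
sev_model = RandomForestRegressor(
    n_estimators=50, min_samples_leaf=20, random_state=0
)
sev_model.fit(X[has_claim], amount[has_claim])

# Final output: expected loss per policy (in-sample here, for brevity).
frequency = freq_model.predict_proba(X)[:, 1]
severity = sev_model.predict(X)
expected_loss = frequency * severity

print(f"Mean expected loss: {expected_loss.mean():.2f}")
```

In practice the predictions would be made on held-out data rather than on the training rows used here.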


Key Results

Business Impact: Total Expected Loss per Segment

[Figure: Total Loss]

Some segments contribute disproportionately to total loss.
This highlights that portfolio concentration is as important as individual risk.
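To illustrate the difference between individual risk and portfolio concentration, the sketch below totals expected loss by segment. The segment labels and loss values are hypothetical.

```python
from collections import defaultdict

# Hypothetical (segment, expected loss) pairs; values are illustrative only.
policies = [
    ("A", 120.0), ("A", 80.0), ("A", 100.0), ("A", 90.0), ("A", 110.0),
    ("B", 400.0),
    ("C", 150.0), ("C", 50.0),
]

total_by_segment = defaultdict(float)
count_by_segment = defaultdict(int)
for segment, loss in policies:
    total_by_segment[segment] += loss
    count_by_segment[segment] += 1

portfolio_total = sum(total_by_segment.values())

for segment in sorted(total_by_segment):
    total = total_by_segment[segment]
    mean = total / count_by_segment[segment]
    share = total / portfolio_total
    print(f"{segment}: mean={mean:.1f} total={total:.1f} share={share:.1%}")
```

In this toy portfolio, segment B has the highest expected loss per policy, but segment A contributes the largest total because it holds more policies.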


Risk Structure: Frequency vs Severity

[Figure: Risk Scatter]

There is a clear nonlinear relationship between predicted frequency and predicted severity.


Feature Importance (Frequency Model)

[Figure: Feature Importance]

A small number of features account for most of the model's predictive power, suggesting strong underlying drivers of claim occurrence.
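One way to produce such a ranking is scikit-learn's impurity-based `feature_importances_` attribute. The sketch below is an assumption about the approach, not the project's code; the feature names are hypothetical and the data is synthetic, built so that a single feature drives the outcome.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical feature names on synthetic data.
feature_names = ["driver_age", "vehicle_age", "region_code", "noise_1", "noise_2"]
n = 3000
X = rng.normal(size=(n, len(feature_names)))

# Only the first feature influences the synthetic claim indicator.
y = (X[:, 0] + 0.3 * rng.normal(size=n)) > 0.5

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name:12s} {importance:.3f}")
```

Impurity-based importances can favor high-cardinality features, so `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.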


Interpretable Segmentation (Surrogate Tree)

[Figure: Surrogate Tree]

A decision tree approximation of the model reveals interpretable segmentation rules.
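A surrogate tree of this kind is typically built by fitting a shallow decision tree to the forest's predictions rather than to the observed target. The sketch below shows the idea on synthetic data; the feature names, tree depth, and data-generating rule are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)

# Hypothetical feature names on synthetic data.
feature_names = ["driver_age", "vehicle_age", "annual_mileage"]
n = 2000
X = rng.uniform(0.0, 1.0, size=(n, 3))
y = (
    100.0
    + 400.0 * (X[:, 0] < 0.3)
    + 200.0 * X[:, 2]
    + rng.normal(scale=20.0, size=n)
)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
forest_pred = forest.predict(X)

# The surrogate learns to mimic the forest, not the raw target.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X, forest_pred)

# Fidelity: R^2 of the surrogate measured against the forest's predictions.
fidelity = surrogate.score(X, forest_pred)

print(export_text(surrogate, feature_names=feature_names))
print(f"Fidelity (R^2 vs. forest): {fidelity:.3f}")
```

The fidelity score indicates how much of the forest's behavior the readable rules capture; a low value means the segmentation should be interpreted with caution.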


Distribution of Expected Loss

[Figure: Expected Loss Distribution]

The distribution is highly skewed.

This is typical in insurance and reinforces the need for segmentation.
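Skewness of this kind can be quantified by comparing the mean with the median and by measuring how much of the total the top decile of policies carries. The sketch below uses synthetic lognormal losses with hypothetical parameters, not the project's predictions.

```python
import random
import statistics

random.seed(0)

# Synthetic expected losses; the lognormal parameters are hypothetical.
losses = sorted(
    (random.lognormvariate(5.0, 1.2) for _ in range(10_000)),
    reverse=True,
)

mean_loss = statistics.mean(losses)
median_loss = statistics.median(losses)

# Share of total expected loss carried by the top 10% of policies.
top_decile = losses[: len(losses) // 10]
top_decile_share = sum(top_decile) / sum(losses)

print(f"Mean:   {mean_loss:.2f}")
print(f"Median: {median_loss:.2f}")
print(f"Top 10% share of total loss: {top_decile_share:.1%}")
```

A mean well above the median, together with a top-decile share far above 10%, is the pattern that motivates treating high-loss segments separately.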


Portfolio-Level Insights

Customer Segmentation Overview

[Figure: Summary Panel]


Key Takeaways


Project Structure