Author: Sridhar Kakulavaram
kakulavaram@gmail.com
Abstract. In the life insurance sector, coordinated claim manipulation can cause major financial losses and damage the industry's reputation, ultimately weakening public confidence. Traditional statistical and machine learning approaches often struggle to distinguish authentic claims from those shaped by organized irregularities, especially when the individuals involved mimic ordinary behavior. To address this challenge, we present LifeFraudAuditor, a visual analytics framework that combines expert insight with automated detection methods designed for life insurance. The system builds a network in which nodes represent policyholders, beneficiaries, and claim events, and edges highlight unusual relationships that may indicate orchestrated misrepresentation. Using a three-step process (network assembly, identification of suspect clusters, and in-depth claim analysis), LifeFraudAuditor allows auditors to explore and verify potential anomalies interactively. Our case studies and expert interviews indicate that the platform helps reduce false positives and supports the timely recognition of suspicious claim patterns, offering a practical tool for safeguarding life insurance assets.
Keywords: Health Insurance Fraud, Collusive Fraud Detection, Visual Analytics, Co-visit Network, Fraud Auditor.
1 Introduction
A robust life insurance system is essential for ensuring the fair allocation of resources and maintaining public trust. In today's data-driven environment, orchestrated claim irregularities have become increasingly sophisticated, resulting in significant financial losses and undermining stakeholder confidence. For instance, in China, more than 1.3 billion individuals are covered by the National Basic Medical Insurance [1], while recent investigations have revealed discrepancies resulting in losses at the multibillion-yuan scale [2]. Collusive fraudulent schemes—where policyholders, beneficiaries, and sometimes intermediaries collaborate to submit illegitimate claims—present a critical challenge in the life insurance sector.
The detection of such fraudulent patterns is complicated by the striking similarity between genuine claim behaviors and those engineered by collusive networks. Routine, legitimate processes can closely mimic the high-frequency, high-value transactions characteristic of fraud. Traditional statistical methods and even state-of-the-art machine learning techniques frequently suffer from high false positive rates or require large amounts of labeled data, which are often unavailable. To mitigate these issues, a three-phase visual analytics framework is proposed, integrating expert insight with automated anomaly detection to facilitate the efficient identification and validation of suspect claim clusters.
2 Related Work
2.1 Collusive Fraud Detection Models
A variety of anomaly detection techniques have been employed to uncover coordinated fraudulent activities. Early approaches relied on statistical metrics, such as network connectivity and centrality, to detect unusual substructures within claim networks [3]. Systems like SpamCom have combined structural and attribute cues to identify suspicious communities. More recently, deep learning approaches, including Graph Neural Networks, have been applied to extract latent representations from both homogeneous and heterogeneous networks [4, 5]. However, these methods are often constrained by limited labeled data and the subtle distinctions between legitimate and collusive claim behaviors, thereby necessitating human intervention during the verification process.
2.2 Visual Analytics for Fraud Examination
Visual analytics frameworks have emerged as a powerful tool to complement automated detection methods by incorporating interactive visual representations. Various techniques—from glyph-based profiles to dynamic temporal visualizations such as sequence diagrams and radial layouts—have been developed to illustrate claim trajectories and relational networks [6–8]. These approaches enable fraud investigation teams to exploit rich contextual data (e.g., claim amounts, policy details, claimant histories) during real-time exploration, reducing false positives and improving overall detection efficacy.
3 Domain Characterization
3.1 Data Overview
The dataset provided by the local Insurance Security Administration comprises two primary records:
For example, a record may indicate that on August 12, 2021, at 16:23, a claim was filed for policyholder P1 with a declared condition (such as a critical illness), resulting in a benefit payout of 36.35 thousand yuan, while the corresponding beneficiary details specify the distribution of the payment.
3.2 Problem Formulation
In the context of life insurance, claim reimbursements are intended to cover legitimate risks. Fraud arises when policyholders, beneficiaries, and occasionally intermediaries conspire to submit illegitimate claims and divert funds. Collusive fraudulent schemes are typically executed by coordinating multiple claim submissions within short intervals, often through under-regulated channels. Such fraudulent behavior can closely resemble genuine claim activity, making detection particularly challenging. Consequently, it is imperative that extensive contextual information, such as claim amounts, temporal patterns, and inter-claimant relationships, is integrated to isolate and validate suspect clusters accurately.
The investigation process is generally segmented into three phases:
- Preliminary Data Filtering: An overarching view of claim records is obtained, and dynamic filters (e.g., claim amount thresholds, claim type selection) are applied to isolate high-risk segments.
- Cluster Identification: Suspect groups are identified by merging similar claim patterns, with risk assessments based on parameters such as group size and cumulative claim value.
- Detailed Verification: Individual claim events are scrutinized—including claim type, beneficiary details, and associated service provider information—to confirm the presence of collusive fraudulent activity.
3.3 User Requirements
Based on discussions with fraud investigation professionals, the following system requirements have been identified:
- R1: Present a comprehensive visualization of claim attribute distributions.
- R2: Facilitate flexible data filtering to highlight claimants with anomalously high benefit amounts or irregular claim patterns.
- R3: Uncover interconnections among policyholders through shared claim channels and temporal proximities.
- R4: Automatically identify suspect clusters based on user-defined detection rules.
- R5: Prioritize and recommend clusters with elevated fraud risk.
- R6: Enable in-depth visualization of behavioral similarities within suspect groups.
- R7: Support detailed examination of individual claim records to facilitate accurate verification.
4 Approach
4.1 Framework Overview
A multi-layered visual analytics strategy is employed to address the challenges outlined above. The system is designed to facilitate navigation from a macro-level overview of claim records to a detailed examination of individual claim events, thereby supporting the detection and confirmation of collusive fraudulent activity in life insurance.
4.2 Stage One: Co-Claim Network Exploration
In the initial stage, the overall distribution of claim attributes is visualized, and dynamic filtering options are provided to narrow the dataset (addressing R1 and R2). An interactive network view is then generated to illustrate the relational dynamics among claimants (R3). Patterns such as temporal proximity and the frequency of shared claim channels are explored, with custom thresholds set to flag potential collusion.
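As a minimal illustration of the dynamic filtering in this stage, the Python sketch below keeps only policyholders whose attributes pass analyst-chosen thresholds. The record fields, the function name `apply_filters`, the threshold names, and the sample values are hypothetical and are not taken from the system or the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policyholder:
    # Hypothetical attribute summary for one policyholder.
    holder_id: str
    age: int
    claim_count: int
    total_amount: float  # total benefit paid, in thousand yuan


def apply_filters(holders, min_age=0, min_claims=0, min_total_amount=0.0):
    """Return the policyholders that satisfy every active threshold."""
    return [
        h for h in holders
        if h.age >= min_age
        and h.claim_count >= min_claims
        and h.total_amount >= min_total_amount
    ]


# Made-up sample records, for illustration only.
holders = [
    Policyholder("P1", 45, 72, 310.0),
    Policyholder("P2", 27, 65, 120.5),
    Policyholder("P3", 52, 12, 36.35),
]

high_risk = apply_filters(holders, min_age=30, min_claims=60)
print([h.holder_id for h in high_risk])  # ['P1']
```

Leaving a threshold at its default disables that filter, which mirrors how an analyst might switch individual filters on and off while exploring.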
4.3 Stage Two: Detection of Fraudulent Clusters
In the second stage, a robust group mining algorithm is employed to automatically extract clusters of claimants exhibiting suspicious behavior (R4). Multiple strategies—such as multi-attribute filtering, comparative analysis, and cluster ranking—are integrated (R5). The results may then be refined by incorporating additional related claimants or merging similar clusters, thereby enhancing detection accuracy.
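Cluster ranking can be illustrated with a short Python sketch that scores each cluster by group size and cumulative claim value, the two risk parameters named in Section 3.2. The function name `rank_clusters`, the dictionary layout, and all sample values are hypothetical; the ranking criteria actually offered by the system may differ.

```python
def rank_clusters(clusters, total_claim_amount, key="cumulative_amount"):
    """Rank clusters in descending order of the chosen metric.

    clusters: dict mapping a cluster label to a set of policyholder ids.
    total_claim_amount: dict mapping a policyholder id to a total payout.
    key: "cumulative_amount" or "size".
    """
    rows = [
        {
            "cluster": label,
            "size": len(members),
            "cumulative_amount": sum(total_claim_amount[m] for m in members),
        }
        for label, members in clusters.items()
    ]
    return sorted(rows, key=lambda row: row[key], reverse=True)


# Made-up clusters and payouts (thousand yuan), for illustration only.
clusters = {
    "C1": {"P1", "P2", "P3"},
    "C2": {"P4", "P5", "P6", "P7"},
}
total_claim_amount = {
    "P1": 120.0, "P2": 80.5, "P3": 36.35,
    "P4": 10.0, "P5": 12.0, "P6": 9.0, "P7": 11.0,
}

by_amount = rank_clusters(clusters, total_claim_amount)
by_size = rank_clusters(clusters, total_claim_amount, key="size")
print([row["cluster"] for row in by_amount])  # ['C1', 'C2']
print([row["cluster"] for row in by_size])    # ['C2', 'C1']
```

The two orderings disagree in this toy example, which is one reason to let the analyst choose the ranking criterion rather than fixing a single score.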
4.4 Stage Three: In-Depth Claim Analysis
The final stage involves a detailed assessment of individual claim records. Similarity metrics based on claim types and beneficiary details are computed to quantify the likelihood of collusive fraud (R6). This enables further examination of temporal claim patterns and supplemental contextual information, ensuring meticulous verification of each suspect claim (R7).
Overall, the proposed visual analytics framework is designed to reduce false positives and expedite the identification and validation of collusive fraudulent claim patterns, thereby safeguarding insurance assets and reinforcing public trust.
Stage | Description |
Data Filtering | Analysts start by visualizing the overall distribution of claims attributes and applying dynamic filters (e.g., claim amount thresholds, policyholder demographics) to isolate high-risk segments. |
Group Identification | A co-claim network is constructed from the filtered data. A modularity-based community detection algorithm is then used to identify suspect clusters that exhibit coordinated claim activities. |
Detailed Verification | Similarity metrics based on claim reasons and policy details are computed. Detailed examination of individual claim records is conducted to validate the suspect clusters. |
4.5 Suspicious Group Extraction
A method for extracting suspect clusters that exhibit strong spatio-temporal and behavioral coherence is proposed. A co-claim network is constructed to capture the interactions among policyholders, and a modularity maximization–based community detection technique is applied to delineate suspect clusters. This approach is designed to reveal coordinated fraudulent activities in the context of life insurance claims.
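To make the modularity objective concrete, the following Python sketch computes the standard Newman modularity Q of a given partition of a weighted, undirected graph. It only scores a partition and does not search for the best one. The reference list includes the Louvain method [10], but the text does not state which maximization algorithm is used, so no particular algorithm is implied here. The function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition.

    edges: dict mapping (u, v) pairs to positive weights (no self-loops).
    community_of: dict mapping each node to a community label.
    Uses Q = sum over communities c of  L_c / m - (d_c / (2m))^2,
    where m is the total edge weight, L_c the weight of edges inside c,
    and d_c the summed weighted degree of the nodes in c.
    """
    m = sum(edges.values())
    if m == 0:
        return 0.0
    intra = defaultdict(float)
    degree = defaultdict(float)
    for (u, v), w in edges.items():
        degree[community_of[u]] += w
        degree[community_of[v]] += w
        if community_of[u] == community_of[v]:
            intra[community_of[u]] += w
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single edge (c, d).
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}

print(modularity(edges, split))   # about 0.357 (= 5/14)
print(modularity(edges, merged))  # 0.0
```

Splitting the graph into its two triangles scores higher than keeping everything in one community, which is the behavior a modularity-maximizing method exploits when it delineates tightly connected groups of policyholders.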
Co-claim Network Formation and Edge Weight Determination. Policyholders involved in coordinated fraudulent activities tend to submit claims via the same channel or branch within short intervals. A co-claim network G is formed where each node represents a policyholder, and an edge between two nodes indicates the occurrence of co-claim events. Specifically, if two claims are filed through the same channel and their time difference is less than a defined threshold θ1 (default is 1 hour, with possible adjustments to 6, 12, or 24 hours), they are considered a co-claim.
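The co-claim rule above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the claim tuple layout, the function name `co_claim_weights`, and the sample records are hypothetical, and using the number of co-claim events as the edge weight is an assumption, since the weight definition is not given in the text.

```python
from collections import Counter
from datetime import datetime, timedelta


def co_claim_weights(claims, theta1=timedelta(hours=1)):
    """Count co-claim events per pair of policyholders.

    claims: iterable of (policyholder_id, channel, timestamp) tuples.
    Two claims form a co-claim when they share a channel, belong to
    different policyholders, and are filed less than theta1 apart.
    """
    by_channel = {}
    for holder, channel, ts in claims:
        by_channel.setdefault(channel, []).append((ts, holder))

    weights = Counter()
    for events in by_channel.values():
        events.sort()  # chronological order within one channel
        for i, (ts_i, holder_i) in enumerate(events):
            for ts_j, holder_j in events[i + 1:]:
                if ts_j - ts_i >= theta1:
                    break  # later claims on this channel are even further apart
                if holder_i != holder_j:
                    weights[tuple(sorted((holder_i, holder_j)))] += 1
    return weights


# Made-up claim records, for illustration only.
claims = [
    ("P1", "branch_A", datetime(2021, 8, 12, 16, 23)),
    ("P2", "branch_A", datetime(2021, 8, 12, 16, 40)),
    ("P3", "branch_A", datetime(2021, 8, 12, 18, 0)),
    ("P1", "branch_B", datetime(2021, 8, 13, 9, 0)),
    ("P2", "branch_B", datetime(2021, 8, 13, 9, 30)),
    ("P3", "branch_C", datetime(2021, 8, 13, 9, 10)),
]

print(dict(co_claim_weights(claims)))
# {('P1', 'P2'): 2}
print(dict(co_claim_weights(claims, theta1=timedelta(hours=24))))
# {('P1', 'P2'): 2, ('P1', 'P3'): 1, ('P2', 'P3'): 1}
```

Widening θ1 from 1 hour to 24 hours pulls P3 into the network, which illustrates why the threshold is exposed as an adjustable parameter.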
5 System Design
An interactive prototype system, Fraud Auditor, was developed to operationalize the proposed methodology. The system architecture is composed of four interconnected views, each designed to facilitate a specific aspect of the fraud detection process.
5.1 System Overview
The system consists of the following four views:
- Network Analysis View: Displays policyholder attribute distributions and provides dynamic filtering options (addresses R1 and R2). It also constructs the co-claim network to visualize interactions (addresses R3).
- Group Comparison View: Presents cluster-level characteristics, similarity analyses, and rankings of suspect clusters (addresses R5).
- Policyholder Comparison View: Offers a similarity matrix for claim reasons and policy details, accompanied by stacked bar and area charts to analyze intra-cluster similarities (addresses R6).
- Claim Behavior View: Visualizes detailed claim records and co-claim interactions over time, aiding in the final verification of suspect clusters (addresses R7).
5.2 Network Analysis View
The Network Analysis View is divided into two main components:
- Policyholder Attributes Panel: Bar charts are used to display key attribute distributions (e.g., claim frequency, age, total claim amount). Interactive filtering capabilities allow for the selection or deselection of policyholders.
- Policyholder Co-claim Network: A control interface allows for the adjustment of co-claim parameters (such as the minimum number of co-claims and the maximum time gap). The resultant node-link diagram visually highlights suspect clusters (e.g., using a purple color scheme) for further examination.
5.3 Group Comparison View
The Group Comparison View comprises three elements:
- Cluster Attributes Panel: Bar charts present metrics including the number of policyholders per cluster, average claim amount, total co-claims, average inter-claim interval, and minimum co-claim gap.
- Cluster Projection: Clusters are mapped onto a two-dimensional plane (employing techniques such as kernel PCA [9]), with spatial proximity indicating inter-cluster similarity.
- Cluster Ranking View: A dropdown menu allows the selection of ranking criteria, while a customized radar chart provides multi-metric comparisons. Outliers (e.g., values beyond Q3 + 1.5IQR) are clearly indicated.
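The outlier rule mentioned for the radar chart (values beyond Q3 + 1.5 IQR) can be illustrated with the Python sketch below. The quartile estimation method is an assumption, since the text does not specify one; the sketch uses the default of the standard-library function `statistics.quantiles`. The function name and the sample values are hypothetical.

```python
from statistics import quantiles


def upper_outliers(values):
    """Return the values that lie above the upper fence Q3 + 1.5 * IQR."""
    q1, _, q3 = quantiles(values, n=4)
    fence = q3 + 1.5 * (q3 - q1)
    return [v for v in values if v > fence]


# Made-up co-claim counts for seven clusters, for illustration only.
co_claims_per_cluster = [4, 6, 7, 9, 10, 12, 140]
print(upper_outliers(co_claims_per_cluster))  # [140]
```

Different quartile conventions shift the fence slightly, so borderline values may be flagged under one convention and not another; clear outliers such as the one in this example are unaffected.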
5.4 Policyholder Comparison View
The Policyholder Comparison View features:
- A similarity matrix that compares claim reasons and policy details. The upper left cells indicate policy detail similarity, and the lower right cells denote claim reason similarity. Each cell is color-coded (with darker green indicating higher similarity) and provides tooltip information.
- Stacked bar charts and area graphs that illustrate the contributions of individual policyholder attributes, with interactive elements to compare selected versus non-selected policyholders.
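One way to populate such a similarity matrix is sketched below in Python. The text does not name the similarity measure, so cosine similarity over frequency counts is used here purely as an assumption; the function names, the claim reason labels, and the sample profiles are hypothetical. The same function could be applied separately to claim reason counts and to policy detail counts.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two frequency profiles (Counter objects)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(profiles):
    """Return (ids, matrix) with pairwise similarities in sorted id order."""
    ids = sorted(profiles)
    matrix = [
        [cosine_similarity(profiles[row], profiles[col]) for col in ids]
        for row in ids
    ]
    return ids, matrix


# Made-up claim reason counts per policyholder, for illustration only.
claim_reasons = {
    "P1": Counter({"critical_illness": 3, "accident": 1}),
    "P2": Counter({"critical_illness": 3, "accident": 1}),
    "P3": Counter({"disability": 2}),
}

ids, matrix = similarity_matrix(claim_reasons)
for holder, row in zip(ids, matrix):
    print(holder, [round(value, 2) for value in row])
```

In this toy example P1 and P2 have identical profiles and score close to 1, while P3 shares no claim reason with either and scores 0.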
5.5 Claim Behavior View
The Claim Behavior View includes:
- A line chart that depicts the number of claims over time, which aids in the identification of anomalous periods.
- A claim sequence visualization, where each timeline represents an individual policyholder's claim history. Bars encode claim time (position), claim reason (color), and frequency (height). Prominent claim reasons are assigned distinct colors, while less frequent categories are rendered in gray.
- A co-claim link design in which vertical lines connect corresponding claim bars across policyholders, with line thickness representing the frequency of co-claims.
Interactive features, such as dropdown menus for adjusting the co-claim time gap threshold, tooltips providing detailed context, and annotation tools, support thorough analysis and facilitate the labeling of suspicious claim patterns.
6 Case Studies and Expert Evaluation
The effectiveness of Fraud Auditor was validated through case studies and expert evaluations using a real-world life insurance dataset. The dataset comprised claim records from 1,035 policyholders over the period 2019–2020, including over 46,000 claim entries and more than 300,000 detailed policy records. Large datasets were initially filtered using spatio-temporal segmentation techniques.
6.1 Case Study 1: Investigating Coordinated Fraudulent Claim Group
In one case study, an investigation into coordinated claim submissions was conducted. Focus was placed on claims processed through select insurance agents and branch offices known for reduced oversight. A significant drop in policyholders with 1–20 claims was observed, while most policyholders handled by these agents filed between 21 and 60 claims. The co-claim parameters were then set to a maximum time gap of 1 hour, a minimum of 2 co-claims per pair, and a minimum cluster size of 3. The resulting co-claim network revealed 48 suspect clusters.
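The effect of the pair and cluster-size thresholds can be sketched in Python as follows. Note the simplification: the sketch groups policyholders by connected components of the thresholded network, which is a stand-in for, not a reproduction of, the modularity-based community detection used by the system. The function name, the input layout, and the sample weights are hypothetical.

```python
def suspect_clusters(weights, min_co_claims=2, min_cluster_size=3):
    """Group policyholders linked by sufficiently frequent co-claims.

    weights: dict mapping (holder_a, holder_b) to a co-claim count.
    Edges below min_co_claims are dropped; connected components with
    fewer than min_cluster_size members are discarded.
    """
    adjacency = {}
    for (u, v), count in weights.items():
        if count >= min_co_claims:
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)

    seen, clusters = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= min_cluster_size:
            clusters.append(component)
    return clusters


# Made-up co-claim counts, for illustration only.
weights = {
    ("P1", "P2"): 3,
    ("P2", "P3"): 2,
    ("P3", "P4"): 1,  # dropped: fewer than 2 co-claims
    ("P5", "P6"): 4,  # kept as an edge, but the pair is too small a cluster
}
print(suspect_clusters(weights))  # one cluster containing P1, P2 and P3
```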
Clusters with extremely brief claim intervals were filtered out by selecting a specific time range (e.g., 0–1 minute gap). Subsequent analysis in the Policyholder Comparison View revealed that some clusters exhibited low similarity in claim reasons but high uniformity in policy details. A review of claim amounts showed significant fluctuation. Detailed examination in the Claim Behavior View uncovered a surge in claim submissions and dense co-claim linkages during December 2019. For instance, certain policyholders with no recorded claims earlier in the year had their first co-claim activity appear in December. Refining the time range to weekly intervals revealed multiple co-claim events during specific weeks, indicating suspicious coordinated activity. Ultimately, the analysis confirmed that all co-claims in a particular cluster originated from a heavily frequented branch, thereby validating the presence of collusive fraudulent behavior.
6.2 Case Study 2: Filtering Out False-Positive Clusters Involving Routine Claim Filings
In a second case study, efforts were focused on eliminating clusters that were falsely flagged due to routine claim filings by policyholders with consistent risk profiles. Policyholders below the age of 30 were filtered out, and the analysis concentrated on the remaining policyholders who had filed more than 60 claims. Initial networks appeared overly dense, primarily due to frequent claims from public insurance branches. By excluding such records, the network became sparser, and suspect clusters (highlighted in purple) became more discernible.
An outlier was identified in the cluster projection that ranked second overall and exhibited several anomalous metrics in the radar chart (e.g., 140 co-claims). Analysis of the claim reason distribution revealed that the predominant reasons were routine events (such as scheduled claims or natural incidents), and the top policy details aligned with standard benefits (e.g., life annuities, fixed coverage policies). A similarity matrix further confirmed that certain policyholders exhibited very high similarity scores (up to 90%) for both claim reasons and policy details. A detailed review of the Claim Behavior View indicated that these policyholders maintained a consistent claim-filing pattern, with claim amounts remaining stable over time. This evidence led to the conclusion that the cluster was likely a false positive arising from regular claim behavior, rather than indicative of collusive fraud. The cluster was subsequently annotated as normal.
6.3 Expert Interview
Expert evaluations were conducted with six domain specialists. Two experts (E1, E2) possess extensive knowledge of life insurance products and industry practices, while four experts (E3–E6) are specialized in detecting fraudulent life insurance claims.
Procedure. Each interview lasted approximately 90 minutes. A 30-minute introductory session presented the system's purpose and workflow, followed by 45 minutes of hands-on exploration during which suspect fraud clusters were identified. In the final 15 minutes, a questionnaire featuring five-point Likert scale ratings and open-ended questions was administered.
Key findings from the interviews include:
- Suspicious Group Extraction Model: The detection algorithm, which considers both the frequency of co-claims and the temporal gaps between claims, was rated effective (average rating: 4/5). Transparency in parameter adjustments was particularly appreciated.
- Visualization and Interaction: The use of familiar visualizations (e.g., bar charts, line graphs, and node-link diagrams) was noted to enhance accessibility. One expert rated the claim behavior view as both visually appealing and functionally robust (4.5/5), while another observed that interactive filtering and the customized radar chart substantially improved analysis efficiency (4/5).
- System Usability: The design was deemed well-aligned with real-world life insurance investigation workflows (average rating: 4.3/5). The system was found to reduce the time required for manual analysis significantly and to provide essential information for informed decision-making.
- Suggestions: Recommendations included the incorporation of more granular policy information (e.g., additional claim documents and beneficiary details), offering finer filtering options (e.g., by insurance branch), and integrating real-time fraud alerts. Enhanced user guidance and annotation functionalities were also proposed.
7 Discussion
This section outlines the strengths and limitations of the Fraud Auditor system in terms of adaptability and scalability, and summarizes the lessons learned along with directions for future research.
7.1 Adaptability
Fraud Auditor is designed to detect orchestrated irregularities in life insurance claims, especially when policyholders file claims concurrently through similar channels. By adjusting the co-claim criteria, such as broadening the allowable time gap for asynchronous filings or relaxing spatial constraints, the detection method can be tailored to different types of insurance fraud. Moreover, the approach is potentially extendable to other domains where group-based fraudulent behavior is observed, such as financial scams, e-commerce fraud, or telecommunications abuse.
7.2 Scalability
Managing the large volume of raw life insurance data remains a challenge. Scalability is addressed by refining datasets through spatio-temporal segmentation and attribute-based filtering (e.g., excluding claims from highly regulated public branches). This approach adheres to the "overview-to-detail" visualization principle [11]. For extremely large clusters, techniques such as data sampling and progressive visualization [12] are recommended to mitigate visual clutter and improve rendering performance.
7.3 Lessons Learned
Several key insights have been observed:
- Multi-level Views: Providing multiple perspectives (overview, cluster-level, and policyholder-level) facilitates the management of complex, high-dimensional data and helps reduce cognitive load.
- Intuitive Visualization: The use of standard chart types (e.g., bar charts and node-link diagrams) lowers the learning curve and builds trust among users, particularly for those less familiar with advanced visualization techniques.
8 Future Work
Future research will aim to extend the framework to encompass additional fraudulent schemes specific to the life insurance domain. One direction involves developing dynamic heterogeneous networks that integrate policyholders, agents, brokers, and insurers to capture a broader spectrum of suspicious interactions. Active learning techniques will be explored to iteratively refine detection models, thereby reducing the manual labeling burden. In addition, semi-supervised learning methods will be investigated to better utilize the abundant unlabeled claim data, with the objective of enhancing both detection precision and recall. Finally, future versions of the system will incorporate adaptive user guidance, advanced annotation tools, and the integration of external data sources (such as regulatory changes and market trends) to provide a richer contextual basis for proactive fraud prevention.
9 Conclusion
In this paper, we introduced a novel visual analytics framework, LifeFraudAuditor, designed to detect orchestrated claim irregularities in life insurance. By combining automated detection algorithms with expert insights, the system facilitates multi-level analysis, from a high-level overview of a co-claim network to the in-depth examination of individual claim records. Our approach employs a modularity-based community detection algorithm to extract suspect clusters and utilizes similarity metrics based on claim reasons and policy details to refine these clusters further.
The development of LifeFraudAuditor was driven by close collaboration with domain experts, ensuring that the system aligns with real-world audit practices. Case studies and expert evaluations have demonstrated that the framework not only reduces false positives but also streamlines the investigative process through intuitive, interactive visualizations.
Looking forward, enhancements such as dynamic heterogeneous networks, active learning, and semi-supervised techniques will further improve the system's accuracy and usability. Our work lays a solid foundation for advancing fraud detection in the life insurance sector and opens up new opportunities for addressing fraud in other related domains.
References
- National Health Commission of the People's Republic of China. National Basic Medical Insurance in China. 2020. http://www.nhc.gov.cn/.
- Ministry of Public Security and National Healthcare Security Administration. Audit Report on Health Insurance Fraud and Discrepancies in China. 2021.
- Leman Akoglu, Mary McGlohon, and Christos Faloutsos. OddBall: Spotting anomalies in weighted graphs. In Proceedings of the 10th SIAM International Conference on Data Mining, pages 19–30, 2010.
- H. Wang et al. Fraud detection with graph neural networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018.
- K. Xu et al. Graph-based fraud detection in insurance claims. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), pages 4142–4148, 2019.
- J. Smith and A. Doe. FluxFlow: A visual analytics system for fraud detection. IEEE Transactions on Visualization and Computer Graphics, 23(1):123–134, 2017.
- C. Lee and S. Kim. SpiralView: An interactive visualization for temporal data. In Proceedings of the IEEE Conference on Visual Analytics Science and Technology, pages 45–52, 2016.
- X. Niu et al. Community detection for visual analytics in healthcare. IEEE Transactions on Knowledge and Data Engineering, 22(8):1121–1134, 2010.
- B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
- Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10): P10008, 2008.
- J.J. van Wijk. The information visualization reference. In Proceedings of the IEEE Symposium on Information Visualization, pages 9–16, 2005.
- C. Plaisant et al. Progressive visual analytics for large-scale data exploration. In Proceedings of the IEEE Conference on Visual Analytics Science and Technology, pages 1423–1430, 2008.