Machine Learning - Detecting Fraudulent Transactions with Isolation Forests

In an increasingly interconnected digital world, billions of transactions occur every day across a range of systems, from point-of-sale terminals in traditional stores to online payment gateways. These systems have created great opportunities and helped drive innovative businesses with unique business models. Alongside these benefits, there has also been a sharp rise in increasingly sophisticated cybercrime.

One of the most common forms of cybercrime is credit card fraud, accounting for billions of dollars lost in the financial sector globally. Given the number of transactions that occur every day, it is challenging for financial institutions to combat cybercriminals. However, recent advances in Machine Learning have given rise to new methods for identifying and detecting fraudulent transactions. Accurate fraud identification allows for automated mitigation strategies such as alerting the customer and requesting further confirmation before a transaction proceeds.

This case study explores a machine learning-oriented approach to credit card fraud identification. Machine Learning has proven effective in many different settings and runs efficiently on large volumes of data, an essential consideration for software engineers implementing banking systems.

Solving the Problem via Artificial Intelligence

Detecting anomalies has traditionally been a problem for the field of statistics. Conventional methods assume that a dataset follows a normal distribution, so outliers are detected by observing which data points deviate significantly from the mean. The drawback of this method is that its effectiveness is limited when a dataset is not normally distributed.
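To make the conventional approach concrete, the following is a minimal sketch of a z-score outlier test in Python. The function name, the threshold of three standard deviations, and the transaction amounts are all illustrative assumptions and do not come from any real dataset.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Made-up transaction amounts: mostly small purchases and one very large one.
amounts = [12.5, 9.9, 14.2, 11.0, 10.4, 13.1, 9.5, 12.0, 10.8, 11.7,
           13.4, 10.1, 12.9, 11.3, 9.8, 10.6, 12.2, 11.9, 13.0, 2500.0]

flagged = zscore_outliers(amounts)
print(flagged)
```

Note that the mean and standard deviation are themselves pulled towards extreme values, which is one reason this kind of test becomes less reliable on skewed or heavy-tailed data.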

Machine Learning approaches based on more recent algorithmic advances were attempted later. They proved to be highly effective and scalable, and they overcome the limitations of the statistical methods described above. Although not widely adopted outside the financial sector, many of these methods have now been productionised in banking.

A novel approach, the isolation forest, was introduced in 2008 [1]. It exploits a distinctive property of outliers: they are typically isolated relative to the majority of data points. Given this property, random partitions can be generated around a data point until it is enclosed on its own. The fewer partitions required to isolate a data point, the more likely it is to be an outlier. The algorithm has linear time complexity and was shown to work well even when limited training data is available, in contrast to typical approaches that require extensive training data.
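The isolation idea can be sketched in a few lines of Python. This is a simplified illustration and not the reference implementation from [1]: it follows a single point through random splits of a random subsample and averages the number of splits over many trials. The function names, the sample size, the depth cap, and the synthetic data are all assumptions made for the example.

```python
import random

def isolation_depth(point, sample, rng, max_depth=10):
    """Count random splits until at most one sampled row shares the partition containing `point`."""
    depth = 0
    rows = sample
    while len(rows) > 1 and depth < max_depth:
        # Pick a random feature and a random split value within its observed range.
        feature = rng.randrange(len(point))
        lo = min(r[feature] for r in rows)
        hi = max(r[feature] for r in rows)
        if lo == hi:
            break  # this feature cannot separate the remaining rows
        split = rng.uniform(lo, hi)
        # Keep only the rows on the same side of the split as `point`.
        if point[feature] < split:
            rows = [r for r in rows if r[feature] < split]
        else:
            rows = [r for r in rows if r[feature] >= split]
        depth += 1
    return depth

def mean_isolation_depth(point, data, n_trees=200, sample_size=64, seed=0):
    """Average the isolation depth of `point` over many random subsamples of `data`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        total += isolation_depth(point, sample, rng)
    return total / n_trees

# Synthetic (amount, hour-of-day) pairs clustered around typical behaviour.
gen = random.Random(42)
data = [(gen.gauss(50, 10), gen.gauss(12, 3)) for _ in range(500)]

typical_depth = mean_isolation_depth((50.0, 12.0), data)
unusual_depth = mean_isolation_depth((5000.0, 3.0), data)
print(typical_depth, unusual_depth)
```

The unusual point should need noticeably fewer splits on average than the typical one, which is the signal the algorithm converts into an anomaly score.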

Credit Card Fraud Animation

Overview of the Organisational Challenge

Considering that billions of transactions occur daily, detecting fraudulent outliers and running a model in real time is challenging. A visual inspection highlights that finding a fraudulent transaction is like finding a needle in a haystack. The following images illustrate banking transactions over time, with legitimate transactions shown in green and fraudulent ones in red. Even with this colouring, it is difficult to isolate the fraudulent transactions. Financial institutions are required to combat fraud to comply with regulations, and customers expect it as well. Usually, when fraud occurs, the financial institution bears the cost in order to maintain customer satisfaction.

Credit Card Transactions Scatter Plot

Credit Card Transactions Packed Bubble Chart

Organisations are increasingly turning to machine learning methods as part of their digital transformation journeys to solve problems that require scale, such as fraud detection. Many of the markers used to detect fraud are typically stored within data warehouses. Forensic accounting techniques are also quite advanced in determining metrics that can be used as inputs for machine learning models.

Isolation forests have been applied to the Kaggle credit card dataset [2] and have been demonstrated to be 99% effective in detecting fraudulent transactions [3]. Given that a general approach that works has already been identified, the challenge most organisations face is implementing it at scale rather than researching and developing a solution.

Organisational Data Available as ML Input

Data sources used by financial institutions are as follows:

Customer meta-data.
Transaction timestamps and amounts.
Transaction history of customers.
The geographic location of transactions.
Conformity of transaction amounts to Benford's Law.
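Benford's Law states that in many naturally occurring sets of numbers the leading digit d appears with probability log10(1 + 1/d), so 1 is the most common leading digit and 9 the least. The article does not say how this is turned into a model input, so the following Python sketch shows only one possible approach: measuring how far a batch of amounts deviates from the expected leading-digit frequencies. The function names and both sample batches are hypothetical.

```python
import math
from collections import Counter

# Expected leading-digit probabilities under Benford's Law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(amount):
    """Return the first non-zero digit of a non-zero amount."""
    return int(str(abs(amount)).lstrip("0.")[0])

def benford_deviation(amounts):
    """Mean absolute gap between observed and expected leading-digit frequencies."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - BENFORD[d]) for d in range(1, 10)) / 9

# Powers of two are a standard stand-in for data that follows Benford's Law.
natural_like = [2 ** k for k in range(200)]
# A made-up batch in which every amount starts with the digit 9.
suspicious = [900 + i for i in range(99)]

natural_dev = benford_deviation(natural_like)
suspicious_dev = benford_deviation(suspicious)
print(natural_dev, suspicious_dev)
```

A deviation value such as this could be computed per customer or per merchant and supplied as one feature among the others listed above.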

Integration Methodology

The following is an overview of the process we would perform at a high level to analyse such feeds within an organisation:

Identify financial metrics from ERP systems that can be used as inputs.
Train an isolation forest on an initial dataset, and retrain it periodically so that it continues to detect more recent fraudulent transaction patterns.
Call the Telemus AI™ APIs to run the isolation forest on incoming transactions. The API returns a probabilistic estimate of the likelihood that a transaction is fraudulent, based on the model.
Set up customised workflows and processes to alert the fraud team as well as customers about potentially fraudulent transactions.
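The final step, routing transactions according to their score, might look something like the Python sketch below. It is illustrative only: the `Transaction` type, the `triage` function, the two thresholds, and the action names are hypothetical, and `toy_score` is a stand-in scorer, not the Telemus AI™ API, whose interface is not described here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

@dataclass
class Transaction:
    tx_id: str
    amount: float

def triage(
    transactions: Iterable[Transaction],
    score_fn: Callable[[Transaction], float],
    review_at: float = 0.6,   # hypothetical threshold for fraud-team review
    block_at: float = 0.9,    # hypothetical threshold for holding the payment
) -> List[Tuple[str, str]]:
    """Map each transaction to an action using a fraud-likelihood score in [0, 1]."""
    actions = []
    for tx in transactions:
        score = score_fn(tx)
        if score >= block_at:
            action = "hold_and_confirm_with_customer"
        elif score >= review_at:
            action = "alert_fraud_team"
        else:
            action = "approve"
        actions.append((tx.tx_id, action))
    return actions

def toy_score(tx: Transaction) -> float:
    """Stand-in scorer for the example; a real system would call the trained model."""
    return min(tx.amount / 10_000, 1.0)

batch = [Transaction("t1", 25.0), Transaction("t2", 7000.0), Transaction("t3", 9800.0)]
decisions = triage(batch, toy_score)
print(decisions)
```

Keeping the scorer as a parameter means the workflow logic can be tested on its own and the model behind it can be retrained or replaced without changing the alerting rules.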

Telemus AI™ has robust machine learning models ready, so your organisation can focus on the business logic rather than the technical implementation.

Organisational Applications

The following lists other potential applications for your organisation:

Detecting fraudulent transactions.
Detecting fraudulent employee claims.
Determining unusual organisational behaviour via human resources tracking systems.

Potential and Realised Benefits

Financial fraud costs a great deal of time and money and can cause reputational damage and customer dissatisfaction. Actively preventing it can therefore save millions or even billions of dollars, depending on the scale of operation. Regulatory bodies are also continuously introducing more stringent compliance guidelines, and there is an expectation that financial institutions have processes, procedures, and systems in place to prevent and combat fraud. Regulatory technology, or RegTech, is an emerging field with the potential to drive many innovations within the operations departments of many organisations.

Telemus AI™ is an Australian-based artificial intelligence company providing advanced solutions to government and enterprise. Contact us today for a free consultation on how Telemus AI™ can be integrated into your organisation.

References

[1] - Isolation Forest - Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou
[2] - Credit Card Fraud Detection - Kaggle
[3] - Machine Learning in Credit Card Fraud Detection - S Joel Franklin
