Challenge

Millions of transactions of different kinds take place in a variety of locations. The goal is to identify which ones are potentially fraudulent, e.g., a $2,000 purchase at a burger place at midnight. For each transaction, the amount, purchase location, date, time, and category are given, among other characteristics. No ground-truth fraud labels are available.

Solution

Several unsupervised methods were explored, including k-means, k-prototypes, deep learning autoencoders, and grouped Gaussian confidence intervals with log transformation. The last of these was the most successful: it was able to discriminate anomalies across different locations and times, e.g., a $500 transaction in a nightclub could be suspicious at 9 AM, but not at 12 AM.
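The grouped approach can be illustrated with a minimal sketch. This is not the project's actual implementation: the column names (`amount`, `category`, `hour_bucket`), the use of `log1p` as the log transformation, the threshold of 3 standard deviations, and the minimum group size are all assumptions made for illustration. The idea is to log-transform the amounts, fit a Gaussian (mean and standard deviation) within each group, and flag transactions that fall outside that group's interval.

```python
import numpy as np
import pandas as pd


def flag_anomalies(df, group_cols=("category", "hour_bucket"), z=3.0, min_count=5):
    """Flag transactions whose log-amount lies outside mean +/- z*std of its group.

    Column names, the z threshold, and min_count are hypothetical choices.
    Amounts are assumed to be non-negative.
    """
    out = df.copy()
    # Log transformation reduces the right skew typical of monetary amounts.
    out["log_amount"] = np.log1p(out["amount"])

    # Per-group Gaussian parameters, broadcast back to each row.
    grouped = out.groupby(list(group_cols))["log_amount"]
    mean = grouped.transform("mean")
    std = grouped.transform("std")
    count = grouped.transform("count")

    # Groups that are too small or have zero spread get no interval.
    valid = (count >= min_count) & (std > 0)
    z_score = (out["log_amount"] - mean) / std
    out["suspicious"] = valid & (z_score.abs() > z)
    return out


# Synthetic demo data: a nightclub with small morning and large night amounts.
morning = [15 + (i % 10) for i in range(30)] + [500]
night = [450 + 10 * (i % 10) for i in range(30)] + [500]
demo = pd.DataFrame({
    "category": ["nightclub"] * (len(morning) + len(night)),
    "hour_bucket": ["morning"] * len(morning) + ["night"] * len(night),
    "amount": morning + night,
})
result = flag_anomalies(demo)
print(result[result["suspicious"]])
```

In this synthetic data, the same $500 amount appears in both the morning and the night group, so the sketch shows how one amount can be judged differently depending on its group. Note that with very small groups a single outlier inflates the group's own standard deviation, which is one reason for the minimum group size.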

Industry

  • Entertainment

Tools Used

  • Data Science Experience (DSX), TensorFlow, Keras, Spark, R, & Python

Data Science Techniques

  • Clustering & Deep Learning

Benefits

Automatic tagging of potentially fraudulent transactions improves the productivity of the accounting department. Reducing fraud helps improve profits and customer satisfaction.

Contact us: MLHub@us.ibm.com