Millions of transactions of different kinds take place across a variety of locations. The goal is to identify those that could be fraudulent, e.g., a $2,000 purchase at a burger place at midnight. Each transaction comes with its amount, purchase location, date, time, and category, among other attributes. No ground-truth labels are available, so the problem is unsupervised.


Several approaches were explored, including clustering (k-means, k-prototypes), Deep Learning Autoencoders, and Grouped Gaussian Confidence Intervals with Log Transformation. The last of these was the most successful: because it treats each group of transactions separately, it can distinguish anomalies across different locations and times. For example, a $500 transaction at a nightclub could be suspicious at 9 AM but not at 12 AM (midnight).
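The grouped confidence-interval idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's actual implementation. The sketch groups transactions by (merchant category, time-of-day bucket) and log-transforms amounts with `log1p`. It then flags a transaction when its log amount falls outside mean ± z·σ of its group. One plausible reason for the log transform is that transaction amounts are strongly right-skewed, and their logarithm is closer to Gaussian. The field names, the day/night split at 6:00 and 18:00, `z = 3`, `min_count`, and the toy history below are all hypothetical.

```python
import math
import statistics
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transaction:
    category: str  # hypothetical field, e.g. "nightclub" or "burger"
    hour: int      # hour of day, 0-23
    amount: float  # amount in dollars


def time_bucket(hour: int) -> str:
    """Coarse time-of-day bucket; the 6:00/18:00 boundaries are an assumption."""
    return "day" if 6 <= hour < 18 else "night"


def fit_group_stats(history, min_count=5):
    """Mean and sample std of log1p(amount) for each (category, time bucket) group."""
    groups = defaultdict(list)
    for t in history:
        groups[(t.category, time_bucket(t.hour))].append(math.log1p(t.amount))
    stats = {}
    for key, values in groups.items():
        if len(values) >= min_count:  # skip groups too small to estimate a spread
            stats[key] = (statistics.mean(values), statistics.stdev(values))
    return stats


def is_suspicious(t, stats, z=3.0):
    """Flag t if its log amount lies outside mean +/- z*std of its own group."""
    key = (t.category, time_bucket(t.hour))
    if key not in stats:
        return False  # no baseline for this group; a real system might send it to review
    mean, std = stats[key]
    return abs(math.log1p(t.amount) - mean) > z * std


def _make(category, hour, amounts):
    return [Transaction(category, hour, a) for a in amounts]


# Toy history (illustrative only): small daytime nightclub tabs, large late-night ones.
history = (
    _make("nightclub", 9, [10, 12, 15, 20, 25])
    + _make("nightclub", 0, [200, 300, 400, 600, 800])
    + _make("burger", 23, [8, 10, 12, 15, 20])
)
stats = fit_group_stats(history)

print(is_suspicious(Transaction("nightclub", 9, 500), stats))  # True: unusual for 9 AM
print(is_suspicious(Transaction("nightclub", 0, 500), stats))  # False: typical at midnight
```

The same $500 amount is judged against a different baseline depending on its group, which is the property described above. A fraud-focused variant might check only the upper bound, since unusually small amounts are rarely of interest.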


Industry

  • Entertainment

Tools Used

  • Data Science Experience (DSX), TensorFlow, Keras, Spark, R, & Python

Data Science Techniques

  • Clustering & Deep Learning



Automatic tagging of potentially fraudulent transactions improves the productivity of the accounting department. Reducing fraud helps improve profits and customer satisfaction.
