Millions of transactions of different kinds take place in a variety of locations. The goal is to identify which ones are potentially fraudulent, e.g., a $2,000 purchase at a burger place at midnight. For each transaction, the amount, purchase location, date, time, and category, among other characteristics, are given. No ground-truth fraud labels are available.
Several clustering methods were explored, including k-means, k-prototypes, Deep Learning Autoencoders, and Grouped Gaussian Confidence Intervals with Log Transformation. The last of these was the most successful, as it could discriminate anomalies across different locations and times, e.g., a $500 transaction in a nightclub could be suspicious at 9 AM, but not at 12 AM.
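The grouped approach can be illustrated with a minimal Python sketch. It is not the project's actual code but one plausible reading of the technique: transactions are grouped by category and time of day, amounts are log-transformed, and anything outside mean ± z standard deviations of its own group is flagged. The field names (`category`, `hour`, `amount`), the day/night bucketing, the threshold `z=3.0`, and the minimum group size are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def time_bucket(hour):
    """Hypothetical bucketing: 22:00-05:59 counts as night, the rest as day."""
    return "night" if hour >= 22 or hour < 6 else "day"


def flag_anomalies(transactions, z=3.0, min_group_size=5):
    """Return transactions whose log-amount lies outside mean +/- z * std
    of their (category, time bucket) group.

    `transactions` is a list of dicts with assumed keys
    'category', 'hour' (0-23) and 'amount' (must be > 0 for the log).
    """
    # Collect log-amounts per group; non-positive amounts are skipped.
    groups = defaultdict(list)
    for t in transactions:
        if t["amount"] > 0:
            key = (t["category"], time_bucket(t["hour"]))
            groups[key].append(math.log(t["amount"]))

    # Fit a Gaussian interval per group on the log scale.
    bounds = {}
    for key, logs in groups.items():
        if len(logs) < min_group_size:
            continue  # too few points to estimate a spread
        mu, sigma = mean(logs), stdev(logs)
        bounds[key] = (mu - z * sigma, mu + z * sigma)

    # Flag transactions that fall outside their own group's interval.
    flagged = []
    for t in transactions:
        key = (t["category"], time_bucket(t["hour"]))
        if t["amount"] > 0 and key in bounds:
            low, high = bounds[key]
            if not low <= math.log(t["amount"]) <= high:
                flagged.append(t)
    return flagged
```

Because each group gets its own interval, a $500 nightclub charge is compared only against other nightclub charges from the same part of the day, which is how the same amount can be ordinary at midnight and unusual at 9 AM. The log transformation reduces the right skew typical of transaction amounts, so the Gaussian assumption is less badly violated.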
- Data Science Experience (DSX), TensorFlow, Keras, Spark, R, & Python
Data Science Techniques
- Clustering & Deep Learning
Automatic tagging of potentially fraudulent transactions improves the productivity of the accounting department. Reducing fraud helps improve profits and customer satisfaction.