KDD2024

Lessons Learned while Running ML Models in Harsh Environments

Pedro Bizarro

Abstract

A very large payment processor client once told us: 'if we are down for 5 minutes, we open the evening news - so don't screw up'. Processing billions of dollars per day, many financial institutions need to continuously fight organized crime in the form of transaction fraud, stolen cards, money laundering, account opening fraud, impersonation scams, phishing, and many other exotic and ever-changing attacks from organized crime groups worldwide. In fact, it is estimated that in 2023 global losses from fraud scams and bank fraud reached $485.6 billion. Beyond very good detection rates and very low false positive rates, financial institutions also need to maintain very high availability, very low latency, very high throughput, automatic fault tolerance, automatic scaling up and down, and more. In this talk we cover some lessons related to running ML models in harsh, mission-critical environments. We describe data issues, scale issues, ethical issues, system issues, security issues, compliance issues, business and regulation issues, and some architectural tradeoffs and architectural evolutions.