
Measuring Success in Machine Learning: Beyond Accuracy

The most successful companies of the last 10 years have harnessed the capabilities of Machine Learning (ML). Netflix and Spotify serve up movies and songs users want, when they want them. They’ve crafted an experience for each individual and have done so at scale. On the other hand, as you can see in this video, some companies have dramatically failed to launch ML-powered products. When starting a new ML project, it is extremely important to determine the impact predictions will have on your customers and to decide what level of predictive accuracy those customers will accept. To illustrate, we’ll look at several examples of ML projects and how the measure of accuracy should differ in each to provide the best customer experience.

Let’s consider a hypothetical classification algorithm that takes an image and tells the user whether it is, or is not, a jar of peanut butter. The algorithm is tested on a set of 100 images: 95 are of peanut butter and the other 5 are of raspberry jelly. If the algorithm predicts that every image is peanut butter, it will be correct 95% of the time, even though it never recognizes a single jar of jelly! This example makes it clear that managers cannot measure success solely by how often the algorithm gets the answer right. A good algorithm, in this case, needs to be measured on both classes: how often it correctly identifies peanut butter and how often it correctly rejects everything else. By establishing a metric of success that takes into account both positive and negative predictions (predicting peanut butter and not peanut butter), the manager will give the data scientist the direction they need to fine-tune the model.
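The peanut butter example can be worked through in a few lines of Python. This is a minimal sketch, not a recommended evaluation pipeline: the helper `confusion_counts` and the choice of balanced accuracy (the average of the accuracy on each class) as the class-aware metric are assumptions made here for illustration.

```python
# 95 peanut butter images (label 1) and 5 raspberry jelly images (label 0).
y_true = [1] * 95 + [0] * 5
# A degenerate "model" that answers peanut butter for every image.
y_pred = [1] * 100


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


tp, tn, fp, fn = confusion_counts(y_true, y_pred)

# Plain accuracy: share of all images labeled correctly.
accuracy = (tp + tn) / len(y_true)

# Per-class accuracy: how well each class is handled on its own.
peanut_butter_rate = tp / (tp + fn) if (tp + fn) else 0.0
jelly_rate = tn / (tn + fp) if (tn + fp) else 0.0

# Balanced accuracy weights both classes equally, however rare one of them is.
balanced_accuracy = (peanut_butter_rate + jelly_rate) / 2

print(f"accuracy:          {accuracy:.2f}")           # 0.95
print(f"balanced accuracy: {balanced_accuracy:.2f}")  # 0.50
```

Plain accuracy rewards the always-peanut-butter model with 95%, while the class-aware score drops to 50%, which is no better than guessing.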

Now, consider a non-profit that wants to use ML to scan Twitter posts, predict if someone is in distress, and then alert emergency services. The manager needs to take a look at this project and establish the risks of incorrect predictions. After talking with officials from the emergency services department, the manager learns that they do not trust social media and want to minimize the number of times they are called incorrectly. With this knowledge in mind, the manager needs to establish a metric that focuses mainly on how many of the model's alerts are true positives, a measure known as precision. In other words, the model should be optimized to predict that an individual is in distress only when they truly are in distress. Unlike the example above, the model does not need to be tuned to predict the negative case (not in distress) with high accuracy; it needs to be right whenever it does notify emergency services.
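The metric described here is usually called precision: of all the alerts the model raises, the fraction that were correct. The sketch below compares two hypothetical models using made-up counts (they are not from any real system) to show how a model that raises fewer alerts can be the better fit for this project, even though it misses more posts.

```python
def precision(tp, fp):
    """Of all alerts raised, the fraction that were truly people in distress."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Of all people truly in distress, the fraction the model alerted on."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Illustrative counts only. Both models see the same 50 posts from people
# who are truly in distress, among many posts from people who are not.
# Model A alerts aggressively; Model B alerts only when very confident.
model_a = {"tp": 40, "fp": 60, "fn": 10}
model_b = {"tp": 20, "fp": 2, "fn": 30}

for name, m in (("A", model_a), ("B", model_b)):
    print(
        f"Model {name}: "
        f"precision={precision(m['tp'], m['fp']):.2f}, "
        f"recall={recall(m['tp'], m['fn']):.2f}"
    )
```

With these counts, Model A is wrong on most of the calls it triggers, while Model B is right on 20 of its 22 alerts. Given what the emergency services officials asked for, Model B is the better match, with the trade-off that it catches fewer of the people who need help.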

A final example comes from the medical field. There has been discussion about using ML to identify cancer from images of patient scans. In this example, the worst case is sending home a patient who has been identified as cancer-free when, in fact, they do have cancer. This is called a false negative: predicting ‘no cancer’ incorrectly. The share of actual cancer cases the model misses in this way is the false negative rate. The data scientists would be less worried about predicting cancer for those who do not have it (those patients will undergo follow-up tests as a precaution), and must focus on tuning the model so that doctors are not sending patients home incorrectly. The stakes are high in this case, and the managers deciding whether to roll out this ML model need to establish a proper metric for success early in the process.
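One common way to tune a model toward fewer missed cases is to lower the score threshold at which it predicts ‘cancer’. The sketch below uses made-up labels and scores (assumptions for illustration, not real patient data) to show the false negative rate, meaning the share of actual cancer cases predicted as ‘no cancer’, falling as the threshold drops, at the cost of more follow-up tests.

```python
def false_negative_rate(y_true, scores, threshold):
    """Share of actual positives that score below the threshold (missed cases)."""
    positives = sum(y_true)
    missed = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    return missed / positives if positives else 0.0


def false_positive_count(y_true, scores, threshold):
    """Number of healthy patients flagged for follow-up at this threshold."""
    return sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)


# Hypothetical data: 1 = has cancer, 0 = does not; scores are model outputs in [0, 1].
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.90, 0.70, 0.40, 0.20, 0.60, 0.30, 0.10, 0.05]

for threshold in (0.50, 0.15):
    fnr = false_negative_rate(y_true, scores, threshold)
    fps = false_positive_count(y_true, scores, threshold)
    print(f"threshold={threshold:.2f}: false negative rate={fnr:.2f}, false positives={fps}")
```

On this toy data, the 0.50 threshold misses half of the cancer cases, while the 0.15 threshold misses none and flags one additional healthy patient for follow-up. Where to set the threshold is exactly the kind of decision the success metric should drive.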

KPIs (key performance indicators) and OKRs (objectives and key results) are nothing new in business. Management has long been responsible for creating the proper incentives and goals for company success. With the emergence of AI and ML, managers now need to arm themselves with an understanding of these new metrics and how they can be used to create optimal ML solutions for their customers.