How to teach artificial intelligence to say, “I’m not sure”
One of the biggest challenges in advanced analytics is developing mechanisms to determine how reliable the decisions made by algorithms are. Now, research by BBVA’s Artificial Intelligence Factory proposes a new method to make machine learning models capable of expressing the uncertainty in their predictions more clearly. It is an exploratory approach to making artificial intelligence more transparent, measuring the reliability of its predictions and fine-tuning the accuracy of its results.
Machine learning uses large volumes of data to make predictions about what will occur in the future, based on patterns extracted from past examples. However, a higher or lower degree of certainty always exists for each of these predictions, and this certainty can decrease when the data being used is especially complex.
In this context, new initiatives are emerging that attempt to mitigate the problem by making the algorithms capable of capturing the uncertainty in their predictions. This way, if they are not able to offer precise information, they can show a response that reflects the possible ambiguity.
This is the approach proposed by new research carried out by a team from BBVA’s Artificial Intelligence (AI) Factory, in collaboration with the University of Barcelona. The results were presented at the NeurIPS conference in Vancouver (Canada), one of the most reputable machine learning events in the scientific community. The solution proposed by the researchers is a method that allows the machine learning model to capture the uncertainty present in the variable it is trying to predict, so that the result it produces reflects reality more precisely, or even indicates when it simply cannot make an accurate prediction.
Dealing with uncertainty
“The problem is that these kinds of systems do not normally provide us with information on the uncertainty underlying their prediction processes,” explains Axel Brando, Data Scientist at BBVA’s AI Factory and one of the authors of the research. In other words, they are trained to always provide a single solution, even when there are several equally probable options, and crucial information can thus be lost. “By default, most predictive systems are usually designed in a way that they cannot offer an ‘I don’t know’ or ‘I’m not sure’ as an answer,” he adds. The researcher explains that this is problematic when predictive models are applied to risk scenarios, where the cost of making a wrong prediction is high. In these situations it is preferable not to make automated predictions “when the system knows that it is very likely that they won’t be correct.”
“When we develop an automated predictive system, we know that a certain percentage of the predictions will be wrong, so most of these predictive algorithms try to offer a value that minimizes errors,” explains Brando. In complex situations, these systems tend to assume a series of restrictions as a way of simplifying the reality they are trying to model. For example, it is often assumed that the data they want to predict follow a normal distribution, and therefore that the data are symmetric.
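To see why that assumption matters, consider a toy illustration (not taken from the paper; the numbers are invented): when the possible outcomes split into two distinct groups, the single value that minimizes the squared error is their average, a figure that falls between the groups and rarely matches what actually happens.

```python
# Minimal sketch: why a squared-error, "one value fits all" prediction can mislead
# when outcomes are not symmetric around a single centre. Illustrative numbers only.
import numpy as np

rng = np.random.default_rng(0)

# Simulated, equally probable future outcomes: two clusters of monthly spending.
outcomes = np.concatenate([
    rng.normal(300, 20, 500),   # e.g. a typical month
    rng.normal(900, 40, 500),   # e.g. a month with a large recurring payment
])

best_mse_prediction = outcomes.mean()   # what a squared-error model converges to
print(f"MSE-optimal single prediction: {best_mse_prediction:.0f}")

# How often does reality land anywhere near that single value?
near = np.abs(outcomes - best_mse_prediction) < 50
print(f"Share of outcomes within 50 of it: {near.mean():.1%}")
```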
"Most predictive systems are usually designed in a way that they cannot offer an “I don’t know”, or “I’m not sure” as an answer”
On this point, the authors ask: “Is it more important to always predict with the lowest possible level of error, or is it preferable to predict selectively and only when we are confident in advance that the level of error will be small?”
Along these lines, the solution proposed by the team of scientists was to design a deep learning model capable of estimating the complexity of the distribution of possible predictions. Thanks to this model, the person who makes decisions based on the results has a more complete picture with which to decide which value is the best one to predict. “Our goal is to transmit information to the person who is going to work with the model, so that they are aware of how reliable each possible prediction is. With this approach in mind, we can develop analytical models that are capable of abstaining from making a prediction if it is not sufficiently reliable,” adds Brando.
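A minimal sketch of this “abstain when unsure” idea is shown below. It assumes a model that returns a set of plausible values for each case; the dispersion measure, the threshold and the function name are illustrative choices, not taken from the paper.

```python
# Hypothetical sketch of selective prediction: commit to a value only when the
# predictive distribution is narrow enough, otherwise answer "I'm not sure".
import numpy as np

def predict_or_abstain(samples, max_spread=100.0):
    """Return a point prediction, or None when the predictive spread is too wide."""
    spread = np.percentile(samples, 90) - np.percentile(samples, 10)
    if spread > max_spread:
        return None                      # abstain: hand the case to a human or a fallback
    return float(np.median(samples))     # confident enough to commit to a value

rng = np.random.default_rng(1)
confident_case = rng.normal(500, 15, 1000)                      # narrow, unimodal
ambiguous_case = np.concatenate([rng.normal(300, 20, 500),
                                 rng.normal(900, 40, 500)])     # wide, multimodal

print(predict_or_abstain(confident_case))   # a number close to 500
print(predict_or_abstain(ambiguous_case))   # None -> abstain
```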
More precise expense predictions
Addressing the modeling of uncertainty is fundamental in any predictive system that entails a certain level of risk. For this reason, it is a scientific approach that is key to the design of any product or service in the financial sector. In fact, the issue is becoming highly relevant in the scientific community and was one of the topics that sparked the most interest in recent editions of NeurIPS.
The problem, as well as the solution proposed by the researchers, can be illustrated with a real example from the financial sector: the forecasting of monthly expenses.
Visual representation of how the model shows a distribution of the different possible predictions.
A customer’s monthly expense data can be used to represent each customer’s history of expenses as a time series, where the first value corresponds to the total amount spent in the first month, the second value to the total amount spent in the second month, and so on. In this context, it is possible to create a predictive system that, given a customer’s expense history over several months, predicts how much they will spend next month. As illustrated in the figure above - and as occurs in many real-life problems - given the same history of expenses, each customer could behave differently in the future. The question is: which of these possible expenses should the predictive system show?
In fact, all of the spending options shown in the figure are possible scenarios, so offering a single numerical value as the result does not make sense for this problem. Moreover, the distribution of these possible predictions shows that, in this specific case, the results are organized into three different groups - easily recognized as the three peaks in the distribution.
Therefore, in this case, being aware of this distribution (or multimodal pattern) is essential, as it provides key, highly useful information for adapting the prediction - information we would not have if we modeled the uncertainty in a restrictive way, for example by assuming a normal distribution. It is precisely because the developed model is flexible that it can detect the distribution of predictions without imposing strong restrictions.
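As an illustration of this point (with invented numbers, and using a simple Gaussian mixture purely for demonstration rather than the model proposed in the paper), a flexible fit recovers the three groups of plausible expenses, while forcing a single normal distribution collapses them into one average that sits between the peaks:

```python
# Illustrative comparison: a flexible (mixture) fit vs. a single-normal assumption
# on multimodal "plausible expense" values. Not the method from the paper.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
plausible_expenses = np.concatenate([
    rng.normal(250, 15, 300),    # group 1: a low-spend month
    rng.normal(600, 25, 300),    # group 2: a typical month
    rng.normal(1100, 40, 300),   # group 3: a month with a big one-off payment
]).reshape(-1, 1)

flexible = GaussianMixture(n_components=3, random_state=0).fit(plausible_expenses)
print("Group centres:", np.sort(flexible.means_.ravel()).round())   # ~[250, 600, 1100]

rigid_prediction = plausible_expenses.mean()                          # single-normal view
print("Single-normal prediction:", round(float(rigid_prediction)))    # ~650, between the peaks
```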
This example outlines one possible application of uncertainty modeling. To understand the technical aspects of this work in more depth, you can watch a three-minute video summary, read the scientific paper presented at the conference, or experiment yourself with the implementation of the model proposed in the article (UMAL: Uncountable Mixture of Asymmetric Laplacians) on different publicly available problems.
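For readers curious about the building block behind UMAL: the model combines asymmetric Laplacian distributions, whose negative log-likelihood is closely related to the quantile (“pinball”) loss. A minimal sketch of that likelihood follows; the full architecture, the mixture over the asymmetry parameter and the training procedure are described in the paper and its released implementation.

```python
# Sketch of the asymmetric Laplacian negative log-likelihood, the density that
# underlies quantile regression and that UMAL mixes over its asymmetry parameter tau.
import numpy as np

def asymmetric_laplacian_nll(y, mu, b, tau):
    """Negative log-likelihood of y under ALD(mu, b, tau), with 0 < tau < 1."""
    u = (y - mu) / b
    check_loss = u * (tau - (u < 0))          # the "pinball"/quantile loss term
    return -np.log(tau * (1.0 - tau)) + np.log(b) + check_loss

# An asymmetric tau penalizes over- and under-prediction differently:
print(asymmetric_laplacian_nll(y=900.0, mu=600.0, b=50.0, tau=0.9))  # under-prediction, heavily penalized
print(asymmetric_laplacian_nll(y=900.0, mu=600.0, b=50.0, tau=0.1))  # same residual, penalized less
```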
The modeling of uncertainty is an active research area at BBVA’s AI Factory, developed by José A. Rodríguez Serrano and Axel Brando, the co-authors of the research presented at NeurIPS. “This line of work represents an example of how our pragmatic research approach, geared toward BBVA’s analytical challenges, has produced a result that can be used by teams to offer our customers better products and services,” explains Rodríguez, Research Director. In fact, the results and conclusions are already being transferred to the design of new products and services, using data with which BBVA is currently working.