Similarity Cracks the Code Of Explainable AI
By Emily Webber
There is nothing more basic to thought and language than our sense of similarity; our sorting things into kinds.
The Renaissance of Artificial Intelligence
Welcome to the rebirth of AI. Computational experts have taken inspiration from cognitive neuroscience for decades, but technical advances leading to the proliferation of data, efficient storage methods for it, faster processing techniques, and accessible scripting languages have brought classic algorithms to the forefront of cutting edge technology. What began as stepwise regression analysis on proprietary software has become a booming industry at the intersection of software development, statistical analysis, domain expertise, and pure creativity. Thousands of companies churn out data-driven applications for businesses and users, with growth in the industry overall expected to skyrocket over the coming decade.
A small set of algorithms has emerged as having the highest potential to deliver on the promises of machine learning, but these come with significant hurdles. Tree-based methods, ensemble models, and continuous learning systems tantalize the would-be data scientist with fascinating optimization strategies. These algorithms are by-products of academia, industry, and homegrown concoctions, and they are being developed, tested, and applied in novel settings by the day. In particular, artificial neural networks have the mathematical flexibility to learn extremely complex functions, and recent releases by major hardware manufacturers may provide the material environment necessary for applications that rely on timely compute.
Business Strategy and AI Converge At Explainability
The missing volley in this varsity game of innovation-as-usual is explainability. Business users need to know why their customers are predicted to take a specific action, such as churn, in order to respond effectively. It is not enough for the engines of AI optimization to produce extremely accurate predictions. Feature importance at the global level of modeling will not answer the crucial person-by-person questions that drive business decisions. To stay relevant and give business analysts something they can act on, data scientists must be able to explain their predictions.
The scene is easy to imagine. A mid-level analyst in marketing is given the task of designing a campaign for a product release. She’s excited about the new data science team her company just acquired, so she reaches out to them for support. Together, they pull rows and columns of consumer data from the database and join them with historical records of product buying history. Equipped with these clean and labeled data, they run the files through an optimized pipeline that the data science team has spent months developing. The pipeline automatically identifies hyperparameters and modeling specifications tuned for higher precision at the cost of lower recall, because she wants to send ads only to people who are very likely to buy.
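The precision-for-recall trade in that scenario can be made concrete with a small sketch. The scores, outcomes, and thresholds below are hypothetical values chosen for illustration, and the function name is an assumption, not part of any particular pipeline. Raising the score threshold targets fewer people: precision rises while recall falls.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when every record scoring >= threshold is targeted."""
    targeted = [label for score, label in zip(scores, labels) if score >= threshold]
    true_positives = sum(targeted)      # targeted customers who actually bought
    total_buyers = sum(labels)          # all customers who actually bought
    precision = true_positives / len(targeted) if targeted else 0.0
    recall = true_positives / total_buyers if total_buyers else 0.0
    return precision, recall


# Hypothetical model scores and purchase outcomes (1 = bought), illustration only.
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

broad_campaign = precision_recall(scores, labels, threshold=0.5)
narrow_campaign = precision_recall(scores, labels, threshold=0.75)
print("broad:", broad_campaign)
print("narrow:", narrow_campaign)
```

In this toy data the narrower campaign reaches only buyers but misses one of them, which is the trade the analyst is asking for.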
After weeks of generating reports and debugging Python scripts, the data team is ready to present their deliverables. They have a ranked list of customers she should target, sorted by expected value of revenue. She looks through the list, and asks why someone living in Lincoln Park is more likely to purchase her couch than someone in the Gold Coast.
The data science team is silent. They can tell her how the model works. They know the math behind the optimization strategy for KMeans Clustering. They even have a proprietary version of Bayesian Optimization implemented that very efficiently finds hyperparameters. But they do not have the capacity to explain the local feature importance for each individual record.
This is simply not a tool provided in open source software. The marketing analyst politely nods. She knows she’ll have to fill in the details herself. She wonders if she’ll ever reach out to the data team again.
Data scientists and would-be AI practitioners require explainability to build applications that business experts can rely on for decisions. Explainable AI is the make-or-break for applied data science. Without it, even the most optimized pipeline will not add value to a business user.
The Race for Explainable AI Stumbles on “The Why”
The obstacle that data science teams must navigate is the age-old trade-off between predictive performance and explainability. Tree-based methods have historically been the easiest algorithms to digest for people who have a mathematical background but are not specialists in machine learning. Ensemble methods built on top of trees tend to perform better in terms of accuracy, but the complexity of the ensemble can dramatically reduce explainability. A single record frequently undergoes too many transformations to preserve any interpretation of the factors driving the prediction, and the mathematical engine behind the ensemble can be too intimidating for a generalist to happily absorb.
Within data science communities, artificial neural networks have typically been regarded as the Holy Grail. To a machine learning audience, the algorithm is endlessly enticing for its conceptual appeal, mathematical complexity, and predictive accuracy. It is a mark of distinction in a technical crowd, a signal of one’s expertise. Layers upon layers of “neurons” are activated using variations on logistic functions, and smart “gates” control the passing of numbers to and from each layer. The back-propagation algorithm computes the gradient, the rate of change of the prediction error with respect to each weight, at every node. Using extensive networks with many layers of neurons is called deep learning.
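For readers who want to see what a logistic activation and its gradient look like, here is a minimal sketch of a single neuron trained against a squared-error loss. The weights, inputs, and target are hypothetical numbers, and a real network chains this same chain-rule step backward through many layers.

```python
import math


def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, bias, inputs):
    """Output of one logistic neuron."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def loss(weights, bias, inputs, target):
    """Squared error between the neuron's output and the target."""
    return 0.5 * (forward(weights, bias, inputs) - target) ** 2


def gradient(weights, bias, inputs, target):
    """Chain rule: dL/dw_i = (y - t) * y * (1 - y) * x_i, and dL/db = (y - t) * y * (1 - y)."""
    y = forward(weights, bias, inputs)
    delta = (y - target) * y * (1.0 - y)
    return [delta * x for x in inputs], delta


# Hypothetical values, for illustration only.
weights, bias = [0.5, -0.3], 0.1
inputs, target = [1.0, 2.0], 1.0
grad_w, grad_b = gradient(weights, bias, inputs, target)
print(grad_w, grad_b)
```

Each gradient entry says how the error would change if that one weight moved slightly. It says nothing directly about why a given customer received a given score, which is the gap the article goes on to describe.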
The problem with deep learning and most other AI methods is that they are effectively uninterpretable. While predictive accuracy can be manufactured through experimentation with more layers and nodes, the addition of each logistic puzzle piece renders the data passing through unreadable by the end user. Yet it is the factors that are of interest in a business setting. The data driving the algorithm is what gives it value to our marketing analyst. She needs to know how the factors are being computed for each record.
Similarity-Based Methods Provide Record-Level Granularity
As a result, many companies experiment with each of these algorithms, trying to maintain the predictive accuracy of neural networks without losing the accessibility and explainability of tree-based methods. The national defense research agency DARPA, for example, last year released a study of the state of the field. Most techniques at best displayed a limited explanation of internal logic and provided almost no capacity to explain classification error. Other researchers have experimented with induction methods that provide explanations for other black box classification systems.
While some are able to enhance the interpretability of their models without sacrificing accuracy, none of these methods provides feature importance at the local level for every prediction. The ability to describe how the model behaves for each individual person was simply not a tool on the marketplace until the commercial launch of simMachines last year. It is this “Why” that will productize machine learning for a business audience.
Marketing Has Been Waiting For “The Why”
Data scientists who specialize in unsupervised machine learning are familiar with clustering algorithms, those that take a data set and break it up into natural groups. There are two key problems with most clustering methods. First, standard open source libraries require that the user specify “K,” the number of clusters they are trying to find. In practice this information is not known, and the exercise of clustering is undertaken in order to discover exactly this natural breakdown of the data set. Second, most cluster builds are extremely compute intensive. The algorithm must visit, evaluate, and reassign each record in the data set on every pass until Lloyd’s algorithm converges to a local minimum. If a data set is large, as most production ones are, then a single cluster build will be very expensive. Without knowing the initial K parameter, most systems iterate through a wide range of Ks before settling on the best-scoring one. This process is extremely expensive and somewhat suspect; selecting K in practice is more of an art than a science. Even after all of this computation, most systems are only able to describe features for the entire cluster, and like the data science team, remain silent on the factors behind each individual record.
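The cost described above is easier to see in code. The following is a minimal, unoptimized sketch of Lloyd’s algorithm with a sweep over candidate values of K. The data points and the evenly spaced seeding rule are assumptions made for illustration; production libraries use more careful initialization.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, k, max_iterations=100):
    """Lloyd's algorithm. Returns (centroids, assignments, inertia)."""
    # Naive deterministic seeding (an assumption): k evenly spaced input points.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: every record is compared against every centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no record changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    inertia = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, inertia


# Hypothetical two-dimensional customer data, for illustration only.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

# Without a known K, the whole build is repeated for each candidate value.
inertia_by_k = {k: lloyd(points, k)[2] for k in range(1, 5)}
for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 3))
```

Every pass touches every record for every centroid, and the sweep multiplies that work by the number of candidate Ks. The output is still a set of cluster-level centroids, not an explanation for any single record.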
In marketing, these clusters are known as “segments.” A marketing analyst thinks of her customer base as static segments, or groups of people who appear to exhibit similar behavior and characteristics. Most campaigns, advertisements, and specials in the last decade have been created to target segments of people, with natural optimizations for high spenders. The problem with segment analysis today is that segments are static and poorly reflect the actual propensities of the individuals being grouped. A segment is typically created through summary statistics, with a business intelligence graphic presented in colorful blocks and concise verbiage. Marketing analysts will dream about the journey of their segment over time, and try to construct communications that match those movements.
In reality, people move from segment to segment. Changes coming from the economy, evolving technology, weather, industry trends, and the natural progression of the buyer’s journey mean that in order to stay relevant, segments must constantly adapt to the business of the moment. This implies that in order for a segment to be relevant and actionable for marketing, it must necessarily be dynamic. Dynamism is useful for applications that rely on real-time data, because the insights drawn from them will be more representative of the living customer body.
The missing link in marketing applications available right now is “The Why.” Without it, most analysts are left in exactly the same scenario as described earlier. Even if they are lucky enough to have quality data and a seasoned data science team that can provide accurate predictions, they still lack the record-level granularity and explainability that similarity-based machine learning makes achievable. On the market today, simMachines is the only provider of record-level explainability.
simMachines Builds Explainable AI Applications For Marketing
Similarity is an organic conceptual framework for machine learning models because it describes much of human learning. As cognitive mammals, humans often group feelings, ideas, activities, and objects into what Quine called “natural kinds.” While describing the entirety of human learning is impossible, the analogy does have an allure. Amos Tversky’s 1977 proposal receives active hypothesis testing to this day.
Computationally, similarity has been a field of research for decades. Work by simMachines’ Founder and CTO, Arnoldo Mueller (2009), demonstrated that compression could serve as a viable mechanism for scaling what are known as “sketches,” or bit representations of the similarity between two objects. Recent advances by Mic, Novak, and Zezula (2016) demonstrate that using already known information at the point of data read can greatly reduce processing time and memory usage. Distance functions can be tuned for optimal performance on each business case. Facebook last year released its own library using similarity-based methods for search, data retrieval, and image recognition. The problem most similarity researchers come up against is scalability. Standard SQL and relational approaches are not able to build schemas that meet the needs of most production applications, where the dimensionality is too high for element-wise comparison to be a viable possibility.
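To give a feel for what a bit sketch is, here is a generic illustration that is not the method from either cited work. Each bit records which of two reference points (“pivots”) an object is closer to, and two sketches are then compared by Hamming distance instead of comparing the objects element by element. The pivot pairs and customer points are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sketch(point, pivot_pairs):
    """Pack one bit per pivot pair into an int: 1 if the point is closer to the second pivot."""
    bits = 0
    for first, second in pivot_pairs:
        closer_to_second = squared_distance(point, second) < squared_distance(point, first)
        bits = (bits << 1) | int(closer_to_second)
    return bits


def hamming(a, b):
    """Number of bit positions in which two sketches differ."""
    return bin(a ^ b).count("1")


# Hypothetical pivot pairs and records, for illustration only.
pivot_pairs = [
    ((0, 0), (10, 0)),
    ((0, 0), (0, 10)),
    ((0, 0), (10, 10)),
    ((10, 0), (0, 10)),
]
customer_a = sketch((1, 2), pivot_pairs)
customer_b = sketch((2, 1), pivot_pairs)
customer_c = sketch((9, 8), pivot_pairs)

print(hamming(customer_a, customer_b))  # nearby records: few differing bits
print(hamming(customer_a, customer_c))  # distant records: many differing bits
```

Comparing two sketches is a single XOR and a bit count regardless of how many dimensions the original objects had, which is why compact sketches are attractive when element-wise comparison does not scale.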
simMachines has solved the scalability problem for similarity-based machine learning. Our proprietary technology employs similarity-based searching methods with efficiency as yet unmatched on the market. We abstract away from the business user the need to select a K, and we provide the weighted factors behind every prediction, for every record. Marketing as an industry is already conditioned to think about its customers in terms of similarity. simMachines extends this mental framework, using it as a lens through which marketing can view cutting edge machine learning.
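The general idea of a similarity-based prediction that carries its own per-record explanation can be sketched with a plain nearest-neighbor classifier. This is a textbook-style illustration under stated assumptions, not simMachines’ proprietary method. The feature names, records, and the closeness-based weighting rule are all hypothetical.

```python
def nearest_neighbors(query, records, k):
    """Return the k (features, label) pairs closest to the query (Euclidean distance)."""
    def distance(features):
        return sum((q - f) ** 2 for q, f in zip(query, features)) ** 0.5
    return sorted(records, key=lambda record: distance(record[0]))[:k]


def predict_with_factors(query, records, feature_names, k=3):
    """Majority vote of the k nearest records, plus a per-feature weight for this query."""
    neighbors = nearest_neighbors(query, records, k)
    votes = {}
    for _, label in neighbors:
        votes[label] = votes.get(label, 0) + 1
    prediction = max(votes, key=votes.get)
    # Only the neighbors that voted for the winning label support the explanation.
    supporters = [features for features, label in neighbors if label == prediction]
    # Mean absolute gap per feature between the query and its supporters.
    gaps = [
        sum(abs(q - features[i]) for features in supporters) / len(supporters)
        for i, q in enumerate(query)
    ]
    # Hypothetical weighting rule: the smaller the gap, the larger the weight.
    closeness = [1.0 / (1.0 + gap) for gap in gaps]
    total = sum(closeness)
    factors = {name: c / total for name, c in zip(feature_names, closeness)}
    return prediction, factors


# Hypothetical customer records scaled to [0, 1], for illustration only.
feature_names = ["days_since_visit", "spend"]
records = [
    ((0.10, 0.90), "buy"), ((0.20, 0.80), "buy"), ((0.15, 0.50), "buy"),
    ((0.90, 0.20), "skip"), ((0.80, 0.10), "skip"), ((0.85, 0.30), "skip"),
]
prediction, factors = predict_with_factors((0.15, 0.70), records, feature_names)
print(prediction, factors)
```

The returned weights are specific to the one record being scored: they report on which features this customer most resembles the neighbors behind the prediction. No K for clustering is chosen anywhere, although this sketch still takes a neighbor count as a parameter.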
Similarity-based methods for machine learning and artificial intelligence provide the missing link for Explainable-AI applications, and marketing is well suited to take advantage of this advancement.