Understanding the Basics of Explainable AI

Summary

In this post, we build on the lessons to be learned from the simple contour map example discussed in our previous post. In particular, we begin to address the issue of interpretable, or explainable, AI in an easily understood form, and we lay the groundwork for pursuing this and related issues in the posts planned to follow this discussion. We will also mention, in passing, some early results of work in computer-generated sound and music, as a prelude to posts to come.

Introduction

It has been some time since the last post appeared, dealing with the contour map example of a gradient descent algorithm. This simple concept is commonly used in “Machine Learning” (ML) and “Deep Learning” in AI, and it helps us easily understand the problems encountered in using any such approach, including “back-propagation” and other algorithms that accomplish the same purpose (minimizing an error measure over a suitable network architecture).

There is more that can be illustrated by this simple contour map example, so in this post we shall continue the discussion started in the previous one. In particular, we will begin to move more deeply into the currently unresolved problems associated with attempts to interpret or explain results from the use of ML and Deep Learning as they are implemented today, based on the artificial neural network as the (perhaps unexplainable) network architecture of choice.

What More Can We Learn from the Contour Map Example?

Part 1 – Comments on “Explainable AI”

Let us recall that in the contour map example of the previous post, the problem was simply to use a common contour map as a means of deriving an algorithm to minimize an error measure. In that example, the error measure was a non-negative measure of elevation above or below mean sea level. The algorithm would be a rule-based procedure for moving through the contour map in a very specific way, from a very specific starting point, with the objective of arriving at a geographical location at zero elevation, or mean sea level, on the earth’s surface.

The algorithm that we developed for the contour map, without actually using any mathematics, was analogous to the gradient-type algorithms commonly used in computer implementations of Machine Learning (ML) and Deep Learning today. We noted, in the contour map example, that many of the common challenges our algorithm would have to deal with are exactly the kind of challenges that a comparable computer algorithm must deal with. The major difference, however, was that in the simple contour map example we were moving through a well-understood 3-dimensional space (architecture, or topography), using parameters whose meaning we understand and can correctly interpret (for example, latitude, longitude, and a non-negative elevation above or below mean sea level). In most situations of interest in ML and Deep Learning using artificial neural networks, by contrast, the computer must deal with a much higher-dimensional, mathematically defined architecture (or topography), which generally takes the form of a perhaps randomly chosen, and poorly understood, artificial neural network.

The choice of the actual neural network architecture to be used is normally based on “try it and see what happens” reasoning. We pointed out that when we don’t know what problem we are actually solving, we surely have little right to expect that the solution we find (if there is one) will be the one we were hoping for.

Part 2 – Why and How we can Understand and Interpret the Results of the Contour Map Problem

Let us therefore recall the advantages that we had in understanding the contour map example. First, there were three parameters naturally involved in our problem, if we take the trouble to identify them; they relate to the three spatial dimensions that we are familiar with. The problem that we posed amounted to starting from a known point on the earth (we chose some point in the Rocky Mountains, for example), whose three geographical parameter values were known so that it could be found on a simple contour map, and then finding an algorithm that would allow us to move through this 3-dimensional parameter space in such a way as to ideally reduce one of the parameters (the non-negative elevation above or below mean sea level) to a minimum. Since the error measure is non-negative, the minimum value possible is zero, and so this minimum elevation corresponds to mean sea level. We then suggested a simple algorithm for adjusting the other two parameters in an attempt to achieve our goal, if that might be possible. The basis for determining our route of descent would be the gradient-descent-type algorithm that we would invent to solve our problem.

There are many possible systems of coordinates that could have been used for this mathematical problem, but we can think of them as analogous (on a spherical earth) to latitude, longitude, and perhaps the square of the elevation above or below mean sea level (a non-negative measure). These three parameters are sufficient to define our current location at any point on the surface of the earth, at or above mean sea level. Hence the parameters that we chose were understood by us, fully interpretable by us, and able to supply complete information about our location on the contour map at any point during the application of our gradient-type descent algorithm. The algorithm showed us how to move through the topography of the earth from our starting point in such a way that we would ideally (but not necessarily) be reducing our elevation above mean sea level at each step, or iteration, of the algorithm, until we reached the least elevation that the algorithm would allow us to reach (ideally, zero elevation above mean sea level). If the algorithm could successfully handle all of the possible challenges to gradient descent discussed in the earlier post, then, since in this particular case there is always a solution to the original problem, we would finally arrive at some point on the earth’s surface that is at zero elevation. Further, we would have sufficient information to say exactly how we got there, and we could understand why we had arrived at the particular solution (geographical location) that we found.
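To make this concrete, here is a minimal sketch, written in Python purely for illustration, of the kind of “walk downhill” procedure we have in mind. The terrain function, starting point, and step size are all invented for this example (they are not taken from the previous post); the point is that every parameter has a meaning we can read directly off the map, and every step of the route can be logged and explained.

# A minimal sketch of a gradient-descent-style "walk downhill" on a made-up
# elevation function.  The two parameters are fully interpretable (think of
# them as local east-west and north-south coordinates in kilometres), and the
# error measure is the squared elevation, so the minimum of zero corresponds
# to mean sea level.

import math

def elevation(x, y):
    """Hypothetical terrain: metres above (+) or below (-) mean sea level."""
    return 500.0 * math.sin(0.01 * x) * math.cos(0.013 * y) + 0.002 * (x - 200) ** 2

def error(x, y):
    """Non-negative error measure: squared elevation, zero exactly at sea level."""
    return elevation(x, y) ** 2

def numerical_gradient(f, x, y, h=1e-3):
    """Finite-difference gradient, standing in for reading slopes off the contour map."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy

# Start from a known, interpretable location and walk downhill.
x, y = 300.0, 150.0          # hypothetical starting point "in the mountains"
step = 0.05                  # step size for each move

route = [(x, y, elevation(x, y))]   # every step of the route can be logged and explained
for i in range(2000):
    gx, gy = numerical_gradient(error, x, y)
    x, y = x - step * gx, y - step * gy
    route.append((x, y, elevation(x, y)))
    if abs(elevation(x, y)) < 0.5:  # close enough to mean sea level for our purposes
        break

print(f"Stopped after {len(route) - 1} steps at x={x:.1f}, y={y:.1f}, "
      f"elevation={elevation(x, y):.2f} m")

If the walk stalls somewhere above sea level, we can still say exactly where we are and why, because the parameters themselves are interpretable; that is precisely the property we lose in the neural network setting discussed below.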

In other words, we would know everything that the parameters could tell us, and since we know how to interpret the parameters at any point, we can fully and correctly explain how we got to our solution (our “route”), as well as understand the full meaning of the parameters whether or not we reach an actual solution. In particular, we will know where we are, and why, even if our algorithm does not get us to one of the infinitely many solutions at mean sea level.

Now, what we have just explained is precisely what is normally missing when carrying out a mathematical gradient descent process (minimizing a non-negative error measure) in which we do not understand the meaning or functioning of all of the parameters in our computer-based mathematical model, representing perhaps a (generally randomly chosen) artificial neural network.

In other words, we normally have no way of interpreting the significance (or lack thereof) of the often large number of parameters belonging to, at least, the hidden layers of the neural network architecture we have chosen to work with in order to solve the error-minimization problem. There is a great deal of mathematics involved in a proper discussion of this, especially since the usual artificial neural network architecture combines both linear and non-linear mathematical elements in what are usually randomly chosen (essentially meaningless) ways, which nevertheless define some possibly high-dimensional, parameterized mathematical architecture (network, system, or topography), no matter how they are chosen and interconnected.

We should mention, in passing, that in most reasonable cases any such simple mathematical network or system represents what has been referred to as a “universal approximator”, meaning that its parameters can be adjusted by adaptive processes, such as gradient descent algorithms, in order to minimize a non-negative error measure to whatever extent is mathematically possible for the chosen network or system and for the chosen error measure. In other words, there is nothing unique about an artificial neural network. The concept exists because it could seemingly be “fudged” to act in a way that was observed, at the time, to simulate the behavior of a real neuron in our brain or nervous system (which we still do not understand well at all).
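As an illustration (a sketch only, assuming NumPy is available, and not any particular published model), here is such a “universal approximator” in miniature: a one-hidden-layer network with randomly initialized parameters, adjusted by gradient descent (back-propagation) to fit a simple curve. The error does indeed shrink, but nothing in the procedure tells us what any individual hidden weight means.

# A tiny one-hidden-layer network fitted by gradient descent (back-propagation).
# It will happily approximate the target curve, but the fitted hidden weights
# have no latitude/longitude-style meaning we can point to: they are whatever
# numbers the descent process landed on, starting from a random initialization.

import numpy as np

rng = np.random.default_rng(0)

# Training data: inputs x and a target curve we happen to want to approximate.
x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(x)

# Randomly initialized parameters of a 1-16-1 network (tanh hidden layer).
W1 = rng.normal(size=(1, 16));  b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1));  b2 = np.zeros(1)

lr = 0.02
for step in range(10000):
    # Forward pass: linear, nonlinear, linear.
    h = np.tanh(x @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y                      # the non-negative error measure is mean(err**2)

    # Backward pass (back-propagation): gradients of the mean squared error.
    n = len(x)
    dpred = 2 * err / n
    dW2 = h.T @ dpred;       db2 = dpred.sum(axis=0)
    dh = dpred @ W2.T * (1 - h ** 2)
    dW1 = x.T @ dh;          db1 = dh.sum(axis=0)

    # Gradient-descent update.
    W1 -= lr * dW1;  b1 -= lr * db1
    W2 -= lr * dW2;  b2 -= lr * db2

print("final mean squared error:", float(np.mean(err ** 2)))
print("a few fitted hidden weights (what do they mean?):", W1.ravel()[:5])

The fitted weights are simply whatever numbers the descent process happened to land on from a random start; change the seed and you get a different, equally meaningless set that fits about as well.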

Unfortunately, we normally don’t know what these artificial neural network (“ANN”, or simply “NN”) parameters mean, or why and how they should relate to finding a meaningful solution to the real problem we may be trying to solve. This is the essence of saying that this is not science; it is closer to magic or alchemy, in which we hope that something miraculous will happen and that the results we find will actually have the meaning we might wish to associate with them.

If you understand what is being said, then you understand what is meant by the phrase that we need to find “explainable AI”, sometimes referred to as “XAI”. Here is the heart of the problem, if you followed the analogy! If XAI for neural networks existed, we would effectively be able to assign meaning (an interpretation) to the parameters, and the other quantities, that we used in solving the problem.

It should go without saying that, without the ability to understand and correctly interpret the solution we might obtain, and to fully understand how and why we found that solution, we must ask the obvious question: “Of what value is such a solution?” This reality has been dawning on many who have tried to use the results of ML or Deep Learning, and you will find more and more references to the fact that those results cannot be trusted (see also our post “AI – Can You Trust a Black Box?”).

Part 3 – How (Not) to Build a Radio

As a further illustration of the underlying principles of the artificial neural network (which was invented to mimic, in some very simple way, the observed actions of the neurons in the human brain), let us consider an analogous problem.

Let us suppose that we are children at a certain low level of development, and we are familiar with the concept of a radio. Armed with our minimal set of skills, we decide we’ll build one for ourselves. We know that we will need a box to hold the radio and we’ll need some knobs or dials or at least something similar to put on the outside of the box, and we’ll either need some batteries or perhaps a power cord to attach to the box (depending on what we have seen on, or in, real radios). If we are at a high enough level of development, we might bother to look inside an actual radio and see what we find there. Let’s suppose we find something that resembles a printed circuit board with some unknown little things that seem to be stuck to the circuit board (inductors, capacitors, resistors, transistors, etc).

Let’s suppose that we gather some small seeds of various kinds and perhaps find a small board or flat piece of plastic that resembles the substrate of the printed circuit board we saw. We might then find some aluminum foil, if we are really clever, cut it into thin strips, and glue them to our substrate to resemble the traces on the printed circuit board. We can now choose, from our selection of small seeds, some that resemble what we saw stuck to the printed circuit board, and we proceed to glue them on, of course, in a somewhat random fashion. In the end, we attach the remaining pieces that we think look like dials (perhaps sewing buttons?), the electric cord, perhaps a cardboard cone or something like that for a speaker, and even an antenna, if we are that clever. We have now completed what, to us, looks like a radio.

Needless to say, the next thing to do will be to plug it in, or simply turn it on (however we arranged to do that), and we’ll be delighted by the music and various other forms of entertainment it will provide. What a wonderful thing it will be, and we might even invite our little part of the world to come and be amazed by what we have done. And of course, if this goes well, we’ll move on to TV in our next project (and perhaps even raise millions in VC funding before we start)!

Well, unfortunately, what we have described is analogous to what the people who first looked at the brain, knowing something about the new digital computer, decided to do in order to create a simulation of the human brain. When they looked at the human brain to see what makes it work, they recognized what they referred to as biological “neurons”, which seemed to be the fundamental operational unit of the brain (like the printed circuit board and its components inside the radio).

Based on what they had seen in the actual brain, and on the notion that the whole thing might apparently be simulated using the new digital computer (similar to building a radio using things that look like electronic components on a printed circuit board), they had a basis for proceeding. Further, given a bit of ingenuity in guessing how a brain neuron might work and how it might be simulated with the computer, they announced a project to be called artificial intelligence (“AI”) in 1956, and they invited the world to watch, expecting the computer to start behaving like a real human brain as soon as the (largely arbitrary) digital simulation was up and running.

The artificial neuron would be simulated by a somewhat random arrangement of low-level linear mathematics, which could be made nonlinear, as the real neuron seemed to be, by adding a simple nonlinear mathematical component here and there among the linear components. This is, of course, analogous to adding arbitrary “seed” components to a random printed-circuit-board look-alike in our radio example.

This was done in such a way that the artificial neuron would ideally output a numeric 1 (the artificial neuron would “fire” if suitably activated) and a numeric 0 (no activation) otherwise. Because they were operating at an educated adult level, they knew enough to arrange things so that simple logical operations might be implemented (although the original work was ‘shot down’ early on by the realization that not every basic logical operation could be realized by the original combination of elements). This turned out, later, to be fixable by adding additional layers to the network of artificial neurons, in case that might actually make a difference. The whole concept was based on the notion that the brain is only carrying out strings of formal logic operations, and that the new digital computer should, if properly set up, be able to do the same thing. We pause to remind the reader that the brain is not a digital computer, and the digital computer is not a brain.
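For readers who would like to see the idea in code, here is a minimal sketch of that original style of artificial neuron: a weighted sum of inputs passed through a threshold, so that the unit “fires” (outputs 1) or does not (outputs 0). The weights below are hand-picked for illustration, not learned. A single such unit can realize AND and OR, but no choice of weights lets it realize XOR, which is exactly the early limitation mentioned above; adding a second layer removes it.

# A single artificial neuron: weighted sum plus a step (threshold) function.
# Hand-set weights realize AND and OR; XOR needs a second layer of such units.

def neuron(inputs, weights, bias):
    """Fire (1) if the weighted sum exceeds the threshold, else 0."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

def AND(a, b):
    return neuron((a, b), weights=(1, 1), bias=-1.5)

def OR(a, b):
    return neuron((a, b), weights=(1, 1), bias=-0.5)

def XOR(a, b):
    # Two layers: XOR(a, b) = (a OR b) AND NOT (a AND b).
    hidden1 = OR(a, b)
    hidden2 = AND(a, b)
    return neuron((hidden1, hidden2), weights=(1, -2), bias=-0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b), "XOR:", XOR(a, b))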

So What’s New?

Today, after more than six decades of largely wasted, and in many cases very costly, effort, we still don’t have a working artificial brain based on the random architecture called the artificial neural network. That doesn’t mean that we don’t have things that seem to behave as a human brain might behave. We can simulate speech through proper coding, and so on, but little of what appears to have been accomplished is actually new, or has anything of particular importance to do with the man-conceived concept of the artificial neural network.

Like most other systems or networks of similar simplicity, these artificial neural networks can be made to mathematically approximate a desired output, in response to a particular input, to whatever extent is mathematically possible for each particular choice of network (the number of parameters and the relationships between them, the error measure used, and so on). The actual response, of course, depends on the many choices we make within the artificial neural network’s preconceived architecture or “template”. This procedure for solving our minimization problem is usually referred to as a “black box” approach, because we don’t really understand how well our possibly arbitrary choices might work without simply trying them to see what happens. It is a “formal” procedure that we hope will magically produce a meaningful result, according to whatever we conceive as “meaningful”. If you understand what is being said, then you also understand the interest in trying to find “explainable” AI as an alternative.

Of course, the problem in past decades was often said to be that we just didn’t have enough computer power to do a really good job. Well, if anything has changed (and in the fantasy world of AI, not much has), it is surely that we now have more computer power than anyone could have imagined 60+ years ago. However, credible computer-generated music and speech synthesis were possible back then, and before long people were doing these things. It worked because we did not rely on so-called artificial neural networks, which can require enormous amounts of computer power and access to very large databases to “train”, without any way to know what the results might actually be.

Instead, we utilized our knowledge of the physics and perhaps psycho-physics of sound, and our own ability to manipulate and/or simulate this knowledge with proper coding in the digital computer.  When the digital output was coupled with some simple digital-to-analog circuitry we were able to actually listen to the results of our efforts.  When we got the basic physics right, we also got the kind of results we expected (explainable results!).  If the (explainable) results didn’t quite satisfy us, we knew (from the physical model) what modifications we might have to make in order to achieve our desired goal.  This is the kind of procedure that would be made possible by explainable AI, if it existed in the world of artificial neural networks.
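As a toy illustration of that physics-first way of working (my own minimal sketch, using only the Python standard library, with invented tone parameters rather than any code from that era), here is a simple few-harmonic tone written to a WAV file. Every number has a physical meaning, so if the result sounds wrong, we know exactly which knob to turn.

# A plucked-string-like tone built from a handful of decaying harmonics and
# written to a WAV file.  Every number here has a physical meaning (fundamental
# frequency, harmonic amplitudes, decay rate, sample rate), so the result is
# fully explainable.

import math
import struct
import wave

SAMPLE_RATE = 44100          # samples per second
DURATION = 2.0               # seconds
FUNDAMENTAL = 220.0          # A3, in Hz
HARMONIC_AMPS = [1.0, 0.5, 0.25, 0.125]   # relative strengths of harmonics 1..4
DECAY = 3.0                  # exponential amplitude decay, per second

samples = []
for n in range(int(SAMPLE_RATE * DURATION)):
    t = n / SAMPLE_RATE
    envelope = math.exp(-DECAY * t)
    value = sum(a * math.sin(2 * math.pi * FUNDAMENTAL * (k + 1) * t)
                for k, a in enumerate(HARMONIC_AMPS))
    samples.append(envelope * value)

# Normalize and write 16-bit mono PCM.
peak = max(abs(s) for s in samples)
with wave.open("tone.wav", "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)                  # 16-bit samples
    wav.setframerate(SAMPLE_RATE)
    frames = b"".join(struct.pack("<h", int(32767 * s / peak)) for s in samples)
    wav.writeframes(frames)

print("wrote tone.wav:", DURATION, "seconds at", SAMPLE_RATE, "Hz")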

A Little Bit of History – Computer Generated Music

Even though we possessed almost no computing power by today’s standards, in the few decades following the advent of AI in 1956 we were able to produce, for example, wonderful simulations of musical manuscripts being played by simulated musical instruments on the computer. The coding and required data input could be done, and we simply had to run the computer for many hours and introduce efficiencies, such as coding in integer arithmetic, in order to use the fastest operations available on the early digital computers. What took many, many hours of computing to achieve then could be done in near-real time today, but that didn’t deter the effort. What we lacked in computer power, we made up for in clever coding and very long run times, achieving results that in many cases were as good as any that can be achieved today. The basic abilities of the digital computer have not substantially changed since that time; standards such as 24-bit sound synthesis and 40 kHz-range sample rates were established long ago.

Again, we emphasize the fact that we were not playing with artificial neural networks to magically solve our problems. It was done by understanding what we were doing and correctly coding true solutions to the physical problems we needed to solve in order to produce the really good results that were achieved by us and the other researchers whom we were fortunate to know and to work with at that time.

An Afterthought – Re-Release of Early Recordings of Computer Generated Music

It occurs to me to mention that, if you have not heard the results of some of that early computer-generated audio work, recordings of much of it still exist, and I have thought of trying to make some of it available again if there is enough interest. If you think that enough people would like to hear work that appeared in the 1970s, for example, I might be tempted to make some of it available again in currently playable listening formats. I may even give samples of it to the Smithsonian Institution, since I think it is of historical interest.

For my own part, for example, a 2-LP album (vinyl recordings) of computer-generated music by contemporary (at that time) computer music composers and researchers was released. The occasion prompting the release was a series of musical performances in honor of a visit to our university by Aaron Copland (now deceased), who was often referred to as the “Dean of American Composers”. Copland will be very familiar to people who know American music, and we will not dwell here on his many accomplishments.

We were also active in realizing on the computer some great works in the ragtime genre, and I believe that Gunther Schuller and even the great Scott Joplin would have applauded the results! We also carried out numerous interesting mathematical experiments based on the work of Bach, and even ventured into experiments in micro-tonal music. An interesting comment came from Aaron Copland, who asked why we would even consider working in micro-tonal music when we have only just begun to explore the endless possibilities of the 12-tone scale! However, these are exactly the kinds of possibilities that are conveniently opened up for us by the digital computer. None of it was ever referred to as “AI”, nor approached in the manner that is often called “AI” today. We will have much more to say about this later!

In view of these comments, you are welcome to use our “Contact” page to respond if you would like to experience the sounds of some of these early recordings. I believe you’d be impressed by the quality of the work that was possible at that time. The fundamental capabilities of the computer have, for the most part, not changed; it is ultimately the speed of computing that has seen the largest advances in recent years.

Update – Was it Really What it Appeared to Be?

Automated Phone Reservation

Just to illustrate a point: in an earlier post, we referred to the announcement that a computer had dialed a restaurant and made a dinner reservation with no human intervention on the computer side, but with a real person on the other end of the phone line, at the restaurant. However, it was later revealed that the whole thing had not been done using only neural networks, or perhaps even true artificial intelligence of any kind.

Instead, the computer had been programmed to identify certain key words spoken by the human at the restaurant, and it had been coded to activate “canned” simulated (or perhaps even pre-recorded) voice responses appropriate to whichever key words had been identified. In other words, as I understand it, essentially nothing very new was actually being done, but the enormous computer power available today made it possible in real time. The company that gave the demonstration has reportedly made this code available for public use, but only for the purpose of helping to make automated reservations or appointments, as described above. Human intervention, in this type of situation, will always be required whenever an unrecognized response is received, or when there is no pre-coded or recorded canned response appropriate to the circumstances.
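In outline, the logic amounts to something like the following sketch. The keywords and canned replies here are invented for illustration; the actual system’s vocabulary and code are not something I can speak to.

# Keyword spotting plus canned responses.  There is no neural network here:
# just a lookup, and a fallback to a human whenever no pre-coded response applies.

CANNED_RESPONSES = {
    "what time": "We'd like a table at seven o'clock, please.",
    "how many":  "It will be a party of four.",
    "name":      "The reservation is under the name Smith.",
    "confirm":   "Yes, that's correct. Thank you very much.",
}

def respond(heard: str) -> str:
    """Return a canned response if a known keyword is heard, else hand off."""
    lowered = heard.lower()
    for keyword, response in CANNED_RESPONSES.items():
        if keyword in lowered:
            return response
    # No recognized keyword: this is where a human has to step in.
    return "[transfer call to a human operator]"

print(respond("And what time would you like to come in?"))
print(respond("How many people will be joining you?"))
print(respond("I'm sorry, we're closed for a private event that night."))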

Automated Facial Recognition

Today, although we earlier described how facial recognition might be attempted using a neural network, it is not actually done that way in most commercial applications. What is actually done is that the computer is coded to identify and measure characteristic facial features from a human facial image, including the relative distances between the eyes, nose, and mouth, and other associated facial measurements. The quantitative results of these measurements are then compared to a very large database of similar measurements taken from a very large number of people, in order to find a “best-fit” match.
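In essence, the matching step reduces to something like the following sketch (the measurements, names, and database here are entirely made up). Notice how easily two database entries can end up almost equally “close” to the probe measurements, which is exactly how misidentification happens.

# "Best-fit" matching of facial measurements: each face is reduced to a short
# vector of relative distances, and we return the database entries ranked by
# Euclidean distance to the measurements taken from the photo.

import math

# Hypothetical database: name -> (eye-to-eye, eye-to-nose, nose-to-mouth) ratios.
DATABASE = {
    "person_a": (1.00, 0.62, 0.41),
    "person_b": (0.98, 0.64, 0.40),
    "person_c": (1.05, 0.58, 0.45),
}

def distance(u, v):
    """Euclidean distance between two measurement vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_matches(probe):
    """Rank all database entries by distance to the probe measurements."""
    return sorted(DATABASE.items(), key=lambda item: distance(probe, item[1]))

probe = (0.99, 0.63, 0.41)   # measurements taken from a (possibly poor) photo
for name, features in rank_matches(probe):
    print(name, round(distance(probe, features), 4))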

The unfortunate truth is that we all come from pretty much the same original human stock or stocks, and most of us closely resemble many other people. The net effect is that, in general, many reasonably good matches can be made to our own particular facial measurements, and this often results in misidentification of an individual face.

Police departments are reporting thousands of false identifications, and in some cases are being sued by the persons who have been incorrectly identified. The problem is that, even though no neural networks may have been used in the process, facial measurements are simply not accurate enough to uniquely identify us. We have long had better results using fingerprints and the apparently unique structure of the iris of the eye, for example. Unfortunately, these better identifiers often can’t be used when only a possibly poor photograph is available for identification. In that case, think about the possibilities of tracking your unique cell phone, the internet visits tied to your IP address, your social media postings, and so on; beware! We should all be aware of what is actually being done, be particularly aware that it may not be what we think or believe is being done, and realize that we may be the unfortunate victims of a failure to correctly identify us from limited and less reliable information such as a photograph.

What’s Next?

There is much more to be written, but this post, like many of our posts, is becoming excessively long, and time is limited. It has, to some extent, accomplished its original purpose of continuing the discussion of the simple contour map and what can be learned from it, so we have probably reached a reasonable place to take a break before pursuing this discussion any further.

I hope that you have learned something more about the real nature of the neural network, Machine Learning, Deep Learning, and explainable AI, and so we will stop here for now. In the next post, we plan to explore something that I think has to be addressed, along with more discussion of the concept of explainable AI. Ultimately, of course, we will draw conclusions about our original questions, including, “Is It Really the Monster That Ate Humanity?” Suffice it to say that, if we are ignorant enough of the facts, it might be, so let’s plan to continue this discussion in the next post.

Thanks again for stopping by, and we’ll look forward to seeing you again, soon!