Neural Networks – A Dialogue

Neural Networks and other matters

  • – OR –
  • How Arturo Found Annette

We find young Arturo, excited about starting his college education, strolling down a shady campus lane on a lovely autumn afternoon. He is hoping for a chance encounter with sprightly Annette, who at this moment is just a lovely vision seen from afar. Suddenly, his gaze is diverted to a group of students gathered on the campus green nearby, seated around Professor Philos, whom Arturo recently met during pre-registration counseling. Philos enjoys the high regard of his students, and Arturo is interested in joining the group.

Philos  (spying Arturo joining the group): Welcome, Arturo, it’s good to see you, and I’m happy to have you join us this fine autumn afternoon. We are just about to break up, but if you’d care to join me, we can walk together.

Arturo:  Thanks for the invitation, and I’ll gladly accept a chance to spend some additional time with you.

Philos:  Tell me about your classes and how things are going, now that you are a full-fledged college student! I hope my pre-registration counseling was helpful to you.

Arturo (very animated):  It was, and I couldn’t be more excited about the prospects before me. There is so much that I want to learn, as you already know! I love computing and I hope to be a successful entrepreneur someday, and perhaps run my own company.

Philos (laughing):  You’ll do well, Arturo, and how well I remember those days when I was in your shoes! So, what shall we talk about today?

Arturo:  Professor Philos, I am fascinated by all that seems to be happening related to work in artificial intelligence, and I wonder if you could tell me something about it.

Philos:  Well, as you know, I am not a computer scientist, but I certainly do know something about what many of my colleagues are working on, and I’ll be happy to share what I know and any insights that I can give you, as a scientist who is not directly involved in that area of activity.

Arturo:  Great, I’m eager to learn as much as I can.

Philos:  Arturo, as you may know, there is a blog you can consult for some brief historical background that you should familiarize yourself with. There are also a number of good related reference books you might like to consult, and they will introduce you to more than I can tell you in an afternoon walk, so let’s limit our discussion today to what is happening at the present time. I suspect you may already be aware of some of it, so I’ll just give you my impressions and thoughts on what I know, together with some necessary background to help with understanding.

Arturo:  Please go on, I’m listening!

Philos:  To begin, artificial intelligence has been going through what we might call evolutionary phases, leading up to the present time. When the computer was new, everyone had great expectations for what it was going to do, and those expectations often ended in disappointment, partly because nothing turned out to be as simple as it seemed, and partly because there was not yet much related research, or even computing capability, that could provide the necessary basis for doing what everyone hoped could be done.

I expect that you understand what I am talking about, and you yourself are probably aware of the great increases in computing power, and so on, that have taken place just in the past few decades. I think we are still at an early stage in the computer revolution. As the power of the machine increases, and as we have access to more and more data in digital format, it is natural that the activity related to harnessing all of these new resources, for new applications, will increase at a similar rate. I think we are seeing the effects of that today.

Arturo:  Yes, I’m sure I know what you mean.

Philos:  Now, because we have only a limited amount of time today to pursue these issues, let me get our discussion going with some thoughts that are uppermost in my mind at the moment, and I hope you’ll forgive me if I simplify things a bit for the sake of conversation.

Arturo:  Not a problem, professor.

Philos:  Good, and then if you are agreeable, perhaps we can meet again before too long to continue from wherever we have to leave off today.

Arturo:  That would be great with me, and it would be wonderful to hear you expound further on this topic, and perhaps others, in our future meetings.

Philos:  Then it’s agreed, and I suspect we are going to enjoy many pleasant fall afternoons like this one to carry on our discussion, which I think I will enjoy just as much as you seem to.

Arturo:  Right on, professor!

Philos:  What I seem to be observing recently in artificial intelligence is a focusing of effort, to some extent, on a rather limited range of approaches and techniques that have been singled out for more intense study, and I may know some of the reasons why this is happening. Are you familiar with the concept of what is known as a “neural network”, or “neural net”?

Arturo:  I think I have some familiarity with the concept, and I certainly hear the term being used a lot.

Philos:  Yes, you are right about that, and, at least for today, I’d like to focus our attention, and perhaps my comments, on this one topic, which seems to dominate much of the current work in AI, though I expect that this will not always be the case. Would you be surprised if I told you that probably several hundred thousand papers have been written on this one topic alone?

Arturo:  Wow, that’s a lot of information to comprehend!

Philos:  Yes, and I can’t verify that number, but I have heard it quoted and it wouldn’t surprise me at all. The concept of a neural network was known originally as a perceptron, and was based on observations of how the brain seemed to be structured, back at the time when the digital computer was still just in its early gestation stages. In any event, that concept has not changed very much, but it has become what is known today as a neural network.

Now, the idea is really very simple. It had been observed that the brain has structures known as “neurons”, which can form (electro-chemical) connections with each other through pathways known as axons (which carry signals away from a neuron) and dendrites (which carry signals into a neuron); at the connection points are “synapses”, or gaps, across which the transmission of signals from axons to dendrites takes place. It was natural to wonder whether, if we modeled something like these structures in the computer via computer programs, they might be able to do some of the things that the brain does.

As an example, one simple idea that results from this line of thought is a network of “nodes”, which play the role of neurons, joined by connections (like wires) that correspond to the axons and dendrites. Now, to complete the picture, we just expand on this idea. First, we agree to arrange the nodes into what are effectively rows and columns, with every node in each row connected to every node in the next row, but no more than that. The very first row is where external signals (these will be numbers in the digital computer) are input, with one node for each separate input signal. The last row is for the outputs from the neural network, with one node for each separate output signal that we wish to have.

Now there can be any number of rows of nodes in between the input and output rows, and each of these rows, known as “hidden layers”, can have any number of nodes in them that you’d like to assign.

 

That’s it – that’s the basic structure of a neural network. There is only one more thing that we need to describe, and that is what happens at each node in a row as signals (numbers) are sent to it from all the nodes in the previous row. First, each of the signals arriving at a node (after the input row) is multiplied by a scalar weight assigned to the signal path connecting to that node. We then sum all of the weighted signals sent from the nodes in the previous row, and we can also add a scalar bias term to this sum. The result becomes the input signal to that node. The complete set of weights and biases is called the “parameters” of the system. In general, there is a weight parameter for each signal path between nodes, and a bias parameter for each node in the rows after the input row.
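To make that weighting-and-summing step concrete, here is a minimal sketch in Python with NumPy; the particular numbers, the row sizes, and the variable names are all invented purely for illustration, not part of any standard recipe.

```python
import numpy as np

# Signals arriving from the three nodes of the previous row
x = np.array([0.5, -1.2, 2.0])

# One weight per signal path: this row has four nodes, each receiving three signals
W = np.array([[ 0.10, -0.30,  0.80],
              [ 0.70,  0.20, -0.50],
              [-0.60,  0.40,  0.10],
              [ 0.05,  0.90, -0.20]])

# One bias per node in this row
b = np.array([0.1, -0.2, 0.0, 0.3])

# Each node's input signal: weighted sum of the previous row's outputs, plus a bias
z = W @ x + b        # shape (4,), one number per node in this row
print(z)
```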

In order to “train” the neural network, what we are going to do is adjust the entire set of these parameters in some way so as to achieve a particular goal for the network outputs. Now, you may already be familiar with the idea of linear systems, and if so, then at this point of our discussion we have described a very simple feed-forward linear system (or, in this case, a linear neural network). If we stop here, the whole thing can be given a simple matrix description, and linear algebra, or matrix theory, can be used to do whatever we want to do after that. However, that’s not quite all that we are going to do, and the entire network architecture that we choose, like the one we have just described, will become important.
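As a small aside on that matrix description, the following sketch (again with invented numbers) shows that two purely linear rows collapse into a single matrix and bias acting on the input, which is exactly why, without a nonlinearity, extra rows add nothing new.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # input row (3) -> hidden row (4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # hidden row (4) -> output row (2)
x = rng.normal(size=3)

# Sending the signal through two purely linear rows...
layered = W2 @ (W1 @ x + b1) + b2

# ...is exactly equivalent to one combined linear map
W_combined = W2 @ W1
b_combined = W2 @ b1 + b2
collapsed = W_combined @ x + b_combined

print(np.allclose(layered, collapsed))   # True: the hidden row added nothing new
```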

We know that the neurons in the brain do not behave in a linear fashion, so the neural network that we have described so far is not quite as complex as the brain seems to be. Therefore, in the interest of breaking the linearity of the simple system we have just described, we can introduce an element of nonlinearity to act in addition to the weighting, summing, and biasing of all the signals arriving at a node from the previous row of nodes. Unfortunately, there is no single logical choice for the nonlinearity, but we can note that real neurons in the brain, when receiving a signal, seem either to activate, by generating an output signal in response, or not to activate (send out a response signal) at all.

This is nonlinear behavior. The idea, then, is that the nonlinearity we use might act more or less as a switch which, in the simplest case, outputs either a 0 (no activation) or a 1 (activation), based, for example, on the value of the signal received (we don’t actually know on what basis the brain does this, so this is surely an oversimplification).

For reasons that we won’t go into in detail right now, rather than simply flipping an on-off switch, we will instead pass the summed and biased value arriving at the node into a nonlinear (activation) function that varies according to the value of the input signal in some simple but nonlinear way (not in a straight line). The kind of function that is often chosen is referred to as a “sigmoid” (S-shaped) function, and could be implemented using something like the logistic function (which varies between 0 and 1) or the hyperbolic tangent function (which varies between -1 and 1). With the logistic function, for example, a large positive input produces an output close to 1 (activation), and a large negative input produces an output close to 0 (no activation). This is just one of many ways that we might achieve activation or non-activation of a node, based on the value of the signal which is input to the (nonlinear) activation function.

In practice, therefore, this operation is usually normalized so that the output value from the nonlinear function is always a number between zero and one; the output is then taken to be 1 (activation), for example, if that value is greater than one-half, and zero (no activation) if it is less than one-half. We won’t go into any further detail on this today, but that’s the idea, and we are free to choose exactly how activation might be accomplished or avoided; other choices of nonlinear functions, and of how they operate, can certainly be made to accomplish this goal. Again, we emphasize that this is purely a simple construct based on a choice we make, since we do not know on what basis a real neuron in the brain is actually prompted to fire (activate), or not, in response to the signals it receives in the normal functioning of the brain.
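A minimal sketch of one such choice follows, using the logistic function as the sigmoid and a threshold at one-half, as in Philos’s simplified description; the input values are invented for illustration.

```python
import numpy as np

def logistic(z):
    """Logistic sigmoid: maps any real number smoothly into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Summed, weighted, and biased input signals arriving at three different nodes
z = np.array([-3.0, 0.2, 4.0])

a = logistic(z)                       # roughly [0.05, 0.55, 0.98]
activated = (a > 0.5).astype(int)     # threshold at one-half: 1 = activation, 0 = none
print(a, activated)
```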

As a final comment, we often simply want to adjust the network parameters to produce a value of either zero or one as the network output, in order to make a decision. For example, we may have “trained” a neural network to output a value of one if a particular object is present in a photograph and a zero otherwise. We then interpret the output from the network, when a new photo is input, as “yes, the object is present” if the output is one, and “no, it is not present” if the output is zero.
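Putting the pieces together, here is a toy sketch of that decision pipeline for a small, untrained network; the weights are random placeholders, so the “decision” is meaningless until the network has actually been trained on real examples.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    """One feed-forward pass: input row -> one hidden row -> a single output node."""
    W1, b1, W2, b2 = params
    hidden = logistic(W1 @ x + b1)        # activations of the hidden row
    output = logistic(W2 @ hidden + b2)   # a single number between 0 and 1
    return output[0]

rng = np.random.default_rng(1)
params = (rng.normal(size=(5, 4)), rng.normal(size=5),   # input row (4) -> hidden row (5)
          rng.normal(size=(1, 5)), rng.normal(size=1))   # hidden row (5) -> output node

x = np.array([0.2, -1.0, 0.5, 1.5])       # stand-in for features taken from a photo
decision = 1 if forward(x, params) > 0.5 else 0
print("object present" if decision == 1 else "object not present")
```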

We should always be aware, however, that the network is a “black box” which may or may not always arrive at the desired result, and so the result must always be viewed with suspicion, or “taken with a grain of salt”, as we often say. In fact, outputs from trained networks can be very sensitive to even small differences which may be present in a new input.  These differences may result in an incorrect output on which we are to make a decision. This is a serious issue that we need to be aware of, and we can pursue this issue in greater depth at a later time.
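A toy illustration of that sensitivity is sketched below, with hand-picked weights chosen so that the input sits very near the decision boundary; real trained networks can show a similar effect for much less obvious reasons.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# A single output node with large (toy) weights, chosen for illustration only
w, b = np.array([40.0, -35.0]), 0.9

x = np.array([0.50, 0.60])              # an input near the decision boundary
x_nudged = x + np.array([0.01, 0.00])   # a tiny change to just one input value

for inp in (x, x_nudged):
    present = logistic(w @ inp + b) > 0.5
    print(inp, "->", "present" if present else "not present")   # the decision flips
```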

Arturo:  That’s a lot to think about, but I think I do get the overall idea of a gridded network, with each node in each row of the grid connected to each node in the next row. The numbers sent to a node by all of the nodes in the previous row are each multiplied by weights, which are parameters, and the results are summed. A bias term, which is another parameter, can then be added. The result of these operations can be fed into a nonlinear activation function that varies in some (nonlinear) way between 0 and 1, depending on the value of the summed and weighted signals received, and the result is normally then adjusted to be exactly zero or one as the final output from the node. This is the signal that the node sends to all of the nodes in the next row, where the same kinds of operations are performed again to obtain the input and output signals for that row of nodes. In general, we may want just a simple zero or one value at an output node, and this output can become the basis for making a decision. Further, we always need to be aware that the network output can be sensitive to small variations in the inputs, which may result in an incorrect output from the network, and this could then lead to an incorrect decision based on that output.

Philos:  I think you’ve got the idea, Arturo, and that is really what a neural network is all about. The signals normally flow in one direction, from the first row, the input nodes, to the last row, the output nodes (this is called a feed-forward network, and in this case it is nonlinear). We try to adjust the parameters at all the hidden and output nodes in some way so that the signal coming into the network will be transformed into a desired output signal (or signals) at the output node (or nodes). So, to put it simply, we have imagined a very simple nonlinear feed-forward network whose purpose is to change an input signal or signals into a desired signal or signals at the output nodes, and we generally want this to work for a large class of related input signals. We hope to do this by finding a suitable set of parameters that will accomplish our purpose.

Arturo:  But what if there isn’t any set of parameters that will do this?

Philos:  A very good question, Arturo, and indeed there may not be such a set of parameters that can do an acceptable job. However, the mathematical procedure (algorithm) that we use to determine the values of the parameters works in such a way that the observed output signal, after “training”, will be as close to the desired output signal(s) as possible, for a large class of related input signals. This, in turn, requires us to choose a measure of error (between the desired signal and the actual output signal) which we hope to minimize in the training process. In other words, we want the output signal, after training, to be as close as possible to the desired signal, based on the measure of error that we decide to use.
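One common choice of such an error measure is the mean squared error, sketched below; other measures are certainly possible, and the numbers here are invented for illustration.

```python
import numpy as np

def mean_squared_error(desired, actual):
    """One common measure of error: the average squared difference."""
    desired, actual = np.asarray(desired, float), np.asarray(actual, float)
    return np.mean((desired - actual) ** 2)

# Desired outputs for four training examples vs. what the network currently produces
desired = [1, 0, 1, 1]
actual  = [0.8, 0.3, 0.4, 0.9]
print(mean_squared_error(desired, actual))   # training tries to drive this number down
```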

Arturo:  But then, that seems to say that the final result of training a neural network may depend very much on how many parameters we have, and how the network is actually arranged into rows and columns, and even, I suppose, on the function, or even functions, that we might choose as the source of the non-linearity in the network, and even on the measure of error being used, and the types and characteristics of related examples that we are going to use for training.  Further, the performance of the training of the system will depend, I suppose, on the specific type of algorithm chosen for adjusting the parameters. In addition, we must always be aware that sensitivity to even small variations in new inputs to a trained network may result in erroneous outputs.

Philos:  Arturo, when you are ready to go to graduate school, I want you to come and see me first. You are truly perceptive beyond your years! You are exactly right, and, in fact, it can take quite a long time for the algorithm that is used to adjust all of the parameters successfully in an effort to reach our training goal.

The process is typically iterative (repetitive), and it involves successively updating our current parameter selection so that the next selection of parameters, after adjustment, results in a lower measure of error. If we don’t eventually reach an acceptable level of error or performance by the neural net over the entire training set, as we may not, then we may have to change the structure of the network and try again, and… well, you probably get the picture of how this process goes. So far, our discussion has been an oversimplification, because typically this process is carried out over a very large number of different but related examples of input signals (the training set), in order to find a network that might produce the correct outputs on other related examples that were not used in the training process. There is a good chance that we will never perfectly reach the desired goal.
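To make the iterative picture concrete, here is a minimal, self-contained sketch of that update loop for about the simplest possible case: a single logistic node adjusted by gradient descent on a tiny invented training set. The data, learning rate, and number of passes are all illustrative choices, not a prescription.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny made-up training set: two input signals per example, desired output 0 or 1
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]])
y = np.array([0.0, 0.0, 1.0, 1.0])

rng = np.random.default_rng(0)
w, b = rng.normal(size=2), 0.0            # an initial guess at the parameters
learning_rate = 0.5

for step in range(2000):                  # repeat: nudge the parameters toward lower error
    outputs = logistic(X @ w + b)         # current outputs over the whole training set
    error = outputs - y                   # how far each output is from the desired value
    # Gradient of the mean squared error with respect to each node input, per example
    grad_z = 2.0 * error * outputs * (1.0 - outputs)
    w -= learning_rate * (X.T @ grad_z) / len(y)   # adjust the weights
    b -= learning_rate * grad_z.mean()             # adjust the bias

print(logistic(X @ w + b).round(2))       # outputs have moved toward the desired [0, 0, 1, 1]
```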

Arturo:  Thanks for the compliment, and yes, I’m afraid I do get the picture.

Philos:  Now perhaps I can make only a few more comments before we end our visit today, but you have set the stage for our next discussion.

Arturo:  I think I’m beginning to get the idea, and I’m already looking forward to our next meeting!

Philos:  Fine, now let me just say a few more things that I think are well worth knowing about neural networks.

As things stand today, there is not a strong scientific foundation for how best to determine the structure of the neural network to be trained for any particular application, and there is so much more to be said. Not only are these essentially open questions that today have to be answered by trial and error, but we have not even begun to explore the whole universe of possible networks, or of alternative approaches that might be used just as well and might perform much better than the very simple neural network model we have described.

Arturo:  Yes, I’m beginning to get the idea of some of the important issues in using neural networks, and the kind of work yet to be done.

Philos:  Indeed. Hopefully you are beginning to understand where these comments are leading us, and where we will find fertile ground for future discussions.

It’s getting late, and I’ll call it a day, for now. Have you met my daughter, Annette? I think I see her over there waiting by the car for me (Annette is waving to them), so I’ll bid you adieu, for now, but come by my office at your convenience next week, and we’ll plan our next meeting.

Arturo (stunned at the revelation about his vision, Annette):  Professor Philos, there is no way that I’m going to miss coming by your office next week! You can be sure that I’ll see you there.

Philos:  Have a good evening, Arturo, and do some reading and organize your thoughts for our next meeting, and so will I. I’ll also provide a brief list of references for you to have at the office next week! Adieu!

Arturo (smiling at his good luck today):  Ciao, professor, and thanks so much for everything!

Arturo turns and waves a goodbye back to Annette, whom he still has not formally met. She smiles and returns the courtesy….

…… to be continued in our next dialogue …..

 

The link to the previous post, about Dialogues and what they are, is here.