A Survey of Neural Networks: Part I

by Farrukh Alavi  
  1. Overview
  2. Neural Architectures and Algorithms
  3. Applications

  1. Overview

  One of the long-standing aims of computer scientists and engineers has been, and continues to be, the design and implementation of systems that are increasingly human-like in their computational capabilities and responses. Popular culture continues to promise a future in which the user-friendliness and intelligence built into machines are indistinguishable from those we expect of living beings. Two major avenues of research have emerged over the last two decades that aim to deliver on this promise: artificial intelligence (AI) and artificial neural networks (ANNs). While there are similarities in the origins and scope of these two disciplines, there are also fundamental differences, particularly in the level of human intervention required for a working system: AI expert systems require all the relevant knowledge to be pre-programmed into a database, whereas ANNs learn the required behaviour from example data. Consequently, AI expert systems are today used in applications where the underlying knowledge base does not change significantly with time (e.g. medical diagnostic systems), whereas ANNs are more suitable when the input data can evolve with time (e.g. real-time control systems). In order to appreciate the present excitement and interest in ANNs, it is instructive to consider the neuro-biological motivation and history of the subject.

  2. Neural Architectures and Algorithms

  The types of problems to which modern ANNs are applied fall primarily into a few broad categories. In order for a neural network to perform such tasks, it must (just like a human brain) first be trained on some preliminary data. Training, or learning, involves a set of paired network input and output data, usually represented as one-dimensional vectors (corresponding to the one-dimensional input and output layers of the network).

  Consider a typical two-layer artificial neural network (the input layer does not contribute to the layer count) with eight input neurons, four hidden neurons and three output neurons. Appropriately pre-processed data is presented to the network at the input layer. The input layer neurons are not computational units; they simply propagate the input on to the hidden layer (a network may contain several hidden layers, but they all operate according to the principle described here). The hidden and output layer neurons, by contrast, are simple computational units. The values received at the hidden layer are the matrix product of the input vector and the 8x4 `weight matrix' W (the dimensions correspond to the number of input neurons and the number of hidden neurons). The elements of W (usually just called `weights') start off either as random values, typically between 0 and 1, or simply as 0.5. As more and more inputs are presented to the network, these weights evolve under iteration according to one of several well-defined `learning algorithms' until the desired network output is achieved; at that point the network has learned the task. Much the same applies to the 4x3 matrix V, with one important difference: V multiplies the output of actual computing elements (the hidden layer neurons), whereas W acts directly on the raw input data.

  The output of the hidden and output layer neurons is usually normalised to lie in a fixed range (typically between 0 and 1) in order to preserve network stability. In most cases this is achieved by passing each neuron's summed input through a so-called `activation function'; a good choice of activation function is a crucial factor in the network performing as desired. Popular choices include the Heaviside step function and the hyperbolic tangent function. A minimal sketch of this forward pass is given below.
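  To make the arithmetic concrete, the following is a minimal sketch of the forward pass just described, assuming NumPy is available; the layer sizes (8 input, 4 hidden, 3 output neurons), the random initial weights between 0 and 1, and the hyperbolic-tangent activation all follow the description above, while the particular input vector is an arbitrary illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Layer sizes taken from the text: 8 input, 4 hidden and 3 output neurons.
    W = rng.uniform(0.0, 1.0, size=(8, 4))   # input-to-hidden weight matrix
    V = rng.uniform(0.0, 1.0, size=(4, 3))   # hidden-to-output weight matrix

    def forward(x):
        # Matrix product followed by the activation function at each layer.
        hidden = np.tanh(x @ W)      # hidden layer responses
        return np.tanh(hidden @ V)   # output layer responses

    x = rng.uniform(0.0, 1.0, size=8)  # an illustrative pre-processed input vector
    print(forward(x))

  During training, the entries of W and V would be adjusted after each presentation of an input; here they simply retain their random initial values.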
  A trained network is capable of producing correct responses to input data it has already `seen', and is also capable of `generalisation', namely the ability to produce correct outputs for inputs it has never seen. This property is similar to that of the human mind, which can, for example, be taught a particular alphabet and rules of pronunciation, and then correctly generalise to words it has not previously encountered. Generalisation is related to the `capacity' of a neural network, which refers to the maximum number of patterns that can be learned before the network starts to produce unacceptable errors. A rigorous mathematical theory of capacity was worked out in the late 1980s by Elizabeth Gardner, in the context of statistical mechanics. Training algorithms themselves come in a wide variety of flavours, but are generally divided into two camps: supervised algorithms, in which the desired output for each training input is supplied to the network, and unsupervised algorithms, in which the network must organise the input data without such a teacher. A toy example of supervised training is sketched below.
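  As an illustration of the supervised camp, here is a minimal sketch (an assumed example, not taken from the article) of the classical delta rule applied to a single-layer network with a sigmoid activation; the toy dataset (a logical OR with a constant bias input) and the learning rate are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy supervised training set (assumed for illustration): each input pattern
    # ends with a constant 1 acting as a bias, and the target is the logical OR
    # of the first two components.
    X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
    targets = np.array([0., 1., 1., 1.])

    w = rng.uniform(0.0, 1.0, size=3)   # weights start as random values in [0, 1]
    eta = 0.5                           # learning rate (arbitrary choice)

    for epoch in range(5000):
        for x, t in zip(X, targets):
            y = sigmoid(x @ w)                      # current network response
            w += eta * (t - y) * y * (1.0 - y) * x  # delta-rule weight update

    print(sigmoid(X @ w))   # trained responses, to be compared with the targets

  In an unsupervised scheme, by contrast, no targets would be supplied and the weights would be adjusted using only the statistics of the inputs themselves.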

  3. Applications

  Because of the tremendous flexibility that neural network architectures possess, as well as the fact that they are influenced by such a diverse cross-section of professionals, ranging from engineers to psychologists, the ANN industry is likely to continue its rapid growth for some time to come. In this section, we look at some of the more prominent uses to which ANNs have been put.
  Conclusions

  In this article, some basic ANN background has been discussed in order to convey some of the excitement surrounding this subject. In Part II, we shall consider other, more esoteric, applications, including attempts to build machines with `artificial consciousness' and VLSI implementations of neural hardware. Some useful references for further study will also be given.
    About the author:

  Dr Farrukh Alavi is a research fellow in the Department of Computer Science, University of Reading, United Kingdom. Interested readers may contact him at: f.n.alavi@reading.ac.uk