A Brief Tutorial on Artificial Neural Networks and their Training

November 15th, 2018

By Lynnette Reese, Editor-in-Chief, Embedded Systems Engineering

A machine learning algorithm is only a computer program, but it works by improving its performance with every piece of clearly identified data. Artificial neural networks are merely fitting the parameters of a complex function to huge data sets using mathematical models and statistics to make decisions.

Early efforts at defining Artificial Intelligence began in the 1950s, showing promise as computers became more powerful. AI is not new, but how it’s done (and why it’s better) is fairly new. AI is finally succeeding thanks to three developments: affordable (and thus widely accessible) computational power (i.e., high-performance computing), an abundance of very large sets of identified data, and the maturation of AI algorithms.

Machine learning is a subset of the field of artificial intelligence. Machine learning was defined by IBM’s Arthur Samuel, a pioneer in AI, as “a field of study that gives computers the ability to learn without being explicitly programmed.”[i] One type of machine learning, patterned on the neural networks of the human brain, is proving successful. The term for this concept in programming is unfortunately borrowed directly from medical terminology and is often called “neural networks,” although some refer to them as “Artificial Neural Networks” (ANN).

A biological neuron accepts inputs at the dendrites and produces output through the axon (see Figure 1). The axon fans out and connects through synapses to dendrites on other neurons. A mathematical model of the brain’s neurons is the basis for neural networks in machine learning. Neural networks in a machine are abstract constructs created within a computer program. In the human brain, a signal travels along an axon; in Figure 1 this signal is modeled as x₀. When the signal crosses a synapse, it is modeled as picking up a multiplier, w₀, giving w₀x₀. The strength, or weight, of the multiplier at each synapse combines with the weighted signals from other synapses (e.g., w₁x₁, w₂x₂), and each exerts a stronger or weaker influence in a direction set by whether its weight is positive or negative. All of the values are summed at the cell body, and if the total is above a certain threshold, the neuron fires. In the mathematical model, the precise timing of each “neuron firing” is not considered important; only the firing rate matters, and it is modeled by a function (f).[ii]
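As a rough illustration, the neuron model in Figure 1 can be written in a few lines of Python. This is a minimal sketch, not code from any particular library: the function names, the bias term, and the choice of a sigmoid for f are assumptions made for the example.

```python
import math

def weighted_sum(inputs, weights, bias=0.0):
    """Sum each input x_i multiplied by its synapse weight w_i, plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def neuron_fires(inputs, weights, threshold=0.0):
    """Threshold view: the neuron fires only if the summed signal exceeds the threshold."""
    return weighted_sum(inputs, weights) > threshold

def neuron_output(inputs, weights, bias=0.0):
    """Firing-rate view: squash the summed signal into (0, 1) with a sigmoid (assumed f)."""
    return 1.0 / (1.0 + math.exp(-weighted_sum(inputs, weights, bias)))

# One positive and one negative weight pull the sum in opposite directions.
rate = neuron_output([1.0, 1.0], [0.5, -0.2])
```

With every input at zero and no bias, the weighted sum is zero and the sigmoid returns 0.5, the midpoint of its range.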

Biological neurons are, of course, far more complex and dynamic, and they involve much more than static weights. The electronic version of “neural networks” is woefully simplistic in comparison. Since the human brain has on the order of 86 billion neurons and an estimated 10¹⁵ synapses, it makes sense that AI required accessible computational power before this technology could blossom.

Figure 1: Biological neuron (left) and a coarse and rudimentary mathematical model of the biological neuron (right). (Image: Efficient Processing of Deep Neural Networks: A Tutorial and Survey).

Conceptually speaking, the machine learning version of a neural network is made up of a few connected layers of these weighted “neurons.” Deep Neural Networks (DNN) have many more layers, so they can handle more complex problems. Computer vision using a DNN may assign a single input neuron to each pixel, for instance. The weights on the connections into each layer are stored in the computer program as a matrix. Fast, multicore processors are desirable for DNN so that algorithms do not take long to compute. Speed in computation is especially important when the DNN has many layers. The inner layers are “hidden,” meaning that they sit between the input and the output, so no one sees the many changes that occur inside them as weights influence the decision of each artificial neuron.

Figure 2: Left: A two-layer artificial neural network with three inputs, one hidden layer of four neurons, and one output layer of two neurons. Right: A three-layer neural network with three inputs, two hidden layers of four neurons each, and one output layer. Note that there are connections between neurons across layers, but not within a layer. (Image: CS231n, stanford.edu)
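The left-hand network in Figure 2 can be sketched as two weight matrices applied in sequence. The code below is illustrative only: the random initial weights, the sigmoid activation, and all names are assumptions, and a practical implementation would use an optimized numerical library instead of Python lists.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation=None):
    """Compute one layer: each row of `weights` holds the weights of one neuron."""
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

def random_matrix(rows, cols, rng):
    """Untrained weights: small random values (an assumed starting point)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
W1, b1 = random_matrix(4, 3, rng), [0.0] * 4  # hidden layer: 4 neurons, 3 inputs each
W2, b2 = random_matrix(2, 4, rng), [0.0] * 2  # output layer: 2 neurons, 4 inputs each

def forward(x):
    """Pass three input values through the hidden layer, then the output layer."""
    hidden = dense_layer(x, W1, b1, activation=sigmoid)
    return dense_layer(hidden, W2, b2)

scores = forward([0.5, -1.0, 2.0])  # two output values, one per output neuron
```

Counting entries gives 3 × 4 + 4 × 2 = 20 weights plus 4 + 2 = 6 biases, or 26 learnable parameters for this small network. Every neuron in a layer reads all outputs of the previous layer, and no neuron reads from its own layer, which matches the connection pattern in the figure.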

A machine learning algorithm is only a computer program, but it works by improving its performance with every piece of clearly identified data. Data that’s not clearly identified influences the DNN to make mistakes. For a simple example, images with and without cats in them would ideally be identified as “cat” or “not cat.” The label or property for the training (data) set in this case is “cat.” Using the trained network to identify new images is known as “inference,” and for inference to be accurate, extremely large data sets are needed for training. The training data set can easily have anywhere from hundreds to more than a million images.
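How a program can improve with each labeled example is easiest to see in a toy case. The sketch below uses the classic perceptron learning rule, which this article does not name, so treat it as one possible illustration and not as the method used by any particular DNN. The two binary features and the four-item training set are hypothetical.

```python
# Hypothetical training set: ((has_whiskers, has_pointed_ears), label),
# where label 1 means "cat" and label 0 means "not cat".
training_set = [
    ((0, 0), 0),
    ((0, 1), 0),
    ((1, 0), 0),
    ((1, 1), 1),
]

def predict(weights, bias, features):
    """Inference: fire (1) if the weighted sum is above zero, else 0."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0

def train(samples, epochs=20, learning_rate=1):
    """Perceptron rule: nudge the weights whenever a labeled example is misclassified."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        mistakes = 0
        for features, label in samples:
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # every example is classified correctly; stop early
            break
    return weights, bias

weights, bias = train(training_set)
predictions = [predict(weights, bias, features) for features, _ in training_set]
```

Each wrong answer moves the weights toward the correct label, so the labels themselves steer the result. If a label in `training_set` were wrong, the same rule would faithfully learn the wrong answer.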

The availability of more clearly identified images makes the DNN more accurate, generally speaking. For instance, if you were to train a DNN to identify a ping pong ball in an image, you would feed the DNN with a training set full of images labeled with “ping pong ball” and “not ping pong ball.” However, if you only include images that also have a paddle in them (very common), you will likely get an identification for “ping pong ball” even if there is a paddle but no ball in the image. The mistake traces back to an oversight by the DNN’s creator, who failed to include enough images of ping pong balls without paddles. Therefore, the data sets that we use to train a neural network are critical in how the network later makes its decisions. You may have heard the saying from an Intel executive that “data is the new oil.”[iii] It’s noteworthy that companies are collecting data like never before. Facebook collects roughly 350 million images every day through normal user activity. Google’s YouTube has over 1.3 billion active users in a one-month period, on average, with 300 hours of video uploaded every minute.[iv] Not all data is good, however. Data needs to be clearly identified, classified, and labelled for it to be of use in AI.
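The ping pong example suggests a simple check that can be run before training: count which combinations of label and background object never occur in the data. The sketch below is a hypothetical audit, with an invented five-item annotation list and invented function names, meant only to show the idea.

```python
from collections import Counter

# Hypothetical annotations: (label, paddle_visible_in_image)
training_set = [
    ("ping pong ball", True),
    ("ping pong ball", True),
    ("ping pong ball", True),
    ("not ping pong ball", False),
    ("not ping pong ball", False),
]

def missing_combinations(samples, labels, attribute_values=(True, False)):
    """List every (label, attribute) pairing that has no example in the data."""
    counts = Counter(samples)
    return [(label, value)
            for label in labels
            for value in attribute_values
            if counts[(label, value)] == 0]

gaps = missing_combinations(
    training_set, ["ping pong ball", "not ping pong ball"]
)
# gaps == [("ping pong ball", False), ("not ping pong ball", True)]
```

Here the audit reports that the set has no ball without a paddle and no paddle without a ball, so a network trained on it has no way to tell the two objects apart.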

There are many approaches to AI for use as smart machine tools in various areas. The main categories for the different types of machine learning algorithms are: unsupervised learning, supervised learning, reinforcement learning, and Deep Learning (DL). Unsupervised learning is useful when you need to determine any non-obvious relationships in an unlabeled dataset, i.e., a data set with no pre-assigned labels. Supervised learning is just as the name implies: the training data set is labeled, so each example comes with the correct answer, and the difference between the model’s output and that label is fed back into the process to adjust the model. Reinforcement learning is somewhere in between unsupervised and supervised learning. Reinforcement learning is like learning from experience, or by perceiving general patterns. In training, the model gets penalized for incorrect decisions and “rewarded” for correct decisions. AlphaGo, the machine that beat the world champion of the game Go, began with supervised learning and later self-trained using reinforcement learning.[v] Note that with reinforcement learning, there is some delay involved because whether an action was correct or incorrect must be determined before feedback can be given. The most common Deep Learning algorithms in use today are the Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Reinforcement Learning (RL). The Deep Neural Network (DNN) and the Restricted Boltzmann Machine (RBM) are also DL algorithms. Some applications use more than one DL technique to obtain results.

An artificial neural network is nowhere near as complex as the human brain. Neural networks today are not able to provide true AI by any means. Artificial neural networks are merely fitting the parameters of a complex function to huge data sets using statistics and mathematical models to make decisions based on the patterns that they find. The dangers in relying on AI to do critical jobs for us lie within the training data set and in assuming that AI will adapt to changes without intervention. At the end of the day, AI is another form of computer programming, which is only as good as the programmer.

Lynnette Reese is Editor-in-Chief, Embedded Intel Solutions and Embedded Systems Engineering, and has been working in various roles as an electrical engineer for over two decades.

[i] Puget, Jean François. “What Is Machine Learning? (IT Best Kept Secret Is Optimization).” IBM Cognitive Advantage Reports, IBM Corporation, 18 May 2016, www.ibm.com/developerworks/community/blogs/jfp/entry/What_Is_Machine_Learning.

[ii] Karpathy, Andrej. “CS231n Convolutional Neural Networks for Visual Recognition.” CS231n Convolutional Neural Networks for Visual Recognition, 2018, cs231n.github.io/neural-networks-1/.

[iii] Gharib, Susie. “Intel CEO Says Data Is the New Oil.” Fortune, Fortune, 7 June 2018, fortune.com/2018/06/07/intel-ceo-brian-krzanich-data/.

[iv] salman.aslam.mughal. “YouTube by the Numbers (2018): Stats, Demographics & Fun Facts.” Pinterest by the Numbers (2018): Stats, Demographics & Fun Facts, 5 Feb. 2018, www.omnicoreagency.com/youtube-statistics.

[v] Silver, David, et al. “Mastering the Game of Go without Human Knowledge.” Nature News, Nature Publishing Group, 18 Oct. 2017, www.nature.com/articles/nature24270.