Artificial Intelligence is the hot topic nowadays. You can find many tutorials on the web, but most of them concentrate only on the outer workings of machine learning, mainly the coding, without saying much about the mathematics and science behind it. It is just like a magician showing you his tricks without revealing the secret. This article focuses on what happens inside a Deep Neural Network and tries to untangle the terminology associated with Deep Learning.
[Image: Fields of Artificial Intelligence]
Deep Neural Network

Deep Learning is a part of Supervised Machine Learning, in which we have training data with Features and Labels (Targets). Deep Neural Networks were built keeping in mind how the human brain functions.
[Image: A Simple Neuron]
[Image: Biological Interpretation of Neuron]
The mathematical analogy of the Neuron is as follows. Given:
- input vector X
- weight matrix between input and hidden W
- bias vector on hidden layer B
- activation function on hidden nodes f()
- output of the hidden layer Y
Y = f(Z) = f(X * W + B)
[Image: Mathematical Interpretation of a Neuron]
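As a rough sketch of this formula in code (a minimal example assuming NumPy and a sigmoid activation, which is the activation used in the AND-gate example below; `neuron_forward` is just a hypothetical helper name):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_forward(X, W, B):
    # Y = f(Z) = f(X * W + B)
    Z = np.dot(X, W) + B
    return sigmoid(Z)
```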
How a Neural Model Works

To understand the crux of Neural Networks, let's take an example. Suppose we want to model a logical AND gate using a Neural Network. An AND gate outputs 1 only when both of its inputs are 1, so its truth table is as follows:

| A | B | Y (A AND B) |
|---|---|-------------|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
Initially, our Neural Network does not know anything about what AND logic is.
Let's take some random weights W = [1, 6, 2] and a constant bias input of +1, so that Z = W1·1 + W2·A + W3·B:

- W1 = 1 (the weight on the bias input)
- W2 = 6 (the weight on input A)
- W3 = 2 (the weight on input B)
- b = +1 (the constant bias input)
[Image: Mathematical Representation of Sigmoid Function]

Here the activation f() is the sigmoid function:

Y = σ(Z) = 1 / (1 + e^(−Z))
Using the above formula, the predicted Y for inputs A = 0 and B = 0 is:

Y = σ(1·1 + 6·0 + 2·0) = σ(1) = 0.7310

The expected output was 0, but we got 0.7310, which is obviously not correct. Looking closely at the equation, the only parameters we can control are the weights W; we can change neither the inputs (A, B) nor the target output Y. But changing the weight values randomly again does not guarantee the correct output, and brute-forcing every combination would be silly.
So, let's try to minimize the error between our predicted output and the actual output:

Mean Square Error = (0 − 0.7310)² / 2 = 0.2672
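In code, this error function is one line (a minimal sketch; dividing by 2 is a common convention so that the factor of 2 cancels when we differentiate in the next step):

```python
def mse(y_true, y_pred):
    # (y_true - y_pred)^2 / 2
    return (y_true - y_pred) ** 2 / 2

print(mse(0, 0.7310))  # 0.2672 (rounded)
```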
Since the Mean Square Error is a function we need to minimize, differentiating it with respect to the weight variables gives us ∆W, which we can use to adjust our weights.
∆W = −η · (derivative of the Error Function w.r.t. W1, W2 and W3), where η is the learning rate and the minus sign moves the weights against the gradient, i.e. downhill on the error surface.

W = W + ∆W
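A minimal sketch of one such update step for this single neuron, reusing the `sigmoid` helper from the earlier sketch; the learning rate `lr` and the folding of the bias input into X = [1, A, B] (so that W1 is the bias weight) are assumptions made to match the numbers in this article:

```python
def backprop_step(X, y_true, W, lr=1.0):
    # Forward pass; X = [1, A, B], so W[0] multiplies the bias input
    X = np.asarray(X, dtype=float)
    y_pred = sigmoid(np.dot(X, W))
    # Chain rule for MSE with a sigmoid activation:
    # dE/dW = (y_pred - y_true) * y_pred * (1 - y_pred) * X
    grad = (y_pred - y_true) * y_pred * (1 - y_pred) * X
    delta_W = -lr * grad   # move against the gradient, downhill on the error
    return W + delta_W     # W = W + delta_W
```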
Adjusting the previous weights like this is called Backpropagation in a Deep Neural Network, and now we can say that our model has learned something new by adjusting its weight matrix. Suppose the obtained delta weights are ∆W1 = −2, ∆W2 = −2, ∆W3 = 0.
So, our new weight matrix will be W = W + ∆W = [1 − 2, 6 − 2, 2 + 0] = [−1, 4, 2].

Let's try to predict the output with the new weight values, using our equation for the output:

Y = σ(−1·1 + 4·0 + 2·0) = σ(−1) = 0.2689 ≈ 0.26

The predicted output is 0.26, which is much closer to the expected 0 than the previous 0.7310. One such adjustment of the weights over a complete cycle through the training data is called an Epoch in Deep Learning. We keep iterating over the model, calculating ∆W and adjusting our weight matrix, until the Mean Square Error is minimized and our predicted output matches the expected output.
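Putting it all together, a full training loop might look like the sketch below. The learning rate of 0.5 is an assumption, since the article does not state one, so with these hyperparameters the loop may need more than the quoted 500 passes to fully converge (the derivative is tiny wherever the sigmoid saturates):

```python
# All four rows of the AND truth table, with the bias input folded in
X = np.array([[1, 0, 0],   # columns: [bias input, A, B]
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

W = np.array([1.0, 6.0, 2.0])  # the initial random weights from above

for epoch in range(500):
    y_pred = sigmoid(X @ W)                             # forward pass
    grad = ((y_pred - y) * y_pred * (1 - y_pred)) @ X   # dE/dW, summed over rows
    W += -0.5 * grad                                    # W = W + delta-W

print(np.round(sigmoid(X @ W), 2))  # should head towards [0, 0, 0, 1]
```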
Once our model is sufficiently trained, the weight matrix becomes W = [−3, 2, 2].
By setting a threshold of 0.5 on the output layer, we can predict Y as:

- Y = 1 for predicted Y > 0.5
- Y = 0 for predicted Y ≤ 0.5
The above prediction was for A = 0 and B = 0; let's check the other input values as well, as in the sketch below.
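Checking all four input combinations with the trained weights W = [−3, 2, 2] and the 0.5 threshold (a sketch reusing the `sigmoid` helper from above):

```python
W_trained = np.array([-3.0, 2.0, 2.0])  # [bias weight, weight on A, weight on B]

for A in (0, 1):
    for B in (0, 1):
        y_hat = sigmoid(np.dot([1, A, B], W_trained))
        print(f"A={A}, B={B} -> Y={1 if y_hat > 0.5 else 0} (raw {y_hat:.4f})")

# A=0, B=0 -> Y=0 (raw 0.0474)
# A=0, B=1 -> Y=0 (raw 0.2689)
# A=1, B=0 -> Y=0 (raw 0.2689)
# A=1, B=1 -> Y=1 (raw 0.7311)
```

The thresholded predictions reproduce the AND truth table exactly.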
So, after looping around 500 times until the error converges to its minimum, the final weights we receive are W = [−3, 2, 2], which model the AND gate correctly.