SolidWorks Videos

Friday, December 10, 2021

Feed Forward Back Propagating Neural Network (Regression) from Scratch without PyTorch / without TensorFlow using Only Calculus and Math, Update 04: ADAM + Derivatives

     Here is code I wrote, following online machine-learning tutorials, for regression. In its current form, the code can predict complex functions with considerable accuracy! As in [Introduction and Feed Forward Back Propagating Neural Network], I have tried to add as many comments as possible. Update 04 finally uses a while loop 😊, implements the Adam optimizer, and can calculate the first and second derivatives of the neural network output! Read the theory here.
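
For reference, the derivative feature of Update 04 follows directly from differentiating the network's forward pass. With a single input x, a sigmoid hidden layer and a linear output layer (the setup used in Update 04), the quantities the code computes are, in the code's own notation:

yhat = activation(x*w1 + b1)*w2 + b2
dyhat/dx = sum over j of w1(j) * w2(j) * activationprime(z2(j))
d2yhat/dx2 = sum over j of w1(j)^2 * w2(j) * activationprime(z2(j)) * (1 - 2*activation(z2(j)))

where z2 = x*w1 + b1 and, for the logistic sigmoid, the last factor comes from its second derivative activationprime*(1 - 2*activation).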

Update 01:

The hyperbolic tangent activation is implemented and the code can now be trained to learn trigonometric functions! (The logistic sigmoid's output range of (0, 1) cannot produce the negative values of sin(x), whereas tanh's range of (-1, 1) can.)


clear; clc;


%% training data %%

x0=[-1*pi:pi/8:1*pi].'; %input data

y0=sin(x0); %expected output data

x=x0/max(x0); %normalized inputs

y=y0/max(y0); %normalized outputs


%% size of neural network %%

inputlayersize = 1; %input layer neurons

outputlayersize = 1; %output layer neurons

hiddenlayersize = 8; %hidden layer neurons


%% weight and bias initialize; learning rate %%

w1=rand(inputlayersize, hiddenlayersize)-0.5; %weights input to hidden

w2=rand(hiddenlayersize,outputlayersize)-0.5; %weights hidden to output

b1=rand(1,hiddenlayersize)-0.5; %bias input to hidden

b2=rand(1,outputlayersize)-0.5; %bias hidden to output

lr = 0.1; %learning rate


%% forward propagation and training %%

for i=1:100000

    z2=x*w1+b1;

    a2=activation(z2); %hidden layer

    z3=a2*w2+b2;

    yhat=activation(z3); %final output (normalized)

    delta3=-(y-yhat).*activationprime(z3);

    DJW2 = a2.'*delta3; %gradient of loss w.r.t. weights hidden to output

    delta2=delta3*w2.'.*activationprime(z2);

    DJW1 = x.'*delta2; %gradient of loss w.r.t. weights input to hidden

    w1=w1-(lr*DJW1); %updated weights input to hidden

    w2=w2-(lr*DJW2); %updated weights hidden to output

    b1=b1-(lr*mean(delta2)); %updated bias input to hidden

    b2=b2-(lr*mean(delta3)); %updated bias hidden to output

end


%% plotting %%

yhat0=yhat*max(y0); %final outputs

hold on; grid on; box on, grid minor

set(gca,'FontSize',40)

set(gca, 'FontName', 'Times New Roman')

ylabel('sin (x)','FontSize',44)

xlabel('x','FontSize',44)

plot(x0,y0,'-','LineWidth',2,'MarkerSize',10)

plot(x0,yhat0,'+','LineWidth',2,'MarkerSize',10)

set(0,'DefaultLegendAutoUpdate','off')

legend({'Training Data','Neural Network Output'},'FontSize',44,'Location','Southeast')


%% activation %%

function [s]=activation(z)

%s=1./(1+exp(-z));

s=tanh(z);

end


%% derivative of activation %%

function [s]=activationprime(z)

%s=(exp(-z))./((1+exp(-z))).^2;

s=(sech(z)).^2;

end

Update 02:

      Entirely new code, and it works even better than Update 01. It turns out that no activation function is required in the output layer for regression; a quick check of why is sketched just below.
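
As a quick illustration of why (not part of the original post): the training targets below, cos(x) - sin(2x) on [0, 2], dip below zero, which a sigmoid output layer (bounded to the interval (0, 1)) could never reach; leaving the output layer linear removes that restriction.

% quick check (illustration only, not part of the original script)
X = (0:0.05:2)'; % same inputs as the training data below
Y = cos(X) - sin(2*X); % same targets as below
min(Y) % about -0.37, below the sigmoid's lower bound of 0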


%% clear and close

clear

clc

close all


%% network parameters

input_size = 1; % number of input features

hidden_size = 5; % number of neurons in the hidden layer

output_size = 1; % number of output features

learning_rate = 0.01; % learning rate

epochs = 1e7; % maximum iterations

epoch = 0; % iteration counter

loss = inf; % initial loss

loss_min = 1e-5; % minimum loss


%% generate training data

X = 0:0.05:2;

X = X';

Y = cos(X) - sin(2*X);


%% initialization

rng(1)

W1 = rand(input_size, hidden_size); % input to hidden weights

b1 = rand(1, hidden_size); % input to hidden bias

W2 = rand(hidden_size, output_size); % hidden to output weights

b2 = rand(1, output_size); % output bias


%% training

while loss >= loss_min && epoch <= epochs

    % forward pass

    hidden_input = X * W1 + b1;

    hidden_output = 1 ./ (1 + exp(-hidden_input)); % hidden layer output

    output = hidden_output * W2 + b2; % output of neural network


    % loss calculation

    loss = 0.5 * mean((output - Y).^2); % one-half mean squared error


    % backpropagation

    output_error = output - Y; % error at the output layer

    hidden_error = (output_error * W2') .* (hidden_output .* (1 - hidden_output)); % error backpropagated to the hidden layer


    % update weights and biases

    W2 = W2 - learning_rate * (hidden_output' * output_error);

    b2 = b2 - learning_rate * sum(output_error, 1);

    W1 = W1 - learning_rate * (X' * hidden_error);

    b1 = b1 - learning_rate * sum(hidden_error, 1);

    epoch = epoch + 1; % iteration counter

end


%% testing and plotting

test_input = 0:0.025:2; % test data

test_input = test_input';

hidden_input = test_input * W1 + b1;

hidden_output = 1 ./ (1 + exp(-hidden_input));

predicted_output = hidden_output * W2 + b2; % prediction by neural network


% plotting

hold on; grid on; box on, grid minor

set(gca,'FontSize',40)

set(gca, 'FontName', 'Times New Roman')

plot(X, Y, 'k-', 'LineWidth', 1, 'DisplayName', 'Training','MarkerSize',10);

plot(test_input, predicted_output, 'ro', 'LineWidth', 1, 'DisplayName', 'Testing','MarkerSize',10);

xlim([0 max(X)])

ylim([min(Y) max(Y)])

legend('show');

xlabel('Input');

ylabel('Output');


Update 03:
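
Update 03 switches to noisy training data and to gradient descent with momentum: in essence, each parameter theta in the code below is updated through a running average of its gradient, v = B*v + (1 - B)*dJ/dtheta followed by theta = theta - lr*v, with B = 0.9 and lr = 0.01 as set in the code.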

%% clear and close all
close all
clear
clc

%% make and prepare training data
x0 = 0:0.2:9.6; % input data
x0 = x0';
y0 = x0.^2 .* sin(x0) + cos(2*x0) + 0.5 * x0; % expected output
noise_level = 20; % amount of noise (prevents overfitting)
rng(1) % random seed 1 (initialization is the same every time)
noise = noise_level * rand(size(y0)); % generate random noise
y0 = y0 + noise; % add noise
x = x0 / max(x0); % normalized inputs
y = y0 / max(y0); % normalized outputs

%% size of neural network
inputlayersize = 1; % input layer neurons
outputlayersize = 1; % output layer neurons
hiddenlayersize = 5; % hidden layer neurons

%% weight, bias initialize, learning rate, initial error and epochs initialize
w1 = rand(inputlayersize, hiddenlayersize); % weights input to hidden
w2 = rand(hiddenlayersize,outputlayersize); % weights hidden to output
b1 = rand(1,hiddenlayersize); % bias input to hidden
b2 = rand(1,outputlayersize); % bias hidden to output
vdb1 = zeros(size(b1)); % bias momentum input to hidden
vdb2 = zeros(size(b2)); % bias momentum hidden to output
vdw1 = zeros(size(w1)); % weights momentum input to hidden
vdw2 = zeros(size(w2)); % weights momentum hidden to output
B = 0.9; % beta
lr = 0.01; % learning rate
J = inf; % initial error
J_min = 1e-50; % minimum loss
epochs = 2e5; % maximum iterations
epoch = 0; % iteration counter

%% training
while J > J_min && epoch < epochs

    % forward pass
    z2 = x * w1 + b1;
    a2 = activation(z2); % hidden layer output
    z3 = a2 * w2 + b2;
    yhat = z3; % neural network final output (normalized)
    % yhat = (activation(x * w1 + b1) * w2 + b2) % neural network output (expanded form)

    % error calculation
    J = 0.5 * mean((y - yhat).^2); % loss function (one-half mean squared error)
    % J = 0.5 * mean((y - (activation(x * w1 + b1) * w2 + b2)).^2) % loss function (expanded form)

    % gradient descent with momentum
    dJb2 = - (y - yhat); % change in loss function w.r.t. b2
    DJW2 = a2.' * dJb2; % change in loss function w.r.t. w2
    dJb1 = dJb2 * w2.' .* activationprime(z2); % change in loss function w.r.t. b1
    DJW1 = x .' * dJb1; % change in loss function w.r.t. w1

    vdb2 = B * vdb2 + (1 - B) * dJb2; % momentum term for b2
    vdw2 = B * vdw2 + (1 - B) * DJW2; % momentum term for w2
    vdb1 = B * vdb1 + (1 - B) * dJb1; % momentum term for b1
    vdw1 = B * vdw1 + (1 - B) * DJW1; % momentum term for w1

    % update weights and biases
    b2 = b2 - (lr * mean(vdb2)); % updated bias hidden to output
    w2 = w2 - (lr * vdw2); % updated weights hidden to output
    b1 = b1 - (lr * mean(vdb1)); % updated bias input to hidden
    w1 = w1 - (lr * vdw1); % updated weights input to hidden

    epoch = epoch + 1; % iteration counter
end

%% testing and plotting
x_test = 0:0.3:9.6; % input data
x_test = x_test';
x_test = x_test / max(x_test);
z2 = x_test * w1 + b1;
a2 = activation(z2); % hidden layer output
z3 = a2 * w2 + b2;
yhat_test = z3; % neural network final output (normalized)

hold on; grid on; box on, grid minor
set(gca,'FontSize',40)
set(gca, 'FontName', 'Times New Roman')
ylabel('f (x)','FontSize',44)
xlabel('x','FontSize',44)
plot(x,y,'o','color',[0 0 0],'LineWidth',2,'MarkerSize',10)
plot(x_test,yhat_test,'-','color',[1 0 0],'LineWidth',2,'MarkerSize',10)
set(0,'DefaultLegendAutoUpdate','off')
legend({'Training Data','Neural Network Output'},'FontSize',44,'Location','Northwest')

%% activation
function [s]=activation(z)
s = 1 ./ (1 + exp(-z));
% s = tanh(z);
end

%% derivative of activation
function [s]=activationprime(z)
s = (exp(-z)) ./ ((1 + exp(-z))) .^2;
% s = (sech(z)) .^2;
end

Update 04:
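
Update 04 replaces plain momentum with Adam, i.e. momentum combined with RMSProp (the implementation below omits the bias-correction step of textbook Adam). In essence, for each parameter theta with gradient g, the code keeps v = B1*v + (1 - B1)*g and s = B2*s + (1 - B2)*g.^2, and then updates theta = theta - lr*v./(sqrt(s) + E), with B1 = 0.9, B2 = 0.999, E = 1e-8 and lr = 0.001 as set in the code.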

%% clear and close all
close all
clear
clc

%% make and prepare training data
x = -1*pi:pi/32:1*pi; % input data
x = x';
y = x.^2 .* sin(x); % expected output

%% size of neural network
inputlayersize = 1; % input layer neurons
outputlayersize = 1; % output layer neurons
hiddenlayersize = 10; % hidden layer neurons

%% weight, bias initialize, learning rate, initial error and epochs initialize
w1 = rand(inputlayersize, hiddenlayersize); % weights input to hidden
w2 = rand(hiddenlayersize,outputlayersize); % weights hidden to output
b1 = rand(1,hiddenlayersize); % bias input to hidden
b2 = rand(1,outputlayersize); % bias hidden to output
vdb1 = zeros(size(b1)); % bias momentum input to hidden
vdb2 = zeros(size(b2)); % bias momentum hidden to output
vdw1 = zeros(size(w1)); % weights momentum input to hidden
vdw2 = zeros(size(w2)); % weights momentum hidden to output
sdb1 = zeros(size(b1)); % bias RMS-prop input to hidden
sdb2 = zeros(size(b2)); % bias RMS-prop hidden to output
sdw1 = zeros(size(w1)); % weights RMS-prop input to hidden
sdw2 = zeros(size(w2)); % weights RMS-prop hidden to output
B1 = 0.9; % beta momentum
B2 = 0.999; % beta RMS-prop
E = 1e-8; % epsilon (avoids division by zero)
lr = 0.001; % learning rate
J = inf; % initial error
J_min = 1e-50; % minimum loss
epochs = 1e7; % maximum iterations
epoch = 0; % iteration counter

%% training
while J > J_min && epoch < epochs

    % forward pass
    z2 = x * w1 + b1;
    a2 = activation(z2); % hidden layer output
    z3 = a2 * w2 + b2;
    yhat = z3; % neural network final output

    % error calculation
    J = 0.5 * mean((y - yhat).^2); % loss function (one-half mean squared error)

    % gradient descent with momentum and RMSProp (Adam, without bias correction)
    dJb2 = - (y - yhat); % change in loss function w.r.t. b2
    DJW2 = a2.' * dJb2; % change in loss function w.r.t. w2
    dJb1 = dJb2 * w2.' .* activationprime(z2); % change in loss function w.r.t. b1
    DJW1 = x .' * dJb1; % change in loss function w.r.t. w1

    vdb2 = B1 * vdb2 + ((1 - B1) * dJb2); % momentum term for b2
    vdw2 = B1 * vdw2 + ((1 - B1) * DJW2); % momentum term for w2
    vdb1 = B1 * vdb1 + ((1 - B1) * dJb1); % momentum term for b1
    vdw1 = B1 * vdw1 + ((1 - B1) * DJW1); % momentum term for w1

    sdb2 = B2 * sdb2 + ((1 - B2) * dJb2.^2); % RMS-prop term for b2
    sdw2 = B2 * sdw2 + ((1 - B2) * DJW2.^2); % RMS-prop term for w2
    sdb1 = B2 * sdb1 + ((1 - B2) * dJb1.^2); % RMS-prop term for b1
    sdw1 = B2 * sdw1 + ((1 - B2) * DJW1.^2); % RMS-prop term for w1

    % update weights and biases
    b2 = b2 - (lr * (mean(vdb2) ./ (sqrt(mean(sdb2)) + E))); % updated bias hidden to output
    w2 = w2 - (lr * (vdw2 ./ (sqrt(sdw2) + E))); % updated weights hidden to output
    b1 = b1 - (lr * (mean(vdb1) ./ (sqrt(mean(sdb1)) + E))); % updated bias input to hidden
    w1 = w1 - (lr * (vdw1 ./ (sqrt(sdw1) + E))); % updated weights input to hidden

    epoch = epoch + 1; % iteration counter
end

%% testing and plotting
x_test = -1*pi:pi/90:1*pi; % input data
x_test = x_test';
z2_test = x_test * w1 + b1;
a2_test = activation(z2_test); % hidden layer output
z3_test = a2_test * w2 + b2;
yhat_test = z3_test; % neural network final output

dyhat_dx = w2.' .* activationprime(z2_test) * w1.'; % 1st derivative of output w.r.t. input
d2yhat_dx2 = w2.' .* (activationprime(z2_test) .* (1 - 2 * a2_test)) * (w1.^2).'; % 2nd derivative of output w.r.t. input

hold on; grid on; box on, grid minor
set(gca,'FontSize',40)
set(gca, 'FontName', 'Times New Roman')
ylabel('f (x)','FontSize',44)
xlabel('x','FontSize',44)
xlim([-1*pi 1*pi])
ylim([-4.5*pi 4.5*pi])

plot(x, y,'o','color',[1 0 0],'LineWidth',2,'MarkerSize',10)
plot(x_test, yhat_test,'-','color',[1 0 0],'LineWidth',2,'MarkerSize',10)

plot(x, 2 * x .* sin(x) + x.^2 .* cos(x),'s','color',[0 1 0],'LineWidth',2,'MarkerSize',10)
plot(x_test, dyhat_dx,'-.','color',[0 1 0],'LineWidth',2,'MarkerSize',10)

plot(x, 4 * x .* cos(x) - (x.^2 - 2) .* sin(x),'x','color',[0 0 1],'LineWidth',2,'MarkerSize',10)
plot(x_test, d2yhat_dx2,'--','color',[0 0 1],'LineWidth',2,'MarkerSize',10)

set(0,'DefaultLegendAutoUpdate','off')
legend({'Training Data','Neural Network Output','First Derivative Analytical','First Derivative Neural Network Output',...
    'Second Derivative Analytical','Second Derivative Neural Network Output'},'FontSize',10,'Location','Northoutside')

%% activation
function [s]=activation(z)
s = 1 ./ (1 + exp(-z));
% s = tanh(z);
% c = 0; % adjust as needed
% beta = 0.010; % adjust as needed
% s = exp(-beta * (z - c).^2);
end

%% derivative of activation
function [s]=activationprime(z)
s = activation(z) .* (1 - activation(z));
% s = (sech(z)) .^2;
% c = 0; % adjust as needed
% beta = 0.010; % adjust as needed
% s = -2 * beta * (z - c) .* activation(z);
end
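
As a sanity check (not part of the original post), the analytical derivatives above can be compared against central finite differences of the trained network. A minimal sketch, assuming the Update 04 workspace (w1, b1, w2, b2, x_test, dyhat_dx, d2yhat_dx2) and the logistic-sigmoid activation that is active in the script:

% finite-difference check (illustration only, not part of the original script)
f = @(xx) (1 ./ (1 + exp(-(xx * w1 + b1)))) * w2 + b2; % network forward pass, sigmoid written out
h = 1e-4; % finite-difference step
dy_fd = (f(x_test + h) - f(x_test - h)) / (2 * h); % numerical 1st derivative
d2y_fd = (f(x_test + h) - 2 * f(x_test) + f(x_test - h)) / h^2; % numerical 2nd derivative
max(abs(dy_fd - dyhat_dx)) % should be very small
max(abs(d2y_fd - d2yhat_dx2)) % should be very small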


     Let's see what the future brings! If you want to collaborate on research projects related to turbo-machinery, aerodynamics, renewable energy and, well, machine learning, please reach out. Thank you very much for reading.
