SolidWorks Videos

Friday, December 10, 2021

Feed Forward Back Propagating Neural Network (Regression) from Scratch without PyTorch / without TensorFlow using Only Calculus and Math, Update 04: ADAM + Derivatives

     Here is code I wrote, following online machine-learning tutorials, for regression. In its current form, the code can predict complex functions with considerable accuracy! As in [Introduction and Feed Forward Back Propagating Neural Network], I have tried to add as many comments as possible. Update 04 finally uses a while loop 😊, implements the Adam optimizer, and can calculate the first and second derivatives of the neural network output! Read the theory here.
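
For reference, the derivative feature of Update 04 follows directly from differentiating the network's forward pass. With a single input x, a sigmoid hidden layer and a linear output layer (the setup used in Update 04), the quantities the code computes are, in the code's own notation:

yhat = activation(x*w1 + b1)*w2 + b2
dyhat/dx = sum over j of w1(j) * w2(j) * activationprime(z2(j))
d2yhat/dx2 = sum over j of w1(j)^2 * w2(j) * activationprime(z2(j)) * (1 - 2*activation(z2(j)))

where z2 = x*w1 + b1 and, for the logistic sigmoid, the last factor comes from its second derivative activationprime*(1 - 2*activation).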

Update 01:

The hyperbolic tangent activation is implemented and the code can now be trained to learn trigonometric functions! (The logistic sigmoid's output range of (0, 1) cannot produce the negative values of sin(x), whereas tanh's range of (-1, 1) can.)


clear; clc;


%% training data %%

x0=[-1*pi:pi/8:1*pi].'; %input data

y0=sin(x0); %expected output data

x=x0/max(x0); %normalized inputs

y=y0/max(y0); %normalized outputs


%% size of neural network %%

inputlayersize = 1; %input layer neurons

outputlayersize = 1; %output layer neurons

hiddenlayersize = 8; %hidden layer neurons


%% weight and bias initialize; learning rate %%

w1=rand(inputlayersize, hiddenlayersize)-0.5; %weights input to hidden

w2=rand(hiddenlayersize,outputlayersize)-0.5; %weights hidden to output

b1=rand(1,hiddenlayersize)-0.5; %bias input to hidden

b2=rand(1,outputlayersize)-0.5; %bias hidden to output

lr = 0.1; %learning rate


%% forward propagation and training %%

for i=1:100000

    z2=x*w1+b1;

    a2=activation(z2); %hidden layer

    z3=a2*w2+b2;

    yhat=activation(z3); %final output (normalized)

    delta3=-(y-yhat).*activationprime(z3);

    DJW2 = a2.'*delta3; %gradient of loss w.r.t. weights hidden to output

    delta2=delta3*w2.'.*activationprime(z2);

    DJW1 = x.'*delta2; %gradient of loss w.r.t. weights input to hidden

    w1=w1-(lr*DJW1); %updated weights input to hidden

    w2=w2-(lr*DJW2); %updated weights hidden to output

    b1=b1-(lr*mean(delta2)); %updated bias input to hidden

    b2=b2-(lr*mean(delta3)); %updated bias hidden to output

end


%% plotting %%

yhat0=yhat*max(y0); %final outputs

hold on; grid on; box on, grid minor

set(gca,'FontSize',40)

set(gca, 'FontName', 'Times New Roman')

ylabel('sin (x)','FontSize',44)

xlabel('x','FontSize',44)

plot(x0,y0,'-','LineWidth',2,'MarkerSize',10)

plot(x0,yhat0,'+','LineWidth',2,'MarkerSize',10)

set(0,'DefaultLegendAutoUpdate','off')

legend({'Training Data','Neural Network Output'},'FontSize',44,'Location','Southeast')


%% activation %%

function [s]=activation(z)

%s=1./(1+exp(-z));

s=tanh(z);

end


%% derivative of activation %%

function [s]=activationprime(z)

%s=(exp(-z))./((1+exp(-z))).^2;

s=(sech(z)).^2;

end

Update 02:

      Entirely new code, and it works even better than Update 01. It turns out that no activation function is required in the output layer for regression; a quick check of why is sketched just below.
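
As a quick illustration of why (not part of the original post): the training targets below, cos(x) - sin(2x) on [0, 2], dip below zero, which a sigmoid output layer (bounded to the interval (0, 1)) could never reach; leaving the output layer linear removes that restriction.

% quick check (illustration only, not part of the original script)
X = (0:0.05:2)'; % same inputs as the training data below
Y = cos(X) - sin(2*X); % same targets as below
min(Y) % about -0.37, below the sigmoid's lower bound of 0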


%% clear and close

clear

clc

close all


%% network parameters

input_size = 1; % number of input features

hidden_size = 5; % number of neurons in the hidden layer

output_size = 1; % number of output features

learning_rate = 0.01; % learning rate

epochs = 1e7; % maximum iterations

epoch = 0; % iteration counter

loss = inf; % initial loss

loss_min = 1e-5; % minimum loss


%% generate training data

X = 0:0.05:2;

X = X';

Y = cos(X) - sin(2*X);


%% initialization

rng(1)

W1 = rand(input_size, hidden_size); % input to hidden weights

b1 = rand(1, hidden_size); % input to hidden bias

W2 = rand(hidden_size, output_size); % hidden to output weights

b2 = rand(1, output_size); % output bias


%% training

while loss >= loss_min && epoch <= epochs

    % forward pass

    hidden_input = X * W1 + b1;

    hidden_output = 1 ./ (1 + exp(-hidden_input)); % hidden layer output

    output = hidden_output * W2 + b2; % output of neural network


    % loss calculation

    loss = 0.5 * mean((output - Y).^2); % one-half mean squared error


    % backpropagation

    output_error = output - Y; % error at the output layer

    hidden_error = (output_error * W2') .* (hidden_output .* (1 - hidden_output)); % error backpropagated to the hidden layer


    % update weights and biases

    W2 = W2 - learning_rate * (hidden_output' * output_error);

    b2 = b2 - learning_rate * sum(output_error, 1);

    W1 = W1 - learning_rate * (X' * hidden_error);

    b1 = b1 - learning_rate * sum(hidden_error, 1);

    epoch = epoch + 1; % iteration counter

end


%% testing and plotting

test_input = 0:0.025:2; % test data

test_input = test_input';

hidden_input = test_input * W1 + b1;

hidden_output = 1 ./ (1 + exp(-hidden_input));

predicted_output = hidden_output * W2 + b2; % prediction by neural network


% plotting

hold on; grid on; box on, grid minor

set(gca,'FontSize',40)

set(gca, 'FontName', 'Times New Roman')

plot(X, Y, 'k-', 'LineWidth', 1, 'DisplayName', 'Training','MarkerSize',10);

plot(test_input, predicted_output, 'ro', 'LineWidth', 1, 'DisplayName', 'Testing','MarkerSize',10);

xlim([0 max(X)])

ylim([min(Y) max(Y)])

legend('show');

xlabel('Input');

ylabel('Output');


Update 03:
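
Update 03 switches to noisy training data and to gradient descent with momentum: in essence, each parameter theta in the code below is updated through a running average of its gradient, v = B*v + (1 - B)*dJ/dtheta followed by theta = theta - lr*v, with B = 0.9 and lr = 0.01 as set in the code.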

%% clear and close all
close all
clear
clc

%% make and prepare training data
x0 = 0:0.2:9.6; % input data
x0 = x0';
y0 = x0.^2 .* sin(x0) + cos(2*x0) + 0.5 * x0; % expected output
noise_level = 20; % amount of noise (prevents overfitting)
rng(1) % random seed 1 (initialization is the same every time)
noise = noise_level * rand(size(y0)); % generate random noise
y0 = y0 + noise; % add noise
x = x0 / max(x0); % normalized inputs
y = y0 / max(y0); % normalized outputs

%% size of neural network
inputlayersize = 1; % input layer neurons
outputlayersize = 1; % output layer neurons
hiddenlayersize = 5; % hidden layer neurons

%% weight, bias initialize, learning rate, initial error and epochs initialize
w1 = rand(inputlayersize, hiddenlayersize); % weights input to hidden
w2 = rand(hiddenlayersize,outputlayersize); % weights hidden to output
b1 = rand(1,hiddenlayersize); % bias input to hidden
b2 = rand(1,outputlayersize); % bias hidden to output
vdb1 = zeros(size(b1)); % bias momentum input to hidden
vdb2 = zeros(size(b2)); % bias momentum hidden to output
vdw1 = zeros(size(w1)); % weights momentum input to hidden
vdw2 = zeros(size(w2)); % weights momentum hidden to output
B = 0.9; % beta
lr = 0.01; % learning rate
J = inf; % initial error
J_min = 1e-50; % minimum loss
epochs = 2e5; % maximum iterations
epoch = 0; % iteration counter

%% training
while J > J_min && epoch < epochs

    % forward pass
    z2 = x * w1 + b1;
    a2 = activation(z2); % hidden layer output
    z3 = a2 * w2 + b2;
    yhat = z3; % neural network final output (normalized)
    % yhat = (activation(x * w1 + b1) * w2 + b2) % neural network output (expanded form)

    % error calculation
    J = 0.5 * mean((y - yhat).^2); % loss function (one-half mean squared error)
    % J = 0.5 * mean((y - (activation(x * w1 + b1) * w2 + b2)).^2) % loss function (expanded form)

    % gradient descent with momentum
    dJb2 = - (y - yhat); % change in loss function w.r.t. b2
    DJW2 = a2.' * dJb2; % change in loss function w.r.t. w2
    dJb1 = dJb2 * w2.' .* activationprime(z2); % change in loss function w.r.t. b1
    DJW1 = x .' * dJb1; % change in loss function w.r.t. w1

    vdb2 = B * vdb2 + (1 - B) * dJb2; % momentum term for b2
    vdw2 = B * vdw2 + (1 - B) * DJW2; % momentum term for w2
    vdb1 = B * vdb1 + (1 - B) * dJb1; % momentum term for b1
    vdw1 = B * vdw1 + (1 - B) * DJW1; % momentum term for w1

    % update weights and biases
    b2 = b2 - (lr * mean(vdb2)); % updated bias hidden to output
    w2 = w2 - (lr * vdw2); % updated weights hidden to output
    b1 = b1 - (lr * mean(vdb1)); % updated bias input to hidden
    w1 = w1 - (lr * vdw1); % updated weights input to hidden

    epoch = epoch + 1; % iteration counter
end

%% testing and plotting
x_test = 0:0.3:9.6; % input data
x_test = x_test';
x_test = x_test / max(x_test);
z2 = x_test * w1 + b1;
a2 = activation(z2); % hidden layer output
z3 = a2 * w2 + b2;
yhat_test = z3; % neural network final output (normalized)

hold on; grid on; box on, grid minor
set(gca,'FontSize',40)
set(gca, 'FontName', 'Times New Roman')
ylabel('f (x)','FontSize',44)
xlabel('x','FontSize',44)
plot(x,y,'o','color',[0 0 0],'LineWidth',2,'MarkerSize',10)
plot(x_test,yhat_test,'-','color',[1 0 0],'LineWidth',2,'MarkerSize',10)
set(0,'DefaultLegendAutoUpdate','off')
legend({'Training Data','Neural Network Output'},'FontSize',44,'Location','Northwest')

%% activation
function [s]=activation(z)
s = 1 ./ (1 + exp(-z));
% s = tanh(z);
end

%% derivative of activation
function [s]=activationprime(z)
s = (exp(-z)) ./ ((1 + exp(-z))) .^2;
% s = (sech(z)) .^2;
end

Update 04:
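
Update 04 replaces plain momentum with Adam, i.e. momentum combined with RMSProp (the implementation below omits the bias-correction step of textbook Adam). In essence, for each parameter theta with gradient g, the code keeps v = B1*v + (1 - B1)*g and s = B2*s + (1 - B2)*g.^2, and then updates theta = theta - lr*v./(sqrt(s) + E), with B1 = 0.9, B2 = 0.999, E = 1e-8 and lr = 0.001 as set in the code.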

%% clear and close all
close all
clear
clc

%% make and prepare training data
x = -1*pi:pi/32:1*pi; % input data
x = x';
y = x.^2 .* sin(x); % expected output

%% size of neural network
inputlayersize = 1; % input layer neurons
outputlayersize = 1; % output layer neurons
hiddenlayersize = 10; % hidden layer neurons

%% weight, bias initialize, learning rate, initial error and epochs initialize
w1 = rand(inputlayersize, hiddenlayersize); % weights input to hidden
w2 = rand(hiddenlayersize,outputlayersize); % weights hidden to output
b1 = rand(1,hiddenlayersize); % bias input to hidden
b2 = rand(1,outputlayersize); % bias hidden to output
vdb1 = zeros(size(b1)); % bias momentum input to hidden
vdb2 = zeros(size(b2)); % bias momentum hidden to output
vdw1 = zeros(size(w1)); % weights momentum input to hidden
vdw2 = zeros(size(w2)); % weights momentum hidden to output
sdb1 = zeros(size(b1)); % bias RMS-prop input to hidden
sdb2 = zeros(size(b2)); % bias RMS-prop hidden to output
sdw1 = zeros(size(w1)); % weights RMS-prop input to hidden
sdw2 = zeros(size(w2)); % weights RMS-prop hidden to output
B1 = 0.9; % beta momentum
B2 = 0.999; % beta RMS-prop
E = 1e-8; % epsilon (avoids division by zero)
lr = 0.001; % learning rate
J = inf; % initial error
J_min = 1e-50; % minimum loss
epochs = 1e7; % maximum iterations
epoch = 0; % iteration counter

%% training
while J > J_min && epoch < epochs

    % forward pass
    z2 = x * w1 + b1;
    a2 = activation(z2); % hidden layer output
    z3 = a2 * w2 + b2;
    yhat = z3; % neural network final output

    % error calculation
    J = 0.5 * mean((y - yhat).^2); % loss function (one-half mean squared error)

    % gradient descent with momentum and RMSProp (Adam, without bias correction)
    dJb2 = - (y - yhat); % change in loss function w.r.t. b2
    DJW2 = a2.' * dJb2; % change in loss function w.r.t. w2
    dJb1 = dJb2 * w2.' .* activationprime(z2); % change in loss function w.r.t. b1
    DJW1 = x .' * dJb1; % change in loss function w.r.t. w1

    vdb2 = B1 * vdb2 + ((1 - B1) * dJb2); % momentum term for b2
    vdw2 = B1 * vdw2 + ((1 - B1) * DJW2); % momentum term for w2
    vdb1 = B1 * vdb1 + ((1 - B1) * dJb1); % momentum term for b1
    vdw1 = B1 * vdw1 + ((1 - B1) * DJW1); % momentum term for w1

    sdb2 = B2 * sdb2 + ((1 - B2) * dJb2.^2); % RMS-prop term for b2
    sdw2 = B2 * sdw2 + ((1 - B2) * DJW2.^2); % RMS-prop term for w2
    sdb1 = B2 * sdb1 + ((1 - B2) * dJb1.^2); % RMS-prop term for b1
    sdw1 = B2 * sdw1 + ((1 - B2) * DJW1.^2); % RMS-prop term for w1

    % update weights and biases
    b2 = b2 - (lr * (mean(vdb2) ./ (sqrt(mean(sdb2)) + E))); % updated bias hidden to output
    w2 = w2 - (lr * (vdw2 ./ (sqrt(sdw2) + E))); % updated weights hidden to output
    b1 = b1 - (lr * (mean(vdb1) ./ (sqrt(mean(sdb1)) + E))); % updated bias input to hidden
    w1 = w1 - (lr * (vdw1 ./ (sqrt(sdw1) + E))); % updated weights input to hidden

    epoch = epoch + 1; % iteration counter
end

%% testing and plotting
x_test = -1*pi:pi/90:1*pi; % input data
x_test = x_test';
z2_test = x_test * w1 + b1;
a2_test = activation(z2_test); % hidden layer output
z3_test = a2_test * w2 + b2;
yhat_test = z3_test; % neural network final output

dyhat_dx = w2.' .* activationprime(z2_test) * w1.'; % 1st derivative of output w.r.t. input
d2yhat_dx2 = w2.' .* (activationprime(z2_test) .* (1 - 2 * a2_test)) * (w1.^2).'; % 2nd derivative of output w.r.t. input

hold on; grid on; box on, grid minor
set(gca,'FontSize',40)
set(gca, 'FontName', 'Times New Roman')
ylabel('f (x)','FontSize',44)
xlabel('x','FontSize',44)
xlim([-1*pi 1*pi])
ylim([-4.5*pi 4.5*pi])

plot(x, y,'o','color',[1 0 0],'LineWidth',2,'MarkerSize',10)
plot(x_test, yhat_test,'-','color',[1 0 0],'LineWidth',2,'MarkerSize',10)

plot(x, 2 * x .* sin(x) + x.^2 .* cos(x),'s','color',[0 1 0],'LineWidth',2,'MarkerSize',10)
plot(x_test, dyhat_dx,'-.','color',[0 1 0],'LineWidth',2,'MarkerSize',10)

plot(x, 4 * x .* cos(x) - (x.^2 - 2) .* sin(x),'x','color',[0 0 1],'LineWidth',2,'MarkerSize',10)
plot(x_test, d2yhat_dx2,'--','color',[0 0 1],'LineWidth',2,'MarkerSize',10)

set(0,'DefaultLegendAutoUpdate','off')
legend({'Training Data','Neural Network Output','First Derivative Analytical','First Derivative Neural Network Output',...
    'Second Derivative Analytical','Second Derivative Neural Network Output'},'FontSize',10,'Location','Northoutside')

%% activation
function [s]=activation(z)
s = 1 ./ (1 + exp(-z));
% s = tanh(z);
% c = 0; % adjust as needed
% beta = 0.010; % adjust as needed
% s = exp(-beta * (z - c).^2);
end

%% derivative of activation
function [s]=activationprime(z)
s = activation(z) .* (1 - activation(z));
% s = (sech(z)) .^2;
% c = 0; % adjust as needed
% beta = 0.010; % adjust as needed
% s = -2 * beta * (z - c) .* activation(z);
end
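
As a sanity check (not part of the original post), the analytical derivatives above can be compared against central finite differences of the trained network. A minimal sketch, assuming the Update 04 workspace (w1, b1, w2, b2, x_test, dyhat_dx, d2yhat_dx2) and the logistic-sigmoid activation that is active in the script:

% finite-difference check (illustration only, not part of the original script)
f = @(xx) (1 ./ (1 + exp(-(xx * w1 + b1)))) * w2 + b2; % network forward pass, sigmoid written out
h = 1e-4; % finite-difference step
dy_fd = (f(x_test + h) - f(x_test - h)) / (2 * h); % numerical 1st derivative
d2y_fd = (f(x_test + h) - 2 * f(x_test) + f(x_test - h)) / h^2; % numerical 2nd derivative
max(abs(dy_fd - dyhat_dx)) % should be very small
max(abs(d2y_fd - d2yhat_dx2)) % should be very small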


     Let's see what the future brings! If you want to collaborate on research projects related to turbo-machinery, aerodynamics, renewable energy and, well, machine learning, please reach out. Thank you very much for reading.
