Here is the code I wrote, using online tutorials, for machine learning for regression. In its current form, the code can predict complex functions with considerable accuracy! As in [Introduction and Feed Forward Back Propagating Neural Network], I have tried to add as many comments as possible. Update 04 finally has a while-loop 😊. The Adam optimizer is now implemented, and the code can now calculate the 1st and 2nd derivatives of the neural network output (Update 04)! Read the theory here.
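For reference, from Update 02 onward the network is a single-input, single-hidden-layer model with a linear output,

$$\hat{y}(x) = \sum_{j} w_{2,j}\,\sigma\!\left(w_{1,j}\,x + b_{1,j}\right) + b_2 .$$

For the sigmoid activation used in Update 04, the 1st and 2nd derivatives that the code evaluates follow directly from the chain rule:

$$\frac{d\hat{y}}{dx} = \sum_{j} w_{2,j}\,\sigma'(z_j)\,w_{1,j}, \qquad \frac{d^2\hat{y}}{dx^2} = \sum_{j} w_{2,j}\,\sigma''(z_j)\,w_{1,j}^2,$$

where $z_j = w_{1,j}\,x + b_{1,j}$, $\sigma'(z) = \sigma(z)\,(1-\sigma(z))$ and $\sigma''(z) = \sigma'(z)\,(1-2\,\sigma(z))$.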
Update 01:
The hyperbolic tangent activation is implemented and the code can now be trained to learn trigonometric functions!
clear; clc;
%% training data %%
x0=[-1*pi:pi/8:1*pi].'; %input data
y0=sin(x0); %expected output data
x=x0/max(x0); %normalized inputs
y=y0/max(y0); %normalized outputs
%% size of neural network %%
inputlayersize = 1; %input layers
outputlayersize = 1; %output layers
hiddenlayersize = 8; %hidden layers
%% initialize weights and biases; learning rate %%
w1=rand(inputlayersize, hiddenlayersize)-0.5; %weights input to hidden
w2=rand(hiddenlayersize,outputlayersize)-0.5; %weights hidden to output
b1=rand(1,hiddenlayersize)-0.5; %bias input to hidden
b2=rand(1,outputlayersize)-0.5; %bias hidden to output
lr = 0.1; %learning rate
%% forward propagation and training %%
for i=1:100000
z2=x*w1+b1;
a2=activation(z2); %hidden layer
z3=a2*w2+b2;
yhat=activation(z3); %final output (normalized)
delta3=-(y-yhat).*activationprime(z3);
DJW2= a2.'*delta3; %error hidden to output
delta2=delta3*w2.'.*activationprime(z2);
DJW1=x.'*delta2; %error weights input to hidden
w1=w1-(lr*DJW1); %updated weights input to hidden
w2=w2-(lr*DJW2); %updated weights hidden to output
b1=b1-(lr*mean(delta2)); %updated bias input to hidden
b2=b2-(lr*mean(delta3)); %updated bias hidden to output
end
%% plotting %%
yhat0=yhat*max(y0); %final outputs
hold on; grid on; box on, grid minor
set(gca,'FontSize',40)
set(gca, 'FontName', 'Times New Roman')
ylabel('sin (x)','FontSize',44)
xlabel('x','FontSize',44)
plot(x0,y0,'-','LineWidth',2,'MarkerSize',10)
plot(x0,yhat0,'+','LineWidth',2,'MarkerSize',10)
set(0,'DefaultLegendAutoUpdate','off')
legend({'Training Data','Neural Network Output'},'FontSize',44,'Location','Southeast')
%% activation %%
function [s]=activation(z)
%s=1./(1+exp(-z));
s=tanh(z);
end
%% derivative of activation %%
function [s]=activationprime(z)
%s=(exp(-z))./((1+exp(-z))).^2;
s=(sech(z)).^2;
end
Update 02:
Entirely new code that works even better than Update 01. It turns out that no activation function is required in the output layer.
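A quick sketch of why the linear output works for regression: with the squared-error loss used below, the error term at the output needs no activation derivative, and an unbounded output can match any real-valued target (a sigmoid or tanh output would squash predictions into a fixed range). For a single sample,

$$\hat{y} = a_2\,W_2 + b_2, \qquad J = \tfrac{1}{2}\left(\hat{y} - y\right)^2, \qquad \frac{\partial J}{\partial \hat{y}} = \hat{y} - y,$$

which is exactly the output_error term computed in the code.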
%% clear and close
clear
clc
close all
%% network parameters
input_size = 1; % number of input features
hidden_size = 5; % number of neurons in the hidden layer
output_size = 1; % number of output features
learning_rate = 0.01; % learning rate
epochs = 1e7; % maximum iterations
epoch = 0; % iteration counter
loss = inf; % initial loss
loss_min = 1e-5; % minimum loss
%% generate training data
X = 0:0.05:2;
X = X';
Y = cos(X) - sin(2*X);
%% initialization
rng(1)
W1 = rand(input_size, hidden_size); % input to hidden weights
b1 = rand(1, hidden_size); % input to hidden bias
W2 = rand(hidden_size, output_size); % hidden to output weights
b2 = rand(1, output_size); % output bias
%% training
while loss >= loss_min && epoch <= epochs
% forward pass
hidden_input = X * W1 + b1;
hidden_output = 1 ./ (1 + exp(-hidden_input)); % hidden layer output
output = hidden_output * W2 + b2; % output of neural network
% loss calculation
loss = 0.5 * mean((output - Y).^2); % half mean squared error
% backpropagation
output_error = output - Y; % error hidden to output
hidden_error = (output_error * W2') .* (hidden_output .* (1 - hidden_output)); % error input to hidden
% update weights and biases
W2 = W2 - learning_rate * (hidden_output' * output_error);
b2 = b2 - learning_rate * sum(output_error, 1);
W1 = W1 - learning_rate * (X' * hidden_error);
b1 = b1 - learning_rate * sum(hidden_error, 1);
epoch = epoch + 1; % iteration counter
end
%% testing and plotting
test_input = 0:0.025:2; % test data
test_input = test_input';
hidden_input = test_input * W1 + b1;
hidden_output = 1 ./ (1 + exp(-hidden_input));
predicted_output = hidden_output * W2 + b2; % prediction by neural network
% plotting
hold on; grid on; box on, grid minor
set(gca,'FontSize',40)
set(gca, 'FontName', 'Times New Roman')
plot(X, Y, 'k-', 'LineWidth', 1, 'DisplayName', 'Training','MarkerSize',10);
plot(test_input, predicted_output, 'ro', 'LineWidth', 1, 'DisplayName', 'Testing','MarkerSize',10);
xlim([0 max(X)])
ylim([min(Y) max(Y)])
legend('show');
xlabel('Input');
ylabel('Output');
Update 03:
Gradient descent with momentum is implemented, and noise is added to the training data.
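In compact form, the momentum update applied to every weight and bias θ in the training loop below is (with β = 0.9, as in the code):

$$v_t = \beta\,v_{t-1} + (1-\beta)\,\frac{\partial J}{\partial \theta}, \qquad \theta \leftarrow \theta - \mathrm{lr}\cdot v_t .$$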
%% clear and close all
close all
clear
clc
%% make and prepare training data
x0 = 0:0.2:9.6; % input data
x0 = x0';
y0 = x0.^2 .* sin(x0) + cos(2*x0) + 0.5 * x0; % expected output
noise_level = 20; % amount of noise (prevents overfitting)
rng(1) % random seed 1 (initialization is the same every time)
noise = noise_level * rand(size(y0)); % generate random noise
y0 = y0 + noise; % add noise
x = x0 / max(x0); % neural network variable name
y = y0 / max(y0); % neural network variable name
%% size of neural network
inputlayersize = 1; % input layer neurons
outputlayersize = 1; % output layer neurons
hiddenlayersize = 5; % hidden layer neurons
%% initialize weights, biases, learning rate, initial loss and epoch counter
w1 = rand(inputlayersize, hiddenlayersize); % weights input to hidden
w2 = rand(hiddenlayersize,outputlayersize); % weights hidden to output
b1 = rand(1,hiddenlayersize); % bias input to hidden
b2 = rand(1,outputlayersize); % bias hidden to output
vdb1 = zeros(size(b1)); % bias momentum input to hidden
vdb2 = zeros(size(b2)); % bias momentum hidden to output
vdw1 = zeros(size(w1)); % weights momentum input to hidden
vdw2 = zeros(size(w2)); % weights momentum hidden to output
B = 0.9; % beta
lr = 0.01; % learning rate
J = inf; % initial error
J_min = 1e-50; % minimum loss
epochs = 2e5; % maximum iterations
epoch = 0; % iteration counter
%% training
while J > J_min && epoch < epochs
% forward pass
z2 = x * w1 + b1;
a2 = activation(z2); % hidden layer output
z3 = a2 * w2 + b2;
yhat = z3; % neural network final output (normalized)
% yhat = (activation(x * w1 + b1) * w2 + b2) % neural network output (expanded form)
% error calculation
J = 0.5 * mean((y - yhat).^2); % loss function (half mean squared error)
% J = 0.5 * (y - (activation(x * w1 + b1) * w2 + b2)).^2 % loss function (expanded form)
% gradient descent with momentum
dJb2 = - (y - yhat); % change in loss function w.r.t. b2
DJW2 = a2.' * dJb2; % change in loss function w.r.t. w2
dJb1 = dJb2 * w2.' .* activationprime(z2); % change in loss function w.r.t. b1
DJW1 = x .' * dJb1; % change in loss function w.r.t. w1
vdb2 = B * vdb2 + (1 - B) * dJb2; % momentum term for b2
vdw2 = B * vdw2 + (1 - B) * DJW2; % momentum term for w2
vdb1 = B * vdb1 + (1 - B) * dJb1; % momentum term for b1
vdw1 = B * vdw1 + (1 - B) * DJW1; % momentum term for w1
% parameter update
b2 = b2 - (lr * mean(vdb2)); % updated bias hidden to output
w2 = w2 - (lr * vdw2); % updated weights hidden to output
b1 = b1 - (lr * mean(vdb1)); % updated bias input to hidden
w1 = w1 - (lr * vdw1); % updated weights input to hidden
epoch = epoch + 1; % iteration counter
end
%% testing and plotting
x_test = 0:0.3:9.6; % input data
x_test = x_test';
x_test = x_test / max(x_test);
z2 = x_test * w1 + b1;
a2 = activation(z2); % hidden layer output
z3 = a2 * w2 + b2;
yhat_test = z3; % neural network final output (normalized)
hold on; grid on; box on, grid minor
set(gca,'FontSize',40)
set(gca, 'FontName', 'Times New Roman')
ylabel('f (x)','FontSize',44)
xlabel('x','FontSize',44)
plot(x,y,'o','color',[0 0 0],'LineWidth',2,'MarkerSize',10)
plot(x_test,yhat_test,'-','color',[1 0 0],'LineWidth',2,'MarkerSize',10)
set(0,'DefaultLegendAutoUpdate','off')
legend({'Training Data','Neural Network Output'},'FontSize',44,'Location','Northwest')
%% activation
function [s]=activation(z)
s = 1 ./ (1 + exp(-z));
% s = tanh(z);
end
%% derivative of activation
function [s]=activationprime(z)
s = (exp(-z)) ./ ((1 + exp(-z))) .^2;
% s = (sech(z)) .^2;
end
Update 04:
The Adam optimizer is implemented and the code now also computes the 1st and 2nd derivatives of the neural network output with respect to the input.
%% clear and close all
close all
clear
clc
%% make and prepare training data
x = -1*pi:pi/32:1*pi; % input data
x = x';
y = x.^2 .* sin(x); % expected output
%% size of neural network
inputlayersize = 1; % input layer neurons
outputlayersize = 1; % output layer neurons
hiddenlayersize = 10; % hidden layer neurons
%% initialize weights, biases, learning rate, initial loss and epoch counter
rng(1) % random seed 1 (initialization is the same every time)
w1 = rand(inputlayersize, hiddenlayersize); % weights input to hidden
w2 = rand(hiddenlayersize,outputlayersize); % weights hidden to output
b1 = rand(1,hiddenlayersize); % bias input to hidden
b2 = rand(1,outputlayersize); % bias hidden to output
vdb1 = zeros(size(b1)); % bias momentum input to hidden
vdb2 = zeros(size(b2)); % bias momentum hidden to output
vdw1 = zeros(size(w1)); % weights momentum input to hidden
vdw2 = zeros(size(w2)); % weights momentum hidden to output
sdb1 = zeros(size(b1)); % bias RMS-prop input to hidden
sdb2 = zeros(size(b2)); % bias RMS-prop hidden to output
sdw1 = zeros(size(w1)); % weights RMS-prop input to hidden
sdw2 = zeros(size(w2)); % weights RMS-prop hidden to output
B1 = 0.9; % beta momentum
B2 = 0.999; % beta RMS-prop
E = 1e-8; % epsilon (avoids division by zero)
lr = 0.001; % learning rate
J = inf; % initial error
J_min = 1e-50; % minimum loss
epochs = 1e7; % maximum iterations
epoch = 0; % iteration counter
%% training
while J > J_min && epoch < epochs
% forward pass
z2 = x * w1 + b1;
a2 = activation(z2); % hidden layer output
z3 = a2 * w2 + b2;
yhat = z3; % neural network final output
% error calculation
J = 0.5 * mean((y - yhat).^2); % loss function (half mean squared error)
% gradient descent with momentum and RMSProp, a.k.a. Adam
dJb2 = - (y - yhat); % change in loss function w.r.t. b2
DJW2 = a2.' * dJb2; % change in loss function w.r.t. w2
dJb1 = dJb2 * w2.' .* activationprime(z2); % change in loss function w.r.t. b1
DJW1 = x .' * dJb1; % change in loss function w.r.t. w1
vdb2 = B1 * vdb2 + ((1 - B1) * dJb2); % momentum term for b2
vdw2 = B1 * vdw2 + ((1 - B1) * DJW2); % momentum term for w2
vdb1 = B1 * vdb1 + ((1 - B1) * dJb1); % momentum term for b1
vdw1 = B1 * vdw1 + ((1 - B1) * DJW1); % momentum term for w1
sdb2 = B2 * sdb2 + ((1 - B2) * dJb2.^2); % RMS-prop term for b2
sdw2 = B2 * sdw2 + ((1 - B2) * DJW2.^2); % RMS-prop term for w2
sdb1 = B2 * sdb1 + ((1 - B2) * dJb1.^2); % RMS-prop term for b1
sdw1 = B2 * sdw1 + ((1 - B2) * DJW1.^2); % RMS-prop term for w1
% parameter update (Adam)
b2 = b2 - (lr * (mean(vdb2) ./ (sqrt(mean(sdb2)) + E))); % updated bias hidden to output
w2 = w2 - (lr * (vdw2 ./ (sqrt(sdw2) + E))); % updated weights hidden to output
b1 = b1 - (lr * (mean(vdb1) ./ (sqrt(mean(sdb1)) + E))); % updated bias input to hidden
w1 = w1 - (lr * (vdw1 ./ (sqrt(sdw1) + E))); % updated weights input to hidden
epoch = epoch + 1; % iteration counter
end
%% testing and plotting
x_test = -1*pi:pi/90:1*pi; % input data
x_test = x_test';
z2_test = x_test * w1 + b1;
a2_test = activation(z2_test); % hidden layer output
z3_test = a2_test * w2 + b2;
yhat_test = z3_test; % neural network final output
dyhat_dx = w2.' .* activationprime(z2_test) * w1.'; % 1st derivative of output w.r.t. input
d2yhat_dx2 = w2.' .* (activationprime(z2_test) .* (1 - 2 * a2_test)) * (w1.^2).'; % 2nd derivative of output w.r.t. input
hold on; grid on; box on, grid minor
set(gca,'FontSize',40)
set(gca, 'FontName', 'Times New Roman')
ylabel('f (x)','FontSize',44)
xlabel('x','FontSize',44)
xlim([-1*pi 1*pi])
ylim([-4.5*pi 4.5*pi])
plot(x, y,'o','color',[1 0 0],'LineWidth',2,'MarkerSize',10)
plot(x_test, yhat_test,'-','color',[1 0 0],'LineWidth',2,'MarkerSize',10)
plot(x, 2 * x .* sin(x) + x.^2 .* cos(x),'s','color',[0 1 0],'LineWidth',2,'MarkerSize',10)
plot(x_test, dyhat_dx,'-.','color',[0 1 0],'LineWidth',2,'MarkerSize',10)
plot(x, 4 * x .* cos(x) - (x.^2 - 2) .* sin(x),'x','color',[0 0 1],'LineWidth',2,'MarkerSize',10)
plot(x_test, d2yhat_dx2,'--','color',[0 0 1],'LineWidth',2,'MarkerSize',10)
set(0,'DefaultLegendAutoUpdate','off')
legend({'Training Data','Neural Network Output','First Derivative Analytical','First Derivative Neural Network Output',...
'Second Derivative Analytical','Second Derivative Neural Network Output'},'FontSize',10,'Location','Northoutside')
%% activation
function [s]=activation(z)
s = 1 ./ (1 + exp(-z));
% s = tanh(z);
% c = 0; % adjust as needed
% beta = 0.010; % adjust as needed
% s = exp(-beta * (z - c).^2);
end
%% derivative of activation
function [s]=activationprime(z)
s = activation(z) .* (1 - activation(z));
% s = (sech(z)) .^2;
% c = 0; % adjust as needed
% beta = 0.010; % adjust as needed
% s = -2 * beta * (z - c) .* activation(z);
end
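For reference, the Adam-style update applied to every parameter θ in the training loop above combines the momentum and RMSProp accumulators (β₁ = 0.9, β₂ = 0.999, ε = 1e-8, as in the code):

$$v_t = \beta_1\,v_{t-1} + (1-\beta_1)\,g_t, \qquad s_t = \beta_2\,s_{t-1} + (1-\beta_2)\,g_t^2, \qquad \theta \leftarrow \theta - \mathrm{lr}\,\frac{v_t}{\sqrt{s_t} + \epsilon},$$

where $g_t = \partial J / \partial \theta$. The bias-correction factors $1/(1-\beta_1^t)$ and $1/(1-\beta_2^t)$ from the standard Adam formulation are not applied here; the difference only matters in the early iterations.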
Let's see what the future brings! If you want to collaborate on research projects related to turbo-machinery, aerodynamics, renewable energy and, well, machine learning, please reach out. Thank you very much for reading.