Creating a Custom Neural Network with Python from Scratch
Building a neural network from scratch in Python can be a great way to learn about deep learning and understand how neural networks work. Here is a general outline of the steps to build a simple neural network in Python:
1. Import necessary libraries: You will need libraries such as NumPy for matrix operations and Matplotlib for visualizing the results.
2. Prepare the data: This includes loading the data, preprocessing it, and splitting it into training and testing sets (a minimal split is sketched after this overview).
3. Define the model architecture: This includes defining the input, hidden, and output layers. You can also specify the activation function to use and the loss function for training.
4. Initialize the weights: Randomly initialize the weights of the model.
5. Feedforward: Implement the feedforward calculation, where the inputs are passed through the model to get the output.
6. Calculate the loss: Compute the loss between the predicted output and the actual output, for example mean squared error or cross-entropy.
7. Backpropagation: Implement the backpropagation algorithm to update the weights and reduce the loss.
8. Train the model: Train the model for a specified number of epochs, updating the weights after each iteration.
9. Evaluate the model: Evaluate the model on the test data to get the accuracy.
10. Make predictions: Use the trained model to make predictions on new data.
This is a high-level overview of the steps involved in building a neural network from scratch in Python. Understanding the underlying concepts and math is essential before implementing a neural network.
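For example, the data-preparation step (step 2) might look like the following minimal sketch; the random data and the 80/20 split ratio are placeholders rather than part of any particular dataset:

import numpy as np

# Hypothetical dataset: 100 samples, 3 features, binary labels
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=(100, 1))

# Shuffle the samples, then hold out 20% of them for testing
indices = np.random.permutation(X.shape[0])
split = int(0.8 * X.shape[0])
X_train, y_train = X[indices[:split]], y[indices[:split]]
X_test, y_test = X[indices[split:]], y[indices[split:]]

Putting the remaining steps together on a toy problem, a complete minimal network can look like this: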
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Assumes x is already a sigmoid output, so sigma'(z) = sigma(z) * (1 - sigma(z))
    return x * (1 - x)

class NeuralNetwork:
    def __init__(self, x, y):
        self.input = x
        self.weights1 = np.random.rand(self.input.shape[1], 4)  # input layer -> 4 hidden units
        self.weights2 = np.random.rand(4, 1)                    # hidden layer -> 1 output unit
        self.y = y
        self.output = np.zeros(y.shape)

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self):
        # Gradients of the squared-error loss with respect to each weight matrix
        d_weights2 = np.dot(self.layer1.T, 2 * (self.y - self.output) * sigmoid_derivative(self.output))
        d_weights1 = np.dot(self.input.T, np.dot(2 * (self.y - self.output) * sigmoid_derivative(self.output), self.weights2.T) * sigmoid_derivative(self.layer1))
        self.weights1 += d_weights1
        self.weights2 += d_weights2

    def train(self, X, y):
        self.output = np.zeros(y.shape)
        self.input = X
        self.y = y
        self.feedforward()
        self.backprop()

# XOR of the first two inputs; the third column acts as a bias input
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])
nn = NeuralNetwork(X, y)
for i in range(1500):
    nn.train(X, y)
print(nn.output)
This code builds a simple feedforward neural network with a single hidden layer and trains it on the XOR problem. The network has an input layer, two weight matrices, and an output layer. The sigmoid function and its derivative, sigmoid_derivative, are used as the activation function and its gradient. The feedforward method implements the forward pass, and the backprop method implements the backpropagation algorithm that updates the weights. The train method performs a single training iteration; the model is trained for 1500 iterations and the final output is printed.
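To carry out step 10 of the outline (making predictions), the sigmoid outputs can be thresholded at 0.5. A minimal sketch, assuming the nn object trained above:

# Round the sigmoid outputs to get binary XOR predictions
predictions = (nn.output > 0.5).astype(int)
print(predictions)                      # expected: [[0], [1], [1], [0]]
print("accuracy:", np.mean(predictions == y))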
A more advanced example is the following convolutional network:
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    x[x <= 0] = 0
    x[x > 0] = 1
    return x
class ConvolutionalNeuralNetwork:
    def __init__(self, input_shape, num_filters, filter_size, pool_size):
        self.input_shape = input_shape
        self.num_filters = num_filters
        self.filter_size = filter_size
        self.pool_size = pool_size
        self.weights = np.random.rand(num_filters, filter_size, filter_size) / (filter_size * filter_size)
        self.bias = np.zeros((num_filters, 1))

    def convolution(self, input_data):
        self.feature_maps = np.zeros((input_data.shape[0] - self.filter_size + 1,
                                      input_data.shape[1] - self.filter_size + 1,
                                      self.num_filters))
        for filter_idx in range(self.num_filters):
            current_filter = self.weights[filter_idx]
            for i in range(self.feature_maps.shape[0]):
                for j in range(self.feature_maps.shape[1]):
                    self.feature_maps[i][j][filter_idx] = np.sum(
                        input_data[i:i+self.filter_size, j:j+self.filter_size] * current_filter)
        return relu(self.feature_maps + self.bias)

    def max_pooling(self, feature_maps):
        self.pooled_features = np.zeros((int(feature_maps.shape[0] / self.pool_size),
                                         int(feature_maps.shape[1] / self.pool_size),
                                         self.num_filters))
        for filter_idx in range(self.num_filters):
            for i in range(0, feature_maps.shape[0], self.pool_size):
                for j in range(0, feature_maps.shape[1], self.pool_size):
                    self.pooled_features[int(i/self.pool_size)][int(j/self.pool_size)][filter_idx] = \
                        np.max(feature_maps[i:i+self.pool_size, j:j+self.pool_size, filter_idx])
        return self.pooled_features

    def feedforward(self, input_data):
        conv_out = self.convolution(input_data)
        pool_out = self.max_pooling(conv_out)
        return pool_out

    def backprop(self, input_data, gradient_signal):
        pool_out_grad = np.zeros(self.feature_maps.shape)
        for filter_idx in range(self.num_filters):
            for i in range(0, self.feature_maps.shape[0], self.pool_size):
                for j in range(0, self.feature_maps.shape[1], self.pool_size):
                    pool_out_grad[i:i+self.pool_size, j:j+self.pool_size, filter_idx] = \
                        gradient_signal[int(i/self.pool_size)][int(j/self.pool_size)][filter_idx]
        conv_out_grad = relu_derivative(self.feature_maps) * pool_out_grad
        self.weights_grad = np.zeros(self.weights.shape)
        self.bias_grad = np.zeros(self.bias.shape)
        for filter_idx in range(self.num_filters):
            for i in range(self.feature_maps.shape[0]):
                for j in range(self.feature_maps.shape[1]):
                    self.weights_grad[filter_idx] += input_data[i:i+self.filter_size, j:j+self.filter_size] * conv_out_grad[i][j][filter_idx]
                    self.bias_grad[filter_idx] += conv_out_grad[i][j][filter_idx]
        return conv_out_grad

    def update(self, learning_rate):
        self.weights -= learning_rate * self.weights_grad
        self.bias -= learning_rate * self.bias_grad
class FullyConnectedNeuralNetwork:
    def __init__(self, input_shape, num_classes):
        self.input_shape = input_shape
        self.num_classes = num_classes
        self.weights = np.random.rand(input_shape, num_classes) / input_shape
        self.bias = np.zeros((num_classes, 1))

    def feedforward(self, input_data):
        self.input_data = input_data
        return relu(np.dot(input_data, self.weights) + self.bias)

    def backprop(self, gradient_signal):
        self.weights_grad = np.dot(self.input_data.T, gradient_signal)
        self.bias_grad = np.sum(gradient_signal, axis=0, keepdims=True)
        return np.dot(gradient_signal, self.weights.T)

    def update(self, learning_rate):
        self.weights -= learning_rate * self.weights_grad
        self.bias -= learning_rate * self.bias_grad
class Softmax:
    def __init__(self):
        pass

    def feedforward(self, input_data):
        self.input_data = input_data
        exp_values = np.exp(input_data - np.max(input_data, axis=1, keepdims=True))
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        return probabilities

    def backprop(self, gradient_signal):
        return gradient_signal

    def update(self, learning_rate):
        pass

class CrossEntropyLoss:
    def __init__(self):
        pass

    def feedforward(self, input_data, target):
        self.input_data = input_data
        self.target = target
        return -np.sum(target * np.log(input_data + 1e-8))

    def backprop(self):
        return -self.target / self.input_data

    def update(self, learning_rate):
        pass
class ConvolutionalNeuralNetworkClassifier:
    def __init__(self, input_shape, num_classes, num_filters, filter_size, pool_size):
        self.convolutional_neural_network = ConvolutionalNeuralNetwork(input_shape, num_filters, filter_size, pool_size)
        self.fully_connected_neural_network = FullyConnectedNeuralNetwork(
            int((input_shape[0] - filter_size + 1) / pool_size) *
            int((input_shape[1] - filter_size + 1) / pool_size) * num_filters,
            num_classes)
        self.softmax = Softmax()
        self.cross_entropy_loss = CrossEntropyLoss()

    def feedforward(self, input_data):
        # The convolutional network's feedforward already applies max pooling
        pool_out = self.convolutional_neural_network.feedforward(input_data)
        fc_out = self.fully_connected_neural_network.feedforward(pool_out.reshape(pool_out.shape[0], -1))
        return self.softmax.feedforward(fc_out)

    def backprop(self, gradient_signal):
        softmax_grad = self.softmax.backprop(gradient_signal)
        fc_grad = self.fully_connected_neural_network.backprop(softmax_grad)
        conv_grad = self.convolutional_neural_network.backprop(
            self.convolutional_neural_network.pooled_features.reshape(
                self.convolutional_neural_network.pooled_features.shape[0], -1),
            fc_grad.reshape(self.convolutional_neural_network.pooled_features.shape))
        return conv_grad

    def update(self, learning_rate):
        self.convolutional_neural_network.update(learning_rate)
        self.fully_connected_neural_network.update(learning_rate)

    def train(self, input_data, target, learning_rate):
        output = self.feedforward(input_data)
        loss = self.cross_entropy_loss.feedforward(output, target)
        gradient = self.cross_entropy_loss.backprop()
        self.backprop(gradient)
        self.update(learning_rate)
        return loss

    def predict(self, input_data):
        output = self.feedforward(input_data)
        return np.argmax(output, axis=1)
def train(model, input_data, target, learning_rate):
    loss = model.train(input_data, target, learning_rate)
    return loss

def test(model, input_data, target):
    predictions = model.predict(input_data)
    accuracy = np.mean(predictions == np.argmax(target, axis=1))
    return accuracy

def main():
    # Load data (load_data is assumed to be provided elsewhere and to return
    # 28x28 images plus one-hot targets)
    train_data, train_target, test_data, test_target = load_data()
    # Create model
    model = ConvolutionalNeuralNetworkClassifier(input_shape=(28, 28), num_classes=10,
                                                 num_filters=8, filter_size=3, pool_size=2)
    # Train model
    for epoch in range(10):
        print("Epoch: ", epoch)
        for i in range(0, train_data.shape[0], 32):
            loss = train(model, train_data[i:i+32], train_target[i:i+32], learning_rate=0.01)
            print("Loss: ", loss)
        accuracy = test(model, test_data, test_target)
        print("Accuracy: ", accuracy)

if __name__ == "__main__":
    main()
The implementation looks fairly complete for a Convolutional Neural Network (CNN) and a Fully Connected Neural Network (FCN) with a ReLU activation function. Here are a few observations and suggestions to improve the implementation:
The ReLU activation function is implemented correctly, and its derivative is indeed a step function whose value is 0 for non-positive inputs and 1 for positive inputs. The issue is that relu_derivative modifies its argument in place, so calling it on self.feature_maps during backpropagation overwrites the stored feature maps. A version that returns a new array instead of mutating its input avoids this:
def relu_derivative(x):
    # Return a fresh array of 0s and 1s rather than modifying x in place
    return (x > 0).astype(float)
1. The convolutional layer does not use zero-padding, a common technique in CNNs for preserving the spatial dimensions of the input (a minimal padding sketch follows this list).
2. The backpropagation implementation for the convolutional layer never computes the gradient with respect to the input data, which would be needed if another layer preceded it. Each output gradient can be distributed back over the input patch it came from, which is equivalent to a "full" convolution with the flipped (180-degree-rotated) filter (sketched after this list).
3. The convolutional bias has shape (num_filters, 1), which does not broadcast correctly against the (height, width, num_filters) feature maps; a shape of (num_filters,) would.
4. The fully connected layer applies ReLU to its output even though that output feeds directly into the softmax; the final layer would normally emit raw logits, because zeroing out negative logits distorts the class probabilities.
5. The backpropagation implementation for the fully connected layer ignores the ReLU derivative, so the nonlinearity is not accounted for in the backward pass.
6. The fully connected bias has shape (num_classes, 1) while the bias gradient computed in backprop has shape (1, num_classes); the two shapes should be made consistent before the update step.
7. The pooled feature maps are reshaped with pool_out.reshape(pool_out.shape[0], -1), which treats the first spatial dimension as a batch dimension; for a single image they should instead be flattened into one row of length height * width * num_filters before entering the fully connected layer.
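As a sketch of the zero-padding mentioned in point 1, assuming "same" padding (the output keeps the input's spatial size) and an odd filter_size:

def zero_pad(input_data, filter_size):
    # Pad a 2D input with zeros so that a valid convolution preserves its size
    pad = (filter_size - 1) // 2
    return np.pad(input_data, ((pad, pad), (pad, pad)), mode="constant")

The padded image would then be passed to convolution in place of the raw input.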
Overall, the implementation provides a good foundation for building a CNN and an FCN. However, the missing parts should be added to complete the implementation.
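Finally, for point 2, this is a rough sketch of the gradient with respect to the input, written with explicit loops to match the style of the code above; the function name and arguments are illustrative rather than part of the original classes:

def input_gradient(conv_out_grad, weights, input_shape, filter_size, num_filters):
    # Accumulate the gradient of the loss with respect to the convolution input
    input_grad = np.zeros(input_shape)
    out_h, out_w = conv_out_grad.shape[0], conv_out_grad.shape[1]
    for f in range(num_filters):
        for i in range(out_h):
            for j in range(out_w):
                # Output position (i, j, f) was computed from the input patch starting
                # at (i, j), so its gradient flows back into that patch through the filter
                input_grad[i:i+filter_size, j:j+filter_size] += weights[f] * conv_out_grad[i, j, f]
    return input_grad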