Creating a Custom Neural Network with Python from Scratch
Building a neural network from scratch in Python can be a great way to learn about deep learning and understand how neural networks work. Here is a general outline of the steps to build a simple neural network in Python:
1. Import necessary libraries: You will need libraries such as NumPy for matrix operations and Matplotlib for visualizing the results.
2. Prepare the data: This includes loading the data, preprocessing it, and splitting it into training and testing sets (a minimal split is sketched after this overview).
3. Define the model architecture: This includes defining the input, hidden, and output layers. You can also specify the activation function to use and the loss function for training.
4. Initialize the weights: Randomly initialize the weights of the model.
5. Feedforward: Implement the feedforward calculation, where the inputs are passed through the model to get the output.
6. Calculate the loss: Compute the loss between the predicted output and the actual output, for example mean squared error or cross-entropy.
7. Backpropagation: Implement the backpropagation algorithm to update the weights and reduce the loss.
8. Train the model: Train the model for a specified number of epochs, updating the weights after each iteration.
9. Evaluate the model: Evaluate the model on the test data to get the accuracy.
10. Make predictions: Use the trained model to make predictions on new data.
This is a high-level overview of the steps involved in building a neural network from scratch in Python. Understanding the underlying concepts and math is essential before implementing a neural network.
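For example, the data-preparation step (step 2) might look like the following minimal sketch; the random data and the 80/20 split ratio are placeholders rather than part of any particular dataset:

import numpy as np

# Hypothetical dataset: 100 samples, 3 features, binary labels
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=(100, 1))

# Shuffle the samples, then hold out 20% of them for testing
indices = np.random.permutation(X.shape[0])
split = int(0.8 * X.shape[0])
X_train, y_train = X[indices[:split]], y[indices[:split]]
X_test, y_test = X[indices[split:]], y[indices[split:]]

Putting the remaining steps together on a toy problem, a complete minimal network can look like this: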
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Assumes x is already a sigmoid output, so sigma'(z) = sigma(z) * (1 - sigma(z))
    return x * (1 - x)

class NeuralNetwork:
    def __init__(self, x, y):
        self.input = x
        self.weights1 = np.random.rand(self.input.shape[1], 4)  # input layer -> 4 hidden units
        self.weights2 = np.random.rand(4, 1)                    # hidden layer -> 1 output unit
        self.y = y
        self.output = np.zeros(y.shape)

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self):
        # Gradients of the squared-error loss with respect to each weight matrix
        d_weights2 = np.dot(self.layer1.T, 2 * (self.y - self.output) * sigmoid_derivative(self.output))
        d_weights1 = np.dot(self.input.T, np.dot(2 * (self.y - self.output) * sigmoid_derivative(self.output), self.weights2.T) * sigmoid_derivative(self.layer1))
        self.weights1 += d_weights1
        self.weights2 += d_weights2

    def train(self, X, y):
        self.output = np.zeros(y.shape)
        self.input = X
        self.y = y
        self.feedforward()
        self.backprop()

# XOR of the first two inputs; the third column acts as a bias input
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])
nn = NeuralNetwork(X, y)
for i in range(1500):
    nn.train(X, y)
print(nn.output)
This code builds a simple feedforward neural network with a single hidden layer and trains it on the XOR problem. The network has an input layer, two weight matrices, and an output layer. The sigmoid function and its derivative, sigmoid_derivative, are used as the activation function and its gradient. The feedforward method implements the forward pass, and the backprop method implements the backpropagation algorithm that updates the weights. The train method performs a single training iteration; the model is trained for 1500 iterations and the final output is printed.
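To carry out step 10 of the outline (making predictions), the sigmoid outputs can be thresholded at 0.5. A minimal sketch, assuming the nn object trained above:

# Round the sigmoid outputs to get binary XOR predictions
predictions = (nn.output > 0.5).astype(int)
print(predictions)                      # expected: [[0], [1], [1], [0]]
print("accuracy:", np.mean(predictions == y))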
A more advanced example is the following convolutional network:
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    x[x <= 0] = 0
    x[x > 0] = 1
    return x
class ConvolutionalNeuralNetwork:
    def __init__(self, input_shape, num_filters, filter_size, pool_size):
        self.input_shape = input_shape
        self.num_filters = num_filters
        self.filter_size = filter_size
        self.pool_size = pool_size
        self.weights = np.random.rand(num_filters, filter_size, filter_size) / (filter_size * filter_size)
        self.bias = np.zeros((num_filters, 1))

    def convolution(self, input_data):
        self.feature_maps = np.zeros((input_data.shape[0] - self.filter_size + 1,
                                      input_data.shape[1] - self.filter_size + 1,
                                      self.num_filters))
        for filter_idx in range(self.num_filters):
            current_filter = self.weights[filter_idx]
            for i in range(self.feature_maps.shape[0]):
                for j in range(self.feature_maps.shape[1]):
                    self.feature_maps[i][j][filter_idx] = np.sum(
                        input_data[i:i+self.filter_size, j:j+self.filter_size] * current_filter)
        return relu(self.feature_maps + self.bias)

    def max_pooling(self, feature_maps):
        self.pooled_features = np.zeros((int(feature_maps.shape[0] / self.pool_size),
                                         int(feature_maps.shape[1] / self.pool_size),
                                         self.num_filters))
        for filter_idx in range(self.num_filters):
            for i in range(0, feature_maps.shape[0], self.pool_size):
                for j in range(0, feature_maps.shape[1], self.pool_size):
                    self.pooled_features[int(i/self.pool_size)][int(j/self.pool_size)][filter_idx] = \
                        np.max(feature_maps[i:i+self.pool_size, j:j+self.pool_size, filter_idx])
        return self.pooled_features

    def feedforward(self, input_data):
        conv_out = self.convolution(input_data)
        pool_out = self.max_pooling(conv_out)
        return pool_out

    def backprop(self, input_data, gradient_signal):
        pool_out_grad = np.zeros(self.feature_maps.shape)
        for filter_idx in range(self.num_filters):
            for i in range(0, self.feature_maps.shape[0], self.pool_size):
                for j in range(0, self.feature_maps.shape[1], self.pool_size):
                    pool_out_grad[i:i+self.pool_size, j:j+self.pool_size, filter_idx] = \
                        gradient_signal[int(i/self.pool_size)][int(j/self.pool_size)][filter_idx]
        conv_out_grad = relu_derivative(self.feature_maps) * pool_out_grad
        self.weights_grad = np.zeros(self.weights.shape)
        self.bias_grad = np.zeros(self.bias.shape)
        for filter_idx in range(self.num_filters):
            for i in range(self.feature_maps.shape[0]):
                for j in range(self.feature_maps.shape[1]):
                    self.weights_grad[filter_idx] += input_data[i:i+self.filter_size, j:j+self.filter_size] * conv_out_grad[i][j][filter_idx]
                    self.bias_grad[filter_idx] += conv_out_grad[i][j][filter_idx]
        return conv_out_grad

    def update(self, learning_rate):
        self.weights -= learning_rate * self.weights_grad
        self.bias -= learning_rate * self.bias_grad
class FullyConnectedNeuralNetwork:
    def __init__(self, input_shape, num_classes):
        self.input_shape = input_shape
        self.num_classes = num_classes
        self.weights = np.random.rand(input_shape, num_classes) / input_shape
        self.bias = np.zeros((num_classes, 1))

    def feedforward(self, input_data):
        self.input_data = input_data
        return relu(np.dot(input_data, self.weights) + self.bias)

    def backprop(self, gradient_signal):
        self.weights_grad = np.dot(self.input_data.T, gradient_signal)
        self.bias_grad = np.sum(gradient_signal, axis=0, keepdims=True)
        return np.dot(gradient_signal, self.weights.T)

    def update(self, learning_rate):
        self.weights -= learning_rate * self.weights_grad
        self.bias -= learning_rate * self.bias_grad
class Softmax:
    def __init__(self):
        pass

    def feedforward(self, input_data):
        self.input_data = input_data
        exp_values = np.exp(input_data - np.max(input_data, axis=1, keepdims=True))
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        return probabilities

    def backprop(self, gradient_signal):
        return gradient_signal

    def update(self, learning_rate):
        pass

class CrossEntropyLoss:
    def __init__(self):
        pass

    def feedforward(self, input_data, target):
        self.input_data = input_data
        self.target = target
        return -np.sum(target * np.log(input_data + 1e-8))

    def backprop(self):
        return -self.target / self.input_data

    def update(self, learning_rate):
        pass
class ConvolutionalNeuralNetworkClassifier:
    def __init__(self, input_shape, num_classes, num_filters, filter_size, pool_size):
        self.convolutional_neural_network = ConvolutionalNeuralNetwork(input_shape, num_filters, filter_size, pool_size)
        self.fully_connected_neural_network = FullyConnectedNeuralNetwork(
            int((input_shape[0] - filter_size + 1) / pool_size) *
            int((input_shape[1] - filter_size + 1) / pool_size) * num_filters,
            num_classes)
        self.softmax = Softmax()
        self.cross_entropy_loss = CrossEntropyLoss()

    def feedforward(self, input_data):
        # The convolutional network's feedforward already applies max pooling
        pool_out = self.convolutional_neural_network.feedforward(input_data)
        fc_out = self.fully_connected_neural_network.feedforward(pool_out.reshape(pool_out.shape[0], -1))
        return self.softmax.feedforward(fc_out)

    def backprop(self, gradient_signal):
        softmax_grad = self.softmax.backprop(gradient_signal)
        fc_grad = self.fully_connected_neural_network.backprop(softmax_grad)
        conv_grad = self.convolutional_neural_network.backprop(
            self.convolutional_neural_network.pooled_features.reshape(
                self.convolutional_neural_network.pooled_features.shape[0], -1),
            fc_grad.reshape(self.convolutional_neural_network.pooled_features.shape))
        return conv_grad

    def update(self, learning_rate):
        self.convolutional_neural_network.update(learning_rate)
        self.fully_connected_neural_network.update(learning_rate)

    def train(self, input_data, target, learning_rate):
        output = self.feedforward(input_data)
        loss = self.cross_entropy_loss.feedforward(output, target)
        gradient = self.cross_entropy_loss.backprop()
        self.backprop(gradient)
        self.update(learning_rate)
        return loss

    def predict(self, input_data):
        output = self.feedforward(input_data)
        return np.argmax(output, axis=1)
def train(model, input_data, target, learning_rate):
    loss = model.train(input_data, target, learning_rate)
    return loss

def test(model, input_data, target):
    predictions = model.predict(input_data)
    accuracy = np.mean(predictions == np.argmax(target, axis=1))
    return accuracy

def main():
    # Load data (load_data is assumed to be provided elsewhere and to return
    # 28x28 images plus one-hot targets)
    train_data, train_target, test_data, test_target = load_data()
    # Create model
    model = ConvolutionalNeuralNetworkClassifier(input_shape=(28, 28), num_classes=10,
                                                 num_filters=8, filter_size=3, pool_size=2)
    # Train model
    for epoch in range(10):
        print("Epoch: ", epoch)
        for i in range(0, train_data.shape[0], 32):
            loss = train(model, train_data[i:i+32], train_target[i:i+32], learning_rate=0.01)
            print("Loss: ", loss)
        accuracy = test(model, test_data, test_target)
        print("Accuracy: ", accuracy)

if __name__ == "__main__":
    main()
The implementation looks fairly complete for a Convolutional Neural Network (CNN) and a Fully Connected Neural Network (FCN) with a ReLU activation function. Here are a few observations and suggestions to improve the implementation:
The ReLU activation function is implemented correctly, and its derivative is indeed a step function whose value is 0 for non-positive inputs and 1 for positive inputs. The issue is that relu_derivative modifies its argument in place, so calling it on self.feature_maps during backpropagation overwrites the stored feature maps. A version that returns a new array instead of mutating its input avoids this:
def relu_derivative(x):
    # Return a fresh array of 0s and 1s rather than modifying x in place
    return (x > 0).astype(float)
1. The convolutional layer does not use zero-padding, a common technique in CNNs for preserving the spatial dimensions of the input (a minimal padding sketch follows this list).
2. The backpropagation implementation for the convolutional layer never computes the gradient with respect to the input data, which would be needed if another layer preceded it. Each output gradient can be distributed back over the input patch it came from, which is equivalent to a "full" convolution with the flipped (180-degree-rotated) filter (sketched after this list).
3. The convolutional bias has shape (num_filters, 1), which does not broadcast correctly against the (height, width, num_filters) feature maps; a shape of (num_filters,) would.
4. The fully connected layer applies ReLU to its output even though that output feeds directly into the softmax; the final layer would normally emit raw logits, because zeroing out negative logits distorts the class probabilities.
5. The backpropagation implementation for the fully connected layer ignores the ReLU derivative, so the nonlinearity is not accounted for in the backward pass.
6. The fully connected bias has shape (num_classes, 1) while the bias gradient computed in backprop has shape (1, num_classes); the two shapes should be made consistent before the update step.
7. The pooled feature maps are reshaped with pool_out.reshape(pool_out.shape[0], -1), which treats the first spatial dimension as a batch dimension; for a single image they should instead be flattened into one row of length height * width * num_filters before entering the fully connected layer.
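As a sketch of the zero-padding mentioned in point 1, assuming "same" padding (the output keeps the input's spatial size) and an odd filter_size:

def zero_pad(input_data, filter_size):
    # Pad a 2D input with zeros so that a valid convolution preserves its size
    pad = (filter_size - 1) // 2
    return np.pad(input_data, ((pad, pad), (pad, pad)), mode="constant")

The padded image would then be passed to convolution in place of the raw input.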
Overall, the implementation provides a good foundation for building a CNN and an FCN. However, the missing parts should be added to complete the implementation.
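Finally, for point 2, this is a rough sketch of the gradient with respect to the input, written with explicit loops to match the style of the code above; the function name and arguments are illustrative rather than part of the original classes:

def input_gradient(conv_out_grad, weights, input_shape, filter_size, num_filters):
    # Accumulate the gradient of the loss with respect to the convolution input
    input_grad = np.zeros(input_shape)
    out_h, out_w = conv_out_grad.shape[0], conv_out_grad.shape[1]
    for f in range(num_filters):
        for i in range(out_h):
            for j in range(out_w):
                # Output position (i, j, f) was computed from the input patch starting
                # at (i, j), so its gradient flows back into that patch through the filter
                input_grad[i:i+filter_size, j:j+filter_size] += weights[f] * conv_out_grad[i, j, f]
    return input_grad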