You Only Look Once

Mehmet Akif Cifci
5 min read · Feb 18, 2023


YOLO (You Only Look Once) is an object detection algorithm that can detect objects in images and videos in real time. It was first introduced in 2016 by Joseph Redmon et al. YOLO divides an image into a grid of cells and predicts bounding boxes and class probabilities for each cell. It then uses non-max suppression to filter out overlapping bounding boxes and produces the final set of detections.
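The filtering step can be illustrated with a short greedy non-max suppression sketch. This is a simplified stand-in for what library routines such as cv2.dnn.NMSBoxes do in the example further down, not YOLO's exact implementation:

import numpy as np

def iou(box_a, box_b):
    # Boxes are [x1, y1, x2, y2]; returns intersection-over-union
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    # Greedily keep the best-scoring box, then drop remaining boxes
    # that overlap it by more than the threshold; repeat until done.
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = np.array([i for i in rest if iou(boxes[best], boxes[i]) < iou_threshold])
    return keep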

In YOLOv3, a later version of the algorithm, several improvements were made to enhance its accuracy and speed. One of the most significant was predicting boxes at three different scales through a feature-pyramid-style network, which lets the algorithm detect objects of varying sizes and resolutions. Another was the introduction of residual connections in the Darknet-53 backbone, which allowed the model to learn more complex features and improved its accuracy. The example below runs YOLOv3 inference with OpenCV's DNN module; it assumes the files yolov3.weights, yolov3.cfg, and coco.names, plus a test image, are available locally.

import cv2
import numpy as np

# Load the YOLOv3 network and the COCO class labels
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]
layer_names = net.getLayerNames()
# flatten() handles both the old (Nx1) and new (1-D) return formats of OpenCV
output_layers = [layer_names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]
colors = np.random.uniform(0, 255, size=(len(classes), 3))

# Load and resize the input image
img = cv2.imread("room_ser.jpg")
img = cv2.resize(img, None, fx=0.4, fy=0.4)
height, width, channels = img.shape

# Build a blob (scaled by 1/255, 416x416, BGR -> RGB) and run a forward pass
blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)

# Collect class IDs, confidences, and bounding boxes from the raw detections
class_ids = []
confidences = []
boxes = []
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            # Detection coordinates are relative to the image; scale them back
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)

            # Convert from center coordinates to the top-left corner
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)

            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

# Apply non-max suppression to discard overlapping boxes
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
font = cv2.FONT_HERSHEY_PLAIN
for i in range(len(boxes)):
    if i in indexes:
        x, y, w, h = boxes[i]
        label = str(classes[class_ids[i]])
        color = colors[class_ids[i]]  # one consistent color per class
        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
        cv2.putText(img, label, (x, y + 30), font, 3, color, 3)

cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

YOLOv4, released in 2020 by Alexey Bochkovskiy and colleagues, introduced several further innovations to improve performance. One of the most significant was the Mish activation function, which performed better than earlier activation functions such as ReLU and Leaky ReLU. Another was the use of Spatial Pyramid Pooling, which allows the model to capture more context and improves its accuracy.
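For reference, Mish is defined as x * tanh(softplus(x)); a minimal NumPy sketch of the activation (not the actual Darknet implementation) is:

import numpy as np

def mish(x):
    # Mish(x) = x * tanh(softplus(x)); logaddexp gives a numerically stable softplus
    return x * np.tanh(np.logaddexp(0.0, x))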

In YOLOv5, a newer version of the algorithm released by Ultralytics in 2020, several improvements were made to enhance its accuracy and speed even further. One of the most significant was a backbone built on CSPNet (Cross Stage Partial Network), which makes more efficient use of computational resources and improved the model's accuracy.
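The core idea behind CSP connections is to split the feature map along its channels, run only one part through the convolutional stage, and concatenate the result with the untouched part. A simplified PyTorch sketch of that idea, not YOLOv5's exact module, could look like this:

import torch
import torch.nn as nn

class SimpleCSPBlock(nn.Module):
    # One half of the channels passes through conv layers, the other half
    # is carried over unchanged, then both halves are concatenated and fused.
    # Assumes an even channel count; purely illustrative.
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.partial = nn.Sequential(
            nn.Conv2d(half, half, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU(),
        )
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        a, b = torch.chunk(x, 2, dim=1)  # split along the channel dimension
        return self.fuse(torch.cat([self.partial(a), b], dim=1))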

Another improvement was AutoAnchor, which learns anchor box sizes automatically from the training data and helps improve the localization accuracy of the bounding boxes. YOLOv5 also introduced a new method of augmenting the training data called Mosaic data augmentation, which combines multiple images into a single training sample to increase the diversity of the training set, as sketched below.
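The mosaic idea itself is easy to illustrate. The hypothetical helper below simply tiles four images into a single 2x2 canvas; the real augmentation also picks a random center point and remaps every bounding box into the new canvas, which is omitted here:

import cv2
import numpy as np

def simple_mosaic(images, out_size=640):
    # Tile four images into one training sample (labels not handled here)
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    positions = [(0, 0), (0, half), (half, 0), (half, half)]
    for img, (y, x) in zip(images, positions):
        canvas[y:y + half, x:x + half] = cv2.resize(img, (half, half))
    return canvas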

In addition, YOLOv5 introduced a streamlined and modular architecture, which made it easier for researchers to customize and adapt the algorithm to their specific needs. The new architecture also improved the speed and efficiency of the algorithm, making it possible to achieve real-time object detection on lower-end hardware.

YOLOv6, released in 2022 by Meituan, builds on the same single-stage design with improvements to the model's speed, accuracy, and memory usage. It is published in several sizes, from nano to large, so the network can be matched to different hardware configurations; the variants are derived by scaling the model's depth and width together rather than tuning each dimension in isolation.
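The general idea of scaling depth and width together can be sketched with simple multipliers, similar in spirit to how the YOLOv5 family derives its n/s/m/l/x variants. The helper and numbers below are purely illustrative, not any release's actual configuration:

def scale_model_config(base_depths, base_widths, depth_multiple, width_multiple):
    # Scale the number of repeated blocks (depth) and channel counts (width)
    # by the given multipliers; rounding keeps the values usable as layer sizes.
    depths = [max(1, round(d * depth_multiple)) for d in base_depths]
    widths = [int(w * width_multiple) for w in base_widths]
    return depths, widths

# Hypothetical "small" variant derived from a base configuration
depths, widths = scale_model_config([3, 6, 9, 3], [64, 128, 256, 512],
                                    depth_multiple=0.33, width_multiple=0.50)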

YOLOv7, released in 2022 by the authors behind YOLOv4 (Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao), focuses on what its paper calls trainable bag-of-freebies: training-time refinements that raise accuracy without increasing inference cost. Its main architectural contributions include the extended efficient layer aggregation network (E-ELAN), planned model re-parameterization, and an auxiliary head with coarse-to-fine label assignment.

YOLOv8, released by Ultralytics in January 2023, is the newest member of the family at the time of writing. It includes several changes that improve the model's accuracy, speed, and efficiency: the detection head is anchor-free, the backbone replaces YOLOv5's C3 blocks with C2f blocks, and the box regression loss combines CIoU with a distribution focal loss. The release also ships with tooling for training, validation, and export that helps reduce memory usage and improve inference speed in deployment.

The YOLOv8 algorithm is an advanced deep learning model that requires significant computing resources and expertise to implement. It is typically trained on large datasets of labeled images and requires the use of specialized hardware such as GPUs to achieve real-time performance.

Here are the general steps involved in implementing YOLOv8 in Python (a minimal sketch using the Ultralytics API follows the list):

1. Preprocess the input data: The first step in implementing YOLOv8 is to preprocess the input data, which typically involves resizing the images and normalizing the pixel values.

2. Define the YOLOv8 model: Next, you need to define the YOLOv8 model architecture in Python, which typically means building a neural network with a deep learning framework such as PyTorch. In practice, most users load the reference implementation and pretrained weights provided by Ultralytics rather than re-implementing the architecture from scratch.

3. Train the YOLOv8 model: Once the model is defined, you need to train it on a large dataset of labeled images using an appropriate optimization algorithm such as stochastic gradient descent (SGD). The training process typically involves iterating over the training dataset multiple times and adjusting the model parameters to minimize the loss function.

4. Evaluate the YOLOv8 model: After training the model, you need to evaluate its performance on a validation dataset to ensure that it is accurately detecting objects in new images. This step typically involves calculating metrics such as precision, recall, and mean average precision (mAP).

5. Deploy the YOLOv8 model: Finally, you can deploy the YOLOv8 model to a production environment where it can be used to detect objects in real-time. This step typically involves optimizing the model for inference, such as using techniques like model quantization and pruning to reduce its memory and compute requirements.
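In practice, most of these steps are wrapped by the Ultralytics ultralytics package. Assuming it is installed (pip install ultralytics), a minimal sketch of steps 3 through 5 might look like the following; the dataset file, epoch count, and image size are placeholder values:

from ultralytics import YOLO

# Load a pretrained YOLOv8 nano model (weights are downloaded automatically)
model = YOLO("yolov8n.pt")

# Train on a dataset described by a YAML file (coco128.yaml ships with the package)
model.train(data="coco128.yaml", epochs=50, imgsz=640)

# Evaluate on the validation split and report metrics such as mAP
metrics = model.val()

# Run inference on a new image
results = model("room_ser.jpg")

# Export an optimized model for deployment, e.g. to ONNX
model.export(format="onnx")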

Overall, YOLO has been a significant breakthrough in the field of object detection, and its continued development has led to significant improvements in accuracy and speed, making it an essential tool for many applications.
