Learn how to build a real-time object detection system with Python using popular libraries like OpenCV and YOLO. Try the code yourself in Google Colab!
Hello! I'm Usha Pithani, and I led this real-time object detection project.
Real-time object detection involves identifying and localizing objects within video streams as they happen, with minimal latency. This technology is at the heart of many modern applications.
This project specifically leverages **YOLOv11** for its balance of speed and accuracy, and **OpenCV** for camera interaction and display.
Follow these steps to set up and run the real-time object detection project using your laptop's webcam in Visual Studio Code.
It's crucial to use a virtual environment to manage project dependencies. Open VS Code, navigate to your desired project folder, and open a new terminal (Ctrl+Shift+`).
Create and activate a virtual environment:
```bash
# Create virtual environment
python -m venv venv

# Activate on Windows (PowerShell)
.\venv\Scripts\Activate.ps1

# Activate on Windows (Command Prompt)
venv\Scripts\activate.bat

# Activate on macOS/Linux
source venv/bin/activate
```
Once activated (you'll see `(venv)` in your terminal prompt), install the required libraries:
```bash
# Install PyTorch (CPU version - suitable for most laptops)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install OpenCV and other utilities
pip install opencv-python numpy matplotlib seaborn

# Install Ultralytics (includes YOLOv11 support)
pip install ultralytics
```
Note: If you have an NVIDIA GPU, visit PyTorch's website for the CUDA-enabled installation command for faster inference.
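If you're unsure which build you ended up with, a quick check like the one below (an optional snippet, not part of the project scripts) confirms whether PyTorch can see a CUDA GPU:

```python
import torch

# Report the installed PyTorch build and whether a CUDA-capable GPU is visible
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```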
3.1 In VS Code, create a new file named `yolo_realtime_webcam.py` and paste the following code for Real-Time Object Detection Using a Webcam:
```python
import cv2
from ultralytics import YOLO

# Print OpenCV version
print(f"OpenCV version: {cv2.__version__}")

# Load the latest YOLO11 model (choose model size: n, s, m, l, x)
model = YOLO("yolo11s.pt")  # Alternatives: 'yolo11n.pt', 'yolo11m.pt', etc.

# Open the laptop webcam
cap = cv2.VideoCapture(0)

if not cap.isOpened():
    print("Error: Could not access the webcam.")
    exit()

print("Webcam opened successfully. Press 'q' to quit the detection window.")

try:
    while True:
        ret, frame = cap.read()
        if not ret:
            print("Failed to grab frame.")
            break

        # Run detection; YOLO expects BGR frames (default for OpenCV)
        results = model.predict(source=frame, show=False, stream=False)

        # Overlay boxes; results[0].plot() returns a BGR image
        annotated_frame = results[0].plot()

        # Display result
        cv2.imshow("YOLO11 Real-time Detection", annotated_frame)

        # Exit on 'q'
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
    print("Webcam released and windows closed.")
```
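Beyond drawing boxes with `results[0].plot()`, you can also read the detections programmatically. The sketch below is an illustrative addition, not part of the script above (the `sample.jpg` test image is a placeholder); it shows how to pull class names, confidences, and box coordinates out of an Ultralytics result:

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolo11s.pt")
frame = cv2.imread("sample.jpg")  # placeholder test image; any BGR frame works
results = model.predict(source=frame, verbose=False)

for box in results[0].boxes:
    cls_id = int(box.cls[0])               # class index
    conf = float(box.conf[0])              # confidence score
    x1, y1, x2, y2 = box.xyxy[0].tolist()  # box corners in pixels
    print(f"{model.names[cls_id]}: {conf:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```

The same loop works inside the webcam script if you want to log detections per frame instead of (or in addition to) displaying them.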
3.2 In VS Code, create a new file named `traffic_analyzer.py` and paste the following code for Real-Time Object Detection Using a Video File:

```python
import cv2
from ultralytics import YOLO

def traffic_analysis_yolo_bytetrack(video_path, output_path="output_traffic.mp4"):
    model = YOLO("yolo11n.pt")

    # Open the video file
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Error: Could not open video file {video_path}")
        return

    # Get video properties
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))

    # Define the codec and create VideoWriter object
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # You can use 'XVID' or 'MJPG'
    out = cv2.VideoWriter(output_path, fourcc, fps, (frame_width, frame_height))

    print("Starting vehicle detection and tracking...")

    while True:
        ret, frame = cap.read()
        if not ret:
            print("End of video or error reading frame.")
            break

        results = model.track(frame, persist=True, tracker="bytetrack.yaml", conf=0.3, iou=0.5, show=False)

        # Process results
        if results and results[0].boxes.id is not None:
            # Get annotated frame with bounding boxes and track IDs
            annotated_frame = results[0].plot()

            # Display the frame and write it to the output video
            cv2.imshow("Traffic Analysis", annotated_frame)
            out.write(annotated_frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Release resources
    cap.release()
    out.release()
    cv2.destroyAllWindows()
    print(f"Traffic analysis completed. Output saved to {output_path}")

if __name__ == "__main__":
    # Replace with your video file path
    input_video = "road_traffic.mp4"
    traffic_analysis_yolo_bytetrack(input_video)
```
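Because ByteTrack assigns a persistent ID to each object, you can go beyond per-frame detection and count unique vehicles across the whole clip. The following is a minimal sketch of that idea, not part of the author's script; it counts every tracked object, and in practice you would also filter by class (car, truck, bus) using `results[0].boxes.cls`:

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
cap = cv2.VideoCapture("road_traffic.mp4")  # same input video as traffic_analyzer.py
seen_ids = set()  # track IDs observed so far

while True:
    ret, frame = cap.read()
    if not ret:
        break
    results = model.track(frame, persist=True, tracker="bytetrack.yaml",
                          conf=0.3, iou=0.5, verbose=False)
    if results[0].boxes.id is not None:
        # boxes.id holds the ByteTrack ID assigned to each box in this frame
        seen_ids.update(int(i) for i in results[0].boxes.id.tolist())

cap.release()
print(f"Unique tracked objects: {len(seen_ids)}")
```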
4.1 Save your `yolo_realtime_webcam.py` file. With your virtual environment still active in the VS Code terminal, run:
python yolo_realtime_webcam.py
4.2 Save your `traffic_analyzer.py` file. With your virtual environment still active in the VS Code terminal, run:
python traffic_analyzer.py
A new window will appear showing your webcam feed with real-time object detections. To exit, click on the detection window and press the `q` key.
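If you instead see the "Could not access the webcam" error, your camera may be registered under an index other than 0. A small, optional probe script (illustrative only, not part of the project files) can help you find the right one:

```python
import cv2

# Probe the first few camera indices and report which ones open successfully
for index in range(4):
    cap = cv2.VideoCapture(index)
    if cap.isOpened():
        print(f"Camera found at index {index}")
    cap.release()
```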
Click the badge below to open and run the object detection code in Google Colab!
(Replace `your-notebook-url` with your Colab notebook's shareable link if you create a new one.)
Here's the typical setup and inference code you'd run in a Google Colab notebook cell to perform YOLOv11 object detection on a single image:
```python
# Step 1: Install the Ultralytics package (YOLO11 is supported in ultralytics >= 8.0.x)
!pip install ultralytics --upgrade --quiet

# Step 2: Import the library
from ultralytics import YOLO

# Step 3: Load a YOLOv11 pretrained model (various sizes available, e.g., 'yolo11n.pt', 'yolo11s.pt', 'yolo11m.pt', 'yolo11x.pt')
model = YOLO('yolo11s.pt')

# Step 4: Inference on an image (returns a list of Results objects, one per image)
results = model('path/to/image.jpg')

# Step 5: Show results (in a notebook or Python script)
results[0].show()   # Visualize detection output
results[0].save()   # Save the annotated image
print(results[0])   # Print detection details to the console
```
Note: In Colab, `results[0].show()` typically displays the image directly within the notebook output, and webcam access requires Colab-specific code (such as `cv2_imshow` and JavaScript integration), which is more involved than a simple Python script.
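For example, to display an annotated result inline in Colab, a minimal sketch (assuming a Colab runtime, where `cv2_imshow` comes from `google.colab.patches`) looks like this:

```python
# Minimal Colab sketch: display an annotated detection result inline
from google.colab.patches import cv2_imshow  # Colab-only replacement for cv2.imshow
from ultralytics import YOLO

model = YOLO('yolo11s.pt')
results = model('path/to/image.jpg')  # any test image uploaded to the Colab session
annotated = results[0].plot()         # BGR image with boxes and labels drawn
cv2_imshow(annotated)                 # renders inline in the notebook output
```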