A Simple Vehicle Detection and Tracking Example
Introduction
Object detection: object detection means identifying and localizing specific targets in an image or video frame, using bounding boxes to mark their positions. YOLO (You Only Look Once) is an efficient single-stage object detection algorithm known for its fast processing speed and high accuracy. Compared with traditional two-stage detectors, YOLO's advantage is that it processes the entire image in a single pass, enabling real-time detection — especially important in applications such as autonomous driving, video surveillance, and robot navigation.
Object tracking: object tracking focuses on continuously following already-identified targets across a video sequence. The SORT (Simple Online and Realtime Tracking) algorithm is widely used for this task because of its simplicity and real-time performance: it predicts each target's motion trajectory and updates its position frame by frame. Combining YOLO for detection with SORT for tracking enables continuous monitoring and analysis of targets, ensuring accurate and consistent tracking throughout the video sequence.
Project Overview
This project combines YOLOv8m (a medium-size YOLO variant), OpenCV (an open-source computer vision library), and the SORT algorithm to accurately count the vehicles passing through specific regions of a video. This combination delivers detection accuracy while keeping the whole pipeline efficient.
1. Choose a Video
2. Create a Mask
To focus on the vehicles under the bridge, we will use a canvas to create a mask. A mask is a binary image containing only black (0) and white (255) pixel values. In the RGB color space, this corresponds to:
- White (255, 255, 255) marks the regions of interest, where the algorithm will operate.
- Black (0, 0, 0) marks the regions to ignore or exclude from processing.
Combining the mask with the video frame through a bitwise operation, we obtain the following result:
3. Define a Region
We will define two regions in the video: one to count vehicles traveling downward, the other to count vehicles traveling upward.
When a vehicle is detected inside a designated region, we will turn that region green to signal the detection.
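The "vehicle inside the region" test in the code below reduces to a point-in-band check on the bounding-box center: its y coordinate must lie within a small tolerance of the counting line, and its x coordinate between the line's endpoints. A small helper sketch (the function name and tolerance parameter are illustrative, not from the original code):

```python
def crossed_line(center_x, center_y, x1, x2, line_y, tol=10):
    """Return True when a box center lies inside the counting band."""
    return line_y - tol < center_y < line_y + tol and x1 < center_x < x2

# Center on the line and between its endpoints -> counted
print(crossed_line(400, 472, 256, 500, 472))   # True
# Center far above the line -> not counted
print(crossed_line(400, 300, 256, 500, 472))   # False
```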
4. Build the Layout
Let's build the counter layout using cvzone.
5. Code
- cv2: performs image and video processing
- cvzone: works alongside OpenCV for drawing and annotation
- numpy: handles numerical operations
- YOLO: applies object detection
- sort: the SORT library for tracking detected objects
import cv2
import numpy as np
from ultralytics import YOLO
import cvzone
from sort import sort
class_names = [
'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock',
'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
class_names_goal = ['car']
model = YOLO('yolov8m.pt')
tracker = sort.Sort(max_age=20)
mask = cv2.imread('mask.png')
video = cv2.VideoCapture('traffic.mp4')
width = 1280
height = 720
line_left_road_x1 = 256
line_left_road_x2 = 500
line_left_road_y = 472
line_right_road_x1 = 672
line_right_road_x2 = 904
line_right_road_y = 472
vehicle_left_road_id_count = []
vehicle_right_road_id_count = []
while True:
    success, frame = video.read()
    if not success:
        break
    frame = cv2.resize(frame, (width, height))
    # Keep only the masked region of interest for detection
    image_region = cv2.bitwise_and(frame, mask)
    results = model(image_region, stream=True)
    detections = []
    # Draw the two counting lines in red
    cv2.line(frame, (line_left_road_x1, line_left_road_y), (line_left_road_x2, line_left_road_y), (0, 0, 255))
    cv2.line(frame, (line_right_road_x1, line_right_road_y), (line_right_road_x2, line_right_road_y), (0, 0, 255))
    for result in results:
        for box in result.boxes:
            class_name = class_names[int(box.cls[0])]
            if class_name not in class_names_goal:
                continue
            confidence = round(float(box.conf[0]) * 100, 2)
            if confidence < 30:
                continue
            x1, y1, x2, y2 = box.xyxy[0]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
            detections.append([x1, y1, x2, y2, float(box.conf[0])])
    # SORT expects an (N, 5) array; pass an empty one when nothing was detected
    tracked_objects = tracker.update(np.array(detections) if detections else np.empty((0, 5)))
    for obj in tracked_objects:
        x1, y1, x2, y2, obj_id = [int(i) for i in obj]
        # Clamp the label position so it stays inside the frame
        confidence_pos_x1 = max(0, x1)
        confidence_pos_y1 = max(36, y1)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 255), 2)
        cvzone.putTextRect(frame, f'ID: {obj_id}', (confidence_pos_x1, confidence_pos_y1), 1, 1)
        center_x = (x1 + x2) // 2
        center_y = (y1 + y2) // 2
        # Count each vehicle once when its center crosses the left counting line
        if line_left_road_y - 10 < center_y < line_left_road_y + 10 and line_left_road_x1 < center_x < line_left_road_x2:
            if obj_id not in vehicle_left_road_id_count:
                vehicle_left_road_id_count.append(obj_id)
            cv2.line(frame, (line_left_road_x1, line_left_road_y), (line_left_road_x2, line_left_road_y), (0, 255, 0), 2)
        # Same check for the right counting line
        if line_right_road_y - 10 < center_y < line_right_road_y + 10 and line_right_road_x1 < center_x < line_right_road_x2:
            if obj_id not in vehicle_right_road_id_count:
                vehicle_right_road_id_count.append(obj_id)
            cv2.line(frame, (line_right_road_x1, line_right_road_y), (line_right_road_x2, line_right_road_y), (0, 255, 0), 2)
    cvzone.putTextRect(frame, f'Car Left Road Count: {len(vehicle_left_road_id_count)}', (50, 50), 2, 2, offset=20, border=2, colorR=(140, 57, 31), colorB=(140, 57, 31))
    cvzone.putTextRect(frame, f'Car Right Road Count: {len(vehicle_right_road_id_count)}', (width - 460, 50), 2, 2, offset=20, border=2, colorR=(140, 57, 31), colorB=(140, 57, 31))
    cv2.imshow('Image', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
video.release()
cv2.destroyAllWindows()
6. Results
Source code: https://github.com/VladeMelo/collaborative-filtering