R&D Strategy

Raqeeb: Driver Monitoring System

Machine-learning models deployed at the edge for real-time driver behaviour alerts.

Phone Use
Smoking
Fatigue / Drowsiness
Seatbelt
Distracted Driving
+ 19 alerts

Mowasalat · DMS R&D · 2026

Overview

Workflow

The Data team builds the consolidated dataset, the Model team develops and fine-tunes the edge and centralized models, and the HW team deploys them on edge devices.

DATA TEAM
Open-Source: DDD, StateFarm, YawDD
Howen Alert Data: fleet camera alerts
Raw Footage: unlabelled video
Annotation: model-assisted + human
Augmentation: preprocessing pipeline
Consolidated Dataset: Open + Howen + Raw

MODEL TEAM
Architecture: research & selection
Pretrained Eval: on open + Howen subset (awaits consolidated data)
Fine-tuning: edge + centralized models
Evaluation: accuracy / latency / size

HW TEAM
HW Assessment: TOPS, memory, NPU specs (constrains model selection)
Compatibility: model ↔ device fit
Edge Deployment: on-device inference

ULTIMATE OBJECTIVE
Create a consolidated, rich dataset
Develop trained models that outperform Howen
Build our own DMS camera powered by our proprietary model, white-labeled by a contract manufacturer
Training Pipeline

End-to-End Pipeline

Four-stage flow from raw data to validated models — with action points per stage

STAGE 1 · Dataset Consolidation (unified format · annotation · storage)
Open-Source + Howen Alerts + Raw Footage → automatic assisted labeling → Consolidated Dataset

STAGE 2 · Preprocessing (augmentation · balancing)
Augmentation to enrich the dataset

STAGE 3 · Model Fine-tuning (edge · centralized · distillation)
Edge: lightweight models, model compression (e.g., quantization)
Centralized: image and video models, full precision

STAGE 4 · Evaluation (accuracy · latency · deployment)
Accuracy: per-class F1 on Howen data
Latency: on-device inference < 50 ms
Model size: edge ≤ 5 MB · INT8 viable
Feedback loop → retrain
Consolidation
Data acquisition — collect and onboard new data sources [ Data ]
Annotation pipeline — deploy auto-label model + human review loop [ Data ]
Centralized storage — secure data store setup [ Data ]
Automating ingestion — automated pipeline for incoming data [ Data ]
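As a sketch of what the ingestion automation might do, the adapter below maps source-specific records into one consolidated schema. All field names (file, clip, alert_type, and the unified keys) are illustrative assumptions, not the project's actual schema.

```python
def to_unified(record, source):
    """Normalize a source-specific record into one consolidated schema.
    Field names are illustrative placeholders, not the real schema."""
    if source == "open_source":
        return {"path": record["file"], "label": record["class"], "origin": source}
    if source == "howen":
        return {"path": record["clip"], "label": record["alert_type"], "origin": source}
    if source == "raw":
        # Raw footage arrives unlabelled; the annotation loop fills the label later.
        return {"path": record["path"], "label": None, "origin": source}
    raise ValueError(f"unknown source: {source}")
```

A normalizer like this is what lets the three streams land in one store without per-source special cases downstream.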
Preprocessing
Augmentation transforms — generate additional images from existing ones to improve model performance [ Data ]
Cleaning and preprocessing — partitioning, human validation [ Data ]
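A minimal illustration of the augmentation idea, using a nested list as a stand-in for an image; a real pipeline would use a library such as torchvision or Albumentations, and the function names here are hypothetical.

```python
import random

def hflip(img):
    """Horizontally flip an image stored as rows of pixel values."""
    return [row[::-1] for row in img]

def jitter_brightness(img, max_delta=20, rng=None):
    """Shift every pixel by one random delta, clamped to [0, 255]."""
    rng = rng or random.Random(0)
    delta = rng.randint(-max_delta, max_delta)
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def augment(img, rng=None):
    """Randomly flip, then jitter brightness, to synthesize a new sample."""
    rng = rng or random.Random(0)
    out = hflip(img) if rng.random() < 0.5 else img
    return jitter_brightness(out, rng=rng)
```

Each call produces a label-preserving variant, which is how augmentation enlarges the consolidated dataset without new annotation cost.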
Fine-tuning
Edge model fine-tuning — 1 general vs. many specialized, detection vs. classification, image vs. video architectures [ Model ]
Centralized model fine-tuning — dataset annotation, complex use-case offloading, edge monitoring [ Model ]
Pretrained evaluation — evaluate on open-source datasets and a subset of Howen alert data [ Model ]
Architecture research — experiment with model architectures and sizes [ Model ]
HW compatibility check — validate model fits NPU constraints [ HW ]
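To make the compression step concrete, here is a toy symmetric per-tensor INT8 scheme, the basic idea behind the quantization the edge toolchains apply; real toolchains (e.g., Hailo's) use calibration data and per-channel scales, so this is a sketch of the concept only.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8: q = clamp(round(w / scale), -128, 127)."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # 1.0 guards an all-zero tensor
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT8 values."""
    return [v * scale for v in q]
```

The round trip loses at most half a quantization step per weight, which is why INT8 models can shrink 4× from FP32 with little accuracy loss.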
Evaluation
Benchmark on Howen data — per-class F1, confusion matrices [ Model ]
Latency profiling — measure on-device inference and overall edge performance [ HW ]
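The per-class F1 benchmark can be computed directly from parallel label lists; a dependency-free sketch (in practice scikit-learn's classification_report gives the same numbers):

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Per-class precision, recall, and F1 from parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was something else
            fn[t] += 1  # true class t was missed
    out = {}
    for c in sorted(set(y_true) | set(y_pred)):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = {"precision": prec, "recall": rec, "f1": f1}
    return out
```

Reporting F1 per class, rather than overall accuracy, is what exposes weak minority-class performance on imbalanced alert data.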
Models

DMS Model Landscape

Comprehensive reference of open-source models for driver behaviour classification and detection

15 models
5 task categories
Updated: March 2026
1

Frame Classification

Single image → class label · Metric: accuracy

# · Backbone (source) · Covered events · Size · Dataset · Accuracy
Per-device compatibility (Pi 5 + Hailo 8 · Jetson Orin Nano) is marked per the legend at the end of this section.

1 · MobileNetV2 (TEEN-D) · Fatigue, Drowsiness (1/5) · ~14 MB · Kaggle DDD · 98.99%
2 · VGG16 (TEEN-D) · Fatigue, Drowsiness (1/5) · ~528 MB · Kaggle DDD · 97.51%
3 · ResNet18 (TEEN-D) · Fatigue, Drowsiness (1/5) · ~45 MB · Kaggle DDD · 95.28%
4 · EfficientNet-B0 (Followb1ind1y) · Distracted, Phone, Eating (2/5) · ~21 MB · StateFarm · 97.43%
5 · MobileNetV3-Large (Followb1ind1y) · Distracted, Phone, Eating (2/5) · ~22 MB · StateFarm · 95.03%
6 · YOLOv11x-cls (mosesb/drowsiness-detection-yolo-cls) · Fatigue, Eyes, Yawning (1/5) · ~60 MB · Kaggle DDD + custom · 97.8%
7 · MobileViT-v2-200 (mosesb/drowsiness-detection-mobileViT-v2) · Fatigue, Eyes, Yawning (1/5) · ~80 MB · Kaggle DDD + custom · 96.1%
8 · ViT-Base-patch16-224 (chbh7051/vit-driver-drowsiness-detection) · Fatigue, Eyes, Yawning (1/5) · ~330 MB · chbh7051/driver-drowsiness · 99.0%
9 · Custom CNN + VGG16 (TEEN-D/Driver-Drowsiness-Detection) · Fatigue, Eyes, Yawning (1/5) · ~140 MB · Kaggle DDD · 91.6% ~
2

Clip Classification

Video sequence → class label · Metric: accuracy

# · Backbone (source) · Covered events · Size · Dataset · Performance
Per-device compatibility (Pi 5 + Hailo 8 · Jetson Orin Nano) is marked per the legend at the end of this section.

10 · TSM + ResNet50 (8-class DMS action recognition on NIR video) · Safe driving, Phone, Eating, Distracted, Seatbelt, Looking around (3/5) · ~98 MB · Drive&Act (NIR) · 63.28% acc
11 · Video Swin Transformer (2023) · Distracted (9 classes), Fatigue (2/5) · ~200 MB · DMD · 97.5% acc
3

Frame Object Detection

Single image → bounding box + class label · Metric: mAP50

mAP50 (mean average precision at an IoU threshold of 0.5) measures how accurately the model both locates and classifies objects: a score of 90% means that, on average, the model correctly detects 9 out of 10 objects with at least 50% bounding-box overlap against ground truth.
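The 50% overlap criterion is intersection-over-union (IoU); a minimal sketch of the computation behind the threshold:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A prediction counts as a true positive at mAP50 when its IoU with a ground-truth box of the same class is at least 0.5.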

# · Backbone (source) · Covered events · Size · Dataset · mAP50
Per-device compatibility (Pi 5 + Hailo 8 · Jetson Orin Nano) is marked per the legend at the end of this section.

12 · YOLOv8 (Driver behaviors (Jui), 9,901 images) · Seatbelt, Smoking, Phone (3/5) · ~6 MB · Roboflow: Driver behaviors · ~85–90%
13 · YOLOv8 (DMD (Driver Monitoring), 9,739 images) · Distracted, Fatigue, Eyes (2/5) · ~22 MB · Roboflow: DMD · ~89–92%
14 · YOLOv8 + MHSA + ECA (ME-YOLOv8, Debsi et al. 2024) · Phone, Eyes, Yawn, Seatbelt, Fatigue (3/5) · ~30–50 MB · DDFDD + StateFarm + YawDD · 87–93%
15 · YOLOv8 + DBCA + AFGCA (DAHD-YOLO, MDPI Sensors 2025) · Smoking, Phone, Distracted (3/5) · ~30–60 MB · Custom + Kaggle cigarette · ~90%
4

Clip Detection + Tracking

Video sequence → tracked bounding boxes + class labels · Metric: varies
No models with public datasets catalogued. Video-based detection with tracking (e.g., YOLOv8 + SORT/DeepSORT) is relevant for continuous event monitoring but existing implementations use private or custom datasets. This remains an open area for evaluation.
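For context, the core of a SORT-style tracker is frame-to-frame box association; below is a simplified greedy IoU matcher (real SORT adds a Kalman motion model and Hungarian matching, so treat this as a sketch of the idea only).

```python
def _iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def associate(tracks, detections, iou_thr=0.3):
    """Greedily match track boxes to detection boxes by descending IoU."""
    pairs = sorted(
        ((_iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True)
    matched, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < iou_thr or ti in used_t or di in used_d:
            continue
        matched.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    return matched
```

Unmatched detections would spawn new tracks and unmatched tracks would age out, which is how per-frame detections become continuous events.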
5

Pipeline (Detection → Classification)

Detect / crop region → classify the crop (two-stage)
No standalone pipeline models catalogued yet. However, several Frame Classification models above (rows 1–9) implicitly assume an upstream face/body detector that produces the input crop. In production, these would operate as a two-stage pipeline: e.g., face detector (MTCNN, RetinaFace, or YOLO-face) → crop → classify with MobileNetV3 / EfficientNet.
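A skeleton of that two-stage flow, with stub detector and classifier standing in for the real models (e.g., a face detector feeding a MobileNetV3 classifier); all names here are illustrative.

```python
def two_stage(frame, detect, classify):
    """Stage 1: detect regions; stage 2: classify each cropped region."""
    events = []
    for (x1, y1, x2, y2) in detect(frame):
        crop = [row[x1:x2] for row in frame[y1:y2]]
        events.append(((x1, y1, x2, y2), classify(crop)))
    return events

# Stubs standing in for a real detector and classifier.
def stub_detect(frame):
    return [(0, 0, 2, 2)]

def stub_classify(crop):
    return "drowsy"
```

Keeping the two stages behind plain callables makes it easy to swap detector or classifier independently during evaluation.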
Accuracy legend: ● ≥95% · ● ≥88% · ● <88% · ● N/A
Compatibility legend: Compatible · ~ Conditional · Incompatible
Device limits: Pi 5: ≤150 MB, INT8 required · Jetson: ≤1 GB, quantization optional
Hardware

Prototyping Platforms

Two candidate edge devices for DMS deployment — NPU-accelerated vs GPU-accelerated inference

Raspberry Pi 5
with Hailo 8 AI Hat+ · NPU Accelerated
Power: 5.1 V @ 5 A (stable)
CPU: quad-core ARM Cortex-A76 @ 2.4 GHz
Accelerator: Hailo 8 NPU — 26 TOPS, 2.5 W
RAM: 16 GB LPDDR4X
Camera I/F: 2× MIPI CSI-2
Sensor: Pi Cam 3 — 12 MP, IMX708 (4608×2592)
Model Deployment: Hailo SDK · TF Lite · ONNX · Keras
Constraints
Max model size: ≤ 150 MB
Quantization: INT8 required
Usable compute budget: 5–10 TOPS
Multi-model inference: supported (parallel)
Jetson Orin Nano
NVIDIA Ampere Architecture · GPU Accelerated
Power: 7–19 V @ 5 A
CPU: 6-core ARM Cortex-A78 @ 2.4 GHz
Accelerator: CUDA cores (Ampere) — 67 TOPS
RAM: 8 GB LPDDR5
Camera I/F: 2× MIPI CSI-2
Sensor: Pi Cam 3 — 12 MP, IMX708 (4608×2592)
Model Deployment: TensorRT · TensorFlow · ONNX
Constraints
Max model size: ≤ 1 GB
Quantization: optional
Usable compute budget: 30–50 TOPS
Shared camera module — Both platforms use Pi Camera 3 (12 MP, Sony IMX708, 4608×2592) via 2× MIPI CSI-2 interface
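These constraints can be turned into a quick feasibility check: at INT8 a model stores roughly one byte per parameter, at FP32 roughly four. The helper below is a back-of-envelope sketch that ignores file metadata and activation memory; the parameter counts in the test are approximate public figures, not project measurements.

```python
def model_size_mb(n_params, bytes_per_param):
    """Approximate on-disk model size in MB, ignoring headers/metadata."""
    return n_params * bytes_per_param / 1e6

def fits(n_params, max_mb, int8=True):
    """Does a model with n_params fit a device's size budget?"""
    return model_size_mb(n_params, 1 if int8 else 4) <= max_mb
```

By this estimate even a ViT-Base-sized model fits the Pi 5's 150 MB budget once quantized to INT8, while its FP32 form does not.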
Datasets

Dataset: Drowsy Detection

48×48 face crops • 2 classes • 7,342 images • Kaggle (yasharjebraeily)

Drowsy Detection Samples

Used to validate TEEN-D model replications (MobileNetV2, ResNet18, VGG16).

Datasets

Dataset: StateFarm Distracted Driver

Dashboard camera • 10 classes • 22,424 images • Kaggle competition

StateFarm Samples

Used to validate Followb1ind1y replications (EfficientNet-B0, MobileNetV3-Large).

Datasets

Dataset: Drive&Act (NIR, Inner Mirror)

8 DMS classes after remapping • 11,895 clips of 8 frames • 640×480 NIR

Safe Driving
Eating / Drinking
Phone Use
Visual Distraction
Manual Distraction
Looking Around
Seatbelt Interaction
Vehicle Entry / Exit
Datasets

Label Remapping & Class Distribution

Drive&Act mid-level activities grouped by safety-relevant driver behavior

Label Remapping Class Distribution

Drive&Act defines 34 fine-grained activities, which are collapsed into 8 DMS-relevant categories. The remapped set has a ~35× class imbalance (safe_driving: 3,707 clips vs. looking_around: 107).
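One standard mitigation for that imbalance is inverse-frequency class weighting in the loss (the convention scikit-learn calls "balanced"); a sketch using the two counts quoted above:

```python
def balanced_weights(counts):
    """weight_c = total / (n_classes * count_c): rare classes weigh more."""
    total = sum(counts.values())
    k = len(counts)
    return {c: total / (k * n) for c, n in counts.items()}
```

Under this scheme looking_around is weighted ~35× more heavily than safe_driving, exactly offsetting the count ratio.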

TSM Architecture

Drive&Act Preprocessing Pipeline

NIR-specific preprocessing before temporal shift module classification

Preprocessing Pipeline

NIR frames are extremely dark (88% of pixels below 50/255). CLAHE redistributes contrast, then frames are normalized with ImageNet statistics at full 640×480 resolution.
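The pipeline's contrast step uses CLAHE (e.g., OpenCV's cv2.createCLAHE). As a dependency-free illustration of the underlying idea, here is a global percentile stretch that remaps a dark pixel range onto the full [0, 255] scale; this is a simplification, since CLAHE equalizes locally per tile and clips the histogram.

```python
def stretch_contrast(pixels, lo_pct=2, hi_pct=98):
    """Map the [p_lo, p_hi] percentile range of pixel values onto [0, 255]."""
    s = sorted(pixels)
    lo = s[int(len(s) * lo_pct / 100)]
    hi = s[min(len(s) - 1, int(len(s) * hi_pct / 100))]
    if hi == lo:
        return [0 for _ in pixels]  # flat image: nothing to stretch
    return [min(255, max(0, round(255 * (p - lo) / (hi - lo)))) for p in pixels]
```

Applied to NIR frames where most values sit below 50/255, a stretch like this spreads the usable range before ImageNet normalization.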

Models

Sample Predictions

Green = correct, red = incorrect — a mix of successes and failures

EfficientNet-B0 Predictions TEEN-D MobileNetV2 Predictions
Results

TSM Results on Drive&Act

Per-class Precision, Recall, and F1 on the test set

TSM Per-Class Metrics

Strong performance on visual_distraction (F1 78.1%) and safe_driving (F1 72.9%). Minority classes (looking_around, seatbelt_interaction) suffer from low support.

Results

TSM Predictions on Drive&Act

Left = most confident correct, right = most confident wrong.

Most Confident Correct
Most Confident Wrong

Each row = one DMS class. Confidence = softmax probability for the predicted label.