R&D Strategy

Raqeeb: Driver Monitoring System

Edge-deployed machine-learning models for real-time driver behaviour alerts.

Phone Use
Smoking
Fatigue / Drowsiness
Seatbelt
Distracted Driving

Mowasalat · DMS R&D Scope · 2026

Overview

Work Flow

The Data team creates the consolidated dataset, the Model team develops and fine-tunes the edge and centralized models, and the HW team deploys them on edge devices.

DATA TEAM
Open-Source: Roboflow*, Kaggle, research
Howen Alert Data: fleet camera alerts
Raw Footage: unlabelled video
Annotation: model-assisted + human
Augmentation: preprocessing pipeline
Consolidated Dataset: Open + Howen + Raw

MODEL TEAM
Architecture: research & selection
Pretrained Eval: on open data + Howen subset; awaits consolidated data
Fine-tuning: edge + centralized models
Evaluation: accuracy / latency / size

HW TEAM
HW Assessment: TOPs, memory, NPU specs limit model size
Compatibility: model ↔ device fit
Edge Deployment: on-device inference

ULTIMATE OBJECTIVE
Create a consolidated, rich dataset
Create Mowasalat models that outperform the current OEM models (97%)
Build our own DMS camera powered by our proprietary model, white-labeled by a contract manufacturer
Training Pipeline

End-to-End Pipeline

Four-stage flow from raw data to validated models — with action points per stage

STAGE 1 · Dataset Consolidation (unified format · annotation · storage)
Sources: Open-Source, Howen Alerts, Raw Footage
Automatic assisted labeling → Consolidated Dataset

STAGE 2 · Preprocessing (augmentation · balancing)
Augmentation to enrich the dataset

STAGE 3 · Model Fine-tuning (edge · centralized · distillation)
Edge: lightweight models, model compression (e.g., quantization)
Centralized: image and video models, full precision

STAGE 4 · Evaluation (accuracy · latency · deployment)
Accuracy: per-class F1 on Howen data
Latency: on-device inference < 100 ms
Model size: edge ≤ 500 MB
Consolidation
Data acquisition — collect and onboard new data sources (Roboflow*, Kaggle, research, etc.) [ Data ]
Annotation pipeline — deploy auto-label model + human review loop [ Data ]
Centralized storage* — secure data store setup [ Data ]
Automating ingestion — automated pipeline for incoming data [ Data ]
Preprocessing
Augmentation transforms — modify existing images to reflect varied environmental conditions [ Data ]
Cleaning and preprocessing — partitioning, human validation [ Data ]
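In practice the augmentation step would use a library such as Albumentations or torchvision; as a dependency-free illustration of the idea, here is a minimal sketch that flips a frame and jitters its brightness (the transform ranges are assumptions, not tuned values):

```python
import random

def hflip(img):
    """Mirror an image (list of pixel rows) left-to-right."""
    return [list(reversed(row)) for row in img]

def adjust_brightness(img, delta):
    """Shift 8-bit pixel intensities by delta, clamping to 0-255."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]

def augment(img, rng=None):
    """Apply one random flip + brightness jitter (toy augmentation pass)."""
    rng = rng or random.Random()
    if rng.random() < 0.5:
        img = hflip(img)
    return adjust_brightness(img, rng.randint(-40, 40))

cabin_patch = [[10, 50, 200],
               [20, 60, 250]]
print(augment(cabin_patch, random.Random(0)))
```

A real pipeline would also vary contrast, blur, and simulated lighting to mimic day/night cabin conditions.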
Fine-tuning
Edge model fine-tuning — 1 general vs. many specialized, detection vs. classification, image vs. video architectures [ Model ]
Centralized model fine-tuning — dataset annotation, complex use-case offloading, edge monitoring [ Model ]
Pretrained evaluation — evaluate on open-source datasets and a subset of Howen alert data [ Model ]
Architecture research — experiment with model architectures and sizes [ Model ]
HW compatibility check — validate model fits NPU constraints [ HW ]
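Stage 3 pairs edge and centralized fine-tuning with distillation. A minimal sketch of the usual recipe (the temperature and logits below are made-up numbers, not project values): the student edge model is trained to match the centralized teacher's softened output distribution.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    (the classic Hinton-style distillation term)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return temperature ** 2 * sum(
        pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [8.0, 2.0, 0.5]  # hypothetical centralized-model logits
student = [5.0, 3.0, 1.0]  # hypothetical edge-model logits
print(round(kd_loss(student, teacher), 4))
```

In training, this term is typically mixed with the ordinary cross-entropy on ground-truth labels.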
Evaluation
Benchmark & error analysis — per-class F1, confusion matrices [ Model ]
Latency profiling — measure on-device inference and overall edge performance [ HW ]
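Per-class F1, the Stage 4 accuracy metric, needs no ML dependencies to compute; a minimal sketch with made-up labels standing in for Howen alert classes:

```python
def per_class_f1(y_true, y_pred, labels):
    """Per-class F1 from paired ground-truth and predicted label lists."""
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores

y_true = ["phone", "phone", "fatigue", "seatbelt", "fatigue"]
y_pred = ["phone", "fatigue", "fatigue", "seatbelt", "fatigue"]
print(per_class_f1(y_true, y_pred, ["phone", "fatigue", "seatbelt"]))
```

In the actual benchmark this would come from scikit-learn's `f1_score(average=None)` plus a confusion matrix, but the arithmetic is the same.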
Hardware

Prototyping Platforms

Two candidate edge devices for DMS deployment — NPU-accelerated vs. GPU-accelerated inference

Raspberry Pi 5

with Hailo 8 AI Hat+
NPU Accelerated
Power: 5.1 V @ 5 A (stable)
CPU: Quad-core ARM Cortex-A76 @ 2.4 GHz
Accelerator: Hailo 8 NPU — 26 TOPS, 2.5 W
RAM: 16 GB LPDDR4X
Camera I/F: 2× MIPI CSI-2
Sensor: Pi Cam 3 — 12 MP, IMX708 (4608×2592)
Model Deployment
Hailo SDK · TF Lite · ONNX · Keras
Constraints
≤ 150 MB: max model size
INT8: quantization required
5–10 TOPS: usable compute budget
Parallel: multi-model inference supported
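The Hailo toolchain handles INT8 quantization through calibration-based post-training quantization; the affine mapping it relies on can be sketched in a few lines (an illustration of the arithmetic, not the Hailo SDK API):

```python
def quantize_params(xs):
    """Scale and zero-point for affine (asymmetric) INT8 quantization."""
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-lo / scale) - 128
    return scale, zero_point

def quantize(xs):
    """Map FP32 values onto the signed 8-bit grid [-128, 127]."""
    scale, zp = quantize_params(xs)
    q = [max(-128, min(127, round(x / scale) + zp)) for x in xs]
    return q, scale, zp

def dequantize(q, scale, zp):
    """Recover approximate FP32 values from INT8 codes."""
    return [(qi - zp) * scale for qi in q]

weights = [-0.51, 0.0, 0.27, 1.03]  # toy FP32 weight tensor
q, scale, zp = quantize(weights)
print(q, round(scale, 5), zp)
```

The round trip loses at most about one quantization step per value, which is why calibration data matters: a tight min/max range keeps `scale` (and hence the error) small.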
Jetson Orin Nano

NVIDIA Ampere Architecture
GPU Accelerated
Power: 7–19 V @ 5 A
CPU: 6-core ARM Cortex-A78 @ 2.4 GHz
Accelerator: CUDA cores (Ampere) — 67 TOPS
RAM: 8 GB LPDDR5
Camera I/F: MIPI CSI-2 / USB
Sensor: Arducam Jetson series / USB camera
Model Deployment
TensorRT · TensorFlow · ONNX
Constraints
≤ 1 GB: max model size
Optional: quantization not required
30–50 TOPS: usable compute budget
Camera — Pi 5 uses Pi Camera 3 (12 MP, IMX708) via MIPI CSI-2. Jetson uses Arducam Jetson series or USB camera.
Models

DMS Model Landscape

Comprehensive reference of open-source models for driver behaviour classification and detection

16 models
5 task categories
Updated: March 2026
1 · Frame Classification

Single image → class label · METRIC: ACCURACY
# · Model · Source · Covered events · Size · Dataset · Accuracy · Pi 5 + Hailo 8 · Jetson Orin Nano
1 · MobileNetV2 · 🤗 TEEN-D · Fatigue, Drowsiness (1/5) · ~14 MB · Kaggle DDD · 98.99%
2 · VGG16 · 🤗 TEEN-D · Fatigue, Drowsiness (1/5) · ~528 MB · Kaggle DDD · 97.51%
3 · ResNet18 · 🤗 TEEN-D · Fatigue, Drowsiness (1/5) · ~45 MB · Kaggle DDD · 95.28%
4 · EfficientNet-B0 · Followb1ind1y · Distracted, Phone, Eating (2/5) · ~21 MB · StateFarm · 97.43%
5 · MobileNetV3-Large · Followb1ind1y · Distracted, Phone, Eating (2/5) · ~22 MB · StateFarm · 95.03%
6 · YOLOv11x-cls · 🤗 mosesb/drowsiness-detection-yolo-cls · Fatigue, Eyes, Yawning (1/5) · ~60 MB · Kaggle DDD + custom · 97.8%
7 · MobileViT-v2-200 · 🤗 mosesb/drowsiness-detection-mobileViT-v2 · Fatigue, Eyes, Yawning (1/5) · ~80 MB · Kaggle DDD + custom · 96.1%
8 · ViT-Base-patch16-224 · 🤗 chbh7051/vit-driver-drowsiness-detection · Fatigue, Eyes, Yawning (1/5) · ~330 MB · chbh7051/driver-drowsiness · 99.0%
9 · Custom CNN + VGG16 · 🤗 TEEN-D/Driver-Drowsiness-Detection · Fatigue, Eyes, Yawning (1/5) · ~140 MB · Kaggle DDD · 91.6% · ~
2 · Clip Classification

Video sequence → class label · METRIC: ACCURACY
# · Model · Source · Covered events · Size · Dataset · Performance · Pi 5 + Hailo 8 · Jetson Orin Nano
10 · TSM + ResNet50 · 8-class DMS action recognition on NIR video · Safe driving, Phone, Eating, Distracted, Seatbelt, Looking around (3/5) · ~98 MB · Drive&Act (NIR) · 63.28% Acc
11 · Video Swin Transformer · Video Swin Transformer (2023) · Distracted (9 classes), Fatigue (2/5) · ~200 MB · DMD · 97.5% Acc
3 · Frame Object Detection

Single image → bounding box + class label · METRIC: mAP50

mAP50 measures how accurately the model both locates and classifies objects: a score of 90% means it correctly detects 9 out of 10 objects with at least 50% bounding-box overlap (IoU ≥ 0.5) with the ground truth.
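The 50% overlap criterion behind mAP50 is Intersection-over-Union; a self-contained sketch of the computation:

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

# A predicted box shifted halfway off the ground truth scores only 1/3,
# so it would NOT count as a hit at the mAP50 threshold of 0.5:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```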

# · Model · Source · Covered events · Size · Dataset · mAP50 · Pi 5 + Hailo 8 · Jetson Orin Nano
12 · R YOLOv8 · Driver behaviors (Jui), 9,901 images · Seatbelt, Smoking, Phone (3/5) · ~6 MB · Roboflow: Driver behaviors · ~85–90%
13 · R YOLOv8 · DMD (Driver Monitoring), 9,739 images · Distracted, Fatigue, Eyes (2/5) · ~22 MB · Roboflow: DMD · ~89–92%
14 · YOLOv8 + MHSA + ECA · ME-YOLOv8 (Debsi et al. 2024) · Phone, Eyes, Yawn, Seatbelt, Fatigue (3/5) · ~30–50 MB · DDFDD + StateFarm + YawDD · 87–93%
15 · YOLOv8 + DBCA + AFGCA · DAHD-YOLO (MDPI Sensors 2025) · Smoking, Phone, Distracted (3/5) · ~30–60 MB · Custom + Kaggle cigarette · ~90%
16 · YOLOv8 (Core ML) · DriveSafe, 1,886 images · Fatigue, Drowsiness, Distracted (1/5) · ~6 MB · FORM + YawDD · ~92% · ~ · ~
4 · Clip Detection + Tracking

Video sequence → tracked bounding boxes + class labels · METRIC: VARIES
No models with public datasets catalogued. Video-based detection with tracking (e.g., YOLOv8 + SORT/DeepSORT) is relevant for continuous event monitoring, but existing implementations use private or custom datasets.
5 · Pipeline (Detection → Classification)

Detect / crop region → classify the crop (two-stage)
No standalone pipeline models catalogued yet. However, several Frame Classification models above (rows 1–9) implicitly assume an upstream face/body detector that produces the input crop. In production, these would operate as a two-stage pipeline: e.g., face detector (MTCNN, RetinaFace, or YOLO-face) → crop → classify with MobileNetV3 / EfficientNet.
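A minimal sketch of that two-stage wiring, with toy stand-ins injected where the real detector (MTCNN, RetinaFace, YOLO-face) and classifier (MobileNetV3 / EfficientNet) would go:

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

def two_stage(frame,
              detect_faces: Callable[[object], List[Box]],
              classify_crop: Callable[[object], str]):
    """Detect -> crop -> classify. Both stages are injected callables,
    so any detector/classifier pair can be plugged in unchanged."""
    events = []
    for (x1, y1, x2, y2) in detect_faces(frame):
        crop = [row[x1:x2] for row in frame[y1:y2]]  # crop the detection
        events.append(((x1, y1, x2, y2), classify_crop(crop)))
    return events

# Toy stand-ins for real models:
frame = [[0] * 8 for _ in range(8)]
fake_detector = lambda f: [(2, 2, 6, 6)]
fake_classifier = lambda crop: "drowsy" if sum(map(sum, crop)) == 0 else "alert"
print(two_stage(frame, fake_detector, fake_classifier))
```

Keeping the stages decoupled this way lets the detector and classifier be swapped or quantized independently when fitting the edge budget.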
Accuracy: ● ≥95% · ● ≥88% · ● <88% · ● N/A
Compatibility: Compatible · ~ Conditional · Incompatible
Pi 5: ≤ 150 MB, INT8 req. · Jetson: ≤ 1 GB, quant. optional
Datasets

Dataset: Drowsy Detection

48×48 face crops • 2 classes • 7,342 images • Kaggle

Drowsy Detection Samples

Dataset: StateFarm Distracted Driver

Dashboard camera • 10 classes • 22,424 images • Kaggle

StateFarm Samples


Dataset: Drive&Act (NIR, Inner Mirror)

8 DMS classes after remapping • 11,895 clips of 8 frames • 640×480 NIR

Safe Driving
Eating / Drinking
Phone Use
Visual Distraction
Manual Distraction
Looking Around
Seatbelt Interaction
Vehicle Entry / Exit

Label Remapping & Class Distribution

Drive&Act mid-level activities grouped by safety-relevant driver behavior


Drive&Act defines 34 fine-grained activities, collapsed here into 8 DMS-relevant categories; the result has a 35× class imbalance (safe_driving: 3,707 clips vs. looking_around: 107).
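The remapping itself is a lookup table, and the 35× imbalance is typically countered with inverse-frequency loss weights or resampling. A sketch of both (the fine-grained activity names below are illustrative, not the actual Drive&Act label set):

```python
from collections import Counter

# Hypothetical fragment of the 34 -> 8 remap (real mid-level names differ):
REMAP = {
    "talking_on_phone": "phone_use",
    "texting": "phone_use",
    "drinking": "eating_drinking",
    "sitting_still": "safe_driving",
}

def inverse_frequency_weights(labels):
    """Per-class loss weights proportional to 1/frequency, normalized so
    a balanced dataset would give every class weight 1.0."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: total / (len(counts) * n) for c, n in counts.items()}

# Reproduce the 35x imbalance in miniature:
clips = ["safe_driving"] * 35 + ["looking_around"] * 1
print(inverse_frequency_weights(clips))
```

Rare classes like looking_around then dominate the gradient per sample, compensating for how seldom they appear in a batch.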

TSM Architecture

Drive&Act Preprocessing Pipeline

NIR-specific preprocessing before temporal shift module classification

Preprocessing Pipeline

NIR frames are extremely dark, so preprocessing is necessary to make them compatible with open-source models pretrained on ImageNet.
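A common fix for dark frames is histogram equalization (in production this would likely be OpenCV's CLAHE); a dependency-free global version to show the idea, run on a toy patch crowded near black:

```python
def equalize(img, levels=256):
    """Global histogram equalization for an 8-bit grayscale image
    given as a list of pixel rows."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, run = [], 0
    for h in hist:                     # cumulative distribution
        run += h
        cdf.append(run)
    cdf_min = next(c for c in cdf if c)
    n = len(flat)
    # Stretch the CDF so the darkest pixel maps to 0, the brightest to 255.
    lut = [round((c - cdf_min) / max(n - cdf_min, 1) * (levels - 1))
           for c in cdf]
    return [[lut[p] for p in row] for row in img]

dark = [[5, 5, 10],
        [10, 15, 20]]  # toy NIR patch, all intensities near black
print(equalize(dark))
```

After equalization the patch spans the full 0–255 range, which is much closer to the intensity statistics ImageNet-pretrained backbones expect.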

Models

Sample Predictions

Green = correct, Red = incorrect — mix of successes and failures

EfficientNet-B0 Predictions · MobileNetV2 on Drowsy Dataset Predictions
Results

TSM Predictions on Drive&Act

Left = most confident correct, right = most confident wrong.

Most Confident Correct
Most Confident Wrong

Each row = one DMS class. Confidence = softmax probability for the predicted label.