R&D Strategy

Raqeeb: Driver Monitoring System

Edge-deployed machine learning models for real-time driver behaviour alerts.

Phone Use
Smoking
Fatigue / Drowsiness
Seatbelt
Distracted Driving
+ 19 alerts

Mowasalat · DMS R&D · 2026

Overview

Workflow

The Data team builds the consolidated dataset, the Model team develops and fine-tunes the edge and centralized models, and the HW team handles edge deployment.

DATA TEAM
  • Open-Source — DDD, StateFarm, YawDD
  • Howen Alert Data — fleet camera alerts
  • Raw Footage — unlabelled video
  • Annotation — model-assisted + human review
  • Augmentation — preprocessing pipeline
  • Consolidated Dataset — Open + Howen + Raw
MODEL TEAM
  • Architecture — research & selection
  • Pretrained Eval — on open + Howen subset; awaits consolidated data
  • Fine-tuning — edge + centralized models
  • Evaluation — accuracy / latency / size
HW TEAM
  • HW Assessment — TOPs, memory, NPU specs; constrains model selection
  • Compatibility — model ↔ device fit
  • Edge Deployment — on-device inference
Training Pipeline

End-to-End Pipeline

Four-stage flow from raw data to validated models — with action points per stage

STAGE 1 — Dataset Consolidation (Unified format · Annotation · Storage)
  • Open-Source, Howen Alerts, and Raw Footage feed automatic assisted labeling into the Consolidated Dataset
STAGE 2 — Preprocessing (Augmentation · Balancing)
  • Augmentation — enrich dataset
STAGE 3 — Model Fine-tuning (Edge · Centralized · Distillation)
  • Edge — lightweight models, model compression (e.g., quantization)
  • Centralized — image and video models, full precision
STAGE 4 — Evaluation (Accuracy · Latency · Deployment)
  • Accuracy — per-class F1 on Howen data
  • Latency — on-device inference < 50 ms
  • Model size — edge ≤ 5 MB · INT8 viable
  • Feedback loop → retrain
Consolidation
Data acquisition — collect and onboard new data sources [ Data ]
Annotation pipeline — deploy auto-label model + human review loop [ Data ]
Centralized storage — secure data store setup [ Data ]
Automating ingestion — automated pipeline for incoming data [ Data ]
Preprocessing
Augmentation transforms — derive additional images from existing ones to improve model performance [ Data ]
Cleaning and preprocessing — partitioning, augmentation, human validation [ Data ]
Fine-tuning
Edge model fine-tuning — 1 general vs. many specialized, detection vs. classification, image vs. video architectures [ Model ]
Centralized model fine-tuning — dataset annotation, complex use-case offloading, edge monitoring & distillation [ Model ]
Pretrained evaluation — evaluate on open-source datasets and a subset of Howen alert data [ Model ]
Architecture research — experiment with model architectures and sizes [ Model ]
HW compatibility check — validate model fits NPU constraints [ HW ]
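The distillation mentioned above (a full-precision centralized teacher guiding a lightweight edge student) can be sketched as temperature-softened soft-target matching. A minimal NumPy sketch of the standard KD loss, not the project's actual training code:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, T=4.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T**2 factor keeps gradient magnitudes comparable across
    temperatures (the usual Hinton-style convention).
    """
    p = softmax(teacher_logits, T)  # soft targets from centralized model
    q = softmax(student_logits, T)  # edge model predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl) * T ** 2)

# Identical logits -> zero loss; diverging logits -> positive loss.
same = distillation_loss([[2.0, 0.5, -1.0]], [[2.0, 0.5, -1.0]])
diff = distillation_loss([[2.0, 0.5, -1.0]], [[-1.0, 0.5, 2.0]])
```

In practice this term is blended with the hard-label cross-entropy when training the edge model.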
Evaluation
Benchmark on Howen data — per-class F1, confusion matrices [ Model ]
Latency profiling — measure on-device inference and overall edge performance [ HW ]
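The latency-profiling action above reduces to percentile timing of repeated inference calls against the < 50 ms budget. A host-side harness sketch (real measurement happens on the edge NPU; the workload below is a stand-in):

```python
import time

def profile_latency(infer, warmup=10, runs=100):
    """Measure per-call latency of an inference callable, in milliseconds."""
    for _ in range(warmup):          # warm caches before timing
        infer()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    return {"p50_ms": samples[len(samples) // 2],
            "p95_ms": samples[int(len(samples) * 0.95) - 1],
            "max_ms": samples[-1]}

# Stand-in for a real on-device model call.
stats = profile_latency(lambda: sum(i * i for i in range(10_000)))
within_budget = stats["p95_ms"] < 50.0   # the < 50 ms edge target
```

Reporting p95 rather than the mean matters here, since alert pipelines are bound by worst-case frames.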
Datasets

Dataset: Drowsy Detection

48×48 face crops • 2 classes • 7,342 images • Kaggle (yasharjebraeily)

Drowsy Detection Samples

Used to validate TEEN-D model replications (MobileNetV2, ResNet18, VGG16).

Datasets

Dataset: StateFarm Distracted Driver

Dashboard camera • 10 classes • 22,424 images • Kaggle competition

StateFarm Samples

Used to validate Followb1ind1y replications (EfficientNet-B0, MobileNetV3-Large).

Datasets

Dataset: Drive&Act (NIR, Inner Mirror)

8 DMS classes after remapping • 11,895 clips of 8 frames • 640×480 NIR

Safe Driving
Eating / Drinking
Phone Use
Visual Distraction
Manual Distraction
Looking Around
Seatbelt Interaction
Vehicle Entry / Exit
Datasets

Label Remapping: 34 Activities → 8 DMS Classes

Drive&Act mid-level activities grouped by safety-relevant driver behavior

Label Remapping

Drive&Act defines 34 fine-grained activities. We collapse them into 8 DMS-relevant categories based on the type of driver distraction or behavior they represent.
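The remapping is a simple many-to-one lookup from fine-grained activity to DMS class. A sketch covering a subset of the 34 activities; the activity names and assignments below are illustrative examples, not the project's exact mapping table:

```python
# Illustrative subset of the 34 -> 8 remapping; the source activity
# names and their assignments are assumed, not the actual table.
ACTIVITY_TO_DMS = {
    "sitting_still":        "safe_driving",
    "eating":               "eating_drinking",
    "drinking":             "eating_drinking",
    "talking_on_phone":     "phone_use",
    "reading_magazine":     "visual_distraction",
    "working_on_laptop":    "manual_distraction",
    "looking_around":       "looking_around",
    "fastening_seat_belt":  "seatbelt_interaction",
    "entering_car":         "vehicle_entry_exit",
}

def remap(labels):
    """Map fine-grained Drive&Act activity labels to DMS classes."""
    return [ACTIVITY_TO_DMS[label] for label in labels]
```

Collapsing labels this way also merges the per-activity clip counts, which feeds directly into the class-distribution analysis later in the deck.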

Models

Pretrained Model Replication

5 models, 2 datasets — all published results matched or exceeded

Model                      Dataset       Published   Ours     Delta
TEEN-D MobileNetV2         Drowsy Det.   98.99%      98.99%   0.00%
TEEN-D ResNet18            Drowsy Det.   95.28%      95.28%   0.00%
TEEN-D VGG16               Drowsy Det.   97.51%      97.51%   0.00%
Followb1ind1y EffNet-B0    StateFarm     96.85%      97.43%   +0.58%
Followb1ind1y MobNetV3-L   StateFarm     94.67%      95.03%   +0.36%
Model Comparison

Exact match on drowsy detection. Slight improvement on StateFarm due to minor preprocessing differences. All models evaluated on held-out test splits not seen during training.

Models

Sample Predictions

Green = correct, Red = incorrect — mix of successes and failures

EfficientNet-B0 Predictions
TEEN-D MobileNetV2 Predictions
Results

TSM Results on Drive&Act

Per-class Precision, Recall, and F1 on the test set

TSM Per-Class Metrics

Strong performance on visual_distraction (F1 78.1%) and safe_driving (F1 72.9%). Minority classes (looking_around, seatbelt_interaction) suffer from low support.
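The per-class precision, recall, and F1 reported above all derive from one confusion matrix. A NumPy sketch of that computation (the toy matrix is illustrative, not TSM's actual results):

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, F1 per class from a confusion matrix.

    cm[i, j] = number of samples with true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / np.maximum(cm.sum(axis=0), 1e-12)  # column sums = predicted counts
    recall    = tp / np.maximum(cm.sum(axis=1), 1e-12)  # row sums = true counts
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1

# Toy 2-class example: class 0 well separated, class 1 weaker.
p, r, f = per_class_metrics([[90, 10],
                             [30, 70]])
```

The low-support classes noted above are exactly where the row sums shrink, so a handful of misclassified clips can swing recall, and with it F1, by several points.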

Results

TSM Predictions on Drive&Act

Left = most confident correct, Right = most confident wrong.

Most Confident Correct
Most Confident Wrong

Each row = one DMS class. Confidence = softmax probability for the predicted label.
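The confidence score described above is just the softmax probability of the argmax class. A minimal sketch with illustrative logits:

```python
import numpy as np

def prediction_confidence(logits):
    """Predicted class index and its softmax probability."""
    z = np.asarray(logits, dtype=float)
    z -= z.max()                         # numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax
    idx = int(probs.argmax())
    return idx, float(probs[idx])

# Illustrative 4-class logits; the highest logit wins.
idx, conf = prediction_confidence([2.3, 0.1, -1.2, 0.8])
```

Ranking clips by this value is what surfaces the "most confident wrong" failures, which are the most informative cases for the retraining feedback loop.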

Datasets

Class Distribution

Drive&Act 8 DMS classes — significant imbalance addressed with sqrt-inverse-freq weighting

Class Distribution

safe_driving dominates at 3,707 clips while looking_around has only 107 (35x imbalance). We use sqrt-inverse-frequency class weighting in the loss function to prevent the model from ignoring minority classes.
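The sqrt-inverse-frequency weighting is a one-liner; a sketch using the clip counts from this slide (the mean-1 normalization is a common convention, assumed here):

```python
import numpy as np

def sqrt_inv_freq_weights(counts):
    """Per-class loss weights: w_c proportional to 1/sqrt(n_c), normalized to mean 1."""
    counts = np.asarray(counts, dtype=float)
    w = 1.0 / np.sqrt(counts)
    return w / w.mean()

# Counts from the slide: safe_driving = 3,707 clips, looking_around = 107.
w = sqrt_inv_freq_weights([3707, 107])
ratio = w[1] / w[0]  # = sqrt(3707 / 107) ~ 5.9: the 35x count gap becomes ~6x in the loss
```

The square root deliberately softens the correction: full inverse-frequency weighting would upweight looking_around 35x and risk overfitting its 107 clips.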

TSM Architecture

Our Preprocessing Pipeline

NIR-specific preprocessing before temporal shift module classification

Preprocessing Pipeline

NIR frames are extremely dark (88% of pixels below 50/255). CLAHE redistributes contrast, then frames are normalized with ImageNet statistics at full 640×480 resolution.
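A sketch of that pipeline shape. The real pipeline uses CLAHE (e.g., OpenCV's `createCLAHE`, which equalizes per-tile with a clip limit); the NumPy-only global histogram equalization below is a stand-in to illustrate the contrast redistribution, followed by ImageNet-statistics normalization at 640×480:

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD  = np.array([0.229, 0.224, 0.225])

def equalize(frame):
    """Global histogram equalization of a uint8 frame.

    Stand-in for CLAHE, which instead equalizes locally per tile
    with a clip limit to avoid amplifying noise.
    """
    hist = np.bincount(frame.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1)
    return (cdf[frame] * 255).astype(np.uint8)

def preprocess(frame):
    """Equalize a dark NIR frame, replicate to 3 channels, normalize."""
    eq = equalize(frame).astype(np.float32) / 255.0
    rgb = np.stack([eq] * 3, axis=-1)  # NIR is single-channel
    return (rgb - IMAGENET_MEAN) / IMAGENET_STD

# Synthetic dark NIR frame at full 640x480 resolution (all pixels < 50/255).
dark = (np.random.rand(480, 640) * 50).astype(np.uint8)
out = preprocess(dark)
```

Replicating the equalized NIR channel to three channels is what lets ImageNet-pretrained RGB backbones be reused unchanged.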

Hardware

Edge Deployment Constraints

Device specifications that bound model selection and optimization strategy

1.5–4
TOPs (NPU)
INT8
Quantization
≤ 5MB
Target Model Size
< 50ms
Inference Latency

Implications for Model Design

  • NPU budget of 1.5–4 TOPs rules out most video models at full resolution — TSM with a lightweight backbone (MobileNetV3) remains viable
  • INT8 quantization is mandatory — all candidate models must be quantization-friendly (no significant accuracy drop at INT8)
  • Image-based classifiers are the pragmatic default for edge; video-based models shift to the centralized path unless edge TOPs increase
  • Complex events (multi-step distraction sequences) are candidates for centralized inference via streamed or batched frames
Complete HW compatibility matrix: Map each candidate model to device NPU/memory constraints [ HW Lead ]
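The ≤ 5 MB / INT8 budget above is simple weight-size arithmetic. A sketch using a MobileNetV3-Large-sized backbone (~5.4M parameters, an approximate figure assumed here):

```python
def model_size_mb(num_params, bits_per_weight):
    """Approximate serialized weight size in MB (ignores activations and format overhead)."""
    return num_params * bits_per_weight / 8 / 1e6

# Assumed backbone size: roughly 5.4M parameters (MobileNetV3-Large scale).
fp32 = model_size_mb(5_400_000, 32)  # ~21.6 MB: well over the 5 MB edge budget
int8 = model_size_mb(5_400_000, 8)   # ~5.4 MB: near the budget, hence INT8 is mandatory
```

The same arithmetic run per candidate model is the core of the HW compatibility matrix action item.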

Edge vs. Centralized Decision Matrix

Factor         Edge          Centralized
Latency        Real-time     Seconds
Compute        Constrained   Flexible
Model size     ≤ 5 MB        Uncapped
Video input    Difficult     Native
Privacy        On-device     Requires policy
Connectivity   Independent   Required

Open Decision

Which events are edge-viable vs. centralized-only? This depends on the HW compatibility assessment and benchmark results from the Model team.