R&D Strategy

Raqeeb: Driver Monitoring System

Edge-deployed machine learning models for real-time driver behaviour alerts.

Phone Use
Smoking
Fatigue / Drowsiness
Seatbelt
Distracted Driving
+ 19 alerts

Mowasalat · DMS R&D · 2026

Overview

Workflow

The Data team builds the consolidated dataset, the Model team develops and fine-tunes the edge and centralized models, and the HW team handles edge deployment.

DATA TEAM
  • Open-Source — DDD, StateFarm, YawDD
  • Howen Alert Data — fleet camera alerts
  • Raw Footage — unlabelled video
  • Annotation — model-assisted + human review
  • Augmentation — preprocessing pipeline
  • Consolidated Dataset — Open + Howen + Raw
MODEL TEAM
  • Architecture — research & selection
  • Pretrained Eval — on open + Howen subset; awaits consolidated data
  • Fine-tuning — edge + centralized models
  • Evaluation — accuracy / latency / size
HW TEAM
  • HW Assessment — TOPs, memory, NPU specs; constrains model selection
  • Compatibility — model ↔ device fit
  • Edge Deployment — on-device inference
Training Pipeline

End-to-End Pipeline

Four-stage flow from raw data to validated models — with action points per stage

STAGE 1 — Dataset Consolidation (Unified format · Annotation · Storage)
  • Open-Source, Howen Alerts, and Raw Footage feed automatic assisted labeling into the Consolidated Dataset
STAGE 2 — Preprocessing (Augmentation · Balancing)
  • Augmentation — enrich dataset
STAGE 3 — Model Fine-tuning (Edge · Centralized · Distillation)
  • Edge — lightweight models, model compression (e.g., quantization)
  • Centralized — image and video models, full precision
STAGE 4 — Evaluation (Accuracy · Latency · Deployment)
  • Accuracy — per-class F1 on Howen data
  • Latency — on-device inference < 50 ms
  • Model size — edge ≤ 5 MB · INT8 viable
  • Feedback loop → retrain
Consolidation
Data acquisition — collect and onboard new data sources [ Data ]
Annotation pipeline — deploy auto-label model + human review loop [ Data ]
Centralized storage — secure data store setup [ Data ]
Automating ingestion — automated pipeline for incoming data [ Data ]
Preprocessing
Augmentation transforms — derive additional images from existing ones to improve model performance [ Data ]
Cleaning and preprocessing — partitioning, augmentation, human validation [ Data ]
Fine-tuning
Edge model fine-tuning — 1 general vs. many specialized, detection vs. classification, image vs. video architectures [ Model ]
Centralized model fine-tuning — dataset annotation, complex use-case offloading, edge monitoring & distillation [ Model ]
Pretrained evaluation — evaluate on open-source datasets and a subset of Howen alert data [ Model ]
Architecture research — experiment with model architectures and sizes [ Model ]
HW compatibility check — validate model fits NPU constraints [ HW ]
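The distillation mentioned above (a full-precision centralized teacher guiding a lightweight edge student) can be sketched as temperature-softened soft-target matching. A minimal NumPy sketch of the standard KD loss, not the project's actual training code:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, T=4.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T**2 factor keeps gradient magnitudes comparable across
    temperatures (the usual Hinton-style convention).
    """
    p = softmax(teacher_logits, T)  # soft targets from centralized model
    q = softmax(student_logits, T)  # edge model predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl) * T ** 2)

# Identical logits -> zero loss; diverging logits -> positive loss.
same = distillation_loss([[2.0, 0.5, -1.0]], [[2.0, 0.5, -1.0]])
diff = distillation_loss([[2.0, 0.5, -1.0]], [[-1.0, 0.5, 2.0]])
```

In practice this term is blended with the hard-label cross-entropy when training the edge model.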
Evaluation
Benchmark on Howen data — per-class F1, confusion matrices [ Model ]
Latency profiling — measure on-device inference and overall edge performance [ HW ]
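The latency-profiling action above reduces to percentile timing of repeated inference calls against the < 50 ms budget. A host-side harness sketch (real measurement happens on the edge NPU; the workload below is a stand-in):

```python
import time

def profile_latency(infer, warmup=10, runs=100):
    """Measure per-call latency of an inference callable, in milliseconds."""
    for _ in range(warmup):          # warm caches before timing
        infer()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    return {"p50_ms": samples[len(samples) // 2],
            "p95_ms": samples[int(len(samples) * 0.95) - 1],
            "max_ms": samples[-1]}

# Stand-in for a real on-device model call.
stats = profile_latency(lambda: sum(i * i for i in range(10_000)))
within_budget = stats["p95_ms"] < 50.0   # the < 50 ms edge target
```

Reporting p95 rather than the mean matters here, since alert pipelines are bound by worst-case frames.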
Datasets

Dataset: Drowsy Detection

48×48 face crops • 2 classes • 7,342 images • Kaggle (yasharjebraeily)

Drowsy Detection Samples

Used to validate TEEN-D model replications (MobileNetV2, ResNet18, VGG16).

Datasets

Dataset: StateFarm Distracted Driver

Dashboard camera • 10 classes • 22,424 images • Kaggle competition

StateFarm Samples

Used to validate Followb1ind1y replications (EfficientNet-B0, MobileNetV3-Large).

Datasets

Dataset: Drive&Act (NIR, Inner Mirror)

8 DMS classes after remapping • 11,895 clips of 8 frames • 640×480 NIR

Safe Driving
Eating / Drinking
Phone Use
Visual Distraction
Manual Distraction
Looking Around
Seatbelt Interaction
Vehicle Entry / Exit
Datasets

Label Remapping: 34 Activities → 8 DMS Classes

Drive&Act mid-level activities grouped by safety-relevant driver behavior

Label Remapping

Drive&Act defines 34 fine-grained activities. We collapse them into 8 DMS-relevant categories based on the type of driver distraction or behavior they represent.
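The remapping is a simple many-to-one lookup from fine-grained activity to DMS class. A sketch covering a subset of the 34 activities; the activity names and assignments below are illustrative examples, not the project's exact mapping table:

```python
# Illustrative subset of the 34 -> 8 remapping; the source activity
# names and their assignments are assumed, not the actual table.
ACTIVITY_TO_DMS = {
    "sitting_still":        "safe_driving",
    "eating":               "eating_drinking",
    "drinking":             "eating_drinking",
    "talking_on_phone":     "phone_use",
    "reading_magazine":     "visual_distraction",
    "working_on_laptop":    "manual_distraction",
    "looking_around":       "looking_around",
    "fastening_seat_belt":  "seatbelt_interaction",
    "entering_car":         "vehicle_entry_exit",
}

def remap(labels):
    """Map fine-grained Drive&Act activity labels to DMS classes."""
    return [ACTIVITY_TO_DMS[label] for label in labels]
```

Collapsing labels this way also merges the per-activity clip counts, which feeds directly into the class-distribution analysis later in the deck.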

Models

Pretrained Model Replication

5 models, 2 datasets — all published results matched or exceeded

Model                      Dataset       Published   Ours     Delta
TEEN-D MobileNetV2         Drowsy Det.   98.99%      98.99%   0.00%
TEEN-D ResNet18            Drowsy Det.   95.28%      95.28%   0.00%
TEEN-D VGG16               Drowsy Det.   97.51%      97.51%   0.00%
Followb1ind1y EffNet-B0    StateFarm     96.85%      97.43%   +0.58%
Followb1ind1y MobNetV3-L   StateFarm     94.67%      95.03%   +0.36%
Model Comparison

Exact match on drowsy detection. Slight improvement on StateFarm due to minor preprocessing differences. All models evaluated on held-out test splits not seen during training.

Models

Sample Predictions

Green = correct, Red = incorrect — mix of successes and failures

EfficientNet-B0 Predictions
TEEN-D MobileNetV2 Predictions
Results

TSM Results on Drive&Act

Per-class Precision, Recall, and F1 on the test set

TSM Per-Class Metrics

Strong performance on visual_distraction (F1 78.1%) and safe_driving (F1 72.9%). Minority classes (looking_around, seatbelt_interaction) suffer from low support.
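The per-class precision, recall, and F1 reported above all derive from one confusion matrix. A NumPy sketch of that computation (the toy matrix is illustrative, not TSM's actual results):

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, F1 per class from a confusion matrix.

    cm[i, j] = number of samples with true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / np.maximum(cm.sum(axis=0), 1e-12)  # column sums = predicted counts
    recall    = tp / np.maximum(cm.sum(axis=1), 1e-12)  # row sums = true counts
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1

# Toy 2-class example: class 0 well separated, class 1 weaker.
p, r, f = per_class_metrics([[90, 10],
                             [30, 70]])
```

The low-support classes noted above are exactly where the row sums shrink, so a handful of misclassified clips can swing recall, and with it F1, by several points.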

Results

TSM Predictions on Drive&Act

Left = most confident correct, Right = most confident wrong.

Most Confident Correct
Most Confident Wrong

Each row = one DMS class. Confidence = softmax probability for the predicted label.
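The confidence score described above is just the softmax probability of the argmax class. A minimal sketch with illustrative logits:

```python
import numpy as np

def prediction_confidence(logits):
    """Predicted class index and its softmax probability."""
    z = np.asarray(logits, dtype=float)
    z -= z.max()                         # numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax
    idx = int(probs.argmax())
    return idx, float(probs[idx])

# Illustrative 4-class logits; the highest logit wins.
idx, conf = prediction_confidence([2.3, 0.1, -1.2, 0.8])
```

Ranking clips by this value is what surfaces the "most confident wrong" failures, which are the most informative cases for the retraining feedback loop.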

Datasets

Class Distribution

Drive&Act 8 DMS classes — significant imbalance addressed with sqrt-inverse-freq weighting

Class Distribution

safe_driving dominates at 3,707 clips while looking_around has only 107 (35x imbalance). We use sqrt-inverse-frequency class weighting in the loss function to prevent the model from ignoring minority classes.
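The sqrt-inverse-frequency weighting is a one-liner; a sketch using the clip counts from this slide (the mean-1 normalization is a common convention, assumed here):

```python
import numpy as np

def sqrt_inv_freq_weights(counts):
    """Per-class loss weights: w_c proportional to 1/sqrt(n_c), normalized to mean 1."""
    counts = np.asarray(counts, dtype=float)
    w = 1.0 / np.sqrt(counts)
    return w / w.mean()

# Counts from the slide: safe_driving = 3,707 clips, looking_around = 107.
w = sqrt_inv_freq_weights([3707, 107])
ratio = w[1] / w[0]  # = sqrt(3707 / 107) ~ 5.9: the 35x count gap becomes ~6x in the loss
```

The square root deliberately softens the correction: full inverse-frequency weighting would upweight looking_around 35x and risk overfitting its 107 clips.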

TSM Architecture

Our Preprocessing Pipeline

NIR-specific preprocessing before temporal shift module classification

Preprocessing Pipeline

NIR frames are extremely dark (88% of pixels below 50/255). CLAHE redistributes contrast, then frames are normalized with ImageNet statistics at full 640×480 resolution.
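A sketch of that pipeline shape. The real pipeline uses CLAHE (e.g., OpenCV's `createCLAHE`, which equalizes per-tile with a clip limit); the NumPy-only global histogram equalization below is a stand-in to illustrate the contrast redistribution, followed by ImageNet-statistics normalization at 640×480:

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD  = np.array([0.229, 0.224, 0.225])

def equalize(frame):
    """Global histogram equalization of a uint8 frame.

    Stand-in for CLAHE, which instead equalizes locally per tile
    with a clip limit to avoid amplifying noise.
    """
    hist = np.bincount(frame.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1)
    return (cdf[frame] * 255).astype(np.uint8)

def preprocess(frame):
    """Equalize a dark NIR frame, replicate to 3 channels, normalize."""
    eq = equalize(frame).astype(np.float32) / 255.0
    rgb = np.stack([eq] * 3, axis=-1)  # NIR is single-channel
    return (rgb - IMAGENET_MEAN) / IMAGENET_STD

# Synthetic dark NIR frame at full 640x480 resolution (all pixels < 50/255).
dark = (np.random.rand(480, 640) * 50).astype(np.uint8)
out = preprocess(dark)
```

Replicating the equalized NIR channel to three channels is what lets ImageNet-pretrained RGB backbones be reused unchanged.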

Hardware

Edge Deployment Constraints

Device specifications that bound model selection and optimization strategy

1.5–4
TOPs (NPU)
INT8
Quantization
≤ 5MB
Target Model Size
< 50ms
Inference Latency

Implications for Model Design

  • NPU budget of 1.5–4 TOPs rules out most video models at full resolution — TSM with a lightweight backbone (MobileNetV3) remains viable
  • INT8 quantization is mandatory — all candidate models must be quantization-friendly (no significant accuracy drop at INT8)
  • Image-based classifiers are the pragmatic default for edge; video-based models shift to the centralized path unless edge TOPs increase
  • Complex events (multi-step distraction sequences) are candidates for centralized inference via streamed or batched frames
Complete HW compatibility matrix: Map each candidate model to device NPU/memory constraints [ HW Lead ]
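The ≤ 5 MB / INT8 budget above is simple weight-size arithmetic. A sketch using a MobileNetV3-Large-sized backbone (~5.4M parameters, an approximate figure assumed here):

```python
def model_size_mb(num_params, bits_per_weight):
    """Approximate serialized weight size in MB (ignores activations and format overhead)."""
    return num_params * bits_per_weight / 8 / 1e6

# Assumed backbone size: roughly 5.4M parameters (MobileNetV3-Large scale).
fp32 = model_size_mb(5_400_000, 32)  # ~21.6 MB: well over the 5 MB edge budget
int8 = model_size_mb(5_400_000, 8)   # ~5.4 MB: near the budget, hence INT8 is mandatory
```

The same arithmetic run per candidate model is the core of the HW compatibility matrix action item.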

Edge vs. Centralized Decision Matrix

Factor         Edge          Centralized
Latency        Real-time     Seconds
Compute        Constrained   Flexible
Model size     ≤ 5 MB        Uncapped
Video input    Difficult     Native
Privacy        On-device     Requires policy
Connectivity   Independent   Required

Open Decision

Which events are edge-viable vs. centralized-only? This depends on the HW compatibility assessment and benchmark results from the Model team.