Most AI engineers are good at training models. Far fewer are good at shipping them.
Abu Bakar
Siddik Nayem
Production AI across NLP, computer vision, and MLOps. From 60M-user recommendation engines to research in Nature and IEEE.
// about
The gap between a model and a product is where most AI projects die.
I close that gap. Four years of production AI — recommendation engines for 60M+ Grameenphone subscribers, computer vision pipelines at TRACKBOX.BE, research published in Nature Scientific Reports and IEEE.
What connects it all: a bias toward production. Not marginal improvements on a benchmark, but systems that work at scale, under constraints, for real users.
Open to ML Engineering roles
Berlin-based · Remote OK · German Work Permit
Berlin, Germany
Central European Time · UTC+1
M.Sc. Artificial Intelligence
BTU Cottbus-Senftenberg · 2025 – Present
// projects
Systems that shipped.
Production work, research, and open-source tools — each with a specific problem and a measurable outcome.
Spatiotemporal Ball Detection
92% accuracy at 10px threshold — 2× the YOLOv8 baseline
Standard object detectors fail on fast-moving balls: motion blur, occlusions, and perspective collapse. The solution: treat video as a sequence, not a frame. A frozen DINOv2 backbone (via RF-DETR encoder) extracts spatial features from 5-frame windows at 10 FPS. A 4-layer Temporal Transformer then reasons across those frames to predict ball visibility and exact pixel position in the final frame.
- ▸RF-DETR encoder with frozen DINOv2 backbone (86M params) — 3-stage progressive unfreezing
- ▸Temporal Transformer: 4 layers, 8 heads, d_model=512, custom positional encoding for frame sequences
- ▸Dual loss: Focal Loss for visibility + Masked MSE for position (only computed on visible frames)
- ▸GCP Batch training infrastructure — 26K+ samples, W&B tracking across 25+ architecture variants
- ▸F1 Score 0.9032, RMSE 15.23px, Position-aware @25px = 0.85, @50px = 0.88
- ▸Evaluated against: YOLOv8 fine-tuned (45% baseline), TimeSformer, Cross-Attention variants, TOTNet (7.19px RMSE)
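The dual loss is the load-bearing trick: the position head is trained only on frames where the ball is actually visible, so occluded frames never pull it toward meaningless coordinates. A minimal numpy sketch of that idea (function names, shapes, and the gamma value are illustrative, not the project's training code):

```python
import numpy as np

def dual_loss(vis_prob, xy_pred, vis_true, xy_true, gamma=2.0):
    """Focal loss on visibility + MSE on position, masked to visible frames."""
    # Focal term: down-weights easy visibility predictions
    pt = np.where(vis_true == 1, vis_prob, 1.0 - vis_prob)
    focal = -np.mean((1.0 - pt) ** gamma * np.log(pt + 1e-8))
    # Masked MSE: position error counted only where the ball is visible
    mask = vis_true.astype(bool)
    if mask.any():
        mse = np.mean(np.sum((xy_pred[mask] - xy_true[mask]) ** 2, axis=-1))
    else:
        mse = 0.0
    return focal + mse
```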
Grameenphone Recommendation Engine
Real-time personalisation for 60M+ subscribers at sub-100ms
Bangladesh's largest telco needed a recommendation engine that could serve 60M+ subscribers in real time, surface relevant short-form video, and integrate ad targeting — without sacrificing latency. The architecture is a two-tower retrieval + reranking pipeline built with Milvus for vector search, gRPC microservices for inter-service communication, and a multilingual sentiment layer covering English, Bangla, and Banglish.
- ▸Two-tower model: user tower (behaviour history, demographics) + item tower (content embeddings, metadata)
- ▸Milvus vector database for sub-100ms candidate retrieval across 60M+ user profiles
- ▸gRPC microservices architecture with Docker + Kubernetes GitOps deployment (GitLab CI/CD)
- ▸Channel rotation with fallback logic — ensures diversity, prevents filter bubbles
- ▸Multilingual sentiment analysis on Bangla, Banglish, and English UGC for toxicity filtering
- ▸Ad-targeting engine integration: downstream revenue signal fed back into re-ranking score
- ▸Load tested with psutil + memory_profiler under peak concurrent request scenarios
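The retrieve-then-rerank split can be sketched in a few lines, with brute-force inner product standing in for the Milvus ANN query and a hypothetical alpha blending relevance against the downstream ad-revenue signal (both functions are illustrative, not the production code):

```python
import numpy as np

def retrieve(user_vec, item_vecs, k=5):
    """Stage 1: candidate retrieval over embeddings from the two towers.
    In production this is a Milvus ANN query; brute force stands in here."""
    sims = item_vecs @ user_vec
    top = np.argsort(-sims)[:k]
    return top, sims[top]

def rerank(relevance, revenue_signal, alpha=0.8):
    """Stage 2: blend model relevance with the ad-revenue feedback signal.
    alpha here is a made-up weighting, not the production value."""
    return alpha * relevance + (1.0 - alpha) * revenue_signal
```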
HealthRAG
Production bilingual RAG backend — English ↔ Japanese medical retrieval
Healthcare professionals in Japan and abroad need rapid access to multilingual clinical guidelines. HealthRAG is a production FastAPI service that ingests EN/JA medical documents, indexes them with FAISS using a multilingual sentence-transformer (384-dim, 50+ languages), and serves cross-lingual semantic search — query in English, retrieve from Japanese documents and vice versa.
- ▸Multilingual embedding: paraphrase-multilingual-MiniLM-L12-v2 — single 384-dim space, 50+ languages
- ▸FAISS in-memory index with disk persistence — scales to ~1M vectors/node at sub-ms latency
- ▸4 swappable LLM backends: GPT-4o-mini, Claude Sonnet, offline template, or custom endpoint
- ▸4 translation backends: Google, googletrans, Argostranslate (fully offline), or mock
- ▸Auto language detection via langdetect on ingest — no manual tagging required
- ▸Multi-platform Docker image (linux/amd64 + arm64) published to GHCR via GitHub Actions CI/CD
- ▸CPU inference ~20ms/doc — no GPU required for production serving
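The cross-lingual trick is that one embedding space serves both languages, so retrieval reduces to nearest-neighbour search. A numpy stand-in for the inner-product index (in the real service the vectors are 384-dim MiniLM embeddings in a FAISS index; these toy 2-dim vectors just reproduce the pattern):

```python
import numpy as np

def normalize(v):
    # L2-normalize so inner product equals cosine similarity
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def search(doc_vecs, query_vec, k=3):
    """Because EN and JA text share one embedding space, an English query
    scores directly against Japanese documents (and vice versa)."""
    scores = normalize(doc_vecs) @ normalize(query_vec)
    top = np.argsort(-scores)[:k]
    return top, scores[top]
```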
BOLM: Bangladesh Open LULC Map
891M annotated pixels across 4,392 km² — Nature Scientific Reports 2025
Annotated satellite data for developing countries is scarce — the global LULC community has extensive coverage of North America and Europe, almost nothing for South Asia. BOLM addresses this: pixel-level land use/land cover annotations across the Dhaka metropolitan area using Bing imagery at 2.22m/pixel. 11 classes, 891 million annotated pixels, three-stage GIS expert validation.
- ▸4,392 km² coverage — Dhaka metro and surrounding rural/peri-urban zones
- ▸891 million annotated pixels across 11 classes: farmland, water, forest, urban structure, urban built-up, rural built-up, road, meadow, marshland, brick factory, unrecognized
- ▸3-stage annotation: Bing imagery (2.22m/pixel) + QGIS + GIS expert validation
- ▸Benchmarks: DeepLabV3+, HRNetv2, U-Net, UnimatchV2, Segmenter ViT-16 — best IoU 0.50 (UnimatchV2)
- ▸Companion paper accepted at ICIP 2025 (arXiv:2505.21915)
- ▸Part of a 7-paper arc: FCN-8 (SLAAI 2020) → SIGML/Sensors 2021 → ICPR 2020 → Scientific Reports 2025
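The benchmark numbers above are per-class intersection-over-union averaged across the 11 classes. A minimal sketch of that metric, assuming integer class-id masks:

```python
import numpy as np

def class_iou(pred, target, cls):
    """Intersection-over-Union for a single class id."""
    p, t = pred == cls, target == cls
    union = np.logical_or(p, t).sum()
    if union == 0:
        return np.nan                      # class absent from both masks
    return np.logical_and(p, t).sum() / union

def mean_iou(pred, target, n_classes=11):
    """Mean IoU over the 11 BOLM classes, skipping absent ones."""
    return np.nanmean([class_iou(pred, target, c) for c in range(n_classes)])
```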
Llama 3.2-1B Emotion Classifier
Multi-label emotion detection via LoRA — F1 Macro 0.7011 on 5 classes
Fine-tuning LLMs for classification is expensive. LoRA (Low-Rank Adaptation) changes that: train ~0.1% of parameters while matching full fine-tune quality. This project fine-tunes Meta's Llama 3.2-1B for multi-label emotion detection (anger, fear, joy, sadness, surprise) using LoRA rank=16 and 4-bit NF4 quantization — reducing VRAM by ~75% while hitting F1 Macro 0.7011.
- ▸LoRA config: rank=16, alpha=32, target modules q_proj + v_proj — ~0.1% of total parameters trained
- ▸4-bit NF4 quantization via BitsAndBytes — ~75% VRAM reduction vs full precision
- ▸F1 Macro: 0.7011 | F1 Micro: 0.7183 | Hamming Loss: 0.188 | Jaccard (macro): 0.544
- ▸Per-class F1: fear 0.791, surprise 0.746, sadness 0.701, anger 0.667, joy 0.600
- ▸30 epochs, batch size 2 + gradient accumulation, lr 5e-5 with linear warmup
- ▸PEFT + BitsAndBytes + Transformers + Datasets — reproducible stack pushed to HuggingFace Hub
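The arithmetic behind the ~0.1% figure: a LoRA adapter adds two low-rank matrices per target weight, scaled by alpha/r, while the base weight stays frozen. A numpy sketch with illustrative dimensions for a single 2048x2048 matrix — across a full 1B-parameter model with adapters only on q_proj and v_proj, the trainable share comes out far smaller than this single-matrix fraction:

```python
import numpy as np

d, r, alpha = 2048, 16, 32                 # hidden size, LoRA rank, scaling
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))                # frozen base weight (e.g. q_proj)
A = rng.normal(size=(r, d)) * 0.01        # trainable down-projection
B = np.zeros((d, r))                       # trainable up-projection, zero-init
                                           # so the adapter starts as a no-op

def lora_forward(x):
    # y = x W^T + (alpha/r) * x A^T B^T   — only A and B receive gradients
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

trainable = A.size + B.size                # 2 * r * d per adapted matrix
frozen = W.size
```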
Livestock Weight Prediction System
Image-to-weight estimation for cattle — $246,000 Gates Foundation grant
Weighing livestock in developing-world farms requires expensive equipment most farmers cannot afford. This project builds an end-to-end pipeline: photograph a cow, get a weight estimate. Covers cattle semantic segmentation (DeepLabV3+ on MMSegmentation), rear-view pose estimation, morphometric feature extraction, and a production FastAPI + Celery API for async batch inference.
- ▸Cattle segmentation: DeepLabV3+ on MMSegmentation (OpenMMLab) — repo: livestock-segmentation (2★ GitHub)
- ▸Pose estimation from rear view (cattle-pose-rear) — keypoint detection for girth, height, length measurement
- ▸Weight regression from body morphometrics → kg estimate
- ▸Production API: FastAPI + Celery + RabbitMQ for async batch jobs (cattle-web — 4★, 2 forks)
- ▸Optional Flower monitoring dashboard for task queue observability
- ▸12K-image public dataset released on Kaggle — 4K+ downloads
- ▸Funded by Bill & Melinda Gates Foundation ($246,000 grant) through CCDS, IUB
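The final stage maps keypoint-derived measurements to kilograms. A toy regression sketch on synthetic data — the features mirror the pipeline's girth/length/height morphometrics, but every number and coefficient here is made up, not the production model:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic morphometrics: heart girth, body length, wither height (cm)
X = rng.uniform([140, 120, 110], [220, 180, 150], size=(200, 3))
true_w = np.array([2.1, 1.3, 0.8])
y = X @ true_w - 300 + rng.normal(0.0, 5.0, size=200)   # synthetic weights (kg)

# Fit a linear model via least squares (the production model can be richer)
Xb = np.hstack([X, np.ones((200, 1))])     # add intercept column
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)

def predict_weight(girth_cm, length_cm, height_cm):
    return float(np.array([girth_cm, length_cm, height_cm, 1.0]) @ coef)
```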
// experience
The pattern: I follow problems through.
From Dhaka to Berlin — research labs, AI startups, and product companies.
Software Engineer (ML)
TRACKBOX.BE
- ▸Soccer analytics platform — computer vision and NLP pipelines for match intelligence
- ▸Custom CNN+Transformer hybrid: 92% detection accuracy at 10-pixel threshold
- ▸NL2SQL analytics layer: raised query success rate from 20% to 95% across 7 iterations
M.Sc. Artificial Intelligence
Brandenburg University of Technology Cottbus-Senftenberg
- ▸Focus: deep learning, NLP, computer vision, and autonomous systems
Senior AI Engineer
Think Flagship IXP
- ▸Multi-stage recommendation system for Grameenphone — 60M+ subscribers
- ▸User/item tower architecture with real-time feature serving under 100ms
- ▸Multilingual sentiment analysis: English, Bangla, Banglish
- ▸Ad-targeting engine integration with downstream revenue impact
AI Engineer
AinoviQ IT Limited
- ▸Virtual try-on system using generative adversarial networks (GANs)
- ▸Garment segmentation and pose estimation pipeline for e-commerce
- ▸Reduced inference latency by optimizing model serving on GCP
Lead ML Engineer
Acme AI Ltd.
- ▸Livestock health monitoring using satellite imagery and deep learning
- ▸Medical imaging pipelines for diagnostic assistance
- ▸Built and led the ML team — hired and mentored 4 engineers
- ▸Authored technical articles on RAG, GraphRAG, and open-source LLMs
Research Associate
Independent University Bangladesh
- ▸Published in Nature Scientific Reports (2025), Sensors (2021), IEEE ICPR (2020)
- ▸BOLM: 891M-pixel land use/land cover benchmark dataset for Dhaka
- ▸Disaster response image classification with attention mechanisms
- ▸Funded by Bill & Melinda Gates Foundation and Australia Awards
B.Sc. Computer Science
Independent University Bangladesh
- ▸Graduated with distinction — foundations in algorithms, ML, and systems
// skills
The tools, not the buzzwords.
Production-tested across NLP, computer vision, and infrastructure. Only what I have shipped.
NLP & GenAI
Computer Vision
MLOps & Infrastructure
Languages & Frameworks
// spoken languages
// contact
Let's build something.
I would welcome the opportunity to talk through how my background maps onto what you are building. Available for ML engineering roles in Berlin and remotely.
// built with Next.js · deployed on Vercel //