Most AI engineers are good at training models. Far fewer are good at shipping them.
Abu Bakar
Siddik Nayem
Production AI across NLP, computer vision, and MLOps. From 60M-user recommendation engines to research in Nature and IEEE.
// about
The gap between a model and a product is where most AI projects die.
I close that gap. Four years of production AI — recommendation engines for 60M+ Grameenphone subscribers, computer vision pipelines at TRACKBOX.BE, research published in Nature Scientific Reports and IEEE.
What connects it all: a bias toward production. Not marginal improvements on a benchmark, but systems that work at scale, under constraints, for real users.
Open to ML Engineering roles
Berlin-based · Remote OK · German Work Permit
Berlin, Germany
Central European Time · UTC+1
M.Sc. Artificial Intelligence
BTU Cottbus-Senftenberg · 2025 – Present
// projects
Systems that shipped.
Production work, research, and open-source tools — each with a specific problem and a measurable outcome.
Spatiotemporal Ball Detection
92% accuracy at 10px threshold — 2× the YOLOv8 baseline
Standard object detectors fail on fast-moving balls: motion blur, occlusions, and perspective collapse. The solution: treat video as a sequence, not a frame. A frozen DINOv2 backbone (via RF-DETR encoder) extracts spatial features from 5-frame windows at 10 FPS. A 4-layer Temporal Transformer then reasons across those frames to predict ball visibility and exact pixel position in the final frame.
- ▸RF-DETR encoder with frozen DINOv2 backbone (86M params) — 3-stage progressive unfreezing
- ▸Temporal Transformer: 4 layers, 8 heads, d_model=512, custom positional encoding for frame sequences
- ▸Dual loss: Focal Loss for visibility + Masked MSE for position (only computed on visible frames)
- ▸GCP Batch training infrastructure — 26K+ samples, W&B tracking across 25+ architecture variants
- ▸F1 Score 0.9032, RMSE 15.23px, Position-aware @25px = 0.85, @50px = 0.88
- ▸Evaluated against: YOLOv8 fine-tuned (45% baseline), TimeSformer, Cross-Attention variants, TOTNet (7.19px RMSE)
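The dual loss is the load-bearing trick: the position head is trained only on frames where the ball is actually visible, so occluded frames never pull it toward meaningless coordinates. A minimal numpy sketch of that idea (function names, shapes, and the gamma value are illustrative, not the project's training code):

```python
import numpy as np

def dual_loss(vis_prob, xy_pred, vis_true, xy_true, gamma=2.0):
    """Focal loss on visibility + MSE on position, masked to visible frames."""
    # Focal term: down-weights easy visibility predictions
    pt = np.where(vis_true == 1, vis_prob, 1.0 - vis_prob)
    focal = -np.mean((1.0 - pt) ** gamma * np.log(pt + 1e-8))
    # Masked MSE: position error counted only where the ball is visible
    mask = vis_true.astype(bool)
    if mask.any():
        mse = np.mean(np.sum((xy_pred[mask] - xy_true[mask]) ** 2, axis=-1))
    else:
        mse = 0.0
    return focal + mse
```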
Grameenphone Recommendation Engine
Real-time personalisation for 60M+ subscribers at sub-100ms
Bangladesh's largest telco needed a recommendation engine that could serve 60M+ subscribers in real time, surface relevant short-form video, and integrate ad targeting — without sacrificing latency. The architecture is a two-tower retrieval + reranking pipeline built with Milvus for vector search, gRPC microservices for inter-service communication, and a multilingual sentiment layer covering English, Bangla, and Banglish.
- ▸Two-tower model: user tower (behaviour history, demographics) + item tower (content embeddings, metadata)
- ▸Milvus vector database for sub-100ms candidate retrieval across 60M+ user profiles
- ▸gRPC microservices architecture with Docker + Kubernetes GitOps deployment (GitLab CI/CD)
- ▸Channel rotation with fallback logic — ensures diversity, prevents filter bubbles
- ▸Multilingual sentiment analysis on Bangla, Banglish, and English UGC for toxicity filtering
- ▸Ad-targeting engine integration: downstream revenue signal fed back into re-ranking score
- ▸Load tested with psutil + memory_profiler under peak concurrent request scenarios
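The retrieve-then-rerank split can be sketched in a few lines, with brute-force inner product standing in for the Milvus ANN query and a hypothetical alpha blending relevance against the downstream ad-revenue signal (both functions are illustrative, not the production code):

```python
import numpy as np

def retrieve(user_vec, item_vecs, k=5):
    """Stage 1: candidate retrieval over embeddings from the two towers.
    In production this is a Milvus ANN query; brute force stands in here."""
    sims = item_vecs @ user_vec
    top = np.argsort(-sims)[:k]
    return top, sims[top]

def rerank(relevance, revenue_signal, alpha=0.8):
    """Stage 2: blend model relevance with the ad-revenue feedback signal.
    alpha here is a made-up weighting, not the production value."""
    return alpha * relevance + (1.0 - alpha) * revenue_signal
```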
HealthRAG
Production bilingual RAG backend — English ↔ Japanese medical retrieval
Healthcare professionals in Japan and abroad need rapid access to multilingual clinical guidelines. HealthRAG is a production FastAPI service that ingests EN/JA medical documents, indexes them with FAISS using a multilingual sentence-transformer (384-dim, 50+ languages), and serves cross-lingual semantic search — query in English, retrieve from Japanese documents and vice versa.
- ▸Multilingual embedding: paraphrase-multilingual-MiniLM-L12-v2 — single 384-dim space, 50+ languages
- ▸FAISS in-memory index with disk persistence — scales to ~1M vectors/node at sub-ms latency
- ▸4 swappable LLM backends: GPT-4o-mini, Claude Sonnet, offline template, or custom endpoint
- ▸4 translation backends: Google, googletrans, Argostranslate (fully offline), or mock
- ▸Auto language detection via langdetect on ingest — no manual tagging required
- ▸Multi-platform Docker image (linux/amd64 + arm64) published to GHCR via GitHub Actions CI/CD
- ▸CPU inference ~20ms/doc — no GPU required for production serving
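The cross-lingual trick is that one embedding space serves both languages, so retrieval reduces to nearest-neighbour search. A numpy stand-in for the inner-product index (in the real service the vectors are 384-dim MiniLM embeddings in a FAISS index; these toy 2-dim vectors just reproduce the pattern):

```python
import numpy as np

def normalize(v):
    # L2-normalize so inner product equals cosine similarity
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def search(doc_vecs, query_vec, k=3):
    """Because EN and JA text share one embedding space, an English query
    scores directly against Japanese documents (and vice versa)."""
    scores = normalize(doc_vecs) @ normalize(query_vec)
    top = np.argsort(-scores)[:k]
    return top, scores[top]
```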
BOLM: Bangladesh Open LULC Map
891M annotated pixels across 4,392 km² — Nature Scientific Reports 2025
Annotated satellite data for developing countries is scarce — the global LULC community has extensive coverage of North America and Europe, almost nothing for South Asia. BOLM addresses this: pixel-level land use/land cover annotations across the Dhaka metropolitan area using Bing imagery at 2.22m/pixel. 11 classes, 891 million annotated pixels, three-stage GIS expert validation.
- ▸4,392 km² coverage — Dhaka metro and surrounding rural/peri-urban zones
- ▸891 million annotated pixels across 11 classes: farmland, water, forest, urban structure, urban built-up, rural built-up, road, meadow, marshland, brick factory, unrecognized
- ▸3-stage annotation: Bing imagery (2.22m/pixel) + QGIS + GIS expert validation
- ▸Benchmarks: DeepLabV3+, HRNetv2, U-Net, UnimatchV2, Segmenter ViT-16 — best IoU 0.50 (UnimatchV2)
- ▸Companion paper accepted at ICIP 2025 (arXiv:2505.21915)
- ▸Part of a 7-paper arc: FCN-8 (SLAAI 2020) → SIGML/Sensors 2021 → ICPR 2020 → Scientific Reports 2025
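The benchmark numbers above are per-class intersection-over-union averaged across the 11 classes. A minimal sketch of that metric, assuming integer class-id masks:

```python
import numpy as np

def class_iou(pred, target, cls):
    """Intersection-over-Union for a single class id."""
    p, t = pred == cls, target == cls
    union = np.logical_or(p, t).sum()
    if union == 0:
        return np.nan                      # class absent from both masks
    return np.logical_and(p, t).sum() / union

def mean_iou(pred, target, n_classes=11):
    """Mean IoU over the 11 BOLM classes, skipping absent ones."""
    return np.nanmean([class_iou(pred, target, c) for c in range(n_classes)])
```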
Llama 3.2-1B Emotion Classifier
Multi-label emotion detection via LoRA — F1 Macro 0.7011 on 5 classes
Fine-tuning LLMs for classification is expensive. LoRA (Low-Rank Adaptation) changes that: train ~0.1% of parameters while matching full fine-tune quality. This project fine-tunes Meta's Llama 3.2-1B for multi-label emotion detection (anger, fear, joy, sadness, surprise) using LoRA rank=16 and 4-bit NF4 quantization — reducing VRAM by ~75% while hitting F1 Macro 0.7011.
- ▸LoRA config: rank=16, alpha=32, target modules q_proj + v_proj — ~0.1% of total parameters trained
- ▸4-bit NF4 quantization via BitsAndBytes — ~75% VRAM reduction vs full precision
- ▸F1 Macro: 0.7011 | F1 Micro: 0.7183 | Hamming Loss: 0.188 | Jaccard (macro): 0.544
- ▸Per-class F1: fear 0.791, surprise 0.746, sadness 0.701, anger 0.667, joy 0.600
- ▸30 epochs, batch size 2 + gradient accumulation, lr 5e-5 with linear warmup
- ▸PEFT + BitsAndBytes + Transformers + Datasets — reproducible stack pushed to HuggingFace Hub
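The arithmetic behind the ~0.1% figure: a LoRA adapter adds two low-rank matrices per target weight, scaled by alpha/r, while the base weight stays frozen. A numpy sketch with illustrative dimensions for a single 2048x2048 matrix — across a full 1B-parameter model with adapters only on q_proj and v_proj, the trainable share comes out far smaller than this single-matrix fraction:

```python
import numpy as np

d, r, alpha = 2048, 16, 32                 # hidden size, LoRA rank, scaling
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))                # frozen base weight (e.g. q_proj)
A = rng.normal(size=(r, d)) * 0.01        # trainable down-projection
B = np.zeros((d, r))                       # trainable up-projection, zero-init
                                           # so the adapter starts as a no-op

def lora_forward(x):
    # y = x W^T + (alpha/r) * x A^T B^T   — only A and B receive gradients
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

trainable = A.size + B.size                # 2 * r * d per adapted matrix
frozen = W.size
```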
Livestock Weight Prediction System
Image-to-weight estimation for cattle — $246,000 Gates Foundation grant
Weighing livestock in developing-world farms requires expensive equipment most farmers cannot afford. This project builds an end-to-end pipeline: photograph a cow, get a weight estimate. Covers cattle semantic segmentation (DeepLabV3+ on MMSegmentation), rear-view pose estimation, morphometric feature extraction, and a production FastAPI + Celery API for async batch inference.
- ▸Cattle segmentation: DeepLabV3+ on MMSegmentation (OpenMMLab) — repo: livestock-segmentation (2★ GitHub)
- ▸Pose estimation from rear view (cattle-pose-rear) — keypoint detection for girth, height, length measurement
- ▸Weight regression from body morphometrics → kg estimate
- ▸Production API: FastAPI + Celery + RabbitMQ for async batch jobs (cattle-web — 4★, 2 forks)
- ▸Optional Flower monitoring dashboard for task queue observability
- ▸12K-image public dataset released on Kaggle — 4K+ downloads
- ▸Funded by Bill & Melinda Gates Foundation ($246,000 grant) through CCDS, IUB
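The final stage maps keypoint-derived measurements to kilograms. A toy regression sketch on synthetic data — the features mirror the pipeline's girth/length/height morphometrics, but every number and coefficient here is made up, not the production model:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic morphometrics: heart girth, body length, wither height (cm)
X = rng.uniform([140, 120, 110], [220, 180, 150], size=(200, 3))
true_w = np.array([2.1, 1.3, 0.8])
y = X @ true_w - 300 + rng.normal(0.0, 5.0, size=200)   # synthetic weights (kg)

# Fit a linear model via least squares (the production model can be richer)
Xb = np.hstack([X, np.ones((200, 1))])     # add intercept column
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)

def predict_weight(girth_cm, length_cm, height_cm):
    return float(np.array([girth_cm, length_cm, height_cm, 1.0]) @ coef)
```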
// experience
The pattern: I follow problems through.
From Dhaka to Berlin — research labs, AI startups, and product companies.
Software Engineer (ML)
TRACKBOX.BE
- ▸Soccer analytics platform — computer vision and NLP pipelines for match intelligence
- ▸Custom CNN+Transformer hybrid: 92% detection accuracy at 10-pixel threshold
- ▸NL2SQL analytics layer: raised query success rate from 20% to 95% across 7 iterations
M.Sc. Artificial Intelligence
Brandenburg University of Technology Cottbus-Senftenberg
- ▸Focus: deep learning, NLP, computer vision, and autonomous systems
Senior AI Engineer
Think Flagship IXP
- ▸Multi-stage recommendation system for Grameenphone — 60M+ subscribers
- ▸User/item tower architecture with real-time feature serving under 100ms
- ▸Multilingual sentiment analysis: English, Bangla, Banglish
- ▸Ad-targeting engine integration with downstream revenue impact
AI Engineer
AinoviQ IT Limited
- ▸Virtual try-on system using generative adversarial networks (GANs)
- ▸Garment segmentation and pose estimation pipeline for e-commerce
- ▸Reduced inference latency by optimizing model serving on GCP
Lead ML Engineer
Acme AI Ltd.
- ▸Livestock health monitoring using satellite imagery and deep learning
- ▸Medical imaging pipelines for diagnostic assistance
- ▸Built and led the ML team — hired and mentored 4 engineers
- ▸Authored technical articles on RAG, GraphRAG, and open-source LLMs
Research Associate
Independent University Bangladesh
- ▸Published in Nature Scientific Reports (2025), Sensors (2021), IEEE ICPR (2020)
- ▸BOLM: 891M-pixel land use/land cover benchmark dataset for Dhaka
- ▸Disaster response image classification with attention mechanisms
- ▸Funded by Bill & Melinda Gates Foundation and Australia Awards
B.Sc. Computer Science
Independent University Bangladesh
- ▸Graduated with distinction — foundations in algorithms, ML, and systems
// skills
The tools, not the buzzwords.
Production-tested across NLP, computer vision, and infrastructure. Only what I have shipped.
NLP & GenAI
Computer Vision
MLOps & Infrastructure
Languages & Frameworks
// spoken languages
// contact
Let's build something.
I would welcome the opportunity to talk through how my background maps onto what you are building. Available for ML engineering roles in Berlin and remotely.
// built with Next.js · deployed on Vercel //