STAR: Tell me about reducing latency in an ML system.
Real-Time Fraud Monitoring — Kafka + XGBoost pipeline, achieved <100ms latency through SHAP-based feature selection and async scoring. Containerized via Docker + FastAPI.
Reflection: profiling revealed the bottleneck was feature serialization, not the model. Counter-intuitive but critical.
XGBoost, Kafka, <100ms, MLOps
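A minimal sketch of the kind of per-stage timing that can surface a bottleneck like the serialization one described above. The stage names, `serialize_features`, and `score` are hypothetical stand-ins for illustration, not the pipeline's actual code; JSON is assumed only to keep the example self-contained.

```python
import json
import time
from collections import defaultdict
from contextlib import contextmanager

# Seconds accumulated per pipeline stage across all handled events.
stage_totals = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the block to stage_totals[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_totals[stage] += time.perf_counter() - start


def serialize_features(features):
    # Hypothetical stand-in for the real feature serialization step.
    return json.dumps(features).encode("utf-8")


def score(payload):
    # Hypothetical stand-in for model inference: averages the feature values.
    values = json.loads(payload)
    return sum(values.values()) / len(values)


def handle_event(features):
    """Run one event through both stages, timing each separately."""
    with timed("serialize"):
        payload = serialize_features(features)
    with timed("score"):
        return score(payload)


def slowest_stage():
    """Return the name of the stage with the largest accumulated time."""
    return max(stage_totals, key=stage_totals.get)
```

Calling `handle_event` on a representative sample of events and then `slowest_stage()` gives a first indication of where the latency budget goes before any model tuning is attempted.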
STAR: Describe a complex LLM system you designed end-to-end.
NEXUS-AI — LangGraph + RAG + GPT-4 + DistilBERT, deployed with FastAPI, Pinecone, PostgreSQL, Redis, Docker, Next.js. Multi-agent orchestration with memory persistence.
Reflection: chunking strategy + embedding model choice mattered more than LLM selection. RAG quality is the leverage point.
LangGraph, RAG, GPT-4, Pinecone
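To make the chunking point concrete, here is a sketch of one common strategy: fixed-size word windows with overlap. The function name, the word-based splitting, and the default sizes are assumptions for illustration; the card does not say which chunking strategy NEXUS-AI actually uses.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into word windows of chunk_size, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing and embedding more text; tuning `chunk_size` and `overlap` is one place where retrieval quality can shift without touching the LLM.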
STAR: Tell me about an open source contribution you're proud of.
Two merged PRs to adenhq/hive (YC-backed): W&B MCP/GraphQL tracking integration + unit test expansion. Started through cold outreach to the founder — hiring interest followed.
Reflection: open source PRs signal real craft. The YC connection opened a hiring conversation that a cold email alone never would have.
open source, W&B, MCP, GraphQL
STAR: How have you handled ML model accuracy below expectations?
Multilingual NLI BERT — fine-tuned BERT + RoBERTa on 15-language dataset; initial accuracy was 81%, target was 90%+. Systematically tried data augmentation, loss function changes, and language-specific tokenization.
Result: 93.4% Kaggle accuracy. Reflection: the biggest gain came from better data cleaning, not architecture changes.
BERT, RoBERTa, fine-tuning, 93.4%
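The card does not list the cleaning steps that were applied, so the following is only an illustrative sketch of what data cleaning for NLI pairs might involve: Unicode and whitespace normalization, dropping empty rows, removing duplicates, and discarding pairs whose duplicates disagree on the label. The `(premise, hypothesis, label)` row layout and both function names are assumptions.

```python
import unicodedata


def normalize(text):
    """Apply NFKC Unicode normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def clean_pairs(rows):
    """Clean an iterable of (premise, hypothesis, label) rows.

    Drops rows with an empty premise or hypothesis, merges exact duplicates
    after normalization, and discards pairs that appear with conflicting labels.
    """
    labels_by_pair = {}
    for premise, hypothesis, label in rows:
        key = (normalize(premise), normalize(hypothesis))
        if not key[0] or not key[1]:
            continue  # skip rows with a missing side
        labels_by_pair.setdefault(key, set()).add(label)

    return [
        (premise, hypothesis, next(iter(labels)))
        for (premise, hypothesis), labels in labels_by_pair.items()
        if len(labels) == 1  # keep only pairs with a single consistent label
    ]
```

Conflicting-label pairs are dropped rather than resolved here because the sketch has no way to know which label is correct; a real pipeline might route them to manual review instead.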