$ whoami

Saurabh Singh

Backend & AI infrastructure engineer. I build production LLM systems and large-scale orchestration at Observe.ai. Previously Cashfree Payments.

Comfortable owning systems end-to-end — from Spring Boot + Temporal control planes processing 20M+ events/day, to FastAPI + llama.cpp stacks running fully offline.

→ selected work · get in touch · github ↗

events / day: 20M+
tenants: 1000+
infra saved / yr: $144K
code reduction: 85%

// now

What I'm building

day job · observe.ai

Production AI & orchestration at scale

Architecting two systems: AskObserve, an LLM-powered conversational analytics layer that routes queries between structured (Snowflake) and unstructured (LLM/NLP) sources, and Hermes, a Temporal-based unified pipeline orchestration layer replacing 15+ fragmented workflows.

side · vajra

Air-gapped sovereign LLM stack

A solo-built, fully offline foundation-model stack: FastAPI gateway, classification-aware RAG, layered policy engine, hash-chained audit log, and a SOC dashboard. Submitted to a defence-sector RFP. Details on request.

// selected work

Production systems I've shipped

01 · Observe.ai

AskObserve

Production LLM · Conversational Analytics

accuracy: +35%
latency: −20%
search: <100ms

Spring-based microservice that puts an LLM in the loop for 25+ business units, with an intelligent selector that routes between Snowflake structured analytics and LLM/NLP sources.

Built a dual-source ML router that picks between structured analytics and LLM/NLP based on query shape
Engineered sub-100ms semantic history search on Elasticsearch + MongoDB Atlas with compound indexes and fuzzy matching
Designed a tag-based filter pipeline (hierarchical teams, date-range merging, dynamic enrichments) that translates UI selections into the format the ML layer expects
Added AOP-based monitoring and a cache-driven filter pipeline for low-latency reads
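
The date-range merging step in the filter pipeline above can be illustrated with a standard interval merge. This is a minimal sketch in Python for brevity (the service itself is Java), not the production code; the function name and the choice to treat ranges as inclusive and to join back-to-back ranges are assumptions.

```python
from datetime import date


def merge_date_ranges(ranges):
    """Merge overlapping or directly adjacent inclusive (start, end) date ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and (start - merged[-1][1]).days <= 1:
            # Overlaps or immediately follows the previous range: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


selected = [
    (date(2024, 1, 5), date(2024, 1, 15)),
    (date(2024, 1, 1), date(2024, 1, 10)),
    (date(2024, 1, 16), date(2024, 1, 20)),
    (date(2024, 2, 1), date(2024, 2, 3)),
]
merged = merge_date_ranges(selected)
```

Here the three January selections collapse into a single 1 Jan to 20 Jan range and the February range stays separate, so the downstream query carries two date predicates instead of four.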

Spring Boot · Java · Elasticsearch · MongoDB Atlas · Snowflake · LLM APIs · Redis

02 · Observe.ai

Hermes

Workflow Orchestration · Platform

events/day: 20M+
tenants: 1000+
code reduction: 85%
storage saved: 99.6%

Mission-critical platform transformation consolidating 15+ fragmented workflow implementations into a single Temporal-based orchestration layer.

Designed a hybrid stage-group DAG with constrained inputs, enabling dynamic pipeline construction from compile-time constants
Reduced new-stage addition effort from ~1 week to <1 day
Eliminated 99.6% of Temporal storage overhead (~1 TB → ~4 GB per day) using a claim-check pattern
Killed dual SQS/Temporal code paths — single source of truth for orchestration
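
The claim-check pattern mentioned above keeps large payloads out of workflow state: the payload is written to external storage and only a small reference travels through the orchestrator. The sketch below shows the general idea in Python; `BlobStore`, `check_in`, and `redeem` are hypothetical names, and the in-memory dictionary stands in for whatever object store a real deployment would use.

```python
import hashlib
import json


class BlobStore:
    """Hypothetical in-memory stand-in for an external object store."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # content-addressed key
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def check_in(store: BlobStore, payload: dict) -> dict:
    """Store the full payload externally and return a small claim ticket."""
    data = json.dumps(payload, sort_keys=True).encode("utf-8")
    return {"claim_key": store.put(data), "size_bytes": len(data)}


def redeem(store: BlobStore, ticket: dict) -> dict:
    """Fetch the full payload only in the stage that actually needs it."""
    return json.loads(store.get(ticket["claim_key"]))


store = BlobStore()
payload = {"tenant": "t1", "transcript": "x" * 10_000}
ticket = check_in(store, payload)
```

Only `ticket` would be passed between workflow steps, so the orchestrator's history records a short key per step rather than the full payload.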

Temporal · Java · Spring Boot · Kafka · MongoDB · AWS

03 · Observe.ai

Looker zero-downtime migration

Infrastructure · Cost

saved / yr: $144K
envs migrated: 3 prod
downtime: zero

Architected and executed a zero-downtime Looker cluster migration across 3 production environments with a custom account-based routing framework.

Custom account-routing framework for transparent cluster cutover
Robust Dry-Run / Execute / Rollback APIs with state transitions in MongoDB
Resolved auth-token cache collisions and large-dashboard payload limits via a custom Strip-Import-Patch approach
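
One way to picture the Dry-Run / Execute / Rollback flow above is as a small state machine in which each action is only legal from one prior state. The sketch below is illustrative Python rather than the real Java APIs: the state names, the `TRANSITIONS` table, and the record shape are all assumptions, and a plain dictionary stands in for the persisted MongoDB document.

```python
from enum import Enum


class MigrationState(Enum):
    PENDING = "pending"
    DRY_RUN_OK = "dry_run_ok"
    EXECUTED = "executed"
    ROLLED_BACK = "rolled_back"


# Allowed transitions: action -> (required current state, resulting state).
TRANSITIONS = {
    "dry_run": (MigrationState.PENDING, MigrationState.DRY_RUN_OK),
    "execute": (MigrationState.DRY_RUN_OK, MigrationState.EXECUTED),
    "rollback": (MigrationState.EXECUTED, MigrationState.ROLLED_BACK),
}


def apply_action(record: dict, action: str) -> dict:
    """Return an updated copy of the record, or raise if the action is not allowed."""
    required, result = TRANSITIONS[action]
    if record["state"] is not required:
        raise ValueError(f"cannot {action} from state {record['state'].value}")
    return {**record, "state": result, "history": record["history"] + [action]}


record = {"account": "acct-1", "state": MigrationState.PENDING, "history": []}
record = apply_action(record, "dry_run")
record = apply_action(record, "execute")
record = apply_action(record, "rollback")
```

Keeping the legal transitions in one table means an out-of-order call, such as executing before a successful dry run, fails fast instead of leaving an account half-migrated.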

Looker · Java · MongoDB · AWS

04 · Independent · Defence RFP

Vajra

Air-gapped Foundation Model Stack

block rate: 100%
false positive: 0%
tests: 217

A solo-built sovereign LLM stack that runs fully offline on a laptop. Designed for high-stakes environments with classification-aware retrieval and tamper-evident audit.
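
Classification-aware retrieval is commonly implemented by filtering on clearance metadata before the nearest-neighbour ranking, so that documents above the caller's tier never enter the candidate set. The following is a minimal pure-Python sketch of that ordering, not Vajra's code; the tier names, the chunk shape, and the brute-force cosine ranking are assumptions made for illustration.

```python
import math

# Hypothetical clearance tiers, lowest to highest.
CLEARANCE = {"public": 0, "restricted": 1, "secret": 2}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, user_tier, k=3):
    """Drop chunks above the caller's clearance, then rank the rest by similarity."""
    allowed = [c for c in chunks if CLEARANCE[c["tier"]] <= CLEARANCE[user_tier]]
    ranked = sorted(allowed, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:k]


chunks = [
    {"id": "a", "tier": "public", "vec": [1.0, 0.0]},
    {"id": "b", "tier": "secret", "vec": [1.0, 0.1]},
    {"id": "c", "tier": "restricted", "vec": [0.0, 1.0]},
]
public_hits = retrieve([1.0, 0.0], chunks, "public")
secret_hits = retrieve([1.0, 0.0], chunks, "secret", k=2)
```

Filtering first, rather than ranking everything and discarding afterwards, means a restricted chunk cannot influence the result list or leak through a top-k cutoff.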

FastAPI gateway · loopback-only · cold-start <8s
Foundation-model inference via llama.cpp (model-agnostic by design)
Classification-aware RAG with metadata pre-filter applied before kNN — full role isolation across clearance tiers
Layered input policy engine (regex / heuristics / optional offline classifier) with distinct reason codes
Output validator: secret-pattern, PII, grounding-similarity, and system-prompt echo detection
Tamper-evident audit log: append-only JSONL with SHA-256 hash chaining + standalone CLI verifier
Frozen red-team corpus (152 attacks · 8 categories) with versioned scorecard
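
The hash-chained audit log above can be sketched in a few lines: each JSONL entry stores the previous entry's hash, and a verifier recomputes the chain from the start. This is an illustrative sketch, not Vajra's actual format; the field names (`event`, `prev`, `hash`) and the all-zero genesis value are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed previous-hash value for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(lines: list, event: dict) -> None:
    """Append one JSONL line whose hash covers the previous line's hash."""
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    record = {"event": event, "prev": prev_hash, "hash": entry_hash(prev_hash, event)}
    lines.append(json.dumps(record, sort_keys=True))


def verify_chain(lines: list) -> bool:
    """Recompute every hash in order; an edited, reordered, or missing middle entry fails."""
    prev_hash = GENESIS
    for line in lines:
        record = json.loads(line)
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != entry_hash(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True


log = []
append_entry(log, {"user": "analyst", "action": "query", "decision": "allow"})
append_entry(log, {"user": "guest", "action": "query", "decision": "block"})
append_entry(log, {"user": "analyst", "action": "export", "decision": "allow"})
```

A chain like this detects edits and removals inside the log, but dropping entries from the tail still verifies, so a real deployment would also need to record the latest hash somewhere the log writer cannot modify.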

FastAPI · Python · llama.cpp · ChromaDB · Next.js · SQLite

05 · Sep 2022 – Jun 2024

Cashfree Payments

Payments · Backend

test coverage: 80%+
core svc PoCs: 2

Backend features for critical onboarding flows. Led PoC migrations for core services (onboardingsvc, commonmerchantsvc) including Kong Gateway, ArgoCD, Kafka rotation, and DB migration.

REST Assured automation suite with consistent 80%+ coverage of service contracts
Automated testing infrastructure on BrowserStack and AWS Grid
REST API discipline on payments-grade onboarding flows

Java · Spring Boot · Kafka · Kong · ArgoCD · REST Assured

// stack

What I reach for

AI / LLM

Claude / OpenAI APIs
llama.cpp
Mistral
RAG (ChromaDB)
Prompt engineering
Cursor / Copilot

Backend

Spring Boot
Java
FastAPI
Python
C++
TypeScript

Data

Elasticsearch
MongoDB Atlas
Snowflake
Redis
Caffeine
SQLite

Orchestration

Temporal
Kafka
AWS Lambda
API Gateway

Infra & DevOps

AWS (S3, Lambda, CloudWatch)
ArgoCD
Harness
Jenkins
Kong

Observability

Grafana
New Relic
Kibana
AOP monitoring
Distributed tracing

Security

OWASP ZAP
Threat modelling
SHA-256 audit chains
Role-isolated RAG

Frontend

React
Next.js
Tailwind

// background

Where I've been

Jun 2024 – Present
Observe.ai · Backend Developer 2
AskObserve, Hermes, Looker migration, on-call

Sep 2022 – Jun 2024
Cashfree Payments · Software Developer
Onboarding flows, service migrations, automation

May 2022 – Jul 2022
PwC · Technical Analyst
Microsoft Power Platform · OLA Support, Suraksha

Oct 2021 – Feb 2022
GDCM Consultancy · Freelance Software Engineer
End-to-end React frontend · UX research, SRS

2019 – 2023
Vellore Institute of Technology · B.Tech, Computer Science (Information Security)
CGPA 9.2 / 10 · Core Committee, VIT Leo Club

// contact

Let's build something

I'm most useful on hard backend problems — high-throughput orchestration, search, LLM systems with real production constraints, or anything that needs to ship and stay up. If that's the kind of thing you're working on, say hi.