$ whoami

Saurabh Singh

Backend & AI infrastructure engineer. I build production LLM systems and large-scale orchestration at Observe.ai. Previously Cashfree Payments.

Comfortable owning systems end-to-end — from Spring Boot + Temporal control planes processing 20M+ events/day, to FastAPI + llama.cpp stacks running fully offline.

events / day: 20M+
tenants: 1000+
infra saved / yr: $144K
code reduction: 85%
// now

What I'm building

day job · observe.ai

Production AI & orchestration at scale

Architecting two systems. AskObserve is an LLM-powered conversational analytics layer that routes queries between structured (Snowflake) and unstructured (LLM/NLP) sources. Hermes is a unified, Temporal-based pipeline orchestration layer replacing 15+ fragmented workflows.

side · vajra

Air-gapped sovereign LLM stack

A solo-built, fully offline foundation-model stack: FastAPI gateway, classification-aware RAG, layered policy engine, hash-chained audit log, and a SOC dashboard. Submitted to a defence-sector RFP. Details on request.

// selected work

Production systems I've shipped

01 · Observe.ai

AskObserve

Production LLM · Conversational Analytics

accuracy: +35%
latency: −20%
search: <100ms

Spring-based microservice that puts an LLM in the loop for 25+ business units, with a selector that routes each query to either Snowflake structured analytics or LLM/NLP sources.

  • Built a dual-source ML router that picks between structured analytics and LLM/NLP based on query shape
  • Engineered sub-100ms semantic history search on Elasticsearch + MongoDB Atlas with compound indexes and fuzzy matching
  • Designed a tag-based filter pipeline (hierarchical teams, date-range merging, dynamic enrichments) that translates UI selections into the ML input format
  • Added AOP-based monitoring and a cache-driven filter pipeline for low-latency reads
Spring Boot · Java · Elasticsearch · MongoDB Atlas · Snowflake · LLM APIs · Redis
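
The "date-range merging" step in the filter pipeline can be pictured as a standard interval merge. The sketch below is illustrative only and is not the AskObserve code (which is Java): the function name, the inclusive-range convention, and the choice to merge ranges that touch on adjacent days are all assumptions.

```python
from datetime import date, timedelta


def merge_date_ranges(ranges):
    """Merge overlapping or day-adjacent inclusive (start, end) date ranges.

    Hypothetical helper: assumes each range is inclusive on both ends and
    that ranges touching on consecutive days should collapse into one.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + timedelta(days=1):
            # Overlaps or touches the previous range: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

Collapsing ranges before they reach the query layer keeps the downstream filter small no matter how many overlapping selections the UI sends.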
02 · Observe.ai

Hermes

Workflow Orchestration · Platform

events/day: 20M+
tenants: 1000+
code reduction: 85%
storage saved: 99.6%

Mission-critical platform transformation consolidating 15+ fragmented workflow implementations into a single Temporal-based orchestration layer.

  • Designed a hybrid stage-group DAG with constrained inputs, enabling dynamic pipeline construction from compile-time constants
  • Reduced new-stage addition effort from ~1 week to <1 day
  • Eliminated 99.6% of Temporal storage overhead (from ~1 TB to ~4 GB per day) using a claim-check pattern
  • Removed the dual SQS/Temporal code paths, leaving a single source of truth for orchestration
Temporal · Java · Spring Boot · Kafka · MongoDB · AWS
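
The claim-check pattern mentioned above stores a large payload outside the workflow engine and passes only a small reference through it. A minimal sketch of the idea follows. It is not the Hermes implementation (which is Java on Temporal): `InMemoryBlobStore`, `Claim`, `check_in`, and `redeem` are hypothetical names, and the in-memory dict stands in for whatever external object store is actually used.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Small reference that travels through the workflow instead of the payload."""
    key: str
    size_bytes: int


class InMemoryBlobStore:
    """Stand-in for an external object store (assumption for this sketch)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> Claim:
        # Content-addressed key: identical payloads map to the same blob.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return Claim(key=key, size_bytes=len(data))

    def get(self, claim: Claim) -> bytes:
        return self._blobs[claim.key]


def check_in(store: InMemoryBlobStore, payload: dict) -> Claim:
    """Park the heavy payload in the store and return the claim ticket."""
    return store.put(json.dumps(payload).encode("utf-8"))


def redeem(store: InMemoryBlobStore, claim: Claim) -> dict:
    """Fetch the payload back at the point where a stage actually needs it."""
    return json.loads(store.get(claim).decode("utf-8"))
```

Only the `Claim` would be recorded in workflow history, so history size stays roughly constant regardless of payload size.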
03 · Observe.ai

Looker zero-downtime migration

Infrastructure · Cost

saved / yr: $144K
envs migrated: 3 prod
downtime: zero

Architected and executed a zero-downtime Looker cluster migration across 3 production environments with a custom account-based routing framework.

  • Custom account-routing framework for transparent cluster cutover
  • Robust Dry-Run / Execute / Rollback APIs with state transitions in MongoDB
  • Resolved auth-token cache collisions and large-dashboard payload limits via a custom Strip-Import-Patch approach
Looker · Java · MongoDB · AWS
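
One way to picture Dry-Run / Execute / Rollback with explicit state transitions is a small transition table that rejects out-of-order calls. This is a sketch under stated assumptions, not the migration code: the state names, action names, and allowed transitions are all hypothetical, and persistence (MongoDB in the real system) is left out.

```python
from enum import Enum


class State(Enum):
    # Hypothetical per-account migration states.
    PENDING = "pending"
    DRY_RUN_PASSED = "dry_run_passed"
    MIGRATED = "migrated"
    ROLLED_BACK = "rolled_back"


# Allowed (state, action) -> next state. Anything else is rejected.
TRANSITIONS = {
    (State.PENDING, "dry_run"): State.DRY_RUN_PASSED,
    (State.DRY_RUN_PASSED, "dry_run"): State.DRY_RUN_PASSED,
    (State.DRY_RUN_PASSED, "execute"): State.MIGRATED,
    (State.MIGRATED, "rollback"): State.ROLLED_BACK,
    (State.ROLLED_BACK, "dry_run"): State.DRY_RUN_PASSED,
}


def apply_action(state: State, action: str) -> State:
    """Return the next state, or raise if the action is not allowed here."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"{action!r} is not allowed from state {state.value!r}"
        ) from None
```

Forcing a passed dry run before execute means a cutover cannot be triggered against an account that was never validated.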
04 · Independent · Defence RFP

Vajra

Air-gapped Foundation Model Stack

block rate: 100%
false positive: 0%
tests: 217

A solo-built sovereign LLM stack that runs fully offline on a laptop. Designed for high-stakes environments with classification-aware retrieval and tamper-evident audit.
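
Classification-aware retrieval here means the clearance filter runs before similarity ranking, so restricted chunks never enter the candidate set. The sketch below illustrates that ordering in plain Python. It is not Vajra's code: `Chunk`, `retrieve`, the numeric clearance tiers, and the brute-force cosine ranking are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    clearance: int  # hypothetical numeric tier; higher means more restricted
    embedding: tuple


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(chunks, query_embedding, user_clearance, k=3):
    """Filter by clearance first, then rank only what the user may see."""
    visible = [c for c in chunks if c.clearance <= user_clearance]
    ranked = sorted(
        visible,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]
```

Filtering after ranking would fill the top-k with results that then get dropped, and could leak the existence of restricted material through result counts. Filtering first avoids both.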

  • FastAPI gateway · loopback-only · cold-start <8s
  • Foundation-model inference via llama.cpp (model-agnostic by design)
  • Classification-aware RAG with metadata pre-filter applied before kNN — full role isolation across clearance tiers
  • Layered input policy engine (regex / heuristics / optional offline classifier) with distinct reason codes
  • Output validator: secret-pattern, PII, grounding-similarity, and system-prompt echo detection
  • Tamper-evident audit log: append-only JSONL with SHA-256 hash chaining + standalone CLI verifier
  • Frozen red-team corpus (152 attacks · 8 categories) with versioned scorecard
FastAPI · Python · llama.cpp · ChromaDB · Next.js · SQLite
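
The tamper-evident audit log relies on SHA-256 hash chaining: each JSONL record commits to the hash of the record before it, so editing any earlier line breaks verification. A minimal sketch of that technique follows. The record layout (`event`, `prev_hash`, `hash`), the all-zero genesis value, and the function names are assumptions for illustration and may differ from Vajra's actual format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed sentinel used as prev_hash for the first record


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(lines: list, event: dict) -> None:
    """Append one JSONL record chained to the previous record's hash."""
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    record = {
        "event": event,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, event),
    }
    lines.append(json.dumps(record, sort_keys=True))


def verify_chain(lines: list) -> bool:
    """Recompute every hash in order; any edit, insert, or reorder fails."""
    prev_hash = GENESIS
    for line in lines:
        record = json.loads(line)
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

A standalone verifier needs only the log file and `verify_chain`, which is what makes the check usable outside the running system.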
05 · Sep 2022 – Jun 2024

Cashfree Payments

Payments · Backend

test coverage: 80%+
core svc PoCs: 2

Backend features for critical onboarding flows. Led PoC migrations for core services (onboardingsvc, commonmerchantsvc) including Kong Gateway, ArgoCD, Kafka rotation, and DB migration.

  • REST Assured automation suite holding 80%+ coverage of service contracts
  • Automated testing infrastructure on BrowserStack and AWS Grid
  • REST API discipline on payments-grade onboarding flows
Java · Spring Boot · Kafka · Kong · ArgoCD · REST Assured
// stack

What I reach for

AI / LLM

  • Claude / OpenAI APIs
  • llama.cpp
  • Mistral
  • RAG (ChromaDB)
  • Prompt engineering
  • Cursor / Copilot

Backend

  • Spring Boot
  • Java
  • FastAPI
  • Python
  • C++
  • TypeScript

Data

  • Elasticsearch
  • MongoDB Atlas
  • Snowflake
  • Redis
  • Caffeine
  • SQLite

Orchestration

  • Temporal
  • Kafka
  • AWS Lambda
  • API Gateway

Infra & DevOps

  • AWS (S3, Lambda, CloudWatch)
  • ArgoCD
  • Harness
  • Jenkins
  • Kong

Observability

  • Grafana
  • New Relic
  • Kibana
  • AOP monitoring
  • Distributed tracing

Security

  • OWASP ZAP
  • Threat modelling
  • SHA-256 audit chains
  • Role-isolated RAG

Frontend

  • React
  • Next.js
  • Tailwind
// background

Where I've been

  1. Jun 2024 — Present
    Observe.ai · Backend Developer 2
    AskObserve, Hermes, Looker migration, on-call
  2. Sep 2022 — Jun 2024
    Cashfree Payments · Software Developer
    Onboarding flows, service migrations, automation
  3. May 2022 — Jul 2022
    PwC · Technical Analyst
    Microsoft Power Platform · OLA Support, Suraksha
  4. Oct 2021 — Feb 2022
    GDCM Consultancy · Freelance Software Engineer
    End-to-end React frontend · UX research, SRS
  5. 2019 — 2023
    Vellore Institute of Technology · B.Tech, Computer Science (Information Security)
    CGPA 9.2 / 10 · Core Committee, VIT Leo Club
// contact

Let's build something

I'm most useful on hard backend problems — high-throughput orchestration, search, LLM systems with real production constraints, or anything that needs to ship and stay up. If that's the kind of thing you're working on, say hi.