Harshavardhanan Deekeswar

Building at the intersection of distributed systems and AI

Distributed Architect · Independent Researcher · Ex-Verizon · 15 years

Currently working on Ratatoskr, a system for composing agent tool chains from natural language intent, backed by a self-extending schema-compatibility graph.

I build distributed systems. Fifteen years of it, most recently as Principal Architect at Verizon, working on network graph intelligence and integrating LLMs into operational tooling. Before that, architecture and delivery roles at Cognizant, TCS, and HCL across telecom, manufacturing, and insurance.

My focus now is production AI and token economics. In production AI, I work on the parts that matter after the MVP: reliability, latency budgets, evaluation, and failure modes. In token economics, I work on where tokens actually go, what they cost, and which optimizations hold up under real load. Fifteen years of distributed systems habits carry over directly.

I published ONTO earlier this year (arXiv:2604.17512). It's a columnar serialization format built for LLM input. Existing formats were designed for services exchanging documents. That assumption breaks when the consumer is a language model reading thousands of records under a token budget. ONTO treats LLM input as its own problem and reduces token usage by 46 to 51 percent versus JSON, with no measurable loss in accuracy.
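The core idea behind columnar serialization is easy to see in miniature: row-oriented JSON repeats every field name in every record, while a columnar layout states the field names once and then emits compact value rows. The sketch below is illustrative only, using a hypothetical pipe-delimited rendering rather than the actual ONTO wire format, and character counts as a rough proxy for tokens:

```python
import json

# Three toy records; the repeated keys are what make row-oriented JSON expensive.
records = [
    {"id": 1, "name": "alpha", "status": "up"},
    {"id": 2, "name": "beta", "status": "down"},
    {"id": 3, "name": "gamma", "status": "up"},
]

# Row-oriented JSON: every record repeats every field name.
as_json = json.dumps(records)

# A columnar rendering: field names appear once, then one line per record.
# (Hypothetical format for illustration -- not the ONTO spec.)
header = "|".join(records[0].keys())
rows = "\n".join("|".join(str(v) for v in r.values()) for r in records)
as_columnar = header + "\n" + rows

# The columnar form is shorter, and the gap widens with more records,
# since the header cost is paid once rather than per record.
print(len(as_json), len(as_columnar))
```

With thousands of homogeneous records under a fixed token budget, amortizing the schema once is where savings in the reported range become plausible; the real format additionally has to handle tokenizer behavior and heterogeneous data, which this toy example does not.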

I also write a research series on token economics in production. One recent piece documents a previously undocumented behavior in OpenAI's API: prefix caches shared across model generations. At scale, this changes what you pay.

And I built an open-source AI Engineering Bootcamp. Ten weeks, production-focused, covering RAG, agents, LLMOps, observability, evaluation, and multi-agent patterns. I built it because it's the course I would have wanted when I started focusing on production AI.

Open to senior and staff roles. Remote or hybrid.

Selected Work

2026

ONTO

Columnar notation for LLM input. Published research on token efficiency.

2025

AI Engineering Bootcamp

10-week curriculum covering RAG, agents, LLMOps, and production patterns.

2025 — ongoing

Token Economics Research

Empirical series measuring where tokens go in production LLM systems.

Beyond Code

Work is serious. Life doesn't have to be.