Our Research
Active projects. Production-grounded, reproducible, openly published.
Traceprop: End-to-End Provenance-Guided Data Attribution for Auditable Machine Learning
End-to-end data provenance for auditable ML. Traceprop connects raw source files through preprocessing and training all the way to individual predictions - closing the gap that MLflow, DVC, and attribution libraries all leave open. Three layers: ProvenanceTensor lineage (sub-1% overhead at 1M elements), JL-projected GradientStore attribution (LDS 0.622 on tabular, 266x faster than TRAK), and provenance-guided unlearning that exceeds retrain-from-scratch. Production-ready for EU AI Act Article 26 and GDPR Article 17 compliance. Apache 2.0. Preprint on Zenodo.
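To make the attribution layer concrete, here is a minimal sketch of the general idea behind Johnson-Lindenstrauss (JL) projected gradient storage: per-example gradients are compressed with a random projection, and a prediction is attributed to training examples by comparing projected gradients. This is not Traceprop's API. The class name ProjectedGradientStore, its methods, and the plain dot-product scoring are assumptions made for illustration only.

```python
import math
import random


def make_projection(dim_in, dim_out, seed=0):
    """Dense Gaussian JL matrix with entries drawn from N(0, 1/dim_out)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]


def project(matrix, vec):
    """Multiply the projection matrix by a gradient vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ProjectedGradientStore:
    """Hypothetical store that keeps only low-dimensional projected gradients."""

    def __init__(self, dim_in, dim_out, seed=0):
        self.matrix = make_projection(dim_in, dim_out, seed)
        self.rows = {}

    def add(self, example_id, grad):
        # Store the projected gradient, not the full one.
        self.rows[example_id] = project(self.matrix, grad)

    def attribute(self, test_grad):
        # Assumption: score = dot product in projected space,
        # a simple gradient-similarity stand-in.
        q = project(self.matrix, test_grad)
        scores = {eid: dot(q, g) for eid, g in self.rows.items()}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The projection preserves inner products only approximately, so scores from a store like this are estimates. The trade-off is that storage and lookup cost scale with the projected dimension rather than with the number of model parameters.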
SynapseKit
A lightweight, production-focused LLM framework built from the ground up for speed and simplicity. v1.7.0 - 2 dependencies, 30x faster cold start than LangChain, async-native tool and retriever base classes. Designed for engineers who want batteries-included agent tooling without the middleware tax.
ChunkRank
A Python library for ranking and filtering document chunks by relevance before sending them to an LLM. Reduces token cost and improves answer quality in RAG pipelines by ensuring only the most informative chunks make it into the context window.
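The rank-then-filter step described above can be sketched as follows. This is not ChunkRank's API: the function names, the term-overlap score, and the word-count token estimate are all hypothetical stand-ins, chosen so the example runs with the standard library alone.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, chunk):
    """Fraction of distinct query terms that appear in the chunk (toy relevance)."""
    q_terms = set(tokenize(query))
    if not q_terms:
        return 0.0
    return len(q_terms & set(tokenize(chunk))) / len(q_terms)


def select_chunks(query, chunks, token_budget, min_score=0.0):
    """Keep the highest-scoring chunks that fit within the token budget."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    kept, used = [], 0
    for chunk in ranked:
        if score(query, chunk) <= min_score:
            break  # ranked is sorted, so every remaining chunk is also too weak
        cost = len(tokenize(chunk))
        if used + cost > token_budget:
            continue  # too large for the remaining budget; try smaller chunks
        kept.append(chunk)
        used += cost
    return kept


chunks = [
    "The cache is invalidated on every write.",
    "Bananas are yellow.",
    "Cache writes are batched and flushed on a timer for throughput.",
]
selected = select_chunks("how is the cache invalidated on write", chunks, token_budget=10)
```

With a budget of 10 tokens, only the first chunk is selected: the second shares no terms with the query, and the third is relevant but does not fit in the remaining budget. A production ranker would typically swap the overlap score for embeddings or a cross-encoder, and the word count for the target model's tokenizer.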
LLM Framework Showdown
A 30-notebook reproducible benchmark comparing SynapseKit, LangChain, and LlamaIndex across developer experience, RAG pipelines, agent capabilities, and production concerns. Every benchmark is runnable on Kaggle, and every result is reproducible. We measure cold start time, dependency count, memory footprint, API abstraction depth, streaming latency, and 12 more dimensions that matter when you're shipping to production, not just running a demo.
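As an illustration of one of these dimensions, cold start time can be approximated by timing an import in a fresh interpreter. The sketch below is not the benchmark's actual harness. The function names and the best-of-N policy are assumptions, and the standard-library module json stands in for a framework package.

```python
import subprocess
import sys
import time


def fresh_interpreter_seconds(code, repeats=3):
    """Best-of-N wall time to run `code` in a brand-new Python process."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code],
                       check=True, capture_output=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


def cold_import_seconds(module, repeats=3):
    """Import time for `module`, minus bare interpreter startup (floored at 0)."""
    baseline = fresh_interpreter_seconds("pass", repeats)
    with_import = fresh_interpreter_seconds(f"import {module}", repeats)
    return max(with_import - baseline, 0.0)


# `json` is a placeholder; substitute the package under test.
elapsed = cold_import_seconds("json", repeats=2)
```

Spawning a new process for each run avoids already-cached modules in sys.modules, which would make a second in-process import look nearly free. Taking the minimum over repeats reduces noise from the operating system, though results still depend on disk cache state and hardware.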
