
Projects

What we're building and shipping.

May 5, 2026 · Engineers of AI

Traceprop: End-to-End Provenance-Guided Data Attribution for Auditable Machine Learning

We built Traceprop because every ML pipeline we audited had the same gap: source files on one side, predictions on the other, and nothing connecting them. Traceprop closes it with three layers: ProvenanceTensor lineage (sub-1% overhead at 1M elements), Johnson-Lindenstrauss (JL) projected GradientStore attribution (LDS 0.622 on tabular data, 266x faster than TRAK), and provenance-guided unlearning that exceeds retrain-from-scratch. EU AI Act Article 26 enforcement starts August 2026. Traceprop is Apache 2.0 licensed and needs a two-line code change; everything else stays the same.

pip install traceprop · Apache 2.0 · Zenodo DOI 10.5281/zenodo.20036000

[Figure: audit pipeline. credit_scores.csv (row 4821) → ProvenanceTensor (normalize, row_filter) → GradientStore (k=4096, JL projection) → Audit Answer (row 4821, score 0.921)]

Benchmark results: lineage overhead 1.007x · LDS tabular (CPU) 0.622 · speed vs TRAK 266x · unlearn gap closed >100% · multi-source query 2.36 ms
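
To make the attribution layer more concrete, here is a minimal sketch of the general idea behind JL-projected gradient attribution: compress per-example gradients with a random Gaussian projection, then score each training example by its projected dot product with the test gradient. This is not Traceprop's API or its actual scoring method; the function names (`jl_matrix`, `project`, `attribution_scores`) and the small `k` are assumptions chosen for illustration, and plain gradient similarity stands in for whatever estimator GradientStore uses.

```python
import math
import random


def jl_matrix(k, d, seed=0):
    """Hypothetical helper: k x d Gaussian projection scaled by 1/sqrt(k).

    With this scaling, dot products between projected vectors approximate
    dot products between the original d-dimensional vectors.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, vec):
    """Multiply the projection matrix by a gradient vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attribution_scores(train_grads, test_grad, k=64, seed=0):
    """Score each training gradient against the test gradient in k dims.

    Assumption: simple gradient similarity, used here only to show where
    the projection fits. Higher score means more aligned with the test point.
    """
    proj = jl_matrix(k, len(test_grad), seed)
    test_p = project(proj, test_grad)
    return [dot(project(proj, g), test_p) for g in train_grads]


if __name__ == "__main__":
    d = 200
    test_grad = [1.0] * d
    train_grads = [
        [-1.0] * d,  # opposes the test gradient
        [1.0] * d,   # identical to the test gradient
        [0.0] * d,   # contributes nothing
    ]
    scores = attribution_scores(train_grads, test_grad)
    best = max(range(len(scores)), key=scores.__getitem__)
    print("most influential training row:", best)
```

Only the k-dimensional projections need to be stored per training example, which is what makes lookups cheap compared with keeping full gradients.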
Apr 15, 2026 · Engineers of AI

SynapseKit v1.7.0 - Lightweight LLM Framework

SynapseKit is a production-grade LLM framework we built because LangChain frustrated us and we wanted to measure whether a simpler approach could actually work better. SynapseKit v1.7.0 has 2 dependencies (vs LangChain's 67), a 30x faster cold start (12ms vs 360ms), and built-in cost guardrails that prevent a single agent run from blowing your API budget. It covers chains, agents, RAG pipelines, and tool use, all with zero magic and full debuggability. When something breaks at 3am, you can read the source in 20 minutes. It is MIT-licensed, fully documented, and battle-tested across 18 objective benchmarks.

2 dependencies · 30x faster cold start · Built-in cost guardrails

[Figure: SynapseKit (httpx, pydantic) vs LangChain (67 dependencies): 30× faster cold start, 2 vs 67 dependencies, MIT]
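
As an illustration of what a per-run cost guardrail can look like, here is a minimal sketch. It is not SynapseKit's API: `BudgetGuard`, `BudgetExceeded`, and the per-1k-token prices are hypothetical names and values. The idea is to track cumulative spend for one agent run and raise as soon as a call would push the run past its cap, so the agent loop stops instead of continuing to spend.

```python
class BudgetExceeded(RuntimeError):
    """Hypothetical error raised when a run would exceed its budget."""


class BudgetGuard:
    """Hypothetical per-run spend tracker (not SynapseKit's real class)."""

    def __init__(self, max_usd):
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def charge(self, prompt_tokens, completion_tokens,
               usd_per_1k_prompt, usd_per_1k_completion):
        """Add the cost of one LLM call, or raise if it breaks the cap."""
        cost = (prompt_tokens * usd_per_1k_prompt
                + completion_tokens * usd_per_1k_completion) / 1000.0
        if self.spent_usd + cost > self.max_usd:
            raise BudgetExceeded(
                f"run would spend ${self.spent_usd + cost:.4f}, "
                f"cap is ${self.max_usd:.4f}"
            )
        self.spent_usd += cost
        return cost


if __name__ == "__main__":
    guard = BudgetGuard(max_usd=0.01)
    guard.charge(1000, 0, usd_per_1k_prompt=0.005, usd_per_1k_completion=0.015)
    try:
        # A second, larger call would exceed the one-cent cap.
        guard.charge(2000, 0, usd_per_1k_prompt=0.005, usd_per_1k_completion=0.015)
    except BudgetExceeded as err:
        print("agent run halted:", err)
```

In an agent loop, `charge` would be called with the token counts reported for each model call, and the exception would end the run.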
Apr 2026 · Engineers of AI

LLM Framework Showdown: 30 Notebooks on Kaggle

A reproducible benchmark comparing SynapseKit, LangChain, and LlamaIndex across developer experience, RAG pipelines, agent capabilities, and production concerns. Every notebook includes methodology, raw results, statistical significance tests, and our interpretation. No hidden preprocessing, no curated datasets. If you disagree with a result, fork the notebook and prove us wrong; that's the point. We measure cold start, memory footprint, streaming latency at P99, and 15 more dimensions that matter in production.

30 notebooks · 3 frameworks · Fully reproducible

[Figure: 30 notebooks, reproducible on Kaggle]
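
For readers who want to reproduce a tail-latency number, the sketch below shows one common way to compute P99 from repeated timings. It is not taken from the notebooks: `time_calls` and `percentile` are hypothetical helpers, and the nearest-rank percentile definition is an assumption (other definitions interpolate and give slightly different values).

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls(fn, runs=200):
    """Call fn repeatedly and return per-call latencies in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


if __name__ == "__main__":
    # Stand-in workload; replace with the framework call being measured.
    samples = time_calls(lambda: sum(range(10_000)))
    print(f"P50 {percentile(samples, 50):.3f} ms, P99 {percentile(samples, 99):.3f} ms")
```

Reporting P99 alongside the median matters because a framework can look fast on average while still stalling on a small fraction of requests.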