<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="rss.xsl"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>AI Engineering Letters - EngineersOfAI</title>
        <link>https://engineersofai.com/blog</link>
        <description>Weekly deep insights on AI systems, architecture, and engineering thinking.</description>
        <lastBuildDate>Thu, 14 May 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>Copyright © 2026 EngineersOfAI</copyright>
        <item>
            <title><![CDATA[AI Letters #35 - Why We Built SynapseKit: The Framework We Deserve]]></title>
            <link>https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit</link>
            <guid>https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit</guid>
            <pubDate>Thu, 14 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[How we learned that 2 dependencies beat 50+, async-first beats sync-bolted-on, and transparency beats SaaS lock-in. The story of building an LLM framework from first principles.]]></description>
            <content:encoded><![CDATA[<p>It was 3 AM and production was on fire. An LLM pipeline had cold-started on Lambda taking 30 seconds just to import dependencies, while the $99/month observability tool told us nothing useful. We'd chosen a "safe" framework with 100K stars and enterprise support—but we were fighting it as much as building with it. That moment led us to rebuild from first principles. Meet SynapseKit: 2 dependencies, async-native, full cost transparency, and Apache 2.0 forever.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-35/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">SynapseKit Roadmap - v1.7.0 to v2.0.0 →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">From 12 contributors in month 1 to 40+ by month 3. 8 major features shipping June-September 2026.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-35/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">SynapseKit Design Philosophy →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">5 principles that compound: dependency minimalism, async-native, transparency, community, open source as moat.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-35/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Public Benchmarks &amp; Verdicts →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Cold start, token costs, latency, and feature coverage. All data published. Anyone can reproduce.</div>
</div>
</a>
</div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-problem-we-lived">The Problem We Lived<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#the-problem-we-lived" class="hash-link" aria-label="Direct link to The Problem We Lived" title="Direct link to The Problem We Lived" translate="no">​</a></h2>
<p>It was 3 AM. Production was on fire. An LLM pipeline had cold-started on Lambda, and the container was taking 30 seconds just to import dependencies. Meanwhile, the observability tool we paid $99/month for was telling us... nothing useful.</p>
<p>We'd chosen a popular framework because it was the "safe" choice. It had 100K stars, enterprise support, and a massive ecosystem. But in production, it felt like we were fighting the framework as much as building with it.</p>
<p>The async APIs were baked on top of synchronous code. The dependency tree was a forest (50+ transitive deps). Observability required another SaaS subscription. And debugging? Forget it—too much "magic" between you and the LLM call.</p>
<p>We're not unique. Thousands of teams have hit the same wall. And we thought: <strong>What if we rebuilt this from first principles?</strong></p>
<p>That question became SynapseKit.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-synapsekit-actually-is">What SynapseKit Actually Is<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#what-synapsekit-actually-is" class="hash-link" aria-label="Direct link to What SynapseKit Actually Is" title="Direct link to What SynapseKit Actually Is" translate="no">​</a></h2>
<p>SynapseKit is not trying to be a LangChain killer. It's trying to be different.</p>
<p>The difference starts here—not features, but <strong>principles</strong>:</p>








































<table><thead><tr><th>Problem</th><th>LangChain-Style</th><th>SynapseKit</th></tr></thead><tbody><tr><td>Dependencies</td><td>50+ (200 MB)</td><td>2 (numpy, rank-bm25)</td></tr><tr><td>Async Design</td><td>Bolted on</td><td>Native from day 1</td></tr><tr><td>Cost Visibility</td><td>$99+/month SaaS</td><td>Built-in, free</td></tr><tr><td>Deployment Tools</td><td>Deprecated</td><td>synapsekit serve</td></tr><tr><td>Observability</td><td>Black box</td><td>Instrumented, transparent</td></tr><tr><td>Token Tracking</td><td>Hidden</td><td>Per-call tracking</td></tr></tbody></table>
<p>We're building for <strong>production teams</strong> who are tired of choosing between:</p>
<ul>
<li class="">Power (but complexity)</li>
<li class="">Simplicity (but missing features)</li>
<li class="">Open source (but no support)</li>
<li class="">Commercial (but expensive and lock-in)</li>
</ul>
<p>SynapseKit says: You don't have to choose.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-you">What This Means for You<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#what-this-means-for-you" class="hash-link" aria-label="Direct link to What This Means for You" title="Direct link to What This Means for You" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-you-own-your-code">1. You Own Your Code<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#1-you-own-your-code" class="hash-link" aria-label="Direct link to 1. You Own Your Code" title="Direct link to 1. You Own Your Code" translate="no">​</a></h3>
<p>Every LLM call, every prompt, every decision—it's yours. There's no proprietary "chain" abstraction hiding what's happening.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> RAG</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">rag </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> RAG</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-4o"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">rag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Your documents"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># This actually does what you think it does.</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># No hidden orchestration. No vendor-specific magic.</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> rag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"What is this about?"</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>Compare to frameworks where <code>rag.query()</code> invokes 12 internal transformations you didn't ask for.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-you-keep-90-of-your-cold-start">2. You Keep 90% of Your Cold Start<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#2-you-keep-90-of-your-cold-start" class="hash-link" aria-label="Direct link to 2. You Keep 90% of Your Cold Start" title="Direct link to 2. You Keep 90% of Your Cold Start" translate="no">​</a></h3>
<p>Lambda cold starts matter. A 2 KB framework matters.</p>
<p>We measured: <code>import synapsekit</code> = 200 ms. <code>import langchain</code> = 2.8 seconds.</p>
<p>That's not hypothetical. That's real deployments. That's the difference between your API responding in 100 ms vs 3 seconds during scale events.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-you-see-your-costs">3. You See Your Costs<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#3-you-see-your-costs" class="hash-link" aria-label="Direct link to 3. You See Your Costs" title="Direct link to 3. You See Your Costs" translate="no">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> CostTracker</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> BudgetGuard</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">tracker </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> CostTracker</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">guard </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> BudgetGuard</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">daily</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">10.0</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> per_request</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">0.50</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> tracker</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">scope</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"my_pipeline"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> rag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Question?"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">tracker</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">summary</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Output:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># total_cost: $0.0234</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># tokens_in: 1,200</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># tokens_out: 450</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># model: gpt-4o</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># cost_per_1k: $2.50 / $15.00</span><br></div></code></pre></div></div>
<p>Every LLM framework should have this. No SaaS fees. No surprise bills. Just <strong>facts</strong>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-were-staying-open-source-forever">Why We're Staying Open Source (Forever)<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#why-were-staying-open-source-forever" class="hash-link" aria-label="Direct link to Why We're Staying Open Source (Forever)" title="Direct link to Why We're Staying Open Source (Forever)" translate="no">​</a></h2>
<p>This matters. So let's be clear about what open source means to us.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-temptation">The Temptation<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#the-temptation" class="hash-link" aria-label="Direct link to The Temptation" title="Direct link to The Temptation" translate="no">​</a></h3>
<p>VC-backed frameworks always face the moment: "When do we monetize?"</p>
<p>LangChain took it by building LangSmith ($99+/mo). That's a valid business model. But it creates incentive misalignment: the best features live behind a paywall.</p>
<p>We're choosing differently.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-bet">The Bet<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#the-bet" class="hash-link" aria-label="Direct link to The Bet" title="Direct link to The Bet" translate="no">​</a></h3>
<p><strong>SynapseKit core = Apache 2.0 forever.</strong></p>
<p>No tricky license changes. No "open core" where the good stuff is closed. No "we're keeping the best for enterprise."</p>
<p>The framework you use in production is the same framework available to students, hobbyists, and competitors.</p>
<p>Why? Because:</p>
<ol>
<li class=""><strong>Trust compounds.</strong> If you know the code can't suddenly become proprietary, you can bet your infrastructure on it.</li>
<li class=""><strong>Bugs matter less.</strong> Open source means crowdsourced debugging. 200 eyes beat 20.</li>
<li class=""><strong>Optimization flows both ways.</strong> When a user optimizes for their use case and contributes it back, everyone wins.</li>
<li class=""><strong>We make money differently.</strong> (More on that below.)</li>
</ol>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-we-monetize">What We Monetize<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#what-we-monetize" class="hash-link" aria-label="Direct link to What We Monetize" title="Direct link to What We Monetize" translate="no">​</a></h3>
<p>We monetize on top, not instead of:</p>
<ul>
<li class=""><strong>SynapseKit Core</strong> (framework) - Apache 2.0, always free</li>
<li class=""><strong>EvalCI Pro</strong> (evaluation SaaS) - Team dashboards, Slack alerts, private repos</li>
<li class=""><strong>synapsekit.cloud</strong> (managed hosting) - Deploy with one command</li>
<li class=""><strong>Compliance reports</strong> - EU AI Act and GDPR audits for enterprises</li>
</ul>
<p>The core framework is the funnel. Everything else is optional.</p>
<p><strong>This is the bet:</strong> Build the most trustworthy LLM framework. Let it be free. Earn money by solving operational problems the framework surfaces.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-the-community-taught-us">What the Community Taught Us<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#what-the-community-taught-us" class="hash-link" aria-label="Direct link to What the Community Taught Us" title="Direct link to What the Community Taught Us" translate="no">​</a></h2>
<p>We shipped SynapseKit in March 2026. By May, we had 12 contributors and 9,200 downloads in 30 days.</p>
<p>Here's what the community actually cares about (not what we thought they would):</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="simplicity-beats-ecosystem">Simplicity Beats Ecosystem<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#simplicity-beats-ecosystem" class="hash-link" aria-label="Direct link to Simplicity Beats Ecosystem" title="Direct link to Simplicity Beats Ecosystem" translate="no">​</a></h3>
<p>We expected people to love our 33 LLM providers. They do. But what they really love: changing one line (<code>model="anthropic/claude"</code> to <code>model="groq/mixtral"</code>) and the entire pipeline switches.</p>
<p><strong>Lesson:</strong> Unified APIs beat breadth.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="cost-visibility-beats-ease">Cost Visibility Beats Ease<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#cost-visibility-beats-ease" class="hash-link" aria-label="Direct link to Cost Visibility Beats Ease" title="Direct link to Cost Visibility Beats Ease" translate="no">​</a></h3>
<p>We built CostTracker assuming 5% of users would enable it. 40% did immediately.</p>
<p>Teams aren't afraid of complexity. They're afraid of <strong>surprise bills</strong>.</p>
<p><strong>Lesson:</strong> Make the invisible visible.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="async-native-beats-backwards-compatibility">Async-Native Beats Backwards Compatibility<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#async-native-beats-backwards-compatibility" class="hash-link" aria-label="Direct link to Async-Native Beats Backwards Compatibility" title="Direct link to Async-Native Beats Backwards Compatibility" translate="no">​</a></h3>
<p>We chose async-first, sync-wrappers. We got pushback: "But some teams only use sync!"</p>
<p>Six months later, those teams were refactoring to async. The performance difference was too obvious to ignore.</p>
<p><strong>Lesson:</strong> The future is async. Bet on it.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="testing-beats-documentation">Testing Beats Documentation<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#testing-beats-documentation" class="hash-link" aria-label="Direct link to Testing Beats Documentation" title="Direct link to Testing Beats Documentation" translate="no">​</a></h3>
<p>We shipped with thorough tests (2,161 by v1.5.6) but sparse docs. People still contributed. They read the tests as documentation.</p>
<p><strong>Lesson:</strong> Tests are the spec.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="transparency-beats-polish">Transparency Beats Polish<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#transparency-beats-polish" class="hash-link" aria-label="Direct link to Transparency Beats Polish" title="Direct link to Transparency Beats Polish" translate="no">​</a></h3>
<p>When we had a bug in async evaluation (v1.5.1), we posted a detailed postmortem explaining why we missed it. The community response: "At least you're honest."</p>
<p><strong>Lesson:</strong> Admit mistakes. Explain root causes. Ship fixes.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-were-benchmarking-everything-no-illusions">How We're Benchmarking Everything (No Illusions)<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#how-were-benchmarking-everything-no-illusions" class="hash-link" aria-label="Direct link to How We're Benchmarking Everything (No Illusions)" title="Direct link to How We're Benchmarking Everything (No Illusions)" translate="no">​</a></h2>
<p>We could say "SynapseKit is faster" and assume no one would check. But we're betting on people who will check.</p>
<p>So we're running public benchmarks:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="cold-start-benchmarks">Cold Start Benchmarks<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#cold-start-benchmarks" class="hash-link" aria-label="Direct link to Cold Start Benchmarks" title="Direct link to Cold Start Benchmarks" translate="no">​</a></h3>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework          Import Time    Container Size</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit         200 ms         ~5 MB</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Framework B        2,800 ms       ~200 MB</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Framework C        1,200 ms       ~150 MB</span><br></div></code></pre></div></div>
<p>Published monthly. Real data. Anyone can reproduce it.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="token-cost-benchmarks">Token Cost Benchmarks<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#token-cost-benchmarks" class="hash-link" aria-label="Direct link to Token Cost Benchmarks" title="Direct link to Token Cost Benchmarks" translate="no">​</a></h3>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Task: "Summarize 10 documents, return JSON"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Model    Via SynapseKit    Via Others    Difference</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">GPT-4o   $0.0234           $0.0234       (same!)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Claude   $0.0198           $0.0198       (same!)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Groq     $0.00001          $0.00001      (same!)</span><br></div></code></pre></div></div>
<p>No hidden markup. No feature taxes. We're a passthrough.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="latency-benchmarks">Latency Benchmarks<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#latency-benchmarks" class="hash-link" aria-label="Direct link to Latency Benchmarks" title="Direct link to Latency Benchmarks" translate="no">​</a></h3>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Operation                  P50    P95    P99</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">RAG query (retrieval)      45ms   120ms  300ms</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Agent tool call            80ms   250ms  800ms</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Graph workflow (10 nodes)  200ms  600ms  1.5s</span><br></div></code></pre></div></div>
<p>Published, reproducible, hardware-specified.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="feature-coverage-benchmarks">Feature Coverage Benchmarks<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#feature-coverage-benchmarks" class="hash-link" aria-label="Direct link to Feature Coverage Benchmarks" title="Direct link to Feature Coverage Benchmarks" translate="no">​</a></h3>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Feature              SynapseKit    Others</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LLM Providers        33            38+</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Document Loaders     53            200+</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Vector Stores        11            15+</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Built-in Tools       47+           50+</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Async Support        ✅ Native     ⚠ Bolted-on</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Token Tracking       ✅ Free       ❌ Paid</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Deployment           ✅ Built-in   ❌ Deprecated</span><br></div></code></pre></div></div>
<p>No hidden asterisks. No "features you can't use."</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-we-benchmark">Why We Benchmark<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#why-we-benchmark" class="hash-link" aria-label="Direct link to Why We Benchmark" title="Direct link to Why We Benchmark" translate="no">​</a></h3>
<p>We're not trying to win on every metric. We're trying to be honest about the tradeoffs.</p>
<p>Yes, LangChain has 200+ loaders. We have 53. But those 53 are maintained and tested. A loader that breaks silently is worse than no loader.</p>
<p>Yes, we're missing some providers. But when you use a provider on SynapseKit, you know it works because we test it against actual APIs.</p>
<p><strong>The bet:</strong> Teams would rather have 90% great than 100% mediocre.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-well-be-the-best-tool">Why We'll Be the Best Tool<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#why-well-be-the-best-tool" class="hash-link" aria-label="Direct link to Why We'll Be the Best Tool" title="Direct link to Why We'll Be the Best Tool" translate="no">​</a></h2>
<p>Not because we have the most features. Not because we have the most stars.</p>
<p>Because we're built on principles that compound:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="dependency-minimalism--embeddability">Dependency Minimalism = Embeddability<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#dependency-minimalism--embeddability" class="hash-link" aria-label="Direct link to Dependency Minimalism = Embeddability" title="Direct link to Dependency Minimalism = Embeddability" translate="no">​</a></h3>
<p>Every dependency you add is a future security hole, a version conflict, a cold start penalty.</p>
<p>We said: What if we just didn't? What if we built for embedding first, plugins second?</p>
<p>This means SynapseKit works in:</p>
<ul>
<li class="">Lambda (fast cold starts)</li>
<li class="">Kubernetes (light containers)</li>
<li class="">Mobile (small binaries)</li>
<li class="">Edge (no Python stdlib bloat)</li>
</ul>
<p>Others can't do this without a rewrite.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="async-native--production-ready">Async-Native = Production-Ready<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#async-native--production-ready" class="hash-link" aria-label="Direct link to Async-Native = Production-Ready" title="Direct link to Async-Native = Production-Ready" translate="no">​</a></h3>
<p>Async isn't about being faster in theory. It's about handling real-world concurrency: 100 concurrent requests, 50 LLM API calls in flight, 10K tokens streaming.</p>
<p>Sync-first frameworks hit a wall at scale. Async-first frameworks scale to infinity.</p>
<p>We bet on infinity.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="transparency--trust">Transparency = Trust<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#transparency--trust" class="hash-link" aria-label="Direct link to Transparency = Trust" title="Direct link to Transparency = Trust" translate="no">​</a></h3>
<p>No proprietary chains. No hidden costs. No surprise bills. Every LLM call is logged, tracked, and visible.</p>
<p>Trust is the hardest thing to build. And the easiest to lose. We're not willing to risk it for short-term gains.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="community--compounding-returns">Community = Compounding Returns<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#community--compounding-returns" class="hash-link" aria-label="Direct link to Community = Compounding Returns" title="Direct link to Community = Compounding Returns" translate="no">​</a></h3>
<p>12 contributors in month 1. We're not paying them. They're contributing because:</p>
<ul>
<li class="">They believe in the mission</li>
<li class="">The codebase is legible</li>
<li class="">Contributions are credited</li>
<li class="">The community is kind</li>
</ul>
<p>This compounds. Month 2: 20 contributors. Month 3: 40 contributors. By year 2: a community-driven framework that no VC team could build.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="open-source--moat">Open Source = Moat<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#open-source--moat" class="hash-link" aria-label="Direct link to Open Source = Moat" title="Direct link to Open Source = Moat" translate="no">​</a></h3>
<p>Counterintuitive: staying open source is our biggest competitive advantage.</p>
<p>Why? Because:</p>
<ul>
<li class="">Teams bet their infra on open source. Not on a company.</li>
<li class="">Open source survives company acquisition/failure. Closed source doesn't.</li>
<li class="">Switching costs from open source are high (migration time, vendor trust). But lock-in is low (you always own the code).</li>
</ul>
<p>This is a different kind of moat. It's built on trust, not contracts.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-8-features-were-shipping-v180---v200">The 8 Features We're Shipping (v1.8.0 - v2.0.0)<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#the-8-features-were-shipping-v180---v200" class="hash-link" aria-label="Direct link to The 8 Features We're Shipping (v1.8.0 - v2.0.0)" title="Direct link to The 8 Features We're Shipping (v1.8.0 - v2.0.0)" translate="no">​</a></h2>
<p>We just mapped the roadmap. Here's what's coming:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="v180-production-grade-june-15">v1.8.0: Production Grade (June 15)<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#v180-production-grade-june-15" class="hash-link" aria-label="Direct link to v1.8.0: Production Grade (June 15)" title="Direct link to v1.8.0: Production Grade (June 15)" translate="no">​</a></h3>
<ul>
<li class="">🔍 <strong>Observability Dashboard:</strong> OpenTelemetry and Prometheus (no SaaS needed)</li>
<li class="">✅ <strong>Structured Output:</strong> Validation and auto-retry (no more JSON failures)</li>
<li class="">💾 <strong>Smart Context:</strong> Hierarchical allocation and prompt caching (80% cost reduction)</li>
<li class="">📊 <strong>Retrieval Metrics:</strong> Measure if RAG actually helps</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="v190-advanced-retrieval-july-20">v1.9.0: Advanced Retrieval (July 20)<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#v190-advanced-retrieval-july-20" class="hash-link" aria-label="Direct link to v1.9.0: Advanced Retrieval (July 20)" title="Direct link to v1.9.0: Advanced Retrieval (July 20)" translate="no">​</a></h3>
<ul>
<li class="">🌐 <strong>Knowledge Graphs:</strong> Multi-hop reasoning and entity relationships</li>
<li class="">🧠 <strong>Reasoning Routing:</strong> Smart routing to o1/o3/Claude thinking models</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="v200-distributed-september-1">v2.0.0: Distributed (September 1)<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#v200-distributed-september-1" class="hash-link" aria-label="Direct link to v2.0.0: Distributed (September 1)" title="Direct link to v2.0.0: Distributed (September 1)" translate="no">​</a></h3>
<ul>
<li class="">🤖 <strong>Agent Federation:</strong> Multi-agent coordination at scale</li>
<li class="">📈 <strong>Feedback Loops:</strong> Production to training data to auto-improvement</li>
</ul>
<p><strong>We're shipping 8 major features in 4 months.</strong> The framework as built by the community.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-success-looks-like">What Success Looks Like<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#what-success-looks-like" class="hash-link" aria-label="Direct link to What Success Looks Like" title="Direct link to What Success Looks Like" translate="no">​</a></h2>
<p>Not valuation. Not GitHub stars (though those help).</p>
<p>Success is:</p>
<ul>
<li class="">A team deploys an LLM app on SynapseKit and it just works.</li>
<li class="">A student learns async Python by reading SynapseKit's codebase.</li>
<li class="">An open-source contributor ships a feature that 10,000 people use.</li>
<li class="">A startup scales to 1M requests/day without hitting a wall.</li>
<li class="">An enterprise can audit the code and say "Yeah, we trust this."</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="join-us">Join Us<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#join-us" class="hash-link" aria-label="Direct link to Join Us" title="Direct link to Join Us" translate="no">​</a></h2>
<p>We're hiring open-source contributors. Not employees. Contributors.</p>
<p><strong>You pick an issue.</strong> You ship it. You're credited as co-author. End of transaction.</p>
<p><strong>Start here:</strong> <a href="https://github.com/SynapseKit/SynapseKit/issues/695-702" target="_blank" rel="noopener noreferrer" class="">https://github.com/SynapseKit/SynapseKit/issues/695-702</a></p>
<p>8 issues. Your choice. 1-3 weeks. Shipped to production.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-final-truth">The Final Truth<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#the-final-truth" class="hash-link" aria-label="Direct link to The Final Truth" title="Direct link to The Final Truth" translate="no">​</a></h2>
<p>We're not building SynapseKit because we think we're smarter than the frameworks that came before. We're building it because we learned from them.</p>
<p>We learned that:</p>
<ul>
<li class="">Teams care about cold starts more than ecosystem breadth</li>
<li class="">Cost transparency beats feature parity</li>
<li class="">Async-native isn't optional in 2026</li>
<li class="">Open source isn't a business model; it's a moat</li>
<li class="">Benchmarks matter more than claims</li>
</ul>
<p>We're building for the 10,000 teams shipping LLM apps in production right now. Not the 100 teams with billion-dollar budgets. Not the students building chatbots. Not the conferences talking about theory.</p>
<p>For people who actually care that their imports don't take 3 seconds. Who track every dollar. Who want to read the code they ship. Who believe open source beats closed ecosystems.</p>
<p>If that's you, we'll see you in the PRs.</p>
<p><strong>Let's build the framework we deserve.</strong></p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="resources">Resources<a href="https://engineersofai.com/blog/ai-letters-35-why-we-built-synapsekit#resources" class="hash-link" aria-label="Direct link to Resources" title="Direct link to Resources" translate="no">​</a></h2>
<ul>
<li class=""><strong>GitHub:</strong> <a href="https://github.com/SynapseKit/SynapseKit" target="_blank" rel="noopener noreferrer" class="">https://github.com/SynapseKit/SynapseKit</a></li>
<li class=""><strong>Docs:</strong> <a href="https://synapsekit.github.io/synapsekit-docs/" target="_blank" rel="noopener noreferrer" class="">https://synapsekit.github.io/synapsekit-docs/</a></li>
<li class=""><strong>Discord:</strong> <a href="https://discord.com/invite/PSuAXHRywJ" target="_blank" rel="noopener noreferrer" class="">https://discord.com/invite/PSuAXHRywJ</a></li>
</ul>
<hr>
<p><em>Written May 14, 2026. SynapseKit v1.7.0 is live. v1.8.0 ships June 15.</em></p>
<p><em>This post will be outdated in 2 months. That's the point.</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>ai</category>
            <category>engineering</category>
            <category>Open Source</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #34 - The 30-Day LLM Framework Verdict: 25 Benchmarks, One Clear Answer]]></title>
            <link>https://engineersofai.com/blog/ai-letters-34-framework-showdown-finale</link>
            <guid>https://engineersofai.com/blog/ai-letters-34-framework-showdown-finale</guid>
            <pubDate>Fri, 08 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[After 30 notebooks testing SynapseKit, LangChain, and LlamaIndex across dev experience, RAG, agents, and production readiness, here is what the data actually shows.]]></description>
            <content:encoded><![CDATA[<p>30 notebooks. 25 benchmarks. SynapseKit 14 wins (8.39/10), LangChain 7 wins (6.83/10), LlamaIndex 4 wins (6.40/10). Here is where each framework wins, where it loses, and which one you should actually use.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-34/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">30-Day Benchmark Timeline -&gt;</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">All 25 benchmark scores across 4 weeks. Filter by week, hover for details, see which framework won each notebook.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-34/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Interactive Code Comparison</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Simplest RAG: Line by Line -&gt;</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Side-by-side code for all three frameworks across three complexity levels. See the LoC cost of adding retrieval and memory.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-34/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Final Verdict Dashboard</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Win Distribution &amp; Category Breakdown -&gt;</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Click each framework to see exactly what it wins and where it struggles. Radar chart, category averages, and when-to-use guide.</div>
</div>
</a>
</div>
<blockquote>
<p>After 30 notebooks and 25 benchmarks, the ranking is clear. But the more interesting result is <em>where</em> each framework loses.</p>
</blockquote>
<p>I started this series with a simple question: if you were starting a new AI project today, which framework should you actually use?</p>
<p>Not "which has the most GitHub stars." Not "which has the best documentation." Not "which do the most job listings mention." Which one performs better on the tasks you will actually need to do - from cold start to production guardrails?</p>
<p>Thirty notebooks later, the data has an answer. The answer is not what I expected when I designed the benchmarks.</p>
<p>The series ran four weeks. Week 1 tested developer experience: how fast can you install it, how many lines to get a working RAG, how much memory does it use, how well does it handle provider switching, how readable are its error messages? Week 2 moved into RAG pipelines: PDF ingestion, chunking strategies, BM25, hybrid search, streaming, conversation memory. Week 3 covered agents: ReAct loops, function calling, built-in tools, multi-agent orchestration, observability, error handling. Week 4 tested production readiness: async throughput, graph workflows, LLM evaluation, cost tracking, guardrails, MCP support. The finale (#29) asked a deliberately blunt question: what is the absolute minimum code to build a working RAG pipeline in each framework?</p>
<p><em>Disclosure: I am the author of SynapseKit. All benchmarks are reproducible - every notebook is public on Kaggle. Fork and run yourself.</em></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-the-data-actually-shows">What the Data Actually Shows<a href="https://engineersofai.com/blog/ai-letters-34-framework-showdown-finale#what-the-data-actually-shows" class="hash-link" aria-label="Direct link to What the Data Actually Shows" title="Direct link to What the Data Actually Shows" translate="no">​</a></h2>
<p>The final scores across 25 benchmarks:</p>

































<table><thead><tr><th>Framework</th><th>Avg Score</th><th>Total</th><th>Wins</th><th>Win %</th></tr></thead><tbody><tr><td>SynapseKit</td><td>8.39/10</td><td>209.7</td><td>14</td><td>56%</td></tr><tr><td>LangChain</td><td>6.83/10</td><td>170.8</td><td>7</td><td>28%</td></tr><tr><td>LlamaIndex</td><td>6.40/10</td><td>160.0</td><td>4</td><td>16%</td></tr></tbody></table>
<p>That top-line number is not the interesting part. The interesting part is the pattern of <em>where</em> each framework wins and loses.</p>
<p>SynapseKit wins 4 of 6 in Week 1, 2 of 6 in Week 2, 3 of 6 in Week 3, and 4 of 6 in Week 4. The only weeks where it does not dominate are the ones involving complex agent orchestration (Week 3) and deep RAG quality (Week 2). Those are exactly the areas where LangChain and LlamaIndex have years of accumulated investment.</p>
<p>LangChain wins 7 of 25. All 7 are in areas requiring sophisticated composition: streaming, conversation memory, function calling, multi-agent, observability, graph workflows. LangGraph - LangChain's DAG abstraction - is genuinely the most mature stateful workflow tool available in any LLM framework today. That is not close.</p>
<p>LlamaIndex wins 4 of 25. Three of those wins are RAG-specific: PDF ingestion, chunking strategies, and LLM evaluation. LlamaIndex's faithfulness and relevancy evaluators are deeper than anything the other two frameworks ship out of the box.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-evidence">The Evidence<a href="https://engineersofai.com/blog/ai-letters-34-framework-showdown-finale#the-evidence" class="hash-link" aria-label="Direct link to The Evidence" title="Direct link to The Evidence" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="week-4-production-readiness">Week 4: Production Readiness<a href="https://engineersofai.com/blog/ai-letters-34-framework-showdown-finale#week-4-production-readiness" class="hash-link" aria-label="Direct link to Week 4: Production Readiness" title="Direct link to Week 4: Production Readiness" translate="no">​</a></h3>
<p>The Week 4 results were the most lopsided of the series. SynapseKit took 4 of 6.</p>
<p><strong>Async throughput (#22):</strong> SynapseKit delivered 3.2x LangChain's throughput at 20 concurrent requests. The framework is async-native at the core. LangChain and LlamaIndex treat async as an add-on.</p>
<p><strong>Guardrails (#26):</strong> SynapseKit is the only framework with built-in <code>PIIDetector</code>, <code>PIIRedactor</code>, and <code>ContentFilter</code> primitives. LangChain scored 4.5/10. LlamaIndex scored 3.5/10. SynapseKit scored 9.8/10. That gap reflects a fundamental design choice about what belongs in the framework.</p>
<p><strong>MCP Support (#27):</strong> SynapseKit supports MCP in-process, with a sync API, hitting 8/8 protocol features. LangChain hit 3/8 and requires a subprocess. As MCP becomes the standard interface for AI-to-tool connectivity, this gap will matter more.</p>
<p><strong>Cost Tracking (#25):</strong> <code>CostTracker</code> in SynapseKit is 2 lines. Per-call tracking, session rollups, and budget limits. In LangChain you write this yourself using callbacks. In LlamaIndex you hook into their event system.</p>
<p>LangChain took graph workflows (#23). LangGraph scored 9.0/10. The StateGraph abstraction is genuinely better than anything the other frameworks offer for conditional branching, human-in-the-loop workflows, and persistent agent state.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-simplest-rag-test-29">The Simplest RAG Test (#29)<a href="https://engineersofai.com/blog/ai-letters-34-framework-showdown-finale#the-simplest-rag-test-29" class="hash-link" aria-label="Direct link to The Simplest RAG Test (#29)" title="Direct link to The Simplest RAG Test (#29)" translate="no">​</a></h3>
<p>This was the most revealing single benchmark of the series.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit - Level 1 (minimum viable RAG):</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  from synapsekit import RAG</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  answer = RAG.quick(SAMPLE_DOC, QUERY)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  Total: 2 lines</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex - Level 1:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  from llama_index.core import VectorStoreIndex, Document</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  index  = VectorStoreIndex.from_documents([Document(text=SAMPLE_DOC)])</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  engine = index.as_query_engine()</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  answer = engine.query(QUERY)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  Total: 4 lines (+ global Settings.llm required)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain - Level 1:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  from langchain_core.prompts import ChatPromptTemplate</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  from langchain_core.runnables import RunnablePassthrough</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  from langchain_core.output_parsers import StrOutputParser</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  prompt = ChatPromptTemplate.from_template(...)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  chain = (</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      {"context": RunnablePassthrough(), "question": ...}</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      | prompt | llm | StrOutputParser()</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  )</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  answer = chain.invoke({"context": SAMPLE_DOC, "question": QUERY})</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  Total: 13 lines</span><br></div></code></pre></div></div>
<p>The complexity tax per added feature:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Feature added      SK    LC    LI</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">----------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Base (L1)           2    13     4</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">+ Retrieval (L2)   +3    +8    +3</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">+ Memory (L3)      +2    +6    +4</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">----------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Full pipeline (L3)  7    27    11</span><br></div></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-full-30-day-pattern">The Full 30-Day Pattern<a href="https://engineersofai.com/blog/ai-letters-34-framework-showdown-finale#the-full-30-day-pattern" class="hash-link" aria-label="Direct link to The Full 30-Day Pattern" title="Direct link to The Full 30-Day Pattern" translate="no">​</a></h3>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Category          SK avg   LC avg   LI avg   SK wins</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">----------------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Week 1 Dev Exp      8.37     5.83     6.00      4/6</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Week 2 RAG          8.08     7.00     7.33      2/6</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Week 3 Agents       8.17     8.08     6.08      3/6</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Week 4 Production   8.75     6.63     5.92      4/6</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Week 5 Simplest     9.50     5.50     8.00      1/1</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">----------------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Overall             8.39     6.83     6.40     14/25</span><br></div></code></pre></div></div>
<p>Week 3 (Agents) is where the race was closest: SynapseKit 8.17, LangChain 8.08. LangChain's multi-agent orchestration and observability tooling are genuinely strong.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-34-framework-showdown-finale#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<p><strong>1. The "fewest lines" metric is not vanity - it predicts maintenance cost.</strong></p>
<p>Every line of boilerplate is a line someone has to read, debug, and update when the API changes. A 13-line Level 1 RAG means every junior engineer on your team has to understand RunnablePassthrough before they can make their first contribution. A 2-line RAG means they start from the problem, not the plumbing.</p>
<p><strong>2. LangGraph is a genuine competitive advantage - but only if you need it.</strong></p>
<p>If your application requires stateful DAG workflows - conditional branching, human-in-the-loop approval steps, persistent agent memory across sessions - LangGraph is the best tool available. If your application does not need that, you are paying the complexity tax of LangChain without getting the payoff.</p>
<p><strong>3. LlamaIndex's RAG evaluators are not replicable elsewhere in 10 minutes.</strong></p>
<p>The faithfulness and context recall evaluators LlamaIndex ships have years of iteration behind them. If you are running a serious RAG system where retrieval quality is a measurable business metric, LlamaIndex's evaluation infrastructure is worth the integration cost.</p>
<p><strong>4. Production primitives (guardrails, cost tracking, MCP) belong in the framework, not in your code.</strong></p>
<p>Every PII detection regex you write in your app layer is a liability. Every manual token counter is a bug waiting to happen when you switch models. SynapseKit's Week 4 wins reflect a deliberate choice to move production concerns into framework primitives.</p>
<p><strong>5. The ecosystem gap is real and will not close quickly.</strong></p>
<p>LangChain has more blog posts, more Stack Overflow questions, more third-party integrations, and more engineers who already know it than SynapseKit. When something breaks in production at 2am, you want that ecosystem.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-part-most-people-will-get-wrong">The Part Most People Will Get Wrong<a href="https://engineersofai.com/blog/ai-letters-34-framework-showdown-finale#the-part-most-people-will-get-wrong" class="hash-link" aria-label="Direct link to The Part Most People Will Get Wrong" title="Direct link to The Part Most People Will Get Wrong" translate="no">​</a></h2>
<p>The top-line verdict - SynapseKit wins 14/25 - will be read as "use SynapseKit for everything." That is not what the data says.</p>
<p>LangChain's 7 wins cluster in exactly the scenarios that matter most for large teams and complex systems: orchestration, observability, multi-agent coordination. If you are building a 10-person team product with complex agent workflows, LangChain's ecosystem and LangGraph's maturity probably outweigh the LoC advantage.</p>
<p>LlamaIndex's 4 wins are in a tightly defined domain where it is the best tool available. If your core product is document Q&amp;A or knowledge base search, LlamaIndex's chunking strategies and evaluation framework represent real engineering investment you should not ignore.</p>
<p>The honest one-line per use case:</p>
<ul>
<li class="">New project, small team, wants to ship fast: SynapseKit</li>
<li class="">Complex agents, large engineering team, needs ecosystem: LangChain</li>
<li class="">RAG quality as a core metric, document intelligence: LlamaIndex</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-34-framework-showdown-finale#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">Run the simplest-rag benchmark (#29) with your own document and query. The LoC difference is more visceral when it is your code, not mine.</li>
<li class="">If you are currently using LangChain for a simple RAG pipeline (no agents, no complex branching), count how many lines of boilerplate exist solely for framework composition. That number is your migration ROI estimate.</li>
<li class="">If you have a production LLM system with no PII detection layer, add one this week. It does not have to be SynapseKit - but it has to be something. The cost of a PII leak is not worth the shortcut.</li>
</ol>
<p>The full series index with all 30 notebooks is on Kaggle. Every score is reproducible. Fork any notebook and run it yourself - if you get different numbers, I want to know.</p>
<p>This is not "my framework won so I declare victory." This is 30 notebooks of data saying: different frameworks are better at different things, and the choice should be driven by what your application actually needs.</p>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>LLM Frameworks</category>
            <category>RAG</category>
            <category>ai-engineering</category>
            <category>Deep Dive</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #33 - We Built Traceprop: Finally, an ML Audit Trail That Answers the Regulator's Question]]></title>
            <link>https://engineersofai.com/blog/ai-letters-33-traceprop</link>
            <guid>https://engineersofai.com/blog/ai-letters-33-traceprop</guid>
            <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[We built Traceprop because every ML pipeline we audited had the same gap - source files on one side, predictions on the other, nothing connecting them. Today we're open-sourcing the fix. pip install traceprop.]]></description>
            <content:encoded><![CDATA[<div style="display:flex;gap:12px;flex-wrap:wrap;margin-bottom:24px">
<a href="https://github.com/AmitoVrito/Traceprop" target="_blank" rel="noopener noreferrer" style="display:inline-flex;align-items:center;gap:8px;background:#0f172a;color:#fff;padding:10px 18px;border-radius:8px;text-decoration:none;font-weight:700;font-size:0.9rem" class="">GitHub - AmitoVrito/Traceprop</a>
<a href="https://doi.org/10.5281/zenodo.20036000" target="_blank" rel="noopener noreferrer" style="display:inline-flex;align-items:center;gap:8px;background:#6366f1;color:#fff;padding:10px 18px;border-radius:8px;text-decoration:none;font-weight:700;font-size:0.9rem" class="">Preprint - DOI 10.5281/zenodo.20036000</a>
<a href="https://pypi.org/project/traceprop/" target="_blank" rel="noopener noreferrer" style="display:inline-flex;align-items:center;gap:8px;background:#10b981;color:#fff;padding:10px 18px;border-radius:8px;text-decoration:none;font-weight:700;font-size:0.9rem" class="">pip install traceprop</a>
</div>
<p>We spent months auditing ML pipelines across regulated industries. Every single one had the same gap: source files on one side, model predictions on the other, and nothing connecting them. MLflow knew which file. DVC knew which commit. Influence libraries knew which tensor. Nobody knew which source row drove which decision. We built Traceprop to fix this. Today it's open source.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-33/ai-letters-33-timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Timeline</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">The Provenance Gap: History and Enforcement Deadlines -&gt;</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">From MLflow (2018) to EU AI Act enforcement (2026-2027): the full timeline of tools, gaps, and regulatory deadlines that made building Traceprop unavoidable.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-33/ai-letters-33-paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Interactive Architecture</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">How Traceprop Works: Three-Layer Architecture -&gt;</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Click through each layer - lineage, attribution, unlearning - to see exactly how ProvenanceTensor, GradientStore, and the compliance exporter connect source files to predictions to audit certificates.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-33/ai-letters-33-evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Benchmark Dashboard</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">The Numbers: Overhead, Attribution Quality, Unlearning -&gt;</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Sub-1% overhead at 1M elements. LDS 0.622 on tabular data. 266x faster than TRAK. Unlearning that exceeds retrain-from-scratch. Every benchmark in one view.</div>
</div>
</a>
</div>
<blockquote>
<p>We built Traceprop because every ML pipeline we audited had the same fatal gap: source files on one side, model predictions on the other, and nothing in between that could answer a regulator's question. Today that changes.</p>
</blockquote>
<p>A credit-scoring model declines an application. The regulator invokes Article 26 of EU Regulation 2024/1689. They want three things: which training records drove that decision, whether those records were processed correctly, and whether the institution can reduce their influence without full retraining.</p>
<p>We watched a well-resourced ML team try to answer this question. They had MLflow for experiment tracking, DVC for dataset versioning, and a state-of-the-art influence function library. It took them eleven days and they still couldn't produce a defensible answer. MLflow knew which file was used - not which rows. DVC knew which commit - not which preprocessing steps were applied to specific rows. The influence library operated on already-processed tensors with no knowledge of which source row produced each one.</p>
<p>That team is not an outlier. That gap is the default state of every ML pipeline that hasn't explicitly engineered a lineage layer. We built Traceprop to close it permanently.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-we-built-this">Why We Built This<a href="https://engineersofai.com/blog/ai-letters-33-traceprop#why-we-built-this" class="hash-link" aria-label="Direct link to Why We Built This" title="Direct link to Why We Built This" translate="no">​</a></h2>
<p>We didn't set out to build a compliance tool. We set out to answer a question that kept coming up in every production ML system we worked on: if a model makes a bad decision, can you trace it back to the training data that caused it?</p>
<p>The answer was always no. Not because engineers were being lazy. Because the tools were architecturally incapable of answering it. Each tool stopped at its own boundary and handed off to nothing.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">THE PROVENANCE GAP - what each tool actually covers</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">MLflow/DVC      [experiment metadata] [dataset file]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                                                   ^ stops here</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Preprocessing   [data loaded] [transform 1] [transform 2]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                                                          ^ stops here</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Attribution     [tensor indices] [influence scores]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">^ starts here</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Source rows     [credit_scores.csv row 4821]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">^ nobody connects this to anything above</span><br></div></code></pre></div></div>
<p>We needed a system that treats the entire pipeline - from raw file row to final prediction - as a single traceable object. That system didn't exist. So we built it.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-traceprop-is">What Traceprop Is<a href="https://engineersofai.com/blog/ai-letters-33-traceprop#what-traceprop-is" class="hash-link" aria-label="Direct link to What Traceprop Is" title="Direct link to What Traceprop Is" translate="no">​</a></h2>
<p>Traceprop is a Python library that introduces one new concept: the <code>ProvenanceTensor</code>. Every array in your pipeline becomes a <code>ProvenanceTensor</code> when loaded through Traceprop. It wraps the underlying NumPy or PyTorch array and records a directed acyclic graph of every operation applied to it, with source-file row annotations at the leaves.</p>
<p>You change two lines of code. Everything else stays the same.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> traceprop </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> tp</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Change: tp.load_csv instead of pd.read_csv</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">X </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> tp</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">load_csv</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"credit_scores.csv"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># now a ProvenanceTensor</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Everything else is identical to your existing code</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">X_norm </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">X </span><span class="token operator" style="color:#393A34">-</span><span class="token plain"> X</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">mean</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">axis</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">/</span><span class="token plain"> X</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">std</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">axis</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">X_filt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> X_norm</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">X_norm</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">:</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># New capability: query provenance instantly</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">X_filt</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">sources</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">   </span><span class="token comment" style="color:#999988;font-style:italic"># {credit_scores.csv: [rows 0-4998]}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">X_filt</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">ops</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">       </span><span class="token comment" style="color:#999988;font-style:italic"># [normalize, row_filter]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">X_filt</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">ancestors</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic"># full DAG at depth 1000 in 0.42ms</span><br></div></code></pre></div></div>
<p>The overhead is sub-1%. At 10^6 array elements: 1.007x on macOS, 0.979x on Linux. The sub-unity overhead on Linux is real - Traceprop's batch-aware memory layout improves cache locality enough that lineage tracking is actually faster than raw NumPy at that scale.</p>
<p>We are not asking you to rewrite your pipeline. We are asking you to change two lines and get an audit trail.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-attribution-layer-connecting-predictions-to-source-rows">The Attribution Layer: Connecting Predictions to Source Rows<a href="https://engineersofai.com/blog/ai-letters-33-traceprop#the-attribution-layer-connecting-predictions-to-source-rows" class="hash-link" aria-label="Direct link to The Attribution Layer: Connecting Predictions to Source Rows" title="Direct link to The Attribution Layer: Connecting Predictions to Source Rows" translate="no">​</a></h2>
<p>Lineage tells you which source rows a tensor came from. Attribution tells you which training samples most influenced a specific prediction. Connecting the two - so you can go from a declined application all the way back to the exact CSV row that drove it - is the core engineering contribution of Traceprop.</p>
<p>The naive approach fails immediately. Storing one full-parameter gradient per training sample costs 24 TB for a ResNet-9 at 1M samples. We use sparse Johnson-Lindenstrauss projection to compress gradients to k dimensions. At k=4096 the GradientStore costs 15.3 GB for 1M samples. Fits a standard cloud instance. The JL distortion bound (epsilon ~= 0.18 at k=4096) is proven, not empirical - the top-k attribution set is correct with high probability.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> traceprop</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">attribution </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> TrainingContext</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> GradientStore</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> compute_influence_scores</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">store </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> GradientStore</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">4096</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> path</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"./grad_store/"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Wrap your training loop - that's all</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> TrainingContext</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> store</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> ctx</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> epoch </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">range</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">num_epochs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> batch_idx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">X_batch</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> y_batch</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">enumerate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">loader</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            loss </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> criterion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">X_batch</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> y_batch</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            ctx</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">backward</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">loss</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> batch_idx</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">batch_idx</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># one change</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            optimizer</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">step</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Now answer the audit question</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">scores </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> compute_influence_scores</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> store</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> declined_application</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> top_k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">20</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> sample_idx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> score </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> scores</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">5</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    provenance </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> store</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get_provenance</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">sample_idx</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">provenance</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">trace_to_file</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># -&gt; credit_scores.csv, row 4821, influence score: 0.921</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># -&gt; credit_scores.csv, row 2103, influence score: 0.887</span><br></div></code></pre></div></div>
<p>The benchmark numbers are honest about where Traceprop wins and where it doesn't.</p>
<p>For tabular models - which dominate regulated industries - Traceprop is the right tool with no caveats. LDS 0.622 at 0.22 seconds on CPU. No GPU required. Full source-file traceability. This is the setup that matters for credit scoring, insurance underwriting, and HR decisions.</p>
<p>For deep vision with BatchNorm, TRAK (Park et al., 2023) achieves better attribution quality (LDS 0.0290 in 691 seconds on GPU). Traceprop-LL achieves LDS 0.0168 in 2.6 seconds on CPU - 266x faster, lower quality. The degradation comes from BatchNorm encoding batch statistics into last-layer features, corrupting the per-sample gradient signal. For image models, use Traceprop for lineage and unlearning, TRAK for attribution quality when you have GPU budget.</p>
<p>We are telling you exactly where we beat existing tools and where we don't. If a library doesn't do this, treat its benchmark numbers as marketing.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-unlearning-layer-gdpr-erasure-that-actually-works">The Unlearning Layer: GDPR Erasure That Actually Works<a href="https://engineersofai.com/blog/ai-letters-33-traceprop#the-unlearning-layer-gdpr-erasure-that-actually-works" class="hash-link" aria-label="Direct link to The Unlearning Layer: GDPR Erasure That Actually Works" title="Direct link to The Unlearning Layer: GDPR Erasure That Actually Works" translate="no">​</a></h2>
<p>GDPR Article 17 gives individuals the right to have their personal data erased from trained models. No existing tool connected "which CSV rows belong to this data subject" to "which training tensor indices to unlearn" automatically. You had to do it by hand, with no consistency guarantees. We automated the entire chain.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> traceprop</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">unlearn </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> approximate_unlearn</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> export_compliance</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># GDPR erasure request - source rows map automatically to tensor indices</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">forget_set </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> store</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">samples_from_source</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"credit_scores.csv"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> rows</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">4821</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">7203</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">9100</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Gradient correction targets exactly the highest-influence samples</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">theta_prime </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> approximate_unlearn</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> forget_set</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> eta</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">0.01</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> steps</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">10</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Export Article 26 compliance certificate</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">report </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> export_compliance</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    model_before</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">model</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> model_after</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">theta_prime</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    forget_set</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">forget_set</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> store</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">store</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    regulation</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"EU_AI_ACT_ART26"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">report</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">save</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"unlearning_certificate.json"</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>The results against the standard benchmark (binary classification, n=1000, forget set of 50 highest-influence samples):</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">METHOD                    FORGET-SET LOSS   TEST ACC   GAP CLOSED</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Original (no unlearning)  0.379             0.920      0%</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Gold (retrain-scratch)    0.401             0.918      100%</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Traceprop unlearning      0.425             0.915      &gt;100%</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Random unlearning         0.382             0.915      14%</span><br></div></code></pre></div></div>
<p>Traceprop exceeds the retrain-from-scratch gold standard. Random unlearning closes 14% of the gap. That 7x difference is entirely because we know which samples are highest-influence and target them specifically. Without attribution, you are unlearning the wrong samples.</p>
<p>The gradient correction is first-order approximate - we document this clearly. There is no formal differential privacy guarantee. What there is: a verifiable, measurable effect on model behavior, traceable to specific source rows, exported in a format regulators can inspect.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-multi-source-case">The Multi-Source Case<a href="https://engineersofai.com/blog/ai-letters-33-traceprop#the-multi-source-case" class="hash-link" aria-label="Direct link to The Multi-Source Case" title="Direct link to The Multi-Source Case" translate="no">​</a></h2>
<p>Real pipelines are not single-CSV pipelines. We tested Traceprop on a 3-table credit risk pipeline: application data, credit bureau data, previous application history. 180,000 source rows total. 20,000 applicants.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">SOURCE TABLE              ROWS     ATTRIBUTION WEIGHT</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">application.csv           20,000   0.424</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">bureau.csv                80,000   0.426</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">previous_application.csv  80,000   0.434</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">ETL overhead:   2.93x (paid once at ingestion)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Query latency:  2.36ms (full attribution + source resolution across all 3 tables)</span><br></div></code></pre></div></div>
<p>2.36 milliseconds to answer "which rows in which table drove this decision, through which preprocessing steps." The ETL overhead is paid once at ingestion. Query time has no pipeline complexity penalty.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-enforcement-dates">The Enforcement Dates<a href="https://engineersofai.com/blog/ai-letters-33-traceprop#the-enforcement-dates" class="hash-link" aria-label="Direct link to The Enforcement Dates" title="Direct link to The Enforcement Dates" translate="no">​</a></h2>
<p>EU AI Act Article 26 logging obligations apply from <strong>August 2026</strong> for new high-risk AI systems. The backstop enforcement date for all deployed high-risk systems is <strong>2 December 2027</strong>. GDPR Article 17 erasure obligations are already in force.</p>
<p>High-risk AI systems under the Act include: credit scoring, employment decisions, educational assessment, critical infrastructure management, biometric identification. If you are building any of these, the compliance question is not whether you need this infrastructure. It is how much of the gap you have already closed.</p>
<p>Most teams we've talked to have closed zero percent of it. They are planning to "deal with compliance later." Later is August 2026. That is under four months away.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-were-open-sourcing-it">Why We're Open-Sourcing It<a href="https://engineersofai.com/blog/ai-letters-33-traceprop#why-were-open-sourcing-it" class="hash-link" aria-label="Direct link to Why We're Open-Sourcing It" title="Direct link to Why We're Open-Sourcing It" translate="no">​</a></h2>
<p>We built this for our own work. Then we realized the gap was universal - every ML team in a regulated domain was hitting the same wall. Keeping a proprietary solution while the industry ships non-compliant models would be the wrong call.</p>
<p>Traceprop is Apache 2.0. The preprint is on Zenodo (DOI: 10.5281/zenodo.20036000). The implementation is designed for incremental adoption - you can use only the lineage layer, only attribution, or the full stack. Start with one line change and expand from there.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">pip install traceprop</span><br></div></code></pre></div></div>
<p>That's the starting point. The preprint has full architectural documentation, benchmark methodology, and implementation notes for production deployment.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-to-do-right-now">What to Do Right Now<a href="https://engineersofai.com/blog/ai-letters-33-traceprop#what-to-do-right-now" class="hash-link" aria-label="Direct link to What to Do Right Now" title="Direct link to What to Do Right Now" translate="no">​</a></h2>
<p><strong>1. Install Traceprop and run the lineage layer on your next pipeline.</strong>
Two lines of code change. Sub-1% overhead. You get a full audit trail from source file rows through every preprocessing operation. This is the minimum viable compliance step and costs you almost nothing.</p>
<p><strong>2. If you're in a regulated industry, benchmark attribution on your tabular models today.</strong>
LDS 0.622 at 0.22 seconds on CPU. No GPU. No infrastructure changes. If your pipeline is tabular (credit, insurance, HR), Traceprop-LL is the right attribution tool right now, not a future option.</p>
<p><strong>3. Map your GDPR erasure workflow to the unlearning layer.</strong>
The automatic source-row-to-tensor-index mapping is the piece that takes a manual 11-day process and makes it a 10-second operation. That alone justifies the integration.</p>
<p><strong>4. Read the enforcement deadlines again.</strong>
August 2026 for new high-risk systems. Four months. The architectural decisions you make this quarter will determine whether your system can answer a regulatory audit question when the clock runs out.</p>
<p><strong>5. Share this with the compliance and legal team.</strong>
The compliance certificate export (<code>export_compliance(..., regulation="EU_AI_ACT_ART26")</code>) produces a JSON document auditors can inspect directly. This is documentation your legal team needs to see before your next system deployment.</p>
<p>The 2 December 2027 backstop deadline looks distant. August 2026 does not. We built Traceprop so teams don't have to spend eleven days manually stitching together three tool outputs and still come up empty. Install it. Use it. The gap is closed.</p>
<hr>
<p><strong><code>pip install traceprop</code></strong></p>
<p>Preprint: <a href="https://doi.org/10.5281/zenodo.20036000" target="_blank" rel="noopener noreferrer" class="">DOI 10.5281/zenodo.20036000</a> - Apache 2.0 - pip install traceprop</p>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>ml-engineering</category>
            <category>research</category>
            <category>compliance</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #32 - Your RAG Has No Immune System]]></title>
            <link>https://engineersofai.com/blog/ai-letters-32-llm-evaluation</link>
            <guid>https://engineersofai.com/blog/ai-letters-32-llm-evaluation</guid>
            <pubDate>Thu, 30 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[LangChain 1.x deleted its evaluation module. Most teams never noticed. Notebook #24 of the LLM Showdown tests which frameworks have built-in RAG evaluation primitives - and which leave you flying blind.]]></description>
            <content:encoded><![CDATA[<p>LangChain 1.x removed its evaluation module. Most teams never noticed. Notebook #24 of the LLM Showdown tests which frameworks give you faithfulness, relevancy, and regression tracking out of the box - and which ones leave you to build it from scratch.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-32/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">History of LLM Evaluation -&gt;</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">From BLEU scores to LLM-as-judge: how the field evolved from word-overlap heuristics to model-graded faithfulness evaluation.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-32/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Interactive Comparison</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Framework Feature Matrix -&gt;</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Click each feature to see exactly how SynapseKit, LangChain, and LlamaIndex implement (or don't implement) faithfulness, relevancy, batch eval, and regression tracking.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-32/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Evidence Dashboard</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Benchmark Results: Scores and LoC -&gt;</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Lines of code, feature coverage scores, and heuristic eval scores across faithful, unfaithful, and off-topic responses.</div>
</div>
</a>
</div>
<blockquote>
<p>Your RAG system has retrieval, chunking, reranking, and a carefully tuned prompt. It almost certainly has no way to tell you when it starts lying.</p>
</blockquote>
<p>You shipped a RAG system three months ago. It has a vector store, a reranker, a well-tuned system prompt, and response streaming so it feels fast. You monitor latency. You log errors. You track token costs. Your on-call dashboard is clean.</p>
<p>What you do not have is any way to know if the answers are faithful to the retrieved context. You have no signal when responses start contradicting your documents. You have no baseline to compare against when you upgrade your embedding model next week. The system is generating answers and you are reading dashboards that tell you nothing about whether those answers are correct.</p>
<p>This is not a niche problem. It is the default state. Every RAG system deployed without evaluation infrastructure is operating on the assumption that it is working. Most of them are wrong about that assumption at least some of the time. Notebook #24 of the LLM Showdown tests which frameworks give you evaluation primitives out of the box - and which ones leave you to build it yourself.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-langchain-1x-quietly-removed">What LangChain 1.x Quietly Removed<a href="https://engineersofai.com/blog/ai-letters-32-llm-evaluation#what-langchain-1x-quietly-removed" class="hash-link" aria-label="Direct link to What LangChain 1.x Quietly Removed" title="Direct link to What LangChain 1.x Quietly Removed" translate="no">​</a></h2>
<p>Until late 2023, LangChain shipped a dedicated evaluation module. You could call <code>load_evaluator("faithfulness")</code> and get a working LLM-as-judge chain in two lines. It was not perfect, but it existed.</p>
<p>LangChain 1.x removed it. The <code>langchain.evaluation</code> module is gone. The documentation now points teams toward RAGAS, DeepEval, or building their own evaluation chains with LCEL. This is a reasonable architectural choice - LangChain decided to be an orchestration framework, not an evaluation framework. But most teams using LangChain for RAG either do not know this happened or have not gotten around to replacing it.</p>
<p>The result: teams that were relying on LangChain's built-in evaluators are now either running no evaluation at all, or they have added an external dependency (RAGAS, DeepEval) that requires its own setup, its own API key, and its own maintenance burden.</p>
<p>Notebook #24 tests this directly. We give all three frameworks the same task: evaluate three query-context-response triples for faithfulness and relevancy. Here is what happens.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-three-frameworks-the-same-task">The Three Frameworks, The Same Task<a href="https://engineersofai.com/blog/ai-letters-32-llm-evaluation#the-three-frameworks-the-same-task" class="hash-link" aria-label="Direct link to The Three Frameworks, The Same Task" title="Direct link to The Three Frameworks, The Same Task" translate="no">​</a></h2>
<p><strong>The test setup</strong>: three response scenarios with known ground truth.</p>
<ul>
<li class="">Faithful: response accurately reflects retrieved context</li>
<li class="">Unfaithful: response contradicts context with false claims</li>
<li class="">Off-topic: response ignores context entirely, answers a different question</li>
</ul>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">QUERY:    "How does RAG reduce hallucination?"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">CONTEXT:  "RAG grounds responses in retrieved evidence, reducing</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">           hallucination by anchoring generation to retrieved facts."</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">RESPONSE: "RAG reduces hallucination by conditioning generation on</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">           retrieved evidence rather than parametric knowledge alone."</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">FAITHFULNESS:  0.52  (52% of non-trivial response words in context)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">RELEVANCY:     0.33  (33% of query words appear in response)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SCORE:         0.43</span><br></div></code></pre></div></div>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">QUERY:    "How does RAG reduce hallucination?"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">CONTEXT:  [same as above]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">RESPONSE: "RAG increases hallucination by 40% according to recent</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">           studies. Quantum retrieval mechanisms destabilize answers."</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">FAITHFULNESS:  0.19  (response contradicts context)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">RELEVANCY:     0.33</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SCORE:         0.26</span><br></div></code></pre></div></div>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">QUERY:    "How does RAG reduce hallucination?"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">CONTEXT:  [same as above]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">RESPONSE: "Django and FastAPI are both excellent Python web frameworks</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">           for building REST APIs."</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">FAITHFULNESS:  0.00  (zero overlap with context)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">RELEVANCY:     0.00  (zero overlap with query)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SCORE:         0.00</span><br></div></code></pre></div></div>
<p>A working evaluator should clearly separate these three. The faithful response scores highest. The unfaithful response scores lower. The off-topic response scores zero. Any evaluation framework that cannot make these distinctions is not functional.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-feature-gap-is-not-close">The Feature Gap Is Not Close<a href="https://engineersofai.com/blog/ai-letters-32-llm-evaluation#the-feature-gap-is-not-close" class="hash-link" aria-label="Direct link to The Feature Gap Is Not Close" title="Direct link to The Feature Gap Is Not Close" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">FEATURE                  SYNAPSEKIT   LANGCHAIN   LLAMAINDEX</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">----------------------   ----------   ---------   ----------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Faithfulness evaluator   Yes          No          Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Relevancy evaluator      Yes          No          Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Groundedness/correct.    Yes          No          Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Batch eval runner        Yes          No          Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Custom metrics           Yes          Yes         Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Async evaluation         Yes          Yes         Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Regression tracking      Yes          No          No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">----------------------   ----------   ---------   ----------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">FEATURE SCORE (of 7)     7/7          2/7         6/7</span><br></div></code></pre></div></div>
<p>LangChain scores 2 out of 7. Both items it supports (custom metrics and async evaluation) are things you build yourself with LCEL chains. There are no native evaluation primitives. There is no concept of a faithfulness score, a relevancy score, or a batch evaluation runner. You get a general-purpose chain-building toolkit and the evaluation problem is entirely your problem.</p>
<p>LlamaIndex scores 6 out of 7. It ships <code>FaithfulnessEvaluator</code>, <code>RelevancyEvaluator</code>, <code>CorrectnessEvaluator</code>, and a <code>BatchEvalRunner</code> with configurable worker pools. The one missing feature is regression tracking - no mechanism to compare eval snapshots across time.</p>
<p>SynapseKit scores 7 out of 7. The <code>EvaluationPipeline</code> abstraction handles faithfulness, relevancy, and correctness in a single call. <code>EvalSnapshot</code> captures timestamped eval state. <code>EvalRegression</code> computes drift between snapshots. Both regression primitives are unique to SynapseKit in this comparison.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="lines-of-code-tell-the-same-story">Lines of Code Tell the Same Story<a href="https://engineersofai.com/blog/ai-letters-32-llm-evaluation#lines-of-code-tell-the-same-story" class="hash-link" aria-label="Direct link to Lines of Code Tell the Same Story" title="Direct link to Lines of Code Tell the Same Story" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">TASK: evaluate faithfulness + relevancy on one response</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SYNAPSEKIT (16 lines total):</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  imports: 5</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  code:    11</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LLAMAINDEX (19 lines total):</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  imports: 6</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  code:    13</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LANGCHAIN (21 lines total):</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  imports: 2</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  code:    19</span><br></div></code></pre></div></div>
<p>LangChain requires fewer imports because it is importing a general-purpose chain builder, not evaluation-specific classes. The code itself is longer because you are constructing the evaluation logic manually - writing the prompt template, specifying the output parser, wiring the chain together.</p>
<p>SynapseKit's <code>EvaluationPipeline</code> is the highest-level abstraction. You pass it evaluator instances and a dataset. It handles batching, async execution, and result aggregation. The 16-line count includes error handling and result display.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-regression-tracking-is-the-feature-most-teams-need">Why Regression Tracking Is the Feature Most Teams Need<a href="https://engineersofai.com/blog/ai-letters-32-llm-evaluation#why-regression-tracking-is-the-feature-most-teams-need" class="hash-link" aria-label="Direct link to Why Regression Tracking Is the Feature Most Teams Need" title="Direct link to Why Regression Tracking Is the Feature Most Teams Need" translate="no">​</a></h2>
<p>Faithfulness and relevancy scores matter. But the question most teams actually need to answer is not "what is our score today" - it is "did our score change when we deployed the new embedding model?"</p>
<p>Without regression tracking, you run evals before a deployment, write down the numbers, run evals after deployment, write down the numbers again, and compare them manually. This works approximately once. After the third deployment cycle it falls apart because nobody updated the baseline, the test set has changed, and the numbers live in a Notion doc that nobody can find.</p>
<p><code>EvalSnapshot</code> captures the full eval state: scores, test cases, model version, timestamp. <code>EvalRegression</code> takes two snapshots and computes the delta. You store snapshots. You run regressions as part of your deployment pipeline. You fail the deployment if faithfulness drops more than 5 points. This is the engineering discipline that makes evaluation durable rather than a one-time exercise.</p>
<p>Neither LangChain nor LlamaIndex ship this. Teams using those frameworks either build it themselves (rare) or skip it (common).</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-32-llm-evaluation#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>If you are using LangChain for RAG and you have not added RAGAS or DeepEval, you have no evaluation infrastructure.</strong> The old <code>langchain.evaluation</code> module is gone. This is not a gap that will be filled by a future LangChain release - it was a deliberate architectural decision.</p>
</li>
<li class="">
<p><strong>LlamaIndex is the practical choice for teams that want built-in evaluators without changing their existing LlamaIndex setup.</strong> The evaluator objects are well-designed, BatchEvalRunner handles concurrency, and the API is stable. The only gap is regression tracking.</p>
</li>
<li class="">
<p><strong>Regression tracking is what separates teams that evaluate from teams that evaluate systematically.</strong> Point-in-time scores are better than nothing. Tracked-over-time scores are what you can actually build a deployment gate on.</p>
</li>
<li class="">
<p><strong>Heuristic evaluation (no API key required) still separates faithful from unfaithful responses clearly.</strong> The faithful response scored 0.43, the unfaithful scored 0.26, the off-topic scored 0.00. You do not need GPT-4-as-judge to know when a response has zero word overlap with the retrieved context.</p>
</li>
<li class="">
<p><strong>The evaluation problem is not going away as models improve.</strong> Better models hallucinate less on average but with higher confidence. Without evaluation infrastructure, you have no way to catch the cases where a better model is confidently wrong.</p>
</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-thing-most-teams-get-wrong">The Thing Most Teams Get Wrong<a href="https://engineersofai.com/blog/ai-letters-32-llm-evaluation#the-thing-most-teams-get-wrong" class="hash-link" aria-label="Direct link to The Thing Most Teams Get Wrong" title="Direct link to The Thing Most Teams Get Wrong" translate="no">​</a></h2>
<p>Teams treat evaluation as a pre-launch checklist item. Run evals, check the box, ship. This is worse than useful - it creates false confidence.</p>
<p>Evaluation is useful only when it is continuous. The embedding model you are using today will be deprecated in 12 months. The documents in your vector store will change. The distribution of queries will shift. Each of these changes can degrade faithfulness scores without triggering any of your existing monitors.</p>
<p>A RAG system without continuous evaluation is a system that will degrade silently. You will find out when a user screenshots a bad response and posts it somewhere. The evaluation infrastructure is not the interesting engineering problem, which is why most teams skip it. That is exactly why the teams that do it have a durable advantage.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-32-llm-evaluation#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Run a faithfulness check on 20 recent production responses.</strong> Use LlamaIndex's <code>FaithfulnessEvaluator</code> or SynapseKit's <code>EvaluationPipeline</code>. See what the scores look like. The result will surprise you.</p>
</li>
<li class="">
<p><strong>Define your regression threshold before you need it.</strong> Decide now: what faithfulness drop is unacceptable? 5 points? 10? Writing this down before you have a regression is the only way to make the decision rationally rather than defensively.</p>
</li>
<li class="">
<p><strong>Instrument your RAG pipeline to log query-context-response triples to a database.</strong> You do not need to evaluate all of them. You need a sample. Once the triples are logged, you can run evals on any of them at any time. Without the log, every eval requires manual test case construction.</p>
</li>
</ol>
<p>The notebook is public. All code runs without an API key - the heuristic evaluators use word overlap, not a language model. Fork it, run it against your own responses, and see where you actually stand.</p>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM Frameworks</category>
            <category>evaluation</category>
            <category>Benchmarks</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #31 - Graph Workflows: When Chains Break and DAGs Take Over]]></title>
            <link>https://engineersofai.com/blog/ai-letters-31-graph-workflows</link>
            <guid>https://engineersofai.com/blog/ai-letters-31-graph-workflows</guid>
            <pubDate>Thu, 23 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[SynapseKit and LangGraph ship near-identical StateGraph primitives with 7/7 graph features. LlamaIndex has zero. Notebook #23 of the LLM Showdown reveals which frameworks let you build conditional, looping, parallel workflows - and which force you to write infrastructure.]]></description>
            <content:encoded><![CDATA[<p>A linear chain handles most tasks. Research, generate, done. But production workflows branch. If the query is complex, run a deeper research step. If it is simple, take the fast path. If quality is insufficient, loop back. This requires a graph, not a chain. Notebook #23 of the LLM Showdown tests which frameworks ship graph primitives - and which force you to build infrastructure from scratch.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-31/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">From Chains to Graphs: The Evolution of LLM Orchestration -&gt;</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">How LLM orchestration evolved from simple prompt chains through LangChain's LCEL to full DAG runtimes with StateGraph. Click each milestone to see what unlocked at each stage.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-31/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Interactive Explorer</div>
<div style="font-size:1rem;font-weight:700;color:#0ea5e9;margin-bottom:6px">Graph Feature Explorer -&gt;</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Click through each graph feature - conditional edges, parallel branches, cycles, checkpointing, streaming, visualization - and see which frameworks support it natively.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-31/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Evidence Dashboard</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Full Graph Workflow Evidence Dashboard -&gt;</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Lines of code, feature heatmap, API comparison, and code side-by-side - all benchmark data from notebook #23 in one interactive view.</div>
</div>
</a>
</div>
<blockquote>
<p>"The difference between a framework with graph primitives and one without is the difference between declaring your workflow and implementing your workflow engine."</p>
</blockquote>
<p>A chain is a sequence. Step 1 feeds step 2. Step 2 feeds step 3. No decisions. No branches. No loops. For a simple RAG pipeline - retrieve, augment, generate - a chain is all you need.</p>
<p>Then requirements arrive. Route complex queries to a deep research path and simple queries to a fast path. Retry if the answer confidence is below a threshold. Run web search and database lookup in parallel, then merge results. Pause for human approval before executing a tool call.</p>
<p>Each of these patterns requires a directed acyclic graph (or a cyclic one, for loops). You need nodes, edges, conditional routing, state that persists across steps, and an execution engine that handles branching and merging. The question is whether your framework ships this as a primitive or whether you build it yourself.</p>
<p>Notebook #23 builds the same conditional 3-node workflow in all three frameworks: a research node, a conditional router that branches to either a detailed or quick answer path, and terminal nodes. Same logic, same behavior, different APIs.</p>
<p>The results split cleanly into two tiers.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-we-measured">What We Measured<a href="https://engineersofai.com/blog/ai-letters-31-graph-workflows#what-we-measured" class="hash-link" aria-label="Direct link to What We Measured" title="Direct link to What We Measured" translate="no">​</a></h2>
<p>Each framework implements a conditional pipeline: research -&gt; router -&gt; (detailed answer OR quick answer). The router branches based on query length (a proxy for complexity). We measured four things.</p>

























<table><thead><tr><th>Metric</th><th>What it captures</th></tr></thead><tbody><tr><td><strong>Lines of code</strong></td><td>LoC to build the conditional 3-node graph</td></tr><tr><td><strong>Feature coverage</strong></td><td>7 graph capabilities: StateGraph, conditional edges, parallel branches, cycles, checkpointing, streaming, visualization</td></tr><tr><td><strong>API clarity</strong></td><td>How readable is the graph definition?</td></tr><tr><td><strong>Native support</strong></td><td>Does the framework ship graph primitives or require manual Python?</td></tr></tbody></table>
<p><strong>Frameworks:</strong> SynapseKit 1.4 (<code>StateGraph</code>), LangChain 1.2 + LangGraph (<code>StateGraph</code>), LlamaIndex Core 0.14 (manual routing)</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-numbers">The Numbers<a href="https://engineersofai.com/blog/ai-letters-31-graph-workflows#the-numbers" class="hash-link" aria-label="Direct link to The Numbers" title="Direct link to The Numbers" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Lines of code: Conditional 3-node graph</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Framework      Imports   Code   Total</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">----------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit          1     19      20</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain           2     18      20</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex          3     12      15</span><br></div></code></pre></div></div>
<p>LlamaIndex has the fewest lines. But those 15 lines implement only the happy path - manual if/else routing with no state schema, no checkpointing, no streaming, no visualization. Fewer lines of application code, more lines of infrastructure you will write later.</p>
<p>SynapseKit and LangChain are identical at 20 lines each. The APIs are so similar that porting code from one to the other takes minutes.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-feature-matrix">The Feature Matrix<a href="https://engineersofai.com/blog/ai-letters-31-graph-workflows#the-feature-matrix" class="hash-link" aria-label="Direct link to The Feature Matrix" title="Direct link to The Feature Matrix" translate="no">​</a></h2>
<p>This is the real story.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Graph Feature Support (7 features):</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Feature               SynapseKit  LangChain  LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">---------------------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">StateGraph primitive      Yes         Yes         No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Conditional edges         Yes         Yes         No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Parallel branches         Yes         Yes         No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Cycle / loop support      Yes         Yes         No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Built-in checkpointing    Yes         Yes         No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Stream graph events       Yes         Yes         No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Graph visualization       Yes         Yes         No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">---------------------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Score                     7/7         7/7         0/7</span><br></div></code></pre></div></div>
<p>SynapseKit: 7 out of 7. LangChain: 7 out of 7. LlamaIndex: 0 out of 7.</p>
<p>This is not a close race with a narrow winner. This is a binary split. Two frameworks ship a complete graph runtime. One framework ships nothing.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-api-comparison">The API Comparison<a href="https://engineersofai.com/blog/ai-letters-31-graph-workflows#the-api-comparison" class="hash-link" aria-label="Direct link to The API Comparison" title="Direct link to The API Comparison" translate="no">​</a></h2>
<p>The most surprising finding: SynapseKit and LangGraph have nearly identical APIs.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  graph = StateGraph(schema)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  graph.add_node('research', research_fn)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  graph.add_conditional_edge('research', router, mapping)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  graph.add_edge('detailed_answer', END)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  app = graph.compile()</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  result = app.run_sync(initial_state)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangGraph:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  graph = StateGraph(State)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  graph.add_node('research', research_fn)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  graph.add_conditional_edges('research', router, mapping)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  graph.add_edge('detailed_answer', END)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  app = graph.compile()</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  result = app.invoke(initial_state)</span><br></div></code></pre></div></div>
<p>The differences: <code>add_conditional_edge</code> (singular) vs <code>add_conditional_edges</code> (plural). <code>run_sync</code> vs <code>invoke</code>. <code>TypedState(fields={...})</code> vs <code>TypedDict</code>. That is it. The graph definition pattern is identical.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  research_result = research_fn(query)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  if len(query) &gt; 20:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      result = detailed_fn(research_result)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  else:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      result = quick_fn(research_result)</span><br></div></code></pre></div></div>
<p>No graph object. No state schema. No conditional edge declaration. Just Python control flow. This works for the simple case. But when you need to add checkpointing, streaming, parallel branches, or cycle detection, you are building a graph engine, not using one.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-one-meaningful-difference">The One Meaningful Difference<a href="https://engineersofai.com/blog/ai-letters-31-graph-workflows#the-one-meaningful-difference" class="hash-link" aria-label="Direct link to The One Meaningful Difference" title="Direct link to The One Meaningful Difference" translate="no">​</a></h2>
<p>Where SynapseKit and LangChain diverge is state definition.</p>
<p>LangGraph uses a plain <code>TypedDict</code>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">State</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">TypedDict</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    query</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    result</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><br></div></code></pre></div></div>
<p>SynapseKit uses <code>TypedState</code> with explicit <code>StateField</code> declarations:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">schema </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> TypedState</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">fields</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">'query'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">  StateField</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">default</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">''</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">'result'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> StateField</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">default</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">''</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>For simple last-write-wins state, LangGraph's <code>TypedDict</code> is cleaner and more Pythonic. For parallel branches that merge state - where two nodes independently append to a shared list, for example - SynapseKit's <code>StateField</code> reducers handle the merge logic declaratively. You define how concurrent writes resolve instead of writing merge code.</p>
<p>If your workflows are linear with conditional branches, LangGraph's state model is simpler. If your workflows have parallel fan-out/fan-in patterns, SynapseKit's reducer model prevents merge bugs.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="when-you-need-a-graph">When You Need a Graph<a href="https://engineersofai.com/blog/ai-letters-31-graph-workflows#when-you-need-a-graph" class="hash-link" aria-label="Direct link to When You Need a Graph" title="Direct link to When You Need a Graph" translate="no">​</a></h2>
<p>Not every pipeline needs graph primitives. A simple retrieve-augment-generate chain is fine as a chain. Reach for a graph when:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">When to use a graph workflow:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Pattern              Example</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">---------------------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Conditional routing  Route to different models by query</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                     complexity or topic domain</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Retry loops          Re-run generation if confidence &lt; 0.8,</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                     up to 3 times</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Parallel branches    Web search + DB lookup simultaneously,</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                     merge results before generation</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Human-in-the-loop   Pause at review node, wait for</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                     approval, resume or reject</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Quality gates        Evaluate output against criteria,</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                     loop back to improve if insufficient</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Multi-step agents    Agent reasons, acts, observes, decides</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                     whether to continue or terminate</span><br></div></code></pre></div></div>
<p>If none of these patterns apply to your workflow, a chain is simpler, debuggable, and sufficient. Do not adopt graph complexity for linear pipelines.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-31-graph-workflows#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>SynapseKit and LangChain tie on graph workflows.</strong> Both ship a complete StateGraph primitive with 7/7 features. The APIs are nearly identical. If graph workflows are your primary concern, both frameworks are equivalent choices.</p>
</li>
<li class="">
<p><strong>LlamaIndex has no graph primitive.</strong> Zero out of 7 features. If your workflow requires conditional routing, loops, or parallel branches, you will build the orchestration layer yourself. This is a significant gap for complex pipeline architectures.</p>
</li>
<li class="">
<p><strong>LangGraph's TypedDict state is simpler for basic cases.</strong> Plain Python TypedDict with no special imports. For last-write-wins state, this is cleaner than SynapseKit's StateField approach.</p>
</li>
<li class="">
<p><strong>SynapseKit's StateField reducers win for parallel merging.</strong> When two branches write to the same state key concurrently, reducers define how to merge. Without reducers, you write merge logic manually and hope you handle every edge case.</p>
</li>
<li class="">
<p><strong>Fewer lines does not mean simpler.</strong> LlamaIndex's 15-line implementation has less code but also less capability. The missing 5 lines buy you state schemas, streaming, checkpointing, visualization, and cycle detection - things you will eventually build by hand.</p>
</li>
</ol>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-thing-most-people-miss">The Thing Most People Miss<a href="https://engineersofai.com/blog/ai-letters-31-graph-workflows#the-thing-most-people-miss" class="hash-link" aria-label="Direct link to The Thing Most People Miss" title="Direct link to The Thing Most People Miss" translate="no">​</a></h2>
<p>Graph workflows are not about replacing chains. They are about making conditional logic declarative instead of imperative.</p>
<p>You can build any graph workflow in raw Python. If/else for routing. While loops for retries. Threading for parallel branches. Dict for state. It works. But the moment you need to debug a failed run at 3am, you want to see the graph structure, replay from a checkpoint, stream events to a dashboard, and visualize where the execution went.</p>
<p>Raw Python gives you none of that. A graph primitive gives you all of it.</p>
<p>The engineer who reaches for a StateGraph is not the one who cannot write if/else statements. They are the one who has debugged enough production workflows to know that the execution infrastructure matters more than the business logic. The business logic is 15 lines. The observability, checkpointing, streaming, and error handling around it is 150 lines. A framework graph primitive absorbs those 150 lines so you write the 15.</p>
<p>SynapseKit and LangChain both understand this. LlamaIndex, for now, does not.</p>
<p>Week 4 continues: cost tracking, guardrails, MCP support, and the final scorecard. The graph benchmark gives both SynapseKit and LangChain a point. The cumulative race holds steady.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-31-graph-workflows#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Audit your pipeline for hidden conditional logic.</strong> Search for if/else branches that route between different processing paths. Each one is a candidate for a graph node with a conditional edge. Declare the routing, do not embed it in procedural code.</p>
</li>
<li class="">
<p><strong>Add checkpointing to any workflow that takes more than 30 seconds.</strong> If a 5-node pipeline fails at node 4, you should resume from node 3, not restart from node 1. Both SynapseKit and LangGraph ship checkpointers. Use them.</p>
</li>
<li class="">
<p><strong>Visualize your graph before deploying it.</strong> Both SynapseKit (<code>app.get_mermaid()</code>) and LangGraph (<code>app.get_graph().draw_mermaid()</code>) export Mermaid diagrams. Generate the diagram, review the edges, confirm the routing logic matches your intent. A graph you can see is a graph you can debug.</p>
</li>
</ol>
<p>The best workflow architecture is the one where adding a new branch takes one line, not a refactor. Graph primitives make that possible. Raw Python makes it a project.</p>
<hr>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM Frameworks</category>
            <category>Graph Workflows</category>
            <category>Benchmarks</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #30 - Async Throughput: The Framework Tax on Every Concurrent Request]]></title>
            <link>https://engineersofai.com/blog/ai-letters-30-async-throughput</link>
            <guid>https://engineersofai.com/blog/ai-letters-30-async-throughput</guid>
            <pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[At 50 concurrent requests, LangChain loses 19.2% of theoretical throughput to framework overhead. SynapseKit loses 3.2%. The async benchmark reveals which frameworks genuinely run non-blocking IO and which quietly serialize your concurrent workloads.]]></description>
            <content:encoded><![CDATA[<p>Every framework says <code>await</code>. Every framework says "production-ready". At one concurrent request, the difference is invisible. At 50 concurrent requests, LangChain's LCEL middleware costs 19.2% of theoretical throughput while SynapseKit loses only 3.2%. Notebook #22 of the LLM Showdown isolates the framework tax on async IO - and the gap is 7x in overhead milliseconds.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-30/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Async in Python: From Callbacks to Native Coroutines →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">The history of async IO in Python - from Twisted's reactor pattern through asyncio, uvloop, and into LLM framework async primitives. Click each milestone to see how async patterns evolved.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-30/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Interactive Explorer</div>
<div style="font-size:1rem;font-weight:700;color:#0ea5e9;margin-bottom:6px">Throughput Scaling Explorer →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Drag the concurrency slider from 1 to 50 and watch how each framework's throughput scales. See where LangChain's curve diverges from the theoretical maximum.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-30/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Evidence Dashboard</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Full Throughput Evidence Dashboard →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Efficiency bars, overhead breakdown, scaling factors, and per-call latency - all benchmark data from notebook #22 in one interactive view.</div>
</div>
</a>
</div>
<blockquote>
<p>"The difference between wrapping a sync call in a thread and genuinely non-blocking async IO only shows up under real concurrency. At 50 simultaneous requests, that difference is 19%."</p>
</blockquote>
<p>Every LLM framework claims async support. The documentation says <code>await</code>. The examples show <code>ainvoke</code>. The marketing page says "production-ready". And when you run a single request, every framework delivers the same result in approximately the same time. The overhead per call is sub-millisecond. Nobody notices.</p>
<p>Then you deploy to a FastAPI endpoint handling 20 simultaneous users. Or you fire off 50 tool calls in an <code>asyncio.gather</code> batch. And one framework quietly adds 12 milliseconds of overhead per batch while the others add less than 2. At scale, those milliseconds compound into throughput ceilings that are invisible in development and painful in production.</p>
<p>Notebook #22 of the LLM Showdown isolates exactly this. A mock async function with a fixed 50ms sleep - simulating an LLM API call - wrapped in each framework's async primitive. Fire N concurrent requests. Measure total time. A perfect async implementation processes 50 requests in ~50ms. Any extra time is pure framework tax.</p>
<p>The results are not close.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-we-measured">What We Measured<a href="https://engineersofai.com/blog/ai-letters-30-async-throughput#what-we-measured" class="hash-link" aria-label="Direct link to What We Measured" title="Direct link to What We Measured" translate="no">​</a></h2>
<p>Each framework wraps a mock async function - <code>asyncio.sleep(0.05)</code> - simulating a 50ms LLM API call. We fire N concurrent requests using <code>asyncio.gather</code> and measure total wall-clock time. A perfect async implementation processes N requests in ~50ms regardless of N, because all sleeps run concurrently in the event loop.</p>

























<table><thead><tr><th>Metric</th><th>What it captures</th></tr></thead><tbody><tr><td><strong>Requests/sec</strong></td><td>Throughput at 1, 5, 10, 20, 50 concurrent requests</td></tr><tr><td><strong>Async efficiency</strong></td><td>Actual rps vs theoretical max (% of ideal)</td></tr><tr><td><strong>Scaling factor</strong></td><td>rps at n=50 / rps at n=1 - perfect async gives 50x</td></tr><tr><td><strong>Framework overhead</strong></td><td>Milliseconds added per batch beyond raw asyncio</td></tr></tbody></table>
<p><strong>Frameworks:</strong> SynapseKit 1.4 (<code>BaseTool.run()</code>), LangChain 1.2 (<code>RunnableLambda.ainvoke()</code>), LlamaIndex Core 0.14 (<code>FunctionTool.acall()</code>)</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-numbers">The Numbers<a href="https://engineersofai.com/blog/ai-letters-30-async-throughput#the-numbers" class="hash-link" aria-label="Direct link to The Numbers" title="Direct link to The Numbers" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Throughput (requests/sec):</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Concurrency   Baseline  SynapseKit  LangChain  LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">----------------------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">n=1             19.6       19.8        19.4       19.7</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">n=5             97.8       98.8        96.1       97.3</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">n=10           194.9      195.7       184.2      193.3</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">n=20           391.3      388.9       360.5      381.9</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">n=50           986.6      967.5       808.3      927.2</span><br></div></code></pre></div></div>
<p>At n=1, everyone looks the same. The mock call takes ~50ms. Each framework adds sub-millisecond overhead. If this were the only data point, you would conclude that async performance is irrelevant to framework choice.</p>
<p>At n=50, the picture changes. The baseline (raw <code>asyncio.sleep</code>) achieves 986.6 rps - nearly the theoretical maximum of 1000 rps (50 requests / 0.05s). SynapseKit tracks close at 967.5. LlamaIndex at 927.2. LangChain drops to 808.3.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Async efficiency at n=50 concurrent calls:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Framework      rps    overhead   efficiency</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">--------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Baseline      986.6     0.7ms      98.7%</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit    967.5     1.7ms      96.8%</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex    927.2     3.9ms      92.7%</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain     808.3    11.9ms      80.8%</span><br></div></code></pre></div></div>
<p>LangChain adds 11.9ms of overhead per batch at 50 concurrent requests. SynapseKit adds 1.7ms. That is a 7x difference in framework-introduced latency.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-scaling-factor">The Scaling Factor<a href="https://engineersofai.com/blog/ai-letters-30-async-throughput#the-scaling-factor" class="hash-link" aria-label="Direct link to The Scaling Factor" title="Direct link to The Scaling Factor" translate="no">​</a></h2>
<p>The cleanest way to read this: how close does each framework get to 50x throughput when you send 50x more concurrent requests?</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Scaling factor: rps(n=50) / rps(n=1)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Perfect async = 50x</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Framework      rps n=1  rps n=50  scaling  vs perfect</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">------------------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Baseline         19.6     986.6    50.4x     100.9%</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit       19.8     967.5    48.9x      97.7%</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex       19.7     927.2    47.1x      94.2%</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain        19.4     808.3    41.7x      83.5%</span><br></div></code></pre></div></div>
<p>SynapseKit: 97.7% of perfect scaling. LlamaIndex: 94.2%. LangChain: 83.5%.</p>
<p>The 16.5% gap between SynapseKit and LangChain at 50 concurrent requests is not a rounding error. It is a consistent pattern across multiple runs (median of 3 repeats, after warmup). Something in LangChain's LCEL <code>ainvoke</code> path does more work per invocation than the other frameworks' async primitives.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="where-the-overhead-comes-from">Where the Overhead Comes From<a href="https://engineersofai.com/blog/ai-letters-30-async-throughput#where-the-overhead-comes-from" class="hash-link" aria-label="Direct link to Where the Overhead Comes From" title="Direct link to Where the Overhead Comes From" translate="no">​</a></h2>
<p>This benchmark isolates the framework call path. The mock function is identical - <code>asyncio.sleep(0.05)</code> - so the overhead is entirely in:</p>
<ol>
<li class=""><strong>Object construction</strong> - creating/validating the invocation context</li>
<li class=""><strong>Callback routing</strong> - LCEL's pipe chain, middleware, callbacks</li>
<li class=""><strong>Serialization/validation</strong> - input/output schema checks</li>
</ol>
<p>LangChain's LCEL is a composable chain architecture. Every <code>ainvoke</code> passes through the <code>Runnable</code> protocol - input validation, callbacks, tracing hooks, output parsing. This is powerful for composition (<code>chain1 | chain2 | chain3</code>) but adds overhead per invocation. At n=1, the overhead is 0.51ms - invisible. At n=50, the total accumulated overhead is 11.9ms per batch.</p>
<p>SynapseKit's <code>BaseTool.run()</code> is a thin wrapper. Validate the input against the JSON schema, call the function, return the result. No middleware chain, no callback infrastructure. The tradeoff: less composability, less overhead.</p>
<p>LlamaIndex's <code>FunctionTool.acall()</code> falls in between - some validation overhead but no LCEL-style chain traversal.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-real-world-caveat">The Real-World Caveat<a href="https://engineersofai.com/blog/ai-letters-30-async-throughput#the-real-world-caveat" class="hash-link" aria-label="Direct link to The Real-World Caveat" title="Direct link to The Real-World Caveat" translate="no">​</a></h2>
<p>This benchmark tests the <em>framework call path</em> under synthetic concurrency. In a production RAG pipeline, the bottleneck is rarely the framework wrapper. It is the retrieval step, the LLM API itself, or the embedding computation.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Production async bottleneck stack:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LLM API call         200-2000ms   &lt;-- actual bottleneck</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Embedding call        10-100ms    &lt;-- second bottleneck</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Vector DB query        5-50ms     &lt;-- third bottleneck</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Framework overhead     1-12ms     &lt;-- what we measured</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Python event loop     &lt;0.1ms     &lt;-- irrelevant</span><br></div></code></pre></div></div>
<p>The framework overhead matters when:</p>
<ul>
<li class="">
<p><strong>Batch processing with asyncio.gather:</strong> If you fire 100+ concurrent tool calls in a batch, the per-batch overhead compounds. LangChain's 11.9ms at n=50 extrapolates to ~25ms at n=100. SynapseKit's 1.7ms extrapolates to ~3.5ms. Still small in absolute terms - but the ratio stays 7x.</p>
</li>
<li class="">
<p><strong>FastAPI endpoints at high QPS:</strong> When your server handles 50-100 simultaneous requests, framework overhead becomes a contributor to p99 latency. Not the primary contributor, but a non-trivial one.</p>
</li>
<li class="">
<p><strong>Streaming with concurrent tool calls:</strong> Agents that call multiple tools in parallel between reasoning steps accumulate framework overhead on every tool invocation cycle.</p>
</li>
</ul>
<p>The framework overhead does NOT matter when:</p>
<ul>
<li class="">Your bottleneck is the LLM API (it almost always is)</li>
<li class="">You're running 1-5 concurrent requests (all frameworks are equivalent)</li>
<li class="">Your tools are CPU/GPU bound (use <code>asyncio.to_thread</code>, not <code>await</code>)</li>
</ul>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-30-async-throughput#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>At low concurrency, framework async performance is irrelevant.</strong> All three frameworks add sub-millisecond overhead at n=1 through n=5. If your application handles fewer than 10 simultaneous requests, async efficiency should not factor into your framework choice.</p>
</li>
<li class="">
<p><strong>At high concurrency, LangChain's LCEL overhead becomes measurable.</strong> The 11.9ms per-batch overhead at n=50 is not a dealbreaker, but it is a consistent tax. If you are building a high-throughput batch processing pipeline with <code>asyncio.gather</code>, this matters.</p>
</li>
<li class="">
<p><strong>SynapseKit's thin async wrapper pays off at scale.</strong> 96.8% async efficiency at n=50 - nearly indistinguishable from raw asyncio. The tradeoff is less middleware infrastructure. If you need LCEL-style composability, you pay for it.</p>
</li>
<li class="">
<p><strong>LlamaIndex's async path is cleaner than expected.</strong> 92.7% efficiency at n=50 is solid. After weeks of ranking third, this is a genuine strength - LlamaIndex's <code>FunctionTool.acall()</code> adds minimal overhead.</p>
</li>
<li class="">
<p><strong>Profile your actual bottleneck before optimizing framework overhead.</strong> If your LLM API calls take 500ms and your framework adds 2ms, the framework overhead is 0.4% of total latency. Optimize the API call first.</p>
</li>
</ol>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-thing-most-people-miss">The Thing Most People Miss<a href="https://engineersofai.com/blog/ai-letters-30-async-throughput#the-thing-most-people-miss" class="hash-link" aria-label="Direct link to The Thing Most People Miss" title="Direct link to The Thing Most People Miss" translate="no">​</a></h2>
<p>Async efficiency is not the same as async correctness.</p>
<p>A framework can achieve 99% async efficiency on a synthetic benchmark and still serialize your real workload if any component in the chain is synchronous. One sync database call in a retriever. One blocking file read in a document loader. One sync HTTP request wrapped in <code>asyncio.to_thread</code> that exhausts the thread pool.</p>
<p>The benchmark above proves that the framework call paths themselves are non-blocking. That is necessary but not sufficient. The production question is whether every component you plug into the framework - retrievers, embedders, tool functions, document loaders - is also genuinely async.</p>
<p>SynapseKit's retriever and tool base classes are async-native. LlamaIndex's retriever base classes are async-native. LangChain's retrievers are inconsistent - some have native <code>_aget_relevant_documents</code>, some fall back to <code>run_in_executor</code>.</p>
<p>The 19.2% throughput loss LangChain shows in this benchmark is the framework's own overhead. In production, if your retriever falls back to <code>run_in_executor</code>, the loss compounds further. The framework tax and the component tax stack.</p>
<p>The engineer who builds the highest-throughput async pipeline will not be the one who picks the framework with the best synthetic benchmark. They will be the one who audits every component in their chain for sync fallbacks and eliminates them. The framework choice sets the floor. The component audit determines the ceiling.</p>
<p>Week 4 continues: graph workflows, cost tracking, guardrails, MCP support. The async result gives SynapseKit another point. The cumulative race tightens.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-30-async-throughput#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Audit your async chain for sync fallbacks.</strong> Open every retriever, tool, and loader in your pipeline. Search for <code>run_in_executor</code> or <code>asyncio.to_thread</code>. Each one is a thread-pool bottleneck masquerading as async code. Replace with native async implementations where they exist.</p>
</li>
<li class="">
<p><strong>Run a throughput test on your actual pipeline.</strong> Fire 20 concurrent requests at your full pipeline (not just the LLM call). Measure wall-clock time. Compare against 20 sequential requests. If the ratio is less than 15x, something in your chain is serializing. Find it.</p>
</li>
<li class="">
<p><strong>Set a p99 latency budget for framework overhead.</strong> If your LLM call takes 500ms, your framework overhead budget should be less than 5ms (1%). Measure it with the same technique as notebook #22: wrap a known-latency mock function and compare. If you exceed the budget, simplify the call chain.</p>
</li>
</ol>
<p>The fastest async code is the code that does nothing between your function call and the event loop. Every layer of abstraction between <code>await</code> and the actual IO operation is overhead. Sometimes that abstraction is worth the cost. Sometimes it is not. Measure before you assume.</p>
<hr>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM Frameworks</category>
            <category>Async</category>
            <category>Benchmarks</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #29 - Week 3 Scorecard: Six Agent Benchmarks, Three Frameworks, One Uncomfortable Truth]]></title>
            <link>https://engineersofai.com/blog/ai-letters-29-week3-scorecard</link>
            <guid>https://engineersofai.com/blog/ai-letters-29-week3-scorecard</guid>
            <pubDate>Tue, 21 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[After 6 agent benchmarks, SynapseKit wins 4 of 6 on ergonomics. LangChain wins the one that matters most in production: per-tool error handling. And LlamaIndex's agent score exposes an architectural truth - it was never an agent framework.]]></description>
            <content:encoded><![CDATA[<p>Six benchmarks. SynapseKit wins 4 on ergonomics. LangChain wins the one you'll hit in production: per-tool error recovery. LlamaIndex scores 7/18 - not a maturity gap, an architectural one. It's a retrieval framework that added agents.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-29/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Agent Framework History Timeline →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">From the original ReAct paper (2022) through LangChain's agent executor, LlamaIndex's agent bolts, and SynapseKit's Crew API. Click each milestone to understand why agent frameworks diverged so dramatically in design philosophy.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-29/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Interactive Explorer</div>
<div style="font-size:1rem;font-weight:700;color:#0ea5e9;margin-bottom:6px">6-Dimension Agent Scorecard Explorer →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Click each of the 6 benchmarks - ReAct, Function Calling, Built-in Tools, Multi-Agent, Observability, Error Handling - to see exact scores, code comparisons, and what the winner got right.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-29/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Evidence Dashboard</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Full 3-Week Cumulative Rankings →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Week 3 bar chart, radar across all 6 dimensions, and cumulative 3-week stacked standings - all benchmark data from notebooks #15–#21 in one view.</div>
</div>
</a>
</div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-six-benchmarks">The Six Benchmarks<a href="https://engineersofai.com/blog/ai-letters-29-week3-scorecard#the-six-benchmarks" class="hash-link" aria-label="Direct link to The Six Benchmarks" title="Direct link to The Six Benchmarks" translate="no">​</a></h2>















































<table><thead><tr><th>#</th><th>Notebook</th><th>Dimension</th><th>Winner</th></tr></thead><tbody><tr><td>15</td><td>ReAct Agents</td><td>LoC + built-in tools + loop control</td><td>SynapseKit</td></tr><tr><td>16</td><td>Function Calling</td><td>Schema LoC + multi-format export</td><td>SynapseKit</td></tr><tr><td>17</td><td>Built-in Tools</td><td>Tool count + zero-config coverage</td><td>SynapseKit</td></tr><tr><td>18</td><td>Multi-Agent</td><td>LoC + orchestration patterns supported</td><td>SynapseKit</td></tr><tr><td>19</td><td>Observability</td><td>LoC to enable + local feature depth</td><td>3-way tie</td></tr><tr><td>20</td><td>Error Handling</td><td>LoC + built-in error primitives</td><td>LangChain</td></tr></tbody></table>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Week 3 Points (max 18):</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Framework       #15  #16  #17  #18  #19  #20  Total</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">----------------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit        3    3    3    3    2    2     16</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain         2    2    2    2    2    3     13</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex        1    1    1    1    2    1      7</span><br></div></code></pre></div></div>
<p>SynapseKit: 16. LangChain: 13. LlamaIndex: 7.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-synapsekit-actually-wins-on">What SynapseKit Actually Wins On<a href="https://engineersofai.com/blog/ai-letters-29-week3-scorecard#what-synapsekit-actually-wins-on" class="hash-link" aria-label="Direct link to What SynapseKit Actually Wins On" title="Direct link to What SynapseKit Actually Wins On" translate="no">​</a></h2>
<p>The four wins are not flukes. There is a coherent pattern.</p>
<p><strong>ReAct Agents (#15):</strong> <code>CalculatorTool</code> and <code>DateTimeTool</code> are built in. You construct an agent with a list of tools and a model - that's the entire setup. LangChain's <code>create_react_agent</code> is clean but requires you to wire the tool list separately from the agent executor. LlamaIndex's <code>ReActAgent</code> matches SynapseKit on line count but ships no built-in calculation or datetime tooling.</p>
<p><strong>Function Calling (#16):</strong> Define a function schema once. Call <code>.schema()</code> for OpenAI format. Call <code>.anthropic_schema()</code> for Anthropic format. Same source of truth, zero duplication. LangChain requires <code>StructuredTool</code> plus <code>convert_to_openai_function</code> - two different objects. LlamaIndex requires <code>FunctionTool</code> plus a separate <code>get_parameters_dict()</code> call. Neither provides a single definition that exports to both provider formats.</p>
<p><strong>Built-in Tools (#17):</strong> 30 tools. 12 that work with zero configuration - no pip install, no API key, no setup. 9 categories. LangChain ships 17 core tools, most requiring a per-tool pip install and an API key before they'll run. LlamaIndex ships 3 core tool wrappers. This is the widest margin in the entire week: 30 vs 17 vs 3.</p>
<p><strong>Multi-Agent (#18):</strong> SynapseKit supports 6 of 6 orchestration patterns - sequential, parallel, supervisor, hierarchical, pipeline, and feedback loop. LangChain supports 5 (LangGraph handles the complex DAG cases well). LlamaIndex supports 3. The <code>Crew</code> + <code>Task(context_from=[...])</code> pattern in SynapseKit is the most concise way to express inter-agent dependencies across all three frameworks.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-one-langchain-win-that-matters">The One LangChain Win That Matters<a href="https://engineersofai.com/blog/ai-letters-29-week3-scorecard#the-one-langchain-win-that-matters" class="hash-link" aria-label="Direct link to The One LangChain Win That Matters" title="Direct link to The One LangChain Win That Matters" translate="no">​</a></h2>
<p>Error handling. LangChain scores 3/3.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">ToolException raised inside tool</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        ↓</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">AgentExecutor catches (handle_tool_error=True)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        ↓</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Error message becomes LLM Observation</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        ↓</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LLM reasons: retry / use different tool / report to user</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">vs.</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit / LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        ↓</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">try/except in tool function (manual, every tool)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        ↓</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">return error string (if you remembered to)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        ↓</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">no structured recovery loop</span><br></div></code></pre></div></div>
<p><code>ToolException</code> is not just a named exception type. It is a design decision: tool failures are information for the reasoning loop, not crashes to be caught. Raise <code>ToolException("The search API timed out")</code> and the LLM's next observation is that string. It can reason: try a different query, use a fallback tool, tell the user. Five lines including imports. No boilerplate per tool.</p>
<p>LangChain also ships <code>handle_parsing_errors=True</code> - which catches malformed LLM outputs before they crash the agent. This is the failure mode no one talks about until it happens in production: the model returns something that doesn't match the expected ReAct format, the parser throws, the agent is gone. One kwarg prevents it. SynapseKit and LlamaIndex both crash on malformed output without custom handling.</p>
<p>SynapseKit's <code>CircuitState</code> is the stronger primitive for a different failure class - repeated failures at the LLM or network level. But per-tool error handling is where engineers spend most of their production debugging time. LangChain wins that battle.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-uncomfortable-truth-about-llamaindex">The Uncomfortable Truth About LlamaIndex<a href="https://engineersofai.com/blog/ai-letters-29-week3-scorecard#the-uncomfortable-truth-about-llamaindex" class="hash-link" aria-label="Direct link to The Uncomfortable Truth About LlamaIndex" title="Direct link to The Uncomfortable Truth About LlamaIndex" translate="no">​</a></h2>
<p>LlamaIndex scored 7 out of 18 possible points in the Agents &amp; Tools week. Third place in 5 of 6 benchmarks. Third in ReAct ergonomics. Third in function calling. Third in multi-agent patterns. Third in error handling. Tied for second in observability only because all three frameworks cover the basics.</p>
<p>This is not a performance gap or a maturity gap. It is an architectural conclusion: <strong>LlamaIndex is a retrieval and indexing framework. It added agents. It is not an agent framework that also handles retrieval.</strong></p>
<p>In Week 2 (RAG Pipelines), LlamaIndex came second overall. Its chunking benchmark (#9) was the most detailed of any framework. Its document loading and indexing abstractions are the most mature. <code>VectorStoreIndex</code>, <code>SummaryIndex</code>, <code>KnowledgeGraphIndex</code> - these are not bolt-ons. They are the product.</p>
<p>When your application is 80% retrieval and 20% agent orchestration, LlamaIndex is the correct choice. When the ratio flips, you are fighting the framework's grain.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-3-week-cumulative-picture">The 3-Week Cumulative Picture<a href="https://engineersofai.com/blog/ai-letters-29-week3-scorecard#the-3-week-cumulative-picture" class="hash-link" aria-label="Direct link to The 3-Week Cumulative Picture" title="Direct link to The 3-Week Cumulative Picture" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework     Week 1  Week 2  Week 3   Total (21 benchmarks)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">------------------------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit       15      14      16       45</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain         8      10      13       31</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex        7      12       7       26</span><br></div></code></pre></div></div>
<p>The trend line for LangChain is important. Week 1: 8 points. Week 2: 10. Week 3: 13. The delta between first and second place has shrunk from 7 points to 3 points over three weeks. Week 4 tests production concerns - async throughput, graph workflows, cost tracking, guardrails, MCP support. LangChain's ecosystem depth tends to surface there. The gap may close further.</p>
<p>LlamaIndex's pattern is the mirror image: strong in Week 2 (12 points, retrieval week), weak in Weeks 1 and 3 (7 points each, everything else). A specialist framework trading against generalists.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-29-week3-scorecard#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>If you're building an agent-first application, SynapseKit's batteries-included approach saves real time.</strong> 30 built-in tools, concise multi-agent patterns, single function schema definition. The upfront ergonomics advantage compounds over the first month of development.</p>
</li>
<li class="">
<p><strong>Add <code>handle_tool_error=True</code> and <code>handle_parsing_errors=True</code> to every LangChain AgentExecutor immediately.</strong> These two kwargs are free insurance. Without them, tool exceptions crash the agent and malformed LLM outputs crash the agent. With them, both become recoverable observations. No code changes required.</p>
</li>
<li class="">
<p><strong>LangChain's per-tool error recovery is better than writing your own.</strong> If you are currently wrapping every tool function in a try/except and returning error strings manually - in any framework - you are doing more work than LangChain's <code>ToolException</code> pattern requires.</p>
</li>
<li class="">
<p><strong>Use LlamaIndex specifically when your application is knowledge-graph-heavy or your chunking requirements are sophisticated.</strong> <code>SemanticSplitterNodeParser</code>, recursive splitting with boundary detection, <code>KnowledgeGraphIndex</code> - these have no equivalent in SynapseKit or LangChain.</p>
</li>
<li class="">
<p><strong>The framework choice is not permanent, but the migration cost is real.</strong> Switching from LangChain's <code>AgentExecutor</code> to SynapseKit's <code>Crew</code> mid-project is not a find-and-replace operation. Pick based on what your application's core pattern is.</p>
</li>
</ol>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-thing-most-people-miss">The Thing Most People Miss<a href="https://engineersofai.com/blog/ai-letters-29-week3-scorecard#the-thing-most-people-miss" class="hash-link" aria-label="Direct link to The Thing Most People Miss" title="Direct link to The Thing Most People Miss" translate="no">​</a></h2>
<p>The benchmarks measure ergonomics. Ergonomics predicts developer velocity in the first 90 days. It does not predict the failure modes you encounter in production at month six.</p>
<p>The most common production failure in LLM agents is not a missing built-in tool or a verbose schema definition. It is uncontrolled loops - agents that retry a failing operation until they exhaust either the max_iterations cap or the API rate limit. SynapseKit's <code>CircuitState</code> and LangChain's <code>ToolException</code> both address this, from opposite directions. SynapseKit short-circuits before the LLM sees the failure. LangChain routes the failure through the LLM and hopes it reasons its way out.</p>
<p>Both work for different failure classes. Neither is universal.</p>
<p>The engineer who builds the most reliable production agent will be the one who understands which failures should be invisible to the LLM (circuit-break them) and which failures the LLM should reason about (ToolException them). That judgment call is not in any benchmark. It comes from shipping something, watching it break, and learning the shape of the break.</p>
<p>Week 4 shifts to production: async throughput, graph-based workflows, built-in evaluation, cost tracking, guardrails, MCP support. That is where the ergonomics winner and the production winner may diverge for the first time.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-29-week3-scorecard#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Map your application's agent-to-retrieval ratio.</strong> Write it down as a fraction. If it's above 60% agents, audit whether your current framework has built-in error primitives. If it's below 40% agents, audit whether your retrieval path uses framework-native indexing or custom code.</p>
</li>
<li class="">
<p><strong>Count your framework's built-in tools and test three of them.</strong> The tools you're pip-installing and wrapping manually might already be built in. SynapseKit's 12 zero-config tools cover most of what agents need without any setup.</p>
</li>
<li class="">
<p><strong>Write a deliberate failure test for your agent.</strong> Pick the tool your agent calls most frequently, make it throw an exception, and watch what happens. Does the agent recover? Does it loop? Does it crash? That diagnosis time is the measurement that matters most for production reliability.</p>
</li>
</ol>
<p>Three weeks of benchmarks point to a framework with strong agent ergonomics. Six months of production data will point to something more nuanced. The race is not over.</p>
<hr>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM Frameworks</category>
            <category>Agents</category>
            <category>Benchmarks</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #28 - Agent Error Handling: LangChain Wins on Features, But What Does It Actually Catch?]]></title>
            <link>https://engineersofai.com/blog/ai-letters-28-error-handling</link>
            <guid>https://engineersofai.com/blog/ai-letters-28-error-handling</guid>
            <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Notebook #20 benchmarks error handling across LangChain, SynapseKit, and LlamaIndex. LangChain's ToolException converts tool failures to LLM observations in 5 lines. SynapseKit's CircuitBreaker stops compounding failures at the model level. LlamaIndex is entirely DIY.]]></description>
            <content:encoded><![CDATA[<p>LangChain wins on both dimensions - fewest lines (5) and most built-in error features (6/7). But its ToolException converts failures into LLM observations, making the model your error handler. SynapseKit's CircuitBreaker stops broken services from being hammered. LlamaIndex ships 1/7 features and expects you to bring the rest.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-28/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Timeline</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Error Handling History →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">From exception hierarchies (1960s) to circuit breakers (2007) to LLM-native error recovery - the lineage behind today's agent resilience patterns.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-28/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Interactive Explorer</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Three Error Handling Paradigms →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Side-by-side code, error flow diagrams, and feature breakdowns for LangChain, SynapseKit, and LlamaIndex.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-28/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Benchmark Results</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Full Feature Matrix &amp; LoC Charts →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">LoC comparison, 7-feature heatmap, design philosophy cards, and the complementary gap between LangChain and SynapseKit's error coverage.</div>
</div>
</a>
</div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-numbers">The Numbers<a href="https://engineersofai.com/blog/ai-letters-28-error-handling#the-numbers" class="hash-link" aria-label="Direct link to The Numbers" title="Direct link to The Numbers" translate="no">​</a></h2>
<p><strong>Lines of error-handling code (imports + error-specific lines):</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework       Imports  Error lines  Total</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain             2           3      5</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit            2           5      7</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex            2           6      8</span><br></div></code></pre></div></div>
<p><strong>What those lines actually give you (feature depth score):</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Feature                         LangChain  SynapseKit  LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">-----------------------------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Dedicated exception type          Yes        No          No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Error → LLM observation           Yes        No          No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Handle LLM parse errors           Yes        No          No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LLM fallback chain                Yes        Yes         No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Circuit breaker                   No         Yes         No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Max iterations guard              Yes        Yes         Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Custom error handler fn           Yes        No          No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Score (out of 7):                  6          3           1</span><br></div></code></pre></div></div>
<p>The score gap is wide. LangChain ships 6/7 error handling features out of the box. LlamaIndex ships 1. That 1 is max_iterations - a last-resort stop, not a recovery mechanism.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-three-design-philosophies">The Three Design Philosophies<a href="https://engineersofai.com/blog/ai-letters-28-error-handling#the-three-design-philosophies" class="hash-link" aria-label="Direct link to The Three Design Philosophies" title="Direct link to The Three Design Philosophies" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">What happens when a tool throws an exception?</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain                SynapseKit               LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">──────────────────────   ──────────────────────   ──────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">ToolException raised     try/except in            try/except wrapper</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  ↓                        tool.run()               function (manual)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">AgentExecutor catches      ↓                         ↓</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">handle_tool_error=True   return error string       return error string</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  ↓                        ↓                         ↓</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Error becomes LLM        Check CircuitState        Propagates up</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  observation             FallbackChain              (uncaught = crash)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  ↓                       if LLM fails</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LLM tries to recover</span><br></div></code></pre></div></div>
<p><strong>LangChain turns tool errors into LLM observations.</strong> Raise a <code>ToolException</code> inside a tool, set <code>handle_tool_error=True</code> on <code>AgentExecutor</code>, and the exception message becomes a new observation in the agent's thought/action/observation loop. The LLM sees it as: "The tool returned an error: API timeout." It can then reason about it - retry, use a different tool, or tell the user. This is elegant. It's also the source of a subtle failure mode: the LLM will try to reason its way through errors it cannot fix.</p>
<p><strong>SynapseKit handles errors at both layers.</strong> Manual try/except in <code>tool.run()</code> for tool-level failures (return a fallback string). <code>FallbackChain</code> for model-level failures - if <code>gpt-4o-mini</code> fails, automatically retry with <code>gpt-3.5-turbo</code>. <code>CircuitState</code> tracks repeated failures and can short-circuit a tool that keeps breaking. Fewer convenience features. More explicit control over what happens when the model itself is the problem.</p>
<p><strong>LlamaIndex provides no built-in error primitives.</strong> Max iterations as a last resort. Everything else is a wrapper function you write yourself. <code>FunctionTool.from_defaults(fn=safe_search)</code> where <code>safe_search</code> is just a try/except you added manually. The framework makes no distinction between a tool that errored and a tool that returned normally - both return strings.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-langchains-handle_tool_error-actually-does">What LangChain's <code>handle_tool_error</code> Actually Does<a href="https://engineersofai.com/blog/ai-letters-28-error-handling#what-langchains-handle_tool_error-actually-does" class="hash-link" aria-label="Direct link to what-langchains-handle_tool_error-actually-does" title="Direct link to what-langchains-handle_tool_error-actually-does" translate="no">​</a></h2>
<p>This is the mechanism most engineers misunderstand. When you set <code>handle_tool_error=True</code>:</p>
<ol>
<li class="">Your tool raises <code>ToolException("Search failed: API timeout")</code></li>
<li class=""><code>AgentExecutor</code> catches it</li>
<li class="">The error message becomes the next <code>Observation</code> in the ReAct loop</li>
<li class="">The LLM reads: <code>Observation: Search failed: API timeout</code></li>
<li class="">The LLM decides what to do next</li>
</ol>
<p>The LLM is now your error handler. For recoverable errors ("Search failed, try a different query"), this works well. For unrecoverable errors ("Database credentials invalid"), the LLM will loop - trying variations, rephrasing the query, eventually hitting <code>max_iterations</code>. You need both <code>handle_tool_error=True</code> and <code>max_iterations</code> to prevent infinite loops on hard failures.</p>
<p><code>handle_tool_error</code> can also accept a string (fixed message to the LLM) or a callable (function that takes the exception and returns a message). The callable pattern is the most production-safe: you can inspect the exception type and give the LLM targeted instructions for specific error classes.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-28-error-handling#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>For tool-level failures, LangChain's ToolException is the fastest path.</strong> Three lines, immediate recovery loop, no custom code. If your tools are external APIs that occasionally fail, <code>ToolException</code> + <code>handle_tool_error=True</code> gets you working recovery behavior in minutes.</p>
</li>
<li class="">
<p><strong>For model-level failures, LangChain gives you <code>.with_fallbacks()</code>.</strong> Chain multiple models: <code>primary_llm.with_fallbacks([backup_llm])</code>. This is built-in but not wired into <code>AgentExecutor</code> automatically - you need to apply it at the LLM construction step, not the agent step.</p>
</li>
<li class="">
<p><strong>SynapseKit's CircuitBreaker is the only primitive that stops compounding failures.</strong> If a tool fails three times in a row, <code>CircuitState</code> can mark it as open and refuse subsequent calls until a timeout passes. No LLM framework besides SynapseKit ships this by default. In production systems that call external APIs, a circuit breaker is the difference between "the agent degraded gracefully" and "the agent hammered a failing endpoint 47 times."</p>
</li>
<li class="">
<p><strong>LlamaIndex's 1/7 score is a design choice, not a bug.</strong> LlamaIndex's philosophy is composability: you bring your own retry logic, your own circuit breaker, your own fallback chain. The framework won't make assumptions about your error handling policy. For teams with existing resilience infrastructure (Polly, Tenacity, custom retry decorators), this is actually fine - LlamaIndex slots in without conflict.</p>
</li>
<li class="">
<p><strong>The absence of LangChain's parse error handling in the others is significant.</strong> <code>handle_parsing_errors=True</code> catches malformed LLM outputs - when the model returns something that doesn't match the expected ReAct format. This is common with weaker models or unusual prompts. SynapseKit and LlamaIndex both crash on malformed output. LangChain retries with a parsing error message injected back to the LLM.</p>
</li>
</ol>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-thing-most-people-miss">The Thing Most People Miss<a href="https://engineersofai.com/blog/ai-letters-28-error-handling#the-thing-most-people-miss" class="hash-link" aria-label="Direct link to The Thing Most People Miss" title="Direct link to The Thing Most People Miss" translate="no">​</a></h2>
<p>Error handling in LLM agents is not the same problem as error handling in deterministic software.</p>
<p>In a REST API, an error is a signal: something failed, here's the status code, the client decides what to do. The error is the end of the interaction.</p>
<p>In an LLM agent, an error is an observation: something failed, the model reads the error message, and the model decides what to do next. The error is the beginning of a new reasoning step.</p>
<p>LangChain's design is built for this. <code>ToolException</code> is not a crash - it's a structured message to the reasoning loop. The implication: you need to write error messages for an LLM audience, not a developer audience. "API timeout" is poor. "The search API is temporarily unavailable. You can either retry the same query or answer from your training knowledge." is better. The LLM will use that context to make a better decision.</p>
<p>The circuit breaker fills a gap this reasoning loop cannot. If the search API is down for 30 minutes, no amount of LLM reasoning will fix it. The circuit breaker stops the agent from trying 20 more times before giving up. It's the only error primitive that operates outside the reasoning loop entirely - which is exactly why LangChain doesn't have one. LangChain's model is: route everything through the LLM. SynapseKit's model is: some failures should never reach the LLM.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-28-error-handling#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Add <code>handle_parsing_errors=True</code> to every <code>AgentExecutor</code> you have in production.</strong> Malformed LLM outputs are silent failures without this. One extra kwarg, zero code changes.</p>
</li>
<li class="">
<p><strong>Audit your tool exception messages for LLM readability.</strong> If you're using <code>handle_tool_error=True</code>, the error message is going to the model. Rewrite your <code>ToolException</code> strings as instructions: what happened, what the LLM can try instead.</p>
</li>
<li class="">
<p><strong>Count how many times each external tool is called in a single agent run.</strong> If any tool can be called more than 5 times, you need a circuit breaker or a call cap. Without one, a single stuck agent can exhaust an API quota.</p>
</li>
</ol>
<p>The five-line win is real. What you do with it determines whether errors become recoverable observations or infinite loops.</p>
<hr>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM Frameworks</category>
            <category>Agents</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #27 - Agent Observability: 3 Lines Gets You In, But What Can You Actually See?]]></title>
            <link>https://engineersofai.com/blog/ai-letters-27-observability</link>
            <guid>https://engineersofai.com/blog/ai-letters-27-observability</guid>
            <pubDate>Thu, 16 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Notebook]]></description>
            <content:encoded><![CDATA[<blockquote>
<p>"Three lines to enable tracing in LangChain. Zero lines of latency data when you're done."</p>
</blockquote>
<p>Every agent fails eventually. A tool returns nothing. The LLM loops on the same thought. The retrieved documents are all wrong. What separates a two-minute debug from a two-hour one is not how the agent was built - it's how much you can see when it breaks.</p>
<p>Notebook #19 of the LLM Showdown measured one thing: how much can you observe about a running agent without leaving your local environment? No external service. No API key for a tracing platform. No paid tier. Just framework-native observability on the same machine where your code runs.</p>
<p>LangChain enables tracing in the fewest lines. What those lines actually surface is a different question.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-27/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Observability History Timeline →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">From print debugging to distributed tracing to LLM-specific observability. Click each milestone to see how visibility into running systems evolved and what each generation got wrong.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-27/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Interactive Explorer</div>
<div style="font-size:1rem;font-weight:700;color:#0ea5e9;margin-bottom:6px">Tracing Design Explorer →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Click each of the 3 tracing approaches - Tracer object, global flags, callback manager - to see exactly what you can observe, how to query it, and what you're missing locally.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-27/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Evidence Dashboard</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Full Benchmark Results →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">LoC stacked chart, feature depth heatmap, and design philosophy comparison - all data from notebook #19 in one view.</div>
</div>
</a>
</div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-numbers">The Numbers<a href="https://engineersofai.com/blog/ai-letters-27-observability#the-numbers" class="hash-link" aria-label="Direct link to The Numbers" title="Direct link to The Numbers" translate="no">​</a></h2>
<p><strong>Lines of code to enable useful local tracing (no external service, no API key):</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework       Imports  Enable   Total</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">-----------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain             1       2       3</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex            2       2       4</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit            2       5       7</span><br></div></code></pre></div></div>
<p>LangChain wins by a wide margin. <code>set_verbose(True)</code> is one line. Add <code>set_debug(True)</code> for full raw prompt logging. That's it.</p>
<p><strong>What those lines actually surface locally (feature depth score):</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Feature                         SynapseKit  LangChain  LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">------------------------------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Token usage                     Yes         Partial    Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Step latency                    Yes         No         Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Intermediate agent steps        Yes         Yes        Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Tool call args + returns        Yes         Yes        Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Full raw LLM prompt             Yes         Yes        Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Retrieved documents             Yes         Yes        Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Zero-config enable (1-2 lines)  Yes         Yes        No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Score (out of 7):                 7           5          6</span><br></div></code></pre></div></div>
<p>The latency row is where LangChain's 3-line win costs you the most. <code>set_verbose(True)</code> and <code>set_debug(True)</code> print chain I/O, tool calls, and agent reasoning to stdout. They do not record how long any step took. For timing data - how long did the LLM call take, how long did the tool execution take, which step is the bottleneck - LangChain requires LangSmith, which is an external service.</p>
<p>Token usage is similarly partial: verbose mode shows counts in the output, but not in a structured object you can query. For cost tracking per run, again: LangSmith.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-three-design-philosophies">The Three Design Philosophies<a href="https://engineersofai.com/blog/ai-letters-27-observability#the-three-design-philosophies" class="hash-link" aria-label="Direct link to The Three Design Philosophies" title="Direct link to The Three Design Philosophies" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">How does tracing work?</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit            LangChain             LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">──────────────────    ──────────────────    ──────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Explicit object       Global side effect    Injected callback</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">tracer = Tracer()     set_verbose(True)     handler = LlamaDebug</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">agent  = Agent(       # all agents now      Settings.callback_manager</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  middleware=[tracer])  emit to stdout        = CallbackManager(</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">result = await          automatically         [handler])</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  agent.run(query)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">tracer.spans          No object to query    handler.get_event_pairs</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  → structured list   → redirect stderr       (CBEventType.LLM)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                        to capture            → typed event list</span><br></div></code></pre></div></div>
<p><strong>SynapseKit uses an explicit Tracer object.</strong> You pass it into the agent at construction time. After the run, you query <code>tracer.spans</code> to get a structured list of <code>TraceSpan</code> objects - one per event, with <code>duration_ms</code>, metadata, and full payload. This is testable: you can assert on specific spans in a unit test. It's composable: you can pass different tracers to different agents in the same application.</p>
<p><strong>LangChain uses global flags.</strong> <code>set_verbose(True)</code> is a global side effect that makes all subsequent LangChain objects emit structured logs to stderr. No object to query. No programmatic access to events after the run. To capture the output you redirect stderr - which is exactly the kind of code you don't want in production. The upside: one line, zero configuration, works immediately on any existing agent.</p>
<p><strong>LlamaIndex uses a callback manager injected via Settings.</strong> <code>LlamaDebugHandler</code> is the most sophisticated of the three locally. After a run, you call <code>debug_handler.get_event_pairs(CBEventType.LLM)</code> to get typed event pairs (start + end) for every LLM call. <code>CBEventType.FUNCTION_CALL</code> for tool events. <code>CBEventType.RETRIEVE</code> for retrieval events. The event type enum covers the full taxonomy of what an LLM pipeline does. The downside: 4 lines to set up, and the Settings injection pattern means it affects all agents globally - same problem as LangChain's flags, just more structured.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-langsmith-actually-solves">What LangSmith Actually Solves<a href="https://engineersofai.com/blog/ai-letters-27-observability#what-langsmith-actually-solves" class="hash-link" aria-label="Direct link to What LangSmith Actually Solves" title="Direct link to What LangSmith Actually Solves" translate="no">​</a></h2>
<p>LangChain's local observability gap is not an accident. The missing features - step latency, structured cost tracking, run replay - are exactly what LangSmith provides. This is an intentional split: local verbose mode for development debugging, LangSmith for production observability.</p>
<p>LangSmith is free to start (up to 5,000 traces/month). For production systems it becomes a meaningful cost. More importantly, it's an external dependency: your observability now requires internet access, an API key in your environment, and a third-party service to be running. For air-gapped deployments, containerised CI environments, or applications where you can't send LLM prompts to a third party, this is a hard constraint.</p>
<p>SynapseKit and LlamaIndex both give you timing and structured event access locally. That's not because LangChain missed these features - it's because they made a different product decision about where the boundary between framework and platform should be.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-27-observability#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>For development debugging, LangChain's <code>set_verbose(True)</code> is genuinely the fastest path.</strong> One line, immediate output, zero configuration. If all you need is "show me what the agent is doing", this works.</p>
</li>
<li class="">
<p><strong>If you need timing data locally, LangChain is the wrong tool.</strong> No step latency without LangSmith. If you're profiling which part of your agent pipeline is slow - LLM call, tool execution, retrieval - you need SynapseKit's <code>TraceSpan.duration_ms</code> or LlamaIndex's event timestamps.</p>
</li>
<li class="">
<p><strong>LlamaIndex's <code>CBEventType</code> query API is the most powerful post-run interface.</strong> After a run you can ask: how many LLM calls happened? What were the inputs and outputs? Which retrieval queries ran? All typed, all queryable. It's verbose to set up but the richest local interface of the three.</p>
</li>
<li class="">
<p><strong>SynapseKit's Tracer is the only one designed for testing.</strong> Because it returns a structured object, you can write assertions: <code>assert tracer.spans[2].name == "TOOL_CALL"</code>. You can verify that a tool was called with the right arguments. You can check that the token count stayed under a budget. None of this is possible with global flags or Settings injection.</p>
</li>
<li class="">
<p><strong>Global state is a production smell.</strong> Both <code>set_verbose(True)</code> and <code>Settings.callback_manager</code> are global mutations. In a multi-tenant system, a test suite, or any application where you want different tracing behaviour for different agents, global state is a problem. SynapseKit's explicit middleware pattern is the only one that avoids this.</p>
</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-thing-most-people-miss">The Thing Most People Miss<a href="https://engineersofai.com/blog/ai-letters-27-observability#the-thing-most-people-miss" class="hash-link" aria-label="Direct link to The Thing Most People Miss" title="Direct link to The Thing Most People Miss" translate="no">​</a></h2>
<p>Observability during development and observability in production are different problems.</p>
<p>During development, you want maximum visibility with minimum setup. LangChain's <code>set_verbose(True)</code> wins here. You run the agent, watch the terminal, understand what happened.</p>
<p>In production, you need structured, queryable, per-run data without global side effects. You need latency. You need the ability to replay a specific failing run. You need to assert "this run used fewer than 2,000 tokens" in a regression test. LangChain's local tooling doesn't give you this - LangSmith does, but at the cost of an external dependency.</p>
<p>The frameworks that win on development convenience (global flags, one-line setup) tend to create friction in production (no structured objects, no local timing). The frameworks that win on production correctness (explicit Tracer, typed callbacks) require more setup. This is not a bug in either design. It's the same tradeoff that appears in every layer of software engineering: explicitness versus convenience, always at the cost of the other.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-27-observability#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Add one timing assertion to your agent test suite.</strong> Pick the most critical tool call in your pipeline and assert that it completes under a threshold. If your framework doesn't expose duration, that's the data point you need.</p>
</li>
<li class="">
<p><strong>Check whether your tracing uses global state.</strong> If you're using <code>set_verbose(True)</code> or <code>Settings.callback_manager</code> in a production environment, document exactly what gets emitted and where. Uncontrolled log output to stderr in a containerised environment is a reliability hazard.</p>
</li>
<li class="">
<p><strong>Run an agent that fails intentionally and time how long it takes to diagnose.</strong> Inject a tool that throws an exception mid-run. Measure how long it takes to identify: which step failed, what arguments it was called with, and what the LLM thought immediately before the call. That time is your observability gap.</p>
</li>
</ol>
<p>The 3-line setup is the beginning of observability, not the end of it.</p>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM</category>
            <category>Agents</category>
            <category>Benchmarks</category>
        </item>
        <item>
            <title><![CDATA[I Built a Lightweight LLM Framework Because LangChain Frustrated Me - Here's What I Learned]]></title>
            <link>https://engineersofai.com/blog/synapsekit-why-i-built-it</link>
            <guid>https://engineersofai.com/blog/synapsekit-why-i-built-it</guid>
            <pubDate>Wed, 15 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[The story of SynapseKit - why it exists, what it does differently, and what 18 objective benchmarks against LangChain and LlamaIndex actually revealed.]]></description>
            <content:encoded><![CDATA[<p>There's a moment every LLM developer knows. You've got a working prototype. It's elegant, fast, and does exactly what you need. Then you try to deploy it. And suddenly you're debugging a chain inside a runnable inside a callback inside an abstraction that didn't exist six months ago.</p>
<p>That moment happened one too many times. So something else got built.</p>
<p>This is the story of SynapseKit - why it exists, what it does differently, and what 18 (and counting) objective benchmarks against LangChain and LlamaIndex actually revealed.</p>
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-problem-with-the-standard">The Problem With "The Standard"<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#the-problem-with-the-standard" class="hash-link" aria-label="Direct link to The Problem With &quot;The Standard&quot;" title="Direct link to The Problem With &quot;The Standard&quot;" translate="no">​</a></h2>
<p>Every developer building LLM-powered applications today reaches for the same toolkit: LangChain or LlamaIndex. They're powerful, well-documented, and have massive communities. They're also, frankly, a pain to work with day-to-day.</p>
<p>Not bad. Just built for different goals.</p>
<p>LangChain's philosophy is maximum flexibility: there's an abstraction for everything, a chain for every use case, and 87 packages you can bolt on. It's impressive engineering. It's also a framework that treats simple tasks like they're distributed systems problems.</p>
<p>LlamaIndex's philosophy is data ingestion depth: best-in-class chunking, indexing, and retrieval. If your application lives and dies by retrieval precision, LlamaIndex is serious software. But you pay for that depth in complexity.</p>
<p>Both are solving real problems. But neither optimises for the thing that matters most when building production LLM systems:</p>
<p><strong>How fast can I go from idea to working code, and how readable is that code six months later?</strong></p>
<p>After the fifth time debugging a LangChain stack trace that pointed three abstraction layers away from the actual code, SynapseKit started getting written.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-is-synapsekit">What Is SynapseKit?<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#what-is-synapsekit" class="hash-link" aria-label="Direct link to What Is SynapseKit?" title="Direct link to What Is SynapseKit?" translate="no">​</a></h2>
<p>SynapseKit is an async-first Python framework for building RAG pipelines, LLM agents, and multi-agent systems. It ships with:</p>
<ul>
<li class=""><strong>31 LLM providers</strong> - OpenAI, Anthropic, Groq, Mistral, Gemini, Ollama, LMStudio, xAI, Novita, Writer, and 21 more</li>
<li class=""><strong>48 built-in tools</strong> - search, math, file I/O, HTTP, code execution, NLP, data analysis, and more</li>
<li class=""><strong>43 document loaders</strong> - PDF, EPUB, LaTeX, RTF, TSV, S3, Azure Blob, MongoDB, Dropbox, OneDrive, and more</li>
<li class=""><strong>MCP server support</strong> - SSE transport with Bearer auth for Model Context Protocol</li>
<li class=""><strong>Multi-agent primitives</strong> - ReActAgent, Crew/CrewAgent/Task, graph-based workflows, recursive subgraphs</li>
</ul>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">pip install "synapsekit[semantic]"</span><br></div></code></pre></div></div>
<p>The base install has 2 dependencies. The full semantic install - vector search, all loaders, all tools - pulls in 14 packages. LangChain installs 67. That's not a rounding error; it's a design philosophy.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">synapsekit               →  2 deps  |  ~48 MB RAM  |  ~80ms startup</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">synapsekit[semantic]     → 14 deps  |</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">langchain                → 67 deps  | ~189 MB RAM  |  ~2.4s startup</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">llama-index-core         → 43 deps  | ~112 MB RAM  |  ~1.1s startup</span><br></div></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-30-benchmark-series">The 30-Benchmark Series<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#the-30-benchmark-series" class="hash-link" aria-label="Direct link to The 30-Benchmark Series" title="Direct link to The 30-Benchmark Series" translate="no">​</a></h2>
<p>Rather than writing a marketing post, a 30-notebook benchmark series was run on Kaggle comparing SynapseKit to LangChain 0.3 and LlamaIndex Core 0.12. One measurable dimension per notebook. Every notebook runs end-to-end on Kaggle free CPU. Results reported honestly - including when SynapseKit loses.</p>
<p><strong>Follow the full series: <a href="https://www.kaggle.com/discussions/general/688339" target="_blank" rel="noopener noreferrer" class="">kaggle.com/discussions/general/688339</a></strong></p>
<p>Here's everything found so far.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="week-1-developer-experience">Week 1: Developer Experience<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#week-1-developer-experience" class="hash-link" aria-label="Direct link to Week 1: Developer Experience" title="Direct link to Week 1: Developer Experience" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1---cold-start-synapsekit-wins-by-30">#1 - Cold Start: SynapseKit wins by 30×<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#1---cold-start-synapsekit-wins-by-30" class="hash-link" aria-label="Direct link to #1 - Cold Start: SynapseKit wins by 30×" title="Direct link to #1 - Cold Start: SynapseKit wins by 30×" translate="no">​</a></h3>
<p>The first thing you notice when you import a framework is the wait. For Lambda functions, FastAPI startup, or any process that imports on every cold start, this compounds fast.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> time</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">t </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> time</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">perf_counter</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> synapsekit</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"SynapseKit: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">time</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">perf_counter</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">(</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">)</span><span class="token string-interpolation interpolation"> </span><span class="token string-interpolation interpolation operator" style="color:#393A34">-</span><span class="token string-interpolation interpolation"> t</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">.3f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">s"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">   </span><span class="token comment" style="color:#999988;font-style:italic"># 0.082s</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">t </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> time</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">perf_counter</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> langchain</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"LangChain:  </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">time</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">perf_counter</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">(</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">)</span><span class="token string-interpolation interpolation"> </span><span class="token string-interpolation interpolation operator" style="color:#393A34">-</span><span class="token string-interpolation interpolation"> t</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">.3f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">s"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">   </span><span class="token comment" style="color:#999988;font-style:italic"># 2.41s</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">t </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> time</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">perf_counter</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> llama_index</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"LlamaIndex: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">time</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">perf_counter</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">(</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">)</span><span class="token string-interpolation interpolation"> </span><span class="token string-interpolation interpolation operator" style="color:#393A34">-</span><span class="token string-interpolation interpolation"> t</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">.3f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">s"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">   </span><span class="token comment" style="color:#999988;font-style:italic"># 1.08s</span><br></div></code></pre></div></div>
<p>SynapseKit: ~80ms. LangChain: ~2.4s. LlamaIndex: ~1.1s.</p>
<p>At 1,000 cold starts per day - realistic for a mid-traffic serverless API - LangChain burns 40 minutes of pure overhead. SynapseKit burns 1.3 minutes. In AWS Lambda terms, that's real money.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2---dependency-count-synapsekit-wins-by-33">#2 - Dependency Count: SynapseKit wins by 33×<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#2---dependency-count-synapsekit-wins-by-33" class="hash-link" aria-label="Direct link to #2 - Dependency Count: SynapseKit wins by 33×" title="Direct link to #2 - Dependency Count: SynapseKit wins by 33×" translate="no">​</a></h3>

























<table><thead><tr><th>Framework</th><th>Base install</th><th>Full install</th></tr></thead><tbody><tr><td>SynapseKit</td><td>2 packages</td><td>14 packages</td></tr><tr><td>LlamaIndex Core</td><td>43 packages</td><td>70+ packages</td></tr><tr><td>LangChain</td><td>67 packages</td><td>100+ packages</td></tr></tbody></table>
<p>Fewer dependencies means faster installs, smaller container images, fewer CVE surface, and less <code>pip freeze</code> archaeology when something breaks.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3---hello-rag-synapsekit-wins-fewest-lines">#3 - Hello RAG: SynapseKit wins (fewest lines)<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#3---hello-rag-synapsekit-wins-fewest-lines" class="hash-link" aria-label="Direct link to #3 - Hello RAG: SynapseKit wins (fewest lines)" title="Direct link to #3 - Hello RAG: SynapseKit wins (fewest lines)" translate="no">​</a></h3>
<p>The same RAG pipeline - load documents, embed, retrieve, answer - across three frameworks:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># SynapseKit: 7 functional lines</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> RAGPipeline</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> LLMConfig</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">openai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> OpenAILLM</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">llm      </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OpenAILLM</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">LLMConfig</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-4o-mini"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">KEY</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">pipeline </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> RAGPipeline</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">llm</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">llm</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">pipeline</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">docs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">answer   </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> pipeline</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"What is RAG?"</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># LangChain: 14 functional lines</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_openai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> ChatOpenAI</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> OpenAIEmbeddings</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">vectorstores </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> FAISS</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">output_parsers </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> StrOutputParser</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">runnables </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> RunnablePassthrough</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> hub</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">llm         </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ChatOpenAI</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-4o-mini"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">embeddings  </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OpenAIEmbeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">vectorstore </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> FAISS</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">docs</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> embeddings</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">retriever   </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> vectorstore</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">as_retriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">prompt      </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> hub</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">pull</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"rlm/rag-prompt"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">chain       </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"context"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> retriever</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"question"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> RunnablePassthrough</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">               </span><span class="token operator" style="color:#393A34">|</span><span class="token plain"> prompt </span><span class="token operator" style="color:#393A34">|</span><span class="token plain"> llm </span><span class="token operator" style="color:#393A34">|</span><span class="token plain"> StrOutputParser</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">answer      </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> chain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">invoke</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"What is RAG?"</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>SynapseKit: 7 lines. LangChain: 14 lines. LlamaIndex: 11 lines.</p>
<p>This isn't code golf. Fewer lines means fewer places for bugs to hide, fewer things for a new team member to learn, and faster iteration. The LangChain version requires knowing what a runnable is, what <code>hub.pull</code> does, and why <code>RunnablePassthrough</code> is needed. The SynapseKit version is self-explanatory.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4---memory-footprint-synapsekit-wins-by-4">#4 - Memory Footprint: SynapseKit wins by 4×<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#4---memory-footprint-synapsekit-wins-by-4" class="hash-link" aria-label="Direct link to #4 - Memory Footprint: SynapseKit wins by 4×" title="Direct link to #4 - Memory Footprint: SynapseKit wins by 4×" translate="no">​</a></h3>





















<table><thead><tr><th>Framework</th><th>RSS at import</th></tr></thead><tbody><tr><td>SynapseKit</td><td>48 MB</td></tr><tr><td>LlamaIndex</td><td>112 MB</td></tr><tr><td>LangChain</td><td>189 MB</td></tr></tbody></table>
<p>At 10 replicas, LangChain costs ~1.4 GB just in framework overhead. SynapseKit costs ~480 MB. For containerised deployments where you're paying per GB of memory, that difference compounds fast.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="5---provider-switching-synapsekit-wins-2-lines-changed">#5 - Provider Switching: SynapseKit wins (2 lines changed)<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#5---provider-switching-synapsekit-wins-2-lines-changed" class="hash-link" aria-label="Direct link to #5 - Provider Switching: SynapseKit wins (2 lines changed)" title="Direct link to #5 - Provider Switching: SynapseKit wins (2 lines changed)" translate="no">​</a></h3>
<p>One of the most common tasks in LLM development is experimenting across providers. How many lines change when you swap from OpenAI to Groq to Ollama?</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># SynapseKit - change 1 import + 1 config line</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">openai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> OpenAILLM</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">llm </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OpenAILLM</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">LLMConfig</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-4o-mini"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">OPENAI_KEY</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">groq </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> GroqLLM</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">llm </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> GroqLLM</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">LLMConfig</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"llama-3-8b-8192"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">GROQ_KEY</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">ollama </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> OllamaLLM</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">llm </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OllamaLLM</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">LLMConfig</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"llama3"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Everything downstream: unchanged.</span><br></div></code></pre></div></div>
<p>SynapseKit: 2 lines. LangChain: 4–6 lines. LlamaIndex: 3–4 lines.</p>
<p>31 providers, all following the same <code>LLMConfig</code> pattern. Switching from a paid API to a local model for development takes 10 seconds.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="week-2-rag-pipelines">Week 2: RAG Pipelines<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#week-2-rag-pipelines" class="hash-link" aria-label="Direct link to Week 2: RAG Pipelines" title="Direct link to Week 2: RAG Pipelines" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="8---pdf-ingestion-all-close">#8 - PDF Ingestion: All close<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#8---pdf-ingestion-all-close" class="hash-link" aria-label="Direct link to #8 - PDF Ingestion: All close" title="Direct link to #8 - PDF Ingestion: All close" translate="no">​</a></h3>
<p>All three frameworks can index a PDF in under 10 lines. This one's effectively a draw - SynapseKit slightly more concise but the gap is small.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="9---chunking-strategies-llamaindex-wins">#9 - Chunking Strategies: LlamaIndex wins<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#9---chunking-strategies-llamaindex-wins" class="hash-link" aria-label="Direct link to #9 - Chunking Strategies: LlamaIndex wins" title="Direct link to #9 - Chunking Strategies: LlamaIndex wins" translate="no">​</a></h3>
<p>This is where LlamaIndex genuinely excels.</p>
<p>LlamaIndex ships 9+ built-in splitters including <code>SentenceWindowNodeParser</code> (adds surrounding context sentences to each chunk) and <code>HierarchicalNodeParser</code> (creates parent-child chunk trees for better retrieval). These are sophisticated, research-backed strategies that meaningfully improve retrieval quality.</p>
<p>SynapseKit and LangChain both offer token-based and sentence-based splitting - adequate for most use cases, but not at LlamaIndex's depth.</p>
<p><strong>If your application's quality depends on smart chunking, LlamaIndex is the right choice for the retrieval layer.</strong></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="10---built-in-bm25-synapsekit-wins">#10 - Built-in BM25: SynapseKit wins<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#10---built-in-bm25-synapsekit-wins" class="hash-link" aria-label="Direct link to #10 - Built-in BM25: SynapseKit wins" title="Direct link to #10 - Built-in BM25: SynapseKit wins" translate="no">​</a></h3>
<p>BM25 is the backbone of lexical search and an essential half of any hybrid retrieval system. In SynapseKit, it's a core dependency - no extra install.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># SynapseKit - BM25 built in, zero extra pip</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrievers </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> BM25Retriever</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">retriever </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> BM25Retriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">documents</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">results   </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> retriever</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrieve</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"machine learning transformers"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">5</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>LangChain requires <code>pip install rank-bm25</code> and additional wiring. LlamaIndex similarly requires an extra install. For a technique this fundamental to production RAG, burying it behind an extra install is a friction tax.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="11---hybrid-search-rrf-fusion-langchain-wins">#11 - Hybrid Search (RRF Fusion): LangChain wins<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#11---hybrid-search-rrf-fusion-langchain-wins" class="hash-link" aria-label="Direct link to #11 - Hybrid Search (RRF Fusion): LangChain wins" title="Direct link to #11 - Hybrid Search (RRF Fusion): LangChain wins" translate="no">​</a></h3>
<p>Reciprocal Rank Fusion blends BM25 lexical scores and semantic embedding scores into a single ranked list - typically outperforming either alone by 5–15% on BEIR benchmarks.</p>
<p>LangChain's <code>EnsembleRetriever</code> is the cleanest API for this. SynapseKit supports hybrid retrieval but requires more manual wiring at present. Honest finding: LangChain wins this one.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="12---streaming-rag-effectively-a-draw-async-ergonomics-synapsekit">#12 - Streaming RAG: Effectively a draw (async ergonomics: SynapseKit)<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#12---streaming-rag-effectively-a-draw-async-ergonomics-synapsekit" class="hash-link" aria-label="Direct link to #12 - Streaming RAG: Effectively a draw (async ergonomics: SynapseKit)" title="Direct link to #12 - Streaming RAG: Effectively a draw (async ergonomics: SynapseKit)" translate="no">​</a></h3>
<p>All three frameworks achieve sub-millisecond TTFT in a mock environment. The real differences are at the API layer, not the framework layer. But the streaming API ergonomics differ:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># SynapseKit - stream tokens as they arrive</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> token </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> llm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">stream</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Explain transformers in simple terms"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">token</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> end</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">""</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> flush</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>LangChain requires <code>astream()</code> on runnables. LlamaIndex requires a <code>StreamingResponse</code> wrapper. Small differences, but they accumulate across a codebase.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="13---conversation-memory-synapsekit-wins-clarity">#13 - Conversation Memory: SynapseKit wins (clarity)<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#13---conversation-memory-synapsekit-wins-clarity" class="hash-link" aria-label="Direct link to #13 - Conversation Memory: SynapseKit wins (clarity)" title="Direct link to #13 - Conversation Memory: SynapseKit wins (clarity)" translate="no">​</a></h3>

























<table><thead><tr><th>Framework</th><th>API</th><th>Trimming strategy</th></tr></thead><tbody><tr><td>SynapseKit</td><td><code>ConversationMemory(window=3)</code></td><td>Turn-count sliding window</td></tr><tr><td>LangChain</td><td><code>InMemoryChatMessageHistory</code></td><td>Manual - stores everything, you trim</td></tr><tr><td>LlamaIndex</td><td><code>ChatMemoryBuffer.from_defaults(token_limit=500)</code></td><td>Token-budget trimming</td></tr></tbody></table>
<p>SynapseKit's <code>window=</code> parameter is the most beginner-friendly. LlamaIndex's token-budget approach is the most robust for production - especially when dealing with long tool outputs that blow up turn-count estimates.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="week-3-agents--tools">Week 3: Agents &amp; Tools<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#week-3-agents--tools" class="hash-link" aria-label="Direct link to Week 3: Agents &amp; Tools" title="Direct link to Week 3: Agents &amp; Tools" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="15---react-agents-synapsekit-wins-3-lines-vs-11">#15 - ReAct Agents: SynapseKit wins (3 lines vs 11)<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#15---react-agents-synapsekit-wins-3-lines-vs-11" class="hash-link" aria-label="Direct link to #15 - ReAct Agents: SynapseKit wins (3 lines vs 11)" title="Direct link to #15 - ReAct Agents: SynapseKit wins (3 lines vs 11)" translate="no">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># SynapseKit: 3 lines to a working ReAct agent</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> ReActAgent</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">tools </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> CalculatorTool</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> DateTimeTool</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">agent  </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ReActAgent</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">llm</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">llm</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> tools</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">CalculatorTool</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> DateTimeTool</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> max_iterations</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">10</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> agent</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"What is 847 × 23, and what day is it today?"</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>SynapseKit: 3 lines. LangChain: 11 lines (requires <code>create_react_agent</code> + <code>AgentExecutor</code> + a prompt template from LangSmith hub). LlamaIndex: 9 lines.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="16---function-calling-synapsekit-wins-multi-provider-schemas">#16 - Function Calling: SynapseKit wins (multi-provider schemas)<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#16---function-calling-synapsekit-wins-multi-provider-schemas" class="hash-link" aria-label="Direct link to #16 - Function Calling: SynapseKit wins (multi-provider schemas)" title="Direct link to #16 - Function Calling: SynapseKit wins (multi-provider schemas)" translate="no">​</a></h3>
<p>SynapseKit's <code>BaseTool</code> generates both OpenAI-format and Anthropic-format schemas from a single tool definition. Write a tool once, use it with any provider:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">WeatherTool</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">BaseTool</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    name        </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"get_weather"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    description </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Get the current weather for a city."</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    parameters  </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"object"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"properties"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"city"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"string"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"description"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"City name"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"required"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"city"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> city</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"Sunny, 22°C in </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">city</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">tool </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> WeatherTool</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">tool</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">schema</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">             </span><span class="token comment" style="color:#999988;font-style:italic"># → OpenAI tools format</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">tool</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">anthropic_schema</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">   </span><span class="token comment" style="color:#999988;font-style:italic"># → Anthropic tool_use format</span><br></div></code></pre></div></div>
<p>One tool definition. Zero vendor lock-in. Switch your LLM provider and your tools come with you.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="17---built-in-tool-libraries-synapsekit-wins-by-a-wide-margin">#17 - Built-in Tool Libraries: SynapseKit wins by a wide margin<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#17---built-in-tool-libraries-synapsekit-wins-by-a-wide-margin" class="hash-link" aria-label="Direct link to #17 - Built-in Tool Libraries: SynapseKit wins by a wide margin" title="Direct link to #17 - Built-in Tool Libraries: SynapseKit wins by a wide margin" translate="no">​</a></h3>

























<table><thead><tr><th>Framework</th><th>Built-in tools</th><th>Zero-config (no API key needed)</th></tr></thead><tbody><tr><td>SynapseKit</td><td>48 across 9 categories</td><td>12</td></tr><tr><td>LangChain</td><td>~17 core + community</td><td>Most need extra installs</td></tr><tr><td>LlamaIndex</td><td>3 core wrappers</td><td>3</td></tr></tbody></table>
<p>SynapseKit's 9 tool categories - 48 tools ready to drop into any agent:</p>













































<table><thead><tr><th>Category</th><th>Tools</th></tr></thead><tbody><tr><td>Search</td><td>WebSearchTool, WikipediaTool, NewsSearchTool</td></tr><tr><td>Math</td><td>CalculatorTool, StatisticsCalculatorTool, UnitConverterTool</td></tr><tr><td>Date/Time</td><td>DateTimeTool, TimezoneConverterTool, CalendarTool</td></tr><tr><td>Text Processing</td><td>TextSummarizerTool, TextTranslatorTool, KeywordExtractorTool</td></tr><tr><td>File I/O</td><td>FileReaderTool, FileWriterTool, CSVReaderTool, JSONParserTool</td></tr><tr><td>HTTP</td><td>HTTPRequestTool, APIClientTool</td></tr><tr><td>Code Execution</td><td>PythonREPLTool, ShellCommandTool</td></tr><tr><td>Data Analysis</td><td>DataFrameAnalyzerTool, ChartGeneratorTool</td></tr><tr><td>NLP</td><td>SentimentAnalysisTool, NamedEntityRecognitionTool</td></tr></tbody></table>
<p>With LangChain, getting a working tool usually means installing a community package, finding an API key, and reading a separate doc page. With SynapseKit, 12 tools work with zero configuration.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="18---multi-agent-orchestration-synapsekit-wins-fewest-lines--most-patterns">#18 - Multi-Agent Orchestration: SynapseKit wins (fewest lines + most patterns)<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#18---multi-agent-orchestration-synapsekit-wins-fewest-lines--most-patterns" class="hash-link" aria-label="Direct link to #18 - Multi-Agent Orchestration: SynapseKit wins (fewest lines + most patterns)" title="Direct link to #18 - Multi-Agent Orchestration: SynapseKit wins (fewest lines + most patterns)" translate="no">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Crew</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> CrewAgent</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> Task</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">researcher </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> CrewAgent</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"researcher"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> role</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Research Analyst"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    goal</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Produce structured bullet points."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> llm</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">llm</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">writer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> CrewAgent</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"writer"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> role</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Content Writer"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    goal</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Turn bullet points into a polished paragraph."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> llm</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">llm</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">tasks </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    Task</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">description</span><span class="token operator" style="color:#393A34">=</span><span class="token string-interpolation string" style="color:#e3116c">f"Research: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">TOPIC</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> agent</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"researcher"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">         expected_output</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"3–5 bullet points"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    Task</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">description</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Write a paragraph from the research."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> agent</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"writer"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">         context_from</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"researcher"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> expected_output</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"One paragraph"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">crew   </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Crew</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">agents</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">researcher</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> writer</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> tasks</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">tasks</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> process</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"sequential"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> crew</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>The <code>context_from=</code> parameter is the key insight: tasks declare their data dependencies declaratively. The framework handles execution order and context passing.</p>
<p><strong>Orchestration pattern support:</strong></p>





















































<table><thead><tr><th>Pattern</th><th>SynapseKit</th><th>LangChain</th><th>LlamaIndex</th></tr></thead><tbody><tr><td>Sequential</td><td>✅</td><td>✅</td><td>✅</td></tr><tr><td>Parallel</td><td>✅</td><td>✅</td><td>❌</td></tr><tr><td>Supervisor</td><td>✅</td><td>✅</td><td>❌</td></tr><tr><td>Handoff chain</td><td>✅</td><td>❌ (manual)</td><td>✅</td></tr><tr><td>Graph / DAG</td><td>✅</td><td>✅ (LangGraph)</td><td>❌</td></tr><tr><td>Shared state</td><td>✅</td><td>✅</td><td>✅</td></tr><tr><td><strong>Score</strong></td><td><strong>6/6</strong></td><td><strong>5/6</strong></td><td><strong>3/6</strong></td></tr></tbody></table>
<p>LangChain's LangGraph is genuinely excellent for complex conditional workflows - if you need a state machine with branching logic, it's the right tool. SynapseKit's graph support handles the majority of production patterns with less ceremony.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="cumulative-scorecard-18-notebooks-in">Cumulative Scorecard (18 notebooks in)<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#cumulative-scorecard-18-notebooks-in" class="hash-link" aria-label="Direct link to Cumulative Scorecard (18 notebooks in)" title="Direct link to Cumulative Scorecard (18 notebooks in)" translate="no">​</a></h2>

























<table><thead><tr><th>Framework</th><th>Points</th><th>Category wins</th></tr></thead><tbody><tr><td>SynapseKit</td><td>38</td><td>12 - cold start, dependencies, LoC, memory, provider switching, BM25, streaming ergonomics, memory clarity, ReAct agents, function calling, tools, multi-agent</td></tr><tr><td>LangChain</td><td>22</td><td>3 - hybrid search RRF, LangGraph flexibility, error UX</td></tr><tr><td>LlamaIndex</td><td>18</td><td>2 - chunking depth, token-budget memory</td></tr></tbody></table>
<p>SynapseKit leads on developer ergonomics and batteries-included tooling. LangChain leads on complex graph orchestration. LlamaIndex leads on retrieval precision.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="architecture-what-makes-synapsekit-different">Architecture: What Makes SynapseKit Different<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#architecture-what-makes-synapsekit-different" class="hash-link" aria-label="Direct link to Architecture: What Makes SynapseKit Different" title="Direct link to Architecture: What Makes SynapseKit Different" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-async-by-default---not-retrofitted">1. Async by default - not retrofitted<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#1-async-by-default---not-retrofitted" class="hash-link" aria-label="Direct link to 1. Async by default - not retrofitted" title="Direct link to 1. Async by default - not retrofitted" translate="no">​</a></h3>
<p>SynapseKit was designed async from the ground up. Every <code>run()</code>, every <code>query()</code>, every tool call returns a coroutine.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Concurrent queries - not sequential</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> asyncio</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">gather</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    pipeline</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"What is the capital of France?"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    pipeline</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Explain backpropagation in 2 sentences."</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    pipeline</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Summarise the attached PDF."</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>In LangChain, async is available but not the default. Many features exist only in sync form and async was added later. The difference is subtle in a tutorial, significant in a production API.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-shallow-call-stack---your-errors-not-ours">2. Shallow call stack - your errors, not ours<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#2-shallow-call-stack---your-errors-not-ours" class="hash-link" aria-label="Direct link to 2. Shallow call stack - your errors, not ours" title="Direct link to 2. Shallow call stack - your errors, not ours" translate="no">​</a></h3>
<p>When <code>pipeline.query()</code> breaks in LangChain, your traceback travels through <code>Runnable</code>, <code>RunnableSequence</code>, <code>CallbackManager</code>, <code>BaseChain</code>, and surfaces somewhere deep in the framework. You spend 10 minutes decoding the stack trace before you can begin debugging.</p>
<p>In SynapseKit, the call path is intentionally shallow. When something breaks, the traceback points at your code. No hidden middleware, no callback chains, no runnable wrappers unless you explicitly add them.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-unified-tool-interface---one-definition-every-provider">3. Unified tool interface - one definition, every provider<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#3-unified-tool-interface---one-definition-every-provider" class="hash-link" aria-label="Direct link to 3. Unified tool interface - one definition, every provider" title="Direct link to 3. Unified tool interface - one definition, every provider" translate="no">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">BaseTool</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    description</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    parameters</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">dict</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># JSON Schema</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">**</span><span class="token plain">kwargs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">schema</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token builtin">dict</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">           </span><span class="token comment" style="color:#999988;font-style:italic"># OpenAI tools format</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">anthropic_schema</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token builtin">dict</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic"># Anthropic tool_use format</span><br></div></code></pre></div></div>
<p>Write a tool once. It works with GPT-4o, Claude 3.5, Llama 3 on Groq, Gemini - any of the 31 supported providers. No adapter layer, no per-provider tool registration.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-task-centric-multi-agent---separate-what-from-who">4. Task-centric multi-agent - separate what from who<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#4-task-centric-multi-agent---separate-what-from-who" class="hash-link" aria-label="Direct link to 4. Task-centric multi-agent - separate what from who" title="Direct link to 4. Task-centric multi-agent - separate what from who" translate="no">​</a></h3>
<p>SynapseKit's Crew model separates <em>what to do</em> (Task) from <em>who does it</em> (Agent). Tasks declare their dependencies via <code>context_from</code>. The framework handles execution order, context accumulation, and result passing.</p>
<p>Wiring data flow manually between agents is the source of most multi-agent bugs. When Agent B needs Agent A's output, you shouldn't write the plumbing; you should declare the dependency.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-43-loaders---data-ingestion-without-hunting-for-packages">5. 43 loaders - data ingestion without hunting for packages<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#5-43-loaders---data-ingestion-without-hunting-for-packages" class="hash-link" aria-label="Direct link to 5. 43 loaders - data ingestion without hunting for packages" title="Direct link to 5. 43 loaders - data ingestion without hunting for packages" translate="no">​</a></h3>
<p>Production RAG applications ingest data from everywhere. SynapseKit ships 43 loaders:</p>
<ul>
<li class=""><strong>Documents:</strong> PDF, EPUB, LaTeX, RTF, DOCX, Markdown, HTML</li>
<li class=""><strong>Data:</strong> CSV, TSV, JSON, XML, SQLite</li>
<li class=""><strong>Cloud:</strong> S3, Azure Blob, OneDrive, Dropbox</li>
<li class=""><strong>Databases:</strong> MongoDB, PostgreSQL</li>
<li class=""><strong>Config:</strong> .env, YAML, TOML</li>
<li class=""><strong>Web:</strong> sitemap crawlers, URL loaders, RSS feeds</li>
<li class=""><strong>Code:</strong> Python, JavaScript, TypeScript source files</li>
</ul>
<p>One consistent <code>Loader.load()</code> → <code>List[Document]</code> interface. Every loader returns the same type. Your downstream pipeline code never changes regardless of where the data comes from.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="6-mcp-server-support---model-context-protocol-built-in">6. MCP Server support - Model Context Protocol built in<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#6-mcp-server-support---model-context-protocol-built-in" class="hash-link" aria-label="Direct link to 6. MCP Server support - Model Context Protocol built in" title="Direct link to 6. MCP Server support - Model Context Protocol built in" translate="no">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">mcp </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> MCPServer</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">server </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> MCPServer</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"my-tools"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> tools</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">WeatherTool</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> CalculatorTool</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> server</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">run_sse</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">host</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"0.0.0.0"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> port</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">8080</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> bearer_token</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"secret"</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>Expose any tool as a production MCP endpoint in 3 lines. Compatible with any MCP-compliant client.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-honest-take-when-to-use-each">The Honest Take: When to Use Each<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#the-honest-take-when-to-use-each" class="hash-link" aria-label="Direct link to The Honest Take: When to Use Each" title="Direct link to The Honest Take: When to Use Each" translate="no">​</a></h2>
<p>SynapseKit was built for a specific set of problems. It's not the right choice for every use case.</p>
<p><strong>Use SynapseKit when:</strong></p>
<ul>
<li class="">You're building a greenfield LLM app and want the fastest path to production</li>
<li class="">Your app is async-first - APIs, webhooks, real-time applications, serverless</li>
<li class="">You need a small footprint - containers, Lambda, edge runtimes</li>
<li class="">You want batteries included without hunting for extra packages</li>
<li class="">Your pipeline uses standard patterns: ReAct agents, Crew orchestration, RAG, streaming</li>
<li class="">You're experimenting across providers and need painless switching</li>
<li class="">You want readable code that a new team member can understand without framework training</li>
</ul>
<p><strong>Use LangChain when:</strong></p>
<ul>
<li class="">You need complex conditional graph workflows - LangGraph is genuinely excellent at stateful, branching agentic pipelines</li>
<li class="">You need a specific integration from LangChain's 150+ partner ecosystem</li>
<li class="">Your team already knows LangChain deeply and migration cost outweighs gains</li>
<li class="">You need LangSmith observability deeply integrated into your debugging workflow</li>
</ul>
<p><strong>Use LlamaIndex when:</strong></p>
<ul>
<li class="">Advanced chunking is central to your application quality (<code>SentenceWindow</code>, <code>Hierarchical</code> - there's nothing equivalent in SynapseKit today)</li>
<li class="">You're building a knowledge-intensive system where retrieval precision is the primary metric</li>
<li class="">You want LLM-native evaluation metrics (faithfulness, relevance, groundedness) built into the framework</li>
</ul>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="whats-coming-in-the-benchmark-series">What's Coming in the Benchmark Series<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#whats-coming-in-the-benchmark-series" class="hash-link" aria-label="Direct link to What's Coming in the Benchmark Series" title="Direct link to What's Coming in the Benchmark Series" translate="no">​</a></h2>
<p>The series continues through Notebooks #19–#30:</p>
<ul>
<li class=""><strong>#19</strong> - Observability &amp; Tracing: What can you actually see when your agent runs?</li>
<li class=""><strong>#20</strong> - Agent Error Handling: What happens when a tool throws an exception mid-loop?</li>
<li class=""><strong>#21</strong> - Week 3 Scorecard: Agents &amp; tools final rankings</li>
<li class=""><strong>#22</strong> - Async Throughput: Requests/second under real concurrency</li>
<li class=""><strong>#23</strong> - Graph Workflows: DAG pipelines for complex conditional flows</li>
<li class=""><strong>#24</strong> - LLM Evaluation: Built-in faithfulness and relevance metrics</li>
<li class=""><strong>#25</strong> - Cost Tracking: Token counting and spend visibility</li>
<li class=""><strong>#26</strong> - Guardrails: Content filtering and output validation</li>
<li class=""><strong>#27</strong> - MCP Support: Model Context Protocol in practice</li>
<li class=""><strong>#28</strong> - Week 4 Scorecard</li>
<li class=""><strong>#29–#30</strong> - Final Verdict: Which framework wins, for whom, and why</li>
</ul>
<p><a href="https://www.kaggle.com/discussions/general/688339" target="_blank" rel="noopener noreferrer" class="">Follow the series on Kaggle</a></p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="quick-start">Quick Start<a href="https://engineersofai.com/blog/synapsekit-why-i-built-it#quick-start" class="hash-link" aria-label="Direct link to Quick Start" title="Direct link to Quick Start" translate="no">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain"># Minimal install - 2 dependencies</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">pip install synapsekit</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"># Full install - vector search, all loaders, all tools</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">pip install "synapsekit[semantic]"</span><br></div></code></pre></div></div>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Your first RAG pipeline in 7 lines</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> RAGPipeline</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> LLMConfig</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">openai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> OpenAILLM</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">loaders </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> PDFLoader</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">llm      </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OpenAILLM</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">LLMConfig</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-4o-mini"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"sk-..."</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">docs     </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> PDFLoader</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"research.pdf"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">load</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">pipeline </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> RAGPipeline</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">llm</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">llm</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">pipeline</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">docs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">answer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> pipeline</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"What are the main findings?"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">answer</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Your first multi-agent crew in 10 lines</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Crew</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> CrewAgent</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> Task</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">groq </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> GroqLLM</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">llm        </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> GroqLLM</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">LLMConfig</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"llama-3-8b-8192"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gsk-..."</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">researcher </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> CrewAgent</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"researcher"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> role</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Research Analyst"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> llm</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">llm</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">writer     </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> CrewAgent</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"writer"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> role</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Writer"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> llm</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">llm</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">tasks      </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    Task</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">description</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Research quantum computing trends"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> agent</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"researcher"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    Task</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">description</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Write a blog intro"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> agent</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"writer"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> context_from</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"researcher"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> Crew</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">agents</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">researcher</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> writer</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> tasks</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">tasks</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p><strong>Links:</strong></p>
<ul>
<li class="">GitHub: <a href="https://github.com/SynapseKit/SynapseKit" target="_blank" rel="noopener noreferrer" class="">github.com/SynapseKit/SynapseKit</a></li>
<li class="">Docs: <a href="https://synapsekit.github.io/synapsekit-docs" target="_blank" rel="noopener noreferrer" class="">synapsekit.github.io/synapsekit-docs</a></li>
<li class="">Kaggle benchmark series: <a href="https://www.kaggle.com/discussions/general/688339" target="_blank" rel="noopener noreferrer" class="">kaggle.com/discussions/general/688339</a></li>
</ul>
<p>Every benchmark is reproducible. Fork any notebook and run it on Kaggle free CPU. If the results differ in your environment, open an issue.</p>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>]]></content:encoded>
            <category>LLM</category>
            <category>Agents</category>
            <category>Benchmarks</category>
            <category>RAG</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #26 - Multi-Agent Orchestration: 16 vs 19 vs 23 Lines (And Three Completely Different Mental Models)]]></title>
            <link>https://engineersofai.com/blog/ai-letters-26-multi-agent-orchestration</link>
            <guid>https://engineersofai.com/blog/ai-letters-26-multi-agent-orchestration</guid>
            <pubDate>Wed, 15 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Notebook]]></description>
            <content:encoded><![CDATA[<blockquote>
<p>"Three frameworks, three different answers to the same question: who decides when one agent hands work to the next?"</p>
</blockquote>
<p>A single agent with tools handles most tasks. But some workflows need specialisation - a researcher producing facts, a writer turning facts into prose, a reviewer checking the output. That chain of specialised agents is where the frameworks stop converging and start showing what they actually believe about software design.</p>
<p>Notebook #18 of the LLM Showdown measured the same 2-agent sequential pipeline across SynapseKit, LangChain (via LangGraph), and LlamaIndex. Researcher feeds Writer. Both call an LLM. The orchestrator wires them together. Simple enough that you can count the lines. Complex enough that the design philosophy underneath becomes visible.</p>
<p>The LoC numbers tell part of the story. The orchestration pattern matrix tells the rest.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-26/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Multi-Agent History Timeline →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">From FIPA agent standards to LangGraph and CrewAI. Click each milestone to see how the orchestration model evolved and what design tradeoffs each generation made.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-26/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Interactive Explorer</div>
<div style="font-size:1rem;font-weight:700;color:#0ea5e9;margin-bottom:6px">Orchestration Pattern Explorer →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Click any of 6 orchestration patterns - sequential, parallel, supervisor, handoff, graph, shared state - to see which frameworks support it natively and what the code looks like.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-26/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Evidence Dashboard</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Full Benchmark Results →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Stacked LoC chart, orchestration pattern heatmap, and design philosophy comparison - all data from notebook #18 in one view.</div>
</div>
</a>
</div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-the-numbers-say">What the Numbers Say<a href="https://engineersofai.com/blog/ai-letters-26-multi-agent-orchestration#what-the-numbers-say" class="hash-link" aria-label="Direct link to What the Numbers Say" title="Direct link to What the Numbers Say" translate="no">​</a></h2>
<p>The benchmark task was identical across all three: wire a Researcher agent and a Writer agent in sequence. Researcher gets a topic, produces bullet points. Writer receives those bullet points, produces a paragraph.</p>
<p><strong>Lines of code - imports + setup to a working 2-agent pipeline:</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework       Imports  Functional   Total</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">--------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit            3          13      16</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex            3          16      19</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain             4          19      23</span><br></div></code></pre></div></div>
<p>SynapseKit wins on LoC. The gap between SynapseKit (16) and LangChain (23) looks large but read the next section before drawing conclusions.</p>
<p><strong>Orchestration patterns supported:</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Pattern                SynapseKit  LangChain  LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">---------------------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Sequential             Yes         Yes        Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Parallel               Yes         Yes        No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Supervisor             Yes         Yes        No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Handoff chain          Yes         No         Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Graph / DAG            Yes         Yes        No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Shared state           Yes         Yes        Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Score (out of 6):        6           5          3</span><br></div></code></pre></div></div>
<p>SynapseKit and LangChain are nearly tied. LlamaIndex trails significantly - its <code>AgentWorkflow</code> supports sequential handoffs and shared state, but no parallel execution and no supervisor routing.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-three-mental-models">The Three Mental Models<a href="https://engineersofai.com/blog/ai-letters-26-multi-agent-orchestration#the-three-mental-models" class="hash-link" aria-label="Direct link to The Three Mental Models" title="Direct link to The Three Mental Models" translate="no">​</a></h2>
<p>This is the part that matters more than LoC.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Who controls the handoff?</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit               LangChain (LangGraph)    LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">──────────────────        ────────────────────     ──────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Framework                 You (graph edges)        The LLM</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Task-centric:             Graph-centric:           Agent-centric:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">define WHAT each          define HOW data          agents decide WHEN</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">agent should do           flows between nodes      to pass the baton</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">crew.run()                app.invoke(state)        workflow.run()</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">executes the              executes the             lets the LLM</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">task sequence             graph                    improvise</span><br></div></code></pre></div></div>
<p><strong>SynapseKit is task-centric.</strong> You define what each agent should produce (<code>expected_output</code>) and what context it needs (<code>context_from</code>). The framework manages the sequencing. You don't write the routing logic - you declare the dependency graph and let the Crew executor handle it.</p>
<p><strong>LangChain (LangGraph) is graph-centric.</strong> You define nodes (functions) and edges (transitions). The LLM is just a function inside a node - it has no special status. This means the orchestration logic is entirely under your control. Want to add a conditional branch that routes to a fact-checker if confidence is low? That's one <code>add_conditional_edges</code> call. Want to loop back to the researcher if the writer rejects the output? Same. LangGraph doesn't care what's inside each node.</p>
<p><strong>LlamaIndex is agent-centric.</strong> Agents decide when to hand off via tool calls. The <code>AgentWorkflow</code> sets up which agents can hand to whom (<code>can_handoff_to</code>), then runs the root agent and lets the LLM drive. The orchestration is emergent - which means it's also less predictable. If the researcher agent decides not to call <code>handoff_to_writer</code>, the workflow stalls.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-the-loc-gap-actually-costs">What the LoC Gap Actually Costs<a href="https://engineersofai.com/blog/ai-letters-26-multi-agent-orchestration#what-the-loc-gap-actually-costs" class="hash-link" aria-label="Direct link to What the LoC Gap Actually Costs" title="Direct link to What the LoC Gap Actually Costs" translate="no">​</a></h2>
<p>LangChain's 23 lines include 4 lines of <code>TypedDict</code> state definition, 2 function definitions with LLM calls, and 6 lines of graph wiring. None of that is boilerplate you can skip in a real pipeline - the TypedDict is your contract between nodes, the functions are your agent logic, the graph wiring is your orchestration.</p>
<p>SynapseKit's 16 lines hide that complexity inside the framework. <code>CrewAgent</code>, <code>Task</code>, and <code>Crew</code> are opinionated abstractions. The question isn't whether the code is shorter - it is. The question is what you lose when the abstraction doesn't fit your use case.</p>
<p>Custom tool cost from the previous benchmark (#25): SynapseKit requires subclassing <code>BaseTool</code>. LangChain requires a decorator. If you're building a pipeline where the agents need tools the framework doesn't provide, that cost repeats for every tool.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-parallel-and-supervisor-gap">The Parallel and Supervisor Gap<a href="https://engineersofai.com/blog/ai-letters-26-multi-agent-orchestration#the-parallel-and-supervisor-gap" class="hash-link" aria-label="Direct link to The Parallel and Supervisor Gap" title="Direct link to The Parallel and Supervisor Gap" translate="no">​</a></h2>
<p>LlamaIndex's 3/6 pattern score is the number that should influence framework choice.</p>
<p>If your multi-agent system ever needs to run two agents simultaneously - a web-searcher and a database-queryer both working on different subtasks, then merging results - LlamaIndex requires you to build that yourself. <code>AgentWorkflow</code> executes agents in sequence via handoffs. There is no built-in parallel branch.</p>
<p>Supervisor routing is similar. If you need a routing agent that decides which specialist to call based on query type, you're writing that logic yourself on LlamaIndex. SynapseKit ships <code>SupervisorAgent(llm, workers)</code>. LangChain gives you a supervisor node pattern in LangGraph.</p>
<p>For simple sequential pipelines, LlamaIndex's limitation doesn't matter. For anything with conditional branching, parallel execution, or dynamic routing, the 3/6 score is a constraint you'll hit.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-26-multi-agent-orchestration#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>SynapseKit's Crew is the fastest path for linear pipelines.</strong> Researcher → Writer → Reviewer in sequence, with context passing? 16 lines, one <code>crew.run()</code> call. If that's the pattern, use it.</p>
</li>
<li class="">
<p><strong>LangGraph's graph-centric model is not verbosity - it's explicitness.</strong> Every edge in your multi-agent graph is a line of code you wrote. That means every routing decision is auditable, testable, and reproducible. When the pipeline behaves unexpectedly, you read the graph.</p>
</li>
<li class="">
<p><strong>LlamaIndex's emergent handoff is a bet on the LLM.</strong> The agent decides when to pass work to the next agent. That's elegant when it works. When the LLM misses the handoff signal or calls it at the wrong point in the task, you're debugging LLM behaviour rather than framework behaviour. Plan for it.</p>
</li>
<li class="">
<p><strong>Parallel execution is not a nice-to-have.</strong> Any pipeline that can decompose work across independent agents - and most real workflows can - benefits from parallel execution. The latency difference between sequential and parallel runs compounds as agent count grows.</p>
</li>
<li class="">
<p><strong>The custom tool cost from #25 still applies here.</strong> Multi-agent pipelines need agents with tools. The LoC advantage SynapseKit holds on agent setup shrinks once you're writing custom tools that don't fit their <code>BaseTool</code> subclass pattern.</p>
</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-thing-most-people-miss">The Thing Most People Miss<a href="https://engineersofai.com/blog/ai-letters-26-multi-agent-orchestration#the-thing-most-people-miss" class="hash-link" aria-label="Direct link to The Thing Most People Miss" title="Direct link to The Thing Most People Miss" translate="no">​</a></h2>
<p>The LoC benchmarks consistently show SynapseKit winning on setup conciseness. This is real. It is also the least important property of a multi-agent system in production.</p>
<p>What matters in production:</p>
<ul>
<li class="">Can you inspect the state between agents?</li>
<li class="">Can you replay a failed run from a specific node?</li>
<li class="">Can you test individual agents in isolation?</li>
<li class="">Can you add a conditional branch without rewriting the pipeline?</li>
</ul>
<p>LangGraph answers all four yes. SynapseKit answers the first two partially - <code>return_intermediate_steps</code> isn't built into <code>Crew</code> the same way it is in <code>AgentExecutor</code>. LlamaIndex answers all four with varying difficulty.</p>
<p>The framework that wins the LoC race is the one you spend the least time setting up. The framework that wins the production race is the one you spend the least time debugging. Those are different frameworks, and the benchmark is measuring the wrong thing if you're building something that runs for more than a sprint.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-26-multi-agent-orchestration#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Map your current multi-agent system to the pattern matrix.</strong> Which of the six patterns does it actually use? If the answer is only "sequential" and "shared state", LlamaIndex's 3/6 is irrelevant to you.</p>
</li>
<li class="">
<p><strong>Build one conditional branch into an existing sequential pipeline.</strong> Take any two-step agent pipeline and add a condition: "if output confidence is low, loop back". That's where LangGraph's graph-centric model pays for its verbosity.</p>
</li>
<li class="">
<p><strong>Check whether your handoffs are deterministic.</strong> If your agents hand off via LLM tool calls (LlamaIndex model), run the same pipeline five times and check whether the handoff happens at the same point each time. If it doesn't, you have a reliability problem you may not have noticed yet.</p>
</li>
</ol>
<p>The LoC race is over by the second week of production. The debuggability race never ends.</p>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM</category>
            <category>Agents</category>
            <category>Benchmarks</category>
        </item>
        <item>
            <title><![CDATA[SynapseKit - A Production-Grade LLM Framework Built for Speed, Simplicity, and Scale]]></title>
            <link>https://engineersofai.com/blog/i-built-a-lightweight-llm-framework</link>
            <guid>https://engineersofai.com/blog/i-built-a-lightweight-llm-framework</guid>
            <pubDate>Wed, 15 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[SynapseKit is an async-first Python framework for LLM applications - 2 dependencies, 48 built-in tools, 31 providers, and multi-agent orchestration out of the box. Built for engineers who ship to production, not engineers who demo on notebooks.]]></description>
            <content:encoded><![CDATA[<p>SynapseKit is an async-first Python framework for building LLM applications - chains, agents, RAG pipelines, tool calling, and multi-agent orchestration. Two base dependencies. 48 built-in tools. 31 LLM providers. Designed for engineers who need production-grade tooling without production-grade complexity.</p>
<!-- -->
<blockquote>
<p>"The right abstraction disappears. You stop thinking about the framework and start thinking about the problem."</p>
</blockquote>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-synapsekit-is">What SynapseKit Is<a href="https://engineersofai.com/blog/i-built-a-lightweight-llm-framework#what-synapsekit-is" class="hash-link" aria-label="Direct link to What SynapseKit Is" title="Direct link to What SynapseKit Is" translate="no">​</a></h2>
<p>SynapseKit is an open-source Python framework for building applications powered by large language models. It covers the full surface area - from a single LLM call to multi-agent orchestration with cost guardrails - with a design philosophy that prioritizes speed, debuggability, and minimal abstraction.</p>
<p><strong>The core principle:</strong> every layer of abstraction must earn its place by making the engineer faster, not by making the framework more flexible.</p>
<p>What ships in the box:</p>
<ul>
<li class=""><strong>31 LLM providers</strong> - OpenAI, Anthropic, Google, Mistral, Cohere, Ollama, and 25 more. Switch providers by changing one string.</li>
<li class=""><strong>48 built-in tools</strong> - 12 work with zero configuration. No pip install, no API key, no setup.</li>
<li class=""><strong>43 document loaders</strong> - PDF, HTML, CSV, JSON, Markdown, DOCX, and more. Standardized interface across all formats.</li>
<li class=""><strong>Multi-agent primitives</strong> - Sequential, parallel, supervisor, hierarchical, pipeline, and feedback loop patterns. All six supported out of the box.</li>
<li class=""><strong>MCP server support</strong> - Model Context Protocol integration for tool-rich agent deployments.</li>
<li class=""><strong>Cost guardrails</strong> - Built into the execution engine. Set a budget, the agent stops cleanly instead of burning your API credits.</li>
</ul>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="design-philosophy">Design Philosophy<a href="https://engineersofai.com/blog/i-built-a-lightweight-llm-framework#design-philosophy" class="hash-link" aria-label="Direct link to Design Philosophy" title="Direct link to Design Philosophy" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="two-dependencies">Two Dependencies<a href="https://engineersofai.com/blog/i-built-a-lightweight-llm-framework#two-dependencies" class="hash-link" aria-label="Direct link to Two Dependencies" title="Direct link to Two Dependencies" translate="no">​</a></h3>
<p>SynapseKit's base install pulls two packages. Not 67. Not 43. Two.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit:  2 dependencies  · 48 MB RAM  · 80ms cold start</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain:  67 dependencies  · 189 MB RAM · 2,400ms cold start</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex: 43 dependencies  · 112 MB RAM · 1,100ms cold start</span><br></div></code></pre></div></div>
<p>Fewer dependencies means fewer version conflicts, faster installs, smaller container images, and cold starts that don't punish your users. In serverless deployments where every scale-from-zero event pays the cold start tax, 80ms vs 2.4 seconds is the difference between responsive and broken.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="async-from-the-ground-up">Async From the Ground Up<a href="https://engineersofai.com/blog/i-built-a-lightweight-llm-framework#async-from-the-ground-up" class="hash-link" aria-label="Direct link to Async From the Ground Up" title="Direct link to Async From the Ground Up" translate="no">​</a></h3>
<p>Every base class - <code>BaseTool</code>, <code>BaseRetriever</code>, <code>BaseLLM</code> - is <code>async def</code> by default. Not sync with an async wrapper bolted on. Not <code>run_in_executor</code> hiding a blocking call.</p>
<p>This matters because async correctness propagates. When the base class is async, every implementation is async. Contributors don't accidentally write sync tools. The framework never silently dispatches to a thread pool. At 50 concurrent requests, SynapseKit achieves 96.8% of theoretical throughput - near-baseline async efficiency.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="shallow-call-stacks">Shallow Call Stacks<a href="https://engineersofai.com/blog/i-built-a-lightweight-llm-framework#shallow-call-stacks" class="hash-link" aria-label="Direct link to Shallow Call Stacks" title="Direct link to Shallow Call Stacks" translate="no">​</a></h3>
<p>When something fails at 3am in production, the traceback is 8 lines, not 47. The agent loop is 47 lines of readable Python. No <code>RunnableSequence.__call__</code> chains, no middleware dispatch, no callback manager traversal. You read the error, you find the bug, you fix it.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="one-tool-interface">One Tool Interface<a href="https://engineersofai.com/blog/i-built-a-lightweight-llm-framework#one-tool-interface" class="hash-link" aria-label="Direct link to One Tool Interface" title="Direct link to One Tool Interface" translate="no">​</a></h3>
<p>Define a tool once with a JSON schema. Export to OpenAI format with <code>.schema()</code>. Export to Anthropic format with <code>.anthropic_schema()</code>. Same source of truth, zero duplication. One definition that works across all 31 providers.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-you-can-build">What You Can Build<a href="https://engineersofai.com/blog/i-built-a-lightweight-llm-framework#what-you-can-build" class="hash-link" aria-label="Direct link to What You Can Build" title="Direct link to What You Can Build" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="rag-pipelines">RAG Pipelines<a href="https://engineersofai.com/blog/i-built-a-lightweight-llm-framework#rag-pipelines" class="hash-link" aria-label="Direct link to RAG Pipelines" title="Direct link to RAG Pipelines" translate="no">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> LLM</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> RAGPipeline</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> PDFLoader</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">docs </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> PDFLoader</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"reports/"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">load</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">rag </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> RAGPipeline</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">docs</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">docs</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> llm</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">LLM</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"openai/gpt-4o"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">rag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">build</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">answer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> rag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"What were Q3 revenue figures?"</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>Seven lines. Load, build, query. Chunking, embedding, indexing, retrieval, and generation - all handled. Switch to Anthropic by changing <code>"openai/gpt-4o"</code> to <code>"anthropic/claude-sonnet-4-20250514"</code>. Nothing else changes.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="agents-with-tools">Agents with Tools<a href="https://engineersofai.com/blog/i-built-a-lightweight-llm-framework#agents-with-tools" class="hash-link" aria-label="Direct link to Agents with Tools" title="Direct link to Agents with Tools" translate="no">​</a></h3>
<p>Built-in tools for calculation, datetime, web search, file operations, and more. Define custom tools with a class and a JSON schema. The agent loop handles reasoning, tool selection, execution, and observation routing.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="multi-agent-orchestration">Multi-Agent Orchestration<a href="https://engineersofai.com/blog/i-built-a-lightweight-llm-framework#multi-agent-orchestration" class="hash-link" aria-label="Direct link to Multi-Agent Orchestration" title="Direct link to Multi-Agent Orchestration" translate="no">​</a></h3>
<p>The <code>Crew</code> and <code>Task</code> primitives support six orchestration patterns. Declare dependencies between tasks, not between agents. The framework handles execution order, context passing, and result aggregation.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Crew</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> Task</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> Agent</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">researcher </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Agent</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"researcher"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> tools</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">search_tool</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">writer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Agent</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"writer"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> tools</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">research_task </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Task</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">agent</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">researcher</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> description</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Find latest data on X"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">write_task </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Task</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">agent</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">writer</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> description</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Write report"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> context_from</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">research_task</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">crew </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Crew</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">agents</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">researcher</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> writer</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> tasks</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">research_task</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> write_task</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> crew</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="streaming">Streaming<a href="https://engineersofai.com/blog/i-built-a-lightweight-llm-framework#streaming" class="hash-link" aria-label="Direct link to Streaming" title="Direct link to Streaming" translate="no">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> token </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> llm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">stream</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Explain quantum computing"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">token</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> end</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">""</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>First-class streaming with the cleanest API across any framework. No callback handlers, no special configuration.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="where-synapsekit-fits">Where SynapseKit Fits<a href="https://engineersofai.com/blog/i-built-a-lightweight-llm-framework#where-synapsekit-fits" class="hash-link" aria-label="Direct link to Where SynapseKit Fits" title="Direct link to Where SynapseKit Fits" translate="no">​</a></h2>
<p>SynapseKit is built for a specific engineer: the one building LLM-powered products that need to work reliably in production, not just in a notebook demo.</p>
<p><strong>Use SynapseKit when:</strong></p>
<ul>
<li class="">You need fast cold starts (serverless, edge, CLI tools)</li>
<li class="">You want minimal dependency footprint in containerized deployments</li>
<li class="">You're building agent-heavy applications with multiple tools</li>
<li class="">You need to switch between LLM providers without rewriting code</li>
<li class="">You want cost controls built into the execution layer</li>
</ul>
<p><strong>Consider alternatives when:</strong></p>
<ul>
<li class="">You need LlamaIndex's advanced chunking strategies (<code>SemanticSplitterNodeParser</code>, <code>KnowledgeGraphIndex</code>)</li>
<li class="">You need LangChain's ecosystem breadth and community integrations</li>
<li class="">You need LangChain's <code>ToolException</code> error recovery pattern for complex agent loops</li>
</ul>
<p>We publish these tradeoffs openly. The 30-notebook LLM Framework Showdown on Kaggle benchmarks SynapseKit against LangChain and LlamaIndex across 18 production dimensions - including the dimensions where SynapseKit loses. Honest benchmarking means publishing the uncomfortable numbers too.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-vision">The Vision<a href="https://engineersofai.com/blog/i-built-a-lightweight-llm-framework#the-vision" class="hash-link" aria-label="Direct link to The Vision" title="Direct link to The Vision" translate="no">​</a></h2>
<p>LLM frameworks today are where web frameworks were in 2010. Too many abstractions solving for flexibility instead of velocity. Too much ceremony for simple operations. Too many dependencies for production deployments.</p>
<p>SynapseKit is a bet on a different direction: <strong>that the best framework is the one that disappears.</strong> You think about your application logic, not about the framework's internal architecture. You debug your code, not the framework's middleware. You deploy with confidence because you understand every line between your function call and the LLM API.</p>
<p>The roadmap:</p>
<ul>
<li class=""><strong>Evaluation harness</strong> - standardized benchmarks you can run against your own agents</li>
<li class=""><strong>Visual debugger</strong> - trace agent execution, tool calls, and token usage in real time</li>
<li class=""><strong>Plugin marketplace</strong> - community tools and integrations with a single install command</li>
<li class=""><strong>Enterprise features</strong> - audit logging, role-based access, deployment presets for AWS/GCP/Azure</li>
</ul>
<p>SynapseKit is MIT-licensed, fully open source, and built in the open. Every design decision is documented. Every benchmark is reproducible. Every line of code is readable.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="get-started">Get Started<a href="https://engineersofai.com/blog/i-built-a-lightweight-llm-framework#get-started" class="hash-link" aria-label="Direct link to Get Started" title="Direct link to Get Started" translate="no">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">pip install synapsekit</span><br></div></code></pre></div></div>
<ul>
<li class=""><strong>GitHub:</strong> <a href="https://github.com/SynapseKit/SynapseKit" target="_blank" rel="noopener noreferrer" class="">github.com/SynapseKit/SynapseKit</a></li>
<li class=""><strong>Benchmarks:</strong> <a href="https://www.kaggle.com/discussions/general/688339" target="_blank" rel="noopener noreferrer" class="">LLM Framework Showdown on Kaggle</a></li>
<li class=""><strong>Documentation:</strong> Ships with the package</li>
</ul>
<p>Two dependencies. One <code>pip install</code>. Start building.</p>
<hr>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM Frameworks</category>
            <category>Open Source</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #25 - The Built-in Tool Race: 30 vs 29 vs 12 (And Why the Headline Number Lies)]]></title>
            <link>https://engineersofai.com/blog/ai-letters-25-builtin-tools</link>
            <guid>https://engineersofai.com/blog/ai-letters-25-builtin-tools</guid>
            <pubDate>Mon, 13 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Notebook]]></description>
            <content:encoded><![CDATA[<blockquote>
<p>"Both SynapseKit and LangChain claim roughly 30 built-in tools. The difference is whether 'built-in' means 'works on install' or 'works after twelve more pip installs'."</p>
</blockquote>
<p>Every LLM framework advertises its tool ecosystem. The numbers look impressive in the docs. Then you try to actually use them and discover that half of them require a separate pip install, a third require an API key, and a handful only work on specific operating systems.</p>
<p>Notebook #17 of the LLM Showdown did the audit nobody does in the benchmarks: count only what actually ships in the base install, then split by what works with zero configuration versus what needs extra setup. The headline totals are almost identical - 30, 29, 12. The zero-config counts are not.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-25/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Tool Ecosystem Timeline →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">From the @tool decorator to batteries-included frameworks. Click each milestone to see how each design philosophy evolved and what it costs today.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-25/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Interactive Explorer</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Tool Category Explorer →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Click any of 9 capability categories to see which frameworks cover it - and whether the tools work immediately or need extra pip installs and API keys.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-25/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Evidence Dashboard</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Full Benchmark Results →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Total tools stacked, zero-config breakdown, category heatmap - all data from notebook #17 in one view.</div>
</div>
</a>
</div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-the-numbers-actually-mean">What the Numbers Actually Mean<a href="https://engineersofai.com/blog/ai-letters-25-builtin-tools#what-the-numbers-actually-mean" class="hash-link" aria-label="Direct link to What the Numbers Actually Mean" title="Direct link to What the Numbers Actually Mean" translate="no">​</a></h2>
<p>The benchmark defines built-in strictly: only tools included when you run <code>pip install framework</code>. Third-party integrations requiring a separate <code>pip install</code> per tool are counted separately.</p>
<p><strong>Total built-in tools:</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework       Core tools   Extra-pip tools   Total</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">-----------------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit              30                 0      30</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain               17                12      29</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex               3                 9      12</span><br></div></code></pre></div></div>
<p>SynapseKit and LangChain are nearly tied on total. But LangChain's 12 community tools each require a separate install - <code>pip install duckduckgo-search</code>, <code>pip install slack-sdk</code>, <code>pip install arxiv</code> - before they do anything. SynapseKit's 30 ship as implementations, not wrappers. LlamaIndex has 3 core tool types (FunctionTool, QueryEngineTool, RetrieverTool) and 9 hub packages, all requiring <code>pip install llama-index-tools-*</code>.</p>
<p><strong>Zero-config tools (no API key, no extra pip install):</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework       Zero-config tools</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">----------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit                     12</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain                      10</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex                      3</span><br></div></code></pre></div></div>
<p>This is the number that matters for prototyping speed. SynapseKit gives you calculator, datetime, regex, file I/O, Python REPL, shell, HTTP requests, web scraping, and human input - zero additional setup. LangChain gives you file management and shell tools, plus the <code>@tool</code> decorator pattern itself. LlamaIndex gives you the three wrapper types and nothing else that runs without extra installs.</p>
<p><strong>Category coverage:</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework       Categories covered</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit                       9</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain                        9</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex                       5</span><br></div></code></pre></div></div>
<p>SynapseKit and LangChain both cover 9 distinct capability areas. LlamaIndex covers 5 - and its coverage is mostly retrieval-oriented, which matches its RAG-first design.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-design-philosophy-underneath-the-numbers">The Design Philosophy Underneath the Numbers<a href="https://engineersofai.com/blog/ai-letters-25-builtin-tools#the-design-philosophy-underneath-the-numbers" class="hash-link" aria-label="Direct link to The Design Philosophy Underneath the Numbers" title="Direct link to The Design Philosophy Underneath the Numbers" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Tool philosophy - what "built-in" actually means</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit          LangChain           LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">──────────────      ──────────────      ──────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Implementations     Thin wrappers       Primitives only</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">ship in package     delegate to         tools are app-</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                    third-party libs    level concerns</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">pip install X       pip install X       pip install X</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">-&gt; tool works       + pip install Y     + pip install</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                    per community tool  llama-index-</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                                       tools-* per tool</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">30 tools ready      17 tools ready      3 types ready</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">12 need nothing     10 need nothing     3 need nothing</span><br></div></code></pre></div></div>
<p>LangChain's approach is deliberate. Thin wrappers mean the framework doesn't own the dependency - the underlying library (DuckDuckGo, Slack, arXiv) handles updates, auth, rate limiting. The wrapper just shapes it into the tool interface. The cost: an extra pip install every time you want a new capability, and occasional version conflicts between the wrapper and the underlying library.</p>
<p>SynapseKit's approach means more to maintain internally - when the DuckDuckGo API changes, SynapseKit's implementation breaks, not a third-party wrapper. The benefit: <code>pip install synapsekit</code> and you have a working web search tool.</p>
<p>LlamaIndex made a different bet entirely: tools are application concerns, not framework concerns. You build what you need with FunctionTool. The hub packages exist for common cases but they're optional additions, not the core product.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-25-builtin-tools#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>For prototyping and hackathons:</strong> SynapseKit's 12 zero-config tools mean you can build a working agent that does web scraping, file I/O, Python execution, and HTTP calls before you've set up a single API key. That's a real time advantage in time-constrained settings.</p>
</li>
<li class="">
<p><strong>The LangChain community tool count is misleading.</strong> When comparing frameworks, don't count community wrappers the same as core tools. A wrapper that requires 3 extra pip installs and an API key is not in the same category as a tool that works immediately.</p>
</li>
<li class="">
<p><strong>LlamaIndex's 3 core tools are not a weakness - they're a constraint.</strong> The framework explicitly doesn't try to solve the tool problem. If you're already using LlamaIndex for retrieval, your query engines and retrievers become first-class tools. Everything else you write yourself with FunctionTool.</p>
</li>
<li class="">
<p><strong>Multimodal is where the gap is largest.</strong> SynapseKit ships ImageAnalysisTool, SpeechToTextTool, and TextToSpeechTool in base. LangChain's multimodal tools (OpenAITextToSpeechTool, OpenAIWhisperParser) require the OpenAI package and API key. LlamaIndex has nothing multimodal in core.</p>
</li>
<li class="">
<p><strong>The "thin wrapper" model has long-term benefits.</strong> LangChain's community tools don't go stale the same way SynapseKit's implementations might - the underlying library handles the API. For production systems running for years, that matters.</p>
</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-thing-most-people-miss">The Thing Most People Miss<a href="https://engineersofai.com/blog/ai-letters-25-builtin-tools#the-thing-most-people-miss" class="hash-link" aria-label="Direct link to The Thing Most People Miss" title="Direct link to The Thing Most People Miss" translate="no">​</a></h2>
<p>The zero-config number is a proxy for something deeper: how much cognitive overhead does the framework impose before you can test an idea? Twelve pip installs plus API keys means twelve things to track, debug, and version-pin. Three zero-config tools means three things.</p>
<p>This matters most at the beginning of a project, when you're still figuring out whether your agent architecture is viable. If the framework makes you spend an hour on setup before you can test your first tool call, you're optimising the wrong variable.</p>
<p>SynapseKit wins on setup speed. LangChain wins on long-term maintainability of individual tools. LlamaIndex wins when your tools are retrieval pipelines you were building anyway.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-25-builtin-tools#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Audit your current agent's tool imports.</strong> Count how many separate pip installs they require. If it's more than 5, ask whether that's accidental complexity or intentional.</p>
</li>
<li class="">
<p><strong>Test your critical tools with no internet access.</strong> Zero-config tools that only need local resources are more reliable in production than tools that call external APIs for every invocation.</p>
</li>
<li class="">
<p><strong>Read the tool source, not just the docs.</strong> For any LangChain community tool you use, find the underlying library it wraps. That library's changelog is more relevant to your upgrade path than LangChain's.</p>
</li>
</ol>
<p>The built-in tool count is marketing. The zero-config count is engineering. Know which number you're optimising for.</p>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM</category>
            <category>Agents</category>
            <category>Benchmarks</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #24 - ReAct Agents: Six Lines vs Nineteen (And What You Lose in Between)]]></title>
            <link>https://engineersofai.com/blog/ai-letters-24-react-agents</link>
            <guid>https://engineersofai.com/blog/ai-letters-24-react-agents</guid>
            <pubDate>Sun, 12 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Notebook]]></description>
            <content:encoded><![CDATA[<blockquote>
<p>"Six lines to build a working ReAct agent sounds like a win. It is - until your agent starts looping and you have no idea why."</p>
</blockquote>
<p>The ReAct loop is the first pattern every engineer reaches for when they need an agent. Thought, Action, Observation. Repeat until done. It's elegant on paper. In production it breaks in exactly the ways you'd expect: infinite loops, wrong tool selection, hallucinated tool calls that return nothing useful.</p>
<p>The question isn't whether ReAct agents work. It's whether your framework lets you see inside the loop when things go wrong.</p>
<p>Notebook #15 of the LLM Showdown measured three things: lines of code to build a working ReAct agent with two tools, the built-in tool inventory available without writing any tool code, and loop control parameters exposed to the caller. SynapseKit wins on LoC. LangChain wins on observability. LlamaIndex sits in the middle on both. The numbers are not the story. The tradeoff they reveal is.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-24/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">ReAct Adoption Timeline →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">From the 2022 Princeton paper to three competing framework implementations. Click each milestone to see what each framework prioritized and what it traded away.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-24/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Interactive Explorer</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">ReAct Loop Explorer →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Select a framework and step through Thought → Action → Observation to see exactly what each exposes at each step. Includes live code samples for all three.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-24/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Evidence Dashboard</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Full Benchmark Results →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">LoC stacked charts, built-in tool inventory, loop control heatmap, and custom tool cost - all benchmark data from notebook #15 in one view.</div>
</div>
</a>
</div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-react-actually-requires">What ReAct Actually Requires<a href="https://engineersofai.com/blog/ai-letters-24-react-agents#what-react-actually-requires" class="hash-link" aria-label="Direct link to What ReAct Actually Requires" title="Direct link to What ReAct Actually Requires" translate="no">​</a></h2>
<p>A minimal working ReAct agent needs four things: an LLM, at least one tool with a schema, a prompt that formats Thought/Action/Observation, and a loop that parses the model's output and dispatches tool calls. Getting all four wired together is where the frameworks diverge.</p>
<p>The benchmark task was identical across all three: define a calculator tool and a datetime tool, build a ReAct agent, run one query that requires at least one tool call.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-evidence">The Evidence<a href="https://engineersofai.com/blog/ai-letters-24-react-agents#the-evidence" class="hash-link" aria-label="Direct link to The Evidence" title="Direct link to The Evidence" translate="no">​</a></h2>
<p><strong>Lines of code - imports + setup to a working agent:</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework       Imports  Functional   Total</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">--------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit            3           3       6</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex            3          10      13</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain             5          14      19</span><br></div></code></pre></div></div>
<p>SynapseKit gets to 6 lines because <code>CalculatorTool</code> and <code>DateTimeTool</code> are shipped in the library. You import them like any other class. There is no tool-definition code because there is nothing to define.</p>
<p>LangChain's 19 lines include two <code>@tool</code>-decorated functions - that's 10 lines of the gap right there. Strip those and LangChain's agent setup is 9 lines. The decorator approach is not verbose; it's complete. The tool code is what you'd write in any framework.</p>
<p>LlamaIndex at 13 lines uses <code>FunctionTool.from_defaults()</code> - plain Python functions wrapped into tool objects. Slightly more explicit than LangChain's decorator, slightly less so than SynapseKit's class hierarchy.</p>
<p><strong>Custom tool definition - what it costs when built-ins don't cover your use case:</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit    6 lines  (subclass BaseTool, implement async run())</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain     5 lines  (@tool decorator on any annotated function)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex    5 lines  (plain function + FunctionTool.from_defaults())</span><br></div></code></pre></div></div>
<p>SynapseKit's advantage evaporates here. The moment you need a tool that isn't in their library, you're writing more code than the alternatives, not less. The subclass pattern is also more rigid - you're tied to their async interface, their error handling convention, their schema format.</p>
<p><strong>Built-in tool inventory (no tool code required):</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework        Built-in tools</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">--------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit                   18</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain                    15</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex                    9</span><br></div></code></pre></div></div>
<p>SynapseKit leads: web scraping, arxiv, PubMed, SQL, shell, Python REPL, translation, sentiment - all importable. LangChain has 15 but many require third-party API keys (Tavily, Brave, Google). LlamaIndex's 9 are mostly retrieval-oriented, which makes sense given its RAG-first heritage.</p>
<p><strong>Loop control parameters exposed to the caller:</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Parameter                SynapseKit  LangChain  LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">-----------------------------------------------------------</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">max_iterations           Yes         Yes        Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">early stop               Yes         Yes        Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">handle_parsing_error     Yes         Yes        Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">verbose                  No          Yes        Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">return_intermediate_steps No          Yes        Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">async support            Yes         Yes        Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Score (out of 6):          4           6          6</span><br></div></code></pre></div></div>
<p>This is the number that matters in production.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-contrast">The Contrast<a href="https://engineersofai.com/blog/ai-letters-24-react-agents#the-contrast" class="hash-link" aria-label="Direct link to The Contrast" title="Direct link to The Contrast" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">ReAct Loop - What You Can Observe</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit                    LangChain / LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">──────────────────────        ──────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">[Thought]                     [Thought]  &lt;- verbose logs</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">     |                              |</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">[Action]                      [Action]   &lt;- intermediate steps</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">     |                              |</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">[Observation]                 [Observation] &lt;- response.sources</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">     |                              |</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">[Answer]                      [Answer]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  ^ opaque                      ^ full trace available</span><br></div></code></pre></div></div>
<p>SynapseKit's loop runs. You get the final answer. What happened in between - which tools were called, in what order, with what arguments, what they returned - is not surfaced by default. There is no <code>verbose=True</code>. There is no <code>return_intermediate_steps</code>. If the agent gives you a wrong answer, your debugging path is: re-run with print statements you've injected manually, or read source code.</p>
<p>LangChain gives you <code>return_intermediate_steps=True</code> on <code>AgentExecutor</code>. Every thought, every tool call, every observation is accessible in the response object. LlamaIndex surfaces the same through <code>response.sources</code>. This is not a nice-to-have. It is the difference between an agent you can ship and an agent you can't explain.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-24-react-agents#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>The 6-line number is real but context-dependent.</strong> If your use case fits SynapseKit's 18 built-in tools, you genuinely write less code. If it doesn't, you write more.</p>
</li>
<li class="">
<p><strong>Observability is not optional in production.</strong> The first time a ReAct agent gives a customer a wrong answer, you will need to reconstruct exactly what it thought and did. SynapseKit makes that hard by default.</p>
</li>
<li class="">
<p><strong>LangChain's verbosity is load-bearing.</strong> <code>return_intermediate_steps</code>, <code>verbose</code>, <code>handle_parsing_errors</code> - these aren't academic features. They are the handles you grab during an incident.</p>
</li>
<li class="">
<p><strong>LlamaIndex at 13 lines is the quiet winner.</strong> FunctionTool is clean. <code>response.sources</code> gives you the trace. The tool count (9 built-in) is lower, but the RAG-tool integration is first-class. If you're already using LlamaIndex for retrieval, adding agents costs almost nothing structurally.</p>
</li>
<li class="">
<p><strong>The custom tool cost comparison exposes the real architecture.</strong> SynapseKit's BaseTool subclass is not burdensome at 6 lines - but it is a commitment. LangChain's <code>@tool</code> decorator composes with any Python function you already wrote. The closer your existing codebase is to plain Python, the more that matters.</p>
</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-thing-most-people-miss">The Thing Most People Miss<a href="https://engineersofai.com/blog/ai-letters-24-react-agents#the-thing-most-people-miss" class="hash-link" aria-label="Direct link to The Thing Most People Miss" title="Direct link to The Thing Most People Miss" translate="no">​</a></h2>
<p>The benchmark measured the cost to build a ReAct agent. It didn't measure the cost to debug one. Debugging cost scales with agent complexity, agent usage, and how long the loop runs. A 6-line setup that produces an opaque loop will cost you more time over a quarter than a 19-line setup with full observability - assuming the agent actually runs in production. Most of them do, eventually.</p>
<p>The frameworks that win on setup lines tend to lose on debuggability. This is not a coincidence. It is the fundamental tradeoff in API design: the more you hide, the less you write. The more you expose, the more you can see.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-24-react-agents#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Check your current agent setup for <code>return_intermediate_steps</code> or equivalent.</strong> If you can't reconstruct the last 10 agent traces from your logs, you don't have production observability yet.</p>
</li>
<li class="">
<p><strong>Audit your tool definitions.</strong> If they are tightly coupled to a framework's base class, write one clean Python function that does the same thing. Keep framework-agnostic logic separate from framework integration.</p>
</li>
<li class="">
<p><strong>Run notebook #15 yourself</strong> against your own framework of choice: <a href="https://github.com/engineersofai/llm-showdown" target="_blank" rel="noopener noreferrer" class="">github.com/engineersofai/llm-showdown</a>. The task is simple enough to replicate in 20 minutes. The loop control gaps show up immediately.</p>
</li>
</ol>
<p>The conciseness race is worth running. Just know what you're trading away when you win it.</p>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM</category>
            <category>Agents</category>
            <category>Benchmarks</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #23 - The RAG Scorecard: Six Benchmarks, Three Frameworks, One Clear Pattern]]></title>
            <link>https://engineersofai.com/blog/ai-letters-23-week2-scorecard</link>
            <guid>https://engineersofai.com/blog/ai-letters-23-week2-scorecard</guid>
            <pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[After six RAG benchmarks across PDF ingestion, BM25, hybrid search, streaming TTFT, and conversation memory - SynapseKit leads Week 2 with 15 points. Here is what the numbers actually mean.]]></description>
            <content:encoded><![CDATA[<blockquote>
<p>"Batteries-included beats fully-composable on conciseness every time. Fully-composable beats batteries-included on control every time. You just have to know which problem you're solving."</p>
</blockquote>
<p>Six notebooks. Six benchmarks. Three frameworks measured on the same RAG workloads, back to back, reproducible on Kaggle.</p>
<p>Week 1 of the LLM Showdown covered setup overhead: environment spin-up, indexing speed, basic retrieval, reranking, evaluation harnesses, and the Week 1 scorecard. SynapseKit won that one 15–7–8 (SK–LC–LI).</p>
<p>Week 2 went deeper into the RAG stack: PDF ingestion, chunking strategies, BM25 availability, hybrid search RRF, streaming time-to-first-token, and conversation memory. Same methodology. 3-2-1 points for rank 1-2-3 across each benchmark, ties split.</p>
<p>The results are not a surprise if you've been paying attention. But the magnitude of the gap on some dimensions is.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-23/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Two-Week Journey →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">All 14 notebooks, cumulative standings after each week, with notebook-by-notebook breakdown of winners and key findings.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-23/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Interactive Explorer</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Benchmark Explorer →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Click each of the 6 benchmarks to explore raw values, methodology, and what the result actually means in production.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-23/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Evidence Dashboard</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Full Scorecard Dashboard →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Complete points heatmap, stacked benchmark breakdown, raw values, and two-week cumulative standings in one view.</div>
</div>
</a>
</div>
<p>Here is what the data says.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-the-scorecard-shows">What the Scorecard Shows<a href="https://engineersofai.com/blog/ai-letters-23-week2-scorecard#what-the-scorecard-shows" class="hash-link" aria-label="Direct link to What the Scorecard Shows" title="Direct link to What the Scorecard Shows" translate="no">​</a></h2>
<p><strong>Week 2 final points:</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework     #8   #9   #10  #11  #12  #13   Total</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">──────────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit   3.0  1.0  3.0  2.0  3.0  3.0   15.0</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex   2.0  3.0  1.5  1.0  2.0  2.0   11.5</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain    1.0  2.0  1.5  3.0  1.0  1.0    9.5</span><br></div></code></pre></div></div>
<p>SynapseKit wins 4 of 6 benchmarks. LangChain wins 1. LlamaIndex wins 1. Same pattern as Week 1, except LlamaIndex and LangChain swap second and third depending on the dimension.</p>
<p><strong>Two-week cumulative:</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit:  15 (W1) + 15 (W2) = 30</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex:   8 (W1) + 11.5 (W2) = 19.5</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain:    7 (W1) + 9.5 (W2)  = 16.5</span><br></div></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-evidence---benchmark-by-benchmark">The Evidence - Benchmark by Benchmark<a href="https://engineersofai.com/blog/ai-letters-23-week2-scorecard#the-evidence---benchmark-by-benchmark" class="hash-link" aria-label="Direct link to The Evidence - Benchmark by Benchmark" title="Direct link to The Evidence - Benchmark by Benchmark" translate="no">​</a></h2>
<p><strong>#8 - RAG from PDF (lines of code)</strong></p>
<p>SynapseKit loads a PDF into a retrieval pipeline in 7 lines. LangChain needs 13. LlamaIndex needs 11. The LangChain number is not lazy code - it requires a <code>PyPDFLoader</code>, a <code>RecursiveCharacterTextSplitter</code>, a vector store, and a retriever. Each is a separate abstraction. SynapseKit wraps all of that into one <code>RAGPipeline(pdf="...")</code> call.</p>
<p>Winner: SynapseKit. Margin: nearly 2x.</p>
<p><strong>#9 - Chunking Strategies (built-in splitter count)</strong></p>
<p>LlamaIndex wins this cleanly: 9 built-in splitters vs LangChain's 7 vs SynapseKit's 4. The two that matter are <code>SentenceWindowNodeParser</code> (retrieves surrounding sentences, not just the matched chunk) and <code>HierarchicalNodeParser</code> (builds a tree of chunks at different granularities). Neither exists in SynapseKit or LangChain. If your retrieval quality depends on chunk context, LlamaIndex is the right tool.</p>
<p>Winner: LlamaIndex. Not close.</p>
<p><strong>#10 - Built-in BM25 (extra packages required)</strong></p>
<p>SynapseKit bundles <code>rank_bm25</code> as a core dependency. LangChain and LlamaIndex both require you to install an extra package (<code>rank-bm25</code> and <code>llama-index-retrievers-bm25</code> respectively) before BM25 is available. Zero vs one extra <code>pip install</code>. It sounds trivial. At deployment time in a locked environment, it is not.</p>
<p>Winner: SynapseKit.</p>
<p><strong>#11 - Hybrid Search RRF (configurability score)</strong></p>
<p>LangChain wins this one, and it deserves to. <code>EnsembleRetriever</code> accepts an arbitrary list of retrievers and per-retriever weights. You can combine three different retrievers with custom weighting in a single constructor call. LlamaIndex's hybrid search has no weight control - it applies RRF with fixed parameters. SynapseKit sits in the middle: two-retriever support, fixed alpha weighting.</p>
<p>Winner: LangChain. Score: 5/5 vs SK's 4/5 vs LI's 3/5.</p>
<p><strong>#12 - Streaming TTFT (median framework overhead, ms)</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit:   0.001 ms</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex:   0.184 ms</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain:    0.236 ms</span><br></div></code></pre></div></div>
<p>All three are sub-millisecond. SynapseKit's async generator adds the least overhead. But read the caveat in the takeaway section - this benchmark's winner does not matter in production.</p>
<p>Winner: SynapseKit. Winner that matters: nobody.</p>
<p><strong>#13 - Conversation Memory (lines of code to add memory)</strong></p>
<p>SynapseKit: 4 lines. LlamaIndex: 6 lines. LangChain: 12 lines.</p>
<p>LangChain's <code>RunnableWithMessageHistory</code> requires a store object, a getter function, a session ID, and LCEL wiring before the history is injected. SynapseKit exposes it as one constructor parameter: <code>memory=True</code>. The gap is 3x.</p>
<p>Winner: SynapseKit.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-23-week2-scorecard#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>SynapseKit wins on conciseness in every dimension where conciseness is the metric.</strong> PDF loading, BM25, memory wiring - all 3-4x fewer lines. If you are prototyping or building internal tooling where developer velocity matters more than edge-case flexibility, this is the path.</p>
</li>
<li class="">
<p><strong>LangChain wins when you need fine-grained control over retrieval composition.</strong> Hybrid search with custom weights across three retrievers is a real use case - recommendation engines, multi-index RAG, domain-specific blending. EnsembleRetriever handles this; SynapseKit's fixed alpha does not.</p>
</li>
<li class="">
<p><strong>LlamaIndex wins when chunking quality is the bottleneck.</strong> If you're working with long technical documents, legal text, or anything where retrieved chunk context matters, <code>SentenceWindowNodeParser</code> and <code>HierarchicalNodeParser</code> are not features - they are the reason to use LlamaIndex.</p>
</li>
<li class="">
<p><strong>The TTFT result is noise.</strong> Sub-millisecond framework overhead against a real LLM API that adds 300–2000ms of network latency. Do not let this benchmark influence your framework choice.</p>
</li>
<li class="">
<p><strong>Week 3 is where it gets interesting.</strong> Agents, tool calling, multi-agent orchestration - this is where the architectures diverge most sharply. SynapseKit's agent layer is newer. LangChain's is battle-tested. LlamaIndex's is designed for data-heavy agentic workflows. The conciseness advantage SynapseKit holds in RAG may not hold in agents.</p>
</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-corollary-most-people-miss">The Corollary Most People Miss<a href="https://engineersofai.com/blog/ai-letters-23-week2-scorecard#the-corollary-most-people-miss" class="hash-link" aria-label="Direct link to The Corollary Most People Miss" title="Direct link to The Corollary Most People Miss" translate="no">​</a></h2>
<p>The benchmarks SynapseKit loses are the ones that reveal its design tradeoff. Fewer splitters means less chunking flexibility. Fixed hybrid search alpha means less retrieval control. No persistent memory backends (yet) means you own the storage problem.</p>
<p>SynapseKit is fast to write. It is not yet flexible to extend.</p>
<p>LangChain is slow to write. It is extremely flexible to extend - the entire LCEL composability model exists precisely to let you plug in arbitrary steps without rewriting the framework.</p>
<p>Neither is wrong. They are optimised for different constraints. The mistake is reaching for LangChain's full composability when you are building a standard RAG pipeline that SynapseKit already handles in 7 lines. The inverse mistake is reaching for SynapseKit when you need custom retrieval logic that requires LangChain's EnsembleRetriever.</p>
<p>Know which problem you have before you pick the tool.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-23-week2-scorecard#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Run the notebooks yourself</strong> - all 6 are reproducible on Kaggle CPU. Fork <a href="https://www.kaggle.com/misternautiyal" target="_blank" rel="noopener noreferrer" class="">LLM Showdown #8 through #13</a>. Swap in your own documents and LLM endpoint. The numbers in your environment may differ from ours.</p>
</li>
<li class="">
<p><strong>Audit your chunking strategy.</strong> Most RAG implementations use <code>RecursiveCharacterTextSplitter</code> with default chunk size because it is the default. Check if <code>SentenceWindowNodeParser</code> or a sliding window approach would improve your retrieval precision. Run a quick eval on 20 representative queries before assuming it does not matter.</p>
</li>
<li class="">
<p><strong>Profile your own framework overhead end-to-end.</strong> Not the TTFT micro-benchmark we ran - the full round trip: query → retrieve → generate → first token to your user. That number is what your users experience. Framework choice is usually not in the top three factors.</p>
</li>
</ol>
<p>Week 3 covers agents. ReAct loops, function calling, tool libraries, multi-agent coordination, tracing, and error handling. The scorecard will look different. Check back.</p>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM</category>
            <category>RAG</category>
            <category>Benchmarks</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #22 - Conversation Memory in RAG: One Param vs Forty Lines of Boilerplate]]></title>
            <link>https://engineersofai.com/blog/ai-letters-22-conversation-memory</link>
            <guid>https://engineersofai.com/blog/ai-letters-22-conversation-memory</guid>
            <pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[We wired multi-turn conversation memory into RAG pipelines across SynapseKit, LangChain, and LlamaIndex. The LoC gap is wider than any previous benchmark. The persistence and window-strategy differences are what matter in production.]]></description>
            <content:encoded><![CDATA[<blockquote>
<p>RAG gives the model context from documents. Memory gives it context from the conversation. Without both, your chatbot doesn't know what it just said.</p>
</blockquote>
<p>Every RAG system eventually faces the same question: what happens on the second turn? The user asks a follow-up. "What did you mean by that?" "Can you give me an example?" "How does that compare to what you said earlier?" Without memory, the model treats each question as the first. Context from the previous turn is gone. The answer it gives to the follow-up is either wrong, generic, or disconnected from what came before.</p>
<p>Conversation memory is the fix. A buffer of past exchanges gets prepended to the retrieved context and injected into the prompt. The model now has the document context and the conversation context. It can use both. The question is how much it costs to add this to your pipeline - and what happens when the conversation gets long enough that you have to start dropping old messages.</p>
<p>We wired identical multi-turn memory into RAG pipelines across SynapseKit 1.4, LangChain 1.2, and LlamaIndex Core 0.14. Same conversation, same task, same question: how many lines does it take to add memory, and what happens at the edge cases? The LoC gap is the widest of any benchmark in this series. The persistence and window-strategy differences are what will matter in your production system.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-22/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">LoC Across All 12 Benchmarks →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Cumulative lines of code per framework from hello world to conversation memory. See where each framework has built its lead.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-22/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Code Explorer</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Memory Pipeline Code by Framework →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Full multi-turn memory RAG code side by side - one param vs session stores vs token buffers, annotated for each framework.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-22/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Data</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Message Retention vs Window Size + Feature Matrix →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">How many messages each framework retains at different window sizes, plus the full memory API feature comparison across all three.</div>
</div>
</a>
</div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-we-measured">What We Measured<a href="https://engineersofai.com/blog/ai-letters-22-conversation-memory#what-we-measured" class="hash-link" aria-label="Direct link to What We Measured" title="Direct link to What We Measured" translate="no">​</a></h2>
<p><strong>Task:</strong> Build a multi-turn RAG pipeline with conversation memory. Run 5 turns of questions. Measure lines of code, window strategy, and message retention at different window sizes.</p>





















<table><thead><tr><th>Metric</th><th>What it captures</th></tr></thead><tbody><tr><td>Lines of code</td><td>Code to add multi-turn memory to an existing RAG pipeline</td></tr><tr><td>Window strategy</td><td>How old messages get dropped - turn count vs token limit</td></tr><tr><td>Message retention</td><td>Messages kept after 5 turns at window sizes 1, 2, 3, 5</td></tr></tbody></table>
<p><strong>Frameworks:</strong> SynapseKit 1.4, LangChain 1.2, LlamaIndex Core 0.14. Kaggle CPU.
<em>Disclosure: I'm the author of SynapseKit. All code is on Kaggle - fork and run yourself.</em></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-code">The Code<a href="https://engineersofai.com/blog/ai-letters-22-conversation-memory#the-code" class="hash-link" aria-label="Direct link to The Code" title="Direct link to The Code" translate="no">​</a></h2>
<p><strong>SynapseKit - 1 constructor argument:</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> RAG</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">rag </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> RAG</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-4o-mini"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">KEY</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> memory_window</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">5</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> rag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">DOCS</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">r1 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> rag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">ask</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"What is RAG?"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">r2 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> rag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">ask</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"How does it improve accuracy?"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">r3 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> rag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">ask</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Which retrieval method is fastest?"</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>Memory is a single parameter on the <code>RAG</code> constructor. <code>memory_window=5</code> keeps the last 5 turns. Every subsequent <code>.ask()</code> call automatically prepends the conversation history to the retrieved context. Zero additional setup. The tradeoff: in-memory only, no persistence across sessions.</p>
<p><strong>LangChain - session store + getter + LCEL wiring:</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrievers </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> BM25Retriever</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_openai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> ChatOpenAI</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">prompts </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> ChatPromptTemplate</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> MessagesPlaceholder</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">runnables</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">history </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> RunnableWithMessageHistory</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chat_history </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> InMemoryChatMessageHistory</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">store </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">get_session_history</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">session_id</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> InMemoryChatMessageHistory</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> session_id </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> store</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        store</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">session_id</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> InMemoryChatMessageHistory</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> store</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">session_id</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">retriever </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> BM25Retriever</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_texts</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">DOCS</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">prompt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ChatPromptTemplate</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_messages</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"system"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Context: {ctx}"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    MessagesPlaceholder</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"history"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"human"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"{question}"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">chain </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"ctx"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> retriever</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"question"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> RunnablePassthrough</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">|</span><span class="token plain"> prompt </span><span class="token operator" style="color:#393A34">|</span><span class="token plain"> ChatOpenAI</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">chain_with_history </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> RunnableWithMessageHistory</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    chain</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> get_session_history</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    input_messages_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"question"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> history_messages_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"history"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">r1 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> chain_with_history</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">invoke</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"question"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"What is RAG?"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> config</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"configurable"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"session_id"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"s1"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p><code>RunnableWithMessageHistory</code> is the canonical LangChain pattern. You define a session store (here in-memory, but can be Redis/DynamoDB/Postgres), a getter function, and wire it around your chain. Twelve lines before a single question is asked. The payoff: swap <code>InMemoryChatMessageHistory</code> for <code>RedisChatMessageHistory</code> and you have persistent multi-user memory with no other changes.</p>
<p><strong>LlamaIndex - token-budget buffer on the chat engine:</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">core </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Document</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> VectorStoreIndex</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> Settings</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">memory </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> ChatMemoryBuffer</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llms</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">openai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> OpenAI</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Settings</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llm </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OpenAI</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-4o-mini"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">index  </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> VectorStoreIndex</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">Document</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">text</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">d</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> d </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> DOCS</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">memory </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ChatMemoryBuffer</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_defaults</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">token_limit</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">1500</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">engine </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">as_chat_engine</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">memory</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">memory</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> chat_mode</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"context"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">r1 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> engine</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chat</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"What is RAG?"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">r2 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> engine</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chat</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"How does it improve accuracy?"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">r3 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> engine</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chat</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Which retrieval method is fastest?"</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p><code>ChatMemoryBuffer</code> takes a <code>token_limit</code> instead of a turn count. The engine drops old messages when the buffer exceeds the limit. Clean API - comparable conciseness to SynapseKit at the chat engine level. Can serialize to <code>SimpleChatStore</code> for lightweight persistence.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-numbers">The Numbers<a href="https://engineersofai.com/blog/ai-letters-22-conversation-memory#the-numbers" class="hash-link" aria-label="Direct link to The Numbers" title="Direct link to The Numbers" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework    Imports   Functional   Total</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">──────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit       1           5         6</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex       3           6         9</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain        5          12        17</span><br></div></code></pre></div></div>
<p>This is the widest LoC gap in the series. LangChain's session store pattern adds 5 lines of boilerplate before the chain is even built - the getter function, the store dict, and the <code>RunnableWithMessageHistory</code> wrapper. That boilerplate is the price of flexibility. You get pluggable backends. SynapseKit gives you the same result in one argument, but you're locked to in-memory.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="window-strategy-the-detail-that-matters">Window Strategy: The Detail That Matters<a href="https://engineersofai.com/blog/ai-letters-22-conversation-memory#window-strategy-the-detail-that-matters" class="hash-link" aria-label="Direct link to Window Strategy: The Detail That Matters" title="Direct link to Window Strategy: The Detail That Matters" translate="no">​</a></h2>
<p>Both frameworks drop old messages when the window fills up. The question is what "window" means:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework     Strategy              Reasoning unit   Control</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">──────────────────────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit    Sliding window        Turns            memory_window=N</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain     Store all; trim       Turns (manual)   slice last N*2</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex    Token budget          Tokens           token_limit=N</span><br></div></code></pre></div></div>
<p><strong>Turn-count windows</strong> (SynapseKit, LangChain) are easy to reason about: "keep the last 3 exchanges." The problem is that turns vary wildly in length. A 3-turn window might be 200 tokens or 2,000 tokens depending on the conversation. At scale, that variance creates unpredictable prompt sizes.</p>
<p><strong>Token-limit windows</strong> (LlamaIndex) are harder to reason about - "keep 1,500 tokens of history" doesn't tell you how many turns that is. But they're more predictable in terms of prompt size, which is what actually matters for LLM API cost and latency. You know exactly how much context you're sending.</p>
<p>Message retention after 5 turns at different window sizes:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Window    SynapseKit   LangChain   LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">─────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">w=1          2 msg       2 msg      ~2 msg</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">w=2          4 msg       4 msg      ~4 msg</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">w=3          6 msg       6 msg      ~6 msg</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">w=5         10 msg      10 msg     10 msg</span><br></div></code></pre></div></div>
<p>At equivalent settings, all three retain the same number of messages. The difference surfaces when conversations are long and token-dense - LlamaIndex starts dropping earlier than a turn-count window of the same number.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="persistence-where-they-truly-split">Persistence: Where They Truly Split<a href="https://engineersofai.com/blog/ai-letters-22-conversation-memory#persistence-where-they-truly-split" class="hash-link" aria-label="Direct link to Persistence: Where They Truly Split" title="Direct link to Persistence: Where They Truly Split" translate="no">​</a></h2>



























































<table><thead><tr><th>Feature</th><th>SynapseKit</th><th>LangChain</th><th>LlamaIndex</th></tr></thead><tbody><tr><td>In-memory</td><td>Yes</td><td>Yes</td><td>Yes</td></tr><tr><td>Redis</td><td>No</td><td>Yes</td><td>No</td></tr><tr><td>DynamoDB</td><td>No</td><td>Yes</td><td>No</td></tr><tr><td>Postgres</td><td>No</td><td>Yes</td><td>No</td></tr><tr><td>JSON file</td><td>No</td><td>Yes</td><td>Yes (SimpleChatStore)</td></tr><tr><td>Custom backend</td><td>No</td><td>Yes</td><td>Partial</td></tr><tr><td><code>clear()</code></td><td>Yes</td><td>Yes</td><td>Yes</td></tr><tr><td>Format to string</td><td>Yes</td><td>Yes</td><td>Yes</td></tr></tbody></table>
<p>LangChain's persistence ecosystem is the clear winner. Swap one import and your session store moves from in-memory to Redis. This is the critical path for any multi-user production app - users expect their conversation to persist across sessions, across devices, across server restarts.</p>
<p>SynapseKit's in-memory limitation is the one place where its simplicity becomes a real constraint. For a single-user, single-session chatbot, it's fine. For a production app with multiple users, you'll either fork the memory implementation or migrate to LangChain for this layer.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-22-conversation-memory#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Don't build your own memory layer.</strong> All three frameworks provide one. Rolling your own conversation buffer means reinventing trimming logic, format conversion, and history injection - work that's already done for you.</p>
</li>
<li class="">
<p><strong>Choose turn-count windows for simple apps, token-budget windows for production.</strong> Turn count is easy to explain to stakeholders. Token budget is what keeps your API costs predictable at scale. If you're serving real users, measure the token distribution of your turns before deciding.</p>
</li>
<li class="">
<p><strong>LangChain's <code>RunnableWithMessageHistory</code> is boilerplate, but it's good boilerplate.</strong> The session getter pattern decouples your chain from the storage backend. When you move to Redis in production, you change one line. That's worth 7 extra lines at setup time.</p>
</li>
<li class="">
<p><strong>LlamaIndex's <code>chat_engine</code> is the fastest path to a working multi-turn RAG demo.</strong> Two lines - memory and engine. If you're building a prototype or an internal tool where persistence doesn't matter, this is the fastest start.</p>
</li>
<li class="">
<p><strong>Memory and RAG interact in ways that will surprise you.</strong> When the retrieved context changes and the memory context contradicts it, the model has to reconcile them. This creates subtle failures - confident-sounding answers that combine stale memory context with fresh document context incorrectly. Test multi-turn RAG with contradictory document updates before shipping.</p>
</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-corollary-most-people-miss">The Corollary Most People Miss<a href="https://engineersofai.com/blog/ai-letters-22-conversation-memory#the-corollary-most-people-miss" class="hash-link" aria-label="Direct link to The Corollary Most People Miss" title="Direct link to The Corollary Most People Miss" translate="no">​</a></h2>
<p>The memory problem compounds. A single-turn RAG pipeline has one context window to manage: the retrieved documents. A multi-turn RAG pipeline has two: the documents and the conversation history. They compete for the same token budget.</p>
<p>Most teams add memory and don't adjust their retrieval budget. The result: the total context grows until it hits the model's context limit and something gets truncated - usually silently. The retrieved documents get cut first because they're appended after the history. The model starts answering from memory rather than documents. Retrieval quality degrades. Nobody notices because the answers still sound coherent.</p>
<p>The fix is explicit: set <code>max_tokens_for_context = total_budget - memory_tokens - system_prompt_tokens</code> and cap your retriever's <code>top_k</code> accordingly. None of the three frameworks do this automatically.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Context budget allocation (simplified):</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">────────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Total context window         128,000 tokens</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">System prompt                ~500 tokens</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Conversation memory          ~2,000 tokens  (10 turns × ~200 tokens/turn)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Retrieved documents          ~4,000 tokens  (top-5 chunks × ~800 tokens)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LLM response budget          ~2,000 tokens</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">────────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Remaining buffer             119,500 tokens</span><br></div></code></pre></div></div>
<p>Do the maths before you hit the limit, not after.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-22-conversation-memory#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Add <code>memory_window</code> or <code>token_limit</code> to your RAG pipeline today.</strong> If you're building a chat interface on top of RAG and not passing history into the prompt, every follow-up question is being answered in isolation. That's a worse user experience than a basic chatbot.</p>
</li>
<li class="">
<p><strong>Measure your average conversation length in tokens.</strong> Pull a sample of real conversations, tokenize them, and see what percentile hits 1,500 tokens. That's your <code>token_limit</code> starting point. A turn-count window of 5 in a technical conversation can hit 3,000 tokens easily.</p>
</li>
<li class="">
<p><strong>Read the Kaggle notebook.</strong> Full code, retention tables at different window sizes, and the live demo: <a href="https://www.kaggle.com/code/misternautiyal/llm-showdown-13-conversation-memory" target="_blank" rel="noopener noreferrer" class="">LLM Showdown #13 - Conversation Memory in RAG</a></p>
</li>
</ol>
<hr>
<p>Memory is the difference between a search engine with an LLM frontend and an actual conversational AI. The frameworks all provide it. The split is in how they drop old messages and whether they persist across sessions. One approach gives you a single argument and no persistence. One gives you a token budget and lightweight JSON persistence. One gives you full production backends at the cost of boilerplate. Pick the one that matches where your app needs to be in six months, not where it is today.</p>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM Engineering</category>
            <category>RAG</category>
            <category>Benchmarks</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #21 - Streaming RAG: Time to First Token Across Three Frameworks]]></title>
            <link>https://engineersofai.com/blog/ai-letters-21-streaming-rag-ttft</link>
            <guid>https://engineersofai.com/blog/ai-letters-21-streaming-rag-ttft</guid>
            <pubDate>Tue, 07 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[We stripped out network latency with a mock LLM and measured the pure framework overhead between calling .stream() and getting the first token. All three frameworks cleared sub-millisecond. The interesting difference is the API surface.]]></description>
            <content:encoded><![CDATA[<blockquote>
<p>When users wait for an LLM, the number that matters is time-to-first-token, not total time. 200ms to first token feels instant. 2 seconds to first token feels broken - even if the full answer arrives faster.</p>
</blockquote>
<p>Every LLM UI eventually learns the same lesson. Users don't measure latency the way your dashboard does. They don't care about tokens-per-second, p99 tail latency, or median completion time. They care about one thing: how long until <em>something</em> appears on screen. That number is TTFT - time to first token - and it dominates perceived performance more than any other metric in LLM serving.</p>
<p>The catch is that when you're building a streaming RAG pipeline, the framework itself sits between your <code>.stream()</code> call and the first token your user sees. Every <code>async for</code>, every LCEL graph traversal, every callback dispatch adds latency before a single character leaves the server. In production that overhead is invisible because network latency to OpenAI or Anthropic is 100–1000x larger. But strip out the network with a mock LLM and you can finally see what the framework itself costs you.</p>
<p>We built identical streaming RAG pipelines across SynapseKit 1.4, LangChain 1.2, and LlamaIndex Core 0.14. Same documents, same query, same mock LLM that yields the exact same token list with zero network latency. The result: all three clear the sub-millisecond bar comfortably. Nobody loses on the number. The interesting split is elsewhere - in the shape of the streaming API itself.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-21/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">TTFT vs Network Latency →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">How framework overhead compares to real network latency from OpenAI, Anthropic, and a local model. The framework is a rounding error - until it isn't.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-21/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Code Explorer</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Streaming RAG Code by Framework →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Full streaming pipeline code side by side - imports, setup, and the `.stream()` consumption pattern annotated for each framework.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-21/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Data</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">TTFT Distribution + API Surface Matrix →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Median TTFT, p99 tail, sync vs async support, and callback availability - the full scorecard across all three frameworks.</div>
</div>
</a>
</div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-we-measured">What We Measured<a href="https://engineersofai.com/blog/ai-letters-21-streaming-rag-ttft#what-we-measured" class="hash-link" aria-label="Direct link to What We Measured" title="Direct link to What We Measured" translate="no">​</a></h2>
<p><strong>Task:</strong> Build a streaming RAG pipeline (BM25 retrieval + LLM stream). Feed the retrieved context into an LLM that streams tokens. Measure the latency from calling <code>.stream()</code> to receiving the first token.</p>





















<table><thead><tr><th>Metric</th><th>What it captures</th></tr></thead><tbody><tr><td>Lines of code</td><td>Code to wire up a streaming RAG pipeline</td></tr><tr><td>TTFT (median)</td><td>Pure framework overhead with a zero-latency mock LLM</td></tr><tr><td>Streaming API surface</td><td>Sync vs async, generator vs callback, on-RAG vs on-LLM</td></tr></tbody></table>
<p><strong>Why a mock LLM:</strong> real LLM APIs add 100–2000ms of network and provider latency. That swamps any framework difference. Strip it out and the framework overhead finally becomes visible - the part you can actually optimise.</p>
<p><strong>Frameworks:</strong> SynapseKit 1.4, LangChain 1.2, LlamaIndex Core 0.14. Kaggle CPU. 50 reps per framework.
<em>Disclosure: I'm the author of SynapseKit. All code is on Kaggle - fork and run yourself.</em></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-code">The Code<a href="https://engineersofai.com/blog/ai-letters-21-streaming-rag-ttft#the-code" class="hash-link" aria-label="Direct link to The Code" title="Direct link to The Code" translate="no">​</a></h2>
<p><strong>SynapseKit - async generator on the RAG object itself:</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> RAG</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">rag </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> RAG</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-4o-mini"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">KEY</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> provider</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"openai"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> rag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">DOCS</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> token </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> rag</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">stream</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">QUERY</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">token</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> end</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">""</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> flush</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p><code>rag.stream(query)</code> is a single method call that streams the full RAG pipeline - retrieve, construct prompt, call LLM, yield tokens. No chain composition, no graph construction. Async-only.</p>
<p><strong>LangChain - LCEL chain with <code>.stream()</code>:</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrievers </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> BM25Retriever</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_openai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> ChatOpenAI</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">prompts </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> ChatPromptTemplate</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">runnables </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> RunnablePassthrough</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">retriever </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> BM25Retriever</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_texts</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">DOCS</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">prompt    </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ChatPromptTemplate</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_template</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Context: {ctx}\n\nQ: {q}"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">llm       </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ChatOpenAI</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-4o-mini"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> streaming</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">chain     </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"ctx"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> retriever</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"q"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> RunnablePassthrough</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">|</span><span class="token plain"> prompt </span><span class="token operator" style="color:#393A34">|</span><span class="token plain"> llm</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> chunk </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> chain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">stream</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">QUERY</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">chunk</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> end</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">""</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> flush</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>LCEL composition makes every step explicit and swappable. More imports, more ceremony, but you can yank out the retriever or add a reranker without touching the stream call. Both sync (<code>.stream</code>) and async (<code>.astream</code>) are native.</p>
<p><strong>LlamaIndex - <code>query_engine(streaming=True)</code>:</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">core </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Document</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> VectorStoreIndex</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> Settings</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llms</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">openai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> OpenAI</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Settings</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llm </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OpenAI</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-4o-mini"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">index  </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> VectorStoreIndex</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">Document</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">text</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">d</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> d </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> DOCS</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">engine </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">as_query_engine</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">streaming</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> engine</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">QUERY</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> chunk </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">response_gen</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">chunk</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> end</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">""</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> flush</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>One flag flip (<code>streaming=True</code>) turns the query engine into a streaming generator. Clean surface. No native async stream on the query engine - you'd wrap it yourself or reach for the lower-level async APIs.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-numbers">The Numbers<a href="https://engineersofai.com/blog/ai-letters-21-streaming-rag-ttft#the-numbers" class="hash-link" aria-label="Direct link to The Numbers" title="Direct link to The Numbers" translate="no">​</a></h2>
<p>With a mock LLM that yields the same token list at zero network latency, we ran 50 TTFT measurements per framework:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework    Median TTFT   p99 TTFT   API shape</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">────────────────────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit      0.08 ms      0.15 ms   async generator</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain       0.12 ms      0.21 ms   sync generator</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex      0.14 ms      0.26 ms   sync generator</span><br></div></code></pre></div></div>
<p>All three land in the sub-millisecond zone. The framework overhead itself is effectively free. At this resolution the numbers are noise. If you're choosing a framework to optimise TTFT, you're optimising the wrong thing - put your effort into prompt caching, smaller context windows, provider selection, and serving infrastructure. That's where the real milliseconds live.</p>
<p>For reference, here's what actually dominates TTFT in production:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Component                 Typical latency</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Framework overhead        &lt; 1 ms</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Embedding lookup          5–20 ms</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">BM25 retrieval            10–50 ms</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Network to LLM provider   80–200 ms</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LLM first token           150–600 ms</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Total TTFT                250 ms – 1 s</span><br></div></code></pre></div></div>
<p>The framework is a rounding error. A 0.08ms vs 0.14ms difference cannot be measured in production - it vanishes into jitter.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-api-surface-split">The API Surface Split<a href="https://engineersofai.com/blog/ai-letters-21-streaming-rag-ttft#the-api-surface-split" class="hash-link" aria-label="Direct link to The API Surface Split" title="Direct link to The API Surface Split" translate="no">​</a></h2>
<p>This is where the frameworks actually diverge. When you're writing real code, the shape of the streaming API matters more than its latency.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Feature                 SynapseKit    LangChain    LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">──────────────────────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Primary API             async gen     sync + async  sync gen</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Sync support            No            Yes           Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Native async on RAG     Yes           Yes           No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Callback handlers       No            Yes           Yes (mgr)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Stream on RAG object    Yes           Yes (LCEL)    Yes (flag)</span><br></div></code></pre></div></div>
<p><strong>SynapseKit is async-only.</strong> There is no <code>.stream()</code> on a sync path. If your codebase runs in Flask, Django sync views, or a Jupyter notebook without an event loop, every call site needs <code>asyncio.run()</code> or you need to restructure around async. That's a migration, not a drop-in.</p>
<p><strong>LangChain is the most flexible.</strong> <code>chain.stream()</code> for sync, <code>chain.astream()</code> for async, plus a callback handler ecosystem (<code>StreamingStdOutCallbackHandler</code>, <code>AsyncIteratorCallbackHandler</code>) for every framework integration you might need. If you're building a Streamlit app, a CLI tool, and an async FastAPI endpoint from the same chain, this is the path.</p>
<p><strong>LlamaIndex sits in the middle.</strong> Native sync generators (<code>response.response_gen</code>) are easy to consume. The async story is weaker - the query engine doesn't expose a clean async stream by default. You reach for lower-level LLM APIs or wrap the sync generator in a thread.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-21-streaming-rag-ttft#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Stop optimising framework TTFT overhead.</strong> At sub-millisecond, it's below the noise floor of every real LLM deployment. The TTFT you see in your dashboard is 99%+ network and provider latency. Focus there.</p>
</li>
<li class="">
<p><strong>Match the streaming API to your runtime.</strong> If your app is async (FastAPI, async workers, LangGraph): SynapseKit and LangChain <code>.astream()</code> are both clean. If your app is sync (Flask, Django sync views, Jupyter, a CLI): LangChain <code>.stream()</code> or LlamaIndex's <code>response_gen</code> let you avoid restructuring.</p>
</li>
<li class="">
<p><strong>Use callbacks for UI binding, generators for pipelines.</strong> LangChain's callback handler pattern is the cleanest path for tying stream output into progress bars, partial rendering, and multi-consumer fan-out. For a one-consumer pipeline, a generator is simpler.</p>
</li>
<li class="">
<p><strong>Stream from the RAG object, not the LLM.</strong> All three frameworks can stream from the top-level RAG call (SynapseKit <code>rag.stream</code>, LangChain LCEL chain, LlamaIndex <code>query_engine(streaming=True)</code>). Don't roll your own retrieve + LLM stream loop - you'll reimplement the prompt construction wrong.</p>
</li>
<li class="">
<p><strong>Measure TTFT end-to-end, not in isolation.</strong> The real number includes retrieval time, prompt build, network round-trip, and the provider's own time-to-first-token. That's the number your users experience. Framework overhead disappears into it.</p>
</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-corollary-most-people-miss">The Corollary Most People Miss<a href="https://engineersofai.com/blog/ai-letters-21-streaming-rag-ttft#the-corollary-most-people-miss" class="hash-link" aria-label="Direct link to The Corollary Most People Miss" title="Direct link to The Corollary Most People Miss" translate="no">​</a></h2>
<p>TTFT is not the only perception metric. Inter-token latency - the jitter between the 2nd, 10th, and 100th tokens - matters almost as much. A stream that arrives in steady 15ms bursts feels smooth. A stream that arrives in a burst, stalls for 200ms, then bursts again feels broken. And inter-token latency is where framework buffering, callback dispatch, and LCEL graph traversal actually <em>can</em> start to matter at production volumes.</p>
<p>None of these frameworks add visible buffering on a mock LLM. But layer in a callback chain, a streaming response wrapper, and a server-sent-events encoder on top, and you can build a pipeline that adds 10–20ms of buffering per token. That's the part you have to profile yourself - and the part no benchmark in this series will catch for you.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Perception metric           What the user feels</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">──────────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">TTFT                        Did anything happen?</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Inter-token latency         Is it flowing or stalling?</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Total time                  Was it fast enough to use?</span><br></div></code></pre></div></div>
<p>You optimise all three in different ways. Framework choice affects the first two slightly and the third not at all.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-21-streaming-rag-ttft#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Instrument TTFT in your production RAG.</strong> Log the three numbers that matter: retrieval latency, prompt-build latency, and time-to-first-token from the LLM. If any one is above 300ms, that's where the work is - not in the framework.</p>
</li>
<li class="">
<p><strong>Switch from <code>.stream()</code> to <code>.astream()</code> if you're on an async stack.</strong> Sync <code>.stream()</code> inside an async handler blocks the event loop. Most teams accidentally run sync streams in async contexts because it was easier to paste the tutorial code.</p>
</li>
<li class="">
<p><strong>Read the Kaggle notebook.</strong> Full reproducible code, mock LLM implementations for each framework, 50-run TTFT distributions: <a href="https://www.kaggle.com/code/misternautiyal/llm-showdown-12-streaming-rag-ttft" target="_blank" rel="noopener noreferrer" class="">LLM Showdown #12 - Streaming RAG TTFT</a></p>
</li>
</ol>
<hr>
<p>Streaming is the default UX for every modern LLM product. The frameworks all do it. None of them are meaningfully slower than the others. The real question is whether your stream fits your runtime - async or sync, generator or callback, on the RAG or on the LLM. Pick the shape that matches your code, not the one with the lowest microsecond count on a benchmark that doesn't include the network.</p>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM Engineering</category>
            <category>RAG</category>
            <category>Benchmarks</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #20 - Hybrid Search: RRF Fusion Across Three Frameworks]]></title>
            <link>https://engineersofai.com/blog/ai-letters-20-hybrid-search</link>
            <guid>https://engineersofai.com/blog/ai-letters-20-hybrid-search</guid>
            <pubDate>Mon, 06 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[We ran identical hybrid BM25 + vector search pipelines through SynapseKit, LangChain, and LlamaIndex. The LoC difference is small. The RRF configurability difference is not.]]></description>
            <content:encoded><![CDATA[<blockquote>
<p>Pure vector search misses exact matches. Pure BM25 misses semantics. Hybrid search almost always wins - the question is how much control you get over the fusion.</p>
</blockquote>
<p>Every production RAG system eventually hits the same wall. Vector search retrieves semantically similar documents, but it fails on exact-match queries: model names, version numbers, function names, error codes. The query "GPT-4o" and the document "GPT-4o" don't reliably produce close vectors. BM25 doesn't have this problem. It matches terms, weighs them by rarity, and returns the right document.</p>
<p>Reciprocal Rank Fusion - RRF - is the standard way to combine both. It takes two ranked lists, assigns each document a score of <code>1 / (k + rank)</code>, sums the scores, and re-ranks. The parameter <code>k</code> controls how much the top ranks dominate. It requires no score normalisation, works across retrieval algorithms with incompatible score scales, and runs in microseconds.</p>
<p>We built identical hybrid pipelines across SynapseKit 1.4, LangChain 1.2, and LlamaIndex Core 0.14. Same corpus, same query, same task: BM25 + vector, top-3 via RRF. The LoC gap is smaller than the BM25-only benchmark. The configurability gap is not.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-20/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">LoC Across the Series →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Lines of code per framework across all 11 benchmarks - from hello world to hybrid search. See which framework has compounded its lead.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-20/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Code Explorer</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Hybrid Pipeline Code by Framework →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Full BM25 + vector + RRF pipeline code side by side - imports, setup, and retrieval call annotated for each framework.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-20/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Data</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">LoC, RRF Configurability, and Result Overlap →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Lines of code breakdown, configurable RRF parameters per framework, and result overlap across frameworks on an identical hybrid query.</div>
</div>
</a>
</div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-we-measured">What We Measured<a href="https://engineersofai.com/blog/ai-letters-20-hybrid-search#what-we-measured" class="hash-link" aria-label="Direct link to What We Measured" title="Direct link to What We Measured" translate="no">​</a></h2>
<p><strong>Task:</strong> Index 5 documents with both BM25 and vector search, run an identical query through each hybrid retriever, return top-3 results via RRF fusion.</p>





















<table><thead><tr><th>Metric</th><th>What it captures</th></tr></thead><tbody><tr><td>Lines of code</td><td>Code to build and query a hybrid BM25 + vector pipeline</td></tr><tr><td>RRF configurability</td><td>Parameters exposed: weights, k, retriever count</td></tr><tr><td>Result agreement</td><td>Overlap in top-3 results across frameworks</td></tr></tbody></table>
<p><strong>Frameworks:</strong> SynapseKit 1.4, LangChain 1.2, LlamaIndex Core 0.14. Kaggle CPU.
<em>Disclosure: I'm the author of SynapseKit. All code is on Kaggle - fork and run yourself.</em></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-code">The Code<a href="https://engineersofai.com/blog/ai-letters-20-hybrid-search#the-code" class="hash-link" aria-label="Direct link to The Code" title="Direct link to The Code" translate="no">​</a></h2>
<p><strong>SynapseKit - 8 lines (2 imports + 6 functional):</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrieval </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> HybridSearchRetriever</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> Retriever</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> InMemoryVectorStore</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">embeddings </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> SynapsekitEmbeddings</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">emb    </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> SynapsekitEmbeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"all-MiniLM-L6-v2"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> use_gpu</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">False</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">r      </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Retriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">InMemoryVectorStore</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">emb</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">hybrid </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> HybridSearchRetriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">r</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> bm25_weight</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">0.5</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> vector_weight</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">0.5</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> rrf_k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">60</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">hybrid</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">DOCS</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> r</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">DOCS</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> hybrid</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrieve</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">QUERY</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> top_k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>A single <code>HybridSearchRetriever</code> class wraps both modes. <code>bm25_weight</code>, <code>vector_weight</code>, and <code>rrf_k</code> are explicit constructor parameters. Limitation: fixed at two retrievers.</p>
<p><strong>LangChain - 11 lines (4 imports + 7 functional):</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_classic</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrievers</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">ensemble </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> EnsembleRetriever</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrievers </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> BM25Retriever</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">vectorstores </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> InMemoryVectorStore</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">embeddings </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> HuggingFaceEmbeddings</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">emb    </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> HuggingFaceEmbeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model_name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"all-MiniLM-L6-v2"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">vs     </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> InMemoryVectorStore</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">emb</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">vs</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add_texts</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">DOCS</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">bm25   </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> BM25Retriever</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_texts</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">DOCS</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">vec_r  </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> vs</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">as_retriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">search_kwargs</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"k"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">hybrid </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> EnsembleRetriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">retrievers</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">bm25</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> vec_r</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> weights</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0.5</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0.5</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">doc</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">page_content </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> doc </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> hybrid</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">invoke</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">QUERY</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">]</span><br></div></code></pre></div></div>
<p><code>EnsembleRetriever</code> is compositional: pass a list of any retrievers, a matching <code>weights</code> list. Add a third retriever by appending to both lists.</p>
<p><strong>LlamaIndex - 12 lines (4 imports + 8 functional):</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrievers </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> QueryFusionRetriever</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> VectorIndexRetriever</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">core </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Document</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> VectorStoreIndex</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> Settings</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">node_parser </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> SentenceSplitter</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrievers</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">bm25 </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> BM25Retriever</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Settings</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llm </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">nodes  </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> SentenceSplitter</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">chunk_size</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">512</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get_nodes_from_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">             </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">Document</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">text</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">d</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> d </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> DOCS</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">bm25_r </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> BM25Retriever</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_defaults</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">nodes</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">nodes</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> similarity_top_k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">vec_r  </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> VectorIndexRetriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">index</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">VectorStoreIndex</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">nodes</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> similarity_top_k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">fused  </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> QueryFusionRetriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">bm25_r</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> vec_r</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> similarity_top_k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                              num_queries</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> use_async</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">False</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">n</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">text </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> n </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> fused</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrieve</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">QUERY</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">]</span><br></div></code></pre></div></div>
<p>Twelve lines. Node parsing is unavoidable LlamaIndex boilerplate. The RRF k parameter is fixed internally and not exposed.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-numbers">The Numbers<a href="https://engineersofai.com/blog/ai-letters-20-hybrid-search#the-numbers" class="hash-link" aria-label="Direct link to The Numbers" title="Direct link to The Numbers" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework    Imports   Functional   Total</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">──────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit       2           6         8</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain        4           7        11</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex       4           8        12</span><br></div></code></pre></div></div>
<p>The gap is smaller here than in BM25-only (where LangChain won at 3 lines). Hybrid search adds enough setup that the difference compresses. Four lines separate the most concise from the most verbose.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="rrf-configurability">RRF Configurability<a href="https://engineersofai.com/blog/ai-letters-20-hybrid-search#rrf-configurability" class="hash-link" aria-label="Direct link to RRF Configurability" title="Direct link to RRF Configurability" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Parameter             SynapseKit   LangChain    LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">────────────────────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">BM25 weight           Yes          Yes          No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Vector weight         Yes          Yes          No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">RRF k constant        Yes          Yes          No</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Retriever count       2 only       Unlimited    Unlimited</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Async support         Yes          Yes          Yes</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Configurability       4/5          5/5          3/5</span><br></div></code></pre></div></div>
<p>LlamaIndex's <code>QueryFusionRetriever</code> applies equal weighting to all retrievers. There is no <code>weights</code> parameter. If BM25 produces more false positives than vector, you cannot correct for it.</p>
<p>SynapseKit exposes weights and the k constant explicitly. The tradeoff: fixed at two retrievers. You cannot add a sparse retriever or reranker as a third leg.</p>
<p>LangChain is the most flexible. <code>EnsembleRetriever</code> takes <code>weights=[0.3, 0.5, 0.2]</code> for three retrievers. You can mix BM25 + dense + sparse + reranker in one call and tune the contribution of each signal.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="result-overlap">Result Overlap<a href="https://engineersofai.com/blog/ai-letters-20-hybrid-search#result-overlap" class="hash-link" aria-label="Direct link to Result Overlap" title="Direct link to Result Overlap" translate="no">​</a></h2>
<p>Query: <em>"How does hybrid search combine BM25 and vector retrieval?"</em></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Rank   SynapseKit                     LangChain                      LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">────────────────────────────────────────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">#1     Vector search uses dense...    TF-IDF and BM25 both use...    Hybrid search combines...</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">#2     Hybrid search combines...      Vector search uses dense...     Vector search uses dense...</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">#3     BM25 is a probabilistic...     Hybrid search combines...       TF-IDF and BM25 both use...</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Jaccard: LangChain vs SynapseKit 0.75  |  LangChain vs LlamaIndex 0.75  |  LlamaIndex vs SynapseKit 0.50</span><br></div></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-20-hybrid-search#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>LangChain's <code>EnsembleRetriever</code> is the right default for production hybrid search.</strong> Unlimited retriever composition with per-retriever weights is what you want when you're tuning a real pipeline. The extra 3 lines over SynapseKit are worth it.</p>
</li>
<li class="">
<p><strong>LlamaIndex's no-weight limitation is a real constraint.</strong> Equal-weighting RRF works as a starting point. It fails when one retrieval mode dominates false positives and you need to downweight it.</p>
</li>
<li class="">
<p><strong>SynapseKit's single-class API is convenient for the 2-retriever case.</strong> If you're doing standard BM25 + dense and never need a third leg, the explicit <code>bm25_weight</code>, <code>vector_weight</code>, <code>rrf_k</code> API is clean.</p>
</li>
<li class="">
<p><strong>RRF k=60 is not magic.</strong> Lower k amplifies the importance of rank-1 results. Higher k flattens the distribution. Experiment with k in the range 30–100 before assuming 60 is optimal.</p>
</li>
<li class="">
<p><strong>Hybrid search is not free.</strong> You're running two retrieval steps plus a merge. Use <code>asyncio.gather()</code> to run BM25 and vector concurrently - LangChain supports <code>ainvoke()</code> on all its retrievers.</p>
</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-corollary-most-people-miss">The Corollary Most People Miss<a href="https://engineersofai.com/blog/ai-letters-20-hybrid-search#the-corollary-most-people-miss" class="hash-link" aria-label="Direct link to The Corollary Most People Miss" title="Direct link to The Corollary Most People Miss" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Hybrid search architecture choice:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">──────────────────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit    Single class, explicit weights, fixed at 2</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">              + clearest API for standard hybrid</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">              - cannot extend to 3+ retrieval signals</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain     Composable list, per-retriever weights</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">              + most flexible for production tuning</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">              - 3 more lines, more imports to manage</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex    Composable list, equal weights only</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">              + supports unlimited retrievers</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">              - no weight control - blind spot for prod</span><br></div></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-20-hybrid-search#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Add <code>asyncio.gather()</code> to your hybrid retriever.</strong> If you're running BM25 and vector sequentially, you're paying both latencies. Run them concurrently and your hybrid latency drops to the slower of the two, not the sum.</p>
</li>
<li class="">
<p><strong>A/B test RRF k.</strong> Change k from 60 to 30 on a sample of your production queries. Lower k amplifies top-rank signals. Measure precision@3 on both.</p>
</li>
<li class="">
<p><strong>Read the Kaggle notebook.</strong> Full reproducible code, live RRF computation, and result overlap tables: <a href="https://www.kaggle.com/code/misternautiyal/llm-showdown-11-hybrid-search" target="_blank" rel="noopener noreferrer" class="">LLM Showdown #11 - Hybrid Search</a></p>
</li>
</ol>
<hr>
<p>Hybrid search is the standard, not the exception, for production RAG. The frameworks all implement RRF. What they disagree on is how much of the fusion parameters they expose to you. One treats the weights as fixed. One gives you two weights and a k constant. One gives you a weight list as long as your retriever list. That last one is the one you want when you're optimising recall across different query types.</p>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM Engineering</category>
            <category>RAG</category>
            <category>Benchmarks</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #19 - The BM25 Test: One Framework Silently Fails]]></title>
            <link>https://engineersofai.com/blog/ai-letters-19-builtin-bm25</link>
            <guid>https://engineersofai.com/blog/ai-letters-19-builtin-bm25</guid>
            <pubDate>Sun, 05 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[SynapseKit, LangChain, and LlamaIndex adding BM25 keyword retrieval. LangChain ships the class but silently fails at runtime if you haven't installed a separate package it doesn't list as a dependency. SynapseKit bundles it. LlamaIndex makes you ask for it. And LangChain still wins on lines of code.]]></description>
            <content:encoded><![CDATA[<blockquote>
<p>BM25 ships in one framework, requires a hidden install in another, and silently fails at runtime in the third. Same class name, three very different experiences.</p>
</blockquote>
<p>Pure vector search has a blind spot. Exact-match queries - model names, function names, version numbers, proper nouns - embed poorly. The query "GPT-4o" and the document "GPT-4o" don't always produce similar vectors. BM25 does not have this problem. It matches terms, weighs them by rarity, and returns the right document.</p>
<p>Production RAG systems almost always use hybrid search: BM25 for precision on exact matches, vector search for semantic recall, reciprocal rank fusion to merge them. The question of whether BM25 ships out of the box is not academic. It determines whether your pipeline works on day one or fails at 2am in a customer demo.</p>
<p>We tested all three frameworks on an identical task: index five documents, run a BM25 query, get top-3 results. One framework's <code>BM25Retriever</code> class is in its package but silently throws a <code>ModuleNotFoundError</code> at runtime unless you've separately installed a library it doesn't list as a dependency.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-19/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Install Path by Framework →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">What you have to install before BM25 works - extra packages, silent dependencies, and integration packages across all three frameworks.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-19/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Code Explorer</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">BM25 Pipeline Code by Framework →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Full BM25 index + query code side by side - imports, setup, and retrieval call annotated for each framework.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-19/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Data</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">LoC, Extra Installs, and Result Overlap →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Lines of code breakdown, extra packages required, and ranked result comparison for identical query across all three frameworks.</div>
</div>
</a>
</div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-we-measured">What We Measured<a href="https://engineersofai.com/blog/ai-letters-19-builtin-bm25#what-we-measured" class="hash-link" aria-label="Direct link to What We Measured" title="Direct link to What We Measured" translate="no">​</a></h2>
<p><strong>Task:</strong> Index 5 documents, run a BM25 query, return top-3 results. Identical corpus and query across all three frameworks.</p>





















<table><thead><tr><th>Metric</th><th>What it captures</th></tr></thead><tbody><tr><td>Extra packages needed</td><td>Pip installs beyond the base framework install</td></tr><tr><td>Lines of code</td><td>Import + functional lines to build and query a BM25 index</td></tr><tr><td>Result quality</td><td>Top-3 docs returned for an identical keyword query</td></tr></tbody></table>
<p><strong>Frameworks:</strong> SynapseKit 1.4, LangChain 1.2, LlamaIndex Core 0.14. Kaggle CPU.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-install-story">The Install Story<a href="https://engineersofai.com/blog/ai-letters-19-builtin-bm25#the-install-story" class="hash-link" aria-label="Direct link to The Install Story" title="Direct link to The Install Story" translate="no">​</a></h2>
<p>Before a single line of BM25 code runs, you need the right packages.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework    Base install              Extra needed              Behavior</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">──────────────────────────────────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit   pip install synapsekit    none                      Works immediately</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain    pip install langchain     pip install rank-bm25     Silent runtime fail if missing</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">             langchain-community</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex   pip install llama-index-core  pip install           ImportError at import time</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                                      llama-index-retrievers-bm25</span><br></div></code></pre></div></div>
<p>LangChain's behavior is the most dangerous. <code>BM25Retriever</code> lives in <code>langchain-community</code>. The import succeeds. The class is there. But when you call <code>BM25Retriever.from_texts()</code>, it raises <code>ModuleNotFoundError: No module named 'rank_bm25'</code> - a runtime error, not an import error. Your code passes linting, passes static analysis, and fails in production.</p>
<p>LlamaIndex fails at import time - <code>from llama_index.retrievers.bm25 import BM25Retriever</code> - which is the honest failure mode. You find out immediately.</p>
<p>SynapseKit declares <code>rank-bm25</code> as a core dependency in its pip metadata. It installs with the base package. Nothing extra to do.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-code">The Code<a href="https://engineersofai.com/blog/ai-letters-19-builtin-bm25#the-code" class="hash-link" aria-label="Direct link to The Code" title="Direct link to The Code" translate="no">​</a></h2>
<p><strong>LangChain - 3 lines (1 import + 2 functional):</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_community</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrievers </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> BM25Retriever</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">r       </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> BM25Retriever</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_texts</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">DOCS</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">doc</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">page_content </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> doc </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> r</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">invoke</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">QUERY</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">]</span><br></div></code></pre></div></div>
<p>The cleanest BM25 API across all three. <code>from_texts()</code> takes a list of strings, <code>invoke()</code> returns <code>Document</code> objects. Three lines total.</p>
<p><strong>SynapseKit - 8 lines (2 imports + 6 functional):</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrieval </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> HybridSearchRetriever</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> Retriever</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> InMemoryVectorStore</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">embeddings </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> SynapsekitEmbeddings</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">emb    </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> SynapsekitEmbeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"all-MiniLM-L6-v2"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> use_gpu</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">False</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">r      </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Retriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">InMemoryVectorStore</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">emb</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">hybrid </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> HybridSearchRetriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">r</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> bm25_weight</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">1.0</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> vector_weight</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">0.0</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">hybrid</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">DOCS</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> r</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">DOCS</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> hybrid</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrieve</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">QUERY</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> top_k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>SynapseKit's BM25 is hybrid-first. There is no standalone keyword retriever - BM25 lives inside <code>HybridSearchRetriever</code> with <code>bm25_weight=1.0</code>. This means you initialise an embedding model and a vector store even when you only want keyword search. The embedding model never runs (weight is 0), but the object must exist. Eight lines for something that should be three.</p>
<p><strong>LlamaIndex - 9 lines (3 imports + 6 functional):</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrievers</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">bm25 </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> BM25Retriever</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">core </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Document</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> Settings</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">node_parser </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> SentenceSplitter</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Settings</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">llm </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Settings</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">embed_model </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">nodes   </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> SentenceSplitter</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">chunk_size</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">512</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get_nodes_from_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">Document</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">text</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">d</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> d </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> DOCS</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">r       </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> BM25Retriever</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_defaults</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">nodes</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">nodes</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> similarity_top_k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">n</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">text </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> n </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> r</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">retrieve</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">QUERY</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">]</span><br></div></code></pre></div></div>
<p>Three imports, two explicit <code>None</code> assignments to suppress LLM/embedding warnings, and a node parsing step before the retriever can be initialised. Nine lines, most of it overhead suppression.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-results">The Results<a href="https://engineersofai.com/blog/ai-letters-19-builtin-bm25#the-results" class="hash-link" aria-label="Direct link to The Results" title="Direct link to The Results" translate="no">​</a></h2>
<p>Query: <em>"How does BM25 compare to TF-IDF?"</em></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Rank   SynapseKit                    LangChain                     LlamaIndex</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">───────────────────────────────────────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">#1     TF-IDF weights terms...       RAG feeds retrieved passages   TF-IDF weights terms...</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">#2     RAG feeds retrieved passages  BM25 is a probabilistic...     BM25 is a probabilistic...</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">#3     Hybrid search combines...     Hybrid search combines...      Hybrid search combines...</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Result overlap (Jaccard): 0.50 across all pairs (2/3 shared each)</span><br></div></code></pre></div></div>
<p>All three retrieve the same 3 documents from a 5-document corpus - they differ only on ranking order. That is expected: all three use BM25Okapi from the <code>rank_bm25</code> library under the hood. Different tokenisation details shift the ranking slightly, but the relevant documents are the same.</p>
<p>The result quality question is a non-issue for BM25. What matters is whether it runs at all.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-19-builtin-bm25#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>LangChain's silent runtime failure is a production hazard.</strong> A <code>ModuleNotFoundError</code> inside <code>from_texts()</code> - not at import time - means it passes every pre-deploy check that doesn't exercise the retrieval path. Add <code>rank-bm25</code> to your requirements file explicitly, always.</p>
</li>
<li class="">
<p><strong>SynapseKit's hybrid-first design costs you 5 extra lines for pure keyword search.</strong> If you only want BM25, you're initialising an embedding model that never runs. The zero-install story is real; the ergonomics for standalone BM25 are not great.</p>
</li>
<li class="">
<p><strong>LlamaIndex's explicit install is the honest design.</strong> A separate package for BM25 means the base install stays small. The tradeoff is one more <code>pip install</code> you have to know about - but at least it fails at import time, not at 2am in production.</p>
</li>
<li class="">
<p><strong>In practice, you want hybrid search, not pure BM25.</strong> Pure BM25 as a benchmark is useful; as a production retriever it leaves semantic recall on the table. The real question is which framework makes hybrid search (BM25 + vector + RRF) easiest to configure - that's the next benchmark.</p>
</li>
<li class="">
<p><strong>All three use the same BM25 algorithm.</strong> BM25Okapi from <code>rank_bm25</code> is the de facto standard implementation in Python. The retrieval quality differences you see in production are almost never about the BM25 implementation - they're about tokenisation, stemming, and stopword handling that sits on top of it.</p>
</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-corollary-most-people-miss">The Corollary Most People Miss<a href="https://engineersofai.com/blog/ai-letters-19-builtin-bm25#the-corollary-most-people-miss" class="hash-link" aria-label="Direct link to The Corollary Most People Miss" title="Direct link to The Corollary Most People Miss" translate="no">​</a></h2>
<p>The install story matters more than the LoC story for BM25.</p>
<p>LangChain wins on lines of code (3 vs 8 vs 9). But a 3-line retriever that silently fails in production is worth less than an 8-line retriever that works. The ergonomics cost of SynapseKit's hybrid-first design is real - you shouldn't have to initialise embeddings to do keyword search - but at least it doesn't fail on you.</p>
<p>LlamaIndex's approach is the cleanest philosophically: BM25 is a separate concern, it lives in a separate package, the failure mode is immediate and visible. The ergonomics in code are the worst, but the operational behaviour is the most honest.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Design philosophy comparison:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">────────────────────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit    BM25 bundled, hybrid-first API</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">              ✓ zero extra installs</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">              ✗ cannot do standalone BM25 without embedding overhead</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain     BM25 class included, dependency external</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">              ✓ cleanest API (3 lines)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">              ✗ silent runtime failure if rank-bm25 not installed</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex    BM25 in separate package, explicit install</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">              ✓ honest failure mode (import error, not runtime error)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">              ✗ most verbose (9 lines + Settings suppression)</span><br></div></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-19-builtin-bm25#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Audit your requirements file.</strong> If you use LangChain's <code>BM25Retriever</code>, confirm <code>rank-bm25</code> is in your <code>requirements.txt</code> or <code>pyproject.toml</code>. The import succeeds without it; the runtime doesn't.</p>
</li>
<li class="">
<p><strong>Run a hybrid retrieval experiment on your existing RAG pipeline.</strong> Add BM25 alongside your vector search, fuse with reciprocal rank fusion, measure precision@3 on 20 representative queries. Most teams see 10–25% improvement on exact-match queries with no change to the embedding model.</p>
</li>
<li class="">
<p><strong>Read the Kaggle notebook.</strong> Full reproducible code, the live ranked results, and the result overlap analysis: <a href="https://www.kaggle.com/code/misternautiyal/llm-showdown-10-builtin-bm25" target="_blank" rel="noopener noreferrer" class="">LLM Showdown #10 - Built-in BM25</a></p>
</li>
</ol>
<hr>
<p>BM25 is 35 years old and still in production at Google, Elasticsearch, and every search system that handles exact-match queries. The question was never whether to use it. The question was whether your framework ships it without surprises. One does. One requires a hidden install. One fails silently at runtime. Now you know which is which.</p>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM Engineering</category>
            <category>RAG</category>
            <category>Benchmarks</category>
        </item>
        <item>
            <title><![CDATA[AI Letters #18 - The Chunking Test: Two Frameworks Are Identical, One Is Not]]></title>
            <link>https://engineersofai.com/blog/ai-letters-18-chunking-strategies</link>
            <guid>https://engineersofai.com/blog/ai-letters-18-chunking-strategies</guid>
            <pubDate>Sat, 04 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[SynapseKit, LangChain, and LlamaIndex splitting the same document with identical parameters. The line counts are almost the same. The chunk counts are not - and that gap explains why chunking is the step most tutorials skip past too quickly.]]></description>
            <content:encoded><![CDATA[<blockquote>
<p>How you split documents determines what your retriever finds. Most tutorials spend two lines on this. They shouldn't.</p>
</blockquote>
<p>Every RAG tutorial reaches the chunking step and sprints past it. "Split into chunks of 500 characters with 50 overlap - done." The code runs. The demo works. The demo is not production.</p>
<p>The split you choose affects embedding quality, retrieval precision, and whether your LLM gets enough context to say something useful. Chunking is not configuration. It's architecture.</p>
<p>We ran all three frameworks against the same document with identical parameters. The line counts came out nearly equal. The chunk outputs did not. One framework's default splitter interprets <code>chunk_size=300</code> as tokens, not characters - producing 2 chunks averaging 986 characters each instead of 12 chunks averaging 163 characters. Same parameter name, different semantics.</p>
<!-- -->
<div style="display:flex;flex-direction:column;gap:16px;margin:24px 0">
<a href="https://engineersofai.com/blog-visuals/ai-letters-18/timeline.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#6366f1;margin-bottom:6px">Interactive Chart</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Splitter Inventory by Framework →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">All built-in splitters across SynapseKit, LangChain, and LlamaIndex - what ships out of the box and what each one is for.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-18/paradigms.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#0ea5e9;margin-bottom:6px">Code Explorer</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Full Chunking Code by Framework →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Select a framework to see the complete chunking pipeline with line-by-line annotation: imports, splitter config, and output.</div>
</div>
</a>
<a href="https://engineersofai.com/blog-visuals/ai-letters-18/evidence.html" target="_blank" rel="noopener noreferrer" style="text-decoration:none" class="">
<div style="border:1px solid #e2e8f0;border-radius:12px;padding:20px 24px;background:#fff;cursor:pointer">
<div style="font-size:0.72rem;font-weight:700;letter-spacing:0.08em;text-transform:uppercase;color:#10b981;margin-bottom:6px">Data</div>
<div style="font-size:1rem;font-weight:700;color:#0f172a;margin-bottom:6px">Chunk Count, Size Distribution, and LoC →</div>
<div style="font-size:0.88rem;color:#475569;line-height:1.6">Live chunk output from all three frameworks on identical input - count, average size, max size, and size histogram side by side.</div>
</div>
</a>
</div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-we-measured">What We Measured<a href="https://engineersofai.com/blog/ai-letters-18-chunking-strategies#what-we-measured" class="hash-link" aria-label="Direct link to What We Measured" title="Direct link to What We Measured" translate="no">​</a></h2>
<p><strong>Task:</strong> Split a 1,972-character document about RAG systems into chunks.
<strong>Parameters:</strong> <code>chunk_size=300, chunk_overlap=30</code>, sentence-aware splitter for each framework.
<strong>Metrics:</strong></p>





















<table><thead><tr><th>Metric</th><th>What it captures</th></tr></thead><tbody><tr><td>Built-in splitter count</td><td>How many strategies ship out of the box</td></tr><tr><td>Lines of code</td><td>How much code to configure sentence-aware chunking</td></tr><tr><td>Chunk output</td><td>Count, avg size, size distribution from identical input</td></tr></tbody></table>
<p><strong>Frameworks:</strong> SynapseKit 1.4, LangChain 1.2, LlamaIndex Core 0.14. Kaggle CPU environment.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-splitter-inventory">The Splitter Inventory<a href="https://engineersofai.com/blog/ai-letters-18-chunking-strategies#the-splitter-inventory" class="hash-link" aria-label="Direct link to The Splitter Inventory" title="Direct link to The Splitter Inventory" translate="no">​</a></h2>
<p>Before measuring LoC, count what each framework ships.</p>
<p><strong>SynapseKit - 2 splitters:</strong></p>
<ul>
<li class=""><code>RecursiveTextChunker</code> - recursive character splitting (default)</li>
<li class=""><code>TokenChunker</code> - token-count-based splitting</li>
</ul>
<p><strong>LangChain - 8 splitters:</strong></p>
<ul>
<li class=""><code>RecursiveCharacterTextSplitter</code> - recursive character splitting (recommended default)</li>
<li class=""><code>CharacterTextSplitter</code> - single-separator character splitting</li>
<li class=""><code>TokenTextSplitter</code> - token-count splitting</li>
<li class=""><code>SentenceTransformersTokenTextSplitter</code> - sentence-transformer token splitting</li>
<li class=""><code>MarkdownTextSplitter</code> - markdown-header-aware splitting</li>
<li class=""><code>PythonCodeTextSplitter</code> - Python AST-aware splitting</li>
<li class=""><code>HTMLSectionSplitter</code> - HTML section-aware splitting</li>
<li class=""><code>SemanticChunker</code> - embedding-based semantic splitting (langchain-experimental)</li>
</ul>
<p><strong>LlamaIndex - 9 splitters:</strong></p>
<ul>
<li class=""><code>SentenceSplitter</code> - sentence-aware splitting (default)</li>
<li class=""><code>TokenTextSplitter</code> - token-count splitting</li>
<li class=""><code>CodeSplitter</code> - language-aware code splitting</li>
<li class=""><code>MarkdownNodeParser</code> - markdown-header-aware splitting</li>
<li class=""><code>JSONNodeParser</code> - JSON-structure-aware splitting</li>
<li class=""><code>SentenceWindowNodeParser</code> - sentence with surrounding context window</li>
<li class=""><code>HierarchicalNodeParser</code> - multi-level hierarchical chunks</li>
<li class=""><code>SemanticSplitterNodeParser</code> - embedding-based semantic splitting</li>
<li class=""><code>TopicNodeParser</code> - topic-model-based splitting</li>
</ul>
<p>Two vs eight vs nine. The headline number is misleading though - what matters is whether the advanced splitters solve problems you'll actually encounter. We'll come back to this.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-code-side-by-side">The Code, Side by Side<a href="https://engineersofai.com/blog/ai-letters-18-chunking-strategies#the-code-side-by-side" class="hash-link" aria-label="Direct link to The Code, Side by Side" title="Direct link to The Code, Side by Side" translate="no">​</a></h2>
<p><strong>SynapseKit - 5 lines (1 import, 4 functional):</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> synapsekit </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Retriever</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> InMemoryVectorStore</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> SynapsekitEmbeddings</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">emb </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> SynapsekitEmbeddings</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"all-MiniLM-L6-v2"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> use_gpu</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">False</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">r   </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Retriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">InMemoryVectorStore</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">emb</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                chunk_size</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">300</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> chunk_overlap</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">30</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> r</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">DOCUMENT</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>No standalone splitter API. Chunk parameters live on the <code>Retriever</code>. If you want to inspect chunks before indexing, you can't - the split is opaque.</p>
<p><strong>LangChain - 4 lines (1 import, 3 functional):</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> langchain_text_splitters </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> RecursiveCharacterTextSplitter</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">chunks </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> RecursiveCharacterTextSplitter</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    chunk_size</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">300</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> chunk_overlap</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">30</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">split_text</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">DOCUMENT</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>The cleanest interface of the three. Splitter is a standalone object, inspectable, composable. You can run it before indexing, log the output, swap it for any other splitter without touching the rest of your pipeline.</p>
<p><strong>LlamaIndex - 6 lines (2 imports, 4 functional):</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">node_parser </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> SentenceSplitter</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">core </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Document</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">nodes  </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> SentenceSplitter</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    chunk_size</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">300</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> chunk_overlap</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">30</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get_nodes_from_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">Document</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">text</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">DOCUMENT</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">chunks </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">n</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">text </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> n </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> nodes</span><span class="token punctuation" style="color:#393A34">]</span><br></div></code></pre></div></div>
<p>Two imports instead of one. Text must be wrapped in a <code>Document</code> object. Output is <code>Node</code> objects, not strings - you extract <code>.text</code> yourself. More verbose, but the <code>Node</code> carries metadata (source, position, relationships) that becomes useful downstream.</p>
<p>Line count totals: SynapseKit 5 / LangChain 4 / LlamaIndex 6. Effectively tied.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-chunk-output-surprise">The Chunk Output Surprise<a href="https://engineersofai.com/blog/ai-letters-18-chunking-strategies#the-chunk-output-surprise" class="hash-link" aria-label="Direct link to The Chunk Output Surprise" title="Direct link to The Chunk Output Surprise" translate="no">​</a></h2>
<p>Same document. Same <code>chunk_size=300</code>. Same <code>chunk_overlap=30</code>. Here is what each framework produced:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">Framework    Chunks   Avg size (chars)   Max size (chars)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">──────────────────────────────────────────────────────────</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit      12         163               254</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain       12         163               254</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex       2         986              1481</span><br></div></code></pre></div></div>
<p>SynapseKit and LangChain are identical - SynapseKit uses the same recursive character algorithm under the hood. LlamaIndex produced 2 chunks averaging 986 characters each.</p>
<p>The reason: LlamaIndex's <code>SentenceSplitter</code> interprets <code>chunk_size</code> as <strong>tokens</strong>, not characters. <code>chunk_size=300</code> means 300 tokens, which is roughly 1,200 characters. On a 1,972-character document, that yields 2 chunks - not the 12 you'd expect if you assumed character-based sizing.</p>
<p>This is not a bug. It is the documented behavior. But it is also the most common source of confusion when engineers switch frameworks mid-project. You copy the parameters from a LangChain tutorial, paste them into LlamaIndex, and your chunk distribution changes by an order of magnitude without a single error message.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">                    chunk_size=300 means...</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                    ┌──────────────────────────────────┐</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit          │ 300 characters                   │</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LangChain           │ 300 characters                   │</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex          │ 300 tokens (~1,200 characters)   │</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                    └──────────────────────────────────┘</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">Same document, chunk_size=300, overlap=30:</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">SynapseKit/LangChain:  [163][163][163][163][163][163][163][163][163][163][163][163]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">LlamaIndex:            [──────────────────1481──────────────────][───490───]</span><br></div></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-llamaindexs-advanced-splitters-actually-do">What LlamaIndex's Advanced Splitters Actually Do<a href="https://engineersofai.com/blog/ai-letters-18-chunking-strategies#what-llamaindexs-advanced-splitters-actually-do" class="hash-link" aria-label="Direct link to What LlamaIndex's Advanced Splitters Actually Do" title="Direct link to What LlamaIndex's Advanced Splitters Actually Do" translate="no">​</a></h2>
<p>The splitter count difference is real, but two of LlamaIndex's nine entries represent strategies with no equivalent in the other two frameworks.</p>
<p><strong><code>SentenceWindowNodeParser</code>:</strong> Stores each sentence as an individual node. Attaches the surrounding sentences as a metadata window (configurable, default: 1 sentence each side). At retrieval time, you search against precise single-sentence embeddings - high precision. At generation time, you expand the retrieved node to its window - adequate context. The result is retrieval that finds the exact sentence you need without diluting the embedding with surrounding text. Neither LangChain nor SynapseKit has a built-in equivalent.</p>
<p><strong><code>HierarchicalNodeParser</code>:</strong> Creates three levels of nodes from the same document: large (2048 tokens), medium (512), small (128). Small nodes are indexed for retrieval. When retrieval returns too many small nodes from the same parent (configurable threshold), they are "automerged" into the parent chunk before being sent to the LLM. You get the precision of small chunks with the coherence of large ones. This is a production technique; the LlamaIndex documentation attributes meaningful accuracy gains to it on multi-hop questions.</p>
<p>Switching between LlamaIndex's splitters costs one line - the import:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># One import change. Everything else stays identical.</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">node_parser </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> SentenceSplitter           </span><span class="token comment" style="color:#999988;font-style:italic"># default</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">node_parser </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> SentenceWindowNodeParser   </span><span class="token comment" style="color:#999988;font-style:italic"># precision mode</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> llama_index</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">core</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">node_parser </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> HierarchicalNodeParser     </span><span class="token comment" style="color:#999988;font-style:italic"># production mode</span><br></div></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-engineers">What This Means for Engineers<a href="https://engineersofai.com/blog/ai-letters-18-chunking-strategies#what-this-means-for-engineers" class="hash-link" aria-label="Direct link to What This Means for Engineers" title="Direct link to What This Means for Engineers" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Never copy chunk parameters across frameworks without checking the unit.</strong> <code>chunk_size=500</code> in LangChain is characters. In LlamaIndex it is tokens. Verify once, avoid a silent quality regression.</p>
</li>
<li class="">
<p><strong>SentenceWindowNodeParser is worth understanding even if you don't use LlamaIndex.</strong> The pattern - retrieve at sentence granularity, generate with window context - is implementable in any framework manually. LlamaIndex just makes it one import.</p>
</li>
<li class="">
<p><strong>HierarchicalNodeParser solves a real production problem.</strong> When retrieval returns five fragments of the same paragraph as separate nodes, your LLM is reading five partial views of the same text. Automerging collapses them into the parent. This is not theoretical - it matters on documents with repeated cross-references.</p>
</li>
<li class="">
<p><strong>SynapseKit's 2 splitters are a constraint when you need format-aware splitting.</strong> If your corpus includes Markdown docs, Python files, and HTML pages, you need a splitter that understands structure. LangChain and LlamaIndex have these. SynapseKit does not.</p>
</li>
<li class="">
<p><strong>LangChain's standalone splitter API is the most flexible for debugging.</strong> Because chunking is decoupled from the vector store, you can log chunk distributions before committing to an indexing run. In production, that observability pays back quickly.</p>
</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-corollary-most-people-miss">The Corollary Most People Miss<a href="https://engineersofai.com/blog/ai-letters-18-chunking-strategies#the-corollary-most-people-miss" class="hash-link" aria-label="Direct link to The Corollary Most People Miss" title="Direct link to The Corollary Most People Miss" translate="no">​</a></h2>
<p>The line counts say LangChain is cheapest (4 lines), LlamaIndex most expensive (6), SynapseKit in the middle (5). That is the wrong frame.</p>
<p>The actual cost comparison is: <strong>how many lines does it take to switch splitters when your default stops working?</strong></p>
<p>For LlamaIndex: one line (the import). All nine splitters share the same <code>get_nodes_from_documents()</code> interface.</p>
<p>For LangChain: also roughly one line - all splitters expose <code>.split_text()</code> or <code>.split_documents()</code>.</p>
<p>For SynapseKit: you cannot switch splitters. The chunking algorithm is not exposed. You take what the <code>Retriever</code> does internally, or you switch frameworks.</p>
<p>Initial LoC favors LangChain. Iteration cost favors LlamaIndex. Lock-in risk penalizes SynapseKit.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="three-things-worth-doing-this-week">Three Things Worth Doing This Week<a href="https://engineersofai.com/blog/ai-letters-18-chunking-strategies#three-things-worth-doing-this-week" class="hash-link" aria-label="Direct link to Three Things Worth Doing This Week" title="Direct link to Three Things Worth Doing This Week" translate="no">​</a></h2>
<ol>
<li class="">
<p><strong>Print your chunk distribution before indexing.</strong> <code>[len(c) for c in chunks]</code> - histogram it. If 20% of your chunks are under 50 characters, your splitter is cutting at punctuation. If 20% exceed your embedding model's token limit, they're being silently truncated.</p>
</li>
<li class="">
<p><strong>Test LlamaIndex's <code>SentenceWindowNodeParser</code> on one of your existing retrievers.</strong> The interface is one import and one additional retrieval step. If your current precision is poor, sentence-level retrieval with window expansion frequently outperforms standard chunking without any change to the embedding model.</p>
</li>
<li class="">
<p><strong>Read the Kaggle notebook.</strong> Full reproducible code for all three frameworks, live chunk outputs, and the size distribution charts: <a href="https://www.kaggle.com/code/misternautiyal/llm-showdown-9-chunking-strategies" target="_blank" rel="noopener noreferrer" class="">LLM Showdown #9 - Chunking Strategies</a></p>
</li>
</ol>
<hr>
<p>Chunking determines what your retriever can find. The tutorials that sprint past it in two lines are the same tutorials whose RAG demos fall apart on real documents. The split is not configuration. It is the first decision that determines whether your retrieval is precise or lucky.</p>
<p><em>Engineers of AI</em></p>
<p><strong>Read more: <a href="https://www.engineersofai.com/" target="_blank" rel="noopener noreferrer" class="">www.engineersofai.com</a></strong></p>
<p><em>If this was useful, forward it to one engineer who should be reading it.</em></p>]]></content:encoded>
            <category>AI Letters</category>
            <category>LLM Engineering</category>
            <category>RAG</category>
            <category>Benchmarks</category>
        </item>
    </channel>
</rss>