Which Web Framework Is Actually the Fastest? We Benchmarked 9 of Them.
When we started benchmarking ntnt against other web frameworks, the results were humbling. Our single-threaded interpreter was doing 118K req/sec on plaintext (respectable!) but only 8,371 req/sec on database queries. FastAPI was doing 37K. Gin was doing 130K. We had work to do.
After some work under the hood, ntnt v0.4.2 hits 302K req/sec on plaintext and 37.8K on database queries — a 4.5× improvement on real-world workloads. This is the story of how we got there, and what we learned about making an interpreted language compete with compiled frameworks.
The Benchmark Suite
Before optimizing anything, we needed to know what "fast" meant. We built a benchmark suite modeled after TechEmpower, testing seven workloads that cover the spectrum from pure HTTP overhead to heavy database rendering:
- Plaintext — `GET /plaintext` → "Hello, World!" (pure HTTP overhead)
- JSON — `GET /json` → serialize a JSON object
- Route params — `GET /users/:id` → router pattern matching
- JSON body — `POST /json` → parse + echo a 1KB body
- DB single — `GET /db` → one random PostgreSQL query
- DB multi — `GET /queries?count=20` → 20 sequential queries
- Template — `GET /template` → 10 queries + HTML render
Every framework runs the same seven endpoints, written the way that framework's own docs recommend. Nobody got special treatment. All of them hit the same PostgreSQL instance with the same 10,000-row table. We used wrk for load testing: 4 threads, 100 connections, 15 seconds per run, and took the median of 3 runs. We tested nine frameworks across six languages.
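For reference, each measurement used a wrk invocation of this shape (assuming the server under test listens on localhost:8080; adjust host, port, and path per framework):

```shell
# 4 threads, 100 connections, 15 seconds per run; repeat 3x and take the median
wrk -t4 -c100 -d15s http://localhost:8080/plaintext
```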
Where We Started: The Single-Thread Bottleneck
ntnt's architecture is unusual. It's an interpreted language with a Rust runtime — the server infrastructure (Axum, Tokio, connection handling) is compiled Rust, but the application logic (route handlers, templates, middleware) runs through ntnt's tree-walking interpreter. In v0.4.1, all of that interpretation happened on a single thread.
For plaintext and JSON responses, this was fine. The interpreter evaluated a handler, returned a string, and Axum sent it back. At 118K req/sec, we were competitive with Hono on Bun. But the moment a handler touched the database, the single thread became a funnel. Every request queued behind every other request's database I/O. The latency numbers told the story: 11.94ms average on a single DB query, when the query itself takes microseconds.
The 20-query benchmark was brutal: 457 req/sec with 217ms average latency. The interpreter was spending almost all its time waiting on PostgreSQL, and nothing else could run while it waited.
Fix #1: Async Database Connection Pool
The first fix was the database layer. ntnt v0.4.1 used synchronous postgres::Client — one connection, blocking I/O, no concurrency. We replaced it with deadpool-postgres, an async connection pool backed by Tokio.
The pool manages up to 20 connections with lazy allocation. A dedicated 4-thread Tokio runtime (DB_RUNTIME) bridges the synchronous interpreter world into async database operations. When a handler calls db.query(), the interpreter hands the query to the async runtime, which dispatches it on a pooled connection, and the result comes back without blocking the entire server.
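The handoff pattern can be sketched with plain std channels standing in for Tokio and deadpool (`DbRequest`, `spawn_db_runtime`, and `query_blocking` are illustrative names, not ntnt's actual API): a dedicated thread owns the I/O, and a synchronous handler blocks only on its own reply channel, never on the server's main path.

```rust
use std::sync::mpsc;
use std::thread;

// A request to the "DB runtime": the query text plus a channel to reply on.
struct DbRequest {
    sql: String,
    reply: mpsc::Sender<String>,
}

/// Spawn the dedicated I/O thread; returns the handle handlers send work on.
/// In ntnt this role is played by a 4-thread Tokio runtime driving deadpool.
fn spawn_db_runtime() -> mpsc::Sender<DbRequest> {
    let (tx, rx) = mpsc::channel::<DbRequest>();
    thread::spawn(move || {
        for req in rx {
            // Real code would await the query on a pooled async connection here.
            let _ = req.reply.send(format!("rows for `{}`", req.sql));
        }
    });
    tx
}

/// What a synchronous handler does: hand the query off, block only on the reply.
fn query_blocking(db: &mpsc::Sender<DbRequest>, sql: &str) -> String {
    let (reply_tx, reply_rx) = mpsc::channel();
    db.send(DbRequest { sql: sql.to_string(), reply: reply_tx }).unwrap();
    reply_rx.recv().unwrap()
}

fn main() {
    let db = spawn_db_runtime();
    println!("{}", query_blocking(&db, "SELECT * FROM world WHERE id = 42"));
}
```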
This alone was a huge win for DB-heavy workloads, but the single interpreter thread was still the bottleneck for everything else.
Fix #2: Worker Pool
The bigger architectural change was the worker pool. Instead of one interpreter handling all requests, ntnt v0.4.2 spawns N worker threads (one per CPU core, capped at 8), each running its own Interpreter instance. Incoming requests flow through a flume MPMC channel to whichever worker is free.
Each worker evaluates the .tnt source file on startup (in Worker mode, which registers routes but skips listen()), then enters a request loop. Workers skip hot-reload entirely, so there are no filesystem checks on every request. That overhead only matters during development anyway.
The combination of worker pool + async DB pool is multiplicative. Eight workers can each dispatch database queries concurrently, and the connection pool handles the fan-out to PostgreSQL. The pipeline goes from "one thread waiting on one connection" to "eight threads dispatching across twenty connections."
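The fan-out shape looks roughly like this, with std's channel behind a mutex where ntnt uses flume's natively multi-consumer channel (function names are illustrative):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Spawn `n` workers pulling requests off a shared queue; each would own its
/// own Interpreter instance (here just a worker id). Returns the submit side
/// and a channel the workers send responses back on.
fn spawn_workers(n: usize) -> (mpsc::Sender<String>, mpsc::Receiver<String>) {
    let (req_tx, req_rx) = mpsc::channel::<String>();
    // std's Receiver is single-consumer, so share it behind a Mutex;
    // flume's MPMC channel needs no such lock.
    let req_rx = Arc::new(Mutex::new(req_rx));
    let (resp_tx, resp_rx) = mpsc::channel::<String>();
    for id in 0..n {
        let rx = Arc::clone(&req_rx);
        let tx = resp_tx.clone();
        thread::spawn(move || loop {
            let req = match rx.lock().unwrap().recv() {
                Ok(r) => r,
                Err(_) => break, // queue closed: shut this worker down
            };
            // Each worker would evaluate the route handler here.
            let _ = tx.send(format!("worker {id} handled {req}"));
        });
    }
    (req_tx, resp_rx)
}

fn main() {
    let (tx, rx) = spawn_workers(4);
    for i in 0..8 {
        tx.send(format!("GET /users/{i}")).unwrap();
    }
    drop(tx); // close the queue so workers exit when drained
    for resp in rx.iter().take(8) {
        println!("{resp}");
    }
}
```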
The Results
Here's what the v0.4.1 → v0.4.2 progression looks like:
HTTP Benchmarks — v0.4.1 vs v0.4.2
Pure throughput tests (no database). Worker pool parallelism is the main driver.
Database Benchmarks — v0.4.1 vs v0.4.2
DB-heavy workloads saw the biggest gains: workers × async connection pool.
DB-heavy workloads saw the biggest gains because they suffered the most from the single-thread bottleneck. Plaintext improved 2.6× (8 workers instead of 1), while database workloads improved up to 5.1× (workers × connection pool concurrency).
How ntnt Compares to 8 Other Frameworks
Raw numbers don't mean much without context. We benchmarked nine frameworks across six languages, from compiled systems languages (Rust, Go) to interpreted stacks (Python, Ruby, Node.js, Bun). Here's the full picture:
Plaintext — GET /plaintext
Pure HTTP overhead. No I/O, no serialization — just how fast can you return a string.
JSON Serialization — GET /json
Serialize a simple object and return it. Tests JSON encoding + HTTP.
Single DB Query — GET /db
One random PostgreSQL row, serialized as JSON. The real-world baseline.
20 Queries — GET /queries?count=20
20 sequential random queries per request. Stress-tests connection handling.
Template Rendering — GET /template
10 DB queries + HTML table. The closest thing to a real page render.
What the Numbers Mean
A few things jump out from the full comparison:
ntnt is the fastest interpreted language for HTTP-only work. At 302K req/sec plaintext, ntnt outperforms FastAPI (174K), Hono/Bun (118K), Fastify (81K), and everything else that isn't compiled. The only frameworks ahead of it are Actix and Gin, which compile directly to native code — you'd expect them to be faster.
The interpreter overhead is measurable but not disqualifying. Actix does 477K to ntnt's 302K on plaintext — a 1.6× gap. That's the cost of tree-walking interpretation versus compiled Rust, and it's honestly smaller than we expected. For DB workloads, the gap narrows further because the bottleneck shifts to PostgreSQL.
Fastify confirms Express is the bottleneck, not Node.js. Fastify on the same Node.js runtime does 81K plaintext where Express does 18K — a 4.5× difference. If you're building on Node.js, framework choice matters more than runtime performance.
Django's sync story is painful. Django with gunicorn and psycopg2 manages only 385 req/sec on single DB queries — versus 37K for FastAPI on the same language. The difference is entirely async vs sync. Django's ORM and synchronous worker model create a 100× penalty on database workloads. An async Django setup (ASGI + asyncpg) would perform significantly better, but that's not what most Django apps are running.
Rails is slower than Express on plaintext but competitive on DB. Puma's request overhead (11K) is heavier than Node's event loop (18K), but ActiveRecord's connection pooling keeps Rails competitive at 7.3K for database queries. Ruby's story is similar to Python's: the framework architecture matters more than the language speed.
What We Shipped
The v0.4.2 release includes:
- Worker pool — N interpreter threads via flume MPMC channel, auto-scaled to CPU count
- Async DB — deadpool-postgres connection pool (max 20, lazy allocation)
- Transaction support — `begin()`, `commit()`, `rollback()` with connection pinning
- Consistent Result types — all DB operations return `Result<T, String>`
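Connection pinning is the key detail: a transaction must check out one connection and hold it for its whole lifetime, so `begin()`, `commit()`, and `rollback()` all hit the same PostgreSQL session. A toy sketch of that ownership pattern, with integers standing in for real connections (types and fields are illustrative, not ntnt's internals):

```rust
use std::sync::{Arc, Mutex};

/// A toy pool: connections are just ids. deadpool hands out real clients.
struct Pool {
    free: Mutex<Vec<u32>>,
}

/// A transaction pins one connection by owning it until commit/rollback.
struct Transaction {
    conn: u32,
    pool: Arc<Pool>,
}

impl Pool {
    fn begin(pool: &Arc<Pool>) -> Result<Transaction, String> {
        // Real code would issue `BEGIN` on this connection here.
        let conn = pool.free.lock().unwrap().pop().ok_or("pool exhausted")?;
        Ok(Transaction { conn, pool: Arc::clone(pool) })
    }
}

impl Transaction {
    fn commit(self) -> Result<(), String> {
        // Real code would issue `COMMIT`, then release the pinned connection.
        self.pool.free.lock().unwrap().push(self.conn);
        Ok(())
    }
}

fn main() {
    let pool = Arc::new(Pool { free: Mutex::new(vec![1, 2, 3]) });
    let tx = Pool::begin(&pool).unwrap();
    println!("pinned connection {}", tx.conn);
    tx.commit().unwrap();
    println!("free connections: {}", pool.free.lock().unwrap().len());
}
```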
The total implementation is about 500 lines of Rust across interpreter.rs (worker pool) and postgres.rs (async DB). The ntnt application code didn't change at all — existing apps get the speed boost just by updating the binary.
What's Next
We're not done. The 20-query benchmark (2.3K req/sec) still shows room for improvement — Gin does 9.3K and FastAPI does 5.8K. Our queries run sequentially within each worker; adding query_batch() or parallel query execution within a single handler could close that gap.
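One possible shape for that fan-out, with OS threads standing in for concurrent pooled queries (`query_batch` and `fetch_row` are hypothetical names, not a shipped API):

```rust
use std::thread;

/// Stand-in for one pooled query; real code would hit PostgreSQL.
fn fetch_row(id: u32) -> String {
    format!("row {id}")
}

/// Run all queries concurrently instead of one after another, so 20 queries
/// cost roughly one round-trip of latency rather than twenty.
fn query_batch(ids: &[u32]) -> Vec<String> {
    let handles: Vec<_> = ids
        .iter()
        .map(|&id| thread::spawn(move || fetch_row(id)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", query_batch(&[1, 2, 3]));
}
```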
Template rendering (4.6K) is another opportunity. The template engine currently re-parses on each render; caching compiled templates would reduce that overhead significantly.
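A memoized template cache is simple in principle: key on the template source, parse once, render many times. A sketch with a fake "compile" step standing in for real template parsing (all names illustrative):

```rust
use std::collections::HashMap;

/// A "compiled" template: here just the text split at its placeholder.
/// A real engine would store a parsed AST.
struct Compiled(Vec<String>);

fn compile(source: &str) -> Compiled {
    // Pretend this parse is expensive; we only want to pay for it once.
    Compiled(source.split("{name}").map(str::to_string).collect())
}

/// Cache keyed by template source: first render compiles, later ones reuse it.
struct TemplateCache {
    compiled: HashMap<String, Compiled>,
}

impl TemplateCache {
    fn new() -> Self {
        Self { compiled: HashMap::new() }
    }

    fn render(&mut self, source: &str, name: &str) -> String {
        let tpl = self
            .compiled
            .entry(source.to_string())
            .or_insert_with(|| compile(source));
        tpl.0.join(name)
    }
}

fn main() {
    let mut cache = TemplateCache::new();
    println!("{}", cache.render("Hello, {name}!", "World")); // compiles
    println!("{}", cache.render("Hello, {name}!", "ntnt")); // cache hit
}
```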
The benchmark suite itself is open source. Run it yourself, add a framework, or challenge our methodology. We'd rather have accurate numbers than flattering ones.
Get ntnt v0.4.2 →