A Brief History of Type Systems and How AI Is Changing the Tradeoffs
The question of how much a computer should know about your data before running your code has been debated for about as long as programming languages have existed. It's a genuinely hard problem, not because we lack good answers, but because the good answers contradict each other depending on what you're trying to do.
What follows is a survey of how different languages have answered that question, what tradeoffs each answer carries, and where NTNT landed, including some evidence that's causing us to rethink our own defaults.
What Types Are and Why They Exist
At the most basic level, a type is a label that tells the computer what kind of data a value is. The number 42 is an integer. The text "hello" is a string. The list [1, 2, 3] is an array of integers. These labels let the computer (and the programmer) know what operations make sense for a given piece of data. You can add two integers. You can concatenate two strings. You can't divide a string by an array, and a type system is the mechanism that either prevents you from trying or tells you when you've done it by accident.
Types serve several purposes at once. They catch mistakes early: if a function expects a number and you pass it a string, a type system can flag that before the program runs, or at least when it runs into the problem. They serve as documentation: reading fn calculate_tax(income: Float, rate: Float) -> Float tells you what the function does and what it needs without reading the implementation. They enable tooling: editors can autocomplete fields on a typed object, refactoring tools can rename a function and update every call site, and compilers can generate faster code when they know the types in advance.
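Both purposes, early error catching and signatures-as-documentation, can be seen in miniature with Python type hints. The calculate_tax function below is a hypothetical sketch mirroring the signature above, not code from any real library:

```python
def calculate_tax(income: float, rate: float) -> float:
    # The annotations document the contract; a static checker such as
    # mypy would flag a bad call before the program ever runs.
    return income * rate

print(calculate_tax(40000.0, 0.2))  # 8000.0

# Without a static checker, the same mistake only surfaces at runtime:
try:
    calculate_tax("40000", 0.2)  # a string where a float was promised
except TypeError:
    print("caught at runtime, not before")
```

The difference between the two failure points, before the program runs versus somewhere in the middle of running it, is the whole argument in a nutshell.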
The pain points are equally real. Types add ceremony. In a heavily typed language, you spend time writing declarations that feel obvious, like telling the compiler that a variable holding the number 5 is, in fact, a number. They add rigidity: a function that would work perfectly well on both strings and numbers might need generics, overloads, or interface definitions to express that flexibility in the type system. And they add a learning curve, especially when type systems become expressive enough to encode complex constraints like memory ownership or effect tracking.
The fundamental tension is between safety and speed of development. More type information means more bugs caught before the code runs, but it also means more time spent satisfying the type checker. Less type information means faster iteration, but it pushes bug discovery to runtime, where the cost of finding and fixing them is higher.
What makes this genuinely hard is that the right answer depends on what you're building. Consider two applications. The first is an ecommerce store. Products have prices, and those prices determine how much a customer gets charged at checkout. If the price of an item somehow gets set to "banana" in the database, and the application treats that as null or an empty string and renders it as $0.00, you now have customers getting free items. In that context, you want the system to stop everything. A type error in the price field isn't a cosmetic problem, it's a financial one. The correct behavior is to fail loudly, refuse to render the checkout, and alert someone that the data is corrupt.
Now consider a simple blog application. Posts have a date field that displays when the article was published. If that date somehow gets set to "banana", the post is still readable. The date is wrong, and you want to fix it, but shutting down the entire page because of a corrupt date field means your readers can't read the article at all. In that context, rendering the page with a missing or placeholder date and logging the problem for someone to fix later is arguably the better outcome.
In both cases, the underlying type error is the same: a string appeared where a structured value (a price, a date) was expected. A strict type system would catch both at compile time or crash at runtime, which is exactly what you want for the ecommerce checkout and exactly what you don't want for the blog post. A lenient type system would let both slide, which keeps the blog readable but lets customers walk out with free products. The type system itself doesn't know which situation it's in. It just applies its rules uniformly, and that uniformity is both the strength and the limitation.
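The two postures can be made concrete with a small sketch. This render_price helper is hypothetical, written in Python for illustration; the strict flag and the warning text are inventions of the example:

```python
def render_price(raw, strict=True):
    # strict=True models the ecommerce posture: corrupt data stops the request.
    # strict=False models the blog posture: log the problem, degrade gracefully.
    try:
        return f"${float(raw):.2f}"
    except (TypeError, ValueError):
        if strict:
            raise ValueError(f"corrupt price: {raw!r}")  # fail loudly
        print(f"[WARN] corrupt price: {raw!r}")  # leave a trail for someone to fix
        return "price unavailable"

print(render_price("19.99"))                 # $19.99
print(render_price("banana", strict=False))  # price unavailable
```

The same corrupt value, "banana", produces a crash in one mode and a placeholder in the other, and which is correct depends entirely on what the page is for.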
Nobody wants bugs. Everyone would prefer an application that always works correctly. The appeal of strict types is obvious: catch the errors before they reach users. So why would anyone choose leniency?
For human developers, the answer has historically been speed. Strict type systems slow down the writing process. You spend time annotating, satisfying the compiler, working around the type checker when it can't express what you mean. For a human exploring an unfamiliar API, prototyping a new feature, or iterating on a design, that friction is real. Lenient types let you keep moving, try things, see what works, and tighten things up later. The tradeoff is bugs in production, but for many applications the speed of iteration is worth it, especially early in a project's life.
For AI agents, this calculus may be different. Agents don't experience the friction of writing type annotations the way humans do. An agent generates a fully typed function signature as quickly as an untyped one. The overhead that makes humans choose lenient types doesn't apply. What agents do need is fast, clear feedback when something goes wrong, and as we'll discuss later, there's evidence that strict type systems actually provide better feedback loops for agents than lenient ones. The benefits of leniency may be primarily a human concern, while agents benefit more from the structured error signals that strict types produce.
This is why the type system debate is so persistent. The people arguing for strict types are thinking about the ecommerce case, and the people arguing for lenient types are thinking about the blog case, and both are right for their context. But the introduction of agents as primary code authors adds a new dimension to the question, because the tradeoffs that shaped fifty years of language design were made with human authors in mind.
A Short History of Type Systems in Languages
The story of type systems in programming languages is, roughly, a series of reactions to whatever came before.
C gave you types, but let you cast between them freely. The type system existed more as documentation for the compiler's memory layout than as a safety mechanism. You could store a pointer in an integer, cast it back, and hope for the best. This was fine until it wasn't, which was often.
Java reacted toward safety. Everything typed, everything declared, no ambiguity. The result was reliable but verbose. ArrayList<HashMap<String, List<Integer>>> is nobody's idea of readable code, but it tells the compiler exactly what you mean. Java developers spent real portions of their careers writing type declarations that humans would never read.
Python and Ruby reacted in the other direction. No type declarations at all. Move fast. The types are in the programmer's head, and the tests will catch everything else. This was liberating until codebases got large, teams got big, and "the tests will catch it" became "the tests mostly catch it."
Go made an interesting bet in 2009: static types, but deliberately simple ones. No generics (until Go 1.18 in March 2022), no type hierarchy, no exceptions. Rob Pike and the Go team believed that most of the complexity in languages like Java and C++ came from the type system itself, and that a simpler type system with less expressiveness would produce more maintainable code. The tradeoff was real. Go codebases are famously readable by anyone, but the lack of generics meant a lot of interface{} and runtime type assertions, which is to say, giving up some of the safety that types are supposed to provide. Go eventually added generics, but the decade without them shaped a culture that values simplicity over expressiveness in ways the type system still reflects.
Rust, arriving around the same time, made the opposite bet: a type system so expressive it can track memory ownership, lifetimes, and thread safety at compile time. The borrow checker eliminates entire classes of bugs that plague C and C++, including use-after-free, data races, and null pointer dereferences. The cost is that the type system itself becomes something you have to learn, fight with, and sometimes work around. Rust's learning curve is notoriously steep, and a significant portion of that curve is the type system. But there's an interesting counterpoint to the "Rust is hard" narrative: SWE-bench Multilingual found that AI agents (SWE-agent + Claude 3.7 Sonnet) actually achieved the highest resolution rate on Rust tasks compared to all other languages tested, despite Rust solutions modifying more lines of code on average. The strictest type system produced the best agent outcomes. We'll come back to why that matters.
Elixir took a different path entirely. José Valim built Elixir on the Erlang VM (BEAM), a runtime designed for telecoms where uptime matters more than almost anything. Elixir is dynamically typed, and deliberately so. The philosophy is that the runtime should be resilient to failures rather than trying to prevent them all at compile time. Processes are isolated. When something crashes, its supervisor restarts it. Pattern matching catches structural mismatches at the function boundary. The type system is minimal, but the fault-tolerance architecture means a type error in one process doesn't take down the system. Ericsson's AXD301 switch, built on Erlang, famously achieved nine nines of reliability (99.9999999% uptime). Elixir is also evolving here: there's active research into set-theoretic types for Elixir, a collaboration between Valim and researchers at CNRS, which aims to bring compile-time type checking without sacrificing the language's dynamic character.
Then came the convergence. The statically typed languages started borrowing dynamic ideas: Java added var, C++ added auto, Kotlin made type inference the default. Meanwhile, the dynamically typed languages started borrowing static ideas: Python added type hints (and adoption has grown significantly, with Meta's 2025 survey finding 86% of respondents now use them always or often), Ruby added RBS, JavaScript got TypeScript. Everyone was arriving at the same conclusion from opposite directions. You want types, but you don't want to write them all by hand.
Type Strategies and Tradeoffs
Stepping back from the history, there are roughly five strategies in use today, each making a different bet about what matters most.
Fully static, fully explicit (Java, early Go, early C++). You annotate everything. The compiler catches type errors before your code runs. The cost is verbosity and rigidity, and a function that works perfectly well on both strings and integers needs generics or overloads to express that.
Static with heavy inference (Rust, Haskell, OCaml, Kotlin). The compiler figures out most types from context. You annotate function signatures and locals are inferred. Rust will deduce that let x = vec![1, 2, 3] is a Vec<i32> without being told, and Haskell takes this further by letting you write entire programs without a single type annotation. The cost is that when inference fails, the error messages can be bewildering.
Dynamic with runtime resilience (Elixir, Erlang, Python, Ruby, Lua). Variables are names pointing at values, and type errors surface at runtime. The advantage is speed of development with no ceremony and no compilation step. Elixir and Erlang add a twist by making the runtime itself fault-tolerant, so a type error crashes a process, not the system. But the cost scales with codebase size in languages without that safety net. A 500-line script is fine without types, but a 50,000-line application becomes a minefield.
Gradual typing via a separate tool (TypeScript, Python + mypy, Flow). The language itself is dynamically typed, but an external type checker can analyze annotated code. TypeScript is the most successful example. You can write any everywhere and gradually tighten things. The cost is that the runtime doesn't know about the types. TypeScript types are erased before execution, so a value annotated as string can be a number at runtime if the type assertion was wrong.
Gradual typing built into the language (Dart, Typed Racket, Raku). The language and its runtime natively understand both typed and untyped code. Typed Racket inserts runtime contracts at the boundary between typed and untyped code, so type guarantees hold even when calling into untyped modules. The cost is complexity in the language implementation and sometimes surprising performance characteristics at boundaries.
Language-Level Opinions on Types
One thing all five strategies share is that type strictness is treated as a property of the language itself. The language is either strict or permissive, and that decision ripples outward into what the language gets used for.
This is worth dwelling on. A language's type system doesn't just catch bugs. It shapes what people build with the language and how they think about building it.
Python's lack of types made it the default for scripting, data science, and prototyping, domains where iteration speed matters more than long-term safety. Java's strict types made it the default for enterprise software and Android, domains where large teams need to read each other's code and refactoring has to be safe. Go's simple types attracted infrastructure and DevOps tooling, where the codebases are large but the data structures are relatively simple. Rust's expressive types attracted systems programming, where memory safety is worth the learning curve.
Even within a language, the defaults matter. TypeScript's strict mode is opt-in, which means many TypeScript projects run without it. The default shapes the culture. Python's type hints have grown in adoption (86% in Meta's 2025 survey), but they're still unenforced by the runtime, which means type errors can slip through in code that passes the type checker. Elixir's decision to handle failures at runtime rather than prevent them at compile time means Elixir developers think in terms of supervision trees and graceful degradation rather than type-level proofs.
The point is that a language's opinion on types isn't just a technical choice. It's a filter for what kind of software gets written in that language, what kind of developers adopt it, and what kind of errors they consider acceptable.
With that framing, the question for any new language isn't just "what type system should we build?" but "what kind of software do we want people to build with us, and how might the type system push them toward or away from that?"
NTNT's Approach to Types
NTNT is a young language, and its type system is still evolving. The design philosophy has been to build the controls first and dial them in over time as we learn more about what works, both for human developers and for the AI agents the language is designed around.
Today, NTNT has two independent controls for type behavior. They're both set via environment variables, which means they can differ between development and production, between CI stages, or between different services in the same application.
The lint axis (NTNT_LINT_MODE) controls static analysis before your code runs:
default: Only check code that has type annotations. Untyped code passes silently.
warn: Also warn about functions that are missing annotations entirely.
strict: Missing annotations are errors. Everything must be typed.
The runtime axis (NTNT_TYPE_MODE) controls what happens when a type mismatch occurs during execution:
forgiving: Mismatches are silently ignored. Index into the wrong type, get None back, keep going.
warn: Mismatches log a [WARN] to stderr but execution continues. This is the current default.
strict: Mismatches crash the program immediately.
The two axes are orthogonal, and that's intentional. Different applications have different needs. A prototype exploring an unfamiliar API benefits from lenient types that don't crash on every mismatch. A payment processing service benefits from the strictest checking available. A content site with dynamic templates benefits from resilience in the rendering layer even if the backend routes are strictly typed. The configurability exists so that the type behavior can match the context of what's being built, not just the phase of development.
# Prototyping
NTNT_LINT_MODE=default NTNT_TYPE_MODE=forgiving ntnt run server.tnt
# Development
NTNT_LINT_MODE=warn NTNT_TYPE_MODE=warn ntnt run server.tnt
# Production
NTNT_LINT_MODE=strict NTNT_TYPE_MODE=strict ntnt run server.tnt
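For illustration, the runtime axis can be approximated in Python. This index helper is a hypothetical sketch of the mode-dependent behavior, not NTNT's implementation:

```python
import os
import sys

MODE = os.environ.get("NTNT_TYPE_MODE", "warn")  # forgiving | warn | strict

def index(value, key, mode=None):
    # Mode-dependent lookup: crash, warn, or silently yield None on a mismatch.
    mode = mode or MODE
    try:
        return value[key]
    except (TypeError, KeyError, IndexError):
        if mode == "strict":
            raise
        if mode == "warn":
            print(f"[WARN] cannot index {type(value).__name__} with {key!r}",
                  file=sys.stderr)
        return None

print(index(42, "key", mode="forgiving"))       # None: the mismatch is absorbed
print(index({"key": 1}, "key", mode="strict"))  # 1: no mismatch, all modes agree
```

The key property is that the code being run never changes; only the consequence of a mismatch does, which is what lets the same application behave differently across environments.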
What This Looks Like in Code
Types in NTNT are optional. When present, they're enforced at the level your configuration dictates.
// No types
fn greet(name) {
    return "Hello, {name}!"
}

// Typed
fn greet(name: String) -> String {
    return "Hello, {name}!"
}

// Mixed
fn process(data: Map<String, Any>, options) {
    let name: String = data["name"] ?? "Anonymous"
    let limit = options["limit"] ?? 10 // inferred as Int
    // ...
}
Local variables are inferred. let x = 5 is an Int, let names = ["Alice", "Bob"] is [String], let result = fetch(url) is Result<Response, String>. You only annotate locals when you want to.
Function signatures are where types matter most, since that's the boundary where one piece of code promises something to another. ntnt lint --warn-untyped reports which public functions are missing annotations, and ntnt lint --strict refuses to pass until every public signature is typed.
Type Resilience
In most languages, a type mismatch at runtime is an immediate crash. Try to index into a number in Python and you get a TypeError.
In NTNT's warn or forgiving mode, the operation returns None instead:
let x = 42
let y = x["key"] // In Python: TypeError. In NTNT (warn): None + a warning.
This is influenced by how Objective-C handles nil messaging (send a message to nil, get nil back) and by optional chaining in Swift and Kotlin. Those languages built nil-safety into the type system from the start. NTNT makes the crash-or-continue decision configurable, which is more flexible but less principled.
The ?? operator provides a default for the chain:
let name = data["user"]["name"] ?? "Unknown"
// If data["user"] is missing or not a map, the lookup yields None
// Indexing ["name"] into that None yields None again instead of crashing
// ?? catches the None at the end of the chain and supplies the default
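In Python terms, the forgiving chain plus ?? behaves roughly like this sketch. The get helper is hypothetical, and note that Python's or, standing in for ??, also swallows falsy values like "" and 0, which the real ?? does not:

```python
def get(value, key, default=None):
    # None-propagating lookup: a failed or nonsensical index yields None.
    try:
        return value[key]
    except (TypeError, KeyError, IndexError):
        return default

data = {"user": "not-a-map"}  # corrupt: "user" should hold a nested map

# get(data, "user") returns the string; indexing a string with "name"
# fails, so the chain collapses to None and the default takes over.
name = get(get(data, "user"), "name") or "Unknown"
print(name)  # Unknown
```
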
Templates also follow this model. In warn mode, expression errors render as HTML comments in development and empty strings in production. For loops skip non-iterable values, and {{#if}} treats expression errors as false. This kind of resilience is particularly useful for content-heavy applications where a misspelled variable in one section of a page shouldn't produce a 500 error for the entire request.
Types and Contracts
NTNT also has design-by-contract with requires/ensures clauses, and the two systems catch different things.
Types catch structural errors: you passed a string where an integer was expected. Contracts catch semantic errors: the value is the right type but the wrong value. The divisor is zero, the age is negative, the array is empty when it shouldn't be.
fn withdraw(account: Account, amount: Float) -> Result<Transaction, String>
    requires amount > 0
    requires amount <= account.balance
    ensures account.balance >= 0
{
    // Types guarantee the shapes. Contracts guarantee the rules.
}
In HTTP handlers, requires failures return 400, ensures failures return 500. The type checker also validates contract expressions, so ensures len(result) > 0 on a function returning Int is caught at lint time rather than runtime.
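The division of labor can be sketched in Python with plain assertions standing in for requires and ensures. The withdraw function here is a simplified, hypothetical analogue of the NTNT example above:

```python
def withdraw(balance: float, amount: float) -> float:
    # Preconditions: the caller's obligations (NTNT maps these to HTTP 400).
    assert amount > 0, "requires: amount > 0"
    assert amount <= balance, "requires: amount <= account.balance"
    new_balance = balance - amount
    # Postcondition: the function's own promise (NTNT maps these to HTTP 500).
    assert new_balance >= 0, "ensures: account.balance >= 0"
    return new_balance

print(withdraw(100.0, 30.0))  # 70.0
```

The type annotations rule out a string balance; the assertions rule out a negative withdrawal. Neither check can do the other's job.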
What the Evidence Suggests About Agents and Types
NTNT is designed for AI agents as primary code authors. The current defaults are permissive, partly because the language and its type system are still maturing, and partly based on an intuition that agents benefit from leniency since type mistakes are common during iterative development.
Recent benchmark data complicates that intuition. SWE-bench Multilingual evaluated AI agents across nine programming languages and found that agents resolved Rust tasks at the highest rate among all languages tested. Rust, the language with the strictest type system, the borrow checker, the lifetime annotations. By the "be lenient with agents" logic, Rust should have produced the worst outcomes, not the best.
One explanation is that what actually helps agents isn't permissiveness but signal quality. Rust's error messages are famously detailed and structured. They tell you the file, the line, the expected type, the actual type, and often suggest a fix. An agent can parse a Rust error, understand what's wrong, and correct it in one iteration. Compare this to a warning buried in a log file that the agent might not even see, or a None value silently propagating through a template until the output is subtly wrong in a way that's hard to diagnose.
Strict types with good error messages may be a better feedback loop for agents than permissive types with warnings. The crash is the signal. The error message is the fix instruction. The agent doesn't need leniency; it needs clarity.
NTNT's error messages are already structured along these lines:
error[E003]
  --> /app/routes/users.tnt:12
   = Type error: Cannot apply '-' to String and Int
    |
 11 | let y = 42
 12 | let result = x - y
    |
  expected: compatible types for '-'
  found:    String - Int
  hint:     convert with int(x) or float(x)
This points toward a direction for the language. The current defaults exist because the type system is still being built out, and strict defaults are only as useful as the coverage and quality of the errors behind them. As that coverage improves, the defaults should move toward strict. The investment goes into making strict-mode errors clear and actionable, not into more sophisticated ways to absorb type mismatches.
The configurability remains important regardless, because the right type strictness depends on the kind of application being built, not just the maturity of the language. A payment processor and a content management system have different tolerances for type errors, even when both are written by the same agent in the same language. The goal is to provide the controls and the data to make an informed choice, then gradually shift the defaults as the evidence and the implementation warrant it.
What's Missing
There are significant gaps worth acknowledging.
There's no runtime type enforcement at every function call boundary. A value from an untyped function can pass through without being checked until it hits an operation that doesn't make sense for its type. Typed Racket's approach of inserting contracts at typed/untyped boundaries is more principled; we haven't done that.
There are no dependent types. Languages like Idris and Agda can express "an array of exactly 5 elements" or "a number between 1 and 100" in the type system itself. NTNT uses contracts for these constraints, which are checked at runtime rather than compile time.
There's no effect system. We had one briefly and removed it because it was syntax without enforcement. A real effect system needs static analysis infrastructure we haven't built yet.
There's no formal cross-module type soundness proof. The type checker works across files, but a function imported from an untyped module is assumed to return Any, which means the boundary between typed and untyped code is where bugs can hide.
And there's the "gradual becomes never" risk. If the default is permissive and a project never tightens things, the type system is dead weight. TypeScript has this problem in some codebases, with any everywhere and @ts-ignore as punctuation.
There's also a simplicity cost. Two independent axes means more modes to test, more documentation to write, and more edge cases. A language with one clear behavior is easier to reason about.
Using It
For anyone building an NTNT app today, the recommended path is:
Start with no types. Get the logic right. The default config won't produce any type warnings.
fn get_users(req) {
    let rows = pg_query(db, "SELECT * FROM users", [])
    return json(rows)
}
Add types to public API boundaries when the code stabilizes. Route handlers, lib exports, anything called from multiple places.
fn get_users(req: Request) -> Response {
    let rows = pg_query(db, "SELECT * FROM users", []) ?? []
    return json(rows)
}
Run ntnt lint --warn-untyped periodically to see what's still untyped. Set NTNT_TYPE_MODE=strict for production, or at least for routes handling money, auth, or user data. Use ?? at the seams between typed and untyped code. Use contracts for semantic constraints that types can't express.
Closing Thoughts
The type system you choose shapes the software you build. That's true whether the author is a human or an agent. NTNT's approach is to provide the full range of type strictness, from forgiving to strict, and make it configurable per-deployment so the behavior can match the application's needs.
The evidence from SWE-bench Multilingual suggests that for agent-driven development, strict types with clear error messages may produce better outcomes than permissive types with buried warnings. That's shaping where we take the language next: better error messages, broader type coverage, and a path toward stricter defaults as the implementation matures.
If you have thoughts on this, or experience with how type systems affect agent-driven development, we'd like to hear about it on GitHub.
