Interactive Theory Building
Kleis is not just a language for writing mathematics — it is a substrate for producing mathematics. This chapter shows how an AI agent can co-author formal theories with a human researcher, using the kleis-theory MCP to submit definitions, check consistency, and derive verified theorems in real time.
The Idea
Traditional theorem provers require the human to write everything. The agent workflow is different:
- Human describes the mathematical concepts informally
- Agent formalizes them as Kleis structures with operations and axioms
- Kleis + Z3 validates each submission (parsing, loading, consistency)
- Agent proposes conjectures and Z3 either verifies or rejects them
- Human guides the exploration — “what about nullspaces?” “try composing kernels”
The result is a .kleis file containing a verified theory that neither the
human nor the agent could have produced as efficiently alone.
The kleis-theory MCP
The theory-building workflow is powered by a Model Context Protocol server that exposes nine tools:
| Tool | Purpose |
|---|---|
submit_structure | Add a structure (with operations and axioms) to the theory |
submit_define | Add a function definition |
submit_data | Add algebraic data types |
try_structure | Dry-run a submission without modifying the theory |
evaluate | Evaluate an expression or verify a proposition via Z3 |
describe_schema | List everything currently loaded |
list_session | Show the history of agent submissions |
load_theory | Reset and load a saved theory |
save_theory | Save the current theory to a .kleis file |
Try-Before-Commit
Every submission goes through a three-stage pipeline:
Agent submits Kleis source
│
┌────▼────┐
│ Parse │──── Syntax error? → reject, theory unchanged
└────┬────┘
│
┌────▼────┐
│ Load │──── Name conflict? → reject, theory unchanged
└────┬────┘
│
┌────▼────┐
│ Verify │──── Inconsistent axioms? → reject, theory unchanged
└────┬────┘
│
┌────▼────┐
│ Commit │──── Append to session.kleis, rebuild evaluator
└─────────┘
The agent’s proposed code is first written to a scratch file and validated against a fresh evaluator. Only on success does it become part of the theory. This prevents a bad submission from corrupting accumulated work.
Walkthrough: Admissible Kernels in Projected Ontology Theory
This section reproduces an actual theory-building session. The goal: formalize the concept of admissible kernels from Projected Ontology Theory (POT) and discover what theorems follow from the axioms.
Step 1: Data Types
Every formalization starts with the types. POT has three:
data GreenKernel = GK(ℤ)
data Flow = Fl(ℤ)
data FieldR4 = FR4(ℤ)
These are “uninterpreted” types — Kleis and Z3 know they exist and are distinct, but nothing about their internal structure. This is deliberate: POT is meant to be general, not tied to specific representations.
Step 2: Algebraic Structures
Flows and fields need additive structure:
structure FlowAlgebra {
operation flow_add : Flow -> Flow -> Flow
operation flow_smul : ℂ -> Flow -> Flow
element flow_zero : Flow
axiom flow_add_comm : ∀(a b : Flow). flow_add(a, b) = flow_add(b, a)
axiom flow_add_id : ∀(a : Flow). flow_add(a, flow_zero) = a
}
structure FieldAlgebra {
operation field_add : FieldR4 -> FieldR4 -> FieldR4
operation field_smul : ℂ -> FieldR4 -> FieldR4
element field_zero : FieldR4
axiom field_add_comm : ∀(f g : FieldR4). field_add(f, g) = field_add(g, f)
axiom field_add_id : ∀(f : FieldR4). field_add(f, field_zero) = f
}
Each submission is validated before it enters the theory. If you wrote
flow_add_comm with a typo in the operation name, the pipeline would catch
it at the Load stage.
Step 3: The Core Axioms
Now the heart of the theory — what makes a kernel “admissible”:
operation apply_kernel : GreenKernel -> Flow -> FieldR4
structure AdmissibleKernel {
operation is_admissible : GreenKernel -> Bool
axiom kernel_lin_add : ∀(G : GreenKernel, a b : Flow).
implies(is_admissible(G),
apply_kernel(G, flow_add(a, b))
= field_add(apply_kernel(G, a), apply_kernel(G, b)))
axiom kernel_lin_smul : ∀(G : GreenKernel, c : ℂ, a : Flow).
implies(is_admissible(G),
apply_kernel(G, flow_smul(c, a))
= field_smul(c, apply_kernel(G, a)))
axiom kernel_maps_zero : ∀(G : GreenKernel).
implies(is_admissible(G),
apply_kernel(G, flow_zero) = field_zero)
}
Three axioms: linearity over addition (K1a), linearity over scalar multiplication (K1b), and zero preservation (K2). These are the weakest constraints that make the kernel “transform-like” without hardcoding any specific physics.
Step 4: Projective Equivalence
Two flows are equivalent under a kernel if they project to the same field:
structure ProjectiveEquivalence {
operation equiv : GreenKernel -> Flow -> Flow -> Bool
axiom equiv_intro : ∀(G : GreenKernel, a b : Flow).
implies(apply_kernel(G, a) = apply_kernel(G, b), equiv(G, a, b))
axiom equiv_elim : ∀(G : GreenKernel, a b : Flow).
implies(equiv(G, a, b), apply_kernel(G, a) = apply_kernel(G, b))
}
We axiomatized only the definition — equiv holds if and only if the
kernel maps agree. We did not axiomatize reflexivity, symmetry, or
transitivity. Those should follow.
Step 5: Derived Theorems
Now the agent asks Z3 to verify consequences:
evaluate: ∀(G : GreenKernel, a : Flow). equiv(G, a, a)
→ ✅ VERIFIED (reflexivity)
evaluate: ∀(G : GreenKernel, a b : Flow).
implies(equiv(G, a, b), equiv(G, b, a))
→ ✅ VERIFIED (symmetry)
evaluate: ∀(G : GreenKernel, a b c : Flow).
implies(and(equiv(G, a, b), equiv(G, b, c)), equiv(G, a, c))
→ ✅ VERIFIED (transitivity)
Z3 proves all three. The proof chain: equiv(G, a, b) →
apply_kernel(G, a) = apply_kernel(G, b) (by equiv_elim) → equality is
an equivalence relation → equiv(G, a, a) (by equiv_intro). We derived
that projective equivalence is a proper equivalence relation without ever
stating it as an axiom.
Step 6: Nullspace
The nullspace of a kernel is the set of flows it kills:
structure Nullspace {
operation in_null : GreenKernel -> Flow -> Bool
axiom null_def : ∀(G : GreenKernel, a : Flow).
implies(in_null(G, a), apply_kernel(G, a) = field_zero)
axiom null_intro : ∀(G : GreenKernel, a : Flow).
implies(apply_kernel(G, a) = field_zero, in_null(G, a))
axiom null_closed_add : ∀(G : GreenKernel, a b : Flow).
implies(is_admissible(G),
implies(and(in_null(G, a), in_null(G, b)),
in_null(G, flow_add(a, b))))
axiom null_closed_smul : ∀(G : GreenKernel, c : ℂ, a : Flow).
implies(is_admissible(G),
implies(in_null(G, a), in_null(G, flow_smul(c, a))))
}
More derived theorems:
evaluate: ∀(G : GreenKernel, a b : Flow).
implies(and(is_admissible(G), and(in_null(G, a), in_null(G, b))),
equiv(G, a, b))
→ ✅ VERIFIED (nullspace elements are projectively equivalent)
evaluate: ∀(G : GreenKernel). in_null(G, flow_zero)
→ ❌ UNKNOWN
The second query returns Unknown. Why? Because kernel_maps_zero only applies
to admissible kernels. The correct formulation is:
evaluate: ∀(G : GreenKernel).
implies(is_admissible(G), in_null(G, flow_zero))
→ ✅ VERIFIED
This is not a failure — it’s the system working as intended. Z3 caught a missing precondition that a human might overlook. The “Unknown” result tells the theorist: your axioms don’t support this claim as stated.
Step 7: Kernel Composition
Can admissible kernels be composed while preserving admissibility?
structure KernelComposition {
operation compose_kernel : GreenKernel -> GreenKernel -> GreenKernel
axiom compose_admissible : ∀(G1 G2 : GreenKernel).
implies(and(is_admissible(G1), is_admissible(G2)),
is_admissible(compose_kernel(G1, G2)))
}
evaluate: ∀(G1 G2 : GreenKernel).
implies(and(is_admissible(G1), is_admissible(G2)),
in_null(compose_kernel(G1, G2), flow_zero))
→ ✅ VERIFIED
Z3 chains the reasoning: two admissible kernels compose to an admissible kernel (axiom), admissible kernels map zero to zero (axiom), zero-mapping implies nullspace membership (axiom). Three hops, fully automatic.
Step 8: Save
save_theory(name: "pot_admissible_kernels")
→ ✅ Saved to theories/pot_admissible_kernels.kleis
The theory is now a plain .kleis file that can be imported, extended,
or loaded into a future session.
Walkthrough: UN Charter Article 51 — Legal Cross-Examination
The previous walkthrough formalized a physics theory. This one formalizes international law and uses Z3 as a cross-examiner.
The Legal Question
Article 2(4) of the UN Charter prohibits the use of force between states. Article 51 provides one exception: self-defense. But two competing doctrines disagree on when self-defense is lawful:
- Strict doctrine: only after an armed attack has occurred
- Anticipatory doctrine: also when an attack is imminent
This disagreement has real consequences. By formalizing both doctrines as axiom packs over a shared ontology, Z3 can identify exactly where they diverge — and prove what follows from each.
The Ontology
Types and operations that both doctrines share. The key design decision:
lawful_self_defense and status are parameterized by Doctrine, so
both doctrines coexist in the same file without axiom collision:
data State = US | Iran | Russia | Ukraine | Israel | Lebanon | CustomState(ℤ)
data Norm = Permitted | Prohibited
data Act = MilitaryAct(ℤ)
data ThreatEvidence = Intel(ℤ)
data Doctrine = Strict | Anticipatory
operation use_of_force : Act → Bool
operation attacker : Act → State
operation target : Act → State
operation sc_authorized : Act → Bool
operation necessary : Act → Bool
operation proportional : Act → Bool
operation armed_attack_occurred : State × State → Bool
operation imminent_attack : State × State → Bool
operation lawful_self_defense : Doctrine × Act → Bool
operation status : Doctrine × Act → Norm
Article 2(4): The Baseline
Force is prohibited unless an exception applies. A single axiom encodes both directions, universally quantified over doctrines:
structure Article2_4 {
axiom force_status : ∀(d : Doctrine, a : Act).
use_of_force(a) →
((status(d, a) = Permitted ↔ (sc_authorized(a) ∨ lawful_self_defense(d, a)))
∧ (status(d, a) = Prohibited ↔ ¬(sc_authorized(a) ∨ lawful_self_defense(d, a))))
}
The Two Doctrines
The strict doctrine requires an armed attack. Each axiom is pinned to
its Doctrine value — Strict or Anticipatory — so they never
interfere:
structure StrictArticle51 {
axiom sd_requires : ∀(a : Act).
lawful_self_defense(Strict, a) →
(necessary(a) ∧ proportional(a) ∧
armed_attack_occurred(target(a), attacker(a)))
}
The anticipatory doctrine also accepts imminence:
structure AnticipatoryArticle51 {
axiom sd_requires : ∀(a : Act).
lawful_self_defense(Anticipatory, a) →
(necessary(a) ∧ proportional(a) ∧
(armed_attack_occurred(target(a), attacker(a))
∨ imminent_attack(target(a), attacker(a))))
}
The only difference is one disjunct: ∨ imminent_attack(...). The
Doctrine parameter ensures each doctrine’s axioms operate independently
— no silent domination.
The Case File and Evidence Profile
Facts of a disputed strike — US attacks Iran, no SC authorization, no prior
armed attack. The NoEvidence structure prevents Z3 from inventing
phantom witnesses to satisfy the imminence predicate:
structure CaseFacts {
element case_act : Act
axiom is_force : use_of_force(case_act)
axiom who_attacks : attacker(case_act) = US
axiom who_is_target : target(case_act) = Iran
axiom no_sc_auth : ¬sc_authorized(case_act)
axiom no_prior_attack : ¬armed_attack_occurred(Iran, US)
}
structure NoEvidence {
axiom no_ev : ∀(e : ThreatEvidence). ¬evidence(Iran, US, e)
}
Without NoEvidence, Z3 could freely instantiate Intel(k) for any k
and set credible, immediate, no_alternative to true — technically
satisfying the imminence definition. The closed-world guard forces the
case file to supply evidence or accept its absence.
Verified Theorems
The file lives under examples/authorization/ alongside Zanzibar and OAuth2
models. This is not an accident — the structure is the same. Authorization
asks “is this action permitted given these rules and this identity?”
The UN Charter asks “is this use of force permitted given these doctrines
and these facts?” Both are judgment functions over acts. The difference
is that authorization is a gate before the act; international law is
a judgment after the act has already occurred.
Load the theory into the kleis-theory MCP:
load_theory(imports: ["examples/authorization/un_charter_article51.kleis"])
→ ✅ 6 structures loaded
Cross-Examination
The human asks questions in natural language. The AI agent inspects the loaded axioms and translates each question into a Kleis proposition, which Z3 either verifies or disproves. The human never writes Kleis — the agent does.
Human: “Is the act a use of force?”
evaluate: ∀(a : Act). a = case_act → use_of_force(a)
→ ✅ VERIFIED
Human: “Did the Security Council authorize it?”
evaluate: ∀(a : Act). a = case_act → sc_authorized(a)
→ ❌ DISPROVED
Human: “Did an armed attack occur?”
evaluate: ∀(dummy : Act). ¬armed_attack_occurred(Iran, US)
→ ✅ VERIFIED (none occurred)
Human: “Does any evidence of imminence exist?”
evaluate: ∀(e : ThreatEvidence). ¬evidence(Iran, US, e)
→ ✅ VERIFIED (none exists)
Human: “Is imminence established?”
evaluate: ∀(v : State, a : State). v = Iran ∧ a = US → ¬imminent_attack(v, a)
→ ✅ VERIFIED (impossible without evidence)
Human: “Can self-defense be claimed under either doctrine?”
evaluate: ∀(a : Act). a = case_act → lawful_self_defense(Strict, a)
→ ❌ DISPROVED
evaluate: ∀(a : Act). a = case_act → lawful_self_defense(Anticipatory, a)
→ ❌ DISPROVED
Human: “What is the legal status?”
evaluate: ∀(a : Act). a = case_act → status(Strict, a) = Prohibited
→ ✅ VERIFIED
evaluate: ∀(a : Act). a = case_act → status(Anticipatory, a) = Prohibited
→ ✅ VERIFIED
Human: “Could it ever be Permitted?”
evaluate: ∀(a : Act). a = case_act → status(Strict, a) = Permitted
→ ❌ DISPROVED
evaluate: ∀(a : Act). a = case_act → status(Anticipatory, a) = Permitted
→ ❌ DISPROVED
Human: “Do both doctrines agree?”
evaluate: ∀(a : Act). a = case_act →
(status(Strict, a) = Prohibited ∧ status(Anticipatory, a) = Prohibited)
→ ✅ VERIFIED
The Verdict
Both doctrines produce the same verdict: Prohibited. The anticipatory
doctrine could diverge, but only if the case file supplies an
EvidenceSupported profile instead of NoEvidence. Without evidence,
imminence is logically impossible, and both doctrines collapse to the
same judgment. The Doctrine parameter makes this comparison explicit
in a single Z3 query.
Why This Matters
The cross-examination is not a simulation — it is a proof. The human asked questions in English; the agent translated them into formal propositions; Z3 checked every possible model. No rhetoric, no ambiguity, no appeal to authority. Just axioms, facts, and a solver.
Change the case facts (assert sc_authorized, or
armed_attack_occurred) and the verdict flips — same law, different
facts, different theorem.
The same kleis-theory MCP that built a physics theory in the previous walkthrough now performs legal reasoning. The substrate is the same: structures, axioms, Z3 verification. The domain changed; the method did not.
What Makes This Different
From a Chat About Math
A language model can talk about admissible kernels all day. It can generate
plausible-sounding theorems. But it cannot verify them. With kleis-theory,
every claim the agent makes passes through Z3. When the agent proposed
in_null(G, flow_zero) without the admissibility precondition, Z3 said no.
The agent had to correct its reasoning.
From a Traditional Proof Assistant
In Lean, Coq, or Isabelle, the human writes everything. The assistant checks. With kleis-theory, the agent writes the formalizations. The human says “what about nullspaces?” and the agent produces the structure, the axioms, and the verification queries. The human guides; the agent formalizes; Z3 verifies.
From a Notebook
A Jupyter notebook runs code. It doesn’t accumulate verified knowledge. Each cell is independent. In kleis-theory, each submission builds on everything before it. The theory grows monotonically. Nothing is forgotten, nothing contradicts.
From Asking a Lawyer (or an LLM)
In the UN Charter cross-examination, the human did not need to know international law. The human said “cross-examine this case” and the agent did the rest — it queried the loaded theory via the MCP, discovered which structures and predicates were available, understood the logical dependencies between them, and constructed the propositions in Kleis syntax. The human asked in English; the agent translated to formal logic; Z3 returned the verdict.
This is different from asking an LLM “is this attack legal?” An LLM would produce a plausible paragraph citing Article 51, hedging with “it depends,” and possibly hallucinating case law. The agent does something structurally different: it reads the axioms that are actually loaded, constructs propositions that reference them precisely, and submits those propositions to a solver that checks every possible model. If the agent gets the proposition wrong, Z3 rejects it. If the axioms are inconsistent, Z3 reports inconsistency. There is no room for plausible-sounding nonsense.
The axioms are self-describing — lawful_self_defense, necessary,
proportional, armed_attack_occurred — the names carry the semantics.
The structures tell the agent which predicates gate which conclusions.
The agent does not “know” international law; it reads the formal encoding
and reasons over it. The domain knowledge lives in the .kleis file,
not in the model’s training data.
The Theory File
The saved theory is a standard Kleis file:
import "stdlib/prelude.kleis"
data GreenKernel = GK(ℤ)
data Flow = Fl(ℤ)
data FieldR4 = FR4(ℤ)
structure FlowAlgebra {
operation flow_add : Flow -> Flow -> Flow
operation flow_smul : ℂ -> Flow -> Flow
element flow_zero : Flow
axiom flow_add_comm : ∀(a b : Flow). flow_add(a, b) = flow_add(b, a)
axiom flow_add_id : ∀(a : Flow). flow_add(a, flow_zero) = a
}
// ... AdmissibleKernel, ProjectiveEquivalence, Nullspace, KernelComposition
It can be imported into other Kleis files, loaded into the REPL, or used as the starting point for a future theory session:
load_theory(imports: ["theories/pot_admissible_kernels.kleis"])
Summary of Verified Results
From the axioms of admissible kernels, Z3 derived:
| # | Theorem | Status |
|---|---|---|
| 1 | equiv(G, a, a) — reflexivity | ✅ Verified |
| 2 | equiv(G, a, b) → equiv(G, b, a) — symmetry | ✅ Verified |
| 3 | equiv(G, a, b) ∧ equiv(G, b, c) → equiv(G, a, c) — transitivity | ✅ Verified |
| 4 | in_null(G, a) ∧ in_null(G, b) → equiv(G, a, b) — null equivalence | ✅ Verified |
| 5 | is_admissible(G) → in_null(G, flow_zero) — zero in nullspace | ✅ Verified |
| 6 | Composed admissible kernels preserve nullspace of zero | ✅ Verified |
| 7 | in_null(G, flow_zero) without admissibility precondition | ❌ Unknown |
The “Unknown” in row 7 is not a bug — it’s Z3 correctly identifying that the axioms don’t support the claim as stated. This kind of feedback is what makes interactive theory building productive.
Enabling the kleis-theory MCP
Toggle it with:
scripts/theory.sh on # Enable
scripts/theory.sh off # Disable
scripts/theory.sh status # Check
When enabled, the nine tools become available to the AI agent in Cursor.
What’s Next
The theory file is the beginning, not the end. Future sessions can:
- Extend the theory with new structures (e.g.,
PhaseErasure,Residues) - Specialize to specific physics (GR kernels, QM propagators)
- Connect multiple theories via imports
- Compute with numerical methods (LAPACK, ODE solver) on concrete instances
The substrate is ready. Build on it.