There is an ongoing, sometimes heated, argument between those who embrace AI "vibe coding" as a liberating technology that frees us from mundane tasks to focus on high-level problems, and those who dismiss it as "stupid autocomplete" that produces "AI slop" while, in the process, dumbing down the writers who routinely rely on it.
We will frame this discussion around AI-assisted proof synthesis (e.g., for the Lean proof assistant). While proof and code synthesis are conceptually similar (by the Curry-Howard correspondence), the former is more illustrative in an academic context, as it brings questions of originality, attribution, and credit assignment into sharper focus. Additionally, there are a couple of important differences pertinent to using AI:
- Correctness criteria. Once the theorem is formulated, the proof assistant can check the candidate proof and determine whether the given theorem statement is indeed proven.
- Proof irrelevance. Any two proofs of the same proposition are considered indistinguishable and equally valid.
These differences address concerns about the correctness of generated proofs (which can be checked) and eliminate the need for human review, thereby invalidating the "AI slop" concern (style and verbosity do not matter, as the proof term is intended for the checker, not for humans).
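Proof irrelevance is in fact built into Lean 4's kernel: any two proofs of the same proposition are definitionally equal, so `rfl` closes the goal below. A minimal sketch:

```lean
-- Any two proofs h₁ h₂ of a proposition P are definitionally equal,
-- so reflexivity alone proves them equal: the checker never compares
-- the "style" of two proofs of the same statement.
example (P : Prop) (h₁ h₂ : P) : h₁ = h₂ := rfl
```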
```
🧠 Informal Problem
        │ 👤 Human — formalisation
        ▼
📄 Lean Specification
        │ 🤖 AI — script generation
        ▼
📝 Proof Script
        │ LΞ⋀N — elaboration + checking
        ▼
✔️ Verified Proof Term
```
This leaves us with more fundamental questions: 1) Can AI-generated proofs be considered genuinely innovative, or are they just a rehashing of existing prior work? 2) How much of the result can be attributed to the human versus the machine?
The first question is a topic of ongoing debate about AI, which goes beyond proof or code synthesis applications and is outside the scope of this essay.
To answer the second one, we first informally divide proofs into two broad categories. The first type is mundane and mechanical, as is common in program verification, where proofs usually involve structural induction, case splits, term unfolding, normalisation, and the like. Generating such proofs can be viewed as automation, akin to SAT solving, deciding formulas in decidable theories such as Presburger arithmetic, or invoking tools such as Sledgehammer in Isabelle/HOL.
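A toy Lean 4 sketch of such a mechanical proof: a routine structural induction that requires no insight, only unfolding the recursive definition of addition on `Nat`:

```lean
-- A routine structural induction: the base case holds by computation,
-- and the step case rewrites with the definitional lemma Nat.add_succ
-- and the induction hypothesis. No creativity involved.
example (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]
```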
The second type is more intricate proofs in which the solution is non-trivial and requires making deep observations about the nature of the problem, linking it to external theories, identifying relevant invariants, and introducing and proving new conjectures.
Intuitively, the first kind involves no innovation or creativity and requires no credit for intellectual contribution, while the second calls for assigning intellectual credit.
These two categories may correspond to the "accidental" and "essential" difficulties described in [1]. In the context of mechanised proofs, essential difficulties are inherent to the nature of the problem. In contrast, accidental difficulties stem from the mechanics of formalisation, such as proof automation, combinatorial explosion, representational bottlenecks, and limitations of the proof assistant. But is there a precise boundary between these two?
| Accidental Complexity | Essential Complexity |
|---|---|
| structural induction | finding the right abstraction |
| case splits | identifying key invariants |
| term unfolding / reduction | introducing new definitions or concepts |
| normalisation / rewriting | linking to external theories |
| proof repair / refactoring | formulating auxiliary lemmas |
| heuristic lemma search | discovering new conjectures |
| managing combinatorial explosion | reframing the problem conceptually |
Let us reflect on the pre-modern-AI era and on computer algebra systems like Maple or Mathematica, which can symbolically solve some differential equations. Mathematicians typically do not list Maple or Mathematica as coauthors. Back then, one might say that the problems such a system could solve posed merely accidental difficulties; conversely, a problem it could not solve might be either essentially or accidentally difficult.
Unfortunately, with modern AI, we can no longer use such a simple distinction. It is claimed, albeit still a matter of debate, that modern AI may exhibit emergent properties that go beyond what it was trained on. This leaves the possibility of producing truly innovative results and thus solving problems of essential difficulty.
Unsatisfyingly, this leaves the line between proofs of accidental and essential difficulty not firmly fixed; it remains a judgment call. Moreover, the line has shifted historically: many mathematical proofs of problems regarded as essential in the 19th century are now treated as accidental, as the development of new methods turned once-insightful arguments into routine applications.
As of this writing (November 2025), the proofs that AI can reliably generate fall into the first category: mundane and mechanical (accidental complexity). So if you are using AI to grind out such proofs, it is safe to regard it as an automation aid that can be used without reservation. As the state of the art evolves and AI models can tackle the essential complexity in proofs, we may need to start crediting them for their contributions.
References:
[1] F. P. Brooks, Jr., "No Silver Bullet: Essence and Accidents of Software Engineering," Computer, vol. 20, no. 4, pp. 10–19, April 1987, doi: 10.1109/MC.1987.1663532. https://www.cs.unc.edu/techreports/86-020.pdf