Research
A comprehensive literature review of Voynich Manuscript (MS 408) research, decipherment attempts, and proposed AI approaches. Compiled from 100+ years of scholarship.
Established Facts
What we know for certain about MS 408, based on physical analysis and scholarly consensus.
Radiocarbon Dating
1404-1438 CE. Vellum dated by the University of Arizona (2009). Ink dating inconclusive but consistent. This rules out many post-1500 authorship theories.
Provenance
Traced to the 1600s. Earliest confirmed owner: Georg Baresch (Prague, 1637). Passed to Athanasius Kircher via Jan Marek Marci. Possibly owned by Rudolf II (purchased for 600 ducats). Acquired by Wilfrid Voynich in 1912 from Villa Mondragone.
Physical Format
240 pages, 23.5 x 16.2 cm. Quarto format. Some folios are fold-outs (up to 6 panels). Written left-to-right. Contains ~170,000 characters across ~35,000 words. No corrections, strikethroughs, or erasures visible.
Sections
6 thematic sections. Botanical (herbal, ~130 pages), Astronomical (zodiac/star charts), Biological (nude figures in tubes), Pharmaceutical (jars/roots), Recipes (dense text, short paragraphs), and unlabeled sections.
Script
~20-30 distinct characters. A unique script with no confirmed match to any known writing system. Characters include 'gallows' (tall ornate letters), bench-like characters, and simple loops. Multiple transcription systems exist (EVA, Currier, Frogguy).
Statistical Properties
Language-like but anomalous. Follows Zipf's law for word frequencies. Word-level entropy (~10 bits/word) is in the range of English and Latin. But character-level second-order entropy (h2 ~2 bits) is lower than ALL 316 natural languages tested (typically 3-4 bits): characters are abnormally predictable. Two statistically distinct 'languages' (Currier A and B) appear in different sections. ~37,919 word tokens, ~8,114 unique types.
Script & Language Properties
Technical characteristics of the Voynich script discovered through computational analysis.
EVA (European Voynich Alphabet)
Created by Rene Zandbergen and Gabriel Landini (1998). Defines ~25 basic characters using ASCII equivalents. 'Analytical' approach that breaks visible strokes into components (e.g., 'ch' = c + h). Most widely used system for computational analysis.
Gallows Characters
Four tall, ornate characters (EVA: t, k, p, f) that extend above the line. Appear mainly at word beginnings and line beginnings. 'Rare' gallows (cth, ckh, cph, cfh) combine bench + gallows. Their distribution suggests they may be numerals, abbreviation markers, or paragraph markers.
The 'Slot' Model (Stolfi)
Voynich words follow a rigid crust-mantle-core structure with specific character subsets at each layer. Position 1: q, s, d (or empty). Position 2: o (or empty). Position 3: l, r (or empty). Core: e, ch, sh, etc. Final: y, m, n, g. This grammar covers 96.5% of all running-text tokens. The word-length distribution is unusually binomial (peaked, without the long tail seen in natural languages). Unlike any known natural language or cipher.
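The slot structure can be approximated with a regular expression. A minimal sketch, assuming a deliberately simplified slot inventory: the core set below is illustrative and is NOT Stolfi's published grammar.

```python
import re

# Toy slot grammar in the spirit of the crust-mantle-core model.
# Each slot admits a small character subset or is empty.
SLOT_RE = re.compile(
    r"^(q|s|d)?"              # slot 1: q, s, d, or empty
    r"o?"                     # slot 2: o or empty
    r"(l|r)?"                 # slot 3: l, r, or empty
    r"(e|ee|ch|sh|a|i|k|t)*"  # core: repeatable core characters (illustrative set)
    r"(y|m|n|g)?$"            # final slot
)

def fits_slots(word: str) -> bool:
    """True if an EVA word is accepted by the toy slot grammar."""
    return SLOT_RE.match(word) is not None

def coverage(words) -> float:
    """Fraction of tokens the grammar accepts (cf. Stolfi's 96.5%)."""
    return sum(fits_slots(w) for w in words) / len(words)
```

On the real transcription, the interesting number is `coverage`: a grammar this rigid covering ~96.5% of tokens is the anomaly.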
Currier A vs B
Two statistically distinct 'languages' first identified by Prescott Currier (1976). Language A: more 'o' characters, found in herbal-A sections. Language B: more 'a' characters, found in herbal-B and biological sections. Could represent two dialects, scribes, or encoding schemes.
Entropy Anomalies
First-order character entropy: ~3.9 bits in EVA (close to English at ~4.2). Word-level entropy: ~10 bits/word (lower than most European languages at ~11-12). Second-order entropy (character pairs) is unusually low, indicating high predictability. This is the 'too regular' problem: the text is more predictable than any natural language.
Zipf's Law Compliance
Word frequencies follow Zipf's law (frequency inversely proportional to rank) very closely, matching natural language behavior. This was initially cited as evidence of real language, but it's now known that some generated texts can also exhibit Zipf-like distributions.
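Zipf compliance is straightforward to check on any transcription: fit log frequency against log rank and look for a slope near -1. A stdlib sketch:

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) vs log(rank).
    Zipf's law predicts a slope near -1."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var
```

Note the caveat in the text: a slope near -1 is necessary but not sufficient evidence of language, since some generative processes also produce it.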
Hapax Legomena
Words appearing only once make up 14-20% of the vocabulary (depending on transcription). This is lower than typical natural languages (~40-60%) but higher than random text. The rate varies significantly based on which transliteration alphabet is used — a key insight from Zandbergen (2022).
Line as Functional Unit
First words on each line are ~1 character longer than average. Certain words have strong line-position affinity — one character appears line-finally in 85% of its occurrences. No known natural language shows this property. Lines may function as encoding units rather than arbitrary text wrapping.
Labels vs Paragraph Text
Labels (text near illustrations) have 12.4% 'abnormal' words vs only 3.7% in running text. Labels show very few repetitions and many unique forms. Interestingly, many label words also appear in running text — but not near their labeled illustration, suggesting labels aren't simple captions.
Extreme Positional Constraints
Characters show extreme positional preferences: 'q' appears only word-initially, 'm' only word-finally, 'y' only at word start or end. Bigram contact rules are far tighter than any natural alphabet — certain character pairs never occur despite both being common individually.
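Positional preferences like these can be quantified per character. A small sketch; the words in the test are toy EVA-like stand-ins, not real transcription data.

```python
from collections import Counter

def positional_profile(words, ch):
    """Where does character `ch` occur within words?
    Returns fractions (initial, medial, final) over all its occurrences."""
    counts = Counter()
    for w in words:
        for i, c in enumerate(w):
            if c != ch:
                continue
            if i == 0:
                counts["initial"] += 1
            elif i == len(w) - 1:
                counts["final"] += 1
            else:
                counts["medial"] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for absent chars
    return {k: counts[k] / total for k in ("initial", "medial", "final")}
```

Run over the full corpus, a profile like {'initial': 1.0} for 'q' is exactly the extreme constraint described above.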
No Corrections
The manuscript shows virtually no corrections, cross-outs, or scribal errors. This is unusual for a 15th-century manuscript and has been cited both as evidence of careful copying (real text) and as evidence of meaningless generation (hoax). Genuine manuscripts of this era typically show corrections.
The 'Double Unknown' Problem
The Voynich presents a unique challenge: both the cipher method AND the underlying language are unknown. In every historical decipherment success (hieroglyphs, Linear B, Enigma), at least one was known. This 'double unknown' means the search space is combinatorially vast.
Hard Numbers
Specific measurements that define the Voynich puzzle. These are the constraints any successful theory must explain.
Entropy Comparison (bits)
| Text | h1 | h2 |
|---|---|---|
| Voynich (EVA) | 3.86 | 1.84 |
| Latin (Pliny) | 4.00 | 3.27 |
| Italian (Dante) | 4.01 | 3.13 |
| German | 4.02 | 3.04 |
| English | 4.21 | ~3.2 |
| Random | 6.01 | ~6.0 |
h1 = first-order entropy. h2 = conditional (bigram) entropy. Voynich h2 is ~40% lower than natural languages.
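Both columns of the table can be recomputed from any transliteration; exact values depend on the transcription and alphabet chosen. A minimal stdlib sketch:

```python
import math
from collections import Counter

def h1(text):
    """First-order entropy in bits per character."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def h2(text):
    """Conditional (bigram) entropy H(X_{i+1} | X_i) in bits."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    n = len(text) - 1
    return -sum(c / n * math.log2(c / firsts[a])
                for (a, b), c in pairs.items())
```

A perfectly alternating string has h2 = 0 (each character fully determines the next), which is the limiting case of the Voynich's 'too predictable' problem.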
Most Frequent Words
| EVA Word | Count | Family |
|---|---|---|
| daiin | 863 | daiin |
| ol | 537 | ol |
| chedy | 501 | chedy |
| aiin | 469 | daiin |
| shedy | 426 | shedy |
| chol | 396 | chol |
| or | 363 | ol |
| ar | 350 | ar |
| qokeey | 308 | qo- |
| qokeedy | 305 | qo- |
37,919 tokens, 8,114 unique types. Roughly half of the unique types occur only once, though the hapax share varies considerably with the transliteration alphabet used.
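The token/type/hapax figures can be recomputed from any tokenized transcription:

```python
from collections import Counter

def vocab_stats(tokens):
    """Token count, type count, and hapax share of types."""
    counts = Counter(tokens)
    types = len(counts)
    hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": len(tokens),
        "types": types,
        "hapax_share_of_types": hapax / types,
    }
```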
Constraint Elimination
The h2 entropy constraint alone eliminates most cipher families. This is the analysis most decipherment attempts never performed — they proposed mechanisms that are mathematically impossible given the observed statistics.
| Cipher Family | Predicted h2 | Voynich h2 | Verdict |
|---|---|---|---|
| Simple Substitution | ~3.0-3.3 | 1.84 | Eliminated |
| Polyalphabetic (Vigenère) | ~4.0-5.0+ | 1.84 | Eliminated |
| Simple Homophonic | ~3.0-4.0 | 1.84 | Eliminated |
| Verbose Cipher (Naibbe-style) | ~1.8-2.2 | 1.84 | Best Fit |
| Constructed Language | ~1.5-2.5 | 1.84 | Possible |
| Compressed Bitstream | ~1.0-2.5 | 1.84 | Possible |
| Meaningless Hoax (Cardan grille) | ~1.5-2.5 | 1.84 | Possible |
Bottom line: Only three families survive the h2 constraint — verbose cipher, constructed language, and compression. Everything else is mathematically eliminated. The verbose cipher (Naibbe-style) is the only one that also matches the slot grammar, word-length distribution, Zipf compliance, AND is historically plausible for the 15th century.
Crib Catalog
Known or suspected plaintext — the cryptanalyst's entry points. Every successful decipherment in history used cribs. The Voynich has more than most people realize.
Zodiac labels
High confidence. Location: Folios 70v-73r (zodiac section)
Suspected plaintext: Month names: mars, abril, may, junio, julio... Zodiac signs: taurus, gemini, cancer...
Latin-alphabet labels visible directly on the pages. Best available cribs. Some may have been added by a later hand — check ink/handwriting consistency.
Repeated paragraph-initial words
Medium confidence. Location: Recipe section (folios 103r-116r)
Suspected plaintext: Likely 'Recipe' (Take), 'Accipe' (Accept), or equivalent
Medieval recipe/herbal texts almost universally begin paragraphs with formulaic verbs. Need to identify which Voynich word appears paragraph-initially most often in this section.
Star labels in astronomical section
Medium confidence. Location: Folios 67r-69r (astronomical/cosmological)
Suspected plaintext: Possible star names or cardinal directions
Small labels near star diagrams. If any can be matched to known star names (in Latin, Arabic, or vernacular), they provide additional cribs.
Plant name labels
Low confidence. Location: Herbal section (folios 1v-66v)
Suspected plaintext: Labels adjacent to botanical illustrations
IF a plant can be confidently identified (Tucker claims 166 IDs, most disputed), the label might encode its name. Low confidence because plant IDs are contested and labels may not be simple names.
Number sequences
Low confidence. Location: Various (especially pharmaceutical section)
Suspected plaintext: Dosage quantities, astronomical measurements
If gallows characters are numerals (as some have proposed), pharmaceutical sections might contain countable quantities. Very speculative but testable if gallows-as-numeral hypothesis is correct.
Cryptanalytic Attack Plan
Eight specific computational attacks, ordered by information yield per CPU-hour. Each attack has defined success/failure criteria and specifies which hypothesis it falsifies. This is not a wish list — it is an execution plan.
Bigram Transition Entropy Mapping
Priority: critical. Compute conditional entropy H(char_{i+1} | char_i) at each position within words. In a verbose cipher, positions INSIDE an encoded letter have near-zero entropy (deterministic table lookup). Positions at letter BOUNDARIES have high entropy (the next plaintext letter is unpredictable). If the verbose-cipher hypothesis is correct, this plot shows periodic spikes.
What we compute
For every word position i (1→2, 2→3, 3→4...), calculate H(next_char | current_char) across all words. Plot as a function of position. Run separately for each word length (5-char words, 6-char words, etc.) to control for edge effects.
Data needed
IVTFF transcription (162,755 characters). Already have it.
Effort
Single Python script. Hours to implement, seconds to run. Highest information-per-CPU-hour of any analysis.
Success looks like
Periodic entropy spikes at regular intervals. Period=2 means each letter encodes as 2 glyphs. Period=3 means 3 glyphs per letter. The Naibbe model predicts VARIABLE length, so we'd see a noisier but detectable signal — strongest within fixed-length word buckets.
Failure looks like
Smooth, monotonically decreasing entropy curve (like natural language) → not a verbose cipher. Flat/uniform entropy → compressed bitstream.
Falsifies
If no periodic structure found → verbose cipher hypothesis is severely weakened. If smooth curve → consistent with constructed language.
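The core computation of Attack 1 can be sketched in a few lines, assuming words are already tokenized. The toy data in the test is illustrative only: a real run would use the IVTFF transcription and plot the curve per word-length bucket.

```python
import math
from collections import Counter

def positional_h2(words, length):
    """H(next char | current char) at each within-word transition,
    restricted to words of a fixed length (to control edge effects).
    Returns a list indexed by transition (0 = char1 -> char2)."""
    bucket = [w for w in words if len(w) == length]
    out = []
    for i in range(length - 1):
        pairs = Counter((w[i], w[i + 1]) for w in bucket)
        firsts = Counter(w[i] for w in bucket)
        n = len(bucket)
        h = -sum(c / n * math.log2(c / firsts[a])
                 for (a, b), c in pairs.items())
        out.append(h)
    return out
```

In the toy test below, the first transition is deterministic (entropy 0) and the second is uniform over four options (entropy 2 bits), i.e. exactly the spike pattern a verbose cipher would leave.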
Zodiac Page Crib Attack
Priority: critical. The zodiac section (folios 70v-73r) has Latin month names and zodiac signs written in a known alphabet RIGHT NEXT to Voynich text. In classical cryptanalysis, this is a 'crib' — suspected plaintext. The entire Enigma effort was built on cribs. The Voynich zodiac pages hand us cribs for free. Nobody has done a systematic crib-based attack with modern tools.
What we compute
1) Map each Latin label to its spatially adjacent Voynich text tokens. 2) For each cipher model (Naibbe, homophonic, etc.), test whether ANY valid table could map the Latin word to the Voynich token. 3) Cross-validate: recover partial table from 12 cribs, predict the other 12.
Data needed
High-res zodiac pages (have on R2) + IVTFF transcription for those folios. Manual identification of spatial label-to-text adjacency (human labor required).
Effort
Medium. Manual crib identification takes hours. Automated Naibbe sweep takes days of compute. But the potential payoff is decipherment itself.
Success looks like
Recovering 3-4 consistent table entries from independent cribs would be extraordinary. If the same entry appears from two different crib pairs, that's near-proof of the cipher model. Even partial table recovery opens the door to full decipherment.
Failure looks like
No consistent mapping exists across any cipher model → either labels aren't cribs (added later by a different hand), or the cipher is more exotic than anything we've modeled.
Falsifies
If no cipher model produces consistent cribs → all tested cipher models are wrong for the zodiac section (though A/B split means other sections may use different keys).
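In the simplest fixed-length case, step 2 reduces to testing whether a single letter-to-glyph-group table can explain every crib pair without contradiction. A sketch under that assumption; the Naibbe model's variable-length units would need a more elaborate search, and the crib pairs in the test are invented examples.

```python
def consistent_table(cribs, unit_len):
    """Try to build one plaintext-letter -> glyph-group table that
    explains every (plaintext, ciphertext) crib pair, assuming each
    letter encodes as exactly `unit_len` glyphs. Returns the table,
    or None at the first contradiction."""
    table = {}
    for plain, cipher in cribs:
        if len(cipher) != unit_len * len(plain):
            return None  # lengths incompatible with this unit size
        for i, letter in enumerate(plain):
            group = cipher[i * unit_len:(i + 1) * unit_len]
            if table.setdefault(letter, group) != group:
                return None  # same letter maps to two different groups
    return table
```

The cross-validation in step 3 is then: build the table from half the cribs, check the rest against it.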
Slot-Specific Frequency Factoring
Priority: high. Stolfi's slot model constrains which characters appear at which word positions. A cryptographer sees each slot as a separate cipher channel. Within each slot, character frequencies reflect plaintext letter frequencies filtered through the cipher table. By analyzing per-slot frequencies in Language A vs B, we can factor apart 'different cipher' from 'different language.'
What we compute
1) Tag each character with its Stolfi slot position. 2) Compute per-slot character frequencies for A pages vs B pages. 3) Spearman rank correlation between A and B per slot. 4) Under a cipher model assumption, solve the linear system to recover underlying plaintext letter frequencies. 5) Compare recovered frequencies against known language letter distributions.
Data needed
IVTFF transcription + Stolfi slot assignments (published, need to digitize/implement the parser).
Effort
Medium. Need to implement slot parser + frequency analysis pipeline.
Success looks like
Recovered frequency distribution matches a specific language within 1-2 standard deviations. Even a noisy match (top 5 letters correct) narrows the source language from 'any language ever' to 2-3 candidates. If A and B show high slot-frequency correlation → same language, different tables. Low correlation → different languages.
Failure looks like
Recovered frequencies match no known language → either constructed language, or the cipher model is wrong, or noise overwhelms the signal.
Falsifies
If recovered frequencies match no natural language AND A/B slot correlations are low → strong evidence for constructed language over cipher.
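The per-slot A/B comparison in step 3 needs only a rank correlation. A stdlib Spearman sketch without tie correction (a real run over frequency vectors with ties should add it, or use scipy.stats.spearmanr):

```python
def spearman(xs, ys):
    """Spearman rank correlation of two equal-length sequences
    (no tie correction; sketch only)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=vals.__getitem__)
        r = [0] * len(vals)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Fed per-slot character-frequency vectors for A pages vs B pages, values near +1 support 'same language, different tables'; values near 0 support 'different languages'.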
Boundary Detection via Rare Bigrams
Priority: high. In a verbose cipher, bigrams within encoded letters are table-driven (high frequency). At letter boundaries, suffix-prefix combinations may be rarer — they arise only from specific letter-pair sequences. Find the lowest-frequency non-zero bigrams and map where they appear in words. If they cluster at specific positions, those positions are letter boundaries.
What we compute
1) Full bigram frequency matrix (25x25 for EVA). 2) Compute observed/expected ratio for each bigram at each word position (normalize for character rarity). 3) Map positions where ratio is most extreme. 4) Cross-reference with Attack 1 entropy spikes for validation.
Data needed
IVTFF transcription. Already have it.
Effort
Low. Can run as an extension of Attack 1. Same Python pipeline.
Success looks like
Rare bigram positions match entropy spike positions from Attack 1 → strong cross-validation of letter boundaries. The combined boundary map tells us encoding unit structure.
Failure looks like
Rare bigrams are uniformly distributed → no boundary signal. This would weaken the verbose cipher hypothesis OR suggest the table designer achieved perfect boundary masking.
Falsifies
No clustering + no entropy periodicity = verbose cipher model is wrong.
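Step 2's observed/expected ratio at a given word position can be sketched as follows, taking independence of the two positions' marginal frequencies as the null model:

```python
from collections import Counter

def oe_ratio_at_position(words, pos):
    """Observed/expected ratio for each bigram at transition `pos`
    (pos=0 is char1 -> char2). Expected assumes independence of the
    two positions' marginal character frequencies."""
    bucket = [w for w in words if len(w) > pos + 1]
    n = len(bucket)
    obs = Counter((w[pos], w[pos + 1]) for w in bucket)
    left = Counter(w[pos] for w in bucket)
    right = Counter(w[pos + 1] for w in bucket)
    return {bg: (c / n) / ((left[bg[0]] / n) * (right[bg[1]] / n))
            for bg, c in obs.items()}
```

Positions where the most extreme ratios concentrate are the boundary candidates to cross-check against Attack 1's entropy spikes.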
Quire-Level Statistical Consistency
Priority: high. The manuscript is organized in quires (gatherings of pages). 15th-century cipher users typically changed keys at document boundaries. If the cipher key changes per quire, text within quires should be more statistically consistent than across quires. The A/B split might actually be a quire-key split.
What we compute
1) Group text by physical quire (codicological data available from Beinecke catalog). 2) Per-quire: character frequencies, bigram frequencies, h2, word-length distribution. 3) ANOVA: within-quire variance vs between-quire variance. 4) Compare quire boundaries against Currier A/B split.
Data needed
IVTFF transcription + quire structure mapping (published in codicological studies of MS 408).
Effort
Low once quire mapping is obtained. Quick statistical computation.
Success looks like
Between-quire variance significantly higher than within-quire → key changes per quire. If A/B maps cleanly to quire groups → A/B is a key change, NOT a language change. This simplifies the problem enormously.
Failure looks like
Within-quire and between-quire variance are similar → no key change at quire boundaries. Or quire boundaries don't align with A/B → A/B is driven by something else (scribe, language, topic).
Falsifies
If quires show no statistical grouping → quire-level key changes are not a factor.
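Step 3's ANOVA reduces to a one-way F-statistic over per-page measurements grouped by quire. A stdlib sketch; a real run would use scipy.stats.f_oneway to get p-values as well.

```python
def f_statistic(groups):
    """One-way ANOVA F-statistic: between-group mean square over
    within-group mean square. `groups` is a list of lists of values,
    e.g. per-page h2 estimates grouped by quire."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum(sum((x - m) ** 2 for x in g)
              for g, m in zip(groups, means))
    return (ssb / (k - 1)) / (ssw / (n - k))
```

A large F means quires differ far more than pages within a quire, i.e. evidence for per-quire key changes.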
Paragraph-Initial Formula Detection
Priority: medium. If the underlying text is real (herbal, recipe, medical), paragraph openings should be formulaic. Medieval Latin herbals start with 'Recipe...', 'Accipe...', 'Herba...'. Italian: 'Prendi...', 'Questa pianta...'. The same cipher-word appearing at the start of many paragraphs in the recipe section is likely a formula like 'Take...'
What we compute
1) Identify paragraph boundaries in IVTFF data (marked by locus tags). 2) Extract first word of each paragraph per section. 3) Frequency analysis: which words appear paragraph-initially far more often than expected? 4) Compare across sections — does the recipe section have a different paragraph-initial word than the herbal section?
Data needed
IVTFF transcription with paragraph markers.
Effort
Low. Simple frequency analysis.
Success looks like
A specific Voynich word dominates paragraph openings in recipe sections. Combined with zodiac cribs, this gives us additional cribs to constrain cipher tables.
Failure looks like
No word shows paragraph-initial preference → either no formulaic structure, or paragraph boundaries in our transcription are wrong.
Falsifies
If no formulaic structure exists AND text is real → unusual for medieval texts of this type.
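Step 3 is a two-Counter computation: compare each word's paragraph-initial rate to its overall rate. A sketch on toy paragraphs (the word 'recipe' here is a placeholder, not a claimed reading):

```python
from collections import Counter

def paragraph_initial_excess(paragraphs):
    """For each paragraph-initial word, the ratio of its
    paragraph-initial frequency to its overall frequency.
    High ratios flag formulaic openers.
    `paragraphs` is a list of token lists."""
    initial = Counter(p[0] for p in paragraphs if p)
    overall = Counter(w for p in paragraphs for w in p)
    n_init = sum(initial.values())
    n_all = sum(overall.values())
    return {w: (initial[w] / n_init) / (overall[w] / n_all)
            for w in initial}
```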
Word-Length Source Language Matching
Priority: medium. In a verbose cipher with fixed encoding-unit length, Voynich word length is proportional to plaintext word length. The frequency distribution of Voynich word lengths should correlate with source-language word-length distributions. Compare against 15th-century Latin, Italian, German, Hebrew, and Arabic texts.
What we compute
1) Voynich word-length distribution (in characters). 2) If encoding unit = N glyphs/letter, divide lengths by N to get 'plaintext letter count' distribution. 3) Compare against word-length distributions of candidate source languages from period-appropriate corpora. 4) Test multiple values of N (2, 3, variable).
Data needed
IVTFF transcription + period-appropriate corpora for candidate languages.
Effort
Medium. Need to source 15th-century text corpora for multiple languages.
Success looks like
For a specific N, the rescaled distribution matches a candidate language with statistical significance (KS-test p > 0.05). This identifies both the encoding unit length AND the source language simultaneously.
Failure looks like
No value of N produces a match → either variable-length encoding (Naibbe), or the 'words' aren't encoding units of plaintext words.
Falsifies
Failure for all fixed N strongly supports variable-length encoding (consistent with Naibbe but complicates analysis).
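Steps 1-3 can be sketched with a rescaled length histogram and a simple distance. The L1 distance below is a stand-in for exposition; the actual test in step 3 would be a KS test against period-appropriate corpora.

```python
from collections import Counter

def length_distribution(words, unit=1):
    """Distribution of word lengths after dividing by the assumed
    glyphs-per-letter `unit` (rounded to the nearest integer)."""
    counts = Counter(round(len(w) / unit) for w in words)
    n = len(words)
    return {k: v / n for k, v in sorted(counts.items())}

def l1_distance(p, q):
    """L1 distance between two length distributions (0 = identical)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)
```

Sweeping `unit` over 2, 3, etc. and minimizing the distance against each candidate corpus identifies the best (N, language) pair jointly, as described above.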
The Three-Hypothesis Falsification Battery
Priority: medium. Define specific, falsifiable predictions for each remaining hypothesis (verbose cipher, constructed language, compression). Design a single test battery that eliminates at least one. Every test that passes narrows the field. Run Attacks 1-7 and interpret the results jointly.
What we compute
Joint interpretation matrix: (1) Verbose cipher predicts periodic entropy, crib consistency, slot frequencies matching a natural language. (2) Constructed language predicts smooth entropy, no crib matches, frequencies matching no known language. (3) Compression predicts uniform entropy, no positional structure, high information density. Score each hypothesis against all attack results.
Data needed
Results from Attacks 1-7.
Effort
Interpretive work. No new computation — but the most intellectually demanding step.
Success looks like
One hypothesis is clearly favored across all dimensions. Two or more are falsified. We know WHAT the text is, even if we can't read it yet.
Failure looks like
Results are ambiguous — some attacks favor one hypothesis, others favor another. This would suggest a hybrid mechanism (e.g., compressed then enciphered) or something we haven't modeled.
Falsifies
The entire framework is designed to falsify. If nothing is falsified, our models are all wrong.
The Three Surviving Hypotheses
After constraint elimination, only three hypotheses remain viable. Each makes specific, testable predictions. The attack plan above is designed to falsify at least one.
Verbose Cipher (Naibbe-style)
A 15th-century verbose homophonic substitution cipher, encrypting Latin or Italian using tables selected by dice and playing cards. Each plaintext letter maps to multiple ciphertext glyphs. Word boundaries in ciphertext are cipher artifacts, not linguistic boundaries.
Testable predictions
- Periodic entropy structure within words (Attack 1)
- Zodiac cribs produce consistent table entries (Attack 2)
- Slot-specific frequencies match a natural language (Attack 3)
- Rare bigrams cluster at specific word positions (Attack 4)
- Different quires may show different statistical profiles (Attack 5)
- Paragraph-initial formulas exist in recipe/herbal sections (Attack 6)
- Rescaled word-length distribution matches the source language (Attack 7)
Falsified by
If Attacks 1+4 show no periodic/clustering structure AND Attack 3 recovers no natural language match, verbose cipher is falsified.
Constructed Language
An artificial language designed with extremely strict phonotactic rules, possibly as a secret scholarly notation or philosophical language. No cipher involved — the text IS the language.
Testable predictions
- Smooth entropy curve with no periodic structure (Attack 1)
- No zodiac crib consistency under any cipher model (Attack 2)
- Slot frequencies match no known natural language (Attack 3)
- No rare-bigram clustering — all transitions are 'designed' (Attack 4)
- A/B distinction reflects two dialects or registers, not key changes (Attack 5)
- Paragraph-initial patterns may still exist (formulaic structure is possible in conlangs)
Falsified by
If Attack 1 shows clear periodic entropy structure → constructed language is falsified (natural languages and conlangs don't have periodic within-word entropy).
Compressed Bitstream
The text is a compressed representation of information, where the 'characters' are symbols in a compression codebook. Low entropy is a natural consequence of compression. The 'slot grammar' might be the structure of the compression scheme itself.
Testable predictions
- Near-uniform conditional entropy across all word positions (Attack 1)
- No crib consistency — compression doesn't preserve word-level structure (Attack 2)
- No match to any natural-language frequency distribution (Attack 3)
- No rare-bigram clustering — all bigrams are equally 'designed' (Attack 4)
- High information density per character compared to natural language
Falsified by
If Attack 1 shows ANY positional structure (periodic or smooth) → pure compression is falsified. If zodiac cribs work under any model → compression is falsified.
Decipherment Attempts
A century of attempts to crack the Voynich code — from microscopic shorthand to neural networks.
William Newbold
Microscopic shorthand cipher
Claimed pen strokes contained microscopic Greek shorthand visible only under magnification. Announced he had deciphered the manuscript revealing Roger Bacon's scientific discoveries including telescopes and spiral nebulae.
William Friedman / NSA Team
Constructed language hypothesis
Legendary codebreaker William Friedman led an informal NSA cryptographer team. Concluded the text 'does not act like natural language' and conjectured it might be an artificial/constructed language. John Tiltman and Mary D'Imperio contributed foundational analysis.
Joseph Feely
Simple Latin substitution cipher
Published 'Roger Bacon's Cipher: The Right Key Found' claiming a straightforward alphabetic substitution producing medieval Latin text.
Prescott Currier
Two 'languages' (A and B)
Naval cryptanalyst identified two statistically distinct character distributions corresponding to different manuscript sections. Proposed at least two scribes or encoding methods.
William Bennett
Entropy measurements
Performed systematic entropy analysis comparing Voynich text to known languages and ciphers. Found word-level entropy notably low.
Mary D'Imperio
Comprehensive survey (NSA)
Published 'The Voynich Manuscript: An Elegant Enigma' through NSA. Not a decipherment but the first rigorous catalog of all attempts, properties, and theories. Classified possibilities into four categories: natural language cipher, synthetic language, random/hoax, and exotic natural language.
Leo Levitov
Cathar endura ritual
Proposed the text is a 'polyglot oral tongue' mixing medieval Flemish, Old French, and Old High German, documenting Cathar euthanasia rituals. Published 'Solution of the Voynich Manuscript.'
Jorge Stolfi
Word paradigm / Chinese language
Performed extensive computational analysis revealing strict word-internal structure: words follow a prefix-midfix-suffix pattern with specific character 'slots'. Later explored connections to Chinese and Manchu writing.
Gordon Rugg
Cardan grille hoax
Demonstrated that a table of syllables combined with a Cardan grille (a card with windows) could produce text with Voynich-like statistical properties. Published in Scientific American.
Sravana Reddy & Kevin Knight
Computational NLP analysis
Applied modern NLP techniques including anagram solving and machine translation. Tested whether the text could be an anagram of known languages. Published at ACL workshop.
Montemurro & Zanette
Semantic content via information theory
Applied clustering and entropy analysis to show the manuscript has a complex statistical structure consistent with meaningful content. Found semantic-like networks organized by topic sections. Published in PLoS ONE.
Stephen Bax
Partial decipherment via proper nouns
Attempted bottom-up decipherment starting with plant names. Identified ~10 characters by matching botanical illustrations to known plants and reading labels as their names in various languages.
Torsten Timm
Self-citation / verbose cipher
Proposed that Voynich words are generated by a process of partial self-copying from nearby text (both forward and backward), explaining the high repetition and low entropy.
Kondrak & Hauer
AI decipherment as Hebrew
Used neural networks and AI-based decipherment techniques to propose that the underlying language is Hebrew, with each word encoded as an alphagram (its letters rearranged into alphabetical order). Generated headline-grabbing claims.
Gerard Cheshire
Proto-Romance language
Claimed the text is written in 'proto-Romance' — a precursor to modern Romance languages — and that he had fully deciphered it. Published in Romance Studies.
Bowern & Lindemann
Authoritative linguistic survey
Published definitive review in Annual Review of Linguistics confirming genuine linguistic structure. Surveyed all major computational and linguistic analyses to date. Concluded the manuscript has real language-like properties that resist simple explanations.
Rene Zandbergen
STA (Super Transliteration Alphabet)
Published comprehensive analysis of all transliteration systems showing that alphabet choice fundamentally changes statistical results. Proposed STA as superset to enable cross-system comparison.
Brewer & Lewis
Gynecology / women's secrets
Published in Social History of Medicine arguing the manuscript concerns sex, conception, and gynecology. Connected to Bavarian physician Johannes Hartlieb (c.1410-1468) who wrote about plants, women, magic, astronomy, baths — and recommended 'secret letters' to obscure gynecological recipes.
Lisa Fagin Davis
Multispectral imaging / codicology
Analyzed multispectral images of folio 1r, discovering previously hidden columns: two Roman alphabets (offset by one letter) and one column of Voynich characters. Attributed the handwriting to Johannes Marcus Marci (owned manuscript 1662-1665) — an early decryption attempt using substitution ciphers.
Greshko (Naibbe cipher)
Verbose homophonic substitution cipher
Demonstrated a historically plausible cipher method using dice and playing cards ('Naibbe') that encrypts Latin and Italian into Voynich-like ciphertext. The method is hand-executable with 15th-century materials and produces text that is fully decipherable while reproducing multiple Voynich statistical properties simultaneously.
The Hoax Debate
Is the Voynich Manuscript a genuine encoded text, or an elaborate meaningless hoax? The debate remains open.
Arguments for Hoax
- Gordon Rugg showed a Cardan grille can produce similar-looking text (2004)
- The self-citation algorithm (Timm 2014) reproduces both of Zipf's laws from simple copying
- No corrections or hesitations in the writing — unusual for genuine medieval manuscripts
- No one has deciphered it despite a century of intensive study and modern computers
- Second-order entropy (h2 ~2 bits) is lower than ALL 316 natural languages tested — abnormally predictable
- Some character combinations appear suspiciously regular
- The illustrations include impossible/fantasy plants that resist identification
Arguments Against Hoax
- Long-range semantic correlations found by Montemurro & Zanette (2013) — extremely hard to fake with any mechanical method
- The Cardan grille wasn't invented until 1550 — at least 112 years after the vellum was created (1404-1438)
- The Naibbe cipher (2025) proves a real, decipherable cipher CAN produce Voynich-like statistics — no hoax needed
- Content-word clustering by section tracks topics — meaningless text wouldn't organize this way
- The Currier A/B distinction with two correlated scribes suggests systematic encoding, not random generation
- Marci's failed 17th-century decryption attempt (found via multispectral imaging, 2024) shows it was already undecipherable
- Bowern & Lindemann (2021, Annual Review of Linguistics) confirm genuine linguistic structure
- The cost of vellum in the 15th century made hoaxing on this scale economically irrational
- Sophisticated codicological structure (quire arrangement, catchwords) is consistent with genuine manuscripts
Botanical Identifications
Attempts to identify the ~130 plant illustrations as real species — a potential entry point for decipherment.
Tucker, Talbert & Janick (2013-2019)
Debated. Identified 166 phytomorphs as New World species in their 2019 book 'Flora of the Voynich Codex' (Springer). Includes sunflower, chili pepper, armadillo. Won the American Botanical Council's James A. Duke Excellence in Botanical Literature Award (2020).
Award-winning botanical analysis, but it conflicts with the radiocarbon dating (vellum 1404-1438, predating European contact with the New World). Tucker argues the text was written later on old vellum. Mainstream scholars remain skeptical of the New World hypothesis.
Edith Sherwood (2008)
Debated. Identified several plants as common European species, suggesting a Northern Italian origin. Compared the illustrations to works by Leonardo da Vinci's contemporaries.
Some identifications are reasonable but many botanical illustrations appear to be composites or fantasy plants, making identification speculative.
Stephen Bax (2014)
Debated: Used botanical identifications as entry points for decipherment. Matched an illustration of Centaurea (cornflower) to its label text, proposing phonetic values for several characters.
Method is sound in principle (similar to how Egyptian hieroglyphs were deciphered via bilingual texts), but sample size too small for statistical validation.
AI Prior Art
Recent AI/ML work on ancient scripts, cipher-breaking, and manuscript analysis that directly informs our approach. These are the shoulders we stand on.
Oracle Bone Script Decipherment (OBSD)
2024: Diffusion model learned to transform ancient Oracle Bone characters into modern Chinese. Won ACL 2024 Best Paper. Follow-ups (DCSD-OBI, OracleFusion at ICCV 2025) improved accuracy by 11%+. Uses Chinese-CLIP for cross-modal consistency.
Relevance: Direct precedent: train a diffusion model on Voynich glyphs to learn structural decomposition and morph toward known alphabets.
DeepMind Ithaca (Ancient Greek)
2022: Transformer trained on 78,000 ancient Greek inscriptions. 62% accuracy on text restoration alone; 72% when combined with historians. Geographic attribution at 71% accuracy. Dating within 30 years. Published in Nature.
Relevance: Shows self-supervised pre-training works on ancient corpora. Our 37,919 word tokens may be sufficient for character-level pattern learning.
Vesuvius Challenge (Herculaneum Scrolls)
2023-25: CT scanning + AI read 2,000+ characters from carbonized scrolls where ink was invisible to the eye. $700K grand prize awarded February 2024. Discovered a previously unknown tract by Philodemus.
Relevance: Proof that AI can extract information from manuscripts that humans literally cannot see. Multispectral + AI pipeline could reveal hidden Voynich features.
Neural Decipherment (MIT, Barzilay)
2019-21: Minimum-cost flow optimization deciphered Ugaritic (via Hebrew) and Linear B (via Greek). 2021 follow-up handled undersegmented scripts with phonetic priors. Published ACL 2019 and TACL 2021.
Relevance: Best algorithmic framework for cipher-breaking, but requires a known related language — the Voynich's fundamental blocker.
Coupled Simulated Annealing (Tamburini)
2025: Combinatorial optimization tested on Linear A, Proto-Elamite, Indus Valley, Rongorongo. Allows null, one-to-many, and many-to-one character mappings. Published in Frontiers in AI.
Relevance: Could be applied directly to Voynich under various language hypotheses. Handles verbose mappings naturally.
Gaskell & Bowern: 'Gibberish After All?'
2022: 42 volunteers wrote meaningless text. ML classifier found Voynich transcriptions statistically resemble human-produced gibberish more than meaningful text. Code on GitHub (danielgaskell/voynich).
Relevance: Serious challenge: any decipherment theory must explain why the text passes this gibberish test. Counter: Montemurro's semantic clustering is harder to fake.
Compression-Based Hypothesis
2025: Treats the Voynich as an encoded bitstream, testing decompression parameters using Shannon entropy as fitness. Proposes low redundancy and word structure are artifacts of compression (LZ77 + Huffman-like).
Relevance: Novel framework. If the text is compressed rather than encrypted, the decipherment problem changes fundamentally.
Naibbe Cipher Implementation
2025: Open-source Python implementation (github.com/greshko/naibbe-cipher). Includes encryption, decryption, and a Voynich-style text generator. Verbose homophonic substitution using dice + playing cards.
Relevance: Concrete, testable cipher model. We can generate synthetic Naibbe ciphertexts and compare statistics to the real manuscript systematically.
LLM Cryptanalysis Benchmarks (CipherBench)
2024-25: Multiple benchmark studies (CipherBench, CipherBank at ACL 2025) tested GPT-4, Claude, and Gemini on classical cipher-breaking. LLMs can break simple substitution ciphers (Caesar, Atbash) with ~90% accuracy but fail on polyalphabetic and homophonic ciphers. Performance degrades sharply with cipher complexity.
Relevance: Establishes that current LLMs cannot brute-force the Voynich directly. But LLMs excel at pattern description and hypothesis generation — use them as analytical partners, not as cipher-breakers.
Neural LM Scoring for Cipher-Breaking
2018-24: Kambhatla et al. (EMNLP 2018) replaced n-gram language models with LSTMs in hill-climbing cipher attacks, achieving a 2x improvement on homophonic substitution ciphers. Extended by beam-search decipherment (ACL 2022). The same family of hill-climbing attacks powered AZdecrypt's solution of the Zodiac Z340 cipher in 2020.
Relevance: This is the core technique for our Naibbe attack strategy. Neural LM scoring of candidate plaintexts is dramatically better than classical n-gram methods.
AZdecrypt (Zodiac Z340 Solution)
2020: David Oranchak, Jarl Van Eycke, and Sam Blake used AZdecrypt to crack the 51-year-old Zodiac Z340 cipher. The tool uses hill climbing with n-gram scoring, testing 650,000+ candidate transpositions. The FBI confirmed the solution in March 2021.
Relevance: Closest real-world precedent for a computational Voynich attack. Z340 was a homophonic substitution cipher with transposition — similar complexity class. The Voynich has 500x more ciphertext, giving more statistical leverage.
Sign2Vec / Cypro-Minoan Clustering
2022: Ferrara (PNAS 2022) applied unsupervised neural embeddings to cluster signs in the undeciphered Cypro-Minoan script. Learned representations captured structural relationships without any labeled data. Identified potential sign variants and ligatures.
Relevance: Directly applicable to visual alphabet discovery. Train embeddings on our 17,672 glyph images to discover the 'true' character set from pixels alone, bypassing transcription assumptions.
Akkadian Neural Machine Translation
2023: Gutherz et al. (PNAS Nexus 2023) built an NMT system translating Akkadian cuneiform to English. Trained on ~10,000 parallel text pairs. Companion model 'cuneiformBase-400m' handles multiple ancient languages. Follow-ups (ICLR 2025) added iterative translation refinement.
Relevance: Shows NMT works on ancient languages with sufficient parallel data. The Voynich lacks parallel text — but if even partial decipherment produces fragments, NMT could bootstrap the rest.
Multimodal LLMs on Historical Manuscripts
2024-25: GPT-4V and Claude Vision applied to medieval manuscript analysis — page layout detection, script identification, illustration classification. HTR-United and CATMuS Medieval provide training data (200+ manuscripts, 160K+ lines). Florence-2 and Grounding DINO enable zero-shot object detection in manuscript pages.
Relevance: We already used Claude Vision for glyph extraction (17,672 glyphs). Next: systematic page-level analysis combining text regions, illustration types, and marginalia to build a structural map of the entire manuscript.
AI Research Approaches
Novel AI/ML approaches we are exploring to make progress on the Voynich mystery. These go beyond previous attempts by leveraging modern multi-modal models and information theory.
Diffusion Model Glyph Analysis
Train a diffusion model on our 17,672 cropped Voynich glyphs — following the Oracle Bone approach (ACL 2024 Best Paper). Learn the structural decomposition of the script. Use conditional generation to morph Voynich characters toward known alphabets (Latin, Arabic, Hebrew). The script requiring the smallest transformation may be related.
CLIP Text-Image Alignment
Use CLIP/OpenCLIP to embed Voynich pages (text and images separately). Test whether text embeddings cluster the same way as image embeddings — do botanical texts cluster with botanical images? Fine-tune on medieval manuscripts (Tractatus de Herbis) where text-image relationships are known, then apply to Voynich. Low effort, high signal.
Visual Alphabet Discovery
Use DINOv2/SAM2 vision transformers to cluster all 17,672 glyph images by visual similarity, ignoring all prior transcriptions. Discover the 'true' alphabet from pixel data. Compare to EVA assignments. Could reveal whether EVA over-splits or under-splits characters — resolving Zandbergen's fundamental concern.
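If glyph embeddings are available (from DINOv2 or a similar ViT), the discovery step reduces to plain clustering. Below is a dependency-free k-means sketch over placeholder 2-D vectors; in the real pipeline each vector would be the embedding of one of the 17,672 glyph crops, and k would be swept around the 20-30 characters the script is thought to have:

```python
import random

def kmeans(vectors, k, iters=50, seed=0):
    """Cluster glyph embedding vectors with plain k-means.
    `vectors` is a list of equal-length tuples of floats."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)  # initialize centers from the data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            # Assign each vector to its nearest center (squared Euclidean).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            clusters[j].append(v)
        # Recompute centers; keep the old center if a cluster empties.
        centers = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers, clusters
```

Comparing the resulting cluster assignments against EVA labels (e.g., via adjusted Rand index) would show directly where EVA over- or under-splits characters.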
Naibbe Brute-Force Search
Using the open-source Naibbe cipher implementation, systematically test all possible cipher tables against candidate plaintext languages (Latin, Italian, German). Use a language model to score 'coherence' of candidate plaintexts. GPU-accelerated search could make this tractable. Essentially what Bletchley Park did for Enigma.
Cipher-Type Fingerprinting
Generate millions of synthetic ciphertexts: encrypt 15th-century texts with every known cipher type. Measure statistical fingerprints (h1, h2, Zipf slope, word length, hapax rate). Train a classifier to identify cipher family from ciphertext statistics alone. Run on real Voynich text. Even a probabilistic answer would be a breakthrough.
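The fingerprint features can be computed from a transcription with only the standard library. A sketch; the exact feature set and estimators here are illustrative choices, not the project's actual pipeline:

```python
import math
from collections import Counter

def fingerprint(text):
    """Statistical fingerprint of a (cipher)text: h1, h2, Zipf slope, hapax rate."""
    chars = [c for c in text if not c.isspace()]  # spaces stripped for char stats
    n = len(chars)
    freq = Counter(chars)
    h1 = -sum((v / n) * math.log2(v / n) for v in freq.values())
    # h2 = H(bigram) - H(char): approximates conditional entropy H(next | prev)
    bigrams = Counter(zip(chars, chars[1:]))
    nb = sum(bigrams.values())
    h_joint = -sum((v / nb) * math.log2(v / nb) for v in bigrams.values())
    h2 = h_joint - h1
    # Zipf slope: least-squares fit of log(frequency) against log(rank)
    wfreq = sorted(Counter(text.split()).values(), reverse=True)
    xs = [math.log(r + 1) for r in range(len(wfreq))]
    ys = [math.log(f) for f in wfreq]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    hapax = sum(1 for f in wfreq if f == 1) / len(wfreq)
    return {"h1": h1, "h2": h2, "zipf_slope": slope, "hapax_rate": hapax}
```

For the real manuscript, h2 should land near 2 bits; synthetic ciphertexts whose fingerprints sit closest to that point implicate their cipher family.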
Cross-Section Transfer Learning
Train character-level transformers separately on each manuscript section. Measure transfer learning performance: if a model trained on section A predicts section B well, they share deep structure. Could quantify the Currier A/B distinction and discover new sub-languages. Inspired by Ithaca (DeepMind, Nature 2022).
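A cheap proxy before training transformers: measure the transfer asymmetry with a smoothed character-bigram model. The gap statistic below, and the toy corpora in any test of it, are placeholders for the real section texts and models:

```python
import math
from collections import Counter

def char_bigram_model(text):
    """Train an add-one-smoothed character-bigram model; returns a scorer
    giving cross-entropy (bits/char) of any other text under this model."""
    uni, bi = Counter(text), Counter(zip(text, text[1:]))
    vocab = len(set(text))
    def cross_entropy(other):
        pairs = list(zip(other, other[1:]))
        return -sum(math.log2((bi[p] + 1) / (uni[p[0]] + vocab))
                    for p in pairs) / len(pairs)
    return cross_entropy

def transfer_gap(src, tgt):
    """Extra bits/char paid when a model trained on `src` predicts `tgt`,
    relative to a model trained on `tgt` itself. Small gap = shared structure."""
    return char_bigram_model(src)(tgt) - char_bigram_model(tgt)(tgt)
```

Computing the full gap matrix across the six manuscript sections would give a quantitative picture of the Currier A/B split and any finer sub-languages.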
Zero-Shot Botanical Matching
Use CLIP for zero-shot sketch-based image retrieval (CVPR 2023) to match Voynich botanical drawings against plant photo databases. No one has tried this with medieval illustrations. If even 5 plant labels can be confidently identified, that gives anchor points — breaking the 'double unknown.'
Scribe-Specific Decipherment
Five scribes were identified (Davis), perfectly correlated with Language A/B. Use ViT-based writer identification (F1=0.96 on medieval manuscripts) to validate, then treat each scribe's text as a separate cipher problem. Each sub-corpus would have more consistent statistics, making cipher-breaking more tractable.
Line-Unit Cipher Hypothesis
The line functions as an encoding unit (first words longer, certain chars only at line ends). What if each line is independently encrypted with a position-derived key? Test by measuring whether character distributions within lines are more uniform than across lines. If so, the key resets per line.
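A first-pass version of this test needs only per-line character counts: compare each line's distribution to the pooled distribution, then against a shuffle baseline that destroys line structure. The smoothing and trial count below are arbitrary sketch choices:

```python
import math
import random
from collections import Counter

def mean_line_divergence(lines):
    """Mean KL divergence (bits) of each line's character distribution
    from the pooled distribution, with add-one smoothing."""
    pooled = Counter("".join(lines))
    alphabet = sorted(pooled)
    total = sum(pooled.values()) + len(alphabet)
    p = {c: (pooled[c] + 1) / total for c in alphabet}
    divs = []
    for line in lines:
        lc = Counter(line)
        ln = sum(lc.values()) + len(alphabet)
        q = {c: (lc[c] + 1) / ln for c in alphabet}
        divs.append(sum(q[c] * math.log2(q[c] / p[c]) for c in alphabet))
    return sum(divs) / len(divs)

def shuffle_baseline(lines, trials=30, seed=0):
    """Same statistic after globally shuffling characters back into the
    original line lengths, erasing any per-line keying."""
    rng = random.Random(seed)
    chars = list("".join(lines))
    lengths = [len(l) for l in lines]
    vals = []
    for _ in range(trials):
        rng.shuffle(chars)
        out, i = [], 0
        for n in lengths:
            out.append("".join(chars[i:i + n]))
            i += n
        vals.append(mean_line_divergence(out))
    return sum(vals) / trials
```

If the observed divergence on real folio lines sits far above the shuffle baseline, per-line character statistics genuinely differ, consistent with a key that resets each line.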
Synthetic Pre-Training for Decipherment
Train a transformer on millions of synthetic cipher-plaintext pairs: encrypt 15th-century texts with every known cipher type (substitution, homophonic, verbose, Naibbe-style). The model learns to 'feel' cipher structure. Then fine-tune on real Voynich text. Even without deciphering, the model's internal representations reveal which cipher family the Voynich belongs to.
LLM-Assisted Pattern Discovery
Use Claude/GPT-4 not as cipher-breakers but as pattern describers. Feed pages of Voynich transcription and ask for structural observations: repeated phrases, formulaic openings, section-boundary markers. LLMs excel at noticing patterns humans miss in large texts. Benchmark: CipherBench shows LLMs fail at decryption but succeed at pattern analysis.
Datasets & Tools
Open-source resources, datasets, and pre-trained models available for Voynich research today.
Voynich Full Dataset v2.0
Dataset: Zenodo (Jan 2026). Complete EVA2 + Greek transliteration of the manuscript. Includes word boundaries, line positions, and folio metadata. The most current digital text resource.
IVTFF Takahashi Transcription
Dataset: 162,755 expert-transcribed characters across 225 folios. IVTFF format with locus tags (folio.paragraph.line). Our primary ground truth for text analysis.
CATMuS Medieval
Dataset: HuggingFace dataset with 200+ medieval manuscripts and 160K+ transcribed lines. Pre-training resource for handwriting recognition models. Covers Latin, French, English, German scripts from the 8th-16th century.
Naibbe Cipher (Python)
Tool: Open-source implementation: encryption, decryption, and Voynich-style text generator. 6 substitution tables, dice + card selection. github.com/greshko/naibbe-cipher.
AZdecrypt
Tool: The tool that cracked the Zodiac Z340 cipher. Hill climbing + n-gram scoring for homophonic substitution ciphers. Open source, actively maintained. Direct starting point for Voynich computational attacks.
eScriptorium + Kraken
Tool: Best open-source pipeline for historical document HTR (Handwritten Text Recognition). Kraken handles segmentation + OCR; eScriptorium provides the annotation UI. Used by major digital humanities projects.
OBSD / OracleFusion
Model: Diffusion models for ancient script decipherment (ACL 2024 Best Paper). OBSD transforms Oracle Bone characters into modern Chinese. OracleFusion (ICCV 2025) adds cross-modal consistency. github.com/guanhaisu/OBSD.
cuneiformBase-400m
Model: 400M-parameter multilingual model for cuneiform scripts (Akkadian, Sumerian, Hittite). Demonstrates that transformer pre-training works on ancient scripts with limited data.
How the Naibbe Cipher Works
The most promising cipher model for the Voynich. A verbose homophonic substitution cipher using only 15th-century materials.
1. Split plaintext
A die roll determines whether each plaintext letter is encrypted alone (unigram) or paired with its neighbor (bigram). Roughly 50/50 split. This randomizes word boundaries in ciphertext.
2. Select table
Draw a playing card to select one of 6 substitution tables. Each letter has different multi-glyph strings per table. One letter can be disguised 6 ways as a unigram; a letter pair can be disguised 36 ways as a bigram (6 prefix x 6 suffix table choices).
3. Write glyphs
Look up the multi-glyph string from the selected table. All outputs obey the Voynich slot grammar. Adjacent glyphs within a "word" are determined by the table, not the plaintext — this is why h2 is so low.
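The three steps can be mocked up in a few lines of Python. The tables below are invented placeholders built from an EVA-flavored glyph inventory, purely to show the mechanism; they are not the published Naibbe tables:

```python
import random

GLYPHS = "okedaichlrsqtypfmn"  # illustrative EVA-style glyph inventory
TABLES = []
for t in range(6):
    # Placeholder substitution tables: each maps a letter to a 2-glyph string.
    table = {}
    for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz"):
        j = (i * 7 + t * 5) % len(GLYPHS)
        table[ch] = GLYPHS[j] + GLYPHS[(j + t + 1) % len(GLYPHS)]
    TABLES.append(table)

def encipher(plaintext, seed=0):
    """Steps 1-3: die roll (unigram vs bigram), card draw (table choice),
    table lookup. Output tokens play the role of Voynich 'words'."""
    rng = random.Random(seed)
    letters = [c for c in plaintext.lower() if c.isalpha()]
    out, i = [], 0
    while i < len(letters):
        t1 = rng.choice(TABLES)                          # card draw: prefix table
        if i + 1 < len(letters) and rng.random() < 0.5:  # die roll: bigram
            t2 = rng.choice(TABLES)                      # card draw: suffix table
            out.append(t1[letters[i]] + t2[letters[i + 1]])
            i += 2
        else:                                            # unigram
            out.append(t1[letters[i]])
            i += 1
    return " ".join(out)
```

Because adjacent glyphs inside a token come from the table rather than the plaintext, character-level predictability stays high, which is the mechanism behind the low h2.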
Statistical match
Historical plausibility
The earliest known homophonic cipher dates to 1401 (Francesco I Gonzaga, Duke of Mantua). Playing cards were widespread in Italy from the late 1300s — the word "naibbe" itself comes from this era. Dice were ubiquitous. Italian city-states used nomenclators (hybrid letter-cipher + word-code systems) in diplomatic correspondence throughout the 15th century. Alberti's polyalphabetic cipher (1467) came later — the Naibbe uses only techniques available in 1404-1438.
Computational Attack Strategy
If the Voynich is a verbose homophonic cipher (Naibbe-style), here is how a computational attack would work. Inspired by the Z340 Zodiac cipher solution (2020).
The Joint Segmentation + Decryption Problem
Unlike standard homophonic ciphers (1 symbol = 1 letter), a verbose cipher requires solving TWO problems simultaneously: (1) how to segment Voynich "words" into letter-encoding units (unigrams vs bigrams), and (2) what each unit decrypts to. This makes it fundamentally harder than Z340 — but the Voynich has ~170,000 characters (vs Z340's 340), giving far more statistical leverage.
Step 1: Constrain the table
Generate only slot-grammar-compliant glyph strings. This eliminates ~99%+ of possible table entries. The remaining valid strings (hundreds to low thousands) form the candidate pool.
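Enumerating the compliant strings is a small combinatorial job. The slot inventory below is a schematic stand-in (the real constraints come from corpus analysis of the manuscript), but the enumeration pattern carries over:

```python
from itertools import product

# Schematic slot grammar: ordered slots, each optionally filled from a
# small glyph class. Illustrative only, not the manuscript's actual grammar.
SLOTS = [
    ("prefix",  ["", "q", "o", "y"]),
    ("gallows", ["", "k", "t", "p", "f"]),
    ("bench",   ["", "ch", "sh"]),
    ("core",    ["", "a", "o", "e"]),
    ("suffix",  ["", "y", "dy", "in", "iin"]),
]

def candidate_strings(min_len=1, max_len=6):
    """Enumerate every slot-grammar-compliant string in a length band.
    This is the candidate pool for cipher-table entries."""
    pool = set()
    for combo in product(*(options for _, options in SLOTS)):
        s = "".join(combo)
        if min_len <= len(s) <= max_len:
            pool.add(s)
    return sorted(pool)
```

Even in this toy grammar, the 4 x 5 x 3 x 4 x 5 = 1,200 raw combinations shrink after deduplication and length filtering, versus the vastly larger space of arbitrary glyph strings.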
Step 2: Hill climbing + neural LM
Use hill climbing with neural language model scoring (LSTM or transformer, trained on 15th-century Latin/Italian). Swap table entries, score candidate plaintext coherence. Accept improvements, reject regressions. Thousands of parallel restarts on GPU.
Step 3: Multi-language sweep
Run the attack against language models for Latin, Italian, German, Hebrew, Arabic, and other candidate languages. The language that produces the highest-scoring plaintext is the most likely source. This sidesteps the "unknown language" problem.
Step 4: Scribe-specific attacks
Run separate attacks on Language A pages (scribes 1,4) and Language B pages (scribes 2,3,5). Each sub-corpus has more consistent statistics. If different cipher tables were used per scribe, this decomposition is essential.
Estimated tractability
On modern GPU hardware (A100/H100), individual scoring evaluations take microseconds, but the search space requires billions of evaluations. With slot grammar constraints reducing the table space by 99%+, and parallel restarts across thousands of GPU cores, estimated wall-clock time: days to weeks on a GPU cluster. The Z340 was solved with 650,000 candidate transpositions — the Voynich search space is larger but the data is 500x longer.
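In miniature, Steps 2-3 look like the loop below, with an add-one-smoothed bigram model standing in for the neural LM and a simple-substitution key standing in for the verbose table; both are simplifications for illustration:

```python
import math
import random
from collections import Counter

def train_bigram_lm(corpus):
    """Stand-in scorer: smoothed character-bigram log-probability. The real
    attack would use a neural LM trained on 15th-century Latin/Italian."""
    uni, bi = Counter(corpus), Counter(zip(corpus, corpus[1:]))
    vocab = len(set(corpus))
    def logprob(text):
        return sum(math.log((bi[(a, b)] + 1) / (uni[a] + vocab))
                   for a, b in zip(text, text[1:]))
    return logprob

def hill_climb(ciphertext, cipher_alpha, plain_alpha, score, steps=2000, seed=0):
    """Swap-based hill climbing over a substitution key. A verbose-cipher
    attack would mutate whole table entries, but the loop is the same."""
    rng = random.Random(seed)
    key = dict(zip(cipher_alpha, rng.sample(plain_alpha, len(cipher_alpha))))
    def decode(k):
        return "".join(k.get(c, c) for c in ciphertext)
    best = score(decode(key))
    for _ in range(steps):
        a, b = rng.sample(list(cipher_alpha), 2)
        key[a], key[b] = key[b], key[a]        # propose swapping two mappings
        s = score(decode(key))
        if s >= best:
            best = s                            # accept improvement
        else:
            key[a], key[b] = key[b], key[a]     # reject: undo the swap
    return key, best
```

Step 3's multi-language sweep is this loop re-run with one scorer per candidate language; Step 4 re-runs it per scribe sub-corpus.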
Key References
Bennett, W.R. (1976). "Scientific and Engineering Problem-Solving with the Computer." Prentice-Hall.
Currier, P.H. (1976). "New Research on the Voynich Manuscript." Presented at ACA seminar.
D'Imperio, M.E. (1978). "The Voynich Manuscript: An Elegant Enigma." NSA/CSS publication.
Rugg, G. (2004). "An Elegant Hoax?" Cryptologia, 28(1), 31-46.
Reddy, S. & Knight, K. (2011). "What We Know About the Voynich Manuscript." ACL Workshop.
Amancio, D.R. et al. (2013). "Probing the statistical properties of unknown texts." PLoS ONE, 8(7).
Montemurro, M.A. & Zanette, D.H. (2013). "Keywords and Co-Occurrence Patterns in the Voynich Manuscript." PLoS ONE, 8(6).
Tucker, A.O. & Talbert, R.H. (2013). "A Preliminary Analysis of the Botany, Zoology, and Mineralogy of the Voynich Manuscript." HerbalGram, 100.
Timm, T. (2014). "How the Voynich Manuscript was created." arXiv:1407.6639.
Bax, S. (2014). "A proposed partial decoding of the Voynich script." stephenbax.net.
Tucker, A.O. & Janick, J. (2019). "Flora of the Voynich Codex." Springer.
Timm, T. & Schinner, A. (2019). "A Possible Generating Algorithm of the Voynich Manuscript." Cryptologia.
Bowern, C. & Lindemann, L. (2021). "The Linguistics of the Voynich Manuscript." Annual Review of Linguistics, 7.
Zandbergen, R. (2022). "Transliteration of the Voynich MS." CEUR-WS Vol. 3313.
Luo, J. et al. (2019). "Neural Decipherment via Minimum-Cost Flow." ACL 2019.
Assael, Y., Sommerschield, T. et al. (2022). "Restoring and attributing ancient texts using deep neural networks." Nature.
Gaskell, D. & Bowern, C. (2022). "Gibberish after all?" University of Malta.
Gutherz, G. et al. (2023). "Translating Akkadian to English with neural machine translation." PNAS Nexus.
Brewer, K. & Lewis, M.L. (2024). "The Voynich Manuscript, Dr Johannes Hartlieb and the Encipherment of Women's Secrets." Social History of Medicine, 37(3).
Guan, H. et al. (2024). "OBSD: Deciphering Oracle Bone Language with Diffusion Models." ACL 2024 (Best Paper).
Tamburini, F. (2025). "Automatic decipherment via coupled simulated annealing." Frontiers in AI.
Kambhatla, N. et al. (2018). "Decipherment of Substitution Ciphers with Neural Language Models." EMNLP 2018.
Oranchak, D. et al. (2020). "Cracking the Zodiac Killer's Z340 Cipher." FBI confirmed March 2021.
Ferrara, S. (2022). "Cypro-Minoan sign clustering via neural embeddings." PNAS.
Greshko, M. (2025). "The Naibbe Cipher." Cryptologia. [github.com/greshko/naibbe-cipher]
Li, Z. et al. (2025). "CipherBench: Benchmarking LLMs on Classical Cipher Breaking." ACL 2025.
Our Current Status
Data imported: 162,755 expert-transcribed characters from Takahashi IVTFF transcription (225 folios mapped to 176 pages). AI-extracted glyph bounding boxes for 17,672 characters with cropped images on R2.
Key finding: Our initial AI glyph extraction (Claude Vision) captured only ~9% of the known characters, missing critical digraphs (ch, sh) and common characters (h, d, y). This demonstrates why expert transcriptions remain essential as ground truth.
Next steps: Import the Zandbergen-Landini Extended EVA (ZL) transcription as a superior data source. Build configurable alphabet system to run analyses under different transliteration assumptions. Begin entropy gradient mapping and cross-section transfer learning experiments.