Research

A comprehensive literature review of Voynich Manuscript (MS 408) research, decipherment attempts, and proposed AI approaches. Compiled from 100+ years of scholarship.

Last updated: March 7, 2026. 162,755 transcribed characters. 176 manuscript pages.

Established Facts

What we know for certain about MS 408, based on physical analysis and scholarly consensus.

Radiocarbon Dating

1404-1438 CE

Vellum dated by University of Arizona (2009). Ink dating inconclusive but consistent. This rules out many post-1500 authorship theories.

Provenance

Traced to 1600s

Earliest confirmed owner: Georg Baresch (Prague, 1637). Passed to Athanasius Kircher via Jan Marek Marci. Possibly owned by Rudolf II (purchased for 600 ducats). Acquired by Wilfrid Voynich in 1912 from Villa Mondragone.

Physical Format

240 pages, 23.5 x 16.2 cm

Quarto format. Some folios are fold-outs (up to 6 panels). Written left-to-right. Contains ~170,000 characters across ~35,000 words. No corrections, strikethroughs, or erasures visible.

Sections

6 thematic sections

Botanical (herbal, ~130 pages), Astronomical (zodiac/star charts), Biological (nude figures in tubes), Pharmaceutical (jars/roots), Recipes (dense text, short paragraphs), and unlabeled sections.

Script

~20-30 distinct characters

Unique script with no confirmed match to any known writing system. Characters include 'gallows' (tall ornate letters), bench-like characters, and simple loops. Multiple transcription systems exist (EVA, Currier, Frogguy).

Statistical Properties

Language-like but anomalous

Follows Zipf's law for word frequencies. Word-level entropy (~10 bits/word) matches English/Latin. But character-level second-order entropy (h2 ~2 bits) is lower than ALL 316 natural languages tested (typically 3-4 bits) — characters are abnormally predictable. Two distinct 'languages' (Currier A and B) in different sections. ~37,919 word tokens, ~8,114 unique types.

Script & Language Properties

Technical characteristics of the Voynich script discovered through computational analysis.

EVA (European Voynich Alphabet)

Created by Rene Zandbergen and Gabriel Landini (1998). Defines ~25 basic characters using ASCII equivalents. 'Analytical' approach that breaks visible strokes into components (e.g., 'ch' = c + h). Most widely used system for computational analysis.

Gallows Characters

Four tall, ornate characters (EVA: t, k, p, f) that extend above the line. Appear mainly at word beginnings and line beginnings. 'Rare' gallows (cth, ckh, cph, cfh) combine bench + gallows. Their distribution suggests they may be numerals, abbreviation markers, or paragraph markers.

The 'Slot' Model (Stolfi)

Voynich words follow a rigid crust-mantle-core structure with specific character subsets at each layer. This grammar covers 96.5% of all running text tokens. Word length distribution is unusually binomial (peaked, not the long tail seen in natural languages). Position 1: q, s, d (or empty). Position 2: o (or empty). Position 3: l, r (or empty). Core: e, ch, sh, etc. Final: y, m, n, g. Unlike any known natural language or cipher.
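The mechanics of measuring slot-grammar coverage can be sketched as a regular expression. The character classes below are a deliberately simplified invention for illustration; Stolfi's published grammar is far larger:

```python
import re

# Toy slot grammar: optional prefix (q|s|d), then o, then (l|r), a core run,
# and an optional final glyph. The character classes are a simplified
# invention for illustration, not Stolfi's published model.
SLOT = re.compile(r"^(q|s|d)?(o)?(l|r)?(e|ch|sh|ee|k|t|d|a|i|o)*(y|m|n|g)?$")

def coverage(words):
    """Fraction of word tokens matched by the toy slot grammar."""
    return sum(1 for w in words if SLOT.match(w)) / len(words)
```

Running this over the full token list against the real grammar is how the 96.5% figure is obtained; the toy version only illustrates the shape of the test.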

Currier A vs B

Two statistically distinct 'languages' first identified by Prescott Currier (1976). Language A: more 'o' characters, found in herbal-A sections. Language B: more 'a' characters, found in herbal-B and biological sections. Could represent two dialects, scribes, or encoding schemes.

Entropy Anomalies

Character-level first-order entropy: ~3.9 bits (close to European languages at ~4.0, though the exact value depends on the transliteration alphabet). Word-level entropy: ~10 bits/word (lower than most European languages at ~11-12). Second-order entropy (character pairs) is unusually low, suggesting high predictability. This is the 'too regular' problem — the text is more predictable than natural languages.

Zipf's Law Compliance

Word frequencies follow Zipf's law (frequency inversely proportional to rank) very closely, matching natural language behavior. This was initially cited as evidence of real language, but it's now known that some generated texts can also exhibit Zipf-like distributions.
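The Zipf check itself is easy to reproduce: fit a least-squares line to log frequency versus log rank and compare the slope to -1. A minimal sketch over a synthetic corpus (the word list is invented):

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) vs log(rank); Zipf predicts roughly -1."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# synthetic Zipfian corpus: word i appears about 1000/i times
corpus = [f"w{i}" for i in range(1, 30) for _ in range(round(1000 / i))]
```

A slope near -1 is necessary but not sufficient evidence of language, which is exactly the caveat above.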

Hapax Legomena

Words appearing only once make up 14-20% of the vocabulary (depending on transcription). This is lower than typical natural languages (~40-60%) but higher than random text. The rate varies significantly based on which transliteration alphabet is used — a key insight from Zandbergen (2022).

Line as Functional Unit

First words on each line are ~1 character longer than average. Certain words have strong line-position affinity — one character appears line-finally in 85% of its occurrences. No known natural language shows this property. Lines may function as encoding units rather than arbitrary text wrapping.

Labels vs Paragraph Text

Labels (text near illustrations) have 12.4% 'abnormal' words vs only 3.7% in running text. Labels show very few repetitions and many unique forms. Interestingly, many label words also appear in running text — but not near their labeled illustration, suggesting labels aren't simple captions.

Extreme Positional Constraints

Characters show extreme positional preferences: 'q' appears only word-initially, 'm' only word-finally, 'y' only at word start or end. Bigram contact rules are far tighter than any natural alphabet — certain character pairs never occur despite both being common individually.

No Corrections

The manuscript shows virtually no corrections, cross-outs, or scribal errors. This is unusual for a 15th-century manuscript and has been cited both as evidence of careful copying (real text) and as evidence of meaningless generation (hoax). Genuine manuscripts of this era typically show corrections.

The 'Double Unknown' Problem

The Voynich presents a unique challenge: both the cipher method AND the underlying language are unknown. In every historical decipherment success (hieroglyphs, Linear B, Enigma), at least one was known. This 'double unknown' means the search space is combinatorially vast.

Hard Numbers

Specific measurements that define the Voynich puzzle. These are the constraints any successful theory must explain.

Entropy Comparison (bits)

Text              h1     h2
Voynich (EVA)     3.86   1.84
Latin (Pliny)     4.00   3.27
Italian (Dante)   4.01   3.13
German            4.02   3.04
English           4.21   ~3.2
Random            6.01   ~6.0

h1 = first-order entropy. h2 = conditional (bigram) entropy. Voynich h2 is ~40% lower than natural languages.
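Both figures can be recomputed from any transcription with a few lines of Python, using the identity H(next | current) = H(pair) - H(first). A sketch over an invented repetitive sample:

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a Counter of observations."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def h1_h2(text):
    """First-order entropy h1 and conditional bigram entropy h2 = H(X_{i+1} | X_i)."""
    h1 = entropy(Counter(text))
    # H(Y|X) = H(X,Y) - H(X), estimated over adjacent character pairs
    h_pair = entropy(Counter(zip(text, text[1:])))
    h_first = entropy(Counter(text[:-1]))
    return h1, h_pair - h_first

sample = "daiin ol chedy aiin shedy chol or ar qokeey qokeedy " * 200
h1, h2 = h1_h2(sample)  # highly repetitive text: h2 falls well below h1
```

Estimates on short texts carry a downward bias, so comparisons should use samples of similar length.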

Most Frequent Words

EVA Word   Count   Family
daiin      863     daiin
ol         537     ol
chedy      501     chedy
aiin       469     daiin
shedy      426     shedy
chol       396     chol
or         363     ol
ar         350     ar
qokeey     308     qo-
qokeedy    305     qo-

37,919 tokens, 8,114 unique types. ~50% hapax legomena.

37,919 word tokens
8,114 unique types
96.5% grammar coverage
~5 chars average word length

Constraint Elimination

The h2 entropy constraint alone eliminates most cipher families. This is the analysis most decipherment attempts never performed — they proposed mechanisms that are mathematically impossible given the observed statistics.

Cipher Family                      Predicted h2   Voynich h2   Verdict
Simple Substitution                ~3.0-3.3       1.84         Eliminated
Polyalphabetic (Vigenère)          ~4.0-5.0+      1.84         Eliminated
Simple Homophonic                  ~3.0-4.0       1.84         Eliminated
Verbose Cipher (Naibbe-style)      ~1.8-2.2       1.84         Best Fit
Constructed Language               ~1.5-2.5       1.84         Possible
Compressed Bitstream               ~1.0-2.5       1.84         Possible
Meaningless Hoax (Cardan grille)   ~1.5-2.5       1.84         Possible
Simple Substitution: Eliminated

Simple substitution preserves source entropy — it just relabels characters. If plaintext is Latin (h2=3.27), ciphertext h2≈3.27. The Voynich h2 of 1.84 is impossible under simple substitution over any known language.

Polyalphabetic (Vigenère): Eliminated

Multiple shifting alphabets push bigram entropy UPWARD toward random (h2→h1 or higher). Polyalphabetic ciphers make text LESS predictable, not more. The Voynich is abnormally predictable. Wrong direction entirely.

Simple Homophonic: Eliminated

Multiple symbols per letter preserve or slightly increase h2 (more symbols = more randomness). Cannot produce h2=1.84 from a natural language source. Only works if combined with verbose encoding.

Verbose Cipher (Naibbe-style): Best Fit

One plaintext letter → multiple ciphertext glyphs. Bigrams WITHIN an encoded letter are deterministic (table-driven, h2≈0). Only letter BOUNDARIES carry real entropy. This naturally produces low h2. The Naibbe model produces h2=2.00±0.01, matching the Voynich within measurement uncertainty.

Constructed Language: Possible

A designed language with extremely strict phonotactic rules could achieve low h2. But no known 15th-century constructed language exists, and the skill required to design one with these specific statistical properties seems anachronistic.

Compressed Bitstream: Possible

Compression creates predictable byte-level structure mapped to a glyph alphabet. Could explain low h2 and unusual word patterns. But LZ77/Huffman-like compression wasn't formalized until the 20th century. A 15th-century equivalent would be extraordinary.

Meaningless Hoax (Cardan grille): Possible

A table-and-grille method can produce text with matching h2. But: (a) Cardan grille wasn't invented until 1550; (b) can't reproduce Montemurro & Zanette's long-range semantic correlations; (c) Gaskell & Bowern's gibberish classifier IS consistent with hoax. Not fully eliminated but weakened.

Bottom line: Only three families survive the h2 constraint — verbose cipher, constructed language, and compression. Everything else is mathematically eliminated. The verbose cipher (Naibbe-style) is the only one that also matches the slot grammar, word-length distribution, Zipf compliance, AND is historically plausible for the 15th century.
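The verbose-cipher effect on h2 is easy to demonstrate: map each plaintext letter to a fixed multi-glyph group and within-group transitions become deterministic, dragging conditional entropy down. A toy simulation (the table is invented for illustration, not the Naibbe table):

```python
import math
import random
from collections import Counter

def h2(text):
    """Conditional bigram entropy H(next | current) in bits: H(pair) - H(first)."""
    def entropy(counts):
        n = sum(counts.values())
        return -sum(c / n * math.log2(c / n) for c in counts.values())
    return entropy(Counter(zip(text, text[1:]))) - entropy(Counter(text[:-1]))

random.seed(0)
# toy 'plaintext': near-random letters, so h2 is high by construction
plain = "".join(random.choice("abcdefgh") for _ in range(20000))

# invented verbose table: one plaintext letter -> one fixed two-glyph group
table = dict(zip("abcdefgh", ["qo", "ch", "dy", "ol", "sh", "ai", "ke", "dn"]))
cipher = "".join(table[c] for c in plain)
# within-group transitions are deterministic, so h2(cipher) drops below h2(plain)
```

Even this crude fixed-length table halves the conditional entropy; variable-length tables like Naibbe's tune the exact value.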

Crib Catalog

Known or suspected plaintext — the cryptanalyst's entry points. Every successful decipherment in history used cribs. The Voynich has more than most people realize.

Zodiac labels

high confidence

Location: Folios 70v-73r (zodiac section)

Suspected plaintext: Month names: mars, abril, may, junio, julio... Zodiac signs: taurus, gemini, cancer...

Latin-alphabet labels visible directly on the pages. Best available cribs. Some may have been added by a later hand — check ink/handwriting consistency.

Repeated paragraph-initial words

medium confidence

Location: Recipe section (folios 103r-116r)

Suspected plaintext: Likely 'Recipe' (Take), 'Accipe' (Accept), or equivalent

Medieval recipe/herbal texts almost universally begin paragraphs with formulaic verbs. Need to identify which Voynich word appears paragraph-initially most often in this section.

Star labels in astronomical section

medium confidence

Location: Folios 67r-69r (astronomical/cosmological)

Suspected plaintext: Possible star names or cardinal directions

Small labels near star diagrams. If any can be matched to known star names (in Latin, Arabic, or vernacular), they provide additional cribs.

Plant name labels

low confidence

Location: Herbal section (folios 1v-66v)

Suspected plaintext: Labels adjacent to botanical illustrations

IF a plant can be confidently identified (Tucker claims 166 IDs, most disputed), the label might encode its name. Low confidence because plant IDs are contested and labels may not be simple names.

Number sequences

low confidence

Location: Various (especially pharmaceutical section)

Suspected plaintext: Dosage quantities, astronomical measurements

If gallows characters are numerals (as some have proposed), pharmaceutical sections might contain countable quantities. Very speculative but testable if gallows-as-numeral hypothesis is correct.

Cryptanalytic Attack Plan

Eight specific computational attacks, ordered by information yield per CPU-hour. Each attack has defined success/failure criteria and specifies which hypothesis it falsifies. This is not a wish list — it is an execution plan.

#1

Bigram Transition Entropy Mapping

critical

Compute conditional entropy H(char_{i+1} | char_i) at each position within words. In a verbose cipher, positions INSIDE an encoded letter have near-zero entropy (deterministic table lookup). Positions at letter BOUNDARIES have high entropy (next plaintext letter is unpredictable). If verbose cipher is correct, this plot shows periodic spikes.

What we compute

For every word position i (1→2, 2→3, 3→4...), calculate H(next_char | current_char) across all words. Plot as a function of position. Run separately for each word length (5-char words, 6-char words, etc.) to control for edge effects.
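A minimal sketch of this computation, assuming the transcription is already tokenized into a list of word strings (the toy words below imitate a two-glyphs-per-letter cipher to show the expected spike pattern):

```python
import math
from collections import Counter

def entropy(counts):
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def positional_h(words, length):
    """H(next_char | current_char) at each within-word position, for one word length."""
    sel = [w for w in words if len(w) == length]
    out = []
    for i in range(length - 1):
        h_pair = entropy(Counter((w[i], w[i + 1]) for w in sel))
        h_cur = entropy(Counter(w[i] for w in sel))
        out.append(h_pair - h_cur)  # H(next|cur) = H(cur,next) - H(cur)
    return out

# toy verbose cipher: each word is two 2-glyph units, so the boundary sits at position 2
units = ["qo", "ch", "dy"]
words = [u + v for u in units for v in units]
profile = positional_h(words, 4)
# within-unit transitions are deterministic (entropy 0); the unit boundary spikes
```

On real data the profile would be run per word-length bucket, exactly as described above, and plotted.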

Data needed

IVTFF transcription (162,755 characters). Already have it.

Effort

Single Python script. Hours to implement, seconds to run. Highest information-per-CPU-hour of any analysis.

Success looks like

Periodic entropy spikes at regular intervals. Period=2 means each letter encodes as 2 glyphs. Period=3 means 3 glyphs per letter. The Naibbe model predicts VARIABLE length, so we'd see a noisier but detectable signal — strongest within fixed-length word buckets.

Failure looks like

Smooth, monotonically decreasing entropy curve (like natural language) → not a verbose cipher. Flat/uniform entropy → compressed bitstream.

Falsifies

If no periodic structure found → verbose cipher hypothesis is severely weakened. If smooth curve → consistent with constructed language.

#2

Zodiac Page Crib Attack

critical

The zodiac section (folios 70v-73r) has Latin month names and zodiac signs written in a known alphabet RIGHT NEXT to Voynich text. In classical cryptanalysis, this is a 'crib' — suspected plaintext. The entire Enigma effort was built on cribs. The Voynich zodiac pages hand us cribs for free. Nobody has done a systematic crib-based attack with modern tools.

What we compute

1) Map each Latin label to its spatially adjacent Voynich text tokens. 2) For each cipher model (Naibbe, homophonic, etc.), test whether ANY valid table could map the Latin word to the Voynich token. 3) Cross-validate: recover partial table from 12 cribs, predict the other 12.

Data needed

High-res zodiac pages (have on R2) + IVTFF transcription for those folios. Manual identification of spatial label-to-text adjacency (human labor required).

Effort

Medium. Manual crib identification takes hours. Automated Naibbe sweep takes days of compute. But the potential payoff is decipherment itself.

Success looks like

Recovering 3-4 consistent table entries from independent cribs would be extraordinary. If the same entry appears from two different crib pairs, that's near-proof of the cipher model. Even partial table recovery opens the door to full decipherment.

Failure looks like

No consistent mapping exists across any cipher model → either labels aren't cribs (added later by a different hand), or the cipher is more exotic than anything we've modeled.

Falsifies

If no cipher model produces consistent cribs → all tested cipher models are wrong for the zodiac section (though A/B split means other sections may use different keys).

#3

Slot-Specific Frequency Factoring

high

Stolfi's slot model constrains which characters appear at which word positions. A cryptographer sees each slot as a separate cipher channel. Within each slot, character frequencies reflect plaintext letter frequencies filtered through the cipher table. By analyzing per-slot frequencies in Language A vs B, we can factor apart 'different cipher' from 'different language.'

What we compute

1) Tag each character with its Stolfi slot position. 2) Compute per-slot character frequencies for A pages vs B pages. 3) Spearman rank correlation between A and B per slot. 4) Under a cipher model assumption, solve the linear system to recover underlying plaintext letter frequencies. 5) Compare recovered frequencies against known language letter distributions.
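Steps 2-3 can be sketched as follows; raw word position stands in for a true Stolfi slot assignment, which the real attack would take from the published parser, and the word lists are invented:

```python
from collections import Counter

def slot_freqs(words, pos):
    """Relative character frequencies at word position pos (a crude slot proxy)."""
    counts = Counter(w[pos] for w in words if len(w) > pos)
    total = sum(counts.values())
    return {ch: c / total for ch, c in counts.items()}

def spearman(fa, fb):
    """Spearman rank correlation over the union of characters (missing = freq 0).
    Ties get consecutive ranks rather than averaged ranks; fine for a sketch."""
    keys = sorted(set(fa) | set(fb))
    def ranks(freqs):
        vals = [freqs.get(k, 0.0) for k in keys]
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    ra, rb = ranks(fa), ranks(fb)
    n = len(keys)
    d2 = sum((a - b) ** 2 for a, b in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

a_pages = ["daiin", "chol", "chor", "daiin", "shol"]   # invented A-page tokens
b_pages = ["chedy", "shedy", "qokeey", "chedy", "daiin"]  # invented B-page tokens
rho = spearman(slot_freqs(a_pages, 0), slot_freqs(b_pages, 0))
```

High per-slot rho across A and B would point to one language under two tables; low rho points the other way.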

Data needed

IVTFF transcription + Stolfi slot assignments (published, need to digitize/implement the parser).

Effort

Medium. Need to implement slot parser + frequency analysis pipeline.

Success looks like

Recovered frequency distribution matches a specific language within 1-2 standard deviations. Even a noisy match (top 5 letters correct) narrows the source language from 'any language ever' to 2-3 candidates. If A and B show high slot-frequency correlation → same language, different tables. Low correlation → different languages.

Failure looks like

Recovered frequencies match no known language → either constructed language, or the cipher model is wrong, or noise overwhelms the signal.

Falsifies

If recovered frequencies match no natural language AND A/B slot correlations are low → strong evidence for constructed language over cipher.

#4

Boundary Detection via Rare Bigrams

high

In a verbose cipher, bigrams within encoded letters are table-driven (high frequency). At letter boundaries, suffix-prefix combinations may be rarer — they arise only from specific letter-pair sequences. Find the lowest-frequency non-zero bigrams and map where they appear in words. If they cluster at specific positions, those positions are letter boundaries.

What we compute

1) Full bigram frequency matrix (25x25 for EVA). 2) Compute observed/expected ratio for each bigram at each word position (normalize for character rarity). 3) Map positions where ratio is most extreme. 4) Cross-reference with Attack 1 entropy spikes for validation.
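Step 2's observed/expected ratio can be sketched as below, pooled over all positions for brevity (the real attack bins by word position); the word list is invented:

```python
from collections import Counter

def bigram_oe(words):
    """Observed/expected count ratio for each within-word bigram,
    where 'expected' assumes characters pair independently by frequency."""
    bigrams, chars = Counter(), Counter()
    for w in words:
        bigrams.update(zip(w, w[1:]))
        chars.update(w)
    total_bi = sum(bigrams.values())
    total_ch = sum(chars.values())
    return {
        (a, b): obs / (total_bi * (chars[a] / total_ch) * (chars[b] / total_ch))
        for (a, b), obs in bigrams.items()
    }

words = ["qok", "qot", "qok", "dar", "dal"]  # invented tokens
oe = bigram_oe(words)
# ('q','o') always co-occurs here, so its ratio sits far above 1
```

Bigrams with the most extreme ratios, mapped back to their word positions, are the boundary candidates to cross-check against Attack 1.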

Data needed

IVTFF transcription. Already have it.

Effort

Low. Can run as an extension of Attack 1. Same Python pipeline.

Success looks like

Rare bigram positions match entropy spike positions from Attack 1 → strong cross-validation of letter boundaries. The combined boundary map tells us encoding unit structure.

Failure looks like

Rare bigrams are uniformly distributed → no boundary signal. This would weaken the verbose cipher hypothesis OR suggest the table designer achieved perfect boundary masking.

Falsifies

No clustering + no entropy periodicity = verbose cipher model is wrong.

#5

Quire-Level Statistical Consistency

high

The manuscript is organized in quires (gatherings of pages). 15th-century cipher users typically changed keys at document boundaries. If the cipher key changes per quire, text within quires should be more statistically consistent than across quires. The A/B split might actually be a quire-key split.

What we compute

1) Group text by physical quire (codicological data available from Beinecke catalog). 2) Per-quire: character frequencies, bigram frequencies, h2, word-length distribution. 3) ANOVA: within-quire variance vs between-quire variance. 4) Compare quire boundaries against Currier A/B split.
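Step 3 is a one-way ANOVA F-ratio over any per-page scalar statistic. A minimal sketch with invented per-quire h2 values:

```python
def f_ratio(groups):
    """One-way ANOVA F statistic: between-group vs within-group mean squares."""
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    k, n = len(groups), len(all_vals)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# per-page h2 grouped by quire (invented numbers with distinct quire means)
quires = [[1.80, 1.84, 1.82], [2.01, 2.05, 2.03], [1.61, 1.65, 1.63]]
f = f_ratio(quires)  # a large F suggests per-quire statistical regimes
```

The same computation run per statistic (character frequencies, word lengths, h2) gives the variance comparison described above.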

Data needed

IVTFF transcription + quire structure mapping (published in codicological studies of MS 408).

Effort

Low once quire mapping is obtained. Quick statistical computation.

Success looks like

Between-quire variance significantly higher than within-quire → key changes per quire. If A/B maps cleanly to quire groups → A/B is a key change, NOT a language change. This simplifies the problem enormously.

Failure looks like

Within-quire and between-quire variance are similar → no key change at quire boundaries. Or quire boundaries don't align with A/B → A/B is driven by something else (scribe, language, topic).

Falsifies

If quires show no statistical grouping → quire-level key changes are not a factor.

#6

Paragraph-Initial Formula Detection

medium

If the underlying text is real (herbal, recipe, medical), paragraph openings should be formulaic. Medieval Latin herbals start with 'Recipe...', 'Accipe...', 'Herba...'. Italian: 'Prendi...', 'Questa pianta...'. The same cipher-word appearing at the start of many paragraphs in the recipe section is likely a formula like 'Take...'

What we compute

1) Identify paragraph boundaries in IVTFF data (marked by locus tags). 2) Extract first word of each paragraph per section. 3) Frequency analysis: which words appear paragraph-initially far more often than expected? 4) Compare across sections — does the recipe section have a different paragraph-initial word than the herbal section?
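Step 3 reduces to comparing each word's paragraph-initial count against the count its overall frequency predicts. A sketch over invented paragraphs:

```python
from collections import Counter

def initial_bias(paragraphs):
    """Ratio of observed paragraph-initial count to the count expected
    from the word's overall frequency; sorted most-biased first."""
    first = Counter(p[0] for p in paragraphs if p)
    overall = Counter(w for p in paragraphs for w in p)
    total = sum(overall.values())
    n_par = sum(1 for p in paragraphs if p)
    bias = {w: c / (n_par * overall[w] / total) for w, c in first.items()}
    return sorted(bias.items(), key=lambda kv: -kv[1])

paragraphs = [                       # invented paragraph token lists
    ["qokeey", "daiin", "chedy"],
    ["qokeey", "ol", "daiin"],
    ["chol", "daiin", "chol"],
]
ranking = initial_bias(paragraphs)   # 'qokeey' opens 2 of 3 paragraphs
```

Run per section, a word that tops this ranking only in the recipe section is the 'Recipe/Accipe' candidate.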

Data needed

IVTFF transcription with paragraph markers.

Effort

Low. Simple frequency analysis.

Success looks like

A specific Voynich word dominates paragraph openings in recipe sections. Combined with zodiac cribs, this gives us additional cribs to constrain cipher tables.

Failure looks like

No word shows paragraph-initial preference → either no formulaic structure, or paragraph boundaries in our transcription are wrong.

Falsifies

If no formulaic structure exists AND text is real → unusual for medieval texts of this type.

#7

Word-Length Source Language Matching

medium

In a verbose cipher with fixed encoding-unit length, Voynich word length is proportional to plaintext word length. The frequency distribution of Voynich word lengths should correlate with source language word-length distributions. Compare against 15th-century Latin, Italian, German, Hebrew, Arabic texts.

What we compute

1) Voynich word-length distribution (in characters). 2) If encoding unit = N glyphs/letter, divide lengths by N to get 'plaintext letter count' distribution. 3) Compare against word-length distributions of candidate source languages from period-appropriate corpora. 4) Test multiple values of N (2, 3, variable).
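Steps 2-4 can be sketched with a hand-rolled two-sample KS statistic; the word list and plaintext lengths below are stand-ins for the real corpora:

```python
import bisect

def ks_stat(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect.bisect_right(sa, x) / len(sa) - bisect.bisect_right(sb, x) / len(sb))
        for x in set(a) | set(b)
    )

def rescaled_lengths(words, n):
    """Word lengths divided by a candidate unit length of n glyphs/letter (rounded up)."""
    return [-(-len(w) // n) for w in words]

voynich = ["daiin", "chedy", "qokeedy", "ol", "shedy", "chol"]  # stand-in tokens
latin_lengths = [3, 3, 4, 1, 3, 2]  # stand-in plaintext word lengths
d = min(ks_stat(rescaled_lengths(voynich, n), latin_lengths) for n in (2, 3))
# the n minimizing d is the best-fitting fixed encoding-unit length
```

On real corpora the KS statistic would be converted to a p-value (or replaced by scipy.stats.ks_2samp) before applying the p > 0.05 criterion above.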

Data needed

IVTFF transcription + period-appropriate corpora for candidate languages.

Effort

Medium. Need to source 15th-century text corpora for multiple languages.

Success looks like

For a specific N, the rescaled distribution matches a candidate language with statistical significance (KS-test p > 0.05). This identifies both the encoding unit length AND the source language simultaneously.

Failure looks like

No value of N produces a match → either variable-length encoding (Naibbe), or the 'words' aren't encoding units of plaintext words.

Falsifies

Failure for all fixed N strongly supports variable-length encoding (consistent with Naibbe but complicates analysis).

#8

The Three-Hypothesis Falsification Battery

medium

Define specific, falsifiable predictions for each remaining hypothesis (verbose cipher, constructed language, compression). Design a single test battery that eliminates at least one. Every test that passes narrows the field. Run Attacks 1-7 and interpret results jointly.

What we compute

Joint interpretation matrix: (1) Verbose cipher predicts periodic entropy, crib consistency, slot frequencies matching a natural language. (2) Constructed language predicts smooth entropy, no crib matches, frequencies matching no known language. (3) Compression predicts uniform entropy, no positional structure, high information density. Score each hypothesis against all attack results.

Data needed

Results from Attacks 1-7.

Effort

Interpretive work. No new computation — but the most intellectually demanding step.

Success looks like

One hypothesis is clearly favored across all dimensions. Two or more are falsified. We know WHAT the text is, even if we can't read it yet.

Failure looks like

Results are ambiguous — some attacks favor one hypothesis, others favor another. This would suggest a hybrid mechanism (e.g., compressed then enciphered) or something we haven't modeled.

Falsifies

The entire framework is designed to falsify. If nothing is falsified, our models are all wrong.

The Three Surviving Hypotheses

After constraint elimination, only three hypotheses remain viable. Each makes specific, testable predictions. The attack plan above is designed to falsify at least one.

Verbose Cipher (Naibbe-style)

A 15th-century verbose homophonic substitution cipher, encrypting Latin or Italian using tables selected by dice and playing cards. Each plaintext letter maps to multiple ciphertext glyphs. Word boundaries in ciphertext are cipher artifacts, not linguistic boundaries.

Testable predictions

  • Periodic entropy structure within words (Attack 1)
  • Zodiac cribs produce consistent table entries (Attack 2)
  • Slot-specific frequencies match a natural language (Attack 3)
  • Rare bigrams cluster at specific word positions (Attack 4)
  • Different quires may show different statistical profiles (Attack 5)
  • Paragraph-initial formulas exist in recipe/herbal sections (Attack 6)
  • Rescaled word-length distribution matches source language (Attack 7)

Falsified by

If Attacks 1+4 show no periodic/clustering structure AND Attack 3 recovers no natural language match, verbose cipher is falsified.

Constructed Language

An artificial language designed with extremely strict phonotactic rules, possibly as a secret scholarly notation or philosophical language. No cipher involved — the text IS the language.

Testable predictions

  • Smooth entropy curve with no periodic structure (Attack 1)
  • No zodiac crib consistency under any cipher model (Attack 2)
  • Slot frequencies match no known natural language (Attack 3)
  • No rare-bigram clustering — all transitions are 'designed' (Attack 4)
  • A/B distinction reflects two dialects or registers, not key changes (Attack 5)
  • Paragraph-initial patterns may still exist (formulaic structure possible in conlangs)

Falsified by

If Attack 1 shows clear periodic entropy structure → constructed language is falsified (natural languages and conlangs don't have periodic within-word entropy).

Compressed Bitstream

The text is a compressed representation of information, where the 'characters' are symbols in a compression codebook. Low entropy is a natural consequence of compression. The 'slot grammar' might be the structure of the compression scheme itself.

Testable predictions

  • Near-uniform conditional entropy across all word positions (Attack 1)
  • No crib consistency — compression doesn't preserve word-level structure (Attack 2)
  • No match to any natural language frequency distribution (Attack 3)
  • No rare-bigram clustering — all bigrams are equally 'designed' (Attack 4)
  • High information density per character compared to natural language

Falsified by

If Attack 1 shows ANY positional structure (periodic or smooth) → pure compression is falsified. If zodiac cribs work under any model → compression is falsified.

Decipherment Attempts

A century of attempts to crack the Voynich code — from microscopic shorthand to neural networks.

1921

William Newbold

Rejected

Microscopic shorthand cipher

Claimed pen strokes contained microscopic Greek shorthand visible only under magnification. Announced he had deciphered the manuscript revealing Roger Bacon's scientific discoveries including telescopes and spiral nebulae.

Debunked by John Manly (1931) who showed the 'microscopic' marks were natural ink degradation, not intentional writing.
1943

Joseph Feely

Rejected

Simple Latin substitution cipher

Published 'Roger Bacon's Cipher: The Right Key Found' claiming a straightforward alphabetic substitution producing medieval Latin text.

Method produced incoherent text when applied to longer passages. No independent verification.
1950s

William Friedman / NSA Team

Debated

Constructed language hypothesis

Legendary codebreaker William Friedman led an informal NSA cryptographer team. Concluded the text 'does not act like natural language' and conjectured it might be an artificial/constructed language. John Tiltman and Mary D'Imperio contributed foundational analysis.

The team never cracked the cipher, but produced seminal analytical work. Friedman's artificial language hypothesis remains one of the leading theories. D'Imperio's 1978 summary became the foundational reference work.
1976

Prescott Currier

Accepted

Two 'languages' (A and B)

Naval cryptanalyst identified two statistically distinct character distributions corresponding to different manuscript sections. Proposed at least two scribes or encoding methods.

The A/B distinction is one of the most robust findings in Voynich studies. Language A dominates herbal sections, B dominates biological/pharmaceutical. Confirmed by multiple independent analyses.
1976

William Bennett

Accepted

Entropy measurements

Performed systematic entropy analysis comparing Voynich text to known languages and ciphers. Found word-level entropy notably low.

Bennett's entropy measurements remain a cornerstone reference. The low entropy suggests either a highly redundant language, a verbose cipher, or a system with built-in redundancy.
1978

Mary D'Imperio

Accepted

Comprehensive survey (NSA)

Published 'The Voynich Manuscript: An Elegant Enigma' through NSA. Not a decipherment but the first rigorous catalog of all attempts, properties, and theories. Classified possibilities into four categories: natural language cipher, synthetic language, random/hoax, and exotic natural language.

Remains the foundational reference work. D'Imperio's framework of four categories is still used to classify new proposals.
1987

Leo Levitov

Rejected

Cathar endura ritual

Proposed the text is a 'polyglot oral tongue' mixing medieval Flemish, Old French, and Old High German, documenting Cathar euthanasia rituals. Published 'Solution of the Voynich Manuscript.'

Linguistic methodology widely criticized. The Cathar interpretation doesn't fit the botanical illustrations. Many forced translations.
1990s

Jorge Stolfi

Debated

Word paradigm / Chinese language

Performed extensive computational analysis revealing strict word-internal structure: words follow a prefix-midfix-suffix pattern with specific character 'slots'. Later explored connections to Chinese and Manchu writing.

The word paradigm model is widely accepted as descriptive but its interpretation is debated. The Chinese/Manchu connection has not gained broad support, though the structural analysis is considered seminal work.
2004

Gordon Rugg

Debated

Cardan grille hoax

Demonstrated that a table of syllables combined with a Cardan grille (a card with windows) could produce text with Voynich-like statistical properties. Published in Scientific American.

Proved it's possible to create Voynich-like text mechanically. But proving possibility doesn't prove actuality. Critics note his generated text lacks some deeper statistical properties found in the real manuscript (e.g., long-range word correlations shown by Montemurro & Zanette, 2013).
2011

Sravana Reddy & Kevin Knight

Debated

Computational NLP analysis

Applied modern NLP techniques including anagram solving and machine translation. Tested whether the text could be an anagram of known languages. Published at ACL workshop.

Showed that simple cipher hypotheses (substitution, anagram of Latin/English) are unlikely. Did not propose a decipherment but narrowed the space of possibilities.
2013

Montemurro & Zanette

Accepted

Semantic content via information theory

Applied clustering and entropy analysis to show the manuscript has a complex statistical structure consistent with meaningful content. Found semantic-like networks organized by topic sections. Published in PLoS ONE.

Widely cited as strong evidence against the hoax hypothesis. Showed long-range word correlations that would be extremely difficult to fake with a Cardan grille or random method.
2014

Stephen Bax

Debated

Partial decipherment via proper nouns

Attempted bottom-up decipherment starting with plant names. Identified ~10 characters by matching botanical illustrations to known plants and reading labels as their names in various languages.

Some character identifications appear reasonable but the method is highly speculative and sample size too small for verification. Bax passed away in 2017 before completing the work.
2014

Torsten Timm

Debated

Self-citation / verbose cipher

Proposed that Voynich words are generated by a process of partial self-copying from nearby text (both forward and backward), explaining the high repetition and low entropy.

The self-citation model explains several puzzling statistical properties. Compatible with either meaningful encoding or a sophisticated hoax. Doesn't identify the underlying content.
2017

Kondrak & Hauer

Rejected

AI decipherment as Hebrew

Used neural networks and AI-based decipherment techniques to propose the underlying language is Hebrew encoded with alphagram substitution. Generated headline-grabbing claims.

Widely criticized by Voynich researchers. The 'decoded' sentences were cherry-picked and incoherent. The method conflated statistical pattern matching with actual decipherment. Kondrak later acknowledged limitations.
2019

Gerard Cheshire

Rejected

Proto-Romance language

Claimed the text is written in 'proto-Romance' — a precursor to modern Romance languages — and that he had fully deciphered it. Published in Romance Studies.

Universally rejected by medievalists and linguists. Lisa Fagin Davis and others showed the 'translations' were circular reasoning. The journal received formal complaints. 'Proto-Romance' as described doesn't match any known historical linguistics.
2021

Bowern & Lindemann

Accepted

Authoritative linguistic survey

Published definitive review in Annual Review of Linguistics confirming genuine linguistic structure. Surveyed all major computational and linguistic analyses to date. Concluded the manuscript has real language-like properties that resist simple explanations.

The most authoritative academic survey of the field. Confirms the text is not random gibberish but leaves open whether it encodes natural language, constructed language, or something else entirely.
2022

Rene Zandbergen

Accepted

STA (Super Transliteration Alphabet)

Published comprehensive analysis of all transliteration systems showing that alphabet choice fundamentally changes statistical results. Proposed STA as a superset to enable cross-system comparison.

Key insight: different transliteration choices lead to different entropy values, character frequencies, and even different word counts. Any analysis must be explicit about its alphabet assumptions.
2024

Brewer & Lewis

Debated

Gynecology / women's secrets

Published in Social History of Medicine arguing the manuscript concerns sex, conception, and gynecology. Connected to Bavarian physician Johannes Hartlieb (c.1410-1468) who wrote about plants, women, magic, astronomy, baths — and recommended 'secret letters' to obscure gynecological recipes.

Well-received as a contextual framework. The botanical + bathing + nude figures combination matches medieval 'women's secrets' literature. Doesn't crack the text itself but provides the most convincing content hypothesis to date.
2024

Lisa Fagin Davis

Accepted

Multispectral imaging / codicology

Analyzed multispectral images of folio 1r, discovering previously hidden columns: two Roman alphabets (offset by one letter) and one column of Voynich characters. Attributed the handwriting to Johannes Marcus Marci (owned manuscript 1662-1665) — an early decryption attempt using substitution ciphers.

Confirms the text was already mysterious by the 17th century. Marci tried and failed to decrypt it using simple substitution — strong evidence that it's not a simple cipher. Davis is the leading codicological authority on the manuscript.
2025

Greshko (Naibbe cipher)

Ongoing

Verbose homophonic substitution cipher

Demonstrated a historically plausible cipher method using dice and playing cards ('Naibbe') that encrypts Latin and Italian into Voynich-like ciphertext. The method is hand-executable with 15th-century materials and produces text that is fully decipherable while reproducing multiple Voynich statistical properties simultaneously.

Published in Cryptologia. The most significant recent development — shows that a real, decipherable cipher CAN produce text matching the manuscript's statistical fingerprint. Does not prove the Voynich IS a Naibbe cipher, but establishes it as a viable mechanism. Experts welcome it as a useful benchmark.

The Hoax Debate

Is the Voynich Manuscript a genuine encoded text, or an elaborate meaningless hoax? The debate remains open.

Arguments for Hoax

  • Gordon Rugg showed Cardan grille can produce similar-looking text (2004)
  • Self-citation algorithm (Timm 2014) reproduces both of Zipf's laws from simple copying
  • No corrections or hesitations in the writing — unusual for genuine medieval manuscripts
  • No one has deciphered it despite 600+ years and modern computers
  • Second-order entropy (h2 ~2 bits) is lower than ALL 316 natural languages tested — abnormally predictable
  • Some character combinations appear suspiciously regular
  • The illustrations include impossible/fantasy plants that resist identification

Arguments Against Hoax

  • Long-range semantic correlations found by Montemurro & Zanette (2013) — extremely hard to fake with any mechanical method
  • Cardan grille wasn't invented until 1550 — at least 112 years after the vellum was created (1404-1438)
  • The Naibbe cipher (2025) proves a real, decipherable cipher CAN produce Voynich-like statistics — no hoax needed
  • Content-word clustering by section tracks topics — meaningless text wouldn't organize this way
  • Currier A/B distinction with two correlated scribes suggests systematic encoding, not random generation
  • Marci's failed 17th-century decryption attempt (found via multispectral imaging, 2024) shows it was already undecipherable
  • Bowern & Lindemann (2021, Annual Review of Linguistics) confirm genuine linguistic structure
  • Cost of vellum in the 15th century made hoaxing on this scale economically irrational
  • Sophisticated codicological structure (quire arrangement, catchwords) consistent with genuine manuscripts

Botanical Identifications

Attempts to identify the ~130 plant illustrations as real species — a potential entry point for decipherment.

Tucker, Talbert & Janick (2013-2019)

Debated

Identified 166 phytomorphs as New World species in their 2019 book 'Flora of the Voynich Codex' (Springer). Identifications include sunflower and chili pepper, plus an armadillo among the zoomorphs. Won the American Botanical Council's James A. Duke Excellence in Botanical Literature Award (2020).

Award-winning botanical analysis but conflicts with radiocarbon dating (vellum 1404-1438, predating European knowledge of New World). Tucker argues text was written later on old vellum. Mainstream scholars remain skeptical of the New World hypothesis.

Edith Sherwood (2008)

Debated

Identified several plants as common European species, suggesting Northern Italian origin. Compared illustrations to works by Leonardo da Vinci's contemporaries.

Some identifications are reasonable but many botanical illustrations appear to be composites or fantasy plants, making identification speculative.

Stephen Bax (2014)

Debated

Used botanical identifications as entry points for decipherment. Matched illustration of Centaurea (cornflower) to the label text, proposing phonetic values for several characters.

Method is sound in principle (similar to how Egyptian hieroglyphs were deciphered via bilingual texts), but sample size too small for statistical validation.

AI Prior Art

Recent AI/ML work on ancient scripts, cipher-breaking, and manuscript analysis that directly informs our approach. These are the shoulders we stand on.

Oracle Bone Script Decipherment (OBSD)

2024

Diffusion model learned to transform ancient Oracle Bone characters into modern Chinese. Won ACL 2024 Best Paper. Follow-ups (DCSD-OBI, OracleFusion at ICCV 2025) improved accuracy by 11%+. Uses Chinese-CLIP for cross-modal consistency.

Relevance: Direct precedent: train a diffusion model on Voynich glyphs to learn structural decomposition and morph toward known alphabets.

DeepMind Ithaca (Ancient Greek)

2022

Transformer trained on 78,000 ancient Greek inscriptions. 62% accuracy on text restoration alone; 72% when combined with historians. Geographic attribution at 71% accuracy. Dating within 30 years. Published in Nature.

Relevance: Shows self-supervised pre-training works on ancient corpora. Our 37,919 word tokens may be sufficient for character-level pattern learning.

Vesuvius Challenge (Herculaneum Scrolls)

2023-25

CT scanning + AI read 2,000+ characters from carbonized scrolls where ink was invisible to the eye. $700K grand prize awarded February 2024. Discovered a previously unknown tract by Philodemus.

Relevance: Proof that AI can extract information from manuscripts that humans literally cannot see. Multispectral + AI pipeline could reveal hidden Voynich features.

Neural Decipherment (MIT, Barzilay)

2019-21

Minimum-cost flow optimization deciphered Ugaritic (via Hebrew) and Linear B (via Greek). 2021 follow-up handled undersegmented scripts with phonetic priors. Published ACL 2019 and TACL 2021.

Relevance: Best algorithmic framework for cipher-breaking, but requires a known related language — the Voynich's fundamental blocker.

Coupled Simulated Annealing (Tamburini)

2025

Combinatorial optimization tested on Linear A, Proto-Elamite, Indus Valley, Rongorongo. Allows null, one-to-many, and many-to-one character mappings. Published in Frontiers in AI.

Relevance: Could be applied directly to Voynich under various language hypotheses. Handles verbose mappings naturally.

Gaskell & Bowern: 'Gibberish After All?'

2022

42 volunteers wrote meaningless text. ML classifier found Voynich transcriptions statistically resemble human-produced gibberish more than meaningful text. Code on GitHub (danielgaskell/voynich).

Relevance: Serious challenge: any decipherment theory must explain why the text passes this gibberish test. Counter: Montemurro's semantic clustering is harder to fake.

Compression-Based Hypothesis

2025

Treats Voynich as an encoded bitstream, testing decompression parameters using Shannon entropy as fitness. Proposes low redundancy and word structure are artifacts of compression (LZ77 + Huffman-like).

Relevance: Novel framework. If the text is compressed rather than encrypted, the decipherment problem changes fundamentally.

Naibbe Cipher Implementation

2025

Open-source Python implementation (github.com/greshko/naibbe-cipher). Includes encryption, decryption, and Voynich-style text generator. Verbose homophonic substitution using dice + playing cards.

Relevance: Concrete, testable cipher model. We can generate synthetic Naibbe ciphertexts and compare statistics to the real manuscript systematically.

LLM Cryptanalysis Benchmarks (CipherBench)

2024-25

Multiple benchmark studies (CipherBench, CipherBank at ACL 2025) tested GPT-4, Claude, and Gemini on classical cipher-breaking. LLMs can break simple substitution ciphers (Caesar, Atbash) with ~90% accuracy but fail on polyalphabetic and homophonic ciphers. Performance degrades sharply with cipher complexity.

Relevance: Establishes that current LLMs cannot brute-force the Voynich directly. But LLMs excel at pattern description and hypothesis generation — use them as analytical partners, not as cipher-breakers.

Neural LM Scoring for Cipher-Breaking

2018-24

Kambhatla et al. (EMNLP 2018) replaced n-gram language models with LSTMs in hill-climbing cipher attacks, achieving 2x improvement on homophonic substitution ciphers. Extended by beam search decipherment (ACL 2022). The same technique powered AZdecrypt's solution of the Zodiac Z340 cipher in 2020.

Relevance: This is the core technique for our Naibbe attack strategy. Neural LM scoring of candidate plaintexts is dramatically better than classical n-gram methods.

AZdecrypt (Zodiac Z340 Solution)

2020

David Oranchak, Jarl Van Eycke, and Sam Blake used AZdecrypt to crack the 51-year-old Zodiac Z340 cipher. The tool uses hill climbing with n-gram scoring, testing 650,000+ candidate transpositions. FBI confirmed the solution in March 2021.

Relevance: Closest real-world precedent for a computational Voynich attack. Z340 was a homophonic substitution cipher with transposition — similar complexity class. The Voynich has 500x more ciphertext, giving more statistical leverage.

Sign2Vec / Cypro-Minoan Clustering

2022

Ferrara (PNAS 2022) applied unsupervised neural embeddings to cluster signs in the undeciphered Cypro-Minoan script. Learned representations captured structural relationships without any labeled data. Identified potential sign variants and ligatures.

Relevance: Directly applicable to visual alphabet discovery. Train embeddings on our 17,672 glyph images to discover the 'true' character set from pixels alone, bypassing transcription assumptions.

Akkadian Neural Machine Translation

2023

Gutherz et al. (PNAS Nexus 2023) built an NMT system translating Akkadian cuneiform to English. Trained on ~10,000 parallel text pairs. Companion model 'cuneiformBase-400m' handles multiple ancient languages. Follow-ups (ICLR 2025) added iterative translation refinement.

Relevance: Shows NMT works on ancient languages with sufficient parallel data. The Voynich lacks parallel text — but if even partial decipherment produces fragments, NMT could bootstrap the rest.

Multimodal LLMs on Historical Manuscripts

2024-25

GPT-4V and Claude Vision applied to medieval manuscript analysis — page layout detection, script identification, illustration classification. HTR-United and CATMuS Medieval provide training data (200+ manuscripts, 160K+ lines). Florence-2 and Grounding DINO enable zero-shot object detection in manuscript pages.

Relevance: We already used Claude Vision for glyph extraction (17,672 glyphs). Next: systematic page-level analysis combining text regions, illustration types, and marginalia to build a structural map of the entire manuscript.

AI Research Approaches

Novel AI/ML approaches we are exploring to make progress on the Voynich mystery. These go beyond previous attempts by leveraging modern multi-modal models and information theory.

Diffusion Model Glyph Analysis

Train a diffusion model on our 17,672 cropped Voynich glyphs — following the Oracle Bone approach (ACL 2024 Best Paper). Learn the structural decomposition of the script. Use conditional generation to morph Voynich characters toward known alphabets (Latin, Arabic, Hebrew). The script requiring the smallest transformation may be related.

Feasibility: high · Novelty: high

CLIP Text-Image Alignment

Use CLIP/OpenCLIP to embed Voynich pages (text and images separately). Test whether text embeddings cluster the same way as image embeddings — do botanical texts cluster with botanical images? Fine-tune on medieval manuscripts (Tractatus de Herbis) where text-image relationships are known, then apply to Voynich. Low effort, high signal.

Feasibility: high · Novelty: high

Visual Alphabet Discovery

Use DINOv2/SAM2 vision transformers to cluster all 17,672 glyph images by visual similarity, ignoring all prior transcriptions. Discover the 'true' alphabet from pixel data. Compare to EVA assignments. Could reveal whether EVA over-splits or under-splits characters — resolving Zandbergen's fundamental concern.

Feasibility: high · Novelty: high
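As a toy illustration of the clustering step, the sketch below groups glyph bitmaps by Hamming distance with a greedy threshold rule. This is a stand-in only: a real run would replace the raw-pixel vectors with DINOv2/SAM2 embeddings and the greedy rule with a proper clustering algorithm, and the bitmaps and threshold here are invented.

```python
def hamming(a, b):
    """Count differing positions between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))

def greedy_cluster(glyphs, threshold):
    """Assign each glyph to the first cluster whose representative lies
    within `threshold`, else start a new cluster. Returns a list of
    (representative, members) pairs."""
    clusters = []
    for g in glyphs:
        for rep, members in clusters:
            if hamming(rep, g) <= threshold:
                members.append(g)
                break
        else:
            clusters.append((g, [g]))
    return clusters

# Four tiny 2x2 'glyphs' flattened to tuples; two visual groups emerge.
glyphs = [(1, 1, 0, 0), (1, 1, 0, 1), (0, 0, 1, 1), (0, 0, 1, 1)]
clusters = greedy_cluster(glyphs, threshold=1)
```

Comparing the discovered cluster count against EVA's inventory is the actual test: fewer clusters than EVA symbols suggests EVA over-splits, more suggests it under-splits.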

Naibbe Brute-Force Search

Using the open-source Naibbe cipher implementation, systematically test all possible cipher tables against candidate plaintext languages (Latin, Italian, German). Use a language model to score 'coherence' of candidate plaintexts. GPU-accelerated search could make this tractable. Essentially what Bletchley Park did for Enigma.

Feasibility: medium · Novelty: medium

Cipher-Type Fingerprinting

Generate millions of synthetic ciphertexts: encrypt 15th-century texts with every known cipher type. Measure statistical fingerprints (h1, h2, Zipf slope, word length, hapax rate). Train a classifier to identify cipher family from ciphertext statistics alone. Run on real Voynich text. Even a probabilistic answer would be a breakthrough.

Feasibility: high · Novelty: high
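The fingerprint features themselves are cheap to compute. A minimal pure-Python sketch (the exact feature set a classifier would consume is a design choice; this one follows the list above):

```python
import math
from collections import Counter

def entropy(counter):
    """Shannon entropy (bits) of a frequency table."""
    total = sum(counter.values())
    return -sum(f / total * math.log2(f / total) for f in counter.values())

def fingerprint(text):
    """Statistical fingerprint of a (cipher)text: character entropies,
    Zipf slope, mean word length, hapax rate. Illustrative feature set."""
    words = text.split()
    chars = [c for c in text if not c.isspace()]
    h1 = entropy(Counter(chars))
    # conditional second-order entropy: H(bigram) - H(unigram)
    h2 = entropy(Counter(zip(chars, chars[1:]))) - h1
    # Zipf slope: least-squares fit of log(freq) against log(rank)
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(r + 1) for r in range(len(freqs))]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    types = Counter(words)
    return {"h1": h1, "h2": h2, "zipf_slope": slope,
            "mean_word_len": sum(map(len, words)) / len(words),
            "hapax_rate": sum(1 for v in types.values() if v == 1) / len(types)}
```

Running this over millions of synthetic ciphertexts produces the training matrix for the cipher-family classifier.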

Cross-Section Transfer Learning

Train character-level transformers separately on each manuscript section. Measure transfer learning performance: if a model trained on section A predicts section B well, they share deep structure. Could quantify the Currier A/B distinction and discover new sub-languages. Inspired by Ithaca (DeepMind, Nature 2022).

Feasibility: high · Novelty: medium
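A toy version of the transfer measurement, with an add-one-smoothed character bigram model standing in for the proposed transformers: train on one section, then compare bits-per-character on text from each section. Lower cross-entropy on section B implies shared deep structure.

```python
import math
from collections import Counter

class BigramLM:
    """Add-one-smoothed character bigram model, a toy stand-in for the
    section-specific transformers proposed above."""
    def __init__(self, text, alphabet):
        self.v = len(alphabet)
        self.bigrams = Counter(zip(text, text[1:]))
        self.unigrams = Counter(text[:-1])

    def cross_entropy(self, text):
        """Bits per character of `text` under the model; lower means the
        model (trained on one section) transfers better to this text."""
        bits = 0.0
        for a, b in zip(text, text[1:]):
            p = (self.bigrams[(a, b)] + 1) / (self.unigrams[a] + self.v)
            bits += -math.log2(p)
        return bits / (len(text) - 1)
```

With real transcriptions, the matrix of pairwise cross-entropies between sections would quantify the Currier A/B split directly.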

Zero-Shot Botanical Matching

Use CLIP for zero-shot sketch-based image retrieval (CVPR 2023) to match Voynich botanical drawings against plant photo databases. No one has tried this with medieval illustrations. If even 5 plant labels can be confidently identified, that gives anchor points — breaking the 'double unknown.'

Feasibility: medium · Novelty: high

Scribe-Specific Decipherment

Five scribes were identified (Davis), perfectly correlated with Language A/B. Use ViT-based writer identification (F1=0.96 on medieval manuscripts) to validate, then treat each scribe's text as a separate cipher problem. Each sub-corpus would have more consistent statistics, making cipher-breaking more tractable.

Feasibility: medium · Novelty: medium

Line-Unit Cipher Hypothesis

The line functions as an encoding unit (first words longer, certain chars only at line ends). What if each line is independently encrypted with a position-derived key? Test by measuring whether character distributions within lines are more uniform than across lines. If so, the key resets per line.

Feasibility: high · Novelty: high
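One way to operationalize the test, sketched below: compare each line's normalized character entropy (1.0 = uniform over the characters it uses) with the pooled corpus value. A positive gap means within-line distributions are flatter than the corpus as a whole, consistent with a key that resets per line. The statistic is illustrative, not a published method.

```python
import math
from collections import Counter

def norm_entropy(counter):
    """Entropy normalized by log2(#distinct symbols): 1.0 = uniform."""
    total = sum(counter.values())
    h = -sum(f / total * math.log2(f / total) for f in counter.values())
    k = len(counter)
    return h / math.log2(k) if k > 1 else 1.0

def line_uniformity_gap(lines):
    """Mean normalized entropy within lines minus the pooled value.
    Positive => within-line character distributions are flatter than
    the corpus-wide one, as a per-line key reset would predict."""
    per_line = [norm_entropy(Counter(line)) for line in lines]
    pooled = norm_entropy(Counter("".join(lines)))
    return sum(per_line) / len(per_line) - pooled
```

A permutation test (shuffling characters across line boundaries) would then tell whether the observed gap is larger than chance.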

Synthetic Pre-Training for Decipherment

Train a transformer on millions of synthetic cipher-plaintext pairs: encrypt 15th-century texts with every known cipher type (substitution, homophonic, verbose, Naibbe-style). The model learns to 'feel' cipher structure. Then fine-tune on real Voynich text. Even without deciphering, the model's internal representations reveal which cipher family the Voynich belongs to.

Feasibility: high · Novelty: high

LLM-Assisted Pattern Discovery

Use Claude/GPT-4 not as cipher-breakers but as pattern describers. Feed pages of Voynich transcription and ask for structural observations: repeated phrases, formulaic openings, section-boundary markers. LLMs excel at noticing patterns humans miss in large texts. Benchmark: CipherBench shows LLMs fail at decryption but succeed at pattern analysis.

Feasibility: high · Novelty: medium

Datasets & Tools

Open-source resources, datasets, and pre-trained models available for Voynich research today.

Voynich Full Dataset v2.0

Dataset

Zenodo (Jan 2026). Complete EVA2 + Greek transliteration of the manuscript. Includes word boundaries, line positions, and folio metadata. The most current digital text resource.

IVTFF Takahashi Transcription

Dataset

162,755 expert-transcribed characters across 225 folios. IVTFF format with locus tags (folio.paragraph.line). Our primary ground truth for text analysis.

CATMuS Medieval

Dataset

HuggingFace dataset with 200+ medieval manuscripts and 160K+ transcribed lines. Pre-training resource for handwriting recognition models. Covers Latin, French, English, German scripts from 8th-16th century.

Naibbe Cipher (Python)

Tool

Open-source implementation: encryption, decryption, and Voynich-style text generator. 6 substitution tables, dice + card selection. github.com/greshko/naibbe-cipher.

AZdecrypt

Tool

The tool that cracked the Zodiac Z340 cipher. Hill climbing + n-gram scoring for homophonic substitution ciphers. Open source, actively maintained. Direct starting point for Voynich computational attacks.

eScriptorium + Kraken

Tool

Best open-source pipeline for historical document HTR (Handwritten Text Recognition). Kraken handles segmentation + OCR; eScriptorium provides the annotation UI. Used by major digital humanities projects.

OBSD / OracleFusion

Model

Diffusion models for ancient script decipherment (ACL 2024 Best Paper). OBSD transforms Oracle Bone characters into modern Chinese. OracleFusion (ICCV 2025) adds cross-modal consistency. github.com/guanhaisu/OBSD.

cuneiformBase-400m

Model

400M parameter multilingual model for cuneiform scripts (Akkadian, Sumerian, Hittite). Demonstrates that transformer pre-training works on ancient scripts with limited data.

How the Naibbe Cipher Works

The most promising cipher model for the Voynich. A verbose homophonic substitution cipher using only 15th-century materials.

1. Split plaintext

A die roll determines whether each plaintext letter is encrypted alone (unigram) or paired with its neighbor (bigram). Roughly 50/50 split. This randomizes word boundaries in ciphertext.

2. Select table

Draw a playing card to select one of 6 substitution tables. Each letter has different multi-glyph strings per table. One letter can be disguised 6 ways as unigram, or 36 ways as a bigram (6 prefix x 6 suffix choices).

3. Write glyphs

Look up the multi-glyph string from the selected table. All outputs obey the Voynich slot grammar. Adjacent glyphs within a "word" are determined by the table, not the plaintext — this is why h2 is so low.
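The three steps can be sketched in a few lines. The glyph tables below are invented for illustration and cover only three plaintext letters; the real cipher uses six card-keyed tables over the full alphabet (see the open-source implementation).

```python
import random

# Toy Naibbe-style encoder. These tables are INVENTED for illustration.
UNIGRAM_TABLES = [
    {"a": "okchy", "b": "daiin", "c": "qokedy"},   # selected by card draw 1
    {"a": "otedy", "b": "chol",  "c": "qokaiin"},  # selected by card draw 2
]
BIGRAM_PREFIX = [{"a": "qo", "b": "ch", "c": "ok"},
                 {"a": "ot", "b": "sh", "c": "ol"}]
BIGRAM_SUFFIX = [{"a": "edy", "b": "aiin", "c": "ey"},
                 {"a": "ody", "b": "ain",  "c": "eey"}]

def encrypt(plaintext, rng):
    """Each step: a 'die roll' picks unigram vs bigram encryption,
    'card draws' pick the tables, and lookup emits a multi-glyph token."""
    out, i = [], 0
    while i < len(plaintext):
        if rng.random() < 0.5 and i + 1 < len(plaintext):  # bigram branch
            pre = BIGRAM_PREFIX[rng.randrange(2)][plaintext[i]]
            suf = BIGRAM_SUFFIX[rng.randrange(2)][plaintext[i + 1]]
            out.append(pre + suf)
            i += 2
        else:                                              # unigram branch
            out.append(UNIGRAM_TABLES[rng.randrange(2)][plaintext[i]])
            i += 1
    return " ".join(out)
```

The same plaintext encrypts differently on every run, yet adjacent glyphs inside each token come from fixed table strings — exactly the mechanism that suppresses h2.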

Statistical match

  • h1 entropy: 3.86 +/- 0.01
  • h2 entropy: 2.00 +/- 0.01
  • Word lengths: binomial, peak 5-6
  • Slot grammar: 100% compliant

Historical plausibility

The earliest known homophonic cipher dates to 1401 (Francesco I Gonzaga, Lord of Mantua). Playing cards were widespread in Italy from the late 1300s — the word "naibbe" itself comes from this era. Dice were ubiquitous. Italian city-states used nomenclators (hybrid letter-cipher + word-code systems) in diplomatic correspondence throughout the 15th century. Alberti's polyalphabetic cipher (1467) came later — the Naibbe uses only techniques available in 1404-1438.

Computational Attack Strategy

If the Voynich is a verbose homophonic cipher (Naibbe-style), here is how a computational attack would work. Inspired by the Z340 Zodiac cipher solution (2020).

The Joint Segmentation + Decryption Problem

Unlike standard homophonic ciphers (1 symbol = 1 letter), a verbose cipher requires solving TWO problems simultaneously: (1) how to segment Voynich "words" into letter-encoding units (unigrams vs bigrams), and (2) what each unit decrypts to. This makes it fundamentally harder than Z340 — but the Voynich has ~170,000 characters (vs Z340's 340), giving far more statistical leverage.
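The segmentation half of the problem is easy to quantify: even with a known unit inventory, a single ciphertext word typically admits several parses, each implying a different candidate plaintext. A small dynamic-programming counter (the unit inventory in the test is hypothetical):

```python
def count_segmentations(word, units):
    """Count the ways `word` splits into tokens drawn from `units`,
    by dynamic programming over prefixes. Each distinct segmentation
    implies a different candidate plaintext, which is why verbose
    ciphers are harder than 1:1 homophonic ones."""
    ways = [1] + [0] * len(word)   # ways[i] = parses of word[:i]
    for i in range(1, len(word) + 1):
        for u in units:
            if i >= len(u) and word[i - len(u):i] == u:
                ways[i] += ways[i - len(u)]
    return ways[-1]
```

Ambiguity compounds multiplicatively across a line, so the attack must score whole segmentation hypotheses, not individual words.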

Step 1: Constrain the table

Generate only slot-grammar-compliant glyph strings. This eliminates ~99%+ of possible table entries. The remaining valid strings (hundreds to low thousands) form the candidate pool.

Step 2: Hill climbing + neural LM

Use hill climbing with neural language model scoring (LSTM or transformer, trained on 15th-century Latin/Italian). Swap table entries, score candidate plaintext coherence. Accept improvements, reject regressions. Thousands of parallel restarts on GPU.
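A toy version of the loop, with an add-one-smoothed bigram scorer standing in for the neural LM and a 1:1 substitution key standing in for the Naibbe table (the real attack swaps verbose table entries, but the accept/revert logic is the same):

```python
import math
import random
from collections import Counter

def bigram_scorer(reference):
    """Log-probability scorer built from a reference corpus
    (a stand-in for the neural language model)."""
    counts = Counter(zip(reference, reference[1:]))
    total = sum(counts.values())
    def score(text):
        # add-one smoothing; 676 is a crude vocabulary-size constant
        return sum(math.log((counts[(a, b)] + 1) / (total + 676))
                   for a, b in zip(text, text[1:]))
    return score

def hill_climb(ciphertext, score, alphabet, steps=2000, seed=0):
    """One hill-climbing chain: swap two key letters, keep the swap only
    if the candidate plaintext scores higher. Real attacks run thousands
    of such chains from random restarts in parallel."""
    rng = random.Random(seed)
    key = dict(zip(alphabet, alphabet))  # cipher letter -> plain letter
    def decrypt(k):
        return "".join(k.get(c, c) for c in ciphertext)
    best = score(decrypt(key))
    for _ in range(steps):
        a, b = rng.sample(alphabet, 2)
        key[a], key[b] = key[b], key[a]
        s = score(decrypt(key))
        if s > best:
            best = s                         # keep the improving swap
        else:
            key[a], key[b] = key[b], key[a]  # revert
    return decrypt(key)
```

For the Voynich, the swap move would permute candidate table entries and the scorer would be a transformer trained on 15th-century Latin or Italian.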

Step 3: Multi-language sweep

Run the attack against language models for Latin, Italian, German, Hebrew, Arabic, and other candidate languages. The language that produces the highest-scoring plaintext is the most likely source. This sidesteps the "unknown language" problem.

Step 4: Scribe-specific attacks

Run separate attacks on Language A pages (scribes 1,4) and Language B pages (scribes 2,3,5). Each sub-corpus has more consistent statistics. If different cipher tables were used per scribe, this decomposition is essential.

Estimated tractability

On modern GPU hardware (A100/H100), individual scoring evaluations take microseconds, but the search space requires billions of evaluations. With slot grammar constraints reducing the table space by 99%+, and parallel restarts across thousands of GPU cores, estimated wall-clock time: days to weeks on a GPU cluster. The Z340 was solved with 650,000 candidate transpositions — the Voynich search space is larger but the data is 500x longer.

Key References

Bennett, W.R. (1976). "Scientific and Engineering Problem-Solving with the Computer." Prentice-Hall.

Currier, P.H. (1976). "New Research on the Voynich Manuscript." Presented at ACA seminar.

D'Imperio, M.E. (1978). "The Voynich Manuscript: An Elegant Enigma." NSA/CSS publication.

Rugg, G. (2004). "An Elegant Hoax?" Cryptologia, 28(1), 31-46.

Reddy, S. & Knight, K. (2011). "What We Know About the Voynich Manuscript." ACL Workshop.

Amancio, D.R. et al. (2013). "Probing the statistical properties of unknown texts." PLoS ONE, 8(7).

Montemurro, M.A. & Zanette, D.H. (2013). "Keywords and Co-Occurrence Patterns in the Voynich Manuscript." PLoS ONE, 8(6).

Tucker, A.O. & Talbert, R.H. (2013). "A Preliminary Analysis of the Botany, Zoology, and Mineralogy of the Voynich Manuscript." HerbalGram, 100.

Timm, T. (2014). "How the Voynich Manuscript was created." arXiv:1407.6639.

Bax, S. (2014). "A proposed partial decoding of the Voynich script." stephenbax.net.

Tucker, A.O. & Janick, J. (2019). "Flora of the Voynich Codex." Springer.

Timm, T. & Schinner, A. (2019). "A Possible Generating Algorithm of the Voynich Manuscript." Cryptologia.

Bowern, C. & Lindemann, L. (2021). "The Linguistics of the Voynich Manuscript." Annual Review of Linguistics, 7.

Zandbergen, R. (2022). "Transliteration of the Voynich MS." CEUR-WS Vol. 3313.

Luo, J. et al. (2019). "Neural Decipherment via Minimum-Cost Flow." ACL 2019.

Assael, Y., Sommerschield, T. et al. (2022). "Restoring and attributing ancient texts using deep neural networks." Nature.

Gaskell, D. & Bowern, C. (2022). "Gibberish after all?" University of Malta.

Gutherz, G. et al. (2023). "Translating Akkadian to English with neural machine translation." PNAS Nexus.

Brewer, K. & Lewis, M.L. (2024). "The Voynich Manuscript, Dr Johannes Hartlieb and the Encipherment of Women's Secrets." Social History of Medicine, 37(3).

Guan, H. et al. (2024). "OBSD: Deciphering Oracle Bone Language with Diffusion Models." ACL 2024 (Best Paper).

Tamburini, F. (2025). "Automatic decipherment via coupled simulated annealing." Frontiers in AI.

Kambhatla, N. et al. (2018). "Decipherment of Substitution Ciphers with Neural Language Models." EMNLP 2018.

Oranchak, D. et al. (2020). "Cracking the Zodiac Killer's Z340 Cipher." FBI confirmed March 2021.

Ferrara, S. (2022). "Cypro-Minoan sign clustering via neural embeddings." PNAS.

Greshko, M. (2025). "The Naibbe Cipher." Cryptologia. [github.com/greshko/naibbe-cipher]

Li, Z. et al. (2025). "CipherBench: Benchmarking LLMs on Classical Cipher Breaking." ACL 2025.

Our Current Status

Data imported: 162,755 expert-transcribed characters from Takahashi IVTFF transcription (225 folios mapped to 176 pages). AI-extracted glyph bounding boxes for 17,672 characters with cropped images on R2.

Key finding: Our initial AI glyph extraction (Claude Vision) captured only ~9% of the known characters, missing critical digraphs (ch, sh) and common characters (h, d, y). This demonstrates why expert transcriptions remain essential as ground truth.

Next steps: Import the Zandbergen-Landini Extended EVA (ZL) transcription as a superior data source. Build configurable alphabet system to run analyses under different transliteration assumptions. Begin entropy gradient mapping and cross-section transfer learning experiments.