How does the Language Creator work?

The Language Creator is a tool for generating plausible, human-like languages from explicit typological choices. It does not merely invent a few exotic-looking words and call it a day, which would be easier, and therefore naturally less useful.

The usual starting point is the questionnaire. It is already filled in with random values, so you can press Submit at once, or you can adjust the settings yourself: phonology, writing system, morphology, syntax and a collection of smaller typological preferences. The point is not that every possible combination is equally natural, but that the choices are explicit and inspectable.

When you submit the questionnaire, the system creates an ELD file: an Editable Language Definition. This is the internal description of the generated language. It contains the questionnaire answers together with the resulting phonology, orthography, morphology, lexicon and other machinery needed to produce the visible output. The ELD is then used to generate a grammar with glossed examples, a bilingual dictionary and unglossed sample texts.

If you only want to explore the generated language, you can ignore the ELD file entirely. From the grammar page you can continue to the dictionary and the sample texts, or download the whole result as a single DOCX document. For normal users, that is the workflow: questionnaire in, grammar out, mild linguistic sorcery in the middle.

The ELD file is there for people who want more control. You can download it, edit it and upload it again through the upload page. That makes it possible to change individual morphemes, adjust grapheme mappings, alter lexical items or share a language definition with somebody else without returning to the questionnaire and hoping the gods of randomness feel cooperative.

From abstract structure to surface language

The glossed examples and the running texts are generated by the same general mechanism. Each sentence begins as an abstract XML representation. At that stage, lexical items and grammatical roles are known, but word order, case marking, agreement, affixes, clitics and other surface details have not yet been fully realised.

A simplified initial representation of a sentence meaning “I patted the cat” might look like this:

<example id="relp1" translation="I patted the cat.">
  <s>
    <vp tam="past-punct">
      <verb>pat</verb>
    </vp>
    <np number="sing" def="neither" role="subj">
      <pron person="1excl" number="sing"/>
    </np>
    <np def="def" number="sing" tr="rheme" role="obj">
      <noun>cat</noun>
    </np>
  </s>
</example>

The system then passes this structure through an ordered pipeline of transformation modules. One module may add determiners, another may assign case, another may decide constituent order, while others handle agreement, pronouns, negation, possession, adpositions, affixes or other grammatical features. Each module does one job, or at least tries to, because even software deserves a fighting chance against chaos.
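The idea of an ordered pipeline can be sketched in a few lines of Python. This is an illustration only, not the Language Creator's actual code: the module names `add_determiners` and `order_constituents`, the `PIPELINE` list, the `run_pipeline` helper and the default subject-verb-object order are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical modules: each takes the <s> element and changes one thing.
def add_determiners(s):
    """Add a determiner word to every definite noun phrase."""
    for np in s.findall("np"):
        if np.get("def") == "def":
            ET.SubElement(np, "word", gloss="the", posp="det")

def order_constituents(s, order=("subj", "verb", "obj")):
    """Reorder the children of <s>; the SVO default is an assumption."""
    def slot(el):
        return "verb" if el.tag == "vp" else el.get("role")
    s[:] = sorted(s, key=lambda el: order.index(slot(el)))

# The order matters: later modules see what earlier ones produced.
PIPELINE = [add_determiners, order_constituents]

def run_pipeline(s, modules=PIPELINE):
    """Apply every module in sequence to the same tree."""
    for module in modules:
        module(s)
    return s

sentence = ET.fromstring(
    '<s>'
    '<vp tam="past-punct"><verb>pat</verb></vp>'
    '<np number="sing" def="neither" role="subj">'
    '<pron person="1excl" number="sing"/></np>'
    '<np def="def" number="sing" role="obj"><noun>cat</noun></np>'
    '</s>'
)
run_pipeline(sentence)
print(ET.tostring(sentence, encoding="unicode"))
```

Keeping each module as a plain function over the tree means that a new grammatical feature can, in principle, be added by writing one more function and choosing its place in the list.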

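For instance, in a language with ergative alignment, a case-marking module may enrich the noun phrases by assigning ergative case to the transitive subject and absolutive case to the object. In the excerpt below, lines marked - show a noun phrase before the module runs and lines marked + show it afterwards; the other differences from the initial representation come from modules that ran earlier in the pipeline: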

module casemarking:

<example id="relp1" translation="I patted the cat.">
  <s>
-    <np number="sing" def="neither" role="subj" locus="number">
+    <np number="sing" def="neither" role="subj" locus="number" case="ERG">
      <word person="1excl" number="sing" posp="pron"/>
    </np>
-    <np def="def" number="sing" tr="rheme" role="obj" locus="number">
+    <np def="def" number="sing" tr="rheme" role="obj" locus="number" case="ABS">
      <word posp="noun" gloss="cat"/>
      <word gloss="the" posp="det"/>
    </np>
    <vp tam="past-punct" negation="POS" question="not.Q" ta="PRES">
      <word posp="verb" gloss="pat"/>
    </vp>
  </s>
</example>
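A case-marking step of this kind might be sketched as follows. The table `CASE_BY_ALIGNMENT`, the function `mark_case` and the internal role label `intr_subj` are hypothetical names invented for this example; the sketch only assumes, as in the excerpt above, that noun phrases carry a `role` attribute and receive a `case` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: alignment type -> case for each grammatical role.
CASE_BY_ALIGNMENT = {
    "ergative":   {"subj": "ERG", "obj": "ABS", "intr_subj": "ABS"},
    "accusative": {"subj": "NOM", "obj": "ACC", "intr_subj": "NOM"},
}

def mark_case(s, alignment):
    """Set a case attribute on every <np> with a subject or object role."""
    nps = s.findall("np")
    transitive = any(np.get("role") == "obj" for np in nps)
    for np in nps:
        role = np.get("role")
        if role == "subj" and not transitive:
            role = "intr_subj"  # sole argument of an intransitive verb
        case = CASE_BY_ALIGNMENT[alignment].get(role)
        if case is not None:
            np.set("case", case)
    return s

sentence = ET.fromstring(
    '<s>'
    '<np number="sing" role="subj"><word posp="pron"/></np>'
    '<np number="sing" role="obj"><word posp="noun" gloss="cat"/></np>'
    '<vp tam="past-punct"><word posp="verb" gloss="pat"/></vp>'
    '</s>'
)
mark_case(sentence, "ergative")
print([(np.get("role"), np.get("case")) for np in sentence.findall("np")])
# prints [('subj', 'ERG'), ('obj', 'ABS')]
```

Because the alignment is looked up in a table, the same function would give nominative and accusative case in an accusative language, which is one way a single questionnaire answer can change every sentence in the grammar.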

Later modules may turn those abstract features into actual words, suffixes or other surface material. The result is a fully realised example sentence in the generated language, with an interlinear gloss and translation. The same process is used for the longer sample texts, only without the interlinear glosses, because apparently readers enjoy being made to work.
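The last step, from abstract features to surface forms, can be imitated with a toy example. Everything in it is invented for illustration: the stems in `LEXICON`, the suffixes in `SUFFIXES`, the gloss labels and the helper names `realise` and `interlinear` are assumptions, and none of them come from a real generated language or from the project's code.

```python
# All word forms below are made up purely for this sketch.
LEXICON = {"1excl.sing": "na", "cat": "miru", "pat": "tolo"}

# (feature, value) -> (suffix, gloss label); an empty suffix is a zero morpheme.
SUFFIXES = {
    ("case", "ERG"): ("k", "ERG"),
    ("case", "ABS"): ("", "ABS"),
    ("tam", "past-punct"): ("si", "PST"),
}

WORDS = [
    {"stem": "1excl.sing", "gloss": "1SG", "case": "ERG"},
    {"stem": "cat", "gloss": "cat", "case": "ABS"},
    {"stem": "pat", "gloss": "pat", "tam": "past-punct"},
]

def realise(word):
    """Return (surface form, gloss) for one abstract word."""
    form, gloss = LEXICON[word["stem"]], word["gloss"]
    for feature in ("case", "tam"):
        if feature in word:
            suffix, label = SUFFIXES[(feature, word[feature])]
            if suffix:  # zero morphemes add nothing to form or gloss
                form += "-" + suffix
                gloss += "-" + label
    return form, gloss

def interlinear(words, translation):
    """Build a three-line example: forms, glosses, translation."""
    forms, glosses = zip(*(realise(w) for w in words))
    return "\n".join([" ".join(forms), " ".join(glosses), f"'{translation}'"])

print(interlinear(WORDS, "I patted the cat."))
# na-k miru tolo-si
# 1SG-ERG cat pat-PST
# 'I patted the cat.'
```

Dropping the second line of the output gives the unglossed form used for running text.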

The transformation sequence for each example can be inspected by clicking on its example number in the grammar. This is an important part of the project: the output should not feel like a black box. If a language does something odd, the aim is that you can find out which part of the machinery did it, rather than blaming “AI”, elves or typological vibes.
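One simple way to make such a sequence inspectable is to take a snapshot of the tree before and after each module and store the difference. The sketch below does this with Python's standard `difflib`; the helpers `snapshot` and `run_with_trace` and the toy `casemarking` module are hypothetical, and the project may well record its traces differently.

```python
import difflib
import xml.etree.ElementTree as ET

def snapshot(s):
    """Serialise the tree one element per line so that diffs stay readable."""
    ET.indent(s)  # requires Python 3.9 or later
    return ET.tostring(s, encoding="unicode").splitlines()

def run_with_trace(s, modules):
    """Run each module and keep a (module name, changed lines) pair per step."""
    trace = []
    before = snapshot(s)
    for module in modules:
        module(s)
        after = snapshot(s)
        changed = [line for line in difflib.ndiff(before, after)
                   if line[:2] in ("- ", "+ ")]
        trace.append((module.__name__, changed))
        before = after
    return trace

def casemarking(s):
    """Hypothetical module: ergative for subjects, absolutive for objects."""
    for np in s.findall("np"):
        np.set("case", {"subj": "ERG", "obj": "ABS"}[np.get("role")])

sentence = ET.fromstring(
    '<s>'
    '<np role="subj"><word posp="pron"/></np>'
    '<np role="obj"><word posp="noun" gloss="cat"/></np>'
    '<vp><word posp="verb" gloss="pat"/></vp>'
    '</s>'
)
trace = run_with_trace(sentence, [casemarking])
for name, changed in trace:
    print("module " + name + ":")
    print("\n".join(changed))
```

Storing only the changed lines keeps each step short, so an odd result can be traced to the module whose entry first contains it.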

Plausibility, not handcuffs

The Language Creator tries to bias its languages towards attested cross-linguistic patterns without enforcing a narrow list of what is “allowed”. Many settings use sliders rather than yes/no switches. For example, a parameter may increase the likelihood and richness of a class of sounds instead of simply declaring that the language either has them or does not.
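As a rough illustration of how a slider differs from a switch, the sketch below treats the slider value as the probability of keeping each candidate sound. The function `pick_sounds`, the 0.0 to 1.0 scale and the list of ejectives are assumptions made for the example; the real parameters may be scaled and combined quite differently.

```python
import random

def pick_sounds(candidates, slider, rng):
    """Keep each candidate sound with probability equal to the slider value.

    The slider is assumed to run from 0.0 (the class never appears)
    to 1.0 (every candidate is included).
    """
    return [sound for sound in candidates if rng.random() < slider]

EJECTIVES = ["pʼ", "tʼ", "kʼ", "qʼ", "tsʼ"]  # an example class of sounds
rng = random.Random(42)                      # seeded so runs are repeatable

for slider in (0.1, 0.5, 0.9):
    print(slider, pick_sounds(EJECTIVES, slider, rng))
```

With this scheme a higher value makes it both more likely that the class appears at all and more likely that it has many members, which is the sense in which one slider controls likelihood and richness together.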

This approach is deliberately imperfect. Human languages are full of patterns, tendencies, counterexamples and historical accidents. Some settings presuppose others: adjective order only matters if the language has adjectives, and gender systems interact with number in ways that cannot be reduced to a neat little checkbox, despite humanity’s tragic love of neat little checkboxes.

The goal is therefore not to prove that every generated language could exist. It is to make typological choices operational: to turn labels such as “ergative”, “head-final”, “suffixing”, “pro-drop” or “polysynthetic” into procedures that generate concrete linguistic forms. When those procedures fail or produce something strange, that failure is itself useful. It shows where a typological label hides more complexity than the label admits.

Explore, inspect, improve

You may browse the collection of showcased languages to see examples of what the system can generate. Individual languages may also be submitted to the showcase page, and structured feedback can be sent on plausibility, internal consistency and how well the output matches the questionnaire settings.

If you would like more context, you can read about the project’s background and origin, or approach it from a linguistic perspective or a conlanging perspective. The linguistic perspective is mostly about typology and structural consequences; the conlanging perspective is about practical experimentation and iterative design.

Contributions are welcome, including bug reports, typographical corrections, suggestions, documentation and code. The source code is hosted on Codeberg and released under the GNU General Public License.

The Language Creator was written by Thomas Widmann. The current version is 0.92.

Version history

version 0.92: released on 20th June 2026.
version 0.91: released on 23rd May 2026.
version 0.90 (the first one working properly): released on 12th April 2026.