Phonokit

Phonological Analysis in Typst

Author

Guilherme D. Garcia

Published

January 7, 2026


In what follows, I assume you know about Typst, but in a nutshell, it is a programming language designed for typesetting. There’s a great tutorial here and an introductory series on YouTube here. When I migrated from \(\LaTeX\) in 2025, I spent some time experimenting with the language to see whether I could move my entire workflow (slides, articles, CV, etc.). I quickly discovered that it could do everything I do in \(\LaTeX\) and, crucially, much more (see here). That’s how the idea for this package was born: it is a collection of functions I often use in my teaching and research in phonology.

Manual

There’s a comprehensive vignette for the package in PDF format here. The file contains numerous examples for all functions in Phonokit.


Main features

IPA Module

Unlike \(\LaTeX\), Typst offers out-of-the-box support for Unicode characters such as phonetic symbols. While this is great, I am already used to tipa in \(\LaTeX\), so my first goal was to have a function that emulated tipa as closely as possible: it would be familiar, practical and quick. This is what the function #ipa() does. There are only minor differences between \textipa{} and #ipa(); see the reference sheet in Figure 1.

  • tipa-style input: use familiar \(\LaTeX\) tipa notation instead of hunting for Unicode symbols
  • Comprehensive symbol support: most IPA consonants, vowels, and other symbols from the tipa chart
  • Combining diacritics: nasalized (\\~), devoiced (\\r), syllabic (\\v); the tie (\\t) is also available
  • Suprasegmentals: primary stress ('), secondary stress (,), length (:)
  • Automatic character splitting: type SE instead of S E for efficiency (spacing is necessary around characters using backslashes)
  • Charis SIL font needed for all transcriptions. If you don’t already have this font installed, visit https://software.sil.org/charis/download/
Figure 1: Reference sheet (see vignette).
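
As a quick illustration of the points above, the sketch below transcribes one word with #ipa(). The import line is an assumption: it presumes the package is available under Typst's `@preview` namespace and that version 0.3.0 (the only version number mentioned on this page) is the one installed, so adjust it to the release you use. The shortcuts @, E and I are the usual tipa ones for ə, ɛ and ɪ; check the reference sheet in Figure 1 in case #ipa() treats any of them differently.

```typst
// Assumed import path and version; adjust to your installed release.
#import "@preview/phonokit:0.3.0": *

// tipa-style input: no spaces are needed between single-character symbols.
// The apostrophe is the primary stress mark.
#ipa("f@'nEtIks")
```

Remember that Charis SIL must be installed for the transcription to render as intended.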

IPA Charts Module

I have used the great vowel package multiple times in \(\LaTeX\), but I don’t love its interface (see the example in my \(\LaTeX\) tutorial for phonologists here). The #vowels() function in Phonokit is simpler: it takes a string of vowels and plots them onto a vowel trapezoid. The trapezoid in Figure 2 was created with #vowels("english"), but you can pass any string of vowels to customize the trapezoid as needed.

A similar function exists for consonants: #consonants(). It returns an IPA table of pulmonic consonants given an input string. For both #vowels() and #consonants(), you also have the option of using a language as the input (see the list of available languages below).

  • Vowel charts: plot vowels on the IPA vowel trapezoid with accurate positioning
  • Consonant tables: display consonants in the pulmonic IPA consonant table
  • Language inventories: pre-defined inventories for some languages (English, Spanish, French, German, Italian, Portuguese, Japanese, Russian, Arabic, Mandarin)
  • Custom symbol sets: plot any combination of IPA symbols
  • Automatic positioning: symbols positioned according to phonetic properties (place, manner, voicing, frontness, height, roundedness)
  • Proper IPA formatting: voiceless/voiced pairs, grayed-out impossible articulations, minimal pair bullets for vowels
  • Scalable charts: adjust size to fit your document layout (scaling includes text as expected)
Figure 2: Vowel trapezoid for (Standard American) English.
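
The two input styles described above (a language name or a custom string of symbols) can be sketched as follows. Only #vowels("english") is taken directly from the text; the import line, the custom vowel string and the "english" argument to #consonants() are assumptions made for illustration, so check the vignette for the exact forms that are accepted.

```typst
// Assumed import path and version; adjust to your installed release.
#import "@preview/phonokit:0.3.0": *

// Pre-defined inventory (the call behind Figure 2).
#vowels("english")

// Custom inventory: any string of vowel symbols (illustrative choice).
#vowels("aeiou")

// Consonant table from a language name; "english" is assumed to be
// accepted here in the same form as for #vowels().
#consonants("english")
```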

Prosody Module

The functions #syllable(), #foot() and #word() help you create prosodic representations from strings. They adjust sizing automatically, but you can also use the scale argument.

  • Prosodic structure visualization: draw syllable structures with onset, nucleus, and coda
  • Flexible foot structure: use parentheses to mark explicit foot boundaries and stress marks to identify headedness (iambs, trochees)
  • Stress marking: mark stressed syllables with an apostrophe (')
  • Flexible alignment: left or right alignment for prosodic word heads
Figure 3: A geminate consonant in a traditional moraic representation.

Figure 3 is the result of #foot-mora("'pot.ta", coda: true), where coda: true indicates that codas project a mora. The function detects identical coda-onset sequences, so “pot.ta” triggers the representation for a geminate.

To create the representation for prosodic words, use the function #word(). Figure 4 shows a simple PWd generated with the code #word("('po.Ra).('ma.pa)", foot: "R"), where foot: "R" indicates which foot is the main foot in the PWd (when more than one foot is present). Notice that feet are detected based on the use of parentheses in the input. Stress marks ' are used to determine foot headedness. All functions accept the same input used in the #ipa() function, which means that phonetic symbols are automatically detected.

Figure 4: A prosodic word assuming onset-rhyme representations for syllables.

All functions involving prosodic representations also have a scale argument. This is important because we often need to adjust the size of a representation, but the text itself may not scale appropriately (line width can also be tricky in these scenarios). The scale argument takes care of everything.
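
The calls discussed in this section can be collected in one short sketch. The first two are copied from the text; the third adds the scale argument, and its value is an assumption (a number, with 1 presumed to be the default size). The import line is likewise assumed, so adjust the version to the release you have installed.

```typst
// Assumed import path and version; adjust to your installed release.
#import "@preview/phonokit:0.3.0": *

// Calls taken from the text above (Figures 3 and 4).
#foot-mora("'pot.ta", coda: true)
#word("('po.Ra).('ma.pa)", foot: "R")

// Same prosodic word at a reduced size; the value 0.7 is illustrative
// and assumes that scale takes a number where 1 is the default size.
#word("('po.Ra).('ma.pa)", foot: "R", scale: 0.7)
```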

Autosegmental phonology module

Starting with version 0.3.0, Phonokit also offers a function to create autosegmental representations, including features and tones. The example in Figure 5 is from Zsiga (2024). Read the vignette to learn more about the function. In a nutshell, you can represent linking, delinking, floating tones, and highlighted tones. Thus, the most common processes involving features and tones are easy to represent with the #autoseg() function.

Figure 5: Tone spreading example
#autoseg(
  ("e", "b", "e"),
  features: ("L", "", "H"),
  spacing: 0.5,
  tone: true,
  gloss: [èbě],
)
#a-r // arrow
#autoseg(
  ("e", "b", "e"),
  features: ("L", "", "H"),
  links: ((0, 2),),
  spacing: 0.5,
  tone: true,
  gloss: [_pumpkin_],
)
Code block 1: Code used to generate an autosegmental representation.

Constraint grammar module

The package includes a function to generate OT tableaux (see Figure 6), but it goes one step further and produces a MaxEnt tableau (Goldwater and Johnson 2003; Hayes and Wilson 2008) with the function #maxent(). Figure 7 illustrates a scenario where all candidates have a non-zero probability of being observed given a specific input \(x\). The column \(H(y)\) displays the Harmony score of each candidate \(y\), calculated as the weighted sum of all constraint violations. Next, the column \(e^{-H(y)}\) provides the unnormalized probability, which is the exponential of the negated Harmony score (this has also been called the MaxEnt score). Finally, the actual predicted probability is shown in column \(P(y|x)\), which is obtained by dividing the unnormalized value of a candidate by \(Z(x)\) (the sum of all unnormalized scores).
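
The three quantities just described can be written compactly as follows. The symbols \(w_k\) (the weight of constraint \(k\)) and \(C_k(y)\) (the number of times candidate \(y\) violates constraint \(k\)) are notation introduced here for convenience; they do not appear in the tableau itself. The sum in \(Z(x)\) runs over all candidates \(y'\) for the input \(x\).

```latex
\begin{align*}
H(y) &= \sum_{k} w_k \, C_k(y) \\
P(y \mid x) &= \frac{e^{-H(y)}}{Z(x)},
\qquad Z(x) = \sum_{y'} e^{-H(y')}
\end{align*}
```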

Figure 6: A typical OT tableau
#tableau(
  input: "kraTa",
  candidates: ("kra.Ta", "ka.Ta", "ka.ra.Ta"),
  constraints: ("Max", "Dep", "*Complex"),
  violations: (
    ("", "", "*"),
    ("*!", "", ""),
    ("", "*!", ""),
  ),
  winner: 0, // <- Position of winning cand
  dashed-lines: (1,) // <- Note the comma
)
Code block 2: Code used to generate an OT tableau.

The function #maxent() calculates \(H(y)\), \(e^{-H(y)}\) and \(P(y|x)\) automatically given the weights provided. Figure 7 lists the weights for the constraints in use at the top and prints probability bars at the right margin. These can be turned off with visualize: false (see Code block 3), but they are printed by default as this can help students quickly visualize probabilities when many candidates are evaluated.

Figure 7: MaxEnt tableau with automatic calculation.
#maxent(
  input: "kraTa",
  candidates: ("[kra.Ta]", "[ka.Ta]", "[ka.ra.Ta]"),
  constraints: ("Max", "Dep", "*Complex"),
  weights: (2.5, 1.8, 1),
  violations: (
    (0, 0, 1),
    (1, 0, 0),
    (0, 1, 0),
  ),
  visualize: true  // Show probability bars (default)
)
Code block 3: Code used to generate a MaxEnt tableau.
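
To see what #maxent() computes, the values for the three candidates in Code block 3 can be worked out by hand from the weights (2.5, 1.8, 1) and the violation rows given there. The figures below are rounded hand calculations offered as a check on the arithmetic; the tableau produced by the function may display a different number of decimal places.

```latex
\begin{align*}
H(\text{[kra.Ta]})   &= 2.5(0) + 1.8(0) + 1(1) = 1.0, & e^{-1.0} &\approx 0.368 \\
H(\text{[ka.Ta]})    &= 2.5(1) + 1.8(0) + 1(0) = 2.5, & e^{-2.5} &\approx 0.082 \\
H(\text{[ka.ra.Ta]}) &= 2.5(0) + 1.8(1) + 1(0) = 1.8, & e^{-1.8} &\approx 0.165 \\[4pt]
Z(x) &\approx 0.368 + 0.082 + 0.165 = 0.615 \\[4pt]
P(\text{[kra.Ta]} \mid x)   &\approx 0.368 / 0.615 \approx 0.60 \\
P(\text{[ka.Ta]} \mid x)    &\approx 0.082 / 0.615 \approx 0.13 \\
P(\text{[ka.ra.Ta]} \mid x) &\approx 0.165 / 0.615 \approx 0.27
\end{align*}
```

The candidate with the lowest Harmony score, [kra.Ta], receives the highest probability, but the other two candidates keep a non-zero share, which is the key difference from the categorical winner in the OT tableau.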

It is often useful to present a ranking using a Hasse diagram. These diagrams can be generated in Phonokit using the #hasse() function. In a nutshell, the function takes tuples with \(n\) elements. In the simplest case, \(n = 1\), which produces a floating constraint. The example in Figure 8 shows a basic scenario. The third element in the first tuple indicates the “stratum” in the diagram; this is especially important in more complex cases, which require better control over the vertical position of different constraints. Optional arguments exist to give the user more flexibility (e.g., scale and node-spacing).

Figure 8: A simple Hasse diagram
#hasse(
  (
    ("*Complex", "Max", 0),
    ("*Complex", "Dep", 0),
    ("Onset", "Max", 0),
    ("Onset", "Dep", 0),
    ("Max", "NoCoda", 1),
    ("Dep", "Constraint[Feat]", 1, "dotted"),
  ),
  node-spacing: 3,
)
Code block 4: Code used to generate a Hasse diagram.

Package Repository

You can download/fork the most up-to-date version of the package (development version) in the repository below.

  • http://github.com/guilhermegarcia/phonokit

Copyright © Guilherme Duarte Garcia

References

Goldwater, Sharon, and Mark Johnson. 2003. “Learning OT Constraint Rankings Using a Maximum Entropy Model.” In Proceedings of the Stockholm Workshop on Variation Within Optimality Theory, 111–20.
Hayes, Bruce, and Colin Wilson. 2008. “A Maximum Entropy Model of Phonotactics and Phonotactic Learning.” Linguistic Inquiry 39 (3): 379–440. https://doi.org/10.1162/ling.2008.39.3.379.
Zsiga, Elizabeth C. 2024. The Sounds of Language: An Introduction to Phonetics and Phonology. Chichester, UK: John Wiley & Sons.