The trouble with journal typesetting

or, why your LaTeX manuscript still comes back broken

typesetting
publishing
LaTeX
Quarto
linguistics
Author

Guilherme D. Garcia

Published

June 8, 2026

Modified

June 8, 2026

Keywords

typesetting, journals, LaTeX, Quarto, Typst, JATS, scholarly publishing, linguistics

A small rant, and a slightly less small explanation

Anyone who has published a journal article knows the feeling. You spend months getting the manuscript right; the figures are clean, the IPA glyphs are properly aligned, the trees look the way they’re supposed to look. You submit, you get accepted, you wait for proofs — and the proofs come back with errors you never made. How does that even happen?

This post is partly a complaint and partly an attempt to understand why the situation is the way it is. I went in expecting to confirm what I already thought (publishers are needlessly stuck in the past); I came out with a more complicated picture.

Journal typesetting often, well, sucks

I think most academics have, at some point, opened a set of proofs and stared at the screen in disbelief. A misplaced italic here, a broken ligature there, a transcription that got mangled because an IPA character didn’t survive whatever conversion happened backstage. Footnotes pointing to the wrong line. References silently re-ordered. Figures that are now pixelated.

The most frustrating cases aren’t the ones where the author made a mistake. They’re the ones where the manuscript was clean, and the production process introduced an error that wasn’t there before.

It’s worse in linguistics (maybe?)

This problem is bad in every field, but it’s especially bad in linguistics, because almost everything we do leans on notation that doesn’t behave well under naïve text processing:

  • IPA: a Unicode minefield. Diacritics stack, combining characters can be reordered by overzealous normalization, and many fonts don’t cover the full set.
  • Interlinear glosses: the alignment is the point. If the typesetter reflows the text without understanding what the columns mean, the gloss becomes nonsense.
  • Syntactic trees: usually drawn with forest, tikz-qtree, or similar. These are precisely the kinds of figures that get flattened, re-rendered, or worse, redrawn by hand.
  • OT tableaux: the pointing hand, the shading, the dotted lines — all encoded in expex, linguex, or a custom macro — and all easy to break.
  • Autosegmental and metrical diagrams: same story.

In other words, the field’s typographic needs are tightly coupled to its content. A “small” typesetting error in linguistics can be a substantive error.
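To make the IPA point concrete, here is a minimal sketch using Python's standard `unicodedata` module. The transcription fragment is a made-up example (a creaky, nasalized vowel typed as base letter, tilde above, tilde below), not something taken from a real manuscript. It shows that normalization changes the underlying code point sequence even though the result is canonically equivalent and should, in principle, render the same. Trouble starts when a downstream tool, font, or string comparison assumes one particular sequence.

```python
import unicodedata

# Hypothetical transcription fragment: "a" + COMBINING TILDE (U+0303, nasal)
# + COMBINING TILDE BELOW (U+0330, creaky voice), typed in that order.
typed = "a\u0303\u0330"


def codepoints(text: str) -> list:
    """Return the code points of `text` as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# NFD decomposes and then sorts combining marks by canonical combining class.
nfd = unicodedata.normalize("NFD", typed)
# NFC does the same and then recomposes into precomposed letters where it can.
nfc = unicodedata.normalize("NFC", typed)

# Combining classes explain the reordering: marks attached below the base
# (class 220) are sorted before marks attached above it (class 230).
classes = {f"U+{ord(ch):04X}": unicodedata.combining(ch) for ch in typed}

print("typed:  ", codepoints(typed))
print("NFD:    ", codepoints(nfd))
print("NFC:    ", codepoints(nfc))
print("classes:", classes)
print("NFD identical to typed?", nfd == typed)
print("NFC identical to typed?", nfc == typed)
```

Neither normalized form is byte-identical to what the author typed, so a pipeline that normalizes at one stage and does exact string matching (or font lookup) at another can easily lose a diacritic along the way.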

But many of us already use \(\LaTeX\) — so what gives?

Here’s the part that’s always puzzled me. Many linguists, especially in phonology, syntax, semantics, and computational work, write in \(\LaTeX\) (and have done so for a long time). Some have moved on to Typst.[1] The point is: the input already looks great. If the source file produces a beautiful PDF on the author’s machine, why does the journal’s version look worse?

Even when journals provide a \(\LaTeX\) template (and many do), the proofs that come back are frequently broken. Errors get introduced that were never in the source. Why?

What actually happens behind the scenes

When I started looking into this, I found out that the production pipeline at most journals — including newer, open-access ones — looks roughly like this:

  1. The author submits a manuscript (Word or \(\LaTeX\)).
  2. The publisher sends the manuscript to a typesetting vendor (usually a third-party company, often outsourced internationally).
  3. The vendor’s pipeline converts the manuscript into JATS[2] XML, a structured format used across scholarly publishing (and one I wasn’t familiar with).
  4. From that XML, the vendor produces the typeset PDF (often via InDesign), the HTML version of the article, and any other format the publisher wants.
  5. The proofs you receive are generated from the XML, not from your original source.

That last point is the key. Your .tex file isn’t the source of truth at the publisher’s end — the XML is. Whatever the vendor’s conversion does to your \(\LaTeX\) is what becomes your paper.

Why XML at all?

This was the part I didn’t appreciate. JATS XML isn’t a publisher whim — it’s the format that the rest of the scholarly infrastructure expects:

  • Indexers (Crossref, Scopus, Web of Science, PubMed, DOAJ, OpenAlex) ingest JATS, or metadata generated from it, to populate metadata, references, and search.
  • Dark archives (LOCKSS, CLOCKSS, Portico) preserve JATS as the long-term “version of record” — the idea being that, in 50 years, you want something readable that doesn’t depend on a specific TeX distribution or a specific font file.
  • Accessibility and multi-format output (responsive HTML, ePub, MathML for screen readers) are all derived from XML.

Put differently: even a journal that wanted to be \(\LaTeX\)-only would still have to produce JATS XML, because without it the article effectively doesn’t exist in the wider scholarly ecosystem.
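For readers who, like me, had never looked at JATS, the following sketch builds a tiny JATS-style fragment with Python's standard `xml.etree.ElementTree`. The element names (`article`, `front`, `article-meta`, `title-group`, `article-title`, `contrib-group`, `contrib`, `name`, `pub-date`, `body`, `sec`, `p`) come from the JATS tag set, but the fragment is deliberately incomplete (it has no journal metadata, for example) and has not been validated against the JATS schema. Treat it as an illustration of the structure, not as a deposit-ready file.

```python
import xml.etree.ElementTree as ET


def build_article(title, surname, given_names, year, paragraph):
    """Build a minimal, non-validated JATS-style <article> element."""
    article = ET.Element("article", {"article-type": "research-article"})

    # <front> holds the metadata that indexers care about.
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title

    contrib_group = ET.SubElement(meta, "contrib-group")
    contrib = ET.SubElement(contrib_group, "contrib", {"contrib-type": "author"})
    name = ET.SubElement(contrib, "name")
    ET.SubElement(name, "surname").text = surname
    ET.SubElement(name, "given-names").text = given_names

    pub_date = ET.SubElement(meta, "pub-date")
    ET.SubElement(pub_date, "year").text = year

    # <body> holds the content as tagged sections and paragraphs.
    body = ET.SubElement(article, "body")
    sec = ET.SubElement(body, "sec")
    ET.SubElement(sec, "p").text = paragraph
    return article


article = build_article(
    title="The trouble with journal typesetting",
    surname="Garcia",
    given_names="Guilherme D.",
    year="2026",
    paragraph="Anyone who has published a journal article knows the feeling.",
)

xml_text = ET.tostring(article, encoding="unicode")
print(xml_text)

# A consumer can pull fields out by path, with no PDF parsing involved.
found_title = article.findtext("front/article-meta/title-group/article-title")
found_surname = article.findtext("front/article-meta/contrib-group/contrib/name/surname")
print(found_title, "/", found_surname)
```

The point of the structure is the last two lookups: titles, authors, and dates are addressable fields rather than text that happens to sit at the top of a page.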

And why InDesign?

InDesign is where the visual PDF gets composed from the XML, using the journal’s house template. Vendors have built workflows around InDesign for decades; their copyeditors, typesetters, and QA people are trained on it. Switching to a \(\LaTeX\)-only pipeline would mean rebuilding all of that — and hiring technical staff to maintain it. At the per-article economics of most journals (especially open-access ones competing on cost), that’s a non-trivial investment.

What about newer journals?

This is the question I kept coming back to. Why don’t new journals just use \(\LaTeX\), end to end?

The answer turns out to be the same: if a journal is published, say, through Open Library of Humanities, a platform hosting many journals across fields, the production pipeline is shared, vendor-based, and XML-first — because the platform needs to scale across journals whose authors mostly submit Word. A single journal can’t unilaterally opt out without rebuilding the whole stack.

Why not Quarto?

Quarto can already produce HTML, PDF (via \(\LaTeX\) or Typst), and JATS XML from a single source. The quarto-journals extensions cover several major templates. In a parallel universe, every journal could accept .qmd and run it through a vetted pipeline.
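As a rough sketch of what "a single source" means in practice, the snippet below builds one `quarto render` command per output format and only runs them if Quarto and the source file are actually present. The file name `paper.qmd` is hypothetical, and the format list assumes a Quarto version recent enough to support the `typst` and `jats` targets.

```python
from pathlib import Path
import shutil
import subprocess


def render_commands(source, formats):
    """Return one `quarto render` argument list per requested format."""
    return [["quarto", "render", source, "--to", fmt] for fmt in formats]


# Hypothetical manuscript file; the same source feeds every output format.
commands = render_commands("paper.qmd", ["html", "pdf", "typst", "jats"])

if __name__ == "__main__":
    for cmd in commands:
        print(" ".join(cmd))
        # Only call Quarto when it is installed and the file exists.
        if shutil.which("quarto") and Path(cmd[2]).exists():
            subprocess.run(cmd, check=True)
```

Whether the resulting JATS is good enough for a given journal's indexers is a separate question, and it is one of the likely sticking points.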

So why hasn’t that happened? I think some likely reasons are:

  • Quarto is still relatively young;
  • Apparently, its JATS output isn’t yet a one-to-one replacement for what indexers expect across all journals;
  • Journals don’t have people who are familiar with Quarto/Typst yet;
  • Switching pipelines creates some production risks… which is never something journals want, of course.

So yes, it sucks

The version of this story I had in my head was roughly: “Publishers are stuck in the past; if they just used \(\LaTeX\), we’d be fine.” After looking into it, I’d rephrase that as something more like: “Publishers are stuck in a real infrastructure that they cannot unilaterally replace, even when [much] better tools exist.”

That’s not a defense of the status quo. The errors introduced in proofs are real, and in a field like linguistics they often touch the actual claims of the paper. But the fix isn’t “tell vendors to use \(\LaTeX\)” — it’s getting indexers, dark archives, and platforms to accept a different source, while a critical mass of journals proves that a Quarto/Typst/\(\LaTeX\)-first pipeline can produce clean JATS at scale without melting down.

That also means the path forward is more about boring infrastructure work than about yelling at publishers — which, frankly, hasn’t worked yet anyway.

The interesting thing to me is the asymmetry between what the publishing industry uses and what we as individuals have at our disposal. This goes back to Typst: it’s a great tool; it’s fast, modern, light, intuitive, and very parsimonious (I haven’t used \(\LaTeX\) since I switched). But if .tex files are not universally accepted across linguistics journals, not even by the giants in the industry (OUP, CUP, etc.), can you imagine how long we’ll have to wait to be able to actually use more modern tools…?


Copyright © Guilherme Duarte Garcia

Footnotes

  1. I’ve written about Typst and Quarto elsewhere on this site. The short version: both are excellent and both, in principle, solve the multi-format output problem we’ll see below.

  2. Journal Article Tag Suite.