Do CAPES-7 programs publish in predatory journals?

Working note for the “Para quem publicamos?” study. Question raised: since much humanities output is in non-indexed journals, what share of those publications is in predatory journals? Analysis run 2026-05-28.

Short answer: Almost none. Across all CAPES-7 article output (2017–2020), 0.24% of articles are in predatory venues. For Linguistics/Letras the figure is 0.026% — a single article in four years. Predatory publishing is not a meaningful phenomenon in this elite sample, and is effectively absent from Letras.

Why this matters

It forecloses a lazy misreading of the main finding: that low international indexing in Linguistics/Letras reflects publication in predatory junk. It does not. The non-indexed output is legitimate but local (Brazilian journals, in Portuguese), not predatory. These are different phenomena, and this note quantifies the difference.

Data and universe

  • Source: CAPES ARTPE records (journal articles) for CAPES-7 programs, 2017–2020 quadrennium.
  • Universe: 42,235 program-article records; 3,878 in Linguistics/Letras (focal_field=1), of which 3,405 are non-indexed in SJR and 473 indexed.
  • Journal name + ISSN were parsed from the CAPES DS_ISSN field (format (ISSN) JOURNAL NAME); publisher from NM_EDITORA (present for only ~16% of rows); SJR publisher/quartile from the matched articles.csv.
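The DS_ISSN parsing step can be sketched as follows. This is a minimal illustration, assuming every well-formed value follows the "(ISSN) JOURNAL NAME" layout described above; the function name parse_ds_issn and the regular expression are hypothetical, not the study's actual code.

```python
import re
from typing import Optional, Tuple

# Assumed layout of DS_ISSN: "(ISSN) JOURNAL NAME".
# The ISSN is taken to be NNNN-NNNC, where C is a digit or X.
DS_ISSN_PATTERN = re.compile(r"^\s*\((\d{4}-\d{3}[\dXx])\)\s*(.+?)\s*$")


def parse_ds_issn(value: str) -> Optional[Tuple[str, str]]:
    """Return (issn, journal_name), or None if the value does not fit the assumed layout."""
    match = DS_ISSN_PATTERN.match(value)
    if match is None:
        return None
    issn, name = match.groups()
    # Upper-case the ISSN so a trailing check character "x" becomes "X".
    return issn.upper(), name


print(parse_ds_issn("(2455-2186) INTERNATIONAL JOURNAL OF ENGLISH RESEARCH"))
```

Returning None for values that do not fit keeps malformed rows visible for manual inspection instead of silently guessing a journal name.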

Method

Reference lists: Beall’s List (archived, beallslist.net) — 1,344 publishers + 1,515 standalone journals, normalized (accent-stripped, lowercased, punctuation removed).
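A minimal sketch of the normalization named above (accent-stripped, lowercased, punctuation removed). The function name normalize_name is hypothetical, and replacing punctuation with a space before collapsing whitespace is an assumption: the note only says punctuation was removed.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Accent-strip, lowercase, and remove punctuation from a journal or publisher name."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = stripped.lower()
    # Assumption: punctuation becomes a space so hyphenated words stay separate.
    no_punct = re.sub(r"[^\w\s]", " ", lowered).replace("_", " ")
    # Collapse runs of whitespace left behind by the substitution.
    return " ".join(no_punct.split())


print(normalize_name("Revista de Educação & Pesquisa!"))
```

Applying the same function to both the CAPES fields and the Beall entries is what makes exact string comparison between the two sides meaningful.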

An article is flagged predatory if either the journal name matches a Beall standalone journal or the publisher (NM_EDITORA or the SJR publisher field) matches a Beall publisher — subject to two precision rules:

  1. Definition of record (conservative): predatory = on Beall’s list AND not SJR-indexed. An SJR-indexed journal is, by SJR’s curation, circulating internationally, so labelling it predatory would be indefensible. This rule dissolves the contested-megapublisher problem (see below).
  2. Collision suppression / contested exclusion: contested-but-indexed megapublishers (MDPI, Frontiers, Hindawi, Dove, Libertas, Bentham) are excluded, and generic journal-name collisions with legitimate journals (e.g. ACS’s Journal of Natural Products) are removed by the not-indexed rule.
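The two rules above can be sketched as a pair of predicates. This is an illustration under stated assumptions, not the study's code: ArticleRecord, on_beall, and is_predatory are hypothetical names, inputs are assumed to be already normalized, publisher matching is reduced to exact string equality, and rule 2 is read as "contested publishers never trigger a publisher-level match".

```python
from dataclasses import dataclass
from typing import Optional, Set

# Contested megapublishers named in the note, in normalized form.
# Exact equality is a simplification; real publisher strings vary.
CONTESTED_PUBLISHERS = frozenset(
    {"mdpi", "frontiers", "hindawi", "dove", "libertas", "bentham"}
)


@dataclass(frozen=True)
class ArticleRecord:  # hypothetical container, not the study's schema
    journal: str              # normalized journal name
    publisher: Optional[str]  # normalized NM_EDITORA or SJR publisher; None if blank
    sjr_indexed: bool         # True when the journal was matched in SJR


def on_beall(rec: ArticleRecord, beall_journals: Set[str], beall_publishers: Set[str]) -> bool:
    """Naive criterion: the journal or its publisher appears on Beall's list."""
    if rec.journal in beall_journals:
        return True
    return rec.publisher is not None and rec.publisher in beall_publishers


def is_predatory(rec: ArticleRecord, beall_journals: Set[str], beall_publishers: Set[str]) -> bool:
    """Conservative criterion: on Beall's list AND not SJR-indexed."""
    if rec.sjr_indexed:
        return False  # rule 1: indexed journals are never flagged
    # Rule 2 (one reading): drop contested publishers before publisher matching.
    usable_publishers = set(beall_publishers) - CONTESTED_PUBLISHERS
    return on_beall(rec, beall_journals, usable_publishers)


# Toy stand-ins for the Beall lists, for illustration only.
toy_journals = {"international journal of english research"}
toy_publishers = {"mdpi", "omics"}

hit = ArticleRecord("international journal of english research", None, False)
contested = ArticleRecord("molecules", "mdpi", True)
print(is_predatory(hit, toy_journals, toy_publishers))
print(on_beall(contested, toy_journals, toy_publishers),
      is_predatory(contested, toy_journals, toy_publishers))
```

Keeping the naive and conservative predicates separate makes it easy to report both counts from the same pass over the data.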

Why the conservative definition is necessary

A naïve count of everything on Beall’s list yields 735 articles (1.7%), but that count is dominated by SJR-indexed journals from contested megapublishers (MDPI’s Molecules, Sensors, Entropy, Nanomaterials…; Bentham; Oncotarget). Those are indexed and internationally circulating; treating them as predatory would be wrong and would be the first thing a referee attacks. Restricting to non-indexed Beall venues removes that noise and leaves a clean, unambiguous set of 100 articles.

Results

Group                     Articles  Naïve (all Beall)  Predatory (conservative)  Predatory share (%)
All CAPES-7                 42,235                735                       100                0.237
Linguistics/Letras           3,878                  2                         1                0.026
Other fields                38,357                733                        99                0.258
Letras, non-indexed only     3,405                  -                         1                0.029
  • 56 distinct predatory journals, 100 articles total.
  • The single Linguistics/Letras hit: International Journal of English Research (ISSN 2455-2186), one article, LETRAS/UFRGS, 2018.
  • Mechanism-driven cross-check for Letras: predatory venues are English-only, so any Letras predatory publication must be English-titled. The English-titled non-indexed Letras candidate pool (138 journals / 219 articles) was inspected; all but the one above are legitimate (Revista da ANPOLL, Letras de Hoje, Alfa, WORD, Studies in Romanticism, JoSS…). The estimate of 1 is therefore near-exhaustive, not a lower bound clipped by tooling.
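The percentages in the results table can be re-derived from the reported counts. The short check below uses only figures stated in this note; the helper name share_pct is hypothetical.

```python
# (articles, conservative predatory count) per group, as reported in this note.
counts = {
    "All CAPES-7": (42235, 100),
    "Linguistics/Letras": (3878, 1),
    "Other fields": (38357, 99),
    "Letras, non-indexed only": (3405, 1),
}


def share_pct(predatory: int, articles: int) -> float:
    """Predatory articles as a percentage of all articles in the group."""
    return 100.0 * predatory / articles


for group, (articles, predatory) in counts.items():
    print(f"{group}: {share_pct(predatory, articles):.3f}%")

# Internal consistency: the two field groups add up to the full universe.
assert 3878 + 38357 == 42235
assert 1 + 99 == 100
# Naive (all Beall) share: 2 + 733 = 735 articles.
print(f"Naive share: {share_pct(735, 42235):.1f}%")
```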

Representative predatory journals (non-indexed, on Beall’s)

Articles  Journal                                                      ISSN                   Field
      15  International Journal of Development Research                2230-9926              other
      11  International Journal for Innovation Education and Research  2411-2933 / 2411-3123  other
       4  International Journal of Science and Research                2454-2008              other
       4  Journal of Novel Physiotherapies (OMICS)                     2165-7025              other
       4  European Journal of Chemistry                                2153-2249              other
       4  Biomedical Journal of Scientific & Technical Research        2574-1241              other
       2  Creative Education (SCIRP)                                   2151-4755              other
       1  American Journal of Applied Chemistry (SciencePG)            2330-8753              other
       1  European Journal of Scientific Research                      1450-216X              other
       1  European Academic Research                                   2286-4822              other
       -  (OMICS dental/oncology/obesity titles, etc.)                 -                      other
       2  International Journal of English Research                    2455-2186              Letras (1) + other (1)

The genuine predatory hits cluster in applied/STEM, medicine/dentistry, and business — not the humanities.

Interpretation: the language-barrier mechanism

The board analysis (predatory_authors.md) found Brazilian linguists — including the sitting ABRALIN president — listed on predatory editorial boards. Yet they almost never publish in predatory venues. The two diverge for a simple structural reason:

  • Board membership is low-friction vanity: you receive a flattering email, you have little exposure to foreign venue quality, you accept. No language barrier.
  • Publishing is high-friction: predatory journals operate in English, never Portuguese. A Portuguese-default author has to clear the same language hurdle as for any international venue — so the predatory “shortcut” offers no advantage over a legitimate local journal.

Result: governance complicity ≠ output. Brazilian linguists appear on predatory mastheads far more than they publish in predatory journals.

Limitations

  • Beall’s is frozen at 2017 and is name/publisher-keyed; no comprehensive predatory-ISSN list was obtainable (SciencePG/OMICS directories are JS-rendered and could not be harvested statically).
  • NM_EDITORA is blank for ~84% of rows; publisher-level predators were partly recovered via the SJR publisher field and via journal-name matching, but some publisher-level cases with no name match and no publisher field will be missed.
  • The not-indexed rule slightly undercounts genuinely-predatory journals that achieved brief SJR indexing.
  • Net effect: the other-fields figure is a mild lower bound; the Letras figure is robust (near-exhaustive given the language-barrier filter).

Reproducibility

Matching uses journal name, publisher (NM_EDITORA), and SJR publisher against the normalized Beall lists. Inputs are lattes/data/raw/.../*_artpe.csv and lattes/data/processed/articles.csv, joined on (cd_programa_ies, year, production_id). Beall snapshots: beallslist.net publishers + standalone-journals pages.
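A minimal sketch of a join on the composite key (cd_programa_ies, year, production_id), using plain dictionaries. It assumes both inputs have already been loaded into rows that carry those three column names and that the key is unique in the processed file; composite_key, left_join, and the has_processed_match flag are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int, str]


def composite_key(row: Dict[str, object]) -> Key:
    """Build the (cd_programa_ies, year, production_id) join key."""
    return (
        str(row["cd_programa_ies"]).strip(),
        int(row["year"]),  # tolerate "2018" vs 2018
        str(row["production_id"]).strip(),
    )


def left_join(
    raw_rows: Iterable[Dict[str, object]],
    processed_rows: Iterable[Dict[str, object]],
) -> List[Dict[str, object]]:
    """Keep every raw row; attach processed columns where the key matches."""
    # Assumption: each key occurs at most once in the processed rows.
    processed_by_key = {composite_key(r): r for r in processed_rows}
    joined = []
    for row in raw_rows:
        match = processed_by_key.get(composite_key(row))
        merged = dict(row)
        if match is not None:
            merged.update(match)
        merged["has_processed_match"] = match is not None
        joined.append(merged)
    return joined
```

A left join keeps every raw record in the result, so rows without a processed counterpart remain countable rather than dropping out of the universe.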


Note (to be added):

Copyright © Guilherme Duarte Garcia