Findings Summary
Date generated: 2026-05-24
Scope
This project currently analyzes CAPES-7 journal article output for 2017-2020. The focal field is LINGUÍSTICA E LITERATURA, compared with eleven other journal-heavy CAPES evaluation areas.
The current processed dataset contains:
- 63 CAPES-7 programs.
- 39,049 journal article rows.
- 12 discipline groups.
- Publication language from CAPES DS_IDIOMA.
- SJR quartile matching via ISSN.
The quartile source currently used is the accessible UKZN mirror of the 2024 Scimago/SJR quartile file. This is a proxy because Scimago’s official historical CSV endpoint was blocked from this environment. For publication, replace this proxy with historical SJR files for 2017-2020 if possible.
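The ISSN matching step can be illustrated with a minimal sketch. It is not the project's actual matching code: the field names `issn` and `quartile`, the helper names, and the assumption that the quartile file lists several ISSNs per journal as a comma-separated string are all hypothetical, and the sample rows are made up.

```python
import re

def normalize_issn(raw):
    """Return an 8-character ISSN key (no hyphen, uppercase X), or None if malformed."""
    key = re.sub(r"[^0-9Xx]", "", raw or "").upper()
    return key if re.fullmatch(r"\d{7}[\dX]", key) else None

def build_quartile_lookup(sjr_rows):
    """Map every ISSN listed for a journal to that journal's quartile."""
    lookup = {}
    for row in sjr_rows:
        # Assumption: print and electronic ISSNs share one comma-separated field.
        for part in row["issn"].split(","):
            key = normalize_issn(part)
            if key:
                lookup[key] = row["quartile"]
    return lookup

def match_quartile(article_issn, lookup):
    """Return the quartile for an article's ISSN, or None if the journal is not in the file."""
    return lookup.get(normalize_issn(article_issn))

# Made-up rows for illustration only; these are not real journals.
sjr_rows = [
    {"issn": "12345678, 8765432X", "quartile": "Q1"},
    {"issn": "11112222", "quartile": "Q3"},
]
lookup = build_quartile_lookup(sjr_rows)
print(match_quartile("1234-5678", lookup))  # hyphenated form still matches
print(match_quartile("9999-9999", lookup))  # unmatched ISSN: counted as not indexed
```

Normalizing both sides to the same key matters because an unmatched ISSN is counted as non-indexed, so formatting differences would otherwise inflate the indexing gap.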
Areas Included
The figures use every area currently present in the processed article table:
| Area | CAPES-7 programs | Article rows |
|---|---|---|
| Administração Pública e de Empresas, Ciências Contábeis e Turismo | 3 | 2,021 |
| Astronomia / Física | 11 | 10,261 |
| Ciência da Computação | 7 | 2,766 |
| Ciência Política e Relações Internacionais | 2 | 754 |
| Ciências Biológicas I | 5 | 3,698 |
| Economia | 4 | 564 |
| História | 2 | 771 |
| Linguística e Literatura | 6 | 3,878 |
| Matemática / Probabilidade e Estatística | 7 | 2,199 |
| Psicologia | 3 | 1,105 |
| Química | 10 | 9,487 |
| Sociologia | 3 | 1,545 |
Political Science is therefore already included, under CAPES’s area label CIÊNCIA POLÍTICA E RELAÇÕES INTERNACIONAIS.
Main Result
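Linguistics/Letras is an extreme outlier on three related outcomes (SJR indexing, English-language publication, and quartile placement, the last reported as both Q1 and Q1/Q2):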
| Outcome | Linguistics/Letras | Comparison fields | Gap |
|---|---|---|---|
| Matched to SJR | 12.2% | 73.4% | -61.2 pp |
| Published in English | 8.3% | 82.2% | -73.9 pp |
| Q1, all articles denominator | 1.9% | 43.1% | -41.2 pp |
| Q1/Q2, all articles denominator | 6.0% | 62.3% | -56.3 pp |
The indexing result should not be treated as a mere missing-data problem. In journal-based fields, SJR coverage is itself a meaningful signal of whether a journal belongs to the main international indexing ecosystem. Being indexed does not guarantee quality, but being absent from SJR is rarely consistent with being a top international journal in the field.
Discipline-Level Summary
| Area | Articles | SJR match | English | Q1, all articles | Q1/Q2, all articles | Q1/Q2 among SJR-indexed |
|---|---|---|---|---|---|---|
| Linguística e Literatura | 3,878 | 12.2% | 8.3% | 1.9% | 6.0% | 49.3% |
| História | 771 | 22.7% | 6.0% | 4.2% | 15.6% | 68.6% |
| Sociologia | 1,545 | 20.6% | 10.6% | 3.8% | 15.7% | 76.4% |
| Ciência Política e Relações Internacionais | 754 | 29.0% | 17.9% | 8.4% | 22.3% | 76.7% |
| Administração Pública e de Empresas, Ciências Contábeis e Turismo | 2,021 | 41.8% | 45.9% | 17.0% | 25.1% | 60.1% |
| Psicologia | 1,105 | 56.7% | 50.0% | 17.2% | 29.7% | 52.3% |
| Economia | 564 | 56.2% | 72.5% | 31.6% | 41.5% | 73.8% |
| Ciência da Computação | 2,766 | 67.3% | 91.6% | 39.1% | 58.1% | 86.4% |
| Ciências Biológicas I | 3,698 | 91.8% | 92.6% | 58.1% | 81.0% | 88.2% |
| Química | 9,487 | 84.1% | 93.4% | 42.3% | 68.3% | 81.3% |
| Astronomia / Física | 10,261 | 80.3% | 95.0% | 56.4% | 73.2% | 91.1% |
| Matemática / Probabilidade e Estatística | 2,199 | 84.6% | 96.4% | 57.2% | 78.6% | 92.9% |
The indexed-only Q1/Q2 column is useful, but it answers a different question: conditional on a journal being indexed in SJR, where does it rank? The all-article denominator is the stronger field-level measure because it preserves the indexing gap rather than discarding it.
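The two denominators are linked by simple arithmetic: the all-article share equals the SJR match rate times the share among indexed articles. The sketch below checks this for two rows of the table above; the function name is illustrative and the inputs are the rounded percentages from the table, so small rounding differences are possible for other rows.

```python
def all_article_share(match_rate, share_among_indexed):
    """Q1/Q2 share over all articles = P(indexed) * P(Q1/Q2 | indexed)."""
    return match_rate * share_among_indexed

# (SJR match rate, Q1/Q2 among SJR-indexed), taken from the table above.
rows = {
    "Linguística e Literatura": (0.122, 0.493),
    "Matemática / Probabilidade e Estatística": (0.846, 0.929),
}

for area, (match_rate, conditional) in rows.items():
    share = all_article_share(match_rate, conditional)
    print(f"{area}: {100 * share:.1f}%")
```

This prints 6.0% and 78.6%, the values in the "Q1/Q2, all articles" column. It also shows why the all-article measure preserves the indexing gap: a field with a low match rate cannot reach a high all-article share, whatever its conditional share.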
Regression-Style Checks
Simple logit models with a year control give the same qualitative result:
- The odds of SJR indexing for Linguistics/Letras articles are about 0.050 times those for the comparison fields.
- The odds of being in English are about 0.020 times those for the comparison fields.
- The odds of Q1/Q2 placement are about 0.039 times those for the comparison fields.
These are descriptive checks, not causal models. They show that the focal-field gap is not an artifact of small year-to-year changes from 2017 to 2020.
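As a sanity check on the magnitudes, unadjusted odds ratios can be computed directly from the pooled percentages in the Main Result table. This is only a sketch: it omits the year control used in the logit models, so agreement with the reported values is approximate rather than guaranteed.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1.0 - p)

def odds_ratio(p_focal, p_comparison):
    """Unadjusted odds ratio of the focal field relative to the comparison fields."""
    return odds(p_focal) / odds(p_comparison)

# (Linguistics/Letras share, comparison-field share) from the Main Result table.
outcomes = {
    "SJR indexing": (0.122, 0.734),
    "English": (0.083, 0.822),
    "Q1/Q2, all articles": (0.060, 0.623),
}

for name, (p_focal, p_comparison) in outcomes.items():
    print(f"{name}: {odds_ratio(p_focal, p_comparison):.3f}")
```

The three values print as 0.050, 0.020, and 0.039, so the year control barely moves the estimates.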
Language as a Mechanism
The data support the idea that language is part of the mechanism linking field norms to indexing and quartile placement.
| Field group | Language | Articles | SJR indexed | Q1/Q2, all articles |
|---|---|---|---|---|
| Linguistics/Letras | English | 322 | 36.0% | 27.3% |
| Linguistics/Letras | Not English | 3,556 | 10.0% | 4.1% |
| Comparison fields | English | 28,925 | 81.9% | 71.4% |
| Comparison fields | Not English | 6,246 | 34.4% | 20.4% |
Within Linguistics/Letras, English articles are much more likely to be indexed and much more likely to be Q1/Q2. This is consistent with the proposed pathway:
field norms / audience / language orientation
-> Portuguese and Brazilian/local journal publication
-> lower SJR indexing
-> lower observed Q1/Q2 placement
Adding English to the logit models reduces, but does not eliminate, the Linguistics/Letras penalty:
- For SJR indexing, the focal-field odds ratio moves from 0.050 to 0.188 after adding English. English itself has an odds ratio of 8.39.
- For Q1/Q2 placement, the focal-field odds ratio moves from 0.039 to 0.160 after adding English. English itself has an odds ratio of 9.74.
This is a descriptive mediation pattern: English publication explains part of the gap, but not all of it.
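The same pattern can be seen without a model by computing the focal-field odds ratio separately within each language group, using the percentages in the table above. This is an illustrative sketch with unadjusted, rounded inputs; it is not the logit specification itself.

```python
def odds_ratio(p_focal, p_comparison):
    """Unadjusted odds ratio of the focal field relative to the comparison fields."""
    return (p_focal / (1.0 - p_focal)) / (p_comparison / (1.0 - p_comparison))

# (Linguistics/Letras share, comparison-field share) within each language group.
strata = {
    "SJR indexed": {
        "English": (0.360, 0.819),
        "Not English": (0.100, 0.344),
    },
    "Q1/Q2, all articles": {
        "English": (0.273, 0.714),
        "Not English": (0.041, 0.204),
    },
}

for outcome, by_language in strata.items():
    for language, (p_focal, p_comparison) in by_language.items():
        ratio = odds_ratio(p_focal, p_comparison)
        print(f"{outcome} | {language}: {ratio:.3f}")
```

The within-language odds ratios are roughly 0.12 and 0.21 for indexing and 0.15 and 0.17 for Q1/Q2. The English-adjusted estimates of 0.188 and 0.160 fall between them, and all remain well below 1, which is what "reduces, but does not eliminate" means here.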
Brazilian Journal Proxy
For articles matched to SJR, the 2024 SJR file includes journal country. This allows a partial proxy for Brazilian versus non-Brazilian journals, but only among indexed journals.
| Field group | SJR country group | Indexed articles | English | Q1/Q2 among indexed |
|---|---|---|---|---|
| Linguistics/Letras | Brazilian SJR journal | 307 | 13.4% | 39.1% |
| Linguistics/Letras | Non-Brazilian SJR journal | 166 | 45.2% | 68.1% |
| Comparison fields | Brazilian SJR journal | 2,133 | 49.4% | 29.2% |
| Comparison fields | Non-Brazilian SJR journal | 23,697 | 95.5% | 89.9% |
The pattern is consistent with the hypothesis that Brazilian/local journal publication is associated with lower international quartile placement. However, this proxy cannot classify the large number of non-indexed articles by journal country. A stronger version would require an external ISSN-to-country registry or a curated Brazilian-journal list.
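The country split can be reconciled with the discipline-level table by pooling the two groups with their article counts as weights. The sketch below uses the counts and percentages from the table above; the function names are illustrative.

```python
def pooled_share(groups):
    """Article-weighted average of group shares; groups is a list of (n_articles, share)."""
    total = sum(n for n, _ in groups)
    return sum(n * share for n, share in groups) / total

def first_group_weight(groups):
    """Share of all indexed articles that fall in the first group of the list."""
    return groups[0][0] / sum(n for n, _ in groups)

# (indexed articles, Q1/Q2 among indexed); Brazilian SJR journals listed first.
linguistics = [(307, 0.391), (166, 0.681)]
comparison = [(2133, 0.292), (23697, 0.899)]

for label, groups in [("Linguistics/Letras", linguistics), ("Comparison fields", comparison)]:
    print(
        f"{label}: pooled Q1/Q2 among indexed = {100 * pooled_share(groups):.1f}%, "
        f"Brazilian-journal share of indexed = {100 * first_group_weight(groups):.1f}%"
    )
```

For Linguistics/Letras the pooled value is 49.3%, the same as the "Q1/Q2 among SJR-indexed" figure in the discipline-level table. The second number shows how differently the two field groups distribute their indexed output between Brazilian and non-Brazilian journals.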
Working Hypotheses
The current evidence supports these working hypotheses:
- Linguistics/Letras CAPES-7 output is structurally less integrated into SJR’s international journal-indexing system than the comparison fields.
- Low English-language publication is one major mechanism behind low indexing.
- Brazilian/local journal publication likely mediates part of the relationship between language and SJR indexing.
- The Q1/Q2 gap should be measured primarily with all articles in the denominator, because non-indexing is substantively informative rather than a random missingness process.
- Conditional Q1/Q2 among indexed journals is useful as a secondary diagnostic: it asks whether indexed Linguistics/Letras journals are also lower-ranked after excluding non-indexed output.
- The results support the claim that Linguistics/Letras publishes much less in international indexed journals. They do not by themselves prove that the underlying research is worse; they show that journal placement, language, and indexing ecology differ sharply.
Figures
Generated figures are in figures/:
- pct_english_by_discipline.png
- sjr_indexing_rate_by_discipline.png
- sjr_quartile_distribution_by_discipline.png
- q1q2_among_indexed_by_discipline.png
- english_vs_q1q2_by_discipline.png
- english_share_over_time.png
- sjr_indexing_by_language_and_field.png
- q1q2_by_language_and_field.png
Current Caveats
- The SJR file is a 2024 proxy, not a historical 2017-2020 panel.
- CAPES’s accessible datastore copy of the 2013-2016 article-detail table is incomplete, so processed analyses currently use 2017-2020 only.
- Professor-level denominators are based on observed faculty authors because the CAPES docente datastore returned empty filtered files for the selected program codes.
- Linguistics/Letras in CAPES combines linguistics and literature programs. The present summary treats the CAPES area as the focal unit; a later refinement could split linguistics-oriented and literature-oriented programs.
Adding Areas
To add another CAPES evaluation area, add its normalized CAPES area name to STUDY_AREAS in scripts/derive_capes7_programs.py, then rerun:
python3 scripts/derive_capes7_programs.py
python3 scripts/fetch_capes7_datastore.py
python3 scripts/build_database.py
python3 scripts/export_summaries.py
Rscript analysis.R
Political Science does not need to be added because it is already present as CIÊNCIA POLÍTICA E RELAÇÕES INTERNACIONAIS.
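If rerunning the steps by hand becomes tedious, they could be wrapped in a small runner that stops at the first failure. This is a hypothetical helper, not a script that exists in the repository; it assumes it is called from the project root and that `python3` and `Rscript` are on the PATH.

```python
import subprocess

# The five rerun steps listed above, in order.
PIPELINE = [
    ["python3", "scripts/derive_capes7_programs.py"],
    ["python3", "scripts/fetch_capes7_datastore.py"],
    ["python3", "scripts/build_database.py"],
    ["python3", "scripts/export_summaries.py"],
    ["Rscript", "analysis.R"],
]

def run_pipeline(steps=PIPELINE, runner=subprocess.run):
    """Run each step in order; check=True raises CalledProcessError on the first failure."""
    for cmd in steps:
        print("running:", " ".join(cmd))
        runner(cmd, check=True)
```

Calling `run_pipeline()` executes the whole sequence. Stopping on the first error avoids exporting summaries or figures from a database that was only partially rebuilt.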
Copyright © Guilherme Duarte Garcia