Regular expressions in Fonology (ii)

Transcription

Transcription proceeds in four steps, applied in this order: cleaning, handling exceptions, applying the transcription rules, and final cleaning.

Cleaning

Before transcribing words, the data must be cleaned. To do this, all characters are converted to lowercase and all punctuation marks are removed. Once the cleaning is done, transcription can begin.
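The cleaning step can be sketched in a few lines of Python. This is a minimal illustration, not the module's actual code: the function name clean and the exact character class are assumptions.

```python
import re

def clean(text: str) -> str:
    """Lowercase the text and strip punctuation (hypothetical helper)."""
    text = text.lower()
    # Drop every character that is neither a word character nor whitespace.
    # In Python 3, \w is Unicode-aware, so accented letters are kept.
    text = re.sub(r"[^\w\s]", "", text)
    # Collapse the whitespace left behind by removed punctuation.
    return " ".join(text.split())

print(clean("Bonjour, Monsieur !"))
```

Note that this sketch also deletes apostrophes and hyphens; a real implementation might prefer to replace them with spaces.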

Exceptions

French spelling is fairly regular,¹ but many words are still spelled irregularly. These words, such as “monsieur”, “hier”, or “yeux”, must be transcribed first; otherwise the general transcription rules would rewrite the very spelling that allows us to identify them.
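One possible way to handle exceptions is a lookup table applied before any general rule. The sketch below is only an illustration: the table layout, the names EXCEPTIONS and apply_exceptions, and the IPA values chosen for the three words are assumptions, not taken from the module.

```python
import re

# Illustrative exception table. The three words come from the text above;
# the IPA values and the dictionary layout are assumptions for this sketch.
EXCEPTIONS = {
    "monsieur": "məsjø",
    "hier": "jɛʁ",
    "yeux": "jø",
}

# Word boundaries (\b) keep "hier" from matching inside "cahier".
EXCEPTION_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, EXCEPTIONS)) + r")\b"
)

def apply_exceptions(text: str) -> str:
    """Replace each whole-word exception with its stored transcription."""
    return EXCEPTION_RE.sub(lambda m: EXCEPTIONS[m.group(1)], text)

print(apply_exceptions("monsieur a mal aux yeux"))
print(apply_exceptions("cahier"))  # unchanged: "hier" is not a whole word here
```

In a full pipeline, the transcribed exceptions would also need to be protected from the rules that run afterwards; the sketch does not attempt that.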

Applying the rules

It would be impossible to present all the rules used in the module. The following rules are therefore examples that illustrate the basic concepts needed to understand the process.

There are letters (and groups of letters) that are fairly simple to transcribe, such as:

“â” –> /ɑ/
“gn” –> /ɲ/
“oy” –> /waj/
etc.
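Rules of this kind map naturally onto a list of (pattern, replacement) pairs passed to re.sub. The following is a minimal sketch; the names SIMPLE_RULES and apply_rules are hypothetical, and only the three rules listed above are included.

```python
import re

# The three regular graphemes listed above, as (pattern, replacement) pairs.
SIMPLE_RULES = [
    (r"â", "ɑ"),
    (r"gn", "ɲ"),
    (r"oy", "waj"),
]

def apply_rules(word: str, rules) -> str:
    """Apply each rule in turn to the whole word (hypothetical helper)."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

for word in ("pâte", "gagner", "voyage"):
    print(word, "->", apply_rules(word, SIMPLE_RULES))
```

The output is only partially transcribed, since the remaining letters are left for other rules.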

These graphemes are regular and therefore very easy to transcribe². However, for other rules, more caution is required. The order in which rules are applied is generally very important. For example, consider two rules:

A : “u” –> /y/
B : “ou” –> /u/

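If rule B is applied before rule A, every “ou” ends up as /y/: rule B first produces /u/, and rule A then rewrites it, since the program does not distinguish between the character “u” and the phoneme /u/.³ Applying rule A first avoids this, provided rule A is written so that it does not match the “u” of “ou”; otherwise “ou” becomes “oy” and rule B never applies.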
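The effect of rule order can be checked with a short experiment. In this sketch, rule A is written with a negative lookbehind so that it skips the “u” of “ou”; that guard is an assumption made for the example, as the text does not say how the module writes the rule. The helper apply_rules is also hypothetical.

```python
import re

# Rule A: "u" -> /y/. The lookbehind (?<!o) is an assumed guard: a plain "u"
# pattern would also rewrite the "u" of "ou" when A runs first.
RULE_A = (r"(?<!o)u", "y")
# Rule B: "ou" -> /u/.
RULE_B = (r"ou", "u")

def apply_rules(word: str, rules) -> str:
    """Apply each (pattern, replacement) pair in the given order."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

for word in ("lune", "loup"):
    a_then_b = apply_rules(word, [RULE_A, RULE_B])
    b_then_a = apply_rules(word, [RULE_B, RULE_A])
    print(f"{word}: A then B = {a_then_b}, B then A = {b_then_a}")
```

With B before A, the “u” produced for “loup” is indistinguishable from a written “u” and is rewritten a second time.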

Temporary replacements

In some cases, even changing the order of the rules does not fix the errors. To address this, we used temporary replacements. This method makes it possible to specify whether a letter has already been transcribed or not. For example, consider the following rules:

A : “ées”, “és”, “ée” and “é”⁴ –> /e/
B : “e” –> /ə/

As listed, the rules are in a problematic order: rule B rewrites the /e/ produced by rule A as /ə/, so rule A loses its effect. Reversing the order does not solve the problem either: rule B then applies to the “e” of “ées” and “ée” first, and these endings are transcribed as /eə/. It is in cases like this that temporary replacements can be used.

Keeping the order described above, we can modify rule A as follows:

A : “ées”, “és”, “ée” and “é” –> “E”
B : “e” –> /ə/

As a result, rule B no longer targets the output of rule A: that output is now an uppercase letter, and after cleaning the input contains only lowercase letters.
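The three situations described in this section can be compared side by side. This is a sketch under stated assumptions: the patterns are taken literally from rules A and B above, without whatever context restrictions the module may add, and the names DIRECT_A, MARKED_A and apply_rules are hypothetical.

```python
import re

def apply_rules(word: str, rules) -> str:
    """Apply each (pattern, replacement) pair in the given order."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

# Longer alternatives come first so that "ées" is tried before "é".
A_PATTERN = r"ées|és|ée|é"
DIRECT_A = (A_PATTERN, "e")  # rule A writing the phoneme directly
MARKED_A = (A_PATTERN, "E")  # rule A writing a temporary uppercase marker
RULE_B = (r"e", "ə")

word = "cheminée"
print(apply_rules(word, [DIRECT_A, RULE_B]))  # B overwrites A's output
print(apply_rules(word, [RULE_B, DIRECT_A]))  # final "ée" comes out as "eə"
print(apply_rules(word, [MARKED_A, RULE_B]))  # marker survives rule B
```

Only the third variant keeps the ending distinct from the schwa, at the cost of leaving an “E” that still has to be converted.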

Final cleaning and last transcriptions

If temporary replacements are used, the result is functional but not yet a phonemic transcription. The temporary characters must therefore be converted into the correct phonemes, for example “E” –> /e/. Geminate consonants are also reduced to their non-geminate equivalents at this stage, for example “tt” –> /t/.
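A final pass of this kind might look like the sketch below. The marker table, the set of consonant letters, and the name final_cleaning are assumptions made for illustration; the text only specifies the marker “E” and the example “tt”.

```python
import re

# Temporary markers and the phonemes they stand for (only "E" is from the text).
TEMPORARY = {"E": "e"}
# Assumed set of consonant letters that may appear doubled.
CONSONANTS = "bcdfgjklmnpqrstvwxz"

def final_cleaning(word: str) -> str:
    """Reduce geminate consonants, then resolve temporary markers."""
    # A consonant followed by one or more copies of itself becomes a single one.
    word = re.sub(rf"([{CONSONANTS}])\1+", r"\1", word)
    for marker, phoneme in TEMPORARY.items():
        word = word.replace(marker, phoneme)
    return word

print(final_cleaning("annE"))
print(final_cleaning("tt"))
```

Markers are resolved last here so that no earlier substitution can mistake a freshly written phoneme for a letter.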
