TeX hyphenation algorithm
9 February 2022
Wikipedia's article on hyphenation mentions the TeX typesetting system's marvellous hyphenation algorithm, and then states:
In TeX's original hyphenation patterns for American English, the exception list contains only 14 words.
A source is given, and it's TeX source code, in a file called hyphen.tex. Of course, one needs an actual computer to open a text file with such a weird extension (a phone won't do), so for the benefit of past me who was curious, here are the 14 exceptions:
- as·so·ciate
- as·so·ciates
- dec·li·na·tion
- oblig·a·tory
- phil·an·thropic
- present
- presents
- project
- projects
- reci·procity
- re·cog·ni·zance
- ref·or·ma·tion
- ret·ri·bu·tion
- ta·ble
They make quite a lot of sense, in my opinion: I can see why an algorithm would want to hyphenate present or project as pre-sent or pro-ject, table as tab-le, or declination as de-cli-na-tion. I'm less sure what the algorithm would have done to associate and associates, though.
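For anyone who'd rather not read the file by eye, here is a minimal Python sketch of pulling such a list out of TeX source. It assumes the exceptions sit in `\hyphenation{...}` blocks as whitespace-separated words with `-` marking the allowed break points; the function name `parse_exceptions` and the `sample` text are my own illustration, not a verbatim copy of hyphen.tex.

```python
import re

def parse_exceptions(tex_source):
    """Map each plain word to its hyphenated form, e.g. 'table' -> 'ta·ble'."""
    exceptions = {}
    # Strip TeX comments: an unescaped % runs to the end of the line.
    no_comments = re.sub(r"(?<!\\)%.*", "", tex_source)
    # Each \hyphenation{...} block holds whitespace-separated entries.
    for block in re.findall(r"\\hyphenation\s*\{([^}]*)\}", no_comments):
        for entry in block.split():
            # A word with no '-' (like 'present') may not be broken at all.
            exceptions[entry.replace("-", "")] = entry.replace("-", "·")
    return exceptions

# Hypothetical excerpt written in the style of hyphen.tex.
sample = r"""
% a comment line that should be ignored
\hyphenation{as-so-ciate as-so-ciates
  present ta-ble}
"""

for word, hyphenated in parse_exceptions(sample).items():
    print(word, "->", hyphenated)
```

To run it on a real file, read the file's contents into a string and pass that in instead of `sample`.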
On my TeX Live installation, ushyphex.tex contains around 1760 more exceptions; here's a small sample:
- acad·e·my
- acad·e·mies
- acro·nym
- ad·a·mant
- adren·a·line
- aero·space
- anon·y·mous
- asymp·to·matic
- as·ymp·tot·ic
- bed·rock
- be·dwarf
- busier
- busi·est
- bussing
- ca·coph·ony (I don't agree with this one, I prefer caco·phony)
- co·re·la·tion (not correlation)
- ga·lac·tic
- gal·axy
- graph·eme
- gra·phe·mic
- ico·no·grapher (I don't like this one either)
- icon·o·graphic
- input·enc (a LaTeX thing)
- meth·od (don't like this either, prefer me·thod)
- pseu·dog·rapher (ew)
- quaint·er (probably necessary, rather than qua·in·ter)
- sem·itic
- spic·i·ly (nope)
- vi·cars
- vis·ual
- wave·let
- yes·ter·year
The file hyph-en-gb.tex from the TeX Users Group page on hyphenation patterns gives these exceptions for British English:
- uni·ver·sity
- uni·ver·sit·ies
- how·ever
- ma·nu·script
- ma·nu·scripts
- re·ci·pro·city
- through·out
- some·thing