Very short shortened Wikipedia URLs

I recently learned about Wikimedia's own URL shortener service, w.wiki. It's specifically for shortening URLs of the Wikimedia projects' sites (like Wikipedia or Wiktionary), not for shortening URLs from the wider internet. An example: Wikipedia's article on Vim is at w.wiki/35Gz.

I learned about this while browsing the available data dumps, one of which is a list of every URL that has already been shortened. The URL shortener avoids making duplicates by checking whether a URL already has a short ID, and this is the list it uses. The latest dump available as I write this was made on 2022-11-07 and contained 723 642 URLs. Many of the early ones are quite useful, with two-character IDs for probably every Wikimedia site anybody would feasibly want to use (the English Wikipedia is at w.wiki/G9, the Latin Vicipaedia is at w.wiki/5N, and so on). Further down the list, the links become more and more single-use. w.wiki/fkU links to a query in the Wikidata Query Service: a 345-character SPARQL (like SQL, but for web stuff) query that has a syntax error and doesn't even work, but was probably meant to find Wikipedia articles about tourist attractions in Manhattan. (w.wiki/fun is the user talk page of someone on the Bengali Wikipedia, and w.wiki/LoL is another query, this one for male Indian citizens who are in the Malayalam Wikipedia.)
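The duplicate check can be sketched as a lookup against the dump. This is only an illustration of the idea, not how w.wiki itself does it (the live service presumably consults a database rather than scanning a text file). It assumes every line of the dump has the form `ID|URL`, which is what the `cut` step in the pipeline further down suggests, and `find_existing_id` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the existing short ID for a URL, if the dump has one.
# Assumes each line of the dump looks like "ID|URL" (the ID itself never contains "|").
# Exit status: 0 if the URL is already shortened, 1 if it is not.
find_existing_id() {
  local dump=$1
  # Pass the URL through the environment so awk doesn't interpret backslashes in it.
  URL=$2 awk '
    {
      bar = index($0, "|")                          # position of the first "|"
      if (bar > 0 && substr($0, bar + 1) == ENVIRON["URL"]) {
        print substr($0, 1, bar - 1)                # everything before it is the ID
        found = 1
        exit
      }
    }
    END { exit !found }
  ' "$dump"
}
```

For example, `find_existing_id shorturls-20221107 "$url" || echo "not shortened yet"` prints either the existing ID or the fallback message. Comparing the whole rest of the line, rather than just the second `|`-separated field, keeps a URL that itself contains a `|` from being cut short.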

The alphabet of the link IDs appears to be the digits 2 to 9 (no 0 or 1, and no ID begins with a 2), uppercase A-Z (but no I or O), lowercase a-z (but no l), plus the dollar sign $, for a total alphabet size of 58. The dumped listing follows that order, too: digits first, then uppercase, then lowercase, and finally the dollar sign. The set has clearly been chosen with visual clarity in mind: in handwriting, and in some fonts too, the characters 1, I, l, 0 and O are hard or impossible to tell apart. It's not quite the same as the base-58 alphabet of Bitcoin addresses, since this one lacks 1 but includes $.
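If the IDs are ordinary numbers written in base 58, using these symbols in the listing's order as digits (an assumption; the dump itself doesn't say so), then `2` plays the role of zero, which would explain why no ID starts with it, just as decimal numbers aren't written with a leading 0. A sketch of that reading, with `id_to_number` as a hypothetical helper:

```bash
#!/usr/bin/env bash
# The 58 symbols in the order the dump lists them:
# digits 2-9, A-Z without I and O, a-z without l, then $.
ALPHABET='23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz$'

# Hypothetical helper: read an ID as a base-58 numeral, ASSUMING each symbol's
# value is its position in ALPHABET ('2' = 0, '3' = 1, ..., '$' = 57).
id_to_number() {
  local id=$1 n=0 i c prefix
  for (( i = 0; i < ${#id}; i++ )); do
    c=${id:i:1}
    if [[ $ALPHABET != *"$c"* ]]; then
      echo "not in the alphabet: $c" >&2
      return 1
    fi
    prefix=${ALPHABET%%"$c"*}       # the symbols before c; its length is c's value
    (( n = n * 58 + ${#prefix} ))
  done
  echo "$n"
}

echo "${#ALPHABET}"                 # 58
```

Under that assumption, `id_to_number 3` prints 1, `id_to_number '$'` prints 57 and `id_to_number 32` prints 58, so the one-character IDs from 3 to $ would be exactly the numbers 1 to 57, and `32` would be the smallest two-character ID.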

What I found interesting are the 57 one-character IDs, like w.wiki/w: they're clearly hand-picked, while the other IDs seem to be first come, first served. I wonder who picked them. Why this particular set of foods, of cities, of pop culture references? Whoever they were, we can learn a bit, a tiny bit, about them.

ID | Links to | Explanation
3 | www.wikimedia.org/ | The Wikimedia main page gets the first link, since they run the whole show
4 | www.wikidata.org/wiki/Q42 | Douglas Adams' Wikidata page
5 | en.wikipedia.org/wiki/Wikipedia:Five_pillars | The five fundamental principles of Wikipedia (it's an encyclopedia, neutral point of view, free content, respect and civility, no firm rules)
6 | phabricator.wikimedia.org/T183647#3871427 | A comment with song lyrics set to "Smells Like Teen Spirit", posted on 3 January 2018, so we have a lower bound for when these links were made
7 | www.wikidata.org/wiki/Lexeme:L7 | The word "cat"; the Wikidata page for the word itself. (w.wiki/L7 leads to the Samogitian Wikipedia.)
8 | www.wikidata.org/wiki/Q8 | Wikidata page for happiness: a "mental or emotional state of well-being characterized by pleasant emotions"
9 | phabricator.wikimedia.org/T44085 | The original bug report or feature request that eventually led to w.wiki. Apparently the shortener was finally deployed on 2018-04-11.
A | fa.wikipedia.org/wiki/آلن_تورینگ | Farsi: Alan Turing, influential computer scientist
B | de.wikipedia.org/wiki/Bier | German: beer
C | fr.wikipedia.org/wiki/Croissant_(viennoiserie) | French: croissants, the food, rather than other meanings of "crescent"
D | en.wikipedia.org/wiki/Darth_Vader | Darth Vader is a fictional villain whose name begins with a D
E | en.wikipedia.org/wiki/Easter_egg_(media) | Humorous self-reference
F | en.wikipedia.org/wiki/FOSS | Free and open-source software
G | pl.wikipedia.org/wiki/Gdańsk | Polish: Gdańsk, previously also known in English as Danzig, a city in Poland of significant historical importance
H | he.wikipedia.org/wiki/חיפה | Hebrew: Haifa, a city in Israel
J | he.wikipedia.org/wiki/ירושלים | Hebrew: Jerusalem, another city in Israel
K | ko.wikipedia.org/wiki/김치 | Korean: kimchi, the iconic Korean dish of fermented vegetables
L | en.wikipedia.org/wiki/LGBT | Queer
M | en.wikipedia.org/wiki/MediaWiki | MediaWiki is the software that Wikipedia runs on
N | en.wikipedia.org/wiki/NetHack | A video game of legendary influence
P | en.wikipedia.org/wiki/Jean-Luc_Picard | Fictional captain of the fictional starship USS Enterprise
Q | www.wikidata.org/wiki/Help:Items | Wikidata is mainly composed of items, which are identified by a code number beginning with a Q (like Q8 or Q42)
R | en.wikipedia.org/wiki/Dennis_Ritchie | Computer scientist, creator of the C language and co-creator of Unix
S | sv.wikipedia.org/wiki/Stockholm | Swedish: Stockholm, capital city and largest city of Sweden
T | zh.wikipedia.org/wiki/臺北市 | Chinese: Taipei City, capital of the Republic of China but not the biggest city on the island of Taiwan
U | en.wikipedia.org/wiki/URL_shortening | Meta-reference: the shortened URL brings you to a page on URL shortening
V | en.wikipedia.org/wiki/V_for_Vendetta | Graphic novel published in the 1980s, about an anarchist called V in a dystopian fascist Britain
W | en.wikipedia.org/wiki/Wikipedia | Sort of meta: Wikipedia's article on itself
X | en.wikipedia.org/wiki/42 | A disambiguation page on Wikipedia. The number 42 is culturally significant among nerds (affectionate) because it featured as a plot element in The Hitchhiker's Guide to the Galaxy. I can't help but feel there would've been funnier uses for w.wiki/X, but oh well.
Y | en.wiktionary.org/wiki/why | A pun: the word "why" sounds the same as the name of the letter Y, which this Wiktionary page illustrates
Z | de.wikipedia.org/wiki/Zürich | German: Zürich, largest city (but not capital) of Switzerland
a | en.wikipedia.org/wiki/Alan_Turing | Influential computer scientist
b | hu.wikipedia.org/wiki/Budapest | Hungarian: Budapest, capital city and largest city of Hungary
c | commons.wikimedia.org/ | Wikimedia Commons: tens of millions of freely usable media files
d | wikidata.org/ | Wikidata: Wikipedia but without the text
e | en.wikipedia.org/wiki/Easter_egg | Humorous self-reference
f | en.wikipedia.org/wiki/Free_software | An ideological approach to software
g | gerrit.wikimedia.org/ | Gerrit, Wikimedia's code review tool
h | en.wikipedia.org/wiki/Planck_constant | A physical constant of fundamental importance, tying a photon's energy to its frequency, with the symbol h
i | en.wikipedia.org/wiki/Wikipedia:Ignore_all_rules | Official Wikipedia policy: "If a rule prevents you from improving or maintaining Wikipedia, ignore it." (That's it, in its entirety.)
j | en.wikipedia.org/wiki/Jabberwocky | A celebrated nonsense poem written by Lewis Carroll, published in 1871
k | en.wikipedia.org/wiki/Boltzmann_constant | Physical constant that (roughly speaking) ties the temperature of a gas to the average energy of its particles, with the symbol k
m | en.wikipedia.org/wiki/Mexico_City | Capital and largest city of Mexico
n | wikinews.org/ | Wikipedia for news? Not entirely sure, never really used it.
o | ores.wikimedia.org/ | ORES, a machine learning service that scores edits (for example, how likely an edit is to be vandalism)
p | phabricator.wikimedia.org/ | Phabricator is like Wikimedia's Bugzilla: a bug tracker
q | en.wikipedia.org/wiki/Queer | LGBT
r | en.wikipedia.org/wiki/R_(programming_language) | A language much like S, used by statisticians
s | wikisource.org/ | Wikisource: the free library of books, texts, what have you
t | wiktionary.org/ | Wiktionary is Wikipedia but it's a dictionary instead of an encyclopedia. It's pretty good, you should check it out. I like it way more than stuff like dictionary.com; it's less Web-2.0, less obtrusive.
u | wikiversity.org/ | Wikiversity: "free course materials"
v | wikivoyage.org/ | Wikivoyage: "free travel guide", dunno if it's any good
w | wikipedia.org/ | It's Wikipedia
x | en.wikipedia.org/wiki/Project_Xanadu | A 1960s hypertext project and a predecessor of the Web (HTML+HTTP), which apparently never went anywhere because it was too complicated
y | hy.wikipedia.org/wiki/Երևան | Armenian: Yerevan, capital of Armenia and an ancient city
z | de.wikipedia.org/wiki/Konrad_Zuse | German: Konrad Zuse, pioneer computer scientist, built a computer in 1941 and designed a programming language in 1943–1945
$ | donate.wikimedia.org/ | The dollar sign means money, and Wikimedia needs it
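Pulling the one-character IDs out of the dump, or tallying IDs by length to confirm there are 57 of them, takes a line of awk each. A sketch, again assuming `ID|URL` lines; `ids_by_length` and `one_char_ids` are hypothetical names:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "LENGTH COUNT" for every ID length in the dump, shortest first.
# Assumes "ID|URL" lines, so the ID is the first "|"-separated field.
ids_by_length() {
  awk -F'|' '{ count[length($1)]++ }
             END { for (len in count) print len, count[len] }' "$1" | sort -n
}

# Hypothetical helper: print the dump lines whose ID is a single character.
one_char_ids() {
  awk -F'|' 'length($1) == 1' "$1"
}
```

Run on the dump (`ids_by_length shorturls-20221107`), the line starting with `1` should show 57 if the count above is right, and `one_char_ids shorturls-20221107` lists the entries in the table above.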

With a string of piped-together Unix text processing tools, I was also able to count how often each domain was used. Counting mobile subdomains together with their desktop versions (i.e. counting en.m.wikipedia.org as en.wikipedia.org), the most common domain for shortening is query.wikidata.org, with 117 314 shortenings or 16.2 %, but the Japanese, Bengali and English Wikipedias aren't far behind, with 113 522 for ja, 112 954 for bn, and 107 009 for en. There's a drop of about 25 000 to the next domain, the Chinese Wikipedia with 82 338 links, and another drop of about 46 000 to the Assamese Wikipedia with 36 045 links.

(The string of commands used to get this list:)

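    cut -f 2 --delimiter='|' shorturls-20221107 |  # remove the tag (the short ID), keep the URL
      cut -f 3 --delimiter=/ |                     # grab the domain
      sed -e 's/\.m\././' |                        # remove ".m" from domains (en.m.wikipedia.org becomes en.wikipedia.org; m.wikidata.org is left alone)
      sort | uniq -c |                             # count each domain
      sort --numeric-sort --reverse |              # most common first
      head -n 50 |
      sed -Ee 's/ +([0-9]+) (.*)$/<tr><td>\2<td>\1<\/tr>/'  # format as HTML table rows: domain, then count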

Domain | # Links
query.wikidata.org | 117314
ja.wikipedia.org | 113522
bn.wikipedia.org | 112954
en.wikipedia.org | 107009
zh.wikipedia.org | 82338
as.wikipedia.org | 36045
commons.wikimedia.org | 29189
meta.wikimedia.org | 12489
as.wikisource.org | 10419
ckb.wikipedia.org | 9824
ar.wikipedia.org | 8968
ru.wikipedia.org | 6884
bn.wikibooks.org | 5091
simple.wikipedia.org | 4446
en.wiktionary.org | 4424
gom.wikipedia.org | 3710
m.wikidata.org | 3355
www.wikidata.org | 2931
es.wikipedia.org | 2729
bn.wikisource.org | 2027
de.wikipedia.org | 1963
zh.wikinews.org | 1841
gom.wiktionary.org | 1799
uk.wikipedia.org | 1706
fr.wikipedia.org | 1665
fa.wikipedia.org | 1430
bn.wikivoyage.org | 1415
upload.wikimedia.org | 1164
nl.wikipedia.org | 1125
it.wikipedia.org | 942
turnilo.wikimedia.org | 871
www.mediawiki.org | 869
bn.wiktionary.org | 835
pt.wikipedia.org | 817
incubator.wikimedia.org | 776
hi.wikipedia.org | 762
ba.wikipedia.org | 761
pt.wikinews.org | 750
ru.wikimedia.org | 604
pa.wikisource.org | 599
ca.wikipedia.org | 569
stats.wikimedia.org | 526
ko.wikipedia.org | 517
en.wikibooks.org | 508
he.wikipedia.org | 481
tt.wikipedia.org | 478
pa.wikipedia.org | 477
test.wikipedia.org | 449
m.mediawiki.org | 441
bpy.wikipedia.org | 436
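The 16.2 % share quoted above for query.wikidata.org is simply its count divided by the number of URLs in the dump: 117 314 / 723 642 ≈ 0.162. A sketch that adds such a share column to the same pipeline, assuming one `ID|URL` line per shortened URL; `domain_shares` is a hypothetical name, and it uses the short options `-d`, `-f` and `-nr` instead of GNU's long ones so it should also run with BSD tools:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "DOMAIN COUNT SHARE" for the most common domains.
# Usage: domain_shares DUMP_FILE [HOW_MANY]   (HOW_MANY defaults to 10)
# Assumes one "ID|URL" line per shortened URL, so the line count is the total.
domain_shares() {
  local dump=$1 top=${2:-10} total
  total=$(wc -l < "$dump")                  # number of shortened URLs
  cut -d '|' -f 2 "$dump" |                 # keep the URL
    cut -d / -f 3 |                         # keep the domain
    sed -e 's/\.m\././' |                   # fold en.m.wikipedia.org into en.wikipedia.org
    sort | uniq -c | sort -nr | head -n "$top" |
    awk -v total="$total" '{ printf "%s %d %.1f%%\n", $2, $1, 100 * $1 / total }'
}
```

If the dump really has exactly one line per URL, `domain_shares shorturls-20221107 5` should show the 16.2 % figure on its query.wikidata.org line.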

The amount of use this service gets among speakers of Indian languages surprised me: Bengali, Assamese, Konkani, Hindi and Punjabi wikis all show up in the list above. Perhaps the various wikis are a bigger cultural thing there, or they're used as a social network of sorts, or maybe those particular wikis have advertised w.wiki, whereas there's no "short link" button on the English Wikipedia; I don't know.