Very short shortened Wikipedia URLs
2022-11-12
I recently learned about Wikimedia's own URL shortener service, w.wiki. It's specifically for shortening URLs of Wikimedia project sites (like Wikipedia or Wiktionary), not for shortening the wider internet's URLs. An example: Wikipedia's article on Vim is at w.wiki/35Gz.
I learned about this when browsing the available data dumps, where there was a dump listing the already shortened URLs. The URL shortener avoids making duplicates by checking whether a short URL "ID" already exists for the target, and this is the list it uses. The latest dump available as I write this was made on 2022-11-07 and contained 723 642 URLs.
Many of the early ones are quite useful, with two-character IDs for probably every Wikimedia site anybody would feasibly want to use (the English Wikipedia is at w.wiki/G9, the Latin Vicipaedia is at w.wiki/5N, and so on), but further down the list the links become more and more single-use. w.wiki/fkU is a link to a query in the Wikidata Query Service: a 345-character SPARQL (like SQL but for web stuff) query that has a syntax error and doesn't even work, but was probably meant to search for Wikipedia articles on tourist attractions in Manhattan. (w.wiki/fun is the user talk page of someone on the Bengali Wikipedia, and w.wiki/LoL is also some query, querying male Indian citizens who are in the Malayalam Wikipedia.)
The alphabet of the link IDs appears to be the digits from 2 to 9 (no 1 or 0, and no ID begins with a 2), uppercase A-Z (but no I or O) and lowercase a-z (but no l), plus the dollar sign $, for a total alphabet size of 58. The dumped listing follows the same order: first digits, then uppercase, then lowercase, finally the dollar sign. This set has clearly been chosen with visual clarity in mind: handwriting, and some fonts too, make little or no distinction among 1, I and l, or between 0 and O. It's not quite the same as Bitcoin addresses' base-58, since this lacks 1 but includes $.
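The character count above is easy to double-check in a few lines of Python. This is only a sketch built from the description in this post, not the shortener's actual code: the names `ALPHABET`, `is_plausible_id` and `dump_order_key` are made up for illustration, and sorting shorter IDs before longer ones is an assumption about how the dump is ordered.

```python
import string

# Alphabet as described in the post: digits 2-9, uppercase without I and O,
# lowercase without l, then the dollar sign (in the dump's apparent order).
DIGITS = "23456789"
UPPER = "".join(c for c in string.ascii_uppercase if c not in "IO")
LOWER = "".join(c for c in string.ascii_lowercase if c != "l")
ALPHABET = DIGITS + UPPER + LOWER + "$"

# Position of each character; plain ASCII order would put "$" first,
# so a custom rank is needed to reproduce the order described above.
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def is_plausible_id(short_id: str) -> bool:
    """True if the ID uses only alphabet characters and does not start with 2."""
    return (
        bool(short_id)
        and short_id[0] != "2"
        and all(ch in RANK for ch in short_id)
    )


def dump_order_key(short_id: str):
    """Sort key: shorter IDs first (an assumption), then by alphabet rank."""
    return (len(short_id), [RANK[ch] for ch in short_id])


# 8 digits + 24 uppercase + 25 lowercase + 1 dollar sign
print(len(ALPHABET))  # 58

# One-character IDs: every alphabet character except the leading "2"
one_char_ids = [ch for ch in ALPHABET if is_plausible_id(ch)]
print(len(one_char_ids))  # 57
```

Calling `sorted(ids, key=dump_order_key)` on a list of IDs would then put digits before uppercase, uppercase before lowercase, and `$` last, matching the order seen in the listing.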
What I found interesting are the 57 one-character IDs, like w.wiki/w: they're clearly human-picked, while other IDs seem to be first come, first served. I wonder who picked them? Why this particular set of foods, of cities, of pop culture references? Whoever they were, we can learn a bit, a tiny bit, about them.
ID | Links to | Explanation |
---|---|---|
3 | www.wikimedia.org/ | The Wikimedia main page gets the first link, since they run the whole show |
4 | www.wikidata.org/ | Douglas Adams' Wikidata page |
5 | en.wikipedia.org/ | The five fundamental principles of Wikipedia (it's an encyclopedia, neutral point of view, free content, respect and civility, no firm rules) |
6 | phabricator. | A comment with song lyrics set to "Smells Like Teen Spirit" – posted on 3 January 2018, so we have a lower bound for when these links were made |
7 | www.wikidata.org/ | The word "cat"; the Wikidata page for the word itself. (w.wiki/L7 leads to the Samogitian Wikipedia) |
8 | www.wikidata.org/ | Wikidata page for happiness: a "mental or emotional state of well-being characterized by pleasant emotions" |
9 | phabricator. | The original bug report or feature request that started the search for w.wiki. Apparently the shortener was finally deployed on 2018-04-11. |
A | fa.wikipedia.org/ | Farsi: Alan Turing, influential computer scientist |
B | de.wikipedia.org/ | German: beer |
C | fr.wikipedia.org/ | French: croissants, the food, rather than other meanings of "crescent" |
D | en.wikipedia.org/ | Darth Vader is a fictional villain whose name begins with a D |
E | en.wikipedia.org/ | Humorous self-reference |
F | en.wikipedia.org/ | Free and open-source software |
G | pl.wikipedia.org/ | Polish: Gdańsk, previously also known in English as Danzig, a city in Poland of significant historical importance |
H | he.wikipedia.org/ | Hebrew: Haifa, a city in Israel |
J | he.wikipedia.org/ | Hebrew: Jerusalem, another city in Israel |
K | ko.wikipedia.org/ | Korean: kimchi, the favorite food of the Koreans |
L | en.wikipedia.org/ | Queer |
M | en.wikipedia.org/ | MediaWiki is the software that Wikipedia runs on |
N | en.wikipedia.org/ | A video game of legendary influence |
P | en.wikipedia.org/ | Fictional captain of the fictional starship USS Enterprise |
Q | www.wikidata.org/ | Wikidata is mainly composed of items, which are identified by a code number beginning with a Q (like Q8 or Q42) |
R | en.wikipedia.org/ | Computer scientist, creator of the C language and co-creator of Unix |
S | sv.wikipedia.org/ | Swedish: Stockholm, capital city and largest city of Sweden |
T | zh.wikipedia.org/ | Chinese: Taipei City, capital of the Republic of China but not the biggest city on the island of Taiwan |
U | en.wikipedia.org/ | Meta-reference: the shortened URL brings you to a page on URL shortening |
V | en.wikipedia.org/ | Graphic novel published in the 1980s, about an anarchist called V in a dystopian fascist Britain |
W | en.wikipedia.org/ | Sort of meta: Wikipedia's article on itself |
X | en.wikipedia.org/ | A disambiguation page on Wikipedia. The number 42 is culturally significant among nerds (affectionate) because it featured as a plot element in The Hitchhiker's Guide to the Galaxy. I can't help but feel there would've been funnier uses for w.wiki/X, but oh well. |
Y | en.wiktionary.org/ | A pun: the word why is homophonic to the name of the letter Y, and this Wiktionary page will illustrate this. |
Z | de.wikipedia.org/ | German: Zürich, largest city (but not capital) of Switzerland |
a | en.wikipedia.org/ | Influential computer scientist |
b | hu.wikipedia.org/ | Hungarian: Budapest, capital city and largest city of Hungary |
c | commons. | Wikimedia Commons: tens of millions of freely usable media files |
d | wikidata.org/ | Wikidata: Wikipedia but without the text |
e | en.wikipedia.org/ | Humorous self-reference |
f | en.wikipedia.org/ | An ideological approach to software |
g | gerrit. | Wikimedia's code review thing |
h | en.wikipedia.org/ | A physical constant of fundamental importance, tying a photon's energy to its frequency, with the symbol h |
i | en.wikipedia.org/ | Official Wikipedia policy: "If a rule prevents you from improving or maintaining Wikipedia, ignore it." (That's it, in its entirety.) |
j | en.wikipedia.org/ | A celebrated nonsense poem written by Lewis Carroll, published in 1871 |
k | en.wikipedia.org/ | Physical constant that (roughly speaking) ties gas pressure to temperature, with the symbol k |
m | en.wikipedia.org/ | City in Mexico |
n | wikinews.org/ | Wikipedia for news? Not entirely sure, never really used it. |
o | ores. | ORES is some machine learning thing idk |
p | phabricator. | Phabricator is like Wikimedia's Bugzilla: a bug tracker |
q | en.wikipedia.org/ | LGBT |
r | en.wikipedia.org/ | A language much like S, used by statisticians |
s | wikisource.org/ | Wikisource: books, texts, what have you: the free library |
t | wiktionary.org/ | Wiktionary is Wikipedia but it's a dictionary instead of an encyclopedia. It's pretty good, you should check it out. I like it way more than stuff like dictionary.com; it's less Web-2.0, less obtrusive. |
u | wikiversity.org/ | Wikiversity: "free course materials" |
v | wikivoyage.org/ | Wikivoyage: "free travel guide", dunno if it's any good |
w | wikipedia.org/ | It's Wikipedia |
x | en.wikipedia.org/ | 1960s hypertext, predecessor to HTML+HTTP, but apparently it never went anywhere because it was too complicated |
y | hy.wikipedia.org/ | Armenian: Yerevan, capital of Armenia and an ancient city |
z | de.wikipedia.org/ | German: Konrad Zuse, pioneer computer scientist, built a computer in 1941 and designed a programming language in 1943–1945 |
$ | donate. | The dollar sign means money, and Wikimedia needs it |
With a string of piped-together Unix text processing tools I was also able to count the number of times different domains were used. Combining mobile subdomains with their desktop counterparts (i.e. counting en.m.wikipedia.org as en.wikipedia.org), the most common domain for shortening is query.wikidata.org, with 117 314 shortenings or 16.2 %, but the Japanese, Bengali and English Wikipedias aren't far behind, with 113 522 for ja, 112 954 for bn, and 107 009 for en. There's a drop of about 25 000 to the next domain, the Chinese Wikipedia with 82 338 links, and another drop of about 46 000 to the Assamese Wikipedia with 36 045 links.
(The string of commands used to get this list:)
cut -f 2 --delimiter=\| shorturls-20221107 | cut -f 3 --delimiter=/ | sed -e 's/\.m\././' | sort | uniq -c | sort --numeric --reverse | head -n 50 | sed -Ee 's/ +([0-9]+) (.*)$/<tr><td>\1<td>\2<\/tr>/'
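For readers who find the pipeline above hard to follow, here is roughly the same counting step as a Python sketch. It assumes, going only by the `cut` options in the command, that each dump line looks like `ID|URL`; the function names `host_of` and `count_hosts` are made up for illustration. Unlike `cut`, it simply skips lines that don't fit that shape, and it leaves out the final `sed` that wraps the output in HTML table rows.

```python
from collections import Counter
from typing import Iterable, Optional


def host_of(line: str) -> Optional[str]:
    """Return the host of one dump line, assumed to look like 'ID|https://host/path'."""
    fields = line.rstrip("\n").split("|")
    if len(fields) < 2:
        return None  # no "|" delimiter, so not an ID|URL line
    parts = fields[1].split("/")  # ['https:', '', 'host', 'path', ...]
    if len(parts) < 3 or not parts[2]:
        return None
    # Same effect as sed 's/\.m\././': drop the first ".m." mobile marker.
    # A host that starts with "m." (like m.wikidata.org) is left unchanged.
    return parts[2].replace(".m.", ".", 1)


def count_hosts(lines: Iterable[str]) -> Counter:
    """Count how many shortened URLs point at each host."""
    counts: Counter = Counter()
    for line in lines:
        host = host_of(line)
        if host is not None:
            counts[host] += 1
    return counts


def print_top(path: str, n: int = 50) -> None:
    """Print the n most common hosts in the dump file at `path`."""
    with open(path, encoding="utf-8") as dump:
        for host, count in count_hosts(dump).most_common(n):
            print(count, host)
```

Calling `print_top("shorturls-20221107")` on the dump file named in the command should give a listing in the same spirit as the table, though the order of hosts with equal counts may differ from what `sort` produces.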
Domain | # Links |
---|---|
query.wikidata.org | 117314 |
ja.wikipedia.org | 113522 |
bn.wikipedia.org | 112954 |
en.wikipedia.org | 107009 |
zh.wikipedia.org | 82338 |
as.wikipedia.org | 36045 |
commons.wikimedia.org | 29189 |
meta.wikimedia.org | 12489 |
as.wikisource.org | 10419 |
ckb.wikipedia.org | 9824 |
ar.wikipedia.org | 8968 |
ru.wikipedia.org | 6884 |
bn.wikibooks.org | 5091 |
simple.wikipedia.org | 4446 |
en.wiktionary.org | 4424 |
gom.wikipedia.org | 3710 |
m.wikidata.org | 3355 |
www.wikidata.org | 2931 |
es.wikipedia.org | 2729 |
bn.wikisource.org | 2027 |
de.wikipedia.org | 1963 |
zh.wikinews.org | 1841 |
gom.wiktionary.org | 1799 |
uk.wikipedia.org | 1706 |
fr.wikipedia.org | 1665 |
fa.wikipedia.org | 1430 |
bn.wikivoyage.org | 1415 |
upload.wikimedia.org | 1164 |
nl.wikipedia.org | 1125 |
it.wikipedia.org | 942 |
turnilo.wikimedia.org | 871 |
www.mediawiki.org | 869 |
bn.wiktionary.org | 835 |
pt.wikipedia.org | 817 |
incubator.wikimedia.org | 776 |
hi.wikipedia.org | 762 |
ba.wikipedia.org | 761 |
pt.wikinews.org | 750 |
ru.wikimedia.org | 604 |
pa.wikisource.org | 599 |
ca.wikipedia.org | 569 |
stats.wikimedia.org | 526 |
ko.wikipedia.org | 517 |
en.wikibooks.org | 508 |
he.wikipedia.org | 481 |
tt.wikipedia.org | 478 |
pa.wikipedia.org | 477 |
test.wikipedia.org | 449 |
m.mediawiki.org | 441 |
bpy.wikipedia.org | 436 |
The amount of use this service has among speakers of Indian languages surprised me. Perhaps the various wikis are a bigger cultural thing there, or they're used as a social network of sorts, or maybe those particular wikis have advertised w.wiki whereas there's no "short link" button on the English Wikipedia; I don't know.