人类会不会意外地向外星人发送病毒?

Retro Alien Virus

For The Future Of Humanity

Performing trillions of computations per second, ET is a powerful computing machine that predicts economic changes, locates mineral rich mines and even automates the power grid of an entire planet. Eager to decipher a mysterious code, a team of engineers, linguists, philosophers and mathematicians have meticulously programmed ET to unravel the meaning of a message which they have received from a planet called Earth. Waiting, they wonder what kind of message could have been transmitted to their technologically advanced civilization and why earthlings would attempt to contact them?

Back here on earth, speaking at the British Science Festival Dr. Anders Sandberg, of the Future of Humanity Institute at Oxford University, explains that “...it might be a good idea to gamble, and hope there is someone slightly older and wiser out there. If aliens told us something about how to handle our climate, or artificial intelligence, we might want to listen." Dr Sandberg is commenting on the Breakthrough Message competition, which aims to encourage debate about how to communicate with extraterrestrial intelligence. He also acknowledges that this could be dangerous and that extreme caution should be exercised when choosing which digital messages to send into space.

The Oxford PhD warns that humanity may accidentally send unwanted or malicious code to extraterrestrials—perhaps even a virus, which would disrupt their global network. “Languages can hide a lot of information. We’re worried about malware being sent to aliens.” Even languages that describe pictures have hidden malicious code. But, many are critical of this possibility, as aliens would, most certainly, not be building machines with either terrestrial materials or architecture. But, if there is some fundamental commonality—which is both essential to all computing, and intrinsically vulnerable—then, maybe, we should take it into consideration.

ET ticks away, humming notes and flashing numbers on a monitor placed at the center of a cooled server farm as it tries to make sense of the string of unknown symbols. To Earthlings, their technology seems exotic and unrecognizable. But, the components from which it is built could also be used to build televisions, radios and thermostats. Though, it would be impossible for a television to perform the operations that are required of ET. What this computing machine does is more important than its building materials or even its physical architecture. ET is defined only by what it does. So, what does it do? What problems does it solve? What languages can it understand?

What Is A Computer?

In computer science, all machines are classified by the sets of problems that they can solve. The most simple of these machine classes are called FSMs( finite state machines ), which consist only of states and transitions between those states. For example, consider a thermostat in a home. This machine reads thermal data as input and exists in either one of two states, a hot state ( for input greater than or equal to 70° ) or a cold state ( for input less than 70° ).

Thermo State Machine

Thermostat

That's all it does, exist in a state, and transitions state based on the value of its input. When the thermostat transitions to another state, a furnace can write a value to a room. Respectively, we can think of hot and cold as 1 and 0, and the room as one bit of memory. It is said that this machine accepts a language of temperatures. Any language that is understood by an FSM is called a Regular Language.

This machine is the most basic machine that will be of any use. But, there are an infinite number of FSMs—some of which are interesting. If a machine has many bits of memory and many states, this machine can become capable of more elaborate computations, and therefore accepts more complex languages. If a machine has an infinite amount of memory and an arbitrary number of states then it could possibly solve any computable problem.

file="modFiveAutomata.svg"

PCs, Macs and PS4s are all computers. They are all machines that posses a powerful computing capability that is far more complex than simple Finite State Machines. Each of these computer is capable of running text editors, media players and even web browsers. A web browser, itself, is capable of running web apps such as Google Docs, Spotify and Angry Birds, there is even a version of Windows 3.1 which runs on web browsers. The web browser is capable of performing the same computations as a computer. So, we can even call Firefox a computer—though, in an obviously more abstract sense, since Firefox has no physical hardware.

However, not all programs are encoded to run on all computers. If an installer for Internet Explorer were run on a Mac, the Mac would complain that the program is not readable. The code that is used to install Internet Explorer on a Windows PC does not run on a Mac or any other computer that does not run Windows. So why would they both be called computationally equivalent if one is capable of running a program while another is not?

Computers are modeled after a very special machine that accepts a Universal language. Universal Turing Machines ( UTMs ) are special kinds of machines that are not contingent upon physical architecture. Rather, they are abstract models of computing that are defined precisely by what they are capable of doing. What makes a UTM so incredible is that it's capable of simulating any other Turing Machine. Any machine—such as those that can add numbers, decompress photos or edit text documents—can be encoded such that it can be simulated by a UTM. When an encoded machine is feed into a UTM, the UTM will, essentially, become that machine. It is said that the UTM accepts the language of all Turing Machines.

My PC has a Nintendo NES emulator that lets me play the old-school Nintendo games such as Super Mario Brothers and The Legend Of Zelda. The code from which these games are animated is exactly the code that is found on their respective game cartridges that were produced in the mid 1980s. But, because of architectural differences, a copy of Super Mario Brothers will not run directly on a PC; the console will barf up some error message. PCs and NESs are built with different hardware and run different operating systems. The code from an NES game is complete nonsense to a PC. In order to simulate these games, there must be implemented an intermediate piece of software called an emulator.

An emulator is a program that simulates a specific Turing Machine. In this case, the NES emulator is an NES machine which has been encoded to run on a PC. The way in which the emulator does this is not of much importance. As long as, for each input, the emulator's output is equal to the output of an NES, we say that the two machines are equivalent. Since, the emulator allows the PC to simulate the Nintendo, we have shown that the PC can do anything that the NES can do. And, since the NES is a computer, the PC must also be a computer.

The Universal Turing Machine

Universal Turing Bender

Since both the NES and the PC are modeled after a UTM, shouldn't it be that an NES can simulate a PC? This would be correct, except for the fact that the Nintendo NES was designed with only two kilobytes of onboard memory, and a modern PC typically has four million times as much memory. The NES cannot simulate a PS4 without overflowing, yet we still say that both machines are computers because the processors of each machine are capable of the same read, write, seek and state-transition functions as a UTM. However, a true UTM has infinite memory. Which is why it can only said that these terrestrial machines are modeled after UTMs.

But, memory restrictions are not the only issue that prevents us from executing code on various machines. A Wii has much more memory than an NES and has processing speeds comparable to that of the PS4. The reason a Wii game will not play on a PS4, and vice versa, is because their programs ( games ) are encoded differently. They are encoded in such a way that guarantees mutual exclusion between machines. If the encoding of these machines were identical, a Wii could be simulated inside a PS4 inside a Wii.

Encoding

On Earth, TMs typically process binary data ( ones and zeros ). But, binary data can be converted to equivalent encodings of unary ( just ones ), ternary ( 0, 1, 2 ), analogue ( continuous spectrum ), etc... Therefore a string of binary data could be simulated on an analogue machine, and vice versa, by simply converting between the two systems of encoding. As an example, imagine a digital camera that takes in light ( an analogue signal ), and then converts it into binary, storing it in memory until the image is to be reconstructed. Binary data is then converted back into an array of colors, displayed on an LCD screen.

Same Language Different Encoding

0 = 0;
1 = 1;
2 = 2;
3 = 10;
4 = 11;
5 = 12;
6 = 20;
7 = 21;
8 = 22;
9 = 100;
10= 101;
11= 102;
12= 110;
13= 111;
14= 112;
15= 120;

FSM accepting the language of ternary strings divisible by five.

Programs take data as input and produce an output. But, data is just data. The data that is found inside a picture is indistinguishable from the data found inside a text. It is the program's responsibility to interpret the data. Text is just the human readable output of a program that takes a binary file as input. When a text file, containing the text “Hello, world.”, is opened with a text editor, the program will read a string of binary data that looks like this:

01001000 01100101 01101100 01101100 01101111 00101100 00100000 01110111 01101111 01110010 01101100 01100100 00101110 00001010

Each grouping of eight bits represents a single character; ten letters, a coma, a space, a period and a character signaling the end of file. A text editor can read this data and displays it as a set of alphanumeric characters ( Hello, world. ).

Likewise, binary data from a photo can be input into a text editor and it will be displayed as an alphanumeric representation, as if it were a book. Below is an example of a small image that has been opened as text. Many images are much larger, composed of a quarter of a million characters or more.

Simple Image Encoding

01000010 01001101 10001010 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00110110 00000000 00000000 00000000 00101000 00000000 00000000 00000000 00000010 00000000 00000000 00000000 00000010 00000000 00000000 00000000 00000001 00000000 00011000 00000000 00000000 00000000 00000000 00000000 00010000 00000000 00000000 00000000 00010011 00001011 00000000 00000000 00010011 00001011 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 11111111 11111111 11111111 00000000 00000000 11111111 11111111 11111111 00000000 00000000 00000000 00000000 00000000

The image on the left is a four pixel bitmap. On the rigth is its binary contents. If this file is opened in a text-editor, the output will look like this.

BMŠ 6 ( ਠ ਠ ਠ ਜ ਠ ਜਠ ਜਠ ÿÿÿ ÿÿÿ

Notice the header field “BM”. This tag tells an image-viewer program that this file is a BitMap image. If the image-viewer reads a file which does not contain such a header field then it barfs up some error message. However, if the file contains the correct header information then the image-viewer reads the rest of the data as if it is a picture.

We could take the data from the file containing “Hello, world.” and then insert that message into an image file with an image header, so that it is encoded in a way that an image-viewer will accept.

Hidden Data Inside Image

01000010 01001101 10001010 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00110110 00000000 00000000 00000000 00101000 00000000 00000000 00000000 00000010 00000000 00000000 00000000 00000010 00000000 00000000 00000000 00000001 00000000 00011000 00000000 00000000 00000000 00000000 00000000 00010000 00000000 00000000 00000000 00010011 00001011 00000000 00000000 00010011 00001011 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 01001000 01100101 01101100 01101100 01101111 00101100 00100000 01110111 01101111 01110010 01101100 01100100 00101110

The image has been altered and now contains the text "Hello, world."

BMŠ 6 ( ਠ ਠ ਠ ਜ ਠ ਜਠ ਜਠ Hello, world.

The process of hiding messages in photos is called stegonography

Earlier, it was explained that programs are machines that have been encoded in binary. It must follow that a program could be input for another program. When a program is feed into a text editor, we see something like this:

ELF###.........#.>.#...#@.....

Notice that there is a tag “ELF”, in the header field, which tells another program that the code that follows is to be executed as a program. This is because most programs are not given complete control over a machine's hardware but rather are run within another program—the operating system ( OS ).

The OS, whether it be Windows, Mac OSX or Linux, exercises its authority to prevent its tenant programs from performing malicious operations on the system, such as writing to another program's memory, wasting resources or directly taking control of the machine's hardware. However, due to a program's ignorance, it is sometimes possible to trick it into preforming some malicious operation.

What Is A Virus?

What would be interesting is if a piece of data could instruct a program to copy the data and then to write itself into another file. This code must be cleaver enough to trick a program into copying it. It must know something about the machine in order to trick it. However, this code doesn't have to be written with the intention of causing infection, it could, hypothetically, happen by accident.

The data in a bitmap file can be thought of as instructions that an image-viewer uses to reconstruct an image. However, there is a chance that this data could instruct the image-viewer to perform a more nefarious operation. If the image file is able to trick the image-viewer into replicating the picture then we would call that image a virus. Any data that tricks a program into replicating that data is a virus. More abstractly, a virus can be thought of as a machine that can only be simulated by another, unwitting, machine.

Far away in the Gear System, Revolio Clockburg is operating a 3d printer factory but is not capable of producing enough 3d printers to satisfy the demand of his customers. He decides to program a 3d printer to construct the vital components of a 3d printer. But, Clockburg realizes that the parts of the 3d printers are useless without being assembled. He must build and program a machine to assemble these parts.

Revolio programs a 3D printer to produce all the pieces of the assembler and writes a program so that the assembler can assemble the 3D printers. Now here is the tricky part, there is no assembler to assemble any of the assembler. He must instruct an existing machine to build the first assembler. Luckily, he is a machine sophisticated enough to build this machine.

After some programming and assembling, his first printing/assembly machine combo is able to produce another printing/assembly machine and by the time the first machine is making its second machine its first child is already ready to build its own child. The reproduction of these machines will explode at an exponential rate, causing the machines to outgrow the reproduction of Revolio's peoples and starve them of precious resources. This could be the beginning of a Gear War.

This type of machine is called a worm. It replicates itself but does not infect other machines. A virus is the code that infects new machines, commissioning them to replicate more instances of themselves. But, an infected machine must be complex enough to simulate a virus. That is, in order to trick a machine, Revolio, into simulating another machine, Dot, Revolio must already be capable of accomplishing the same operations as Dot.

Furthermore, Dot must be very cleaver to devise a question, which is herself the answer. So what does she ask of Revolio?

“What is this question?” This question could have a definitive answer such as “who am I” or “what is purpose?”. However, “What is this question?” is also an answer to “What is this question?”. But, a program that does not understand the context of this question is bound to give a nonsensical answer, such as 42.

To interpret this question as a recursive answer to itself, it must first understand the context of the question. More precisely, it must understand that “this” is a reference to the question itself. This question would have to be somewhat self aware in the regards of knowing where it is and how big it is. “this” has a beginning and an end. “this” == “what is this question?”

Should it be assumed that these aliens possess the technology to determine if some data is actually a virus? More precisely, can it be decided that a piece of code can reproduce itself? In order to answer this, a question must first be asked: will a binary string, w, produce itself as an output for some program v? This is analogous to asking a machine if there is some input that would cause it to print “hello, world.” or jump off a bridge. The short answer is no, it is not even possible to design an algorithm to solve such a problem.

Since it's not possible to design a program that determines whether a piece of code is a virus, designing a virus detection software must use a different strategy. The way it's done is similar to the way that biological organisms fight viruses. If it is understood that a string of code reproduces itself in an undesirable way, then a picture of this code is taken—an antibody. Now, this image is input into a program that can decide whether or not a piece of code matches this image. When there is a match, the program knows that the data is infected with a virus.

Probability

Could a pet bird catch a human's flu? Most of the time when a human or a bird is infected by an influenza virus, the virus will not transmit between the two species. When a virus is introduced to a human from a bird, or vice versa, the other species will not recognize the virus as a legitimate set of instructions. The body will disregard the unknown bundle of DNA and RNA, passing it through the kidneys and liver before finally excreting the waste. However, every once in a while the virus evolves, in just the right way, such that it's recognized by the other species. This evolution is both a probability and an inevitability, but probably not likely.

So, imagine that there exists a 1024-bit string ( just smaller than the smallest variant of the Pixel virus ) that causes a viral reaction on the alien machine, ET. It can't be known what this string is. Arbitrarily picking this string, w, is like picking numbers on a lottery ticket. No one knows which numbers will be drawn, it can only be hoped that the correct numbers were picked. Since w is much larger than the numbers on a lottery ticket, any ticket has a greater chance of being a winning ticket than any string being w. What is the probability that any string will in fact be the string that causes a viral reaction?

This is actually pretty easy to calculate if w is thought of as a series of coin tosses. The base case is when w is a single bit string. If a coin is tossed and lands tails up then it is a '0'. If it lands heads up then it is a '1'. Since there are only two possibilities for which this string could be—'0' or '1'—it is that there is a one in two chance that the coin will land on the virus side.

So, lets consider that w now has two bits ( coins ). Since there are two more options, the new number of possibilities can be found by multiplying the previous number of possibilities by two. There are now four possibilities: 00, 01, 10 or 11. And now, it is that there is a one in four chance that the sequence of flips will result in the viral string. We continue to multiply by two for each bit we add. After multiplying this number by two for each bit in the string we are left with 2^1024 for the number of possibilities that could arise for w.

Here is 2^1028, that I calculated with Big Number Calculator.

2, 876, 309, 015, 779, 705, 452, 366, 888, 305, 262, 439, 573, 788, 763, 166, 307, 690, 516, 374, 881, 298, 523, 722, 812, 888, 015, 410, 123, 335, 637, 158, 520, 576, 337, 921, 822, 077, 942, 293, 722, 540, 636, 301, 030, 665, 959, 885, 558, 890, 231, 585, 990, 044, 286, 294, 797, 847, 764, 420, 835, 513, 619, 937, 505, 911, 249, 327, 233, 360, 092, 301, 410, 410, 917, 479, 406, 103, 582, 609, 768, 653, 235, 794, 613, 608, 170, 953, 380, 771, 839, 155, 935, 015, 675, 460, 877, 365, 701, 273, 987, 586, 195, 456

This number of combinations is on the order of magnitude of 10311!!! To put this in perspective, there are an estimated 10123 atoms in the known Universe. If there were as many universes as there are atoms in our Universe and we were to count the number of atoms in all these universes, we would still have more possibilities for our choice of w.

The odds of picking this one viral string is like winning the lottery 30 different times, with only 30 tickets. But, just because this one string is a virus doesn't mean that there aren't others. There are probably a lot more. The only way to know if a string is a virus is to run it on a machine and see what happens. Even though we cannot know how many viruses exist within a set of strings, the number is probably pretty low.

Summary

Humanity may decide to send a message into the depths of space in an attempt to contact extraterrestrial life. The writers of this message should consider how this message could affect alien technology as much as they, hopefully, will consider how a response could affect humanity.

For all we know, if aliens receive our message, their technology could be much more advanced than ours, or it could be more primitive. All we can say for sure is that in order for a machine to decode a message, it must be at least as complex as the language in which the message is written. What is required of a machine to decode the most sophisticated of languages is precisely what is required of a message to hijack that machine. The most sophisticated computing machines are capable of simulating other computing machines and are vulnerable to self replicating code.

Arbitrarily creating a message which could damn a machine is probably unlikely. It may even be naive of us to assume that aliens aren't already thinking of this possibility. However, knowing this could allow Earthlings to better design a message that ET can understand. Perhaps they'll expect this message to not be written in a language with the context to reference itself. Any discussion we have may help ET to understand us when we contact it.

Creative Commons License
人类会不会意外地向外星人发送病毒? by Jesse Riggs is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at jesse-riggs.com.