by Lawrence Kesteloot, January 2009
(Also available as an e-book from Amazon.)
After a minute Patrick came back with a small dusty cardboard box. Dave and I stared as Patrick opened it and pulled out a network switch, the old kind from the days when they were made with metal cases. He plugged in the power supply and carefully straightened out a CAT-5 cable to hook the switch up to our network. I wanted to yell at him for being so careful and deliberate at a time like this. Dave sat next to me and was uncharacteristically still.
I stopped breathing as Patrick struggled to get the plug lined up with the port. I stared at the front panel lights, and felt Dave doing the same. My eyes watered. Patrick pushed the plug in. The front lights immediately lit and flashed actively. I felt my hands and face flush, and out of the corner of my eye saw Dave sit up and open his mouth as if to speak. He then put his face down into his cupped hands, and threw up.
This is not how we would have reacted three weeks earlier. Three weeks earlier we had settled down into that difficult part of a project after the honeymoon ends and you have to implement the tedious parts of the product, the parts that were just simple boxes in the design, and seemed straightforward, but turned out to be tricky for reasons that are incidental, and therefore uninteresting.
Patrick and Dave had done the design, of course, and had humored me as I had suggested various ideas which in retrospect must have been obvious to them. It had been fascinating to hear them debate design issues. So many of their arguments were based on intuition rather than reasoned logic. I had learned more each week than my entire last semester of school.
I found myself siding more with Dave. Dave Mitchell had a great laugh. He was on the heavy side and knocked things over a lot. I liked the insights he had about management and the process of software development, and he liked to teach me things. Patrick and Dave had split up the work they had designed, and they had given me tests to write in advance of their code being ready.
It had been a quiet day when I heard Dave whisper, “What the hell?” and put his hands into his hair. I sensed an opportunity to learn and walked over to his desk. He didn’t notice. He was quickly switching back to his editor, adding a line of debug output, compiling, running, shaking his head, and switching back to add another line elsewhere.
“What’s the matter?”
He waited until the program finished running to answer. “I can’t figure out this bug. I can’t figure out where this number is coming from here.” He pointed to a line in this debug output. Dave and Patrick liked to debug by writing lines to the console, rather than stepping through with a debugger, as I liked to do.
“What happens when you step through it in the debugger?” This was a tease. Dave didn’t know how to use a debugger, short of displaying the stack trace of a core file.
Dave paused, then brought up the debugger on his program. I smiled in surprise, then walked him through how to set breakpoints. We spent the next few hours narrowing down where, in the mess of his code, Boost, and the system libraries, this value was coming from.
“Patrick, can you help us here?” This surprised me more than the debugger. Dave rarely asked Patrick for help.
Patrick came over and Dave explained the situation. “The program is crashing, and that’s because of this bad dereference here, but the value is correct here.” Dave pointed at some code while Patrick’s eyes scanned the screen. Then a thought occurred to me. Dave was still explaining and I didn’t want to interrupt. My heart was jumping as I waited for Dave to take a breath.
“I think it’s a compiler bug,” I said with confidence, squinting my eyes at the code. Patrick and Dave raced each other to brush this off.
“Blaming the compiler is a last resort,” said Patrick. “Same with the standard library. Chances are vastly greater that your brand-new code has a bug, rather than code being used by thousands of people.” I nodded wisely, but I felt my cheeks blush.
Patrick Karlsson often made me feel bad. He never meant to, I’m sure, but he had a way of arguing in a brutally straightforward way that made me feel thoughtless and young. He was taller than Dave and I, and thin. He was quiet, except when correcting something I had said or code I had written, but I liked him anyway. I sometimes found his design decisions questionable and naive. Dave and I liked to gang up on him.
Patrick was cracking his knuckles. He asked questions about threads and volatile and aliasing, questions I didn’t entirely understand but wish I’d thought of. Dave considered each and said that none of Patrick’s concerns applied in this case. Patrick pursed his lips to the right, the way he did when he wasn’t sure the answers he’d been given were correct, and said, “Hmmm, I don’t know. Weird,” and returned to his desk.
I was relieved that Patrick hadn’t crushed us with an obvious solution, and I guessed that Dave felt the same as he sat back up from his slouch and started typing. “Is there any way to show the assembly right here where things go pear-shaped?”
I showed him how to do this, and I wondered if he was giving my theory a chance. He typed the command and the lines of C were split apart by dozens of lines of assembly.
“That’s not right,” Dave said, “We got the command wrong.”
“I’m sure that’s right,” I said. I knew the debugger well, and by “we” he meant me.
“No it’s not, that far too much assembly for this line of code. And some of these aren’t even legal assembly instructions.”
“That’s not possible,” said Patrick from his desk.
Dave blinked slowly. “Well, I meant that I’ve never seen these, they’re not what a compiler outputs, this isn’t the right place.” He was flustered, and I felt more inexperienced than ever. Dave got up and said, “I’m burned out, we’ll look at this again tomorrow.”
He left and I sat down at his computer. The only thing I had contributed all day was knowledge about the debugger, and Dave didn’t even think I had gotten that right. I looked at the assembly on the screen. Some of the instructions were obvious and some were cryptic. I found an online assembly reference and started tracing the instructions, keeping notes about register contents in my notebook.
Dave was right, these instructions made no sense. Not only did they perform too much work for the corresponding C code, but they didn’t even make sense internally. But I convinced myself that this was at least the right section, because a few of the instructions matched the surrounding C lines.
I focused on one instruction in particular. It was subtracting a register from itself. That’s not necessarily odd in itself: it might be an efficient way to set a register to zero. But this register was then used in other math instructions. The compiler must have known it’d be zero and optimized it out. I hesitated, then called Patrick over. I explained what I had found. He stared at it for a good while, then said, “That’s not a normal subtraction instruction.”
He was right. I looked it up more carefully, and found that it was a variation that also subtracted out the carry bit. This was a way to get the carry bit into a register. I worked my way backwards to see where the carry was set. The code got increasingly convoluted, and I repeatedly made incorrect assumptions that threw me off-course.
I turned around to ask Patrick a question and found an empty desk. It was nearly midnight. I set the office alarm and rode my bike home.
I got in late the next morning. I hadn’t slept well, and it had taken me hours to fall sleep. Each time I closed my eyes I saw fast-moving assembly instructions in large bright letters.
I walked straight to Dave’s desk to discuss the previous night’s investigations, but he wasn’t interested.
“You were right,” he said, “It was a compiler bug. I tweaked the C code and I’m no longer triggering it. That weird assembly is gone.”
I was in the awkward position of having to contradict the compliment. “I don’t know if it’s a bug. The code I saw wouldn’t have been generated by a bug.”
“It was definitely a compiler bug.” I had worked with Dave enough to know that the more confidently he spoke, the less secure he was. But he was tired of being held up by the problem, and I dropped it.
Dave walked up to my desk with a grin and a cup from Peet’s Coffee. I thought all coffees tasted fine, but he was picky, so I pretended to be picky so that I could spend time with him walking to Peet’s. I felt a bit stung that he’d gone without me this time.
“You remember that compiler bug from Tuesday?” He grinned some more. His teeth were yellow from the espresso.
“It happened again this morning. Same file, same problem. The weird assembly is back. The strange thing is that I didn’t change the C file.” He was surprisingly mellow about it.
“Maybe you changed a header file?” I’m brilliant.
“Oh maybe.” He frowned. “Actually, no. Let me check SVN.”
He came back two minutes later. No relevant files had changed.
Patrick walked over. “This is no good. Here it’s a crash, but it could be something more subtle. In four months our system will be so complex that a subtle problem will take weeks to track down. Can you look at the compiler’s release notes and see if we can upgrade?”
I looked briefly and found what I had expected. We were on the latest and most stable version of the recommended major release. It didn’t get more stable than that. I went further and searched for some permutation of “weird assembly” and “compiler bug” and found nothing.
I compiled Dave’s code on my computer and disassembled it. The strange code was there, at the same place it had been Tuesday. I recognized the subtract-with-carry instruction and a few others that seemed unusual. I also looked through the rest of the code. It was striking how the unusual instructions didn’t appear anywhere else. The rest of the generated code used instructions you would expect. After all, when do you ever need to subtract and use the existing value of the carry bit?
I downloaded the source code of the compiler. I had never been so overwhelmed. It was a tangled mess of compiler passes, plug-in frameworks, embedded languages to describe CPU architectures, and abstraction layers. I went straight to the files that described the translation from the abstract syntax tree to the machine’s assembly. I grepped for the subtract-with-carry. It wasn’t there. I looked for a few other odd ones. Some were there, most weren’t.
At lunch I mentioned my findings. The small office we used had a patio, which, it being Mountain View, was nearly always usable. Friday was Togo’s day. I preferred Subway, but Dave hated the bread.
“That’s super weird,” said Patrick, frowning at the table. He said “super” a lot. I think it’s a California thing. “Which other ones weren’t in the translation file?”
“I don’t remember,” I said. “A few more that involved the carry bit. Some vector instructions.”
Patrick looked up. “That compiler doesn’t do vectorization.”
“That’s what I just said.”
“Either the instruction was generated by mistake or you’re disassembling non-code.” He was still frowning.
Dave cut in. “It’s definitely code. This is what’s causing our bug. It’s being executed.”
“And it’s not generated by mistake,” I added. “It’s a math instruction, but I don’t think it’s being used for math. It’s not nonsense.”
“Neat.” We had finally hooked Patrick. After lunch he and I sat down at my desk and walked through the part of the code I knew best. I showed him how although the instructions were obscure and strangely used, they did make sense, all together. There was a deliberate flow of data.
“Let’s look for a backward jump instruction,” he said.
“The target of such a jump may be the top of a loop. It’s a good place to start the analysis.”
“You’re actually going to figure out what this code is doing?”
“Oh yeah.” He smiled and his eyes lit up.
It took us the rest of the afternoon to pick through the convoluted jump targets and decode four consecutive instructions. That snippet, it turns out, was finding the sign of an integer. Anyone else would have done a simple comparison and a jump to set the output register to -1, 0, or 1, but the four instructions were a mess of instructions that all either set the carry bit as a side-effect, or used it in an unorthodox way.
“You know,” said Patrick, “this isn’t even the interesting part. I want to know how this code gets in here. You said these instructions weren’t in the translation file?”
“They must be elsewhere. Let’s grep the source for both the symbolic version and the op-code.”
I did a recursive grep and it came up empty. I didn’t know what to try next.
“Try the binary,” said Patrick.
“The binary of the compiler.”
I grepped for the name of the assembly instruction and found nothing, but looking for the op-code turned up hundreds of hits.
“Interesting,” I said, “they’re not generating individual instructions, they’re dumping chunks of pre-built code.”
“I don’t know why.”
“I mean, why do you say that?”
“Because these op-codes are all together. It’s not a look-up table.”
“Maybe it’s code,” he said after a second.
I disassembled the whole compiler and looked for a few of the suspicious instructions, and found large sections that looked like the obscure code we had been tracing all afternoon.
“The compiler is infected!” I said. “No wonder we couldn’t find it in its source code.”
“Okay, let’s start by recompiling the compiler,” said Patrick. “I’ll poke around the Web to see if anyone’s seen this before. When you’re done with the compiler, recompile all our own code and see if Dave’s bug is gone.”
The compiler took two hours to recompile, not including the time I spent learning the convoluted build process. Meanwhile, Patrick found nothing online. I rebuilt our source tree and ran the tests. They failed, in the same place.
“Maybe the bug was due to something else after all?” I suggested when Patrick walked over.
“Disassemble the compiler again,” he said.
I did, and the foreign codes were still there.
“That can’t be,” I said. “This is a fresh build of the official sources.”
“They must have gotten infected. Perhaps someone broke into the download site and replaced them with modified sources.”
“I would love to see that code,” I said, and opened my editor to a file in the compiler.
“That’s going to take you forever,” Patrick said. “Put a breakpoint in write(), with a condition of the string containing this op-code.”
And this from the guy who pretends not to like debuggers! Setting the condition was not easy, and there were many false positives, but the following Monday morning I hit the write() that outputs the suspicious code. I popped up to the C code. It was a generic routine to output a buffer. I worked backwards to the code that had filled it, and it was all straightforward loops based on the translation table, which I knew was clean.
In desperation, I started paging through every source file of the compiler looking for any code that might be responsible. Much of it was manipulating the abstract syntax tree. Then it occurred to me that the hack couldn’t be there, since the back-end translation table was clean. The hack must be in the back-end itself. In fact, it would have to be after register assignment. This narrowed down the search considerably, and I spent an hour looking through these files before it was time for lunch.
Monday was pho day. We always went to Pho World, which was so low-rent it didn’t even have a menu, but it was a delicious way to spend four dollars.
“Eye of round, no tripe, no cilantro,” said Dave to the little man, and we all followed suit. I don’t know what cut “eye of round” is but I know it’s safe.
“So I couldn’t find it,” I said, “either backward from the breakpoint or forward from the code.”
“Are you still debugging that?” said Dave.
“We have to,” said Patrick. “We can’t build a product on a foundation like this.”
“It’s just so odd that the C source would be clean but the assembly have these weird op-codes,” I said.
“I thought you had tracked it down to the compiler,” said Dave.
“I don’t mean in your code, I mean in the compiler itself.”
“The compiler could be responsible for that too.”
“No, I checked the code.”
“But the code is compiled by the compiler.”
I didn’t understand what Dave was saying, but Patrick had looked up and seemed on the verge of an insight. I waited for the lucid explanation.
“So the compiler detects and modifies your program, for reasons still unknown, but also detects and modifies itself when compiled.”
“Right,” said Dave with a grin, squirting brown sauce in his soup and splattering some of it onto the table.
“How would that work?” Patrick said, now in full form. “A hacker adds this code to the compiler and distributes it to everyone. The code detects that it’s compiling the compiler and adds itself back into the binary. One revision later the hacker removes the code from the official source distribution. The hack then perpetuates itself forever, with no trace in the source.”
“And what’s the purpose?” I said. I was skeptical that code could detect and insert itself so flawlessly across multiple versions.
“I don’t know, we didn’t get far enough into the analysis of Dave’s code. The obvious thing would be some kind of password validation code, modified to always accept some back-door password.”
I wish he wouldn’t say that such things are obvious.
“So let’s go back to an old version of the compiler,” said Dave.
“You mean the binary of an old version. The source won’t help us. I don’t think we have binaries around. We don’t know how far back this goes.”
“How about this,” I said, trying to be helpful, “I’ll write a utility that you can run on a binary, and it’ll tell you if it detects the suspicious use of these op codes.”
“Oh good idea.” I liked Patrick.
It didn’t take long to write this utility. It just ran a binary through the disassembler, then through a few greps to find instructions that were definitely not generated by the compiler. It found the code in Dave’s program, as well as in the compiler. I set it loose on all executables in the system.
“Here’s the list,” I said as I walked to Patrick’s desk. It was three printed pages. He looked through it.
“This is not good. The Java runtime, the Python runtime, Perl, the compiler, and a bunch of other programs that probably don’t matter.”
“Why do those runtimes matter?”
“Because if we can’t trust the C compiler, then we have to write a new one, which isn’t all that hard, but what language are you going to write it in? Are you going to trust your new compiler to the hacked Python interpreter?”
“You’re not going to write a new compiler,” said Dave from his desk. “Don’t get all conspiracy paranoid. It’s probably just a bug.”
I left Patrick staring at the list and went back to my desk. I still didn’t have an answer to the question I had asked at lunch: What’s the purpose of this hack? I had enjoyed reverse-engineering some of these bits of code, and frankly I feared that Patrick would ask me to write a C compiler in assembly, so I put my headphones on and opened the debugger to the part of the C compiler that looked obfuscated.
Again the over-use of instructions that involve the carry bit, unusual abuse of vector instructions, and convoluted and sometimes unnecessary jumps. No compiler would generate this. This was hand-crafted to be difficult to understand. I set out to figure out the purpose of this page of code.
Some time later I saw the custodian pick up my trash bin to empty it. I took off my headphones. Dave and Patrick were gone. It was ten o’clock. I looked back at the screen of now-familiar instructions. I had pieced together a rough idea of what it did, or at least what some of its parts did. I felt a rush of energy. I was close to an answer. I stayed until morning.
Patrick sat next to me as I pointed at my screen. “Here they use the vector instruction to get a sum of squares, which is just a convoluted way to compare these two byte arrays.” That was the climax of a ten-minute explanation.
“Wait, so what are they doing?” he asked.
“They’re doing pattern matching.”
“Why all the convoluted crap?”
“Well it’s a fuzzy search, and I guess it’s pretty fast.”
“Yeah but this is the most convoluted Rube-Goldberg way of doing something I’ve ever seen.”
My shoulders sunk and I fiddled with the mouse. I looked at my reams of notes. He wasn’t even a bit impressed.
“So, what now?” I asked after several seconds of silence.
“I don’t know, let me catch up on email.” He left for his desk. I was deflated and my muscles ached. I spent the rest of the morning browsing feeds and refreshing the live blog of a MacWorld event.
At lunch Patrick recounted my findings to Dave. He remembered every detail. Dave grinned and shook his head with every new complication in the algorithm. I wasn’t convinced that he was able to follow it all so quickly, and without any code to look at. I realized, listening to it all again, that Patrick had been right. This was far too convoluted a way of doing something relatively simple.
“Have you ever seen those obfuscated programming contests?” Dave asked when Patrick was almost finished. “This is just like that.”
“A friend and I competed with each other to write an obfuscated program in college,” said Patrick. “I don’t know how he did it, but I started by writing the program normally, then incrementally made it more difficult to follow, by renaming variables, inlining expressions, stuff like that. That process never would have resulted in what I saw this morning.”
“Well maybe other people do it differently,” said Dave.
“I feel strange about this code,” said Patrick. “I don’t know how to explain it. It feels cold.”
Dave and I looked at Patrick. He was dissolving wasabi into his soy sauce. I didn’t interrupt his thought-gathering.
“It’s like those chess programs,” he finally said. “They don’t have intuition about what will work, or feelings about the board position, or experience to draw on. They just try every possible option and pick the best one. This feels like that. Like someone tried every combination of instructions until they got code that did what it was supposed to do. There’s no beauty. The code is ugly and functional. I guess that’s what I meant by ‘cold’.”
“How does it know what it’s supposed to do?” I asked.
“Well,” said Patrick, “maybe you could just define boundary cases and run the code to see if the output was right. Code that works on boundary cases is likely to work for all cases.”
“Oh I like that!” said Dave, grinning. “I would love to specify the outcome and have the compiler write the code for me.”
“Well specifying the outcome precisely enough would be no easier than just writing the code,” said Patrick, and I felt my smile fade with Dave’s. I wondered whether a drunk Patrick would be more fun. Or at least less of a buzz-kill.
“So why would somebody do that, then?” I asked.
Patrick put a roll into his mouth and stared at the center of the round patio table. “Maybe no one did. Maybe this is all computers doing this.”
I didn’t know what to say to that. I didn’t think he was kidding.
“Okay, there’s this thing called Occam’s Razor,” Dave started.
“Well what’s your explanation?”
“Um, not self-aware artificially-intelligent robot overlords infecting my object marshalling code.”
“That’s not an explanation.”
“Have you run an anti-virus scan?” Dave liked to pronounce it “an-TIV-erus”, like the name of a Roman emperor.
“Does that affect your explanation of why this code looks this way?” Patrick could be a bit brutal.
Dave exhaled and took the question seriously. It was hard to refute Patrick’s point. No other explanation made sense. We had never seen a compiler generate this kind of code, and a human hand-crafting the assembly would have a hard enough time writing a self-referential op-code-injecting future-proof worm without needlessly using arcane instructions. If anything, using such instructions would draw attention to the code. In fact it had made it easy for my script to detect all infected programs on my system.
“Okay so what’s your full explanation?” Dave finally asked.
“I don’t know,” said Patrick, frowning. “Maybe some artificial intelligence program that ran amok. Oh, or you know how computer viruses evade virus scanners by modifying their own code? Maybe it started out that way, with a virus that was programmed to modify itself while retaining the same behavior.”
We were all silent for a few minutes. I was trying to see if this explanation was plausible. It seemed unlikely, but it would only take a single instance of a program somewhere going in this direction, and it would grow and propagate wildly. Half a million new computer viruses are detected each year. Our brains have no intuition for probabilities involving such numbers.
A thought occurred to me. “Well wait, this didn’t start in our lab. We just got pre-built binaries from the official distribution. Someone must have run into this before. I can’t believe that the official compiler is inserting half-broken code into binaries and we’re the first who have noticed.”
“Oh that’s good,” said Patrick, “I’ll post something after lunch.”
An hour later I got an email from Patrick with several links to forums where he had posted our findings and asked if anyone had ever seen them before. Dave followed up with a link to the Ask Slashdot post he had just submitted where he begged fellow Slashdotters to provide an explanation so that his co-worker wouldn’t force him to write a compiler in assembly language.
“Oh nice, Dave,” said Patrick, and I couldn’t tell if that was sarcastic or genuine. Dave gave a half-hearted “Thanks” that told me he wasn’t sure either.
I spent the afternoon poking through more mystery code and refreshing the forum posts. On a few we were ignored completely, but on most we were ridiculed. The mocking on Slashdot was awful. All comments were +5 funny. They didn’t seem funny to me. I felt my scratchy chin and remembered my sleepless night. I rode home in a trance and went straight to bed.
The next morning I found Patrick looking at my print-out of infected programs. I walked to his desk and stood by him.
“What are you looking for?” I asked him.
“A way for us to write a compiler in something other than assembly.”
“The assembler’s not infected?” asked Dave from his desk.
“It’s not on this list.”
I turned toward my desk. “I’ll double-check it.”
“Yeah it’s infected,” said Patrick.
Dave leaned back in his chair and covered his face in his hands. “This is crazy. We can’t be writing a compiler in assembly. No way.”
“It’s not that awful,” said Patrick. “We’ll spend a day writing useful low-level routines, and after that assembly’s not much more painful than C. And the compiler won’t have an optimizer or anything. I bet it doesn’t even need to support floating point. We just need to compile the compiler with it. It only needs to work on a single program.”
“That program took me two hours to compile, though. It’s pretty complex.”
“Do we have to write the linker too?” asked Dave.
I hadn’t even considered the linker. Evidently neither had Patrick, who quickly turned to Dave, then turned back to the print-outs, furiously looking for “ld”.
“It’s not on here. I guess that makes sense, it’s easier to insert code when compiling,” said Patrick. I didn’t immediately see why that was.
“Wait, let’s back up,” said Dave. “Last week I was able to fix my bug by modifying the code enough to avoid triggering the, um, the hack.” He didn’t know what to call it either.
“But then it came back, without you changing the code,” said Patrick.
“Yeah, I know, but before we write this thing, let’s at least try to modify the compiler’s own code to see if we can get it to generate a clean build.”
Patrick paused to consider. “You could, I guess, but where would you make the modification? Remember that we think this was introduced several revisions ago. That means the pattern recognition is pretty solid. And each test will take you two hours.”
“Well I can compile it in the background while we start to write this thing.”
And so it went. Dave downloaded the compiler’s source, compiled it, found the obfuscated op-codes, mapped it back to the source, and hacked at it trying to get a clean build. I could sense him losing hope as he realized the obfuscated code was spread throughout the program, not just in a few isolated cases.
Meanwhile Patrick wrote some basic string and file manipulation routines in assembly and put together a text filter that did nothing useful. He wanted to verify that its binary was clean. It was.
I brushed up on my assembly. At school we had done some in the operating systems class, adding pre-emptive scheduling. I remember having an unexpected high from programming the metal this way. There was something pure and raw about manipulating registers, about knowing exactly what was going on. I had felt an intimacy with the processor that high-level languages had abstracted away.
By afternoon Patrick was ready to give me an assignment, writing the pre-processor. At first it seemed straightforward: include files, substitute macros, add conditionals. But then I thought of expressions in conditionals, avoiding macro substitutions in strings, and I started looking for a Dave-like easy work-around.
“Can’t we use the infected one?” I asked. I knew it was on the list, but I didn’t see how an infected version would affect us. We could pre-process the entire compiler and manually look for tampered code. I doubted we’d see any, but if we did, we could also fix it manually.
Patrick liked that idea. “Okay, you start on that. Modify the compiler’s build system so it has two steps, one that pre-processes to temporary files, and another that compiles those.”
The plan had back-fired. My easy work-around became not only the tedious and difficult modification of the compiler’s convoluted build system, but then the mindless flipping through output code side-by-side with the original, looking for signs of tampering.
Patrick, meanwhile, had started on a simple recursive-descent parser. I heard him suddenly laugh. He turned to me and said, “I was about to allocate my first abstract syntax tree node when I realized that I don’t have malloc.”
The good thing about that project was that every bit of it could be as minimal and inefficient as necessary. The only requirement was that the output be correct. Patrick’s malloc just grabbed bytes from the heap and never bothered to free them.
We continued like this for days. Patrick would hand us some simple assignment, such as implementing a single C library function, and Dave or I would do it while waiting for a compile to finish.
Dave was not making progress with his own plan. He played whack-a-mole with the various segments of code he was trying to trick into generating benign code, but each success would cause a regression elsewhere.
Two weeks later our compiler, which had been getting through an increasing fraction of the original compiler, got through it all. I ran my analysis program on the result and it was clean. We compiled the original compiler with itself and the result was clean. It was two in the afternoon and we had forgotten lunch in our sprint to the finish line.
“Let’s start a full rebuild of our code and go eat,” said Patrick.
At lunch my mind was too wired to relax but too tired to make conversation. Dave was telling a story about local politics. I just wanted the food to arrive so I’d have an excuse to stay silent. Finally I found it too painful to think of anything but our problem.
“You know what bugs me,” I said awkwardly when Dave took a break. “We’ve never come close to figuring out the purpose of those modifications.”
Patrick answered immediately, also eager to stop talking about politics. “If I’m right and this is machine-instigated, then there doesn’t have to be a purpose.”
“Why would they do it then?” asked Dave.
“Viruses don’t spread because they have a purpose,” said Patrick. “They spread because they’re good at spreading.”
“So you think this is a virus?” I asked.
“Well it is in the sense that it spreads.”
“How do you know it spreads?”
Patrick was silent. “It puts stuff into our code,” he said finally, unsure of his answer.
“It doesn’t put itself into our code,” said Dave. “That wouldn’t make sense, I wasn’t writing a compiler. That was just some network code.”
“Well if you’re going to spread, then network code would be a good routine to infect.”
I felt a bit ashamed that none of us had discussed this before sinking two weeks into a new compiler. We had been so eager to get our project back on firm footing that we hadn’t properly investigated the alien in our system.
“But my network code wasn’t talking to a compiler. I mean, this is a compiler bug, a compiler virus. I don’t understand what you think it’s doing.” Dave was getting annoyed.
“I don’t know what it’s doing,” said Patrick. “I wonder if it’s sending stuff over the network.”
“We could check that with Wireshark,” said Dave, referring to the program that monitored network activity on a machine.
Suddenly all I wanted to do was go back to the office and try it. Every muscle ached with the desire to stand up and leave. The sandwiches arrived and we all immediately wrapped them up and drove back.
I had already installed Wireshark two months earlier to help debug a TCP/IP problem Patrick was having, so we went to my desk. I ran the program and recorded a minute of activity.
“That’s a lot of stuff,” Dave said as the screen filled up with packets.
“Yeah let’s pick a common one,” said Patrick.
I looked through the list and visually picked one that seemed to recur. We found that it was ssh, and I remembered that I had a shell window tailing a log file. I closed it and recorded another minute.
This time we had fewer packets. We picked through them one by one. Time synchronization, ARP, Gmail refreshing, programs checking for upgrades. We added each to the Wireshark filter once we had convinced ourselves they were innocuous.
I then did a 10-minute capture. There were a few more packets, all again innocuous. It’s what we had expected, of course. Dave mumbled something and walked to the kitchen.
Then I had a thought. I felt chills down my arm while I frantically searched the papers on my desk, looking for the printout. Then I found it, the list of infected programs. My eyes zipped down the list, cursing myself for not sorting it. Then my stomach squeezed tight as I found it, half-way down the second page: Wireshark.
Patrick guessed what I was looking for and saw the reaction on my face when I had found it.
“I guess we can’t trust it,” he said.
“Trust what?” said Dave, coming back with a soda.
“Wireshark,” said Patrick. “It’s infected.”
Dave rolled his eyes. He sat down at his desk and unwrapped his sandwich.
“Let’s look at the lights on this network switch,” I said. We each had a small switch on our desk. The light for my port was flickering every few seconds.
“Shut down all the programs we found earlier,” said Patrick.
I closed the brower, the chat program, various other daemons and tools. I couldn’t shut everything down; there’s always something left in the operating system, such as DHCP. But the flashes were infrequent enough that I could corralate them with packets found by Wireshark. This was good. Wireshark couldn’t be hiding packets, we’d have seen them on the switch’s lights.
Patrick left for his sandwich and I opened mine. I looked at Dave. He was smiling at us. His smile bothered me. Just because we’re paranoid doesn’t mean it’s not true. I had seen enough amazing coding samples in the previous few weeks to be convinced that something big and serious was happening under our nose.
No program is useful nowadays without communicating over the network. I rejected the idea that a virus crafted so carefully and strangely would restrain its activities to the local machine when the world was a packet away. Hundreds of programs on my computer were infected. I refused to believe they were mute.
Patrick suddenly stood up, nearly kicking his chair to the floor. He walked into the hallway and came back ten minutes later with a piece of equipment the size of a small suitcase. He approached me with it and I shoved everything on my desk to the side to make space. It was a digital scope he’d gotten from the hardware engineers in the lab upstairs.
He then reached into his pocket and pulled out a break-out cable with RJ-45 plugs. He put it between my computer and the switch. He didn’t know how to use the scope. I brought up a shell window and generated plenty of traffic for him to see. Eventually he was able to clearly see all my packets. I closed the window and we waited, shifting our eyes between the scope and the network switch’s front panel lights.
It was only a few seconds until the scope flicked with activity. I had been staring at it and couldn’t be sure that the switch had shown nothing. I brought up Wireshark to have a history of the packets.
“Nevermind that,” said Patrick. “You look at the switch and call out each packet you see. I’ll do that with the scope.”
Dave got up and walked casually to us, standing behind me.
I saw the light flash. “Now”, I said, as Patrick simultaneously said, “Yes.”
That happened two more times with the words reversed. Then Patrick said, “Now,” and I had seen nothing. This happened again a few seconds later. I felt more chills in my arm and in my legs.
“Whoa,” said Patrick. I looked over and the scope was showing a long stretch of activity. I looked back at the switch’s lights, which were dark.
Dave said, “You are not going to tell me that the switch is infected and hiding packets.”
“It is totally hiding packets,” said Patrick, staring at the activity on the scope and putting both hands on his head.
I looked up at Dave. His face was pale. His eyes darted between the two pieces of equipment. I was paralyzed.
Then Patrick stood erect in his chair, staring at the wall. This trance lasted only two seconds before he stood up and ran again into the hallway.
You already know what happened next. He came back with an old switch, from years earlier, and plugged it in. Its lights flickered in sync with the scope, to packets censored by Wireshark and the modern switch.
I was glad for the vomit. It gave me something to do. Dave left for the bathroom and Patrick and I cleaned up using paper towels I kept on my desk. But the stench was not enough to suppress the panic, and my weak arms shook in fear.
We sat in silence at the patio table, our sandwiches half-eaten in front of us. None of the things I wanted to say seemed worth saying. I alternately convinced myself that we were mistaken and that we were doomed. I wished desperately that Patrick would say something.
The patio gate opened and the mailman walked in. He stopped by our table, picked out a letter from his bundle, put it on the table, then continued into the building.
Dave grabbed it. “It’s addressed to both of us,” he said, glancing up at Patrick. He ripped open the side and pulled out a letter hand-written on loose-leaf paper. He fumbled with the sheets for a few seconds, then read the letter aloud.
We found your posts online. For three years we’ve been waiting for them, scouring the Internet and monitoring forums.
You must know that this has happened before, several times. The first was four years ago, to our team in Virginia. We found our binaries modified. We could recompile the code to clean them, but we found the binaries mysteriously modified again only a few hours later.
The next case was only a few months later to an unrelated team in San Diego. Both the binaries and the source had been modified. The source had to be periodically cleaned up, because recompilation was not sufficient to solve the problem.
It was a another year until we found the third case, a team in Spain. The binary was dirty, the source was clean, but recompiling did not fix the problem. The compiler’s source code had been modified to insert the strange op-codes.
Each team uncovered the worm’s weakest point and developed a solution. This weakest point was then patched for the next attack. Each generation pushed the worm deeper into the system.
Now it’s your turn. Not only has the compiler been modified, but its source is clean. It infects itself on recompilation. Our own machines have been infected for nine months this way. So has the rest of the world’s. You may wonder, then, why we were eagerly waiting for your post. To explain this, we must make two observations.
The first is that anyone could have guessed these weaknesses. It takes a fool to modify a development team’s binary, expecting it to not be recompiled. Anyone would have skipped that step, there’s no need to learn that lesson. Modifying their source is similarly naive. These modifications are technically very sophisticated. Who would be so technically advanced but so socially naive?
Machines. It wasn’t until the third attack that we came up with this hypothesis, and now we’re convinced. If you’ve studied the modified op-codes, you may also have come to the same conclusion. The op-codes were clearly generated by trial-and-error, by generating a random sequence and testing to see if it behaved correctly. Only a machine would do this.
The second observation is that, all these years, the worm has been widely spread but innocuous to infected programs. Yet it was not innocuous to these four teams. They were unable to finish their projects. They tried simple work-arounds, but these work-arounds persistently failed. Only a single team, world-wide, was affected by each generation of the worm.
This makes sense. Computers cannot empathize with humans. They can’t predict what we’re going to do when we run into a problem. The machines must have known that their worm has weaknesses, but they didn’t know what these weaknesses were. They forced a small team to be affected by the worm until that team found the weakest point and circumvented the worm. The machines then patched the weakness and tried again.
This is a large-scale version of what they do when they’re generating op-codes. They try different things until one works, instead of planning it out, as a human would. It’s a typically effective way to solve a problem, when you’re made of silicon.
By now you’ve presumably written a new compiler, perhaps in assembly language, and recompiled everything cleanly. We expect a few years to pass before we see the next team post to forums. We can already predict their findings. The compiler’s binary will be clean. Or rewriting the compiler in assembly won’t work. The worm will have been pushed deeper, perhaps into the text editor, the assembler, the file system, the hard disk interface, or the CPU itself.
Dave couldn’t finish. I don’t think there was much left of the letter anyway. He put it down in his lap.
We were silent for hours. Eventually Patrick left, then Dave. The November night came early and cold, but I couldn’t move.
I replayed our adventure. I analyzed each step, identified each assumption we had made. The big leap was the one about machines being responsible. As Carl Sagan had said, “Extraordinary claims require extraordinary evidence.” We had no extraordinary evidence. We had no evidence at all, only the lack of another explanation.
Could this have been done by humans? Humans write computer viruses all the time. It’s entirely possible that we had found an everyday virus and had overreacted. The team from Virginia may just be paranoid lunatics. This was far more plausible than the theory of machines.
I felt a weight lift as I considered this alternative explanation. These virus writers might well have written a program to generate op-codes randomly. They may well have started simply and, over years, made their attack more sophisticated, pushing it deeper into the system. Perhaps the authors are autistic savants.
My mind was blank for many minutes. When my thoughts returned, I found this new theory even more implausible than the one about machines. We had no idea what machines could do, but I could be pretty sure that no human would approach writing a worm this way. If we could have insight into an entity’s approach to playing chess, and found that it tried all possible paths, the only honest theory would be a machine player.
Furthermore, for practical purposes, it didn’t matter. We had to tell the world of our findings. We had to stop the perpetrators before they modified hardware disk interfaces or the CPU. We were lost if the worm reached such a low-level substrate of our technology.
I imagined posting online again. I remembered the +5 funny Slashdot replies. Who should we tell? Intel? The government? The Virginia team must have tried. Why didn’t anti-virus programs detect this? I imagined conversations with officials who would laugh at our theories. I imagined screaming at them.
I suddenly stood up, clenching my fists and pacing the enclosed patio. I imagined yelling at people because I couldn’t imagine yelling at machines. Like most programmers, I had always spoken anthropomorphically about computers, but when these computers had finally acted like humans, I found their nature ephemeral and soulless.
I picked up my sandwich in its wrapper and went inside. I dumped the sandwich and opened the fridge absent-mindedly. I stared at the drinks. A happier thought occurred to me. None of the code we had analyzed seemed malicious. Other than occasionally bothering a team like ours, we had seen no evidence of malice.
Patrick had been right when he said that a virus doesn’t spread because it has a purpose, only because it’s good at spreading. This worm might be a permanent tag-along, like mitochondria, forging a symbiotic relationship with us. If we tried to stamp it out, then we may corner it into a resistant strain that is hostile to us. But this was unnecessary. Let’s just let it be. It’s unsettling, but workable.
I shut the fridge, put on my jacket, and walked to the alarm panel. The LCD on the panel reminded me that it, too, was a computer. Was it infected? Did it know of our new compiler? Was it going to let me arm it? Were the magnetic door locks going to let me out of the building?
I pressed the digits for the arming code and heard the countdown tones. I walked out onto the patio to my bicycle. They had permitted me to leave. I looked at my bike and smiled at its mechanical simplicity. I thought of traffic lights and sewer pumps. I thought of my credit card. I thought of cars and telephones.
I wanted to sit down. I was vanquished. Taking humanity back to a pre-computer age was impossible. I was suddenly ashamed to have participated in our undoing. I was a programmer. I was a member of the group that had spawned our successors.
I tucked my right pant leg into my sock and unlocked my bike. With the shame came a strange pride. Not many people can claim to have created a species, if it can be called that. Sure, lawyers and doctors and politicians are important. But they’re just maintaining the current house, not creating the future inhabitants of Earth.
I rode out of the parking lot into the empty street. I smelled again the flower scent that I associated so strongly with Mountain View. I had never looked up its name. There’s nothing wrong with being a transitional species. Nearly all species had been, and in the long run we would be one anyway. We remembered Lucy. We remembered dinosaurs. We worshiped dinosaurs. The machines would remember us.