Yes Virginia, There Really Is a Virus

June 1995

We have been hearing about viruses for years now. The US practically went nuts when the Michelangelo virus was supposed to shut down all the PCs in the world. But it didn't. We know that they are out there, somewhere, but it is rather like the way we know that drugs are out there somewhere too: in somebody else's grungy neighborhood, not our own clean-cut and wholesome one.

Not so. We got nailed by a virus. I never thought it could happen to me or mine, but it did. I knew that I always practiced safe hex, but I also thought that my partners did too. And that is where I failed.

Not many people in Spokane know it, but World Wide Widgets builds microcomputers, among other things. Over the last 15 years, we have built over 7,000 of them, and they are installed literally all over the world. They are used to control Widget Production. WWW pioneered the concept of using distributed control systems to increase production and decrease energy usage in this very energy-intensive process.

Well, we don't actually build them. We design them, and then we farm the actual build cycle out to other companies. We have gone through several such companies, recently including a well-known company in Silicon Valley. As part of the build cycle, the manufacturer also tests our microcomputers in an Environmental Test Chamber, where the micro has to run flawlessly for eight hours while being repeatedly cycled from -20C to +60C. (We put these things in ambient temperatures literally from the Arctic Circle to the equator.) A PC clone using a turbo version of the Intel 8088 chip monitors the microcomputers being tested in the chamber. That PC runs a program that I wrote about four years ago.

We started using this SV company about 10 years ago, when they were just getting started. If you walk through their facility now, you will find circuit boards being built for all the major players in the electronics business, from Apple to IBM to HP. We generally do not have continuous orders for our micros. Orders come in batches of a few hundred, only once or twice a year. Our test gear sat idle most of the time in a corner of the SV company's manufacturing floor, taking up quite a lot of now very valuable real estate. Eventually they told us to bundle up our test chambers (we owned two of them) and all our other paraphernalia, and begone. So we gathered, and bewent.

We now have our micros built by a fine company in Post Falls, Idaho. They have their own environmental chamber, but we still supply the PC and programs to supervise the test. And since we have slightly redesigned the hardware of our micro, we had to make some minor changes to the test programs.

So we had the monitor PCs shipped up to our offices here from the California assembly company. When they arrived, one of them was busted (the power supply had blown), but the other one seemed to work just fine, and since the Idaho company has only one chamber, we needed only one PC. So we put the busted one back in its shipping box and stored it somewhere, and went to work on the live one.

I had the source for the test program on our local LAN, and made the changes there, and did the compilations there, and copied the programs over to the test PC via floppy. The changes went well, and on Friday of that week, I declared my task finished. On Sunday I went to a convention in Seattle for several days. Monday, our own test engineer took the monitor PC to the Idaho manufacturer to supervise the testing of the first batch of newly built micros. And he saw the test system crash after 28 minutes. (The total test runs about eight hours, but the changes we made to the program would be observable in the first couple of minutes, so my local testing never ran much longer than that.) After doing all the things he could think of (and since I was partying it up at the Seattle Sheraton) he schlepped the whole system back to our offices.

Where, indeed, the system continued to crash every 28 minutes. So he hauled out another system, this time a real, honest-to-gosh, original IBM XT. (Our California HQ sent us a whole carload when they upgraded everybody to 386s this year.) And that one crashed after 28 minutes. So he unboxed a brand new 386SX clone that we were going to ship to one of our customers, and even it crashed after the same 28 minutes. Since I was still on the Wet Coast, he enlisted the aid of one of our other programmers, who tracked the problem down to something in the communications subsystem before throwing in the towel.

Our test system communicates with the micros in the chamber via an asynchronous serial port at 4800 baud. For many years, and on many projects, we have used a very fine software package from a company named Blaise to provide the interrupt handling software we need to run at high speeds (and 4800 baud is high speed on an 8088). And it appeared that this software was declaring some sort of UART error 28 minutes into the test.

I got back from my convention on Thursday, feeling very wasted from the cold and wet ambiance one finds in Seattle, and was hit with this problem. I installed some diagnostic patches in my test program and verified that it failed at a test for an overrun error on the comm port. We installed a communications analyzer on the comm line and monitored it when the system crashed. The PC kept sending out messages even after the apparent crash, and was getting perfectly valid responses back from the micros. Yet the PC itself reported that the comm port continued to fail after the crash. We could sort of understand getting an overrun error on a real slow 8088, but hardly on a 20 MHz 386.
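For readers who never wrote a serial interrupt handler, here is the arithmetic behind "4800 baud is high speed on an 8088," as a small C sketch. It assumes 8N1 framing (one start bit, eight data bits, one stop bit, so ten bits per character), which the original setup may or may not have used, and an 8250-family UART, whose Line Status Register (base port + 5) sets bit 1 when a new character arrives before the previous one has been read. The function names are hypothetical, and this is not necessarily how the Blaise package checks the flag.

```c
/* Bit 1 of the 8250/16450/16550 Line Status Register (base port + 5)
   is the Overrun Error flag: a new character finished arriving before
   the previous one was read out of the receive buffer. */
#define LSR_OVERRUN_ERROR 0x02u

/* Time, in microseconds, for one character frame to arrive on the line.
   With a one-byte receive buffer (8250/16450), this is roughly how long
   the interrupt handler has to read each character. */
double char_time_us(unsigned baud, unsigned bits_per_frame)
{
    return 1e6 * (double)bits_per_frame / (double)baud;
}

/* Hypothetical helper: returns 1 if the given LSR value reports an
   overrun, 0 otherwise. */
int lsr_has_overrun(unsigned char lsr)
{
    return (lsr & LSR_OVERRUN_ERROR) != 0;
}
```

With the assumed framing, `char_time_us(4800, 10)` comes to roughly 2083 microseconds, so the handler has about 2 ms to empty the receive buffer before the next character overwrites it. A slow machine that keeps interrupts masked too long can miss that window; a 20 MHz 386 has plenty of headroom, which is why an overrun there made so little sense.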

As much as I love to holler "Hardware!!!" at times like this, my argument was getting a little weak. And since I was so totally whacked out on antihistamines that I couldn't think straight anyway, I bagged it for the day.

Friday, much more clear of head, I walked into the office and asserted that the problem lay in neither the PC hardware, nor the micro hardware, nor the Blaise software. My ex-boss, a hardware guy who used to design computers for Control Data and had done some of the design of our micro, of course agreed completely. The problem was, what the heck else was there that could be screwing up? Other than my software, to which only the teensiest of weeny little changes had been made.

Num Num that I am, I had destroyed the executable of the original test program on the PC. (It was such a teensy change, after all!) But I did have a backup of it on my LAN. So I took that program to the PC, and the test failed after 28 minutes. Now this was the original, three-year-old stuff that had happily run through dozens of eight-hour test cycles down in Silicon Valley.

Well, we all found that to be quite interesting, except for the fact that it wasn't the really real executable, but my backup from my LAN. So I asked our test engineer to rip the disk drive out of the busted PC and install it in the working PC. And darned if the executable on that disk didn't work just fine.

I did a line-by-line comparison of the (untouched) source on the virgin PC disk versus the (original) source on my LAN, and found exactly one difference, in one module: some extra blanks in one line. The programs are all written in C, where blanks can be significant (unlike good old Fortran, where they are not), but a run of several blanks generally means no more than a single one does. If you do misuse them, the compiler will generally bark about it. And it did not.

So I moved that virgin, working executable from the virgin clone's disk to the 386. And the system crashed in 28 minutes. An earthquake would not have made more of an impact on us at that instant. It made absolutely no sense to us that a program would work fine here and crash there.

So everybody went to lunch. Except me. And those of you who know me, know that for me to miss lunch, requires a Reason of insurmountable importance. And that Reason was to stare at the wall and thunk real hard for about an hour.

I took the test program and stripped everything out of it except the bare bones of communicating with the micros. I took out all the user interface junk and all the database stuff, and got it down to its essential skivvies, with only one print line per cycle through the test (a cycle takes about 5 seconds). And it crashed after 28 minutes. But this time, I saw something very weird on the CRT. In the middle of the repeating series of lines ("The latest cycle has just completed successfully."), the last of which reported that the cycle had just failed, was a blank box, a couple of rows high and about 15 columns wide, in the lower left area (but not the corner) of the screen. It was as if something was trying to pop up a HaHa box or something. The original program used a windowing package named Boss Windows, but the Boss code had been tossed out of this stripped-down version, so this blank box couldn't have come from anything to do with my own windowing efforts.

On a hunch, I DIRed the program file on the virgin disk and the same program file that had just been copied to the 386, and found that they differed by about 3K. About then my (then) current boss, a software guy, wandered in. I told him about this observation and my growing hypothesis that a virus was the cause of it all, but he brushed it off, saying that the size difference was due to different cluster sizes on the different disks. That would have made sense if a DIR of either disk had not shown odd sizes for some files (odd as in 1, 3, 5). Rounding up to the next cluster would always produce an even number, so DIR had to be reporting exact byte counts, and cluster size could not explain the difference.
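The cluster argument can be made concrete with a short C sketch. It assumes FAT-style allocation, where a file occupies a whole number of clusters and each cluster is a power-of-two number of 512-byte sectors; the function names and the cluster sizes used with them are hypothetical examples.

```c
/* Space a file actually occupies on a FAT disk: its byte size rounded
   up to a whole number of clusters. Since a cluster is a power-of-two
   number of 512-byte sectors, the result is always even. */
unsigned long allocated_bytes(unsigned long file_size, unsigned long cluster_size)
{
    unsigned long clusters = (file_size + cluster_size - 1) / cluster_size;
    return clusters * cluster_size;
}

/* Nonzero if a size shown in a listing could be a cluster-rounded
   allocation figure rather than an exact byte count. */
int could_be_cluster_rounded(unsigned long shown_size, unsigned long cluster_size)
{
    return shown_size % cluster_size == 0;
}
```

For example, `allocated_bytes(1, 2048)` is 2048: every result is a multiple of the cluster size, and therefore even. An odd size in a listing cannot be an allocation figure, so a 3K gap between two copies of the same program has to be real bytes in the file.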

By now I am seriously thinking virus, ridiculous as that idea had to be. But Sherlock Holmes did say that when you have eliminated the impossible, whatever remains, however improbable, must be the truth. We had a virus scanning program on our LAN, which I copied to a floppy disk, and used that floppy copy to scan the 386. Or attempted to, anyway. The first thing the virus scanning software said was that the virus scanning software on the floppy disk had somehow been corrupted. I found that to be of some interest, since I had just created it. I then proceeded to do the dumbest thing of the whole week: I took that floppy disk to my own workstation to see if the LAN version of the virus scanner could detect a virus on the floppy. And then I did something even dumber. Instead of running the virus scanning software from the LAN, as I had intended, I accidentally ran the software from the floppy, on my own PC. And got the same message about the program being corrupted.
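Why would a scanner notice that its own file had changed? How that particular package checked itself isn't described here, but a common approach is for a program to carry a checksum of its own image and verify it at startup. The C sketch below assumes that approach and uses the standard CRC-32 (reflected polynomial 0xEDB88320); `image_is_intact` is a hypothetical name. A virus that adds its code to an executable changes the CRC, so the check fails.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
uint32_t crc32_ieee(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical self-check: compare the program image's CRC against a
   value recorded when the program was built. Any bytes added to or
   changed in the image alter the CRC, so the check fails. */
int image_is_intact(const unsigned char *image, size_t len, uint32_t expected_crc)
{
    return crc32_ieee(image, len) == expected_crc;
}
```

For the standard check string "123456789", `crc32_ieee` returns 0xCBF43926; tacking even one extra byte onto the end produces a different value.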

The one smart (really, lucky) thing I did in all this was to run these things on my own workstation while shelled out from XTREE. As soon as the virus scanning software failed, it exited to the secondary shell; then the secondary COMMAND.COM shell exited and returned control to XTREE, which immediately reported that some dumb fool had left a TSR in memory after the last program execution, and that XTREE could not continue from that state. It was highly unlikely that the virus scanning package had been designed to leave a TSR lying around after use, so I concluded that whatever was on the floppy was now trying to infect my own workstation. XTREE saved my workstation by essentially locking my system up when it detected that TSR.

I quickly rebooted my workstation, and this time scanned the floppy with the LAN version of the virus scanning program, which reported that well, of course, you have the Jerusalem Virus all over the place there. The final test was to reformat the floppy, make it bootable, copy the virus scanning software to it, boot the 386SX from the floppy, and scan the 386's hard disk. This time it reported finding that same virus in every executable program that machine had ever run.

It appears that this virus infects a system by first loading a TSR into memory; from then on, every executable program that runs gets infected. The virus manifests itself with a HaHa pop-up window after 28 minutes, although for some reason, in our case, the HaHa window didn't work correctly. So every time we copied a program by floppy from the original PC to another PC, it took the infection with it, installed a new copy of the TSR on that new machine, and buggered all of its executables. Fortunately, we never copied From the test PC to My Own workstation (except to run the virus scanning software from the floppy, which, fortunately, XTREE trapped). And we never copied anything onto the virgin disk of the busted PC, so it remained uninfected. But as soon as the virgin disk's uninfected program was copied over to the 386, it immediately got infected by the TSR already resident on the 386, and from then on it would never run correctly.
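The spread described above fits a very small state model. The C sketch below is purely illustrative: it tracks two flags (whether a machine has the resident TSR, and whether a program is infected) and touches no real files. `Machine`, `Program`, `run_program`, and `copy_program` are hypothetical names, and the model ignores everything about the real virus except the two rules described above.

```c
/* Toy model only: two flags, no real files or memory involved. */
typedef struct { int tsr_resident; } Machine;
typedef struct { int infected; } Program;

/* Running a program on a machine: an infected program installs the
   resident TSR, and a resident TSR infects whatever program runs. */
void run_program(Machine *m, Program *p)
{
    if (p->infected)
        m->tsr_resident = 1;
    if (m->tsr_resident)
        p->infected = 1;
}

/* Copying by floppy carries the program's state, not the machine's. */
Program copy_program(const Program *src)
{
    Program dst = *src;
    return dst;
}
```

Replaying the week in this model: the clean program on the virgin disk stays clean on its own machine, but once it is copied to the 386 and run there, it is infected, because the 386 already has the TSR resident.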

The infection was confined to those three machines, none of which was ever connected to our LAN. XTREE saved my machine, and probably our LAN. We scanned both and found nothing. I reformatted all the floppies we had been using, and all the hard disks on the three PCs, and then our test engineer proceeded to cut up and throw away all the floppies, just to be Really Really sure we had nailed the problem. And finally, after all that, my revised program runs the complete eight-hour environmental test.

So the cost: About a week and a half of people time (parts of three people during one week), and one week lost for testing our newly built micros at the new manufacturing facility. So charge $5000 (fully loaded) in people time, and maybe some like amount in wasted test facility time. Not to mention not doing the stuff that was supposed to be done during that week by various and sundry programmers and engineers and manufacturing people and ... Call it a round $15K.

And where did this virus come from? When we first powered up the one live monitor PC in our offices, we noticed in passing that somebody had installed some neat games on it. Since our equipment sat idle most of the time down at the Silicon Valley factory, we suspect that some production workers decided to play games on our PCs in their spare time. We really don't know where these games came from. All we know is that the games were there when we powered up the machine at our site, and that we noticed the virus shortly after that.

What did we learn? We no longer think of viruses as science fiction. We no longer trust a floppy of undocumented origin until we have run it through a virus scanner. We don't often loan out our PCs, but when we do, we would probably scan them when they came back to us. And we learned the hard way that viruses do exist, and can seriously mess up your week when they bite.

Afterwords:

I submitted this article to ComputorLink magazine when I was an interested reader. The original article did name names (company names), so it was sent through the company bureaucracy to see if it could be published, and much to my amazement they allowed it. Afterward, I talked to the publisher of the magazine, and she expressed interest in my submitting more articles. The second one did not mention the Company, but the third one did. So I duly sent it through the system, and they flunked it. Since then, I have been referring to the Company as WWW, and have changed this article from the original to fit that scheme.