The mystery of the rumored theft of CP/M by a little company called Microsoft can finally be investigated—using software forensic tools
Editor’s Note: Upon publication, this article failed to properly disclose the connection between its author, Bob Zeidman, and Microsoft Corp., a key subject of the story. Mr. Zeidman is currently retained by Microsoft as an expert witness in Motorola Mobility v. Microsoft. IEEE Spectrum regrets the omission.
The history of the computer industry is filled with fascinating tales of riches that appear to practically fall from the sky.
Along with stories of riches won, there are stories of opportunities missed. Take that of Ronald Wayne, who cofounded Apple Computer with Steve Wozniak and Steve Jobs but sold his shares for just US $2300. And John Atanasoff, who proudly showed his digital computer design to John Mauchly—who later codesigned the Eniac, often defined as the first electronic computer, without credit to Atanasoff.
But by far the most famous story of missed fame and fortune is that of Gary Kildall. A pioneer in computer operating systems, Kildall wrote Control Program for Microcomputers (CP/M), the operating system used on many of the early hobbyist personal computers, such as the MITS Altair 8800, the IMSAI 8080, and the Osborne 1, before IBM introduced its own machine, the PC. Kildall could have virtually owned the personal computer operating system business, had he sold that system to IBM. He didn’t. Why is a matter of speculation, mundane gossip, and urban legend. We’ll get to that.
Bill Gates at Microsoft, however, did sell an operating system to IBM—and reaped then-unimaginable rewards. A cloud of speculation has hung over that part of the story as well. The big question: Was the operating system Gates sold to IBM his to sell? Or was a key part of it stolen from Kildall?
Microsoft has stated that its hands were clean. Kildall maintained that QDOS, and subsequently MS-DOS, had been directly copied from CP/M and thus infringed on his copyright. But until now there’s been no way to conduct a reliable examination of the software itself, to look inside MS-DOS for the fingerprints of CP/M, and settle the issue once and for all.
My company’s CodeSuite forensic software lets us look inside operating systems and other software for fingerprints of other programs. And I applied it to finally answer the question: Did Bill Gates steal the heart of DOS?
But first, here is the sequence of events, part known, part only speculated. IBM in 1980 started a skunkworks project in Boca Raton, Fla., to create the IBM PC. The company decided that rather than developing software in-house, as was typical at IBM, it would instead partner with one of the small companies already producing code for microcomputers. IBM’s first stop was then-small Microsoft in Bellevue, Wash., known for its version of the BASIC programming language. There, the young Gates told IBM that Microsoft didn’t have an operating system, but that the company should contact Kildall at his company, Digital Research Inc. (DRI) in Pacific Grove, Calif., because Kildall did have an operating system—CP/M.
Here’s where the story varies, depending on who’s telling it. In one version, the IBM executives flew down to meet Kildall who, as a member of the personal computer counterculture, didn’t trust “Big Brother” IBM. Rather than meet with the buttoned-down IBM executives, Kildall took off in his plane for a joyride. When the IBM execs showed up, they were met by Kildall’s wife and business partner, Dorothy McEwen, who refused to sign IBM’s nondisclosure agreement, or NDA, a standard business document that would have kept the discussion secret and would not have allowed DRI to make use of confidential information presented by IBM. After several hours of haggling over the NDA, the IBM executives got frustrated and left.
In another version of the story, Kildall and DRI employee Tom Rolander went off in the plane to deliver software to a customer and left the license negotiations with McEwen, who normally handled those matters. McEwen felt the NDA was too restrictive and talked to their attorney, Gerry Davis, who advised her to wait for Kildall to return. Kildall returned later that day. Accounts again differ on whether he signed the NDA or even participated in discussions with IBM.
In any case, it’s a fact that no deal was signed. Kildall later said that he met IBM negotiator Jack Sams on a flight to Florida that evening, negotiated a deal on the flight, and shook hands on it. Sams denied ever meeting Kildall. In fact, the IBM negotiators, still in need of an operating system, flew to Seattle again that day—not Florida—and met with Bill Gates.
Since Gates’s first meeting with IBM, he had conveniently gotten his hands on a microcomputer operating system similar to Kildall’s, from nearby Seattle Computer Products. SCP, which sold microcomputer boards, needed an operating system that ran on the new Intel 8086 processor. Because DRI was late in porting its system to that processor, SCP hired programmer Tim Paterson to create one. It called this system QDOS, for “Quick and Dirty Operating System.” Gates bought the rights to QDOS for $75,000 and hired Paterson to modify it into MS-DOS; that’s what he licensed to IBM for its PC as PC-DOS.
The IBM PC became a huge success, and Microsoft soon displaced DRI as the leading microcomputer operating system company. Kildall resented the success of Gates and Microsoft, and he eventually went back to IBM and negotiated a deal to offer CP/M on IBM PCs. However, Kildall negotiated a very high license fee, much higher than that of MS-DOS, meaning IBM had to charge $240 per copy of CP/M rather than the $40 per copy it charged for PC-DOS. Few people bought CP/M, and PC-DOS sales continued to grow.
Kildall maintained that QDOS, and subsequently MS-DOS, had been directly copied from CP/M and thus infringed on his copyright. DRI attorney Davis claimed forensic experts had proven that MS-DOS had been copied from CP/M, but that in 1981 there was no way to go to court over copyright infringement and get a judgment. The latter, at least, was not true. One year earlier, Congress had passed the Computer Software Copyright Act of 1980, which made copyright protection of software explicit, so why DRI didn’t take the battle to court at the time is not clear.
The claim about the forensic proof was probably false as well. In those days, software forensic tools consisted of off-the-shelf disassemblers and various utilities, typically used for debugging, cobbled together by investigators—they weren’t very effective. Today, however, forensic tools are much more sophisticated; they can divide up programs into various elements to be compared independently, and they can compare object code directly without having to disassemble or decompile it, processes that can add extraneous information or lose elements of the code. The tools available today that compare software source code include Measure of Software Similarity (MOSS) from Stanford University and JPlag from the University of Karlsruhe, in Germany. The tools called CodeSuite, sold by my company, Software Analysis and Forensic Engineering Corp. (SAFE), compare source code and object code as well.
So with CodeSuite in hand, I donned my deerstalker, downed a Rockstar Energy Drink, pulled out my meerschaum-and-mahogany calabash pipe, and set out to answer the Gates/Kildall question once and for all.
The investigation commenced on the Unofficial CP/M Web Site, where I downloaded CP/M source code files. These files included notices of copyright by Gary Kildall from 1975, shortly after he founded DRI. They were written in the PL/M programming language, which Kildall developed for microprocessors while he was employed at Intel. From the same site I downloaded the source code files of CP/M 2.0, which were from 1981 but contained copyright notices from 1976, 1977, and 1978. These files were written in both PL/M and low-level assembly code. I also downloaded executable binary files of CP/M 1.4 that included three source code files dated from 22 March 1979 through 5 September 1981.
Continuing to gather clues, I downloaded 86-DOS (QDOS) source code files and executable binary files from Howard’s Seattle Computer Products SCP 86-DOS Resource Website; the files contained revision dates in April 1981. The source code files were written in assembly language.
Getting an early copy of MS-DOS source code wasn’t so easy; it isn’t sitting around online, which is understandable because it’s a commercial product from an ongoing company rather than open source or developed by a now-defunct company. Because I collect vintage computers, I just happened to have one of the first PC clones, the Compaq “luggable” computer, and a floppy disk containing MS-DOS 1.11 for it.
First, I compared the QDOS source code with the CP/M source code to see if there was any evidence that QDOS was copied from or was a derivative of CP/M. This had to be done in two steps because the CP/M source code included files written in the PL/M programming language as well as files written in assembly language.
Could our suspect software include code copied from a high-level language such as PL/M and translated into a low-level assembly language? It’s not likely: The languages are so different it would be like translating a story from ancient Egyptian hieroglyphic into English. It’s doable, but it’s easier to just write the story from scratch. I compared the files anyway—and voilà: I found a match between programming statements in the two programs.
I kept my calm, though, because these matching statements appeared to be common, simple statements, such as “CALL,” “MAKE,” and “BOOT.” The same held for the bit of correlation I found in some comments and strings; these comments and strings, such as “SECTORS PER TRACK,” are common operating system terms and messages that can probably be found in many programs. A real smoking gun was notably missing: matching sequences of instructions, which would point to similar and possibly copied functioning pieces of software.
I continued my examination of the source code. Again, some correlation came to light, but most of it in this case came from partially matching identifiers, that is, names of variables and functions that match in part but not in whole. Had I found a trail? Partially matching identifiers could be a clue that a clever programmer had copied some code but changed the names enough to appear different while still retaining some meaning. For example, the variable name FirstName might be changed to Fname.
I looked a little closer, but the trail evaporated. I could clearly see that the partially matching identifiers were simply commonly used names or random characters. For example, the identifier ENDMOD in the CP/M source code partially matched the identifiers MOD5 and MOD6 in the QDOS source code.
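To make the idea of a partial identifier match concrete, here is a minimal Python sketch. It is not how CodeSuite works internally, which the article does not describe; it simply assumes that two identifiers "partially match" when they share a case-insensitive run of at least `min_len` characters. The function names and the `min_len` threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher


def longest_common_substring(a, b):
    """Return the longest run of characters shared by a and b, ignoring case."""
    a_low, b_low = a.lower(), b.lower()
    # autojunk=False disables difflib's popularity heuristic so that the
    # result is the true longest common block.
    matcher = SequenceMatcher(None, a_low, b_low, autojunk=False)
    match = matcher.find_longest_match(0, len(a_low), 0, len(b_low))
    return a_low[match.a:match.a + match.size]


def partial_matches(ids_a, ids_b, min_len=3):
    """List (id_a, id_b, shared) for identifiers that overlap but are not equal.

    min_len is an assumed cutoff, chosen here only for illustration.
    """
    results = []
    for a in ids_a:
        for b in ids_b:
            if a.lower() == b.lower():
                continue  # exact matches are a different kind of evidence
            shared = longest_common_substring(a, b)
            if len(shared) >= min_len:
                results.append((a, b, shared))
    return results


# Identifiers quoted in the article: ENDMOD (CP/M) versus MOD5 and MOD6 (QDOS).
for a, b, shared in partial_matches(["ENDMOD"], ["MOD5", "MOD6"]):
    print(a, b, shared)
```

A purely mechanical test like this flags ENDMOD against MOD5 and MOD6 because of the shared fragment "mod", which is exactly why a human still has to judge whether the overlap means anything.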
The CodeSuite tools rely on Internet searches to filter out correlations due to reasons other than copying. For example, if an element is found in two programs and is also found many times on the Internet, it is most likely a commonly used term. If it is found in two programs but nowhere else on the Internet, then it is almost certainly present because it was copied.
Normally, I would filter out all matching elements that had any hits on the Internet at all. In this case, I decided to be a bit more liberal and filter out matching elements that were found more than 100 times on the Internet. In this way, even things that were found in some other programs or documents on the Internet would not be filtered out. I didn’t come up empty-handed; though no identifiers remained and no comments or strings remained, one programming statement did: “jnz comerr.”
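The filtering rule can be sketched in a few lines of Python. This is only an illustration of the thresholding logic described above, not CodeSuite's code: the function name is hypothetical, the hit counts are supplied by hand rather than fetched from a search engine, and apart from the single hit the article reports for "jnz comerr," the counts below are invented placeholders.

```python
def filter_common(matches, hit_counts, max_hits=100):
    """Keep only matched elements found at most max_hits times on the Internet.

    hit_counts maps each matched element to its number of search hits and
    must contain an entry for every element in matches.
    """
    return [m for m in matches if hit_counts[m] <= max_hits]


matches = ["jnz comerr", "SECTORS PER TRACK", "CALL"]

# Placeholder counts for illustration; only the value 1 comes from the article.
hit_counts = {
    "jnz comerr": 1,
    "SECTORS PER TRACK": 25000,
    "CALL": 1000000,
}

print(filter_common(matches, hit_counts))              # liberal rule, threshold of 100
print(filter_common(matches, hit_counts, max_hits=0))  # strict rule, any hit disqualifies
```

Setting `max_hits=0` reproduces the stricter policy of discarding anything that appears on the Internet at all, under which nothing in this example would survive.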
Aha! I thought. Could “jnz comerr” finally be the smoking gun that shows DOS was copied from CP/M? It turns out this statement could be found in only one place on the Internet. Could it be mere coincidence that it appeared in both sets of code?
Breaking this statement down, I determined that “jnz” was a standard assembly language instruction for “jump if not zero.” “Comerr” was a label that both programs use to specify the beginning of some routine; it appeared to be a combination of “com,” which could refer to a communications port or a command, and “err,” which typically means an error. My guess: “Comerr” was a routine that handles either communication errors or command errors. If “jnz comerr” referred to the same function in both programs, it could break this case wide open.
I turned to look at the actual routines in the code and could tell that these were significantly different routines. The QDOS routine was in a file that handles input and output and gets invoked when there is a problem reading a file. The CP/M routine was more complex; it was in a file that handles command processing and gets invoked when there is a problem with a command. These routines have no relationship to each other and therefore do not signify copying.
I went on to try to compare object code for MS-DOS using that old MS-DOS 1.11 floppy. I used a CodeSuite tool called BitMatch to compare MS-DOS 1.11 binary code with the CP/M source code files from CP/M version 2.0 and miscellaneous incomplete sets of source code files from copies of CP/M version 1.4 and earlier versions that were available on the Unofficial CP/M Web Site.
Using binary files in comparisons is not foolproof; it’s possible that the copying may not show up because the compilation of source code into binary can eliminate telltale elements. If matches appear even after filtering, then the files have almost certainly been copied. However, if no matches appear, it doesn’t mean that copying hasn’t happened, just that it has gone undetected.
Still, it was a test worth running. And, indeed, comparing binary with source code, I found 80 matching identifiers. But with a couple of exceptions, these identifiers were all common words from operating systems and programming or just from the English language. I also found 11 matching strings, but again, these strings were all common words or phrases. And once I filtered the matching elements to eliminate common identifiers found more than 100 times on the Internet, all the matches evaporated.
Next I compared the MS-DOS 1.11 binary code with CP/M binary code. There was only one matching identifier: “com.” This is a common abbreviation for a communication port like a serial port or printer port, and certainly not a sign of copying. There were also 65 matching strings, but they were all common words or phrases used in many operating systems.
So far, every trail had come up cold. But I had one more trick up my sleeve. Legend tells us Kildall himself buried a secret message in CP/M and that the message can also be found in MS-DOS.
In 2006, science fiction writer and technology reporter Jerry Pournelle said on “This Week in Tech,” an Internet radio show, that this secret command triggered the display of a copyright notice for DRI and Kildall’s full name. According to Pournelle, Kildall had demonstrated this command to him by typing it into DOS; it produced the notice and thus proved that DOS was copied from CP/M.
This story, circulated for years, has a few problems. First, no one knows the secret command; Pournelle claims he wrote the command down but has never shown it to anyone. In addition, such a message would be easily seen by opening the binary files in a simple text editor unless the message was encrypted. CP/M had to fit on a floppy disk that held only 160 kilobytes; Kildall’s achievement was squeezing an entire operating system into such a small footprint. But it is difficult to imagine he could do this and also squeeze in an undetectable encryption routine. And although we’re now in an era of hackers breaking into heavily secured computers, no one has ever cracked DOS to find this secret command.
But I set out to look for it anyway. I used a utility program developed at SAFE to extract strings of text from binary files. Not only did Kildall’s name not show up in any QDOS or MS-DOS text strings, it did not show up in CP/M either. The term “Digital Research” did appear in copyright notices in the CP/M binary files, but not in MS-DOS or QDOS binary files.
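The SAFE utility itself is not described here, but the general technique of pulling text out of a binary is simple enough to sketch. The Python below is a rough stand-in that behaves much like the Unix `strings` command: it assumes plain, unencrypted ASCII text and reports every run of at least `min_len` printable characters. The function names and the sample bytes are made up for illustration and do not come from any real CP/M or MS-DOS file.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters in data."""
    # Bytes 0x20 through 0x7e are the printable ASCII range, space through tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def contains_term(data, term, min_len=4):
    """Case-insensitive check for term inside any extracted string."""
    needle = term.lower()
    return any(needle in s.lower() for s in extract_strings(data, min_len))


# Made-up sample bytes: two readable strings surrounded by non-text bytes.
sample = b"\x00\x01Copyright Example Corp\x00\xff\x02ok\x00Insert disk\x00"

print(extract_strings(sample))
print(contains_term(sample, "example corp"))
print(contains_term(sample, "Kildall"))
```

To scan a real file, one would read it with `open(path, "rb").read()` and pass the resulting bytes to `contains_term`. A search like this only finds text stored in the clear, which is why an encrypted message would escape it.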
If Jerry Pournelle did indeed see a hidden message revealed by a secret command, it was not in MS-DOS.
And that is that. Every lead brought me not to Bill Gates but to a dead end. QDOS was absolutely not copied from CP/M, and MS-DOS showed no signs of copying either. Kildall’s accusations about Bill Gates were totally groundless.
Gary Kildall’s fate was sad. He died in 1994 at the age of 52. The circumstances of his death are as muddied and debated as the missed meeting with IBM. He suffered a head injury in a California biker bar—some reports describe a brawl, others a fall from a chair or down a staircase, and still others report a heart attack. Some claim that Kildall committed suicide and his family covered it up. Most agree that alcoholism, in one way or another, led to his death.
Kildall indeed deserves credit for creating the first personal computer operating system, but his operating system didn’t come out of nowhere; it was essentially a simpler version of many other operating systems in use at the time, including Unix, developed in 1969, and VAX/VMS, introduced in 1978. And while Kildall is sometimes remembered as a pauper for “being cheated by Bill Gates,” DRI was actually a successful company for many years, and Kildall sold it to Novell in 1991 for $120 million. Kildall was undeniably very creative and innovative, but he was also a poor businessman who was nonetheless very successful. If he was not as successful as Bill Gates, it wasn’t because Microsoft stole the CP/M source code.
About the Author
Bob Zeidman is the president and founder of Zeidman Consulting, a premier contract research and development firm in Silicon Valley. He is also the president and founder of Software Analysis and Forensic Engineering Corp., a provider of software intellectual property analysis tools. Zeidman has worked on and testified in more than 100 cases involving billions of dollars in disputed intellectual property. His latest book is The Software IP Detective’s Handbook (Prentice Hall, 2011). And that deerstalker hat? He purchased it at 221B Baker St. in London, the fictional home of Sherlock Holmes.
To Probe Further
To download a zip file of the code used for this analysis, along with the code comparisons and the string extractions, go to http://www.zeidmanconsulting.com/DOS_comparisons.