Softpanorama


Nikolai Bezroukov. Portraits of Open Source Pioneers

For readers with high sensitivity to grammar errors access to this page is not recommended :-)


Was the initial Linux kernel code independent of Minix?

Trust but verify

Russian proverb

We should take this opportunity to use the ancient prayer:

UNIX is a trademark of AT&T in the USA and other countries.

Earlier versions of this prayer do seem to exist, it is unclear why the form of words altered. `AT&T' was the Corporation where the Creators of the cult worshiped. The Corporation totally disappeared in the wars and many of its original records were either destroyed or altered by the victor in an attempt to `re-write' history. The placement of the country USA on the four continents has been lost.

The UNIX cult

 

This question arose in 2004 with the publication of the controversial paper "Samizdat" by Ken Brown, in which the author, among other things, questioned the independence of the original Linux kernel code. The paper is not freely available on the Internet, but you can get a general impression of its content from the follow-up paper Samizdat's critics... Brown replies. Here is a relevant quote:

Samizdat concludes that the root of attribution, IP misappropriation, and acknowledgement problems in Linux is ---in fact--- the trust model. Basically, Torvalds and other Linux advocates are admitting to using a ‘three monkeys’ policy for software development: see no evil, speak no evil, hear no evil. Specifically, Torvalds and the Linux kernel management team accept blind source code contributions. Then, they ask for a certification. But the certification does not hold the contributor, the Linux community, or Torvalds legally accountable. Nor does it guarantee that the source is produced in a 'clean room'. Meanwhile users are left to just 'trust' Linux too, legally left to face the ramifications of any significant legal problems. This is a 'wishful thinking' policy, and is not a sound approach for software development. The reality is that, none, including Linus Torvalds, can ever guarantee that code in the Linux kernel is free of counter ownership, or attribution claims. AdTI suggests that the U.S. government should buy and invest in software from a confirmable entity, not from an assortment of unconfirmable sources. AdTI is certain that inevitably, some unfortunate user of Linux will be facing an incalculable legal problem.

Meanwhile, we should also very plainly ask, “who[m] are we trusting?”

In a controversial section of Samizdat, I ask readers to pose some very hard questions about the origin of the Linux kernel. This is for a number of reasons, but especially because the same people that are selling the trust model cannot answer basic questions about what attribution, acknowledgement, and IP credit they may have owed ATT Corporation and/or Prentice Hall Corporation in 1991 when the Linux kernel was introduced. The same community that sells ‘trust’, is the same community that celebrates: the theft of ATT Unix source code in the late 70’s, joked about the theft of Windows source code in February, and commenting on the Cisco source code theft in May wrote in Newsforge, “maybe the theft will be a good enough reason for Cisco customers to check out open source alternatives….(3)”

Isn’t fair to question the character and ethics of individuals that espouse contempt for intellectual property? Isn’t fair to question their character, when the core of their business strategy is trust?

... ... ...

Why do accounts continually assert that Torvalds "wrote Linux from scratch"?

Presumably, Professor Tanenbaum was not in Linus Torvalds's apartment at the time Linux was, to use a phrase recently (but only recently) disclaimed by Torvalds, "invented." Yet Tanenbaum vehemently insists that Torvalds wrote Linux from scratch, which means from a blank computer screen to most people. No books, no resources, no notes -- certainly not a line of source code to borrow from, or to be tempted to borrow from. But in a number of interviews AdTI completed with various individuals about operating system development, almost everyone reported that it is highly unlikely that even a pure genius could start from a blank computer screen and write the early Linux kernel. Suppose he could, would he?

In fact, everyone reported to me the opposite, that it only makes perfect sense to start with someone’s code, or framework, which is the common practice among programmers.

Ken Brown's hypothesis that the initial Linux codebase was a derivative (plagiarism) of the Minix codebase has two dimensions: technical and social.

The technical dimension is the level of code similarity between the relevant versions of Linux and Minix. Similarity between programs is a complex topic, and good algorithms for comparing large codebases are hard to come by. In general this is a problem of program understanding. One can use heuristic approaches along with string-matching algorithms, and several such tools exist, but the problem in general is algorithmically unsolvable. Please note that besides comparing the source code directly, one can compare derivatives such as the symbol table, the procedure call tree, etc. You can also "normalize" source code by using various program transformations to expose identical code sequences even if variables were renamed (for example, by correlating identifiers via an XREF table) and control flow was slightly altered (for example, by converting all conditional statements into some "canonical" form). Those more complex and labor-intensive methods are often used in forensic investigations. For a general discussion of related algorithms see, for example:

The social dimension is the level of respect for IP rights in the open source community. As one of the participants in the O'Reilly Network: Finding Ken Brown's Lies [Jun. 04, 2004] discussion forum noted, Ken Brown is not arguing that Linus did not create Linux from scratch. He is arguing that Linus, like other authors of "GPL-based commercially sold binary software" (he called this class of software "hybrid source"), avoids admitting where his "inspirations" came from. In essence, he states that all such authors do not care about IP rights and avoid acknowledging the works on which they based their creations (thus violating academic ethics). Moreover, he suggests that the GPL community as a whole does not care about breaking IP laws (true, as the GPL is essentially directed at the subversion of this law) and, in fact, vigorously defends such authors no matter what:

You miss his argument
2004-06-05 18:06:28  kollivier

He's not arguing that anyone really believes Linus created Linux from scratch, he's arguing that everyone is *avoiding* admitting where his inspirations for Linux came from. Remember, he said that "hybrid source" (ugh) authors don't care about IP rights. He is suggesting that the open source community doesn't care about breaking IP laws, and in fact is covering up Linus' "inspirations" (i.e. the Prentice Hall book), because arguably he copied their IP while not adhering to their terms of agreement. (Remember, he asks why Tanenbaum tried to get Prentice Hall to re-license the code as BSD.)

Linus' admission that the 'tooth fairy and Santa Claus' wrote the code is not only cute, but it helps AdTI make its point. AdTI can claim it's trying to find out what IP may have been influential in the creation of Linux, and Linus answers the issue with a smart remark. Does this sound like someone who feels IP rights are important? Linus needs to be upfront about his influences, and explain why he was breaking no IP laws in using them, in order to keep this from getting messier.

The guy is arrogant and does indeed write misleading arguments, but you should take him seriously. Misleading arguments or not, he makes his point such that influential people will listen to him, in absence of a mature and responsible response to him by the open source community. Jeering and just trying to pick apart his statements for mistakes actually look like attempts to "divert peoples' attention" from the real issues at hand.

You miss his argument
2004-06-06 18:33:56  kollivier

Yes, but let me use a "Linux-friendly" site to make Ken Brown's point. The site says this: "Linux was created because the licensing requirements for Minix were horribly restrictive, as well as being for-profit." (URL: http://c2.com/cgi/wiki?MinixOperatingSystem )

As far as I can tell in my research, MINIX was moved to the BSD license in 2000. The previous license, also from what I could determine, required licensing for any commercial usage. Thus, any code in Linux that may have come from MINIX could have been a violation of the licensing terms prior to the 2000 licensing change, and this is what Ken Brown is stipulating. (He's effectively saying that Tanenbaum knew there was a problem and tried to get the license changed *because of this*, to protect Linus.)

Is there any truth to this argument? I don't know, really, but no one has said "no" outright. They just keep slamming Ken Brown and saying he has no point, without really saying *why* his points are invalid. He as much as says that people like Tanenbaum aren't going to make his points for him - he's saying they're protecting Linus. The question is: could Linux being inspired by Minix have been a problem (i.e. is there actual code copied), and is it a problem that Linus simply ignored or never considered? I think people's tendency to be "vocal" about the Minix origins were probably because for a long time no one ever thought this was actually a problem. (i.e. most people didn't realize there were any licensing issues at all.)

What I'd like to see is a clear rebuttal, based on hard evidence, of Ken Brown's "facts", and the OSS community has stepped to the plate before, with SCO's FUD, even though they all knew it was FUD. That was because it was important to clearly show the lies and deception their arguments are based on. With Ken Brown, everyone's saying he doesn't have a point, but they skirt around the issue of whether or not Linus actually copied code from Minix. Heck, many people have called it Linux's "precursor". So where's the evidence that no IP rights were broken?

As for respect for intellectual property, the answer is simple: users of the GPL by definition do not and should not respect intellectual property. Software anarchism is an intrinsic property of the GPL, as well as of Stallmanism as a "software cult", and here nothing can be changed (see BSD vs GPL. Chapter 1).

As for the independence of the source code, additional research is definitely needed to answer this question. In my opinion, a substantial level of independence of the initial Linux code from Minix should be expected from several standpoints, first of all because of the nature of the initial reference group for the Linux project:

Another important consideration is purely psychological: why would a clearly gifted programmer put himself through almost a year of solitary confinement only to come out with plagiarized Minix code? That's not a good way to enhance self-esteem, that's for sure. And Linus did not need to pass an OS design exam, as he had already done that ;-).

Of course, the originality of algorithms, especially in a task like recreating a POSIX-compatible kernel, is limited by the complexity of the problem, and here Linus, as he admitted himself, relied on Bach's book. But within the constraints imposed by his source of kernel algorithms, it is reasonable to assume that the first released version of the Linux kernel contained code that was independent of the Minix code to the extent that the algorithms in Bach's book differed from those used in Minix. In a certain sense it was also an improvement on the Minix architecture, as it was written for a much better microprocessor. Subsequent versions, of course, only amplified the initial design differences, so it does not make much sense to compare versions after 0.01.

The implicit assumption of Brown's paper is that "the high speed of development means lifting of the code." This is a questionable hypothesis. Gifted programmers can create 2K lines of code in one weekend if the problem they are working on is well defined. Moreover, the individual productivity of programmers varies greatly. Some programmers have productivity that looks completely fantastic to mere mortals. Bill Joy is probably the best-known example of "light-speed programming"; I may be misquoting somebody from the BSD project here, but it was something like this: "other talented programmers can do anything Bill Joy can and even write better code, but it will take them 10 times longer" ;-). As Bill Joy aptly put it in his 2000 interview with Salon (Free Software Project BSD Unix Power to the people, from the code), rewriting the Unix kernel was not a big deal for anybody with his level of programming talent:

"If I had to rewrite Unix from scratch, I could do it in a summer, easily," says Joy. "And it would be much better. A much, much better job. The ideas are old."

Please note that at the time when Linus wrote the kernel, the amount of material available for such a rewrite was impressive and included the BSD NET/1 tape, Bach's book, Lions' book, Tanenbaum's book and several others, as well as several reference implementations.

This was far from the situation in which you create something from scratch. Also, the first usable version was probably version 0.99, which was released in 1992. So it would be better to take as the "release" date of Linux not version 0.01 (a semi-debugged, barely working, student-project-like version aimed mainly at getting feedback and help from Minix community developers, who wanted a POSIX-compatible version of Minix), but version 0.99, the first usable version. This is not a six-month period but almost a two-year period, and don't forget that the reimplementation was done using a very good book by Bach (complete blueprints) and with the help of the Usenet community, and in no way was "from scratch". Reimplementation is always faster than the original implementation: you simply cannot compare the speed of an icebreaker with the speed of a ship that follows it in clear water.

As another participant in the same forum (O'Reilly Network: Finding Ken Brown's Lies [Jun. 04, 2004]) in his reply to kollivier (see above) noted:

You missed the history
2004-06-06 07:33:39  Moshe

KB first claimed that Linus had stolen code. When Brown couldn't get proof, even after the most tortured, leading questions from people he thought would be hostile to Linus he twisted their words to try to support his theft case. When they ( Tanenbaub, Stallman & Ritchie ) came out and cried 'FOUL!!!' Brown retreated to his current stance.

What is the current stance, really? While the theft angle is almost gone, (it's still lurking in there in the weaselly wording) in a nutshell it's something like "without having been educated in how OSes work Linus would never have written Linux." Or "Without knowledge Linus could never have written Linux".

Well, DUHHH... Ritchie had Knowledge too, of the state of the art at his time. Tanenbaum had knowledge too, about the state of the art at his time.

Brown's current stance has disintegrated, moving away from direct accusations of theft to "if some projects take a long time then all projects must, and those projects that don't take a long time have cheated in some way."

What is interesting in Moshe's comments is that he is right: Unix was analyzed and described in books and the press (DDJ) in great detail. And Linus actually proved that a Unix kernel can be replicated without much knowledge or OS development training, by learning the craft along the way, as in any typical student project. As for the "Linus had stolen the code" claim, please note that Brown is not a programmer, and that Linus's problems should be viewed more in the ethical domain, as in the "violation of academic ethics" claim (complete lack of acknowledgment).

Around the middle of April 2004, Alexey Toptygin, at the request of Ken Brown, performed a very simple code analysis (using source code, not XREF tables or procedure call trees), comparing several early versions of Linux and Minix. The results of this analysis were not included in Ken Brown's paper, probably because its conclusions contradicted Brown's hypothesis about direct borrowing of code.

As I already stated above, I am not a believer in automatic code comparison in complex cases, and I respectfully ask the reader to view its results very skeptically, whether positive or negative. From my point of view Alexey Toptygin's results are at best extremely superficial: my professional experience suggests that for a task as complex as comparing the Minix and Linux codebases, human eyes are the only suitable instrument. Without a lot of blood and sweat it is impossible to find real similarities between any two large, complex programs.

Automatic tools can help, but not much. And indeed a professional can find many flaws in Alexey's analysis. One obvious problem is the tool chosen for the task: he used a very primitive software similarity tester, SIM version 2.12, written by Dick Grune of Vrije Universiteit (Amsterdam, Netherlands). This generic utility does not actually "understand" the C language. Moreover, it is written in the wrong language (unless you are a masochist, a scripting language like Perl should be used for this kind of problem :-). It looks like a "token-based diff", and the only advantage over plain vanilla diff that I see is a rather simplistic built-in lexical normalization that eliminates whitespace, comments and maybe some tokens (like ".", ",", etc). Similar or better normalization can probably be achieved by eliminating all line breaks and comments in a first pass and then running a pretty-printer over the code in a second. SIM looks mostly oriented toward finding plagiarism in English-language submissions rather than toward source code comparison, to say nothing of the task of comparing two kernels. On the webpage for the tool, the author states the following:

"SIM tests lexical similarity in texts in C, Java, Pascal, Modula-2, Lisp and natural language. It can be used - to detect potentially duplicated code fragments in large software projects, - to detect plagiarism in software and text-based projects, educational and otherwise."

The main idea of SIM is to compare a (rather short) student submission against the bank of departmental submissions and find a direct match (detection of student plagiarism). That task has little in common with comparing two operating systems written for different microprocessors (8086 and 386), which is an extremely complex forensic investigation. The outline of the algorithm used in SIM is described below:

The general outline of the similarity checker is as follows:

1. the files are read in (pass 1)
2. a forward-reference table is prepared
3. the set of interesting runs is determined
4. the line numbers of the runs are determined (pass 2)
5. the contents of the runs are printed in order (pass 3)

To keep the memory requirements (relatively) small, the exact positions of the tokens are not recorded.  This necessitates pass 2.  See, however,  the pertinent chapter.

READING THE FILES

Each file is tokenized using an lex-generated scanner appropriate for the input.  Each token fits in one byte, possibly using all 8 bits.  The tokens are stored in the array tk_buff[], which is extended by reallocation if it overflows.  See buff.c. Also, to optimize away pass 2, an attempt is made to remember the token positions of all beginnings of lines.  The token-positions at BOL are stored in the array nl_buff[], which is also extended by reallocation, if needed.  If the attempt fails due to lack of memory, nl_buff[] is abandoned, and pass2 will read the files.
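To make the reading pass more tangible, here is a minimal sketch of what "one byte per token plus the token positions at the beginning of each line" could look like. It is not SIM's scanner: the regular expression, the token codes and the function name `tokenize` are all assumptions made for illustration, comments are not stripped, and the punctuation codes are a crude fold that can collide.

```python
import re

# Hypothetical token codes; SIM's lex-generated scanners define their own.
KEYWORDS = {"if": 1, "else": 2, "for": 3, "while": 4, "return": 5,
            "switch": 6, "case": 7, "break": 8}
IDENT, NUMBER = 64, 65      # one code for every identifier, one for every number
LEXEME = re.compile(r"[A-Za-z_]\w*|\d\w*|->|[^\s\w]")


def tokenize(source):
    """Return (tk_buff, nl_buff): one byte per token, plus the token
    position at the beginning of every line."""
    tk_buff = bytearray()
    nl_buff = []
    for line in source.splitlines():
        nl_buff.append(len(tk_buff))          # token position at BOL
        for lexeme in LEXEME.findall(line):
            if lexeme in KEYWORDS:
                tk_buff.append(KEYWORDS[lexeme])
            elif lexeme[0].isalpha() or lexeme[0] == "_":
                tk_buff.append(IDENT)
            elif lexeme[0].isdigit():
                tk_buff.append(NUMBER)
            else:
                # punctuation: fold the lexeme into the range 128..255
                tk_buff.append(128 + sum(lexeme.encode()) % 128)
    return bytes(tk_buff), nl_buff
```

With this particular (assumed) choice of token classes, `tokenize("x=y;")` and `tokenize("pos   =   offset ;")` produce the same token bytes, because whitespace is dropped and every identifier collapses into one class. Whether a given real scanner behaves this way depends entirely on its token definitions.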

PREPARING THE FORWARD-REFERENCE TABLE

Text is compared by comparing every substring to all substrings to the right of it; this process is in essence quadratic.  However, only substrings of length at least 'min_run_size' are of interest, which gives us the possibility to speed up this process by using a hash table.

Once the entire text has been read in, a forward-reference table forward_references[] is made (see hash.c). For every position in the text, we construct an index which gives the next position in the text where a run of min_run_size tokens starts that has the same hash code.  If there is no such run, the index is 0.

To fill in this array, we use a hash table last_index[], such that last_index[i] is the index of the latest token with hash_code i, or 0 if there is none.  If at a given position p, we find that the text ahead of us has hash code i, last_index[i] tells us which position in forward_references[] will have to be updated to p. See make_forward_references().

For long text sequences (say hundreds of thousands of tokens), the hashing is not really efficient any more since too many spurious matches occur.  Therefore, the forward reference table is scanned a second time, eliminating from any chain all references to runs that do not start with and end in the same token (actually, this is a second hash code). For the UNIX manuals this reduced the number of matches from 91.9% to 1.9% (of which 0.06% was genuine).
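The forward-reference idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the hash function `window_hash` and its table size are invented for the example, positions are 1-based so that 0 can mean "no such run", and the second pruning scan mentioned in the last paragraph is left out.

```python
def window_hash(window, table_size=10007):
    """Deterministic toy hash of a run of small integer tokens (assumed)."""
    code = 0
    for tok in window:
        code = (code * 31 + tok) % table_size
    return code


def make_forward_references(tokens, min_run_size):
    """forward_references[p] is the next position q > p whose window of
    min_run_size tokens has the same hash code as the window at p,
    or 0 if there is none. Positions are 1-based."""
    n = len(tokens)
    forward_references = [0] * (n + 1)
    last_index = {}                    # hash code -> latest position seen
    for p in range(1, n - min_run_size + 2):
        code = window_hash(tokens[p - 1:p - 1 + min_run_size])
        prev = last_index.get(code, 0)
        if prev:
            forward_references[prev] = p   # chain the earlier position to p
        last_index[code] = p
    return forward_references
```

For the token sequence `[1, 2, 3, 9, 1, 2, 3, 8]` and a minimum run of 3, only position 1 gets a forward reference (to position 5), so the quadratic comparison only has to look at one candidate pair. A shared hash code is of course only a hint; the comparison step still has to check the tokens themselves.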

DETERMINING THE SET OF INTERESTING RUNS

The overall structure of the routine compare() (see compare.c) is:

for all new files
    for all texts it must be compared to
        for all positions in the new file
            for all positions in the text
                for ever increasing sizes
                    try to match and keep the best

If for a given position in the new file a good run (i.e. on of at least minimum length) has been found, the run is registered using a call of add_run(), the run is skipped in the new file and searching continues at the position after it.  This prevents duplicate reports of runs.

Add_run() allocates a struct run for the run (see sim.h) which contains two struct chunks and a quality description.  It fills in the two chunks with the pertinent info, one for the first file and one for the second (which may be the same, if the run relates two chunks in the same file).

The run is then entered into the arbitrary-in-sorted-out store AISO (see aiso.spc and aiso.bdy, a genuine generic abstract data type in C!), in which it is inserted according to its quality.  Both positions (struct position) in both chunks in the run (so four in total) are each entered in a linked list starting at the tx_pos field in the struct text of the appropriate file.

When this is finished, the forward reference table can be deleted.

So the final results of this phase are visible both through the tx_pos fields and through the aiso interface.
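The loop structure and the "arbitrary-in, sorted-out" store can be illustrated together. The sketch below is a brute-force version that ignores the forward-reference table, uses the run length as the only measure of quality, and uses a heap in place of AISO; all three are simplifying assumptions, and the names `compare` and `runs_in_order` are chosen for this example only.

```python
import heapq


def compare(new, old, min_run_size):
    """At each position of `new`, keep the longest run that also occurs
    in `old`; register it and continue after it."""
    store = []                                 # heap standing in for AISO
    i = 0
    while i < len(new):
        best_len, best_j = 0, -1
        for j in range(len(old)):
            size = 0
            while (i + size < len(new) and j + size < len(old)
                   and new[i + size] == old[j + size]):
                size += 1                      # ever increasing sizes
            if size > best_len:
                best_len, best_j = size, j     # keep the best
        if best_len >= min_run_size:
            # negate the length so that the longest run is popped first
            heapq.heappush(store, (-best_len, i, best_j))
            i += best_len                      # skip the run: no duplicate reports
        else:
            i += 1
    return store


def runs_in_order(store):
    """Yield (length, position_in_new, position_in_old), best run first."""
    while store:
        neg_len, i, j = heapq.heappop(store)
        yield -neg_len, i, j
```

Runs go into the store in the order they are found and come out ordered by quality, which is all the printing pass needs.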

DETERMINING THE EXACT POSITION OF EACH RUN (PASS 2)

The purpose of this pass is to find for each chunk, which up to now is known by token position only, its starting and ending line number (which cannot be easily derived from the token position).

For each file that has a non-zero tx_pos field, ie. that has some interesting chunks, the positions in the tx_pos list are sorted on ascending line number (they have been found in essentially arbitrary order) by sort_pos() in pass2.c.

Next we scan the pos list and the file in parallel, updating the info in a position when we meet it.  A position carries an indication whether it is a starting or an ending position, since slightly differing calculations have to be done in each case.

Actually, if the nl_buff[] data structure still exists, the file is not accessed at all and the data from nl_buff[] is used instead.  This is done transparently in buff.c.
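When the token positions at the beginning of each line are available, turning a token position into a line number is a binary search. The following sketch assumes 0-based token positions, 1-based line numbers and a chunk given as a half-open token range; the helper names are hypothetical and not taken from pass2.c or buff.c.

```python
from bisect import bisect_right


def line_of(token_pos, nl_buff):
    """1-based number of the line holding the token at token_pos.
    nl_buff[k] is the token position at the beginning of line k + 1."""
    return bisect_right(nl_buff, token_pos)


def chunk_lines(start_tok, end_tok, nl_buff):
    """First and last line of the chunk made of tokens start_tok..end_tok-1.
    The ending position needs the slightly different calculation: it is
    the line of the last token inside the chunk, not of end_tok itself."""
    return line_of(start_tok, nl_buff), line_of(end_tok - 1, nl_buff)
```

For example, with `nl_buff = [0, 4, 4, 6]` (line 2 is empty, so lines 2 and 3 start at the same token position), token 4 is reported on line 3, and the chunk of tokens 2 to 5 spans lines 1 to 3.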

PRINTING THE CONTENTS OF THE RUNS (PASS 3)

Since each struct run has now been completely filled in, this is simple; the hard work is calculating the page layout. Pass3() accesses the aiso store and retrieves from it the runs in descending order of importance.  Show_run() opens both files, positions them using the line numbers and prints the runs.

As one can see from the generic description of the underlying algorithm presented above, SIM's ability to detect similarities in code is very limited, as the tool was designed more for natural text comparison than for code comparison. Still, it is a useful first step, and it is important to note that it did not find direct "lifting" of substantial chunks of the Minix codebase, a task the tool is perfectly capable of accomplishing. The problem is that without a suitable preprocessor that performs lexical normalization using XREF tables and procedure call maps, I am not even sure that simply renaming some frequently used variables (like i to kkk, or something like that) would not fool the tool.
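A very small version of such a normalizing preprocessor can be sketched as follows. It is an assumption-laden illustration, not a description of what SIM does internally: it strips C comments and whitespace and renames identifiers to ID1, ID2, ... in order of first appearance, using a short, deliberately incomplete keyword list. It knows nothing about scopes, macros or call trees.

```python
import re

# Deliberately incomplete keyword list, enough for the example below.
C_KEYWORDS = {"break", "case", "char", "default", "do", "else", "for", "if",
              "int", "long", "return", "short", "struct", "switch",
              "unsigned", "void", "while"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d\w*|->|[^\s\w]")
COMMENT = re.compile(r"/\*.*?\*/", re.S)


def normalize(code):
    """Return the token list of `code` with comments removed and every
    non-keyword identifier replaced by ID1, ID2, ... in order of first use."""
    names = {}
    out = []
    for tok in TOKEN.findall(COMMENT.sub(" ", code)):
        if (tok[0].isalpha() or tok[0] == "_") and tok not in C_KEYWORDS:
            tok = names.setdefault(tok, "ID%d" % (len(names) + 1))
        out.append(tok)
    return out
```

After this step, `for (i = 0; i < n; i++) sum += buf[i];` and `for (kkk=0; kkk<len; kkk++) total += data[kkk];` yield identical token lists, so a plain token-by-token comparison would see them as the same fragment. Changing the operator from `+=` to `-=` still shows up as a difference.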

Still, the tool is adequate for detecting direct copying of fragments. Alexey Toptygin's analysis did not find substantial similarities outside the filesystem-related modules (Minix was a reference codebase; the early kernels had to be booted with Minix, i.e. they were not standalone systems but relied on Minix for the filesystem and the like). Alexey correctly figured out that the main question is the level of similarity in versions starting from 0.95 (released March 1992), as it was just before Yggdrasil released the first CD-ROM distribution. Before that date, non-commercial use of Minix code was completely legitimate (at least from my standpoint), as it was published in a textbook. Alexey included version 0.96c in his analysis. His main conclusion is that:

... out of thousand of lines of code, only 4 small segments were found to be similar, and since in each case the similarity was required by external factors (the C standard, the POSIX standard, the minix filesystem format), it is highly unlikely that any source code was copied either from minix to linux or vice-versa.

IMHO the release of Linux was essentially a stage of development at which such fragments, if they existed before, were eliminated. Here is the text of his report to Ken Brown:

linux versions: 0.01, 0.11, 0.12, 0.96c

and minix versions: 1.1, 1.2

were analyzed and compared against each other for code-level similarities.

The software similarity tester SIM version 2.12 was used for source code comparison.

Since minix is distributed as a full OS, whereas linux is only a kernel, comparisons were restricted to the kernel portion of minix; however, source code line counts are provided for both the minix kernel and the entire minix distribution.

OS                    lines of c    lines of c      lines of c and assembly,
                                    and assembly    counting blank lines
linux-0.01                  7574            8933                        9877
linux-0.11                 10232           11453                       12666
linux-0.12                 14059           15420                       16914
linux-0.96c                29418           29719                       32943
minix-1.1 (kernel)         11386           12780                       14886
minix-1.1 (whole)          26960           29171                       33670
minix-1.2 (kernel)         10800           36007                       37894
minix-1.2 (whole)          19052           59431                       62520

line count of (and links to) raw comparison data:

VS             minix-1.1    minix-1.2
linux-0.01          1503         1496
linux-0.11          1673         1666
linux-0.12          1888         1881
linux-0.96c         6318         6310

VS             linux-0.01   linux-0.11   linux-0.12   linux-0.96c
minix-1.1            1804         1804         1870          2500
minix-1.2            2014         2014         2073          2810

Comparison Analysis:

The raw comparison files are very large, but mostly full of false positives. This is due to the way SIM handles lists of constants and SIM's inability to distinguish between function calls and certain elements of syntax.

Only 4 actual similarities were found. They are excerpted in whole, with reference to the respective source files, and discussed. Since the similar code sections are fairly invariant over all versions of minix and linux compared, excerpts will be taken from linux-0.96c and minix-1.2.

  1. in linux, include/linux/ctype.h:
    #define _U      0x01    /* upper */
    #define _L      0x02    /* lower */
    #define _D      0x04    /* digit */
    #define _C      0x08    /* cntrl */
    #define _P      0x10    /* punct */
    #define _S      0x20    /* white space (space/lf/tab) */
    #define _X      0x40    /* hex digit */
    #define _SP     0x80    /* hard space (0x20) */
    
    #define isalnum(c) ((_ctype+1)[c]&(_U|_L|_D))
    #define isalpha(c) ((_ctype+1)[c]&(_U|_L))
    #define iscntrl(c) ((_ctype+1)[c]&(_C))
    #define isdigit(c) ((_ctype+1)[c]&(_D))
    #define isgraph(c) ((_ctype+1)[c]&(_P|_U|_L|_D))
    #define islower(c) ((_ctype+1)[c]&(_L))
    #define isprint(c) ((_ctype+1)[c]&(_P|_U|_L|_D|_SP))
    #define ispunct(c) ((_ctype+1)[c]&(_P))
    #define isspace(c) ((_ctype+1)[c]&(_S))
    #define isupper(c) ((_ctype+1)[c]&(_U))
    #define isxdigit(c) ((_ctype+1)[c]&(_D|_X))
    
    
    in minix, include/ctype.h: 
    #define _U      0001
    #define _L      0002
    #define _N      0004
    #define _S      0010
    #define _P      0020
    #define _C      0040
    #define _X      0100
    
    #define isalpha(c)      ((_ctype_+1)[c]&(_U|_L))
    #define isupper(c)      ((_ctype_+1)[c]&_U)
    #define islower(c)      ((_ctype_+1)[c]&_L)
    #define isdigit(c)      ((_ctype_+1)[c]&_N)
    #define isxdigit(c)     ((_ctype_+1)[c]&(_N|_X))
    #define isspace(c)      ((_ctype_+1)[c]&_S)
    #define ispunct(c)      ((_ctype_+1)[c]&_P)
    #define isalnum(c)      ((_ctype_+1)[c]&(_U|_L|_N))
    #define isprint(c)      ((_ctype_+1)[c]&(_P|_U|_L|_N))
    #define iscntrl(c)      ((_ctype_+1)[c]&_C)
    #define isascii(c)      ((unsigned)(c)<=0177)
    

    These are the 'character type' macros. They predate both minix and linux, and are a part of the majority of C libraries. They are specified in the ANSI C standard (ANSI X3.159-1989), and are referred to in most C textbooks (i.e. "C++ How to Program" H. M. Deitel, P. J. Deitel --2nd ed. ISBN 0-13-528910-6).
     

  2. in linux, include/linux/stat.h:
    #define S_IFMT  00170000
    #define S_IFSOCK 0140000
    #define S_IFLNK  0120000
    #define S_IFREG  0100000
    #define S_IFBLK  0060000
    #define S_IFDIR  0040000
    #define S_IFCHR  0020000
    #define S_IFIFO  0010000
    #define S_ISUID  0004000
    #define S_ISGID  0002000
    #define S_ISVTX  0001000
    


    in minix, h/stat.h:

    #define S_IFMT  0170000         /* type of file */
    #define S_IFDIR 0040000         /* directory */
    #define S_IFCHR 0020000         /* character special */
    #define S_IFBLK 0060000         /* block special */
    #define S_IFREG 0100000         /* regular */
    #define S_ISUID   04000         /* set user id on execution */
    #define S_ISGID   02000         /* set group id on execution */
    #define S_ISVTX   01000         /* save swapped text even after use */
    #define S_IREAD   00400         /* read permission, owner */
    #define S_IWRITE  00200         /* write permission, owner */
    #define S_IEXEC   00100         /* execute/search permission, owner */
    

    Both the names and values of these constants are specified by the POSIX standard.
     

  3. in linux, in fs/read_write.c:
            switch (origin) {
                    case 0:
                            tmp = offset;
                            break;
                    case 1:
                            tmp = file->f_pos + offset;
                            break;
                    case 2:
                            if (!file->f_inode)
                                    return -EINVAL;
                            tmp = file->f_inode->i_size + offset;
                            break;
            }
            if (tmp < 0)
                    return -EINVAL;
            file->f_pos = tmp;
    


    in minix, in fs/open.c

      switch(whence) {
            case 0: pos = offset;   break;
            case 1: pos = rfilp->filp_pos + offset; break;
            case 2: pos = rfilp->filp_ino->i_size + offset; break;
            default: return(EINVAL);
      }
      if (pos < (file_pos) 0) return(EINVAL);
    
      rfilp->filp_ino->i_seek = ISEEK;      /* inhibit read ahead */
      rfilp->filp_pos = pos;
    

    The behavior of the lseek system call is specified by POSIX. Since it is so simple, practically all implementations will be highly similar.
     

  4. in linux, in fs/minix/inode.c:
            s->s_imap[0]->b_data[0] |= 1;
            s->s_zmap[0]->b_data[0] |= 1;
    


    in minix, in fs/super.c

      sp->s_imap[0]->b_int[0] |= 3; /* inodes 0, 1 busy */
      sp->s_zmap[0]->b_int[0] |= 1; /* zone 0 busy */
    

    This operation is required in order to correctly mount the minix filesystem. All implementations would need this or equivalent code.

Since, out of thousands of lines of code, only 4 small segments were found to be similar, and since in each case the similarity was required by external factors (the C standard, the POSIX standard, the minix filesystem format), it is highly unlikely that any source code was copied either from minix to linux or vice-versa.

I would like to stress again that Alexey Toptygin's analysis is just one small attempt to solve an extremely complex problem and should be interpreted as such.

But the whole question of "partial plagiarism" in software is a different matter if the source is a published textbook. In that case "lifting" substantial fragments from the provided sample code is more permissible than copying fragments from somebody's article or book into somebody else's article or book without proper acknowledgement of the author ("classic plagiarism"). The mere fact that the Minix code was published in a textbook makes reuse for non-commercial purposes legitimate for all readers (with proper acknowledgement and without any warranty). I would say that historically programming textbooks always used the BSD license, even before it existed :-) A typical usage policy for sample code in a programming book is as follows:

You may study, use, and modify these examples for any purpose. This means that you can use the examples, or modified versions of the examples in your programs, and you can even sell those programs. You can distribute the source code to these examples, but only for non-commercial purposes, and only as long as the copyright notice is retained. This means that you can make them available on a public Web site, for example, but that you cannot include them on a commercial CD-ROM without the prior permission of O'Reilly and Associates.

So, formally, before the advent of commercial Linux distributors (November, 1992; the first commercial distribution, Yggdrasil, was based on the 0.99.13 kernel), a substantial portion of the Minix code (as published in Tanenbaum's textbook) could be used in Linux, and even direct usage of substantial portions of Minix code would not represent an infringement. The "Linux commercialization wave" (which produced "GPL-based commercially sold software", called "hybrid source" in Ken Brown's paper) started in 1992, after version 0.99 was released. So it would be better to take as the "release" date of Linux not version 0.01 (which was a semi-debugged, barely working version on the level of a student project) but version 0.99, which was the first really usable version. This is not a six-month period but almost a year, and don't forget that the reimplementation was done using a very good book by Bach (complete blueprints) and not from scratch, as in the case of the original Unix development.

Still, the way Linus defends these trivial similarities is interesting. In his letter to GROKLAW dated December 22, 2003 he stated the following:

For example, SCO lists the files "include/linux/ctype.h" and "lib/ctype.h", and some trivial digging shows that those files are actually there in the original 0.01 distribution of Linux (ie September of 1991). And I can state

- I wrote them (and looking at the original ones, I'm a bit ashamed:  the "toupper()" and "tolower()" macros are so horribly ugly that I wouldn't admit to writing them if it wasn't because somebody else claimed to have done so ;)

- writing them is no more than five minutes of work (you can verify that with any C programmer, so you don't have to take my word for it)

- the details in them aren't even the same as in the BSD/UNIX files (the approach is the same, but if you look at actual implementation details you will notice that it's not just that my original "tolower/toupper" were embarrassingly ugly, a number of other details differ too).

In short: for the files where I personally checked the history, I can definitely say that those files are trivially written by me personally, with no copying from any UNIX code _ever_.

So it's definitely not a question of "all derivative branches". It's a question of the fact that I can show (and SCO should have been able to see) that the list they show clearly shows original work, not "copied".

Anyway, even for early versions the legitimate question of the absence of acknowledgement arises, and I tend to agree with professor Tanenbaum that it is really regrettable that Linus chose not to mention the Minix team in the source code; after all, Minix was the prototype that he started with, and he definitely used the Minix file system and set of tools for development. In this sense Ken Brown's paper raised a legitimate question that was overlooked before.




Copyright © 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author's free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Copyright of original materials belongs to the respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: September 12, 2017