What is (not) plagiarism

A conceptual confusion. Plagiarism cases often rely on computer-based tools that blur the distinction between plagiarism and copyright infringement. This could be a threat to academic values in general.

Plagiarism is not reducible to mere replication of words.

Computer-based tools (including AI) provide an extensive range of new methods to examine the originality of research publications. When it comes to the identification of cases of plagiarism, however, their use often relies on a conceptual confusion that suppresses differentiation between plagiarism and copyright infringement. This confusion further threatens to result in massive devaluation of scientific and scholarly contributions and intellectual work in general.

Computer tools used to examine the originality of written works operate by checking whether combinations of words in a given document coincide with combinations of words found in other documents on the internet. Such coincidence often (though not always) indicates copyright infringement in the most general sense, understood as the misrepresentation of texts written by others as one’s own.
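The word-matching principle described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of any particular originality checker: the function names `word_ngrams` and `overlap_ratio`, the choice of three-word sequences, and the sample sentences are all hypothetical.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of n-word sequences that occur in the text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(candidate, source, n=3):
    """Fraction of the candidate's n-word sequences that also occur in the source."""
    candidate_ngrams = word_ngrams(candidate, n)
    if not candidate_ngrams:
        return 0.0
    shared = candidate_ngrams & word_ngrams(source, n)
    return len(shared) / len(candidate_ngrams)


# Hypothetical sample texts, invented for this illustration.
source = "Water boils at one hundred degrees Celsius at sea level"
# Same words, opposite claim.
negation = "It is false that water boils at one hundred degrees Celsius at sea level"
# Same claim, different words.
paraphrase = "The boiling point of water under normal pressure is 100 C"

print(overlap_ratio(negation, source))    # high: most word sequences coincide
print(overlap_ratio(paraphrase, source))  # zero: no three-word sequence coincides
```

In this sketch the sentence that contradicts the source scores high, while the sentence that restates the source's claim in other words scores zero. A measure of this kind registers the coincidence of words, not of intellectual content.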

Plagiarism, however, is normally understood as «[p]resenting work or ideas from another source as your own, with or without consent of the original author, by incorporating it into your work without full acknowledgement». This Oxford University definition goes hand in hand with definitions stated in various dictionaries that systematically define plagiarism as the misappropriation of «work and ideas» and not as mere replication of words. 

This is reasonable, since «work and ideas» should pertain to the intellectual content and not to the words used to convey that content. It is possible to steal someone’s research work and ideas, scientific discoveries or scholarly contributions and present them as one’s own without repeating the same words, without committing any copyright infringement and without being caught by modern computer-based tools that check the originality of research publications.

This distinction between copyright infringement and plagiarism is best illustrated by an example. Years ago I was reviewing a book whose author argued that synesthesia is not «a phenomenon in which stimulation on one sensory pathway leads to automatic, involuntary experiences in a second sensory pathway.» The Wikipedia article about synesthesia, meanwhile, states precisely that synesthesia is «a phenomenon in which stimulation on one sensory pathway leads to automatic, involuntary experiences in a second sensory pathway».

Since the author’s formulation clearly replicated the one in Wikipedia, and since the author did not acknowledge the source, I thought that this was a clear case of plagiarism. 

The editor of the journal I was reviewing the book for, however, pointed out that I was wrong. Since the same words were used and since the source remained unacknowledged, the situation could fairly be qualified as a minor copyright infringement. However, misappropriation of a combination of words is not necessarily the same as the misappropriation of the intellectual content these words express. In this particular case, the intellectual content the author expressed was clearly different from that of the Wikipedia article, since he directly opposed the statement from Wikipedia. In other words, the accusation of plagiarism could not be sustained.

It should also be clarified that by «copyright infringement» I mean here something much more fundamental than the violation of the legal requirement to obtain permission if one reproduces more than a certain amount of text from another source. In principle, the author of a text has the right to be recognized as the author of the specific combination of words that constitutes the text that he or she wrote. If this combination is replicated and the source is not stated, the author’s rights are violated even in the case of short sections of text for which the law states that permission is not needed, and even if someone else officially owns the copyright. This ought to be simply a matter of the courteous recognition of authorship.

Accordingly, one should differentiate between minor and major infringements. Copying a couple of lines is not the same as publishing someone else’s book under one’s own name, just as stealing a chocolate bar from a supermarket is not the same as robbing a bank.

In the case of plagiarism such distinctions in the level of wrongdoing are typically not made (people do not talk about minor or major plagiarism). If all cases of copyright infringement (understood in this wide sense) are simply qualified as plagiarism, the accurate understanding of the gravity of the wrongdoing is lost. At the same time, equating minor copyright infringements with plagiarism presents such misdemeanors in an inappropriately dramatic light. It blows them out of proportion.

This brings public and media attention to cases that hardly deserve it, while in the long run it threatens to relativize the very concept of plagiarism.

In different disciplines it may be common to tolerate different kinds of copyright infringements. In some disciplines it is not unusual to preface all articles from the same research project with one and the same paragraph that explains the intentions of the project. Each article then continues, after this opening paragraph, with the presentation of the genuine contribution specific to that article. The opening paragraph is thus merely a generic introduction, while the actual novel intellectual content follows afterwards and differs from article to article. In such cases it would certainly be inappropriate to talk about self-plagiarism merely because the opening paragraphs coincide.

Similarly, in every discipline there are commonplace statements that are often repeated and that authors in the given field may use, for instance, to introduce a topic or an argument. Computer tools that are used to check the originality of academic work will typically classify such statements as plagiarism because they are widely repeated in the literature, whereas one should rather describe them as common assumptions or widely shared knowledge.

The tendency to confuse copyright infringements with plagiarism can have particularly unpleasant consequences for researchers in fields where publications typically result from teamwork. If different authors write different parts of the same paper, they can never be sure whether some co-author may have copied parts of the text from elsewhere, and yet the responsibility for the paper is shared by all co-authors.

Accusations of plagiarism in such situations hit the innocent and the guilty alike. At the same time, it is reasonable to expect that all co-authors should agree about the intellectual content of their publication, and if this content replicates someone else’s work that the authors want to present as their own, then the accusation of plagiarism is fair and justified.

The Oxford definition of plagiarism cited above, like other definitions that agree with it, implies that the concept of self-plagiarism is a contradiction in terms. Clearly there are cases when authors repeat the same ideas in new publications, but insofar as these are originally their ideas it is meaningless to talk about plagiarism. Authors are often asked to present their contributions in public or to re-present them in writing in a different form in order to clarify them. It is hard to see how something could be wrong with this standard academic practice.

But authors do commit copyright violations if they use the same combinations of words when they express the same ideas again. Admittedly, such violations can be excruciatingly hard to avoid when writing again on the same topic. An author who has published a book on a subject has typically read that book more than ten times while editing it. The book’s formulations have been deeply impressed on the author’s mind, and he or she has often worked hard to create them. It is therefore not easy to find new words for a topic on which one has already published, and however one tries, the old formulations may still unconsciously sneak into the new text.

It is therefore inappropriate to take the repetition of the same phrases or even individual sentences in such situations as malicious. But obviously, if the text pretends to be a new text on the same topic, then it is a problem if it replicates longer passages from an older text. Here too, it is important to differentiate between minor and major copyright infringements. Talking about plagiarism in such situations merely blurs the difference in degree and threatens to relativize plagiarism as an offence by blowing minor infringements out of proportion. 

Attitudes to what is today called «self-plagiarism» have also changed over time. At the time I was starting my career the major concern was to avoid violating the contract with the publisher, but I do not remember that the term «self-plagiarism» was used the way it is used today. I may be wrong, but I do remember colleagues who regarded referring explicitly to one’s previous publications as shameful self-advertising and avoided it.

I have mentioned above that the failure to differentiate between plagiarism and copyright infringement threatens to result in the devaluation of intellectual work in general. It is very dangerous for academic institutions and research communities to accept the identification of copyright infringement with plagiarism, because this entails that scientific research and scholarly work merely consist of word combinations and that the intellectual content is irrelevant. 

Consider the case of an institution that wants to penalize for plagiarism a candidate who copied a section of text from another person’s work. Clearly, they may have regulations against copyright infringement that penalize the copying of word combinations. But if they identify this copying of words (and not the misappropriation of the intellectual content) as plagiarism, they are saying that they are not awarding their degrees for the intellectual content of the work and ideas, but merely for skillfully crafted combinations of words. 

In principle, establishing that words have been copied from another text cannot be enough to establish that one is dealing with plagiarism; one needs to show that the replication of words was accompanied by the misappropriation of the intellectual content for which the degree is being awarded. 

As mentioned, plagiarism is not reducible to mere replication of words. The tendency to confuse plagiarism with copyright infringement that underwrites the current use of computer tools for checking the originality of scholarly and research contributions should therefore be a matter of serious concern for everyone in academic and research communities. It reduces intellectual contributions to mere word combinations, and it certainly presents a serious threat to academic values in general.

