Saturday, June 05, 2004

Variability in plagiarism

Pharyngula has an excellent response to a case in which a student is threatening to sue his university because he's been punished for plagiarism. I'm not surprised by this student's response (some of the plagiarizers I've caught have been quite defiant), but I am stunned that some people have found grains of truth in the student's argument and believe the student shouldn't be expelled.

The student is claiming that he was never told plagiarism was wrong, and that he'd gotten away with it in the past so he should be allowed to get away with it now. Pharyngula's major points are that ignorance of the law is no excuse, and that getting away with a crime in the past does not mean that you can commit the same crime in the future. To carry the crime analogy further, I'd add that if a student is found to have plagiarized an assignment then it makes perfect sense to reexamine all of that student's prior work for plagiarism as well, just as the police would check to see if a newly caught bank robber's profile fit any unsolved bank robberies.

One of the points that I've seen missing from many discussions on plagiarism is just how varied plagiarism can be. Not all plagiarism is capital-P Plagiarism wherein an entire paper or large fraction thereof has been copied from the net or a book. Among the cases of plagiarism I've caught in a few years of teaching biology, almost every one has been different in scale or style from the others. I can think of two scales for ranking plagiarism severity that can vary independently: the degree to which the student has cited the source of the copied text, and the amount of other people's work the student has copied.

Degree of citation (assuming word-for-word plagiarism):
  1. No literature cited section, in-text citations, or quotes around the copied text.
  2. Literature cited section at the end of the text that doesn't contain the work that has been plagiarized from, with no in-text citations or quotes around the copied text.
  3. Literature cited section at the end of the text that possibly contains the work that has been plagiarized from; the work may be cited incorrectly or too vaguely to be identified (e.g. the root homepage of a university instead of a faculty member's research paper), with no in-text citations or quotes around the copied text.
  4. Literature cited section at the end of the text that does contain the work that has been plagiarized from, but no in-text citations or quotes around the copied text.
  5. Literature cited section with the proper reference and an in-text citation by the plagiarized portion, but no quotes around any of the copied text.
  6. Literature cited section with the proper reference and an in-text citation including quotes around some of the copied text, but with portions of the copied text not in quotes.
  7. Literature cited section with the proper reference and an in-text citation including quotes around all copied text (not plagiarism).

Amount of copied material:
  A. Entire assignment is copied.
  B. Large fraction of the assignment is copied, but the student has written some interlinking portions.
  C. Moderate amount of the assignment is copied, with a moderate amount of original student work.
  D. Small amount of the assignment is copied, possibly only a sentence or two, with the majority being original student work.
  E. Nothing has been copied (not plagiarism).

Combining both axes means that there are at least 24 different ways that students can plagiarize, and I've probably left out a number of variants. Note that neither of these scales properly takes into account plagiarism of ideas but not wording, which is significantly more difficult to detect with absolute certainty, or the variability in source materials used for the plagiarism.
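
The figure of 24 comes from pairing the six citation levels that count as plagiarism (1-6) with the four copying levels that count as plagiarism (A-D). A short enumeration makes that explicit. This is only an illustrative sketch: the names `CITATION_LEVELS`, `AMOUNT_LEVELS`, and `plagiarism_variants` are made up for the example, and the labels such as "6D" simply follow the shorthand used in the rest of this post.

```python
from itertools import product

# Assumed labels: citation levels 1-6 and amount levels A-D are the
# plagiarizing ends of the two scales; level 7 and level E are the
# "not plagiarism" cases and are left out.
CITATION_LEVELS = range(1, 7)
AMOUNT_LEVELS = "ABCD"


def plagiarism_variants():
    """Return every citation/amount combination as a label such as '6D'."""
    return [f"{citation}{amount}"
            for citation, amount in product(CITATION_LEVELS, AMOUNT_LEVELS)]


variants = plagiarism_variants()
print(len(variants))   # 24
print(variants[0], variants[-1])   # 1A 6D
```

The two scales are treated as independent here, so every pairing is counted, including ones that may be rare in practice.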

I've observed cases of plagiarism that vary widely on both of these axes: I've had students take multiple quotes from a source and cite them properly, only to take one or two sentences from the same source and not quote them, and I've had other students turn in papers that are almost entirely copied from a few websites that are not cited. While all of these cases (1-6 and A-D in the lists) are clearly plagiarism, I'm not sure they deserve the same punishment, especially since the milder cases (e.g. 6D) could easily be accidental, caused by nothing more than sleep deprivation and stress. This variation is one reason why I don't like most zero-tolerance policies regarding plagiarism (e.g. automatic F in the course or expulsion), even though I despise plagiarism.

For the most minor cases, a 6D and a one-sentence 5D, I reduce the paper's grade by a goodly number of points and give the student a stern warning. For all other cases my minimum punishment is a 0 on the assignment, the inability to receive an A in the course, and a referral to the dean. This referral ensures that the plagiarism is permanently noted in the student's disciplinary record, and the dean may give further sanctions if desired. I also give myself leeway to be more strict for severe forms of plagiarism; this past semester I caught a 1C student, and in that case I failed the student in the course. I'm relatively comfortable giving my students severe punishments for plagiarism since I provide a lot of guidance on literature citation and avoiding plagiarism in all my classes, including lecturing on the topic, distributing two separate handouts on plagiarism, and including a section on academic dishonesty in my syllabus.

I've also discovered that plagiarism is not easy to identify, or at least it's not easy for me to identify. I first started electronically scanning for plagiarism three years ago when I'd gotten a crop of about 50 student papers, and while grading them noticed that one was pretty suspicious (not telling me things I'd asked for in the assignment, using unusually advanced terminology, etc.). I manually scanned the paper in, used optical character recognition (OCR) software to convert the scans to text files, and subsequently found that the paper had been almost entirely plagiarized from one website. After finding that case I decided to scan in and OCR every paper from that batch, and discovered that ~30% of the papers contained some form of word-for-word plagiarism. I'd read all of those papers, and hadn't suspected that anything was awry in most of them. Thus either I'm especially clueless (not a hypothesis to discount quickly), or plagiarism is harder to detect than one would otherwise think.

To this day I do the same thing, grading papers before I've scanned them for plagiarism, and I'm still surprised to find cases of plagiarism where I'd already read the paper and hadn't suspected anything. Afterwards, I do often spot differences in formatting or writing style between the plagiarized and non-plagiarized portions of the text, but detecting those differences while grading a pile of 100+ papers can be challenging.
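
Once a paper is in text form, one way word-for-word copying could be flagged is to compare runs of consecutive words against a suspected source. The sketch below is an assumption for illustration, not the method described above: the function names `word_ngrams` and `copied_fraction`, the six-word window, and the sample sentences are all hypothetical choices, and a real check would also need a way to find candidate sources in the first place.

```python
import re


def word_ngrams(text, n=6):
    """Return the set of n-word runs in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def copied_fraction(paper, source, n=6):
    """Fraction of the paper's n-word runs that appear verbatim in the source."""
    paper_grams = word_ngrams(paper, n)
    if not paper_grams:
        # Paper is shorter than n words, so there is nothing to compare.
        return 0.0
    shared = paper_grams & word_ngrams(source, n)
    return len(shared) / len(paper_grams)


# Hypothetical sample text, invented for this example.
source = ("Mitochondria are the organelles that generate most of the "
          "chemical energy needed to power the cell.")
paper = ("As I learned this semester, mitochondria are the organelles that "
         "generate most of the chemical energy needed by living things.")

print(round(copied_fraction(paper, source), 2))
```

A score near 1 would correspond to something like an A on the amount scale and a score near 0 to a D or E, though where to draw those lines, and whether the shared text was properly quoted and cited, still takes a human reading of the paper.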

Given the low numbers of plagiarism cases that I hear most other faculty reporting (one or two a year, if that), I suspect that many cases are going undetected. For the past four semesters I've made a huge deal about plagiarism in my classes, including explicitly telling students that I will be electronically scanning their papers for plagiarism, and I've still discovered more than 20 cases (even though I teach biology and assign only two relatively short writing projects a semester).
