Distributed Review Process

Examples and motivation

The following examples concern the reliability of Wikipedia and its review process, or the lack thereof. Of course examples cannot prove anything, but they can be helpful for illustrating certain deficits and problems that, according to various people, Wikipedia has.

In an interview, Klaus Holoch, spokesman of the Brockhaus-Enzyklopädie (a renowned German encyclopedia), said that he has doubts about the correctness of Wikipedia. According to him, Wikipedia contains wrong information that nobody will notice; for example, he claims that there are entries about birds that do not exist. He said that he is not afraid of Wikipedia, stating that anyone who has to rely on statements needs sources that verify every statement multiple times. At Brockhaus every entry is checked at least three times by experts. [Rough translation/summary from [1]]. Other media coverage mentions the problem of reliability in Wikipedia too, e.g. [2].

People on various discussion forums (Slashdot, kuro5hin FIXME, and c't, to name only a few) as well as in private mail have raised the question of Wikipedia's reliability a number of times.

The Wikipedia article about the history of Luxembourg [3] contains the following sentence: "The country considers 1835 to be its year of independence". Looking up this date in other sources, e.g. [4] and [5], one finds that 1839 is the correct date, which is in perfect accordance with [6]. This factual mistake has been in Wikipedia for more than two years! FIXME

Some statements are very difficult to check. The English Wikipedia, for instance, states in its article about KDE that KDE 1.1 was released on February 6 [7], while the French Wikipedia says that KDE 1.1 was released on "4 mars 1999" (4 March 1999) [8]. Finding out which is right turns out to be difficult. Not only is it hard to find sensible results with Google at all, but the answer may also depend on which web page you find first:

[9] or [10]. The first is a write-up of a talk about the history of KDE (by a KDE developer), the second is the old web page of the KDE project. The French and English versions have listed these dates for more than a year, and one of them is clearly wrong.

In the past I have had to check statements I found in Wikipedia a number of times, either because I needed to make sure that the information was correct or because a statement did not look "right". In the spirit of Wikipedia I looked things up in a trustworthy source, and if Wikipedia was wrong I corrected it. If Wikipedia was right I was happy about that, but I also felt a bit unsatisfied, since there was no way for me to document that I had checked a statement and thereby give others more trust in that particular statement.

"However, criticism of the reliability of the information in Wikipedia is also emerging. Thanks to the open editorial process, anyone can contribute to articles, but manipulation and misinformation are just as easy." [11] FIXME

[12]- fixme

The traditional peer review process

If we accept for a moment that Wikipedia needs a review process that can be trusted more than its current form, then it makes sense to first look at how journals and other encyclopedias conduct their reviews. In most cases they use some sort of peer review process. This comes in different variants, but almost all of them include the following points:

two or more experts in the field do the review
they read and judge the whole paper or article and recommend whether it should be accepted, corrected or rejected
in most cases the referees work anonymously, which means that the authors of an article do not know who the reviewers are

In general the editor chooses the referees and has the final say on whether an article is published. Often the editor also has some expertise in the field.

The chief rationale for a review process is the experience that one person won't spot every weakness. Having a number of experts look carefully through the article reduces the chance of overlooking a mistake. Of course this is no guarantee that the article is 100% free of errors. Historically, a number of articles have passed peer review and only much later turned out to be wrong. Kempe's 1879 "proof" of the four-color theorem, for example, was considered correct for eleven years until Heawood found a gap in it in 1890.

Some Theses

This section presents a couple of theses about Wikipedia in its current form. The subsections that follow contain more information and some supporting arguments. Based on these theses we formulate some requirements that a good review process for Wikipedia should meet. Understanding the distributed review process itself does not require agreeing with every one of these theses; the only exception is thesis number four, which states that Wikipedia needs a review process.

The incentive for checking is a lot lower than for adding content
Most bugs are shallow
Articles do not necessarily ever reach 100 points
Wikipedia needs an equivalent of peer review
There is no 100% guarantee for the correctness of an article
A "review of a whole article" approach will not work
The review should focus on reliability of information

The incentive for checking is lower than for adding content

How many Wikipedia contributors have taken a whole paragraph or even a whole article and checked the correctness of its statements sentence by sentence? I would dare say that not very many have. One of the reasons for this is that the reward for checking the correctness of statements is very low:

careful checking takes a lot of time. Particularly if you are not an expert in the field, finding and getting access to trustworthy sources can take a long time.
the "good" outcome is finding a statement that is wrong: then at least you can change something. If everything turns out to be correct, nothing changes at all.
even more frustrating than not improving the article is that there is no way of telling the system which statements you have checked. This means not only that your hard work goes unnoticed, but also that others gain nothing from it. If others knew that a statement S1 had been verified by the trustworthy (whatever this means for now) people A, B and C, they could trust the information to a high degree, in contrast to having no information at all about the correctness of S1. And someone who knows that S1 has already been checked three times might prefer to check statement S2, which has not been checked at all, instead of reviewing S1 a fourth time. None of this is possible at the moment.
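The missing bookkeeping described in the last point could be as small as the following sketch. It is a hypothetical model, not part of any existing Wikipedia software: each statement remembers the set of reviewers who verified it, and a helper suggests the least-checked statement so that S2 gets attention before S1 is reviewed a fourth time.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """A single checkable statement plus the reviewers who verified it (hypothetical model)."""
    text: str
    verified_by: set = field(default_factory=set)

    def record_check(self, reviewer: str) -> None:
        # A set ensures that repeated checks by the same reviewer count only once.
        self.verified_by.add(reviewer)


def next_to_check(statements):
    """Return the statement with the fewest recorded checks (first one on ties)."""
    return min(statements, key=lambda s: len(s.verified_by))


s1 = Statement("S1")
s2 = Statement("S2")
for reviewer in ("A", "B", "C"):
    s1.record_check(reviewer)

print(sorted(s1.verified_by))        # ['A', 'B', 'C']
print(next_to_check([s1, s2]).text)  # S2
```

Who counts as "trustworthy" is deliberately left out here, just as in the text.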

When adding content you get an immediate reward: the article gets a little longer and your name shows up in the article history. If you start a new article, even the total number of articles rises. Sometimes others join the editing process shortly after the article is created and you can watch it grow.

Even if you make a mistake or aren't 100% sure about the correctness of a statement, you can hope that others will correct it later. "After all, it's the first version and it doesn't have to be perfect." You don't have to work with the same care as in a review; how many of us really double-check every statement?

Most bugs are shallow

...

Requirements for a review system for Wikipedia

The requirements for a working review system for Wikipedia are different from those for traditional encyclopedias. We rely almost solely on volunteer work and do not have the money to pay experts for reviews. Therefore the traditional peer review system will not work for us.

We could now start considering other review systems; a number of very different approaches have already been suggested, e.g. Wikipedia_approval_mechanism and meta:Peer_review_and_the_Wikipedia_process. Nevertheless, it makes sense to first formulate some properties that a good review system should have. This allows a better comparison of the suggested review systems.

List of features a review process suited for Wikipedia should have:

The most important thing about a review process, of course, is that it works. This means that articles that have passed our review process should be considered at least on par with articles from Britannica etc. regarding the correctness of their statements.
This requirement implies that we need some mechanism that guarantees that every statement has been reviewed by at least one person.
Helping with reviewing should be as easy as possible. Ideally, reviewing should not have a much higher entry barrier than editing an article: there could be a "do review" link below every article, and getting started with reviewing would take only one or two minutes of reading the basic instructions.
We do not want to waste our resources. Everyone who is willing to participate should be able to do so. If a contributor only has time to check part of an article, they should not be forced to check the whole article instead.
The review process should be efficient: it should be clear which articles, and ideally which parts of an article, need a deeper review. Checking some statements fifty times and others not at all is not sensible.
There should be some reward for checking statements. This could for instance be an increase in the article's overall trust level and/or the reviewer's name showing up in the contributor list.
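Several of these requirements can be stated as simple checks over a log of reviews. The sketch below is a hypothetical illustration, assuming each review is recorded as a pair of (statement index, reviewer name): it finds unreviewed statements, tests the "every statement reviewed at least once" requirement, and tallies checked statements per reviewer as one possible form of credit.

```python
from collections import Counter


def unreviewed(check_counts):
    """Return the indices of statements that nobody has reviewed yet."""
    return [i for i, n in enumerate(check_counts) if n == 0]


def review_coverage_met(check_counts, minimum=1):
    """True if every statement has at least `minimum` reviews."""
    return all(n >= minimum for n in check_counts)


def reviewer_credits(reviews):
    """Count reviews per reviewer, e.g. for showing names in a contributor list."""
    return Counter(reviewer for _, reviewer in reviews)


# Hypothetical review log for an article with three statements.
reviews = [(0, "A"), (0, "B"), (2, "A")]
counts = [0] * 3
for idx, _ in reviews:
    counts[idx] += 1

print(counts)                       # [2, 0, 1]
print(unreviewed(counts))           # [1]
print(review_coverage_met(counts))  # False
print(reviewer_credits(reviews))    # Counter({'A': 2, 'B': 1})
```

Pointing reviewers at the indices returned by `unreviewed` is one way to avoid checking some statements fifty times and others not at all.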

The Distributed Review Process

The Distributed Review Process, DRP for short, is based on the thesis that an accurate check of a whole article will not work for thousands of articles. It is therefore necessary to break an article down into smaller pieces, which can then be reviewed independently by many contributors.

The simple idea of DRP is to decompose an article into sentences. Reviewers can then tell the system whether a statement is true or not. Simply put, the number of checked sentences forms a measure of the overall correctness of an article. Unfortunately several things make this simple picture more complicated, but from a user's point of view it will look almost as simple as described. DRP is discussed further below, and additional extensions such as a trust system for Wikipedia are described in the next section.
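The basic measure can be sketched in a few lines. This is only an illustration under simplifying assumptions: sentences are split naively on ".", "!" or "?" followed by whitespace (real articles would need to handle abbreviations such as "e.g."), and `checked` is a hypothetical set of sentence indices that reviewers have confirmed.

```python
import re


def split_into_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def checked_fraction(sentences, checked):
    """Share of sentences whose index is in `checked` (0.0 for an empty article)."""
    if not sentences:
        return 0.0
    hits = sum(1 for i in range(len(sentences)) if i in checked)
    return hits / len(sentences)


article = "S1 is a statement. S2 is another one! Is S3 a question? S4 ends it."
sentences = split_into_sentences(article)
checked = {0, 2}  # hypothetical: reviewers confirmed the first and third sentence

print(len(sentences))                        # 4
print(checked_fraction(sentences, checked))  # 0.5
```

The complications mentioned above (e.g. sentences changing after an edit) are not modeled here.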

Introduction to DRP

As stated above and discussed in more detail in the previous section, a review process that accepts only whole-article checks will not work very well. The primary reason is that checking a whole article is a tremendous amount of work that almost nobody wants to do. Articles like Johann Sebastian Bach contain literally thousands of statements, and there is little incentive to do a review and check every single one of them.

All these thousands of statements need to be checked, and in many cases even an expert doesn't know off the top of their head whether a statement is true. Even worse, it is sometimes very hard to find a trustworthy source at all to verify a particular statement. Since nobody is paid for doing this, it will in most cases inevitably lead to only partially checked articles, which means that we get either no review at all or an inaccurate one.

User:Eloquence wrote the following on his platform page: "My experience on FAC indicates that many people have not even read the articles they support (no surprise, many of them are 40,000 characters and longer). Still, even with this simple process, we only have about 100 articles which have passed this type of review."

This is another example (besides Nupedia) of why a large-scale, traditional-style review process for Wikipedia will fail. A recent proposal for validation can be found on meta, "Proposition of validation by a committee"; for the reasons described above it will run into the same problems.

Basically, the statement that a "review of a whole article" approach will not work is the logical continuation of what we have learned from the failure of Nupedia and the success of Wikipedia: only very few people are motivated to write a whole article, while Wikipedia has proved that many people are willing to add to and improve articles incrementally. This is easy to verify, by the way: almost every longer article was written by at least three people.

The pattern that many are willing to do a little and a few are willing to do a lot can be found in many other volunteer projects as well, and is also known as the Pareto principle or "80-20 rule". It is therefore sensible to accept this distribution of contributions as a fact. A good review system should take it into account by making the best possible use of the available resources, which means that as many people as possible should be able to contribute to the review process even if their time is very limited. For instance, a professor who can only spend a couple of minutes per day on reviewing should nevertheless be able to help.

It should be emphasized again that the review process discussed here concentrates on the correctness of statements only. This is an important difference from the traditional review process, where the reviewer also comments on what is missing and what should be improved. Wikipedia has various other mechanisms for ensuring that an article is complete, as described in the previous section (FIXME).

User:Marco Krohn/Distributed Review Process