User talk:Maunus/PeerReviewReform

Comments welcome

I have discussed aspects of this proposal with Laser brain (talk · contribs), Victoriaearle (talk · contribs), Mike Christie (talk · contribs), Anthonyhcole (talk · contribs), Sphilbrick (talk · contribs), Atsme (talk · contribs), Fixuture (talk · contribs), Anthere (talk · contribs), and Fuzheado (talk · contribs) - all of whom I hereby ping to request their input regarding this proposal. ·maunus · snunɐɯ· 06:41, 13 August 2016 (UTC)

Initial comments

Maunus, I haven't had time yet to read this in detail and do the archive searching I'd like to do, but here are some initial comments about the list of problems you placed at the top of the proposal.

In your listing of problems, you say GA reviews tend to be superficial. Do you mean the model (single reviewer) is wrong, so GA reviews are flawed almost no matter what we find when we look at examples? Or do you mean that the model is possibly acceptable, but that reviewers are often executing it badly? Can you give examples?
If, as I'd argue, a key problem is lack of reviewing resources, wouldn't this exacerbate the problem by insisting on multiple reviewers per review? One of the advantages of GA and PR at the moment is that they only require one editor to assist, though often more than one editor contributes comments at PR.
A related comment: your point 3, that first time nominators often fail, is certainly true for FAC, but I would want to see data for GA to be sure it's true there. I suspect it's much less true. (The mentoring approach currently being discussed at WT:FAC seems to me to be aimed at precisely this problem.) If GA does suffer this problem, and if reviewers are in fact unhelpful there, then I agree it's an issue, but again I'd want to see the data. Almost all GA reviews I've looked at in the last year or two have been generally helpful, to the point that I think no additional input is needed to help the nominator over the line.
I don't see how the proposal moves the needle on quality of content reviewing (point 4 in your list); one needs content experts to do content reviewing. As far as I can see, on existing reviews, any content expert who participates does contribute their expertise; and generally speaking non-experts seem to do their best too, though of course they'll miss things. I can't see how this problem will be addressed. Or do you mean that there will be less focus on non-content requirements such as MoS? I can't see how that's beneficial; those things rarely cause a FAC fail in my experience, and are pretty easy to fix once pointed out, and generally worth doing, if tedious.
I'm not aware of any cliques that compromise the reviews at present -- your point 6. Can you give examples?
Finally, you say "the rate at which Wikipedia generates high quality content has stagnated. This is particularly visible in the processes of GA and peer review". Again I think we'd need numbers here. There's no question it's slower than it was in 2006, and I know we're promoting fewer FAs than we were then, but is that true for GA, and when did the slowdown happen?

-- Mike Christie (talk - contribs - library) 11:04, 11 August 2016 (UTC)

I will address your points one by one in order.

There is extreme variability in the way that GA reviews are approached by different reviewers - and the quality of the outcome is entirely dependent on the quality of the reviewer's approach. My most recent experience was the review of Tycho Brahe, which was 100% superficial, using the GA criteria as a simple checklist - it was done by a newly registered user and not a GA regular - but I know that some GA regulars make the same kind of reviews. I have observed this many times with other reviewers - an example I have noted in the past is the GA review of Irataba. In some cases the reviewer does make comments towards improvement, but only along the lines of their particular interests and specialities (for example Tim riley's helpful comments on style and ENGVAR issues in the review of Jean-François Champollion, which did not engage with the content at all). Compare with the GA review for Greenlandic language, which was in many ways more rigorous than some FA reviews - exactly because the reviewer had expertise in the field. Hence, the problem is having a single reviewer, with no additional input on the review process. The model I propose is designed to make the review result in improvements - because the review team can decide to aim for any level they think is within their ability - and because the review process is about improving the article to make sure that it will be approved by the vote (i.e. they are motivated to aim high, and even to over-shoot the target), and not about simply giving the article the stamp (which motivates the GA reviewer to set the bar as low as possible).
In fact this process requires fewer reviewers per promotion - since a Review Team can consist of only the proposer and one other person, it is in effect the same as the current requirements for DYK and GA - and the requirement of three team members for an EA nomination is fewer than the number of support statements currently required for an FA. I think this downsizing of participation in the review is justified by the added !vote which makes the final decision to promote. I don't consider that !voting participation will subtract from the reviewer efforts - since nothing is really required other than !voting. Note that the requirement for having a Review Team is basically the same as the mentorship requirement - except that it makes a team with collective responsibility instead of a mentor/mentee relation. Furthermore, I think that creating a single streamlined review process with a single pool of reviewers will encourage movement between review processes, and grow the reviewer pool.
Data for GAN is welcome from anyone who cares to assemble it. Some of the problems I describe are based on my own impression. It may be that first-time nominators fail less often at GAN - this could easily be a result of superficial reviews and the single reviewer, which makes the outcome quite random.
I think the way the rubric is made makes content quality more salient throughout the process; additionally, I think the more work-intensive review process makes for a better chance that content issues will be noted and addressed before the !vote. The addition of an external review aspect is the single most important part of the proposal for assuring content quality, and I think this aspect is important to integrate into the process either as a requirement or an encouraged option. The aim is to make the review stage somewhat more work-intensive in terms of article improvements during the review process, but less work-intensive in terms of decision making - i.e. focusing reviewers' efforts away from criticizing articles in relation to a checklist or rubric - and more on article improvement. So I hope this means that reviewer labor is better spent in terms of quality improvement.
I am not aware of any cliques that compromise the reviews at present either. I think the cliques that do exist at FAC are actually mostly beneficial because they mutually motivate each other to continue producing high quality content in areas of shared interest. At GA I think there are situations documented where people mutually review each other's work uncritically - this is particularly because of the one-on-one review form. There have been similar concerns about the DYK QPQ requirement in the past. Whether or not the problem actually exists, the !voting stage removes the risk entirely in my opinion.
Again this is my impression, and I invite any kind of empirical study that corroborates or challenges this impression. I do know that at GAN there have been concerns about increasing backlogs and fewer reviewers, but I am not aware of quantitative data on the topic. I will ask at GAN. This data indicates that the monthly increase of GAs has been more or less stable for the past years - suggesting that the backlog is due to an increase in nominations. ·maunus · snunɐɯ· 11:32, 11 August 2016 (UTC)
I'll see if I can find time for a more thorough read through later this week, but I'm interested in seeing other opinions first. Mike Christie (talk - contribs - library) 13:03, 11 August 2016 (UTC)

Me too. I will think about how to produce more data and use that in the "why we need reform" section, as this will likely be crucial if I am to convince others that such a major reform is necessary. Also, knowing the specific details of the different processes will of course make it possible to improve the proposal. Thanks for your time taken in reading this so far! ·maunus · snunɐɯ· 13:19, 11 August 2016 (UTC)

Comments from Iazyges

As a peer reviewer, I don't think that WP:PR needs reforms, except for, as you mentioned, more members to clear the backlog. I do agree a lot with your points about the FA and GA reviews, but the proposal has some weaknesses: "If the community thinks it is a good idea it could be possible to add a fourth level to the rubric for articles that have been vetted by external experts through the external review process, and vouched for by the copyeditors guild." Oh the cabal that doesn't exist just voted that while it doesn't exist, it exists a little more than before. I feel that this lends a bit too much power to the copyeditors, if you're saying that they get a veto ability, which is what it sounds like, but I may be wrong. One thing you didn't mention that should probably be done is to make a separate peer review, explicitly for the improvement of potential GA or FA candidates, leaving WP:PR as a group that gives purely improvement advice, and not criteria checklists as you mentioned. Iazyges (talk) 17:21, 12 August 2016 (UTC)

Note that when I say "peer review" I include all the review processes of Wikipedia, not just the part now called "Peer Review". The "vouching" is basically that a copyeditor looks the article over, copyedits it, and says that it is in good shape for publication. It is not a veto power, because they are expected to actually carry out the copyedits that they think are necessary. I think it is an important part of the proposal that what is now called "Peer Review" is basically made into the "review stage" of the article improvement process. To have a separate unconnected process for non-binding improvement suggestions is a waste of reviewers - since these could be more economically used by making the improvement the main part of the review process and making the "checklist" simply a vote carried out by the community. In a way you can say that my proposal makes the current "peer review" the only peer review process and that FA and GA reviews are basically turned into !votes. It makes the role of peer reviewers like you more significant because you will be the ones who make decisions about when an article is ready for a vote. ·maunus · snunɐɯ· 19:22, 12 August 2016 (UTC)

Maunus, I've made my own reform proposal on the peer review page. I propose splitting the peer review from criteria review: the PR will suggest how to make the article better, and the CR will suggest how to make it fit the GA or FA criteria better. Likely it would go PR->CR->FAC or GAC, but of course you could choose to go any route that ends in FAC or GAC. However, I would support the vote system you propose; my only conflict with it is that I think people wouldn't be neutral. Would they have to list their reasons for voting yay or nay? Because if so I could see that working. Iazyges (talk) 19:53, 12 August 2016 (UTC)

I will look at your proposal. Note that my proposal is to have a single review in order not to split the pool of reviewers, and also that the reviewers do not simply suggest improvements but form part of a team that actually carries all those improvements out, instead of having all that responsibility on a single nominator. The !vote will of course be determined in the same way as at FAC and RFA, where the consensus takes into account the strength of argument and not just the number of votes for or against. The person closing the discussion will therefore be able to discount !votes that are ill-considered and do not present sufficient argument. ·maunus · snunɐɯ· 06:45, 13 August 2016 (UTC)

Maunus, would that person be an admin, or would a new position be made, to ensure neutrality? Iazyges (talk) 14:28, 13 August 2016 (UTC)

Maunus, I can only comment for myself, but most of the time I make suggestions, and only edit if it is entirely apparent that the articles need it, i.e. fixing grammar or spelling, as I generally like to reach consensus on edits if they are being prepared for FAC or GAC. Iazyges (talk) 14:29, 13 August 2016 (UTC)

I don't think it needs to be a new position - as I describe in the proposal, I think it could just be anyone who has participated in promoting a certain number of FA articles, for example five. They would be similar to the current FA-directors, except they wouldn't be specially appointed. I really don't think neutrality is a big problem, since there would be all the same recourse for review of the decision as we have for all our other decision making processes. ·maunus · snunɐɯ· 14:51, 13 August 2016 (UTC)

I understand that that is how most reviewers work at PR, GAN and FAC - they don't generally participate in the editing. I think this is one of the elements that ought to be changed in the process because it creates a problematic social dynamic between reviewer and nominator, by giving a sense of ownership to the nominator and a sense of being the authority to the reviewer. I think a much better process can be created if the nominator and reviewer see themselves as making up a team with the joint responsibility for improving the article. ·maunus · snunɐɯ· 14:57, 13 August 2016 (UTC)

FAC

I've been mulling this over, and have a couple more comments. First, regardless of its merits, the proposal is much too ambitious to have any hope of gaining consensus. I don't think changes this big are achieved in single steps. To get to where you'd like to go, I think you'll have to figure out what intermediate steps might look like.

Second, I felt when I read this that it would not work well for your top level of internal assessment -- let's continue to call it FA. As criteria get more stringent, it takes more and more expertise and practice to judge whether an article meets those criteria. Not every reviewer at FAC has reviewed dozens of articles against the FA criteria, but a great many of them have; some have reviewed hundreds. That experience is one reason why FAC works as well as it does. I think FAC rarely promotes an article that is a clear fail against the criteria, and it's partly because it's a small community. Similarly, the FA coordinators are very good at evaluating the criteria, because they've done it hundreds, perhaps thousands, of times.

People review at FAC for a variety of reasons, but quite a few reviews are done out of a sense of obligation to that community. If you put all of GA, PR, and FA into a single assessment pool, you'll dilute those talents. I think the result would be lower average quality on promoted FAs.

Related to this is a point about resources, which as I mentioned on your talk page is, I think, a key issue in all these process engineering discussions. I have a certain amount of energy to put into reviews; if the reviews require more effort on my part, or are less enjoyable, I'll do less of them. I am a volunteer here, after all; I do this for fun. I think it benefits the encyclopedia when I post an oppose at a FAC. If you refuse to allow me to oppose, or insist that any criticism has to be accompanied by a resolution to the problems I identify, I simply won't join in the reviews of articles I don't want to work on. I think many others will react in the same way. We don't have the skilled resources necessary to implement your scheme, at least for the highest quality articles. The narrow keyhole of FAC is the only way we've found to keep the demand down to at least close to the level of the supply. It's regrettable, but the alternatives look worse to me.

I haven't thought as much about GA and PR, but I suspect the proposal is vulnerable to criticism on similar grounds -- that it requires resources that don't exist. That's a different discussion, though. Mike Christie (talk - contribs - library) 22:19, 15 August 2016 (UTC)

Your points are well taken, even if I disagree with a couple of aspects. Certainly you are right that Wikipedia is generally neither able nor willing to pass reforms of this magnitude - the system is built to keep the status quo, as can be seen with the amount of effort it takes to make even the smallest change to a policy or process. In fact I don't even know what process would be used to make this reform. I will start thinking about how to move in this direction in steps. One would be to institute external peer review. Another could be to change FAC so that it would include an RfC-style !vote that could attract new participants. GA and Peer Review probably could simply be merged into a single process.

For your two other points I simply disagree with your predictions - I don't think a noticeable difference in FA quality will occur (and if it does it will be offset by the increased quantity of FA quality articles), and I also don't think that the concerns about diluting the reviewer pool and about resources are warranted - rather I think we will become able to have more reviewers exactly because talent will be shared across processes and better developed. Having a process based so much on experience that new talent is hardly ever developed is not a sustainable process, and following this path FAC will simply die as the experienced reviewers burn out and no new editors take over. Keeping demand for reviews down to the level of "experienced talented reviewer" is doing a disservice to Wikipedia because it actively hampers quality improvement. Having 2000 FAs at a slightly lower level of quality than today's best FAs is surely better for the encyclopedia than having 200 in which every one is fault-free. Encouraging editors who are not experienced and not willing to run through the gauntlet of FA to produce high quality content is certainly better for the encyclopedia than being 100% sure that no "unworthy article" makes the grade undeserved. Nonetheless, in my proposal you can still post your oppose, you just do it during the !vote instead of during the review - and the review will be dedicated to what should be the core activity of all Wikipedians: collaboratively improving the encyclopedia. ·maunus · snunɐɯ· 06:59, 16 August 2016 (UTC)

Comments from Maile

Mine is a more down-in-the-trenches viewpoint. But in an all-volunteer force where nobody has real authority, this is where you have to convince others to agree to the changes. The following reflects my views only.

1. Consensus to move forward. You would first have to get an accumulated consensus from all of the existing processes to even begin the transformation.

2. Individual processes have individual editors who have found their purpose at Wikipedia by being a regular reviewer, overseer, or in some other level.

3. Any process that has a slot on the main page is not going to be ceded without a battle.

4. GA. I agree that it could be better. It's severely backlogged. Sticking out like a sore thumb is the absence of a notability requirement for a nominated article to be passed. Having a nomination pass with only a single reviewer is lacking. There are no qualifiers for a registered editor to be either a nominator or a reviewer, and one lone opinion passes or defeats a nomination. The well-intentioned GA Cup is in essence a contest to get as many reviews passed as possible, with no oversight of review quality.

5. DYK, for all its critics, is actually a better review process than some. It was set up to encourage new editors and has evolved into something else. The system of checks and balances is one reviewer, and a lot of other people who complain, frequently, that the review missed something. Promoting a nomination to the main page is a two-step process that follows the review: promotion to the Prep area and then promotion to Queue, with each step required to be carried out by a different individual. There is much good that happens there. It's also a somewhat odd phenomenon in that it is often a battleground of wills, with recurring overthrow attempts by some who want something else to take that main page slot. It is an arduous place to get one small change of the guidelines, and most efforts at such either die for lack of attention or get voted down. If you want to consolidate the processes, I predict DYK will be your most difficult hurdle because of stakeholders who will lose their perceived niche.

I believe your proposal is one of the best thought-out I've seen. But besides the existing fact of lack of volunteer staffing, your biggest problem is too many individuals who feel they have a stake in where they are. — Maile (talk) 14:31, 19 August 2016 (UTC)

Comments from Vanamonde

My apologies for taking so long to get to this. Feel free to take my comments with a pinch of salt, because I'm still a relative noob at this entire process: at last count I'd been through (as reviewer or nominator) 2 FACs, 5 peer reviews, 9 GANs, and somewhere around 60 DYKs.

First off, thanks for pulling this together. No idea if this will go anywhere, but I most certainly agree with you that reform of some sort is necessary, for most of the reasons that you have mentioned. That said, I generally believe that the Wikipedia community is highly conservative, in the sense that it is resistant to change. So some of my comments are not so much what I think would be ideal, but what might be necessary to get this anywhere.

Proposal substance

I do like the idea of collaborative editing of an article. I wonder, though, whether there is a need to have a formal "proposal stage" for this: it seems a little bureaucratic. Why not list a review, and have people move in to give suggestions/make improvements from the get-go?

In a similar vein, I think one of the most important goals of this project is to address the problem with participation and backlogs. I wonder, though, if "reviewing" becomes a more intensive process than it currently is (wherein somebody can make suggestions, but is not expected to actually work on the substance) what the effect is going to be on participation.

I think the idea of scrapping the stub-start-c-b grades is an excellent idea: the last three, especially, have become quite arbitrary. I do feel, though, that we are going to need two levels below GA: because we have many articles which would not pass the "BA" criteria as you have outlined them, and yet are so obviously notable that they cannot be deleted.

This is rather a personal quibble, but I at least view the role of the DYK process as quite different from GAN, PR, and FAC. Those three are quality reviews: DYK is essentially, in my view, a process of showcasing new things by putting them on the main page. Now it is an open question whether the DYK project is viable in the long-run, but for the moment, I'm going to assume that it will continue to exist. In that case:

Having all BAs be eligible for DYK would make many million eligible all at once (or am I misunderstanding something?)
Trying to integrate the DYK process into this is introducing a rather odd "hook" requirement into what is essentially a quality review.

Given these issues, I'm wondering whether it makes more sense to delink the DYK process entirely. Have one review process for quality reviews, essentially as you've described it but without the DYK component. The way I see it, every article would start out as an "SA" (or start-class, for want of a better term). An editor could choose to nominate it for any of the thresholds, and it could end up anywhere, but for an article that's been through the review process, "BA" would be a minimum. Meanwhile, we would have a completely delinked process for putting things on the main page, sort of as we do today: DYK, ITN, TFA, OTD, FP, and FLC.

I actually agree that integrating DYK might not be a good or necessary idea - and that the hook requirement is rather odd in the context. Maybe we should decouple DYK from the review process; that might also avoid some of the reluctance from the regulars of that process. Of course in the process there actually are two levels below GA - one is BA, the other is "not yet BA" (basically the same as "stub"). But there is no reason to have the status below BA named and designated on articles, since it will simply be assumed that any article that hasn't been reviewed yet by default has that status. ·maunus · snunɐɯ· 14:17, 29 August 2016 (UTC)

Sure, it doesn't actually need a formal name, though perhaps one would make the process easier to understand. ~~

Personally, I'm not a fan of the "nom gets removed after x period" notion. At DYK, at least, a nomination being old creates pressure to review it, and thereby every article is guaranteed a review of some sort. I think this is essential to a review process: otherwise it is going to create an uphill battle for those folks who work on the less popular topics. This is already an issue, but if we make ignoring nominations a possibility, I think it would get worse.

Passing the proposal

I'm surely not the first person to point this out, but there's a lot of folks invested in our current process, and others who react badly to reform proposals, so it occurs to me that a way to get it to pass would be to minimize cosmetic changes and/or break the proposal up into chunks that would still be improvements on their own. So here's a few ideas, which may sometimes be mutually exclusive.

Keep the "FA" designation in place. Lots of folks worked hard to get those shiny stars, and understandably, might be attached to them: I'd be attached to mine, I know, had I any.

Similarly to what I've said above, restrict the proposal to GAN, PR, and FAC.

Alternatively, start with a proposal just to merge the three review processes. Turn them all into an FAC-like process, with multiple users commenting, and a few coordinators assessing consensus after a brief while. The reform to the process itself can be a separate chunk. While this may not achieve many of your objectives, it would still be an improvement, IMO.