What do citations tell us about peer review?

Let’s imagine for a second that we’re editing a research journal. Researchers submit papers to us; we peer-review them to check for flaws, accept the sound ones, and reject the rest.

There are two kinds of mistake we can make:

  • Type I: we reject a paper that should have been accepted.
  • Type II: we accept a paper that should have been rejected.

It’s important to distinguish between the two if we want to evaluate journals properly.

It’s customary to evaluate journals using citation metrics, or citation distributions. But citations are noisy, biased, and varied in meaning, and for a very large proportion of articles they are zero. That makes them an unhelpful proxy for quality.
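
To see the shape of the problem, here is a toy simulation, not real data: it models citation counts with a negative binomial distribution, which is one common way to approximate heavily skewed citation data (that choice, and the parameters, are my assumptions for illustration only).

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy model: draw citation counts from a negative binomial distribution.
# The mean (10) and dispersion (0.5) are made-up numbers, chosen only to
# produce the kind of heavy skew that real citation data tends to show.
mean, dispersion = 10, 0.5
citations = rng.negative_binomial(
    n=dispersion, p=dispersion / (dispersion + mean), size=10_000
)

top_decile = np.sort(citations)[-1_000:]  # the 10% most-cited articles

print(f"mean citations per article:  {citations.mean():.1f}")
print(f"median citations:            {np.median(citations):.0f}")
print(f"share with zero citations:   {(citations == 0).mean():.0%}")
print(f"citations earned by top 10%: {top_decile.sum() / citations.sum():.0%}")
```

In a toy distribution like this, the mean (the kind of number an impact-factor-style metric reports) is driven by a small tail of highly cited papers, while a large block of articles sits at zero. A single journal-level average says very little about any individual article.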

Citations have also been used to measure the quality of individual articles in a journal, and even the quality of the authors who publish in it.

  • If that makes no sense to you, it shouldn’t: it doesn’t.

Type I errors

  • So, what does a high impact factor tell us about our peer review? If the articles we publish go on to be well cited, then at least some good papers are making it through our review instead of being rejected. In other words, it tells us that we are not completely incompetent. Strangely, this phrase does not often appear in journal marketing literature.

Type II errors

Unfortunately, citations don’t help much here. Imagine we want to compare two journals. We could count the zero-cited articles in each; that might be better than nothing, but it wouldn’t be a useful measurement, because:

  1. We don’t know which articles were submitted to each journal. Maybe one made twice as many Type II errors as the other, but if it received 10 times as many low-quality submissions, it actually has the better Type II error rate (see the sketch after this list).
  2. On top of that, citations are biased in a number of ways, so a like-for-like comparison between two journals is impossible.
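
To make the first point concrete, here is a small sketch with entirely hypothetical numbers: Journal A makes twice as many Type II errors as Journal B, but receives ten times as many low-quality submissions, so its error rate is actually lower.

```python
# Hypothetical numbers, purely to illustrate the point above.
journals = {
    "Journal A": {"low_quality_submitted": 1000, "low_quality_accepted": 20},
    "Journal B": {"low_quality_submitted": 100,  "low_quality_accepted": 10},
}

for name, counts in journals.items():
    # Type II error rate = low-quality papers accepted / low-quality papers submitted
    rate = counts["low_quality_accepted"] / counts["low_quality_submitted"]
    print(f"{name}: {counts['low_quality_accepted']} Type II errors, "
          f"error rate {rate:.0%}")

# Journal A: 20 Type II errors, error rate 2%
# Journal B: 10 Type II errors, error rate 10%
```

Counting zero-cited articles only gives us the numerator. Without the denominator, the number of low-quality submissions each journal received, the error rate cannot be recovered.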

Worse still, accepting low-quality (zero-cited) papers doesn’t have a big effect on your impact factor, so the incentive to enforce high standards is low, and this is particularly true for journals that already have a high impact factor.
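
As a rough illustration, with hypothetical numbers and a simplified formula (the real impact factor counts citations over a two-year window divided by citable items):

```python
def impact_factor(total_citations: float, n_articles: int) -> float:
    # Simplified impact factor: citations to the journal's articles divided
    # by the number of articles it published.
    return total_citations / n_articles

# Hypothetical high-impact journal: 200 articles, 6,000 citations between them.
print(f"before: {impact_factor(6000, 200):.1f}")  # 30.0

# Accept 20 additional papers that are never cited: the numerator is
# untouched, only the denominator grows.
print(f"after:  {impact_factor(6000, 220):.1f}")  # ~27.3
```

Zero-cited papers dilute the average but never subtract from it, which is the sense in which the penalty for lax standards is small.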

  • So, what do citations tell us about Type II error rates in peer review? Nothing. Nothing at all. In fact, there is no accepted metric for how good a journal’s peer review is. Isn’t that odd? Peer review is the de facto qualifier for all of science, and we have no way to measure its failure rate.

Assessing assessment

For decades now, we have relied on an opaque method of research assessment. Peer review is carried out in secret, and there has been little in the way of data, research, or innovation to understand and improve the system. That’s odd, because the system is costly and there is obvious value for everyone in improving it.

Some journals are now publishing referee reports, in some cases even for articles that receive negative reviews, which could allow some measurement of those error rates. And some organisations are gathering data on peer review. This is encouraging if we want to evaluate the review process properly.

Change is hard. Authors are often put off submitting to these journals by the risk of a negative review, or by the low impact factors of new entrants with new ideas.

This is a pity, because I think there’s a lot more we can do to improve.
