The most surprising thing I’ve learned from overseeing the peer-review process is that science is an inherently subjective pursuit.
I’d always known science as the realm of the cold, hard fact, of empirical observation, and of steadfast, unyielding logic (of course it is!). But I’ve learned that, as long as science is a human endeavour, opinion is there too.
Don’t believe me? Imagine you are asked to referee a paper which rests on certain assumptions. How do you feel about those?
Data science in the realm of scholarly publishing is challenging for this reason: subjectivity finds its way into the data, too.
Citations: the currency of science
Citations are the go-to unit of data for measuring the quality of scientific research. You know how this goes:
- When an author writes a paper, they mention other papers they have read.
- These mentions are called “citations” and they are like a vote for the other paper.
- Naively, counting these citations seems like a simple, handy way to measure the quality of content. Lots of votes means it’s a good paper, right?
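The naive counting described above can be sketched in a few lines. This is purely illustrative; the paper IDs and reference lists are invented:

```python
from collections import Counter

# Each paper's reference list; every entry is one "vote" for the cited paper.
# (Hypothetical data for illustration only.)
references = {
    "paper_a": ["paper_b", "paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": [],
    "paper_d": ["paper_c", "paper_b"],
}

# Tally the votes: a paper's citation count is how often it appears
# in other papers' reference lists.
citation_counts = Counter(
    cited for refs in references.values() for cited in refs
)

print(citation_counts.most_common())
# paper_c collects the most votes here, so by this naive measure
# it is the "best" paper.
```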
Citations matter because scientists are often measured on the quality of their work using citation metrics. Job offers and funding rest on citations. In this sense, citations actually do translate into real currency.
I once ran a project to try to predict citations for research papers. It turns out that you can predict citations very well if you have enough of the right data, but what’s most interesting to me is which data is most effective in predicting the citations of a paper.
- Authorship tops the list. If you find a paper written by an author who has already received a lot of citations, you can bet that paper will also be cited (see Fig. 1, Table 2).
Authorship correlates with citations for a number of reasons, but I am certain that we see bias in the data here. Furthermore, bias isn’t just limited to citations that DO happen. Lesser-known authors in lesser-known journals might not get cited due to lack of visibility and awareness as well as their peers’ preference for citing others.
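To make the authorship effect concrete, here is a toy sketch of the idea, not the original project's model: fitting a one-variable least-squares line that predicts a paper's citations from its author's prior citation count. All numbers are made up.

```python
# Hypothetical training data:
# (author's prior citations, citations this paper received)
papers = [
    (10, 2), (150, 12), (900, 45), (40, 5), (2500, 110), (300, 20),
]

xs = [x for x, _ in papers]
ys = [y for _, y in papers]
n = len(papers)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least-squares slope and intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in papers) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

def predict(prior_citations):
    """Predicted citation count for a paper by an author with this history."""
    return intercept + slope * prior_citations

print(round(predict(500), 1))
```

Even this crude toy captures the pattern: the more cited the author, the higher the prediction, regardless of anything about the paper itself. That is exactly where the bias creeps in.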
To be clear: I am not suggesting that there is widespread dishonesty in citation practices, just that the culture of citation leads us to a situation where the data is of limited use. (Citation abuse IS a thing, but that’s not what I am referring to here.)
I’ll be blunt: I dislike citation analysis. It’s often based on the assumption that this data is a sound and objective measure of quality. I don’t feel good about that assumption.
Maybe there are some data sources that can allow us to overcome the bias in citation data. Altmetrics are certainly interesting, and it is heartening to see initiatives like scite.ai, which analyses the context of citations in the literature.
However, I think that any measure that is used to quantify scientific quality will likely become biased due to Goodhart’s law:
When a measure becomes a target, it ceases to be a good measure
As with citations, if authors know that they are being measured on anything in the literature, it will affect the literature — potentially to the degree that the measure is no longer useful.
No one likes a whiner
I don’t want to just complain about a problem here and not suggest a solution.
When it comes to measuring quality, perhaps the most obvious alternative to citation data is something I’ve already alluded to: the peer-review system.
- The peer-review system seems ideal. I mean, it’s SUPPOSED to measure quality. That’s what it’s for.
- Indeed, there are numerous efforts underway to open up peer-review data. Some publishers allow authors to publish the peer-review history for their papers and there are also a number of standalone peer-review services out there.
- I have always preferred the idea of a small number of experts performing careful review vs a large number of people voting. To me, this seems more fair.
This is great! It means that data which allows us to measure research quality could be available soon. But it leaves me with one concern: what’s to stop similar bias arising in peer-review if it starts to become a system for measuring scientific quality?
A few moments ago, you were browsing Twitter (I guess) and you clicked on a link to come and read this blog post (thank you!). When you did so, your browser sent a message to this website to tell it that you came from Twitter. That message contained the header “Referer: twitter.com”. Do you see the problem? When the protocol was written, the word “referrer” was misspelled. It has never been fixed.
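You can see the fossilised typo for yourself: the header name in the HTTP standard really is “Referer”, and every tool has to use the misspelling. A minimal sketch with Python’s standard library (the URLs are placeholders):

```python
import urllib.request

# Build a request and set the header that tells a site where you came from.
# The misspelling "Referer" (not "Referrer") is mandated by the HTTP spec,
# so spelling it correctly would simply not work.
req = urllib.request.Request("https://example.com/post")
req.add_header("Referer", "https://twitter.com/")

print(req.get_header("Referer"))
# prints "https://twitter.com/"
```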
When we create new things, we run the risk of setting our mistakes in stone. If the value of open peer review is to be realised, the output needs to be standardised carefully and early. Right now, we have no protocol, but we do have an opportunity to build a better method for measuring the quality of scientific research. We should seize that.