Sizing the journals market with CrossRef

Adam Day
5 min read · Jan 25, 2021

Let’s do a thought experiment. Imagine there are just 2 journal publishers in the world: P1 and P2.

In year 1:

  • P1 publishes 100 articles
  • P2 publishes 0 articles

Then, in year 2:

  • P1 publishes 9,900 articles.
  • P2 publishes 100 articles

Here’s the weird thing. P1 has grown very rapidly in absolute terms, but the market share of P1 has declined from 100% to 99%. This might seem a little odd, but it’s really just a feature of any market where new entrants are common.

  • New entrants will always take some market share (unless they publish nothing).
  • Large publishers have a high probability of losing market share.
  • It’s also important to remember that, when we look at a rapidly growing (or rapidly changing) market, comparing one year with another is not comparing apples with apples. The market itself is a different thing from one year to the next.
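The arithmetic behind the thought experiment can be checked in a few lines. This is only a sketch of the calculation described above; the function name `market_shares` is my own label, not something from CrossRef.

```python
def market_shares(article_counts):
    """Return each publisher's share of the year's total output, as a percentage."""
    total = sum(article_counts.values())
    if total == 0:
        return {name: 0.0 for name in article_counts}
    return {name: 100 * count / total for name, count in article_counts.items()}


# Article counts taken straight from the thought experiment.
year_1 = {"P1": 100, "P2": 0}
year_2 = {"P1": 9_900, "P2": 100}

shares_1 = market_shares(year_1)
shares_2 = market_shares(year_2)

# P1's output in year 2 divided by its output in year 1.
growth_p1 = year_2["P1"] / year_1["P1"]

print(f"P1 share: {shares_1['P1']:.0f}% -> {shares_2['P1']:.0f}%")
print(f"P1 output multiplied by {growth_p1:.0f}")
```

P1's output is multiplied by 99, yet its share still drops by one percentage point, because P2 went from nothing to something.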

With all of this in mind, it might be interesting to look at some industry data from CrossRef.

A big noise

First, a reminder for readers of my blog. This graph shows the approximate growth of published research articles. Strictly speaking, it shows total CrossRef DOI registrations per year. CrossRef data is noisy, but this is a reasonable proxy for the number of articles published each year. Note that the number of articles published per year is very high and increasing very rapidly.

Scientific-paper production is low until the end of World War 2. It then picks up until 2000, when the growth rate appears to increase again, perhaps because publishing online is easier than publishing in print?
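For anyone who wants to reproduce a count like this, here is one possible way to ask the public CrossRef REST API how many works carry a publication date in a given year. It is a sketch under assumptions: the post does not say how the data was actually collected, and the helper names (`yearly_count_url`, `total_results`, `fetch_yearly_count`) are hypothetical. The query uses the API's `from-pub-date` and `until-pub-date` filters with `rows=0`, so that only the total comes back.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.crossref.org/works"


def yearly_count_url(year):
    """Build a query that asks only for the number of works published in `year`."""
    date_filter = f"from-pub-date:{year}-01-01,until-pub-date:{year}-12-31"
    query = urllib.parse.urlencode({"filter": date_filter, "rows": 0})
    return f"{API_ROOT}?{query}"


def total_results(payload):
    """Extract the match count from a decoded API response."""
    return payload["message"]["total-results"]


def fetch_yearly_count(year):
    """Perform the network request (not called in this sketch)."""
    with urllib.request.urlopen(yearly_count_url(year), timeout=30) as response:
        return total_results(json.load(response))


# Example usage (requires network access):
#   counts = {year: fetch_yearly_count(year) for year in range(2015, 2021)}
print(yearly_count_url(2020))
```

This counts every registered work, not just journal articles, which matches the "total DOI registrations" framing above. Adding a type filter would narrow it, at the cost of trusting the type metadata.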

At the same time, the number of CrossRef members registering DOIs has grown very swiftly. (I’ve included some caveats in an appendix, but for the sake of this blog post I’m treating that number as a proxy for the number of active publishers in each year.)

Note the drop in 2020: I suspect that this is because some small publishers have not yet got round to registering DOIs from last year with CrossRef. So it appears that we will have missing data for recent events.
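Counting "active publishers" in this sense means counting the distinct members that registered at least one DOI dated in each year. A minimal sketch, assuming the data has already been reduced to `(member_id, year, doi_count)` records; that record layout and the sample values are invented for illustration and are not CrossRef data.

```python
from collections import defaultdict


def active_members_per_year(records):
    """Count distinct members with at least one DOI in each year.

    `records` is an iterable of (member_id, year, doi_count) tuples.
    """
    members_by_year = defaultdict(set)
    for member_id, year, doi_count in records:
        if doi_count > 0:
            members_by_year[year].add(member_id)
    return {year: len(members) for year, members in sorted(members_by_year.items())}


# Made-up records: member "C" has not (yet) registered anything for 2020.
records = [
    ("A", 2018, 120), ("B", 2018, 15),
    ("A", 2019, 130), ("B", 2019, 14), ("C", 2019, 8),
    ("A", 2020, 150), ("B", 2020, 12),
]

active = active_members_per_year(records)
print(active)
```

In this toy data the count falls from 3 to 2 in the final year purely because one small member is missing, which is the kind of artefact suspected for 2020.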

The obvious

Despite the limitations of the data, there are 2 super-obvious long-term trends here:

  • Rapid growth in research papers being published
  • Rapid growth in the number of organisations publishing research.

The less-obvious

This leads to a surprising effect: almost all large publishers have declining market shares despite experiencing rapid growth. Elsevier, for example, has grown in absolute terms more than any other publisher, but it has a declining percentage of the global total (and has done for 20 years).

  • This decline in market share seems to begin around the year 2000
  • It also seems to be partly caused by new entrants and small publishers. As a group, the bottom 90% have had a growing market share over most of the last decade. (The cut-off changes constantly, but right now, the bottom 90% of publishers each publish fewer than 400 articles a year. So it appears that very small operations are influencing this change.)

The thin rainbow-coloured lines at the bottom of this image show the proportion of DOIs registered by the bottom 90% of CrossRef members. Remember: year-to-year, this is apples-to-oranges. The market is very different in 2020 compared to 2000.
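The "bottom 90%" figure can be computed for any single year by sorting publishers by output, taking the smallest 90% of them, and dividing their combined output by the total. The sketch below shows one way to do that; the function name `bottom_share` and the ten sample counts are invented for illustration, and rounding the group size down is my own choice.

```python
def bottom_share(article_counts, percent=90):
    """Return (share, cutoff) for the smallest `percent` of publishers.

    share  : proportion of all articles published by that group
    cutoff : output of the largest publisher inside the group
    """
    counts = sorted(article_counts)
    group_size = len(counts) * percent // 100  # round down to whole publishers
    group = counts[:group_size]
    total = sum(counts)
    if not group or total == 0:
        return 0.0, 0
    return sum(group) / total, group[-1]


# Ten hypothetical publishers: nine small ones and one very large one.
counts = [5, 10, 20, 40, 60, 100, 150, 250, 365, 9_000]
share, cutoff = bottom_share(counts)
print(f"bottom 90% publish {share:.0%} of articles; cut-off is {cutoff} articles")
```

Because the cut-off is recomputed from each year's own list of publishers, the group being measured changes from year to year, which is why the comparison is apples-to-oranges.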

I should emphasise that declining market share is not the same as declining output or a change in rank. Large publishers do not often change their ranking significantly. It is hard to visualise this with CrossRef data because growth often happens through mergers and acquisitions, which aren’t shown in CrossRef data.

The key thing to take away here is that this is a market with

  • a small number of large, rapidly growing organisations and
  • a lot of small new entrants.

Interestingly, this means that those large organisations experience a loss of market share.
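A toy simulation shows how these two ingredients combine over several years. Every number in it is made up: one large publisher growing 5% a year from 1,000 articles, and 10 new entrants a year that each publish 10 articles annually and never grow. It is not fitted to CrossRef data.

```python
def simulate(years=10, start=1_000, growth=0.05,
             entrants_per_year=10, entrant_output=10):
    """Return (year, large_output, large_share) rows for one large publisher."""
    rows = []
    for year in range(years + 1):
        large = start * (1 + growth) ** year
        # Entrants accumulate: after `year` years there are
        # entrants_per_year * year of them, each with constant output.
        small = entrants_per_year * entrant_output * year
        rows.append((year, large, large / (large + small)))
    return rows


rows = simulate()
for year, large, share in rows:
    print(f"year {year:>2}: large publisher output {large:7.0f}, share {share:6.1%}")
```

With these assumed parameters the large publisher's output rises every year while its share falls every year. With different parameters, or over a long enough horizon, compounding growth could eventually outpace a steady trickle of entrants, so this illustrates the mechanism rather than making a forecast.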

About 2020…

You might have noticed that 2020 stands out as a little different in the above charts.

  • Elsevier seems to have regained some market share (indeed, we see the same thing when we look at other large publishers individually).
  • The top 10% of publishers appear to have generally recovered market share, too.
  • Recall that journal submissions surged in 2020.

The Covid surge no doubt explains some of the growth in the output of publishers in 2020. However, given that a lot of small publishers do not (yet) appear in the 2020 data, I think it’s too early to say whether there is a new trend in market share here. It may simply be that we are counting the growth in large publishers first.

Nevertheless, it will be interesting to watch this space. The ongoing transition to Open Access has led to the creation of a number of novel Open-Access deals between large publishers and their customers. This might mean a change to the trends we see above.

APPENDIX: the caveats

I’m always careful to point out the limitations of any analysis where I see them. CrossRef data is very noisy and, consequently, extracting insights from it is hard. Below, I will list some of these issues which might help to contextualise the blog post above.

  • Not all research articles get CrossRef DOIs, so we know the data is incomplete.
  • I have used ‘DOI-registrations’ and ‘published articles’ interchangeably, but these things are not equivalent to one another. Not all DOIs represent unique research articles. E.g. sometimes when a publisher acquires a journal, they re-register the DOIs. This potentially means that some articles have multiple DOIs.
  • I also used ‘CrossRef members’ and ‘publishers’ interchangeably. The same caveat applies. Not all CrossRef members are publishers. There are digital libraries, preprint servers, conference proceedings, predatory publishers and other entities registering content on CrossRef. In some cases, DOIs registered by these organisations should not be counted as unique research articles, but I haven’t found a way to filter those out reliably.
  • The graph showing the count of CrossRef members shows data as it is now, so in cases where a publisher has been bought by another publisher, it appears as though the publisher that was bought never existed at all (all of the content being assigned to the purchasing publisher). With that in mind, this graph does not show new entrants well. It’s another proxy: the total number of CrossRef members active in each year.
  • The dates assigned to DOI-registrations above appear to be the earliest publication date associated with the article. In some cases, this is wildly wrong. E.g. in the years after Wiley and Blackwell merged, the DOIs assigned by Wiley surged sharply. My best guess is that DOIs were issued for Blackwell’s archival content and the incorrect publication date was applied.
  • We see other unexplained spikes and dips throughout CrossRef data. My assumption is that when we look at long term global trends, these sources of noise average out to some degree, but I can see some unexplained features even at a high level.

Adam Day

Creator of Clear Skies, the Papermill Alarm and other tools clear-skies.co.uk #python #machinelearning #ai #researchintegrity