The COVID-surge in research papers: explaining the gender disparity

It has been reported in numerous places that the proportion of journal submissions with female authors has fallen since COVID restrictions began in March 2020 (1,2,3,4).

A common explanation for this is that women have taken on a greater share of domestic responsibilities during the pandemic, leaving them less time for research. This may well be true, but I think something else is going on here.

It is well known among academic publishers that, as soon as COVID lockdowns began across the globe in early March, the number of research papers submitted for peer review surged. A recent preprint reported that Elsevier saw year-on-year increases as high as 58%. This matches the surge in submissions that we saw at SAGE (my employer).

Approximate submission totals made via SAGE’s ScholarOne implementation.

At the time, this seemed like good news: while many businesses were suffering because of the virus, ours was actually seeing an increase in activity. But where were these papers coming from? Here are some obvious possible explanations:

  • The new papers were about COVID-19 and so the virus caused some acceleration in scientific output.
  • Lockdown gave academics more time to write papers resulting in an increase in scientific production.
  • The papers had already been written and were just waiting to be submitted somewhere. Specifically, they were predominantly rejected articles, i.e. articles that had been written some time ago, submitted to a journal, rejected, and were then awaiting submission to another venue.

While the first two of these explanations are no doubt true to an extent, I believe that the third accounts for most of the surge.

My $0.02

The best explanation I can see is that lockdown didn’t significantly increase the rate of scientific production, but it did give researchers an opportunity to submit a backlog of rejected articles.

  • There wasn’t really time to write new papers between the start of March and the start of the COVID-surge; the effect seems too quick to be explained by new research.
  • This explanation accounts for the rise in submissions and for something else: a rise in the rejection rate. It appears that the COVID-surge included a lot of lower-quality papers.
A proxy for SAGE’s rejection rate. (The data is very noisy, but you can see the rejection rate rising from April onwards.)

Furthermore, if we look at submissions to ArXiv, there is no COVID-surge: nothing close to the ~50% rise in submissions seen at Elsevier and SAGE. As a preprint server, ArXiv reflects the rate of scientific production more closely than journal submissions do.

(I think there may be some growth in ArXiv submissions after June 2020, but bearing in mind that some ArXiv submissions are postprints, this could be caused by COVID-surge papers being archived after peer review.)

This raises some new questions

If women’s scientific production had fallen during the pandemic, I would expect to see a change in the trend of submissions to ArXiv. But perhaps the change is just too small to see, or is obscured somehow.

Nevertheless, it seems more reasonable to conclude that the proportion of papers by women being submitted to journals has dropped for some other reason. This also seems to be consistent with thorough research findings like these. But why?

  • Frankly, I don’t know. To hazard a guess: since men often have longer careers in science, perhaps they have more rejected articles sitting around waiting to be resubmitted. This would explain the change in the proportion of journal submissions from women, and it also agrees with the other data shown here.
  • But, as I say, it’s just a guess based on incomplete data. Maybe there’s another reason why we see a gender disparity in the COVID-surge. What do you think?

Rejected articles are interesting, aren’t they? My team at SAGE recently released an open-source Python package for tracking rejected articles. At some point, if I have time, I will use this package to check the average time between rejection and publication during COVID times. If I’m right, we’ll see a drop in the average time to publication for rejected articles.
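For anyone wanting to try a similar check, here is a minimal Python sketch. The package’s actual data model isn’t described here, so the `TrackedArticle` record, its fields and the `mean_days_to_publication` helper are hypothetical stand-ins, and the 1 March 2020 cut-off for “COVID times” is an assumption. The sketch groups articles by the date they were rejected; grouping by publication date instead would answer a slightly different question.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Iterable, Optional


@dataclass
class TrackedArticle:
    """Hypothetical record: a rejected article that was later published elsewhere."""
    rejected_on: date   # date of rejection by the first journal
    published_on: date  # date it appeared in another venue


# Assumed cut-off for "COVID times" (hypothetical; adjust as needed).
LOCKDOWN_START = date(2020, 3, 1)


def mean_days_to_publication(
    articles: Iterable[TrackedArticle], start: date, end: date
) -> Optional[float]:
    """Mean days from rejection to publication for articles rejected in [start, end).

    Returns None when no article falls inside the window.
    """
    gaps = [
        (a.published_on - a.rejected_on).days
        for a in articles
        if start <= a.rejected_on < end
    ]
    return mean(gaps) if gaps else None


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    articles = [
        TrackedArticle(date(2019, 6, 1), date(2020, 6, 1)),
        TrackedArticle(date(2019, 9, 1), date(2020, 3, 1)),
        TrackedArticle(date(2020, 4, 1), date(2020, 7, 1)),
    ]
    before = mean_days_to_publication(articles, date(2019, 1, 1), LOCKDOWN_START)
    during = mean_days_to_publication(articles, LOCKDOWN_START, date(2021, 1, 1))
    print(f"before lockdown: {before} days, during: {during} days")
```

Comparing the two windows side by side is the point: the hypothesis predicts a shorter gap for articles rejected after lockdown began.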

Appendix: Some other data on publishers’ output

I did wonder whether the difference between ArXiv and Elsevier arose because they cover different subject areas: ArXiv is more physics-focused, while Elsevier leans more towards the health sciences. However, across most publishers we see a rise in the rate of DOI registrations (a proxy for publications) a few months after the start of lockdown. This would be consistent with those publishers experiencing a surge in submissions from March onwards, reflected in publications later in the year because of the delay caused by peer review. We can see this in the output of physics publishers such as the American Physical Society and the American Institute of Physics. Beware that the data here can be very noisy.

APS growth accelerates in early 2020. Left: 12-month rolling average trend. Right: actual monthly totals.
While AIP Publishing is on a downward trend, you can see that the trend stalls in 2020

And here are some other publishers, showing that the COVID-surge was a widespread phenomenon and not limited to a small number of publishing houses.

SAGE, Elsevier and Springer — again all shown as both actual monthly totals and 12-month rolling average.
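The trend lines in these figures are 12-month rolling averages of monthly totals. For readers who want to reproduce that kind of smoothing, here is a minimal Python sketch of a trailing 12-month mean; the `rolling_mean` helper and the input counts are illustrative, not the code or data behind the figures.

```python
from collections import deque
from typing import List, Optional, Sequence


def rolling_mean(values: Sequence[float], window: int = 12) -> List[Optional[float]]:
    """Trailing rolling mean over `window` points.

    The first window - 1 positions are None, because a full window of
    history is not yet available.
    """
    recent: deque = deque(maxlen=window)
    total = 0.0
    out: List[Optional[float]] = []
    for v in values:
        if len(recent) == window:
            total -= recent[0]  # subtract the value about to fall out of the window
        recent.append(v)        # deque with maxlen evicts the oldest value itself
        total += v
        out.append(total / window if len(recent) == window else None)
    return out


if __name__ == "__main__":
    # Made-up monthly counts: flat at 100, then a jump to 150 in the last month.
    monthly = [100] * 12 + [150]
    print(rolling_mean(monthly)[-2:])
```

A single month’s 50% jump moves the rolling average by only about 4%, which is why the rolling trend lags behind the monthly totals and why both are worth plotting. With pandas, `Series.rolling(12).mean()` gives the same trailing average.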

I also looked at IOP Publishing, the largest physics publisher (and a former employer of mine). However, I did not see an upswing in their output after March. That said, there are a lot of other things going on here:

  • IOP publishes a large proportion of conference proceedings (which are commissioned, so unlikely to be rejected articles).
  • There is also significant recent growth, which may obscure the COVID-surge.

Finally, I took a quick look at the likely genders of authors’ first names on ArXiv submissions. The data is obviously horrendous, with the most common category being ‘unknown’. Better methods of inferring gender from names are available, but I chose this one because it is quick. For what it’s worth, I see no change in the trendline for women’s names, but the error is large.

Data comes from the ‘gender-guesser’ Python package. Again, we see a rolling 12-month average
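Turning per-name labels into a monthly trend like the one plotted could look something like the Python sketch below. It rests on assumptions: the `female_share_by_month` helper is hypothetical, a tiny lookup table stands in for gender-guesser so the example runs without it installed, and counting `mostly_female`/`mostly_male` as resolved while excluding `unknown` and `andy` from the denominator is one choice among several. With the real package, `gender_guesser.detector.Detector().get_gender` could be passed in as the `get_gender` argument.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, Optional, Tuple

# Toy stand-in for gender-guesser, so the sketch runs without the package.
TOY_LABELS = {"alice": "female", "bob": "male", "maria": "female", "chris": "andy"}


def toy_get_gender(first_name: str) -> str:
    return TOY_LABELS.get(first_name.lower(), "unknown")


FEMALE = {"female", "mostly_female"}
MALE = {"male", "mostly_male"}


def female_share_by_month(
    submissions: Iterable[Tuple[str, str]],
    get_gender: Callable[[str], str] = toy_get_gender,
) -> Dict[str, Optional[float]]:
    """Share of female-labelled names among resolved names, per month.

    `submissions` yields (month, first_name) pairs. Names labelled
    'unknown' or 'andy' are counted but left out of the denominator;
    a month with no resolved names maps to None.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for month, first_name in submissions:
        label = get_gender(first_name)
        if label in FEMALE:
            counts[month]["female"] += 1
        elif label in MALE:
            counts[month]["male"] += 1
        else:
            counts[month]["unresolved"] += 1

    shares: Dict[str, Optional[float]] = {}
    for month, c in sorted(counts.items()):
        resolved = c["female"] + c["male"]
        shares[month] = c["female"] / resolved if resolved else None
    return shares


if __name__ == "__main__":
    demo = [("2020-01", "Alice"), ("2020-01", "Bob"), ("2020-02", "Chris")]
    print(female_share_by_month(demo))
```

Because ‘unknown’ dominates, the resolved denominator can be small in any given month, which is one reason the resulting trendline carries a large error; the 12-month rolling average is then applied on top of these monthly shares.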

Data scientist working in research communication. #webapps #python #machinelearning #ai
