It has been reported in numerous places that the proportion of journal submissions with female authors has fallen since COVID restrictions began in March (1,2,3,4).

A common explanation is that women have taken on greater domestic responsibilities during this period, leaving them less time for research. That may well be true, but I think there is something else going on here.

It’s well known among academic publishers that, as soon as COVID lockdowns started across the globe in early March, the number of research papers being submitted for peer review surged. A recent preprint reported that Elsevier saw increases as high as 58% over the previous year. … ArXangel has a few features which I’ve covered in past blog posts, but the latest is a sign-in function.

Image from Pixabay by user: Tumisu
  • Any ORCID member can click a button to sign up.
  • Once you’ve done this, ArXangel checks your ORCID history, reads all of your papers and uses those to figure out what you like to read.
  • The following day, you can log in and find a feed of new preprints in your area of interest.
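The "figure out what you like to read" step could be as simple as building an interest profile from your past papers and scoring new preprints against it. Here is a minimal sketch of that idea using TF-IDF and cosine similarity; all the titles below are invented stand-ins, and this is just one plausible approach, not necessarily how ArXangel actually does it.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in titles from a reader's ORCID publication history.
my_papers = [
    "Deep learning for galaxy morphology classification",
    "Convolutional networks in astronomical image analysis",
]

# Stand-in titles for today's new preprints.
new_preprints = [
    "A transformer approach to galaxy image classification",
    "Soil acidity in temperate forests",
]

# Fit one vocabulary over both sets so the vectors are comparable.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(my_papers + new_preprints)

# Average the reader's paper vectors into a single interest profile.
profile = np.asarray(matrix[: len(my_papers)].mean(axis=0))

# Score each new preprint by similarity to the profile.
scores = cosine_similarity(profile, matrix[len(my_papers):])[0]
ranked = sorted(zip(scores, new_preprints), reverse=True)
```

The feed is then just the top of `ranked` — the galaxy preprint scores far above the unrelated one because it shares vocabulary with the reader's history.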

There are lots of services out there for searching preprints, or even for building recommendations (e.g. input some keywords, or pick a list of related papers, and receive a feed of new preprints), but I can’t think of another one that does it in just one or two clicks. …

A scientist forms an opinion (Public domain image from Pixabay)

The most surprising thing I’ve learned from overseeing the peer-review process is that science is an inherently subjective pursuit.

I’d always known science as the realm of the cold, hard fact, the empirical observation and the steadfast, unyielding logic (of course it is!). But I’ve learned that, as long as science is a human endeavour, opinion is there too.

Don’t believe me? Imagine you are asked to referee a paper which rests on certain assumptions. How do you feel about those?

Data science in the realm of scholarly publishing is challenging for this reason: subjectivity finds its way into the data, too.

Citations: the currency of science

Citations are the go-to unit of data for measuring the quality of scientific research. …

If you are editing a journal and your journal has a lot of overlap with arXiv, then ArXangel can offer a few useful services.

Finding new content

It’s common practice for editors to search arXiv for new preprints that fit their journal. If a preprint looks interesting, then the editor might invite the authors to submit the preprint to their journal.

With ArXangel, you no longer have to perform that search manually. ArXangel can show you suitable preprints for your journal in a feed.

For example, here is the feed for a journal called Neurocomputing:

This list shows articles which are similar to articles published in Neurocomputing in the past. …

It’s been a while since the last update on ArXangel. The main reason is that I’ve been very busy updating ArXangel. The site has a few new features which I think might be of interest.

ArXangel Articles

The old standard functionality still exists. You can still use the site to:

One thing to mention is that, at the time of writing, you can search for any arXiv preprint on the ‘articles’ route of ArXangel; however, the results come from a historic dataset. So if you want to research a very new topic (like COVID-19, for example), this isn’t the best place to do so. …

In a recent post, I introduced ArXangel, a hobby project of mine which recommends referees for arXiv preprints. It’s a very simple application. All it does is take an arXiv preprint, find similar published papers and then list the authors of those papers as potential referees. This is what might be called a ‘high accuracy’ approach to recommendation, in that it is only supposed to find people with the right expertise and ignores other considerations.
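The whole idea fits in a few lines. Here is a toy sketch of that find-similar-papers-then-list-their-authors step, using a nearest-neighbour lookup over TF-IDF vectors; the corpus, titles and author names are all invented for illustration, and a real system would obviously index far more text than titles.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# A stand-in corpus of published papers with their authors.
published = [
    ("Graph neural networks for molecule property prediction", ["A. Chen"]),
    ("Message passing networks on molecular graphs", ["B. Osei"]),
    ("Bayesian inference for cosmological surveys", ["C. Ruiz"]),
]
texts = [text for text, _ in published]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Index the published papers by cosine similarity.
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X)

# For an incoming preprint, find the most similar published papers
# and collect their authors as candidate referees.
preprint = "Predicting molecular properties with graph networks"
_, idx = nn.kneighbors(vectorizer.transform([preprint]))
referees = [name for i in idx[0] for name in published[i][1]]
```

Here the two graph/molecule papers are the nearest neighbours, so their authors come back as the candidate referees while the cosmology author does not.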

That’s what we want, right? The right expertise?

This might sound like a stupid question to anyone who hasn’t spent a lot of time managing peer-review. Presumably, all you need to do is find the people with the most relevant expertise in that particular field and ask them to review, right? So, if ArXangel is finding people with the right expertise, it should be giving us an ideal list of reviewers, shouldn’t it? …

The Greek letter “chi”. It is illegal to build any service for preprints without using this letter.

When I started out in publishing, the first job I was given was searching for scientists and asking if they would volunteer to peer-review the thousands of research papers that were coming into my work-queue. The scientists were always a pleasure to work with, but it was tough, repetitive work and the automation tools available were not much help.

At some point, it occurred to me that there might be a way to match referees to papers without having to do any work at all. Laziness is a wonderful motivator, so I went ahead and wrote what was perhaps the worst algorithm I could have come up with at the time. …

In a recent blog post, we saw how we can use data to define a semantic space for COVID-19-related papers and get a nice visualisation of that space, like this:

Coronavirus-related papers are red and non-coronavirus papers are green

You can see that the coronavirus papers (which come from the CORD-19 dataset by Semantic Scholar) and the other papers (a random sample of PubMed) occupy quite different regions of the space. That’s good, because it means we can build a machine-learning classifier to discriminate between those two datasets.

That classifier is essentially a tool which can assign a probability that any document in this space is relevant to coronavirus. …
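To make that concrete, here is a toy sketch of the classifier step. The 2-D points below are fabricated stand-ins for real document coordinates in the semantic space (one cluster per dataset), and logistic regression is just one reasonable choice of model; the point is that the fitted model assigns any new point a probability of being coronavirus-related.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fabricated document coordinates: pretend the CORD-19 papers
# cluster around (2, 2) and the PubMed sample around (-2, -2).
rng = np.random.default_rng(0)
cord19 = rng.normal(loc=2.0, scale=0.5, size=(50, 2))
pubmed = rng.normal(loc=-2.0, scale=0.5, size=(50, 2))

X = np.vstack([cord19, pubmed])
y = np.array([1] * 50 + [0] * 50)   # 1 = coronavirus-related

clf = LogisticRegression().fit(X, y)

# Any new document, dropped into the same space, gets a
# probability of coronavirus-relevance.
p = clf.predict_proba([[1.8, 2.1]])[0, 1]
```

A point sitting inside the CORD-19 cluster comes out with a probability close to 1; a point in the PubMed cluster comes out close to 0.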

One thing that helps with initial exploration of data is to visualise the data and look for clusters and other patterns. It is relatively trivial to do.

Semantic Scholar kindly released a dataset of COVID-19 related papers (called ‘CORD-19’) on 13 March 2020.

Unfortunately, visualising the CORD-19 data on its own does not put the data into context. For that, we need to compare with a much larger dataset. Most of the world’s medical science journals (at least, the good ones) are indexed by a service called PubMed. …
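As a sketch of how trivial the visualisation step can be: vectorise both corpora together and project down to two dimensions. The tiny document lists below are invented stand-ins for CORD-19 and the PubMed sample, and TF-IDF plus truncated SVD is just one simple choice of pipeline (the original plot may well use different embeddings).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Stand-in corpora: a few coronavirus-ish documents and a few others.
covid_docs = [
    "coronavirus transmission dynamics in households",
    "clinical features of covid-19 pneumonia patients",
    "antibody response after sars-cov-2 infection",
]
pubmed_docs = [
    "knee replacement outcomes in elderly patients",
    "dietary fibre and cardiovascular risk factors",
    "gene expression in drosophila development",
]
docs = covid_docs + pubmed_docs

# One shared vocabulary, then a 2-D projection of the TF-IDF space.
X = TfidfVectorizer().fit_transform(docs)
coords = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
# Each row of `coords` is an (x, y) point for one document.
```

Scatter-plotting `coords`, with the first `len(covid_docs)` points in red and the rest in green, gives exactly the kind of picture shown above.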

EDIT: since writing this post, AllenAI’s Semantic Scholar has posted a customisable feed of COVID-19 research as well as a data dump of past research. This is essentially the ‘ideal thing’ I’m describing below.

Do you know what rhabdomyolysis is? … No? Neither do I…

Many publishers are making literature relating to the ongoing coronavirus outbreak free-to-read. My employer, SAGE, is no exception and we’re actively promoting related content from our journals.

But which literature is related? We can’t just search for papers on ‘coronavirus’, since many relevant papers don’t use that word. Take papers on past pandemics such as the recent MERS outbreak or the Spanish Flu pandemic of 1918–1919. Are those relevant? …
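A contrived two-liner shows the problem with the naive approach. None of these made-up titles contains the word “coronavirus”, yet all of them are plausibly relevant to the outbreak:

```python
# Invented titles of relevant papers that a keyword filter misses.
titles = [
    "Transmission dynamics of MERS-CoV in healthcare settings",
    "Mortality patterns in the 1918 influenza pandemic",
    "Ventilator allocation during respiratory disease outbreaks",
]

# A plain keyword search returns nothing at all here.
matches = [t for t in titles if "coronavirus" in t.lower()]
```

This is why something smarter than string matching — like the semantic-space classifier above — is needed to find the related literature.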


Adam Day

Data scientist working in research communication. #webapps #python #machinelearning #ai
