Let’s do a thought experiment. Imagine there are just 2 journal publishers in the world: P1 and P2.

In year 1:

  • P1 publishes 100 articles
  • P2 publishes 0 articles

Then, in year 2:

  • P1 publishes 9,900 articles
  • P2 publishes 100 articles

Here’s the weird thing. P1 has grown very rapidly in absolute terms, but the market share of P1 has declined from 100% to 99%. This might seem a little odd, but it’s really just a feature of any market where new entrants are common.
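The arithmetic is easy to check. Here is a minimal Python sketch using the made-up numbers from the thought experiment above; the function and variable names are just illustrative.

```python
def market_shares(article_counts):
    """Return each publisher's share of all articles published that year."""
    total = sum(article_counts.values())
    return {publisher: count / total for publisher, count in article_counts.items()}


# Article counts from the thought experiment (not real data).
year_1 = {"P1": 100, "P2": 0}
year_2 = {"P1": 9_900, "P2": 100}

shares_1 = market_shares(year_1)
shares_2 = market_shares(year_2)

# How many times larger P1's output is in year 2 than in year 1.
p1_growth_factor = year_2["P1"] / year_1["P1"]

print(f"P1 output grew {p1_growth_factor:.0f}-fold")               # P1 output grew 99-fold
print(f"P1 share: {shares_1['P1']:.0%} -> {shares_2['P1']:.0%}")   # P1 share: 100% -> 99%
```

So P1's output is 99 times larger, yet its share of the market still drops by a percentage point, because the total grew slightly faster than P1 did.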

  • New entrants will always take some market share (unless they publish nothing).
  • Large publishers have…


It has been reported in numerous places that the proportion of journal submissions with female authors has fallen since the start of COVID restrictions in March (1,2,3,4).

A common explanation proposed for this is that women have taken on greater domestic responsibilities during these times and this has prevented them from doing research. This may indeed be true, but I think there is something else going on here.

It’s well known among academic publishers that, as soon as COVID lockdowns started across the globe in early March, the number of research papers being submitted for peer-review surged…


ArXangel.net has a few features which I’ve covered in past blog posts, but the latest feature is a sign-in function.

Image from Pixabay by user: Tumisu
  • Any ORCID member can click a button to sign up.
  • Once you’ve done this, ArXangel will check your ORCID history, read all of your papers and use those to figure out what you like to read.
  • Then, the following day, you can log in and find a feed of new preprints in your area of interest.

There are lots of services out there for searching for preprints or even for building recommendations (e.g. input some keywords, or pick a list…


A scientist forms an opinion (Public domain image from Pixabay)

The most surprising thing I’ve learned from overseeing the peer-review process is that science is an inherently subjective pursuit.

I’d always known science as the realm of the cold-hard-fact, the empirical observation and of steadfast, unyielding logic (of course it is!). But I’ve learned that, as long as science is a human endeavour, opinion is there too.

Don’t believe me? Imagine you are asked to referee a paper which rests on certain assumptions. How do you feel about those?

Data science in the realm of scholarly publishing is challenging for this reason: subjectivity finds its way into the data, too.

Citations: the currency of science


If you edit a journal whose scope overlaps heavily with arXiv, then ArXangel.net can offer a few useful services.

Finding new content

It’s common practice for editors to search arXiv for new preprints that fit their journal. If a preprint looks interesting, the editor might invite the authors to submit it to their journal.

With ArXangel, you no longer have to perform that search manually. ArXangel can show you suitable preprints for your journal in a feed.

For example, here is the feed for a journal called Neurocomputing: https://arxangel.net/journal_feed/?journal_name=Neurocomputing

This list shows articles which are similar to…


It’s been a while since the last update on ArXangel. The main reason is that I was very busy updating ArXangel itself. The site has a few new features which I think might be of interest.

ArXangel Articles

The old standard functionality still exists. You can still use the site to:

One…


In a recent post, I introduced ArXangel, a hobby project of mine which recommends referees for arXiv preprints. It’s a very simple application. All it does is take an arXiv preprint, find similar published papers and then list the authors of those papers as potential referees. This is what might be called a ‘high accuracy’ approach to recommendation, in that it is only supposed to find people with the right expertise and ignores other considerations.
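To make that pipeline concrete, here is a minimal sketch in Python. It is not ArXangel’s actual code: the similarity measure (cosine similarity over simple word counts), the function names and the toy papers and author names are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphabetic tokens in a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_referees(preprint_abstract, published_papers, top_n=2):
    """List the authors of the top_n published papers most similar to the preprint."""
    query = bag_of_words(preprint_abstract)
    ranked = sorted(
        published_papers,
        key=lambda paper: cosine_similarity(query, bag_of_words(paper["abstract"])),
        reverse=True,
    )
    referees = []
    for paper in ranked[:top_n]:
        for author in paper["authors"]:
            if author not in referees:  # keep each author once, in ranked order
                referees.append(author)
    return referees


# Hypothetical published papers and authors, purely for illustration.
published = [
    {"abstract": "spiking neural networks for image classification",
     "authors": ["A. Author", "B. Author"]},
    {"abstract": "galaxy formation in the early universe",
     "authors": ["C. Author"]},
    {"abstract": "training deep neural networks with sparse gradients",
     "authors": ["B. Author", "D. Author"]},
]

referees = suggest_referees("sparse spiking neural networks", published)
print(referees)  # ['A. Author', 'B. Author', 'D. Author']
```

Note that the sketch ranks purely on textual similarity, which is exactly the expertise-only behaviour described above.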

That’s what we want, right? The right expertise?

This might sound like a stupid question to anyone who hasn’t spent a lot of…


The Greek letter “chi”. It is illegal to build any service for preprints without using this letter.

When I started out in publishing, the first job I was given was searching for scientists and asking if they would volunteer to peer-review the thousands of research papers that were coming into my work-queue. The scientists were always a pleasure to work with, but it was tough, repetitive work and the automation tools available were not much help.

At some point, it occurred to me that there might be a way to match referees to papers without having to do any work at all. Laziness is a wonderful motivator, so I went ahead and wrote what was perhaps the…


In a recent blog post, we saw how we can use data to define a semantic space for covid-19-related papers and get a nice visualisation of that space, like this:

Coronavirus-related papers are red and non-coronavirus papers are green

You can see that the coronavirus papers (which come from the CORD-19 dataset by Semantic Scholar) and the other papers (a random sample of PubMed) occupy quite different regions of the space. That’s good, because it means that we can build a machine-learning classifier to discriminate between those 2 datasets.
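As a rough illustration of that idea, here is a small self-contained Python sketch. Everything in it is an assumption made for the example: the points are synthetic 2-D coordinates standing in for document positions in the semantic space, and logistic regression is just one simple choice of classifier, not necessarily the one used for the real data.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_cluster(centre, n, rng):
    """Draw n synthetic 2-D points scattered around a centre."""
    return [(rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)) for _ in range(n)]


def train_logistic_regression(points, labels, learning_rate=0.1, epochs=200):
    """Fit weights and a bias with plain batch gradient descent."""
    w0 = w1 = bias = 0.0
    n = len(points)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x, y), label in zip(points, labels):
            error = sigmoid(w0 * x + w1 * y + bias) - label
            g0 += error * x
            g1 += error * y
            gb += error
        w0 -= learning_rate * g0 / n
        w1 -= learning_rate * g1 / n
        bias -= learning_rate * gb / n
    return w0, w1, bias


def predict_proba(model, point):
    """Probability that a point belongs to the class labelled 1."""
    w0, w1, bias = model
    return sigmoid(w0 * point[0] + w1 * point[1] + bias)


rng = random.Random(0)  # fixed seed so the run is repeatable
covid_points = make_cluster((2.0, 2.0), 50, rng)    # stand-in for coronavirus papers
other_points = make_cluster((-2.0, -2.0), 50, rng)  # stand-in for the PubMed sample
points = covid_points + other_points
labels = [1] * len(covid_points) + [0] * len(other_points)

model = train_logistic_regression(points, labels)
print(predict_proba(model, (2.0, 2.0)))    # well above 0.5
print(predict_proba(model, (-2.0, -2.0)))  # well below 0.5
```

Because the two synthetic clusters sit in different regions, the fitted model assigns a high probability to points near one and a low probability to points near the other, which is the property the visualisation suggests for the real datasets.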

That classifier is essentially a tool which can assign a probability that any document in this space is relevant to…


A helpful first step in exploring data is to visualise it and look for clusters and other patterns. This is relatively easy to do.

Semantic Scholar kindly released a dataset of COVID-19 related papers (called ‘CORD-19’) on 13 March 2020.

Unfortunately, visualising the CORD-19 data on its own does not put the data into context. For that, we need to compare with a much larger dataset. Most of the world’s medical science journals (at least, the good ones) are indexed by a service called PubMed. …

Adam Day

Data scientist working in research communication. #webapps #python #machinelearning #ai
