How to use the Papermill Alarm API

Also… what is an API?

Adam Day
Oct 17, 2022
API keys come in many forms. Source: Wikimedia Commons

The Papermill Alarm API is a service: you send it some article metadata and it returns an alert telling you whether the paper looks like past papermill-products. Anyone can use it, but it definitely helps to have the support of an IT or data professional.

(By the way, API stands for “application programming interface”, but I think my definition is clearer!)

APIs like this are used all over the web to transmit data between systems. I’d like to tell you that using an API is as easy as using a typical webform, and I’ll describe a simple method for that in a moment. But the truth is that it can be even easier: it can be completely automated.

The easiest way to use any API is a 3-step process:

  1. Walk into the IT department of the publishing company you work for
  2. Shout loudly “HELLO! DOES ANYONE HERE KNOW HOW TO WORK AN API?”
  3. Observe hands being raised and pick the one you like best.

As a backup option, you might subtly rustle a pack of chocolate digestives, or jammie dodgers. Your IT team will know what that means.

Seriously though, using an API is second nature to a lot of IT professionals. Publishers use APIs routinely to interact with Crossref, ORCID, ROR and others, so there should be someone in any organisation who can help. The Papermill Alarm even comes with code examples in multiple languages.

But we can actually use it without any code at all.

Before we start

Let’s sign up for a subscription to the Papermill Alarm API. You’ll notice there are 2 options.

  1. POST Batches
  2. POST Single documents

If you just want to try the Papermill Alarm out, then let’s go with single documents. If you are experienced with APIs, I’d recommend batch processing instead, because batches are faster.

Checking an individual document (no coding required!)

Now simply

  • under ‘Request Body’, select ‘Body’ and edit the document there so that it has 3 keys: “id”, “title” and “abstract”

This might give us a document like this:

{
"id": "your_document_id",
"title": "This is not a title of a paper",
"abstract": "This is just an example piece of text. Not a real abstract."
}

Note that the format here is ‘JSON’ and it is quite strict. We must use double quotes, not single quotes. There is a colon between each ‘key’ and its ‘value’, and a comma after each key-value pair EXCEPT the last one!
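If you’d like to check a document before pasting it in, Python’s built-in json module applies the same rules. Here is a minimal sketch. The helper name is_valid_json is just something I made up for illustration; it is not part of the Papermill Alarm.

```python
import json

def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

good = '''{
"id": "your_document_id",
"title": "This is not a title of a paper",
"abstract": "This is just an example piece of text. Not a real abstract."
}'''

# Same document, but with single quotes: not valid JSON.
single_quotes = good.replace('"', "'")

# Same document, but with a comma after the last pair: also not valid JSON.
trailing_comma = good.replace('abstract."\n}', 'abstract.",\n}')

print(is_valid_json(good))            # True
print(is_valid_json(single_quotes))   # False
print(is_valid_json(trailing_comma))  # False
```

The two broken versions fail for exactly the reasons described above: the wrong kind of quotes, and a comma after the last pair.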

Click ‘test endpoint’.

That was easy, wasn’t it?

Checking documents in a fast, automated way

But this could be a lot of work for a large publishing house with hundreds or thousands of new submissions coming in each day. How should we use the Papermill Alarm API if we have a lot of documents to check?

This is exactly what the API is designed for. The single-document check above took very little work, and checking as many documents as we want in a completely automated way is just as straightforward.
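To give a feel for what that automation might look like, here is a rough Python sketch that splits documents into batches and builds one POST request per batch. Treat the details as assumptions: the URL, host, key and batch size are placeholders, and I’m assuming a batch is sent as a JSON list of single-document objects. The real values and format are on the RapidAPI page for your subscription. The X-RapidAPI-Key and X-RapidAPI-Host header names are the ones RapidAPI generally uses.

```python
import json
import urllib.request

# Placeholders (assumptions): copy the real values from your RapidAPI dashboard.
API_URL = "https://example.invalid/papermill-alarm/batch"
API_HOST = "example.invalid"
API_KEY = "your_rapidapi_key"

def make_batches(documents, batch_size=50):
    """Split a list of documents into lists of at most `batch_size` items."""
    return [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]

def build_request(batch):
    """Build (but do not send) a POST request carrying one batch as JSON."""
    body = json.dumps(batch).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-RapidAPI-Key": API_KEY,
        "X-RapidAPI-Host": API_HOST,
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")

# Made-up documents, using the same 3 keys as the single-document example.
documents = [
    {"id": f"doc_{n}", "title": f"Example title {n}", "abstract": "Example text."}
    for n in range(120)
]

requests_to_send = [build_request(batch) for batch in make_batches(documents)]
print(len(requests_to_send))  # 120 documents in batches of 50 gives 3 requests

# To actually send one, something like:
# with urllib.request.urlopen(requests_to_send[0]) as response:
#     results = json.load(response)
```

Nothing is sent until you uncomment the last lines, so it is safe to run this while you work out the right values.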

Imagine eating breakfast every morning smugly aware that a computer has already automatically flagged papermill-products coming into your work queue.

I’m providing demo Python code for this in a GitHub repository. This example will show you how to check a large dataset. If you have a data source for your new submissions, it should be straightforward to set up a scheduled task to check those and alert you when red alerts are triggered.
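The “alert me on red” step could be as small as the sketch below. Note that the shape of the results is an assumption made purely for illustration (a list of objects, each with an "id" and an "alert" colour), and red_alert_ids is a made-up name. Check the real response format in the API documentation before relying on it.

```python
def red_alert_ids(results):
    """Return the ids of documents whose alert is 'red' (case-insensitive)."""
    return [
        item["id"]
        for item in results
        if str(item.get("alert", "")).lower() == "red"
    ]

# Hypothetical results, standing in for a parsed API response.
example_results = [
    {"id": "doc_1", "alert": "green"},
    {"id": "doc_2", "alert": "red"},
    {"id": "doc_3", "alert": "Red"},
]

flagged = red_alert_ids(example_results)
if flagged:
    print("Red alerts:", ", ".join(flagged))
```

Run from a scheduler such as cron, a script like this could email or message you the flagged ids each morning.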

My example is intended for someone with experience of Python. But note that RapidAPI also provides code examples in a range of languages to suit the skills of any particular IT professional.

If you’d like to hear more about scholarly APIs for papermill detection, you should join me for this ConTech Live API series presentation. See you there!

Adam Day

Creator of Clear Skies, the Papermill Alarm and other tools clear-skies.co.uk #python #machinelearning #ai #researchintegrity