In open-access scientific publishing, the Version of Record (VoR) commonly includes the full text of the article as well as its figures and tables. Scientific metadata about an article is no longer just the abstract, authors, and references. This full-text, structured data, when permissively licensed, opens up many workflows for scientific research. The full text is most often stored in an XML format, which can be used to create all other views of the content (PDF, HTML). The structure of this XML varies between publishers; the most widely used open standard is JATS (Journal Article Tag Suite), a NISO standard.
JATS defines a set of XML elements and attributes designed to represent journal articles in a single standard XML format. A JATS document is a single XML file that includes an article, which has front matter (authors, affiliations, title, abstract, funding), the body of the article (all sections, text, references to figures/images, tables), and the back matter (appendix sections, data-availability statements, conflict statements, acknowledgements, and the reference list). There can also be a number of sub-articles (each with its own stubbed front matter) that can be used to store other documents in a project (reviews, responses, appendices, or even computational notebooks!).
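To make the layout described above concrete, here is a minimal sketch in Python using only the standard library. The XML skeleton is hand-written for illustration and is not validated against the JATS schema; the `outline` helper and the sample titles are hypothetical names introduced for this example and are not part of jats-xml.

```python
import xml.etree.ElementTree as ET

# Hand-written, minimal JATS-like skeleton for illustration only.
# Real articles carry far more metadata; this is not schema-validated.
JATS_SKELETON = """\
<article article-type="research-article">
  <front>
    <article-meta>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>Body text goes here.</p>
    </sec>
  </body>
  <back>
    <ack><p>Acknowledgements go here.</p></ack>
    <ref-list><title>References</title></ref-list>
  </back>
  <sub-article article-type="reviewer-report">
    <front-stub>
      <title-group>
        <article-title>Review 1</article-title>
      </title-group>
    </front-stub>
    <body><p>Review text goes here.</p></body>
  </sub-article>
</article>
"""


def outline(xml_text):
    """Summarize the top-level parts of a JATS-like document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        "root": root.tag,
        # Direct children of <article>, in document order.
        "parts": [child.tag for child in root],
        "title": root.findtext("front/article-meta/title-group/article-title"),
        # Each sub-article keeps its own stubbed front matter.
        "sub_article_titles": [
            sub.findtext("front-stub/title-group/article-title")
            for sub in root.findall("sub-article")
        ],
    }


summary = outline(JATS_SKELETON)
print(summary)
```

The sub-article here is tagged as a reviewer report, but the same slot could hold a response, an appendix, or a notebook.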
MyST can download, read, and create full JATS XML using our tool, jats-xml (see install instructions). For example, to download a JATS file, use the command-line interface:
jats download https://elifesciences.org/articles/81952 article.jats
You can also summarize a JATS article, either from a local file or remotely using a DOI or certain URLs:
jats summary https://elifesciences.org/articles/81952
JATS for Journal Archiving and Interchange
There are three different tag-sets for JATS. Because much of the content we aim to work with requires computational notebooks and multiple articles, we have chosen to focus MyST on the Journal Archiving and Interchange tag-set (v1.3). There is a lot of overlap between the tag-sets; the Journal Archiving and Interchange tag-set is the most permissive and allows, for example, sub-articles. Try clicking the "JATS" button in the following demo.
The largest structural difference in body content is that the sections (sec) are nested; this nesting is accomplished by transforms specific to this export target.
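As a rough illustration of what such a nesting transform has to do, the sketch below converts a flat sequence of headings and paragraphs into nested sec elements. It is not MyST's actual implementation: the `nest_sections` function and the tuple-based input format are assumptions made for this example, and heading depths are assumed to start at 1.

```python
import xml.etree.ElementTree as ET


def nest_sections(blocks):
    """Nest flat content into <sec> elements under <body> (hypothetical helper).

    Each block is either ("heading", depth, text) or ("paragraph", text).
    Heading depths are assumed to be integers >= 1.
    """
    body = ET.Element("body")
    # Stack of (depth, element); <body> acts as depth 0 and is never popped.
    stack = [(0, body)]
    for block in blocks:
        if block[0] == "heading":
            _, depth, text = block
            # Close every open section at the same or a deeper level.
            while stack[-1][0] >= depth:
                stack.pop()
            sec = ET.SubElement(stack[-1][1], "sec")
            ET.SubElement(sec, "title").text = text
            stack.append((depth, sec))
        else:
            _, text = block
            # Paragraphs attach to the innermost open section.
            ET.SubElement(stack[-1][1], "p").text = text
    return body


flat = [
    ("heading", 1, "Methods"),
    ("paragraph", "Overview of the methods."),
    ("heading", 2, "Sampling"),
    ("paragraph", "How samples were taken."),
    ("heading", 1, "Results"),
    ("paragraph", "What we found."),
]
body = nest_sections(flat)
print(ET.tostring(body, encoding="unicode"))
```

A stack keeps the transform to a single pass: each heading closes any open sections at its own depth or deeper before opening a new one, so the "Sampling" section ends up inside "Methods" while "Results" returns to the top level.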