Overview

Najko Jahn

2023-08-21

What is searched?

Europe PMC is a repository of life science literature. Europe PMC ingests all PubMed content and extends its index with other literature and patent sources.

For more background on Europe PMC, see:

https://europepmc.org/About

Levchenko, M., Gou, Y., Graef, F., Hamelers, A., Huang, Z., Ide-Smith, M., … McEntyre, J. (2017). Europe PMC in 2017. Nucleic Acids Research, 46(D1), D1254–D1260. https://doi.org/10.1093/nar/gkx1005

How to search Europe PMC with R?

This client supports the Europe PMC search syntax. If you are unfamiliar with searching Europe PMC, try the Europe PMC query builder, a tool that helps you construct queries. To use a Europe PMC query in R, copy and paste the search string into the search functions of this package.

In the following, some examples demonstrate how to search Europe PMC with R.
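As a minimal sketch of the copy-and-paste workflow, the snippet below assembles a query string from several terms and hands it to epmc_search(). The helper combine_terms() is hypothetical and not part of the package, and the PUB_YEAR term is only an assumed example of a fielded search term.

```r
# `combine_terms()` is a hypothetical helper, not part of europepmc.
# It joins search terms with a boolean operator, as the query builder would.
combine_terms <- function(..., operator = "AND") {
  terms <- c(...)
  paste(terms, collapse = paste0(" ", operator, " "))
}

# The individual terms are illustrative assumptions.
my_query <- combine_terms(
  '"Human malaria parasites"',
  "OPEN_ACCESS:y",
  "PUB_YEAR:2020"
)

# Network call, so it is only run in an interactive session.
if (interactive()) {
  res <- europepmc::epmc_search(my_query, limit = 10)
  print(res)
}
```

Keeping the query in a variable makes it easy to reuse the same string across several search functions.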

Managing search results

By default, 100 records are returned, but the number of results can be expanded or limited with the limit parameter.

europepmc::epmc_search('"Human malaria parasites"', limit = 10)
#> # A tibble: 10 × 28
#>    id     source pmid  doi   title authorString journalTitle issue journalVolume
#>    <chr>  <chr>  <chr> <chr> <chr> <chr>        <chr>        <chr> <chr>        
#>  1 37452… MED    3745… 10.1… Mole… Lazrek Y, F… Sci Rep      1     13           
#>  2 37277… MED    3727… 10.1… Sexu… Harris CT, … Nat Microbi… 7     8            
#>  3 36777… MED    3677… 10.3… A no… Das R, Vash… Front Vet S… <NA>  10           
#>  4 37365… MED    3736… 10.2… Virt… Yasir M, Pa… Curr Comput… <NA>  <NA>         
#>  5 37454… MED    3745… 10.1… Simi… Fornace KM,… Lancet Infe… <NA>  <NA>         
#>  6 PPR66… PPR    <NA>  10.1… Gene… Suárez-Cort… <NA>         <NA>  <NA>         
#>  7 37121… MED    3712… 10.1… The … Thompson TA… Trends Para… 7     39           
#>  8 36007… MED    3600… 10.1… Bulk… Li X, Kumar… Parasitol I… <NA>  91           
#>  9 PPR55… PPR    <NA>  10.1… A co… Zhang X, Fl… <NA>         <NA>  <NA>         
#> 10 36495… MED    3649… 10.1… A ra… Dong L, Li … Clin Chim A… <NA>  539          
#> # ℹ 19 more variables: pubYear <chr>, journalIssn <chr>, pageInfo <chr>,
#> #   pubType <chr>, isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> #   hasBook <chr>, hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>

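Results are sorted by relevance by default. The sort parameter offers two other options: sort = "cited" orders records by citation count (descending), and sort = "date" orders them by publication date, most recent first.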
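The following sketch shows one way to check the size of a result set before fetching it, and then to change the ordering. It assumes that epmc_hits() returns the total hit count for a query and that "cited" and "date" are accepted values for sort.

```r
query <- '"Human malaria parasites"'

# Network calls, so they are only run in an interactive session.
if (interactive()) {
  # Total number of records matching the query (assumed behaviour of epmc_hits())
  n_hits <- europepmc::epmc_hits(query)
  print(n_hits)

  # Ten most-cited records
  most_cited <- europepmc::epmc_search(query, limit = 10, sort = "cited")
  print(most_cited)

  # Ten most recently published records
  most_recent <- europepmc::epmc_search(query, limit = 10, sort = "date")
  print(most_recent)
}
```

Knowing the hit count first helps to pick a sensible limit instead of requesting far more records than needed.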

Search by DOIs

Sometimes you want to check whether articles are indexed in Europe PMC by their DOI names, a widely used identifier for scholarly articles. Use epmc_search_by_doi() for this purpose.

my_dois <- c(
  "10.1159/000479962",
  "10.1002/sctm.17-0081",
  "10.1161/strokeaha.117.018077",
  "10.1007/s12017-017-8447-9"
  )
europepmc::epmc_search_by_doi(doi = my_dois)
#> # A tibble: 4 × 28
#>   id      source pmid  doi   title authorString journalTitle issue journalVolume
#>   <chr>   <chr>  <chr> <chr> <chr> <chr>        <chr>        <chr> <chr>        
#> 1 289578… MED    2895… 10.1… Clin… Schnieder M… Eur Neurol   5-6   78           
#> 2 289413… MED    2894… 10.1… Conc… Doeppner TR… Stem Cells … 11    6            
#> 3 290181… MED    2901… 10.1… One-… Psychogios … Stroke       11    48           
#> 4 286236… MED    2862… 10.1… Defe… Carboni E, … Neuromolecu… 2-3   19           
#> # ℹ 19 more variables: pubYear <chr>, journalIssn <chr>, pageInfo <chr>,
#> #   pubType <chr>, isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> #   hasBook <chr>, hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>
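To see which DOIs were not found, the requested DOIs can be compared with the doi column of the result. The sketch below uses a hypothetical helper, missing_dois(), and compares in lower case because DOI names are case-insensitive; the toy vector stands in for a real result.

```r
# `missing_dois()` is a hypothetical helper, not part of europepmc.
# It returns the requested DOIs that do not occur among the found ones.
missing_dois <- function(requested, found) {
  requested[!tolower(requested) %in% tolower(found)]
}

requested <- c(
  "10.1159/000479962",
  "10.1002/sctm.17-0081",
  "10.1161/strokeaha.117.018077"
)

# Toy stand-in for the `doi` column of a result (note the differing case).
found <- c("10.1159/000479962", "10.1161/STROKEAHA.117.018077")

not_indexed <- missing_dois(requested, found)

# With a live result, the same helper can be applied to the doi column.
if (interactive()) {
  res <- europepmc::epmc_search_by_doi(doi = requested)
  print(missing_dois(requested, res$doi))
}
```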

Output options

By default, a non-nested data frame printed as a tibble is returned. Other formats are output = "id_list", which returns a list of IDs and sources, and output = "raw", which returns the full metadata as a list. Please be aware that these lists can become very large.
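A short sketch of the two alternative output formats follows. A small limit keeps the raw list manageable; the inspection with str() and object.size() is just one possible way to look at the result.

```r
query <- '"Human malaria parasites"'

# Network calls, so they are only run in an interactive session.
if (interactive()) {
  # IDs and sources only
  ids <- europepmc::epmc_search(query, output = "id_list", limit = 10)
  print(ids)

  # Full metadata as a nested list
  raw <- europepmc::epmc_search(query, output = "raw", limit = 10)

  # Top-level structure of the first record
  str(raw[[1]], max.level = 1)

  # Approximate memory footprint of the raw list
  print(format(utils::object.size(raw), units = "Kb"))
}
```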

More advanced options to search Europe PMC

Annotations

Europe PMC provides text-mined annotations contained in abstracts and open access full-text articles.

These automatically identified concepts and terms can be retrieved at the article level:

europepmc::epmc_annotations_by_id(c("MED:28585529", "PMC:PMC1664601"))
#> # A tibble: 724 × 13
#>    source ext_id   pmcid    prefix exact postfix name  uri   id    type  section
#>    <chr>  <chr>    <chr>    <chr>  <chr> <chr>   <chr> <chr> <chr> <chr> <chr>  
#>  1 MED    28585529 PMC5467… "tive… Beta… " allo… Beta… http… http… Clin… Title …
#>  2 MED    28585529 PMC5467… "nomi… genes ".\nRa… gene  http… http… Sequ… Title …
#>  3 MED    28585529 PMC5467… "nomi… genes " is o… gene  http… http… Sequ… Abstra…
#>  4 MED    28585529 PMC5467… " One… genes " are … gene  http… http… Sequ… Abstra…
#>  5 MED    28585529 PMC5467… " ide… beet  " (Bet… Beta… http… http… Clin… Abstra…
#>  6 MED    28585529 PMC5467… "ify … Beta… " ssp.… Beta… http… http… Clin… Abstra…
#>  7 MED    28585529 PMC5467… "ulga… gene  " Rz2 … gene  http… http… Sequ… Abstra…
#>  8 MED    28585529 PMC5467… "e ge… geno… " sequ… geno… http… http… Sequ… Abstra…
#>  9 MED    28585529 PMC5467… "eque… beet  ". Our… Beta… http… http… Clin… Abstra…
#> 10 MED    28585529 PMC5467… "disc… genes " rele… gene  http… http… Sequ… Abstra…
#> # ℹ 714 more rows
#> # ℹ 2 more variables: provider <chr>, subType <chr>

To obtain a list of articles for which Europe PMC has text-mined annotations, either subset the resulting data frame

tt <- europepmc::epmc_search("malaria")
tt[tt$hasTextMinedTerms == "Y" | tt$hasTMAccessionNumbers == "Y",]
#> # A tibble: 76 × 28
#>    id           source pmid    pmcid doi   title authorString journalTitle issue
#>    <chr>        <chr>  <chr>   <chr> <chr> <chr> <chr>        <chr>        <chr>
#>  1 36419237     MED    364192… PMC9… 10.1… Path… Walker IS, … Virulence    1    
#>  2 37158217     MED    371582… PMC1… 10.1… Mobi… Kollipara A… Glob Health… 1    
#>  3 37310126     MED    373101… PMC1… 10.1… Clin… Bi D, Huang… Ann Med      1    
#>  4 37459385     MED    374593… <NA>  10.1… A co… Eisenberg S… Glob Health… 1    
#>  5 36871259     MED    368712… PMC9… 10.1… Asse… Jantausch B… Med Educ On… 1    
#>  6 37053493     MED    370534… <NA>  10.1… Opti… Kalula A, M… J Biol Dyn   1    
#>  7 37191627     MED    371916… PMC1… 10.1… Huma… Ellis R, We… Hum Vaccin … 1    
#>  8 37165851     MED    371658… PMC1… 10.1… Tria… Cho Y, Awoo… Glob Health… 1    
#>  9 37074313     MED    370743… PMC9… 10.1… Deng… Asaga Mac P… Ann Med      1    
#> 10 IND607962262 AGR    <NA>    <NA>  <NA>  Effe… Ojueromi OO… Journal of … 11   
#> # ℹ 66 more rows
#> # ℹ 19 more variables: journalVolume <chr>, pubYear <chr>, journalIssn <chr>,
#> #   pageInfo <chr>, pubType <chr>, isOpenAccess <chr>, inEPMC <chr>,
#> #   inPMC <chr>, hasPDF <chr>, hasBook <chr>, hasSuppl <chr>,
#> #   citedByCount <int>, hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>, …

or expand the query choosing an annotation type or provider from the Europe PMC Advanced Search query builder.

europepmc::epmc_search('malaria AND (ANNOTATION_TYPE:"Cell") AND (ANNOTATION_PROVIDER:"Europe PMC")')
#> # A tibble: 100 × 28
#>    id     source pmid  doi   title authorString journalTitle issue journalVolume
#>    <chr>  <chr>  <chr> <chr> <chr> <chr>        <chr>        <chr> <chr>        
#>  1 30925… MED    3092… 10.1… Cong… Fatima S, S… Pediatr Eme… 12    37           
#>  2 31808… MED    3180… 10.1… Reti… Villaverde … J Pediatric… 5     9            
#>  3 31782… MED    3178… 10.1… Incr… Jongo SA, C… Clin Infect… 11    71           
#>  4 30989… MED    3098… 10.1… Clin… Enane LA, S… J Pediatric… 3     9            
#>  5 31300… MED    3130… 10.1… Blac… Opoka RO, W… Clin Infect… 11    70           
#>  6 31505… MED    3150… 10.1… Acut… Oshomah-Bel… J Trop Pedi… 2     66           
#>  7 31687… MED    3168… 10.1… Eval… Ferdinand D… Trans R Soc… 3     114          
#>  8 31693… MED    3169… 10.1… Redu… Kingston HW… J Infect Dis 9     221          
#>  9 31843… MED    3184… 10.1… Arte… Pull L, Lup… Malar J      1     18           
#> 10 31864… MED    3186… 10.1… Unde… Adhikari SR… Malar J      1     18           
#> # ℹ 90 more rows
#> # ℹ 19 more variables: pubYear <chr>, journalIssn <chr>, pageInfo <chr>,
#> #   pubType <chr>, isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> #   hasBook <chr>, hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>
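For the subsetting approach, note that comparing with == yields NA wherever a flag is missing, and NA in a row index produces rows filled with NA. The sketch below shows an NA-safe variant using %in% on a toy data frame; has_annotations() is a hypothetical helper, and whether the flag columns actually contain missing values depends on the result at hand.

```r
# Toy stand-in for a search result; only the two flag columns matter here.
toy <- data.frame(
  id = c("1", "2", "3", "4"),
  hasTextMinedTerms = c("Y", "N", NA, "N"),
  hasTMAccessionNumbers = c("N", "N", NA, "Y"),
  stringsAsFactors = FALSE
)

# `has_annotations()` is a hypothetical helper, not part of europepmc.
# `%in%` returns FALSE (never NA) for missing flags.
has_annotations <- function(df) {
  df$hasTextMinedTerms %in% "Y" | df$hasTMAccessionNumbers %in% "Y"
}

annotated <- toy[has_annotations(toy), ]
annotated$id
```

The same helper can be applied to a tibble returned by epmc_search(), since it only relies on the two column names shown in the output above.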

Data integrations

Another useful feature of Europe PMC is the ability to search for cross-references between Europe PMC and other databases. For instance, to get publications from 2016 that are cited by entries in the Protein Data Bank in Europe:

europepmc::epmc_search('(HAS_PDB:y) AND FIRST_PDATE:2016')
#> # A tibble: 100 × 28
#>    id       source pmid     pmcid    doi   title authorString journalTitle issue
#>    <chr>    <chr>  <chr>    <chr>    <chr> <chr> <chr>        <chr>        <chr>
#>  1 28039433 MED    28039433 PMC5255… 10.1… Stru… Su HP, Rick… Proc Natl A… 3    
#>  2 28036383 MED    28036383 PMC5201… 10.1… Stru… Kovaľ T, Øs… PLoS One     12   
#>  3 27977122 MED    27977122 <NA>     10.1… Comp… De Deurwaer… ACS Chem Ne… 5    
#>  4 28144358 MED    28144358 PMC5238… 10.3… Bioc… Ulrich V, B… Beilstein J… <NA> 
#>  5 28028551 MED    28028551 <NA>     10.1… Stru… Zhou Z, Liu… Appl Microb… 7    
#>  6 27958736 MED    27958736 <NA>     10.1… Glyc… Hamark C, B… J Am Chem S… 1    
#>  7 27959534 MED    27959534 PMC6634… 10.1… Stru… Reed AJ, Vy… J Am Chem S… 1    
#>  8 28083536 MED    28083536 PMC5183… 10.3… Conf… Paoletti F,… Front Mol B… <NA> 
#>  9 28024148 MED    28024148 <NA>     10.1… Solu… Bibow S, Po… Nat Struct … 2    
#> 10 28031486 MED    28031486 PMC5255… 10.1… Stru… Sevrioukova… Proc Natl A… 3    
#> # ℹ 90 more rows
#> # ℹ 19 more variables: journalVolume <chr>, pubYear <chr>, journalIssn <chr>,
#> #   pageInfo <chr>, pubType <chr>, isOpenAccess <chr>, inEPMC <chr>,
#> #   inPMC <chr>, hasPDF <chr>, hasBook <chr>, hasSuppl <chr>,
#> #   citedByCount <int>, hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>, …

The following sources are supported

To retrieve metadata about these external database links, use europepmc::epmc_db().
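A minimal sketch of such a lookup is given below. The PMID, the db = "uniprot" value, and the use of epmc_db_count() to summarise available links are assumptions for illustration; check the package reference for the accepted database names.

```r
# Example record; the PMID is an illustrative assumption.
pmid <- "12368864"

# Network calls, so they are only run in an interactive session.
if (interactive()) {
  # Overview of how many links each external database holds for this record
  link_counts <- europepmc::epmc_db_count(pmid)
  print(link_counts)

  # Metadata for the cross-references to one database (assumed name: "uniprot")
  uniprot_links <- europepmc::epmc_db(pmid, db = "uniprot", limit = 25)
  print(uniprot_links)
}
```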

Citations and reference sections

Europe PMC also lets us obtain citation metadata and reference sections. To retrieve citation metadata for an article, use

europepmc::epmc_citations("9338777", limit = 500)
#> # A tibble: 240 × 11
#>    id       source citationType   title authorString journalAbbreviation pubYear
#>    <chr>    <chr>  <chr>          <chr> <chr>        <chr>                 <int>
#>  1 36883860 MED    research supp… "Iso… Rodrigues C… J Virol                2023
#>  2 36790562 MED    review; journ… "Por… Liu Y, Niu … Funct Integr Genom…    2023
#>  3 36417007 MED    research-arti… "Hum… Lowe JWE.    Hist Philos Life S…    2022
#>  4 35729348 MED    research supp… "Det… Ishihara S,… Sci Rep                2022
#>  5 35437972 MED    research-arti… "Sca… Chen JQ, Zh… Zool Res               2022
#>  6 34834962 MED    im; research … "Por… Denner J.    Viruses                2021
#>  7 34578447 MED    im; research … "Hig… Denner J, S… Viruses                2021
#>  8 33353186 MED    im; review-ar… "Xen… Galow AM, G… Int J Mol Sci          2020
#>  9 31565893 MED    research-arti… "Reg… Chung HC, N… J Vet Sci              2019
#> 10 30230709 MED    research supp… "Bio… Legallais C… Adv Healthc Mater      2018
#> # ℹ 230 more rows
#> # ℹ 4 more variables: volume <chr>, issue <chr>, pageInfo <chr>,
#> #   citedByCount <int>

To retrieve the reference section of an article:

europepmc::epmc_refs("28632490", limit = 200)
#> # A tibble: 169 × 19
#>    id       source citationType    title  authorString journalAbbreviation issue
#>    <chr>    <chr>  <chr>           <chr>  <chr>        <chr>               <chr>
#>  1 12002480 MED    JOURNAL ARTICLE Tricl… Adolfsson-E… Chemosphere         9-10 
#>  2 18795164 MED    JOURNAL ARTICLE In vi… Ahn KC, Zha… Environ Health Per… 9    
#>  3 18556606 MED    JOURNAL ARTICLE Effec… Aiello AE, … Am J Public Health  8    
#>  4 17683018 MED    JOURNAL ARTICLE Consu… Aiello AE, … Clin Infect Dis     <NA> 
#>  5 15273108 MED    JOURNAL ARTICLE Relat… Aiello AE, … Antimicrob Agents … 8    
#>  6 18207219 MED    JOURNAL ARTICLE The i… Allmyr M, H… Sci Total Environ   1    
#>  7 17007908 MED    JOURNAL ARTICLE Tricl… Allmyr M, A… Sci Total Environ   1    
#>  8 26948762 MED    JOURNAL ARTICLE Press… Alvarez-Riv… J Chromatogr A      <NA> 
#>  9 23192912 MED    JOURNAL ARTICLE Expos… Anderson SE… Toxicol Sci         1    
#> 10 25837385 MED    JOURNAL ARTICLE Obser… Vladar EK, … Methods Cell Biol   <NA> 
#> # ℹ 159 more rows
#> # ℹ 12 more variables: pubYear <int>, volume <chr>, pageInfo <chr>,
#> #   citedOrder <int>, match <chr>, issn <chr>, essn <chr>,
#> #   publicationTitle <chr>, publisherLoc <chr>, publisherName <chr>,
#> #   externalLink <chr>, doi <chr>
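Because both functions return data frames, the results can be summarised with base R. The sketch below counts records per publication year using the pubYear column shown in the output above; per_year() is a hypothetical helper, demonstrated on a toy data frame.

```r
# `per_year()` is a hypothetical helper, not part of europepmc.
# It tabulates the number of records per publication year.
per_year <- function(df) {
  as.data.frame(table(year = df$pubYear), responseName = "n")
}

# Toy stand-in for a citation result
toy_cites <- data.frame(pubYear = c(2020L, 2020L, 2021L))
toy_counts <- per_year(toy_cites)

# Network calls, so they are only run in an interactive session.
if (interactive()) {
  cites <- europepmc::epmc_citations("9338777", limit = 500)
  print(per_year(cites))

  refs <- europepmc::epmc_refs("28632490", limit = 200)
  # Share of references that carry a DOI
  print(mean(!is.na(refs$doi)))
}
```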

Fulltext access

Europe PMC gives access not only to metadata but also to full texts. Adding AND (OPEN_ACCESS:y) to your search query returns only those articles for which Europe PMC also holds the full text.

The full text can be accessed as an XML document via the PubMed Central ID (PMCID):

europepmc::epmc_ftxt("PMC3257301")
#> {xml_document}
#> <article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
#> [1] <front>\n  <journal-meta>\n    <journal-id journal-id-type="nlm-ta">PLoS  ...
#> [2] <body>\n  <sec id="s1">\n    <title>Introduction</title>\n    <p>Atmosphe ...
#> [3] <back>\n  <ack>\n    <p>We would like to thank Dr. C. Gourlay and Dr. T.  ...
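The returned object can be queried with the xml2 package. As a sketch, the hypothetical helper section_titles() below extracts section headings with an XPath expression; it is demonstrated on a small hand-written document that mimics the front/body/back layout shown above, so the XPath is an assumption about the article markup rather than a guarantee.

```r
library(xml2)

# `section_titles()` is a hypothetical helper, not part of europepmc.
# It returns the titles of all sections inside the article body.
section_titles <- function(doc) {
  xml_text(xml_find_all(doc, "//body//sec/title"))
}

# Toy document standing in for a full text
toy_doc <- read_xml(
  "<article><body>
     <sec><title>Introduction</title><p>Some text.</p></sec>
     <sec><title>Methods</title><p>More text.</p></sec>
   </body></article>"
)
toy_titles <- section_titles(toy_doc)

# Network call, so it is only run in an interactive session.
if (interactive()) {
  doc <- europepmc::epmc_ftxt("PMC3257301")
  print(section_titles(doc))
}
```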