Update Maddison Data

library(MaddisonData)

Intro

This vignette describes how to download the latest Maddison Project data and create from it four data sets included in the package MaddisonData: MaddisonCountries, MaddisonData, MaddisonSources, and MaddisonYears.

The Wikipedia article on the Maddison Project says, “Development economist Branko Milanović (writing for the World Bank), development economist Morten Jerven, and billionaire philanthropist Bill Gates have identified the Maddison Project, the Penn World Tables, and World Bank/IMF data (the World Development Indicators), as the three main sources of worldwide economic statistics such as GDP data, with the focus of the Maddison Project being on historical data. Economist Paul Krugman has suggested the Maddison Project as a data source for historical debt, growth, and labor output and productivity data.”

Bolt and van Zanden (2020) say that, “The Maddison database on Historical Statistics of the World Economy has probably the widest coverage of data on GDP per capita across countries and over time currently available … . To compare income levels and developments for this period and set of countries, national income estimates are converted … to a common currency using purchasing power parities (PPPs).”

Get the data

The web site for Maddison project data offers both “Angus Maddison’s unaltered final dataset” and the “Latest Maddison Project Release”.

On 2025-08-28 I saw that the “Latest Maddison Project Release” was dated 2024-09-18. I requested that and found that it was available in either Excel or Stata format. I downloaded the Excel format to my current working directory as mpd2023web.xlsx. (On 2025-08-28 I found a 4 MB file by this name dated 2025-06-03. I changed that name to mpd023web0.xlsx and downloaded ostensibly the same file and got one with 4.9 MB. I doubt if I would have deleted some of the content of the file with date 2025-06-03, though that’s possible. In any event, I plan to ignore this difference for the moment.)

In 2024, I also downloaded “Maddison Database 2010” and compared it with mpd2020.xlsx. I compared the two for numbers for the former USSR, the UK, and the US. I found that the newer data had many numbers the older data didn’t while the older data had a few numbers absent from the newer data. However, it seemed that I would be wise to ignore the older data.

Find the data file

(MadXlsx <- path_package2('^mpd2023.*xlsx$'))
#> [1] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx"
#> attr(,"searched")
#> [1] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes"             
#> [2] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes/extdata"     
#> [3] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes/inst/extdata"
#> [4] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes"             
#> [5] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes/extdata"     
#> [6] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes/inst/extdata"
#> [7] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData"                       
#> [8] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/extdata"               
#> [9] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata"
(foundData <- (length(MadXlsx)>0))
#> [1] TRUE

If more than one file was found, pick the most recent one.

MadInfo <- file.info(MadXlsx)
cat('Multiple files found.\n')
#> Multiple files found.
print(MadInfo)
#>                                                                                                                                      size
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx 4903804
#>                                                                                                                                   isdir
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx FALSE
#>                                                                                                                                   mode
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx  644
#>                                                                                                                                                 mtime
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx 2025-08-28 21:49:07
#>                                                                                                                                                 ctime
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx 2025-11-20 07:47:08
#>                                                                                                                                                 atime
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx 2025-09-07 13:02:24
#>                                                                                                                                   uid
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx 501
#>                                                                                                                                   gid
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx  20
#>                                                                                                                                   uname
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx    sg
#>                                                                                                                                   grname
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx  staff
imax <- which.max(MadInfo$mtime)
Madxlsx <- MadXlsx[imax]

Read the data file

Now read Madxlsx.

MaddisonData0 <- readxl::read_xlsx(Madxlsx, sheet='Full data')
head(MaddisonData0, 2)
#> # A tibble: 2 × 6
#>   countrycode country     region                     year gdppc   pop
#>   <chr>       <chr>       <chr>                     <dbl> <dbl> <dbl>
#> 1 AFG         Afghanistan South and South East Asia     1    NA    NA
#> 2 AFG         Afghanistan South and South East Asia   730    NA    NA
tail(MaddisonData0)
#> # A tibble: 6 × 6
#>   countrycode country  region              year gdppc    pop
#>   <chr>       <chr>    <chr>              <dbl> <dbl>  <dbl>
#> 1 ZWE         Zimbabwe Sub Saharan Africa  2017 1844. 13870.
#> 2 ZWE         Zimbabwe Sub Saharan Africa  2018 1900. 14097.
#> 3 ZWE         Zimbabwe Sub Saharan Africa  2019 1753. 14345.
#> 4 ZWE         Zimbabwe Sub Saharan Africa  2020 1586. 14615.
#> 5 ZWE         Zimbabwe Sub Saharan Africa  2021 1687. 14900.
#> 6 ZWE         Zimbabwe Sub Saharan Africa  2022 1704. 15205.
MaddisonSources0 <- readxl::read_xlsx(Madxlsx, 
                        sheet='Sources')
#> New names:
#> • `` -> `...2`
#> • `` -> `...3`
names(MaddisonSources0) <- c('ISO', 'years', 'source')
head(MaddisonSources0)
#> # A tibble: 6 × 3
#>   ISO                                                               years source
#>   <chr>                                                             <chr> <chr> 
#> 1 GDP pc: 2008 - 2022: Total Economy Database the Conference Board… <NA>  <NA>  
#> 2 <NA>                                                              <NA>  <NA>  
#> 3 ISO code                                                          Coun… Source
#> 4 ARG                                                               Arge… <NA>  
#> 5 <NA>                                                              1800… Prado…
#> 6 <NA>                                                              1870… Berto…
tail(MaddisonSources0)
#> # A tibble: 6 × 3
#>   ISO                                                               years source
#>   <chr>                                                             <chr> <chr> 
#> 1 <NA>                                                              <NA>  <NA>  
#> 2 ZWE                                                               Zimb… <NA>  
#> 3 <NA>                                                              1914… Broad…
#> 4 <NA>                                                              <NA>  <NA>  
#> 5 Population sources                                                <NA>  <NA>  
#> 6 1990 onward Total Economy Database the Conference Board for all … <NA>  <NA>

Countries and country codes?

How many characters in countrycode?

ctryCds <- unique(MaddisonData0$countrycode)
nCds <- length(ctryCds)
table(nchCode <- nchar(ctryCds))
#> 
#>   3 
#> 169

Good: All 3-letter codes.

Let’s concatenate country after countrycode then check to see if there are any ctryCds with multiple country names.

cdCtry <- sort(with(MaddisonData0, unique(
            paste0(countrycode, country))))
if(length(cdCtry)!=nCds){
    stop('some countrycode(s) have more than one', 
         ' country')
}
cd_Ctry <- data.frame(ISO=substring(cdCtry, 1, 3),
                        country=substring(cdCtry, 4))

NOTES:

  1. The variable countrycode in MaddisonData is called ISO in MaddisonSources. We will standardize on ISO for this.

  2. There is a countrycode function in a countrycode package that sounds like it might help translating between country and countrycode. However, a little experimentation exposed “ambiguous” results. Therefore, we will not pursue that further here.

Let’s create MaddisonCountries as a data.frame with columns: countrycode, country, and region:

rownames(cd_Ctry) <- cd_Ctry$countrycode
ctryRgn <- sort(with(MaddisonData0, unique(
            paste0(countrycode, region))))
MaddisonCountries <- cbind(cd_Ctry, 
            region = substring(ctryRgn, 4))
rownames(MaddisonCountries) <- MaddisonCountries$ISO
save(MaddisonCountries, file='MaddisonCountries.rdb', 
       compress=TRUE)  
# CANNOT include MaddisonCountries.rda in .Rbuildignore, 
# so change the name.   
# per R Packages, section 9.7  
# As of 2025-09-24 the R Packages book recommends writing 
# using usethis::use_data 
if(FALSE){  
    tryUse <- try(usethis::use_data(MaddisonData))
}
getwd()
#> [1] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes"

This saves MaddisonCountries in MaddisonCountries.rdb in the working directory. To include that in the package, copy or move that file into the data subdirectory of the package and change the suffix to rda.

We can use subset got translate between country and countrycode or select all countries in selected regions.

subset(MaddisonCountries, ISO=='GBR', country)
#>            country
#> GBR United Kingdom
subset(MaddisonCountries, grepl('Yugo', country), 1:3)
#>     ISO           country         region
#> YUG YUG Former Yugoslavia Eastern Europe
table(MaddisonCountries$region)
#> 
#>                    East Asia               Eastern Europe 
#>                            6                           32 
#>                Latin America Middle East and North Africa 
#>                           26                           20 
#>    South and South East Asia           Sub Saharan Africa 
#>                           16                           46 
#>               Western Europe            Western Offshoots 
#>                           19                            4
# What are "Western Offshoots"? 
subset(MaddisonCountries, grepl('Of', region), 
                      c(country, ISO))
#>           country ISO
#> AUS     Australia AUS
#> CAN        Canada CAN
#> NZL   New Zealand NZL
#> USA United States USA

Now create MaddisonData with countrycode but without country and region.

First identify all rows of MaddisonData with both gdppc and pop NA.

MaddisonData

NAs <- with(MaddisonData0, is.na(gdppc) & is.na(pop))
str(MaddisonData <- MaddisonData0[!NAs, c(1, 4:6)])
#> tibble [23,280 × 4] (S3: tbl_df/tbl/data.frame)
#>  $ countrycode: chr [1:23280] "AFG" "AFG" "AFG" "AFG" ...
#>  $ year       : num [1:23280] 1820 1870 1913 1950 1951 ...
#>  $ gdppc      : num [1:23280] NA NA NA 1156 1170 ...
#>  $ pop        : num [1:23280] 3280 4207 5730 8150 8284 ...
names(MaddisonData)[1] <- 'ISO' 

save(MaddisonData, file='MaddisonData.rdb', 
       compress=TRUE)  
# CANNOT include MaddisonData.rda in .Rbuildignore, 
# so change the name.   
# per R Packages, section 9.7  
# As of 2025-09-24 the R Packages book recommends writing 
# using usethis::use_data 
if(FALSE){  
    tryUse <- try(usethis::use_data(MaddisonCountries))
}
getwd()
#> [1] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes"

MaddisonSources

The first and last rows in MaddisonSources0 give general info for GDP and population for all countries since 2008 and 1990, respectively.

The format is a row giving ISO and country followed by one or more rows with sources for different date ranges followed by a blank row.

Get the first row for each country:

str(sourceNA <- which(is.na(MaddisonSources0[,1])
        & is.na(MaddisonSources0[,2]) & 
          is.na(MaddisonSources0[,3])))
#>  int [1:69] 2 7 11 14 17 20 24 28 31 37 ...
str(row1 <- c(4, sourceNA[-1]+1))
#>  num [1:69] 4 8 12 15 18 21 25 29 32 38 ...
MaddisonSources0[row1[1:3], ]
#> # A tibble: 3 × 3
#>   ISO   years     source
#>   <chr> <chr>     <chr> 
#> 1 ARG   Argentina <NA>  
#> 2 BEL   Belgium   <NA>  
#> 3 BGR   Bulgaria  <NA>
MaddisonSources0[tail(row1, 3), ]
#> # A tibble: 3 × 3
#>   ISO                years    source
#>   <chr>              <chr>    <chr> 
#> 1 ZMB                Zambia   <NA>  
#> 2 ZWE                Zimbabwe <NA>  
#> 3 Population sources <NA>     <NA>

Sources for each ISO / country?

ISOsourceNms <- MaddisonSources0[head(row1, -1), 1, drop=TRUE]
nISOsources <- length(ISOsourceNms)
MaddisonSources <- vector('list', nISOsources)
names(MaddisonSources) <- ISOsourceNms
# Get year ranges for each country
MaddisonYears <- data.frame(
  ISO      =character(0), 
  yearBegin=integer(0), 
  yearEnd  =integer(0), 
  sourceNum=integer(0)
  )
for(i in 1:nISOsources){
  rowi <- (row1[i]+1):(row1[i+1]-2)
  MadSrci <- MaddisonSources0[rowi, 2:3]
  MaddisonSources[[i]] <- MadSrci
  MadYrsi0 <- MadDateRanges(MadSrci[, 1, drop=TRUE])
  MadYrsi <- cbind(ISO=ISOsourceNms[i], MadYrsi0)
  MaddisonYears <- rbind(MaddisonYears, 
                              MadYrsi) 
}

MaddisonSources[['EGY']]
#> # A tibble: 3 × 2
#>   years                  source                                                 
#>   <chr>                  <chr>                                                  
#> 1 1                      Scheidel, W. and Friesen, S. J., ‘The size of the econ…
#> 2 700 – 1500             Pamuk, Ş. and M. Shatzmiller (2011). “Real Wages and G…
#> 3 1820, 1870, 1913, 1950 Pamuk, S. (2006), Estimating Economic Growth in the Mi…
MaddisonYears[MaddisonYears$ISO=='EGY', ]
#>    ISO yearBegin yearEnd sourceNum
#> 24 EGY         1       1         1
#> 25 EGY       700    1500         2
#> 26 EGY      1820    1820         3
#> 27 EGY      1870    1870         3
#> 28 EGY      1913    1913         3
#> 29 EGY      1950    1950         3

attr(MaddisonSources, 'since2008') <- paste(
  "gdppc since 2008: Total Economy Database (TED) from the", 
  "Conference Board for all countries included in TED and UN",
  "national accounts statistics for all others.")
head(MaddisonSources, 3)
#> $ARG
#> # A tibble: 2 × 2
#>   years       source                                                            
#>   <chr>       <chr>                                                             
#> 1 1800 - 1870 Prados de la Escosura, L. (2009). “Lost Decades? Economic Perform…
#> 2 1870 - 1900 Bertola, L and Ocampo, J.A. (2012) The Economic Development of La…
#> 
#> $BEL
#> # A tibble: 2 × 2
#>   years      source                                                             
#>   <chr>      <chr>                                                              
#> 1 1          Scheidel, W. and Friesen, S. J., ‘The size of the economy and the …
#> 2 1500- 1846 Buyst, E. (2011), “Towards Estimates of Long Term Growth in the So…
#> 
#> $BGR
#> # A tibble: 1 × 2
#>   years     source                                                              
#>   <chr>     <chr>                                                               
#> 1 1892-1945 Ivanov, M. (2006). “Bulgarian National Income between 1892 and 1924…
head(MaddisonYears)
#>   ISO yearBegin yearEnd sourceNum
#> 1 ARG      1800    1870         1
#> 2 ARG      1870    1900         2
#> 3 BEL         1       1         1
#> 4 BEL      1500    1846         2
#> 5 BGR      1892    1945         1
#> 6 BIH      1952    2008         1

tail(MaddisonSources, 3)
#> $ZAF
#> # A tibble: 1 × 2
#>   years                   source                                                
#>   <chr>                   <chr>                                                 
#> 1 1700–1900 (Cape Colony) Fourie, J. and Van Zanden, J.L. (2013). GDP in the Du…
#> 
#> $ZMB
#> # A tibble: 1 × 2
#>   years     source                                                              
#>   <chr>     <chr>                                                               
#> 1 1906-1950 Broadberry, S., & Gardner, L. (2022). Economic growth in Sub-Sahara…
#> 
#> $ZWE
#> # A tibble: 1 × 2
#>   years     source                                                              
#>   <chr>     <chr>                                                               
#> 1 1914-1950 Broadberry, S., & Gardner, L. (2022). Economic growth in Sub-Sahara…
tail(MaddisonYears)
#>     ISO yearBegin yearEnd sourceNum
#> 128 VEN      1830    2012         1
#> 129 YUG      1952    2008         1
#> 130 YUG      2008    2008         2
#> 131 ZAF      1700    1900         1
#> 132 ZMB      1906    1950         1
#> 133 ZWE      1914    1950         1

save(MaddisonSources, file='MaddisonSources.rdb', 
       compress=TRUE)  
save(MaddisonYears, file='MaddisonYears.rdb', 
       compress=TRUE)  

Bibliography

Bolt and Van Zanden (2024), “Maddison style estimates of the evolution of the world economy: A new 2023 update”, Journal of Economic Surveys, 1–41.