| Type: | Package |
| Title: | Datasets for the Book 'Getting (more out of) Graphics' |
| Version: | 0.7 |
| Description: | Datasets analysed in the book Antony Unwin (2024, ISBN:978-0367674007) "Getting (more out of) Graphics". |
| Depends: | R (≥ 3.5) |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| Suggests: | tidyverse |
| NeedsCompilation: | no |
| Packaged: | 2024-08-28 08:53:02 UTC; antonyunwin2 |
| Author: | Antony Unwin [aut, cre, cph] |
| Maintainer: | Antony Unwin <unwin@math.uni-augsburg.de> |
| Repository: | CRAN |
| Date/Publication: | 2024-09-02 15:00:02 UTC |
GmooG: datasets analysed in "Getting (more out of) Graphics"
Description
There are 25 chapters of graphical data analyses in the book. Most of the datasets that are not readily available elsewhere are provided in this package.
Details
Other datasets are analysed in the book as well. They are available in various R packages. Some can be downloaded and updated from the web.
Author(s)
Antony Unwin unwin@math.uni-augsburg.de
The 200 best times for male and female swimmers for many swimming events
Description
The best times up to mid-2021 in 17 individual swimming events and three relay events, for both men and women.
Usage
data(All200)
Format
A data frame with 7685 observations on the following 10 variables.
| full_name_computed | Name of swimmer |
| team_code | country |
| sdate | date of swim |
| bdate | date of birth |
| SwimTime | performance (in seconds) |
| Gender | Women or Men |
| style | one of four swimming strokes or three relay events |
| distance | length of swim with special coding for relays (e.g. 4x100) |
| dist | length of swim in metres |
| Rank_Order | ranking within an event |
Details
The dataset is analysed in Chapter 20, "Are swimmers swimming faster?".
Source
https://www.worldaquatics.com/swimming/rankings
Examples
data(All200, package="GmooG")
with(All200, table(style))
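Since SwimTime is recorded in seconds and dist in metres, an average speed can be derived for each swim. The following is a minimal sketch, assuming both columns are stored as numeric; the helper swim_speed() and the column speed_mps are illustrative names and not part of the package.

```r
# Average speed in metres per second from the documented columns
# dist (metres) and SwimTime (seconds).
# swim_speed() and speed_mps are hypothetical names used for illustration.
swim_speed <- function(d) {
  d$speed_mps <- d$dist / d$SwimTime
  d
}

# Only run on the real data if the package is installed
if (requireNamespace("GmooG", quietly = TRUE)) {
  data(All200, package = "GmooG")
  sp <- swim_speed(All200)
  # Median speed for each combination of style and gender
  print(with(sp, tapply(speed_mps, list(style, Gender), median, na.rm = TRUE)))
}
```

Speeds make events of different lengths directly comparable, which times alone do not.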
Voting at the 1912 Democratic Convention
Description
The number of votes by each state for each candidate on each ballot for the Democratic nomination for president.
Usage
data(DC1912)
Format
A data frame with 3939 observations on the following 4 variables.
| State | State or territory name (there were 52) |
| Candidate | Name of one of the 13 candidates or 'NotVoting' |
| Ballot | Ballot number (1 to 46) |
| Votes | Number of votes for the candidate on that ballot from the state |
Details
Two other smaller datasets are used in combination with this one for the final plot of Chapter 4 (Figure 4.7), "Voting 46 times to choose a Presidential candidate", the estimated times of the ballots (DC1912ballots) and the adjournment times (DC1912adjourns).
Source
Woodson, Urey. 1912. Official Report of the Proceedings of the Democratic National Convention. Chicago: Peterson linotyping Company
Examples
data(DC1912, package="GmooG")
with(DC1912, table(State))
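The Details above mention combining DC1912 with DC1912ballots. One possible way to do that, sketched here under the assumption that both data frames share the Ballot column as documented, is to total the votes per candidate on each ballot and then attach the estimated ballot times. The helper ballot_totals() is a hypothetical name, and this is not necessarily how Figure 4.7 was produced.

```r
# Total votes per candidate on each ballot, with the estimated ballot
# time attached. ballot_totals() is a hypothetical helper; it assumes
# the documented columns Candidate, Ballot and Votes in `votes`
# and Ballot and DateT in `times`.
ballot_totals <- function(votes, times) {
  tot <- aggregate(Votes ~ Candidate + Ballot, data = votes, FUN = sum)
  merge(tot, times, by = "Ballot")
}

# Only run on the real data if the package is installed
if (requireNamespace("GmooG", quietly = TRUE)) {
  data(DC1912, package = "GmooG")
  data(DC1912ballots, package = "GmooG")
  print(head(ballot_totals(DC1912, DC1912ballots)))
}
```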
Times of adjournments at the 1912 Democratic Convention
Description
Times that the six adjournments started and finished, taken from Woodson's convention report.
Usage
data(DC1912adjourns)
Format
A data frame with 6 observations on the following 2 variables.
| StartT | Date and time of start of adjournment |
| EndT | Date and time of end of adjournment |
Details
This dataset is used in combination with the datasets DC1912 and DC1912ballots for the final plot of Chapter 4 (Figure 4.7), "Voting 46 times to choose a Presidential candidate".
Source
Woodson, Urey. 1912. Official Report of the Proceedings of the Democratic National Convention. Chicago: Peterson linotyping Company
Examples
data(DC1912adjourns, package="GmooG")
DC1912adjourns
Estimated times of ballots at the 1912 Democratic Convention
Description
The date and time that each ballot took place have been estimated from Woodson's convention report.
Usage
data(DC1912ballots)
Format
A data frame with 46 observations on the following 2 variables.
| Ballot | Ballot number (1 to 46) |
| DateT | Date and time of the ballot |
Details
This dataset is used in combination with the datasets DC1912 and DC1912adjourns for the final plot of Chapter 4 (Figure 4.7), "Voting 46 times to choose a Presidential candidate".
Source
Woodson, Urey. 1912. Official Report of the Proceedings of the Democratic National Convention. Chicago: Peterson linotyping Company
Examples
data(DC1912ballots, package="GmooG")
head(DC1912ballots)
Numbers of delegates for the individual states and groups
Description
The number of pledged delegates by group at the 2020 Democratic convention.
Usage
data(DC1912dels)
Format
A data frame with 58 observations on the following 3 variables.
| State | Name of group (mostly state or territory) |
| TotP | Number of pledged delegates by group at the 2020 Democratic convention |
| region | Ordered factor: MidWest, NorthEast, West, South, Territory, NA |
Details
This dataset is used in Chapter 4, "Voting 46 times to choose a Presidential candidate".
Source
https://ballotpedia.org/Democratic_delegate_rules,_2020 and https://www.census.gov
Examples
data(DC1912dels, package="GmooG")
head(DC1912dels)
Electoral votes for the individual states of the US
Description
The number of electoral votes for each of the 50 states and D.C. from 1788 till 2020.
Usage
data(DC1912evs)
Format
A data frame with 51 observations on the following 36 variables.
| Code | Code for State |
| State | State name (there were 51 including D.C.) |
| y1788 | Numbers of electoral votes by State in 1788 |
| y1792 | Numbers of electoral votes by State in 1792 |
| y17961800 | Numbers of electoral votes by State in 1796 and 1800 |
| y18041808 | Numbers of electoral votes by State in 1804 and 1808 |
| y1812 | Numbers of electoral votes by State in 1812 |
| y1816 | Numbers of electoral votes by State in 1816 |
| y1820 | Numbers of electoral votes by State in 1820 |
| y18241828 | Numbers of electoral votes by State in 1824 and 1828 |
| y1832 | Numbers of electoral votes by State in 1832 |
| y18361840 | Numbers of electoral votes by State in 1836 and 1840 |
| y1844 | Numbers of electoral votes by State in 1844 |
| y1848 | Numbers of electoral votes by State in 1848 |
| y18521856 | Numbers of electoral votes by State in 1852 and 1856 |
| y1860 | Numbers of electoral votes by State in 1860 |
| y1864 | Numbers of electoral votes by State in 1864 |
| y1868 | Numbers of electoral votes by State in 1868 |
| y1872 | Numbers of electoral votes by State in 1872 |
| y18761880 | Numbers of electoral votes by State in 1876 and 1880 |
| y18841888 | Numbers of electoral votes by State in 1884 and 1888 |
| y1892 | Numbers of electoral votes by State in 1892 |
| y18961900 | Numbers of electoral votes by State in 1896 and 1900 |
| y1904 | Numbers of electoral votes by State in 1904 |
| y1908 | Numbers of electoral votes by State in 1908 |
| y19121928 | Numbers of electoral votes by State from 1912 to 1928 |
| y19321940 | Numbers of electoral votes by State from 1932 to 1940 |
| y19441948 | Numbers of electoral votes by State in 1944 and 1948 |
| y19521956 | Numbers of electoral votes by State in 1952 and 1956 |
| y1960 | Numbers of electoral votes by State in 1960 |
| y19641968 | Numbers of electoral votes by State in 1964 and 1968 |
| y19721980 | Numbers of electoral votes by State from 1972 to 1980 |
| y19841988 | Numbers of electoral votes by State in 1984 and 1988 |
| y19922000 | Numbers of electoral votes by State from 1992 to 2000 |
| y20042008 | Numbers of electoral votes by State in 2004 and 2008 |
| y20122020 | Numbers of electoral votes by State from 2012 to 2020 |
Details
This dataset is used in Chapter 4, "Voting 46 times to choose a Presidential candidate".
Source
https://en.wikipedia.org/wiki/United_States_Electoral_College
Examples
data(DC1912evs, package="GmooG")
head(DC1912evs[, c("State", "y1788", "y19121928", "y20122020")])
DLQI assessment in a phase 3 clinical trial of patients with psoriasis.
Description
150 psoriasis patients were randomized to Placebo (Treatment A) and 450 to the active treatment (Treatment B). The treatment effect in terms of Quality of Life was assessed at Week 16.
Usage
data(DLQI)
Format
A data frame with 900 observations on the following 15 variables.
| USUBJID | individual ID |
| TRT | Placebo (A) or Treatment (B) |
| PASI_BASELINE | Psoriasis Area and Severity Index at Baseline |
| VISIT | Initial or at Week 16 |
| DLQI101 | How Itchy, Sore, Painful, Stinging: 0-3 |
| DLQI102 | How Embarrassed, Self Conscious: 0-3 |
| DLQI103 | Interfered Shopping, Home, Yard: 0-3 |
| DLQI104 | Influenced Clothes You Wear: 0-3 |
| DLQI105 | Affected Social, Leisure Activity: 0-3 |
| DLQI106 | Made It Difficult to Do Any Sports: 0-3 |
| DLQI107 | Prevented Working or Studying: 0-3 |
| DLQI108 | Problem Partner, Friends, Relative: 0-3 |
| DLQI109 | Caused Any Sexual Difficulties: 0-3 |
| DLQI110 | How Much a Problem is Treatment: 0-3 |
| DLQI_SCORE | DLQI Total Score: 0-30 |
Details
This dataset is used in Chapter 12, "Psoriasis and the Quality of Life".
Source
https://github.com/VIS-SIG/Wonderful-Wednesdays/tree/master/data/2021/2021-01-13
Examples
data(DLQI, package="GmooG")
with(DLQI, summary(PASI_BASELINE))
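The ten item scores each range from 0 to 3 and the total from 0 to 30, which suggests that DLQI_SCORE is the sum of the items DLQI101 to DLQI110. The sketch below checks this, assuming the item columns are numeric; dlqi_check() is a hypothetical helper, not a package function.

```r
# Names of the ten documented item columns
dlqi_items <- paste0("DLQI", 101:110)

# Compare the sum of the ten items with the reported total score.
# dlqi_check() is a hypothetical helper used for illustration.
dlqi_check <- function(d) {
  item_sum <- unname(rowSums(d[, dlqi_items]))
  data.frame(item_sum = item_sum,
             DLQI_SCORE = d$DLQI_SCORE,
             agrees = item_sum == d$DLQI_SCORE)
}

# Only run on the real data if the package is installed
if (requireNamespace("GmooG", quietly = TRUE)) {
  data(DLQI, package = "GmooG")
  # Counts of rows where the totals agree, disagree, or are missing
  print(table(dlqi_check(DLQI)$agrees, useNA = "ifany"))
}
```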
Vehicle accidents with deer in Bavaria
Description
Numbers of vehicle accidents with deer every half-hour from the beginning of 2002 till the end of 2011.
Usage
data(DVCdeer)
Format
A data frame with 175296 observations on the following 3 variables.
| mins | beginning of half-hour period, from 00:00 to 23:30 |
| day | day |
| Freq | number of accidents |
Details
This dataset and the dataset DVCnot are both used in Chapter 24, "When do road accidents with deer happen in Bavaria?".
Source
https://www.jstatsoft.org/article/view/v092i01
Examples
data(DVCdeer, package="GmooG")
with(DVCdeer, table(Freq))
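Because each row is one half-hour period on one day, summing Freq over days gives the daily pattern of accidents. This is a minimal sketch, assuming mins can be used directly as a grouping variable; accidents_by_slot() is a hypothetical helper.

```r
# Total number of accidents in each half-hour slot, summed over all days.
# accidents_by_slot() is a hypothetical helper used for illustration.
accidents_by_slot <- function(d) {
  aggregate(Freq ~ mins, data = d, FUN = sum)
}

# Only run on the real data if the package is installed
if (requireNamespace("GmooG", quietly = TRUE)) {
  data(DVCdeer, package = "GmooG")
  print(head(accidents_by_slot(DVCdeer)))
}
```

The same helper applies unchanged to DVCnot, since both datasets have the same three variables.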
Vehicle accidents in Bavaria not involving deer
Description
Numbers of vehicle accidents every half-hour from the beginning of 2002 till the end of 2011.
Usage
data(DVCnot)
Format
A data frame with 175296 observations on the following 3 variables.
| mins | beginning of half-hour period, from 00:00 to 23:30 |
| day | day, from 2002-01-01 to 2011-12-31 |
| Freq | number of accidents |
Details
This dataset and the dataset DVCdeer are both used in Chapter 24, "When do road accidents with deer happen in Bavaria?".
Source
https://www.jstatsoft.org/article/view/v092i01
Examples
data(DVCnot, package="GmooG")
with(DVCnot, table(Freq))
The top 116 decathletes of recent times in April 2021
Description
Details of the best performances of the top decathletes
Usage
data(Decath21)
Format
A data frame with 116 observations on the following 15 variables.
| Rank | Rank order |
| Decathlete | Decathlete's name |
| Nationality | Decathlete's nationality |
| Total | the total points achieved over all 10 events |
| Run100m | Time for the 100 metres (secs) |
| LongJump | Distance jumped (metres) |
| ShotPut | Distance putting the shot (metres) |
| HighJump | Height jumped (metres) |
| Run400m | Time for the 400 metres (secs) |
| Hurdle110m | Time for the 110 metres hurdles (secs) |
| DiscusD | Distance throwing the discus (metres) |
| PoleVault | Height achieved (metres) |
| JavelinD | Distance throwing the javelin (metres) |
| Run1500m | Time for the 1500 metres (secs) |
| Venue | Location and year of performance |
Source
Examples
data(Decath21, package="GmooG")
with(Decath21, summary(Run1500m))
Trial of how drivers used electric car charging facilities
Description
A field experiment on electric vehicle charging
Usage
data(ElecCars)
Format
A data frame with 3395 observations on these 24 variables.
| sessionId | charging session |
| kwhTotal | total energy use of a given EV charging session, measured in kWh |
| dollars | amount paid by the user in US$ for a given charging session |
| created | date and time the session began |
| ended | date and time the session ended |
| startTime | hour of day the session began |
| endTime | hour of day the session ended |
| chargeTimeHrs | total length of session |
| weekday | day of the week of session |
| platform | digital platform used by driver |
| distance | distance from home, if reported |
| userId | user code |
| stationId | station code |
| locationId | location code |
| managerVehicle | binary, 1 if manager car |
| facilityType | type of facility, manufacturing = 1, office = 2, research and development = 3, other = 4 |
| Mon | binary for day of week of session |
| Tues | binary for day of week of session |
| Wed | binary for day of week of session |
| Thurs | binary for day of week of session |
| Fri | binary for day of week of session |
| Sat | binary for day of week of session |
| Sun | binary for day of week of session |
| reportedZip | binary, 1 if user reported zip code |
Details
This dataset is used in Chapter 13, "Charging electric cars".
Source
Examples
data(ElecCars, package="GmooG")
with(ElecCars, table(weekday))
Working population of France in 1954
Description
Numbers working in three sectors in each department of France in 1954.
Usage
data(F1954)
Format
A data frame with 90 observations on the following 8 variables.
| ID | ID code for the department |
| Dept | Department name |
| I.Agriculture | Number in thousands of workers in agriculture |
| II.Industry | Number in thousands of workers in industry |
| III.Commerce | Number in thousands of workers in commerce |
| BertinTotal | Total of the three sectors reported by Bertin |
| Area | Area of department in sq kms |
| NOM_DEPT | Alternative name for department |
Details
The sector data is from Bertin, while area data has been taken from the Guerry package and Wikipedia. The alternative department name was used for merging with a shape file of France (France54Map). The dataset is analysed in Chapter 7, "Re-viewing Bertin's main example".
Source
Bertin, Jacques. 1973. Semiologie Graphique. 2nd ed. The Hague: Mouton-Gautier
Examples
data(F1954, package="GmooG")
with(F1954, summary(I.Agriculture))
Map of the departments of France in 1954
Description
A polygon map of the French departments
Usage
data(France54Map)
Format
An sf object with 90 observations on the following 2 variables
| Dept | Department name |
| geometry | list of department polygons |
Details
This shape file is used in Chapter 7, "Re-viewing Bertin's main example", and combined with the data in the file F1954. Combining the six new departments of 1967 into the two former departments of Seine and Seine-et-Oise is approximately right.
Source
http://coulmont.com/cartes/rcarto.pdf Derived from GEOFLADept_FR_Corse_AV_L93/DEPARTEMENT.SHP
Life expectancy data from Gapminder
Description
Life expectancy at birth for almost 200 countries from 1800 to 2016 and forecasts for 2017 to 2100
Usage
data(GapLifeE)
Format
A data frame with 187 observations on 302 variables. The first variable is the name of the country. Every other variable is named as a year from 1800 to 2100 and the values are the historical life expectancy figures up to 2016 and forecasts of life expectancy from 2017 on.
Details
This dataset and the datasets GapRegions and GapPop are all used in Chapter 2, "Graphics and Gapminder".
Source
Examples
data(GapLifeE, package="GmooG")
library(tidyverse)
ggplot(GapLifeE, aes(`1900`, `2000`)) + geom_point()
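The data frame is in wide form, with one column per year, whereas plotting a time series per country usually needs long form. The following base R sketch does the conversion, assuming the first column holds the country names and every other column name is a year, as described under Format; wide_to_long() and the output column names are illustrative only.

```r
# Convert a wide data frame (first column = country, remaining columns
# named by year) into long form with one row per country and year.
# wide_to_long() is a hypothetical helper used for illustration.
wide_to_long <- function(wide, value_name = "value") {
  years <- names(wide)[-1]
  long <- data.frame(
    country = rep(wide[[1]], times = length(years)),
    year = rep(as.integer(years), each = nrow(wide)),
    stringsAsFactors = FALSE
  )
  # Columns are stacked in order, matching the year vector above
  long[[value_name]] <- unlist(wide[-1], use.names = FALSE)
  long
}

# Only run on the real data if the package is installed
if (requireNamespace("GmooG", quietly = TRUE)) {
  data(GapLifeE, package = "GmooG")
  print(head(wide_to_long(GapLifeE, value_name = "lifeExp")))
}
```

The same helper should work for GapPop, which is documented as having the same layout.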
Population data from Gapminder
Description
Population data for almost 200 countries from 1800 to 2016 and forecasts for 2017 to 2100
Usage
data(GapPop)
Format
A data frame with 195 observations on 302 variables. The first variable is the name of the country. Every other variable is named as a year from 1800 to 2100 and the values are the historical population figures up to 2016 and forecasts of population from 2017 on.
Details
This dataset and the datasets GapLifeE and GapRegions are all used in Chapter 2, "Graphics and Gapminder".
Source
Examples
data(GapPop, package="GmooG")
library(tidyverse)
ggplot(GapPop, aes(`1900`, `2000`)) + geom_point()
World region definitions used by Gapminder
Description
Gapminder offers several different divisions into regions of the almost 200 countries of the world.
Usage
data(GapRegions)
Format
A data frame with 197 observations on 16 variables.
| geo | country abbreviation |
| name | country name |
| four_regions | world split into four regions |
| eight_regions | world split into eight regions |
| six_regions | world split into six regions |
| members_oecd_g77 | group membership: oecd, g77, other |
| Latitude | latitude of country |
| Longitude | longitude of country |
| UN member since | date of joining UN |
| World bank region | world split into seven regions by World bank |
| World bank, 4 income groups 2017 | world split into four income groups by World bank |
| World bank, 3 income groups 2017 | world split into three income groups by World bank, all NA |
Details
This dataset and the datasets GapLifeE and GapPop are all used in Chapter 2, "Graphics and Gapminder".
Source
Examples
data(GapRegions, package="GmooG")
with(GapRegions, table(four_regions, six_regions))
Demographic and economic data for Germany in 2021
Description
Demographic and economic data for the 299 German parliamentary constituencies in 2021
Usage
data(GermanDemographics)
Format
A data frame with 299 observations on the following 17 variables
| WkrNr | Constituency (Wahlkreis) number |
| WkrName | Constituency name |
| Communities | Number of communities |
| Area | Area in square kms |
| Population | Population |
| Germans | Number of Germans in the population |
| Foreigners | Percentage of foreigners in the population |
| PopDensity | Population density, numbers per square km |
| Under18 | Percentage population under 18 |
| Age1824 | Percentage population between 18 and 24 |
| Age2534 | Percentage population between 25 and 34 |
| Age3559 | Percentage population between 35 and 59 |
| Age6074 | Percentage population between 60 and 74 |
| Age75up | Percentage population 75 and older |
| CarsPerP | Cars per 1000 people |
| Hochschulreife | Percentage qualified for university |
| Unemployed | Unemployment rate |
Details
This dataset and the datasets GermanElection21 and GermanExtraSeats are all used in Chapter 26, "German Election 2021–what happened?"
Source
https://www.bundeswahlleiterin.de Derived from btw21_strukturdaten.csv
Examples
data(GermanDemographics, package="GmooG")
with(GermanDemographics, summary(Under18))
Results of the election for the German Bundestag in Autumn 2021
Description
Detailed results by constituency for the German election of 2021 (and for the previous election in 2017)
Usage
data(GermanElection21)
Format
A data frame with 16024 observations on the following 9 variables
| WkNr | Constituency (Wahlkreis) number |
| WkName | Constituency name |
| Land | Bundesland number |
| Partei | Party |
| Stimme | First (personal) or second (party) vote |
| Anzahl | Number of votes in 2021 election |
| VorpAnzahl | Number of votes in 2017 election |
| Bundesland | Bundesland name |
| Region | Region: West, Berlin, East |
Details
This dataset and the datasets GermanDemographics and GermanExtraSeats are all used in Chapter 26, "German Election 2021–what happened?"
Source
https://www.bundeswahlleiterin.de Derived from btw21_kerg2.csv
Examples
library(tidyverse)
data(GermanElection21, package="GmooG")
btw1vP <- GermanElection21 %>% count(Partei) %>% arrange(-n)
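Because each row carries both the 2021 count (Anzahl) and the 2017 count (VorpAnzahl), the change in votes between the two elections can be computed directly. The sketch below totals both counts by party and vote type; it assumes every row is a constituency-level count and treats missing counts as zero, both of which should be checked against the data. party_change() is a hypothetical helper.

```r
# Total votes per party and vote type in 2021 and 2017, with the change.
# party_change() is a hypothetical helper used for illustration.
# Missing counts (e.g. parties that did not stand in 2017) are treated as 0.
party_change <- function(d) {
  tot <- aggregate(cbind(Anzahl, VorpAnzahl) ~ Partei + Stimme, data = d,
                   FUN = sum, na.rm = TRUE, na.action = na.pass)
  tot$change <- tot$Anzahl - tot$VorpAnzahl
  # Largest 2021 totals first
  tot[order(-tot$Anzahl), ]
}

# Only run on the real data if the package is installed
if (requireNamespace("GmooG", quietly = TRUE)) {
  data(GermanElection21, package = "GmooG")
  print(head(party_change(GermanElection21)))
}
```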
Extra seats at German elections from 1949 to 2021
Description
Numbers of extra seats (Ueberhangmandate and Ausgleichsmandate) needed to satisfy the German election rules
Usage
data(GermanExtraSeats)
Format
A data frame with 20 observations on these 2 variables.
| Year | Election year |
| Number | Number of extra seats needed |
Details
This dataset is used in Chapter 26, "German Election 2021–what happened?".
Source
German election results from https://www.bundeswahlleiter.de
Examples
data(GermanExtraSeats, package="GmooG")
library(tidyverse)
ggplot(GermanExtraSeats, aes(Year, Number)) + geom_line()
Map of the German parliamentary constituencies in 2021
Description
A polygon map of the German constituencies
Usage
data(GermanyMap)
Format
An sf object with 299 observations on the following 5 variables
| WKR_NR | Constituency (Wahlkreis) number |
| WKR_NAME | Constituency name |
| LAND_NR | Bundesland number |
| LAND_NAME | Bundesland name |
| geometry | list of constituency polygons |
Details
This map file is used in Chapter 26, "German Election 2021–what happened?"
Source
https://www.bundeswahlleiterin.de Derived from Geometrie_Wahlkreise_20DBT_geo.shp
Measurements of the speed of light by Michelson in 1879
Description
Michelson included more details of each experiment in the table of results in his report.
Usage
data(Mich1879)
Format
A data frame with 100 observations on the following 4 variables.
| Date | Day of the experiment (from 5 June to 2 July 1879) |
| Time | AM, PM or Elec (under electric light) |
| Value | estimate of the speed of light minus 299000, uncorrected for temperature and refraction |
| Temperature | temperature in degrees Fahrenheit, from 58 to 90 |
Details
This dataset and the dataset newcomb are both used in Chapter 5, "Measuring the speed of light".
Source
Michelson, Albert. 1880. "Experimental Determination of the Velocity of Light Made at the U.S. Naval Academy, Annapolis." Astronomical Papers 1: 109-45. https://books.google.de/books?id=343nAAAAMAAJ
Examples
data(Mich1879, package="GmooG")
with(Mich1879, summary(Temperature))
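Value stores each estimate with 299000 subtracted, so the full estimate is recovered by adding 299000 back. A minimal sketch follows; full_speed() and the column Speed are illustrative names, and the result remains uncorrected for temperature and refraction, as noted under Format.

```r
# Recover the full speed-of-light estimate from the stored offset.
# full_speed() and Speed are hypothetical names used for illustration.
full_speed <- function(d) {
  d$Speed <- d$Value + 299000
  d
}

# Only run on the real data if the package is installed
if (requireNamespace("GmooG", quietly = TRUE)) {
  data(Mich1879, package = "GmooG")
  m <- full_speed(Mich1879)
  # Mean estimate for each time of day (AM, PM, Elec)
  print(tapply(m$Speed, m$Time, mean))
}
```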
Competitors at the modern Olympic Games
Description
Individuals who competed at the Olympic Games from 1896 to 2016.
Usage
data(OlympicPeople)
Format
A data frame with 219434 observations on the following 4 variables.
| Sex | Sex of athlete |
| NOC | Abbreviation for national team |
| Year | Year of Games |
| City | Location of Games |
Details
This dataset and the dataset OlympicPerfs are both used in Chapter 6, "The modern Olympic Games in numbers".
Source
Derived from https://www.kaggle.com/datasets/heesoo37/120-years-of-olympic-history-athletes-and-results
Examples
data(OlympicPeople, package="GmooG")
with(OlympicPeople, table(Year))
Performances of competitors at the modern Summer Olympic Games
Description
Performances at the Summer Olympic Games from 1896 to 2016.
Usage
data(OlympicPerfs)
Format
A data frame with 108789 observations on the following 8 variables.
| rank | rank in event |
| medalType | medal won: one of Gold, Silver, Bronze, NA |
| games | location and year |
| discipline | discipline of event |
| event | name of event |
| result_value | result reported |
| result_type | type of result: distance, time, points, weight, and four others |
| country | country |
Details
This dataset and the dataset OlympicPeople are both used in Chapter 6, "The modern Olympic Games in numbers".
Source
Derived from a dataset scraped from the web and provided to the maintainer.
Examples
data(OlympicPerfs, package="GmooG")
library(tidyverse)
OlyD <- OlympicPerfs %>% count(discipline)
Descriptions of three species of shearwaters (Audubon, Galapagos, Tropical)
Description
Plumage and morphological characteristics of three species of shearwaters.
Usage
data(SeaBirds)
Format
A data frame with 153 observations on the following 6 variables.
| collar | one of five categories |
| eyebrows | four levels from none to very pronounced |
| undertail | four levels: White, Black, Black & White, Black & WHITE |
| border | none, few or many |
| sex | male or female |
| species | one of Audubon, Galapagos, Tropical |
Details
This dataset is used in Chapter 23, "Distinguishing shearwaters".
Source
Derived from the R package CoModes (numerical categories have been converted to text and common names rather than scientific names are used for species)
Examples
data(SeaBirds, package="GmooG")
with(SeaBirds, table(species))
Responses on gay rights in Annenberg's 2004 National Election survey
Description
Responses on questions about gay rights at State level and Federal level
Usage
data(SurvGR)
Format
A data frame with 81422 observations on 11 variables.
| ID | ID number |
| cDATE | Date of interview |
| State | Respondent's state of residence |
| age | Respondent's age |
| gender | Respondent's gender |
| race | Respondent's race |
| urbanity | Urban, Suburban, or Rural |
| QuF | Question answered about Federal gay rights |
| valF | Answer to Federal question |
| valS | Answer to State question |
| QuS | Question answered about State gay rights |
Details
This dataset is used in Chapter 9, "Results from surveys on gay rights".
Source
The Annenberg Public Policy Center of the University of Pennsylvania
Examples
data(SurvGR, package="GmooG")
with(SurvGR, table(urbanity))
Passengers and crew who sailed on the Titanic
Description
Some information on those who sailed on the Titanic
Usage
data(TitanicPassCrew)
Format
A data frame with 2208 observations on 7 variables.
| Age | Age of individual |
| Gender | Gender of individual |
| Group | Class of passenger or section of crew |
| Area | abbreviated version of Group |
| Joined | Port where individual boarded: Belfast, Southampton, Cherbourg or Queenstown |
| Nationality | Individual's nationality |
| survived | Whether the individual survived: yes or no |
Details
This dataset is used in Chapter 26, "The Titanic Disaster".
Source
Derived from a fuller dataset available from Encyclopedia Titanica
Examples
data(TitanicPassCrew, package="GmooG")
with(TitanicPassCrew, table(Joined))
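A natural first summary is the proportion of survivors in each passenger class or crew section. The sketch below assumes survived is coded as "yes" or "no" (compared case-insensitively here, since the exact coding is not confirmed); survival_rate() is a hypothetical helper.

```r
# Proportion of individuals who survived within each Group.
# survival_rate() is a hypothetical helper used for illustration;
# it assumes survived takes the values "yes" and "no".
survival_rate <- function(d) {
  tapply(tolower(as.character(d$survived)) == "yes", d$Group, mean)
}

# Only run on the real data if the package is installed
if (requireNamespace("GmooG", quietly = TRUE)) {
  data(TitanicPassCrew, package = "GmooG")
  print(round(survival_rate(TitanicPassCrew), 2))
}
```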
Map of the Regional Classification of the contiguous US States
Description
Map of the contiguous US States including information on the regional classification by the Census Bureau
Usage
data(USregions)
Format
A data frame with 49 observations on 4 variables.
| NAME | name of state |
| State | 2-letter code for state |
| Region | one of four Census Bureau regions: NorthEast, South, MidWest, West |
| geometry | map polygons for state |
Details
This dataset is used in Chapter 9, "Results from surveys on gay rights".
Source
The polygon map data is from the spData package
Examples
data(USregions, package="GmooG")
Fuel economy data for car models in the US
Description
Fuel economy data for individual models of cars and trucks provided by the US Department of Energy.
Usage
data(VehEffUS)
Format
A data frame with 43516 observations on the following 16 variables.
| year | model year, from 1984 to 2022 |
| make | make of car |
| model | model of car |
| VClass | class of vehicle |
| cylinders | number of cylinders, from 2 to 16 |
| atvType | type of alternative fuel or advanced technology vehicle |
| displ | engine displacement in liters |
| drive | drive axle type |
| trany | transmission |
| city | city MPG for fuelType1 |
| highway | highway MPG for fuelType1 |
| combined | combined MPG for fuelType1 |
| fuelCostA08 | annual fuel cost for fuelType1 ($) |
| fuelType1 | main fuel type |
| barrels08 | annual petroleum consumption in barrels for fuelType1 |
| co2TailpipeGpm | tailpipe CO2 in grams/mile for fuelType1 |
Details
This dataset is used in Chapter 17, "Fuel efficiency of cars in the USA".
Source
Selection of variables from https://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip
Examples
data(VehEffUS, package="GmooG")
with(VehEffUS, table(drive))
Testing facial recognition software
Description
Buolamwini and Gebru used their own database that included more women and more people of colour to evaluate how well commercial gender classification algorithms coped with different shades of skin colour in a gender-balanced test database.
Usage
data(aFacial)
Format
A data frame with 72 observations on the following 5 variables.
| Sex | Female or Male |
| Skin | one of six shades of skin colour from I to VI |
| Prediction | Correct or Wrong |
| Freq | number of cases |
| Software | one of three facial recognition software packages |
Details
Summary data tables of percentages and some numerical totals were provided in the paper and the supplementary material. Assuming the results had to be based on integer numbers of cases it was possible to reconstruct summary raw numbers of the dataset. The dataset is analysed in Chapter 22, "Comparing software for facial recognition".
Source
Buolamwini, Joy, and Timnit Gebru. 2018. "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification." Proceedings of Machine Learning Research 81: 1-15
Examples
data(aFacial, package="GmooG")
head(aFacial, n=12)
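Since the data are stored as counts (Freq) for each combination of the other variables, error rates have to be computed from weighted tables. This is a minimal sketch, assuming Prediction takes exactly the values "Correct" and "Wrong" as documented; error_rate() is a hypothetical helper.

```r
# Proportion of wrong predictions for each software package,
# computed from the case counts in Freq.
# error_rate() is a hypothetical helper used for illustration.
error_rate <- function(d) {
  tab <- xtabs(Freq ~ Software + Prediction, data = d)
  prop.table(tab, margin = 1)[, "Wrong"]
}

# Only run on the real data if the package is installed
if (requireNamespace("GmooG", quietly = TRUE)) {
  data(aFacial, package = "GmooG")
  print(round(error_rate(aFacial), 3))
}
```

Adding Sex or Skin to the right-hand side of the formula would give the corresponding breakdowns.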
Human space flights
Description
Individuals who travelled into space between 1961 and 2019.
Usage
data(astronauts)
Format
A data frame with 1277 observations on the following 24 variables.
| id | id number of record |
| number | id number of individual |
| nationwide_number | national number of individual |
| name | individual's name |
| original_name | name in own language |
| sex | sex of individual |
| year_of_birth | year of birth of individual |
| nationality | nationality |
| military_civilian | military or civilian |
| selection | selection group |
| year_of_selection | selection year |
| mission_number | mission number of individual |
| total_number_of_missions | total missions of individual |
| occupation | role on flight: commander, pilot, flight engineer, ... |
| year_of_mission | Mission year |
| mission_title | Mission name |
| ascend_shuttle | Name of ascent shuttle |
| in_orbit | Name of spacecraft used in orbit |
| descend_shuttle | Name of descent shuttle |
| hours_mission | Duration of mission in hours |
| total_hrs_sum | Total duration of all missions in hours |
| field21 | Instances of EVA by mission |
| eva_hrs_mission | Duration of extravehicular activities during the mission |
| total_eva_hrs | Total duration of all extravehicular activities in hours |
Details
This dataset is used in Chapter 10, "Who went up in space for how long?"
Source
https://github.com/rfordatascience/tidytuesday/tree/master/data/2020/2020-07-14
Examples
data(astronauts, package="GmooG")
library(tidyverse)
nc <- astronauts %>% count(nationality) %>% arrange(-n)
Colours worn by European international football teams
Description
Colours for displaying teams
Usage
data(eu20col)
Format
A data frame with 39 observations on these 6 variables.
| team_alpha3 | three letter short form for country |
| url_team | webpage for country |
| kit_shirt | shirt colour in hex format |
| kit_away | away shirt colour in hex format |
| kit_shorts | shorts colour in hex format |
| kit_socks | socks colour in hex format |
Details
This dataset and the dataset eu20p are both used in Chapter 15, "Home or away: where do soccer players play?"
Source
https://github.com/guyabel/chord-uefa-ec/
Examples
data(eu20col, package="GmooG")
head(eu20col)
Players in the squads of European international football teams
Description
Players selected for the national squads, with their clubs and the countries of those clubs
Usage
data(eu20p)
Format
A data frame with 4012 observations on these 21 variables.
| year | year of competition |
| squad | country |
| no | player's squad number (from 1968 on) |
| pos | position, GK=Goalkeeper, DF=Defender, MF=Midfielder, FW=Forward |
| player | player name |
| date_of_birth_age | date of birth and age at competition |
| caps | number of international caps |
| club | club team of player |
| player_url | webpage for player |
| club_fa_url | webpage for Country Football Association of club |
| club_fa | Country Football Association of club |
| club_2 | Second name for club |
| club_country | Country of club |
| club_country_flag | Image of country's flag |
| goals | number of goals scored for country |
| captain | logical TRUE (captain) or FALSE |
| player_original | player name and whether they were captain |
| nat_team | International team |
| club_country_harm | Country of club |
| nat_team_alpha3 | abbreviation for international team |
| club_alpha3 | abbreviation for country of club |
Details
This dataset and the dataset eu20col are both used in Chapter 15, "Home or away: where do soccer players play?"
Source
https://github.com/guyabel/chord-uefa-ec/
Examples
data(eu20p, package="GmooG")
with(eu20p, table(pos))
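The chapter title asks whether players play at home or away. One way to approach this, sketched here under the assumption that nat_team_alpha3 and club_alpha3 use the same country abbreviations, is to compare the two codes for each player; home_share() is a hypothetical helper.

```r
# Share of squad players in each competition year whose club is in the
# same country as their national team.
# home_share() is a hypothetical helper used for illustration.
home_share <- function(d) {
  tapply(d$nat_team_alpha3 == d$club_alpha3, d$year, mean, na.rm = TRUE)
}

# Only run on the real data if the package is installed
if (requireNamespace("GmooG", quietly = TRUE)) {
  data(eu20p, package = "GmooG")
  print(round(home_share(eu20p), 2))
}
```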
Comparison of four tests for malaria
Description
Studying magneto-optical diagnosis of symptomatic malaria in Papua New Guinea.
Usage
data(malaria)
Format
A data frame with 956 observations on the following 24 variables.
| ID | Patient ID |
| Collect_Date | Date blood sample collected |
| Age | Patient age |
| Weight | Patient weight |
| Sex | Patient sex |
| Temperature | axillary temperature in degrees Centigrade |
| Hb | Patient hemoglobin level in g/dL |
| illMalaria | Malaria in last two weeks |
| RDT1 | HRP2 line positive |
| RDT2 | LDH line positive |
| RDTb | HRP and LDH lines positive |
| Pf | qPCR copy number for P. falciparum per microL of blood |
| Pv | qPCR copy number for P. vivax in copies per microL of blood |
| LM_Pf | final expert light microscopy result for P. falciparum in parasites per microL of blood |
| LM_Pfg | final expert light microscopy result for P. falciparum gametocytes in parasites per microL of blood |
| LM_Pv | final expert light microscopy result for P. vivax in parasites per microL of blood |
| LM_Pvg | final expert light microscopy result for P. vivax gametocytes in parasites per microL of blood |
| LM_Pm | final expert light microscopy result for P. malariae in parasites per microL of blood |
| LM_Po | final expert light microscopy result for P. ovale in parasites per microL of blood |
| AveMO | Average magneto-optical signal of blood aliquots #1,2,3 in mV/V |
| sdMO | Standard deviation of the magneto-optical signals of blood aliquots #1,2,3 in mV/V |
| MO1 | Magneto-optical signal of blood aliquot #1 in mV/V |
| MO2 | Magneto-optical signal of blood aliquot #2 in mV/V |
| MO3 | Magneto-optical signal of blood aliquot #3 in mV/V |
Details
This dataset is used in Chapter 19, "Comparing tests for malaria".
Source
doi:10.6084/m9.figshare.13078181.v1
Examples
data(malaria, package="GmooG")
with(malaria, summary(AveMO))
Measurements of the speed of light by Newcomb in 1882
Description
Newcomb reported three series of measurements and regarded the third series used here as the best.
Usage
data(newcomb)
Format
A data frame with 66 observations on the following 6 variables.
| Date | Day of the experiment (from 24 July to 5 September 1882) |
| Observer | Newcomb or Holcombe (who assisted Newcomb in these experiments) |
| Wt1 | a weight given by Newcomb for the quality of the image observed |
| Wt2 | a second weight for the quality of the image |
| Time | time taken in millionths of a second for light to travel a distance of 7.44242 kilometres in air |
| Wt | overall weight given by Newcomb to the observation |
Details
This dataset and the dataset Mich1879 are both used in Chapter 5, "Measuring the speed of light".
Source
Newcomb, Simon. 1891. "Measures of the Velocity of Light Made Under the Direction of the Secretary of the Navy During the Years 1880-1882." Astronomical Papers 2: 107-230
Examples
data(newcomb, package="GmooG")
with(newcomb, summary(Time))
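Each Time value can be turned into a speed estimate by dividing the stated distance of 7.44242 kilometres by the travel time. The sketch below assumes Time holds the full travel time in millionths of a second, as described under Format; newcomb_speed() is a hypothetical helper, and the result is a speed in air in km/s.

```r
# Speed in km/s implied by each measurement: distance divided by the
# travel time, with Time converted from millionths of a second to seconds.
# newcomb_speed() is a hypothetical helper used for illustration.
newcomb_speed <- function(d, dist_km = 7.44242) {
  dist_km / (d$Time * 1e-6)
}

# Only run on the real data if the package is installed
if (requireNamespace("GmooG", quietly = TRUE)) {
  data(newcomb, package = "GmooG")
  print(summary(newcomb_speed(newcomb)))
}
```

Newcomb's overall weights in Wt could be used to form a weighted average of these speeds, for instance with weighted.mean().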