| Title: | Prediction Data from GPT Detectors | 
| Version: | 0.1.0 | 
| Description: | Researchers carried out a series of experiments passing a number of essays to different GPT detection models. Juxtaposing detector predictions for papers written by native and non-native English writers, the authors argue that GPT detectors disproportionately classify real writing from non-native English writers as AI-generated. | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.3 | 
| Depends: | R (≥ 2.10) | 
| LazyData: | true | 
| URL: | https://simonpcouch.github.io/detectors/ | 
| Suggests: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2023-10-26 14:23:49 UTC; simoncouch | 
| Author: | Simon Couch [cre, aut] | 
| Maintainer: | Simon Couch <simonpatrickcouch@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2023-10-26 15:30:02 UTC | 
Predictions from GPT Detectors
Description
Data derived from the paper "GPT detectors are biased against non-native English writers". The study authors carried out a series of experiments passing a number of essays to different GPT detection models. Juxtaposing detector predictions for papers written by native and non-native English writers, the authors argue that GPT detectors disproportionately classify real writing from non-native English writers as AI-generated.
Usage
detectors
Format
A data frame with 6,185 rows and 9 columns:
- kind
- Whether the essay was written by a "Human" or "AI".
- .pred_AI
- The class probability from the GPT detector that the inputted text was written by AI. 
- .pred_class
- The uncalibrated class prediction, encoded as if_else(.pred_AI > .5, "AI", "Human").
- detector
- The name of the detector used to generate the predictions. 
- native
- For essays written by humans, whether the essay was written by a native English writer or not. These categorizations are coarse; values of "Yes" may actually be written by people who do not write with English natively. NA indicates that the text was not written by a human.
- name
- A label for the experiment that the predictions were generated from. 
- model
- For essays that were written by AI, the name of the model that generated the essay. 
- document_id
- A unique identifier for the supplied essay. Some essays were supplied to multiple detectors. Note that some essays are AI-revised derivatives of others. 
- prompt
- For essays that were written by AI, a descriptor for the form of "prompt engineering" passed to the model. 
For more information on these data, see the source paper.
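The relationship between .pred_AI and .pred_class described above can be sketched in base R. This is a minimal illustration on a toy data frame with made-up probabilities, not rows from the dataset; it uses base ifelse() in place of dplyr's if_else() so that it runs without extra packages.

```r
# Toy data with made-up probabilities; only the column names mirror
# the documented ones.
toy <- data.frame(
  document_id = 1:4,
  .pred_AI = c(0.92, 0.50, 0.08, 0.51)
)

# Base R equivalent of if_else(.pred_AI > .5, "AI", "Human").
# The comparison is strict, so a probability of exactly 0.5 maps to "Human".
toy$.pred_class <- ifelse(toy$.pred_AI > 0.5, "AI", "Human")

print(toy)
```

Because the threshold is fixed at 0.5 and the predictions are uncalibrated, a different cutoff on .pred_AI may be more appropriate for a given detector.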
Source
doi:10.1016/j.patter.2023.100779
Examples
detectors
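One way to explore the comparison the Description refers to is to compute, for human-written essays only, the share flagged as AI within each value of native. The sketch below runs on a toy data frame with made-up rows so that it is self-contained; the value "No" for native is an assumption, since only "Yes" and NA are documented above.

```r
# Made-up rows for illustration only; these are not values from the dataset.
# The native value "No" is assumed here.
toy <- data.frame(
  kind = c("Human", "Human", "Human", "Human", "AI", "AI"),
  native = c("Yes", "Yes", "No", "No", NA, NA),
  .pred_class = c("Human", "Human", "AI", "Human", "AI", "Human"),
  stringsAsFactors = FALSE
)

# Keep human-written essays only; native is NA for AI-written text.
human <- toy[toy$kind == "Human", ]

# Share of human-written essays classified as "AI", split by native status.
flagged <- tapply(human$.pred_class == "AI", human$native, mean)

print(flagged)
```

With the package attached, the same steps should apply with detectors in place of toy, provided the columns hold the values assumed here.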