This tutorial is for you if you want to use Apache Arrow to access and
manipulate data in databases. See
vignette("DBI", package = "DBI")
and
vignette("DBI-advanced", package = "DBI")
for tutorials on
accessing data using R’s data frames instead of Arrow’s structures.
Apache Arrow is
a cross-language development platform for in-memory analytics.
arrow::RecordBatchReader
Zero chance of interfering with existing DBI backends
Fully functional fallback implementation for all existing DBI backends
Requires {nanoarrow} R package
New generics:
dbReadTableArrow()
dbCreateTableArrow()
dbAppendTableArrow()
dbGetQueryArrow()
dbSendQueryArrow()
dbFetchArrow()
dbFetchArrowChunk()
dbWriteTableArrow()
New classes:
DBIResultArrow
DBIResultArrowDefault
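The printed results that follow appear to come from reading a small example table as an Arrow stream. A minimal sketch of such a setup is given here, assuming an in-memory SQLite database provided by {RSQLite}, an installed {nanoarrow} package, and a table named tbl with the columns a, b and c seen in the output. The names con and tbl match those used by the code later in this tutorial; the choice of backend is an assumption.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for a real database.
con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")

# Example data with the columns shown in the printed output (a, b, c).
data <- data.frame(a = 1:3, b = 4.5, c = "five")
dbWriteTable(con, "tbl", data)

# Read the whole table as an Arrow stream.
stream <- dbReadTableArrow(con, "tbl")
print(stream)

# Convert the stream to a data frame; this consumes the stream.
df <- as.data.frame(stream)
print(df)
```

The connection is deliberately left open so that later snippets can reuse con.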
## <nanoarrow_array_stream struct<a: int32, b: double, c: string>>
## $ get_schema:function ()
## $ get_next :function (schema = x$get_schema(), validate = TRUE)
## $ release :function ()
## a b c
## 1 1 4.5 five
## 2 2 4.5 five
## 3 3 4.5 five
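The next output looks like the result of a counting query fetched with dbGetQueryArrow(). The sketch below shows one way to obtain such a result. It repeats the assumed setup on a throwaway connection (demo_con is a hypothetical name) so that it runs on its own, and the exact query text is an assumption inferred from the printed column name and count.

```r
library(DBI)

# Setup repeated so that the snippet runs on its own (assumed example data).
demo_con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(demo_con, "tbl", data.frame(a = 1:3, b = 4.5, c = "five"))

# Run a query and receive the result as an Arrow stream.
stream <- dbGetQueryArrow(demo_con, "SELECT COUNT(*) FROM tbl WHERE a < 3")
print(stream)

# Materialize the stream as a data frame.
count_df <- as.data.frame(stream)
print(count_df)

dbDisconnect(demo_con)
```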
## <nanoarrow_array_stream struct<COUNT(*): int32>>
## $ get_schema:function ()
## $ get_next :function (schema = x$get_schema(), validate = TRUE)
## $ release :function ()
## COUNT(*)
## 1 2
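A stream can also be consumed manually through the get_next() method listed in the printed stream objects. The following sketch, which again uses a throwaway connection with the assumed example data, pulls arrays from the stream until it is exhausted. The output below appears consistent with this pattern: one array holding two rows, followed by NULL.

```r
library(DBI)

# Setup repeated so that the snippet runs on its own (assumed example data).
demo_con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(demo_con, "tbl", data.frame(a = 1:3, b = 4.5, c = "five"))

stream <- dbGetQueryArrow(demo_con, "SELECT * FROM tbl WHERE a < 3")
print(stream)

# First call: an array containing the rows of the result.
first <- stream$get_next()
print(first)

# Second call: NULL signals that the stream is exhausted.
second <- stream$get_next()
print(second)

dbDisconnect(demo_con)
```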
## <nanoarrow_array_stream struct<a: int32, b: double, c: string>>
## $ get_schema:function ()
## $ get_next :function (schema = x$get_schema(), validate = TRUE)
## $ release :function ()
## <nanoarrow_array struct[2]>
## $ length : int 2
## $ null_count: int 0
## $ offset : int 0
## $ buffers :List of 1
## ..$ :<nanoarrow_buffer validity<bool>[0][0 b]> ``
## $ children :List of 3
## ..$ a:<nanoarrow_array int32[2]>
## .. ..$ length : int 2
## .. ..$ null_count: int 0
## .. ..$ offset : int 0
## .. ..$ buffers :List of 2
## .. .. ..$ :<nanoarrow_buffer validity<bool>[0][0 b]> ``
## .. .. ..$ :<nanoarrow_buffer data<int32>[2][8 b]> `1 2`
## .. ..$ dictionary: NULL
## .. ..$ children : list()
## ..$ b:<nanoarrow_array double[2]>
## .. ..$ length : int 2
## .. ..$ null_count: int 0
## .. ..$ offset : int 0
## .. ..$ buffers :List of 2
## .. .. ..$ :<nanoarrow_buffer validity<bool>[0][0 b]> ``
## .. .. ..$ :<nanoarrow_buffer data<double>[2][16 b]> `4.5 4.5`
## .. ..$ dictionary: NULL
## .. ..$ children : list()
## ..$ c:<nanoarrow_array string[2]>
## .. ..$ length : int 2
## .. ..$ null_count: int 0
## .. ..$ offset : int 0
## .. ..$ buffers :List of 3
## .. .. ..$ :<nanoarrow_buffer validity<bool>[0][0 b]> ``
## .. .. ..$ :<nanoarrow_buffer data_offset<int32>[3][12 b]> `0 4 8`
## .. .. ..$ :<nanoarrow_buffer data<string>[8 b]> `fivefive`
## .. ..$ dictionary: NULL
## .. ..$ children : list()
## $ dictionary: NULL
## NULL
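The generics dbSendQueryArrow() and dbFetchArrowChunk() listed earlier allow a result to be processed one chunk at a time, which can help when the result does not fit into memory. The sketch below is one possible loop under the same assumed setup. How many rows each chunk holds depends on the backend, so only the total row count is tracked here.

```r
library(DBI)

# Setup repeated so that the snippet runs on its own (assumed example data).
demo_con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(demo_con, "tbl", data.frame(a = 1:3, b = 4.5, c = "five"))

# Send the query without fetching anything yet.
res <- dbSendQueryArrow(demo_con, "SELECT * FROM tbl WHERE a < 3")

n_rows <- 0L
while (!dbHasCompleted(res)) {
  # Each chunk is an Arrow array; convert it only as far as needed.
  chunk <- dbFetchArrowChunk(res)
  n_rows <- n_rows + nrow(as.data.frame(chunk))
}

# Always release the result set before reusing the connection.
dbClearResult(res)
print(n_rows)

dbDisconnect(demo_con)
```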
in_arrow <- nanoarrow::as_nanoarrow_array(data.frame(a = 1:4))
stream <- dbGetQueryArrow(con, "SELECT $a AS batch, * FROM tbl WHERE a < $a", params = in_arrow)
as.data.frame(stream)
## batch a b c
## 1 2 1 4.5 five
## 2 3 1 4.5 five
## 3 3 2 4.5 five
## 4 4 1 4.5 five
## 5 4 2 4.5 five
## 6 4 3 4.5 five
stream <- dbGetQueryArrow(con, "SELECT * FROM tbl WHERE a < 3")
dbWriteTableArrow(con, "tbl_new", stream)
dbReadTable(con, "tbl_new")
## a b c
## 1 1 4.5 five
## 2 2 4.5 five
stream <- dbGetQueryArrow(con, "SELECT * FROM tbl WHERE a < 3")
dbCreateTableArrow(con, "tbl_split", stream)
dbAppendTableArrow(con, "tbl_split", stream)
## [1] TRUE
stream <- dbGetQueryArrow(con, "SELECT * FROM tbl WHERE a >= 3")
dbAppendTableArrow(con, "tbl_split", stream)
## [1] TRUE
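To check that the two appends ended up in the same table, the table can be read back. The sketch below replays the create and append steps on a throwaway connection with the assumed example data, then reads the result through the Arrow interface. Reading with dbReadTableArrow() is one option; the table printed next may equally have been produced with dbReadTable().

```r
library(DBI)

# Setup repeated so that the snippet runs on its own (assumed example data).
demo_con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbWriteTable(demo_con, "tbl", data.frame(a = 1:3, b = 4.5, c = "five"))

# Create the table from the stream's schema, then append the stream's rows.
stream <- dbGetQueryArrow(demo_con, "SELECT * FROM tbl WHERE a < 3")
dbCreateTableArrow(demo_con, "tbl_split", stream)
dbAppendTableArrow(demo_con, "tbl_split", stream)

# Append the remaining rows from a second stream.
stream <- dbGetQueryArrow(demo_con, "SELECT * FROM tbl WHERE a >= 3")
dbAppendTableArrow(demo_con, "tbl_split", stream)

# Read the combined table back as a data frame.
split_df <- as.data.frame(dbReadTableArrow(demo_con, "tbl_split"))
print(split_df)

dbDisconnect(demo_con)
```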
## a b c
## 1 1 4.5 five
## 2 2 4.5 five
## 3 3 4.5 five
As usual, do not forget to disconnect from the database when done.
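A minimal sketch of the cleanup step, shown on a throwaway connection (demo_con is a hypothetical name). In the running example the same call would be applied to con.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for a real database.
demo_con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")

# ... work with the connection ...

# Close the connection; it can no longer be used afterwards.
dbDisconnect(demo_con)
still_valid <- dbIsValid(demo_con)
print(still_valid)
```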
That concludes the major features of DBI. For more details on the
library functions covered in this tutorial, see the DBI specification at
vignette("spec", package = "DBI").