rlist provides searching capabilities: it can find values within a list without requiring you to specify the key or the search path. list.search
handles a variety of search demands. The following is the definition of this function.
list.search(.data, when, how, what, ...,
            na.rm = FALSE, classes = class(what), unlist = FALSE)
Definition of arguments:
.data
: the list to be searched
when
: the function used to aggregate the returned logical vector: all or any
how
: the comparer function: identical, unidentical, equal, unequal, include, exclude, like(dist), or unlike(dist)
what
: the value to search for
...
: additional parameters passed to the how function
na.rm
: should the when function ignore NA values?
classes
: a character vector of the classes to be examined
unlist
: should the final result be unlisted?

Exact search finds values purely by logical comparison. Suppose we search the following list.
x <- list(p1 = list(type="A",score=c(c1=9)),
          p2 = list(type=c("A","B"),score=c(c1=8,c2=9)),
          p3 = list(type=c("B","C"),score=c(c1=9,c2=7)),
          p4 = list(type=c("B","C"),score=c(c1=8,c2=NA)))
First, we search all values in the list that are identical to “A”.
list.search(x, all, identical, "A")
# $p1
# $p1$type
# [1] "A"
Only values that are identical to the character vector "A" are put in the resulting list. We can also unlist the result.
list.search(x, all, identical, "A", unlist = TRUE)
# p1.type
# "A"
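To make the roles of when, how, and what concrete, here is a minimal base R sketch of a recursive search. The function search_sketch is a hypothetical helper written for illustration, not rlist's implementation. It assumes a named list and only mimics the class check and the when(how(value, what)) test described above.

```r
# Hypothetical helper, not part of rlist: walk a named list recursively and
# keep every atomic vector v for which when(how(v, what)) is TRUE.
search_sketch <- function(data, when, how, what) {
  out <- list()
  for (nm in names(data)) {
    item <- data[[nm]]
    if (is.list(item)) {
      # Recurse into sub-lists; keep them only if something inside matched
      sub <- search_sketch(item, when, how, what)
      if (length(sub) > 0) out[[nm]] <- sub
    } else if (inherits(item, class(what)) &&
               isTRUE(when(how(item, what)))) {
      # Only values of the same class as `what` are examined
      out[[nm]] <- item
    }
  }
  out
}

x <- list(p1 = list(type = "A", score = c(c1 = 9)),
          p2 = list(type = c("A", "B"), score = c(c1 = 8, c2 = 9)))

res <- search_sketch(x, all, identical, "A")
str(res)
```

Only p1$type survives here, because it is the only character vector that is exactly "A".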
Then, we search all values identical to c("A","B").
list.search(x, all, identical, c("A","B"))
# $p2
# $p2$type
# [1] "A" "B"
Next, we search for the numeric vector c(8,9).
list.search(x, all, identical, c(8,9))
# named list()
The result is empty. If you are familiar with how the function identical works, this should not be surprising, since it is probably the strictest comparer for telling whether two objects are the same: two objects are identical if and only if they have exactly the same structure, including values and names. That explains why no numeric vector identical to c(8,9) is found: all numeric vectors in x are named vectors like c(c1=8,c2=9).
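The strictness of identical can be checked directly in base R, without rlist. The short sketch below uses illustrative variable names and compares a named and an unnamed numeric vector holding the same values.

```r
named   <- c(c1 = 8, c2 = 9)
unnamed <- c(8, 9)

# identical() also compares attributes such as names
same_object <- identical(named, unnamed)

# == compares values element by element and ignores the names
same_values <- all(named == unnamed)

# Dropping the names makes the two vectors identical
same_after_unname <- identical(unname(named), unnamed)

print(c(same_object, same_values, same_after_unname))
```

The first comparison fails only because of the names attribute; the values themselves match.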
To compare values between atomic vectors just like using ==, we can use equal or unequal as the comparer function, combined with all or any as the aggregating function.
Search length-1 numeric vectors all equal to 9.
list.search(x, all, equal, 9)
# $p1
# $p1$score
# c1
# 9
Search length-2 numeric vectors all equal to c(8,9).
list.search(x, all, equal, c(8,9))
# $p2
# $p2$score
# c1 c2
# 8 9
Search length-2 numeric vectors all equal to c(8,9), ignoring NA.
list.search(x, all, equal, c(8,9), na.rm = TRUE)
# $p2
# $p2$score
# c1 c2
# 8 9
#
#
# $p4
# $p4$score
# c1 c2
# 8 NA
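The effect of na.rm can be reproduced in base R. This sketch assumes the equal comparer behaves like == on vectors of the same length, as the text suggests, and shows why p4's score only matches once missing values are ignored.

```r
score <- c(c1 = 8, c2 = NA)

# Element-wise comparison: the second element is NA because score has an NA
cmp <- score == c(8, 9)

# Without na.rm the aggregate is NA, so the vector is not treated as a match
strict <- all(cmp)

# With na.rm = TRUE the NA is dropped and only the first comparison counts
lenient <- all(cmp, na.rm = TRUE)

print(cmp)
print(c(strict = strict, lenient = lenient))
```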
Search length-1 character vectors in which any value equals “A”.
list.search(x, any, equal, "A")
# $p1
# $p1$type
# [1] "A"
Search length-1 numeric vectors in which any value equals 8.
list.search(x, any, equal, 8)
# named list()
Search length-2 numeric vectors c(x,y) for which any corresponding values are equal, that is, any(c(x,y)==c(8,9)) is TRUE.
list.search(x, any, equal, c(8,9))
# $p2
# $p2$score
# c1 c2
# 8 9
#
#
# $p4
# $p4$score
# c1 c2
# 8 NA
Search all numeric vectors in which both 8 and 9 are included.
list.search(x, all, include, c(8,9))
# $p2
# $p2$score
# c1 c2
# 8 9
Search all numeric vectors in which any of 7, 8, or 10 is included.
list.search(x, any, include, c(7,8,10))
# $p2
# $p2$score
# c1 c2
# 8 9
#
#
# $p3
# $p3$score
# c1 c2
# 9 7
#
#
# $p4
# $p4$score
# c1 c2
# 8 NA
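The two include searches above can be approximated in base R with %in%. This is only a sketch under the assumption that include tests membership of the search values in each vector; the scores list simply copies the score vectors from x.

```r
scores <- list(p1 = c(c1 = 9),
               p2 = c(c1 = 8, c2 = 9),
               p3 = c(c1 = 9, c2 = 7),
               p4 = c(c1 = 8, c2 = NA))

# all + include: every search value must occur in the vector
has_all <- vapply(scores, function(s) all(c(8, 9) %in% s), logical(1))

# any + include: at least one search value must occur in the vector
has_any <- vapply(scores, function(s) any(c(7, 8, 10) %in% s), logical(1))

print(names(scores)[has_all])
print(names(scores)[has_any])
```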
The comparison is flexible enough to support fuzzy searching using functions provided by the stringdist package. Consider the following list.
x <- list(
  p1 = list(name="Ken",age=24),
  p2 = list(name="Kent",age=26),
  p3 = list(name="Sam",age=24),
  p4 = list(name="Keynes",age=30),
  p5 = list(name="Kwen",age=31))
rlist's built-in functions like(dist,...) and unlike(dist,...) call stringdist::stringdist internally and handle fuzzy search to meet a wide range of demands.
For both functions, dist is the threshold on the string distance between the actual value in the list and the search term you specify, and ... stands for additional parameters passed to the internally called functions. More specifically,
like(1) means the string distances of the values in the character vector must be no greater than one.
unlike(1) means the string distances of the values in the character vector must be no less than one.
For example, if we want to find names similar to "ken" with maximum distance 1, we have to specify how = like(1). Since the names are single-valued, it does not matter whether we choose all or any.
list.search(x, any, like(1), "ken", unlist = TRUE)
# p1.name
# "Ken"
If the distance constraint is too tight, set a greater value.
list.search(x, any, like(2), "ken", unlist = TRUE)
# p1.name p2.name p5.name
# "Ken" "Kent" "Kwen"
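The distances behind these two results can be inspected with base R's utils::adist, which computes a generalized Levenshtein distance. This is an approximation for illustration: stringdist defaults to a different method (optimal string alignment), and the sketch assumes the two agree for these names, which involve no transpositions.

```r
people <- c(p1 = "Ken", p2 = "Kent", p3 = "Sam", p4 = "Keynes", p5 = "Kwen")

# Case-sensitive edit distance from each name to the search term "ken"
d <- setNames(as.vector(utils::adist(people, "ken")), names(people))
print(d)

# Names within distance 1 and within distance 2 of "ken"
within_1 <- names(d)[d <= 1]
within_2 <- names(d)[d <= 2]
print(within_1)
print(within_2)
```

Because the search term is lower-case, "Ken" itself is already one substitution away, which is why like(1) returns only p1.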
Suppose we are working with the following data, in which the names are length-2 character vectors.
x <- list(
  p1 = list(name=c("Ken", "Ren"),age=24),
  p2 = list(name=c("Kent", "Potter"),age=26),
  p3 = list(name=c("Sam", "Lee"),age=24),
  p4 = list(name=c("Keynes", "Bond"),age=30),
  p5 = list(name=c("Kwen", "Hu"),age=31))
Search all character vectors in which any element is like “Ken” within string distance 1.
list.search(x, any, like(1), "Ken")
# $p1
# $p1$name
# [1] "Ken" "Ren"
#
#
# $p2
# $p2$name
# [1] "Kent" "Potter"
#
#
# $p5
# $p5$name
# [1] "Kwen" "Hu"
Search all character vectors in which all elements are unlike “Ken”, that is, each has a string distance of at least 2 from it.
list.search(x, all, unlike(2), "Ken")
# $p3
# $p3$name
# [1] "Sam" "Lee"
#
#
# $p4
# $p4$name
# [1] "Keynes" "Bond"
Search all character vectors c(x,y) that are like c("Ken","Hu") element by element within string distance 2, that is, the distance between x and “Ken” as well as that between y and “Hu” should be no greater than 2.
list.search(x, all, like(2), c("Ken","Hu"))
# $p5
# $p5$name
# [1] "Kwen" "Hu"
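The element-by-element comparison can be sketched in base R as follows. The helper pair_dist is hypothetical and uses utils::adist as a stand-in for stringdist, assuming both give the same distances for these names.

```r
# Hypothetical helper: distance between the i-th element of v and the
# i-th element of what, for vectors of equal length
pair_dist <- function(v, what) {
  mapply(function(a, b) utils::adist(a, b)[1, 1], v, what, USE.NAMES = FALSE)
}

d_p5 <- pair_dist(c("Kwen", "Hu"), c("Ken", "Hu"))
d_p1 <- pair_dist(c("Ken", "Ren"), c("Ken", "Hu"))

print(d_p5)
print(d_p1)

# all + like(2): every pairwise distance must be no greater than 2
print(c(p5 = all(d_p5 <= 2), p1 = all(d_p1 <= 2)))
```

p1 fails because "Ren" is three edits away from "Hu", even though its first element matches "Ken" exactly.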
The fuzzy search functions also work with filtering functions.
Consider the following data.
x <- list(
  p1 = list(name=c("Ken", "Ren"),age=24),
  p2 = list(name=c("Kent", "Potter"),age=26),
  p3 = list(name=c("Sam", "Lee"),age=24),
  p4 = list(name=c("Keynes", "Bond"),age=30),
  p5 = list(name=c("Kwen", "Hu"),age=31))
We can also use the fuzzy search comparers with list.filter. For example, filter all list members whose name has any character value like "Ken" with maximum distance 1, and output their pasted names as a named character vector. Here we use a pipeline.
library(pipeR)
x %>>%
list.filter(any(like(1)(name,"Ken"))) %>>%
list.mapv(paste(name,collapse = " "))
# p1 p2 p5
# "Ken Ren" "Kent Potter" "Kwen Hu"
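For comparison, the same filter-then-map step can be written in base R alone. This is a rough equivalent, not what list.filter or list.mapv do internally: is_like is a hypothetical stand-in for like(dist) built on utils::adist, assumed to agree with stringdist for these names.

```r
x <- list(
  p1 = list(name = c("Ken", "Ren"), age = 24),
  p2 = list(name = c("Kent", "Potter"), age = 26),
  p3 = list(name = c("Sam", "Lee"), age = 24),
  p4 = list(name = c("Keynes", "Bond"), age = 30),
  p5 = list(name = c("Kwen", "Hu"), age = 31))

# Hypothetical stand-in for like(dist): TRUE where the edit distance
# between each element of v and term is at most max_dist
is_like <- function(v, term, max_dist) utils::adist(v, term)[, 1] <= max_dist

# Keep members with any name within distance 1 of "Ken"
hits <- Filter(function(p) any(is_like(p$name, "Ken", 1)), x)

# Paste each member's names into one string; list names are preserved
pasted <- vapply(hits, function(p) paste(p$name, collapse = " "), character(1))
print(pasted)
```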