Customise your query

Get more and better results from OpenCage

Daniel Possenriede, Jesse Sadler, Maëlle Salmon

2021-02-18

Geocoding is surprisingly hard. Address formats and spellings differ in and between countries; administrative areas on different levels intersect; names, numbers, and boundaries change over time — you name it. The OpenCage API, therefore, supports about a dozen parameters to customise queries. This vignette explains how to use the query parameters with {opencage} to get better geocoding results.

Multiple results

Forward geocoding typically returns multiple results because many places have the same or similar names.

By default oc_forward_df() only returns one result: the one defined as the best result by the OpenCage API. To receive more results, modify the limit argument, which specifies the maximum number of results that should be returned. Integer values between 1 and 100 are allowed.

oc_forward_df("Berlin")
#> # A tibble: 1 x 4
#>   placename oc_lat oc_lng oc_formatted   
#>   <chr>      <dbl>  <dbl> <chr>          
#> 1 Berlin      52.5   13.4 Berlin, Germany
oc_forward_df("Berlin", limit = 5)
#> # A tibble: 4 x 4
#>   placename oc_lat oc_lng oc_formatted                                               
#>   <chr>      <dbl>  <dbl> <chr>                                                      
#> 1 Berlin      52.5   13.4 Berlin, Germany                                            
#> 2 Berlin      44.5  -71.2 Berlin, New Hampshire, United States of America            
#> 3 Berlin      39.8  -89.9 Berlin, Sangamon County, Illinois, United States of America
#> 4 Berlin      41.6  -72.7 Berlin, Connecticut, United States of America

Reverse geocoding only returns at most one result. Therefore, oc_reverse_df() does not support the limit argument.

OpenCage may sometimes have more than one record of one place. Duplicated records are not returned by default. If you set the no_dedupe argument to TRUE, you will receive duplicated results when available.

oc_forward_df("Berlin", limit = 5, no_dedupe = TRUE)
#> # A tibble: 5 x 4
#>   placename oc_lat oc_lng oc_formatted                                               
#>   <chr>      <dbl>  <dbl> <chr>                                                      
#> 1 Berlin      52.5   13.4 Berlin, Germany                                            
#> 2 Berlin      52.5   13.4 Berlin, Germany                                            
#> 3 Berlin      44.5  -71.2 Berlin, New Hampshire, United States of America            
#> 4 Berlin      39.8  -89.9 Berlin, Sangamon County, Illinois, United States of America
#> 5 Berlin      41.6  -72.7 Berlin, Connecticut, United States of America

Better targeted results

As you can see, place names are often ambiguous. Happily, the OpenCage API has tools to deal with this problem. The countrycode, bounds, and proximity arguments can make the query more precise. min_confidence lets you limit the results to those with a specified confidence score (which is not necessarily the “best” or most “relevant” result, though). These parameters are only relevant and available for forward geocoding.

countrycode

The countrycode parameter restricts the results to the given country. The country code is a two letter code as defined by the ISO 3166-1 Alpha 2 standard. E.g. “AR” for Argentina, “FR” for France, and “NZ” for the New Zealand.

oc_forward_df(placename = "Paris", countrycode = "US", limit = 5)
#> # A tibble: 5 x 4
#>   placename oc_lat oc_lng oc_formatted                                         
#>   <chr>      <dbl>  <dbl> <chr>                                                
#> 1 Paris       33.7  -95.6 Paris, TX 75460, United States of America            
#> 2 Paris       38.2  -84.3 Paris, Kentucky, United States of America            
#> 3 Paris       36.3  -88.3 Paris, TN 38242, United States of America            
#> 4 Paris       39.6  -87.7 Paris, IL 61944, United States of America            
#> 5 Paris       44.3  -70.5 Paris, Oxford County, Maine, United States of America

Multiple countrycodes per placename must be wrapped in a list. Here is an example with places called “Paris” in Italy and Portugal.

oc_forward_df(placename = "Paris", countrycode = list(c("IT", "PT")), limit = 5)
#> # A tibble: 5 x 4
#>   placename oc_lat oc_lng oc_formatted                                                           
#>   <chr>      <dbl>  <dbl> <chr>                                                                  
#> 1 Paris       44.6   7.28 Brossasco, Cuneo, Italy                                                
#> 2 Paris       46.5  10.4  23032 Valfurva, Italy                                                  
#> 3 Paris       37.4  -8.79 8670-320 Odemira, Portugal                                             
#> 4 Paris       43.5  12.1  Paris, Monterchi, Arezzo, Italy                                        
#> 5 Paris       46.5  11.3  Paris, Via Firenze - Florenzstraße, 56, 39100 Bolzano - Bozen BZ, Italy

Despite the name, country codes also exist for territories that are not independent states, e.g. Gibraltar (“GI”), Greenland (“GL”), Guadaloupe (“GP”), or Guam (“GU”). You can look up specific country codes with the {ISOcodes} or {countrycodes} packages or on the ISO or Wikipedia webpages. In fact, you can also look up country codes via OpenCage as well. If you were interested in the country code of Curaçao for example, you could run:

oc_forward_df("Curaçao", no_annotations = FALSE)["oc_iso_3166_1_alpha_2"]
#> # A tibble: 1 x 1
#>   oc_iso_3166_1_alpha_2
#>   <chr>                
#> 1 CW

bounds

The bounds parameter restricts the possible results to a defined bounding box. A bounding box is a named numeric vector with four coordinates specifying its south-west and north-east corners: (xmin, ymin, xmax, ymax). The bounds parameter can most easily be specified with the oc_bbox() helper. For example, bounds = oc_bbox(-0.56, 51.28, 0.27, 51.68). OpenCage provides a ‘bounds-finder’ to interactively determine bounds values.

Below is an example of the use of bounds where the bounding box specifies the the South American continent.

oc_forward_df(placename = "Paris", bounds = oc_bbox(-97, -56, -32, 12), limit = 5)
#> # A tibble: 5 x 4
#>   placename oc_lat oc_lng oc_formatted                                                          
#>   <chr>      <dbl>  <dbl> <chr>                                                                 
#> 1 Paris      -6.71  -69.9 Eirunepé, Região Geográfica Intermediária de Tefé, Brazil             
#> 2 Paris      -3.99  -79.2 110108, Loja, Ecuador                                                 
#> 3 Paris     -13.5   -62.5 Canton Motegua, Municipio Baures, Provincia de Iténez, Bolivia        
#> 4 Paris     -23.5   -47.5 Jardim Zezo Miguel, Sorocaba, Região Metropolitana de Sorocaba, Brazil
#> 5 Paris       6.31  -75.6 París, 051052 Bello, ANT, Colombia

Again, you can also use {opencage} to determine a bounding box for subsequent queries. If you wanted to map the airports on the Hawaiian islands, for example, you could find the appropriate bounding box and then search for the airports:

hi <- oc_forward_df(placename = "Hawaii", no_annotations = FALSE)

hi_bbox <-
  oc_bbox(
    hi$oc_southwest_lng,
    hi$oc_southwest_lat,
    hi$oc_northeast_lng,
    hi$oc_northeast_lat
  )

oc_forward_df(placename = "Airport", bounds = hi_bbox, limit = 10)
#> # A tibble: 10 x 4
#>    placename oc_lat oc_lng oc_formatted                                                                                                    
#>    <chr>      <dbl>  <dbl> <chr>                                                                                                           
#>  1 Airport     21.3  -158. Daniel K. Inouye International Airport, Aolele Street, Honolulu, HI 96820, United States of America             
#>  2 Airport     19.7  -156. Ellison Onizuka Kona International Airport at Keahole, Queen Kaahumanu Highway, Keahole, HI, United States of A~
#>  3 Airport     20.9  -156. Kahului Airport, Airport Access Road, Kahului, HI 96732-3509, United States of America                          
#>  4 Airport     19.7  -155. Hilo International Airport, Banyan Drive, Hilo, HI 96720, United States of America                              
#>  5 Airport     21.4  -158. Kaneohe Bay Marine Corp Air Station/Mokapu Point Airport, 6th Street, Honolulu County, HI 96863, United States ~
#>  6 Airport     19.6  -156. Old Kona Airport State Recreation Area, Laniakea, Kailua-Kona, Hawaii, United States of America                 
#>  7 Airport     21.2  -157. Molokai Airport, Mauna Loa Highway, Maui County, HI 96729, United States of America                             
#>  8 Airport     20.8  -157. Lanai Airport, Lanai Airport Road, Maui County, HI 96763, United States of America                              
#>  9 Airport     21.3  -158. Kalaeloa Airport, Midway Street, Kapolei, HI 96862, United States of America                                    
#> 10 Airport     21.0  -157. Kapalua-West Maui Airport, Honoapiilani Highway, Lahaina, HI 96761, United States of America

Note that such a query will only give you airports with “Airport” in their name or address, but not necessarily airfields, airstrips, etc. If you are more interested in these kind of features, you might want to take a look at the {osmdata} package.

proximity

The proximity parameter provides OpenCage with a hint to bias results in favour of those closer to the specified location. It is just one of many factors used for ranking results, however, and (some) results may be far away from the location or point passed to the proximity parameter. A point is a named numeric vector of a latitude and longitude coordinate pair in decimal format. The proximity parameter can most easily be specified with the oc_points() helper. For example, proximity = oc_point(38.0, -84.5), if you happen to already know the coordinates. If not, you can also look them up with {opencage}, of course:

lx <- oc_forward_df("Lexington, Kentucky")

lx_point <- oc_points(lx$oc_lat, lx$oc_lng)

oc_forward_df(placename = "Paris", proximity = lx_point, limit = 5)
#> # A tibble: 4 x 4
#>   placename oc_lat oc_lng oc_formatted                             
#>   <chr>      <dbl>  <dbl> <chr>                                    
#> 1 Paris       38.2 -84.3  Paris, Kentucky, United States of America
#> 2 Paris       48.9   2.35 Paris, France                            
#> 3 Paris       39.6 -87.7  Paris, IL 61944, United States of America
#> 4 Paris       38.8 -85.6  Paris, IN 47230, United States of America

Note that the French capital is listed before other places in the US, which are closer to the point provided. This illustrates how proximity is only one of many factors influencing the ranking of results.

Confidence

min_confidence — an integer value between 0 and 10 — indicates the precision of the returned result as defined by its geographical extent, i.e. by the extent of the result’s bounding box. When you specify min_confidence, only results with at least the requested confidence will be returned. Thus, in the following example, the French capital is too large to be returned.

oc_forward_df(placename = "Paris", min_confidence = 7, limit = 5)
#> # A tibble: 1 x 4
#>   placename oc_lat oc_lng oc_formatted                             
#>   <chr>      <dbl>  <dbl> <chr>                                    
#> 1 Paris       38.2  -84.3 Paris, Kentucky, United States of America

Note that confidence is not used for the ranking of results. It does not tell you which result is more “correct” or “relevant”, nor what type of thing the result is, but rather how small a result is, geographically speaking. See the API documentation for details.

Retrieve more information from the API

Besides parameters to target your search better, OpenCage offers parameters to receive more or specific types of information from the API.

language

If you would like to get your results in a specific language, you can pass an IETF BCP 47 language tag, such as “tr” for Turkish or “pt-BR” for Brazilian Portuguese, to the language parameter. OpenCage will attempt to return results in that language.

oc_forward_df(placename = "Munich", language = "tr")
#> # A tibble: 1 x 4
#>   placename oc_lat oc_lng oc_formatted           
#>   <chr>      <dbl>  <dbl> <chr>                  
#> 1 Munich      48.1   11.6 Münih, Bavyera, Almanya

Alternatively, you can specify the “native” tag, in which case OpenCage will attempt to return the response in the “official” language(s) of the location. Keep in mind, however, that some countries have more than one official language or that the official language may not be the one actually used day-to-day.

oc_forward_df(placename = "Munich", language = "native")
#> # A tibble: 1 x 4
#>   placename oc_lat oc_lng oc_formatted                
#>   <chr>      <dbl>  <dbl> <chr>                       
#> 1 Munich      48.1   11.6 München, Bayern, Deutschland

If the language parameter is set to NULL (which is the default), the tag is not recognized, or OpenCage does not have a record in that language, the results will be returned in English.

oc_forward_df(placename = "München")
#> # A tibble: 1 x 4
#>   placename oc_lat oc_lng oc_formatted            
#>   <chr>      <dbl>  <dbl> <chr>                   
#> 1 München     48.1   11.6 Munich, Bavaria, Germany

To find the correct language tag for your desired language, you can search for the language on the BCP47 language subtag lookup for example. Note however, that there are some language tags in use on OpenStreetMap, one of OpenCage’s main sources, that do not conform with the IETF BCP 47 standard. For example, OSM uses zh_pinyin instead of zh-Latn-pinyin for Hanyu Pinyin. It might, therefore, be helpful to consult the details page of the target country on openstreetmap.org to see which language tags are actually used. In any case, neither the OpenCage API nor the functions in this package will validate the language tags you provide.

For further details, see OpenCage’s API documentation.

Annotations

OpenCage supplies additional information about the result location in what it calls annotations. Annotations include, among a variety of other types of information, country information, time of sunset and sunrise, UN M49 codes or the location in different geocoding formats, like Maidenhead, Mercator projection (EPSG:3857), geohash or what3words. Some annotations, like the Irish Transverse Mercator (ITM, EPSG:2157) or the Federal Information Processing Standards (FIPS) code will only be shown when appropriate.

Whether the annotations are shown, is controlled by the no_annotations argument. It is TRUE by default, which means that the output will not contain annotations. (Yes, inverted argument names are confusing, but we just follow OpenCage’s lead here.) When you set no_annotations to FALSE, all columns are returned (i.e. output is implicitly set to "all"). This leads to a results with a lot of columns.

oc_forward_df("Dublin", no_annotations = FALSE)
#> # A tibble: 1 x 71
#>   placename oc_lat oc_lng oc_confidence oc_formatted oc_mgrs oc_maidenhead oc_callingcode oc_flag oc_geohash oc_qibla oc_wikidata
#>   <chr>      <dbl>  <dbl>         <int> <chr>        <chr>   <chr>                  <int> <chr>   <chr>         <dbl> <chr>      
#> 1 Dublin      53.3  -6.26             5 Dublin, Ire~ 29UPV8~ IO63ui83sw               353 "\U000~ gc7x9812v~     114. Q1761      
#> # ... with 59 more variables: oc_dms_lat <chr>, oc_dms_lng <chr>, oc_itm_easting <chr>, oc_itm_northing <chr>, oc_mercator_x <dbl>,
#> #   oc_mercator_y <dbl>, oc_osm_edit_url <chr>, oc_osm_note_url <chr>, oc_osm_url <chr>, oc_un_m49_statistical_groupings <list>,
#> #   oc_un_m49_regions_europe <chr>, oc_un_m49_regions_ie <chr>, oc_un_m49_regions_northern_europe <chr>, oc_un_m49_regions_world <chr>,
#> #   oc_currency_alternate_symbols <list>, oc_currency_decimal_mark <chr>, oc_currency_html_entity <chr>, oc_currency_iso_code <chr>,
#> #   oc_currency_iso_numeric <chr>, oc_currency_name <chr>, oc_currency_smallest_denomination <int>, oc_currency_subunit <chr>,
#> #   oc_currency_subunit_to_unit <int>, oc_currency_symbol <chr>, oc_currency_symbol_first <int>, oc_currency_thousands_separator <chr>,
#> #   oc_roadinfo_drive_on <chr>, oc_roadinfo_speed_in <chr>, oc_sun_rise_apparent <int>, oc_sun_rise_astronomical <int>,
#> #   oc_sun_rise_civil <int>, oc_sun_rise_nautical <int>, oc_sun_set_apparent <int>, oc_sun_set_astronomical <int>, oc_sun_set_civil <int>,
#> #   oc_sun_set_nautical <int>, oc_timezone_name <chr>, oc_timezone_now_in_dst <int>, oc_timezone_offset_sec <int>,
#> #   oc_timezone_offset_string <chr>, oc_timezone_short_name <chr>, oc_what3words_words <chr>, oc_northeast_lat <dbl>,
#> #   oc_northeast_lng <dbl>, oc_southwest_lat <dbl>, oc_southwest_lng <dbl>, oc_iso_3166_1_alpha_2 <chr>, oc_iso_3166_1_alpha_3 <chr>,
#> #   oc_category <chr>, oc_type <chr>, oc_city <chr>, oc_continent <chr>, oc_country <chr>, oc_country_code <chr>, oc_county <chr>,
#> #   oc_county_code <chr>, oc_political_union <chr>, oc_state <chr>, oc_state_code <chr>

roadinfo

roadinfo indicates whether the geocoder should attempt to match the nearest road (rather than an address) and provide additional road and driving information. It is FALSE by default, which means OpenCage will not attempt to match the nearest road. Some road and driving information is nevertheless provided as part of the annotations (see above), even when roadinfo is set to FALSE.

oc_forward_df(placename = c("Europa Advance Rd", "Bovoni Rd"), roadinfo = TRUE)
#> # A tibble: 2 x 29
#>   placename oc_lat oc_lng oc_confidence oc_formatted oc_roadinfo_dri~ oc_roadinfo_one~ oc_roadinfo_road oc_roadinfo_roa~ oc_roadinfo_spe~
#>   <chr>      <dbl>  <dbl>         <int> <chr>        <chr>            <chr>            <chr>            <chr>            <chr>           
#> 1 Europa A~   36.1  -5.34             9 Europa Adva~ right            yes              Europa Advance ~ secondary        km/h            
#> 2 Bovoni Rd   18.3 -64.9              9 Bovoni Hill~ left             <NA>             <NA>             <NA>             mph             
#> # ... with 19 more variables: oc_roadinfo_surface <chr>, oc_northeast_lat <dbl>, oc_northeast_lng <dbl>, oc_southwest_lat <dbl>,
#> #   oc_southwest_lng <dbl>, oc_iso_3166_1_alpha_2 <chr>, oc_iso_3166_1_alpha_3 <chr>, oc_category <chr>, oc_type <chr>,
#> #   oc_continent <chr>, oc_country <chr>, oc_country_code <chr>, oc_postcode <chr>, oc_road <chr>, oc_road_type <chr>, oc_state <chr>,
#> #   oc_county <chr>, oc_peak <chr>, oc_state_code <chr>

A blog post provides more details.

Abbreviated addresses

The geocoding functions also have an abbr parameter, which is FALSE by default. When it is TRUE, the addresses in the formatted field of the results are abbreviated (e.g. “Main St.” instead of “Main Street”).

oc_forward_df("Wall Street")
#> # A tibble: 1 x 4
#>   placename   oc_lat oc_lng oc_formatted                                             
#>   <chr>        <dbl>  <dbl> <chr>                                                    
#> 1 Wall Street   40.7  -74.0 Wall Street, New York, NY 10005, United States of America
oc_forward_df("Wall Street", abbrv = TRUE)
#> # A tibble: 1 x 4
#>   placename   oc_lat oc_lng oc_formatted                    
#>   <chr>        <dbl>  <dbl> <chr>                           
#> 1 Wall Street   40.7  -74.0 Wall St, New York, NY 10005, USA

See this blog post for more information.

Vectorised arguments

All of the function arguments mentioned above are vectorised, so you can send queries like this:

oc_forward_df(
  placename = c("New York", "Rio", "Tokyo"),
  language = c("es", "de", "fr")
)
#> # A tibble: 3 x 4
#>   placename oc_lat oc_lng oc_formatted                                                     
#>   <chr>      <dbl>  <dbl> <chr>                                                            
#> 1 New York    40.7  -74.0 Nueva York, Estados Unidos de América                            
#> 2 Rio        -22.9  -43.2 Rio de Janeiro, Região Metropolitana do Rio de Janeiro, Brasilien
#> 3 Tokyo       35.7  140.  Tokyo, Japon

Or geocode place names with country codes in a data frame:

for_df <-
  data.frame(
    location = c("Golden Gate Bridge", "Buckingham Palace", "Eiffel Tower"),
    ccode = c("at", "cg", "be")
  )

oc_forward_df(for_df, placename = location, countrycode = ccode)
#> # A tibble: 3 x 5
#>   location           ccode oc_lat oc_lng oc_formatted                                                                              
#>   <chr>              <chr>  <dbl>  <dbl> <chr>                                                                                     
#> 1 Golden Gate Bridge at     48.0   15.6  Karer, Golden Gate Bridge, 3180 Gemeinde Lilienfeld, Austria                              
#> 2 Buckingham Palace  cg     -4.80  11.8  Buckingham Palace, Boulevard du Général Charles de Gaulle, Pointe-Noire, Congo-Brazzaville
#> 3 Eiffel Tower       be     50.9    4.34 Eiffel Tower, Avenue de Bouchout - Boechoutlaan, 1020 City of Brussels, Belgium

This also works with oc_reverse_df(), of course.

rev_df <-
  data.frame(
    lat = c(51.952659, 41.401372),
    lon = c(7.632473, 2.128685)
  )

oc_reverse_df(rev_df, lat, lon, language = "native")
#> # A tibble: 2 x 3
#>     lat   lon oc_formatted                                        
#>   <dbl> <dbl> <chr>                                               
#> 1  52.0  7.63 Friedrich-Ebert-Straße 7, 48153 Münster, Deutschland
#> 2  41.4  2.13 Carrer de Calatrava, 68, 08017 Barcelona, España

Further information

For further information about the output and query parameters, see the OpenCage API docs and the OpenCage FAQ. When building queries, OpenCage’s best practices can be very useful, as well as their guide to geocoding accuracy.