---
title: "Trip updates"
author: "Matthew Palm"
date: "2026-05-18"
output: html_vignette
vignette: >
  %\VignetteIndexEntry{Trip updates}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette shows how to read GTFS-realtime trip updates into R using the {gtfsrealtime} package. Trip updates describe the real-time progress of a vehicle along a scheduled trip, including predicted arrival and departure times, delays, skipped stops, canceled trips, and other real-time information. The {gtfsrealtime} package reads the nested GTFS-realtime trip update format and flattens it into a data frame that is easier to inspect and analyze in R.

## Load libraries

First, we load {gtfsrealtime} to read GTFS-realtime files and {dplyr} to inspect and summarize the resulting data frame.

```{r setup, message=FALSE}
library(gtfsrealtime)
library(dplyr)
```

## Load a GTFS-realtime trip updates feed

This example uses a New York City trip updates feed included with {gtfsrealtime}. The file is compressed with bzip2 to save space. {gtfsrealtime} automatically detects and reads uncompressed files as well as files compressed with zip, gzip, or bzip2. A zip file can contain multiple GTFS-realtime files, in which case {gtfsrealtime} reads all of them. The `file_index` field identifies which file each update came from.

GTFS-realtime time values are stored as Unix timestamps, which are interpreted relative to UTC. To convert them to local time, we pass a local time zone as the second argument. Time zones are specified in the standardized TZ database format, generally `Continent/City`. If you do not want to convert times, specify the time zone `Etc/UTC`.

```{r read-updates, warning=FALSE}
updates <- read_gtfsrt_trip_updates(
  system.file("nyc-trip-updates.pb.bz2", package = "gtfsrealtime"),
  "America/New_York"
)
```
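
To make the time zone conversion described above concrete, the following base R sketch converts a single Unix timestamp to UTC and to New York time. The timestamp is an arbitrary value chosen for illustration, not one taken from the feed, and this is not necessarily how {gtfsrealtime} performs the conversion internally.

```{r tz-sketch}
# Arbitrary Unix timestamp for illustration (seconds since 1970-01-01 00:00:00 UTC).
ts <- 1700000000

# The same instant, displayed in two different time zones.
utc_time <- as.POSIXct(ts, origin = "1970-01-01", tz = "Etc/UTC")
local_time <- as.POSIXct(ts, origin = "1970-01-01", tz = "America/New_York")

format(utc_time, "%Y-%m-%d %H:%M:%S %Z")
format(local_time, "%Y-%m-%d %H:%M:%S %Z")

# Both objects store the same underlying number of seconds;
# only the displayed clock time differs.
as.numeric(utc_time) == as.numeric(local_time)
```

The time zone only changes how the instant is printed, which is why specifying `Etc/UTC` leaves the times effectively unconverted.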

When reading this example feed, {gtfsrealtime} warns that some GTFS-realtime entity IDs are duplicated. In these cases, the package appends suffixes such as `_duplicated_1` so that every entity has a unique `id`. The feed triggers many such warnings, so they are suppressed here to keep the vignette readable. The first two are:

```
1: ! ID UP_A6-Weekday-SDon-094800_B6_243 is duplicated. Replacing with UP_A6-Weekday-SDon-094800_B6_243_duplicated_1 . This may cause joins between different
  GTFS-realtime files (even within a ZIP archive) to be incorrect.
2: ! ID UP_A6-Weekday-SDon-094800_B6_243 is duplicated. Replacing with UP_A6-Weekday-SDon-094800_B6_243_duplicated_2 . This may cause joins between different
  GTFS-realtime files (even within a ZIP archive) to be incorrect.
```

These warnings are useful in practice: once an ID has been renamed, it may no longer match the ID of the same entity in another GTFS-realtime file, including other files within the same ZIP archive, so workflows that join records on `id` across files can produce incorrect results.
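
The suffixing scheme can be sketched in base R on a toy vector of IDs. This is only an illustration of the idea, not the package's implementation. The IDs are hypothetical, and the sketch assumes (based on the warning text above) that the first occurrence keeps its original ID and later occurrences are numbered from 1.

```{r dedupe-sketch}
# Hypothetical entity IDs; "trip_a" appears three times.
ids <- c("trip_a", "trip_b", "trip_a", "trip_a", "trip_c")

# Running count of how many times each ID has been seen so far.
occurrence <- ave(seq_along(ids), ids, FUN = seq_along)

# Assumption: keep the first occurrence as-is and suffix the rest.
unique_ids <- ifelse(
  occurrence == 1,
  ids,
  paste0(ids, "_duplicated_", occurrence - 1)
)
unique_ids

# Renamed IDs can be found afterwards by their suffix.
grepl("_duplicated_[0-9]+$", unique_ids)
```

A similar `grepl()` check on the `id` column of a real feed is one way to find the renamed entities before joining on `id`.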

## Explore trip updates

GTFS-realtime trip updates are [hierarchical](https://gtfs.org/documentation/realtime/reference/#message-tripupdate); one trip update can contain information about the trip as a whole as well as updates for multiple stops along that trip. `read_gtfsrt_trip_updates()` flattens that structure into a data frame. As a result, the same `trip_id` may appear in multiple rows when the feed contains stop-level updates for multiple stops.

```{r glimpse-updates}
glimpse(updates)
```
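
To illustrate what flattening a hierarchical trip update means, the sketch below converts a small nested list into a data frame with one row per stop-level update. The nested structure, the `flatten_trip()` helper, and all values are hypothetical. The real feed is a protocol buffer message, and {gtfsrealtime} may flatten it quite differently.

```{r flatten-sketch}
# Hypothetical nested feed: each trip has trip-level fields and
# zero or more stop-level updates.
nested <- list(
  list(
    trip_id = "trip_1", route_id = "A",
    stops = list(
      list(stop_id = "S1", stop_sequence = 1L),
      list(stop_id = "S2", stop_sequence = 2L)
    )
  ),
  list(trip_id = "trip_2", route_id = "B", stops = list())
)

# Hypothetical helper: one row per stop update, repeating trip-level fields.
flatten_trip <- function(trip) {
  if (length(trip$stops) == 0) {
    # No stop-level updates: keep a single row with NA stop fields.
    return(data.frame(
      trip_id = trip$trip_id, route_id = trip$route_id,
      stop_id = NA_character_, stop_sequence = NA_integer_
    ))
  }
  data.frame(
    trip_id = trip$trip_id,
    route_id = trip$route_id,
    stop_id = vapply(trip$stops, function(s) s$stop_id, character(1)),
    stop_sequence = vapply(trip$stops, function(s) s$stop_sequence, integer(1))
  )
}

flat <- do.call(rbind, lapply(nested, flatten_trip))
flat
```

In the flattened result, `trip_1` occupies two rows because it has two stop-level updates, which mirrors why a `trip_id` can repeat in `updates`.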

## Inspect one trip across its stops

Because a single trip can include predictions for multiple stops, it is useful to inspect all rows associated with one `trip_id`. In the example below, we select the first trip in the feed and display the route, stop sequence, stop ID, and predicted arrival and departure times and delays for each stop. If a trip update has no stop time updates, it appears as a single row with all the `stop_*` fields set to `NA`. All columns are described in the documentation for `read_gtfsrt_trip_updates()`.

```{r inspect-updates}
updates |>
  filter(trip_id == first(trip_id)) |>
  select(
    trip_id,
    route_id,
    stop_id,
    stop_sequence,
    arrival_time,
    departure_time,
    arrival_delay,
    departure_delay
  )
```
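
The same columns can also be summarized per trip. The sketch below uses a small toy data frame with hypothetical values rather than the real feed, so that the expected structure is easy to see. It assumes that delays are expressed in seconds, as in the GTFS-realtime specification, and that a trip without stop time updates has `NA` in its stop-level columns.

```{r trip-summary-sketch}
library(dplyr)

# Toy data in the flattened layout; all values are hypothetical.
toy_updates <- data.frame(
  trip_id = c("trip_1", "trip_1", "trip_1", "trip_2"),
  stop_id = c("S1", "S2", "S3", NA),
  stop_sequence = c(1L, 2L, 3L, NA),
  arrival_delay = c(60, 120, 90, NA)
)

trip_summary <- toy_updates |>
  group_by(trip_id) |>
  summarize(
    # Rows with a non-missing stop_id are stop-level updates.
    n_stop_updates = sum(!is.na(stop_id)),
    # Return NA rather than NaN when a trip has no delay values at all.
    mean_arrival_delay = if (all(is.na(arrival_delay))) {
      NA_real_
    } else {
      mean(arrival_delay, na.rm = TRUE)
    }
  )
trip_summary
```

Replacing `toy_updates` with `updates` should give the equivalent summary for the example feed. Feeds are not required to report delays, so `arrival_delay` may be `NA` even for rows that have a predicted arrival time.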

