The goal of this vignette is to illustrate how event data can be preprocessed in R to create an eventlog object. Two different approaches are discussed: importing an event log from an XES file, and importing an event log in csv format.
A very easy way to create event logs in R is to import an event log stored in XES format. As an example, we take the event log of municipality 1 of the BPI Challenge 2015, which can be found at the Process Mining Data Repository. In order to follow this vignette, store the data somewhere on your local PC.
Once you have the data on your PC, you can pass its location to the eventlog_from_xes function. Alternatively, calling this function without any arguments, as is done below, will open a dialog box, allowing us to navigate to the event log.
data <- eventlog_from_xes()
data
## Event log consisting of:
## 52217 events
## 1099 traces
## 1199 cases
## 398 activities
## 52217 activity instances
##
## Source: local data frame [52,217 x 15]
##
## case_concept.name event_question event_dateFinished
## (chr) (chr) (chr)
## 1 10009138 EMPTY 2014-04-14 00:00:00
## 2 10009138 False 2014-04-14 00:00:00
## 3 10009138 EMPTY 2014-04-14 00:00:00
## 4 10009138 True 2014-04-14 00:00:00
## 5 10009138 EMPTY 2014-04-14 00:00:00
## 6 10009138 EMPTY 2014-04-14 00:00:00
## 7 10009138 EMPTY 2014-04-14 00:00:00
## 8 10009138 False 2014-04-14 00:00:00
## 9 10009138 False 2014-04-14 00:00:00
## 10 10009138 EMPTY 2014-04-14 00:00:00
## .. ... ... ...
## Variables not shown: event_dueDate (chr), event_action_code (chr),
## event_activityNameEN (chr), event_planned (chr), event_time.timestamp
## (chr), event_monitoringResource (chr), event_org.resource (chr),
## event_activityNameNL (chr), event_concept.name (chr),
## event_lifecycle.transition (chr), event_dateStop (chr),
## activity_instance (dbl)
Printing the event log, stored in the object data, immediately shows that the object is of the class eventlog. The eventlog_from_xes function also adds an activity instance identifier and initializes the event log classifiers, both of which are discussed below.
In this example, all events refer to the same lifecycle transition, i.e. complete.
table(data$event_lifecycle.transition)
##
## complete
## 52217
As a result, each single event corresponds to a separate activity instance. Thus, there are as many activity instances as there are events.
n_events(data)
## [1] 52217
n_activity_instances(data)
## [1] 52217
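To see why the two counts coincide, consider a toy log in base R. The data frames and column names below are made up for illustration and are not part of edeaR. When every event carries the same lifecycle transition, each activity instance contains exactly one event; once both start and complete events are recorded, one instance spans two events.

```r
# Hypothetical toy log: three activity instances, each with only a "complete" event
single <- data.frame(
  activity_instance = c(1, 2, 3),
  lifecycle = c("complete", "complete", "complete")
)
nrow(single) == length(unique(single$activity_instance))  # TRUE

# Same instances, now with both a "start" and a "complete" event each
double <- data.frame(
  activity_instance = rep(c(1, 2, 3), times = 2),
  lifecycle = rep(c("start", "complete"), each = 3)
)
nrow(double)                              # number of events
length(unique(double$activity_instance))  # number of activity instances
```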
The event log classifiers are initialized as follows:
case_id(data)
activity_id(data)
activity_instance_id(data)
lifecycle_id(data)
timestamp(data)
## [1] "case_concept.name"
## [1] "event_concept.name"
## [1] "activity_instance"
## [1] "event_lifecycle.transition"
## [1] "event_time.timestamp"
The only preprocessing step that still needs to be done is converting the timestamps to objects of the POSIXct class. This can be done with the lubridate package, after inspecting the format the timestamps are in.
library(lubridate)
data[1:4,timestamp(data)]
## Source: local data frame [4 x 1]
##
## event_time.timestamp
## (chr)
## 1 2014-04-11T00:00:00+02:00
## 2 2014-04-14T00:00:00+02:00
## 3 2014-04-14T00:00:00+02:00
## 4 2014-04-14T00:00:00+02:00
data$event_time.timestamp <- ymd_hms(data$event_time.timestamp)
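A detail worth checking after this conversion is the time zone. The sketch below parses one of the timestamp strings shown above. It assumes that ymd_hms reads the +02:00 offset and returns the instant in UTC, which is its default behaviour; the use of Europe/Amsterdam as the display time zone is an assumption made here for illustration only.

```r
library(lubridate)

# One raw timestamp string, copied from the output above
raw <- "2014-04-11T00:00:00+02:00"

# ymd_hms reads the ISO 8601 offset; the result is a POSIXct stored in UTC
parsed <- ymd_hms(raw)
class(parsed)
format(parsed, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Same instant, displayed in an assumed local time zone
local_time <- with_tz(parsed, tzone = "Europe/Amsterdam")
format(local_time, "%Y-%m-%d %H:%M:%S")
```

Because midnight local time falls on the previous calendar day in UTC, analyses that group events per day may want to convert the timestamps back to the local time zone first.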
Note that case attributes can be extracted from an XES file using the function case_attributes_from_xes.
Alternatively, the event log might be stored in a csv file. More information on importing csv files can be found in ?read.csv or in the documentation of the readr package. An example of an event log imported from a csv file has been included under the name csv_example.
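For a csv file on disk, the import step might look like the following sketch. The file is written to a temporary location first so that the example is self-contained; its two rows mirror the layout of csv_example, and the object names path and imported are hypothetical.

```r
# Write a tiny csv file to a temporary location so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c(
  "CASE,ACTIVITY,COMPLETE,START",
  "CA1,A,2015-01-03 01:23:45,2015-01-01 01:23:45",
  "CA1,B,2015-01-04 01:23:45,2015-01-03 01:23:45"
), path)

# Keep text columns as character so the timestamps can be parsed later
imported <- read.csv(path, stringsAsFactors = FALSE)
str(imported)
```

The remainder of this section uses the included csv_example data instead.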
data("csv_example", package = "edeaR")
head(csv_example)
## CASE ACTIVITY COMPLETE START
## 1 CA1 A 2015-01-03 01:23:45 2015-01-01 01:23:45
## 2 CA1 B 2015-01-04 01:23:45 2015-01-03 01:23:45
## 3 CA1 C 2015-01-07 01:23:45 2015-01-05 01:23:45
## 4 CA1 D 2015-01-07 01:23:45 2015-01-06 01:23:45
## 5 CA1 E 2015-01-09 01:23:45 2015-01-07 01:23:45
## 6 CA10 B 2015-01-12 01:23:45 2015-01-11 01:23:45
In this example, it can be seen that each row is in fact an activity instance, bearing multiple timestamps, i.e. both a complete and a start timestamp. The following steps are required in order to convert this data.frame to an event log.
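1. Add an activity instance identifier.
2. Gather the start and complete timestamps into a single column.
3. Relabel the lifecycle values.
4. Convert the timestamps to POSIXct objects.
5. Create the eventlog object.
Since each row is an activity instance, the row number can serve as the activity instance identifier.
csv_example$ACTIVITY_INSTANCE <- 1:nrow(csv_example)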
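Next, the START and COMPLETE columns are gathered into a single timestamp column, with a second column recording the lifecycle transition. This can easily be done using the tidyr package. See ?gather for more information.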
library(tidyr)
csv_example <- gather(csv_example, LIFECYCLE, TIMESTAMP, -CASE, -ACTIVITY, -ACTIVITY_INSTANCE)
head(csv_example)
## CASE ACTIVITY ACTIVITY_INSTANCE LIFECYCLE TIMESTAMP
## 1 CA1 A 1 START 2015-01-01 01:23:45
## 2 CA1 B 2 START 2015-01-03 01:23:45
## 3 CA1 C 3 START 2015-01-05 01:23:45
## 4 CA1 D 4 START 2015-01-06 01:23:45
## 5 CA1 E 5 START 2015-01-07 01:23:45
## 6 CA365 A 6 START 2015-01-01 01:23:45
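In recent tidyr releases gather is superseded by pivot_longer. A minimal sketch of the same reshaping on a made-up two-row data frame could look as follows; the object toy is hypothetical and only mirrors the column names of csv_example. Note that pivot_longer keeps the rows of each activity instance together, so the row order differs from the gather output above.

```r
library(tidyr)

# Made-up miniature of csv_example, including an activity instance identifier
toy <- data.frame(
  CASE = c("CA1", "CA1"),
  ACTIVITY = c("A", "B"),
  COMPLETE = c("2015-01-03 01:23:45", "2015-01-04 01:23:45"),
  START = c("2015-01-01 01:23:45", "2015-01-03 01:23:45"),
  ACTIVITY_INSTANCE = 1:2,
  stringsAsFactors = FALSE
)

# Stack START and COMPLETE into one TIMESTAMP column, labelled by LIFECYCLE
long <- pivot_longer(
  toy,
  cols = c(START, COMPLETE),
  names_to = "LIFECYCLE",
  values_to = "TIMESTAMP"
)
long
```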
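Next, the lifecycle values are relabeled as start and complete. By converting this column into a factor, its levels can easily be relabeled. The levels are listed explicitly so that each label is matched to the intended value.
csv_example$LIFECYCLE <- factor(csv_example$LIFECYCLE, levels = c("START", "COMPLETE"), labels = c("start", "complete"))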
head(csv_example)
## CASE ACTIVITY ACTIVITY_INSTANCE LIFECYCLE TIMESTAMP
## 1 CA1 A 1 start 2015-01-01 01:23:45
## 2 CA1 B 2 start 2015-01-03 01:23:45
## 3 CA1 C 3 start 2015-01-05 01:23:45
## 4 CA1 D 4 start 2015-01-06 01:23:45
## 5 CA1 E 5 start 2015-01-07 01:23:45
## 6 CA365 A 6 start 2015-01-01 01:23:45
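A point to keep in mind when relabeling through factor: when levels is not given, R sorts the unique values, so the labels are matched against COMPLETE first and START second. The sketch below, on a made-up character vector, shows how the labels can end up swapped and how naming the levels explicitly avoids this.

```r
# Made-up lifecycle column with character values, as produced by the reshaping step
lifecycle <- c("START", "START", "COMPLETE", "COMPLETE")

# Default levels are sorted: "COMPLETE", "START"; labels are matched in that order
swapped <- factor(lifecycle, labels = c("start", "complete"))
as.character(swapped)  # "complete" "complete" "start" "start"

# Explicit levels pin each label to the intended value
safe <- factor(lifecycle,
               levels = c("START", "COMPLETE"),
               labels = c("start", "complete"))
as.character(safe)     # "start" "start" "complete" "complete"
```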
The timestamps are then converted to POSIXct objects using lubridate, as before.
csv_example$TIMESTAMP <- ymd_hms(csv_example$TIMESTAMP)
Finally, the eventlog object is created by specifying which column acts as which classifier.
log <- eventlog(eventlog = csv_example,
                case_id = "CASE",
                activity_id = "ACTIVITY",
                activity_instance_id = "ACTIVITY_INSTANCE",
                lifecycle_id = "LIFECYCLE",
                timestamp = "TIMESTAMP")
log
## Event log consisting of:
## 12766 events
## 12 traces
## 1000 cases
## 6 activities
## 6383 activity instances
##
## Source: local data frame [12,766 x 5]
##
## CASE ACTIVITY ACTIVITY_INSTANCE LIFECYCLE TIMESTAMP
## (fctr) (fctr) (int) (fctr) (time)
## 1 CA1 A 1 start 2015-01-01 01:23:45
## 2 CA1 B 2 start 2015-01-03 01:23:45
## 3 CA1 C 3 start 2015-01-05 01:23:45
## 4 CA1 D 4 start 2015-01-06 01:23:45
## 5 CA1 E 5 start 2015-01-07 01:23:45
## 6 CA365 A 6 start 2015-01-01 01:23:45
## 7 CA365 B 7 start 2015-01-03 01:23:45
## 8 CA365 C 8 start 2015-01-05 01:23:45
## 9 CA365 D 9 start 2015-01-06 01:23:45
## 10 CA365 E 10 start 2015-01-07 01:23:45
## .. ... ... ... ... ...