This function visualizes FFTs. The best way to use the function is to create an fft object with fft(), then apply plot() to the object.
Let’s start with an example: we’ll create an fft object called heart.fft from the heartdisease dataset:
set.seed(100)  # For reproducibility of the random training / testing split

heart.fft <- fft(
  train.cue.df = heartdisease[, names(heartdisease) != "diagnosis"],
  train.criterion.v = heartdisease$diagnosis,
  train.p = .5,
  max.levels = 4
)
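Before plotting, you can peek at what the object contains. This is just generic base R introspection (utils::str() works on any object), not a documented feature of the fft object itself:

# Show the top-level components of the fft object
str(heart.fft, max.level = 1)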
Once you’ve created an fft object using fft(), you can visualize the tree (and ROC curves) using plot(). There are two main arguments:
which.tree: Which tree do you want to plot? You can specify an integer, such as which.tree = 2, which will plot tree #2 in the fft object, or which.tree = "best.train", which will use the best training tree.
which.data: Which data do you want to apply the tree to? Currently, you can either use the training dataset with which.data = "train" or the test dataset with which.data = "test" (a short example combining both arguments follows).
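For example, here is a minimal sketch (using the heart.fft object created above) that plots tree #2 as applied to the training data; it assumes the labeling arguments used later are optional:

plot(heart.fft,
     which.tree = 2,       # plot tree #2 in the fft object
     which.data = "train"  # apply it to the training data
)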
Let’s plot the best training tree for the heartdisease data when applied to the test dataset:
plot(heart.fft,
     which.tree = "best.train",
     which.data = "test",
     description = "Heart Disease",
     decision.names = c("Healthy", "Disease")
)
Here’s how to interpret this tree: if thal is greater than 3, classify the case as signal (“+ disease”). For this dataset, 72 cases (18 true noise + 54 true signal) were classified as signal at this level, while the remaining 80 cases (152 - 72) moved on to the next level. Next, if cp is less than 4, classify as noise (“- disease”). Of the remaining 80 cases, 50 (45 true noise and 5 true signal) met this criterion and were classified as noise. This left 30 cases, which were classified at the final level.
Cumulative classification statistics are shown in the bottom panel of the plot.
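To make the arithmetic above concrete, here is a small sketch of the case counts and of the standard signal detection statistics that such a panel reports. The counts come directly from the walkthrough above; the hit rate (HR) and false alarm rate (FAR) formulas are the standard ones, and the helper function is just for illustration, not part of the package:

# Case counts from the tree walkthrough above
n.total <- 152
level1.signal <- 18 + 54                 # 72 cases exit as signal at level 1 (18 false alarms, 54 hits)
remaining <- n.total - level1.signal     # 80 cases move on to level 2
level2.noise <- 45 + 5                   # 50 cases exit as noise at level 2 (45 correct rejections, 5 misses)
final.level <- remaining - level2.noise  # 30 cases are classified at the final level

# Standard signal detection statistics from a completed confusion table
# (hi = hits, fa = false alarms, mi = misses, cr = correct rejections)
class.stats <- function(hi, fa, mi, cr) {
  c(HR = hi / (hi + mi),    # hit rate: proportion of true signal classified as signal
    FAR = fa / (fa + cr))   # false alarm rate: proportion of true noise classified as signal
}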
Now let’s compare this result to tree #1. This is a very conservative tree, with noise exits at all but the final level. It should have a very low false alarm rate but (unfortunately) also a very low hit rate.
plot(heart.fft,
     which.tree = 1,
     which.data = "test",
     description = "Heart Disease",
     decision.names = c("Healthy", "Disease")
)
Indeed, this tree is very conservative: only 23 cases were classified as signals (“+ disease”) at the very end of the tree.
You can also plot ROC curves, showing the cumulative HR and FAR of each of the trees, by specifying roc = T. Because all trees (and both the training and test datasets) are plotted, no additional arguments are necessary.
Here are the training and test ROC curves for the heart.fft object. The best training tree is marked with large filled symbols: the circle is for the training data, the triangle for the test data.
plot(heart.fft, roc = T)
Here, the triangle (FAR = 24%, HR = 88%) corresponds to our individual tree plot above (i.e., the best training tree applied to the test data). The point towards the bottom left of the plot (FAR = 0%, HR = 32%) corresponds to tree #1 (the very conservative tree).
You can also include ROC curves for Logistic Regression (LR) and CART by including lr = T and/or cart = T:
plot(heart.fft,
     roc = T,
     lr = T,
     cart = T
)
Here, we can see that in fitting (circles and dashed lines), the FFTs did worse than LR and CART. However, in prediction (triangles and solid lines), the trees outperformed both LR and CART.
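The plot computes these competitor models internally. To get a rough sense of what such baselines involve, here is a minimal sketch of fitting logistic regression with base R’s glm() and a classification tree with the rpart package on a 50/50 split of heartdisease. This is only an illustration of the comparison, not the package’s own code, and it assumes diagnosis is coded 0/1 and that all predictors are in a form glm() accepts:

library(rpart)  # a standard CART implementation

set.seed(100)
train.idx <- sample(nrow(heartdisease), floor(nrow(heartdisease) / 2))
train.df <- heartdisease[train.idx, ]
test.df <- heartdisease[-train.idx, ]

# Logistic regression baseline
lr.mod <- glm(diagnosis ~ ., data = train.df, family = binomial)
lr.pred <- predict(lr.mod, newdata = test.df, type = "response") > .5

# CART baseline
cart.mod <- rpart(factor(diagnosis) ~ ., data = train.df, method = "class")
cart.pred <- predict(cart.mod, newdata = test.df, type = "class")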