traineR Package

PROMiDAT

2020-10-29

The traineR package seeks to unify the different ways of creating predictive models and their different prediction formats. It includes methods such as K-Nearest Neighbors, Decision Trees, ADA Boosting, Extreme Gradient Boosting, Random Forest, Neural Networks, Deep Learning, Support Vector Machines, Bayesian Methods and Logistic Regression.

The main idea of the package is that all predictions can be executed using a standard syntax and that all predictive methods can be used in the same way by default. For example, every method performs classification in its default invocation, and every method uses a formula to specify the predictor variables (independent variables) and the response variable.
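
To make the formula convention concrete, here is a minimal base-R sketch (it does not use traineR, and the variable names are illustrative) of how a formula such as state ~ . resolves into a response and its predictors for the Puromycin data:

```r
# Base R only: inspect how a formula splits a data frame into
# a response variable and its predictors.
data("Puromycin")

f <- state ~ .

# terms() expands the dot into every column except the response
tt <- terms(f, data = Puromycin)

response   <- all.vars(f)[1]           # left-hand side of the formula
predictors <- attr(tt, "term.labels")  # right-hand side after expanding "."

response
predictors
```

The dot is shorthand for "all remaining columns", which is why the same formula can be reused unchanged across methods.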

Examples:

For the following examples we will use the Puromycin dataset (first 10 rows shown):

conc rate state
0.02 76 treated
0.02 47 treated
0.06 97 treated
0.06 107 treated
0.11 123 treated
0.11 139 treated
0.22 159 treated
0.22 152 treated
0.56 191 treated
0.56 201 treated

n <- seq_len(nrow(Puromycin))
.sample <- sample(n, length(n) * 0.7)
data.train <- Puromycin[.sample,]
data.test  <- Puromycin[-.sample,]
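
Because sample() draws at random, the outputs shown in this document will differ from run to run. A minimal sketch of a reproducible version of the split follows. The seed value 42 is an arbitrary placeholder, not the seed used to produce the results shown here.

```r
data("Puromycin")

set.seed(42)  # arbitrary placeholder seed; any fixed value gives a repeatable split

n <- seq_len(nrow(Puromycin))
# floor() makes the truncation of 23 * 0.7 = 16.1 down to 16 rows explicit
.sample <- sample(n, floor(length(n) * 0.7))

data.train <- Puromycin[.sample, ]
data.test  <- Puromycin[-.sample, ]

nrow(data.train)  # 16 rows for training
nrow(data.test)   # 7 rows for testing
```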

Logistic Regression

Modeling:

model <- train.glm(state~., data.train)
model
#> 
#> Call:  glm(formula = state ~ ., family = binomial, data = data.train)
#> 
#> Coefficients:
#> (Intercept)         conc         rate  
#>     2.42091      2.76388     -0.02687  
#> 
#> Degrees of Freedom: 15 Total (i.e. Null);  13 Residual
#> Null Deviance:       22.18 
#> Residual Deviance: 20.03     AIC: 26.03

Prediction as probability:

Note: the result is always a matrix.

prediction <- predict(model, data.test , type = "prob")
prediction
#>        treated untreated
#> [1,] 0.6410078 0.3589922
#> [2,] 0.7329453 0.2670547
#> [3,] 0.7618719 0.2381281
#> [4,] 0.5250667 0.4749333
#> [5,] 0.4313970 0.5686030
#> [6,] 0.5750735 0.4249265
#> [7,] 0.4750727 0.5249273

Prediction as classification:

Note: the result is always a factor.

prediction <- predict(model, data.test , type = "class")
prediction
#> [1] treated   treated   treated   treated   untreated treated   untreated
#> Levels: treated untreated
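
In this example the two prediction types agree in the obvious way: each predicted class is the column with the larger probability. The sketch below re-enters the probability matrix printed above and recovers the class labels with base R. It only illustrates the relationship and is not a claim about how traineR computes the labels internally.

```r
# Probabilities copied from the type = "prob" output above
prob <- matrix(
  c(0.6410078, 0.3589922,
    0.7329453, 0.2670547,
    0.7618719, 0.2381281,
    0.5250667, 0.4749333,
    0.4313970, 0.5686030,
    0.5750735, 0.4249265,
    0.4750727, 0.5249273),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("treated", "untreated"))
)

# For each row, pick the column holding the largest probability
labels    <- colnames(prob)[max.col(prob, ties.method = "first")]
predicted <- factor(labels, levels = colnames(prob))
predicted
```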

Confusion Matrix:

mc <- confusion.matrix(data.test, prediction)
mc
#>            prediction
#> real        treated untreated
#>   treated         4         0
#>   untreated       1         2

Some Rates:

general.indexes(mc = mc)
#> 
#> Confusion Matrix:
#>            prediction
#> real        treated untreated
#>   treated         4         0
#>   untreated       1         2
#> 
#> Overall Accuracy: 0.8571
#> Overall Error:    0.1429
#> 
#> Category Accuracy:
#> 
#>       treated    untreated
#>      1.000000     0.666667
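
The rates reported above follow directly from the confusion matrix. As a hand check in base R (a sketch that re-enters the counts by hand and does not call traineR), overall accuracy is the diagonal total divided by the grand total, and category accuracy is each diagonal count divided by its row total:

```r
# Counts copied from the confusion matrix above (rows = real, columns = prediction)
mc <- matrix(c(4, 0,
               1, 2),
             nrow = 2, byrow = TRUE,
             dimnames = list(real       = c("treated", "untreated"),
                             prediction = c("treated", "untreated")))

overall.accuracy  <- sum(diag(mc)) / sum(mc)  # (4 + 2) / 7
overall.error     <- 1 - overall.accuracy     # 1 / 7
category.accuracy <- diag(mc) / rowSums(mc)   # correct predictions per real class

round(overall.accuracy, 4)
round(overall.error, 4)
round(category.accuracy, 6)
```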

ADA Boosting

Modeling:

model <- train.ada(state~., data.train, iter = 200)
model
#> Call:
#> ada(state ~ ., data = data.train, iter = 200)
#> 
#> Loss: exponential Method: discrete   Iteration: 200 
#> 
#> Final Confusion Matrix for Data:
#>            Final Prediction
#> True value  treated untreated
#>   treated         5         3
#>   untreated       5         3
#> 
#> Train Error: 0.5 
#> 
#> Out-Of-Bag Error:  0.5  iteration= 6 
#> 
#> Additional Estimates of number of iterations:
#> 
#> train.err1 train.kap1 
#>          1          1

Prediction as probability:

prediction <- predict(model, data.test , type = "prob")
prediction
#>      treated untreated
#> [1,]     0.5       0.5
#> [2,]     0.5       0.5
#> [3,]     0.5       0.5
#> [4,]     0.5       0.5
#> [5,]     0.5       0.5
#> [6,]     0.5       0.5
#> [7,]     0.5       0.5

Prediction as classification:

prediction <- predict(model, data.test , type = "class")
prediction
#> [1] untreated untreated treated   treated   treated   treated   treated  
#> Levels: treated untreated

Confusion Matrix:

mc <- confusion.matrix(data.test, prediction)
mc
#>            prediction
#> real        treated untreated
#>   treated         2         2
#>   untreated       3         0

Some Rates:

general.indexes(mc = mc)
#> 
#> Confusion Matrix:
#>            prediction
#> real        treated untreated
#>   treated         2         2
#>   untreated       3         0
#> 
#> Overall Accuracy: 0.2857
#> Overall Error:    0.7143
#> 
#> Category Accuracy:
#> 
#>       treated    untreated
#>      0.500000     0.000000

For the following examples we will use the iris dataset (first 10 rows shown):

Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa

data("iris")
n <- seq_len(nrow(iris))
.sample <- sample(n, length(n) * 0.75)
data.train <- iris[.sample,]
data.test <- iris[-.sample,]

Decision Trees

Modeling:

model <- train.rpart(Species~., data.train)
model
#> n= 112 
#> 
#> node), split, n, loss, yval, (yprob)
#>       * denotes terminal node
#> 
#> 1) root 112 73 virginica (0.31250000 0.33928571 0.34821429)  
#>   2) Petal.Length< 2.45 35  0 setosa (1.00000000 0.00000000 0.00000000) *
#>   3) Petal.Length>=2.45 77 38 virginica (0.00000000 0.49350649 0.50649351)  
#>     6) Petal.Length< 4.95 38  2 versicolor (0.00000000 0.94736842 0.05263158) *
#>     7) Petal.Length>=4.95 39  2 virginica (0.00000000 0.05128205 0.94871795) *
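
The class probabilities in parentheses are the class proportions among the training rows that fall in each node. A quick base-R check for terminal nodes 6 and 7, using only the n and loss values from the printout (both nodes contain no setosa rows, so the loss count belongs entirely to the other non-majority class):

```r
# Node 6: n = 38, loss = 2, majority class versicolor
node6 <- c(setosa = 0, versicolor = 38 - 2, virginica = 2) / 38

# Node 7: n = 39, loss = 2, majority class virginica
node7 <- c(setosa = 0, versicolor = 2, virginica = 39 - 2) / 39

round(node6, 8)
round(node7, 8)
```

Every test row that reaches the same terminal node receives the same probability vector, which is why only three distinct rows appear in the probability predictions for this tree.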

Prediction as probability:

prediction <- predict(model, data.test , type = "prob")
prediction
#>     setosa versicolor  virginica
#> 5        1 0.00000000 0.00000000
#> 6        1 0.00000000 0.00000000
#> 9        1 0.00000000 0.00000000
#> 15       1 0.00000000 0.00000000
#> 16       1 0.00000000 0.00000000
#> 21       1 0.00000000 0.00000000
#> 26       1 0.00000000 0.00000000
#> 31       1 0.00000000 0.00000000
#> 33       1 0.00000000 0.00000000
#> 34       1 0.00000000 0.00000000
#> 41       1 0.00000000 0.00000000
#> 42       1 0.00000000 0.00000000
#> 46       1 0.00000000 0.00000000
#> 49       1 0.00000000 0.00000000
#> 50       1 0.00000000 0.00000000
#> 52       0 0.94736842 0.05263158
#> 57       0 0.94736842 0.05263158
#> 60       0 0.94736842 0.05263158
#> 63       0 0.94736842 0.05263158
#> 69       0 0.94736842 0.05263158
#> 70       0 0.94736842 0.05263158
#> 72       0 0.94736842 0.05263158
#> 75       0 0.94736842 0.05263158
#> 85       0 0.94736842 0.05263158
#> 89       0 0.94736842 0.05263158
#> 96       0 0.94736842 0.05263158
#> 98       0 0.94736842 0.05263158
#> 105      0 0.05128205 0.94871795
#> 113      0 0.05128205 0.94871795
#> 117      0 0.05128205 0.94871795
#> 122      0 0.94736842 0.05263158
#> 127      0 0.94736842 0.05263158
#> 128      0 0.94736842 0.05263158
#> 133      0 0.05128205 0.94871795
#> 134      0 0.05128205 0.94871795
#> 137      0 0.05128205 0.94871795
#> 138      0 0.05128205 0.94871795
#> 139      0 0.94736842 0.05263158

Prediction as classification:

prediction <- predict(model, data.test , type = "class")
prediction
#>  [1] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [7] setosa     setosa     setosa     setosa     setosa     setosa    
#> [13] setosa     setosa     setosa     versicolor versicolor versicolor
#> [19] versicolor versicolor versicolor versicolor versicolor versicolor
#> [25] versicolor versicolor versicolor virginica  virginica  virginica 
#> [31] versicolor versicolor versicolor virginica  virginica  virginica 
#> [37] virginica  versicolor
#> Levels: setosa versicolor virginica

Confusion Matrix:

mc <- confusion.matrix(data.test, prediction)
mc
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         15          0         0
#>   versicolor      0         12         0
#>   virginica       0          4         7

Some Rates:

general.indexes(mc = mc)
#> 
#> Confusion Matrix:
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         15          0         0
#>   versicolor      0         12         0
#>   virginica       0          4         7
#> 
#> Overall Accuracy: 0.8947
#> Overall Error:    0.1053
#> 
#> Category Accuracy:
#> 
#>        setosa   versicolor    virginica
#>      1.000000     1.000000     0.636364

The model still supports the functions of the original package, for example plotting the tree with rpart.plot:

library(rpart.plot)
prp(model, extra = 104, branch.type = 2, 
    box.col = c("pink", "palegreen3", "cyan")[model$frame$yval])

Bayesian Method

Modeling:

model <- train.bayes(Species~., data.train)
model
#> 
#> Naive Bayes Classifier for Discrete Predictors
#> 
#> Call:
#> naiveBayes.default(x = X, y = Y, laplace = laplace)
#> 
#> A-priori probabilities:
#> Y
#>     setosa versicolor  virginica 
#>  0.3125000  0.3392857  0.3482143 
#> 
#> Conditional probabilities:
#>             Sepal.Length
#> Y                [,1]      [,2]
#>   setosa     4.957143 0.3211063
#>   versicolor 5.939474 0.5504492
#>   virginica  6.674359 0.6788992
#> 
#>             Sepal.Width
#> Y                [,1]      [,2]
#>   setosa     3.400000 0.2612189
#>   versicolor 2.757895 0.3045880
#>   virginica  2.974359 0.3544588
#> 
#>             Petal.Length
#> Y                [,1]      [,2]
#>   setosa     1.462857 0.1864304
#>   versicolor 4.265789 0.5205383
#>   virginica  5.630769 0.5717719
#> 
#>             Petal.Width
#> Y                 [,1]      [,2]
#>   setosa     0.2485714 0.1147156
#>   versicolor 1.3236842 0.2059050
#>   virginica  2.0487179 0.2780305

Prediction as probability:

prediction <- predict(model, data.test , type = "prob")
prediction
#>              setosa   versicolor    virginica
#>  [1,]  1.000000e+00 7.737257e-17 2.664560e-25
#>  [2,]  1.000000e+00 1.163952e-12 3.249398e-20
#>  [3,]  1.000000e+00 5.666669e-15 1.970121e-24
#>  [4,]  1.000000e+00 3.332201e-16 1.027164e-23
#>  [5,]  1.000000e+00 8.196702e-14 3.311215e-20
#>  [6,]  1.000000e+00 8.351056e-14 3.797976e-22
#>  [7,]  1.000000e+00 5.922675e-14 4.711814e-23
#>  [8,]  1.000000e+00 1.456018e-14 1.374143e-23
#>  [9,]  1.000000e+00 4.827492e-18 3.624167e-25
#> [10,]  1.000000e+00 5.023423e-17 5.047569e-24
#> [11,]  1.000000e+00 8.752754e-16 1.308454e-24
#> [12,]  1.000000e+00 1.389926e-11 1.434594e-21
#> [13,]  1.000000e+00 4.444219e-14 1.628944e-23
#> [14,]  1.000000e+00 4.451511e-16 3.673816e-24
#> [15,]  1.000000e+00 5.822182e-16 6.658709e-25
#> [16,]  5.940570e-87 9.545975e-01 4.540252e-02
#> [17,]  5.029672e-98 7.077119e-01 2.922881e-01
#> [18,]  6.794652e-60 9.999217e-01 7.831053e-05
#> [19,]  7.624232e-55 9.999920e-01 7.961368e-06
#> [20,]  4.312516e-90 9.933454e-01 6.654567e-03
#> [21,]  1.115374e-51 9.999932e-01 6.825135e-06
#> [22,]  4.918234e-62 9.998355e-01 1.644616e-04
#> [23,]  2.855165e-73 9.987707e-01 1.229258e-03
#> [24,]  1.309660e-83 9.941360e-01 5.864046e-03
#> [25,]  1.375492e-62 9.998215e-01 1.785080e-04
#> [26,]  6.445989e-63 9.998707e-01 1.293008e-04
#> [27,]  3.045611e-72 9.991760e-01 8.239834e-04
#> [28,] 3.140649e-185 1.713030e-06 9.999983e-01
#> [29,] 1.809605e-165 2.337754e-05 9.999766e-01
#> [30,] 2.585373e-146 5.885113e-03 9.941149e-01
#> [31,] 6.060711e-125 3.225090e-02 9.677491e-01
#> [32,] 1.639403e-112 3.157370e-01 6.842630e-01
#> [33,] 1.231208e-115 1.870445e-01 8.129555e-01
#> [34,] 1.356643e-174 8.992617e-06 9.999910e-01
#> [35,] 1.809296e-112 8.184932e-01 1.815068e-01
#> [36,] 7.907582e-186 4.218470e-08 1.000000e+00
#> [37,] 2.048596e-145 5.669839e-03 9.943302e-01
#> [38,] 7.586350e-111 3.009125e-01 6.990875e-01

Prediction as classification:

prediction <- predict(model, data.test , type = "class")
prediction
#>  [1] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [7] setosa     setosa     setosa     setosa     setosa     setosa    
#> [13] setosa     setosa     setosa     versicolor versicolor versicolor
#> [19] versicolor versicolor versicolor versicolor versicolor versicolor
#> [25] versicolor versicolor versicolor virginica  virginica  virginica 
#> [31] virginica  virginica  virginica  virginica  versicolor virginica 
#> [37] virginica  virginica 
#> Levels: setosa versicolor virginica

Confusion Matrix:

mc <- confusion.matrix(data.test, prediction)
mc
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         15          0         0
#>   versicolor      0         12         0
#>   virginica       0          1        10

Some Rates:

general.indexes(mc = mc)
#> 
#> Confusion Matrix:
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         15          0         0
#>   versicolor      0         12         0
#>   virginica       0          1        10
#> 
#> Overall Accuracy: 0.9737
#> Overall Error:    0.0263
#> 
#> Category Accuracy:
#> 
#>        setosa   versicolor    virginica
#>      1.000000     1.000000     0.909091

Random Forest

Modeling:

model <- train.randomForest(Species~., data.train)
model
#> 
#> Call:
#>  randomForest(formula = Species ~ ., data = data.train, importance = TRUE) 
#>                Type of random forest: classification
#>                      Number of trees: 500
#> No. of variables tried at each split: 2
#> 
#>         OOB estimate of  error rate: 6.25%
#> Confusion matrix:
#>            setosa versicolor virginica class.error
#> setosa         35          0         0  0.00000000
#> versicolor      0         35         3  0.07894737
#> virginica       0          4        35  0.10256410
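
The OOB error rate and the class.error column can be reproduced from the counts in the printed confusion matrix. The sketch below re-enters those counts by hand in base R; it is a consistency check on the printout, not a call into randomForest.

```r
# Out-of-bag counts copied from the printout (rows = true class, columns = predicted)
classes <- c("setosa", "versicolor", "virginica")
oob <- matrix(c(35,  0,  0,
                 0, 35,  3,
                 0,  4, 35),
              nrow = 3, byrow = TRUE,
              dimnames = list(classes, classes))

class.error <- 1 - diag(oob) / rowSums(oob)   # misclassified share per true class
oob.error   <- 1 - sum(diag(oob)) / sum(oob)  # 7 misclassified out of 112

round(class.error, 8)
oob.error * 100  # as a percentage
```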

Prediction as probability:

prediction <- predict(model, data.test , type = "prob")
prediction
#>     setosa versicolor virginica
#> 5    1.000      0.000     0.000
#> 6    1.000      0.000     0.000
#> 9    0.998      0.002     0.000
#> 15   0.944      0.056     0.000
#> 16   0.958      0.042     0.000
#> 21   0.998      0.002     0.000
#> 26   0.998      0.002     0.000
#> 31   1.000      0.000     0.000
#> 33   1.000      0.000     0.000
#> 34   0.982      0.018     0.000
#> 41   1.000      0.000     0.000
#> 42   0.932      0.060     0.008
#> 46   1.000      0.000     0.000
#> 49   1.000      0.000     0.000
#> 50   1.000      0.000     0.000
#> 52   0.000      0.996     0.004
#> 57   0.000      0.956     0.044
#> 60   0.006      0.956     0.038
#> 63   0.000      0.878     0.122
#> 69   0.000      0.840     0.160
#> 70   0.000      1.000     0.000
#> 72   0.000      0.998     0.002
#> 75   0.000      0.998     0.002
#> 85   0.072      0.868     0.060
#> 89   0.002      0.998     0.000
#> 96   0.002      0.998     0.000
#> 98   0.000      0.998     0.002
#> 105  0.000      0.000     1.000
#> 113  0.000      0.000     1.000
#> 117  0.000      0.010     0.990
#> 122  0.000      0.278     0.722
#> 127  0.000      0.550     0.450
#> 128  0.000      0.422     0.578
#> 133  0.000      0.000     1.000
#> 134  0.000      0.586     0.414
#> 137  0.000      0.000     1.000
#> 138  0.000      0.016     0.984
#> 139  0.000      0.674     0.326

Prediction as classification:

prediction <- predict(model, data.test , type = "class")
prediction
#>  [1] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [7] setosa     setosa     setosa     setosa     setosa     setosa    
#> [13] setosa     setosa     setosa     versicolor versicolor versicolor
#> [19] versicolor versicolor versicolor versicolor versicolor versicolor
#> [25] versicolor versicolor versicolor virginica  virginica  virginica 
#> [31] virginica  versicolor virginica  virginica  versicolor virginica 
#> [37] virginica  versicolor
#> Levels: setosa versicolor virginica

Confusion Matrix:

mc <- confusion.matrix(data.test, prediction)
mc
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         15          0         0
#>   versicolor      0         12         0
#>   virginica       0          3         8

Some Rates:

general.indexes(mc = mc)
#> 
#> Confusion Matrix:
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         15          0         0
#>   versicolor      0         12         0
#>   virginica       0          3         8
#> 
#> Overall Accuracy: 0.9211
#> Overall Error:    0.0789
#> 
#> Category Accuracy:
#> 
#>        setosa   versicolor    virginica
#>      1.000000     1.000000     0.727273

The model still supports the functions of the original package, for example plotting variable importance with randomForest:

library(randomForest)
varImpPlot(model)

K-Nearest Neighbors

Modeling:

model <- train.knn(Species~., data.train)
model
#> 
#> Call:
#> kknn::train.kknn(formula = Species ~ ., data = data.train)
#> 
#> Type of response variable: nominal
#> Minimal misclassification: 0.05357143
#> Best kernel: optimal
#> Best k: 4

Prediction as probability:

prediction <- predict(model, data.test , type = "prob")
prediction
#>           setosa versicolor  virginica
#>  [1,] 1.00000000 0.00000000 0.00000000
#>  [2,] 1.00000000 0.00000000 0.00000000
#>  [3,] 1.00000000 0.00000000 0.00000000
#>  [4,] 1.00000000 0.00000000 0.00000000
#>  [5,] 1.00000000 0.00000000 0.00000000
#>  [6,] 1.00000000 0.00000000 0.00000000
#>  [7,] 1.00000000 0.00000000 0.00000000
#>  [8,] 1.00000000 0.00000000 0.00000000
#>  [9,] 1.00000000 0.00000000 0.00000000
#> [10,] 1.00000000 0.00000000 0.00000000
#> [11,] 1.00000000 0.00000000 0.00000000
#> [12,] 0.04903811 0.95096189 0.00000000
#> [13,] 1.00000000 0.00000000 0.00000000
#> [14,] 1.00000000 0.00000000 0.00000000
#> [15,] 1.00000000 0.00000000 0.00000000
#> [16,] 0.00000000 1.00000000 0.00000000
#> [17,] 0.00000000 0.84193132 0.15806868
#> [18,] 0.00000000 1.00000000 0.00000000
#> [19,] 0.00000000 1.00000000 0.00000000
#> [20,] 0.00000000 0.50000000 0.50000000
#> [21,] 0.00000000 1.00000000 0.00000000
#> [22,] 0.00000000 1.00000000 0.00000000
#> [23,] 0.00000000 1.00000000 0.00000000
#> [24,] 0.00000000 1.00000000 0.00000000
#> [25,] 0.00000000 1.00000000 0.00000000
#> [26,] 0.00000000 1.00000000 0.00000000
#> [27,] 0.00000000 1.00000000 0.00000000
#> [28,] 0.00000000 0.00000000 1.00000000
#> [29,] 0.00000000 0.00000000 1.00000000
#> [30,] 0.00000000 0.15806868 0.84193132
#> [31,] 0.00000000 0.00000000 1.00000000
#> [32,] 0.00000000 0.15806868 0.84193132
#> [33,] 0.00000000 0.20710678 0.79289322
#> [34,] 0.00000000 0.00000000 1.00000000
#> [35,] 0.00000000 0.95096189 0.04903811
#> [36,] 0.00000000 0.00000000 1.00000000
#> [37,] 0.00000000 0.04903811 0.95096189
#> [38,] 0.00000000 0.50000000 0.50000000

Prediction as classification:

prediction <- predict(model, data.test , type = "class")
prediction
#>  [1] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [7] setosa     setosa     setosa     setosa     setosa     versicolor
#> [13] setosa     setosa     setosa     versicolor versicolor versicolor
#> [19] versicolor versicolor versicolor versicolor versicolor versicolor
#> [25] versicolor versicolor versicolor virginica  virginica  virginica 
#> [31] virginica  virginica  virginica  virginica  versicolor virginica 
#> [37] virginica  versicolor
#> Levels: setosa versicolor virginica

Confusion Matrix:

mc <- confusion.matrix(data.test, prediction)
mc
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         14          1         0
#>   versicolor      0         12         0
#>   virginica       0          2         9

Some Rates:

general.indexes(mc = mc)
#> 
#> Confusion Matrix:
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         14          1         0
#>   versicolor      0         12         0
#>   virginica       0          2         9
#> 
#> Overall Accuracy: 0.9211
#> Overall Error:    0.0789
#> 
#> Category Accuracy:
#> 
#>        setosa   versicolor    virginica
#>      0.933333     1.000000     0.818182

Neural Networks (nnet)

Modeling:

model <- train.nnet(Species~., data.train, size = 20)
#> # weights:  163
#> initial  value 118.563259 
#> iter  10 value 27.379768
#> iter  20 value 6.898345
#> iter  30 value 2.005139
#> iter  40 value 1.187001
#> iter  50 value 0.508702
#> iter  60 value 0.007067
#> iter  70 value 0.000571
#> iter  80 value 0.000164
#> final  value 0.000084 
#> converged
model
#> a 4-20-3 network with 163 weights
#> inputs: Sepal.Length Sepal.Width Petal.Length Petal.Width 
#> output(s): Species 
#> options were - softmax modelling

Prediction as probability:

prediction <- predict(model, data.test , type = "prob")
prediction
#>           setosa   versicolor    virginica
#> 5   1.000000e+00 3.540393e-09 5.602636e-28
#> 6   1.000000e+00 1.384381e-08 1.336905e-27
#> 9   9.999998e-01 1.820305e-07 3.647849e-27
#> 15  1.000000e+00 1.227782e-09 3.434296e-28
#> 16  1.000000e+00 1.360741e-09 3.944093e-28
#> 21  9.999997e-01 3.351086e-07 4.593636e-27
#> 26  9.999964e-01 3.611180e-06 1.470213e-26
#> 31  9.999992e-01 8.150421e-07 7.822082e-27
#> 33  1.000000e+00 1.498803e-09 3.819933e-28
#> 34  1.000000e+00 1.256102e-09 3.561856e-28
#> 41  1.000000e+00 3.659719e-09 5.752690e-28
#> 42  9.998752e-01 1.248142e-04 9.474535e-26
#> 46  9.999998e-01 1.907995e-07 3.764596e-27
#> 49  1.000000e+00 4.846918e-09 6.522075e-28
#> 50  1.000000e+00 1.414839e-08 1.016431e-27
#> 52  2.406713e-09 1.000000e+00 9.507302e-17
#> 57  1.271116e-09 1.000000e+00 2.059633e-15
#> 60  1.206706e-09 1.000000e+00 6.593198e-16
#> 63  4.173078e-07 9.999996e-01 4.262150e-12
#> 69  8.325243e-07 9.986374e-01 1.361764e-03
#> 70  1.156555e-08 1.000000e+00 3.148005e-18
#> 72  1.068104e-08 1.000000e+00 3.846966e-18
#> 75  9.205884e-09 1.000000e+00 4.746927e-18
#> 85  1.084021e-09 1.000000e+00 3.119235e-14
#> 89  3.284597e-09 1.000000e+00 2.252965e-17
#> 96  5.549045e-09 1.000000e+00 6.096271e-18
#> 98  6.205572e-09 1.000000e+00 8.416561e-18
#> 105 7.467414e-26 1.187189e-34 1.000000e+00
#> 113 1.465050e-17 2.888662e-21 1.000000e+00
#> 117 8.205352e-11 3.266860e-10 1.000000e+00
#> 122 9.628941e-21 6.905333e-27 1.000000e+00
#> 127 2.427668e-05 5.703348e-02 9.429422e-01
#> 128 7.700479e-06 9.983065e-01 1.685839e-03
#> 133 3.541590e-26 4.642146e-35 1.000000e+00
#> 134 5.769176e-09 1.000000e+00 2.917273e-11
#> 137 4.318577e-24 4.134972e-32 1.000000e+00
#> 138 9.525828e-10 1.736074e-08 1.000000e+00
#> 139 1.858905e-06 9.999489e-01 4.927697e-05

Prediction as classification:

prediction <- predict(model, data.test , type = "class")
prediction
#>  [1] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [7] setosa     setosa     setosa     setosa     setosa     setosa    
#> [13] setosa     setosa     setosa     versicolor versicolor versicolor
#> [19] versicolor versicolor versicolor versicolor versicolor versicolor
#> [25] versicolor versicolor versicolor virginica  virginica  virginica 
#> [31] virginica  virginica  versicolor virginica  versicolor virginica 
#> [37] virginica  versicolor
#> Levels: setosa versicolor virginica

Confusion Matrix:

mc <- confusion.matrix(data.test, prediction)
mc
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         15          0         0
#>   versicolor      0         12         0
#>   virginica       0          3         8

Some Rates:

general.indexes(mc = mc)
#> 
#> Confusion Matrix:
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         15          0         0
#>   versicolor      0         12         0
#>   virginica       0          3         8
#> 
#> Overall Accuracy: 0.9211
#> Overall Error:    0.0789
#> 
#> Category Accuracy:
#> 
#>        setosa   versicolor    virginica
#>      1.000000     1.000000     0.727273

Neural Networks (neuralnet)

Modeling:

model <- train.neuralnet(Species~., data.train, hidden = c(5, 7, 6),
                         linear.output = FALSE, threshold = 0.01, stepmax = 1e+06)
summary(model)
#>                     Length Class      Mode    
#> call                  7    -none-     call    
#> response            336    -none-     logical 
#> covariate           448    -none-     numeric 
#> model.list            2    -none-     list    
#> err.fct               1    -none-     function
#> act.fct               1    -none-     function
#> linear.output         1    -none-     logical 
#> data                  5    data.frame list    
#> exclude               0    -none-     NULL    
#> net.result            1    -none-     list    
#> weights               1    -none-     list    
#> generalized.weights   1    -none-     list    
#> startweights          1    -none-     list    
#> result.matrix       139    -none-     numeric 
#> prmdt                 4    -none-     list

Prediction as probability:

prediction <- predict(model, data.test , type = "prob")
prediction
#>           setosa   versicolor    virginica
#> 5   9.999784e-01 2.349022e-05 3.863667e-41
#> 6   9.999787e-01 2.316524e-05 3.790956e-41
#> 9   9.999791e-01 2.272498e-05 3.693040e-41
#> 15  9.999780e-01 2.397119e-05 3.971954e-41
#> 16  9.999783e-01 2.362588e-05 3.894128e-41
#> 21  9.999785e-01 2.334208e-05 3.830477e-41
#> 26  9.999788e-01 2.297651e-05 3.748898e-41
#> 31  9.999789e-01 2.288129e-05 3.727726e-41
#> 33  9.999782e-01 2.365945e-05 3.901677e-41
#> 34  9.999781e-01 2.378450e-05 3.929827e-41
#> 41  9.999784e-01 2.342997e-05 3.850159e-41
#> 42  9.999797e-01 2.193147e-05 3.518303e-41
#> 46  9.999789e-01 2.290021e-05 3.731930e-41
#> 49  9.999783e-01 2.357690e-05 3.883123e-41
#> 50  9.999785e-01 2.340400e-05 3.844340e-41
#> 52  4.237364e-14 1.000000e+00 1.330275e-15
#> 57  1.614217e-14 1.000000e+00 1.412431e-15
#> 60  3.835912e-14 1.000000e+00 1.337155e-15
#> 63  8.116158e-14 1.000000e+00 1.286167e-15
#> 69  5.371352e-20 1.000000e+00 1.035519e-14
#> 70  8.128189e-14 1.000000e+00 1.285627e-15
#> 72  7.923377e-14 1.000000e+00 1.287487e-15
#> 75  7.133272e-14 1.000000e+00 1.294854e-15
#> 85  8.051674e-15 1.000000e+00 1.494540e-15
#> 89  7.482973e-14 1.000000e+00 1.291266e-15
#> 96  8.085033e-14 1.000000e+00 1.285826e-15
#> 98  6.584553e-14 1.000000e+00 1.300182e-15
#> 105 3.269140e-74 3.246313e-19 1.000000e+00
#> 113 1.300904e-70 1.315887e-17 1.000000e+00
#> 117 3.605219e-60 5.728337e-12 1.000000e+00
#> 122 2.093673e-66 1.705439e-15 1.000000e+00
#> 127 6.536273e-36 9.999993e-01 4.731222e-07
#> 128 2.838659e-31 1.000000e+00 8.477163e-10
#> 133 2.141709e-74 2.780683e-19 1.000000e+00
#> 134 3.519674e-21 1.000000e+00 2.953209e-14
#> 137 2.034621e-73 7.007973e-19 1.000000e+00
#> 138 5.099400e-57 5.516031e-10 1.000000e+00
#> 139 5.868921e-27 1.000000e+00 4.679605e-12

Prediction as classification:

prediction <- predict(model, data.test , type = "class")
prediction
#>  [1] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [7] setosa     setosa     setosa     setosa     setosa     setosa    
#> [13] setosa     setosa     setosa     versicolor versicolor versicolor
#> [19] versicolor versicolor versicolor versicolor versicolor versicolor
#> [25] versicolor versicolor versicolor virginica  virginica  virginica 
#> [31] virginica  versicolor versicolor virginica  versicolor virginica 
#> [37] virginica  versicolor
#> Levels: setosa versicolor virginica

Confusion Matrix:

mc <- confusion.matrix(data.test, prediction)
mc
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         15          0         0
#>   versicolor      0         12         0
#>   virginica       0          4         7

Some Rates:

general.indexes(mc = mc)
#> 
#> Confusion Matrix:
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         15          0         0
#>   versicolor      0         12         0
#>   virginica       0          4         7
#> 
#> Overall Accuracy: 0.8947
#> Overall Error:    0.1053
#> 
#> Category Accuracy:
#> 
#>        setosa   versicolor    virginica
#>      1.000000     1.000000     0.636364

Support Vector Machines

Modeling:

model <- train.svm(Species~., data.train)
model
#> 
#> Call:
#> svm(formula = Species ~ ., data = data.train, probability = TRUE)
#> 
#> 
#> Parameters:
#>    SVM-Type:  C-classification 
#>  SVM-Kernel:  radial 
#>        cost:  1 
#> 
#> Number of Support Vectors:  40

Prediction as probability:

prediction <- predict(model, data.test , type = "prob")
prediction
#>          setosa  versicolor   virginica
#> 5   0.964357981 0.020442068 0.015199951
#> 6   0.926950276 0.044849160 0.028200564
#> 9   0.905095747 0.062002026 0.032902227
#> 15  0.891772412 0.059965842 0.048261746
#> 16  0.652977480 0.172663035 0.174359484
#> 21  0.960240694 0.025342380 0.014416926
#> 26  0.929409291 0.048300977 0.022289732
#> 31  0.952559475 0.030555049 0.016885476
#> 33  0.864388195 0.071516225 0.064095580
#> 34  0.820805933 0.093704823 0.085489245
#> 41  0.967795278 0.019120319 0.013084403
#> 42  0.294054344 0.579272196 0.126673460
#> 46  0.933307242 0.045321504 0.021371255
#> 49  0.958800936 0.024364643 0.016834420
#> 50  0.968571287 0.018881820 0.012546894
#> 52  0.012019081 0.970020981 0.017959939
#> 57  0.012662727 0.931687729 0.055649543
#> 60  0.012531525 0.928195550 0.059272925
#> 63  0.016203473 0.970656475 0.013140052
#> 69  0.019213123 0.751874297 0.228912580
#> 70  0.009928375 0.981402933 0.008668691
#> 72  0.010681590 0.985282236 0.004036174
#> 75  0.011679422 0.983864991 0.004455587
#> 85  0.015159095 0.911392176 0.073448729
#> 89  0.018170038 0.974795572 0.007034391
#> 96  0.018824543 0.976782957 0.004392500
#> 98  0.010848484 0.984328319 0.004823197
#> 105 0.009025324 0.003900057 0.987074620
#> 113 0.009166204 0.015742776 0.975091020
#> 117 0.010865290 0.117212659 0.871922051
#> 122 0.011263410 0.045535914 0.943200675
#> 127 0.010903215 0.317863190 0.671233595
#> 128 0.011861984 0.434774313 0.553363703
#> 133 0.009093796 0.003274160 0.987632044
#> 134 0.010951009 0.747986311 0.241062680
#> 137 0.013177118 0.016856054 0.969966829
#> 138 0.011622741 0.161527433 0.826849825
#> 139 0.012099214 0.498754719 0.489146067

Prediction as classification:

prediction <- predict(model, data.test , type = "class")
prediction
#>  [1] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [7] setosa     setosa     setosa     setosa     setosa     versicolor
#> [13] setosa     setosa     setosa     versicolor versicolor versicolor
#> [19] versicolor versicolor versicolor versicolor versicolor versicolor
#> [25] versicolor versicolor versicolor virginica  virginica  virginica 
#> [31] virginica  virginica  virginica  virginica  versicolor virginica 
#> [37] virginica  virginica 
#> Levels: setosa versicolor virginica

Confusion Matrix:

mc <- confusion.matrix(data.test, prediction)
mc
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         14          1         0
#>   versicolor      0         12         0
#>   virginica       0          1        10

Some Rates:

general.indexes(mc = mc)
#> 
#> Confusion Matrix:
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         14          1         0
#>   versicolor      0         12         0
#>   virginica       0          1        10
#> 
#> Overall Accuracy: 0.9474
#> Overall Error:    0.0526
#> 
#> Category Accuracy:
#> 
#>        setosa   versicolor    virginica
#>      0.933333     1.000000     0.909091

Extreme Gradient Boosting

Modeling:

model <- train.xgboost(Species~., data.train, nrounds = 79, maximize = FALSE, verbose = 0)
model
#> ##### xgb.Booster
#> raw: 69.6 Kb 
#> call:
#>   xgb.train(params = params, data = train_aux, nrounds = nrounds, 
#>     watchlist = watchlist, obj = obj, feval = feval, verbose = verbose, 
#>     print_every_n = print_every_n, early_stopping_rounds = early_stopping_rounds, 
#>     maximize = maximize, save_period = save_period, save_name = save_name, 
#>     xgb_model = xgb_model, callbacks = callbacks, eval_metric = "mlogloss")
#> params (as set within xgb.train):
#>   booster = "gbtree", objective = "multi:softprob", eta = "0.3", gamma = "0", max_depth = "6", min_child_weight = "1", subsample = "1", colsample_bytree = "1", num_class = "3", eval_metric = "mlogloss", validate_parameters = "TRUE"
#> xgb.attributes:
#>   niter
#> callbacks:
#>   cb.evaluation.log()
#> # of features: 4 
#> niter: 79
#> nfeatures : 4 
#> evaluation_log:
#>     iter train_mlogloss
#>        1       0.740619
#>        2       0.531637
#> ---                    
#>       78       0.019232
#>       79       0.019180

Prediction as probability:

prediction <- predict(model, data.test , type = "prob")
prediction
#>            setosa  versicolor    virginica
#>  [1,] 0.994102299 0.004506736 0.0013909652
#>  [2,] 0.994102299 0.004506736 0.0013909652
#>  [3,] 0.990518510 0.007897455 0.0015841086
#>  [4,] 0.974304318 0.024332428 0.0013632636
#>  [5,] 0.974304318 0.024332428 0.0013632636
#>  [6,] 0.994102299 0.004506736 0.0013909652
#>  [7,] 0.994102299 0.004506736 0.0013909652
#>  [8,] 0.993904650 0.004505840 0.0015895240
#>  [9,] 0.994102299 0.004506736 0.0013909652
#> [10,] 0.974304318 0.024332428 0.0013632636
#> [11,] 0.994102299 0.004506736 0.0013909652
#> [12,] 0.980996370 0.008246775 0.0107567860
#> [13,] 0.993904650 0.004505840 0.0015895240
#> [14,] 0.994102299 0.004506736 0.0013909652
#> [15,] 0.994102299 0.004506736 0.0013909652
#> [16,] 0.001607533 0.997773588 0.0006188240
#> [17,] 0.001606412 0.997077703 0.0013158814
#> [18,] 0.009366194 0.987494290 0.0031394563
#> [19,] 0.003260965 0.989244819 0.0074942745
#> [20,] 0.003637327 0.984017551 0.0123450998
#> [21,] 0.001727476 0.994302511 0.0039700456
#> [22,] 0.001264269 0.998296797 0.0004389427
#> [23,] 0.001040670 0.998560727 0.0003986500
#> [24,] 0.016989637 0.977287650 0.0057227351
#> [25,] 0.002890580 0.996140540 0.0009688937
#> [26,] 0.002890580 0.996140540 0.0009688937
#> [27,] 0.001040670 0.998560727 0.0003986500
#> [28,] 0.001147901 0.002052797 0.9967992306
#> [29,] 0.001147901 0.002052797 0.9967992306
#> [30,] 0.001147901 0.002052797 0.9967992306
#> [31,] 0.006289480 0.094022602 0.8996878862
#> [32,] 0.019232145 0.843630254 0.1371375918
#> [33,] 0.005231957 0.219559729 0.7752082944
#> [34,] 0.001147480 0.002418917 0.9964336157
#> [35,] 0.013699026 0.811612904 0.1746880412
#> [36,] 0.001147901 0.002052797 0.9967992306
#> [37,] 0.001147901 0.002052797 0.9967992306
#> [38,] 0.011059565 0.919935644 0.0690047964

Prediction as classification:

prediction <- predict(model, data.test , type = "class")
prediction
#>  [1] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [7] setosa     setosa     setosa     setosa     setosa     setosa    
#> [13] setosa     setosa     setosa     versicolor versicolor versicolor
#> [19] versicolor versicolor versicolor versicolor versicolor versicolor
#> [25] versicolor versicolor versicolor virginica  virginica  virginica 
#> [31] virginica  versicolor virginica  virginica  versicolor virginica 
#> [37] virginica  versicolor
#> Levels: setosa versicolor virginica

Confusion Matrix:

mc <- confusion.matrix(data.test, prediction)
mc
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         15          0         0
#>   versicolor      0         12         0
#>   virginica       0          3         8

Some Rates:

general.indexes(mc = mc)
#> 
#> Confusion Matrix:
#>             prediction
#> real         setosa versicolor virginica
#>   setosa         15          0         0
#>   versicolor      0         12         0
#>   virginica       0          3         8
#> 
#> Overall Accuracy: 0.9211
#> Overall Error:    0.0789
#> 
#> Category Accuracy:
#> 
#>        setosa   versicolor    virginica
#>      1.000000     1.000000     0.727273