In this vignette we will explore the supported deep learning
frameworks in `nn2poly`

and how can they be used. Currently
supported frameworks are `keras`

/`tensorflow`

and
`luz`

/`torch`

. The benefit of using them is that
we provide an easy to use implementation of the weight constraints
needed during training to avoid problems in the Taylor expansion carried
out by `nn2poly`

. Furthermore, models created with these
frameworks are directly supported by the `nn2poly()`

function, allowing then to skip the need to manually create an object
with weights and activation functions as explained in
`vignette("nn2poly-01-introduction")`

.

If you are interested in having support for another deep learning framework, please open an issue in our GitHub repo!

Note: Using the provided constraints when training a model increases the number of needed epochs to converge, which is an expected trade off between learning speed and control over the weights for easier explainability.

As our goal here is to show how to impose the needed constraints for
each framework, we will replicate the polynomial data generation from
`vignette("nn2poly-01-introduction")`

and solve later the
same regression problem with the different deep learning frameworks.
Refer to that vignette for more details.

Create the polynomial \(4x_1 - 3 x_2x_3\):

Generate the data:

```
# Define number of variables and sample size
p <- 3
n_sample <- 500
# Predictor variables
X <- matrix(0,n_sample,p)
for (i in 1:p){
X[,i] <- rnorm(n = n_sample,0,1)
}
# Response variable + small error term
Y <- nn2poly:::eval_poly(poly = polynomial, newdata = X) +
stats::rnorm(n_sample, 0, 0.1)
# Store all as a data frame
data <- as.data.frame(cbind(X, Y))
head(data)
#> V1 V2 V3 Y
#> 1 1.3709584 1.029140719 2.3250585 -1.7547416
#> 2 -0.5646982 0.914774868 0.5241222 -3.7107357
#> 3 0.3631284 -0.002456267 0.9707334 1.3609395
#> 4 0.6328626 0.136009552 0.3769734 2.4608270
#> 5 0.4042683 -0.720153545 -0.9959334 -0.6141076
#> 6 -0.1061245 -0.198124330 -0.5974829 -0.7455793
```

Scale the data to have everything in the \([-1,1]\) interval and divide it in train and test.

```
# Data scaling to [-1,1]
maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)
data <- as.data.frame(scale(data, center = mins + (maxs - mins) / 2, scale = (maxs - mins) / 2))
# Divide in train (0.75) and test (0.25)
index <- sample(1:nrow(data), round(0.75 * nrow(data)))
train <- data[index, ]
test <- data[-index, ]
train_x <- as.matrix(train[,-(p+1)])
train_y <- as.matrix(train[,(p+1)])
test_x <- as.matrix(test[,-(p+1)])
test_y <- as.matrix(test[,(p+1)])
```

With our common data generated, we are now ready to delve into the
different deep learning frameworks. In each case we will solve the same
regression problem with the previously generated polynomial data, with
two neural networks for each framework. These neural networks will have
the same structure but one will be constrained (`nn_con`

) and
the other one unconstrained (`nn_uncon`

) to showcase their
differences.

`keras`

/`tensorflow`

In this section we will show how to use `nn2poly`

with a
`keras`

neural network and how to impose the needed weight
constraints during training.

First of all we will set the structure of our neural networks using a
sequential model in `keras`

. This following function will be
used to create both the constrained and unconstrained neural networks in
the `keras`

example.

```
keras_model <- function() {
tensorflow::set_random_seed(42)
nn <- keras::keras_model_sequential()
nn <- keras::layer_dense(nn, units = 100, activation = "tanh", input_shape = p)
nn <- keras::layer_dense(nn, units = 100, activation = "tanh")
nn <- keras::layer_dense(nn, units = 100, activation = "tanh")
nn <- keras::layer_dense(nn, units = 1, activation = "linear")
nn
}
```

Now we define both NNs. The only needed step to impose our
implemented weight constraints on a `keras`

model is to use
the function `add_constraints()`

and then train it as
usually. The constraints currently accept two possible types,
`"l1_norm"`

and `"l2_norm"`

. In both cases, the
norm for each weight vector (including the bias) incident on a neuron
will be constrained to be less or equal than 1. Using data scaled to the
\([-1,1]\) interval, the l1-norm is the
one that guarantees best results.

Also, it is important to note that the weight constraints are imposed on all layers except for the last one, which is expected to be linear and not need a Taylor expansion.

Note: Our implementation differs slightly from the constraints provided by`keras`

because in our implementation we join the bias with the rest of the weights while`keras`

default constraints allow only for bias or kernel constraints. Our implementation uses a custom callback that is applied at the end of each batch.

Now we can compile and train both NNs using standard the
`keras`

approach. Note that we have to increase the number of
needed epochs with the constrained NN to learn properly.

We can visualize the NN predictions vs the original Y values for both neural networks and observe how both of them provide accurate predictions (the values fall near the “perfect” diagonal red line).

```
# Obtain the predicted values with the NN to compare them
prediction_nn_uncon <- predict(nn_uncon, test_x)
#> 4/4 - 0s - 48ms/epoch - 12ms/step
# Diagonal plot implemented in the package to quickly visualize and compare predictions
nn2poly:::plot_diagonal(x_axis = prediction_nn_uncon, y_axis = test_y, xlab = "Unconstrained NN prediction", ylab = "Original Y")
```

```
# Obtain the predicted values with the NN to compare them
prediction_nn_con <- predict(nn_con, test_x)
#> 4/4 - 0s - 91ms/epoch - 23ms/step
# Diagonal plot implemented in the package to quickly visualize and compare predictions
nn2poly:::plot_diagonal(x_axis = prediction_nn_con, y_axis = test_y, xlab = "Constrained NN prediction", ylab = "Original Y")
```

After the NNs have been trained, we can directly call
`nn2poly`

on the `keras`

model.

```
# Polynomial for nn_uncon
final_poly_uncon <- nn2poly(object = nn_uncon,
max_order = 3)
# Polynomial for nn_con
final_poly_con <- nn2poly(object = nn_con,
max_order = 3)
```

We can visualize the polynomial predictions versus NN predictions

```
# Obtain the predicted values for the test data with our two polynomials
prediction_poly_uncon <- predict(object = final_poly_uncon, newdata = test_x)
prediction_poly_con <- predict(object = final_poly_con, newdata = test_x)
nn2poly:::plot_diagonal(x_axis = prediction_nn_uncon, y_axis = prediction_poly_uncon, xlab = "NN prediction", ylab = "Polynomial prediction") + ggplot2::ggtitle("Polynomial for nn_uncon")
```

```
nn2poly:::plot_diagonal(x_axis = prediction_nn_con, y_axis = prediction_poly_con, xlab = "NN prediction", ylab = "Polynomial prediction") + ggplot2::ggtitle("Polynomial for nn_con")
```

We can clearly see how the constrained NN allows to obtain a close
approximation with `nn2poly`

while the polynomial obtained
for the unconstrained one is not a good representation.

We can also represent the obtained polynomial coefficients and
observe how the constrained case clearly has terms `2,3`

and
`1`

with the same sign as the original polynomial while the
rest are close to 0.

Note: The coefficients values are not in the same scale as the original polynomial due to the fact that we have scaled all the data before training, even the response variable Y. Furthermore, as data has been scaled to the \([-1,1]\) interval, interactions of order 2 or higher would usually need a higher absolute value than the lower order coefficients to be more relevant

`luz`

/`torch`

In this section we will show how to use `nn2poly`

with a
`torch`

neural network, built with its higher level API
`luz`

, and how to impose the needed weight constraints during
training. Furthermore, we will explain how to use
`luz_model_sequential()`

, a helper needed to create a
`torch`

model in an adequate manner so that it can be easily
recognized by `nn2poly`

.

`torch`

In this framework, we need to manipulate a bit our data to use it as
input for a `torch`

model. This can be done dividing our
`train_x`

data intro only train and validation matrices and
using `luz::as_dataloader()`

```
# Divide in only train and validation
all_indices <- 1:nrow(train_x)
only_train_indices <- sample(all_indices, size = round(nrow(train_x)) * 0.8)
val_indices <- setdiff(all_indices, only_train_indices)
# Create lists with x and y values to feed luz::as_dataloader()
only_train_x <- as.matrix(train_x[only_train_indices,])
only_train_y <- as.matrix(train_y[only_train_indices,])
val_x <- as.matrix(train_x[val_indices,])
val_y <- as.matrix(train_y[val_indices,])
only_train_list <- list(x = only_train_x, y = only_train_y)
val_list <- list(x = val_x, y = val_y)
torch_data <- list(
train = luz::as_dataloader(only_train_list, batch_size = 50, shuffle = TRUE),
valid = luz::as_dataloader(val_list, batch_size = 50)
)
```

Now we will set the structure of our neural networks using a
sequential model in `torch`

. To do so we have implemented the
helper `luz_model_sequential()`

, which lets the user stack
linear layers in a similar way as `keras`

and allows
`nn2poly()`

to be directly used in the model without the need
to extract weights and activation functions manually as in the default
example of `vignette("nn2poly-01-introduction")`

.

This following function will be used to create both the constrained
and unconstrained neural networks in the `torch`

example.

```
luz_nn <- function() {
torch::torch_manual_seed(42)
luz_model_sequential(
torch::nn_linear(p,100),
torch::nn_tanh(),
torch::nn_linear(100,100),
torch::nn_tanh(),
torch::nn_linear(100,100),
torch::nn_tanh(),
torch::nn_linear(100,1)
)
}
```

Now we define both NNs. In this case, differing from the
`keras`

example, we will impose the constraints later to
follow the same approach as in `luz`

documentation. However,
the use of `add_constraints()`

is analogous.

First we train the unconstrained NN using ans standard
`luz`

approach.

```
fitted_uncon <- nn_uncon %>%
luz::setup(
loss = torch::nn_mse_loss(),
optimizer = torch::optim_adam,
metrics = list(
luz::luz_metric_mse()
)
) %>%
luz::fit(torch_data$train, epochs = 50, valid_data = torch_data$valid)
fitted_uncon %>% plot()
```

Then, we train the constrained NN by using the function function
`add_constraints()`

before calling `fit()`

. The
constraints currently accept two possible types, `"l1_norm"`

and `"l2_norm"`

. In both cases, the norm for each weight
vector (including the bias) incident on a neuron will be constrained to
be less or equal than 1. Using data scaled to the \([-1,1]\) interval, the l1-norm is the one
that guarantees best results.

Also, it is important to note that the weight constraints are imposed on all layers except for the last one, which is expected to be linear and not need a Taylor expansion.

Note that we have to increase the number of needed epochs with the constrained NN to learn properly.

```
fitted_con <- nn_con %>%
luz::setup(
loss = torch::nn_mse_loss(),
optimizer = torch::optim_adam,
) %>%
add_constraints("l1_norm") %>%
fit(torch_data$train, epochs = 700, valid_data = torch_data$valid)
fitted_con %>% plot()
```

Note: Our implementation uses a custom callback that is applied at the end of each batch.

We can visualize the NN predictions vs the original Y values for both neural networks and observe how both of them provide accurate predictions (the values fall near the “perfect” diagonal red line).

```
# Obtain the predicted values with the NN to compare them
prediction_NN_uncon <- as.array(predict(fitted_uncon, test_x))
# Diagonal plot implemented in the package to quickly visualize and compare predictions
nn2poly:::plot_diagonal(x_axis = prediction_NN_uncon, y_axis = test_y, xlab = "Unconstrained NN prediction", ylab = "Original Y")
```

```
# Obtain the predicted values with the NN to compare them
prediction_NN_con <- as.array(predict(fitted_con, test_x))
# Diagonal plot implemented in the package to quickly visualize and compare predictions
nn2poly:::plot_diagonal(x_axis = prediction_NN_con, y_axis = test_y, xlab = "Constrained NN prediction", ylab = "Original Y")
```

After the NNs have been trained, we can directly call
`nn2poly`

on the `luz`

model.

```
# Polynomial for nn_uncon
final_poly_uncon <- nn2poly(object = fitted_uncon,
max_order = 3)
# Polynomial for nn_con
final_poly_con <- nn2poly(object = fitted_con,
max_order = 3)
```

We can visualize the polynomial predictions versus NN predictions

```
# Obtain the predicted values for the test data with our two polynomials
prediction_poly_uncon <- predict(object = final_poly_uncon, newdata = test_x)
prediction_poly_con <- predict(object = final_poly_con, newdata = test_x)
nn2poly:::plot_diagonal(x_axis = prediction_nn_uncon, y_axis = prediction_poly_uncon, xlab = "NN prediction", ylab = "Polynomial prediction") + ggplot2::ggtitle("Polynomial for nn_uncon")
```

```
nn2poly:::plot_diagonal(x_axis = prediction_nn_con, y_axis = prediction_poly_con, xlab = "NN prediction", ylab = "Polynomial prediction") + ggplot2::ggtitle("Polynomial for nn_con")
```

We can clearly see how the constrained NN allows to obtain a close
approximation with `nn2poly`

while the polynomial obtained
for the unconstrained one is not a good representation.

We can also represent the obtained polynomial coefficients and
observe how the constrained case clearly has terms `2,3`

and
`1`

with the same sign as the original polynomial while the
rest are close to 0.

Note: The coefficients values are not in the same scale as the original polynomial due to the fact that we have scaled all the data before training, even the response variable Y. Furthermore, as data has been scaled to the \([-1,1]\) interval, interactions of order 2 or higher would usually need a higher absolute value than the lower order coefficients to be more relevant