fntl: Tools for Rcpp with STL Functions

Author
Affiliation

Andrew M. Raim

Published

Last modified: 2024-05-23 23:02

Abstract

TBD abstract goes here.

Comments

I think a nice document will make a huge difference in the usability of this package. The generated documentation from Doxygen is okay, but not the most appealing. It has a lot of cruft that distracts from what I’d want to see as a user.

Here’s what I have in mind right now. Let’s try to manually document the API in a human readable way.

TBD:

  • should fd_hessian use the Numerical Recipes (Press et al. 2007) version of fd_deriv?
  • make sure to mark C++11 as a requirement for this package. We may need an attribute like [[Rcpp::plugins("cpp11")]] somewhere too.

Intro

TBD

The idea of the package is to pass STL functions as arguments like we normally do in R, and have it feel as similar as possible. This might be a good opportunity to cite the original paper on the R language.

Rcpp: why/when we’re motivated to use it and write tools for it. When can R be slow and why use R versus C++?

What is the use case? In many situations, we carry out tasks like integration and optimization in R and write the integrand or target function in C++ if its computation is slow or cumbersome in R. Rcpp is used to generate an R interface for the C++ function so that it can be called from R. A use case for the present work is when we are in Rcpp already, perhaps as part of a larger computation, and need to do an integration. Calling back to R is possible, but incurs a performance penalty on every call. Doing so repeatedly accumulates the penalty, and may negate the performance benefits of using C++ in the first place.

Principles

TBD
  • Avoid external dependencies beyond base R, the Rcpp package, and dependencies needed for these. This is to make the package usable in a locked-down environment where installing system libraries may be a nontrivial ask.

  • Where possible, we prefer to use C implementations of methods which are exposed within base R. For these, we implement adapters which allow the C implementation to call the user's STL function.

  • We roll our own implementation of a method when necessary. We may opt for a more basic algorithm or less optimized code than, say, one in Netlib. Good error handling and correctness of the code are most important.

  • We don’t implement functions where it is easy to use STL functions with the existing Rcpp interface. For example, see the demo with a lambda function here. We can give an example of sapply here and show that base Rcpp is enough there.
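The adapter idea in the second principle can be sketched in plain C++. C routines usually take the target function as a function pointer plus an opaque void* payload. An adapter can place a pointer to the user's STL function in that payload and forward each call through a captureless "trampoline". Everything named below (c_integrand, c_style_trapezoid, trampoline, trapezoid) is a hypothetical stand-in written for illustration, and the alias uv_function is assumed to be a std::function taking and returning a double. The callback shape is loosely modeled on the vectorized callbacks used by R's C integration routines, but this is not the package's code.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical C-style callback: overwrite x[0..n-1] with f(x[i]).
// `ex` is an opaque payload supplied by the caller.
typedef void c_integrand(double* x, int n, void* ex);

// Hypothetical stand-in for a C routine: composite trapezoid rule on [a, b].
// It knows nothing about std::function; it only sees a pointer and a payload.
double c_style_trapezoid(c_integrand* f, void* ex, double a, double b,
    int n_panels)
{
    assert(n_panels >= 1);
    std::vector<double> x(n_panels + 1);
    const double h = (b - a) / n_panels;
    for (int i = 0; i <= n_panels; ++i) {
        x[i] = a + i * h;
    }
    f(x.data(), n_panels + 1, ex);  // x now holds the function values
    double sum = 0.5 * (x[0] + x[n_panels]);
    for (int i = 1; i < n_panels; ++i) {
        sum += x[i];
    }
    return h * sum;
}

// Assumed alias for a univariate STL function.
using uv_function = std::function<double(double)>;

// Trampoline: recover the std::function from the payload and apply it.
void trampoline(double* x, int n, void* ex)
{
    const uv_function* f = static_cast<const uv_function*>(ex);
    for (int i = 0; i < n; ++i) {
        x[i] = (*f)(x[i]);
    }
}

// Adapter: an R-like interface that accepts any lambda or STL function.
double trapezoid(const uv_function& f, double a, double b,
    int n_panels = 1000)
{
    void* ex = const_cast<uv_function*>(&f);
    return c_style_trapezoid(trampoline, ex, a, b, n_panels);
}
```

With this in place, a capturing lambda such as [k](double x) { return k * x * x; } can be passed directly to trapezoid, even though the underlying routine only accepts a plain function pointer.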

Literature review

TBD
  • Roptim has similar functionality to call the optim family of functions, but uses functors. They have a nice guide to calling the C functions underlying optim which was very helpful for the present package.

  • RcppNumerical also has some of the same functionality. It makes use of libraries NumericalIntegration and LBFGS++ which are not specific to R. Its interface is also based on Functors.

  • We try to make the C++ interface as R-like as possible.

  • I think it would be possible to make adapters to call Roptim and RcppNumerical functions with STL functions too.

General Programming Guide

TBD
  • How to use the package:

    • The depends statement
    • Including fntl.h
    • Using the fntl namespace
    • A minimal working example might be useful here
  • A review of STL functions and lambdas in C++.

  • The design pattern with result, args, status codes (as enum classes), and the main function. Arguments in args are taken to have the same defaults as in R, where it makes sense (like when the same underlying C code is called).

  • the abstract result class, the to_list() member function, and how users will make use of them.

  • Use of to_underlying with the status codes, and creating status codes from underlying types.

  • The ErrorAction enum class and its values.
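The design pattern listed above (an args struct with defaults, a result struct, status codes as an enum class, and a main function with two forms) can be illustrated with a small self-contained sketch. The function bisect and the types bisect_args, bisect_result, and bisect_status are hypothetical and are not part of the package. The to_underlying helper shown here is one possible C++11-compatible definition and is assumed, not taken from the package source.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <string>
#include <type_traits>

// Assumed alias for a univariate STL function.
using uv_function = std::function<double(double)>;

// Status codes as an enum class with an explicit underlying type.
enum class bisect_status : int { OK = 0, MAX_ITER = 1, NO_SIGN_CHANGE = 2 };

// Convert an enum class value to its underlying integer type.
template <typename E>
constexpr typename std::underlying_type<E>::type to_underlying(E e) noexcept
{
    return static_cast<typename std::underlying_type<E>::type>(e);
}

// Optional arguments with defaults, so callers only set what they need.
struct bisect_args {
    double tol = 1e-8;
    unsigned int maxiter = 100;
};

// Everything the caller may want to inspect after the call.
struct bisect_result {
    double root;
    unsigned int iter;
    bisect_status status;
    std::string message;
};

// First form: explicit args. Plain bisection on [lower, upper].
bisect_result bisect(const uv_function& f, double lower, double upper,
    const bisect_args& args)
{
    assert(lower < upper);
    double f_lower = f(lower);
    if (f_lower * f(upper) > 0) {
        return bisect_result{ std::numeric_limits<double>::quiet_NaN(), 0,
            bisect_status::NO_SIGN_CHANGE,
            "f has the same sign at both endpoints" };
    }
    unsigned int iter = 0;
    while (upper - lower > args.tol && iter < args.maxiter) {
        const double mid = 0.5 * (lower + upper);
        const double f_mid = f(mid);
        if (f_lower * f_mid <= 0) {
            upper = mid;       // root lies in [lower, mid]
        } else {
            lower = mid;       // root lies in [mid, upper]
            f_lower = f_mid;
        }
        ++iter;
    }
    const bool converged = (upper - lower <= args.tol);
    return bisect_result{ 0.5 * (lower + upper), iter,
        converged ? bisect_status::OK : bisect_status::MAX_ITER,
        converged ? "OK" : "maximum number of iterations reached" };
}

// Second form: defaults for all elements of args.
bisect_result bisect(const uv_function& f, double lower, double upper)
{
    return bisect(f, lower, upper, bisect_args());
}
```

A caller would pass a lambda, check out.status against the enum values, and use to_underlying(out.status) when an integer code is needed. Going the other way, static_cast<bisect_status>(code) creates a status from its underlying value.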

API

TBD

I think the idea will be to have one subsection per function.

For each subsection:

  • Which method is being implemented?
  • Give references where possible.
  • Did we write the code ourselves, call something in R, etc?
  • The interface
    • The main function(s) and a table describing its arguments.
    • The arg structure: member names, types, descriptions, and default values.
    • The result structure: member names, types, and descriptions.
    • The list of status codes, if any.
  • A brief example: only a snippet to illustrate usage. Doesn’t need to be the full code to compile and run. However, we might also want to show a fully working example alongside the snippet, perhaps with a link to the code.

Integration

Source code can be found in integrate.h.

Description

Compute the integral \[ \int_a^b f(x) dx, \] where \(a\) may be finite or \(-\infty\) and \(b\) may be finite or \(\infty\). Calls one of two C functions underlying the R function integrate: Rdqags when both limits are finite and Rdqagi otherwise. These functions are based on routines in QUADPACK (Piessens et al. 1983).
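To give some intuition for the infinite-limit case: a semi-infinite range can be mapped onto a finite interval by a change of variables before a quadrature rule is applied. The sketch below uses the substitution \(x = b - (1 - t)/t\) with \(t \in (0, 1]\), so that \(dx = dt / t^2\), together with a simple composite midpoint rule. The midpoint rule is swapped in only to keep the example short; it is not the adaptive scheme used by QUADPACK, and the function name integrate_lower_infinite is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Approximate the integral of f over (-inf, upper] using the substitution
// x = upper - (1 - t) / t, dx = dt / t^2, for t in (0, 1], and a composite
// midpoint rule with n panels. Illustration only; no error control.
double integrate_lower_infinite(const std::function<double(double)>& f,
    double upper, unsigned int n = 10000)
{
    assert(n >= 1);
    const double h = 1.0 / n;
    double sum = 0.0;
    for (unsigned int i = 0; i < n; ++i) {
        const double t = (i + 0.5) * h;           // midpoint; never exactly 0
        const double x = upper - (1.0 - t) / t;   // mapped point in (-inf, upper]
        sum += f(x) / (t * t);                    // integrand times Jacobian
    }
    return h * sum;
}
```

Applied to \(f(x) = e^{-x^2/2}\) with upper limit 0, the result should be close to the exact value \(\sqrt{\pi/2}\).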

Function

The second form uses defaults for all elements of args.

integrate_result integrate(
    const uv_function& f,        // Function to take as the integrand.
    double lower,                // Lower limit a of integral; may be R_NegInf.
    double upper,                // Upper limit b of integral; may be R_PosInf.
    const integrate_args& args   // Additional arguments.
)

integrate_result integrate(
    const uv_function& f,
    double lower,
    double upper
)

Optional Arguments

struct integrate_args {
    unsigned int subdivisions = 100L;   // The maximum number of subintervals.
    double rel_tol = mach_eps_4r;       // Relative accuracy requested.
    double abs_tol = mach_eps_4r;       // Absolute accuracy requested.
    bool stop_on_error = true;          // If true, errors in integrate raise exceptions.
};

Result

struct integrate_result : public result {
    double value;               // The final approximation of the integral.
    double abs_error;           // Estimate of the modulus of the absolute error.
    int subdivisions;           // The number of subintervals produced in the subdivision process.
    integrate_status status;    // A code describing the status of the operation.
    int n_eval;                 // Number of function evaluations.
    std::string message;        // A message describing the status of the operation.

    Rcpp::List to_list() const; // Return an Rcpp List representation.
};

Status Codes

enum class integrate_status : int {
    OK = 0L,                                  // OK.
    MAX_SUBDIVISIONS = 1L,                    // Maximum number of subdivisions reached.
    ROUNDOFF_ERROR = 2L,                      // Roundoff error was detected.
    BAD_INTEGRAND_BEHAVIOR = 3L,              // Extremely bad integrand behaviour.
    ROUNDOFF_ERROR_EXTRAPOLATION_TABLE = 4L,  // Roundoff error is detected in the extrapolation table.
    PROBABLY_DIVERGENT_INTEGRAL = 5L,         // The integral is probably divergent.
    INVALID_INPUT = 6L                        // The input is invalid.
};

Example

Compute the integral \(\int_{-\infty}^0 e^{-x^2/2} dx\).

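// Integrand: the unnormalized standard normal density.
const fntl::uv_function& f = [](double x) {
    return exp(-pow(x, 2) / 2);
};
// The lower limit is infinite, so Rdqagi is called underneath.
auto out = fntl::integrate(f, R_NegInf, 0);
double value = out.value;  // approximates sqrt(pi / 2)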

Finite Differences

Derivative

Source code can be found in fd-deriv.h.

Description

Compute the derivative of \(f : \mathbb{R} \rightarrow \mathbb{R}\) numerically at point \(x\) using Ridders' algorithm, adapted from the program dfridr in Section 5.7 of Press et al. (2007).
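For readers unfamiliar with the method, the following is a minimal sketch of Ridders' extrapolation scheme, written independently for illustration and not taken from the package source. It builds a tableau of central differences at shrinking step sizes, extrapolates them toward step size zero, and keeps the estimate with the smallest error indicator. The names ridders_deriv and deriv_estimate are hypothetical, and the constants (shrink factor 1.4 and safety factor 2) are conventional choices assumed here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <vector>

struct deriv_estimate {
    double value;       // best derivative estimate found
    double err;         // error indicator for that estimate
    unsigned int iter;  // number of step sizes tried
};

deriv_estimate ridders_deriv(const std::function<double(double)>& f,
    double x, double h = 1.0, unsigned int maxiter = 10)
{
    assert(h != 0.0 && maxiter >= 1);
    const double con = 1.4;        // step size shrinks by this factor
    const double con2 = con * con;
    const double safe = 2.0;       // stop when the error grows by this factor

    // a[j][i]: estimate at the i-th step size after j extrapolations.
    std::vector<std::vector<double>> a(maxiter,
        std::vector<double>(maxiter, 0.0));
    double hh = h;
    a[0][0] = (f(x + hh) - f(x - hh)) / (2.0 * hh);
    deriv_estimate out{ a[0][0], std::numeric_limits<double>::infinity(), 1 };

    for (unsigned int i = 1; i < maxiter; ++i) {
        hh /= con;
        a[0][i] = (f(x + hh) - f(x - hh)) / (2.0 * hh);
        double fac = con2;
        for (unsigned int j = 1; j <= i; ++j) {
            // Extrapolate to eliminate the next even power of the step size.
            a[j][i] = (a[j - 1][i] * fac - a[j - 1][i - 1]) / (fac - 1.0);
            fac *= con2;
            const double errt = std::max(
                std::abs(a[j][i] - a[j - 1][i]),
                std::abs(a[j][i] - a[j - 1][i - 1]));
            if (errt <= out.err) {
                out.err = errt;
                out.value = a[j][i];
            }
        }
        out.iter = i + 1;
        // Stop early if the higher-order estimate has become much worse.
        if (std::abs(a[i][i] - a[i - 1][i - 1]) >= safe * out.err) {
            break;
        }
    }
    return out;
}
```

In this sketch, calling with maxiter = 1 skips the extrapolation loop and returns the plain central difference with step h, with the error indicator left at infinity. Whether the package's fd_deriv behaves the same way is not confirmed here.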

Function

The arguments h and maxiter are optional and take the default values shown.

fd_deriv_result fd_deriv(
    const uv_function& f,       // Function to differentiate.
    double x,                   // Point where derivative is taken.
    double h = 1,               // Initial step size.
    unsigned int maxiter = 10   // Maximum number of iterations.
)

TBD is this true? Taking maxiter = 1 gives the basic (symmetric) finite-difference derivative using the initial value of h as the perturbation.

Result

struct fd_deriv_result : public result {
    double value;               // The final approximation of the derivative.
    double err;                 // An estimate of the error in approximation.
    unsigned int iter;          // Number of iterations used to produce approximation.

    Rcpp::List to_list() const; // Return an Rcpp List representation.
};

Example

Compute the derivative of \(f(x) = \sin(x)\) at \(x = 1/2\).

const fntl::uv_function& f = [](double x) {
    return sin(x);
};
// Uses the defaults h = 1 and maxiter = 10.
auto out = fntl::fd_deriv(f, 0.5);
double value = out.value;  // approximates cos(0.5)

Gradient

TBD

fd-grad.h

Hessian

TBD

fd-hessian.h

Jacobian

TBD

fd-jacobian.h

Root-Finding

TBD

findroot.h

Optimization

Brent’s Algorithm

TBD

brent...

Nelder-Mead

TBD

neldermead.h

BFGS

TBD

bfgs...

L-BFGS-B

TBD

lbfgsb.h

Conjugate Gradient

cgmin...

Simulated Annealing

TBD

samin...

Newton-Raphson

TBD

nlm...

Examples

TBD

Not sure what to put here… maybe we can give some more involved but still small examples, and show the results? We could potentially merge this with the performance studies section if the examples here are used there.

Performance Studies

TBD

Ideas

  • Pick an application(s) with some intensive optimization or integration, but which is simple enough to describe briefly.

  • Compare the following implementations

    • Pure R
    • A matrix-ized version
    • A version in Rcpp where we call back to R for the optimization or integration
    • A version in Rcpp where we use this package.

Conclusions

TBD

Content needed

Acknowledgments

TBD

Content needed

References

Piessens, R., E. de Doncker-Kapenga, C. W. Überhuber, and D. K. Kahaner. 1983. QUADPACK: A Subroutine Package for Automatic Integration. Springer.
Press, William H., Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. 2007. Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge University Press.

Footnotes

  1. Corresponding author. Center for Statistical Research & Methodology, U.S. Census Bureau, Washington, DC, 20233, U.S.A. Disclaimer: This document is released to inform interested parties of ongoing research and to encourage discussion of work in progress. Any views expressed are those of the authors and not those of the U.S. Census Bureau.
