Help for package mvbutils

Title:

General utilities, workspace organization, code and doc editing, live package maintenance, etc

Description:

Hierarchical workspace tree, code editing and backup, easy package prep, editing of packages while loaded, per-object lazy-loading, easy documentation, macro functions, and miscellaneous utilities. Needed by various packages including debug, offarray, and kinference.

Depends:

R (≥ 4.2)

Imports:

utils, tools, stats, grDevices, graphics

Additional_repositories:

https://markbravington.r-universe.dev

Suggests:

doParallel, foreach, debug

KeepPlaintextDoco:

yes

NeedsCompilation:

ByteCompile:

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Version:

2.12.120

Packaged:

2026-05-25 04:53:05 UTC; markj

Author:

Mark V. Bravington [aut, cre]

Maintainer:

Mark V. Bravington <markb2@summerinsouth.net>

Repository:

CRAN

Date/Publication:

2026-05-25 15:50:02 UTC

How to use the mvbutils package

Description

Package mvbutils is a collection of utilities, which eventually (?late 2026?) will be split into two parts: all-purpose routines that are usable in other packages and/or interactively, and an R-life-management part mvbutifuls only for interactive use (and not on CRAN). You should use the R-universe version of mvbutils, at https://markbravington.r-universe.dev; I rarely update the one on CRAN, which only exists because other people's packages depend on it. Personally I put all my packages on R-universe only, unless CRAN is unavoidable.

For now (v2.12+), mvbutils offers the following main features:

Miscellaneous goodies: local/nested functions (mlocal), display of what-calls-what (foodweb), multiple replacement (multirep), nicely-formatted LaTeX tables (xtable.mvb), autoprinting of statements and outputs within a function (visify), universal-ish date converter (autodate), parallel-processing numerical derivative (numvbderiv_parallel), matching multiple columns in dataframes/matrices (multimatch), nicely converting between arrays and dataframes (A2D and D2A), numerous lower-level utility functions and operators (mvbutils.utils, mvbutils.operators, extract.named, mdeparse, mcut, search.for.regexpr, strip.missing, FOR). These functions mostly aren't spectacular in themselves, but many are used in other more spectacular packages.

The interactive-only part has:

Hierarchical organization of projects (AKA tasks) and sub-tasks, allowing switching within a single R session, searching and moving objects through the hierarchy, objects in ancestor tasks always visible from child (sub)tasks, etc. See cd.
Improved function, text, and script editing facilities, interfacing with whichever text editor you prefer. The R command line is not frozen while editing, and you can have multiple edit windows open. Scriptlets can be edited as expressions, for subsequent calls to eval. Function documentation can be stored as plain text after the function definition, and will be found by help even if the function isn't part of a package. There is also a complete automatic text-format backup system for functions & text. See fixr.
Automated package construction, including production of Rd-format from plain text documentation. Packages can be edited & updated while loaded, without needing to quit/rebuild/reinstall. See mvbutils.packaging.tools.
"Lazy loading" for individual objects, allowing fast and transparent access to collections of biggish objects where only a few objects are used at a time. See mlazy.

Interactive use for R life management

As of 2024, I may be the only person still using these features! There used to be more users, before the Rstudio publicity machine juggernauted its way thru. Mine's still pretty good though: better, I reckon. Anyway! To get the full features of the mvbutils package (in particular, the project organization) you need to start R in the same directory every time (your "ROOT task"), and then switch to whichever project from inside R; see cd. Various options always need to be set to make fixr and the debug package work the way you want, so one advantage of the start-in-the-same-directory approach is that you can keep all your project-independent options(), library loads, etc., in a single .First function or ".Rprofile" file, to be called automatically when you start R. However, many features (including support for the debug package) will work even if you don't follow this suggestion.

The remaining sections of this document cover details that most users don't know about; there's no need to read them when you are just starting out with mvbutils.

Housekeeping info

On loading, the mvbutils package creates a new environment in the search path, called mvb.session.info, which stores some housekeeping information. mvb.session.info is never written to disk, and disappears when the R session finishes. [For Splus users: mvb.session.info is similar to frame 0.] You should never change anything in mvb.session.info by hand, but it is sometimes useful to look at some of the variables there:

.First.top.search is the directory R started in (your ROOT task).
.Path shows the currently-attached part of the task hierarchy.
base.xxx is the original copy of an overwritten system function, e.g. library
fix.list keeps track of objects being edited via fixr
session.start.time is the value of Sys.time() when mvbutils was loaded
source.list is used by source.mvb to allow nesting of sources
r.window.handle is used by the handy package (Windows only)
partial.namespaces is used to alleviate difficulties with unloadable data files– see mvbutils.packaging.tools
things whose name starts with ".." are environments used in live-editing packages
maintained.packages is a list of the latter

Redefined functions

On loading, package mvbutils redefines a few system functions: lockEnvironment, importIntoEnv, loadNamespace, print.function, help, rbind.data.frame and, by default, library, savehistory, loadhistory, and save.image. (The original version of routine xxx can always be obtained via base.xxx if you really need it.) The modifications, which are undone when you unload mvbutils, should have [almost] no side-effects. Briefly:

library is modified so that its default pos argument is just under the ROOT workspace (the one that was on top when mvbutils was loaded), which is needed by cd. This means that packages no longer get attached by default always in position 2.
lockEnvironment and importIntoEnv are modified to allow live-editing of your own maintained packages– no change to default behaviour.
loadNamespace has the default value of its "partial" argument altered, to let you bypass .onLoad for selected faulty packages– see mvbutils.packaging.tools and look for partial.namespaces. This allows the loading of certain ".RData" files which otherwise crash from hidden attempts to load a namespace. It lets you get round some truly horrendous problems arising from faults with 3rd-party packages, as well as problems when you stuff up your own packages.
rbind.data.frame does not ignore zero-row arguments (so it takes account of their factor levels, for example).
rbind.data.frame: dimensioned elements (i.e. matrices & arrays within data.frames) no longer have any extra attributes removed. Hence, for example, you can (if you are also using my nicetime package) rbind two data frames that both have POSIXct-matrix elements without turning them into raw seconds and losing timezones.
help and ? are modified so that, if utils:::help can't find help for a function (but not a method, dataset, or package), it will look instead for a doc attribute of the function to display in a pager/browser using dochelp. Character objects with a ".doc" extension will also be found and displayed. This lets you write and distribute "informal help".
loadhistory and savehistory are modified so that they use the current "R_HISTFILE" environment variable if it is set. This can be set dynamically during an R session using Sys.setenv. Standard R behaviour is to respect "R_HISTFILE" iff it is set before the R session starts, but not to track it during a session. If "R_HISTFILE" is not set, then cd will on first use set "R_HISTFILE" to "<<ROOT task>>/.RHistory", so that the same history file will be used throughout each and every session.
save.image is modified to call Save instead; this will behave exactly the same for workspaces not using mvbutils task-hierarchy feature or the debug package, but otherwise will prevent problems with mtraced functions and mlazyed objects.
print.function is modified to let you go on seamlessly using functions written prior to R 2.14 in conjunction with the srcref system imposed by R 2.14; see fixr.
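As a small illustration of the history-file behaviour described in the list above, the sketch below uses only base-R calls (Sys.setenv and Sys.getenv) to point "R_HISTFILE" at a different file in mid-session. The folder name is a made-up example, and the sketch assumes (as the text says) that it is the mvbutils versions of savehistory and loadhistory that then pick up the new value.

```r
# Hypothetical project folder, just for illustration; substitute your own
proj_dir <- file.path( tempdir(), "myproject")
dir.create( proj_dir, showWarnings=FALSE)

# Change the history file during the session
Sys.setenv( R_HISTFILE=file.path( proj_dir, ".RHistory"))
Sys.getenv( "R_HISTFILE")

# With mvbutils loaded, savehistory() and loadhistory() should now use that file;
# standard R would go on using whatever was set when the session started
```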

Some of these redefinitions are optional and can be turned off if you really want: loadhistory, savehistory, save.image, library, lockEnvironment, importIntoEnv, and loadNamespace. To turn them off, set options(mvbutils.replacements=FALSE) before loading mvbutils. However, I really don't recommend doing so; it will prevent cd etc, fixr, and the package-maintenance tools from working properly, and if you use debug you will probably cause yourself trouble when you forgetfully save.image an mtraced function. You can also set the "mvbutils.replacements" option to a character vector comprising some or all of the above names, so that only those happen; if so, you're on your own. The other replacements are unavoidable (but should not be apparent for packages that don't import mvbutils).
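For instance, a minimal sketch of restricting the optional replacements, using only the option name and function names given in the paragraph above (whether you should do this is another matter, as noted):

```r
# Must be done *before* mvbutils is loaded
options( mvbutils.replacements=c( "library", "save.image")) # only these two
# options( mvbutils.replacements=FALSE) # or: none of the optional ones

getOption( "mvbutils.replacements")
# library( mvbutils) # ... and only then load the package
```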

After mvbutils has loaded, you can undo the modification of a function xxx by calling assign.to.base( "xxx", base.xxx). Exceptions are help, ?, print.function, and rbind.data.frame, which are intrinsic to mvbutils. Unloading mvbutils will undo all the modifications.

Nicer POSIXt behaviour

POSIXct etc have some nasty behaviour, and mvbutils used to include some functions that ameliorated things. I've moved them into a separate package nicetime, available on request.

ESS and mvbutils

For ESS users: I'm not an Emacs user and so haven't tried ESS with the mvbutils package myself, but a read-through of the ESS documentation (as of ~2005) suggests that a couple of ESS variables may need changing to get the two working optimally. Please check the ESS documentation for further details on these points. I will update this helpfile when/if I receive more feedback on what works (though there hasn't been ESS feedback in ~8 years...).

cd changes the search list, so you may need to alter "ess-change-sp-regex" in ESS.
cd also changes the prompt, so you may need to alter "inferior-ess-prompt". Prompts have the form WORD1/WORD2/.../WORDn> where WORDx is a letter followed by zero or more letters, underscores, periods, or digits.
move can add/remove objects in workspaces other than the top one, so if ESS relies on stored internal summaries of "what's where", these may need updating.
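To make the prompt format above concrete, here is a sketch of a regular expression for it, written in R's own regexp syntax; Emacs regexp syntax differs, so this would need translating before going anywhere near "inferior-ess-prompt". The optional trailing space and the example task names are assumptions for illustration.

```r
# WORDx is a letter followed by zero or more letters, underscores, periods, or digits
word <- "[A-Za-z][A-Za-z_.0-9]*"
prompt_rx <- paste0( "^", word, "(/", word, ")*> ?$")

grepl( prompt_rx, c( "ROOT/mytask/sub.task_2> ", "ROOT>", "> ", "2bad/task> "))
# TRUE TRUE FALSE FALSE
```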

Display bugs

If you have a buggy Linux display where readline() always returns the cursor to the start of the line, overwriting any prompt, then try options( cd.extra.CR=TRUE).

Author(s)

Mark Bravington

Array into dataframe

Description

From an array (or matrix or vector) input, A2D produces a dataframe with one column per dimension of the input, plus a column for the contents, whose name is set by the name.of.response argument (see Usage for the default). Its (almost) inverse is D2A.

The other columns will have names "D1", "D2", etc, unless either (i) the input has a named dimnames attribute, in which case its names will be used, or (ii) the argument "add.names" is set to a character vector naming the dimensions. They will be numeric if possible. If the input has any dimnames, then the latter's non-NULL elements will be used in place of 1,2,3,... etc for the entries in the corresponding columns.
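The naming rules can be pictured with a rough base-R sketch. This is not the code of A2D (which, per the Note, is largely a wrapper for base::array2DF); a2d_sketch is a hypothetical stand-in that only mimics the column-naming and numeric-where-possible behaviour described above, with "Value" assumed as the response name.

```r
# Hypothetical stand-in for illustration; NOT mvbutils::A2D
a2d_sketch <- function( x, name.of.response="Value") {
  x <- as.array( x)
  d <- dim( x)
  dn <- dimnames( x)
  if( is.null( dn)) {
    dn <- vector( "list", length( d))
  }

  # Index values: dimnames where present (numeric if possible), else 1,2,3,...
  idx <- lapply( seq_along( d), function( i) {
      if( is.null( dn[[ i]])) seq_len( d[ i]) else type.convert( dn[[ i]], as.is=TRUE)
    })

  # Column names: names of dimnames where present, else D1, D2, ...
  nm <- names( dn)
  if( is.null( nm)) {
    nm <- rep( "", length( d))
  }
  blank <- !nzchar( nm)
  nm[ blank] <- paste0( "D", seq_along( d))[ blank]
  names( idx) <- nm

  # expand.grid varies the first index fastest, same as array storage order
  out <- expand.grid( idx, KEEP.OUT.ATTRS=FALSE, stringsAsFactors=FALSE)
  out[[ name.of.response]] <- as.vector( x)
  out
}

x <- array( 1:6, dim=c( 2, 3), dimnames=list( grp=c( "a", "b"), NULL))
a2d_sketch( x) # columns grp, D2, Value
```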

Offarray

If you know you are dealing with an offarray object rather than a regular array, you can just call as.data.frame(<myoffar>,...) instead, for clarity. But if you do call A2D, all will be well (try it). OTOH, if you call base::array2DF(<myoffar>) then you generally don't get what you want.

Note

D2A and (something similar to) A2D used to be in my semi-secret handy2 package under slightly different names, but they are useful enough that I've moved them to mvbutils in 2025. The handy2 version (array.to.data.frame AKA a2d) made factor columns rather than character, and contained a lot of code. A2D is largely a wrapper for base::array2DF (qv), which didn't use to exist; however, A2D makes numeric columns where possible and uses names of dimnames to set column names if possible. This all makes it work better with tapply.

A2D and D2A are not strict inverses, because (i) if you start with a data.frame that lacks rows for some index combinations, those rows will still appear in the result, (ii) factor columns turn into character columns, and (iii) columns might get re-ordered.

Usage

A2D( x, name.of.response = "Value")

Arguments

x

array, matrix, or vector, including offarray objects from the eponymous package.

name.of.response

what to call the output column that holds the array contents (as opposed to its dimensions).

Value

A data.frame, with one more column than there are dimensions to x.

Examples

grubbb <- expand.grid( xx=1:4, yy=2:3) # data.frame
grubbb$z <- with( grubbb, xx+10*yy)
D2A( grubbb, data.col='z')
A2D( D2A( grubbb, data.col='z'), name.of.response='zzzzzzz')
# ... how very interesting...

Pre-install-buildy hooks for compiled code

Description

Clink_packages registers or returns pre-install-buildy hook(s) in task-packages for different types of source code, eg for Rcpp or RcppTidy or TMB or ADT. You should never need to call Clink_packages yourself; it's meant for use by helper-mini-packages that tell mvbutils what to do about a specific type(s) of source code. Authors of "proper" packages containing real low-level source code don't really need to know about any of this— though you could have a look at TIDY.STUBS.AND.SYMBOLS below, and at RcppTidy if you are normally using Rcpp.

Packages built with mvbutils::pre.install (and any package wanting to debug low-level code via eg package vscode) need to have a .onLoad that starts by calling run_Cloaders_<pkgname>(). That function does any work connected with setting up native-symbols and R stubs; package Rcpp on its own will normally cause such stuff to happen before .onLoad, but that will be gently subverted if you use mvbutils::pre.install, which instead makes the function run_Cloaders_<pkgname> for you automatically. By having that function, it becomes possible to change the low-level code (eg during debugging), including changing function arguments and so on, without re-installing the entire package. See something else for more details.

The list of potential helper-mini-packages for your whole package (normally just one, but you never know...) is determined by Description->Imports. Source code is expected to be directly in "<mypack>/src" (the only choice if plain old Rcpp is being used), and/or in the "N>=0" different subfolders of that. Each of these "N+1" folders will potentially generate a single DLL with the name of the folder. Each folder is scanned by all registered helper-minis, to see if that helper is wanted; processing of the folder stops after the first helper-mini that finds something to do. The helper-mini might add extra files to the folder (eg a wrapper that exports R-callable stubs, like "RcppExports.cpp"), and will probably add a "Cloader" written in R, to "<mypack>/R" (like "RcppExports.R")— though that will be obscured in the "source package" seen later by INSTALL. If "N>0", or if any of the folders demand it, then an overall "Makefile" will be produced (required for multiple DLLs). Subfolders starting with a period are skipped (e.g. "src/.vscode").
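The folder rule in the paragraph above ("<mypack>/src" itself, plus each subfolder not starting with a period) can be sketched in base R as follows. The function name candidate_src_folders and the demo folder names are made up for illustration; this is not how pre.install itself is coded.

```r
# Hypothetical helper: which folders would get scanned for low-level source?
candidate_src_folders <- function( pkgdir) {
  src <- file.path( pkgdir, "src")
  subs <- list.dirs( src, full.names=TRUE, recursive=FALSE) # immediate subfolders only
  subs <- subs[ !startsWith( basename( subs), ".")] # skip eg "src/.vscode"
  c( src, subs) # the "N+1" folders
}

# Try it on a throwaway folder structure
demo_pkg <- file.path( tempdir(), "mypack_demo")
for( sub in c( "src/.vscode", "src/tmb_model")) {
  dir.create( file.path( demo_pkg, sub), recursive=TRUE, showWarnings=FALSE)
}
basename( candidate_src_folders( demo_pkg))
```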

The code that does the scanning for one specific type of low-level code (eg for Rcpp-type code) is a "Pre-Install-Buildy Hook" (PIBH), which is called during pre.install. Each PIBH should have five named arguments (see below). The PIBH will be invoked for folder "<mypack>/src", and for each subfolder thereof; this lets you generate several DLLs (one from each subfolder) of the same type, which is useful eg for ADT or TMB. A PIBH should check whether it's suitable for that folder, and if so (re)generate any necessary files; but it should also check whether regeneration is necessary (see .REGENERATION.CHECKS). Then the PIBH should return either NULL if there's nothing to do (e.g. it didn't find any suitable source files), or a list/dataframe with these elements:

Cloader: path to R file(s) that set up native symbols and stubs, etc— eg "R/RcppExports.R". Usually just one, but there could be several if multiple DLLs of the same type are required.
DLL: pretty self-explanatory; omit the path and the extension
subenv: see .TIDY.STUBS.AND.SYMBOLS below
makelines: what to put in the Makefile, if there is one (see below). If several commands are required, paste them together separated by "\n". At the moment this is left blank for Rcpp which is "needy" about being the only DLL in town, and/but also for RcppTidy where it shouldn't be (to allow multiple DLLs).
needs_makefile: TRUE means that a Makefile will definitely be generated in the src folder. Otherwise, no Makefile is generated unless more than one Clinker is active (basically because multiple DLLs will then be required). Omit it if you don't need it (same effect as FALSE).
postcopy_hook_expr: if set, an expression to be run inside the body of pre.install after the source package has been set up "fully". The PIBH itself will be called before the source package exists, but via this hook it can arrange to do much of the work post hoc; that approach is needed for Rcpp in particular, because Rcpp::compileAttributes demands a full source package. If you can put stuff into the PIBH rather than into this hook, it's probably better to do so. However, if you must, then mvbutils:::Clinks_Rcpp shows how to add a hook, and mvbutils:::Clinks_Rcpp_postcopy shows an example of what might be in a hook.

The arguments to a PIBH must be, in order:

pkg, DLL: self-explanatory names (no paths or extensions)
lldir, Rdir: paths to folders containing the low-level "source" code (which will be either "<mypack>/src", or a subfolder thereof) and the R code (which will be "<mypack>/R").
src_changed: this will contain the src_changed function, which your PIBH can then use without referring to package mvbutils at all.
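Putting the argument list and the return elements together, a PIBH might have roughly the following shape. This is only a skeleton under stated assumptions: example_PIBH, the "_Cloader.R" naming scheme, the choice of plain ".c" files, and the stand-in for src_changed in the usage lines are all hypothetical, and the real hooks (eg mvbutils:::Clinks_Rcpp) will differ in detail.

```r
# Hypothetical PIBH skeleton; not a real mvbutils or RcppTidy function
example_PIBH <- function( pkg, DLL, lldir, Rdir, src_changed) {
  src <- list.files( lldir, pattern="[.]c$", full.names=TRUE)
  if( !length( src)) {
return( NULL) # nothing for this helper to do in this folder
  }

  Cloader <- file.path( Rdir, paste0( DLL, "_Cloader.R")) # made-up naming scheme
  redo <- src_changed( src, Cloader) # "" means nothing has changed
  if( nzchar( redo)) {
    # ... (re)generate stubs and the Cloader file here ...
  }

  # subenv=TRUE as per the tidy-stubs description; makelines left blank
  list( Cloader=Cloader, DLL=DLL, subenv=TRUE, makelines="", needs_makefile=FALSE)
}

# Usage on a throwaway folder, with a stand-in for src_changed
pibh_dir <- file.path( tempdir(), "pibh_demo")
dir.create( file.path( pibh_dir, "src"), recursive=TRUE, showWarnings=FALSE)
dir.create( file.path( pibh_dir, "R"), showWarnings=FALSE)
writeLines( "int f( void) { return 1; }", file.path( pibh_dir, "src", "f.c"))
fake_src_changed <- function( source_files, Cloader) "new manifest line"
str( example_PIBH( "mypack", "mypack", file.path( pibh_dir, "src"),
    file.path( pibh_dir, "R"), fake_src_changed))
```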

Tidy stubs and symbols

A PIBH like RcppTidy:::RcppTidy_pre_install can arrange to put all the native-symbols and corresponding R stubs for each DLL into a separate sub-environment of your package's namespace. You then access the R stub for your low-level function myCfun in your DLL Cbits by calling DLL_Cbits$myCfun(...). The effect is like DLL_Cbits=useDynLib(Cbits,.registration=TRUE) in your NAMESPACE (see "Writing R Extensions"), except NB that, at least for Rcpp and RcppTidy, you should always call the R stub, rather than trying to access the native-symbol directly via .Call. To achieve this effect, your PIBH should return subenv=TRUE. There are a couple of minor differences from what's described in "Writing R Extensions":

DLL_Cbits will actually be an environment, rather than an S3-classed list.
All the native-symbols from Cbits will be moved into DLL_Cbits, rather than cluttering up the main namespace.
Info on the DLL itself is available via attr(DLL_Cbits,"DLLInfo"), rather than as DLL_Cbits itself— not that you should need it.
DLL_Cbits gets a finalizer that will unload the DLL when/if the package is unloaded; this avoids you having to write a .onUnload for the package.

Note that if you don't like the name DLL_Cbits for the environment— eg because it's too cumbersome— then you can rename it yourself in .onLoad, eg like so:

  mypack:::.onLoad <- function (libname, pkgname) { # include Rbrace to get around doc2Rd bug !}
   #### mypack onload ####
    run_Cloaders_mypack() # must come first
    evalq({
      DLL <- DLL_Cbits
      rm( DLL_Cbits)
    }, envir=asNamespace( 'mypack'))
  # ...

Registering a pre install buildy hook

The point is that there's no way to know whether your package will load before or after mvbutils, so a little subterfuge is required. If mvbutils is already loaded, you can just call Clink_packages directly; if not, it's necessary to set a hook to be run when-and-if mvbutils does load (it may never be needed, eg during production use of a "proper" package that just uses your helper-mini-package). So your helper-mini-package will need a .onLoad containing something very like this:

    # Tell mvbutils::pre.install about this package
    # Try to minimize lookups at time-of-future-use...
    xfun <- eval( substitute( function(...) mvbutils::Clink_packages( pkgname, RcppTidy_pre_install)))
    if( 'mvbutils' %in% loadedNamespaces()) { # already there
      xfun()
    } else {
      setHook(packageEvent("mvbutils", "onLoad"), xfun)
    }

Note that: (i) pkgname will normally be the first argument of the .onLoad, and will be the name of your helper-mini-package; and (ii) RcppTidy_pre_install should be the name of your PIBH; and (iii) your helper-mini-package should list mvbutils in "Description->Suggests", but probably not in "Description->Imports" because the latter will force mvbutils to be loaded even if the end-point "proper" package doesn't need it.

Regeneration checks

There's no point in regenerating headers if the main source files haven't changed. mvbutils::src_changed is a utility function that your PIBH can call, to check whether source files have changed.
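The general idea of such a check can be sketched with checksums in base R. To be clear, src_changed_sketch below is a hypothetical illustration and not the code of mvbutils::src_changed; in particular the format of the manifest line is invented here. It does follow the documented contract, returning "" if nothing has changed, or else a new first line for the Cloader file.

```r
# Hypothetical checksum-based check; NOT the real mvbutils::src_changed
src_changed_sketch <- function( source_files, Cloader) {
  sums <- unname( tools::md5sum( source_files))
  manifest <- paste( "# checksums:",
      paste( basename( source_files), sums, sep="=", collapse=" ")) # invented format
  old <- if( file.exists( Cloader)) readLines( Cloader, n=1L, warn=FALSE) else character( 0)
  if( identical( old, manifest)) "" else manifest
}

# Usage on throwaway files
chk_dir <- file.path( tempdir(), "src_changed_demo")
dir.create( chk_dir, showWarnings=FALSE)
srcfile <- file.path( chk_dir, "code.c")
loader <- file.path( chk_dir, "code_Cloader.R")
unlink( loader)
writeLines( "int f( void) { return 1; }", srcfile)

redo <- src_changed_sketch( srcfile, loader) # non-empty: no Cloader yet
writeLines( c( redo, "# ... stubs would go here ..."), loader)
src_changed_sketch( srcfile, loader) # "" since nothing has changed
```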

Makefile

If there is only one type of source code in your package, then no Makefile will be produced unless the PIBH sets needs_makefile; normally that's not necessary. It might be required for eg Pascal sources— and might be a bloody sight easier to do than figuring out Makevars from the spectacularly opaque doco in "Writing R Extensions"... not that anything about "make" is easy AFAICS.

"Writing R Extensions" discourages the use of a Makefile, but I think there's no way round it in the case of multiple targets (see eg https://github.com/kaskr/adcomp/issues/43).

Usage

Clink_packages(...)
src_changed( source_files, Cloader)
dummy_PIBH( pkg, DLL, lldir, Rdir, src_changed) # not "real"
# ... but its existence lets you see what the args should be
# Code of 'dummy_PIBH' is actually real, but from a different place

Arguments

...

(Clink_packages) either missing, or a single character string naming a helper-mini-package such as "Rcpp" or "RcppTidy", or a named function for such a package. See Value and EXAMPLES.

source_files

(src_changed) the ones to check to see if re-pre-build is necessary

Cloader

(src_changed) the current version of the R file that produces "stubs" for source routines, etc. pre.install will add a checksum-manifest as the first line.

pkg

(PIBH) name of package

DLL

(PIBH) DLL...

lldir

(PIBH) folder where the source files live; normally "<mypack>/src", but could be a subfolder of that.

Rdir

(PIBH) folder where the Cloader (written in R) should go, eg "<myprotopack>/R"

src_changed

(PIBH) You can call this function to check for changes in source-files. It will be set to mvbutils::src_changed when your PIBH is called; this means that your PIBH does not have to refer to package mvbutils at all, and so packages depending/importing your helper also do not need to import/depend on package mvbutils.

Value

Clink_packages() returns all registered helper packages, as a list. Clink_packages( "RcppTidy") returns the PIBH for package "RcppTidy". Clink_packages( RcppTidy=<<some function>>, ADThelper=<<some function>>) registers the PIBH for those packages. PIBHs for registration are checked to see that they have just two arguments, Cdir and Rdir. src_changed returns "" if nothing has changed, or a new first line for the manifest file if rebuilding is needed; in the latter case, your PIBH should do the rebuilding, and prepend the new first line to the manifest. [Currently the code half-attempts to edit the manifest... not right.] A PIBH should return a list with these elements:

Cloader

pathname of R script that will do any post-useDynLib setup for the DLL, eg creating R stubs to call the low-level routines. Should be in "<mypack>/R".

makelines

if a Makefile does end up being used, what instruction should compile this DLL? Single string, with "\n" for any newlines.

needs_makefile

normally FALSE for C(etc) code; set to TRUE if this particular bit of source-code demands a Makefile (eg if it's in Pascal); see .MAKEFILE

subenv

Name of environment within the package namespace, within which native-symbols should be created. Can be "", in which case the symbols are created directly in the namespace (as Rcpp did as of mid-2019).

extra_copies

new files that will need copying from the task package to the source package— eg "<mypack>/src/RcppExports.cpp". Don't include Cloader.

Examples

# Setup in a helper-mini-package
## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  # In ADT:::._onLoad, where ADT:::ADT_PIBH is a PIBH; see main text
    Clink_packages( ADT=ADT_PIBH)
  # Inside ADT:::ADT_PIBH <- function( dir, Rdir) {...}
    redo <- mvbutils::src_changed( Rcpp_files, this_Cloader)
    if( nzchar( redo)) { recompile() }
} # if F

## End(Not run)

Dataframe to array

Description

D2A makes an array out of one column in a dataframe, with (by default) the remaining columns forming the array dimensions, in order. Its (almost) inverse is A2D.

You can choose which columns to use for the dimensions, and in which order, via the dim.cols argument. However, it is often easier to subset the array by columns in the call, eg D2A( x[ cq( Year, Len, Count)], data.col="Count"). Each unique value in an index column gets a "row" in the array. Combinations of indices that don't appear as rows in the input will become missing.value in the output. If a row has missing values in any index column, it is ignored.

Duplicated index rows in the data.frame are not advisable, and trigger a warning; I think the last value will be used, but I'm not sure.
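For readers who want to see the mechanics, the following base-R sketch mimics the behaviour described above: one array "row" per unique index value, missing.value for absent combinations, and rows with missing indices dropped. The name d2a_sketch is hypothetical and this is not the code of D2A; sorting the unique index values is an assumption of the sketch.

```r
# Hypothetical stand-in for illustration; NOT mvbutils::D2A
d2a_sketch <- function( df, data.col, dim.cols=setdiff( names( df), data.col),
    missing.value=NA) {
  # Ignore rows with a missing value in any index column
  df <- df[ stats::complete.cases( df[ dim.cols]), , drop=FALSE]

  # Each unique value in an index column gets a "row" in the array
  levs <- lapply( df[ dim.cols], function( v) sort( unique( v)))
  out <- array( missing.value, dim=unname( lengths( levs)),
      dimnames=lapply( levs, as.character))

  # One row of array subscripts per data.frame row, then matrix-style assignment
  idx <- do.call( cbind, Map( match, df[ dim.cols], levs))
  out[ idx] <- df[[ data.col]]
  out
}

mini <- data.frame( xx=c( 1, 3, 4, 3), yy=c( 2, 2, 2, 3), z=c( 21, 23, 24, 33))
d2a_sketch( mini, "z") # 3 by 2, with NA where no (xx,yy) row exists
d2a_sketch( mini, "z", dim.cols=c( "yy", "xx")) # dimensions in the other order
```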

Note

D2A and (something similar to) A2D used to be in my semi-secret handy2 package under slightly different names, but they are useful enough that I've moved them to mvbutils in 2025.

You can of course do vaguely similar things with base-R and perhaps with countless other packages too, but why not just use this? I do!

Usage

D2A(
  df,
  data.col,
  dim.cols = names(df) %except% data.col,
  missing.value = NA)

Arguments

df

data.frame

dim.cols

character vector saying which columns (in order) to use for array dimensions. Default is everything except data.col

data.col

string saying which column should form the contents of the output

missing.value

what to put into the output for index-combinations that don't occur in the input.

Value

Array with length( dim.cols) dimensions, and appropriate dimnames.

Examples

grubbb <- expand.grid( xx=1:4, yy=2:3) # data.frame
grubbb$z <- with( grubbb, xx+10*yy)
D2A( grubbb, 'z')
# Let's remove some values, and change the order of array dims...
minigrubbb <- grubbb[ c( 1, 3, 4, 7),]
D2A( minigrubbb, 'z', dim.cols=cq( yy, xx))
# Don't have to use all columns
D2A( minigrubbb, 'z', dim.cols='xx')

Unload DLL easily

Description

R's dyn.unload is ridiculously hard to use in practice, because it requires complete paths. These can be extracted from getLoadedDLLs, but only with ridiculous amounts of effort and tricks that I always forget. Use DYN.UNLOAD instead with just the basename of the DLL(s) you actually want to unload.

Note that there can be multiple versions of a DLL loaded at the same time, with the same "name" (according to getLoadedDLLs) but different paths. This will unload the first one (only), so you may need to call it repeatedly.
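For comparison, this is roughly what the do-it-yourself route looks like in base R. The helper name loaded_dll_paths is made up, and this is not the code of DYN.UNLOAD; note the use of [[ rather than $ to get at the fields of each DLLInfo object, since $ on such an object looks up native symbols instead.

```r
# Hypothetical helper: full path(s) of loaded DLLs with a given name
loaded_dll_paths <- function( dllname) {
  dlls <- getLoadedDLLs()
  nm <- vapply( dlls, function( d) d[[ "name"]], "")
  pth <- vapply( dlls, function( d) d[[ "path"]], "")
  unname( pth[ nm == dllname]) # can have several entries
}

loaded_dll_paths( "base")
# To unload by hand, you would then do something like:
# dyn.unload( loaded_dll_paths( "my_dodgy_C_code")[ 1])
```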

Usage

DYN.UNLOAD( dllnames, warn_if_not_loaded=TRUE)

Arguments

dllnames

Usually one string, eg "my_dodgy_C_code", but you can do several at once (in a character vector, obvs).

warn_if_not_loaded

Pretty self-explanatory.

Value

The satisfaction of actually having cleared the bloody thing out of memory, eg so that you can delete the file.

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  DYN.UNLOAD( "offending_C_code")
} # if F

## End(Not run)

Generate a negated version of your function. Useful for 'nlminb' etc.

Description

You pass it a function f(.); it returns a function whose result will be -f(.). The arguments, return attributes, and environment are identical to those of f.

Usage

NEG(f)

Arguments

f

Normally, a function that returns a scalar; rarely, a NULL.

Value

A function that returns -f. However, if is.null(f), the result is also NULL; this is useful e.g. for gradient arg to nlminb.
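Conceptually, the result is close to the following minimal wrapper. This is only a sketch: neg_sketch is a hypothetical name, and unlike the real NEG it does not preserve the argument list, return attributes, or environment of f.

```r
# Hypothetical minimal version; the real NEG keeps f's arguments and environment
neg_sketch <- function( f) {
  if( is.null( f)) {
return( NULL) # so that eg a NULL gradient stays NULL
  }
  function( ...) -f( ...)
}

neg_sketch( sqrt)( 4) # -2
neg_sketch( NULL) # NULL
```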

Examples

NEG( sqrt)( 4) # -2
# should put in more complex one here...
e <- new.env()
e$const <- 3
funco <- function( x) -sum( ( x-const)^2L)
environment( funco) <- e
nlminb( c( 0, 0), NEG( funco)) # c( 3, 3)
dfunco <- NULL
nlminb( c( 0, 0), NEG( funco), gradient=NEG( dfunco)) # c( 3, 3)

Markdownize & reverse NEWS object

Description

Probably only for me. Each of my maintained packages has a <mypack>.NEWS object (character vector) in a pretty arbitrary format which I might as well markdownize, so that utils::news can process it. Reverse order (previously, most recent came last). Remove dates.

Usage

RENEWS( pkg, character.only = FALSE)

Arguments

pkg

eg ..debug or debug or "debug" (the latter a string)

character.only

for programmatic use, enforce string format as pkg

Value

Modified <pkg>.NEWS of class cat, so you can check it before manually assigning to eg ..<mypack>$<mypack>.NEWS. Likely to need cleanup with fixr afterwards!

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  RENEWS( mvbutils)
} # if F

## End(Not run)

Stash variables in caller's environment

Description

REPORTO is a convenience function for use during model-fitting, when you have a hand-written "objective function" to optimize. Suppose your function obfun computes lots of jolly interesting intermediate quantities, which you would like to preserve somewhere, before the function exits and they vanish. Then, just insert a call eg REPORTO( key_result, fascinating, important) somewhere. You can make multiple calls to REPORTO (with different variables...) and they will all be stashed.

You should probably give obfun an environment before you do this, otherwise the interesting stuff will end up in .GlobalEnv (if you are lucky), resulting in clutter. You can also use environment(obfun) to pre-stash data (i.e., before you invoke the function), so that obfun will be able to just refer directly to it, again without cluttering up .GlobalEnv. That level of self-discipline is worth cultivating. See Examples, and eg ?closure for some kind of intro to R's lexical-scoping rules, on which this all depends. There must be a more reader-friendly help link somewhere, though...

Of course, you can do all this with base-R commands anyway (see below). But a key reason for using REPORTO— at least if you are using the package offarray— is that the package offartmb will automatically translate REPORTO calls into RTMB::REPORT calls, so your code can then run under package RTMB without further modification; see offarray::reclasso. Plus, even in normal R use, the REPORTO( var1, var2) syntax is clearer and easier.

Pedants corner

HaRd-nuts will note that normal-R-use REPORTO is "just" syntactic sugar for the totally-self-explanatory idiom:

  list2env( mget( c( "key_result", "fascinating", "important")),
    envir=environment( sys.function()))

And, yes, of course, in normal-R use you can also achieve the effect via <<- and assign. But the former requires you to pre-create the interesting things in environment( obfun), the latter has pig-ugly syntax, and both require self-discipline, which is hateful to me anyway. However, if you really want to do all that, feel free! (And remember to write your own code to handle the RTMB case.)
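A self-contained run of that base-R idiom, with no mvbutils involved, might look as follows. The function and variable names are just the illustrative ones from the text, and the explicit envir=environment() in mget is added here only to make the lookup frame obvious.

```r
obfun <- function( beta) {
  key_result <- sum( beta^2)
  fascinating <- range( beta)
  # Base-R stand-in for REPORTO( key_result, fascinating)
  list2env( mget( c( "key_result", "fascinating"), envir=environment()),
      envir=environment( sys.function()))
return( key_result)
}

stash <- new.env()
environment( obfun) <- stash # otherwise things land in .GlobalEnv
obfun( c( 1, 2, 3))
ls( stash) # the stashed intermediate quantities
stash$fascinating
```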

Usage

REPORTO(..., names = NULL)

Arguments

...

variables you want to stash— unquoted. Can be empty.

names

A character vector with the names of additional variables to stash.

Thus, REPORTO(myvar) and REPORTO(names="myvar") have identical effects. The names argument is handy if you want to stash, say, all variables whose names begin with "ncomps_": then use REPORTO(names=ls(pattern="^ncomps_")).

Value

REPORTO itself returns NULL; it is called for its side-effects.

Examples

rego <- function( beta){
  v1 <- X %*% beta
  v2 <- y-v1
  REPORTO( v1, v2)
  ssq <- sum( v2*v2)
  return( ssq)
}
e <- new.env( parent=environment( rego))
e$X <- matrix( 1:6, 3, 2)
e$y <- 7:9
environment( rego) <- e
# Now rego will "know about" X & y...
rego( c( 1.6, 2.4))
# ... and it can stash its results there
e$v1
e$v2

Save R objects

Description

These functions resemble save and save.image, with two main differences. First, any functions which have been mtraced (see package debug) will be temporarily untraced during saving (the debug package need not be loaded). Second, Save and Save.pos know how to deal with lazy-loaded objects set up via mlazy. Save() is like save.image(), and also tries to call savehistory (see Details). Save.pos(i) saves all objects from the ith position on the search list in the corresponding ".RData" file (or "all.rda" file for image-loading packages, or "*.rdb/*.rdx" for lazyloading packages). There is less flexibility in the arguments than for the system equivalents. If you use the cd system in mvbutils, you will rarely need to call Save.pos directly; cd, move and FF will do it for you.

Usage

Save()
Save.pos( pos, path, ascii=FALSE)

Arguments

pos

string or numeric position on search path, or environment (e.g. ..mypack if "mypack" is a maintained-package).

path

directory or file to save into (see Details).

ascii

file type, as per save

Details

There is a safety provision in Save and Save.pos, which is normally invisible to the user, but can be helpful if there is a failure during the save process (for example, if the system shuts down unexpectedly). The workspace image is first saved under a name such as "n.RData" (the name will be adapted to avoid clashes if necessary). Then, if and only if the new image file has a different checksum to the old ".RData" file, the old file will be deleted and the new one will be renamed ".RData"; otherwise, the new file will be deleted. This also means that the ".RData" file will not be updated at all if there have been no changes, which may save time when synchronizing file systems or backing up.
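The save-then-compare-then-rename logic can be sketched in base R roughly as follows. This is an illustration under assumptions, not the code inside Save: the helper name safe_save is hypothetical, and MD5 (via tools::md5sum) is merely one possible checksum.

```r
# Sketch: write a new image under a temporary name, and only replace the old
# file if the checksum differs. Returns TRUE if 'file' was created or replaced.
safe_save <- function( objnames, envir, file){
  tmp <- tempfile( pattern="n", tmpdir=dirname( file), fileext=".RData")
  save( list=objnames, envir=envir, file=tmp)
  unchanged <- file.exists( file) &&
      identical( unname( tools::md5sum( tmp)), unname( tools::md5sum( file)))
  if( unchanged){
    unlink( tmp)                        # nothing new: leave the old file alone
  } else {
    if( file.exists( file)) unlink( file)
    file.rename( tmp, file)             # new image takes over the real name
  }
  invisible( !unchanged)
}

d <- tempfile(); dir.create( d)         # scratch directory for the demo
f <- file.path( d, ".RData")
e <- new.env(); e$x <- c( 1, 2, 3)
first <- safe_save( "x", e, f)          # no old file yet, so it is created
second <- safe_save( "x", e, f)         # same contents, so old file is kept
e$x <- c( 4, 5, 6)
third <- safe_save( "x", e, f)          # contents changed, so file is replaced
```

If the process dies mid-save, the worst outcome in this scheme is a stray temporary file next to an intact old image.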

Two categories of objects will not be saved by Save or Save.pos. The first category is anything named in options( dont.save); by default, this is ".packageName", ".SavedPlots", "last.warning", and ".Traceback", and you might want to add ".Last.value". The second category is anything which looks like a maintained package, i.e. an environment whose name starts with ".." and which has attributes "name", "path", and "task.tree". A warning will be given if such objects are found. [From bitter experience, this is to prevent accidents on re-loading after careless mistakes such as ..mypack$newfun <- something; what you meant, of course, is ..mypack$newfun <<- something. Note that the accident will not cause any bad effects during the current R session, because environments are not duplicated; anything you do to the "copy" will also affect the "real" ..mypack. However, a mismatch will occur if the environment is accidentally saved and re-loaded; hence the check in Save.]

path is normally inferred from the path attribute of the pos workspace. If no such attribute can be found (e.g. if the attached workspace was a list object), you will be prompted. If path is a directory, the file will be called ".RData" if that file already exists, or "R/all.rda" if that exists, or "R/*.rdb" for lazy loads if that exists; and if none of these exist already, then the file will be called ".RData" after all. If you specify path, it must be a complete directory path or file path (i.e. it will not be interpreted relative to a path attribute).

Compression

mvbutils uses the default compression options of save, unless you set options() "mvbutils.compress" and/or "mvbutils.compression_level" to appropriate values as per ?save. The same applies to mlazy objects. Setting options(mvbutils.compression_level=1) can sometimes save quite a bit of time, at the cost of using more disk space. Set these options to NULL to return to the defaults.
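For instance, something like the following should do. It is a sketch only; it assumes that the two options take the same kinds of values as the compress and compression_level arguments of save.

```r
# Trade disk space for speed (assumed values, as accepted by save())
options( mvbutils.compress="gzip", mvbutils.compression_level=1)

# ... cd(), Save(), mlazy() etc would go here ...

# Back to the defaults of save()
options( mvbutils.compress=NULL, mvbutils.compression_level=NULL)
```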

History files

Save calls savehistory(). With package mvbutils from about version 2.5.6 on, savehistory and loadhistory will by default use the same file throughout each and every R session. That means everything works nicely for most users, and you really don't need to read the rest of this section unless you are unhappy with the default behaviour.

If you are unhappy, there are two things you might be unhappy about. First, savehistory and loadhistory are by default modified to always use the current value of the R_HISTFILE environment variable at the time they are called, whereas default R behaviour is to use the value when the session started, or ".Rhistory" in the current directory if none was set. I can't imagine why the default would be preferable, but if you do want to revert to it, then try to follow the instructions in ?mvbutils, and email me if you get stuck. Second, the default for R_HISTFILE itself is set by mvbutils to be the file ".Rhistory" in the .First.top.search directory– normally the one you start R in. You can change that default by specifying R_HISTFILE yourself before loading mvbutils, in one of the many ways described by the R documentation on ?Startup and ?Sys.getenv.

Author(s)

Mark Bravington

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  Save() #
  Save.pos( "package:mvbutils") # binary image of exported functions
  Save.pos( 3, path="temp.Rdata") # path appended to attr( search()[3], "path")
} # if F

## End(Not run)

Skeletal flat-format documentation

Description

You very likely don't need to read this; add.flatdoc.to is usually called automatically for you, by fixr( ..., new.doc=TRUE). It adds skeletal flat-format documentation to a function, suitable for conversion to Rd-format using doc2Rd. The result should pass R CMD check (but won't be much use until you actually edit the documentation).

Usage

# See *Examples* for practical usage
add.flatdoc.to(x, char.x = NULL, pkg=NULL, env=NULL, convert.to.source=FALSE)

Arguments

x

unquoted function name, sought in parent.frame() unless pkg is set

char.x

[string] function name

pkg

[string] name of maintained package where x lives (optional)

env

[environment] where to get x from; defaults to caller unless pkg is set.

convert.to.source

[logical] if TRUE and x has a "srcref" attribute, the latter is converted to a source attribute. This ensures that fixr will write the function correctly to the temporary file used by your text editor. Default is FALSE for historical reasons.

Details

You don't have to write Rd-compatible documentation from the outset. You can write documentation that's as free-form as you wish, and there's no need to use add.flatdoc.to to do it– you can write the doco directly in your text editor provided that you can source the resultant melange OK (see fixr). I find add.flatdoc.to useful, though, because I can never remember the headings or mild layout conventions of doc2Rd and Rd-format itself.

Value

A function with attribute doc containing the flat-format documentation.

Examples

myfun <- function( ...) ...
myfun <- mvbutils:::add.flatdoc.to( myfun)
# 'fixr( myfun)' will now allow editing of code & doco together
# Or, in a maintained package:
# ..mypack$myfun <<- add.flatdoc.to( myfun, pkg='mypack')

Universal date converter

Description

At your own risk: this aims for the most-sensible interpretation of a character vector of dates in whatever godawful format they may be, to avoid the delights of strptime. "Most sensible" is according to me; but you (or the originator of the dataset) might have different ideas, and if so it's your problem. See Details for, you guessed it, DETAILS.

Usage

autodate(datestr, ct=TRUE)

Arguments

datestr

character vector

ct

whether to return POSIXct (the default) or POSIXlt object

Details

All dates in the vector must have the same format as each other. Each must have a Day, Month, and Year, in any order except that Year cannot be in the middle, separated by either "/" or "-". Spaces are ignored. Month can be numbers, 3-letter abbreviation, or full month name. Year can be either 2- or 4-digits, but (unlike strptime itself) all digits are checked; note that strptime will uncomplainingly accept 1/1/2099 as coming from AD20 if you tell it %Y, even tho IMO you should have to write eg 1/1/0020 if you want stuff pre-AD1000, and autodate will enforce that. Consequently, leading zeros on Day and Month are ignored, but are honoured on Year.

In case of ambiguous results (which are common, especially with Day and 2-digit Year), the version with the smallest range is chosen; if several versions have equal range, the most recent (or furthest-future) is chosen.
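To make the "smallest range" rule concrete, here is a toy version in base R. It is emphatically not the algorithm inside autodate: the function pick_format is hypothetical, it only considers two candidate formats, and it ignores the tie-breaking rule.

```r
# Toy illustration: among candidate formats that parse every date,
# pick the one whose parsed dates span the fewest days
pick_format <- function( datestr, formats=c( "%d/%m/%Y", "%m/%d/%Y")){
  parsed <- lapply( formats, function( f) as.Date( datestr, format=f))
  ok <- !vapply( parsed, anyNA, logical( 1))
  if( !any( ok)) stop( "no candidate format fits all the dates")
  spans <- vapply( parsed[ ok],
      function( d) as.numeric( diff( range( d))), numeric( 1))
  formats[ ok][ which.min( spans)]
}

# Day-first gives 1, 3, 5 Feb (4-day span); month-first gives Jan, Mar, May
pick_format( c( "1/2/2017", "3/2/2017", "5/2/2017"))
# There is no month 13, so only one candidate survives
pick_format( "1/13/2017")
```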

Value

POSIXct or POSIXlt object, always with timezone GMT. Attributes (dim etc) should be preserved.

Examples

## Should add more...
## Unambiguous:
autodate( '1-Mar-2017')
# [1] "2017-03-01 GMT"
## Stupid
autodate( '1/1/1')
# Warning in autodate("1/1/1") :
#   Ambiguous date format: gonna pick futurest...
# [1] "2001-01-01 GMT"
## Ancient: NB 4 digits.
autodate( '1/13/0001')
# [1] "0001-01-13 GMT"
## Lazy, 2-digit year: assume modern
autodate( '1/13/01')
#  Warning in autodate("1/13/01") :
#    Ambiguous date format: gonna pick futurest...
#  [1] "2001-01-13 GMT"
## Corner case...
autodate( character(), ct=FALSE)
# POSIXlt of length 0
autodate( character())
# character(0) # actually CORRECT-- it really is 'POSIXct'-- but just prints as if wrong

Like Rd2roxygen, but fixing some bugs

Description

Like package Rd2roxygen (qv), for modding the R source in an existing source package to add Roxygen comments (i.e., documentation and export instructions). Package Rd2roxygen does most of the work, but it has a couple of bugs and I don't think they are likely to get fixed soon (one of them is "a feature").

This is called internally by pre.install, if "RoxygenNote" is found in the DESCRIPTION file, but can also be called manually.

Personally I don't like Roxygen (to me it seems a bad implementation of a reasonable idea, namely to keep documentation tightly linked with code and avoid markup complexity, for which there are better and simpler ways), but Others do. So this might help, especially if Others are collaborating with non-Roxygenites.

Usage

bugfix_Rd2roxygen( sourcedir, pkg = basename(sourcedir), nsinfo = NULL)

Arguments

sourcedir

folder containing the source package (so it should contain a "DESCRIPTION" file, a folder called "R", and so on)

pkg

name of the package, deduced from sourcedir if not supplied

nsinfo

info slurped from the NAMESPACE file (actually just about S3 methods, which Rd2roxygen inexplicably ignores). Used internally by pre.install for efficiency, but if you are calling this manually, you can leave it and the NAMESPACE file itself will be used

Value

Alters the file "<sourcedir>/R/<pkg>.R". Also, if there's a file "<sourcedir>/R/<pkg>-package.R", then a "Collate" field is added (or modified) to the DESCRIPTION file, to make sure that the package-source is collated last. This is a good idea, for reasons that I can no longer remember.

Organizing R workspaces

Description

cd allows you to set up and move through a hierarchically-organized set of R workspaces, each corresponding to a directory. While working at any level of the hierarchy, all higher levels are attached on the search path, so you can see objects in the "parents". You can easily switch between workspaces in the same session, you can move objects around in the hierarchy, and you can do several hierarchy-wide things such as searching, even on parts of the hierarchy that aren't currently attached.

Usage

# Occasionally: cd()
# Usually: cd(to)
# Rarely:
 cd(to, execute.First = TRUE, execute.Last = TRUE)

Arguments

to

the path of a task to move to or create, as an unquoted string. If omitted, you'll be given a menu. See Details.

execute.First

should the .First.task code be executed on attachment? Yes, unless there's a bug in it.

execute.Last

should the .Last.task code be executed on detachment? Yes, unless there's a bug in it.

Details

R workspaces can become very cluttered, so that it becomes difficult to keep track of what's what (I have seen workspaces with over 1000 objects in them). If you work on several different projects, it can be awkward to work out where to put "shared" functions– or to remember where things are, if you come back to a project after some months away. And if you just want to test out a bit of code without leaving permanent clutter, but while still being able to "see" your important objects, how do you do it? cd helps with all such problems, by letting you organize all your projects into a single tree structure, regardless of where they are stored on disk. Each workspace is referred to (for historical reasons) as a "task".

Note that there is a basic choice when working with R: do you keep everything you write in a text file which you source every time you start; or do you store all the objects in a workspace as a binary image in a ".RData" file, and rely on save and load? [Hybrids are possible, too.] Some people prefer the text-based approach, but others including me prefer the binary image approach; my reasons are that binary images let me organize my work across tasks more systematically, and that repeated text-sourcing is much too slow when lengthy analyses or data extractions are involved. The cd system is really geared to the binary image model and, before cd moves to a new task, either up or down the hierarchy, the current workspace is automatically saved to a binary image. Nevertheless, I don't think cd is incompatible with other ways of working, as long as the ".RData" file (actually the tasks object) is not destroyed from session to session. At any rate, some people who work by sourceing large code files still seem to find cd useful; it's even possible to use the .First.task feature to auto-load a task's source files into a text editor when you cd to that task. With the ".RData"-only approach, it is highly advisable to have some way of keeping separate text backups, at least of function code. The fixr editing system is geared up to this, and I presume other systems such as ESS are too.

To use the cd system, you will need to start R in the same workspace every time. This will become your ROOT or home task, from which all other tasks stem. There need not be much in this workspace except for an object called tasks (see below), though you can use it for shared functions that you don't want to organize into a package. From the ROOT task, your first action in a new R session will normally be to use cd to switch to a real task. The cd command is used both to switch between existing tasks, and to create new ones.

To set yourself up for working with cd, it's probably a good idea to make the ROOT task a completely new blank workspace, so the first step is to (outside R) create an empty folder with some name like "Rstart". [In MS-Windows, you should think about where to put this, to save yourself inordinate typing later on. If you are planning to create a completely new set of folders for your R projects, you might want to put this ROOT folder near the top of the disk directory structure, rather than in the insane default that Windows proffers, which usually looks something like "c:\document...\local...\long...\ridiculous...". However, if you are planning instead to link existing folders into the task hierarchy, then it's better to create the ROOT folder just above, or parallel to, the location of these folders.] Start R in this folder, type library( mvbutils), and then start linking your existing projects into the task hierarchy. [Of course, this assumes that you do have existing projects. If you don't, then just start creating new tasks.] To link in a project, just type cd() and a menu will appear. The first time, there will be only one option: "CREATE NEW TASK". Select it (or type 0 to quit if you are feeling nervous), and you will be prompted for a "task name", by which R will always subsequently refer to the task. Keep the name short; it doesn't have to be related to the location of the disk directory where the .RData lives. Avoid spaces and weird characters– use periods as separators. Task names are case-sensitive. Next, you'll be asked which disk directory this task refers to. By default, cd expects that you are creating a new task, and therefore suggests putting the directory immediately below the current task directory. However, if you are linking in an existing project, you'll need to supply the directory name. You can save huge amounts of typing by using "." to refer to the current directory, and on *nix systems you can use "~" too. 
Next, you'll be returned to the R command prompt– but the prompt will have changed, so that the ">" is preceded by the task name. If you type search(), you'll see your ROOT task in position 2, below .GlobalEnv as usual. Despite the name, though, the new .GlobalEnv contains the project you've just linked, and if you type ls(), you should see some familiar objects. Now type cd(0) to move back to the ROOT task (note the changed prompt), type search() and ls() again to orient yourself, and proceed as before to link the rest of your pre-existing tasks into the hierarchy. When you now type cd(), the menu will have more choices. If you select an existing task rather than creating a new one, you will switch straightaway to that workspace; watch the prompt.

Once you have a hierarchy set up, you can switch the current workspace within the hierarchy by calling e.g. cd(existing.task) (note the lack of quotes), or by calling cd() and picking off the menu. You can move through several levels of the hierarchy at once, using a path specifier such as cd(mytask/data/funcs) or cd(../child.of.sibling). Path specifiers are just like Unix or Windows disk paths with "/" as the separator, so that "." means "current task" and ".." means "parent". However, the character 0 must be used to denote the ROOT task, so that you have to type cd(0/different.task) rather than cd(/different.task). You can display the entire hierarchy by calling cdtree(0), or graphically via plot( cdtree( 0)).

When you first set up your task hierarchy, you'll also want to create or modify the .First function in your ROOT task. At a minimum, this should call library( mvbutils), but you may also want to set some options controlling the behaviour of cd (see the Options section). If you use other features of mvbutils such as the function-editing interface in fixr, there will be further options to be set in .First. [MAC users: for some strange reason .First just doesn't get called if you are using the "usual" RGUI for MACs. So what you need to do is create a ".Rprofile" file in your ROOT folder using a text editor; this file should both contain the definition of the .First function, and should also call .First() directly. You can also put the .First commands directly into the ".Rprofile" file, but watch out for the side-effect of creating objects in .GlobalEnv.]

You can create a fully hierarchical structure, with subtasks within subtasks within tasks, etc. Even if your projects don't naturally look like this, you may find the facility useful. When I create a new task, I tend to start with just one level of hierarchy, containing data, function code, and results. When this gets unspeakably messy, I often create one (or more) subtasks, usually putting the basic data at the top level, and functions and results at the lower level. Apart from tidiness, this provides some degree of protection against overwriting the original data. And when even this gets too messy– in one task, I have more than 150 functions, and it is very easy to generate 100s of analysis results– I create another level, keeping "established" functions at the second tier and using the third tier for temporary workspace and results. There are no hard-and-fast rules here, of course, and different people use R in very different ways.

A task can have .First.task and/or .Last.task functions, which get called immediately after cding into the task from its parent, or immediately before cding back to its parent, respectively (see Arguments). These can be useful for dynamic loading, loading scripts into a text editor, attaching & detaching datasets, etc., and facilitate the use of tasks as informal packages.

For turning tasks into formal R packages, consult mvbutils.packaging.tools.

How it works

The mechanism underlying the tree structure is very simple: each task that has any subtasks will contain a character vector called tasks, whose names are the R names of the tasks, and whose elements are the corresponding disk directories. Your ROOT task need contain no more than a .First function and a tasks object.

You can manually modify the tasks vector, and sometimes this is essential. If you decide to move a disk directory, for example, you can manually change the corresponding element of tasks to reflect the change. (Though if you are moving a whole task hierarchy, e.g. when migrating to a new machine, consult cd.change.all.paths. Having said that, the ability to use relative pathnames in tasks, which is present since about mvbutils version 2.0, makes cd.change.all.paths partly redundant.) You can also rename a task very easily, via something like

  names( tasks)[ names( tasks)=="my.old.name"] <- "my.new.name"

You can use similar methods to "reparent" a subtask without changing the directory structure.

There is (deliberately, to avoid accidents) no completely automatic way of removing tasks. To "hide" a task from the cd system, you first need to be cded to its parent; then remove the corresponding element of the tasks object, most easily via e.g.

  tasks <- tasks %without.name% "mysubtask"

If you want to remove the directories corresponding to "mysubtask", you have to do so manually, either in the operating system or (for the brave) in R code.

Remember to Save() at some point after manually modifying tasks.
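Since tasks is just a named character vector, all of these manipulations can be rehearsed on a made-up example in plain R. The task names and directories below are invented, and the last line uses ordinary subsetting as a base-R stand-in for %without.name%.

```r
# A made-up tasks vector: names are task names, elements are directories
tasks <- c( data="./data", my.old.name="./analysis", scratch="/tmp/scratch")

# Rename a task (the directory is untouched)
names( tasks)[ names( tasks)=="my.old.name"] <- "my.new.name"

# Record that a directory has moved on disk
tasks[ "data"] <- "../shared/data"

# "Hide" a task; base-R equivalent of: tasks %without.name% "scratch"
tasks <- tasks[ names( tasks) != "scratch"]
tasks
```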

Options

Various options() can be set, as follows. Remember to put these into your .First function, too.

write.mvb.tasks=TRUE causes a sourceable text representation of the tasks object to be maintained in each directory, in the file tasks.r. This helps in case you accidentally wipe out the .RData file and lose track of where the child tasks live. To create these text representations for the first time throughout the hierarchy, call cd.write.mvb.tasks(0). You need to put the options call in your .First.

abbreviate.cdprompt=n controls the length of the prompt string. Only the first n characters of all ancestral task names will be shown. For example, n=1 would replace the prompt long.task.name/data/funcs> with l/d/funcs>.
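The abbreviation rule can be mimicked in a few lines of base R. This is a sketch of the rule as described, not the code that cd uses; abbrev_prompt is a hypothetical name, and the trailing ">" is left off.

```r
# Keep the current task name whole; trim each ancestor to its first n characters
abbrev_prompt <- function( path, n){
  parts <- strsplit( path, "/", fixed=TRUE)[[1]]
  anc <- seq_len( length( parts)-1)      # everything except the last component
  parts[ anc] <- substr( parts[ anc], 1, n)
  paste( parts, collapse="/")
}
abbrev_prompt( "long.task.name/data/funcs", 1)
```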

mvbutils.update.history.on.cd=FALSE will prevent automatic saving & reloading of the history file when cd is called.

cd checks the R_HISTFILE environment variable and, if unset, sets it to file.path( getwd(), ".Rhistory"). This (combined with the mvbutils replacement of the standard versions of savehistory and loadhistory– see package?mvbutils) ensures that the same history file is used throughout each and every R session. My experience is that a single master history file is safer. However, if you want to override this behaviour– e.g. if you want to use a separate history file for each task– call something like Sys.setenv( R_HISTFILE=".Rhistory") before the first use of cd.
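Putting the options together, a .First function in the ROOT task might look something like this. It is a sketch only: the particular values are arbitrary choices, and defining the function does not run it.

```r
.First <- function(){
  library( mvbutils)            # the one essential line
  options(
    write.mvb.tasks=TRUE,       # maintain a sourceable tasks.r in each task directory
    abbreviate.cdprompt=3)      # show only 3 characters of ancestral task names
  # Uncomment to stop cd from saving & reloading the history file:
  # options( mvbutils.update.history.on.cd=FALSE)
}
```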

Note

cd calls setwd so that file searches will default to the task directory (see also task.home).

cd always calls Save before attaching a child task on top or moving back up the hierarchy. If you have many and/or big objects, the default behaviour can be slow. You can speed this up– sometimes dramatically– by "mcacheing" some of your objects so that they are stored in separate files– see mlazy.

If there are no changes to the ".RData" file, cd will not modify the file– in particular, its date-of-access will be unchanged. This helps avoid unnecessary file copying on subsequent synchronization. However, there are several seemingly innocuous operations which change the workspace: calling a random number function (changes .Random.seed), causing an error (creates .Traceback), and causing a warning (creates last.warning). To avoid forcing a change to the entire ".RData" file whenever one of these changes, you can set options( mvbutils.quick.cd=TRUE); this turns on mcacheing for those objects (see mlazy), so that they are stored in separate mini-files.
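The first of those "innocuous" operations is easy to see for yourself in plain R; the variable names here are just for the demonstration.

```r
# Was there a seed in the workspace beforehand? (depends on the session)
had_seed <- exists( ".Random.seed", envir=globalenv(), inherits=FALSE)
x <- runif( 1)      # any random-number call...
# ...leaves a (new or updated) .Random.seed object in .GlobalEnv
has_seed <- exists( ".Random.seed", envir=globalenv(), inherits=FALSE)

# The remedy described above; the option only has an effect with mvbutils loaded
options( mvbutils.quick.cd=TRUE)
```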

cd is only meant to be called interactively, and has only been tested in that context.

cd will issue a warning and refuse to move back up the hierarchy if it detects a non-task attached in position 2. You will need to manually detach any such objects before cding back up, or write a .Last.task function to automatically do the detaching. To make sure that library (and any automatic loading of packages, e.g. if triggered by loading a file referring to a namespace) always inserts packages below ROOT, the .onLoad code in mvbutils makes a minor hack to library, changing the default pos argument accordingly.

Two objects in the mvb.session.info search environment (see search()) help keep track of what parts of the hierarchy are currently attached; .First.top.search and .Path. The former is set when mvbutils loads, and the latter is updated by cd. Attached tasks can be identified by having a path attribute consisting of a named character vector. Normal packages also have a path attribute, but without names.
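The named-versus-unnamed distinction gives a quick way to spot attached tasks. The following is a sketch based only on the description above; looks_like_task is a hypothetical helper, not part of mvbutils.

```r
# An attached task has a "path" attribute that is a *named* character vector
looks_like_task <- function( pos){
  p <- attr( as.environment( pos), "path")
  is.character( p) && !is.null( names( p))
}
flags <- vapply( seq_along( search()), looks_like_task, logical( 1))
search()[ flags]    # attached tasks, if any; empty in a session that doesn't use cd
```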

Author(s)

Mark Bravington

Hierarchy-crawling functions for cd-organized workspaces

Description

These functions work through part or all of a workspace (task) hierarchy set up via cd. cdfind searches for objects through the (attached and unattached) task hierarchy. cdtree displays the hierarchy structure. cd.change.all.paths is useful for moving or migrating all or part of the hierarchy to new disk directories. cd.write.mvb.tasks sets up sourceable text representations of the hierarchy, as a safeguard. cditerate is the engine that crawls through the hierarchy, underpinning the others; you can write your own functions to be called by cditerate.

If a task folder or its ".RData" file doesn't exist, a warning is given and (obviously) it's not iterated over. If that file does exist but there's a problem while loading it (e.g. a reference to the namespace of a package that can't be loaded– search for partial.namespaces in mvbutils.packaging.tools) then the iteration is still attempted, because something might be loaded. Neither case should cause an error.

Usage

cdfind( pattern, from = ., from.text, show.task.name=FALSE)
cdregexpr( regexp, from = ., from.text, ..., show.task.name=FALSE)
cdtree( from = ., from.text = substitute(from), charlim = 90)
cd.change.all.paths( from.text = "0", old.path, new.path)
cd.write.mvb.tasks( from = ., from.text = substitute(from))
cditerate( from.text, what.to.do, so.far = vector("NULL", 0), ..., show.task.name=FALSE)
## S3 method for class 'cdtree'
plot( x, ...) # S3 method for cdtree; normally plot( cdtree(<<args>>))

Arguments

pattern

regexpr to be checked against object names.

regexp

regexpr to be checked against function source code.

from

unquoted path specifier (see cd); make this 0 to operate on the entire hierarchy.

from.text

use this in place of from if you want to use a character string instead

show.task.name

(boolean) as-it-happens display of which task is being looked at

charlim

maximum characters per line allowed in graphical display of cdtree; reduce if unreadable, or change par( cex)

old.path

regexpr showing portion of directory names to be replaced

new.path

replacement portion of directory names

what.to.do

function to be called on each task (see Details)

so.far

starting value for accumulated list of function results

...

further fixed arguments to be passed to what.to.do (for cditerate), or grep (for cdregexpr), or foodweb (for plot.cdtree)

x

result of a call to cdtree, for plotting

Details

All these functions start by default from the task that is currently top of the search list, and only look further down the hierarchy (i.e. to unattached descendants). To make them work through the whole hierarchy, supply 0 as the from argument. cdtree has a plot method, useful for complicated task hierarchies.

If you want to automatically crawl through the task hierarchy to do something else, you can write a wrapper function which calls cditerate, and an inner function to be passed as the what.to.do argument to cditerate. The wrapper function will typically be very short; see the code of cdfind for an example.

The inner function (typically called cdsomething.guts) must have arguments found, task.dir, task.name, and env, and may have any other arguments, which will be set according to the ... argument of cditerate. found accumulates the results of previous calls to what.to.do. Your inner function can augment found, and should return the (possibly augmented) found. As for the other parameters: task.dir is obvious; task.name is a character(1) giving the full path specifier, e.g. "ROOT/mytask"; and env holds the environment into which the task has been (temporarily) loaded. env allows you to examine the task; for instance, you can check objects in the task by calling ls(env=env) inside your what.to.do function. See the code of cdfind.guts for an example.
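As a sketch of the pattern, here is a made-up pair of functions that count the functions in each task. The names cdcountfuns and cdcountfuns.guts are invented, and the wrapper is shown only for shape; it has not been checked against cditerate and needs mvbutils plus a real task hierarchy. The inner function, though, can be tried on its own with a fake environment.

```r
# Inner function: receives the results so far, returns them augmented
cdcountfuns.guts <- function( found, task.dir, task.name, env){
  objs <- ls( envir=env, all.names=TRUE)
  nfuns <- sum( vapply( objs, function( x) is.function( env[[ x]]), logical( 1)))
  c( found, stats::setNames( list( nfuns), task.name))
}

# Wrapper (assumed usage): crawl from 'from.text' and apply the guts to each task
cdcountfuns <- function( from.text="0")
  mvbutils::cditerate( from.text, cdcountfuns.guts)

# Trying the inner function by hand, on a pretend task
fake <- new.env()
fake$f <- function() 1
fake$x <- 2
res <- cdcountfuns.guts( NULL, tempdir(), "ROOT/mytask", fake)
```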

Value

cdfind returns a list with one element for each object that is found somewhere; each such element is a character vector showing the tasks where the object was found. cdregexpr returns a list with one element for each task where a function whose source matches the regexpr is found; the names of each list element name the functions within that task (an ugly way to return results, for sure). cdtree returns an object of class cdtree, which is normally printed with indentations to show the hierarchy. You can also plot(cdtree(...)) to see a graphical display. cd.change.all.paths and cd.write.mvb.tasks do not return anything useful.

Author(s)

Mark Bravington

Examples

## Not run: 
cdfind( ".First", 0) # probably returns list( .First="ROOT")

## End(Not run)

Support routine for cd-organized workspace hierarchy.

Description

Sets the command-line prompt to the correct value (see cd, and the notes on the option abbreviate.cdprompt); useful if the prompt somehow becomes corrupted. cdprompt never seems necessary in R but has been useful in the S+ manifestations of mvbutils, where system bugs are commoner.

Usage

cdprompt()

Author(s)

Mark Bravington

Examples

cdprompt()

Show functions and callees in environment 'egood' that have changed or disappeared in environment 'ebad'.

Description

Useful eg when you have been modifying a package, and have buggered stuff up, and want to partly go back to an earlier version... entirely hypothetical of course, things like that never ever happen to me. Mere mortals might want to create a new environment goodenv, use evalq(source(<<old.mypack.R.source.file>>, local=TRUE), goodenv), then changed.funs( goodenv, asNamespace("mypack")). If your package is lazy-loaded, you're stuffed; I avoid lazy-loading, except perhaps for final distribution, because it just makes it much harder to track problems. Not that I ever have problems, of course.

Can be applied either to a specified set of functions, or by default to all the functions in egood. If the former, then all callees of the specified functions are also checked for changes, as are all their callees, and so on recursively.

Usage

changed.funs(egood, ebad, topfun, fw = NULL)

Arguments

egood, ebad

environments #1 & #2. Not symmetric; functions only in ebad won't be checked.

topfun

names of functions in egood to check; all callees will be checked too, recursively. Default is all functions in egood.

fw

if non-NULL, the result of a previous call to foodweb(egood); if NULL, foodweb will be called automatically.

Value

Character vector with the names of changed/lost functions.
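
The comparison described above can be illustrated without mvbutils. What follows is a minimal base-R sketch, not the changed.funs implementation: it ignores callees and foodweb entirely, and the helper name naive_changed is hypothetical. It flags functions in egood that are either missing from ebad or whose deparsed code differs there.

```r
# Two toy environments standing in for "good" and "bad" versions of a package
egood <- new.env()
ebad <- new.env()
evalq({
  f <- function(x) x + 1
  g <- function(x) f(x) * 2
  h <- function() 0
}, egood)
evalq({
  f <- function(x) x + 2      # changed
  g <- function(x) f(x) * 2   # unchanged
  # h has disappeared
}, ebad)

# Hypothetical helper: names of functions in egood that are lost or altered in ebad
naive_changed <- function(egood, ebad) {
  funs <- Filter(function(nm) is.function(egood[[nm]]), ls(egood))
  flagged <- vapply(funs, function(nm) {
    !exists(nm, envir = ebad, inherits = FALSE) ||
      !identical(deparse(egood[[nm]]), deparse(ebad[[nm]]))
  }, logical(1))
  funs[flagged]
}

naive_changed(egood, ebad)   # "f" "h"
```

As in the real function, the check is not symmetric: anything that exists only in ebad is never looked at.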

Check consistency of maintained package versions

Description

Utility to compare version numbers of the different "instances" of one of your maintained packages. Only the most up-to-date folders relevant to the running R version are checked; see mvbutils.packaging.tools.

The "instances" checked are:

the task package itself (in eg ..mypack$mypack.VERSION)
the source package created by pre.install
the installed package, maintained by patch.install
the tarball package, created by build.pkg
the binary package, created by build.pkg.binary

The care argument controls what's shown. Mismatches when care="installed" should be addressed by patch.install, because something has gotten out-of-synch (probably when maintaining the same version of a package for different R versions). Mismatches with the built ("tarball" and "binary") packages are not necessarily a problem, just an indication of work-in-progress.

Usage

check.patch.versions(care = NULL)

Arguments

care

if non-NULL, a character vector with elements in the set "installed", "source", "tarball", and "binary". Only packages where there's a version mismatch between these fields and the task package version will be shown.

Value

A character matrix with maintained packages as rows, and the different instances as columns. "NA" indicates that a version couldn't be found.
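
To make the role of care concrete, here is a toy base-R sketch of filtering such a matrix. Everything in it is an assumption for illustration: the package names, the version strings, the column names, and in particular the choice to treat an NA as a mismatch (how check.patch.versions itself treats NA is not stated here).

```r
# Toy version matrix: packages as rows, instances as columns (all values hypothetical)
vers <- rbind(
  mypack    = c(task = "1.2.3", source = "1.2.3", installed = "1.2.2",
                tarball = "1.2.0", binary = NA),
  otherpack = c(task = "0.9.0", source = "0.9.0", installed = "0.9.0",
                tarball = NA, binary = NA)
)

care <- c("source", "installed")

# Compare each cared-about column against the task package version
differs <- vers[, care, drop = FALSE] != vers[, "task"]
differs[is.na(differs)] <- TRUE   # assumption: a missing version counts as a mismatch

# Keep only packages with at least one mismatch in the cared-about columns
shown <- vers[rowSums(differs) > 0, , drop = FALSE]
rownames(shown)   # "mypack"
```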

Compare source packages eg for checking git

Description

Suppose you have a maintained task-package, and you've made a source package from it. And that there's a version on github, which you want to update. So you pull it, into your local github spot, then check for any changes with this function. If there aren't any, then you don't need to mess around with unpackage; you could carry on maintaining your task-package as usual, then scrunge it into your github spot, then push.

compare_spack_code actually looks for functions in the "mypack.R" file that differ between the versions. It tries to look at attributes of the functions, too (usually there won't be any). If you ask for one specific function only, it will try to use the diffr package to display a nice diff of the two versions.

Probably I should describe what to do if you do find a difference... haven't needed to yet!

Usage

compare_spacks(pkg, gitplace = "d:/github/flub",
    d1, d2, character.only = FALSE)
compare_spack_code(pkg, gitplace = "d:/github/flub",
    d1, d2, character.only = FALSE, showdiff=NULL)

Arguments

pkg

as per build.pkg etc; eg mypack or ..mypack

gitplace

your local github spot

d1, d2

Or you can specify the folders directly with these (need to set both)

character.only

as per build.pkg etc, eg char="mypack" (or more likely char=thispack when thispack is the index of a for-loop)

showdiff

optional, name of one function to show differences for.

Value

A list with character-vector components in1, in2, and diffs (unless showdiff is set). Any files (or any functions, for compare_spack_code) which are not different won't be mentioned. If showdiff is set, nothing is returned, but you should see the results in your browser.
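
For orientation, a folder comparison of this general kind can be sketched in base R. This is not the compare_spacks code; it assumes that in1 and in2 mean "only in the first folder" and "only in the second folder", which the text above does not spell out, and the folder and file names are made up.

```r
# Two throwaway folders standing in for the local source package and the git copy
d1 <- file.path(tempdir(), "spack_local")
d2 <- file.path(tempdir(), "spack_git")
dir.create(d1, showWarnings = FALSE)
dir.create(d2, showWarnings = FALSE)
writeLines("f <- function(x) x + 1", file.path(d1, "mypack.R"))
writeLines("f <- function(x) x + 2", file.path(d2, "mypack.R"))
writeLines("Package: mypack", file.path(d1, "DESCRIPTION"))
writeLines("Package: mypack", file.path(d2, "DESCRIPTION"))
writeLines("only in the first folder", file.path(d1, "NEWS"))

f1 <- list.files(d1, recursive = TRUE)
f2 <- list.files(d2, recursive = TRUE)
both <- intersect(f1, f2)

# Files present in both folders are compared by checksum
changed <- unname(tools::md5sum(file.path(d1, both)) !=
                  tools::md5sum(file.path(d2, both)))

res <- list(in1 = setdiff(f1, f2), in2 = setdiff(f2, f1), diffs = both[changed])
res$diffs   # "mypack.R"
```

Identical files ("DESCRIPTION" here) are simply not mentioned, as in the Value description.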

Remove doc attributes when package loads

Description

Suppose you want to keep plain-text "doc" attributes attached to your function code even in the package source (as opposed to in a private version of the package). You probably don't want them around after the package loads for real, though. In that case, you can stick a call to dedoc_namespace at the end of your .onLoad and everything should be copacetic.

Usage

dedoc_namespace(ns)

Arguments

ns

Name of the package, or its namespace environment.

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  # Put this into your package:
  .onLoad <- function( libname, pkgname){
    # stuff for .onLoad(), or no stuff
    dedoc_namespace( pkgname)
  }
} # if F

## End(Not run)
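
The effect can be pictured with a small base-R sketch. This is only an illustration of stripping "doc" attributes from the functions in an environment, not the dedoc_namespace code; the helper name strip_doc is hypothetical, and an ordinary environment stands in for a namespace.

```r
ns <- new.env()   # stand-in for a package namespace
ns$f <- structure(function(x) x + 1, doc = c("f", "Adds one to x."))
ns$g <- function(x) x   # no doc attribute; left alone

# Hypothetical helper: remove the "doc" attribute from every function in env
strip_doc <- function(env) {
  for (nm in ls(env, all.names = TRUE)) {
    obj <- get(nm, envir = env, inherits = FALSE)
    if (is.function(obj) && !is.null(attr(obj, "doc"))) {
      attr(obj, "doc") <- NULL
      assign(nm, obj, envir = env)
    }
  }
  invisible(env)
}

strip_doc(ns)
is.null(attr(ns$f, "doc"))   # TRUE
ns$f(1)                      # 2; the function itself is unchanged
```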

Shorthand filler-inner for lists

Description

Suppose you want to set up a list where several consecutive elements take the same value, but you don't want to repeatedly type that value: then use ditto.list to set empty (missing) elements to the previous non-empty element. Wrap in unlist() to create a vector instead of a list.

Usage

ditto.list(...)
# EG:
# ditto.list( a=1, b=, c='hello') # a: 1; b: 1, c: 'hello'

Arguments

...

anything, named or unnamed; missing elements OK

Value

List

Examples

unlist( ditto.list( a=1, b=, c='hello')) # a: 1; b: 1, c: 'hello'

Modify a function's scope

Description

do.in.envir lets you write a function whose scope (enclosing environment) is defined at runtime, rather than by the environment in which it was defined.

Usage

# Use only as wrapper of function body, like this:
# my.fun <- function(...) do.in.envir( fbody, envir=)
# ... should be the arg list of "my.fun"
# fbody should be the code of "my.fun"
do.in.envir( fbody, envir=parent.frame(2)) # Don't use it like this!

Arguments

fbody

the code of the function, usually a braced expression

envir

the environment to become the function's enclosure

Details

By default, a do.in.envir function will have, as its enclosing environment, the environment in which it was called, rather than defined. It can therefore read variables in its caller's frame directly (i.e. without using get), and can assign to them via <<-. It's also possible to use do.in.envir to set a completely different enclosing environment; this is exemplified by some of the functions in debug, such as go.

Note the difference between do.in.envir and mlocal; mlocal functions evaluate in the frame of their caller (by default), whereas do.in.envir functions evaluate in their own frame, but have a non-standard enclosing environment defined by the envir argument.

Calls to e.g. sys.nframe won't work as expected inside do.in.envir functions. You need to offset the frame argument by (at time of writing this documentation...) 5, so that sys.parent() should be replaced by sys.parent( 5) and sys.call by sys.call(-5). In future, 5 may not be the right magic number.

do.in.envir functions are awkward inside namespaced packages, because the code in fbody will have "forgotten" its original environment when it is eventually executed. This means that objects in the namespace will not be found.

The debug package tries to mtrace inside do.in.envir functions; this used to work, but hasn't been recently tested in R4.1 where a few internal R deepshit mysteries seem to have changed.

Value

Whatever fbody returns.

Author(s)

Mark Bravington

Examples

fff <- function( abcdef) ffdie( 3)
ffdie <- function( x) do.in.envir( { x+abcdef} )
fff( 9) # 12; ffdie wouldn't know about abcdef without the do.in.envir call
# Show sys.call issues
# Note that the "envir" argument in this case makes the
# "do.in.envir" call completely superfluous!
try({ # Not needed here, but I was trying to debug CMD CHECK examples and hit woe
  ffe <- function(...) do.in.envir( envir=sys.frame( sys.nframe()), sys.call( -5))
  ffe( 27, b=4) # ffe( 27, b=4)
})
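
For readers who want to see the underlying idea in plain R, the sketch below gets the same result as the first example by resetting a function's enclosure at call time. It is not how do.in.envir is implemented; the names ffdie2, fff2, worker and encl are hypothetical.

```r
ffdie2 <- function(x) {
  caller <- parent.frame()            # frame of whoever called ffdie2
  worker <- function() x + abcdef     # the "body"; abcdef is not defined here
  encl <- new.env(parent = caller)    # enclosure: holds x, then falls through to the caller
  encl$x <- x
  environment(worker) <- encl         # scope is now decided at runtime
  worker()
}

fff2 <- function(abcdef) ffdie2(3)
fff2(9)   # 12
```

Because the lookup of abcdef goes through the caller's frame, the same ffdie2 gives different answers depending on who calls it, which is the behaviour described in Details.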

Easier sapply/lapply avoiding explicit function

Description

Simpler to demonstrate:

  do.on( find.funs(), environment( get( .)))
  # same as:
  lapply( find.funs(), function( x) environment( get( x)))

do.on evaluates expr for all elements of x. The expression should involve the symbol ., and will be cast into a function which has an argument . and knows about any dotdotdot arguments passed to do.on (and objects in the function that calls do.on). If x is atomic (e.g. character or numeric, but not list) and lacks names, it will be given names via named. With do.on, you are calling sapply, so the result is simplified if possible, unless simplify=FALSE (or simplify="array", for which see sapply). With FOR, you are calling lapply, so no simplification is tried; this is often more useful for programming.

Usage

do.on(x, expr, ..., simplify = TRUE)
FOR(x, expr, ...)

Arguments

x

thing to be iterated over. Names are copied to the result, and are pre-allocated if required as per Description

expr

expression, presumably involving the symbol . which will successively become the individual elements of x

...

other "arguments" for expr

simplify

as per sapply, and defaulting to TRUE.

Value

do.on

as per sapply, a vector or array of the same "length" as x.

FOR

a list of the same length as x

Examples

do.on( 1:7, sum(1:.))
#  1  2  3  4  5  6  7
#  1  3  6 10 15 21 28
# note the numeric "names" in the first row
FOR( 1:3, sum(1:.))
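
For comparison, the two calls above correspond roughly to the following base-R code. This is a sketch under one assumption: the names are added by hand here, standing in for what the Description says named does for unnamed atomic x.

```r
x <- 1:7
names(x) <- as.character(x)   # stand-in for the automatic naming

# do.on( 1:7, sum(1:.)) corresponds to sapply with "." as the argument name
via_sapply <- sapply(x, function(.) sum(1:.))
via_sapply
#  1  2  3  4  5  6  7
#  1  3  6 10 15 21 28

# FOR( 1:3, sum(1:.)) corresponds to lapply, so the result stays a list
via_lapply <- lapply(x[1:3], function(.) sum(1:.))
str(via_lapply)
```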

Converts plain-text documentation to Rd format

Description

doc2Rd converts plain-text documentation into an Rd-format character vector, optionally writing it to a file. You probably won't need to call doc2Rd yourself, because pre.install and patch.install do it for you when you are building a package; the entire documentation of package mvbutils was produced this way. The main point of this helpfile is to describe plain-text documentation details. However, rather than wading through all the material below, just have a look at a couple of R's help screens in the pager, e.g. via help( glm, help_type="text"), copy the result into a text editor, and try making one yourself. Don't bother with indentation though, except in item lists as per More details below (the pager's version is not 100% suitable). See fixr and its new.doc argument for how to set up an empty template: also help2flatdoc for how to convert existing Rd-format doco.

docotest lets you quickly check how your doco would look in a browser.

For how to attach plain-text documentation to your function, see docattr and write_sourceable_function, etc.

Usage

doc2Rd( text, file=NULL, append=, warnings.on=TRUE, Rd.version=,
    def.valids=NULL, check.legality=TRUE)
docotest( fun.or.text, ...)

Arguments

For doc2Rd:

text

(character or function) character vector of documentation, or a function with a doc attribute that is a character vector of documentation.

file

(string or connection) if non-NULL, write the output to this file

append

(logical) only applies if !is.null(file); should output be appended rather than overwriting?

warnings.on

(logical) ?display warnings about apparently informal documentation?

Rd.version

(character) what Rdoc version to create "man" files in? Currently "1" means pre-R2.10, "2" means R2.10 and up. Default is set according to what version of R is running.

def.valids

(character) objects or helpfiles for which links should be generated automatically. When doc2Rd is being called from pre.install, this will be set to all documented objects in your package. Cross-links to functions in other packages are not currently generated automatically (in fact not at all, yet).

check.legality

if TRUE and Rd.version is 2 or more, then the output Rd will be run thru parse_Rd and a try-error will be returned if that fails; normal return otherwise. Not applicable if Rd.version is 1.

For docotest:

fun.or.text

(character or function) character vector of documentation, or a function with a doc attribute that is a character vector of documentation. NB if maintaining a package, you need to run this on the "raw" code (e.g. ..mypack$myfun), not on the installed function (e.g. not myfun or mypack::myfun).

...

other args passed to Rd2HTML when it tries to convert doc2Rd output to HTML. I've no idea what these might be, since they wouldn't be used in reality by pre.install when it assembles your source package.

Value

Character vector containing the text as it would appear in an Rd file, with class of "cat" so it prints nicely on the screen.

More details

Flat-format (plain-text) documentation in doc attributes, or in stand-alone character objects whose name ends with ".doc", can be displayed by the replacement help in mvbutils (see dochelp) without any further ado. This is very useful while developing code before the package-creation stage, and you can write such documentation any way you want. For display in an HTML browser (as opposed to R's internal pager), and/or when you want to generate a package, doc2Rd will convert pretty much anything into a legal Rd file. However, if you can follow a very few rules, using doc2Rd will actually give nice-looking authentic R help. For this to work, your documentation basically needs to look like a plain-text help file, as displayed by help(..., help_type="text"), except without most indentation (so, your paragraphs should not contain hard line breaks).

Rather than wading through this help file to work out how to write plain-text help, just have a look at a couple of R's help screens in the pager, and try making one yourself. You can also use help2flatdoc to convert an existing plain-text help file. Also check the file "sample.fun.rrr" in the "demostuff" subdirectory of this package (see Examples). If something doesn't work, delve more deeply...

There are no "escape characters"– the system is "text WYSIWYG". For example, if you type a \ character in your doc, help will display a \ in that spot. Single quotes and percent signs can have special implications, though– see below.
Section titles should either be fully capitalized, or end with a : character. The capitalized version shows up more clearly in informal help. Replace any spaces with periods, e.g. SEE.ALSO not SEE ALSO. The only non-alpha characters allowed are hyphens.
Subsections are like sections, except they start with a sequence of full stops, one per nesting level. See also Subsections.
"Item lists", such as in the Arguments section and sometimes the Value section (and sometimes other sections), should be indented and should have a colon to separate the item name from the item body.
General lists of items, like this bullet-point list, should be indented and should start with a "-" character, followed by a space.
Your spacing is generally ignored (exceptions: Usage, Examples, multi-line code blocks; see previous point). Tabs are converted to spaces. Text is wrapped, so you should write paragraphs as single lines without hard line breaks. Use blank lines generously, to make your life easier; also, they will help readability of informal helpfiles.
To mark in-line code fragments (including variable names, package names, etc– basically things that R could parse), put them in single quotes. Hence you can't use single quotes within in-line code fragments.

An example of what you couldn't include:

'myfun( "'No no no!'")'

Single quotes are OK within multi-line code blocks, Usage, and Examples. For multi-line code blocks in other sections, don't bother with the single-quotes mechanism. Instead, insert a "%%#" line before the first line of the block, and make sure there is a blank line after the block.
You can insert "hidden lines", starting with a % character, which get passed to the Rd conversion routines. If the line starts with %%, then the Rd conversion routines will ignore it too. The "%%#" line to introduce multi-line code blocks is a special case of this.
Some other special constructs, such as links, can be obtained by using particular phrases in your documentation, as per Special fields.

Subsections

I've bolded some of these meta-refs to sections

Subsections are a nice new feature in R 2.11. You can use them to get better control over the order in which parts of documentation appear. R will order sections thus: Usage, Arguments, Details, Value, other sections you write in alphabetical section order, Notes, See also. That order is not always useful. You can add subsections to Details so that people will see them in the order you want. If you want Value to appear before Details, then just rename Details to "MORE.DETAILS", and put subsections inside that.

In plain-text, subsection headings are just like section headings, except they start with a period (don't use the initial periods when cross-referencing to it elsewhere in the doco). You can have nested subsections by adding extra periods at the start, like this:

Another depth of nesting

In the plain text version of this doco, the SUBSECTIONS line starts with one period, and the ANOTHER.DEPTH.OF.NESTING line starts with two. If you try to increase subsection depth by more than one level, i.e. with 2+ full stops more than the previous (sub)section, then doc2Rd will correct your "mistake".

Special fields

Almost anything between a pair of single quotes will be put into a \code{} or \code{\link{}} or \pkg{} or \env{} construct, and the quotes will be removed. A link will be used if the thing between the quotes is a one-word name of something documented in your package (assuming doc2Rd is being called from pre.install). A link will also be used in all cases of the form "See XXX" or "see XXX" or "XXX (qv)", where XXX is in single quotes, and any " (qv)" will be removed. With "[pP]ackage XXX" and "XXX package", a \pkg{} construct will be used. References to .GlobalEnv and .BaseNamespaceEnv go into \env{} constructs. Otherwise, a \code{} construct will be used, unless the following exceptions apply. The first exception is if the quotes are inside Usage, Examples, or a multi-line code block. The second is if the first quote is preceded by anything other than " ", "(" or "-". The final semi-exception is that a few special cases are put into other constructs, as next.

URLs and email addresses should be enclosed in <...>; they are auto-detected and put into \url{} and \email{} constructs respectively.

Lines that start with a % will have the % removed before conversion, so their contents will be passed to RCMD Rdconv later (unless you start the line with %%). They aren't displayed by dochelp, though, so can be used to hide an unhelpful USAGE, say, or to hide an "#ifdef windows".

A solitary capital-R is converted to \R. Triple dots used to be converted to \dots (regardless of whether they're in code or normal text) but I've stopped doing so because this conversion was taking 97% of the total runtime!

Any reasonable "*b*old" or "_emphatic stuff_" constructions (no quotes, just the asterisks) will go into \bold{} and \emph{} constructs respectively, to give bold or emphatic stuff. (Those first two didn't, because they are "unreasonable"– in particular, they're quoted.) No other fancy constructs are supported (yet).
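
The basic single-quote rule can be mimicked with one regular expression. The sketch below is a deliberate simplification and not the doc2Rd code: it only produces \code{}, ignoring links, \pkg{}, \env{} and the code-block exceptions, and the helper name to_code is hypothetical. A quoted fragment is converted only when the opening quote is at the start of the text or preceded by a space, "(" or "-".

```r
# Hypothetical helper: wrap 'quoted' fragments in \code{}, leaving apostrophes alone
to_code <- function(txt) {
  gsub("(^|[ (-])'([^']+)'", "\\1\\\\code{\\2}", txt)
}

out <- to_code("Use 'myfun' (or 'other.fun') but don't touch x's value")
cat(out, "\n")
# Use \code{myfun} (or \code{other.fun}) but don't touch x's value
```

The apostrophes in "don't" and "x's" survive because each is preceded by a letter, which is the second exception described above.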

Format for non-function help

For documenting datasets, the mandatory sections seem to be Description, Usage, and Format; the latter works just like Arguments, in that you specify field names in a list. Other common sections include Examples, Source, References, and Details.

Extreme details

The first line should be the docfile name (without the Rd) followed by a few spaces and the package descriptor, like so:

utility-funs package:mypack

When doc2Rd runs, the docfile name will appear in both the \name{} field and the first \alias{} field. pre.install will actually create the file "utility-funs.Rd". The next non-blank lines form the other alias entries. Each of those lines should consist of one word, preceded by one or more spaces for safety (not necessary if they have normal names).

"Informal documentation" is interpreted as any documentation that doesn't include a "DESCRIPTION" (or "Description:") line. If this is the case, doc2Rd first looks for a blank line, treats everything before it as \alias{} entries, and then generates the Description section into which all the rest of your documentation goes. No other sections in your documentation are recognized, but all the special field substitutions above are applied. (If you really don't want them to be, use the multi-line code block mechanism.) Token Usage, Arguments, and Keywords sections are appended automatically, to keep RCMD happy.

Section titles built into Rd are: Description, Usage, Synopsis (defunct for R>=3.1), Arguments, Value, Details, Examples, Author or Author(s), See also, References, Note, Keywords and, for data documentation only, Format and Source. Other section titles (in capitals, or terminated with a colon) can be used, and will be sentence-cased and wrapped in a \section{} construct. Subsections work like sections, but begin with a sequence of full stops, one per nesting level. Most cross-refs to (sub)sections will be picked up automatically and put into bold, so that e.g. "see MY.SECTION" will appear as "see My section"; when referring to subsections, omit the initial dots. To force a cross-reference that just doesn't want to appear, use e.g. "MY.SECTION (qv)", or just wrap it in "*...*".

The \docType field is set automatically for data documentation (iff a Format section is found) and for package documentation (iff the name on the first line includes "-package").

Spacing within lines does matter in Usage (qv), Examples, and multi-line code blocks, where what you type really is what you get (except that a fixed indent at the start of all lines in such a block is removed, usually to be reinstated later by the help facilities). The main issue is in the package "manual" that RCMD generates for you, where the line lengths are very short and overflows are common. (Overflows are also common with in-line code fragments, but little can be done about that.) The "RCMD Rd2dvi --pdf" utility is helpful for seeing how individual helpfiles come out.

In See also, the syntax is slightly different; names of things to link to should not be in single quotes, and should be separated by commas or semicolons; they will be put into \code{\link{}} constructs. You can split SEE.ALSO across several lines; this won't matter for pager help, but can help produce tidier output in the file "***-manual.tex" produced by RCMD CHECK.

In Examples, to designate "don't run" segments, put a "## Don't run" line before and a "## End don't run" line after.

I never bother with Keywords (except sometimes "internal", to avoid exporting something), but if you do, then separate the keywords with commas, semicolons, or line breaks; don't use quotes. A token Keywords section will be auto-generated if you don't include one, to keep RCMD happy.
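
Pulling the rules above together, a small piece of formal plain-text documentation might look like the character vector below. It is a hand-written sketch, not output from fixr or any template, and my.fun, other.fun and mypack are made-up names. It has not been run through doc2Rd here, so treat details such as blank-line placement as assumptions.

```r
flat_doc <- c(
  "my.fun    package:mypack",          # docfile name, a few spaces, package descriptor
  "",
  "DESCRIPTION",                       # section titles fully capitalized
  "",
  "'my.fun' adds one to its argument. See 'other.fun' for subtraction.",
  "",
  "USAGE",
  "",
  "my.fun( x)",
  "",
  "ARGUMENTS",
  "",
  " x: a numeric vector",              # item list: indented, colon after the item name
  "",
  "VALUE",
  "",
  "A numeric vector the same length as 'x'.",
  "",
  "SEE.ALSO",                          # spaces replaced by periods; no quotes around names
  "",
  "other.fun",
  "",
  "EXAMPLES",
  "",
  "my.fun( 1:3)",
  "## Don't run",
  "my.fun( 'oops')",
  "## End don't run"
)
cat(flat_doc, sep = "\n")
```

With mvbutils loaded, passing such a vector to doc2Rd (or docotest) is the way to see how it converts.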

Infrequently asked questions

Q: Why didn't you use Markdown/MyPetBargainSyntax?

A: Mainly because I didn't know about them, to be honest. But WRT Markdown it seemed to me that the hard-line-breaks feature would be a pain. If anyone thinks there's a really good alternative standard, please let me know.

Q: I have written a fancy displayed equation using \deqn{} and desperately want to include it. Can I?

A: Yes (though are you sure that a fancy equation really belongs in your function doco? how about in an attached PDF, or vignette?). Just prefix all the lines of your \deqn with %. If you want something to show up in informal help too, then make sure you also include lines with the text version of the equation, as per the next-but-one question.

Q: I have written a fancy in-line equation using \eqn{} and desperately want to include it. Can I?

A: No. Sorry.

Q: For some reason I want to see one thing in informal help (i.e. when the package isn't actually loaded but just sitting in a task on the search path), but a different thing in formal help. Can I do that?

A: If you must. Use the %-line mechanism for the formal help version, and then insert a line "%#ifdef flub" before the informal version, and a line "%#endif" after it. Your text version will show up in informal help, and your fancy version will show up in all help produced via Rd. (Anyone using the "flub" operating system will see both versions...)

Q: How can I insert a file/kbd/samp/option/acronym etc tag?

A: You can't. They all look like single quotes in pager-style help, anyway.

Q: What about S3?

A: S3 methods often don't need to be documented. However, they can be documented just like any other function, except for one small detail: in the Usage section, the call should use the generic name instead of your method name, and should be followed by a comment "# S3 method for <class>"; you can append more text to the comment if you wish. E.G.: if you are documenting a method print.cat, the Usage section should contain a call to print(x,...) # S3 method for cat rather than print.cat(x,...). The version seen by the user will duplicate this "S3 method..." information, but never mind eh.

If you are also (re)defining an S3 generic and documenting it in the same file as various methods, then put a comment # generic on the relevant usage line. See ?print.function for associated requirements.

Confusion will deservedly arise with a function that looks like an S3 method, but isn't. It will not be labelled as S3 by pre.install because you will of course have used the full name in the Usage section, because it isn't a method. However, it can still be found by NextMethod etc., so you shouldn't do that. (Though mvbutils::max.pkg.ver currently does exactly that...)

S3 classes themselves need to be documented either via a relevant method using an alias line, or via a separate myclass.doc text object.

Q: What about S4?

A: I am not a fan of S4 and have found no need for it in many 1000s of lines of R code... hence I haven't included any explicit support for it so far. Nevertheless, things might well work anyway, unless special Rd constructs are needed. If doc2Rd doesn't work for your S4 stuff (bear in mind that the %-line mechanism may help), then for now you'll still have to write S4 Rd files yourself; see pre.install for where to put them. However, if anyone would like the flatdoc facility for S4 and is willing to help out, I'm happy to try to add support.

Examples

## Needs a function with the right kind of "doc" attr
## Look at file "demostuff/sample.fun.rrr"
# NB source.mvb() also works, _without_ the '$value', but obsolete from at least v2.12
sample.fun <- source( system.file( file.path(
    'demostuff', 'sample.fun.rrr'), package='mvbutils'))$value
print( names( attributes( sample.fun)))
cat( '***Original plain-text doco:***\n')
print( as.cat( attr( sample.fun, 'doc'))) # unescaped, ie what you'd actually edit
cat( '\n***Rd output:***\n')
sample.fun.Rd <- doc2Rd( sample.fun)
print( sample.fun.Rd) # already "cat" class
## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  docotest( sample.fun) # should display in browser
} # if F

## End(Not run)

Flat-format documentation

Description

The docattr convention, and its obsolete ancestor flatdoc, lets you edit plain-text documentation (or other plain-text attributes) in the same file as your function's source code. You can add docattr yourself; in the full mvbutils scheme, that's rarely done explicitly, but you will see them in text files produced by fixr. They are mostly used to write Rd-style help with almost no markup (much cleaner than Roxygen!) that will be converted into Rd-format when building/exporting packages. However, mvbutils extends help so that ?myfunc will display plain-text documentation for myfunc, even if myfunc isn't in a package. There are no restrictions on the format of informal-help documentation, so docattr is useful for adding quick simple help just for you or for colleagues. If your function is to be part of a maintained package (see mvbutils.packaging.tools), then the documentation should follow a slightly more formal structure; use fixr( myfun, new.doc=T) to set up the appropriate template.

A neat trick, for a function where you want "internal" documentation but not visible (yet), is to name the attribute "secret_doc" rather than "doc".

The difference between these two functions is that docattr (which requires R >= 4.1) has completely regular R syntax, taking advantage of "raw strings" (see Quotes). flatdoc had to use a rather devious trick which required that the file was subsequently read in by source.mvb rather than source. If you have a task package wundapak which uses flatdoc, then you can convert it with eg tidyup_docattr(..wundapak).

docattr is a simple wrapper for string2charvec, to which it just adds the "docattr" class so that the documentation is not printed by default; you will just see "# FLAT-FORMAT DOCUMENTATION" appended to the function body.
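
The two ingredients mentioned here, raw strings and a print method that hides the text, can be tried in base R without mvbutils. The sketch below is only an analogy: the class name hidden_doc is hypothetical, and dropping the empty first line is an assumption about sensible behaviour, not a statement of what string2charvec does.

```r
# A raw string needs no escaping of quotes or backslashes
raw_text <- r"--{
It's "quoted", 'single-quoted', and has a \ backslash.
Second line.
}--"

# Split into one element per line; the first element is empty because the
# raw string starts with a newline, so it is dropped here (assumption)
doc_lines <- strsplit(raw_text, "\n", fixed = TRUE)[[1]][-1]
length(doc_lines)   # 2

# A class whose print method shows a placeholder instead of the text
hidden_doc <- structure(doc_lines, class = "hidden_doc")
print.hidden_doc <- function(x, ...) {
  cat("# FLAT-FORMAT DOCUMENTATION\n")
  invisible(x)
}
print(hidden_doc)
```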

Usage

# ALWAYS use it like this:
# structure( function( ...) {body},
# doc=docattr( r"--{
#... including newlines, etc}--
# }--"))
# almost NEVER like this
docattr( rawstr)
# Obsolete flatdoc() version
# ALWAYS use it like this:
# structure( function( ...) {body},
# doc=flatdoc( EOF="<<end of doc>>"))
# plaintext doco goes here...
# NEVER use it like this:
flatdoc( EOF="<<end of doc>>")
tidyup_docattr( e)

Arguments

rawstr

a single string, almost certainly a raw string containing the plain-text documentation.

EOF

character string showing when plain text ends, as in readlines.mvb

body

replace with your function code

...

replace with your function arg list

e

an environment (usually a task package, starting with ..— but it could be .GlobalEnv), or the name of an environment.

Value

docattr returns a character vector of class docattr. The print method for docattr objects just displays the string "# FLAT-FORMAT DOCUMENTATION", to avoid screen clutter.

Internal details of flatdoc

This section can be safely ignored by almost all users, and as of mvbutils v2.11.0, it's obsolete anyway since docattr should now replace flatdoc throughout.

On some text editors, you can modify syntax highlighting so that the "start of comment block" marker is set to the string "doc=flatdoc(".

It's possible to use flatdoc to read in more than one free-format text attribute. The EOF argument can be used to distinguish one block of free text from the next. These attributes can be accessed from your function via attr( sys.function(), "<<attr.name>>"), and this trick is occasionally useful to avoid having to include multi-line text blocks in your function code; it's syntactically clearer, and avoids having to escape quotes, etc. mvbutils:::docskel shows one example.

fixr uses write.sourceable.function to create text files that use the flatdoc convention. Its counterpart FF reads these files back in after they're edited. The reading-in is not done with source but rather with source.mvb, which understands flatdoc. The call to doc=flatdoc causes the rest of the file to be read in as plain text, and assigned to the doc attribute of the function. Documentation can optionally be terminated before the end of the file with the following line:

  <<end of doc>>

or whatever string is given as the argument to flatdoc; this line will cause source.mvb to revert to normal statement processing mode for the rest of the file. Note that vanilla source will not respect flatdoc; you do need to use source.mvb.

flatdoc should never be called from the command line; it should only appear in text files designed for source.mvb.

The rest of this section is probably obsolete, though things should still work.

If you are writing informal documentation for a group of functions together, you only need to flatdoc one of them, say myfun1. Informal help will work if you modify the others to e.g.

  myfun2 <- structure( function(...) { whatever}, doc=list("myfun1"))

If you are writing with doc2Rd in mind and a number of such functions are to be grouped together, e.g. a group of "internal" functions in preparation for formal package release, you may find make.usage.section and make.arguments.section helpful.

Author(s)

Mark Bravington

Examples

# This illustrates the general format for a function with attached plain-text
# documentation. It is the format produced by write_sourceable_function()
flubbo <- structure( function( x){
  ## A comment
  x+1
}
,doc=mvbutils::docattr( r"-{
flubbo       not-yet-in-a-package
'flubbo' is a function! And here is some informal doco for it. Whoop-de-doo!
See "sample.fun.rrr" in the "demostuff" folder for a better example, with full paragraphs.
I have had to shorten this one to appease the CRANia.
}-")
)
# Here is one way to add a text attribute to a function:
myfun <- structure( function( myname){
  texto <- attr( sys.function(), 'text')
  sprintf( texto, myname)
}
, text= mvbutils::docattr( r"--{
It's all about %s!
The "universe" 'revolves' around %s!
}--"))
myfun( 'potatoes')
## Don't run
## OBSOLETE: 'flatdoc' itself is superseded by 'docattr'
## Put next lines up to "<<end of doc>>" into a text file <<your filename>>
## and remove the initial hashes
#structure( function( x) {
#  x*x
#}
#,doc=flatdoc("<<end of doc>>"))
#
#Here is some informal documentation for the "SQUARE" function
#<<end of doc>>
## Now try SQUARE <- source.mvb( <<your filename>>); ?SQUARE
## Example with multiple attributes
## Put the next lines up to "<<end of part 2>>"
## into a text file, and remove the single hashes
#myfun <- structure( function( attname) {
#  attr( sys.function(), attname)
#}
#,  att1=flatdoc( EOF="<<end of part 1>>")
#,  att2=flatdoc( EOF="<<end of part 2>>"))
#This goes into "att1"
#<<end of part 1>>
#and this goes into "att2"
#<<end of part 2>>
## Now "source.mvb" that file, to create "myfun"; then:
# myfun( 'att1') # "This goes into \\"att1\\""
# myfun( 'att2') # "and this goes into \\"att2\\""
## End don't run

Documentation (informal help)

Description

dochelp(topic) will be invoked by the replacement help if conventional help fails to find documentation for topic topic. If topic is an object with a doc attribute (or failing that if <<topic>> or <<topic>>.doc is a character vector), then the attribute (or the character object) will be formatted and displayed by the pager or browser. dochelp is not usually called directly.

Usage

# Not usually called directly
# If it is, then normal usage is: dochelp( topic)
dochelp( topic, doc, help_type=c( "text", "html"))
# Set options( mvb_help_type="text") if the browser gives you grief

Arguments

topic

(character) name of the object to look for help on, or name of "...doc" character object– e.g. either thing or thing.doc if the character object is thing.doc.

doc

(character or list)– normally not set, but deduced by default from topic; see Details.

help_type

as per help. Defaults to getOption( "mvb_help_type") in normal usage, which in turn defaults to getOption( "help_type") as for standard help. Only "text" and "html" are supported by dochelp; anything else maps to "text", which invokes R's internal pager.

Details

dochelp will only be called if the original help call was a simple help( topic=X, ...) form, with X not a call and with no try.all.packages or type or lib.loc arguments (the other help options are OK).

The doc argument defaults to the doc attribute of get(topic). The only reason to supply a non-default argument would be to use dochelp as a pager; this might have some value, since dochelp does reformat character vectors to fit nicely in the system pager window, one paragraph per element, using strwrap. Elements starting with a "%" symbol are not displayed.

To work with dochelp, a doc attribute should be either:

a character vector, of length >=1. New elements get line breaks in the pager. Or:
a length-one list, containing the name of another object with a doc attribute. dochelp will then use the doc attribute of that object instead. This referencing can be iterated.
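The two permitted forms of doc attribute can be illustrated with a small base-R sketch. The helper resolve_doc_sketch is hypothetical (it is not part of mvbutils, and dochelp's own lookup may well differ); it just follows length-one list references until it reaches a character vector.

```r
# Hypothetical helper, NOT part of mvbutils: follow list-style 'doc'
# references until a character-vector 'doc' attribute is reached.
resolve_doc_sketch <- function(name, env = parent.frame(), max_hops = 10L) {
  force(env)
  for (hop in seq_len(max_hops)) {
    doc <- attr(get(name, envir = env), "doc")
    if (is.character(doc)) return(doc)        # form 1: the documentation itself
    if (is.list(doc) && length(doc) == 1L) {
      name <- doc[[1]]                        # form 2: name of another documented object
    } else {
      stop("no usable 'doc' attribute on ", name)
    }
  }
  stop("too many levels of reference")
}

myfun1 <- structure(function(x) x + 1, doc = c("myfun1", "", "Adds one to x."))
myfun2 <- structure(function(x) x + 2, doc = list("myfun1"))
myfun3 <- structure(function(x) x + 3, doc = list("myfun2"))   # iterated reference
resolved <- resolve_doc_sketch("myfun3")
```

Here resolved ends up as the character vector attached to myfun1, reached via myfun2.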

If the documentation is very informal, start it with a blank line to prevent find.documented( ..., doctype="Rd") from finding it.

With help_type="text", the doco will be re-formatted to fit the pager; each paragraph should be a single element in the character vector. Elements starting with a % will be dropped (but may still be useful for doc2Rd).
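As a rough illustration of the text-mode behaviour described above, here is a base-R sketch that drops elements starting with "%" and wraps each remaining element as one paragraph using strwrap. The function name format_doc_sketch and the width default are assumptions made for this example; dochelp's actual formatting code is not shown here.

```r
# Hypothetical helper, NOT part of mvbutils: one paragraph per element,
# wrapped with strwrap; elements starting with "%" are not displayed.
format_doc_sketch <- function(doc, width = 60) {
  doc <- doc[!startsWith(doc, "%")]
  unlist(lapply(doc, function(p) {
    if (nzchar(p)) strwrap(p, width = width) else ""   # keep empty elements as blank lines
  }))
}

doc <- c("%hidden from the pager",
         paste(rep("word", 30), collapse = " "),
         "short")
out <- format_doc_sketch(doc, width = 40)
cat(out, sep = "\n")
```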

With help_type="html", the doco will be passed thru doc2Rd and then turned into HTML. doc2Rd is pretty forgiving and has a fair crack at converting even very informal documentation, but does have its limits. If there is an error in the doc2Rd conversion then help_type will be reset to "text".

flatdoc offers an easy way to incorporate plain-text (flat-format) documentation– formal or informal– in the same text file as a function definition, allowing easy maintenance. The closer you get to the displayed appearance of formal R-style help, the nicer the results will look in a browser (assuming help_type="html"), but the main thing is to just write some documentation– the perfect is the enemy of the good in this case!

Author(s)

Mark Bravington

Examples

#
myfun <- structure( function() 1,
  doc="Here is some informal documentation for myfun\n")
dochelp( "myfun")
help( "myfun") # calls dochelp

Prevent sealing of a namespace, to facilitate package maintenance.

Description

Call dont.lock.me() during a .onLoad to stop the namespace from being sealed. This will allow you to add/remove objects to/from the namespace later in the R session (in a sealed namespace, you can only change objects, and you can't unseal a namespace retrospectively). There could be all sorts of unpleasant side-effects. Best to leave it to maintain.packages to look after this for you...

Usage

# default of env works if called directly in .onLoad
dont.lock.me( env=environment( sys.function( -1)))

Arguments

env

the environment to not lock.

Details

dont.lock.me hacks the standard lockEnvironment function so that locking won't happen if the environment has a non-NULL dont.lock.me attribute. Then it sets this attribute for the namespace environment.
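For background, this base-R sketch shows what locking does to an ordinary environment (not a real namespace, and without calling dont.lock.me): existing objects can still be changed when bindings=FALSE, but nothing can be added or removed.

```r
e <- new.env()
assign("a", 1, envir = e)
lockEnvironment(e)                 # bindings = FALSE by default

assign("a", 2, envir = e)          # changing an existing object still works
add_attempt <- try(assign("b", 3, envir = e), silent = TRUE)  # adding is refused
rm_attempt  <- try(rm("a", envir = e), silent = TRUE)         # removing is refused

environmentIsLocked(e)
```

Base R offers no function to unlock an environment afterwards, which is why the locking has to be prevented up front.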

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  # This unseals the namespace of MYPACK only if the option "maintaining.MYPACK" is non-NULL:
  .onLoad <- function( libname, pkgname) {
    if( !is.null( getOption( 'maintaining.' %&% pkgname)))
      mvbutils:::dont.lock.me()
  }
} # if F

## End(Not run)

Helper for live-editing of packages

Description

Normally, objects in a NAMESPACEd package are locked and can't be changed. Sometimes this isn't what you want; you can prevent it by calling dont.lockBindings in the .onLoad for the package. For user-visible objects (i.e. things that end up in the "package:blah" environment on the search path), you can achieve the same effect by calling dont.lockBindings in the package's .onAttach function, with namespace.=FALSE.

Usage

 dont.lockBindings( what, pkgname, namespace.=TRUE)

Arguments

what

(character) the names of the objects to not lock.

pkgname

(string) the name of the package. As you will only use this inside .onLoad, you can just set this to pkgname which is an argument of .onLoad.

namespace.

TRUE to antilock in the namespace during .onLoad; FALSE to antilock in the visible manifestation of the package.

Details

Locking occurs after .onLoad / .onAttach are called so, to circumvent it, dont.lockBindings creates a hook function to be called after the locking step.
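The kind of lock being circumvented can be seen on an ordinary environment with base R's lockBinding and unlockBinding. This is only an illustration of binding locks; it does not reproduce the hook mechanism that dont.lockBindings sets up, and the object name tracees is borrowed from the example below purely for flavour.

```r
e <- new.env()
assign("tracees", list(), envir = e)

lockBinding("tracees", e)
locked_try <- try(assign("tracees", list(f = 1), envir = e), silent = TRUE)  # fails

unlockBinding("tracees", e)
assign("tracees", list(f = 1), envir = e)   # now allowed
bindingIsLocked("tracees", e)
```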

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  library( debug)
  debug:::.onLoad # d.lB is called to make 'tracees' editable inside 'debug's namespace.
  debug:::.onAttach # d.lB is called to make 'tracees' editable in the search path
  # NB also that an active binding is used to ensure that the 'tracees' object in the search...
  #... path is a "shadow of" or "pointer to" the one in 'debug's namespace; the two cannot get...
  #... out-of-synch
} # if F

## End(Not run)

Create variables from corresponding named list elements

Description

This is a convenience function for creating named variables from lists. It's particularly useful for "unpacking" the results of calls to .C.

Usage

extract.named( l, to=parent.frame())

Arguments

l

a list, with some named elements (no named elements is OK but pointless)

to

environment

Value

nothing directly, but will create variables

Author(s)

Mark Bravington

Examples

ff <- function(...) { extract.named( list(...)); print( ls()); bbb }
# note bbb is not "declared"
ff( bbb=6, ccc=9) # prints [1] "bbb" "ccc", returns 6
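For readers without mvbutils to hand, a roughly equivalent effect can be sketched in base R with list2env. The helper extract_named_sketch is hypothetical and may differ from extract.named in edge cases; it assumes that unnamed elements should simply be skipped.

```r
# Hypothetical base-R stand-in, NOT the mvbutils implementation
extract_named_sketch <- function(l, to = parent.frame()) {
  force(to)                          # pin down the caller's frame straight away
  nm <- names(l)
  if (is.null(nm)) return(invisible(NULL))   # nothing is named: nothing to do
  list2env(l[nzchar(nm)], envir = to)        # create one variable per named element
  invisible(NULL)
}

gg <- function(...) {
  extract_named_sketch(list(...))
  bbb + ccc                          # neither is "declared" in gg
}
result <- gg(bbb = 6, 99, ccc = 9)   # the unnamed 99 is ignored
```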

Read in fixed-width files quickly

Description

Experimental replacement for read.fwf that runs much faster. Included in mvbutils only to reduce dependencies amongst my other packages.

Usage

fast.read.fwf(file, width,
    col.names = if (!is.null(colClasses))
        names( colClasses) else "V" %&%  1:ncol(fields),
    colClasses = character(0), na.strings = character(0L),  tz = "", ...)

Arguments

file

character

width

vector of column widths. Negative numbers mean "skip this many columns". Use an NA as the final element if there are likely to be extra characters at the end of each row after the last one that you're interested in.

col.names

names for the columns that are NOT skipped

colClasses

can be used to control type conversion; see read.table. It is an optional vector whose names must be part of col.names. There is one extension of the read.table rules: a colClass string starting POSIXct. will trigger automatic conversion to POSIXct, using the rest of the string as the format specifier. See also tz.

na.strings

are there any strings (other than NA) which should convert to NAs?

tz

used in auto-conversion to POSIXct when colClass is set

...

ignored; it's here so that this function can be called just like read.fwf

Value

A data.frame, as per read.fwf and read.table.
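The negative-width convention is shared with base R's read.fwf, so it can be tried out without mvbutils. The sketch below uses utils::read.fwf, not fast.read.fwf, and the file contents are made up for illustration. Note that read.fwf takes widths and an unnamed colClasses here, whereas fast.read.fwf takes width and expects colClasses to be named.

```r
tf <- tempfile(fileext = ".txt")
writeLines(c("AB 123",
             "CD 456"), tf)

# Two characters, skip one, then three characters
dat <- utils::read.fwf(tf, widths = c(2, -1, 3),
                       col.names = c("code", "num"),   # names only for non-skipped columns
                       colClasses = c("character", "integer"))
unlink(tf)
dat
```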

Support for flat-format documentation

Description

find.documented locates functions that have flat-format documentation; the functions and their documentation can be separate, and are looked for in all the environments in pos, so that functions documented in one environment but existing in another will be found. find.docholder says where the documentation for one or more functions is actually stored. Both find.documented and find.docholder check two types of object for documentation: (i) functions with "doc" attributes, and (ii) character-mode objects whose name ends in ".doc".

Usage

find.documented( pos=1, doctype=c( "Rd", "casual", "own", "any"),
  only.real.objects=TRUE, exclude.internal=FALSE)
find.docholder( what, pos=find( what[1]))

Arguments

pos

search path position(s), numeric or character. In find.documented, any length. In find.docholder, only pos[1] will be used; it defaults to where the first element of what is found.

doctype

Defaults to "Rd". If supplied, it is partially matched against the choices in Usage. "Rd" functions are named in the alias list at the start of (i) any doc attribute of a function, and (ii) any text object whose name ends with ".doc", that exist in pos (see doc2Rd). "casual" functions have their own doc attribute and will be found by the replacement of help; note that the doc attribute can be just a reference to another documented function, of mode "list" as described in dochelp. "own" functions (a subset of "casual") have their own character-mode doc attribute, and are suitable for doc2Rd. "any" combines casual and Rd.

only.real.objects

If TRUE, only return names of things that exist somewhere in the pos environments. FALSE means that other things such as the name of helpfiles might be returned, too.

exclude.internal

If TRUE, check the doc attributes to see if they have "KEYWORDS<whitespace>internal", and if so, omit that function. Normally you probably wouldn't want that yourself; but it is used in make.NAMESPACE to decide about exportees.

what

names of objects whose documentation you're trying to find.

Value

find.documented

Character vector of function names.

find.docholder

list whose names are what; element i is a character vector showing which objects hold documentation for what[i]. Normally you'd expect either 0 or 1 entries in the character vector; more than 1 would imply duplication.

Note

doctype="Rd" looks for the alias names, i.e. the first word of all lines occurring before the first blank line. This may include non-existent objects, but these are checked for and removed.

Start informal documentation (i.e. not intended for doc2Rd) with a blank line to avoid confusion.
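The alias rule in this Note (first word of every line before the first blank line) can be sketched in base R as follows. The helper alias_names_sketch is hypothetical and is not how find.documented is necessarily implemented; the later check that the aliases really exist is left out.

```r
# Hypothetical helper, NOT part of mvbutils
alias_names_sketch <- function(doc) {
  blank <- which(!nzchar(trimws(doc)))
  n_head <- if (length(blank)) blank[1] - 1L else length(doc)
  head_lines <- doc[seq_len(n_head)]            # lines before the first blank line
  sub("^\\s*(\\S+).*$", "\\1", head_lines)      # first word of each
}

formal   <- c("myfun1    package:mypack", "myfun2", "", "Title goes here")
informal <- c("", "Just some notes about myfun1")

alias_names_sketch(formal)    # "myfun1" "myfun2"
alias_names_sketch(informal)  # character(0): a leading blank line hides it
```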

Author(s)

Mark Bravington

Shows functions and scriptlets sorted by date of edit

Description

fix.order sorts the functions and scriptlets according to the filedates of their backups (in the .Backup.mvb directory). This is very useful for reminding yourself what you were working on recently. It only works if functions and scriptlets have been edited using the fixr system.

Usage

fix.order( env=1)

Arguments

env

a single number, character string, or environment. Numbers and characters are interpreted as search path positions. The environment must be an attached mvb-style task.

Details

Only objects that have a BU*** backup file will appear. Objects that have a BU*** file but have been deleted will not appear.
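The underlying idea, ordering names by the modification time of a matching file, can be sketched in base R. Everything here is an assumption for illustration: the file names are invented, the real backup-file naming is not reproduced, and the oldest-first ordering is just one possible choice.

```r
d <- tempfile("backups")
dir.create(d)
files <- file.path(d, c("alpha", "beta", "gamma"))
for (f in files) writeLines("x", f)

# Give the files known, distinct modification times
t0 <- as.POSIXct("2024-01-01 12:00:00", tz = "UTC")
offsets <- c(200, 0, 100)
for (i in seq_along(files)) Sys.setFileTime(files[i], t0 + offsets[i])

by_date <- basename(files)[order(file.mtime(files))]   # oldest first
unlink(d, recursive = TRUE)
by_date
```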

Value

Character vector of functions and scriptlets sorted by date/time of last modification.

To do

Probably should modify this so it takes an arbitrary task path instead of a search position only. Task doesn't really need to be attached.

Add a pattern argument a la find.funs.

Author(s)

Mark Bravington

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  ## Need to create backups and do some function editing first
  fix.order() # functions in .GlobalEnv
  fix.order( "ROOT") # functions in your startup task
} # if F

## End(Not run)

Editing functions, text objects, and scriptlets

Description

fixr opens a function (or text object, or "script" stored as an R expression; see Scriptlets) in your preferred text editor. Control returns immediately to the R command line, so you can keep working in R and can be editing several objects simultaneously (cf edit). A session-duration list of objects being edited is maintained, so that each object can be easily sourced back into its rightful workspace. These objects will be updated automatically on file-change if you've run autoedit( TRUE) (e.g. in your .First), or manually by calling FF(). There is an optional automatic text backup facility. It is designed to work with the cd system, and may well not work outside of that system (note that you are automatically in that system if you call library(mvbutils)).

The safest is to call fixtext to edit text objects, and fixr for functions and everything else. However, fixr can handle both, and for objects that already exist it will preserve the type. For new objects, though, you have to specify the type by calling either fixr or fixtext. If you forget— ie if you really wanted to create a new text object, but instead accidentally typed fixr( mytext)— you will (probably) get a parse error, and mytext will then be "stuck" as a broken function. Your best bet is to copy the actual contents in the text-editor to the clipboard, type fixtext( mytext) in R, paste the old contents into the text-editor, and save the file; R will then reset the type and all should be well.

readr also opens a file in your text editor, but in read-only mode, and doesn't update the backups or the list of objects being edited.

fixr is designed for interfacing stand-alone text editors with R. I've never tried to interface it with e.g. ESS or RStudio; that might be possible, and even desirable, not least because of the next subsection.

Packages

fixr works with the mvbutils package-maintenance system, and in fact is probably the only way to edit stuff inside that system. If your maintained package mypack (see maintain.packages) is already loaded, changes will be reflected immediately in the namespace of mypack, and also in any packages that import mypack, and in lists of S3 methods. The sole exception is if your function already exists in asNamespace(mypack) and consists of a single call to .Call or .External. In that case, it is assumed to be an automated wrapper to low-level code that was created when the DLL was loaded, and thus should not be mucked about with; a warning is issued (but it's usually fine). [To force an overwrite without a rebuild, you'd have to first delete the function directly from the namespace (yikes); but you're probably at a point of needing to rebuild the package anyway.] The newly-edited version will still go into the source-package for mypack, though, even though it is just a placeholder. The reason you might want to do that is merely to note the existence of this wrapper-to-low-level (and what its arguments are), and to document it (either internally for use in the package, or for actual export if the user can call it directly).

Usage

  # Usually: fixr( x) or fixr( x, new.doc=T)
  fixr( x, new=FALSE, install=FALSE, what, fixing, pkg=NULL,
      character.only=FALSE, new.doc=FALSE, force.srcref=FALSE,
      stop.fixing=character())
  # fixtext really has exact same args as fixr, but technically its args are:
  fixtext( x, ...)
  # Usually: readr( x) but exact same args as fixr, though the defaults are different
  readr( x, ...)
  FF() # manual check and update, usually only needed...
      # ... temporarily if autoedit() stops working
  autoedit( do=TRUE) # stick this line in your .First

Arguments

x

a quoted or unquoted name of a function, text object, or expression. You can also write mypack$myfun, or mypack::myfun, or mypack:::myfun, or ..mypack$myfun, to simultaneously set the pkg argument (only if mypack has been set up with maintain.packages). Note that fixr uses non-standard evaluation of its x argument, unless you specify character.only=TRUE. If your object has a funny name, either quote it and set character.only=TRUE, or pass it directly as...

character.only

(logical or character) if TRUE, x is treated as a string naming the object to be edited, rather than the unquoted object name. If character.only is a string, it is treated as the name of x, so that eg fixr(char="funny%name") works.

new.doc

(logical) if TRUE, add skeleton plain-text R-style documentation, as per add.flatdoc.to. Also use this to create an empty scriptlet for a general (non-function, non-text) object.

force.srcref

(logical) Occasionally there have been problems transferring old code into "new" R, especially when a function has text attributes such as (but not limited to) doc; the symptom is, they appear in the editor just as "# FLAT-FORMAT DOCUMENTATION". This sometimes requires manual poking-around, but usually can be sorted out by calling fixr(...,force.srcref=TRUE).

new

(logical, seldom used) if TRUE, edit a blank function template, rather than any existing object of that name elsewhere in the search path. New edit will go into .GlobalEnv unless argument pkg is set.

install

(logical, rarely used) whether to go through the process of asking you about your editor.

what

Don't use this– it's "internal"! [Used by fixtext, which calls fixr with what="" to force text-mode object. what should be an object with the desired class.]

fixing

(logical, rarely used) FALSE for read-only (i.e. just opening editor to examine the object)

pkg

(string or environment) if non-NULL, specifies the maintained package (see maintain.packages) in which x should be looked for.

do

(logical) TRUE => automatically update objects from altered files; FALSE => don't.

...

other arguments, except what in fixtext, and fixing in readr, are passed to fixr.

stop.fixing

(character vector) removes these items from fix list.

Details

When fixr is run for the first time (or if you set install=TRUE), it will ask you for some basic information about your text editor. In particular, you'll need to know what to type at a command prompt to invoke your text editor on a specific file; in Windows, you can usually find this by copying the Properties/Shortcut/Target field of a shortcut, followed by a space and the filename. After supplying these details, fixr will launch the editor and print a message showing some options ("backup.fix", "edit.scratchdir" and "program.editor"), that will need to be set in your .First function. You should now be able to do that via fixr(.First).

Changes to the temporary files used for editing can be checked for automatically whenever a valid R command is typed (e.g. by typing 0<ENTER>; <ENTER> alone doesn't work). To set this up, call autoedit() once per session, e.g. in your .First. The manual version (ie what autoedit causes to run automatically) is FF(). If any file changes are detected by FF, the code is sourced back in and the appropriate function(s) are modified. FF tries to write functions back into the workspace they came from, which might not be .GlobalEnv. If not, you'll be asked whether you want to Save that workspace (provided it's a task– see cd). FF should still put the function in the right place, even if you've called cd after calling fixr (unless you've detached the original task) or if you moved it. If the function was being mtraced (see package?debug), FF will re-apply mtrace after loading the edited version. If there is a problem with parsing, the source attribute of the function is updated to the new code, but the function body is invisibly replaced with a stop call, stating that parsing failed.
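The parse-failure behaviour described at the end of that paragraph can be mimicked in base R. This is a sketch under assumptions, not the code used by FF: the helper load_edited_sketch is hypothetical, it handles a single function definition only, and it stores the text in a "source" attribute as the paragraph describes.

```r
# Hypothetical helper, NOT part of mvbutils
load_edited_sketch <- function(code_lines) {
  parsed <- tryCatch(parse(text = code_lines), error = function(e) NULL)
  f <- if (is.null(parsed)) {
    function(...) stop("parsing failed")       # placeholder body
  } else {
    eval(parsed[[1]], envir = globalenv())     # the edited function
  }
  attr(f, "source") <- code_lines              # keep the text either way
  f
}

good <- load_edited_sketch("function(x) x + 1")
bad  <- load_edited_sketch("function(x) {")     # unbalanced brace
good(1)
bad_result <- try(bad(1), silent = TRUE)        # an error, but the text survives
attr(bad, "source")
```

Keeping the text even when parsing fails means the broken code can be reopened and repaired rather than lost.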

If something goes wrong during an automatic call to FF, the automatic-call feature will stop working; this is rare, but can be caused eg by hitting <ESC> while being prompted whether to save a task. To restart the feature in the current R session, do autoedit(F) and then autoedit(T). It will come back anyway in a new R session.

readr requires a similar installation process. To get the read-only feature, you'll need to add some kind of option/switch on the command line that invokes your text editor in read-only mode; not all text editors support this. Similarly to fixr, you'll need to set options( program.reader=<<something>>) in your .First; the installation process will tell you what to use.

fixr, and of course fixtext, will also edit character vectors. If the object to be edited exists beforehand and has a class attribute, fixr will not change its class; otherwise, the class will be set to "cat". This means that print invokes the print.cat method, which displays text more readably than the default. Any other attributes on character vectors are stripped.

For functions, the file passed to the editor will have a ".r" extension. For character vectors or other things, the default extension is ".txt", which may not suit you since some editors decide syntax-highlighting based on the file extension. (EG if the object is a character-vector "R script", you might want R-style syntax highlighting.) You can somewhat control that behaviour by setting options()$fixr.suffices, eg

  options( fixr.suffices=c( r='.r', data='.dat'))

which will mean that non-function objects whose name ends .r get written to files ending ".r.r", and objects whose name ends .data get written to files ending ".data.dat"; any other non-functions will go to files ending ".txt". This does require you to use some discipline in naming objects, which is no bad thing; FWIW my "scripts" always do have names ending in .r, so that I can see what's what.
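To make the naming rule concrete, here is a base-R sketch of the mapping from a non-function object's name to its file name, as described above. The helper edit_file_name_sketch is hypothetical; fixr's own logic, and the directory it writes to, are not shown.

```r
suffices <- c(r = '.r', data = '.dat')    # as in options( fixr.suffices=...)

# Hypothetical helper, NOT part of mvbutils; for non-function objects only
edit_file_name_sketch <- function(objname, suffices) {
  has_dot <- grepl(".", objname, fixed = TRUE)
  ending <- sub("^.*\\.", "", objname)     # text after the last dot
  if (has_dot && ending %in% names(suffices)) {
    paste0(objname, suffices[[ending]])
  } else {
    paste0(objname, ".txt")                # default for everything else
  }
}

edit_file_name_sketch("analysis.r", suffices)   # "analysis.r.r"
edit_file_name_sketch("catch.data", suffices)   # "catch.data.dat"
edit_file_name_sketch("notes", suffices)        # "notes.txt"
```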

fixr creates a blank function template if the object doesn't exist already, or if new=TRUE. If you want to create a new character vector as opposed to a new function, call fixtext, or equivalently set what="" when you call fixr.

If the function has attributes, the version in the text editor will be wrapped in a structure(...) construct (and you can do this yourself). If a doc attribute exists, it's printed as free-form text at the end of the file, nowadays wrapped in a call like this:

  , doc =  mvbutils::docattr( r"----{
  <documentation here...>
  }----")

(perhaps with no dashes, or a different number) which eventually ends with some closing brackets and so on. Or, in functions written with old versions of mvbutils (pre-2.11), you'll instead see something like this:

  ,doc=flatdoc( EOF="<<end of doc>>"))

and then the docu, and no closing stuff at the end of the file.

When the file is sourced back in, those lines will cause the rest of the free-format text (no escape characters needed, etc) to be read in as a doc attribute, which can be displayed by help. If you want to add plain-text documentation, you can also add these lines yourself– see flatdoc. Calling fixr( myfun, new.doc=TRUE) sets up a documentation template that you can fill in, ready for later conversion to Rd format in a package (see mvbutils.packaging.tools).

The list of functions being edited by fixr is stored in the variable fix.list in the mvb.session.info environment. When you quit and restart R, the function files you have been using will stay open in the editor, but fix.list will be empty; hence, updating the file "myfun.r" will not update the corresponding R function. If this happens, just type fixr(myfun) in R and when your editor asks you if you want to replace the on-screen version, say no. Save the file again (some editors require a token modification, such as space-then-delete, first) and R will notice the update. Very very occasionally, you may want to tell R to stop trying to update one of the things it's editing, via eg fix.list <<- fix.list[-3,] if the offending thing is the third row in fix.list; note the double arrow.

An automatic text backup facility is available from fixr: see ?get.backup. The backup system also allows you to sort edited objects by edit date; see ?fix.order.

Changes with R 2.14

Time was, functions had their source code (including comments, author's preferred layout, etc) stored in a "source" attribute, a simple character vector that was automatically printed when you looked at the function. Thanks to the fiddly, convoluted, opaque "srcref" system that has replaced "source" as of R 2.14— to no real benefit that I can discern— fixr in versions of mvbutils prior to 2.5.209 didn't work correctly with R 2.14 up. Versions of mvbutils after 2.5.509 should work seamlessly.

The technical point is that, from R 2.14 onwards, basic R will not show the source attribute when you type a function name without running the function; unless there is a srcref attribute, all you will see is the deparsed raw code. Not nice; so the replacement to print.function in mvbutils will show the source attribute if it, but no srcref attribute, is present. As soon as you change a function with fixr post-R-2.14, it automatically loses any source attribute and acquires a "proper" srcref attribute, which will be displayed from then on.
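The difference between srcref and deparsed code is easy to see in base R. The sketch below involves no mvbutils code; it parses a small function with keep.source=TRUE and compares the stored source with what deparse gives once the srcref is stripped.

```r
code <- c("function(x) {",
          "  # add one",
          "  x + 1",
          "}")
f <- eval(parse(text = code, keep.source = TRUE))

src_lines <- as.character(attr(f, "srcref"))   # original text, comment included

g <- f
attr(g, "srcref") <- NULL                      # simulate a function with no srcref
plain_lines <- deparse(g)                      # regenerated code, comment gone
```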

Local function groups

There are several ways to work with "nested" (or "child" or "lisp-style macro") functions in R, thanks to R's scoping and environment rules; I've used at least four, most often mlocal in package mvbutils. One is to keep a bunch of functions together in a local environment so that they (i) know about each other's existence and can access a shared variable pool, (ii) can be edited en bloc, but (iii) don't need to clutter up the "parent" code with the definitions of the children. fixr will happily create & edit such a function-group, as long as you make sure the last statement in local evaluates to a function. For example:

  # after typing 'fixr( secondfun)' in R, put this into your text editor:
  local({
    tot <- 0
    firstfun <- function( i) tot <<- tot+i
    # secondfun is defined in the next few lines:
    # entirely optional to precede them with 'secondfun <-'
    function( j) {
        for( ii in 1:j)
          firstfun( ii)
        tot
      }
  })

Note that it's not necessary to assign the last definition to a variable inside the local call, unless you want to be able to reach that function recursively from one of the others, as in the first example for local. Note also that firstfun will not be visible "globally", only from within secondfun when it executes.
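To see the shared variable pool in action without going through the editor, the same construct can be assigned directly at the command line. This is plain base R and does not involve fixr.

```r
secondfun <- local({
  tot <- 0
  firstfun <- function(i) tot <<- tot + i   # updates the shared 'tot'
  function(j) {
    for (ii in 1:j)
      firstfun(ii)
    tot
  }
})

first_call  <- secondfun(3)   # 1 + 2 + 3 = 6
second_call <- secondfun(3)   # 12: 'tot' persists between calls

# firstfun lives only in the environment of secondfun
exists("firstfun", envir = environment(secondfun), inherits = FALSE)   # TRUE
```

Note that tot keeps its value between calls, which may or may not be what you want; reset it inside the returned function if each call should start afresh.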

secondfun above can be debugged as usual with mtrace in the debug package. If you want to turn on mtracing for firstfun without first mtracing secondfun and manually calling mtrace(firstfun) when secondfun appears, do mtrace(firstfun, from=environment( secondfun)).

Note: I think all this works OK in normal use (Oct 2012), but be careful! I doubt it works when building a package, and I'm not sure that R-core intend that it should; you might have to put the local-building code into the .onLoad.

Scriptlets

Note: I've really gone off "scriptlets" (writing this in mid 2016). These days I prefer to keep "scripts" as R character-vector objects (because I dislike having lots of separate files), edited by fixtext and manually executed as required by mrun— which also has a debugging option that automatically applies debug::mtrace. I'm not going to remove support for scriptlets in fixr, but I'm not going to try hard to sort out any bugs either. Instructions below are unchanged, and unchecked, from some years ago.

You can also maintain "scriptlets" with fixr, by embedding the instructions (and comments etc) in an expression(...) statement. Obviously, the result will be an expression; to actually execute a scriptlet after editing it, use eval(). The scriptlet itself is stored in the "source" attribute as a character vector of class cat, and the expression itself is given class thing.with.source so that the source is displayed in preference to the raw expression. Backup files are maintained just as for functions. Only the first syntactically complete statement is returned by fixr (though subsequent material, including extra comments, is always retained in the source attribute); make sure you wrap everything you want done inside that call to expression(...).

Two cases I find useful are:

instructions to create data.frames or matrices by reading from a text file, and maybe doing some initial processing;
expressions for complicated calls with particular datasets to model-fitting functions such as glm.

  # Object creator:
  expression( { # Brace needed for multiple steps
    raw.data <- read.table( "bigfile.txt", header=TRUE, row=NULL)
    # Condense date/time char fields into something more useful:
    raw.data <- within( raw.data, {
      Time <- strptime( paste( DATE, TIME, sep=' '), format="%Y-%m-%d %H:%M:%S")
      rm( DATE, TIME)
    })
    cat( "'raw.data' created OK")
  })

and

  # Complicated call:
  expression(
    glm( LHS ~ captain + beard %in% soup, data=alldata %where% (mushroom=='magic'), family=binomial( link=caterpillar))
  )

Bear in mind that eval(myscriptlet) takes place in .GlobalEnv unless you tell it not to, so the first example above actually creates raw.data even though it returns NULL. To trace evaluation of myscriptlet with the debug package, call debug.eval( myscriptlet).
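If you would rather not have a scriptlet write into .GlobalEnv, base R's eval accepts an explicit environment. A minimal sketch, with a made-up scriptlet standing in for one created via fixr:

```r
myscriptlet <- expression({
  squares <- (1:3)^2        # a side effect: creates 'squares'
  invisible(NULL)
})

scratch <- new.env()
res <- eval(myscriptlet, envir = scratch)   # evaluate inside 'scratch' instead

res                                  # NULL
get("squares", envir = scratch)      # 1 4 9
```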

For a new scriptlet mything, the call to fixr should still just be fixr(mything). However, if you have trouble with this, try fixr( mything, what=list()) instead, even if mything won't be a list(). For an existing non-function, you'll need the new=T argument, e.g. fixr( oldthing, new=T), and you'll then have to manually copy/paste the contents.

Note that you can't use quote() instead of expression(), because any attempt to display the object will cause it to run instead; this is a quirk of S3 methods!

For the brave

In principle, you can also edit non-expressions the same way. For example, you can create a list directly (not requiring subsequent eval()) via a scriptlet like this:

  list(
    a = 1, # a number
    b = 'aardvark' # a character
  )

Nowadays I tend to avoid this, because the code will be executed immediately R detects a changed file, and you have no other (easy) control over when it's evaluated. Also, note that the result will have class thing.with.source (prepended to any other S3 classes it might have), which has its own print method that shows the source; hence you won't see the contents directly when you just type its name, which may or may not be desirable.

Troubleshooting

Rarely, fixr (actually FF) can get confused, and starts returning errors when trying to update objects from their source files. (Switching between "types" of object with the same name— function, expression, character vector— can do this.) In such cases, it can be useful to purge the object from the fix.list, a session-duration data.frame object in workspace mvb.session.info on the search path. Say you are having trouble with object "badthing": then

  fix.list <<- fix.list[ names( fix.list) != 'badthing',]

will do the trick (note the double arrow). This means FF will no longer look for updates to the source file for badthing, and you are free to again fixr( badthing).

To purge the entire fix.list, do this:

  fix.list <<- fix.list[ 0,]

Shows which functions call what

Description

foodweb is applied to a group of functions (e.g. all those in a workspace); it produces a graphical display showing the hierarchy of which functions call which other ones. This is handy, for instance, when you have a great morass of functions in a workspace, and want to figure out which ones are meant to be called directly. callers.of(funs) and callees.of(funs) show which functions directly call, or are called directly by, funs.

Usage

foodweb( funs, where=1, charlim=80, prune=character(0), rprune,
    ancestors=TRUE, descendents=TRUE, plotting =TRUE, plotmath=FALSE,
    generics=c( "c","print","plot", "["), lwd=0.5, xblank=0.18,
    border="transparent", boxcolor="white", textcolor="black",
    color.lines=TRUE, highlight="red", calc_xpos=plotting, ...)
## S3 method for class 'foodweb'
plot(x, textcolor, boxcolor, xblank, border, textargs = list(),
    use.centres = TRUE, color.lines = TRUE, poly.args = list(),
    expand.xbox = 1.05, expand.ybox = expand.xbox * 1.2, plotmath = FALSE,
    cex=par( "cex"), ...) # S3 method for foodweb
callers.of( funs, fw, recursive=FALSE)
callees.of( funs, fw, recursive=FALSE)

Arguments

funs

character vector OR (in foodweb only) the result of a previous foodweb call

where

position(s) on search path, or an environment, or a list of environments

charlim

controls maximum number of characters per horizontal line of plot

prune

character vector. If omitted, all funs will be shown; otherwise, only ancestors and descendants of functions in prune will be shown. Augments funs if required.

rprune

regexpr version of prune; prune <- funs %matching% rprune. Does NOT augment funs. Overrides prune if set.

ancestors

show ancestors of prune functions?

descendents

show descendents of prune functions?

plotting

graphical display?

plotmath

leave alone

generics

calls TO functions in generics won't be shown

lwd

see par

xblank

leave alone

border

border around name of each object (TRUE/FALSE)

boxcolor

background colour of each object's text box

textcolor

colour of the text of each object's name

color.lines

will linking lines be coloured according to the level they originate at?

highlight

seemingly not used

cex

text size (see "cex" in ?par)

calc_xpos

whether to calculate reasonable on-screen positions. Defaults to TRUE if plotting, and FALSE otherwise (to save a bit of time). If you aren't plotting immediately but might plot the results later, you should set this to TRUE.

...

passed to plot.foodweb and thence to par

textargs

not currently used

use.centres

where to start/end linking lines. TRUE is more accurate but less tidy with big webs.

expand.xbox

how much horizontally bigger to make boxes relative to text?

expand.ybox

how much vertically bigger to ditto?

poly.args

other args to rect when boxes are drawn

fw

an object of class foodweb, or the funmat element thereof (see Value)

x

a foodweb (as an argument to plot.foodweb)

recursive

(callees.of and callers.of only) whether to include callee/rs of callee/rs of... (Thanks to William Proffitt for this suggestion.)

Details

The main value is in the graphical display. At the top ("level 0"), functions which don't call any others, and aren't called by any others, are shown without any linking lines. Functions which do call others, but aren't called themselves, appear on the next layer ("level 1"), with lines linking them to functions at other levels. Functions called only by level 1 functions appear next, at level 2, and so on. Functions which call each other will always appear on the same level, linked by a bent double arrow above them. The colour of a linking line shows what level of the hierarchy it came from.

foodweb makes some effort to arrange the functions on the display to keep the number of crossing lines low, but this is a hard problem! Judicious use of prune will help keep the display manageable. Perhaps counterintuitively, any functions NOT linked to those in prune (which all will be, by default) will be pruned from the display.

foodweb tries to catch names of functions that are stored as text, and it will pick up e.g. glm in do.call( "glm", glm.args). There are limits to this, of course (?methods?).

The argument list may be somewhat daunting, but the only ones normally used are funs, where, and prune. Also, to get a readable display, you may need to reduce cex and/or charlim. A number of the less-obvious arguments are set by other functions which rely on plot.foodweb to do their display work. Several may disappear in future versions.

If the display from foodweb is unclear, try foodweb( .Last.value, cex=<<something below 1>>, charlim=<<something probably less than 100>>). This works because foodweb will also accept a foodweb-class object as its argument. You can also assign the result of foodweb to a variable, which is useful if you expect to do a lot of tinkering with the display, or to inspect the who-calls-whom matrix by hand.

callers.of and callees.of process the output of foodweb, looking for immediate dependencies only. The second argument will call foodweb by default, so it may be more efficient to call foodweb first and assign the result to a variable. NB you can set recursive=TRUE for the obvious result.

Bug in Rgui Windows graphics

When plotting the foodweb, there's a display bug in Rgui for Windows which somehow causes the font size to shrink in each successive call! Somehow par("ps") keeps on shrinking. Indeed, on my own machines, calling par(ps=par("ps"))$ps will show a decreasing value each time... Working around this was very tricky; variants of saving/restoring par inside plot.foodweb do not work. As of package mvbutils version 2.8.142, there's an attempted fix directly in foodweb, but conceivably the fix will somehow cause problems for other people using default graphics windows in Rgui. Let me know if that's you... (in which case I'll add an option() to not apply the fix).

Value

foodweb returns an object of (S3) class foodweb. This has three components:

funmat

a matrix of 0s and 1s showing what (row) calls what (column). The dimnames are the function names.

x

shows the x-axis location of the centre of each function's name in the display, in par("usr") units

level

shows the y-axis location of the centre of each function's name in the display, in par("usr") units. For small numbers of functions, this will be an integer; for larger numbers, there will be some adjustment around the nearest integer

Apart from graphical annotation, the main useful thing is funmat, which can be used to work out the "pecking order" and e.g. which functions directly call a given function. callers.of and callees.of return a character vector of function names.
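
As a rough illustration of reading funmat by hand, here is a minimal base-R sketch. The matrix is built manually rather than by foodweb, and the function names (top, mid, leaf, loner) and the helpers direct_callers and direct_callees are hypothetical; they only mimic the row-calls-column convention described above and are not the code behind callers.of and callees.of.

```r
# Hand-made stand-in for fw$funmat: rows call columns (hypothetical functions)
funs <- c("top", "mid", "leaf", "loner")
funmat <- matrix(0, 4, 4, dimnames = list(funs, funs))
funmat["top", "mid"] <- 1
funmat["mid", "leaf"] <- 1
funmat["top", "leaf"] <- 1

# Who calls 'fun' directly? Look down the column for 'fun'.
direct_callers <- function(fun, funmat) rownames(funmat)[funmat[, fun] == 1]
# What does 'fun' call directly? Look along the row for 'fun'.
direct_callees <- function(fun, funmat) colnames(funmat)[funmat[fun, ] == 1]

direct_callers("leaf", funmat)  # "top" "mid"
direct_callees("top", funmat)   # "mid" "leaf"

# Functions that nobody calls: candidates for "meant to be called directly"
uncalled <- rownames(funmat)[colSums(funmat) == 0]  # "top" "loner"
```

With a real foodweb result, the same indexing should apply to its funmat component, assuming the structure described under Value.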

Examples

foodweb( ) # functions in .GlobalEnv
# I have had to trim this set of examples because CRAN thinks it's too slow...
# ... though it's only 5sec on my humble laptop. So...
## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  foodweb( where="package:mvbutils", cex=0.4, charlim=60) # yikes!
  foodweb( c( find.funs("package:mvbutils"), "paste"))
  # functions in package:mvbutils, and "paste"
  foodweb( find.funs("package:mvbutils"), prune="paste")
  # only those parts of the tree connected to "paste";
  # NB that funs <- unique( c( funs, prune)) inside "foodweb"
  foodweb( where="package:mvbutils", rprune="aste")
  # doesn't include "paste" as it's not in "mvbutils", and rprune doesn't augment funs
  foodweb( where=asNamespace( "mvbutils")) # secret stuff
  fw <- foodweb( where="package:mvbutils")
  fw <- foodweb( where=asNamespace( "mvbutils")) # also plots
  fw$funmat # a big matrix
  callers.of( "mlocal", fw)
  callees.of( "find.funs", fw)
  # ie only descs of functions whose name contains 'name'
  foodweb( where=asNamespace( 'mvbutils'), rprune="name", ancestors=FALSE, descendents=TRUE)
} # if F

## End(Not run)

Expand relative file path

Description

path is expanded relative to start, with any . being eliminated and any .. being treated as "go back one step". If path doesn't start with a . or .., start is ignored. Might be Windows-specific but probably fairly safe in general. NB that all separators in path and start must be "/".

Usage

full.path(path, start)

Arguments

path

character(1)

start

character(1), defaulting to .
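
The rule in the Description can be sketched in base R as follows. This is only an illustration of the stated behaviour, not the source of full.path; the name expand_rel_path is hypothetical, and edge cases (such as a leading .. with nothing left to step back from) are handled by assumption.

```r
# Hypothetical re-statement of the rule: not mvbutils' own implementation
expand_rel_path <- function(path, start = ".") {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  # 'start' only matters if 'path' begins with "." or ".."
  if (length(parts) && parts[1] %in% c(".", "..")) {
    parts <- c(strsplit(start, "/", fixed = TRUE)[[1]], parts)
  }
  out <- character(0)
  for (p in parts) {
    if (p == ".") next                        # "." is eliminated
    if (p == ".." && length(out) && out[length(out)] != "..") {
      out <- out[-length(out)]                # ".." goes back one step
    } else {
      out <- c(out, p)                        # assumption: keep unresolvable ".."
    }
  }
  paste(out, collapse = "/")
}

expand_rel_path("../c/d", "a/b")  # "a/c/d"
expand_rel_path("./x", "a/b")     # "a/b/x"
expand_rel_path("x/./y", "a/b")   # "x/y" ('start' ignored)
```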

Convenient automated loading of DLLs

Description

generic.dll.loader is to be called from the .onLoad of a package. It calls library.dynam on all the DLLs it can find in the "libs" folder (so you don't need to specify their names), or in the appropriate sub-architecture folder below "libs". It also creates "R aliases" in your namespace for all the registered low-level routines in each DLL (i.e. those returned by getDLLRegisteredRoutines, qv), so that the routines can be called efficiently later on from your code; see Details.

If you just want to use mvbutils to help build/maintain your package, and don't need your package to import/depend on other functions in mvbutils, then it's fine to just copy the code from generic.dll.loader etc and put it directly into your own .onLoad.

ldyn.tester, create.wrappers.for.dll, and ldyn.unload are to help you develop a DLL that has fully-registered routines, without immediately having to create an R package for it. ldyn.tester loads a DLL and returns its registration info. The DLL must be in a folder .../libs/<subarch> where <subarch> is .Platform$r_arch iff that is non-empty; this is because ldyn.tester merely tricks library.dynam into finding a spurious "package", and that's the folder structure that library.dynam needs to see. create.wrappers.for.dll does the alias-creation mentioned above for generic.dll.loader. ldyn.unload unloads the DLL.

Usage

# Only call this inside your .onLoad!
generic.dll.loader(libname, pkgname, ignore_error=FALSE, dlls=NULL)
# Only call these if you are informally developing a DLL outside a package
ldyn.tester(chname)
create.wrappers.for.dll( this.dll.info, ns=new.env( parent=parent.frame(2)))
ldyn.unload( l1)

Arguments

libname, pkgname

as per .onLoad

ignore_error

?continue to load other DLLs if one fails?

dlls

default (NULL) means "load all the DLLs you can find". Otherwise, it should be a character vector specifying the DLLs by name, without folder— no extension is necessary.

chname

(for ldyn.tester) Path to the DLL (extension not required)

this.dll.info

(for create.wrappers.for.dll) A DLLInfo object, as returned by .dynLibs()[[N]] or library.dynam(...)

ns

(for create.wrappers.for.dll) If you're calling create.wrappers.for.dll manually, then this defaults to the calling environment, probably .GlobalEnv. For "internal use", ns is meant to be a namespace, but you shouldn't be using it like that!

l1

(for ldyn.unload) Result of previous call to ldyn.tester

Details

R-callable aliases for your low-level routines will be called e.g. C_myrout1, Call_myrout2, F_myrout3, or Ext_myrout4, depending on type. Those for routines in "myfirstdll" will be stored in the environment LL_myfirstdll ("Low Level") in your package's namespace, which itself inherits from the namespace. In your own R code elsewhere in your package, you can then have something like

  .C( LL_myfirstdll$C_myrout1, <<arguments>>) # NB no need for PACKAGE argument

Getting fancy, you can alternatively set the environment of your calling function to LL_myfirstdll (which inherits from the namespace, so all your other functions are still visible). In that case, you can just write

  .C( C_myrout1, <<arguments>>)
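
The environment trick above can be tried out in plain R without any DLL. In this sketch, ns merely stands in for a package namespace, the string "pretend-native-symbol" stands in for a registered routine, and helper and caller are hypothetical names; only the lookup mechanism is being illustrated.

```r
ns <- new.env()                         # stand-in for the package namespace
assign("helper", function(x) x + 1, envir = ns)

LL_myfirstdll <- new.env(parent = ns)   # inherits from the "namespace"
assign("C_myrout1", "pretend-native-symbol", envir = LL_myfirstdll)

caller <- function() {
  # C_myrout1 is found in LL_myfirstdll; helper is found via its parent
  list(routine = C_myrout1, helped = helper(1))
}
environment(caller) <- LL_myfirstdll

res <- caller()
res$routine  # "pretend-native-symbol"
res$helped   # 2
```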

Value

generic.dll.loader returns NULL (but see Details). ldyn.tester returns a class "DLLInfo" object if successful. ldyn.unload should return NULL if successful, and crash otherwise. create.wrappers.for.dll returns the environment containing the aliases. Be careful with accidentally saving and loading the results of ldyn.tester and create.wrappers.for.dll; they won't be valid in a new R session. You might be better off creating them in the mvb.session.info environment on the search path; they will still be found, but won't persist in a different R session. See Examples.

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  .onLoad <- function( libname, pkgname) generic.dll.loader( libname, pkgname) # in the code of package mypack
  #... or just copy the code into your .onLoad
  # For casual testing of a DLL that's not yet in a package
  dl <- ldyn.tester( 'path/to/my/dll/libs/i386/mydll.dll')
  getDLLRegisteredRoutines( dl)
  LL_mydll <- create.wrappers.for.dll( dl)
  .C( LL_mydll$C_rout1, as.integer( 0)) # ... whatever!
  ldyn.unload( dl)
  # Safer because not permanent:
  assign( 'dl', ldyn.tester( 'path/to/my/dll/libs/i386/mydll.dll'), pos='mvb.session.info')
  assign( 'LL_mydll', create.wrappers.for.dll( dl), pos='mvb.session.info')
  .C( LL_mydll$C_rout1, as.integer( 0)) # ... whatever!
} # if F

## End(Not run)

Text backups of function source code

Description

get.backup retrieves backups of a function or character object. create.backups creates backup files for all hitherto-unbacked-up functions in a search environment. For get.backup to work, all backups must have been created using the fixr system (or create.backups). read.bkind shows the names of objects with backups, and gives their associated filenames.

Usage

get.backup( name, where=1, rev=TRUE, zap.name=TRUE, unlength=TRUE)
create.backups( pos=1)
read.bkind( where=1)

Arguments

name

function name (character)

where, pos

position in search path (character or numeric), or e.g. ..mypack for maintained package mypack.

rev

if TRUE, most recent backup comes first in the return value

zap.name

if TRUE, the tag "funname" <- at the start of each backup is removed

unlength

if TRUE, the first line of each backup is removed iff it consists only of a number equal to 1+length( object). This matches the (current) format of character object backups.

Details

fixr and FF are able to maintain text-file backups of source code, in a directory ".Backup.mvb" below the task directory. The directory will contain a file called "index", plus files BU1, BU2, etc. "index" shows the correspondence between function names and BUx files. Each BUx file contains multiple copies of the source code, with the oldest first. Even if a function is removed (or moved) from the workspace, its BUx file and "index" entry are not deleted.

The number of backups kept is controlled by options(backup.fix), a numeric vector of length 2. The first element is how many backups to keep from the current R session. The second is how many previous R sessions to keep the final version of the source code from. Older versions get discarded. I use c(5,2). If you want to use the backup facility, you'll need to set this option in your .First. If the option is not set, no backups happen. If set, then every call to Save or Save.pos will create backups for all previously-unbackupped functions, by automatically calling create.backups. create.backups can also be called manually, to create the backup directory, index, and backup files for all functions in the currently-top task.
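
For instance, a .First along the following lines would switch backups on with the c(5,2) setting mentioned above. This is a minimal sketch; a real .First would normally do other things too.

```r
.First <- function() {
  # Keep 5 backups per object from the current session,
  # plus the final version from each of the 2 previous sessions
  options(backup.fix = c(5, 2))
}
```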

get.backup returns all available backup versions as character vectors, by default with the most recent first. To turn one of these character vectors into a function, a source step is needed; see Examples.

read.bkind shows which file to look for particular backups in. These files are text-format, so you can look at one in a text editor and manually extract the parts you want. You can also use read.bkind to set up a restoration-of-everything, as shown in Examples. I deliberately haven't included a function for mass restoration in mvbutils, because it's too dangerous and individual needs vary.

Currently there is no automatic way to determine the type of a backed-up object. All backups are stored as text, so text objects look very similar to functions. However, the first line of a text object is just a number equal to the length of the text object; the first line of a function object starts with "function(" or "structure( function(". The examples show one way to distinguish automatically.
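
A small helper along these lines could apply the first-line rule just described. The name guess_backup_type is hypothetical, and the sketch assumes the backup was fetched with zap.name=TRUE (so the name tag is already gone) and unlength=FALSE (so the length line is still present).

```r
# Hypothetical helper: classify one backup (a character vector) by its first line
guess_backup_type <- function(backup_lines) {
  first <- backup_lines[1]
  if (grepl("^ *[0-9]+ *$", first)) {
    "character"   # first line is just a number
  } else if (grepl("^ *(structure\\( *)?function *\\(", first)) {
    "function"    # starts with "function(" or "structure( function("
  } else {
    "unknown"
  }
}

guess_backup_type(c("3", "some", "text"))       # "character"
guess_backup_type("function( x) x+1")           # "function"
guess_backup_type("structure( function( x) {")  # "function"
```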

The function fix.order uses the access dates of backup files to list your functions sorted by date order.

move will also move backup files and update INDEX files appropriately.

Value

get.backup

Either NULL with a warning, if no backups are found, or a list containing the backups, each as a character vector.

create.backups

NULL

read.bkind

a list with components files and object.names; these are character vectors with elements in 1-1 correspondence. Some of the objects named may not currently exist in where.

Author(s)

Mark Bravington

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  ## Need some backups first
  # Restore a function:
  g1 <- get.backup( "myfun", "package:myfun")[[1]] # returns most recent backup only
  # To turn this into an actual function (with source attribute as per your formatting):
  myfun <- source.mvb( textConnection( g1)) # would be nice to have a self-closing t.c.
  cat( get.backup( "myfun", "package:myfun", zap=FALSE)[[1]][1])
  # shows "myfun" <- function...
  # Restore a character vector:
  mycharvec <- as.cat( get.backup( 'mycharvec', ..mypackage)[[1]]) # ready to roll
  # Restore most recent backup of everything... brave!
  # Will include functions & charvecs that have subsequently been deleted
  bks <- read.bkind() # in current task
  for( i in bks$object.names) {
    cat( "Restoring ", i, "...")
    gb <- get.backup( i, unlength=FALSE)[[1]] # unlength F so we can check type
    # Is it a charvec?
    if( grepl( '^ *[0-9]+ *$', gb[1])) # could check length too
      gb <- as.cat( gb[-1]) # remove line showing length and...
      # ...set class to "cat" for nice printing, as per 'as.cat'
    else {
      # Nope, so it's a function and needs to be sourced
      tc <- textConnection( gb)
      gbfun <- try( source.mvb( tc)) # will set source attribute, documentation etc.
      close( tc)
      if( gbfun %is.a% "try-error") {
        gbfun <- eval( substitute( function( ...) stop( ii %&% " failed to parse"), list( ii=i)))
        attr( gbfun, 'source') <- gb # still assign source attribute
      }
      gb <- gbfun
    }
    assign( i, gb)
    cat( '\n')
  }
} # if F

## End(Not run)

Detect number of CPU cores in CRAN-robust way

Description

This is only relevant for parallel code inside package examples and vignettes. In real applications, you would call parallel::detectCores (qv) and then decide how many of those to use. But CRAN enforces a limit of (currently) 2 cores when checking examples (and presumably vignettes etc)— and doesn't give you any way to find out what the limit is from code; it just gives an error. Since the entire point of parallel processing is to use lots of cores if available, CRAN makes it impossible to demonstrate anything realistic in examples, if you want to get them past CRAN. You can of course limit the number of cores to 2 purely for CRAN's benefit, but then you are castrating your code for real tests.

To avoid this lunacy, you can call this function inside your example/vignette. It counts roughly how many cores are allowed (ie won't cause an error), up to the limit requested by its argument (which you would get from detectCores etc). Actually it only goes in multiples of 2, so it won't necessarily give you the max.

In real code as opposed to examples, you probably don't want this; rather call parallel::detectCores and then decide for yourself, as I mention in two other places in this helpfile!

The "algorithm" is to start with 2 cores and keep doubling until there's an error (trapped with try), or until the target is reached. This will at least be "quick" on CRAN. But it always means setting up and destroying a cluster at least twice, which is inefficient if you can just decide for yourself! And if you are really happy just having your example use 2 cores, then just use 2 cores in the example— don't bother with this!

At present, this code is only in mvbutils so I can make the numvbderiv_parallel (qv) example run nicely; but I guess I might use it in other packages eventually. Parallel stuff in R is messy; be warned.
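
The doubling procedure described above can be sketched as follows. This is not the source of get_ncores_CRANal; the names probe_ncores, can_use and cluster_probe are hypothetical, and returning 1 when even 2 cores fail (or when target is below 2) is an assumption. The probe is passed in as a function so that the logic can be tried without starting any clusters.

```r
# Hypothetical sketch: double the core count until the probe fails or target is passed
probe_ncores <- function(target, can_use = function(n) TRUE) {
  n_ok <- 1L   # assumption: fall back to 1 if nothing larger works
  n <- 2L
  while (n <= target) {
    res <- try(can_use(n), silent = TRUE)          # errors are trapped with try
    if (inherits(res, "try-error") || !isTRUE(res)) break
    n_ok <- n
    n <- n * 2L
  }
  n_ok
}

# A real probe might set up and destroy a cluster (assumed to error if n is not allowed)
cluster_probe <- function(n) {
  cl <- parallel::makeCluster(n)
  on.exit(parallel::stopCluster(cl))
  TRUE
}

probe_ncores(7)                          # 4: powers of 2 only, so not the max
probe_ncores(16, function(n) n <= 2)     # 2: mimics a 2-core limit
# probe_ncores(parallel::detectCores(), cluster_probe)  # slow: starts clusters
```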

Usage

get_ncores_CRANal( target)

Arguments

target

How many cores you would like. Presumably, requires a previous call to parallel::detectCores and also a sensible decision on your part.

Value

Integer

Examples

## Not run: 
"See numvbderiv example"

## End(Not run)

Update local git repo

Description

Update local git repo of your package (e.g. splendid), from a source package. Well, all it does is delete old files, and overcopy any whose MD5 sum has changed; you still have to do all the git bollocks yourself (add/commit/push, in my book). Maybe you should do a "git pull" first before all this, so that you can have the fun of reconciling changes before the extreme fun of "cannot pull; changes..." messages and the inevitable descent into "git push force".

IMO everything is simpler if your R source files are stored individually (function-by-function) because then you can easily see what changed, but the vast and unenlightened hordes disagree with me and plonk it all in one single mega-file, complete with Roxygen "documentation" (don't get me started...). Sigh.

I hate Git, BTW— in case that's not already obvious. This is really for my own use, in conjunction with unpackage, for a way to reconcile my own devel process with Git.

Usage

gitup_pkg(
  pkg,
  gitparent,
  character.only = FALSE,
  excludo='funs.rda')

Arguments

pkg

name of your task package, as per install_pkg etc (there are various options)

gitparent

folder where your local git copy lives, or possibly one level higher

character.only

if TRUE, interpret pkg like a normal R variable, not like in library

excludo

files to not copy

Modify standard R functions, including tweaking their default arguments

Description

You probably shouldn't use these... hack lets you easily change the argument defaults of a function. assign.to.base replaces a function in base or utils (or any other package and its namespace and S3 methods table) with a modified version, possibly produced by hack. Package mvbutils uses these two to change the default position for library attachment, etc; see the code of mvbutils:::.onLoad.

Note that, if you call assign.to.base during the .onLoad of your package, then it must be called directly from the .onLoad, not via an intermediate function; otherwise, it won't correctly reset its argument in the import-environment of your namespace. To get round this, wrap it in an mlocal; see mvbutils:::.onLoad for an example.

assign.to.base is only meant for changing things in packages, e.g. not for things that merely sit in non-package environments high on the search path (where <<- should work). I don't know how it will behave if you try. It won't work for S4 methods, either.

Usage

 hack( fun, ...)
 assign.to.base( x, what=,  where=-1, in.imports=, override.env = TRUE)

Arguments

fun

a function (not a character string)

...

pairlist of arguments and new default values, e.g. arg1=1+2. Things on RHS of equal signs will not be evaluated.

x

function name (a character string)

what

function to replace x, defaulting to "replacement." %&% x

where

where to find the replacement function, defaulting to usual search path

in.imports

usually TRUE, if this is being called from an .onLoad method in a namespace. Make sure any copies of the function to be changed that are in the "imports" namespace also get changed. See Description.

override.env

should the replacement use its own environment, or (by default) the one that was originally there?
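
To show the general idea of changing argument defaults without evaluating the right-hand sides, here is a base-R sketch built on formals. It is not the code of hack; the name set_default and the toy function f are hypothetical.

```r
# Hypothetical base-R analogue: replace defaults, keeping the new values unevaluated
set_default <- function(fun, ...) {
  new_defaults <- as.list(substitute(list(...)))[-1]  # unevaluated expressions
  fmls <- as.list(formals(fun))
  fmls[names(new_defaults)] <- new_defaults
  formals(fun) <- fmls   # body and environment of 'fun' are kept
  fun
}

f <- function(x, n = 1) x * n
g <- set_default(f, n = 1 + 2)

formals(g)$n  # the unevaluated call 1 + 2
g(10)         # 30
f(10)         # 10: the original is untouched
```

Installing the modified function in place of the original inside a package is the part that assign.to.base handles, which this sketch does not attempt.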

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  hack( dir, all.files=getOption( "ls.all.files", TRUE)) # from my '.First'
  assign.to.base( "dir", hack( dir, all.files=TRUE))
} # if F

## End(Not run)

The R help system

Description

?x is the usual way to get help on x; it's primarily a shortcut for help(x). There are rarer but more flexible variations, such as x?y or help(x,...). See base-R help on help. The versions of help and ? exported by mvbutils behave exactly the same as base-R, unless base-R help fails after being called with a single argument, e.g. help(topic). In that case, if topic is an object with an attribute called "doc" (or failing that if topic or topic.doc is a character vector), then the attribute (or the character object) will be formatted and displayed by the pager (by default) or browser. This lets you write informal documentation for non-package objects that can still be found by help, and by colleagues you distribute your code to. See dochelp for more information. The rest of this documentation is copied directly from base-R for help, except as noted under Arguments for help_type.

Usage

help(topic, package = NULL, lib.loc = NULL,
     verbose = getOption("verbose"),
     try.all.packages = getOption("help.try.all.packages"),
     help_type = getOption("help_type"))

Arguments

topic

usually, a name or character string specifying the topic for which help is sought. A character string (enclosed in explicit single or double quotes) is always taken as naming a topic. If the value of topic is a length-one character vector the topic is taken to be the value of the only element. Otherwise topic must be a name or a reserved word (if syntactically valid) or character string. See Details for what happens if this is omitted.

package

a name or character vector giving the packages to look into for documentation, or NULL. By default, all packages in the search path are used. To avoid a name being deparsed use e.g. (pkg_ref) (see the examples).

lib.loc

a character vector of directory names of R libraries, or NULL. The default value of NULL corresponds to all libraries currently known. If the default is used, the loaded packages are searched before the libraries. This is not used for HTML help (see Details).

verbose

logical; if TRUE, the file name is reported.

try.all.packages

logical; see Note.

help_type

character string: the type of help required. Possible values are "text", "html" and "pdf". Case is ignored, and partial matching is allowed. [Note that, for informal doco, getOption( "mvb_help_type", "text") is used; i.e., the default there is always the pager, which lets you be as informal as you please.]

Details

The following types of help are available:

Plain text help
HTML help pages with hyperlinks to other topics, shown in a browser by browseURL. If for some reason HTML help is unavailable (see startDynamicHelp), plain text help will be used instead.
For help only, typeset as PDF - see the section on Offline help.

The default for the type of help is selected when R is installed - the factory-fresh default is HTML help.

The rendering of text help will use directional quotes in suitable locales (UTF-8 and single-byte Windows locales): sometimes the fonts used do not support these quotes so this can be turned off by setting options(useFancyQuotes = FALSE).

topic is not optional. If it is omitted, R will give:

If a package is specified, (text or, in interactive use only, HTML) information on the package, including hints/links to suitable help topics.
If lib.loc only is specified, a (text) list of available packages.
Help on help itself if none of the first three arguments is specified.

Some topics need to be quoted (by backticks) or given as a character string. These include those which cannot syntactically appear on their own such as unary and binary operators, function and control-flow reserved words (including if, else, for, in, repeat, while, break and next). The other reserved words can be used as if they were names, for example TRUE, NA and Inf.

If multiple help files matching topic are found, in interactive use a menu is presented for the user to choose one: in batch use the first on the search path is used. (For HTML help the menu will be an HTML page, otherwise a graphical menu if possible if getOption("menu.graphics") is true, the default.)

Note that HTML help does not make use of lib.loc: it will always look first in the attached packages and then along libPaths().

Offline help

Typeset documentation is produced by running the LaTeX version of the help page through pdflatex: this will produce a PDF file.

The appearance of the output can be customized through a file Rhelp.cfg somewhere in your LaTeX search path: this will be input as a LaTeX style file after Rd.sty. Some environment variables are consulted, notably R_PAPERSIZE (via getOption("papersize")) and R_RD4PDF (see Making manuals in the R Installation and Administration Manual).

If there is a function offline_help_helper in the workspace or further down the search path it is used to do the typesetting, otherwise the function of that name in the utils namespace (to which the first paragraph applies). It should accept at least two arguments, the name of the LaTeX file to be typeset and the type (which as from R 2.15.0 is ignored). As from R 2.14.0 it should accept a third argument, texinputs, which will give the graphics path when the help document contains figures, and will otherwise not be supplied.

Note

Unless lib.loc is specified explicitly, the loaded packages are searched before those in the specified libraries. This ensures that if a package is loaded from a library not in the known library trees, then the help from the loaded package is used. If lib.loc is specified explicitly, the loaded packages are not searched.

If this search fails and argument try.all.packages is TRUE and neither package nor lib.loc is specified, then all the packages in the known library trees are searched for help on topic and a list of (any) packages where help may be found is displayed (with hyperlinks for help_type = "html"). NB: searching all packages can be slow, especially the first time (caching of files by the OS can expedite subsequent searches dramatically).

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Examples

help()
help(help)              # the same
help(lapply)
help("for")             # or ?"for", but quotes/backticks are needed
help(package="splines") # get help even when package is not loaded
topi <- "women"
help(topi)
try(help("bs", try.all.packages=FALSE)) # reports not found (an error)
help("bs", try.all.packages=TRUE)       # reports can be found
                                        # in package 'splines'
## For programmatic use:
topic <- "family"; pkg_ref <- "stats"
help((topic), (pkg_ref))

Convert help files to flatdoc format.

Description

Converts a vanilla R help file (as shown in the internal pager) to plain-text format. The output conventions are those in doc2Rd, so the output can be turned into Rd-format by running it through doc2Rd. This function is useful if you have existing Rd-format documentation and want to try out the flatdoc system of integrated code and documentation. Revised Nov 2017: now pretty good, but not perfect; see Details.

Usage

 help2flatdoc( fun.name, pkg=NULL, text=NULL, aliases=NULL)

Arguments

fun.name

function name (a character string)

pkg

name of package

text

plain-text help

aliases

normally leave this empty— see Details.

The real argument is text; if missing, this is deduced from the help for fun.name (need not be a function) in the installed package pkg.

Details

The package containing fun.name must be loaded first. If you write documentation using flatdoc, prepare the package with pre.install, build it with RCMD BUILD or INSTALL, and run help2flatdoc on the result, you should largely recover your original flat-format documentation. Some exceptions:

Nesting in lists is ignored.
Numbered lists won't convert back correctly (Nov 2017), but the problem there is in doc2Rd.
Link-triggering phrases (i.e. that will be picked up by doc2Rd, such as "see <blah>") aren't explicitly created– probably, links could be automated better via an argument to doc2Rd.

Aliases (i.e. if this doco can be found by help under several different names) are deduced from function calls in the Usage section, in addition to anything supplied specifically in the aliases argument. The latter is really just meant for internal use by unpackage.
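As a rough illustration of how aliases might be deduced from Usage lines, here is a base-R sketch. It is not the code that help2flatdoc actually runs; usage_aliases is a hypothetical helper, and it assumes every usage line parses as valid R.

```r
# Hypothetical helper: collect the names of the functions called in Usage lines.
# Assumes each usage line is syntactically valid R code.
usage_aliases <- function(usage_lines) {
  exprs <- as.list(parse(text = usage_lines, keep.source = FALSE))
  calls <- Filter(is.call, exprs)
  # The first element of a call is the thing being called
  unique(vapply(calls, function(e) paste(deparse(e[[1]]), collapse = ""),
                character(1)))
}

usage <- c("install.pkg( pkg, character.only=FALSE)", "build.pkg( pkg)")
usage_aliases(usage)  # "install.pkg" "build.pkg"
```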

Value

A character vector of plain-text help, with class cat so it prints nicely.

Examples

cd.doc <- help2flatdoc( "cd", "mvbutils")
print( cd.doc)
cd.Rd <- doc2Rd( cd.doc)

Package building, distributing, and checking

Description

These are convenient wrappers for R's package creation and installation tools. They are designed to be used on packages created from tasks via the mvbutils package, specifically pre.install (though they can be used for "home-made" packages). The mvbutils approach deliberately makes re-installation a rare event, and one call to install.pkg might suffice for the entire life of a simple package. After that very first installation, you'd probably only need to call install.pkg if (when...) new versions of R entail re-installation of packages, and build.pkg/build.pkg.binary/check.pkg when you want to give your package to others, either directly or via CRAN etc.

Folders

Source packages and built packages go into various folders, depending on various things. Normally you shouldn't have to mess around with the folder structure, but you will still need to know where built packages are put so that you can send them to other people. Specifically, these ...pkg... functions work in the highest-versioned "Rx.y" folder that is not newer than the running R version. If no such folder exists, then build.pkg/build.pkg.binary will create one from the running R version; you can also create such a folder manually, as a kind of "checkpoint", when you want to make your package dependent on a specific R version. See "Folders and R versions" in mvbutils.packaging.tools for full details.
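The folder rule can be pictured with a small base-R sketch. This is only an illustration of "highest-versioned folder that is not newer than the running R"; pick_r_folder is a hypothetical helper, not part of mvbutils, and the real functions may apply extra rules.

```r
# Hypothetical helper: choose the highest "Rx.y" folder that is not newer
# than the running R version. Returns NA if there is no suitable folder.
pick_r_folder <- function(folders, running = getRversion()) {
  cand <- folders[grepl("^R[0-9]+\\.[0-9]+$", folders)]
  if (!length(cand)) return(NA_character_)
  vers <- numeric_version(sub("^R", "", cand))
  keep <- vers <= running
  if (!any(keep)) return(NA_character_)
  cand <- cand[keep]
  vers <- vers[keep]
  cand[vers == max(vers)][1]
}

pick_r_folder(c("R3.2", "R4.1", "R4.3", "src"),
              running = numeric_version("4.2.1"))  # "R4.1"
```

When the result is NA, the text above says that build.pkg/build.pkg.binary would create a folder from the running R version.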

There are also two minor housekeeping functions: cull.old.builds to tidy up detritus, and set.rcmd.vars which does absolutely nothing (yet). cull.old.builds looks through all "Rx.y" folders (where built packages live) and deletes the least-recent ".tar.gz" and ".zip" files in each (regardless of which built package versions are in the other "Rx.y" folders).

Usage

  # Usually: build.pkg( mypack) etc
  install.pkg( pkg, character.only=FALSE, lib=.libPaths()[1], flags=character(0),
      multiarch=NA, preclean=TRUE)
  build.pkg( pkg, character.only=FALSE, flags=character(0), cull.old.builds=TRUE)
  build.pkg.binary( pkg, character.only=FALSE, flags=character(0),
      cull.old.builds=TRUE, multiarch=NA, preclean=TRUE)
  check.pkg( pkg, character.only=FALSE, build.flags=character(0),
      check.flags=character( 0), envars=character(0), CRAN=FALSE)
  cull.old.builds( pkg, character.only=FALSE)
  set.rcmd.vars( ...) # NYI; see envars arg...
  # ... or if it doesn't and you need to set env vars eg PATH
  # for R CMD to work,  then DIY; see *Details*

Arguments

See the examples

pkg

usually an unquoted package name, but interpretation can be changed by non-default character.only. You can also get away with eg ..mypack, ie a direct reference to the maintained package. A folder name can also be used, for a non-mvbutils-maintained package. Just as if it was "maintained", the folder should contain a subfolder with the (same) package name and the real package contents (eg "c:/r/mypack/mypack/DESCRIPTION" should exist), and any built things will go into eg "c:/r/mypack/R3.2"

character.only

default FALSE. If TRUE, treat pkg as a normal object, which should therefore be a string containing the package's name. If character.only is itself a string, it will override pkg and be treated as the name of the package.

lib

(install.pkg only) where to install to; default is the same place R would install to, i.e. .libPaths()[1].

flags

character vector, by default empty. Any entries should be function-specific flags, such as "--md5" for build.pkg. It will be passed through paste( flags, collapse=" "), so you can supply flags individually (eg flags=c( "--md5", "--compact.vignettes")) or jointly (eg flags="--md5 --compact.vignettes").

build.flags, check.flags

(check.pkg only) as per flags but for the two separate parts of check.pkg (see Details). check.flags is overridden if CRAN==TRUE.

envars

optional named character vector of envars to set on the command-line, which is how you control some RCMD behaviour. They will be restored afterwards (or deleted if they didn't exist beforehand).

preclean

adds flag "--preclean" if TRUE (the default); this is probably a good idea since one build-failure can otherwise cause R to keep failing to build.

multiarch

Adds flag "--no-multiarch" if FALSE. Defaults to TRUE unless "Biarch:FALSE" is found in the DESCRIPTION. Default used to be FALSE when I was unable to get 64bit versions to build. Now I mostly can (after working round BINPREF64 bug in R3.3.something by futzing around in etc/arch/Makeconf based on random internet blogs).

cull.old.builds

self-explanatory

CRAN

(check.pkg only) if TRUE, set the --as-cran flag to "RCMD check" and unset all other check flags (except library locations, which are set automatically by all these functions). Note that this will cause R to check various internet databases, and so can be slow.

...

name-value pairs of system environment variables (not used for now)
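To make the flag handling concrete, here is a minimal base-R sketch of how flags, preclean and multiarch could end up as one command-line string. assemble_flags is a hypothetical helper written for illustration only; the real functions may order or filter flags differently.

```r
# Hypothetical helper: combine user flags with the flags implied by
# 'preclean' and 'multiarch' into a single command-line string.
assemble_flags <- function(flags = character(0), preclean = TRUE, multiarch = NA) {
  extra <- c(if (isTRUE(preclean)) "--preclean",
             if (isFALSE(multiarch)) "--no-multiarch")
  paste(c(flags, extra), collapse = " ")
}

# Supplying flags individually or jointly gives the same string
assemble_flags(c("--md5", "--compact.vignettes"), preclean = FALSE)
assemble_flags("--md5 --compact.vignettes", preclean = FALSE)
assemble_flags(multiarch = FALSE)  # "--preclean --no-multiarch"
```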

Details

Before doing any of this, you need to have used pre.install to create a source package. (Or patch.install, if you've done all this before and just want to re-install/build/check for some reason.)

The only environment variable currently made known to R CMD is R_LIBS– let me know if others would be useful.

install.pkg calls "R CMD INSTALL" to install from a source package.

build.pkg calls "R CMD build" to wrap up the source package into a "tarball", as required by CRAN and also for distribution to non-Windows-and-Mac platforms.

build.pkg.binary (Windows & Mac only) calls "R CMD INSTALL --build" to generate a binary package. A temporary installation directory is used, so your existing installation is not overwritten or deleted if there's a problem; R CMD INSTALL --build has a nasty habit of doing just that unless you're careful, which build.pkg.binary is.

check.pkg calls "R CMD check" after first calling build.pkg (more efficiently, I should perhaps try to work out whether there's an up-to-date tarball already). It doesn't delete the tarball afterwards. It may also be possible for you to do some checks directly from R via functions in the tools package, which is potentially a lot quicker. However, NB the possibility of interference with your current R session. For example, at one stage codoc (which is the only check that I personally find very useful) tried to unload & load the package, which was very bad; but I think that may no longer be the case.

You may have to set some environment variables (eg PATH, and perhaps R_LIBS) for the underlying R CMD calls to work. As of mvbutils v2.11.18, the envars argument might do the trick (just for the duration of the RCMD call). Otherwise, currently you have to do it manually— your .First or .Rprofile would be a good place. [There was a plan for a function set.rcmd.vars that could temporarily set envars before each RCMD call and then restore them afterwards, but I've shelved it in favour of envars, at least for now.]
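The set-then-restore behaviour described for envars can be sketched in base R as follows. with_envars is a hypothetical helper for illustration, not the mechanism mvbutils itself uses, and MVB_DEMO_VAR is a made-up variable name.

```r
# Hypothetical helper: set environment variables for the duration of 'code',
# then restore old values (or unset variables that did not exist before).
with_envars <- function(envars, code) {
  old <- Sys.getenv(names(envars), unset = NA, names = TRUE)
  on.exit({
    was_set <- !is.na(old)
    if (any(was_set)) do.call(Sys.setenv, as.list(old[was_set]))
    if (any(!was_set)) Sys.unsetenv(names(old)[!was_set])
  })
  do.call(Sys.setenv, as.list(envars))
  code  # evaluated lazily, i.e. only now that the variables are set
}

Sys.unsetenv("MVB_DEMO_VAR")
with_envars(c(MVB_DEMO_VAR = "1"), Sys.getenv("MVB_DEMO_VAR"))  # "1"
Sys.getenv("MVB_DEMO_VAR", unset = NA)                          # NA again
```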

Perhaps it would be desirable to let some flags be set automatically, eg via something in the pre.install.hook for a package. I'll add this if requested.

Value

Ideally, the "status code" of the corresponding RCMD operation: 0 for success or some other integer if not. It will have several attributes attached, most usefully "output" which duplicates what's printed while the functions are running. (Turn off "buffered output" in RGui to see it as it's happening.) This requires the existence of the "tee" shell redirection facility, which is built-in to Linux and presumably Macs, but not to Windows. You can get one version from Coreutils in GnuWin32; make sure this is on your PATH, but probably after the Rtools folders required by the R build process, to avoid conflicts between the other Coreutils versions and those in Rtools (I don't know what I'm talking about here, obviously; I'm just describing what I've done, which seems to work). If "tee" eventually moves to Rtools, then this won't be necessary. If no "tee" is available, then:

- progress of RCMD will be shown "live" in a separate shell window

- the status code is returned as NA, but still has the attributes including "output". You could, I suppose, "parse" the output somehow to check for failure.

The point of all this "tee" business is that there's no reliable way in R itself to both show progress on-screen within R (which is useful, because these procedures can be slow) and to return the screen output as a character vector (which is useful so you can subsequently pore through the error messages, or bask in a miasma of smugness).
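If the status code comes back as NA, one crude way to "parse" the output is to search it for the word ERROR. The sketch below assumes only what is stated above (a status value carrying an "output" attribute); rcmd_failed is a hypothetical helper, the search string is a guess at what failure looks like, and the result object is mocked up by hand rather than produced by a real check.

```r
# Hypothetical helper: decide whether an RCMD call failed.
# Uses the status code when available, otherwise a crude text search.
rcmd_failed <- function(res) {
  status <- as.vector(res)            # drop the attributes
  if (!is.na(status)) return(status != 0)
  out <- attr(res, "output")
  any(grepl("ERROR", out, fixed = TRUE))
}

# Made-up result, mimicking the case where no "tee" is available
fake <- structure(NA_integer_,
                  output = c("* checking examples ... OK", "Status: 1 ERROR"))
rcmd_failed(fake)  # TRUE
```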

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  # First time package installation
  # Must be cd()ed to task above 'mvbutils'
  maintain.packages( mvbutils)
  pre.install( mvbutils)
  install.pkg( mvbutils)
  # Subsequent maintenance is all done by:
  patch.install( mvbutils)
  # For distro to
  build.pkg( mvbutils)
  # or on Windows (?and Macs?)
  build.pkg.binary( mvbutils)
  # If you enjoy R CMD CHECK:
  check.pkg( mvbutils)
  # How to not fail if Suggestees are missing (I think), via envars
  check.pkg( mvbutils, envars=c( '_R_CHECK_FORCE_SUGGESTS_'=0))
  # Also legal:
  build.pkg( ..mvbutils)
  # To do it under programmatic control
  for( ipack in all.my.package.names) {
    build.pkg( char=ipack)
  }
} # if F

## End(Not run)

Auto-registration and loading of dynamic library

Description

A bit like useDynLib but for direct use in your own package's .onLoad, this loads a DLL and creates objects that allow the DLL routines to be called directly. If your package "Splendid" calls library.dynam.reg in its .onLoad() to load a DLL "speedoo" which contains routines "whoosh" and "zoom", then an environment "C_speedoo" will be created in asNamespace("Splendid"), and the environment will contain objects whoosh and zoom. R-code routines in "Splendid" can then call e.g.

  .C( C_speedoo$whoosh, ....)

You can only call library.dynam.reg inside .onLoad, because after that the namespace will be sealed so you can't poke more objects into it.
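A sketch of what the .onLoad of the hypothetical package "Splendid" might look like. It assumes that library.dynam.reg is exported by mvbutils and that "Splendid" declares mvbutils as a dependency; the function is only defined here, not run, because running it needs a real DLL.

```r
# Sketch of the .onLoad for the hypothetical package "Splendid".
# Only defined, not called: it needs an actual DLL named "speedoo".
.onLoad <- function(libname, pkgname) {
  # Afterwards, routines should be reachable as C_speedoo$whoosh etc
  mvbutils::library.dynam.reg("speedoo", package = pkgname, lib.loc = libname)
}
```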

Note

Currently, all routines go into C_speedoo, regardless of how they are meant to be called (.C, .Call, .Fortran, or .External). It's up to you to call them the right way. I might change this to create separate Call_speedoo etc.

Note2

As of R3.1.1 at least, it's possible that "recent" changes to the useDynLib directive in a package namespace might obviate the need for this function. In particular, useDynLib can now create an environment/list that refers directly to the DLL, containing references to individual routines (which will be slightly slowed because they need to be looked up each time). Also, useDynLib can automatically register its routines. What's not obvious is whether it can yet do both these things together, which is what library.dynam.reg is aimed at.

Usage

# Only inside a '.onLoad', where you will already know "package" and "lib.loc"
library.dynam.reg(chname, package, lib.loc, ...)

Arguments

chname

DLL name, a string— without any path

package, lib.loc

strings as for library.dynam

...

other args to library.dynam

Cacheing objects for lazy-load access

Description

load.refdb is like load, but automatically calls setup.mcache to create access arrangements for cached objects. You probably don't need to call it directly.

Usage

load.refdb( file, envir, fpath=attr( envir, "path"))

Arguments

file

a filename relative to fpath

envir

an environment or (more usually) a position on the search path (numeric or character)

fpath

a directory. Usually the default will do.

Author(s)

Mark Bravington

Macro-like functions

Description

local.on.exit is the analogue of on.exit for "nested" or "macro" functions written with mlocal.

Usage

# Inside an 'mlocal' function of the form
# function( <<args>>, nlocal=sys.parent(), <<temp.params>>) mlocal({ <<code>> })
local.on.exit( expr, add=FALSE)

Arguments

expr

the expression to evaluate when the function ends

add

if TRUE, the expression will be appended to the existing local.on.exit expression. If FALSE, the latter is overwritten.

Details

on.exit doesn't work properly inside an mlocal function, because the scoping is wrong (though sometimes you get away with it). Use local.on.exit instead, in exactly the same way. I can't find any way to set the exit code in the calling function from within an mlocal function.

Exit code will be executed before any temporary variables are removed (see mlocal).

Author(s)

Mark Bravington

Examples

ffin <- function( nlocal=sys.parent(), x1234, yyy) mlocal({
  x1234 <- yyy <- 1 # x1234 & yyy are temporary variables
  # on.exit( cat( yyy)) # would crash after not finding yyy
  local.on.exit( cat( yyy))
  })
ffout <- function() {
  x1234 <- 99
  ffin()
  x1234 # still 99 because x1234 was temporary
}
ffout()

Macro-like functions

Description

In an mlocal function, local.return should be used whenever return is called, wrapped inside the return call around the return arguments.

Usage

local.return(...) # Don't use it like this!
# Correct usage: return( local.return( ...))

Arguments

...

named and unnamed list, handled the same way as return before R 1.8, or as returnList

Author(s)

Mark Bravington

Examples

ffin <- function( nlocal=sys.parent()) mlocal( return( local.return( a)))
ffout <- function( a) ffin()
ffout( 3) # 3
# whereas:
ffin <- function( nlocal=sys.parent()) mlocal( return( a))
try(
  ffout( 3) # error: "return" alone doesn't work
)

"Declare" child functions, allowing much tidier code

Description

Only call this within a function, say f. The named functions are copied into the environment of f, with their environments set to the environment of f. This means that when you call one of the named functions later in f, it will be able to see all the variables in f, just as if you had defined the function inside f. Using localfuncs avoids you having to clutter f with definitions of child functions. It differs from mlocal in that the local functions won't be changing objects directly in f unless they use <<- – they will instead have normal R lexical scoping.

Usage

localfuncs(funcs)

Arguments

funcs

character vector of function names

Examples

inner <- function( x) {
  y <<- y+x
  0
}
outer <- function( z) {
  # Multiply z by 2!
  y <- z
  localfuncs( 'inner')
  inner( z)
  return( y)
}
outer( 4) # 8
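The mechanism described above (copy the function into f and reset its environment) can be imitated in base R. This is a sketch under that description only, not the source of localfuncs; declare_local, helper and caller are hypothetical names.

```r
# Hypothetical imitation of the idea: copy named functions into the calling
# function's frame, with their environment set to that frame.
declare_local <- function(funcs, where = parent.frame()) {
  for (f in funcs) {
    fn <- get(f, envir = where, mode = "function")  # normal lookup from the caller
    environment(fn) <- where                        # re-home the copy in the caller
    assign(f, fn, envir = where)
  }
  invisible(funcs)
}

helper <- function(x) y + x      # 'y' is deliberately not defined here
caller <- function(z) {
  y <- 10
  declare_local("helper")
  helper(z)                      # sees the caller's 'y'
}
caller(1)  # 11
```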

Report objects and their memory sizes

Description

lsize is like ls, except it returns a numeric vector whose names are the object names, and whose elements are the object sizes. The vector is sorted in order of increasing size. lsize avoids loading objects cached using mlazy; instead of their true size, it uses the size of the file that stores each cached object, which is shown as a negative number. The file size is typically smaller than the size of the loaded object, because mlazy saves a compressed version. NB that lsize will scan all objects in the environment, including ones with funny names, whereas ls does so only if its all.names argument is set to TRUE.

Missing objects should return 0 (which may or may not be exactly correct!). You won't normally get that, but see Examples for a perverse case.
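For orientation, the core idea (a named vector of sizes, sorted by increasing size, including objects with funny names) looks roughly like this in base R. The sketch ignores mlazy caching, missing objects and recursion, so it is not a substitute for lsize; sizes_sketch is a hypothetical name.

```r
# Hypothetical simplification: sizes of all objects in an environment,
# sorted by increasing size. No special handling of cached objects.
sizes_sketch <- function(envir = .GlobalEnv) {
  nms <- ls(envir, all.names = TRUE)
  sz <- vapply(nms,
               function(nm) as.numeric(object.size(get(nm, envir = envir))),
               numeric(1))
  sort(sz)
}

e <- new.env()
assign("big", numeric(1e4), envir = e)
assign("small", 1:10, envir = e)
assign(".hidden", "a", envir = e)
sizes_sketch(e)  # named numeric vector; "big" comes last
```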

Environments

If there are environment-objects (which are really symbols that point to frames— see R) within the very environment you are lsizeing, what should their size be? There is no perfect answer. R will tell you it's "56 bytes" and that's what you'll get with the default recursive=0. However, each environment could be holding arbitrarily large objects— so you might want to know how much memory they are "really" taking. You can do so by setting recursive to a positive number, which also controls the depth of recursion (because environments can themselves contain other environments).

However-however, those environments might be innocuous things that just refer to shared system-y ones (eg namespaces of packages, copies of .GlobalEnv, etc), in which case they are not costing any memory. And if two symbols refer to the same actual environment, they are duplicates and the second one is not taking any extra real memory. So lsize tries to keep track of such cases (whenever recursive>0), and not to incorporate their memory-use; whether it does so optimally, is another Q. NB that for duplicates, only the alphabetically-first will be recursed.

There are lots of places that environments can lurk within other objects: notably, environments-of-functions, and formulae/results of calls to lm etc. These can take huge amounts of memory, sometimes manifest only when saving/loading. lsize does not currently attempt to measure those; but see find.lurking.envs.
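The effect of the recursion depth can be shown with a deliberately naive base-R sketch. Unlike lsize, it makes no attempt to skip shared or duplicated environments, and it does not look inside functions or formulae; env_size is a hypothetical helper.

```r
# Hypothetical, naive total size of an environment's contents.
# 'recursive' is the depth to which nested environments are opened up.
env_size <- function(env, recursive = 0) {
  total <- 0
  for (nm in ls(env, all.names = TRUE)) {
    obj <- get(nm, envir = env, inherits = FALSE)
    total <- total + as.numeric(object.size(obj))
    if (is.environment(obj) && recursive > 0)
      total <- total + env_size(obj, recursive - 1)
  }
  total
}

outer_env <- new.env()
inner_env <- new.env()
assign("payload", numeric(1e4), envir = inner_env)
assign("inner", inner_env, envir = outer_env)
env_size(outer_env, recursive = 0) < env_size(outer_env, recursive = 1)  # TRUE
```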

Usage

lsize( envir=.GlobalEnv, recursive=0)

Arguments

envir

where to look for the objects. Will be coerced to environment, so that e.g. lsize( 2) and lsize( "package:mvbutils") work. envir can be a sys.frame– useful during debugging.

recursive

depth of recursion to allow, for objects that are themselves environments. See Environments.

Value

Named numeric vector.

Author(s)

Mark Bravington

Examples

# Current workspace
lsize()
# Contrived example to show objects in a function's environment
nonsense <- function(..., a, b, c) lsize( environment())
try( # this might be fragile with missings; OK in R4.3
  nonsense()
)
# a, b, c are all missing; this example might break in future R versions
# ...   a   b   c
#   0   0   0   0

Set up task package for live editing

Description

See mvbutils.packaging.tools before reading or experimenting!

Set up task package(s) for editing and/or live-editing. Usually called in .First or .First.task. You need to be cded into the parent task of your task-package. maintain.packages must be called before loading the package via library or require. The converse, unmaintain.package, is rarely needed; it's really only meant for when unpackage doesn't work properly, and you want a "clean slate" task package.

Usage

# E.g. in your .First, after library( mvbutils), or in...
# ... a '.First.task' above yr task-package
maintain.packages(..., character.only = FALSE, autopatch=FALSE)
unmaintain.package( pkg, character.only = FALSE)

Arguments

...

names of packages, unquoted unless character.only is TRUE. Package names must correspond to subtasks of the current task.

character.only

see above

pkg

name of package, unquoted unless character.only is TRUE.

autopatch

whether to patch.install out-of-date installed packages (default FALSE, but TRUE is common).

Details

maintain.packages( mypack) loads a copy of your task-package "mypack" (as stored in its ".RData" file) into an environment ..mypack (an "in-memory-task-package"), which itself lives in the "mvb.session.info" environment on the search path. You don't normally need to know this, because normally you'd modify/create/delete objects in the package via fixr or fixr(..., pkg="mypack") or rm.pkg( ..., pkg="mypack"). But to move objects between the package and other tasks, you do need to refer to the in-memory task package, e.g. via move( ..., from=..mypack, to=subtask/of/current). In most cases, you will be prompted afterwards for whether to save the task package on disk, but you can always do so yourself via Save.pos( ..mypack). Note that these updates and saves only update the task package and the loaded package. To update the source package using the task package, call pre.install; to update the installed package on disk as well as the source package, call patch.install.

Creating new things

It's always safe to create new objects of any type in .GlobalEnv, then use move(newthing,.,..mypack). For a new function, you can shortcut this two-step process and create it directly in the in-memory maintained package, via fixr(..mypack$newfun); fixr will take care of synchronization with the loaded package. This also ought to work for text objects created via fixtext. Otherwise, use the two-step route, unless you have a good reason to do the following...

Directly modifying the maintained package

Rarely, you may have a really good reason to directly modify the contents of ..mypack, e.g. via

  ..mypack$newfun <<- function( x) whatever

You can do it, but there are two problems to be aware of. The first is that changes won't be directly propagated to the loaded package, possibly not even after patch.install (though they will be honoured when you library() the package again). That is definitely the case for general data objects, and I'm not sure about functions; however, successful propagation after patch.install may happen for special objects such as mypack.DESCRIPTION and documentation objects. Hence my general advice is to use fixr or move.

The second, minor, problem is that you will probably forget to use <<- and will use <- instead, so that a local copy of ..mypack will be created in the current task. This is no big deal, and you can just rm the local copy; the local copy and the master copy in "mvb.session.info" both point to the same thing, and modifying one implies modifying the other, so that deleting the local copy won't lose your changes. Save detects accidental local copies of task packages, and omits them from the disk image, so there shouldn't be any problems next time you start R even if you completely forget about local/master copies.
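The local-copy behaviour is ordinary R semantics for environments and can be reproduced without mvbutils. In this sketch, holder stands in for "mvb.session.info", ..demo for ..mypack, and work for the current task; all three are made-up stand-ins.

```r
# Stand-ins: 'holder' plays "mvb.session.info", '..demo' plays '..mypack',
# and 'work' plays the current task, which sees 'holder' as a parent.
holder <- new.env()
assign("..demo", new.env(), envir = holder)
work <- new.env(parent = holder)

# Using '<-' rather than '<<-' creates a local symbol '..demo' in 'work'...
evalq(..demo$newfun <- function(x) x + 1, work)
exists("..demo", envir = work, inherits = FALSE)                       # TRUE

# ...but both symbols point at the same environment
identical(get("..demo", envir = work), get("..demo", envir = holder))  # TRUE

# Removing the local copy does not lose the change
rm("..demo", envir = work)
get("..demo", envir = holder)$newfun(1)                                # 2
```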

Autopatch

If autopatch==TRUE, then maintain.packages will check whether the corresponding installed packages are older than the ".RData" files of the task packages. If they are, it will do a full patch.install; if not, it will still call patch.install but only to reverse-update any bundled DLLs (see pre.install), not to re-install the R-source. I find autopatch useful with packages containing C code, where a crash in the C code can cause R to die before the most recent R-code changes have been "committed" with patch.install. When you next start R, a call to maintain.packages with autopatch=TRUE will "commit" the changes before the package is loaded, because you have to call maintain.packages before library; this seems to be more reliable than running patch.install manually after library after a restart.

Maintained packages as tasks

If you use mvbutils to pre-build your package, then your package must exist as a task in the cd hierarchy. Older versions of mvbutils allowed you to cd to a maintained package, but this is now forbidden because of the scope for confusion. Thanks to maintain.packages, there is no compelling need to have the package/task at the top of the search path; fixr, move, etc work just fine without. If you really do want to cd to a maintained package, you must call unmaintain.package first.

One piece of cleanup that I recommend, is to move any subtasks of "Splendid" one level up in the task hierarchy, and to remove the tasks object from "Splendid" itself, e.g. via something like:

  cd( task.above.splendid)
  tasks <- c( tasks, combined.file.paths( tasks[ "Splendid"], ..Splendid$tasks))
  # ... combined.file.paths is an imaginary function. Watch out if you've used relative paths!
  rm.pkg( tasks, pkg="Splendid")

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  # In your .First:
  library( mvbutils)
  maintain.packages( myfirstpack, mysecondpack, mythirdpack)
  # or...
  live.edit.list <- c( 'myfirstpack', 'mysecondpack', 'mythirdpack')
  maintain.packages( live.edit.list, character.only=TRUE)
  library( myfirstpack) # etc
} # if F

## End(Not run)

Auto-create a NAMESPACE file

Description

Called by pre.install for would-be packages that have a .onLoad function, and are therefore assumed to want a namespace. Produces defaults for the import, export, and S3Methods. You can modify this information prior to the NAMESPACE file being created, using the pre-install hook mechanism. The default for import is taken from the DESCRIPTION file, but the defaults for export and S3 methods are deduced from your functions, and are described below.

Usage

# Don't call this directly-- pre.install will do it automatically for you
make.NAMESPACE( env=1, path=attr( env, "path"),
  description=read.dcf( file.path( path, "DESCRIPTION"))[1,], more.exports=character( 0),
  useDynLib=character())

Arguments

env

character or numeric position on search path

path

directory where proto-package lives

description

(character) elements for the DESCRIPTION file, e.g. c( ..., Author="R.A. Fisher", ...). By default, read from existing file.

more.exports

(character) things to export that normally wouldn't be.

useDynLib

character vector of DLLs, without path or extension. Elements with names will get a NAMESPACE entry of the form <env_name>=useDynLib( <DLL>, .registration=TRUE), with the symbols being placed into a subenvironment within the package namespace environment. Unnamed elements just get useDynLib(<DLL>) and the symbols go directly into the package namespace environment; that's how Rcpp currently operates (and I'm not keen on it!). If you don't want native-symbol registration, then you should use a PIBH (pre-install build hook; see pre.install) and move the non-registrees from nsinfo$useDynLib into nsinfo$useDynLib_sans_rego.

Details

There is (currently) no attempt to handle S4 methods.

The imported packages are those listed in the "Depends:" and "Imports:" fields of the DESCRIPTION file. All exported functions in those packages will be imported (i.e. currently no "importFrom" provision), except if a function would be screened by a later import or by your package's own functions. The latter should avoid clash warnings when your package is loaded.

The exported functions are all those in find.documented(doctype="any") unless they appear to be S3 methods, plus any functions that have a non-NULL export.me attribute. The latter is a cheap way of arranging for a function to be exported, but without formal documentation (is that wise??). pre.install will incorporate any undocumented export.me functions in the "mypack-internal.Rd" file, so that RCMD CHECK will be happy.

The S3 methods are all the functions whose names start "<<generic>>." and whose first argument has the same name as in the appropriate <<generic>>. The generics that are checked are (i) the names of the character vector .knownS3Generics in package base; (ii) all functions that look like generics in any importees or dependees of your would-be package (i.e. functions in the namespace whose name is a prefix of a function in the S3 methods table of the namespace, and whose body contains a call to UseMethod); (iii) any plausible-looking generic in your would-be package (effectively the same criterion). Documented functions which look like methods but whose flat-doc documentation names them explicitly in the Usage section (e.g. referring to print.myclass(...) rather than just print(...), the latter being how you're supposed to document methods) are assumed not to be methods.
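The two tests described above (name prefix plus matching first argument, and "body contains a call to UseMethod") can be sketched in base R. These helpers are hypothetical and much cruder than whatever make.NAMESPACE really does; in particular they ignore the Usage-section exception.

```r
# Hypothetical check: does 'f' look like an S3 generic?
looks_like_generic <- function(f) {
  is.function(f) && !is.primitive(f) && "UseMethod" %in% all.names(body(f))
}

# Hypothetical check: name starts "<generic>." and first argument names match
looks_like_method <- function(fun_name, generic_name, env = parent.frame()) {
  if (!startsWith(fun_name, paste0(generic_name, "."))) return(FALSE)
  f <- get(fun_name, envir = env, mode = "function")
  g <- get(generic_name, envir = env, mode = "function")
  a_f <- names(formals(f))
  a_g <- names(formals(g))
  length(a_f) > 0 && length(a_g) > 0 && a_f[1] == a_g[1]
}

print.myclass <- function(x, ...) cat("<myclass>\n")
print.me.out <- function(what) cat(what)       # not a method, despite its name

looks_like_generic(print)                      # TRUE
looks_like_method("print.myclass", "print")    # TRUE
looks_like_method("print.me.out", "print")     # FALSE
```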

Construct sections of documentation

Description

Don't bother reading about these unless you are sure you need to! These are really intended for expediting documentation of large numbers of "internal" functions in a proto-package, and are called by make.internal.doc. make.usage.section and make.arguments.section form prototype USAGE and ARGUMENTS sections for the specified functions. These are ready for pasting into flat-format documentation (and subsequent editing).

Usage

make.usage.section( funs=, file=stdout(), env=.GlobalEnv)
make.arguments.section( funs=, file=stdout(), env=.GlobalEnv)

Arguments

funs

character vector of function names, defaulting to find.funs() %except% find.documented( doctype="Rd")

file

where to put the output ("clipboard" is useful). NULL means don't print.

env

where to look for the functions

Details

The default funs argument will find all functions not mentioned in flat-format ready-for-doc2Rd documentation. This is useful for documenting a group of "internal" functions.

make.usage.section simply puts the name of each function before its deparsed and concatenated argument list, one function per line.

make.arguments.section puts one argument per line, then a colon, then the name of the function in parentheses. The idea is that something about the argument should be added manually in a text editor.
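A base-R sketch of the two layouts just described. The helper names and the exact spacing of the output are assumptions; the real functions may format things differently.

```r
# Hypothetical helper: function name followed by its deparsed argument list
usage_line <- function(fun_name, env = parent.frame()) {
  f <- get(fun_name, envir = env, mode = "function")
  dep <- deparse(args(f))       # e.g. "function (x, y = 2) " then "NULL"
  dep <- dep[-length(dep)]      # drop the trailing "NULL" body
  sig <- trimws(paste(trimws(dep), collapse = " "))
  sub("^function ", fun_name, sig)
}

# Hypothetical helper: one line per argument, then a colon,
# then the function name in parentheses
argument_lines <- function(fun_name, env = parent.frame()) {
  f <- get(fun_name, envir = env, mode = "function")
  paste0(names(formals(f)), ": (", fun_name, ")")
}

demo_fun <- function(x, y = 2, ...) NULL
usage_line("demo_fun")      # "demo_fun(x, y = 2, ...)"
argument_lines("demo_fun")  # "x: (demo_fun)" "y: (demo_fun)" "...: (demo_fun)"
```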

Value

Character vector containing the doc section (in plain text, not Rd format).

Author(s)

Mark Bravington

Examples

if( FALSE){
  # Can't run this directly, coz internal
  # so not exported
  ns <- asNamespace( 'mvbutils')
  make.usage.section( c( "make.usage.section", "find.funs"),
    env=ns)
  make.arguments.section( c( "make.usage.section", "find.funs"),
    env=ns)
}

Suppress stupid CRAN notes, and facilitate use of Suggested packages

Description

Sigh... CRAN will reject packages with NOTE "no visible binding for global variable" from R CMD check— not WARNING, just NOTE. This is stupid because of false-positives whenever mlocal, cq, or indeed evalq or local, are called :/. Arguing with CRAN is about as productive as trying to teach a dead fish to climb trees. So, call make_CRANtidote instead, to get a bit of code that you can paste at the end of your package's .onLoad to suppress the non-problem. Your package does not need to import mvbutils to use that code.

It works by creating a dummy for each of those variables in the namespace environment, specifically a function that causes an error (stop) with the message "CRANtidote failure". It has to be a function, since as-CRAN checks look not just for variables, but for functions. Sigh.
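In outline, the dummies behave like this. The sketch is a guess at the general shape only, not the code that make_CRANtidote emits; make_dummies and ns_like are hypothetical, with ns_like standing in for a package namespace.

```r
# Hypothetical helper: create an erroring dummy function for each name that
# does not already exist in 'env'.
make_dummies <- function(var_names, env) {
  for (v in var_names) {
    if (!exists(v, envir = env, inherits = FALSE))
      assign(v, function(...) stop("CRANtidote failure"), envir = env)
  }
  invisible(var_names)
}

ns_like <- new.env()            # stand-in for the namespace environment
make_dummies(c("thrub", "yyy"), ns_like)
msg <- tryCatch(ns_like$thrub(), error = function(e) conditionMessage(e))
msg  # "CRANtidote failure"
```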

The dummy function should never be called, and if you see that "CRANtidote failure" error message, then the dummy has masked something important. Unwanted side-effects are unlikely, but possible: here's two examples. First, when I initially tried this with mvbutils itself, it noticed that .onAttach was mentioned directly somewhere, and helpfully tried to add that— causing failure on attach(), since R tried to run a non-function .onAttach. So .onAttach is now specifically excluded to avoid this, and will need tweaking manually in your code. There might be other special things only created/needed after .onLoad that need similar attention.

The second side-effect might be more common, but is easily fixed. Suppose you requireNamespace a package (which is OK, even according to CRAN and even if it lives in a different repo, as long as it's listed in "Suggests"), but you don't want to always refer to its members by name, e.g. otherpak::funcaroo. Of course, R says you're Supposed To do the latter, but that is not always desirable, cozza increased verbosity and decreased code clarity; and e.g. what about operators a la %dopar% in parallel? Then, just call locally_import("otherpak") instead of requireNamespace("otherpak"). It will create a local copy (an "import") of all exportees in "otherpak", by default just in the frame it is called from (so the copy disappears as soon as your code finishes). The local copies will mask any dummies created by make_CRANtidote, so things should work as you expect.

locally_import might be generally useful for packages in "Suggests", even without make_CRANtidote (if you don't care about CRAN), since it saves on precious colons— and it does eliminate the unlikely-but-logically-possible problem that a synonymous function from some completely different attached package might mask the version you are hoping to call (and without locally_import, the colonless version would only work if you attach(otherpak) anyway). You can also "import" just some selected functions.
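The idea of a temporary, frame-local import can be imitated with base R alone. import_locally below is a hypothetical stand-in written from the description above, not the source of locally_import; the example borrows file_ext from the tools package, which is not attached by default.

```r
# Hypothetical stand-in: copy selected exports of 'pkg' into the caller's frame
import_locally <- function(pkg, funs = getNamespaceExports(pkg),
                           to = parent.frame()) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  ns <- asNamespace(pkg)
  for (f in funs) assign(f, getExportedValue(ns, f), envir = to)
  TRUE
}

ext_of <- function(x) {
  import_locally("tools", "file_ext")  # just that one function
  file_ext(x)                          # no double-colons needed
}
ext_of("a.txt")  # "txt"
```

The copy lives only in the frame of ext_of, so nothing is left behind once the call returns.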

Usage

make_CRANtidote( checkout)
locally_import( pkg, funs=getNamespaceExports( pkg), to=parent.frame())

Arguments

checkout

Result of a call to check.pkg; AFAIK, it doesn't matter here whether the CRAN argument is set.

pkg

name of a package, presumably listed in "Suggests" but not necessarily

funs

which things to "import" from pkg. Default is everything that's exported by pkg. Usually it's only functions that might be imported, but I think it's conceivable that non-functions might be exported— which is fine, despite the name funs.

to

environment to create the "imports" in. The default is whatever frame locally_import is called from— so the copy will almost always be temporary. In the somewhat unlikely event that you are calling locally_import inside lapply or similar, you might need to specify e.g. to=environment().

Value

make_CRANtidote returns a character vector of class cat, which will display nicely so that you can cut-and-paste it into your .onLoad. locally_import returns TRUE or FALSE, like requireNamespace.

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # to avoid CRAN trying to run this despite dont-run :/
  test <- check.pkg( 'mypak')
  make_CRANtidote( test)
  # Some function defined in your own package.
  # This example is complete nonsense--- its only purpose is to show syntax!
  mypakfun <- function( x){
      locally_import( 'doParallel') # everything from 'doParallel'
      locally_import( 'otherpak', c( 'funcaroo', 'bo_diddley')) # just those 2
      # Look mum no double-colons
      thrub <- funcaroo( x %dopar% y)
      return( bo_diddley( thrub))
  }
}

## End(Not run)

Hide dull columns in data frames

Description

make_dull AKA make.dull adds a "dull" S3 class to designated columns in a data.frame. When the data.frame is printed, entries in those columns will show up just as "...". Useful for hiding long boring stuff like matrices with loads of columns, nucleotide sequences, MD5 sums, and filenames. Columns will still print clearly and behave properly if manually extracted. You can remove dullness via undull; see Examples.

The dull class has methods for format (used when printing a data.frame) and [, so that dullness is perpetuated.

Usage

make_dull(df, cols)

Arguments

df

a data.frame

cols

columns to designate

Details

Ask yourself: do you really want details of a function called make_dull? Life may be sweet but it is also short.

Actually, I've had to add something "for the record", but you probably don't want to read this. Just prepending "dull" to the "class" attribute of a column (see oldClass) can mask normal S3 dispatch— matrices being a case in point. Therefore, make_dull now also adds (part of) the implicit class of the column after "dull". In quasi-English, what that means is that a matrix will acquire explicit class attribute c( "dull", "matrix", "array"), whereas a normal non-dull matrix would have a NULL "class" attribute but an implicit class of c("matrix","array"). This means you can still do e.g. isSymmetric( x$dullmat) and inherits( x$dullmat, "matrix"). Other semi-exotic objects (including array, which AFAICR only semi-works inside dataframes, list which I think works OK, and user-defined classes) get similar treatment.

This Sort Of Thing is why it's marginally worth having an explicit undull function, as opposed to just unclass or eg oldClass(excol) <- oldClass( excol) %except% "dull". The former would destroy a user-defined class; the latter would leave superfluous matrix and array elements in an unnecessary "class" attribute (although I'm not sure that causes any practical problems).
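The implicit-versus-explicit class business can be seen in base R alone. The sketch below does not call make_dull or undull; it just sets "class" attributes by hand (the variable names are made up) to show why prepending only "dull" breaks inherits, and what unclass and a plain set-difference would each leave behind.

```r
m <- matrix(1:6, 2, 3)
oldClass(m)                  # NULL: no explicit "class" attribute
class(m)                     # implicit class: "matrix" "array"

naive <- m
oldClass(naive) <- "dull"    # only "dull" is now visible to S3
inherits(naive, "matrix")    # FALSE

careful <- m
oldClass(careful) <- c("dull", class(m))   # keep the implicit class explicitly
inherits(careful, "matrix")  # TRUE

# Two crude ways of removing dullness by hand:
oldClass(unclass(careful))                          # NULL: any user class would be lost too
stripped <- careful
oldClass(stripped) <- setdiff(oldClass(stripped), "dull")
oldClass(stripped)                                  # "matrix" "array" left as explicit attribute
```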

Sigh... we are deep in the S3wamplands here; "the perfect is the enemy of the good" where S3 is concerned, so we just gotta put up with some of the murky stuff. That's OK. Be careful what you wish for: ie S4!

More details

make_dull is both autologous and idempotent.

Value

A modified data.frame

Examples

# Becos more logical syntax:
rsample <- function (n = length(pop), pop, replace = FALSE, prob = NULL){
  pop[sample(seq_along(pop) - 1, size = n, replace = replace, prob = prob) + 1]
}
df <- data.frame( x=1:3,
    y=apply( matrix( rsample( 150, as.raw( 33:127), rep=TRUE), 50, 3), 2, rawToChar),
    stringsAsFactors=FALSE) # s.A.F. value shouldn't matter
df # zzzzzzzzzzzzzzz
df <- make_dull( df, 'y')
df # wow, exciting!
df$y # zzzzzzzzzzzzzz
undull( df$y) # no class attrib now
df$ZZZ <- matrix( 1:99, nrow=3, ncol=33)
df # boooring
class( df$ZZZ)
oldClass( df$ZZZ)
df <- make_dull( df, 'ZZZ')
df # whew
class( df$ZZZ)
oldClass( df$ZZZ)
ZZZ <- df$ZZZ
# Suddenly it is interesting! So...
ZZZ <- undull( ZZZ)
class( ZZZ)
oldClass( ZZZ)

Max package version

Description

Finds the highest version number of an installed package in (possibly) several libraries. Mainly for internal use in mvbutils, but might come in handy if your version numbers have gotten out-of-synch eg with different R versions. On my setup, all my "non-base" libraries are folders inside "d:/rpackages", with folder names such as "R2.13"; my .First sets .libPaths() to all of these that are below the running version of R (but that are still legal for that R version; so for R > 3.0, folders named "R2.xxxx" would be excluded). Hence I can call max_pkg_ver( mypack, "d:/rpackages") to find the highest installed version in all these subfolders.

Usage

max_pkg_ver(pkg, libroot, pattern = "^[rR][ -]?[0-9]+")
    # NB named with underscores to avoid interpretation as S3 method

Arguments

pkg

character, the name of the package

libroot

folder(s) to be searched recursively for package pkg

pattern

what regexp to use when looking for potential libraries to recurse into

Value

A numeric_version object for the highest-numbered installation, with value numeric_version("0") if no such package is found. If libroot is a single library containing the package, the result will equal packageVersion( pkg, libroot).
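Two small base-R illustrations may help here; neither calls max_pkg_ver, and the folder names and version strings are invented. The first shows which folder names the default pattern would recurse into; the second shows why the result is a numeric_version rather than a string.

```r
# Which (hypothetical) folder names match the default 'pattern'?
folders <- c("R2.13", "r-4", "R 4.3", "library", "Rlibs")
grepl("^[rR][ -]?[0-9]+", folders)       # TRUE TRUE TRUE FALSE FALSE

# Versions compare component by component, not as text
found <- numeric_version(c("2.5.4", "2.12.120", "2.8.3"))
max(found)                               # 2.12.120
max(c("2.5.4", "2.12.120", "2.8.3"))     # "2.8.3": plain string comparison gets it wrong

# The documented "not found" value sorts below any real version
numeric_version("0") < min(found)        # TRUE
```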

Examples

max_pkg_ver( "mvbutils", .libPaths())

Put reals and integers into specified bins, returning factors.

Description

Put reals and integers into specified bins, returning ordered factors. Like cut but for human use.

Usage

mcut( x, breaks, pre.lab='', mid.lab='', post.lab='', digits=getOption( 'digits'))
mintcut( x, breaks=NULL, prefix='', all.levels=, by.breaks=1)

Arguments

x

(numeric vector) What to bin– will be coerced to integer for mintcut

breaks

(numeric vector) LH end of each bin– should be increasing. Values of x exactly on the LH end of a bin will go into that bin, not the previous one. For mintcut, defaults to equal-size bins across the range of x, where bin size is set from by.breaks which itself defaults to 1. For mcut, should start with -Inf if necessary, but should not finish with Inf unless you want a bin for Infs only.

prefix, pre.lab

(string) What to prepend to the factor labels– e.g. "Amps" if your original data is about Amps.

mid.lab

"units" to append to numeric vals inside factor labels. Tends to make the labels harder to read; try using post.lab instead.

post.lab

(string) What to append to the factor labels.

digits

(integer) How many digits to put into the factor labels.

all.levels

if FALSE, omit factor levels that don't occur in x. To override "automatically", just set the "all.levels" attribute of breaks to anything non-NULL; useful e.g. if you are repeatedly calling mintcut with the same breaks and you always want all.levels=TRUE.

by.breaks

for mintcut when default breaks is used, to set the bin size.

Details

Values of x below breaks[1] will end up as NAs. For mintcut, factor labels (well, the bit after the prefix) will be of the form "2-7" or "3" (if the bin range is 1) or "8+" (for last in range). For mcut, labels will look like this (apart from the pre.lab and post.lab bits): "[<0.25]" or "[0.25,0.50]" or "[>=0.75]".
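The binning rule for mintcut (left-hand ends, values below the first break become NA, last bin open-ended) can be imitated with base R's findInterval. This is only a sketch of the documented behaviour, not the code of mintcut; all variable names are made up.

```r
x <- 1:8
breaks <- c(2, 4, 7)

bin <- findInterval(x, breaks)     # 0 means "below breaks[1]"
bin[bin == 0] <- NA

# Label each bin by its integer range; the last bin is open-ended
upper <- c(breaks[-1] - 1, NA)
labs <- ifelse(is.na(upper), paste0(breaks, "+"),
          ifelse(upper == breaks, as.character(breaks),
                 paste0(breaks, "-", upper)))

f <- factor(labs[bin], levels = labs, ordered = TRUE)
as.character(f)
# NA "2-3" "2-3" "4-6" "4-6" "4-6" "7+" "7+"
```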

Examples

set.seed( 1)
mcut( runif( 5), c( 0.25, 0.5, 0.75))
# [1] [0.25,0.50] [0.25,0.50] [0.50,0.75] [>=0.75]     [<0.25]
# Levels: [<0.25] [0.25,0.50] [0.50,0.75] [>=0.75]
mcut( runif( 5), c( 0.25, 0.5, 0.75), pre.lab='A', post.lab='B', digits=1)
# [1] A[>=0.8]B    A[>=0.8]B    A[0.5,0.8]B A[0.5,0.8]B A[<0.2]B
# Levels: A[<0.2]B A[0.2,0.5]B A[0.5,0.8]B A[>=0.8]B
mintcut( 1:8, c( 2, 4, 7))
# [1] <NA> 2-3  2-3  4-6  4-6  4-6  7+   7+
# Levels: 2-3 4-6 7+
mintcut( c( 1, 2, 4)) # auto bins, size defaulting to 1
# [1] 1  2  4+
# Levels: 1 < 2 < 3 < 4+
mintcut( c( 1, 2, 6), by=2) # auto bins of size 2
# [1] 1-2 1-2 5+
# Levels: 1-2 < 3-4 < 5+

Deparsing nicelier

Description

R's built-in deparse rather messes up a:=b, a?b, and ?a, destroying their elegance. This doesn't— though for ?a it does wrap the result in superfluous parentheses. There might be a few superfluous parens in other cases too, but base::deparse does that too (to safeguard reparsability).

Detail

This works (quickly) by first using substitute to replace calls to := and ? by fake user-defined operators (%<something>%), then calling deparse, then gsub to re-replace the user-def-ops by := and ?. The <something> is meant to be a string that could never occur accidentally in deparse output (ie no character constant or name could ever deparse to it)— I hope I got it right!

I am not entirely sure about precedence. Because user-defined ops have higher precedence than := or ?, what I think happens is that deparse puts in extra parentheses to safeguard precedence. I don't remove them, so I think the end result is correct in the sense that parse(text=mdeparse(expr)) will keep precedence correct, though maybe with extra parens.
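The swap-deparse-swap-back trick can be sketched in a few lines of base R. This is a simplified illustration, not the code of mdeparse: it handles only binary :=, it walks the call tree by hand rather than using substitute, and the fake operator name is an arbitrary choice that is merely assumed not to collide with anything in the expression.

```r
sketch_mdeparse <- function(expr, ...) {
  fake_op <- "%..colon.equals..%"          # assumed never to occur by accident

  # Recursively rename calls to `:=` as calls to the fake user-defined operator
  swap_ops <- function(e) {
    if (is.call(e)) {
      if (identical(e[[1]], as.name(":=")) && length(e) == 3L) {
        e[[1]] <- as.name(fake_op)
      }
      for (i in seq_along(e)) {
        if (is.call(e[[i]])) e[[i]] <- swap_ops(e[[i]])
      }
    }
    e
  }

  out <- deparse(swap_ops(expr), ...)      # deparse prints %...% operators infix
  gsub(fake_op, ":=", out, fixed = TRUE)   # put the real operator back
}

sketch_mdeparse(quote(a := b))             # "a := b"
```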

Apparently rlang::expr_deparse also handles := and ? sensibly (and doesn't put the parens on ?a), but it is slow because it does everything itself, and crikey it is a big dependency. mdeparse is fast because it's a beautiful hack.

While I was thinking about this, I came upon the doubt package, which is a really clever thing— kudos to the author!

Usage

mdeparse(expr, ...)

Arguments

expr

what to deparse

...

other args for base::deparse

Value

Character vector.

Examples

deparse( quote( a:=b))     # [1] "`:=`(a, b)"
mdeparse( quote( a:=b))    # [1] "a := b"
mdeparse( quote( a?b))     # [1] "a ? b"
deparse( quote( a?b))      # [1] "`?`(a, b)"
mdeparse( quote( ?b))      # [1] "(?b)" best I could do--- sorry!
deparse( quote( ?b))       # [1] "`?`(b)"

Cacheing objects for lazy-load access

Description

mlazy and friends are designed for handling collections of biggish objects, where only a few of the objects are accessed during any period, and especially where the individual objects might change and the collection might grow or shrink. As with "lazy loading" of packages, and the gdata/ASOR packages, the idea is to avoid the time & memory overhead associated with loading in numerous huge R binary objects when not all will be needed. Unlike lazy loading and gdata, mlazy caches each mlazyed object in a separate file, so it also avoids the overhead that would be associated with changing/adding/deleting objects if all objects lived in the same big file. When a workspace is Saved, the code updates only those individual object files that need updating.

Apart from possibly environment objects (see subsection), mlazy does not require any special structure for object collections; in particular, the data doesn't have to go into a package. mlazy is particularly useful for users of cd because each cd to/from a task causes a read/write of the binary image file (usually ".RData"), which can be very large if mlazy is not used. Read DETAILS next. Feedback is welcome.

Environments

Sometimes nowadays I use an R environment instead of a list to store stuff, usually to take advantage of inheritance. They are a bit different to other R objects, and if you don't understand them properly, then be careful! The salient point here is that an environment is really a pointer, and unlike other R objects, two R environment "objects" can actually point to exactly the same "shared memory". Now, mlazy works fine with a single copy of an environment (when it's saved into the cache folder, R will automatically include any necessary parent environments, etc) but if you have two objects that point to the same "real" environment, and you mlazy just one of them or both of them, then I don't know what's going to happen when you reload them; do you now end up with two separate unlinked copies, or what? So... like I said, be very careful with mlazy and environment objects. (It would be handy if there was a tool to check for other references to a given environment, which must be buried inside R's internal structures since reference-counting is certainly used there. But...)

Usage

mlazy( ..., what, envir=.GlobalEnv, save.now=TRUE)
  # cache some objects
mtidy( ..., what, envir=.GlobalEnv)
  # (cache and) purge the cache to disk, freeing memory
demlazy( ..., what, envir=.GlobalEnv)
  # makes 'what' into normal uncached objects
mcachees( envir=.GlobalEnv)
  # shows which objects in  envir are cached
attach.mlazy( dir, pos=2, name=)
  # load mcached workspace into new search environment,
  # or create empty s.e. for cacheing

Arguments

...

unquoted object names, overridden by what if supplied

what

character vector of object names, all from the same environment. For mtidy and demlazy, defaults to all currently-cached objects in envir

envir

environment or position on the search path, defaulting to the environment where what or objs live.

save.now

see DETAILS

dir

name of directory, relative to task.home.

pos

numeric position of environment on search path, 2 or more

name

name to give environment, defaulting to something like "data:current.task:dir".

Value

These functions are used only for their side-effects, except for mcachees which returns a character vector of object names.

More details

All this is geared to working with saved images (i.e. ".RData" or "all.rda" files) rather than creating all objects anew each session via source. If you use the latter approach, mlazy will probably be of little value.

The easiest way to set up cacheing is just to create your objects as normal, then call

mlazy( <<objname1>>, <<objname2>>, <<etc>>)

Save()

This will not seem to do much immediately– your object can be read and changed as normal, and is still taking up memory. The memory and time savings will come in your next R session in this workspace.

You should never see any differences (except in time & memory usage) between working with cached (AKA mlazyed) and normal uncached objects. [One minor exception is that cacheing a function may stuff up the automatic backup system, or at any rate the "backstop" version of it which runs when you cd. This is deliberate, for speeding up cd. But why would you cache a function anyway?]

mlazy itself doesn't save the workspace image (the ".RData" or "all.rda" file), which is where the references live; that's why you need to call Save periodically. save.image and save will not work properly, and nor will load– see NOTE below. Save doesn't store cached objects directly in the ".RData" file, but instead stores the uncached objects as normal in .RData together with a special object called something like .mcache00 (guaranteed not to conflict with one of your own objects). When the .RData file is subsequently reloaded by cd, the presence of the .mcache00 object triggers the creation of "stub" objects that will load the real cached objects from disk when and only when each one is required; the .mcache00 object is then deleted. Cached objects are loaded & stored in a subdirectory "mlazy" from individual files called "obj*.rda", where "*" is a number.

mlazy and Save do not immediately free any memory, to avoid any unnecessary re-loading from disk if you access the objects again during the current session. To force a "memory purge" during an R session, you need to call mtidy. mtidy purges its arguments from the cache, replacing them by promises just as when loading the workspace; when a reference is next accessed, its cached version will be re-loaded from disk. mtidy can be useful if you are looping over objects, and want to keep memory growth limited– you can mtidy each object as the last statement in the loop. By default, mtidy purges the cache of all objects that have previously been cached. mtidy also caches any formerly uncached arguments, so one call to mtidy can be used instead of mlazy( ...); mtidy( ...).

move understands cached objects, and will shuffle the files accordingly.

demlazy will delete the corresponding "obj*.rda" file(s), so that only an in-memory copy will then exist; don't forget to Save soon after.

Warning

The system function load does not understand cacheing. If you merely load an image file saved using Save, cached objects will not be there, but there will be an extra object called something like .mcache00. Hence, if you have cached objects in your ROOT task, they will not be visible when you start R until you load the mvbutils library– another fine reason to do that in your .First. The .First.lib function in mvbutils calls setup.mcache( .GlobalEnv) to automatically prepare any references in the ROOT task.

Cacheing in other search environments

It is possible to cache in search environments other than the current top one (AKA the current workspace, AKA .GlobalEnv). This could be useful if, for example, you have a large number of simulated datasets that you might need to access, but you don't want them cluttering up .GlobalEnv. If you weren't worried about cacheing, you'd probably do this by calling attach( "<<filename>>"). The cacheing equivalent is attach.mlazy( "cachedir"). The argument is the name of a directory where the cached objects will be (or already are) stored; the directory will be created if necessary. If there is a ".RData" file in the directory, attach.mlazy will load it and set up any references properly; the ".RData" file will presumably contain mostly references to cached data objects, but can contain normal uncached objects too.

Once you have set up a cacheable search environment via attach.mlazy (typically in search position 2), you can cache objects into it using mlazy with the envir argument set (typically to 2). If the objects are originally somewhere else, they will be transferred to envir before cacheing. Whenever you want to save the cached objects, call Save.pos(2).

You will probably also want to modify or create the .First.task (see cd) of the current task so that it calls attach.mlazy("<<cache directory name>>"). Also, you should create a .Last.task (see cd) containing detach(2), otherwise cd(..) and cd(0/...) won't work.

Options

By default, mlazy now saves & loads into an auto-created subdirectory called "mlazy". In the earliest releases, though, it saved "obj*.rda" files into the same directory as ".RData". It will now move any "obj*.rda" files that it finds alongside ".RData" into the "mlazy" subdirectory. You can (possibly) override this by setting options( mlazy.subdir=FALSE), but the default is likely more reliable.

By default, there is no way to figure out what object is contained in a "obj*.rda" without forcibly loading that file or inspecting the .mcache00 object in the "parent" .RData file– not that you should ever need to know. However, if you set options( mlazy.index=TRUE) (recommended), then a file "obj.ind" will be maintained in the "mlazy" directory, showing (object name - value) pairs in plain text (tab-separated). For directories with very large numbers of objects, there may be some speed penalty. If you want to create an index file for an existing "mlazy" directory that lacks one, cd to the task and call mvbutils:::mupdate.mcache.index.if.opt(mlazy.index=TRUE).

See Save for how to set compression options, and save for what you can set them to; options(mvbutils.compression_level=1) may save some time, at the expense of disk space.

Troubleshooting

In the unlikely event of needing to manually load a cached image file, use load.refdb– cd and attach.mlazy do this automatically.

In the unlikely event of lost/corrupted data, you can manually reload individual "obj*.rda" files using load– each "obj*.rda" file contains one object stored with its correct name. Before doing that, call demlazy( what=mcachees()) to avoid subsequent trouble. Once you have reloaded the objects, you can call mlazy again.

See Options for the easy way to check what object is stored in a particular "obj*.rda" file. If that feature is turned off on your system, the failsafe way is to load the file into a new environment, e.g. e <- new.env(); load( "obj99.rda", e); ls( e).

To see how memory changes when you call mlazy and mtidy, call gc().

To check object sizes without actually loading the cached objects, use lsize. Many functions that iterate over all objects in the environment, such as eapply, will cause mlazy objects to be loaded.

Housekeeping of "obj*.rda" files happens during Save; any obsolete files (i.e. corresponding to objects that have been removed) are deleted.

Inner workings

What happens: each workspace acquires a mcache attribute, which is a named numeric vector. The absolute values of the entries correspond to files– 53 corresponds to a file "obj53.rda", etc., and the names to objects. When an object myobj is mlazyed, the mcache is augmented by a new element named "myobj" with a new file number, and that file is saved to disk. Also, "myobj" is replaced with an active binding (see makeActiveBinding). The active binding is a function which retrieves or sets the object's data within the function's environment. If the function is called in change-value mode, then it also makes negative the file number in mcache. Hence it's possible to tell whether an object has been changed since last being saved.

When an object is first mlazyed, the object data is placed directly into the active binding function's environment so that the function can find/modify the data. When an object is mtidyed, or when a cached image is loaded from disk, the thing placed into the A.B.fun's environment is not the data itself, but instead a promise saying, in effect, "fetch me from disk when you need me". The promise gets forced when the object is accessed for reading or writing. This is how "lazy loading" of packages works, and also the gdata package. However, for mlazy there is the additional requirement of being able to determine whether an object has been modified; for efficiency, only modified objects should be written to disk when there is a Save.
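The combination of an active binding, a promise, and a sign-flipped file number can be tried out in base R. The following is a toy sketch under stated assumptions, not the mvbutils implementation: make_stub, state, and the temporary cache directory are all invented for illustration, and the extra loads counter exists only to show when the disk read happens.

```r
cache_dir <- file.path(tempdir(), "mlazy_sketch")
dir.create(cache_dir, showWarnings = FALSE)

state <- new.env()
state$index <- c(myobj = 1)   # object name -> file number; negative = changed since save
state$loads <- 0              # counts disk reads, for demonstration only

# Pretend an earlier session saved 'myobj' into obj1.rda under its own name
local({ myobj <- 1:10; save(myobj, file = file.path(cache_dir, "obj1.rda")) })

make_stub <- function(name, env) {
  force(name)
  file <- file.path(cache_dir, sprintf("obj%d.rda", abs(state$index[[name]])))
  store <- new.env()
  # A promise: the file is read only when 'value' is first accessed
  delayedAssign("value", {
    state$loads <- state$loads + 1
    tmp <- new.env()
    load(file, envir = tmp)
    get(name, envir = tmp)
  }, assign.env = store)
  # The active binding fetches or sets the data held in 'store'
  makeActiveBinding(name, function(v) {
    if (missing(v)) return(store$value)
    store$value <- v
    state$index[[name]] <- -abs(state$index[[name]])   # mark as modified
    invisible(v)
  }, env)
}

ws <- new.env()               # stands in for a workspace
make_stub("myobj", ws)
state$loads                   # 0: nothing read from disk yet
sum(ws$myobj)                 # 55: first access forces the promise
state$loads                   # 1
ws$myobj[1] <- 100L           # change-value mode
state$index                   # myobj is now -1, i.e. needs re-saving
```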

There is presumably some speed penalty from using a cache, but experience to date suggests that the penalty is small. Cached objects are saved in compressed format, which seems to take a little longer than an uncompressed save, but loading seems pretty quick compared to uncompressed files.

Author(s)

Mark Bravington

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  biggo <- matrix( runif( 1e6), 1000, 1000)
  gc() # lots of memory
  mlazy( biggo)
  gc() # still lots of memory
  mtidy( biggo)
  gc() # better
  biggo[1,1]
  gc() # worse; it's been reloaded
} # if F

## End(Not run)

Macro-like functions

Description

mlocal lets you write a function whose statements are executed in its caller's frame, rather than in its own frame.

Usage

# Use only as wrapper of function body, like this:
# my.fun <- function(..., nlocal=sys.parent()) mlocal( expr)
# ... should be replaced by the arguments of "my.fun"
# expr should be replaced by the code of "my.fun"
# nlocal should always be included as shown
mlocal( expr) # Don't use it like this!

Arguments

expr

the function code, normally a braced expression

Details

Sometimes it's useful to write a "child" function that can create and modify variables in its parent directly, without using assign or <<- (note that <<- will only work on variables that exist already). This can make for clearer, more modular programming; for example, tedious initializations of many variables can be hidden inside an initialize() statement. The definition of an mlocal function does not have to occur within its caller; the mlocal function can exist as a completely separate R object.
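For readers who want to see the basic idea without mvbutils, base R's eval.parent gives a rough flavour of "statements executed in the caller's frame". This is only an analogy: it is not how mlocal is implemented, and it has none of mlocal's handling of arguments, temporary variables, or nlocal. The function names are made up.

```r
# A "child" that creates a variable directly in its caller's frame
init_sketch <- function() eval.parent(quote(sqr.a <- a * a))

outer_sketch <- function(a) {
  init_sketch()     # creates sqr.a here, in outer_sketch's frame
  sqr.a
}
outer_sketch(5)     # 25
```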

mlocal functions can have arguments just like normal functions. These arguments will temporarily hide any objects of the same name in the nlocal frame (i.e. the calling frame). When the mlocal function exits, its arguments will be deleted from the calling frame and the hidden objects (if any) will be restored. Sometimes it's desirable to avoid cluttering the calling frame with variables that only matter to the mlocal function. A useful convention is to "declare" such temporary variables in your function definition, as defaultless arguments after the nlocal argument.

The nlocal argument of an mlocal function– which must ALWAYS be included in the definition, with the default specified as sys.parent()– can normally be omitted when invoking your mlocal function. However, you will need to set it explicitly when your function is to be called by another, e.g. lapply; see the third example. A more daring usage is to call e.g. fun.mlocal(nlocal=another.frame.number) so that the statements in fun.mlocal get executed in a completely different frame. A convoluted example can be found in the (internal) function find.debug.HQ in the debug package, which creates a frame and then defines a large number of variables in it by calling setup.debug.admin(nlocal=new.frame.number). As of 2016, you can also set nlocal to be an environment.

mlocal functions can be nested, though this gets confusing. By default, all evaluation will happen in the same frame, that of the original caller.

Note that (at least at present) all arguments are evaluated as soon as your mlocal function is invoked, rather than by the usual lazy evaluation mechanism. Missing arguments are still OK, though.

If you call return in an mlocal function, you must call local.return too.

on.exit doesn't work properly. If you want to have exit code in the mlocal function itself, use local.on.exit. I can't find any way to set the exit code in the calling function from within an mlocal function. (Not checked for some years)

Frame-dependent functions (sys.parent() etc.) will not do what you expect inside an mlocal function. For R versions between at least 1.8 and 2.15, calling the mvb... versions will return information about the caller of the current mlocal function (or the original caller, if there is a chain of mlocals). For example, mvb.sys.function() returns the definition of the caller, and mvb.sys.parent() the frame of the caller's parent. Note that sys.frame( mvb.sys.nframe()) gives the current environment (i.e. where all the variables live), because this is shared between the caller and the mlocal function. Other behaviour seems to depend on the version of R, and in R 2.15 I don't know how to access the definition of the mlocal function itself. This means, for example, that you can't reliably access attributes of the mlocal function itself, though you can access those of its caller via e.g. attr( mvb.sys.function(), "thing").

Value

As per your function; also see local.return.

Author(s)

Mark Bravington

Examples

# Tidiness and variable creation
init <- function( nlocal=sys.parent()) mlocal( sqr.a <- a*a)
ffout <- function( a) { init(); sqr.a }
ffout( 5) # 25
# Parameters and temporary variables
ffin <- function( n, nlocal=sys.parent(), a, i) mlocal({
    # this "n" and "a" will temporarily replace caller's "n" and "a"
    print( n)
    a <- 1
    for( i in 1:n)
      a <- a*x
    a
  })
x.to.the.n.plus.1 <- function( x, n) {
    print( ffin( n+1))
    print( n)
    print( ls())
  }
x.to.the.n.plus.1( 3, 2) # prints as follows:
# [1] 3 (in "ffin")
# [1] 27 (result of "ffin")
# [1] 2 (original n)
# [1] "n" "x" (vars in "x.to.the..."-- NB no a or i)
# Use of "nlocal"
ffin <- function( i, nlocal=sys.parent()) mlocal( a <- a+i )
ffout <- function( ivec) { a <- 0; sapply( ivec, ffin, nlocal=sys.nframe()) }
ffout( 1:3) # 1 3 6

Organizing R workspaces

Description

move shifts one or more objects around the task hierarchy (see cd), whether or not the source and destination are currently attached on the search path.

Usage

# Usually: unquoted object name, unquoted from and to, e.g.
# move( thing, ., 0/somewhere)
# Use 'what' arg to move several objects at once, e.g.
# move( what=c( "thing1", "thing2"), <<etc>>)
# move( x, from, to)
# move( what=, from, to)
# Next line shows the formal args, but the real usage would NEVER be like this...
move( x='.', from='.', to='.', what, overwrite.by.default=FALSE, copy=FALSE)

Arguments

x

unquoted name

from

unquoted path specifier (or maintained package specifier)

to

unquoted path specifier (or M.P. specifier)

what

character vector

overwrite.by.default

logical(1)

copy

logical(1)

Details

The normal invocation is something like move( myobj, ., 0/another.task)– note the lack of quotes around myobj. To move objects with names that have to be quoted, or to move several objects at the same time, specify the what argument: e.g. move( what=c( "myobj", "%myop%"), ., 0/another.task). Note that move is playing fast and loose with standard argument matching here; it correctly interprets the . as from, rather than x. This well-meaning subversion can lead to unexpected trouble if you deviate from the paradigms in Examples. If in doubt, you can always name from and to.

move can also handle moves in and out of packages being live-edited (see maintain.packages). If you want to specify a move to/from your package "whizzbang", the syntax of to and from should be ..whizzbang (i.e. the actual environment where the pre-installed package lives). An alternative for those short of typing practice is maintained.packages$whizzbang. No quotes in either case.

If move finds an object with the same name in the destination, you will be asked whether to overwrite it. If you say no, the object will not be moved. If you want to force overwriting of a large number of objects, set overwrite.by.default=TRUE.

By default, move will delete the original object after it has safely arrived in its destination. It's normally only necessary (and more helpful) to have just one instance of an object; after all, if it needs to be accessed by several different tasks, you can just move it to an ancestral task. However, if you really do want a duplicate, you can avoid deletion of the original by setting copy=TRUE.

You will be prompted for whether to save the source and destination tasks, if they are attached somewhere, but not in position 1. Normally this is a good idea, but you can always say no, and call Save.pos later. If the source and/or destination are not attached, they will of course be saved automatically. The top workspace (i.e. current task) .GlobalEnv is never saved automatically; you have to call Save yourself.

move is not meant to be called within other functions.

Author(s)

Mark Bravington

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  move( myobj, ., 0) # back to the ROOT task
  move( what="%myop%", 0/first.task, 0/second.task)
  # neither source nor destination attached. Funny name requires "what"
  move( what=c( "first.obj", "second.obj"), ., ../sibling.task)
  # multiple objects require "what"
  move( myobj, ..myfirstpack, ..mysecondpack) # live-edited packages
} # if F

## End(Not run)

Match rows of one dataframe to another using multiple columns

Description

Like match, but for more than one variable at a time— and geared specifically to dataframes (or matrices). NA values match only to NAs.

So useful that I've finally moved it from secret package handy2 into package mvbutils, and added documentation.

Any factor fields (which I hardly ever use; characters are just Better) will be matched based on the strings they display as, so that (eg) arbitrary re-orderings of levels won't matter.

Usage

multimatch(df1, df2, nomatch = NA, char.force = FALSE, force.same.cols = TRUE)

Arguments

df1, df2

two dataframes. Unless you set force.same.cols=FALSE, column-order is assumed to be the same in both, and a mismatched number of columns will trigger an error.

nomatch

like in match

char.force

?convert all columns to character before checking? Usually doesn't matter; if it does, TRUE is probably safer, but for historical reasons the default is FALSE.

force.same.cols

Perhaps a misleading name... set to FALSE if you want multimatch to use only columns whose name exists in both dataframes, and to re-order columns if necessary so that the names match. Usually the non-default FALSE is better!

Details

multimatch works by constructing a single numeric composite for each row in df1 and df2, based on multiplying numbers of distinct values across columns. This could potentially overflow, or give inaccurate results, if the number of columns and distinct values is very large. So, don't use multimatch in that situation...
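The composite-key idea can be sketched in base R. This is a hypothetical illustration only, not the mvbutils code: the name composite_key_match is invented here, and the real multimatch may build and compare its keys differently. Each column contributes one "digit" of a mixed-radix number, whose base is the number of distinct values seen in that column.

```r
# Hypothetical helper (not part of mvbutils): one numeric key per row
composite_key_match <- function( df1, df2, nomatch=NA_integer_) {
  stopifnot( ncol( df1) == ncol( df2))
  key1 <- numeric( nrow( df1))
  key2 <- numeric( nrow( df2))
  for( j in seq_len( ncol( df1))) {
    v1 <- as.character( df1[[ j]])
    v2 <- as.character( df2[[ j]])
    lev <- unique( c( v1, v2))   # NA is kept as a value, so NA matches only NA
    # Shift the existing key by the number of distinct values, then add this column's code
    key1 <- key1 * length( lev) + match( v1, lev) - 1
    key2 <- key2 * length( lev) + match( v2, lev) - 1
  }
  match( key1, key2, nomatch=nomatch)
}

xx <- data.frame( animal=c( "cat", "dog", "cat"),
    colour=c( "blue", "blue", "pink"), size=1:3)
yy <- data.frame( animal=c( "dog", "dog", "cat"),
    colour=c( "red", "blue", "pink"), size=1:3)
res <- composite_key_match( xx, yy)
res # NA 2 3
```

In this sketch the keys are doubles, so they stay exact only while the product of the per-column counts of distinct values is below 2^53; beyond that, different rows could collide.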

Value

A numeric vector, one element per row in df1, showing which row in df2 it matches to, or nomatch if none do.

Examples

xx <- data.frame(
    animal= cq( cat, dog, cat), colour= cq( blue, blue, pink), size= 1:3,
    royalty= cq( high, low, high))
yy <- data.frame(
    animal= cq( dog, dog, cat), colour= cq( red, blue, pink), size= 1:3,
    loyalty=cq( high, high, low)) # note the spelling!
multimatch( xx, yy)
# NA NA NA
multimatch( xx[,1:3], yy[,1:3])  # ignore 4th col
# NA 2 3
multimatch( xx, yy, force=FALSE) # auto-drop loyalty & royalty (different names)
# NA 2 3
try( multimatch( xx[,1:2], yy[,1:3]))
# <error>: num of cols
try( multimatch( xx[,1:2], yy[,1:3], force=FALSE))
# all good
multimatch( as.matrix( xx[,1:3]), as.matrix( yy[,1:3])) # matrices OK too
# NA 2 3

Replacement and insertion functions with more/less than 1 replacement per spot

Description

multirep is like replace, but the replacements are a list of the same length as the number of elements to replace. Each element of the list can have 0, 1, or more elements– the original vector will be expanded/contracted accordingly. (If all elements of the list have length 1, the result will be the same length as the original.) multinsert is similar, but doesn't overwrite the elements in orig (so the result of multinsert is longer). massrep is like multirep, but takes lists as arguments so that a group-of-line-numbers in the first list is replaced by a group-of-lines in the second list.

Usage

multirep( orig, at, repl, sorted.at=TRUE)
multinsert( orig, at, ins, sorted.at=TRUE)
massrep( orig, atlist, replist, sorted.at=TRUE)

Arguments

orig

vector

at

numeric vector, saying which elements of the original will be replaced or appended-to. Can't exceed length(orig). 0 is legal in multinsert but not multirep. Assumed sorted unless sorted.at is set to FALSE.

atlist

list where each element is a group of line numbers to be replaced by the corresponding element of replist (and that element can have a different length). Normally each group of line numbers would be consecutive, but this is not mandatory.

repl, ins, replist

a list of replacements. repl[[i]] will replace line at[i] in orig, possibly removing it (if repl[[i]] has length 0) or inserting extra elements (if repl[[i]] has length > 1). In multinsert, repl can be a non-list, whereupon it will be cast to list(repl) [if at is length 1] or as.list(repl) [if at is length>1]. If length(repl) < length(at), repl will be replicated to the appropriate size. If repl is atomic, it will be typecast into a list– in this case, all replacements/insertions will be of length 1.

sorted.at

if TRUE, then at had better be sorted beforehand; if FALSE, at will be sorted for you inside multirep, and repl is reordered accordingly.

Examples

multirep( cq( the, cat, sat, on, the, mat), c( 2, 6),
    list( cq( big, bug), cq( elephant, howdah, cushion)))
# [1] "the" "big" "bug" "sat" "on" "the" "elephant" "howdah" "cushion"
multirep( cq( the, cat, sat, on, the, mat), c( 2, 6),
    list( cq( big, bug), character(0)))
# [1] "the" "big" "bug" "sat" "on" "the"
# NB the 0 in next example:
multinsert( cq( cat, sat, on, mat), c( 0, 4),
    list( cq( fat), cq( cleaning, equipment)))
# [1] "fat" "cat" "sat" "on" "mat" "cleaning" "equipment"
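For readers who want to see the mechanics, here is a minimal base-R sketch of the same expand/contract idea. The names multirep_sketch and multinsert_sketch are invented for illustration; this is not the mvbutils source, and it skips the argument checking and recycling described above.

```r
# Hypothetical re-implementations, for illustration only
multirep_sketch <- function( orig, at, repl) {
  pieces <- as.list( orig)   # one slot per original element
  pieces[ at] <- repl        # each chosen slot now holds 0, 1, or more elements
  unlist( pieces)            # flattening expands or contracts the vector
}

multinsert_sketch <- function( orig, at, ins) {
  # Extra leading slot stands for position 0 (insert before everything)
  pieces <- c( list( orig[ 0]), as.list( orig))
  for( i in seq_along( at))
    pieces[[ at[ i] + 1]] <- c( pieces[[ at[ i] + 1]], ins[[ i]])
  unlist( pieces)
}

r1 <- multirep_sketch( c( "the", "cat", "sat", "on", "the", "mat"), c( 2, 6),
    list( c( "big", "bug"), character( 0)))
r1 # "the" "big" "bug" "sat" "on" "the"
r2 <- multinsert_sketch( c( "cat", "sat", "on", "mat"), c( 0, 4),
    list( "fat", c( "cleaning", "equipment")))
r2 # "fat" "cat" "sat" "on" "mat" "cleaning" "equipment"
```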

Session info environment

Description

Package mvbutils needs a place to stash useful session-level stuff, such as fix.list (see fixr). Since like foreeeever (2001), this has been via a special environment called mvb.session.info which is attached to the search path. However, that's not how yer sposed to do it apparently, so for better security the direct use of mvb.session.info is deprecated in favour of calling the function mvb_session_env() or its dotty synonym. Like the base-R functions globalenv() and baseenv(), an environment is returned.

Future versions of mvbutils package will move the session-info environment into "private" storage within the mvbutils namespace, so that it can only easily be accessed via mvb_session_env() or mvb.session.env().

Usage

mvb_session_env()
mvb.session.env()

Value

Environment where session-level info used by the mvbutils package (and perhaps by other packages, such as debug) is stashed.
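The accessor-function pattern described above can be illustrated in base R. Everything named here (.session_store, session_env_sketch, note) is hypothetical and merely stands in for the real storage; the point is only that callers go through a function rather than relying on an environment attached to the search path.

```r
# Hypothetical stand-in for package-private storage; not the mvbutils code
.session_store <- new.env( parent=emptyenv())
session_env_sketch <- function() .session_store

# Environments have reference semantics, so changes persist between calls
stash <- session_env_sketch()
stash$note <- "hello"
session_env_sketch()$note # "hello"
```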

Functions to Access the Function Call Stack

Description

These functions are "do what I mean, not what I say" equivalents of the corresponding system functions. The system functions can behave strangely when called in strange ways (primarily inside eval calls). The mvb equivalents behave in a more predictable fashion.

Usage

mvb.sys.parent(n=1)
mvb.sys.nframe()
mvb.parent.frame(n=1)
mvb.eval.parent( expr, n=1)
mvb.match.call(definition = sys.function(mvb.sys.parent()),
    call = sys.call(mvb.sys.parent()),  expand.dots = TRUE, envir= mvb.parent.frame( 2))
mvb.nargs()
mvb.sys.call(which = 0)
mvb.sys.function(n)

Arguments

All as per the corresponding system functions, from whose helpfiles the following is taken:

which

the frame number if non-negative, the number of generations to go back if negative. (See the Details section.)

n

the number of frame generations to go back.

definition

a function, by default the function from which match.call is called.

call

an unevaluated call to the function specified by definition, as generated by call.

expr

an expression to evaluate

expand.dots

logical. Should arguments matching ... in the call be included or left as a ... argument?

envir

an environment from which the ... in call are retrieved, if any (as per base::match.call)

Details

Sometimes eval is used to execute statements in another frame. If such statements include calls to the system versions of these routines, the results will probably not be what you want. In technical terms: the same environment will actually appear several times on the call stack (returned by sys.frames()) but with a different calling history each time. The mvb. equivalents look through sys.frames() for the first frame whose environment is identical to the environment they were called from, and base all conclusions on that first frame. To see how in detail, look at the most fundamental function: mvb.sys.parent.
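The "first matching frame" rule can be sketched in base R as below. The helper names (first_frame_of, nframe_sketch, ff_compare) are invented for this illustration and the code is not taken from mvbutils; it only shows the lookup through sys.frames() that the paragraph above describes.

```r
# Hypothetical helpers illustrating the rule; not the mvbutils source
first_frame_of <- function( env) {
  frames <- sys.frames()
  for( i in seq_along( frames))
    if( identical( frames[[ i]], env))
      return( i)             # first frame whose environment is 'env'
  0L                         # not found: treat as top level
}

nframe_sketch <- function() {
  caller <- parent.frame()   # environment the call was evaluated in
  first_frame_of( caller)
}

ff_compare <- function() {
  here <- environment()
  c( direct= sys.nframe(),
     base_in_eval= eval( quote( sys.nframe()), envir=here),
     sketch_in_eval= eval( quote( nframe_sketch()), envir=here))
}
res <- ff_compare()
res # 'direct' and 'sketch_in_eval' agree; 'base_in_eval' reports a deeper frame
```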

mvbutils pre 2.7 used to include mvb.sys.on.exit as well (to return whatever the on.exit code would be), but I think this was by mistake; the code was actually specific to my debug package (which already has its own substitute), and so I've moved it out of mvbutils.

Value

See the helpfiles for the system functions.

Author(s)

Mark Bravington

Examples

ff.no.eval <- function() sys.nframe()
ff.no.eval() # 1
ff.system <- function() eval( quote( sys.nframe()), envir=sys.frame( sys.nframe()))
ff.system() # expect 1 as per ff.no.eval, get 3
ff.mvb <- function() eval( quote( mvb.sys.nframe()), envir=sys.frame( sys.nframe()))
ff.mvb() # 1
ff.no.eval <- function(...) sys.call()
ff.no.eval( 27, b=4) # ff.no.eval( 27, b=4)
ff.system <- function(...) eval( quote( sys.call()), envir=sys.frame( sys.nframe()))
ff.system( 27, b=4) # eval( expr, envir, enclos) !!!
ff.mvb <- function(...) eval( quote( mvb.sys.call()), envir=sys.frame( sys.nframe()))
ff.mvb( 27, b=4) # ff.mvb( 27, b=4)

Private options for mvbutils package and beyond

Description

Set/get values in the environment mvbutils::mvboptions. Mostly for mvbutils itself, but anyone can use it at their own risk! Partly intended to ultimately obviate the dicey mvb.session.info environment on the search path...

Usage

mvboption(...) # eg mvboption( use_something=TRUE) to set,
# ... or mvboption( 'what_am_i') to get

Arguments

...

Either a named pairlist (eg mvboption( a=1, b=2)) to set, or a character vector to get

Value

Any previous value(s) of the options, if setting; this might mean an empty list. When getting, it's a list if more than one thing is being gotten, or the value itself if just one. More obvious than it sounds. See Examples.

Examples

mvboption( something=1)  # empty list
mvboption( a=2, b=3)     # empty list
mvboption( 'b')          # [1] 3
mvboption( cq( a, b))    # list with two elements
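The set/get return convention can be mimicked with a small base-R sketch. The store .opt_store and the function option_sketch are hypothetical names made up for this example; the real mvboption may differ in detail (for instance in how it treats unknown options).

```r
# Hypothetical store and accessor; names are invented for this sketch
.opt_store <- new.env( parent=emptyenv())

option_sketch <- function( ...) {
  args <- list( ...)
  if( !is.null( names( args))) {
    # Setting: collect any previous values, then overwrite
    nm <- names( args)
    had <- nm[ vapply( nm, exists, logical( 1), envir=.opt_store, inherits=FALSE)]
    old <- if( length( had)) mget( had, envir=.opt_store) else list()
    list2env( args, envir=.opt_store)
    return( old)
  }
  # Getting: one name gives the value itself, several give a list
  vals <- mget( unlist( args), envir=.opt_store)
  if( length( vals) == 1) vals[[ 1]] else vals
}

first <- option_sketch( a=2, b=3)     # nothing stored before: empty list
one <- option_sketch( "b")            # 3
both <- option_sketch( c( "a", "b"))  # list( a=2, b=3)
prev <- option_sketch( a=10)          # list( a=2), the value being replaced
```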

Utility operators

Description

Succinct or convenience operators

Usage

a %&% b
x %**% y
a %!in% b
vector %except% condition    # does NOT strip attributes--- see *Value*
x %grepling% patt
x %has.name% name
x %is.not.a% what
x %is.a% what
x %is.not.an% what
x %is.an% what
name %in.names.of% x
x %matching% patt
a %not.in% b
a %not.in.range% b
x %perling% patt
x %that.end.with% suffix
x %that.start.with% prefix
x %that.match% patt
x %that.dont.match% patt
x %THAT.MATCH% patt
x %THAT.DONT.MATCH% patt
a %that.are.in% b       # does NOT strip attributes--- see *Value*
x %without.name% what   # does NOT strip attributes--- see *Value*
a %in.range% b
a %such.that% b         # does NOT strip attributes--- see *Value*
a %SUCH.THAT% b         # does NOT strip attributes--- see *Value*
from %upto% to
from %downto% to
x %where% cond          # also equiv mwhere(x,cond)
x %where.warn% cond
a %<-% value # really e.g. {x;y} %<-% list( 'yes', sqrt(pi)) to create x & y

Arguments

a, b, vector, condition, x, y, name, what, patt, from, to, cond, value, prefix, suffix

see Arguments by function.

Value

%&%

character vector. If either is zero-length, so is the result (unlike paste).

%**%

numeric, possibly a matrix

%upto%, %downto%

numeric

%is.a%, %in%, etc

logical

%<-%

technically NULL return, but it overwrites / creates objects; see below...

%has.name%, %in.names.of%

logical vector

All others

same type as first argument.

Note that attributes are _not_ stripped by the subsetting of %without.name%, %except%, %such.that%, %SUCH.THAT%, or %that.are.in% (as of v2.8.369)--- whereas base R does, which I view as a bug. However, attributes may still get stripped by the other %that... and %matching%, because (some of) those use unique. Possibly I should tweak those too, but I think they are inconsistently designed (e.g. %that.match% returns unique values, but %that.dont.match% uses subsetting) and I don't want to risk breaking more things...
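To make the attribute point concrete, the following base-R sketch shows ordinary subsetting dropping a custom attribute, and one way to keep it. The helper subset_keeping_attrs is a hypothetical name for this example, and it is not necessarily how the mvbutils operators do it.

```r
x <- structure( 1:5, units="cm")
attributes( x[ x > 2]) # NULL: base-R subsetting drops "units"

# Hypothetical helper: subset, then copy the original attributes back
subset_keeping_attrs <- function( x, keep) {
  out <- x[ keep]
  mostattributes( out) <- attributes( x) # skips names/dim if they no longer fit
  out
}
kept <- subset_keeping_attrs( x, x > 2)
attr( kept, "units") # "cm"
```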

Arguments by function

%&% a, b: character vectors to be pasted with no separator. If either is zero-length, so is the result (unlike paste).

%**% x, y: matrices or vectors to be multiplied using %*% but with less fuss about dimensions

%!in%, %that.are.in% a, b: vectors (character, numeric, complex, or logical).

%except% vector, condition: character or numeric vectors

%has.name%, %in.names.of% x, name: whether name (perhaps several) is in names(x). These differ only in the order of the parameters, but in some contexts one version seems more natural. Sugar for base::hasName.

%in.range%, %not.in.range% a, b: numeric vectors.

%is.a%, etc. x: object whose class is to be checked

%is.a%, etc. what: class name

%matching%, %that.match%, %that.dont.match%, %THAT.MATCH%, %THAT.DONT.MATCH%, %grepling%, %perling% x: character vector

%matching%, %that.match%, %that.dont.match%, %THAT.MATCH%, %THAT.DONT.MATCH%, %grepling%, %perling% patt: character vector of regexps, with perl syntax for %perling%. Use the upper-case versions for case-insensitive matching.

%that.start.with%, %that.end.with% : prefix & suffix: fixed (non-regex) strings that must match the start or end of x , as per startsWith and endsWith.

%such.that%, %SUCH.THAT% a: vector

%such.that%, %SUCH.THAT% b: expression containing a ., to subscript a with

%upto%, %downto% from, to: numeric(1)

%where%, %where.warn% x: data.frame

%where%, %where.warn% cond: unquoted expression to be evaled in context of x, then in the calling frame of %where% (or .GlobalEnv). Should evaluate to logical (or maybe numeric or character); NA is treated as FALSE. Wrap cond in parentheses to avoid trouble with operator precedence. NB %where% is equivalent to mwhere, which can be handily used in base-R pipes.

%without.name% x: object with names attribute

%without.name% what: character vector of names to drop

%<-% a, value: value should be a list, and a should be e.g. {x;y;z} with as many elements as value has. The elements of value are assigned, in order, to the objects named in a, which are created / overwritten in the calling environment.
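A few of these behaviours are easy to picture with base-R stand-ins. The operators defined below (%cat0%, %upto_sketch%, %such_that_sketch%) are hypothetical and exist only so the example runs without mvbutils; they are not the package's definitions, and the last one does not preserve attributes.

```r
# Hypothetical stand-ins, defined here for illustration only
`%cat0%` <- function( a, b)
  if( !length( a) || !length( b)) character( 0) else paste0( a, b)
`%upto_sketch%` <- function( from, to)
  if( from <= to) from:to else numeric( 0)
`%such_that_sketch%` <- function( a, b)
  a[ eval( substitute( b), list( . = a), parent.frame())]

paste0( "a", character( 0))   # "a": paste0 treats the empty argument as ""
"a" %cat0% character( 0)      # character(0)

n_colon <- 0
for( i in 1:0) n_colon <- n_colon + 1              # 1:0 is c(1,0): loop runs twice
n_upto <- 0
for( i in 1 %upto_sketch% 0) n_upto <- n_upto + 1  # empty: loop never runs

s <- (1:5) %such_that_sketch% (. > 2)   # '.' stands for the left-hand vector
s # 3 4 5
```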

Author(s)

Mark Bravington

Examples

"a" %&% "b" # "ab"
matrix( 1:4, 2, 2) %**% matrix( 1:2, 2, 1) # c( 7, 10); '%*%' gives matrix result
matrix( 1:2, 2, 1) %**% matrix( 1:4, 2, 2) # c( 5, 11); '%*%' gives error
1:2 %**% matrix( 1:4, 2, 2) # '%*%' gives matrix result
1:5 %!in% 3:4 # c( TRUE, TRUE, FALSE, FALSE, TRUE)
1:5 %not.in% 3:4 # c( TRUE, TRUE, FALSE, FALSE, TRUE)
1:5 %that.are.in% 3:4 # c( 3, 4)
trf <- try( 1+"nonsense")
if( trf %is.not.a% "try-error") cat( "OK\n") else cat( "not OK\n")
1:5 %except% c(2,4,6) # c(1,3,5)
c( alpha=1, beta=2) %without.name% "alpha" # c( beta=2)
xx <- list( y=0, z='pterodactyl')
xx %has.name% 'y' # yep
xx %has.name% 'pringle' # nope
xx %has.name% cq( y, z) # yep and yep again
cq( y, z, zzz) %in.names.of% xx # same thing
1:5 %in.range% c( 2, 4) # c(F,T,T,T,F)
1:5 %not.in.range% c( 2, 4) # c(T,F,F,F,T)
c( "cat", "hat", "dog", "brick") %matching% c( "at", "ic") # cat hat brick
c( "cat", "hat", "dog", "brick") %that.match% c( "at", "ic") # cat hat brick; ...
# ... synonym for '%matching%'
c( "cat", "hat", "dog", "brick") %THAT.MATCH% c( "AT", "ic") # cat hat brick; case-insensitive
# ... synonym for '%matching%'
c( "cat", "hat", "dog", "brick") %that.dont.match% c( "at", "ic") # dog; ...
# ... like '%except%' but for regexps
c( "cat", "hat", "dog", "brick") %that.end.with% 'at' # cat hat
c( "cat", "hat", "dog", "brick") %that.start.with% 'br' # brick
1 %upto% 2 # 1:2
1 %upto% 0 # numeric( 0); use %upto% rather than : in for-loops to avoid unintended errors
1 %downto% 0 # 1:0
1 %downto% 2 # numeric( 0)
ff <- function( which.row) {
    x <- data.frame( a=1:3, b=4:6)
    x %where% (a==which.row)
  }
ff( 2) # data.frame( a=2, b=5)
x <- data.frame( start=1:3, end=c( 4, 5, 0))
x %where.warn% (start < end) # gives warning about row 3
(1:5) %such.that% (.>2) # 3,4,5
listio <- list( a=1,  b=2)
chars <- cq( a, b)
chars %SUCH.THAT% (listio[[.]]==2) # 'b'; %such.that% won't work because [[]] can't handle xtuples
{x;y} %<-% list( 'yes', sqrt(pi))
# x: [1] "yes"
# y: [1] 1.772
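The multiple-assignment trick relies on the left-hand side never being evaluated, only inspected. A minimal base-R sketch, using the invented operator name %assign_each% (not the mvbutils implementation of %<-%), might look like this:

```r
# Hypothetical operator for illustration; not the mvbutils source
`%assign_each%` <- function( a, value) {
  targets <- as.list( substitute( a))[ -1]   # symbols inside {...}, minus the `{`
  stopifnot( length( targets) == length( value))
  for( i in seq_along( targets))
    assign( as.character( targets[[ i]]), value[[ i]], envir=parent.frame())
  invisible( NULL)
}

{p; q} %assign_each% list( 'yes', sqrt( pi))
p # "yes"
q # sqrt(pi)
```

Because substitute captures the unevaluated {p; q}, the names p and q need not exist beforehand.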

How to create & maintain packages with mvbutils

Description

This document covers:

using mvbutils to create a new package from scratch;
using mvbutils to maintain a package you've created (e.g. edit it while using it);
converting an existing package into mvbutils-compatible format;
how to customize the package-creation process.

For clarity, the simplest usage is presented first in each case. For how to do things differently, first look further down this document, then in the documentation for pre.install and perhaps doc2Rd.

You need to understand cd and fixr before trying any of this.

Setting up a package from scratch

First, the simplest case: suppose you have some pure R code and maybe data that you'd like to make into a package called "Splendid". The bare-minimum steps you need are:-

Make sure all the code & data lives in a single task called "Splendid".
cd to the task above "Splendid"
maintain.packages( Splendid)
pre.install( Splendid). This will create a "source package" in a subdirectory of Splendid's task directory. The subdirectory will be called "Splendid".
Make sure you have all the R build tools installed and on your path– see "R-exts" for details (and NB that if you need to install LaTeX, then google MiKTeX & choose a minimal install).
install.pkg( Splendid) to do what you'd expect. On Windows, you can alternatively first do build.pkg.binary( Splendid), then use R's menus to "Packages/Install from local zip files".
library(Splendid); your package will be loaded for use, and is also ready for live-editing.

Your package will probably just about work now, but the result won't yet be perfect. The additional steps you'll likely need are these:

Sort out the Description file or object (see below)
Provide Documentation and metadata (see below)
Sort out any C/Fortran source code, pre-compiled code, demos, and other additional files (see pre.install)
Move any subtasks of Splendid to one level up the task hierarchy (see maintain.packages)

Once you have set up "Splendid" so that maintain.packages works, you won't need to cd directly into "Splendid" again— which is good, because you're not allowed to.

Glossary

Task package is a folder with at least an ".RData" file, linked into the cd hierarchy. It contains master copies of the objects in your package, plus perhaps a few other objects required to build the package (e.g. stand-alone items of documentation).

In-memory task package is an environment in the current R session that contains an image of the task package. Objects in it are never used directly, only as templates for editing. It is loaded by maintain.packages, and Save.pos uses it to update the task package (usually automatic).

Source package is a folder containing, yes, an R-style source package. It is created initially by pre.install, and subsequently by patch.install or pre.install.

Installed package is a folder containing, yes, an R-style installed package. It is always created from the source package, initially by install.pkg and subsequently by patch.install or install.pkg.

Loaded package is the in-memory version of an installed package, loaded by library.

Tarball package is a zipped-up version of a source package, for distro on non-Windows-Mac platforms or submission to CRAN and subsequent installation via "R CMD INSTALL". Usually it will not contain DLLs of any low-level code, just the source low-level code. It is created by build.pkg.

Binary package is a special zipped-up version for distro to Windows or Macs that includes actual DLLs, for installation via e.g. the "Packages/Install from local ZIP" menu. It is created by build.pkg.binary.

Built package is a tarball package or binary package.

Converting an existing package

Suppose you already have a source package "hardway", and would like to try maintaining it via mvbutils. You'll need to create a task package, then create a new version of the source package, then re-install it. The first step is to call unpackage( hardway) to create the task package "hardway" in a subdirectory of the current task. Plain-text documentation will be attached to functions, or stored as ".doc" text objects. All functions and documentation must thereafter be edited using fixr. The full sequence is something like:

  # Create task package in subdirectory of current:
  unpackage( "path/to/existing/source/package/hardway")
  #
  # Load image into memory:
  maintain.packages( hardway)
  #
  # Make new version of source package:
  pre.install( hardway, ...) # use dir= to control where new source pkg goes
  #
  install.pkg( hardway) # or build.pkg.binary( hardway) followed by "install from local zip file" menu
  #
  library( hardway) # off yer go

If you get problems after maintain.packages, you might need unmaintain.package( hardway) to clear out the in-memory copy of the new task package.

Documentation and metadata

Documentation for functions can be stored as plain text just after a function's source code, as described in docattr. Just about anything will do– you don't absolutely have to follow the conventional structure of R help if you are really in a hurry. However, the easiest way to add kosheR but skeletal documentation to your function brilliant, is fixr( brilliant, new.doc=TRUE); again, see flatdoc and doc2Rd if you want to understand what's going on. The format is almost exactly as displayed in plain-text help, i.e. from help(..., help_type="text"), except without much indentation (ie no hard linebreaks within paragraphs). My recommendation is to just start writing something that looks reasonable, and see if it works. To quickly test the ultimate appearance, you can use e.g. docotest(..Splendid$brilliant). More generally, run patch.install(Splendid) which, as explained in Maintaining a package below, updates everything for your package including the help system, so you can then just do ?brilliant. If you run into problems with writing documentation for your functions, then refer to doc2Rd for further details of format, such as how to document several functions in the same file.

You can also provide three other types of documentation, for: (i) general use of your package (please do! it helps the user a lot; packages where the doco PDF consists only of an alphabetical list of functions/objects are a pain); (ii) more specific aspects of usage that are not tied to individual functions, such as this file; and (iii) datasets. These types of documentation should be stored in the package as text objects whose name ends in ".doc"; examples of the three types could be "Splendid.package.doc", "glitzograms.with.Splendid.doc", and "earlobes.doc" if you have a dataset earlobes. See doc2Rd for format details.

You must document every function and dataset that the user will see (i.e. exported ones), but you don't need to document any others. Specifying "internal" as a keyword may work for visibly documenting unexported functions, but it's a bit odd. Another way is to assign the plain-text docu to an attribute called secret_doc, in which case you can see it when you edit the function (and it's there if you ever do want to export the function) but no-one else will.

Github etc

You may or may not want to include the plain-text docu in the source for e.g. Github or other public repo. It can be nice to have, because if you ever need to re-create the task package yourself from the Github version (e.g. if something local has gone wrong) then your original docu is right there. You can control that by adding a field "KeepPlaintextDoco: YES" in Description. The default is not to, in which case you'd have to re-generate the plain-text docu via unpackage (not the end of the world, but...). If you do keep plain-text docu on github, then you should probably include a call to dedoc_namespace("<mypackage>") in your .onLoad, so that the plain-text docu is removed from the loaded functions.

Description file or object

When you first create a package from a task via pre.install, there probably won't be any DESCRIPTION information, so mvbutils will create a default "DESCRIPTION" file in your task folder, which it then copies to the source package. However, the default won't be what you really want, as you'll realize if you type library( help=Splendid). You can either manually edit the default "DESCRIPTION" file, or you can use fixtext(Splendid.DESCRIPTION, pkg="Splendid") to create a text object in your task package, which you then populate with the contents of the default "DESCRIPTION" file, and then edit. If a Splendid.DESCRIPTION object exists, mvbutils will use it in preference to a file; I find this tidier, because more of the package metadata lives in a single place, viz. inside the task package.

Apart from the obvious changes needed to the default "DESCRIPTION" file or text object, the most important fields to add are "Imports:" (or "Depends:" for packages that are pre-R2.14 and that also don't have a namespace), to say what other packages are needed by "Splendid". The DESCRIPTION file/text should rarely need to be updated, since the "autoversion" feature (see pre.install doco) can be used to take care of version numbering. The most common reason to change the DESCRIPTION is probably to add/remove packages in "Imports"; at present, this pretty much requires you to unload & reload the package, but I may try to expedite this in future versions.

Vignettes

Vignettes can be slow to build, and a teeny bit awkward (because they are a post-hoc addition to R adopted only with some reluctance by R-core, is my guess), so they are not handled automatically by pre.install or friends. Instead, you have to build them yourself (after the package is loaded) by calling vignette.pkg, after which patch.install and build.pkg etc should work fine. What it does, is put the "built" vignette files (as distinct from your ".Rmd" etc source) into a folder "inst/doc". I think it updates the vignette stuff in the installed package, too, but I'm not sure...

[Is this still true?] There's no automatic generation of an index for vignettes, except for their filenames. To provide more information, use fixtext to create a text object in your task package called e.g. mypack.VIGNETTES, with lines as follows:

  my.first.vignette: Behold leviathan, mate
  my.second.vignette: What a good idea, to write a vignette

The next bit is hopefully obsolete as of 2023, because vignette.pkg should get the job done. But here it is anyway...

As a very experimental feature, you can also include R code for a homebrewed vignette, via a file with the same name but extension ".R" also in "inst/doc". Users can access it as normal for vignette code, via edit( vignette( "my.first.vignette", package="mypack")) or via doing something to system.file( file.path( "doc", "my.first.vignette.R"), package="mypack").

You can put full-on Sweave-style vignettes into a "vignettes" folder, and they should be set up correctly in the source package. Currently, though, they are not re-installed by patch.install; you need to use build.pkg and install.pkg (partly defeating the point of these package-building utilities).

Very technical details about homebrewed vignettes

"Rnw stubs" are created for all homebrewed vignettes so that the help system finds them. A rudimentary index will be created for vignettes not mentioned in <<mypack>>.VIGNETTES. If you create your own "inst/doc/index.html" file, this takes precedence over mvbutil's versions, so that <<mypack>>.VIGNETTES is not used.

Namespace

Usually this is automatic. pre.install etc automatically creates a "NAMESPACE" file for your package, ensuring inter alia that all documented objects are user-visible. To load DLLs, add a .onLoad function that contains the body code of generic.dll.loader in package mvbutils (thus avoiding dependence on mvbutils). For more complicated fiddling, see Customizing package creation.

Packages without namespaces (pre R 2.14)

Namespaces only became compulsory with R 2.14. If you're setting up your package in an earlier version of R, mvbutils will not create a namespace unless it finds a .onLoad function. To trigger namespacing, just create a .onLoad with this definition: function( libname, pkgname) {}.

Maintaining a package

Once you have successfully gotten your "Splendid" package installed and loaded the first time, you should rarely need to call install.pkg or build.pkg etc again, except when you are about to distribute to others. In your own work, after calling maintain.packages and library in an R session, you can modify, add and delete functions, datasets, and documentation in your package via the standard functions fixr, move, and rm.pkg (or directly), and these changes will mostly be immediately manifested in the loaded package within your R session– this is "live editing". The changes are made first to the in-memory task package, which will be called e.g. ..Splendid, and then propagated to the loaded package. Don't try to manipulate the loaded package's namespace directly. See maintain.packages for details.

To update the installed package (on disk), call patch.install( Splendid); this also calls pre.install to update the source package, updates the help system in the current session, and does a few other synchronizations. You need to call patch.install before quitting R to ensure that the changes are manifest in the loaded package the next time you start R; otherwise they will only exist in the in-memory task package, and won't be callable.

Troubleshooting

In rare cases, you may find that maintain.packages( Splendid) fails. If that happens, there won't be a ..Splendid environment, which means you can't fix whatever caused the load failure. The load failure is (invariably in my experience) caused by a hidden attempt to load a namespaced package, which is failing for yet another reason, usually something in its .onLoad; that package might or might not be "Splendid" itself. If you can work out what other package is trying to load itself– say badpack– you can temporarily get round the problem by making use of the character vector partial.namespaces, which lives in the "mvb.session.info" search environment, as follows:

  partial.namespaces <<- c( partial.namespaces, "badpack")

That will prevent execution of badpack:::.onLoad. Consequently badpack won't be properly loaded, but at least the task package will be loaded into ..Splendid, so that you can make a start on the problem. If you can't work out which package is causing the trouble, try

  partial.namespaces <<- "EVERY PACKAGE"

After that, no namespaced package will load properly, so remember to clear partial.namespaces <<- NULL before resuming normal service.

Occasionally (usually during patch.install), you might see R errors like "cannot allocate vector of size 4.8Gb". I think this happens when some internal cache gets out-of-synch. It doesn't seem to cause much damage to the installed package, but once it's happened in an R session, it tends to happen again. I usually quit & restart R.

You might also find find.lurking.envs useful, via eapply( ..Splendid, find.lurking.envs); this will show any functions (or other things) in ..Splendid that have accidentally acquired a non-standard environment such as a namespace, which can trigger a "hidden" package load attempt. The environment for all functions in ..Splendid should probably be .GlobalEnv; the environments in the loaded package will be different, of course.
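The kind of check involved can be illustrated with base R alone. This sketch does not call find.lurking.envs and is not a substitute for it; task_env, f_ok, f_lurking and is_lurking are hypothetical names, with task_env standing in for an in-memory task package such as ..Splendid.

```r
# Stand-in for an in-memory task package; all names here are hypothetical
f_ok <- function( x) x + 1
environment( f_ok) <- globalenv()
f_lurking <- function( x) x + 2
environment( f_lurking) <- asNamespace( "stats")  # accidentally tied to a namespace
task_env <- list2env( list( f_ok=f_ok, f_lurking=f_lurking, some_data=1:3))

# Flag functions whose environment is anything other than .GlobalEnv
is_lurking <- function( obj)
  is.function( obj) && !identical( environment( obj), globalenv())
suspects <- names( Filter( is_lurking, as.list( task_env)))
suspects # "f_lurking"
```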

It's rare to need to manually inspect either the source package or the installed package. But if you do, then spkg helps for the former, e.g. dir( spkg( mypack)); and system.file helps for the latter, e.g. system.file( package="mypack"), or system.file( file.path( "help", "AnIndex"), package="mypack").

Distributing and checking

build.pkg calls R CMD BUILD to create a "tarball" of the package (a ".tar.gz" file), which is the appropriate format for distribution to Unix folk and submission to CRAN. build.pkg.binary creates a binary package (a ".zip" file), suitable for Windows or Macs. check.pkg runs R CMD CHECK (but see next paragraph for a quicker alternative), which is required by CRAN and sometimes useful at other times. These .pkg functions are pretty simple wrappers to the R CMD tools with similar names. However, for those with imperfect memories and limited time, there are enough arcane and mutable nuances with the "raw" R CMD commands (including the risk of inadvertently deleting existing installations) to make the wrappers in mvbutils useful.

You might find check.pkg reporting NOTEs along the lines of "no visible global...", especially for variables mentioned in evalq, local, mlocal, or cq. (It's not really an mvbutils thing since evalq and local are base-R, but I mention it here to show the workaround.) This is harmless unless you are intent on submitting your package to CRAN, who will promptly reject it for the time-honoured rationale of "because I say so". If so, see make_CRANtidote for the workaround.

Various functions in the tools package can be used to quickly check specific aspects of an installed package, without needing a full-on, and slow, R CMD check. In particular, I sometimes use

  codoc( spkg( mypack)) # also spkg( "mypack"), spkg( ..mypack)
  undoc( spkg( ..mypack))

Nothing is printed unless a problem is found, so a blank result is good news! It's also possible to run other tools such as checkTnF and checkFF similarly.

By default, mvbutils adds code to the source package to circumvent the CRAN checks for "no visible function/binding", which I consider to be a waste of time; for example, unless circumvented they generate 338 false positives for package mvbutils. If for some reason you actually want these checks, see "Overriding defaults" in pre.install.

Folders and different R versions

Life can get complicated when there are several versions of R around, particularly when they require different package formats at source or build or install time (eg R 2.10, 2.12, R 3.0). install.pkg etc do their best to simplify this for you. You won't normally need to know the details unless you are trying to maintain several versions of your package for different versions of R for distribution to other people who use those different R versions. But if you do need to know the details, then the default folder structure is as follows. If the task package lives in folder "mypack", then the source package is created by pre.install in "mypack/mypack", and the built package(s) will go into folders such as "mypack/R2.15" depending on what R version is running.

Note that your task package can only ever have one version; if different behaviour is required for different R versions, then you need to code this up in your functions, or via some trickery in .onLoad.

Built packages

Building comes first: the tarballed/zipped packages from build.pkg and build.pkg.binary are placed in a folder parallel to the source package, with a name of the form "Rx.y". mvbutils tries to be sensible about what "x.y" should be. It will never be newer than the running R version. It will never be older than the most recent major R version that required mandatory package rebuilds (eg R 3.0 and R 2.12). If one or more folders already exist that satisfy those properties, the highest-numbered one will be used. If not, a new folder will be created with the current R major version (eg R 2.15.3 will trigger a folder "R2.15"). You can create your own "Rx.y" folder, for instance if the current version of your package requires an R feature only found in R version "x.y". Also, mvbutils knows which R versions change the format of built packages, and will create a new folder for such a version if required.
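
The folder-choosing rules above can be sketched in a few lines of base R. This is only an illustration of the stated rules, not the code that mvbutils actually runs; the function name pick_build_folder and its last_rebuild argument (the most recent R version that forced package rebuilds) are hypothetical.

```r
# Hypothetical helper, NOT the mvbutils implementation
pick_build_folder <- function( existing, running=getRversion(), last_rebuild='3.0') {
  running <- numeric_version( as.character( running))
  # Keep only major.minor, eg 2.15.3 -> 2.15
  running_xy <- numeric_version( paste( unlist( running)[ 1:2], collapse='.'))
  vers <- numeric_version( sub( '^R', '', existing))

  # Candidates: not newer than running R, not older than the last forced rebuild
  ok <- (vers <= running_xy) & (vers >= numeric_version( last_rebuild))
  if( any( ok)) {
    paste0( 'R', as.character( max( vers[ ok])))  # highest-numbered suitable folder
  } else {
    paste0( 'R', as.character( running_xy))       # otherwise, a new folder
  }
}

pick_build_folder( c( 'R2.12', 'R2.15', 'R3.0'), running='2.15.3', last_rebuild='2.12')
pick_build_folder( 'R2.12', running='2.15.3', last_rebuild='2.12') # keeps using R2.12
pick_build_folder( 'R2.12', running='3.0.1', last_rebuild='3.0')   # new folder needed
```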

The default behaviour is therefore that build.pkg<.binary> will keep building into the same folder. For example, if at some point a "mypack/R2.12" folder was created, then that's where all builds will be sent regardless of the running R version, until you either manually create a "mypack/Rx.y" folder that's closer to the running R version, or the latter hits 3.0 which automatically triggers the creation of a new "mypack/R3.0" folder. Thanks to the "autoversion" feature of pre.install, the version number of the build will change whenever <pre/patch>.install is used. (Note that old built packages are not removed until/unless you explicitly call cull.old.builds, although it's "good housekeeping" to do the latter occasionally.) By manually creating a new "Rx.y" folder when necessary, you can ensure that there won't be any updates to built packages for R older than "x.y", which gives a kind of "checkpoint" feature; your built packages for older versions of R (ie for distribution to users of those older R versions) won't be accidentally zapped by cull.old.builds housekeeping, and you can be sure that old code running under old versions of R will still work.

What this does not let you do easily is use your current R version to create updated versions of your package for R versions that pre-date the most up-to-date "Rx.y" folder. For example, if you are running R3.0, there is guaranteed to be an "R3.0" folder, so calling build.pkg<.binary> won't build new packages in an "R2.15" folder. Again, usually this doesn't matter, because new "Rx.y" folders are only rarely created automatically, so builds will tend to stay in the same folder and the newest version will be accessible to all. But sometimes it is a hassle... Nevertheless, I have managed to maintain parallel versions of my packages across the R2.15-R3.0 change, by (sequentially) running two R versions and calling build.pkg<.binary> from each. (Note that build.pkg<.binary> can only build in the format of the running R version; you can't "cross-build" for different built formats from the same R session.)

Source packages

R occasionally demands a change in source package format, as opposed to built package format (as with R 3.0). (IIRC one example is R 2.10, with the change in helpfile format.) Then you face the problem of how to keep several source packages. This can be controlled by options("mvbutils.sourcepkgdir.postfix"), which is appended to the name of the folder where your source package will be created and used for building or installing. The default is the empty string "", so that the default source package folder for "mypack" is "mypack/mypack". To allow for multiple source package versions, you could put something like this in your .First or ".Rprofile":

  if( getRversion() >= numeric_version( '4.0')) {
    # New source package format
    options( mvbutils.sourcepkgdir.postfix='[R4]')
  }

Everything should then work automatically; all source-package operations will refer to "mypack/mypack[R4]" if you are running version 4 or above, or to "mypack/mypack" if you are running an earlier R version, and you should never really need to know the source package foldername yourself (build.pkg etc do it all for you). This depends on you setting the option yourself, and has not been tested yet. Eventually I may hardwire the feature automatically into mvbutils (or is it better for each source package to go into an appropriate built-package folder? but that sounds a bit like version hell).
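
To make the naming rule concrete, here is a small sketch of how the source-package folder name follows from the option. The helper source_pkg_dir is hypothetical and exists only for illustration; mvbutils does the equivalent internally, and you should not normally need it.

```r
# Hypothetical helper; "mypack" is a placeholder task folder
source_pkg_dir <- function( task_dir,
    postfix=getOption( 'mvbutils.sourcepkgdir.postfix', '')) {
  # Source package lives in a subfolder named after the task, plus the postfix
  file.path( task_dir, paste0( basename( task_dir), postfix))
}

old_style <- source_pkg_dir( 'mypack', postfix='')      # "mypack/mypack"
new_style <- source_pkg_dir( 'mypack', postfix='[R4]')  # "mypack/mypack[R4]"
```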

Customizing package creation

You can customize many aspects of the mvbutils package-creation process, by adding a function pre.install.hook.Splendid to your package. See pre.install for further details.

Miscellaneous utilities

Description

Miscellaneous utilities.

Usage

add_list_defaults( l, ...)
as.cat( x)
atts( x, exclude=cq( levels, class, dim, dimnames, names, row.names, tsp))
clamp( x, min, max)
clip( x, n=1)
compacto( x, gap, width, extra)
cq( ...)
deparse.names.parsably( x)
disatt( x, keep_=cq( levels, dim, dimnames, names, row.names, tsp), keep)
eclone( env)
empty.data.frame( ...)
env.name.string( env)
expanded.call( nlocal=sys.parent())
everyth( x, by=1, from=1)
find.funs(pos=1, ..., exclude.mcache = TRUE, mode="function")
find.scriptlets(pos=1, ..., exclude.mcache = TRUE, pattern='[.][rR]$')
find.lurking.envs(obj, delve=FALSE, trace=FALSE)
index( lvector)
integ(expr, lo, hi, what = "x", ..., args.to.integrate = list())
inv.logit( qq)
is.dir( dir)
isF( x)
isT( x)
legal.filename( name)
logit( x)
lsall( ...)
masked( pos)
masking( pos=1)
mkdir( dirlist)
most.recent( lvec)
mwhere( x, cond)
my.all.equal( x, y, ...)
named( x)
nscat( fmt, ..., sep='\n', file='')
nscatn( fmt, ..., sep='\n', file='')
option.or.default( opt.name, default=NULL)
pos( substrs, mainstrs, any.case = FALSE, names.for.output)
put.in.session( ...)
rename.els( ..., ignore.missing=FALSE)
returnList( ...)
rsample( n=length(pop), pop, replace=FALSE, prob=NULL)
safe.rbind( df1, df2) # Deprecated in 2013
scatn( fmt, ..., sep='\n', file='', append=FALSE)
sourceable( f, fname=deparse1( substitute( f)))
sqr( x)
to.regexpr( x)
undent( s)
xfactor( x, exclude=if( is.factor( x) && any( is.na( levels( x)))) NULL else NA)
xgsub( x, pattern, replacement, perl=!fixed, fixed=FALSE, ...)
xsub( x, pattern, replacement, perl=!fixed, fixed=FALSE, ...)
yes.no( prompt, default)

Arguments

l, x, y, n, gap, width, extra, ..., by, keep, keep_, env, from, exclude, exclude.mcache, nlocal, lvector, dir, name, pos, frame, mode, dirlist, lvec, cond, opt.name, default, substrs, mainstrs, any.case, names.for.output, ignore.missing, pop, replace, prob, df1, df2, prompt, obj, delve, trace, fmt, sep, append, file, expr, lo, hi, what, args.to.integrate, qq, s, min, max, pattern, replacement, perl, fixed, f, fname

see "Arguments by function"

Details

add_list_defaults appends its ... argument(s) to its l argument, excluding those where l already has an element with that name. l should be a list.
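
For readers who want to see the rule spelled out, the following base-R sketch behaves as just described. The name add_defaults_sketch is hypothetical, and the real function may differ in details such as the handling of unnamed arguments.

```r
# Hypothetical base-R equivalent, for illustration only
add_defaults_sketch <- function( l, ...) {
  defaults <- list( ...)
  # Append only those defaults whose names are not already in l
  c( l, defaults[ setdiff( names( defaults), names( l))])
}

ll <- list( A='cat', B=c( 'dog', 'goldfish'), C='funnelweb')
res <- add_defaults_sketch( ll, B='rabbit', D='anthrax')
names( res)  # B is kept as it was; D is appended
```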

as.cat makes a character vector print as if it was catted rather than printed (one element per line, no extra quotes or backslashes, no [1] etc prefixes).

atts returns the names of the attributes of x, excluding any that are in exclude.

clamp clamps its 1st argument to the limits specified by the 2nd and 3rd. You can also just supply a range of values in the 2nd arg, and leave the 3rd missing. It's meant for use with pipes; see Examples.

clip removes the last n elements of x. OBSOLETE— use head( x, -n) instead.

compacto gives a matrix an extra S3 class "compacto", which means it will print out with column names/label vertical and optionally no gaps between the columns. gap and width control the latter in fairly obvious ways. extra controls what gets printed to help the eye follow vertical alignment. See Examples; there is a method print.compacto which surely needs little further description.

cq is handy for typing cq( alpha, beta, gamma) instead of c( "alpha", "beta", "gamma"). Certain strings DO still require quotes around them, e.g. cq( "NULL", "1-2").

deparse.names.parsably is like deparse except that name objects get wrapped in a call to as.name, so that they won't be evaluated accidentally.

disatt gets rid of most attributes on x. If you want to preserve some, use keep. The usually-default argument keep_, which is merged with keep, ensures that the "basic" attributes are retained; if you want to drop some of those too, you will have to modify keep_. Note that S3 class is dropped by default, because some S3 objects may not make sense without certain attributes.

eclone clones an environment into a new one with the same parent, ie making deep copies of all the (non-environment) members, so that changing their values in the new env won't affect the original values (unlike if you just assign the old env to the new one). Functions whose environment was the original environment, will have their environment reset to the new one. If you don't understand that, then either don't worry be happy, or do more homework on R's environment objects. See also Examples.

empty.data.frame creates a template data frame with 0 rows but with all columns of the appropriate type. Useful for rbinding to later.

env.name.string returns a string naming an environment; its name attribute if there is one, or the name of its path attribute if applicable, concatenated with the first line of what would be shown if you printed the argument. Unlike environmentName, this will always return a non-empty string.

expanded.call returns the full argument list available to its caller, including defaults where arguments were not set explicitly. The arguments may not be those originally passed, if they were modified before the invocation of expanded.call. Default arguments which depend on calculations after the invocation of expanded.call will lead to an error.

everyth extracts every by-th element of x, starting at position from.

find.funs finds "function" objects (or objects of other modes, via the "mode" arg) in one or more environments, optionally matching a pattern.

find.scriptlets is like find.funs but looks for character vectors whose name suggests that they are a "scriptlet" (ie text runnable with eval(parse(text=<scriptlet>)), as per fixr, and suitable for mrun or mdrun in the debug package).

find.lurking.envs( myobj) will search through myobj and all its attributes, returning the size of each sub-object. The size of environments is returned as Inf. The search is completely recursive, except for environments and by default the inner workings of functions; attributes of the entire function are always recursed. Changing the delve parameter to TRUE ensures full recursion of function arguments and function bodies, which will show e.g. the srcref structure; try it to see why the default is FALSE. find.lurking.envs can be very useful for working out e.g. why the result of a model-fitting function is taking up 1000000MB of disk space; sometimes this is due to unnecessary environments in well-concealed places.
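
The base-R sketch below shows the kind of problem that find.lurking.envs is meant to expose. It does not call find.lurking.envs itself; make_fit is a made-up stand-in for a model-fitting function whose result quietly keeps a large workspace alive via a formula's environment.

```r
# Made-up "fitting function": returns a small list containing a formula
make_fit <- function() {
  big <- numeric( 1e6)   # about 8MB of scratch data, never returned explicitly
  list( form=y ~ x)      # ...but the formula remembers the frame it was made in
}

fit <- make_fit()
object.size( fit)                                  # small: environments not counted
has_big <- exists( 'big', envir=environment( fit$form)) # TRUE: 'big' is lurking
saved_bytes <- length( serialize( fit, NULL))      # what would be written by save()
saved_bytes
```

The serialized size is dominated by the hidden environment, which is why the saved object is far bigger than object.size suggests.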

index returns the position(s) of TRUE elements. Unlike which: attributes are lost; NA elements map to NAs; index(<<length 0 object>>) is numeric(0); index( <<non-logical>>) is NA.

integ is a handy wrapper for integrate, that takes an expression rather than a function— so integ( sin(x), 0, 1) "just works".

is.dir tests for directoriness.

isF and isT test a logical scalar in the obvious way, with NA (and non-logicals) failing the test, to avoid teeeedious repetition of is( !is.na( my.complicated.expression) & my.complicated.expression) .... They are deliberately not vectorized (contrary to some versions of mvbutils documentation); arguments with non-1 length trigger a warning.

legal.filename coerces its character argument into a similar-looking string that is a legal filename on any (?) system.

logit and inv.logit apply those transformations (for those of us who can never remember what the stats package versions are called).

lsall is like ls but coerces all.names=TRUE.

masked checks which objects in search()[pos] are masked by identically-named objects higher in the search path. masking checks for objects that mask identically-named objects lower in the search path. Namespaces may make the results irrelevant.

mkdir makes directories; unlike dir.create, it can do several levels at once.

most.recent returns the highest-so-far position of TRUE within a logical vector, or 0 if TRUE has not occurred yet; most.recent( c(F,T,F,T)) returns c(0,2,2,4).
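
One way to get the same answer in base R, shown here only to clarify what is being computed, is a running maximum over the positions of the TRUE elements. The name most_recent_sketch is hypothetical, and this sketch makes no attempt to handle NA.

```r
# Hypothetical base-R equivalent; assumes lvec has no NAs
most_recent_sketch <- function( lvec) cummax( seq_along( lvec) * lvec)

mr <- most_recent_sketch( c( FALSE, TRUE, FALSE, TRUE))
mr  # 0 2 2 4
```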

mwhere subsets a data.frame by row, just like %where% (qv); it's for use in pipes, as per Examples.

my.all.equal is like all.equal, except that it returns FALSE in cases where all.equal returns a non-logical-mode result.
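
The reason this matters is that all.equal returns a character description, not FALSE, when its arguments differ, so it cannot safely be used directly inside if. The base-R idiom below gives the same kind of always-logical answer; whether my.all.equal is implemented this way is an assumption.

```r
x <- 1; y <- 1 + 1e-10; z <- 2
raw <- all.equal( x, z)            # a character string describing the difference
same_xy <- isTRUE( all.equal( x, y))  # TRUE: equal within tolerance
same_xz <- isTRUE( all.equal( x, z))  # FALSE, rather than a string
```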

named(x) is just names(x) <- as.character( x); x; useful for lapply etc.
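
Spelled out as a function, that one-liner looks like the sketch below (named_sketch is a hypothetical name). The payoff is with lapply, which otherwise returns an unnamed list.

```r
# Hypothetical spelling-out of the one-liner above
named_sketch <- function( x) { names( x) <- as.character( x); x }

res <- lapply( named_sketch( c( 'alpha', 'beta')), nchar)
names( res)  # "alpha" "beta"; without the naming step, names( res) would be NULL
```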

nscat, nscatn: see scatn

option.or.default is obsolete; use the equivalent getOption() instead.

pos is probably to be eschewed in new code, in favour of gregexpr with fixed=TRUE, which is likely faster. (And I should rewrite it to use gregexpr.) It's one of a few legacy functions in mvbutils that pre-date improvements in base R. pos will either search for several literal patterns in a single target, or vice versa– but not both. It returns a matrix showing the positions of the matching substrings, with as many columns as the maximum number of matches. 0 signifies "no match"; there is always at least one column even if there are no matches at all.

rename.els replaces specified names of a vector with new ones.

returnList returns a list corresponding to old-style (pre-R 1.8) return syntax. Briefly: a single argument is returned as itself. Multiple arguments are returned in a list. The names of that list are the argument names if provided; or, for any unnamed argument that is just a symbolic name, that symbolic name; or no name at all, for other unnamed arguments. You can duplicate pre-1.8 behaviour of return(...) via return(returnList(...)).

rsample draws n random samples from pop, according to replace and prob. It is like R's built-in sample but avoids the latter's inconsistent syntax, instead using a syntax similar to all the other r... random variable functions.

safe.rbind ( Deprecated in 2013 ) mimics rbind, but works round an R bug (I reckon) where a column appears to be a numeric in one data.frame but a factor in the other. But I now think you should just sort your column classes/types properly in advance, rather than mixing types and relying on somewhat arbitrary conversion rules.

scatn is just cat( sprintf( fmt, ...), "", file=file, sep=sep). scatn prints a newline afterwards, but not before; nscat does the opposite; nscatn does both. If you're just displaying a "title" before calling print, use nscat.
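
Taking the definition of scatn above at face value, the following sketch shows what reaches the console. scatn_sketch is a hypothetical copy written from that one-line definition (without the append argument), and capture.output is used only so that the result can be inspected.

```r
# Hypothetical copy of the one-line definition given above
scatn_sketch <- function( fmt, ..., sep='\n', file='') {
  # The trailing '' is what produces the final newline
  cat( sprintf( fmt, ...), '', file=file, sep=sep)
}

out <- capture.output( scatn_sketch( 'Things %i', 1:3))
out  # one formatted line per element of 1:3
```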

sqr squares its argument (i.e. multiplies the argument by itself), without the risk that x^2 might incur exponentiation.

to.regexpr converts literal strings to their equivalent regexps, e.g. by doubling backslashes. Useful if you want "fixed=TRUE" to apply only to a portion of your regexp.

undent is handy when you want a slab of multi-line text inside some function you are writing. Raw-string syntax helps a lot (see the final examples of ?Quotes), but indentation is horrible and the first line is out-of-step with the rest. You ideally want your text to appear indented at whatever looks nice inside your code, but for the actual string not to be indented. So, start your raw string with a newline, and wrap the string in undent, and all will be well.

sourceable takes a function and returns a character vector which, when printed with print or writeLines, will probably be amenable to source. Unlike deparse, it keeps the original source text, including comments. It will strip the "<environment:gsd907897gsd>" and "<bytecode:097a0sdg>" verbiage which otherwise often prevents source from working, and which frequently annoy me. But it will try to keep other attributes, such as useful "constants" accessed from within the function's code via eg environment(sys.function())$<usefulconst>. sourceable may be more generally useful than the similar write.sourceable.function because the latter is geared up to dealing directly with the mvbutils function-documentation system, and also actually writes to a file— whereas sourceable returns a character vector (of class cat) which it's up to you to write or whatever.

xfactor either turns a non-factor x into a factor, honouring the exclude argument of factor; or, with a factor x, maps any NA levels to a non-NA level with label "\001" (ASCII 1). ICNR why :)

xsub and xgsub are for pipes. They are just like sub and gsub, except that the x argument comes first, and that there is a default of perl=TRUE (unless you set fixed=TRUE). So you can write eg str |> xsub( "old", "new") rather than str |> sub( "old", "new", x=_) or sub( "old", "new", str). It's just better.
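
A minimal sketch of the idea, assuming nothing more than the argument reordering described above (xsub_sketch is a hypothetical name, and the native pipe needs R 4.1 or later):

```r
# Hypothetical re-implementation: same as sub, but with x first and perl by default
xsub_sketch <- function( x, pattern, replacement, perl=!fixed, fixed=FALSE, ...) {
  sub( pattern, replacement, x, perl=perl, fixed=fixed, ...)
}

longstring <- 'Bollocks, then the good stuff, then more bollocks'
res <- longstring |>
  xsub_sketch( ',[^,]*$', '') |>   # drop everything from the last comma
  xsub_sketch( '.*, *', '')        # drop everything up to the remaining comma
res
```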

yes.no cats its "prompt" argument and waits for user input. If the user input pmatches "yes" or "YES", then yes.no returns TRUE; if the input pmatches "no" or "NO" then yes.no returns FALSE; if the input is "" and default is set, then yes.no returns default; otherwise it repeats the question. You probably want to put a space at the end of prompt.

Value

as.cat

character vector of class cat

clip

vector of the same mode as x

cq

character vector

empty.data.frame

data.frame

env.name.string

a string

expanded.call

a call object

everyth

same type as x

find.funs

a character vector of function names

find.scriptlets

a character vector of scriptlet names

find.lurking.envs

a data.frame with columns "what" and "size"

integ

scalar

inv.logit

numeric vector

is.dir

logical vector

is.nonzero

TRUE or FALSE

isF, isT

TRUE or FALSE

legal.filename

character( 1)

logit

numeric vector

masked

character vector

masking

character vector

mclip

possibly-modified version of x

mkdir

logical vector of success/failure

nscat

NULL

nscatn

NULL

most.recent

integer vector the same length as lvec, with values in the range (0,length(lvec)).

named

vector of the same mode as x

option.or.default

option's value

pos

numeric matrix, one column per match found plus one; at least one column guaranteed

rename.els

whatever the first argument was, with new names

returnList

list or single object

rsample

vector of same type as pop

safe.rbind

data.frame

scatn

NULL

sourceable

character (class cat to print nicely)

sqr

as per input

to.regexpr

character

undent

string

xgsub

character

xsub

character

yes.no

TRUE or FALSE

Arguments by function

add_list_defaults: l: a list. ...: name-value pairs that act as defaults for l if it doesn't already contain elements with those names.
as.cat: x: character vector that you want to be displayed via cat( x, sep="\n")
atts: x: any object; exclude: a character vector of whatever quotidian attributes you are not interested in knowing about
clip: x: a vector or list
clip: n: integer saying how many elements to clip from the end of x
cq: ...: quoted or unquoted character strings, to be substituted and then concatenated
deparse.names.parsably: x: any object for deparse- name objects treated specially
eclone: env: an environment
empty.data.frame: ...: named length-1 vectors of appropriate mode, e.g. first.col=""
env.name.string: env: environment
expanded.call: nlocal: frame to retrieve arguments from. Normally, use the default; see mlocal.
everyth: x: subsettable thing. by: step between values to extract. from: first position.
find.funs, find.scriptlets: ...: extra arguments for objects. Usually just "pattern" for regexp searches.
find.funs, find.scriptlets: exclude.mcache: if TRUE (default), don't look at mlazy objects
find.funs: mode: "function" to look for functions, "environment" to look for environments, etc
find.scriptlets: pattern: regexp that scriptlet names should match.
find.lurking.envs: delve: whether to recurse into function arguments and function bodies
find.lurking.envs: trace: just a debugging aid– leave as FALSE
index: lvector: vector of TRUE/FALSE/NA
integ: expr: an expression; what: a string, the argument of expr to be integrated over; lo, hi: limits; ...: other variables to be set in the expression; args.to.integrate: a list of other things to pass to integrate
is.dir: dir: character vector of files to check existence and directoriness of.
isF, isT: x: anything, but meant to be a logical scalar
legal.filename: name: character string to be modified
find.funs: pos: list of environments, or vector of char or numeric positions in search path.
lsall: ...: as for ls, except that all.names will be coerced to TRUE
masking, masked: pos: position in search path
mclip: x: thing to be clipped (usually numeric, but character should work)— dimensions and other attributes are preserved; min, max: clipping range
mkdir: dirlist: character vector of directories to create
most.recent: lvec: logical vector
my.all.equal: x, y: anything; ...: passed to all.equal
named: x: character vector which will become its own names attribute
nscat, nscatn: see scatn
option.or.default: opt.name: character(1)
option.or.default: default: value to be returned if there is no option called "opt.name"
pos: substrs: character vector of patterns (literal not regexpr)
pos: mainstrs: character vector to search for substrs in.
pos: any.case: logical- ignore case?
pos: names.for.output: character vector to label rows of output matrix; optional
put.in.session: ...: a named set of objects, to be assigned into the mvb.session.info search environment
rename.els: ...: the first argument is the thing to rename. Subsequent args like X=<some expr giving a string result> mean that whichever element of the first arg was called "X", will now be called the result of that expression. ignore.missing=TRUE means that requests to rename non-existent elements will be ignored; otherwise, they will throw an error.
returnList: ...: named or un-named arguments, just as for return before R 1.8.
rsample: n: number to draw; pop: values they can take; replace: whether to sample with replacement; prob: weights (must be same length as pop)
safe.rbind: df1, df2: data.frame or list
scatn, nscat: fmt, ...: as per sprintf; file, sep, append: as per cat
sourceable: f: an actual function object
sourceable: fname: what name should the function be assigned to, if the result is fed to source? Default is the name of f itself, which is usually fine.
sqr: x: anything for which * is a valid op.
to.regexpr: x: character vector
undent: s: string, presumably a "raw string".
xfactor: x: a factor, or something to be turned into one; exclude: as per factor
xgsub: x, pattern, replacement, perl=!fixed, fixed= FALSE, ...: as per gsub
xsub: as per xgsub
yes.no: prompt: string to put before asking for input
yes.no: default: value to return if user just presses <ENTER>

Author(s)

Mark Bravington

Examples

# add_list_defaults
ll <- list( A='cat', B=c('dog','goldfish'), C='funnelweb')
add_list_defaults( ll, B='rabbit', D='anthrax')
# B does not change, but D is added
# as.cat
ugly.bugly <- c( 'A rose by any other name', 'would annoy taxonomists')
ugly.bugly
#[1] "A rose by any other name"                 "would annoy taxonomists"
as.cat( ugly.bugly) # calls print.cat--- no clutter
#A rose by any other name
#would annoy taxonomists
x <- structure( matrix( 1:4, 2, 2), baggage='purple suitcase')
atts( x) # will not print "dim" since that is in default 'exclude' list
#[1] "baggage"
1:7 |> clamp( 2, 4)
#[1] 2 2 3 4 4 4 4
1:7 |> clamp( 2:4)
#[1] 2 2 3 4 4 4 4
clip( 1:5, 2) # 1:3
cq( alpha, beta) # c( "alpha", "beta")
x <- matrix( 1:4, 2, 2)
compacto( x)
compacto( x, extra='|', width=3) # similar to gap... yet different
colnames( x) <- c( 'Gogol', 'Turgenev')
compacto( x)
x <- 6
attr( x, 'massive') <- 1:1e3 # not that massive; used to have 1e5, but
# ... CRAN checks prints bloody everything!
x
disatt( x)
old_env <- new.env()
evalq( envir=old_env, {
  x <- 3
  fun <- function() x
})
new_env <- eclone( old_env)
new_env$x <- 5
new_env$fun() # 5
lazy_env <- old_env
lazy_env$x <- 4
old_env$x # 4 ! Take care with environments...
old_env$fun() # 4 of course
new_env$x # 5 phew
empty.data.frame( a=1, b="yes")
# data.frame with 0 rows of columns "a" (numeric) and "b" (character)
empty.data.frame( a=1, b=factor( c( "yes", "no")))$b
# factor with levels c( "no", "yes")
everyth( 1:10, 3, 5) # c( 5, 8)
f <- function( a=9, b) expanded.call(); f( 3, 4) # list( a=3, b=4)
find.funs( "package:base", patt="an") # "transform" etc.
e <- new.env()
e$myscript.R <- as.cat( string2charvec( r"{
  # raw strings are great!
  dir()
}"))
find.scriptlets( e) # "myscript.R"
find.lurking.envs( cd)
#                                     what  size
#1                     attr(obj, "source")  5368
#2                                     obj 49556
#3 environment(obj) <: namespace:mvbutils>   Inf
## Don't run:
eapply( .GlobalEnv, find.lurking.envs)
## End don't run
integ( sin(x), 0, 1) # [1] 0.4597
integ( sin(x+a), a=5, 0, 1) # [1] -0.6765; 'a' is "passed" to 'expr'
integ( sin(y+a), what='y', 0, 1, a=0) # [1] 0.4597; arg is 'y' not 'x'
is.dir( getwd()) # TRUE
isF( FALSE) # TRUE
isF( NA) # FALSE
isF( c( FALSE, FALSE)) # FALSE, with a warning
sapply( c( FALSE, NA, TRUE), isF)
# [1]  TRUE FALSE FALSE
sapply( c( FALSE, NA, TRUE), isT)
# [1] FALSE FALSE  TRUE
legal.filename( "a:b\\c/d&f") # "a.b.c.d&f"
most.recent( c( FALSE,TRUE,FALSE,TRUE)) # c( 0, 2, 2, 4)
# mwhere for subsetting: find vowels whose alphabetic position is a multiple of 5
df <- data.frame( x=1:10, y=LETTERS[ 1:10])
# Base-R pipes may not exist for the R version being used here
# So, try to parse the expression first...
pp <- try( parse( text=
 'df |> mwhere( x %% 5 == 0) |> mwhere( y %in% cq( A,E,I,O,U))'
))
if( pp %is.not.a% 'try-error') eval( pp[[1]]) # just E-row
rsample( 9, LETTERS[1:3], replace=TRUE)
sapply( named( cq( alpha, beta)), nchar)  # c( alpha=5, beta=4)
pos( cq( quick, lazy), "the quick brown fox jumped over the lazy dog")
# matrix( c( 5, 37), nrow=2)
pos( "quick", c( "first quick", "second quick quick", "third"))
# matrix( c( 7,8,0, 0,14,0), nrow=3)
pos( "quick", "slow") # matrix( 0)
x <- c( Cat='good', Dog='bad')
rename.els( x, Cat='Armadillo')
# Armadillo       Dog
#    "good"     "bad"
try( rename.els( x, Zorilla='Bandicoot'))
# Error in rename.els(x, Zorilla = "Bandicoot") : all(present) is not TRUE
rename.els( x, Zorilla='Bandicoot', ignore.missing=TRUE)
#    Cat    Dog
# "good"  "bad"
f <- function() { a <- 9; return( returnList( a, a*a, a2=a+a)) }
f() # list( a=9, 81, a2=18)
scatn( 'Things %i', 1:3)
nscat( 'Things %i', 1:3)
nscatn( 'Things %i', 1:3)
to.regexpr( "a{{") # "a\\{\\{"
    test <- undent( r"--{
        I can indent this
          howsoever I like.
        New paragraph!
      }--")
as.cat( test)
glurp <- function( x) const + x
attr( glurp, 'const') <- 44
sourceable( glurp)
# To avoid an intermediate file, use
# ... eval(parse(text=<>))) in place of source
oglurp <- glurp
glurp <- eval( parse( text= sourceable( glurp))) # will overwrite klunge
identical( glurp, oglurp) # yes
longstring <- 'Bollocks, then the good stuff, then more bollocks'
longstring |> xsub( ',[^,]*$', '') |> xsub( '.*, *', '')
# "then the good stuff"
## Don't run:
# and i mean REALLY don't, so stop crazy Craniac bypasses...
if( FALSE){ mkdir( "subdirectory.of.getwd")}
if( interactive()) yes.no( "OK (Y/N)? ")
masking( 1)
masked( 5)
## End don't run

Arbitrary-level retrieval from and modification of recursive objects

Description

As of R 2.12, you probably don't need these at all. But, in case you do: my.index and my.index.assign are designed to replace [[ and [[<- within a function, to allow arbitrary-depth access into any recursive object. In order to avoid conflicts with system usage and/or slowdowns, it is wise to do this only inside a function definition where they are needed. A zero-length index returns the entire object, which I think is more sensible than the default behaviour (chuck a tanty). my.index.exists tests whether the indexed element actually exists. Note that these functions were written in 2001; since then, base-R has extended the default behaviour of [[ etc for recursive objects, so that my.index( thing, 1, 3, 5) can sometimes be achieved just by thing[[c(1,3,5)]] with the system version of [[. However, at least as of R 2.10.1, the system versions still have limited "recursability".

Usage

# Use them like this, inside a function definition:
# assign( "[[", my.index); var[[i]]
# assign( "[[<-", my.index.assign); var[[i]] <- value
my.index( var, ...) # not normally called by name
my.index.assign( var, ..., value) # not normally called by name
my.index.exists( i, var)

Arguments

var

a recursive object of any mode (not just list, but e.g. call too)

value

anything

...

one or more numeric index vectors, to be concatenated

i

numeric index vector

Details

Although R allows arbitrary-level access to lists, this does not (yet) extend to call objects or certain other language objects– hence these functions. They are written entirely in R, and are probably very slow as a result. Notwithstanding EXAMPLES below, it is unwise to replace system [[ and [[<- with these replacements at a global level, i.e. outside the body of a function– these replacements do not dispatch based on object class, for example.

Note that my.index and my.index.assign distort strict R syntax, by concatenating their ... arguments before lookup. Strictly speaking, R says that x[[2,1]] should extract one element from a matrix list; however, this doesn't really seem useful because the same result can always be achieved by x[2,1][[1]]. With my.index, x[[2,1]] is the same as x[[c(2,1)]]. The convenience of automatic concatenation seemed slightly preferable (at least when I wrote these, in 2001).
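
The retrieval half of this idea can be sketched in base R by applying ordinary [[ one level at a time. deep_get is a hypothetical name, and the sketch is not how my.index is written; it only shows the concatenate-then-descend behaviour, including the "zero-length index returns the whole object" convention.

```r
# Hypothetical helper: concatenate the indices, then descend one level per index
deep_get <- function( var, ...) {
  idx <- c( ...)   # NULL if no indices are given
  Reduce( function( obj, i) obj[[ i]], idx, var)
}

ff <- function() { a <- b + c }
got1 <- deep_get( body( ff), 2, 3, 2)     # the symbol b
got2 <- deep_get( body( ff), c( 2, 3), 2) # same thing: indices are concatenated
whole <- deep_get( body( ff))             # no index: the entire body
```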

my.index.exists checks whether var is "deep enough" for var[[i]] to work. Unlike the others, it does not automatically concatenate indices.

At present, there is no facility to use a mixture of character and numeric indexes, which you can in S+ via "list subscripting of lists".

Author(s)

Mark Bravington

Examples

local({
  assign( "[[", my.index)
  assign( "[[<-", my.index.assign)
  ff <- function() { a <- b + c }
  body( ff)[[2,3,2]] # as.name( "b")
  my.index.exists( c(2,3,2), body( ff)) # TRUE
  my.index.exists( c(2,3,2,1), body( ff)) # FALSE
  body( ff)[[2,3,2]] <- quote( ifelse( a>1,2,3))
  ff # function () { a <- ifelse(a > 1, 2, 3) + c }
  my.index.exists( c(2,3,2,1), body( ff)) # now TRUE
})

Prints a call object nicely

Description

Prints a call-mode object nicely, with one argument per line. This is useful, for example, in displaying readably the outcomes of sys.call(), which is often used to create a call attribute for the results of complicated functions.

Usage

noice( cc, ...)

Arguments

cc

a call object, eg something appended to a fitting result via sys.call.

...

any other arguments for deparse

Value

Character vector with one argument per line, of class as.cat so that it prints cleanly. Long arguments are truncated, so the result is not guaranteed to re-parse cleanly (a general issue with R which seems unavoidable in any powerful language).
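To show the general idea of "one argument per line", here is a minimal base-R sketch. It is only an approximation of what noice does: one_arg_per_line is a hypothetical name, the result is a plain character vector rather than class as.cat, and long arguments are not truncated.

```r
# Hypothetical helper, for illustration only (not the noice source).
one_arg_per_line <- function(cc, width.cutoff = 60L) {
  stopifnot(is.call(cc))
  fname <- paste(deparse(cc[[1]]), collapse = "")
  args <- as.list(cc)[-1]
  if (!length(args)) return(paste0(fname, "()"))
  nm <- names(args)
  if (is.null(nm)) nm <- rep("", length(args))
  # Deparse each argument separately; empty arguments (as in x[1, ])
  # are not handled in this sketch.
  body_lines <- vapply(seq_along(args), function(i) {
    txt <- paste(deparse(args[[i]], width.cutoff = width.cutoff),
                 collapse = " ")
    if (nzchar(nm[i])) paste0(nm[i], " = ", txt) else txt
  }, character(1))
  seps <- c(rep(",", length(args) - 1L), ")")
  c(paste0(fname, "("), paste0("  ", body_lines, seps))
}

cc <- quote(fit_model(y ~ x, data = mydata, weights = NULL, maxit = 100))
out <- one_arg_per_line(cc)
cat(out, sep = "\n")
```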

Examples

# This is a bona fide function call from my own work
# normally it would be evaluated directly, and sys.call()
# would be used inside it to assign a 'call' attrib to the result
# but the call attrib then looks like a mess-o-rama
# The quote() wrapper is just used here to make the point
# It would be interesting if 'call' could cope with a 'source' or
# 'srcref' argument, and would "know" how to print itself, but that
# is a big ask
# BTW, the 72-char limit in Rd EXAMPLES and USAGE is a PITBA
monster <- quote( est_N(
    popcompo = fp1a_17,
    df_rs_as_at_l = NULL,
    df_rs_ls = NULL, # NB comments are allowed, but get chucked
    newstyle_data = data17b,
    use_alpha_hsp = TRUE,
    AMIN = 8, AMAX = 30,
    YMIN = 2002, YMAX = 2014,
    prior_mean_z_plusgroup = 0.386,
    prior_sd_z_plusgroup = 0.0268,
    LMIN = 150, LMAX = 200,
    logit_surv_form = ~ I( pmax( age, 19)- AMAX) - 1,
    log_nsa_y1_form = ~factor(sex),
    log_nys_a1_reqm_form = ~0,
    logit_tresid_form = ~sex * I(len - 170),
    log_selbase_form = ~ 0,
    log_daily_reprodm_form = ~ 0,
    vb_form = ~sex,
    log_vb_cv_Linf_form = ~1,
    log_rct_re_var_start = log( sqr( 0.41)),
    fix_CV_R = TRUE,
    RE_rct = TRUE,
    sel_is_by_sex = TRUE,
    ssreduce_l = 1,
    fec_bout_start_fit = start_of_bout,
    fec_rest_start_fit = start_of_rest,
    fec_ovwt_fit = bfec,
    lf_sel_model=lv10kk5fix,
    nu_lata = 12))
monster # yuk
noice( monster) # yum

Economy numerical derivatives

Description

numvbderiv does simple two-point symmetric numerical differentiation of any function WRTO a vector parameter, via (f(x+delta)-f(x-delta))/(2*delta). Your function can return a vector/matrix/array of real or complex type, and if the x parameter is not scalar, then the result has one extra dimension at the end for the per-parameter-element derivatives.

For multi-parameter (length(x)>=4 or so) derivatives of slow functions, you can speed things up a lot with parallel processing, by setting PARALLEL=TRUE or (better) by directly calling numvbderiv_parallel. But, be aware there is substantial learning-curve-pain-cost to all this parallel shenanigans in R. numvbderiv_parallel uses the foreach package to diff wrto each component x[i] in parallel, using however many cores at a time you tell it to. You have to set up a "parallel cluster" beforehand in R. See Examples— it took me a long time to get this working, but now it's good.

numvbderiv is definitely "economy model" and for many many years I have kept it out of mvbutils, because it is not particularly accurate nor incredibly robust, and I didn't want to have to deal with people's questions! But I use numvbderiv and numvbderiv_parallel all the time in code that I want to share (sometimes with different names, omitting the "mv"), and in 2024 it just became too annoying to have to distribute them separately. So here they are, with nice new names, and tarted-up documentation that you are now enjoying, but still warts and all.

Faq

Q: Surely there are well-known methods to produce more accurate and robust numerical derivatives?
A: Yep.
Q: I want something more!
A: Then use something else!
Q: Oh well. But I guess numvbderiv is easy to use, right?
A: Yep.

As to accuracy: IME numvbderiv is usually fine, and computationally cheap! The relevant parameter is eps; to compute Df(x)/dx|x=x0 your function f is evaluated at x0+/-eps*x0 (unless x0==0 exactly, in which case it is at +/-eps). The bigger you go with eps, the less mathematically accurate the result, since the neglected higher-order terms are bigger; but if you go too small, then the answer becomes computationally inaccurate because of rounding etc. The default is crude but has usually worked OK for me, given this is not a high-accuracy routine. I sometimes play around with values between 1e-3 and 1e-7. If you're worried, try two different values that differ by an order-of-mag.

Unlike eg the numDeriv package or pracma::numderiv, which use more function evaluations at various step-sizes to account for higher-order terms in the finite-difference approximation, numvbderiv does not try to be very accurate, and you do have to specify the step yourself (see Arguments) or trust the default. Nevertheless, I expect my numvbderiv to be more accurate than the original (or still-current default) of stats::numericDeriv because the latter appears not to do symmetric calculation, based on the code in "Writing R Extensions" section 5.11. IE, it just does (f(x0+e)-f(x0))/e. [Update in 2024: AFAIK stats::numericDeriv never used to have a symmetric option, but it now appears to have added one via its central argument, although that defaults to FALSE :/ .] Also, stats::numericDeriv is pretty horrible to use TBH; AFAIK its main historical purpose was just to show how to interface C to R, not to actually differentiate stuff!
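To make the trade-offs in this Faq concrete, here is a minimal base-R sketch comparing a symmetric (two-point) difference with a one-sided one, and sweeping the relative step size. It is not the numvbderiv source; rel_step, central_diff and forward_diff are hypothetical names, and only scalar x0 is handled.

```r
# Hypothetical helpers, for illustration only; scalar x0.
# Relative step, falling back to an absolute step when x0 is exactly 0.
rel_step <- function(x0, eps) if (x0 == 0) eps else eps * x0

central_diff <- function(f, x0, eps = 1e-4) {
  delta <- rel_step(x0, eps)
  (f(x0 + delta) - f(x0 - delta)) / (2 * delta)
}

forward_diff <- function(f, x0, eps = 1e-4) {
  delta <- rel_step(x0, eps)
  (f(x0 + delta) - f(x0)) / delta
}

truth <- exp(1)   # exact derivative of exp(x) at x = 1
err_central <- abs(central_diff(exp, 1) - truth)
err_forward <- abs(forward_diff(exp, 1) - truth)
print(signif(c(central = err_central, forward = err_forward), 3))

# Step-size sweep for the symmetric version
eps_grid <- 10^(-(1:11))
err_by_eps <- vapply(eps_grid,
  function(e) abs(central_diff(exp, 1, eps = e) - truth), numeric(1))
print(data.frame(eps = eps_grid, abs_error = signif(err_by_eps, 3)))
```

In the sweep, the error should shrink as eps decreases from 0.1, then typically grow again for very small eps once rounding dominates; the exact turning point depends on the function and the machine.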

Usage

numvbderiv( f, x0, eps=0.0001, param.name=NULL, ...,
  SIMPLIFY=TRUE, PARALLEL=FALSE,
  TWICE.TO.GET.SENSIBLE.ANSWERS=TRUE)
numvbderiv_parallel(f, x0, eps = 0.0001, param.name = NULL, ...,
    SIMPLIFY = TRUE,  PARALLEL = TRUE,
    PROGRESS_BAR = interactive() && .Platform$OS.type!='unix',
    PROGRESS_BAR_FILE = "",  FOREACH_ARGS = list())

Arguments

f

function of one or more arguments

x0

value to numdiff around

eps

Relative step-size. Evaluation is at x0+/-eps*x0 unless x0==0 exactly, in which case it is at +/-eps. See Faq.

param.name

Unless the parameter you want to diff WRTO comes first in the argument-list of f, you need to specify its name, eg param.name="c" if your function is f(a,b,c) and you wanna diff WRTO the third one.

...

Other args that your f wants.

SIMPLIFY

If TRUE and f appears to return a "scalar-equivalent" result (eg all-but-one of its dimensions are of extent 1, as you can sometimes get eg from a matrix-multiply I guess if you use R's built-in routine), then this will turn the result into a pure vector. Avoids you getting tedious N*1 or 1*N "matrix" results that you then have to c() yourself.

PARALLEL

if FALSE, use the scalar version. If TRUE and length(x0)>1 and the foreach package is available and there is a "currently registered doPar backend" [sic], then parallel woop-woop magic will be used. numvbderiv/numvbderiv_parallel have defaults PARALLEL=FALSE/TRUE respectively.

FOREACH_ARGS

things to pass to foreach::foreach, eg .packages or perhaps .exports so your function can find stuff it needs when it is invoked in a new cold lonely R session.

PROGRESS_BAR

If you are bothering to use the parallel version, then presumably things are fairly slow; you can set PROGRESS_BAR=TRUE to see how it's going. I don't know if it works on Linux, coz it relies on flush.console, so the default there is FALSE, but you can give it a try.

PROGRESS_BAR_FILE

I use numvbderiv_parallel during interactive R sessions in RGui, and the default of appearing in the console seems ideal. For other uses, you might need to tell the progress bar to appear somewhere else, via this argument which is passed as the file argument of txtProgressBar.

TWICE.TO.GET.SENSIBLE.ANSWERS

Leave it alone!!! Not for you.

Details

The progress bar

The progress bar (parallel case only) uses a txtProgressBar and some excellent GitHub code from K Vasilopoulos. It's no good trying to get your own function to show its progress or call-count in the parallel case, because it will be executing in separate invisible R processes and messages don't get sent back, so this is the only convenient way. However, the nature of foreach means that this progress bar is only updated when a task finishes, and since all deriv-steps will take about the same time, you'll probably get the first 4 finishing all-at-once, so that progress will update in a very clunky fashion and if your parameter is of low dimension, the bar may not help. The numvbderiv_parallel code actually does try to update the progress-bar before the paralleling begins, immediately after the very first function call which is to f(x0) itself, so in principle you should "quickly" get some idea of how long it's all gonna take; but that update doesn't always seem to show up. Displaying the bar relies on a call to utils::flush.console (qv) so prolly doesn't work under Unix; maybe there's another way. Future versions of numvbderiv_parallel may let you supply your own progress-bar rather than forcing txtProgressBar. For now, be grateful for what you have been given.

Value

Normally, an array/matrix with same dimensions as f(x0) except for an extra one at the end, of length(x0). If SIMPLIFY=TRUE (see Arguments) and a pure vector "makes sense", then the dimensions will be stripped and you'll get a pure vector.
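The "extra dimension at the end" can be seen in a minimal base-R sketch for the simplest case, a vector-valued function of a vector parameter. This is an assumption-laden illustration, not the numvbderiv source: jacobian_sketch is a hypothetical name, and matrix/array-valued results and SIMPLIFY are not handled.

```r
# Hypothetical helper, for illustration only: vector-valued f,
# vector x0, symmetric difference taken one component at a time.
jacobian_sketch <- function(f, x0, eps = 1e-4) {
  f0 <- f(x0)
  out <- array(0, dim = c(length(f0), length(x0)))
  for (i in seq_along(x0)) {
    delta <- if (x0[i] == 0) eps else eps * x0[i]
    xp <- xm <- x0
    xp[i] <- x0[i] + delta
    xm[i] <- x0[i] - delta
    out[, i] <- (f(xp) - f(xm)) / (2 * delta)   # last dim indexes x0[i]
  }
  out
}

f <- function(x) c(x[1] * x[2], x[1]^2)
J <- jacobian_sketch(f, c(2, 3))
print(dim(J))
print(round(J, 6))
```

Here f(x0) has length 2 and x0 has length 2, so the result is 2 by 2, with column i holding the derivatives with respect to x0[i].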

Examples

# Complex numbers are OK:
numvbderiv( function( x) x*x, complex( real=1, imaginary=3))
# [1] 2+6i
# Parallel example...  the whole point is to show speed and generality
# Works fine on my machine
# But if testing under CRAN, which I normally never do,
# then CRAN's ludicrous 2-core limit, and deliberate inability to
# check CRANality (or even number of cores _allowed_) while running,
# makes this completely ridiculous
# Not for the first time
# I have used the function 'get_ncores_CRANal' to try to get round this...
if( require( 'doParallel')){ # auto loads foreach, iterators, parallel thx2 "Depends"
  ncores <- detectCores( logical=FALSE)
  scatn( '%i cores really found', ncores)
  if( ncores > 2 ){ # pointless otherwise
    # Need a slowish example. 1e5 is too small; 1e7 better,
    # ... but hard on auto builders eg R-universe
    BIGGOVAL <- 1e5
    slowfun <- function( pars, BIGGO)
      sum( sqr( 1+1/outer( seq_len( BIGGO), pars)))
    parstart <- rep( 2, 8)
    system.time(
      dscalar <- numvbderiv( slowfun, parstart,
          BIGGO=BIGGOVAL # named extra param (part of ...)
        )
    ) # scalar
    # Make "doPar back end". I do not know what I am doing ...
    # NB I like to leave some cores spare, hence "-1"--
    # superstition, really
    ncores_target <- min( ncores-1, length( parstart))
    # Anti CRANky: ignore on your own machine:
    # ncores_target should just work
    ncores_avail <- get_ncores_CRANal( ncores_target)
    scatn( 'Using %i cores eg cozza CRAN', ncores_avail)
    CLUSTO <- makeCluster( ncores_avail)
    registerDoParallel( CLUSTO, ncores_avail)
    # Next bit ensures slaves can find packages... sigh.
    # Necessary _here_ coz example, but you may not need it
    # clusterCall does not work properly :/, so the "obvious" fails:
    # clusterCall( CLUSTO, .libPaths, .libPaths())
    # Instead, we are forced into this nonsense:
    print( # for debuggery with as-CRAN
    eval( substitute(
        clusterEvalQ( CLUSTO, .libPaths( lb)),
        list( lb=.libPaths())))
    )
    # Need 'mvbutils::sqr', hence '.packages' arg
    scatn( 'Starting parallel time test')
    print( system.time(
      dpara <- numvbderiv_parallel( slowfun, parstart,
          BIGGO=BIGGOVAL, # named extra parameter
          FOREACH_ARGS=list( .packages= 'mvbutils')
        )
      )
    )
    scatn( 'Done')
    print( rbind( dscalar, dpara))
    # To refer to other data (ie beside params)
    # best practice is to put it into function's environment
    # (generally true, not just for numvbderiv)
    e <- new.env()
    e$paroffset <- c( 6, -3)
    fun2 <- function( pars) { # not a speed test, can be smaller
        sum( sqr( 1+1/outer( 1:1e3, pars+paroffset)))
      }
    environment( fun2) <- e
    scatn( 'Scalar, using extra data via environment')
    print( numvbderiv( fun2, parstart))
    # Parallel version should still work, coz function's environment
    # is also passed to slaves
    scatn( 'Trying parallel version...')
    print( try({
      numvbderiv_parallel( fun2, parstart,
          FOREACH_ARGS=list( .packages= 'mvbutils')
        )
      })
    )
    # Sometimes you do need to explicitly export stuff to the slave processes
    # Here's a version that will get paroffset from datenv
    # datenv must exist...
    alt_fun2 <- function( pars){
      environment( fun2) <- list2env( datenv)
      fun2( pars)
    }
    scatn( 'With explicit data (in parallel)')
    datenv <- as.list( e)
    print( numvbderiv_parallel( alt_fun2, parstart,
        FOREACH_ARGS=list(
          .packages= 'mvbutils',
          .export= cq( datenv, fun2) # stuff that alt_fun2 refers to
          )
      )
    )
    # Always tidy up your clusters once you have finished playing
    stopImplicitCluster()
    stopCluster( CLUSTO)
    rm( CLUSTO)
  } # if ncores>2
} # parallel

Update a source and/or installed package from a task package

Description

See mvbutils.packaging.tools before reading or experimenting!

pre.install creates a "source package" from a "task package", ready for first-time installation using install.pkg. You must have called maintain.packages( mypack) at some point in your R session before pre.install( mypack) etc.

patch.install is normally sufficient for subsequent maintenance of an already-installed package (ie you rarely need call install.pkg again). Again, maintain.packages must have been called earlier. It's also expected that the package has been loaded via library() before patch.install is called, but this may not be required. patch.install first calls pre.install and then modifies the installed package accordingly on-the-fly, so there is no need to re-load or re-build or re-install. patch.install also updates the help system with immediate effect, i.e. during the current R session. You don't need to call patch.install after every little maintenance change to your package during an R session; it's usually only necessary when (i) you want updated help, or (ii) you want to make the changes "permanent" (eg so they'll work in your next R session). However, it's not a problem to call patch.install quite often. patch.installed is a synonym for patch.install.

It's possible to tweak the source-package-creation process, and this is what "pre.install.hook..." is for; see Details and the section on Overriding defaults below.

spkg is a rarely-needed utility that returns the folder of source package created by pre.install.

Vignettes have to be built "manually" (but it's easy!), using vignette.pkg.

Usage

 # 95% of the time you just need:
 # pre.install( pkg)
 # patch.install( pkg)
 # Your own hook: pre.install.hook.<<mypack>>( default.list, <<myspecialargs>>, ...)
 pre.install(
     pkg,
     character.only= FALSE,
     force.all.docs= FALSE,
     rewrap.forcibly= TRUE,
     dir.above.source= "+",
     autoversion= getOption("mvb.autoversion", TRUE),
     click.version= TRUE,
     R.target.version= getRversion(),
     Roxygen= NULL,
     timeout_Roxygen= getOption( 'mvb.timeout_Roxgyen', 0), # seconds
     vignette.build= TRUE,
     silent= FALSE,
     ...)
 patch.installed(
     pkg,
     character.only= FALSE,
     force.all.docs= FALSE,
     rewrap.forcibly= TRUE,
     help.patch= TRUE,
     DLLs.only= FALSE,
     update.installed.cache= getOption("mvb.update.installed.cache", TRUE),
     pre.inst= !DLLs.only,
     dir.above.source= "+",
     R.target.version= getRversion(),
     autoversion= getOption("mvb.autoversion", TRUE),
     click.version= TRUE,
     vignette.build= FALSE,
     compress.lazyload= getOption( 'mvb.compress.lazyload', TRUE),
     silent= FALSE)
 patch.install(...) # actually, args are exactly as for 'patch.installed'
 spkg( pkg)

Arguments

pkg

package name. Either quoted or unquoted is OK; unquoted will be treated as quoted unless character.only=TRUE. Here and in most other places in mvbutils, you can also specify an actual in-memory-task-package object such as ..mypack.

character.only

Default FALSE, which allows unquoted package names. You can set it to TRUE, or just set e.g. char="my@funny@name", which will trump any use of pkg.

force.all.docs

normally just create help files for objects whose documentation has changed (which will always be generated, regardless of force.all.docs). If TRUE, then recreate help for all documented objects. Can also be a character vector of specific docfile names (usually function names, but can be the names of the Rd file, without path or the Rd extension), in which case those Rd files will be regenerated.

rewrap.forcibly

iff the package contains low-level code (C etc) and this is TRUE, this will re-invoke the PIBH (Pre-Install Build Hook) to recreate "housekeeping" code that e.g. creates R wrappers to call the low-level code from (avoiding direct use of .Call etc). Otherwise, the "housekeeping" code will only be recreated if the low-level source code has changed. It will also purge any pre-existing files in the "R" subfolder of your task; be warned! That feature is so that if you change from using eg the Rcpp package to the RcppTidy package, you won't have multiple versions of C-loaders. See other places in doco...

help.patch

if TRUE, patch the help of the installed package

DLLs.only

just synchronize the DLLs and don't bother with other steps (see Compiled code)

default.list

list of various things– see under "Overriding..." below

...

arguments to pass to your pre.install.hook.XXX function, usually if you want to be able to build different "flavours" of a package (e.g. a trial version vs. a production version, or versions with and without enormous datasets included). In patch.install, ... is just shorthand for the arg list of patch.installed.

update.installed.cache

If TRUE, then clear the installed-package cache, so that things like installed.packages work OK. The only reason to set to FALSE could be speed, if you have lots of packages; feedback appreciated. Default is TRUE unless you have set options( mvb.update.installed.cache=FALSE).

pre.inst

?run pre.install first? Default is TRUE unless DLLs.only=TRUE; leave it unless you know better.

autoversion

if TRUE, use the <mypack>.VERSION counter to update source-package DESCRIPTION. This is generally much better than manually tweaking the task package's <mypack>.DESCRIPTION object (or task package's DESCRIPTION file, in the hyper-manual case). Only versions with at least 3 levels will be updated: so 1.0.0 will go to 1.0.1, 1.0.0.0 will go to 1.0.0.1, but 1.0 will stay the same. Default is TRUE unless you have set options( mvb.autoversion=FALSE). To force a major revision, modify ..<mypack>$<mypack>.VERSION yourself (unless you are using a manual version in Description, which is discouraged). However, if you have manually changed the DESCRIPTION object or file's version to something beyond the source/installed version, then the larger number will take precedence.
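The increment rule just described can be sketched in a few lines of base R. This is only an illustration of the stated rule, not the mvbutils code: bump_version is a hypothetical name, and it assumes purely numeric levels separated by dots.

```r
# Hypothetical helper, for illustration only: bump the last level of a
# dotted version string, but only if it has at least 3 levels.
bump_version <- function(v) {
  parts <- as.integer(strsplit(v, ".", fixed = TRUE)[[1]])
  if (length(parts) >= 3) {
    parts[length(parts)] <- parts[length(parts)] + 1L
  }
  paste(parts, collapse = ".")
}

print(bump_version("1.0.0"))     # "1.0.1"
print(bump_version("1.0.0.0"))   # "1.0.0.1"
print(bump_version("1.0"))       # "1.0" (unchanged)
```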

click.version

if TRUE, try to automatically increment the version number in the source (and installed, if patch.install) packages. Normally a good idea, except if you are updating several incompatible libraries for different R versions; then you will need to run pre.install/patch.install in each R version but for the same underlying package, and you don't want the versions to get out of synch!

vignette.build

if TRUE, call tools::buildVignettes to generate HTMLs and/or PDFs from any RMD vignettes (only), so that the (?binary?) package will install properly. At present, you are still responsible for generating the RMD files in the first place; and the installed package is not updated by patch.install even if vignette.build=TRUE. Note that the default value is different for pre.install vs patch.install, because the former is meant to prepare a package for distro, whereas the latter is mainly updating the locally-installed version, for which vignette-updates won't work (would need to rebuild installed index, etc). If vignette.build=FALSE, any hand-pre-built vignettes may still get installed correctly. This is all messy stuff, subject to change!

dir.above.source

folder within which the source package will go, with a + at the start being shorthand for the task package folder (the default). Hence pre.install( pkg=mypack, dir="+/holder") will lead to creation of "holder/mypack" below the task folder of mypack. Set this manually if you have to maintain different versions of the package for different R versions, or different flavours of the package for other reasons, or if your source package must live in a "subversion tree" (whatever that is).
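As a sketch of how the "+" shorthand could expand, assuming it is simply replaced by the task folder (the function name resolve_source_dir and the example paths are hypothetical, and this is not the mvbutils code):

```r
# Hypothetical helper, for illustration only: expand a leading "+" to
# the task folder, then append the package name.
resolve_source_dir <- function(pkg, task_dir, dir.above.source = "+") {
  above <- if (startsWith(dir.above.source, "+")) {
    paste0(task_dir, substring(dir.above.source, 2))
  } else {
    dir.above.source
  }
  file.path(above, pkg)
}

task <- "/home/me/tasks/mypack"   # made-up task folder
print(resolve_source_dir("mypack", task))                # default "+"
print(resolve_source_dir("mypack", task, "+/holder"))    # "holder/mypack" below task
print(resolve_source_dir("mypack", task, "/tmp/build"))  # explicit folder
```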

R.target.version

Not needed 99% of the time; use only if you want to create source package for a different version of R. Supersedes the Rd.version argument of pre.install pre-'mvbutils' 2.5.57, used to control the documentation format. Set R.target.version to something less than "2.10" for ye olde "Rd version 1" format.

compress.lazyload

Installed packages feature "lazy-load" databases for documentation and for the R functions themselves (whether you like it or not), and patch.installed updates them. By default, R will compress these databases as it builds them, which can be remarkably slow. This option is an experimental feature to make uncompressed versions (by tweaking the compress argument to tools:::makeLazyLoadDB). You can try setting options(mvb.compress.lazyload=FALSE) for a speedup, but it's not really tested yet...

Roxygen

?should the Rd files be run thru Rd2Roxygen and added into the source-script "<mypack>.R"? NULL (default) means that the decision is based on whether the DESCRIPTION file contains a "RoxygenNote" field.

timeout_Roxygen

In case Rd2roxygen just does not work properly, you can stop it hanging the machine by setting this timeout (in which case, pre-installation will continue, but without any Roxygen in the source: ohdearhowsadnevermindeh). Use options( mvb.timeout_Roxygen=<something>) to do it for your whole R session (which is also the only way to apply it during patch.install).

silent

whether to show messages about starting/finishing documentation-prep and lazyification.

Details

As per the Glossary section of mvbutils.packaging.tools: the "task package" is the directory containing the ".RData" file with the guts of your package, which should be linked into the cd task hierarchy. The "source package" is usually the directory "<<pkg>>" below the task package, which will be created if needs be.

The default behaviour of pre.install is as follows; to change it, see Overriding defaults. A basic source package is created in a subdirectory "<<pkg>>" of the current task. The package will have at least a DESCRIPTION file, a NAMESPACE file, a single R source file with name "<<pkg>>.R" in the "R" subdirectory, possibly a "sysdata.rda" file in the same place to contain non-functions, and a set of Rd files in the "man" subdirectory. Rd files will be auto-created from docattr or flatdoc style documentation, although precedence will be given to any pre-existing Rd files found in an "Rd" subdirectory of your task, which get copied directly into the package. If the DESCRIPTION file or object contains a field "KeepPlaintextDoco" with value YES/TRUE or abbreviation thereof, then the plain-text "docattr" documentation will be stored in the R source file too; see dedoc_namespace. If DESCRIPTION includes a "RoxygenNote" field, then pre.install will try to add Roxygen comments before documented functions, using Rd2Roxygen (which is buggy, but at least one bug gets fixed automatically here). Any "inst", "demo", "vignettes", "tests", "src", "exec", and "data" subdirectories will be copied to the source package, recursively (i.e. including any of their subdirectories). There is no compilation of source code, since only a source package is being created; see also Compiled code below.

Most objects in the task package will go into the source package, but there are usually a few you wouldn't want there: objects that are concerned only with how to create the package in the first place, and ephemeral system clutter such as .Random.seed. The default exceptions are: functions pre.install.hook.<<pkg>>, .First.task, and .Last.task; data <<pkg>>.file.exclude.regexes, <<pkg>>.DESCRIPTION, <<pkg>>.VERSION, <<pkg>>.UNSTABLE, forced!exports, .required, .Depends, tasks, .Traceback, .packageName, last.warning, .Last.value, .Random.seed, .SavedPlots; and any character vector whose name ends with ".doc".

All pre-existing files in the "man", "src", "tests", "exec", "demo", "inst", and "R" subdirectories of the source-package directory will be removed (unless you have some mlazy objects; see below). If a file ".Rbuildignore" is present in the task package (but this is deprecated), then it's copied to the package directory, but I've never gotten this feature to work. If not but there is an object <<pkg>>.Rbuildignore (the preferred way; it should be a character vector), then that's used (and is automatically augmented to exclude some task-package housekeeping files). To exclude files that would otherwise be copied, i.e. those in "inst/demo/src/data" folders, create a character vector of regexes called <<pkg>>.file.exclude.regexes; any file matching any of these won't be copied.
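The "any file matching any of these" rule can be sketched in base R as follows. This is an illustration under assumptions, not the mvbutils code: the file names are made up, matches_any is a hypothetical helper, and whether the real matching is done against the relative path (as here) or just the file name is not stated above.

```r
# Made-up candidate files and exclusion regexes, for illustration only.
candidate_files <- c("inst/extdata/small.csv",
                     "inst/extdata/huge_raw.csv",
                     "src/fast.cpp",
                     "src/fast.o",
                     "data/lookup.rda")
mypack.file.exclude.regexes <- c("\\.o$", "huge_raw")

# TRUE for each path that matches at least one regex.
matches_any <- function(paths, regexes) {
  hits <- vapply(regexes, function(rx) grepl(rx, paths),
                 logical(length(paths)))
  # One row per path, one column per regex
  rowSums(matrix(hits, nrow = length(paths))) > 0
}

kept <- candidate_files[
  !matches_any(candidate_files, mypack.file.exclude.regexes)]
print(kept)
```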

If there is a "changes.txt" file in the task package (but this is deprecated), it will be copied to the "inst" subdirectory of the package, as will any files in the task's own "inst" subdirectory. A DESCRIPTION file will be created, preferably from a <<pkg>>.DESCRIPTION object in the task package; see mvbutils.packaging.tools for more. Any "Makefile.*" in the task package will be copied, as will any in the "src" subdirectory (not sure why both places are allowed). No other files or subdirectories in the package directory will be created or removed, but some essential files will be modified.

Any other character-vectors in the task package with names mypack.x, where "mypack" is your packagename and "x" is one of (NEWS, CHANGES, LICENCE, LICENSE, INSTALL, configure, cleanup, ChangeLog, README, Rbuildignore) or "README.y" where "y" is whatever, will be written into the source package as the corresponding file (e.g. a NEWS file will be created).

If a NAMESPACE file is present in the task (usually no need), then it is copied directly to the package. If not, then pre.install will generate a NAMESPACE file by calling make.NAMESPACE, which makes reasonable guesses about what to import, export, and S3methodize. What is & isn't an S3 method is generally deduced OK (see make.NAMESPACE for gruesome details), but you can override the defaults via the pre-install hook. FWIW, since adding the package-creation features to mvbutils, I have never bothered explicitly writing a NAMESPACE file for any of my packages. By default, only documented functions are exported (i.e. visible to the user or other packages); the rest are only available to other functions in your package.

If any of the Rd files starts with a period, e.g. ".dotty.name", it will be renamed to e.g. "01.dotty.name.Rd" to avoid some problems with RCMD. This should never matter, but just so you know...

To speed up conversion of documentation, a list of raw & converted documentation is stored in the file "doc2Rd.info.rda" in the task package, and conversion is only done for objects whose raw documentation has changed, unless force.all.docs is TRUE.

pre.install creates a file "funs.rda" in the package's "R" subdirectory, which is subsequently used by patch.installed. The function build.pkg (or R CMD BUILD) and friends will omit this file (currently with a complaint, which I intend to fix eventually, but which does not cause trouble).

Compiled code

pre.install tries to produce automatic R-side/C-side wrappers of C(++) code written for package Rcpp. The system used is extensible by the user (a pretty advanced user!) to other flavours of R/C code, via Clink_packages (qv). Current extensions are RcppTidy and ADT. Version-tracking on the automatically-generated wrapper files (one in "./src" and one in "./R") doesn't seem to be working that well yet, and you may well need to call rewrap_forcibly to make it happen. Watch out for old versions of those files left lying around by accident, since they can cause havoc.

In the case of package RcppTidy, what happens is this (actually via Clink_packages()$RcppTidy, which itself is a function that mvbutils is notified of by package RcppTidy itself when the latter is loaded):

it calls compileAttributes to generate an "RcppExports.cpp" file in the source package (in folder "src"), if a change from previous one is detected. The file is edited to retain the md5sum of the sources, which is used in subsequent runs to check for changes.
it modifies the "RcppExports.R" file so that the R-side auto headers are all placed into an environment DLL in the namespace.

This is to avoid polluting your namespace at point-of-load with possible aliases for C code, and to allow you to document and/or export "Rcpp functions" in the same way you would your other functions. It is less automated but arguably more controlled. To export (for the R-level user) a "Rcpp function", you need to explicitly write a wrapper, eg

  rapid_thing <- function( a, b, c) DLL$rapid_thing( a, b, c)

and add documentation for rapid_thing. There is also provision for different compilation systems, like RcppTidy and an ADT-oriented one...

patch.install does not compile source code; currently, you need to do that yourself, though I might add support for that if I can work out a sufficiently general mechanism. If you use R to do your compilation, then install.pkg should work after pre.install, though you may need detach("package:mypack", unload=TRUE) first and that will disrupt your R session. Alternatively, you may be able to use R CMD SHLIB to create the DLL directly, which you can then copy into the "libs" subdirectory of the installed package, without needing to re-install. I haven't tried this, but colleagues have reported success.

If, like me, you pre-compile your own DLLs manually (not allowed on CRAN, but fine for distribution to other users on the same OS), then you can put the DLLs into a folder "inst/libs" of the task (see next for Windows); they will end up as usual in the "libs" folder of the installed package, even though R itself hasn't compiled them. On Windows, put the DLLs one level deeper in "inst/libs/<<arch>>" instead, where "<<arch>>" is found from .Platform$r_arch; for 32-bit Windows, it's currently "i386". All references in this section to "libs", whether in the task or source or installed package, should be taken as meaning "libs/<<arch>>". You pretty much also need to create the alternate "x64" folder, too, even if it's empty; otherwise, the mvbutils installation tools will fail ( >= R3.3 or so).

To load your package's DLLs, call library.dynam in the .onLoad function, for example like this:

  .onLoad <- function( libname, pkgname){
    library.dynam( 'my_first_dll', package=pkgname)
    library.dynam( 'my_other_dll', package=pkgname)  # fine to have several DLLs
  }

To automatically load all DLLs, you can copy the body of mvbutils:::generic.dll.loader into your own .onLoad, or just include a call to generic.dll.loader(libname,pkgname) if you don't mind having dependence on mvbutils.

After the package has been installed for the first time, I change my compiler settings so that the DLL is created directly in the installed package's "libs" folder; this means I can use the compiler's debugger while R is running. To accommodate this, patch.install behaves as follows:

any new DLLs in the task package are copied to the installed package;
any DLLs in the installed package but not in the task package are deleted;
for any DLLs in both task & installed, both copies are synchronized to the newer version;
the source package always matches the task package
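The first three rules can be sketched in base R as follows. This is only an illustration of the logic, assuming that "newer" is judged by file modification time; sync_dlls is a hypothetical function, not what patch.install actually runs.

```r
# Hypothetical sketch of the DLL-synching rules; not mvbutils code
sync_dlls <- function( task_dir, inst_dir) {
  task <- list.files( task_dir)
  inst <- list.files( inst_dir)

  # New DLLs in the task package are copied to the installed package
  for( f in setdiff( task, inst)) {
    file.copy( file.path( task_dir, f), file.path( inst_dir, f), copy.date=TRUE)
  }

  # DLLs only in the installed package are deleted
  unlink( file.path( inst_dir, setdiff( inst, task)))

  # DLLs in both places: the newer copy overwrites the older one
  for( f in intersect( task, inst)) {
    task_file <- file.path( task_dir, f)
    inst_file <- file.path( inst_dir, f)
    if( file.mtime( task_file) > file.mtime( inst_file)) {
      file.copy( task_file, inst_file, overwrite=TRUE, copy.date=TRUE)
    } else if( file.mtime( inst_file) > file.mtime( task_file)) {
      file.copy( inst_file, task_file, overwrite=TRUE, copy.date=TRUE)
    }
  }
  invisible( list.files( inst_dir))
}
```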

You can call patch.install( mypack, DLLs.only=TRUE) if you only want the DLL-synching step.

(Before version 2.5.57, mvbutils allowed more latitude in where you could put your home-brewed DLLs, but it just made life more confusing. The only place that now works is as above.)

Data objects

Data objects are handled a bit differently to the recommendations in "R extensions" and elsewhere, but the end result for the package user is the same, or better. The changes have been made to speed up package maintenance, and to improve usability. Specifically:

Undocumented data objects live only in the package's namespace, i.e. visible only to your functions.
Documented data objects appear both in the visible part of the package (i.e. in the search path), and in the namespace. [The R standard is that these should not be visible in the namespace, but this doesn't seem sensible to me.]
The easiest way to export a data object is to "document" it by putting its name into an alias line of the doc attribute of an existing function. (Alias lines are single-word lines directly after the first line of the doc attr.)
To document a data object xxx in its own right, include a flat-format text object xxx.doc in your task package; see doc2Rd. xxx.doc itself won't appear in the packaged object, but will result in documentation for xxx and any other data objects that are given as alias lines.
Big data objects can be set up for transparent individual lazy-loading (see below) to save time & memory, but lazy-loading is otherwise off by default for individual data objects.
There is no need for the user ever to call data to access a dataset in the package, and in fact it won't work.

Note that the data(...) function has been pretty much obsolete since the advent of lazy-loading in R 2.0; see R-news #4/2.

In terms of package structure, as opposed to operation, there is no "data" subdirectory. Data lives either in the "sysdata.rdb/rdx" files in the "R" subdirectory (but can still be user-visible, which is not normally the case for objects in those files), or in the "mlazy" subdirectory for those objects with individual lazy-loading.

Big data objects

Lazy-loading objects cached with mlazy are handled specially, to speed up pre.install. Such objects get their cache-files copied to "inst/mlazy", and the .onLoad is prepended with code that will load them on demand. By default, they are exported if and only if documented, and are not locked. The following objects are not packaged by default, even if mlazyed: .Random.seed, .Traceback, last.warning, and .Saved.plots. These are mlazyed automatically if options( mvb.quick.cd) is TRUE; see cd.

Tinytests

Any "scriptlets" (charvecs whose name ends ".r" or ".R") whose first line contains the word "tinytest" are assumed to be for package tinytest. They will be written into eponymous files in the "inst/tinytest" folder, where they will be accessible to tinytest::test_package(<mypack>). If you already have manual tinytest script-files in "inst/tinytest", they will be copied into the source package directory tree too (and will overwrite any scriptlets with the same names). Package tinytest also requires a magic file "tinytest.R" to exist in the folder "<mypack>/tests", and that will be created in the source package directory if it does not exist in your task directory. Remember to add "tinytest" to the "Suggests" field in the DESCRIPTION file.
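For instance, a scriptlet in the task package might look something like this. The object name and the test expressions are hypothetical; the final check just restates by hand the two conditions described above.

```r
# A "scriptlet": a character vector whose name ends in ".R" and whose
# first line mentions tinytest (name and contents are made up)
test_arith.R <- c(
  '# tinytest script for mypack',
  'expect_equal( 1 + 1, 2)',
  'expect_true( is.numeric( pi))'
)

# The two conditions from the text, checked by hand
is_tinytest_scriptlet <- is.character( test_arith.R) &&
    grepl( 'tinytest', test_arith.R[ 1], fixed=TRUE)
is_tinytest_scriptlet
```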

Documentation and exporting

Package documentation

Just because you have a package Splendid, it doesn't follow that a user will be able to figure out how to use it from the alphabetical list of functions in library( help=Splendid); even if you've written vignettes, it may not be obvious which to use. The recommended way to provide a package overview is via "package documentation", which the user accesses via package?Splendid. You can write this in a text object called e.g. "Splendid.package.doc", which will be passed through doc2Rd with an extra "docType{package}" field added. The first line should start e.g. "Splendid-package" and the corresponding ".Rd" file will be put first into the index. It's good to have just the name of the package as a second line (unless it is also the name of an already-documented function within the package). Speaking as a frequently bewildered would-be user of other people's packages, and one who readily gives up if the "help" is impenetrable, I urge you to make use of this feature!

Vignettes

See mvbutils.packaging.tools.

Bare minimum for export

Only documented functions and data are exported from your package (unless you resort to the subterfuge described in the subsection after this). Documented things are those found by find.documented( doc="any"). The simplest way to document something is just to add its name as an "alias line" to the existing documentation of another function, before the first empty line. For example, if you're already using flatdoc to document my.beautiful.function, you can technically "document" and thus export other functions like so:

  structure( function( blahblahblah)...
  ,doc=flatdoc())
  my.beautiful.function    package:splendid
  other.exported.function.1
  other.exported.function.2

The package will build & install OK even if you don't provide USAGE and ARGUMENTS sections for the other functions. Of course, R CMD CHECK wouldn't like it (and may have a point on this occasion). If you are just after "legal" (for R CMD CHECK) albeit unhelpful documentation for some of your functions that you can't face writing proper doco for yet, see make.usage.section and make.argument.section.

Exporting undocumented things and vice versa

A bit naughty (RCMD CHECK complains), but quite doable. Note that "things" can be data objects, not just functions. Simply write a pre-install hook (see Overriding defaults) that includes something like this:

  pre.install.hook.mypack <- function( hooklist) {
    hooklist$nsinfo$exports <- c( hooklist$nsinfo$exports, "my.undocumented.thing")
    return( hooklist)
  }

You can follow a similar approach if you want to document something but not to export it (so that it can only be accessed by Splendid:::unexported.thing). This probably isn't naughty.
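A minimal sketch of that reverse case, assuming that dropping the name from the exports element is all that's needed; the mock list below is only a stand-in for what pre.install really passes to the hook.

```r
pre.install.hook.mypack <- function( hooklist) {
  # Keep the documentation, but drop the object from the exports
  hooklist$nsinfo$exports <- setdiff( hooklist$nsinfo$exports, "unexported.thing")
  return( hooklist)
}

# Stand-in for the list that pre.install would pass (structure assumed)
mock <- list( nsinfo=list( exports=c( "my.fun", "unexported.thing")))
pre.install.hook.mypack( mock)$nsinfo$exports
```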

Overriding defaults

Source package folder can be controlled via options("mvbutils.sourcepkgdir.postfix"), as per "Folders and different R versions" in mvbutils.packaging.tools. You'd only need to do this if you have multiple R versions installed that require different source-package formats (something that does not often change).

If a function pre.install.hook.<<pkgname>> exists in the task "<<pkgname>>", it will be called during pre.install. It will be passed one list-mode argument, containing default values for various installation things that can be adjusted; and it should return a list with the same names. It will also be passed any ... arguments to pre.install, which can be used e.g. to set "production mode" vs "informal mode" of the end product. For example, you might call pre.install(mypack,modo="production") and then write a function pre.install.hook.mypack( hooklist, modo) that includes or excludes certain files depending on the value of modo. The hook can do two things: sort out any file issues not adequately handled by pre.install, and/or change the following elements in the list that is passed in. The return value should be the possibly-modified list. Hook list elements are:

copies: files to copy directly
dll.paths: DLLs to copy directly
extra.filecontents: named list; each element is the contents of a text file, the corresponding name being the path of the file to create, e.g. "inst/src/utils.pas" (a nonstandard name)
extra.docs: names of character-mode objects that constitute flat-format documentation
description: named elements of DESCRIPTION file
task.path: path of task (ready-to-install package will be created as a subdirectory in this)
has.namespace: should a namespace be used?
use.existing.NAMESPACE: ignore default and just copy the existing NAMESPACE file?
nsinfo: default namespace information, to be written iff has.namespace==TRUE and use.existing.NAMESPACE==FALSE
exclude.funs: any functions not to include
exclude.data: non-functions to exclude from system.rda
dont.check.visibility: either TRUE (default default), FALSE, or a specified character vector, to say which objects are not to be checked for "globality" by RCMD CHECK (using the globalVariables mechanism). Leave alone if you don't understand this. You can change the "default default" via options( mvb_dont_check_visibility=FALSE).
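As an illustration of a hook that reacts to an extra argument such as modo, here is a hedged sketch. The function name my.debug.helper and the mock list are hypothetical, and only the exclude.funs element is touched.

```r
pre.install.hook.mypack <- function( hooklist, modo='informal', ...) {
  if( modo == 'production') {
    # 'my.debug.helper' is a hypothetical function to leave out of production builds
    hooklist$exclude.funs <- c( hooklist$exclude.funs, 'my.debug.helper')
  }
  return( hooklist)   # always hand back the (possibly modified) list
}

# Stand-in for the list that pre.install would pass (structure assumed)
mock <- list( exclude.funs=character( 0), exclude.data=character( 0))
pre.install.hook.mypack( mock, modo='production')$exclude.funs
```

Calling the hook by hand on a mock list like this is also a cheap way to check it before running pre.install for real.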

There are two reasons for using a hook rather than directly setting parameters in pre.install. The first is that pre.install will calculate sensible but non-obvious default values for most things, and it is easier to change the defaults than to set them up from scratch in the call. The second is that once you have written a hook, you can forget about it; you don't have to remember special argument values each time you call pre.install for that task.

Debugging a pre install hook

To understand what's in the list and how to write a pre-install hook, the easiest way is probably to write a dummy one and then mtrace it before calling pre.install(mypack). However, it's all a bit clunky at present (July 2011). Because the hook only exists in the "..mypack" shadow environment, mtrace won't find it automatically, so you'll need mtrace( pre.install.hook.mypack, from=..mypack). That's fine, but if you then modify the source of your hook function, you'll get an error following the "Reapplying trace..." message. So you need to do mtrace.off before saving your edited hook-function source, and then mtrace the hook again before calling pre.install(mypack). To be fixed, if I can work out how...

Different versions of R

R seems to be rather fond of changing the structural requirements of source & installed packages. mvbutils tries to shield you from those arcane and ephemeral details; usually, your task package will not need changing, and pre.install will automatically generate source & installed packages in whatever format R currently requires. However, sometimes you do at least need to be able to build different "instances" of your package for different versions of R. The sourcedir and maybe the R.target.version arguments of pre.install may help with this.

If you do need to build instances of your package for a different version of R, then you may need the R.target.version argument (and dir.above.source). I try to keep mvbutils up-to-date with R's fairly frequent revisions to package structure rules, with the aim that you (or I) can easily produce a source/binary-source package for a version of R later than the one you're using right now, merely by setting R.target.version. However, be warned that this may not always be enough; there might at some point be changes in R that will require you to be running the appropriate R version (and an appropriate version of mvbutils) just to recreate/rebuild your package in an appropriate form.

The nuances of R.target.version change with the changing tides of R versions, but the whole point of pre.install etc is that you shouldn't really need to know about those details; mvbutils tries to look after them for you. For example, though: as of 10/2011, the "detailed behaviour" is to enforce namespaces if R.target.version >= 2.14, regardless of whether your package has a .onLoad or not.

Packages without namespaces pre R2.14

You used to be allowed to build packages without namespaces; not to be encouraged for general distribution IMO, but occasionally a useful shortcut for your own stuff nevertheless (mainly because everything is "exported", documented or not). For R < 2.14, mvbutils will decide for itself whether your package is meant to be namespaced, based on whether any of the following apply: there is a NAMESPACE file in the task package; there is a .onLoad function in the task; there is an "Imports" directive in the DESCRIPTION file.
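The three-way test just described could be sketched roughly like this. looks_namespaced is a hypothetical function written for illustration, not the check that mvbutils itself performs; task_objects stands for the names of the objects in the task.

```r
# Hypothetical restatement of the three conditions; not mvbutils code
looks_namespaced <- function( task_dir, task_objects) {
  desc_file <- file.path( task_dir, 'DESCRIPTION')
  has_imports <- file.exists( desc_file) &&
      'Imports' %in% colnames( read.dcf( desc_file))

  file.exists( file.path( task_dir, 'NAMESPACE')) ||   # NAMESPACE file in the task
      '.onLoad' %in% task_objects ||                   # .onLoad function in the task
      has_imports                                      # "Imports" in DESCRIPTION
}
```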

Author(s)

Mark Bravington

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  # Workflow for simple case:
  cd( task.above.mypack)
  maintain.packages( mypack)
  # First-time setup, or after major R version changes:
  pre.install( mypack)
  install.pkg( mypack)
  library( mypack)
  # ... do stuff
  # Subsequent maintenance:
  maintain.packages( mypack) # only once per session, usually at the start
  library( mypack) # maybe optional
  # ...do various things involving changes to mypack, then...
  patch.install( mypack) # keep disk image up-to-date
  # Prepare copies for distribution
  build.pkg( mypack) # for Linux or CRAN
  build.pkg.binary( mypack) # for Windows or Macs
  check.pkg( mypack) # if you like that sort of thing
} # if F

## End(Not run)

Print values

Description

See base-R documentation of print and print.default. Users should see no difference with the mvbutils versions; they need to be documented and exported in mvbutils for purely technical reasons. There are also three useful special-purpose print methods in mvbutils; see Value. Some of the base-R documentation is reproduced below.

The motive for redeclaration is to have a seamless transition within the fixr editing system, from the nice simple "source"-attribute system used to store function source-code before R2.14, to the quite extraordinarily complicated "srcref" system used thereafter. mvbutils does so via an augmented version of base-R's print method for functions, without which your careful formatting and commenting would be lost. If a function has a "source" attribute but no "srcref" attribute (as would be the case for many functions created prior to R2.14), then the augmented print.function will use the "source" attribute. There is no difference from base-R in normal use.
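The fallback order described above can be pictured with a small base-R sketch. source_lines is a hypothetical helper that only mimics the choice between "srcref", "source", and deparsing; it is not the code of the mvbutils print method.

```r
# Hypothetical helper: which text would be shown for a function?
source_lines <- function( f) {
  srcref <- attr( f, 'srcref')
  old_source <- attr( f, 'source')
  if( !is.null( srcref)) {
    as.character( srcref)      # modern "srcref" system
  } else if( !is.null( old_source)) {
    old_source                 # pre-2.14 "source" attribute
  } else {
    deparse( f)                # last resort: formatting and comments are lost
  }
}

old_style <- function( x) x + 1
attr( old_style, 'srcref') <- NULL
attr( old_style, 'source') <- 'function( x) x + 1 # comment kept in "source"'
source_lines( old_style)
```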

See How to override an S3 method if you really want to understand the technicalities.

Usage

print(x, ...) # generic
## Default S3 method:
print(x, ...) # S3 method for default
## S3 method for class 'function'
print(x, useSource=TRUE, ...) # S3 method for function
## S3 method for class 'cat'
print(x, ...) # S3 method for cat
## S3 method for class 'compacto'
print(x, ...,
    gap= attr( x, 'gap'),
    width= attr( x, 'width'),
    extra= attr( x, 'extra'))# S3 method for compacto
## S3 method for class 'specialprint'
print(x, ...) # S3 method for specialprint
## S3 method for class 'pagertemp'
print(x, ...) # S3 method for pagertemp
## S3 method for class 'call'
print(x, ...) # S3 method for call
## S3 method for class '<-'
print(x, ...) # S3 method for "<-" (a special sort of call)
## S3 method for class '('
print(x, ...) # S3 method for "(" (a special sort of call)
#print(x, ...) # S3 method for "{" (a special sort of call)
## S3 method for class 'if'
print(x, ...) # S3 method for "if" (a special sort of call)
## S3 method for class 'for'
print(x, ...) # S3 method for "for" (a special sort of call)
## S3 method for class 'while'
print(x, ...) # S3 method for "while" (a special sort of call)
## S3 method for class 'name'
print(x, ...) # S3 method for name (symbol)

Arguments

x

thing to print.

...

other arguments passed to NextMethod and/or ignored. There are many special arguments to base-R print.default, as described in its documentation. They are not named individually in the mvbutils version for technical reasons, but you can still use them.

gap, width, extra

see compacto

useSource

[print.function] logical, indicating whether to use source references or copies rather than deparsing language objects. The default is to use the original source if it is available. The mvbutils override will print a "source" attribute if one exists but no "srcref" attribute does, whereas base-R post-2.14 would just print a deparsed version of the function.

Value

Technically, an invisible version of the object is returned. But the point of print is to display the object. print.function displays source code, as per Description. print.default and print.call need to exist in mvbutils only for technical reasons. The other three special methods are:

print.cat applies to character vectors of S3 class cat, which are printed each on a new line, without the [1] prefix or double-quotes or backslashes. It's ideal for displaying "plain text". Use as.cat to coerce a character vector so that it prints this way.
print.compacto shows compacto matrices with vertical column names/labels, and optionally with no gaps between columns.
print.specialprint can be used to ensure an object (of class specialprint) displays in any particular way you want, without bothering to define a new S3 class and write a print method. Just give the object an attribute "print" of mode expression, which can refer to the main argument x and any other arguments. That expression will be run by print.specialprint; see Examples.

print.pagertemp is meant only for internal use by the informal-help viewer.

How to override an S3 method

Suppose you maintain a package mypack in which you want to mildly redefine an existing S3 method, like mvbutils does with print.function. (Drastic redefinitions are likely to be a bad idea, but adding or tweaking functionality can occasionally make sense.) The aim is that other packages which import mypack should end up using your redefined method, and so should the user if they have explicitly called library( mypack). But your redefined method should not be visible to packages that don't import mypack, nor to the user if mypack has only been loaded implicitly (i.e. if mypack is imported by another package, so that asNamespace(mypack) is loaded but package:mypack doesn't appear on the search path). It's hard to find out how to do this. Here's what I have discovered:

For a new S3 method (i.e. for a class that doesn't already have one), you just need to mark it as an S3method in the mypack NAMESPACE file (which mvbutils packaging tools do for you automatically). You don't need to document the new method explicitly, and consequently there's no need to export it. The new method will still be found when the generic runs on an object of the appropriate class.
If you're modifying an existing method, you can't just declare it as S3method in the NAMESPACE file of mypack. If that's all you did, R would complain that it already has a registered method for that class (fair enough). Therefore, you also have to redeclare and export the generic, so that there's a "clean slate" for registering the method (specifically, in the S3 methods table for mypack, where the new generic lives). The new generic will probably be identical to the existing generic, very likely just a call to UseMethod. Because it's exported, it needs to be documented; you can either just refer to base-R documentation (but you still need all the formal stuff for Arguments etc, otherwise RCMD CHECK complains), or you can duplicate the base-R doco with a note. help2flatdoc is useful here, assuming you're wisely using mvbutils to build & maintain your package.
If you redeclare the generic, you also need to make sure that your method is exported as well as S3-registered in the NAMESPACE file of mypack. This is somehow connected with the obscure scoping behaviour of UseMethod and I don't really understand it, but the result is: if you don't export your method, then it's not found by the new generic (even though it exists in asNamespace(mypack), which is the environment of the new generic, and even though your method is also S3-registered in that same environment). Because you export the method, you also need to document it.
Unfortunately, the new generic won't know about the methods already registered for the old generic. So, for most generics (exceptions listed later), you will also have to define a generic.default method in mypack, and you need to export and therefore document it too, as per the previous point. This generic.default just needs to invoke the original generic, so that the already-registered S3 methods are searched. However, this can lead to infinite loops if you're not careful. See mvbutils:::print.default for how to do it. If you were redefining a generic that was originally (or most recently) defined somewhere other than baseenv(), then you'd need to replace the latter with asNamespace(<<original.defining.package>>).
Because your new generic.default might invoke any of the pre-existing (or subsequently-registered) methods of the original generic, you should just make its argument list x,.... In other words, don't name individual arguments even if they are named in the original generic.default (eg for print.default).
Objects of mode name, call, and "(" or "{" or "<-" (special types of call) cause trouble in generic.default (at least using the approach in the previous point, as in mvbutils:::print.default). Unless they have a specific method, the object will be automatically evaluated. So if your generic is ever likely to be invoked on a call object, you'll need a special generic.call method, as in mvbutils:::print.call; the same goes for those other objects.
A few generics (rbind and cbind, for example) use their own internal dispatch mechanism and don't have e.g. an rbind.default. Of course, there is a default behaviour, but it's not defined by an R-level function; see ?InternalGenerics. For these generics, the previous point wouldn't work as a way of looking for existing methods. Fortunately, at least for rbind, things seem to "just work" if your redefined generic simply runs the code of the base generic (but don't call the latter directly, or you risk infinite loops; just run its body code). Then, if your generic is run, the search order is (1) methods registered for your generic in asNamespace("mypack"), whether defined in mypack itself or subsequently registered by another package that uses mypack, (2) methods defined/registered for the base generic (ie in the original generic's namespace), (3) the original "implicit default method". But if the original generic is run (e.g. from another package that doesn't import mypack), then step (1) is skipped. This is good; if another package pack2 imports mypack and registers an S3 method, the S3 registration will go into the mypack S3 lookup table, but if pack2 doesn't import mypack then the S3 registration will go into the base S3 lookup table (or the lookup table for whichever package the generic was originally defined in, eg package stats).
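To make the export and registration requirements in the points above concrete, here is roughly what the relevant NAMESPACE directives might look like for the print.function case, held in a character vector so that the snippet runs as plain R. This is an illustrative guess at the shape of the file, not a copy of what the mvbutils packaging tools generate; "function" is quoted because it is a reserved word.

```r
# Sketch of NAMESPACE directives for a redeclared generic plus modified method
namespace_lines <- c(
  'export( print)',                 # the redeclared generic
  'export( print.function)',        # the modified method: exported...
  'S3method( print, "function")',   # ...and S3-registered
  'export( print.default)',         # default that hands over to the original generic
  'S3method( print, default)'
)
writeLines( namespace_lines)
```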

Examples

# Special methods shown below; basic behaviour of 'print', 'print.default',
# and 'print.function' is as for base-R
#cat
ugly.bugly <- c( 'A rose by any other name', 'would annoy taxonomists')
ugly.bugly
#[1] "A rose by any other name"                 "would annoy taxonomists"
as.cat( ugly.bugly) # calls print.cat--- no clutter
#A rose by any other name
#would annoy taxonomists
# nullprint:
biggo <- 1:1000
biggo
# [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
# [19] 19  20  21  22  23  24  25  26  27  28 etc...
oldClass( biggo) <- 'nullprint'
biggo # calls print.nullprint
# nuthin'
# specialprint:
x <- matrix( exp( seq( -20, 19, by=2)), 4, 5)
attr( x, 'print') <- expression( {
    x[] <- sprintf( '%12.2f', x);
    class( x) <- 'noquote';
    attr( x, 'print') <- NULL;
    print( x)
  })
class( x) <- 'specialprint'
x # calls print.specialprint; consistently formatted for once
#     [,1]         [,2]         [,3]         [,4]         [,5]
#[1,]         0.00         0.00         0.02        54.60    162754.79
#[2,]         0.00         0.00         0.14       403.43   1202604.28
#[3,]         0.00         0.00         1.00      2980.96   8886110.52
#[4,]         0.00         0.00         7.39     22026.47  65659969.14

Data frames: better behaviour with zero-length cases

Description

rbind concatenates its arguments by row; see cbind for basic documentation. There is an rbind method for data frames which mvbutils overrides, and rbdf calls the override directly. The mvbutils version should behave exactly as the base-R version, with two exceptions:

zero-row arguments are not ignored, e.g. so that factor levels which never appear are not dropped.
dimensioned (array or matrix) elements do not lose any extra attributes (such as class).

I find the zero-row behaviour more logical, and useful because e.g. it lets me create an empty.data.frame with the correct type/class/levels for all columns, then subsequently add rows to it. The behaviour for matrix (array) elements allows e.g. the rbinding of data frames that contain matrices of POSIXct elements without losing the POSIXct class (as in my package nicetime).

When rbinding data frames, best practice is to make sure all the arguments really are data frames. Lists and matrices also work OK (they are first coerced to data frames), but scalars are dangerous (even though base-R will process them without complaint). rbind is quirky around data frames; unless all the arguments are data frames, sometimes rbind.data.frame will not be called even when you'd expect it to be, and the coercion of scalars is frankly potty; see Details and EXAMPLES. mvbutils:::rbind.data.frame tries to mimic the base-R scalar coercion, but I'm not sure it's 100% compatible. Again, the safest way to ensure a predictable outcome, is to make sure all arguments really are data frames, and/or to call rbdf directly.

Note that ("thanks" to stringsAsFactors) the order in which data frames are rbound can affect the result; see Examples.

Obsolete

Versions of mvbutils prior to 2.8.207 installed replacements for $<-.data.frame and [[<-.data.frame that circumvented weird behaviour with the base-R versions when the data.frame had zero rows. That weird behaviour seems to be fixed in base-R as of version 3.4.4 (perhaps earlier). I've therefore removed those replacements (after warnings from newer versions of RCMD CHECK). Hopefully, everything works... but just for the record, here's the old text, which I think no longer applies.

[I think this paragraph is obsolete.] Normally, you can replace elements in, or add a column to, data frames via e.g. x$y <- z or x[["y"]] <- z. However, in base-R this fails for no good reason if x is a zero-row data frame; the sensible behaviour when y doesn't exist yet, would be to create a zero-length column of the appropriate class. mvbutils overrides the base (mis)behaviour so it works sensibly. Should work for matrix/array "replacements" too.

Usage

rbind(..., deparse.level = 1) # generic
## S3 method for class 'data.frame'
rbind(..., deparse.level = 1) # S3 method for data.frame
rbdf(..., deparse.level = 1) # explicitly call S3 method...
# ... for data frames (circumvent rbind dispatch)
## OBSOLETE x[[i,j]] <- value # S3 method for data.frame; only ...
## OBS ... the version x[[i]] <- value is relevant here, tho' arguably j==0 might be
## OBS x$name <- value # S3 method for data.frame

Arguments

...

Data frames, or things that will be coerced to data frames. NULLs are ignored.

deparse.level

not used by rbind.data.frame, it's for the default and generic only

Details

old arguments

i,j: column and row subscripts
name: column name
x, value: that's up to you; I just have to include them here to stop RCMD CHECK from moaning... :/

See cbind documentation in base-R.

R's dispatch mechanism for rbind is as follows [my paraphrasing of base-R documentation]. Mostly, if any argument is a data frame then rbind.data.frame will be used. However, if one argument is a data frame but another argument is a scalar/matrix of a class that has an rbind method, then "default rbind" will be called instead. Although the latter still returns a data frame, it stuffs up e.g. class attributes, so that POSIXct objects will be turned into huge numbers. Again, if you really want a data frame result, make sure all the arguments are data frames.

In mvbutils:::rbind.data.frame (and AFAIK in the base-R version), arguments that are not data frames are coerced to data frames, by calling data.frame() on them. AFAICS this works predictably for list and matrix arguments; note that lists need names, and matrices need column names, that match the names of the real data frame arguments, because column alignment is done by name not position. Behaviour for scalars is IMO weird; see Examples. The idea seems to be to turn each scalar into a single-row data frame, coercing its names and truncating/replicating it to match the columns of the first real data frame argument; any names of the scalar itself are disregarded, and alignment is by position not name. Although mvbutils:::rbind.data.frame tries to mimic this coercion, it seems to me unnecessary (the user should just turn the scalar into something less ambiguous), confusing, and dangerous, so mvbutils issues a warning. Whether I have duplicated every quirk, I'm not sure.
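One way to sidestep the scalar quirks is to turn a named vector into a one-row data frame yourself before rbinding, so that alignment is by name. The helper as_df_row below is hypothetical (it is not part of mvbutils), and the sketch assumes the vector's names match the columns of the main data frame.

```r
# Hypothetical helper: named vector -> one-row data frame
as_df_row <- function( v) as.data.frame( as.list( v), stringsAsFactors=FALSE)

main <- data.frame( x=1, y=2, z=3)
extra <- c( z=6, x=4, y=5)   # deliberately in a different order

# Both arguments are now real data frames, so columns are matched by name
combined <- rbind( main, as_df_row( extra))
combined
```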

Note also that R's accursed drop=TRUE default means that things you might reasonably think should be data frames, might not be. Under some circumstances, this might result in rbind.data.frame being bypassed. See Examples.

Short of rewriting data.frame and rbind, there's nothing mvbutils can do to fix these quirks. Whether base-R should consider any changes is another story, but back-compatibility probably suggests not.

Value

[Taken from the base-R documentation, modified to fit the mvbutils version] The rbind data frame method first drops any NULL arguments, then coerces all others to data frames (see Details for how it does this with scalars). Then it drops all zero-column arguments. (If that leaves none, it returns a zero-column zero-row data frame.) It then takes the classes of the columns from the first argument, and matches columns by name (rather than by position). Factors have their levels expanded as necessary (in the order of the levels of the levelsets of the factors encountered) and the result is an ordered factor if and only if all the components were ordered factors. (The last point differs from S-PLUS.) Old-style categories (integer vectors with levels) are promoted to factors. Zero-row arguments are kept, so that in particular their column classes and factor levels are taken account of. Because the class of each column is set by the first data frame, rather than "by consensus", numeric/character/factor conversions can be a bit surprising especially where NAs are involved. See the final bit of EXAMPLES.

Examples

# mvbutils versions are used, unless base:: or baseenv() gets mentioned
# Why base-R dropping of zero rows is odd
rbind( data.frame( x='yes', y=1)[-1,], data.frame( x='no', y=0))$x # mvbutils
#[1] no
#Levels: yes no # two levels
base::rbind( data.frame( x='yes', y=1)[-1,], data.frame( x='no', y=0))$x # base-R
#[1] no
#Levels: no # lost level
rbind( data.frame( x='yes', y=1)[-1,], data.frame( x='no', y=0, stringsAsFactors=FALSE))$x
#[1] no
#Levels: yes no
base::rbind( data.frame( x='yes', y=1)[-1,], data.frame( x='no', y=0, stringsAsFactors=FALSE))$x
#[1] "no" # x has turned into a character
# Quirks of scalar coercion
evalq( rbind( data.frame( x=1), x=2, x=3), baseenv()) # OK I guess
#   x
#1  1
#x  2
#x1 3
evalq( rbind( data.frame( x=1), x=2:3), baseenv()) # NB lost element
#  x
#1 1
#x 2
evalq( rbind( data.frame( x=1, y=2, z=3), c( x=4, y=5)), baseenv())
# NB gained element! Try predicting z[2]...
#  x y z
#1 1 2 3
#2 4 5 4
evalq( rbind( data.frame( x='cat', y='dog'), cbind( x='flea', y='goat')), baseenv()) # OK
#     x    y
#1  cat  dog
#2 flea goat
evalq( rbind( data.frame( x='cat', y='dog'), c( x='flea', y='goat')), baseenv()) # Huh?
#Warning in `[<-.factor`(`*tmp*`, ri, value = "flea") :
#  invalid factor level, NAs generated
#Warning in `[<-.factor`(`*tmp*`, ri, value = "goat") :
#  invalid factor level, NAs generated
#     x    y
#1  cat  dog
#2 <NA> <NA>
evalq( rbind( data.frame( x='cat', y='dog'), c( x='flea')), baseenv()) # Hmmm...
#Warning in `[<-.factor`(`*tmp*`, ri, value = "flea") :
#  invalid factor level, NAs generated
#Warning in `[<-.factor`(`*tmp*`, ri, value = "flea") :
#  invalid factor level, NAs generated
#     x    y
#1  cat  dog
#2 <NA> <NA>
try( evalq( rbind( data.frame( x='cat', y='dog'), cbind( x='flea')), baseenv())) # ...mmmm...
#Error in rbind(deparse.level, ...) :
#  numbers of columns of arguments do not match
# Data frames that aren't:
data.frame( x=1,y=2)[-1,] # a zero-row DF-- OK
# [1] x y
# <0 rows> (or 0-length row.names)
data.frame( x=1)[-1,] # not a DF!?
# numeric(0)
data.frame( x=1)[-1,,drop=FALSE] # OK, but exceeeeeedingly cumbersome
# <0 rows> (or 0-length row.names)
# Implications for rbind:
rbind( data.frame( x='yes')[-1,], x='no')
#  [,1]
# x "no" # rbind.data.frame not called!
rbind( data.frame( x='yes')[-1,,drop=FALSE], x='no')
#Warning in rbind(deparse.level, ...) :
#  risky to supply scalar argument(s) to 'rbind.data.frame'
#   x
#x no
# Quirks of ordering and character/factor conversion:
rbind( data.frame( x=NA), data.frame( x='yes'))$x
#[1] NA    "yes" # character
rbind( data.frame( x=NA_character_), data.frame( x='yes'))$x
#[1] <NA> yes
#Levels: yes # factor!
rbind( data.frame( x='yes'), data.frame( x=NA))$x[2:1]
#[1] <NA>  yes
#Levels: yes # factor again
x1 <- data.frame( x='yes', stringsAsFactors=TRUE)
x2 <- data.frame( x='no', stringsAsFactors=FALSE)
rbind( x1, x2)$x
# [1] yes no
# Levels: yes no
rbind( x2, x1)$x
# [1] "no"  "yes"
# sigh...

Read text lines from a connection

Description

Reads text lines from a connection (just like readLines), but optionally only until a specified string is found.

Usage

  readLines.mvb( con=stdin(), n=-1, ok=TRUE, EOF=as.character( NA), line.count=FALSE)

Arguments

con

A connection object or a character string.

n

integer. The (maximal) number of lines to read. Negative values indicate that one should read up to the end of the connection.

ok

logical. Is it OK to reach the end of the connection before ‘n > 0’ lines are read? If not, an error will be generated.

EOF

character. If the current line matches the EOF, it's treated as an end-of-file, and the read stops. The connection is left OPEN so that subsequent reads work.

line.count

(default FALSE) see Value.

Details

Apart from stopping if the EOF line is encountered, and as noted with line.count==TRUE, behaviour should be as for readLines.

Value

A character vector of length the number of lines read. If line.count==TRUE, it will also have an attribute "line.count" showing the number of lines read.

Examples

tt <- tempfile()
cat( letters[ 1:6], sep="\n", file=tt)
the.data <- readLines.mvb( tt, EOF="d")
unlink( tt)
the.data # [1] "a" "b" "c"
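
The point of leaving the connection open is that a later read can carry on from just after the terminator. The following is only a base-R sketch of that idea, not the readLines.mvb code; read_until is a made-up helper name.

```r
# Hypothetical helper: read lines from an OPEN connection until 'EOF' is met
read_until <- function( con, EOF) {
  out <- character( 0)
  repeat {
    line <- readLines( con, n=1)
    if( !length( line) || identical( line, EOF)) break
    out <- c( out, line)
  }
  out
}

tt <- tempfile()
cat( letters[ 1:6], sep="\n", file=tt)
con <- file( tt, open="r")
first <- read_until( con, EOF="d") # stops at (and consumes) the "d" line
rest <- readLines( con) # carries on from the line after the terminator
close( con)
unlink( tt)
first # "a" "b" "c"
rest # "e" "f"
```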

Remove object(s) from maintained package

Description

Remove object(s) from maintained package. If the package is loaded, then objects are also removed from the search path version if any, the namespace if any, any importing namespaces, and any S3 method table. remove.from.package is a synonym. You will be prompted about whether to auto-save the maintained package.

Usage

rm.pkg( pkg, ..., list = NULL, save.=NA)
# remove.from.package( pkg, ..., list=NULL)
remove.from.package( ...) # really has same args as 'rm.pkg'

Arguments

pkg

(string, or environment) package name or environment, e.g. ..mypack

...

unquoted object names to remove

list

character vector alternative to ..., which is ignored if list is set

save.

For internal use— leave this alone!

Details

For now, methods are only removed from the base S3 methods table; if new S3 generics have been defined in loaded packages, and you are trying to remove a method for such a generic, then it won't be removed. I could implement this feature if anyone really wants it.

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  rm.pkg( "mypackage", foo, bar)
  rm.pkg( "mypackage", list=cq( foo, bar))
  rm.pkg( ..mypackage, list=cq( foo, bar))
} # if F

## End(Not run)

Avoid clashing package imports

Description

Suppose your package imports various other packages. Despite the pious advice about only selectively importing certain functions, it's perfectly fine and very convenient to just be able to import(otherpack) in your NAMESPACE (or the equivalent, in whatever system you use to auto-generate the NAMESPACE). One annoyance, though, is that several packages may export different things with the same name, which leads to disconcerting warnings about "replacing import..." when your package is loaded. This can be circumvented in NAMESPACE by excluding the clashing names from each import, via the except argument of the import directive.

But you sure don't want to have to work out those NAMESPACE details yourself, so this function does it for you.

Mostly, this function will be called invisibly during make.NAMESPACE during pre.install, but you could try it yourself for more manual things.

Usage

screen_masked_imports( imports, myfuns)

Arguments

imports

names of packages that yours imports (or Depends on)

myfuns

functions in your package. You don't want to import other functions with the same name, because those imports will just be overridden.

Value

List of character vectors, one element per package in imports, saying which (if any) functions exported by that importee should not be imported via your NAMESPACE.
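
To give a feel for the kind of result involved, here is a rough base-R sketch. It is not the screen_masked_imports code: clashing_exports is a made-up helper, and the rule it uses (exclude anything already defined in myfuns or exported by an earlier package in imports) is an assumption. The printed lines use R's import( pkg, except=...) NAMESPACE directive.

```r
# Hypothetical helper, NOT the mvbutils implementation: for each imported package,
# list exports that clash with 'myfuns' or with an earlier package in 'imports'
clashing_exports <- function( imports, myfuns) {
  seen <- myfuns
  out <- list()
  for( pkg in imports) {
    exports <- getNamespaceExports( pkg)
    out[[ pkg]] <- sort( intersect( exports, seen))
    seen <- union( seen, exports)
  }
  out
}

clashes <- clashing_exports( c( "stats", "utils"), myfuns=c( "median", "head"))

# Turn the result into NAMESPACE directives (printed, not written to any file)
for( pkg in names( clashes)) {
  excl <- clashes[[ pkg]]
  if( length( excl)) {
    cat( sprintf( "import( %s, except=c( %s))\n", pkg,
        paste0( '"', excl, '"', collapse=", ")))
  } else {
    cat( sprintf( "import( %s)\n", pkg))
  }
}
```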

Find functions/objects/flatdoc-documentation containing a regexp.

Description

Search one or more environments for objects that contain a regexp. Within each environment, check (i) all functions, and possibly (ii) the "doc" attributes of all functions, and possibly (iii) "scripts" and "documentation" ie character objects whose name ends with ".r" or ".R" or ".doc" (or a specified regexp).

This is a convenience function that suits the way I work, and has evolved to match that without breaking compatibility too much; in particular, the arguments doc, code.only, and scripts are not what I would now design from scratch! So it might seem or behave a bit odd(ly) for you.

Usage

search.for.regexpr( pattern, where=1, lines=FALSE,
    doc=FALSE, code.only=FALSE, scripts=TRUE, ...)

Arguments

pattern

the regexp

where

an environment, something that can be coerced to an environment (so the default corresponds to .GlobalEnv), or a list of environments or things that can be coerced to environments.

lines

if FALSE, return names of objects mentioning the regexp. If TRUE, return the actual lines containing the regexp.

doc

if FALSE, search function source code only (unless scripts=TRUE; see below). Otherwise, also search the usual flatdoc places, i.e. "doc" attributes of functions, and certain character objects. If doc==TRUE, the name of those objects must end in ".doc"; otherwise, if doc is a string (length-1 character vector), then the names of the character object must grep that string; hence, doc="[.]doc$" is equivalent to doc=TRUE.

code.only

if TRUE, search only the deparsed version of "raw" code, so ignoring e.g. comments and "flatdoc" documentation.

scripts

if TRUE, look in character objects whose name ends with ".r" or ".R", in addition to any character-mode objects controlled by doc.

...

passed to grep– e.g. "fixed", "ignore.case".

Value

A list with one element per environment searched, containing either a vector of object names that mention the regexp, or a named list of objects & the actual lines mentioning the regexp.
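
For orientation, the code-only part of the search amounts to something like the base-R sketch below. This is not the search.for.regexpr code; functions_mentioning is a made-up helper that handles a single environment and ignores "doc" attributes and scripts.

```r
# Hypothetical helper (not mvbutils code): names of functions in 'env' whose
# deparsed source matches 'pattern'; extra args go to grepl
functions_mentioning <- function( pattern, env=globalenv(), ...) {
  hits <- character( 0)
  for( nm in ls( env, all.names=TRUE)) {
    obj <- get( nm, envir=env)
    if( is.function( obj) && any( grepl( pattern, deparse( obj), ...))) {
      hits <- c( hits, nm)
    }
  }
  hits
}

e <- new.env()
e$cleanup <- function( x) paste( "author:", x)
e$other <- function( x) x + 1
e$note <- "author" # character object, so ignored by this sketch
functions_mentioning( "author", e) # "cleanup"
functions_mentioning( "AUTHOR", e, ignore.case=TRUE) # "cleanup"
```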

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  # On my own system's ROOT task (i.e. workspace--- see ?cd)
  search.for.regexpr( 'author', doc=FALSE)
  # $.GlobalEnv
  # [1] "cleanup.refs"
  # the code to function 'cleanup.refs' contains "author"
  search.for.regexpr( 'author', doc=TRUE)
  # $.GlobalEnv
  # [1] "scrunge"
  # 'scrunge' is a function with a character attribute that contains "author"
  search.for.regexpr( 'author', doc='p')
  #$.GlobalEnv
  # [1] "scrunge" "p1"      "p2"
  ## 'scrunge' again, plus two character vectors whose names contain 'p'
} # if F

## End(Not run)

Locate loaded tasks on search path.

Description

Returns the search positions of loaded tasks, with names showing the attached branch of the tree– see Examples.

Usage

search.task.trees()

Value

Increasing numeric vector with names such as "ROOT", "ROOT/top.task", "ROOT/top.task/sub.task".

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  search.task.trees() # c( ROOT=1) if you haven't used cd yet
  cd( mytask)
  search.task.trees() # c( "ROOT/mytask"=1, ROOT=2)
} # if F

## End(Not run)

Obsolete but automatic finalization for persistent objects created in C.

Description

[Almost certainly obsolete; .Call really is the way to go for newer code, complexity notwithstanding.]

Suppose you want to create persistent objects in C– i.e. objects that can be accessed from R by subsequent calls to C. The usual advice is that .C won't work safely because of uncertain disposal, and that you should use .Call and "externalptr" types instead. However, .Call etc is very complicated, and is much harder to use than .C in e.g. numerical settings. As an alternative, set.finalizer provides a safe way to ensure that your .C-created persistent object will tidy itself up when its R pointer is no longer required, just as you can with externalptr objects. There is no need for on.exit or other precautions.

Usage

# Always assign the result to a variable-- usually a temporary var inside a function...
# ... which R will destroy when the function ends. EG:
# keeper <- set.finalizer( handle, finalizer.name, PACKAGE=NULL)
set.finalizer( handle, finalizer.name, PACKAGE=NULL)

Arguments

handle

[integer vector]. Pointer to your object, of length 1 on 32-bit systems or 2 on 64-bit systems. Will have been returned by your object-creation function in C.

finalizer.name

Preferably a "native symbol" corresponding to a registered routine in a DLL; alternatively a string that names your .C-callable disposal routine. The routine must take exactly one argument, a 32-bit or 64-bit integer (the handle).

PACKAGE

[string] iff finalizer.name is character, this is a PACKAGE argument that specifies the DLL.

Details

You must assign the result to a variable, otherwise your object will be prematurely terminated!

set.finalizer provides a wrapper for R's own reg.finalizer, setting up a dummy "trigger" environment with a registered finalizer. The trigger is defined as an environment rather than the more obvious choice of an external pointer, because the latter would require me to get fancy with .Call. The role of reg.finalizer is to prime the trigger, so that when the trigger is subsequently garbage-collected, your specified .C function is called to do the finalization.

Note that finalization will only happen after all copies of keeper have been deleted. If you make a "temporary" copy in the global environment, remember to delete it! (Though presumably finalizers are de-registered if R is restarted and the keeper is reloaded, so there shouldn't be cross-session consequences.) Finalization won't necessarily happen as soon as the last copy is deleted; you can call gc() to force it.
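
The trigger mechanism can be tried out in pure R. The sketch below is not the set.finalizer code: make_keeper is a made-up stand-in, and an ordinary R function plays the role of the .C disposal routine.

```r
# Hypothetical stand-in for set.finalizer, with an R function instead of a .C routine
make_keeper <- function( handle, dispose) {
  trigger <- new.env()
  # When 'trigger' is garbage-collected, call the disposal function on the handle
  reg.finalizer( trigger, function( e) dispose( handle))
  list( handle=handle, trigger=trigger)
}

state <- new.env()
state$disposed <- integer( 0)
record_disposal <- function( h) state$disposed <- c( state$disposed, h)

use_thing <- function() {
  keeper <- make_keeper( 42L, record_disposal)
  keeper$handle + 1L # pretend to use the object
}
use_thing()
invisible( gc()) # keeper is unreachable now, so the finalizer should run here
state$disposed
```

Because keeper only ever lived inside use_thing, nothing needs cleaning up by hand; the same would hold if use_thing had stopped with an error halfway through.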

Value

A list with elements handle and trigger, the second being the environment that will trigger the call when discarded. The first is the original handle; it has storage mode integer so, as per Examples, you don't need to coerce it when subsequently passing it to .C.

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  myfun <- function( ...) {
    ...0
    # Create object, return pointer, and ensure safe disposal
    keeper <- set.finalizer( .C( "create_thing", handle=integer(2), ...1)$handle,
        "dispose_of_thing")
    "cause" + "crash" # whoops, will cause crash: but finalizer will still be called
    # "dispose_of_thing" had better be the name of a DLL routine that takes a...
    # ... single integer argument, of length 1 or 2
    # Intention was to use the object. First param of DLL routine "use_thing" should
    # be pointer to thing.
    .C( "use_thing", keeper$handle, ...2)
  }
  myfun(...)
} # if F

## End(Not run)

Hook of some kind

Description

I have forgotten what this function is for, but probably the only reason it's documented is to make sure it's exported... My advice: don't use it! (But it is used by package debug and others.)

Usage

set.presave.hook.mvb( hook, set=TRUE)

Arguments

hook

can't remember

set

boolean...

Cacheing objects for lazy-load access

Description

Manually set up existing reference objects– rarely used explicitly.

Usage

setup.mcache( envir, fpath, refs)

Arguments

envir

environment or position on the search path.

fpath

directory where "obj*.rda" files live.

refs

which objects to handle– all names in the mcache attribute of envir, by default

Details

Creates an active binding in envir for each element in refs. The active binding for an object myobj will be a function which keeps the real data in its own environment, reading and writing it as required. Writing a new value will give attr( envir, "mcache")[ "myobj"] a negative sign. This signals that the "obj*.rda" file needs updating, and the next Save (or move or cd) command will do so. [The "*" is the absolute value of attr( envir, "mcache")[ "myobj"].] One wrinkle is that the "real data" is initially a promise created by delayedAssign, which will fetch the data from disk the first time it is needed.
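
The mechanism can be imitated in a few lines of base R. This is only a sketch under assumptions, not the setup.mcache code: make_cached_binding is a made-up name, a logical "dirty" flag stands in for the sign of the mcache entry, and a plain function stands in for loading an "obj*.rda" file.

```r
# Hypothetical sketch: lazily loaded, write-tracking binding (not mvbutils code)
make_cached_binding <- function( name, envir, loader) {
  store <- new.env()
  store$dirty <- FALSE
  # Promise: 'loader' only runs the first time 'value' is actually needed
  delayedAssign( "value", loader(), eval.env=environment(), assign.env=store)
  makeActiveBinding( name, function( x) {
    if( missing( x)) {
      store$value # read: forces the promise the first time
    } else {
      store$value <- x # write: keep new data and flag it for saving
      store$dirty <- TRUE
      invisible( x)
    }
  }, envir)
  invisible( store)
}

counter <- new.env()
counter$loads <- 0
slow_loader <- function() {
  counter$loads <- counter$loads + 1 # stands in for reading a file from disk
  1:10
}

e <- new.env()
store <- make_cached_binding( "myobj", e, slow_loader)
counter$loads # 0: nothing fetched yet
sum( e$myobj) # 55; triggers the one and only load
sum( e$myobj) # 55 again, no reload
counter$loads # 1
e$myobj <- 1:3
store$dirty # TRUE, i.e. needs saving
```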

Author(s)

Mark Bravington

Generalized version of find

Description

Looks for objects that regex-match pattern, in all attached workspaces (as per search()) and any maintained packages (see maintain.packages).

Usage

sleuth(pattern, ...)

Arguments

pattern

regex

...

other args to grep, e.g. perl=TRUE or ignore.case=TRUE

Value

A list of environments containing one or more matching objects, with the object names returned as a character vector within each list element.
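
A bare-bones version of the attached-workspaces part looks roughly like this. It is a sketch, not the sleuth code: find_on_search_path is a made-up name, and maintained packages are not searched.

```r
# Hypothetical sketch: regex search over everything on the search path
find_on_search_path <- function( pattern, ...) {
  where <- search()
  hits <- lapply( where, function( nm, ...)
      grep( pattern, ls( nm, all.names=TRUE), value=TRUE, ...), ...)
  names( hits) <- where
  hits[ lengths( hits) > 0] # drop environments with no matches
}

find_on_search_path( "^rm$")$`package:base` # "rm"
```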

Examples

sleuth( '^rm')
# On my setup, that currently gives:
#$ROOT
#[1] "rmsrc"
#
#$`package:stats`
#[1] "rmultinom"
#
#$`package:base`
#[1] "rm"
#
#$mvbutils
#[1] "rm.pkg"
#
#$handy2
#[1] "rmultinom"
#

Read R code and data from a file or connection

Description

source.mvb is probably obsolete as of mvbutils 2.11.0; see docattr. Anyway, it works like source(local=TRUE), except you can intersperse free-format data into your code. current.source returns the connection that's currently being read by source.mvb, so you can redirect input accordingly. To do this conveniently inside read.table, you can use from.here to read the next lines as data rather than R code.

Usage

source.mvb( con, envir=parent.frame(), max.n.expr=Inf,
  echo=getOption( 'verbose'), print.eval=echo,
  prompt.echo=getOption( 'prompt'), continue.echo=getOption( 'continue'))
current.source()
from.here( EOF=as.character(NA)) # Don't use it like this!
# Use "from.here" only inside "read.table", like so:
# read.table( file=from.here( EOF=), ...)

Arguments

con

a filename or connection

envir

an environment to evaluate the code in; by default, the environment of the caller of source.mvb

max.n.expr

finish after evaluating max.n.expr complete expressions, unless file ends first.

EOF

line which terminates data block; lines afterwards will again be treated as R statements.

...

other args to read.table

echo, print.eval, prompt.echo, continue.echo

as per source

Details

Calls to source.mvb can be nested, because the function maintains a stack of connections currently being read by source.mvb. The stack is stored in the list source.list in the mvb.session.info environment, on the search path. current.source returns the last (most recent) entry of source.list.

The sequence of operations differs from vanilla source, which parses the entire file and then executes each expression in turn; that's why it can't cope with interspersed data. Instead, source.mvb parses one statement, then executes it, then parses the next, then executes that, etc. Thus, if you include in your file a call to e.g.

text.line <- readLines( con=current.source(), n=1)

then the next line in the file will be read in to text.line, and execution will continue at the following line. readLines.mvb can be used to read text whose length is not known in advance, until a terminating string is encountered; lines after the terminator, if any, will again be evaluated as R expressions by source.mvb.
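
To see why parse-then-execute matters, here is a toy base-R version of the loop. It is only a sketch under assumptions, not the source.mvb code (which relies on pushBack): source_stepwise is a made-up name, the variable src stands in for current.source(), and genuine syntax errors are silently swallowed.

```r
# Hypothetical sketch, not the source.mvb implementation: read a line, try to
# parse what has accumulated, and evaluate as soon as it forms complete statement(s)
source_stepwise <- function( con, envir=parent.frame()) {
  buf <- character( 0)
  repeat {
    line <- readLines( con, n=1)
    if( !length( line)) break # end of file
    buf <- c( buf, line)
    exprs <- tryCatch( parse( text=buf), error=function( e) NULL)
    if( is.null( exprs)) next # assume the statement is just incomplete so far
    for( ex in exprs) eval( ex, envir) # may itself read more lines from 'con'
    buf <- character( 0)
  }
  invisible( NULL)
}

tt <- tempfile()
cat( "text.line <- readLines( src, n=1)",
    "this line is data, not R code",
    "total <- sum( 1,",
    "    2)",
    file=tt, sep="\n")
src <- file( tt, open="r") # 'src' plays the role of current.source()
source_stepwise( src, envir=environment())
close( src)
unlink( tt)
text.line # "this line is data, not R code"
total # 3
```

Vanilla source would fail on that file, because it tries to parse the data line as R code before executing anything.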

After max.n.expr statements (i.e. syntactically complete R expressions) have been executed, source.mvb will return.

If the connection was open when source.mvb is called, it is left open; otherwise, it is closed.

If you want to use read.table or scan etc. inside a source.mvb file, to read either a known number of lines or the rest of the file as data, you can use e.g. read.table( current.source(), ...).

If you want read.table to read an unknown number of lines until a terminator, you could explicitly use readLines.mvb, as shown in the demo "source.mvb.demo.R". However, the process is cumbersome because you have to explicitly open and close a textConnection. Instead, you can just use read.table( from.here( EOF=...), ...) with a non-default EOF, as in Usage and the same demo (but see Note). from.here shouldn't be used inside scan, however, because a temporary file will be left over.

current.source() can also be used inside a source file, to work out the source file's name. Of course, this will only work if the file is being handled by source.mvb rather than source.

If you type source.list at the R command prompt, you should always see an empty list, because all source.mvb calls should have finished. However, the source list can occasionally become corrupt, i.e. containing invalid connections (I have only had this happen when debugging source.mvb and quitting before the exit code can clean up). If so, you'll get an error message on typing source.list (?an R bug?). Normally this won't matter at all. If it bothers you, try source.list <<- list().

Value

source.mvb returns the value of the last expression executed, but is mainly called for its side-effects of evaluating the code. from.here returns a connection, of class c( "selfdeleting.file", "file", "connection"); see Details. current.source returns a connection.

Limitations

Because source.mvb relies on pushBack, con=stdin() won't work.

Note

from.here creates a temporary file, which should be automatically deleted when read.table finishes (with or without an error). Technically, the connection returned by from.here is of class selfdeleting.file inheriting from file; this class has a specific close method, which unlinks the description field of the connection. This trick works inside read.table, which calls close explicitly, but not in scan or closeAllConnections, which ignore the selfdeleting.file class.

from.here() without an explicit terminator is equivalent to readLines( current.source()), and the latter avoids temporary files.

Examples

# You wouldn't normally do it like this:
tt <- tempfile()
cat( "data <- scan( current.source(), what=list( x=0, y=0))",
"27 3",
"35 5",
file=tt, sep="\n")
source.mvb( tt)
unlink( tt)
data # list( x=c( 27, 35), y=c(3, 5))
# "current.source", useful for hacking:
tt <- tempfile()
cat( "cat( \"This code is being read from file\",",
"summary( current.source())$description)", file=tt)
source.mvb( tt)
cat( "\nTo prove the point:\n")
cat( scan( tt, what="", sep="\n"), sep="\n")
unlink( tt)

Exclude "missing" objects

Description

To be called inside a function, with a character vector of object names in that function's frame. strip.missing will return all names except those corresponding to formal arguments which were not set in the original call and which lack defaults. The output can safely be passed to get.

Usage

strip.missing( obs)

Arguments

obs

character vector of object names, often from ls(all=TRUE)

Details

Formal arguments that were not passed explicitly, but which do have defaults, will not be treated as missing; instead, they will be set equal to their evaluated defaults. This could cause problems if the defaults aren't meant to be evaluated.
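
The behaviour with defaults can be mimicked in base R, as in the sketch below. This is not the strip.missing code; names_with_values is a made-up helper that simply asks whether get succeeds, which (as above) evaluates any defaults.

```r
# Hypothetical sketch, not the strip.missing implementation: keep only names that
# 'get' can retrieve; a missing argument with no default makes 'get' fail
names_with_values <- function( obs, envir=parent.frame()) {
  force( envir)
  ok <- vapply( obs, function( nm)
      !inherits( try( get( nm, envir=envir), silent=TRUE), "try-error"),
      logical( 1))
  obs[ ok]
}

demo_fun <- function( first, second, third=3) {
  a <- 9
  obs <- ls()
  names_with_values( obs)
}
demo_fun( 1) # "a" "first" "third"; 'second' is dropped, 'third' gets its default
```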

Author(s)

Mark Bravington

Examples

funco <- function( first, second, third) {
  a <- 9
  return( do.call("returnList", lapply( strip.missing( ls()), as.name)))
}
funco( 1) # list( a=9, first=1)
funco( second=2) # list( a=9, second=2)
funco( ,,3) # list( a=9, third=3)
funco2 <- function( first=999) {
  a <- 9
  return( do.call("returnList", lapply( strip.missing( ls()), as.name)))
}
funco2() # list( a=9, first=999) even tho' "first" was not set

Organizing R workspaces

Description

Returns file path to current task, or to a file in that task.

Usage

# Often: task.home()
task.home(fname)

Arguments

fname

file name, a character(1)

Details

Without any arguments, task.home returns the path of the current task. With a filename argument, the filename is interpreted as relative to the current task, and its full (non-relative) path is returned.

task.home is almost obsolete in R, since the working directory tracks the current task. It is more important in the S+ version of mvbutils.

Author(s)

Mark Bravington

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  task.home( "myfile.c") # probably the same as file.path( getwd(), "myfile.c")
  task.home() # probably the same as getwd()
} # if F

## End(Not run)

Convert existing source package into task package

Description

Converts an existing source package into a task package. A subdirectory with the package name will be created under the current working directory, and will be populated with a ".RData" file and various other files/directories from the source package. All Rd files will be turned into flat-format help in the ".RData", either attached to functions or as stand-alone "*.doc" text objects, as per help2flatdoc. The subdirectory will also be made into a task, i.e. it will be added to the "tasks" vector in the current workspace that cd uses to keep track of the task hierarchy.

Roxygen pre-function comments are by default removed; see keep_Roxygen argument.

Usage

unpackage(spath, force = FALSE, alias=x,
    drop_Conly=TRUE, trust_importFrom=FALSE,
    zap_Cloaders= TRUE, keep_Roxygen= FALSE)

Arguments

spath

where to find the source package

force

if TRUE, overwrite any previous contents of task package without prompting.

alias

if you don't want the package to go into a folder (and task) with its obvious name, then set eg alias="temp_mypack"; useful for comparing versions.

drop_Conly

?exclude pure R wrappers of dot-Call/dot-External? Normally (I presume) those are auto-generated, and will be regenerated by pre.install in the "Cloaders" business, so this should be OK.

trust_importFrom

Whether to force-copy all importFrom() NAMESPACE directives into the maintained package. Used to be TRUE by default, but I think FALSE is better...

zap_Cloaders

whether to remove run_Cloaders_<mypack>, if it's present; though it shouldn't really be, I think, cos it seems to prevent installation. (It's regenerated by pre.install automatically.) IOW, run_Cloaders_<mypack> should be removed before a git push unless you are using mvbutils package building system.

keep_Roxygen

I thoroughly dislike Roxygen, and my pref is to generate the help for each function— stored in its docattr attribute— directly from the dot-Rd file. If anyone ever really wants Roxygen, they can try Rd2Roxygen (beware the bugs...). So my default is FALSE, but if you set TRUE then unpackage will make a big effort to store the original Roxybollocks.

Details

The NAMESPACE file won't be copied; instead, it will be auto-generated by pre.install. Therefore, some features of the original NAMESPACE may be lost. You can either copy the NAMESPACE manually (in which case, you'll need to maintain it by hand), or write a "pre.install.hook.MYPACK" function.

The DESCRIPTION file becomes a character vector called e.g. <mypack>.DESCRIPTION, of class "cat" (see as.cat).

Functions in the ".RData" may be saved with extra attributes, in particular "doc" (deduced from a dot-Rd file) but perhaps other things too that they acquire after the package code is sourced. Attributes that are character vectors will acquire class "docattr", so that they won't be fully displayed during default printing of the function; to see them, use e.g. as.cat( attr( myfun, "myatt")) or unclass( attr( myfun, "myatt")) or, if you are using the atease package, just as.cat( myfun@myatt). Editing the function with fixr will display the character attributes in full.

Any environment objects found in the package's environment (its namespace environment) will be dropped from the ".RData" file, with a warning; this is to avoid dramas on reloading.

Build vignette(s) for mvbutils-style package

Description

Vignette-building is insanely complicated (though this might be hidden from you) and can be slow. So it's not handled directly by pre.install, patch.install, and friends. Doing build.pkg and install.pkg will work normally, but if you want to change a vignette in an installed package without complete re-installation, then you have to manually (re)build vignette(s) and indices. vignette.pkg should do that for you.

The bare-bones usage (which is not the recommended way— see next para) is to copy all files (and folders) from the task's "vignettes" folder into the source package's "vignettes" folder (after zapping the latter). If build=TRUE (the default) it will then build the vignettes in the installed package (that's just how R does it, for whatever reason).

Also (recommended!), there can be an intermediate level of vignette, where all the calculations/plots are already done and saved, and the precompiled vignette is just ready to be turned into HTML and/or PDF, something which should be fairly quick for whatever machine ends up doing it (eg an R-universe or CRAN server). If you give your original vignette files the extension ".Rmd.orig", then an R script "precomp.R" will be created by pre.install in the task package vignette. It is a very simple script that mainly just shows the knitr command to use. vignette.pkg(...,precompile=TRUE) will then run that script to precompile all the vignettes (which can be slow, of course) in the task package "vignettes" folder, producing ".Rmd" files that are precompiled, along with figure files etc in subfolders. If you are precompiling, you can use decache=TRUE to clear the cache automatically; see Arguments.

Precompilation happens only if precompile=TRUE. Copying the "vignettes" folder always happens, unless precompile=TRUE and precompilation fails, in which case the function aborts. After copying, building happens unless build=FALSE. Index reconstruction happens only if some building has taken place.

To do

I like PDFs. It would be great to produce two versions, one in PDF so you can peruse it while not glued to your goddamn screen. But R has no mechanism to formally xtuple rendered versions of a vignette (you get N=1), and suggestions to the contrary seem to elicit the usual CRANiac bleating. TBF the PDF produced by putting the following in place of the "rmarkdown::html_vignette" line doesn't look very pretty anyway, but no doubt it could be tweaked with lashings of options:

  pdf_document:
    latex_engine: pdflatex
    pandoc_args: [ --listings ]

The closest approach is something like this, suggested by Peter Harrison in 2018, to make a special-purpose knit function that produces all outputs:

  title: "multiple outputs"
  output:
    pdf_document: <blah options>
    rmarkdown::html_vignette: <blah options>
  knit: (function(inputFile, encoding) {
    rmarkdown::render(inputFile, encoding = encoding,
      output_dir = "output", output_format = "all") })

which does at some stage produce both files, but it looks like R tries to delete anything except the PDF. Genius. Also, there's a dot-log file that creeps into the source package somehow.

One manualesque version (on Windows) might be to print the HTML to a PDF using a browser from CLI. This from

  https://superuser.com/questions/1537277/how-can-i-print-html-file-to-pdf-from-command-line-in-windows-10-without-admin
  "C:\Program Files\Microsoft\Edge\Application\msedge.exe" --headless --print-to-pdf="c:\outdir\out.pdf" "c:\indir\in.html"

Usage

vignette.pkg( pkg, pattern= "[.]Rmd$",
  character.only= FALSE, precompile= FALSE,
  build= TRUE, decache= FALSE,
  ...)

Arguments

pkg

Name of package; see pre.install for options

pattern

Regex to select vignette files (only if build is TRUE). Default matches anything ending ".Rmd". You can specify a complete filename to only do that one vignette.

character.only

for automated use; see pre.install

precompile

?run the "precomp.R" script in the source package vignettes folder?

build

?should the vignettes be rebuilt?

decache

if TRUE and if precompile is TRUE, then the envar "VIGDECACHE_<PKG>" will be set to "yes" when knit is called (note that the package name will be capitalized). You can add some code to your vignette to clear the cache iff that envar exists, as in Examples.

...

passed to tools::buildVignette (qv)

Value

A character vector of all files that were built. If there are errors during the build process, you should see on-screen messages.

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  vignette.pkg( kinference)
  # Code you can maybe add at the start of your vignette, to optionally clear the cache as
  # per 'decache' arg
  knitr::opts_chunk$set( cache.rebuild=nzchar( Sys.getenv( 'VIGDECACHE_<PKG>')))
} # if F

## End(Not run)
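
The decache handshake is just an environment variable, so it can be rehearsed without building anything. In this sketch, "mypack" is a made-up package name, and taking "capitalized" to mean fully upper-cased is an assumption.

```r
# Sketch of the environment-variable handshake; 'mypack' is a made-up package name
pkg <- "mypack"
envar <- paste0( "VIGDECACHE_", toupper( pkg)) # assumed form: "VIGDECACHE_MYPACK"
Sys.unsetenv( envar) # start from a clean slate

# What the vignette's setup chunk would see without, and then with, decache
before <- nzchar( Sys.getenv( envar))
do.call( Sys.setenv, structure( list( "yes"), names=envar))
during <- nzchar( Sys.getenv( envar))
Sys.unsetenv( envar)
c( before=before, during=during) # FALSE TRUE
```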

Make a function autoprint all its doings

Description

Occasionally you want a function to do a whole bunch of things, and print the results as it goes along; you might be thinking about executing its code directly in a "script", but you don't want to be cluttering up the workspace. If so, you can wrap the body of your function in a call to visify, and R will act as if you source the corresponding script. The output isn't particularly beautiful, but it's jolly handy for eg routine diagnostics when fitting a series of models.

You can also use visify inside a function, to just display certain bits. (Well, not entirely; as of 2.9.18, any code before visify always seems to be shown. But at least you can hide the return value.) For example, it's sometimes useful to not visify the entire return-value of a function, even though you want to "show the rest of your working". In that case, you can just not return the value within the visify block, but separately afterwards; see Examples.

visify is experimental as of package mvbutils v2.9.228, and I might add features over time, eg to make output look prettier and give better options for hiding things. At present, it deliberately strips all continuation lines of input, so you only see the first line of each statement (a blessing IME so far). More could be done; but I don't yet understand how all the options to source and withAutoprint work, and this is really a convenience function rather than something intended for producing report-quality output.

To do

As of package mvbutils v 2.9.23 and package debug v 1.4.18, the latter does not currently handle visify nicely (in contrast to eg evalq, which is operationally very similar except WRT output). What you have to do for now, is manually replace the call to visify with a call to evalq (and remove any extraneous arguments).

This kinda thing comes up often. I really need a general mechanism in the debug package, for any other package (or the user) to supply an mtrace-able version of their pet function.

Usage

# Never use it like this...
visify(exprs, local = parent.frame(), prompt.echo='', ...)
# ... always like this, for an entire function:
# my_autoprinting_function <- function(<args>) visify( {<body>})
# ... or just as part of one:
# my_part_apf <- function( <args>){ visify({<shown>}); <posthoc-and-returns>}

Arguments

exprs

The body of your function

local

Normally leave this alone; it's the environment to run exprs in

prompt.echo

what to print at the start of each line. Default is nothing.

...

other args to withAutoprint, such as max.deparse.length or width.cutoff.

Details

Compound statements, such as if and for, are not printed "internally", only the final outcome (which is NULL for for). The first line of the compound code is still printed, though.

If you want certain statements in your function to execute without autoprinting their output (eg because it is an enormous and cluttery intermediate calculation), wrap it or them in curlies, a la { <dont; show; these; outputs>; NULL}. Again, the first line of code will be printed regardless, so you could just make it into a "Hide me!" comment, as per Examples.

The trick behind visify is to use withAutoprint, but it's not obvious exactly how to do so. I was encouraged by:

https://stackoverflow.com/questions/58285497

However, I did not use exactly the solutions there, because I wanted a slightly different "flow".
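For orientation, here is a bare-bones base-R sketch of the underlying idea: run a braced block through withAutoprint in the function's own frame, then hand back the value. It is not the code of visify (which, as noted, uses a different "flow" and strips continuation lines); show_working is a made-up name for illustration only.

```r
# Minimal sketch, assuming only base R (withAutoprint needs R >= 3.4).
# 'show_working' is a hypothetical name, not part of mvbutils.
show_working <- function(xx) {
  res <- withAutoprint({
    yy <- xx + 1
    zz <- yy * 2
    zz - xx
  }, local = environment())  # evaluate in this function's frame, not the workspace
  # withAutoprint passes back what source() returns:
  # a list holding the last value and its visibility
  res$value
}

out <- show_working(3)  # echoes each statement and autoprints results as it goes
```

Because the block runs in the function's own environment, yy and zz never reach the workspace, which is the whole point of not using a script.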

Examples

# Basic: show it all
tv1 <- function( xx) visify({
  yy <- xx + 1
  # Comments show up, too...
  for( i in 1:5) yy <- yy + 1
  # ... but loops only show end result; ditto ifs
  # and other compound statements
  xx <- xx+1
  xx <- xx + 2
  xx*yy
})
tv1( 3)
# Note the use of max.deparse.length param. Also try width.cutoff
tv2 <- function( xx, MDL=Inf) visify( max.deparse.length=MDL, {
  yy <- xx + 1
  # Comments show up, too...
  for( i in 1:5) yy <- yy + 1
  # Don't want to show gory details of next "block"
  { # HIDE ME!
    xx <- xx+1
    xx <- xx + 2
    NULL # that's all you'll see
  }
  xx*yy
})
tv2( 3)
tv2( 3, MDL=100)
tv2( 3, MDL=50) # too terse
# Hide boring stuff
tv2 <- function( xx){
  # I don't understand why this bit _before_ visify() is shown
  yy <- xx + 1
  for( i in 1:5) yy <- yy + 1
  visify({
    xx <- xx+1
    xx <- xx + 2
    NULL # that's all you'll see
  })
  # At least the return-value isn't!
  invisible( rep( xx*yy, 9999)) # don't wanna show this!
}
tv2( 3)

Extract subset and warn about omitted cases

Description

Extract row-subset of a data.frame according to a condition. If any cases (rows) are omitted, they are listed with a warning. Rows where the condition gives NA are omitted.

Usage

# This is the obligatory format, and is not very useful; look at EXAMPLES instead
warn.and.subset(x, cond,
    mess.head=deparse( substitute( x), width.cutoff=20, control=NULL, nlines=1),
    mess.cond=deparse( substitute( cond), width.cutoff=40, control=NULL, nlines=1),
    row.info=rownames( x), sub=TRUE)

Arguments

x

data.frame

cond

expression to evaluate in the context of data.frame. If sub=TRUE (the default), this will be substituted. If sub=FALSE, you can use a pre-assigned expression; in that case, you had better set mess.cond manually.

mess.head

description of data.frame (e.g. its name) for use in a warning.

mess.cond

description of the desired condition for use in a warning.

row.info

character vector that will describe rows; omitted elements appear in the warning

sub

should cond be treated as a literal expression to be evaluated, or as a pre-computed logical index?

# ...: just there to keep R CMD check happy; for heaven's sake...

Value

The subsetted data.frame.

Examples

df <- data.frame( a=1:3, b=letters[1:3])
df1 <- warn.and.subset( df, a %% 2 == 1, 'Boring example data.frame', 'even-valued "a"')
condo <- quote( a %% 2 == 1)
df2 <- warn.and.subset( df, condo, 'Same boring data.frame', deparse( condo), sub=FALSE)
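To make the NA rule from the Description concrete, here is a base-R sketch of the same idea. It is not the code of warn.and.subset; subset_with_warning and dat are hypothetical names, and the wording of the warning is made up.

```r
# Sketch only: keep rows where the condition is TRUE, treat NA as "omit",
# and warn about whatever got dropped. Not the mvbutils implementation.
subset_with_warning <- function(x, keep, row.info = rownames(x)) {
  keep <- keep & !is.na(keep)          # NA in the condition counts as FALSE
  if (!all(keep)) {
    warning("Omitting rows: ", paste(row.info[!keep], collapse = ", "))
  }
  x[keep, , drop = FALSE]
}

dat <- data.frame(a = c(1, 2, NA), b = letters[1:3])

# A pre-assigned (quoted) condition, evaluated in the context of the data.frame
condo <- quote(a %% 2 == 1)
keep <- eval(condo, dat)

kept <- suppressWarnings(subset_with_warning(dat, keep))
```

Evaluating a quoted condition with eval( condo, dat) is the manual equivalent of passing a pre-assigned expression with sub=FALSE.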

Sourceable code for functions (and more) with flat-format documentation

Description

Works like write for functions without flat documentation (i.e. without a "doc" attribute). If a "doc" attribute exists, the file is written in a form allowing it to be edited and then read back in with "source.mvb"; the "doc" attribute is given as free-form text following the function definition. If applied to a non-function with a "source" attribute, just the source attribute is printed; the idea is that this could be read back by source (or source.mvb), probably in the course of FF after fixr, to regenerate the non-function object.

Usage

write.sourceable.function( x, con, append=FALSE, print.name=FALSE,
    doc.special=TRUE, xn=NULL)

Arguments

x

function or other object, or the name thereof, that is to be written. If x is not a function, then it must have an attribute "source".

con

a connection or filename

append

if "con" is not already open, should it be appended to rather than overwritten?

print.name

should output start with "NAME" <- (where NAME is deduced from x)?

doc.special

TRUE if doc attribute is to be printed as flat doc; assumes readback via source.mvb

xn

(string) can set this to be the name of the function if print.name is TRUE

Details

If x is unquoted and print.name=TRUE, the name is obtained from deparse( substitute( x)). If x is a character string, the name is x itself and the function printed is get(x).

The real criterion for an attribute to be output in flatdoc-style, is not whether the attribute is called doc, but rather whether it is a character-mode object of class docattr. You can use this to force flatdoc-style output of several doc-like attributes.

The default EOF line for an attribute is <<end of doc>>, but this will be adjusted if it appears in the attribute itself.
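The name-deduction rule at the start of Details can be illustrated in base R. This is only a sketch of the deparse( substitute( x)) idiom, not the package's own code, and name_of is a hypothetical helper.

```r
# Sketch of the naming rule: an unquoted argument is named by what the
# caller typed; a character string is taken as the name itself.
name_of <- function(x) {
  expr <- substitute(x)               # the unevaluated argument
  if (is.character(x)) x else deparse(expr)
}

n1 <- name_of(mean)     # unquoted: name comes from the call
n2 <- name_of("mean")   # string: name is the string; the object would be get("mean")
```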

Examples

## Not run: 
if( FALSE && is_very_annoying( CRAN)){ # otherwise CMD CHECK --as-cran tries to run this :/
  write.sourceable.function( write.sourceable.function, "wsf.r")
  # To dump all functions and their documentation in a workspace into a single sourceable file:
  cat( "", file="allfuns.r")
  sapply( find.funs(), write.sourceable.function, con="allfuns.r", append=TRUE, print.name=TRUE)
  # A non-function
  scrunge <- c( 1:7, 11)
  attr( scrunge, "source") <- c( "# Another way:", "c( 1:4, c( 5:7, 11))")
  scrunge # [1] 1 2 3 4 5 6 7 11
  write.sourceable.function( scrunge, stdout()) # source
  fixr( scrunge) # source
} # if F

## End(Not run)

Sourceable text for functions, including character attributes

Description

write_sourceable_function works like write, to produce a source()-friendly printout of a function. However, for the sake of clarity, any suitable character-vector attribute is printed as a multi-line raw string (see Quotes) wrapped in a call to docattr or to string2charvec, which will turn the string back into a character vector when the file is read back in by source. This hides a lot of ugliness, including escaped special characters and superfluous quotes. Character objects that you want attached to the function (but not inside its code) actually look like the real thing! (They can be accessed by the function code since they live inside environment(sys.function()).) My own main use is a doc attribute for free-text documentation (which later gets turned into "Rd" format by doc2Rd when I produce my packages, but that's a detail). However, I quite often keep other text snippets too, eg "templates".

Raw strings didn't use to exist in R, so before version 2.10 of mvbutils, the alternative version write.sourceable.function (note the dots) instead relied on the contortions of flatdoc and source.mvb and readLines.mvb to trick R into accepting unmodified text. None of that should be necessary now.

Obsolete: if write_sourceable_function is applied to a non-function with a "source" attribute, then just the source attribute is printed; the idea is that this could be read back by source, probably in the course of FF after fixr, to regenerate the non-function object. I don't think it's wise to rely on this....

Helpers

string2charvec, docattr, simplest_name_generator, and cat_strings_rawly are helper functions that you're unlikely to use yourself. For the record, though:

string2charvec turns a string (length-1 non-empty character vector with no attributes) into a character vector, with a new element for every newline. The first element is discarded, because it's usually just a linebreak (perhaps preceded by accidental spaces etc) inserted to let the "real" raw string start on a fresh line. string2charvec is called by docattr which facilitates keeping plain-text documentation directly with the function, as an attribute.
docattr is very similar, but adds an S3 class "docattr". It simplifies the code produced by write_sourceable_function for presenting the plain-text documentation. I don't recommend using docattr for anything except an attribute called "doc" that contains, yes, documenbloodytation.
simplest_name_generator prints an R symbol (a "name") in a way that could appear on the LHS of <symbol> <- 0. If the name is simple, with no funny characters in it, then it's not quoted and is left alone. If it contains mildly strange characters that would cause the unquoted version to not parse, then it's quoted. If it contains characters that would break simple quotes (for example, quotes or backticks!) then it's wrapped in a bullet-proof raw string. "Only the paranoid survive"...
cat_strings_rawly outputs (using cat) a character vector as a single raw string wrapped in a call to docattr (if its argument has class "docattr") or otherwise string2charvec. Thus, source will break up the raw string back into a separate element for each newline. (cat_strings_rawly is probably a bad name for this function, since it actually takes a character vector as input, not a string...). It calls cat directly, so you already need to have directed output to wherever you want, eg via sink.
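The raw-string round trip described for string2charvec can be sketched in base R (raw strings need R >= 4.0). This is my reading of the description above, not the package's actual code; split_raw_string is a hypothetical stand-in.

```r
# Sketch: split a single string at newlines and drop the first element,
# which is just the line break that follows the opening delimiter.
split_raw_string <- function(string) {
  stopifnot(is.character(string), length(string) == 1)
  strsplit(string, "\n", fixed = TRUE)[[1]][-1]
}

doc <- split_raw_string(r"-{
first line
second "quoted" line with a \ backslash
}-")
```

Inside the raw string, the quotes and the backslash need no escaping, which is why the text in the file looks like the real thing.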

Limitations

Some exotic language elements simply cannot be represented in sourceable text: for example, a "hard-coded" environment. A file will still be produced, but it won't work with source. There's no solution to such cases. For example:

  f <- function( e=.GlobalEnv) environmentName( e)
  formals( f)$e <- new.env()
  tf <- tempfile()
  write_sourceable_function( f, tf)
  source( tf)
  # ... complains about e = <environment>

Usage

write_sourceable_function( x, con, append=FALSE,
    print.name=FALSE, xn=NULL, prefix_package=TRUE, ...)
string2charvec( string)
simplest_name_generator( x)
cat_strings_rawly( x, prefix_package=TRUE)

Arguments

x

function or other object, or the name thereof, that is to be written by write_sourceable_function. If x is not a function, then it must have an attribute "source". For the helper functions, x is either a string itself (length-1 character vector), or for cat_strings_rawly a character vector.

con

a connection or filename

append

if "con" is not already open, should it be appended to rather than overwritten?

print.name

should output start with NAME <-, where NAME is deduced from x? Note that NAME will be processed by simplest_name_generator to make sure everything goes thru source nicely.

xn

(string) can set this to be the name of the function if print.name is TRUE

...

ignored, but allows calls that use old write.sourceable.function arguments

string

a string (length-1 character vector), presumably a "raw string" though R doesn't care.

prefix_package

Whether to prefix the call to docattr or string2charvec with mvbutils::. Should always be TRUE except when producing the R source code for mvbutils itself with KeepPlaintextDoco=YES in the "DESCRIPTION" file, since in that case those two functions won't be available as exports when the R source file is sourced.

Details

If x is unquoted and print.name=TRUE, the name is obtained from deparse( substitute( x)). If x is a character string, the name is x itself and the function printed is get(x).

The criteria for deciding whether to raw-string-ify an attribute are:

it must be mode character
it must have length>1 (otherwise there's little point)
it must not have any attributes, except perhaps an S3 class (e.g. no names, no dim)
it must not contain newline characters (since they would be confused with newlines inserted between elements).

Iff the attribute has S3 class "docattr", then cat_strings_rawly will wrap it in a call to mvbutils::docattr (which will mean it doesn't get fully printed out at the console); otherwise, it will be wrapped in a call to mvbutils::string2charvec.
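The four criteria listed above can be written as a single predicate. This is a sketch of the stated rules only, assuming nothing about how mvbutils checks them internally; is_rawstring_candidate is a hypothetical name.

```r
# Sketch of the stated criteria for raw-string-ifying an attribute.
is_rawstring_candidate <- function(a) {
  is.character(a) &&                              # mode character
    length(a) > 1 &&                              # length > 1
    all(names(attributes(a)) %in% "class") &&     # no attributes except perhaps a class
    !any(grepl("\n", a, fixed = TRUE))            # no embedded newlines
}

ok_plain   <- is_rawstring_candidate(c("line one", "line two"))
ok_classed <- is_rawstring_candidate(structure(c("a", "b"), class = "docattr"))
bad_short  <- is_rawstring_candidate("just one element")
bad_named  <- is_rawstring_candidate(c(x = "a", y = "b"))
bad_nl     <- is_rawstring_candidate(c("a\nb", "c"))
```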

Examples

# This is from the examples for 'flatdoc'. It's there to illustrate plain-text documentation,
# but you can see the call to 'docattr' in the middle.
flubbo <- structure( function( x){
  ## A comment
  x+1
}
,doc=docattr( r"-{
flubbo       not-yet-in-a-package
'flubbo' is a function! And here is some informal doco for it. Whoop-de-doo!
Thanks to raw strings, which are wonderful (see 'Quotes' for extreeeeemly brief doco):
Look at 'flatdoc' examples for more on raw strings.
}-"))
## Don't run
# Infuriating CRAN check (ie that Don't run is ignored--- WTF?!) means I have to wrap this:
if( FALSE && is_very_annoying( "CRAN")){
  write_sourceable_function( write_sourceable_function, "wsf.r")
  # To dump all functions and their documentation in a workspace into a single sourceable file:
  cat( "", file="allfuns.r")
  sapply( find.funs(), write_sourceable_function,
      con="allfuns.r", append=TRUE, print.name=TRUE)
  # A non-function. Probably don't do this!
    scrunge <- c( 1:7, 11)
    attr( scrunge, "source") <- c( "# Another way:", "c( 1:4, c( 5:7, 11))")
    scrunge # [1] 1 2 3 4 5 6 7 11
    write_sourceable_function( scrunge, stdout()) # source
    fixr( scrunge) # source
} # if F
## End don't run
