Title: | Interface to the Google Cloud Machine Learning Platform |
Version: | 0.7.1 |
Description: | Interface to the Google Cloud Machine Learning Platform https://cloud.google.com/vertex-ai, which provides cloud tools for training machine learning models. |
Depends: | R (≥ 3.3.0), tfruns (≥ 1.3) |
Imports: | config, jsonlite, packrat, processx, rprojroot, rstudioapi, tools, utils, withr, yaml |
Suggests: | tensorflow (≥ 1.4.2), keras (≥ 2.1.2), knitr, rmarkdown, testthat |
License: | Apache License 2.0 |
SystemRequirements: | Python (>= 2.7.0) |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
VignetteBuilder: | knitr |
URL: | https://github.com/rstudio/cloudml |
BugReports: | https://github.com/rstudio/cloudml/issues |
NeedsCompilation: | no |
Packaged: | 2025-08-18 22:43:04 UTC; tomasz |
Author: | Tomasz Kalinowski [cre], Daniel Falbel [aut], Javier Luraschi [aut], JJ Allaire [aut], Kevin Ushey [aut], RStudio [cph] |
Maintainer: | Tomasz Kalinowski <tomasz@posit.co> |
Repository: | CRAN |
Date/Publication: | 2025-08-18 23:50:51 UTC |
Interface to the Google Cloud Machine Learning Platform
Description
The cloudml package provides an R interface to Google Cloud Machine Learning Engine, a managed service that enables:
Scalable training of models built with the keras, tfestimators, and tensorflow R packages.
On-demand access to training on GPUs, including the new Tesla P100 GPUs from NVIDIA®.
Hyperparameter tuning to optimize key attributes of model architectures in order to maximize predictive accuracy.
Deployment of trained models to the Google global prediction platform that can support thousands of users and TBs of data.
Details
CloudML is a managed service where you pay only for the hardware resources that you use. Prices vary depending on configuration (e.g. CPU vs. GPU vs. multiple GPUs). See https://cloud.google.com/vertex-ai/pricing for additional details.
For documentation on using the R interface to CloudML see the package website at https://github.com/rstudio/cloudml
Author(s)
Maintainer: Tomasz Kalinowski tomasz@posit.co
Authors:
Daniel Falbel daniel@rstudio.com
Javier Luraschi
JJ Allaire
Kevin Ushey
Other contributors:
RStudio [copyright holder]
References
https://github.com/rstudio/cloudml
See Also
Useful links:
https://github.com/rstudio/cloudml
Report bugs at https://github.com/rstudio/cloudml/issues
Deploy SavedModel to CloudML
Description
Deploys a SavedModel to CloudML for online prediction.
Usage
cloudml_deploy(
export_dir_base,
name,
version = paste0(name, "_1"),
region = NULL,
config = NULL
)
Arguments
export_dir_base |
A string containing a directory with an exported SavedModel. Consider using tensorflow::export_savedmodel() to export your model. |
name |
The name for this model (required) |
version |
The version for this model. Versions start with a letter and contain only letters, numbers and underscores. Defaults to name_1 |
region |
The region to be used to deploy this model. |
config |
A list, or a YAML or JSON configuration file. |
See Also
Other CloudML functions: cloudml_predict(), cloudml_train()
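Examples
A minimal sketch; the export directory and model name below are placeholders, and the model is assumed to have already been trained:
## Not run:
library(cloudml)
# first export a trained model as a SavedModel, e.g. with
# tensorflow::export_savedmodel(model, "savedmodel")
cloudml_deploy("savedmodel", name = "mymodel")
## End(Not run)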
Perform Prediction over a CloudML Model.
Description
Perform online prediction over a CloudML model, usually one created using cloudml_deploy().
Usage
cloudml_predict(instances, name, version = paste0(name, "_1"), verbose = FALSE)
Arguments
instances |
A list of instances to be predicted. Even when predicting a single instance, it must still be wrapped in a list. |
name |
The name for this model (required) |
version |
The version for this model. Versions start with a letter and contain only letters, numbers and underscores. Defaults to name_1 |
verbose |
Should additional information be reported? |
See Also
Other CloudML functions:
cloudml_deploy()
,
cloudml_train()
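Examples
A minimal sketch; the model name and feature values are hypothetical:
## Not run:
library(cloudml)
# even a single instance must be wrapped in a list
instances <- list(c(5.1, 3.5, 1.4, 0.2))
predictions <- cloudml_predict(instances, name = "mymodel")
str(predictions)
## End(Not run)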
Train a model using Cloud ML
Description
Upload a TensorFlow application to Google Cloud, and use that application to train a model.
Usage
cloudml_train(
file = "train.R",
master_type = NULL,
flags = NULL,
region = NULL,
config = NULL,
collect = "ask",
dry_run = FALSE
)
Arguments
file |
File to be used as entrypoint for training. |
master_type |
Training master node machine type. "standard" provides a basic machine configuration suitable for training simple models with small to moderate datasets. See the documentation at https://cloud.google.com/vertex-ai/docs/reference/rest/v1/MachineSpec for details on available machine types. |
flags |
Named list with flag values (see flags()) or path to a YAML file containing flag values. |
region |
The region to be used for training. |
config |
A list, or a YAML or JSON configuration file. |
collect |
Logical. If TRUE, collect the job when training is completed (blocks waiting for the job to complete). The default ("ask") will interactively prompt the user whether to collect the results. |
dry_run |
Triggers a local dry run of the deployment phase to validate that packages and packing work as expected. |
See Also
job_status(), job_collect(), job_cancel()
Other CloudML functions: cloudml_deploy(), cloudml_predict()
Examples
## Not run:
library(cloudml)
gcloud_install()
job <- cloudml_train("train.R")
## End(Not run)
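The same call can also request a specific machine type and pass training flags; this is a sketch only, and the machine type and flag names below are illustrative:
## Not run:
job <- cloudml_train(
  "train.R",
  master_type = "standard_gpu",                 # illustrative machine type
  flags = list(epochs = 10, batch_size = 128),  # hypothetical flags
  collect = FALSE                               # submit and return immediately
)
## End(Not run)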
Executes a Google Cloud Command
Description
Executes a Google Cloud command with the given parameters.
Usage
gcloud_exec(..., args = NULL, echo = TRUE, dry_run = FALSE)
Arguments
... |
Parameters specified by position. |
args |
Parameters specified as a list. |
echo |
Echo command output to console. |
dry_run |
Echo the command but do not execute it? |
Examples
## Not run:
gcloud_exec("help", "info")
## End(Not run)
Initialize the Google Cloud SDK
Description
Initialize the Google Cloud SDK
Usage
gcloud_init()
See Also
Other Google Cloud SDK functions: gcloud_install(), gcloud_terminal()
Install the Google Cloud SDK
Description
Installs the Google Cloud SDK which enables CloudML operations.
Usage
gcloud_install(update = TRUE)
Arguments
update |
Attempt to update an existing installation. |
See Also
Other Google Cloud SDK functions: gcloud_init(), gcloud_terminal()
Examples
## Not run:
library(cloudml)
gcloud_install()
## End(Not run)
Create an RStudio terminal with access to the Google Cloud SDK
Description
Create an RStudio terminal with access to the Google Cloud SDK
Usage
gcloud_terminal(command = NULL, clear = FALSE)
Arguments
command |
Command to send to terminal |
clear |
Clear terminal buffer |
Value
Terminal id (invisibly)
See Also
Other Google Cloud SDK functions: gcloud_init(), gcloud_install()
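Examples
For example, to open an RStudio terminal and run a standard gcloud command in it (assuming the SDK is installed and authenticated):
## Not run:
# returns the terminal id invisibly
id <- gcloud_terminal("gcloud projects list")
## End(Not run)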
Gcloud version
Description
Get version of Google Cloud SDK components.
Usage
gcloud_version()
Value
A list with the version of each Google Cloud SDK component.
Copy files to / from Google Storage
Description
Use the gsutil cp command to copy data between your local file system and the cloud, copy data within the cloud, and copy data between cloud storage providers.
Usage
gs_copy(source, destination, recursive = FALSE, echo = TRUE)
Arguments
source |
The file to be copied. This can be either a path on the local filesystem, or a Google Storage URI (e.g. gs://[BUCKET_NAME]/[OBJECT_NAME]). |
destination |
The location where the source file should be copied to. This can be either a path on the local filesystem, or a Google Storage URI (e.g. gs://[BUCKET_NAME]/[OBJECT_NAME]). |
recursive |
Boolean; perform a recursive copy? This must be specified if you intend to copy directories. |
echo |
Echo command output to console. |
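Examples
A minimal sketch; the bucket name below is a placeholder:
## Not run:
# download a single object from a bucket
gs_copy("gs://my-bucket/sample.csv", "data/sample.csv")

# recursively upload a local directory to the bucket
gs_copy("data", "gs://my-bucket/data", recursive = TRUE)
## End(Not run)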
Google Storage bucket path that syncs to local storage when not running on CloudML.
Description
Refer to data within a Google Storage bucket. When running on CloudML the bucket will be read from directly. Otherwise, the bucket will be automatically synchronized to a local directory.
Usage
gs_data_dir(url, local_dir = "gs", force_sync = FALSE, echo = TRUE)
Arguments
url |
Google Storage bucket URL (e.g. gs://<your-bucket>). |
local_dir |
Local directory to synchronize Google Storage bucket(s) to. |
force_sync |
Force local synchronization even if the data directory already exists. |
echo |
Echo command output to console. |
Details
This function is suitable for use in TensorFlow APIs that accept gs:// URLs (e.g. TensorFlow datasets). However, many package functions accept only local filesystem paths as input (rather than gs:// URLs). For these cases you can use the gs_data_dir_local() function, which will always synchronize gs:// buckets to the local filesystem and provide a local path interface to their contents.
Value
Path to contents of data directory.
See Also
gs_data_dir_local()
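Examples
A minimal sketch (the bucket name is hypothetical): on CloudML the gs:// URL is used directly, while locally it resolves to a synchronized copy:
## Not run:
# resolves to "gs://my-bucket/data" on CloudML, or to a local
# synchronized directory (under "gs/" by default) otherwise
data_dir <- gs_data_dir("gs://my-bucket/data")
train_files <- file.path(data_dir, "train.tfrecord")
## End(Not run)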
Get a local path to the contents of Google Storage bucket
Description
Provides a local filesystem interface to Google Storage buckets. Many package functions accept only local filesystem paths as input (rather than gs:// URLs). For these cases the gs_data_dir_local() function will synchronize gs:// buckets to the local filesystem and provide a local path interface to their contents.
Usage
gs_data_dir_local(url, local_dir = "gs", echo = FALSE)
Arguments
url |
Google Storage bucket URL (e.g. gs://<your-bucket>). |
local_dir |
Local directory to synchronize Google Storage bucket(s) to. |
echo |
Echo command output to console. |
Details
If you pass a local path as the url it will be returned unmodified. This allows you to, for example, use a training flag for the location of data which points to a local directory during development and to a Google Cloud bucket during cloud training.
Value
Local path to contents of bucket.
Note
For APIs that accept gs:// URLs directly (e.g. TensorFlow datasets) you should use the gs_data_dir() function.
See Also
gs_data_dir()
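Examples
A sketch of the flag pattern described in Details; the flag name and default below are hypothetical:
## Not run:
library(tfruns)
# a local path during development, a gs:// bucket during cloud training
FLAGS <- flags(
  flag_string("data_dir", "data")
)
# returns FLAGS$data_dir unmodified if it is a local path; otherwise
# syncs the bucket and returns the path to the local copy
data_dir <- gs_data_dir_local(FLAGS$data_dir)
## End(Not run)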
Alias to gs_data_dir_local() function
Description
This function is deprecated; please use gs_data_dir_local() instead.
Usage
gs_local_dir(url, local_dir = "gs", echo = FALSE)
Arguments
url |
Google Storage bucket URL (e.g. gs://<your-bucket>). |
local_dir |
Local directory to synchronize Google Storage bucket(s) to. |
echo |
Echo command output to console. |
See Also
gs_data_dir_local()
Synchronize content of two buckets/directories
Description
The gs_rsync function makes the contents under destination the same as the contents under source, by copying any missing files/objects (or those whose data has changed), and (if the delete option is specified) deleting any extra files/objects. source must specify a directory, bucket, or bucket subdirectory.
Usage
gs_rsync(
source,
destination,
delete = FALSE,
recursive = FALSE,
parallel = TRUE,
dry_run = FALSE,
options = NULL,
echo = TRUE
)
Arguments
source |
The file to be copied. This can be either a path on the local filesystem, or a Google Storage URI (e.g. gs://[BUCKET_NAME]/[OBJECT_NAME]). |
destination |
The location where the source file should be copied to. This can be either a path on the local filesystem, or a Google Storage URI (e.g. gs://[BUCKET_NAME]/[OBJECT_NAME]). |
delete |
Delete extra files under destination that are not found under source. By default extra files are not deleted. |
recursive |
Causes directories, buckets, and bucket subdirectories to be synchronized recursively. If you neglect to use this option, only the top-level files in the source and destination are synchronized; subdirectories are skipped. |
parallel |
Causes synchronization to run in parallel. This can significantly improve performance if you are performing operations on a large number of files over a reasonably fast network connection. |
dry_run |
Causes rsync to run in "dry run" mode, i.e., just outputting what would be copied or deleted without actually doing any copying/deleting. |
options |
Character vector of additional command line options to the gsutil rsync command (as specified at https://cloud.google.com/storage/docs/gsutil/commands/rsync). |
echo |
Echo command output to console. |
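Examples
A minimal sketch; the bucket name below is a placeholder:
## Not run:
# mirror a bucket subdirectory into a local "data" directory
gs_rsync("gs://my-bucket/data", "data", recursive = TRUE)

# preview a destructive sync without copying or deleting anything
gs_rsync("data", "gs://my-bucket/data", delete = TRUE, dry_run = TRUE)
## End(Not run)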
Executes a Google Utils Command
Description
Executes a Google Cloud Storage utility (gsutil) command with the given parameters.
Usage
gsutil_exec(..., args = NULL, echo = FALSE)
Arguments
... |
Parameters to use specified based on position. |
args |
Parameters to use specified as a list. |
echo |
Echo command output to console. |
Cancel a job
Description
Cancel a job.
Usage
job_cancel(job = "latest")
Arguments
job |
Job name or job object. Pass "latest" to indicate the most recently submitted job. |
See Also
Other job management functions: job_collect(), job_list(), job_status(), job_stream_logs(), job_trials()
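Examples
For example, to cancel the most recently submitted job:
## Not run:
job_cancel("latest")
## End(Not run)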
Collect job output
Description
Collect the job outputs (e.g. fitted model) from a job. If the job has not yet finished running, job_collect() will block and wait until the job has finished.
Usage
job_collect(
job = "latest",
trials = "best",
destination = "runs",
timeout = NULL,
view = interactive()
)
Arguments
job |
Job name or job object. Pass "latest" to indicate the most recently submitted job. |
trials |
Under hyperparameter tuning, specifies which trials to download. Use "best" to download only the best trial, or "all" to download all trials. |
destination |
The destination directory in which model outputs should be downloaded. Defaults to "runs". |
timeout |
Give up collecting the job after the specified number of minutes. |
view |
View the job results after collecting them. You can also pass "save" to save a copy of the run report. |
See Also
Other job management functions: job_cancel(), job_list(), job_status(), job_stream_logs(), job_trials()
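Examples
A sketch of a submit-then-collect workflow; the script name is illustrative:
## Not run:
job <- cloudml_train("train.R", collect = FALSE)
# later: download the fitted model into the local "runs" directory
job_collect(job)
## End(Not run)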
List all jobs
Description
List existing Google Cloud ML jobs.
Usage
job_list(
filter = NULL,
limit = NULL,
page_size = NULL,
sort_by = NULL,
uri = FALSE
)
Arguments
filter |
Filter the set of jobs to be returned. |
limit |
The maximum number of resources to list. By default, all jobs will be listed. |
page_size |
Some services group resource list output into pages. This flag specifies the maximum number of resources per page. The default is determined by the service if it supports paging, otherwise it is unlimited (no paging). |
sort_by |
A comma-separated list of resource field key names to sort by. The default order is ascending. Prefix a field with "~" for descending order on that field. |
uri |
Print a list of resource URIs instead of the default output. |
See Also
Other job management functions: job_cancel(), job_collect(), job_status(), job_stream_logs(), job_trials()
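Examples
For example, to list the ten most recent jobs, newest first (createTime is assumed here to be a sortable field of the Cloud ML job resource; the "~" prefix requests descending order):
## Not run:
job_list(limit = 10, sort_by = "~createTime")
## End(Not run)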
Current status of a job
Description
Get the status of a job, as an R list.
Usage
job_status(job = "latest")
Arguments
job |
Job name or job object. Pass "latest" to indicate the most recently submitted job. |
See Also
Other job management functions: job_cancel(), job_collect(), job_list(), job_stream_logs(), job_trials()
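Examples
For example, to inspect the state of the most recent job (the state field shown is assumed to be part of the returned job resource):
## Not run:
status <- job_status("latest")
status$state   # e.g. "RUNNING" or "SUCCEEDED"
## End(Not run)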
Show job log stream
Description
Show logs from a running Cloud ML Engine job.
Usage
job_stream_logs(
job = "latest",
polling_interval = getOption("cloudml.stream_logs.polling", 5),
task_name = NULL,
allow_multiline_logs = FALSE
)
Arguments
job |
Job name or job object. Pass "latest" to indicate the most recently submitted job. |
polling_interval |
Number of seconds to wait between efforts to fetch the latest log messages. |
task_name |
If set, display only the logs for this particular task. |
allow_multiline_logs |
Output multiline log messages as single records. |
See Also
Other job management functions: job_cancel(), job_collect(), job_list(), job_status(), job_trials()
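Examples
For example, to follow the logs of the most recent job, polling every 10 seconds:
## Not run:
job_stream_logs("latest", polling_interval = 10)
## End(Not run)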
Current trials of a job
Description
Get the hyperparameter trials for a job, as an R data frame.
Usage
job_trials(x)
Arguments
x |
Job name or job object. |
See Also
Other job management functions: job_cancel(), job_collect(), job_list(), job_status(), job_stream_logs()
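Examples
A sketch for a hyperparameter tuning job; the tuning configuration file name is hypothetical:
## Not run:
job <- cloudml_train("train.R", config = "tuning.yml")
trials <- job_trials(job)
head(trials)
## End(Not run)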