Title: 'OpenAI' Compatible Speech-to-Text API Client
Version: 0.2.1
Description: A minimal-dependency R client for 'OpenAI'-compatible speech-to-text APIs (see <https://platform.openai.com/docs/api-reference/audio>) with optional local fallbacks. Supports 'OpenAI', local servers, and the 'whisper' package for local transcription.
License: MIT + file LICENSE
Encoding: UTF-8
URL: https://github.com/cornball-ai/stt.api
BugReports: https://github.com/cornball-ai/stt.api/issues
Imports: curl, jsonlite
Suggests: tinytest, whisper
NeedsCompilation: no
Packaged: 2026-03-29 22:12:33 UTC; troy
Author: Troy Hernandez [aut, cre], Cornball AI [cph]
Maintainer: Troy Hernandez <troy@cornball.ai>
Repository: CRAN
Date/Publication: 2026-04-01 20:00:02 UTC

Get or create cached native whisper model

Description

Get or create cached native whisper model

Usage

.get_native_whisper_model(model, device = "auto")

Arguments

model

Model name (e.g., "tiny", "base", "small", "medium", "large-v3")

device

Device to use ("auto", "cpu", "cuda")

Value

Loaded whisper model object
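
The caching pattern can be sketched as follows (an illustrative re-implementation only; the real function is internal, and `load_model` here is a hypothetical stand-in for the whisper package's model loader):

```r
# Illustrative sketch: cache loaded models in an environment keyed by
# model name and device, so repeat calls reuse the same object.
.model_cache <- new.env(parent = emptyenv())

get_cached_model <- function(model, device = "auto",
                             load_model = function(m, d) list(model = m, device = d)) {
  key <- paste(model, device, sep = "_")
  if (!exists(key, envir = .model_cache)) {
    assign(key, load_model(model, device), envir = .model_cache)
  }
  get(key, envir = .model_cache)
}
```

A second call with the same model and device returns the cached object rather than loading again, which is why clear_native_whisper_cache() exists to release that memory.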


Normalize segments to use numeric seconds

Description

Normalize segments to use numeric seconds

Usage

.normalize_segments(segments)

Arguments

segments

Data frame with from/to or start/end columns

Value

Data frame with numeric start/end columns
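
The normalization can be sketched as follows (illustrative only; the internal helper is not exported, and the from/to handling shown is an assumption based on the description above):

```r
# Illustrative sketch: rename from/to columns to start/end and coerce
# time strings to numeric seconds. The real .normalize_segments() is
# internal to the package.
normalize_segments <- function(segments) {
  to_sec <- function(x) {
    if (is.numeric(x)) return(x)
    vapply(strsplit(as.character(x), ":", fixed = TRUE), function(p) {
      p <- as.numeric(p)
      # Weight components right to left: seconds, minutes, hours
      sum(p * 60^(rev(seq_along(p)) - 1))
    }, numeric(1))
  }
  if (all(c("from", "to") %in% names(segments))) {
    segments$start <- to_sec(segments$from)
    segments$end   <- to_sec(segments$to)
    segments$from <- segments$to <- NULL
  } else {
    segments$start <- to_sec(segments$start)
    segments$end   <- to_sec(segments$end)
  }
  segments
}
```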


Convert time string to numeric seconds

Description

Convert time string to numeric seconds

Usage

.time_to_seconds(time_str)

Arguments

time_str

Time string in "HH:MM:SS.mmm" or "MM:SS.mmm" format

Value

Numeric seconds
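
The conversion can be sketched as follows (an illustrative re-implementation; the internal .time_to_seconds() is not exported):

```r
# Illustrative sketch: split on ":" and weight each component from
# right to left as seconds, minutes, hours.
time_to_seconds <- function(time_str) {
  parts <- as.numeric(strsplit(time_str, ":", fixed = TRUE)[[1]])
  sum(parts * 60^(rev(seq_along(parts)) - 1))
}

time_to_seconds("01:02:03.500")  # 3723.5
time_to_seconds("02:03.500")     # 123.5
```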


Internal: Transcribe via native whisper package

Description

Uses the cornball-ai/whisper native R torch implementation.

Usage

.via_whisper(file, model = NULL, language = NULL)

Arguments

file

Character. Path to the audio file to transcribe.

model

Character or NULL. Whisper model name (e.g., "tiny", "base", "small", "medium", "large-v3").

language

Character or NULL. Language code for transcription.

Value

List with transcription results in normalized format.


Clear native whisper model cache

Description

Removes cached native whisper models from memory. Call this to free GPU/RAM after batch processing is complete.

Usage

clear_native_whisper_cache()

Value

No return value, called for side effects (frees memory by removing cached models and triggers garbage collection).

Examples

clear_native_whisper_cache()


Set the API Base URL

Description

Sets the base URL for OpenAI-compatible STT endpoints.

Usage

set_stt_base(url)

Arguments

url

Character string. The base URL (e.g., "http://localhost:4123" or "https://api.openai.com").

Value

Invisibly returns the previous value.

Examples

set_stt_base("http://localhost:4123")
getOption("stt.api_base")


Set the API Key

Description

Sets the API key for hosted STT services (e.g., OpenAI). Local servers typically ignore this.

Usage

set_stt_key(key)

Arguments

key

Character string. The API key.

Value

Invisibly returns the previous value.

Examples

set_stt_key("test-key-123")
getOption("stt.api_key")


Speech to Text

Description

Convert an audio file to text using a local whisper backend or an OpenAI-compatible API.

Usage

stt(file, model = NULL, language = NULL,
    response_format = c("json", "text", "verbose_json"),
    backend = c("auto", "whisper", "openai"), prompt = NULL)

Arguments

file

Path to the audio file to convert.

model

Model name to use for transcription. For API backends, this is passed through directly (e.g., "whisper-1"). For the whisper backend, this is the model size (e.g., "tiny", "base", "small", "medium", "large"). If NULL, the backend's default is used.

language

Language code (e.g., "en", "es", "fr"). Optional hint to improve transcription accuracy.

response_format

Response format for the API backend. One of "text", "json", or "verbose_json". Ignored for the whisper backend.

backend

Which backend to use: "auto" (default), "whisper", or "openai". In auto mode, the whisper backend is tried first, then the OpenAI-compatible API (if configured).

prompt

Optional text to guide the transcription. For the API backend, this is passed as initial_prompt to help with the spelling of names, acronyms, or domain-specific terms. Ignored for the whisper backend.

Value

A list with components:

text

The transcribed text as a single string.

segments

A data.frame of segments with timing info, or NULL.

language

The detected or specified language code.

backend

Which backend was used ("api" or "whisper").

raw

The raw response from the backend.

Examples

## Not run: 
# Using OpenAI API
set_stt_base("https://api.openai.com")
set_stt_key(Sys.getenv("OPENAI_API_KEY"))
result <- stt("speech.wav", model = "whisper-1")
result$text

# Using local server
set_stt_base("http://localhost:4123")
result <- stt("speech.wav")

## End(Not run)
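
To work with timing information, request "verbose_json" so the segments component is populated (an illustrative extension of the example above; which fields the backend fills in depends on the server):

```r
## Not run:
# Segment-level timing via verbose_json
res <- stt("speech.wav", model = "whisper-1",
           response_format = "verbose_json")
head(res$segments)  # data.frame with numeric start/end columns
res$language        # detected or specified language code
res$backend         # "api" or "whisper"
## End(Not run)
```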


Check STT Backend Health

Description

Checks whether a transcription backend is available and working.

Usage

stt_health()

Value

A list with components:

ok

Logical. TRUE if a backend is available.

backend

Character. The available backend ("api" or "whisper"), or NULL if none available.

message

Character. Status message with details.

Examples

## Not run: 
h <- stt_health()
if (h$ok) {
  message("STT ready via ", h$backend)
}

## End(Not run)