Title: 'OpenAI' Compatible Speech-to-Text API Client
Version: 0.2.1
Description: A minimal-dependency R client for 'OpenAI'-compatible speech-to-text APIs (see https://platform.openai.com/docs/api-reference/audio) with optional local fallbacks. Supports 'OpenAI', local servers, and the 'whisper' package for local transcription.
License: MIT + file LICENSE
Encoding: UTF-8
URL: https://github.com/cornball-ai/stt.api
BugReports: https://github.com/cornball-ai/stt.api/issues
Imports: curl, jsonlite
Suggests: tinytest, whisper
NeedsCompilation: no
Packaged: 2026-03-29 22:12:33 UTC; troy
Author: Troy Hernandez
Maintainer: Troy Hernandez <troy@cornball.ai>
Repository: CRAN
Date/Publication: 2026-04-01 20:00:02 UTC
Get or create cached native whisper model
Description
Get or create cached native whisper model
Usage
.get_native_whisper_model(model, device = "auto")
Arguments
model: Model name (e.g., "tiny", "base", "small", "medium", "large-v3")
device: Device to use ("auto", "cpu", "cuda")
Value
Loaded whisper model object
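This helper is internal and its body is not shown; the environment-based caching pattern below is a sketch of how such a cache is commonly built in R. The `loader` argument is illustrative (in the package it would call whisper's actual model-loading function), and `get_cached_model`/`clear_cache` are hypothetical names.

```r
# Sketch: cache loaded models in a package-local environment, keyed by
# model name and device, so repeated calls reuse the same object.
.model_cache <- new.env(parent = emptyenv())

get_cached_model <- function(model, device = "auto",
                             loader = function(m, d) list(model = m, device = d)) {
  key <- paste(model, device, sep = "|")
  if (!exists(key, envir = .model_cache, inherits = FALSE)) {
    assign(key, loader(model, device), envir = .model_cache)
  }
  get(key, envir = .model_cache, inherits = FALSE)
}

clear_cache <- function() {
  # Drop all cached models and reclaim memory, mirroring the documented
  # behaviour of clear_native_whisper_cache().
  rm(list = ls(envir = .model_cache), envir = .model_cache)
  invisible(gc())
}
```

Caching in an environment (rather than a list) lets the cache be mutated in place from inside functions without `<<-` gymnastics, which is why it is the idiomatic choice for package-level state.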
Normalize segments to use numeric seconds
Description
Normalize segments to use numeric seconds
Usage
.normalize_segments(segments)
Arguments
segments: Data frame with from/to or start/end columns
Value
Data frame with numeric start/end columns
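The internal helper is not exported, so the following is a sketch of the normalization it describes: rename `from`/`to` columns to `start`/`end` and coerce the values to numeric seconds. The function name and column handling are assumptions based on the documented input and return value.

```r
# Sketch: normalize a segments data frame to numeric start/end columns.
normalize_segments <- function(segments) {
  if (is.null(segments)) return(NULL)
  idx <- match(c("from", "to"), names(segments))
  if (!anyNA(idx)) {
    names(segments)[idx] <- c("start", "end")
  }
  segments$start <- as.numeric(segments$start)
  segments$end <- as.numeric(segments$end)
  segments
}

seg <- data.frame(from = "0.0", to = "2.5", text = "hello",
                  stringsAsFactors = FALSE)
normalize_segments(seg)  # columns: start, end, text (start/end numeric)
```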
Convert time string to numeric seconds
Description
Convert time string to numeric seconds
Usage
.time_to_seconds(time_str)
Arguments
time_str: Time string in "HH:MM:SS.mmm" or "MM:SS.mmm" format
Value
Numeric seconds
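The conversion this helper performs can be sketched as follows. The internal `.time_to_seconds` is not exported; this illustrative version handles both documented formats by weighting the colon-separated fields from the right (seconds, minutes, hours).

```r
# Sketch: convert "HH:MM:SS.mmm" or "MM:SS.mmm" to numeric seconds.
time_to_seconds <- function(time_str) {
  parts <- as.numeric(strsplit(time_str, ":", fixed = TRUE)[[1]])
  # Rightmost field is seconds (60^0), then minutes (60^1), then hours (60^2)
  sum(parts * 60^(rev(seq_along(parts)) - 1))
}

time_to_seconds("01:02:03.500")  # 3723.5
time_to_seconds("02:03.500")     # 123.5
```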
Internal: Transcribe via native whisper package
Description
Uses the cornball-ai/whisper native R torch implementation.
Usage
.via_whisper(file, model = NULL, language = NULL)
Arguments
file: Character. Path to the audio file to transcribe.
model: Character or NULL. Whisper model name (e.g., "tiny", "base", "small", "medium", "large-v3").
language: Character or NULL. Language code for transcription.
Value
List with transcription results in normalized format.
Clear native whisper model cache
Description
Removes cached native whisper models from memory. Call this to free GPU/RAM after batch processing is complete.
Usage
clear_native_whisper_cache()
Value
No return value, called for side effects (frees memory by removing cached models and triggers garbage collection).
Examples
clear_native_whisper_cache()
Set the API Base URL
Description
Sets the base URL for OpenAI-compatible STT endpoints.
Usage
set_stt_base(url)
Arguments
url: Character string. The base URL (e.g., "http://localhost:4123" or "https://api.openai.com").
Value
Invisibly returns the previous value.
Examples
set_stt_base("http://localhost:4123")
getOption("stt.api_base")
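The "invisibly returns the previous value" behaviour follows R's standard option-setter pattern, sketched below. The option name "stt.api_base" is taken from the example above; the function body is an assumption, and `set_base_sketch` is an illustrative name.

```r
# Sketch: set an option and invisibly return its previous value,
# so callers can restore it later.
set_base_sketch <- function(url) {
  old <- getOption("stt.api_base")
  options(stt.api_base = url)
  invisible(old)
}

prev <- set_base_sketch("http://localhost:4123")
getOption("stt.api_base")  # "http://localhost:4123"
```

Returning the previous value makes temporary overrides easy: `old <- set_base_sketch(new_url); on.exit(set_base_sketch(old))`.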
Set the API Key
Description
Sets the API key for hosted STT services (e.g., OpenAI). Local servers typically ignore this.
Usage
set_stt_key(key)
Arguments
key: Character string. The API key.
Value
Invisibly returns the previous value.
Examples
set_stt_key("test-key-123")
getOption("stt.api_key")
Speech to Text
Description
Convert an audio file to text using a local whisper backend or an OpenAI-compatible API.
Usage
stt(file, model = NULL, language = NULL,
response_format = c("json", "text", "verbose_json"),
backend = c("auto", "whisper", "openai"), prompt = NULL)
Arguments
file: Path to the audio file to convert.
model: Model name to use for transcription. For API backends, this is passed directly (e.g., "whisper-1"). For the whisper backend, this is the model size (e.g., "tiny", "base", "small", "medium", "large"). If NULL, uses the backend's default.
language: Language code (e.g., "en", "es", "fr"). Optional hint to improve transcription accuracy.
response_format: Response format for the API backend. One of "json", "text", or "verbose_json". Ignored for the whisper backend.
backend: Which backend to use: "auto" (default), "whisper", or "openai". Auto mode tries whisper first, then the OpenAI API (if configured).
prompt: Optional text to guide the transcription. For the API backend, this is passed as initial_prompt to help with spelling of names, acronyms, or domain-specific terms. Ignored for the whisper backend.
Value
A list with components:
- text: The transcribed text as a single string.
- segments: A data.frame of segments with timing info, or NULL.
- language: The detected or specified language code.
- backend: Which backend was used ("api" or "whisper").
- raw: The raw response from the backend.
Examples
## Not run:
# Using OpenAI API
set_stt_base("https://api.openai.com")
set_stt_key(Sys.getenv("OPENAI_API_KEY"))
result <- stt("speech.wav", model = "whisper-1")
result$text
# Using local server
set_stt_base("http://localhost:4123")
result <- stt("speech.wav")
## End(Not run)
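The "auto" backend selection described above can be sketched as follows. This is an assumption about the internal logic, not the package's actual implementation; `choose_backend` is an illustrative name, and the option name "stt.api_base" comes from the set_stt_base() examples.

```r
# Sketch: prefer the local whisper package if installed, otherwise fall
# back to a configured OpenAI-compatible API.
choose_backend <- function(backend = c("auto", "whisper", "openai")) {
  backend <- match.arg(backend)
  if (backend != "auto") return(backend)
  if (requireNamespace("whisper", quietly = TRUE)) {
    "whisper"
  } else if (!is.null(getOption("stt.api_base"))) {
    "openai"
  } else {
    stop("No STT backend available; install 'whisper' or call set_stt_base().")
  }
}
```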
Check STT Backend Health
Description
Checks whether a transcription backend is available and working.
Usage
stt_health()
Value
A list with components:
- ok: Logical. TRUE if a backend is available.
- backend: Character. The available backend ("api" or "whisper"), or NULL if none available.
- message: Character. Status message with details.
Examples
## Not run:
h <- stt_health()
if (h$ok) {
message("STT ready via ", h$backend)
}
## End(Not run)
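A health check of this shape can be sketched as below. This mirrors the documented return value (ok/backend/message) but the detection logic is an assumption; a real check would also ping the API endpoint, and `stt_health_sketch` is an illustrative name.

```r
# Sketch: report the first available backend in a list matching the
# documented ok/backend/message structure.
stt_health_sketch <- function() {
  if (requireNamespace("whisper", quietly = TRUE)) {
    list(ok = TRUE, backend = "whisper",
         message = "Local whisper package available")
  } else if (!is.null(getOption("stt.api_base"))) {
    list(ok = TRUE, backend = "api",
         message = paste("API configured at", getOption("stt.api_base")))
  } else {
    list(ok = FALSE, backend = NULL,
         message = "No backend configured; install 'whisper' or set_stt_base()")
  }
}
```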