| Type: | Package | 
| Title: | An Interface to the 'fastText' Library | 
| Version: | 2.1.0 | 
| Description: | An interface to the 'fastText' library <https://github.com/facebookresearch/fastText>. The package can be used for text classification and to learn word vectors. An example of how to use 'fastTextR' can be found in the 'README' file. | 
| License: | BSD_3_clause + file LICENSE | 
| Imports: | stats, graphics, Rcpp (≥ 0.12.4), slam | 
| Suggests: | knitr, rmarkdown | 
| VignetteBuilder: | knitr | 
| LinkingTo: | Rcpp | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.3 | 
| URL: | https://github.com/EmilHvitfeldt/fastTextR | 
| BugReports: | https://github.com/EmilHvitfeldt/fastTextR/issues | 
| NeedsCompilation: | yes | 
| Packaged: | 2023-12-08 23:17:48 UTC; emilhvitfeldt | 
| Author: | Florian Schwendinger [aut],
  Emil Hvitfeldt | 
| Maintainer: | Emil Hvitfeldt <emilhhvitfeldt@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2023-12-09 00:40:09 UTC | 
Create a New FastText Model
Description
Create a new FastText model. The available methods
are the same as the package functions, but without the prefix "ft_"
and without the need to provide the model.
Usage
fasttext()
Examples
ft <- fasttext()
Get Analogies
Description
Obtain analogies for a word triplet from a previously trained model.
Usage
ft_analogies(model, word_triplets, k = 10L)
Arguments
| model | an object inheriting from "fasttext". | 
| word_triplets | a character vector of length three giving the word triplet. | 
| k | an integer giving the number of nearest neighbors to be returned. | 
Value
.
Examples
## Not run: 
ft_analogies(model, c("berlin", "germany", "france"), k = 6L)
## End(Not run)
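The following base R sketch illustrates the kind of vector arithmetic an analogy query performs. It assumes the A - B + C convention of the upstream 'fastText' tool; the helper names (cosine, toy_analogy) and the embedding values are invented for illustration and are not part of the package.

```r
# Toy 4-d embeddings; the numbers are made up for illustration only.
emb <- rbind(
  berlin  = c(1, 0, 1.0, 0.0),
  germany = c(0, 1, 1.0, 0.0),
  paris   = c(1, 0, 0.0, 1.0),
  france  = c(0, 1, 0.0, 1.0),
  madrid  = c(1, 0, 0.2, 0.2)
)

# Cosine similarity between two numeric vectors.
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Hypothetical helper: rank all words outside the triplet by their
# similarity to the query vector A - B + C.
toy_analogy <- function(emb, triplet, k = 1L) {
  query <- emb[triplet[1], ] - emb[triplet[2], ] + emb[triplet[3], ]
  candidates <- setdiff(rownames(emb), triplet)
  scores <- vapply(candidates, function(w) cosine(emb[w, ], query), numeric(1))
  head(sort(scores, decreasing = TRUE), k)
}

toy_analogy(emb, c("berlin", "germany", "france"), k = 1L)
```

With these toy vectors the best match is "paris"; a trained model works on learned vectors and may rank candidates differently.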
Default Control Settings
Description
An auxiliary function for defining the control variables.
Usage
ft_control(
  loss = c("softmax", "hs", "ns"),
  learning_rate = 0.05,
  learn_update = 100L,
  word_vec_size = 100L,
  window_size = 5L,
  epoch = 5L,
  min_count = 5L,
  min_count_label = 0L,
  neg = 5L,
  max_len_ngram = 1L,
  nbuckets = 2000000L,
  min_ngram = 3L,
  max_ngram = 6L,
  nthreads = 1L,
  threshold = 1e-04,
  label = "__label__",
  verbose = 0,
  pretrained_vectors = "",
  output = "",
  save_output = FALSE,
  seed = 0L,
  qnorm = FALSE,
  retrain = FALSE,
  qout = FALSE,
  cutoff = 0L,
  dsub = 2L,
  autotune_validation_file = "",
  autotune_metric = "f1",
  autotune_predictions = 1L,
  autotune_duration = 300L,
  autotune_model_size = ""
)
Arguments
| loss | a character string giving the name of the loss function; allowed values are "softmax", "hs" and "ns". | 
| learning_rate | a numeric giving the learning rate, the default value is 0.05. | 
| learn_update | an integer giving after how many tokens the learning rate should be updated. The default value is 100L. | 
| word_vec_size | an integer giving the length (size) of the word vectors. | 
| window_size | an integer giving the size of the context window. | 
| epoch | an integer giving the number of epochs. | 
| min_count | an integer giving the minimal number of word occurrences. | 
| min_count_label | an integer giving the minimal number of label occurrences. | 
| neg | an integer giving how many negatives are sampled (only used if loss is "ns"). | 
| max_len_ngram | an integer giving the maximum length of ngrams used. | 
| nbuckets | an integer giving the number of buckets. | 
| min_ngram | an integer giving the minimal ngram length. | 
| max_ngram | an integer giving the maximal ngram length. | 
| nthreads | an integer giving the number of threads. | 
| threshold | a numeric giving the sampling threshold. | 
| label | a character string specifying the label prefix (default is "__label__"). | 
| verbose | an integer giving the verbosity level, the default value is 0. | 
| pretrained_vectors | a character string giving the file path to the pretrained word vectors which are used for the supervised learning. | 
| output | a character string giving the output file path. | 
| save_output | a logical (default is FALSE). | 
| seed | an integer | 
| qnorm | a logical (default is FALSE). | 
| retrain | a logical (default is FALSE). | 
| qout | a logical (default is FALSE). | 
| cutoff | an integer (default is 0L). | 
| dsub | an integer (default is 2L). | 
| autotune_validation_file | a character string | 
| autotune_metric | a character string (default is "f1"). | 
| autotune_predictions | an integer (default is 1L). | 
| autotune_duration | an integer (default is 300L). | 
| autotune_model_size | a character string | 
Value
a list with the control variables.
Examples
ft_control(learning_rate=0.1)
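As a rough illustration of what min_ngram and max_ngram control, the sketch below enumerates the character n-grams of a single word. It assumes these settings correspond to the subword length limits of the upstream 'fastText' library, which pads each word with "<" and ">" before extracting n-grams; char_ngrams is a hypothetical helper, not a package function.

```r
# Hypothetical helper: list all character n-grams of a word whose length
# lies between min_ngram and max_ngram, after adding boundary markers.
char_ngrams <- function(word, min_ngram = 3L, max_ngram = 6L) {
  if (min_ngram > max_ngram) return(character(0))
  padded <- paste0("<", word, ">")
  n_chars <- nchar(padded)
  out <- character(0)
  for (n in seq(min_ngram, max_ngram)) {
    if (n > n_chars) break
    starts <- seq_len(n_chars - n + 1L)
    out <- c(out, substring(padded, starts, starts + n - 1L))
  }
  out
}

char_ngrams("where", min_ngram = 3L, max_ngram = 3L)
```

Widening the range from 3 to 6 increases the number of n-grams per word, which is one reason the number of buckets (nbuckets) matters for memory use.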
Load Model
Description
Load a previously saved model from file.
Usage
ft_load(file)
Arguments
| file | a character string giving the name of the file to be read in. | 
Value
an object inheriting from "fasttext".
Examples
## Not run: 
model <- ft_load("dbpedia.bin")
## End(Not run)
Get Nearest Neighbors
Description
Obtain the nearest neighbors of a word from a previously trained model.
Usage
ft_nearest_neighbors(model, word, k = 10L)
Arguments
| model | an object inheriting from "fasttext". | 
| word | a character string giving the word. | 
| k | an integer giving the number of nearest neighbors to be returned. | 
Value
.
Examples
## Not run: 
ft_nearest_neighbors(model, "enviroment", k = 6L)
## End(Not run)
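A nearest-neighbor query can be pictured as ranking words by cosine similarity to the query word. The base R sketch below shows this on a toy matrix; the function toy_nearest_neighbors and the embedding values are assumptions made for illustration, and the package's actual implementation may differ in detail.

```r
# Toy 3-d embeddings; the numbers are made up for illustration only.
emb <- rbind(
  environment = c(0.9, 0.1, 0.0),
  ecology     = c(0.8, 0.2, 0.1),
  climate     = c(0.7, 0.3, 0.0),
  banana      = c(0.0, 0.1, 0.9)
)

# Hypothetical helper: the k words most similar to `word` by cosine similarity.
toy_nearest_neighbors <- function(emb, word, k = 10L) {
  unit <- emb / sqrt(rowSums(emb^2))    # scale every row to unit length
  sims <- drop(unit %*% unit[word, ])   # cosine similarity to the query word
  sims <- sims[names(sims) != word]     # the query word itself is excluded
  head(sort(sims, decreasing = TRUE), k)
}

toy_nearest_neighbors(emb, "environment", k = 2L)
```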
Normalize
Description
Applies normalization to a given text.
Usage
ft_normalize(txt)
Arguments
| txt | a character vector to be normalized. | 
Value
a character vector.
Examples
## Not run: 
ft_normalize(some_text)
## End(Not run)
Write Model
Description
Write a previously trained model to a file.
Usage
ft_save(model, file, what = c("model", "vectors", "output"))
Arguments
| model | an object inheriting from "fasttext". | 
| file | a character string giving the name of the file. | 
| what | a character string giving what should be saved. | 
Examples
## Not run: 
ft_save(model, "my_model.bin", what = "model")
## End(Not run)
Get Sentence Vectors
Description
Obtain sentence vectors from a previously trained model.
Usage
ft_sentence_vectors(model, sentences)
Arguments
| model | an object inheriting from "fasttext". | 
| sentences | a character vector giving the sentences. | 
Value
a matrix containing the sentence vectors.
Examples
## Not run: 
ft_sentence_vectors(model, c("sentence", "vector"))
## End(Not run)
Evaluate the Model
Description
Evaluate the quality of the predictions. Precision and recall are used for the model evaluation.
Usage
ft_test(model, file, k = 1L, threshold = 0)
Arguments
| model | an object inheriting from "fasttext". | 
| file | a character string giving the location of the validation file. | 
| k | an integer giving the number of labels to be returned. | 
| threshold | a double giving the threshold. | 
Examples
## Not run: 
ft_test(model, file)
## End(Not run)
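To make the two evaluation measures concrete, the base R sketch below computes precision and recall from predicted and true labels. It assumes the counting used by the upstream 'fastText' test command (correct predictions divided by the number of predictions, and by the number of true labels, respectively); precision_recall and the example labels are hypothetical.

```r
# Hypothetical helper: `predicted` and `gold` are lists with one character
# vector of labels per document.
precision_recall <- function(predicted, gold) {
  stopifnot(length(predicted) == length(gold))
  n_correct <- sum(mapply(function(p, g) length(intersect(p, g)), predicted, gold))
  c(precision = n_correct / sum(lengths(predicted)),
    recall    = n_correct / sum(lengths(gold)))
}

gold      <- list("sports", c("politics", "economy"), "science")
predicted <- list("sports", "politics", "sports")   # one label each, as with k = 1

precision_recall(predicted, gold)
```

Here two of the three predictions are correct and two of the four true labels are recovered, so precision is 2/3 and recall is 1/2.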
Train a Model
Description
Train a new word representation model or supervised classification model.
Usage
ft_train(
  file,
  method = c("supervised", "cbow", "skipgram"),
  control = ft_control(),
  ...
)
Arguments
| file | a character string giving the location of the input file. | 
| method | a character string giving the method, possible values are "supervised", "cbow" and "skipgram". | 
| control | a list giving the control variables, for more information see ft_control. | 
| ... | additional control arguments inserted into the control list. | 
Examples
## Not run: 
cntrl <- ft_control(nthreads = 1L)
model <- ft_train("my_data.txt", method="supervised", control = cntrl)
## End(Not run)
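For supervised training, the input file is expected to hold one document per line, with each label marked by the label prefix (by default "__label__", see ft_control). The sketch below prepares such a file in base R; to_fasttext_lines is a hypothetical helper and the example texts are invented.

```r
# Hypothetical helper: prepend the label prefix and join label and text.
to_fasttext_lines <- function(labels, texts, prefix = "__label__") {
  stopifnot(length(labels) == length(texts))
  paste0(prefix, labels, " ", texts)
}

lines <- to_fasttext_lines(
  labels = c("positive", "negative"),
  texts  = c("great movie", "boring plot")
)

# Write the lines to a temporary file that can serve as training input.
train_file <- tempfile(fileext = ".txt")
writeLines(lines, train_file)
lines
```

The resulting path could then be passed as the file argument of ft_train with method = "supervised".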
Get Word Vectors
Description
Obtain word vectors from a previously trained model.
Usage
ft_word_vectors(model, words)
Arguments
| model | an object inheriting from "fasttext". | 
| words | a character vector giving the words. | 
Value
a matrix containing the word vectors.
Examples
## Not run: 
ft_word_vectors(model, c("word", "vector"))
## End(Not run)
Get Words
Description
Obtain all the words from a previously trained model.
Usage
ft_words(model)
Arguments
| model | an object inheriting from "fasttext". | 
Value
a character vector.
Examples
## Not run: 
ft_words(model)
## End(Not run)
Predict using a Previously Trained Model
Description
Predict values based on a previously trained model.
Usage
ft_predict(
  model,
  newdata,
  k = 1L,
  threshold = 0,
  rval = c("sparse", "dense", "slam"),
  ...
)
Arguments
| model | an object inheriting from "fasttext". | 
| newdata | a character vector giving the new data. | 
| k | an integer giving the number of labels to be returned. | 
| threshold | a double within [0, 1] giving the probability threshold. | 
| rval | a character string controlling the return value, allowed values are "sparse", "dense" and "slam". | 
| ... | currently not used. | 
Value
NULL if a 'result_file' is given; otherwise,
if 'prob' is true, a data.frame with the predicted labels
and the corresponding probabilities, and if 'prob' is false, a
character vector with the predicted labels.
Examples
## Not run: 
ft_predict(model, newdata)
## End(Not run)
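The interplay of k and threshold can be sketched in base R as follows: labels below the threshold are dropped, and at most k of the remaining labels are returned. This assumes the semantics of the upstream 'fastText' predict command; top_k_labels and the probabilities are hypothetical and do not reproduce the package's return format.

```r
# Hypothetical helper: `probs` is a named numeric vector of label probabilities
# for a single document.
top_k_labels <- function(probs, k = 1L, threshold = 0) {
  kept <- probs[probs >= threshold]        # drop labels below the threshold
  head(sort(kept, decreasing = TRUE), k)   # keep at most k labels
}

probs <- c(sports = 0.70, politics = 0.25, science = 0.05)

top_k_labels(probs, k = 2L, threshold = 0.1)
top_k_labels(probs, k = 2L, threshold = 0.3)
```

With threshold = 0.1 both "sports" and "politics" are returned; raising the threshold to 0.3 leaves only "sports" even though k = 2.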