TALL - Text Analysis for ALL


TALL (Text Analysis for ALL) is an interactive R Shiny application designed for exploring, modeling, and visualizing textual data. It provides a comprehensive, code-free environment for Natural Language Processing, enabling researchers without extensive programming skills to perform sophisticated text analyses through an intuitive graphical interface.

TALL integrates state-of-the-art NLP techniques — tokenization, lemmatization, Part-of-Speech tagging, dependency parsing, topic modeling, sentiment analysis, and more — into a unified, reproducible workflow.


Reference Paper

Aria, M., Spano, M., D’Aniello, L., Cuccurullo, C., & Misuraca, M. (2026). TALL: Text analysis for all — an interactive R-shiny application for exploring, modeling, and visualizing textual data. SoftwareX, 34, 102590.

Read the full paper (Open Access) | Supplementary material

⚠️ Citation policy. TALL is open source and distributed under the MIT license. However, whenever results obtained with TALL are used in a publication, proper citation of the reference paper above is mandatory. Failure to properly cite the software is considered a violation of the license.


Setup

System Requirements

Before installing TALL, ensure you have:

  1. R version 4.2.0 or higher — Download from CRAN
  2. RStudio (recommended) — Available at Posit
  3. Active internet connection for downloads and dependencies
  4. Build tools for the development version (see the check under Development Version below)
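
The R version requirement can be checked from the console before installing. The snippet below is a minimal sketch in base R; the variable names `required` and `r_ok` are illustrative and not part of TALL.

```r
# Minimum R version stated in the requirements list.
required <- "4.2.0"

# getRversion() returns the running R version as a comparable object.
r_ok <- getRversion() >= required

if (r_ok) {
  message("R ", getRversion(), " meets the minimum requirement (", required, ").")
} else {
  message("R ", getRversion(), " is older than ", required, "; please update R first.")
}
```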

Stable Version (CRAN)

install.packages("tall")
library(tall)
tall()

Development Version (GitHub)

First, verify your build tools:

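# Install pkgbuild if it is missing, then check that the build tools are available
if (!requireNamespace("pkgbuild", quietly = TRUE)) install.packages("pkgbuild")
pkgbuild::check_build_tools(debug = TRUE)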

Then install from GitHub:

if (!require("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github("massimoaria/tall")
library(tall)
tall()

The development version includes the latest features but may contain occasional bugs.

For detailed installation instructions, visit: Download & Install

An interactive tutorial is also available: View tutorial


Overview

Researchers across disciplines face the challenge of analyzing large volumes of textual data — research articles, social media posts, customer reviews, survey responses, legal documents, and literary works. While programming languages such as R and Python offer powerful NLP capabilities, not all researchers have the time or expertise to use them effectively.

TALL bridges this gap by providing a general-purpose, code-free text analysis platform built on the R ecosystem. It combines the statistical rigor of established R packages with the accessibility of a modern web interface, enabling researchers to conduct reproducible analyses from import through visualization without writing a single line of code.


Workflow

TALL follows a structured analytical workflow that guides users from raw text to interpretable results. The workflow consists of three main stages:

1. Import and Manipulation

TALL supports multiple input formats (plain text, CSV, Excel, PDF, Biblioshiny exports) and provides tools for corpus splitting, random sampling, and integration of external metadata. Analysis sessions can be saved and reloaded as .tall files for full reproducibility.
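
For readers curious about what random sampling and metadata integration amount to, here is a small base R sketch. It is not TALL's code: the toy corpus, the `year` field, and all object names are assumptions made for illustration.

```r
# Toy corpus: one row per document (hypothetical data for illustration).
corpus <- data.frame(
  doc_id = paste0("doc", 1:6),
  text   = c("First text.", "Second text.", "Third text.",
             "Fourth text.", "Fifth text.", "Sixth text."),
  stringsAsFactors = FALSE
)

# Random sampling: fix the seed so the same subset is drawn on every run.
set.seed(123)
sampled <- corpus[sort(sample(nrow(corpus), size = 3)), ]

# External metadata keyed by doc_id, merged onto the corpus.
metadata <- data.frame(
  doc_id = paste0("doc", 1:6),
  year   = c(2020, 2020, 2021, 2021, 2022, 2022)
)
enriched <- merge(corpus, metadata, by = "doc_id")
```

Fixing the seed is what makes a sampled sub-corpus reproducible; without it, each run would draw a different subset.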

2. Pre-processing and Cleaning

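Linguistic pre-processing is powered by UDPipe with updated Universal Dependencies v2.15 language models, supporting 60+ languages. The pre-processing pipeline includes tokenization, lemmatization, Part-of-Speech tagging, and dependency parsing, along with special entity tagging and multi-word expression extraction.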
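
The same kind of annotation can be reproduced in plain R with the `udpipe` package. The helper below is a sketch under stated assumptions: `annotate_texts` is a hypothetical name, and the model it downloads is the `udpipe` package default for the chosen language, which may differ from the v2.15 models used inside TALL.

```r
# Sketch only: requires the 'udpipe' package (install.packages("udpipe")).
# The model downloaded here is the udpipe package default for the language,
# which is an assumption and may differ from the models TALL uses.
annotate_texts <- function(texts, language = "english", model_dir = tempdir()) {
  stopifnot(is.character(texts), length(texts) > 0)

  # Download the language model (needs an internet connection) and load it.
  info  <- udpipe::udpipe_download_model(language = language, model_dir = model_dir)
  model <- udpipe::udpipe_load_model(file = info$file_model)

  # Tokenize, tag, lemmatize, and parse; the result has one row per token.
  ann <- udpipe::udpipe_annotate(model, x = texts,
                                 doc_id = paste0("doc", seq_along(texts)))
  out <- as.data.frame(ann)

  # Keep the columns that correspond to the pre-processing steps.
  out[, c("doc_id", "sentence_id", "token", "lemma", "upos",
          "head_token_id", "dep_rel")]
}
```

Calling `annotate_texts(c("TALL analyses text.", "It runs in the browser."))` would return a token-level data frame in which `lemma` holds the lemmatized form, `upos` the Part-of-Speech tag, and `head_token_id` with `dep_rel` the dependency parse.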

3. Statistical Text Analysis and Dynamic Visualization

TALL offers a rich set of analytical methods organized across three sections: Overview, Words, and Documents.


Analytical Methods

Overview

Corpus-level descriptive statistics provide a quantitative profile of the text collection, alongside concordance analysis and word frequency distributions.
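
As a rough illustration of what corpus-level statistics measure, the base R sketch below counts tokens, distinct word types, and word frequencies on two toy documents. The naive tokenizer and all names are assumptions for this example; TALL itself works on UDPipe-annotated text.

```r
docs <- c(
  doc1 = "Text analysis helps researchers explore text.",
  doc2 = "Researchers explore large text collections."
)

# Naive tokenization: lower-case, then split on anything that is not a letter.
tokenize <- function(x) {
  tokens <- unlist(strsplit(tolower(x), "[^a-z]+"))
  tokens[nzchar(tokens)]
}

tokens_by_doc <- lapply(docs, tokenize)
all_tokens    <- unlist(tokens_by_doc, use.names = FALSE)

n_docs   <- length(docs)                 # number of documents
n_tokens <- length(all_tokens)           # running words
n_types  <- length(unique(all_tokens))   # distinct words
ttr      <- n_types / n_tokens           # type-token ratio

# Word frequency distribution, most frequent first.
word_freq <- sort(table(all_tokens), decreasing = TRUE)
```

The type-token ratio is a simple lexical diversity indicator: values near 1 mean few repeated words, values near 0 mean heavy repetition.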

Words

Word-level analyses reveal the internal structure and thematic organization of the corpus, including topic detection, correspondence analysis, and co-occurrence networks.
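
A co-occurrence network starts from counts of how often two terms appear in the same unit of text. The sketch below shows one common way to obtain those counts in base R, using documents as the unit; the toy data and object names are assumptions, and TALL's own weighting and normalization choices are not shown here.

```r
docs <- list(
  doc1 = c("topic", "model", "text"),
  doc2 = c("text", "network", "model"),
  doc3 = c("network", "text")
)

vocab <- sort(unique(unlist(docs)))

# Binary document-term matrix: 1 if the term occurs in the document.
dtm <- t(vapply(docs, function(tokens) as.integer(vocab %in% tokens),
                integer(length(vocab))))
colnames(dtm) <- vocab

# t(dtm) %*% dtm counts, for each pair of terms, the documents containing both.
cooc <- crossprod(dtm)
diag(cooc) <- 0  # drop self-pairs

# Edge list for a network plot (upper triangle only, pairs that co-occur).
idx <- which(upper.tri(cooc) & cooc > 0, arr.ind = TRUE)
edges <- data.frame(
  from   = vocab[idx[, "row"]],
  to     = vocab[idx[, "col"]],
  weight = cooc[idx]
)
```

The resulting `edges` table (term pair plus weight) is the usual input format for network plotting libraries.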

Documents

Document-level analyses operate on entire texts and their structural properties, including topic modeling, sentiment analysis, and syntactic analysis.
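
To give a sense of how a document-level polarity score can be computed, the sketch below applies a simple lexicon-based rule in base R. The six-word lexicon, the scoring formula, and the function name `polarity_score` are assumptions chosen for illustration and are not necessarily the method TALL applies.

```r
# Hypothetical mini-lexicon; real analyses rely on curated polarity lexicons.
lexicon <- c(good = 1, great = 1, excellent = 1,
             bad = -1, poor = -1, terrible = -1)

docs <- c(
  review1 = "Great product, excellent support.",
  review2 = "Poor packaging and terrible delivery.",
  review3 = "Good price but bad manual."
)

polarity_score <- function(text, lexicon) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
  hits   <- lexicon[tokens[tokens %in% names(lexicon)]]
  if (length(hits) == 0) return(0)  # no polarized words: neutral
  # (positive - negative) / (positive + negative), bounded in [-1, 1].
  sum(hits) / length(hits)
}

scores <- vapply(docs, polarity_score, numeric(1), lexicon = lexicon)
```

Scores near 1 indicate mostly positive wording, scores near -1 mostly negative wording, and 0 either a balance of the two or no polarized words at all.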

TALL AI

TALL integrates Google Gemini AI as an intelligent assistant that provides automated interpretation of analytical results. Available across most analysis tabs (Overview, KWIC, Correspondence Analysis, Co-occurrence Network, Thematic Map, Word Embeddings, Topic Modeling, Polarity Detection, Emotion Analysis, Syntactic Complexity, SVO Triplets), TALL AI examines the visual and numerical outputs and generates contextual, academically grounded interpretations. AI calls run asynchronously, so the application stays responsive during processing.

Reporting

All analyses can be exported to an Excel workbook with embedded plots, enabling reproducible reporting. Individual plots can be exported as high-resolution PNG images with configurable DPI settings. Network visualizations use native canvas capture for crisp, DPI-aware rendering.


Screenshots

Import text from multiple file formats

Edit, divide, and add external information

Automatic Lemmatization and PoS-Tagging

Language, Model, and Analysis Term Selection

Special Entity Tagging

Multi-word Expression Extraction

Overview — Descriptive statistics, concordance analysis, word frequency distributions

Words — Topic detection, correspondence analysis, co-occurrence networks

Documents — Topic modeling, sentiment analysis, syntactic analysis


Authors

Creators

Contributors

Maintainer

Massimo Aria (aria@unina.it)


License

MIT License. Copyright 2023-2026 Massimo Aria.

See LICENSE for details.