MultiCAT: Multimodal Communication Annotations for Teams
Welcome to the web UI for the MultiCAT dataset! Here you can explore the
dataset and find links to papers and systems that use it.
Abstract
Successful teamwork requires team members to understand each other and
communicate effectively, managing multiple linguistic and paralinguistic tasks at
once. Because these tasks are potentially interrelated, it is
important to have the ability to make multiple types of predictions on the same
dataset. Here, we introduce Multimodal Communication Annotations for Teams
(MultiCAT), a speech- and text-based dataset consisting of audio recordings along
with automated and hand-corrected transcriptions. MultiCAT builds upon data from
teams working collaboratively to save victims in a simulated search and rescue
mission, and consists of annotations and benchmark results for the following
tasks: (1) dialog act classification, (2) adjacency pair detection, (3)
sentiment and emotion recognition, (4) closed-loop communication detection, and
(5) vocal (phonetic) entrainment detection. We also present exploratory analyses
on the relationship between our annotations and team outcomes. We posit that
additional work on these tasks and their intersection will further improve
understanding of team communication and its relation to team performance.
Paper and code
- The PDF of our NAACL 2025 Findings paper and code for the benchmark analyses
(and this web UI) can be found here: https://github.com/adarshp/MultiCAT.
- When you explore the different tables, metadata describing the columns is
displayed at the top of the page. This metadata is loaded from the file
datasette_interface/metadata.yml in the GitHub repository (an illustrative
sketch of the format follows this list).
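For orientation, a Datasette metadata file generally follows the structure
sketched below. This is only an illustrative example, not the actual contents of
the file: the database name, table name, and column descriptions shown here are
placeholders, and the authoritative version is datasette_interface/metadata.yml
in the repository.

databases:
  multicat:                # placeholder database name
    tables:
      utterance:           # one block per table
        description: Utterance-level transcriptions and annotations (placeholder)
        columns:
          utt_id: Unique identifier for the utterance (placeholder)
          text: Hand-corrected transcription text (placeholder)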
Exploring the data
The dataset is served via a single SQLite database.
An entity relationship diagram representing the database schema is
available here: ERD diagram.
Feel free to try different SQL queries, use the
programmatic APIs provided by Datasette, or simply download the whole SQLite
database.
The database contains 14,098 rows across 8 tables, including the utterance,
entrainment, participant, trial, and team tables, among others.
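As a minimal sketch of the download-and-query route, the Python snippet below
uses the standard library's sqlite3 module to count utterances per trial. The
database filename and the trial_id column are assumptions made for illustration;
consult the ERD diagram or the table metadata for the actual schema.

import sqlite3

# Connect to a local copy of the downloaded database.
# "multicat.db" is a placeholder filename; use whatever name you saved it under.
conn = sqlite3.connect("multicat.db")
conn.row_factory = sqlite3.Row

# Count utterances per trial. The `utterance` table is listed above;
# the `trial_id` column is an assumption -- check the schema first.
rows = conn.execute(
    """
    SELECT trial_id, COUNT(*) AS n_utterances
    FROM utterance
    GROUP BY trial_id
    ORDER BY n_utterances DESC
    """
).fetchall()

for row in rows:
    print(row["trial_id"], row["n_utterances"])

conn.close()

The same kind of query can also be run directly in the browser through
Datasette's SQL interface, or retrieved as JSON by appending .json to a table or
query URL.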
Citation
If you use this dataset, please cite our NAACL 2025 Findings
paper that introduces the dataset.
We will update the citations below with the metadata from the official ACL
Anthology entry once it is published.
BibTeX Format
@inproceedings{pyarelal-etal-2025-multicat,
title = "{M}ulti{CAT}: Multimodal Communication Annotations for Teams",
author = "Pyarelal, Adarsh and
Culnan, John M and
Qamar, Ayesha and
Krishnaswamy, Meghavarshini and
Wang, Yuwei and
Jeong, Cheonkam and
Chen, Chen and
Miah, Md Messal Monem and
Hormozi, Shahriar and
Tong, Jonathan and
Huang, Ruihong",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-naacl.61/",
pages = "1077--1111",
ISBN = "979-8-89176-195-7"
}
APA Format
Pyarelal, A., Culnan, J. M., Qamar, A., Krishnaswamy, M., Wang, Y., Jeong, C.,
Chen, C., Miah, M. M. M., Hormozi, S., Tong, J., & Huang, R. (2025). MultiCAT:
Multimodal communication annotations for teams. In L. Chiruzzo, A. Ritter, & L.
Wang (Eds.), Findings of the Association for Computational Linguistics: NAACL
2025 (pp. 1077–1111). Association for Computational Linguistics.
https://aclanthology.org/2025.findings-naacl.61/
Funding Acknowledgment
- The creation of this dataset was funded by the Army Research Office and was
accomplished under Grant Number W911NF-20-1-0002. The grant was awarded through
the Defense Advanced Research Projects Agency (DARPA).
- Continued maintenance of the dataset (documentation updates, replying to
questions from dataset users, etc.) is supported by Army Research Office (ARO)
Award Number W911NF-24-2-0034.