Dataset Download

The "NLP for Drone Flight Log Analysis" dataset is publicly available to support research in drone safety, forensics, and natural language processing. This page provides access to the data, documentation, and licensing information necessary for its use.


1. Dataset Contents & Structure

Our dataset is organized to give a transparent view of the entire data preparation and annotation pipeline. The following data is available for download:

  • Raw Data: The original collection of 499 messages from the AirData UAV wiki.
  • Cleansed Data: A set of files representing the three stages of our data cleansing procedure, including correction logs.
  • Annotated Data: The final, ready-to-use data for each of our defined NLP tasks (e.g., Problem Identification, Event Recognition).
  • Qualitative Data: Flight summaries and key event timelines for the VTO Labs test set.

For a detailed view of the file structure, please refer to our dataset repository on GitHub.


2. Download Instructions

The canonical, citable version of our dataset is hosted on Zenodo. For convenience, in-progress versions of the data are also available in our GitHub repository.