
Linguistics Data

This guide includes sources of linguistics datasets, tools and guides for the statistical analysis of linguistic data, and information about Indigenous language data.

Linguistics Data and Corpora

A corpus (plural: corpora) is, generally speaking, a substantive collection of language data processed into machine-readable form for research purposes. Corpora are sampled or collected under comparatively less controlled but more “ecological” conditions, which allows a wider range of research questions to be posed. Corpus linguistics methods have become popular as corpora and statistical tools have become more widely available (Wallis, 2020, pp. 3-5).

This guide includes sources of linguistics datasets, as well as tools and guides for the statistical analysis of linguistic data. Use the left-side menu to browse, and contact Ying Liu at yingliu(at)uvic.ca if you have questions or suggestions.

Wallis, S. (2020). Statistics in corpus linguistics research: A new approach (1st ed.). Routledge.

 

Creative Commons License
This work by The University of Victoria Libraries is licensed under a Creative Commons Attribution 4.0 International License unless otherwise indicated when material has been used from other sources.