A corpus (plural: corpora) is, generally speaking, a substantial collection of language data processed into machine-readable form for research purposes. Compared with data gathered in controlled experiments, corpora are sampled or collected under less controlled but more "ecological" conditions, which allows a wider range of research questions to be posed. Corpus linguistics methods have become popular owing to the growing availability of corpora and statistical tools (Wallis, 2020, pp. 3-5).
This guide lists sources of linguistics datasets, along with tools and guides for the statistical analysis of linguistic data. Use the menu on the left to browse, and contact Ying Liu (yingliu(at)uvic.ca) with any questions or suggestions.
Wallis, S. (2020). Statistics in corpus linguistics research: A new approach (1st ed.). Routledge.