Abacus holds UVic Libraries' collection of licensed datasets, including public use microdata files (PUMFs) from StatCan censuses, other social and health surveys, public opinion polls, and spatial data for GIS. Access is restricted to UVic users.
Borealis, the Canadian Dataverse Repository, is a national repository for research data. The service, supported by UVic Libraries, is free for UVic researchers to deposit their datasets, which are registered with DOIs and stored in a secure environment on Canadian servers. Researchers can choose to make their datasets available to the public or to specific individuals, or to keep them private.
With Borealis, researchers can search across research data from over 65 Canadian universities.
Lunaris provides a single point of search for research data held in Canadian data repositories, including academic institutions, departments at all levels of government, and research organizations. There are over 80,000 datasets from over 100 Canadian repositories and data collections currently indexed by Lunaris.
For access to confidential microdata from Statistics Canada census and surveys, contact the UVic Research Data Centre.
The UVic RDC provides access, for approved projects, to a growing variety of Statistics Canada confidential microdata household, population and workplace files. The microdata used by researchers come primarily from Statistics Canada Survey Master files. Increasingly, the Research Data Centres (RDCs) are repositories of administrative records from a variety of sources including tax, employment insurance, social assistance, and hospitalization records.
UVic catalogues or has access to thousands of data sources. To specifically search for datasets in "Library Search" (Primo):
1. Enter a search term in the search box as you would for any other resource.
2. Once you are directed to the results page, use the "Refine Results" filter on the left-hand side of the page. Go to "Content Type" and then "Show More".
3. Now choose "Datasets" and then click "Apply Filters" (green button).
4. You will now see the datasets that have been added to our catalogue (note: not all datasets are catalogued).
UVic Libraries collects hundreds of websites as part of its web archiving efforts using Archive-It. UVic Libraries can help researchers access a variety of data related to these collections via Archive-It's Research Services, including:
WARC files, and their predecessor ARC files, store the data crawled using Archive-It. Each file may contain multiple digital objects, including HTML, images, and videos. (Note that collection data can consist of both WARC and ARC files depending upon when they were archived through our service. Throughout these guides, the term “WARC files” refers to both WARC and ARC files.)
WAT stands for Web Archive Transformation. WAT files are composed of key metadata such as provenance/capture information, essential text and link data, and other information. They are extracted from WARCs for every resource; because WAT files map one-to-one to WARC files, a collection's WARC files will have corresponding WAT files. WAT files format metadata as JavaScript Object Notation (JSON). The benefit is that WATs are around 5%-20% the size of the corresponding WARCs.
Longitudinal Graph Analysis (LGA) files are archival web graph files that include a complete list of which URIs link to which URIs, along with a timestamp, from a collection’s origin through the present. They are ~1% the size of a collection's aggregate WARC files, and are delivered as a ZIP container of two files:
ID-Map:
ID-Graph:
Web Archive Named Entities (WANE) files use named-entity recognition tools to generate a list of all the people, places, and organizations mentioned in each URI in a web archive, with a timestamp of when the URI was captured. The purpose is to link people, places, and organizations to time. A WANE dataset is generated using the Stanford Named Entity Recognizer software (http://nlp.stanford.edu/software/CRF-NER.shtml) to extract named entities from each textual resource in a collection. The analyzer uses an English-model 3-class classifier to extract names that correspond to recognized Persons, Organizations, and Locations. WANE files are less than 1% the size of their corresponding WARC files, and are structured as one JSON object per line: URL ("url"), timestamp ("timestamp"), content digest ("digest"), and the named entities ("named_entities"), which contain data arrays of "persons", "organizations", and "locations".
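As a rough illustration of how the one-JSON-object-per-line structure described above might be read, the following Python sketch tallies how often each entity appears across a set of WANE records. The field names come from the description above; the function name `count_entities`, the sample URLs, the timestamp format, and the digest values are hypothetical and used only for illustration, so real WANE files may differ in detail.

```python
import json
from collections import Counter


def count_entities(lines, kind="persons"):
    """Count how often each named entity of one kind appears.

    `lines` is any iterable of strings, one JSON object per line.
    `kind` is "persons", "organizations", or "locations".
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        entities = record.get("named_entities", {})
        counts.update(entities.get(kind, []))
    return counts


# Hypothetical sample records; all values here are invented for illustration.
sample = [
    json.dumps({
        "url": "http://example.org/a",
        "timestamp": "20150101000000",  # assumed format
        "digest": "sha1:AAAA",          # placeholder digest
        "named_entities": {
            "persons": ["Jane Doe"],
            "organizations": ["University of Victoria"],
            "locations": ["Victoria"],
        },
    }),
    json.dumps({
        "url": "http://example.org/b",
        "timestamp": "20160101000000",
        "digest": "sha1:BBBB",
        "named_entities": {
            "persons": ["Jane Doe", "John Smith"],
            "organizations": [],
            "locations": ["Victoria", "Vancouver"],
        },
    }),
]

person_counts = count_entities(sample, "persons")
print(person_counts.most_common(2))  # [('Jane Doe', 2), ('John Smith', 1)]
```

To run the same tally over a file, an open file handle can be passed in place of `sample`, since the function only needs an iterable of lines.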
Please contact Corey Davis for more information.