Ligo: An Open Source Application for the Management and Execution of Administrative Data Linkage

Bibliographic Details
Title: Ligo: An Open Source Application for the Management and Execution of Administrative Data Linkage
Authors: Greg Lawrance, Raphael Parra Hernandez, Khalegh Mamakani, Suraiya Khan, Brent Hills, Harold Yip, Caelan Marrville
Source: International Journal of Population Data Science, Vol 3, Iss 4 (2018)
Publisher Information: Swansea University, 2018.
Publication Year: 2018
Collection: LCC:Demography. Population. Vital events
Subject Terms: Demography. Population. Vital events, HB848-3697
More Details: Introduction: Ligo is an open source application that provides a framework for managing and executing administrative data linking projects. Ligo provides an easy-to-use web interface that lets analysts select among data linking methods, including deterministic, probabilistic and machine learning approaches, and apply them in a documented, repeatable, tested, step-by-step process. Objectives and Approach: The linking application has two primary functions: identifying common entities within a dataset (de-duplication) and identifying common entities between datasets (linking). The application is being built from the ground up in a partnership between the Province of British Columbia’s Data Innovation (DI) Program and Population Data BC, with input from data scientists. The simple web interface allows analysts to process multiple datasets in a straightforward and reproducible manner. Results: Built in Python and implemented as a desktop-capable and cloud-deployable containerized application, Ligo includes many of the latest data-linking comparison algorithms, with a plugin architecture that supports the simple addition of new formulae. Currently, deterministic approaches to linking have been implemented and probabilistic methods are in alpha testing. A fully functional alpha, including deterministic and probabilistic methods, is expected to be ready in September, with a machine learning extension expected soon after. Conclusion/Implications: Ligo has been designed with enterprise users in mind. The application is intended to make the processes of data de-duplication and linking simple, fast and reproducible. By making the application open source, we encourage feedback and collaboration from across the population research and data science community.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2399-4908
Relation: https://ijpds.org/article/view/749; https://doaj.org/toc/2399-4908
DOI: 10.23889/ijpds.v3i4.749
Access URL: https://doaj.org/article/3b7daff64cf7415fa10b123f57123912
Accession Number: edsdoj.3b7daff64cf7415fa10b123f57123912
Database: Directory of Open Access Journals
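Illustrative note: the two functions named in the abstract, de-duplication within a dataset and linking between datasets, can be sketched in their simplest deterministic (exact-match) form as follows. This is a minimal sketch and not Ligo's code or API. The function names (`match_key`, `deduplicate`, `link`), the field names, and the sample records are all hypothetical, and the only normalization assumed is trimming whitespace and lower-casing.

```python
from collections import defaultdict


def match_key(record, fields):
    """Build a normalized exact-match key from the chosen fields (hypothetical helper)."""
    return tuple(str(record[f]).strip().lower() for f in fields)


def deduplicate(records, fields):
    """Group row positions within ONE dataset that share the same key."""
    groups = defaultdict(list)
    for position, record in enumerate(records):
        groups[match_key(record, fields)].append(position)
    # Only keys that occur more than once indicate duplicate entities.
    return [positions for positions in groups.values() if len(positions) > 1]


def link(left, right, fields):
    """Pair row positions ACROSS two datasets whose keys are identical."""
    right_index = defaultdict(list)
    for j, record in enumerate(right):
        right_index[match_key(record, fields)].append(j)
    return [
        (i, j)
        for i, record in enumerate(left)
        for j in right_index.get(match_key(record, fields), [])
    ]


# Hypothetical sample data, invented purely for illustration.
FIELDS = ("first", "last", "dob")
registry = [
    {"first": "Ann", "last": "Lee", "dob": "1980-01-02"},
    {"first": "ann ", "last": "LEE", "dob": "1980-01-02"},
    {"first": "Bob", "last": "Ray", "dob": "1975-06-30"},
]
claims = [
    {"first": "Bob", "last": "Ray", "dob": "1975-06-30"},
    {"first": "Cy", "last": "Orr", "dob": "1990-12-12"},
]

duplicates = deduplicate(registry, FIELDS)  # duplicate groups inside registry
links = link(registry, claims, FIELDS)      # (registry row, claims row) pairs
```

Probabilistic and machine learning methods, which the abstract also mentions, would replace the exact key comparison with a similarity score and a decision threshold; they are not shown here.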