📈 Data Quality: how to

This page explains what we can do as a team to ensure better data quality in the Organisation Portal (OP).

What is data quality?

Data quality vs Data integrity

Data Quality

Data quality is defined as the ability of data to serve its intended purpose. It refers to the reliability of data.

The 6 main criteria used to measure data quality are:

  • Accuracy: the data should correctly describe whatever it represents.

  • Uniqueness: is supposedly unique data actually unique, and thus not duplicated?

  • Relevancy: the data should meet the requirements for its intended use.

  • Completeness: the data should not have missing values or lack entire records.

  • Timeliness: the data should be up to date.

  • Consistency: the data should have the expected format and yield the same results when cross-referenced.
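Two of these criteria, completeness and uniqueness, are easy to quantify. Below is a minimal sketch of how they could be measured over a list of records; the field names (`name`, `email`) are illustrative assumptions, not OP's actual schema.

```python
def completeness(records, fields):
    """Share of required fields that are actually filled in."""
    total = len(records) * len(fields)
    filled = sum(1 for r in records for f in fields if r.get(f))
    return filled / total if total else 1.0

def uniqueness(records, key):
    """Share of distinct values for a field that is supposed to be unique."""
    values = [r.get(key) for r in records]
    return len(set(values)) / len(values) if values else 1.0

records = [
    {"name": "Acme", "email": "info@acme.example"},
    {"name": "Acme", "email": ""},  # missing email, duplicate name
]
print(completeness(records, ["name", "email"]))  # 0.75
print(uniqueness(records, "name"))               # 0.5
```

Scores like these give a baseline number per criterion, so improvements (or regressions) after a clean-up become visible.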

Possible actions:

Check whether we do everything we can to ensure that the new data users add into OP is of good quality.

  • Have we put enough validations in place to eliminate formatting issues and human error (e.g. non-existent addresses, phone numbers missing digits,...)? - Uniqueness, Accuracy & Consistency

  • Have we put enough restrictions in place to make sure users fill in all the relevant data? - Completeness

  • Do we have working agreements in place to ensure users help us keep the data quality high? - Relevancy, Timeliness
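The first two bullets boil down to validating records before they are saved. A minimal sketch of such a validation step is shown below; the field names and the phone pattern are illustrative assumptions, not OP's actual rules.

```python
import re

# Illustrative pattern: optional "+" followed by 9-15 digits.
PHONE_RE = re.compile(r"^\+?\d{9,15}$")

def validate(record):
    """Return a list of human-readable validation errors for one record."""
    errors = []
    if not record.get("name", "").strip():
        errors.append("name is required")             # Completeness
    phone = record.get("phone", "").replace(" ", "")
    if phone and not PHONE_RE.match(phone):
        errors.append("phone has an invalid format")  # Accuracy & Consistency
    return errors

print(validate({"name": "Acme", "phone": "+32 2 123 456"}))  # []
print(validate({"name": "", "phone": "12"}))
```

Rejecting (or flagging) a record at entry time is far cheaper than cleaning it up later, which is the point of the bullets above.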

Data Integrity

Data integrity is the reliability and trustworthiness of data throughout its lifecycle. It can describe both the state of your data and the process of ensuring its accuracy and validity.

We already do this in OP by restricting who can edit the data, and by checking whether entities already exist during the creation process to eliminate duplicates.
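The duplicate check during creation can be sketched as follows: before creating an entity, look it up under a normalised key. The in-memory dictionary stands in for OP's real data store, which this sketch does not know.

```python
existing = {}  # normalised name -> entity

def normalise(name):
    """Collapse case and whitespace so near-identical names collide."""
    return " ".join(name.lower().split())

def create_entity(name):
    """Create an entity, refusing duplicates under the normalised key."""
    key = normalise(name)
    if key in existing:
        raise ValueError(f"entity '{name}' already exists")
    existing[key] = {"name": name}
    return existing[key]

create_entity("Acme Corp")
try:
    create_entity("  ACME   corp ")  # caught as a duplicate
except ValueError as e:
    print(e)
```

Normalising before comparison matters: without it, trivial differences in casing or spacing would slip past the check and create duplicates anyway.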

Identifying the problem

"Bad data quality isn't something that can be fundamentally improved by finding problems and fixing them. Instead, we should start by producing data with good quality in the first place."

This quote might be true in some ways, however, as we are dealing with legacy data and data migration, we need to identify where the poor data quality comes from and fix those issues too.

Questions we can ask ourselves

  • What type of data quality issues are we facing?

    • Incomplete data, formatting issues, outdated data,...

By identifying what types of data have poor quality, we can ensure that the new data coming into the OP won't face the same problems the migrated data did.

  • Where are these issues coming from?

    • New data being inputted in OP, old data from Sharepoint,...

It is worth noting that if the poor data quality stems from old SharePoint lists, then:

  1. We have built an application that presents data clearly and in a structured way; otherwise the wrong data would not be as obvious.

  2. If most of the poor data is old, we already have good measures in place preventing more poor data from being added, and once we solve the old data issues, we will have high data quality.

How can we improve our data quality?

Once we know whether the problems stem from old legacy data or from newly entered data, we can take measures to ensure the old data gets updated and new data is entered correctly.

Different ways to ensure good data quality are:

  • Establish metrics

  • Investigate data quality failures

  • Internal training

  • Establish and implement data governance guidelines

  • Establish a data auditing process

  • Assign a Data Steward/Data Owner

  • Implement a single source of truth
