🆔 Reporting: Data quality check

How can we organise our data quality check in a better way?

To comment on a FP:

Date, name: comment

Status Feature Passport

STATUS | OWNER | DATE
In proposal | Sofie | 19/08/2022 + 3/9/2022 + 7/9/2022
In refinement - Design Research | |
In refinement - Technical Research/Feedback | |
In development | |
In QA/Testing | |
In Final state | |

Having good data quality is a priority for OP. That's why we need to do regular data quality checks.

EPIC: OP-923: Reporting

Task for developer Iulian.

Analysis

This part of the feature passport is owned by the analyst.

Intro

We have different reporting goals:

1. Reporting for business, only via BC&K

(= BC&K is an ABB department which produces dashboards [login needed] via Qlik Sense for the business)

  • to answer, for example, parliamentary questions

  • to show data tables on the website (example in OP-875)

  • to use in their daily work

2. Data quality check for business, initially via OP and later fully or partly via BC&K

  • to check for missing data

  • to check/correct data mistakes

3. Data quality check for OP team, only via OP

  • to check if the system works properly

  • to check, for example, whether all positions are linked to a person (see the sketch below)

  • to check on validations (e.g. at most x positions in a specific board)
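As an illustration of goal 3, a minimal sketch of such a system check, assuming a SPARQL endpoint and the Flemish mandaat vocabulary (the endpoint URL and predicates are assumptions, not the actual OP configuration):

```python
# Minimal sketch: list positions (mandatarissen) that are not linked to
# any person. Endpoint URL and predicates are hypothetical placeholders.
import requests

ENDPOINT = "https://example.org/sparql"  # hypothetical SPARQL endpoint

QUERY = """
PREFIX mandaat: <http://data.vlaanderen.be/ns/mandaat#>

SELECT ?position WHERE {
  ?position a mandaat:Mandataris .
  FILTER NOT EXISTS { ?position mandaat:isBestuurlijkeAliasVan ?person . }
}
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
)
response.raise_for_status()

# Every position returned here is a data quality issue to investigate.
for binding in response.json()["results"]["bindings"]:
    print(binding["position"]["value"])
```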

Pre-analysis

1. Reporting via BC&K

= Reporting for business via dashboards in Qlik Sense

Involved in this discussion:

  • BC&K: Suzy and Kain (visualisations of data in QS)

  • Digiteam: Patrick, Francis (DWH), Yassin and Sofie

  • Red Pencil: Aad

Process overview in Miro:

Lots of manual actions:

  • We will only share public OP data.

  • DWH POC:

    • Francis rebuilds as much as possible in the data warehouse based on the turtle files.

    • Francis writes SPARQL queries (with support from Boris), temporarily from the OP dump to CV, and makes csv exports.

  • Francis shares the csv exports with Kain.

  • Kain imports the csv to Qlik Sense.

  • Kain creates the data visualisations.

📋 How to automate this process?
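One possible shape for that automation, as a hedged sketch: a script that runs every saved query against the endpoint and regenerates the csv exports in one step, replacing the manual run/export/share loop (the endpoint URL and folder layout are hypothetical):

```python
# Hedged sketch of automating the manual export chain: run each saved
# SPARQL query and write the result straight to csv, so the files that
# are imported into Qlik Sense are regenerated in one step.
# Endpoint URL and folder layout are hypothetical placeholders.
from pathlib import Path

import requests

ENDPOINT = "https://example.org/sparql"  # hypothetical
QUERY_DIR = Path("queries")              # one .rq file per report
EXPORT_DIR = Path("exports")             # csv files for Qlik Sense

EXPORT_DIR.mkdir(exist_ok=True)
for query_file in sorted(QUERY_DIR.glob("*.rq")):
    response = requests.get(
        ENDPOINT,
        params={"query": query_file.read_text()},
        headers={"Accept": "text/csv"},  # csv results per SPARQL 1.1 protocol
    )
    response.raise_for_status()
    (EXPORT_DIR / f"{query_file.stem}.csv").write_text(response.text)
```

The hand-over to Kain could then become a scheduled job or a shared drive instead of a manual step.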

2. + 3. Data quality check via OP

= Data quality reporting for PM/PO and business

On 09/12/2021, Boris gave us a short presentation (in Dutch) on best practices for BI/Analytics.

📋 Summary of presentation: There is no elegant solution to set this up

Current state

[The analyst describes the current state of the feature that needs to be improved, highlighting any known problems and linking them, so that everyone reading the FP can find background information on the problem mentioned.

If the feature is a new feature that doesn't exist yet, the analyst can explain where the need for this feature comes from: business need/what problem are we trying to solve?]

1. Reporting via BC&K

TO DO Sofie: Request status.

2. Data quality reporting via OP

We have a Gitbook overview of the queries for (central) worship services.

The numbering leads to the specific GitHub query within the GitHub overview of queries.

Process requesting new query

1. PM/PO defines new query in Gitbook page

2. PM/PO creates Jira ticket for the creation of the query

3. After creation: PM/PO runs the query in the SPARQL endpoint

4. PM/PO exports the data as csv

5. PM/PO creates the csv import in Excel (1 query per sheet, per menu item in the modules 'Bestuurseenheden' and 'Personen')

  • 1.1.0_Bestuurseenheden_Kerngegevens

  • 1.2.0_Bestuurseenheden_BetrokkenLokaleBesturen

  • 1.3.0_Bestuurseenheden_Bestuursorganen

  • 1.4.0_Mandatarissen

  • 2.0_Persons

Update process change (in Excel)

1. PM/PO runs the query in the SPARQL endpoint

2. PM/PO exports the data as csv and overwrites the existing file with the new file (same name, so Excel gets updated automatically); see the sketch after this list.

3. PM/PO alerts business to check missing/wrong data
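A hedged sketch of how steps 1 and 2 could be scripted (file names and paths are hypothetical); the key point is keeping the csv name identical so the Excel import refreshes without reconfiguration:

```python
# Hedged sketch: regenerate one data quality export and overwrite the
# existing csv under the same name, so the Excel sheet that imports it
# updates automatically on refresh. Endpoint, query file, and csv name
# are hypothetical placeholders.
from pathlib import Path

import requests

ENDPOINT = "https://example.org/sparql"                     # hypothetical
QUERY = Path("queries/1.4.0_Mandatarissen.rq").read_text()  # hypothetical path

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "text/csv"},
)
response.raise_for_status()

# Same file name as before: Excel picks up the new data on refresh.
Path("exports/1.4.0_Mandatarissen.csv").write_text(response.text)
```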

Problems

1. Query repository organisation

  • Boris made a lot of queries for worship services. They were organised at the time, but this is not scalable. In the near future, we will have

    • more administrative units, besides (central) worship services

    • requests for new queries (these are possible, but the ordering we use now is sub-optimal)

    • general queries that apply to all or a selection of administrative units

  • The queries are not all in the same repository; the current one is owned by Boris.

  • The combination of comments in Gitbook, Jira and GitHub is not ideal. I think we (PM/PO) should only use Gitbook to start, but handle everything in Jira and GitHub.

2. Translation needed from ttl to csv

3. Too many manual actions to update

In this feature passport, we will tackle the problems the current state of the feature has:

[the analyst explains the problems we need to solve in this feature]

The analyst does not provide solutions. This is the task of the technical and design team.

e.g.

❌ There should be a button in the bottom-left corner so the user can send an email

✅ The user needs the ability to send emails to customer service

After the analyst is done with this, there needs to be a meeting (this can become part of the BRM) where the analyst talks through the expectations with the technical and design team. The team can then decide which problems are feasible to solve for this feature and create tickets accordingly.

This means that if, for example, the technical team thinks that one of the problems will take a lot longer than the analyst anticipated, the PO can still decide to work on that problem as part of another feature passport (* this then needs to be amended in the Problems section and a new feature passport can be created.)

Once the team has agreed on the problems to solve and the tickets have been created, acceptance criteria can be added to the tickets in Jira.

This includes any dependencies, e.g.:

Story: A user needs to be able to add a position to a person.

Acceptance criteria:

  • The position needs to be automatically added to the relevant organisation

  • The position needs to be automatically added to the list of all the positions

🤩 Expectations

1. One repository for all queries

We would like an overview of the results of the queries so we can take action accordingly, in a repository that is accessible to all DEVs to make changes.

For example, we discovered the persons with multiple person URIs too late.
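A hedged sketch of the kind of check that would have caught this earlier: group persons by identifier and report every identifier attached to more than one URI (the identifier predicates are assumptions, not the actual OP model):

```python
# Hedged sketch: find identifiers that are attached to more than one
# person URI. The adms:identifier/skos:notation path is an assumption
# about the data model, not the actual OP configuration.
import requests

ENDPOINT = "https://example.org/sparql"  # hypothetical

QUERY = """
PREFIX adms:   <http://www.w3.org/ns/adms#>
PREFIX skos:   <http://www.w3.org/2004/02/skos/core#>
PREFIX person: <http://www.w3.org/ns/person#>

SELECT ?id (COUNT(DISTINCT ?person) AS ?uriCount) WHERE {
  ?person a person:Person ;
          adms:identifier/skos:notation ?id .
}
GROUP BY ?id
HAVING (COUNT(DISTINCT ?person) > 1)
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
)
response.raise_for_status()
for row in response.json()["results"]["bindings"]:
    print(row["id"]["value"], "->", row["uriCount"]["value"], "URIs")
```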

2. No file translations

  • We would prefer a solution that can handle linked data immediately, without converting to csv, so we can generate data quality reporting for the business. CSV would be the fallback solution.

Are solutions possible via a REST API?
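In principle yes, since a SPARQL endpoint already behaves like a REST-style API over HTTP: request the results as JSON and build the report in memory, with no file translation. A minimal sketch, assuming a pandas-based reporting step (the endpoint is a placeholder):

```python
# Hedged sketch of a csv-free path: pull SPARQL JSON results straight
# into a pandas DataFrame, so reports are built from the linked data in
# memory instead of via exported files. Endpoint is a placeholder.
import pandas as pd
import requests

ENDPOINT = "https://example.org/sparql"  # hypothetical

def run_report(query: str) -> pd.DataFrame:
    response = requests.get(
        ENDPOINT,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    response.raise_for_status()
    rows = response.json()["results"]["bindings"]
    return pd.DataFrame(
        {var: cell["value"] for var, cell in row.items()} for row in rows
    )
```

The security implications noted under Discussion points still apply: the endpoint would need proper authentication before being exposed beyond the team.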

3. Automate the update process

🕵️‍♂️ Use Cases

See expandable section: We have different reporting goals.

🤔 Discussion points

  • Can we do reporting without file export/conversions?

  • What tool are we going to use?

    • MS Office: Excel (uses Power Query in the ETL process)

    • MS Office: PowerBI (uses Power Query in the ETL process)

    • Work via REST API? > has possible security implications

    • Other suggestions

  • We have the login issue via ACM/IDM on the QA and PROD environments.

    • Not an issue when we use the dashboard of Loket

Solution

The DEV team will re-use the dashboard reporting of Loket.

Design (N/A)

This part of the feature passport is owned by the designer

User research

[If there is any user research preceding the wireframe mock-up stage, it needs to be documented here]

Mock-ups

[link to figma mockups + any explanation or extra documentation]

Technical

This part of the feature passport is owned by the technical team

[Information about the technical solutions for expectations that need it - e.g. using mu-search for showing all types of positions in one table.]

After the designer and/or the technical team finish their task, a meeting follows where the solutions are presented. The team exchanges feedback and amends the feature passport where necessary.
