Implement workflow for cleaning a data | CDQ Data Quality as Service

Implement workflow for cleaning a data storage

Currently, data has to be uploaded, processed by a service, validated, and then manually adapted before it can be uploaded again and processed by another service (e.g. first correct & enrich legal information, then enrich missing address information). Re-uploading from a report is not easily possible, since there is no way to select a rule such as "update street name if accuracy indicator >= 4", because a) no such selection criteria are available in the mapping and b) the field names often vary.

Therefore, a workflow would be helpful in which one can select the sequence of steps by which data should be curated, e.g. first remove duplicates based on duplicate matching configuration x, then correct & enrich legal information if the overall matching score is > 0.9, and finally enrich selected address information.
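
To make the request concrete, such a workflow could be sketched as an ordered list of steps that each take and return the same records. This is only an illustration in Python, not an existing CDQ API: all function names and the field names (`duplicate_group`, `matching_score`, `legal_suggestion`, `address_suggestion`) are hypothetical, and it assumes the 0.9 threshold applies to the legal enrichment step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Step:
    name: str
    apply: Callable[[List[Record]], List[Record]]


def remove_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first record of each duplicate group (hypothetical field)."""
    seen = set()
    kept = []
    for r in records:
        group = r.get("duplicate_group")
        if group is not None:
            if group in seen:
                continue  # a record of this group was already kept
            seen.add(group)
        kept.append(r)
    return kept


def enrich_legal(records: List[Record], min_score: float = 0.9) -> List[Record]:
    """Apply the suggested legal name only if the matching score exceeds min_score."""
    out = []
    for r in records:
        suggestion = r.get("legal_suggestion")
        if suggestion and r.get("matching_score", 0.0) > min_score:
            r = {**r, "legal_name": suggestion}
        out.append(r)
    return out


def enrich_address(records: List[Record], fields=("street",)) -> List[Record]:
    """Fill only the selected address fields, and only where they are empty."""
    out = []
    for r in records:
        updated = dict(r)
        suggestion = r.get("address_suggestion") or {}
        for f in fields:
            if not updated.get(f) and f in suggestion:
                updated[f] = suggestion[f]
        out.append(updated)
    return out


def run_workflow(records: List[Record], steps: List[Step]) -> List[Record]:
    """Run the steps in the configured order; each step sees the previous output."""
    for step in steps:
        records = step.apply(records)
    return records


# The order from the example: deduplicate, then legal data, then address data.
WORKFLOW = [
    Step("remove duplicates", remove_duplicates),
    Step("correct & enrich legal information", enrich_legal),
    Step("enrich selected address information", enrich_address),
]
```

Because every step keeps the record layout of its input, the result of `run_workflow(records, WORKFLOW)` could be written back in the input format.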

The output should be a file in the format of the input that can then easily be consumed by the source system.

Also, each process step should provide an output for manual checks within a given confidence interval. After the manual check and classification (e.g. duplicate yes/no), the workflow should continue.
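
One possible reading of this requirement, sketched in Python: results above an upper bound are accepted automatically, results below a lower bound are rejected, and everything in between is exported for manual review. The function names, the `score` and `id` fields, and the bounds 0.6 and 0.9 are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Candidate = Dict[str, object]


def split_for_review(
    candidates: List[Candidate], lower: float = 0.6, upper: float = 0.9
) -> Tuple[List[Candidate], List[Candidate], List[Candidate]]:
    """Split step results into auto-accepted, manual-review, and auto-rejected."""
    accepted, review, rejected = [], [], []
    for c in candidates:
        score = c["score"]
        if score >= upper:
            accepted.append(c)
        elif score >= lower:
            review.append(c)  # inside the confidence interval: needs a human decision
        else:
            rejected.append(c)
    return accepted, review, rejected


def apply_decisions(
    review: List[Candidate], decisions: Dict[object, bool]
) -> Tuple[List[Candidate], List[Candidate]]:
    """Resume after the manual check; decisions maps candidate id to yes/no."""
    missing = [c["id"] for c in review if c["id"] not in decisions]
    if missing:
        # The workflow must not continue until every review item is classified.
        raise ValueError(f"undecided review items: {missing}")
    confirmed = [c for c in review if decisions[c["id"]]]
    declined = [c for c in review if not decisions[c["id"]]]
    return confirmed, declined
```

The workflow would pause after `split_for_review`, hand the `review` list to the user, and continue with the next step once `apply_decisions` succeeds.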

Josef Reissner
May 25 2021
Planned

New Service

Comments (2)
Votes (17)


Christian Käsler commented

September 08, 2021 13:38

As a first step, it would be great if we could upload/update individual fields in the data mirror without having to upload the entire record. For example, I validated the VAT and now want to upload a column "VAT Status" to the data mirror. Ideally, I could then upload an Excel file with two columns (Customer Number and VAT Status) and not the entire dataset.
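
The partial update described here amounts to a merge on a key column. A minimal Python sketch, assuming records are held as plain dictionaries and that "Customer Number" identifies a record uniquely; `apply_partial_update` is a hypothetical helper, not a data mirror function:

```python
from typing import Dict, List

Row = Dict[str, str]


def apply_partial_update(
    mirror: List[Row], updates: List[Row], key: str = "Customer Number"
) -> List[str]:
    """Write only the uploaded columns into matching records, in place.

    Returns the key values from the upload that matched no existing record.
    """
    by_key = {row[key]: row for row in mirror}
    unmatched = []
    for upd in updates:
        target = by_key.get(upd[key])
        if target is None:
            unmatched.append(upd[key])
            continue
        for field, value in upd.items():
            if field != key:
                target[field] = value  # all other fields stay untouched
    return unmatched
```

With an upload containing only the columns "Customer Number" and "VAT Status", every other field of the stored record is left as it was, and unknown customer numbers are reported instead of silently creating new records.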

Lorna Sequerah commented

May 28, 2021 07:42
"The output should be a file in the format of the input that can then easily be consumed by the source system" - Very important to have.
- At the very least to be able to output the curation results into the same format as the input file or
- Have an app where the user can select fields to be consumed.
Thanks.