CDQ Data Quality as Service

Idea Portal

Implement workflow for cleaning a data storage

Currently, data has to be uploaded, a service run on it, the results validated and then manually adapted before the data is uploaded again and processed by another service (e.g. first correct and enrich legal information, then enrich missing address information). Uploading from a report is not easily possible because there is no selection such as "update street name if accuracy indicator >= 4", since a) no such selection criteria are available in the mapping and b) the field names often vary.
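To illustrate the kind of selection that is missing, here is a minimal sketch of a conditional update rule that also tolerates varying field names. The alias table, the function names `resolve` and `apply_conditional_update`, and the column names are all hypothetical assumptions for illustration, not part of any CDQ API.

```python
from typing import Dict, List, Optional

# Hypothetical alias table: the same logical field can arrive under
# different column names depending on the source report.
FIELD_ALIASES: Dict[str, List[str]] = {
    "street": ["street", "Street", "street_name"],
    "accuracy": ["accuracy_indicator", "Accuracy", "acc_ind"],
}


def resolve(record: Dict[str, str], logical: str) -> Optional[str]:
    """Return the first column name in `record` matching a known alias."""
    for alias in FIELD_ALIASES[logical]:
        if alias in record:
            return alias
    return None


def apply_conditional_update(
    source: Dict[str, str], report: Dict[str, str], min_accuracy: int = 4
) -> Dict[str, str]:
    """Copy the street name from `report` into a copy of `source`, but only
    if the report's accuracy indicator is >= min_accuracy."""
    updated = dict(source)  # never mutate the caller's record
    acc_col = resolve(report, "accuracy")
    src_street = resolve(source, "street")
    rep_street = resolve(report, "street")
    if acc_col is None or src_street is None or rep_street is None:
        return updated  # fields cannot be mapped: leave the record unchanged
    try:
        accuracy = int(report[acc_col])
    except ValueError:
        return updated  # non-numeric indicator: skip rather than guess
    if accuracy >= min_accuracy:
        updated[src_street] = report[rep_street]
    return updated
```

For example, a source record with a `Street` column is updated from a report that calls the same field `street_name` only when `acc_ind` is at least 4; with an indicator of 3, the record comes back unchanged.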


Therefore, a workflow would be useful in which one can select the algorithms by which the data should be curated, e.g. first remove duplicates based on duplicate matching configuration x, then correct and enrich legal information if the overall matching score is > 0.9, and finally enrich selected address information.
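A workflow like the one described above could be modelled as an ordered list of steps, each with an optional per-record condition. This is a minimal sketch under assumptions: `Step`, `run_workflow` and the three step functions are hypothetical placeholders that only tag records, standing in for the real deduplication and enrichment services.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Step:
    """One curation step; `condition` decides which records it is applied to."""
    name: str
    run: Callable[[List[Record]], List[Record]]
    condition: Callable[[Record], bool] = lambda record: True


def run_workflow(records: List[Record], steps: List[Step]) -> List[Record]:
    """Apply the steps in order. Records failing a step's condition skip that
    step and are passed on unchanged (placed after the processed ones)."""
    for step in steps:
        selected: List[Record] = []
        skipped: List[Record] = []
        for record in records:
            (selected if step.condition(record) else skipped).append(record)
        records = step.run(selected) + skipped
    return records


# Hypothetical stand-ins for the real services.
def remove_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first record per name (placeholder for matching configuration x)."""
    seen, result = set(), []
    for record in records:
        if record["name"] not in seen:
            seen.add(record["name"])
            result.append(record)
    return result


def enrich_legal(records: List[Record]) -> List[Record]:
    return [{**r, "legal_checked": True} for r in records]


def enrich_address(records: List[Record]) -> List[Record]:
    return [{**r, "address_checked": True} for r in records]


WORKFLOW = [
    Step("remove duplicates", remove_duplicates),
    Step("legal", enrich_legal, lambda r: r["matching_score"] > 0.9),
    Step("address", enrich_address),
]
```

Keeping the condition separate from the step function means the same service can be reused in different workflows with different thresholds.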


The output should be a file in the format of the input that can then easily be consumed by the source system.
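One way to meet this requirement for CSV input is to reuse the input file's header order and delimiter when writing the curated records, silently dropping any helper columns added during curation. This sketch assumes CSV input with a known delimiter; `write_like_input` is a hypothetical name, and other input formats (e.g. Excel) would need their own writer.

```python
import csv
import io
from typing import Dict, List


def write_like_input(
    original_csv: str, curated: List[Dict[str, str]], delimiter: str = ","
) -> str:
    """Serialize curated records with the input's own column order and
    delimiter; extra columns are ignored, missing ones are left empty."""
    header = next(csv.reader(io.StringIO(original_csv), delimiter=delimiter))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=header,
        delimiter=delimiter,
        extrasaction="ignore",  # drop helper columns such as accuracy scores
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(curated)
    return out.getvalue()
```

For example, curating `customer_no;name;street` data and adding an `accuracy` column still yields a file with exactly the original three semicolon-separated columns.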


Also, each process step should provide an output for manual checks of results within a given confidence interval. After the manual check and classification (e.g. duplicate yes/no), the workflow should continue.
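The manual-check step could work as a three-way split by confidence score: results at or above an upper bound are accepted automatically, results below a lower bound are rejected, and everything in between is queued for manual review; once decisions arrive, the workflow resumes. This is a sketch under assumptions: the `score` and `pair_id` fields, the bounds, and both function names are hypothetical.

```python
from typing import Dict, List, Tuple

Candidate = Dict[str, object]


def route_by_confidence(
    candidates: List[Candidate], lower: float, upper: float
) -> Tuple[List[Candidate], List[Candidate], List[Candidate]]:
    """Split candidates into (accepted, review, rejected).
    Scores in [lower, upper) go to manual review."""
    accepted, review, rejected = [], [], []
    for c in candidates:
        score = c["score"]
        if score >= upper:
            accepted.append(c)
        elif score < lower:
            rejected.append(c)
        else:
            review.append(c)
    return accepted, review, rejected


def resume_after_review(
    accepted: List[Candidate], review: List[Candidate], decisions: Dict[str, bool]
) -> Tuple[List[Candidate], List[Candidate]]:
    """Merge manually confirmed items (decision True, e.g. 'duplicate: yes')
    into the accepted set; items without a decision stay pending."""
    confirmed = [c for c in review if decisions.get(c["pair_id"]) is True]
    pending = [c for c in review if c["pair_id"] not in decisions]
    return accepted + confirmed, pending
```

With bounds of 0.7 and 0.95, a pair scored 0.85 lands in the review queue; it only continues to the next step once a reviewer has classified it.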

  • Josef Reissner
  • May 25 2021
  • Planned
  • Christian Käsler commented
    September 08, 2021 13:38

    As a first step, it would be great if we could upload or update individual fields in the data mirror without having to upload the entire record. For example, I have validated the VAT numbers and now want to upload a "VAT Status" column to the data mirror. Ideally, I could then upload an Excel file with just two columns (Customer Number and VAT Status) instead of the entire dataset.
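The partial update suggested in this comment amounts to a keyed merge that touches only the columns present in the uploaded file. In this sketch, the data mirror is represented as a plain dictionary keyed by customer number, and the Excel file is assumed to have been exported to CSV; `apply_partial_update` is a hypothetical name, not an existing CDQ function.

```python
import csv
import io
from typing import Dict, List, Tuple


def apply_partial_update(
    mirror: Dict[str, Dict[str, str]],
    update_csv: str,
    key_column: str = "Customer Number",
) -> Tuple[int, List[str]]:
    """Update only the columns present in `update_csv` for matching keys.
    Returns (number of records updated, keys not found in the mirror)."""
    reader = csv.DictReader(io.StringIO(update_csv))
    if reader.fieldnames is None or key_column not in reader.fieldnames:
        raise ValueError(f"missing key column {key_column!r}")
    updated, unknown = 0, []
    for row in reader:
        key = row.pop(key_column)
        if key not in mirror:
            unknown.append(key)  # report instead of creating new records
            continue
        mirror[key].update(row)  # all other fields stay untouched
        updated += 1
    return updated, unknown
```

Reporting unknown keys rather than inserting them keeps a two-column upload from accidentally creating incomplete records.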

  • Lorna Sequerah commented
    May 28, 2021 07:42

    "The output should be a file in the format of the input that can then easily be consumed by the source system" - Very important to have.

    • At the very least to be able to output the curation results into the same format as the input file or

    • Have an app where the user can select fields to be consumed.

    Thanks.
