CDQ Data Quality as Service

Idea Portal

Implement workflow for cleaning a data storage

Currently, data has to be uploaded, processed by a service, validated, and then manually adapted before it can be uploaded again and processed by another service (e.g. first correct & enrich legal information, then enrich missing address information). Uploading directly from a report is not easily possible, because there is no selection such as "update street name if accuracy indicator >= 4": a) no such selection criteria are available in the mapping, and b) the field names often vary.
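To make the missing selection concrete, here is a minimal Python sketch of a rule like "update street name if accuracy indicator >= 4" that also copes with varying field names. It is an illustration only, not an existing CDQ feature: the alias list `STREET_ALIASES`, the report keys `street` and `accuracy`, and both function names are assumptions.

```python
from typing import Dict, Optional, Sequence

# Hypothetical aliases under which a source system might store the street name.
STREET_ALIASES = ("street", "Street Name", "STRASSE")


def find_field(record: Dict[str, str], aliases: Sequence[str]) -> Optional[str]:
    """Return the first alias that exists as a field in the record, if any."""
    for name in aliases:
        if name in record:
            return name
    return None


def update_street_if_accurate(
    record: Dict[str, str], report_row: Dict[str, str], min_accuracy: int = 4
) -> Dict[str, str]:
    """Copy the curated street into the record only if the accuracy is high enough."""
    updated = dict(record)  # leave the input record untouched
    field = find_field(record, STREET_ALIASES)
    if field is not None and int(report_row["accuracy"]) >= min_accuracy:
        updated[field] = report_row["street"]
    return updated


# Assumed sample data: one source record and two possible report rows.
source = {"id": "1", "STRASSE": "Hauptstr 1"}
good = update_street_if_accurate(source, {"street": "Hauptstrasse 1", "accuracy": "5"})
weak = update_street_if_accurate(source, {"street": "Hauptstrasse 1", "accuracy": "3"})
```

With the sample data, `good` carries the corrected street under the source system's own field name, while `weak` stays unchanged because its accuracy is below the threshold.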


A workflow would therefore be helpful in which one can select the algorithms, and their order, by which the data is curated, e.g. first remove duplicates based on duplicate matching configuration x, then correct & enrich legal information if the overall matching score is > 0.9, and finally enrich selected address information.
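Such a workflow could be pictured as an ordered list of steps applied one after another. The Python sketch below is only an illustration under stated assumptions: the step functions, the field names (`id`, `name`, `city`), the `lookup` structure, and the naive duplicate key are all hypothetical stand-ins for the real services and matching configurations.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Step = Callable[[List[Record]], List[Record]]


def remove_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first record per normalized (name, city) key.

    This naive key is an assumed stand-in for a duplicate matching configuration.
    """
    seen = set()
    unique = []
    for record in records:
        key = (record["name"].strip().lower(), record["city"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def enrich_legal_info(
    records: List[Record], lookup: Dict[str, dict], min_score: float = 0.9
) -> List[Record]:
    """Apply the looked-up legal name only if the matching score exceeds min_score."""
    result = []
    for record in records:
        updated = dict(record)
        match = lookup.get(record["id"])
        if match is not None and match["score"] > min_score:
            updated["name"] = match["legal_name"]
        result.append(updated)
    return result


def run_workflow(records: List[Record], steps: List[Step]) -> List[Record]:
    """Run each step on the output of the previous one."""
    for step in steps:
        records = step(records)
    return records


# Assumed sample data and lookup results.
records = [
    {"id": "1", "name": "Acme Gmbh", "city": "Berlin"},
    {"id": "2", "name": "acme gmbh ", "city": "berlin"},
    {"id": "3", "name": "Foo AG", "city": "Zurich"},
]
lookup = {
    "1": {"legal_name": "ACME GmbH", "score": 0.95},
    "3": {"legal_name": "Foo Holding AG", "score": 0.6},
}
steps = [remove_duplicates, lambda rs: enrich_legal_info(rs, lookup, min_score=0.9)]
cleaned = run_workflow(records, steps)
```

In this example, record 2 is dropped as a duplicate of record 1, record 1 receives the legal name because its score is above 0.9, and record 3 is left as it was.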


The output should be a file in the format of the input that can then easily be consumed by the source system.


In addition, each process step should provide an output for manual checks within a given confidence interval. After the manual check and classification (e.g. duplicate yes/no), the workflow should continue.
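The manual check could work roughly as follows: results above an upper threshold are accepted automatically, results inside the confidence interval go to a reviewer, and the workflow resumes once every reviewed item has a decision. This Python sketch assumes hypothetical thresholds (0.6 and 0.9), a hypothetical candidate structure with `pair` and `score` keys, and made-up function names.

```python
from typing import Any, Dict, List, Tuple

Candidate = Dict[str, Any]


def split_by_confidence(
    candidates: List[Candidate], low: float = 0.6, high: float = 0.9
) -> Tuple[List[Candidate], List[Candidate], List[Candidate]]:
    """Split candidates into auto-accepted, manual-review, and auto-rejected lists."""
    accepted, review, rejected = [], [], []
    for candidate in candidates:
        score = candidate["score"]
        if score >= high:
            accepted.append(candidate)
        elif score >= low:
            review.append(candidate)
        else:
            rejected.append(candidate)
    return accepted, review, rejected


def apply_manual_decisions(
    review: List[Candidate], decisions: Dict[str, bool]
) -> Tuple[List[Candidate], List[Candidate]]:
    """Return (confirmed, pending); items without a decision stay pending."""
    confirmed, pending = [], []
    for candidate in review:
        decision = decisions.get(candidate["pair"])
        if decision is None:
            pending.append(candidate)
        elif decision:
            confirmed.append(candidate)
    return confirmed, pending


# Assumed duplicate candidates with matching scores.
candidates = [
    {"pair": "1-2", "score": 0.97},
    {"pair": "3-4", "score": 0.75},
    {"pair": "5-6", "score": 0.30},
]
accepted, review, rejected = split_by_confidence(candidates)

# The reviewer classifies pair 3-4 as a duplicate (duplicate yes/no).
confirmed, pending = apply_manual_decisions(review, {"3-4": True})
duplicates = accepted + confirmed
can_continue = not pending  # the workflow resumes only when nothing is pending
```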

  • Josef Reissner
  • May 25 2021
  • Planned
    • Christian Käsler commented
      September 08, 2021 13:38

      In a first step, it would be great if we can upload/update individual fields into the data mirror without having to upload the entire record. For example, I validated the VAT and now want to upload a column "VAT Status" to the data mirror. Ideally, I can then upload an Excel file with two columns (Customer Number and VAT Status) and not the entire dataset.
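A partial update of this kind can be sketched as a merge on the key column. The Python example below is only an illustration: CSV text stands in for the Excel file and the data mirror, and the function name and sample values are assumptions rather than an existing CDQ API.

```python
import csv
import io
from typing import Dict, List


def apply_partial_update(
    records: List[Dict[str, str]], update_rows: List[Dict[str, str]], key: str
) -> List[Dict[str, str]]:
    """Merge the columns of update_rows into the records that share the same key."""
    by_key = {row[key]: row for row in update_rows}
    merged = []
    for record in records:
        patch = by_key.get(record[key], {})
        merged.append({**record, **patch})  # fields not in the patch are kept
    return merged


# Assumed stand-ins for the data mirror and the two-column upload.
mirror_csv = "Customer Number,Name,City\nC1,Acme,Berlin\nC2,Foo,Zurich\n"
update_csv = "Customer Number,VAT Status\nC1,valid\n"

mirror = list(csv.DictReader(io.StringIO(mirror_csv)))
update = list(csv.DictReader(io.StringIO(update_csv)))
result = apply_partial_update(mirror, update, key="Customer Number")
```

Here only customer C1 gains a "VAT Status" value; C2 is left untouched, and no other field of either record has to be uploaded again.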

    • Lorna Sequerah commented
      May 28, 2021 07:42

      "The output should be a file in the format of the input that can then easily be consumed by the source system" - Very important to have.

      • At the very least, be able to output the curation results in the same format as the input file, or

      • Have an app where the user can select fields to be consumed.

      Thanks.
