
The Data Quality Conditions to check

The Process Mining solution needs only a few data controls to produce a viable, usable dataset.

The Minimum Data Checks to perform

  • CONTROL_M1: The Process Flow Identifier (PFI-KEY)
    • Must exist (not NULL)
    • Can be a String or a Number
  • CONTROL_M2: The Step Name (SN-KEY)
    • Must exist (not NULL)
    • Can be a String or a Number
  • CONTROL_M3: The Timestamp (T-KEY)
    • Must exist (not NULL)
    • [MOST OF THE TIME] Should respect a specific date format (or one of several accepted formats).
  • CONTROL_M4: The tuple (PFI-KEY, SN-KEY, T-KEY)
    • Must be unique
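
The four controls above can be sketched in plain Python. This is a minimal illustration, not the Process Mining solution's own implementation: the column names `pfi_key`, `sn_key` and `t_key`, the helper names, and the sample events are all assumptions made for the example. Blank strings are treated as NULL here, which is also an assumption.

```python
# Hypothetical column names standing in for PFI-KEY, SN-KEY and T-KEY.
CONTROLS = {
    "pfi_key": "CONTROL_M1",
    "sn_key": "CONTROL_M2",
    "t_key": "CONTROL_M3",
}


def is_missing(value):
    """Treat None and blank strings as NULL (an assumption of this sketch)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def run_minimum_checks(rows):
    """Return (control, row_index, message) for every violation found."""
    violations = []

    # CONTROL_M1 to CONTROL_M3: each key must exist (not NULL).
    for index, row in enumerate(rows):
        for key, control in CONTROLS.items():
            if is_missing(row.get(key)):
                violations.append((control, index, f"{key} is NULL"))

    # CONTROL_M4: the tuple (PFI-KEY, SN-KEY, T-KEY) must be unique.
    seen = set()
    for index, row in enumerate(rows):
        triple = tuple(row.get(key) for key in CONTROLS)
        if triple in seen:
            violations.append(
                (control_m4 := "CONTROL_M4", index, "duplicate (PFI-KEY, SN-KEY, T-KEY)")
            )
        seen.add(triple)

    return violations


# Made-up sample events, for illustration only.
events = [
    {"pfi_key": "ORDER-1", "sn_key": "Create", "t_key": "2023-01-05 10:00:00"},
    {"pfi_key": "ORDER-1", "sn_key": "Approve", "t_key": "2023-01-05 11:30:00"},
    {"pfi_key": None, "sn_key": "Approve", "t_key": "2023-01-06 09:00:00"},
    {"pfi_key": "ORDER-1", "sn_key": "Create", "t_key": "2023-01-05 10:00:00"},
]

violations = run_minimum_checks(events)
for control, index, message in violations:
    print(f"{control} row {index}: {message}")
```

With this sample, the third row fails CONTROL_M1 and the fourth row fails CONTROL_M4 because it repeats the first row's tuple.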

These are the minimum controls we need to ensure. Of course, other optional data controls may also be considered, such as controls on the additional and optional fields.

How to check the data with a Data Profiling tool

Checking Null Values with a Data Profiling tool:

Checking a format (here a Date) with the DQA Kit:
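
Outside a profiling tool, the same kind of format check can be approximated in a few lines of Python. The sketch below is not the DQA Kit; it uses only the standard library, and the two accepted formats, the function names, and the sample values are assumptions chosen for illustration. It accepts a timestamp if it parses under at least one of the listed formats, matching the "specific date format (or several)" rule of CONTROL_M3.

```python
from datetime import datetime

# Assumed accepted formats; replace with the formats your dataset actually uses.
ACCEPTED_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M")


def matching_format(value, formats=ACCEPTED_FORMATS):
    """Return the first format the value parses under, or None if none match."""
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
        except (ValueError, TypeError):
            # ValueError: wrong format; TypeError: value is not a string (e.g. None).
            continue
        return fmt
    return None


def invalid_timestamps(values, formats=ACCEPTED_FORMATS):
    """Return (index, value) for every value that matches no accepted format."""
    return [
        (index, value)
        for index, value in enumerate(values)
        if matching_format(value, formats) is None
    ]


# Made-up sample values, for illustration only.
samples = [
    "2023-01-05 10:00:00",
    "05/01/2023 10:00",
    "2023-13-01 10:00:00",  # month 13 does not exist
    "yesterday",
]

for index, value in invalid_timestamps(samples):
    print(f"row {index}: unrecognized timestamp {value!r}")
```

Returning the matched format rather than a boolean makes it easy to count how many rows use each format, which is the kind of summary a profiling tool typically reports.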
