Checking the existence
The screenshot below shows the NULL detection we must perform for the SN-KEY (the field Event):
Table Profiling with Ataccama
Like the PFI-KEY, this field must not be empty. In the case above, 4% of the records in the dataset have a NULL SN-KEY.
We’ll have to see with the business user what to do with these 4% of records:
- Do we filter out these rows?
- Do we create a dummy Step Name with another name (“HOLD” for example)?
- Can we attach these rows to another Step Name using the other data in the record?
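As a rough illustration, the NULL check and the first two options could be sketched in plain Python as follows. The `rows` sample, the `step_name` column name and the helper functions are hypothetical and only stand in for the real dataset; treating blank strings as missing is also an assumption. The third option depends on business rules, so it is not sketched here.

```python
# Hypothetical event-log rows; "step_name" stands in for the SN-KEY column.
rows = [
    {"case_id": "C1", "step_name": "Order Created"},
    {"case_id": "C1", "step_name": None},
    {"case_id": "C2", "step_name": "Stock Check"},
    {"case_id": "C2", "step_name": ""},
    {"case_id": "C3", "step_name": "Order Created"},
]


def is_missing(value):
    """Treat None and blank strings as missing (an assumption, adjust per source)."""
    return value is None or str(value).strip() == ""


def null_ratio(rows, field):
    """Return the fraction of rows whose field is missing."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if is_missing(r.get(field)))
    return missing / len(rows)


def filter_out_missing(rows, field):
    """Option 1: drop the rows that have no step name."""
    return [r for r in rows if not is_missing(r.get(field))]


def fill_missing(rows, field, placeholder="HOLD"):
    """Option 2: replace missing step names with a dummy value (copies the rows)."""
    return [
        {**r, field: placeholder} if is_missing(r.get(field)) else dict(r)
        for r in rows
    ]


print(f"{null_ratio(rows, 'step_name'):.0%} of rows have no step name")
```

Both helpers return new lists and leave the original rows untouched, so the two options can be compared side by side before a decision is made with the business user.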
Checking the uniqueness
This field is not expected to be unique. On the contrary, the same step names should recur many times across the records! In the example above we have a low percentage of non-unique values (8%) and many duplicates (88%), which makes sense.
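A minimal sketch of such a uniqueness check is shown below. The definitions of "unique", "non-unique" and "duplicate" used here are assumptions made for this example, and the profiling tool may compute its percentages differently; the `steps` sample is hypothetical.

```python
from collections import Counter


def uniqueness_profile(values):
    """Summarize how often the non-missing values repeat.

    Definitions used in this sketch (assumptions, not the tool's own):
    - distinct: number of different values
    - unique: values that appear exactly once
    - non_unique: values that appear more than once
    - duplicate_rows: rows beyond the first occurrence of each value
    """
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "rows": len(present),
        "distinct": distinct,
        "unique": unique,
        "non_unique": distinct - unique,
        "duplicate_rows": len(present) - distinct,
    }


# Hypothetical SN-KEY values
steps = ["Order Created", "Stock Check", "Order Created", "Shipped",
         "Stock Check", "Order Created", None]
profile = uniqueness_profile(steps)
print(profile)
```

For a step name field, a healthy profile has few distinct values compared with the number of rows; a field where almost every value is unique would suggest that it does not really hold step names.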
Checking the Step Name Standard
When the data comes from many different sources, each source may name its own process flow steps differently. Sometimes steps can be:
- Similar in meaning but named differently
- Coded in different ways, which raises the question of how to reconcile them
- Spread across different fields that need to be combined
So we may have to standardize the steps to have something relevant and accurate. The easiest way to start is to get the Frequency distribution of the SN-KEY:
Table Profiling with Ataccama : Frequency distribution of SN-KEY
In the example above we have 2 data issues:
- The field values “HO” and “Hold Order” should be merged into a single value as they have the same meaning
- The field values “Stock Check” and “Stock Checking” should also be merged into a single value as they have the same meaning
This correction will be performed a bit later, in the Data preparation stage, during the Event Dictionary Workshop with the business. At this stage the purpose is to identify the gaps between the real step names (from the different data sources) and what is expected.
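The gap identification described above can be sketched in a few lines of Python. The `EXPECTED_STEPS` list and the `observed` sample are hypothetical; in practice the expected names would come from the business, and which spelling is kept as the standard one is their decision.

```python
from collections import Counter

# Hypothetical list of step names agreed with the business.
EXPECTED_STEPS = {"Order Created", "Hold Order", "Stock Check", "Shipped"}


def frequency_distribution(values):
    """Return (value, count) pairs sorted from most to least frequent."""
    return Counter(v for v in values if v is not None).most_common()


def unexpected_steps(values, expected):
    """Return the observed step names that are not in the expected list."""
    observed_names = {v for v in values if v is not None}
    return sorted(observed_names - set(expected))


# Hypothetical SN-KEY values gathered from several data sources
observed = ["Order Created", "HO", "Hold Order", "Stock Check",
            "Stock Checking", "Stock Check", "Shipped", "Order Created"]

for step, count in frequency_distribution(observed):
    print(f"{step:<15} {count}")

print("To review with the business:", unexpected_steps(observed, EXPECTED_STEPS))
```

The sketch only lists the candidates for review; it deliberately does not rewrite any value, since the actual merge belongs to the Data preparation stage.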