Profile each involved and identified table/file separately (to find potential candidates). This means checking the data quality of each field:
- Are there any null values (empty fields)?
- Here we must check CONTROL_M1, CONTROL_M2 and CONTROL_M3.
- Are there any duplicates? If so, what percentage of the records do they represent, and what is the frequency distribution of the different values?
- Are the formats consistent (especially for the Timestamp date)?
- Is there an acceptable standard deviation between the Process Flow Identifier (PFI-KEY) and the step name Event id?
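
The null, duplicate and format checks above can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed tool: the helper `profile_field`, the sample rows, the column names `PFI_KEY` and `TIMESTAMP`, and the timestamp pattern are all assumptions made for the example. Only `CONTROL_M1` is taken from the text.

```python
from collections import Counter
from datetime import datetime


def profile_field(rows, field, timestamp_format=None):
    """Return basic data-quality metrics for one field (hypothetical helper).

    rows: list of dicts, one dict per record.
    timestamp_format: optional strptime pattern; when given, non-null
    values that do not match it are counted as format errors.
    """
    values = [row.get(field) for row in rows]
    total = len(values)

    # Treat None and blank strings as null (empty) values.
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    null_count = total - len(non_null)

    # Frequency distribution of the distinct non-null values.
    frequencies = Counter(non_null)

    # A record counts as a duplicate when its value has been seen before.
    duplicate_count = len(non_null) - len(frequencies)
    duplicate_pct = 100.0 * duplicate_count / total if total else 0.0

    # Optional format check for timestamp-like fields.
    format_errors = 0
    if timestamp_format is not None:
        for v in non_null:
            try:
                datetime.strptime(str(v), timestamp_format)
            except ValueError:
                format_errors += 1

    return {
        "total": total,
        "null_count": null_count,
        "duplicate_count": duplicate_count,
        "duplicate_pct": duplicate_pct,
        "frequencies": dict(frequencies),
        "format_errors": format_errors,
    }


# Made-up sample records, for illustration only.
rows = [
    {"PFI_KEY": "A1", "CONTROL_M1": "x", "TIMESTAMP": "2020-01-01 10:00:00"},
    {"PFI_KEY": "A1", "CONTROL_M1": "", "TIMESTAMP": "2020-01-01 10:05:00"},
    {"PFI_KEY": "B2", "CONTROL_M1": None, "TIMESTAMP": "01/01/2020 10:10"},
    {"PFI_KEY": "C3", "CONTROL_M1": "y", "TIMESTAMP": "2020-01-01 10:15:00"},
]

key_profile = profile_field(rows, "PFI_KEY")
control_profile = profile_field(rows, "CONTROL_M1")
ts_profile = profile_field(rows, "TIMESTAMP", timestamp_format="%Y-%m-%d %H:%M:%S")

for name, result in [("PFI_KEY", key_profile),
                     ("CONTROL_M1", control_profile),
                     ("TIMESTAMP", ts_profile)]:
    print(name, result)
```

Running one such profile per field and per table/file gives a simple set of figures (null count, duplicate percentage, value frequencies, format errors) to bring to the review with the business. What counts as a null, and which timestamp format is expected, would need to be agreed for the real data.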
At the end of this stage we must review and validate the Data Profiling results with the business. If everyone is satisfied, we can jump directly to stage 3; otherwise we will have to go to stage 2.