In the first stage of the DQA, we analyzed the provided data sources. We may well have found some data issues, and at this stage the business may not be satisfied with the extracted data. This can happen for many reasons, such as:
- There’s rejected data to manage
- The Data Quality is not good enough (structure, format, rules, etc.)
- We may have to change the stratification (or granularity) level of the data
- The extracted fields do not match the business expectations (in the worst case, we could not identify one or more of the 3 mandatory fields)
For all of these reasons, we may have to transform the data. It is good practice to manage these transformations in a dedicated tool (such as an ETL or a data preparation solution) so that this work can be industrialized later.
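As a minimal illustration of why a repeatable pipeline matters, the sketch below keeps each transformation as a small, named step in an ordered list, so the same sequence can be re-applied to every new extraction. It uses only the Python standard library and is not tied to any ETL or data preparation product. The field names (`case_id`, `timestamp`) and the `dd/mm/yyyy HH:MM` source format are hypothetical assumptions chosen for the example.

```python
from datetime import datetime
from typing import Callable

Row = dict[str, str]
Step = Callable[[Row], Row]


def strip_whitespace(row: Row) -> Row:
    """Trim leading and trailing spaces from every value."""
    return {key: value.strip() for key, value in row.items()}


def normalize_timestamp(row: Row) -> Row:
    """Rewrite a hypothetical 'dd/mm/yyyy HH:MM' timestamp as ISO 8601."""
    # Raises ValueError if the value does not match the assumed format.
    parsed = datetime.strptime(row["timestamp"], "%d/%m/%Y %H:%M")
    return {**row, "timestamp": parsed.isoformat()}


# The ordered list of steps is the reusable part: the same pipeline
# can be re-applied, unchanged, to each new extraction.
PIPELINE: list[Step] = [strip_whitespace, normalize_timestamp]


def run_pipeline(rows: list[Row], steps: list[Step] = PIPELINE) -> list[Row]:
    """Apply every step, in order, to every row and return the new rows."""
    transformed = []
    for row in rows:
        for step in steps:
            row = step(row)
        transformed.append(row)
    return transformed
```

For example, `run_pipeline([{"case_id": " 42 ", "timestamp": " 05/03/2024 14:30 "}])` returns `[{"case_id": "42", "timestamp": "2024-03-05T14:30:00"}]`. Keeping each step as a separate function makes the transformations easy to review, and they map naturally onto the individual steps of a dedicated tool.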
It is also good practice to treat every row that has a Data Quality issue as a rejected line. The data expert and the business analyst then review these rejected lines to decide, for each row, whether it can be removed or needs to be transformed.
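A minimal sketch of this "rejected lines" pattern, assuming rows arrive as Python dictionaries: each quality rule returns an error message when a row fails it, and every failing row is set aside together with all of its reasons so that the data expert and the business analyst can review it. The rule set and the field names `case_id`, `activity` and `timestamp` (standing in for the 3 mandatory fields) are hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional

Row = dict[str, str]
# A rule returns an error message when the row fails, or None when it passes.
Rule = Callable[[Row], Optional[str]]

# Hypothetical names for the mandatory fields; adapt to the real extraction.
MANDATORY_FIELDS = ("case_id", "activity", "timestamp")


def missing_mandatory(row: Row) -> Optional[str]:
    missing = [name for name in MANDATORY_FIELDS if not row.get(name, "").strip()]
    return f"missing field(s): {', '.join(missing)}" if missing else None


def bad_timestamp(row: Row) -> Optional[str]:
    value = row.get("timestamp", "").strip()
    if not value:
        return None  # an empty value is already reported by missing_mandatory
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return f"unparseable timestamp: {value!r}"
    return None


RULES: list[Rule] = [missing_mandatory, bad_timestamp]


@dataclass
class Rejected:
    row: Row
    reasons: list[str]  # every rule the row failed, kept for the review


def split_rows(rows: list[Row], rules: list[Rule] = RULES) -> tuple[list[Row], list[Rejected]]:
    """Separate rows that pass every rule from rows that need a review."""
    accepted: list[Row] = []
    rejected: list[Rejected] = []
    for row in rows:
        reasons = [msg for rule in rules if (msg := rule(row)) is not None]
        if reasons:
            rejected.append(Rejected(row, reasons))
        else:
            accepted.append(row)
    return accepted, rejected
```

Collecting every failed rule per row, rather than stopping at the first, gives the reviewers the full picture when they decide whether a row should be removed or transformed.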
Once this work is done, we can go back to the previous step and re-check the quality of the data.
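To make the re-check comparable from one iteration to the next, one option is to run the same set of quality rules before and after the transformations and compare a small summary. The sketch below is a hypothetical helper, not part of any specific tool: each rule is a named predicate that returns `True` when a row passes, and the report gives the share of fully valid rows plus a failure count per rule.

```python
from collections import Counter
from typing import Callable, Iterable

Row = dict[str, str]
# A check returns True when the row satisfies it.
Check = Callable[[Row], bool]


def quality_report(rows: Iterable[Row], checks: dict[str, Check]) -> dict:
    """Summarize how many rows pass all checks and how often each check fails."""
    rows = list(rows)
    failures: Counter = Counter()
    valid = 0
    for row in rows:
        failed = [name for name, check in checks.items() if not check(row)]
        failures.update(failed)
        if not failed:
            valid += 1
    return {
        "rows": len(rows),
        "valid_ratio": valid / len(rows) if rows else 1.0,
        "failures": dict(failures),
    }
```

Calling `quality_report` on the raw extraction and again on the transformed data, with the same `checks`, shows whether the transformations actually reduced the issues found in the first stage.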