DQA Stage 3: Data Source Relationship detection

In practice, the data we want to gather often comes from several Data Sources. In that case, and based on the DCP (Data Collection Plan), we must decide how to join these Data Sources so that their datasets can be merged.

To manage this complex task, the Data Profiling tool can also help with:

  • Identifying each file's PKs (Primary Keys) and FKs (Foreign Keys).
  • Using that analysis to determine the potential relationships (joins) between Data Sources.
  • Analyzing PKs (e.g. showing duplicate key values).
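
The checks listed above can be approximated in a few lines of code. The following is a minimal Python sketch, not the Data Profiling tool's own algorithm. It assumes each file has been loaded as a list of dictionaries (one per row). It treats a column as a PK candidate when its values are all non-null and unique, and flags an FK candidate when most of a column's values appear among another file's PK candidate values. The function names, the `min_inclusion` threshold and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Any

# Assumption: each file is loaded as a list of rows, one dict per row.
Row = dict[str, Any]


def pk_candidates(rows: list[Row]) -> list[str]:
    """Columns whose values are all non-null and unique (hypothetical helper)."""
    if not rows:
        return []
    candidates = []
    for col in rows[0]:
        values = [row.get(col) for row in rows]
        if None not in values and len(set(values)) == len(values):
            candidates.append(col)
    return candidates


def duplicate_keys(rows: list[Row], col: str) -> dict[Any, int]:
    """Key values that occur more than once, with their counts."""
    counts = Counter(row.get(col) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def fk_candidates(
    child: list[Row], parent: list[Row], min_inclusion: float = 0.9
) -> list[tuple[str, str, float]]:
    """(child column, parent PK column, inclusion ratio) at or above the threshold."""
    if not child:
        return []
    results = []
    for pk in pk_candidates(parent):
        parent_keys = {row[pk] for row in parent}
        for col in child[0]:
            values = [row.get(col) for row in child if row.get(col) is not None]
            if not values:
                continue
            # Share of non-null child values that exist as parent keys.
            ratio = sum(v in parent_keys for v in values) / len(values)
            if ratio >= min_inclusion:
                results.append((col, pk, ratio))
    return results


# Hypothetical sample data, for illustration only.
products = [
    {"product_id": "P1", "label": "Pen"},
    {"product_id": "P2", "label": "Ink"},
    {"product_id": "P3", "label": "Pad"},
]
orders = [
    {"order_id": 1, "product_id": "P1", "qty": 2},
    {"order_id": 2, "product_id": "P2", "qty": 1},
    {"order_id": 3, "product_id": "P1", "qty": 2},
    {"order_id": 3, "product_id": "P9", "qty": 5},
]

print(pk_candidates(products))              # ['product_id', 'label']
print(pk_candidates(orders))                # []
print(duplicate_keys(orders, "order_id"))   # {3: 2}
print(fk_candidates(orders, products, min_inclusion=0.7))
# [('product_id', 'product_id', 0.75)]
```

Note that uniqueness alone does not make a column a key: `label` qualifies here only because the sample is small. A threshold below 1.0 tolerates some non-matching values, which is common when the sources do not enforce referential integrity. The trade-off is false positives, so candidates still need review by the Data Expert.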

This work helps the Data Expert build the functional Data Model (based on entities and relationships). Once the business has validated this model, the Data Profiling tool can also help verify its accuracy by running a join analysis, i.e. checking how many records actually match across each relationship.

Sometimes we run into issues when trying to link (or join) data that should be linked together. In the example above, we have two Data Sources: one for the orders and the other for the products. We would expect a strict relationship between them, since every order must reference an existing product. However, for various reasons, it is possible to find data that does not match (here, 5%). In this example, for instance, the two Data Sources are separate, so there is no referential integrity control like the one a database could enforce.
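
A join analysis like the one described above is essentially a referential-integrity check performed outside a database. For each order, look up its product key among the products and count what is left over. Below is a minimal Python sketch. It assumes the key columns (here a hypothetical `product_id`) have already been extracted from each source. `join_analysis` and `JoinReport` are illustrative names, not part of the Data Profiling tool, and the data is made up.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class JoinReport:
    """Outcome of matching child keys (e.g. orders) against parent keys (e.g. products)."""
    total: int
    matched: int
    orphans: list[Any] = field(default_factory=list)

    @property
    def orphan_rate(self) -> float:
        # Share of child records with no matching parent record.
        return len(self.orphans) / self.total if self.total else 0.0


def join_analysis(child_keys: Iterable[Any], parent_keys: Iterable[Any]) -> JoinReport:
    """Left anti-join style check: which child keys have no parent? (hypothetical helper)"""
    parent = set(parent_keys)
    report = JoinReport(total=0, matched=0)
    for key in child_keys:
        report.total += 1
        if key in parent:
            report.matched += 1
        else:
            report.orphans.append(key)
    return report


# Hypothetical data: 20 order lines, one referencing a product that does not exist.
product_ids = ["P1", "P2", "P3"]
order_product_ids = ["P1", "P2", "P3"] * 6 + ["P2", "P7"]

report = join_analysis(order_product_ids, product_ids)
print(report.matched, report.total)   # 19 20
print(report.orphans)                 # ['P7']
print(f"{report.orphan_rate:.0%} of orders do not match a product")  # 5% of orders ...
```

Keeping the orphan keys, not only the rate, lets the Data Expert inspect whether mismatches come from a missing product in the reference source, a data-entry error, or a key-format difference (such as case or leading zeros) that a normalization step might fix.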
