This is perhaps the biggest part of Process Mining work, or at least the most technical. In any case, this phase is fundamental, because having the right data is key to a good analysis. Who could provide good analytics with wrong data? No one. That’s why spending time making sure we can collect all the needed data, and then checking it, is NOT a waste of time.
Why? Simply because:
- Without good data, there’s no good analysis (worse, the outcome can be completely wrong!)
- We may need to feed the Process Mining solution regularly afterwards. In other words, we’ll have to put in place a real data pipeline between the data sources and the Process Mining solution if we need to evaluate, or just monitor and control, the process.
- It’s better to avoid problems before importing the data into the Process Mining solution. These solutions were not designed to be good Data Integration tools, and in any case it’s always better to separate duties. So let’s leave the Data Integration work to the solutions that have been designed for it (over the last 20-30 years).
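
To make the idea of checking the data before importing it a bit more concrete, here is a minimal sketch in Python. It assumes the extract is an event log with one row per event and three columns named `case_id`, `activity` and `timestamp`. Those names, and the `check_event_log` helper itself, are hypothetical and only there for illustration; a real project would adapt them to its own sources and would most likely run such checks in a proper Data Integration tool.

```python
from collections import Counter

# Hypothetical column names; adapt them to the actual source extract.
REQUIRED_COLUMNS = ("case_id", "activity", "timestamp")


def check_event_log(rows):
    """Return simple data quality counts for a list of event dicts."""
    # Count rows where at least one required column is missing or blank.
    missing = sum(
        1
        for row in rows
        if any(not str(row.get(col) or "").strip() for col in REQUIRED_COLUMNS)
    )
    # Count extra copies of rows that share the same case, activity and timestamp.
    keys = Counter(tuple(row.get(col) for col in REQUIRED_COLUMNS) for row in rows)
    duplicates = sum(count - 1 for count in keys.values() if count > 1)
    return {"rows": len(rows), "missing_values": missing, "duplicates": duplicates}


# Made-up sample data, for illustration only.
sample = [
    {"case_id": "A1", "activity": "Create order", "timestamp": "2023-01-02T09:00:00"},
    {"case_id": "A1", "activity": "Create order", "timestamp": "2023-01-02T09:00:00"},
    {"case_id": "A2", "activity": "", "timestamp": "2023-01-02T10:30:00"},
]
report = check_event_log(sample)
print(report)
```

On this sample, the report flags one row with a missing value and one duplicated event. The point is only that such problems are cheap to detect before the import and much harder to explain once they have distorted the discovered process.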
- Analyzing the Data Sources
- The Data Quality Conditions to check
- Checking duplicates
- Start Small … Think big !
- Data Sample size
- The DQA for Process Mining
- Preparing the data
- Looping back to the Data Quality Check (DQA)
- The Process Warehouse