Managing the first and last steps of imported Process flows is a typical task that must be done up front, before any analysis. The Process, as we know, is built from data extracted from systems and applications. To get this data we either query it from its data source or receive it as it is provided automatically (e.g. streamed logs). When querying or receiving the data we have to specify a criterion, and most of the time that criterion is not the Process Identifier itself but a date range.
The consequence is simple: the Process flows built from the collected data can be incomplete, and therefore inaccurate. Four cases can occur:
- We have the beginning of the Process but not the end, because the date range of the query cut the Process off
- We only have the end of the Process, for the same reason
- We have the middle of the Process, but neither the beginning nor the end
- Finally, we have all the Process steps!
The figure below illustrates these issues:
First of all, we need a strategy to detect the first three cases listed above (the fourth is the one where everything is fine). But how?
The first idea is to ask the business user which steps are acceptable as first and last steps. Then:
- We select the acceptable first steps and remove or filter out every Process flow that does not start with one of them
- We select the acceptable last steps and remove or filter out every Process flow that does not end with one of them
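As a minimal sketch of this filtering, assume the imported data is a list of `(case_id, timestamp, step)` tuples and that the business user has supplied the sets of acceptable first and last steps. The step names, the sample events, and the helper names `build_flows` and `keep_complete` are all hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical event log: (case_id, timestamp, step).
events = [
    ("A", 1, "Create Order"), ("A", 2, "Approve"), ("A", 3, "Ship"),
    ("B", 1, "Approve"), ("B", 2, "Ship"),          # beginning cut off
    ("C", 1, "Create Order"), ("C", 2, "Approve"),  # end cut off
]

# Assumed answers from the business user.
ALLOWED_FIRST = {"Create Order"}
ALLOWED_LAST = {"Ship"}


def build_flows(events):
    """Group events by case and order the steps of each case by timestamp."""
    flows = defaultdict(list)
    for case_id, _timestamp, step in sorted(events, key=lambda e: (e[0], e[1])):
        flows[case_id].append(step)
    return dict(flows)


def keep_complete(flows, allowed_first, allowed_last):
    """Keep only the flows that start and end with an acceptable step."""
    return {
        case_id: steps
        for case_id, steps in flows.items()
        if steps[0] in allowed_first and steps[-1] in allowed_last
    }


flows = build_flows(events)
complete = keep_complete(flows, ALLOWED_FIRST, ALLOWED_LAST)
print(complete)  # {'A': ['Create Order', 'Approve', 'Ship']}
```

Cases B and C are dropped here, which is exactly the behaviour whose cost is discussed next.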
Proceeding like this may remove a lot of the imported Process flows. It is probably the most common approach, but it has some weaknesses:
- Sometimes we cannot identify clearly enough what the first and last steps must be: there are many possibilities, so we cannot reliably discard the flows whose first or last step merely looks wrong
- One of the purposes of Process Mining is precisely to identify real issues in the way a Process is run. If we remove or hide “bad processes” like these, we will not be able to analyze those issues.
Secondly, once these Process flows have been detected, we have two options:
- We remove these Processes from the analysis scope
- We set these Processes aside and check them as a separate set. This is clearly the safer option, because we can still analyze these incomplete (or perhaps just bad) Processes, at least later, but separately.
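The second option can be sketched as a partition rather than a deletion. The example below assumes each Process flow is already an ordered list of step names keyed by a case identifier. The labels (`missing_end`, `missing_start`, `missing_both`), the step names, and the function names are hypothetical choices made for this illustration.

```python
def classify(steps, allowed_first, allowed_last):
    """Label a flow according to which of its ends looks truncated."""
    starts_ok = steps[0] in allowed_first
    ends_ok = steps[-1] in allowed_last
    if starts_ok and ends_ok:
        return "complete"
    if starts_ok:
        return "missing_end"
    if ends_ok:
        return "missing_start"
    return "missing_both"


def partition(flows, allowed_first, allowed_last):
    """Split flows into the analysis scope and a set kept aside for review."""
    kept, set_aside = {}, {}
    for case_id, steps in flows.items():
        label = classify(steps, allowed_first, allowed_last)
        if label == "complete":
            kept[case_id] = steps
        else:
            # Keep the label so the separate set can be analyzed later.
            set_aside[case_id] = (label, steps)
    return kept, set_aside


# Hypothetical flows, one per case identifier.
sample_flows = {
    "A": ["Create Order", "Approve", "Ship"],
    "B": ["Approve", "Ship"],
    "C": ["Create Order", "Approve"],
    "D": ["Approve"],
}

kept, set_aside = partition(sample_flows, {"Create Order"}, {"Ship"})
print(sorted(kept))                                # ['A']
print({c: lbl for c, (lbl, _) in set_aside.items()})
```

Because nothing is thrown away, the flows in `set_aside` remain available for a separate analysis of incomplete or genuinely problematic Processes.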