Start Small … Think Big!

In the previous steps we determined which process we want to analyse. However, not all processes are equal in terms of data volume, number of data sources involved, number of steps (activities), etc. As a consequence, the effort needed to analyse a process can grow like an exponential curve: from several days to months or even years, the effort can simply explode if we do not take care of this key aspect of the project. Starting small is the mantra to keep in mind when kicking off a project like this. Another benefit of running a process mining project this way is that it avoids the tunnel effect and provides quick results.

For example, big enterprise processes like Order to Cash (O2C/OTC) or Procure to Pay (P2P, also known as the purchase-to-pay process) can be huge along those dimensions. Scoping the process is a key aspect of a Process Mining project, and from the beginning it’s important to think big but, again, start small. The very good news is that these initiatives work perfectly in an agile mode!

There are several dimensions to take into account when it comes to scope. These dimensions, or criteria, have a direct impact on the complexity of the process analysis:

  • The number of Event IDs (SN-KEY)
  • The number of Process Instances (PFI-KEY)
  • The number of data sources involved
  • The number of sub-processes (if known). The idea here is to ask ourselves whether the process can be split into several sub-processes.
  • The number of attributes we need to gather to provide the desired outcome

Focusing first on the basic information

To be successful and to provide very quick results to the business, it’s very important to start small and to increase the scope step by step. First of all, we need to get some numbers on the dimensions above to see whether we can reduce the scope or whether that is not necessary at this stage.

To help with this task we can leverage profiling tools and solutions, or use the DQA toolkit available on this website.

With this toolkit we can easily, at the very least:

  • List the possible Events and get their exact number
  • Get the global number of PFIs

The other dimensions cannot be read from the data alone:

  • The number of data sources can only come from the workshops with the data experts and business specialists.
  • The possible sub-process breakdown and the attributes needed can only come from the Business Analyst’s expertise.
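
As an illustration of the first two numbers, here is a minimal Python sketch that counts event types and process instances in an event log. It is not the DQA toolkit itself: the function name `profile_event_log`, the in-memory list of dictionaries and the sample values are assumptions made for this example, and T-KEY is assumed to hold the event timestamp.

```python
from collections import Counter

# Hypothetical event log, one dict per event. Only the key names come from
# this guide; the values are invented for illustration.
event_log = [
    {"PFI-KEY": "PO-001", "SN-KEY": "Create PO", "T-KEY": "2024-01-02T09:00:00"},
    {"PFI-KEY": "PO-001", "SN-KEY": "Approve PO", "T-KEY": "2024-01-02T15:00:00"},
    {"PFI-KEY": "PO-001", "SN-KEY": "Receive Goods", "T-KEY": "2024-01-05T15:00:00"},
    {"PFI-KEY": "PO-002", "SN-KEY": "Create PO", "T-KEY": "2024-01-03T08:00:00"},
    {"PFI-KEY": "PO-002", "SN-KEY": "Approve PO", "T-KEY": "2024-01-03T10:00:00"},
]


def profile_event_log(events):
    """Return basic scoping numbers for an event log."""
    # Number of occurrences of each event name (SN-KEY).
    event_counts = Counter(e["SN-KEY"] for e in events)
    # Distinct process instances (PFI-KEY).
    instances = {e["PFI-KEY"] for e in events}
    return {
        "event_types": len(event_counts),
        "process_instances": len(instances),
        "rows": len(events),
        "events_by_type": dict(event_counts),
    }


profile = profile_event_log(event_log)
print(profile["event_types"], "event types,",
      profile["process_instances"], "process instances")
```

On a real project the same counts would typically be computed in the source database or in the profiling tool rather than in memory.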

Reducing the scope

Reducing the Number of Process Instances

To reduce the complexity of the first analyses, the primary reflex is of course to limit the amount of data acquired by the process mining solution. On the other hand, it’s important to have a dataset that is representative. The best way to get the smallest representative dataset is to use sampling techniques, as explained in this chapter.

Once the first outcomes have been obtained from the sample, it will be necessary to test and verify these results against a bigger dataset.
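
One possible way to sample is sketched below in Python: draw a random subset of process instances and keep every event of the selected instances, so that each sampled flow stays complete. This is simple random sampling at instance level, shown as an assumption; it is not necessarily the technique described in the sampling chapter. The function name `sample_process_instances`, the `fraction` and `seed` parameters and the generated data are hypothetical.

```python
import random


def sample_process_instances(events, fraction, seed=42):
    """Keep all events of a random subset of process instances (PFI-KEY)."""
    instance_ids = sorted({e["PFI-KEY"] for e in events})
    if not instance_ids:
        return []
    # Always keep at least one instance, even for very small fractions.
    sample_size = max(1, round(len(instance_ids) * fraction))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    kept = set(rng.sample(instance_ids, sample_size))
    return [e for e in events if e["PFI-KEY"] in kept]


# Hypothetical log: 10 process instances with 3 events each.
full_log = [
    {"PFI-KEY": f"C{i:02d}", "SN-KEY": name,
     "T-KEY": f"2024-01-{i + 1:02d}T0{step}:00:00"}
    for i in range(10)
    for step, name in enumerate(["Create PO", "Approve PO", "Receive Goods"])
]

sampled_log = sample_process_instances(full_log, fraction=0.3)
print(len({e["PFI-KEY"] for e in sampled_log}), "instances,",
      len(sampled_log), "events")
```

Sampling rows instead of instances would cut flows in the middle and distort the discovered process, which is why the selection is made on PFI-KEY.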

Reducing the Number of Events

Reducing the number of events is a classic technique in Process Mining. Let’s be clear: analysing a process with more than 100 event types is tedious and, most of the time, not really efficient. At this point the Event Discovery Workshop is key to removing the unnecessary events, of course, but it’s also a great opportunity to set aside the less important events in the process flow.

The idea is to keep only the main milestones. Of course, the other events can be imported later in the same project; alternatively, we could have a separate project for each level of granularity of the process flow (one with the main milestones, and another one with all the events).
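
A minimal Python sketch of this filtering is shown below. The milestone list would come out of the Event Discovery Workshop; the one used here, like the function name `keep_milestones` and the sample events, is purely hypothetical. The events that are set aside are returned as well, so they can be imported later.

```python
def keep_milestones(events, milestones):
    """Split events into main milestones and events set aside for later."""
    milestone_set = set(milestones)
    kept = [e for e in events if e["SN-KEY"] in milestone_set]
    set_aside = [e for e in events if e["SN-KEY"] not in milestone_set]
    return kept, set_aside


# Hypothetical events for one process instance.
events = [
    {"PFI-KEY": "PO-001", "SN-KEY": "Create PO", "T-KEY": "2024-01-02T09:00:00"},
    {"PFI-KEY": "PO-001", "SN-KEY": "Change Price", "T-KEY": "2024-01-02T11:00:00"},
    {"PFI-KEY": "PO-001", "SN-KEY": "Approve PO", "T-KEY": "2024-01-02T15:00:00"},
    {"PFI-KEY": "PO-001", "SN-KEY": "Send Reminder", "T-KEY": "2024-01-04T09:00:00"},
    {"PFI-KEY": "PO-001", "SN-KEY": "Receive Goods", "T-KEY": "2024-01-05T15:00:00"},
]

# Hypothetical outcome of the workshop: the milestones to analyse first.
main_milestones = ["Create PO", "Approve PO", "Receive Goods"]

milestone_events, other_events = keep_milestones(events, main_milestones)
print([e["SN-KEY"] for e in milestone_events])
```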

Reducing the Number of Attributes

This may be the easiest simplification we can make when we launch a process mining project. In fact, it’s strongly recommended to limit the first import to the 3 mandatory fields (PFI-KEY, SN-KEY and T-KEY).

Again, just by ingesting these 3 fields we can check the quality and consistency of the dataset, but it’s also a very good way to get the first results. Indeed, with just this information we can, for example, identify bottlenecks or rework loops.

In summary, with only these 3 fields we can provide many results on the behavior of the process! Once the business is happy and agrees on what the global process looks like, it’s time to ingest more data and attributes to go deeper in the analysis.
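
To make this concrete, the Python sketch below uses only the 3 fields to flag rework (an event repeated within the same process instance) and to compute the average waiting time between consecutive events, where the longest waits hint at bottlenecks. It is a simplified illustration, not how a process mining solution computes these results; the function names and sample data are hypothetical, and T-KEY is assumed to be an ISO 8601 timestamp.

```python
from collections import defaultdict
from datetime import datetime


def build_traces(events):
    """Group events by PFI-KEY and order them by timestamp (T-KEY)."""
    traces = defaultdict(list)
    for e in events:
        traces[e["PFI-KEY"]].append(
            (datetime.fromisoformat(e["T-KEY"]), e["SN-KEY"]))
    return {pfi: sorted(steps) for pfi, steps in traces.items()}


def find_rework(traces):
    """Return, per instance, the events that occur more than once."""
    rework = {}
    for pfi, steps in traces.items():
        seen, repeated = set(), set()
        for _, name in steps:
            if name in seen:
                repeated.add(name)
            seen.add(name)
        if repeated:
            rework[pfi] = sorted(repeated)
    return rework


def mean_transition_hours(traces):
    """Average waiting time, in hours, between each pair of consecutive events."""
    totals = defaultdict(lambda: [0.0, 0])  # (from, to) -> [sum of hours, count]
    for steps in traces.values():
        for (t1, a), (t2, b) in zip(steps, steps[1:]):
            totals[(a, b)][0] += (t2 - t1).total_seconds() / 3600
            totals[(a, b)][1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}


# Hypothetical event log limited to the 3 mandatory fields.
log = [
    {"PFI-KEY": "PO-001", "SN-KEY": "Create PO", "T-KEY": "2024-01-02T09:00:00"},
    {"PFI-KEY": "PO-001", "SN-KEY": "Approve PO", "T-KEY": "2024-01-02T15:00:00"},
    {"PFI-KEY": "PO-001", "SN-KEY": "Receive Goods", "T-KEY": "2024-01-05T15:00:00"},
    {"PFI-KEY": "PO-002", "SN-KEY": "Create PO", "T-KEY": "2024-01-03T08:00:00"},
    {"PFI-KEY": "PO-002", "SN-KEY": "Approve PO", "T-KEY": "2024-01-03T10:00:00"},
    {"PFI-KEY": "PO-002", "SN-KEY": "Create PO", "T-KEY": "2024-01-03T12:00:00"},
    {"PFI-KEY": "PO-002", "SN-KEY": "Approve PO", "T-KEY": "2024-01-03T16:00:00"},
    {"PFI-KEY": "PO-002", "SN-KEY": "Receive Goods", "T-KEY": "2024-01-04T16:00:00"},
]

traces = build_traces(log)
rework = find_rework(traces)
waits = mean_transition_hours(traces)
slowest = max(waits, key=waits.get)  # transition with the longest average wait
print("Rework:", rework)
print("Slowest transition:", slowest, waits[slowest], "hours")
```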

Reducing the Number of Data Sources

This is maybe the most complex task: reducing the number of data sources. Why? Simply because we have to ensure we can still get all the data we need. Most of the time, reducing the number of data sources comes from a business filter, such as focusing first on a business region or an organization. As these regions or organizations may work with different applications, it’s also a way to reduce the number of data sources.

Reducing the scope by starting with an organization, a region, a service, etc. is a very good practice, as it makes the project timing pretty easy to manage afterwards.
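
The effect of a business filter on the number of data sources can be sketched in Python as follows. The inventory of regions and source applications is assumed to have been gathered during the workshops; its content, the application names and the function name `sources_by_region` are all hypothetical.

```python
def sources_by_region(inventory):
    """Map each region to the set of source applications it relies on."""
    result = {}
    for entry in inventory:
        result.setdefault(entry["region"], set()).add(entry["source"])
    return result


# Hypothetical inventory collected in the workshops.
inventory = [
    {"region": "EMEA", "source": "ERP-A"},
    {"region": "APAC", "source": "ERP-B"},
    {"region": "APAC", "source": "Invoicing-Tool"},
    {"region": "AMER", "source": "ERP-A"},
    {"region": "AMER", "source": "ERP-B"},
    {"region": "AMER", "source": "Invoicing-Tool"},
]

per_region = sources_by_region(inventory)
all_sources = set().union(*per_region.values())
# Region needing the fewest sources (ties broken alphabetically).
candidate = min(per_region, key=lambda r: (len(per_region[r]), r))
print(f"{len(all_sources)} sources overall;",
      f"starting with {candidate} needs {len(per_region[candidate])}")
```

The number of sources is only one criterion; the choice of the first region or organization remains a business decision.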

Breaking down the process

Breaking down the big (target) process into several sub-processes is also a very good way to move forward quickly. However, that requires a very good knowledge of the process, which is not always available (sometimes gaining that knowledge is the very purpose of the initiative). Still, some big processes can be split easily, as they already have distinct major phases.

For example, a P2P process can have these phases:

  • Need Identification
  • Purchase Requisition
  • Purchase Order (PO) Creation
  • Supplier Selection and Negotiation
  • Purchase Order Approval
  • Order Transmission
  • Goods or Service Receipt
  • Invoice Processing
  • Invoice Verification and Approval
  • Payment Processing
  • Reconciliation and Reporting

Why not analyse these phases separately first?
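
Assuming the Business Analyst can map each event to one of the phases listed above, the split itself is straightforward, as the Python sketch below shows. The mapping `PHASE_OF_EVENT`, the event names and the function name `split_by_phase` are hypothetical; events missing from the mapping are grouped under "Unmapped" so they are not lost.

```python
from collections import defaultdict

# Hypothetical mapping from event name (SN-KEY) to P2P phase.
PHASE_OF_EVENT = {
    "Create Requisition": "Purchase Requisition",
    "Create PO": "Purchase Order (PO) Creation",
    "Approve PO": "Purchase Order Approval",
    "Receive Goods": "Goods or Service Receipt",
    "Post Invoice": "Invoice Processing",
    "Pay Invoice": "Payment Processing",
}


def split_by_phase(events, phase_of_event):
    """Group events by phase so each phase can be analysed on its own."""
    phases = defaultdict(list)
    for e in events:
        phases[phase_of_event.get(e["SN-KEY"], "Unmapped")].append(e)
    return dict(phases)


# Hypothetical events for one process instance.
p2p_events = [
    {"PFI-KEY": "PO-001", "SN-KEY": "Create PO", "T-KEY": "2024-01-02T09:00:00"},
    {"PFI-KEY": "PO-001", "SN-KEY": "Approve PO", "T-KEY": "2024-01-02T15:00:00"},
    {"PFI-KEY": "PO-001", "SN-KEY": "Post Invoice", "T-KEY": "2024-01-08T10:00:00"},
    {"PFI-KEY": "PO-001", "SN-KEY": "Block Invoice", "T-KEY": "2024-01-09T10:00:00"},
]

by_phase = split_by_phase(p2p_events, PHASE_OF_EVENT)
for phase, phase_events in by_phase.items():
    print(phase, len(phase_events))
```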

Iterations & increasing the scope step by step

Of course, the purpose is not to stay with the small scope. The aim of all of this is to provide quick results to the business (quick wins) and to give them a concrete flavor of what such a solution can bring to their business. So once this first step is done, it’s time to increase the scope, but step by step: by adding a new region or a new data source, by adding more data or just new events. This must be discussed with the business according to the expected outcome.
