Introduction
The platform is typically used in one of the first steps of an automated process: ingesting, interpreting and structuring an inbound communication, such as a customer email, much as a human would when that email arrived in their inbox.
When the platform predicts which labels (or tags) apply to a communication, it assigns each prediction a confidence score (%) to show how confident it is that the label applies.
If these predictions are to be used to automatically classify the communication, however, there needs to be a binary decision - i.e. does this label apply or not? This is where confidence thresholds come in.
A confidence threshold is the confidence score (%) at or above which an RPA bot or other automation service will take the prediction from the platform as a binary 'Yes, this label does apply' and below which it will take the prediction as a binary 'No, this label does not apply'.
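As a rough illustration of that binary decision, the sketch below shows how a downstream automation might turn confidence scores into yes/no answers. It is not the platform's API: the function name `apply_thresholds`, the dictionary shapes, the label names other than 'Urgent', and the fallback threshold of 0.5 are all assumptions made for the example, and confidences are written as fractions between 0 and 1 rather than percentages.

```python
def apply_thresholds(predictions, thresholds, default_threshold=0.5):
    """Turn per-label confidence scores into yes/no decisions.

    predictions: dict of label name -> confidence in [0, 1] (hypothetical shape)
    thresholds:  dict of label name -> confidence threshold in [0, 1]
    default_threshold: assumed fallback for labels with no threshold configured
    """
    decisions = {}
    for label, confidence in predictions.items():
        threshold = thresholds.get(label, default_threshold)
        # At or above the threshold means "yes, the label applies";
        # below it means "no, the label does not apply".
        decisions[label] = confidence >= threshold
    return decisions


# Made-up predictions for a single communication.
predictions = {"Urgent": 0.42, "Address Change": 0.91, "Complaint": 0.12}
thresholds = {"Urgent": 0.17, "Address Change": 0.95}

decisions = apply_thresholds(predictions, thresholds)
print(decisions)
# {'Urgent': True, 'Address Change': False, 'Complaint': False}
```

Note that the same confidence score can lead to different decisions for different labels, because each label carries its own threshold.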
It's therefore very important to understand confidence thresholds and how to select the appropriate one, in order to achieve the right balance of precision and recall for that label.
Selecting a threshold for a label
- To select a threshold for a label, navigate to the Validation page and select the label from the label filter bar
- Then simply drag the threshold slider, or type a % figure into the box (as shown below), to see the different precision and recall statistics that would be achieved for that threshold
- The precision vs recall chart gives you a visual indication of the confidence thresholds that would maximise precision or recall, or provide a balance between the two:
  - In the first image below, the confidence threshold selected (68.7%) would maximise precision (100%) - i.e. the platform should typically make no incorrect predictions of this label (no false positives) at this threshold - but would have a lower recall value (85%) as a result
  - In the second image, the confidence threshold selected (39.8%) provides a good balance between precision and recall (both 92%)
  - In the third image, the confidence threshold selected (17%) would maximise recall (100%) - i.e. the platform should identify every instance where this label should apply - but would have a lower precision value (84%) as a result
Label validation with confidence threshold set at 68.7%
Label validation with confidence threshold set at 39.8%
Label validation with confidence threshold set at 17%
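The trade-off shown in these three examples can be reproduced on a small scale. The sketch below computes precision and recall for one label at several thresholds using the standard definitions; the eight scored examples are invented toy data, the function name `precision_recall` is hypothetical, and the platform's own validation statistics may be calculated differently.

```python
def precision_recall(scored_examples, threshold):
    """Compute (precision, recall) for one label at a given threshold.

    scored_examples: list of (confidence, label_truly_applies) pairs.
    """
    tp = sum(1 for conf, truth in scored_examples if conf >= threshold and truth)
    fp = sum(1 for conf, truth in scored_examples if conf >= threshold and not truth)
    fn = sum(1 for conf, truth in scored_examples if conf < threshold and truth)
    # By convention here, report 1.0 when the denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Toy validation data: (confidence, whether the label really applies).
validation = [
    (0.95, True), (0.88, True), (0.74, True), (0.61, False),
    (0.55, True), (0.33, False), (0.21, True), (0.08, False),
]

for threshold in (0.70, 0.40, 0.15):
    p, r = precision_recall(validation, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.70  precision=1.00  recall=0.60
# threshold=0.40  precision=0.80  recall=0.80
# threshold=0.15  precision=0.71  recall=1.00
```

As with the slider on the Validation page, lowering the threshold raises recall at the expense of precision, and raising it does the opposite.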
Choosing the right threshold
So how do you choose the threshold that is right for you? The simple answer is: it depends.
Depending on your use case and the specific label in question, you might want to maximise either precision or recall, or find the threshold that gives the best possible balance of both.
When thinking about what threshold is required, it's helpful to think about potential outcomes - what is the potential cost or consequence to your business if a label is incorrectly applied? What about if it is missed?
For each label, choose the threshold that favours whichever error is less costly to the business when something goes wrong: something incorrectly classified (a false positive) or something missed (a false negative).
For example, if you wanted to automatically classify inbound communications into different categories, but also had a label for 'Urgent' that routed requests to a high-priority work queue, you might want to maximise the recall for this label to ensure that no urgent requests are missed, and accept a lower precision as a result. This is because it may not be very detrimental to the business to have some less urgent requests put into the priority queue, but it could be very detrimental to miss an urgent request that is time sensitive.
As another example, if you were automating end-to-end a type of request that involved a monetary transaction or was otherwise high in value, you would likely choose a threshold that maximised precision, so that only the transactions the platform was most confident about were automated end-to-end. Predictions with confidences below the threshold would then be manually reviewed. This is because the cost of a wrong prediction (a false positive) is potentially very high if a transaction is then processed incorrectly.
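One way to make this reasoning concrete is to put a rough relative cost on each kind of error and see which threshold minimises the total cost on validation data. The sketch below is only an illustration of that idea, not a feature of the platform: the function name `choose_threshold`, the toy validation data, and the cost figures are all hypothetical, and only the confidence values observed in the data are tried as candidate thresholds.

```python
def choose_threshold(scored_examples, cost_false_positive, cost_false_negative):
    """Return (threshold, total_cost) with the lowest total error cost.

    scored_examples: list of (confidence, label_truly_applies) pairs.
    Only confidence values seen in the data are considered as thresholds;
    on a tie, the lowest such threshold is kept.
    """
    candidates = sorted({conf for conf, _ in scored_examples})
    best = None
    for threshold in candidates:
        fp = sum(1 for conf, truth in scored_examples if conf >= threshold and not truth)
        fn = sum(1 for conf, truth in scored_examples if conf < threshold and truth)
        cost = fp * cost_false_positive + fn * cost_false_negative
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Toy validation data: (confidence, whether the label really applies).
validation = [
    (0.95, True), (0.88, True), (0.74, True), (0.61, False),
    (0.55, True), (0.33, False), (0.21, True), (0.08, False),
]

# 'Urgent'-style label: a missed request is assumed 20x worse than a false alarm.
print(choose_threshold(validation, cost_false_positive=1, cost_false_negative=20))
# (0.21, 2)

# Transaction-style label: a wrong automation is assumed 50x worse than a manual review.
print(choose_threshold(validation, cost_false_positive=50, cost_false_negative=1))
# (0.74, 2)
```

With these assumed costs, the 'Urgent'-style label ends up with a low threshold (favouring recall) and the transaction-style label with a high one (favouring precision), matching the two examples above. In practice the costs are rarely known exactly, so a calculation like this is best treated as a starting point for the choice rather than the final answer.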