Overview
When you build a taxonomy by labelling data, you are creating a model. This model will use the labels you have applied to a set of data to identify similar concepts and intents in other verbatims and predict which labels apply to them.
In doing so, each label will have its own set of precision and recall scores.
Suppose a taxonomy includes a label called ‘Request for information’. Precision and recall would relate to this label as follows:
- Precision: Of all the verbatims predicted as having the ‘Request for information’ label, the percentage for which that prediction was correct. A precision of 95% would mean that for every 100 verbatims predicted as ‘Request for information’, 95 would be correctly labelled and 5 would be wrongly labelled (i.e. they should not have been given that label).
- Recall: Of all the verbatims that should have been labelled as ‘Request for information’, the percentage that the platform found. A recall of 77% would mean that for every 100 verbatims that should have been predicted as ‘Request for information’, the platform found 77 and missed 23.
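The two definitions above can be checked with simple arithmetic. The following is a minimal sketch in Python, not the platform's own implementation; the function names and the counts are illustrative and are taken from the 95% and 77% examples.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of predictions for a label that were correct."""
    predicted = true_positives + false_positives
    return true_positives / predicted if predicted else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of verbatims that should have the label and were found."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0


# 100 verbatims predicted as 'Request for information': 95 correct, 5 wrong.
label_precision = precision(true_positives=95, false_positives=5)

# 100 verbatims that should have the label: 77 found, 23 missed.
label_recall = recall(true_positives=77, false_negatives=23)

print(f"precision={label_precision:.0%} recall={label_recall:.0%}")
```

Note that the two scores are measured against different totals: precision against everything that was predicted, recall against everything that should have been predicted.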
'Recall' across all labels is directly related to the coverage of your model.
If you are confident that your taxonomy covers all of the relevant concepts within your dataset, and your labels have adequate precision, then the recall of those labels will determine how well covered your dataset is by label predictions. If all of your labels have high recall, then your model will have high coverage.
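As a rough illustration of this idea, the sketch below treats coverage as the share of verbatims that receive at least one label prediction. That reading of coverage is an assumption made for the example, and the verbatim IDs, the ‘Complaint’ label and the `coverage` function are hypothetical rather than part of the platform.

```python
from typing import Dict, List

# Hypothetical predictions: verbatim ID -> labels predicted for that verbatim.
predictions: Dict[str, List[str]] = {
    "v1": ["Request for information"],
    "v2": [],
    "v3": ["Complaint", "Request for information"],
    "v4": ["Complaint"],
    "v5": [],
}


def coverage(predictions: Dict[str, List[str]]) -> float:
    """Fraction of verbatims with at least one predicted label (assumed definition)."""
    if not predictions:
        return 0.0
    covered = sum(1 for labels in predictions.values() if labels)
    return covered / len(predictions)


dataset_coverage = coverage(predictions)
print(f"coverage={dataset_coverage:.0%}")
```

Under this assumption, a label with low recall leaves some of its verbatims without a prediction, which in turn lowers the share of the dataset that is covered.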
Precision versus recall
We also need to understand the trade-off between precision and recall within a particular model version.
The precision and recall statistics for each label in a particular model version are determined by a confidence threshold (i.e. how confident is the model that this label applies?).
The platform publishes precision and recall statistics live on the Validation page, and users can see how different confidence thresholds affect the precision and recall scores using the adjustable slider.
As you increase the confidence threshold, the model must be more certain that a label applies before predicting it, so precision will typically increase. At the same time, because fewer predictions clear the higher threshold, recall will typically decrease. The opposite typically holds when you decrease the confidence threshold.
So, as a rule of thumb, when you adjust the confidence threshold and precision improves, recall will typically decrease, and vice versa.
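The trade-off can be seen by recomputing both scores at several thresholds for a single label. This is a minimal sketch under stated assumptions: the confidence scores and true labels are made up for illustration, and `precision_recall_at` is a hypothetical helper, not a platform function.

```python
from typing import List, Tuple

# Hypothetical (model confidence, label truly applies) pairs for one label.
scored: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False),
    (0.70, True), (0.60, False), (0.55, True), (0.40, False),
    (0.30, True), (0.20, False),
]


def precision_recall_at(
    threshold: float, scored: List[Tuple[float, bool]]
) -> Tuple[float, float]:
    """Precision and recall when the label is predicted at or above the threshold."""
    tp = sum(1 for conf, truth in scored if conf >= threshold and truth)
    fp = sum(1 for conf, truth in scored if conf >= threshold and not truth)
    fn = sum(1 for conf, truth in scored if conf < threshold and truth)
    # Precision is undefined with no predictions; 0.0 is used here by convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


for threshold in (0.25, 0.50, 0.75):
    p, r = precision_recall_at(threshold, scored)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

On this made-up data, precision rises and recall falls as the threshold moves from 0.25 to 0.75. On real data the movement is not always strictly in one direction, which is why the text says "typically".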
It’s important to understand this trade-off when setting up automations using the platform. Users have to set a confidence threshold for each label that forms part of their automation, and this threshold needs to be adjusted until it yields precision and recall statistics that are acceptable for that process.
Certain processes may value high recall (catching as many instances of an event as possible), whilst others will value high precision (ensuring that the instances identified are correct).
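For a process that values precision, one possible approach is to pick the lowest threshold that still meets a minimum precision, since recall cannot increase as the threshold rises. The sketch below illustrates that approach only; the data, the candidate thresholds and the `lowest_threshold_meeting_precision` helper are all hypothetical, and in the platform the threshold is chosen with the slider on the Validation page.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical (model confidence, label truly applies) pairs for one label.
scored: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False),
    (0.70, True), (0.60, False), (0.55, True), (0.40, False),
    (0.30, True), (0.20, False),
]


def lowest_threshold_meeting_precision(
    scored: List[Tuple[float, bool]],
    min_precision: float,
    candidates: Sequence[float],
) -> Optional[float]:
    """Lowest candidate threshold whose precision is acceptable, or None."""
    for threshold in sorted(candidates):
        # True/False outcomes of every prediction made at this threshold.
        outcomes = [truth for conf, truth in scored if conf >= threshold]
        if not outcomes:
            continue  # no predictions at this threshold, so precision is undefined
        if sum(outcomes) / len(outcomes) >= min_precision:
            return threshold
    return None


candidates = [0.25, 0.50, 0.75, 0.85]
strict = lowest_threshold_meeting_precision(scored, 0.90, candidates)
relaxed = lowest_threshold_meeting_precision(scored, 0.70, candidates)
print(strict, relaxed)
```

A process that values recall would invert the search: start from a minimum acceptable recall and take the highest threshold that still achieves it.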