Coverage is a term frequently used in machine learning and describes how well a model 'covers' the data it is used to analyse. In Communications Mining, it refers to the proportion of verbatims in the dataset that have informative label predictions, and is presented in Validation as a percentage score.
'Informative labels' are labels that the platform understands to be useful as standalone labels, based on how frequently they are assigned alongside other labels. Labels that are always assigned with another label, for example parent labels that are never assigned on their own, or 'Urgent' if it is only ever assigned alongside another label, are down-weighted when the score is calculated.
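The platform's exact calculation is not exposed, but the idea can be illustrated with a minimal sketch: assume each label carries a hypothetical 'informativeness' weight, and a verbatim counts towards coverage only if at least one of its predicted labels is sufficiently informative on its own. The weights, threshold, and label names below are illustrative assumptions, not the platform's actual values.

```python
from typing import Dict, List

def coverage_score(
    predictions: List[List[str]],
    informativeness: Dict[str, float],
    threshold: float = 0.5,
) -> float:
    """Fraction of verbatims with at least one sufficiently informative
    label prediction (hypothetical weighting scheme)."""
    covered = 0
    for labels in predictions:
        # A verbatim counts as covered if any of its predicted labels
        # carries enough standalone (informative) weight.
        if any(informativeness.get(label, 0.0) >= threshold for label in labels):
            covered += 1
    return covered / len(predictions) if predictions else 0.0

# Example: 'Urgent' is only ever assigned alongside another label, so it is
# down-weighted and does not count towards coverage on its own.
weights = {"Order > Cancellation": 1.0, "Invoice Query": 0.9, "Urgent": 0.1}
preds = [
    ["Order > Cancellation", "Urgent"],  # covered
    ["Urgent"],                          # not covered on its own
    [],                                  # no predictions, not covered
    ["Invoice Query"],                   # covered
]
print(f"Coverage: {coverage_score(preds, weights):.0%}")  # Coverage: 50%
```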
The visual below gives an indication of what low coverage versus high coverage would look like across an entire dataset. Imagine the shaded circles are verbatims that have informative label predictions.
As a metric, coverage is a very helpful way of understanding whether you've captured all of the different potential concepts in your dataset, and whether you've provided enough varied training examples for each of them so that the platform can predict them effectively.
In almost all cases, the higher a model's coverage, the better it performs, but coverage should not be considered in isolation when checking model performance.
It is also very important that the labels in the taxonomy are healthy, meaning that they have high average precision and no other performance warnings, and that the training data is a balanced representation of the dataset as a whole.
If your labels are unhealthy, or the training data is not representative of the dataset, the coverage figure that the platform calculates for your model will be unreliable.
High coverage is particularly important if you are using your model to drive automated processes.
For more detail on model coverage and how to check your model's coverage, see here.