PLEASE NOTE: UiPath Communications Mining's Knowledge Base has been fully migrated to UiPath Docs. Please refer to the equivalent articles in UiPath Docs for up-to-date guidance, as this site will no longer be updated or maintained.


Preparing Data for .CSV Upload

User permissions required: 'Sources admin' AND 'Edit verbatims'

 

You can find instructions on uploading data from a .csv here, along with common error messages you may encounter in the platform.  


Before uploading data into Communications Mining, there are several factors to consider when preparing the data for ingestion by the platform.


Important Note: Please ensure you are uploading a .csv file, and not an Excel file. 
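Renaming an Excel workbook to .csv does not turn it into a CSV file; its contents are still in a binary workbook format. As a quick pre-upload check, a file's first bytes can be compared against the known signatures of Excel formats. The minimal Python sketch below assumes nothing about Communications Mining itself; the function names are hypothetical, and the check is only a heuristic, because any ZIP archive shares the .xlsx signature.

```python
# Leading bytes ("magic numbers") of Excel workbook formats.
XLSX_SIGNATURE = b"PK\x03\x04"  # .xlsx/.xlsm are ZIP archives (so is any other ZIP file)
XLS_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 compound file)


def is_excel_signature(head: bytes) -> bool:
    """Return True if the bytes start like an Excel workbook rather than text."""
    return head.startswith(XLSX_SIGNATURE) or head.startswith(XLS_SIGNATURE)


def looks_like_excel(path: str) -> bool:
    """Check a file's first bytes, ignoring its extension (hypothetical helper)."""
    with open(path, "rb") as f:
        return is_excel_signature(f.read(8))
```

A plain-text CSV starts with ordinary characters such as its header row, so `looks_like_excel("export.csv")` should return False for a genuine .csv file.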


Opening a .csv in Excel and saving changes to it can introduce formatting issues that cause errors at the point of upload. To avoid this, make any edits to the .csv file directly (for example, in a plain-text editor) rather than through Excel.
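One formatting change that saving from Excel can introduce is a different text encoding: depending on the save option chosen, the file may gain a UTF-8 byte-order mark or be written in a legacy code page instead of UTF-8. The minimal Python sketch below reports both conditions for a file's raw bytes. It assumes that UTF-8 is the encoding you want to upload, which is an assumption rather than a documented platform requirement, and the function name is hypothetical.

```python
import codecs


def check_encoding(data: bytes) -> list[str]:
    """Return notes about a CSV file's raw bytes (empty list = valid UTF-8, no BOM)."""
    notes = []
    if data.startswith(codecs.BOM_UTF8):
        # Excel's "CSV UTF-8" save option writes this byte-order mark.
        notes.append("starts with a UTF-8 byte-order mark")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Often a sign the file was saved in a legacy code page such as Windows-1252.
        notes.append(f"not valid UTF-8 at byte offset {err.start}")
    return notes
```

For example, `check_encoding(open("export.csv", "rb").read())` returns an empty list for a clean UTF-8 file; whether a byte-order mark is acceptable is worth confirming before you strip it.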


Additionally, check for the following issues before uploading your .csv. Each of them can cause errors at upload, or data quality problems that negatively impact model performance:

 

 

Duplicate rows: The same data is repeated multiple times across the data extract.

Mismatched headers: Column headers are aligned to the wrong data fields.

Hanging rows or columns: Not all of the data is contained in sequential rows.
Example:
All comments are in rows 1 to 10,000, but a cell in row 19,999 also contains data.

Inconsistent date formatting: Different rows use inconsistent date formats.
Example:
Some comments use the US date format and others use the EU date format within the same dataset, which causes issues when the dates are normalized downstream.

Incoherent sentences: Sentences that contain an assortment of words without a clear syntactic or semantic structure.
Example:
'The user is requesting a new portable 28442 298 ticket to be creaportableted'

Inconsistent spacing: An irregular number of spaces between words.
Example:
'The policy    is set to     renew' instead of 'The policy is set to renew'

Breaks in words: Breaks in the middle of a word where there should be none.
Example:
'The po licy is set. to renew' instead of 'The policy is set to renew'

Erroneous character encoding: Text that was not properly encoded, resulting in garbled or unreadable characters.
Example:
'ThÇ åpp is gré¶t' instead of 'The app is great.'

Blank comments: Communications with no content in the subject or body.

Comments with lots of typos: Text data containing many spelling errors.

Headers / footers: Headers or footers are included in the message text.
Example:
Spam warnings, virus scan warnings, etc.

Metadata included in the subject/body instead of as a metadata property: Metadata appears in the subject or body text.
Example:
'[01/01/2023] I would like to renew my policy' as the body of a message, instead of 'I would like to renew my policy' as the body with 01/01/2023 included as the date in the metadata.

Multiple messages combined into one message: Several messages that should have been broken out into separate messages in a thread are combined into a single communication.
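Several of the checks above can be automated before upload. The minimal Python sketch below uses only the standard library `csv` module to flag exact duplicate rows, data that follows blank rows (hanging rows), rows whose field count differs from the header, blank comments, runs of multiple spaces, Unicode replacement characters (a common trace of an encoding problem), text that starts with a bracketed date, and a date column that mixes day/month and month/day values. The column names `subject`, `body` and `date`, the function names, and the date patterns it recognizes are assumptions for illustration; adjust them to your own export. Issues such as incoherent sentences, typos, headers/footers, or combined messages still need a manual review.

```python
import csv
import io
import re

# Hypothetical column names: change these to match your own export.
TEXT_COLUMNS = ("subject", "body")
DATE_COLUMN = "date"

SLASH_DATE = re.compile(r"^(\d{1,2})/(\d{1,2})/\d{4}$")
BRACKETED_DATE_PREFIX = re.compile(r"^\s*\[\d{1,2}/\d{1,2}/\d{4}\]")
MULTI_SPACE = re.compile(r"\S {2,}\S")  # two or more spaces between words


def date_order(value):
    """Return 'DMY' or 'MDY' when a slash-separated date is unambiguous, else None."""
    m = SLASH_DATE.match(value.strip())
    if not m:
        return None
    first, second = int(m.group(1)), int(m.group(2))
    if first > 12 >= second:
        return "DMY"  # e.g. 13/01/2023 can only be day/month
    if second > 12 >= first:
        return "MDY"  # e.g. 01/20/2023 can only be month/day
    return None  # e.g. 01/02/2023 could be either


def check_csv(text):
    """Return a list of human-readable problems found in CSV text."""
    records = list(csv.reader(io.StringIO(text, newline="")))
    if not records:
        return ["file is empty"]
    header = records[0]
    col = {name: i for i, name in enumerate(header)}
    missing = [c for c in (*TEXT_COLUMNS, DATE_COLUMN) if c not in col]
    if missing:
        return [f"header is missing expected columns: {missing}"]

    problems, seen, orders = [], {}, set()
    seen_data = gap = False
    # Record numbers count CSV records (header = 1), not physical lines.
    for n, row in enumerate(records[1:], start=2):
        if not any(cell.strip() for cell in row):
            gap = seen_data  # a blank record after real data
            continue
        if gap:
            problems.append(f"record {n}: data after blank records (hanging row?)")
            gap = False
        seen_data = True

        if len(row) != len(header):
            problems.append(f"record {n}: {len(row)} fields, header has {len(header)}")
            continue
        key = tuple(row)
        if key in seen:
            problems.append(f"record {n}: duplicate of record {seen[key]}")
            continue
        seen[key] = n

        texts = {c: row[col[c]] for c in TEXT_COLUMNS}
        if not any(t.strip() for t in texts.values()):
            problems.append(f"record {n}: blank comment text")
        for name, t in texts.items():
            if MULTI_SPACE.search(t):
                problems.append(f"record {n}: irregular spacing in {name}")
            if "\ufffd" in t:
                problems.append(f"record {n}: replacement character in {name} (encoding problem?)")
            if BRACKETED_DATE_PREFIX.match(t):
                problems.append(f"record {n}: {name} starts with a bracketed date (metadata in text?)")

        order = date_order(row[col[DATE_COLUMN]])
        if order:
            orders.add(order)

    if len(orders) > 1:
        problems.append(f"{DATE_COLUMN} column mixes day/month and month/day dates")
    return problems
```

To run it on a file, read the file with `open(path, encoding="utf-8", newline="")` and pass its contents to `check_csv`. A date such as 01/02/2023 is deliberately treated as ambiguous, so the mixed-format warning only appears when the column contains at least one date that can only be day/month and at least one that can only be month/day.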



