Introduction
Sentiment analysis is a useful way to gain insights into customer perception.
Before large language models hit the mainstream, a popular approach to sentiment analysis was to strip text down into individual words, an approach known as “bag of words.” Those individual words would then be matched to emotions using an available sentiment dictionary.
Although that approach works quite well, it has difficulty with mixed or ambiguous statements such as “I bumped my head on the door and was hurt, but the ice-cream was great.” It also has difficulty with longer statements, although approaches exist to consider the volume of text as part of the analysis.
Language models based on neural networks are much better at taking into account the surrounding context and handling linguistic ambiguity. Trained models are also much simpler to work with; for example, they avoid the problem of having to define stop words.
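To make the dictionary approach concrete, here is a minimal sketch. The tiny LEXICON and the bag_of_words_score helper are made up for illustration and are not taken from any real sentiment dictionary. Each word is scored on its own, so the injury and the ice-cream simply cancel out.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"great": 1, "amazing": 1, "adorable": 1, "hurt": -1, "lost": -1}

def bag_of_words_score(text: str) -> int:
    """Sum the lexicon values of the individual words, ignoring word order."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

sentence = "I bumped my head on the door and was hurt, but the ice-cream was great."
print(bag_of_words_score(sentence))  # 0: "hurt" (-1) and "great" (+1) cancel out
```

A score of zero suggests a neutral sentence, even though the text contains one clearly negative and one clearly positive statement.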
Hugging Face
Hugging Face is a popular community hub for AI developers that makes advanced machine learning models both accessible and easy to use. Many models have liberal license terms and can be used commercially for free.
These models can be used “as-is,” or they can serve as a base model that is fine-tuned for specific use cases or particular types of customers.
Let’s explore using a Hugging Face model in Python to run some basic sentiment analysis.
Loading Packages
We will use the SamLowe/roberta-base-go_emotions model that was developed in PyTorch for this example. The first step is to load PyTorch and then pipeline from the Hugging Face transformers package.
Note that if you do not have these packages installed, you may find it easier to set up a new environment and install them with pip rather than conda.
import torch
from transformers import pipeline
SamLowe/roberta-base-go_emotions
This model takes text input and returns a list of emotions, such as “admiration,” each with an associated score. This provides a bit more colour than positive or negative sentiment.
Downloading the Model
Downloading models from Hugging Face is very simple with the transformers package, but be cautious, as some of them are very large.
Features vary by model, but here we only need to call the pipeline function and specify the task as “text-classification” and then the model name.
The top_k parameter specifies the number of emotions to be returned. As we will see below, each emotion is returned with a score between 0 and 1. The model scores each emotion independently (its model card describes it as multi-label), so the scores do not have to sum to one, and with top_k=2 we only see the two highest.
Finally, we must specify a device: GPU (CUDA) or CPU. If you have an NVIDIA GPU with the free CUDA toolkit installed (and a CUDA-enabled build of PyTorch), you can use device=0 to select the first GPU. Otherwise, use device=-1 to indicate CPU.
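One way to avoid hard-coding the device is to ask PyTorch whether it can see a GPU. This is a sketch rather than part of the original walkthrough: pick_device is a hypothetical helper name, and it falls back to CPU if PyTorch cannot be found.

```python
import importlib.util

def pick_device() -> int:
    """Return 0 for the first CUDA GPU if PyTorch can use it, otherwise -1 for CPU."""
    if importlib.util.find_spec("torch") is None:
        return -1  # PyTorch is not installed, so fall back to CPU
    import torch
    return 0 if torch.cuda.is_available() else -1

device = pick_device()
print(device)
```

The result can then be passed to the pipeline function as device=device.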
Classifying Text
Below are a few examples of text that might be ambiguous using a bag of words approach but are handled easily by the RoBERTa language model.
classifier = pipeline(task="text-classification", model="SamLowe/roberta-base-go_emotions", top_k=2, device=0)
classifier("Huggingface is pretty amazing")
Out[13]:
[[{'label': 'admiration', 'score': 0.9553576707839966},
{'label': 'approval', 'score': 0.023877330124378204}]]
classifier("I lost my eggs at the till, but I found a puppy and it's adorable")
Out[14]:
[[{'label': 'admiration', 'score': 0.6246468424797058},
{'label': 'sadness', 'score': 0.13270694017410278}]]
classifier("I want to speak to a manager! This product tasted like raw salmon, I like raw salmon but I don't like this product")
Out[16]:
[[{'label': 'disapproval', 'score': 0.6524176597595215},
{'label': 'annoyance', 'score': 0.11884757876396179}]]
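The outputs above are nested lists: one inner list per input text, holding one dictionary per emotion. A small sketch of how they might be unpacked follows. The helper names top_emotion and emotions_above, and the 0.1 threshold, are assumptions chosen for illustration; the sample data is copied from the first example.

```python
# Output copied from the first example above: one inner list per input text.
result = [[{'label': 'admiration', 'score': 0.9553576707839966},
           {'label': 'approval', 'score': 0.023877330124378204}]]

def top_emotion(result):
    """Return (label, score) for the highest-scoring emotion of the first text."""
    best = max(result[0], key=lambda item: item["score"])
    return best["label"], best["score"]

def emotions_above(result, threshold=0.1):
    """Return the labels of the first text whose score meets the threshold."""
    return [item["label"] for item in result[0] if item["score"] >= threshold]

print(top_emotion(result))     # ('admiration', 0.9553576707839966)
print(emotions_above(result))  # ['admiration']
```

Filtering by a threshold rather than always taking the top label allows a text to carry more than one emotion, or none at all.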
Conclusion
Business uses for text sentiment analysis are plentiful, from keeping a pulse on social media feeds to understanding how our best customers view a new product, or what type of issues are causing the most frustration so we can improve our services.
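As a rough illustration of the "most frustration" use case, the sketch below counts how often each emotion comes out on top across a batch of comments. It assumes the classifier accepts a list of strings and returns one list of label and score dictionaries per text, which is how the pipeline above is expected to behave for list input. tally_top_emotions and fake_classifier are hypothetical names, and the stand-in classifier returns made-up scores so the example runs without downloading a model.

```python
from collections import Counter

def tally_top_emotions(classifier, texts):
    """Count how often each emotion is the highest-scoring label across texts."""
    counts = Counter()
    for scores in classifier(texts):  # one list of {'label', 'score'} dicts per text
        best = max(scores, key=lambda item: item["score"])
        counts[best["label"]] += 1
    return counts

# Stand-in for the real pipeline, with made-up scores (an assumption for this sketch).
def fake_classifier(texts):
    return [[{"label": "annoyance", "score": 0.7}, {"label": "approval", "score": 0.1}]
            for _ in texts]

print(tally_top_emotions(fake_classifier, ["late delivery", "broken box"]))
# Counter({'annoyance': 2})
```

In practice the real classifier object would be passed in place of fake_classifier.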
Hugging Face hosts a wide variety of language models for various use cases. These are well documented and straightforward to set up and fine-tune on your own material.
Text Sentiment Analysis with Hugging Face
- 28 September 2024
- 4 min read