“A great spot for a quick bite while visiting the city. The food is not the best but it is still delicious. I was impressed by their spicy chicken and the shrimp. The place is clean and the staff is friendly.”
“Such a gem, I had to try it! I think the chicken tenders are the best I’ve ever had! I’d love to come here more often if it weren’t so busy. If you’re in the area, I’d definitely recommend stopping by.”
If you’re anything like me, reading customer reviews online has become a key part of deciding which product to buy, which movie to watch, or, as illustrated by the two examples above, where to go out for a meal. Whether it’s on Amazon or Yelp, IMDb or TripAdvisor, reading and leaving reviews online has become the norm.
Indeed, recent research aggregated by Qualtrics (2020) finds that 93% of consumers read online reviews before making a purchase decision. The same source suggests that young adults are particularly influenced by online reviews: 91% of 18-34 year olds trust anonymous online reviews as much as personal recommendations. As online reviews play a growing role in shaping consumer decision-making, the question becomes: can and should online reviews be trusted?
The exponential increase in online reviews has given rise to the phenomenon of fake reviews. Simply put, these are made-up reviews that don’t reflect reality in any way. Typically seen as a way to optimise search engine visibility, fake reviews and ratings are used by companies to boost their brand image and improve their position in ranking algorithms.
Hoping to understand the scale of the phenomenon better, Luca & Zervas (2016) concluded that 10-20% of all reviews on Yelp are fake. Exploring the business of “fake review-writing” further, Proserpio, Hollenbeck & He (2020) identified closed groups on social media as the primary “black market” for fake reviews, and estimated that in 2020 as many as 4.5 million sellers solicited fake review-writing services through Facebook alone.
While fake reviews have traditionally been made up and written by humans, recent advances in artificial intelligence (AI), specifically in natural language processing (NLP), offer frighteningly powerful new tools for online spin doctors. The study of “deep fakes”, broadly understood to mean any type of fake content generated automatically by a machine learning system (e.g. video, audio, or as discussed here, text), is a booming area of research and development. To test whether sophisticated NLP techniques can generate believable fake reviews, I decided to devise a mini-study.
Applying GPT-2 to generate fake restaurant reviews
Using one of the most popular open-access NLP systems, OpenAI’s GPT-2, I created a dataset of 10 randomly generated fake restaurant reviews. To complement my fake review data, I randomly scraped 10 real restaurant reviews from the popular review site TripAdvisor. Equipped with a total of 20 reviews (10 fake, 10 real), I recruited 100 restaurant professionals to evaluate whether they thought the reviews they were presented with were written by a real human or automatically generated by a computer.
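For readers curious what generating such reviews might look like in practice, here is a minimal sketch. It is not the exact procedure used in this mini-study: it assumes the Hugging Face `transformers` library and its public `gpt2` checkpoint, and the prompt, the sampling settings and the helper names `trim_to_last_sentence` and `generate_reviews` are all hypothetical choices made for illustration.

```python
import re


def trim_to_last_sentence(text: str) -> str:
    """Cut generated text at its last sentence-ending mark, if there is one."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    match = re.search(r"^.*[.!?]", text)  # greedy: runs to the last . ! or ?
    return match.group(0) if match else text


def generate_reviews(prompt: str, n: int = 10, seed: int = 0) -> list[str]:
    """Sample n continuations of `prompt` from the public GPT-2 checkpoint.

    Assumption: an off-the-shelf model with plain top-k sampling. The
    settings below are illustrative, not those used in the study.
    """
    # Imported lazily so the helper above works without the library installed.
    from transformers import pipeline, set_seed

    set_seed(seed)
    generator = pipeline("text-generation", model="gpt2")
    outputs = generator(
        prompt,
        max_new_tokens=60,
        do_sample=True,
        top_k=50,
        num_return_sequences=n,
    )
    # Each output dict holds the prompt followed by the sampled continuation.
    return [trim_to_last_sentence(o["generated_text"]) for o in outputs]
```

Calling `generate_reviews("Restaurant review: ")` would download the model on first use and return ten candidate texts. Because sampling often stops mid-sentence, the trimming step keeps only the part up to the last full stop, exclamation mark or question mark.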
To control for bias, participants were randomly allocated to one of four conditions, each of which included 10 reviews: 1) all reviews written by a real human, 2) a hybrid condition, with 50% of reviews written by humans and 50% by a computer, 3) a second hybrid, with a different set of 50% real reviews and 50% fake reviews, and 4) all reviews automatically generated by GPT-2.
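The four-condition design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the split of the hybrids into the first five and last five reviews of each kind, the balanced round-robin allocation, and the names `build_conditions` and `allocate` are mine, not details reported above.

```python
import random

CONDITIONS = ("all_real", "hybrid_a", "hybrid_b", "all_fake")


def build_conditions(real, fake):
    """Map each condition name to its list of (review, is_fake) pairs."""
    assert len(real) == 10 and len(fake) == 10
    real_items = [(r, False) for r in real]
    fake_items = [(f, True) for f in fake]
    return {
        "all_real": real_items,
        # Assumption: the two hybrids use non-overlapping halves of each set.
        "hybrid_a": real_items[:5] + fake_items[:5],
        "hybrid_b": real_items[5:] + fake_items[5:],
        "all_fake": fake_items,
    }


def allocate(participants, seed=0):
    """Shuffle participants, then deal them round-robin into the conditions."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: CONDITIONS[i % len(CONDITIONS)] for i, p in enumerate(shuffled)}
```

With 100 participants, `allocate(range(100))` places 25 people in each condition. Shuffling first keeps the assignment random, while the round-robin step keeps group sizes equal, so every review is rated the same number of times.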
The results were quite interesting. Overall, out of all of the ratings given (n=1,000), participants labelled 53% of the reviews correctly. Real reviews written by humans were recognised with the highest accuracy (60%), while reviews written by GPT-2 turned out to be more difficult to spot (47% labelled correctly). Certain computer-generated reviews were particularly hard to detect: only a handful of participants (13%) correctly labelled the two examples given at the beginning of this article as fake. Despite their apparent “human-likeness”, both of these reviews were in fact automatically generated by GPT-2.
While this is still more of a lab experiment than a real, widespread issue, text generated by modern NLP techniques such as GPT-2 is definitely getting more human-like and, as hopefully illustrated by this mini-study, more difficult to distinguish from text written by real humans. As more and more activities move online due to the pandemic, more consumers than ever turn to online reviews to inform their purchase decisions. Adopting a critical perspective towards what we read online has never been more important.
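The accuracy figures above boil down to simple counting. The sketch below shows one way to compute them from raw ratings. The function name `accuracy_by_source` and the input format, a list of (is_fake, labelled_fake) pairs, are assumptions for illustration, and the ratings in the usage example are invented toy values, not data from the study.

```python
def accuracy_by_source(ratings):
    """Return the share of correct labels for real, fake and all reviews.

    ratings: iterable of (is_fake, labelled_fake) boolean pairs, one pair
    per rating given by a participant.
    """
    totals = {"real": [0, 0], "fake": [0, 0], "overall": [0, 0]}  # [correct, seen]
    for is_fake, labelled_fake in ratings:
        for key in ("fake" if is_fake else "real", "overall"):
            totals[key][0] += int(is_fake == labelled_fake)
            totals[key][1] += 1
    # Skip any group that received no ratings to avoid dividing by zero.
    return {k: c / n for k, (c, n) in totals.items() if n}


# Toy example (hypothetical ratings): two real reviews, three fake ones.
toy = [(False, False), (False, True), (True, True), (True, False), (True, False)]
print(accuracy_by_source(toy))
```

As a quick consistency check: if real and fake reviews were rated equally often, the overall accuracy would be the average of the two group accuracies, (60% + 47%) / 2 = 53.5%, which is in line with the roughly 53% reported above.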
- Kaemingk, D. 2020. Online review statistics to know in 2021.
- Luca, M. & Zervas, G. 2016. Fake it till you make it: reputation, competition, and Yelp review fraud. Management Science 62, 12. DOI: 10.1287/mnsc.2015.2304.
- Proserpio, D., Hollenbeck, B. & He, S. 2020. How fake customer reviews do – and don’t – work.
It should be noted that the fake reviews used in this mini-study were generated using OpenAI’s NLP model GPT-2. Consisting of over 1.5 billion parameters, GPT-2 at the time of its release demonstrated the capability of modern, transformer-based NLP models. Citing fears about how convincing the fake text generated by the system was, OpenAI postponed the model’s open-access release by nine months (from the announcement in February 2019 to the public release in November 2019). In June 2020, OpenAI announced a new version of its system, aptly named GPT-3, which consists of a whopping 175 billion parameters. This represents over a hundredfold increase in size. Even though at the time of writing GPT-3 has not yet been released for public use, it seems safe to say that text generated by the new system is likely to be much more accurate, coherent, and above all, convincing than anything generated for the purposes of this study. Interesting times ahead…