What does Deep Learning classification of mobile app reviews show about true user satisfaction?


Every year, mobile application satisfaction scores prompt lengthy discussion with our retail banking clients when they receive the results of D-Rating’s digital performance rating campaign.

Much of that discussion concerns biases in the satisfaction ratings of mobile banking applications:

  • Users are prompted to leave a rating at the end of a simple, well-designed and frequently used journey (e.g. bank transfers), which necessarily pulls the ratings up
  • Some mobile applications first ask for satisfaction feedback, then direct the user to the app store (when the feedback is positive) or to customer service (when the feedback is negative)
  • Others use even cruder tactics to maximize the number of 5-star ratings from mobile app users

In addition, some of our clients insist that a significant number of the ratings and reviews left by their customers do not concern the mobile application itself but rather customer service or overall satisfaction with the bank, for example very negative comments following a loan refusal or fees judged too high. In these cases the rating is not related to the mobile application and therefore has no bearing on the digital performance of a retail bank. Another example: some users leave comments just to share their referral code, hoping to benefit from the visibility of reviews on the store.

From the beginning, D-Rating has looked at different aspects of the mobile application satisfaction score in its evaluation of retail banks’ digital performance:

  • The “Stock”, i.e. the rating displayed by the stores, which could influence a future customer/user.
  • The “Flow”, i.e. the average of the ratings left during the previous year. In an attempt to remove some of the rating bias, D-Rating only counts in this average the ratings that were accompanied by a comment (favoring proactive customer reviews over bank-led customer reviews).
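The “Flow” measure can be sketched in a few lines of Python. This is a minimal illustration, not D-Rating’s actual pipeline: the `Review` record, its field names and the explicit date window are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Review:
    # Hypothetical record; the field names are assumptions for this sketch.
    rating: int          # 1 to 5 stars
    posted_on: date
    comment: str = ""    # empty when the user left a rating only


def flow_score(reviews, start: date, end: date) -> Optional[float]:
    """Average rating of reviews posted in [start, end) that carry a comment."""
    kept = [
        r.rating
        for r in reviews
        if start <= r.posted_on < end and r.comment.strip()
    ]
    # No commented review in the window: the score is undefined.
    return sum(kept) / len(kept) if kept else None


reviews = [
    Review(5, date(2022, 3, 1)),                          # rating only: ignored
    Review(2, date(2022, 5, 9), "Crashes after update"),
    Review(4, date(2022, 8, 20), "Transfers are easy"),
    Review(1, date(2020, 1, 5), "Old complaint"),         # outside the window
]
flow = flow_score(reviews, date(2022, 1, 1), date(2023, 1, 1))
print(flow)  # 3.0
```

The “Stock” needs no computation here, since it is simply the rating displayed by the store.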

How D-Rating decided to start addressing mobile app satisfaction score bias

As the trends described at the beginning of this article are, according to our clients, becoming more and more common, and as D-Rating had not yet analyzed reviews in detail, we have launched work on:

  • The classification of reviews, to focus the analysis and the Digital Performance evaluation on reviews that mainly concern the mobile application (which we call “compliant reviews” in the rest of this article)
  • Time analysis of reviews, to identify signals that distinguish batches of bank-led customer reviews from spontaneous ratings left by customers

In this paper we will discuss the first results obtained through the classification of mobile application reviews.

We developed a Deep Learning approach for classifying app reviews, based on a Long Short-Term Memory (LSTM) recurrent neural network. This technique has proved very accurate for text classification in various domains. The approach extracts the text of each app review, preprocesses it, and fits the model by supervised training (80% training set + 20% validation set), with early stopping to avoid overtraining and thus overfitting.
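The 80/20 split and the early stopping rule can be illustrated without any deep learning framework. The sketch below is not the model actually used: the helper names, the `patience` parameter and the validation-loss values are hypothetical, chosen only to show the mechanism.

```python
import random


def split_train_val(samples, val_fraction=0.2, seed=0):
    """Shuffle the samples and split them into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


class EarlyStopper:
    """Stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.stale_epochs = 0

    def should_stop(self, epoch, val_loss):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


train, val = split_train_val(range(2500))
print(len(train), len(val))  # 2000 500

# Hypothetical validation losses: they improve, then rise as the model overfits.
val_losses = [0.70, 0.52, 0.41, 0.38, 0.40, 0.43, 0.47, 0.55]
stopper = EarlyStopper(patience=3)
for epoch, loss in enumerate(val_losses):
    # A real loop would run one training epoch here before measuring the loss.
    if stopper.should_stop(epoch, loss):
        break
print(epoch, stopper.best_epoch)  # 6 3
```

Training halts at epoch 6, and the weights from epoch 3 (the lowest validation loss) would be the ones to keep.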

Training used more than 2,500 reviews, randomly selected from a sample of reviews from 15 French banks, with “non-compliant” reviews deliberately overrepresented (to improve training, as they are underrepresented overall, at ~25%).
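One way to build such a training sample is to draw from each class separately. The following is a sketch under stated assumptions: the 50% target share, the dictionary layout and the function name are hypothetical, since the article does not specify the ratio or the sampling method used.

```python
import random


def sample_training_set(reviews, size, minority_share, seed=0):
    """Draw `size` labelled reviews with a fixed share of "non-compliant" ones."""
    rng = random.Random(seed)
    minority = [r for r in reviews if r["label"] == "non-compliant"]
    majority = [r for r in reviews if r["label"] == "compliant"]
    n_minority = round(size * minority_share)
    # Sampling without replacement: each pool must be large enough.
    picked = rng.sample(minority, n_minority)
    picked += rng.sample(majority, size - n_minority)
    rng.shuffle(picked)
    return picked


# Synthetic pool with the ~25% natural share of "non-compliant" reviews.
pool = [
    {"id": i, "label": "non-compliant" if i % 4 == 0 else "compliant"}
    for i in range(10_000)
]
# The 50% target share is an assumption, not a figure from the article.
training = sample_training_set(pool, size=2500, minority_share=0.5)
share = sum(r["label"] == "non-compliant" for r in training) / len(training)
print(share)  # 0.5
```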

Applied to a test set of more than 1,000 reviews, the model reaches an accuracy rate of 92% at this first stage.
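For reference, the accuracy rate is the share of test reviews whose predicted class matches the manually assigned one. The example below uses synthetic labels, constructed so that 920 of 1,000 predictions are correct; the split between the two kinds of error is invented and does not reflect the real test results.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of reviews whose predicted label matches the manual label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


# Synthetic labels, built so that 920 of 1000 predictions are correct.
y_true = ["compliant"] * 750 + ["non-compliant"] * 250
y_pred = (
    ["compliant"] * 700 + ["non-compliant"] * 50      # 50 compliant reviews missed
    + ["non-compliant"] * 220 + ["compliant"] * 30    # 30 non-compliant reviews missed
)
acc = accuracy(y_true, y_pred)
counts = confusion_counts(y_true, y_pred)
print(acc)  # 0.92
print(counts[("non-compliant", "compliant")])  # 30
```

Because “non-compliant” reviews are the minority class, looking at the per-class counts alongside the overall accuracy shows whether errors concentrate on that class.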


First results on satisfaction scores with Deep Learning classification of reviews

For the moment we have focused on French banks; further work will deploy models for other European countries, and thus for other languages.

As these are intermediate results (we need to continue training the model), we show detailed results here for only 4 players of different kinds, looking at their last 500 reviews (as of November 25, 2022):

  • A first traditional bank: Bank-A
  • A second traditional bank: Bank-B
  • A digital bank (subsidiary of a traditional bank): DigiBank-C
  • A neobank: NeoBank-D

– Occurrence and ratings of both review types –

The share of “non-compliant” reviews and the value of the associated ratings depend not only on the bank and its subscription terms and conditions/offers but also on events related to the mobile application (e.g. a bad update will sharply increase the number of “compliant” reviews).

Bank-A: many reviews concern satisfaction with the branch and the bank advisor. As Bank-A is well known in France for its customer relationship efforts, these reviews pull the score upward.

« I am very satisfied with the services and advice offered by my branch in Vaulx-en-Velin. The advisors can be reached directly and a lot of things seem to be done at the branch level which allows a great reactivity. »

Bank-B: the mobile application went through a period of regression that caused a lot of dissatisfaction, hence the very low share of “non-compliant” reviews compared to the ~25% global average, and very low scores overall (given the small volume of “non-compliant” reviews, the 1.88 average is not significant).

DigiBank-C: a significant number of reviews relate to DigiBank-C’s offering (referral scheme with referral code, pricing…)

« Referral code to get 80€ bonus at account opening 😉 Code: ILBE9384 »

« I’m so glad I chose this reliable bank and above all without any fees!!!!! »

but also, as with most other digital banks and neobanks, some dissatisfaction with customer service expressed through the mobile app reviews

« There is no more service. The hours have changed and the wait is one hour on average! Very disappointing I’m going to change banks again because this is unacceptable. I don’t know how we are supposed to do in an emergency situation! »

NeoBank-D: quite a few reviews relate to account closures without prior notice, sign-up gifts not received, the impossibility of reaching customer service…

« My money was ruthlessly frozen by the bank and my payment methods blocked without any prior warning. A bank transfer was initiated without my consent with the wording « account balance to be closed ». Have we entered a financial dictatorship? Does private property still have a meaning? Will financial repression end up stealing our freedom? »

« A bank of thieves, a useless customer service my account has been blocked for 6 months I have no explanation I am asked to wait every time I call and today I get a message that I am overdrawn when I had money in my account and I have not used my account for 6 months. Their customer service is useless. I advise you not to use this « bank » because if you have a problem good luck getting it resolved. »

– Overall impact on the average mobile app satisfaction rating and ranking –

A first partial conclusion, which we will confirm in a future paper: the greater the overall dissatisfaction with the bank, the more reprocessing the reviews (eliminating “non-compliant” reviews) has a positive impact on the score (and vice versa).
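The mechanism behind this conclusion is simple arithmetic, sketched below on invented data. The ratings, the `compliant` flag and the two example banks are assumptions made for the illustration and are not taken from the results above.

```python
def reprocessing_impact(reviews):
    """Return (raw average, compliant-only average, difference), rounded."""
    all_ratings = [r["rating"] for r in reviews]
    kept = [r["rating"] for r in reviews if r["compliant"]]
    if not kept:
        raise ValueError("no compliant review to average")
    raw = sum(all_ratings) / len(all_ratings)
    reprocessed = sum(kept) / len(kept)
    return round(raw, 2), round(reprocessed, 2), round(reprocessed - raw, 2)


# Synthetic data: ratings and flags are invented for the illustration.
dissatisfied_bank = (
    [{"rating": 4, "compliant": True}] * 6
    + [{"rating": 1, "compliant": False}] * 4   # complaints about the bank itself
)
well_liked_bank = (
    [{"rating": 3, "compliant": True}] * 6
    + [{"rating": 5, "compliant": False}] * 4   # praise for branch and advisors
)
print(reprocessing_impact(dissatisfied_bank))  # (2.8, 4.0, 1.2)
print(reprocessing_impact(well_liked_bank))    # (3.8, 3.0, -0.8)
```

When the “non-compliant” reviews are mostly complaints about the bank, removing them raises the score; when they are mostly praise, removing them lowers it.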

Next steps to improve the mobile app reviews and ratings processing model

1 – We will continue training the model to further improve the accuracy rate
2 – Then develop temporal analyses to identify biases related to satisfaction rating campaigns by banking institutions
3 – And finally, deploy the model for languages other than French
