If you’ve ever been concerned about the privacy aspects of AI, you may be very surprised to learn that conversations you have with Google’s new Gemini AI apps are “retained for up to 3 years” by default.
Up To Three Years
With Google now launching its Gemini Advanced chatbot as part of its ‘Google One AI Premium plan’ subscription, and with its Ultra, Pro, and Nano LLMs now forming the backbone of its AI services, Google’s Gemini Apps Privacy Hub was updated last week. The main support document on the Hub, which states how Google collects data from users of its Gemini chatbot apps for the web, Android and iOS, made for interesting reading.
One particular section that has been causing concern and has attracted some unwelcome publicity is the “How long is reviewed data retained?” section. This states that “Gemini Apps conversations that have been reviewed by human reviewers… are not deleted when you delete your Gemini Apps activity because they are kept separately and are not connected to your Google Account. Instead, they are retained for up to 3 years”. Google clarifies this in its feedback at the foot of the support page, saying, “Reviewed feedback, associated conversations, and related data are retained for up to 3 years, disconnected from your Google Account”. It may be of some comfort to know, therefore, that the conversations aren’t linked to an identifiable Google account.
Why Human Reviewers?
Google says its “trained” human reviewers check conversations to see if Gemini Apps’ responses are “low-quality, inaccurate, or harmful” and that “trained evaluators” can “suggest higher-quality responses”. This oversight can then be used to “create a better dataset” for Google’s generative machine-learning models to learn from so its “models can produce improved responses in the future.” Google’s point is that human reviewers provide a kind of quality control, both over responses and over how and what the models learn, in order to make Google’s Gemini-based apps “safer, more helpful, and work better for all users.” Google also makes the point that human review may, in some cases, be required by law.
That said, some users may be alarmed that their private conversations are being looked at by unknown humans. Google’s answer to that is the advice: “Don’t enter anything you wouldn’t want a human reviewer to see or Google to use” and “don’t enter info you consider confidential or data you don’t want to be used to improve Google products, services, and machine-learning technologies.”
Why Retain Conversations For 3 Years?
Apart from improving performance and quality, other reasons why Google may retain data for years could include:
– The retained conversations act as a valuable dataset for machine learning models, thereby helping with continuous improvement of the AI’s understanding, language processing abilities, and response generation, ensuring that the chatbot becomes more efficient and effective in handling a wide range of queries over time. For services using AI chatbots as part of their customer support, retained conversations could allow for the review of customer interactions which could help in assessing the quality of support provided, understanding customer needs and trends, and identifying areas for service improvement.
– Depending on the jurisdiction and the industry, there may be legal requirements to retain communication records for a certain period, e.g. for compliance purposes and to help settle disputes.
– To help monitor for (and prevent) abusive behaviour, and to detect potential security threats.
– Research and development to help advance the field of AI, natural language processing, and machine learning, which could contribute to innovations, more sophisticated AI models, and better overall technology offerings.
Switching off Gemini Apps Activity
Google does say, however, that users can control what’s shared with reviewers by turning off Gemini Apps Activity. This will mean that any future conversations won’t be sent for human review or used to improve its generative machine-learning models, although conversations will be saved with the account for up to 72 hours (to allow Google to provide the service and process any feedback).
Also, even if you turn off the setting or delete your Gemini Apps activity, other settings including Web & App Activity or Location History “may continue to save location and other data as part of your use of other Google services.”
There’s also the complication that where Gemini Apps are integrated and used with other Google services (Gemini, formerly Bard, has been designed for this kind of integration), “they will save and use your data” (as outlined by their policies and Google’s overall Privacy Policy).
In other words, there is a way to turn it off, but just how fully turned off it then is remains unclear because of the links and integration with Google’s other services.
What About Competitors?
When looking at Gemini’s competitors, retention of conversations for a period of time by default (in non-enterprise accounts) is not unusual. For example:
– OpenAI saves all ChatGPT content for 30 days whether its conversation history feature is switched off or not (unless the subscription is an enterprise-level plan, which has a custom data retention policy).
– Looking at Microsoft and the use of Copilot, the details are more difficult to find, but from the information about using Copilot in Teams it appears that the farthest back Copilot can process is 30 days, indicating a possibly similar retention time to ChatGPT.
How Models Are Trained
How AI models are trained, what they are trained on, and whether there has been consent and/or payment for usage of that data is still an ongoing argument, with major AI providers facing multiple legal challenges. This indicates that there is still a lack of understanding, clarity and transparency around how generative AI models learn.
What About Your Smart Speaker?
Although we may have private conversations with a generative AI chatbot, many of us may forget that we have many more private conversations with a smart speaker listening in the room, and that it also retains conversations. For example, Amazon’s Alexa retains recorded conversations for an indefinite period, although it does give users control over their voice recordings. Users have the option to review, listen to, and delete them, either individually or all at once, through the Alexa app or Amazon’s website. Users also have the option to set up automatic deletion of recordings after a certain period, such as 3 or 18 months, although 18 months may still sound like an alarming amount of time to have a private conversation stored in distant cloud data centres.
What Does This Mean For Your Business?
Retaining private conversations for what sounds like a long period of time (3 years) and having unknown human reviewers look at those private conversations are likely to be the alarming parts of Google’s privacy information about how its Gemini chatbot is trained and maintained.
The fact that it’s a default (i.e. it’s up to the user to find out about it and turn off the feature), with a 72-hour retention period afterwards and no guarantee that conversations won’t still be shared due to Google’s interrelated and integrated products, may also not feel right to many. It may also be jarring that our only real defence is not to share anything even faintly personal or private with a chatbot, which may not be that easy given that many users need to provide information to get a good-quality response.
For enterprise users, it seems that more control over conversations is available, but businesses still need to ensure that clear guidelines are in place for staff about exactly what kind of information they can share with chatbots in the course of their work. Overall, this story is another indicator of the general lack of clarity and transparency about how chatbots are trained in this new field, and the balance of power still appears to be more in the hands of the tech companies providing the AI. With many legal cases on the horizon about how chatbots are trained, we may expect to see more updates to AI privacy policies soon. In the meantime, we can only hope that AI companies are true to their guidelines and anonymise and aggregate data to protect user privacy and comply with existing data protection laws such as GDPR in Europe or CCPA in California.
By Mike Knight