What Alexa, Google Assistant and Siri really hear, store and process

Voice assistants are in fashion. Amazon's and Google's smart speakers have become one of the popular gifts this Christmas. Now that Google Assistant, Alexa and Siri all speak Spanish, it is worth asking: do they hear and keep everything we say?

Personal data protection is once again a hot topic, with products that some people feel are "spying" on us. As we will see below, however, all the companies that develop voice assistants have taken this into account and are very careful about what data they handle and how they handle it.

The machines want to talk to us

Smart speakers are a little unsettling. This is no longer because of anecdotal bugs like the one behind those sinister bursts of laughter. It is because it is not clear how secure our interactions with these devices really are.

We are surrounded by technology that listens to us. As if carrying a mobile phone everywhere were not enough, smart speakers now keep us company at home, and they are not the only devices with voice assistants built in. Smartwatches and various products from the Internet of Things segment also benefit from this development.

The problem (if we consider it one) is the feeling that these products are always listening to us, which puts the security of our data and our privacy at risk. That is what happened, for example, with the private conversation that an Amazon Echo mistakenly sent to one of its owner's contacts. Situations like this, however, are the exception, not the rule.

As a result, it can seem as though these voice assistants are always watching us. In fact, manufacturers have given us enough information to understand what the assistants do with our data.

How voice assistants work

Smart speakers and other products with voice assistant capabilities all work in a similar way. A wake word activates them, which means they stay in a state of active standby. They are always listening, but they only pay attention once they hear that special word ("Hey Siri", "OK Google", etc.) or a short activation phrase.

When I asked Google Assistant on my Android phone whether it was spying on me, it did not really answer. It treated the question as an ordinary query.

To maintain this active standby, the devices never stop listening to us. They make short recordings of what they hear and try to recognize the wake word in them. If the wake word or phrase is detected, the device keeps the recording for processing. If it is not, the recording is discarded.
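The listen-then-discard behaviour described above can be modeled as a small rolling buffer. The following is a minimal sketch, not any manufacturer's implementation. Strings stand in for short audio frames, and the detector `is_wake_word`, the buffer size `buffer_frames` and the request length `request_frames` are all hypothetical. Frames that fall out of the buffer are simply lost. Only when the detector fires does the buffered audio, plus what follows it, become a segment that would be sent to the cloud. This also explains how a device can include a sliver of audio recorded just before the wake word.

```python
from collections import deque
from typing import Callable, Iterable, List


def gate_audio(
    frames: Iterable[str],
    is_wake_word: Callable[[str], bool],  # hypothetical on-device detector
    buffer_frames: int = 2,               # assumed rolling-buffer size (includes the wake-word frame)
    request_frames: int = 2,              # assumed number of frames captured after activation
) -> List[List[str]]:
    """Return the audio segments that would leave the device (illustrative model only)."""
    rolling = deque(maxlen=buffer_frames)  # oldest frame is dropped automatically, never stored
    uploads: List[List[str]] = []
    capturing: List[str] = []
    remaining = 0

    for frame in frames:
        if remaining > 0:
            # Already activated: record the request that follows the wake word.
            capturing.append(frame)
            remaining -= 1
            if remaining == 0:
                uploads.append(capturing)
                capturing = []
            continue

        rolling.append(frame)
        if is_wake_word(frame):
            # Activation: keep the buffered audio (a little pre-roll) instead of discarding it.
            capturing = list(rolling)
            rolling.clear()
            remaining = request_frames
            if remaining == 0:
                uploads.append(capturing)
                capturing = []

    if capturing:
        # Stream ended mid-request: whatever was captured is still sent.
        uploads.append(capturing)
    return uploads


if __name__ == "__main__":
    stream = ["private", "chat", "about", "dinner", "ok_device",
              "weather", "tomorrow", "more", "chat"]
    print(gate_audio(stream, lambda f: f == "ok_device"))
    # [['dinner', 'ok_device', 'weather', 'tomorrow']]
```

In this toy run, "private chat about" never leaves the buffer. Only one frame of pre-roll, the wake word and the request that follows are kept.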

Once we activate the voice assistant, data transfer does begin. It is worth stressing again that we depend on the cloud. These conversations, questions and commands are not processed on the device itself. They are sent to a server that interprets them, processes them and returns answers, and our assistant simply reads those answers aloud (with a synthesized voice).

That is why our voice is not stored locally on the device. It ends up on the servers that the manufacturers of these devices and developers of these assistants (Google, Apple, Amazon, Microsoft) have set up for the heavy work of language recognition.

But what data is actually transferred to these servers? What do these companies do with it? And what can we do about it? We think it is important to clarify this, and we do so below for each of the four major voice assistants on the market today.

Google Assistant

What does it collect, and what for?

It is important to note that the voice assistant in a device such as Google Home does not record all of our conversations. Instead, the assistant "listens to short snippets" of a few seconds to check whether the activation phrase has been spoken. If it has not, these snippets are deleted, "and none of this information leaves your device until the activation phrase is detected."


The Google Home Help center explains what information Google Home family devices collect. It includes a section listing the data collected, which is divided into three groups.

The first group is data used to improve device performance and reliability, such as Wi-Fi network stability, the wake-word detection success rate and latency, among others.

The second group contains usage statistics, such as the number of interactions with the device and the buttons we press on it. Google also collects the duration of media sessions and which apps we use during them. However, it is important to stress that, according to Google, "we do not collect information about the content being played, although the media service may provide it to us."

The third group includes information about the hardware model and software version in use. It also covers running processes and possible causes of failure recorded in error reports.

The same help page explains that the Google Assistant built into these devices can access your search history to "offer better and more useful answers." You can give Google your address, but you do not have to. If you do not, the device "will use your approximate location, based on your IP address and other signals, to set alarms in the correct time zone and to give you relevant weather and traffic information."

The company collects data to make its services "faster, smarter, more relevant and more useful." Using these assistants apparently also allows Google Home to learn "over time, to offer better, more personalized responses and suggestions."

Where is this data stored and what control do we have over it?

The data sent to Google's servers ends up in its data centers. There it is stored indefinitely unless we delete it manually.


Google provides tools for exactly this purpose: controlling this activity and managing the data. My Activity is a full dashboard where we can review the information Google keeps from our use of its services, including our interactions with the voice assistant.

In this dashboard we can find every audio clip recorded from our requests. If we filter the results to show only "Voice & Audio," we will find recordings of our phrases. We can delete them, along with any other information we do not want stored on these servers.

Amazon's Alexa

What does it collect, and what for?

As with the other assistants, Alexa collects our conversations, requests and voice commands. Amazon records and processes this information, which in some cases may also be shared with third parties.


This voice assistant actually starts recording a "fraction of a second of audio" before the wake word or phrase, or before we press the button that activates the assistant. That is the point at which the recording is sent to Amazon's servers.

Amazon states that when we use an Alexa-enabled device, it keeps these recordings "to improve the accuracy of the results and improve our services." As with other services, "deleting these recordings may degrade your experience when using the device."

Where is this data stored and what control do we have over it?

Amazon has one of the largest server and data-center infrastructures in the world. Not for nothing is its Amazon Web Services division one of the keys to its business.


Any Alexa user can access these voice recordings from the Alexa app (in the privacy section) or on the Amazon website. From there, the recordings can be deleted, and we can also review and play back the voice recordings made while our requests were being processed.

One peculiarity of this dashboard is that it also lets us control the permissions we have granted to other services and applications that connect to Alexa. This is where "skills" come into play: the extra capabilities that Amazon has long been adding to this voice assistant.

In these settings we can also place further restrictions on the assistant. Devices may mistake words we say in conversation for the wake word and activate regardless of context. To prevent this, we can force Alexa to activate only when we press the physical button.

We can also turn on a tone that signals when recording starts and ends. We can even mute the device, although that obviously makes it impossible to use the assistant.

Apple's Siri

What does it collect, and what for?

Siri was the first of these voice assistants to reach the market, arriving in 2011 integrated into the iPhone 4S. The assistant collects and uses information stored on our phone, such as our name or our contacts.


If we also have Location Services enabled, our location may be sent along with the requests we make to the assistant so that the answer is more accurate.

Apple also specifies that some Siri features rely on "real-time data from Apple servers." For example, Siri will collect our current location and destination if we ask for a route between two points in Apple Maps.

Where is this data stored and what control do we have over it?

When we talk to Siri, our requests are sent to Apple's servers for analysis. In this process Apple assigns the recording a random identifier, which is what links our voice files together.


After six months, or if we turn Siri off, Apple "disassociates" this random identifier from our recordings, which essentially removes the link that existed. The files are then kept for up to another 18 months so that Apple can use them to test and improve its products.
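The retention timeline above can be summarized as a small function that reports which state a recording would be in at a given age. This is a hedged model of the article's description, not Apple's code. The state names are hypothetical, and months are approximated as 30 days. The 18-month period is assumed to start at disassociation (six months after recording, or earlier if Siri is turned off). Deletion at the end of that period is also an assumption: the article implies it but does not state it outright.

```python
from enum import Enum
from typing import Optional

DAYS_PER_MONTH = 30  # assumption: months approximated as 30 days


class RecordingState(Enum):
    # Hypothetical labels for the three phases described in the article.
    LINKED = "linked to random identifier"
    DISASSOCIATED = "kept without identifier"
    DELETED = "deleted"  # assumed end state after the extra 18 months


def recording_state(age_days: int, siri_disabled_at_day: Optional[int] = None) -> RecordingState:
    """Model which phase a Siri recording is in, `age_days` after it was made.

    `siri_disabled_at_day` is the day (counted from the recording) on which
    Siri was turned off, or None if it stayed on.
    """
    disassociate_at = 6 * DAYS_PER_MONTH
    if siri_disabled_at_day is not None:
        # Turning Siri off triggers disassociation early.
        disassociate_at = min(disassociate_at, siri_disabled_at_day)
    delete_at = disassociate_at + 18 * DAYS_PER_MONTH

    if age_days < disassociate_at:
        return RecordingState.LINKED
    if age_days < delete_at:
        return RecordingState.DISASSOCIATED
    return RecordingState.DELETED


if __name__ == "__main__":
    print(recording_state(200).value)  # kept without identifier
```

Modeling the phases as an explicit state makes the key difference from the other assistants visible. The link to the user expires on a schedule rather than depending on the user deleting anything.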

This way of handling recordings also sets Apple apart in terms of what the user can control. We can turn off Location Services or deactivate Siri. We can also disable always-on listening so that the assistant only responds to a physical button press.

However, unlike with Google Assistant or Amazon Echo, we cannot access these recordings. We can erase our entire voice interaction history with Apple, but to do so we have to turn off voice dictation in the device settings.

It is, however, possible to request all the data Apple holds about us through Apple's Apple ID website. We log in, answering two security questions when prompted. At the bottom of the page we then go to the "Manage Data and Privacy" section, which takes us to a dedicated Apple page for this purpose.

From there we can request a copy of our data. However, it does not include the audio files we might expect to retrieve. Given the processing described above, these recordings apparently cannot be recovered.

Microsoft's Cortana

What does it collect, and what for?

Microsoft's assistant started out on mobile devices. The failure of Windows Phone and Windows 10 Mobile on smartphones, however, made Cortana jump to the desktop (it is integrated into Windows 10) and to some smart speakers.


Cortana collects information about our device, the Microsoft services we use and the third-party services we connect to it. Microsoft says that "Cortana does not use the data you share with her to target ads to you."

According to Cortana's privacy terms, the data collected includes browsing history, calendar, contacts and location history. Somewhat troublingly, it even includes "content and communications history from messages, apps and notifications."

If we use Cortana in a Windows session signed in with our Microsoft account, the recordings are associated with that account. From there we can manage the collected data.

Where is this data stored and what control do we have over it?

Microsoft also has an extensive server infrastructure, where all this collected data is stored. Its Azure platform is increasingly important for the Redmond company.


If we want to access the data Microsoft stores about us from our use of Cortana, we can do so from our Windows computer. Going to "Cortana > Permissions & History" lets us "change what Cortana knows about me in the cloud," provided we use Cortana regularly and are signed in (which is not my case).

We can also access all the voice data Microsoft stores through our account's privacy dashboard. It offers a number of options, including voice search and interaction.

Microsoft explains that "not all data is displayed" in this dashboard, and that it "routinely removes data that our systems no longer need." From this dashboard we can also control which third-party services we allow to connect to Cortana, for example.

The dashboard Microsoft offers for Cortana is more complete than the other options. To some extent, it lets us strike a balance between the assistant's capabilities and the data it collects.

Conclusions: the assistants listen, but the user is in control (if they want to be)

With the avalanche of services and products built around voice assistants, security and privacy suspicions about these devices are inevitable.


Many other services collect data while we browse the internet, and the same ambition to collect more and more information also extends to these voice assistants.

Here, as in many other areas of our everyday use of technology, it is important to be aware of what these services collect. We should also recognize that enjoying their benefits involves certain trade-offs. Each of us decides whether to accept them and whether the data collection is a problem.

Either way, we are fortunately living at a time when technology companies are being pushed to improve the tools that let users access the data collected about them. Voice assistants are no exception.

Apple does not offer such direct control, but its approach is sound. Among the other three, Google, Microsoft and Amazon are the ones most suspected of a "hunger for data," which shows in many of their products and services. They certainly depend on this data for a competitive advantage (more effective advertising, for example), but such voracious data collection is nothing new in this segment.

These services are becoming more transparent, although it is always up to the user to decide how closely to control this data collection. One thing we should not forget: except in Apple's case, this data will remain indefinitely on these companies' servers. We can delete it if we want to, but we have to be proactive about it. That is a very different battle, but knowing how exposed we are is a good starting point.

