In an age where Facebook, Google, Amazon, and many others are amassing an immense amount of data, what could be more concerning than drastic advancements in artificial intelligence? Thanks to neural networks and deep learning, tons of decision problems can be solved by simply having enough labelled data. The implications of this big data coexisting with the artificial intelligence to harvest information from it are endless. Not only are there good implications, including self-driving cars, voice recognition like Amazon Alexa, and intelligent applications like Google Maps, but there are also many bad implications like mass user-profiling, potential government intrusion, and, you guessed it, breaches into modern cryptographic security. Thanks to deep learning and neural networks, your password for most applications might just be worthless.
Since the cryptographic functions used in password hashing are currently secure, many attacks attempt to acquire the user’s password before it even reaches the database. For more information, see this article on password applications and security. For this reason, attacks using keyloggers, dictionary attacks, and inference attacks based on common password patterns are common, and these attacks actually work quite frequently. Now, however, deep learning has paved the way for a new kind of inference attack, based on the sound of the keys being typed.
Investigations into audio distinguishing between keystrokes are not new. Scientists have been exploring this attack vector for many years. In 2014, scientists used digraphs to model an alphabet based on keystroke sounds. In addition, statistical techniques are used in conjunction with the likely letters that are being typed (determined by the sound patterns) to create words that have a statistical likelihood of being typed based on the overall typing sound. This “shallow learning” approach is a good example of a specific set of techniques developed for a specific task in data science research.
Approaches like this one were used for years in fields like image feature recognition. The results were never groundbreaking, because it is very difficult for humans to create a perfect model for a task that has a massive amount of considerable variables. However, deep learning is now in the picture, and has been for some time. Image recognition with deep learning is so good it is almost magical, and it certainly is scary. This means that this task of matching keystroke sounds with the keystrokes themselves might just be possible.
Implications of Neural Networks
Nowadays, training models using audio for keyboard stroke recognition is a successfully performed task. Keystroke sound has been successfully used as a bio-metric authentication factor. While this fact is cool, the deeper meanings are quite scary. With the ability to train a massive neural network on the plethora of labelled keystroke sound data available on the web, a high-accuracy model can be created with little effort that predicts keystrokes based on audio with high accuracy.
Combined with other inference approaches, you could be vulnerable any time anyone is able to record you type your password. In fact, according to this article by phys.org, with some small information, such as keyboard type and typist style, attackers have a 91.7% accuracy of determining keystrokes. Without this information, they still have an impressive 41.89% accuracy. Even with this low keystroke accuracy, attackers may still be able to determine your password as the small accuracy could still clue them into your password style, e.g. using children’s or pet’s names in your passwords. Once attackers have an idea of your password style, they can massively reduce the password space, as stated in this article. With a reduced possible password space, brute force and dictionary attacks become extremely viable. Essentially, with advancements in deep learning, the audio of you typing your password is definitely a vulnerable vector of attack.
What you can do to protect yourself
The main vulnerability of this attack lies in the VOIP software widely used by companies and individuals alike to communicate. When you use software like Skype, your audio is obviously transmitted to your call partners. This audio, clearly, includes audio of you typing. This typing could be deciphered using machine learning and inference attacks, and any attacker on the call could decipher some or all of what is being typed. Of course, some of this typed text may include passwords or other sensitive information that an attacker may want. Other vulnerabilities include any situation where someone may be able to covertly record your keystrokes. For example, someone may record you typing in person by using their phone without you knowing.
So, to protect yourself, be sure that you have a second factor authenticating you in important security applications. Most login interfaces such as Gmail offer 2-factor authentication. Be sure that this is enabled, and your password will not be the only factor in your login. This reduces the risk of attackers obtaining your password. Additionally, of course, using good password practiceswill make it harder for inference attacks to supplement deep learning in acquiring your password. Finally, you could certainly reduce the risk of audio-based attacks by not typing in passwords when on VOIP calls.
Certainly, there isn’t much to do to mitigate the risk of your typing audio being eavesdropped. The implications that deep learning have on audio-based password attacks are definitely scary. It’s a fact that neural networks might mean that your password is worthless, and they’re only getting stronger. The future of artificial intelligence will change not only modern authentication systems, but it will change society in ways we can’t even imagine. The only thing we can do in response is be aware and adapt.
If you have any questions or comments about this post, feel free to leave a comment or contact me!