How it works: Siri, Google, Alexa

Ajish Prakash on January 26, 2019

When computers and electronic devices emerged in the middle of the twentieth century, they were machines made primarily to assist mankind. To carry out their tasks they occupied entire rooms, because the processing hardware and physical storage units were large and required immense cooling. Technology has come a long way since, and the same capability now fits comfortably in a handheld device.

A major component of this technology, which translates well to mobile phones, is interactive software. The foremost example, now a household name, is the voice assistant built on speech recognition. Different platforms have different names for theirs, such as Siri, Google Assistant, Cortana, and Alexa. What they all have in common is the ability to convert spoken commands into text and then respond almost instantly.

A brief history of speech recognition

The collective work in this sector has been ongoing for nearly three decades. Back around 1990, a handful of pioneers saw the need for software that could convert audio input into text and then respond accordingly. Despite the technological limits of the time, the resulting software was a revolutionary leap. Products from Kurzweil, Lernout & Hauspie, and Kolvox were well known in those days. They handled about 40 words per minute, provided the speaker pronounced each word clearly and slowly.

How speech recognition works

The fundamental idea behind speech recognition might appear foreign and complex at first. Inspected closely, however, the general idea is fairly simple, and the results have been astounding. Since sound is a wave, the first step is to convert the audio into numbers so that it can be analyzed. This is done by sampling: the height of the wave is measured at regular intervals, many thousands of times per second. Those numbers are then passed to a machine learning model, such as a convolutional neural network, which learns to recognize the patterns in them.
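To make the "sound into numbers" step concrete, here is a minimal Python sketch. It is an illustration only, not how any particular assistant is implemented: the 16,000 samples-per-second rate, the 16-bit range, and the function names sample_tone and quantize_16bit are all assumptions chosen for the example, and a pure 440 Hz tone stands in for a real microphone signal.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed value for this sketch)


def sample_tone(frequency_hz, duration_ms, sample_rate=SAMPLE_RATE):
    """Measure the height of a pure sine tone at regular intervals.

    Returns a list of floats in the range [-1.0, 1.0].
    """
    n_samples = sample_rate * duration_ms // 1000
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        for n in range(n_samples)
    ]


def quantize_16bit(samples):
    """Map heights in [-1.0, 1.0] to signed 16-bit integers."""
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]


# A hypothetical 10 ms stretch of a 440 Hz tone stands in for recorded speech.
samples = sample_tone(440.0, 10)
pcm = quantize_16bit(samples)

print(len(pcm), "numbers describe 10 ms of sound")
print("first few values:", pcm[:5])
```

Each number is simply the wave's height at one instant. Everything the recognizer does afterwards operates on a list like this one.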

Because a sound wave is a one-dimensional signal, it can be described at any moment by its height (amplitude), and over time by how wide each rise and fall is, which reflects its frequency. These characteristics are what enable the algorithm to distinguish between different words and speech patterns. The wave is broken down into many short chunks, and each chunk is analyzed in turn. The result of that analysis is the text, which the software then uses to select its response.
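The chunking step can be sketched as follows. This is a simplified illustration under stated assumptions, not the method used by Siri or any other product: the 20 ms chunk length, the 10 ms step between chunks, and the helper names split_into_frames and frame_features are hypothetical, and the two per-chunk measurements (average height and number of zero crossings) are basic stand-ins for the richer features a real recognizer computes.

```python
import math


def split_into_frames(samples, frame_len, hop_len):
    """Cut a list of samples into fixed-length chunks that start hop_len apart."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


def frame_features(frame):
    """Summarize one chunk by its RMS height and its count of zero crossings.

    A higher crossing count means narrower oscillations, i.e. a higher pitch.
    """
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return rms, crossings


sample_rate = 16_000  # assumed sampling rate

# Hypothetical input: 50 ms of a quiet, low tone followed by 50 ms of a loud, high tone.
quiet_low = [0.2 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
loud_high = [0.8 * math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(800)]
signal = quiet_low + loud_high

# 320 samples = 20 ms per chunk; 160 samples = 10 ms between chunk starts (both assumed).
frames = split_into_frames(signal, frame_len=320, hop_len=160)
features = [frame_features(f) for f in frames]

for i, (rms, crossings) in enumerate(features):
    print(f"chunk {i}: height={rms:.3f} crossings={crossings}")
```

The early chunks report a small height and few crossings, while the later ones report a larger height and many more crossings. Differences of this kind, chunk by chunk, are what a model uses to tell one sound from another.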

The importance of Siri

Apple can be argued to have started the modern wave of speech recognition software. Launched back in 2011, Siri has evolved into a platform that has enabled a wide audience to use cell phones and has been front and center in many of Apple's marketing campaigns. The importance of this software is hard to overstate, for several reasons. It has made the overall experience of using a mobile phone easier for us. Users no longer have to type out what they want to text or search for on the internet.

Furthermore, this feature enables people with conditions like Parkinson's disease, who have difficulty typing, to use the phone with ease. And because you can also use it to carry out commands, you'll be saving a lot of time as well!

