RasPI Assistant: Google Assistant + Dialogflow + Raspberry Pi

Would you like to control the TV using your voice without spending a lot of money? Amazing, right? So, in this post, I will teach you how to do that and more.

RasPi Assistant Diagram

One of my dreams has always been to control things without touching them. For example, I would like to be able to control the television without raising my hand to change the channel. So, let’s create a device that can do this automatically.

Next Steps?

First, we need to understand the problem and come up with a way to solve it. For example, if we want to control a TV that is not smart, how will we do that? One possibility is to send infrared (IR) signals that carry the commands the person wants.

Also, if I want the device to hear me, I will need a microphone. Additionally, it should have a speaker to talk with people.

Furthermore, I will need a database to save all the information, APIs to provide the smart logic, and cheap electronic components such as a Raspberry Pi, resistors, LEDs, wires and a protoboard.

TVs Interaction

Controlling a TV that is not smart can be difficult. In this case, I will use infrared (IR) signals to interact with the television, so I need to research them further.

Different Types of wavelengths

First, you need to know what infrared is. Infrared radiation is a type of electromagnetic radiation. It is invisible to human eyes, but people can feel it as heat. It has frequencies from about 300 GHz up to about 400 THz and wavelengths from about 1 millimeter (0.04 inches) down to about 740 nanometers (0.00003 inches).
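Wavelength and frequency are related by λ = c / f, so the figures above can be sanity-checked with a few lines of Python. The 940 nm value is an assumption (a wavelength commonly used by consumer IR remote LEDs), not a measurement from this project.

```python
# Convert between frequency and wavelength using lambda = c / f.
SPEED_OF_LIGHT_M_S = 299_792_458  # speed of light in vacuum, metres per second


def wavelength_nm(frequency_hz: float) -> float:
    """Return the wavelength in nanometres for a frequency in hertz."""
    return SPEED_OF_LIGHT_M_S / frequency_hz * 1e9


def frequency_thz(wavelength_nm_value: float) -> float:
    """Return the frequency in terahertz for a wavelength in nanometres."""
    return SPEED_OF_LIGHT_M_S / (wavelength_nm_value * 1e-9) / 1e12


if __name__ == "__main__":
    print(f"400 THz -> {wavelength_nm(400e12):.0f} nm")      # short-wavelength end of IR
    print(f"300 GHz -> {wavelength_nm(300e9) / 1e6:.2f} mm")  # long-wavelength end of IR
    print(f"940 nm  -> {frequency_thz(940):.0f} THz")         # typical remote LED (assumed)
```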

TV and Infrared Interaction

A TV remote control uses IR waves to change channels. In the remote, an IR light-emitting diode (LED) or laser sends out binary coded signals as rapid on/off pulses. A detector in the TV converts these light pulses into electrical signals that instruct a microprocessor to change the channel, adjust the volume or perform other actions. IR lasers can be used for point-to-point communications over distances of a few hundred meters (or yards).
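To make the "binary coded signals as rapid on/off pulses" concrete, here is a sketch of one widely used consumer IR protocol, NEC. The post does not say which protocol the TV uses, so NEC is an assumption here. An NEC frame starts with a 9 ms burst and a 4.5 ms space, followed by 32 bits (address, inverted address, command, inverted command, least significant bit first). Every bit begins with a 562.5 µs burst, and the length of the space that follows encodes 0 or 1. The 38 kHz carrier that modulates each burst is left out of this sketch.

```python
from typing import List

# NEC timing constants in microseconds (assumed protocol; see above).
LEADER_MARK_US = 9000
LEADER_SPACE_US = 4500
BIT_MARK_US = 562.5
ZERO_SPACE_US = 562.5
ONE_SPACE_US = 1687.5


def nec_frame(address: int, command: int) -> List[float]:
    """Return alternating mark/space durations (us) for one NEC frame.

    Even indices are marks (IR LED on, modulated at 38 kHz),
    odd indices are spaces (IR LED off).
    """
    if not (0 <= address <= 0xFF and 0 <= command <= 0xFF):
        raise ValueError("address and command must fit in one byte")

    # Four bytes: address, its bitwise inverse, command, its bitwise inverse.
    payload = [address, address ^ 0xFF, command, command ^ 0xFF]

    durations: List[float] = [LEADER_MARK_US, LEADER_SPACE_US]
    for byte in payload:
        for bit in range(8):  # least significant bit is sent first
            durations.append(BIT_MARK_US)
            is_one = (byte >> bit) & 1
            durations.append(ONE_SPACE_US if is_one else ZERO_SPACE_US)
    durations.append(BIT_MARK_US)  # trailing burst marks the end of the frame
    return durations
```

Because each byte is followed by its inverse, every frame has the same total length, which lets a receiver validate it easily.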

DIY Circuit

For this project, I built a circuit that connects to the Raspberry Pi. It can record the IR signal for each event and save it in a database. The circuit also has an IR transmitter to send those events to the TV.
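Saving one recorded signal per event could look like the sketch below, which uses Python’s built-in `sqlite3` module. The table name `ir_codes`, its columns, and the choice to store raw mark/space timings as JSON are illustrative assumptions; the post does not say which database was used.

```python
import json
import sqlite3
from typing import List, Optional


def open_db(path: str = "ir_codes.db") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ir_codes (
               event TEXT PRIMARY KEY,   -- e.g. 'power', 'volume_up'
               pulses TEXT NOT NULL      -- JSON list of mark/space durations (us)
           )"""
    )
    return conn


def save_code(conn: sqlite3.Connection, event: str, pulses: List[float]) -> None:
    """Store (or overwrite) the recorded pulse train for an event."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO ir_codes (event, pulses) VALUES (?, ?)",
            (event, json.dumps(pulses)),
        )


def load_code(conn: sqlite3.Connection, event: str) -> Optional[List[float]]:
    """Return the stored pulse train for an event, or None if it was never recorded."""
    row = conn.execute(
        "SELECT pulses FROM ir_codes WHERE event = ?", (event,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Recording the same event again simply replaces the old entry, which is convenient when a capture was noisy.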

Audio Processing

As you know, our ears enable us to understand what people are saying. So, if we want our device to take actions by voice, it needs to analyze audio.


Audio processing is hard because of different accents, context, background noise, regional variations and other factors. Currently, many companies such as Google and IBM use Deep Learning to create sophisticated models that transcribe audio to text with considerable accuracy.

In this project, I used the Google Assistant SDK. It is a powerful framework from Google that processes audio with high accuracy, supports several languages, has low latency and can be integrated with many devices. Another service that I used was IBM Speech to Text (STT).
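Whichever service does the transcription, the device should act only on text it is reasonably sure about. The sketch below picks the best transcript from a response shaped like IBM Speech to Text JSON output (a `results` list whose items hold `alternatives` with `transcript` and `confidence`). The function name and the 0.6 threshold are assumptions for illustration.

```python
from typing import Any, Dict, Optional


def best_transcript(response: Dict[str, Any], min_confidence: float = 0.6) -> Optional[str]:
    """Join the top alternative of each final result, or return None if confidence is too low.

    Assumes a response shaped like IBM Speech to Text JSON:
    {"results": [{"final": true, "alternatives": [{"transcript": ..., "confidence": ...}]}]}
    """
    pieces = []
    confidences = []
    for result in response.get("results", []):
        if not result.get("final", False):
            continue  # skip interim hypotheses
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        top = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
        pieces.append(top["transcript"].strip())
        confidences.append(top.get("confidence", 0.0))

    if not pieces or min(confidences) < min_confidence:
        return None  # nothing usable, or at least one segment is too uncertain
    return " ".join(pieces)
```

Returning `None` lets the caller ask the user to repeat the command instead of sending a wrong IR signal.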


Moreover, the device should be able to talk, so that interacting with it feels friendly and natural. This is difficult because synthesized voices can sound robotic; however, Deep Learning models can generate speech that sounds much more human. I used text-to-speech (TTS) services such as Google TTS and IBM TTS to convert text to audio. Afterwards, I played the audio on the Raspberry Pi using VLC.
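Playing the synthesized audio from Python can be done by launching VLC’s console interface. This sketch assumes the TTS service has already saved its output to a local file (the name `answer.mp3` is hypothetical) and that the `cvlc` binary from the VLC package is installed. The `--play-and-exit` option makes VLC quit when the clip ends, so the call blocks until playback finishes.

```python
import shutil
import subprocess
from typing import List


def vlc_command(audio_path: str) -> List[str]:
    """Build the command line that plays a file once with VLC's console interface."""
    return ["cvlc", "--play-and-exit", audio_path]


def play_audio(audio_path: str) -> None:
    """Play an audio file and wait until VLC exits."""
    if shutil.which("cvlc") is None:
        raise RuntimeError("cvlc not found; install VLC (e.g. 'sudo apt install vlc')")
    # check=True raises CalledProcessError if VLC exits with a non-zero status.
    subprocess.run(vlc_command(audio_path), check=True)
```

Usage: `play_audio("answer.mp3")`. Blocking until playback ends keeps the assistant from listening to its own voice.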


Natural Language Processing

At this point, we have a module that converts text to audio and another that extracts text from audio, but what will we do with that text? We need something that gets the intent and understands what the person wants to do. So, let’s use Machine Learning and Natural Language Processing to solve this issue.

Dialogflow Screen - Intents

Dialogflow screen modifications

So, I trained several intents and entities in Dialogflow to analyze the text, so that the Raspberry Pi can make decisions based on the detected intents and parameters. For instance, it identifies whether the person wants to change the channel and which channel she or he wants to watch, or whether she or he wants more volume on the TV. I also used context variables to make decisions about something mentioned previously.
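On the Raspberry Pi, the intent name and parameters returned by Dialogflow have to be turned into an action. The sketch below assumes the query result has already been converted to a plain dictionary. The intent names (`tv.channel.set`, `tv.channel.next`, `tv.channel.last`, `tv.volume.up`), the `number` parameter and the key names are hypothetical, not the ones used in the project. The `state` dictionary stands in for context by remembering the previous channel.

```python
from typing import Any, Dict, List


def handle_intent(query_result: Dict[str, Any], state: Dict[str, int]) -> List[str]:
    """Map a detected intent to a list of TV key names to send over IR.

    `state` keeps the current and previous channel so that follow-up
    requests such as "go back to the last channel" can be resolved.
    """
    intent = query_result["intent"]
    params = query_result.get("parameters", {})

    def tune(channel: int) -> List[str]:
        state["previous"], state["current"] = state["current"], channel
        # One key press per digit, e.g. channel 27 -> KEY_2, KEY_7.
        return [f"KEY_{digit}" for digit in str(channel)]

    if intent == "tv.channel.set":
        # Dialogflow number parameters may arrive as floats (e.g. 27.0).
        return tune(int(params["number"]))
    if intent == "tv.channel.next":
        state["previous"], state["current"] = state["current"], state["current"] + 1
        return ["KEY_CHANNELUP"]
    if intent == "tv.channel.last":
        return tune(state["previous"])
    if intent == "tv.volume.up":
        steps = int(params.get("number", 1))
        return ["KEY_VOLUMEUP"] * steps
    return []  # unknown intent: do nothing
```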

Final Integration

Here comes the best part: I connected all the modules into a single project. First, I built a circuit to record and send IR signals, then connected it to the RPI and developed a Python module to control it.

Additionally, I connected a microphone and a speaker to the RPI (I used a USB adapter). Then, I downloaded and installed the Google Assistant SDK on the embedded system and tested that voice commands were translated to text. Next, I created a chatbot on Dialogflow that receives this text and identifies what the user really wants to do.
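The flow described above (microphone, speech to text, Dialogflow, then text to speech and IR) can be summarized as one loop iteration. In this sketch every stage is passed in as a function, so the real Google Assistant SDK, Dialogflow and IR calls can be plugged in later. The `Stages` type, its field names and the `reply` key are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Stages:
    """Hypothetical hooks for each part of the assistant."""
    listen: Callable[[], Optional[str]]             # audio -> transcript (None if nothing heard)
    understand: Callable[[str], Dict[str, Any]]     # transcript -> intent, parameters, reply text
    to_keys: Callable[[Dict[str, Any]], List[str]]  # intent result -> IR key names
    send_ir: Callable[[List[str]], None]            # transmit the keys to the TV
    speak: Callable[[str], None]                    # reply text -> audio on the speaker


def run_once(stages: Stages) -> bool:
    """Process one voice command; return True if something was handled."""
    text = stages.listen()
    if not text:
        return False
    result = stages.understand(text)
    keys = stages.to_keys(result)
    if keys:
        stages.send_ir(keys)
    reply = result.get("reply")
    if reply:
        stages.speak(reply)
    return True
```

Keeping the stages swappable makes it easy to test the logic on a laptop with stubs before wiring in the microphone, speaker and IR circuit.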

Coding screen

Using the response, I converted the reply text to audio (TTS) and played it on the RPI with VLC. I used the intents and parameters to send the correct IR command to the TV. The application can do several things with the TV, for instance: turn it on/off, mute/unmute, change to a channel by number, go to the next or previous channel, remember the last channel, adjust the volume and more.
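If the recorded codes were managed with LIRC (an assumption; the post only says a custom Python module drives the transmitter), sending a sequence of keys could be delegated to LIRC’s `irsend` tool. The remote name `tv` and the key names are hypothetical and would have to match whatever was recorded into the LIRC configuration. A short pause between presses helps the TV register multi-digit channel numbers.

```python
import subprocess
import time
from typing import List, Sequence


def irsend_command(remote: str, key: str) -> List[str]:
    """Build an irsend command that transmits one key press once."""
    return ["irsend", "SEND_ONCE", remote, key]


def send_keys(keys: Sequence[str], remote: str = "tv", gap_s: float = 0.3) -> None:
    """Send each key in order, pausing between presses (e.g. for channel digits)."""
    for index, key in enumerate(keys):
        # check=True raises CalledProcessError if irsend fails.
        subprocess.run(irsend_command(remote, key), check=True)
        if index < len(keys) - 1:
            time.sleep(gap_s)  # give the TV time to accept the next digit
```

Usage: `send_keys(["KEY_2", "KEY_7"])` would tune to channel 27, assuming those key names exist in the LIRC remote definition.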