Introduction
In this blog we’ll take a look at the voice recognition code and see how we can combine it with the vision side of this project.
All the code in this blog is available at: https://github.com/ric96/RB3-RoboticArm
Once the project is finalized, it will also be pushed to: https://github.com/96boards-projects/RB3-RoboticArm
To install all the dependencies on a Debian Buster build for RB3, run the following script: install-opencv.sh
You will also need to install the following pip packages: sudo pip3 install SpeechRecognition pymemcache
Today we’ll be looking specifically at the main.py file in our project directory.
Import all the things
We start with importing the following libraries:
import json
from pymemcache.client import base
import speech_recognition as sr
from difflib import get_close_matches
- `json`: used to parse data in JSON format. This lets us share lists over memcached, since memcached can only handle string values.
- `pymemcache`: a data caching and sharing frontend for Python using memcached.
- `speech_recognition`: a collection of speech recognition libraries under one roof.
- `difflib`: a library dedicated to showing the diff of two or more strings, but it can do other things as well, like finding the closest match from a list. We use it for basic language processing.
Declare all the variables
Some quick global variable declaration:
client = base.Client(('localhost', 11211))
shape_data_str = client.get('vision_data')
shape_data = json.loads(shape_data_str)
print(shape_data)
sample_rate = 48000
chunk_size = 2048
#Initialize the recognizer
r = sr.Recognizer()
mic = sr.Microphone()
color_pattern = ['blue', 'green', 'yellow', 'red']
action_pattern = ['pickup', 'drop', 'dance']
obj_pattern = ['cube', 'square', 'cuboid', 'rectangle', 'triangle', 'prism', 'cone', 'hexagon', 'circle', 'sphere', 'ball' ]
- `client = base.Client(('localhost', 11211))`: connect to the memcached socket on localhost, allowing us to share data from shape.py.
- `shape_data_str = client.get('vision_data')`: get the string data stored in memcached under the label “vision_data”.
- `shape_data = json.loads(shape_data_str)`: convert the string received from memcached back into a list using json.
- `sample_rate = 48000`, `chunk_size = 2048`: set the audio parameters for mic input.
- `r = sr.Recognizer()`: initialize the recognizer class.
- `mic = sr.Microphone()`: initialize the microphone input; this uses the system default microphone.
- `color_pattern`, `action_pattern`, `obj_pattern`: lists of patterns that we diff user input against to create a basic language processor.
All the functions
Basic language processor
def closeMatches(patterns, word):
    data = word.split()
    for temp in data:
        match_list = get_close_matches(temp, patterns)
        if len(match_list) != 0:
            return match_list[0]
    return 1
This is a basic language processor that diffs voice input against the various lists we initialized in the beginning.
Speech Detector
def detect():
    with mic as source:
        # wait for a second to let the recognizer adjust the
        # energy threshold based on the surrounding noise level
        r.adjust_for_ambient_noise(source)
        print("Say Something")
        # listen for the user's input
        audio = r.listen(source)
        try:
            text = r.recognize_google(audio)
            print("you said: " + text)
            return text
        # this error occurs when Google could not understand what was said
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
            return 1
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))
            return -1
This function takes input from the mic and detects the words spoken by the user using Google’s Web Speech API. It returns a string of detected words if no error occurs, or 1 or -1 depending on the error.
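Since success returns a string while both failure modes return ints, callers can branch on the type. As a sketch of that convention (the helper name and the stub below are hypothetical, not part of the project code), a simple retry wrapper might look like this:

```python
def detect_with_retry(detect_fn, attempts=3):
    # Retry while speech wasn't understood (return code 1);
    # give up immediately on an API/network error (return code -1).
    for _ in range(attempts):
        result = detect_fn()
        if isinstance(result, str):
            return result  # recognized text
        if result == -1:
            return None    # network error: retrying is unlikely to help
    return None

# Stub standing in for detect(): first attempt is unintelligible, second succeeds.
responses = iter([1, "hey dummy"])
print(detect_with_retry(lambda: next(responses)))  # hey dummy
```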
RUN
def run():
    if (detect() == "hey dummy"):
        print("what do you want?")
        instruction = detect()
        if (instruction != 1 and instruction != -1):
            action = closeMatches(action_pattern, instruction)
            if action != 1:
                print("Action: " + action)
            else:
                not_understood()
                return 0
            color = closeMatches(color_pattern, instruction)
            if color != 1:
                print("Color: " + color)
            else:
                not_understood()
                return 0
            obj = closeMatches(obj_pattern, instruction)
            if obj != 1:
                print("Object: " + obj)
                voice_dat = [action, color, obj]
                return voice_dat
            else:
                not_understood()
                return 0
        else:
            not_understood()
            return 0
    else:
        not_understood()
        return 0
- Detects the activation command, aka “hey dummy”.
- Detects the action, shape and color of the object instructed by the user and returns them as a list.
- If anything fails, returns 0.
Main: one loop to rule them all:
I honestly couldn’t find a meme for this :/
voice_data = run()
if (voice_data != 0):
    if (voice_data[1] == "blue"):
        if (shape_data[0][0][2] == voice_data[2]):
            print("Required Object at X:" + str(shape_data[0][0][0]) + " Y: " + str(shape_data[0][0][1]))
    elif (voice_data[1] == "yellow"):
        if (shape_data[1][0][2] == voice_data[2]):
            print("Required Object at X:" + str(shape_data[1][0][0]) + " Y: " + str(shape_data[1][0][1]))
    elif (voice_data[1] == "red"):
        if (shape_data[2][0][2] == voice_data[2]):
            print("Required Object at X:" + str(shape_data[2][0][0]) + " Y: " + str(shape_data[2][0][1]))
- Detect if the activation word has been spoken.
- Compare the user’s requested action, color and object against the data sent by shape.py, and return the X & Y position of the required object detected by the camera.
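The per-color branches above can be condensed into a small lookup table. This is only a sketch built on an assumption inferred from the indexing in the code: shape_data[i] is a list of [x, y, shape] detections grouped per color in the order blue, yellow, red. The helper name `find_object` is made up, and unlike the original it scans every detection for a color rather than only the first:

```python
# Assumed layout, inferred from the indexing above:
# shape_data[color_index] is a list of [x, y, shape] detections,
# grouped per color in the order blue, yellow, red.
COLOR_INDEX = {'blue': 0, 'yellow': 1, 'red': 2}

def find_object(shape_data, color, obj):
    # Return (x, y) of the first detection whose shape matches, else None.
    idx = COLOR_INDEX.get(color)
    if idx is None:
        return None
    for x, y, shape in shape_data[idx]:
        if shape == obj:
            return (x, y)
    return None

shape_data = [
    [[120, 80, 'cube']],      # blue detections (sample values)
    [[40, 200, 'triangle']],  # yellow detections
    [[310, 55, 'sphere']],    # red detections
]
print(find_object(shape_data, 'yellow', 'triangle'))  # (40, 200)
```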
Why memcached and not threading, why JSON, and how does it actually work?
So, as some of you might know, Python doesn’t support “proper” threading due to the GIL. To get around this issue we run main.py and shape.py as separate programs, with memcached running separately.
This allows us to share data between the two scripts, so our voice script knows the X and Y coordinates of all the objects detected by the OpenCV script.
JSON comes into play because memcached can only pass string values, so we need to convert our lists to strings and back again.
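A minimal sketch of that roundtrip: the memcached calls are shown as comments (client.set / client.get via pymemcache) so the snippet runs without a memcached server, and the sample values are just an illustration:

```python
import json

# Example payload in the layout shape.py produces
# (a list of [x, y, shape] detections per color; values here are made up):
vision_data = [[[120, 80, 'cube']], [[40, 200, 'triangle']], [[310, 55, 'sphere']]]

# shape.py side: memcached only stores strings, so serialize first.
payload = json.dumps(vision_data)
# client.set('vision_data', payload)   # with a live memcached server

# main.py side: fetch the string and turn it back into nested lists.
# payload = client.get('vision_data')
restored = json.loads(payload)
print(restored == vision_data)  # True
```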
This article is Part 7 in a 10-Part Series.
- Part 1 - Qualcomm RB3 Robotic Arm Project | Introduction
- Part 2 - OpenCV on RB3 Pt. 1 | Qualcomm RB3 Robotic Arm Project
- Part 3 - Threads got complicated | OpenCV on RB3 Pt. 2 | Qualcomm RB3 Robotic Arm Project
- Part 4 - Gimme Code | OpenCV on RB3 Pt. 3 | Qualcomm RB3 Robotic Arm Project
- Part 5 - My experiments with 1080p | OpenCV on RB3 Pt. 4 | Qualcomm RB3 Robotic Arm Project
- Part 6 - Setup RB3 for the Robotic Arm Project | OpenCV on RB3 Pt. 0 | Qualcomm RB3 Robotic Arm Project
- Part 7 - Speech Recognition and Combining Voice with Vision | OpenCV on RB3 Pt. 5 | Qualcomm RB3 Robotic Arm Project
- Part 8 - It's ALIVE | OpenCV on RB3 Pt. 6 | Qualcomm RB3 Robotic Arm Project
- Part 9 - Out with Arduino, In with i2c and PCA9685 | OpenCV on RB3 Pt. 7 | Qualcomm RB3 Robotic Arm Project
- Part 10 - The RB3 Robotic Arm