Goal
Create a program that responds to voice commands. It should be programmed in such a way that further commands can be added easily.
Why?
As part of the Artificial Intelligence (AI) list item, I want to create an AI that can be interacted with via speech. The Jarvis AI from the Iron Man movies is my most recent inspiration. But to do anything awesome, I need to start with a good foundation.
Chronicle
Perhaps I should start with some background on my programming experience, since I’m not a professional. I tinkered a little with BASIC, Visual Basic and HTML over the years, so not much experience there. However, I used the LPC language for around ten years to create interactive areas, monsters and actions on a MUD (a text-based online multiplayer roleplaying game). The monsters responded in a basic way to player speech and actions, which should help with future AI programming.
LPC is a C-based language, so I found it relatively easy to learn C++ when I experimented with programming visual games using Direct3D. Recently, I’ve used C++ to solve mathematical challenges on the Project Euler site. It also helps that one of my housemates has experience programming in C++ and there are lots of good webpages on the internet. Considering the above, I’ll be programming AI using C++.
That’s enough background for now.
First, I wrote down some notes on what I wanted the AI to do, and used those notes to draft a structure for my program. Shortly after, I set up my programming software and entered in the necessary functions that every Windows application needs.
The next stage was to play with the text-to-speech (TTS) and speech recognition (SR) code, as this would influence the structure of my program. I used the Microsoft Speech API 5.4 and quickly set up a basic program that would say a phrase then close. It was very easy to change that phrase and re-launch the program, so naturally we made the computer say some ridiculous things and Doctor Who quotes (think Cybermen and Daleks).
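For reference, the minimal SAPI 5 “say a phrase and close” program looks roughly like this. It’s a sketch of the approach rather than my exact code: Windows-only, linked against sapi.lib and ole32.lib, with the spoken line as a stand-in for whatever Doctor Who quote takes your fancy.

```cpp
#include <sapi.h>      // Microsoft Speech API 5.x
#include <windows.h>

int main() {
    // SAPI is COM-based, so COM must be initialised first.
    if (FAILED(::CoInitialize(NULL))) return 1;

    ISpVoice* pVoice = NULL;
    HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL,
                                    IID_ISpVoice, (void**)&pVoice);
    if (SUCCEEDED(hr)) {
        // Speak synchronously, then clean up and exit.
        pVoice->Speak(L"You would make a good Dalek.", SPF_DEFAULT, NULL);
        pVoice->Release();
    }
    ::CoUninitialize();
    return 0;
}
```

Changing the phrase really is a one-line edit, which is what made the re-launch-and-giggle loop so quick.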
The SR software was a little more involved, but it didn’t take very long to write a program that would open a message box showing what was said. It doesn’t always return the right words, but accuracy could be improved by using a commercial engine rather than the free Microsoft one. I can also enable SR in Windows through the Control Panel, and use that to train the SR engine.
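The message-box test can be sketched like this, using the shared desktop recogniser in free dictation mode. Again this is Windows-only and a simplified outline with error handling trimmed, not my exact program.

```cpp
#include <sapi.h>
#include <windows.h>

int main() {
    ::CoInitialize(NULL);

    // Use the shared recogniser that Windows' own speech tools also use.
    ISpRecognizer* pRecognizer = NULL;
    ::CoCreateInstance(CLSID_SpSharedRecognizer, NULL, CLSCTX_ALL,
                       IID_ISpRecognizer, (void**)&pRecognizer);

    ISpRecoContext* pContext = NULL;
    pRecognizer->CreateRecoContext(&pContext);
    pContext->SetNotifyWin32Event();
    pContext->SetInterest(SPFEI(SPEI_RECOGNITION), SPFEI(SPEI_RECOGNITION));

    // Free-form dictation: recognise whatever the user says.
    ISpRecoGrammar* pGrammar = NULL;
    pContext->CreateGrammar(1, &pGrammar);
    pGrammar->LoadDictation(NULL, SPLO_STATIC);
    pGrammar->SetDictationState(SPRS_ACTIVE);

    // Block until one phrase is recognised, then show it in a message box.
    pContext->WaitForNotifyEvent(INFINITE);
    SPEVENT ev; ULONG fetched = 0;
    pContext->GetEvents(1, &ev, &fetched);
    if (fetched == 1 && ev.eEventId == SPEI_RECOGNITION) {
        ISpRecoResult* pResult = reinterpret_cast<ISpRecoResult*>(ev.lParam);
        wchar_t* pText = NULL;
        pResult->GetText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE, TRUE, &pText, NULL);
        MessageBoxW(NULL, pText, L"You said:", MB_OK);
        ::CoTaskMemFree(pText);
        pResult->Release();
    }
    ::CoUninitialize();
    return 0;
}
```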
Now I could move on to creating the main program. I have nicknamed it ‘AIPA’, which stands for Artificial Intelligence Personal Assistant. The program currently responds to the name ‘computer’, but I will change that in future.
I first gave AIPA the basic Windows application functions, then added a window showing a text input field, a text display field, and a button. When the button is clicked or the Enter key is pressed, the program takes the text from the input field, outputs it to the display field, then acts on that text. The display field lists the lines like a chat log, prefixed with “User: ” or “AIPA: ” as appropriate.
After this window worked as described, I added the TTS so that the computer speaks its responses. Then I added the code for logging the display field to a text file, with one new file created each time the program is run.
Next came the code for responding to text input. I added ‘exit’ (shut down the program) and a few TTS-related commands: pause/unpause, mute/unmute, speech volume increase/decrease, and repeat last output. This became easier after I learned more about the different types of string variables.
The final piece of the puzzle was adding the SR code. The program adds the text to the display field and log file, and can then process the speech as text. It took some more learning about string variables before this worked properly.
I bought a headset (headphones with an attached microphone) to help the engine generate more accurate results. It also means that I don’t have to resolve any potential problems that may crop up if the TTS is talking while the SR is listening.
I’m currently in the process of adding a ‘context-free grammar’. This is a collection of pre-written words and phrases that the SR should expect to hear, which will make it easier to match the right commands. Even so, the program already works: it responds to spoken commands, speaks its own responses, and has an easy-to-use visual interface.
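As a sketch, a SAPI 5 XML grammar for AIPA's current commands might look like the fragment below — the rule name and phrase list are my guesses at what AIPA would need, not the file I've actually finished writing. It would be loaded with ISpRecoGrammar::LoadCmdFromFile in place of the dictation grammar.

```xml
<GRAMMAR LANGID="409">
  <!-- One top-level rule: the wake word followed by a known command. -->
  <RULE NAME="AipaCommand" TOPLEVEL="ACTIVE">
    <P>computer</P>
    <L>
      <P>exit</P>
      <P>pause</P>
      <P>unpause</P>
      <P>mute</P>
      <P>unmute</P>
      <P>repeat</P>
    </L>
  </RULE>
</GRAMMAR>
```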
The next AI sub-item will be to add more interesting commands to my AIPA program.