Build an Alexa or Siri Equivalent Bot in Python Using OpenAI in minutes!
- Andrew Hershy

- Oct 14, 2022
- 3 min read
A step-by-step guide to building an AI assistant

Introduction
Building an AI program that listens to speech and answers questions is not as difficult as you might think. We can make the magic happen in an afternoon by leveraging a few Python packages and APIs.
Here are some back-and-forth questions and answers between me and the bot:
Me: What is the population of Brazil?
Bot: The population of Brazil is estimated to be over 209 million people
Me: What does ice cream taste like?
Bot: Most ice cream tastes sweet and creamy
Me: How do I use chopsticks?
Bot: To use chopsticks, you must hold one in each hand. Place the chopstick in your dominant hand between your thumb and index finger, and hold it in place with your middle finger. Place the chopstick in your non-dominant hand between your thumb and index finger, and hold it in place with your ring finger and pinky finger. To pick up food, use the chopstick in your dominant hand to hold the food in place, and then use the chopstick
Granted, these aren’t the most enlightening answers. And that chopstick one at the end is a bit strange, lol. However, the fact that this application can interpret speech and answer questions, however limited it may seem, is amazing in my opinion. And unlike the mainstream AI assistant bots, we can see what’s under the hood here and play around with it.
What This Program Does
Run the file from the command prompt when you are ready to ask a question.
PyAudio lets the program pick up speech data from the computer’s microphone.
The audio is sent as JSON data to the AssemblyAI API, which converts it to text and sends the text back.
The text is sent to the OpenAI API, where the text-davinci-002 engine processes it.
The answer to the question is retrieved and shown on the console below your question.
APIs and High-Level Design
This tutorial utilizes two core APIs:
AssemblyAI to transcribe the audio into text.
OpenAI to interpret the question and return an answer. It has also come to my attention that you can leverage OpenAI’s Whisper API to perform the transcription function as well.
Design (high level)
This project is broken up into two files: main and openai_helper.
The `main` script handles the voice-to-text API connection. It involves setting up a WebSocket connection, filling in all the parameters required for PyAudio, and creating the asynchronous functions required for sending and receiving the speech data concurrently between our application and AssemblyAI’s server.
The `openai_helper` file is short and is used solely to connect to the OpenAI “text-davinci-002” engine. This connection is used to receive answers to our questions.
Code Breakdown
main.py
First, we import all the libraries our application will use. Some of these may need to be installed with pip, depending on whether you’ve used them before. See the comments in the code for context.
Then we set up our pyaudio parameters. These inputs are default settings found in various places on the web. Feel free to experiment as needed, but the defaults worked fine for me. We set the stream variable as our initial container for the audio data, and then we print the default input device parameters as a dictionary. The keys of the dictionary mirror the data fields of PortAudio’s structure. Here’s the code:
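As a rough illustration of the kind of setup described above (not the author’s exact code), here is a minimal sketch. The specific values (3200 frames per buffer, 16 kHz, mono, 16-bit samples) and the names `AudioSettings` and `open_microphone` are assumptions chosen for illustration; adjust them to your microphone and to whatever the transcription service expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSettings:
    """Microphone capture settings (assumed values, not the author's)."""
    frames_per_buffer: int = 3200  # samples read per chunk
    channels: int = 1              # mono
    rate: int = 16000              # samples per second
    sample_width: int = 2          # bytes per sample for 16-bit audio

    @property
    def chunk_seconds(self) -> float:
        """Length of audio contained in one chunk, in seconds."""
        return self.frames_per_buffer / self.rate

    @property
    def chunk_bytes(self) -> int:
        """Number of raw bytes in one chunk."""
        return self.frames_per_buffer * self.channels * self.sample_width


def open_microphone(settings: AudioSettings):
    """Open an input stream and print the default input device info."""
    # Imported lazily so the settings can be inspected without PyAudio or a mic.
    import pyaudio

    audio = pyaudio.PyAudio()
    # Dictionary whose keys mirror PortAudio's device info structure.
    print(audio.get_default_input_device_info())
    stream = audio.open(
        format=pyaudio.paInt16,
        channels=settings.channels,
        rate=settings.rate,
        input=True,
        frames_per_buffer=settings.frames_per_buffer,
    )
    return audio, stream
```

With these assumed defaults, each chunk read from the stream holds 0.2 seconds of audio, which is a reasonable granularity for streaming speech.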
Next, we create several asynchronous functions for the sending and receiving required to turn our spoken questions into text. These functions run concurrently: the speech data is encoded as base64, wrapped in JSON, and sent to the server through the API, and the transcription is received back in a readable format. The WebSocket connection is also a vital piece of the script, as that’s what makes the direct stream as seamless as it is.
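The send/receive pattern described above can be sketched without a network connection. In the example below, `FakeConnection` is a hypothetical in-memory stand-in for the real WebSocket, and the JSON field names `audio_data` and `text` are assumptions about the message format rather than something confirmed by this article. The point is only to show the base64-then-JSON encoding and two coroutines running concurrently with `asyncio.gather`.

```python
import asyncio
import base64
import json


def encode_chunk(chunk: bytes) -> str:
    """Base64-encode raw audio bytes and wrap them in a JSON message."""
    return json.dumps({"audio_data": base64.b64encode(chunk).decode("utf-8")})


class FakeConnection:
    """Hypothetical in-memory stand-in for a WebSocket connection."""

    def __init__(self):
        self._inbox = asyncio.Queue()

    async def send(self, message: str) -> None:
        # Pretend to "transcribe" by reporting how many audio bytes arrived.
        audio = base64.b64decode(json.loads(message)["audio_data"])
        await self._inbox.put(json.dumps({"text": f"{len(audio)} bytes received"}))

    async def recv(self) -> str:
        return await self._inbox.get()


async def send_audio(conn, chunks):
    """Encode each audio chunk and send it over the connection."""
    for chunk in chunks:
        await conn.send(encode_chunk(chunk))
        await asyncio.sleep(0)  # yield control so the receiver can run


async def receive_text(conn, expected: int):
    """Collect the text field from the expected number of replies."""
    results = []
    for _ in range(expected):
        results.append(json.loads(await conn.recv())["text"])
    return results


async def run_demo(chunks):
    conn = FakeConnection()
    # Run sender and receiver concurrently; gather returns results in order.
    _, texts = await asyncio.gather(
        send_audio(conn, chunks),
        receive_text(conn, len(chunks)),
    )
    return texts


if __name__ == "__main__":
    print(asyncio.run(run_demo([b"\x00\x01" * 4, b"\x02" * 3])))
```

In a real application, the fake connection would be replaced by an actual WebSocket client connection and the chunks would come from the microphone stream.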
Lastly, we have our simple API connection to OpenAI. If you look at line 44 of the gist above (main3.py), you can see we are pulling the function ask_computer from this other file and using its output as the answers to our questions.
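For readers who want a feel for what such a helper might look like, here is a minimal sketch, not the author’s file. It assumes the legacy (pre-1.0) `openai` Python package and its `openai.Completion.create` call; the `max_tokens` and `temperature` values, the helper names `build_request` and `extract_answer`, and the optional `create` parameter (used so the function can be tried without an API key) are all assumptions made for illustration. Note that the text-davinci-002 engine may no longer be available from OpenAI.

```python
def build_request(question: str) -> dict:
    """Keyword arguments for a completion request (values are assumptions)."""
    return {
        "engine": "text-davinci-002",
        "prompt": question.strip(),
        "max_tokens": 100,
        "temperature": 0.5,
    }


def extract_answer(response) -> str:
    """Pull the generated text out of a completion-style response."""
    return response["choices"][0]["text"].strip()


def ask_computer(question: str, create=None) -> str:
    """Send a question to the completion endpoint and return the answer text."""
    if create is None:
        # Legacy (pre-1.0) SDK; it reads OPENAI_API_KEY from the environment.
        import openai

        create = openai.Completion.create
    return extract_answer(create(**build_request(question)))


if __name__ == "__main__":
    # Stand-in for the real API call so the sketch runs offline.
    def fake_create(**kwargs):
        return {"choices": [{"text": "\n\nMost ice cream tastes sweet and creamy"}]}

    print(ask_computer("What does ice cream taste like?", create=fake_create))
```

A cap on the number of generated tokens, like the assumed `max_tokens` here, is one plausible reason an answer can stop mid-sentence, as the chopstick answer does.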
Conclusion
This was a neat project for anyone interested in playing around with the same technology that makes Siri or Alexa function. Not much coding experience is required because we leverage APIs to do our processing. I would highly recommend forking the repo of this project and playing around first-hand if any reader wants to learn more about these technologies. Cheers!
Update: Please check out another project similar to this if you found this one interesting. It’s related to using speech-to-text transcription to create dalle-mini images
This article was originally published here.




