Build an Alexa or Siri Equivalent Bot in Python Using OpenAI in minutes!

Andrew Hershy
Oct 14, 2022
3 min read

A step-by-step guide to building an AI assistant

Table of Contents

Introduction
APIs and Design
Code Breakdown
Conclusion

Introduction

It’s not as difficult as you might think to build an AI program that listens to speech and answers questions. We can make the magic happen in an afternoon by leveraging a few Python packages and APIs.

Here are some back-and-forth questions and answers between me and the bot:

Me: What is the population of Brazil?

Bot: The population of Brazil is estimated to be over 209 million people

Me: What does ice cream taste like?

Bot: Most ice cream tastes sweet and creamy

Me: How do I use chopsticks?

Bot: To use chopsticks, you must hold one in each hand. Place the chopstick in your dominant hand between your thumb and index finger, and hold it in place with your middle finger. Place the chopstick in your non-dominant hand between your thumb and index finger, and hold it in place with your ring finger and pinky finger. To pick up food, use the chopstick in your dominant hand to hold the food in place, and then use the chopstick

Granted, these aren’t the most enlightening answers, and that chopsticks one at the end is a bit strange, lol. Still, the fact that this application can interpret speech and answer questions, however limited, is amazing in my opinion. And unlike the mainstream AI assistants, we can see what’s under the hood here and play around with it.

What This Program Does

Run the file from the command prompt when you are ready to ask a question
PyAudio lets the program pick up speech data from the computer mic
The audio is sent as JSON to the AssemblyAI API to be converted to text, and the text is sent back
The text is sent to the OpenAI API, where the text-davinci-002 engine processes it
The answer is retrieved and shown on the console below your question
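The steps above can be sketched as a single function that chains transcription and question answering. This is only an illustration of the flow: `answer_spoken_question` and the two callables passed into it are hypothetical names, and the stand-in lambdas replace the real AssemblyAI and OpenAI calls.

```python
from typing import Callable, Tuple


def answer_spoken_question(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    ask: Callable[[str], str],
) -> Tuple[str, str]:
    """Turn recorded audio into a (question, answer) pair."""
    question = transcribe(audio).strip()
    if not question:
        # Nothing was heard, so there is nothing to ask.
        return "", ""
    answer = ask(question).strip()
    return question, answer


if __name__ == "__main__":
    # Stand-ins for the speech-to-text and question-answering services.
    fake_transcribe = lambda audio: "What does ice cream taste like?"
    fake_ask = lambda question: " Most ice cream tastes sweet and creamy "

    question, answer = answer_spoken_question(b"\x00\x01", fake_transcribe, fake_ask)
    print("Me:", question)
    print("Bot:", answer)
```

Passing the two services in as functions keeps the flow easy to try out without a microphone or API keys.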

APIs and High-Level Design

This tutorial utilizes two core APIs:

AssemblyAI to transcribe the audio into text.
OpenAI to interpret the question and return an answer. It has also come to my attention that you can leverage OpenAI’s Whisper API to perform the transcription function as well.

Design (high level)

This project is broken up into two files: main and openai_helper.

The `main` script handles the voice-to-text API connection. It involves opening a WebSocket connection to AssemblyAI’s server, filling in the parameters required for PyAudio, and creating the asynchronous functions that send and receive speech data concurrently between our application and that server.

The `openai_helper` file is short and is used solely to connect to the OpenAI “text-davinci-002” engine. This connection is used to receive answers to our questions.

Code Breakdown

main.py

First, we import all the libraries our application will use. You may need to pip install some of these, depending on whether you’ve used them before. See the comments below for context.

Then we set up our pyaudio parameters. These inputs are default settings found in various places on the web. Feel free to experiment as needed, but the defaults worked fine for me. We set the stream variable as our initial container for the audio data, and then we print the default input device parameters as a dictionary. The keys of the dictionary mirror the data fields of PortAudio’s structure. Here’s the code:
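A minimal sketch of such a setup follows. It assumes values commonly used for 16 kHz mono speech capture, and these specific numbers are assumptions rather than values confirmed by the article. The helper names (`chunk_seconds`, `chunk_bytes`, `open_mic_stream`) are hypothetical. PyAudio is imported inside the function so the arithmetic helpers can be tried without a microphone.

```python
# Assumed settings (not confirmed by the article): 16 kHz, mono, 16-bit audio.
FRAMES_PER_BUFFER = 3200   # frames read from the mic per chunk
CHANNELS = 1               # mono input
RATE = 16000               # samples per second
SAMPLE_WIDTH_BYTES = 2     # 16-bit samples, matching pyaudio.paInt16


def chunk_seconds(frames: int = FRAMES_PER_BUFFER, rate: int = RATE) -> float:
    """Duration of one buffer of audio, in seconds."""
    return frames / rate


def chunk_bytes(frames: int = FRAMES_PER_BUFFER,
                channels: int = CHANNELS,
                width: int = SAMPLE_WIDTH_BYTES) -> int:
    """Size of one buffer of raw audio, in bytes."""
    return frames * channels * width


def open_mic_stream():
    """Open the default microphone; needs PyAudio and an input device."""
    import pyaudio  # imported lazily so the helpers above work without it

    p = pyaudio.PyAudio()
    # Dictionary whose keys mirror PortAudio's device-info structure.
    print(p.get_default_input_device_info())
    stream = p.open(
        format=pyaudio.paInt16,
        channels=CHANNELS,
        rate=RATE,
        input=True,
        frames_per_buffer=FRAMES_PER_BUFFER,
    )
    return p, stream


if __name__ == "__main__":
    print("seconds per chunk:", chunk_seconds())
    print("bytes per chunk:", chunk_bytes())
```

On a machine with a microphone, `p, stream = open_mic_stream()` followed by `stream.read(FRAMES_PER_BUFFER)` would return one chunk of raw audio bytes.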

Next, we create the asynchronous functions for the sending and receiving required to transform our spoken questions into text. These functions run concurrently: one encodes the speech data as base64, wraps it in JSON, and sends it to the server, while the other receives the transcribed text back in a readable format. The WebSocket connection is also a vital piece of the script, as that’s what makes the direct stream as seamless as it is.
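To illustrate the pattern, here is a small sketch that runs without a network connection. An `asyncio.Queue` pair and a `fake_server` coroutine stand in for the WebSocket and the transcription service, so all function names here are hypothetical. The `audio_data` and `text` JSON keys are assumptions about the message format, not something the article confirms.

```python
import asyncio
import base64
import json


def encode_chunk(raw_audio: bytes) -> str:
    """Base64-encode raw audio bytes and wrap them in a JSON message."""
    encoded = base64.b64encode(raw_audio).decode("utf-8")
    return json.dumps({"audio_data": encoded})  # key name is an assumption


async def send(chunks, outbox: asyncio.Queue) -> None:
    """Push each encoded audio chunk toward the server."""
    for chunk in chunks:
        await outbox.put(encode_chunk(chunk))
        await asyncio.sleep(0)  # yield so the other coroutines can run
    await outbox.put(None)  # sentinel: no more audio


async def receive(inbox: asyncio.Queue) -> list:
    """Collect the text of each message coming back from the server."""
    results = []
    while True:
        message = await inbox.get()
        if message is None:
            break
        results.append(json.loads(message)["text"])
    return results


async def fake_server(outbox: asyncio.Queue, inbox: asyncio.Queue) -> None:
    """Stand-in for the transcription service: reports each chunk's size."""
    while True:
        message = await outbox.get()
        if message is None:
            await inbox.put(None)
            break
        audio = base64.b64decode(json.loads(message)["audio_data"])
        await inbox.put(json.dumps({"text": f"received {len(audio)} bytes"}))


async def run_demo(chunks) -> list:
    outbox, inbox = asyncio.Queue(), asyncio.Queue()
    # gather() runs sending and receiving concurrently, as described above.
    _, _, results = await asyncio.gather(
        send(chunks, outbox), fake_server(outbox, inbox), receive(inbox)
    )
    return results


if __name__ == "__main__":
    print(asyncio.run(run_demo([b"\x00" * 4, b"\x01" * 8])))
```

In a real script the two queues would be replaced by a single WebSocket connection, with `send` writing to it and `receive` reading from it.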

Lastly, we have our simple API connection to OpenAI. If you look at line 44 of the gist above (main3.py), you can see we are pulling the function ask_computer from this other file and using its output as the answers to our questions.
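A helper of this kind might look roughly like the sketch below. It assumes the legacy `openai` Python package (versions before 1.0) and its `Completion` interface, and it reads the API key from an `OPENAI_API_KEY` environment variable. The `build_request` helper, the `max_tokens` value, and the `temperature` value are hypothetical choices for illustration; only the `ask_computer` name and the text-davinci-002 engine come from the article.

```python
import os


def build_request(question: str,
                  engine: str = "text-davinci-002",
                  max_tokens: int = 100) -> dict:
    """Assemble the keyword arguments for a completion request."""
    return {
        "engine": engine,
        "prompt": question.strip(),
        "max_tokens": max_tokens,  # hypothetical cap on the answer length
        "temperature": 0,          # hypothetical: favor repeatable answers
    }


def ask_computer(question: str) -> str:
    """Send the question to OpenAI and return the answer text."""
    import openai  # legacy openai<1.0 package, imported lazily

    openai.api_key = os.environ["OPENAI_API_KEY"]
    response = openai.Completion.create(**build_request(question))
    return response["choices"][0]["text"].strip()


if __name__ == "__main__":
    # Shows the request that would be sent, without calling the API.
    print(build_request("What is the population of Brazil?"))
```

A token cap like this one would also explain why a long answer, such as the chopsticks one in the introduction, can stop mid-sentence.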

Conclusion

This was a neat project for anyone interested in playing around with the same technology that makes Siri or Alexa function. Not much coding experience is required because we leverage APIs to do our processing. I would highly recommend forking the repo of this project and playing around first-hand if any reader wants to learn more about these technologies. Cheers!

Update: Please check out another project similar to this if you found this one interesting. It’s related to using speech-to-text transcription to create dalle-mini images

This article was originally published here.


