Intelligent Voice Assistant Based on GPT+Langchain Agent

SmartDeng | 7/13/24 | About 2 min

I built an intelligent voice assistant with GPT and a Langchain Agent, combined with Microsoft's speech services, that supports voice wake-up, voice conversation, and voice control of smart home devices. This article introduces how it is implemented. The code is open source on GitHub for anyone who wants to study it or contribute.

GitHub link: mawwalker/moss. If you find it helpful, feel free to give it a star.

Project Overview

Moss is an intelligent voice assistant that supports offline keyword wake-up, real-time speech recognition, intelligent conversation, and speech synthesis. The project has a modular design and integrates several AI services, including a Home Assistant MCP module, so that voice interaction feels natural and smooth.

Key Features

🎯 Offline Keyword Wake-up

Fully offline keyword detection
Customizable wake words (supports Chinese and English)
Efficient sherpa-onnx models
Low-power continuous listening

🗣️ Real-time Speech Recognition

WebSocket-based real-time speech-to-text service
High-precision speech recognition
Streaming recognition for quick response

🤖 Intelligent Conversation Agent

Extensible architecture based on Langchain Agents
Support for multiple tools and skills
Weather queries, smart home control, etc.
Easy to add custom tools
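
To make the "easy to add custom tools" point concrete, here is a minimal, framework-free sketch of a tool registry. It is not the Langchain API and not the project's actual code: `register_tool`, `run_tool`, and `get_weather` are hypothetical names, and the weather data is a stub. It only illustrates the general pattern in which an agent keeps named tools with descriptions and dispatches to the one the LLM selects.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a tool name to a callable plus a description
# that an agent could show to the LLM when it chooses a tool.
TOOLS: Dict[str, dict] = {}


def register_tool(name: str, description: str) -> Callable:
    """Decorator that records a function as an available tool."""
    def decorator(func: Callable) -> Callable:
        TOOLS[name] = {"func": func, "description": description}
        return func
    return decorator


@register_tool("get_weather", "Return a weather summary for a city.")
def get_weather(city: str) -> str:
    # Placeholder data; a real tool would call a weather service here.
    return f"Weather for {city}: sunny (stub data)"


def run_tool(name: str, **kwargs) -> str:
    """Dispatch a tool call by name, as an agent would after the LLM picks one."""
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    return TOOLS[name]["func"](**kwargs)
```

With this pattern, adding a new skill means writing one decorated function; the dispatch code does not change.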

🏠 Home Assistant Integration

Smart home control based on MCP (Model Context Protocol)
Seamless integration with the Home Assistant system
Device status queries and control
Voice control of smart home devices
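
As a rough illustration of what an MCP-based device command might look like on the wire: MCP messages are JSON-RPC 2.0, and a tool is invoked with the `tools/call` method. The sketch below only builds such a request. The tool name `HassTurnOn` and its arguments are assumptions made for illustration; the tools actually available depend on what the Home Assistant MCP server advertises, and the project itself may use a client library rather than hand-built messages.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request in a session needs its own id


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call("HassTurnOn", {"name": "living room light"})
```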

🎵 High-quality Speech Synthesis

Advanced TTS technology based on IndexTTS
Natural and fluent voice output

System Architecture

┌──────────────────┐    ┌───────────────────┐    ┌─────────────────┐
│ Keyword Detection│───▶│ Speech Recognition│───▶│  LLM Processing │
│   (Offline KWS)  │    │  (Realtime STT)   │    │  (Langchain)    │
└──────────────────┘    └───────────────────┘    └─────────────────┘
                                                         │
┌─────────────────┐    ┌──────────────────┐              │
│  Audio Playback │◀───│ Speech Synthesis │◀─────────────┘
│ (Audio Player)  │    │   (IndexTTS)     │
└─────────────────┘    └──────────────────┘
                               ┌─────────────────┐
                               │ Home Assistant  │
                               │   MCP Module    │
                               └─────────────────┘
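
The flow in the diagram above can be sketched as a single function that chains the stages. This is a simplified illustration under stated assumptions, not the project's implementation: `run_turn` and all stage callables are hypothetical, and the stages are stubbed with lambdas so the example runs without audio hardware or network services.

```python
from typing import Callable, List


def run_turn(
    audio_frames: List[bytes],
    detect_keyword: Callable[[bytes], bool],
    transcribe: Callable[[List[bytes]], str],
    ask_agent: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
) -> bool:
    """Run one wake -> listen -> think -> speak turn. Returns True if woken."""
    for index, frame in enumerate(audio_frames):
        if detect_keyword(frame):
            # Everything after the wake word is treated as the user's request.
            text = transcribe(audio_frames[index + 1:])
            reply = ask_agent(text)
            play(synthesize(reply))
            return True
    return False


# Stub stages so the sketch runs on its own.
played: List[bytes] = []
woken = run_turn(
    audio_frames=[b"noise", b"WAKE", b"turn on", b"the light"],
    detect_keyword=lambda frame: frame == b"WAKE",
    transcribe=lambda frames: b" ".join(frames).decode(),
    ask_agent=lambda text: f"OK: {text}",
    synthesize=lambda reply: reply.encode(),
    play=played.append,
)
```

Passing each stage in as a callable mirrors the modular design: any stage (for example the TTS backend) can be swapped without touching the others.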

Quick Start

Requirements

Python 3.12+
Linux/macOS/Windows
Audio devices (microphone and speakers)
Home Assistant (optional, for smart home control)

Installation Steps

Clone the repository and submodules

git clone https://github.com/mawwalker/moss.git
cd moss
git submodule update --init --recursive

Install system dependencies (Ubuntu example)

sudo apt install portaudio19-dev python3-pyaudio sox pulseaudio libsox-fmt-all ffmpeg

Install Python dependencies

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configuration

Copy the configuration template

cp .env.example .env

Edit the configuration file and fill in the necessary API keys and service addresses.
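
For readers unfamiliar with `.env` files, the sketch below shows how simple `KEY=VALUE` settings could be parsed and validated using only the standard library. The variable names `EXAMPLE_API_KEY` and `EXAMPLE_TTS_URL` are hypothetical; the real keys are the ones listed in `.env.example`, and the project may load them differently (for instance with a dedicated library).

```python
import os
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require(values: Dict[str, str], key: str) -> str:
    """Return a required setting, failing early with a clear message."""
    value = values.get(key) or os.environ.get(key)
    if not value:
        raise RuntimeError(f"Missing required setting: {key}")
    return value


# Hypothetical variable names; check .env.example for the real ones.
sample = """
# LLM settings
EXAMPLE_API_KEY="sk-placeholder"
EXAMPLE_TTS_URL=http://localhost:8000
"""
settings = parse_env(sample)
```

Failing early on a missing key tends to be easier to debug than a service error in the middle of a conversation.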

Run the Application

python app.py

Supported command-line arguments:

python app.py --help  # View all available parameters
python app.py --verbose  # Enable verbose logging
python app.py --keywords-file assets/keywords_en.txt  # Use English keywords
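
For reference, flags like the ones above could be declared with `argparse` roughly as follows. This is a sketch, not the contents of `app.py`: only `--verbose` and `--keywords-file` come from the commands shown, while the default value and the mapping to a log level are assumptions.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Voice assistant launcher (sketch)")
    parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
    parser.add_argument(
        "--keywords-file",
        default=None,  # Assumption: the real default is defined in app.py
        help="Path to the wake-word keywords file",
    )
    return parser


# Parse an explicit argument list so the sketch runs outside a shell.
args = build_parser().parse_args(["--verbose", "--keywords-file", "assets/keywords_en.txt"])
log_level = logging.DEBUG if args.verbose else logging.INFO
```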

Summary

Moss combines offline keyword wake-up, real-time speech recognition, a Langchain-based conversation agent, and speech synthesis in one voice assistant, and it can control smart home devices through Home Assistant. If you are interested in the project, visit the GitHub repository for more information.