Intelligent Voice Assistant Based on GPT+Langchain Agent
I built an intelligent voice assistant with GPT and a Langchain Agent, combined with Microsoft's speech services. It supports voice wake-up, spoken conversation, and voice control of smart home devices. This article explains how it works, and the code is open source on GitHub for learning and exchange.
GitHub link: mawwalker/moss. If you find it helpful, feel free to give it a star.
Project Overview
Moss is an intelligent voice assistant that supports offline keyword wake-up, real-time speech recognition, LLM-driven conversation, and speech synthesis. The project is modular and integrates several services, including a Home Assistant MCP module for smart home control, to provide natural, smooth voice interaction.
Key Features
🎯 Offline Keyword Wake-up
- Fully offline keyword detection
- Customizable wake words (supports Chinese and English)
- Efficient keyword-spotting models based on sherpa-onnx
- Low-power continuous listening
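Conceptually, wake-up acts as a gate: audio is only forwarded to speech recognition after the keyword spotter fires. The sketch below illustrates that gating under stated assumptions. WakeWordGate, fake_detector, and the fixed chunk budget are hypothetical names and choices made for this example. In Moss the detection itself is done by sherpa-onnx models, not by the byte comparison used here.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class WakeWordGate:
    """Hypothetical gate: stays idle until `detector` fires, then forwards
    up to `max_active_chunks` following audio chunks to the recognizer."""

    detector: Callable[[bytes], bool]
    max_active_chunks: int = 3
    _remaining: int = field(default=0, init=False)

    def feed(self, chunk: bytes) -> bool:
        """Return True if this chunk should be forwarded to speech recognition."""
        if self._remaining > 0:
            # Gate is open: forward the chunk and use up one slot.
            self._remaining -= 1
            return True
        if self.detector(chunk):
            # Keyword heard: open the gate for the chunks that follow.
            self._remaining = self.max_active_chunks
        return False


def fake_detector(chunk: bytes) -> bool:
    # Stand-in for a real keyword spotter (assumption for illustration only).
    return chunk == b"WAKE"


gate = WakeWordGate(detector=fake_detector, max_active_chunks=2)
stream = [b"noise", b"WAKE", b"turn", b"on", b"after"]
forwarded = [c for c in stream if gate.feed(c)]
print(forwarded)  # → [b'turn', b'on']
```

A fixed chunk budget keeps the sketch short. A real assistant would more likely close the session on silence or end-of-utterance.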
🗣️ Real-time Speech Recognition
- WebSocket-based real-time speech-to-text service
- High-precision speech recognition
- Streaming recognition for quick response
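Streaming recognition sends audio in small frames as it is captured instead of waiting for the whole utterance. The helper below sketches the framing arithmetic under assumed parameters: 16 kHz, 16-bit, mono PCM, and 100 ms frames. The audio format the STT service actually expects is not specified here, and frame_bytes and iter_frames are hypothetical names.

```python
from collections.abc import Iterator

SAMPLE_RATE = 16_000  # samples per second (assumed)
SAMPLE_WIDTH = 2      # bytes per sample, 16-bit PCM (assumed)
CHANNELS = 1          # mono (assumed)


def frame_bytes(frame_ms: int) -> int:
    """Number of bytes in `frame_ms` milliseconds of audio."""
    return SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS * frame_ms // 1000


def iter_frames(pcm: bytes, frame_ms: int = 100) -> Iterator[bytes]:
    """Split raw PCM into fixed-size frames; the last frame may be shorter."""
    size = frame_bytes(frame_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


one_second = bytes(frame_bytes(1000))  # 1 s of silence
frames = list(iter_frames(one_second, frame_ms=100))
# A streaming client would send each frame as one WebSocket binary message.
print(frame_bytes(100), len(frames))  # → 3200 10
```

Smaller frames lower latency but increase per-message overhead. 100 ms is only a common middle ground, not a value taken from Moss.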
🤖 Intelligent Conversation Agent
- Extensible architecture based on Langchain Agents
- Support for multiple tools and skills
- Tools such as weather queries and smart home control
- Easy to add custom tools
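In an agent, each tool is a function with a name and a description. The LLM reads these to decide which tool to call. LangChain provides the @tool decorator in langchain_core.tools for this purpose. To keep the example dependency-free, the sketch below reimplements the idea with a hypothetical register_tool decorator and dispatch function. get_weather is a placeholder that returns canned text and does not call a real weather API.

```python
from collections.abc import Callable

# Hypothetical registry: tool name -> (description, function).
TOOLS: dict[str, tuple[str, Callable[..., str]]] = {}


def register_tool(func: Callable[..., str]) -> Callable[..., str]:
    """Register `func` under its own name, using its docstring as the description."""
    TOOLS[func.__name__] = ((func.__doc__ or "").strip(), func)
    return func


@register_tool
def get_weather(city: str) -> str:
    """Return a short weather report for the given city."""
    # Placeholder: a real tool would query a weather service here.
    return f"Sunny in {city}"


def dispatch(name: str, arguments: dict) -> str:
    """Run the tool the model asked for, as an agent loop would."""
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    _, func = TOOLS[name]
    return func(**arguments)


print(dispatch("get_weather", {"city": "Shanghai"}))  # → Sunny in Shanghai
```

Adding a new skill then amounts to writing one decorated function with a clear docstring, because the description is what the model uses to choose between tools.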
🏠 Home Assistant Integration
- Smart home control based on MCP (Model Context Protocol)
- Seamless integration with the Home Assistant system
- Device status queries and control
- Voice control of smart home devices
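MCP is built on JSON-RPC 2.0. A client invokes a server-side tool by sending a tools/call request with the tool's name and arguments, and it can discover available tools with tools/list. The sketch below only builds such a request and omits the transport and session setup. The tool name HassTurnOn and its name argument are assumptions about what the Home Assistant MCP server exposes, so check them against the server's tools/list response.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique within a session


def mcp_tool_call(tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # ensure_ascii=False keeps non-ASCII device names (e.g. Chinese) readable.
    return json.dumps(request, ensure_ascii=False)


# Assumed tool name and arguments; verify them with a tools/list request first.
message = mcp_tool_call("HassTurnOn", {"name": "Living Room Light"})
print(message)
```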
🎵 High-quality Speech Synthesis
- Text-to-speech based on IndexTTS
- Natural and fluent voice output
System Architecture
┌──────────────────┐    ┌───────────────────┐    ┌─────────────────┐
│ Keyword Detection│───▶│ Speech Recognition│───▶│ LLM Processing  │
│  (Offline KWS)   │    │  (Realtime STT)   │    │   (Langchain)   │
└──────────────────┘    └───────────────────┘    └─────────────────┘
                                                          │
┌─────────────────┐    ┌──────────────────┐               │
│ Audio Playback  │◀───│ Speech Synthesis │◀──────────────┘
│ (Audio Player)  │    │    (IndexTTS)    │
└─────────────────┘    └──────────────────┘
┌─────────────────┐
│ Home Assistant  │
│   MCP Module    │
└─────────────────┘
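After the keyword has been detected, the diagram describes a linear pipeline. The sketch below wires the stages together with stub functions. run_pipeline and the stage names transcribe, run_agent, synthesize, and play are hypothetical placeholders for the real STT, Langchain agent, IndexTTS, and playback calls. The Home Assistant MCP module is assumed here to be reached as a tool from inside the agent step.

```python
from collections.abc import Callable


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    run_agent: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
) -> str:
    """Run one conversational turn: STT -> LLM agent -> TTS -> playback."""
    text = transcribe(audio)    # Speech Recognition (Realtime STT)
    reply = run_agent(text)     # LLM Processing (Langchain agent and its tools)
    speech = synthesize(reply)  # Speech Synthesis (IndexTTS)
    play(speech)                # Audio Playback
    return reply


# Stub stages so the flow can be exercised without any external services.
played: list[bytes] = []
reply = run_pipeline(
    b"<pcm>",
    transcribe=lambda audio: "turn on the light",
    run_agent=lambda text: f"OK: {text}",
    synthesize=lambda answer: answer.encode("utf-8"),
    play=played.append,
)
print(reply)  # → OK: turn on the light
```

Passing each stage in as a callable mirrors the modular design described above: any stage can be swapped, for example a different TTS engine, without touching the others.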
Quick Start
Requirements
- Python 3.12+
- Linux/macOS/Windows
- Audio devices (microphone and speakers)
- Home Assistant (optional, for smart home control)
Installation Steps
- Clone the repository and its submodules
git clone https://github.com/mawwalker/moss.git
cd moss
git submodule update --init --recursive
- Install system dependencies (Ubuntu example)
sudo apt install portaudio19-dev python3-pyaudio sox pulseaudio libsox-fmt-all ffmpeg
- Install Python dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Configuration
- Copy the configuration template
cp .env.example .env
- Edit .env and fill in the required API keys and service addresses.
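The .env file is a list of KEY=value lines. Libraries such as python-dotenv (load_dotenv) handle this in practice. The dependency-free sketch below shows what parsing such a file amounts to. The key names in the sample (OPENAI_API_KEY, HA_URL) are hypothetical, so use the names listed in .env.example.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so URLs and tokens may contain "=".
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Hypothetical keys; the real ones are listed in .env.example.
sample = """
# LLM settings
OPENAI_API_KEY="sk-xxxx"
HA_URL=http://homeassistant.local:8123
"""
config = parse_env(sample)
print(config["HA_URL"])  # → http://homeassistant.local:8123
```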
Run the Application
python app.py
Supported command-line arguments:
python app.py --help # View all available parameters
python app.py --verbose # Enable verbose logging
python app.py --keywords-file assets/keywords_en.txt # Use English keywords
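These flags map naturally onto Python's argparse module. The following is a hypothetical reconstruction of how --verbose and --keywords-file might be declared, not the actual contents of app.py. The DEBUG/INFO mapping for --verbose is an assumption. --help is generated automatically by argparse.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the flags shown above."""
    parser = argparse.ArgumentParser(description="Moss voice assistant")
    parser.add_argument("--verbose", action="store_true",
                        help="enable verbose (DEBUG) logging")
    parser.add_argument("--keywords-file", default=None,
                        help="path to the wake-word keywords file")
    return parser


args = build_parser().parse_args(["--verbose", "--keywords-file", "assets/keywords_en.txt"])
# Assumed mapping: --verbose switches the root logger from INFO to DEBUG.
logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
print(args.verbose, args.keywords_file)  # → True assets/keywords_en.txt
```

argparse converts the dash in --keywords-file to an underscore, so the value is read as args.keywords_file.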
Summary
Moss is a modular voice assistant that combines offline keyword wake-up, streaming speech recognition, a Langchain agent, and IndexTTS speech synthesis, and it can control smart home devices through Home Assistant. If you are interested in the project, visit the GitHub repository for more details.