Episode 46: Hardware and software for the ESP32 AI large-model chat project are now open source!

最编程 2024-10-18 11:27:52
...

Sange's AI large-model chat project goes open source

The base is the pipeline_baidu_speech_mp3 example from esp-adf. On top of it, Sange added Baidu speech-to-text, MiniMax large-model chat and Baidu text-to-speech.

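The code draws on the 立创·实战派 (LCSC) ESP32-C3 project. The hardware board is designed by Jianmin (建民大佬) and is compatible with Espressif's official esp32_s3_korvo2_v3 board, with a few changes. Many thanks to everyone for their support!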
The display part was built from the esp-adf\examples\display\music_player example and merged into the ai-toys project; LVGL screen output and the GT911 touch driver are both working.
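On the touch side, the GT911 reports its product ID in ASCII, so the boot-log line `TouchPad_ID:0x39,0x31,0x31` simply reads "911". Touch points arrive as a status byte followed by fixed-size point records. The sketch below decodes such a buffer. It assumes the commonly documented GT911 layout: status at 0x814E, with bit 7 = buffer ready and bits 3..0 = point count, followed by 8-byte little-endian records holding track ID, X, Y and size. The struct and function names are hypothetical, not taken from the ai_toys source, so check the register map against your panel's datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed GT911 layout for a read starting at status register 0x814E:
 *   buf[0]          status: bit 7 = buffer ready, bits 3..0 = point count
 *   buf[1 + 8 * i]  point i: track id, X lo, X hi, Y lo, Y hi,
 *                   size lo, size hi, reserved                         */
#define GT911_MAX_POINTS   5
#define GT911_POINT_STRIDE 8

typedef struct {
    uint8_t  track_id;
    uint16_t x;
    uint16_t y;
    uint16_t size;
} gt911_point_t; /* hypothetical type, not from the ai_toys source */

/* Decode up to max_points points. Returns the number decoded, or -1 if
 * the buffer-ready bit is clear or the buffer is too short. */
static int gt911_parse_points(const uint8_t *buf, size_t len,
                              gt911_point_t *out, int max_points)
{
    assert(buf != NULL && out != NULL);
    if (len < 1 || (buf[0] & 0x80) == 0) {
        return -1;                      /* no new frame from the controller */
    }
    int count = buf[0] & 0x0F;
    if (count > GT911_MAX_POINTS) count = GT911_MAX_POINTS;
    if (count > max_points) count = max_points;
    if (len < 1 + (size_t)count * GT911_POINT_STRIDE) {
        return -1;                      /* truncated read */
    }
    for (int i = 0; i < count; i++) {
        const uint8_t *p = buf + 1 + (size_t)i * GT911_POINT_STRIDE;
        out[i].track_id = p[0];
        out[i].x    = (uint16_t)(p[1] | (p[2] << 8));   /* little-endian */
        out[i].y    = (uint16_t)(p[3] | (p[4] << 8));
        out[i].size = (uint16_t)(p[5] | (p[6] << 8));
    }
    return count;
}
```

After reading the points, the host normally writes 0 back to 0x814E so the controller can report the next frame. The I2C transfers themselves are left out of this sketch.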
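There are still bugs, and some features, such as voice wake-up and screen display, are still in progress. Sange will keep fixing and updating the project.
For technical questions, contact Sange on WeChat (robot3g) to be added to the developer alliance WeChat group, or join QQ group 174742054 (开发者联盟, Developer Alliance) to discuss.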

Introduction

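ADF is used for VTT (voice to text). The recognized text is sent to MiniMax for the large-model conversation. Baidu's online text-to-speech (TTS) service then turns the returned text into audio, which is sent over I2S to the audio codec for playback. The example uses Chinese text by default but also supports some other languages. For more technical details, see the Baidu TTS documentation page.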
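The round trip above can be modeled as a small state machine. This is a minimal sketch that assumes push-to-talk: hold the button to record, release it to send. It also assumes that pressing the button during playback interrupts TTS, which is what the log below suggests when it prints `TTS all el stopped` before resuming the recording pipeline. The state and event names are hypothetical, and the real ai_toys task may be organized differently.

```c
#include <assert.h>

/* Hypothetical states for one voice round trip. */
typedef enum {
    CHAT_IDLE,         /* waiting for the button                      */
    CHAT_RECORDING,    /* mic audio captured through I2S              */
    CHAT_RECOGNIZING,  /* audio uploaded to Baidu VTT, awaiting text  */
    CHAT_THINKING,     /* text sent to MiniMax, awaiting the reply    */
    CHAT_SPEAKING,     /* reply playing via Baidu TTS -> MP3 -> I2S   */
} chat_state_t;

typedef enum {
    EV_KEY_PRESSED,
    EV_KEY_RELEASED,
    EV_VTT_DONE,       /* recognized text received        */
    EV_LLM_DONE,       /* MiniMax reply received          */
    EV_TTS_DONE,       /* MP3 playback finished           */
    EV_ERROR,          /* any network or pipeline failure */
} chat_event_t;

/* Pure transition function: unexpected events leave the state unchanged. */
static chat_state_t chat_next(chat_state_t s, chat_event_t ev)
{
    if (ev == EV_ERROR) {
        return CHAT_IDLE;
    }
    switch (s) {
    case CHAT_IDLE:
        return ev == EV_KEY_PRESSED ? CHAT_RECORDING : s;
    case CHAT_RECORDING:
        return ev == EV_KEY_RELEASED ? CHAT_RECOGNIZING : s;
    case CHAT_RECOGNIZING:
        return ev == EV_VTT_DONE ? CHAT_THINKING : s;
    case CHAT_THINKING:
        return ev == EV_LLM_DONE ? CHAT_SPEAKING : s;
    case CHAT_SPEAKING:
        if (ev == EV_KEY_PRESSED) return CHAT_RECORDING; /* barge-in: stop TTS */
        return ev == EV_TTS_DONE ? CHAT_IDLE : s;
    }
    return s;
}
```

In firmware, each event would come from the ADF event listener (button, pipeline status, HTTP completion). Each transition would then start or stop the corresponding pipeline.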

The Baidu online speech-to-text and speech-synthesis (MP3) audio pipelines are:

[codec_chip]---> [i2s_reader] ---> [http_stream_writer] ---> [baidu_vtt_server] 
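For the speech-to-text leg, http_stream_writer POSTs the recorded audio to Baidu's recognition server. The sketch below builds the request target and assumes Baidu's short-speech REST API in raw mode. In that mode, 16 kHz, 16-bit mono PCM is sent as the request body with `Content-Type: audio/pcm;rate=16000`, and `dev_pid=1537` (Mandarin) plus the access token go in the query string. It also assumes the captured audio has already been reduced to one channel. The helper names are hypothetical; compare them with the project's own VTT code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed Baidu short-speech "raw" upload: PCM body, params in the URL. */
#define VTT_SAMPLE_RATE   16000
#define VTT_BYTES_PER_SMP 2                      /* 16-bit mono */
#define VTT_CONTENT_TYPE  "audio/pcm;rate=16000" /* request header value */

/* Format the upload URL. Returns 0 on success, -1 if `out` is too small. */
static int vtt_build_url(char *out, size_t out_len, const char *token,
                         const char *cuid, int dev_pid)
{
    assert(out != NULL && token != NULL && cuid != NULL);
    int n = snprintf(out, out_len,
                     "http://vop.baidu.com/server_api?dev_pid=%d&cuid=%s&token=%s",
                     dev_pid, cuid, token);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Bytes of PCM produced by `ms` milliseconds of speech. */
static size_t vtt_pcm_bytes(unsigned ms)
{
    return (size_t)VTT_SAMPLE_RATE * VTT_BYTES_PER_SMP * ms / 1000u;
}
```

At 16000 samples/s × 2 bytes, one second of speech is 32000 bytes. Buffering a whole 5 s utterance before upload would therefore take 160000 bytes, most of the ~190 KiB internal heap region shown in the boot log. This is why streaming the audio through http_stream_writer in chunks matters.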

[baidu_tts_server] ---> http_stream ---> mp3_decoder ---> i2s_stream ---> [codec_chip]
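For the TTS leg, the MiniMax reply has to be percent-encoded before it goes into the URL that http_stream reads from. The sketch below assumes Baidu's text2audio REST parameters `tex`, `tok`, `cuid`, `ctp=1` and `lan=zh`. The `cuid` value `esp32` and the function names are placeholders. Baidu returns MP3 by default, which is what the mp3_decoder stage expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Percent-encode UTF-8 text (RFC 3986): unreserved characters
 * A-Z a-z 0-9 - _ . ~ are copied, every other byte becomes %XX.
 * Returns 0 on success, -1 if `out` is too small. */
static int url_encode(const char *in, char *out, size_t out_len)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;
    assert(in != NULL && out != NULL);
    if (out_len == 0) return -1;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        unsigned char c = *p;
        int keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                   (c >= '0' && c <= '9') ||
                   c == '-' || c == '_' || c == '.' || c == '~';
        size_t need = keep ? 1 : 3;
        if (o + need >= out_len) return -1;   /* leave room for '\0' */
        if (keep) {
            out[o++] = (char)c;
        } else {
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        }
    }
    out[o] = '\0';
    return 0;
}

/* Assumed text2audio parameters; "esp32" is a placeholder device id. */
static int tts_build_url(char *out, size_t out_len,
                         const char *text, const char *token)
{
    char encoded[512];
    if (url_encode(text, encoded, sizeof encoded) != 0) return -1;
    int n = snprintf(out, out_len,
                     "http://tsn.baidu.com/text2audio?tex=%s&tok=%s"
                     "&cuid=esp32&ctp=1&lan=zh",
                     encoded, token);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Each Chinese character is three bytes in UTF-8, so it expands to nine characters in the URL (你 becomes %E4%BD%A0). The 512-byte intermediate buffer therefore holds only about 56 Chinese characters, and long replies need a larger buffer or splitting into several requests.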

Environment setup

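For environment setup, see Sange's **** notes: Episode 36 【Must-read for beginners】 The ultimate guide to setting up an ESP32 development environment in VS Code
https://blog.****.net/phlr5/article/details/141598455
Companion video on Bilibili:
https://www.bilibili.com/video/BV1NksAeuEAb
The companion videos are also on Douyin, Bilibili, Xiaohongshu, WeChat Channels and Kuaishou: search for 柔贝特三哥 and watch the short video with the same title.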
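【Software and hardware downloads】
https://gitee.com/robot3g/ai_toys_publish
Main board: ai_toys_publish\hardware\01-RTC_LCD_V3.0.eprj
Display board: ai_toys_publish\hardware\01-RTC_LEB18P_V2.0.eprj
If you need the hardware boards, Sange will list them on Taobao, the Bilibili Workshop (B站工坊) and Douyin Shop once they are ready.
【Taobao link】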
https://item.txbx.com/item.htm?id=843727341146&skuId=5788969659288

Hardware requirements

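The hardware is compatible with ESP32_S3_KORVO2_V3_BOARD. For other boards, check the table of example compatibility with Espressif audio development boards in $ADF_PATH/examples/README_CN.md; any board marked with a green check box can run this example. As described in the Configuration section below, the board is selected in menuconfig.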

Build and flash

Default IDF version: v5.2.2 (release/v5.2 branch)

Configuration

The project's default configuration is in the sdkconfig.defaults and sdkconfig.defaults.esp32s3 files in the source tree.

The default board for this example is CONFIG_AUDIO_BOARD_CUSTOM. To run it on a different board, select that board's configuration in menuconfig, for example ESP32-Lyrat-Mini V1.1:

menuconfig > Audio HAL > ESP32-Lyrat-Mini V1.1

This example needs a Wi-Fi connection. Run menuconfig to set the Wi-Fi credentials and the Baidu keys:

menuconfig > Example Configuration > `WiFi SSID`, `WiFi Password`, `Baidu speech access key ID` and `Baidu speech access secret`
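A typo in menuconfig only shows up at runtime as a failed connection, so it can help to sanity-check the strings first. The sketch below uses the standard 802.11/WPA2 limits: an SSID of up to 32 bytes, and a passphrase of 8 to 63 printable ASCII characters or exactly 64 hex digits. The function names are hypothetical. The values would come from whatever macros the project's Kconfig generates; `CONFIG_WIFI_SSID` and `CONFIG_WIFI_PASSWORD`, as in other ESP-ADF examples, are an assumption for this project. The boot log's `Password length matches WPA2 standards` warning shows the Wi-Fi driver applying a related length check.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* 802.11 allows SSIDs of 1..32 bytes. */
static bool wifi_ssid_valid(const char *ssid)
{
    assert(ssid != NULL);
    size_t n = strlen(ssid);
    return n >= 1 && n <= 32;
}

/* WPA2-PSK: 8..63 printable ASCII chars, or exactly 64 hex digits.
 * An empty string is accepted here as "open network". */
static bool wifi_password_valid(const char *pw)
{
    assert(pw != NULL);
    size_t n = strlen(pw);
    if (n == 0) return true;
    if (n == 64) {                              /* raw 256-bit PSK in hex */
        for (size_t i = 0; i < n; i++) {
            if (!isxdigit((unsigned char)pw[i])) return false;
        }
        return true;
    }
    if (n < 8 || n > 63) return false;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)pw[i];
        if (c < 0x20 || c > 0x7E) return false; /* printable ASCII only */
    }
    return true;
}
```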
 

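Create a speech-synthesis application on Baidu's online TTS page, then enter the issued API Key and Secret Key in the matching menuconfig fields. They are used to authenticate with the Baidu TTS server.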
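At boot, the API Key and Secret Key are exchanged for an access token (the log's `BAIDU_AUTH: Access token=24.xxxx023` line). The sketch below builds that request URL. It assumes Baidu's standard OAuth 2.0 client-credentials endpoint, with the API Key as `client_id` and the Secret Key as `client_secret`. The function name is hypothetical. The JSON reply carries `access_token` and `expires_in` (in seconds), so the token can be cached instead of fetched again on every boot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Baidu OAuth 2.0 client-credentials request:
 * API Key -> client_id, Secret Key -> client_secret. */
static int baidu_token_url(char *out, size_t out_len,
                           const char *api_key, const char *secret_key)
{
    assert(out != NULL && api_key != NULL && secret_key != NULL);
    int n = snprintf(out, out_len,
                     "https://aip.baidubce.com/oauth/2.0/token"
                     "?grant_type=client_credentials"
                     "&client_id=%s&client_secret=%s",
                     api_key, secret_key);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```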
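Apply for a GroupId and an API key on MiniMax.
For details, see Sange's **** notes 《Episodes 21-22: ESP32-IDF tutorial, building and running the robot chat project - 《MCU Embedded AI Development Notes》》
https://editor.****.net/md/?articleId=140510897
and 《Episode 22: How to get the MiniMax key and groupid - 《MCU Embedded AI Development Notes》》
https://editor.****.net/md/?articleId=140598712
as well as the videos with the same episode numbers.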

Change the groupid in minimax_chat.c, at line 62:
.url = "https://api.minimax.chat/v1/text/chatcompletion_pro?GroupId=xxx", // replace xxx with your own GroupId
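Instead of editing the literal on line 62, the URL can also be built from a GroupId at runtime. The sketch below assumes the GroupId would come from menuconfig or NVS; `minimax_build_url` is a hypothetical helper, not part of the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MINIMAX_CHAT_URL_FMT \
    "https://api.minimax.chat/v1/text/chatcompletion_pro?GroupId=%s"

/* Hypothetical helper: format the chat URL from a GroupId string.
 * Returns 0 on success, -1 if the GroupId is empty or `out` is too small. */
static int minimax_build_url(char *out, size_t out_len, const char *group_id)
{
    assert(out != NULL);
    if (group_id == NULL || group_id[0] == '\0') return -1; /* not configured */
    int n = snprintf(out, out_len, MINIMAX_CHAT_URL_FMT, group_id);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

The result can then be assigned to `.url` before the HTTP client is initialized. Rejecting an empty GroupId makes a missing setting fail early instead of surfacing as an HTTP error.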
In ai_toys.c, change line 85:
// Replace the text inside the double quotes below with your own token_key
const char * minimax_key = "Bearer eyJhbGciOiJSxxx";
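The key is sent as an HTTP `Authorization: Bearer ...` header, and the line above already stores the `Bearer ` prefix inside the string. A small helper that accepts the key with or without that prefix avoids accidentally sending `Bearer Bearer ...`. This is a hypothetical sketch, not the ai_toys code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build "Bearer <key>", accepting a key that already has the prefix
 * (as stored in ai_toys.c) so the prefix is never doubled. */
static int minimax_auth_header(char *out, size_t out_len, const char *key)
{
    static const char prefix[] = "Bearer ";
    const size_t plen = sizeof prefix - 1;
    assert(out != NULL && key != NULL);
    if (strncmp(key, prefix, plen) == 0) {
        key += plen;                      /* strip an existing prefix */
    }
    int n = snprintf(out, out_len, "%s%s", prefix, key);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With ESP-IDF's HTTP client, the resulting string would be passed as `esp_http_client_set_header(client, "Authorization", header)`.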

Build and flash

How to use the example

Features and usage

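  • Once the example starts, it first tries to connect to the configured Wi-Fi network; after that, press the button to chat. Sample serial output: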
PS D:\workspace\esp-idf\ai_toys> & set IDF_PATH='D:\Espressif\v5.2\esp-idf'
PS D:\workspace\esp-idf\ai_toys> & 'D:\Espressif\tools\v5.2\python_env\idf5.2_py3.11_env\Scripts\python.exe' 'D:\Espressif\v5.2\esp-idf\tools\idf_monitor.py' -p COM4 -b 115200 --toolchain-prefix xtensa-esp32s3-elf- --target esp32s3 'd:\workspace\esp-idf\ai_toys\build\ai_toys.elf'
--- WARNING: GDB cannot open serial ports accessed as COMx
--- Using \\.\COM4 instead...
--- esp-idf-monitor 1.4.0 on \\.\COM4 115200 ---
--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3810,len:0x178c
load:0x403c9700,len:0x4
load:0x403c9704,len:0xcbc
load:0x403cc700,len:0x2d9c
entry 0x403c9914
I (27) boot: ESP-IDF v5.2.2-639-g43098fc4de-dirty 2nd stage bootloader
I (27) boot: compile time Oct  6 2024 17:31:43
I (28) boot: Multicore bootloader
I (32) boot: chip revision: v0.2
I (36) boot.esp32s3: Boot SPI Speed : 80MHz
I (40) boot.esp32s3: SPI Mode       : DIO
I (45) boot.esp32s3: SPI Flash Size : 16MB
I (50) boot: Enabling RNG early entropy source...
I (55) boot: Partition Table:
I (59) boot: ## Label            Usage          Type ST Offset   Length
I (66) boot:  0 nvs              WiFi data        01 02 00009000 00004000
I (74) boot:  1 phy_init         RF data          01 01 0000d000 00001000
I (81) boot:  2 factory          factory app      00 00 00010000 00600000
I (89) boot: End of partition table
I (93) esp_image: segment 0: paddr=00010020 vaddr=3c100020 size=326424h (3302436) map
I (694) esp_image: segment 1: paddr=0033644c vaddr=3fc9d100 size=04d50h ( 19792) load
I (698) esp_image: segment 2: paddr=0033b1a4 vaddr=40374000 size=04e74h ( 20084) load   
I (704) esp_image: segment 3: paddr=00340020 vaddr=42000020 size=f3400h (996352) map
I (886) esp_image: segment 4: paddr=00433428 vaddr=40378e74 size=141f8h ( 82424) load
I (915) boot: Loaded app from partition at offset 0x10000
I (915) boot: Disabling RNG early entropy source...
I (927) cpu_start: Multicore app
I (936) cpu_start: Pro cpu start user code
I (937) cpu_start: cpu freq: 160000000 Hz
I (937) cpu_start: Application information:
I (940) cpu_start: Project name:     ai_toys
I (944) cpu_start: App version:      20241005-lcdtpgt911andadcbutton
I (952) cpu_start: Compile time:     Oct 11 2024 06:47:46
I (958) cpu_start: ELF file SHA256:  728e4ed9d...
I (963) cpu_start: ESP-IDF:          v5.2.2-639-g43098fc4de-dirty
I (970) cpu_start: Min chip rev:     v0.0
I (974) cpu_start: Max chip rev:     v0.99 
I (979) cpu_start: Chip rev:         v0.2
I (984) heap_init: Initializing. RAM available for dynamic allocation:
I (991) heap_init: At 3FCB9D28 len 0002F9E8 (190 KiB): RAM
I (997) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
I (1003) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (1010) heap_init: At 600FE010 len 00001FD8 (7 KiB): RTCRAM
I (1017) spi_flash: detected chip: generic
I (1021) spi_flash: flash io: dio
W (1025) i2c: This driver is an old driver, please migrate your application code to adapt `driver/i2c_master.h`
W (1036) ADC: legacy driver is deprecated, please migrate to `esp_adc/adc_oneshot.h`
I (1044) sleep: Configure to isolate all GPIO pins in sleep state
I (1051) sleep: Enable automatic switching of GPIO sleep configuration
I (1059) main_task: Started on CPU0
I (1069) main_task: Calling app_main()
I (1069) BAIDU_SPEECH_EXAMPLE: nvs_flash_init start
I (1079) BAIDU_SPEECH_EXAMPLE: [ 0 ] Start and wait for Wi-Fi network
I (1079) TCA9554: Detected IO expander device at 0x70, name is: TCA9554A
I (1089) AUDIO_BOARD: tca9554_init done
E (1099) gpio: GPIO_PIN mask error 
I (1119) gpio: GPIO[2]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
I (1519) AUDIO_BOARD: lcd init done
I (1519) AUDIO_THREAD: The esp_periph task allocate stack on internal memory
W (1519) I2C_BUS: I2C bus has been already created, [port:0]
I (1519) GT911: TouchPad_ID:0x39,0x31,0x31
I (1529) GT911: TouchPad_Config_Version:65
I (1539) BAIDU_SPEECH_EXAMPLE: lv_port_init done
I (1559) BAIDU_SPEECH_EXAMPLE: lv_demo_music done
I (1559) BAIDU_SPEECH_EXAMPLE: ai_chat_task
I (1559) BAIDU_SPEECH_EXAMPLE: [ 0 ] Start and wait for Wi-Fi network
I (1569) BAIDU_SPEECH_EXAMPLE: [ key_start ] Initialize Button peripheral with board init
I (1579) BAIDU_SPEECH_EXAMPLE: [ key_start ] Create and start input key service
I (1579) AUDIO_THREAD: The input_key_service task allocate stack on internal memory
I (1589) AUDIO_THREAD: The button_task task allocate stack on internal memory
W (1599) BAIDU_SPEECH_EXAMPLE: [ 4 ] Waiting for a button to be pressed ...
I (1609) BAIDU_SPEECH_EXAMPLE: audio_key_start done
I (1619) pp: pp rom version: e7ae62f
I (1619) net80211: net80211 rom version: e7ae62f
I (1629) wifi:wifi driver task: 3fcd82f8, prio:23, stack:6656, core=0
I (1629) wifi:wifi firmware version: c2ae8d1
I (1629) wifi:wifi certification version: v7.0
I (1639) wifi:config NVS flash: enabled
I (1639) wifi:config nano formating: disabled
I (1649) wifi:Init data frame dynamic rx buffer num: 32
I (1649) wifi:Init static rx mgmt buffer num: 5
I (1649) wifi:Init management short buffer num: 32
I (1659) wifi:Init dynamic tx buffer num: 32
I (1659) wifi:Init static tx FG buffer num: 2
I (1669) wifi:Init static rx buffer size: 1600
I (1669) wifi:Init static rx buffer num: 10
I (1679) wifi:Init dynamic rx buffer num: 32
I (1679) wifi_init: rx ba win: 6
I (1679) wifi_init: tcpip mbox: 32
I (1689) wifi_init: udp mbox: 6
I (1689) wifi_init: tcp mbox: 6
I (1689) wifi_init: tcp tx win: 5760
I (1699) wifi_init: tcp rx win: 5760
I (1699) wifi_init: tcp mss: 1440
I (1709) wifi_init: WiFi IRAM OP enabled
I (1709) wifi_init: WiFi RX IRAM OP enabled
W (1719) wifi:Password length matches WPA2 standards, authmode threshold changes from OPEN to WPA2
I (1729) wifi:Set ps type: 1, coexist: 0

I (1729) phy_init: phy_version 680,a6008b2,Jun  4 2024,16:41:10
I (1779) wifi:mode : sta (cc:8d:a2:ee:c5:94)
I (1779) wifi:enable tsf
W (1779) PERIPH_WIFI: WiFi Event cb, Unhandle event_base:WIFI_EVENT, event_id:43
I (2899) wifi:new:<6,1>, old:<1,0>, ap:<255,255>, sta:<6,1>, prof:1
I (2899) wifi:state: init -> auth (b0)
W (2899) PERIPH_WIFI: WiFi Event cb, Unhandle event_base:WIFI_EVENT, event_id:43        
I (2899) wifi:state: auth -> assoc (0)
I (2909) wifi:state: assoc -> run (10)
I (2929) wifi:connected with xxx, aid = 6, channel 6, 40U, bssid = xxxx
I (2929) wifi:security: WPA2-PSK, phy: bgn, rssi: -37
I (2929) wifi:pm start, type: 1

I (2929) wifi:dp: 1, bi: 102400, li: 3, scale listen interval from 307200 us to 307200 us
I (2939) wifi:set rx beacon pti, rx_bcn_pti: 0, bcn_timeout: 25000, mt_pti: 0, mt_time: 10000
W (2949) PERIPH_WIFI: WiFi Event cb, Unhandle event_base:WIFI_EVENT, event_id:4
I (2959) wifi:<ba-add>idx:0 (ifx:0, b0:95xxx), tid:0, ssn:0, winSize:64        
I (2989) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (3949) esp_netif_handlers: sta ip: 192.168.0.105, mask: 255.255.255.0, gw: 192.168.0.1
I (3949) PERIPH_WIFI: Got ip:192.168.0.105
I (4359) BAIDU_AUTH: Access token=24.xxxx023
I (4359) AUDIO_BOARD: audio_board_init and codec adc
W (4359) I2C_BUS: I2C bus has been already created, [port:0]
I (4369) DRV8311: ES8311 in Slave mode
I (4389) gpio: GPIO[48]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
W (4389) I2C_BUS: I2C bus has been already created, [port:0]
I (4399) ES7210: ES7210 in Slave mode
I (4409) ES7210: Enable ES7210_INPUT_MIC1
I (4409) ES7210: Enable ES7210_INPUT_MIC2
I (4419) ES7210: Enable ES7210_INPUT_MIC3
W (4419) ES7210: Enable TDM mode. ES7210_SDP_INTERFACE2_REG12: 2
I (4429) ES7210: config fmt 60
I (4429) AUDIO_HAL: Codec mode is 3, Ctrl:1
I (4439) AUDIO_PIPELINE: link el->rb, el:0x3fce91e0, tag:vtt_i2s, rb:xx
I (4439) MP3_DECODER: MP3 init
I (4439) AUDIO_PIPELINE: link el->rb, el:0x3fce2ba0, tag:tts_http, rb:xx
I (4449) AUDIO_PIPELINE: link el->rb, el:0x3fce2edc, tag:tts_mp3, rb:xx
I (4459) BAIDU_SPEECH_EXAMPLE: [ 4 ] Set up  event listener
I (4459) BAIDU_SPEECH_EXAMPLE: [4.1] Listening event from the pipeline
I (4469) BAIDU_SPEECH_EXAMPLE: [4.2] Listening event from peripherals
I (4479) BAIDU_SPEECH_EXAMPLE: [ 5 ] Listen for all pipeline events
I (4489) BAIDU_SPEECH_EXAMPLE: main_page_task
I (4489) BAIDU_SPEECH_EXAMPLE: lv_main_page
I (4499) BAIDU_SPEECH_EXAMPLE: lv_main_page done
I (14609) BAIDU_SPEECH_EXAMPLE: [ * ] Event received: src_type:1048588, source:0x3fcd27e0 cmd:1, data:0x5, data_len:4
I (14609) BAIDU_SPEECH_EXAMPLE: msg.cmd=1
W (14609) AUDIO_PIPELINE: Without stop, st:1
W (14619) AUDIO_PIPELINE: Without wait stop, st:1
I (14619) BAIDU_TTS: TTS all el stopped
I (14629) BAIDU_SPEECH_EXAMPLE: [ * ] Resuming pipeline
I (14629) AUDIO_THREAD: The vtt_http task allocate stack on internal memory
I (14639) AUDIO_ELEMENT: [vtt_http-0x3fce01b8] Element task created
I (14649) AUDIO_THREAD: The vtt_i2s task allocate stack on internal memory