Build your customizable voice assistant with Platypush
Use the available integrations to build a voice assistant with a simple microphone
My dream of a piece of software that you could simply talk to and get things done started more than 10 years ago, when I was still a young M.Sc student who imagined getting common tasks done on my computer through the same kind of natural interaction you see between Dave and HAL 9000 in 2001: A Space Odyssey. Together with a friend I developed Voxifera way back in 2008. The software worked well enough for basic tasks, as long as I was the one providing the voice commands and the list of custom commands stayed below 10 items. In recent years, however, Google and Amazon have gone way beyond what an M.Sc student alone could do with fast Fourier transforms and Markov models.
When years later I started building Platypush, I still dreamed of the same voice interface, leveraging the new technologies without being caged by the interactions natively provided by those commercial assistants. My goal was still to talk to my assistant and get it to do whatever I wanted, regardless of the skills/integrations supported by the product and regardless of whichever answer its AI was intended to provide for that phrase. Most of all, my goal was to have all the business logic of the actions run on my own device(s), not on someone else’s cloud. I feel that this goal has by now been mostly accomplished (assistant technology with 100% flexibility when it comes to phrase patterns and custom actions), and today I’d like to show you how to set up your own Google Assistant on steroids with a Raspberry Pi, a microphone and Platypush. I’ll also show how to run your custom hotword detection models through the Snowboy integration, for those who want more flexibility in how they summon their digital butler than the boring “Ok Google” formula, or who aren’t that happy with the idea of Google constantly listening to everything that is said in the room. For those who are unfamiliar with Platypush, I suggest reading my previous article on what it is, what it can do, why I built it and how to get started with it.
Context and expectations
First, a bit of context around the current state of the assistant integration (and the state of the available assistant APIs/SDKs in general).
My initial goal was to have a voice assistant that could:
- Continuously listen through an audio device for a specific audio pattern or phrase and process the subsequent voice requests.
- Support multiple models for the hotword, so that multiple phrases could be used to trigger a request process, and optionally one could even associate a different assistant language to each hotword.
- Support conversation start/end actions even without hotword detection, something like “start listening when I press a button or when I get close to a distance sensor”.
- Provide the possibility to configure a list of custom phrases or patterns (ideally through regular expressions) that, when matched, would run a custom pre-configured task or list of tasks on the executing device, or on any device connected through it.
- If a phrase doesn’t match any of those pre-configured patterns, then the assistant would go on and process the request in the default way (e.g. rely on Google’s “how’s the weather?” or “what’s on my calendar?” standard response).
Basically, I needed an assistant SDK or API that could be easily wrapped into a library or tiny module, a module that could listen for hotwords, start/stop conversations programmatically, and return the detected phrase directly back to my business logic if any speech was recognized.
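The behaviour described in the list above boils down to a small dispatch loop. The sketch below is only an illustration in plain Python, not Platypush code: the `PhraseDispatcher` class, its `register` method and the `fallback` callable are hypothetical names introduced here to show how a recognized phrase could be checked against custom regular expressions first and handed to the default assistant only when nothing matches.

```python
import re
from typing import Callable, List, Tuple


class PhraseDispatcher:
    """Hypothetical helper: route a recognized phrase to a custom action or to a fallback."""

    def __init__(self, fallback: Callable[[str], str]):
        self._rules: List[Tuple[re.Pattern, Callable[[re.Match], str]]] = []
        self._fallback = fallback

    def register(self, pattern: str, action: Callable[[re.Match], str]) -> None:
        # Patterns are compiled once and matched case-insensitively
        self._rules.append((re.compile(pattern, re.IGNORECASE), action))

    def dispatch(self, phrase: str) -> str:
        # The first rule whose pattern matches the whole phrase wins
        for pattern, action in self._rules:
            match = pattern.fullmatch(phrase.strip())
            if match:
                return action(match)
        # Nothing matched: let the default assistant handle the request
        return self._fallback(phrase)


dispatcher = PhraseDispatcher(fallback=lambda phrase: f"assistant: {phrase}")
dispatcher.register(r"turn on the lights?", lambda m: "lights on")
dispatcher.register(
    r"play (?P<title>.+) by (?P<artist>.+)",
    lambda m: f"playing {m['title']} by {m['artist']}",
)
```

In a real setup the actions would call plugins instead of returning strings, but the ordering (custom rules first, default assistant last) is the part that matters.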
I eventually decided to develop the integration with the Google Assistant and ignore Alexa because:
- Alexa’s original sample app for developers was a relatively heavy piece of software that relied on a Java backend and a Node.js web service.
- In the meantime Amazon has pulled the plug on that original project.
- The sample app has been replaced by the Amazon AVS (Alexa Voice Service), which is a C++ service mostly aimed at commercial applications and doesn’t provide a decent quickstart for custom Python integrations.
- There are a few Python examples for the Alexa SDK, but they focus on how to develop a skill. I’m not interested in building a skill that runs on Amazon’s servers. I’m interested in detecting hotwords and raw speech on any device, and the SDK should let me do whatever I want with that.
I eventually opted for the Google Assistant library, but that has recently been deprecated on short notice, and there’s an ongoing discussion about what the future alternatives will be. However, the voice integration with Platypush still works, and I’ll make sure that whichever new SDK/API Google releases in the near future is supported as well. The two options currently provided are:
- If you’re running Platypush on an x86/x86_64 machine or on a Raspberry Pi earlier than the model 4 (except for the Raspberry Pi Zero, since it’s based on ARMv6 and the Assistant library wasn’t compiled for it), you can still use the assistant library, even though it’s not guaranteed to work against future builds of the libc, given the deprecated status of the library.
- Otherwise, you can use the Snowboy integration for hotword detection together with Platypush’s wrapper around the Google push-to-talk sample for conversation support.
In this article we’ll see how to get started with both configurations.
Installation and configuration
First things first: in order to get your assistant working you’ll need:
- An x86/x86_64/ARM device/OS compatible with Platypush and either the Google Assistant library or Snowboy (tested on most of the Raspberry Pi models, Banana Pis and Odroid, and on ASUS Tinkerboard).
- A microphone. Literally any Linux-compatible microphone would work.
I’ll also assume that you have already installed Platypush on your device — the instructions are provided on the Github page, on the wiki and in my previous article.
Follow these steps to get the assistant running:
- Install the required dependencies:
# To run the Google Assistant library (hotword + speech detection)
# (it won't work on Raspberry Pi Zero and the ARMv6 architecture)
[sudo] pip install 'platypush[google-assistant-legacy]'
# To run just the Google Assistant speech detection and use
# Snowboy for hotword detection
[sudo] pip install 'platypush[google-assistant]'
- Follow these steps to create and configure a new project in the Google Console and download the required credentials files.
- Generate your user’s credentials file for the assistant to connect it to your account:
export CREDENTIALS_FILE=~/.config/google-oauthlib-tool/credentials.json
google-oauthlib-tool --scope https://www.googleapis.com/auth/assistant-sdk-prototype \
--scope https://www.googleapis.com/auth/gcm \
--save --headless --client-secrets $CREDENTIALS_FILE
- Open the prompted URL in your browser, log in with your Google account if needed and then enter the prompted authorization code in the terminal.
The above steps are common both for the Assistant library and the Snowboy+push-to-talk configurations. Let’s now tackle how to get things working with the Assistant library, provided that it still works on your device.
Google Assistant library
- Enable the Google Assistant backend (to listen to the hotword) and plugin (to programmatically start/stop conversations in your custom actions) in your Platypush configuration file (by default ~/.config/platypush/config.yaml):
backend.assistant.google:
    enabled: True

assistant.google:
    enabled: True
- Refer to the official documentation to check the additional initialization parameters and actions provided by the assistant backend and plugin.
- Restart Platypush and keep an eye on the output to check that everything is alright. Oh, and also double check that your microphone is not muted.
- Just say “OK Google” or “Hey Google”. The basic assistant should work out of the box.
Snowboy + Google Assistant library
Follow the steps in this section if the Assistant library doesn’t work on your device (in most cases you’ll see a segmentation fault when you try to import it, caused by a mismatching libc version), if you want more options when it comes to supported hotwords, and/or if you don’t like the idea of Google constantly listening to all of your conversations to detect when you say the hotword.
# Install the Snowboy dependencies
[sudo] pip install 'platypush[hotword]'
- Go to the Snowboy home page, register/login and then select the hotword model(s) you like. You’ll notice that before downloading a model you’ll be asked to provide three voice samples of yourself saying the hotword: a good idea to keep voice models free while getting everyone to improve them.
- Configure the Snowboy backend and the Google push-to-talk plugin in your Platypush configuration. Example:
backend.assistant.snowboy:
    audio_gain: 1.0
    models:
        computer:
            voice_model_file: ~/path/models/computer.umdl
            assistant_plugin: assistant.google.pushtotalk
            assistant_language: it-IT
            detect_sound: ~/path/sounds/sound1.wav
            sensitivity: 0.45
        ok_google:
            voice_model_file: ~/path/models/OK Google.pmdl
            assistant_plugin: assistant.google.pushtotalk
            assistant_language: en-US
            detect_sound: ~/path/sounds/sound2.wav
            sensitivity: 0.42

assistant.google.pushtotalk:
    language: en-US
A few words about the configuration tweaks:
- Tweak audio_gain to adjust the gain of your microphone (1.0 for a 100% gain).
- models will contain a key-value list of the voice models that you want to use.
- For each model you’ll have to specify its voice_model_file (downloaded from the Snowboy website), which assistant_plugin will be used (assistant.google.pushtotalk in this case), the assistant_language code, i.e. the selected language for the assistant conversation when that hotword is detected (default: en-US), an optional detect_sound, a WAV file that will be played when a conversation starts, and the sensitivity of that model, between 0 and 1, with 0 meaning no sensitivity and 1 very high sensitivity (tweak it to your own needs, but be aware that a value higher than 0.5 might trigger more false positives).
- The assistant.google.pushtotalk plugin configuration only requires the default assistant language to be used.
Refer to the official documentation for extra initialization parameters and methods provided by the Snowboy backend and the push-to-talk plugin.
Restart Platypush and check the logs for any errors, then say your hotword. If everything went well, an assistant conversation will be started when the hotword is detected.
Create custom events on speech detected
So now that you’ve got the basic features of the assistant up and running, it’s time to customize the configuration and leverage the versatility of Platypush to get your assistant to run whatever you like when you say whichever phrase you like. You can create event hooks for any of the events triggered by the assistant (among those, SpeechRecognizedEvent, ConversationStartEvent, HotwordDetectedEvent, TimerEndEvent etc.), and those hooks can run anything that has a Platypush plugin. Let’s see an example that turns on your Philips Hue lights when you say “turn on the lights”:
event.hook.AssistantTurnLightsOn:
    if:
        type: platypush.message.event.assistant.SpeechRecognizedEvent
        phrase: "turn on (the)? lights?"
    then:
        - action: light.hue.on
You’ll also notice that the answer of the assistant is suppressed if the detected phrase matches an existing rule, but
if you still want the assistant to speak a custom phrase you can use the tts or tts.google plugins:
event.hook.AssistantTurnOnLightsAnimation:
    if:
        type: platypush.message.event.assistant.SpeechRecognizedEvent
        phrase: "turn on (the)? animation"
    then:
        - action: light.hue.animate
          args:
              animation: color_transition
              transition_seconds: 0.25
        - action: tts.say
          args:
              text: Enjoy the light show
You can also programmatically start a conversation without using the hotword to trigger the assistant. For example, this is a rule that triggers the assistant whenever you press a Flic button:
event.hook.FlicButtonStartConversation:
    if:
        type: platypush.message.event.button.flic.FlicButtonEvent
        btn_addr: 00:11:22:33:44:55
        sequence:
            - ShortPressEvent
    then:
        - action: assistant.google.start_conversation
        # or:
        # - action: assistant.google.pushtotalk.start_conversation
Additional win: if you have configured the HTTP backend and you have access to the web panel or the dashboard then you’ll notice that the status of the conversation will also appear on the web page as a modal dialog, where you’ll see when a hotword has been detected, the recognized speech and the transcript of the assistant response.
That’s all you need to know to customize your assistant — now you can for instance write rules that would blink your lights when an assistant timer ends, or programmatically play your favourite playlist on mpd/mopidy when you say a particular phrase, or handle a home made multi-room music setup with Snapcast+platypush through voice commands. As long as there’s a platypush plugin to do what you want to do, you can do it already.
Live demo
A TL;DR video with a practical example:
In this video:
- Using Google Assistant basic features ("how's the weather?") with the "OK Google" hotword (in English)
- Triggering a conversation in Italian when I say the "computer" hotword instead
- Support for custom responses through the Text-to-Speech plugin
- Control the music through custom hooks that leverage mopidy as a backend (and synchronize music with devices in other rooms through the Snapcast plugin)
- Trigger a conversation without hotword - in this case I defined a hook that starts a conversation when something approaches a distance sensor on my Raspberry
- Take pictures from a camera on another Raspberry and preview them on the screen through platypush' camera plugins, and send them to mobile devices through the Pushbullet or AutoRemote plugins
- All the conversations and responses are visually shown on the platypush web dashboard
Those who have followed me for a while know of my personal obsession with self-built voice assistants.
My experiments over the years can be summarized as follows:
- 2007: Voxifera, my very first attempt at building a primitive voice assistant using Hidden Markov models. Definitely not good for general-purpose usage, but good enough in 2007 to distinguish between a dozen simple voice commands.
- 2019: First voice assistant built on top of Platypush. It used the now deprecated Google Assistant Library on top of a Raspberry Pi with a microphone and a speaker, and it could hook any automation routines and custom commands to it through event hooks.
- 2020: Second iteration on Platypush, this time supporting other assistant plugins too: Alexa (integration now removed), Snowboy (also removed, since the project is dead), Mozilla DeepSpeech (also removed now, since Mozilla discontinued it), PicoVoice, and mimic3 (the text-to-speech engine built on top of Mycroft, now bankrupt).
- 2024: Third iteration on Platypush, this time with an enhanced PicoVoice integration and new speech-to-text and text-to-speech plugins based on the OpenAI APIs.
But it's now 2026, and perhaps both the hardware and the software are now mature enough for fully on-device voice assistants based on fully open solutions likely to stick around for a while.
In this article we'll close that gap with Platypush:
- assistant.openwakeword listens for the wake word locally.
- assistant.vosk transcribes the command locally.
- tts.piper speaks the answer locally.
- openai is used only where a language model is useful: turning messy speech into intent, or answering general questions.
- Existing home automation plugins such as light.hue, music.mpd or weather.openweathermap provide the actions.
The result is not another cloud assistant with a different coat of paint. The hotword engine, speech recognition, command dispatch and speech synthesis can all run on-device. If the openai step points to a local OpenAI-compatible server, then the whole pipeline can stay on your LAN too.
The pipeline
The architecture can be summarized as follows:
Hotword detection ("OK Google", "Alexa" etc.) is a continuous, low-latency workload, and it should not need the network.
Speech-to-text is also a good fit for local inference: Vosk models are small enough to run on modest hardware, including Raspberry Pis, and they are perfectly adequate for short home automation commands.
Text-to-speech is another place where local models are good enough nowadays: Piper voices are fast, small and much nicer than the old robotic espeak-style fallback.
The only optional network-shaped piece is the language model.
But that is a policy choice, not a requirement of the voice stack.
Setup
Clone the assistant sample repository:
git clone https://git.platypush.tech/platypush/assistant-sample
cd assistant-sample
Models
The next step is to download the voice models used by the voice stack.
Hotword Detection
When the service starts the first time, it will automatically download all the available models.
You can then use the following command to list the available models once the service is running:
curl -s -XPOST \
-H 'Content-type: application/json' \
-H "Authorization: Bearer $PLATYPUSH_TOKEN" \
-d '{"type":"request", "action":"assistant.openwakeword.list_models"}' \
http://localhost:8008/execute
Where $PLATYPUSH_TOKEN is the token of the user that is running the service.
You can retrieve it by connecting to http://localhost:8008 when the service starts for the first time. Create your credentials, then select Settings -> Tokens -> Generate API Token.
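The same request can also be issued from Python. The sketch below mirrors the curl call above using only the standard library. `build_execute_request` and `execute` are helper names introduced here for illustration, the optional `args` field is only added when arguments are passed, and nothing is assumed about the response beyond it being JSON.

```python
import json
import urllib.request


def build_execute_request(action, token, base_url="http://localhost:8008", **args):
    """Build the POST request for the /execute endpoint used by the curl example."""
    payload = {"type": "request", "action": action}
    if args:
        # Action arguments, if any, travel in the "args" field of the request
        payload["args"] = args
    return urllib.request.Request(
        f"{base_url}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def execute(action, token, **args):
    """Send the request and decode the JSON response (requires a running service)."""
    request = build_execute_request(action, token, **args)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `execute("assistant.openwakeword.list_models", token)` would be the equivalent of the curl command, where `token` holds the value of $PLATYPUSH_TOKEN.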
Speech-to-text
A full list of the Vosk voice models is available here.
Some feedback about the quality of the English models:
| Model | Size | Notes |
|---|---|---|
| vosk-model-small-en-us-0.15 | 40 MB | Very fast and lightweight model that can also run on an old Raspberry Pi, but accuracy can be low. |
| vosk-model-en-us-0.22-lgraph | 128 MB | Reasonably accurate on clear speech and with native speakers, but still small enough to run fine even on a Raspberry Pi. |
| vosk-model-en-us-0.22 | 1.8 GB | Accurate generic US English model. Fast on a laptop or x86 processor, but it may be a bit heavy on a Raspberry Pi. |
Download the selected model to the Docker volume working directory:
mkdir -p ./workdir/assistant.vosk/models
cd ./workdir/assistant.vosk/models
wget "https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip"
unzip "vosk-model-en-us-0.22-lgraph.zip"
rm "vosk-model-en-us-0.22-lgraph.zip"
Text-to-speech
Download a speech synthesis model from here.
Audio samples are also available to get an idea of the type of voice before downloading.
The model usually consists of a *.onnx and a *.onnx.json file. Download both of them to the Docker volume working directory:
mkdir -p ./workdir/piper_tts
cd ./workdir/piper_tts
wget "https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/hfc_female/medium/en_US-hfc_female-medium.onnx"
wget "https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/hfc_female/medium/en_US-hfc_female-medium.onnx.json"
Configuration
Copy and edit the example configuration file.
cp config/config.example.yaml config/config.yaml
Home automation plugins
The assistant becomes useful once recognized speech can reach the rest of the house.
For example, Hue lights:
light.hue:
    bridge: hue
    groups:
        - Living Room
And MPD/Mopidy for music:
music.mopidy:
    host: localhost

music.mpd:
    host: localhost
    poll_interval: null
Those are just regular Platypush plugins.
The assistant does not need special knowledge about Hue, MPD, Chromecast, Zigbee, MQTT or anything else.
It only needs to emit events; your hooks decide what to do with them.
Build
Build the container image for the assistant service:
docker build -t platypush-voice .
Run
The assistant needs access to the host microphone and speakers. The container routes ALSA through PulseAudio, so the examples below connect it to a PulseAudio server running on the host.
Linux
With PulseAudio or pipewire-pulseaudio installed:
docker run --rm \
-e PULSE_SERVER=unix:/run/pulse/native \
-v /run/user/$(id -u)/pulse/native:/run/pulse/native \
--name voice-assistant \
-p 8008:8008 \
-v ./config:/etc/platypush \
-v ./workdir:/var/lib/platypush \
platypush-voice
macOS
Install and start PulseAudio on the host:
brew install pulseaudio
pulseaudio --daemonize=yes --exit-idle-time=-1
pactl load-module module-native-protocol-tcp \
auth-anonymous=1 \
listen=0.0.0.0 \
port=4713
Then start the container:
docker run --rm \
-e PULSE_SERVER=tcp:host.docker.internal:4713 \
--name voice-assistant \
-p 8008:8008 \
-v "$(pwd)/config:/etc/platypush" \
-v "$(pwd)/workdir:/var/lib/platypush" \
platypush-voice
If pactl load-module reports that the module is already loaded, you can keep using the existing PulseAudio daemon.
Windows
Install PulseAudio for Windows, then create a default.pa file in the same directory as pulseaudio.exe:
load-module module-waveout sink_name=output source_name=input record=1
load-module module-native-protocol-tcp auth-anonymous=1 listen=0.0.0.0 port=4713
set-default-sink output
set-default-source input
Start PulseAudio from PowerShell:
.\pulseaudio.exe -F .\default.pa --exit-idle-time=-1
Then start the container from the repository directory:
docker run --rm `
-e PULSE_SERVER=tcp:host.docker.internal:4713 `
--name voice-assistant `
-p 8008:8008 `
-v "${PWD}/config:/etc/platypush" `
-v "${PWD}/workdir:/var/lib/platypush" `
platypush-voice
Make sure microphone access is enabled for desktop applications under Windows privacy settings, and allow PulseAudio through the firewall if prompted.
Usage
Once the service is running, you can start interacting with it through voice commands (the default activation word is "Alexa").
Any questions about the weather will be resolved by the weather plugin if it's been enabled.
If the music or lights plugins are enabled, they can be controlled with voice commands ("stop the music", "turn on the lights", etc.)
Otherwise, the assistant will use the openai plugin to respond to your questions, with follow-up turns when the response from OpenAI is also a question.
Extending the Assistant
The assistant logic is modeled through simple Platypush hooks under config/scripts.
You can extend it as you like by defining your own hooks or modifying the existing ones.
Starting a conversation
Conversations are started by hooking to the HotwordDetectedEvent.
import logging

from platypush import run, when
from platypush.events.assistant import HotwordDetectedEvent

logger = logging.getLogger(__name__)
ai_plugin = "openai"
assistant_plugin = "assistant.vosk"


@when(HotwordDetectedEvent)
def on_hotword_detected(event: HotwordDetectedEvent):
    """
    When the hotword is detected, start a conversation.
    """
    logger.info(f"Hotword {event.hotword} detected")
    run(f"{assistant_plugin}.start_conversation")
Deterministic commands
For common home automation commands, regular event hooks are still the best tool. They are fast, inspectable, and they do not hallucinate.
from platypush import run, when
from platypush.events.assistant import SpeechRecognizedEvent


@when(SpeechRecognizedEvent, phrase="turn on (the)? lights")
def turn_on_lights():
    """
    Hook run when the user says "turn on the lights" (regex)
    """
    run("light.hue.on")


@when(SpeechRecognizedEvent, phrase="play (the)? music")
def play_music():
    """
    Hook run when the user says "play the music" (regex)
    """
    run("music.mpd.play")


@when(SpeechRecognizedEvent, phrase="set the music volume (to|on|at) ${volume}")
def set_volume(volume: int):
    """
    Hook run when the user says "set the music volume to ${volume}"
    (regex with parameter).
    """
    run("music.mpd.set_volume", volume=volume)
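To make the `${volume}` syntax less magical, here is a rough sketch of how such a phrase pattern could be turned into a regular expression with named groups. It is an illustration written for this example, not Platypush's actual matching code, and `compile_phrase` and `extract_args` are hypothetical helpers. In this sketch the captured values are plain strings, so a numeric argument such as the volume would still need an explicit conversion.

```python
import re
from typing import Dict, Optional

# Matches placeholders of the form ${name}
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def compile_phrase(pattern: str) -> re.Pattern:
    """Replace each ${name} placeholder with a named capturing group."""
    regex = _PLACEHOLDER.sub(lambda m: f"(?P<{m.group(1)}>.+?)", pattern)
    # Anchor the expression so that the whole phrase has to match
    return re.compile(rf"^\s*{regex}\s*$", re.IGNORECASE)


def extract_args(pattern: str, phrase: str) -> Optional[Dict[str, str]]:
    """Return the named arguments captured from the phrase, or None if it does not match."""
    match = compile_phrase(pattern).match(phrase)
    return match.groupdict() if match else None


args = extract_args("set the music volume (to|on|at) ${volume}", "Set the music volume to 20")
volume = int(args["volume"]) if args else None
```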
AI Commands
If the openai plugin is enabled, you can use it to help you answer questions.
There are two generic use-cases for voice assistants where an AI plugin is beneficial:
- Speech to Intent
- Response fallback
Speech to Intent
You may want this for general questions, for commands that do not fit a neat regular expression, or for transforming a raw sentence such as:
make it a bit darker and reduce the music volume
into a structured action plan like:
[
  {
    "action": "light.hue.set_lights",
    "args": {
      "bri": 50
    }
  },
  {
    "action": "music.mpd.set_volume",
    "args": {
      "volume": 20
    }
  }
]
An example provided in the assistant sample is that of weather forecasting.
Note in particular the usage of openai.get_response with a well crafted system prompt that turns a natural language request like:
What's the weather tomorrow in San Francisco?
Into:
{
  "type": "weather",
  "delta_days": 1,
  "location": "San Francisco"
}
def parse_weather_request(request: str) -> WeatherRequest | None:
    # WeatherRequest and default_location are defined elsewhere in the sample.
    # This excerpt treats the response as an already decoded dict; if the
    # plugin returns a JSON string, decode it with json.loads first.
    request_dict = (
        run(
            "openai.get_response",
            context=[
                {
                    "role": "system",
                    "content": (
                        "You are a voice assistant provided with weather requests as free text.\n"
                        "Given the prompt, return a structured JSON representation of the request in the following format: "
                        '{ "type": "weather", "delta_days": 1, "location": "San Francisco" }, '
                        'where both delta_days and location are optional (e.g. if the user simply asks "How\'s the weather?").\n'
                        'If the prompt doesn\'t seem to contain a weather request, return { "type": null }'
                    ),
                }
            ],
            prompt=request,
        )
        or {}
    )

    if request_dict.get("type") != "weather":
        return None

    weather_request = WeatherRequest(
        location=request_dict.get("location", default_location),
        delta_days=request_dict.get("delta_days", 0),
    )
    return weather_request
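The snippet above refers to a `WeatherRequest` type and a `default_location` value that live elsewhere in the sample. The following sketch shows one plausible shape for them, purely as an assumption for illustration: the field names follow the JSON format from the prompt, while the `target_date` helper, the `weather_request_from_dict` function and the "Amsterdam" default are invented here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed default, used when the user does not name a location
default_location = "Amsterdam"


@dataclass
class WeatherRequest:
    location: str = default_location
    delta_days: int = 0

    def target_date(self, today: Optional[date] = None) -> date:
        """Day the forecast refers to, counted from today."""
        today = today or date.today()
        return today + timedelta(days=self.delta_days)


def weather_request_from_dict(request_dict: dict) -> Optional[WeatherRequest]:
    """Build a WeatherRequest from the model's JSON, or None if it is not a weather request."""
    if request_dict.get("type") != "weather":
        return None
    return WeatherRequest(
        # "or" also covers explicit nulls, which dict.get(key, default) would pass through
        location=request_dict.get("location") or default_location,
        delta_days=int(request_dict.get("delta_days") or 0),
    )
```

Using `or` rather than the default argument of `dict.get` is a deliberate choice in this sketch: a model that emits `"location": null` would otherwise hand `None` to the weather plugin.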
You can also use the model for intermediate transformation instead of direct answers. For example, ask it to return a tiny JSON object with action and args, then dispatch only actions you explicitly allow:
ALLOWED_ACTIONS = {
    "lights.on": "light.hue.on",
    "lights.off": "light.hue.off",
    "music.play": "music.mpd.play",
    "music.stop": "music.mpd.stop",
}


@when(SpeechRecognizedEvent)
def on_fuzzy_command(event):
    plan = run(
        "openai.get_response",
        prompt=event.phrase,
        context=[
            {
                "role": "system",
                "content": (
                    "Map the user command to JSON only: "
                    '{"action": "...", "args": {...}}. '
                    f"Allowed actions: {', '.join(ALLOWED_ACTIONS)}. "
                    "If none match, return {\"action\": null, \"args\": {}}."
                ),
            }
        ],
    )
    # Parse `plan` as JSON here, validate it, then run only an allow-listed action.
That last validation step matters. A model may be useful for interpretation, but it should not get arbitrary access to run().
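One way to implement that validation step is sketched below. It reuses the allow-list idea from the `ALLOWED_ACTIONS` mapping above; `resolve_plan` is a hypothetical helper, and the assumption is that the model's reply arrives as a JSON string. Anything that fails to parse, or that names an action outside the allow-list, is dropped.

```python
import json
from typing import Optional, Tuple

ALLOWED_ACTIONS = {
    "lights.on": "light.hue.on",
    "lights.off": "light.hue.off",
    "music.play": "music.mpd.play",
    "music.stop": "music.mpd.stop",
}


def resolve_plan(raw_plan, allowed=ALLOWED_ACTIONS) -> Optional[Tuple[str, dict]]:
    """Return (plugin_action, args) for an allow-listed plan, or None otherwise."""
    try:
        plan = json.loads(raw_plan)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON
        return None

    if not isinstance(plan, dict):
        return None

    name = plan.get("action")
    args = plan.get("args") or {}
    if not isinstance(name, str) or name not in allowed:
        return None
    if not isinstance(args, dict):
        return None

    return allowed[name], args


# Inside the hook, only a resolved plan would reach run():
#     resolved = resolve_plan(plan)
#     if resolved:
#         action, args = resolved
#         run(action, **args)
```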
Response fallback
If a request doesn't match any of the commands you have defined, you can use a generic SpeechRecognizedEvent hook to forward the request to an AI plugin, and render the response as speech through the text-to-speech plugin.
import logging

from platypush import run, when
from platypush.events.assistant import SpeechRecognizedEvent

logger = logging.getLogger(__name__)
ai_plugin = "openai"
assistant_plugin = "assistant.vosk"


@when(SpeechRecognizedEvent, plugin=assistant_plugin)
def on_speech_recognized(event: SpeechRecognizedEvent):
    """
    Generic handler for speech recognition events received
    by the configured assistant plugin.
    """
    logger.info("Recognized speech: %s", event.phrase)

    # Forward the request to OpenAI and render the response as speech
    response = run(
        f"{ai_plugin}.get_response",
        prompt=event.phrase,
        context=[
            {
                "role": "system",
                "content": (
                    "You are a voice assistant that can answer questions and perform actions. "
                    "Keep in mind that prompts are transcriptions of user speech and they may "
                    "contain misspellings or errors. Try and interpret them as best as possible. "
                    "When possible, keep your answers short and concise."
                ),
            }
        ],
    )

    # If the response is not empty, render it using the TTS plugin
    if response:
        event.assistant.render_response(response)
When a response from the LLM ends with a question mark, the assistant will automatically listen for a follow-up command and fire a new SpeechRecognizedEvent.
Pausing music while listening
One nice touch is to pause the music when a conversation starts and resume it after the assistant is done.
import logging

from platypush import run, when
from platypush.events.assistant import (
    ConversationEndEvent,
    ConversationStartEvent,
)

logger = logging.getLogger(__name__)


@when(ConversationStartEvent)
def on_conversation_start():
    # Cancel any pending resume left over from a previous conversation
    try:
        run("utils.clear_timeout", name="ConversationEndTimeout")
    except Exception as e:
        logger.error("Error clearing conversation end timeout: %s", e)

    run("music.mpd.pause_if_playing")


@when(ConversationEndEvent)
def on_conversation_end():
    # Resume the music a few seconds after the conversation is over
    run(
        "utils.set_timeout",
        name="ConversationEndTimeout",
        seconds=5,
        actions=[{"action": "music.mpd.play_if_paused"}],
    )
That makes the interaction feel much less clumsy: wake word, music ducks or pauses, command is recognized, answer is spoken, music resumes a few seconds later.
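The reason for clearing the timeout when a conversation starts is easier to see in isolation. The sketch below models the same logic with a hypothetical `ResumeScheduler` class and explicit timestamps instead of real timers: a follow-up conversation that starts within the grace period cancels the pending resume, so the music does not come back in the middle of the next command.

```python
from typing import Optional


class ResumeScheduler:
    """Hypothetical model of the pause/resume logic, driven by explicit timestamps."""

    def __init__(self, delay: float = 5.0):
        self.delay = delay
        self._deadline: Optional[float] = None

    def conversation_started(self) -> None:
        # Equivalent of clearing the timeout: drop any pending resume
        self._deadline = None

    def conversation_ended(self, now: float) -> None:
        # Equivalent of setting the timeout: resume after the grace period
        self._deadline = now + self.delay

    def should_resume(self, now: float) -> bool:
        # True exactly once, when the grace period has elapsed undisturbed
        if self._deadline is not None and now >= self._deadline:
            self._deadline = None
            return True
        return False
```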
Going fully local
With the configuration above, hotword detection, speech-to-text, automation and text-to-speech are already local. The only non-local component is the openai plugin, if it points to OpenAI's servers.
To make the last step local too, run a model server that exposes an OpenAI-compatible API. Ollama, llama.cpp server, vLLM and LocalAI can all expose some version of /v1/chat/completions.
For example, with Ollama:
ollama pull llama3.1:8b
ollama serve
The OpenAI-compatible endpoint is then usually available at:
http://127.0.0.1:11434/v1/chat/completions
If your Platypush openai plugin version supports a custom API base URL, the configuration is the whole change:
openai:
    model: llama3.1:8b
    base_url: http://127.0.0.1:11434/v1
If it does not, keep the rest of the assistant exactly the same and replace only the fallback action with a tiny local request:
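A minimal sketch of such a request, assuming the Ollama endpoint and model name shown above and using only the standard library. The helper names `build_chat_request`, `extract_reply` and `ask_local_model` are made up for this example, and the payload and response shapes follow the OpenAI chat completions format that compatible servers mirror.

```python
import json
import urllib.request

# Endpoint from the Ollama example above; adjust it for other servers
LOCAL_API_URL = "http://127.0.0.1:11434/v1/chat/completions"


def build_chat_request(prompt, system_prompt, model="llama3.1:8b", url=LOCAL_API_URL):
    """Build a chat completion request in the OpenAI-compatible format."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Return the text of the first choice, or an empty string if there is none."""
    choices = response_body.get("choices") or []
    if not choices:
        return ""
    return (choices[0].get("message", {}).get("content") or "").strip()


def ask_local_model(prompt, system_prompt="Keep your answers short and concise.", timeout=60):
    """Send the prompt to the local server (requires the server to be running)."""
    request = build_chat_request(prompt, system_prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.loads(response.read().decode("utf-8")))
```

In the fallback hook, `response = ask_local_model(event.phrase)` would then take the place of the `run(f"{ai_plugin}.get_response", ...)` call, with the rest of the hook unchanged.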
That is enough to turn the assistant into a fully local stack.
On a Raspberry Pi, I would still keep expectations realistic. Hotword detection, Vosk and Piper are fine on small machines. Local LLMs are the heavy piece. A Pi 5 with enough RAM can run small quantized models, but latency will not feel like a cloud model or a GPU-backed workstation. For many home automation workflows, that is acceptable because the LLM is only the fallback; the frequent commands stay deterministic.
Why this architecture ages well
Voice assistants have been a graveyard of abandoned SDKs and cloud products. Snowboy is gone. Mycroft is gone. The old Google Assistant SDK is deprecated. Vendor assistants are increasingly shaped around vendor ecosystems rather than user-controlled automation.
The safer long-term bet is not one monolithic assistant. It is a pipeline of small replaceable parts:
- Swap the hotword model without touching the automation logic.
- Swap Vosk for another STT engine without touching Hue or MPD.
- Swap OpenAI for a local OpenAI-compatible model without touching the wake word, TTS or command hooks.
- Swap Piper voices without touching the assistant flow.
Platypush is a good fit for this because its event system is already the boundary between perception and action. Speech recognition emits an event. Hooks decide what to do. Plugins execute the actions.
That separation is what makes the assistant inspectable. It is also what makes it possible to keep most of it on a Raspberry Pi in your house, instead of outsourcing the entire audio loop to a cloud service that may disappear, get worse, or decide one day that your use case is no longer part of the roadmap.
Final notes
The minimal version of this setup is small:
- assistant.openwakeword for the always-on wake word.
- assistant.vosk for local command transcription.
- A few @when(SpeechRecognizedEvent, phrase=...) hooks for deterministic commands.
- light.hue, music.mpd or any other Platypush plugin for actions.
- tts.piper for local spoken responses.
- openai.get_response only where language understanding is worth the cost.
Start with the deterministic commands. Add the model fallback later. That way the assistant stays fast for the commands you use every day, while still being flexible enough to answer questions or interpret messy speech when you need it.