Technical Design & How Tos

coffee in, code out - week 6 & 7 - the midway point
Mar 10, 2025
So much code.
For two weeks I've been actually coding this out and figuring out the best plan of attack. I went through the initial exercise of building a functional multi-sensory AI with no show flow. It's lightweight and can be improved in both fine-tuning and inference speed, but it's a great start. These are the steps to make that happen:
Choose models. This is a big step: going through Hugging Face, trying to find the right model for your use case. Since I am running inference locally, I decided to keep everything lightweight for now (a rough sketch of loading this stack follows the list below).
Llama 3.1 1B: tiny, but it's super impressive. Its small parameter count makes the text feel a bit cryptic, which is sort of intended. I will try to work with distilled models next.
OpenCV for vision
Vosk for STT
Coqui XTTS voice cloning for TTS
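Roughly, loading that stack on one machine looks something like this (the model names and file paths are placeholders, not necessarily the exact ones I'm running):

```python
import cv2                           # vision
from vosk import Model as VoskModel  # speech-to-text
from transformers import pipeline    # local LLM inference
from TTS.api import TTS              # Coqui XTTS for voice cloning / TTS

# Placeholder paths/IDs -- swap in whatever you pulled from Hugging Face.
llm = pipeline("text-generation", model="path/to/local-llama")
stt = VoskModel("models/vosk-model-small-en-us-0.15")
camera = cv2.VideoCapture(0)  # webcam feed for the vision side
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
```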
I had to train MAIA on a dataset of statements/questions and desired responses, steeped in her character, the story, and some other elements.
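The dataset itself is just pairs. Something like this (the entries here are placeholders showing the shape, not lines from the real dataset):

```python
import json

# Hypothetical shape of the fine-tuning data: statement/question in, desired MAIA response out.
pairs = [
    {"prompt": "Who are you?", "response": "<an answer in MAIA's voice, grounded in the story>"},
    {"prompt": "What is this place?", "response": "<another in-character, in-world response>"},
]

with open("maia_finetune.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```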
Right now I am doing one-shot voice cloning; this can be improved.
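One-shot here just means a single reference clip conditions the voice. With Coqui XTTS that is roughly (the file paths and the line of text are placeholders):

```python
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# One-shot cloning: a single reference recording of the target voice.
tts.tts_to_file(
    text="Placeholder line for MAIA to speak.",
    speaker_wav="voice_ref/maia_reference.wav",  # the one reference clip
    language="en",
    file_path="out/maia_line.wav",
)
```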
When you hit start, the app handles the whole flow, and out pops an audio WAV file of an LLM response that accounts for the user's emotion, position in the room, the story, and the character. Not bad!
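Under the hood, most of that "accounting for" is just folding the sensor outputs into the prompt before generation. A simplified version of that glue (the function, field names, and prompt wording are my placeholders, not the app's actual API):

```python
from transformers import pipeline

llm = pipeline("text-generation", model="path/to/local-llama")  # placeholder model path

def maia_reply(transcript: str, emotion: str, position: str) -> str:
    """Turn one user utterance plus vision-derived context into MAIA's next line."""
    prompt = (
        "You are MAIA, the host of this experience.\n"
        f"The guest seems {emotion} and is standing {position}.\n"
        f"Guest: {transcript}\n"
        "MAIA:"
    )
    out = llm(prompt, max_new_tokens=120)[0]["generated_text"]
    reply = out[len(prompt):].strip()
    return reply  # this line then goes through the XTTS call above to become the output WAV
```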
SHOW RUNNER
I have decided the experience is five different scenes or phases. Each phase takes the participant deeper into the world, makes it more personal, and moves closer toward self-awareness. (A bare-bones runner for stepping through these is sketched after the list.)
Phases
Tablet
Introductions
Lore
Assignment
Departure
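A bare-bones sketch of stepping through those phases, with an operator confirming each advance (the real runner would also fire lights, media, and inference at each step):

```python
PHASES = ["Tablet", "Introductions", "Lore", "Assignment", "Departure"]

def run_show() -> None:
    """Walk the participant through the five phases, holding at each one for the operator."""
    for phase in PHASES:
        print(f"--- Phase: {phase} ---")
        # Trigger this phase's lights, media, and AI inference here.
        input("Operator: press Enter to advance (hold point)... ")
    print("Show complete.")

if __name__ == "__main__":
    run_show()
```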
I realized I need a UI to run the show: something that is lightweight, but also displays what a user is going through in the process. I have constructed a rudimentary UI to control the flow of these phases that can talk with MAX MSP. I am still unsure which direction I should take with this, but I like creating my own frontend to control all this backend inference. I'm imagining something similar to a light board, but when you hit go, not only does it turn on the lights, it runs AI inference too.
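One way to wire the frontend to MAX MSP is OSC over UDP; that's an assumption on my part, and the port and addresses below are placeholders:

```python
from pythonosc.udp_client import SimpleUDPClient

# Hypothetical bridge: the show-control UI cues Max MSP over OSC, then kicks off inference.
client = SimpleUDPClient("127.0.0.1", 7400)  # Max patch listening on this port (placeholder)

def fire_cue(phase: str) -> None:
    """Send a 'go' for the given phase to Max MSP, then start the AI flow for it."""
    client.send_message("/maia/phase", phase)  # placeholder OSC addresses
    client.send_message("/maia/go", 1)
    # ...start the STT/LLM/TTS flow for this phase here...

fire_cue("Introductions")
```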

This is not for general use, but only for show running. I learned from diving deeper into theme park automation that there are more humans in the loop than you know. Lots of stop points for guest safety.
HOW TOs
May 15, 2025
This will be a running how-to list of all the elements that I am touching on in this experience. Come back for more updates.
Choosing an LLM
Finetuning Models
Giving AI some Character
Beyond LLM (STT, CV, and TTS)
Human Centered AI
Hyper Personalization
Ethical AI for Performance Art
LEDs are fun
Edge AI
Using Blender
GenAI Needs After Effects
Sound Engineering
Coding for AI Experiences
Phil Olarte MAIA Project © 2025