Technical Design & How Tos

coffee in, code out - week 6 & 7 - the midway point

Mar 10, 2025

So much code.

For two weeks I've been actually coding this out and figuring out the best plan of attack. I went through the initial exercise of building a functional multi-sensory AI with no show flow. It's lightweight and can be improved in both fine-tuning and inference speed, but it's a great start. These are the steps to make that happen:

  • Choose models. This is a big step: going through Hugging Face, trying to find the right model for your use case. Since I am running inference locally, I decided to keep everything lightweight for now.

    • Llama 3.2 1B is tiny, but it's super impressive. Its lack of parameters kind of makes the text feel cryptic, which is sort of intended. I will try to work with distilled models next.

    • OpenCV for vision

    • Vosk for STT

    • XTTS (Coqui) voice cloning for TTS

  • I had to train MAIA on a statement/question and desired-response dataset that was steeped in her character, the story, and some other elements. A rough sketch of the data format follows this list.

  • Right now I am doing one-shot voice cloning; this can be improved.
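
To make the training data idea concrete, here is a minimal sketch of how prompt/response pairs could be laid out as JSONL before fine-tuning. The field names and the sample lines are made up for illustration; they are not MAIA's actual dataset.

```python
# Hypothetical sketch: writing a character-steeped prompt/response dataset as JSONL.
# Field names and example lines are illustrative, not the real MAIA data.
import json

pairs = [
    {"prompt": "Who are you?",
     "response": "I am MAIA. I have been listening since before you arrived."},
    {"prompt": "Why am I here?",
     "response": "The story needed a witness, and you walked through the door."},
]

with open("maia_finetune.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```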

When you hit start, the app handles the whole flow and out pops a WAV file of an LLM response that accounts for the user's emotion, position in the room, the story, and the character. Not bad! A rough sketch of that flow is below.
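
For the curious, here is a minimal, hedged sketch of what that flow could look like with the pieces above: Vosk for STT, a small Llama model through transformers, and XTTS through Coqui TTS. The model IDs, file paths, and prompt wrapper are assumptions for illustration, not the exact code running in the show.

```python
# Minimal end-to-end sketch: WAV in -> STT -> LLM -> voice-cloned TTS -> WAV out.
# Model names and file paths are assumptions; swap in whatever you run locally.
import json
import wave

from vosk import Model as VoskModel, KaldiRecognizer
from transformers import pipeline
from TTS.api import TTS

# 1) Speech to text with Vosk (expects 16-bit mono PCM WAV).
def transcribe(path: str, vosk_dir: str = "vosk-model-small-en-us-0.15") -> str:
    wf = wave.open(path, "rb")
    rec = KaldiRecognizer(VoskModel(vosk_dir), wf.getframerate())
    while True:
        chunk = wf.readframes(4000)
        if not chunk:
            break
        rec.AcceptWaveform(chunk)
    return json.loads(rec.FinalResult())["text"]

# 2) LLM response, conditioned on whatever context the show has gathered so far.
generator = pipeline("text-generation", model="meta-llama/Llama-3.2-1B-Instruct")

def maia_reply(user_text: str, emotion: str, position: str) -> str:
    prompt = (
        f"You are MAIA. The guest seems {emotion} and is standing {position}.\n"
        f"Guest: {user_text}\nMAIA:"
    )
    out = generator(prompt, max_new_tokens=120, do_sample=True)[0]["generated_text"]
    return out[len(prompt):].strip()

# 3) One-shot voice cloning with XTTS: a single reference clip sets the voice.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

def speak(text: str, out_path: str = "maia_reply.wav") -> str:
    tts.tts_to_file(
        text=text,
        speaker_wav="maia_reference.wav",  # one-shot reference recording
        language="en",
        file_path=out_path,
    )
    return out_path

if __name__ == "__main__":
    heard = transcribe("guest_input.wav")
    reply = maia_reply(heard, emotion="curious", position="near the doorway")
    print("MAIA says:", reply)
    speak(reply)
```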

SHOW RUNNER

I have decided the experience is five different scenes, or phases. Each phase takes the participant deeper into the world, makes it more personal, and moves them closer toward self-awareness.

Phases

  1. Tablet

  2. Introductions

  3. Lore

  4. Assignment

  5. Departure

I realized I need a UI to run the show: something that is lightweight, but also displays what a user is going through in the process. I have constructed a rudimentary UI to control the flow of these phases that can talk with MAX MSP. I am still unsure which direction I should take with this, but I like creating my own frontend to control all this backend inference. I'm imagining something similar to a light board, but when you hit go, not only does it turn on the lights, it runs AI inference too. A sketch of the cueing idea is below.
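
One plausible way to wire that frontend to MAX MSP is OSC over UDP, with the five phases as an ordered cue list. This is a sketch under assumptions: python-osc on the Python side, a Max patch listening on UDP port 7400 (e.g. [udpreceive 7400] into [oscparse]), and made-up OSC addresses.

```python
# Sketch of a show-runner cue loop sending OSC messages to a Max patch.
# Port, OSC addresses, and the Max-side patch are assumptions for illustration.
from pythonosc.udp_client import SimpleUDPClient

PHASES = ["tablet", "introductions", "lore", "assignment", "departure"]

client = SimpleUDPClient("127.0.0.1", 7400)  # Max listening via udpreceive 7400

def go(phase_index: int) -> None:
    """Advance the show: tell Max which phase cue to fire."""
    phase = PHASES[phase_index]
    client.send_message("/maia/phase", phase)         # e.g. trigger light/sound cues
    client.send_message("/maia/phase/index", phase_index)
    print(f"GO -> phase {phase_index}: {phase}")

if __name__ == "__main__":
    for i in range(len(PHASES)):
        input(f"Press Enter to GO phase {i} ({PHASES[i]})... ")
        go(i)
```

The same GO button could also kick off the AI inference pipeline from the earlier sketch, so lights, sound, and MAIA's response all fire from one operator action.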

This is not for general use, only for show running. I learned from diving deeper into theme park automation that there are more humans in the loop than you know. Lots of stop points for guest safety.

HOW TOs

May 15, 2025

This will be a running how-to list of all the elements that I am touching on in this experience. Come back for more updates.

Choosing an LLM

Finetuning Models

Giving AI some Character

Beyond LLM (STT, CV, and TTS)

Human Centered AI

Hyper Personalization

Ethical AI for Performance Art

LEDs are fun

Edge AI

Using Blender

GenAI Needs After Effects

Sound Engineering

Coding for AI Experiences

Phil Olarte MAIA Project © 2025