Z-Tech - March Code Jam

Project 2: Virtual Pet

Main Code Jam Blog Post: March Code Jam

Project Length: 1 Day

GitHub Repo: https://github.com/mjzitek/virtual-pet

Project Description

An AI-powered virtual pet application that allows users to interact with a digital pet through guided choices.

Hour 1: Planning and First Steps

I again turn to ChatGPT to get started. With some projects I have a general idea of what I want to do but not all the details, so I'll use an LLM to help me figure them out.

Help me create a virtual pet.  I'm not sure about the features I want for it.  I want to do it in a web interface, so probably something like streamlit for ease.  Help me think about the features and interface.  Make sure it has some LLM component to it.

After going back and forth with ChatGPT for a short bit, I had a simple Python script I could use.

Next I created a new project folder and opened it in Cursor. Normally for projects with Cursor I create a "project-doc.md" from what was discussed in ChatGPT and go from there, but for this one I wanted to start with the simple script from ChatGPT. I still set up some of my standard things in Cursor, like the project-doc.md and my Cursor rules. After that, I used Cursor to fix a couple of small issues just to save myself a minute. One of those issues was simply where it was trying to pull the images from, since the folder structure I set up was different from what ChatGPT assumed.

And here we are:

Right now I can only click on the three buttons and it changes the image and stats, but it's a start.

Hour 2 and 3

At this point I want to start implementing the GenAI features. I'm going to use Cursor to generate a lot of the "infrastructure" of the app, but I want to do as much of the GenAI "logic" myself.

Even with me telling it not to create prompts it still does:

Oh well...I'll just roll with it for now. (Although it did include the bit about it being a "placeholder" so you win Cursor...I guess.)

Initially I was going to have different buttons for the user to pick (Feed, Play, Rest), but I like the idea of more of a "choose your own adventure" type of interaction. Since that's what the prompt Cursor created already does, I continued on that path.

Aaaanddd...I continue to fight with Cursor

As of right now I have:

User can pick a pet and a name
The app generates a base story for the pet
A basic event generator that generates the next part of the story and a couple options for the user to pick
No memory system, but the events history is being sent to the LLM when events are generated.
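To illustrate the last point, here is a minimal sketch of how the event history might be folded into the prompt for the next event. The function name, the `description`/`choice` keys, and the prompt wording are all hypothetical and not taken from the repo.

```python
from typing import Dict, List


def build_event_prompt(
    pet_name: str,
    pet_type: str,
    base_story: str,
    event_history: List[Dict[str, str]],
    num_choices: int = 2,
    max_history: int = 10,
) -> str:
    """Assemble the text prompt for the next story event (hypothetical helper)."""
    # Keep only the most recent events so the prompt does not grow forever.
    recent = event_history[-max_history:]
    if recent:
        history_text = "\n".join(
            f"{i}. {event['description']} (player chose: {event['choice']})"
            for i, event in enumerate(recent, start=1)
        )
    else:
        history_text = "(no events yet)"

    return (
        f"You are narrating a story about {pet_name} the {pet_type}.\n\n"
        f"BASE STORY:\n{base_story}\n\n"
        f"EVENTS SO FAR:\n{history_text}\n\n"
        f"Write the next short story event, then offer {num_choices} choices "
        "for what the player does next. Do not repeat earlier events."
    )
```

This is not a real memory system: the history is simply replayed to the LLM on every call, which is why it is capped at the most recent entries.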

At the end of hour three I have a basic working Virtual Pet story generator.

I've got some ideas to make this more fun which I will get into in the next hour.

Hour 4: Pictures

I was thinking it would be fun to add dynamic images based on what's going on in the story. For simplicity I'm just going to use DALL-E 3 through the OpenAI API. To try to keep the images a little consistent, I fed one of my premade images (created in Midjourney) to ChatGPT and asked it to describe it. I then used that description as part of the prompt.

# Customize the image description based on pet type
pet_description = ""
if pet_type.lower() == "cat":
    pet_description = f"a cute, fluffy cat with big, expressive eyes and a {pet_state} expression"
elif pet_type.lower() == "dog":
    pet_description = f"a playful, friendly dog with a wagging tail and a {pet_state} expression"
elif pet_type.lower() == "rabbit":
    pet_description = f"a small, furry rabbit with long ears and a {pet_state} expression"
else:
    pet_description = f"a cute, animated {pet_type} with a {pet_state} expression"

prompt = f"""Generate an image of {pet_name} the {pet_type} based on the following story.

OVERALL IMAGE DESCRIPTION:
The image is a vibrant, animated illustration of {pet_description}. 
The scene should be colorful and engaging, with a light background that enhances the brightness of the overall image.
The pet should be the main focus of the image, with its body language and facial expression conveying its current mood.

STORY CONTEXT:
{event_description}

Make sure the image reflects the story context and the pet's current mood ({pet_state}).
"""

I'm going to have a problem with image consistency as far as the look of the pet goes, but that's something I'll have to live with for now. I'll add a few more details to the overall image description to try to help though.
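For reference, sending an image prompt to DALL-E 3 could look roughly like the sketch below. It assumes the official `openai` Python package (v1 or later), where `client` would be an `OpenAI()` instance. The helper name `generate_pet_image` is hypothetical and not taken from the repo.

```python
from typing import Any


def generate_pet_image(client: Any, prompt: str, size: str = "1024x1024") -> str:
    """Request one image for the prompt and return its URL (hypothetical helper).

    `client` is assumed to be an `openai.OpenAI()` instance; it is passed in
    so the function can be exercised with a stand-in object.
    """
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size=size,
        n=1,  # DALL-E 3 returns a single image per request
    )
    return response.data[0].url
```

In real use the client would be created with `from openai import OpenAI` and `client = OpenAI()`, with the `OPENAI_API_KEY` environment variable set.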

I added a more detailed base description of the cat but I went too far...wow...that is just frightening. Let me tweak it a bit.

Well that's better!

One issue I've noticed is that the stories all seem to start around a similar theme, and often the first action is chasing a butterfly, so I decided to create some predefined story starters that will be randomly selected at startup.

STORY_STARTERS = [
    {"location": "a cozy apartment", "scenario": "Your pet just woke up and is looking for something fun to do."},
    {"location": "a sunny park", "scenario": "Your pet is outside enjoying the fresh air but seems curious about something."},
    {"location": "a mysterious alleyway", "scenario": "Your pet has wandered into an unfamiliar place. What will happen next?"},
    {"location": "a bustling city street", "scenario": "Your pet is watching the world go by. Something interesting catches its eye."},
    {"location": "a quiet library", "scenario": "Your pet is surrounded by books. Maybe it's looking for a story?"},
    {"location": "a beach at sunset", "scenario": "Your pet is watching the waves crash. It seems to be thinking about something."},
    {"location": "a dense forest", "scenario": "Your pet is exploring the woods, sniffing out new discoveries."},
    {"location": "a rooftop garden", "scenario": "Your pet finds itself among the plants and flowers, enjoying the view of the city below."},
    {"location": "a futuristic space station", "scenario": "Your pet is floating in zero gravity, playfully pawing at the air."},
    {"location": "a pirate ship", "scenario": "Your pet is aboard a ship, watching seagulls circle above. Is it ready for an adventure?"},
    {"location": "a hidden underground bunker", "scenario": "Your pet has discovered a secret hideout. Who built this place?"},
    {"location": "a snow-covered village", "scenario": "Your pet's paws leave prints in the fresh snow as it explores the wintery wonderland."}
]
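Picking one of these at startup can be as small as a `random.choice` call. The sketch below uses a shortened copy of the list so it runs on its own. The helper names `pick_story_starter` and `format_starter` are made up for illustration.

```python
import random
from typing import Dict, List, Optional

# Shortened copy of the starter list so the example is self-contained.
STORY_STARTERS: List[Dict[str, str]] = [
    {"location": "a cozy apartment", "scenario": "Your pet just woke up and is looking for something fun to do."},
    {"location": "a sunny park", "scenario": "Your pet is outside enjoying the fresh air but seems curious about something."},
]


def pick_story_starter(
    starters: List[Dict[str, str]], rng: Optional[random.Random] = None
) -> Dict[str, str]:
    """Return one starter at random; pass a seeded `rng` for repeatable runs."""
    chooser = rng if rng is not None else random
    return chooser.choice(starters)


def format_starter(starter: Dict[str, str], pet_name: str) -> str:
    """Turn a starter into an opening line that can be placed in the story prompt."""
    return f"{pet_name} is in {starter['location']}. {starter['scenario']}"


starter = pick_story_starter(STORY_STARTERS)
print(format_starter(starter, "Milo"))
```

In a Streamlit app the chosen starter would likely need to be kept in `st.session_state`, since the script reruns on every interaction and would otherwise pick a new one each time.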

Hour 5: Kid mode

While you might think this is already designed for kids, and it partially is, I wanted a way to make the stories simpler for young kids like my son while still keeping the current level of story generation. I added a "simple" mode that tells the LLM to generate text at a 5-year-old level (an "explain it like I'm five" mode).

So now on the story start page, there is an option to pick "Young Reader Mode". This will generate story events and choices that are better for a young child.
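A mode like this can come down to swapping one block of instructions in the prompt. Here is a minimal sketch. The function names and the exact instruction wording are assumptions for illustration, not the prompts the app actually uses.

```python
def reading_level_instructions(young_reader_mode: bool) -> str:
    """Return the style instructions for the selected mode (wording is illustrative)."""
    if young_reader_mode:
        return (
            "Write for a 5-year-old: use short sentences, simple everyday words, "
            "and keep each choice to a few words."
        )
    return "Write in a playful, descriptive style suitable for general readers."


def build_system_prompt(young_reader_mode: bool) -> str:
    """Combine the fixed narrator role with the mode-specific instructions."""
    return (
        "You are the narrator of a virtual pet story.\n"
        + reading_level_instructions(young_reader_mode)
    )


print(build_system_prompt(young_reader_mode=True))
```

Keeping the mode-specific text in one function means both the story events and the choices pick up the same reading level from a single flag.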

Hour 6 and 7: Prompt Tweaking and Voice Mode

After I got the Young Reader Mode working I spent some time working on the prompts to keep the storyline going. I'm not sure what happened, but the storyline seems to be pretty repetitive now despite the instructions I put in the prompt to try to prevent it.

Listen to the audio clip here

After I got the audio working I wanted to work on the display of the page. I didn't like how elements were showing up to the user while it was generating new story segments. I wanted more traditional "loading" type elements. I also wanted to make the different generated pieces more async so it wouldn't take as long to load everything and make the user just sit and wait.

Since I'm using Streamlit and not a full custom web app, I may have to just live with some of the interface issues for now.
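One way to overlap the slow pieces is to run them in a thread pool and collect the results before rendering. The sketch below is a generic pattern, not what the app does. `generate_image` and `generate_audio` are stand-ins that sleep instead of calling an API, and the workers only return data rather than touching any Streamlit elements.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


def generate_image(event_text: str) -> str:
    """Stand-in for a slow image API call."""
    time.sleep(0.05)
    return f"image for: {event_text}"


def generate_audio(event_text: str) -> str:
    """Stand-in for a slow text-to-speech API call."""
    time.sleep(0.05)
    return f"audio for: {event_text}"


def generate_assets(event_text: str) -> Dict[str, str]:
    """Run both generators concurrently and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        image_future = pool.submit(generate_image, event_text)
        audio_future = pool.submit(generate_audio, event_text)
        # .result() blocks until each task finishes and re-raises any exception.
        return {"image": image_future.result(), "audio": audio_future.result()}


assets = generate_assets("Milo finds a box")
print(assets["image"], "|", assets["audio"])
```

With this shape the total wait is roughly the slowest call rather than the sum of both, and the page can show a single loading indicator while `generate_assets` runs.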

Hour 8

This hour is all about fighting with Cursor over changes...partially kidding. I gave up on the more async changes I was trying to make and went back to the simple Play Story Audio button.

Despite the instructions in the Cursor rules, it still likes to do things I didn't ask for...oh well.

With the last 30 minutes upon me, I think I will focus on cleaning up some UI issues. There is so much more I could do with it...and I think I will improve on it later. For now I'm excited to see what my little one thinks of it!

Conclusion

This was a fun project that I plan to continue to develop and use with my kid. The original idea changed a bit but I like how it came out. In some ways this was similar to project 1 with its story generation, but I also added some fresh components like the image generation and text-to-speech.