Microsoft kicked things up a notch on October 9, 2024, with a fresh update for Copilot: their AI assistant can now see and speak, making it feel like a friend who’s always ready to lend a hand or an eye. This isn’t just about typing prompts anymore; Copilot can narrate your ideas out loud and analyze images you throw its way. I couldn’t wait to mess around with it, so I came up with a couple of playful challenges to see what this AI can really do. Let’s just say it was a mix of “wow” and “well, that’s… interesting.”
My Copilot Adventure: A Tea Ceremony and a Gym Bag Search
I started with the “talk to it” voice feature, the kind of AI voice mode that has become all but standard on flagship smartphones. I wanted to test its storytelling chops with something serene yet specific, so I asked Copilot to narrate a Japanese tea ceremony in Kyoto, as if we were sitting together in a traditional tatami room.
My prompt was: “Describe a tea ceremony in Kyoto like you’re there with me, and use your voice to set the mood.”
Copilot’s voice came through, calm and soothing, like a meditation guide who’s really into matcha. “We’re kneeling on the tatami mats,” it began. “The host whisks the matcha with a bamboo chasen, and the sound is rhythmic, almost hypnotic. Sunlight filters through the shoji screens, and there’s a faint scent of cedar in the air.” I was transported, until it added, “Oh, and there’s a plate of chocolate chip cookies on the table.”
Cookies? At a tea ceremony? I chuckled. It clearly meant wagashi (traditional Japanese sweets), but the mix-up was adorable.

OK, let’s move on to the “sees” half of Copilot’s “speaks and sees” upgrade. I dumped out my gym bag, leaving a chaotic pile of sneakers, a towel, a protein bar wrapper, and a single blue sock, and hid my red sweatband somewhere in the mess. I took a photo and asked, “Can you spot my red sweatband in this picture?” Copilot took a quick look and said, “Found it! Your red sweatband is near the bottom, tangled up with the towel.” It was spot-on, and it only took about 8 seconds. But then it added, “Is that a tiny rubber duck next to the sneakers?” Nope, that was just a crumpled-up wrapper. Still, I was impressed by how fast it zeroed in on the sweatband.
How These Features Come to Life
The voice and vision features are super easy to use. For the voice option, you type your prompt and hit the “Speak” button. Copilot then reads its response in a natural tone, and you can even pick different accents (I went with a soft Irish lilt for the tea ceremony because it felt extra calming). The vision tool is just as simple: upload an image, ask a question, and Copilot scans it in seconds. It can identify objects, describe what’s going on, or even read text if you need a quick assist.
The best part? It all happens in one chat window. You can go from asking Copilot to narrate a scene to having it analyze a photo without skipping a beat. It’s smooth and intuitive, which is perfect for someone like me who gets annoyed by complicated setups, and I suspect I’m not alone in that.
The Bigger Picture for Microsoft
Microsoft is clearly aiming to keep Copilot at the top of the AI heap. With competitors like Google’s Gemini and OpenAI’s ChatGPT gaining ground, this voice and vision upgrade is a clever way to make Copilot stand out.
Why bounce between apps when Copilot can chat, narrate, and analyze images all in one place? That question has come up from time to time, and it is finally starting to get an answer. I might add that there are now several answers (that is, several options) to choose from. Microsoft is also expanding access globally, starting with regions like Australia, Canada, and the UK, with more on the horizon. It’s a smart move to stay competitive in a crowded market.
One thing to keep in mind, though: these features involve more data, and handing over voice inputs and images could make some users nervous about privacy. Microsoft says they’ve got it covered, but if you’re curious, it’s worth checking out their privacy policy to see how they handle your info.
What’s Coming and How to Jump In
These new features are available to all Copilot users right now, no extra cost—though I wouldn’t be surprised if Microsoft rolls out some premium add-ons later (they’ve hinted at a Pro version). The voice narration is lovely but can trip over cultural details, like the cookie confusion in Kyoto. The vision tool is quick and accurate for finding objects, though it might see “rubber ducks” where there are none.
Here’s my suggestion: grab Copilot and give it a whirl with something totally your own. Ask it to narrate a bustling Moroccan souk in a New Zealand accent, or make it find your car keys in a photo of your messy kitchen counter. Get creative, have a laugh, and see what this AI can do for you.
If you want to read the official breakdown of Copilot’s new senses, you can check out Microsoft’s full announcement here.
Be sure to check in on our Latest AI Chatbot News section for future Copilot feature upgrades!