Art

"Sealed with a Miss": My 2nd AI-Music Video

Sebastian Antony

30 Aug 2025 • 3 min read

"Sebina" on the synth

What started as a casual viewing of a nano-banana tutorial turned into an unexpected creative adventure. Partway through the video, I heard a snippet of "Sealed with a Kiss," the beloved classic that was a hit for Brian Hyland in 1962 and again for Bobby Vinton in 1972, and immediately knew I had to create my own interpretation.

Sebina with "Sealed with a Miss"

The Magic of Modern AI Tools

The transformation began with Gemini's nano-banana feature, which converted my headshot into a female version in seconds. From there, the AI generated an image of her playing a synthesizer – the perfect foundation for my music video concept.

Crafting the Audio

To avoid copyright issues, I turned to ChatGPT for help creating a parody version of the original song. The AI formatted the lyrics perfectly for recording through Suno AI, which generated four different versions.

ChatGPT makes lyric writing and formatting for Suno a breeze.

After selecting my favorite rendition, I downloaded the MP3 file, ready for the next phase.

Suno generated the song. This is the full-length version.

Video Generation Challenges

My friend Ioan, who leads Pixaroma, gave me a pre-release version of the Infinite Talk ComfyUI workflow, which utilizes Wan 2.1 Image to Video (aka I2V) technology. This cutting-edge tool was the backbone of my video creation process.

ComfyUI is the graphical AI image generation software I've been using for nearly 2 years

Loading both the Gemini-generated image and the Suno audio into the ComfyUI workflow, I used the prompt "a girl singing." In retrospect, I should have been more specific about hand movements and keyboard playing to enhance the performance authenticity. I reduced the image size to 432 x 592 pixels (half the original Gemini resolution) for faster processing. Even with this optimization, the video generation took nearly 28 minutes – a reminder that quality AV output requires patience. This was only the second time I had used this workflow, so I did not fiddle with settings or worry that the mouth was not synced with the song in certain parts.
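For anyone curious about the arithmetic behind that resize, here is a minimal Python sketch. It assumes the original Gemini image was 864 x 1184 pixels, which is simply 432 x 592 doubled, and the helper names `scaled_size` and `pixel_ratio` are made up for illustration; they are not part of ComfyUI or the workflow.

```python
def scaled_size(width: int, height: int, factor: float) -> tuple[int, int]:
    """Return (width, height) multiplied by factor, rounded to whole pixels."""
    return round(width * factor), round(height * factor)


def pixel_ratio(original: tuple[int, int], scaled: tuple[int, int]) -> float:
    """Return the scaled image's pixel count as a fraction of the original's."""
    return (scaled[0] * scaled[1]) / (original[0] * original[1])


# Assumed original size: inferred by doubling the 432 x 592 quoted above.
original = (864, 1184)
half = scaled_size(*original, 0.5)

print(half)                         # dimensions after halving each side
print(pixel_ratio(original, half))  # share of the original pixel count
```

Halving each side leaves a quarter of the pixels per frame, which is the likely reason the smaller size renders faster, although I did not measure how render time actually scales with resolution.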

For me, it's all about the learning journey, not the destination.

Patience is needed: this was on an RTX 4090 PC with 64 GB of RAM.

The Upscaling Experience

Next came the upscaling phase using Topaz Video's new Starlight AI enhancer. After two failed attempts with the local Starlight Mini version, I opted for the cloud-based solution. The process consumed 100 credits (though I'm still unclear on the actual dollar cost) and took 40 minutes to complete. I'm unsure if the effort was worth it. While waiting for the cloud render to complete, I decided to document this entire experience – with a little help from Claude AI for organizing my thoughts. Uploading the short reel to YouTube was straightforward and only took a few minutes.

Reflecting on the Creative Revolution

This entire process highlighted something remarkable: someone with zero musical talent can create a complete music video in just a couple of hours. The seamless integration of multiple AI tools – from image generation and audio creation to video synthesis and upscaling – demonstrates how accessible creative production has become.

As I often remind myself, this is AI in its current state: the worst it will ever be. The technology only improves from here, promising even more exciting possibilities for small creators everywhere while potentially causing job losses for average professionals and freelancers who do not embrace these developments.

Bonus - Music credits to Hammelsneid, a friend from the Pixaroma Discord community.