Stable Audio 3.0 Support Enables Long-Form Music and SFX Generation

Stability AI's latest audio models are now integrated into the ComfyUI ecosystem for immediate experimentation. This update introduces high-fidelity music generation and sound design capabilities directly into your existing node graphs.

ComfyUI has added day-zero support for Stable Audio 3.0, the latest iteration of Stability AI’s music and sound generation technology. Creators can now generate high-fidelity audio tracks and sound effects in the same node-based interface they use for image and video generation, which gives multi-modal creative projects a single working environment.

What's new

The integration centers on the Stable Audio 3.0 model family, which introduces several technical improvements over previous versions. Most notably, the models now support variable-length generation, so creators are no longer confined to fixed-duration clips. You can generate anything from a short foley hit to a longer musical composition, depending on what the project needs.

The update also supports smaller checkpoints that are more efficient to run locally, which makes audio generation feasible on consumer-grade hardware without relying on cloud APIs. Because the models were trained on fully licensed data from AudioSparx, their output carries less copyright risk in commercial work than output from models trained on scraped datasets. Within ComfyUI, new nodes give precise control over the diffusion process, sampling steps, and prompt weighting for audio outputs.

How it fits your workflow

For filmmakers and video editors, this integration bridges the gap between visual composition and sound design. Instead of jumping between a video editor and a separate web-based audio generator, you can build a single workflow that generates a visual sequence together with its soundtrack or ambient background noise. If you are working on a short film, you can use ComfyUI to generate a specific sound effect, such as a mechanical whir or a cinematic transition, that matches the timing of your visual nodes.
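To make the timing point concrete, here is a minimal sketch of the arithmetic involved in matching an audio clip to a visual sequence. It assumes you know the frame count and frame rate of the video. The function names and the 44.1 kHz default sample rate are illustrative assumptions, not values taken from the ComfyUI nodes or the Stable Audio release.

```python
import math


def clip_duration_seconds(frame_count: int, fps: float) -> float:
    """Length of a visual sequence in seconds, given its frame count and frame rate."""
    if frame_count < 0 or fps <= 0:
        raise ValueError("frame_count must be >= 0 and fps must be > 0")
    return frame_count / fps


def frame_to_seconds(frame_index: int, fps: float) -> float:
    """Time offset of a given frame, e.g. the keyframe where a sound effect should land."""
    if fps <= 0:
        raise ValueError("fps must be > 0")
    return frame_index / fps


def samples_for_duration(seconds: float, sample_rate: int = 44_100) -> int:
    """Number of audio samples needed to cover `seconds` (sample rate is an assumed default)."""
    return math.ceil(seconds * sample_rate)


if __name__ == "__main__":
    # A 120-frame shot at 24 fps, with a sound effect cued on frame 36.
    duration = clip_duration_seconds(120, 24)
    print(f"Request an audio clip of {duration:.2f} s")
    print(f"Cue the effect at {frame_to_seconds(36, 24):.2f} s")
    print(f"That is {samples_for_duration(duration)} samples at 44.1 kHz")
```

The computed duration is the value you would pass to whatever length control the audio nodes provide; how that control is named is not specified in the announcement summarized here.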

This setup competes with standalone tools like Udio or Suno but offers the granular control characteristic of node-based systems. Sound designers can experiment with "audio-to-audio" workflows, using an existing recording as a structural guide from which the model generates something new. Motion designers can iterate on rhythmic elements that sync with keyframes. While it doesn't replace a full Digital Audio Workstation (DAW), it is a capable assistant for generating raw assets and atmospheric textures.

What it costs / how to try it

Stable Audio 3.0 support is available to all ComfyUI users: update your installation and download the necessary model weights. The models are released under the Stability AI Community License, which is free for personal use and for small businesses below the license's revenue threshold. Implementation details and example JSON workflows are available on the official ComfyUI blog.
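Once you have a workflow JSON, you can also queue it from a script instead of the browser. The sketch below assumes a ComfyUI server running locally on its default address (http://127.0.0.1:8188) and a workflow that has been exported in API format, which differs from the layout saved by the regular UI. The file name in the usage section is hypothetical, and the helper function names are introduced here for illustration only.

```python
import json
import urllib.request


def build_queue_request(
    workflow: dict, server: str = "http://127.0.0.1:8188"
) -> urllib.request.Request:
    """Build a POST request that submits an API-format workflow to ComfyUI's /prompt endpoint."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(path: str, server: str = "http://127.0.0.1:8188") -> dict:
    """Load a workflow file from disk, submit it, and return the server's JSON reply."""
    with open(path, encoding="utf-8") as f:
        workflow = json.load(f)
    with urllib.request.urlopen(build_queue_request(workflow, server)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical file name; replace with a workflow you exported yourself.
    print(queue_workflow("stable_audio_workflow_api.json"))
```

Splitting request construction from the network call keeps the payload easy to inspect before anything is sent to the server.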

Read the original announcement on ComfyUI.
