Wednesday, March 12, 2025

AI Video Generation: LTX-Video 0.9.5 Is Here

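Hi everyone, I'm 萤火君, and I share AI art content every day!

Many of you have probably used Kling (可灵) to generate videos, but free users have to wait a long time for each video, and the paid membership is a bit pricey.

This article introduces a video generation model: LTX-Video. It is described as the first DiT-based video generation model and produces high-quality video at 24 frames per second. Trained on a large-scale, diverse video dataset, it can generate high-resolution videos with realistic and varied content. The results are somewhat behind Kling, but generation is fast, so you can reroll as many times as you like until you get a result you are happy with.

The model supports text-to-image, image-to-video, keyframe-based animation, video extension (both forward and backward), video-to-video transformation, and any combination of these features.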

Results

As usual, let's look at the results first.



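Environment Setup

The runtime used here is ComfyUI. Please upgrade ComfyUI to the latest version.

If you don't have ComfyUI yet, I suggest starting with a cloud environment, which avoids complicated and error-prone setup. Once you find it useful, you can set it up locally. My cloud image: https://haoee.com/applicationMarket/applicationDetails?appId=27&IC=XLZLpI7Q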

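Plugin

The plugin used here is ComfyUI-LTXVideo. After downloading it, place it in the ComfyUI/custom_nodes directory.

https://github.com/Lightricks/ComfyUI-LTXVideo

If GitHub is hard for you to access, you can also download it from my shared netdisk; see the end of the article for how.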

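Models

LTX-Video has three versions: 0.9, 0.9.1, and 0.9.5. This article uses the latest, 0.9.5. You can also download the earlier versions to compare.

·Full checkpoint model: after downloading, place it in the ComfyUI/models/checkpoints directory.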

https://huggingface.co/Lightricks/LTX-Video/resolve/main/ltx-video-2b-v0.9.5.safetensors

·T5XXL text encoder model: after downloading, place it in the ComfyUI/models/clip directory.

https://huggingface.co/chatpig/t5xxl/resolve/main/t5xxl_fp16.safetensors

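·Separate UNet and VAE models

A community contributor optimized the original LTX-Video and split out the UNet and VAE models, which makes it easier to mix and match components.

UNet model: after downloading, place it in the ComfyUI/models/unet directory.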

https://huggingface.co/city96/LTX-Video-0.9.5-gguf/resolve/main/ltx-video-2b-v0.9.5-BF16.gguf

VAE decoder model: after downloading, place it in the ComfyUI/models/vae directory.

https://huggingface.co/city96/LTX-Video-0.9.5-gguf/resolve/main/LTX-Video-0.9.5-VAE-BF16.safetensors

·Models for deriving prompts from images

https://huggingface.co/unsloth/Llama-3.2-3B-Instruct

https://huggingface.co/MiaoshouAI/Florence-2-large-PromptGen-v2.0

If Hugging Face is hard for you to access, you can also download these from my shared netdisk; see the end of the article for how.
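The file placement described above can be summarized in a short Python sketch. It only computes where each downloaded file is expected to live and reports which ones are absent; it does not download anything. The function names `planned_paths` and `missing_files` are hypothetical helpers made up for illustration, and the ComfyUI root path is whatever your installation uses.

```python
from pathlib import Path
from urllib.parse import urlparse

# Download URL -> sub-directory under the ComfyUI root, as listed in this article.
MODEL_TARGETS = {
    "https://huggingface.co/Lightricks/LTX-Video/resolve/main/ltx-video-2b-v0.9.5.safetensors": "models/checkpoints",
    "https://huggingface.co/chatpig/t5xxl/resolve/main/t5xxl_fp16.safetensors": "models/clip",
    "https://huggingface.co/city96/LTX-Video-0.9.5-gguf/resolve/main/ltx-video-2b-v0.9.5-BF16.gguf": "models/unet",
    "https://huggingface.co/city96/LTX-Video-0.9.5-gguf/resolve/main/LTX-Video-0.9.5-VAE-BF16.safetensors": "models/vae",
}


def planned_paths(comfy_root):
    """Map each download URL to the local path where the file should be placed."""
    root = Path(comfy_root)
    plan = {}
    for url, subdir in MODEL_TARGETS.items():
        # The file name is the last component of the URL path.
        filename = Path(urlparse(url).path).name
        plan[url] = root / subdir / filename
    return plan


def missing_files(comfy_root):
    """Return the expected model paths that do not exist on disk yet."""
    return [path for path in planned_paths(comfy_root).values() if not path.is_file()]
```

Calling `missing_files("ComfyUI")` before opening a workflow gives a quick list of files that still need to be downloaded or moved.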

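Using the Workflows

See the end of the article for the workflow downloads.

Let's start with the text-to-video workflow, which is fairly simple overall.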

    

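First, load the base model and the text encoder model;

Then write the prompt for the video. Negative prompts can be used here;

We also need to set the video dimensions, the frame rate, and the total frame count (recommended: resolution no larger than 720 x 1280, a total frame count that is a multiple of 8 plus 1, and a total frame count below 257);

Finally, set the output file format. This is usually mp4, but you can also save to an animated image format such as gif or webp.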
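The frame-count rule above (a multiple of 8 plus 1, up to roughly 257) is easy to get wrong by hand, so here is a minimal Python sketch that checks a value and snaps an arbitrary one to the nearest valid count. The helper names are hypothetical. Two assumptions are baked in: 257 is treated as an inclusive ceiling (it has the form 8 × 32 + 1; set `MAX_FRAMES = 249` for a strict reading of "below 257"), and 9 is used as the smallest useful count.

```python
MAX_FRAMES = 257  # assumption: inclusive ceiling; use 249 for a strict "below 257"
MIN_FRAMES = 9    # assumption: smallest multi-frame value of the form 8*k + 1


def is_valid_frame_count(n):
    """True if n is a multiple of 8 plus 1 and within the suggested range."""
    return MIN_FRAMES <= n <= MAX_FRAMES and (n - 1) % 8 == 0


def snap_frame_count(n):
    """Round n to the nearest value of the form 8*k + 1, clamped to the range."""
    k = (n - 1 + 4) // 8  # integer rounding, ties round up
    return max(MIN_FRAMES, min(8 * k + 1, MAX_FRAMES))


def frames_for_duration(seconds, fps=24):
    """Pick a valid frame count close to the requested clip length."""
    return snap_frame_count(round(seconds * fps))
```

For example, a 5-second clip at 24 fps would be 120 frames, which is not valid; `frames_for_duration(5)` snaps it to 121.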

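Next, the image-to-video workflow. The main change is that you need to upload an image; the other parameters are about the same as for text-to-video. Note that the dimensions of the reference image should match the dimensions of the generated video.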
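One way to make a reference image match the video dimensions is to center-crop it to the target aspect ratio and then resize it. The sketch below only computes the crop box with integer arithmetic; the function name is hypothetical, center-cropping is my own choice rather than something the workflow requires, and the 768 x 512 target in the example is purely illustrative.

```python
def center_crop_box(img_w, img_h, target_w, target_h):
    """Return (left, top, right, bottom) of the largest centered region
    of the image that has the same aspect ratio as the target size."""
    # Compare aspect ratios by cross-multiplication to avoid float error.
    if img_w * target_h > img_h * target_w:
        # Image is wider than the target: keep full height, trim the sides.
        crop_w = img_h * target_w // target_h
        crop_h = img_h
    else:
        # Image is taller than (or equal to) the target: keep full width.
        crop_w = img_w
        crop_h = img_w * target_h // target_w
    left = (img_w - crop_w) // 2
    top = (img_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


# A 1920 x 1080 photo cropped for a 768 x 512 video keeps a 1620 x 1080 region.
print(center_crop_box(1920, 1080, 768, 512))  # (150, 0, 1770, 1080)
```

The resulting box can be passed to an image library's crop function, followed by a resize to the video dimensions.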

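That covers the basic usage of LTX-Video. I also tested some more advanced uses.

For example, automatically generating the prompt for image-to-video:

Using first and last frames to control video generation:

If you're interested, you can find these in the advanced workflows, along with the models they need.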

Prompt Reference

To make it easier to create videos, here are some prompts for reference.

Flying saucer in the clouds

A circular alien spacecraft (a flying saucer with a heavy, gray metallic texture) floats above soft, white clouds, gliding gracefully as the sea of clouds flows beneath it. The camera slowly follows its flight path, revealing endless stretches of cloud seas beneath and distant, misty mountain peaks that appear and disappear into the mist.

Noblewoman in conversation

A woman with light skin, wearing a blue jacket and a black hat with a veil, looks down and to her right, then back up as she speaks; she has brown hair styled in an updo, light brown eyebrows, and is wearing a white collared shirt under her jacket; the camera remains stationary on her face as she speaks; the background is out of focus, but shows trees and people in period clothing; the scene is captured in real-life footage.

Woman on a train

Best quality, 4K, HDR, a woman is sitting inside a train carriage. She is wearing a white top and a black skirt, with her hair tied back in a ponytail and a face mask on. She slowly turns her head to look at the camera, with a smile in her eyes. Her hands are resting on the table in front of her, and the view outside the window shows a blurred image of buildings.

Woman winking

best quality, 4k, a woman blink right eye and little smile. a woman with long, dark hair adorned with delicate flowers. She is wearing a light blue, strapless dress that appears to be made of a sheer, flowing fabric. Her face is adorned with small, sparkling embellishments near her eyes and on her forehead, giving her a fairy-like appearance. She is holding a single pink flower in her hand, which she gazes at gently. The background is a soft, dreamy blend of pastel colors, adding to the ethereal and magical atmosphere of the scene.    

Man at a waterfall

A man stands waist-deep in a crystal-clear mountain pool, his back turned to a massive, thundering waterfall that cascades down jagged cliffs behind him. He wears a dark blue swimming shorts and his muscular back glistens with water droplets. The camera moves in a dynamic circular motion around him, starting from his right side and sweeping left, maintaining a slightly low angle that emphasizes the towering height of the waterfall. As the camera moves, the man slowly turns his head to follow its movement, his expression one of awe as he gazes up at the natural wonder. The waterfall creates a misty atmosphere, with sunlight filtering through the spray to create rainbow refractions. The water churns and ripples around him, reflecting the dramatic landscape. The handheld camera movement adds a subtle shake that enhances the raw, untamed energy of the scene. The lighting is natural and bright, with the sun positioned behind the waterfall, creating a backlit effect that silhouettes the falling water and illuminates the mist.    

Sunlit flower field

A close-up of a vibrant field of flowers under a clear blue sky, where bright sunlight streams through, casting delicate shadows and illuminating the vivid colors of each bloom. The field is a mosaic of rich hues—vivid yellows, deep purples, soft pinks, and fiery oranges—creating a breathtaking tapestry that dances gently in the breeze. Bees and butterflies flutter from flower to flower, adding life and movement to the scene. In the background, the golden glow of the sun highlights fluffy white clouds, while a gentle breeze whispers through the grass, making the entire landscape feel alive and vibrant.

General-purpose negative prompt

low quality, worst quality, deformed, distorted, disfigured, motion smear, motion artifacts, fused fingers, bad anatomy, weird hand, ugly

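Downloads

The plugin, basic workflows, and models used in this article have all been collected in one place. Send the message "LTXV" to the WeChat official account "萤火AI绘画" to get the download link.

I've also written up a lot of hands-on AI art experience and built many more capable advanced workflows. If you need them, follow the link below or scan the QR code to subscribe to my booklet: https://xiaobot.net/post/03340243-9df6-4ea0-bad6-9911a5034bd6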



That's all for this article.

I'll be publishing more about video generation, so follow along to avoid missing anything important.
