Kling is a text-to-video model developed by Kuaishou Technology, the Chinese short-video and social platform, and unveiled in an official announcement dated June 11, 2024. Arriving roughly four months after OpenAI demonstrated Sora in February 2024, it was one of the first signs that Chinese technology companies would compete directly at the frontier of generative video.
According to Kuaishou, Kling can generate videos up to two minutes long at 30 frames per second and at resolutions up to 1080p, in a range of aspect ratios. The company emphasized the model’s ability to render complex spatiotemporal motion and to simulate aspects of the physical world. Technically, Kling uses a diffusion transformer (DiT) architecture, the same DiT recipe behind Sora and other leading systems, with two additions of Kuaishou’s own: a 3D variational autoencoder (VAE) for latent-space encoding and a full-attention module for spatiotemporal modeling. It was initially released for beta testing inside KuaiYing, Kuaishou’s video-editing app for users in China.
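To give a sense of scale for those figures, the arithmetic can be sketched in a few lines of Python. The duration, frame rate, and 1080p ceiling come from Kuaishou's stated limits; the 16:9 frame size, the VAE compression factors, and the patch size are illustrative assumptions, not published Kling parameters, and the function names are hypothetical.

```python
# Back-of-the-envelope scale of a maximum-length clip.
# Duration, frame rate, and 1080p are Kuaishou's stated limits;
# every value marked ASSUMED is a hypothetical chosen for illustration.

DURATION_S = 120             # two minutes
FPS = 30                     # frames per second
WIDTH, HEIGHT = 1920, 1080   # ASSUMED: a 16:9 frame at 1080p

TEMPORAL_DOWNSAMPLE = 4      # ASSUMED: 3D VAE compression along time
SPATIAL_DOWNSAMPLE = 8       # ASSUMED: 3D VAE compression along each spatial axis
PATCH = 2                    # ASSUMED: transformer patch size on the latent grid


def frame_count(duration_s: int, fps: int) -> int:
    """Number of raw video frames in a clip."""
    return duration_s * fps


def latent_tokens(frames: int, width: int, height: int,
                  t_down: int, s_down: int, patch: int) -> int:
    """Tokens a transformer would see after VAE compression and patching."""
    latent_frames = frames // t_down
    latent_w = width // s_down
    latent_h = height // s_down
    return latent_frames * (latent_w // patch) * (latent_h // patch)


frames = frame_count(DURATION_S, FPS)
tokens = latent_tokens(frames, WIDTH, HEIGHT,
                       TEMPORAL_DOWNSAMPLE, SPATIAL_DOWNSAMPLE, PATCH)

print(f"raw frames: {frames:,}")
print(f"latent tokens (under the assumed factors): {tokens:,}")
# Full attention compares every token with every other token.
print(f"attention pairs if processed in one pass: {tokens ** 2:,}")
```

A two-minute clip at 30 frames per second is 3,600 raw frames. The token and attention-pair counts depend entirely on the assumed factors, and the source does not say whether Kling processes a full clip in a single pass; the sketch only illustrates why compressing video into a latent space matters before applying full spatiotemporal attention.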
Kling mattered as evidence that the video-generation race was global rather than confined to a few U.S. labs, and its claimed two-minute clip length and emphasis on physical realism placed it among the more capable systems of its generation. For a business reader, it signaled that competitive, production-grade AI video tools were emerging from multiple regions at once, with implications for content costs and for which platforms control the creative pipeline.