Pushing the Limit of Efficient Inference-Time Scaling with Terminal Velocity Matching

Authors
Linqi Zhou (Project Lead)
Ayaan Haque (Co-Lead on large-scale training)

November 26, 2025

Here's a fun challenge: We have a carousel of images, organized into columns. In each column, one image was generated by 4-step TVM and the other by 50x2-step diffusion. Can you tell which is which?

  • “A panoramic view of a volcanic island shows a plume of smoke rising dramatically from the crater, while black lava rock slopes meet turquoise ocean waves below. Small fishing boats float near the shore, their bright hulls contrasting with the rugged coastline, as seabirds wheel overhead in the humid tropical air.”
  • “A woman with a gentle expression and flowing hair sits elegantly on a majestic horse, set against a serene landscape with rolling hills and a clear sky, cinematic, dramatic, equestrian, nostalgic.”
  • “A modern art gallery displays bold abstract paintings hung evenly along the white walls. A massive sculpture made of twisted metal rises from the center of the floor.”
  • “A close-up of an elderly woman with silver hair pulled back loosely, her gentle smile illuminated by diffused window light, the soft interior behind her fading into creamy blur, the warmth of afternoon sun giving her face a timeless glow, nostalgic, tender, emotional, serene.”
  • “Drone footage of a close-up of a golden retriever perched on a cliff face with a lush jungle canopy below. The scene is bright and stunningly detailed. Dynamic shot, 35mm film.”
  • “A close-up of an older man with a soft white beard, his face illuminated by the cool glow of a streetlamp on a rainy night, raindrops catching on his eyelashes, the city lights behind him dissolving into colorful bokeh, reflective, moody, deeply atmospheric.”
  • “A wide canyon landscape unfolds beneath a blazing afternoon sun, its layered red and orange cliffs towering over a winding river far below. Sparse shrubs cling to rocky ledges, and an eagle circles above the jagged ridges. Shadows stretch dramatically across the canyon floor, emphasizing the immensity of the scene.”
  • “Ultra-realistic portrait of a cookie with expressive chocolate chip eyes, wearing a tiny explorer's hat. The cookie stands on a mound of cocoa powder with dramatic shadows cast by a spotlight, while the kitchen background is softly blurred. Quirky, textured, rich contrast.”
TL;DR: Terminal Velocity Matching (TVM) is a new single-stage training paradigm for efficient generation. When trained from scratch, it delivers a 25x speedup over standard diffusion models at the same sample quality.

Diffusion models and Flow Matching power the current state-of-the-art Text-to-Image and Text-to-Video models. Despite their high quality, they are extremely expensive to serve because they need many neural network calls (e.g. 100 calls) to produce high-quality samples. In search of generative paradigms beyond diffusion models, we released Inductive Moment Matching (IMM), a pioneering work that introduced the notion of efficient inference-time scaling and proposed redesigning pretraining algorithms to unlock one- and few-step generation without sacrificing quality.
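To make the serving cost concrete, here is a minimal sketch of a fixed-step Euler sampler for a flow-matching-style ODE that counts network calls. It is not TVM's sampler or any production code: `toy_velocity` is a hypothetical stand-in for a trained network, `TARGET` is a made-up data point, and the time convention (t = 1 is noise, t = 0 is data) is an assumption chosen for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def euler_sample(
    velocity: Callable[[Vector, float], Vector],
    x: Vector,
    num_steps: int,
) -> Tuple[Vector, int]:
    """Integrate dx/dt = velocity(x, t) from t = 1 down to t = 0.

    Returns the final sample and the number of velocity evaluations,
    i.e. the number of "network calls" a real model would need.
    """
    dt = 1.0 / num_steps
    calls = 0
    for i in range(num_steps):
        t = 1.0 - i * dt      # current time, recomputed to avoid drift
        v = velocity(x, t)    # one (expensive) network call per step
        calls += 1
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x, calls


TARGET: Vector = [0.5, -1.0]  # hypothetical single "data point"


def toy_velocity(x: Vector, t: float) -> Vector:
    """Hypothetical stand-in for a network: velocity that carries x to TARGET at t = 0."""
    return [(xi - ti) / t for xi, ti in zip(x, TARGET)]


if __name__ == "__main__":
    noise = [2.0, 3.0]
    for steps in (100, 4):
        sample, calls = euler_sample(toy_velocity, noise, steps)
        print(f"{steps:>3} steps -> {calls} network calls, sample = {sample}")
```

With one call per step, a 100-call budget and a 4-call budget differ by a factor of 25 in network evaluations. The toy field here is trivially easy to integrate, so it says nothing about sample quality at low step counts; that is the part a few-step training objective has to solve.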

TVM pushes efficient inference-time scaling even further, achieving higher quality with fewer iterations.

However, despite its theoretical soundness, IMM suffers from a few fundamental limitations that have prevented its use in large-scale pre-training, most notably its multi-sample training objective and its dependence on higher precision (FP16 as opposed to BF16).

Our new work, Terminal Velocity Matching (TVM), shares this goal and pushes the limit of efficient inference-time scaling even further while focusing on much more scalable training techniques. Like IMM and Flow Matching, TVM is a principled training framework with a tight connection to distribution matching, a fundamental concept in generative modeling, and it subsumes Flow Matching as a special case. In contrast to IMM, however, we are able to scale TVM pre-training to 10B+ parameter diffusion transformers with ease. In addition to our theoretical contributions, we focus on model design choices and engineering efforts to support our approach.

Here are some samples from our new Text-to-Image model, trained entirely from scratch with TVM as the pre-training objective. All of the following samples were generated with only 4 steps (i.e. 4 neural network calls).

  • “‘Terminal Velocity Matching’ projected in massive glowing letters onto the rugged face of a cliff at dusk, the bright text illuminating the texture of the stone with dramatic contrast. The surrounding landscape fades into cool blues and purples as the last light of day disappears behind the mountains. The surreal scale and luminous projection create a powerful, atmospheric visual.”
  • “A captivating shot of a sleek, black motorcycle parked against a graffiti-covered urban wall. The vibrant colors and chaotic patterns of the graffiti contrast sharply with the polished chrome and matte black finish of the motorcycle, creating a dynamic and edgy visual. The lighting is soft, with overcast daylight gently illuminating the scene, casting subtle reflections off the bike’s surface. The composition is slightly angled, focusing on the motorcycle’s silhouette against the vibrant backdrop, evoking a sense of rebellion and freedom.”
  • “A delicate, orange-hued fox formed from intricately arranged sliced persimmons, its slender body and legs crafted from the curved fruit, with a tail that elegantly curls into a swirling pattern of peeled clementine segments, set against a muted, earthy background, warm light, organic, still life, fruit sculpture, culinary art.”
  • “A giant protest banner fills the entire scene, stretched taut and painted with huge red block letters reading ‘JUSTICE NOW.’ The fabric shows creases and weather stains, but the words remain vivid and forceful.”
  • “A close-up of a teenage boy with braces and soft freckles, his face lit by the glow of his phone, reflections shimmering across his glasses, the bedroom around him fading into cool, grainy shadows, youthful, quiet, contemplative.”
  • “Zooming into a burger patty, with the bottom bun and lettuce and tomatoes. 'COFFEE' written on it bun.”
  • “The Christ the Redeemer statue towers over Rio de Janeiro, its arms spread wide above lush green mountains. The city sprawls far below, golden beaches curving around the bay, while clouds drift lazily across the blue sky behind the monument.”
  • “A close-up portrait of a rugged mountaineer with wind-burned cheeks and frost clinging to his beard, his breath visible in the cold air, the snow-covered peaks behind him merging into pale, icy blur, harsh yet beautiful, atmospheric, stark, heroic.”
  • “The Eiffel Tower rises majestically above Paris at dusk, its iron lattice glowing with thousands of golden lights. Along the Seine below, boats drift past tree-lined banks, while couples stroll across a nearby bridge under a lavender sky streaked with fading sunlight.”
  • “A majestic male adult lion walks regally through a dense forest, its head raised slightly to the right. The forest is characterized by tall trees and various foliage on the ground, creating a natural habitat. The lion is centered in the composition, placed at eye level, and is illuminated by soft, diffused lighting, giving the scene a serene and tranquil atmosphere. The overall mood is majestic and tranquil, evoking a sense of calm and serenity. The title of this scene is ‘A lion in a forest’.”
  • “A dense autumn forest glows with fiery hues of red, orange, and gold. Fallen leaves blanket the forest floor, while shafts of golden sunlight filter through the canopy. A small wooden bridge spans a brook at the bottom of the valley, and mist curls upward from the water into the crisp morning air.”
  • “Portrait off-shot of a gorgeous 23-year-old Russian girl with platinum blonde hair, walking along a riverside path lit by colorful reflections. Her face is softly illuminated by passing lights. Professional 8K photo, elegant skin tones, cinematic blur.”
  • “Inside a spacious art studio with high ceilings and whitewashed brick walls, canvases lean against every surface, some half-finished with bold strokes of paint. Brushes, jars of turpentine, and palettes are scattered across tables, while a skylight above lets in soft daylight. An easel in the center holds a large portrait of a woman with unfinished features.”
  • “The image depicts a remote mountain village built from stone cottages with red-tiled roofs, nestled along a narrow valley. Smoke rises gently from chimneys as villagers walk along cobblestone streets, and terraced fields climb the steep slopes around the settlement. The early morning mist creates a soft, hazy atmosphere.”
  • “Portrait off-shot of a super cute 22-year-old Korean college student walking past glowing storefront signs, professional composition. Her silky black hair sways gently as she turns her head slightly, catching soft white light across her features. Beautiful blurred neon city background, hyper quality, 8K, detailed face, smooth tones, noise reduction.”
  • “A majestic 360 rotating panorama of the Matterhorn's peak, bathed in the warm, golden hues of sunset. The sky transitions from a vibrant orange to a soft pink, hinting at the approach of twilight. Craggy, snow-capped surfaces of the mountain are accentuated by the setting sun's rays, which cast a glowing aura and elongate shadows that add depth to the terrain. The serene mood is evident as the tranquil setting exudes a sense of stillness and grandeur. The style mirrors that of a hyper-realistic landscape painting, with careful attention to the play of light and shadow painted with heavy brush strokes. The composition is balanced with the Matterhorn centered, and from a low camera angle, it towers over the viewer, emphasizing its monumental scale.”
  • “A close-up of a tattooed man with a shaved head and piercing green eyes, illuminated by harsh overhead fluorescent light, every pore and scar ultra-visible, the industrial space behind him dissolving into stark monochrome blur, intense, raw, gritty.”
  • “A sunlit vineyard stretches across rolling hills, where rows of grapevines are heavy with ripe fruit. Workers in wide-brimmed hats move between the vines with baskets, clipping bunches of grapes and placing them carefully into wooden crates. In the distance, a stone villa with red-tiled roofs rises above the fields, while beyond it, olive trees line the ridges under a clear blue sky.”
  • “A close-up of an elderly woman with soft silver hair, gentle wrinkles, and kind eyes glistening under warm lamp light, each line on her face sharply defined yet tender, the cozy interior behind her fading into velvety blur, nostalgic, intimate, heartfelt.”
  • “A Victorian conservatory overflowing with exotic plants in ornate ceramic pots. Vines climb iron trellises, while a glass dome ceiling drips with condensation. In the corner, a fountain trickles into a stone basin filled with water lilies, surrounded by antique wicker chairs with embroidered cushions.”
  • “A close-up of a tattooed man with a shaved head and piercing green eyes, illuminated by harsh overhead fluorescent light, every pore and scar ultra-visible, the industrial space behind him dissolving into stark monochrome blur, intense, raw, gritty.”“A close-up of a tattooed man with a shaved head and piercing green eyes, illuminated by harsh overhead fluorescent light, every pore and scar ultra-visible, the industrial space behind him dissolving into stark monochrome blur, intense, raw, gritty.”
  • “A sunlit vineyard stretches across rolling hills, where rows of grapevines are heavy with ripe fruit. Workers in wide-brimmed hats move between the vines with baskets, clipping bunches of grapes and placing them carefully into wooden crates. In the distance, a stone villa with red-tiled roofs rises above the fields, while beyond it, olive trees line the ridges under a clear blue sky.”“A sunlit vineyard stretches across rolling hills, where rows of grapevines are heavy with ripe fruit. Workers in wide-brimmed hats move between the vines with baskets, clipping bunches of grapes and placing them carefully into wooden crates. In the distance, a stone villa with red-tiled roofs rises above the fields, while beyond it, olive trees line the ridges under a clear blue sky.”
  • “A close-up of an elderly woman with soft silver hair, gentle wrinkles, and kind eyes glistening under warm lamp light, each line on her face sharply defined yet tender, the cozy interior behind her fading into velvety blur, nostalgic, intimate, heartfelt.”“A close-up of an elderly woman with soft silver hair, gentle wrinkles, and kind eyes glistening under warm lamp light, each line on her face sharply defined yet tender, the cozy interior behind her fading into velvety blur, nostalgic, intimate, heartfelt.”
  • “A Victorian conservatory overflowing with exotic plants in ornate ceramic pots. Vines climb iron trellises, while a glass dome ceiling drips with condensation. In the corner, a fountain trickles into a stone basin filled with water lilies, surrounded by antique wicker chairs with embroidered cushions.”“A Victorian conservatory overflowing with exotic plants in ornate ceramic pots. Vines climb iron trellises, while a glass dome ceiling drips with condensation. In the corner, a fountain trickles into a stone basin filled with water lilies, surrounded by antique wicker chairs with embroidered cushions.”

We can also flexibly tune the number of sampling steps at inference time for the best cost/quality trade-off. Below, we show qualitative comparisons of 2-, 4-, and 8-step TVM with 50x2-step diffusion.

  • “Ultra-realistic portrait of a fox carved entirely from translucent ice, steam rising from its breath as it stares forward with glowing amber eyes frozen beneath the surface. Snowflakes swirl softly in the slightly blurred background. Crisp highlights, cold tones, intricate ice detailing, dramatic atmosphere.”
  • “The Taj Mahal gleams in the soft pink light of dawn, its white marble domes reflected perfectly in the still waters of the long rectangular pool. Fragrant gardens line the pathway leading to the mausoleum, where the intricate calligraphy of the gateway frames the view.”
  • “A stunning female model stands outdoors, wearing a vintage leather jacket and sunglasses, with the camera positioned at her eye level, capturing her confident gaze, soft natural light, subtle lens blur, and a muted color palette, film noir, monochrome, high contrast, atmospheric lighting.”
  • “The image depicts a serene alpine lake ringed by pine trees, its still surface reflecting the surrounding snow-dusted peaks. A cluster of wooden rowboats rests at a dock extending into the water, while patches of wildflowers bloom along the rocky shoreline. Overhead, the sky glows soft pink with the first light of dawn.”

We observe that at text-to-image scale, 4-step TVM produces roughly the same quality as both 8-step TVM and 50x2-step diffusion, with noticeably sharper details than 2-step TVM. We conclude that 4 steps offer the best quality-efficiency trade-off.

For complete theoretical and empirical details on our method, we release our paper at https://arxiv.org/abs/2511.19797 and open-source our ImageNet code at https://github.com/lumalabs/tvm. We hope this marks a significant step beyond the current Diffusion and Flow Matching-based training paradigms.

Conceptual Illustration

To recap, diffusion models construct an interpolation between a data distribution and a Gaussian noise distribution, and use a neural network to approximate the marginal velocity field. During sampling, diffusion models integrate the ODE defined by this velocity field, tracing a curved path in space to arrive at a sample (bottom figure). In TVM, we use a network to parameterize the displacement from one point to another, thus forming a direct straight path for sampling (top figure). By definition, the one-step straight path in TVM matches the end point of the multi-step path in diffusion, so in theory the two produce samples of the same quality.

TVM directly constructs a straight path using a neural network and learns by matching the path’s terminal velocity.
Diffusion models directly learn the velocity to be integrated at test time, tracing a curved path.

This parameterization is not new; similar ones have been proposed in prior works such as Consistency Models and MeanFlow. Unlike these works, however, we train by matching the terminal velocity of our model's path (top figure). Optimizing this simple objective allows the model to provably learn the data distribution while enabling direct one- or few-step inference.
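
As a rough guide to what "terminal velocity" refers to, the following sketch writes out the condition that the exact ODE displacement map satisfies. The symbols u (marginal velocity field), \psi (exact displacement map from time t to time s), and f_\theta (learned map) are notation assumed here for illustration and are not taken from this post; the actual training loss is given in the paper.

```latex
% Assumed notation: u = marginal velocity field, \psi = exact displacement map,
% x_\tau = \psi(x_t, t, \tau) = the ODE solution at time \tau started from x_t at time t.
\begin{aligned}
\psi(x_t, t, s) &= x_t + \int_t^s u(x_\tau, \tau)\, d\tau, \\
\frac{\partial}{\partial s}\, \psi(x_t, t, s) &= u\big(\psi(x_t, t, s),\, s\big),
\qquad \psi(x_t, t, t) = x_t .
\end{aligned}
```

The second line says that the velocity of the path at its terminal time s equals the velocity field evaluated at the path's own end point. If u is sufficiently regular (for example, Lipschitz in x), uniqueness of ODE solutions implies that any map f_\theta satisfying both this derivative condition for all s and the boundary condition f_\theta(x_t, t, t) = x_t coincides with \psi.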

Sampling at inference time only requires additionally passing the next time step to the model; no other modifications to the traditional Flow Matching sampler are needed.
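
To illustrate the sampler change, here is a minimal pure-Python sketch on a toy 1-D problem. It is not the released code: the update rule x + (s - t) * model(x, t, s), the convention that time runs from 1 down to 0, and all function names are assumptions made for this example. The toy ODE dx/dt = x stands in for a learned velocity field because its exact displacement is known in closed form.

```python
import math

def euler_flow_sampler(velocity, x, times):
    """Traditional Flow Matching sampler: the model sees only the current time t."""
    for t, s in zip(times[:-1], times[1:]):
        x = x + (s - t) * velocity(x, t)
    return x

def two_time_sampler(avg_velocity, x, times):
    """Same loop; the only change is that the model also receives the next time s."""
    for t, s in zip(times[:-1], times[1:]):
        x = x + (s - t) * avg_velocity(x, t, s)
    return x

# Toy ODE dx/dt = x, whose exact flow is x_s = x_t * exp(s - t).
def toy_velocity(x, t):
    return x

def toy_avg_velocity(x, t, s):
    # Exact displacement x_t * (exp(s - t) - 1), divided by the time gap (s - t).
    return x * math.expm1(s - t) / (s - t)

def linspace(a, b, n):
    """n equal steps from a to b, returned as n + 1 time points."""
    return [a + (b - a) * i / n for i in range(n + 1)]

x_start = 2.0
exact = x_start * math.exp(-1.0)  # true ODE solution at t = 0

one_step = two_time_sampler(toy_avg_velocity, x_start, [1.0, 0.0])
four_step = two_time_sampler(toy_avg_velocity, x_start, linspace(1.0, 0.0, 4))
euler_4 = euler_flow_sampler(toy_velocity, x_start, linspace(1.0, 0.0, 4))
euler_50 = euler_flow_sampler(toy_velocity, x_start, linspace(1.0, 0.0, 50))

print(f"exact={exact:.6f} one_step={one_step:.6f} four_step={four_step:.6f}")
print(f"euler_4={euler_4:.6f} euler_50={euler_50:.6f}")
```

In this toy setting the two-time sampler reaches the exact end point in a single step because the displacement is exact, whereas the Euler sampler only approaches it as the step count grows. A trained network would only approximate the displacement, so more steps can still help in practice.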

Training and System Designs at Scale

1. Semi-Lipschitz Control for Diffusion Transformers

Our theoretical analysis exposes fundamental flaws within existing diffusion transformer designs. To address these issues, we introduce simple, effective architectural modifications. These changes are easily applied to large-scale architectures without additional engineering effort.

2. Flash Attention Jacobian-Vector Product (JVP) Kernel with Backward Pass

Our algorithm involves training with JVP and ideally requires a backward pass through the JVP. One can detach the JVP gradient for speed, but this yields biased gradient estimates. Unfortunately, the current official Flash Attention modules under PyTorch do not support backpropagation through JVP. To tackle this challenge, we have implemented an efficient Flash Attention kernel, supporting both self-attention and cross-attention, that fuses the forward and JVP computations and conducts a multi-step backward pass for efficiency. This kernel is included in our open-source code and can be used with a few lines of code.
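
For readers unfamiliar with what a fused forward-and-JVP attention computes, the following is a plain-Python reference of the underlying math for a single query. It is not the kernel from the repository and makes no attempt at tiling or memory efficiency; the function name and the single-query simplification are assumptions made for illustration.

```python
import math

def attention_with_jvp(q, K, V, dq, dK, dV):
    """Single-query softmax attention and its JVP, computed in one pass.

    q, dq: lists of length d; K, dK: n rows of length d; V, dV: n rows of length n_out.
    Returns (o, do): the output and its directional derivative along (dq, dK, dV).
    """
    d = len(q)
    scale = 1.0 / math.sqrt(d)

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Scores and their tangents (product rule on q . k).
    s = [scale * dot(q, k) for k in K]
    ds = [scale * (dot(dq, k) + dot(q, dk)) for k, dk in zip(K, dK)]

    # Numerically stable softmax.
    m = max(s)
    e = [math.exp(x - m) for x in s]
    z = sum(e)
    p = [x / z for x in e]

    # Softmax tangent: dp_j = p_j * (ds_j - sum_k p_k * ds_k).
    mean_ds = dot(p, ds)
    dp = [pj * (dsj - mean_ds) for pj, dsj in zip(p, ds)]

    # Output and its tangent (product rule on p_j * v_j).
    n, n_out = len(V), len(V[0])
    o = [sum(p[j] * V[j][c] for j in range(n)) for c in range(n_out)]
    do = [sum(dp[j] * V[j][c] + p[j] * dV[j][c] for j in range(n)) for c in range(n_out)]
    return o, do

# Small example inputs (arbitrary values) and tangent directions.
q = [0.3, -0.2]
K = [[1.0, 0.5], [-0.4, 0.8], [0.2, -1.0]]
V = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
dq = [0.1, 0.2]
dK = [[0.0, 0.1], [0.2, -0.1], [-0.3, 0.0]]
dV = [[0.5, 0.0], [0.0, 0.5], [-0.5, 0.5]]

o, do = attention_with_jvp(q, K, V, dq, dK, dV)
print("output:", o)
print("jvp:   ", do)
```

The primal output and the tangent share the same scores and softmax weights, which is why computing them together avoids redundant work. Training through the JVP additionally requires differentiating `do` with respect to the inputs, which is the part the post says official Flash Attention modules do not support.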

3. JVP at 10B+ Parameter Scale

Fully Sharded Data Parallel (FSDP), a common technique in large-scale training to reduce the model’s memory footprint, interacts poorly with JVP in PyTorch. To bypass potential problems with model sharding, we built custom JVP modules that are compatible with FSDP while preserving per-step training efficiency. All other common parallelism and acceleration techniques, such as Context Parallel, Activation Checkpointing, and torch.compile, work natively with our JVP modules.

What’s Next

We hope TVM marks a significant step towards scaling the next generation of generative models that push the limit of efficient inference-time scaling. TVM is conceptually simple and stable to work with, and can also be flexibly introduced as a post-training technique to accelerate inference of regular diffusion models, requiring no teacher models or any joint optimization of multiple networks.

Our team is dedicated to pushing what is possible beyond the current paradigms to unlock more capabilities in multi-modal models. If you resonate with our mission, join us.

References

Linqi Zhou, Stefano Ermon, Jiaming Song. “Inductive Moment Matching.” ICML 2025.
Yang Song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever. “Consistency Models.” ICML 2023.
Geng et al. “Mean Flows for One-step Generative Modeling.” NeurIPS 2025.
Cheng Lu, Yang Song. “Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models.” ICLR 2025.
Song et al. “Score-Based Generative Modeling through Stochastic Differential Equations.” ICLR 2021.
Lipman et al. “Flow Matching for Generative Modeling.” ICLR 2023.

Top Quiz Answers

The images highlighted in green were generated with the 4-step TVM pipeline.