Infrastructure of Intention · Sandbox Labs AI

From Generation to Curation Across Domains

Something shifted in the room at the AI Collective & Silicon Valley Video Summit last night. Not in what was announced, but in what was assumed. Leaders from ElevenLabs, Freepik, 13 Layers, and iBelieveInSwordfish Inc. spoke about AI filmmaking workflows with the matter-of-factness of professionals describing established practice. The experimental phase is over. What remains is a harder question: who builds with intention, and who merely generates?

The same question now echoes across every domain where AI touches creation. Code & cinema. Agents & audio. The tools have been democratized. The differentiator has not.

Curation as Core Competency

Matty Shimura, who runs creator competitions at ElevenLabs, offered the sharpest articulation of the present moment. After reviewing 6,500 submissions for their Chroma Awards, Shimura observed that technical capability is no longer the bottleneck. “Anyone can spend 40 to 80 hours watching YouTube tutorials,” Shimura noted, “and you’d be as good as any of the top AI filmmakers to a certain degree.”

The pause after “degree” carried the weight.

“The things that are really falling behind, especially for AI-native creators, are editing, pacing. Using AI for the sake of AI versus being intentional about it. Having an eye and being able to curate and having taste is something that takes many, many years to develop.”

This is the paradox now crystallizing across industries: more powerful generation tools demand more human judgment, not less. The bottleneck has migrated from access to judgment. Organizations that spent 2024 and 2025 asking “how do we access these capabilities?” now face a different constraint entirely. Access is trivial. Curation is craft.

Host Applications & Organizing Architectures

Matt Silverman, Chief Creative Officer at iBelieveInSwordfish, framed the technical evolution through an unexpected lens. For years, the AI space resembled having renderers without host applications. “We had Sora and we had Veo and we had Kling,” Silverman explained, “but it’s like 3D applications didn’t exist, and all we had were renderers. We had RenderMan and V-Ray and Octane, but we didn’t have Maya or Cinema 4D yet.”

The emergence of platforms like Freepik, Spark AI, and Weavy represents something more significant than product launches. These are organizing architectures: infrastructure that makes raw capability usable at professional scale.

Paula Vivas, Head of US Marketing at Freepik, described the enterprise shift in behavioral terms. “At the beginning of 2025, everyone was trying it out. It was very experimental.” Now Fortune 500 companies approach with urgency. “They are not as scared about being the one not using AI. There is fear now of being the last ones.”

The fear inverted. And with it, the strategic question. Early adoption conferred advantage when capability was scarce. Now that capability is ambient, the advantage belongs to those who organize it with intention.

Complexity Scales Nonlinearly

Silverman surfaced a limitation that technical demos rarely reveal. “It’s really easy to make a film if you’ve got one actor,” he observed. “You throw a second actor in that scene, now you got the chance that one actor is going to perform well, and the other guy doesn’t.”

The punchline landed with production-tested weariness: “What happens when the script calls for seven people in that scene?”

This is the architectural truth obscured by impressive single-shot demonstrations. Managing one element is trivial. Managing seven reveals whether the underlying system has coherent structure or merely accumulated features. The principle applies identically to agent orchestration, enterprise workflows, and any system where components must coordinate under real constraints.

Shimura reinforced the point from the creative side. The traditional skills that AI was supposed to bypass turn out to be precisely what separates professional output from impressive demos. “The people who are going to be best at using these tools are also the ones who are most alienated by it,” Shimura noted. “This messaging of cheaper, better, faster disruption gives creatives the ick.”

The irony is structural. Those with the deepest craft knowledge resist the tools most strongly, while those who embrace the tools most eagerly often lack the judgment to deploy them well. Bridging this gap is not a marketing problem. It is an organizational design challenge.

Sound Before Surface

One workflow revelation carried particular resonance. Matt described starting productions with audio rather than visuals. Voiceover tracks and soundscapes establish timing & pacing before a single image is generated.

Shimura explained the logic: “Video generation is expensive and you want to get your timing down. You want to make sure that each shot actually makes sense in your sequence.”

The principle extends beyond film. In any complex creation, whether cinematic or computational, the invisible architecture determines whether the visible output coheres. Pacing precedes surface. Structure precedes features. Organizations racing to generate outputs before establishing rhythms discover the same temporal drift that plagues unstructured AI video: impressive moments that never coalesce into coherent wholes.

Consolidation & Distribution

Shimura offered a structural observation about where power is concentrating. “All the video models are backed by a social media platform. Kling is Kuaishou, which is kind of like Chinese TikTok. ByteDance is TikTok. Google has YouTube.”

The pattern is not coincidental. Training data and distribution channels compound. Organizations without both will find themselves using infrastructure controlled by those who possess both. The strategic implications extend well beyond creative tools. Any domain where AI capabilities require massive data and reach massive audiences will see similar consolidation dynamics.

For enterprise leaders, the question becomes: where in this emerging stack does your organization have leverage, and where are you renting capabilities that could be withdrawn or repriced?

What Remains Human

The summit’s most telling moment came when Shimura reflected on what AI filmmaking loses even as it gains. “There’s something nice about being on set, being in the edit bay with your friends and having that collaborative thing where everyone’s adding a little bit to the mix.”

Vivas echoed the sentiment differently, distinguishing between “an artist and a content creator, the one that creates for the sake of creating and putting it out there versus the one that creates with intention and with wanting to communicate something.”

The tools are neutral on this question. They will generate with equal efficiency for either purpose. The intentionality is not in the model. It is in the organization, the workflow, the human judgment about what deserves to exist.

Across code & cinema, agents & audio, the same pattern resolves. Generation became a commodity faster than most organizations anticipated. What remains scarce is the infrastructure of intention: the taste to curate, the architecture to organize, the judgment to know when seven actors in a scene is the wrong scene.

Every organization now building with AI faces a question that no vendor will answer for them: Is the structure you are assembling designed to generate, or designed to mean something?

Special thanks to moderator Anthony Garcia of The AI Collective.

Appendix: Chroma Awards 2025

Context: Inaugural competition uniting creators and foundational model companies.

Scope: 6,500+ submissions across film, music & gaming.

Legacy: Definitive map of transition from experimental shorts to curated architecture.

Watch: Chroma Awards Recap