
Where Do We Go From Here With Volumetric Video

  • May 27
  • 10 min read
[Image: "Where Do We Go From Here" banner showing stunt men being volumetrically captured.]


This is the final post in our short series, Fail Fast in the World of Volumetric Video.


In Part 1, we focused on garbage in, garbage out. The point was technical: if the capture is wrong, the rest of the pipeline becomes more expensive.


In Part 2, we focused on the medium being driven by content. The point was commercial: volumetric video will not become a medium simply because the technology works. It becomes a medium when the content is strong enough to create repeat demand.


In Part 3, we focused on the impact that standards and patents could have on Gaussian splats. The point was to understand the patent life cycle and the three standards bodies, each with a different IP policy, so that you can judge whether a codec war is coming and how it would affect your business, your research, or your use of Gaussian splats.


So where do we go from here?

If volumetric video is going to become more than a series of impressive demos, the industry needs to focus on the ecosystem. That means standards, interoperability, honest technical discussion, shared terminology, real benchmarks, hardware and software alignment, audio, content protection, playback consistency, and support for the companies, creators, researchers, broadcasters, and hardware manufacturers solving different parts of the same problem.

Most importantly, it means treating volumetric video as a media format, not just a file format.


Volumetric Video Needs a Real Media Format

When we say volumetric video needs a format, we do not mean only a codec. We do not mean only a file extension, a renderer, a compression method, or a container. A real media format is much bigger than that.


In the media industry, a format defines how content is captured, packaged, published, delivered, controlled, protected, and played back. It includes audio and video. It includes metadata. It includes synchronization. It includes interactivity, even if that interactivity is as basic as play, pause, seek, fast forward, rewind, or what the industry often calls trick play. In an internet-based medium, it also includes how content is streamed and how playback systems request, receive, and display that content.


For volumetric video, a real format has to reach even further upstream. It should help define the capture environment, camera placement, synchronization, calibration expectations, metadata requirements, processing stages, compression, streaming, playback behavior, and the way different systems interoperate.


That is the difference between a codec and a media format. A codec may solve one part of the problem. A media format helps the entire ecosystem work together.


The broadcast industry already understands this. A production team can buy a grandmaster clock from one manufacturer, connect it to a switcher from another manufacturer, use cameras from multiple brands, route signals through converters, encoders, monitors, replay systems, and delivery infrastructure, and expect the system to work. That does not happen by accident. It happens because the industry has shared standards, shared timing, shared signal expectations, and shared operational practices.


That is what volumetric video needs.

If every company builds its own camera rig, metadata model, compression path, playback behavior, and workflow, the industry stays fragmented. Every new team has to start from scratch. Every content creator has to learn a different pipeline. Every hardware company has to guess what to support. Every platform becomes its own island.


That is not how a medium scales.


A real volumetric media format would give companies a common foundation to build from. It would not eliminate innovation. It would make innovation easier because companies could focus on the parts of the stack where they actually add value instead of reinventing everything around them.


The goal is not to force every company to build the same product. The goal is to create enough shared structure that cameras, capture systems, processing tools, encoders, playback engines, hardware devices, and content workflows can actually work together.


That is how volumetric video moves from isolated demos to a real media ecosystem.


Standards Bodies Help Industries Scale

If volumetric video is going to become a real media ecosystem, it needs standards bodies that understand the full picture. Standards cannot focus only on compression, rendering, or file packaging. They also need to consider capture, playback, metadata, audio, DRM, device compatibility, hardware interoperability, and the commercial realities of companies investing real R&D into the space.


A healthy standards ecosystem also needs a fair IP policy. Companies that invest in real innovation should have a way to protect and, if they choose, monetize their work. At the same time, that process has to be fair, transparent, and available under consistent terms. It cannot become a system where one company can pick winners and losers by changing pricing or access depending on who is asking.


That balance matters. If IP is ignored, companies may be less willing to contribute. If IP is abused, the ecosystem fragments. A good standards body has to create a framework where innovation is respected, adoption is encouraged, and the industry can build together.

That is one of the reasons the Volumetric Format Association exists.


What the Volumetric Format Association Is Building

The Volumetric Format Association (VFA) has been working since 2021 to help define volumetric video as a media format, not just a one-off technology stack. Its focus has been interoperability: helping different parts of the industry work together across capture systems, processing tools, encoding systems, playback engines, hardware devices, software platforms, and content workflows.


A true volumetric media format needs to address more than visuals. It needs to support the expectations of the media industry, including consistent playback, content protection, quality requirements, metadata, audio, streaming, and device behavior.


DRM is one example. Content owners need a way to protect their work. If volumetric media is going to be used by sports leagues, broadcasters, studios, platforms, brands, and rights holders, content protection cannot be an afterthought.


Consistent playback is another example. A content creator should not have to worry that one device will deliver a completely different experience than another device. If a piece of content is VFA-compliant and a device is VFA-compliant, the expectation should be that the audience receives a consistent, high-quality experience.


That does not mean every device will have the same performance. Devices will always vary. But the format should define baseline expectations for playback, visual quality, audio quality, interactivity, and user experience.


This is one of the lessons from the media industry. Audiences often pay for content, and they expect that content to work. They expect quality. They expect consistency. They expect audio and video to remain in sync. They expect controls like play, pause, seek, rewind, and fast forward to behave predictably.
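
To make the sync expectation concrete, here is a minimal Python sketch of how a playback engine might measure the offset between the audio and video timelines. The helper names, the 30 fps and 48 kHz rates, and the 20 ms tolerance are illustrative assumptions, not values taken from any VFA specification.

```python
from fractions import Fraction


def av_offset_ms(frame_index, fps, sample_index, sample_rate):
    """Return the audio-minus-video offset in milliseconds (hypothetical helper).

    Exact rational arithmetic avoids rounding drift with rates such as 30000/1001.
    """
    video_time = Fraction(frame_index) / Fraction(fps)
    audio_time = Fraction(sample_index) / Fraction(sample_rate)
    return float((audio_time - video_time) * 1000)


def in_sync(offset_ms, tolerance_ms=20.0):
    """Check the offset against an assumed tolerance; a real format would define its own."""
    return abs(offset_ms) <= tolerance_ms


# Frame 300 at 30 fps sits at 10.00 s; sample 480,960 at 48 kHz sits at 10.02 s.
offset = av_offset_ms(300, 30, 480_960, 48_000)
print(offset, in_sync(offset))
```

A positive offset means the audio timeline is ahead of the video timeline. A shared format would let every compliant player apply the same tolerance instead of each vendor choosing its own.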


Volumetric video needs the same discipline. That is what it means to build a media format.


This Cannot Be Software-Only

Volumetric video cannot be treated as a software-only problem. Software is essential. Reconstruction, compression, rendering, streaming, playback, and interactivity all require software. But media infrastructure has never been software alone.


The broadcast industry is built around hardware and software working together. Cameras, lenses, sync generators, grandmaster clocks, switchers, encoders, decoders, replay systems, monitors, control surfaces, and delivery infrastructure all need to interoperate. That is how professional media systems work.


Volumetric video needs that same mindset.

If the industry only focuses on software, then software ends up trying to solve problems hardware already handles extremely well. Timing, synchronization, capture, encoding, decoding, and real-time processing all benefit when hardware manufacturers are part of the standards conversation.


The optical disc industry is a useful example. DVD succeeded because the format was not just a file or a codec. It was an ecosystem. The MPEG-2 decoder, disc structure, playback behavior, authoring tools, hardware players, and content requirements all worked together. At the time, software-only MPEG-2 encoding or decoding could be painfully slow. Dedicated hardware made the experience practical for consumers.


That same lesson applies here. If volumetric video is going to scale, hardware manufacturers need to be involved early. Camera companies, lens companies, GPU companies, encoder manufacturers, playback device makers, broadcast hardware companies, and infrastructure providers all need a seat at the table.


Otherwise, every company builds around its own assumptions, and the ecosystem stays fragmented.


Audio Has to Be Part of the Format

Audio cannot be an afterthought.

There is an old lesson in media: audiences will tolerate imperfect picture quality longer than they will tolerate bad audio. If the image is slightly grainy but the sound is clear, people may stay engaged. If the image is pristine but the audio is painful, out of sync, or hard to understand, the experience breaks quickly.


Volumetric media is no different. In fact, audio may be even more important in volumetric video because the viewer may be able to change perspective. If the viewer moves through a scene, the sound should make sense from that new position. If a performer moves, the sound should reflect that movement. If the playback engine supports spatial audio, the format should be able to support that experience.
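
As a rough illustration of sound that follows the viewer, the Python sketch below scales a source's loudness by the viewer's distance using a simple inverse-distance model. The function name, the coordinates, and the 1 metre reference distance are assumptions made for this example. A real spatial audio renderer would also handle direction, reverberation, and occlusion.

```python
import math


def inverse_distance_gain(listener, source, ref_distance=1.0):
    """Gain for a point source under a simple inverse-distance model.

    Positions are (x, y, z) tuples in metres. Inside ref_distance the
    gain is clamped to 1.0 so it never exceeds the reference level.
    """
    distance = math.dist(listener, source)
    return ref_distance / max(distance, ref_distance)


# Hypothetical performer at the origin, heard from three viewer positions.
performer = (0.0, 0.0, 0.0)
for viewer in [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 4.0, 0.0)]:
    print(viewer, round(inverse_distance_gain(viewer, performer), 3))
```

The same calculation can run again whenever the viewer or the performer moves, which is why the format needs to carry source positions alongside the audio itself.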


The VFA has treated audio as part of the format conversation from the beginning. That includes keeping audio in sync with video, supporting playback requirements, and expanding toward spatial audio as part of the playback specification.


A volumetric media format should not be limited to mono or stereo. Those may be supported, but the format also needs a path for spatial audio, object-based audio, and future audio workflows that make sense for immersive and spatial experiences.

This matters for sports, concerts, live events, education, telepresence, training, and entertainment. A volumetric media format that solves only the visual layer is incomplete. The format has to solve the media experience.


At Skyrim.AI, we are also exploring new approaches to audio capture, including early-stage R&D using fiber optics. That work is still early, but we believe there is significant potential in rethinking how audio is captured for volumetric and spatial media environments.

The broader point is simple: if volumetric video is going to become a real medium, audio has to be designed into the format, not added later.


Be Honest About Where the Technology Is

The next thing the industry can do is be more honest about the state of the technology. That does not mean being negative. It means being useful.


The industry does not need more hype cycles where a single trade show demo or one-off event resets expectations. It does not need marketing that makes the hard problems look solved. It does not need every new rendering method to be positioned as the thing that will finally make volumetric video mainstream.


We need a healthier technical conversation about where the technology works today, where it does not work, what scales, what does not scale, what is a real-time pipeline versus an offline reconstruction process, what is production-ready versus demo-ready, and what is a media product versus a research result.


Those distinctions matter. If we are honest about where the technology is, we can focus energy on the right problems. We can avoid wasting years on use cases that do not have a business case. We can reduce repeated mistakes. We can help new teams enter the space with better context.


That is part of failing fast. It is not about failing privately for years. It is about failing openly enough that the industry gets smarter.


Hold Each Other Accountable

The volumetric video industry is still small. That is actually an advantage. It means we can have real conversations. It means companies can learn from one another. It means researchers, startups, media companies, hardware vendors, and standards groups can work together before the ecosystem becomes too fragmented.


But that only works if we hold each other accountable.


That includes us. We are happy to be challenged. If we get something wrong, we want to know. If someone has solved a problem in a different way, we want to hear about it. If there is a better implementation, a better use case, a better workflow, or a better technical approach, that should be part of the conversation.


The goal is not to win an argument. The goal is to move the industry forward. Volumetric video does not need bad marketing. It needs credible messaging, clear claims, shared terminology, honest benchmarks, and technical and commercial accountability.


Stand on the Shoulders of Giants

We believe the answer is simple: the industry needs to stand on the shoulders of giants.

That means learning from the companies, researchers, engineers, standards bodies, and creators who have already spent years working through these problems. It means not starting from scratch every time a new capture method, rendering technique, compression claim, or stage comes online. It means understanding what has already been tried, what worked, what failed, and why.


Companies that do this will be years ahead of those trying to solve the entire stack alone.

That is the real call to action. If you are building in volumetric video, learn what the Volumetric Format Association is doing. Ask questions. Participate in the standards conversation. Understand how interoperability, playback, audio, metadata, DRM, capture guidance, and hardware alignment can help turn volumetric video into a real media ecosystem.



If you are using 36 or more cameras to capture a single person, we encourage you to read Part 1 of this series: Garbage In, Garbage Out. It is detailed for a reason. It is based on 10 years of hands-on experience across capture systems, lenses, sensors, calibration, sync, timecode, color balance, point cloud cleanup, and reconstruction. This is not theoretical. These are lessons learned by building, testing, breaking, and rebuilding real systems.


If you are building around Gaussian splats, take the time to understand the intellectual property landscape, especially around compression, streaming, editing, and production workflows. If splats become a major part of the volumetric media ecosystem, companies will need to understand where they stand: offensively, defensively, or somewhere in between.


Do that work early. Do not wait until the market matures to discover that your pipeline depends on technology you do not control.


At Skyrim.AI, we have taken a different approach. We are sharing that approach publicly. We are inviting feedback. We are asking people to challenge what we are saying, and we will continue to challenge ourselves and others in return.


We believe the next six months are a pivotal window for volumetric video. The industry can repeat the last decade by building more isolated capture stages, more one-off demos, and more tools without a clear content path. Or it can focus on the larger ecosystem: standards, interoperability, hardware, audio, content, distribution, and use cases that audiences actually want.


We have always believed content comes first: not the stage, not the demo, and not the renderer. If volumetric video is going to become a true media format, it will not be because one company builds the perfect capture stage. It will be because an ecosystem forms around content that people want to watch, use, share, and experience again.


That is the path forward. Stand on the shoulders of giants. Learn from what has already been tried. Join the standards conversation. Challenge the claims. Share what you have learned. Help build a volumetric media ecosystem that can actually scale.


The future of volumetric video will not be built by one company alone. It will be built by an industry that learns how to fail fast together.

