Rév O'Conner
The Himalayas, IN · UTC+5:30
CGI2026 · April 18 · 12 min read

Patching Meta SDK's Depth Occlusion: A Dual-Channel Portal System for Quest 3

How we patched Meta SDK v77's environment depth occlusion with a dual-channel portal system, letting virtual objects render through real-world surfaces like glass without breaking depth tests elsewhere.

Mixed-reality passthrough on Quest 3 leans on one thing more than anything else: the environment depth texture. Meta SDK samples the headset's depth sensors, reprojects into eye space, and hands you a per-pixel depth map that your shaders can compare against virtual fragment depth. If a real-world object is closer than your virtual fragment, the fragment gets occluded. That single comparison is what makes a virtual cube sit on a real table convincingly.
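Stripped of the SDK plumbing, that comparison is a one-liner. A minimal sketch with illustrative names (not the SDK's own), assuming both depths are linear eye-space values where larger means farther:

// Minimal sketch of the per-pixel occlusion test. Names are
// illustrative; the SDK wraps this in macros, covered below.
float OcclusionVisibility(float environmentDepth, float fragmentDepth)
{
    // A real surface closer than the virtual fragment hides it.
    // Otherwise the virtual fragment renders.
    return environmentDepth > fragmentDepth ? 1.0 : 0.0;
}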

The problem starts the moment your application needs an exception. You have a virtual object that must be visible through a real window. You have a portal that opens onto a virtual scene placed against a wall. You have a transparent surface in the real world that the depth sensor reads as opaque. Meta's stock occlusion has no concept of selective bypass. Depth is depth.

This is the story of patching that limitation. The end state is a dual-channel depth texture format with portal quads that flag specific regions as "do not occlude," validated by RenderDoc captures with a measured cost of roughly 0.05 ms per frame on Quest 3.

How Meta's stock occlusion works

The relevant pipeline in Meta SDK v77 is straightforward. EnvironmentDepthManager allocates an XR depth texture in D16_UNORM format at 320x320 pixels, with two array layers for stereo rendering. Each frame, the SDK populates this texture from the headset's depth sensors after reprojection. Your shaders sample it through the URP-friendly headers in EnvironmentOcclusionURP.hlsl, which exposes macros like META_DEPTH_OCCLUDE_OUTPUT_PREMULTIPLY to apply hard or soft occlusion against the depth value at the current fragment's reprojected UV.

The depth comparison itself happens in EnvironmentOcclusion.cginc. There are two main paths: close objects, where environment depth is treated as authoritative and any closer environment depth occludes the fragment, and distant objects (beyond the sensor's reliable range), where the original SDK simply hides them. Our project had already patched the distant-object behaviour so that far virtual objects still render unless something within the sensor's reliable range is genuinely in front. That earlier patch lives in the same file and is unrelated to portals, but it explains why we already had a heavily customised version of the SDK header.
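For context, that distant-object patch has roughly the shape below. This is a sketch, not the actual EnvironmentOcclusion.cginc code, and _MaxReliableDepth is a stand-in for whatever bound you use for the sensor's reliable range:

// Sketch of our earlier distant-object patch (names illustrative).
float _MaxReliableDepth;  // assumed global: edge of the sensor's useful range

float DistantAwareVisibility(float environmentDepth, float sceneDepth)
{
    if (sceneDepth > _MaxReliableDepth)
    {
        // Stock SDK: hide everything out here. Patched: occlude only
        // if something reliably measured is genuinely in front.
        bool reliableOccluder = environmentDepth < _MaxReliableDepth &&
                                environmentDepth < sceneDepth;
        return reliableOccluder ? 0.0 : 1.0;
    }

    // Close objects: environment depth is authoritative.
    return environmentDepth > sceneDepth ? 1.0 : 0.0;
}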

The depth texture is a single D16_UNORM channel. That is the entire occlusion contract. There is no slot for metadata, no mask channel, no way to tag a region as "ignore the depth here." Anything you want to do beyond depth-versus-depth has to be smuggled in somewhere else.

Why a portal needs a second channel

The naive approach to selective bypass is to stencil out portal regions before the occlusion pass and skip the comparison. This does not work cleanly inside the URP rendering loop on Quest 3 because the depth occlusion is sampled inside the regular forward fragment shaders. There is no central "occlusion pass" you can mask. Each material decides per-fragment whether to apply occlusion, and they all read from the same depth texture.

The other approach is to encode the bypass directly into the depth texture itself. Add a second channel. Treat the first channel as depth, exactly as before. Treat the second channel as a portal mask: zero means "occlude normally," non-zero means "do not occlude regardless of depth." Every existing occlusion sample reads two values instead of one, and the second value gates whether the comparison is applied.

This is the path we took. The texture format changes from D16_UNORM to R16G16_UNORM. The image usage flags change from DEPTH_STENCIL_ATTACHMENT_BIT | SAMPLED_BIT to COLOR_ATTACHMENT_BIT | INPUT_ATTACHMENT_BIT | TRANSFER_SRC_BIT | TRANSFER_DST_BIT | SAMPLED_BIT. The depth texture is no longer a depth attachment at all. It is a color target that happens to contain depth in the red channel and a portal flag in the green channel.

This sounds drastic. In practice it composes well with the existing SDK because the actual reads on the shader side are just SAMPLE_TEXTURE2D_X calls. The format change is invisible to the consumer as long as the depth value lives in .r, which it now does.
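Concretely, the consumer-side read changes by one swizzle. A sketch, assuming the texture is bound as _EnvironmentDepthTexture (the identifier in our fork; stock SDK names may differ by version):

// Before: single-channel depth read.
float envDepth = SAMPLE_TEXTURE2D_X(_EnvironmentDepthTexture,
                     sampler_EnvironmentDepthTexture, uv).r;

// After: same call, two channels. Depth stays in .r, so existing
// consumers keep working; .g carries the new portal flag.
float2 envDepthAndFlag = SAMPLE_TEXTURE2D_X(_EnvironmentDepthTexture,
                             sampler_EnvironmentDepthTexture, uv).rg;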

The portal pass

A portal in this system is a small mesh quad placed in world space wherever a real-world surface should let virtual objects render through. Window panes, glass display cases, mirror-backed alcoves. The portal mesh is rendered into the depth texture before the main forward pass, using a dedicated shader (we called it RC_Portal_OcclusionDepth). The shader writes:

  • The portal's own depth into .r (which can be far if you want full bypass, or the portal surface's actual depth if you want partial bypass at the surface itself)
  • A non-zero flag into .g to mark this pixel as "do not occlude downstream virtual fragments"

Because the target is a color attachment rather than a depth attachment, the portal pass has full control over both channels. There is no hardware depth test fighting the shader for write access. You can write whatever you want, including encoding "treat this pixel as far away" by stuffing a near-1.0 value into .r.
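The portal pass therefore boils down to a fragment shader that is free to write both channels. A minimal sketch, not the actual RC_Portal_OcclusionDepth source:

// Minimal sketch of the portal pass fragment shader (not the real
// RC_Portal_OcclusionDepth source). The target is the R16G16_UNORM
// depth texture bound as a colour attachment, so no hardware depth
// test interferes with either write.
struct Varyings
{
    float4 positionCS : SV_POSITION;
};

float4 PortalFragment(Varyings input) : SV_Target
{
    // Full bypass: claim the pixel is as far away as the encoding
    // allows. For partial bypass, write the portal surface's own
    // linear depth here instead.
    float portalDepth = 1.0;

    // Any non-zero .g marks this pixel "do not occlude".
    float portalFlag = 1.0;

    // .ba are padding; R16G16 keeps only the first two channels.
    return float4(portalDepth, portalFlag, 0.0, 0.0);
}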

In the consumer shaders, the existing occlusion macros are slightly modified to also sample .g:

float2 envDepthAndFlag = SampleEnvironmentDepthWithFlag(reprojectedUV);
float envDepth = envDepthAndFlag.r;
float portalFlag = envDepthAndFlag.g;
 
if (portalFlag > 0.5)
{
    // Portal region. Virtual fragment always wins.
    return 1.0;
}
 
// Otherwise, fall through to the standard depth comparison
return envDepth > sceneDepth ? 1.0 : 0.0;

The branch is cheap. On Adreno 740 it lives inside a fragment shader that was already sampling the depth texture, so the only added cost is reading two channels instead of one and a comparison against a constant.

What RenderDoc actually measures

Theory is fine. The interesting question is what this costs once you put it on the headset. We captured two frames with RenderDoc: one with the stock occlusion pipeline, one with the portal patch active and one portal quad in the scene.

The stock capture allocates an XR Texture in D16_UNORM at 320x320 with 2 array layers. The image's initial contents are 409,600 bytes (320 x 320 x 2 bytes per pixel x 2 layers). Image usage is SAMPLED_BIT | DEPTH_STENCIL_ATTACHMENT_BIT. Frame summary: 31 draw calls, 2 dispatch calls, 27 textures totalling 38.09 MB, 8 render targets totalling 577.89 MB, grand total of approximately 630.17 MB GPU memory committed for the frame.

The portal capture allocates a 2D color attachment in R16G16_UNORM at the same 320x320 and 2 array layers. Initial contents are exactly 819,200 bytes. Image usage is TRANSFER_SRC_BIT | TRANSFER_DST_BIT | SAMPLED_BIT | COLOR_ATTACHMENT_BIT | INPUT_ATTACHMENT_BIT. Frame summary: 32 draw calls (one more for the portal quad, vkCmdDrawIndexed(6, 1)), 2 dispatch calls, 27 textures totalling 38.09 MB, 8 render targets totalling 578.28 MB, grand total of 628.56 MB.

The portal version uses slightly less total GPU memory across the frame, which surprised me at first. The explanation is that the dual-channel color attachment removes the need for some of the depth-stencil-specific resource tracking on the SDK side, and our patched code path is more efficient at managing persistent allocations. Persistent data drops from 11.32 MB to 9.32 MB, a 2 MB reduction.

The depth texture itself doubles in memory: 409,600 bytes to 819,200 bytes. In absolute terms that is an additional 409,600 bytes, or roughly 400 KB. On a device with around 8 GB of unified memory this is dust.

For frame-time impact, the cost decomposes as:

  • One extra indexed draw call for the portal quad (six indices, two triangles). On Quest 3 this is approximately 0.03 ms including binding overhead
  • One extra texture bind: approximately 0.001 to 0.002 ms
  • Dual-channel sampling in every consumer shader instead of single-channel. The format is the same bit width per channel, so the cost is essentially the second channel fetch and the branch. Across the entire stereo render at 320x320 (about 200k samples worst case) this adds approximately 0.01 to 0.02 ms
  • Memory bandwidth: an additional 410 KB read per frame. At 72 Hz that is 29.5 MB/s, at 90 Hz that is 36.9 MB/s. Against Quest 3's roughly 34 GB/s peak memory bandwidth, that is 0.09% to 0.11%

Total added frame time is approximately 0.047 ms. At 72 Hz this is 0.34% of the 13.89 ms frame budget. At 90 Hz it is 0.42%. At 120 Hz it is 0.56%. None of those numbers should be visible against typical frame-to-frame jitter on the headset.

The transfer flags are doing real work

One detail worth pointing out from the Vulkan side: the portal texture has TRANSFER_SRC_BIT | TRANSFER_DST_BIT in its usage flags, which the stock depth texture does not. This is what lets the SDK perform image-to-image copies on the depth texture between passes if needed, which we use during the mask-mesh validation step (see below). The INPUT_ATTACHMENT_BIT similarly enables reading the depth texture as an input attachment in a later subpass, which is faster than a generic sampled read on tile-based architectures because it can stay in tile memory.

These usage flags do not cost anything on their own. They just unlock paths that are useful when you treat depth as colour.

Empty mask-mesh entries are a landmine

The portal patch shares infrastructure with our earlier mesh-mask system, which is the SDK's mechanism for letting designers tag specific virtual meshes as "ignore environment depth for these." EnvironmentDepthManager exposes a MaskMeshFilters list. Each entry is a mesh filter component whose mesh gets composited into the depth texture to mark out the bypass region.

We discovered the hard way that empty list entries (null references, or mesh filters with null shared meshes) cause complete occlusion breakdown. The mask-processing pass still runs, but it writes garbage into the depth texture, and our invalid-depth handler downstream interprets the garbage as "no reliable depth, show the virtual object," which in turn causes all occlusion to fail across the entire scene.

The fix is straightforward but worth recording. Before processing the mask mesh filters, validate that at least one entry has a valid mesh:

bool hasValidMaskMeshes = false;
if (MaskMeshFilters != null)
{
    foreach (var mf in MaskMeshFilters)
    {
        // Only count entries that are non-null AND carry a mesh;
        // either kind of empty entry feeds garbage into the mask pass.
        if (mf != null && mf.sharedMesh != null)
        {
            hasValidMaskMeshes = true;
            break;
        }
    }
}
 
if (!hasValidMaskMeshes)
{
    // Skip the mask pass entirely rather than write an empty mask
    // and break occlusion across the whole scene.
    return;
}

LINQ would compress this, but we avoid LINQ in hot paths in Unity for allocation reasons. A manual foreach is fine and produces no garbage.

Invalid depth in dark regions

The other class of bug we hit during the patch involves invalid depth data inside the sensor's nominal range. Meta's depth sensors return unreliable values in low-light regions, on highly reflective surfaces, on glass, or anywhere the structured-light or stereo correspondence breaks down. The raw sensor output for these pixels is approximately zero.

Meta's stock CalculateEnvironmentDepthHardOcclusion was treating invalid samples (rawDepth <= 0.001) as "fully occluded." This is the wrong default. Invalid depth means "I do not know what is at this pixel." That should not occlude virtual content. It should let virtual content render. The fix is a one-liner in the close-object branch:

// rawDepth <= 0.001 means the sensor has no estimate for this pixel.
bool hasValidDepth = rawDepth > 0.001;
 
if (!hasValidDepth)
{
    return 1.0;  // No reliable depth, show the virtual object
}
return environmentDepth > sceneDepth ? 1.0 : 0.0;

The original returned 0.0. Flipping this single value is what made dark-area occlusion behave correctly. The portal system depends on this being right, because the portal flag uses the same code path: a non-zero portal flag in .g is treated as "this pixel has no reliable occlusion," and the rule should be the same as for genuinely invalid depth. Show the virtual fragment.
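Put together, the two rules collapse into a single early-out ahead of the depth comparison. A sketch:

// "Portal here" and "no reliable depth here" resolve identically:
// the virtual fragment renders.
if (portalFlag > 0.5 || !hasValidDepth)
{
    return 1.0;
}
return environmentDepth > sceneDepth ? 1.0 : 0.0;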

Why this design holds up

Looking at the final shape of the system, three properties make it work on mobile VR:

The first is that depth-as-colour shifts you out of the hardware depth-test path and into shader control. Once depth is a colour attachment, you can encode anything you want in the other channels, you can write any value, and you can read back from the texture as a regular sampled texture in subsequent passes without fighting the depth attachment lifecycle. The cost is that you lose hardware early-Z on the depth pass itself, but the portal depth pass is one quad per portal, so early-Z would have nothing to optimise anyway.

The second is that the portal flag is per-pixel, not per-object. Designers can place portals anywhere in world space and the system handles them uniformly. The portal mesh is the authoring surface: a quad in Unity, with the right shader and the right material, drops into the scene and just works.

The third is that the consumer shaders barely change. The hot path is still a single texture sample plus a comparison. The added branch on the portal flag compares against a constant, which is cheap on every GPU we care about and effectively free on Adreno 740, where these branches are almost always coherent across a wavefront.

Where this falls short

The dual-channel approach gives you a binary flag per pixel. Either occlusion applies or it does not. There is no smooth blend. If you want a partial-bypass region (say, occlusion is applied at 50% strength so virtual content shows faintly through a real wall), you would need to widen the format further or repurpose one of the bits as a numeric value. R16G16 actually has plenty of precision in .g for a 0-1 occlusion modulation, so the format can support this. We just have not wired it up.
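For the record, the consumer-side change would be small. A hypothetical sketch, not wired up in our fork, treating .g as a zero-to-one bypass strength rather than a binary flag:

// Hypothetical partial bypass (not implemented): .g modulates
// occlusion strength instead of switching it off outright.
float depthVisibility = envDepth > sceneDepth ? 1.0 : 0.0;
float bypassStrength = envDepthAndFlag.g;  // 0 = occlude normally, 1 = full bypass
return lerp(depthVisibility, 1.0, bypassStrength);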

The portal mesh authoring is also still manual. There is no automatic detection of glass surfaces in the user's room, no MRUK integration that flags windows. A scene designer has to place the portal quad explicitly. For an armed-forces simulation this is fine, because the training environment is authored, not procedurally generated. For a consumer mixed-reality app this would need to be more dynamic.

Finally, the patch is forked from Meta SDK v77. Every time the SDK updates we have to merge our changes into the new version of EnvironmentOcclusion.cginc, EnvironmentOcclusionURP.hlsl, and EnvironmentDepthManager.cs. So far Meta has been stable enough that this is mechanical, but it is the kind of work that adds up across SDK releases.

What this enables

With the portal system in place, the level designers can put virtual scenes behind real windows. Through-the-window views work without any obvious break in the occlusion. The same mechanism handles training scenarios where a virtual asset needs to appear through a glass display, behind a transparent partition, or in a region of the room that the depth sensor cannot read reliably anyway. The cost is sub-millisecond and the visual quality is indistinguishable from the unpatched system in regions where the portal flag is zero.

The bigger lesson, at least for me, is that mobile VR pipelines reward thinking about depth as data rather than depth as a hardware feature. The moment we let go of the idea that the environment depth texture had to be a D16_UNORM depth attachment, the rest of the design fell out cleanly. Two channels instead of one. Colour attachment instead of depth attachment. A portal quad instead of a stencil pass. It is the same comparison, with one more bit of information attached.

Filed under: CGI