TL;DR:

  • Spatial computing design breaks most 2D assumptions — position, depth, and physical comfort are constraints that screens never imposed
  • The vergence-accommodation conflict causes eye fatigue in poor implementations; keeping UI elements at comfortable focal distances isn’t aesthetic preference, it’s physiology
  • Hand tracking UX requires explicit affordances — users can’t discover interaction models that have no physical-world equivalent

Spatial computing design isn’t 2D UI design with an extra dimension. The constraints are different in kind: position in space affects cognitive load, depth affects eye comfort, and the absence of a physical surface removes the tactile feedback cues that make touch interfaces intuitive. Here’s a practical set of principles for teams designing AR and VR interfaces in 2026.

Designing in Depth: The 3D Layout Problem

In 2D design, layout is composition — elements are positioned on a flat plane. In spatial computing, depth is a layout dimension with functional and perceptual consequences, and getting it wrong is immediately noticeable.

The region from roughly 0.5m to 3m from the viewer is the sweet spot for AR content. Closer than 0.5m creates eye strain: the eyes have to converge sharply, and objects that near are difficult to fuse binocularly. Beyond 3–4m, small UI elements become hard to read and fine interaction gets difficult.

Apple’s visionOS design guidelines use a concept of orbital distance — UI surfaces are placed at a consistent depth from the user, like objects on an invisible sphere. This simplifies focal distance management and prevents the design problem of UI elements at many different depths competing for focus.
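As a rough illustration of placing UI at one consistent depth, the sketch below positions panels on a sphere around the viewer and clamps the radius to the 0.5m to 3m range discussed above. The function names, the viewer-centred coordinate frame (+x right, +y up, -z forward), and the default radius and spacing are assumptions made for this example, not part of any platform API.

```python
import math

MIN_COMFORT_M = 0.5  # near limit of the comfortable range described in the text
MAX_COMFORT_M = 3.0  # far limit of the comfortable range described in the text


def place_on_sphere(radius_m, yaw_deg, pitch_deg=0.0):
    """Return (x, y, z) in a viewer-centred frame: +x right, +y up, -z forward.

    The radius is clamped to the comfortable range, so a request for 0.2 m
    is pushed back out to 0.5 m.
    """
    r = min(max(radius_m, MIN_COMFORT_M), MAX_COMFORT_M)
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    x = r * math.cos(pitch) * math.sin(yaw)
    y = r * math.sin(pitch)
    z = -r * math.cos(pitch) * math.cos(yaw)
    return (x, y, z)


def layout_row(count, radius_m=1.5, spacing_deg=25.0):
    """Spread `count` panels symmetrically around straight ahead, all at one depth.

    radius_m and spacing_deg are illustrative defaults, not recommendations.
    """
    start = -spacing_deg * (count - 1) / 2
    return [place_on_sphere(radius_m, start + i * spacing_deg) for i in range(count)]
```

Because every panel in a row shares the same radius, the viewer's focal distance stays constant as their gaze moves from one panel to the next.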

Good depth cues establish spatial presence and are worth getting right. Occlusion is the single most powerful depth cue: objects that are partially hidden behind other objects read as further away. Don’t flatten this by compositing layers without occlusion. Size scaling, where objects appear smaller with distance, makes depth intuitive. Motion parallax (near objects shifting more than far objects as your head moves) is handled by the platform, so don’t override it. And contact shadows matter: a virtual object with no contact shadow appears to float; a shadow grounds it on the surface it’s resting on.

Vergence-Accommodation Conflict

The vergence-accommodation conflict (VAC) is the most studied cause of VR eye fatigue and is directly relevant to AR UI design decisions. It’s worth understanding even if the name sounds academic.

When you focus on a near object, your eyes do two things in concert: they converge (rotate inward) and accommodate (change lens focus). In natural vision, both responses are driven by the same distance. In most XR headsets, the optics present the image at a fixed focal distance while objects are rendered at various virtual depths. The eyes converge on the virtual depth but must keep focusing at the display's focal plane, and that mismatch causes fatigue.

Practically, this means: avoid placing interactive UI elements too close to the viewer (below 0.5m virtual distance), limit the depth range across a single scene, and keep critical content at a consistent virtual depth in extended-wear applications. Variable focus displays address VAC at the hardware level, but most deployed hardware in 2026 doesn’t have this.
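The size of the conflict can be estimated with simple geometry. The sketch below computes the vergence angle for a given viewing distance and the mismatch between vergence and accommodation demand in diopters (1/metres). The 63 mm interpupillary distance is a typical adult figure used as a default, and the 1.3 m focal distance in the usage note is a hypothetical value; check the real figure for your target headset.

```python
import math


def vergence_angle_deg(distance_m, ipd_m=0.063):
    """Angle between the two lines of sight when fixating a point straight ahead.

    ipd_m is the interpupillary distance; 0.063 m is an assumed typical value.
    """
    return math.degrees(2 * math.atan((ipd_m / 2) / distance_m))


def vac_mismatch_diopters(virtual_distance_m, focal_distance_m):
    """Gap between vergence demand and accommodation demand, in diopters.

    virtual_distance_m: where the object is rendered (drives vergence).
    focal_distance_m: where the headset optics place the image (drives focus).
    """
    return abs(1 / virtual_distance_m - 1 / focal_distance_m)


def depth_range_diopters(virtual_distances_m):
    """Spread of vergence demand across a scene; smaller means less refocusing effort."""
    demands = [1 / d for d in virtual_distances_m]
    return max(demands) - min(demands)
```

With an assumed 1.3 m focal plane, an element rendered at 0.5 m gives a mismatch of about 1.23 diopters, while an element at 1.3 m gives zero. Working in diopters rather than metres also shows why near placement matters most: moving from 0.5 m to 1 m changes the demand by a full diopter, while moving from 2 m to 3 m changes it by about 0.17.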

VAC isn’t an obscure academic concern — it’s why some users report headaches after 30 minutes in poorly designed XR applications.

Hand Tracking UX Patterns

Hand tracking is available on Vision Pro (hands only, no controllers), Meta Quest 3 (hands or controllers), HoloLens 2, and increasingly on other devices. Designing for hand tracking without controllers requires rethinking standard interaction patterns.

Ray casting projects a ray from the hand or along the user's gaze, and the UI element that ray hits becomes the target. It works well for UI elements at a distance (0.5m+) and is familiar to users from cursor interactions.

Direct manipulation means reaching forward and “touching” a UI element at arm’s length. It is intuitive for nearby objects but fatiguing for extended use: holding your arms out for any length of time is tiring, which is sometimes called the “Gorilla Arm” problem.

Most well-designed spatial interfaces combine both: distant elements respond to gaze and pinch, nearby elements to direct touch.
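One way to express that combination is a distance-based switch between the two input models. The sketch below is a minimal version under stated assumptions: the 0.6 m reach threshold and the 5 cm hysteresis band are hypothetical values chosen for illustration, and the enum and function names are not taken from any SDK.

```python
from enum import Enum


class InteractionMode(Enum):
    DIRECT_TOUCH = "direct_touch"
    GAZE_AND_PINCH = "gaze_and_pinch"


def choose_mode(distance_m, current=None, reach_m=0.6, hysteresis_m=0.05):
    """Pick an input model for an element at distance_m from the user.

    reach_m is an assumed comfortable reach, not a measured value.
    The hysteresis band keeps the mode from flickering when an element
    sits right at the threshold and the user's head moves slightly.
    """
    if current is InteractionMode.DIRECT_TOUCH:
        threshold = reach_m + hysteresis_m  # stay in touch mode a little longer
    elif current is InteractionMode.GAZE_AND_PINCH:
        threshold = reach_m - hysteresis_m  # require a clear approach before switching
    else:
        threshold = reach_m
    if distance_m <= threshold:
        return InteractionMode.DIRECT_TOUCH
    return InteractionMode.GAZE_AND_PINCH
```

An element at 0.62 m stays in whichever mode it was already using, which avoids the hover highlight and hit target changing under the user's hand mid-gesture.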

New users need explicit affordances. 2D interfaces rely on familiar cues — buttons look clickable because they echo physical buttons. Spatial interfaces have no established visual vocabulary for beginners, so you need to build it in. Elements should visually respond when the cursor or hand is near them (colour change, subtle scale, or illumination increase). Audio feedback confirms interactions. Users disoriented in 3D need a reliable way to reset their view — always provide this. And never assume users know how to pinch, air tap, or wrist-turn. The first session should teach these explicitly with visual prompts.

Text Legibility in AR

Text in AR is harder to read than on a screen. The background is dynamic and uncontrolled, contrast varies, and anti-aliasing at high magnification shows rendering artefacts.

AR text should be rendered at a minimum of 24pt equivalent at the intended viewing distance. Apple’s visionOS uses 24pt as a minimum for body text at standard orbital distance — smaller than this and legibility degrades significantly on current hardware. Text floating over an uncontrolled real-world background needs treatment to maintain legibility: a frosted glass panel, a dark scrim, or a contrasting card. Don’t rely on the real-world background providing sufficient contrast, because it won’t consistently.
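To see how much a scrim helps, you can estimate text contrast against a sampled background colour before and after compositing. The sketch below borrows the WCAG 2.x relative luminance and contrast ratio formulas, which were defined for conventional screens, so treat the numbers as a rough guide rather than an XR standard. It also assumes a video passthrough pipeline where a dark layer really does darken the background; on additive optical see-through displays a black scrim cannot do that. The helper names and the 0.6 alpha are assumptions for this example.

```python
def _linear(channel):
    """Convert one sRGB channel in [0, 1] to linear light (WCAG 2.x formula)."""
    if channel <= 0.03928:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an 8-bit sRGB colour, from 0.0 (black) to 1.0 (white)."""
    r, g, b = (_linear(v / 255) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(colour_a, colour_b):
    """WCAG contrast ratio between two colours, from 1.0 up to 21.0."""
    la, lb = relative_luminance(colour_a), relative_luminance(colour_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def with_scrim(background, scrim=(0, 0, 0), alpha=0.6):
    """Alpha-blend a scrim over a sampled background colour.

    Blending is done directly on sRGB values, which is a simplification.
    """
    return tuple(round(alpha * s + (1 - alpha) * b) for s, b in zip(scrim, background))
```

In the worst case of white text over a white wall, the ratio is 1:1 and the text is invisible. The same text over that wall with a 60% black scrim comes out at roughly 5.7:1, above the 4.5:1 threshold WCAG uses for normal-size text on screens.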

For font choice, sans-serif fonts outperform serif fonts on XR displays. Thin and light font weights render poorly at sizes below 28pt. Use regular or medium weight as a minimum.

And critically: test text legibility across multiple real-world environments. Bright outdoor, dim indoor, patterned backgrounds — not just your studio setup.

Designing for Comfort at Scale

For any application intended for extended use (15 minutes or more), physical comfort is a design constraint, not a nice-to-have.

Keep interaction density reasonable. Frequent precise gestures are tiring, so batch interactions, reduce unnecessary taps, and provide shortcuts for expert users. Don’t place critical content at the edges of the FOV: it sits outside comfortable gaze range and forces repeated head movement. For long training or work sessions, prompt periodic breaks and allow comfortable pausing. And design explicitly for either sitting or standing use: the optimal UI height and radius are different for each, and trying to serve both without modes usually serves neither well.
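A simple layout check can catch content that has drifted toward the edge of view. The sketch below measures how far each element sits from the viewer's forward direction and lists the ones outside a comfort cone. The 30 degree half-angle is a hypothetical threshold for illustration, not a figure from any platform guideline, and the coordinate frame (+x right, +y up, -z forward) is likewise an assumption.

```python
import math


def angular_offset_deg(position, forward=(0.0, 0.0, -1.0)):
    """Angle between the viewer's forward direction and the direction to `position`.

    position is (x, y, z) relative to the viewer and must not be the origin.
    """
    dot = sum(p * f for p, f in zip(position, forward))
    norm = math.hypot(*position) * math.hypot(*forward)
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def outside_comfort_cone(elements, half_angle_deg=30.0):
    """Return the names of elements further off-axis than half_angle_deg.

    elements maps a name to an (x, y, z) position relative to the viewer.
    """
    return [
        name
        for name, position in elements.items()
        if angular_offset_deg(position) > half_angle_deg
    ]
```

For example, a title panel at (0, 0, -1.5) passes, while a notification at (1.5, 0, -1.0) sits about 56 degrees off-axis and would be flagged. Running a check like this separately for seated and standing layouts keeps each mode honest about where its critical content ends up.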

The Bottom Line

Spatial computing design is a distinct discipline from screen UI design. The physiological constraints — VAC, arm fatigue, field of view — aren’t stylistic choices. They determine whether an application is comfortable to use for more than 15 minutes. Invest in comfort zone testing early, design explicit affordances for hand tracking, and treat text legibility on uncontrolled backgrounds as a core technical requirement.