Fast Multi-View Rendering for Real-Time Applications

(1)

EuroGraphics Symposium on Parallel Graphics and Visualization 2020

Fast Multi-View Rendering for Real-Time Applications

Johannes Unterguggenberger, Bernhard Kerbl, Markus Steinberger, Dieter Schmalstieg, and Michael Wimmer

TU Wien, Institute of Visual Computing & Human-Centered Technology, Austria

(2)

Rendering Multiple Views

Stereo Rendering for VR

Visibility Algorithms (e.g. Creation of a Potentially Visible Set)

Shadow Maps

(3)

1/5

Different Graphics Pipeline Configurations

How about using brute-force?

(4)

How About Just Using Multiple Passes?

(5)

Changed Framebuffer Layout

“Multi-Pass with a large, partitioned framebuffer”
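The partitioned framebuffer layout can be illustrated with a minimal sketch. It assumes that each view gets one equally sized rectangle in a single large framebuffer, laid out row by row; the names `Viewport`, `partition_framebuffer` and `framebuffer_size`, as well as the 2x2 grid of 800x600 views in the usage lines, are hypothetical and chosen only for illustration. The slides do not state the exact layout used.

```python
from typing import List, NamedTuple, Tuple


class Viewport(NamedTuple):
    """Pixel rectangle of one view inside the shared framebuffer."""
    x: int
    y: int
    width: int
    height: int


def partition_framebuffer(view_w: int, view_h: int,
                          cols: int, rows: int) -> List[Viewport]:
    """Return one viewport per view, ordered row by row (assumed layout)."""
    if cols <= 0 or rows <= 0:
        raise ValueError("cols and rows must be positive")
    return [
        Viewport(col * view_w, row * view_h, view_w, view_h)
        for row in range(rows)
        for col in range(cols)
    ]


def framebuffer_size(view_w: int, view_h: int,
                     cols: int, rows: int) -> Tuple[int, int]:
    """Total size of the large framebuffer that holds all views."""
    return cols * view_w, rows * view_h


if __name__ == "__main__":
    # Illustrative values: a 2x2 grid of 800x600 views.
    for vp in partition_framebuffer(800, 600, 2, 2):
        print(vp)
    print(framebuffer_size(800, 600, 2, 2))
```

In a multi-pass renderer, each pass would then set its viewport to one of these rectangles instead of binding a separate render target.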

(6)

Performance Difference Between [variant icon] and [variant icon]

[Figure: relative frame times in milliseconds; variant icons and plot data lost in extraction]

(7)

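Performance Difference Between [variant icon] and [variant icon]

[Figure: frame-time difference per GPU; axis labels "faster by -x ms" and "faster by x ms"; variant icons and plot data lost in extraction]

GPUs: NVIDIA GTX 780, NVIDIA GTX 980, NVIDIA GTX 1060, NVIDIA GTX 1650 SUPER, NVIDIA RTX 2080, NVIDIA RTX 2080 Ti, AMD R9 380, AMD RX 580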

(8)

2/5

The Journey of the Triangles

“Geometry Amplification”

(9)

The Journey of the Triangles

(10)

The Journey of the Triangles

(11)

Moving the Loop into the Geometry Shader

(12)

Moving the Loop into the Geometry Shader

“Geometry Shader Amplification”

(13)

Replacing the Loop with Instancing

“Geometry Shader Instancing”
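The difference between the two geometry-shader variants can be modelled on the CPU. The sketch below is an illustration, not the shader code from the talk: it assumes that "amplification" means one geometry-shader invocation looping over all N views, and that "instancing" means N invocations that each emit the triangle once. In GLSL, the latter corresponds to declaring `layout(invocations = N) in;` and reading `gl_InvocationID`, with the target view selected via `gl_Layer` or `gl_ViewportIndex`. The function names are hypothetical.

```python
from typing import List, Tuple

Triangle = Tuple[int, int, int]   # three vertex indices
Emitted = Tuple[int, Triangle]    # (target view index, triangle)


def gs_amplification(tri: Triangle, num_views: int) -> List[Emitted]:
    """Model of one invocation that loops over all views serially."""
    return [(view, tri) for view in range(num_views)]


def gs_instancing(tri: Triangle, num_views: int) -> List[Emitted]:
    """Model of num_views invocations, each emitting for its own view.

    On a GPU these invocations can run in parallel; the Python loop
    only stands in for the hardware scheduling them.
    """
    emitted: List[Emitted] = []
    for invocation_id in range(num_views):
        emitted.append((invocation_id, tri))
    return emitted


if __name__ == "__main__":
    tri = (0, 1, 2)
    print(gs_amplification(tri, 4) == gs_instancing(tri, 4))
```

Both variants produce the same set of primitives; they differ only in how the work is distributed, which is why their measured performance can diverge.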

(14)

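Performance: [variant icon] – [variant icon]

N ∈ {2, 4}

[Figure: frame-time difference per GPU; axis labels "faster by -x ms" and "faster by x ms"; variant icons and plot data lost in extraction]

GPUs: NVIDIA GTX 980, NVIDIA GTX 1060, NVIDIA GTX 1650 SUPER, NVIDIA RTX 2080, NVIDIA RTX 2080 Ti, AMD RX 580, AMD R9 380

(15)

Performance: [variant icon] – [variant icon]

N ∈ {16, 32}

[Figure: frame-time difference per GPU; axis labels "faster by -x ms" and "faster by x ms"; variant icons and plot data lost in extraction]

GPUs: NVIDIA GTX 980, NVIDIA GTX 1060, NVIDIA GTX 1650 SUPER, NVIDIA RTX 2080, NVIDIA RTX 2080 Ti, AMD RX 580, AMD R9 380

(16)

Performance: [variant icon] / [variant icon]

N ∈ {16, 32}

[Figure: performance of [variant icon] w.r.t. [variant icon], per GPU; variant icons and plot data lost in extraction]

GPUs: NVIDIA GTX 980, NVIDIA GTX 1060, NVIDIA GTX 1650 SUPER, NVIDIA RTX 2080, NVIDIA RTX 2080 Ti, AMD RX 580, AMD R9 380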

(17)

Different Configurations

6+2 different GPUs:

2 AMD: GCN 3 (R9 380), GCN 4 (RX 580)

6 NVIDIA: Kepler (GTX 780), Maxwell (GTX 980), Pascal (GTX 1060), Turing (GTX 1650 SUPER, RTX 2080, RTX 2080 Ti)

6 scenes

Different scene-geometry divisions => varying number of draw calls

Ranging from ~50 to ~1000 draw calls per view/per set of views, depending on scene and view position

4 view configs: 2x1, 2x2, 4x4, 8x4

3 resolutions: 800x600, 1920x1080, 1440x1600

Light/heavy vertex shader load

Additional fragment shader load (G-Buffer rendering)

With a lot/almost no overlap between views (latter: shadow maps)

(18)

3/5

Hardware-Accelerated Multi-View Rendering

OVR_multiview

(19)

Recap: Geometry Shader Instancing

(20)

Pipeline for Hardware-Accelerated Multi-View Rendering

“Hardware-Accelerated Multi-View” or “OVR Multi-View” (Oculus VR)

(21)

“Hardware-Accelerated Multi-View” or “OVR Multi-View” (Oculus VR)

Pipeline for Hardware-Accelerated Multi-View Rendering

“To improve efficiency, the NVIDIA compiler analyzes the input shader and produce a compiled output that executes view independent code once, with the result shared across all output views, while view dependent attributes are necessarily computed once per output view.”

NVIDIA, TURING GPU ARCHITECTURE

(22)

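Performance: [variant icon] / [variant icon]

N ∈ {2, 4, 16, 32}

[Figure: performance of [variant icon] w.r.t. [variant icon], per GPU; variant icons and plot data lost in extraction]

GPUs: NVIDIA GTX 980, NVIDIA GTX 1060, NVIDIA GTX 1650 SUPER, NVIDIA RTX 2080, NVIDIA RTX 2080 Ti

(23)

Performance: [variant icon] / [variant icon]

N ∈ {2, 4}

[Figure: performance of [variant icon] w.r.t. [variant icon], per GPU; variant icons and plot data lost in extraction]

GPUs: NVIDIA GTX 980, NVIDIA GTX 1060, NVIDIA GTX 1650 SUPER, NVIDIA RTX 2080, NVIDIA RTX 2080 Ti

(24)

Performance: [variant icon] / [variant icon]

N ∈ {16, 32}

[Figure: performance of [variant icon] w.r.t. [variant icon], per GPU; variant icons and plot data lost in extraction]

GPUs: NVIDIA GTX 980, NVIDIA GTX 1060, NVIDIA GTX 1650 SUPER, NVIDIA RTX 2080, NVIDIA RTX 2080 Ti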

(25)

Hardware Acceleration Support

How many views can actually be hardware-accelerated with ONE draw call?

“Turing hardware supports up to four views per pass, and up to 32 views are supported at the API level.”

NVIDIA, TURING GPU ARCHITECTURE

“With Pascal’s SMP engine, which supports two separate projection centers, the GPU can render the two stereo projections directly in a single rendering pass.”

NVIDIA, GeForce GTX 1080 Whitepaper
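The consequence of these limits can be worked out with simple arithmetic. The sketch below assumes that a request for N views is served in consecutive batches of at most the hardware limit, so the number of hardware passes is ceil(N / limit). That batching model is an assumption for illustration; the quotes above only state the per-pass limits. Treating Pascal's two projection centers as "two views per pass" is likewise an assumption, and `hardware_passes` is a hypothetical helper.

```python
import math

# Per-pass view limits taken from the two NVIDIA quotes above.
# Mapping Pascal's "two projection centers" to 2 views is an assumption.
VIEWS_PER_PASS = {"Turing": 4, "Pascal": 2}


def hardware_passes(num_views: int, views_per_pass: int) -> int:
    """Number of passes needed if views are processed in full batches."""
    if num_views < 0 or views_per_pass <= 0:
        raise ValueError("num_views must be >= 0 and views_per_pass > 0")
    return math.ceil(num_views / views_per_pass)


if __name__ == "__main__":
    for arch, limit in VIEWS_PER_PASS.items():
        for n in (2, 4, 16, 32):
            print(arch, n, hardware_passes(n, limit))
```

Under this model, 32 views on Turing would still need eight hardware passes even though they are issued with one draw call at the API level.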

(26)

Hardware-Acceleration for ALL Views per Draw-Call

(27)

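Performance: [variant icon] / [variant icon]

N ∈ {2, 4, 16, 32}

[Figure: performance of [variant icon] w.r.t. [variant icon], per GPU; variant icons and plot data lost in extraction]

GPUs: NVIDIA GTX 980, NVIDIA GTX 1060, NVIDIA GTX 1650 SUPER, NVIDIA RTX 2080, NVIDIA RTX 2080 Ti

(28)

Performance: [variant icon] / [variant icon]

N ∈ {2, 4, 16, 32}

[Figure: performance of [variant icon] xor [variant icon] w.r.t. [variant icon], per GPU; variant icons and plot data lost in extraction]

GPUs: NVIDIA GTX 980, NVIDIA GTX 1060, NVIDIA GTX 1650 SUPER, NVIDIA RTX 2080, NVIDIA RTX 2080 Ti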

(29)

[Figure: two performance plots comparing pipeline variants ("w.r.t." and "xor" comparisons); variant icons and plot data lost in extraction]

(30)

4/5

Optimizing Geometry-Shader-Based Pipeline Variants

Culling, and Smaller Batches of Views
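Both ideas can be sketched on the CPU. The slides do not spell out the culling test or the batch size, so the following is an assumption-laden illustration: it uses the standard conservative frustum test in homogeneous clip space (a triangle is rejected for a view if all three vertices lie outside the same clip plane, with the OpenGL convention -w <= x, y, z <= w), and a plain chunking helper for splitting the views into smaller batches. `outside_same_plane` and `batches` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Vec4 = Tuple[float, float, float, float]  # clip-space position (x, y, z, w)


def outside_same_plane(tri: Sequence[Vec4]) -> bool:
    """True if all vertices are outside one common clip plane.

    Conservative: a True result means the triangle cannot be visible
    in this view; a False result does not guarantee visibility.
    Assumes the OpenGL clip volume -w <= x, y, z <= w.
    """
    plane_tests = (
        lambda v: v[0] < -v[3],  # left
        lambda v: v[0] > v[3],   # right
        lambda v: v[1] < -v[3],  # bottom
        lambda v: v[1] > v[3],   # top
        lambda v: v[2] < -v[3],  # near
        lambda v: v[2] > v[3],   # far
    )
    return any(all(test(v) for v in tri) for test in plane_tests)


def batches(views: Sequence[int], batch_size: int) -> List[List[int]]:
    """Split the view indices into consecutive batches of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(views[i:i + batch_size])
            for i in range(0, len(views), batch_size)]


if __name__ == "__main__":
    visible = [(0.0, 0.0, 0.0, 1.0), (0.5, 0.0, 0.0, 1.0), (0.0, 0.5, 0.0, 1.0)]
    print(outside_same_plane(visible))
    print(batches(range(32), 8))
```

In a geometry shader, a test of this kind would be evaluated per view before emitting the triangle, so that views which cannot see it produce no output.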

(31)

Recap: Geometry Shader Instancing

(32)

Geometry Shader Instancing OPTIMIZED

(33)

Geometry Shader Instancing OPTIMIZED

(34)

[Figure: two performance plots comparing pipeline variants ("w.r.t." and "xor" comparisons); variant icons and plot data lost in extraction]

(35)

5/5

Performance Trends

from 8.9 million measurements

(36)

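[Figure: two performance plots comparing pipeline variants ("w.r.t." and "xor" comparisons); variant icons and plot data lost in extraction]

(37)

[Figure: two performance plots comparing pipeline variants ("w.r.t." and "xor" comparisons); variant icons and plot data lost in extraction]

(38)

[Figure: two performance plots comparing pipeline variants ("w.r.t." and "xor" comparisons); variant icons and plot data lost in extraction]

N ∈ {16, 32}

Large scenes only

(39)

[Figure: two performance plots comparing pipeline variants ("w.r.t." comparisons); variant icons and plot data lost in extraction]

N ∈ {16, 32}

Large scenes only

Turing only

(40)

[Figure: two performance plots comparing pipeline variants ("w.r.t." comparisons); variant icons and plot data lost in extraction]

N ∈ {16, 32}

Large scenes only

Maxwell and Pascal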

(41)

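[Figure: two performance plots comparing pipeline variants ("w.r.t." and "xor" comparisons); variant icons and plot data lost in extraction]

N ∈ {16, 32}

Large scenes only

Only low-tier GPUs (but also Turing!)

GPUs: NVIDIA GTX 1060, NVIDIA GTX 1650 SUPER

(42)

[Figure: two performance plots comparing pipeline variants ("w.r.t." and "xor" comparisons); variant icons and plot data lost in extraction]

N ∈ {16, 32}

Large scenes only

Only low-tier GPUs (but also Turing!)

Heavy Vertex Load

GPUs: NVIDIA GTX 1060, NVIDIA GTX 1650 SUPER

(43)

[Figure: two performance plots comparing pipeline variants ("w.r.t." and "xor" comparisons); variant icons and plot data lost in extraction]

N ∈ {16, 32}

Large scenes only

Heavy Vertex Load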

(44)

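[Figure: two performance plots comparing pipeline variants ("w.r.t." and "xor" comparisons); variant icons and plot data lost in extraction]

N ∈ {16, 32}

Large scenes only

↑ Fragment Load (G-Buffer Rendering)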

(45)

Viewport Discrepancy

Lots of shared vertices for similar views

What about incoherent view frusta? (e.g. shadow mapping)

(46)

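[Figure: two performance plots comparing pipeline variants ("w.r.t." and "xor" comparisons); variant icons and plot data lost in extraction]

N ∈ {16, 32}

Large scenes only

Incoherent Frusta (“Shadow Mapping”)

(47)

[Figure: two performance plots comparing pipeline variants ("w.r.t." and "xor" comparisons); variant icons and plot data lost in extraction]

N ∈ {16, 32}

Large scenes only

Incoherent Frusta (“Shadow Mapping”)

(48)

[Figure: two performance plots comparing pipeline variants ("w.r.t." comparisons); variant icons and plot data lost in extraction]

AMD RX 580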

(49)

Conclusion

Different Pipeline Variants

The Journey of the Triangles

Hardware-Accelerated Multi-View Rendering

Optimizing Geometry-Shader-Based Pipeline Variants

Performance Trends

[variant icon] vs. [variant icon]

(50)

Thank You For Your Attention

Fast Multi-View Rendering for Real-Time Applications

Johannes Unterguggenberger¹, Bernhard Kerbl¹, Markus Steinberger², Dieter Schmalstieg², Michael Wimmer¹

¹ TU Wien, Institute of Visual Computing & Human-Centered Technology, Austria

² Graz University of Technology, Austria
