Rendering in UE4 (personal notes on the lecture by Epic Games TA Homam Bahnassi)
Presented at the Gnomon School of VFX in January 2018, part two of the class offers an in-depth look at the rendering pipeline in Unreal Engine, its terminology and best practices for rendering scenes in real-time. This course also presents guidelines and profiling techniques that improve the debugging process for both CPU and GPU performance.
These notes cover the UE4 rendering pipeline in seven parts.
Index
- 1.Intro
- 2.Before Rendering
- 3.Geometry Rendering
- 4.Rasterizing and Gbuffer
- 5.Dynamic Lighting/Shadows
- 6.Static Lighting/Shadows
- 7.Post Processing
1.INTRO
- Everything needs to be as efficient as possible
- Adjust pipelines to engine and hardware restrictions
- Try to offload parts to pre-calculations
- Use the engine's pool of techniques to achieve quality at suitable cost
- CPU and GPU handle different parts of the rendering calculations
- They are interdependent and can bottleneck each other
- Know how the load is distributed between the two
- The engine is used not only to render high-quality still images but also interactive, dynamic scenes
- It is a trade-off among quality, features, and performance
- Fit the pipelines to the engine and the hardware limits
- Precompute where possible
Shading techniques
- Real-time rendering techniques are different from offline rendering
- Expensive ray-tracing features are approximated or pre-calculated
- Depends on projection (rasterization)
- Shading/lighting is mainly done through either deferred or forward shading; UE4 supports both
Deferred Shading
1.Composition based using the GBuffer
2.Shading happens in deferred passes
3.Good at rendering dynamic lighting
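4.More flexible when it comes to disabling features, less flexible when it comes to surface attributes
Deferred shading: works through the GBuffer and is better at rendering dynamic lighting. It is more flexible about disabling features but less flexible about surface attributes.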
============================================
2.BEFORE RENDERING
CPU-Game Thread
Calculate all logic and transforms
- 1.Animations
- 2.Position of models and objects
- 3.Physics
- 4.AI
- 5.Spawn and destroy,Hide and Unhide
Anything that causes the position of objects to change
The CPU game thread computes all logic and transforms: animation, positions, physics, spawning and destroying.
============================================
CPU-Draw Thread
Before we can use the transforms to render the image, we need to know what to include in the rendering
Ignoring this question might make rendering expensive on the GPU
Occlusion process-Builds up a list of all visible models/objects
Happens per object-Not per triangle
Stage process-in order of execution
- 1.Distance Culling
- 2.Frustum Culling
- 3.Precomputed Visibility
- 4.Occlusion Culling
Culling stages: distance culling, frustum culling, precomputed visibility, occlusion culling.
Culling works per object, not per triangle.
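The first two stages above can be sketched per object as follows. This is a minimal illustration, not UE4's implementation: the `Obj` class, the `max_draw_distance` field, the bounding-sphere test, and the toy two-plane "frustum" are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    center: tuple             # world-space center of the bounding sphere (x, y, z)
    radius: float             # bounding-sphere radius
    max_draw_distance: float  # hypothetical per-object cull distance

def distance_cull(objs, camera_pos):
    """Stage 1: drop objects farther away than their own cull distance."""
    kept = []
    for o in objs:
        d = sum((a - b) ** 2 for a, b in zip(o.center, camera_pos)) ** 0.5
        if d <= o.max_draw_distance:
            kept.append(o)
    return kept

def frustum_cull(objs, planes):
    """Stage 2: drop bounding spheres that lie fully outside any plane.
    Each plane is (nx, ny, nz, d) with the normal pointing into the frustum."""
    kept = []
    for o in objs:
        inside = all(
            nx * o.center[0] + ny * o.center[1] + nz * o.center[2] + d >= -o.radius
            for nx, ny, nz, d in planes
        )
        if inside:
            kept.append(o)
    return kept

camera = (0.0, 0.0, 0.0)
# Toy "frustum": only a near plane (z >= 1) and a far plane (z <= 100).
planes = [(0, 0, 1, -1.0), (0, 0, -1, 100.0)]
scene = [
    Obj("rock", (0, 0, 10), 1.0, 50.0),
    Obj("far_tree", (0, 0, 80), 2.0, 50.0),  # beyond its cull distance
    Obj("behind", (0, 0, -5), 1.0, 50.0),    # behind the camera
]
# Stages run in order, each one shrinking the list for the next.
visible = frustum_cull(distance_cull(scene, camera), planes)
visible_names = [o.name for o in visible]
```

Each stage tests whole objects, never individual triangles, which is why many small objects raise the CPU cost of culling.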
============================================
Occlusion Performance Implications
UE4 has a list of models to render
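- 1.Set up manual culling (i.e. distance culling, precomputed visibility)
- 2.Even things like particles occlude
- 3.Many small objects cause more stress on CPU for culling
- 4.Large models will rarely be occluded and thus increase GPU load
- 5.Know your world and balance object size vs count
Performance analysis: 1. Set up distance culling and precomputed visibility to improve performance
2. Too many small objects hurt performance; large objects are rarely removed by occlusion
3. Find a balance between the size and the number of objects in the scene.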
============================================
3.GEOMETRY RENDERING
GPU-Prepass/Early z pass
The GPU now has a list of models and transforms but if we just render this info out we could possibly cause a lot of redundant pixel rendering
Similar to excluding objects,we need to exclude pixels
We need to figure out which pixels are occluded
To do this, we generate a depth pass and use it to determine if the given pixel is in front and visible
The Z pass decides which pixels get rendered: occluded pixels are skipped.
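The idea of the depth pass can be sketched in two passes over a list of fragments. This is a simplified CPU-side illustration; the fragment tuples and function names are assumptions for the example, and real hardware performs the depth test itself.

```python
def depth_prepass(fragments, width, height):
    """Pass 1: record only the nearest depth per pixel (smaller = closer)."""
    depth = [[float("inf")] * width for _ in range(height)]
    for x, y, z, _ in fragments:
        if z < depth[y][x]:
            depth[y][x] = z
    return depth

def shade_with_prepass(fragments, depth):
    """Pass 2: run the expensive shading only for the visible fragment of each pixel."""
    shaded = []
    for x, y, z, name in fragments:
        if z == depth[y][x]:
            shaded.append((x, y, name))
    return shaded

# (x, y, depth, object) fragments; the wall and the crate overlap at pixel (0, 0).
fragments = [(0, 0, 5.0, "wall"), (0, 0, 2.0, "crate"), (1, 0, 5.0, "wall")]
depth = depth_prepass(fragments, width=2, height=1)
shaded = shade_with_prepass(fragments, depth)
```

Three fragments were submitted but only two are shaded: the wall fragment hidden behind the crate is rejected before any shading work is spent on it.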
============================================
Drawcalls
The GPU renders drawcall by drawcall, not triangle by triangle
A drawcall is a group of tris sharing the same properties
Drawcalls are prepared by the CPU (Draw) thread
Distilling rendering info for objects into a GPU state ready for submission
The GPU renders objects by drawcall rather than by triangle; the CPU (Draw) thread prepares each drawcall into a GPU state for submission.
============================================
UE4 with current gen high-end PCs
2000-3000 is reasonable
More than 5000 is getting high
More than 10000 is probably a problem
On mobile this number is far lower(few hundred max)
Drawcall count is determined by visible objects
Measure with "stat RHI"
Drawcall counts in UE4: the number of drawcalls depends on the visible objects.
============================================
Drawcalls have a huge impact on the CPU(Draw) thread
Has high overhead for preparing GPU state
Usually we hit the issues with Drawcalls way before issues with tri count
Because preparing GPU state is costly, drawcall problems usually need solving before tri count problems.
============================================
Drawcalls Performance Implications
1. Render your triangles with as few Drawcalls as possible
2. 50,000 triangles can run worse than 50 million, depending on scene setup (Drawcalls)
3. When optimizing a scene, know your bottleneck (Drawcalls vs Tri count)
Performance analysis: use as few drawcalls as possible; 50,000 triangles can run worse than 50 million depending on the drawcalls; when optimizing a scene, check whether the bottleneck is triangles or drawcalls.
============================================
Optimizing Drawcalls (Merging objects)
To lower the drawcalls it is better to use fewer larger models than many small ones
You cannot take this too far, because it impacts other things negatively
- a. Occlusion
- b. Lightmapping
- c. Collision calculation
- d. Memory
A good balance between size and count is a good strategy
Optimizing drawcalls: merge scene objects into fewer, larger models, keeping in mind that this affects culling, lightmapping, collision calculation, and memory.
============================================
Optimizing Drawcalls (Merging guidelines)
1.Target low poly objects
2.Merge only meshes within the same area
3.Merge only meshes sharing the same material
4.Meshes with no or simple collision are better for merging
5.Distant geometry is usually great to merge (fine with culling)
Merging guidelines in short: low-poly meshes, meshes in the same area, meshes sharing a material, meshes with no or simple collision, and distant meshes (which culling handles fine).
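Guidelines 2 and 3 amount to grouping meshes by area and material. The sketch below only counts the resulting groups; the mesh tuples are made-up data, and the one-drawcall-per-mesh figure assumes each mesh uses a single material.

```python
from collections import defaultdict

def merge_candidates(meshes):
    """Group meshes that share both an area and a material."""
    groups = defaultdict(list)
    for name, area, material in meshes:
        groups[(area, material)].append(name)
    return dict(groups)

# (name, area, material) for each mesh; hypothetical scene content.
meshes = [
    ("fence_a", "yard", "wood"),
    ("fence_b", "yard", "wood"),
    ("barrel", "yard", "metal"),
    ("fence_c", "street", "wood"),
]
groups = merge_candidates(meshes)
drawcalls_before = len(meshes)  # assumes one drawcall per single-material mesh
drawcalls_after = len(groups)   # one drawcall per merged group
```

Only the two yard fences merge here: the barrel has a different material and the street fence sits in a different area.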
============================================
Optimizing Drawcalls (HLODs)
Hierarchical Level of Detail
- a.Regular LODs mean a model becomes lower poly in the distance
- b.Essentially swaps one object for another simpler object (fewer materials)
- c.Hierarchical LOD (HLOD) is a bigger version: it merges objects together in the distance to lower the drawcalls
Optimizing drawcalls (HLOD): at long view distances a single combined static mesh replaces multiple static meshes, which lowers the number of drawcalls per frame and improves performance.
Instanced Rendering
- a.Groups objects together into single drawcalls
- b.Grouping needs to be done manually
Use instancing to reduce the number of drawcalls.
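The effect of instancing on the drawcall count can be sketched with a simple counting model. It assumes one drawcall per placed object without instancing and one drawcall per unique mesh and material pair with it; the placement data is invented for the example.

```python
def count_drawcalls(placements, instanced):
    """placements: list of (mesh, material, position) tuples."""
    if not instanced:
        return len(placements)  # assumption: one drawcall per placed object
    # With instancing, every copy of the same mesh/material shares one drawcall.
    return len({(mesh, material) for mesh, material, _ in placements})

placements = [("tree", "bark", (x, 0, 0)) for x in range(100)]
placements += [("rock", "stone", (x, 5, 0)) for x in range(20)]

naive = count_drawcalls(placements, instanced=False)
batched = count_drawcalls(placements, instanced=True)
```

In this toy scene 120 placed objects collapse to 2 drawcalls, because only two distinct mesh and material pairs exist.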
============================================
The strategy is to mix all previous solutions
Some merged content (Materials merged)
Some modular content (instanced)
and swappable LODs and HLODs
Optimize drawcalls on several fronts at once.
============================================
Vertex Processing
The first step in processing a Drawcall
The vertex shader takes care of this process
A vertex shader is a small program specialized in vertex processing
It runs completely on the GPU, so it is fast
Input is vertex data in 3D space; output is vertex data in screen space
Vertex Shaders - Common tasks
It converts local VTX positions to world position
It handles vertex shading/coloring
It can apply additional offsets to vertex positions
What the vertex shader does:
Converts local space to world space
Handles vertex colors
Applies offsets to vertices
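The three tasks above can be sketched as a tiny CPU-side function. This is an illustration only: the matrices are hand-built toys (a translation as the model matrix and a bare perspective matrix that sets w = z), and `world_offset` stands in for a world position offset; none of these names come from UE4.

```python
def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def vertex_shader(local_pos, model, view_proj, world_offset=(0.0, 0.0, 0.0)):
    """Local -> world (+ optional offset) -> clip space -> normalized device coords."""
    world = mat_vec(model, [local_pos[0], local_pos[1], local_pos[2], 1.0])
    world = [world[0] + world_offset[0],
             world[1] + world_offset[1],
             world[2] + world_offset[2],
             1.0]
    clip = mat_vec(view_proj, world)
    ndc = [clip[i] / clip[3] for i in range(3)]  # perspective divide
    return world[:3], ndc

model = translation(0, 0, 4)  # object placed 4 units in front of the camera
# Toy projection: camera at the origin looking down +z, with w = z.
view_proj = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]]

world, ndc = vertex_shader((1.0, 0.0, 0.0), model, view_proj,
                           world_offset=(0.0, 0.5, 0.0))
```

The offset moves the vertex only in this computed output. The stored mesh data is untouched, which is why the CPU side never sees the change.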
============================================
Practical examples of world position offset vertex shaders are
1.Cloth
2.Water displacement
3.Foliage wind animation
Typical uses: cloth motion, water motion, leaves in the wind.
============================================
Vertex Shaders - Drawback
Vertex shaders do not modify the actual object or affect the scene state; the result is purely a visual effect
The CPU is not aware of what the vertex shaders do
Thus things like physics or collisions will not take it into account
Vertex shader caveats:
They do not affect the actual scene and are only a visual effect. The CPU does not know what the vertex shader did, so physics and collision ignore it.
============================================
Vertex shader Performance Implications
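- 1.The more complex the animation performed the slower
- 2.The more vertices affected the slower
- 3.Disable complex vertex shader effects on distant geometry
Performance analysis:
The more complex the animation and the more vertices involved, the slower it runs. Vertex shader effects can be disabled on distant geometry.
============================================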
4.RASTERIZING AND GBUFFER
Rasterizing
The GPU is ready to render pixels
Determining which pixels should be shaded is called rasterizing
Done drawcall by drawcall, then tri by tri
Pixel Shaders are responsible for calculating the pixel color
Input is generally interpolated vertex data and texture samplers
Rasterizing inefficiency
When rasterizing dense meshes at a distance, they converge to only a few pixels
A waste of vertex processing
A 100k tris object seen from so far away that it would be 1 pixel big will only show 1 pixel of its closest triangle!
Rasterization: the pixel shader processes the vertex data passed on from the vertex shader stage. A very distant mesh may cover only a few pixels, which wastes much of the vertex shader work.
============================================
Overshading
Due to hardware design, the GPU always uses a 2x2 pixel quad for processing
If a triangle is very small or very thin, it might process 4 pixels while only 1 pixel is actually filled
Because of the hardware, pixels are processed in 2x2 blocks of 4 at a time.
============================================
Rasterization and Overshading Performance Implications
- Triangles are more expensive to render in great density
- When seen at a distance the density increases
- Thus reducing triangle count at a distance(lodding/culling) is critical
- Very thin triangles are inefficient because they pass through many 2x2 pixel quads yet only fill a fraction of them
- The more complex the pixel shader is the more expensive
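Performance analysis: densely packed triangles are expensive to render. Density increases with distance, so reduce the triangle count as far as possible there; thin triangles are costly.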
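The cost of thin triangles can be estimated by counting how many 2x2 quads a set of covered pixels touches. This is a rough model that assumes every touched quad is shaded in full; `quad_efficiency` is a name invented for the sketch.

```python
def quad_efficiency(covered_pixels):
    """Fraction of shaded pixel slots that are actually covered,
    assuming the GPU shades whole 2x2 quads."""
    covered = set(covered_pixels)
    quads = {(x // 2, y // 2) for x, y in covered}
    shaded_slots = 4 * len(quads)
    return len(covered) / shaded_slots

# A 1-pixel-high, 8-pixel-long sliver touches 4 quads but fills half of each.
thin = [(x, 0) for x in range(8)]
# A 4x4 block fills every quad it touches.
block = [(x, y) for x in range(4) for y in range(4)]
# A single pixel pays for a full quad.
single = [(3, 3)]

thin_eff = quad_efficiency(thin)
block_eff = quad_efficiency(block)
single_eff = quad_efficiency(single)
```

Under this model the sliver wastes half of the shading work and the single pixel wastes three quarters of it, while the solid block wastes none.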
============================================
Results are written out to:
Multiple Gbuffers in case of deferred shading
Shaded buffer in case of forward shading
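With deferred shading, the rasterized data goes into the GBuffer.
GBuffer Performance Implications
The GBuffer takes up a lot of memory and bandwidth and thus has a limit on how many different GBuffer images you can render out
GBuffer memory is resolution dependent
Performance analysis: the GBuffer uses a lot of memory and bandwidth, so the number of GBuffer images that can be rendered out is limited.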
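The resolution dependence is simple arithmetic. The sketch below assumes a hypothetical layout of 5 render targets at 4 bytes per pixel each; the real UE4 GBuffer layout differs by version and settings.

```python
def gbuffer_bytes(width, height, num_targets=5, bytes_per_pixel=4):
    """Memory for a set of full-resolution render targets.
    num_targets and bytes_per_pixel are illustrative assumptions."""
    return width * height * num_targets * bytes_per_pixel

hd = gbuffer_bytes(1920, 1080)
uhd = gbuffer_bytes(3840, 2160)
ratio = uhd / hd
```

With these assumed numbers the 1080p GBuffer takes about 41 MB, and doubling the resolution in both dimensions quadruples it.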
5.LIGHT AND SHADOWS
Two approaches for lighting and shadows
- Dynamic
- Static
Lighting (Deferred Shading)
Is calculated and applied using pixel shaders
Dynamic point lights are rendered as spheres
The spheres act like a mask
Anything within the sphere receives a pixel shader operation to blend in the dynamic light
A dynamic point light is rendered as a sphere that acts as a mask; pixels inside the mask are blended in the pixel shader.
============================================
Light calculation requires position
The depth buffer is used to get the pixel's position in 3D
Use the normal buffer to apply shading: dot product between the normal and the light direction
Lighting is computed from the depth buffer and the normal buffer together.
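For a single pixel, the calculation can be sketched as a diffuse term from the normal and the light direction, limited to the light's sphere. This is a minimal Lambert-style sketch with no falloff curve; the function names are invented, and the pixel position is assumed to have already been reconstructed from the depth buffer.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def point_light_diffuse(pixel_pos, normal, light_pos, light_radius):
    """Diffuse term for one GBuffer pixel lit by a point light."""
    to_light = tuple(l - p for l, p in zip(light_pos, pixel_pos))
    dist = math.sqrt(sum(c * c for c in to_light))
    if dist > light_radius:
        return 0.0  # outside the light's sphere: the pixel is not touched
    if dist == 0.0:
        return 1.0  # degenerate case: light exactly at the pixel
    n = normalize(normal)
    l = normalize(to_light)
    return max(0.0, sum(a * b for a, b in zip(n, l)))  # clamped N dot L

facing = point_light_diffuse((0, 0, 0), (0, 1, 0), (0, 2, 0), light_radius=5)
too_far = point_light_diffuse((0, 0, 0), (0, 1, 0), (0, 10, 0), light_radius=5)
behind = point_light_diffuse((0, 0, 0), (0, 1, 0), (0, -2, 0), light_radius=5)
```

A surface facing the light gets the full term, one outside the radius is skipped, and one facing away is clamped to zero.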
============================================
Shadows
Common technique for rendering shadows is Shadow Maps
Check for each pixel whether it is visible to the given light or not
Requires rendering depth from the light's point of view
The shadow map is rendered in the light's view space.
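The per-pixel check can be sketched with a one-dimensional shadow map to keep it short. The `bias` parameter is an addition of this sketch (a common way to avoid a surface shadowing itself), and the texel indices stand in for a real projection into light space.

```python
def build_shadow_map(occluder_depths, size):
    """Nearest depth seen from the light for each shadow-map texel."""
    shadow_map = [float("inf")] * size
    for texel, depth in occluder_depths:
        shadow_map[texel] = min(shadow_map[texel], depth)
    return shadow_map

def in_shadow(shadow_map, texel, depth_from_light, bias=0.01):
    """A point is shadowed if something nearer to the light covers its texel."""
    return depth_from_light - bias > shadow_map[texel]

# (texel, depth from light): a crate at depth 2 sits above the floor at depth 5.
occluders = [(0, 2.0), (0, 5.0), (1, 5.0)]
shadow_map = build_shadow_map(occluders, size=3)

floor_under_crate = in_shadow(shadow_map, texel=0, depth_from_light=5.0)
open_floor = in_shadow(shadow_map, texel=1, depth_from_light=5.0)
```

The floor under the crate is shadowed because the map stores a nearer depth for that texel, while the open floor matches its own stored depth and stays lit.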
Process Pros/Cons
- Pros
- Is rendered in real time using the GBuffer
- Lights can be changed, moved, or added
- Does not need any special model preparation
- Cons
- Especially shadows are performance heavy
Pros and cons:
Pros: rendered in real time using the GBuffer, and lights can be adjusted dynamically. Cons: the performance cost.
============================================
Quality Pros/Cons
1. Shadows are heavy on performance, so usually render quality is reduced to compensate
2. Does not do radiosity/global illumination for the majority of content
3. Dynamic soft shadows are very hard to do well; dynamic shadows often look sharp or blocky
Quality pros and cons:
The performance cost is high, so quality is lowered to gain performance; radiosity and global illumination are not rendered; dynamic soft shadows look poor.
============================================
Dynamic Lighting Performance Implications
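- A small dynamic light is relatively cheap in a deferred renderer
- The cost is down to the pixel shader operations, so the more pixels the slower it is
- The radius must be as small as possible
- Prevent excessive and regular overlap
Dynamic lighting performance analysis:
Small dynamic lights are fairly cheap with deferred rendering. The cost comes from the pixel shader, so the more pixels processed, the slower it is. Keep the radius as small as possible and avoid excessive overlap.
============================================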
Dynamic Shadows Performance Implications
- Turn off shadow casting if not needed
- The tri count of geometry affects shadow performance
- Fade or toggle off shadows when far away
Dynamic shadows performance analysis:
Turn off unnecessary shadows; the triangle count affects shadow performance; fade or disable shadows at a distance.
============================================
6.STATIC LIGHTING AND SHADOWS
Dynamic lights and shadows are expensive
Thus part of the work is offloaded to pre-calculations/pre-rendering
This is referred to as static lights and shadows
Lighting data is stored mainly in lightmaps
Dynamic lighting is expensive, so lightmaps are used.
Lightmaps
A lightmap is a texture with the lighting and shadows baked into it
An object usually requires lightmap UV coordinates for this to work
This texture is then multiplied on top of the basecolor
The lighting information is baked into a texture that is applied on top of the existing texture data.
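The multiply step can be sketched for one pixel. The nearest-neighbour lookup, the 2x2 lightmap, and the color values are assumptions made for the example; a real renderer filters the texture and works in its own color space.

```python
def sample_nearest(texture, u, v):
    """Look up the texel at UV coordinates in [0, 1] without filtering."""
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]

def apply_lightmap(base_color, light):
    """Multiply the baked lighting on top of the base color, per channel."""
    return tuple(b * l for b, l in zip(base_color, light))

# Hypothetical 2x2 baked lightmap: fully lit, half lit, mostly shadowed, black.
lightmap = [
    [(1.0, 1.0, 1.0), (0.5, 0.5, 0.5)],
    [(0.25, 0.25, 0.25), (0.0, 0.0, 0.0)],
]
base_color = (0.8, 0.4, 0.2)

light = sample_nearest(lightmap, u=0.75, v=0.25)  # lands on the half-lit texel
lit_color = apply_lightmap(base_color, light)
```

At runtime this is only a texture lookup and a multiply, which is why baked lighting is cheap to render regardless of how long the bake took.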
============================================
Lightmass
Standalone application that handles light rendering, baking to lightmaps, and integrating into materials
Raytracer supporting GI
Supports distributed rendering over a network
Bake quality is determined by Light Build Quality as well as settings in the Lightmass section of each level
Better to have a Lightmass Importance Volume around part of the scene
Light baking: a separate module handles the light rendering. It supports global illumination, and both the baked region and the quality are adjustable.
============================================
Process Pros/Cons
- Super fast for performance in real-time, but increases memory
- Takes a long time to pre-calculate the lighting
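- Each time something is changed, the lighting must be re-rendered
- Models require lightmap UVs, an additional prep step that takes time
Pros and cons:
Faster at runtime, but memory use increases; precomputation takes time; the lighting must be re-baked when the scene changes; models need lightmap UVs.
============================================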
Quality Pros/Cons
1. Handles Radiosity and Global Illumination
2. Renders realistic shadows including soft shadows
3. Quality is dependent on lightmap resolution and UV layout
4. May have seams in the lighting due to the UV layout
Quality pros and cons:
Handles radiosity and global illumination; renders realistic shadows; quality depends on lightmap resolution and UV layout; the UV layout can cause seams.
============================================
Static Lighting Performance Implications
1. Static Lighting always renders at the same speed
2. Lightmap resolution affects memory and file size, not framerate
3. Bake times are increased by:
- Lightmap resolutions
- Number of models/lights
- Higher quality settings
- Lights with a large attenuation radius or source radius
Static lighting performance analysis:
Lightmaps affect memory and file size. Higher lightmap resolution, more lights and models, higher quality settings, and larger light radii all increase bake time.
============================================
7.POST PROCESSING
Visual effects applied at the very end of the rendering process
Uses the GBuffers to calculate its effects
Once more relies heavily on Pixel Shaders
Post processing:
Uses the GBuffer to compute its effects.
Example:
- Light Bloom
- Depth of Field/Blurring
- Some types of lens flares
- Light Shafts
- Vignette
- Tonemapping/Color correction
- Exposure
- Motion Blur
In short: bloom, depth of field/blur, tonemapping/color correction, exposure, motion blur.
Post Processing Performance Implications
Affected directly by final resolution
Affected by shader complexity
Affected by parameters (e.g. DoF blur radius)
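How resolution and a parameter such as blur radius drive the cost can be sketched with a rough sample count. The model assumes a naive square blur that reads every neighbour within the radius for every pixel; real post-process passes are usually optimized well beyond this.

```python
def naive_blur_samples(width, height, radius):
    """Texture reads for a naive square blur: (2r + 1)^2 samples per pixel."""
    return width * height * (2 * radius + 1) ** 2

base = naive_blur_samples(1920, 1080, radius=2)
double_res = naive_blur_samples(3840, 2160, radius=2)
double_radius = naive_blur_samples(1920, 1080, radius=4)

resolution_factor = double_res / base
radius_factor = double_radius / base
```

Under this model, doubling the resolution in both dimensions makes the pass 4 times as expensive, and doubling the radius from 2 to 4 makes it 3.24 times as expensive.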
Post processing performance analysis:
Affected by resolution, by shader complexity, and by parameters such as blur radius.
Reference video:
Gnomon Masterclass Part II: Rendering in UE4 | Event Coverage | Unreal Engine