Tutorials – Mesh Rendering
About : This tutorial will explain how meshes are rendered in a lot of real time engines such as Unreal Engine 2.
Target Audience : Everyone
Platform : This tutorial is universal and applies to a very wide range of games, engines and platforms, but it focuses primarily on Unreal Engine 2. That does not mean it can only be used for that platform; plenty of PlayStation and Xbox games work in similar ways.
Last Update : August 2006
Basics
In the UT2003 and UT2004 community there has been a common misunderstanding for years that using the same mesh dozens of times in the same level is faster for the PC to render than unique meshes. That entire misconception has been based on a bit of incomplete information that was released years ago. The truth is that it is a whole lot more complex than “if you reuse meshes it will be faster”. It can actually be a performance killer if you reuse too much.
First of all you have to understand that there are two big players in this whole system: the memory and the CPU/GPU. The entire system is based on a balance between these two elements. The incomplete information that was once released only applied to the memory! It told people that if you reuse meshes the meshes will be instanced, and thus each mesh has to be loaded into memory just once, which is of course faster for the PC.
That is completely true but it is also only half the story. It only mentions the memory and not the very important CPU/GPU. The CPU/GPU has the most influence on the framerate, not the memory. The memory can make things run smoother, the CPU/GPU makes them run faster.
Before meshes are rendered they are cut up into pieces. Depending on how the render code was written there are several techniques, but in most cases, and in the case of both UT2003 and UT2004, the engine renders per material.
That means that if you have a cube with only one texture applied it will process and render the entire cube at once.
However, if the same cube has two different textures applied, the engine needs to cut it up and first render all the triangles with texture A and then all the ones with texture B. That extra step is considerably slower than if there were only one texture for the CPU/GPU to work with. It cannot render two types of triangles at the same time. Everything that is rendered must first be nicely sorted. That sorting and dividing into groups/sectors requires power and time.
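As a rough illustration of that per-material split, the Python sketch below reorders a mesh's triangles so that all triangles sharing a material sit next to each other, and records one range per material. The function name `build_sections` and the data layout are assumptions made up for this example; this is not Unreal Engine 2 code.

```python
from itertools import groupby


def build_sections(triangle_materials):
    """Sort triangle ids by material and return one (material, first, count) range per material.

    triangle_materials[i] is the material name of triangle i (hypothetical layout).
    """
    # Stable sort: triangles keep their relative order within a material.
    order = sorted(range(len(triangle_materials)), key=lambda i: triangle_materials[i])
    sections = []
    first = 0
    for material, group in groupby(order, key=lambda i: triangle_materials[i]):
        count = len(list(group))
        sections.append((material, first, count))
        first += count
    return order, sections


# Six triangles, four with texture A and two with texture B.
order, sections = build_sections(["A", "B", "A", "B", "A", "A"])
print(order)     # [0, 2, 4, 5, 1, 3]
print(sections)  # [('A', 0, 4), ('B', 4, 2)]
```

Each entry in `sections` stands for one group that has to be submitted on its own, so a mesh with two textures ends up as two pieces of work.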
However, that is still only half the story. In the case of UT2003/4 every mesh must also be rendered separately. That means that if you have two identical cubes which both have texture A applied, the engine will render the first cube, wait until it's completely processed and then render the second cube. That is two steps for the CPU/GPU.
Sectors
Sounds slow already? We're not finished yet… If you have two identical cubes and each cube has two different textures applied, the engine has to render four things! Two textures/sectors per cube and two cubes in total. It has to call on the engine, form a queue and wait for the previous item to finish four times, and that is slow.
Every material and every mesh is a sector and every sector has to be drawn on its own!
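The counting rule can be sketched in a few lines of Python. The helper `count_sectors` and the way a level is described here (one list of material names per placed mesh) are assumptions for illustration only.

```python
def count_sectors(placed_meshes):
    """Each placed mesh contributes one sector per distinct material it uses."""
    return sum(len(set(materials)) for materials in placed_meshes)


one_cube = [["A"]]
two_plain_cubes = [["A"], ["A"]]
two_textured_cubes = [["A", "B"], ["A", "B"]]

print(count_sectors(one_cube))            # 1
print(count_sectors(two_plain_cubes))     # 2
print(count_sectors(two_textured_cubes))  # 4
```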
Practically that means that if you make a path using dozens of separate stone tile meshes it will render a lot slower than if the tiles were a single combined mesh! Even if the polycount is the same. In fact the polycount is overhyped!
I can easily make a 10,000 poly room run slower than a 250,000 poly room by just using more materials and smaller meshes. A larger number of polygons can be drawn more easily if the engine can mass process large groups at the same time. You will be able to crank out more detail if you offer the right conditions to your engine.
So what to do and what not?
Combine several small meshes into a single unique mesh if possible.
In the bad picture the engine has to render a dozen bottles which is slow. In the good picture all the bottles are combined in a single mesh and thus the engine only has to work with a single object which is fast.
This is one of the reasons why you often see (or should see) meshes that are groups of objects. A stack of crates as a single mesh, a bunch of bottles, a group of rocks, and so on. You are a lot better off using sets of meshes than individual meshes for every single thing you place.
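Combining is normally done in the modelling package, but the idea behind it is simple enough to sketch in Python. The sketch assumes every mesh is a pair of (vertex list, triangle index list), that all meshes share one material, and that the vertices are already in their final positions; `merge_meshes` is a made-up name.

```python
def merge_meshes(meshes):
    """Concatenate (vertices, indices) pairs into one mesh, shifting each index list."""
    merged_vertices = []
    merged_indices = []
    for vertices, indices in meshes:
        offset = len(merged_vertices)  # indices of this mesh start after the earlier vertices
        merged_vertices.extend(vertices)
        merged_indices.extend(i + offset for i in indices)
    return merged_vertices, merged_indices


# Two single-triangle "bottles"; the second one sits 5 units further along x.
bottle_a = ([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [0, 1, 2])
bottle_b = ([(5, 0, 0), (6, 0, 0), (5, 1, 0)], [0, 1, 2])

vertices, indices = merge_meshes([bottle_a, bottle_b])
print(len(vertices), indices)  # 6 [0, 1, 2, 3, 4, 5]
```

The polycount is unchanged, but the engine now has one object to work with instead of two.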
Another technique is to simply use fewer textures on a mesh. I often see a lot of level designers make meshes like this:
The mesh uses six textures even though you can hardly notice it. If you place this mesh six times in a room, that is 36 things to render for Unreal Engine 2! Use as few textures as possible unless you really have to. If possible, texture a mesh so that it only uses a single texture.
To give you an idea of how powerful materials are: I once worked with an engine on a PS2 in which every new material to render was equal to 1000 polygons in terms of GPU/CPU work. Every new material delayed the engine as much as 1000 polygons would. Even if the material was only on a tiny 20 poly mesh. That tiny 20 poly mesh would have been as slow to render as a 1020 poly mesh.
The ratio and balance are different for every engine and development platform, but it gives you an idea of the power of a simple new material.
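That rule of thumb can be turned into a tiny cost estimate. The figure of 1000 polygons per material is the one quoted above for that particular PS2 engine; the function name is made up, and other engines will have a different ratio.

```python
MATERIAL_COST_IN_POLYS = 1000  # figure quoted for one PS2 engine, not a general constant


def equivalent_polys(poly_count, material_count):
    """Express a mesh's render cost as if materials were extra polygons."""
    return poly_count + material_count * MATERIAL_COST_IN_POLYS


print(equivalent_polys(20, 1))  # 1020
print(equivalent_polys(20, 6))  # 6020
```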
Also note that while a single millisecond may sound negligible and not worth any attention, it should not be underestimated. If a mesh costs the CPU or GPU only a single millisecond of extra work, it does add up if there are hundreds of objects and 40 frames a second. One millisecond more render work per mesh means 1 millisecond x 800 meshes x a desired fps of 40 = 32,000 milliseconds of extra work per second. And since 32,000 milliseconds is 32 times as long as 1 second, you won't achieve the desired 40 fps but a lot less. The single millisecond of work per mesh would slow down the engine by 3200%!
While it is more complex than that and while these numbers are purely fictional it does show how hard even a tiny amount of extra work can hit back at the CPU and/or GPU.
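Written out as a short Python sketch, the arithmetic looks like this. The numbers are the fictional ones from the text and the variable names are made up for this example.

```python
extra_ms_per_mesh = 1
mesh_count = 800
target_fps = 40

# Extra work for one frame, and for one second at the desired framerate.
extra_ms_per_frame = extra_ms_per_mesh * mesh_count
extra_ms_per_second = extra_ms_per_frame * target_fps

# How many seconds of work the engine is asked to do per real second.
overload_factor = extra_ms_per_second / 1000

# Upper bound on the framerate if the extra work were the only cost.
best_case_fps = 1000 / extra_ms_per_frame

print(extra_ms_per_second)  # 32000
print(overload_factor)      # 32.0
print(best_case_fps)        # 1.25
```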
The Memory
Now the biggest issue in all this is the memory. In a perfect world a great solution to this entire problem would be to simply make big and unique meshes out of everything and skin everything with non-tiling unique textures. However, you can't. Doing so would mean no asset can be instanced and reused (and thus be more memory efficient). It's all about a good balance between the memory and the CPU/GPU load.
If you make everything unique you will have a very high memory load and a very low CPU load. On the other hand if nothing is unique and if everything is made up out of lots of small but aggressively reused assets the CPU load will be very high and the memory very low. A good balance between unique and reusable assets is essential.
Determine if your environment or the entire game is especially heavy on the memory or if it requires a lot of CPU power and then shift work to the lesser used component.
Also keep in mind that memory-wise unique textures are worse than unique meshes. A single texture can take hundreds of kilobytes or even several megabytes of memory while a single mesh only requires a few dozen kilobytes in most cases. In general: the higher the polycount, the more desirable it becomes to reuse a mesh. If the mesh is only a simple crate you have much more room to make lots of unique sets. Memory-wise, that is.
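To get a feel for those sizes, here is a back-of-the-envelope estimate in Python. It assumes uncompressed 4 bytes per pixel textures without mipmaps, 32 bytes per vertex and 2 bytes per index; these are illustrative assumptions, not the storage formats of any particular engine, and compressed textures will be smaller.

```python
def texture_bytes(width, height, bytes_per_pixel=4):
    """Uncompressed texture size, ignoring mipmaps (assumed format)."""
    return width * height * bytes_per_pixel


def mesh_bytes(vertex_count, triangle_count, bytes_per_vertex=32, bytes_per_index=2):
    """Vertex data plus three indices per triangle (assumed layout)."""
    return vertex_count * bytes_per_vertex + triangle_count * 3 * bytes_per_index


print(texture_bytes(256, 256) // 1024)    # 256 (KiB)
print(texture_bytes(1024, 1024) // 1024)  # 4096 (KiB)
print(mesh_bytes(1000, 1500) // 1024)     # 40 (KiB)
```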
There is another drawback to unique meshes. Most engines will render an entire object as soon as a single vertex or triangle comes into view. If you enlarge a mesh by attaching multiple other meshes to it, there is a bigger chance that some of its triangles are in view. The mesh can't be occluded as easily, which means a higher rendered polycount.
Unreal Engine 2
In Unreal Engine 2 you can type ‘stat render’ to get an overview of the stats. In the staticmesh area you will see a whole bunch of unsortedunbatchedsectors, sortedunbatchedsectors and so on. In UT2004 unbatchedunsortedsectors are used most, so pay the most attention to that number. Every mesh and every material is a sector. The higher that number, the slower the level will be. An ideal number is 400 or less. Above 600-800 you might start to experience a significant FPS hit.
Unsorted and sorted are also important. Unsorted means that the sector does not require sorting. Sorted obviously means the sector required an extra sorting step. All textures with alpha require sorting. Anything with transparency requires the PC to calculate what is in front of it and what isn't. What can be seen and what can't. If a texture does not need alpha, do not enable the alpha on it, and use Masked instead of Alpha wherever possible. Trees, fences and so on will look just fine with Masking instead of a full Alpha.
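The extra step for alpha can be sketched as follows. This shows the common back-to-front ordering used for blended geometry in general; it is a simplified assumption, not the actual Unreal Engine 2 implementation, and the tuple layout and the name `draw_order` are made up.

```python
import math


def draw_order(sectors, camera_pos):
    """sectors: list of (name, position, blend), blend is 'opaque', 'masked' or 'alpha'.

    Opaque and masked sectors keep their order; alpha sectors are drawn last,
    farthest from the camera first.
    """
    unsorted_sectors = [s for s in sectors if s[2] != "alpha"]
    sorted_sectors = sorted(
        (s for s in sectors if s[2] == "alpha"),
        key=lambda s: math.dist(s[1], camera_pos),
        reverse=True,
    )
    return [s[0] for s in unsorted_sectors + sorted_sectors]


scene = [
    ("window_near", (0, 0, 2), "alpha"),
    ("wall", (0, 0, 10), "opaque"),
    ("window_far", (0, 0, 8), "alpha"),
    ("fence", (0, 0, 5), "masked"),
]
print(draw_order(scene, (0, 0, 0)))
# ['wall', 'fence', 'window_far', 'window_near']
```

The masked fence skips the distance calculation and the sort entirely, which is why Masked is the cheaper choice.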
Ending Notes
Batching
Even though the UT2004 engine does not do so, it is possible to optimize the entire process by batching meshes. Batching means collecting groups of identical objects and then processing them at once as a single large object. That means the engine could automatically find all wooden surfaces in an environment and render the entire group at once instead of rendering each small piece of wood separately. This method can of course greatly speed up the work.
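A minimal sketch of the idea in Python: collect everything that shares a material and treat each collection as one piece of work. The object names and the helper `batch_by_material` are hypothetical, and a real engine would also have to merge the geometry itself.

```python
from collections import defaultdict


def batch_by_material(objects):
    """objects: iterable of (name, material). Returns {material: [names]}."""
    batches = defaultdict(list)
    for name, material in objects:
        batches[material].append(name)
    return dict(batches)


scene_objects = [
    ("plank_1", "wood"),
    ("crate", "wood"),
    ("pipe", "metal"),
    ("plank_2", "wood"),
]
batches = batch_by_material(scene_objects)
print(len(scene_objects), "things to render unbatched")  # 4 things to render unbatched
print(len(batches), "things to render batched")          # 2 things to render batched
```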
Dynamic lighting
The situation is more complex when an engine supports full per pixel dynamic lighting. Whenever a mesh is hit by a dynamic light the object will become slower for an engine to process. The problem with attaching too many meshes to a mesh is that the combined object is far more likely to be hit by a dynamic light somewhere in some way, and will thus cause a lot of extra work for the PC. The situation becomes even more grave when two or more dynamic lights hit a mesh, something which a larger mesh is more likely to experience since it is larger…
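Why a larger mesh catches more lights can be shown with a simple overlap test. The sketch assumes that meshes and lights are both approximated by spheres, which is a simplification chosen for this example; `lights_hitting` is a made-up helper.

```python
import math


def lights_hitting(mesh_center, mesh_radius, lights):
    """Count lights whose sphere of influence overlaps the mesh's bounding sphere.

    lights: list of (position, radius).
    """
    return sum(
        1
        for position, radius in lights
        if math.dist(position, mesh_center) <= radius + mesh_radius
    )


lights = [((0, 0, 0), 3), ((10, 0, 0), 3)]

# A small mesh next to the first light only.
print(lights_hitting((0, 0, 0), 1, lights))  # 1

# A large combined mesh spanning the area between both lights.
print(lights_hitting((5, 0, 0), 6, lights))  # 2
```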