Monday, December 2, 2013

Mesh Batching in Unity

After a couple of very personal posts about non-technical stuff, I've decided to write about something strictly technical: Mesh Batching. So, first thing is defining what I mean by mesh batching. I've found a definition I like from some game development forums (link here):

"Batching is a way of grouping geometry together (they should be compatible, e.g. the same material, context states, etc.) to use as few draw calls as possible. In this way, you could fully exploit GPU throughput, so as to improve performance."

Unity offers dynamic and static batching, and they generally work pretty well, but they are automatic systems that need to work in every scenario, which means they need to cover many possibilities. Normally it's easier to implement better optimizations for a particular case than for the generic one. That's why I'm gonna explain here a way of implementing custom mesh batching in Unity that can improve on its default system.

Beautiful screenshot of our work in progress title: Super Toy Cars

Anyway, I'd be lying if I said I came up with this system by thinking about what Unity's batching methods were lacking. Instead it was by chance that I started thinking about it, as is usually the case. I was profiling our game when I realized we had an obscene number of draw calls for the amount of geometry in our scenes. We had over 1000 draw calls in some cases (even up to 1200) and I decided to get my hands dirty and reduce that number. So the first thing I did was a system that takes geometry whose material has a color, codes that color in the vertices and removes it from the material (sets it to white). That way you can batch together all these objects, which before needed to go in different draw calls because the state needed to change (the color of the material is part of the state).
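A minimal sketch of that color-baking step, written in Python rather than Unity's C# so it runs on its own. The `Mesh` and `Material` classes, the `shared` dictionary and the texture name are all hypothetical stand-ins, not Unity's API and not the actual code from the game. Keep in mind that the shader has to read the vertex color for the tint to still show up on screen.

```python
from dataclasses import dataclass, field

Color = tuple[float, float, float, float]  # RGBA, each channel in 0..1
WHITE: Color = (1.0, 1.0, 1.0, 1.0)


@dataclass
class Material:
    texture: str          # hypothetical texture identifier
    color: Color = WHITE  # tint that forces a state change when it differs


@dataclass
class Mesh:
    vertices: list[tuple[float, float, float]]
    triangles: list[int]
    material: Material
    colors: list[Color] = field(default_factory=list)  # per-vertex colors


def bake_material_color(mesh: Mesh, shared: dict[str, Material]) -> None:
    """Copy the material tint into every vertex, then switch to a white material.

    `shared` maps a texture to the single white material reused by every mesh
    with that texture, so they all end up with identical render state.
    """
    tint = mesh.material.color
    if mesh.colors:
        # Multiply any existing vertex colors by the tint, channel by channel.
        mesh.colors = [tuple(c * t for c, t in zip(col, tint)) for col in mesh.colors]
    else:
        mesh.colors = [tint] * len(mesh.vertices)
    texture = mesh.material.texture
    mesh.material = shared.setdefault(texture, Material(texture, WHITE))
```

After baking, a red and a blue object that use the same texture point at the very same white material, which is what makes them candidates for a single draw call.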

Then I realized I couldn't do that with static geometry since that is pre-processed. So I had to go for dynamic batching. That was ok with me, but since I had to do that I thought, why not do a custom batcher that groups together the meshes I want? That way I could batch together groups of objects that share the same material (really the same texture since the color coding was now in the vertices) and are 'physically' close to each other.

In my mind that made (and still makes) a lot of sense since that should make life easier for the rendering engine. Things that are close together will have a tighter bounding box/sphere and thus can be culled easily and efficiently. Also, since I'm doing that by hand, I can make sure the batches of objects have a reasonable number of polys: not too many (thus slowing down the GPU because it's saturated) nor too few (thus not making the most of a GPU that could process more vertices in one go). In the end, engines normally thank you when you make their lives easier.
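The grouping described above can be sketched roughly like this. It is only an illustration under assumptions: the `Obj` class, the uniform grid used to decide what counts as "close", and the `cell_size` and `max_tris` values are all made up for the example. In the post the grouping is done by hand in the scene hierarchy.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    texture: str
    position: tuple[float, float, float]
    tri_count: int


def group_for_batching(objs, cell_size=10.0, max_tris=2000):
    """Bucket objects by (texture, grid cell), then split buckets over budget."""
    buckets = defaultdict(list)
    for o in objs:
        # Objects whose positions fall in the same grid cell count as "close".
        cell = tuple(int(c // cell_size) for c in o.position)
        buckets[(o.texture, cell)].append(o)

    batches = []
    for members in buckets.values():
        current, tris = [], 0
        for o in members:
            # Start a new batch when adding this object would exceed the budget.
            if current and tris + o.tri_count > max_tris:
                batches.append(current)
                current, tris = [], 0
            current.append(o)
            tris += o.tri_count
        if current:
            batches.append(current)
    return batches
```

The triangle budget is the knob to tune per device: the right value is something to measure, not something this sketch can tell you.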

Turns out Unity really liked this way of working. I managed to improve the frame rate in my scene by 5-6 fps, almost reaching 25. I had to do extra optimisations (and some are still pending) but it was a huge leap. A lot more than I expected, although when I looked at the number of draw calls it made a lot more sense: it was always under 400, which is almost a third of the original! That includes some extra work our artist did to merge several textures together and thus reduce the number of materials used.

So I thought my findings could be useful for someone else, but before sharing them I should probably get proper numbers on the improvement, right? And then I thought, why not make a sample scene, something simple and easy that would show the potential of this technique and where nothing else gets in the way of the improvements. So that's what I did. I decided to do a scene with a lot of identical objects (cylinders in this case) with a not-too-big number of polygons (I think they have around 100 each) and with a simple material (just a diffuse material with a color component). I made 2 scenes: one with all the cylinders using the same material (in white) and another one with them using 8 different materials where the only thing that changes is the color (white, red, black, blue, green, purple, yellow, orange). You can see pictures of my beautiful scene here: programmer's art at its finest! ;o)

This is the single color scene. There are 4 'stories' of cylinders. Each story is a set of 17x17 (289) cylinders, making a total of 1156 cylinders. At around 100 triangles each, that means the scene has over 100K triangles.

I decided I wanted to test and compare the 3 different options that Unity provides, plus my own MeshBatcher. MeshBatcher is a component I've written that lives in a game object that's a parent of all the objects you want to batch. It has a material, and it will substitute all the materials of the child objects with the given material. You can tell it to read the color of the materials in the children and then code that color in the vertices instead. At start it generates the merged geometry and then it removes the mesh filter and mesh renderer components from the child objects. It's simple but it gets the job done, which is the important bit.
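The core of "generating the merged geometry" is concatenating vertex data and shifting triangle indices. Here is a small standalone sketch of that step, assuming each child mesh is given as a (vertices, colors, triangles) tuple whose vertices have already been transformed into the parent's space. The function name and the data layout are my own for illustration, not the component's actual code.

```python
def combine_meshes(parts):
    """Merge (vertices, colors, triangles) tuples into a single mesh.

    Vertices are assumed to be in the parent's coordinate space already.
    """
    vertices, colors, triangles = [], [], []
    for part_vertices, part_colors, part_triangles in parts:
        # Indices of this part shift by the number of vertices merged so far.
        offset = len(vertices)
        vertices.extend(part_vertices)
        colors.extend(part_colors)
        triangles.extend(i + offset for i in part_triangles)
    return vertices, colors, triangles
```

Unity also ships `Mesh.CombineMeshes`, which does a similar job on real meshes. Whether the MeshBatcher uses it or builds the arrays itself is not something the post states.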

I decided to do the tests on my Samsung Galaxy Tab (yeah, the first one), which is a bit old and thus will show performance drops more easily. All the numbers you see here are from the tests on this very tablet. Obviously testing on different devices will provide different results. The differences are likely to change, but I don't think they'll vary significantly in most cases.

So I added a simple fps calculator component that printed it in the GUI and noted down the results with each of the following options:

- No batching: Basically disable dynamic and static batching in the Player Settings. The objects are not marked as static, although I don't think it makes a difference when these options are disabled.
- Dynamic batching: The objects are not marked as static and dynamic batching is enabled.
- Static batching: All cylinders are marked as static and static batching is enabled.
- Mesh batcher: I group cylinders in batches of 8 that live under an object that has a MeshBatcher component. They are not marked as static (MeshBatcher doesn't work otherwise) and neither dynamic nor static batching is enabled (actually the static batching setting doesn't matter).
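For reference, the fps calculator mentioned before the list can be as simple as averaging frames over a short interval. This is a language-neutral sketch in Python, not the Unity component used for these tests; the class name and the `interval` default are hypothetical.

```python
class FpsCounter:
    """Average frames per second over a fixed reporting interval."""

    def __init__(self, interval=0.5):
        self.interval = interval  # seconds between refreshes (made-up default)
        self.elapsed = 0.0
        self.frames = 0
        self.fps = 0.0

    def tick(self, delta_time):
        """Call once per frame with that frame's duration in seconds."""
        self.elapsed += delta_time
        self.frames += 1
        if self.elapsed >= self.interval:
            # Refresh the displayed value and start a new measuring window.
            self.fps = self.frames / self.elapsed
            self.elapsed = 0.0
            self.frames = 0
        return self.fps
```

Averaging over an interval instead of using a single frame's duration keeps the number readable on screen, which matters when you are noting results down by hand.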

The results for the single color scene I obtained on the aforementioned hardware (Samsung Galaxy Tab) are the following:

Option             Frames per Second   Frame Time
No Batching        24 fps              41.67 ms
Dynamic Batching   24.5 fps            40.81 ms
Static Batching    23.5 fps            42.55 ms
MeshBatcher        28 fps              35.71 ms

So, as we can see, the custom MeshBatcher is the fastest of the lot. We shave about 5 ms of frame time off the next best one, which is (surprisingly) dynamic batching. I'm not sure why static batching gives worse results than dynamic batching. Probably it isn't designed for repeating the same object 1000+ times and is better suited to bigger meshes. Still worth noting.

Just like the single color one, but with 8 different materials. Exactly the same number of tris.

The next test I wanted to do was the different colors scene. You can see that it's exactly the same scene but with different colours. Having 8 colours in the material means having 8 materials, which means that we have to divide the objects into 8 different draw calls. Obviously with thousands of objects 8 different draw calls are not a problem, but Unity doesn't seem to be able to batch them properly on its own, as the following table of results shows:

Option             Frames per Second   Frame Time
No Batching        20 fps              50 ms
Dynamic Batching   21 fps              47.61 ms
Static Batching    19.5 fps            51.28 ms
MeshBatcher        28 fps              35.71 ms

As expected, the results using the MeshBatcher are exactly the same, but Unity's batching systems report considerably worse results. That was actually the case I had in our game and the reason why we saw so much improvement when introducing this system. In this case it may shed up to 15.5 ms off the frame time (compared with static batching)! That's over 30% of the frame time!
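The frame times in both tables are just 1000 divided by the frame rate, and the savings quoted above follow from subtracting them. A quick worked check, with function names of my own choosing:

```python
def frame_time_ms(fps: float) -> float:
    """Milliseconds spent on one frame at the given frame rate."""
    return 1000.0 / fps


def saving(slow_fps: float, fast_fps: float) -> tuple[float, float]:
    """Return (ms saved per frame, fraction of the slower frame time saved)."""
    slow, fast = frame_time_ms(slow_fps), frame_time_ms(fast_fps)
    return slow - fast, (slow - fast) / slow


# Single color scene: dynamic batching (24.5 fps) vs MeshBatcher (28 fps).
ms, frac = saving(24.5, 28.0)
print(f"{ms:.1f} ms saved, {frac:.1%} of the frame")

# Eight color scene: static batching (19.5 fps) vs MeshBatcher (28 fps).
ms, frac = saving(19.5, 28.0)
print(f"{ms:.1f} ms saved, {frac:.1%} of the frame")
```

That works out to roughly 5 ms (12.5%) in the single color scene and a bit over 15.5 ms (about 30%) in the eight color scene, matching the figures in the text.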

So, summarizing, I believe I can certainly say that our own MeshBatcher is better than Unity's batching methods under some circumstances. Not all circumstances will have our mesh batcher crowned as the quickest way to render them. Batching too many objects may result in worse performance, for instance. Still, I believe that if you correctly balance the number of polys per batch, you may always get better results with a custom mesh batcher than with Unity's.

This is by no means a criticism of Unity's batching methods. As I said at the start of this post, Unity has to take into consideration all the possible scenarios where their code has to work. When we do it by hand we can fine-tune a lot better. Also, it must be said that our method is not without drawbacks. For starters, it requires you to group your objects by parameters such as spatial coherency and shared texture. The batcher won't work if the objects inside it use different textures (I believe Unity's batching methods won't work with different textures either). You have to do it all by hand, which means more authoring work than just clicking a checkbox in Unity. On top of that, the batcher requires CPU time at the start of the scene (loading time), which may become noticeable if the number of polys is very high. Actually, in the sample scenes shown in this article there's a delay of a few seconds when loading the scene due to the generation of the batched meshes. It also requires extra memory, since we're generating new meshes for every batch instead of reusing the same geometry over and over.

For me, none of these drawbacks weighs enough to stop me from using it. The loading time is quite annoying though, and could be fixed by moving that work to a background process. Since the MeshBatcher only generates the batched geometry and then swaps the original for it, we could leave the geometry generation to a background thread and swap each batch in as it finishes. I'm considering implementing this improvement to see if it works fine. I'm not sure all platforms support multithreading just fine. It could be done in another way though, with a MeshBatcherManager that takes care of the geometry batching, but only one batch at a time, trying to ensure it won't stop the game for too long. It may result in some stutters during the first frames of the scene, but since those are normally a fade from black or something like that it may not be noticeable. Using multithreading should work better, leaving almost no stutter.
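The non-threaded MeshBatcherManager idea could look something like this sketch. It doesn't exist yet in the post, so everything here is an assumption: the class layout, the `budget_ms` value, and the choice to keep building batches until a per-frame time budget runs out instead of strictly one per frame. Each queued item is just a callable that builds one batch.

```python
import time
from collections import deque


class MeshBatcherManager:
    """Build queued batches a few at a time so no single frame stalls for long."""

    def __init__(self, budget_ms=4.0, clock=time.perf_counter):
        self.budget_ms = budget_ms  # hypothetical per-frame time budget
        self.clock = clock          # injectable so the logic can be tested
        self.pending = deque()      # callables that each build one batch
        self.finished = []

    def enqueue(self, build_batch):
        self.pending.append(build_batch)

    def update(self):
        """Call once per frame; builds at least one batch if any are pending."""
        start = self.clock()
        while self.pending:
            self.finished.append(self.pending.popleft()())
            # Stop for this frame once the budget has been spent.
            if (self.clock() - start) * 1000.0 >= self.budget_ms:
                break
        return not self.pending  # True once everything is batched
```

Setting the budget to zero gives exactly one batch per frame. A single very large batch can still blow the budget, which is one more reason to keep the number of polys per batch reasonable.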

I think the code of the MeshBatcher is relatively simple to implement. Anyway, I'm considering uploading it to the Unity Asset Store for a small price. Hopefully this will help other people improve performance in their projects and will get us a little bit of money to fund our ongoing game. We're also considering releasing other bits and pieces of code we've developed for Super Toy Cars on the store so that we can get some money to spend on our game. It ain't easy to be indie!

Anyway, I hope this has been helpful to you and if you need anything or want to correct me or add to what I write here feel free to write in the comments or drop me an e-mail.

2 comments:

  1. Interesting. Maybe there could be a way to 'compile' all these changes so the shipped version won't suffer from delays?

    1. Yeah, that should be possible. 'Baking the mesh' so that it's ready for consumption in the game. I'll look into that, thanks!
