From the Burrow

TaleSpire Dev Log 140

2020-01-03 14:44:39 +0000

Today I’ve continued on server work:

I rewrote the sign-in code to promote the connection to a secure websocket if sign-in succeeds.

I’ve started wiring up erlbus so that the client will subscribe to the campaign and board and receive messages broadcast by other clients or the server.

One interesting note with that is that when messing with erlbus from the repl you might run into a badrpc error when trying out ebus:messages. This was happening as ebus:messages uses rpc:multicall behind the scenes and it wasn’t finding the function on one of the nodes. The node with the issue was the one I was connecting my remote shell from as, naturally, that doesn’t have the app compiled. The quick fix is just to use -hidden in your erl arguments as multicall contacts [node() | nodes()] by default and when you are hidden you wont show up in nodes() unless that behavior is specifically requested.

That’s the lot for today (other than the usual misc bug fixes). I’m flying back to Norway on Monday so most of the day will be a write-off due to travel. I may get a little done on Sunday but tomorrow I’ll be visiting some friends I haven’t seen in a few years (which is gonna be great!).

Ciao!

TaleSpire Dev Log 139

2020-01-02 17:38:36 +0000

Over the break, I’ve put down the front-end code to begin getting a handle on the changes that will be made to the backend. Most of this has been focused on reading, here are some of the things I’ve been dipping into:

Reading things

  • https://erlang.org/doc/man/gen_event.html some of the http handling code got a bit centralized a was slowing me down when making changes. I hadn’t used gen_event’s before as I found them a bit confusing. They are somewhat different than gen_server and gen_fsm as you don’t implement the manager but instead just the callbacks. It’s pretty neat once you grok it, though.
  • http://blog.differentpla.net/blog/2014/11/07/erlang-sup-event/ one part of understanding gen_event is knowing how to include it in your supervision trees. This covers those bits that mostly seem left out of other tutorials.
  • https://github.com/cabol/erlbus/ erlbus is an important piece of the changes I’m making, and so I’ve been getting familiar with it’s approach. It also contains one of the more lucid examples of using websockets in erlang, which was helpful in other tests.
  • https://www.phoenixframework.org/blog/the-road-to-2-million-websocket-connections An inspiring read. Little bits and bobs scattered throughout that were useful. Also, it’s just nice to see what certain pieces we use can potentially deliver (although we naturally aren’t doing this).
  • https://github.com/uwiger/gproc/blob/master/doc/erlang07-wiger.pdf This paper describes the evolution and technical realities that led to gproc which is something I intend to use in place of the standard global (across multiple erlang nodes) process registry.
  • http://erlang.org/doc/apps/stdlib/stdlib.pdf this naturally is WIP. Skimming the standard lib of any language is a great way to find out things you didn’t know were there, avoiding redundant work is nearly always a blessing
  • https://www.amazon.com/WebRTC-Cookbook-Andrii-Sergiienko/dp/1783284455 I’ve been skimming bits of this again as it’s likely to form part of the first implementation of the p2p voice & video chat
  • https://ninenines.eu/docs/ dear god ninenines is the best. Cowboy, gun, ranch. All fantastic quality, super robust and well documented. I have no idea how I’d function without their stuff.

Making things

As we use Photon for realtime networking in TaleSpire, we only need to focus on lower frequency events, such as persistence, for now. The alpha used a hacked together REST’ish api, which did the job but has all the expected issues (e.g. having to poll for changes). We are switching to using websockets for the Beta as it’s a relatively incremental step, will let us overcome some shortcomings, and still makes sense given the scale we will be at. It, too, will be a target for replacement in the future, but that is a subject for another day.

All of the server api the game uses is described by a data-structure that we then generate erlang & c# code from. I’ve updated the generator to create websocket handlers rather than the http/s ones. In doing so I’ve also been undoing some of the code that centralized some of the http management.

The last thing I did was update the code that polls to see which domain it is at before attempting to pick the right certificates and start serving over https.

The next step will be to rewrite the session handling code as it currently assumes a stateless connection.

Until next time,

Peace.

TaleSpire Dev Log 138

2019-12-06 09:09:52 +0000

I’m currently writing this on the train home,

I’ve been down working with Jonny for the last time (in person) this year, and it’s been a great few days. We made a bunch of progress in different places, so lets natter about that.

Tiles

First off, we were looking at tile placement and control. We may have found a nice change which could help with some tricky cases our alpha testers would get in when working with walls. My ultra vague language is due to it being so early in the prototype phase that we are not ready to talk about it yet. This is not the first time in the last 4 months when we have had a fix that play revealed to be worse than the initial problem. When we are more confident, we’ll do a full writeup (or maybe a short video) to go through the different things (Just placing tiles has surprising nuances to it :D).

Meatier News

As that is all arm waving and no substance, here is some real news: You will be able to place creature miniatures off-grid. This has been play-tested for a while internally now and, although it does introduce some challenges where it comes to UI/messaging, it does feel pretty cool. For those who have wrangled narrow corridors in the alpha, this should feel rather freeing :p

Do note that we are still keeping the tiles to the grid, however. Without this limitation, the worst-case complexity for various systems becomes totally unmanageable. Remember that doubling the length (or resolution) of the side of a cube increases the volume 8 times. So if you have anything that operates over the volume like say, fog of war or line of sight, your day would have gotten much worse (I know, I go on about this in almost every post).

The Right Questions

I also think I know how cross-zone pathfinding will work now. I’ve been struggling a little with how to maintain a single graph as zones are loaded in and out. Jonny did me the simple favor of asking if it could be computed on demand. Building the nav-mesh from scratch each time a creature is picked up sounds like a lot of work, but it actually reduces the complexity a lot and opens up some opportunities for parallelization. We just need to keep the per-zone input data sorted in a way that helps this on-demand process as much as possible, and that bit is easier. More on this when I have tested it.

Monster Branch Hunter

We have also started the long process of getting everything merged into master. The part of this that required us to work closely was moving to Unity assemblies for the core project as suddenly their ‘magic folders’ stop working[0]. This means reorganizing the whole codebase, which wasn’t too bad[1]. However, this can make for seriously ugly merge conflicts and so it was best to stop development for a couple of hours, get the change merged in, and test on both our machines.

Testing

I’ve also started working on another approach to ironing out bugs from the board code. It’s very much inspired by my uninformed scanning of property-based testing. We make a dirt-simple but accurate model of the code we wish to test, and then we generate random streams of operations that are applied to both the real version and the model, and we compare the results. The model gets to ignore all details of multithreading, being performant, etc. Its only job is to be understandable and give the behavior and results intended from the real version. In previous tests, I had already made a fake network interconnect so I could apply operations to one board and make sure the other board ended up with exactly the same result after sync so now I can wire things up like this

    [Randomly generated board operations]
             /              \
            |                |
            v                v
    [Simple Model]     [Real Board 0] <--fake network interconnect--> [Real Board 1] 

We feed the model and ‘Real Board 0’ the same operations, but then compare the results for all three boards. This gives us a pretty decent amount of testing for almost no additional work.

I will stress that this isn’t property-based testing. I want to learn it, but I don’t have the time to take on a new technique right now. However, I know that even this limited approach can give good results, and I think I’ll be able to add some very basic test-case minimization too.

Christmas Approaches

And with that, it’s almost the end of the year. There is more to come, but I’m going to posting a bit less until January as I’ve managed to schedule everything to happen in one month, and apparently it’s arrived. For the next week, I’m on holiday, and then I’ll be heading to the UK for Christmas. During my time in the UK I’ll be working some evenings on server stuff, so expect a very significant shift in content then!

I hope this finds you well folks.

Peace.

[0] by default, Unity automatically makes assemblies that separate game and editor code based on special folder names. It’s handy but naturally goes away when you decide to handle assemblies yourself.

[1] There are a bunch of ways to layout your project, but we found it simplest to make 2 folders, ‘Runtime’ and ‘Editors’, and move all game code to the former and all the editor code to the latter. The directory structure on both sides is mirrored. We have a little helper menu for writing the boilerplate code for inspector editors, so we updated that to respect the new structure and added a right-click menu entry to jump to the editor folder if it existed.

TaleSpire Dev Log 137

2019-12-04 16:03:31 +0000

Hey! I best recap the last two days before even more get away from me.

Monday

Monday was spent debugging some of my uses of unsafe code. Unity’s Job system has some great tools to detect race conditions, but they also let you disable it if you have a use case that demands it. This is a wonderful attitude, and I’m very grateful for that. However, boy can I corrupt some memory with this stuff.

The first thing was debugging a new NativeCollection I had made. This is a Stack with a fixed max size that can have multiple concurrent writers or readers (but not both at the same time). I use this to hand out and return large numbers of Ids to a limited resource, and to do so from concurrent jobs.

The id, in one case, is an index into a big array that stores state that is later pushed to the GPU for use in the shaders. Each visible tile has one entry in this array and gives that spot back when their presentation is destroyed. This can happen with large numbers of tiles across multiple zones simultaneously.

Because each tile gets a unique index into that array, we know that they will be the only thing accessing it, so we don’t want Unity to protect us from race conditions that would occur otherwise[0]. This means we add the [NativeDisableContainerSafetyRestriction] attribute.

That, of course, allows you to really make a mess, and sure enough, I did :p.

In this case, it wasn’t due to that attribute, however. I was using MemCpyReplicate to set the default shader state across a portion of the array and I may have incremented the pointer to the strut rather than the pointer into the array. So I happily tried to copy invalid portions of memory into the array.

I was actually super lucky that this caused Unity itself to crash very consistently. These kinds of bugs are horrifying if they stay under the radar.

With that done, all tests passed again, and I got back to work.

Tuesday -> Wednesday morning

Tuesday started with traveling to visit @Ree. It’s always great when we get to do this as certain kinds of tasks are so much easier.

In a related subject.. yesterday I was working on the line of sight shader :)

This starts with rendering the scene into a cubemap from the head of the creature you have just placed. Each thing rendered is colored based on an id (zero for all occluders and >0 for creatures or potentially other points of interest). We then run a compute shader to sample the cubemap and aggregate which ids are in there.

To start with, we want to store 32bit ids, which meant using a floating-point texture format. This seemed to work, but every time I read the value from the compute shader, the value was always clamped between 0 and 1.

It’s my first time doing this in Unity, so I spent ages trying to find out what I’d done wrong until I gave up and ran renderdoc on it. I should remember to do this first as it turned out Unity was rendering the faces in non-float textures and then copying the data over :

RenderToCubemap doesn’t seem to warn about this, but @Ree got me on the right path by saying we should try a RenderTexture instead. RenderToCubemap has an overload for this, and there is even this interesting line in the docstring for the overload that takes the cubemap directly

If you want a realtime-updated cubemap, use RenderToCubemap variant that uses a RenderTexture with a cubemap dimension, see below.

I’m guessing that this is a hint to the intermediate copies (and maybe a temporary FBO too?).

After moving over to that, I started getting values higher than one out of the texture YAY! The values are wrong, but frankly, I don’t care about it today. The pieces of the puzzle have all been started, and I can work on those from home. It’s best to use this time I have over here to touch on as many other things as possible.

The rest of today

There have been more chats working out details of performance around tile hide and cutaway and much musing on how to merge out branches in a way that doesn’t slow us down.

That’s my next task, to update the main dev branch to use Unity’s assemblies and begin moving in the new data model code.

We have to be very careful though as it’s soon Christmas and we really mustn’t block either us from being able to work as we won’t be available to help each other during that time.

That’s all for now. Seeya!

[0] We aren’t, in this case, too concerned about false sharing or other forms of contention that may hurt us. This will get more of a review later, however.

TaleSpire Dev Log 136

2019-12-01 21:30:57 +0000

Phew, so that was the meat of the week’s work. Let’s wrap up the loose ends.

This is probably skippable as it’s just a note of some other things that got changed.

Moving to Unity.Mathematics

Unity added a new math library recently and whilst having practical benefits, it also results in much nicer code to write and read so we have started using it for all new work.

I took an hour this week to move all the tile data code to use this except for Bounds as I want to make sure that the new behavior matches what I expect.

I also got to delete my Vector3Int32 class as Unity.Mathematics contains int3 and I renamed my Vector3Int16 to short3 and tweaked the API to match that of Unity.Mathematics so it all feels more consistent.

Realtime changes

One future task that has been making me nervous is updating the realtime networked portion of TaleSpire.

I had a little breakthrough simplifying some of our code that got very spidery and complicated during the development of the alpha. I was going to write a little about that here but I’m gonna save it for another week as it would turn this into another long post.

Start loading assets earlier

I probably should have mentioned this in a previous post but oh well.

Unity loads asset bundles asynchronously. As we don’t want to try load every tile in TaleSpire at once, when you first place a tile we need to load it. This results in some frames passing before spawning continues.

This could be exacerbated by the fact that changes are queued as it won’t know a tile is needed until it’s time to spawn it.

For that reason, we make sure that as soon as the data representation has a new tile kind added, it notifies the asset loader so it can start loading the asset.

This will rarely matter but it may help when there is heavy load, and that’s the time you need the most help.

Merging this monster

I’ve also started kicking this code into a shape where it’s suitable for merging into master. Jonny and I divided up the work so that, technically, neither of us have tasks that block the other. However, we still need to end up with one game and my current monster branch goes against everything I like when developing.

Ah well, soon enough it’ll be in.

Actually the end for realz

Well, that’s the week. In that time I was also best man at the wedding of a dear friend so it’s been a doozy.

It’s probably time for a cup of tea.

Goodnight!

TaleSpire Dev Log 135

2019-12-01 20:23:16 +0000

Sound of trumpets BEHOLD, Part 3 commences.

We have talked about updating the data and the presentation but we still have to animate the tiles.

In TaleSpire we want to be able to have animate tiles as they appear and disappear as it makes the game feel better. We have a lot of tiles to update so we have to take care of how we do this.

A simple problem

We wouldn’t want to have to update the positions of the GameObjects every frame so we animate the drop-in using the vertex shader. The animation curves are simply stored as ramps inside textures we can sample over time. This means we only have to update the per-instance data for the GameObjects when we want to change which animation they are running.

A secondary problem

One annoying thing though is that, when changes to this state do come, they tend to come in large numbers. One example of this is during tile selection.

When players select tiles we raise them very slightly so show that they have been selected. This means we need to update the per-instance data for those tiles. Also if the tiles have props attached to them those props need to be raised too.

Unity’s interface to this is via MaterialPropertyBlocks and like so many things in Unity it can only be updated from the main thread. This kinda sucks as what we need to do is perfectly parallelizable

If the tile bounds intersect the selection bounds then set the value

This is extra annoying when we consider that selections bounds are 3D and so the number of tiles that may be involved goes up terrifyingly fast.

To resurrect our example from the previous articles a 30x30 tile slab contains 900 tiles. Due to TaleSpire’s smallest tile size being 1x0.25x1 units A 30x30x30unit region could (theoretically) contain over 80000 tiles.

Obvious that is insane and we will need to add some form of density limit, but the game will need to communicate this limit to the players in a way that feels fair.

Once again user-generated-content games bring their own flavor of crazy to the party :)

The secondary problem with a hat on

But we aren’t done, oh no we are not. Last time we talked about progressively applying changes to the presentation. Due to this, a selection can be made on tiles that are still spawning. According to the data layer, you selected perfectly valid tiles but the presentation hadn’t quite caught up. Maybe the tile spawned the frame after you let go of the selection and so as far as you percieved it was there when you let go.

Now there are different ways to resolve this but the way I want to go with for now is that the data is the source of truth, if you selected the right region you selected the tiles even if the presentation hadn’t quite got there yet.

I expect this to be an edge case that people won’t even notice, but it is better to have an answer for it.

One change

So let’s talk about one change that helps here. What we are going to do is move the tile’s animation state out of its per-instance data and into a separate buffer on the gpu (let’s call this the tile-state-buffer). Then we will put the index into that buffer into the tile’s per-instance data instead.

This little change lets us solve all the above issues.

First off we will have a copy of the tile-state-buffer in local memory. This will be a native container so that we can operate on it directly from Jobs. This means all those selection updates are done in parallel now.

Next, we just gained the option to push only part of the local buffer to the GPU per frame. This is super handy in cases where huge numbers of tiles are being changed per frame. There is a concern this delay to the start of the animation could feel slightly unresponsive, however, I’m expecting that as long as something starts happening on the same frame as the action it should feel ok. Especially as this means that lots of changes are being made.

This also plays nicely with our selection -v- progressive update issue. We can now make the changes to the tile-state-buffer independent of whether the presentation for the tile has finished loading yet. As soon as the GameObjects are spawned they will put their state index into their MaterialPropertyBlock and they will be up to date immediately.

And another thing

One obvious this I skipped when presenting the selection problem is that updating all tiles in the selection is totally unnecessary. We only actually need to update the shader state for the tile that has become selected or unselected since the last frame.

From the following diagram, we can see that in 2d that means you are inside one of two rectangles. For 3D it’s 3 cuboids.

                     frame 1                     difference
                  +----------------+    +------------+---+
                  |                |    |                |
   frame 0        |                |    |     0          |
+------------+    |                |    +------------+---+
|            |    |                |    |            |   |
|            |    |                |    |            |   |
|            |    |                |    |            |   |
|            |    |                |    |            | 2 |
|            |    |                |    |            |   |
|            |    |                |    |            |   |
|            |    |                |    |            |   |
+------------+    +----------------+    +------------+---+

Given that we can control how fast the panning in the game is this makes the number of tiles to update per frame much less. I left this until now though as selection makes a great example of the issue of tile counts in volumes and we have other operations for volumes that benefit from these changes too.

More? More.

Well, this got long. I think I’m gonna save the miscellaneous stuff for another post.

See you there!

TaleSpire Dev Log 134

2019-12-01 18:41:55 +0000

Alright, part 2!

So in the last post, we talked about updating the data representation of the board. This time let’s get into the bit the players see and interact with. The GameObjects.

In Unity’s currently supported approach, any thing in the world is a GameObject[0]. GameObjects do very little on their own but their properties and behavior come from Components.[1]

We don’t need to understand much more about this for now except that Unity naturally has to manage all these GameObjects and, as with much of the API, you can only interact with them from the main thread.[2]

Rendering

One of the nice things Unity does do for us is handling instancing so that it’s fast to draw many of the same kind of object. For example, if we have 1000 of the same tile in the scene then Unity can make one draw call to the GPU to draw them all.[3]

We aren’t going to talk about rendering any more in this post but we will revisit it in the next one as there are some issues we need to overcome.

Spawning

So we have a TileSpire-like game where we want to have the players be able to make tonnes of tiles, they should be able to drag out or paste big slabs and the delete individual tiles as they please.

If someone drags out a 30x30 slab, we need to spawn 900 tiles. Depending on the complexity of the tile Unity might not be able to handle instantiating that many new GameObjects in a single frame. This is serious though as when people are building we have building actions arriving from other players fairly frequently and we wouldn’t be able to apply the changes robustly until all the GameObjects are made.

The first change we could make is to keep the data for Tiles separate from their GameObjects. This also gives us opportunities to store that data in blittable types and use Unity’s Job system to operate on them (like we talked about in the last post)

I’m going to shift from talking explicitly about tile data and GameObjects to talking about the tile data and it’s ‘presentation’ unless the fact that GameObjects are involved is important

With this, we have the opportunity to make the data changes immediately and apply them over a number of frames. This is the approach we are using and is a part of why we want to apply data changes as fast as possible, because there may be multiple arriving in a single frame.

Relationship management

One issue we have given ourselves is that, now that the data and presentation are separate, we need to keep then in sync. If someone deletes a tile we need to delete the presentation too and, naturally, the same goes for undo/redo/etc. We could keep an index from the data to the representation but this has implications:

  • We need an extra integer of storage for every tile. 4 bytes is nothing on its own, but 100000 tiles is not a crazy number of tiles to expect to have to deal with so it does add up.
  • It may benefit us to have different layouts for the data and the presentation arrays. That could mean having to update the data->presentation index. One can then assume you’ll need to have a way to do this quickly.

Now you may not care about either of those, which is valid too. But whichever way you pick there will likely be tradeoffs.

One simplification we can do is to note that tile data is added in chunks. If we keep presentations of tiles from the same chunk together then you only need indices per chunk, this can help in the approach you pick.

Avoiding work

Other than being able to spread out work over multiple frames we get some trivial opportunities to avoid work too.

First off, if we can apply changes to data first and then generate the changes to the presentation afterward then we often get to skip work. Imagine an Add followed by a Delete arriving in the same frame, by looking at the data afterward we only need to spawn the tiles that remain, rather than first spawning them all and then applying the delete.

Also in the case where there is a small backlog you can get opportunities to skip entire actions. The best case is an Undo & Redo together (this can happen when people are playing around to see if they like a particular change). In that case, you can remove both actions from the queue as they cancel each other out.

You don’t always need to present anyway

If you have players in two different parts of the board you may want the data for the zones in memory but not have to have a presentation of the Zone you are nowhere near. This is trivial here as the data is separate.

This makes jumping to that part of the board much less jarring than it would be as you are already to date with the latest version of that part of the board.

Applying changes

A Zone’s presentation now gets given a ‘budget’ of how much it can change this frame. It keeps popping changes from its queue and applies as much as it can until the budget runs out. The budget is represented by a floating-point number and we give separate costs to spawning tiles, reusing ones from the GameObject pool, deleting tiles, etc.

These numbers are made up by us but they are easily tweakable so we have a lot of freedom now to profile and find out what works. We’ll chat more about this in the future when we get more data.

Sticking it together

So we are doing a bunch of these things. We have a separate presentation for each Zone. They each have queues of changes to apply which, due to how tiles are managed in data, mainly boils down to adding and deleting chunks of GameObjects by their Id.

We use the code that applies the change to data to make simpler change instructions for the presentation so that it has much less work to do.

We don’t maintain any indexes between the data and presentation (beyond unique ids of chunks) so we don’t have to do any upkeep work, but we do end up scanning tiles in more cases that we would have to otherwise [4]

We don’t do all the work avoiding stuff yet but the hooks are in the code so we can add this easily when we want to. First off I need to iron out the bugs that remain.

Next post

Ok so that wraps up this post, in the next one, we need to talk a little about rendering and a few miscellaneous bits and bobs we’ve also done this past week.

Ciao

[0] Ignoring drawmeshinstanced and friends for this as it makes for an easier to parse, if slightly inaccurate, sentence.

[1] At least, only the Unity data, you can change the fields on your components as you like.

[2] I should also mention we are not using the new ECS yet as some features we need are only just arriving and we would still need to do a bunch of work to test everything we need will work in the new system. I’m 95% sure we’ll be moving to it within the next year though.

[3] Because of reason 1023 is the largest number of instances that can be made in one call in Unity. This is a limitation of their system and it’s a bit of a shame, but we will deal with this another time

[4] Tiles in the presentation are still currently in the same order as the data representation, so this does still leave us with ways of making applying deletes less costly. Currently, this is not an issue though so I’m leaving it for now

TaleSpire Dev Log 133

2019-12-01 16:41:10 +0000

Hi folks,

There have been no daily updates for the last six days as I’ve been in a sort of self-imposed crunch to hammer out some things that have been on my mind.

The first one was related to the performance of Add/Delete/Undo/Redo operations on tiles. The faster we do this the better and we have a bunch of opportunities to make this quick. The first was to move the core code of these features over to Unity’s Job System.

Why we really have to care about performance

For newer arrivals to these writeups, performance may seem an odd concern for a game where the rate things are happening seems much lower than that of, for example, an FPS. For us, it’s not the frequency of actions that plague us but the volume of work that can be created by them, and the fact there is no general pattern to when they happen.

For example, dragging out a 3x3 square creates 9 tiles, but 30x30 is 900 tiles.

  • Tiles are usually made of smaller assets so that’s 900*N object to spawn[0] and render
  • If any of those sub-objects have scripts then that’s 900 more scripts running per interactable sub-object
  • The first thing tiles do is drop into place, so you best be ready to animate 900 of those (we do this via shaders)
  • When people drag to select we slightly raise the tile to show it has been selected so you will need to be able to query many thousands of tile bounds quickly.
  • etc

.. and 30x30 tiles isn’t many considering a board is meant to be 30000x30000 units wide and 10000 units high.

All of this isn’t to complain, we want the game to behave like this, but we do need to be clear to ourselves on what needs to be achieved within the 16ms of a frame[1]

Jobs

So back to the jobs. One of the first things I looked at was the collections we use to store the tile data. As of today, the data for tiles are split up somewhat like this:

  • Board: contains N zones
  • Zone: contains up to 16 client-data-chunks, one for each client[2] who has built in a zone
  • ClientDataChunk
  • AssetData: Holds the data for the tiles themselves
  • AssetLayouts: Holes layout information of the data in AssetData

As you can imagine we have lots of opportunities for performing operations in parallel here. Zones are a 16x16x16 unit chunk of the board and so if an add or delete crosses multiple zones all of those adds/deletes can potentially be done in parallel.

Also as each client’s data is separate within the zone we have opportunities here too. This has so far been less useful as an Add only affects the data of the client that performed it, and during deletes, we want to store the tiles that were deleted for undo/redo and so that would require collections that concurrently have arbitrary numbers of tiles written into them. Making a new collection type to cover this case was too much of an additional distraction and the common case is one or two people building so there is less parallelism to be gained here anyway. [3]

A diversion into better collections

Here is a rough idea of how AssetData is laid out.

// (the data is actual flat but grouped here for clarity)
AssetData: [ [Tiles from Add 0] [Tiles from Add 1] [Tiles from Add 2] [Tiles from Add 3] ]

ActiveHead = 4
Length = 4

We tiles we consider active are all from the Add less than ActiveHead. So in the above case, all of them

One of the pain points from the alpha was Undo/Redo feeling unresponsive with large numbers of tiles. With this layout, the undo of an Add is just subtracting 1 from ActiveHead (we will talk about spawning the visuals for the tiles later). By not deleting the data yet Redo is just adding 1 to ActiveHead.

If you Undo and Add and then perform an add or delete action there is no way to Redo that add so we just overwrite its data with new tile data. Thus we rarely need to shrink the backing NativeList behind AssetData.

Now having the Head integer and Native collection separate is fine but as they need to be updated from jobs together it helps for them to be together in a native collection that properly uses Unity’s thread safety checks to make sure nothing funky is going on. To that end, I spent a little while making a collection with the terrible name NativeGreedyList.

It has a Capacity, and ActiveLength and a FullLength. The FullLength is the apparent length of the List. The ActiveLength is the ActiveHead integer from before. Capacity and FullLength are separate as you often want to allocate in larger increments than you immediately need so you have to resize less often.

Unlike NativeList we don’t every reduce the capacity unless explicitly required to (as the data past the ActiveLength is often still valuable to us)

Back to jobs and the problem with multiple people building

Alright, so armed with new collections the work Jobifying everything continued. After a bunch of wrestling and learning, I got this into shape and so now the data updates are parallel. Yay!

But of course, there is more.

One thing that has been on my mind is ‘roll back’ and ‘reapply’ and that what I’m going to ramble about now.

When you have multiple people changing the board you need to make sure that each change is applied the same on every client to make sure they all end up with the same board[4]. When you only add stuff it’s easy but with Delete the result depends on the order the operations are applied.

To set the order each ‘action’ is sent to one client who is deemed the ‘host’ and they give it a history-id and send it on to the rest of the clients. Great, this gives everything a set order but it also means that you don’t see a change until it’s made that round trip to the host. That kind of delay is unacceptable (it feels awful) so we did the following:

  • Send the action to the host
  • Apply the change locally immediately
  • time passes
  • An action arrives from the host
    • if the action is the change we already made, we don’t need to do anything.
    • if the action is from a different client we:
      • ‘roll back’ our change
      • apply the action from the other client
      • ‘reapply’ our change

This works but necessitates some form of ‘snapshot’ we can roll back to[5]. This has been a pain for a while and something I’ve never been happy with. However, during the week I was looking at the problem of spawning the Unity GameObjects for all these tiles (more on that in the next post) and I spotted a cool bunch of details.

For the rest of the article we will use the term ‘uncommited’ to refer to actions/tiles we have made but that have not yet been confirmed by the host, and so would have to be rolled back

  • Each client has separate collections for their assets in a zone
  • Adds only affect the data for the sending client
  • Deletes can affect data for all clients
  • Deletes have not only a region in space but also a point-in-history (as a selection may be made before the person hits the delete key)
  • An uncommitted change, by definition, can only be on the client that made it

So by making sure that delete only operates on tiles that have been committed then we don’t need to rollback for this case. We do have to reapply our deletes though as they may have added tiles one of our selections should have captured. When we reapply a delete we only do it to the data of the clients whose actions were committed before ours.

Undo & redo are also simple as they can only affect asset data for the client that performed the action[6]

With all this in place, we get to totally remove snapshots. It requires a bunch of tweaks to how changes coming from the network are applied.

This is one of those lovely times where data layout fundamentally change what has to be done to implement a feature. I find that stuff pretty cool.

MORE!

This post has got long enough but there is still plenty more to cover from this week so let’s get back to it in the next post.

Ciao

[0] We do use instancing but until Unity’s ECS supports per instanced data we are forced to stick with the classic GameObjects approach which means we do need to spawn separate GameObjects.

[1] or, of course, spread out across many frames

[2] A client, in this case, is a single running instance of the TaleSpire executable.

[3] This will be revisited in the future as that collection type would be very handy. I also didn’t use NativeQueue as ParallelWriter is only for use in IJobParallelFor jobs as far as I can tell.

[4] Previously we have talked about how we can’t afford to be sending all the tile data across the network so instead, we just send the building ‘actions’ the players perform.

[5] Using Undo/Redo wouldn’t work in this case due to how deleting tiles placed by other clients works[6]

[6] If you delete tiles from another client and then Undo that delete the tiles are not placed back in their collection. Otherwise, if they undo and then redo they restore those tiles, which feels a bit odd.

TaleSpire Dev Log 132

2019-11-24 01:23:54 +0000

The last two days touched a few different things.

First off I was working on the resource list for the Spaghet scripts. This holds references to Unity objects in the asset like Colliders and Animators, and also the configuration of radial menus and GM requests.

In implementing this I re-learned a terrible rule of Unity’s immediate mode UI. Don’t try and make the code ‘nice’ you’ll sink ages into chasing things that look like they would help only to hit some limitation of the system and have to throw it all away. DONT DO IT. I’m sure I was meant to have learned this lesson last time, but apparently, that wasn’t enough.

Once that was good-enough-for-now™ I started looking into reimplementing the doors/chests/etc using the new scripting system. For now, I’m not worried about improving the graph, as I want to switch to Unity’s node graph elements at some point, so I just wrote the ops by hand. In the process, I added a few more operations to the language: yield, goto, and an op to send a message to a resource from the resource list.

It’s not done yet but after the resource list thing, I was a bit tired of scripts and put it to the side for another day.

On Friday I started off with a planning session where I just try and work out any small details I can across the tasks on my todo list.

One that stood out as needing some work was pathfinding so I’ve started work on that. The goal for the beta is as follows.

When you pick up a creature you should be able to see all the places you could walk to within a given range (determined by the creature)

This means we are looking at a kind of flood-fill where we ‘walk’ something for a certain distance, but what do we walk?

Well, last year I walked an octree of the board itself (we did this for fog of war so the octree was available) and this was hard. This time I want a dedicated graph. I’ll write more about this once I’ve got it working.

And that’s the lot for now. It’s been a good week and there will be more of these logs next week.

Until then, peace.

TaleSpire Dev Log 130

2019-11-19 23:06:10 +0000

Today I carried on working on the scripting implementation.

Each script has a small section of memory that it will be able to store values to, my first job of the day was finishing of that data store. A few things were a little difficult to debug as VS2017’s debugger doesn’t show what data is on the other end of a pointer.

After finishing I found out that VS2019 does have this feature so I spent a couple of hours fighting to get that upgraded and playing with Unity. If you try this yourself remember:

  • Don’t let VS install unity again, the version is old. Search for ‘Unity’ in the components section of the installer and find the tools “unity tools for visual studio” component
  • Open Unity’s package manager and make sure “Visual Studio Code Editor” is up to date
  • If VS2019 isn’t appearing in Preferences -> External Tools -> External Script Editor then click browse and find something similar to C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\devenv.exe
  • If like me you only use VS for debugging and have a different editor for programming you will need to update omnisharp (or your equivalent) that and probably your editor integration too.
  • Restart everything :p Just for safety

After all this, I hooked up the scripting system so that the board asset arrays now receive the references into Spaghet’s data store. All is working according to plan there.

The next thing I want to do is look at moving Spaghet into its own package so I can share that between TaleWeaver (the asset prep and modding tool) and TaleSpire itself.

That’s gonna take a little coordination with the rest of the team so during that I’m going to have a look at making the Unity component that lets you connect resources to your scripts. Here is the old (very unfinished) one that held the LUA scripts.

script assets

The important part, for now, is the bit marked in red. You need to be able to add resource slots so you can drag in the animator or sound effect you are interested in. Then those resources can be referred to from the script.

Another potentially nicer way to handle it would be have the slots on the state machine nodes themselves.. hmm actually there might be a nice way to do that as the nodes are made with Unity’s imgui elements. I’ll look into that tomorrow.

Whatever approach I use for beta, I’ll then need to look for this element when the asset is first loaded in TaleSpire and submit the scripts to Spaghet.

Slowly slowly all these threads are weaving together.

More tomorrow.

Ciao

Mastodon