Advanced Features: Replication and Sharding (May 25, 2017) – AWS Thinkbox Help Centre

Version: Deadline 9.0

OVERVIEW

In today's entry we are going to delve into a couple of advanced Database features that can improve robustness and performance, especially for large farms. The first topic we will explore is adding redundancy to a database through replication. We will then build upon that to look at sharding, a scalable partitioning strategy used to improve performance of a database that has to handle a large number of operations from different clients.

Both of these features are supported by MongoDB ® and are relatively painless to set up with Deadline. If you are already a Deadline user, you can even set up replication and sharding with an existing installation of the Deadline Repository and Database. In this blog post, we are going to give some background on what each of these features can do for your farm and provide an overview of what's involved in setting it up with an existing database installation.

REPLICATION

Replication refers to a database setup in which several copies of the same dataset are hosted on separate machines. The main reason to have replication is redundancy. If a single database host machine goes down, recovery is quick since one of the other machines hosting a replica of the same database can take over. A quick fail-over to a secondary machine minimizes downtime, and keeping an active copy of the database acts as a backup to minimize loss of data.

In MongoDB, replication is handled through something called a Replica Set: a group of MongoDB server instances that are all maintaining the same data set. One of those instances will be elected as the primary, and will receive all write operations from client applications. The primary records all changes in its operation log, and the secondaries asynchronously apply those changes to their own copies of the database. If the machine hosting the primary goes down, a new primary is elected from the remaining instances.

ca020eaa6664d434ee12e7ff1f83205cef8e1ffb_replication-and-sharding-1.avif

By default, client applications read only from the primary. However, you can take advantage of replication to increase data availability by allowing clients to read data from secondaries. This is done in MongoDB by specifying a read preference. Keep in mind that since the replication of data happens asynchronously, data read from a secondary may not reflect the current state of the data on the primary.

CREATING A REPLICA SET

A tutorial explaining how to create a Replica Set is included in the Deadline documentation's Creating a Replica Set page. Briefly, the steps to create a Replica Set on an existing Deadline Repository are as follows:

Install the MongoDB binaries to each additional machine that will be hosting the database.
Modify the database configuration file to use a given Replica Set name, and copy that configuration file onto each database machine.
Start the MongoDB service on each of the new machines, and restart the service on the original one.
Run a command in the MongoDB shell to initiate the Replica Set.
Update the Deadline Repository's database settings to recognize the Replica Set. This can be done with one call to Deadline Command.

DATABASE SHARDING

For larger render farms, scaling becomes a key performance issue. Having a large number of clients performing high-throughput operations can really test the limits of a single database instance. Sharding is a strategy that can mitigate this by distributing the database data across multiple machines. It is essentially a way to perform load balancing by routing operations to different database servers.

To use sharding in MongoDB we must configure a Shard Cluster, which consists of the following components:

Shards: Each shard contains a portion of the complete database dataset. Each shard is a replica set, meaning that you can think of each shard as being a group of machines that each maintain their own copy of this portion of data.
Config Servers: The config servers contain settings and metadata about the shard cluster. Since this information is crucial to the operation of the cluster, the config servers are also deployed as a replica set.
Routers: Routers are instances of a MongoDB service called mongos that acts as the interface between clients and the sharded cluster. The routers are responsible for forwarding database operations to the correct shard.

34c8173bf8f9b7c1b27b35fdd144a87b6e82f48e_replication-and-sharding-2.avif

CREATING A SHARD CLUSTER

The Deadline documentation's Creating a Shard Cluster page contains a tutorial about creating a Shard Cluster. The steps to create a Shard Cluster on an existing Deadline Repository are as follows:

Create a Replica Set for the Config Servers, as well as a Replica Set for each shard.
Run at least one instance of the mongos application, connecting it to the Config Server replica set.
Add the shards to the Shard Cluster through the MongoDB console.
Enable sharding on the database.
Configure how each collection in the database will be sharded (more on this below).
Update the Deadline Repository settings to connect to the router (mongos) servers.

SHARDING THE DATABASE COLLECTIONS

A Shard Cluster requires the database data to be distributed among the shards in the cluster. There are many ways to decide how the data should be split up, and this choice can have significant performance implications. The method by which the data is distributed is called a sharding strategy. A bad sharding strategy can lead to one shard getting much more load than the others, or client application having to query several shards if it needs data from linked collections.

The tutorial in the documentation provides a reasonable sharding strategy to start with. It suggests a hashed sharding strategy, using the ID as the shard key. This will essentially distribute all records uniformly throughout the shards, and is a good scheme to use if your data has no underlying structure. This is fine for the Jobs collection, but as the documentation suggests, it would be a bad choice for the JobTasks collection.

The JobTasks collection contains all tasks for all jobs. It is a pretty common operation to fetch all of the tasks belonging to a certain job. If all of the tasks are distributed uniformly throughout the shards, a single call to this operation may end up sending database queries to several shards. It would be better if all tasks belonging to a given job were found on a single shard. Hence, it would be better to set the shard key here to the JobID field (which identifies which job a task belongs to).

The sharding strategy for different collections can be adjusted according to your own farm's particularities. For example, if your farm is distributed across a handful of regions across the world, you may want to split the Jobs collection such that each job resides on a shard in the same region as the node that will be rendering it. This can be done by choosing an appropriate shard key (such as the Group field) and assigning zones to each shard. Sharding strategies are discussed in more detail on the MongoDB documentation page about Sharding.

CONCLUSION

I hope you enjoyed this brief overview of replication and sharding, and learned a little about how to set it up in Deadline. I encourage you to consider whether your farm could benefit from either of these features. If you are so inclined, check out the Advanced Database section in the Deadline documentation and try it out for yourself!

MongoDB is a registered trademark of MongoDB, Inc.