The following can cause performance issues with the Remote Connection Server (RCS):
- CPU or memory overload
- Caching issues with a job’s auxiliary files
- Permutations of jobs and Worker types
Causes and remedies:
Depending on the number of render nodes on the farm, the RCS host needs ample CPU, memory, and network capacity. We recommend running RCS on a separate host that is geographically close to the database machine. We strongly recommend monitoring the machine so that its resource utilization stays below 100 percent, and planning a hardware upgrade before utilization reaches that point.
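The monitoring advice above can be sketched as a simple threshold check. This is a minimal illustration, not part of Deadline: the `UtilizationSample` type, the `resources_needing_attention` function, and the 80 percent warning level are all assumptions, and the samples would have to come from whatever monitoring tool you already use.

```python
from dataclasses import dataclass


@dataclass
class UtilizationSample:
    """One monitoring sample for the RCS host; each value is a percentage (0-100)."""
    cpu: float
    memory: float
    network: float


# Hypothetical warning level. The guidance only requires staying below
# 100 percent; warning earlier leaves time to plan a hardware upgrade.
WARN_PERCENT = 80.0


def resources_needing_attention(sample, warn_percent=WARN_PERCENT):
    """Return the names of resources at or above the warning level."""
    readings = {
        "cpu": sample.cpu,
        "memory": sample.memory,
        "network": sample.network,
    }
    return [name for name, value in readings.items() if value >= warn_percent]
```

Calling `resources_needing_attention(UtilizationSample(cpu=91.5, memory=62.0, network=40.0))` flags only the CPU, which would be the signal to start planning an upgrade.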
RCS performance also depends on the number of client machines syncing files to the repository and on the size of those files. When you render with large auxiliary files (for example, 1 GB), the file system calls occupy all of the RCS threads. Because file transfers are then running on every available thread, the Deadline Workers throw connection errors, and network performance and connectivity issues follow. We recommend using network references to large auxiliary files instead.
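One way to act on this recommendation is to check file sizes before submission and keep large files out of the auxiliary file list. The sketch below assumes a hypothetical 1 GiB cutoff and a hypothetical helper name, `split_auxiliary_files`; it does not call any Deadline API and only decides which files should be referenced by network path.

```python
import os

# Hypothetical cutoff: files at or above this size should be referenced
# by network path rather than submitted as auxiliary files.
LARGE_FILE_BYTES = 1024 ** 3  # 1 GiB


def split_auxiliary_files(paths, threshold=LARGE_FILE_BYTES, getsize=os.path.getsize):
    """Split paths into (small, large) lists by file size.

    `getsize` is injectable so the logic can be checked without real files.
    """
    small, large = [], []
    for path in paths:
        if getsize(path) >= threshold:
            large.append(path)
        else:
            small.append(path)
    return small, large
```

Files in the `small` list can still be submitted with the job, while those in the `large` list are candidates for network references.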
If you have many jobs in the queued and rendering states, many Deadline Workers (roughly 500), and several groups and pools applied to the jobs, the number of “job schedules” loaded in the RCS’s memory increases. A “job schedule” is created for every Worker “type”. As a result, RCS can suffer performance issues or crash. Examples of performance issues are:
- Workers do not respect job limits: they render jobs even though they are not in the limit’s allow list.
- Workers take 30 minutes or more to respect resource limits applied to a job.
As a remedy, we recommend running more than one RCS on the farm (on different hosts) in load balancing mode. More information on how to run multiple RCS in load balancing mode is here.
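To get a rough feel for how job schedules grow with pools and groups, the sketch below models a Worker “type” as the distinct combination of pools and groups a Worker belongs to, and assumes one schedule per active job per type. Both of these are illustrative assumptions, not a description of how RCS computes schedules internally, and the function names are hypothetical.

```python
def count_worker_types(workers):
    """Count distinct (pools, groups) combinations across Workers.

    `workers` is an iterable of (pools, groups) pairs, each an iterable of
    names. Treating this combination as a Worker "type" is an assumption.
    """
    return len({(frozenset(pools), frozenset(groups)) for pools, groups in workers})


def estimate_job_schedules(active_jobs, workers):
    """Rough estimate: one schedule per active job per Worker type (assumed)."""
    return active_jobs * count_worker_types(workers)
```

Under this model, 100 active jobs on a farm whose Workers fall into 2 distinct pool and group combinations would yield an estimate of 200 schedules, which shows why adding pools and groups on a busy farm raises the memory load on a single RCS.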