Missing permissions to use the Spot Event Plugin (SEP) Configuration UI
In the Deadline Monitor, opening the Spot Event Configuration Utility panel under Scripts > Configurations can fail when the AWS Spot Event Plugin user is missing an Identity and Access Management (IAM) permission. The Monitor then shows the following error:
An error occurred (AccessDenied) when calling the ListRoles operation: User: arn:aws:iam:::user/DeadlineSpotEventPluginAdmin is not authorized to perform: iam:ListRoles on resource: arn:aws:iam:::role/
This error means the Spot Event Plugin user lacks an IAM permission that the Spot Event Configuration Utility UI requires. The permissions are already included in the AWS managed IAM policy “AWSThinkboxDeadlineSpotEventPluginAdminPolicy”, which needs to be attached as part of setup. If you use your own IAM policy with the Spot Event Plugin, add the permissions listed below to that policy.
iam:CreateRole (If you don't already have the aws-ec2-spot-fleet-tagging-role)
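If you maintain your own policy, a quick local check of the policy JSON can confirm that the actions mentioned here are granted. The following is a minimal sketch, not an official tool: the function name `missing_actions` and the `REQUIRED_ACTIONS` set are illustrative assumptions, the set contains only the two actions named in this article (your setup may need more), and the matching logic is a simplification of how IAM evaluates policies (it ignores Deny statements, resources, and conditions).

```python
from fnmatch import fnmatchcase

# Assumption: only the two actions named in this article are checked.
REQUIRED_ACTIONS = {"iam:ListRoles", "iam:CreateRole"}


def missing_actions(policy_document, required=REQUIRED_ACTIONS):
    """Return the required actions that no Allow statement grants.

    Simplified check: ignores Deny statements, Resource, and Condition.
    """
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a dict
        statements = [statements]

    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a string
            actions = [actions]
        # IAM action names are case-insensitive and may contain wildcards.
        patterns.extend(action.lower() for action in actions)

    return sorted(
        action
        for action in required
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    )


# Example: a custom policy that grants ListRoles but not CreateRole.
custom_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["iam:ListRoles"], "Resource": "*"}
    ],
}
print(missing_actions(custom_policy))
```

An empty list means every action in the checked set is covered by at least one Allow statement in the document.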
Spot Event Worker picking up the job from the wrong groups and pools
A Spot Event Plugin JSON configuration can contain multiple groups associated with different Amazon Machine Images (AMIs), with each group typically handling a different kind of job (e.g. Maya, Python, Houdini). After a Worker finishes a job and its instance is terminated (by idle shutdown), the private IPv4 address can be reassigned to a different Worker. Because the terminated Worker's name isn't cleared from the Deadline Database, the Worker rejoins the farm under the existing entry and the Spot Event Plugin doesn't assign a new group based on the instance tags.
To solve this, you can enable "Delete SEP terminated Workers" so that Workers are cleared from the Database once their instance is terminated. Note that this also deletes the reports for the terminated Worker. A second workaround is to increase the IPv4 CIDR block range on the Worker’s VPC, which makes it less likely that a recently released address is handed out again.
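To get a feel for how much the CIDR range matters, the sketch below compares the number of assignable addresses in two subnet sizes. The CIDR values are hypothetical examples, and the helper name `usable_addresses` is an assumption for illustration. The figure of five reserved addresses per subnet reflects how AWS reserves the first four addresses and the last address in each subnet.

```python
import ipaddress

# AWS reserves the first four addresses and the last address in every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr):
    """Return how many private IPv4 addresses a subnet can hand out."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


# Hypothetical subnets: a small /24 versus a larger /20.
for cidr in ("10.0.0.0/24", "10.0.0.0/20"):
    print(cidr, usable_addresses(cidr))
```

With a larger pool of free addresses, a new instance is less likely to receive the address that a just-terminated Worker was using.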
Spot Event Plugin: Workers Self-Terminate after Completing a Single Task
The Spot Event Plugin assumes all Workers are operating in UTC. It uses the time reported by the Worker to calculate how many minutes have passed since the Worker last performed work, and uses that value as the idle time. Any time-zone difference will cause the calculated idle time to be higher than the actual idle time, so the Worker can be shut down as idle right after finishing a task. Make sure that your AMI uses the UTC time zone.
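The effect can be illustrated with a small model. This is a sketch under stated assumptions, not the plugin's actual code: it assumes the Worker reports a timestamp without time-zone information and that the timestamp is then interpreted as UTC. The function name `idle_minutes_assuming_utc`, the dates, and the UTC-8 offset are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def idle_minutes_assuming_utc(last_work_naive, now_utc):
    """Compute idle minutes while treating a naive timestamp as UTC."""
    last_work_utc = last_work_naive.replace(tzinfo=timezone.utc)
    return (now_utc - last_work_utc).total_seconds() / 60


now_utc = datetime(2024, 1, 1, 20, 0, tzinfo=timezone.utc)
finished_utc = now_utc - timedelta(minutes=5)  # task finished 5 minutes ago

# Worker clock set to UTC: the reported wall-clock time is 19:55.
reported_by_utc_worker = finished_utc.replace(tzinfo=None)

# Worker clock set to UTC-8: the reported wall-clock time is 11:55.
utc_minus_8 = timezone(timedelta(hours=-8))
reported_by_local_worker = finished_utc.astimezone(utc_minus_8).replace(tzinfo=None)

print(idle_minutes_assuming_utc(reported_by_utc_worker, now_utc))    # 5.0
print(idle_minutes_assuming_utc(reported_by_local_worker, now_utc))  # 485.0
```

In this model the Worker on UTC-8 appears to have been idle for over eight hours even though it finished work five minutes ago, which would exceed any typical idle shutdown threshold.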
Why did my Spot Fleet get terminated?
A Spot Fleet can get terminated in Deadline and enter an unhealthy state in the Deadline Monitor. The Resource Tracker service monitors the heartbeats of compute resources running on AWS and terminates them if they don't report back, which saves you from overspending on idle resources.
We have a separate article to explain this and help you Troubleshoot Resource Tracker.
If you are unable to solve the issues listed here or run into a different issue with the Spot Event Plugin, please reach out to us by opening a ticket, or call us if it is urgent. Our contact information is here. Don't forget to share the Remote Connection Server logs from the application logs folder of Deadline to help us investigate.