For CSC 570: AI and Games, I decided to continue working on the Catan project I had started in CSC 480. My roommates Aiden and Jacob were in the class with me, and we got two other partners who wanted to work on Catan as well, so we had 5 people total.
By the end of 480, we had a pretty solid Catan implementation and some heuristic agents to play the game. For 570, the goal was to add all the missing Catan features, add reinforcement learning agents, and add human players. We did all that, and described it in more detail in the paper you can download from this page. We have a pretty interesting multimodal neural network architecture that combines a CNN with an MLP to process the game state and output an action for the RL agent to take.
We used a DQN policy for our RL agents, with the heuristic agent acting as the "exploration" portion of the agent's epsilon-greedy strategy. By the end of 570, we thought the DQN was performing as well as the heuristic agents, and we said as much in our paper. We did not know at the time that this was not true: the wins it was getting came from falling back on the heuristic agent during epsilon-greedy exploration. The agent was not actually learning anything.
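To illustrate how this kind of fallback can hide a non-learning agent, here is a minimal sketch in Python. It is not the project's actual code: `select_action`, `source_fractions`, the toy Q-values, and the stand-in heuristic are all hypothetical names and values chosen for illustration. It assumes an epsilon-greedy policy where the "explore" branch calls a heuristic instead of sampling a random action, and it tracks which branch produced each move.

```python
import random
from typing import Callable, Dict, Sequence, Tuple


def select_action(
    q_values: Sequence[float],
    legal_actions: Sequence[int],
    heuristic_action: Callable[[], int],
    epsilon: float,
    rng: random.Random,
) -> Tuple[int, str]:
    """Pick an action and report where it came from.

    With probability epsilon the (hypothetical) heuristic chooses;
    otherwise the legal action with the highest Q-value is taken.
    """
    if rng.random() < epsilon:
        return heuristic_action(), "heuristic"
    best = max(legal_actions, key=lambda a: q_values[a])
    return best, "dqn"


def source_fractions(epsilon: float, steps: int = 10_000, seed: int = 0) -> Dict[str, float]:
    """Fraction of moves chosen by each source for a given epsilon."""
    rng = random.Random(seed)
    q_values = [0.1, 0.7, 0.3, 0.9]  # toy values; action 3 is illegal below
    legal_actions = [0, 1, 2]
    counts = {"heuristic": 0, "dqn": 0}
    for _ in range(steps):
        # The lambda stands in for a real heuristic agent (assumption).
        _, source = select_action(q_values, legal_actions, lambda: 0, epsilon, rng)
        counts[source] += 1
    return {name: n / steps for name, n in counts.items()}


if __name__ == "__main__":
    print(source_fractions(0.5))  # roughly half the moves come from the heuristic
    print(source_fractions(0.0))  # every move comes from the Q-network
```

Evaluating with `epsilon` set to 0, or logging the share of moves from each source as above, is one way to check that a win rate reflects the learned policy rather than the heuristic.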
With our false belief that our agents were working reasonably well, four of us continued working on the project in spring quarter as part of CSC 500: Directed Study. One of our partners attempted to recreate our model in another Catan implementation we found online called Catanatron, while my roommates and I cleaned up our own codebase and tried replacing the CNN of our model with a GNN, which we thought might perform better since our Catan representation was already a graph.
Around week 9 or 10 of spring quarter, we realized that our agents hadn't been learning anything, and we tried to fix them before the end of the quarter, to no avail. We did fix a lot of bugs and substantially improved our codebase, but we have yet to get the agents to perform well against the heuristic agents or even against agents that choose random actions. We are planning to continue working on it.