Q Learning Algorithm for Tic Tac Toe
Introduction
Q-learning is a reinforcement learning algorithm used to train agents to make optimal decisions in various environments. This article will explore how Q-learning can be applied to the classic game of Tic Tac Toe. We will discuss the implementation details, the Q-table, and the learning process.
Understanding the Game
Tic Tac Toe is a two-player game where the objective is to get three of your symbols (X or O) in a row, column, or diagonal. The game is simple but provides a suitable environment for applying reinforcement learning techniques.
The Q-Table
The core of Q-learning is the Q-table, which stores the expected future rewards for taking specific actions in different states. In Tic Tac Toe:
- **States:** Each possible board configuration is a state.
- **Actions:** The possible moves (placing an X or O) in each state.
- **Q-Values:** The table contains Q-values for each (state, action) pair, representing the estimated reward for taking that action in that state.
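As a concrete sketch (one possible design, not the only one), the table can be a dictionary keyed by (state, action) pairs, with the board stored as a 9-cell tuple so it is hashable. The names `q_table` and `available_actions` below are illustrative, not fixed by the article:

```python
from collections import defaultdict

# Q-table: maps a (state, action) pair to its estimated future reward.
# The state is the board as a tuple of 9 cells (' ', 'X', or 'O');
# an action is the index (0-8) of the cell the agent plays in.
q_table = defaultdict(float)   # unseen pairs default to a Q-value of 0.0

def available_actions(state):
    """Legal moves are the indices of the empty cells."""
    return [i for i, cell in enumerate(state) if cell == ' ']

# An empty board: nine legal moves, all still at the initial Q-value of 0.0.
start = tuple(' ' * 9)
print(available_actions(start))   # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(q_table[(start, 4)])        # 0.0
```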
Implementation Steps
- Initialize Q-table: Start with a table filled with zeros or random values.
- Play Games: Have the agent play multiple games against a random opponent or another agent.
- Update Q-values: After each move, update the Q-value for the chosen action based on the following formula:

  Q(s, a) = Q(s, a) + α [r + γ max(Q(s', a')) - Q(s, a)]

  Where:
  - s: Current state (board configuration)
  - a: Action taken (move)
  - r: Reward for taking the action (win: +1, lose: -1, draw: 0)
  - s': Next state after taking the action
  - a': Optimal action in the next state
  - α: Learning rate (controls how much to update Q-values)
  - γ: Discount factor (controls how much future rewards are valued)
- Choose Actions: The agent selects actions using an epsilon-greedy strategy:
  - With probability ε, choose a random action (exploration)
  - With probability 1-ε, choose the action with the highest Q-value for the current state (exploitation)
- Train the Agent: Repeat steps 2-4 for many games. As the agent learns, its Q-values improve, leading to better decision-making. A Python sketch of the full training loop follows this list.
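Putting the steps above together, one possible Python sketch is shown below. It is a minimal illustration rather than a reference implementation: the learner plays X against a purely random opponent, the helper names (`choose_action`, `update`, `play_episode`) are made up for this example, and the hyperparameter values are common defaults. It redefines the `q_table` and `available_actions` helpers from the earlier snippet so it can run on its own.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1      # learning rate, discount factor, exploration rate

q_table = defaultdict(float)               # (state, action) -> Q-value

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def available_actions(state):
    return [i for i, cell in enumerate(state) if cell == ' ']

def winner(state):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if state[a] != ' ' and state[a] == state[b] == state[c]:
            return state[a]
    return None

def choose_action(state, epsilon=EPSILON):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    actions = available_actions(state)
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state, terminal=False):
    """Q(s, a) += alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]."""
    future = 0.0 if terminal else max(
        (q_table[(next_state, a)] for a in available_actions(next_state)), default=0.0)
    q_table[(state, action)] += ALPHA * (reward + GAMMA * future - q_table[(state, action)])

def play_episode(epsilon=EPSILON):
    """One game: the learner plays X, a random opponent plays O. Returns 'X', 'O', or 'draw'."""
    state = tuple(' ' * 9)
    while True:
        action = choose_action(state, epsilon)
        board = list(state)
        board[action] = 'X'
        next_state = tuple(board)

        if winner(next_state) == 'X':                      # agent wins
            update(state, action, 1.0, next_state, terminal=True)
            return 'X'
        if not available_actions(next_state):              # board full: draw
            update(state, action, 0.0, next_state, terminal=True)
            return 'draw'

        # The random opponent replies; the agent's "next state" is the board after that reply.
        board = list(next_state)
        board[random.choice(available_actions(next_state))] = 'O'
        next_state = tuple(board)

        if winner(next_state) == 'O':                      # agent loses
            update(state, action, -1.0, next_state, terminal=True)
            return 'O'
        if not available_actions(next_state):              # board full: draw
            update(state, action, 0.0, next_state, terminal=True)
            return 'draw'

        update(state, action, 0.0, next_state)             # ordinary, non-terminal step
        state = next_state

# Train over many self-contained games.
for episode in range(50_000):
    play_episode()
```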
Example Q-Table (Partial)
Boards are written row by row, with `.` marking an empty cell.

| State (Board) | Action (Move) | Q-Value |
|---|---|---|
| `X . . / . . . / . O .` | Place X at (1, 1) | 0.75 |
| `X . . / . . . / . O .` | Place X at (1, 2) | -0.25 |
| `. O . / X . . / . . .` | Place O at (1, 1) | 0.80 |
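Reading such a table in code amounts to the exploitation half of the epsilon-greedy rule: take the legal move with the highest Q-value for the current board. A tiny sketch, reusing the `q_table` and `available_actions` helpers assumed above:

```python
def best_move(state):
    """Greedy play from a trained table: pick the legal move with the highest Q-value."""
    return max(available_actions(state), key=lambda a: q_table[(state, a)])

# Example: X in the top-left corner, O in the bottom row, X to move.
board = ('X', ' ', ' ',
         ' ', ' ', ' ',
         ' ', 'O', ' ')
print(best_move(board))   # whichever empty cell currently has the highest Q-value
```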
Output
With enough training, the agent learns a strong policy for Tic Tac Toe, consistently winning or drawing against a random opponent.
Game 1: X wins!
Game 2: Draw!
Game 3: O wins!
...
Game 100: X wins!
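One simple way to produce output like this is to replay a block of games after training and report each result. The snippet below assumes the `play_episode` helper from the training sketch, called with exploration switched off (it still updates the table while playing, which is harmless here):

```python
# Replay 100 games greedily (epsilon = 0) and report each result.
for game in range(1, 101):
    result = play_episode(epsilon=0.0)     # 'X', 'O', or 'draw'
    print(f"Game {game}: {'Draw' if result == 'draw' else result + ' wins'}!")
```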
Conclusion
Q-learning provides a powerful framework for teaching an agent to play Tic Tac Toe effectively. By training on many games and updating its Q-values, the agent learns to make optimal moves, maximizing its chances of winning. This approach can be adapted to other games and complex environments, demonstrating the versatility of reinforcement learning.