Q Learning Algorithm for Tic Tac Toe

Introduction

Q-learning is a reinforcement learning algorithm used to train agents to make optimal decisions in various environments. This article will explore how Q-learning can be applied to the classic game of Tic Tac Toe. We will discuss the implementation details, the Q-table, and the learning process.

Understanding the Game

Tic Tac Toe is a two-player game where the objective is to get three of your symbols (X or O) in a row, column, or diagonal. The game is simple but provides a suitable environment for applying reinforcement learning techniques.
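
To make the later sections concrete, here is a minimal Python sketch of the game itself. The board representation (a tuple of nine cells) and the helper names check_winner and available_moves are illustrative choices, not part of any standard library:

```python
# Minimal Tic Tac Toe helpers (illustrative sketch).
# The board is a tuple of nine cells ('X', 'O', or ' ' for empty),
# read left to right, top to bottom.

WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def check_winner(board):
    """Return 'X' or 'O' for a three-in-a-row, 'draw' for a full board,
    or None while the game is still in progress."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return 'draw' if ' ' not in board else None

def available_moves(board):
    """Indices of empty cells, i.e. the legal actions in this state."""
    return [i for i, cell in enumerate(board) if cell == ' ']

EMPTY_BOARD = (' ',) * 9
```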

The Q-Table

The core of Q-learning is the Q-table, which stores the expected future rewards for taking specific actions in different states. In Tic Tac Toe:

  • **States:** Each possible board configuration is a state.
  • **Actions:** The possible moves (placing an X or O) in each state.
  • **Q-Values:** The table contains Q-values for each (state, action) pair, representing the estimated reward for taking that action in that state (see the dictionary sketch after this list).
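
One simple way to hold these values, assuming the board tuples from the sketch above, is a dictionary keyed by (state, action) pairs that defaults to 0.0 for pairs the agent has not seen yet; the name q_table is just an illustrative choice:

```python
from collections import defaultdict

# Q-table sketch: maps (state, action) -> Q-value, defaulting to 0.0.
# A state is a board tuple; an action is the index of the cell to play.
q_table = defaultdict(float)

# Example lookup: Q-value for playing the centre cell (index 4)
# on an empty board. Unseen pairs simply return 0.0.
print(q_table[(EMPTY_BOARD, 4)])  # -> 0.0
```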

Implementation Steps

  1. Initialize Q-table: Start with a table of zeros (or small random values) for every (state, action) pair.
  2. Play Games: Have the agent play many games against a random opponent or another learning agent.
  3. Update Q-values: After each move, update the Q-value for the chosen action with the Q-learning update rule (implemented in the sketch after this list):

    Q(s, a) = Q(s, a) + α [r + γ max(Q(s', a')) - Q(s, a)]

    Where:

    • s: Current state (board configuration)
    • a: Action taken (move)
    • r: Reward for taking the action (win: +1, lose: -1, draw: 0)
    • s': Next state after taking the action
    • a': The action with the highest Q-value in the next state
    • α: Learning rate (controls how strongly each update changes the Q-value)
    • γ: Discount factor (controls how much future rewards are valued)
  4. Choose Actions: The agent selects actions using an epsilon-greedy strategy (also shown in the sketch below):
    • With probability ε, choose a random action (exploration)
    • With probability 1-ε, choose the action with the highest Q-value for the current state (exploitation)
  5. Train the Agent: Repeat steps 2-4 for many games. As the agent learns, its Q-values converge and its decisions improve.
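
Below is a sketch of the update rule from step 3 and the epsilon-greedy selection from step 4, building on the q_table dictionary and board helpers sketched earlier. The constants ALPHA, GAMMA, and EPSILON are illustrative values, not prescribed by the algorithm. As a worked example, with α = 0.1, γ = 0.9, r = 0, a current Q(s, a) of 0, and a best next-state value of 0.5, the update gives 0 + 0.1 × (0 + 0.9 × 0.5 - 0) = 0.045.

```python
import random

ALPHA = 0.1    # learning rate (alpha); illustrative value
GAMMA = 0.9    # discount factor (gamma); illustrative value
EPSILON = 0.1  # exploration rate (epsilon); illustrative value

def choose_action(board, moves, epsilon=EPSILON):
    """Epsilon-greedy selection: explore with probability epsilon,
    otherwise pick the legal move with the highest Q-value."""
    if random.random() < epsilon:
        return random.choice(moves)
    return max(moves, key=lambda a: q_table[(board, a)])

def update_q(state, action, reward, next_state, next_moves):
    """Apply Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    next_moves is empty in a terminal state, so the max term falls to 0."""
    best_next = max((q_table[(next_state, a)] for a in next_moves), default=0.0)
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                         - q_table[(state, action)])
```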

Example Q-Table (Partial)

State (Board)      Action (Move)        Q-Value
| X |   |          Place X at (1, 1)      0.75
|   |   |
|   | O |

| X |   |          Place X at (1, 2)     -0.25
|   |   |
|   | O |

|   | O |          Place O at (1, 1)      0.80
| X |   |
|   |   |

Output

The trained agent will learn to play Tic Tac Toe optimally, consistently winning or drawing against a random opponent.
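
One way to produce game-by-game results like the sample below is a short evaluation loop. This sketch assumes the board helpers and the trained q_table from the earlier sketches, and plays the greedy agent (ε = 0) as X against a random opponent; it is only an illustration of how such output might be generated:

```python
def play_one_game():
    """Greedy (epsilon=0) agent plays X; the opponent plays O at random."""
    board, player = EMPTY_BOARD, 'X'
    while True:
        moves = available_moves(board)
        move = (choose_action(board, moves, epsilon=0.0) if player == 'X'
                else random.choice(moves))
        board = board[:move] + (player,) + board[move + 1:]
        result = check_winner(board)
        if result is not None:
            return result
        player = 'O' if player == 'X' else 'X'

for game in range(1, 101):
    result = play_one_game()
    print(f"Game {game}: {'Draw!' if result == 'draw' else result + ' wins!'}")
```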

Game 1: X wins!
Game 2: Draw!
Game 3: O wins!
...
Game 100: X wins!

Conclusion

Q-learning provides a powerful framework for teaching an agent to play Tic Tac Toe effectively. By training on many games and updating its Q-values, the agent learns to make optimal moves, maximizing its chances of winning. This approach can be adapted to other games and complex environments, demonstrating the versatility of reinforcement learning.
