Definition #
AI systems learning optimal actions through environmental feedback.
Key Characteristics #
- Reward/punishment system
- Exploration vs exploitation
- Markov Decision Processes
Why It Matters #
Used to train game AIs (e.g., AlphaGo) and robotics.
Examples #
- Q-Learning
- Deep Q-Networks (DQN)