The Magic of AI: Understanding AlphaGo's Reinforcement Learning and the Reward System
I still remember the chilly spring of 2016. Like millions of others, I sat glued to my screen, watching a black-and-white board game that is over 2,500 years old. The matchup was extraordinary: Lee Sedol, the 18-time world champion of Go, against AlphaGo, a machine built by DeepMind.
At the time, many experts believed a computer would not master Go for another decade. But when AlphaGo played "Move 37", a move so unorthodox that observers called it a mistake, I realized we were witnessing a new form of intelligence. This was a system that had "learned" how to win through Reinforcement Learning (RL).
Table of Contents
1. What Is Reinforcement Learning? (The Human Perspective)
2. The Core of the Genius: The Reward System Explained
3. The Architecture of Victory: Policy Networks vs. Value Networks
4. AlphaGo Zero: The Power of Tabula Rasa (Blank Slate)
5. Personal Reflections: What AI Teaches Us About Growth
6. Conclusion: Navigating an AI-Powered Future
1. What Is Reinforcement Learning? (The Human Perspective)
To understand Reinforcement Learning, think about how you learned to ride a bike. You sat on the seat, pedaled, fell over, felt the pain (Negative Reward), and tried again. When you finally balanced, the rush of success (Positive Reward) told your brain: "Whatever you just did, do it again."
In AI terms, RL is a computational approach to learning through interaction. It has four core pieces:
The Agent: The "pupil" (AlphaGo).
The Environment: The "world" it lives in (The Go board).
The Action: The moves it makes.
The Reward: The feedback it gets.
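To make the agent, environment, action, and reward loop concrete, here is a minimal sketch in Python. It swaps the Go board for a toy "multi-armed bandit" (an assumption chosen purely for illustration; AlphaGo does not work this way): three hypothetical actions pay a reward of 1 with hidden probabilities, and an epsilon-greedy agent learns, from rewards alone, to repeat the action that pays best. The names PAYOUT_PROBS, environment_step, and run_agent are all hypothetical.

```python
import random

# Hypothetical toy environment: three actions, each paying a reward of 1
# with a fixed probability that the agent never sees directly.
PAYOUT_PROBS = [0.2, 0.5, 0.8]


def environment_step(action: int, rng: random.Random) -> float:
    """The Environment: returns a reward of 1.0 or 0.0 for the chosen action."""
    return 1.0 if rng.random() < PAYOUT_PROBS[action] else 0.0


def run_agent(steps: int = 5000, epsilon: float = 0.1, seed: int = 0) -> list[float]:
    """The Agent: an epsilon-greedy learner keeping a running average reward per action."""
    rng = random.Random(seed)
    estimates = [0.0] * len(PAYOUT_PROBS)  # estimated reward of each action
    counts = [0] * len(PAYOUT_PROBS)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action ("fall off the bike").
            action = rng.randrange(len(PAYOUT_PROBS))
        else:
            # Exploit: repeat whatever has paid best so far ("do it again").
            action = max(range(len(estimates)), key=lambda a: estimates[a])
        reward = environment_step(action, rng)
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


estimates = run_agent()
print([round(e, 2) for e in estimates])
```

The small epsilon keeps the agent exploring; without it, the agent could lock onto whichever action happened to pay first.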
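2. The Core of the Genius: The Reward System Explained
For AlphaGo, the reward system was simple and ruthless. It focused on a single quantity: the probability of winning. The reward arrived only at the end of the game, +1 for a win and -1 for a loss, with nothing in between.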
The Mathematics of "The Win"
In reinforcement learning, the goal of the agent is to maximize the cumulative discounted reward. The return from time step $t$ is the following sum, and the agent tries to maximize its expected value:
$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$
In this equation:
$R_{t+1}$ represents the immediate reward.
$\gamma$ (gamma) is the discount factor (generally between 0 and 1).
Why use $\gamma$? It lets the agent weigh immediate rewards against future ones: the closer $\gamma$ is to 1, the more the long-term goal counts. That long-term view is why AlphaGo would willingly sacrifice its own stones, a "short-term loss," whenever its calculations showed that the move raised its chance of the final win by even 0.1%.
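The return $G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots$ can be computed by working backwards through an episode, using $G_t = R_{t+1} + \gamma G_{t+1}$. The sketch below assumes a Go-style episode in which every move earns 0 and only the final move earns +1 for a win; the 10-move episode length and the function name discounted_return are made up for illustration.

```python
def discounted_return(rewards: list[float], gamma: float) -> float:
    """Compute G_t = sum_k gamma**k * R_{t+k+1} for the rewards that follow time t."""
    g = 0.0
    # Work backwards so each step reuses the later return: G_t = R_{t+1} + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical Go-style episode: no reward until the final move, which wins (+1).
go_episode = [0.0] * 9 + [1.0]
print(discounted_return(go_episode, gamma=1.0))  # 1.0
print(discounted_return(go_episode, gamma=0.9))  # 0.9**9, about 0.387
```

With $\gamma = 1$ the final win counts fully from the very first move; with $\gamma = 0.9$ the same win is worth less the further away it is.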
3. The Architecture of Victory: Policy Networks vs. Value Networks
AlphaGo's "intuition" comes from two distinct neural networks working in harmony:
1. The Policy Network (The "Move" Chooser): It assigns a probability to each of the up to 361 possible moves, so the search can concentrate on a handful of the most promising candidates instead of wasting time calculating useless moves.
2. The Value Network (The "Judge"): It predicts the winner without playing to the end, assigning a score between -1 (Certain Loss) and 1 (Certain Win).
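The two network outputs described above can be sketched as plain functions. This is a minimal illustration under stated assumptions: the raw scores ("logits") are invented numbers standing in for a real network's output, the policy side uses a softmax to turn them into move probabilities, and the value side assumes a tanh squashing step to keep the score between -1 and 1. The names top_candidates and value_estimate are hypothetical.

```python
import math

BOARD_POINTS = 19 * 19  # 361 intersections on a full-size Go board


def softmax(logits: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1 (max-shifted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_candidates(policy_logits: list[float], k: int = 5) -> list[tuple[int, float]]:
    """Policy side: the k most probable moves as (board_point_index, probability)."""
    probs = softmax(policy_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


def value_estimate(raw_score: float) -> float:
    """Value side: squash a raw score into [-1, 1] (-1 = certain loss, 1 = certain win)."""
    return math.tanh(raw_score)


# Made-up policy output: two points look clearly better than the rest.
logits = [0.0] * BOARD_POINTS
logits[72] = 3.0
logits[288] = 2.5
print([i for i, _ in top_candidates(logits, k=2)])  # [72, 288]
print(value_estimate(0.0))  # 0.0 (an even position)
```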
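AlphaGo combines these two networks inside Monte Carlo Tree Search (MCTS): the policy network suggests which moves are worth exploring, and the value network judges the positions the search reaches. It's a perfect marriage of "gut feeling" (Policy), positional judgment (Value), and "deep calculation" (the search itself).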
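A single selection step of the tree search can be sketched with a PUCT-style rule, similar in spirit to the one described in the AlphaGo papers: pick the move that maximizes Q + U, where Q is the average value-network score seen so far and U is an exploration bonus proportional to the policy network's prior. This is a simplified sketch, not DeepMind's implementation: the constant c_puct = 1.5, the "+1" inside the square root, and the names Child, select_move, and backup are assumptions, and a full MCTS would also expand new nodes, query the networks, and alternate perspectives between players.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float            # P(s, a): the policy network's probability for this move
    visits: int = 0         # N(s, a): how often the search has tried this move
    value_sum: float = 0.0  # sum of value-network scores backed up through this move

    @property
    def q(self) -> float:
        """Average value Q(s, a); 0 for moves the search has not explored yet."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(children: dict[str, Child], c_puct: float = 1.5) -> str:
    """Pick the move maximizing Q + U; U favors high-prior, rarely visited moves."""
    total_visits = sum(ch.visits for ch in children.values())

    def score(ch: Child) -> float:
        # "+1" (an assumed tweak) lets the priors matter before any visits.
        u = c_puct * ch.prior * math.sqrt(total_visits + 1) / (1 + ch.visits)
        return ch.q + u

    return max(children, key=lambda move: score(children[move]))


def backup(child: Child, value: float) -> None:
    """Record one value-network score in [-1, 1] for the chosen move."""
    child.visits += 1
    child.value_sum += value


# Hypothetical position with three candidate moves from the policy network.
children = {"A": Child(prior=0.6), "B": Child(prior=0.3), "C": Child(prior=0.1)}
print(select_move(children))  # A: the policy's favorite is tried first
for _ in range(3):
    backup(children["A"], -1.0)  # the value network keeps judging A as losing
print(select_move(children))  # B: the search shifts to the next candidate
```

The design point is the balance: the prior ("gut feeling") decides where to look first, while the accumulated value scores ("judgment") can overrule it.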
4. AlphaGo Zero: The Power of Tabula Rasa (Blank Slate)
AlphaGo Zero started with zero human data. It knew only the rules of the game and learned by playing against itself millions of times.
At first, it played randomly. But through the reinforcement learning loop, it discovered "Joseki" (standard corner sequences) that humans had spent thousands of years refining. Then it did something frightening: it abandoned human Joseki and created its own. It suggested that human knowledge might be a "local optimum": a small hill in a world of much higher mountains.
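The "local optimum" idea can be illustrated with greedy hill climbing on a one-dimensional landscape. This is only an analogy, not how AlphaGo Zero searches: the landscape values are made up, and the "restart from every point" step is a simple stand-in for exploration. A climber that only ever steps uphill from the left edge stops on the small hill and never sees the mountain.

```python
def hill_climb(heights: list[int], start: int) -> int:
    """Greedy local search: keep stepping to the higher neighbor until none is higher."""
    pos = start
    while True:
        neighbors = [p for p in (pos - 1, pos + 1) if 0 <= p < len(heights)]
        best = max(neighbors, key=lambda p: heights[p])
        if heights[best] <= heights[pos]:
            return pos  # no neighbor is higher: a (possibly only local) optimum
        pos = best


# Hypothetical landscape: a small hill at index 2, a taller mountain at index 8.
landscape = [0, 1, 3, 2, 1, 2, 4, 6, 9, 7]

stuck = hill_climb(landscape, start=0)
# Exploration stand-in: restart the climb from every point and keep the highest peak.
best = max((hill_climb(landscape, s) for s in range(len(landscape))),
           key=lambda p: landscape[p])
print(stuck, landscape[stuck])  # 2 3 (the small hill)
print(best, landscape[best])    # 8 9 (the mountain)
```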
5. Personal Reflections: What AI Teaches Us About Growth
1. Iterative Failure Is Essential: AlphaGo lost millions of games to itself. In RL, a "loss" is just a data point that refines the agent's estimate of which moves lead to victory.
2. The Danger of "Expert" Bias: We sometimes limit ourselves by following "the way things have always been done." True innovation happens when we find our own "rewards."
3. Effectiveness over Pride: AlphaGo doesn't care whether it wins by 0.5 points or 50 points. It chooses the path of least risk: the move with the highest probability of winning.
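Why margin doesn't matter follows directly from the reward: with +1 for any win and -1 for any loss, a move's expected reward is p(+1) + (1 - p)(-1) = 2p - 1, which depends only on the win probability p. The sketch below compares two hypothetical candidate moves; the names and numbers are invented for illustration.

```python
# Hypothetical candidates: (name, estimated win probability, expected margin in points)
candidates = [
    ("safe move", 0.92, 1.5),
    ("greedy move", 0.80, 25.0),
]


def expected_reward(win_prob: float) -> float:
    """Expected reward under a +1 (win) / -1 (loss) terminal reward: 2p - 1."""
    return win_prob * 1.0 + (1.0 - win_prob) * -1.0


by_reward = max(candidates, key=lambda c: expected_reward(c[1]))
by_margin = max(candidates, key=lambda c: c[2])
print(by_reward[0])  # safe move
print(by_margin[0])  # greedy move
```

A margin-maximizing player would gamble on the greedy move; a win-probability maximizer takes the narrow but safer win.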
6. Conclusion: Navigating an AI-Powered Future
Reinforcement learning is no longer just about board games. It's optimizing energy grids, discovering new protein structures for medicine, and teaching robots how to walk.
Understanding the "reward system" of AI is pivotal because, ultimately, we are the ones who define those rewards. As we integrate AI into society, we must ensure that the rewards we set align with human values: safety, ethics, and sustainability.