Why Predicting the Future Is Easier Than Explaining the Past (for AI)
Humans have an "explanation-first" bias. If we see a car crash, we immediately look for the cause: a distracted driver, a slick road, or a mechanical failure. Artificial Intelligence, however, is "correlation-first." An AI can predict with $99\%$ accuracy that a crash is about to happen based on sensor data, yet it cannot tell you why it happened afterward.
This isn't a glitch; it is a direct result of how we architect neural networks.
1. Prediction Is Just High-Dimensional Curve Fitting
When we say an AI "predicts the future," we are usually talking about Next-Token Prediction (in LLMs) or Time-Series Forecasting.
Take a Large Language Model (LLM) like GPT-4. It is trained by minimizing a loss function called cross-entropy loss. Its only job is to shrink the gap between its predicted probability distribution over the vocabulary and the word that actually comes next in the sequence.
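Written out, the per-token loss is the standard cross-entropy between the one-hot target and the model's prediction (a textbook formulation, not anything specific to GPT-4):

$$\mathcal{L} = -\sum_{i=1}^{V} y_i \log \hat{p}_i,$$

where $V$ is the vocabulary size, $y_i$ is 1 for the word that actually appeared and 0 otherwise, and $\hat{p}_i$ is the probability the model assigned to word $i$. Nothing in this objective rewards understanding; it only rewards matching the observed distribution.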
The Concrete Example: If a model sees the sequence "The Federal Reserve raised interest rates, so the stock market...", it predicts "fell" because that word statistically follows that sequence in its training data.
The Vague Trap: The model doesn't need to understand macroeconomics to get the answer right. It only needs to have learned how strongly those words tend to appear together. It is fitting a curve to a cloud of data points.
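To see how far pure co-occurrence can go, here is a deliberately tiny sketch: a "model" that only counts which word follows which. The corpus and code are invented for illustration, and real LLMs use neural networks rather than lookup tables, but the objective is the same statistical one, and the toy version still completes the sentence with "fell."

```python
from collections import Counter, defaultdict

# Toy corpus standing in for training data (hypothetical sentences).
corpus = [
    "the federal reserve raised interest rates so the stock market fell",
    "the central bank raised rates and the stock market fell",
    "the stock market rose after the rate cut",
]

# Count which word follows each word (a crude stand-in for learned weights).
next_word_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        next_word_counts[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most frequent follower of `word`."""
    counts = next_word_counts[word]
    return counts.most_common(1)[0][0] if counts else None

# "market" is followed by "fell" more often than "rose" in this corpus.
print(predict_next("market"))  # -> "fell"
```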
2. Explanation Requires the "Ladder of Causation"
To explain the past, a system must move beyond observation to intervention. Computer scientist Judea Pearl describes this as the "Ladder of Causation."
Rung 1 (Association): "If I see X, how likely is Y?" (AI is great at this).
Rung 2 (Intervention): "What if I do X?"
Rung 3 (Counterfactuals): "What if I had acted differently?"
Modern AI is stuck on Rung 1. To explain why a medical AI recommended a specific treatment, it would need to simulate a counterfactual: "Would the patient have recovered if I had not given this drug?" Since the AI only sees the data of what did happen, it cannot mathematically calculate what didn't happen without a predefined causal model.
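A minimal sketch of the gap between Rung 1 and Rung 2, using a made-up structural causal model in Python (the numbers and the "sicker patients get the drug" assumption are invented for illustration): the observed recovery rate among treated patients is dragged down by the hidden severity variable, while the interventional rate, the quantity Rung 2 asks about, can only be computed because the simulator lets us rerun the mechanism with the treatment forced on.

```python
import random

random.seed(0)

# Hypothetical structural causal model: sicker patients are more likely to get
# the drug (confounding), and the drug itself modestly helps recovery.
def simulate(do_drug=None):
    severity = random.random()                                    # hidden confounder
    drug = do_drug if do_drug is not None else (severity > 0.5)   # doctors treat the sick
    p_recover = 0.8 - 0.5 * severity + (0.2 if drug else 0.0)
    return drug, random.random() < p_recover

# Rung 1 (association): P(recovery | drug observed), confounded by severity.
obs = [simulate() for _ in range(100_000)]
p_obs = sum(r for d, r in obs if d) / sum(1 for d, r in obs if d)

# Rung 2 (intervention): P(recovery | do(drug)), severity no longer decides who is treated.
intv = [simulate(do_drug=True) for _ in range(100_000)]
p_do = sum(r for _, r in intv) / len(intv)

print(f"P(recover | drug seen) = {p_obs:.2f}")  # looks worse: treated patients were sicker
print(f"P(recover | do(drug))  = {p_do:.2f}")   # higher: the drug actually helps
```

An AI that only ever sees the observational data gets the first number; answering the second requires a causal model it can intervene on.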
3. The Math of "Identifiability"
You may remember from algebra that the single equation x + y = 10 does not let you "identify" the values of x and y; infinitely many pairs satisfy it, so you need more information. AI faces the same issue, known as the Identifiability Problem.
In a dataset, a "High Grade" Z might be correlated with "Early Study Sessions" X and "High Intelligence" Y.
Prediction: If I see X, I predict Z. (This works as long as new data looks like the training data.)
Explanation: Did X cause Z, or did Y cause both X and Z?
The data looks identical in both scenarios. A standard neural network will assign a high feature importance weight to X, but it cannot tell you if X is a "reason" or just a "symptom."
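The trap can be made concrete with a small simulation (variable names and noise levels are invented for illustration). In world A, early studying X directly causes the high grade Z; in world B, hidden intelligence Y drives both X and Z, and studying does nothing. The two worlds are built so that the observed (X, Z) data has the same distribution, so no predictor trained on that data alone can tell the causal stories apart.

```python
import random

random.seed(1)
N = 100_000

def corr(xs, zs):
    """Plain Pearson correlation, no external libraries."""
    mx, mz = sum(xs) / N, sum(zs) / N
    cov = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / N
    vx = sum((x - mx) ** 2 for x in xs) / N
    vz = sum((z - mz) ** 2 for z in zs) / N
    return cov / (vx * vz) ** 0.5

# World A: early studying X directly causes the high grade Z.
xa = [random.gauss(0, 1) for _ in range(N)]
za = [x + random.gauss(0, 1) for x in xa]

# World B: hidden intelligence Y causes both X and Z; X itself does nothing.
yb = [random.gauss(0, 0.5 ** 0.5) for _ in range(N)]
xb = [y + random.gauss(0, 0.5 ** 0.5) for y in yb]
zb = [2 * y for y in yb]

# Both worlds produce the same joint Gaussian over (X, Z),
# so a purely predictive model cannot tell them apart.
print(f"corr(X, Z), world A: {corr(xa, za):.3f}")  # ~0.707
print(f"corr(X, Z), world B: {corr(xb, zb):.3f}")  # ~0.707
```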
4. The Risk of "Distribution Shift"
Why does this distinction matter for students and scientists? Because of Distribution Shift.
Imagine a self-driving car trained on the sunny streets of California. It predicts that "Green Light = Move" because it has seen that pairing 10 million times. However, if it encounters a green light in a flooded city where the road is gone, the "prediction" (Move) remains the same, but the "explanation" (The road is safe) is now false.
Because the AI lacks a causal explanation of why it is safe to move, it cannot adapt when the environment changes. It is a "brittle" intelligence.
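Here is a deliberately crude sketch of that brittleness (all scenes and numbers are invented): a model that keys on the green light alone scores 95% on California-like training data, then fails completely the moment the flood breaks the correlation between "green light" and "road is clear."

```python
# Each scene is (light_is_green, road_is_clear); whether it is safe to move
# depends only on road_is_clear -- that is the true causal mechanism.
train = [(True, True)] * 9000 + [(False, True)] * 500 + [(False, False)] * 500
test  = [(True, False)] * 100   # flooded city: green lights, but the road is gone

def label(scene):
    light, road = scene
    return road                  # ground truth: safe if and only if the road is clear

def shortcut_model(scene):
    light, _ = scene
    return light                 # the correlation learned on sunny streets

def accuracy(data, model):
    return sum(model(s) == label(s) for s in data) / len(data)

print(f"train accuracy: {accuracy(train, shortcut_model):.2f}")  # 0.95, looks great
print(f"test accuracy:  {accuracy(test, shortcut_model):.2f}")   # 0.00, brittle under shift
```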
5. Closing the Gap: Causal AI
The next frontier isn't "bigger" models, but Causal Representation Learning. This involves:
Directed Acyclic Graphs (DAGs): Manually or automatically mapping out "what causes what" before the model starts learning (a minimal sketch follows after this list).
Symbolic Regression: Searching for an explicit mathematical formula that fits the data, so the learned relationship is readable and traceable rather than buried in "black box" weights.
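As a minimal sketch of the DAG idea mentioned above (the edges are assumptions written down by hand for the grade example, not something learned from data), a causal graph can be as simple as an adjacency list:

```python
# A causal DAG as an adjacency list: cause -> list of direct effects.
# The specific edges are illustrative assumptions for the grade example.
causal_dag = {
    "intelligence":         ["early_study_sessions", "high_grade"],
    "early_study_sessions": ["high_grade"],
    "high_grade":           [],
}

def parents(node, dag):
    """Return the direct causes of `node` under the assumed DAG."""
    return [cause for cause, effects in dag.items() if node in effects]

# With the graph fixed, "is X a reason or a symptom?" becomes a graph query
# instead of a guess from correlations.
print(parents("high_grade", causal_dag))  # ['intelligence', 'early_study_sessions']
```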
In conclusion, the "backwards" nature of AI intelligence stems from a fundamental mathematical trade-off: modern models prioritize minimizing statistical loss over maximizing causal meaning. While humans use narrative and counterfactual reasoning to navigate the world, AI utilizes high-dimensional curve fitting to calculate P(Y | X), mapping associations without ever grasping the underlying "why." This creates a "black box" effect where prediction is computationally efficient and uncannily accurate, but true explanation remains functionally out of reach. As we integrate AI into high-stakes fields like medicine or social policy, the primary challenge is to move beyond mere pattern continuation toward Causal AI—architecting systems that don't just tell us what will happen next, but understand the mechanisms that drive our reality.
Bibliography
https://ftp.cs.ucla.edu/pub/stat_ser/r483.pdf
https://arxiv.org/abs/2206.15475
https://plato.stanford.edu/entries/causal-models/
https://arxiv.org/abs/2108.07258
https://towardsdatascience.com/understanding-dataset-shift-f2a5c3d2dc02