What's happening inside an AI model as it thinks? Why are AI models sycophantic, and why do they hallucinate?
Are AI models just "glorified autocompletes", or is something more complicated going on? How do we even study these questions scientifically?
00:00 - Introduction
01:37 - The biology of AI models
06:43 - Scientific methods to open the black box
10:35 - Some surprising features inside Claude's mind
20:39 - Can we trust what a model claims it's thinking?
25:17 - Why do AI models hallucinate?
34:15 - AI models planning ahead
38:30 - Why interpretability matters
53:35 - The future of interpretability