https://infosec.exchange/@malwaretech/114903901544041519
the article since there is so much confusion what we are actually talking about https://edition.cnn.com/2025/07/23/politics/fda-ai-elsa-drug-regulation-makary
I’m constantly mystified at the huge gap between all these “new model obliterates all benchmarks/passes the bar exam/writes PhD thesis” stories and my actual experience with said model.
Likely those new models are variants trained specifically on the exact material needed to perform those tasks, essentially passing the bar exam as if it were open book.
Reminds me of a video that opens with the fact that you can't convince an image-generating AI to draw a wine glass filled to the brim. AI is great at replicating the patterns it has seen and been trained on, like full wine glasses, but it doesn't actually understand why or how those patterns work. It doesn't know the things we humans grasp intuitively, like "filled to the brim means more liquid than full". It knows the what but doesn't get the why.
The same could apply to testing. AI knows how to solve test questions, but that skill wouldn't transfer so neatly if you tried to apply it in real life.
The real truth is just that standardized testing fucking sucks and always has