I have now read so many "ChatGPT can do X job better than workers" papers, and I don't think I've found a single one that wasn't at least flawed, if not complete bunk, once I went through the actual paper. I wrote about this a year ago, and I've since done the occasional follow-up on specific articles, including an official response to one of the most dishonest published papers I've ever read; that response has itself just passed peer review and is awaiting publication.
That academics are still "benchmarking" ChatGPT like this, a full year after I wrote that, is genuinely astounding to me on so many levels. I don't even have anything left to say about it at this point. At least fewer of them are now purposefully designing their experiments to conclude that AI is awesome, and are instead coming to the obvious conclusion that ChatGPT cannot actually replace doctors, because of course it can't.
This is my favorite of these ChatGPT-as-doctor studies to date. It concluded that "GPT-4 ranked higher than the majority of physicians" on their exams. In reality, GPT-4 can't even take the exam, so the researchers made a special, ChatGPT-friendly version of it for the sole purpose of concluding that ChatGPT is better than humans.
Because GPT models cannot interpret images, questions including imaging analysis, such as those related to ultrasound, electrocardiography, x-ray, magnetic resonance, computed tomography, and positron emission tomography/computed tomography imaging, were excluded.
Just a bunch of serious doctors at serious hospitals showing their whole ass.
Journalists have genuinely weird and, I would argue, self-serving standards about linking. Let me copy-paste from an email I got from a journalist when I wrote to them about their relying on my work without actually citing it:
In my opinion, this is a clever way to legitimize passing off someone else's research as your own, which is definitely what they did, up to and including repeating some very minor errors that I had made.
I feel similarly about the journalistic ethic of not paying sources. That's a great way to make sure that all of your sources are think-tank-funded people who are paid to have opinions that align with their funding, which is exactly what happens. I understand that paying people would introduce challenges, but those are the normal challenges the rest of us have to deal with every fucking time we hire someone. Journalists love to act as though people coming forward claiming they can do X, or offering to tell them about Y, is some unique problem they face, when in reality it's exactly what every single hiring process exists to sort out.