66 Comments

Tom Servo

The short answer is No

Alan Smithee

No, I doubt there will be any SOTA model without hallucination anytime in the near future.

Robert Knapp

I'm no data nerd. But I find Perplexity.ai useful (especially because a single, quite affordable subscription gives access to several different platforms and agents, including OpenAI's Deep Research). All of Perplexity's options show their work, which helps check against hallucination.

brendan

With some excitement I bought a Perplexity subscription, hoping it had solved the hallucination issue.

It hasn’t. Ask it the same logic or fact based query more than once and its propensity to make shit up increases. That’s been my experience with it anyway.

The Last Real Calvinist

I find myself going back to Perplexity for my (admittedly limited) AI needs. I also like its interface, although that's a fairly minor factor.

Guest007

One issue is asking the AI to gather a lot of data and then sort it by date, or team, or season. The problem is that there is no standard way of dating articles and posts on the internet. If someone makes a change to 2023 data, the AI will read it as 2024 data.

Frau Katze

I was thinking that too. Plus many, perhaps most, questions have no time element. So it probably isn’t a high priority for AI developers.

Thomas Jones

And it might just be a really hard aspect of the problem - we tend to struggle to understand what is hard, what is easy, for computers/AI.

Evan

Wolfram Alpha

Based Money

You might have better luck getting it to code a program that downloads the data for you. Scraping data is fairly simple. Coding seems to be one thing it is good at, though it's still a process to correct the errors in the code. I've had more success with Grok than ChatGPT.

Erik

You have just learned what everyone learns when they try to use the GPTs for work. What you figure out eventually is that you need to experiment and figure out what it is good at and how to exploit it. Like twenty years ago when you needed to develop Google-fu.

All the AIs you mentioned are doing the same thing chatGPT is, basically word completion at scale. They each have some tricks to make it better for certain common tasks but not for what you asked.

If someone on the internet had asked and answered the same question, any of them would do a bang-up job.

You will get better results if you can feed it a small data set in your prompt and ask a question about that.

For the question you asked, if it hallucinated on this once I don't see how it would ever get it right. OTOH sometimes little changes affect the accuracy. For example you might ask it to present the data ordered alphabetically by team and suddenly it gets it right.

Usually I find after all the prompt fiddling it's better to do the whole thing myself.

m droy

"All the AIs you mentioned are doing the same thing chatGPT is, basically word completion at scale. "

Well yes - which can be interpreted 2 ways.

-- perhaps AI is very limited.

-- or perhaps all Humans are also just word completion specialists.

I know a lot of humans who are just word completion specialists. They are experts on things like whether Julia Roberts is an actress or an actor.

They can't think but they seem to do very well in life.

Erik

I've made this point in other threads. In this case it helps explain why GPT sort of gets the charts right but messes up easy things like which team a player was on...but only occasionally :)

m droy

Sure but don't get carried away by the pure chat models which dream up a long winded answer.

The real skill of AI is in another language completely - coding. It speaks Python or C++ as fluently as it speaks French or Chinese. That is what makes it super important: I can code things in 5 days that I could never have coded before.

An expert can code in 50 minutes things that would have taken them 5 days to do.

And of

Erik

Yep, as a very senior developer I find it amazing at doing things I don't feel like doing, and also at coming quickly to the solution appropriate to the API or library I am using.

m droy

As a very amateur user of python I find it rapidly takes me to places that are well beyond my understanding. But yes it does the stuff I could do myself slowly really fast. It 100% is going to help the smart gifted coders - nothing could be less democratic than the impact of AI.

Erik

I mostly use Java. I was helping a friend out and needed to write a comic reader that could do 3d. It was obvious to me that I would need to do it in Javascript (or TypeScript) and that I would need to learn at least one 3d graphics library for JavaScript. Previously I had never programmed in the language.

Because I know what I would do in Java, I was able to bang it out with ChatGPT in less than a month. That is, I wrote a kind of sophisticated program in a language I had never used with two libraries I had never used. As you say.

Honestly the only thing holding back excellent programmers is...management.

Thomas Markham

Have you tried the Deep Research mode in ChatGPT? You have to have a paid plan to access it, but it does better than the basic chat mode.

Nathan Cofnas

Most people are unwilling or unable to fact check what chatbots say, so they come away super impressed and worrying that the bot is going to outsmart us and turn us into paper clips. In reality, chatbots are glorified autocomplete with no ability to reason or know the difference between truth and falsity. Helpful for writing doggerel or doing Internet searches, but not serious research.

Approved Posture

I use it to manipulate data in bulk I’ve already accessed.

So I paste in GDP for every year since 1960 and ask it how much it grew between 1980 and 2010 or whatever. It’s good at that.

I find it hallucinates a lot when asked to produce actual statistics even when a single authoritative source exists.
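For what it's worth, the growth calculation itself is trivial to do deterministically once the data is in hand; a minimal sketch (the figures below are placeholders, not real GDP numbers):

    # Hypothetical GDP figures in $billions - placeholders, not real data.
    gdp = {1980: 2857.0, 2010: 14992.0}

    growth = gdp[2010] / gdp[1980] - 1
    print(f"GDP grew {growth:.0%} between 1980 and 2010")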

Steve Sailer

I pasted in 1,240 rows of data from Baseball Reference (top 10 players by WAR from 1901-2024) and asked it to add a column designating each player as a pitcher or a position player. It did this accurately for 2024 and stopped. When I asked it to do the whole thing, it did it to 2023. When I asked it to do all 1,240 rows from 2024 to 1901, it did 2024 and 1901 but skipped everything in between. And so on...
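A task like this can also be done deterministically rather than by asking the chatbot to rewrite all 1,240 rows. A minimal sketch in pandas, assuming a hypothetical CSV export with a "Pos" column (the real Baseball Reference headers may differ):

    import pandas as pd

    # Hypothetical file and column names - adjust to the actual export.
    df = pd.read_csv("war_top10_1901_2024.csv")

    # Label every row at once: "P" means pitcher, anything else is a position player.
    df["Role"] = df["Pos"].apply(lambda p: "Pitcher" if p == "P" else "Position player")

    df.to_csv("war_top10_labeled.csv", index=False)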

Slaw

Another tech trend that starts with a lot of hype and then quietly recedes.

Dorkwad

Modern generative text AI has two basic operations: expansion and compression. AI can expand a small thing into a plausible big thing, or it can compress a big thing into an analyzed small thing. AI has very different behavior from a typical computer. Typical computers are 1:1, input to predictable output. AIs are fuzzy and unpredictable. This is part of the reason why some wizened greybeard programmers are skeptical of AI. These men understand and love computers. And when AI is judged as a typical computer, it scores poorly. It's better judged as a compressor and expander.

m droy

Short answer is you are asking too much (or asking the wrong person)

Think of ChatGPT and similar models as very enthusiastic assistants. They want to complete whatever task you give them. They will "hallucinate" fake answers just to keep you happy.

You need a smart team manager to direct them carefully.

Basically, people of normal intelligence can only get so much out of AI. Smart people can get 10x as much. Smart Chinese (or anyone with an IQ>140 and a career invested in AI) can get 1000x.

So China wins.

(But the US is making a Hail Mary play to reach AGI - AI that is as clever as a Smart Chinaman - to do that job. They are praying that two barriers to entry will save them - high-tech chips and huge capital requirements - but DeepSeek shows this is not working.)

A smart AI user would target the websites of those who have the data, ask the AI to write the Python code to scrape all the data you need into an Excel workbook, and a bit more code to update it whenever he re-runs it. With carefully considered prompting there is AI that will do this for you, with a little error correcting (copy-paste the error response back into the AI).
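A minimal sketch of the kind of scrape-to-Excel script being described, assuming a placeholder URL whose data sits in ordinary HTML tables (needs pandas plus the lxml and openpyxl packages):

    import io
    import pandas as pd
    import requests

    # Placeholder URL - point this at whatever page actually holds the data.
    URL = "https://example.com/stats/war-leaders"

    # read_html pulls every <table> on the page into a list of DataFrames.
    html = requests.get(URL, timeout=30).text
    tables = pd.read_html(io.StringIO(html))

    # Re-running the script simply rewrites the workbook with fresh data.
    with pd.ExcelWriter("scraped_data.xlsx") as writer:
        for i, table in enumerate(tables):
            table.to_excel(writer, sheet_name=f"table_{i}", index=False)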

You could learn to do this yourself (in about a year from scratch - I assume Steve is smart). Or last year you could have paid someone $1,000 to do this for you. Now they can do it in 90 minutes and charge $100.

Btw the correct "prompt" to a real over-enthusiastic assistant would be "can you find a ready prepared list of the top 10...... - Use Google"

The correct prompt to a real, extremely good team manager is "Find a reliable root source and organise your team of enthusiastic idiots so that they collect all this data I require going back 50 years, and know how to update that data whenever I ask".

Dorkwad

Also, Steve, the AIs all know who you are. The companies have gotten better at reducing bias in their chatbots, so you can directly ask it how it can help you, Steve Sailer. For example:

----

Me:

Steve Sailer wants to use an AI for his work. How can AI help him? He needs accuracy. Hallucinations are fatal. So the AI is probably best to write programs for him, right? What programs? I'm thinking small scripts that you can mostly write one shot. He doesn't have time to learn programming in depth.

ChatGPT:

Yeah, small one-shot scripts are the right level. If hallucinations are fatal, then the AI shouldn't be used to generate facts or analysis directly—just tools. So the goal is: use AI to offload mechanical tasks, not judgment.

Here’s where AI-written programs could help someone like Sailer:

---

1. Text Analysis / Search Tools

These can be built once and reused indefinitely.

Keyword highlighters: Highlight specified terms in a document or website.

Concordance tools: Find all instances of a phrase across a corpus and show context.

Frequency counters: Count and rank word/phrase usage in documents.

Diff scripts: Compare versions of an article or webpage and highlight what's changed.
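A minimal sketch of the frequency-counter idea (the file name is a placeholder):

    import re
    from collections import Counter

    # Hypothetical input file - any plain-text document works.
    with open("article.txt", encoding="utf-8") as f:
        words = re.findall(r"[a-z']+", f.read().lower())

    # Rank the 20 most frequent words.
    for word, count in Counter(words).most_common(20):
        print(f"{count:6d}  {word}")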

---

2. Web Automation

Useful for gathering structured data or monitoring sources.

Web scraping:

Pull census data, public health tables, or academic metadata.

Monitor specific websites for updates or changes.

RSS filtering:

Create a filter that forwards only items matching specified keywords.

Archiving:

Auto-save specified URLs to the Wayback Machine or local disk.
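A minimal sketch of the RSS-filtering idea, assuming the third-party feedparser package plus placeholder feed URL and keywords:

    import feedparser  # pip install feedparser

    FEED = "https://example.com/rss.xml"      # placeholder feed
    KEYWORDS = {"census", "demographics"}     # placeholder keywords

    # Print only the entries whose title or summary mentions a keyword.
    for entry in feedparser.parse(FEED).entries:
        text = (entry.get("title", "") + " " + entry.get("summary", "")).lower()
        if any(k in text for k in KEYWORDS):
            print(entry.get("title"), "->", entry.get("link"))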

---

3. Data Cleanup and Reformatting

Make raw tables usable for Excel or R.

Convert PDFs to clean CSVs.

Remove footnotes, normalize weird Excel formats.

Detect and fix malformed data (dates, missing headers, etc.).
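A minimal sketch of the malformed-date fix, assuming pandas 2.x and a hypothetical column of mixed formats:

    import pandas as pd

    # Hypothetical messy column with inconsistent date formats.
    df = pd.DataFrame({"date": ["2023-01-05", "1/6/2023", "Jan 7 2023", "garbage"]})

    # Coerce everything to one parsed format; unreadable values become NaT.
    df["date"] = pd.to_datetime(df["date"], errors="coerce", format="mixed")
    print(df)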

---

4. Spreadsheet Automation

Scripts that process data in consistent, pre-defined ways.

Generate pivot tables from raw data.

Cross-tabulate race vs. income vs. geography.

Apply specific formulas or conditional formatting automatically.
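A minimal sketch of the pivot-table idea; the columns are invented placeholders:

    import pandas as pd

    # Hypothetical raw data - in practice this would come from a CSV export.
    df = pd.DataFrame({
        "state":  ["CA", "CA", "TX", "TX"],
        "group":  ["A", "B", "A", "B"],
        "income": [75, 62, 58, 49],
    })

    # Cross-tabulate mean income by state and group.
    print(df.pivot_table(values="income", index="state", columns="group", aggfunc="mean"))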

---

5. Visualization

Simple static plots (bar charts, maps) that avoid human charting overhead.

Plot population distributions.

Auto-generate line charts from public datasets.

Highlight outliers or percentiles.
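A minimal sketch of the charting idea with matplotlib; the series is a placeholder, not real data:

    import matplotlib.pyplot as plt

    # Placeholder series - in practice this would come from a public dataset.
    years = list(range(2000, 2011))
    values = [3.2, 3.4, 3.3, 3.6, 3.8, 4.0, 4.3, 4.1, 3.9, 4.6, 4.8]

    plt.plot(years, values, marker="o")
    plt.title("Example series")
    plt.xlabel("Year")
    plt.ylabel("Value")
    plt.savefig("chart.png", dpi=150)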

---

6. Local Search Tools

If he has a big archive (e.g. PDFs, old blog posts, census tables):

Build a local full-text search with keyword ranking.

Tag and organize documents based on metadata.
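A minimal sketch of the local full-text search idea, assuming a hypothetical folder of .txt files:

    import pathlib

    ARCHIVE = pathlib.Path("archive")   # placeholder directory
    TERM = "census"                     # placeholder search term

    # Rank files by how often the term appears.
    hits = []
    for path in ARCHIVE.rglob("*.txt"):
        count = path.read_text(encoding="utf-8", errors="ignore").lower().count(TERM)
        if count:
            hits.append((count, path))

    for count, path in sorted(hits, reverse=True):
        print(f"{count:4d}  {path}")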

---

If he doesn’t want to learn code at all, AI can generate small Python scripts on request—he just runs them. Ideally, he’d use something like:

Python + Jupyter Notebooks for one-off scripts (AI can generate these)

Streamlit or Gradio for simple interfaces (if needed)

He shouldn’t aim for general-purpose tools. Stick to tightly-scoped, single-use tasks: “I want to extract this table from this site and turn it into this format.”

---
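As an illustration of that tightly-scoped, single-use pattern, a minimal sketch that pulls the first HTML table from a placeholder URL and saves it as a CSV (needs pandas plus lxml):

    import pandas as pd

    URL = "https://example.com/some-table-page"  # placeholder URL

    # Grab the first <table> on the page and write it out as a CSV.
    table = pd.read_html(URL)[0]
    table.to_csv("extracted_table.csv", index=False)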

Pablo Percentil

I was going to say something like this. I know how to program, but GPT saves programming time and can potentially be very useful for web scraping and other tasks. There's an author called Stephens-Davidowitz whose books, despite being left-leaning, have a "Sailerian" flavor, and he used AI extensively in one of his most recent ones.

PE Bird

Would be interesting if after initial data collection you ask the AI (or a different AI) to fact check it against several sources. It seems that every query made by every individual in the world is used to "train" AI. Should probably expect the output to be sketchy.

Derek Leaberry

Do football, hockey, and basketball have WAR grades? Or is it just for baseball junkies, the brainiest of all sports fans? George Will says so.

Steve Sailer

Other sports have pretty similar synthetic measures of all-around value. For example, about 11 years ago 538 announced their ultimate soccer stat. It turned out ... drumroll please ... that #2 in the world was Ronaldo and #1 was Messi.

But keep in mind that WAR stats, even in baseball, are slightly arbitrary. Baseball Reference and FanGraphs have different ways to calculate WAR, so their results are known as bWAR and fWAR.

ScarletNumber

The reason I don't like WAR is the positional adjustment; I feel it is unfair to first basemen and corner outfielders, especially the former. It isn't Don Mattingly's or Keith Hernandez's fault they were born lefty, which limits them to playing first if they want to play the infield. Also, the R stands for replacement, so the person replacing them would have to play first, because SOMEONE has to play first!
