A question about AI

Is there some subscription AI service that won't just make up wrong data?

Apr 01, 2025

I’m finally fooling around with artificial intelligence, namely ChatGPT 4o. Its parent company, OpenAI, was yesterday valued at $300 billion, so it ought to be pretty trustworthy, right?

My experience with primitive AI goes back to 1985, but I finally decided it wasn’t going to be big in my lifetime.

Was I ever wrong.

On the other hand …

It’s pretty amazing these days at understanding what you type and responding with readable, coherent prose.

It seems useful for writing bureaucratic texts, something, fortunately, that I don’t have to do very often.

Still, I’m not terribly interested in machine-generated prose. I’ve always been interested in different intellectual voices, the way most people over the last century have enjoyed the fact that there are different singing voices.

But what I want to use it for instead is scraping large amounts of data for my statistical analyses. I’ll do the writing if it can conveniently organize the data into a format I can paste into a spreadsheet.

I’m an old guy from the era when coding, like SAS (which I used in 1981-82), was going out of fashion and the future was spreadsheets. Then, decades later, the future reversed itself and everybody who wanted to work with data had to learn R or the like. But I’m too old to put in the effort to learn to code again in a new language. So, I was hoping to have AI do the boring work of organizing data into formats that I can paste into a spreadsheet for doing my analyses.

For example, to answer the question of whether pitching is really 90% of baseball, I started out by asking ChatGPT 4o to list the top ten baseball players in 2024 by Baseball Reference’s Wins Above Replacement metric. It responded:

But then I noticed that it got two of the teams wrong: Chris Sale pitched for the Boston Red Sox in the American League in 2023, but in 2024, he won the Cy Young Award for the Atlanta Braves in the National League. Similarly, Juan Soto played for the San Diego Padres in the NL in 2023, but in 2024 he went to the World Series with the New York Yankees of the AL.

Those are not obscure facts.

Admittedly, the great majority of people are not interested in baseball statistics. Still, there are probably tens of millions of people who are, perhaps more than for any other specific type of data.

And, looking at the table some more, it’s clear that ChatGPT completely botched the bottom six of the top ten. Soto, for instance, did not have 5.9 WAR; he had 7.9 WAR. Here’s Baseball Reference’s top ten players, pitcher or position, for 2024:

So, ChatGPT 4o skipped the bottom six of the actual MLB top ten, then put in a bunch of pitchers, and made up a lower WAR for Soto, who is perhaps the highest paid player in baseball this year.

It also made a bunch of more minor errors. For example, pitcher Tarik Skubal had, according to Baseball Reference, a 6.4 WAR, not a 6.3 WAR.

And it may have made up WAR figures to the second decimal place for four players, since Baseball Reference reports them only to one decimal place.
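Spotting a spurious second decimal place is mechanical enough to automate. Here is a minimal Python sketch, assuming the values are kept as text exactly as the AI printed them (converting to floating-point numbers would drop trailing zeros, such as the one in "6.30"). The function names and the "Player A"/"Player B" values are hypothetical.

```python
from decimal import Decimal


def decimal_places(value):
    """Count the digits after the decimal point in a number written as text."""
    exponent = Decimal(value.strip()).as_tuple().exponent
    return max(0, -exponent)


def overly_precise(values, source_places=1):
    """Return the names whose value has more decimal places than the source reports.

    `values` maps a name to the number exactly as printed (a string), so that
    trailing zeros like the one in "6.30" are not lost.
    """
    return [name for name, text in values.items()
            if decimal_places(text) > source_places]


# Hypothetical AI output: one value has a second decimal place, one does not.
ai_war = {"Player A": "6.35", "Player B": "7.9"}
print(overly_precise(ai_war))  # ['Player A']
```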

Hence, for my purposes, ChatGPT seems pretty useless. I’d have to check each data point over by hand, which would be slower than the way I’ve been getting data off the Internet since the previous millennium.
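Once both tables are in hand, the comparison itself can be scripted, although you still need Baseball Reference's table to compare against. Here is a minimal Python sketch, assuming the AI's answer and the reference table have each been entered as dictionaries keyed by player name. The function name and data layout are hypothetical. The example values are the errors described above, with only the fields the post mentions.

```python
def find_mismatches(ai_rows, reference_rows):
    """Compare AI-generated rows with reference rows, keyed by player name.

    Each argument maps a player name to a dict of fields (e.g. team, WAR as text).
    Returns (player, field, ai_value, reference_value) tuples for every
    disagreement, plus entries for players that appear in only one table.
    """
    problems = []
    for player, ref_fields in reference_rows.items():
        ai_fields = ai_rows.get(player)
        if ai_fields is None:
            problems.append((player, "missing from AI output", None, None))
            continue
        for field, ref_value in ref_fields.items():
            if ai_fields.get(field) != ref_value:
                problems.append((player, field, ai_fields.get(field), ref_value))
    # Players the AI listed that the reference table does not contain.
    for player in sorted(ai_rows.keys() - reference_rows.keys()):
        problems.append((player, "not in reference", None, None))
    return problems


# The errors described in the post, entered by hand.
ai = {
    "Juan Soto": {"team": "San Diego Padres", "war": "5.9"},
    "Chris Sale": {"team": "Boston Red Sox"},
    "Tarik Skubal": {"war": "6.3"},
}
reference = {
    "Juan Soto": {"team": "New York Yankees", "war": "7.9"},
    "Chris Sale": {"team": "Atlanta Braves"},
    "Tarik Skubal": {"war": "6.4"},
}
for problem in find_mismatches(ai, reference):
    print(problem)
```

Values are compared as text, so "7.9" and "7.90" would count as different. That is deliberate here, since extra precision is itself a warning sign.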

This is not to say that ChatGPT 4o is useless for other things. But that’s an unacceptably low level of data quality to risk my reputation upon.

So, I wound up copying and pasting data for several hours from Baseball Reference into my spreadsheet, same as I’ve always done.

Here’s my question: is there an artificial intelligence service that won’t make endless mistakes like these? Is there something I can pay money to and trust the output?

ChatGPT recommends:

📦 3. AI with More Reliable Fact Retrieval (Than ChatGPT)
Perplexity.ai (Pro version) or Claude.ai by Anthropic
What they do: They cite sources inline and are far less prone to hallucination than ChatGPT when it comes to factual queries.
Good for: Finding accurate, up-to-date answers and extracting lists like top WAR players.

What do you think?

Discussion about this post

Dorkwad

Apr 1

Also, Steve, the AIs all know who you are. The companies have gotten better at reducing bias in their chatbots, so you can ask one directly how it can help you, Steve Sailer. For example:

----

Me:

Steve Sailer wants to use an AI for his work. How can AI help him? He needs accuracy. Hallucinations are fatal. So the AI is probably best to write programs for him, right? What programs? I'm thinking small scripts that you can mostly write one shot. He doesn't have time to learn programming in depth.

ChatGPT:

Yeah, small one-shot scripts are the right level. If hallucinations are fatal, then the AI shouldn't be used to generate facts or analysis directly—just tools. So the goal is: use AI to offload mechanical tasks, not judgment.

Here’s where AI-written programs could help someone like Sailer:

---

1. Text Analysis / Search Tools

These can be built once and reused indefinitely.

Keyword highlighters: Highlight specified terms in a document or website.

Concordance tools: Find all instances of a phrase across a corpus and show context.

Frequency counters: Count and rank word/phrase usage in documents.

Diff scripts: Compare versions of an article or webpage and highlight what's changed.
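Here is a minimal Python sketch of the frequency counter and concordance tools, using only the standard library. The function names are illustrative and do not come from any existing package. A "word" here is simply a run of letters and apostrophes.

```python
import re
from collections import Counter


def word_frequencies(text, top=10):
    """Return the `top` most common words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top)


def concordance(text, phrase, width=30):
    """Return each occurrence of `phrase` with `width` characters of context per side."""
    hits = []
    for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        hits.append(text[start:end].replace("\n", " "))
    return hits


sample = "The cat sat. The dog sat. A cat ran."
print(word_frequencies(sample, 2))       # [('the', 2), ('cat', 2)]
print(concordance(sample, "dog", 4))     # ['The dog sat']
```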

---

2. Web Automation

Useful for gathering structured data or monitoring sources.

Web scraping:

Pull census data, public health tables, or academic metadata.

Monitor specific websites for updates or changes.

RSS filtering:

Create a filter that forwards only items matching specified keywords.

Archiving:

Auto-save specified URLs to the Wayback Machine or local disk.
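Here is a minimal Python sketch of the RSS keyword filter, assuming an RSS 2.0 feed (Atom feeds use XML namespaces and would need different element names). It only selects matching items. Forwarding them by email or elsewhere is left out. The function name and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET


def filter_rss(rss_xml, keywords):
    """Return (title, link) for RSS 2.0 items whose title or description
    contains any keyword (case-insensitive)."""
    root = ET.fromstring(rss_xml)
    wanted = [k.lower() for k in keywords]
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        haystack = (title + " " + description).lower()
        if any(k in haystack for k in wanted):
            matches.append((title, item.findtext("link", default="")))
    return matches


# A tiny hand-written feed. In practice the text could come from
# urllib.request.urlopen(feed_url).read() for some feed URL.
SAMPLE = """<rss version="2.0"><channel>
<item><title>Census release</title><link>https://example.com/1</link>
<description>New county tables</description></item>
<item><title>Box scores</title><link>https://example.com/2</link>
<description>Last night's games</description></item>
</channel></rss>"""

print(filter_rss(SAMPLE, ["census"]))
```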

---

3. Data Cleanup and Reformatting

Make raw tables usable for Excel or R.

Convert PDFs to clean CSVs.

Remove footnotes, normalize weird Excel formats.

Detect and fix malformed data (dates, missing headers, etc.).
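Here is a minimal Python sketch of two common cleanup steps: stripping footnote markers from numeric cells and normalizing dates to one format. It assumes footnotes look like "[1]", "*" or "†", and that slash dates are US month-first. The month-name formats assume English month names under the default locale. The function names and format list are illustrative choices, not a standard.

```python
import re
from datetime import datetime

# Footnote markers such as "[1]", "[a]", "*" or "†".
FOOTNOTE = re.compile(r"\[[^\]]*\]|[*†]")

# Formats tried in order; "%m/%d/%Y" assumes US month-first dates.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%b %d, %Y", "%B %d, %Y")


def clean_number(cell):
    """Strip footnote markers and thousands separators; return a float or None.

    Raises ValueError if something other than a number is left over.
    """
    text = FOOTNOTE.sub("", cell).replace(",", "").strip()
    return float(text) if text else None


def normalize_date(cell):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cell.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(clean_number("1,234[2]*"))       # 1234.0
print(normalize_date("Apr 01, 2025"))  # 2025-04-01
print(normalize_date("4/1/2025"))      # 2025-04-01
```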

---

4. Spreadsheet Automation

Scripts that process data in consistent, pre-defined ways.

Generate pivot tables from raw data.

Cross-tabulate race vs. income vs. geography.

Apply specific formulas or conditional formatting automatically.
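Here is a minimal Python sketch of a cross-tabulation (a one-count-per-cell pivot table) that produces CSV text a spreadsheet can open. The field names and rows are hypothetical. In practice the rows might come from csv.DictReader over an exported sheet.

```python
import csv
import io
from collections import defaultdict


def crosstab(rows, row_field, col_field):
    """Count rows for each (row_field, col_field) pair, like a simple pivot table."""
    counts = defaultdict(lambda: defaultdict(int))
    columns = set()
    for row in rows:
        counts[row[row_field]][row[col_field]] += 1
        columns.add(row[col_field])
    return counts, sorted(columns)


def crosstab_to_csv(counts, columns):
    """Render the counts as CSV text with one header row and one row per key."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([""] + columns)
    for key in sorted(counts):
        writer.writerow([key] + [counts[key].get(col, 0) for col in columns])
    return out.getvalue()


# Hypothetical rows.
players = [
    {"league": "AL", "role": "pitcher"},
    {"league": "AL", "role": "position"},
    {"league": "NL", "role": "pitcher"},
]
print(crosstab_to_csv(*crosstab(players, "league", "role")))
```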

---

5. Visualization

Simple static plots (bar charts, maps) that avoid human charting overhead.

Plot population distributions.

Auto-generate line charts from public datasets.

Highlight outliers or percentiles.
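Here is a minimal Python sketch of the "highlight outliers or percentiles" step. The actual chart could then be drawn in a spreadsheet or a plotting library, which is not shown here. It uses the common 1.5 × interquartile-range rule for outliers, which is one convention among several. The function names and numbers are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values more than k interquartile ranges outside the middle 50%."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def percentile_rank(values, x):
    """Percentage of values strictly below x."""
    return 100 * sum(v < x for v in values) / len(values)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical numbers
print(iqr_outliers(data))        # [100]
print(percentile_rank(data, 5))  # 40.0
```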

---

6. Local Search Tools

If he has a big archive (e.g. PDFs, old blog posts, census tables):

Build a local full-text search with keyword ranking.

Tag and organize documents based on metadata.
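Here is a minimal Python sketch of a local full-text search: an inverted index from words to documents, ranked by a crude score (total occurrences of the query words). It assumes plain-text files. PDFs would need converting to text first. The function names and sample documents are hypothetical.

```python
import re
from collections import Counter, defaultdict
from pathlib import Path


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(docs):
    """Map each word to a Counter of {document name: occurrences}."""
    index = defaultdict(Counter)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word][name] += 1
    return index


def search(index, query):
    """Rank documents by total occurrences of the query words (a crude score)."""
    scores = Counter()
    for word in tokenize(query):
        scores.update(index.get(word, Counter()))
    return scores.most_common()


def load_text_files(folder):
    """Read every .txt file under `folder` into {path: text}."""
    return {str(p): p.read_text(errors="ignore") for p in Path(folder).rglob("*.txt")}


docs = {"a.txt": "WAR for pitchers and WAR for hitters", "b.txt": "pitchers pitch"}
index = build_index(docs)
print(search(index, "war pitchers"))  # [('a.txt', 3), ('b.txt', 1)]
```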

---

If he doesn’t want to learn code at all, AI can generate small Python scripts on request—he just runs them. Ideally, he’d use something like:

Python + Jupyter Notebooks for one-off scripts (AI can generate these)

Streamlit or Gradio for simple interfaces (if needed)

He shouldn’t aim for general-purpose tools. Stick to tightly-scoped, single-use tasks: “I want to extract this table from this site and turn it into this format.”
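As an illustration of that kind of tightly scoped task, here is a minimal Python sketch that pulls the rows of HTML tables out of a page and turns them into delimited text. It uses only the standard library and assumes well-formed tables with closing `</td>` and `</tr>` tags. Pages that build their tables with JavaScript, or hide them inside HTML comments, would need extra handling. If a page has several tables, their rows are simply concatenated. The class and function names are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped into <tr> rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def html_table_to_text(html, delimiter=","):
    """Return every table row in `html` as delimited text (comma or tab)."""
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter).writerows(parser.rows)
    return out.getvalue()


sample = ("<table><tr><th>Player</th><th>WAR</th></tr>"
          "<tr><td><a href='#'>Juan Soto</a></td><td>7.9</td></tr></table>")
print(html_table_to_text(sample))
```

For a page saved from a browser, something like `html_table_to_text(open("saved_page.html", encoding="utf-8").read(), "\t")` (the file name is hypothetical) gives tab-separated text, which spreadsheets usually split into columns when pasted.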

---


Scott Novak

Apr 2

The thinking/reasoning models are generally a bit more reliable, I've found. o3-mini, o3-mini-high and o1 in your ChatGPT account are thinking models that may give better results. Claude 3.7 Sonnet, Grok 3 in Think mode, and Google Gemini models with "thinking" in the name should do better too. Perplexity has thinking models available as well. Grok is especially good at analyzing discussions going on on X/Twitter, and Gemini models are great at summarizing YouTube videos you don't want to watch (since Google owns YouTube). With ChatGPT, when you ask about data from 2024 or later, be sure to turn on the "web" icon so it can search the web.

As other people have mentioned, ChatGPT, Grok and Perplexity have also released "deep research" modes that can give deeper summaries of various research topics. ChatGPT's is by far the best as of now, but you get only 10 uses a month with the $20 account. Try to prompt it the way a researcher would prompt a grad student: ask for a summary of the relevant research, gaps in the literature pointing to needed future research, etc.

Also, if you don't mind investing an hour a week in keeping up with the latest AI developments, Zvi Mowshowitz's Substack, Don't Worry About the Vase, publishes a highly valuable weekly report on them every Thursday (yes, it is moving that fast).

I tried your query in Google Gemini 2.5 Pro (experimental), an excellent thinking model released March 25th and available to free users. It is possibly the best model widely available (as of this week!). Just sign up for a Gemini account with your Gmail address:

"Okay, here are the top ten MLB players for the 2024 season, ranked by Baseball Reference's Wins Above Replacement (bWAR):

Based on the final 2024 season statistics from Baseball Reference, here are the top ten MLB players ranked by their Wins Above Replacement (bWAR):

Aaron Judge (NYY) - 10.8 bWAR

Bobby Witt Jr. (KCR) - 9.4 bWAR

Shohei Ohtani (LAD) - 9.2 bWAR

Gunnar Henderson (BAL) - 9.1 bWAR

Jarren Duran (BOS) - 8.7 bWAR

Juan Soto (NYY) - 7.9 bWAR

Matt Chapman (SFG) - 7.1 bWAR

Francisco Lindor (NYM) - 6.9 bWAR

José Ramírez (CLE) - 6.8 bWAR

Ketel Marte (ARI) - 6.8 bWAR

Note: José Ramírez and Ketel Marte are tied for the 9th position.

Baseball Reference's WAR calculation aims to quantify a player's total contributions to their team (hitting, baserunning, fielding, and pitching) in one statistic."
