18 Comments

The question repeated an inaccuracy that Steve keeps stating even though he knows the reality of the situation. With Steve's phrasing, it would be easier to justify the IRS ending its cooperation with Chetty. And Steve does not like Chetty's research because it does not cover the same issues that Steve covers.

"What famous researcher has provided Steve hours of content w access to anonymized tax returns?"

That does not in any way support your complaint.

Chetty does not have access to anonymized tax returns. Chetty and his research staff have access to the actual IRS databases in the sense that they can run scripts on those databases that produce aggregated results. No one's anonymized tax return is sitting on a server farm at Harvard. However, one's state returns are sitting in the IRS's databases; Chetty's coders were given the design of those databases and then wrote scripts that run on the IRS's actual system.

That is a world of difference that Steve seems intent on ignoring.

I'm not sure you know what the word "access" means. From the statement on the quiz and things Steve has said in the past, this is approximately what I assumed the situation was. No one ever brought up where the servers were located. BTW, the implication of what you wrote, "wrote scripts that run on the IRS's actual system," is that the IRS has a single database and that Chetty's people ran scripts on it. If that is true, it is horrifyingly irresponsible of the IRS to do that. What is more likely is that the IRS did a data ETL into an anonymized database structure that was optimized for querying. Whether this was a server owned by the IRS or Harvard I could not say. If instead they let Chetty's group run queries on the IRS live transactional system, that's bonkers.

NO. Chetty's coders have access to the IRS database and prepare scripts that IRS sysops run on the actual data. The results of the scripts are aggregated data. There have been several stories about this. Try Google.

From https://www.science.org/content/article/how-two-economists-got-direct-access-irs-tax-records

"We were given a dummy data set, with random numbers, to test our program," explains David Grusky, a sociologist at Stanford University in California and director of its Center on Poverty and Inequality. "Once we're confident the program is working, we ship it off to the IRS and someone there does the run. After checking to make sure no confidential information is included, they send the output back to us. And we shuffle back and forth until the project is done. It's a little cumbersome, but it works."
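To make the quoted workflow concrete, here is a rough Python sketch of it. Everything in it is an assumption for illustration: the record layout, the field names, the `researcher_script` and `agency_review` functions, and the list of forbidden fields are all hypothetical, since the article does not describe the IRS's actual schema or review procedure.

```python
import random
from statistics import mean

# Hypothetical record layout; the article does not describe the real schema.
def make_dummy_records(n, seed=0):
    """Build a dummy data set with random numbers, as in the quoted workflow."""
    rng = random.Random(seed)
    return [
        {
            "taxpayer_id": i,
            "state": rng.choice(["CA", "NY", "TX"]),
            "income": rng.randint(10_000, 200_000),
        }
        for i in range(n)
    ]

def researcher_script(records):
    """The researcher's program: it returns per-state aggregates only."""
    by_state = {}
    for r in records:
        by_state.setdefault(r["state"], []).append(r["income"])
    return [
        {"state": s, "n": len(v), "mean_income": round(mean(v), 2)}
        for s, v in sorted(by_state.items())
    ]

# Hypothetical list of fields that must never appear in released output.
FORBIDDEN_FIELDS = {"taxpayer_id", "name", "ssn"}

def agency_review(output):
    """Stand-in for the agency's check that no confidential field is in the output."""
    return all(FORBIDDEN_FIELDS.isdisjoint(row) for row in output)

# Step 1: the researcher tests the program on dummy data.
dummy_output = researcher_script(make_dummy_records(100))

# Step 2: the agency runs the same program on the confidential data
# (simulated here with a second dummy set) and reviews the output.
confidential = make_dummy_records(1000, seed=42)
real_output = researcher_script(confidential)
released = real_output if agency_review(real_output) else None
```

The point of the sketch is only that the researcher never touches `confidential` directly; what comes back is whatever passes the review step.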

I think you mean the DBAs. Most sysops couldn't write an outer join if their lives depended on it.

The danger is that there's something in that script that looks for specific information and produces non-aggregated/non-anonymized output.

The safest approach would be a separate warehouse that holds only aggregated data, which Chetty's grad students would query by handing off "scripts" (SQL?) to the DBAs.

Much less safe is letting those grad students run queries directly against the fine-grained data itself, meaning individual returns.

Anyway, this raises a world of questions: can said grad students run queries against the detail tables? Or do they need to submit their SQL (or SAP query language or R) to the IRS's DBAs? How closely is said SQL monitored or checked? That's not a trivial point--I have had to troubleshoot views that literally ran on for a dozen pages.
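One way to sidestep reading a dozen pages of SQL is to check what the query returns rather than what it says. Below is a minimal sketch of that idea using Python's built-in `sqlite3`. The `returns` table, its columns, the `run_with_disclosure_check` helper, and the `MIN_CELL_SIZE` threshold are all hypothetical; nothing here is taken from how the IRS actually works.

```python
import sqlite3

MIN_CELL_SIZE = 10  # hypothetical threshold for the smallest releasable group

def run_with_disclosure_check(conn, sql, count_column="n"):
    """Run a query and release rows only if every row aggregates enough records."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    if count_column not in columns:
        raise ValueError("query must report a group-size column")
    idx = columns.index(count_column)
    small = [row for row in rows if row[idx] < MIN_CELL_SIZE]
    if small:
        raise ValueError(f"{len(small)} group(s) smaller than {MIN_CELL_SIZE}")
    return rows

# Hypothetical detail table standing in for individual returns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE returns (taxpayer_id INTEGER, state TEXT, income INTEGER)")
conn.executemany(
    "INSERT INTO returns VALUES (?, ?, ?)",
    [(i, "CA" if i % 2 else "NY", 30_000 + 1_000 * i) for i in range(40)],
)

aggregate_sql = (
    "SELECT state, COUNT(*) AS n, AVG(income) AS mean_income "
    "FROM returns GROUP BY state ORDER BY state"
)
# Looks like an aggregate, but the WHERE clause singles out one person.
targeted_sql = (
    "SELECT state, COUNT(*) AS n, AVG(income) AS mean_income "
    "FROM returns WHERE taxpayer_id = 7 GROUP BY state"
)

released = run_with_disclosure_check(conn, aggregate_sql)

try:
    run_with_disclosure_check(conn, targeted_sql)
    targeted_blocked = False
except ValueError:
    targeted_blocked = True
```

A check like this would not catch everything (for example, two allowed aggregates that differ by one person), so it complements review of the SQL rather than replacing it.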

Please try reading the linked story.

That story strongly implies that they are NOT previewing their code with IRS employees and that they have direct access to individual returns.

"the team was given the chance to work directly with tax records."

Doesn't quite agree with a later paragraph about sending dummy-tested queries to the IRS. The top men went through a security check, but did all of them?

I don't think Guest007 read past that paragraph. It later says Chetty and Saez got to work directly on the IRS database as if they were IRS. I'm going to guess again that this is not the transactional database but a data warehouse set up for the project for which the IRS put out the RFP.

Do you really think that the IRS would set up a second copy of the entire tax return database just for a few researchers?

My reading is that the normal procedure is queries against dummy data sent in by outside researchers but that Chetty got special status to just go in and query the database like an IRS employee.

At this point I want to know if select access to those tables is audited/monitored to ensure that Chetty et al are not running queries on individuals.

They are identifying and correlating research on individuals but the data is aggregated so no one person can be identified. Read the article without the intent to misunderstand it.

The article suggests that--unlike other researchers who have to send preformulated queries to the IRS--Chetty and his team can just work with the production data. In addition, where exactly does it state that they're working with aggregated data rather than performing their own aggregations on individual records?

Once again, I wasn't saying that this was done on a Harvard server, merely that Steve never said that either and that there was no way the query was being run on the IRS transactional system. That is perfectly consistent with what you quoted above. I'm not sure why you are making such a big deal about where the data resides but if it makes you happy, you are correct--the thing no one was claiming the opposite of is true.

Looks like you didn't read far enough. That was the workflow for the RFP. Later it says:

"Chetty and Saez were spared that inconvenience by, in effect, becoming part of the IRS workforce. IRS decided that the researchers needed to come to Washington as needed because "the econometrics were quite technical and a great deal of work was required to assemble the needed data," Johnson says. Once that decision was made, the academics agreed to "submit to fingerprinting and a complete background check, undergo training in the proper protection of administrative data, and be subject to the same rules and penalties that apply to any IRS employee." They also worked under the supervision of the Treasury Department; one employee, Nicholas Turner, was even listed as a co-author on one of the key papers."
