The word Manifesto has some emotional heft – it recalls a revolutionary, impassioned and frustrated at the current system, preparing to lead the peasants in revolt. Revolutionary fervor isn’t typically associated with statisticians, so when I came across Julia Lane’s book – Democratizing Our Data: A Manifesto – it caught my attention.
And then, as I read her words, it did a lot more than that.
What’s the current unemployment rate? How fast is GDP growing? We don’t really know. How important is AI to the economy? Also, don’t know.
All this uncertainty is because the current system for collecting economic and demographic data in the U.S. is outdated, unreliable and – here’s the scary bit – not fixable. Lane knows because she’s spent a career attempting to fix it.
Much of our economic data is compiled from surveys, an information collection system developed nearly a century ago. It relies on humans filling out complex forms – a process that is slow, subjective and often incomplete. Want a feel for it? Try following the questions below, which are part of the 2022 survey the Census Bureau used to track AI’s impact on the economy.
Were you able to follow it? I couldn’t. So it’s not entirely surprising that using this data, the Census Bureau says that AI is having a large impact on only 5% of companies. More broadly, Lane’s fellow economist Diane Coyle estimates that as much as 4/5 of economic activity now goes unmeasured.
This is not because government statisticians are incompetent. They are survey methodologists, hired to collect data in a specific way. They are not trained to think about the vast range of data that now exists and the many creative ways it could be put together. And the agencies they work for are not mandated to do anything but go on collecting the same data in the same way. Lane learned this through experience - she spent years trying to reform government data collection from the inside.
Grapes From Wine
When I worked as a quantitative researcher in the 1990s, my boss was Richard Grinold, a legend in the industry because he combined a physicist’s mathematical rigor with an ability to explain complex investment ideas in crisp metaphors.1 One of my favorites was a process Richard called “grapes from wine”.
Want to know a portfolio manager’s true return forecast for a stock? Don’t ask him; look at his portfolio. By taking his actual positions (the wine) you could reverse-engineer his return expectations (the grapes). Sure, this required some assumptions, but it was more accurate than surveying him about his return forecasts because his portfolio’s positions represented actual decisions made with actual money.
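To make the metaphor concrete, here is a minimal sketch of that reverse-engineering step under a textbook mean-variance assumption: if the manager chose weights to trade off expected return against risk, the return forecasts implied by those weights are proportional to the covariance matrix times the weights (mu = 2 * lambda * Sigma * w). The covariance matrix, risk-aversion value, and weights below are illustrative assumptions, not anything from Richard’s actual process.

```python
import numpy as np

# Hypothetical inputs: an assumed covariance matrix, an assumed risk-aversion
# parameter, and the manager's observed portfolio weights (the "wine").
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])    # annualized covariances (illustrative)
risk_aversion = 2.0                      # lambda (illustrative)
weights = np.array([0.5, 0.3, 0.2])      # actual positions, as shares of the portfolio

# If the manager maximized w'mu - lambda * w'Sigma w, the first-order condition is
# mu = 2 * lambda * Sigma @ w, so the implied forecasts (the "grapes") fall out directly.
implied_returns = 2 * risk_aversion * cov @ weights
print(implied_returns)   # [0.092 0.144 0.152] with these made-up numbers
```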
Lane’s approach to utilizing data reminded me of this. Want to know AI’s importance in an industry? Don’t send a survey; instead, look at who companies are hiring - actual decisions made with actual money. Lane says we should trace people who have been trained to think about a particular problem - synthetic biology, quantum computing, AI, etc. - to the companies that employ them. This would give us a better idea of which technologies and skills are penetrating which sectors and at what pace. It might also paint a new, more accurate picture of the industries that form the building blocks of the economy.
We once grouped firms into industries based on what they physically made. Later, we added industries based on what services they offered. Lane envisages grouping companies based on the ideas and skills they are employing. Not as a replacement for the old lens, but as a new tool to improve our understanding.
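As a toy illustration of what that grouping could look like (not Lane’s actual methodology), here is a short sketch that classifies firms by the mix of skills they hire. The firm names, fields, and hiring records are all made up for the example.

```python
import pandas as pd

# Hypothetical hiring records: made-up firms and the fields their hires were trained in.
hires = pd.DataFrame({
    "firm":  ["Acme", "Acme", "Acme", "Beta Corp", "Beta Corp", "Gamma LLC"],
    "field": ["AI", "AI", "logistics", "synthetic biology", "AI", "quantum computing"],
})

# Each firm's skill mix: the share of its hires drawn from each field.
counts = hires.groupby(["firm", "field"]).size().unstack(fill_value=0)
skill_mix = counts.div(counts.sum(axis=1), axis=0)

# One crude way to "group by ideas and skills": label each firm by its dominant field.
print(skill_mix)
print(skill_mix.idxmax(axis=1))
```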
Right now, the economy is the wine, but we don’t know what grapes are being used to make it.
A Center For Data And Evidence
This approach would require mapping a variety of data sources - including tax and educational information - in a way that preserved anonymity but allowed for meaningful aggregation. That’s a problem we know how to solve, but existing government statistical agencies are not set up to do it.
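For a feel of what “preserving anonymity while allowing aggregation” can look like in practice, here is a minimal sketch using two standard ingredients: replacing direct identifiers with salted hashes before linkage, and suppressing small cells before anything is published. This is only an illustration of the general technique, not a description of how any agency or Lane’s proposed center would actually do it; the records and the suppression threshold are invented.

```python
import hashlib
import pandas as pd

SALT = "shared-secret"   # assumed: agreed between data custodians, never published

def pseudonymize(person_id: str) -> str:
    """Replace a direct identifier with a salted hash before records are linked."""
    return hashlib.sha256((SALT + person_id).encode()).hexdigest()

# Hypothetical extracts from two sources, already trimmed to the columns needed.
education = pd.DataFrame({
    "pid":   ["p1", "p2", "p3", "p4", "p5", "p6"],
    "field": ["AI", "AI", "AI", "AI", "quantum", "quantum"],
})
tax = pd.DataFrame({
    "pid":    ["p1", "p2", "p3", "p4", "p5", "p6"],
    "sector": ["finance", "finance", "finance", "retail", "retail", "retail"],
})
for df in (education, tax):
    df["pid"] = df["pid"].map(pseudonymize)   # no raw identifiers cross the boundary

# Link on the pseudonym, aggregate, and suppress small cells before publishing.
linked = education.merge(tax, on="pid")
counts = linked.groupby(["sector", "field"]).size().reset_index(name="n")
MIN_CELL = 3
print(counts[counts["n"] >= MIN_CELL])   # only finance/AI (n=3) clears the threshold here
```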
Lane believes the way forward is a new center for data and evidence outside the federal government. She is working with private foundations from both sides of the political spectrum to establish funding and principles. Governance will matter, because any private-sector enterprise is going to be open to accusations of bias. She thinks this can be managed by making standard setting - agreed principles for what trustworthy data should look like - a primary goal. And by keeping the center’s scope focused on producing high-quality data, not using the data to make policy recommendations. She wants it to “democratize” data by setting the center up outside DC and ensuring the data it produces is driven by the demands of businesses and individual states.
Lane isn’t planning (as far as I know) to lead a mob of data scientists to the gates of the Commerce Department. Hopefully her Manifesto triggers a revolution all the same.
Listen to my Ideas Lab interview with Dr. Julia Lane:
Spotify:
Apple Podcasts:
Lucky for me he also had a sense of humor. When he took over the group I was working in, he asked for a presentation on my research. As the only non-PhD in the group, I figured there was a 50/50 chance I’d get “reassigned”, so I gambled. My first slide noted that I had worked there for 2 years and had come up with 5 ideas. I then scribbled a formula saying that I could therefore produce 2.5 ideas per year. He smiled, said 2 good ones per year was above average, and I kept my job. In both of our defenses, one of those ideas - carry trading in FX and fixed income - did eventually lead to this.
Governments have little incentive to update their methods until the flaws become so glaring that even laypeople start to notice—something that isn’t always easy. Researchers, on the other hand, have a stronger incentive to seek accurate data since faulty inputs can undermine their work. However, if corporate sponsorship grants them access to better data, they may prioritize their research over making the data publicly available.
Unlike computer science, finance and economics don’t have a strong tradition of open research. If more economists are trained as software engineers and expectations shift toward making research reproducible with public data, the spirit of open source and grassroots collaboration may create a new field for data collection and sharing.