The U.S. Census Bureau is hoping that most people who live in the U.S. will use the internet to answer census questions, rather than filling out a paper form or providing those answers to a census taker in person, at their home.
That would be cheaper – a plus for a budget-strapped Census Bureau – and could help ensure maximum turnout and accuracy of the count. For instance, databases could keep track of which homes have not yet responded to the survey, allowing census officials to target mailings and in-person visits to those locations, without needing to spend time chasing households that have already responded.
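To make the follow-up idea concrete, here is a minimal sketch of the bookkeeping involved. It assumes each address has a unique identifier. The function name and the address IDs are hypothetical and do not reflect the Census Bureau's actual systems.

```python
# Hypothetical sketch: find addresses that still need a mailing or a visit.
# The IDs below are invented for illustration only.

def households_to_follow_up(all_addresses, responded):
    """Return addresses with no recorded response, in sorted order."""
    return sorted(set(all_addresses) - set(responded))


all_addresses = ["A-001", "A-002", "A-003", "A-004"]
responded = ["A-002", "A-004"]

pending = households_to_follow_up(all_addresses, responded)
print(pending)
```

Only the addresses left in the pending list would need a reminder mailing or an in-person visit.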
However, as some of my own work on digital platforms and electronic commerce shows, collecting data online carries some significant risks that are new to the census and may undermine the accuracy of the count and the public’s trust in the process.
If everyone responds digitally, the online census system will have to handle nearly 130 million responses – one for each household in the country. Many of those households may be using computers or smartphones that have been hacked or have malicious software installed.
One potential problem this raises is that people trying to respond to the census may find themselves instead submitting their information to some other group, one that seeks to illegitimately harvest their personal data for profit.
Another possibility is that a person might be submitting their information to the actual census website, but the software running secretly on their computer could modify the data before it’s recorded. That could result in inaccurate reporting – making it seem like more people live in a home than actually do, or fewer.
Bridging the digital divide
Of course, not everyone will complete their census survey online. In addition to people who don’t have computers and smartphones, many homes aren’t connected to the internet. Even in New York City, seemingly the pinnacle of an interconnected urban area, about 29% of households don’t have high-speed internet access.
To reach those people, and those in more suburban and rural areas who also don’t have internet access, the Census Bureau will need to rely on phone and mail responses, along with the traditional method of visits by door-to-door census takers.
The data collection effort underway for the 2020 U.S. Census may end up disadvantaging households without broadband internet access. Groups that are more likely to use the internet on their mobile phones – as opposed to a computer – may find it too hard to use their phones to respond to the online questionnaire. That could end up disproportionately reducing the response from African Americans, Latinos, younger adults, low-income earners and people without a high school diploma.
The U.S. Census Bureau is aware of those concerns and is working to identify communities where a lower online response is likely. The agency says it will send paper questionnaires and even human census takers to households in those areas at particular risk.
There is an opportunity for civic technology and citizen data science to help address people’s difficulties using online surveys, too. For instance, the Hard to Count map tracks households with poor internet access, and neighborhoods that are home to racial or ethnic minorities and people with lower income or education levels. Nonprofit organizations and community groups are using the map to target efforts to encourage people to participate in the census.
Since the early days of the census, privacy has been a concern. In the 1850 census, the U.S. marshals assigned to collect data were instructed to consider all the responses to be confidential. By 1880, census workers – now trained survey-takers rather than law enforcement workers – were subject to fines for violating their oaths of secrecy.
Over the decades, the Census Bureau has updated its standards to keep up with changes in technology and societal expectations about privacy protection. The most recent set of concerns involves the potential for people to use computers to match up census data with other data available publicly online. The U.S. Census Bureau’s researchers found they could combine the 2010 census results with the contents of commercial databases and determine the real identities of 52 million Americans. Such matching could reveal private information and would violate the Census Bureau’s obligation to protect respondents’ identities.
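The general idea of matching records across data sets can be sketched in a few lines. This is a simplified illustration, not the method the Bureau's researchers used. The field names (block, age, sex), the function name and every record below are invented for the example.

```python
# Hypothetical sketch of record linkage on shared attributes.
# All records are made up; "answer" stands in for a private census response.

def link_records(census_rows, commercial_rows, keys=("block", "age", "sex")):
    """Pair each census row with a name when exactly one commercial row
    shares the same values for the given keys."""
    index = {}
    for row in commercial_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = []
    for row in census_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # an unambiguous match re-identifies the row
            matches.append((row, names[0]))
    return matches


census_rows = [
    {"block": "B1", "age": 34, "sex": "F", "answer": "r1"},
    {"block": "B1", "age": 61, "sex": "M", "answer": "r2"},
    {"block": "B2", "age": 29, "sex": "M", "answer": "r3"},
]
commercial_rows = [
    {"name": "Person A", "block": "B1", "age": 34, "sex": "F"},
    {"name": "Person B", "block": "B2", "age": 29, "sex": "M"},
    {"name": "Person C", "block": "B2", "age": 29, "sex": "M"},
]

linked = link_records(census_rows, commercial_rows)
for row, name in linked:
    print(name, "->", row["answer"])
```

In this toy example, a record is exposed only when its combination of attributes is unique in the outside data set, which is why small geographic areas with few residents carry the greatest risk.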
In an attempt to prevent that from happening with the results of the 2020 census, the Census Bureau has adopted a statistical method called “differential privacy” in hopes of obscuring sensitive personal information. The mathematics underlying the technique is complicated, but in general the idea is that state-level counts will be accurate, but more detailed measurements – of populations of counties, towns and neighborhoods – will be altered to avoid revealing specific data that could be used to identify actual people.
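For readers curious about how a count can be "altered," here is a minimal sketch of the Laplace mechanism, a textbook way of achieving differential privacy for a simple count. It is an illustration only. The Census Bureau's production system is far more elaborate, and the function name, the epsilon value and the example count below are assumptions made for this sketch.

```python
import random

# Textbook Laplace mechanism for a counting query. Illustrative only:
# this is not the Census Bureau's actual algorithm or parameter choice.

def noisy_count(true_count, epsilon, rng):
    """Return true_count plus Laplace noise with scale 1/epsilon.

    Adding or removing one person changes a count by at most 1, so
    noise of this scale gives epsilon-differential privacy for the count.
    """
    scale = 1.0 / epsilon
    # The difference of two independent exponential draws with mean
    # `scale` follows a Laplace distribution centered on zero.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(2020)  # fixed seed so the sketch is repeatable
neighborhood_release = noisy_count(120, epsilon=0.5, rng=rng)
print(round(neighborhood_release))
```

A smaller epsilon means more noise and stronger privacy. The same amount of noise matters far less for a count in the millions than for a neighborhood of a few hundred people, which is part of why small-area figures are the ones most affected.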
However, researchers have voiced concerns that the data may not accurately represent the nation’s population, and that more specific details about the numbers of residents of states and towns may be misleading. Critics fear the effort to protect Americans’ privacy may end up complicating planning that factors in population numbers, like disaster preparedness efforts.