Data is a good thing, but these days there is too much of it. The human brain is intelligent not because of the sheer volume of data it can ingest but because of the way it quickly discerns patterns and then extrapolates. Sound educated guesses are often not the product of tonnes of data; they spring from gut feeling, which comprises all of your past experiences, knowledge and know-how, coupled with the most pertinent information and analysis.
Data never tells the full story
More data has been created in the past two years than in the previous 5,000 years of human history. Organised and analysed correctly, this information has the power to create a better future for all. Despite such promise, nearly all data is defined, interpreted and manipulated by humans, who frequently make value judgements about what to include. Understanding what has been excluded from the data is therefore just as important as the data itself.
Here are some of the most common types of data bias you should guard against:
Response or Activity bias
This occurs in user-generated content: reviews on Amazon, tweets on Twitter, Facebook posts, Wikipedia entries and so on. Only a small proportion of people are contributors, and their opinions and preferences are unlikely to reflect those of the general population. For example:
• 7% of users produce 50% of the posts on Facebook.
• 4% of users produce 50% of the reviews on Amazon.
• 0.04% of Wikipedia’s registered editors (about 2000 people) produced the first version of half the entries of English Wikipedia.
To cite a local example, the former Head of Operations of a restaurant chain, who wished to remain anonymous, highlighted that probably only one or two out of 10 disgruntled patrons would complain via the feedback form, social media or email. The rest would simply decide not to return. She explained, “When it comes to feedback from satisfied customers, I estimate it’s about one in 50. The vast majority would feel that they had paid for the pleasant experience or are too busy to send in their compliments.
“While it’s important to resolve every complaint and ensure that it does not occur again, management would need to understand this ratio when comparing the number of complaints versus accolades,” she added.
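The ratio she describes can be made concrete with a back-of-the-envelope calculation. The sketch below uses the anecdote's rough figures (one or two in 10 unhappy patrons complain, one in 50 happy patrons write in); the monthly feedback counts are invented for illustration:

```python
# Adjusting raw feedback counts for response bias.
# Response rates are the rough figures from the anecdote; the raw
# monthly counts of 30 complaints and 30 compliments are hypothetical.

def estimate_true_counts(complaints, compliments,
                         complaint_rate=0.15, compliment_rate=0.02):
    """Scale observed feedback up by the assumed response rates."""
    est_unhappy = complaints / complaint_rate
    est_happy = compliments / compliment_rate
    return est_unhappy, est_happy

# A month with 30 complaints and 30 compliments looks evenly split...
unhappy, happy = estimate_true_counts(30, 30)
print(f"Estimated unhappy patrons: {unhappy:.0f}")   # prints 200
print(f"Estimated happy patrons:   {happy:.0f}")     # prints 1500
```

Equal raw counts hide a very unequal underlying picture: once response rates are factored in, the same feedback implies far more satisfied patrons than dissatisfied ones.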
Omitted data bias
In a study that used data to predict the probability of death for pneumonia patients, so that high-risk patients could be identified, the results showed that pneumonia patients who were asthmatic had a lower risk of dying from pneumonia than patients who were not. Realising that this did not seem logical, the researchers checked and discovered that pneumonia patients with a history of asthma were usually admitted directly to the Intensive Care Unit (ICU). The aggressive care these patients received was so effective that it lowered their risk of dying from pneumonia below that of the non-asthmatic patients admitted to normal wards. However, the fact that they received aggressive care in the ICU was not recorded in the data. Herein lies the danger: what is not measured is not considered.
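The pneumonia example can be reproduced with a toy simulation. All probabilities and patient counts below are invented for illustration: asthmatic patients are assumed to carry a higher underlying risk, but the unrecorded ICU care lowers their observed death rate below that of ward patients, so asthma looks protective in the recorded data:

```python
# A minimal simulation of the omitted-variable problem described above.
# All probabilities and counts are invented for illustration.
import random

random.seed(0)

def simulate_patient(asthmatic):
    # Asthmatic pneumonia patients are routed straight to the ICU;
    # the aggressive care there cuts their (higher) underlying risk.
    # That routing is NOT recorded in the dataset we analyse below.
    death_prob = 0.05 if asthmatic else 0.10
    return random.random() < death_prob

# Recorded data: only asthma status and outcome, never the ICU care.
patients = [(a, simulate_patient(a)) for a in [True] * 5000 + [False] * 5000]

def death_rate(group):
    deaths = sum(died for asth, died in patients if asth == group)
    total = sum(1 for asth, _ in patients if asth == group)
    return deaths / total

print(f"Asthmatic death rate:     {death_rate(True):.3f}")
print(f"Non-asthmatic death rate: {death_rate(False):.3f}")
# Because the ICU treatment is missing from the data, asthma appears
# to lower the risk of death, the same illogical pattern the
# researchers spotted.
```

A model trained on this dataset would learn that asthma is protective, precisely because the variable that explains the pattern was never measured.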
In addition to missing information, there is the question of whether people are completely truthful in surveys. Among the reasons pollsters failed to predict Donald Trump’s victory in the 2016 US presidential election were the omission of the opinions of hard-to-reach rural groups and doubts about the truthfulness of respondents when polled by complete strangers.
Fascination with the latest data can skew our thinking
Understandably, the accelerated flow of new information is increasing our reliance on the “availability heuristic”, the tendency to make decisions based on the most recent information available. This too is risky. For example, many governments and economists failed to foresee the 2007-08 financial crisis because the prevailing mathematical models and the most recent data did not suggest a downturn. Had they focused less on the latest data and more on the cyclical nature of financial markets, the picture would have been clearer.
To further complicate matters, history occasionally throws a curve ball. Professor Nassim Taleb addresses such uncertainty in his book, The Black Swan. The title refers to an event outside the realm of expectation that, despite its low probability of occurrence, fundamentally changes the course of history. He points out that in a random world full of uncertainty, variance and coincidence, analysing data is just one piece of the puzzle.
Limitations of data
Data measures only the measurable, not the things that cannot be measured, such as the irrational realities of being human. As the sociologist William Bruce Cameron aptly stated, “Not everything that counts can be counted, and not everything that can be counted counts.”
Often, qualitative data cannot be measured precisely. The question, “Do you like to participate in surveys?” will elicit answers such as “Yes”, “Sometimes”, “Depends” or a flat “No”. Asking respondents to elaborate on “Sometimes” or “Depends” will generate imprecise responses that rest on individual interpretation.
On the other hand, quantitative data can be gathered without intruding into people’s private space: the weight of newborn babies, the number of hospital visits per capita, the occupancy of hospital ICUs and the number of new COVID-19 cases, to list a few.
Making the most of data
As the world enters the Fourth Industrial Revolution, with its proliferation of information and the exponential rise of Artificial Intelligence, data will become increasingly relevant. There is an opportunity to design a more inclusive society, as long as we manage data correctly.