What's the big fuss about 'big data'?

Editorial

A big data revolution is under way in health care.
McKinsey and Company.1

The ability to visualize, manipulate, and mine Big Data provides opportunities to enhance our understanding of disease onset and progression, identify new therapeutic avenues, and speed the translation of new discoveries into improved health and health care.
The White House, Office of Science and Technology Policy.2

Journal of Health Services Research & Policy 2014, Vol. 19(2) 67–68 © The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/1355819614521181 jhsrp.rsmjournals.com

‘Big data’ is the latest buzz phrase in health services research and health care policy circles. Dramatic transformations and advances are promised, but is the hyperbole justified?

Big data refers to large and complex digital datasets that typically require non-standard computational facilities for storage, management and analysis. These data are really big – they are immense, being measured in petabytes and exabytes (1 quintillion bytes) of digital information. Big data includes the administrative and transactional data created every day from individual interactions with the state and commerce, culled from such diverse sources as mobile phone records and supermarket loyalty cards. It also includes unstructured data harvested by data mining and Web scraping of online activity including Facebook and Twitter, text, video and audio material, and geolocation and tracking systems. These are augmented by vast scientific data such as reports from the sensors inside the Large Hadron Collider and the sequencing of the three billion base pairs that make up the human genome. Alongside these data, national population censuses start to look distinctly small scale.

Health care systems are also ‘relentless producers of big data’.3 They routinely collect masses of information about patient journeys through the health care system as well as staffing, prescribing and outcome data. Telemedicine delivers data from sensors and monitoring in clinics and homes. Health services research simply adds to this – a single trial, the Whole System Demonstrator,4 extracted a billion administrative data records from over 250 health care organizations. Clearly, there has been a step change in the scale and volume of data potentially available for research and policy. Big data is an enormous data archive and, thanks to the Web, offers the possibility of capturing dynamic, near real-time data about health care (for example, Google.org’s well-publicised analysis5 of flu activity).

But can big data really deliver on its promises for health services research and policy? We suggest three reasons why some caution is needed.

First, big data is not necessarily accessible data. Big data is often elided with ‘open data’ but not all big data are open: in the UK the supermarket company, Tesco, guards its 16 million loyalty card records because they give it a competitive advantage. Open data are made freely available to use and republish by removing licensing and copyright restrictions. Several governments and research organizations are committed to making data open. But even open data are not necessarily usable. Despite pressure to deliver standardized formats, to make data machine-readable and linkable, many open data are little more than scanned documents or spreadsheets. The widely vaunted opening up of public spending information in the UK means that the assiduous researcher can track down health system accounts, but these are only a PDF of an official report.6 Even apparently useful data may be limited in health care, where issues of patient confidentiality mean that data may not be provided at the level of detail required. For example, prescription data in England provide four million rows of, on the face of it, new and interesting data. These can indeed give new insights: clever analytics was recently used to argue that the use of two proprietary statins was costing the NHS an unnecessary £27 million every month.7 But health services researchers need more detail to make sense of these data, such as who is prescribing, for which diagnoses and at which dose.

The second reason for caution is that classification matters. Big claims are made about how big and open data will transform decision making. In England, the organization responsible for information on health care has stated:

The benefits of a richer hospital data set are legion. As citizens we will be able to compare the quality of care provided by different hospitals, different hospital teams and wards and by individual clinicians.

Downloaded from hsr.sagepub.com by guest on January 8, 2016

Commissioners will become better informed . . . researchers will be able to analyse patterns and trends across the country and develop more sophisticated analytical and predictive tools . . . Overall we envision a virtuous cycle where richer datasets and greater transparency will lead to greater participation and better care for all.8

But we must remember that all data – big or small – are socially constructed.9 To borrow from Lisa Gitelman,10 ‘Raw Data’ is an oxymoron. We need to know where data come from, and the methods used to collect, analyse and categorize them. Beyond this, we need theoretically informed questions that steer us away from data dredging and techniques that conflate coincidence with causation, and we need to infuse data with meaning. Perhaps we need wide data,11 that is, analyses that harness a combination of qualitative and quantitative data and methods to understand aggregate data. Such approaches might curb the popularity of sentiment analyses (a form of content analysis typically directed to determining emotional status from the words used in tweets) and context-free generalizations about the emotional state of nation states.12

Finally, there are concerns about linkage of data. Open and big data offer the possibility of linked or semantic data. While the Web is currently a web of documents (web pages), some commentators envisage the development of a ‘web of data’.13 Health services researchers have traditionally linked on patient identifiers, but this new semantic web offers the possibility to link heterogeneous data, which might include people, places, products or concepts. Some of the methods and technology to support this evolution of the Web are still aspirational, not least because of the problems in standardizing data formats described above. Current linked data exemplars tend – as in the example of statins cited earlier – to use geographical mapping, but qualitative and quantitative datasets might in future be integrated to offer new opportunities to explore health care and services.

Digital and online resources offer big data at a scale unimaginable to the pioneers of health services research. Open linked data could inform decisions and policy. But there are significant challenges ahead. Researchers need to be critically engaged to resist the hype. Researchers need to draw on their expertise to think about the methods, classifications and context within which data are produced, and may need to develop new technical skills to engage actively with these data. Above all, everyone needs to remember that data alone tell us nothing. Large or small data will not deliver interpretation, conceptual understanding or meaning: that is the researcher’s job.

References
1. Kayyali B, Knott D and Van Kuiken S. The big data revolution in US healthcare, http://www.mckinsey.com/insights/health_systems_and_services/the_big-data_revolution_in_us_health_care (2013, accessed 26 November 2013).
2. Kalil T and Green E. Big data is a big deal for biomedical research, http://www.whitehouse.gov/blog/2013/04/23/big-data-big-deal-biomedical-research (2013, accessed 26 November 2013).
3. Downing D. Big data and the NHS: can analytics tame the Leviathan? Guardian Professional, http://www.theguardian.com/healthcare-network/2013/apr/25/big-datanhs-analytics (2013, accessed 26 November 2013).
4. Bower P, Cartwright M, Hirani SP, et al. A comprehensive evaluation of the impact of telemonitoring in patients with long-term conditions and social care needs: protocol for the whole systems demonstrator cluster randomised trial. BMC Health Serv Res 2011; 11: 184.
5. Google.org flu trends. How does this work? http://www.google.org/flutrends/about/how.html (accessed 26 November 2013).
6. National Audit Office. NHS (England) summarised accounts 2006–2007, http://www.nao.org.uk/report/nhsengland-summarised-accounts-2006-07/ (2007, accessed 26 November 2013).
7. Prescribing Analytics. NHS efficiency savings: the role of prescribing analytics, http://www.prescribinganalytics.com/ (accessed 26 November 2013).
8. NHS England and Health and Social Care Information Centre. NHS hospital data and datasets: a consultation. Leeds: NHS England, 2013.
9. Bowker GC and Star SL. Sorting things out: classification and its consequences. Cambridge, MA: MIT Press, 1999.
10. Gitelman L (ed.) ‘Raw Data’ is an oxymoron. Cambridge, MA: MIT Press, 2013.
11. Tinati R, Halford S, Carr L, et al. Big data: methodological challenges and approaches for sociological analysis. Sociology.
12. Mitchell L, Frank MR, Harris KD, et al. The geography of happiness: connecting twitter sentiment and expression, demographics, and objective characteristics of place. PLoS ONE 2013; 8: e64417.
13. Berners-Lee T. The next Web of open, linked data, http://www.ted.com/talks/lang/eng/tim_berners_lee_on_the_next_Web.html (2009, accessed 15 February 2011).

Catherine Pope1, Susan Halford2, Ramine Tinati3 and Mark Weal4
1 Faculty of Health Sciences
2 Faculty of Social and Human Sciences
3,4 Faculty of Physical Sciences and Engineering
University of Southampton, Southampton, UK
Email: [email protected]
