Is social media dodgy evidence or the future?

 

Social media offers the richest, largest and most dynamic evidence base for human behaviour that has ever existed, and it will transform research for public policy. But policy makers must heed serious questions of ethics and robustness, according to Jamie Bartlett, Director of the recently established Centre for the Analysis of Social Media at Demos.

On Thursday 4 August 2011, Mark Duggan was shot dead by a police officer in Tottenham. By Saturday, social media channels showed growing hostility towards the police, including explicit threats. Over the next few days, social media content indicating criminal intent or action circulated in huge volumes across open and closed networking sites, alongside messages from bystanders trying to provide information to the police.

Over at Gold Command, the police were overwhelmed. Millions of pieces of data, confused, contradictory and of unknown provenance, were arriving at speed. But contained within this mess was actionable, usable information they badly needed. Relying on their traditional forms of intelligence collection and analysis, the police had no systematic way of sorting the wheat from the chaff, according to a study of the police response to the riots by Her Majesty’s Inspectorate of Constabulary.

This is a dilemma that all public servants are facing, albeit in a less urgent fashion (which is perhaps why the police are leading the way). There is a lot of valuable information on social media that could help make better policy, but finding the right information, and doing so in a robust, ethical way, is harder than it seems.

Social media research, I think, will transform public policy research, whether conducted by political parties, local government, civil servants, think-tanks or academia. It offers the largest, richest and most dynamic evidence base for human behaviour that has ever existed, and it is growing quickly. Arguments, reactions, attitudes, intentions – often in real time.

There is an accompanying proliferation of free or cheap software that allows access to, and analysis of, these data too. The potential applications are dramatic. In policing, social media is already a valuable new source of insight, something I’ve termed ‘SOCMINT’ (social media intelligence). For example, Greater Manchester Police have developed a social media application to share information – including a newsfeed, missing persons and police appeals – with the public: a neat piece of crowdsourcing. Public health experts now routinely scan tweets and search requests to identify pandemics earlier than traditional methods. Walsall Council runs over 80 social media accounts – and receives plenty of information about constituents’ views through that route.

The Prime Minister is even trialling an app to get real-time data on key issues. Where are the shortages in childcare provision? Rather than wait 18 months for a review, track online conversations in relevant forums to identify areas of possible shortage, and respond.

But the potential is still largely unfulfilled. Advertising and marketing firms now spend a fortune analysing online product buzz, but standards of evidence are higher for public policy research, and rightly so. Over many years we have developed good tests and practices for how we gather, analyse, corroborate and evaluate research, to ensure it is ethical, reliable and useful. At the moment, these standards aren’t always met in social media research.

I advise those engaged in public policy research to look carefully at whether social media research might be able to improve decision making. Before proceeding, however, they should set themselves three important tests, irrespective of the tools used and the questions asked:

Ethics: Responsible researchers follow strict research ethics guidelines. At the heart of those ethics lie four principles: harm, deception, informed consent, and invasion of privacy. The first question any researcher should ask themselves is: Have I adapted these principles to social media research techniques, according to the technology used and the platforms it is used on? The fact that something is online does not make it fair game, especially when using an automated data extraction technique, which might ignore privacy settings.
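To make that concrete, here is a minimal, hypothetical sketch of an ethics pre-filter applied to already-collected posts before any analysis. The field names (is_protected, deleted) are assumptions for illustration, not any platform’s actual schema, and a real project would work from a full ethics protocol rather than two checks.

# A minimal, hypothetical ethics pre-filter: drop material the author has
# protected or withdrawn before it enters the research corpus. Field names
# are illustrative assumptions, not a real platform's schema.

def ethically_usable(post: dict) -> bool:
    """Return True only if a post passes basic ethics checks."""
    if post.get("is_protected"):   # respect privacy settings, even if a scraper could bypass them
        return False
    if post.get("deleted"):        # honour the author's decision to withdraw the post
        return False
    return True

def filter_corpus(posts: list) -> list:
    """Keep only posts that clear the checks, reporting how many were dropped."""
    usable = [p for p in posts if ethically_usable(p)]
    print(f"Dropped {len(posts) - len(usable)} of {len(posts)} posts on ethics grounds")
    return usable

example = [
    {"id": 1, "text": "a public comment", "is_protected": False, "deleted": False},
    {"id": 2, "text": "from a protected account", "is_protected": True, "deleted": False},
]
corpus = filter_corpus(example)   # keeps only post 1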

Sample: Extrapolations depend on the quality – especially the representativeness – of any research sample collected. The social sciences have not yet developed an approach to robustly sample social media data sets. At present, social media research is obsessed with size: collecting enormous samples (something computational approaches are good at delivering) rather than representative ones. Typical data acquisition strategies remain ‘samples of convenience’ or ‘incidental sampling’, which means the most readily available or easily accessible data – rather than the most representative – are collected. Again, the technology is important: the public Twitter API, for example, limits users to 150 requests per hour – a lot by social science standards, but such technological limits on sampling can result in enormous flaws. The second question to ask is: What are the technological and search biases in my sample, and how might they affect how representative it is? Size is not everything.
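As a purely illustrative sketch, the snippet below contrasts a ‘sample of convenience’ (taking whatever the collection tool returned first) with a simple random sample of an already-collected corpus, and shows how a rate cap such as the 150-requests-per-hour figure above bounds what can be gathered at all. The posts-per-request figure and the stand-in corpus are invented for illustration.

import random

REQUESTS_PER_HOUR = 150    # the illustrative cap mentioned above
POSTS_PER_REQUEST = 100    # assumption, purely for the arithmetic

def convenience_sample(corpus, n):
    """Whatever arrived first: easy to collect, but biased towards whoever
    posts most, earliest, or is easiest for the tool to reach."""
    return corpus[:n]

def random_sample(corpus, n, seed=42):
    """A simple random sample of the collected corpus. This only removes bias
    within what was collected, not bias in who uses the platform at all."""
    rng = random.Random(seed)
    return rng.sample(corpus, min(n, len(corpus)))

# The rate limit puts a hard ceiling on collection regardless of method.
print(f"Maximum posts per day: {REQUESTS_PER_HOUR * POSTS_PER_REQUEST * 24:,}")

corpus = [{"id": i, "text": f"post {i}"} for i in range(1000)]   # stand-in corpus
biased = convenience_sample(corpus, 100)
better = random_sample(corpus, 100)

Note that even a perfect random draw from a collected corpus says nothing about whether the platform’s users resemble the population a policy serves; that bias has to be addressed separately.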

Analysis: At some point researchers want to know ‘why’ as well as ‘what’. The intent, motivation, social signification, denotation and connotation of any utterance depend on the context of situation and culture. So the accuracy of any interpretation depends on a very detailed understanding of the group or context being studied. However, because automatic data collection is required to process the sheer volume of data now available, many of the contextual cues – the thread of a conversation, information about the speaker and the tone of the utterance – are often missing from analyses of social media data. The act of ‘scraping’ a social media platform – such as collecting tweets or Facebook posts – by definition de-anchors a text from its natural setting. Some studies, for example, argue for an ‘online disinhibition effect’: that the invisible and anonymous qualities of online interaction lead to disinhibited, more intense, self-disclosing and aggressive uses of language. So this final question is: What are the context and the behavioural and linguistic norms in which your data are set, and how might they affect your interpretation?
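One practical response, sketched below under assumed field names, is to store each scraped text together with whatever contextual cues survive collection – the reply chain, the author and the timestamp – so the analyst can at least see when an interpretation rests on missing context. The structure is hypothetical, not a description of any platform’s data.

from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class PostInContext:
    """A scraped text kept together with the contextual cues that survive collection.
    All field names are assumptions for illustration."""
    text: str
    author_id: str
    posted_at: datetime
    in_reply_to: Optional[str] = None                 # link back into the thread
    thread: List[str] = field(default_factory=list)   # preceding posts, oldest first

    def has_context(self) -> bool:
        """True if enough surrounding material survives to read tone and intent."""
        return bool(self.thread) or self.in_reply_to is not None

post = PostInContext(
    text="oh, brilliant.",                       # sarcasm is invisible without context
    author_id="user_123",
    posted_at=datetime(2011, 8, 6, 21, 15),
    in_reply_to="post_456",
    thread=["Trains cancelled again tonight."],
)
print(post.has_context())   # True: the reply chain lets an analyst judge the tone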

As ever, there is much more besides – but this is a handy start for anyone looking into social media research. No one yet has the answers to these questions, so do not be afraid of experimenting – but proceed with caution, be circumspect in drawing conclusions, and always apply the same standard of rigour as you would elsewhere. The potential benefits are huge.

 

Jamie Bartlett 
Jamie is the Director of the Centre for the Analysis of Social Media at the think-tank Demos. The Centre is made up of computer and social scientists working on a range of automated data extraction and machine learning tools that incorporate social science statistics, analysis and ethics for public policy research.

(The views are the author’s own and do not necessarily represent those of the Alliance for Useful Evidence)