Ethics Case Study: Social Machines

What are social machines, how do they differ from social media, and what new sociological phenomena do they represent?

Networked digital technologies and devices are now ubiquitous in many societies, providing new channels through which individuals and communities can connect, share information, co-create solutions, distribute tasks, support one another, play and socialise. While online groups and social media are now familiar concepts, and have been the subject of much sociological research, an arguably new phenomenon has emerged which bears closer scrutiny as part of the broader Digital Society research agenda. This has been characterised as the Social Machine. The scope and boundaries of this concept are still being defined and taxonomies for describing and differentiating social machines are evolving. In essence, however, the term ‘social machines’ represents a set of unique socio-technical systems whose existence and functionality depend on a synergistic blend of human and computational ‘engineering’.

Social machines are conceptually related to, but qualitatively different from, social media, information and communication channels or platforms, and the social web, a broader term describing web-mediated social interactions. The concept is closely associated with Collective Intelligence, Distributed Computing and Crowdsourcing, which rely on the effort and cognition of large numbers of individuals, mediated by digital systems, to generate information or solve problems that neither computers nor people could manage alone. Inevitably, the term has also become associated with the Big Data movement, particularly in relation to the mining of large corpora of social media and open data.

Social machines appear when other ingredients of sociality are added. For example, the EyeWire project, which involves massive numbers of distributed ‘citizen scientists’ examining digital images of brain tissue to trace and map neurons, has a sociality layer in the form of an entertaining, competitive gaming format and a community support forum. Likewise, the crowdsourcing platform Ushahidi builds new knowledge (annotated maps) from objective (location) and socially derived or curated data (e.g. outbreaks of violence or disease) and, like other ICT-for-Governance innovations, was designed to leverage societal power as a catalyst for change. Another example is the reCAPTCHA system, which crowdsources human judgement by asking service users to type the letters they see in distorted image files in order to determine whether they are humans or bots. These behavioural data, in turn, feed a machine learning algorithm that incrementally improves the quality of automated text conversion software for digitising books (most users are unaware of this).

Ethical Issues Presented by Social Machines

Social machines pose a number of ethical and societal challenges. In his original vision for social machines in Weaving the Web, Tim Berners-Lee argued that social machines on the web would release “people [to] do the creative work and machines [to] do the administration”. While this has happened in some cases, in others the reverse is true: many intentional crowdsourcing applications involve humans doing the dull, repetitive tasks while the machines do the creative work, raising issues of trust and equity. Unintentional crowdsourcing takes this a step further, as with facial recognition bots integrated into social software or online professional collaboration tools, where users become both the data and the first-line data processors (through their choices), feeding predictive algorithms which may then curtail their options in the interests of greater ‘precision’ and ‘efficiency’.

In the following section, we look at one cluster of social machines that are themselves used to study social machines: the Web Observatory, as developed and researched within the UK EPSRC project SOCIAM (The Theory and Practice of Social Machines).

Example: The Web Observatory

The Global Web Observatory is a research tool for harvesting, organising, archiving and distributing data about the web through linked, geographically distributed and autonomously managed nodes. The primary role of the nodes is to manage catalogues of resources describing data (metadata) and software apps that enable these data to be analysed and visualised, both retrospectively and in real time. The catalogues may describe open data, research datasets, or corpora of social media data available freely or for a charge. Individual nodes often contain their own research datasets, although typically they act as intermediaries between the originating organisation and researchers wishing to undertake web analytics. Individual nodes contribute their catalogues, datasets and apps to the master catalogue maintained by the Global Web Observatory, which mediates research involving each of the nodes. Such heterogeneous, distributed (‘broad’) data is a sine qua non of social machines research, yet its collection and aggregation can be ethically challenging.

The Web Observatory passively monitors open streams of web data rather than seeking to modify those data or influence the web. Although it is not interventionist in the way that some other social machines are, it still raises important questions about the responsibilities and ethical obligations of observers and data holders. Today, Web Observatories operate under the tacit assumption that all data sources have been ethically pre-screened by the organisations releasing them; whether this is tenable in the long term, at scale, and in light of new Data Protection regulations is an open question.

In its current state of development, the Web Observatory has a light-touch ethical regime premised on good-faith participation, but as it matures the infrastructure is likely to incorporate techniques or formalisms to negotiate and verify the ethical commitments of participating data controllers. Following the lead of administrative and medical data linkage initiatives, a proportionate, principles-based approach is likely to be most successful. The standards expected for participation in the Global Web Observatory also deserve extension from data and systems interoperability to interoperable ethics and governance, and work in this area is ongoing.

The Web Observatory, as a global resource, is a work in progress and will need to respond quickly to such issues as they arise. Furthermore, because it is a decentralised network of autonomous nodes whose governance is distributed institutionally and geographically, jurisdictions and cultural assumptions will vary across nodes. Attempting to centralise the ethical discourse surrounding a global distributed network such as this may itself prove ethically problematic, but responsible leadership and shared high-level ethical principles, supported by a system of distributed and collaborative governance (ironically, one of the key benefits of social machines), will help to manage these challenges in a changing environment.

Acknowledgments

This work is supported by SOCIAM: The Theory and Practice of Social Machines. The SOCIAM Project is funded by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant number EP/J017728/2 and comprises the Universities of Oxford, Southampton, and Edinburgh.


Levelling the socio-economic playing field with the Internet? A case study in how (not) to help disadvantaged young people thrive online. 

Numerous academic studies highlight the significant differences in the ways that young people access, use and engage with the Internet, and the implications this has for their lives (boyd, 2014; Livingstone & Helsper, 2007). In contrast to the rhetoric around ‘digital youth’ that suggests young people are always connected, and despite the move towards richer and more complex models of digital inclusion that include skills and level of engagement (Hargittai, 2010; Hargittai & Shaw, 2014), quality of access (as measured by choice of device and network, and degree of personalisation) remains a crucial problem. While the majority of young people have some form of access to the Internet, for some this access is sporadic, dependent on credit on their phones or access to a library or another public setting. Rich qualitative data from a variety of countries have shown how such limited forms of access can create difficulties for some of these young people as access to the Internet becomes essential for socialising, accessing public services, saving money, and learning at school (Robinson, 2009).

This presentation will report on a two-year initiative in one area of the UK where it was estimated that around 10% of young people aged 14 did not have access to an Internet connection and a laptop or PC at home. In response, the local council, three state schools, and an ISP collaborated to provide thirty of these disconnected young people (and their families) with a free laptop and free access to the Internet for two years, with the aim of raising educational attainment and improving employment prospects for these individuals by school-leaving age.

We will chart the highs and lows of the initiative, which had its fair share of ‘successes’. In a few cases, access to the Internet has helped the young people and their families reconnect with relatives abroad, save money on phone calls and consumables, and access vital health services. Some parents have told us about their children using the Internet to extend their learning and look for college places or apprenticeships. And the scheme, by providing an alternative space to ‘hang out’ within cramped living conditions, has helped harmonise familial relations.

But the project has also had its ‘failures’, where significant amounts of goodwill and effort led to limited meaningful change for a number of the families involved. Through in-depth analysis of observations from 15 school visits, and 40 interviews with students, parents, teachers and other stakeholders, this presentation will highlight the basic tension in this and other initiatives, which summon, in varied ways, a tacitly accepted ideological agenda that cannot straightforwardly translate into benefits for the young people and families involved. In sum, we ask:

In what ways do political and economic forces influence the ‘success’ of a digital inclusion scheme?

We show that the hope of such schemes, that if sufficiently empowered, incentivised and aspirational the disadvantaged can use access to technology to transform or transcend what Bourdieu (1992) calls their “class of conditions” (p. 53), is largely misplaced. In microcosm, the initiative demonstrates how a neoliberal mindset, which is increasingly shaping the cultures and behaviours of our service providers and schools, cannot solve the problems it creates. While in this initiative we have seen significant amounts of goodwill from all parties, both private and public, this often does not convert into meaningful change for the young people and families involved. As Foucault (2010) notes, neoliberalism’s project is “the overall exercise of political power modelled on the principles of a market economy” (p. 131). Moreover, “the only ‘true’ aims of social policy for neoliberalism can be economic growth and privatisation; thus the multiplication of the ‘enterprise’ form within the social body” (p. 148). In the home access initiative this was apparent, as the mentality (both in schools and at the private ISP) to prioritise work that is documented, measured and audited led to practices and support gaps that unintentionally disadvantaged the young people the initiative was designed to support. This was particularly the case when unanticipated contingencies and problems occurred.

Drawing on this rich case study data from the 30 families to critically assess the cultures and practices of the institutions that govern their lives, we will demonstrate how the challenges and realities of this initiative can be generalised to other well-intended schemes to address digital and social inequality, and highlight the complexity of ‘levelling the playing field’. We aim to show that even in wealthy post-industrial economies the global networked society is not “simply a fact – that is, as something that is just given and therefore inevitable: it is a choice, a choice made by some and working in the interest of some” (Biesta, 2013, p. 734). The home access scheme exposes the tacit logic of the power structures that shape this choice. We explore the different “mentalities of government” (Dean, 1999, p. 16) that produce the institutional regimes, knowledges, practices and procedures that are “structured, internalised and normalised to exercise power over and through certain sectors of society” (Wyn & White, 1997, p. 133), which, in this case, meant some of the families inadvertently became points of power’s application when they believed they were being helped.

References

Biesta, G. (2013). Responsive or responsible? Democratic education for the global networked society. Policy Futures in Education, 11(6), 733–744. http://doi.org/10.2304/pfie.2013.11.6.733

Bourdieu, P. (1992). The Logic of Practice. Stanford, CA: Stanford University Press.

boyd, d. (2014). It’s Complicated: The Social Lives of Networked Teens. New Haven, CT: Yale University Press.

Dean, M. (1999). Governmentality: Power and Rule in Modern Society. London: Sage.

Foucault, M. (2010). The Birth of Biopolitics: Lectures at the Collège de France, 1978–1979. Basingstoke: Palgrave Macmillan.

Hargittai, E. (2010). Digital Na(t)ives? Variation in Internet Skills and Uses among Members of the “Net Generation.” Sociological Inquiry, 80(1), 92–113. http://doi.org/10.1111/j.1475-682X.2009.00317.x

Hargittai, E., & Shaw, A. (2014). Mind the skills gap: the role of Internet know-how and gender in differentiated contributions to Wikipedia. Information, Communication & Society, 18(4), 424–442. http://doi.org/10.1080/1369118X.2014.957711

Livingstone, S., & Helsper, E. (2007). Gradations in digital inclusion: children, young people and the digital divide. New Media & Society, 9(4), 671–696. http://doi.org/10.1177/1461444807080335

Robinson, L. (2009). A taste for the necessary: A Bourdieuian approach to digital inequality. Information, Communication & Society, 12(4), 488–507. http://doi.org/10.1080/13691180902857678

Wyn, J., & White, R. (1997). Rethinking Youth. St Leonards: Allen & Unwin.

Draft BSA Guidelines for Digital Research: Case Study on Dilemmas in Conducting Social Media Research in the Field of Crime and Security

By Matthew L. Williams and Pete Burnap

Directors, Social Data Science Lab, Cardiff University

A principal ethical consideration in most learned society guidelines on digital social research is to ensure the maximum benefit from findings whilst minimizing the risk of actual or potential harm (interpreted as physical or psychological harm, including discomfort, stress and reputational risk). All groups involved in the research, including social media users, commercial platforms and researchers, should be protected throughout the lifecycle of the project, from inception to data archiving. Users are often the primary concern given their vulnerability in the process. The potential for harm in social media research increases when sensitive data are collected. These data include personal demographic information (such as ethnicity and sexual orientation), information on associations (such as membership of particular groups or links to other individuals known to belong to such groups) and communications of an overly personal or harmful nature (such as details of morally ambiguous or illegal activity and expressions of extreme opinion). These forms of sensitive information abound on social media networks. In some cases such information is knowingly placed online (whether or not the user is fully aware of who has access to it). In other cases sensitive information is not knowingly created by users; this can often occur in cases of association between users (not everything can be known about another user before connecting, nor can changes in affiliation be monitored on a routine basis). Such information can come to light through the process of analysis, visualization (of networks) and representation of social media data by researchers (Ruppert 2015).

Most social media research projects are likely to encounter only the first type of sensitive information. This is certainly the case where topics focus on mundane social activities online. However, projects that take as their focus behaviors that have been deemed problematic risk encountering multiple forms of sensitive information. Recent RCUK and government funded projects that have taken as their focus cyberhate following terrorist events (Burnap et al. 2014, Williams & Burnap 2015, Burnap & Williams 2015, Burnap & Williams 2016), the spread of racial tension online (Burnap et al. 2015), the estimation of offline crime patterns using online signals (Williams & Burnap 2016) and suicidal ideation (Scourfield et al. 2016) have encountered all the forms of sensitive information outlined above. Here we take the example of cyberhate (Burnap et al. 2014, Williams & Burnap 2015, Burnap & Williams 2015, Burnap & Williams 2016) and provide an overview of our ethical decision-making process in sensitive social media research. The motivation for the ESRC- and Google-funded project stemmed from the increasing use of social media to communicate highly emotive reactions to events, such as terrorist attacks. The project’s objectives were to i) monitor hateful responses on social media following a series of events; ii) profile hateful social media networks; iii) link hateful content with other data, such as Google search terms and the offline press; iv) model hateful information flows to identify enabling and inhibiting factors; and v) study forms of counter speech. The project drew upon both computational and social science research techniques. We used the COSMOS platform[1] to collect and visualise Twitter reactions to the murder of Lee Rigby in Woolwich. Our first ethical dilemma was therefore related to consent: (i) as researchers, should we obtain consent from all users in the social media dataset? As our intention was to conduct only quantitative analysis and aggregate-level visualization that retained the anonymity of users, we were satisfied that the consent users grant to Twitter under its Terms of Service met our criteria for minimizing harm (see the final paragraph for a discussion of consent in qualitative social media research).

The next stage of the project required the use of machine learning algorithms to classify hateful content and to build networks of users. Automated text classification of social media content performs well when conducted on datasets around specific events. However, accuracy decreases beyond the events for which a classifier was developed, due to changes in language use (Burnap & Williams 2015). Social network graph algorithms operate differently from classification algorithms, but they are also open to misrepresentation if there are data quality issues (such as missing data due to poor operationalisation of collection search terms). Reliance on algorithms presented the second ethical dilemma: (ii) how should researchers develop, use and reuse algorithm-driven text classification and social network graph processes that have the consequence of labeling content and users as hateful (and in some cases potentially criminal)? Where text classification techniques are necessitated by the scale and speed of the data (e.g. classification can be performed as the data are collected in real time), researchers must ensure the algorithm performs well (i.e. minimizes the number of false positives) for the event under study in terms of established text classification standards.[2] Furthermore, researchers have a responsibility to ensure the continuing effectiveness of the classification algorithm if there is an intention to use it beyond the event that led to its design. High-profile failures of big data, such as the inability to predict the US housing bubble in 2008 and to track the spread of influenza across the United States using Google search terms, have led many to question the power and longevity of algorithms (Lazer et al. 2014). Algorithms therefore need to be routinely tested for effectiveness and may need to be ‘refreshed’ with new human input and training data if false positives are to be minimized, avoiding the mislabeling of content and users. Where social network graphs indicate users are associated with particular groups, which if made public may cause distress or reputational risk, researchers must question the quality of the data used to generate the association (as would be expected in all scientific reporting) and make careful decisions on whether to publish such content. Where such information is published, every effort must be made to maintain the anonymity of users in the graph, including efforts to reduce the likelihood of deductive disclosure (Stewart and Williams 2005).
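
To make the testing requirement concrete, here is a minimal sketch (not the project’s actual COSMOS pipeline) of how a classifier’s false-positive behaviour might be audited against human-coded labels using scikit-learn; the labels and predictions are placeholder values:

```python
# Minimal sketch: auditing a hate-speech classifier against human-coded
# labels. Placeholder data, not the actual COSMOS pipeline.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical gold-standard labels from human coders (1 = hateful) and the
# classifier's predictions for the same tweets.
y_true = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
accuracy = accuracy_score(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")

# Precision well below ~0.75 (the conventional threshold noted in footnote 2)
# would signal too many false positives, i.e. users wrongly labelled hateful,
# and the need to retrain with fresh human-annotated data.
```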

Following on from text classification, statistical model building was utilized to predict hateful information propagation around the Woolwich terrorist attack. These models identified which factors, such as type of user, network capital, and type of language used (such as counter speech), enabled and inhibited hateful information flows. This presented the third ethical dilemma: (iii) is the process of identifying factors that stem the spread of online hate speech a universally accepted goal? This may seem a redundant question to citizens of many European countries, including the UK, where some forms of hate and antagonistic speech are criminalised. However, in the US hate speech is not criminalized, and online communications are protected by the First Amendment. Therefore, project funders located in the US (such as Google) may not wish to be associated with research that infringes upon such protections. The researcher must therefore use their moral compass to balance these jurisdictional prerogatives with the pursuit of scientific objectivity.

Representation of our findings presented the fourth ethical dilemma: (iv) is it possible to present the content of hateful and counter speech tweets in publication? Anonymous publication of actual examples of hateful tweets is precluded under Twitter’s Terms of Service, which forbid the anonymization of tweet content (a screen name must always accompany tweet content). Ethically, then, informed consent should be sought from each tweeter before quoting their post in research outputs. However, this is impractical in most big data projects given the number of posts generated and the difficulty of establishing contact (a direct private message can only be sent on Twitter if both parties follow each other). Therefore, it is not ethical to directly quote tweets that identify individuals without prior consent. Furthermore, Twitter’s Terms of Service also require that authors honour any future changes to user content, including deletion. As academic papers cannot be edited continuously post publication, this condition further complicates direct quotation (not to mention the burden of checking content changes on a regular basis). However, researchers should not conclude that conventional representation of qualitative data in social media research is precluded by these Terms of Service. As in conventional qualitative research, researchers can make efforts to gain informed consent from a limited number of posters if verbatim examples of text are required (although posters must understand that anonymity is not possible in these cases, given that tweet text is searchable). In cases where consent is not provided, Markham (2012) suggests some innovative methods for protecting privacy in qualitative social media research. Acknowledging that traditional methods for protecting privacy by hiding or anonymising data no longer suffice in digital settings that are archived and searchable, Markham advocates a bricolage-style reconfiguration of original data that represents the intended meaning of interactions. While this may be suitable for general thematic analysis, it may not satisfy the needs of more fine-grained approaches, such as conversation and discourse analysis.

Social Data Science Lab Risk Assessment and Ethical Principles

Social research ethics are at the core of the Social Data Science Lab’s programme of work. Recent work shows that users of social media platforms are uneasy about their posts being collected without their explicit consent (NatCen 2014, Williams 2015). However, many social media terms of service specifically state that users’ public data will be made available to third parties, and by accepting these terms users legally consent to this. In the Lab’s research programme we interpret and engage with these terms of service through the lens of social science research, which often implies a higher ethical standard than is provided in legal accounts of the permissible use of these kinds of data. The topic of ethics in social media research has been a key focus of ours and formed a primary research question in our first ESRC Digital Social Research Demonstrator Grant. Ethics as a topic continues to be embedded in our follow-on grants, and we continuously reflect upon our practice as social and computational researchers. We are acutely aware of the key ethical issues of harm, informed consent, invasion of privacy and deception as they relate to the collection, analysis, visualization and dissemination of social media data. Below we detail our risk assessment and the ethical principles that have been adopted by several social science research ethics committees in the UK.

Risk Assessment

Low risk – Tweet is from official/institutional account: Publish without seeking consent in most cases.

High risk – Tweets are from individual users and contain mundane or sensitive information (overly personal, abusive etc.). Must contact the user (direct message/@mention/email) and ask their permission to publish. Only publish if consent is received.

High risk – Tweet has been deleted, precluding publication under the Twitter Developer Agreement/Policy.

High risk – Tweet is from a deleted account, meaning it too has been deleted, precluding publication under the Twitter Developer Agreement/Policy.
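
For illustration only, the risk rules above can be read as a simple decision procedure. The sketch below encodes them in Python; in practice these judgements are made case by case by researchers, and the field names are assumptions:

```python
# Sketch: the Lab's publication-risk rules expressed as a decision function.
# Field names are hypothetical; real decisions involve researcher judgement.
def publication_decision(tweet):
    if tweet["deleted"] or tweet["account_deleted"]:
        # High risk: deleted content cannot be published under the
        # Twitter Developer Agreement/Policy.
        return "do not publish"
    if tweet["official_account"]:
        # Low risk: official/institutional account.
        return "publish without seeking consent in most cases"
    # High risk: individual user; consent is required before publication.
    return "publish only if the user consents"
```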

Ethical Principles

  • We abide by the Economic and Social Research Council’s Framework for Research Ethics
  • All projects undergo Research Ethics Committee Review
  • Any significant changes to research design following Research Ethics Review approval are reported back to the Committee for re-approval
  • We abide by Twitter’s Developer Policy and Developer Agreement
  • We abide by the UK Data Protection Act 1998
  • We only use social media data for academic research purposes
  • We keep all information gathered on individual Twitter users confidential on secure, password-protected servers
  • We maintain the anonymity of all individual Twitter users in our research
  • In research outputs we publish only aggregate information based on data derived legally and ethically from the Twitter APIs
  • In research outputs we never directly quote individual Twitter users without their informed consent. Where informed consent cannot be obtained we represent the content of tweets in aggregate form (e.g. topic clustering, word clouds) and themes (decontextualised examples and descriptions of the meaning or tone of tweet content). These forms of representation preclude the identification of individual Twitter users, preserving anonymity and confidentiality
  • In research outputs we do directly quote from Twitter accounts maintained by public organisations (e.g. government departments, law enforcement, local authorities) without seeking prior informed consent
  • We never share data gathered from Twitter APIs for our research outside of the COSMOS project team
  • We destroy all personal data if it is no longer to be used for research purposes

Funding: This work was supported by five Economic and Social Research Council grants: ‘Digital Social Research Tools, Tension Indicators and Safer Communities: a Demonstration of the Cardiff Online Social Media ObServatory (COSMOS)’, Digital Social Research Demonstrator Programme (Grant Reference: ES/J009903/1), ‘Hate Speech and Social Media: Understanding Users, Networks and Information Flows’, Google Data Analytics Research Programme (Grant Reference: ES/K008013/1), ‘Social Media and Prediction: Crime Sensing, Data Integration and Statistical Modeling’, National Centre for Research Methods (Grant Reference: ES/F035098/1/512589112), ‘Digital Wildfire: (Mis)information Flows, Propagation and Responsible Governance’, Global Uncertainties Ethics and Rights in Security Programme (Grant Reference: ES/L013398/1), and ‘Public Perceptions of the UK Food System: Public Understanding and Engagement, and the Impact of Crises and Scares’, Understanding the Challenges of the Food System Programme (Grant Reference: ES/M003329/1).

References

Burnap, P, Williams, M. L. & Sloan, L. (2014) ‘Tweeting the terror: modelling the social media reaction to the Woolwich terrorist attack’, Social Network Analysis and Mining, 4: 206.

Burnap, P. & Williams, M. L. (2015) ‘Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making’, Policy & Internet.

Burnap, P. and Williams, M. L. (2016) ‘Us and them: identifying cyber hate on Twitter across multiple protected characteristics’, EPJ Data Science, 5, article number: 11. doi:10.1140/epjds/s13688-016-0072-6.

Burnap, P., Williams, M. L., Rana, O., Edwards, A., Avis, N., Morgan, J., Housley, W. and Sloan, L. (2015) ‘Detecting tension in online communities with computational Twitter analysis’, Technological Forecasting & Social Change.

Lazer, D., Kennedy, R., King, G. and Vespignani, A. (2014), ‘The Parable of Google Flu: Traps in Big Data Analysis’, Science, 343: 1203–5.

Markham, A. (2012) ‘Fabrication as ethical practice: Qualitative inquiry in ambiguous internet contexts’, Information, Communication and Society, 15(3): 334-353.

NatCen (2014) Research Using Social Media: Users’ Views, London: NatCen.

Ruppert, E. (2015) ‘Who Owns Big Data’, Discover Society, 23.

Scourfield, J., Colombo, G., Burnap, P., Jacob, N., Evans, R., Zhang, M., Williams, M. L., Housley, W. and Edwards, A. (2016) ‘The response in Twitter to an assisted suicide in a television soap opera’, Crisis: The Journal of Crisis Intervention and Suicide Prevention.

Stewart, K. F. and Williams, M. L. (2005) ‘Researching online populations: The use of online focus groups for social research’, Qualitative Research 5(4): 395-416.

van Rijsbergen, C. J. (1979) Information Retrieval (2nd ed.), London: Butterworth.

Williams, M. L. and Burnap, P. (2015) ‘Crime Sensing with Big Data: The Affordances and Limitations of using Open Source Communications to Estimate Crime Patterns’, British Journal of Criminology. Online Advance Access.

Williams, M. L. (2015), ‘Towards an Ethical Framework for Using Social Media Data in Social Research’, presented at Social Research Association Workshop, Institute of Education, UCL, 15 June 2015.

Williams, M. L. and Burnap, P. (2015) ‘Cyberhate on social media in the aftermath of Woolwich: A case study in computational criminology and big data’, British Journal of Criminology. 56(2): 211-238

Williams, M. L., Edwards, A., Housley, W., Burnap, P., Rana, O., Avis, N., Morgan, J., and Sloan, L. (2013) ‘Policing cyber-neighbourhoods: Tension monitoring and social media networks’, Policing and Society 23(4): 461-481.

[1] http://socialdatalab.net/software

[2] Established measures include: precision (the fraction of retrieved tweets that are relevant to the search, i.e. for each class, how many of the retrieved tweets were of that class); recall (the fraction of relevant tweets that are successfully retrieved, i.e. for each class, how many tweets coded as that class were retrieved); F-measure (the harmonic mean of precision and recall); and accuracy (the total of correctly classified tweets normalized by the total number of tweets). Results of 0.75 and above (on a scale of 0–1) in each measure are considered outstanding (van Rijsbergen, 1979).

Draft BSA Guidelines for Digital Research: Mixed Methods Case Study

This is an application to an ethics committee for a mixed-methods study of young people’s online search and evaluation practices.

ETHICS SUB-COMMITTEE APPLICATION FORM

Please note:

  • You must not begin your study until ethical approval has been obtained.
  • You must complete a risk assessment form prior to commencing your study.
  • It is your responsibility to follow the University’s Ethics Policy and any relevant academic or professional guidelines in the conduct of your study. This includes providing appropriate information sheets and consent forms, and ensuring confidentiality in the storage and use of data.
  • It is also your responsibility to provide full and accurate information in completing this form.
  1. Name(s):
  2. Current Position: PhD Student
  3. Contact Details:

Division/School

Email

Phone

  4. Is your study being conducted as part of an education qualification?

            Yes / No

  5. If Yes, please give the name of your supervisor
  6. Title of your project:
  7. What are the proposed start and end dates of your study?

            October 2012 to June 2013

  8. Describe the rationale, study aims and the relevant research questions of your study

To find out:

When and why young people use the web to search for information.

How young people search for information; for example, their choice of search engine and search query.

How young people judge credibility by discriminating between various sources of information.

Whether young people are persuaded by contested information they find online.

Whether a young person’s socioeconomic status and education have any influence on these questions.

  9. Describe the design of your study

I have recruited two institutions to take part in this study. I will be working with a member of the management team at each college who also teaches. They have volunteered their students for this study and integrated my research methods with their students’ learning objectives.

These institutions are distinguished by their students’ socioeconomic status and journeys through our education system.

Stage 1: Group interviews

Previous research in this area suggests young people need to be motivated by their interest in a topic before they research it thoroughly online. If I give them topics in which they have no interest, they will perform perfunctory searches. The purpose of the group interviews is to discover which topics interest the students. In groups of 15, I will ask the students:

  • When and why do you use the web to search for information?
  • What issues, topics, and questions would you look to the web to resolve?

During the interview, I will suggest examples and ask the students if they would use the web to investigate them. For instance: is global warming man-made?

(See interview schedule)

Group selection will depend on:

The members of staff and students from each institution who are willing to participate.

Which students under the age of 18, having volunteered, have obtained consent from their parents to participate.

Stage 2: Collaborative writing project

The questions produced during the group interviews will be used for a collaborative writing project.

This will involve 3 sub-stages:

  1. The students will be asked to record their responses individually (to the issues discussed during Stage 1) in Word before any online research has taken place. These responses will be uploaded to a secure, password-protected server at the University.
  2. Next, the students will be asked to construct responses, again individually in Word, but this time using the web as a resource. These responses will be uploaded to a secure, password-protected server at the University. The search queries will be captured for analysis via a proxy server. When students use the web at each college, their search history (the addresses of web pages they visited) is stored on a central computer on the college’s network called a web server. I will set up a proxy server that performs this function for students participating in my study. The proxy server will only capture data (search logs) that each institution already captures on its web servers, but just for the students (or rather their machines) participating in the study (a minimal sketch of such a logging proxy follows this list).
  3. For the final stage, the students will be asked to integrate the individual responses written during sub-stage 2 into a wiki that reflects a group consensus on each topic. This stage will be videoed to observe the deliberations and interactions between the students during this process.
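
For illustration, a minimal sketch of the kind of logging proxy described in sub-stage 2, assuming plain-HTTP traffic; the file name, port and field layout are hypothetical, and the real deployment would mirror the college’s existing web-server logging:

```python
# Minimal sketch of a logging forward proxy for plain-HTTP traffic.
# File name, port and field layout are assumptions for illustration only.
import http.server
import urllib.request
from datetime import datetime, timezone

LOG_FILE = "search_log.csv"  # hypothetical flat file: timestamp, machine, URL

class LoggingProxy(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Record the requesting machine (not the user) and the URL, mirroring
        # the fields the institution's own web servers already retain.
        stamp = datetime.now(timezone.utc).isoformat()
        with open(LOG_FILE, "a") as log:
            log.write(f"{stamp},{self.client_address[0]},{self.path}\n")
        # Relay the request upstream and pass the response back unchanged.
        try:
            with urllib.request.urlopen(self.path) as upstream:
                body = upstream.read()
                self.send_response(upstream.status)
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
        except Exception:
            self.send_error(502)

if __name__ == "__main__":
    # Browsers on participating machines would be pointed at this proxy.
    http.server.HTTPServer(("", 8080), LoggingProxy).serve_forever()
```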

The wiki will be hosted by the University.

A similar wiki can be seen here (removed).

The students asked to create the wiki will be given pseudonyms for log-ins and explicit instructions not to identify themselves or the institution at which they study. Only I, as the wiki’s administrator, will be privy to this information.

The wiki will be written at each institution within the student’s normal timetable. It will be locked by me, as administrator, for editing outside these hours to prevent any contamination or abuse.

Each wiki page will have a discussion page within which the students will be encouraged to discuss and justify their choice of source.

Observation and Recording of Collaborative writing project

As well as asking students to document their deliberations while they write the wiki, I will observe and video the project in progress and record my discussions with them about their choice of sources and credibility decisions.

The video is intended as an objective ‘memory’ of the process. I need to see who spoke to whom, and when, and compare this to the timeline of edits on the wiki. I can also refer to the videos during the debrief interviews.

Stage 3: Debrief individual interviews

The dual purpose of the debrief interviews is to assess each volunteer’s experience after they have had time to reflect on the project, and to capture any thoughts or processes that were not revealed during the observations.

  10. Who are the research participants?

Approximately 30 post-secondary school students aged 16–19, and specific teachers who have agreed to take part in the study.

  11. If you are going to analyse secondary data, from where are you obtaining it?

From the institutions the students attend; for example, student fees and any anonymised demographic data each institution can provide.

  12. If you are collecting primary data, how will you identify and approach the participants to recruit them to your study?

Recruitment will result from a process during which my point-of-contact at each institution will volunteer classes to participate. I will ask all members of these classes if they are happy and willing to participate. If any of the students are aged under 18, I will seek parental consent before proceeding.

  13. Will participants be taking part in your study without their knowledge and consent at the time (e.g. covert observation of people)? If yes, please explain why this is necessary.

No

  14. If you answered ‘no’ to question 13, how will you obtain the consent of participants?

For each stage of the study I will seek the written consent of a member of each institution with the appropriate level of authority, the teachers involved in the study, the students participating in the study and, if necessary, their parents (see consent forms).

  15. Is there any reason to believe participants may not be able to give full informed consent? If yes, what steps do you propose to take to safeguard their interests?

No

  16. If participants are under the responsibility or care of others (such as parents/carers, teachers or medical staff), what plans do you have to obtain permission to approach the participants to take part in the study?

For participating under-eighteens, I will seek parental consent by writing to each participant’s parents (see parental consent form).

  17. Describe what participation in your study will involve for study participants. Please attach copies of any questionnaires and/or interview schedules and/or observation topic lists to be used.

Participation for young people would involve:

An hour-long interview with approximately 15 of their peers to discuss how and why they use the web to find information. This is an opportunity to discuss topics or arguments they would use the web to help resolve.

A five-hour collaborative writing project, to be done within normal college hours, within which they discuss and document sources that support their arguments.

A fifteen-minute individual debrief interview to discuss their participation in the project.

  18. How will you make it clear to participants that they may withdraw consent to participate at any point during the research without penalty?

On the information sheets I will inform participants that there will be no repercussions if, at any time, they wish to withdraw from the study by speaking to me: during the group interviews, during observations of the collaborative writing project, or specifically from the debrief interviews. I will also give the participants my university email address so they can withdraw at any time via email. In case they feel uncomfortable addressing me, participants will also be able to withdraw indirectly from the study by informing a member of staff at their institution, or a parent or guardian.

  19. Detail any possible distress, discomfort, inconvenience or other adverse effects the participants may experience, including after the study, and how you will deal with this.

Stage 1: Group Interviews

Although I will make every effort to avoid sensitive topics for the wiki, I am unable to predict how individual students may react to all possible topics. I will inform the students, from the outset, that if they find a topic problematic they should inform me or a member of staff at the institution so I can withdraw the topic and/or the student from the study. For example, the students may want to research the link between mental illness and marijuana use, and an individual in the group may have personal experience of this.

The interviews will be digitally recorded, removed from the recording device and transferred to a secure, password protected server at the University.

During the transcription and analysis of the recordings all the participants will be referred to by pseudonyms.

At all times during the study a member of staff from the institution will be present or in earshot.

Stage 2: Collaborative Writing Project

It is possible the participants may abuse the anonymous collaborative writing space with harmful behaviour such as bullying, flaming and trolling. I will closely monitor the wiki for such behaviour. As the wiki’s administrator I will have access to participants’ real identities. If any participant is behaving inappropriately, I will use this access to inform the participating institution’s member of staff of the participant’s identity and negotiate appropriate action (for example, issue a warning and, if necessary, remove any offenders from the study).

A member of staff from the institution will be present or in audible range.

The writing of the wiki will be video recorded. It is possible that individual students, or all of them, will become uncomfortable with this, at which point I will cease recording.

Stage 3: Individual Interviews

These will be individual interviews; it is therefore possible participants will be uncomfortable in a one-to-one with a relative stranger.

I am an experienced teacher. I will use any opportunity to reassure the students and develop a working relationship prior to the individual interviews.

The interviews will be held in an open space or a room with an open door. A member of staff will be present or in earshot.

  20. How will you maintain participant anonymity and confidentiality in collecting, analysing and writing up your data?

The institutions will be given pseudonyms. The participants will be asked to create their own usernames. These usernames will be checked for appropriateness by the member of staff representing the institution.

The participants will be referred to throughout by their usernames. If a username can be interpreted in a way that could lead to a user’s real identity, I will provide an alternative.

The search logs on the proxy server will only record searches performed by the machines, not the users. I will only be able to identify who searched for what, and when, by referring to the video.

  21. How will you store your data securely during and after the study?

The digital recordings of the group interviews, the offline discussions during observations and the debrief interviews will be removed from the recording device and uploaded to a password protected secure server hosted by the University.

The proxy server will be my laptop. Immediately after each session, the data files will be transferred to a password-protected secure server hosted by the University and then removed from my laptop.

The videos will be recorded on tape. Immediately after the recordings, the tape’s content will be uploaded to a password-protected secure server hosted by the University of Southampton and then deleted.

The wiki and all its data will be password protected. Only registered users will be able to view or edit its content. The wiki and its data cache will be encrypted and stored securely on a University server.

  22. Describe any plans you have for feeding back the findings of the study to participants.

I will publish the study’s findings on ePrints and distribute the URL to all participants by letter addressed to their institution.

  23. What are the main ethical issues raised by your research and how do you intend to manage these?

The main ethical issues are:

  • I will be working with under-18s.
  • I may discuss potentially ethically sensitive topics.
  • I will be using primary data, i.e. search logs, audio and video recordings.
  • The potential for abuse of the wiki and its discussion pages.

Strategies to manage these risks are described above.

  24. Please outline any other information you feel may be relevant to this submission.

I am a former secondary school teacher and am CRB-checked. My training and experience will help me identify and manage many of the risks identified above.

The use of search logs and video is unprecedented in this field of research and is therefore important to the overall thesis. For the searches, I need a record of what the students searched for and when, one I can use to discuss their choices during the interviews. During deliberations that influence the wiki’s content, I need to see who talked to whom and when. The video will be an objective record of how knowledge is socially constructed, which I can refer to when interviewing the students.

Draft BSA Guidelines for Digital Research: Twitter Case Study

This is an application to an ethics committee to begin research using Twitter.

  1. Name(s):
  2. Current Position: PhD Student
  3. Contact Details:
  4. Is your study being conducted as part of an education qualification? YES
  5. If Yes, please give the name of your supervisor
  6. Title of your project:

         Methodological approaches to Big Data

  7. i) What are the start and completion/hand-in dates of your study?
     ii) When are you planning to start and finish the fieldwork part of your study?
  8. Describe the rationale, study aims and the relevant research questions of your study

This study aims to develop a new methodological approach to research on social media, specifically the Twitter micro-blogging service. Whilst there is already considerable research interest in the ‘influence’ of social media, studies of Twitter have, to date, considered only the extent and circulation of ‘retweets’ (where users pass on an original post to their own followers) as the measure of ‘influence’.

The aim of the research proposed here is to extend this measure of influence by adding contextual information about followers and users’ Twitter networks, and by exploring the relationship between tweets, retweets and the network of followers that surrounds a user. Collecting this additional information will allow assessment of the relationship between users’ activity and the wider Twitter context within which this activity takes place. For example, it would be possible to see how a change in followers affects tweets, or how retweets might generate new followers.

This is an inter-disciplinary research project, which draws together perspectives from the social and computational sciences.

From sociology, the project takes theories of social activity and digital transformations in the information age, as well as epistemological debate about the relations between theory and method. From computer science, the specific interest is in the technical opportunities and constraints of data harvesting. Overall, from this perspective, the research is mainly focused on the process of data collection, from the conceptualisation of what constitutes important information among the available data to the implementation of the collection per se.

The focus of the project is not on particular participants or the activities associated with individuals, but on (i) the conceptualisation and process of data collection and (ii) the aggregate analysis of the patterns and relationships in the digital data.

  9. Describe the design of your study

9.1 Access to the data

These data are publicly available via two Twitter Application Programming Interfaces (APIs).

The Stream API gives researchers access to tweets, and the REST API provides access to users’ profile information. Profile information is to be used only for checking the accuracy of the harvesting process (see below).

The terms of use for these APIs are set by Twitter, and the research complies with these rules. They can be found at the following address: https://dev.twitter.com/terms/api-terms.
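
As a minimal sketch of how these two channels might be accessed programmatically (using the third-party tweepy library, v3.x; the keys, user ids and handler are placeholders, not the project’s actual harvesting script):

```python
# Minimal sketch of Stream and REST access with tweepy (v3.x API).
# Credentials and user ids are placeholders for illustration only.
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
# wait_on_rate_limit sleeps through the REST cap of 180 calls per 15 minutes.
api = tweepy.API(auth, wait_on_rate_limit=True)

# REST channel: profile information, used only to check harvesting accuracy.
profiles = api.lookup_users(user_ids=[12345, 67890])  # hypothetical ids

class HarvestListener(tweepy.StreamListener):
    def on_status(self, status):
        # Stream channel: each tweet from a tracked user arrives in real time.
        print(status.user.id_str, status.text)

stream = tweepy.Stream(auth=api.auth, listener=HarvestListener())
stream.filter(follow=["12345", "67890"])  # blocks; run in its own process
```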

9.2 Selection of the population

The research will explore the activity of 200 initial Twitter users in each of 3 different groups (600 initial users in total). These groups have been chosen purposively, to explore different types of online activity:

  1. An online group with a clearly existing offline community.
  2. A group formed around a hashtag (a word preceded by the symbol #, allowing users to participate in the same discussion), to study the specificity of a community developed around a specific and temporary interest.
  3. A group of randomly chosen users.

9.3 Information collected

In each case, the study will collect profile information, network information and Twitter activity.

9.3.1.Profile information.

The profile information will contain the screen_name (the name displayed on the user’s account), the location (if it is set by the user), the language (if it is set by the user) and the id_str (a unique identifier created for each Twitter account that identifies it even if the screen_name changes).

The screen_name will only be used to perform manual checks on the data to validate the accurate operation of the script. All links between screen names and collected data will be deleted as soon as this data collection is done. The language and the location will be retained to allow later analysis of shared/divergent characteristics in the user networks. The id_str, which takes the form of an integer, is retained as a unique identifier to collect information and to ensure the same user is tracked. It will only be used within the data collection process, not in any subsequent analysis or publication.
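
For illustration, a retained profile record might look like the following sketch (the field names follow the Twitter API; the values are invented):

```python
# Sketch of the profile fields retained for one account; values are invented.
profile = {
    "id_str": "783214",        # stable unique identifier, kept throughout
    "screen_name": "example",  # used only for manual validation, then deleted
    "location": "UK",          # kept, if set by the user
    "lang": "en",              # kept, if set by the user
}
```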

9.3.2. Network Information.

Two types of inter-user links can be found on Twitter: Followers and Friends. The ‘Followers’ are the accounts following the user. The ‘Friends’ are the people the user is following.

A ‘snapshot’ of Followers and Friends will be taken for each of the original 200 users each time profile information is collected (as above). In order to trace the emergent networks of users, the research proposed here will take these snapshots at regular intervals.

This will generate a second list of users to be included in the research. The profile information for the second list will be collected according to the same principles outlined above. Regular snapshots will also be taken of these users’ networks of Followers.

The regularity of the snapshots in both cases will depend on the number of users included in the second list and the REST API limitation (180 calls every 15 minutes). Because the second list of users is generated by the activity of the original 200 users, an a priori estimate of the number of API calls needed is impossible. The time interval will therefore depend on the number of users included and the number of Followers and Friends each user has. However, a lower limit of one snapshot per day is set in the script to ensure regularity. This lower limit in turn caps the number of users on the second list, and it can vary between datasets.

The size of the second list cannot be predicted either, as it is based on the activity of the primary users. However, there will be a total limit of 5000 users (the first list plus the second list for each group), as this is a limit of the Stream API; the limit of one snapshot per day comes from the REST API.

This limit of 5000 users does not mean there will only ever be 5000 users, but only 5000 users at the same time. After a defined period of time (to be confirmed if the list reaches the limit of 5000) during which there has been no activity between second-list and first-list users, second-list users will be dropped from the list. A user can therefore be on the second list, be removed, and be on the list again, depending on activity.

Every time the script collects information about a user, the list of current Followers and Friends is updated, along with any changes compared to the previous list. This list itself contains only the id_str of the Friends and Followers. The id_str will be used to collect Profile Information on Friends and Followers, but not the screen name.
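
A minimal sketch of this snapshot-and-diff step (tweepy’s followers_ids call is the real API method; the bookkeeping around it is illustrative):

```python
# Minimal sketch: snapshot a user's Followers and diff against the previous
# snapshot, keeping only id_str values as described above.
def snapshot_followers(api, user_id, previous_ids):
    """Return the current follower id set plus additions and removals."""
    current_ids = {str(fid) for fid in api.followers_ids(user_id=user_id)}
    added = current_ids - previous_ids
    removed = previous_ids - current_ids
    return current_ids, added, removed
```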

9.3.3. Twitter Activity.

Twitter activity comprises the tweets posted by users on their public timelines.

I have access to this information from both the Stream and REST APIs.

The information collected is the text itself with a time stamp.

The text may contain URLs (links to other websites), hashtags (linking the tweet to other tweets) and direct mentions of other users. Whilst the URLs and hashtags will be used for later analysis, any direct mentions will only be used to build the sample.
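
As an illustrative sketch of separating these elements from tweet text (in practice the Twitter APIs also return them pre-parsed as ‘entities’):

```python
import re

# Naive patterns for illustration; the APIs' "entities" field is more robust.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@(\w+)")

tweet = "Enjoying #bigdata methods with @example: https://example.org/paper"
urls = URL_RE.findall(tweet)          # kept for later analysis
hashtags = HASHTAG_RE.findall(tweet)  # kept for later analysis
mentions = MENTION_RE.findall(tweet)  # used only to build the second list
```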

Stream and REST APIs.

The Stream API is used to collect tweets in real time, while the REST API is used to collect past tweets.

If a user from the first or the second list has network activity with a new user, the REST API is used to collect the last 3500 tweets of the new user. The new user is then added to the list of users screened by the Stream API.

The Stream API is used for two reasons: first, as a second access channel to Twitter data, to overcome the limited number of calls to the API; and second, to provide real-time information about users’ activity.

The users from the first list are added to the Stream API search terms. Every time they tweet something, retweet, or are mentioned in someone else’s tweet, the API collects this information.

The tweet is then stored and, if it mentions a user, this user is included in the second list. In this way, the activity centred on the publication of the message is captured in real time and does not use too many API calls.

9.4 Storing data

The data collection does not involve any analysis, or even observation of the data, beyond some basic checking to ensure that the data harvesting is proceeding as planned. The entire process is automated through a script developed for this purpose.

The data are stored in a NoSQL database (using MongoDB) in a schema designed to facilitate the retrieval of information; it adds no personal information other than that retrieved from Twitter (I can communicate the template if needed).
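
For illustration, a minimal sketch of the storage step using pymongo; the database name and document shape are assumptions, not the actual template:

```python
# Minimal sketch: storing one harvested tweet in MongoDB via pymongo.
# Database/collection names and document fields are assumptions.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
db = client["twitter_study"]  # hypothetical database name

db.tweets.insert_one({
    "id_str": "783214",                     # poster's unique identifier
    "text": "example tweet text #bigdata",  # tweet body
    "created_at": "2013-01-01T12:00:00Z",   # time stamp
    "hashtags": ["#bigdata"],               # kept for later analysis
    "urls": [],                             # kept for later analysis
})
```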

9.5. User Consent

During the data collection, I will use a specific Twitter account created for the research to contact each individual for whom data has been harvested, asking whether they have any objection to the anonymised analysis of their data.

Users will be contacted via Twitter's Direct Message system and given a link to a website (removed for anonymity; the URL is temporary, and for the actual respondents a university address will be used to ensure better credibility). This website provides more information about the study, the harvesting of the data, the anonymisation process and contact information for any enquiry.

At the bottom of the web page, an opt-out form gives them the opportunity to be removed from the dataset if they wish (for the rationale of the opt-out system, see points 13 and 18).

It is planned to send a first message to the people added to the first level of the list right after the data collection, then again one week and three weeks afterwards.

Each message contains the same URL to the page with the option to withdraw, and a short text (140 characters maximum) describing the link.

Message: “I am a PhD student researching Twitter use. For this, I have collected some of the publicly available information from your Twitter account. Information on how to withdraw, should you wish to, is given at the following link.”

Second and third messages will only be sent to users who have not already replied.

After four weeks, if the participant has not expressed any wish to be removed from the dataset, I will consider that the data can be used for the analysis.
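A minimal sketch of this contact schedule (the function names and date arithmetic are illustrative assumptions):

```python
from datetime import date, timedelta

CONTACT_OFFSETS = [timedelta(0), timedelta(weeks=1), timedelta(weeks=3)]
OPT_OUT_DEADLINE = timedelta(weeks=4)

def contact_due(collected_on: date, today: date, has_replied: bool) -> bool:
    """True if an opt-out message should be sent today: first message right
    after collection, then one and three weeks later; later messages are
    skipped once the user has replied."""
    if has_replied:
        return False
    return any(collected_on + offset == today for offset in CONTACT_OFFSETS)

def consent_assumed(collected_on: date, today: date, opted_out: bool) -> bool:
    """After four weeks with no opt-out, the data may be used for analysis."""
    return not opted_out and today >= collected_on + OPT_OUT_DEADLINE
```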

This Twitter account can be found here:

Data analysis

The purpose of the thesis is to develop an improved method for analysing Twitter data. The main focus is on the development of the method and its theoretical implications, rather than on information about usage per se.

However, to test the hypotheses about the influence of context, networks and activity, some particular metrics will be used to analyse these data (a sketch of how they might be computed follows the list):

  • The evolution of the number of Followers and Friends
  • The links shared within the tweets
  • The number of mentions and Retweets a user makes and receives.
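A minimal sketch of how the mention and Retweet counts could be derived from the stored tweet documents (field names follow the illustrative schema sketched in 9.4 and are assumptions):

```python
from collections import Counter

def activity_metrics(tweets: list) -> dict:
    """Count mentions made and Retweets in one user's stored tweets.

    Each tweet dict is assumed to carry 'text' and an 'entities' mapping
    with a 'user_mentions' list, mirroring Twitter's payload.
    """
    mentions = Counter()
    retweets = 0
    for tweet in tweets:
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            mentions[mention["id_str"]] += 1
        if tweet["text"].startswith("RT @"):  # simple Retweet heuristic
            retweets += 1
    return {"mentions_made": sum(mentions.values()),
            "distinct_users_mentioned": len(mentions),
            "retweets": retweets}
```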

To conduct the analysis, the dataset will be completely anonymised, removing any information which could lead to the identification of the user (see points 20 and 21).

  10. Who are the research participants?

There are three related (or ‘nested’) lists of participants:

  1. Level 1: The main users

200 users will be selected from each of three groups (see 9.1 above).

Consent will be sought prior to any data collection, as removing these ‘primary’ users later would cause significant disruption to the overall data set.

The information collected about these participants is Profile Information, Network Information and Twitter Activity.

  2. Level 2: The activity users

The second list of participants is dynamically created. It depends on who the participants from the first list interact with. If a participant on the first list mentions, retweets, adds or removes a user, this user is added to the second list.

The information collected about these participants is Profile Information, Network Information and Twitter Activity. The collection differs from that for Level 1 users in two ways:

(i) if there is no further interaction after one week, this user is dropped from the list and no further information is collected, unless an interaction is again detected with a user from the primary list.

(ii) if these ‘second level’ users interact with other users, this interaction is not used to recruit more people; it is only kept as information about their activity.

  3. Level 3: The contextual users

The third list is made up of every user present in the Followers and/or Friends networks of the primary and secondary lists. The information is limited to the id_str, the number of Friends and Followers they have, and the number of statuses published (covering both the tweets they originally posted and the Retweets they published). No further information is collected.

No consent will be sought for this list of users, as they represent a social context. No information about their Followers and Friends lists is retained; the data are used only to draw a network graph and to see the overlapping interactions between users from the first and second lists. Moreover, the size of this dataset makes it impossible to contact the participants individually without this being considered abusive behaviour under Twitter's API terms of use.

  11. If you are going to analyse secondary data, from where are you obtaining it?

            N/A

  12. If you are collecting primary data, how will you identify and approach the participants to recruit them to your study?

Please attach a copy of the information sheet if you are using one – or if you are not using one please explain why.

See above, section 10

  13. Will participants be taking part in your study without their knowledge and consent at the time (e.g. covert observation of people)? If yes, please explain why this is necessary.

The data collection will take place without participants’ knowledge. However, the data will not be analysed before consent has been given.

The first reason is the nature of Twitter itself. An account does not necessarily imply a real person behind it: it could be an organisation, a group of people or a robot. In this context, asking for prior consent could lead to the removal of participants who pose no ethical issues yet offer valuable insight for the study. For instance, an organisation will have different behavioural patterns from an individual, and it is expected that the study will show these differences.

The second reason is the possibility that users will not see the message in time. Some accounts are inactive or only intermittently active, so the user associated with the account may miss a message sent before the data collection. By sending several messages with a sufficient time lapse between them, it is possible to collect the data (automatically) while ensuring maximum visibility of the notification. Waiting for prior consent would compromise the relevance of the information: for instance, it is impossible to collect tweets older than a week, so if the user does not reply within this interval, the information about the context is lost.

The final reason, specific to the third list, is the number of people included in it and the limited amount of information collected. It is practically impossible to send a message to everyone collected through the Network list: it involves thousands of users, and Twitter will not allow any account to follow that many people in a short period of time and send them messages. Twitter would consider this spam and suspend the account.

For all these reasons, it is not possible to ask users for prior consent, which is why the opt-out system is adopted as a more practicable method.

  14. If you answered ‘no’ to question 13, how will you obtain the consent of participants?

            N/A

  15. Is there any reason to believe participants may not be able to give full informed consent? If yes, what steps do you propose to take to safeguard their interests?

            N/A

  16. If participants are under the responsibility or care of others (such as parents/carers, teachers or medical staff) what plans do you have to obtain permission to approach the participants to take part in the study?

            N/A

  17. Describe what participation in your study will involve for study participants. Please attach copies of any questionnaires and/or interview schedules and/or observation topic list to be used

Only observation; no interaction with or questions to the participants.

  18. How will you make it clear to participants that they may withdraw consent to participate at any point during the research without penalty?

During the collection there is no consent, but before the analysis a private message will be sent from a Twitter account created for this purpose. This account gives details about the research and contact information (see point 9 above).

The message will give a link to a web page describing the purpose of the research, how anonymity will be respected, and the possibility of removing one's data from the dataset.

  19. Detail any possible distress, discomfort, inconvenience or other adverse effects the participants may experience, including after the study, and how you will deal with this.

            N/A

  20. How will you maintain participant anonymity and confidentiality in collecting, analysing and writing up your data?

The collected data will not be anonymised at the first stage of the collection, as the identity (represented by the id_str, a unique identifier used by Twitter) is important to ensure the quality of the dataset. The screen_name is present only for some manual checks, mainly to make the script's logs easier to read, and will not be used for any other purpose. The screen name will be dropped prior to any analysis, and from this point on users will be labelled with an anonymised random number, automatically generated and kept separate from the user ID. Prior to any analysis, every id_str will be matched with a random number in a separate database created for the purposes of this research, and it is this number that will be used to conduct the analysis. Information about profile location and language will still be stored, but it will not be published in direct association with any one user; it only provides useful information about the dataset for analysis purposes (such as the spread of the dataset across the world and the different languages spoken).
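A minimal sketch of this pseudonymisation step (the collection names and the random-number scheme are illustrative assumptions):

```python
import secrets
from pymongo import MongoClient

# Hypothetical: the concordance lives in its own database, destroyed at the end.
concordance = MongoClient("localhost", 27017)["concordance_db"]["id_map"]

def pseudonym_for(id_str: str) -> int:
    """Return a stable random label for a Twitter id_str.

    The id_str -> label mapping is kept only in the separate concordance
    database; analyses see the label, never the id_str or screen name.
    """
    existing = concordance.find_one({"id_str": id_str})
    if existing:
        return existing["label"]
    label = secrets.randbelow(10**9)  # random, unrelated to the Twitter ID
    concordance.insert_one({"id_str": id_str, "label": label})
    return label
```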

The database containing the concordance between the random numbers and the Twitter ids will be destroyed only at the very end of the research. This allows me to remove people if they ask, even after I start to analyse the data. (An alternative is to keep this database as long as the dataset is available; if that option is preferable, the database will be encrypted and stored on a different server from the one hosting the dataset.)

  21. How will you store your data securely during and after the study?

The data will be stored in a virtual machine hosted on a University server; I am the only person with access to it. The table holding the correspondence between the Twitter ids and the random numbers will be stored on a password-protected laptop behind the University firewall.

  22. Describe any plans you have for feeding back the findings of the study to participants.

Feedback about the findings will use the same methods as contacting the users before and during the analysis, described in point 18. The feedback will contain information about the PhD itself, the main results found, and the assurance that none of the data are identifiable. In the case of a publication, the reference of the publication will be given as well.

  23. What are the main ethical issues raised by your research and how do you intend to manage these?

The research is based on publicly available information. However, whilst users have posted information publicly, and indeed the purpose of Twitter is to tell the world ‘what’s on your mind’, we cannot assume that users are aware of the possibilities for analysis of the data they post online. For this reason all data will be anonymised, following emergent practice in the field of Twitter research. Furthermore, users included in the study will be given the opportunity to opt out prior to data analysis.

The opt-out approach is chosen due to the very nature of Twitter. It is impossible to know whether the user is a human, a bot, or a company. Therefore, it is only if the user actively expresses a desire not to be included in the analysis that all data about him/her/it will be removed.

  24. Please outline any other information you feel may be relevant to this submission.

N/A

Draft BSA Guidelines for Digital Research: An Overview of Web Research Guidance

Back to Ethics Home

Principles of research ethics and the ethical treatment of persons are codified in a number of national and international policies and documents, such as the UN Declaration of Human Rights, the Nuremberg Code, the Declaration of Helsinki and the Belmont Report. At an international level, privacy rights are primarily dealt with by Article 8 of the European Convention on Human Rights (ECHR), incorporated into UK law by the Human Rights Act 1998 (http://www.legislation.gov.uk/ukpga/1998/42/contents), which protects the right to respect for private and family life and correspondence. In the UK these ethical considerations are linked to, but not restricted to, legislation enshrined in the Data Protection Act 1998 (DPA, http://www.legislation.gov.uk/ukpga/1998/29/contents), which governs the protection of personal information. Although the Act does not reference privacy specifically, it is designed to protect people’s fundamental rights and freedoms, and in particular the right to privacy in relation to the processing of personal data. This means that data must be kept securely and handled in a way that does not lead to a breach of confidentiality or anonymity. Compliance with the Act is regulated and enforced by an independent authority, the Information Commissioner’s Office (http://www.ico.gov.uk); individuals who feel that use of their data has breached the principles of the DPA can report their misgivings to this office. Research may also be subject to the ECHR and the DPA; this is distinct from guidance issued by learned societies (e.g. the British Sociological Association). Legislation concerns rights, which may be enforced and involve litigation, while guidance from learned societies addresses codes of conduct, breaches of which are dealt with according to the specific practices of the society rather than through the rule of law. Policies and frameworks governing ethics in research predate the Web; however, learned societies offer some guidance about ethics in web research.

A good starting point is the Association of Internet Researchers (AoIR), which has produced ethical guidelines for online research (Ess and AoIR, 2002; AoIR, 2012, http://aoir.org/ethics/). Ethical judgment must be based on a sensible examination of the unique object and circumstances of a study, the research questions, the data involved, the type of analysis to be used and the way the results will be reported, together with the possible ethical dilemmas arising from that case.

The British Educational Research Association (BERA) Ethical Guidelines (2011) have a particular focus on avoiding harm when considering online research. Hammersley and Traianou (2012) discussed the minimisation of harm: specifically, whether a research strategy was likely to cause harm, how serious that harm would be, and whether there was any way in which it could be justified or excused. Harms might arise through the very process of asking for consent, and can apply to both the forum members and the researcher; the act of sending participation requests may in itself be intrusive.

The Market Research Association (MRA) guide to the top 16 social media research questions stipulates that researchers should learn about and be comfortable with important explanatory variables beyond traditional respondent demographics, such as how different websites generate and facilitate different types of data (e.g. whether data is more positive versus negative, descriptive versus condensed etc.) In social media research it is commonly understood that conversations are generally public and viewable by almost anyone, and as such the individual under observation may or may not be aware of the presence of a researcher. This can lead to the likelihood of “social observational bias”. Users may participate in social media for different reasons (e.g. personal or professional) and this can affect the type, sincerity and direction of the user’s comments, which may be unrecognised by the researcher. Informed consent is encouraged when research might prejudice the legitimate rights of respondents, and researchers should exercise particular care and consideration when engaging with children and vulnerable people in web research; however, the Market Research Society/Market and Social Research (Esomar) states that if it is public data there is no need for informed consent. These guidelines structure the choices that researchers make about procedural and resulting ethical issues.

The Council of American Survey Research Organisations (CASRO) social media guidelines suggest that where participants and researchers directly interact (including private spaces), informed consent must be obtained in accordance with applicable privacy and data protection laws. However, it is unclear whether pure observation, where data is obtained without interaction with the participant, would fall under this remit, as no direct reference to this type of research is offered.

The British Psychological Society and the British Society of Criminology have also updated their guidelines to include online research:

http://www.bps.org.uk/what-we-do/ethics-standards/supplementary-guidance-use-social-media/supplementary-guidance-use-socia
http://www.britsoccrim.org/codeofethics.htm
www.bps.org.uk/system/files/Public%20files/inf206-guidelines-for-internet-mediated-research.pdf

These take into account the problems that may arise, such as legal and cultural differences across jurisdictions, online rules of conduct and the blurring of boundaries between public and private domains.

Cardiff University’s Collaborative Online Social Media Observatory (COSMOS) has produced an ethics resource guide to social media research: http://www.cs.cf.ac.uk/cosmos/ethics-resource-guide/. This considers the ethical connotations of harvesting and archiving large amounts of ‘readily available’ online data. With a focus on Twitter as a platform for research, COSMOS recognises that although such spaces are in the public domain, they are subject to conditions of service. Anonymity and data storage are presented as key ethical concerns. The guide defers to the AoIR (2012) and their primary concerns of human subjects, data/text and personhood, and the public/private divide. A useful resources list is provided.

The ESRC framework for research ethics, updated in January 2015, acknowledges the unique and often unfamiliar ethical challenges of undertaking online research: what constitutes ‘privacy’ in an online environment? How easy is it to get informed consent from the participants in the community being researched? What does informed consent entail in that context? How certain can the researcher be of establishing the ‘real’ identity of the participants? When is deception or covert observation justifiable? How are issues of identifiability addressed? The Association of Internet Researchers 2012 report and the BPS ‘Ethics Guidelines for Internet-mediated Research’ (2013) are referred to as key sources amongst the growing literature on online research ethics.

 

Draft BSA Guidelines for Digital Research: Thinking through ethics: starting with exemptions and moving to dialogue

Back to Ethics Home

One way of thinking through ethics in your research project could be to start by considering how exemptions to informed consent and confidentiality (including anonymity) apply to your project, and then to think through how these exemptions break down around the edges in practice, requiring dialogue with those you wish to carry out your research with; particularly in relation to how the concept of “public” works on the Internet, the blurred distinction between private and public, and issues of mismatch between the perspectives of the researcher and “the researched”.

This is not an attempt to avoid consent and confidentiality by identifying loopholes; rather, it is a suggested initial pathway through a vast amount of information and complex issues. It can also be seen as a way of drawing an outline around the space that requires consent and confidentiality, and identifying entry points into that space, in order to understand why and how consent and confidentiality are crucial factors in your research relationships. There are many other pathways, and others may be more suitable for your research project. This pathway refers to a selection of points from a much wider range of sources and ethical issues. For a comprehensive overview of points raised by ethical guidelines in relation to the Internet, see the Web Research Guidance: Overview of Guidelines, and for an understanding of situational ethics see.

Table 1 collates exemptions to informed consent and confidentiality from various guidelines and texts regarding visual and online research (please refer to original texts to fully understand exemptions in context). Several exemptions cluster around the idea of the public, where research may take place in public space, or make use of publicly available information without seeking informed consent or applying confidentiality. They range from more open statements such as “Confidentiality is not required with respect to observations in public places” in the International Visual Sociology Association’s (IVSA) “Code of Research Ethics and Guidelines” (Papademas and IVSA, 2009, p. 254); to more restricted criteria such as “unless consent has been sought, observation of public behaviour needs to take place only where people would ‘reasonably expect to be observed by strangers’” in the British Psychological Society’s “Report of the Working Party on Conducting Research on the Internet” (BPS, 2007, p. 3). Reading the texts in detail provides more precise pointers on how this research could be conducted ethically, for example, the IVSA sanctions the use of recording technology in public places without informed consent, when those observations are “naturalistic”, “it is not anticipated that the recording will be used in a manner that could cause harm”, and the recording technology is used “visibly” (Papademas and IVSA, 2009, p. 256) (i.e. presumably those people in the public space could be aware they are being recorded and have the opportunity to dialogue with the researcher, object, or remove themselves from the space). However, seemingly clear guidelines are rendered unstable in practice by the blurred distinction between the public and the private (AoIR, 2012, p 6-7; BPS, 2007, p. 3; BSA, 2002, p.5; BSAVSSG, 2006, p. 7; Kozinets, 2015, p. 138). As indicated by the British Psychological Society exemption cited above, people may be acting in public but not reasonably expect to be observed by strangers, let alone for those observations to be used in research. This can lead to a mismatch between the expectations of the researcher and “the researched” regarding the public/private distinction.

In any case, further issues are raised by how the concept of “public space” applies online. That is, although information online may be freely and easily available to read, does that mean this is information in “public space”? In “Netnography”, drawing on Bassett and O’Riordan’s claim that it is faulty to view the Internet as a type of place or social space (2002), Kozinets argues that “the Internet is actually textlike and spacelike [and] these qualities exist both separately and simultaneously” (2015, p. 135). Where the Internet is conceived of as a (published) text, the primary issues would not necessarily be informed consent, confidentiality and anonymity, rather the issues would be authorship, the obligation to credit authorship, copyright, Creative Commons, or any other license or terms and conditions, under which the text is made available online. In this vein, Kozinets highlights Bassett and O’Riordan’s approach (2002), where “citation or quotation of the clearly published and publicly displayed information – including it would seem, previously private data, such as an author’s name – is the correct and ethical course of action” (Kozinets, 2015, p. 136). However, you may not know whether an author should be credited, or treated as anonymous, unless you consult with the author her/himself, and even authors sharing work in the same online space may have different perspectives on this (Bassett & O’Riordan, 2002, in AoIR, 2012, p. 13-14).

Some information shared online comes with specifications on how that information may be re-used, for example through Creative Commons licenses, which allow a range of options, from completely free re-use by anyone to re-use of the entire work only in non-commercial ways (Creative Commons, 2016b). Bear in mind that although people using Creative Commons licenses are giving others some level of permission and direction in advance, they may still welcome and hope for contact and dialogue with others who are interested in their work. In any case, some people sharing online may not know about these options, may be sharing via online tools that do not offer these options or simply enforce other terms and conditions, or may not have thought fully in advance about other people re-using their information. Even if you are legally allowed to re-use some online information, there are still no absolute guarantees that those who share their information on the Internet will feel 100% happy with you using their information in your research, and will not feel they have been harmed in any way. Again, this can be understood as a mismatch problem.

Mismatch is an issue which can potentially be addressed with a dialogic and situational approach. The Association of Internet Researchers (AoIR) Ethics Working Committee has developed a very useful practice-focused set of recommendations based on a “dialogic, case-based, inductive, and process approach to ethics” (AoIR, 2012, p. 5). The recommendations include a detailed set of questions which researchers can use to help themselves reflect about ethical decision making in their project. Many issues are explored, and mismatch is raised:

What is the ethical stance of the researcher? (For example, a mismatch between the ethical stance of the researcher and the community/participant/author may create ethical complications). (AoIR, 2012, p. 9)
Would a mismatch between researcher and community/participant/author definitions of ‘harm’ or ‘vulnerability’ create an ethical dilemma? If so, how would this be addressed? (AoIR, 2012, p. 10).

The AoIR uses the term “dialogic” to describe two-way ongoing communication between the researcher and the community/participant/author; whilst other potentially useful sources may talk in terms of collaborative or participatory approaches. In “Visual Methodologies”, Rose emphasises “that collaborative research (that is also reflexive) is an effective strategy for ethical research” (Banks, 2001, in Rose, 2012, p. 335-336). There are many practices which may have transferable advice on developing communication in your research project, such as the long standing Participatory Action Research (Fals Borda and Brandão, 1986), the more recent Participatory Video (InsightShare, 2006), and traditions of collaborative artistic practice (such as: Ribalta et al, 2005; Cohen-Cruz and Schutzman, 2006). If your research is focused on large scale data, obtaining informed consent, let alone developing communication with the people at the source of the data, may seem challenging or impossible (Rotman et al, 2012, p. 211); begging the question, how could issues of mismatch ever be resolved? The British Psychological Society offers some defined limits on the researcher’s responsibility which could be helpful in instances of mismatch: the risk that the researcher needs “to consider and inform participants about” is “the extent to which their own collection and reporting of data obtained from the internet would pose additional threats to privacy over and above those that already exist” (2007, p. 3).

In conclusion, starting with exemptions to informed consent and confidentiality can be one way of structuring thinking through ethical process in your research project – including ultimately the kind of consent and confidentiality you may need in your project. Whilst guidelines, terms and conditions, and licenses may suggest there are cases in which informed consent and confidentiality are not strictly speaking necessary, there are other layers of considerations which can still lead you into dialogue with those you wish to carry out your research with. As the AoIR asks, “If an ethics board deems no consent is required, will the researcher still seek subjects’/participants’ consent in a non-regulatory manner?” (AoIR, 2012, p. 11). Firstly, regardless of the regulations, your understanding of ethical and high quality research may involve having dialogic and consensual relationships with those people you carry out your research with; you may want to share your research, exchange information, network and build longer term relationships with those people. Secondly, due to the potential for mismatch between the perspectives of the researcher and “the researched” – for example with regards to the private/public distinction and crediting authorship versus anonymity – communication may be necessary to identify and negotiate mismatch. Thirdly, especially since digital research is still an evolving area, unexpected issues may arise, and having communicative relationships in place gives the research project a better chance of resolving any problems.

Finally, on that last note, we should remember that “the fields of internet research are dynamic and heterogeneous [as] reflected in the fact that as of the time of this writing, no official guidance or ‘answers’ regarding internet research ethics have been adopted at any national or international level” (AoIR, 2012, p.2). Aside from ever-changing technological contexts, and the unstable public/private distinction, the AoIR also identifies the complex and unresolved relationship between data and persons: “Is one’s digital information an extension of the self?”. The data/person relationship is a central issue for research ethics, as ethics aim to minimise harm, and harm is typically understood in relation to “persons” (2012, p. 3, 6-7). This all leads back to reiterating the relevance of a dynamic, situational, process-based and dialogic approach to ethical digital research; where you anticipate that unforeseen situations, issues, and technologies may arise, and you are prepared to engage in an ongoing way.

Table 1.

Exemptions from informed consent and confidentiality

(limitations to these exemptions are in bold)

Public places, publicly available information, public organisations, governments, public officials and public agencies

“Confidentiality is not required with respect to observations in public places, activities conducted in public, or other settings where no rules of privacy are provided by law or custom. Similarly, confidentiality is not required in the case of information available from public records.” (Papademas and IVSA, 2009, p. 254)

“Visual researchers may conduct research in public places or use publicly-available information about individuals (e.g. naturalistic observations in public places, analysis of public records, or archival research) without obtaining consent.” (Papademas and IVSA, 2009, p. 255)

“In the UK and the USA, anyone is allowed to take photographs in public places, even if the photo shows a private place” (Rose, 2012, p. 334)

“There may be fewer compelling grounds for extending guarantees of privacy or confidentiality to public organisations, governments, officials or agencies than to individuals or small groups. Nevertheless, where guarantees have been given they should be honoured, unless there are clear and compelling public interest reasons not to do so.” (BSA, 2002, p. 5; BSAVSSG, 2006, p. 6-7)

“unless consent has been sought, observation of public behaviour needs to take place only where people would ‘reasonably expect to be observed by strangers’” (BPS, 2007, p. 3)

Public Domain Mark 1.0 (Creative Commons): “This work has been identified as being free of known restrictions under copyright law, including all related and neighboring rights. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.” (Creative Commons, 2016a)

Non-regulatory consent

“If an ethics board deems no consent is required, will the researcher still seek subjects’/participants’ consent in a non-regulatory manner?” (AoIR, 2012, p. 11)

When people agree to being identified

“Reasonable bases for using identifying information [include] public images of individuals or agreed usage of images by research participants who elect to have information released” (Papademas and IVSA, 2009, p. 254)
When people should be credited as authors

“If an individual or group has chosen to use Internet media to publish their opinions, then the researcher needs to consider their decision to the same degree that they would with a similar publication in traditional print media.” (Bassett and O’Riordan, 2002, p. 244)

“The authors opine that citation or quotation of the clearly published and publicly displayed information [online] – including it would seem, previously private data, such as an author’s name – is the correct and ethical course of action” (Kozinets on Bassett and O’Riordan, 2015, p. 136)

Creative Commons and Copy Left

Attribution Creative Commons License: “This license lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation. This is the most accommodating of licenses offered. Recommended for maximum dissemination and use of licensed materials.” (Creative Commons, 2016b)

Attribution-NonCommercial-NoDerivs Creative Commons License: “This license is the most restrictive of our six main licenses, only allowing others to download your works and share them with others as long as they credit you, but they can’t change them in any way or use them commercially.” (Creative Commons, 2016b)

“Copyleft is a general method for making a program (or other work) free, and requiring all modified and extended versions of the program to be free as well.” (Stallman, 2016).

Community/participatory research

“Various research methods do not require anonymity. Among these are: community/participatory research, and individual case studies involving individuals who consent to using identifying information (e.g. own names and visual representations).” (Papademas and IVSA, 2009, p. 254)
Use of recording technology

“Visual researchers like other members of the public have the means and right to record images that may, at the time, not seem invasive. Subsequent use of these images must be circumspect, given legal standards of public domain and fair use standards.” (Papademas and IVSA, 2009, p. 255)

“Use of Recording Technology. Researchers obtain informed consent from research participants, students, employees, clients, or others prior to photographing, videotaping, filming, or recording them in any form, unless these activities involve simply naturalistic observations in public places and it is not anticipated that the recording will be used in a manner that could cause harm. Efforts to respond ethically to unintended circumstances and consequences are necessary in a multi-mediated environment. Reasonable efforts may include the visible use of technology” (Papademas and IVSA, 2009, p. 256)

Illegal activities

“Images depicting illegal activities, including criminal damage, sexual violence and hate crime do not have the privilege of confidentiality.” (BSAVSSG, 2006, p. 3)

Legal privilege

“Research data given in confidence do not enjoy legal privilege, that is they may be liable to subpoena by a court and research participants should be informed of this.” (BSA, 2002, p. 5)


 

References

 

AoIR (Association of Internet Researchers) (2012). “Ethical Decision-Making and Internet Research”.
Available at (15.01.2016): http://www.aoir.org/reports/ethics2.pdf

Bassett, Elizabeth H. and Kate O’Riordan (2002). “Ethics of internet research: Contesting the human subjects research model”, Ethics and Information Technology, Vol. 4. P. 233-247.
Available at (15.01.2016): http://www.nyu.edu/projects/nissenbaum/ethics_bas_full.html

BSA (British Sociological Association) (2002). “Statement of Ethical Practice for the British Sociological Association”. Available at (15.01.2016): http://www.britsoc.co.uk/media/27107/StatementofEthicalPractice.pdf?1452510467677

 

BSAVSSG (Visual Sociology Study Group of The British Sociological Association) (2006). “Statement Of Ethical Practice For The British Sociological Association – Visual Sociology Group”.
Available at (15.01.2016): http://www.visualsociology.org.uk/about/ethical_statement.php

 

BPS (British Psychological Society) (2007). “Report of the Working Party on Conducting Research on the Internet”. Available at (15.01.2016):
http://www.bps.org.uk/sites/default/files/documents/conducting_research_on_the_internet-guidelines_for_ethical_practice_in_psychological_research_online.pdf

Creative Commons (2016a). “Public Domain Mark 1.0”.
Available at (15.01.2016) https://creativecommons.org/publicdomain/mark/1.0/

Creative Commons (2016b). “About The Licenses”.
Available at (15.01.2016): https://creativecommons.org/licenses/

Cohen-Cruz, Jan and Mady Schutzman (eds.) (2006), A Boal companion: dialogues on theatre and cultural politics, p. 78-86. Routledge, New York.

Kozinets, Robert V. (2015) Netnography Redefined. Sage, London.

Fals Borda, Orlando and Carlos R. Brandão (1986). Investigacion participativa. Instituto del Hombre, Ediciones de la Banda Oriental, Montevideo.

InsightShare (2006). Insights into Participatory Video. A handbook for the field. InsightShare, Oxford.

Available at (15.01.2016): http://insightshare.org/resources/pv-handbook

Papademas, Diana and IVSA (International Visual Sociology Association) (2009). “IVSA Code of Research Ethics and Guidelines”. Visual Studies, Vol. 24(3), p. 250-257.

Ribalta, Jorge et al, (2005), Jo Spence. Beyond the Perfect Image. Photography, Subjectivity, Antagonism. MACBA, Barcelona.

Rose, Gillian (2012). Visual Methodologies. Sage, London.

Rotman, Dana et al (2012). “Extreme Ethnography: Challenges for Research in Large Scale Online Environments”. iConference 2012, February 1-10, Toronto, ON, Canada.

Stallman, Richard (2016). “What is Copyleft?”.
Available at (15.01.2016): http://www.gnu.org/copyleft/copyleft.html

Draft BSA Guidelines for Digital Research: Case Study – Online Forums

Back to Case Studies

Researching Online Forums

Online forums are discussion groups where people converse about topics of mutual interest. Public forum data can be accessed with little difficulty or interaction with the group and do not require password access or user registration, with posts accessible in the same way as letters to a newspaper or a conversation on a bus. It is not possible to see who is reading the conversations, but users who wish to comment identify themselves, often using a pseudonym. Private forums, on the other hand, require registration and passwords to access. Text from forums can be gathered by a computer programme or by manual copy-and-paste. Research involving online forums raises ethical issues relating to informed consent of human subjects, protection of privacy and anonymity of research subjects. These issues are not independent; rather, they should be addressed together to mitigate overarching ethical concerns. They will be addressed in turn, with suggestions for best-practice ethical approaches. However, this advice should be tailored to individual research projects and is not intended as a ‘one size fits all’ approach.

Obtaining Informed Consent 

Data from online forums are readily accessible to anyone and, if archived, remain accessible to the public months or years after messages were posted (Frankel and Siang, 1999). This type of research could be exempt from the informed consent requirement if it is conducted in public (Liu, 1999). However, due to the lack of public awareness, some commentators and researchers have argued that messages within online communities should not be collected without the author providing prior permission (Marx, 1998; King, 1996). Wilson and Atkinson (2005) also question whether online ethnography might be a form of ‘electronic eavesdropping’. An individual might post information on his or her public profile to be shared with friends and peers; however, this does not mean that they have consented for this information to be collated, analysed and published, in effect turning them into research subjects (Eysenbach & Till, 2001). Hudson and Bruckman (2004) found that while it might be widely considered ethically acceptable to capture and analyse interactions and conversations in a public square without consent, this model did not match the expectations of their participants in real-time chatrooms. Nevertheless, Eysenbach and Till (2001) have contended that it is ethical to record activities in a public place without consent, provided individuals are not identifiable. Human subject research norms such as informed consent do not apply to material that is published; however, the nature of online content makes it more complex to distinguish between published and non-published material (Bruckman, 2004:103).

Ethical approach: Informed consent is not legally required to access data from publicly available forums, as they are in the public domain (as with much of the Web, the legal frameworks and case law governing this aspect of digital technology have yet to be made), but this is not to say that consent should automatically be overlooked. There are pitfalls both in attempting to obtain informed consent and in bypassing it.

The following discussion provides a consideration of both approaches, which could be applied to individual cases to help determine the best course of action. Obtaining informed consent in either public or private forums may involve the researcher posting to communities or individually contacting users and providing them with participant information sheets and consent forms to sign. This would require the researcher to join the community as a user, revealing their true identity and the purpose of their study. In some situations it might be precarious for the researcher to reveal such personal information, for example if the topic is sensitive. Furthermore, disclosure might disrupt the ‘naturalistic’ research environment. There are also practical difficulties involved in procuring informed consent from all members of online communities, as not everyone may see posts, and some members may have left, leaving their contributions still visible. Seeking such permission can also create further ethical problems: in other studies, researchers who sought informed consent found similar unforeseen impacts on group processes. King (1996) cites one member of an email support group who, in response to continual posts to the list from people wishing to conduct research, refused to “open up” online to be “dissected” (1996:122). Hewson et al. (2003) also question whether contacting potential participants may be viewed as “spamming”, itself an invasion of privacy (Hewson et al., 2003:40).

In contrast, the covert approach might enable research to be undertaken without risk or harm to the community, especially where a posted site policy notifies users of its public access, a point noted by Sveningsson (2004). It would be advantageous for researchers wishing to conduct analysis of posts and archives to consult the introductory notes or terms of electronic forums, a view supported by Langford (1996). Terms may openly request that research should not be carried out on the forum. Where clear directives do not exist, it may be possible to contact the list moderator and gain permission to conduct research. However, researchers need to bear in mind that any permission gained may not necessarily be viewed as consent by all members of the group (Reid, 1996). Whether consent needs to be obtained from individual contributors or from communities and online system administrators is fraught with uncertainty. The issue of ownership/intellectual property of the data may be addressed in the terms and conditions, but the moderators cannot speak for the forum users. What is public and what is private is blurred on the Web; it is not sufficient simply to rely on whether a site is public or not. Privacy and confidentiality are further important considerations for researching online forums, and these issues will now be discussed in more detail.

Privacy and Confidentiality

In online environments that are publicly viewable, such as discussion groups, individuals’ expectations may differ from those in offline communications, or in private digital correspondence such as email (Smith, Dinev and Xu, 2011). It is not always possible to determine whether users are aware of the public status of their contributions from the contributions themselves, or whether interaction with the user is required. Individual and cultural definitions and expectations of privacy are ambiguous, contested and changing. People may operate in public spaces but maintain strong perceptions or expectations of privacy. Frankel and Siang (1999) have suggested that people may be more open online due to a false or exaggerated expectation of privacy (Frankel and Siang, 1999:6). Other groups have attempted to clarify the boundaries of public data for research (Sveningsson, 2003; McKee and Porter, 2009). According to the ethical guidelines of the AoIR, public forums can be considered more public than, for example, conversations in a closed chatroom (Ess and AoIR, 2002:5, 7). Bassett and O’Riordan (2002), meanwhile, state that the lack of an applicable private sphere implies that all discourse lies de facto in the public sphere. However, Bakardjieva and Feenberg (2001) offer a different perspective, suggesting that the type of research, and the corresponding forms of relationship between the researcher and the subject, have an impact on whether or not a space should be considered public or private. Though conversations may occur in public spaces, the content could be private. In such circumstances, people may accidentally disclose personal information that could identify them in the research. As noted in the 2002 version of the AoIR ethics guidelines, privacy is a concept that must include a consideration of expectations and consensus. When conducting research within such shifting terrains, where there is no consensus, or even assumption of consensus, the AoIR suggest that Nissenbaum’s concept of contextual integrity (2011) is a valuable construct. The accessibility of online discussions may suggest that they are freely available in a public arena; however, some researchers question whether the availability of information on the Web necessarily makes this information public. For example, Heath et al. (1999, cited in Grinyer, 2007:2) suggest that research involving ‘lurking’ encroaches on privacy and creates an unequal power relationship.

Ethical approach: The following discussion applies to public forums. For collection purposes, merely treating forum data as public text used for documentary analysis is insufficient, as the thoughts and intentions of those who produced the information should be considered. The researcher should also examine people’s feelings about the situation – applying the ethic of reciprocity, or Golden Rule, considering how they would feel if the roles were reversed – in order to appreciate how those observed might respond to the research (Honderich, 1995; Rawls, 1958). This bears on whether the environment is considered public or private: if someone is talking in a public space, it is reasonable to expect that their conversation could be heard and accessed by others. This is more difficult to judge online, however, as the boundaries of Web spaces are only ostensible; content on websites can be accessed by anyone yet is not necessarily meant for public consumption.

Researchers can familiarise themselves with the place of study in order to ascertain whether it should be considered public from the perspective of those who occupy it. This requires continual reflection during the research process. Individuals and their online privacy expectations should be respected. If an individual has posted information on a public website under a public “privacy” setting, they may be considered to have a very low or no expectation of privacy for the information they reveal; regardless, in such situations the researcher needs to be careful not to make undue assumptions. The discussion above has identified that establishing the privacy expectations of research subjects is a problematic issue, and one that is intensified by the Web, as is the possibility of intruding on private exchanges and exposing personal information during online research. One way to protect privacy is anonymisation. Anonymising data is a process designed to protect research subjects and their personal information, and to satisfy legal requirements such as the DPA 1998. However, whether data can be appropriately or completely anonymised is also debatable in Web research.

Anonymity Issue

A central feature of research is to provide descriptions and explanations that are publicly available and accessible. One potentially harmful outcome of research, however, is the risk of disclosing an individual’s identity, and it is the responsibility of the researcher to employ preventative measures such as anonymity (SRA, 2003:38-9) where there may be negative effects from disclosure. Although complete anonymity may be difficult to ensure, it is advised to remove all identifying data prior to publication, and where an individual is identifiable, explicit consent is required before publication (Wiles, 2013). However, Web research complicates attempts to ensure anonymity, as data can easily be put into a search engine and the initial source discovered. Bruckman (2002) proposes guidelines that incorporate a “continuum of possibilities” in the level of disguise required for individuals’ names when reporting research (Bruckman, 2002:229). With respect to Web data, steps should be taken to protect all the individuals participating in research by removing all names and any identifying information in the final thesis and in any stored data. URLs or “links” to the forum websites should not be provided, and other personal details should be disguised; however, quotes may be used to evidence any findings and ensure traceability. Bruckman (2002:229) suggests adopting a “moderate disguise”, whereby verbatim quotations may be used but names, pseudonyms and identifiable details are changed. This approach was also adopted in Hookway’s (2008) study of morality in everyday life, where he prioritised the protection of his participants’ identity over providing credit to them as authors. Some online discussions contain personal information, and this is further complicated by the blurring of the private/public distinction. The ethical guidelines of the Association of Internet Researchers (AoIR) suggest a setting-dependent approach to distinguishing between subjects and authors, distinguishing between “reasonably secure domains for private exchanges” such as chatrooms and “public webpages such as homepages, Web logs [i.e. blogs]” (Ess and AoIR, 2002:7). Where the research context is placed on the public/private continuum has an impact on the need to anonymise data. If people are considered to be subjects, then they need to be afforded the protection of anonymity; however, if the information they have posted is considered to be published, then they should be credited as authors. Acknowledging when anonymity should be used and when it is necessary to cite a Web user by their name (or pseudonym) is problematic. There may be circumstances in which some Web users do not want to remain anonymous, for example writers of blogs (though these appear quite distinct from forum posts), and so it would be inappropriate to anonymise such individuals; doing so could be viewed as infringement of copyright and raise issues of intellectual property. If Web users are treated as authors of public documents, then issues of ownership of material must be considered. Web users may have deliberately chosen to publish in the public domain. Bassett and O’Riordan (2002:244) argue that in such cases, rather than maintaining anonymity, researchers should acknowledge the user’s authorship and cite their texts as they would more traditional media; but as Ess (2006) points out, this may compromise their anonymity. Removing all identifying data about the Web user, site etc. prior to publication is one solution to the problem of anonymisation. However, the use of verbatim quotes to substantiate findings can impair this, as the quotes can be traced back to the original website and potentially to the person who made them. This is a new challenge created by the Web, and one that researchers should be mindful of, possibly making checks to determine the risk of uncovering individual identities. If protection cannot be ensured via anonymity, then perhaps such data should not be reported. Anonymity per se cannot be relied on solely to avoid the need for informed consent; along with the notions of privacy and confidentiality, it requires careful consideration specific to the research issue and setting, as well as to the individuals concerned.

Ethical approach: When quoting comments, anonymisation is fundamental, as negative consequences for participants could arise from disclosure that resulted in a violation of privacy. Even though information may be readily available to anyone online, and could be found by anyone using similar search terms, researchers should not bring any extra, unnecessary attention to anything written in cyberspace by individuals, especially where it has been analysed in relation to specific research issues. Therefore, anything of an embarrassing or sensitive nature, such as information about personal illnesses, should be removed and not used within the analysis of the data. Researchers who collect and analyse online forum data (whether from public or private forums) should take care to protect it from becoming identifiable to individuals. As such, conversations should not be copied verbatim into research publications, as those direct quotes can be searched and identities discovered. A small number of relevant conversations can be summarised without losing character in reports. The jury is still out on whether full quotations need permission, though the various principles of ethics discussed here would suggest that this is likely the case.

References

1. Bakardjieva, M. & Feenberg, A. (2001). “Respecting the Virtual Subject, or How to Navigate the Private/Public Continuum.” Online Communities: Commerce, Community Action, and the Virtual University, 195-214.

2. Bassett, E. H. & O’Riordan, K. (2002). “Ethics of Internet Research: Contesting the human subjects research model.” Ethics and Information Technology, 4, 233-47.

3. Bruckman, A. (2004). “Opportunities and challenges in methodology and ethics.” In: Johns, M. D., Chen, S. L. S. & Hall, G. J. (eds), Online Social Research: Methods, Issues and Ethics. New York: Peter Lang.

4. Ess, C. & the AoIR ethics working committee (2002). Ethical decision-making and Internet Research: Recommendations from the AoIR Ethics Working Committee. Available at: http://aoir.org/reports/ethics.pdf

5. Ess, C. (2006). “Ethical pluralism and global information ethics.” Ethics and Information Technology, 8(4), 215-26.

6. Eysenbach, G. & Till, J. (2001). “Ethical issues in qualitative research on internet communities.” British Medical Journal, 323(7321), 1103-5.

7. Frankel, M. & Siang, S. (1999). “Ethical and Legal Aspects of Human Subjects Research on the Internet.” American Association for the Advancement of Science Workshop Report. Available at: http://www.aaas.org/spp/dspp/srfl/projects/intres.main.html

8. Grinyer, A. (2007). “The ethics of Internet usage in health and personal narratives research.” Social Research Update, 49, University of Surrey.

9. Hewson, C. (2003). “Conducting research on the internet.” The Psychologist, 16(6), 290-3.

10. Honderich, T. (1995). The Philosophers: Introducing Great Western Thinkers.

11. Hookway, N. (2008). “‘Entering the blogosphere’: some strategies for using blogs in social research.” Qualitative Research, 8(1), 91-113.

12. Hudson, J. M. & Bruckman, A. (2004). “‘Go away’: participant objections to being studied and the ethics of chatroom research.” The Information Society, 20(2), 127-39.

13. King, S. A. (1996). “Researching Internet communities: Proposed ethical guidelines for the reporting of results.” The Information Society, 12(2), 119-28.

14. Langford, D. (1996). “Ethics and the Internet: Appropriate behavior in electronic communication.” Ethics & Behavior, 6(2), 91-106.

15. Langer, R. & Beckman, S. C. (2005). “Sensitive research topics: netnography revisited.” Qualitative Market Research: An International Journal, 8(2), 189-203.

16. Liu, G. Z. (1999). “Virtual community presence in Internet relay chatting.” Journal of Computer-Mediated Communication, 5(1).

17. Lomborg, S. (2013). “Personal internet archives and ethics.” Research Ethics, 9(1), 20-31.

18. Marx, G. T. (1998). “Ethics for the new surveillance.” The Information Society, 14(3), 171-85.

19. McKee, H. A. & Porter, J. E. (2009). The Ethics of Internet Research: A Rhetorical, Case-Based Process (Vol. 59). Peter Lang.

20. Miller, T., Birch, M., Mauthner, M. & Jessop, J. (eds) (2012). Ethics in Qualitative Research. Sage.

21. Nissenbaum, H. (2011). “A contextual approach to privacy online.” Daedalus, 140(4), 32-48.

22. Paccagnella, L. (1997). “Getting the seats of your pants dirty: Strategies for ethnographic research on virtual communities.” Journal of Computer-Mediated Communication, 3(1).

23. Rawls, J. (1958). “Justice as fairness.” The Philosophical Review, 164-94.

24. Reid, E. (1996). “Informed consent in the study of on-line communities: a reflection on the effects of computer-mediated social research.” The Information Society, 12(2), 169-74.

25. Signorini, A., Segre, A. M. & Polgreen, P. M. (2011). “The use of Twitter to track levels of disease activity and public concern in the US during the influenza A H1N1 pandemic.” PLoS ONE, 6(5), e19467.

26. Smith, H. J., Dinev, T. & Xu, H. (2011). “Information privacy research: an interdisciplinary review.” MIS Quarterly, 35(4), 989-1016.

27. Snee, H. (2013). “Making ethical decisions in an online context: Reflections on using blogs to explore narratives of experience.” Methodological Innovations Online, 8(2), 52-67.

28. SRA (2003). Social Research Association Ethical Guidelines. Available at: http://www.thesra.org.uk/ethical.htm

29. Staniszewska, S., Herron-Marx, S. & Mockford, C. (2008). “Measuring the impact of patient and public involvement: the need for an evidence base.” International Journal for Quality in Health Care, 20(6), 373-374.

30. Sveningsson, M. (2004). “Ethics in Internet ethnography.” In: Buchanan, E. A. (ed), Virtual Research Ethics: Issues and Controversies. Hershey, PA: Information Science Publishing, 45-61.

31. Wiles, R. (2013). What Are Qualitative Research Ethics? A&C Black.

32. Wilson, B. & Atkinson, M. (2005). “Rave and straightedge, the virtual and the real: exploring online and offline experiences in Canadian youth subcultures.” Youth & Society, 36(3), 276-311.


Draft BSA Guidelines for Digital Research: Case Studies

Digital Research Case Studies 

A discussion about and guidelines for using online forums for research.

A discussion about and guidelines for using Twitter for criminology research.

A discussion about and guidelines for researching social machines.

An ethics committee submission for a Twitter study.

An ethics committee submission for a mixed methods study involving young people.

Barratt, M. J., & Maddox, A. (2016). Active engagement with stigmatised communities through digital ethnography. Qualitative Research, 1468794116648766.

Carmack, H. J., & Degroot, J. M. (2014). Exploiting Loss?: Ethical Considerations, Boundaries, and Opportunities for the Study of Death and Grief Online. OMEGA-Journal of Death and Dying, 68(4), 315-335.

Carter, C. J., Koene, A., Perez, E., Statache, R., Adolphs, S., O’Malley, C., … & McAuley, D. (2016). Understanding academic attitudes towards the ethical challenges posed by social media research. ACM SIGCAS Computers and Society, 45(3), 202-210.

Dalsgaard, S. (2016, January). The Ethnographic Use of Facebook in Everyday Life. In Anthropological Forum (Vol. 26, No. 1, pp. 96-114). Routledge.

Fileborn, B. (2016). Participant recruitment in an online era: A reflection on ethics and identity. Research Ethics, 12(2): 97-115.

Livingstone, S., & Locatelli, E. (2014). Ethical dilemmas in qualitative research with youth on/offline. International Journal of Learning and Media.

Lomborg, S. (2013). Personal internet archives and ethics. Research Ethics, 9(1), 20-31.

Luh Sin, H. (2015). “You’re Not Doing Work, You’re on Facebook!”: Ethics of Encountering the Field Through Social Media. The Professional Geographer, 67(4), 676-685.

McDermott, E., Roen, K., & Piela, A. (2015). Explaining Self-Harm: Youth Cybertalk and Marginalized Sexualities and Genders. Youth & Society, 47(6), 873-889.

Markham, A. (2012). Fabrication as ethical practice: Qualitative inquiry in ambiguous internet contexts. Information, Communication & Society, 15(3), 334-353.

Martin, J., & Christin, N. (2016). Ethics in Cryptomarket Research. International Journal of Drug Policy. http://www.sciencedirect.com/science/article/pii/S0955395916301608

Reilly, P., & Trevisan, F. (2015). Researching protest on Facebook: developing an ethical stance for the study of Northern Irish flag protest pages. Information, Communication & Society, 1-17.

Roberts, L. D. (2015). Ethical issues in conducting qualitative research in online communities. Qualitative Research in Psychology, 12(3), 314-325.

Saunders, B., Kitzinger, J., & Kitzinger, C. (2015). Participant anonymity in the internet age: from theory to practice. Qualitative research in psychology, 12(2), 125-137.

Tiidenberg, K. (2015). Selfies | Odes to Heteronormativity: Presentations of Femininity in Russian-Speaking Pregnant Women’s Instagram Accounts. International Journal of Communication, 9, 13.

Trevisan, F., & Reilly, P. (2014). Ethical dilemmas in researching sensitive issues online: lessons from the study of British disability dissent networks. Information, Communication & Society, 17(9), 1131-1146.

Sanjek, R., & Tratner, S. W. (Eds.) (2015). eFieldnotes. Philadelphia: University of Pennsylvania Press.

Zwitter, A. (2014). Big Data ethics. Big Data & Society, 1(2). DOI: 10.1177/2053951714559253


Draft BSA Guidelines for Digital Research: Situational Ethics

The Ethics of Care & Situational Ethics

The underlying principle of our research should be care for our participants and for others who are in any way involved in or affected by our research: as it is conducted, when it is analysed and when it is published. Our responsibility is to ensure that we maximise the benefit and minimise the harm for anyone involved in and/or affected by our research, guided by the values of protection, respect, dignity and privacy. Institutional ethics processes are broadly underpinned by the same principles, which are embedded in prospective and bureaucratised templates and operate according to institutionally ratified forms of peer and lay evaluation. When we apply for ethics approval through institutional processes we commit in advance to a prescribed set of practices that uphold ethical principles. The BSA fully supports these institutional ethics processes as they apply in members’ universities, employers and other relevant organisations. Digital social research is expected to abide by the same principles and processes of ethical approval as other forms of social research. At the same time, we recognise that there may be a mismatch between processes originally intended for traditional forms of data and data collection and the ethical challenges that arise with new forms of ‘already existing’ data available in the public sphere, where we have no control over how data are collected and where the principles of consent cannot readily be applied, particularly if the data are at scale. Furthermore, digital research may raise new ethical challenges of its own, for example in linking individuals to each other, or in linking data about an individual from multiple sources to provide an overview that may not even be apparent to that individual. We are in poorly charted, and at times uncharted, territory here.

The view of the BSA digital ethics group is that we should not necessarily rule out digital research that does not conform to ethics processes originally designed in a very different context, nor can we provide guidelines that encompass all forms of digital research that may become possible in future. Each research situation is unique, and it will not be possible simply to apply a standard template in order to guarantee ethical practice. Rather, we should consider the situational ethics of digital research, taking very careful account of the context and the implications of conducting the research, rather than referring only to absolutes of right and wrong and to issues explicitly addressed in existing ethical guidelines. For further information we refer readers to the HEFCE Concordat to Support Research Integrity (2012). Where sociologists are conscious that their digital research raises ethical challenges, they must always secure institutional ethics approval prior to commencing research, and we encourage discussion of situational ethics with ethics committees, most of which are well aware of the challenges in this area and of the need to think creatively about them. Where situational ethics are applied in the ongoing process of research, these decisions should be documented and reported, if necessary, to the appropriate ethics committees. In addition, we must apply the ethics of care and situational ethics to protect researchers’ interests as well as those of our participants. Working online and with new forms of data, particularly social media, may place researchers in vulnerable positions, making them publicly visible and at risk of abuse. All reasonable steps should be taken to protect researchers, and research should not be undertaken if there is an appreciable risk of harm to them.
