Cloud Security for Anthropologists

By Alexander Taylor

Our ethnographic data is in the cloud, but our heads are not

More and more anthropologists are conducting, storing and circulating their research in the cloud. Cloud storage – typically in the form of Apple iCloud, Google Drive and Microsoft OneDrive – is now the default storage option on the smartphones, netbooks, tablets and other digital devices that have become commonplace tools of fieldwork. Messages are sent to interlocutors through cloud platforms like WhatsApp. Interviews are carried out through Skype and FaceTime. Apps for ethnographic research are proliferating. Evernote is replacing the field notebook. Articles are written collaboratively in browser-based cloud environments like Google Docs or Microsoft Office Online. Field reports and article drafts are circulated via Dropbox, WeTransfer, Box and Mozy.

Cloud infrastructure increasingly underpins growing areas of academic research practice. Yet to date there has been little – if any – critical reflection on the ethical, political and legal implications of cloud computing for social science researchers. The aim of this post is to begin moving discussions of digital security beyond the bare essentials of locked filing cabinets, password-protected laptops and hard drive encryption. Having spent a year and a half conducting fieldwork in the cloud, becoming progressively more paranoid about data security in the process, I’d like to draw some much-needed attention to cloudy digital research practices that anthropologists increasingly engage in but may not recognise as security issues. In doing so, I hope to prompt discussion on the implications of cloud computing as it becomes increasingly infrastructured into research, teaching and administrative activities across universities. With higher education institutions turning to cloud services to deliver their e-learning and information management systems, and with research funders requiring grant awardees to deposit their field data in cloud databases, anthropologists urgently need to begin getting their heads around the cloud.

The bearable lightness of laptops 

While most anthropologists have long been aware of the ethical and security concerns surrounding the sending of sensitive information through email, the problem with the cloud is that many people don’t know what it is or even realise they are using it. Like most infrastructure, it is designed to disappear. This problematic invisibility means that cloud computing seems to fly under the ethics and security radar.

Despite the image of fluffy ethereality that the cloud metaphor conjures, the cloud is concrete, political and aggressively expanding across the surface of the planet. At its most basic, cloud computing refers to an infrastructural shift from desktop computing – where files and applications were stored on the local hard drives of our computers – to a form of online computing, where these are stored in data centres accessed remotely ‘as a service’ through the Internet. In the context of my fieldwork, ‘the cloud’ was mostly a windowless, subterranean data centre repurposed from the ruins of a Cold War bunker. It was about as far away from the sky as you could possibly get, and distinctly un-cloudlike – except for its whiteness:

Inside the Cloud. Photo by Author.

It’s thanks to data centres that our digital devices are so light, portable, and fieldwork-friendly. Laptops no longer have CD or DVD drives because we download our apps, programs and software online, directly from data centres. As more of our files and applications are stored in and streamed from data centres, the bulky storage drives and connectivity ports that once weighed down our devices are being stripped away to reduce weight and replaced with minimal-capacity internal memory. With most of our computing needs now implemented as web services, the main task left for our devices, as powerful as they are, is more and more just to act as portals to data centres.

But this lightness comes at a significant cost. Removing ports removes possibilities for increasing memory using external storage like USB drives or micro SD cards. And shrinking internal storage capacity means that users increasingly have little choice but to store their data in the cloud. Cloud storage is now infrastructured into smartphones, tablets and other digital devices as the default storage option. Tech manufacturers often make it deliberately unclear how to take these devices off-cloud. It is also becoming increasingly difficult, with cloud-connected devices designed to silently upload files without any fanfare, potentially leading to the inadvertent sharing of ethnographic data.

Data murk

With smartphones being used to record interviews, capture video footage, take photos, send files and write and store fieldnotes, anthropologists can now quickly generate large quantities of born-digital ethnographic data that soon exceed our mobile devices’ storage capacity. In this context, the cloud, with its promise of ‘free’ and ‘unlimited’ data storage space, is a tempting solution.

Microphones and other peripherals transform tablets and phones into the ethnographer’s Swiss Army Knife. Image Source: Pixabay.

However, data stored in the cloud remains legally, ethically and epistemically murky. A severe lack of legislative regulation means online data is typically subject to the service level agreements and terms and conditions of each cloud provider. In cases where data stored in the cloud is unprotected by intellectual property rights, you may effectively be transferring ownership of your ethnographic data. You should therefore exercise caution before storing data with any third-party cloud service providers.

Even when an online service is not specifically marketed as a ‘cloud service’, the basic rule of thumb is that any files exchanged or interactions that occur over the Internet will be stored in data centres. That means conversations through Skype, FaceTime and WhatsApp. It means the mundane e-learning platforms and management systems (like Moodle) that we regularly encounter but rarely reflect upon. It also means any emails or attachments that you send (even to yourself as a back-up copy). Emails sent outside of your university network are sent in plain text and are therefore never ‘private and confidential’. As I heard many times during my fieldwork, ‘email is about as secure as a postcard’.

Passing private and perhaps sensitive ethnographic data on to unknown others in the form of cloud providers could be considered a serious breach of the fiduciary duty anthropologists have to their research participants. In the post-Snowden securityscape, we must assume that data stored in the cloud will be subject to surveillance. Commonly used cloud file-sharing services, such as Google Drive, Apple’s iCloud, Dropbox, WeTransfer, Mozy and Box, will not be appropriate for sensitive or personal data. If you find yourself having to use the cloud then you need to encrypt your files before uploading them. VeraCrypt is an easy-to-use free tool for encrypting files securely before sending them online. pCloud offers fully encrypted cloud storage. Mega is also worth mentioning: it runs some basic encryption inside the browser before the file is uploaded, protecting data transmitted over an open/public Wi-Fi connection against low-level snooping, though it is certainly not ‘government-proof’.
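To illustrate the encrypt-before-upload principle, here is a minimal Python sketch using the third-party cryptography package’s Fernet recipe (my own choice of tool for illustration – it is not what VeraCrypt, pCloud or Mega use internally, and the function names are hypothetical): encrypt locally, upload only the ciphertext, and keep the key off the cloud.

```python
# Sketch: encrypt a file locally before it ever touches a cloud service.
# Requires the third-party 'cryptography' package (pip install cryptography).
# Illustrative only -- not a substitute for an audited tool like VeraCrypt.
from cryptography.fernet import Fernet


def encrypt_file(path: str, key: bytes) -> str:
    """Encrypt a file; only the resulting .enc file should be uploaded."""
    with open(path, "rb") as f:
        ciphertext = Fernet(key).encrypt(f.read())
    out_path = path + ".enc"
    with open(out_path, "wb") as f:
        f.write(ciphertext)
    return out_path


def decrypt_file(enc_path: str, key: bytes) -> bytes:
    """Decrypt a downloaded .enc file back to its original bytes."""
    with open(enc_path, "rb") as f:
        return Fernet(key).decrypt(f.read())
```

The key (generated once with `Fernet.generate_key()`) has to be stored somewhere other than the cloud account holding the ciphertext; otherwise the encryption adds nothing.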

Most university networks offer secure file storage on servers located on campus that will meet data security and privacy requirements. This provides a layer of assurance that cloud providers, who could store your data anywhere in the world, cannot. With increasingly stringent data sovereignty regulations – where data is subject to the laws of the country in which it is stored – it may also be necessary to know the physical location(s) of the data centre(s) you are using. Storing data in local data centres may become a standard condition of future fellowships and confidentiality agreements.

Ideally, anthropology departments would provide PhD students and supervisors with a secure online storage space for the transferring of field reports, research materials and other file exchanges (anything sent over the Internet should, of course, be anonymised, unless your informants have specifically requested otherwise or the conditions of consent explicitly state otherwise). Undoubtedly the safest way to share files is to physically exchange a storage device. Data centre professionals call this the ‘sneakernet’. Despite all the cloud hype, in the data centre industry, the most secure and the fastest way of transporting large volumes of data to the so-called cloud is simply to load it in the back of a ‘hardened’ truck and drive it there, giving a whole new meaning to ‘hard drive’.
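As a practical aside, the integrity of a ‘sneakernet’ transfer can be confirmed by comparing cryptographic checksums computed on the sending and receiving machines. A minimal sketch using only Python’s standard library (the function names are my own):

```python
# Sketch: verify a 'sneakernet' transfer by comparing SHA-256 checksums
# computed before and after the physical journey. Standard library only.
import hashlib


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def transfer_intact(original: str, received: str) -> bool:
    """True if the received copy is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(received)
```

In practice the sender would compute and share the digests (e.g. printed on paper), and the recipient would recompute them on arrival; a mismatch means the copy was corrupted or tampered with in transit.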

In December 2016 Amazon unveiled the ‘Snowmobile’, an exabyte-scale data transfer service in the form of a forty-five-foot-long shipping container attached to the back of an articulated truck. Image Source: Amazon Web Services.

The Right to Erasure 

The new EU General Data Protection Regulation (GDPR) framework provides ‘data subjects’ (interlocutors) with the right to request that any personal data the anthropologist holds on them be permanently erased. My fieldwork experiences highlighted considerable ethical and legal dilemmas surrounding the safe and secure disposal of data stored online.

When you delete an email, file, photo, social media post or even close an online account, you are not necessarily deleting them from the data centre in which they are stored. From the cloud provider’s perspective, deletion often simply means removal from the end-user’s interface, while the information typically remains locatable at the data centre-end. Most of your online activity is simply left on data centre servers in a state of involuntary permanence. This could be considered a serious infringement of research participants’ privacy if they want or expect their data to be deleted – raising problems if researchers have promised to destroy certain data upon completion of their project.

Cloudy Futures 

Cloud technologies offer valuable new tools and virtual spaces for the storage, sharing and writing of ethnographic data. But they also pose challenges to the ethical structures of anthropology that we are only just beginning to articulate, and that require us to reflect on data security in the cloud as a standard part of ethical practice. Anthropology departments, institutional review boards and ethics committees need to begin to respond to the changing security requirements of the digital research environment by offering more effective training in this domain.

Confidentiality agreements, ethical obligations or digital import/export restrictions tied to research grants will no doubt soon preclude the use of third-party cloud services as standard practice. At the same time, research councils increasingly require grantees to submit their ethnographic data for indefinite storage and re-use by third parties through online public cloud platforms. These often contradictory codes and requirements at different bureaucratic, legal and ethical levels mean that the cloud is at once being infrastructured into research practice and at the same time regulated out, which will make meaningfully navigating and negotiating this cloudy terrain difficult. The powerful commercial imperatives of connectivity and the energy-intensive environmental destruction that underpin the creeping ubiquity of this computing infrastructure make interrogating the cloud all the more urgent.

Image Source: The Simpsons, Season 13, Episode 13: ‘The Old Man and the Key’. Aired 10 March 2002.

Alexander Taylor is a PhD candidate with the Department of Social Anthropology at the University of Cambridge. His research explores how technologies and infrastructures of data storage intersect with planetary scales of security and dystopian digital futures in the data centre industry. In this post, he explores some of the security implications of cloud computing for social science research practice.

6 Replies to “Cloud Security for Anthropologists”

  1. The huge issue with any legal regulation of a virtual environment is that virtual environments are global in nature, while regulations are local and largely bound by the laws of a particular state. It is a perception issue – we do not perceive the cloud as a “space”, it is not “real”, so we try to regulate its physical manifestations, simply speaking, the server buildings (because those, at least, we can perceive as a space). I am not sure that strategy will be sufficient in the future when we talk about the law and the virtual, without essential shifts in human perception. Also, since it is a global market, apps and tools can change hands and countries when their creators decide to sell them – which means that legally all your data can move to the jurisdiction of another country, without you even being aware this has happened.

    There is also an additional dimension to the cloud-user relationship – the connection provider. It is not enough to secure the data on the cloud and on the user end; you also have to make sure you “do not lose packages” in transit. There is also the problem of algorithms, which check the data on an automated basis and are most of the time undisclosed by the companies running them. We can only guess how often the numerous algorithms targeting various issues come across our research data, and whether they are not/won’t be capable of recognizing sensitive data in transit and flagging it for review by the companies running them. Data screening is an ongoing discussion, with ideas thrown around about enabling the police to survey the whole data “area” – which essentially means that, should it become law in some countries, the police could screen all the data coming from, let’s say, a specific suburb, without us being aware that we are being surveyed.

    It also brings us to the problem of data-mining software and what it can be legally capable of. Combined with the poor customer support that tends to be a chronic issue at a lot of major companies, this can pretty quickly turn into an impossible obstacle when you need to remove data from somewhere.

  2. Very interesting post — thank you! It articulates a lot of doubting questions that have been wandering amorphously in my mind. I’m a pen and notebook person, and usually work with very limited access to the Cloud, though not a complete Luddite — but I wonder about backup in the situations you describe. I’ve been told over and over by pre-millennial researchers to back up computer files and hard copies in at least three different ways. With the new cloud-connected devices that don’t have ports or drives, how is data backed up more concretely than to just another cloud service?

    1. Hi Karen, that’s a great question – and sits at the core of the problem with the ongoing push to the cloud.

      With cloud devices that have Bluetooth capabilities you can still wirelessly backup data to your own hard drive, bypassing cloud services completely.

      If you know your way around the technical side of device hardware (or if you know someone who does), then there is the possibility of removing the interior memory and manually transferring data from it – though unless you really know what you are doing this could seriously damage your device, leaving it less secure than just using the cloud. Internal smartphone storage is designed to withstand high levels of wear by spreading writes evenly across memory cells (a technique known as ‘wear leveling’), which is both good and bad. On the one hand, it means if you choose to take the device apart, the data will probably remain accessible and extractable (so you can back it up off-cloud). On the other hand, it means that it can be very difficult to wipe the device clean, which can be a major problem if you need to ensure that data has been permanently deleted – another reason why it is not really a good idea to store highly sensitive or confidential ethnographic data on a smartphone (or other devices with flash-based memory).

      Another option is to shell out the quite considerable extra cash for devices with larger internal storage capacity. With an iPhone, this can mean a hefty $100-$200 extra – which is essentially slowly pricing out offline backup. High-capacity internal storage may soon be considered a redundant feature. As more people happily switch to the ‘unlimited’ storage space of cloud services, tech developers may think that it is no longer necessary (or profitable) to release the same device with different storage capacities, so new devices may only offer one storage size. Moore’s Law (and Kryder’s Law) suggest that storage capacity will increase while storage hardware gets smaller – meaning our phones should have larger and larger storage capacities that take up less and less space. But this isn’t really happening any more – not because these ‘laws’ (which are more speculative predictions) are necessarily incorrect (though they are certainly being challenged today), but because of changing data storage habits amongst consumers. As cloud computing becomes the dominant model for storing data, there is simply less demand for high-capacity internal storage.

      With the cloud trend showing no signs of slowing down, we need to look at ways to make this infrastructure and the ethical structures of anthropological practice fit better together. More stringent cloud legislation could help. Discussing these issues as a standard part of the training of researchers and supervisors could also help. Cloud providers could offer users an encryption option for uploading files. They could also make information publicly available about the countries in which their data centres are located, so we at least know roughly where our data could potentially end up – which is essential if there are data sovereignty restrictions attached to funding awards.
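To make the ‘at least three different ways’ habit from Karen’s question concrete, a small script can copy a file to several off-cloud destinations (an external drive, a USB stick) and verify each copy byte-for-byte before trusting it. A hypothetical sketch using only Python’s standard library – the directory names are placeholders:

```python
# Sketch: back up a file to several off-cloud destinations and verify
# each copy byte-for-byte before trusting it. Standard library only.
import filecmp
import os
import shutil


def backup_and_verify(source, destinations):
    """Copy `source` into each destination directory; return verified copies."""
    verified = []
    for dest_dir in destinations:
        os.makedirs(dest_dir, exist_ok=True)
        copy_path = os.path.join(dest_dir, os.path.basename(source))
        shutil.copy2(source, copy_path)  # copy2 also preserves timestamps
        # Full (non-shallow) comparison reads both files byte by byte.
        if filecmp.cmp(source, copy_path, shallow=False):
            verified.append(copy_path)
    return verified
```

In real use the destination directories would be mount points for physically separate media, so that no single device failure (or cloud provider) holds the only copy.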

  3. I’m an ethnographer who used to work in campus IT, and I have to say that I think this post, while potentially very informative for those unfamiliar with the issues, suffers from a lack of practicality on some major points:

    1) Users, esp. non-technical users, are like a random walk through all possible technical options. If they can do something, they will. Telling people “It’s bad and possibly unethical in the abstract if you do this” — if “this” is “using Google Docs” — just doesn’t usually work, especially if all the other options are non-existent or worse. Meanwhile, campus policy discourse is IMO not best understood as seriously guiding people’s behavior but is really just a sort of strategic tool that staff can invoke in exceptional cases. (Ex: you can lend your friend your library card fifty times and the library is likely to never care unless something egregious happens, in which case they will trot out the policy saying you can’t lend your library card to anyone…)

    2) Idealizing offline storage is a terrible idea, because people are terrible at handling their own backups and inevitably they will lose vital data. Data loss (because of device theft, breakage, etc.) is a much more serious risk for most users than data theft/unauthorized disclosure. OK, if you use Google Docs, you’re trusting Google to keep your data private, to maintain a sane EULA, etc., but you can very likely trust Google to have better backups of its data centers than you (non-technical user) can manage all by yourself.

    3) I think we have to have a more differentiated and holistic threat/risk model when we talk about the threats faced by standard ethnographers. If you are an ethnographer studying something that is threatening to a major institution (defense contractor, nation-state, cartels, Silk Road, etc), then yes, be very careful with electronic and physical security, because powerful institutions or technically sophisticated adversaries have many capabilities against you. But otherwise, if you are studying something pretty benign (like the university reform politics that I study), you probably have no malevolent digital adversary to worry about, and your major risk is data loss, which has affected many academics of my acquaintance. For low-risk projects it’s probably a safe assumption that Google Docs is perfectly secure (Google has contracts with many university IT groups anyway) and you should just do what works for you and get on with work.

    4) The fact that a seemingly more private file server is provided by a university IT office does not necessarily guarantee (a) competence (b) security (c) privacy or (d) practicality. There are a lot of mediocre to awful campus IT offices, and many IT staff would be the first to tell you that probably Google is more competent than them.

    TLDR everyone should of course read your post and understand the broad “cloud” situation, but DIY file storage is really a bad idea, we should exercise pragmatism about data handling since policy discourse is not sovereignty, and we should think about our own specific risk models rather than in terms of an abstract worst case scenario.

    1. Hi Eli – thanks for your thought-provoking comments!

      I have to say that throughout my university experience I have found the IT security staff hugely knowledgeable and would say that they should be the first port of call for any data security queries. Many university IT departments have also published comprehensive guidelines about the pros and cons of backing up data in the cloud.

      I completely agree with the risks you have raised regarding offline storage. Stolen devices and hardware failure are no doubt the most common threats to data stored this way. And because many of us have first-hand experience with these issues (as your acquaintances certainly know!), they tend to be the topics most frequently brought up in discussions of data security. To be sure, this doesn’t mean we can forget about them. But I wanted to try to bring a different domain of data security into the purview of ethical consideration in anthropology. In doing so, I wanted to highlight that cloud storage doesn’t necessarily do away with these issues; rather, they reappear in new forms, taking on perhaps less noticeable dimensions as data storage – and the security of that data – becomes ever more outsourced, removed and abstracted from the user.

      I do feel like I need to defend offline storage, though (‘DIY storage’ in your words). Cloud services shouldn’t replace offline storage, and good offline data management and handling practices are skills that I think users should be encouraged to develop rather than outsource. Tech companies like Google and Apple certainly have far more resources at their disposal than individuals (and campus IT offices) to offer certain forms and levels of data security. Their multiple redundant data centres promise a decent backup solution to avoid data loss from local offline storage (e.g. arising from hard drive faults or thefts) but they also provide new opportunities for that data to end up in unknown hands and locations. Some of the time these are largely speculative (even conspiratorial) considerations – but with ongoing scandals surrounding high-profile data breaches, the illicit sharing and selling of data, and concerns over dataveillance practices (not forgetting the involvement of Google, Microsoft and Apple in PRISM), they should nevertheless be factored into the decision-making process when choosing a backup strategy.

      Also, I think these matters should be of concern to everyone, not only those conducting ‘high-risk’ fieldwork – with vulnerable (or high-profile) people, in politically unstable regions, or studying illegal activities. For researchers embarking upon a project or heading into the field, a basic awareness of some of the ethical, legal and political issues surrounding cloud storage could help to make a more informed decision about data security. Due to the unpredictable character of anthropological fieldwork, you never know what kind of sensitive information could be entrusted to you or whether the relevance, value or importance of this information may change in future contexts (for example, if an interlocutor dies). Like many anthropologists, over the course of my fieldwork I accumulated interview recordings, correspondence and other ethnographic data that, if traced back to my research participants, may harm their careers. However, conducting fieldwork in data centres, I found myself engaging with the hypothetical situation that if I were to store my ethnographic data in the ‘cloud’, it could be my interlocutors (or their bosses) who were handling this data. As fantastical as this scenario may be, I nevertheless found it ‘good for thinking’ reflexively about ethnographic writing, security and authority – the challenges and opportunities arising when interlocutors become the custodians of the ethnographic data being produced, the collapsing of the field diary/fieldsite distinction if I were to use cloud apps to write fieldnotes, etc.

      Offline storage also has the advantage of ensuring data is accessible without an Internet connection. I didn’t mean to suggest that campus IT offers a more secure data storage solution than the likes of Google, Apple and Microsoft. Rather, I meant that data stored on institutional servers should meet institutional security requirements (how robust these requirements are will depend on the institution). If data is stored outside of this network (whether offline or in the cloud) then researchers who are sponsored or funded by their universities will need to ensure that it meets their institution’s requirements. At the same time, institutional servers are often local, so if data locality is a concern then this could prove advantageous. It would be better to think of campus IT and cloud services as offering different forms of security that will address different needs depending on the fieldwork, the researcher and who is sponsoring or funding the research. Choosing a backup strategy really depends on local circumstances, the nature of the data, and the levels of risk appropriate for these.