See You Later, Thick Data – Part 4

This blogpost is part of the methodological series “See You Later, Thick Data – How we experimented with doing collaborative fieldwork as part of an interdisciplinary research project”. In this series, we, a group of anthropologically trained junior scholars, discuss some of the opportunities and challenges we faced when collecting ethnographic data in a week-long, interdisciplinary case study of the Danish democratic festival “The People’s Meeting”. We took a somewhat different approach from classic anthropological fieldwork, and in this series, we share our experiences with a highly preplanned, systematic, and collaborative way of collecting ethnographic data that can be integrated with other data types.

Computational Processing of Ethnographic Data

After a few intensive days in the field, you and your team have returned to the familiar settings of the university. In front of you, a big pile of observation schemes and seating charts from the field is waiting to be turned into one common spreadsheet. Luckily, most ethnographers appear to have carefully recorded audience attention in full accordance with the instructions. After having typed in the last crinkled seating chart, you finally have a full overview of all the recorded quantified attention behavior from the field. You log on to the Ethno-platform to fetch a file with all the fieldnotes from the festival and load it into a programming application. You swiftly extract all the notes that accompany your newly created spreadsheet. Overwhelmed by this huge corpus of fieldnotes and observations, you wonder: Which computational techniques would be most helpful to find patterns in these data?

Picture 6. Structuring ethnographic data

Computational Potential

Programming is, unfortunately, often presented as more complicated or math-demanding than it needs to be. In many ways, it is like learning the grammar of a new language. As soon as you know the basic rules for how to construct a sentence and conjugate your verbs, you can slowly begin to communicate. The same goes for programming languages: when you understand the syntax and learn the basic logic behind building up a “script”, you can execute simple code. And even with a few basic skills, you can benefit from programming tools when working with ethnographic data. In the field of social data science, there have been different suggestions for how computers can help process and analyze ethnographic text: some find the machines helpful when coding their material; some have entrusted them with the responsibility to automatically code large parts of their fieldnotes; while others have used text mining techniques to explore notes and interviews to find new themes or patterns that they hadn’t noticed before. These are just a few examples of how computational potential paves the way for new ways to analyze ethnographic data. So, how did we put computational power to good use? We wanted to use computational techniques for two things: to explore our ethnographic material and to combine it with other data types that we collected at the People’s Meeting.

Uniting Ethnographic Data Sources

During the few days the festival lasted, we compiled a ton of beautifully aligned fieldnotes. When accessing the Ethno-platform, the infrastructure allowed us simply to press a button to fetch a file that contained all of them. We loaded the file into a programming application and converted it to a spreadsheet. Now the fieldnotes were ready for computational processing. Imagine a spreadsheet where each row holds the data of a fieldnote, and the different columns separate the information and metadata related to that fieldnote (see Picture 7). Now, returning to the common format of our fieldnotes: each note was written following three formalities (see Post 2) and holds metadata about the described situation. Therefore, we could extract information by using these features with different search commands in the programming application. This meant that we could sort the data by date, time of day, ethnographer, and event tent, and we could fetch quotes and analytical comments. These can surely be helpful features for the initial data exploration, and our aspirations to computationally process our fieldnotes were slowly being realized. However, we also wanted to combine our systematized quantitative observations with the spreadsheet of fieldnotes.
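
To give a feel for what such search commands can look like, here is a minimal sketch in plain Python. It is not the code we ran, and it does not reflect the actual export format of the Ethno-platform: the column names (day, time, ethnographer, tent, note) and all values are assumptions invented for illustration.

```python
# Hypothetical fieldnotes: every column name and value is invented for illustration.
fieldnotes = [
    {"day": "Friday", "time": "14:00", "ethnographer": "B", "tent": "Tent 2",
     "note": "Several people check their phones during the opening remarks."},
    {"day": "Thursday", "time": "10:15", "ethnographer": "A", "tent": "Tent 1",
     "note": "The audience laughs and looks up at the stage."},
    {"day": "Friday", "time": "09:30", "ethnographer": "A", "tent": "Tent 1",
     "note": "Front rows lean forward; a few phones are out in the back."},
]


def filter_notes(notes, **criteria):
    """Keep the notes whose metadata match every column=value pair given."""
    return [n for n in notes if all(n.get(c) == v for c, v in criteria.items())]


def search_notes(notes, keyword):
    """Keep the notes whose text contains the keyword (case-insensitive)."""
    return [n for n in notes if keyword.lower() in n["note"].lower()]


def sort_notes(notes, *columns):
    """Sort notes by one or more metadata columns, in the order given."""
    return sorted(notes, key=lambda n: tuple(n[c] for c in columns))


# All Friday notes in chronological order, and all notes mentioning phones.
friday_notes = sort_notes(filter_notes(fieldnotes, day="Friday"), "time")
phone_notes = search_notes(fieldnotes, "phone")

for n in friday_notes:
    print(n["time"], n["tent"], n["ethnographer"], "-", n["note"])
```

Because every note carries the same metadata columns, the same three small functions can be combined freely, for example to list one ethnographer's notes from a single tent.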

As alluded to in the beginning of this post, we turned the piles of attention schemes and seating charts into a common spreadsheet. The next step was to merge it with our fieldnotes from the Ethno-platform. The result was one grand spreadsheet of all our ethnographic data. The columns contained the text from fieldnotes and metadata as well as different levels of attention and seating information at each event. And though the data from the seating charts and attention schemes were of a different kind, namely reduced quantitative measures of attention and presence, they were now merged with the accompanying descriptive (though also structured) fieldnotes in which our group of ethnographers had strived to capture attention dynamics in interactions during events. We were now finally piecing together the somewhat fragmented ethnographic puzzle.
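
The merge described above can be sketched as a simple join on a shared key. This is an illustration under assumptions, not our actual procedure: the event_id key, the column names, and the values are hypothetical, and the sketch assumes one row of quantitative observations per event.

```python
# Hypothetical data: event_id and all other column names are assumptions.
fieldnotes = [
    {"event_id": "E1", "ethnographer": "A", "note": "Lively debate, few phones out."},
    {"event_id": "E2", "ethnographer": "B", "note": "People fan themselves in the heat."},
    {"event_id": "E3", "ethnographer": "A", "note": "Small audience, attentive throughout."},
]
observations = [
    {"event_id": "E1", "seats_taken": 40, "attention_total": 32},
    {"event_id": "E2", "seats_taken": 55, "attention_total": 20},
]


def left_join(left, right, key):
    """Attach the columns of `right` to each row of `left` that shares `key`.

    Rows in `left` without a match are kept, with None in the added columns.
    Assumes at most one row per key value in `right`.
    """
    index = {row[key]: row for row in right}
    extra_cols = [c for c in (right[0] if right else {}) if c != key]
    merged = []
    for row in left:
        match = index.get(row[key])
        extras = {c: (match[c] if match else None) for c in extra_cols}
        merged.append({**row, **extras})
    return merged


master = left_join(fieldnotes, observations, key="event_id")
for row in master:
    print(row)
```

Keeping unmatched fieldnotes (here E3) rather than dropping them means that no qualitative material silently disappears when a seating chart or attention scheme is missing.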

Picture 7. A spreadsheet of fieldnotes from the Ethno-platform

From Potential to Beneficial

With the united ethnographic data, we could finally begin to experiment with computational techniques for analysis. After having discussed different ways we could approach this sort of dataset, we decided to start simply by visualizing the quantitative observations of attention. In Picture 8, we have plotted the audience’s attention levels for each observed event. From our master spreadsheet, we extracted all events observed on Friday at the People’s Meeting (vertical axis). We used our observations of how many audience members looked at their phones and at the stage throughout events to create a combined attention score for the two types of behavior, ranging from 0 to 10 for each 15 minutes of the event (horizontal axis). As each event lasted an hour and thus covered four 15-minute intervals, the maximum attention score for an entire event was 4 × 10 = 40.
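
The arithmetic behind the event totals can be sketched as follows. The event names and interval scores are invented, and the sketch takes the 0 to 10 interval scores as given rather than deriving them from the raw phone and stage counts. A missing interval is stored as None and simply left out of the sum, which is one possible way to handle incomplete recordings.

```python
MAX_PER_INTERVAL = 10    # each 15-minute interval is scored from 0 to 10
INTERVALS_PER_EVENT = 4  # an hour-long event holds four 15-minute intervals
MAX_TOTAL = MAX_PER_INTERVAL * INTERVALS_PER_EVENT  # 4 * 10 = 40

# Hypothetical interval scores per event; None marks an interval not recorded.
scores = {
    "Event A": [8, 7, 9, 8],
    "Event B": [6, 5, 4, 5],
    "Event C": [9, 9, None, None],
}


def total_attention(interval_scores):
    """Sum the recorded interval scores, skipping missing intervals."""
    return sum(s for s in interval_scores if s is not None)


totals = {event: total_attention(s) for event, s in scores.items()}

# Sort events from highest to lowest total, as in a sorted bar chart.
ranked = sorted(totals, key=totals.get, reverse=True)

# A rough text version of the chart: one '#' per attention point.
for event in ranked:
    print(f"{event:<8} {'#' * totals[event]:<{MAX_TOTAL}} {totals[event]}/{MAX_TOTAL}")
```

Note that an event with missing intervals will show a shorter bar even if the audience was attentive, so low totals are worth checking against the fieldnotes before they are interpreted.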

Picture 8 might not look very interesting at first glance, but visualizing ethnographic observations does bring potential: it can guide parts of our analysis and bring some transparency to analytical choices. Questions and surprises emerging from what we see in the visualization of attention during events could be explored further by diving into the related fieldnotes. We can for instance examine how the audience sustains attention over time in a political event, and we can hold this up against the theme discussed during the event and observations from fieldnotes.[1]

From the visualization, we could also see that the approximate fraction of people paying attention to the stage was relatively stable across time intervals and across events, with only small variations. Diving into the fieldnotes, we learned that at the event with the lowest score it was extremely hot around the stage where the event was held. Many in the audience were struggling with the heat, and instead of looking at the stage some were fanning themselves with magazines while others were focusing on ice cream they had bought before the event started.

Picture 8. Sorted bar chart of attention over time among audiences at different events

This was just one example of how we could explore our ethnographic data computationally. A possible next step could be to examine the differences between attention in the back and the front of the audience section, or to try to track temporal and spatial variation in attention at the festival site. Because we recorded metadata such as time and place for each observation, we can also merge spreadsheets holding other data types into our grand spreadsheet of ethnographic data. This could be data containing ticket sales for each event, tweets posted by event organizers, or maybe even weather data for each day during the festival. With the ethnographic data and metadata united in one spreadsheet and loaded into a programming application, such combinations become possible.
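
As a hedged illustration of such a combination, the sketch below attaches daily weather data to event rows through the shared day column and then averages attention per day. We did not carry out this analysis; the temperatures, attention totals, and column names are all invented.

```python
from statistics import mean

# Hypothetical rows from the master spreadsheet (all values invented).
events = [
    {"event_id": "E1", "day": "Thursday", "attention_total": 32},
    {"event_id": "E2", "day": "Friday", "attention_total": 20},
    {"event_id": "E3", "day": "Friday", "attention_total": 35},
]

# Hypothetical external data keyed by the same metadata value (the day).
weather_by_day = {
    "Thursday": {"max_temp_c": 19},
    "Friday": {"max_temp_c": 27},
}


def enrich(rows, lookup, key, columns):
    """Add the chosen columns from `lookup` to each row, matched on `key`.

    Rows without a match in `lookup` get None in the added columns.
    """
    enriched = []
    for row in rows:
        extra = lookup.get(row[key], {})
        enriched.append({**row, **{c: extra.get(c) for c in columns}})
    return enriched


combined = enrich(events, weather_by_day, key="day", columns=["max_temp_c"])

# Average attention per day, listed next to that day's temperature.
for day, weather in weather_by_day.items():
    day_totals = [r["attention_total"] for r in combined if r["day"] == day]
    print(day, weather["max_temp_c"], "°C, mean attention:", mean(day_totals))
```

The same pattern would work for ticket sales keyed by event or tweets keyed by time of day, as long as the external data shares a metadata column with the ethnographic spreadsheet.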

So now we’ve unfolded our methodological and to some extent experimental approach to ethnographic data collection in an interdisciplinary setting. In the coming post, we will move on to discuss thick versus broad data and the implications of the kind of data we ended up collecting.


[1] We could for instance see that some ethnographers didn’t record attention scores all four times during events, as bars were missing for some events. In the fieldnotes from these events, we learned that this was due to events starting or ending early or the ethnographer arriving late.