Textual Analysis & Technology: Information Overload, Part II
http://blog.commarts.wisc.edu/2015/07/22/textual-analysis-technology-information-overload-part-ii/
Wed, 22 Jul 2015 15:23:38 +0000
Post by Kyra Hunting, University of Kentucky

This post is part of Antenna’s Digital Tools series, about the technologies and methods that media scholars use to make their research, writing, and teaching more powerful.

In my last post, I discussed how I stitched together a system built primarily from simple office spreadsheet software to help me with the coding process used in my dissertation. As I moved into my first year post-diss with new projects involving multiple researchers and multiple kinds of research (examining not only television texts but also social media posts, survey responses, and trade press documents), I realized that my old methods weren't as efficient or as accessible to potential collaborators as I needed them to be. This realization started a year's worth of searching for a software solution that would help me with the different kinds of coding I found myself doing as I embarked on new projects. While I discovered a number of great qualitative research software packages, ultimately nothing was "just right."

The problem with most of the research-oriented software I found was that it is built on at least one of two assumptions about qualitative research: 1) that researchers have importable (often linguistic, text-based) materials to analyze, and/or 2) that researchers know what they are looking for or hoping to find. Both of these assumptions presented limitations when I tried to find the perfect software mate for my research.

The first software I tried was NVivo, a qualitative research software platform that emphasizes mixed media types and mixed methods. This powerful software was great in many ways, not the least of which was that it counted for me. I first experimented with NVivo for a project I am doing (with Ashley Hinck) looking at Facebook posts and tweets as a site of celebrity activism, and in this context the software has acquitted itself admirably. It allowed me to import the PDFs into the system and then code them one by one. I found the ability to essentially have a pull-down list of items to consider very convenient, and I appreciated that I could add “nodes” (tags for coding) as I discovered them and could connect them to other broader node categories.

Sample Node List from NVivo

The premise behind my dissertation had been to set up a system that allowed unexpected patterns to emerge through data coding, and I wanted to carry this into my new work. NVivo supports that goal well, counting how many of the 1,600+ tweets being coded were associated with each node and allowing me to easily see which codes were most common and which were rare. An effective query system allows researchers to quickly find all the instances of any given node (e.g., all tweets mentioning a holiday) or group of nodes (e.g., all tweets mentioning a holiday and including a photo). While the format of my data meant I wasn't able to use NVivo's very strong text-search query, its capability to search for text within large amounts of data, including transcripts, showed great potential. NVivo seemed to be the answer I was looking for, until I tried to code a television series.
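The node-counting and query logic described here can be illustrated with a small sketch. This is hypothetical Python that only mirrors the underlying idea, not NVivo's actual data model or API; the tweets and node names are invented:

```python
# Illustrative sketch of tag-based ("node") coding and querying,
# loosely mirroring the workflow described above. Not NVivo itself.
from collections import Counter

# Each coded item (e.g., a tweet) carries a set of nodes (tags).
coded_tweets = [
    {"id": 1, "nodes": {"holiday", "photo"}},
    {"id": 2, "nodes": {"holiday"}},
    {"id": 3, "nodes": {"charity", "photo"}},
]

# Count how many items are associated with each node.
node_counts = Counter(n for t in coded_tweets for n in t["nodes"])

def query(items, *required):
    """Return items coded with ALL of the required nodes."""
    return [t for t in items if set(required) <= t["nodes"]]

print(node_counts["photo"])                                        # 2
print([t["id"] for t in query(coded_tweets, "holiday", "photo")])  # [1]
```

Adding a new node mid-project is just a matter of tagging later items with it; the counter and query pick it up with no schema change, which is the flexibility the post describes.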

Sort for most frequent nodes from my project with Ashley Hinck

For social media, my needs had actually been relatively simple: I was marking whether any of a few dozen attributes were present in relatively short posts. With film and television, my needs increased. It wasn't as simple as x, y, or z being present; if x (say, physical affection) was present, I also needed to note how many times, between which people, and to add descriptive notes. This is not what NVivo is built to do. NVivo imagines that researchers are doing three different things as distinct and separate steps: coding nodes, searching text, and linking a single memo to a single source. NVivo is great at these things, and I expect it will continue to serve me well with survey and text-based data. But for the study of film and television shows, I found NVivo demanded that I simplify my questions in inappropriate ways. After all, in the complex audio-visual-textual space of film and television it isn't just that a zebra is present but whether it is live-action or animated, whether it talks or dances, how many zebras are around it, what sound goes with it, etc. Memos let me add notes, but only one memo per source, and the memos were awkwardly designed and hard to retrieve alongside the nodes.

I found that NVivo competitor Dedoose gave me a bit more flexibility in how I could code, but it did not handle my need to simply add episodes as codable items: I was unable to import episodes themselves, and merely typing in an episode's title and coding as I watched proved much harder than I expected. Like NVivo, Dedoose seems to imagine social scientists who work with focus groups, surveys, oral histories, and the like as its primary market. Trying to use Dedoose without an existing spreadsheet or set of transcripts to upload was unwieldy. In the analysis of film and television, coding while you collect data is possible, even desirable, but the notion that data collection and coding would be two separate acts was built into this system.

If Dedoose’s limitation was the notion of importable data, Qualtrics’ was the notion that I would have already decided what I would find. I quickly discovered that while Qualtrics was wonderful at setting up surveys about each episode and effectively calculated the results, it did not facilitate discovery. If, for example, I wanted to code for physical affection and sub-code for gentle, rough, familial, sexual, it could manage that well. But if I wanted to add which characters were involved, this too needed to be a list to select from. I couldn’t simply type in the characters’ names and retrieve them later. Imagine the number of characters involved in physical affection over six years of a prime-time drama and you can see why a survey list (instead of simply typing in the names) would quickly become unwieldy.

That is how I found myself falling back on enterprise software, this time the database software FileMaker Pro. FileMaker Pro doesn't do a lot of things. It doesn't let you search the text of hundreds of Word documents, it doesn't visualize or calculate data for you, and it doesn't automatically generate charts. But what it does do is give you a blank slate for the variety of types of information you need in each database and help you create a clear interface for inputting that information. Would I like to code using a set of check boxes indicating all the themes I have chosen to trace in a given episode? No problem! Need a counter to input the number of scenes in a hospital or police station? Why not?! Need to combine a checkbox with a textbox so I can note both what happened and who it happened to? Sure! And since it is a database system, finding all of the episodes (entries) with the items that were coded for is simple and straightforward. This ability to code external items in multiple ways, for multiple types of information, using multiple input interfaces proved invaluable. So did its ability to let me continue coding on an iPad as well as a laptop, which meant I could stream video on my computer at work or while traveling and code simultaneously.
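As a rough illustration of the kind of mixed-input record described here (a checkbox-style theme set, a counter, and free-text notes in one searchable entry), here is a toy sketch in Python. The field names and episode data are my own invented examples; FileMaker's actual layouts and find mode work differently:

```python
# Toy sketch of a mixed-input episode record: checkbox-style themes,
# a counter field, and free-text notes, all searchable as one entry.
# Illustrative only -- not FileMaker Pro's data model.
from dataclasses import dataclass, field

@dataclass
class EpisodeRecord:
    title: str
    themes: set = field(default_factory=set)    # checkbox-style codes
    hospital_scenes: int = 0                    # counter field
    notes: dict = field(default_factory=dict)   # what happened, and to whom

episodes = [
    EpisodeRecord("1x01", themes={"grief"}, hospital_scenes=4,
                  notes={"physical affection": "between the two leads"}),
    EpisodeRecord("1x02", themes={"grief", "faith"}, hospital_scenes=2),
]

# A "find" over the database: every episode coded with a given theme.
matches = [e.title for e in episodes if "faith" in e.themes]
print(matches)  # ['1x02']
```

The point of the design is that heterogeneous fields (sets, counts, prose) live side by side in one record, so a single find can combine them.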

FileMaker Pro has its limitations, too. It does not support collaboration with other coders easily unless everyone has access to the expensive FileMaker Server, and since I have just begun using FileMaker I may still find myself paying for a month of Dedoose here and there to visualize data I collected in FileMaker, or importing the notes from my database into NVivo to make a word tree. But at the end of the day, what characterizes textual analysis is its interpretive quality. The ability to add new options as you proceed, to combine empirical, descriptive, numerical, linguistic, and visual information, and to have a platform that evolves with you is invaluable.

While I didn't find the perfect software solution, I found a lot of useful tools and I discovered something important: as powerful as the qualitative research software out there currently is, no software is well suited to textual analysis. The textual analysis that media studies researchers do creates unique challenges. While transcripts of films and television shows can be easily imported (if they can be obtained), the visual and aural elements of these texts are essential, and so many researchers in this area will want to code items without importing them as transcripts into the software. Furthermore, the different ways to approach media (counting things, looking for themes, describing aesthetic elements) necessitate multiple ways to input and retrieve information (similar to Elana Levine's discussion about incorporating thousands of sources in multiple formats for historiographical purposes). The potential need to have multiple people coding television episodes or films requires a level of collaboration that is not always easily obtained outside of social-science-oriented software like Qualtrics. Early film studies approaches often combined reception with description, and these two actions remain important in contemporary textual analysis. Textual analysis requires collecting, coding, analyzing, and experiencing simultaneously (particularly given the difficulty of going back to retrieve a moment from hours and hours of film or television). It is an act of multiplicity, experiencing what you watch in multiple ways and recording the information in multiple ways, that current software does not yet facilitate. The audio-visual text requires a different kind of software, one that does not yet exist: one that would not only allow for all these different kinds of input and analysis but also let you easily associate codes with timestamps, count shots or scene lengths, and link them with themes.
While the perfect software is not out there, I found that combining software like FileMaker Pro, NVivo, Dedoose, and simple tools like Cinemetrics could still help me dig more deeply into media texts.

Textual Analysis & Technology: In Search of a Flexible Solution, Part I
http://blog.commarts.wisc.edu/2015/07/15/textual-analysis-technology-in-search-of-a-flexible-solution-part-i/
Wed, 15 Jul 2015 16:08:45 +0000
Post by Kyra Hunting, University of Kentucky

This post is part of Antenna’s Digital Tools series, about the technologies and methods that media scholars use to make their research, writing, and teaching more powerful.

2,125… or was that 2,215? When working on my dissertation, a question that came up again and again when I said I was trying to look at the entire runs of several television series was "how many episodes did you look at total?" It was a perfectly reasonable question, and yet one whose exact answer I often wasn't sure of when asked. After a certain point, what was another 50 episodes or so? If I couldn't easily remember the number of episodes I was looking at, I knew remembering the details of each one wasn't going to be possible. As a result, finding a way to code and take notes on the shows I was examining, and to make those notes searchable later, was one of the first steps I took during my dissertation process. Four years later, as the research approach I developed in my dissertation has become increasingly important to my work, I am still in search of the perfect software.

When I began looking for a solution for my dissertation, I ran into three problems that I suspect are pretty common: 1) I had never worked on a project that size and was not aware that there were software solutions out there; 2) the software I had heard of could cost several hundred dollars; and 3) (most importantly) I wasn't sure exactly what I was looking for. My dissertation began largely from an interest in finding a different way to approach television texts and in investigating how the form of different television genres intersected with a number of different themes of representation. As a result, when I sat down with my first stack of teen drama DVDs to code, I didn't know quite what I wanted to code. It was through the process of coding, thinking about what information I would need and want to be able to go back to, that I learned what I was looking for. Only after six episodes did I realize that something like acts of physical affection was something I wanted to code, with a simple y/n and character names. It turned out that a shorthand for demographic information (e.g., WASM for White Adult Straight Man) would be important for medical dramas and crime dramas to denote the demographics of criminals, victims, suspects, and patients, although it had been entirely unnecessary for teen dramas. Coding, for me, was a learning process: something that recorded information, made it accessible, and helped me discover what I was looking for. That process of discovery through research certainly won't be foreign to most academics. After all, there is joy in finding that unexpected piece of the puzzle in an archive or watching a focus group coalesce in an unexpected way. However, as I have found half a dozen or more software demos later, that is not quite how most academic research software works. Most of the software I experimented with wanted me to know what I was looking for, or at least to already have what I was looking at (i.e., interview transcripts, survey results) in a concrete form.
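A shorthand like WASM can be expanded mechanically, one letter per attribute in a fixed order. The letter tables below are my own illustrative assumptions, not the author's actual code book:

```python
# Hypothetical expansion of a demographic shorthand such as "WASM"
# (White Adult Straight Man). The letter tables are invented examples,
# one table per position: race, age, orientation, gender.
RACE = {"W": "White", "B": "Black", "L": "Latino", "A": "Asian"}
AGE = {"A": "Adult", "T": "Teen", "C": "Child", "E": "Elderly"}
ORIENTATION = {"S": "Straight", "G": "Gay", "B": "Bisexual"}
GENDER = {"M": "Man", "W": "Woman"}

def expand(code):
    """Expand a four-letter code in the order race, age, orientation, gender."""
    r, a, o, g = code
    return " ".join([RACE[r], AGE[a], ORIENTATION[o], GENDER[g]])

print(expand("WASM"))  # White Adult Straight Man
```

Because each letter's meaning depends on its position, the same letter can safely mean different things in different slots (e.g., "B" as Black in the first position but Bisexual in the third).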

Because of this core issue (the fact that how much information, what information, and what kind of information I needed was constantly evolving), I found then, and again three years later, that it was enterprise (read: business) rather than academic software that best suited my needs. During my dissertation it turned out to be the relatively straightforward Numbers spreadsheet software that did the job. For each genre, I would set up a different spreadsheet with the unique sets of information I needed for that genre. For example, for crime dramas there would be a column for each of the following: demographic(s) of victim(s), demographic(s) of perpetrator(s), demographic(s) of suspect(s), motive, outcome, religion, non-heteronormative sexuality, gender themes, police behavior, and the nebulous "notes" section that inspired new columns and code shorthands as things evolved.

What made Numbers work was that I was transparently typing in words, the shorthand I evolved to stand in for the boxes on a traditional "coding" sheet, and numbers (episode numbers, number of patients, etc.). I could always change what I coded and how. Every few episodes, I would ask, 'Is there something important and new I want to track?' If there was, I could word-search my notes and assign them shorthands; so, as time went on, I needed fewer notes and could shorthand much more of my fiftieth ER episode's notes than I could my fifth. The spreadsheets seemed disorderly and overwhelming to my partner when he peeked at my work (see image below), but they had the advantage of elasticity, changing as I learned what I was doing and what I was looking for.

[Screenshot: a sample coding spreadsheet in Numbers]

Numbers didn't make assumptions (as a lot of more powerful software does) about what information I would be inputting and how I would use it. Therefore, when it came time to sort that information, it also lent itself well to finding the relevant episodes and connections. The filter function allowed me to pick any column and any search term and would show me only the rows (episodes) that were relevant. Every episode that contained the word "jealousy" in the motive column but not the words "anger" or "angry," and that carried the religion code "CH" (for Christian), was only a few filter clicks away.
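A filter like this is, in effect, a boolean test applied to each row of the spreadsheet. A sketch of the same logic in Python, with invented rows:

```python
# Sketch of the spreadsheet filter described above: motive mentions
# "jealousy" but not "anger" or "angry", and the religion code is "CH".
# The rows are invented for illustration.
rows = [
    {"episode": "2x04", "motive": "jealousy over an inheritance", "religion": "CH"},
    {"episode": "2x07", "motive": "jealousy and anger", "religion": "CH"},
    {"episode": "3x01", "motive": "jealousy", "religion": ""},
]

def keep(row):
    """True only for rows passing every filter condition."""
    motive = row["motive"].lower()
    return ("jealousy" in motive
            and "anger" not in motive
            and "angry" not in motive
            and row["religion"] == "CH")

print([r["episode"] for r in rows if keep(r)])  # ['2x04']
```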

Like Elana Levine, I found that the software that was available couldn’t do the whole job itself. Numbers didn’t really recognize the information I was putting in as something it should count, so if I wanted to know how many white male victims of crimes there were (hint: a lot) I was on my own to physically count them up. As a result I discovered that Zotero, a research material collection system (similar to Scrivener) that I had been using for reading notes and collecting PDFs also helped me analyze those thousands of episodes. After filtering the information using Numbers, I would create files in Zotero where I would list all the episode numbers that discussed Buddhism, or in which a lesbian character appeared, or in which a patient died. I’d then count up the numbers of episodes in a given category. Because Zotero was so searchable, it made it quick and easy to find all the “important themes” a given episode dealt with and calculate all kinds of relationships that I hadn’t originally expected to look at (percentage of patient deaths that were pregnant women? Alcoholics? Coming right up!).
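The counting step described here (tallying filtered episode lists and deriving relationships between them) amounts to simple set arithmetic. A sketch with invented episode numbers:

```python
# Sketch of tallying episode lists and computing a relationship,
# e.g., the share of patient deaths involving pregnant women.
# The episode numbers are invented for illustration.
patient_deaths = {"1x03", "1x09", "2x05", "2x11"}
pregnant_patients = {"1x09", "2x02", "2x11"}

# Episodes appearing on both lists, and their share of all deaths.
overlap = patient_deaths & pregnant_patients
share = len(overlap) / len(patient_deaths) * 100

print(f"{share:.0f}% of patient deaths involved pregnant women")  # 50%
```

Keeping each category as its own list of episode numbers, as the post describes doing in Zotero, means any two categories can be intersected after the fact, even for questions not anticipated during coding.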

Spreadsheets and a digital version of a filing cabinet (my best way of describing Zotero) are not necessarily the high-tech solutions I might have initially sought, but their content agnosticism and searchability made them perfect fits for the work I was doing at the time. Just the other day I pulled up one of my old spreadsheets looking for the sort of thing I hadn't coded but likely would have kept in the episode notes, and found an episode of a medical drama featuring an elementary school teacher in mere moments. When I started my new job and embarked on new research projects, including those that required collaboration, I started to feel that spreadsheets just wouldn't do the job anymore and went in search of the perfect software. One year, several meetings with my college's IT staff, and quite a few demo downloads later, I still haven't found it. My new, better spreadsheet alternative has turned out to be yet another business solution: FileMaker Pro. The shoe still doesn't quite fit, but more on that later (stay tuned for Part II next week).

While I might not have discovered the perfect piece of software, what I have discovered is that the creative use of open-ended software can serve the study of texts well. However, the available research software is not yet designed for the diversity of information, the multiplicity of data-input types, and the unique twists and turns that accompany the study of media texts.
