Media History Digital Library – Antenna
http://blog.commarts.wisc.edu
Responses to Media and Culture

Teaching with Arclight and POE
http://blog.commarts.wisc.edu/2015/10/12/teaching-with-arclight-and-poe/
Mon, 12 Oct 2015 18:30:22 +0000

Project Arclight began two years ago with an idea: If researchers can use Twitter analytics to study trends in discussions of contemporary media, then what if we treated historic trade papers and fan magazines like a giant Twitter stream and explored trends in film and media history? We worked on refining this idea, received a grant, kept working on it, and just today publicly launched our software at http://search.projectarclight.org.

Arclight searches the nearly 2 million pages of the Media History Digital Library (MHDL) and graphs the results. To provide one example (very much inspired by this month’s baseball postseason, football season, and basketball preseason), here is a visualization of how those three sports trend across the MHDL corpus. Note: Because the MHDL’s collections primarily encompass out-of-copyright works, the results largely cut off after 1964.

Arclight sports line graph -- raw page count

The team that developed the Arclight software — who are acknowledged at the end of this post — are working on a series of journal articles that model how Arclight and the method of Scaled Entity Search can be applied toward investigating large-scale research questions. However, we also hope that Arclight will be valuable as a classroom tool for teachers of film and broadcasting history, especially those teachers keen to expose students to digital humanities methodologies and engage them in active learning. Here are a couple of suggestions for film and media educators about how to use Arclight with your students.

The POE Strategy

For over twenty years, the POE strategy (which stands for Predict, Observe, Explain) has been a highly effective teaching method in the sciences. After being presented with a set of circumstances, students are asked to predict what will happen, observe an experiment, and then explain why it happened and compare their prediction to the outcome. When implemented well, it’s an exercise that actively engages 100% of students in the classroom — not just the two or three who might raise their hands to answer a question that the teacher asks aloud.

The POE strategy is ideal when used with science experiments that play out relatively quickly. What will happen when we mix these two chemicals together? Make your prediction, then observe and explain why the result occurred. But POE can be challenging to implement in a history classroom. If a teacher says, “and guess what happened next?,” then it can certainly facilitate student prediction. But it also perpetuates two unfortunate dynamics. First, the instructor, who reveals the correct answer, becomes reinforced as the authority on the historical record. Second, history is presented as a fixed narrative, rather than as a set of assumptions and arguments that we are always challenging using the available evidence. We need ways to more actively engage students in their learning. And active learning, in a film and media history classroom, means that students get to spend class time doing the work of a film and media historian.

Arclight offers one means of integrating POE and active learning into a film or media history classroom. To use my earlier graph example, a teacher might ask, “How did the discourse of sports change from 1900 to 1960 in books and magazines about American entertainment and media?” Students could write down their predictions, then get to work on their computers or phones running queries for baseball, basketball, football, and other terms in Arclight. Something might immediately jump out at them. For me, it was the decline of both baseball and football from 1942 through 1945. Based on this observation, I would offer the explanation that this decline occurred due to the impact of World War II and the enlistment of athletes into the armed forces.

But really, this explanation based on distant reading is a new prediction that invites closer inspection, observation, and analysis. Put another way, Arclight is best used with the POEPOE or POEPOEPOE strategy. Any explanation a student offers can and should be further tested. Does the rise of football in the late 1920s and 1930s have more to do with radio coverage or football’s popularity in short films? We need to dig deeper to find the answer.

In developing Arclight, we felt it was important to give users the ability to easily and fluidly access the underlying texts. We were able to achieve this by integrating Arclight with the MHDL’s search engine, Lantern. Students can click through the Arclight graph and access the underlying materials within Lantern. Teachers will also want to encourage students to consult primary sources that are NOT indexed within Lantern, like archival manuscript collections and historical newspapers. Still, Arclight and Lantern provide a fast, user-friendly way for students to actively engage in historical research and analysis within a classroom.

The SES Interpretive Triangle and Changing Graph Views

There is another interpretive method that teachers may want to consider alongside POE.

The line graphs in Arclight are not arguments. They are simply visualizations of how many MHDL pages a given term appears in per year. To help students more fully think through what they are seeing, teachers might ask them to think about the relationships between the terms they are searching, the books and magazines they are searching within, and the impact of digitization and search algorithms within the process. We call this the Scaled Entity Search Interpretive Triangle. And if this sounds big and confusing, see Kit Hughes’ blog posts on the Scaled Entity Search (SES) technical method and interpretive method.

On an interactive level, students can change their visualization and reflect on the corpus and digitization by clicking on the dotted line icon and/or percentage icon. The dotted line icon graphs the MHDL’s entire corpus, revealing how some years have far more pages indexed than other years. The percentage icon helps correct for this by “normalizing” the data that is being visualized: it divides the number of page hits per year by the total number of pages per year. In other words, the normalization feature accounts for the fact that more pages were scanned from some years than others. And in the case of the sports visualization, the trend lines change quite a bit, especially in the stability of the lines from the late 1940s through the mid-1950s:

Arclight sports line graph -- normalized
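
The normalization described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (page hits per year divided by total pages per year), not Arclight's actual code; the function name and the counts are invented for the example.

```python
def normalize_hits(hits_per_year, pages_per_year):
    """Return, for each year, the share of indexed pages that match a term."""
    normalized = {}
    for year, hits in hits_per_year.items():
        total = pages_per_year.get(year, 0)
        # Skip years with no indexed pages to avoid dividing by zero.
        if total > 0:
            normalized[year] = hits / total
    return normalized


# Made-up counts for illustration only; these are not real MHDL figures.
hits = {1941: 300, 1943: 150, 1950: 600}
pages = {1941: 30000, 1943: 15000, 1950: 60000}

shares = normalize_hits(hits, pages)
for year in sorted(shares):
    print(year, hits[year], f"{shares[year]:.2%}")
```

In this invented data the raw hit counts halve and then quadruple, yet the normalized share is the same in all three years, because the number of scanned pages moved in step with the hits. That is the kind of artifact the percentage view is meant to expose.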

Ultimately, no visualization is perfect, nor should it be. By offering users a variety of visualization options and the ability to access the underlying data, we hope to drive home the understanding that all graphs are incomplete abstractions and best used as jumping-off points into further analysis. Yet they are valuable precisely because they may lead us toward analyses and questions that we otherwise would never have considered. And we hope that you and your students may even have fun creating, changing, and playing with these visualizations.

Please give Arclight a try with your students and let us know how it goes. We hope that it allows students to playfully engage in historical exploration and come away with valuable lessons about digital technology too. We are all living in a big data world. We have long trained our students in how to closely read a singular text. We need to complement this with more teaching activities that encourage analyzing many texts at a large scale — and dealing with all the uncertainty and messiness that goes along with it.

Acknowledgements:

This project was developed by teams at the University of Wisconsin-Madison and Concordia University and sponsored by a Digging into Data grant from the U.S. Institute of Museum and Library Services and Canada’s Social Sciences and Humanities Research Council. Additional support came from the University of Wisconsin-Madison’s Office of the Vice Chancellor for Research and Graduate Education and Concordia University’s Media History Research Centre.

The Arclight Software Development Team is Comprised of:

Project Directors: Charles Acland and Eric Hoyt

Interface Design and Programming: Kevin Ponto and Alex Peer

Search Index Development: Eric Hoyt, Kit Hughes, Derek Long, Peter Sengstock, Tony Tran

Thank you also to the broader team and community who contributed to Project Arclight.

One final thanks…

The author wishes to thank the Madison Teaching and Learning Excellence (MTLE) program. Without MTLE, this media scholar would never have learned about POE or adopted the strategies of active learning.

Anne Friedberg, Innovative Scholarship, and Close Up (1927-1933)
http://blog.commarts.wisc.edu/2014/01/24/anne-friedberg-innovative-scholarship-and-close-up-1927-1933/
Fri, 24 Jan 2014 16:45:29 +0000

I feel deeply grateful and honored that Lantern, the search and visualization platform for the Media History Digital Library that I designed and produced, will receive the Anne Friedberg Innovative Scholarship Award at the 2014 Society for Cinema & Media Studies Conference (SCMS) in Seattle.

All awards are nice, but this one means a lot to me personally. It means a lot because Lantern was the result of two years of hard work with an amazing team of collaborators, including David Pierce, Carl Hagenmaier, Wendy Hagenmaier, Andy Myers, Joseph Pomp, Derek Long, Anthony Tran, Kit Hughes, and Pete Sengstock. We put up the site with the hope that others would find it valuable. The SCMS Awards Committee has given us something incredibly valuable in return — validation from the field and a stamp of “post peer review.”

As a pre-tenure university professor, I find the distinction of this award especially meaningful. The Friedberg Award has gone to a book every year since its inception in 2011. By granting the award to Lantern, SCMS is telling my tenure committee that it needs to seriously consider the scholarly contribution of my digital work. Beyond simply helping my own career, I think the award holds the potential of advancing the field of Film & Media Studies as a whole. I hope it inspires graduate students and other scholars to undertake ambitious digital projects.

More than any other reason, though, this recognition means a lot because of Anne Friedberg.

I was fortunate to have known Anne Friedberg, though it was for far too brief a time. When I began my PhD studies at the University of Southern California (USC), Friedberg was the Chair of the Division of Critical Studies in USC’s School of Cinematic Arts. The phrase “visionary leader” should not be tossed around lightly, but it perfectly described Anne. She had a dynamic energy that inspired everyone in the department, including me. She had a clear vision for where she wanted the department and entire field to go: toward an inclusive yet rigorous study of media and the moving image across different forms, cultures, and historical periods. Yet like most great leaders, she was also a great listener. She took the concerns of graduate students and faculty members seriously when it came time to make decisions and set an agenda. Her illness and death from cancer in 2009 were a devastating loss for the department and SCMS (an organization she was preparing to lead as the President Elect).

Intertwined with Anne Friedberg the visionary leader, there was Anne Friedberg the intellectual, scholar, and writer. Anne’s curiosity was boundless. She is best remembered for two books, Window Shopping: Cinema and the Postmodern (1993) and The Virtual Window: From Alberti to Microsoft (2006), and her contributions to the study of film theory, film and modernity, and visual culture. But she could speak intelligently about anything with a connection to media. I came to USC’s PhD program after a three-year stint of working in the Los Angeles film industry. It was a delight, therefore, to discover that my department’s chair, a woman with a reputation as a “theory person,” had a deep knowledge of the contemporary entertainment industry. We talked in depth about the WGA strike and challenges involving compensation and the definition of what constitutes “new media.” Anne’s knowledge of the contemporary industry came in large part from her partner, who was a professional screenwriter, but her analysis of the industry was entirely her own. She was capable of seeing the big picture, dissecting it, and finding the connections between “theory” and “industry” that the rest of us had missed.

Anne Friedberg was also a believer in digital scholarship. She collaborated with Erik Loyer on developing The Virtual Window Interactive, a web-based experience that extends the argument of her book, The Virtual Window, by playfully inviting users to juxtapose content, viewing windows, and spectator positions (I’m especially fond of viewing the 1902 film The Gay Shoe Clerk on the flip phone as the idealized contemporary male viewer). The Virtual Window Interactive was published in Vectors, the innovative and important multimedia journal edited by two of Friedberg’s USC colleagues, Tara McPherson and Steve Anderson. Without the examples of Friedberg’s, McPherson’s, and Anderson’s digital scholarship, I don’t think I would have ever thrown myself into working on Lantern and the Media History Digital Library in the way that I did.

I hope that Anne Friedberg would have liked Lantern’s interface. However, I know for a fact that she would have liked many of the digitized publications. Anne understood that magazines about media were simultaneously important historical documents and media objects themselves. In the late 1990s, Friedberg co-edited the anthology Close Up (1927-1933): Cinema & Modernism, which curated selections from the important film magazine Close Up with accompanying introductions and analyses by Friedberg and co-editors James Donald and Laura Marcus.

It is my great pleasure to announce that, as of today, the complete 1927-1933 run of Close Up is accessible at the MHDL and completely searchable within Lantern. You can find it on the MHDL’s homepage and Global Cinema Collection. The magazine was scanned and sponsored by the Library of Congress Packard Campus for Audio-Visual Conservation.

Close Up was a hybrid publication in many ways — an English-language periodical, which was published in Switzerland, bridging the art, literary, and film worlds. Edited by Bryher and her husband Kenneth Macpherson, Close Up became the magazine for energetic debates about the nature of cinema and manifestos imagining new forms of filmmaking and spectatorship. The magazine published articles by filmmakers, such as Sergei Eisenstein, and female modernist writers, such as H.D. and Gertrude Stein. As Friedberg explains, “Close Up became the model for a certain type of writing about film — writing that was theoretically astute, politically incisive, critical of films that were simply ‘entertainment.’ For six and a half years, Close Up maintained a forum for a broad variety of ideas about the cinema; it never advocated a single direction of development, but rather posed alternatives to existing modes of production, consumption, and film style.” Like Friedberg’s own books, Close Up continues to be essential reading for anyone interested in the history of film and media theory.

I strongly recommend paging through the digitized run of Close Up alongside a cup of coffee and Friedberg’s essays from the Close Up (1927-1933): Cinema & Modernism anthology. Friedberg offers a well-researched historical account of the magazine’s publication. She calls our attention to Close Up’s small 5 1/2 inch by 7 3/4 inch size, the sort of material detail that is all too easy to forget when reading Close Up’s articles in reprinted text form or online in the MHDL.

What I love most about Friedberg’s Close Up essays is that she doesn’t simply tell us about the magazine. Instead, she goes on to model how we might read Close Up and discover connections between a 1928 magazine and our contemporary media experience. One key aspect of Anne’s reading process is to seize onto some fascinating detail in the text. This detail becomes the first node in what will become an entire network, with edges or connections that bridge past and present and history and theory. In her essay “Reading Close Up,” for example, Friedberg calls our attention to the curious wording and typographical choices in the following advertisement:

Bound volumes of Close Up are collector’s books, and should be in the possession of all followers of the cinema. With much that is exclusive and unobtainable elsewhere, they will be undoubtedly of the greatest value as

REFERENCE BOOKS FOR THE FUTURE

as well for the present. The theory and analysis constitutes the most valuable cinematographic development that has yet been made.

That phrase written in all-caps — “REFERENCE BOOKS FOR THE FUTURE” — becomes the central node that Friedberg uses to build and illuminate the rest of the network. The phrase also captures Friedberg’s conviction that studying the history of media is vital if we hope to understand its present and future.

Since receiving the news about the SCMS award and revisiting Anne’s writing on Close Up, I have begun looking at all the magazines within Lantern differently. My attention has shifted away from highlighted text snippets and a linear reading of the articles. Instead, I’m finding myself fascinated and pulled toward classified ads, mastheads, and advertisement designs. I’m still reading the articles, but I am coming at them from new angles and with new questions.

I encourage others to try using Lantern and reading the magazines like Anne Friedberg too. The 2014 Anne Friedberg Innovative Scholarship Award-winner feels richer when you look at it like Anne Friedberg. Really, the whole world feels richer when you look at it like Anne Friedberg.

Anne, you are still leading us forward.

Let’s talk about search: Some lessons from building Lantern
http://blog.commarts.wisc.edu/2013/08/14/lets-talk-about-search-some-lessons-from-building-lantern/
Wed, 14 Aug 2013 18:32:09 +0000

This week, Lantern reached its first wide public.

Lantern is a search and visualization platform for the Media History Digital Library (MHDL), an open access digitization initiative that I lead with David Pierce. The project was in development for two years, and teams from the MHDL and UW-Madison Department of Communication Arts collaborated to bring version 1.0 online toward the end of July. After a whole bunch of testing, we decided that the platform could indeed withstand the scrutiny of the blogosphere. It’s been a pleasure to see that we were right. We’re grateful for supportive posts from Indiewire and David Bordwell and web traffic surpassing anything we’ve experienced before.

I will leave it in the capable and eloquent hands of David Bordwell to explain what the searchability of the MHDL’s 800,000 pages of books and magazines offers to film and broadcasting historians. In this Antenna post, I wanted to more broadly touch on how the search process works. I will address visualization more fully in another post or essay.

We run searches online all the time. Most of us are inclined to focus on the end results rather than the algorithms and design choices that take us there. Cultural studies scholars such as Alexander Halavais have offered critical commentary on search engines, but it wasn’t until I began developing Lantern in 2011 that I bothered to peek under the hood of a search engine for myself. Here are five lessons I learned about search that I hope will prove useful to you too the next time you search Lantern or see a query box online.

1. The collection of content you are searching matters a lot.

It would have been great if the first time Carl Hagenmaier, Wendy Hagenmaier, and I sat down to add fulltext search capability to the MHDL’s collections we had been 100% successful. Instead, it took a two-year journey of starts, stops, and reboots to get there. But in other ways, it’s a really good thing that we initially failed. If we had been successful in the Fall of 2011, users would have only been able to search a roughly 100,000-page collection comprised primarily of The Film Daily, Photoplay, and Business Screen. Don’t get me wrong, those are great publications. And we now have many more volumes of Photoplay and The Film Daily than we did back then. But over the last two years, our collections have boomed in breadth and diversity along with size and depth. Thanks to our partnerships with the Library of Congress Packard Campus, Museum of Modern Art Library, Niles Essanay Silent Film Museum, Domitor, and others, we have added a tremendous number of magazines, broadcasting, early cinema journals, and books. In 2011, a search for “Jimmy Stewart” would have probably resulted in some hits from the fan magazine Photoplay (our Film Daily volumes at that time didn’t go past 1930). Today, the Lantern query “Jimmy Stewart” yields 407 matching page hits. Take a look at the top 10 results ranked by relevancy. Sure enough, 5 of the top 10 results come from Photoplay. But there are also matching pages from Radio and TV Mirror, Independent Exhibitors Film Bulletin, and International Projectionist, all sources where a James Stewart biographer probably would not think to look. And who would guess that International Projectionist would refer to the star with the casual “Jimmy”? These sorts of discoveries are already possible within Lantern, and as the content collection further expands, there will only be more of them.

2. Always remember, you are searching an index, not the actual content.

This point is an important caveat to the first point. Content matters, but it is only discoverable through an index, which is itself dependent upon the available data and metadata. A search index is a lot like the index at the back of a book: it stores information in a special way that helps you find what you are looking for quickly. A search engine index, like the open source Solr index that Lantern uses, takes a document and blows it apart into lots of small bits so that a computer can quickly search it. Solr comes loaded with the search algorithms that do most of the mathematical heavy lifting. But as developers, we still had to decide exactly what metadata to capture and how to index it. In my “Working Theory” essay co-written with Carl and Wendy, I’ve described how MARC records offered insufficient metadata for the search experience our users wanted. In this post, what I want to emphasize is that if something isn’t in the index, or if the index doesn’t play nicely with the search algorithms, then you won’t have a happy search experience. Lesson #3 should make this point clearer.
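
To make the "index at the back of a book" idea concrete, here is a toy inverted index in Python. It is a minimal sketch of the general technique, not how Solr is implemented internally; the page identifiers and page texts are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map every token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Invented page texts, used only to exercise the functions.
pages = {
    "page-1": "Jimmy Stewart signs for a new picture",
    "page-2": "Projection booth notes for exhibitors",
    "page-3": "Radio appearance by Jimmy Stewart announced",
}
index = build_index(pages)
matches = search(index, "Jimmy Stewart")
print(sorted(matches))
```

The search never rereads the page texts; it only consults the token-to-page mapping. Anything that was not captured when the index was built cannot be found later.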

3. Search algorithms are designed for breadth and scale, so don’t ask them to search in depth.

Open source search algorithms are better at searching 2 million short documents, each containing 500 words of text, than at searching 500 very long documents containing 200,000 words each. I learned this lesson the hard way. At the Media History Digital Library, we scan magazines that have been collected and bound together into volumes. So in our early experiments with Lantern, we turned every volume into a discrete XML file with metadata fields for title, creator, date, etc., plus the metadata field “body” where we pasted all the text from the scanned work. Big mistake. Some of the “body” fields had half a million words! After indexing these XML documents, our search speed was dreadfully slow and, worse yet, the results were inaccurate or only partially accurate. In some cases, the search algorithms would find a few hits within a particular work and then time out without searching the full document. The solution — beautifully scripted in Python by Andy Myers — was to turn every page inside a volume into its own XML document, then index all 800,000 MHDL pages as unique documents. This is the only way we can deliver the fast, accurate search results that you want. But we also recognize that it risks de-contextualizing the single page from the larger work. We believe the “Read in Context” option and the catalog descriptions offer partial answers to this challenge of preserving context, and we’re working on developing additional methods too.
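
The page-splitting step can be sketched roughly as follows. This is not Andy Myers's script; it is a hypothetical illustration that assumes a volume arrives as a list of page texts plus a small metadata dictionary. The title, creator, date, and body fields come from the description above, while the id scheme, the page field, and the function names are assumptions made for the example. The add/doc/field wrapper follows Solr's XML update message format.

```python
import xml.etree.ElementTree as ET


def page_to_xml(volume_meta, page_number, page_text):
    """Build one Solr-style XML document string for a single page."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    fields = {
        # Hypothetical id scheme: volume id plus zero-padded page number.
        "id": f"{volume_meta['id']}_{page_number:04d}",
        "title": volume_meta["title"],
        "creator": volume_meta["creator"],
        "date": volume_meta["date"],
        "page": str(page_number),
        # Only this page's text goes in "body", not the whole volume.
        "body": page_text,
    }
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


def volume_to_page_docs(volume_meta, page_texts):
    """Turn one bound volume into a list of per-page XML documents."""
    return [
        page_to_xml(volume_meta, number, text)
        for number, text in enumerate(page_texts, start=1)
    ]


# Placeholder metadata and text, not a real MHDL record.
volume = {
    "id": "examplevolume",
    "title": "Example Magazine",
    "creator": "Example Publisher",
    "date": "1930",
}
docs = volume_to_page_docs(volume, ["First page text", "Second page text"])
print(docs[0])
```

Each page repeats the volume-level metadata, which is what lets a page-level hit still be traced back to its source volume.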

4. Good OCR matters for searchability, but OCR isn’t the whole story.

You don’t need OCR (optical character recognition) to search a blog or docx Word file. Those textual works were born digital; a computer can clearly see whether that was an “a” or “o” that the author typed. In contrast, Moving Picture World, Radio Mirror, and the MHDL’s other books and magazines were born in the print age. In order to make them machine readable, we need to run optical character recognition, a process that occurs on the Internet Archive’s servers using ABBYY FineReader. ABBYY has to make a lot of guesses about particular words and characters. We tend to scan works that are in good condition at a high resolution, and this leads to ABBYY making better guesses and the MHDL having high quality OCR. Nevertheless, the OCR isn’t perfect, and the imperfections are immediately visible in a snippet like this one from a 1930 volume of Film Daily: “Bette Davis, stage actress, has been signed by Carl Taemmle. Jr.” The snippet should say “Carl Laemmle, Jr.” That is the Universal executive listed on the page, and I wish our database model enabled users to log in and fix these blemishes (hopefully, we’ll get to this point in 2014). But (you may have guessed there was a but coming) our search algorithms use some probabilistic guessing and “stemming,” which splinters individual words and allows your query to search for related words (for instance, returning “reissue” and “reissuing” for a “reissue” query). The aggressiveness of stemming and probabilistic word guessing (aka “fuzziness”) is something that developers can boost or turn down. I’m still trying to flavor Lantern’s stew just right. The big takeaway point, though, is that you’ll quickly notice the OCR quality, but other hidden processes are also shaping your results.
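
A rough sense of what stemming and fuzzy matching do can be had from the toy Python below. Both functions are deliberately simplified stand-ins: the suffix list is an invented shortcut rather than the stemmer Solr ships with, and the fuzzy match is a plain edit-distance check with an assumed threshold of one edit.

```python
def naive_stem(word):
    """Strip one common English suffix (a toy rule, not a real stemmer)."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "e", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits from a to b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete a character from a
                    current[j - 1] + 1,      # insert a character into a
                    previous[j - 1] + cost,  # substitute (or keep) a character
                )
            )
        previous = current
    return previous[-1]


def fuzzy_match(query, candidate, max_edits=1):
    """Treat two words as a match if they differ by at most max_edits."""
    return edit_distance(query.lower(), candidate.lower()) <= max_edits


# "reissue", "reissued", and "reissuing" collapse to the same stem.
stems = {naive_stem(w) for w in ("reissue", "reissued", "reissuing")}
print(stems)

# A query for the correct name can still reach the OCR error.
print(fuzzy_match("Laemmle", "Taemmle"))
```

Raising `max_edits` or adding more suffix rules makes the search more forgiving but also lets in more false matches, which is the trade-off behind turning the aggressiveness up or down.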

5. The search experience has become increasingly visual.

As my colleague Jeremy Morris pointed out to me during one of our food cart lunches outside the UW Library, the search experience has become highly visual. Googling a restaurant now renders a map within the results page. ProQuest queries now return icons that display the format of the work (article, book, etc.) but not an image of the actual work. I’d like to think Lantern’s results view one-ups ProQuest. We display a full-color thumbnail of the matching page in the results view, not simply an icon. The thumbnail communicates a tremendous amount of information very efficiently. You quickly get a sense about whether the page is an advertisement or news story, whether it comes from a glossy fan magazine or a trade paper published in broadsheet layout. Even before you read the highlighted text snippet, you get some impression of the page and source. The thumbnails also help compensate for our metadata’s lack of granularity. We haven’t had the resources to generate metadata on the level of individual magazine issues, pages, or articles (it’s here that ProQuest one-ups us). By exposing the thumbnail page image, though, we let you visually glean some essential information from the source. Plus, the thumbnails showcase one of the strengths of the MHDL collection: the colorful, photo-rich, and graphically interesting nature of the historic magazines.

Ok, now it’s your turn to think algorithmically. When you search for a movie star and sort by relevancy, why is it that the most visually rich pages — often featuring a large photo — tend to rank the highest?

The answer is that those pages tend to have relatively few words. If there are only eight words on a portrait page from The New Movie Magazine and two of them are “Joan Crawford,” then her name makes up a far larger share of the words on that page than it does on a page from Variety that is jam-packed with over 1,000 words of text, including a story announcing Joan Crawford’s next picture.
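
The arithmetic behind that answer is simple enough to write out. The sketch below computes only the share of a page's words taken up by the query terms; it is not Lantern's or Solr's actual relevancy formula, which weighs other factors as well. The eight-word portrait page comes from the example above, while the Variety figures (exactly 1,000 words, with the name appearing once) are assumptions for illustration.

```python
def term_share(term_word_count, total_words):
    """Fraction of the words on a page that belong to the query terms."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return term_word_count / total_words


# Portrait page: 8 words in total, 2 of them are "Joan Crawford".
portrait = term_share(2, 8)

# Variety page: assumed 1,000 words, with the two-word name appearing once.
variety = term_share(2, 1000)

print(f"portrait page: {portrait:.1%}")
print(f"Variety page:  {variety:.1%}")
```

Under these assumptions the name is a quarter of the portrait page but a fifth of one percent of the Variety page, so any ranking that rewards term density will favor the portrait.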

Should I tweak the relevancy algorithm so that image-heavy pages aren’t listed so high? Should I ascribe greater relevancy to certain canonical sources, like Photoplay and Variety, rather than magazines outside the canon, like New Movie and Hollywood Filmograph? Or should we weight things the other way around — try to nudge users toward under-utilized sources? I would be curious to know what Antenna readers and Lantern users think.

There are advantages and disadvantages no matter what you choose. The best approach, as I see it, may just be to let the ranking algorithm run as is and use forums like this one to make its workings more transparent.
