Comments on: Let’s talk about search: Some lessons from building Lantern http://blog.commarts.wisc.edu/2013/08/14/lets-talk-about-search-some-lessons-from-building-lantern/ Responses to Media and Culture Fri, 12 Feb 2016 19:35:04 +0000

By: Eric Hoyt http://blog.commarts.wisc.edu/2013/08/14/lets-talk-about-search-some-lessons-from-building-lantern/comment-page-1/#comment-413746 Fri, 06 Sep 2013 20:40:48 +0000 http://blog.commarts.wisc.edu/?p=21355#comment-413746 Hi Jeremy,

Thanks for flagging this error. I got quite a few messages from folks today encountering the same issue.

I’m happy to say that the error has been resolved. My department’s computer media and server specialist, Pete Sengstock, deserves all the credit for properly diagnosing and fixing the glitch. We’ll be making more improvements in the coming months, too.

You can now resume searching away to your heart’s delight. As you’ll see, Lantern holds no grudge against foreign names. A search for “Méliès” returns 57 hits. And if you want to cast a much wider net, the search “melies” returns 2,680 results:

http://lantern.mediahist.org/?utf8=%E2%9C%93&utf8=%E2%9C%93&q=melies
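
For readers curious about the odd-looking "utf8=%E2%9C%93" in that link: it is just the check mark (✓) percent-encoded as UTF-8 bytes. Here is a minimal Python sketch of how such a search URL could be assembled. The helper name search_url is hypothetical, and this says nothing about how Lantern itself builds its links; the parameter names "utf8" and "q" are taken from the URL above.

```python
from urllib.parse import urlencode

# Base address copied from the link above.
BASE_URL = "http://lantern.mediahist.org/"


def search_url(query: str) -> str:
    """Build a search URL (hypothetical helper, for illustration only)."""
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes,
    # so the check mark becomes %E2%9C%93.
    params = {"utf8": "✓", "q": query}
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(search_url("melies"))
    print(search_url("Méliès"))
```

The accented form is encoded the same way, byte by byte, so "Méliès" travels as "M%C3%A9li%C3%A8s" in the query string.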

By: Jeremy Butler http://blog.commarts.wisc.edu/2013/08/14/lets-talk-about-search-some-lessons-from-building-lantern/comment-page-1/#comment-413739 Fri, 06 Sep 2013 17:44:09 +0000 http://blog.commarts.wisc.edu/?p=21355#comment-413739 It would appear that Lantern does not like foreign characters. I just tried a search for “Méliès” and Lantern responded with the following.

Oh wait, it would appear that there’s a more systematic problem with Lantern right now. I tried “Melies” and got the same response. Every search I try right now (Friday, 12:43 CDT, 9/6) is bouncing.

ActiveRecord::StatementInvalid in CatalogController#index

SQLite3::SQLException: cannot rollback - no transaction is active: rollback transaction

Request

Parameters:

{"utf8"=>"✓",
"q"=>"Méliès"}

By: Jeremy http://blog.commarts.wisc.edu/2013/08/14/lets-talk-about-search-some-lessons-from-building-lantern/comment-page-1/#comment-412300 Mon, 19 Aug 2013 17:20:30 +0000 http://blog.commarts.wisc.edu/?p=21355#comment-412300 Thanks for the additional info. I’m sure, as these bugs get squashed, that Lantern will become more and more useful!

Regards,

Jeremy

By: Eric Hoyt http://blog.commarts.wisc.edu/2013/08/14/lets-talk-about-search-some-lessons-from-building-lantern/comment-page-1/#comment-412152 Sun, 18 Aug 2013 20:24:12 +0000 http://blog.commarts.wisc.edu/?p=21355#comment-412152 Hi Jeremy,

Thanks for flagging this error. In the spirit of the initial Antenna post, I will try to explain what is happening (though, as you’ll see, there is something going on in this Photoplay volume that even I don’t understand).

1. We’ve indexed every page from every magazine volume as an XML document. This page-level document’s unique ID maps to the JPEG2000 page image on the Internet Archive’s servers.

2. Lantern uses the unique ID (in this case, photoplayvolume552chic_0257) to quickly pull that page image from the Internet Archive and display it within Lantern’s results page and Downloading/More Options page.

3. When you click “Read in Context,” you open a stream of the JPEG2000 images within the Internet Archive’s BookReader. Ideally, this opens to the exact page you want. However, due to the way the BookReader works and some problematic pagination metadata on our end, we sometimes drop you one page too far. As a user, if you don’t see what you are looking for immediately, just hit the left arrow key once and it should get you to the right page.

In this case, though, that page spread of Lana Turner from Photoplay comes a whole 16 pages behind where you are dropped: http://archive.org/stream/photoplayvolume552chic#page/n233/mode/2up

This is not a good user experience, to say the least.

The problem for this Photoplay volume may have to do with the default page that the BookReader assumes is page one. I may be able to fix this in our metadata at the Internet Archive.
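
To make the moving parts concrete, here is a rough Python sketch of the ID-to-link step described above. It assumes the page ID is the volume identifier plus an underscore and a page number, and that the number maps onto the BookReader’s "n" index apart from a fixed offset. Both are assumptions for illustration, as are the function names and the leaf_offset parameter; this is not Lantern’s actual code.

```python
from typing import Tuple


def split_page_id(page_id: str) -> Tuple[str, int]:
    """Split an ID like 'photoplayvolume552chic_0257' into volume and page."""
    volume_id, _, page_part = page_id.rpartition("_")
    if not volume_id or not page_part.isdigit():
        raise ValueError(f"unexpected page ID format: {page_id!r}")
    return volume_id, int(page_part)


def read_in_context_url(page_id: str, leaf_offset: int = 0) -> str:
    """Build a BookReader link; leaf_offset is a hypothetical per-volume fix."""
    volume_id, page_number = split_page_id(page_id)
    # Never produce a negative index, even with a large negative offset.
    leaf = max(page_number + leaf_offset, 0)
    return f"http://archive.org/stream/{volume_id}#page/n{leaf}/mode/2up"


if __name__ == "__main__":
    # An offset of -1 mimics the "hit the left arrow key once" workaround.
    print(read_in_context_url("photoplayvolume552chic_0257", leaf_offset=-1))
```

If the offset really is constant within a volume, a per-volume correction like this could be stored in the metadata; if it drifts from page to page, a lookup table per page would be needed instead.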

I think the solution to the bigger problem (Read in Context not delivering you 100% of the time to the right page) will only come when we hack open the BookReader’s code and customize our own version of the BookReader. This is high on my priority list for the next year. And thanks to the Internet Archive’s open source license for the BookReader, it is completely possible.

Eric

Well, this one is a bit of a head scratcher. Not all of Lantern’s pagination metadata is accurate.

By: Eric Hoyt http://blog.commarts.wisc.edu/2013/08/14/lets-talk-about-search-some-lessons-from-building-lantern/comment-page-1/#comment-412150 Sun, 18 Aug 2013 19:59:15 +0000 http://blog.commarts.wisc.edu/?p=21355#comment-412150 Hi Jeremy,

Thank you for the positive feedback about Lantern and my Antenna post. I’m very glad to hear you will be able to put this to use in your teaching. And I like your point about how different users may prefer image-heavy vs. text-centric page results (or even the same user running searches for different purposes — locating examples for teaching vs. evidence for a research article).

I also appreciate hearing about the aspects of the interface that aren’t working so well. I tested Lantern in Firefox, Chrome, and Safari, though always on a Mac. I now realize that was shortsighted. I will give things a spin next week on a Windows machine and using IE. I think/hope that writing a few more CSS rules will provide the belt & suspenders that makes the formatting work across PCs and Macs in all modern browsers.

Eric

By: Jeremy Butler http://blog.commarts.wisc.edu/2013/08/14/lets-talk-about-search-some-lessons-from-building-lantern/comment-page-1/#comment-412081 Sat, 17 Aug 2013 12:28:35 +0000 http://blog.commarts.wisc.edu/?p=21355#comment-412081 PS Just ran into an issue with “read in context.” On this Photoplay search result

http://lantern.mediahist.org/catalog/photoplayvolume552chic_0257

the caption refers to Lana Turner as being “far left,” but she’s not actually in the image as it’s a two-page spread. I clicked on “read in context” to see the conjoining page and was taken to the wrong page in the original document.

By: Jeremy Butler http://blog.commarts.wisc.edu/2013/08/14/lets-talk-about-search-some-lessons-from-building-lantern/comment-page-1/#comment-412079 Sat, 17 Aug 2013 12:17:10 +0000 http://blog.commarts.wisc.edu/?p=21355#comment-412079 This is an utterly brilliant resource, Eric. Thank you so much for it and for this blog post that pulls back the curtain and allows us to understand better how Lantern’s search engine was conceived and built. I’m sure I’ll be able to draw many, many resources from it for my film/TV history classes.

As I work with it more, I should be better able to respond to your questions about the search algorithm, but I can say that for pedagogical purposes the extra weight to image-heavy pages is useful; but then I’m often looking for images to help illuminate lectures. For those doing more hardcore research, a text-centric algorithm might be preferable.

And here’s a tiny comment about the UI:

In the Chrome browser, on Windows, the opening of tabs by clicking on links does not function quite as I’d expect. When I control-click, a new tab opens, but not in the background, as it should. Instead, it takes the focus away from the search-results page. And a middle-button click doesn’t open a tab, as it should, but, instead, takes me to that page. A right-button click, followed by choosing “open link in a new tab”, works correctly.

As I said, a small thing, but I suspect many users have gotten used to being able to open tabs in the background from a search-results page.

I just tested Lantern on Microsoft Internet Explorer 10 and the tabs open correctly, but there’s a bigger issue: the thumbnails are way too big. And that screws up the page layout on both the search-results page and the details page.

Oh, the joys of designing for myriad browsers!

Thanks again for creating such a fabulous service. I look forward to watching it evolve and grow!

By: The View Beyond Parallax… more reads for week of August 16 | Parallax View http://blog.commarts.wisc.edu/2013/08/14/lets-talk-about-search-some-lessons-from-building-lantern/comment-page-1/#comment-411969 Fri, 16 Aug 2013 16:03:20 +0000 http://blog.commarts.wisc.edu/?p=21355#comment-411969 […] One of the web’s best resources for buffs of film and radio history, the 800,000 scanned pages of books and periodicals (fan magazines such as Photoplay and Motion Picture; industry news sources including Variety and Motion Picture Daily; even specialist journals with sexy titles like Projection Engineering and Exhibitors Trade Review) in the Media History Digital Library, has just become far easier to explore thanks to Lantern, a search engine developed by MHDL co-director Eric Hoyt. Happy news Hoyt’s fellow Badger David Bordwell passed along, along with some tips on using the interface from Hoyt himself. […]

By: Eric Hoyt http://blog.commarts.wisc.edu/2013/08/14/lets-talk-about-search-some-lessons-from-building-lantern/comment-page-1/#comment-411930 Fri, 16 Aug 2013 00:03:52 +0000 http://blog.commarts.wisc.edu/?p=21355#comment-411930 Derek, these are all excellent points. Thanks for bringing them into the discussion.

You are right about the XML and metadata process. Coding metadata with XML is sort of like marking up a poem using the TEI schema. It’s time consuming and labor intensive, but it ensures a certain level of quality. Better still, the process forces you to reflect on what you are doing and, oftentimes, reevaluate your entire understanding of both the underlying work and the approach to the mark-up.

Because generating metadata, structuring data, and creating an index are so time consuming, however, the Humanities really needs to do a better job of valuing this form of labor and the contribution it makes. I was glad that Christa Williford and Charles Henry raised this point in their CLIR report “One Culture”: http://www.clir.org/pubs/reports/pub151 Hopefully, more people are coming around to understanding this.

Thanks for reminding users to engage with the interactive magazine gallery / data visualization. I’ll be writing more about that soon myself.

Eric
