Testifyin’ about the Command Line

17 Jun

Anyone who’s ever worked in a library knows that alongside the interesting reference interviews and information literacy projects there are also tedious, kill-me-now projects. I was handed one of these last week: I’m to find the official website for each of 1,300 health sciences journals. This involves pasting each title into Google, running a search, finding the official page, and pasting its URL back into my spreadsheet. Kill me now, right?

Doin’ It the Hard Way

Well, no. Because I thought to myself “there has to be a better way.” Then I called my friend Dianne. She was so delighted by the fact that I wanted to automate that she didn’t seem to mind that this would turn into a two-hour coding project.

So, we asked ourselves, if computers are really good at doing things over and over and over, and this is a task that requires repetition, why wouldn’t we just have the computer do it?

Here’s the idea: we’ll write a script that feeds our list of journals into Google, spits back the first ten results for each, and puts them into an HTML document so that all I have to do is check, rather than search. As we played with it, we realized that we didn’t want any results from Google or its cache, we didn’t want anything from Elsevier or ScienceDirect, and we wanted to add the word “journal” to each search string so we didn’t get unrelated results.

So, here’s Dianne’s code (in all its glory):

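# Read the tab-separated list line by line; looping over `cat` would split on every space.
while IFS= read -r i; do
	# Column 3 is the journal title; append "journal" and join the words with "+".
	search=$(printf '%s\n' "$i" | awk -F '\t' '{ print $3 " journal" }' | tr ' ' '+')
	# Assumption: the original listing omits the search URL; a Google query URL is used here.
	lynx -dump -force_html -listonly "http://www.google.com/search?q=$search" < /dev/null |
		grep -Ev 'google|youtube|elsevier|sciencedirect|wikipedia|cache|amazon|nytimes' |
		head -n 13 | tail -n 10 > results
	printf '%s' "$i"
	while IFS= read -r j; do
		# Skip the "References" header and blank lines (this needs &&, not ||).
		if [[ $j != "References" && $j != "" ]]; then
			# Link lines look like "1. http://..."; the second field is the URL.
			result=$(printf '%s\n' "$j" | awk '{ print $2 }')
			printf '\t%s' "$result"
		fi
	done < results
	echo	# end this journal's row
done < testjournals2008.txt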

She talked me through this as we were doing it, and I understand most of what’s in there. In addition to the spreadsheet, we produced an HTML page so that I could just click through and test the sites.
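
The step that builds the HTML page isn’t shown above, so here is a rough sketch of how it might look in bash. It is not Dianne’s code. The function names results_to_html and html_escape are made up for illustration, and it assumes each row of the results holds the journal’s original columns followed by the candidate URLs, with any field starting with “http” treated as a link.

```shell
#!/usr/bin/env bash
# Sketch: turn the tab-separated results into a clickable HTML checklist.
# Assumption: each input line holds the journal's original columns followed
# by candidate URLs; any field starting with "http" is treated as a link.

# Escape characters that are special in HTML.
html_escape() {
	printf '%s' "$1" |
		sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# Read results on stdin, write an HTML page on stdout.
results_to_html() {
	local fields field label links
	echo '<html><body>'
	while IFS=$'\t' read -r -a fields; do
		[ "${#fields[@]}" -gt 0 ] || continue   # skip blank lines
		label=''
		links=''
		for field in "${fields[@]}"; do
			field=$(html_escape "$field")
			if [[ $field == http* ]]; then
				# Candidate URL: add it to this journal's numbered list.
				links+="<li><a href=\"$field\">$field</a></li>"$'\n'
			else
				# Anything else is part of the journal's label.
				label+="${label:+ }$field"
			fi
		done
		printf '<h3>%s</h3>\n<ol>\n%s</ol>\n' "$label" "$links"
	done
	echo '</body></html>'
}
```

If the script’s output were saved to a file (say results.tsv, a hypothetical name), running results_to_html < results.tsv > results.html would give a page with one heading per journal and a numbered list of links underneath to click through.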

And I have to say that our results are AWESOME. In most cases, the first result is the proper page. It just goes to show that where mad librarian skills (setting up a good search) and a healthy approach to technology (making it work for me) combine, magic can happen.