UPDATED: This article now also includes example code for searching The Guardian's archive using their API. You can find it at the bottom of this article.
I needed to search the NY Times archives for articles that match a search term within a defined time period. Thankfully, they provide an excellent (and free) resource, their API (v2), which makes that possible.
The first thing you will want to do is review their terms and apply for an API key at http://developer.nytimes.com. Then, if you are interested, I have included the code I used to retrieve my results below, which you might find helpful in performing your own searches. You will need to enter your API key value in the code before running it (look for apikey near the very end of the code).
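To give a feel for what a request to the API looks like, here is a minimal sketch of building a search URL. It is not the code from this article. The endpoint and the parameter names (q, begin_date, end_date, page, api-key) are my assumptions about the Article Search API v2, as is the YYYYMMDD date format, so check them against the developer site. The build_search_url helper is a hypothetical name used only for illustration.

```ruby
require "uri"

# Assumed endpoint for the Article Search API v2; verify on the developer site.
NYT_SEARCH_ENDPOINT = "https://api.nytimes.com/svc/search/v2/articlesearch.json"

# Hypothetical helper: builds the request URL for one page of results.
# Dates are passed in as "YYYY-MM-DD" strings.
def build_search_url(term, begin_date, end_date, api_key, page = 0)
  params = {
    "q"          => term,
    "begin_date" => begin_date.delete("-"), # assumed: the API expects YYYYMMDD
    "end_date"   => end_date.delete("-"),
    "page"       => page,
    "api-key"    => api_key
  }
  "#{NYT_SEARCH_ENDPOINT}?#{URI.encode_www_form(params)}"
end

puts build_search_url("climate change", "2014-01-01", "2014-01-31", "YOUR_API_KEY")
```

The resulting string could then be passed to HTTParty.get to fetch one page of results. URI.encode_www_form takes care of escaping spaces and other special characters in the search term.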
Running the Ruby code below should prompt you to enter:
the term you want to search on
the begin and end dates in YYYY-MM-DD format
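Since the dates have to be typed in YYYY-MM-DD format, it can help to check them before sending a request. The following is a minimal sketch of such a check, not part of the original script. The helper names parse_search_date and valid_date_range? are hypothetical.

```ruby
require "date"

# Hypothetical helper: returns a Date when the input is a real calendar date
# written as YYYY-MM-DD, or nil otherwise.
def parse_search_date(input)
  text = input.to_s.strip
  # Require exactly four, two and two digits so "2014-1-5" is rejected.
  return nil unless text =~ /\A\d{4}-\d{2}-\d{2}\z/
  Date.strptime(text, "%Y-%m-%d")
rescue ArgumentError
  # Raised for impossible dates such as 2014-02-30.
  nil
end

# Hypothetical helper: true when both dates are valid and the begin date
# does not fall after the end date.
def valid_date_range?(begin_input, end_input)
  begin_date = parse_search_date(begin_input)
  end_date   = parse_search_date(end_input)
  !begin_date.nil? && !end_date.nil? && begin_date <= end_date
end

puts valid_date_range?("2014-01-01", "2014-01-31")
puts valid_date_range?("2014-02-30", "2014-03-01")
```

In a script that prompts for input, the strings read with gets.chomp could be passed to valid_date_range? and the prompt repeated until it returns true.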
An appropriately named CSV output file for the results is created in the folder where the script is run. Some simple information relating to the search will be displayed as it is running in the terminal.
The CSV file stores four values from each returned search result: pub date, source, headline, and URL. I have noted some other values that are available in the results; they should be fairly simple to add to the results file.
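As a rough illustration of how those four values could go from a JSON result into CSV rows, here is a short sketch. The sample JSON is hand-written, not real API output, and the field names (pub_date, source, headline with a nested main value, web_url) are my assumptions about the v2 response format. The doc_to_row helper is a hypothetical name.

```ruby
require "csv"
require "json"

# Hand-written sample shaped like a search response (assumed structure,
# placeholder values; not real API output).
SAMPLE_RESPONSE = <<-SAMPLE
{
  "response": {
    "docs": [
      {
        "pub_date": "2014-01-15T00:00:00Z",
        "source": "The New York Times",
        "headline": { "main": "An Example Headline" },
        "web_url": "http://www.example.com/article.html"
      }
    ]
  }
}
SAMPLE

# Hypothetical helper: picks the four saved values out of one result.
def doc_to_row(doc)
  # The headline is assumed to be a nested object with a "main" entry.
  headline = doc["headline"].is_a?(Hash) ? doc["headline"]["main"] : doc["headline"]
  [doc["pub_date"], doc["source"], headline, doc["web_url"]]
end

docs = JSON.parse(SAMPLE_RESPONSE)["response"]["docs"]

# Build the CSV text with a header row followed by one row per result.
csv_text = CSV.generate do |csv|
  csv << ["pub date", "source", "headline", "URL"]
  docs.each { |doc| csv << doc_to_row(doc) }
end

puts csv_text
```

Adding another value from the results would mean adding one more entry to the array in doc_to_row and one more name to the header row.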
PLEASE NOTE that each run overwrites any existing file that has the same name as the results file. While a clash is unlikely, you can change the default output name in the code. Currently it saves using a name in this format: “nytimes-SEARCHTERM-BEGINDATE-ENDDATE.csv”
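If overwriting is a concern, one option is to pick a different name whenever the default one is already taken. The sketch below is an assumption about how that could be done, not what the script in this article does. Both helper names are hypothetical, and the search term is inserted into the name as typed.

```ruby
# Hypothetical helper: builds the default output name in the format
# nytimes-SEARCHTERM-BEGINDATE-ENDDATE.csv
def output_filename(term, begin_date, end_date)
  "nytimes-#{term}-#{begin_date}-#{end_date}.csv"
end

# Hypothetical helper: returns the name unchanged when no such file exists,
# otherwise appends -1, -2, ... before the extension until a free name is found.
def non_clobbering_name(filename)
  return filename unless File.exist?(filename)
  ext  = File.extname(filename)
  stem = filename.chomp(ext)
  counter = 1
  counter += 1 while File.exist?("#{stem}-#{counter}#{ext}")
  "#{stem}-#{counter}#{ext}"
end

name = output_filename("drought", "2014-01-01", "2014-01-31")
puts non_clobbering_name(name)
```

The name returned by non_clobbering_name could then be used when opening the CSV file, so an earlier results file from the same search is left untouched.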
The Very Basics
You probably don’t need this, but in case you do (everyone has needed some help getting started at some point), here is a simple step-by-step that may help you if you want to try to run a search using my code:
Open your terminal and type “ruby -v”. As long as it returns a version of at least 1.9.3, you are ready to continue. If it returns an older version such as 1.8.7, you will need to upgrade Ruby (I recommend you look into rbenv).
Install the required gem by running “sudo gem install httparty” in the terminal
Copy the code below and save it as a file named “search-nytimes-articles.rb”