This tutorial by Tierney Steelberg, Digital Liberal Arts Specialist at Grinnell College’s DLAC, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Table of Contents

Click any link to head to that section.

  1. What is Gale Digital Scholar Lab?
    1. Logging In
    2. Creating a Workspace and Using Groups
  2. Building a Content Set
    1. Building a Content Set with Gale Documents
    2. Uploading Your Own Content
  3. Cleaning Your Content Set
  4. Analyzing Your Content Set
    1. Types of Tools
    2. Setting up an Analysis Run
    3. Downloading Data
    4. Analysis Considerations
  5. Resources

What Is Gale Digital Scholar Lab?

Gale Digital Scholar Lab is an online platform, licensed through the Grinnell College Libraries, that provides a suite of digital humanities tools for searching, formatting, and analyzing texts drawn from the Gale primary-source databases in the libraries’ collection, along with a variety of data visualization options. This kind of analysis of large groupings of texts is known in digital humanities as text analysis, textual analysis, or text mining.

Logging In

You can access Gale Digital Scholar Lab from the Libraries’ A-Z Database list, or by using this stable URL. If not on campus, you may be prompted to log in upon clicking the link.

In order to use Gale Digital Scholar Lab, you will need to create an account upon accessing the tool for the first time. Follow these instructions to do so:

  1. Click the orange “Log In/Create Account” button.
  2. Click the “Sign in with Microsoft” button.
  3. Enter your Grinnell Microsoft credentials.
  4. Once logged in, you should see your first name in the upper right-hand corner of the site.

Once you have created an account, you will be able to log in with your Grinnell Microsoft credentials following the same steps. The work that you do in the Digital Scholar Lab is saved in the cloud to your account as you go, so your content sets and analysis will be there for you when you log back in.

Screenshot of the Gale Digital Scholar homepage when not yet logged in, with orange log in/sign up button in the center

 

Creating a Workspace and Using Groups

Each time you log into the Lab, you will be prompted to select your Personal Workspace, select an existing Group Workspace, or Create a New Workspace.

 

Choose “Create a New Workspace”. Once there, you will need to give the group a name and then invite collaborators.

You can Add or Edit the Collaborators in a Group at any time by clicking the Edit button in the Collaborators panel.

 

Building a Content Set

In order to have text to analyze, you will need to build what Digital Scholar Lab calls a “content set” first. Some other text analysis tools refer to this as a “corpus” – basically, it is a collection of related documents.

 

Building a Content Set with Gale Documents

You can use Digital Scholar Lab to search for Gale primary source materials available through Grinnell College Libraries, and build content sets composed of primary source documents related to your research interests.

  1. To start your search, just use the search bar on the homepage! You can do a basic search (that includes Boolean operators AND, OR, and NOT) to start finding results.
    • You can click the “Advanced Search” option below the search bar for a more complex, refined search.
  2. Comb through your results: you can see document titles, snippets of the OCR (Optical Character Recognition) text, and the most relevant metadata.
    • You can view documents in both their digitized forms and with the OCR text: put them side by side to evaluate accuracy. Some OCR attempts are more successful than others.
  3. Refine your search results as needed using the filters on the left-hand side of the search results page.
  4. When you find documents you want to analyze, add them to your content set by checking the box to the left of the title, and then clicking the green “Add to Content Set” button at the top of the page. You can choose whether you would like to add documents to a new or existing content set.
    • You can select multiple documents at a time, all the results on the page, or even all search results (up to 10,000 documents) to add all at once.
Screenshot of search results for the phrase "gin & tonic" in Gale Digital Scholar Lab holdings
Source: Gale
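
If Boolean operators are new to you, the short Python sketch below illustrates what AND, OR, and NOT mean when filtering documents. It is only a conceptual illustration: the `matches` function and the sample documents are made up for this example, and it does not reflect how Gale's search is actually implemented.

```python
def matches(text, all_of=(), any_of=(), none_of=()):
    """Hypothetical helper: does `text` satisfy a simple Boolean query?

    all_of  -> terms joined by AND (every term must appear)
    any_of  -> terms joined by OR  (at least one must appear)
    none_of -> terms excluded by NOT (none may appear)
    """
    words = set(text.lower().split())
    if any(term not in words for term in all_of):
        return False
    if any_of and not any(term in words for term in any_of):
        return False
    if any(term in words for term in none_of):
        return False
    return True


# Made-up sample documents, for illustration only.
documents = [
    "gin and tonic recipe",
    "gin fizz recipe",
    "tonic water advertisement",
]

# gin AND tonic
both = [d for d in documents if matches(d, all_of=["gin", "tonic"])]

# gin NOT tonic
gin_only = [d for d in documents if matches(d, all_of=["gin"], none_of=["tonic"])]
```

AND narrows a search, OR broadens it, and NOT excludes results, which is why combining them is a quick way to trim a long results list.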

Learn more from the “Build” pages in Gale’s Learning Center.

 

Uploading Your Own Content

In addition to building a content set using Gale primary sources, you can also upload files of your own to Digital Scholar Lab, to analyze on their own or alongside other content.

Please note that the files you upload must be plain text files (.txt file extension), and the file(s) you upload cannot exceed 10 MB at a time. For best results in text analysis, make sure your files are cleaned up beforehand.
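
If you are preparing many files, you may want to check them against these requirements before uploading. The Python sketch below is one way to do that on your own computer. It assumes that "10 MB" means 10,000,000 bytes and that the limit applies to the combined size of one batch; both are assumptions, and the function names are hypothetical rather than part of Digital Scholar Lab.

```python
from pathlib import Path

# Assumption: "10 MB" is read as 10,000,000 bytes for the whole batch.
MAX_BATCH_BYTES = 10 * 1000 * 1000


def find_upload_problems(sizes_by_name):
    """Return a list of problems for a batch given as {file name: size in bytes}.

    An empty list means the batch meets the two requirements checked here.
    """
    problems = []
    for name in sizes_by_name:
        if not name.lower().endswith(".txt"):
            problems.append(f"{name}: not a .txt file")
    total = sum(sizes_by_name.values())
    if total > MAX_BATCH_BYTES:
        problems.append(
            f"batch totals {total} bytes, over the {MAX_BATCH_BYTES}-byte limit"
        )
    return problems


def sizes_on_disk(paths):
    """Look up real file sizes so they can be passed to find_upload_problems."""
    return {Path(p).name: Path(p).stat().st_size for p in paths}
```

For example, `find_upload_problems(sizes_on_disk(["moby_dick.txt"]))` would check a single local file (the file name here is only a placeholder).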

For some sample content to upload, try this plain-text file of Melville’s Moby Dick!

Follow this step-by-step walkthrough for uploading your own files:

  1. Click “Build” in the top level menu.
  2. Navigate to the “Upload” box on the right-hand side of the screen.
  3. Click “Browse…” and select one or more files from your computer to upload.
  4. Make sure your file appears in the “Successfully Uploaded” list.
  5. Recommended: click the “Add Metadata” button in the bottom left corner of the box to add relevant metadata (author, publisher, publication date…) to your file(s), for organizational purposes and to aid in some analysis.
  6. Check the box(es) next to your file(s) in the Successfully Uploaded list – make sure the boxes are checked, or your file(s) will not upload.
  7. Click the “Add to Content Set” button in the bottom right corner of the box.
  8. Choose whether you would like to add the file(s) to an existing or to a new content set.
  9. Add your file(s) to the content set.
  10. You will get a pop-up notifying you that your file(s) have been added to the content set.

Cleaning Your Content Set

“One of the most important elements of text analysis is making sure that your texts are formatted in a way that suits the kind of analysis you want to carry out. The Clean feature of the Gale Digital Scholar Lab lets you edit all the Documents within a Content Set. […] Once you have got your Content Set filled out with all the documents you need, it’s important to prepare it for analysis by cleaning the text. Cleaning a Content Set means stripping it of unwanted words or characters that would adversely affect your analysis.” (Source: Gale, “Clean”)

From the “Clean” tab in the top-level menu, you can create your own cleaning configurations as you learn more about Gale Digital Scholar Lab, your documents, and the cleaning they need. To start, though, you can use Digital Scholar Lab’s default cleaning configurations (with or without punctuation) to clean your content sets.

Screenshot of Gale Digital Scholar Lab Clean Configuration page with default set up and stop words section circled in red
Source: Gale

You can run through the various options for text cleaning and customize them to your liking. Don’t forget the “Choose a Starter List” option in the right-hand “Stop Words” pane: this allows you to choose “stop words” (common words like “a” and “the” whose presence may negatively impact your analysis) to remove from the text as part of the cleaning. Digital Scholar Lab has stop word lists for many languages, and you can add your own words to the list for your clean configuration as well.
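
To get a feel for what removing stop words and punctuation does to a text, here is a minimal Python sketch. It is not Digital Scholar Lab's cleaning code: the tiny stop word list, the lowercasing step, and the `clean_text` function are all assumptions made for illustration.

```python
import re

# Hypothetical, very short stop word list; real starter lists are much longer.
STOP_WORDS = {"a", "an", "the", "of", "and"}


def clean_text(text, stop_words=STOP_WORDS, keep_punctuation=False):
    """Lowercase the text, optionally strip punctuation, and drop stop words.

    Returns the remaining words (tokens) as a list.
    """
    text = text.lower()  # assumption: treat "The" and "the" as the same word
    if not keep_punctuation:
        # Replace anything that is not a letter, digit, underscore, or space.
        text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()
    return [token for token in tokens if token not in stop_words]


cleaned = clean_text("Call me Ishmael, the sailor.")
```

Here `cleaned` is `["call", "me", "ishmael", "sailor"]`. With `keep_punctuation=True`, the comma and period stay attached to their words, which is one reason the choice of configuration changes your results.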

Cleaning is applied when you run an analysis rather than to the content set ahead of time, so once you’ve decided on the clean configuration you want to use and refined it to your liking, head to the “Analyze” tab!

 

Analyzing Your Content Set

Screenshot of the Analyze screen of Gale Digital Scholar Lab

Digital Scholar Lab has a variety of analysis tools available for you to use in your text analysis.

  1. Click the “Analyze” option on the top menu.
  2. You will see a page that lets you choose the content set you want to analyze.
  3. Below that, you will see a range of different tools you can use. These tools are explained below.

Types of Tools

 

Setting up an Analysis Run

  1. For this exercise, find the “Ngrams” tool and click the “Edit” link next to the “Run” button. You must click the “Edit” link to apply a different cleaning configuration.
    1. Screenshot of the Ngrams option for analyzing text in Gale Digital Scholar Lab
  2. You will now see the “Tool Setup” page. Here you can give your run a name, select an Ngram size, and change the cleaning configuration.
    1. Screenshot of the setup page for analyzing a corpus in Gale Digital Scholar Lab
  3. Once you have set up the tool, hit the green “Run” button.
  4. The tool will take a moment or longer to finish, depending on the size of your corpus. Hit the refresh button to see the results.
  5. The results will appear. The default view shows the top 100 Ngrams and the frequency of each (the number of times the Ngram shows up divided by the total number of tokens in the text).
    1. Screenshot of the results for the Ngram analysis of Moby Dick
  6. You can use the search bar to find a specific Ngram. Search for the term “God” to see how the search bar works.
  7. Choose the “Count” option and notice how the chart changes.
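
To make the difference between frequency and count in steps 5 and 7 concrete, the Python sketch below computes both for a tiny list of words. It follows the definition given above (count divided by the total number of tokens), but it is only an illustration: the function names are hypothetical, and Digital Scholar Lab's own calculation may differ in its details.

```python
from collections import Counter


def ngram_counts(tokens, n=1):
    """Count every run of n consecutive tokens (an "Ngram" of size n)."""
    # zip over n shifted copies of the list to pair each token with its neighbors.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def ngram_frequencies(tokens, n=1):
    """Divide each Ngram count by the total number of tokens in the text."""
    total = len(tokens)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in ngram_counts(tokens, n).items()}


tokens = "the whale the sea the whale".split()
bigram_counts = ngram_counts(tokens, n=2)        # "the whale" appears 2 times
word_frequencies = ngram_frequencies(tokens, n=1)  # "the" is 3 of 6 tokens
```

In this toy example, the count for “the whale” is 2, while the frequency of “the” is 3 divided by 6, or 0.5. Counts grow with the length of a text; frequencies let you compare texts of different lengths.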

The analysis setup will look slightly different for each tool. Experiment with these tools and their different setups!

Downloading Data

  1. If you want to save the data or the visualization, click the “Download” icon in the top right corner. From this menu, you can choose what you want to download and the file type it comes in. To save the visualization, click one of the options under “Term Frequency Visualization”.
    1. Screenshot of the download menu in Gale Digital Scholar Lab

 

Analysis Considerations

Once you have an analysis output from one or more tools, it is up to you to sift through the information given to you by the tool, and conduct your own analysis of its output.

As you look through your analysis, you may notice issues that you would like to correct: perhaps there are errant punctuation marks included, or a persistent misspelling that needs to be addressed. That’s just fine! Text analysis is an iterative process, and Digital Scholar Lab makes it very easy to add or remove documents to/from your content set, create a new cleaning configuration, and run an analysis again with adjusted parameters.

 

Resources

Gale Digital Scholar Lab’s Learning Center, which has been linked throughout this tutorial, is a great resource: it has step-by-step walkthroughs and informational videos about every step of the process, from building a content set, to setting up cleaning configurations, to performing analysis.
