TextAnalyst Tutorial 1

By Josh Froelich

Case Study

 

You are an employee at a venture capital company. The company recently received a proposal to fund a project aimed at developing a new type of collaborative database. You have been assigned to research how and why corporations are using database technologies today. You need to find out whether the proposed project is indeed innovative and justifies the investment.  An Internet search for “collaborative databases” returns 7,519 documents.

 

The first source you uncover is about eleven pages long.  You want to know quickly whether this source contains relevant and useful information for your project, but you do not have time to read the entire document just to answer that question.  This tutorial shows how TextAnalyst speeds up and simplifies the analysis.

 

Article Background: Databasing in the 90’s, “Data and What We're Doing With It”, By Jennifer Barrett, Acxiom Corporation.

 

 

Step One: Starting TextAnalyst

1.     From Windows, select Start Menu | Programs | TextAnalyst 2.0 | TextAnalyst  

2.     While TextAnalyst loads in the background, you are presented with the Startup window which offers three main actions:

 

 

 

 

Step Two: Importing Data and Semantic Network Creation

 

1.     Select the top hat icon to analyze new texts and create a knowledge base.

2.     An Open file dialogue box appears specifying that you are looking in the TextAnalyst Tutorials Folder.

Default Location: C:\Program Files\Megaputer Intelligence\MicroSystems\TextAnalyst 2.0\

 

3.     Double click on the Examples folder.

4.     Double click on the file named “Databasing in the 90.txt”.  This opens the file in TextAnalyst.

Once the file is opened, TextAnalyst analyzes it.  As TextAnalyst analyzes text, it determines which concepts (words and word combinations) are most important in the context of the investigated text. Each concept is labeled as a node and assigned a numeric semantic weight, a measure of the probability that this concept is important in the studied text. Simultaneously, TextAnalyst determines the weights of the relations between individual concepts in the text and hyperlinks concepts to the fragments (sentences) in the original text where these concepts occur. Node terms are placed in quotes within this tutorial.

The resulting structure, called the Semantic Network, is a set of the most significant concepts distilled from the analyzed texts, along with the semantic relationships between these concepts in the text. The Semantic Network is a cyclic graph holding the most important information from the investigated text in a very concise form. If we were to visualize the Semantic Network, it would resemble a molecular structure: all atoms within a molecule are interconnected, either directly or through shared neighbors.

Mathematical algorithms inside TextAnalyst determine the relative importance of a text concept, solely by analyzing its connections to other concepts in the text. Therefore, TextAnalyst creates the semantic network without using background knowledge of the subject. TextAnalyst implements algorithms similar to those used for text analysis in the human brain.

When desired, background knowledge may be added by the user through an external dictionary to fine tune TextAnalyst to a particular subject.
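
As a rough illustration of the structure described above, the following Python sketch models a semantic network as weighted concept nodes, weighted relations between them, and links back to sentences. The class and method names (Concept, SemanticNetwork, add_relation, children) are hypothetical and do not reflect TextAnalyst's internal representation. The values 59 and 99 for “businesses” are the ones this tutorial shows in Step Four; all other weights and sentence indexes are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node: a word or word combination with a semantic weight (0 to 100)."""
    term: str
    weight: int
    sentences: list = field(default_factory=list)  # indexes of linked sentences


class SemanticNetwork:
    """Hypothetical container for concepts and weighted relations."""

    def __init__(self):
        self.concepts = {}   # term -> Concept
        self.relations = {}  # (parent term, child term) -> relation weight

    def add_concept(self, term, weight, sentences=()):
        self.concepts[term] = Concept(term, weight, list(sentences))

    def add_relation(self, parent, child, weight):
        self.relations[(parent, child)] = weight

    def children(self, parent):
        """Return (child, relation weight, child weight), strongest relation first."""
        rows = [
            (child, rel, self.concepts[child].weight)
            for (p, child), rel in self.relations.items()
            if p == parent
        ]
        return sorted(rows, key=lambda row: row[1], reverse=True)


network = SemanticNetwork()
network.add_concept("databases", 100, sentences=[0, 2])  # weight is illustrative
network.add_concept("businesses", 99, sentences=[2])     # 99 as shown in Step Four
network.add_concept("sold", 80, sentences=[1, 2])        # weight is illustrative
network.add_relation("databases", "businesses", 59)      # 59 as shown in Step Four
network.add_relation("databases", "sold", 40)            # weight is illustrative

# Print each child the way the tree shows it: relation weight, concept weight, term.
for child, relation_weight, concept_weight in network.children("databases"):
    print(relation_weight, concept_weight, child)
```

Keeping the relation weight separate from the concept weight mirrors the two numbers that the topic structure tree displays next to each child node.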

 

 

Step Three: Understanding the Basic Interface

 

TextAnalyst can be divided into three main viewing sections, or panes.  The top left pane is called the view pane. Tabs let you switch between five different views within the view pane: Document list, Topic structure, Semantic network, Semantic search, and Search. By default the view pane displays the topic structure tree of the investigated text. The top right pane is the results pane and is currently blank.  The bottom pane, the text pane, contains the full original text of “Databasing in the 90.txt”.

 

 

 

Step Four: Understanding the Topic Structure View, Results and Text Panes

You are trying to develop some ideas about the relationships between concepts that you are researching for your report.  You want to know more about the selling of databases.  Instead of reading the entire text, you can interact with the semantic network to easily discover more about the role of sales and using databases in the 90’s.

Each node in the semantic tree in the view pane contains a concept, shown with a number and a fish icon.  The number to the left of the word, such as 99, represents the semantic weight of this concept, which ranges from 0 to 100. The different types of fish give a rough visual indication of the semantic weight of the concept. Initially, all nodes except the root are closed.  Double clicking on a parent node makes its child nodes visible. These child nodes may also contain child nodes, like a family tree.

1.     Double click on the black whale next to the node “databases” in the view pane.  A tree structure forms under the node “databases”.

The two numbers located next to each node under “databases” represent different semantic weights.  For example, look at the node “businesses”, which is preceded by 59 99.  The first number, 59, refers to the weight, or strength, of the semantic relationship of the node “businesses” to the parent node “databases”.  The second number, 99, refers to the semantic weight of the word “businesses” to the entire text.

2.     Double click on the node “sold”.  Click the node “<ALL>” under “sold”.  Every sentence containing the word “sold”, or similar words such as “sell” and “selling,” will appear in red in the results pane.  Sentences appear in the order in which they occur in the full text.

In TextAnalyst, every time you see an important concept or word in the results pane that is contained in the view pane’s tree structure the word will be colored red.

You become interested in the sentence “It is fair to say most consumers do not realize the scope of information that is maintained on them, nor do they understand the economics of what that data can do to reduce the costs of developing and selling products and thus the ultimate cost of the product itself.”  You wish to better understand the sentence in its surrounding context, and wish to know where the sentence is located in the full text.

3.     Double click on the sentence listed above in the results pane.

Notice that the sentence becomes highlighted in the results pane and is also located and highlighted in the full text.  You can now read the surrounding paragraph and sentences to gain a better understanding.

There are several sentences that contain the word sold or an alternate form of it.  You wish to narrow your scope even more to find out specifically about the selling of databases.

4.     Single click the node “sold” in the tree structure directly above the “<ALL>” node you clicked in step 2.  Now the results pane shows only sentences containing both the words “sold” and “databases”, or their similar forms.

 

Step Five: Diving Into the Subject Even More

You want to analyze the importance of sales in Databasing in the 90’s.  You know how to see a list of sentences containing the term sales, but you want to focus your search further to learn how sales, seen through the term “sold”, ties into companies and databases.

1.     In this case you are looking for the node “sold”, located under the top node “databases”.  TextAnalyst can tell that the term “sell” is similar to “sold”.

2.     Under the node “sold”, double click the node “companies”.  Notice the sentences in the results pane.

The sentences are not specific enough to your search for “companies,” “sales” and “databases”.

3.     In the toolbar at the top of the program, locate the seventh icon from the left, “Include all parents”.

4.     If you hover over the icon a tool tip appears saying “Include all parents.”  Click this icon.  After clicking the icon it should remain in the pressed state.  You have activated “Include all parents”.

5.     Notice that now there is only one sentence in the results pane that contains all three words, “databases”, “sold”, and “company”. 

You have effectively narrowed your scope.

6.     Press on the icon again to turn off “include all parents.”

7.     Double click on the whale next to the top node “databases” to return the semantic network back to its default closed view. You should only see the node “databases”.
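
The narrowing performed in this step can be pictured as a filter over sentences. The sketch below is a simplified illustration and not TextAnalyst's algorithm. It assumes, based on the behavior described in Steps Four and Five, that selecting a node normally matches the node and its immediate parent, while “Include all parents” requires every term on the path up to the root. The WORD_FORMS table is a hypothetical stand-in for TextAnalyst's recognition of similar word forms, and the sample sentences are made up rather than taken from the article.

```python
import re

# Hypothetical table of word forms; TextAnalyst recognizes similar forms itself.
WORD_FORMS = {
    "databases": {"database", "databases"},
    "sold": {"sold", "sell", "sells", "selling"},
    "companies": {"company", "companies"},
}


def mentions(sentence, term):
    """True if the sentence contains the term or one of its listed forms."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(words & WORD_FORMS.get(term, {term}))


def filter_sentences(sentences, path, include_all_parents=False):
    """Keep sentences matching the selected node and its parent(s).

    `path` lists the terms from the root down to the selected node.
    By default only the selected node and its immediate parent are required
    (an assumption drawn from the tutorial's description).
    """
    required = path if include_all_parents else path[-2:]
    return [s for s in sentences if all(mentions(s, t) for t in required)]


# Made-up sample sentences for illustration only.
sentences = [
    "Many companies keep customer records.",
    "The company sold its databases to a partner.",
    "Retail companies are selling more products online.",
]
path = ["databases", "sold", "companies"]

print(filter_sentences(sentences, path))                            # two sentences
print(filter_sentences(sentences, path, include_all_parents=True))  # one sentence
```

With “Include all parents” switched on, the third sample sentence drops out because it never mentions a database, which is the same kind of narrowing you saw in the results pane.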

 

Step Six: Using the Dictionary

In your report you are specifically interested in certain keywords but are unsure whether TextAnalyst will retrieve them from the text, because their semantic weight may be low.  You want to edit some of the words that TextAnalyst uses to build the semantic network described in Step Four of this tutorial.  You want to add the word “personalized” because your report will examine personalized databases more closely.

1.     From the main file menu in TextAnalyst, select Settings | Edit Dictionaries.

2.     This starts the VocEdit application, a dictionary program that TextAnalyst uses in certain circumstances.  This is the only language-dependent area of TextAnalyst.

3.     Right click anywhere in the left window of VocEdit. Make sure not to right click on a word.

4.     A small menu appears with the words Add and Find in bold.

5.     Select Add.  An entry with the text New Entry is highlighted.  Before clicking anything else, type “personalized” without the quotes.  Press the Enter key.  The word “personalized” is added to the dictionary.

6.     Right click on “personalized”.  A small menu appears. Select user word.  This tells TextAnalyst that this is an important word.

You have successfully added “personalized” to the dictionary.

7.     Click Exit in the lower right corner of VocEdit.

8.     A dialog box appears asking you to save your changes.  Click Yes.

9.     A dialog box appears asking if you want to replace the current dictionary file with the new file, or save the new file under a different name.  Click No to save the file under a different name.

10. A Save as dialog box appears.  You should be in the TextAnalyst folder.  You are provided with a default name of TextAnalyst 2.dic.

11. Name the file “mydictionary.dic”.

12. Click Save.

13. VocEdit will save the file and close.  Return to TextAnalyst.

The next step is to link the new dictionary to TextAnalyst by telling TextAnalyst to use your dictionary in place of the default dictionary.

14. From the main file menu, select Settings | General settings.

15. A dialog box appears titled General settings.  Select the Analysis tab near the top of the dialog box.

16. Locate the Dictionary: field on the tab.

17. The current dictionary is the default dictionary.  Click the button to the right of the current dictionary.

18. An Open dialog box appears.  Locate and select the new dictionary file titled mydictionary.dic.

19. Click Open.  Return to the TextAnalyst program.

20. Click OK to apply the new settings.  The General settings dialog box disappears.  TextAnalyst applies the new dictionary.

21. At the bottom of the view pane locate the third tab from the left.  Notice that the word “personalized” is located at the top of the new semantic network in bold.  Look at Using the Semantic Network to work with your new findings.

 

Step Seven: Using and Understanding Summary Analysis, Changing the Threshold

You have analyzed a few relationships so far, and better understand TextAnalyst’s look and feel.  You want to compose an overview of the entire text you are analyzing, not just bits and pieces.  Your boss wants a short introductory summary of some of the sources for the report.

TextAnalyst can create summaries of different lengths from full texts.

1.     From the main file menu click on Analysis | Summarization.

2.     TextAnalyst performs the summarization that is now displayed in the results pane.

Notice that the view pane is no longer in its semantic network view.  It now displays some statistics about the summary it performed.  The percentage of text size shown next to the summary indicates that the summary is about 14% of the entire document.  TextAnalyst lets you condense the entire document to a fraction of its size while retaining significant meaning in the summary.

During summarization, TextAnalyst determines the semantic weight of each sentence and displays in the results pane only sentences with a semantic weight at or above the threshold.  The default threshold is 90, so currently all sentences with a semantic weight of 90 or higher appear in the results pane.

The summary lists the most important sentences in the context of the original text.  The summary chooses the sentences on the basis of concepts and relationships between concepts in the full text.

You like that this summary is only 14% of the size of the entire text.  However, you want a more concise summary.

TextAnalyst allows you to change the size of your summary by changing the semantic weight threshold.  The default, as mentioned, is 90, so any summary with the default threshold includes all sentences with a semantic weight of 90 to 100, 100 being the maximum weight.  By increasing the semantic threshold you can decrease the size of the summary.  TextAnalyst also allows you to view the semantic weight of each sentence in the results pane.
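
The effect of the threshold can be sketched in a few lines of Python. This is only an illustration of the filtering rule described above, not TextAnalyst's summarization code; the function name summarize and the sample weights and sentences are invented.

```python
def summarize(weighted_sentences, threshold=90):
    """Keep sentences whose semantic weight is at or above the threshold.

    Sentences stay in their original text order.
    """
    return [text for weight, text in weighted_sentences if weight >= threshold]


# Made-up weights and placeholder sentences for illustration only.
weighted_sentences = [
    (100, "Sentence A."),
    (72, "Sentence B."),
    (93, "Sentence C."),
    (99, "Sentence D."),
]

print(summarize(weighted_sentences))                # default threshold of 90
print(summarize(weighted_sentences, threshold=99))  # higher threshold, shorter summary
```

Raising the threshold from 90 to 99 removes the sentence weighted 93 from this sample, which is why a higher threshold yields a shorter summary.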

1.     Click the Hammer icon in the toolbar.

2.     A Settings menu appears.

3.     To display weights next to sentences, check the box next to “Display semantic weights of sentences.”

4.     Click Apply.  If you look at the results pane you can now view the semantic weights next to each sentence.  After viewing the weights return to the Settings menu.

5.     Adjust the semantic weight threshold by using the arrow buttons or typing the number you wish to use.  Change the number from 90 to 99. This means that only sentences with a semantic weight of 99 or 100 are included, giving you a shorter summary.

6.     Click Apply.  Click OK.

The summary is recalculated with the new threshold.  Only the most important sentences within the full text, those with a weight of 99 or 100, are included in the summary.

7.     Uncheck the box in the Settings menu to hide the semantic weights.

8.     NOTE:  Now that you have performed a summary, the view pane has changed.  Notice the five little tabs with pictures on them just at the bottom of the view pane.  After doing the summary, you are now looking at the summary tab of the view pane.  To change back to the tab you were using at the beginning of this tutorial, click on the second tab from the left, the semantic network view tab.  See Understanding the Topic Structure Tree for more details about the tabs in the view pane.

 

Step Eight: Using the Semantic Search

You have several questions you are trying to answer in your report.  One of them is: which companies are renting customer lists?  You want to find information in the text that answers that question without searching the entire text.

TextAnalyst allows you to perform a semantic search on the full text.

1.     From the main menu, select Search | Semantic Search.  A Semantic search window appears.  Note that there is already a sentence in the query area.  TextAnalyst fills the query box with the text you highlighted in the full text in an earlier step of this tutorial.  This reduces typing: you can click a sentence and then perform a search using that sentence.

2.     Delete the current sentence, as it is not currently relevant to this step.

TextAnalyst can accept searches that are made out of full sentences or questions.  This type of search is often called a Natural Language Query.  You can type in your question exactly as it is formed in your head and click Search, instead of having to root out keywords or phrases.  This greatly simplifies the search process.

3.     In the enter query area, type the following question:

What are the companies renting customer lists?

4.     Click Search.  TextAnalyst performs a semantic search.

5.     In the results pane are sentences from the original text that are most relevant to your question.  The results tell about how companies are renting the lists.

6.     More importantly, the view pane now contains a topic-oriented tree structure based on the question you typed in the semantic search.  This sub-tree of concepts related to the query in the context of the present text can help you if the results do not answer your question.  You can browse through it the same way you browsed the topic structure tree earlier in the tutorial.  This tree can help you arrive at a better answer to your question, because it shows that some words you might not have considered connected are very important for your search.

 

Step Nine: Outputting to HTML

You wish to share some of your findings from using TextAnalyst with colleagues at a company branch in London.  TextAnalyst allows you to do this through the medium of the Internet by exporting your results to an HTML knowledge base.

TextAnalyst can export results to a file in web format.

1.     From the Main menu, click File | Export to HTML …

2.     A Save as dialogue box appears.  You can save the file to anywhere on your computer so long as you can remember its location.  Save the file in the Examples folder.  It is named by default as ExportHTML.html.  Click OK.

3.     View the file in a web browser such as Microsoft Internet Explorer or Netscape Navigator.  To do this:

4.     Open a web browser.  From the Main menu, select File | Open.   A dialog box appears to find the file. 

5.     Click Browse to find the file on your computer.

Default Location: C:\Program Files\Megaputer Intelligence\MicroSystems\TextAnalyst 2.0\Examples\ExportHTML.html

In the HTML file, key concepts are hyperlinked.  By clicking on a concept, you can view the sentences in which the concept is found to be important.  By clicking on one of these sentences you can then view the sentence and the concept in the context of the full text.  Clicking on the concept again will return you to the topic list.

6.     You now have a file that you can save to your company’s web server and publish so that your colleagues at the London branch can view it.

7.     Close the browser and return to the TextAnalyst program.

 

Step Ten: Exporting to External Applications in a CSV file

Some of your colleagues are experts in Excel and want to analyze some of your findings to produce additional reports.  You want to export the data from TextAnalyst so that other computer applications can work with it. Through this export you can view a list of key words with their frequency and semantic weight.

From the main file menu, select File | Export.

1.     A Save as dialog box appears.  Check that you are in the Examples folder.  The file has by default been named for you as ExportBase.csv.  CSV stands for comma separated value, and is a common file format used by many computer applications.

2.     Click Save.  The dialog box disappears.  You have successfully exported the file.

3.     This tutorial will use Microsoft Excel as the example external application that works with your exported data.

4.     Open Microsoft Excel.

5.     From the main file menu in Excel, select File | Open.

6.     Locate the file ExportBase.csv.  In this tutorial you saved the file in the Examples folder, which by default is located at:

C:\Program Files\Megaputer Intelligence\MicroSystems\TextAnalyst 2.0\Examples\ExportBase.csv

Note: The Open file dialogue box should be set to open Files of Type: All Files (*.*) or else the .csv file may not show up.

7.     Excel will import the ExportBase.csv file.
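
Colleagues who prefer scripting to Excel could read the same export with Python's standard csv module. The sketch below assumes a layout that this tutorial does not specify: a header row with Concept, Frequency, and Weight columns. Those column names, the helper load_concepts, and the sample rows are all assumptions made for illustration, so check the first line of your own ExportBase.csv and adjust the names accordingly.

```python
import csv
import io

# Invented stand-in for the contents of ExportBase.csv; the real column
# names and their order may differ.
SAMPLE_EXPORT = """Concept,Frequency,Weight
databases,12,100
businesses,7,99
sold,3,80
"""


def load_concepts(handle):
    """Read rows from an open CSV file and convert numeric columns to integers."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {
                "concept": row["Concept"],
                "frequency": int(row["Frequency"]),
                "weight": int(row["Weight"]),
            }
        )
    return rows


concepts = load_concepts(io.StringIO(SAMPLE_EXPORT))

# List the concepts whose semantic weight is 99 or higher.
top = [c["concept"] for c in concepts if c["weight"] >= 99]
print(top)

# For a real file, open it instead of using the in-memory sample:
# with open("ExportBase.csv", newline="") as handle:
#     concepts = load_concepts(handle)
```

Reading by column name rather than by position keeps the script working if the export adds or reorders columns, as long as the assumed names are corrected to match the real header.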

 

Summary:

Through the use of TextAnalyst you were able to quickly tell that the document “Databasing in the 90’s” is indeed a valuable resource.  TextAnalyst’s concise summary allowed you to grasp the key points of the text.  The Topic Structure tree led you to find the most important concepts and from there focus your investigation.  By navigating the document you were able to retrieve sentences that answered your questions.  Exporting your findings as a web page and in a spreadsheet file let you share the results of your work with remote colleagues.  By configuring the connected dictionary to better match your criteria you improved and focused TextAnalyst’s analysis.

With this powerful arsenal of tools allowing you to quickly comprehend the meaning of a text, without plunging into reading the full document, you can create detailed and insightful reports.

 

Short quiz:

Q1. Is TextAnalyst a useful tool?
Q2. Is TextAnalyst user friendly?

We wish you happy projects!


© 2000 Megaputer Intelligence Inc.
All rights reserved.
© 2000 MicroSystems, Ltd.
All rights reserved.