AI / Internet Technology University (AITU)
1.5.4. Collections: ArrayList, HashMap, Hashtable and more

Do you still remember arrays? Yes, those that we talked about in Lesson 1.2.6…

Several elements can be collected into an array. But there is a more general concept than the array: the collection.

A collection can consist of simple elements, such as String values, or it can hold more complex objects.
In Java we often store such elements in a list; the most commonly used list class is ArrayList.
ArrayList is a lighter alternative to the older Vector class: Vector synchronizes its methods, while ArrayList does not.
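As a small illustration of the list idea, here is a minimal sketch. The class name ArrayListSketch, the method buildWords and the sample words are made up for this example and are not part of the lesson.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayListSketch {
    // Builds a small list of words; unlike an array, the list grows as elements are added.
    static List<String> buildWords() {
        List<String> words = new ArrayList<>();
        words.add("game");
        words.add("java");
        words.add("map");
        words.remove("java"); // remove an element by value
        return words;
    }

    public static void main(String[] args) {
        List<String> words = buildWords();
        for (String word : words) {
            System.out.println(word);
        }
        System.out.println("size = " + words.size());
    }
}
```

Declaring the variable as List rather than ArrayList keeps the rest of the code independent of the concrete class.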

But we often need to represent KEY – VALUE pairs.
Think about the pairs placed in a two-column table.
The key can be the name of an object, stored in one column, and the value of that object can be stored in the other column of the table.
Such a table can have several rows, and it is a perfect example of a Map collection.

Java has a common interface Map to describe the concept of a two-column table.
We call it Map because we create a map between the key and its value placed in a single row of the table.

This interface is implemented by several classes. The most commonly used implementation is the HashMap class.
Note that it is a lightweight implementation, which is not thread-safe.
You can read more about threads and processes in Java in Section 3. Threads and Networks.
But you will do that later. Right now just stick with HashMap.
When I need thread safety, I often use the old-fashioned Hashtable, a heavyweight version that does the same thing but synchronizes its methods.
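The two-column table idea can be sketched as follows. The class name MapSketch, the method fill and the sample rows are invented for this illustration; the point is only that HashMap and Hashtable can both be used through the Map interface.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class MapSketch {
    // Fills any Map implementation with the same key-value rows.
    static Map<String, String> fill(Map<String, String> table) {
        table.put("name", "Alice");
        table.put("city", "Denver");
        table.put("city", "Boston"); // same key again: the old value is replaced
        return table;
    }

    public static void main(String[] args) {
        // HashMap: lightweight, not thread-safe
        Map<String, String> light = fill(new HashMap<String, String>());
        // Hashtable: its methods are synchronized
        Map<String, String> heavy = fill(new Hashtable<String, String>());

        System.out.println(light.get("city"));
        System.out.println(heavy.get("city"));
        System.out.println(light.containsKey("age"));
    }
}
```

Because fill accepts the Map interface, switching from one implementation to the other changes only the line that creates the object.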


Assignments:
1. Create a project similar to the previous one.
2. This time, save two or three web pages (View Source) as txt files in a resources folder inside the project (create this folder).
3. Create a program that gets the list of files in that directory and parses each file in a for loop.
4. While parsing each file, store every new word found in the file in a Hashtable, where the key is the word itself and the value is a new Hashtable.
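Step 3 above could be approached along these lines. This is only a sketch: the class name ResourceFilesSketch, the helper names, the folder name "resources" relative to the working directory, and the UTF-8 encoding are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ResourceFilesSketch {
    // Returns the names of the .txt files in the given directory, sorted by name.
    static List<String> listTextFiles(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path file : stream) {
                names.add(file.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Reads a whole file into one String (assumes the file is UTF-8 text).
    static String readFile(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("resources"); // assumed folder name
        if (!Files.isDirectory(dir)) {
            System.out.println("No resources folder found");
            return;
        }
        for (String name : listTextFiles(dir)) {
            String html = readFile(dir.resolve(name));
            System.out.println(name + ": " + html.length() + " characters");
        }
    }
}
```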

HashMap and Hashtable are similar in functionality, but Hashtable is thread-safe and HashMap is not.

Both are two-column tables:

Hashtable1

Key  | Value
word | Hashtable2
     |   Key      | Value
     |   filename | relevance: how many times that word appeared in the file

In my example, I use any word from the web page, for example, game, as a key and create another Hashtable as its value.

In the second Hashtable the key is a filename, for example, webpage1.txt, and the value is the number of times the keyword appears on that web page (its relevance), that is, how many times the word game appeared in that file. The parsing procedure should count the relevance.
Was it clear so far?


Here are code hints:

// remove all html tags, so the parsing will deal with text only
// find the beginning and the end of any html tag
int indexOfBeginningOfHtmlTag = html.indexOf("<");
int indexOfEndHtmlTag = html.indexOf(">", indexOfBeginningOfHtmlTag );

// remove the html tag, from its beginning to its end, from the html
// repeat this in a loop while any tags remain
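The tag-removal hint could be completed roughly like this. The class name TagStripperSketch and the method removeTags are hypothetical, and the approach is deliberately naive: it treats everything between "<" and the next ">" as a tag, so script bodies, comments and attribute values containing ">" are not handled correctly.

```java
class TagStripperSketch {
    // Removes everything between '<' and the next '>' until no complete tag is left.
    static String removeTags(String html) {
        StringBuilder text = new StringBuilder(html);
        int indexOfBeginningOfHtmlTag = text.indexOf("<");
        while (indexOfBeginningOfHtmlTag >= 0) {
            int indexOfEndHtmlTag = text.indexOf(">", indexOfBeginningOfHtmlTag);
            if (indexOfEndHtmlTag < 0) {
                break; // an unmatched '<': leave the rest as it is
            }
            // replace the tag with a space so that neighboring words stay separated
            text.replace(indexOfBeginningOfHtmlTag, indexOfEndHtmlTag + 1, " ");
            indexOfBeginningOfHtmlTag = text.indexOf("<", indexOfBeginningOfHtmlTag);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String text = removeTags("<p>Hello <b>game</b> world</p>");
        // split on whitespace to get the words
        for (String word : text.trim().split("\\s+")) {
            System.out.println(word);
        }
    }
}
```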



// create one Hashtable wordsAndIndexes (once, before the parsing loop)
// with the key = a word and the value = a Hashtable for that word
// (requires: import java.util.Hashtable;)

Hashtable<String, Hashtable<String, String>> wordsAndIndexes = new Hashtable<>();

// check whether this is a new word, not yet captured in wordsAndIndexes
if (!wordsAndIndexes.containsKey(word)) {
    // capture this new word and create a Hashtable for it
    Hashtable<String, String> filenameAndCounter = new Hashtable<>();
    // filenameAndCounter stores the name of the file currently parsed and the counter
    // the counter counts how many times the word occurs in this file
    // the filename serves as the key and the counter is the value
    filenameAndCounter.put(filename, "1"); // initially put "1"
    // put the word and its filenameAndCounter Hashtable into wordsAndIndexes
    wordsAndIndexes.put(word, filenameAndCounter);
} else {
    // the word is already captured in wordsAndIndexes
    // get the counter and add 1 to it
    Hashtable<String, String> filenameAndCounter = wordsAndIndexes.get(word);
    String counterString = filenameAndCounter.get(filename);
    // the word may be known only from another file, so the counter can be missing
    int counter = (counterString == null) ? 0 : Integer.parseInt(counterString);
    counter = counter + 1; // count occurrences of the word - relevance
    filenameAndCounter.put(filename, counter + "");
    // no second put into wordsAndIndexes is needed: it already holds this filenameAndCounter
}


The program should use the counters as a relevance value and point to the file where the relevance is highest.
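One possible way to pick the file with the highest relevance for a single word is sketched here. The class name RelevanceSketch, the method mostRelevantFile and the sample data are invented for the example; the counters are kept as Strings, as in the code hints. If two files have the same counter, this sketch returns whichever one it meets first.

```java
import java.util.Hashtable;
import java.util.Map;

class RelevanceSketch {
    // Returns the file with the highest counter for the word, or null if the word is unknown.
    static String mostRelevantFile(Hashtable<String, Hashtable<String, String>> wordsAndIndexes,
                                   String word) {
        Hashtable<String, String> filenameAndCounter = wordsAndIndexes.get(word);
        if (filenameAndCounter == null) {
            return null; // the word was not found in any file
        }
        String bestFile = null;
        int bestCount = 0;
        for (Map.Entry<String, String> row : filenameAndCounter.entrySet()) {
            int count = Integer.parseInt(row.getValue());
            if (count > bestCount) {
                bestCount = count;
                bestFile = row.getKey();
            }
        }
        return bestFile;
    }

    public static void main(String[] args) {
        // sample data, made up for the illustration
        Hashtable<String, String> gameCounters = new Hashtable<>();
        gameCounters.put("webpage1.txt", "3");
        gameCounters.put("webpage2.txt", "7");
        Hashtable<String, Hashtable<String, String>> wordsAndIndexes = new Hashtable<>();
        wordsAndIndexes.put("game", gameCounters);

        System.out.println(mostRelevantFile(wordsAndIndexes, "game"));
    }
}
```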

5. After parsing is done, add code that uses the Scanner class to ask for keywords and uses the Hashtable to select the file that is most relevant to these keywords.
6. Send source to dean@ituniversity.us
7. Create 2 QnAs for this subject and send to dean@ituniversity.us
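For step 5, a sketch under several assumptions: keywords are typed on one line separated by spaces, the words in the Hashtable were stored in lower case, and the relevance of a file for several keywords is simply the sum of its counters. None of these choices comes from the lesson; the class name KeywordSearchSketch, the method bestFileFor and the sample data are made up as well.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.Scanner;

class KeywordSearchSketch {
    // Sums the counters of all keywords per file and returns the file with the highest total.
    // Returns null when none of the keywords is in the index.
    static String bestFileFor(Hashtable<String, Hashtable<String, String>> wordsAndIndexes,
                              String[] keywords) {
        Map<String, Integer> totals = new HashMap<>();
        for (String keyword : keywords) {
            // assumption: words were stored in lower case during parsing
            Hashtable<String, String> filenameAndCounter = wordsAndIndexes.get(keyword.toLowerCase());
            if (filenameAndCounter == null) {
                continue; // this keyword does not occur in any file
            }
            for (Map.Entry<String, String> row : filenameAndCounter.entrySet()) {
                totals.merge(row.getKey(), Integer.parseInt(row.getValue()), Integer::sum);
            }
        }
        String bestFile = null;
        int bestTotal = 0;
        for (Map.Entry<String, Integer> total : totals.entrySet()) {
            if (total.getValue() > bestTotal) {
                bestTotal = total.getValue();
                bestFile = total.getKey();
            }
        }
        return bestFile;
    }

    public static void main(String[] args) {
        // sample index, made up for the illustration
        Hashtable<String, String> gameCounters = new Hashtable<>();
        gameCounters.put("webpage1.txt", "3");
        gameCounters.put("webpage2.txt", "1");
        Hashtable<String, Hashtable<String, String>> wordsAndIndexes = new Hashtable<>();
        wordsAndIndexes.put("game", gameCounters);

        Scanner scanner = new Scanner(System.in);
        System.out.print("Enter keywords: ");
        if (scanner.hasNextLine()) {
            String[] keywords = scanner.nextLine().trim().split("\\s+");
            String bestFile = bestFileFor(wordsAndIndexes, keywords);
            System.out.println(bestFile == null ? "No matching file" : bestFile);
        }
    }
}
```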

Internet Technology University | JavaSchool.com | Copyrights © Since 1997 | All Rights Reserved