1.5.4. Collections: ArrayList, HashMap, Hashtable and more

Do you still remember arrays? Yes, the ones we talked about in Lesson 1.2.6.

Several elements can be collected into an array. But there is a more general concept than an array: a collection.

A collection can consist of simple elements, such as Strings, or it can include more complex objects.
In Java we often store such elements in a list; the most commonly used list class is ArrayList.
ArrayList is a lighter, and usually preferred, replacement for an older implementation, the Vector class: Vector synchronizes every method, while ArrayList does not.
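A minimal sketch of an ArrayList in use, assuming Java 7 or later (for the diamond operator <>). The class name ArrayListDemo and the sample words are made up for illustration. Unlike an array, the list grows as you add elements, and it keeps them in insertion order.

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayListDemo {
    // Builds a small list of words; the ArrayList grows as elements are added
    static List<String> buildWords() {
        List<String> words = new ArrayList<>();
        words.add("game");
        words.add("java");
        words.add("game"); // a list keeps duplicates and insertion order
        return words;
    }

    public static void main(String[] args) {
        List<String> words = buildWords();
        System.out.println(words.size()); // prints 3
        System.out.println(words.get(1)); // prints java (indexes start at 0)
        for (String word : words) {
            System.out.println(word);
        }
    }
}
```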

But we often need to represent KEY-VALUE pairs.
Think of such pairs as the rows of a two-column table.
One column holds the key, a name for an object, and the other column holds the value, the object itself.
The table can have many rows, and it is a perfect example of a Map collection.

Java has a common interface Map to describe the concept of a two-column table.
We call it Map because it maps each key to its value, the two being placed in a single row of the table.

This interface is implemented by several classes. The most commonly used implementation is the class HashMap.
Note that it is a lightweight implementation, which is not thread-safe.
You can read more about threads and processes in Java in Section 3, Threads and Networks.
But you will do that later. Right now, just stick with HashMap.
When I need thread-safety, I often use the old-fashioned Hashtable, a heavyweight version that does the same thing but synchronizes access to its data.
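To see the two classes side by side, here is a small sketch; the class MapDemo and its sample data are hypothetical. Both HashMap and Hashtable implement Map, so the same fill method works with either. One practical difference worth knowing: HashMap permits a null key, while Hashtable throws a NullPointerException for a null key or value.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class MapDemo {
    // Puts the same key-value pairs into any Map implementation
    static void fill(Map<String, String> table) {
        table.put("game", "webpage1.txt");
        table.put("java", "webpage2.txt");
        table.put("game", "webpage3.txt"); // same key again: the old value is replaced
    }

    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();     // lightweight, not thread-safe
        Map<String, String> hashtable = new Hashtable<>(); // its methods are synchronized
        fill(hashMap);
        fill(hashtable);

        System.out.println(hashMap.get("game"));           // webpage3.txt
        System.out.println(hashtable.containsKey("java")); // true
        System.out.println(hashtable.size());              // 2

        hashMap.put(null, "allowed"); // HashMap permits a null key
        try {
            hashtable.put(null, "not allowed");
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejects null keys");
        }
    }
}
```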


Assignments:
1. Create a project similar to the previous one.
2. This time, save two or three web pages (View Source) as .txt files in the project's resources folder (create this folder).
3. Create a program that gets a list of the files in that directory and parses each file in a for loop.
4. While parsing each file, store every new word from the file in a Hashtable, where the key is the word itself and the value is a new Hashtable.
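For steps 2 and 3, one possible way to list and read the saved pages is the java.nio.file API. This sketch assumes the folder is called resources, sits in the working directory the program is run from, and that the pages were saved as UTF-8; the class ListResources and its method are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListResources {
    // Returns the .txt files in the given directory, sorted by name
    static List<Path> listTextFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path file : stream) {
                files.add(file);
            }
        }
        Collections.sort(files); // Path implements Comparable
        return files;
    }

    public static void main(String[] args) throws IOException {
        // assumption: "resources" is relative to the working directory of the run
        Path dir = Paths.get("resources");
        for (Path file : listTextFiles(dir)) {
            // read the whole saved page into one String (assumes UTF-8)
            String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            String filename = file.getFileName().toString();
            System.out.println(filename + ": " + html.length() + " characters");
            // parse html here
        }
    }
}
```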

HashMap and Hashtable are similar in functionality, but Hashtable is thread-safe and HashMap is not.

Both Hashtables used in this assignment are two-column tables, one nested inside the other:

Hashtable1
  Key  | Value
  word | Hashtable2
           Key      | Value
           filename | relevance: how many times that word appeared in the file

In my example, I use any word from the web page, for example game, as a key and create another Hashtable as its value.

In the second Hashtable, the key is a filename, for example webpage1.txt, and the value is the number of repetitions of the keyword on that web page (its relevance): how many times the word game appeared in that file. The parsing procedure should count the relevance.
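The nested structure can be built by hand to see how the two lookups work. The counts below are made-up sample values, not results from real pages; in the assignment they come from parsing. The class NestedTableDemo is a hypothetical name.

```java
import java.util.Hashtable;

public class NestedTableDemo {
    // Builds word -> (filename -> relevance) by hand, with made-up counts
    static Hashtable<String, Hashtable<String, Integer>> sample() {
        Hashtable<String, Integer> gameCounts = new Hashtable<>();
        gameCounts.put("webpage1.txt", 5); // "game" appears 5 times in webpage1.txt
        gameCounts.put("webpage2.txt", 2); // and 2 times in webpage2.txt

        Hashtable<String, Hashtable<String, Integer>> wordsAndIndexes = new Hashtable<>();
        wordsAndIndexes.put("game", gameCounts);
        return wordsAndIndexes;
    }

    public static void main(String[] args) {
        Hashtable<String, Hashtable<String, Integer>> wordsAndIndexes = sample();
        // first lookup: word -> inner Hashtable; second lookup: filename -> relevance
        int relevance = wordsAndIndexes.get("game").get("webpage1.txt");
        System.out.println(relevance); // prints 5
    }
}
```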

Is it clear so far?

Here are code hints:


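// remove all html tags, so the parsing will deal with text only
// (html is a String holding the content of the file being parsed)

// find the beginning of the first html tag
int indexOfBeginningOfHtmlTag = html.indexOf("<");
// repeat in a loop while there are still tags in the html
while (indexOfBeginningOfHtmlTag >= 0) {
    // find the end of this html tag
    int indexOfEndHtmlTag = html.indexOf(">", indexOfBeginningOfHtmlTag);
    if (indexOfEndHtmlTag < 0) {
        // an unclosed tag: drop everything from "<" to the end
        html = html.substring(0, indexOfBeginningOfHtmlTag);
        break;
    }
    // remove the tag, from "<" to ">" inclusive, from the html;
    // leave a space in its place so the words around it do not merge
    html = html.substring(0, indexOfBeginningOfHtmlTag) + " "
            + html.substring(indexOfEndHtmlTag + 1);
    // find the beginning of the next tag
    indexOfBeginningOfHtmlTag = html.indexOf("<", indexOfBeginningOfHtmlTag);
}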
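Here is one way the tag-removal hint could look as a complete, self-contained method, together with a simple word splitter. It is a sketch under assumptions: words are lower-cased, and anything that is not a letter or digit separates words. It does not treat the contents of script or style elements specially, and entities such as &amp; come through as ordinary words (amp). The class HtmlText and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class HtmlText {
    // Removes every "<...>" tag, replacing it with a space so neighbouring words stay apart
    static String stripTags(String html) {
        StringBuilder text = new StringBuilder();
        int pos = 0;
        while (pos < html.length()) {
            int begin = html.indexOf('<', pos);
            if (begin < 0) { // no more tags: keep the rest
                text.append(html, pos, html.length());
                break;
            }
            int end = html.indexOf('>', begin);
            if (end < 0) { // unclosed tag: drop everything from "<" on
                text.append(html, pos, begin);
                break;
            }
            text.append(html, pos, begin).append(' ');
            pos = end + 1;
        }
        return text.toString();
    }

    // Splits text into lower-case words; any run of non-letters/non-digits separates words
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) { // split may return a leading empty token
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Play the game</p><p>Game over</p></body></html>";
        System.out.println(words(stripTags(html))); // [play, the, game, game, over]
    }
}
```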


// requires: import java.util.Hashtable; at the top of the source file

// create one Hashtable wordsAndIndexes, once, before looping over the files
// with the key = a word and the value = a new Hashtable
Hashtable<String, Hashtable<String, Integer>> wordsAndIndexes = new Hashtable<>();

// for every word found in the file (word = the current word,
// filename = the name of the file currently parsed):
// check if this is a new word, not yet captured in wordsAndIndexes
if (!wordsAndIndexes.containsKey(word)) {
    // capture this new word and create a Hashtable for it
    // filenameAndCounter stores the name of the file currently parsed and a counter
    // the counter counts how many times this word appears in this file
    // filename serves as a key and the counter is a value
    Hashtable<String, Integer> filenameAndCounter = new Hashtable<>();
    filenameAndCounter.put(filename, 1); // initially put 1
    // put the word and filenameAndCounter (a Hashtable) into wordsAndIndexes
    wordsAndIndexes.put(word, filenameAndCounter);
} else {
    // the same word is already captured in wordsAndIndexes:
    // get its counter for this file and add 1 to it
    Hashtable<String, Integer> filenameAndCounter = wordsAndIndexes.get(word);
    Integer counter = filenameAndCounter.get(filename);
    if (counter == null) {
        // the word was seen before, but only in other files
        counter = 0;
    }
    counter = counter + 1; // count the number of occurrences: relevance
    // no need to put filenameAndCounter back into wordsAndIndexes:
    // it is the same object that is already stored there
    filenameAndCounter.put(filename, counter);
}
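Since Java 8, the Map interface also offers computeIfAbsent and merge, which can replace the if/else in the hint with a single statement. This sketch uses HashMap because the program runs in a single thread; Hashtable implements the same methods. It assumes the words of each file were already extracted into a list per file; the class WordIndex is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordIndex {
    // Builds word -> (filename -> count) from the words already extracted from each file
    static Map<String, Map<String, Integer>> index(Map<String, List<String>> wordsByFile) {
        Map<String, Map<String, Integer>> wordsAndIndexes = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : wordsByFile.entrySet()) {
            String filename = entry.getKey();
            for (String word : entry.getValue()) {
                // create the inner map the first time the word is seen,
                // then add 1 to this file's counter (putting 1 if it is absent)
                wordsAndIndexes
                        .computeIfAbsent(word, w -> new HashMap<>())
                        .merge(filename, 1, Integer::sum);
            }
        }
        return wordsAndIndexes;
    }

    public static void main(String[] args) {
        Map<String, List<String>> wordsByFile = new HashMap<>();
        wordsByFile.put("webpage1.txt", Arrays.asList("game", "over", "game"));
        wordsByFile.put("webpage2.txt", Arrays.asList("game", "java"));
        Map<String, Map<String, Integer>> wordsAndIndexes = index(wordsByFile);
        System.out.println(wordsAndIndexes.get("game").get("webpage1.txt")); // 2
        System.out.println(wordsAndIndexes.get("game").get("webpage2.txt")); // 1
    }
}
```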

The program should use the counters as relevance values and point to the file where the relevance is highest.
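A minimal sketch of that lookup for a single word: take the word's inner table and keep the filename with the largest counter. The parameter type Map<String, ? extends Map<String, Integer>> is chosen so that both a HashMap and a Hashtable of this shape can be passed in. When two files tie, the result depends on the table's iteration order. The class MostRelevant and the sample counts in main are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class MostRelevant {
    // Returns the filename with the highest counter for the word, or null if the word is unknown
    static String mostRelevantFile(Map<String, ? extends Map<String, Integer>> wordsAndIndexes,
                                   String word) {
        Map<String, Integer> filenameAndCounter = wordsAndIndexes.get(word);
        if (filenameAndCounter == null) {
            return null; // the word does not occur in any file
        }
        String bestFile = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : filenameAndCounter.entrySet()) {
            if (entry.getValue() > bestCount) {
                bestCount = entry.getValue();
                bestFile = entry.getKey();
            }
        }
        return bestFile;
    }

    public static void main(String[] args) {
        // hypothetical counters, as if produced by parsing two files
        Map<String, Integer> gameCounts = new HashMap<>();
        gameCounts.put("webpage1.txt", 5);
        gameCounts.put("webpage2.txt", 2);
        Map<String, Map<String, Integer>> wordsAndIndexes = new HashMap<>();
        wordsAndIndexes.put("game", gameCounts);

        System.out.println(mostRelevantFile(wordsAndIndexes, "game"));  // webpage1.txt
        System.out.println(mostRelevantFile(wordsAndIndexes, "chess")); // null
    }
}
```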

5. After parsing is done, add code that uses the Scanner class to ask for keywords, and use the Hashtable to select the file that is most relevant to these keywords.
6. Send source to dean@ituniversity.us
7. Create 2 QnAs for this subject and send to dean@ituniversity.us
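For step 5, one simple scoring rule (an assumption, not the only option) is to add up each file's counters for all entered keywords and pick the file with the largest total. The sketch below reads one line of keywords with Scanner. The index built in main is hypothetical sample data, and keywords are lower-cased on the assumption that the index stores lower-case words; the class KeywordSearch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

public class KeywordSearch {
    // Adds up each file's counters over all keywords and returns the file with the highest total
    static String bestFileFor(Map<String, ? extends Map<String, Integer>> wordsAndIndexes,
                              List<String> keywords) {
        Map<String, Integer> totals = new HashMap<>();
        for (String keyword : keywords) {
            Map<String, Integer> filenameAndCounter = wordsAndIndexes.get(keyword.toLowerCase());
            if (filenameAndCounter == null) {
                continue; // this keyword does not occur in any file
            }
            for (Map.Entry<String, Integer> entry : filenameAndCounter.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        String bestFile = null;
        int bestTotal = 0;
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            if (entry.getValue() > bestTotal) {
                bestTotal = entry.getValue();
                bestFile = entry.getKey();
            }
        }
        return bestFile; // null when none of the keywords was found
    }

    public static void main(String[] args) {
        // hypothetical index; in the assignment it comes from parsing the files
        Map<String, Integer> gameCounts = new HashMap<>();
        gameCounts.put("webpage1.txt", 5);
        gameCounts.put("webpage2.txt", 2);
        Map<String, Integer> javaCounts = new HashMap<>();
        javaCounts.put("webpage2.txt", 7);
        Map<String, Map<String, Integer>> wordsAndIndexes = new HashMap<>();
        wordsAndIndexes.put("game", gameCounts);
        wordsAndIndexes.put("java", javaCounts);

        Scanner scanner = new Scanner(System.in);
        System.out.print("Enter keywords: ");
        if (!scanner.hasNextLine()) {
            return; // no input available
        }
        List<String> keywords = Arrays.asList(scanner.nextLine().trim().split("\\s+"));
        String best = bestFileFor(wordsAndIndexes, keywords);
        System.out.println(best == null ? "No file contains these keywords" : "Most relevant: " + best);
    }
}
```

For example, entering "game java" gives webpage1.txt a total of 5 and webpage2.txt a total of 2 + 7 = 9, so webpage2.txt is chosen.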