ALL ABOUT THE HASHING ALGORITHM

Eber Hernandez
8 min readNov 23, 2020

--

The Article is all about how does Hashing works, starting from its history all the way to its implementation in C++.

Figure 01. The art of Hashing

YOU WILL LEARN:

1. What is hashing

2. Principle elements of Hashing algorithm

3. Pros vs. Cons

4. Implementation

1. WHAT IS HASHING

Hashing is the “art” of converting a key value of information into an index for a spot of memory in an array.

Have you ever wondered, how does you can look for a certain individual in your contanct list only using their phone number or their last name and have a really quick response from your cellphone? Or perhaps, how does all your security passwords from all your online accounts are kept safe? (cryptographic)…Or maybe, how does the teachers can find if their students are commiting plagiarism? (Rabin-Karp Algorithm)…. Or even, how does your router of Wi-Fi take your messages from place A to a place B without altering the content of it?

Figure 01. Applications of the Hash Function.
Figure 02. Applications of the Hash Function.

All of these examples mention before were devepoled and based on the idea used in the “Hashing Algorithm” , so if you want to know how this works in a programming level, then you are in the right place!

BACKGROUND

In 1958, Hans Peter Luhn presented in a scientific international conference a machine named the “KWIC, for Key Word in Context”, which was able to take a large number of texts, around 500 to 5000 words in length, and the KWIC system could automatically construct a kind of index.

At the time, indexing and classifying information was quite difficult because of the length and complexity of the papers. As well, it was hard to keep up with the classification (indexation) because the information in many fields was growing too rapidly.

Figure 03. Hans Peter Luhn.

“By the early 1960s, KWIC had become central to the design of hundreds of computerized indexing systems, including those used by the Chemical Abstracts Service, Biological Abstracts, and the Institute for Scientific Information. KWIC was basically the era’s equivalent of a search engine: It allowed users to speedily locate the information they needed.”

(IEEE SPECTRUM- Hallam Stevens/2018).

There is no doubt, that Peter´s KWIC machine was a milestone in the development of new and more sophisticated ways for computers to look and store data such as databases, identification of patterns and password storage.

2. Principle Elements of Hashing Algorithm

INTRO TO THE CONCEPT:

The 4 main parts are the key value, the hashing function, the hash table, and collisions.

· Key value:

The piece of information for which you are going to associate the information you want to keep safe.

· Hashing Function:

Algorithm that convert the key value into an index that tells you where you can find, insert or delete an entry in the array in which all the information is stored.

In other words, it maps the key value into a index value that can be used as a place in the array.

· Hash Table:

(Data Structure) Array in which the information it´s been store.

· Collision:

Many times, after passing throughout the hash function, different key values can give the same index value.

This a big problem! Because you can´t have two elements of information in ONE spot. Therefore, you need a way of looking for a new spot in which the new information can be stored.

There are plenty of ways of handling collision. Having a good “hashing function” allows you to make a good distribution of the indexes and the allocations of memory in the array .

However, when the collisions are inevitable, there are many methods to approach the collisions, and these are two of the most common ways of handling collisions:

  • Open Addressing: (Linear Probing) Finding the next available spot in the array to allocate the new information.
  • Chaining: The idea is to make each cell of hash table point to a linked list of records that have same hash function value.
Figure 04. Memorizing the Hash Algorithm.

Now, that you know all the elements of hashing….. you may wonder, what the heck does the initial picture represent?! Well, it is a irrelevant and out of the ordinary story, for you to remember the concepts of hashing.

The thug cheese is trying to steal the succulent mouse ( hash table — information you want to store), and for him to accomplish his mission, the cheese requires a enormous key ( key value ) to open the mysterious and magical lock (hash function), which only open with the correct key.

*(This idea came from the cultural knowing that “all mice like cheese and they always try to steal it”, just like in the popular movie “Ratatouille”).

It´s very ridiculous , I know , but from now on, when you try to remember how “hashing” works, you would remember the evil and thug cheese trying to steal the mouse!

3. Pros vs. Cons

PROS

• Search of O(1): search time on average (under reasonable assumptions)

• Security

• Rapid Data Look-Up

CONS

• Search of O(n) in worst case. (Linear time/ Pass thru all elements to get to the one desired)

  • Collisions, collisions and more collisions!

4. Implementation

It is important to mention that the following example code was retrieved in GeeksforGeeks and only a few modifications were done to complement the concepts . All the parts that were modified or included to the code were also mention in the comments of the code:

EXAMPLE:

In this occasion, it is going to be used the chaining method to handle collision, and all the aspects of hashing are going to be cover through this example:

First, all the standard libraries were included. As well, it was included the “using namespace std” to reduce time and lines of coding for each time that something was going to be print in the monitor.

Figure 05. C++ Libraries.

The next part was making the declaration of our class HASH, where this data structure was going be develop.

In the following picture, in line 41 it is declared the most common Hash Function, which return the module of the key divided by the number of elements in the list which is BUCKETS. The value return is a number between 0 and BUCKETS, which is going to be the index where the value is going to be stored.

Figure 06. Hash Class implementation.

For example, if BUCKETS=10 then the hash table is going to be from 0 to 9, and as a result of applying the Hash Function, the values return are going to be indexes from 0 to 9.

Table 01. Example of indexes obtained by the “hashFunction( )”.

Therefore, if a key value of 12 wants to be added, the position in the list where it going to be added the position 2 of the Hash Table. The same would apply for the values 48 which is going to be allocated at position 8 and 75 at position 5.

Once all the attributes and methods were declared in the class, the next step is to declare the constructor by default, which its only duty is to determine the number of available spaces in the list that was going to be created under the name of table. This table would represent the final hash table where all the inserted elements are going to be concatenated.

Figure 07. Hash Constructor.

The next step is to implement all the actions that are going to be done by the methods:

The first method was the insertItem() which directly uses push_back to insert an element at the last of the list, which in this case is going to be the index value provided by the hash function. Automatically, the new element is inserted in a linked list where the stored value in the hash table is a pointer to the first value. That would continue for all the new values that are inserted and have the same index.

Figure 08. Insertion of an Item, method implementation.

For the deleteItem(), it is necessary to look one by one, all the elements in the linked list starting from the index value in the hash table. If the search reaches the end of the table, and the value wasn’t found, a error message is going to be displayed.

In the other hand, if the key value is indeed founded, then the value is going to be erased and the hash table is going to be displayed again.

Figure 09. Deletion of an Item, method implementation.

For searchItem(), the code was basically the same as the deleteItem(). The only difference resides that in this occasion, there is only going to be a message telling the user if the key value given was found or not.

Figure 10. Searching for an Item, method implementation.

Finally, showHash() implements the printing of the Hash Table.

Figure 11. Printing Hash Table, method implementation.

To visualize how does this code works, it was implemented an array of only 4 values 15,10,12 and 28 to implement them in the hash table. Then a object of type Hash is created to allocate 4 initial values which are in the array a[].

Figure 12. Construction of an Object and storage of elements using the hash function.

The final result was the following:

Figure 13. Result displayed in the monitor.

As mention previously, the deleteItem() function was used for the key value (12), which was founded, and the hash table was printed again to observe the difference. As well, the value 10 was searched and found at the position 2 of the hash table. Once again, the deleteItem() function was used with the value 12 again , but this time it wasn’t found and a error message was throw.

The WRAP

  • The main purpose of hashing is the indexing of data and the reduction of time for insertion, deletion and search.
  • The main elements of hashing are the key value, hash function, hash table and the collisions (Rember the cheese thief!).
  • The way of avoiding collisions is by having a good hash function or by using methods like linear probing or chaining.
Photo by Joshua Aragon on Unsplash

REFERENCES

--

--