Pages

Thursday, May 9, 2013

Queue Map Hybrid -- Creating Data Structures in Java

Recently I came across a problem where I was looking for a Queue implementation that can store Key-Value pairs. The benefits I was looking for were two folds, first of all, it must behave in a FIFO fashion, and secondly, I should be able to lookup an item by the Key without removing it from the structure. An Ideal implementation for me would be a hybrid of Queue and Map data structure implementations already available in the Collection Framework.

Like any modern programmer :-) my first attempt was to search for the available implementations and to my surprise I could not find anything that could fit to my criteria. I am surprized that no one ever considered such data structure or is it something that is so specialized that no one ever bothered to publish that, whatever is the reason; I did not find any clean implementation that I could use for my requirements.

That gave me a motivation to create my own and publish it for the community, maybe there is someone else looking for a similar solution and can benefit from the work I have already done. However, instead of simply posting my solution here, I am also taking this as an opportunity to provide some guidelines for students and junior programmes on how to design a new data structure. In this post I will try to explain what are the data structures and how we design the data structures.

Data Structures

Data structures are a special way of storing and organizing data in computer’s memory. In addition to storing the data, they also provide some functionality to manipulate the data stored in the structure. What functionality is provided depends on the data structure. Typical functionality is to Add, Remove, Find, First, Last, etc. Different kind of data structures are suited for different applications, some are very basic, like arrays, and some are highly specialized like B+ Tree. Bottom line is, you store some related data in-memory and provide appropriate operations on that data.

Functional Requirements

The first step on defining a data structure is to gather the requirements, what exactly are you looking to store in your structure and what behaviour is expected from that structure. For the purpose of this exercise, I have created this list of requirements that I am looking to achieve. The new structure must be able to

  • Store Key-Value pairs
  • Have a fixed size of structure
  • Remove last entry when adding a new pair
  • Find a Value by Key
  • Store any Object as Key or Value
  • Update Value of a Key
  • Remove any item using Key
Now that we have our requirements laid down, we can start looking at the existing solutions and how much of the given functionality they provide. I had in mind the Queue and the Map from Java Collection Framework, combining these two will give me all of the above functions.

Designing the Interface

An Interface is a set of functions that will be available for the users of that structure. In our case, all the public methods of the Data Structure are going to be the interface for that data structure. Now that we have the functional requirements, we can define the public methods of the new structure that we are going to create. This is what I came up with:

 public synchronized void addItem(K key, V value);
 public synchronized V getItem(K key);
 public synchronized void remove(K key);
 public int size();
 public void clear();

These methods will cover the functional requirements that we set out in our Gathering Functional Requirements phase. Now we can worry about the actual implementation of these methods.

Before we start implementing these methods, we must first look at the underlying structures that we are going to use. Remember, we have two basic requirements we set out in the beginning, it must behave like a Queue, and it must be able to store Key-Value pairs; for that we already have two very nice interfaces in Java Collection Framework; Queue and Map. Both of these interfaces provide functions to add, get, and remove elements from the structures. However, they are interfaces, and we need to choose the correct implementations of these interfaces in our new structure, we don’t want to re-invent the wheel, do we?

For the purpose of this exercise we are going to use the LinkedList implementation of Queue and HashMap implementation of Map; simply because they are the most basic ones. Now let’s start implementing the Structure. We begin by declaring the class and instance variables.


public class QueuedMap<K, V> {

 private static final int MAX_SIZE=1024;

 private int size;
 private Map<K, V> values;
 private Queue<K> keys;

 public QueuedMap() {
  this(64);
 }

 public QueuedMap(int size) throws IllegalArgumentException{

  if(size<=0){
   throw new IllegalArgumentException("Size can only be a +ive Integer");
  }

  if(size > QueuedMap.MAX_SIZE)
   throw new IllegalArgumentException("Size cannot be more than " + QueuedMap.MAX_SIZE);

  this.size = size;
  this.values = new HashMap<K, V>(this.size);
  this.keys = new LinkedList<K>();
 }

}

Here are a few interesting things to note, First of all the use of Generics. (If you are new to Generics the follow this nice Oracle Tutorial or this Wikipedia Page for more information) This class declaration covers our "Store Key-Value pairs" requirement by using the Map and also the "Store any Object as Key or Value" requirement by allowing creating the class instance of any Type. There is also an element of Type Safety by introducing the Generics, refer to the links above for details of how Generics achieve that.

The other important bit is the constructor of the structure. The default constructor initializes the structure with a default size of 64 but the overriding constructor takes the size as a parameter and initializes the structure with the given size. This fulfils the "Have a fixed size of structure" requirement. The Structure can have virtually any size, but once initialized, it cannot be changed. The MAX_SIZE constant that restricts the size of our Structure is there only for a reference to let you define a maximum size of your structure if you want to impose that restriction, Also notice the Map and Queue initialized as HashMap and LinkedList in the constructor.

Now let’s look at the implemented methods of our DataStructure. I will start from the easiest ones first, namely size() and clear() methods. These are the standard methods that should be implemented by any Data Structure.


 public int size(){
  return this.keys.size();
 }

 public void clear(){
  this.values.clear();
  this.keys.clear();
 }
 

As you can see, we are simply encapsulating the methods provided by our underlying Data Structures and providing a wrapper on these methods. In the size() method, we are simply returning the size of our Queue and in the clear() method we are simply calling the clear() method of both our Queue and our Map. Since we already have methods available for these functions in our underlying data structures, we don’t have to reinvent the wheel here and simple encapsulation is more than adequate.

Now let's look at the other methods in our Data Structure. Notice the synchronized keyword on all of our operations; this is because both the underlying Data Structures that we are planning to use are not synchronized, and we have to provide our own thread safety mechanisms.


 public synchronized void addItem(K key, V value){

  if(key == null || value == null)
   throw new NullPointerException("Cannot insert a null for either key or value");

  // First see if we already have this key in our queue
  if(this.keys.contains(key)){
   // Key found. 
   // Simply replace the value in Map
   this.values.put(key, value);
  }else{
   // Key not found
   // Add value to both Queue and Map
   this.enqueue(key, value);
  }
 } 
 

This is a very simple method exploiting the actual implementation from the underlying structures. The first thing to check before we add this Key-Value pair is that the Key and Value must not be null. If we have a value for both objects then we check if we already have this Key in our structure. In that case, simply replace the Value in our Map with the new Object received. If not then add this Key-Value pair to our Data Structure. This is how we are tackling our "Store any Object as Key or Value" and the "Update Value of a Key" requirements. The actual work of storing a Key-Value pair is rather involved and we use a private method enqueue(K, V) for that which is not visible to the users of this Structure.


 private void enqueue(K key, V value){

  if(this.keys.size() < this.size){
   // We still have space in the queue
   // Add they entry in both queue and the Map
   if(this.keys.add(key)){
    this.values.put(key, value);
   }
  }else{
   // Queue is full. Need to remove the Head 
   // before we can add a new item.
   K old = this.keys.poll();
   if(old!=null)
    this.values.remove(old);

   // Now add the new item to both queue and the map
   this.keys.add(key);
   this.values.put(key, value);
  }
 }

Here in the enqueue(K, V) method, first thing we are checking is if we still have space in the Structure. For that we use the instance variable size that we initialized in the constructor and we are not allowing the Structure to grow any larger than this size. If the size or our structure is still less than the maximum size with which the Structure is initialized then simply add the Key to the Queue and the Key-Value pair in the Map. If we have already reached the maximum size defined for this Structure then first remove the oldest Key from the Queue, then remove the pair with this Key from the Map before adding the new entries in both Queue and Map.

This is where you can see the Queue and Map in action. Adding Keys in the Queue ensures that when it comes to add a new element in the Structure, the oldest one is removed, and then we can use the Key to manipulate the data stored in the Map. This takes care of our "Remove last entry when adding a new pair" requirement.

The remaining two methods, getItem(K key) and remove(K key) are also fairly simple. All we are doing here is wrapping the functionality already provided by the underlying Queue and Map to control the behaviour.


 public synchronized V getItem(K key){

  if(key==null)
   return null;

  V val = this.values.get(key);
  return val;
 }

 public synchronized void remove(K key){

  if(key == null)
   throw new NullPointerException("Cannot remove a null key");

  this.keys.remove(key);
  this.values.remove(key);
 }

Here in the getItem(K key) we are simply returning the Value for that Key from our Map if the Key is not null. This fulfils our "Find a Value by Key" requirement.

The remove(K key) is slightly involved, here we are taking the Key and removing the Key from both the Queue and the Map. This takes care of the "Remove any item using Key" requirement we set out for the Structure.

That completes our Data Structure with all of the Functional Requirements we set out at the beginning of this post. Below is the full source for you to give you the full picture of how this all fits together.


package com.raza.collection;

import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;

/**
 * <p>
 * QueuedMap is a specialised implementation of the Queue which can 
 * store Key Value pairs instead of just the Objects. This flexibility 
 * comes handy when you want to retrieve a specific Object from the Queue 
 * then instead of trying to find the object by iterating the whole 
 * Queue you can simply get the Object using the Key.
 * </P><p>
 * In order for the structure to work properly, it is vital to override 
 * the hashCode() and equals(Object obj) methods in your Key class. These are 
 * the methods that the underlying Map will use to compare the Keys 
 * to retrieve/remove the correct Object from the QueuedStore.
 * </P><p>
 * The structure will always have a fixed size. If the structure is not 
 * initialised with a given value then it will use default value of 64 
 * to initialise. Once the Structure is initialised then the size cannot 
 * be amended. Once the overall structure reaches the maximum size then 
 * any new Key Value pairs added to the structure will result in removing 
 * the oldest entry from the structure.
 * </P><p>
 * There is virtually no size limit to the size of the structure. The structure 
 * can be initialised with any arbitrary value, however, at the time of initialisation 
 * one should always consider keeping up with the best practices used to initialise 
 * Map data structures as the underlying implementation uses the HashMap to store 
 * the Key Value pairs.
 * </P>
 * 
 * @author Baqir Raza Abidi
 * @date 26 Mar 2013 16:03:08
 */
public class QueuedMap<K, V> {

 /**
  * Final variable indicates the Maximum size of this 
  * structure. 
  */
 private static final int MAX_SIZE=1024;

 private int size;
 private Map<K, V> values;
 private Queue<K> keys;

 /**
  * Default constructor for the class. Creates a class with the default 
  * structure size of {@code 64}. Once the structure is created then the 
  * size of the structure will remain the same.   
  */
 public QueuedMap() {
  this(64);
 }

 /**
  * <p>
  * Creates the structure with the given size. The constructor throws Exception if
  * the size given is less then 1. The structure cannot be created with a 0 or -ive 
  * size. 
  * </p><p>
  * The maximum size of the structure is also limited to the {@code QueuedStore.MAX_SIZE}
  * </p>
  *  
  * @param size Size of the Structure. 
  * @throws IllegalArgumentException If an invalid size is provided. 
  */
 public QueuedMap(int size) throws IllegalArgumentException{

  if(size<=0){
   throw new IllegalArgumentException("Size can only be a +ive Integer");
  }

  if(size > QueuedMap.MAX_SIZE)
   throw new IllegalArgumentException("Size cannot be more than " + QueuedMap.MAX_SIZE);

  this.size = size;
  this.values = new HashMap<K, V>(this.size);
  this.keys = new LinkedList<K>();
 }

 /**
  * <p>
  * Add a new {@code (Key, Value)} pair to the structure. Both the Key and Value can 
  * be any {@code Objects}. The method throws a {@code NullPointerException} in case any of
  * the Key and Value are {@code null}. 
  * </p><p>
  * If both the Key and Value are non null objects then it will try to store the
  * pair to the structure. If the key already exists in the Store then it will
  * simply replace the Value of that Key in the Store with the new Value. If the 
  * Key is a new one then it will try to store a new entry in the Structure. 
  * </p><p>
  * When storing a new entry in the structure, it first checks the size of the 
  * Structure and if it is still less than the size with which it was initialised then 
  * it will add the Key Value pair to the Structure. In case the size is now reached 
  * the limit then the method will first remove the oldest entry from the Structure 
  * and then will add the new Key Value pair to the Store. 
  * </p>
  * 
  * @param key  Object represents the Key.
  * @param value Object represents the Value. 
  * @throws Exception 
  */
 public synchronized void addItem(K key, V value){

  if(key == null || value == null)
   throw new NullPointerException("Cannot insert a null for either key or value");

  // First see if we already have this key in our queue
  if(this.keys.contains(key)){
   // Key found. 
   // Simply replace the value in Map
   this.values.put(key, value);
  }else{
   // Key not found
   // Add value to both Queue and Map
   this.enqueue(key, value);
  }
 }

 /**
  * Returns the value to which the specified key is associated,
  * or {@code null} if this Structure contains no association for the key.
  * <p>
  * More formally, if this map contains a mapping from a key
  * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
  * key.equals(k))}, then this method returns {@code v}; otherwise
  * it returns {@code null}.  (There can be at most one such mapping.)
  * </p>
  *
  * @param key the key whose associated value is to be returned
  * @return the value to which the specified key is mapped, or
  *         {@code null} if this map contains no mapping for the key
  */
 public synchronized V getItem(K key){

  if(key==null)
   return null;

  V val = this.values.get(key);
  return val;
 }

 /**
  * Removes the mapping for a key from this Structure if it is present
  * (optional operation).   More formally, if this Structure contains a 
  * mapping from key <tt>k</tt> to value <tt>v</tt> such that
  * <code>(key==null ?  k==null : key.equals(k))</code>, that mapping
  * is removed.
  *
  * @param key key whose mapping is to be removed from the map
  */
 public synchronized void remove(K key){

  if(key == null)
   throw new NullPointerException("Cannot remove a null key");

  this.keys.remove(key);
  this.values.remove(key);
 }

 /**
  * Returns the number of elements in this collection.  
  * @return size of the structure.
  */
 public int size(){
  return this.keys.size();
 }

 /**
  * Removes all of the elements from this collection (optional operation). 
  * The collection will be empty after this method returns.
  */
 public void clear(){
  this.values.clear();
  this.keys.clear();
 }

 /*
  * Method implementing the actual logic to add 
  * the Key Value pair to the structure. 
  */
 private void enqueue(K key, V value){

  if(this.keys.size() < this.size){
   // We still have space in the queue
   // Add they entry in both queue and the Map
   if(this.keys.add(key)){
    this.values.put(key, value);
   }
  }else{
   // Queue is full. Need to remove the Head 
   // before we can add a new item.
   K old = this.keys.poll();
   if(old!=null)
    this.values.remove(old);

   // Now add the new item to both queue and the map
   this.keys.add(key);
   this.values.put(key, value);
  }
 }
}

Note here that the QueuedMap Data Structure that we created is using HashMap and LinkedList as the building blocks. Both of these DataStructures from Java Collection Framework allow null values to be stored, this is something that we have to handle ourselves. Also both of these DataStructures are not synchronized, i.e., they are not suitable for multi-threaded applications. Hence all the operational methods in this QueuedMap are marked as synchronized explicitly for thread safety.

I tried to provide some Guidelines of how we can create a new Data Structures by combining the existing functionality already provided by the Collection Framework. These are the same principles that are applied to any software development, wherever possible; reuse the existing functions, classes, methods. However, you still need to consider the pros and cons of the underlying building blocks. For example, since HashMap and LinkedList are not synchronized, we have to take care of that ourselves or alternatively use some other implementations of Queue and Map that provide thread safety.

This gives you basic building blocks to come up with your own ideas and create more complex data structures according to your requirements. One good change to this Data Structure may be to implement this as a Priority Queue where instead of removing the oldest entry, you remove the least accessed entry. The possibilities are endless.

No comments:

Post a Comment