Friday, April 17, 2015

Concurrent Data Structures

For the past 6 weeks I've been on a different schedule due to training at work that has unfortunately restricted much of my time that can be devoted towards programming. Although I've been busy with work, that doesn't mean I haven't had any free time left! On my days off I've been studying concurrency in a much more in-depth manner and even wrote a basic concurrent data structures library. It's published on github and supports a concurrent stack, queue, and a third data structure similar to stack and queue (in that it supports push and pop) but the pop order is not guaranteed (it must be assumed that the pop order can be random).

 #include "ccl.hpp"  
 #include <iostream>  
   
 int main() {  
   std::cout << "Starting concurrent data pool test" << std::endl;  
   ccl::data_pool<std::string> pool;  
   
   pool.push("Hello!");  
   
   std::string popped_value = "Empty";  
   if(pool.try_pop(popped_value)) {  
     std::cout << "Popped value: " << popped_value << std::endl;  
   } else {  
     std::cout << "Value did not pop, variable is still: " << popped_value << std::endl;  
   }  
 }  

For the concurrent stack and queue, I used a technique called "Flat-Combining" to maintain thread-safety when accessing the containers. The idea behind flat-combining is that each thread gets access to a special record that they can publish their desired action to. They then wait for a response on that record indicating that the action has been processed on the data structure. A global lock exists that limits only 1 thread to be able to process the requests of the other threads. If the lock is available, any thread that is waiting for a response will try to take the lock and take it upon themselves to publish the actions of all the threads to the data structure. I found this data structure to not only be more performant but also much easier to implement compared to a traditional lock-free data structure.

For the last data structure (which I called "data_pool" since I'm not sure what the right name is for it), it is what I consider more concurrent friendly because it compromises by allowing any order for data popping. Data is stored in vectors of successively growing size (larger vectors are added as more data is pushed to an already full data structure), which helps with spatial locality, and since the the same vectors are used repeatedly, temporal locality is improved (instead of allocating a new node on the heap every time a value is pushed).

If interested in checking out the source code and some examples, the project's page can be found at: https://github.com/Salgat/Concurrent-Containers-Library. It should be noted that these are naive implementations and are neither guaranteed nor fully tested.

No comments: