====== SPD 2015-2016 Project Work Information ======

The project work for this year is the (re)implementation and test of an existing clustering algorithm using one of the MPI, TBB or OpenCL technologies (possibly more than one, in some cases). To this end, a few papers describing parallel clustering algorithms are listed below; the student should analyze the chosen algorithm and implement it. In a few cases a parallel implementation already exists, which the student can attempt to re-implement using a different technology.

====What is expected from the project works====

The proposed implementation
  * must have a well recognizable, explicit parallel structure
  * must have an (expected) parallel performance model
  * must be verified with experiments
  * must be accompanied by a short report explaining the problem tackled, the choices made and the performance model; the report shall contain the experiment results and their analysis.

The student
  * should be able to understand the parallel algorithm and implement/port it with the chosen support
  * can exploit libraries and data structures as needed
  * should be able to verify the program and its model against experimental data, identifying performance and scalability issues

Please contact the teacher:
  * in case you are stuck or miss datasets to test your program with;
  * if you need computing resources for running the experiments;
  * if you have a different proposal for your project.

//As stated many times before, obtaining the fastest program is not the absolute goal in this case. Being able to use a technology and understand its issues is just as important. Try to keep the balance.//

====Links to algorithms and papers====

  * The Center for Ultra-Scale Computing at Northwestern University has a page with references to parallel versions of the DBSCAN and OPTICS clustering algorithms (and PINK, a hierarchical clustering algorithm)\\ [[http://cucis.ece.northwestern.edu/projects/Clustering/index.html]] \\ Both MPI and OpenMP versions are provided there as source code, together with test data; start from the referenced papers and study the code in order to design a parallel version using TBB or OpenCL.
  * The paper [[http://www.hindawi.com/journals/ddns/2015/793010/|An Efficient MapReduce-Based Parallel Clustering Algorithm for Distributed Traffic Subarea Division]]\\ describes a map-reduce parallelization of K-means. Starting from this structured approach, it is possible to write a parallel implementation of K-means in MPI and TBB (a minimal sketch of the core step is given after this list).
  * The paper [[http://www.sciencedirect.com/science/article/pii/S1877050913003438|G-DBSCAN]] describes a parallel GPU-based DBSCAN implementation.
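
As a rough illustration of the map-reduce structure mentioned for the K-means paper above, here is a minimal sketch (not taken from the referenced paper) of one K-means iteration written with TBB's ''parallel_reduce''. The row-major data layout and the ''Partial'' accumulator type are assumptions made for this example only; the centroid update and the MPI/OpenCL counterparts are left to the student.

<code cpp>
// Sketch of one K-means iteration with TBB: each chunk of points is
// assigned to its nearest centroid (map), then per-chunk partial sums
// and counts are merged (reduce). Layout is an assumption: points and
// centroids are row-major arrays of size n*dim and k*dim.
#include <tbb/parallel_reduce.h>
#include <tbb/blocked_range.h>
#include <vector>
#include <limits>
#include <cstddef>

struct Partial {
    std::vector<double> sum;    // per-cluster coordinate sums (k * dim)
    std::vector<size_t> count;  // per-cluster point counts
    Partial(size_t k, size_t dim) : sum(k * dim, 0.0), count(k, 0) {}
};

Partial kmeans_step(const std::vector<double>& points,
                    const std::vector<double>& centroids,
                    size_t n, size_t k, size_t dim)
{
    return tbb::parallel_reduce(
        tbb::blocked_range<size_t>(0, n),
        Partial(k, dim),
        // "map" phase: assign each point of the chunk to its closest centroid
        [&](const tbb::blocked_range<size_t>& r, Partial acc) {
            for (size_t i = r.begin(); i != r.end(); ++i) {
                size_t best = 0;
                double best_d = std::numeric_limits<double>::max();
                for (size_t c = 0; c < k; ++c) {
                    double d = 0.0;
                    for (size_t j = 0; j < dim; ++j) {
                        double diff = points[i*dim + j] - centroids[c*dim + j];
                        d += diff * diff;
                    }
                    if (d < best_d) { best_d = d; best = c; }
                }
                for (size_t j = 0; j < dim; ++j)
                    acc.sum[best*dim + j] += points[i*dim + j];
                acc.count[best]++;
            }
            return acc;
        },
        // "reduce" phase: merge partial results coming from different chunks
        [](Partial a, const Partial& b) {
            for (size_t x = 0; x < a.sum.size(); ++x)   a.sum[x]   += b.sum[x];
            for (size_t c = 0; c < a.count.size(); ++c) a.count[c] += b.count[c];
            return a;
        });
    // new centroids are obtained by dividing each sum by its count (omitted)
}
</code>

The same map/reduce split carries over to the other technologies: with MPI each process computes local partial sums that are then combined (e.g. with ''MPI_Allreduce''), while with OpenCL the assignment step becomes a kernel over the points followed by a reduction.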