===== SPD Project Proposals for academic year 2013-2014 =====

This SPD project for the current academic year can be selected by any of the students. As explained during the course, the project development and test is still an individual work.
 
**The description of the project and the support code can also be dowloaded as an RTF file {{:magistraleinformaticanetworking:spd:spd-13-14-project_description.rtf|here}}, and a zipped text file {{:magistraleinformaticanetworking:spd:spd-13-14-snqueens.c.zip|here}}.**

**Note (29/07/2014)** -- Additional explanations about some aspects of the project and its algorithm were given to the students in private talks in June and July. They are reported at the end of this web page, [[magistraleinformaticanetworking:spd:13-14-projectproposals#Additional explanations|here]]. 

==== Rules ====

Starting from the description of the project and from the fragments of code provided, a parallel application has to be developed and run and its results and performance must be analyzed. 
  * The parallelization has to exploit either MPI, TBB, or OpenCL. 
  * The student is required to propose an analytical model of his program behavior (e.g. expected speedup and performance) and evaluate it against the actual application.
  * The result of the project to be delivered is the program code and a short report about the program structure and its results. The report is discussed with the teacher. Note that the problem output will be checked when you deliver your project, before looking at the actual code and the report.

**Additional Points**: more parallelization strategies can usually be developed with the same framework. Comparing several solutions and studying their behavior will improve the project evaluation. Some effort in this direction is expected from all students, at least in order to motivate the choice of which solution will be implemented and run.

**Bonus points**: more than one framework can possibly be combined in the same project, or be compared with each other. This is not a requirement to pass the course or achieve full grades.

**Exception**: as reported in the course wiki main page, students can still propose their own project topic to be parallelized using one or more of the frameworks, according to previous year rules. Please read in the main page the rules that apply, this page does not deal with that case.

==== Problem description ====

Design and implement a parallel application solving a variation of the N-Queen problem.

The N-queen problem requires you to place exactly N queen pieces on an N by N chessboard in such a way that none is attacking any other one. It is a specific set-cover problem, with both heuristic and combinatorial solutions, which has been studied already (see D.E. Knuth's paper for a quick start on set coverage problems) and previously used as a test case for parallel and distributed programming frameworks (see Denis Caromel et al.)

The sequential algorithm for generating solutions is provided as a starting point. 

1) In our project we want to generate all ordinary solutions to the N-Queen problem given a specific value on N as a parameter, provided  that N<=24. Since we are interested in the combinatorial exploration of the solution space, a problem whose complexity is exponential, we have an obvious case for parallel solutions. 
No student is expected to reach or exceed the upper limit N=24 with a complete computation, but the parallel code should cope with all N values up to 24 and test results should show the gains and limits of the implemented parallelization.

2) When generating all simple solutions, we want to record them in a data structure so that they can be reused. In this phase we also want to check when a newfound solution is a derivation of a previous one (either by a 180 degrees reflection along an axis and/or by a 90 degrees rotation), and record how many derivative solutions we find in each group. 
Note that for moderately high values of N the data structure may become large enough to need its own parallel implementation (e.g. splitting across resources) .

3) The set of unique solutions generated must also be scrutinized for solutions of the amazon problem, that is, placing N pieces on an N by N chessboard where each one of the pieces can move like an amazon. An amazon is a non standard chesspiece that can move both as a queen and as a knight (see http://en.wikipedia.org/wiki/Amazon_%28chess%29). 
This means that: (a) by converse, each solution of the N amazon problems must be a solution of the N-queen problem; (b) each N-Queen unique solution where at least two queens are less than 2 steps away along rows and columns is not a solution of the amazon problem.

==== Some References ====

Donald E. Knuth, Dancing links (2000) 
downloadable from http://arxiv.org/abs/cs/0011047 
and from http://www-cs-faculty.stanford.edu/~uno/preprints.html

http://en.wikipedia.org/wiki/Queen_%28chess%29

http://en.wikipedia.org/wiki/Eight_queens_puzzle

//Other Bibliography on the problem//

Rodolfo Toledo, Eric Tanter, José Piquer, Denis Caromel, Mario Leyton, "Using Reflexd For A Grid Solution To The N-Queens Problem: A Case Study", in  Achievements in European Research on Grid Systems, Springer, 2008.
dowmloadable from http://pleiad.dcc.uchile.cl/papers/2006/toledoAl-cgiw2006.pdf

Wirth, Niklaus (1976), Algorithms + Data Structures = Programs, Prentice-Hall, ISBN 0-13-022418-9

http://www.npluskqueens.info/background.html

==== Detailed Description of the Application to build ====

=== Problem Input ===
  * the value of N, 1<N<=24.
  * the degree of parallelism to use in a specific test.
  * (possibly: other parameters specifying further the parallel for to use; this is of course implementation dependent, and is left to decide to the student)
  * a switch to optionally print all solutions/unique solutions/amazon solutions found either to standard output or to a file (only one is required)

=== Problem Output ===
  * the overall count of solutions found;
  * the overall count of unique solutions found;
  * the overall count of unique solutions which also solve the N-amazon problem

if required by the switch, all solutions found shall be output (either on the standard output or in a file), also including a counter stating for each unique solution how many times it was found. Please note that the problem output will be checked when you deliver your project.

=== Program structure ===


The program can be represented as a three stage pipeline:
  - recursively explore the set of potential solution of the N-Queen problem, and extract all solutions.
  - store all solutions in a data structure, checking for derived solutions before storing each new one (and increment the related counter if that's the case)
  - analyze all unique solutions to check if they also solve the amazon problem (and count/store them)

Each stage can be parallelized, according to farm (anonymous, load balancing) or map paradigms (geometric distribution). This is true from the high-level viewpoint: the actual parallel implementation depends on the framework adopted and is required from the student. Besides, it is a student choice the specific tradeoff between simplicity and performance of the parallelization.

The students must choose the parallel patterns and implement them, possibly merging stages or exploding them into parallel skeletons implementing the function. The high-level analysis leading to those choices shall be reported in the report about the project.

Stage 1 is the most obvious candidate or parallelization as it contains a large amount of combinatorial exploration. In order to do that, a subdivision of the search space is required, so that several tasks are generated that can be computed in parallel. The available example code allows to generate several independent tasks (distinct values in the rows array for all rows already assigned). The amount of computation and of solutions found in the subtasks is not homogeneous, you may want to study the issue before choosing the parallelization type.

Stage 2 has to be based on some dictionary structure where you can store and retrieve board positions quickly, in order to check for each new board that it is present in the past output. 
It is up to the student choose a method and a data structure, as well as to code the simple functions that compute the reflection and rotations of a board. When designing the structure take into account the number of unique solutions that you are going to find for 8<N<20, for instance. That number is easily found on the WWW.

Stage 3 is easier to code, as it simply has to check that no two queens have both coordinates which differ by 1 or 2 (and one of those two cases must actually never happen!).

**Available code**

The example code for the N-queen problem is made available in C, to ease the porting to MPI, TBB or OpenCL kernels.

<code C>
#include <stdio.h>

/*********
 * reference sequential code for the N Queen problem, SPD course
 * 2013-2014, Computer Science and Networking, University of Pisa.
 *
 * The code is a derivative work of material from wikibooks.org, and
 * as such the Creative Commons Attribution-ShareAlike License applies
 * to this code. M. Coppola
 */


/** global definitions ******************************************************/
 
// largest board size
#define MAXBOARDSIZE 24

/*********
 * levels we expand the problem of before calling the solver
 * (i.e. before spawning parallel work!)
 */
#define EXPANDLEVELS 3

/********
 * this type can be used if we need to reliably copy information about
 * a set of subproblem across parallel instances, regardless of the
 * program parameters; you will likely need to define functions that
 * copy/transfer this kind of structures across processes, threads or
 * larger arrays in order to match your parallelizatin framework of
 * choice
*/
typedef struct boardplus{
  int size; // board size
  int x; // this is where we need to restart
  int y; // idem
  int board[MAXBOARDSIZE]; // the actual board, padded to largest instance
} subproblem;

/********
 * static variable defining the board size; you may want to make it an
 * explicit parameter in your code like it's done in function main()
*/
static int board_size = 8;

/********
 * static variable counting solutions found; in a parallel settings,
 * you will need to devise a different solution
*/
static int solution_count = 0;

/* count the subproblems spawned from the expander function */
static int spawned_tasks=0;

/** utility functions ******************************************************/

/********
 * human-readable output; you may need to rewrite it in order to make
 * it really flexible */
void putboard(int rows[board_size])  
{
    int x, y;
    printf("\nSolution #%d:\n------------------------------------------------------------------\n", ++solution_count);
    for (y=0; y < board_size; ++y) {
        for (x=0; x < board_size; ++x)
            printf(x == rows[y] ? "| Q " : "|   ");
        printf("|\n------------------------------------------------------------------\n");
    }
}

// simpler printout suitable for automatic analysis
void dump_board(int rows[board_size])
{
  int y;
  printf("%d", ++solution_count);
  for (y=0; y<board_size; ++y)
    printf("\t%d",rows[y]);
  printf("\n");
}

/** computing functions ******************************************************/

/*******
 * check that a board position is safe so far; we only need to check
 * columns and diagonals, as rows are mutually safe by construction */
int is_safe(int rows[board_size], int x, int y)  
{
    int i;
    if (y == 0)
            return 1;
    for (i=0; i < y; ++i) {
       if (rows[i] == x || rows[i] == x + y - i || rows[i] == x - y +i)
            return 0;
    } 
    return 1;
}

/******* this routine fully expands a sub-instance of the problem
 * depth first, generating the corresponding set of solutions; it CAN
 * be used on the whole problem directly, if it is small enough to be
 * dealt with sequentially */
void n_queens_solver(int rows[board_size], int y) 
{
    int x;
 
    for (x=0; x < board_size; ++x) {
        if (is_safe(rows, x, y)) {
            rows[y] = x;
            if (y == board_size-1) {
	      //            putboard(rows);
	      dump_board(rows);
	    }
            else
              n_queens_solver(rows, y+1);
        }
    }
}
 
/******** this routine partially expands the problem by running the
 * combinatorial search only for the first levels; everything deeper
 * than that is passed to the solver routine, which is actually where
 * you can plug in some parallelism 
*/
void n_queens_expander(int rows[board_size], int y, int expand_levels)
{
  int x;
  
  for (x=0; x < board_size; ++x) {
    if (is_safe(rows, x, y)) // early check pruning dead branches
      {
	rows[y] = x;
	if (y == expand_levels-1) {
	  printf("New subtask %d\n", ++spawned_tasks);
	  /*******
	   * here you can spawn parallel work by filling in a
	   * supbproblem structure wit the available rows, x, y and
	   * board_size
	   */
	  n_queens_solver(rows, y+1);
	}
	else
	  n_queens_expander(rows, y+1, expand_levels);
      }
  }
}

/** main program ******************************************************/
/*******
 * Note: this very simple version of the algorithm always prints all
 * solutions. You will need to add checks to make this optional */
int main(int argc, char **argv)
{
  int rows[board_size]; // the main board variable
  int temp;
  
  if (argc ==2) {
    if (temp=atoi(argv[1]), temp >EXPANDLEVELS && temp <=MAXBOARDSIZE)
      board_size = temp;
      else {
	printf("Wrong Size parameter: %d\n",temp); 
	return -1;
      }
  }

  //  n_queens_solver(rows, 0);
  /******
   *  expand three levels, then call the full solver routine; note
   *  that this value is one of the parameters to be tuned when
   *  parallelizing 
   */
  n_queens_expander(rows, 0, 3); 

  printf("Found %d solutions over % d taks/subproblems\n",solution_count,spawned_tasks);
  return 0;
}
</code>

==== Additional explanations ====
The following should be obvious but it was asked recently.

  - Students may optimize or replace the sequential code, but are not strictly required to do that. On the one hand, a strongly optimized sequential code will dwarven the parallel speedup of your solution, and spending most of your time in this will not address the key issue and will not raise your evaluation. On the other hand, depending on the parallel solution and technology, some optimizations may be unavoidable in order to achieve a good speedup or obey the limits of your computing platform (memory size and/or bandwidth, parallelism degree, acceptable computation time, degree of software lockout or computation/communication superposition). Recognizing which optimizations are necessary and which ones are not will simplify the development work.
  - The problem is naturally divided in three stages, from an abstract point of view. It is up to the student to decide if a different decomposition of the work is actually more suited to their technology. The student is expected to be able to explain and discuss his/her choices and motivate them, both in the case they match the abstract problem description and in the case they depart from it. The key question to ask yourself should sound like "would it be possible, sticking to the same platform and framework, to enhance the parallelisation of the problem by changing the parallel structure adopted?"
  - Unique solutions (also called fundamental solutions) are those which remain after we identify those which happen to be mirrored or rotated of each other. Of course you can choose a representative in each equivalence class in the set of solutionns, then you can only store that representative in the dictionary structure of stage 2 of the problem. Choosing wisely what is the representative and how to find it greatly simplifies the implementation.
  - Explaining you choices in terms of measured features of the evolving code DOES give extra point. That is, tracing code behaviour in different conditions and for specific subproblems in terms of observable features like memory size, number of recursive calls, achieved load balancing and so on will help you achieve a better solution and be able to analyze it.