This is an old revision of the document!


Guidelines for the task on data understanding

  • Data understanding (30 points)
  • Data semantics (3 points)
  • Distribution of the variables and statistics (7 points)
  • Assessing data quality (missing values, outliers) (7 points)
  • Variables transformations (6 points)
  • Pairwise correlations and possible elimination of redundant variables (7 points)
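
The data-quality and correlation items above could be approached along the following lines. This is only a minimal sketch using the Python standard library: the columns `ages`, `height_cm` and `height_in`, the 1.5 * IQR outlier rule and the 0.95 redundancy threshold are all illustrative assumptions, not requirements of the task.

```python
import math
import statistics

# Hypothetical toy column: one missing value (None) and one extreme value.
ages = [23, 25, 24, None, 26, 22, 95]

# Data quality: count missing values, then work on the observed ones only.
n_missing = sum(v is None for v in ages)
observed = [v for v in ages if v is not None]

# Outliers by the 1.5 * IQR rule (one common convention among several).
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
outliers = [v for v in observed if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Two hypothetical variables carrying the same information in different units.
height_cm = [150, 160, 170, 180]
height_in = [h / 2.54 for h in height_cm]

REDUNDANCY_THRESHOLD = 0.95  # assumed cut-off, to be justified in the report
r = pearson(height_cm, height_in)
is_redundant = abs(r) > REDUNDANCY_THRESHOLD
```

On a real dataset the same checks would be run per column and per pair of columns, and any variable dropped as redundant should be motivated by its semantics as well as by the coefficient.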

Guidelines for the task on clustering

  • Clustering Analysis by K-means: (13 points)
    1. Choice of attributes and distance function (1 point)
    2. Identification of the best value of k (5 points)
    3. Characterization of the obtained clusters using both an analysis of the k centroids and a comparison of the distribution of variables within the clusters with that in the whole dataset (7 points)
  • Analysis by density-based clustering (9 points)
    1. Choice of attributes and distance function (2 points)
    2. Study of the clustering parameters (2 points)
    3. Characterization and interpretation of the obtained clusters (5 points)
  • Analysis by hierarchical clustering (5 points)
    1. Choice of attributes and distance function (2 points)
    2. Show and discuss different dendrograms obtained with different algorithms (3 points)
  • Final evaluation of the best clustering approach and comparison of the clusterings obtained (3 points)
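
One possible way to study the best value of k for K-means is to compare the sum of squared errors (SSE) across several values of k and look for the point where the improvement flattens out. The sketch below is a small pure-Python version of Lloyd's algorithm on hypothetical 2-D points; the evenly spaced initial centroids are an assumption made for reproducibility, whereas in practice a library implementation with several random restarts would normally be used.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(members):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(members) for coord in zip(*members))


def kmeans_sse(points, k, iters=20):
    """Run a basic K-means and return the final sum of squared errors."""
    n = len(points)
    # Deterministic, evenly spaced initial centroids (illustrative assumption).
    centroids = [points[i * n // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            mean_point(members) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Hypothetical data: three well-separated groups of three points each.
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]

sse = {k: kmeans_sse(points, k) for k in range(1, 5)}
```

On this toy data the SSE drops sharply up to k = 3 and only marginally afterwards, which is the kind of evidence the discussion of k is expected to present, possibly alongside other measures such as the silhouette coefficient.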

Guidelines for the task on Association Rules Mining

  • Frequent pattern extraction with different values of support and different types (i.e., frequent, closed, maximal) (6 points)
  • Discussion of the most interesting frequent patterns (7 points)
  • Association rules extraction with different values of confidence (6 points)
  • Discussion of the most interesting rules (7 points)
  • Use the most meaningful rule to replace missing values and evaluate the accuracy (4 points)
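
To make the three pattern types and rule confidence concrete, here is a brute-force sketch on a hypothetical four-transaction basket. The transactions and the absolute support threshold of 2 are illustrative assumptions, and exhaustive enumeration only works for tiny examples; a real analysis would rely on an Apriori or FP-Growth implementation.

```python
from itertools import combinations

# Hypothetical transactions; each one is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
min_support = 2  # absolute count, assumed for illustration


def support(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions)


# Frequent itemsets: enumerate every candidate and keep those above threshold.
items = sorted(set().union(*transactions))
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        candidate = frozenset(combo)
        count = support(candidate)
        if count >= min_support:
            frequent[candidate] = count

# Closed: no proper superset has the same support.
closed = {
    i for i, s in frequent.items()
    if not any(i < j and frequent[j] == s for j in frequent)
}

# Maximal: no proper superset is frequent at all.
maximal = {i for i in frequent if not any(i < j for j in frequent)}


def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)


conf_butter_bread = confidence(frozenset({"butter"}), frozenset({"bread"}))
conf_bread_milk = confidence(frozenset({"bread"}), frozenset({"milk"}))
```

Every maximal itemset is also closed, and every closed itemset is frequent, so the three sets shrink in that order; raising `min_support` or the confidence threshold shows how the number of patterns and rules changes.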

Guidelines for the task on Classification

  • Learning of different decision trees with different parameters and gain formulas, with the objective of maximizing performance (12 points)
  • Decision tree interpretation (6 points)
  • Decision tree validation with training and test sets (6 points)
  • Discussion of the best prediction model (6 points)
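
As a reminder of what "gain formulas" can refer to, the sketch below computes the impurity reduction of a single candidate split under two common criteria, entropy and the Gini index. The class labels and the split are hypothetical, and treating these two criteria as the intended ones is an assumption; the tool used for the project may offer others.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gain(impurity, parent, children):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical node with 5 "yes" and 5 "no" records, split into two children.
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4
parent = left + right

info_gain = gain(entropy, parent, [left, right])
gini_gain = gain(gini, parent, [left, right])
```

The two criteria usually rank splits similarly but not identically, which is one reason to train trees with both and compare them on a held-out test set rather than on the training data alone.
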
dm/start/guidelines.1476713372.txt.gz · Last modified: 17/10/2016 at 14:09 (8 years ago) by Anna Monreale