
This is an old revision of the document!


Guidelines for the task on data understanding

  • Data understanding (30 points)
  • Data semantics (3 points)
  • Distribution of the variables and statistics (7 points)
  • Assessing data quality (missing values, outliers) (7 points)
  • Variable transformations (6 points)
  • Pairwise correlations and possible elimination of redundant variables (7 points)
  • Total: 30 points
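
As an illustration of the last item, the following is a minimal sketch of how pairwise correlations might be screened for redundant variables. The toy columns, the helper names `pearson` and `redundant_pairs`, and the 0.9 threshold are all assumptions made for this example; they are not prescribed by the guidelines.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (a constant column has zero
    variance and would cause a division by zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for (a, xs), (b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical toy data: weight_lb is an exact rescaling of weight_kg.
columns = {
    "weight_kg": [50.0, 60.0, 70.0, 80.0],
    "weight_lb": [110.0, 132.0, 154.0, 176.0],
    "age": [30.0, 25.0, 40.0, 28.0],
}
flagged = redundant_pairs(columns)
```

In this sketch only the two weight columns are flagged, so one of them could be dropped. The threshold is a judgment call and would need to be justified for the dataset at hand.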

Guidelines for the task on clustering

  • Clustering analysis by K-means (15 points)
    • Identification of the best value of k
    • Characterization of the obtained clusters, using both an analysis of the k centroids and a comparison of the variable distributions within each cluster against those in the whole dataset
  • Analysis by density-based clustering (10 points)
    • Study of the clustering parameters
    • Characterization and interpretation of the obtained clusters
  • Analysis by hierarchical clustering (5 points)
    • Analysis to be performed on a sample of the data for scalability reasons (if necessary)

Guidelines for the task on Association Rule Mining

  • Frequent Pattern Extraction with analysis of different values of support (12 points)
  • Association Rule Extraction with analysis of different values of support and confidence (12 points)
  • Discussion on the interesting rules extracted (6 points)
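
For reference, the support and confidence measures named above can be sketched as follows, using their standard definitions. The toy transactions and the function names `support` and `confidence` are assumptions made for this example, not part of the guidelines.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical toy market-basket data.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# {bread, milk} appears in 2 of the 4 transactions.
sup = support({"bread", "milk"}, transactions)

# Of the 3 transactions containing bread, 2 also contain milk.
conf = confidence({"bread"}, {"milk"}, transactions)
```

Raising the minimum support threshold prunes rare itemsets, and raising the minimum confidence prunes weak rules, which is why the tasks above ask for an analysis across different values of both.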

Guidelines for the task on Classification

  • Learning of different decision trees (12 points)
  • Decision tree validation and interpretation (12 points)
  • Discussion on the best decision tree (6 points)
dm/start/guidelines.1476713081.txt.gz · Last modified: 17/10/2016 at 14:04 by Anna Monreale