This is an old version of the document!


Guidelines for the homework on data understanding

  • Data semantics (4 points)
  • Distribution of the variables and statistics (7 points)
  • Assessing data quality (missing values + outliers) (7 points)
  • Pairwise correlations (7 points)
  • Presentation and profiling (5 points)
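
As an illustration of the data quality and correlation items above, the following is a minimal sketch using only the Python standard library. The column `age`, the helper names, and the 1.5 × IQR outlier rule are assumptions made for this example; the guidelines do not prescribe a specific method or tool.

```python
import math
import statistics


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def iqr_outliers(values, k=1.5):
    """Return the non-missing values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    clean = [v for v in values if not is_missing(v)]
    q1, _, q3 = statistics.quantiles(clean, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in clean if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation of two equally long, complete numeric columns."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy column with one missing value and one extreme value.
age = [10, 12, 11, 13, 12, 11, 10, 95, None]
n_missing = sum(is_missing(v) for v in age)
outliers = iqr_outliers(age)

# Hypothetical pair of perfectly linearly related columns.
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
```

On a real dataset the same checks would typically be run per variable, and the correlation computed for every pair of numeric variables.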

Guidelines for the task on clustering

  • Clustering analysis by K-means (15 points)
    • Identification of the best value of k
    • Characterization of the obtained clusters, using both an analysis of the k centroids and a comparison of the distribution of the variables within each cluster with their distribution in the whole dataset
  • Analysis by density-based clustering (10 points)
    • Study of the clustering parameters
    • Characterization and interpretation of the obtained clusters
  • Analysis by hierarchical clustering (5 points)
    • Analysis to be performed on a sample of the data for scalability reasons (if necessary)
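
One common way to approach the identification of the best value of k is to run K-means for several values of k and compare the sum of squared errors (SSE), looking for the point where the decrease flattens. The sketch below is a plain-Python version of Lloyd's algorithm written for illustration only; the function names, the toy points, and the choice of SSE as the criterion are assumptions, not requirements of the guidelines. The returned centroids can also be inspected to characterize each cluster.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, sse)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


# Hypothetical toy data: two well-separated groups of four points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

# SSE for each candidate k; the "elbow" of this curve suggests a value of k.
sse_by_k = {k: kmeans(points, k)[2] for k in range(1, 5)}

centroids, labels, _ = kmeans(points, 2)
```

On this toy data the SSE drops sharply from k = 1 to k = 2 and only slightly afterwards, which is the pattern the elbow heuristic looks for. Other criteria, such as the silhouette coefficient, could be used instead.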
dm/start/guidelines.1447583176.txt.gz · Last modified: 15/11/2015 at 10:26 (6 years ago) by Anna Monreale