This is an old version of the document!
Guidelines for the homework on data understanding
- Data semantics (4 points)
- Distribution of the variables and statistics (7 points)
- Assessing data quality (missing values + outliers) (7 points)
- Pairwise correlations (7 points)
- Presentation and profiling (5 points)
Guidelines for the task on clustering
- Clustering analysis by K-means (15 points)
  - Identification of the best value of k
  - Characterization of the obtained clusters, using both an analysis of the k centroids and a comparison of the distribution of variables within each cluster against that in the whole dataset
- Analysis by density-based clustering (10 points)
  - Study of the clustering parameters
  - Characterization and interpretation of the obtained clusters
- Analysis by hierarchical clustering (5 points)
  - Analysis to be performed on a sample of the data for scalability reasons (if necessary)
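The guidelines do not prescribe how the best value of k should be identified. One common option is to compute the sum of squared errors (SSE) for a range of candidate values and look for the point where the curve flattens. The following is a minimal sketch of that idea in plain Python. The function names (`kmeans`, `sse_curve`), the toy dataset, and the parameter defaults are all assumptions made for illustration and are not part of the course material.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (hypothetical helper).

    Returns (centroids, labels, sse).
    """
    rng = random.Random(seed)
    # Initial centroids: k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


def sse_curve(points, k_max, n_init=10):
    """For each k in 1..k_max, keep the lowest SSE over n_init restarts."""
    return {
        k: min(kmeans(points, k, seed=s)[2] for s in range(n_init))
        for k in range(1, k_max + 1)
    }


# Toy data (an assumption for illustration): three small, well-separated groups.
centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

sse_by_k = sse_curve(points, k_max=5)
for k, sse in sse_by_k.items():
    print(k, round(sse, 3))
```

The centroids returned by `kmeans` can then be compared with the per-variable means of the whole dataset as a starting point for characterizing the clusters. On a real dataset a library implementation would normally be preferable (for example scikit-learn's `KMeans`, which exposes the SSE as `inertia_`), and the SSE curve is only one of several possible criteria for choosing k.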
dm/start/guidelines.1447583176.txt.gz · Last modified: 15/11/2015 at 10:26 (9 years ago) by Anna Monreale