This is an old version of the document!
Guidelines for the task on data understanding
- Data understanding (30 points)
  - Data semantics (3 points)
  - Distribution of the variables and statistics (7 points)
  - Assessing data quality (missing values, outliers) (7 points)
  - Variable transformations (6 points)
  - Pairwise correlations and possible elimination of redundant variables (7 points)
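As an illustration of the pairwise-correlation item, the following is a minimal sketch and not a prescribed method. It assumes numeric variables stored as plain Python lists, uses the Pearson coefficient, and treats an absolute correlation of 0.9 or more as "redundant". The threshold, the function names, and the toy columns are all hypothetical choices made for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def redundant_variables(columns, threshold=0.9):
    """For each highly correlated pair, keep the first variable and flag the second."""
    dropped = []
    for a, b in combinations(columns, 2):
        if a in dropped or b in dropped:
            continue  # one of the two has already been flagged
        if abs(pearson(columns[a], columns[b])) >= threshold:
            dropped.append(b)
    return dropped


# Hypothetical toy data: the same height in two units, plus an unrelated variable.
data = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [30, 22, 41, 35, 27],
}
to_drop = redundant_variables(data)
print("Candidates for elimination:", to_drop)
```

Which variable of a correlated pair to keep is a modelling decision; here the first one in dictionary order is kept only for simplicity.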
Guidelines for the task on clustering
- Clustering analysis by K-means (13 points)
  - Choice of attributes and distance function (1 point)
  - Identification of the best value of k (5 points)
  - Characterization of the obtained clusters, using both an analysis of the k centroids and a comparison of the distribution of variables within each cluster with that in the whole dataset (7 points)
- Analysis by density-based clustering (9 points)
  - Choice of attributes and distance function (2 points)
  - Study of the clustering parameters (2 points)
  - Characterization and interpretation of the obtained clusters (5 points)
- Analysis by hierarchical clustering (5 points)
  - Choice of attributes and distance function (2 points)
  - Show and discuss different dendrograms obtained with different algorithms (3 points)
- Final evaluation of the best clustering approach and comparison of the clusterings obtained (3 points)
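One common way to approach the "best value of k" item is to run K-means for several values of k and look at how the sum of squared errors (SSE) decreases. The sketch below is only an illustration under stated assumptions: it uses squared Euclidean distance, a deterministic farthest-first initialisation (chosen here so the example is reproducible, not because the guidelines require it), and hypothetical toy points.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic initialisation: start from the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=100):
    """Plain Lloyd iterations; returns (centroids, SSE)."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Hypothetical toy data: three well-separated groups of three points each.
points = [
    (1, 1), (1, 2), (2, 1),
    (10, 10), (10, 11), (11, 10),
    (20, 1), (20, 2), (21, 1),
]
sse_by_k = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(f"k={k}  SSE={sse:.2f}")
```

On data like this the SSE drops sharply up to the true number of groups and flattens afterwards; on real data the "elbow" is usually less clear and is worth cross-checking with another measure such as the silhouette coefficient.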
Guidelines for the task on Association Rules Mining
- Frequent Pattern Extraction with analysis of different values of support (12 points)
- Association Rule Extraction with analysis of different values of support and confidence (12 points)
- Discussion on the interesting rules extracted (6 points)
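To make the roles of support and confidence concrete, here is a small sketch that enumerates frequent itemsets by brute force and derives rules from them. It is not Apriori or any specific tool's algorithm; the transactions, thresholds, and function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration by increasing size; stops at the first size
    with no frequent itemset (no larger itemset can then be frequent)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
                found = True
        if not found:
            break
    return result


def association_rules(frequent, min_confidence):
    """Rules X -> Y with confidence = support(X and Y) / support(X)."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                confidence = s / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, s, confidence))
    return rules


frequent = frequent_itemsets(transactions, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.8)
for antecedent, consequent, s, c in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"support={s:.2f} confidence={c:.2f}")
```

Re-running the last lines with different `min_support` and `min_confidence` values shows how quickly the number of patterns and rules grows as the thresholds are lowered, which is the kind of analysis the first two items ask for.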
Guidelines for the task on Classification
- Learning of different decision trees (12 points)
- Decision tree validation and interpretation (12 points)
- Discussion on the best decision tree (6 points)
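The sketch below illustrates what learning and validating decision trees can look like: trees of different maximum depth are built on a training set and compared on a separate validation set. It is a simplified illustration, assuming numeric features, Gini impurity, and binary threshold splits; the data and all names are hypothetical, and a real analysis would normally rely on an established library.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature index, threshold) with the lowest weighted Gini,
    or None if no split improves on the unsplit node."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(rows, labels, max_depth):
    """A tree is either a class label (leaf) or (feature, threshold, left, right)."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree(*zip(*left), max_depth - 1),
            build_tree(*zip(*right), max_depth - 1))


def predict(tree, row):
    while isinstance(tree, tuple):  # descend until a leaf label is reached
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)


# Hypothetical data: the class depends only on the first feature.
train_X = [(2, 7), (3, 1), (4, 5), (5, 9), (6, 2), (7, 6), (8, 3), (9, 8)]
train_y = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]
valid_X = [(1, 4), (4, 8), (7, 1), (10, 5)]
valid_y = ["no", "no", "yes", "yes"]

for depth in (0, 1, 2):
    tree = build_tree(train_X, train_y, depth)
    print(depth, accuracy(tree, train_X, train_y), accuracy(tree, valid_X, valid_y))
```

Comparing training and validation accuracy across depths is one simple way to argue which tree is "best"; interpreting the chosen tree then amounts to reading its split conditions from the root down.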
dm/start/guidelines.1476713293.txt.gz · Last modified: 17/10/2016 at 14:08 (8 years ago) by Anna Monreale