
This is an old version of the document!

Contents

Guidelines for the task on data understanding
Guidelines for the task on clustering
Guidelines for the task on Association Rules Mining
Guidelines for the task on Classification

Guidelines for the task on data understanding

Data understanding (30 points)
Data semantics (3 points)
Distribution of the variables and statistics (7 points)
Assessing data quality (missing values, outliers) (7 points)
Variables transformations (6 points)
Pairwise correlations and possible elimination of redundant variables (7 points)
Total: 30 points
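
As an illustration of the data quality and correlation items above, here is a minimal sketch using only the Python standard library. The guidelines do not prescribe any particular method: the 1.5 x IQR rule, the 0.9 correlation threshold, the function names and the toy columns are all assumptions made for this example.

```python
import math
import statistics

def missing_count(values):
    """Number of missing entries, assuming missing values are stored as None."""
    return sum(v is None for v in values)

def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule of thumb)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def redundant_pairs(columns, threshold=0.9):
    """Pairs of columns whose absolute correlation reaches the threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs

# Hypothetical toy dataset: height_in is (almost) a rescaled copy of height_cm.
columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [30, 25, 41, 22, 35],
}

print(missing_count([3.2, None, 4.1, None]))
print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
for a, b, r in redundant_pairs(columns):
    print(f"{a} and {b} are candidates for elimination (r = {r:.3f})")
```

Whether a flagged variable is actually dropped remains a judgement call that should be motivated in the report.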

Guidelines for the task on clustering

Clustering analysis by K-means (15 points)
- Identification of the best value of k
- Characterization of the obtained clusters through both an analysis of the k centroids and a comparison of the distribution of the variables within each cluster with their distribution in the whole dataset
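
The two K-means items above can be sketched as follows. This is a plain-Python illustration, not a required implementation: it runs Lloyd's iterations with a deterministic farthest-first initialisation (chosen here only to keep the example reproducible), reports the sum of squared errors (SSE) for several values of k, and compares each centroid with the overall mean. The helper names and the toy points are assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic initialisation: start from the first point, then keep
    adding the point that is farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        for p in points
    ]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels, sse)."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[i])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse

# Hypothetical toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]

# Identification of k: look for the "elbow" where the SSE stops dropping sharply.
sse_by_k = {k: kmeans(points, k)[2] for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(f"k={k}  SSE={sse:.2f}")

# Characterization: compare each centroid with the mean of the whole dataset.
overall = tuple(sum(x) / len(points) for x in zip(*points))
centroids, labels, _ = kmeans(points, 3)
for i, c in enumerate(centroids):
    print(f"cluster {i}: size={labels.count(i)} centroid={c} overall mean={overall}")
```

On real data the SSE curve is usually read together with another indicator such as the silhouette score, since the elbow is often less sharp than in this toy case.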

Analysis by density-based clustering (10 points)
- Study of the clustering parameters
- Characterization and interpretation of the obtained clusters
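
One common heuristic for the study of the parameters of a density-based algorithm such as DBSCAN is the sorted k-distance curve: the distance of every point from its k-th nearest neighbour, in increasing order. A sharp rise in the curve suggests a candidate value for the neighbourhood radius. The sketch below only computes the curve; the function name, the choice of k and the toy points are assumptions, and the guidelines do not mandate this heuristic.

```python
import math

def k_distances(points, k):
    """Sorted distance of each point from its k-th nearest neighbour."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Hypothetical toy data: three points close together and one isolated point.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]

for k in (1, 2):
    print(k, k_distances(points, k))
```

Here the last value of each curve is much larger than the others, which marks the isolated point as a likely noise point for any radius chosen below that jump.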

Analysis by hierarchical clustering (5 points)
- Analysis to be performed on a sample of the data for scalability reasons (if necessary)
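
The sampling step matters because a naive agglomerative procedure compares all pairs of clusters at every merge. A minimal sketch, assuming a uniform random sample and a straightforward (slow) implementation in which the linkage criterion is passed as a function (min for single linkage, max for complete linkage); all names, the sample size and the toy data are hypothetical.

```python
import math
import random

def agglomerative(points, n_clusters, linkage=min):
    """Naive agglomerative clustering: repeatedly merge the two closest
    clusters until n_clusters remain. linkage=min gives single linkage,
    linkage=max gives complete linkage."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(
                    math.dist(a, b) for a in clusters[i] for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters

# Hypothetical toy data: two 3x3 grids far apart from each other.
data = [(x, y) for x in range(3) for y in range(3)]
data += [(x + 20, y + 20) for x in range(3) for y in range(3)]

# Work on a random sample instead of the full dataset (fixed seed for repeatability).
sample = random.Random(42).sample(data, 8)
for cluster in agglomerative(sample, 2, linkage=max):
    print(sorted(cluster))
```

When sampling is used, it is worth checking that the result is stable across a few different samples.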

Guidelines for the task on Association Rules Mining

Frequent Pattern Extraction with analysis of different values of support (12 points)
Association Rule Extraction with analysis of different values of support and confidence (12 points)
Discussion on the interesting rules extracted (6 points)
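
To make the role of the two thresholds concrete, the sketch below enumerates frequent itemsets level by level and derives rules from them. It is a brute-force enumeration rather than an optimised algorithm such as Apriori or FP-Growth, and the transactions, thresholds and function names are assumptions made for the example.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Map each frequent itemset to its support (brute-force enumeration)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
                found = True
        if not found:
            break  # no frequent itemset of this size, so none of a larger size
    return result

def association_rules(freq, min_confidence):
    """Rules (lhs, rhs, support, confidence) derived from frequent itemsets."""
    rules = []
    for itemset, s in freq.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = s / freq[lhs]  # every subset of a frequent itemset is frequent
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, s, confidence))
    return rules

# Hypothetical market-basket transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers"}),
    frozenset({"bread", "milk", "beer"}),
]

# Analysis of different support values: lower thresholds yield more patterns.
for min_sup in (0.6, 0.4):
    print(min_sup, len(frequent_itemsets(transactions, min_sup)))

for lhs, rhs, s, c in association_rules(frequent_itemsets(transactions, 0.6), 0.7):
    print(f"{sorted(lhs)} -> {sorted(rhs)}  support={s:.2f} confidence={c:.2f}")
```

Repeating the extraction over a grid of support and confidence values, and reporting how the number of patterns and rules changes, is one way to document the analysis requested above.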

Guidelines for the task on Classification

Learning of different decision trees (12 points)
Decision tree validation and interpretation (12 points)
Discussion on the best decision tree (6 points)
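
The three classification items can be illustrated with a small greedy tree learner that uses Gini impurity, trained with different maximum depths and validated on a held-out set. This is a simplified sketch written for this page, not the procedure of any specific library; the function names, the toy training and test sets and the choice of depth as the parameter to vary are assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Best (weighted impurity, feature, threshold) split, or None."""
    best = None
    for f in range(len(rows[0])):
        for thr in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best

def build_tree(rows, labels, max_depth):
    """A leaf is a class label (must not be a tuple); an inner node is
    a tuple (feature, threshold, left_subtree, right_subtree)."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, thr = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > thr]
    return (f, thr,
            build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
            build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1))

def predict(tree, row):
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree

def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)

# Hypothetical toy data with two numeric features and three classes.
train_x = [(1, 1), (2, 6), (3, 3), (5, 1), (6, 2), (7, 3), (5, 5), (6, 6), (7, 7)]
train_y = ["low", "low", "low", "mid", "mid", "mid", "high", "high", "high"]
test_x = [(2, 2), (1, 7), (6, 1), (5, 3), (6, 5), (7, 6)]
test_y = ["low", "low", "mid", "mid", "high", "high"]

# Learn trees of different depth and validate each one on the held-out set.
for depth in (0, 1, 2):
    tree = build_tree(train_x, train_y, depth)
    print(depth, tree,
          f"train={accuracy(tree, train_x, train_y):.2f}",
          f"test={accuracy(tree, test_x, test_y):.2f}")
```

Printing the tree next to its training and test accuracy supports both the interpretation of the splits and the discussion of which tree to prefer.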

dm/start/guidelines.1476713081.txt.gz · Last modified: 17/10/2016 at 14:04 by Anna Monreale