dm:start:guidelines
Differences
These are the differences between the selected revision and the current version of the page.
| dm:start:guidelines [29/09/2017 at 09:56 (8 years ago)] – [Guidelines for the Project] Anna Monreale | dm:start:guidelines [23/11/2020 at 10:34 (5 years ago)] (current version) – [Guidelines for the task on Classification] Riccardo Guidotti | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== Guidelines for the task on Data Understanding ====== | ====== Guidelines for the task on Data Understanding ====== | ||
| * Data understanding (30 points) | * Data understanding (30 points) | ||
| - | | + | |
| - | * Distribution of the variables and statistics (7 points) | + | - Distribution of the variables and statistics (7 points) |
| - | * Assessing data quality (missing values, outliers) (7 points) | + | - Assessing data quality (missing values, outliers) (7 points) |
| - | * Variables transformations (6 points) | + | - Variables transformations (6 points) |
| - | * Pairwise correlations and eventual elimination of redundant variables (7 points) | + | - Pairwise correlations and eventual elimination of redundant variables (7 points) |
| Line 26: | Line 26: | ||
| ====== Guidelines for the task on Association Rules Mining ====== | ====== Guidelines for the task on Association Rules Mining ====== | ||
| - | * Frequent patterns extraction with different values of support and different types (i.e. frequent, close maximal), (5 points) | + | * Frequent patterns extraction with different values of support and different types (i.e. frequent, close, maximal), (6 points) |
| - | * Discussion of the most interesting frequent patterns (6 points) | + | * Discussion of the most interesting frequent patterns |
| - | * Association rules extraction with different values of confidence (5 points) | + | * Association rules extraction with different values of confidence (6 points) |
| - | * Discussion of the most interesting rules (6 points) | + | * Discussion of the most interesting rules and analysis of how the number of rules changes w.r.t. the min_conf parameter, with histograms of the rules' confidence and lift (7 points) |
| - | * Use the most meaningful rules to replace missing values and evaluate the accuracy | + | * Use the most meaningful rules to replace missing values and evaluate the accuracy (2 points) |
| - | * Use the most meaningful rules to predict | + | * Use the most meaningful rules to predict the target variable |
| ====== Guidelines for the task on Classification ====== | ====== Guidelines for the task on Classification ====== | ||
| - | * Learning of different decision trees with different parameters and gain formulas with the object of maximizing the performances (12 points) | + | * Learning of different decision trees/ |
| - | * Decision trees interpretation (6 points) | + | * Decision trees interpretation, validation with test and training set (6 points) |
| - | | + | |
| * Discussion of the best prediction model (6 points) | * Discussion of the best prediction model (6 points) | ||
dm/start/guidelines.1506678960.txt.gz · Last modified: 29/09/2017 at 09:56 (8 years ago) by Anna Monreale
