dm:start:guidelines
Differences
These are the differences between the selected revision and the current version of the page.
dm:start:guidelines [14/11/2016 at 14:13 (8 years ago)] – [Guidelines for the task on data understanding] Anna Monreale | dm:start:guidelines [03/10/2018 at 23:29 (6 years ago)] – [Guidelines for the task on Data Understanding] Anna Monreale | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Guidelines for the task on Data Understanding ====== | ====== Guidelines for the task on Data Understanding ====== | ||
* Data understanding (30 points) | * Data understanding (30 points) | ||
- | * Data semantics (3 points) | + | * * Data semantics (3 points) |
- | * Distribution of the variables and statistics (7 points) | + | * * Distribution of the variables and statistics (7 points) |
- | * Assessing data quality (missing values, outliers) (7 points) | + | * * Assessing data quality (missing values, outliers) (7 points) |
- | * Variables transformations (6 points) | + | * * Variables transformations (6 points) |
- | * Pairwise correlations and eventual elimination of redundant variables (7 points) | + | * * Pairwise correlations and eventual elimination of redundant variables (7 points) |
Line 26: | Line 26: | ||
====== Guidelines for the task on Association Rules Mining ====== | ====== Guidelines for the task on Association Rules Mining ====== | ||
- | * Frequent patterns extraction with different values of support and different types (i.e. frequent, close maximal), (5 points) | + | * Frequent patterns extraction with different values of support and different types (i.e. frequent, close, maximal), (6 points) |
- | * Discussion of the most interesting frequent patterns (6 points) | + | * Discussion of the most interesting frequent patterns |
- | * Association rules extraction with different values of confidence (5 points) | + | * Association rules extraction with different values of confidence (6 points) |
- | * Discussion of the most interesting rules (6 points) | + | * Discussion of the most interesting rules and analysis of how the number of rules changes w.r.t. the min_conf parameter, with histograms of the rules' confidence and lift (7 points) |
- | * Use the most meaningful rules to replace missing values and evaluate the accuracy | + | * Use the most meaningful rules to replace missing values and evaluate the accuracy (2 points) |
- | * Use the most meaningful rules to predict | + | * Use the most meaningful rules to predict |
====== Guidelines for the task on Classification ====== | ====== Guidelines for the task on Classification ====== | ||
- | * Learning of different decision trees with different parameters and gain formulas with the object of maximizing the performances (12 points) | + | * Learning of different decision trees/ |
* Decision trees interpretation (6 points) | * Decision trees interpretation (6 points) | ||
* Decision trees validation with test and training set (6 points) | * Decision trees validation with test and training set (6 points) | ||
Line 46: | Line 46: | ||
* Only PDF files are allowed; you do not have to submit Python code or KNIME workflows. | * Only PDF files are allowed; you do not have to submit Python code or KNIME workflows. | ||
* The final paper must be easily readable, i.e., it is better to use a font size larger than 9pt. | * The final paper must be easily readable, i.e., it is better to use a font size larger than 9pt. | ||
- | * Use a readable font size, e.g. Arial, Times New Roman | + | * Use a readable font type and size, e.g. Arial, Times New Roman |
* You can use multiple columns and change the margin size but the project must be readable. | * You can use multiple columns and change the margin size but the project must be readable. | ||
* It is NOT required to put Python code, KNIME workflows, or theoretical descriptions of the algorithms in the final paper. | * It is NOT required to put Python code, KNIME workflows, or theoretical descriptions of the algorithms in the final paper. |
dm/start/guidelines.txt · Last modified: 23/11/2020 at 10:34 (4 years ago) by Riccardo Guidotti