Differences

These are the differences between the selected revision and the current version of the page.

--- dm:start:guidelines [05/05/2017 at 08:24 (8 years ago)] – [Guidelines for the task on Association Rules Mining] Anna Monreale
+++ dm:start:guidelines [23/11/2020 at 10:34 (5 years ago)] (current version) – [Guidelines for the task on Classification] Riccardo Guidotti
@@ Line 1: / Line 1: @@
 ====== Guidelines for the task on Data Understanding ======
    * Data understanding (30 points)
-   * Data semantics (3 points)
+     - Data semantics (3 points)
-   * Distribution of the variables and statistics (7 points)
+     - Distribution of the variables and statistics (7 points)
-   * Assessing data quality (missing values, outliers) (7 points)
+     - Assessing data quality (missing values, outliers) (7 points)
-   * Variables transformations (6 points)
+     - Variables transformations (6 points)
-   * Pairwise correlations and possible elimination of redundant variables (7 points)
+     - Pairwise correlations and possible elimination of redundant variables (7 points)
@@ Line 26: / Line 26: @@
 ====== Guidelines for the task on Association Rules Mining ======
-  * Frequent patterns extraction with different values of support and different types 	(i.e. frequent, close maximal), (5 points)
+  * Frequent patterns extraction with different values of support and different types (i.e. frequent, closed, maximal) (6 points)
-  * Discussion of the most interesting frequent patterns (6 points)
+  * Discussion of the most interesting frequent patterns and analysis of how the number of patterns changes w.r.t. the min_sup parameter (7 points)
-  * Association rules extraction with different values of confidence (5 points)
+  * Association rules extraction with different values of confidence (6 points)
-  * Discussion of the most interesting rules (6 points)
+  * Discussion of the most interesting rules and analysis of how the number of rules changes w.r.t. the min_conf parameter, with histograms of the rules' confidence and lift (7 points)
-  * Use the most meaningful rules to replace missing values and evaluate the accuracy 	(4 points)
+  * Use the most meaningful rules to replace missing values and evaluate the accuracy (2 points)
-  * Use the most meaningful rules to predict if the diabetes is detected and evaluate the accuracy (4 points)
+  * Use the most meaningful rules to predict the target variable and evaluate the accuracy (2 points)
 ====== Guidelines for the task on Classification ======
-   * Learning of different decision trees with different parameters and gain formulas with the objective of maximizing performance (12 points)
+   * Learning of different decision trees/classification algorithms with different parameters and gain formulas with the objective of maximizing performance (12 points)
-   * Decision trees interpretation (6 points)
+   * Decision trees interpretation, validation with test and training set (6 points)
-   * Decision trees validation with test and training set (6 points)
+   * Training of different KNN classifiers with different parameters with the objective of maximizing performance (6 points)
    * Discussion of the best prediction model (6 points)
@@ Line 46: / Line 46: @@
    * Only PDF files are allowed; you do not have to submit Python code or KNIME workflows.
    * The final paper must be easily readable, i.e., it is better to use a font size larger than 9pt.
-   * Use a readable font size, e.g. Arial, Times New Roman
+   * Use a readable font type and size, e.g. Arial, Times New Roman
    * You can use multiple columns and change the margin size but the project must be readable.
    * It is NOT required to put Python code, KNIME flows, or theoretical descriptions of the algorithms in the final paper.