dm:start:clustering
Differences
These are the differences between the selected revision and the current version of the page.
| dm:start:clustering [18/12/2012 at 14:15 (13 years ago)] – [Guidelines for the homework on clustering] Fosca Giannotti | dm:start:clustering [18/12/2012 at 14:20 (13 years ago)] (current version) – [Guidelines for the homework on clustering] Fosca Giannotti | ||
|---|---|---|---|
| Line 18: | Line 18: | ||
| * **Analysis by hierarchical clustering (Optional - 3 points)** | * **Analysis by hierarchical clustering (Optional - 3 points)** | ||
| * Analysis to be performed on a sampling of the data for scalability reasons | * Analysis to be performed on a sampling of the data for scalability reasons | ||
| + | |||
| + | |||
| + | ====== Description of the variables ====== | ||
| + | |||
| + | For each car driver we observe the following quantities, measured over a certain time window of mobile activity: | ||
| + | |||
| + | Length = total traveled distance (m.) | ||
| + | Duration = total time spent driving (sec.) | ||
| + | Count = number of different trips | ||
| + | Phighway = distance traveled on highways (m.) | ||
| + | Pcity = distance traveled inside cities (m.) | ||
| + | Length_arc_crowded = distance traveled on the 20% most crowded roads (m.) | ||
| + | Pnight = distance traveled at night time (m.) | ||
| + | Pover = distance traveled over the speed limit (m.) | ||
| + | Profile = number of systematic trips, e.g., work-home | ||
| + | Radius_g = radius of gyration: dispersion of the driver's locations around their center of mass (mean position) | ||
| + | Radius_g_L1 = radius of gyration w.r.t. L1: sparsity of location from the driver' | ||
| + | Avg_Dist_L1 = average distance from L1: average distance from the driver' | ||
| + | TimeL1L2 = % time spent at locations L1 and L2 (most and second most preferred locations) | ||
| + | EntropyArc = entropy on road segment frequencies, | ||
| + | EntropyLocation = entropy on location frequencies, | ||
| + | EntropyTime = entropy on hours of the day, measures the diversity of daily patterns | ||
| + | |||
| + | Notice that there are no missing values in the dataset, hence " | ||
| + | |||
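
The radius of gyration and entropy variables described above can be illustrated with a minimal Python sketch. The page does not give the exact formulas used to build the dataset, so this assumes the usual definitions: the radius of gyration as the root-mean-square distance of the visited positions from a center, and the entropy as the Shannon entropy (base 2) of the observed frequencies. The function names, the planar `(x, y)` coordinates in metres, and the sample data are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def radius_of_gyration(points, center=None):
    """Root-mean-square distance of the points from a center.

    points: list of (x, y) positions in metres (assumed planar coordinates).
    center: reference point; if omitted, the center of mass (mean position)
            is used, as for Radius_g. Passing the position of L1 instead
            gives a quantity in the spirit of Radius_g_L1 (assumption).
    """
    if not points:
        raise ValueError("points must not be empty")
    if center is None:
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
    else:
        cx, cy = center
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points)
    return math.sqrt(mean_sq)


def shannon_entropy(observations):
    """Shannon entropy (in bits) of the empirical frequencies.

    observations: iterable of hashable labels, e.g. road segment ids,
    location ids, or hours of the day.
    """
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("observations must not be empty")
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical driver who visits the four corners of a 4 m square.
visits = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
rg_mean = radius_of_gyration(visits)                   # around the mean position
rg_l1 = radius_of_gyration(visits, center=(0.0, 0.0))  # around a chosen location

# Hypothetical trip start hours: two hours, equally frequent.
hour_entropy = shannon_entropy([8, 8, 18, 18])
```

Under these assumptions, a driver who always uses the same road segments, locations, or hours has entropy 0, and the entropy grows as the behavior becomes more varied, which matches the "diversity" reading given for EntropyTime.
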
dm/start/clustering.1355840132.txt.gz · Last modified: 18/12/2012 at 14:15 (13 years ago) by Fosca Giannotti
