Guidelines for the homework on clustering

Description of the variables

For each car driver, we observe the following quantities, measured over a given time window of mobile activity:

Length = total traveled distance (m.)
Duration = total time spent driving (sec.)
Count = number of different trips
Phighway = distance traveled on highways (m.)
Pcity = distance traveled inside cities (m.)
Length_arc_crowded = distance traveled on the 20% most crowded roads (m.)
Pnight = distance traveled at night (m.)
Pover = distance traveled over the speed limit (m.)
Profile = number of systematic trips, e.g., work-home
Radius_g = radius of gyration: how widely the driver's locations are spread around their center of mass (mean position)
Radius_g_L1 = radius of gyration w.r.t. L1: how widely the driver's locations are spread around their most frequent location, L1 (e.g., home)
Avg_Dist_L1 = average distance from L1, the driver's most frequent location (e.g., home)
TimeL1L2 = % of time spent at locations L1 and L2 (the most and second-most frequent locations)
EntropyArc = entropy of road-segment frequencies; measures the diversity of the roads traveled
EntropyLocation = entropy of location frequencies; measures the diversity of the places visited
EntropyTime = entropy of hour-of-day frequencies; measures the diversity of daily patterns
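The guidelines do not spell out how Radius_g, Radius_g_L1 and Avg_Dist_L1 are computed. A common definition of the radius of gyration, assumed here, is the root-mean-square distance of the visited locations from a reference point: r_g = sqrt((1/N) * sum_i |r_i - r_c|^2), where r_c is the center of mass for Radius_g and L1 for Radius_g_L1. The Python sketch below assumes each location is an (x, y) pair in meters on a flat projection, that every observation has equal weight, and that repeated visits to the same place share identical coordinates. The function names are hypothetical, and the features in the dataset may have been computed differently.

```python
import math
from collections import Counter

# Assumption: a location is an (x, y) pair in meters on a flat projection,
# and repeated visits to the same place have identical coordinates.
Point = tuple[float, float]


def center_of_mass(points: list[Point]) -> Point:
    """Mean position of the driver, every observation weighted equally."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def most_frequent_location(points: list[Point]) -> Point:
    """L1: the location that occurs most often (ties: first seen wins)."""
    return Counter(points).most_common(1)[0][0]


def radius_of_gyration(points: list[Point], center: Point) -> float:
    """Root-mean-square distance of the locations from `center`."""
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in points) / len(points))


def avg_distance(points: list[Point], center: Point) -> float:
    """Plain mean (not root-mean-square) distance from `center`."""
    return sum(math.dist(p, center) for p in points) / len(points)


if __name__ == "__main__":
    # Toy data: three observations at home, one 5 km away.
    visits = [(0.0, 0.0), (0.0, 0.0), (3000.0, 4000.0), (0.0, 0.0)]
    l1 = most_frequent_location(visits)
    print(radius_of_gyration(visits, center_of_mass(visits)))  # about 2165.06 (Radius_g)
    print(radius_of_gyration(visits, l1))  # 2500.0 (Radius_g_L1)
    print(avg_distance(visits, l1))  # 1250.0 (Avg_Dist_L1)
```

Under these assumptions, Radius_g_L1 is never smaller than Radius_g, because the center of mass is the point that minimizes the mean squared distance to the visited locations (in the toy data, 2500 m versus about 2165 m).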
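The three entropy features can be read as Shannon entropies of empirical frequency distributions, H = -sum_i p_i log p_i, where p_i is the share of observations falling in category i (a road segment, a location, or an hour of the day). The guidelines give neither the logarithm base nor what exactly is counted (trips, visits, or time), so the sketch below assumes base 2 and one observation per item; `shannon_entropy` is a hypothetical helper, not something shipped with the dataset.

```python
import math
from collections import Counter
from collections.abc import Hashable, Iterable


def shannon_entropy(observations: Iterable[Hashable], base: float = 2.0) -> float:
    """Shannon entropy of the empirical distribution of `observations`.

    H = -sum_i p_i * log(p_i) = sum_i p_i * log(1 / p_i), where p_i is the
    relative frequency of the i-th distinct value. The second form is used
    so that a single repeated value yields exactly 0.0 rather than -0.0.
    """
    counts = Counter(observations)
    if not counts:
        return 0.0  # no observations: define the entropy as 0
    total = sum(counts.values())
    return sum((c / total) * math.log(total / c, base) for c in counts.values())


if __name__ == "__main__":
    # EntropyTime-style input (assumed): hour of day at which each trip starts.
    print(shannon_entropy([8, 8, 18, 18]))  # 1.0: two equally frequent hours
    print(shannon_entropy([8] * 10))  # 0.0: always the same hour
```

Higher values mean the observations are spread more evenly over more categories: a driver whose trips all start at the same hour scores 0, while trips spread uniformly over all 24 hours reach the maximum, log2(24), about 4.58 bits.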

Note that the dataset has no missing values: every “0” is a genuine zero, NOT a missing value.
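Because zeros are genuine measurements, they should be kept as they are rather than imputed or dropped; for instance, a driver with Pnight equal to 0 simply did not drive at night during the observed window. A quick sanity check with Python's standard csv module is sketched below, assuming the data comes as a CSV file with a header row. The helper `summarize_zeros` is hypothetical, and the sample rows are made-up toy values, not taken from the dataset.

```python
import csv
from collections.abc import Iterable


def summarize_zeros(lines: Iterable[str]) -> dict[str, int]:
    """Count genuine zeros per column; raise if any cell is blank.

    `lines` can be an open file or any iterable of CSV text lines whose
    first line is the header. Non-numeric cells (e.g. a possible driver ID
    column) are checked for blanks but not counted.
    """
    zeros: dict[str, int] = {}
    for row_no, row in enumerate(csv.DictReader(lines), start=1):
        for column, value in row.items():
            if column is None:
                raise ValueError(f"data row {row_no}: more fields than header columns")
            if value is None or value.strip() == "":
                # Should never happen: the dataset has no missing values.
                raise ValueError(f"data row {row_no}: missing value in column {column!r}")
            try:
                is_zero = float(value) == 0.0
            except ValueError:
                continue  # non-numeric cell: nothing to count
            if is_zero:
                zeros[column] = zeros.get(column, 0) + 1
    return zeros


if __name__ == "__main__":
    # Toy rows only, for illustration.
    sample = ["Length,Pnight,Pover", "1000,0,50", "2500,300,0", "800,0,0"]
    print(summarize_zeros(sample))  # {'Pnight': 2, 'Pover': 2}
```

For the real data, pass an open file handle instead, opened with newline="" as the csv module recommends; the actual file name is not given in these guidelines.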