
Privacy is an important problem in our society: the lack of trustworthy privacy safeguards in many current services and devices is a major reason why their diffusion is often more limited than expected. Moreover, people are reluctant to provide true personal data unless it is absolutely necessary. Privacy is therefore becoming a fundamental aspect to be taken into account when using, publishing and analyzing data that involve sensitive information. Much recent research has focused on privacy protection: some studies address individual privacy, i.e., the protection of sensitive individual data, while others address corporate privacy, i.e., the protection of strategic information at the organization level. Unfortunately, transforming data so as to protect sensitive information is increasingly hard. We live in the era of big data, characterized by unprecedented opportunities for sensing, storing and analyzing complex data that describe human activities in extreme detail and at high resolution; as a result, anonymization cannot be achieved simply by de-identification, i.e., by removing direct identifiers such as names. In recent years, several techniques for creating anonymous or obfuscated versions of data sets have been proposed; essentially, they aim to find an acceptable trade-off between data privacy on one side and data utility on the other. So far, the common finding is that no general method exists that can both handle “generic personal data” and preserve “generic analytical results”.

In my research I propose the design of technological frameworks that counter the threat of undesirable, unlawful effects of privacy violations without obstructing the knowledge discovery opportunities offered by data mining technologies. The main idea is to inscribe privacy protection into knowledge discovery technology by design, so that the analysis incorporates the relevant privacy requirements from the start.
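To make the point about de-identification concrete, the following toy Python sketch removes names from a small table and then re-identifies some records by joining the result with a named external table on the remaining quasi-identifiers (a classic linkage attack). The data, the choice of quasi-identifiers and all function names are hypothetical and invented only for illustration.

```python
from collections import Counter

# Hypothetical toy records, invented for illustration only.
PATIENTS = [
    {"name": "Alice", "zip": "56127", "birth_year": 1980, "sex": "F", "diagnosis": "flu"},
    {"name": "Bob",   "zip": "56127", "birth_year": 1980, "sex": "M", "diagnosis": "asthma"},
    {"name": "Carla", "zip": "56124", "birth_year": 1975, "sex": "F", "diagnosis": "diabetes"},
    {"name": "Dario", "zip": "56124", "birth_year": 1975, "sex": "M", "diagnosis": "flu"},
    {"name": "Elena", "zip": "56127", "birth_year": 1980, "sex": "F", "diagnosis": "migraine"},
]

# Assumption: attributes that an attacker can also observe in public sources.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def qi_key(row):
    """Tuple of the quasi-identifier values of a record."""
    return tuple(row[a] for a in QUASI_IDENTIFIERS)


def de_identify(rows):
    """Naive de-identification: drop only the direct identifier 'name'."""
    return [{k: v for k, v in r.items() if k != "name"} for r in rows]


def unique_fraction(rows):
    """Fraction of records whose quasi-identifier combination appears exactly once."""
    counts = Counter(qi_key(r) for r in rows)
    return sum(counts[qi_key(r)] == 1 for r in rows) / len(rows)


def link(published, external):
    """Linkage attack: join an external named table on the quasi-identifiers and
    return the sensitive value of every person matching exactly one published record."""
    groups = {}
    for r in published:
        groups.setdefault(qi_key(r), []).append(r)
    found = {}
    for person in external:
        candidates = groups.get(qi_key(person), [])
        if len(candidates) == 1:
            found[person["name"]] = candidates[0]["diagnosis"]
    return found


published = de_identify(PATIENTS)
# Hypothetical public register (e.g. a voter list): names plus the same quasi-identifiers.
register = [{k: r[k] for k in ("name",) + QUASI_IDENTIFIERS} for r in PATIENTS]

print(unique_fraction(published))  # 0.6
print(link(published, register))   # {'Bob': 'asthma', 'Carla': 'diabetes', 'Dario': 'flu'}
```

Even in this tiny table, three of the five "anonymous" records are unique on ZIP code, birth year and sex, so their diagnoses can be recovered without ever seeing a name.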
To this end, I propose applying the privacy-by-design paradigm, which sheds new light on the study of privacy protection: once specific assumptions are made about the sensitive data and about the target mining queries to be answered with the data, it is conceivable to design a framework that a) transforms the source data into an anonymous version with a quantifiable privacy guarantee, and b) guarantees that the target mining queries can be answered correctly using the transformed data instead of the original data. My research investigates two new problems that arise in modern data mining and data privacy: individual privacy protection in data publishing while preserving specific data mining analyses, and corporate privacy protection in data mining outsourcing.
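The two requirements a) and b) can be illustrated with a deliberately small Python sketch. It assumes a table in which ZIP code and birth year are the quasi-identifiers and the diagnosis is sensitive, and it assumes the target query is known in advance: count flu cases per four-digit ZIP area. Generalization to k-anonymity is swapped in here only as one familiar example of a quantifiable guarantee; the table values, the generalization rule and the function names are hypothetical and are not the frameworks developed in this research.

```python
from collections import Counter

# Hypothetical toy table (invented values): (zip code, birth year, diagnosis).
TABLE = [
    ("56127", 1981, "flu"),
    ("56128", 1984, "asthma"),
    ("56010", 1975, "flu"),
    ("56011", 1978, "diabetes"),
    ("56129", 1986, "flu"),
    ("56012", 1972, "asthma"),
]


def generalize(row):
    """Assumed transformation: keep a 4-digit ZIP prefix and the birth decade."""
    zip_code, birth_year, diagnosis = row
    return (zip_code[:4] + "*", birth_year // 10 * 10, diagnosis)


def k_of(rows):
    """Requirement a): size of the smallest group of records sharing the same
    quasi-identifiers (the k of k-anonymity)."""
    return min(Counter((z, y) for z, y, _ in rows).values())


def flu_per_area(rows):
    """Requirement b): the target query, flu cases per 4-digit ZIP area.
    It reads only the first four ZIP characters, which generalization keeps."""
    return Counter(z[:4] for z, _, d in rows if d == "flu")


published = [generalize(r) for r in TABLE]
print(k_of(TABLE), k_of(published))  # 1 3
print(flu_per_area(TABLE) == flu_per_area(published))  # True
```

The design choice mirrors the paragraph above: because the target query is fixed beforehand, the transformation can discard exactly the detail the query does not need (the last ZIP digit and the exact birth year), raising k from 1 to 3 while leaving the query's answer unchanged. A different query, such as counts per exact birth year, would no longer be answered correctly by this transformation.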