Traditionally, differential privacy mechanism design has been tailored for a scalar-valued query function. Although many mechanisms such as the Laplace and Gaussian mechanisms can be extended to a matrix-valued query function by adding i.i.d. noise to each element of the matrix, this method is often sub-optimal as it forfeits an opportunity to exploit the structural characteristics typically associated with matrix analysis. In this work, we consider the design of a differential privacy mechanism specifically for a matrix-valued query function. The proposed solution is to utilize matrix-variate noise, as opposed to the traditional scalar-valued noise. In particular, we propose a novel differential privacy mechanism called the Matrix-Variate Gaussian (MVG) mechanism, which adds matrix-valued noise drawn from a matrix-variate Gaussian distribution. We prove that the MVG mechanism preserves (ϵ,δ)-differential privacy, and show that it allows the structural characteristics of the matrix-valued query function to be exploited naturally. Furthermore, owing to the multi-dimensional nature of the MVG mechanism and the matrix-valued query, we introduce the concept of directional noise, which can be utilized to mitigate the impact the noise has on the utility of the query. Finally, we demonstrate the performance of the MVG mechanism and the advantages of directional noise using three matrix-valued queries on three privacy-sensitive datasets. We find that the MVG mechanism notably outperforms four previous state-of-the-art approaches, and provides utility comparable to the non-private baseline. Our work thus presents a promising prospect for both future research and implementation of differential privacy for matrix-valued query functions.
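The core operation described above, perturbing a matrix-valued query with noise drawn from a matrix-variate Gaussian distribution, can be sketched as follows. This is a minimal illustration, not the full MVG mechanism: in practice the row and column covariances would be calibrated to the query's sensitivity and the (ϵ,δ) budget (that calibration, and the directional-noise design, are omitted here), and the covariance-matrix query is a hypothetical example.

```python
import numpy as np

def mvg_noise(n, m, row_cov, col_cov, rng=None):
    """Draw one sample from a zero-mean matrix-variate Gaussian
    MN(0, row_cov, col_cov) via the standard factorization
    Z = B @ G @ C.T, where row_cov = B B^T and col_cov = C C^T."""
    rng = np.random.default_rng(rng)
    B = np.linalg.cholesky(row_cov)
    C = np.linalg.cholesky(col_cov)
    G = rng.standard_normal((n, m))  # i.i.d. standard-normal entries
    return B @ G @ C.T

def noisy_covariance_query(X, row_cov, col_cov, rng=None):
    """Hypothetical matrix-valued query (the sample covariance of X),
    released with additive matrix-variate Gaussian noise."""
    query = (X.T @ X) / X.shape[0]  # m x m covariance matrix
    return query + mvg_noise(query.shape[0], query.shape[1],
                             row_cov, col_cov, rng)
```

Choosing non-identity `row_cov` and `col_cov` is what lets the noise be shaped to the structure of the query, in contrast to i.i.d. element-wise noise.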
The rapid rise of IoT and Big Data can facilitate the use of data to enhance our quality of life. However, the omnipresent and sensitive nature of these data simultaneously generates privacy concerns. Hence, there is a strong need for techniques that ensure the data serve their intended purposes without enabling prying into one's sensitive information. We address this challenge via utility-maximizing lossy compression of data. Our techniques combine the mathematical rigor of Kernel Learning models with the structural richness of Deep Neural Networks, leading to the novel Multi-Kernel Learning and Hybrid Learning models. We systematically construct the proposed models in progressive stages, motivated by the cumulative improvement in the experimental results from the two previously non-intersecting regimes, namely, Kernel Learning and Deep Neural Networks. The final experimental results of the three proposed models on three mobile-sensing datasets show that, not only are our methods able to improve the utility prediction accuracies, but they can also cause sensitive predictions to perform nearly as poorly as random guessing, resulting in a win-win situation in terms of utility and privacy.
Pattern recognition on big data can be challenging for kernel machines, as the complexity grows quadratically with the number of training samples. In this work, we overcome this hurdle with a pre-processing step that removes outlying data samples. This approach discards less-informative data samples and trains the kernel machines only with the remaining data, and hence directly reduces the complexity by reducing the number of training samples. To enhance the classification performance, the outlier removal is performed such that the discriminant information of the data remains mostly intact. This is achieved via the novel Outlier-Removal Discriminant Information (ORDI) metric, which measures the contribution of each sample toward the discriminant information of the dataset. Hence, the ORDI metric can be used together with the simple filter method to effectively remove insignificant outliers, both reducing the computational cost and enhancing the classification performance. We show experimentally on two real-world datasets, at a sample removal ratio of 0.2, that outlier removal via ORDI can simultaneously (1) improve the accuracy of the classifier by 1%, and (2) reduce the total running time by 1.5x and 2x on the two datasets. Hence, ORDI provides a win-win situation in the performance-complexity tradeoff of kernel machines for big data analysis.
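The filter-style pre-processing described above can be sketched generically: score every training sample, drop the lowest-scoring fraction, and train only on the rest. Note that `class_mean_score` below is an illustrative stand-in scoring rule, not the ORDI metric itself, which measures each sample's contribution to the discriminant information of the dataset.

```python
import numpy as np

def filter_remove(X, y, score_fn, ratio=0.2):
    """Filter-method pre-processing: score each sample with score_fn
    (a stand-in for the ORDI metric), drop the lowest-scoring `ratio`
    fraction, and return the remaining samples for training."""
    scores = np.array([score_fn(X, y, i) for i in range(len(X))])
    keep = np.argsort(scores)[int(ratio * len(X)):]  # drop bottom `ratio`
    return X[keep], y[keep]

def class_mean_score(X, y, i):
    """Illustrative stand-in score (NOT ORDI): samples far from their
    class mean score low, so they are treated as outliers and removed."""
    mu = X[y == y[i]].mean(axis=0)
    return -np.linalg.norm(X[i] - mu)
```

The kernel machine is then trained on the returned subset, so the quadratic training cost drops with the square of the kept-sample count.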
Differential privacy mechanism design has traditionally been tailored for a scalar-valued query function. Although many mechanisms such as the Laplace and Gaussian mechanisms can be extended to a matrix-valued query function by adding i.i.d. noise to each element of the matrix, this method is often suboptimal as it forfeits an opportunity to exploit the structural characteristics typically associated with matrix analysis. To address this challenge, we propose a novel differential privacy mechanism called the Matrix-Variate Gaussian (MVG) mechanism, which adds a matrix-valued noise drawn from a matrix-variate Gaussian distribution, and we rigorously prove that the MVG mechanism preserves (ϵ,δ)-differential privacy. Furthermore, we introduce the concept of directional noise made possible by the design of the MVG mechanism. Directional noise allows the impact of the noise on the utility of the matrix-valued query function to be moderated. Finally, we experimentally demonstrate the performance of our mechanism using three matrix-valued queries on three privacy-sensitive datasets. We find that the MVG mechanism notably outperforms four previous state-of-the-art approaches, and provides comparable utility to the non-private baseline. Our work thus presents a promising prospect for both future research and implementation of differential privacy for matrix-valued query functions.
In machine learning, feature engineering is a pivotal stage in building a high-quality predictor. In particular, this work explores the multiple Kernel Discriminant Component Analysis (mKDCA) feature-map and its variants. However, seeking the right subset of kernels for the mKDCA feature-map can be challenging. Therefore, we consider the problem of kernel selection, and propose an algorithm based on Differential Mutual Information (DMI) and incremental forward search. DMI serves as an effective metric for selecting kernels, as theoretically supported by mutual information and Fisher's discriminant analysis. On the other hand, incremental forward search plays a role in removing redundancy among kernels. Finally, we illustrate the potential of the method via an application in privacy-aware classification, and show on three mobile-sensing datasets that selecting an effective set of kernels for mKDCA feature-maps can enhance the utility classification performance while successfully preserving data privacy. Specifically, the results show that the proposed DMI forward-search method can outperform the state-of-the-art and, at a much smaller computational cost, perform as well as the optimal, yet computationally expensive, exhaustive search.
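The incremental forward search used above follows the standard greedy pattern: at each step, add the candidate kernel that most improves the selection criterion, and stop when no candidate helps. A minimal sketch, with `score` standing in for the paper's DMI criterion:

```python
def forward_search(candidates, score, k):
    """Greedy incremental forward search: repeatedly add the candidate
    that maximizes score(selected + [candidate]), stopping early when
    no remaining candidate improves the selected set (this early stop
    is what removes redundant kernels). `score` is a stand-in for DMI."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        best = max(pool, key=lambda c: score(selected + [c]))
        if score(selected + [best]) <= score(selected):
            break  # no candidate improves the criterion; stop early
        selected.append(best)
        pool.remove(best)
    return selected
```

For `n` candidates this evaluates the criterion O(n·k) times, versus the O(2^n) subsets an exhaustive search would score, which is the computational saving the abstract refers to.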
A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model, which leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ϵ-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications -- clustering, classification, and regression -- on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) the loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
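The three-stage pipeline described above (RON projection, perturbed Gaussian fit, synthetic sampling) can be sketched as follows. This is an illustrative sketch only: the Laplace noise scales below are placeholders, not the paper's calibrated sensitivities, and the full algorithms differ per application (clustering, classification, regression).

```python
import numpy as np

def ron_gauss_release(X, d, eps, rng=None):
    """Illustrative sketch of a RON-Gauss-style release (unsupervised):
    (1) project X onto a random orthonormal d-dimensional subspace,
    (2) fit a Gaussian to the projections, perturbing its mean and
        covariance (placeholder noise scales, not calibrated ones),
    (3) sample synthetic data from the perturbed Gaussian."""
    rng = np.random.default_rng(rng)
    n, m = X.shape
    # Random orthonormal (RON) projection: Q has orthonormal columns.
    Q, _ = np.linalg.qr(rng.standard_normal((m, d)))
    Y = X @ Q                                          # n x d projection
    mu = Y.mean(axis=0) + rng.laplace(scale=2.0 / (n * eps), size=d)
    S = (Y - mu).T @ (Y - mu) / n
    S = S + rng.laplace(scale=2.0 / (n * eps), size=(d, d))
    S = (S + S.T) / 2                                  # symmetrize
    w, V = np.linalg.eigh(S)                           # clip eigenvalues
    S = V @ np.diag(np.clip(w, 1e-6, None)) @ V.T      # to keep S PSD
    return rng.multivariate_normal(mu, S, size=n), Q
```

Only the perturbed `mu` and `S` touch the data, so all released samples derive from a small, privatized summary; the DFM effect is what justifies the Gaussian model for the projected data `Y`.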