The quest for better data analysis and artificial intelligence has led to more and more data being collected and stored. As a consequence, more data are exposed to malicious entities. This paper examines the problem of privacy in machine learning for classification. We utilize Ridge Discriminant Component Analysis (RDCA) to desensitize data with respect to a privacy label. Based on five experiments, we show that desensitization by RDCA can effectively protect privacy (i.e., low accuracy on the privacy label) with a small loss in utility. On the HAR and CMU Faces datasets, the use of desensitized data results in random-guess-level accuracies for privacy, at a cost of average drops in utility accuracy of 5.14% and 0.04%, respectively. For the Semeion Handwritten Digit dataset, accuracies on the privacy-sensitive digits are almost zero, while accuracies on the utility-relevant digits drop by 7.53% on average. This presents a promising solution to the problem of privacy in machine learning for classification.
As analytic tools become more powerful and more data are generated on a daily basis, the issue of data privacy arises. This leads to the study of the design of privacy-preserving machine learning algorithms. Given two objectives, namely utility maximization and privacy-loss minimization, this work builds on two previously non-intersecting regimes: Compressive Privacy and the multi-kernel method. Compressive Privacy is a privacy framework that employs a utility-preserving lossy-encoding scheme to protect the privacy of the data, while the multi-kernel method is a kernel-based machine learning regime that explores the idea of using multiple kernels to build better predictors. In relation to neural-network architectures, the multi-kernel method can be described as a two-hidden-layer network with its width proportional to the number of kernels. The proposed compressive multi-kernel method consists of two stages: the compression stage and the multi-kernel stage. The compression stage follows the Compressive Privacy paradigm to provide the desired privacy protection. Each kernel matrix is compressed with a lossy projection matrix derived from Discriminant Component Analysis (DCA). The multi-kernel stage uses the signal-to-noise ratio (SNR) score of each kernel to non-uniformly combine multiple compressive kernels. The proposed method is evaluated on two mobile-sensing datasets, MHEALTH and HAR, where activity recognition is defined as utility and person identification is defined as privacy. The results show that the compression regime is successful in privacy preservation, as the privacy classification accuracies are almost at the random-guess level in all experiments. On the other hand, the novel SNR-based multi-kernel method improves utility classification accuracy over the state of the art on both datasets. These results indicate a promising direction for research in privacy-preserving machine learning.
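The non-uniform combination in the multi-kernel stage can be illustrated with a minimal sketch. The abstract does not define the SNR score, so `snr_proxy` below is a hypothetical stand-in (mean within-class kernel similarity divided by mean between-class similarity); the RBF kernels, the `gamma` values, and all function names are likewise assumptions made for illustration. The DCA-based compression stage is omitted.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def snr_proxy(K, y):
    """Hypothetical SNR-style score: within-class over between-class similarity."""
    same = y[:, None] == y[None, :]
    off_diag = ~np.eye(len(y), dtype=bool)
    within = K[same & off_diag].mean()
    between = K[~same].mean()
    return within / (between + 1e-12)  # small constant guards against division by zero

def combine_kernels(kernels, y):
    """Weight each kernel by its normalized score and sum the weighted kernels."""
    scores = np.array([snr_proxy(K, y) for K in kernels])
    weights = scores / scores.sum()
    K_combined = sum(w * K for w, K in zip(weights, kernels))
    return K_combined, weights

# Usage on synthetic two-class data (illustrative only).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 3)) + 1.5 * y[:, None]
kernels = [rbf_kernel(X, g) for g in (0.01, 0.1, 1.0)]
K, w = combine_kernels(kernels, y)
```

Because the weights are non-negative and each input is a valid kernel matrix, the weighted sum remains symmetric and positive semidefinite, so it can be passed to any kernel-based classifier.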
In the internet era, the data being collected on consumers like us are growing exponentially, and attacks on our privacy are becoming a real threat. To better assure our privacy, it is safer to let the data owner control the data to be uploaded to the network, as opposed to taking chances with the data servers or third parties. To this end, we propose a privacy-preserving technique, named Compressive Privacy (CP), that enables the data creator to compress data via collaborative learning, so that the compressed data uploaded onto the internet will be useful only for the intended utility and will not be easily diverted to malicious applications.
For data in a high-dimensional feature vector space, a common approach to data compression is dimension reduction or, equivalently, subspace projection. The most prominent tool is Principal Component Analysis (PCA). For unsupervised learning, PCA can best recover the original data given a specific reduced dimensionality. However, in a supervised learning environment, it is more effective to adopt a supervised PCA, known as Discriminant Component Analysis (DCA), in order to maximize the discriminant capability.
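A supervised subspace projection of this kind can be sketched as a ridge-regularized generalized eigenproblem on the between-class and within-class scatter matrices. The text does not give the exact DCA formulation, so the criterion used here (a ridge `rho` added to the within-class scatter), the default of keeping one fewer component than the number of classes, and all function names are assumptions.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return between-class (S_B) and within-class (S_W) scatter matrices."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        mc = Xc.mean(axis=0)
        diff = (mc - mean)[:, None]
        S_B += len(Xc) * (diff @ diff.T)
        centered = Xc - mc
        S_W += centered.T @ centered
    return S_B, S_W

def dca_projection(X, y, rho=1e-3, n_components=None):
    """Solve S_B w = lambda (S_W + rho I) w and keep the leading directions."""
    S_B, S_W = scatter_matrices(X, y)
    d = X.shape[1]
    if n_components is None:
        n_components = len(np.unique(y)) - 1  # rank of S_B is at most classes - 1
    L = np.linalg.cholesky(S_W + rho * np.eye(d))  # S_W + rho I = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_B @ L_inv.T  # symmetric matrix with the same eigenvalues
    eigvals, V = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]  # sort by descending discriminant power
    eigvals = eigvals[order]
    W = L_inv.T @ V[:, order]  # map back to generalized eigenvectors
    return W[:, :n_components], eigvals

# Usage on synthetic three-class data (illustrative only).
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 5))
X[:, 0] += 5.0 * y
W, eigvals = dca_projection(X, y)
Z = X @ W  # compressed representation
```

With three classes, only two eigenvalues are meaningfully non-zero, so the five-dimensional data compress to two dimensions while keeping the class-separating directions.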
The DCA subspace analysis embraces two different subspaces. The signal subspace components of DCA are associated with the discriminant distance/power (related to the classification effectiveness), while the noise subspace components of DCA are tightly coupled with the recoverability and/or privacy protection. This paper will present three DCA-related data compression methods useful for privacy-preserving applications.
Utility-driven DCA: Because the rank of the signal subspace is limited by the number of classes, DCA can effectively support classification using a relatively small dimensionality (i.e. high compression).
Desensitized PCA: Incorporating a signal-subspace ridge into DCA leads to a variant that is especially effective for extracting privacy-preserving components. In this case, the eigenvalues of the noise subspace are made insensitive to the privacy labels and are ordered according to their corresponding component powers.
Desensitized K-means/SOM: Since revealing the K-means or SOM cluster structure could leak sensitive information, it is safer to perform K-means or SOM clustering on the desensitized PCA subspace.
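The idea of desensitizing data before PCA or clustering can be illustrated with a deliberately simplified sketch. It does not implement the signal-subspace ridge described above; instead it swaps in a plainer technique: project the data onto the orthogonal complement of the subspace spanned by the privacy-class means, then run ordinary PCA on what remains. The function name and parameters are hypothetical.

```python
import numpy as np

def desensitize_then_pca(X, privacy_labels, n_components):
    """Remove the privacy-class-mean subspace, then apply PCA to the remainder."""
    Xc = X - X.mean(axis=0)
    # Class means of the centered data span the simplified privacy signal subspace.
    means = np.stack([Xc[privacy_labels == c].mean(axis=0)
                      for c in np.unique(privacy_labels)])
    _, s, Vt = np.linalg.svd(means, full_matrices=False)
    basis = Vt[s > 1e-10 * s.max()]  # orthonormal rows spanning the class means
    # Project onto the orthogonal complement of that subspace.
    X_des = Xc - Xc @ basis.T @ basis
    # PCA on the desensitized data via SVD; rows of Vt_pca are ordered by power.
    _, _, Vt_pca = np.linalg.svd(X_des, full_matrices=False)
    components = Vt_pca[:n_components]
    return X_des @ components.T, components

# Usage on synthetic data with a three-valued privacy label (illustrative only).
rng = np.random.default_rng(1)
p = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 6))
X[:, 0] += 4.0 * p
Z, comps = desensitize_then_pca(X, p, n_components=3)
```

After this projection, every privacy class has the same mean in the reduced space, so a clustering step such as K-means run on `Z` cannot separate the classes by their means. Higher-order differences between classes are not removed by this simplification.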
Over the past decades, face recognition has been a problem of critical interest in the machine learning and signal processing communities. However, conventional approaches such as eigenfaces do not protect the privacy of user data, which is emerging as an important design consideration in today's society. In this work, we leverage a supervised-learning subspace projection method called Discriminant Component Analysis (DCA) for privacy-preserving face recognition. By projecting the data onto the lower-dimensional signal subspace prescribed by DCA, high face recognition performance is achievable without compromising the privacy of the data owners. We evaluate our approach on three image datasets: the Yale, Olivetti, and Glasses datasets, the last of which is derived from the former two. Our approach can serve as a key enabler for real-world deployment of privacy-preserving face recognition applications, and provides a promising direction for researchers and the private sector.
Synthetic biology is facilitating novel methods and components to build in vivo and in vitro circuits to better understand and re-engineer biological networks. Circadian oscillators serve as molecular clocks that govern several important cellular processes such as cell division and apoptosis. Hence, the successful demonstration of synthetic oscillators has become a primary design target for many synthetic biology endeavors. Recently, three synthetic transcriptional oscillators were demonstrated by Kim and Winfree, utilizing a modular architecture of synthetic gene analogues and a few enzymes. However, the periods and amplitudes of these synthetic oscillators were sensitive to initial conditions and allowed limited tunability. In addition, because the system was closed, the oscillations were observed to die out after a certain period of time. To increase the tunability and robustness of synthetic biochemical oscillators in the face of disturbances and modeling uncertainties, a control-theoretic approach for real-time adjustment of oscillator behavior would be required. In this paper, assuming an open-system implementation is feasible, we demonstrate how dynamic inversion techniques can be used to synthesize the required controllers.
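The principle behind dynamic inversion can be shown on a toy problem. The scalar plant below, with drift `f` and input gain `g`, is hypothetical and is not the transcriptional oscillator model; the gain `k`, the reference signal, and the Euler step are also assumptions. The controller cancels the known dynamics and imposes a linear error dynamic, so the state tracks an oscillating reference.

```python
import math

def f(x):
    """Hypothetical drift term of the toy plant x' = f(x) + g(x) * u."""
    return -x ** 3

def g(x):
    """Hypothetical input gain; strictly positive, so the inversion is well defined."""
    return 1.0 + x ** 2

def simulate(k=5.0, dt=1e-3, t_end=10.0, x0=0.0):
    """Track r(t) = 1 + 0.5 sin(t) using dynamic inversion and forward Euler."""
    x, t = x0, 0.0
    for _ in range(int(round(t_end / dt))):
        r = 1.0 + 0.5 * math.sin(t)
        r_dot = 0.5 * math.cos(t)
        v = r_dot - k * (x - r)        # desired closed-loop rate: e' = -k e
        u = (v - f(x)) / g(x)          # invert the plant dynamics to obtain the input
        x += dt * (f(x) + g(x) * u)    # plant update under that input
        t += dt
    return x, 1.0 + 0.5 * math.sin(t)

x_final, r_final = simulate()
```

The inversion relies on exact knowledge of `f` and `g`; with model mismatch the cancellation is imperfect, and the feedback gain `k` has to absorb the residual error.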