Students: Sophie Champ, Michael Peralta, Sarah Vaknin

Project: Subspace Clustering in Julia

Poster: Vertical (PDF) | Horizontal (PDF)

Institution: Lehigh University

Major: Industrial and Systems Engineering

Advisor: Daniel Robinson


Subspace clustering is the task of partitioning data drawn from a collection of unknown subspaces into groups (i.e., clusters) such that all members of each group belong to the same subspace. The data are unlabeled: the subspace to which each point belongs is not known in advance and must therefore be learned. The subspaces may also have different dimensions. Many real-world applications can benefit from advances in subspace clustering, particularly in computer vision, such as video segmentation and the clustering of face images taken under different lighting conditions. We used Julia, a fast dynamic language, to successfully cluster data from multiple randomly generated subspaces. To solve the subspace clustering problem, we considered an optimization formulation closely related to the Lasso problem to compute coefficient vectors that encode the “similarity” between pairs of data points. Lasso, short for “Least Absolute Shrinkage and Selection Operator,” regularizes the computed coefficients by “shrinking” the least important ones to zero. The coefficient vectors are collected into a sparse matrix, which is then used to compute the clustering of the data. We tested our algorithm on synthetic data that was also generated in Julia.
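
For readers who want to see what a Lasso-type formulation of this kind can look like, the following is a sketch of the standard sparse self-expression model from the subspace clustering literature. It is an illustration under stated assumptions and not necessarily the exact formulation the team used. The symbols are all assumptions introduced here: X is the matrix whose N columns are the data points x_1, ..., x_N, c_j is the coefficient vector for point x_j, and \lambda > 0 is a regularization weight.

```latex
% Assumed notation: X = [x_1, ..., x_N] holds the data points as columns.
% For each point x_j, find a sparse vector c_j that expresses x_j in terms
% of the other points (the constraint excludes x_j itself):
\min_{c_j \in \mathbb{R}^N} \;
  \tfrac{1}{2}\,\lVert x_j - X c_j \rVert_2^2 + \lambda\,\lVert c_j \rVert_1
\quad \text{subject to} \quad (c_j)_j = 0,
\qquad j = 1, \dots, N.

% Collect the solutions as columns of a sparse coefficient matrix and
% symmetrize it (entrywise absolute value) to get pairwise similarities:
C = [\, c_1 \;\; c_2 \;\; \cdots \;\; c_N \,],
\qquad
W = \lvert C \rvert + \lvert C \rvert^{\mathsf{T}}.
```

In this sketch, the \ell_1 penalty drives most entries of c_j to zero, and the nonzero entries tend to fall on points from the same subspace as x_j. A common way to turn the similarity matrix W into clusters is spectral clustering, although the abstract does not specify which method was used for that final step.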

Sophie Champ

About Sophie Champ

Sophie Champ is a senior at Lehigh University, originally from La Canada, CA, studying Industrial and Systems Engineering (ISE) with an applied math minor. She collaborated with a team of other ISE students under the guidance of Professor Daniel Robinson to develop a subspace clustering algorithm. The team is continuing its research this semester, focusing on predicting septic shock in hospital patients. Following her graduation in May, Sophie will join KPMG as a Digital Consultant in the San Francisco office. In her free time, Sophie enjoys singing, playing the guitar, and hiking.

Michael Peralta

About Michael Peralta

Michael Peralta, from Doylestown, Pennsylvania, is a junior majoring in industrial and systems engineering with a minor in computer science. This summer he will intern at FedEx in the linehaul optimization department. Outside of school, Michael enjoys running and is the vice president of the running club.

Sarah Vaknin

About Sarah Vaknin

Sarah Vaknin is a senior at Lehigh University, originally from Fair Lawn, NJ, studying Industrial and Systems Engineering (ISE) with a Computer Science minor. She worked with a team under the guidance of Professor Daniel Robinson to build an algorithm for subspace clustering. Sarah is currently working with the same team on an optimization problem aimed at predicting septic shock in hospital patients. This research has allowed her to explore the computing and software side of ISE, which is the area that interests her most. Aside from research, Sarah is involved in an a cappella group on campus, enjoys playing soccer and the piano, and loves to cook.