Roundtable Discussion Winter 2020: I

Event Date

1147 Math. Sci. Bldg.

We will have our first UCD4ISD roundtable discussion of Winter Quarter 2020 with Dr. Yilin Zhang of Facebook. Subjects and themes of this roundtable discussion include, but are not limited to:

  • What would a data scientist in industry recommend that our graduate students (in statistics, math, CS, ECE) do or study if they want to become data scientists in industry?
  • From your viewpoint, what should our faculty teach graduate students who are interested in data science?
  • What do you think the future directions of data science research will be?

A coffee/tea reception will also be held in the same room, 1147 MSB, during this roundtable discussion.

Attention, graduate students: please attend this roundtable discussion. It should be quite informative for your careers.

After the roundtable discussion, Dr. Zhang will give the following statistics seminar at 4:10pm:

Title: Understanding Regularized Spectral Clustering via Graph Conductance
Abstract:  This work uses the relationship between graph conductance and spectral clustering to study (i) the failures of spectral clustering and (ii) the benefits of regularization. The explanation is simple. Sparse and stochastic graphs create a lot of small trees that are connected to the core of the graph by only one edge. Graph conductance is sensitive to these noisy 'dangling sets'. Spectral clustering inherits this sensitivity. The second part of this work starts from a previously proposed form of regularized spectral clustering and shows that it is related to the graph conductance on a 'regularized graph'. We call the conductance on the regularized graph CoreCut. Based upon previous arguments that relate graph conductance to spectral clustering (e.g. Cheeger inequality), minimizing CoreCut relaxes to regularized spectral clustering. Simple inspection of CoreCut reveals why it is less sensitive to small cuts in the graph. Together, these results show that unbalanced partitions from spectral clustering can be understood as overfitting to noise in the periphery of a sparse and stochastic graph. Regularization fixes this overfitting. In addition to this statistical benefit, these results also demonstrate how regularization can improve the computational speed of spectral clustering. We provide simulations and data examples to illustrate these results.
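
For readers new to the topic, the following is a minimal illustrative sketch of the idea in the abstract, not the speaker's code. It assumes a hypothetical toy graph (two 4-node cliques joined by three edges, with a 4-node path dangling from the core by a single edge) and one commonly used form of regularization, adding tau/N to every entry of the adjacency matrix with tau set to the average degree. The exact form used in the talk may differ. The sketch compares the conductance of the dangling set with that of the balanced cut, before and after regularization.

```python
import numpy as np

def conductance(A, S):
    """Conductance of node set S: cut(S, S^c) / min(vol(S), vol(S^c))."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[list(S)] = True
    cut = A[mask][:, ~mask].sum()   # total edge weight leaving S
    vol_S = A[mask].sum()           # sum of (weighted) degrees inside S
    vol_Sc = A[~mask].sum()         # sum of (weighted) degrees outside S
    return cut / min(vol_S, vol_Sc)

# Hypothetical toy graph: two 4-cliques form the "core".
clique_a = [0, 1, 2, 3]
clique_b = [4, 5, 6, 7]
edges = []
for clique in (clique_a, clique_b):
    edges += [(i, j) for i in clique for j in clique if i < j]
edges += [(0, 4), (1, 5), (2, 6)]               # three edges between the cliques
edges += [(3, 8), (8, 9), (9, 10), (10, 11)]    # path attached by one edge

n = 12
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Assumed regularization: add tau/n to every entry, tau = average degree.
tau = A.sum() / n
A_reg = A + tau / n

dangling = [8, 9, 10, 11]   # small tree in the periphery
balanced = [4, 5, 6, 7]     # one of the two core clusters

phi_dangling = conductance(A, dangling)
phi_balanced = conductance(A, balanced)
phi_dangling_reg = conductance(A_reg, dangling)
phi_balanced_reg = conductance(A_reg, balanced)

print("original graph:    dangling", phi_dangling, "balanced", phi_balanced)
print("regularized graph: dangling", phi_dangling_reg, "balanced", phi_balanced_reg)
```

On this toy graph the dangling path has the lower conductance on the original graph (1/7 versus 1/5), so a conductance-minimizing method would prefer the unbalanced cut. On the regularized graph the ordering reverses and the balanced cut scores lower.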