Minutes of Roundtable Discussion Winter 2020 - I

The format of our roundtable discussions for Winter 2020 has been changed from that of Fall 2019: we have decided to have a roundtable discussion once a month instead of every week, and schedule it before a seminar talk with refreshments instead of after a seminar talk.

The first roundtable discussion of this quarter was held before the statistics seminar talk by Dr. Yilin Zhang (Facebook), “Understanding Regularized Spectral Clustering via Graph Conductance.” Her talk slides can be found here.

The roundtable discussion kicked off with the host asking the guest speaker about her background: she graduated with a PhD in Statistics from UW-Madison last year.

Q: How did you decide to join the Facebook core data science team?

A: It seems to be a perfect match since I worked on social networks during my PhD.


Q: Could you please talk about your current work at Facebook? Do you have the flexibility to choose your own research problems?

A: We have about 100 people on the core data science team. My team also has a good work-life balance. We aim to make the Facebook platform safer and fairer for everyone. I mainly work on improving how existing machine learning problems at Facebook are solved. I also identify problems in how data are analyzed, build more general frameworks to solve them, and then convince others to use machine learning methods correctly. For example, when models are trained or evaluated, the labels are usually treated as the truth, but they are actually noisy. My work shows teams how to quantify that noise and improve the noisy labels.
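
To illustrate why noisy labels matter for evaluation, here is a minimal sketch. It is not the speaker's method, which the discussion did not describe. It assumes binary labels that are flipped independently of the model's errors at a known rate, and the function names are hypothetical.

```python
def observed_accuracy(true_accuracy: float, flip_rate: float) -> float:
    """Expected accuracy measured against noisy binary labels.

    Assumption: each label is flipped with probability `flip_rate`,
    independently of whether the model's prediction is correct.
    The model appears correct when it is right and the label is intact,
    or when it is wrong and the label happens to be flipped.
    """
    return true_accuracy * (1 - flip_rate) + (1 - true_accuracy) * flip_rate


def corrected_accuracy(observed: float, flip_rate: float) -> float:
    """Invert observed_accuracy to estimate accuracy against true labels."""
    if abs(flip_rate - 0.5) < 1e-12:
        # At a 50% flip rate the labels carry no information.
        raise ValueError("flip_rate of 0.5 makes the correction undefined")
    return (observed - flip_rate) / (1 - 2 * flip_rate)


if __name__ == "__main__":
    noisy = observed_accuracy(true_accuracy=0.9, flip_rate=0.1)
    print(f"accuracy measured on noisy labels: {noisy:.3f}")
    print(f"estimate after correction: {corrected_accuracy(noisy, 0.1):.3f}")
```

Under these assumptions, a model that is 90% accurate looks only about 82% accurate when 10% of the labels are flipped, which is why treating noisy labels as the truth can mislead an evaluation.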


Q: What do you wish you had studied when you were a graduate student? Any regrets?

A: My undergraduate study focused on math, which was very theoretical. I thought statistics was more applied, and I wanted to work on real data problems, so I transferred to the statistics PhD program. However, during my PhD, I found that people valued theoretical work more than applied work. So I had no idea where my passion was and struggled a lot. Then I found that Prof. Karl Rohe works on social networks and collaborates with many people from the social sciences. His work was more aligned with my passion. When I applied for jobs in industry, I found that the skills needed in industry are very different from what I was trained in during my PhD. So if I could do things differently, rather than first learning various models without knowing which one to use when faced with data, I would start from real problems. I would identify what the problems are and why they arise, and then communicate with other people about the problems and solutions. I would also practice coding more if I could go back.


Q: Students don’t appreciate the methods taught in classes, because they don’t know how to solve problems with them. I only started to understand the methods when I actually applied them to real datasets. Classes should be project-oriented. Instead of teaching in the traditional way, we should present the problems, analyze them, and tell students that if they apply this method they will get a large error, so they should move on to the next method, and so on. Students only find out these basic things when they actually apply the methods to datasets. What often happens is that after people start working in industry and face real problems, they realize that all these “recipes” were taught a long time ago, and they need to re-learn them, which is inefficient.

A: We just need to provide more classes, such as data science classes. We can’t turn all existing statistics classes into data science classes. We should have data science people teach data science.


Q: Does Facebook have any publicly available data repository?

A: To my knowledge, there is no public data. But some people in academia collaborate with the Facebook research team as contractors. People at Facebook can invite researchers from academia to Facebook to present their work, which may facilitate collaborations.


Q&A from the Audience:


Q: You mentioned that you had learned models but didn’t know how to approach problems using them. How did you overcome this to get the job?

A: I was just lucky, because my research was interesting to Facebook. I refreshed my knowledge by reading some basic textbooks to prepare for the interview. I also practiced my communication skills through presentations.


Q: I heard that in industry, people usually don’t use complicated methods; they just use PCA or something basic. Is that true?

A: In reality, gradient boosted decision trees usually perform really well, and a more complex model doesn’t necessarily improve performance. However, as researchers, we dig deeper than people who only know how to run a model, and we can identify problems with how these models are used.


Q: Could you please explain the metrics for evaluating ranking performance at Facebook?

A: I don’t know how they define the ads score or metrics because I don’t work on this.


Q: How do people start to use the methods you developed at Facebook?

A: I talk to different teams and read their docs, and see whether they are interested in my work.


Q: Do you have freedom to choose what project to work on?

A: I do have the freedom to choose, but there is a risk that no one will use the result, so you should choose carefully.


Q: What abilities are needed at Facebook, and how can one learn them?

A: You need the ability to identify problems and solve them quickly and soundly. You also need to communicate your solution well and convince people to use it. These are also the qualities of a successful researcher. You can try collaborating with other departments during your PhD to get a sense of it.


Q: How is the lunch on the Facebook campus?

A: Good!


[Scribe: Qin Ding (Statistics)]