Minutes of Roundtable Discussion Winter 2020 - III
The third and last roundtable discussion of this quarter was held before the statistics seminar talk by Dr. Ken Clarkson (IBM Research): Dimensionality Reduction for Tukey Regression. His talk slides can be found here.
The following minutes were recorded by the scribe Zhenyu Wei, who subsequently used the IBM Watson Speech to Text service to convert the recording to text. Zhenyu and Naoki then further edited the transcript to produce the final version below.
The roundtable discussion kicked off with the host asking the guest speaker about his background.
Q: I'd like to ask you first about your background, and then: how did you transition from academia/PhD to industry?
A: I guess I don't have that interesting an answer, in the sense that my PhD is from 1985, and after I got the degree, I immediately went to Bell Labs in Murray Hill, New Jersey. I stayed there until 2007, when I went to IBM Research in San Jose, so I have never had an actual paid university position. I have taught a couple of courses: once when I was a graduate student, introduction to programming, and once computational geometry at the University of Pennsylvania, and that was enough teaching for me. I've been reasonably happy at industrial research labs. It's not industry the way you might think of it, in the sense that I'm kind of far from where the rubber meets the road. To one extent or another, what I am paid to do is to write papers, so I look sort of quasi-academic. But the other side of it is that I also work with groups, people down the hall, let's say, who are in the more serious business of building things that might eventually become company products. I interact with them and try to help them with their work in whatever way I can. In that sense, I'm sort of operating as a consultant, and the way that I can be helpful is usually with respect to algorithmic knowledge. So if somebody doesn't know that there are fast algorithms to compute minimum spanning trees, I can point them to Kruskal’s algorithm, which to some of you would presumably sound very trivial, but it is indeed not standard issue to everybody who works in computer science. In other cases, I think the stronger contributions that theory people like me can make have to do with formalizing the work of system builders: trying to put it into a formal structure that can then be reasoned with, and that can then lead to simplifications via generalization and so on. I've always worked for companies that are trying to make money, but my main job hasn't been to build things for the company; I've been playing a secondary role.
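[Scribe's note: for readers unfamiliar with the example above, the following is a minimal Python sketch of Kruskal's algorithm for minimum spanning trees. The function and variable names are ours, added for illustration; they are not from the talk or the discussion.]

```python
# Scribe's illustrative sketch (not from the speaker): Kruskal's algorithm
# builds a minimum spanning tree by sorting edges by weight and greedily
# adding each edge that joins two previously separate components (union-find).

def kruskal_mst(num_vertices, edges):
    """edges: list of (weight, u, v) with vertices labeled 0..num_vertices-1."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    mst = []
    for weight, u, v in sorted(edges):      # consider edges in increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                        # adding this edge creates no cycle
            parent[ru] = rv                 # merge the two components
            mst.append((weight, u, v))
    return mst

# Example: a small weighted graph on 4 vertices.
print(kruskal_mst(4, [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]))
```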
Q: After you moved to IBM, something must have changed, not only the physical location but also the content of your work, your projects, and the things you do. Did things change a lot or not too much?
A: The environment in industrial research places, even in the same company, the same lab, can change over time. At IBM, there's often a reference to the pendulum swinging in one direction or another, in terms of how much it is really necessary for you to work on things that are going to help the company make money real soon, versus how much we really want people who will help us make a name for ourselves [add to the company's reputation]. So there are various kinds of changes in the environment in that sense. I'm not sure that I could really pin down too much of a difference between Bell Labs when I left and IBM when I joined. One reason that I felt comfortable switching was that it didn't seem like it would be that different. When I was hired at Bell Labs, Brian Kernighan was the manager who hired me. When he called me with the offer, he said, "How would you like to come here and do whatever you want?" That does have a negative side, in the sense that you have to figure out what to do, with no particular connections or directions, which can be daunting. Later, it became a little bit clearer that it was never really a case of "do whatever you want", but rather "do whatever you want as long as there is a clear relationship between your work and what the company is doing, so that we can describe why it's interesting and why it's important for the company". IBM is certainly that way, and these days the directions have to do with AI, data science, and machine learning. There are various names for things that are very similar: what exactly is AI versus machine learning versus data mining versus data science versus statistics? There are some large-scale differences but not that much. Somehow, we wanted to do cognitive science for a while and then we became AI, and we continue to do pretty much the same thing. But there is quite an emphasis on publishing papers in a way that adds luster in that particular direction.
Q: Thinking about your career from your PhD to Bell Labs and IBM, do you have any suggestions for current graduate students who are interested in industry jobs as data scientists?
A: IBM Research is now all about AI and data science, including trying to build end-to-end solutions, in the sense of trying to automate, or to simplify and facilitate, as much as possible: going from the original data of whatever kind to cleaning that data, doing analysis on that data, generating reports related to that data, and so on. We are trying to make systems that can do that whole pipeline of work, and I think different job skills are needed at each part of that, assuming it's not yet entirely automated. People all around the office are working on natural language processing, in some cases image processing, and various kinds of hardware for AI. So programming, statistics, data science, and in particular, I think, algorithms are all very useful things to have, and they make people very hirable not only by IBM but also by Google and Facebook and so on.
Q: Thinking about the courses you took when you were a graduate student and then joining the company to do research, do you have any suggestions for the faculty members here about how to educate our students? What kinds of courses do you think would be great for students, especially those who are interested in industrial jobs?
A: First of all, my graduate school days were so long ago that they could not be used as a point of reference. But never say never; I guess the foundational things remain foundational forever. Minimum spanning tree algorithms have served for a long time, and what do you know, they came up in the context of a genomics application that somebody wanted to do! I still think algorithms are important, and I think data science basics are important. I'm trying to think about the difference between the estimation point of view of statistics versus the more predictive point of view of machine learning. I would think that the more predictive point of view is the more popular one, the more easily sellable, and being able to do that gets you the job. But I think that's a temporary matter of fashion; I think having the statistical foundations is more important. Currently, there's quite an interest in machine learning, and presumably also statistical, questions, as you may already know, having to do with fairness, transparency, and explainability. Once again, those neural networks don't necessarily give you the explainability that you might want to have. But that is an area of quite broad interest; it arguably is something that everybody should at least know about. NeurIPS, the preeminent AI conference, will now be requiring a brief social impact statement for every submission. That might be going a little overboard, but I think it is something to think about. There are general questions of how you implement these systems in such a way that you take into account these kinds of issues of fairness, transparency, explainability, and so on. Those are going to continue to be critical and worth understanding and incorporating; they are not things that you can add on later. They probably have to be built into the kinds of analyses that you do, so you should understand them from the beginning, just as security can't be added on and has to be built in from the beginning.
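[Scribe's note: the estimation-versus-prediction distinction mentioned above can be made concrete with a small sketch. The simulated data, libraries, and variable names below are our own choices for illustration, not anything presented by the speaker.]

```python
# Scribe's illustration (not from the talk): the same linear model viewed from
# an estimation perspective (coefficient inference) and a predictive
# perspective (held-out error).
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)

# Estimation: fit on all the data, report coefficients with confidence intervals.
fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.params)        # point estimates of the coefficients
print(fit.conf_int())    # 95% confidence intervals

# Prediction: fit on a training split, measure error on held-out data.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
fit_tr = sm.OLS(y_tr, sm.add_constant(X_tr)).fit()
pred = fit_tr.predict(sm.add_constant(X_te))
print(np.mean((pred - y_te) ** 2))  # held-out mean squared error
```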
Q: How can one create that kind of motivation in students, say, to run certain numerical analysis algorithms? It's a challenge because the students can't see the point in them. Do you have any suggestions?
A: I certainly agree that it helps to have projects, like building a system or doing an experiment, with experimental design and analysis of the experimental results, doing the full pipeline of those things. Or building a system for a particular application, understanding what the user needs, and finding out whether or not the system satisfies those needs, in these different styles of software development. I think it is true that the best way to learn those things is by doing them. By analogy, when I learn various mathematical concepts, I don't feel that I can really understand them until I've used them to come up with a theorem and then proven the theorem myself. Solving problems, including making conjectures, proving theorems, designing algorithms that you think will be fast, and proving that those algorithms are fast: that whole process is much more motivating for most people, I would say, and really the way that you get a real handle on that kind of material. There are a variety of ways to do that, to really get your hands dirty in that sense, and I agree that it is essential; one version of that is doing projects.
Q: Let's talk about the future of data science. This age of data won't go away, and it will last for a long, long time because we are generating more and more data. In that sense, the science and engineering needed to deal with these data will become even more important. Do you have an opinion about the future of data science?
A: I hate to again mention neural networks, but one thing that is being actively pursued in the organization that I'm in is the problem of integrating classical deep learning neural networks with some kind of symbolic reasoning. We will never get artificial general intelligence, or even better performance on a variety of tasks, until we have systems which incorporate not only a huge black box of lots and lots of tunable parameters but also logical or symbolic data. I'm trying to understand these in the medical domain: how to incorporate medical ontologies into representing meaning as systems of vectors, and then be able to use those systems of vectors and get these word embeddings (if that means anything to you) into neural network systems. The vision of the future that I'm exposed to a lot locally is this kind of integration of black-box neural network systems with the somewhat more classical rule-based symbolic, logical systems, and trying to actually understand how to have those things at the very least work well together, if not be entirely integrated. I think that's a pretty fundamental question in data science.
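[Scribe's note: for readers unfamiliar with word embeddings, the following toy sketch shows what "representing meaning as systems of vectors" can look like. The vectors, terms, and similarity function below are made up for illustration and are not from the talk.]

```python
# Scribe's illustration (not from the talk): toy word embeddings in which
# medical terms are represented as vectors, and similarity of meaning is
# approximated by cosine similarity between the vectors.
import numpy as np

# Made-up 4-dimensional embeddings; real systems learn hundreds of dimensions
# from large corpora (e.g. word2vec, GloVe) or derive them from an ontology.
embeddings = {
    "fever":    np.array([0.9, 0.1, 0.0, 0.2]),
    "pyrexia":  np.array([0.8, 0.2, 0.1, 0.3]),  # clinical synonym of fever
    "fracture": np.array([0.0, 0.9, 0.7, 0.1]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings["fever"], embeddings["pyrexia"]))   # high: similar meaning
print(cosine(embeddings["fever"], embeddings["fracture"]))  # low: unrelated terms
```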
[Scribe: Zhenyu Wei (Statistics)]