Minutes of Roundtable Discussion 9

Our ninth roundtable discussion took place after Javad Lavaei’s MADDD seminar talk “Computational Techniques for Nonlinear Optimization and Learning Problems”. His talk slides can be found here.

The roundtable discussion consisted of two parts: the first on the challenges and future of data science and machine learning in the field of power systems, and the second on the organization and activities of data science and machine learning at UC Berkeley.

The first question was what the challenges, future, and pressing issues of machine learning are in the fields of control theory and power systems. The speaker started by noting that machine learning has had tremendous success in computer science, where the tolerance for mistakes is relatively high and the cost of generating data is relatively low. In power systems, however, relevant data (e.g., on rare events that are critically important) are too scarce for current AI algorithms to learn from successfully. Moreover, a single mistake could lead to catastrophic consequences (e.g., blackouts) in a power system, which calls for stronger theoretical guarantees from algorithms. The speaker also mentioned that decision-making in power systems is still largely based on human brainpower rather than automated algorithms, partly because power systems are highly uncertain and uncontrolled environments.

The host then asked the speaker for his thoughts on the differences between academia and industry and on how he puts knowledge into practice. The speaker mentioned that industry is relatively conservative about making changes and about sharing data and algorithms. He added, however, that there is a big push from academia to industry thanks to the efforts of the DOE, which has created many opportunities.

An attendee asked why there is not enough data in power systems. The speaker started by saying that it is a matter of how much data is enough. Power systems have tens of thousands of nodes and hundreds of thousands of lines, and it is hard to sample enough data from an n-dimensional space when n is around 200,000. Another reason the speaker mentioned is that in the past people used electro-mechanical meters, which sampled only every 8 minutes and offered limited types of measurements. This renders much of the historical data useless.
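To give a rough sense of the scale behind the speaker's remark on dimensionality, the sketch below counts how many points a uniform grid would need to cover an n-dimensional space. The grid model, the function name log10_grid_points, and the choice of two levels per dimension are illustrative assumptions, not something presented in the discussion.

```python
import math


def log10_grid_points(n_dims: int, levels_per_dim: int) -> float:
    """Return log10 of the number of points in a uniform grid.

    A grid with `levels_per_dim` values along each of `n_dims` axes has
    levels_per_dim ** n_dims points; the logarithm is used because the
    count itself is far too large to be a useful number.
    """
    return n_dims * math.log10(levels_per_dim)


# Assumed setting: n = 200,000 (the figure quoted in the discussion) and
# only two levels (say "low" and "high") per dimension.
exponent = log10_grid_points(200_000, 2)
print(f"grid points needed: about 10^{exponent:.0f}")
```

Even this coarsest possible grid needs a number of samples with tens of thousands of digits, which is one way to see why no realistic measurement history can cover the space densely.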

For the second topic, the host asked about the current status of data science at UC Berkeley. The speaker began by mentioning that Berkeley intends to build a Division of Data Science and Information, something between a department and an institute that overlaps many departments. The idea is first to get as many departments involved as possible and to raise enough money to hire faculty members. As this is a new concept, there is a lot still to be discussed (for example, the bureaucratic structure of the division and its role in research). The general goal for the division is clear, however: to organize activities around data.


[Scribe: Shaofeng Deng (GGAM)]