The UC Davis TETRAPODS Institute of Data Science (UCD4IDS) is composed of 35 researchers (four PIs and 31 senior participants) from four departments (Computer Science, Electrical & Computer Engineering, Mathematics, and Statistics). It will cross interdepartmental barriers and promote interdisciplinary research collaborations among faculty members, postdocs, and graduate students. The project will encourage innovative and robust research, and provide education and mentoring of graduate students and postdocs in data science. Students and postdocs engaged in this project will be trained to be the next generation of interdisciplinary data scientists: they will gain deep knowledge of some focused areas and, at the same time, broaden their perspectives in other diverse fields. The UCD4IDS will draw on the experience of faculty members in the four primary departments as well as in application fields such as neuroscience, medical and health sciences, and veterinary medicine.
The UCD4IDS will organize: a) round-table discussions and breakout sessions after weekly seminars related to data science; b) quarterly colloquia on data science; and c) annual three-day workshops. The project will also coordinate and develop diverse courses at UC Davis, with graduate students involved in the project taking at least one course in each of the four departments. The PI team will also leverage local programs to recruit, support, and retain graduate students, postdocs, and new faculty members from underrepresented groups by matching them to appropriate mentors. To disseminate the research and educational results, the PI team plans to: 1) make colloquium and workshop talk slides, lecture notes, and code available online, to reach current and future collaborators and the general public; and 2) organize mini-symposia and workshops on foundations of data science at targeted conferences.
Research at the UCD4IDS will focus on three broad themes: 1) Fundamentals of machine learning directed toward biological and medical applications; 2) Optimization theory and algorithms for machine learning, including numerical solvers for large-scale nontrivial learning problems; and 3) High-dimensional data analysis on graphs and networks. The algorithms and software tools to be developed will make a positive impact in solving practical data-analysis and machine-learning problems in diverse fields, e.g., computer science (analyzing friendship relations in social networks); electrical engineering (monitoring and controlling sensor networks); civil engineering (monitoring traffic flow on a road network); and in particular, biology and medicine (analyzing data measured on real neural networks, detecting changes in brain structures due to diseases, imaging live biological cells to analyze their growth, etc.). The technical goals of this project are: 1) gaining a geometric understanding of high-dimensional data, which may allow efficient (re)sampling from manifolds representing certain phenomena of interest and classifying subtle yet critical differences that often appear in biological and medical applications; 2) providing theoretical guarantees and efficient numerical algorithms for non-convex optimization, which is crucial to machine learning; and 3) deepening the understanding of how local interactions between individual entities (e.g., neurons) lead to global coordination and decision making.
This project is part of the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea activity.