Minutes of Roundtable Discussion 8

Our eighth roundtable took place after Grace Yi’s statistics seminar talk “Making Sense of Noisy Data: Why and How?” Her talk slides can be found here.

The roundtable discussion followed our usual two-part format: the first part covered the challenges and future of data science and machine learning in health sciences and medical applications, and the second covered the organization and activities of data science and machine learning at the University of Western Ontario.

On the first topic, health and medical sciences are among the most active areas of data science research, and the role of data science in these fields is becoming increasingly important. For example, the School of Public Health at Harvard University now offers an MS degree in Health Data Science. These fields have long been a central area of study in statistics, but traditional methods have difficulty handling the complexity and volume of data arising today. One area of active research is how to handle neuro-imaging data, which is high dimensional, expensive to collect, and usually lacks the high-fidelity check of a biopsy that is available for other regions of the body. A general issue with imaging data is that two doctors labeling the relevant parts of an image don’t necessarily agree on which parts are relevant; when a biopsy is possible, this is less of an issue. This ties into a general problem in machine learning that is just as common in medical applications: the trade-off between spending on improved sensing (for example, better imaging tools) and spending on collecting more data and labels for that data. Another active area involves the integration of mobile health data, which is often quite noisy but is generated in large quantities. A newer area is the study of prescription histories to detect anomalous prescriptions, easing doctors’ burden and alerting patients to the potential need for a second opinion.

A participant asked what the biggest gap between research and actual practice is. The speaker replied that practitioners have a stronger need for explanations of a given result and a lower tolerance for false negatives.

Another participant compared privacy-protecting techniques that inject noise from a known distribution with the techniques for dealing with measurement noise described during the talk. Yet another participant claimed that in practice, at Google at least, the main privacy-protecting technique involves randomly swapping a couple of indices for each person.
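As a rough illustration of the comparison raised here (not a method presented in the talk or by any participant), the sketch below adds Laplace noise of a known scale to some values and then uses that known scale to correct a summary statistic, in the same spirit as a measurement-error correction. The function names, the choice of Laplace noise, the noise scale, and the simulated data are all assumptions made for this example.

```python
import random
import statistics


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values, scale, rng):
    # Hypothetical privacy step: add independent zero-mean noise to each value.
    return [v + laplace_noise(scale, rng) for v in values]


def corrected_variance(noisy_values, scale):
    # Independent Laplace(0, scale) noise has variance 2 * scale**2, so
    # Var(noisy) = Var(true) + 2 * scale**2. Because the noise distribution
    # is known, its contribution can simply be subtracted.
    return statistics.pvariance(noisy_values) - 2.0 * scale ** 2


rng = random.Random(0)
# Simulated "true" measurements (illustrative only).
true_values = [rng.gauss(50.0, 5.0) for _ in range(20000)]
noisy_values = privatize(true_values, scale=3.0, rng=rng)

true_var = statistics.pvariance(true_values)
naive_var = statistics.pvariance(noisy_values)   # inflated by the added noise
fixed_var = corrected_variance(noisy_values, scale=3.0)

print(f"true variance:      {true_var:.2f}")
print(f"naive variance:     {naive_var:.2f}")
print(f"corrected variance: {fixed_var:.2f}")
```

The mean of the noisy values needs no correction because the noise has mean zero, while the variance is overstated unless the known noise variance is removed. With measurement noise the distribution is usually not known in advance and has to be estimated or assumed, which is where the two settings differ.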


On the second topic, the University of Western Ontario has a thriving data science community. The university’s new president has appointed a data science advisor from the computer science department, who is coordinating efforts university-wide and organizing a town hall for feedback. The university has established a Master of Data Analytics Program (https://www.uwo.ca/mda/program_components/program_overview.html), a one-year program jointly offered by the Department of Statistical and Actuarial Sciences (DSAS) and the Department of Computer Science. The program involves both an in-classroom portion and an experiential learning portion. The university also offers undergraduate programs in Data Science, namely an Honors Specialization in Data Science and a Major in Data Science, offered jointly by Computer Science and DSAS. An initiative is underway to develop data science courses for students across the campus. While it remains a challenge to teach data science to undergraduate students outside of science and engineering (e.g., social sciences and humanities) who may not have enough mathematical background (e.g., calculus and linear algebra), there has been university-wide interest in this initiative.


[Scribe: David Weber (GGAM)]