Ethical Data Science with Cathy O'Neil

09/01/22 • 27 min

UVA Data Points sits down with Cathy O'Neil, author of Weapons of Math Destruction, and Brian Wright, Assistant Professor of Data Science at the University of Virginia. The candid dialogue ranges from O'Neil's new book The Shame Machine to her work as an algorithm audit consultant. The two also draw comparisons between data science problems and knitting, as well as discuss educating future data scientists.

Links:

https://mathbabe.org (Cathy O'Neil's website)

https://datascience.virginia.edu (UVA School of Data Science website)

Books mentioned:

The Shame Machine

Weapons of Math Destruction

Previous Episode

4 + 1 Model of Data Science

Before diving into the complex world of data science, it seemed wise to establish a shared definition of the field. Here at the UVA School of Data Science, we have defined data science with the 4 + 1 Model. This model serves as an outline for the first series of UVA Data Points. It also serves as a guiding definition within the School of Data Science, touching everything from research to course planning.

In this introduction trailer, host Monica Manney discusses the history, development, and function of the 4 + 1 Model of Data Science with its main author, Raf Alvarado.

Below is a brief excerpt from An Outline of the 4 + 1 Model of Data Science by Raf Alvarado:

“The point of the 4 + 1 model, abstract as it is, is to provide a practical template for strategically planning the various elements of a school of data science. To serve as an effective template, a model must be general. But generality is often purchased at the cost of intuitive understanding. The following caveats may help make sense of the model when considering its usefulness when applied to various concrete activities.

The model describes areas of academic expertise, not objective reality. It is a map of a division of labor writ large. Although each of the areas has clear connections to the others, the question to ask when deciding where an activity belongs is: who would be an expert at doing it? The realms help refine this question: the analytics area, for example, contains people who are good at working with abstract machinery. The four areas have the virtue of isolating intuitively correct communities of expertise. For example, people who are great at data product design may not know the esoteric depths of machine learning, and adepts at machine learning are not usually experts in understanding human society and normative culture.

Each area in the model contains a collection of subfields that need to be teased out. Some areas will have more subfields than others. Although some areas may be smaller than others in terms of number of experts (faculty) and courses, each area has a major impact on the overall practice of data science and the quality of the school’s activities. In addition, these subfields are in an important sense “more real” than the categories. We can imagine them forming a dense network in which the areas define communities with centroids, and which are more interconnected than the clean-cut image of the model implies.

The areas of the model are like the components of a principal component analysis of the vector space of data science. They capture the variance that exists within the field, and, crucially, provide a framework for realigning (rebasing) the academy along a new set of axes. One effect of this is to both disperse and recombine older fields, such as computer science, statistics, and operations research, into new clusters. Thus we separate computer science subfields such as complexity analysis and database design. One possible salutary result of this will be the formation of new syntheses of fields that share concerns but differ in vocabularies and customs...”

Next Episode

Creating a Biased-Free Job Board with Matthew Thomas

This bonus episode features Matthew Thomas, a data scientist at Inclusively and a graduate of the UVA M.S. in Data Science program. He talks about how Inclusively works to create and maintain a job board designed specifically for job seekers with disabilities. Matthew explains how typical job boards come with many built-in biases that can screen out qualified individuals without them even knowing. He discusses the challenges of removing biases from algorithms and the importance of honesty and self-criticism when examining a data science project.

As Cathy O’Neil challenged in Episode 1 of UVA Data Points, we should always ask ourselves, “For whom does this fail?” Matthew’s work is a good illustration of this sentiment in practice. In addition to discussing his work, Matthew also gives solid career advice for anyone seeking a similar career path in data science.