4 + 1 Model of Data Science

08/24/22 • 8 min

Before diving into the complex world of data science it seemed to wise to establish a shared definition of the field. Here at the UVA School of Data Science, we have defined data science with the 4 + 1 Model. This model serves an outline for the first series of UVA Data Points. It also serves as a guiding definition within the School of Data Science, touching everything from research to course planning.

In this introduction trailer, host Monica Manney discusses the history, development, and function of the 4 + 1 Model of Data Science with its main author, Raf Alvarado.

Below is a brief expect from An Outline of the 4 + 1 Model of Data Science by Raf Alvarado:

“The point of the 4 + 1 model, abstract as it is, is to provide a practical template for strategically planning the various elements of a school of data science. To serve as an effective template, a model must be general. But generality if often purchased at the cost of intuitive understanding. The following caveats may help make sense of the model when considering its usefulness when applied to various concrete activities.

The model describes areas of academic expertise, not objective reality. It is a map of a division of labor writ large. Although each of the areas has clear connections to the others, the question to ask when deciding where an activity belongs is: who would be an expert at doing it? The realms help refine this question: the analytics area, for example, contains people who are good at working with abstract machinery. The four areas have the virtue of isolating intuitively correct communities of expertise. For example, people who are great at data product design may not know the esoteric depths of machine learning, and that adepts at machine learning are not usually experts in understanding human society and normative culture.

Each area in the model contains a collection of subfields that need to be teased out. Some areas will have more subfields than others. Although some areas may be smaller than others in terms of number of experts (faculty) and courses, each area has a major impact on the overall practice of data science and the quality of the school’s activities. In addition, these subfields are in an important sense “more real” than the categories. We can imagine them forming a dense network in which the areas define communities with centroids, and which are more interconnected than the clean-cut image of the model implies.

The areas of the model are like the components of a principal component analysis of the vector space of data science. They capture the variance that exists within the field, and, crucially, provide a framework for realigning (rebasing) the academy along a new set of axes. One effect of this is to both disperse and recombine older fields, such as computer science, statistics, and operations research, into new clusters. Thus we separate computer science subfields such as complexity analysis and database design. One possible salutary result of this will be the formation of new syntheses of fields that share concerns but differ in vocabularies and customs..."

08/24/22 • 8 min

Generate a badge

Get a badge for your website that links back to this episode

Select type & size

Copy