DEW #120: The Case for Data Contracts, Action-Position data quality assessment framework & Stop emphasizing the Data Catalog

03/12/23 • 36 min

Data Engineering Weekly

Please read Data Engineering Weekly Edition #120

Topic 1: Colin Campbell: The Case for Data Contracts - Preventative data quality rather than reactive data quality

In this episode, we focus on the importance of data contracts in preventing data quality issues. We discuss an article by Colin Campbell highlighting the need for data contracts and the market scope for data contract solutions. We also touch on the idea that data creation will become a decentralized process and the role of tools like data contracts in enabling successful decentralized data modeling. We emphasize the importance of creating high-quality data and the need for both technological and organizational solutions to achieve this goal.

Key highlights of the conversation

  • "Preventative data quality rather than reactive data quality. It should start with contracts." - Colin Campbell. - Author of the article
  • "Contracts put a preventive structure in place" - Ashwin.
  • "The successful data-driven companies all do one thing very well. They create high-quality data." - Ananth.

Link:

https://uncomfortablyidiosyncratic.substack.com/p/the-case-for-data-contracts

https://www.dataengineeringweekly.com/p/introducing-schemata-a-decentralized
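
To make the preventative idea concrete, here is a minimal, hypothetical sketch of a contract check enforced in the producer's publish path. The field names and the publish helper are invented for illustration; they are not taken from Colin's article or from Schemata.

# A minimal data-contract sketch: the producer declares the schema it promises
# to emit, and every record is validated *before* it is published downstream.
# All names here are illustrative assumptions, not from the article.
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str
    dtype: type
    required: bool = True

# The "contract": agreed between producer and consumers, versioned in source control.
ORDER_CONTRACT = [
    Field("order_id", str),
    Field("customer_id", str),
    Field("amount_cents", int),
    Field("currency", str),
]

def validate(record: dict, contract: list[Field]) -> list[str]:
    """Return a list of violations; an empty list means the record honors the contract."""
    errors = []
    for field in contract:
        if field.name not in record:
            if field.required:
                errors.append(f"missing required field: {field.name}")
            continue
        if not isinstance(record[field.name], field.dtype):
            errors.append(f"{field.name}: expected {field.dtype.__name__}, "
                          f"got {type(record[field.name]).__name__}")
    return errors

def publish(record: dict) -> None:
    violations = validate(record, ORDER_CONTRACT)
    if violations:
        # Preventative: reject at the source instead of cleaning up downstream.
        raise ValueError(f"contract violation, refusing to publish: {violations}")
    # ... hand the record to the event bus / warehouse loader here ...

The point is where the check runs: violations are rejected at the source rather than cleaned up downstream, which is the "preventative rather than reactive" stance discussed in the episode.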

Topic 2: Yerachmiel Feltzman: Action-Position data quality assessment framework

In this conversation, we discuss a framework for data quality assessment called the Action Position framework. The framework helps define what actions should be taken based on the severity of the data quality problem. We also discuss two patterns for data quality: Write-Audit-Publish (WAP) and Audit-Write-Publish (AWP). The WAP pattern involves writing data, auditing it, and publishing it, while the AWP pattern involves auditing data, writing it, and publishing it. We encourage readers to share their best practices for addressing data quality issues.
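
As a rough sketch (not taken from Yerachmiel's article), the snippet below combines the Write-Audit-Publish flow with an action-position-style severity map; the audit check, thresholds, and table stand-ins are assumptions made for the example.

# Write-Audit-Publish sketch: land data in a staging area, audit it,
# and only promote it to production if the audit passes. The
# severity -> action mapping is in the spirit of an action-position
# framework: the action taken depends on how bad the problem is.

def audit(rows: list[dict]) -> str:
    """Return a severity level for the staged batch: 'ok', 'warn', or 'block'."""
    if not rows:
        return "block"                      # empty batch: never publish
    null_ids = sum(1 for r in rows if r.get("order_id") is None)
    null_ratio = null_ids / len(rows)
    if null_ratio > 0.05:
        return "block"                      # too many broken keys: quarantine
    if null_ratio > 0:
        return "warn"                       # publish, but notify the owner
    return "ok"

def write_audit_publish(rows: list[dict], staging: list, production: list) -> None:
    staging.clear()
    staging.extend(rows)                    # 1. write to staging
    severity = audit(staging)               # 2. audit the staged data
    if severity == "block":
        raise RuntimeError("audit failed: batch quarantined, nothing published")
    if severity == "warn":
        print("audit warning: publishing with known issues, alerting data owner")
    production.extend(staging)              # 3. publish

# Usage: in practice 'staging'/'production' would be warehouse tables or Iceberg
# snapshots, and the publish step would be an atomic swap or commit.
prod: list = []
write_audit_publish([{"order_id": "o-1"}, {"order_id": "o-2"}], staging=[], production=prod)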

Are you using a data quality framework in your organization? Do you have any best practices for addressing data quality issues? What do you think of the Action-Position data quality framework? Please add your comments in the Substack chat.

Link:

https://medium.com/everything-full-stack/action-position-data-quality-assessment-framework-d833f6b77b7

Dremio WAP pattern: https://www.dremio.com/resources/webinars/the-write-audit-publish-pattern-via-apache-iceberg/

Topic 3: Guy Fighel: Stop emphasizing the Data Catalog

We discuss the limitations of data catalogs and the author’s view of the semantic layer as an alternative. The author argues that data catalogs are passive and quickly become outdated, and that a stronger contract with enforced data quality could be a better solution. We also highlight the cost factors of implementing a data catalog and suggest that a more decentralized approach may be necessary to keep up with the increasing number of data sources. Innovation in this space is needed to improve how organizations discover and consume data assets.

Link:

https://www.linkedin.com/pulse/stop-emphasizing-data-catalog-guy-fighel/

https://www.dataengineeringweekly.com/p/data-catalog-a-broken-promise

Previous Episode

Data Engineering Weekly Radio #120

We are back in our Data Engineering Weekly Radio for edition #120. We will take 2 or 3 articles from each week's Data Engineering Weekly edition and go through an in-depth analysis.

From edition #120, we took the following articles:

Topic 1: Colin Campbell: The Case for Data Contracts - Preventative data quality rather than reactive data quality

In this episode, we focus on the importance of data contracts in preventing data quality issues. We discuss an article by Colin Campbell highlighting the need for data contracts and the market scope for data contract solutions. We also touch on the idea that data creation will become a decentralized process and the role of tools like data contracts in enabling successful decentralized data modeling. We emphasize the importance of creating high-quality data and the need for both technological and organizational solutions to achieve this goal.

Key highlights of the conversation

"Preventative data quality rather than reactive data quality. It should start with contracts." - Colin Campbell. - Author of the article

"Contracts put a preventive structure in place" - Ashwin.

"The successful data-driven companies all do one thing very well. They create high-quality data." - Ananth.

Ananth’s post on Schemata: https://www.dataengineeringweekly.com/p/introducing-schemata-a-decentralized

Topic 2: Yerachmiel Feltzman: Action-Position data quality assessment framework

In this conversation, we discuss a framework for data quality assessment called the Action Position framework. The framework helps define what actions should be taken based on the severity of the data quality problem. We also discuss two patterns for data quality: Write-Audit-Publish (WAP) and Audit-Write-Publish (AWP). The WAP pattern involves writing data, auditing it, and publishing it, while the AWP pattern involves auditing data, writing it, and publishing it. We encourage readers to share their best practices for addressing data quality issues.

Are you using a data quality framework in your organization? Do you have any best practices for addressing data quality issues? What do you think of the Action-Position data quality framework? Please add your comments in the Substack chat.

https://medium.com/everything-full-stack/action-position-data-quality-assessment-framework-d833f6b77b7

Dremio WAP pattern: https://www.dremio.com/resources/webinars/the-write-audit-publish-pattern-via-apache-iceberg/

Topic 3: Guy Fighel: Stop emphasizing the Data Catalog

We discuss the limitations of data catalogs and the author’s view of the semantic layer as an alternative. The author argues that data catalogs are passive and quickly become outdated, and that a stronger contract with enforced data quality could be a better solution. We also highlight the cost factors of implementing a data catalog and suggest that a more decentralized approach may be necessary to keep up with the increasing number of data sources. Innovation in this space is needed to improve how organizations discover and consume data assets.

Something to think about in this conversation

"If you don't catalog everything and we only catalog what is required for the purpose of business decision-making, does that solve the data catalog problem in an organization?"

https://www.linkedin.com/pulse/stop-emphasizing-data-catalog-guy-fighel/


This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.dataengineeringweekly.com

Next Episode

Podcast: Data Product @ Oda, Reflection Talking with Data Leaders & Great Migration To Snowflake

We are back in our Data Engineering Weekly Radio for edition #121. We will take 2 or 3 articles from each week's Data Engineering Weekly edition and go through an in-depth analysis.

Please subscribe to our podcast in your favorite app.

From edition #121, we took the following articles:

Oda: Data as a product at Oda

Oda writes an exciting blog about “Data as a Product,” describing why we must treat data as a product and dashboards as products, and how to structure the ownership model for data products.

https://medium.com/oda-product-tech/data-as-a-product-at-oda-fda97695e820

The blog highlights six key principles for creating value from data:

  • Domain knowledge + discipline expertise
  • Distributed Data Ownership and shared Data Ownership
  • Data as a Product
  • Enablement over Handover
  • Impact through Exploration and Experimentation
  • Proactive attitude towards Data Privacy & Ethics

https://medium.com/oda-product-tech/the-six-principles-for-how-we-run-data-insight-at-oda-ba7185b5af39

Here are a few highlights from the podcast

"Oda builds the whole data product principle & the implementation structure being built on top of the core values, instead of reflecting any industry jargons.”

"Don't make me think. The moment you make your users think, you lose your value proposition as a platform or a product.”

"The platform enables the domain; domain enables your consumer. It's a chain of value creation going on top and like simplifying everyone's life, accessing data, making informed decisions.”

"I think putting that, documenting it, even at the start of it, I think that's where the equations start proving themselves. And that's essentially what product thinking is all about.”

Peter Bruins: Some reflections on talking with Data leaders

Data Mesh, Data Products, and Data Contracts are all concepts trying to address this problem, and it is a billion-dollar problem to solve. The author leaves us with a bigger question: ownership plays a central role in all these concepts, but what is the incentive to take ownership?

https://www.linkedin.com/pulse/some-reflections-talking-data-leaders-peter-bruins/

Here are a few highlights from the podcast

"Ownership. It's all about the ownership." - Peter Burns.

"The weight of the success (growth of adoption) of the data leads to its failure.

Faire: The great migration from Redshift to Snowflake

Is Redshift dying? I’m seeing an increasing pattern of people migrating from Redshift to Snowflake or a Lakehouse. Faire wrote a detailed blog on the reasoning behind its Redshift to Snowflake migration, the journey, and the key takeaways.

https://craft.faire.com/the-great-migration-from-redshift-to-snowflake-173c1fb59a52

Faire also open-sourced some of the utility scripts to make it easier to move from Redshift to Snowflake:

https://github.com/Faire/snowflake-migration
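
For a flavor of what one step of such a migration can look like, here is a generic sketch that unloads a Redshift table to S3 as Parquet and copies it into Snowflake. The bucket, IAM role, storage integration, and table names are placeholders, and this is not Faire's actual tooling.

# Generic sketch of one table's Redshift -> Snowflake hop:
# 1) UNLOAD from Redshift to S3 as Parquet, 2) COPY INTO Snowflake from S3.
# Bucket, role ARN, integration, and table names below are placeholders.
import psycopg2                  # Redshift speaks the Postgres wire protocol
import snowflake.connector

UNLOAD_SQL = """
UNLOAD ('SELECT * FROM analytics.orders')
TO 's3://my-migration-bucket/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload'
FORMAT AS PARQUET;
"""

COPY_SQL = """
COPY INTO analytics.orders
FROM 's3://my-migration-bucket/orders/'
STORAGE_INTEGRATION = s3_migration_int
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
"""

# Export the table from Redshift to S3.
rs_conn = psycopg2.connect(host="redshift-host", port=5439, dbname="prod",
                           user="migrator", password="...")
rs_conn.autocommit = True
rs_conn.cursor().execute(UNLOAD_SQL)
rs_conn.close()

# Load the Parquet files from S3 into Snowflake.
sf_conn = snowflake.connector.connect(account="my_account", user="migrator",
                                      password="...", warehouse="LOAD_WH",
                                      database="ANALYTICS", schema="PUBLIC")
try:
    sf_conn.cursor().execute(COPY_SQL)
finally:
    sf_conn.close()

In practice, a migration also needs DDL translation and row-count or checksum validation on both sides, which is where utility scripts earn their keep.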

Here are a few highlights from the podcast

"If you left like one percent of my data is still in Redshift and 99% of your data in Snowflake, you're degrading your velocity and the quality of your delivery.”


This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.dataengineeringweekly.com
