Skip to main content
Solved

Data warehouse export - joining?

  • June 4, 2023
  • 1 reply
  • 63 views

I’m trying to model some data from the warehouse sync files (in BigQuery) and I’m facing some issues with duplicate rows when I join Deliveries and Metrics.

What’s the correct way to join these two tables - I’m using delivery_id which is resulting in a few duplicates, but event_id doesn’t seem to be the correct field as it returned incomplete data.

Best answer by Maddie

Hi there Ash,

I believe you’re referring to joining/merging the two Parquet files for Deliveries and Metrics into one table in BigQuery?

The reason you are running into an issue is because each table uses a different primary key (unique identifier for rows in that table). The deliveries table uses delivery_id, while the Metrics table uses event_id (as you’ve mentioned!). The reason each delivery could have multiple event ids in the Metrics table is that each message can generate multiple delivery events (delivered, opened, clicked, etc.).

With that said, there isn’t really a “correct” answer to your question. As how you model your data is up to you depending on your use case and company goal to achieve with the merged table. 

I hope the above helps to provide some background into what is happening and assist you in making a decision about what you would like to do with the data tables.

- Maddie

View original
Did this topic help you find an answer to your question?

1 reply

  • 14 replies
  • Answer
  • June 5, 2023

Hi there Ash,

I believe you’re referring to joining/merging the two Parquet files for Deliveries and Metrics into one table in BigQuery?

The reason you are running into an issue is because each table uses a different primary key (unique identifier for rows in that table). The deliveries table uses delivery_id, while the Metrics table uses event_id (as you’ve mentioned!). The reason each delivery could have multiple event ids in the Metrics table is that each message can generate multiple delivery events (delivered, opened, clicked, etc.).

With that said, there isn’t really a “correct” answer to your question. As how you model your data is up to you depending on your use case and company goal to achieve with the merged table. 

I hope the above helps to provide some background into what is happening and assist you in making a decision about what you would like to do with the data tables.

- Maddie


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings