{"id":320,"date":"2021-05-25T10:41:29","date_gmt":"2021-05-25T08:41:29","guid":{"rendered":"https:\/\/stage.usercentrics.com\/?post_type=knowledge&#038;p=9166"},"modified":"2025-02-04T16:18:48","modified_gmt":"2025-02-04T15:18:48","slug":"optimizing-your-bigquery-tables-using-partitioning-time-unit-column-partitioned-tables","status":"publish","type":"knowledge","link":"https:\/\/usercentrics.com\/us\/knowledge-hub\/optimizing-your-bigquery-tables-using-partitioning-time-unit-column-partitioned-tables\/","title":{"rendered":"Optimizing your BigQuery tables using partitioning: Time-unit column-partitioned tables"},"content":{"rendered":"\n<p><span style=\"font-weight: 400;\">Especially when working with Big Data, costs can quickly explode, and performance can degrade fast when data starts to pile up.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">BigQuery offers the possibility to create partitioned tables, which help to prevent exploding costs and degrading performance.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Of course, the use case has to fit the idea behind partitioning, although most Big Data use cases do in one way or another.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Google provides three different ways to partition BigQuery tables:<\/span><\/p>\n\n\n\n<p><b>Ingestion Time<\/b><span style=\"font-weight: 400;\"> \u2014 Tables are partitioned based on the time they were ingested.<\/span><\/p>\n\n\n\n<p><b>Time-unit column<\/b><span style=\"font-weight: 400;\"> \u2014 Tables are partitioned based on a time-unit column. 
Valid values here are <\/span><i><span style=\"font-weight: 400;\">TIMESTAMP<\/span><\/i><span style=\"font-weight: 400;\">, <\/span><i><span style=\"font-weight: 400;\">DATE<\/span><\/i><span style=\"font-weight: 400;\">, and <\/span><i><span style=\"font-weight: 400;\">DATETIME<\/span><\/i><span style=\"font-weight: 400;\">.<\/span><\/p>\n\n\n\n<p><b>Integer ranged<\/b><span style=\"font-weight: 400;\"> \u2014 Tables are partitioned based on an integer column.<\/span><\/p>\n\n\n<div class=\"uc-notice\">\n    <div class=\"uc-notice__icon\">\n        <svg width=\"24\" height=\"24\" viewBox=\"0 0 24 24\" fill=\"none\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\">\n<path d=\"M10.8177 17.0093H12.8177V11.0093H10.8177V17.0093ZM11.8177 9.00928C12.1011 9.00928 12.3386 8.91344 12.5302 8.72178C12.7219 8.53011 12.8177 8.29261 12.8177 8.00928C12.8177 7.72594 12.7219 7.48844 12.5302 7.29678C12.3386 7.10511 12.1011 7.00928 11.8177 7.00928C11.5344 7.00928 11.2969 7.10511 11.1052 7.29678C10.9136 7.48844 10.8177 7.72594 10.8177 8.00928C10.8177 8.29261 10.9136 8.53011 11.1052 8.72178C11.2969 8.91344 11.5344 9.00928 11.8177 9.00928ZM11.8177 22.0093C10.4344 22.0093 9.13442 21.7468 7.91775 21.2218C6.70108 20.6968 5.64275 19.9843 4.74275 19.0843C3.84275 18.1843 3.13025 17.1259 2.60525 15.9093C2.08025 14.6926 1.81775 13.3926 1.81775 12.0093C1.81775 10.6259 2.08025 9.32594 2.60525 8.10928C3.13025 6.89261 3.84275 5.83428 4.74275 4.93428C5.64275 4.03428 6.70108 3.32178 7.91775 2.79678C9.13442 2.27178 10.4344 2.00928 11.8177 2.00928C13.2011 2.00928 14.5011 2.27178 15.7177 2.79678C16.9344 3.32178 17.9928 4.03428 18.8927 4.93428C19.7927 5.83428 20.5052 6.89261 21.0302 8.10928C21.5552 9.32594 21.8177 10.6259 21.8177 12.0093C21.8177 13.3926 21.5552 14.6926 21.0302 15.9093C20.5052 17.1259 19.7927 18.1843 18.8927 19.0843C17.9928 19.9843 16.9344 20.6968 15.7177 21.2218C14.5011 21.7468 13.2011 22.0093 11.8177 22.0093Z\" fill=\"black\"\/>\n<\/svg>\n    <\/div>\n    <div class=\"uc-notice__content\">\n   
             <p>Read about <a href=\"https:\/\/usercentrics.com\/guides\/future-of-data-in-marketing\/big-data-marketing\/\">Big data marketing<\/a> now<\/p>\n            <\/div>\n<\/div>\n\n\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-is-time-unit-partitioning-and-how-does-it-work\">What is time-unit partitioning, and how does it work?<\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">Simply put, partitioning a table is like splitting it up into several &#8220;sub&#8221;-tables. Each of these partitions has its own unique key for easy and fast access.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">When using partitions, you can run your queries on only a specific set of the partitions, and thus skip the many rows your query doesn&#8217;t have to iterate over to check for your conditions.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Using &#8220;time-unit column&#8221; partitioning in BigQuery works as follows:<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">You specify a column of your table with the type <\/span><i><span style=\"font-weight: 400;\">TIMESTAMP<\/span><\/i><span style=\"font-weight: 400;\">, <\/span><i><span style=\"font-weight: 400;\">DATE<\/span><\/i><span style=\"font-weight: 400;\">, or <\/span><i><span style=\"font-weight: 400;\">DATETIME<\/span><\/i><span style=\"font-weight: 400;\"> and how granular the partitions should be. Valid values here are hourly, daily, monthly, and yearly. If you choose daily, each partition will contain all rows of a specific day, so you can quickly get or filter the rows for a particular day without running through ALL the other partitions.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Assuming that you write at least one entry per day over one year (365 days) and are using daily partitioning, you will have 365 partitioned tables at the end of the year. 
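<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">To make the idea concrete, here is a minimal, self-contained sketch of how daily partitioning buckets rows by a day-granular key. This is plain Node.js with made-up rows and purely illustrates the concept; BigQuery handles the actual bucketing internally:<\/span><\/p>

```javascript
// Illustration only: bucket rows into daily "partitions" by a YYYYMMDD key.
// BigQuery does this internally; the rows here are made up for the sketch.
function partitionKey(datetime) {
  // e.g. '2021-01-04 00:00:00.000000' -> '20210104'
  return datetime.slice(0, 10).replace(/-/g, '');
}

const rows = [
  { date: '2021-01-04 00:00:00.000000', name: 'Adam', event: 4 },
  { date: '2021-01-01 00:00:00.000000', name: 'Tom', event: 2 },
  { date: '2021-01-02 00:00:00.000000', name: 'Jay', event: 1 },
  { date: '2021-01-03 00:00:00.000000', name: 'Jay', event: 3 },
];

// Group each row into the partition for its day.
const partitions = {};
for (const row of rows) {
  const key = partitionKey(row.date);
  (partitions[key] = partitions[key] || []).push(row);
}

// Four distinct days -> four partitions; a daily query touches only one of them.
console.log(Object.keys(partitions).length); // 4
```

<p><span style=\"font-weight: 400;\">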
That is because each partition reflects one day of data.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">You can see a &#8220;normal&#8221; non-partitioned table and right afterward the same table with a daily partition on the &#8220;Day&#8221; column in the following two images.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-10987 size-full\"><img decoding=\"async\" src=\"https:\/\/usercentrics.com\/wp-content\/uploads\/sites\/7\/2021\/07\/PARTITIONED_TABLE.png\" alt=\"PARTITIONED_TABLE\" class=\"wp-image-10987\"\/><figcaption class=\"wp-element-caption\">PARTITIONED_TABLE<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter wp-image-10986 size-full\"><img decoding=\"async\" src=\"https:\/\/usercentrics.com\/wp-content\/uploads\/sites\/7\/2021\/07\/NON_PARTITIONED_TABLE.png\" alt=\"NON_PARTITIONED\" class=\"wp-image-10986\"\/><figcaption class=\"wp-element-caption\">NON_PARTITIONED_TABLE<\/figcaption><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">Keep in mind that to make actual use of the partitioning, your queries will also have to reflect it. For example, it doesn&#8217;t help to have your table partitioned by a date column but then not filter by that date, which results in a full table scan again.<\/span><\/p>\n\n\n\n<p><b>How much can it improve costs\/performance?<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Now that we have gone over the basics, one of the most crucial questions is &#8220;How much can it improve costs\/performance?&#8221;. 
No answer fits all use cases, but let&#8217;s go through an example you can use as a blueprint for your own calculations.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">To get to some concrete values, let&#8217;s assume the following:<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span style=\"font-weight: 400;\">You use daily partitioning since you have to query data either for one day or for a day range.<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">You already have six years&#8217; worth of data, resulting in ~2190 days \/ partitions.<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Your data is evenly distributed across all days (somewhat unlikely, of course, but we have to make an assumption here).<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">The total size of your data is 72 TB -> 12 TB per year -> 1 TB per month -> ~33 GB per day<\/span><\/li>\n<\/ul>\n\n\n\n<p><span style=\"font-weight: 400;\">For non-partitioned tables, your queries will always have to do a full table scan, resulting in 7.2e+13 bytes (72 TB) being processed and billed on each query.<\/span><\/p>\n\n\n\n<p><b>Example #1: Running a query on one particular day<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Requesting data only for one specific date means that the query can access the needed partition directly and process just this one partition, but doesn&#8217;t have to process the other 2189.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Above, we assumed that one day\/partition contains around 3.3e+10 bytes (33 GB) of data. 
That means we would only process ~0.046% (33 GB \/ 72,000 GB or 1 \/ 2190) of the whole dataset, resulting in a <\/span><b>99.954%<\/b><span style=\"font-weight: 400;\"> reduction in data usage and thus costs!<\/span><\/p>\n\n\n\n<p><b>Example #2: Running a query for a 30-day range<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Requesting a 30-day range is analogous to the above example and can be simplified to 30 \/ 2190 =&gt; 1.37%. Even though that is far more than in example #1, it is still a reduction of <\/span><b>98.6%<\/b><span style=\"font-weight: 400;\"> in size and thus costs compared to a full table scan!<\/span><\/p>\n\n\n\n<p><b>Actual costs of the examples:<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Now let&#8217;s see what each of the above example queries would cost compared to a non-partitioned table. Google charges $5.00 per TB according to their official pricing docs, resulting in the following values.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Non-partitioned table: $5.00 * 72 TB = <\/span><b>$360.00<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Example #1: $5.00 * 0.033 TB = <\/span><b>$0.165<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Example #2: $5.00 * 0.99 TB = <\/span><b>$4.95<\/b><\/p>\n\n\n\n<figure class=\"wp-block-image alignnone wp-image-9169 size-full\"><img decoding=\"async\" src=\"https:\/\/usercentrics.com\/wp-content\/uploads\/sites\/7\/2021\/05\/costs_per_query.svg\" alt=\"Costs per Query\" class=\"wp-image-9169\"\/><figcaption class=\"wp-element-caption\">Costs per Query<\/figcaption><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">The difference is pretty stunning, right?&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Of course, partitioning will increase the performance proportionally to the cost reduction. 
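<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">The dollar figures above are simple arithmetic; here is a tiny sketch you can adapt for your own numbers (the $5.00\/TB rate and the data sizes are the ones assumed in this example):<\/span><\/p>

```javascript
// Reproduce the on-demand query cost examples above ($5.00 per TB scanned).
const pricePerTB = 5.0;   // on-demand analysis price assumed in this example
const totalTB = 72;       // full table: six years of data
const dailyTB = 0.033;    // one daily partition: ~33 GB

const fullScanCost = pricePerTB * totalTB;       // non-partitioned: full scan every time
const oneDayCost = pricePerTB * dailyTB;         // Example #1: one partition
const thirtyDayCost = pricePerTB * dailyTB * 30; // Example #2: a 30-day range

console.log(fullScanCost.toFixed(2));  // '360.00'
console.log(oneDayCost.toFixed(3));    // '0.165'
console.log(thirtyDayCost.toFixed(2)); // '4.95'
```

<p><span style=\"font-weight: 400;\">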
When the query has to evaluate 99% fewer rows, you can imagine that it needs around 99% less time to finish.<\/span><\/p>\n\n\n\n<p><b>Why you save even more money with partitioning<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Even though the above examples already show a hefty cost reduction, there is even more to gain when we consider the second cost driver in BigQuery: storage costs.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">As already mentioned in the <a href=\"https:\/\/usercentrics.com\/knowledge-hub\/deal-with-google-bigquery\/\">first article<\/a><\/span><span style=\"font-weight: 400;\">, BigQuery has the concept of active and long-term storage. While Google charges only half the price for long-term storage, there are no disadvantages coming with it. The only criterion is that there is no modification of the table data for at least 90 days. When you use partitioned tables, each partition is considered separately with regard to long-term storage pricing.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Let&#8217;s take our above assumptions regarding data size (33 GB per day) and distribution (evenly) and check the storage costs. We further assume that we added some data every day during the last 90 days.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">This assumption results in the non-partitioned table never reaching the long-term storage status since there are daily changes to the table. 
On the other hand, the partitioned table always has just 90 partitions in active storage and the remaining 2100 in long-term storage.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">For the location US (multi-region), Google currently charges $0.02 per GB for active storage and $0.01 per GB for long-term storage.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Non-partitioned table: $0.02 * 33 GB * 2190 = <\/span><b>$1445.40 per Month<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Partitioned table: $0.02 * 33 GB * 90 days + $0.01 * 33 GB * 2100 days = $59.40 + $693.00 = <\/span><b>$752.40 per Month<\/b><\/p>\n\n\n\n<figure class=\"wp-block-image alignnone wp-image-9170 size-full\"><img decoding=\"async\" src=\"https:\/\/usercentrics.com\/wp-content\/uploads\/sites\/7\/2021\/05\/costs_per_month.svg\" alt=\"Costs per Month\" class=\"wp-image-9170\"\/><figcaption class=\"wp-element-caption\">Costs per Month<\/figcaption><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">As you can see, partitioning not only improves your on-demand analysis costs significantly but also reduces your storage costs by a lot.<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-hands-on\">Hands-On<\/h2>\n\n\n<p><span style=\"font-weight: 400;\">Now let&#8217;s get hands-on with an actual implementation and analyze the query results.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Since we already handled the setup of a Node.js BigQuery project in the <a href=\"https:\/\/usercentrics.com\/knowledge-hub\/working-with-data-in-bigquery-using-node-js\/\">last article<\/a><\/span><span style=\"font-weight: 400;\">, we won&#8217;t repeat it here. 
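<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For reference, that assumed setup looks something like the following. The package and client come from the official &#8216;@google-cloud\/bigquery&#8217; library; the dataset ID is a placeholder, and credentials are expected to come from your environment:<\/span><\/p>

```javascript
// Assumed global setup for the snippets below (covered in the previous article).
// Requires the '@google-cloud/bigquery' package and GCP credentials in the environment.
const { BigQuery } = require('@google-cloud/bigquery');

const bigquery = new BigQuery();                // project/credentials resolved from the environment
const dataset = bigquery.dataset('my_dataset'); // placeholder dataset ID
```

<p><span style=\"font-weight: 400;\">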
I also expect a globally initialized variable called &#8216;bigquery&#8217; to exist in the following code snippets and that a BigQuery dataset is available.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To compare an actual \u2014 even though relatively small \u2014 non-partitioned table with a partitioned table, we first have to create both tables with the same data set.<\/span><\/p>\n<p><b>Creating new tables with data from a CSV file<\/b><\/p>\n<p><span style=\"font-weight: 400;\">For this example, I prepared a CSV file containing ~72k rows in the following format: <a href=\"https:\/\/drive.google.com\/drive\/folders\/1WyFmvUXgZIrlFfw6qRiwvB6E6BIQg5LH?usp=sharing\" target=\"_blank\" rel=\"noopener\">dump.csv<\/a><\/span><\/p>\n<p><span style=\"font-weight: 400;\">date,name,event<\/span><\/p>\n<p><span style=\"font-weight: 400;\">2021-01-04 00:00:00.000000,Adam,4<\/span><\/p>\n<p><span style=\"font-weight: 400;\">2021-01-01 00:00:00.000000,Tom,2<\/span><\/p>\n<p><span style=\"font-weight: 400;\">2021-01-02 00:00:00.000000,Jay,1<\/span><\/p>\n<p><span style=\"font-weight: 400;\">2021-01-03 00:00:00.000000,Jay,3<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8230;.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The first column &#8216;date&#8217; will be of the type <\/span><i><span style=\"font-weight: 400;\">DATETIME<\/span><\/i><span style=\"font-weight: 400;\"> and thus in the following format: <\/span><i><span style=\"font-weight: 400;\">YYYY-MM-DD HH:MM:SS[.SSSSSS]<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">The actual values are not that important; I just made sure to have four different dates in there, so BigQuery creates four partitions I can later query accordingly. 
I then named the CSV file &#8216;dump.csv&#8217; and added it beneath my script file &#8216;importCSV.js&#8217;, whose content you can see in the following code snippet.<\/span><\/p>\n<div id=\"attachment_9209\" style=\"width: 989px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9209\" class=\"size-full wp-image-9209\" src=\"https:\/\/usercentrics.com\/wp-content\/uploads\/sites\/7\/2021\/05\/Code.png\" alt=\"Code - import csv\" width=\"979\" height=\"1187\" \/><p id=\"caption-attachment-9209\" class=\"wp-caption-text\">Code &#8211; import csv<\/p><\/div>\n<p><span style=\"font-weight: 400;\">The &#8216;importCSV()&#8217; method in the above snippet is similar to what you should know from the previous <a href=\"https:\/\/usercentrics.com\/knowledge-hub\/working-with-data-in-bigquery-using-node-js\/\">article<\/a><\/span><span style=\"font-weight: 400;\">. The difference is that instead of calling &#8216;createTable(&#8230;)&#8217;, we reference the table via &#8216;table(&#8230;)&#8217; and then load initial data, together with a schema definition, into it via the &#8216;load(&#8230;)&#8217; method.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The &#8220;partition-specific&#8221; part of the &#8216;importCSV&#8217; method is only the &#8216;timePartitioning&#8217; variable passed into the function.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">When you check the object the script passes in for this parameter, it is pretty straightforward. We define the type as &#8216;DAY&#8217;, telling BigQuery that it should create daily partitions. Also, via the &#8216;field&#8217; property, we tell BigQuery to use the column called &#8216;date&#8217; as the partition column. 
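<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Since the snippet above is only shown as an image, here is a sketch of what such an &#8216;importCSV()&#8217; can look like with the Node.js client. The dataset and table IDs are placeholders; the &#8216;load(&#8230;)&#8217; call and the &#8216;timePartitioning&#8217; option follow the client library&#8217;s API, but treat this as an approximation of the original script rather than a verbatim copy:<\/span><\/p>

```javascript
// Sketch: load dump.csv into a daily-partitioned table.
// 'bigquery' is the globally initialized client; IDs are placeholders.
const timePartitioning = {
  type: 'DAY',   // create one partition per day
  field: 'date', // partition on the 'date' column
};

const loadMetadata = {
  sourceFormat: 'CSV',
  skipLeadingRows: 1, // skip the 'date,name,event' header row
  schema: {
    fields: [
      { name: 'date', type: 'DATETIME' },
      { name: 'name', type: 'STRING' },
      { name: 'event', type: 'INTEGER' },
    ],
  },
  timePartitioning, // the partition-specific part
};

async function importCSV(bigquery, datasetId, tableId, csvPath) {
  // Reference the (new) table and load the CSV plus schema into it.
  const [job] = await bigquery.dataset(datasetId).table(tableId).load(csvPath, loadMetadata);
  return job;
}

// Usage (needs credentials):
// importCSV(bigquery, 'my_dataset', 'partitioned_table', './dump.csv');
```

<p><span style=\"font-weight: 400;\">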
You can find the official docs for the &#8216;TimePartitioning&#8217; object <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/reference\/rest\/v2\/tables#timepartitioning\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">here<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">After running the above script and creating the two tables, we can now run and compare our queries with the following snippet.<\/span><\/p>\n<div id=\"attachment_9211\" style=\"width: 987px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9211\" class=\"size-full wp-image-9211\" src=\"https:\/\/usercentrics.com\/wp-content\/uploads\/sites\/7\/2021\/05\/Query-statistics.png\" alt=\"Query statistics\" width=\"977\" height=\"1689\" \/><p id=\"caption-attachment-9211\" class=\"wp-caption-text\">Query statistics<\/p><\/div>\n<p><span style=\"font-weight: 400;\">The script is kept very simple. In &#8216;runQueryWithStatistics()&#8217;, we run the provided query and afterward pass the finished job to a method that prints the job statistics to the console. The critical part here is that we set &#8216;useQueryCache: false&#8217; so BigQuery doesn&#8217;t use any caching in the background. That is important as we can&#8217;t compare metrics when results are served from the cache. Of course, you should always keep this flag on &#8216;true&#8217; \u2014 the default \u2014 when running in production!<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The &#8216;printJobStatistics()&#8217; method is even simpler. It more or less just accesses the relevant properties and prints them to the console. We will discuss the individual metrics a bit later.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Let&#8217;s move on to the &#8216;main()&#8217; method. Here, the variables and a helper method for creating our query are defined. 
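<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As the script is shown only as an image, here is an approximate sketch of the helpers described above, using the client&#8217;s &#8216;createQueryJob()&#8217; API. The query text and dataset name are placeholders of my own; the statistics fields correspond to what the BigQuery job metadata exposes:<\/span><\/p>

```javascript
// Sketch of the helpers described above; 'my_dataset' and the query are placeholders.
function createQuery(tableId) {
  // Same query for both runs; only the referenced table differs.
  return `SELECT name, event FROM my_dataset.${tableId} WHERE date = DATETIME '2021-01-01 00:00:00'`;
}

async function runQueryWithStatistics(bigquery, query) {
  const [job] = await bigquery.createQueryJob({
    query,
    useQueryCache: false, // disable caching so metrics are comparable; keep true in production!
  });
  await job.getQueryResults(); // wait for the job to finish
  const [metadata] = await job.getMetadata();
  return metadata.statistics;
}

function printJobStatistics(statistics) {
  // A few of the relevant properties discussed below.
  console.log('cache hit:', statistics.query.cacheHit);
  console.log('bytes processed:', statistics.query.totalBytesProcessed);
  console.log('bytes billed:', statistics.query.totalBytesBilled);
}
```

<p><span style=\"font-weight: 400;\">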
The &#8216;createQuery()&#8217; helper only takes the &#8216;tableId&#8217; as input to generate the same query for the non-partitioned and the partitioned table, differing only in the referenced table.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Afterward, we run the &#8216;runQueryWithStatistics()&#8217; method, once on the &#8216;non_partitioned_table&#8217; table and once on the &#8216;partitioned_table&#8217; table, and log the results.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You can see the output of the above script in the following snippet.<\/span><\/p>\n<div id=\"attachment_9215\" style=\"width: 1950px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-9215\" class=\"size-full wp-image-9215\" src=\"https:\/\/usercentrics.com\/wp-content\/uploads\/sites\/7\/2021\/05\/QUERY-STATISTICS-RESULT.png\" alt=\"QUERY STATISTICS RESULT\" width=\"1940\" height=\"1146\" \/><p id=\"caption-attachment-9215\" class=\"wp-caption-text\">QUERY STATISTICS RESULT<\/p><\/div>\n<p><b>Understanding the statistics<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Now that we have the job statistics, we have to understand and interpret them correctly to verify our assumptions.<\/span><\/p>\n<p><b>Cache hit<\/b><span style=\"font-weight: 400;\"> \u2014 This value only tells us that the cache wasn&#8217;t hit, and we can be sure that we didn&#8217;t receive a cached result that would tamper with our metrics.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Since we explicitly set the cache to be disabled, this should always be &#8216;false&#8217;. If it appears to be &#8216;true&#8217;, something is wrong.<\/span><\/p>\n<p><b>Time taken<\/b><span style=\"font-weight: 400;\"> \u2014 Defines the time it took to run the query from its start time to its end time.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Here we can see that the partitioned table is a bit faster (~19%). 
Based on my experience, I can say that this isn&#8217;t meaningful in our example since we are still in the millisecond range and have only around 2 MB of data. Here, side effects make up too large a share of the time taken compared to the actual query.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This metric becomes significant if you reach several seconds or minutes of query processing time.<\/span><\/p>\n<p><b>Partitions processed<\/b><span style=\"font-weight: 400;\"> \u2014 Indicates how many partitions the query read. In the case of non-partitioned tables, this is always 0.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In our partitioned table&#8217;s output, we can see that it only read one partition since only one specific date was being queried and not a date range.<\/span><\/p>\n<p><b>Rows read<\/b><span style=\"font-weight: 400;\"> \u2014 How many rows the query had to read to finish.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Here we can already see that the two variants strongly differ. We have ~73k rows in the non-partitioned table case and only ~12k rows in the partitioned one. The rows read are already reduced by ~83.6%, which is reflected in the following metric, &#8220;bytes processed&#8221;.<\/span><\/p>\n<p><b>Bytes processed<\/b><span style=\"font-weight: 400;\"> \u2014 The number of bytes the query had to process.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Similar to &#8220;rows read&#8221;, we can see a clear reduction in size here. While the non-partitioned query has to scan the full ~1.6 MB of data, the partitioned one only had to process ~0.26 MB, resulting in a reduction of ~83.4%.<\/span><\/p>\n<p><b>Bytes billed<\/b><span style=\"font-weight: 400;\"> \u2014 The number of bytes Google is charging you for this query.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Normally, one would now expect the &#8220;bytes billed&#8221; to also differ by around 83%. 
But the alert reader will notice that the values are the same.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This first unexpected behavior has a relatively simple origin. When checking the <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/pricing#on-demand-pricing-details\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">pricing details page of BigQuery<\/span><\/a><span style=\"font-weight: 400;\">, you will find the following quote:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8220;&#8230; with a minimum 10 MB data processed per query.&#8221;<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In our case, where we only process 1.6 MB and 0.26 MB in our two queries, Google rounds up to the minimum of 10 MB of bytes billed, which is reflected in our statistics.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">With actual data volumes of several GB, TB, or even PB, the &#8220;bytes billed&#8221; will differ significantly between the two variants and behave proportionally to the &#8220;bytes processed&#8221;.<\/span><\/p>\n\n\n<h2 class=\"wp-block-heading\" id=\"h-partitioning-quotas-and-common-pitfalls-to-be-aware-of\">Partitioning quotas and common pitfalls to be aware of<\/h2>\n\n\n<p><b>The maximum number of partitions <\/b><span style=\"font-weight: 400;\">\u2014<\/span> <span style=\"font-weight: 400;\">When partitioning BigQuery tables, there is a hard limit of 4,000 partitions per partitioned table.<\/span><\/p>\n<p><span style=\"text-decoration: underline;\"><span style=\"font-weight: 400;\">Example:\u00a0<\/span><\/span><\/p>\n<p><span style=\"font-weight: 400;\">When you are using daily partitioning on your table, you can cover at most 4,000 days (10.96 years) of data. 
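<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The limit translates into concrete time spans depending on the chosen granularity; a quick back-of-the-envelope calculation (the average month length used here is my own assumption):<\/span><\/p>

```javascript
// How long until a table hits the 4,000-partition limit, per granularity.
const maxPartitions = 4000;

const dailyYears = maxPartitions / 365;  // daily partitioning
const hourlyDays = maxPartitions / 24;   // hourly partitioning
const hourlyMonths = hourlyDays / 30.44; // assuming ~30.44 days per average month

console.log(dailyYears.toFixed(2));   // '10.96' (years)
console.log(hourlyMonths.toFixed(1)); // '5.5' (months)
```

<p><span style=\"font-weight: 400;\">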
Writing data for more days into the table will result in errors or rejections on BigQuery&#8217;s side.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">While it&#8217;s unlikely that you run into this quickly with daily partitioning, with hourly partitioning you will hit the limit after approximately 5.47 months.<\/span><\/p>\n<p><b>Streaming data into partitioned tables <\/b><span style=\"font-weight: 400;\">\u2014 When streaming data into partitioned tables, one could expect that BigQuery adds the data immediately to the correct partition. <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/streaming-data-into-bigquery#streaming_into_partitioned_tables\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Unfortunately, that is not true.<\/span><\/a><span style=\"font-weight: 400;\"> BigQuery will first write streaming data to a temporary partition called &#8216;<\/span><i><span style=\"font-weight: 400;\">__UNPARTITIONED__<\/span><\/i><span style=\"font-weight: 400;\">&#8217; while the data is in the streaming buffer. Currently, there is no way to flush this buffer manually. Thus, you have to wait until BigQuery automatically flushes this buffer after a certain amount of unpartitioned data has accumulated or after a certain amount of time has passed. 
Google doesn&#8217;t provide any exact numbers regarding the mentioned amounts.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">However, when running the BigQuery command:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">&#8216;<\/span><i><span style=\"font-weight: 400;\">bq show --format=prettyjson YOUR_GCP_PROJECT:DATASET_ID.TABLE_ID<\/span><\/i><span style=\"font-weight: 400;\">&#8217;<\/span><\/p>\n<p><span style=\"font-weight: 400;\">you can check the properties &#8216;estimatedBytes&#8217; and &#8216;estimatedRows&#8217; on the &#8216;streamingBuffer&#8217; object to see how many bytes and rows are currently in the streaming buffer.<\/span><\/p>\n<p><b>Require partition filter <\/b><span style=\"font-weight: 400;\">\u2014<\/span> <span style=\"font-weight: 400;\">When working with partitioned tables, it is always good practice to enable the &#8216;requirePartitionFilter&#8217; flag for the partitioned table when creating it. With this flag in place, BigQuery will always demand a predicate filter in any SQL query that tries to access the table. This helps prevent queries that would accidentally trigger a full table scan instead of accessing a particular partition or set of partitions, which would result in high costs.<\/span><\/p>\n<p><b>Don&#8217;t &#8216;OR&#8217; your partition filter <\/b><span style=\"font-weight: 400;\">\u2014 One might think it&#8217;s enough to have the partition column somewhere in the SQL statement&#8217;s filter condition. 
However, combining the partition filter with other conditions via &#8216;OR&#8217; will still trigger a full table scan, because rows can match without the partition filter applying.<\/span><\/p>\n<p><span style=\"text-decoration: underline;\"><span style=\"font-weight: 400;\">Example:<\/span><\/span><\/p>\n<p><span style=\"font-weight: 400;\">Will trigger a full table scan:\u00a0<\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">WHERE (partition_column = &#8220;2021-01-01&#8221; OR f = &#8220;a&#8221;)<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">Won&#8217;t trigger a full table scan:<\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">WHERE (partition_column = &#8220;2021-01-01&#8221; AND f = &#8220;a&#8221;)<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">As a best practice, I always advise isolating the partition filter condition and then &#8216;AND&#8217;-ing it with the actual filters you want to apply to the rows. That also helps prevent wrong\/expensive queries.<\/span><\/p>\n<p><span style=\"text-decoration: underline;\"><span style=\"font-weight: 400;\">Example:<\/span><\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">WHERE partition_column = &#8220;2021-01-01&#8221; AND ( ALL_THE_OTHER_CONDITIONS )<\/span><\/i><\/p>\n\n<p><strong>What&#8217;s next?<\/strong><\/p>\n<p><span style=\"font-weight: 400;\">In the following articles, we will learn how to create and work with clustered tables in BigQuery to improve performance even further.<\/span><\/p>\n\n\n<h2 class=\"wp-block-heading\" id=\"h-author\">Author<\/h2>\n\n\n<p><strong>Pascal Zwikirsch<\/strong><\/p>\n<p>Pascal Zwikirsch is a Technical Team Lead at Usercentrics. 
He manages a cross-functional team of several frontend and backend developers while providing a seamless connection between the technical world and the product managers and stakeholders.<\/p>\n<p>Pascal was a frontend developer for several years and now specializes in backend services, cloud architectures, and DevOps principles.\u00a0He experienced both worlds in detail.\u00a0This experience and the collaboration with product managers made him capable of constructing full-fledged, scalable software.<\/p>\n<p>Besides his job as a Team Lead of the B2B developer team, he likes to share his in-depth knowledge and insights in articles he creates as an independent technical writer.<\/p>\n<p><strong>For more content, follow me on\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/pascal-zwikirsch-3a95a1177\/\" target=\"_blank\" rel=\"noopener\">LinkedIn<\/a><\/strong><\/p>","protected":false},"excerpt":{"rendered":"<p>Especially when working with Big Data, costs can quickly explode, and performance can degrade fast when data starts to pile up. BigQuery offers the possibility to create partitioned tables that helps to prevent exploding costs and decreasing performance. Of course, the use case has to fit the idea behind partitioning even though most Big Data [&hellip;]<\/p>\n","protected":false},"featured_media":2080,"template":"","meta":{"_acf_changed":false,"editor_notices":[],"footnotes":""},"tags":[],"magazine_issue":[],"magazine_tag":[],"resource_tag":[14],"class_list":["post-320","knowledge","type-knowledge","status-publish","has-post-thumbnail","hentry","resource_tag-privacy"],"acf":[],"yoast_head":"<title>Optimizing your BigQuery tables using time-unit partitioning<\/title>\n<meta name=\"description\" content=\"BigQuery offers the possibility to create partitioned tables that helps to prevent exploding costs and decreasing performance. 
Read more!\" \/>\n<meta name=\"robots\" content=\"noindex, follow\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Optimizing your BigQuery tables using time-unit partitioning\" \/>\n<meta property=\"og:description\" content=\"BigQuery offers the possibility to create partitioned tables that helps to prevent exploding costs and decreasing performance. Read more!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/usercentrics.com\/us\/knowledge-hub\/optimizing-your-bigquery-tables-using-partitioning-time-unit-column-partitioned-tables\/\" \/>\n<meta property=\"og:site_name\" content=\"Usercentrics - US\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/usercentrics\" \/>\n<meta property=\"article:modified_time\" content=\"2025-02-04T15:18:48+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/usercentrics.com\/wp-content\/uploads\/2021\/03\/Google-BigQuery.png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Optimizing your BigQuery tables using time-unit partitioning\" \/>\n<meta name=\"twitter:description\" content=\"BigQuery offers the possibility to create partitioned tables that helps to prevent exploding costs and decreasing performance. Read more!\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/usercentrics.com\/wp-content\/uploads\/2021\/03\/Google-BigQuery.png\" \/>\n<meta name=\"twitter:site\" content=\"@usercentrics\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"14 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/knowledge-hub\\\/optimizing-your-bigquery-tables-using-partitioning-time-unit-column-partitioned-tables\\\/\",\"url\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/knowledge-hub\\\/optimizing-your-bigquery-tables-using-partitioning-time-unit-column-partitioned-tables\\\/\",\"name\":\"Optimizing your BigQuery tables using time-unit partitioning\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/knowledge-hub\\\/optimizing-your-bigquery-tables-using-partitioning-time-unit-column-partitioned-tables\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/knowledge-hub\\\/optimizing-your-bigquery-tables-using-partitioning-time-unit-column-partitioned-tables\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/wp-content\\\/uploads\\\/sites\\\/7\\\/2021\\\/03\\\/Google-BigQuery.png\",\"datePublished\":\"2021-05-25T08:41:29+00:00\",\"dateModified\":\"2025-02-04T15:18:48+00:00\",\"description\":\"BigQuery offers the possibility to create partitioned tables that helps to prevent exploding costs and decreasing performance. 
Read more!\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/knowledge-hub\\\/optimizing-your-bigquery-tables-using-partitioning-time-unit-column-partitioned-tables\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/knowledge-hub\\\/optimizing-your-bigquery-tables-using-partitioning-time-unit-column-partitioned-tables\\\/\"}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/knowledge-hub\\\/optimizing-your-bigquery-tables-using-partitioning-time-unit-column-partitioned-tables\\\/#primaryimage\",\"url\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/wp-content\\\/uploads\\\/sites\\\/7\\\/2021\\\/03\\\/Google-BigQuery.png\",\"contentUrl\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/wp-content\\\/uploads\\\/sites\\\/7\\\/2021\\\/03\\\/Google-BigQuery.png\",\"copyrightNotice\":\"\u00a9 Copyright 2026 Usercentrics GmbH\",\"creator\":{\"@type\":\"Organization\",\"name\":\"Usercentrics GmbH\"},\"creditText\":\"Image: Usercentrics GmbH\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/knowledge-hub\\\/optimizing-your-bigquery-tables-using-partitioning-time-unit-column-partitioned-tables\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Resources\",\"item\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/resources\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Blog\",\"item\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/knowledge-hub\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Optimizing your BigQuery tables using partitioning: Time-unit column-partitioned 
tables\",\"item\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/knowledge-hub\\\/optimizing-your-bigquery-tables-using-partitioning-time-unit-column-partitioned-tables\\\/\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/#website\",\"url\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/\",\"name\":\"Usercentrics - US\",\"description\":\"Consent Management Platform (CMP) Usercentrics\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/usercentrics.com\\\/us\\\/?s={search_term_string}\"}}],\"inLanguage\":\"en-US\"}]}<\/script>","_links":{"self":[{"href":"https:\/\/usercentrics.com\/us\/wp-json\/wp\/v2\/knowledge\/320","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/usercentrics.com\/us\/wp-json\/wp\/v2\/knowledge"}],"about":[{"href":"https:\/\/usercentrics.com\/us\/wp-json\/wp\/v2\/types\/knowledge"}],"version-history":[{"count":0,"href":"https:\/\/usercentrics.com\/us\/wp-json\/wp\/v2\/knowledge\/320\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/usercentrics.com\/us\/wp-json\/wp\/v2\/media\/2080"}],"wp:attachment":[{"href":"https:\/\/usercentrics.com\/us\/wp-json\/wp\/v2\/media?parent=320"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/usercentrics.com\/us\/wp-json\/wp\/v2\/tags?post=320"},{"taxonomy":"magazine_issue","embeddable":true,"href":"https:\/\/usercentrics.com\/us\/wp-json\/wp\/v2\/magazine_issue?post=320"},{"taxonomy":"magazine_tag","embeddable":true,"href":"https:\/\/usercentrics.com\/us\/wp-json\/wp\/v2\/magazine_tag?post=320"},{"taxonomy":"resource_tag","embeddable":true,"href":"https:\/\/usercentrics.com\/us\/wp-json\/wp\/v2\/resource_tag?post=320"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}