Redshift Date Functions Demystified: DATEDIFF and DATEADD
What are the DATEDIFF and DATEADD functions in Redshift, and how do you use them? All you need to know, with examples.
Redshift provides a rich set of functions to work with dates and times. Among these, the Redshift DATEDIFF and DATEADD are two essential tools for manipulating and analyzing dates. In this article, we'll dive into the inner workings of these functions and provide examples to help you master them and become a true time-lord!
What is Amazon Redshift?
Amazon Redshift is a widely used cloud-based data warehouse service that can scale to handle the data of any organization, large or small. With petabyte-scale storage and fast querying, it is a flexible and performant solution, and its integrations with the rest of AWS mean it slots neatly into any AWS cloud platform.
Because Redshift is so popular, demand for Redshift expertise is growing, and being able to set yourself apart from the pack is key to securing new roles in the industry and providing the most value within your team. A typical use case for Redshift is time-based analytics: analysts and engineers often want to slice data over a set period of time to gain valuable point-in-time insights.
How to Use the Amazon Redshift DATEDIFF Function
We’ve all been there: “So my flight is on the 21st September and it's currently the 10th June…how many days do I need to survive before I go on holiday?...30 days have September, April, June, and November…”
Working with dates as a human is typically quite annoying. Pope Gregory XIII clearly did not have date-based analytics in mind when he introduced the Gregorian calendar in October 1582. Luckily for us, Amazon Redshift provides handy functions out of the box.
The Redshift DATEDIFF function computes the difference between two dates or timestamps. It is especially useful for calculating intervals, such as the number of days between two dates or the hours between two timestamps. The syntax is as follows:
DATEDIFF(part, start, end)
- part: The date or time part in which to measure the difference (e.g., day, month, year, hour, minute, second), given as a string.
- start: The start date or timestamp, either a column or a literal string
- end: The end date or timestamp, either a column or a literal string
Example:
To calculate the number of days between '2023-09-21' and '2023-06-10' to see how long until your holiday:
SELECT DATEDIFF('day', '2023-06-10', '2023-09-21');
This query returns 103, indicating that there are 103 days until your holiday.
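One behaviour worth knowing before relying on the result: per the Redshift documentation, DATEDIFF counts the number of date part boundaries crossed between the two values, not the number of complete units elapsed, and it is negative when the second value is earlier than the first. A minimal sketch using illustrative literal dates:

```sql
-- Only one day apart, but a year boundary is crossed, so this returns 1.
SELECT DATEDIFF('year', '2022-12-31', '2023-01-01');

-- Almost a full year apart, but within the same calendar year, so this returns 0.
SELECT DATEDIFF('year', '2023-01-01', '2023-12-31');

-- Swapping the arguments of the holiday example flips the sign: this returns -103.
SELECT DATEDIFF('day', '2023-09-21', '2023-06-10');
```

If you need complete elapsed units, such as a customer's age in whole years, it is usually safer to measure in a finer part (days or months) and convert afterwards.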
As mentioned, DATEDIFF works with timestamps as well as dates, and literal values can be written in a number of different formats. It is best practice to stick to one consistent format for reproducibility. The formats which Redshift accepts are as follows:
- YYYY-MM-DD (e.g., 2023-03-22)
- YYYYMMDD (e.g., 20230322)
- DD-Mon-YY (e.g., 22-Mar-23)
- DD-Mon-YYYY (e.g., 22-Mar-2023)
- MM-DD-YYYY (e.g., 03-22-2023)
For timestamps, you can include the time portion as well:
- YYYY-MM-DD HH24:MI:SS (e.g., 2023-03-22 14:30:00)
- YYYY-MM-DD HH24:MI:SS.US (e.g., 2023-03-22 14:30:00.123456)
- DD-Mon-YYYY HH24:MI:SS (e.g., 22-Mar-2023 14:30:00)
So if my train is at 12:43 and the time now is 10:22 I could use DATEDIFF to find how long I have to get to the station:
SELECT DATEDIFF('minutes', '2023-06-10 10:22:00', '2023-06-10 12:43:00');
Giving a result of 141 minutes… okay I need to run; I will finish this article later…
In a typical Redshift environment, of course, we will be running this on columns in a table to gain some analytical insight into our data. A good example is adding a column called ‘days_as_customer’, calculated using DATEDIFF, which could then be used to compare the behaviour of loyal customers with that of new customers. A query for this might look like this:
SELECT
customer_id,
first_purchase_date,
latest_purchase_date,
DATEDIFF('day', first_purchase_date, latest_purchase_date) AS days_as_customer
FROM
customers;
Here the DATEDIFF function does the heavy lifting and lets us know how long each customer has been with us. We can then use the output of this query to do deeper analysis.
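As one possible next step, the tenure value can be used to segment customers. This is a minimal sketch that assumes the same customers table and columns as the query above; the 365-day cut-off and the segment labels are arbitrary choices for illustration, not a recommendation.

```sql
-- Assumes customers(customer_id, first_purchase_date, latest_purchase_date) as above.
-- The 365-day threshold and the 'loyal' / 'new' labels are hypothetical.
SELECT
    CASE
        WHEN DATEDIFF('day', first_purchase_date, latest_purchase_date) >= 365
            THEN 'loyal'
        ELSE 'new'
    END AS customer_segment,
    COUNT(*) AS customers
FROM
    customers
GROUP BY
    1;
```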
How to Use the Amazon Redshift DATEADD Function
The Amazon Redshift DATEADD function, again, pretty much does what it says on the tin. It lets you add (and, perhaps surprisingly, subtract, so there is no need for a DATESUBTRACT!) a specified amount of time to a given date or timestamp to give a new timestamp. This function is valuable for generating new dates or timestamps based on existing ones, such as finding the date 30 days from now.
The syntax for using this function is as follows:
DATEADD(part, number, date)
- part: The date or time part to add or subtract (e.g., day, month, year, hour, minute, second), given as a string.
- number: The whole number of units to add or subtract (use a negative number to subtract)
- date: The original date or timestamp
- return type: for date and timestamp inputs, DATEADD returns a timestamp. If you need a plain date back, cast the result, for example DATEADD(part, number, date)::date.
Example:
A simple example of the Redshift DATEADD function is as follows:
SELECT DATEADD('day', 30, '2023-01-01');
This query returns 2023-01-31 00:00:00, which is 30 days after the original date.
As before with DATEDIFF, we can work with all the mentioned formats of timestamps and dates. Note that DATEADD returns a timestamp even when you give it a date, which is why the result above carries a 00:00:00 time portion. If you only want the date, cast the result with ::date.
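Two details from the description above are easy to sketch: subtracting with a negative number, and casting the result back to a date. The dates are illustrative, and the month-end behaviour in the last query is taken from the AWS documentation for DATEADD rather than from this article.

```sql
-- A negative number subtracts: this lands on 24 January 2023.
SELECT DATEADD('day', -7, '2023-01-31');

-- Casting the result to DATE drops the time portion.
SELECT DATEADD('day', 30, '2023-01-01')::date;

-- Month arithmetic keeps the day of the month where it can:
-- per the AWS docs, April 30 plus one month is May 30, not May 31.
SELECT DATEADD('month', 1, '2023-04-30')::date;
```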
Again, when working with DATEADD in Redshift in production, you are unlikely to be running queries like this on literal values. Typical use cases for DATEADD include:
- Calculating expiration dates - no more mouldy vegetables
- Shifting time zones - important for international businesses.
- Generating date ranges - for monthly analytics for example.
An example of adding some expiry dates as a column could look as follows:
SELECT purchase_date, DATEADD('month', 12, purchase_date) AS expiration_date FROM purchases;
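The date-range use case follows the same pattern. This minimal sketch assumes the purchases table from the query above and uses Redshift's CURRENT_DATE; the 30-day window is an arbitrary choice for illustration.

```sql
-- Assumes purchases(purchase_date) as in the expiry example.
-- Keeps only purchases made within the 30 days before today.
SELECT
    purchase_date
FROM
    purchases
WHERE
    purchase_date >= DATEADD('day', -30, CURRENT_DATE);
```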
Leveling Up: Become a Time Lord!
In isolation, the Redshift DATEDIFF and DATEADD functions are already very powerful but we can enhance their usage with some extra functions or by combining them to level up our data-based analytics.
TO_DATE / TO_TIMESTAMP
If we have data in a format that isn’t ready for DATEDIFF / DATEADD, we can use TO_DATE / TO_TIMESTAMP to convert any nasty strings into proper dates and timestamps. Example usage is as follows:
SELECT TO_DATE('March 22, 2023', 'Month DD, YYYY');
SELECT TO_TIMESTAMP('22-March-2023 14:30:00', 'DD-Month-YYYY HH24:MI:SS');
In both examples, we pass in the string to convert along with a format pattern describing how that input string is laid out (not the output format), then Redshift does all the clever parsing in the background!
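Once a string has been parsed, it can be fed straight into DATEDIFF. The sketch below redoes the holiday calculation from earlier with dates written in a non-ISO style; the 'DD Mon YYYY' pattern is chosen here for illustration and must match however your own strings are laid out.

```sql
-- Parse two 'DD Mon YYYY' strings into dates, then count the days between them.
-- These are the same two dates as the earlier holiday example (10 June to 21 September 2023).
SELECT DATEDIFF(
    'day',
    TO_DATE('10 Jun 2023', 'DD Mon YYYY'),
    TO_DATE('21 Sep 2023', 'DD Mon YYYY')
);
```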
DATE_TRUNC
DATE_TRUNC is another powerful function for dealing with messy timestamp data. Say we have a timestamp column at second granularity, but we only want to know which day each event occurred on so we can do easy day-to-day analytics. DATE_TRUNC is perfect for that, as it truncates a timestamp to the hour, day, or whatever granularity we want! It also pairs really well with Redshift DATEDIFF. Some example uses are shown below:
Truncate timestamp to hour:
SELECT DATE_TRUNC('hour', '2023-03-22 12:30:00'::timestamp);
Result: 2023-03-22 12:00:00
Truncate timestamp to day:
SELECT DATE_TRUNC('day', '2023-03-22 12:30:00'::timestamp);
Result: 2023-03-22 00:00:00
Truncate timestamp to month:
SELECT DATE_TRUNC('month', '2023-03-22 12:30:00'::timestamp);
Result: 2023-03-01 00:00:00
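On a real table, truncation is most often used to group rows into time buckets. This is a minimal sketch of that day-to-day analytics idea; the events table and its event_ts column are hypothetical names introduced only for this example.

```sql
-- Hypothetical table: events(event_ts TIMESTAMP).
-- Counts how many events fall on each calendar day.
SELECT
    DATE_TRUNC('day', event_ts) AS event_day,
    COUNT(*) AS events
FROM
    events
GROUP BY
    1
ORDER BY
    1;
```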
Combining DATEDIFF and DATEADD
The final step to becoming a master of these functions is to use both DATEDIFF and DATEADD in unison to perform more complex date functions in Redshift. For instance, you might want to find the average number of days between a set of dates and add that average to another date.
Example:
Assuming you have a table named 'orders' with columns 'order_date' and 'ship_date', you can calculate the average shipping delay and add that delay to a new order date.
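Firstly, calculate the average shipping delay:
SELECT AVG(DATEDIFF('day', order_date, ship_date)) AS avg_delay FROM orders;
Then add the average delay to a new order date. A column alias such as avg_delay does not carry over from one query to the next, so the average is computed again inside the same query and rounded to the whole number of days that DATEADD expects:
SELECT DATEADD('day', ROUND(AVG(DATEDIFF('day', order_date, ship_date)::float))::int, '2023-03-22') AS estimated_ship_date FROM orders;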
Congratulations!
Congratulations on mastering the Amazon Redshift DATEDIFF and DATEADD functions. You are now a time-lord! These functions are crucial for efficient and accurate date manipulation in your data warehouse, and with this guide you should now have a solid understanding of how to use them for your specific applications. Embrace the power of DATEDIFF and DATEADD to transform and analyze your date data with ease.
—
Are you looking to integrate Amazon Redshift with your systems? Estuary Flow provides out-of-the-box integrations with a no-code approach that reduces the time and effort involved. You can also benefit from advanced features like job scheduling, monitoring, and error handling. Why don't you try it out today?