How to filter windows functions in Snowflake

One of the key features of Snowflake is the ability to use window functions to perform complex calculations on a set of data

However, sometimes you may need to filter the results of a window function to only include specific rows or groups of rows.

For this job we can do it in two ways, One is by using a CTE and the other is by using Qualify.

Dataset

For working in these two methods to filter windows function we are going to filter duplicates from a table by using row number windows function.

First we are going to create a demo table and insert the elements into that table.

-- Create table 
CREATE OR REPLACE TABLE SANDBOX.TEST.USERS(
    NAME VARCHAR(50) NULL,
    LAST_NAME VARCHAR(50) NULL,
    USER_NAME VARCHAR(50) NULL,
    CREATED_AT TIMESTAMP NULL 
);
-- Inserting data into table, this values would serve for control
INSERT INTO SANDBOX.TEST.USERS(NAME,LAST_NAME,USER_NAME,CREATED_AT)
values('Serena','Williams','swilliams',CURRENT_TIMESTAMP)
     ,('Stu','Johnson','sjohnson',CURRENT_TIMESTAMP);
-- Running the table two times
INSERT INTO SANDBOX.TEST.USERS(NAME,LAST_NAME,USER_NAME,CREATED_AT)
values('Alvaro','Parra','aparra',CURRENT_TIMESTAMP)
     ,('John','Doe','jdoe',CURRENT_TIMESTAMP)
     ,('Jack','Doe','jadoe',CURRENT_TIMESTAMP);
INSERT INTO SANDBOX.TEST.USERS(NAME,LAST_NAME,USER_NAME,CREATED_AT)
values('Alvaro','Parra','aparra',CURRENT_TIMESTAMP)
     ,('John','Doe','jdoe',CURRENT_TIMESTAMP)
     ,('Jack','Doe','jadoe',CURRENT_TIMESTAMP);

Method 1 CTE

We can filter a windows expresion by using a CTE, on the first part of this CTE we are going to define the function that is going to give a row number to the records allowing us to identify the duplicates that we have.


On the second part we are going to filter the duplicates (based on NAME, LAST_NAME, USER_NAME) that we have, keeping only the first record of each kind.

WITH DUP AS(
SELECT *
      ,ROW_NUMBER() OVER(PARTITION BY NAME, LAST_NAME, USER_NAME ORDER BY CREATED_AT DESC) AS RN
FROM SANDBOX.TEST.USERS
)
SELECT *
FROM DUP
WHERE RN = 1;

By running this query we are getting rid of these duplicates.

Method 2 Using qualify

On the same way we used a CTE, we can use qualify clause to filter.

Similar to using Having for a Group by clause, qualify let us filtering windows functions at the end of the query without the need of creating a subquery or a CTE

Select Statements are evaluated on the following way

  1. From
  2. Where
  3. Group by
  4. Having
  5. Window
  6. QUALIFY
  7. Distinct
  8. Order by
  9. Limit

As you can see on the evaluation order, you will have access to windows functions for filtering the values.

SELECT *
      ,ROW_NUMBER() OVER(PARTITION BY NAME, LAST_NAME, USER_NAME ORDER BY CREATED_AT DESC) AS RN
FROM SANDBOX.TEST.USERS
QUALIFY RN = 1

You will have access to the RN field that we created or also you could just filter the row number directly instead of adding the new column.

SELECT *
FROM SANDBOX.TEST.USERS
QUALIFY ROW_NUMBER() OVER(PARTITION BY NAME, LAST_NAME, USER_NAME ORDER BY CREATED_AT DESC) = 1

On both cases we are going to obtain the same results than a CTE query.

Conclusion

Try to use Qualify instead of using a CTE for filtering a windows function, it makes the script less complicated to understand and gives us the ability to filter based on different windows functions that we might have at that time.

By doing this, we are going to obtain the same results than using a CTE.


Leave a Reply

Your email address will not be published. Required fields are marked *