Previously, we have already calculated the serial numbers of real purchases using the automatically updated parameter — paymentcount.
But what if we need to know:
-
What was the very first in-game item purchased by users for virtual currency?
-
Or what was the user's first event after they entered the game?
-
And what was the last one before they churned?
Now we will look at several options for queries that allow us to perform such tasks. Let's take the first in-game purchase as an example.
To get sarted, log in to your devtodev account and find the SQL Report in our demo.
row_number()
The first way is to use the row_number() window function, which assigns sequential numbers to rows, according to the conditions specified in the over() statement. Let’s select the user, the time of the purchase, and the name of the purchased item from the table with the purchases event. We’ll also set the purchase order number for each of the users, and calculate the amount of coins spent for the current purchase.
select devtodevid, eventtime, item
, row_number() over(partition by devtodevid order by eventtime) row_number
, sum("_Coins") over(partition by devtodevid order by eventtime) cum_coins
from p102968.purchases
where created >current_date-interval '30 day'
order by 1,2
The first user made only 2 in-game purchases for a total of 910 coins. The next user made seven purchases for the amount of 6900 coins, and only one purchase was made by the third user. All purchases are numbered from the very first for the user, to the last, thanks to the fact that we specified over(... order by eventtime) and limited the installation time created.
We will only need to select the first purchase for each user and get their names.
Let’s wrap the received request in from and filter only the first purchases where row_number=1.
select row_number, item, count (devtodevid) as users
from(
select devtodevid, eventtime, item
, row_number() over(partition by devtodevid order by eventtime) row_number
, sum("_Coins") over(partition by devtodevid order by eventtime) cum_coins
from p102968.purchases
where created >current_date-interval '30 day' and created <current_date
order by 1,2)t
where row_number = 1
group by 1,2
order by 3 desc
Thus, we get statistics on the first in-game purchases of users. The most popular first purchases are “turns 2” and “bomb”.
The advantage of using row_number() is certainly that you can choose any sequence number of the event you need.
first_value() и last_value()
You can obtain a similar result using the window function first_value(@param), which returns the first value of the column specified in @param for each group in partition by according to order by.
select first_item, count (distinct devtodevid) as users
from(
select devtodevid, eventtime, item, first_value(item) over(partition by devtodevid order by eventtime) first_item
from p102968.purchases
where created >current_date-interval '30 day' and created <current_date
order by 1,2)t
group by 1
order by 2 desc
There is also an inverse function last_value() that returns the last value of a column for a window. Be careful when specifying over(order by eventtime)! The function will return the current value, not the last one for the partition. This is relevant for each analyzed row. The window function counts the values from the beginning of the partition to the current element according to order by.
nth_tile()
Another useful window function is nth_tile(@param,@row) which returns the @param value for the @row number in the over() partition. This is such a handy function if you only want to return the value of a certain column with a given number, and not all the data from the corresponding row.
select nth_value, count (distinct devtodevid) as users
from(
select devtodevid, eventtime, item, nth_value(item,1) over(partition by devtodevid order by eventtime) nth_value
from p102968.purchases
where created >current_date-interval '30 day' and created <current_date
order by 1,2)t
group by 1
order by 2 desc
Distinct on
To select the first row of each of the groups, you can use the distinct on(@grouping) [set of columns] function. This function returns the values of the first row columns for each of the groups specified in distinct on(@params). In fact, you can set the grouping rules (similar to the usual distinct) and then display the data you need separated by commas.The order in which the first row is selected is set to order by, which is mandatory when using distinct on.
select item, count (distinct devtodevid) as users
from(
select distinct on (devtodevid), item
from p102968.purchases
where created >current_date-interval '30 day' and created <current_date
order by devtodevid, eventtime
)t
group by 1
order by 2 desc
In this query, for each devtodevid, we output the user ID and the very first item value because of specified order by devtodevid, eventtime.
Using distinct on is convenient if you need to display the first or the last values for multiple columns in each group.
So, we have considered window functions for calculating the first user events in the app. In the next part of the SQL series, we will analyze the change in metrics compared to the previous period.