Manually Calculate User Retention
Calculate user retention with Pandas and Matplotlib. Generate weekly retention charts and analyze user activity data
Many tools calculate user retention today. The feature usually comes out of the box with any analytics or business intelligence tool.
However, if you want to truly understand retention, what better way to learn it than to implement it yourself? In this post we will:
- Understand user activity data
- Build a weekly retention chart
- Graph our retention metrics in weekly cohorts to see if we are doing a better job at retaining our users.
This is the user retention chart we’ll make by hand via Pandas and Matplotlib
What Is User Retention?
User retention is the metric that almost all product analysts and growth analysts use to see if users are sticking around. Retention is the measure of how many users come back to your product over time.
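To make the definition concrete, here is a tiny sketch in plain Python. The user IDs are made up for illustration and are not taken from the post's dataset. It assumes retention is measured as the share of a starting group (a cohort) that is seen again in a later period.

```python
# Hypothetical user IDs, purely for illustration.
cohort = {"ana", "ben", "cho", "dev"}   # users who started in the same period
active_week_1 = {"ana", "dev", "eli"}   # users seen one week later

# Only users who belong to the cohort count as retained;
# "eli" is active but started at some other time.
retained = cohort & active_week_1
retention_rate = len(retained) / len(cohort)

print(f"Week 1 retention: {retention_rate:.0%}")  # prints "Week 1 retention: 50%"
```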
How To Calculate User Retention
Time needed: 30 minutes.
Here is how to calculate user retention with Pandas and Matplotlib. We’ll be using the dataset above. Check out the GitHub link at the bottom to download it.
1. Find the first day your users were active (start date). The very first step in calculating retention is to add each user’s start date to your activity data. This is usually the minimum date on which you see activity for that user.
2. Derive the week and month of that start date. Once you have the start day, derive the start week and start month as well. This will help you group the users into cohorts later on when you’re doing segmentation.
3. Find the relative number of days between the day of activity and the start date. Now that you know when each user started, you’ll need to transform their dates of activity into relative dates of activity. This means answering the question “how many days since the user started?” for each day they were active. This one may be tough to read, so make sure to check out the example in the code below.
4. Plot how many users are ‘left’ via the relative time between activity and start date. Now that you have your relative dates of activity, it’s a simple process of grouping by those relative dates and counting the distinct number of users for each one. Make sure to divide each count by the number of users in the cohort to get a percentage rather than an absolute number.
5. Extra: split your plot into cohorts by when your users first started. Do you want to see if your retention is getting better? If so, then instead of plotting just one retention line, plot a separate line for each month in which users started. Check out the second chart in the code for this example.
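The steps above can be sketched in Pandas roughly as follows. This is a minimal sketch and not the post's own code. The column names `user_id` and `activity_date`, the tiny inline dataset, and the helper names `add_retention_columns`, `retention_curve`, and `plot_retention` are all assumptions made for illustration. Relative days are bucketed into weeks with integer division by 7, which is one reasonable way to get a weekly chart.

```python
import pandas as pd

# Hypothetical activity log: one row per user per active day.
# Column names and values are assumptions, not the post's dataset.
activity = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 4, 4],
    "activity_date": pd.to_datetime([
        "2024-01-01", "2024-01-09", "2024-01-16",
        "2024-01-02", "2024-01-03",
        "2024-01-05",
        "2024-02-05", "2024-02-13",
    ]),
})


def add_retention_columns(df):
    """Steps 1-3: start date, start week/month, and relative time."""
    out = df.copy()
    # Step 1: a user's start date is their earliest day of activity.
    out["start_date"] = out.groupby("user_id")["activity_date"].transform("min")
    # Step 2: first day of the week and month the user started in.
    out["start_week"] = out["start_date"].dt.to_period("W").dt.start_time
    out["start_month"] = out["start_date"].dt.to_period("M").dt.start_time
    # Step 3: days (and whole weeks) elapsed since the user started.
    out["days_since_start"] = (out["activity_date"] - out["start_date"]).dt.days
    out["weeks_since_start"] = out["days_since_start"] // 7
    return out


def retention_curve(df, cohort_col=None):
    """Steps 4-5: share of users active in each week since their start.

    Without cohort_col, returns a Series indexed by weeks_since_start.
    With cohort_col, returns a DataFrame with one column per cohort.
    """
    if cohort_col is None:
        active = df.groupby("weeks_since_start")["user_id"].nunique()
        return active / df["user_id"].nunique()
    active = (
        df.groupby(["weeks_since_start", cohort_col])["user_id"]
        .nunique()
        .unstack(cohort_col, fill_value=0)
    )
    cohort_size = df.groupby(cohort_col)["user_id"].nunique()
    # Divide each cohort's column by that cohort's size.
    return active.div(cohort_size, axis="columns")


def plot_retention(curve):
    """Draw the curve(s) as percentages.

    pandas' .plot() uses Matplotlib under the hood, so Matplotlib
    must be installed for this function to work.
    """
    ax = (curve * 100).plot(marker="o")
    ax.set_xlabel("Weeks since start")
    ax.set_ylabel("% of users active")
    return ax


prepared = add_retention_columns(activity)
overall = retention_curve(prepared)                 # one line for all users
by_month = retention_curve(prepared, "start_month")  # one column per start month
```

Passing `overall` or `by_month` to `plot_retention` draws a single line or one line per monthly cohort, respectively. Counting distinct users with `nunique` matters here: a user who is active several times in the same week should still count only once.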
Typically, the largest drop-off comes after the first interval. Most users don’t come back after they try something once. In fact, retention for this site is around 5%!
Let’s run through the code and produce the chart above. Remember to download the dataset if you want to follow along. The link to the code is at the bottom.